Behavioural Processes 140 (2017) 19–32
Behavioural Processes journal homepage: www.elsevier.com/locate/behavproc
Some properties of an adjusting-magnitude schedule of reinforcement: Implications for models of choice
C.M. Bradshaw¹, Division of Psychiatry, University of Nottingham, UK
ARTICLE INFO

Keywords: Adjusting-magnitude schedule; Choice; Fourier transform; Multiplicative hyperbolic model; Reinforcer size; Risk sensitivity

ABSTRACT

Rats were trained under a discrete-trials adjusting-magnitude schedule in which a response on lever A delivered either a larger or a smaller reinforcer (qA1 and qA2) with equal probability, while a response on lever B delivered a reinforcer whose size qB was adjusted according to the rats' choices. When A was preferred in a given block of trials, qB was increased in the following block; when B was preferred, qB was reduced in the following block. The oscillating changes in qB, analysed by the Fourier transform, could be described by a power spectrum with a dominant period of about 50 trial blocks. With qA1 held constant, the equilibrium value of qB (qB(50)) was monotonically related to qA2, and exceeded the arithmetic mean of qA1 and qA2 when qA1 was substantially greater than qA2. A model derived from the multiplicative model of intertemporal choice provided a post hoc description of the data. Simulations of block-by-block changes in qB derived from the model were generally consistent with the experimental data. Implications of the results for models of risky choice and for future use of the schedule in neurobehavioural experiments are discussed.
1. Introduction

Adjusting schedules have made an important contribution to the analysis of intertemporal choice, i.e., choice between reinforcers that differ with respect to the delays that precede their delivery. Mazur's (1987) adjusting-delay schedule is a prime example of this approach. In this schedule, the subject undergoes a series of trials in which choices are made between two reinforcers, A and B. The sizes of the reinforcers (qA and qB) and the delay to the smaller reinforcer (dA) are fixed, while the delay to the larger reinforcer (dB) is varied in accordance with the subject's choices. When the subject shows preference for the larger reinforcer, the delay to that reinforcer is increased; when preference shifts to the smaller reinforcer, the delay to the larger reinforcer is reduced. After extended training on this schedule, a quasi-stable value of dB is attained, which is taken as the indifference delay, dB(50).

The indifference delay has an important status in theoretical models of intertemporal choice. These models generally assume that experience of the alternative outcomes in a choice situation confers a psychological 'value' on each outcome (VA and VB), which may be defined in terms of quantitative features of the outcome (delay, size, etc.) using theoretical functions specified by the model. It is generally agreed that behavioural indifference between the two reinforcers signifies equality of their values. According to Mazur's (1987) hyperbolic model of delay discounting (hereafter referred to as the standard hyperbolic model, SHM), the value of a delayed reinforcer is determined by its size and the delay that precedes its delivery:

$$ V = \frac{q}{1 + K \cdot d} \tag{1} $$
where K is a constant of delay discounting, the hypothetical process whereby the value of a reinforcer declines as a function of delay. As originally stated, SHM identified the immediate value of a reinforcer with its size defined in physical units, as is implied by Eq. (1) (i.e., in the absence of a delay, V = q). However, there has been a growing realization that the relationship between V and q may not be linear (Green and Myerson, 2004; Ho et al., 1999; Killeen, 2011; Locey and Dallery, 2009; Mazur and Bondi, 2009; Mazur and Herrnstein, 1988; Myerson and Green, 1995; Wogar et al., 1992). A body of evidence points towards a negatively accelerated function, concordant with the economic doctrine of diminishing marginal utility (see Appendix A2). For example, according to the multiplicative hyperbolic model of intertemporal choice (MHM: Ho et al., 1999), the overall value of an outcome is determined by the product of two hyperbolic functions that represent the impact of reinforcer size and delay:

$$ V = \frac{1}{1 + K \cdot d} \cdot \frac{1}{1 + Q/q} \tag{2} $$

E-mail address: [email protected].
¹ The author, who has retired, holds an associate position at the University of Nottingham.

http://dx.doi.org/10.1016/j.beproc.2017.03.021 Received 31 January 2017; Received in revised form 24 March 2017; Accepted 28 March 2017; Available online 02 April 2017. 0376-6357/© 2017 Elsevier B.V. All rights reserved.
where Q is a size-sensitivity parameter. The representation of delay- and size-sensitivity by distinct parameters in Eq. (2) is consistent with a growing body of evidence indicating that the contributions of size-sensitivity and delay-sensitivity to overall reinforcer value are mediated by distinct neural mechanisms (for reviews, see Valencia-Torres et al., 2013; Body et al., 2017).

Eq. (2) draws attention to a significant methodological problem in studies of intertemporal choice. Most intertemporal choice schedules entail choice between reinforcers that differ in both size and delay. It is therefore often difficult to establish whether a change in preference induced by a neurobiological intervention has been brought about by a change in sensitivity to delay (represented by K), a change in sensitivity to size (represented by Q), or both (see Valencia-Torres et al., 2013). Two approaches have been adopted to dissect the influences of reinforcer size and delay on intertemporal choice.

The first of these involves the application of null, or indifference, equations. SHM and MHM define the point of indifference between two reinforcers A and B (VB = VA) as

$$ \frac{V_i(B)}{1 + K \cdot d_B(50)} = \frac{V_i(A)}{1 + K \cdot d_A} \tag{3} $$

where Vi(A) and Vi(B) are the immediate values of A and B. Solving for dB(50) yields the following linear indifference function (Mazur, 1987):

$$ d_B(50) = \frac{1}{K}\left[\frac{V_i(B)}{V_i(A)} - 1\right] + d_A \cdot \frac{V_i(B)}{V_i(A)} \tag{4} $$

Using the hyperbolic definition of Vi proposed by MHM, Eq. (4) may be expanded thus (Ho et al., 1999):

$$ d_B(50) = \frac{1}{K}\left[\frac{Q/q_A - Q/q_B}{1 + Q/q_B}\right] + d_A \cdot \left[\frac{1 + Q/q_A}{1 + Q/q_B}\right] \tag{5} $$

Q may be estimated by substituting the values of qA and qB into the recovered value of the slope, and K may be derived using the formula (slope − 1)/intercept. Eq. (5) has been applied in a number of experiments investigating the effects of cerebral interventions on intertemporal choice (see Body et al., 2017). However, this approach has a significant practical drawback. It takes an inordinate length of time to obtain a family of steady-state values of dB(50) over a range of values of dA wide enough to allow reliable derivation of the parameters of Eq. (5).

An alternative means of examining the separate contributions of delay- and size-sensitivity to intertemporal choice is to devise schedules in which the alternative outcomes differ in delay or size, but not both. Another adjusting-delay schedule devised by Mazur (1984) is an exemplar of this approach. In this schedule, a response on operandum A delivers a reinforcer of size qA after a variable delay (dA1 or dA2, with equal probability), whereas a response on B delivers a reinforcer of the same size after a delay dB that is adjusted in accordance with the subject's choices. Since qA = qB, the effect of an intervention on dB(50) obtained using this schedule cannot readily be attributed to a change in sensitivity to reinforcer size.

da Costa Araújo et al. (2010) used an adjusting-magnitude schedule, based on the design of Mazur's (1984) adjusting-delay schedule, to assess the sensitivity of choice to different magnitudes of reinforcement. Responses on A delivered a liquid reinforcer of variable volume (qA1 or qA2, with equal probability), whereas a response on B delivered a reinforcer whose volume (qB) was adjusted in accordance with the subject's choices. The quasi-equilibrium value of qB was taken as the 'indifference magnitude', qB(50). da Costa Araújo et al. argued that, since there were no scheduled delays to reinforcement (dA = dB ≈ 0), the effect of an intervention on qB(50) could not be attributed to a change in delay discounting. The adjusting-magnitude schedule described by da Costa Araújo et al. (2010) has been used in a small number of experiments investigating the neural mechanisms of choice (da Costa Araújo et al., 2010; McClure et al., 2014; Moschak and Mitchell, 2014). However, little is known about the quantitative features of behaviour maintained by this schedule.

2. Experiment

2.1. Introduction

A consistent finding obtained with Mazur's (1984) adjusting-delay schedule is that the indifference delay dB(50) is reliably shorter than the arithmetic mean of the two fixed delays, dA1 and dA2 (Mazur, 1984; Locey and Dallery, 2009; da Costa Araújo et al., 2010). This is predicted by SHM. Assuming that the probabilities associated with the two delays to A are equal (p(dA1) = p(dA2) = 0.5), indifference is defined thus:

$$ \frac{q_B}{1 + K \cdot d_B(50)} = 0.5\left[\frac{q_A}{1 + K \cdot d_{A1}} + \frac{q_A}{1 + K \cdot d_{A2}}\right] \tag{6} $$

Since qA = qB, the qs cancel out of the equation, and solving for dB(50) yields

$$ d_B(50) = \frac{d_{Am} + K \cdot d_{A1} \cdot d_{A2}}{1 + K \cdot d_{Am}} \tag{6a} $$

where dAm is the arithmetic mean of the fixed delays to reinforcer A. Although this is not intuitively obvious, substitution of real numbers into Eq. (6a) reveals that, so long as dA1 ≠ dA2, dB(50) < dAm (see Mazur, 1984, for a graphical explanation). Algebraically, this follows because dA1·dA2 < dAm² whenever dA1 ≠ dA2, so the numerator of Eq. (6a) is smaller than dAm·(1 + K·dAm). SHM provides a basis for deriving an expression for qB(50) in the adjusting-magnitude schedule. Adapting Eq. (3) and cancelling out the zero delays yields

$$ q_B(50) = 0.5 \cdot (q_{A1} + q_{A2}), \quad \text{or} \quad q_B(50) = q_{Am} \tag{7} $$

where qAm is the arithmetic mean of the fixed sizes of reinforcer A. MHM may also be applied to the adjusting-magnitude schedule, but in this case the predicted value of qB(50) is less than qAm. Substituting the hyperbolic size/value function specified by MHM for the reinforcer sizes in Eq. (7) yields
$$ \frac{1}{1 + Q/q_B(50)} = 0.5\left[\frac{1}{1 + Q/q_{A1}} + \frac{1}{1 + Q/q_{A2}}\right] \tag{8} $$

Solving for qB(50),

$$ q_B(50) = \frac{Q \cdot q_{Am} + q_{A1} \cdot q_{A2}}{Q + q_{Am}} \tag{8a} $$
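As a quick numerical check on Eqs. (6a) and (8a), the sketch below evaluates both closed forms. It also confirms that the Eq. (8a) value satisfies the indifference condition of Eq. (8). Dividing the numerator and denominator of Eq. (8a) by Q shows that it has the same form as Eq. (6a), with K replaced by 1/Q. For both equations, the prediction falls below the arithmetic mean whenever the two fixed values differ, because qA1·qA2 < qAm² in that case. The values Q = 25 and Q = 100 are illustrative choices only and are not necessarily those plotted in Fig. 1. The function names are hypothetical.

```python
def db50_shm(da1, da2, K):
    """Indifference delay predicted by SHM for Mazur's (1984) variable-delay schedule, Eq. (6a)."""
    dam = 0.5 * (da1 + da2)
    return (dam + K * da1 * da2) / (1.0 + K * dam)


def mhm_value(q, Q):
    """Immediate value of a reinforcer of size q under MHM: the size term of Eq. (2) with d = 0."""
    return 1.0 / (1.0 + Q / q)


def qb50_mhm(qa1, qa2, Q):
    """Indifference magnitude predicted by MHM for the adjusting-magnitude schedule, Eq. (8a)."""
    qam = 0.5 * (qa1 + qa2)
    return (Q * qam + qa1 * qa2) / (Q + qam)


if __name__ == "__main__":
    qa1 = 64.0                                   # fixed larger size used in the experiment (ul)
    for Q in (25.0, 100.0):                      # illustrative size-sensitivity values
        for qa2 in (4.0, 8.0, 16.0, 32.0, 64.0):
            qam = 0.5 * (qa1 + qa2)
            qb = qb50_mhm(qa1, qa2, Q)
            # Eq. (8): at indifference, V(qB) equals the mean of V(qA1) and V(qA2)
            mean_va = 0.5 * (mhm_value(qa1, Q) + mhm_value(qa2, Q))
            assert abs(mhm_value(qb, Q) - mean_va) < 1e-12
            print(f"Q={Q:5.0f}  qAm={qam:4.0f}  qB(50)={qb:6.2f}  qAm-qB(50)={qam - qb:5.2f}")
```

Running the script shows that qAm − qB(50) is zero when qA2 = 64 and grows as qA2 falls, as stated for Fig. 1.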
Eq. (8a) stipulates that, so long as qA1 ≠ qA2, qB(50) < qAm, and that the discrepancy between qB(50) and qAm increases as a function of the disparity between qA1 and qA2. This is apparent from Fig. 1, which shows plots of Eq. (8a) for two values of Q. One aim of the present experiment was to examine whether performance on the adjusting-magnitude schedule would conform to this expectation. A second aim was to examine the oscillations of qB during extended exposure to the schedule using the Fourier transform. This method produces a power spectrum of oscillations, from which the power within the dominant frequency band and the period corresponding to the dominant frequency may be determined (see Appendix A1). Valencia-Torres et al. (2011, 2012) used this approach to analyse the oscillations of dB in adjusting-delay schedules, and proposed a model based on MHM to describe these oscillations. In the present experiment, this approach was adapted to analyse the oscillations of qB.

Fig. 1. Predicted relation between the indifference magnitude of reinforcer B, qB(50), and the arithmetic mean magnitude of reinforcer A, qAm (axes are log10-transformed). Broken diagonal line shows the prediction of SHM (Eq. (7)), where qB(50) = qAm. The continuous lines show the predictions of MHM (Eq. (8a)) for two values of the size-sensitivity parameter, Q.

2.2. Method

2.2.1. Subjects

Eight female outbred hooded rats aged approximately six months and weighing 250–300 g were housed under a constant cycle of 12 h light and 12 h darkness (light on 0600–1800 h). Throughout the experiment they were maintained at 85% of their initial free-feeding body weights by providing a limited amount of standard rodent diet (Special Diet Services, UK: MP3 pellets) after each experimental session. Tap water was freely available in the home cages.

2.2.2. Apparatus

The rats were trained in standard operant conditioning chambers (CeNeS Ltd, Cambridge, UK) of internal dimensions 25 × 25 × 22 cm. One wall of the chamber contained a central recess covered by a hinged clear Perspex flap, into which a peristaltic pump could deliver a 0.6 M sucrose solution. Two apertures situated 5 cm above and 2.5 cm to either side of the recess allowed insertion of motorized retractable levers (CeNeS Ltd, Cambridge, UK) into the chamber. The levers could be depressed by a force of approximately 0.2 N. The chamber was enclosed in a sound-attenuating chest with additional masking noise generated by a rotary fan. An Acorn microcomputer programmed in Arachnid BASIC (CeNeS Ltd, Cambridge, UK) controlled the schedules and recorded the behavioural data.

2.2.3. Behavioural training

Experimental sessions took place at the same time each day, seven days a week, during the light phase of the daily cycle. At the start of the experiment the food deprivation regimen was introduced, and the rats were gradually reduced to 85% of their free-feeding body weights. They were then trained to press two levers (A and B) for the sucrose reinforcer (50 μl, 0.6 M), and were exposed for three sessions to a discrete-trials continuous reinforcement schedule in which the two levers were presented in random sequence. They then underwent daily 48-min training sessions under the discrete-trials adjusting-magnitude schedule for the remainder of the experiment. The schedule was similar to that described by da Costa Araújo et al. (2010). Each session consisted of 20 blocks of eight trials, each trial being 18 s in duration. Four trials of each block were forced-choice trials in which each lever was presented alone in random sequence. The other four trials were free-choice trials in which both levers were presented. The beginning of each trial was signalled by illumination of a light above the reinforcer recess. After 2.5 s the lever or levers (depending on the type of trial) were inserted into the chamber. When a lever-press occurred, the lever(s) were withdrawn, the light was extinguished and the reinforcer was delivered. The chamber remained in darkness until the start of the next trial. If no lever-press occurred within 5 s of the lever(s) being inserted, the lever(s) were retracted and the light was extinguished.

Responses on lever A resulted in delivery of a reinforcer of one of two magnitudes, qA1 or qA2, with equal probability of occurrence (p = 0.5 in each case), whereas a response on lever B produced a reinforcer of variable magnitude, qB. In each block of trials the value of qB was determined according to the proportion of responses made on the two levers in the free-choice trials of the previous block:
- If lever A was chosen in three or four free-choice trials of block n, qB was increased by 20% in block n + 1.
- If lever B was chosen in three or four free-choice trials of block n, qB was reduced by 20% in block n + 1.
- If levers A and B were each chosen in two free-choice trials of block n, qB remained unchanged in block n + 1.

The value of qB in the first block of each session was determined in the same way by the choices made in the final block of the previous session. The value of qA1 was maintained at 64 μl throughout the experiment; the value of qA2 was varied across phases of the experiment. In phase 1 (40 sessions) qA2 was 16 μl. In the remaining phases (25 sessions each) it was 4, 8, 32 and 64 μl, these values being presented in order of increasing magnitude for half the rats and in order of decreasing magnitude for the other half. The values of qAm (in μl) were thus 34 (qA2 = 4), 36 (qA2 = 8), 40 (qA2 = 16), 48 (qA2 = 32) and 64 (qA2 = 64). The positions of levers A and B (left and right) were counterbalanced across rats.

2.2.4. Data analysis

2.2.4.1. Indifference magnitude. The 800 trial blocks of phase 1 were divided into three 256-block segments (separated by 16-block gaps). The mean values of qB were compared across the three segments by one-factor analysis of variance (segment) with repeated measures. The mean value of qB in the final 256 blocks of each phase was taken as the indifference magnitude, qB(50). The values of qB(50) were first analysed by a two-factor analysis of variance (condition [i.e., value of qAm] × order of condition) with repeated measures on the former factor. This analysis showed no significant main effect of order and no significant order × condition interaction [F < 1 in both cases]. The 'order' factor was therefore ignored, and the values of qB(50) were subjected to a one-factor analysis of variance (value of qAm) with repeated measures. Planned comparisons were made between the empirical values of qB(50) and the mean reinforcer size qAm using Student's t-test.

2.2.4.2. Oscillation of qB. The block-by-block changes in qB were analysed by the method used by Valencia-Torres et al. (2011) to analyse fluctuations in the indifference delay in the adjusting-delay schedule. The method is illustrated in Fig. 2. Plots were obtained of qB vs. blocks of trials in 256-block segments (upper panel). These data were log-transformed and expressed as deviations from the mean value of log qB (middle panel). The transformed data were subjected to a Fourier transform (Microsoft Excel) in order to derive power spectra of the oscillations of qB (power vs. frequency; see lower panel). These spectra comprised 15 bands within the frequency range 0.004 (period = 250 blocks) to 0.0625 (period = 16 blocks). Three measures were derived for each rat in the three 256-block segments of phase 1 and in the last 256 blocks of each phase of the experiment: the total power of each spectrum, the power within the dominant frequency band, and the period corresponding to the dominant frequency (i.e., the reciprocal of the frequency, in blocks) (see da Costa Araújo et al., 2009, 2010; Valencia-Torres et al., 2011). Comparisons between the values of these parameters across conditions were carried out by repeated-measures analyses of variance. A significant F-ratio was followed by post hoc comparisons using the Fisher–Hayter MLSD test (Hayter, 1986). A significance criterion of p < 0.05 (two-tailed) was adopted in all statistical tests, and ηp² was used as the measure of effect size.
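The spectral measures described in Section 2.2.4.2 can be sketched in a few lines. The original analysis used Excel's Fourier tool; this sketch instead uses a direct discrete Fourier transform in pure Python. Power is taken as |X_k|² in arbitrary units, and the sketch keeps the harmonics k = 1 to 16 of a 256-point transform (periods 256 down to 16 blocks). The text reports 15 bands, so exactly which band was excluded is not stated here, and this band selection is an assumption. The function name `oscillation_spectrum` is hypothetical.

```python
import cmath
import math


def oscillation_spectrum(qb, min_period=16):
    """Sketch: power spectrum of qB oscillations within one segment of trial blocks.

    qb: positive qB values, one per block (e.g. 256 blocks).
    min_period: shortest period retained, in blocks (an integer divisor of len(qb)).
    Returns (bands, dominant_power, dominant_period, total_power), where bands is a
    list of (frequency in blocks^-1, power in arbitrary units).
    """
    n = len(qb)
    logs = [math.log10(v) for v in qb]
    mean = sum(logs) / n
    x = [v - mean for v in logs]                 # deviations from mean log10 qB
    bands = []
    for k in range(1, n // min_period + 1):      # periods n/k, from n down to min_period
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        bands.append((k / n, abs(coeff) ** 2))
    freq, dom_power = max(bands, key=lambda b: b[1])
    total = sum(p for _, p in bands)
    return bands, dom_power, 1.0 / freq, total


if __name__ == "__main__":
    # Synthetic segment: qB oscillating around 40 ul with a 50-block period
    qb = [40 * 10 ** (0.1 * math.sin(2 * math.pi * t / 50)) for t in range(256)]
    bands, dom, period, total = oscillation_spectrum(qb)
    print(f"{len(bands)} bands; dominant period = {period:.1f} blocks; "
          f"dominant power = {100 * dom / total:.0f}% of total")
```

With a 256-block segment, the resolvable periods are 256/k blocks. A 50-block oscillation is therefore reported at the nearest band, 256/5 = 51.2 blocks.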
Fig. 2. Example of Fourier transform of one rat's qB data from the final 256 trial blocks of Phase 1 (reinforcer, 0.6 M sucrose; qA1 = 64 μl, qA2 = 16 μl). Top panel: raw data (qB expressed in μl). Middle panel: transformed data (qB expressed as deviations of the log10-transformed data from the overall mean of the log10-transformed values, indicated by the horizontal line). Lower panel: power spectrum, derived by Fourier transform, for the data shown in the middle panel (ordinate: power in arbitrary units; abscissa: frequency in blocks⁻¹); the spectral power and the period corresponding to the dominant frequency band (blocks) are shown in the inset.

Fig. 3. Mean adjusting magnitude of reinforcer B (qB, μl) obtained in the three 256-block segments of Phase 1 (qA2 = 16 μl, qAm = 40 μl). Data from individual rats are indicated by connected circles; group mean data (± s.e.m.) are shown by the large filled circles connected by the thick continuous line. The horizontal broken line indicates qAm.

Fig. 4. Relation between indifference magnitude of reinforcer B (qB(50), μl) and the mean magnitude of reinforcer A (qAm, μl). Points show group mean data (± s.e.m.) from the final 256 blocks of each phase of the experiment. Diagonal line plots qB(50) = qAm. Significant deviations of the data points from the diagonal: * p < 0.05 (see text for details).
2.3. Results

2.3.1. Indifference magnitude

The group mean values of qB obtained from the three 256-block segments of phase 1, in which the values of qA1 and qA2 were 64 and 16 μl (qAm = 40 μl), are shown in Fig. 3. Apart from one rat, which showed an increase in qB from the first to the second segment, there was no apparent trend for qB either to increase or to decline across the three segments. In keeping with this impression, analysis of variance revealed no significant effect of segment on the value of qB [F(2,21) = 0.15, p = 0.858, ηp² = 0.007]. The group mean values of qB(50) (± s.e.m.) derived from the final 256 blocks of each phase are shown in Fig. 4. qB(50) increased as a function of qAm. Analysis of variance revealed a significant effect of the value of qAm [F(4,35) = 6.6, p = 0.0004, ηp² = 0.108]. Planned comparisons between each value of qB(50) and the corresponding value of qAm revealed significant differences in the case of the three lowest values of qAm [34 μl: p = 0.014; 36 μl: p = 0.026; 40 μl: p = 0.021], but not in the case of the two highest values [48 μl: p = 0.247; 64 μl: p = 0.076].
2.3.2. Oscillation of qB

The raw data obtained from each rat in phase 1 are shown in Fig. 5. Visual inspection suggests that the oscillations of qB diminished in amplitude as training progressed. This impression is substantiated by the Fourier analysis, the results of which are summarized in Fig. 6. The total power and the power in the dominant frequency band declined across the three segments, the greatest change occurring between the first and second segments [total power: F(2,14) = 23.2, p < 0.0001, ηp² = 0.769; power in dominant frequency band: F(2,14) = 71.2, p < 0.0001, ηp² = 0.910]. For total power, post hoc comparisons revealed significant differences (p < 0.05) between segments 1 and 2 and between segments 1 and 3, but not between segments 2 and 3. Similarly, power in the dominant frequency band differed significantly between segments 1 and 2 and between segments 1 and 3, but not between segments 2 and 3. There was no significant change in the period corresponding to the dominant frequency across the three segments [F(2,14) = 1.0, p = 0.389, ηp² = 0.063].

The parameters of the power spectra derived from the final 256 blocks of the five conditions are shown in Fig. 7. Total power showed no systematic variation across the five values of qAm [F(4,35) = 0.79, p = 0.541, ηp² = 0.021]. Similarly, power in the dominant frequency band, which represented 12–15% of the total power, did not vary significantly across the five values of qAm [F(4,35) = 1.4, p = 0.251, ηp² = 0.034]. There was no effect of the value of qAm on the period corresponding to the dominant frequency [F(4,35) = 0.3, p = 0.877, ηp² = 0.008]. The period corresponding to the dominant frequency ranged between 40 and 60 blocks (mean 48.7 ± 2.8 blocks).

Fig. 5. Adjusting reinforcer magnitude (qB, μl) in 800 successive trial blocks of Phase 1: individual-subject data from eight rats. Vertical lines demarcate the three 256-block segments (separated by 16-block intervals) used in the analysis.

Fig. 6. Fourier analysis of the adjusting reinforcer magnitude data (qB) in the three 256-block segments of Phase 1. Left histogram: total spectral power (in arbitrary units) in 15 bands covering the frequency range 0.004–0.0625 blocks⁻¹ (period range, 16–250 blocks). Middle histogram: power in the dominant frequency band. Right histogram: period corresponding to the dominant frequency (blocks). Columns show group mean data + s.e.m.; significant difference from the first segment: * p < 0.05.
the values of A and B are determined by the sizes and delays of the primary reinforcers, modulated by Q and K. In each trial block the subject compares the values of A and B and selects the higher valued option. It is assumed that the subject’s discrimination of VA and VB is limited by psychophysical factors, and that in each trial block the probability that B will be chosen conforms to a logistic function, whose slope is defined by a parameter, s.2
3. Simulations 3.1. Introduction The model proposed by Valencia-Torres et al. (2011) to describe the oscillations of dB in Mazur’s (1984) adjusting-delay schedule posits that
2 Valencia-Torres et al. proposed that s increases towards an asymptote during extended training, reflecting practice-related improvement of the discrimination of value.
23
Behavioural Processes 140 (2017) 19–32
C.M. Bradshaw
Fig. 7. Fourier analysis of the adjusting reinforcer magnitude data (qB) in the final 256 blocks of the five phases of the experiment. Left histogram: total spectral power (in arbitrary units) in 15 bands covering the frequency range 0.004–0.0625 blocks−1 (period range, 16–250 blocks). Middle histogram: power within the dominant frequency band. Right histogram: period corresponding to the dominant frequency (blocks). Columns show group mean data + s.e.m. obtained with each of the five values of qAm.
in each case, qA1 was set at 64 and simulations were run for each of five values of qAm (64, 48, 40, 36 and 34) yielding 135 conditions in total. Fifty simulations were carried out for each condition. For each simulation, the following measures were derived: qB(50) (the average value of qB for 256 blocks), power within the dominant frequency band of the spectrum, and the period corresponding to the dominant frequency (see Section 2.4.2). Each dependent variable was first subjected to a preliminary fourfactor analysis of variance (Q [3 levels] × s [3 levels] × a [3 levels] × qAm [5 levels]), with 50 cases (simulations) in each condition. In view of the numerous degrees of freedom generated by this large dataset (dferror = 6615), effects were only regarded as robust if the following conservative criteria were met: an F-ratio with p < 0.01 and an effect size η2p > 0.15 (Lakens, 2013; Richardson, 2011). When a significant main effect of any of the parameters, or a significant interaction involving that parameter, was found, follow-up analyses of the simple main effects were carried out at each level of that factor using the same criteria. In the following description of the analyses, only the effect sizes of significant effects (η2p) will be reported. (In the present analyses, when η2p > 0.15 the p-value associated with the Fratio was in every case < 0.0001.)
This model may be applied to the adjusting-magnitude schedule of da Costa Araújo et al. (2010) (see Appendix A1). In this case, K drops out of the computation, since there are no scheduled delays to reinforcement. The adapted model incorporates four processes. (i) The value of each outcome is computed from the size(s) of the primary reinforcer(s), as defined by MHM; this step entails the parameter Q. (ii) The overall value of outcome A is computed from the individual values of the large and small reinforcers, qA1 and qA2. The most straightforward computation would be the arithmetic mean of the values, as in Eq. (7); however, for reasons that will become apparent, it was found expedient to use the generalized (Hölder) mean incorporating an ‘averaging parameter’, a: VA = [(VA1a + VA2a)/2]1/a. (iii) The probability of responding on operandum B in each block is computed using the psychometric function, VB being calculated from the size of reinforcer B in the current block. (iv) Finally, p(B) is translated into actual responses (A or B) in each free-choice trial of the current block; a parameter-free Bernoulli process was invoked for this operation. It important to note that these four ‘processes’ are merely the mathematical components of the model; they correspond to computations in the simulations described below, but, at this stage of the game, no implication is intended of a necessary link between these ‘processes’ and psychological or neurophysiological processes underlying performance on the schedule. The following simulations were carried out with the aim of verifying the applicability of the model to performance on the adjusting-magnitude schedule.
3.3. Results The values of qB(50) derived from the simulations are shown in Fig. 8. Each graph shows the relation between qB(50) and qAm. The points show the mean value of qB(50) ± s.e.m., derived from 50 simulations; the three curves in each graph show the simulations carried out using the three chosen values of a. The three rows of graphs correspond to the three values of Q, and the three columns to the three values of s. The filled circles in the left-hand column show the simulations based on three values of Q when a = 1 and s = 1. It can be seen that the curves were consistently below the diagonal line that represents the predictions of SHM. Increasing the value of Q (graph A → graph D → graph G) induced a progressive upward tilt of the lefthand region of the curve. Comparison of the curves across the three columns shows that, for any given value of Q, alteration of the value of s had no discernible effect on the function. Comparison of the three curves in each graph shows that increasing the value of the ‘averaging parameter’, a, (circles → upright triangles → inverted triangles) had the effect of tilting the function upwards; in the case of higher values of Q and a, the curve was displaced above the diagonal. These impressions were substantiated by the analyses of variance. The preliminary
3.2. Method Simulations were carried out using Microsoft Excel (2003 version). Each simulation comprised 256 trial blocks, each of which contained four free-choice trials; qB was set equal to qAm ([qA1 + qA2]/2) in the first block and was thereafter adjusted in successive blocks as in the experiment described above. The simulated data were subjected to the Fourier transform as described above. Three values of each of the three parameters of the model (Q, a and s) were chosen; the values of Q were 25, 100, and 400 (in the case of the experiment described in Section 2, the units of Q are μl of 0.6 M sucrose), and the values of the dimensionless parameters s and a were 1, 2, and 3. Simulations were run for each of the 27 possible combinations of these parameter values;
(footnote continued) However, the present discussion is concerned only with the steady-state conditionin which s is assumed to be stable (see Appendix, A1, for a full description of the model).
24
Behavioural Processes 140 (2017) 19–32
C.M. Bradshaw
Fig. 8. Simulated data derived using the modified model based on MHM: indifference magnitude, qB(50). Each graph shows the relationship between qB(50) and the mean magnitude of reinforcer A (qAm) for three values of the ‘averaging parameter’, a (circles, a = 1; upright triangles, a = 2; inverted triangles, a = 3). Graphs in the three columns were derived using three values of the ‘slope parameter’, s (left column, s = 1; middle column, s = 2; right column, s = 3). Graphs in the three rows were derived using three values of the ‘size-sensitivity parameter’, Q (top row, Q = 25; middle row, Q = 100; bottom row, Q = 400). Each point shows the mean value of qB(50) obtained from 50 simulations; In every case, the s.e.m. was smaller than the height of the symbol.
analysis revealed significant main effects of Q (η²p = 0.454), a (η²p = 0.450) and qAm (η²p = 0.884), and there were significant Q × qAm (η²p = 0.251) and a × qAm (η²p = 0.284) interactions. There was no significant main effect of s, and no significant interaction involving that parameter (η²p < 0.02 in every case). Separate analyses of the results obtained at each of the three values of Q showed that the main effects of a and qAm were significant in each case (η²p > 0.226); the a × qAm interaction was significant only at the two higher values of Q.

Fig. 9 shows the power within the dominant frequency band. At the lowest value of Q, power increased with increasing values of qAm; however, this trend became less apparent as Q was increased. Power declined with increasing values of s, but was unaffected by changes in a. The preliminary analysis of variance revealed significant main effects of Q (η²p = 0.411) and s (η²p = 0.568); the η²p values for the main effects of a and qAm and for all the interactions were < 0.06. The effect of s was significant at every level of Q (η²p > 0.5 in every case). The power in the dominant frequency band accounted for 16% ± 1% of the total spectral power within the 15 frequency bands.

Fig. 10 shows the period corresponding to the dominant frequency band of the power spectrum derived from the simulations. Within each combination of the three parameters of the model, the period showed no systematic variation across the five values of qAm. Increases in the values of Q and s were associated with modest apparent reductions of the period, but variation of a had no systematic effect on the length of the period. The p-values associated with the main effects of Q and s were < 0.001, but in no case was the effect size η²p > 0.1. Neither the main effect of a nor any of the interaction effects was associated with a p-value < 0.05 or an η²p value > 0.1.
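The effect sizes quoted in this section are partial eta squared values, η²p = SS_effect / (SS_effect + SS_error), where SS denotes a sum of squares from the analysis of variance. The minimal Python sketch below computes η²p and applies a two-tier labelling of the kind used in Table 1. The thresholds (p < 0.05 for significance, η²p > 0.1 for a 'robust' effect) are an assumption, inferred from the treatment of the period results above, where effects with p < 0.001 but η²p ≤ 0.1 were not regarded as robust; the function names and the example p-values are hypothetical.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    if ss_effect < 0 or ss_error < 0:
        raise ValueError("sums of squares must be non-negative")
    return ss_effect / (ss_effect + ss_error)


def classify_effect(p_value: float, eta_p2: float,
                    alpha: float = 0.05, min_eta_p2: float = 0.1) -> str:
    """Assumed criteria: significant if p < alpha; 'robust' only if eta_p2 > min_eta_p2."""
    if p_value >= alpha:
        return "not significant"
    return "robust" if eta_p2 > min_eta_p2 else "non-robust"


# Hypothetical sums of squares giving eta_p2 = 0.568 (cf. the effect of s on power).
print(round(partial_eta_squared(56.8, 43.2), 3))   # 0.568
print(classify_effect(0.0001, 0.568))               # robust
# An effect like those of Q and s on the period: p < 0.001 but eta_p2 <= 0.1.
print(classify_effect(0.0005, 0.08))                # non-robust
```

Separating the significance test from the effect-size cutoff makes the distinction between asterisks and daggers in Table 1 explicit: with large simulated samples, even small effects reach p < 0.001, so the η²p criterion carries most of the weight.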
Fig. 9. Simulated data derived using the modified model based on MHM: power within the dominant frequency band of the power spectrum of oscillations of qB obtained by Fourier transform of the block-by-block changes in the value of qB. Conventions are as in Fig. 8.
4. Discussion
The results of the experiment presented in Section 2 are clearly at variance with the predictions derived from both SHM and MHM. According to SHM, in its original form, the indifference magnitude of reinforcer B, qB(50), should equal the arithmetic mean of the two magnitudes of reinforcer A, qAm (Eq. (7)); this prediction is represented by the diagonal line in Fig. 1, which depicts the hypothetical relation between qB(50) and qAm. According to MHM, qB(50) should be smaller than qAm whenever qA1 ≠ qA2 (Eq. (8)); this prediction is represented by the curved functions lying below the diagonal in Fig. 1. In contrast to both these predictions, the empirical values of qB(50) shown in Fig. 4 lie above the diagonal; that is, the indifference magnitude of the adjusting reinforcer, B, was larger than the mean magnitude of the variable reinforcer, A. It is difficult to see how this result can be accommodated by MHM without recourse to the unpalatable remedy of invoking another free parameter. No doubt there is more than one way in which this could be accomplished. One relatively straightforward solution is to abandon the unproven assumption that the average value of reinforcer A may be defined as the arithmetic mean of the values of its various sizes (cf. Eqs. (7) and (8)) and instead adopt the generalized mean, which incorporates an 'averaging parameter', a:
Fig. 10. Simulated data derived using the modified model based on MHM: period corresponding to the dominant frequency band of the power spectrum of oscillations of qB. Conventions are as in Fig. 8. Horizontal bars indicate s.e.m. where this was greater than the size of the symbol.
$$V_A = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( V_{A(i)} \right)^{a} \right]^{1/a}$$

where n is the number of reinforcer sizes used for the variable option, A (in the present case, 2), and VA(i) is the value of each reinforcer size. When a = 1, VA is equal to the arithmetic mean of the values; when a > 1, VA is biased in favour of the larger value. Thus, in the present experiment, a value of a greater than unity would result in the larger qA1 exerting a greater impact on VA than the smaller qA2, causing VA to exceed (VA1 + VA2)/2. Hence, provided that a is large enough to offset the negatively accelerated relation between reinforcer size and value, qB(50) would be expected to exceed qAm at the point of indifference between A and B, as was found to be the case in this experiment.

The results of the Fourier analysis of the block-by-block changes in qB may be summarized as follows. During the first phase of the experiment, when performance was gradually stabilizing, the overall spectral power and the power in the dominant frequency band declined in the three successive 256-block segments, but there was no consistent change in the period corresponding to the dominant frequency. Comparison among the spectra derived from the final 256 blocks of the five phases showed that neither the power nor the period was significantly affected by variation of mean reinforcer size, qAm.

It might be argued, with some justification, that accounting for a monotonic relation between qB(50) and qAm is not a notable achievement for a model which now incorporates no fewer than three free parameters. However, the simulations presented in Section 3 offer some hope that the revised model may be capable of generating testable hypotheses,
and that in some cases the profile of effects of experimental manipulations on the three dependent variables examined here (qB(50), and the power and period indices derived from the Fourier analysis) may have specific implications for the effects of these manipulations on the model's three parameters, Q, s and a (see below).

The analysis of the simulation data is summarized in Table 1. Considering first the indifference magnitude, Fig. 8 shows that with a set at 1.0, the relation between qB(50) and qAm invariably fell below the diagonal line specified by SHM. Increases in the size-sensitivity parameter, Q, drew the relation into closer proximity to the diagonal; however, only when a was increased above 1.0 could the curve be enticed above the diagonal, and then only in combination with relatively high values of Q. This indicates that both Q and a are needed if MHM is to provide an adequate account of the qB(50) data.

Table 1
Summary of the effects of manipulating qAm and the parameters of the revised model on simulated performance on the adjusting-magnitude schedule.ᵃ

                                              Schedule     Model parameter
Dependent variable                            parameterᵇ
                                              qA2          Q     s     a
Indifference magnitude (qB(50))               *            *           *
Power spectrum
  Power in dominant frequency band                         *     *
  Period corresponding to dominant frequency               †     †

ᵃ 'Robust' effects are indicated by asterisks; statistically significant but 'non-robust' effects are indicated by dagger symbols; see text for criteria.
ᵇ qAm, the arithmetic mean of the reinforcer magnitudes in the variable outcome, A. In the present analysis, qA1 was held constant, and qAm was manipulated by changing qA2.

Turning to the Fourier analysis, it transpired that power within the dominant frequency band was impervious to manipulation of the independent variable qAm, consistent with the experimental finding shown in Fig. 7. It was also unaffected by changes in Q or a, but was sensitive to changes in s. The latter result is consistent with the finding of a decline in power during the early stages of training under the schedule (Fig. 6), if it is accepted that, as proposed by Valencia-Torres et al. (2011), s expresses the precision of discrimination between VA and VB, which improves with extended experience of the choice alternatives.

Inspection of Table 1 suggests that the profiles of effect of interventions on the indifference magnitude and the power spectrum may convey specific information about the involvement of the model's three parameters in these effects. Thus, a change in qB(50) unaccompanied by a change in spectral power is suggestive of an effect on a, whereas an effect on power unaccompanied by a change in qB(50) is more likely to represent an effect on s; a simultaneous change of both qB(50) and spectral power is consistent with an effect on Q, although it could, of course, reflect a dual effect on a and s.

The period corresponding to the dominant frequency of the power spectrum was not robustly affected by manipulation of any of the three parameters of the model, nor was it robustly affected by experimental or simulated manipulation of qAm. It remains to be seen whether this measure will prove sensitive to other schedule manipulations and/or to biological interventions.

The revised model based on MHM may help to account for an unexpected finding noted by da Costa Araújo et al. (2010). These authors trained separate groups of rats under Mazur's (1984) adjusting-delay schedule and the present adjusting-magnitude schedule. The same ratio, 1:9, was used for the delays to reinforcer A in the adjusting-delay schedule (dA1 = 2 s, dA2 = 18 s) and the sizes of reinforcer A in the adjusting-magnitude schedule (qA1 = 20 μl, qA2 = 180 μl of a 0.6 M sucrose solution). In agreement with Mazur's (1984) results, the indifference point, dB(50), in the adjusting-delay schedule was considerably less than the arithmetic mean of the two delays to reinforcer A (dAm). However, in the adjusting-magnitude schedule, the indifference point, qB(50), was close to the mean of the two magnitudes of reinforcer A (qAm). This is compatible with the present model, which predicts that the non-linear value-averaging process represented by a should exert opposite effects on qB(50) and dB(50). This is because a operates on the discounted values, VA(i). Values of a greater than unity increase the overall value of A; thus in the adjusting-delay schedule the indifference delay to outcome B is reduced, whereas in the adjusting-magnitude schedule the indifference magnitude of outcome B is increased. Put another way, K and a exert consonant influences on dB(50) in the adjusting-delay schedule, whereas Q and a exert opposing influences on qB(50) in the adjusting-magnitude schedule.

The foregoing discussion has concentrated on the application of MHM to the adjusting-magnitude schedule. However, it must be emphasised that other models may account for the data described above equally well. An example of one such model is discussed in Appendix A2.

The present modification of MHM may also have some wider implications for the analysis of choice behaviour. Mazur's (1984) adjusting-delay schedule and the present adjusting-magnitude schedule are examples of 'risky-choice' schedules in which choices are made between an uncertain outcome, A, and a certain outcome, B (Mazur, 2004). The two schedules differ only with respect to the dimension (delay or size) along which the values of the reinforcers are manipulated. In both schedules the primary behavioural measure is the indifference point (dB(50) or qB(50)). By convention, choice is regarded as 'risk-insensitive' if the indifference point coincides with the arithmetic mean value of the manipulated dimension of the uncertain outcome. Thus, in the case of the adjusting-delay schedule, if dA1 = 2 s and dA2 = 18 s, then risk-insensitivity is said to occur when dB(50) = 10 s; when dB(50) < 10 s, choice is regarded as 'risk-prone' because the subject is assumed to have equated the value of the variable outcome A with a 'better' value of outcome B, whereas when dB(50) > 10 s, choice is regarded as 'risk-averse' because the subject is assumed to have equated the value of the variable outcome A with a 'worse' value of outcome B. Applying the same logic to the adjusting-magnitude schedule, if qA1 = 64 μl and qA2 = 16 μl, risk-insensitivity corresponds to qB(50) = 40 μl; when qB(50) > 40 μl choice is regarded as 'risk-prone', whereas when qB(50) < 40 μl it is regarded as 'risk-averse' (Mazur, 2004).

Previous findings with this type of risky-choice schedule have yielded mixed results. There is general consensus that when animals choose between certain and uncertain delays, risk-proneness is the norm (Bateson and Kacelnik, 1995; Cicerone, 1976; Davison, 1969; Mazur, 1984; Rider, 1983; for reviews, see Kacelnik and Bateson, 1996; Kacelnik and El Mouden, 2013). In contrast, when choices entail certain
and uncertain reinforcer magnitudes, risk-proneness (Essock and Rees, 1974; Mazur, 1985; Young, 1981), risk-insensitivity (Reboreda and Kacelnik, 1991; Staddon and Innis, 1966; Wunderle and O'Brien, 1985) and risk-aversion (Bateson and Kacelnik, 1995; Clements, 1990; Hastjarjo et al., 1990; Menlove et al., 1979) have all been reported (for reviews, see Kacelnik and Bateson, 1996; Kacelnik and El Mouden, 2013). In the present experiment, choice was risk-insensitive when the discrepancy between qA1 and qA2 was small; however, larger discrepancies were associated with an increasing tendency towards risk-proneness. It is noteworthy that risk-proneness in the present adjusting-magnitude schedule, while compatible with the revised MHM thanks to the incorporation of the 'averaging parameter', a, is incompatible with SHM, which predicts risk-insensitivity, and with the original version of MHM, which predicts risk-aversion. It is also noteworthy that the model propounded by Bateson and Kacelnik (1995), based on psychophysical principles, has difficulty accounting for risk-proneness in situations where choices are made between certain and uncertain reinforcer amounts (see Kacelnik and El Mouden, 2013). However, it cannot be emphasised too strongly that the success of the revised MHM in accounting for the data reported here relies on the post hoc invocation of an additional parameter. Much further work is needed before the model may be regarded as a serious challenger to existing, more parsimonious, models of risky choice.

A number of factors have been identified which influence the tendency towards risk-proneness or risk-aversion, including deprivation level and the availability of daily dietary requirements from alternative sources (Caraco et al., 1980; Stephens, 1981), the quantity and quality of the reinforcer (Craft et al., 2011; Mazur, 1988), and the response-effort requirement (Kirshenbaum et al., 2000, 2003). This tendency is also consistently affected by various neurobiological interventions (Adriani et al., 2009; Cardinal and Howes, 2005; Cocker et al., 2012; Kaminski and Ator, 2001; Kheramin et al., 2003; Larkin et al., 2016; Mobini et al., 2000; St Onge and Floresco, 2009; Sugam et al., 2012), and shows considerable inter-individual variability (e.g. Adriani et al., 2009; Sugam et al., 2012; see also Kacelnik and El-Mouden, 2013; Kirkpatrick et al., 2015; Schultz, 2015). It may be of interest, in future experiments, to explore the sensitivity of the parameters of the revised MHM to interventions that are known to affect risk-sensitivity in other risky-choice situations.

Finally, a comment may be in order about the use of the adjusting-magnitude schedule in neurobehavioural studies. In previous experiments, the schedule has been used as a comparator for Mazur's (1984) adjusting-delay schedule (da Costa Araújo et al., 2010; McClure et al., 2014; Moschak and Mitchell, 2014). For example, da Costa Araújo et al. (2010) compared the levels of Fos expression (an index of neuronal activation) in the orbital prefrontal cortex and nucleus accumbens of rats exposed to the adjusting-delay and adjusting-magnitude schedules. da Costa Araújo et al. found that Fos expression was enhanced in the nucleus accumbens, but not in the orbital prefrontal cortex, following exposure to the adjusting-delay schedule, whereas Fos activity in the orbital prefrontal cortex was enhanced following exposure to either schedule. The authors argued that their results were consistent with the suggestion that the nucleus accumbens may participate in the process of delay discounting, whereas the orbital prefrontal cortex may be involved in the collation of multiple influences on reinforcer value, including both the size and delay of food reinforcers. While the present results do not cast doubt on the suitability of the adjusting-magnitude schedule as a comparator for the adjusting-delay schedule in this type of experiment, they do suggest that the specific attribution of performance on the two schedules to the operation of the processes of delay-discounting (K) and size-sensitivity (Q) may be an over-simplification. It seems possible that extant models of intertemporal choice may have underestimated the complexity of the process whereby reinforcer values are combined when an overall value is assigned to an outcome composed of variable reinforcer sizes and/or delays. Of course, adoption of the generalized mean in place of the arithmetic mean as the principle whereby reinforcer values are combined does nothing to explain this complexity, but it does draw attention to the need for further investigation of this aspect of choice behaviour.

Appendix A

A1 The Fourier transform and its application to adjusting schedules
Fourier analysis is a process whereby complex periodic data may be decomposed into component frequencies. A conventional expression of the result of a Fourier analysis is the power spectrum, in which the power of oscillation (e.g. energy in physical systems or variance in statistical applications) is plotted against the frequency of oscillation. The period of each component oscillation is the reciprocal of its frequency, and the power of oscillation is the square of the amplitude. A power spectrum may comprise one or more peaks, representing major component frequencies, and low levels of power in the intervening frequency bands, representing 'noise'. The total power is the integral of the power in all frequency bands. An example of a biological application of the Fourier transform is the analysis of heart rate variability. The power spectrum of inter-beat intervals contains three peaks: a high-frequency band (0.15–0.4 Hz), which reflects parasympathetic vagal activity ('sinus arrhythmia'); a middle-frequency band (0.04–0.15 Hz), which represents fluctuations in sympathetic nervous activity; and a low-frequency band (0.003–0.015 Hz), which is thought to reflect thermoregulatory mechanisms (Akselrod et al., 1981). The Fourier transform has also proved valuable in the analysis of ecological data, for example periodic variations in insect populations (Bigger, 1973) and seasonal patterns of the movement of fish (Papastamatiou et al., 2009). Fourier analysis was applied to the oscillating pattern of delay to reinforcement in adjusting schedules by da Costa Araújo et al. (2009) and Valencia-Torres et al. (2011, 2012). In these schedules the delay to reinforcer B, dB, is adjusted from one block of trials to the next, depending on the subject's preference in the previous block (see Section 1); the temporal dimension in the Fourier analysis is therefore the number of trial blocks.
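As a minimal sketch of the computation described above, the Python code below (using NumPy's FFT routines) computes a one-sided power spectrum of a series sampled once per trial block, with power taken as the squared amplitude, and reports the period of the dominant frequency as the reciprocal of that frequency. It is an illustration under simplifying assumptions, not the procedure used in the studies cited: the series is only mean-centred (no windowing or detrending), the raw frequency bins are not pooled into the frequency bands used in the main analyses, and the test series and function names are hypothetical.

```python
import numpy as np


def power_spectrum(series):
    """One-sided power spectrum of a series sampled once per trial block.

    Returns (frequencies in cycles per block, power), excluding the
    zero-frequency (mean) term.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                          # remove the mean (DC) component
    power = np.abs(np.fft.rfft(x)) ** 2       # power = squared amplitude
    freqs = np.fft.rfftfreq(x.size, d=1.0)    # sample spacing: one trial block
    return freqs[1:], power[1:]


def dominant_period(series):
    """Period (in blocks) of the highest-power frequency, and its share of total power."""
    freqs, power = power_spectrum(series)
    k = int(np.argmax(power))
    return 1.0 / freqs[k], power[k] / power.sum()


# Synthetic 256-block trace: a 64-block cycle plus a weaker 8-block cycle.
blocks = np.arange(256)
q_b = 100 + 20 * np.sin(2 * np.pi * blocks / 64) + 5 * np.sin(2 * np.pi * blocks / 8)
period, share = dominant_period(q_b)
print(period)            # 64.0
print(round(share, 3))   # 0.941
```

Because the synthetic cycles complete a whole number of periods within 256 blocks, each falls exactly on one frequency bin; real qB traces spread power across neighbouring bins, which is one reason for pooling bins into bands.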
The total power is related to the sum of the squares of the amplitudes of oscillations across the whole spectrum, and power in each frequency band reflects the proportion of the total power that may be attributed to oscillation within that frequency band. The spectrum found by da Costa Araújo et al. (2009) had a single dominant peak corresponding to a period of 60–80 trial blocks; similar results were obtained by da Costa Araújo et al. (2009) and Valencia-Torres et al. (2011). The model proposed by Valencia-Torres et al. (2011) to describe the oscillations of dB in Mazur’s (1987) schedule may be summarized as follows. It is assumed that the values of the two options, A and B, are determined by the sizes of the primary reinforcers, modulated by the parameters of MHM, Q and K:
$$V_A = \frac{1}{1 + Q/q_A} \cdot \frac{1}{1 + K \cdot d_A}; \qquad V_B = \frac{1}{1 + Q/q_B} \cdot \frac{1}{1 + K \cdot d_B} \qquad \text{(A1)}$$
(Eq. (2) in main text). In each block the subject compares the values of A and B and selects the option with the higher value. It is further assumed that discrimination of the values of A and B is limited by psychophysical factors, and that in each free-choice trial the probability of choosing B conforms to a logistic psychometric function:
Fig. A1. Comparison of the Multiplicative Hyperbolic Model (MHM) and the Exponential Model (EM) discussed in Appendix A2. Left-hand graphs (A and E): curves show the least-squares fits to the data presented in Fig. 4; estimates of the parameters (MHM, a; EM, c) and the corresponding values of r² are shown for each model. The diagonal line indicates the function corresponding to qB(50) = qAm. Histograms show indices derived from the Fourier analysis of the oscillations of qB in the five phases (qAm values are indicated on the abscissae): total spectral power (B and F), power within the dominant frequency band (C and G), and period (blocks) corresponding to the dominant frequency (D and H).
$$p(B) = \frac{1}{1 + \left( V_A / V_B \right)^{s}} \qquad \text{(A2)}$$
where s defines the slope of the function, which increases during extended training, reflecting practice-related improvement of the discrimination of value. An exponential learning equation is invoked to model this process:

$$s = s_{\max} \left( 1 - e^{-n/c} \right) \qquad \text{(A3)}$$
where smax is the asymptotic sensitivity, n is the ordinal position of the trial block, and c defines the rate at which s approaches smax. In each free-choice trial, p(B) is converted into an 'actual' response using a parameter-free Bernoulli process: a random number between 0 and 1 is generated, and B is chosen if this number is less than p(B), A if it is greater than p(B). The imperfect discrimination of value encapsulated in Eq. (A2) and the uncertainty of response selection modelled by the Bernoulli process generate oscillations of q. Valencia-Torres et al. (2011) found that the oscillations of dB generated by the model resembled the pattern of oscillations seen in rats exposed to the adjusting-delay schedule. The present work applies this model to oscillations of qB in the adjusting-magnitude schedule.

A2 Alternative models of choice in the adjusting-magnitude schedule

Several strands of evidence favour a negatively accelerated relation between reinforcer size and value, in accordance with the economic principle of diminishing marginal utility. The evidence includes response-strength (for review, see de Villiers, 1977), 'specific activation' (Rickard et al., 2009; Bradshaw and Killeen, 2012) and choice-based (Killeen, 1985; Wogar et al., 1992; Forzano and Logue, 1995; Chelonis and Logue, 1996; Mazur and Bondi, 2009) measures of reinforcer value. This principle is captured in MHM by the hyperbolic function q/(q + Q) (Wogar et al., 1992; Ho et al., 1997, 1999). Other models postulate an exponential modulation of reinforcer size, q^c, diminishing marginal utility implying that c < 1 (Myerson and Green, 1995; Aparicio and Baum, 2009; Locey and Dallery, 2009).
It is of interest to examine whether the latter approach can account for the present behavioural data without recourse to the additional 'averaging parameter' adopted in the present modification of MHM.³ The application of this approach (hereafter referred to as the 'exponential model', EM) to the adjusting-magnitude schedule may be summarized as follows. The values of the two components of the complex reinforcer A are qA1^c and qA2^c. Therefore, assuming that the arithmetic mean is the appropriate combination tool, the overall value of A is (qA1^c + qA2^c)/2. At equilibrium, VB = VA; i.e.,

$$q_{B(50)}^{\,c} = \frac{q_{A1}^{\,c} + q_{A2}^{\,c}}{2}, \quad \text{or} \quad q_{B(50)} = \left[ \frac{q_{A1}^{\,c} + q_{A2}^{\,c}}{2} \right]^{1/c} \qquad \text{(A4)}$$

³ The author thanks an anonymous reviewer for this suggestion.
This equation was compared with the formula for qB(50) derived from the modified MHM. According to MHM, the equilibrium state, VB = VA, may be expanded thus:
$$\frac{1}{1 + Q/q_{B(50)}} = V_A, \quad \text{or} \quad q_{B(50)} = \frac{Q \cdot V_A}{1 - V_A} \qquad \text{(A5)}$$

where

$$V_A = \left[ \frac{V_{A1}^{\,a} + V_{A2}^{\,a}}{2} \right]^{1/a}, \qquad V_{A1} = \frac{1}{1 + Q/q_{A1}}, \qquad V_{A2} = \frac{1}{1 + Q/q_{A2}}.$$

To render the two models comparable, the value of Q in Eq. (A5) was fixed at 186 μl, this being the average value of Q derived from 36 rats tested with the reinforcer used in the present experiment (0.6 M sucrose solution: Rickard et al., 2009; Valencia-Torres et al., 2012). Thus, for the purposes of this comparison, each model incorporated a single free parameter: c in the case of EM and a in the case of MHM.

The left-hand graphs of Fig. A1 (panels A and E) show the least-squares fits of the two models to the experimental data presented in Fig. 4. Both models provided a satisfactory account of the indifference magnitudes, qB(50), the values of r² being 0.941 for MHM and 0.946 for EM. The value of a, the 'averaging parameter' of MHM, was 2.51, and the value of c, the exponent in EM, was 1.91. These parameters were used in simulations based on the two models, performed as described in the body of this paper. The results of these simulations are shown in the histograms in Fig. A1 (MHM: panels B–D; EM: panels F–H). The results of the Fourier transforms performed on the behavioural data (see Fig. 7) are shown as the connected points superimposed on each histogram.

It is apparent that neither model perfectly captures the quantitative features of the rats' power spectra. Simulations based on MHM yielded good predictions of the total power (B) and the power within the dominant frequency band (C) but overestimated the period corresponding to the dominant frequency (D), whereas simulations based on EM accurately predicted the period (H) but markedly underestimated the spectral power (F and G). In terms of their performance in describing the behavioural data presented in this paper, there seems to be little to choose between MHM and EM; both give good qualitative accounts of the data but stumble, in different ways, in their quantitative predictions of the oscillation of qB.
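The equilibrium formulas above can be compared numerically. The minimal Python sketch below computes qB(50) under SHM (the arithmetic mean qAm), MHM with a = 1, the modified MHM (Eq. (A5) with the generalized mean) and EM (Eq. (A4)), and then labels each prediction with the risk-sensitivity convention described in the Discussion. It assumes Q = 186 μl, a = 2.51 and c = 1.91 as reported above, and borrows the illustrative magnitudes qA1 = 64 μl and qA2 = 16 μl from the Discussion; because a and c were fitted to the experiment's own magnitudes, applying them to this pair is purely illustrative. The function names and the 0.5 μl tolerance for 'risk-insensitive' are hypothetical.

```python
def mhm_value(q: float, Q: float) -> float:
    """MHM size-value function without delay: V = 1 / (1 + Q/q) = q / (q + Q)."""
    return 1.0 / (1.0 + Q / q)


def generalized_mean(values: list[float], a: float) -> float:
    """Generalized mean with 'averaging parameter' a (a = 1 gives the arithmetic mean)."""
    return (sum(v ** a for v in values) / len(values)) ** (1.0 / a)


def qb50_mhm(q_a: list[float], Q: float, a: float = 1.0) -> float:
    """Eq. (A5): the qB that satisfies 1 / (1 + Q/qB) = VA."""
    v_a = generalized_mean([mhm_value(q, Q) for q in q_a], a)
    return Q * v_a / (1.0 - v_a)


def qb50_em(q_a: list[float], c: float) -> float:
    """Eq. (A4): qB(50) = [(qA1**c + qA2**c) / 2] ** (1/c)."""
    return generalized_mean(q_a, c)


def risk_attitude(qb50: float, q_a: list[float], tol: float = 0.5) -> str:
    """Convention for the adjusting-magnitude schedule: qB(50) above qAm is
    risk-prone, below qAm risk-averse; tol (in microlitres) is an assumption."""
    q_am = sum(q_a) / len(q_a)
    if abs(qb50 - q_am) <= tol:
        return "risk-insensitive"
    return "risk-prone" if qb50 > q_am else "risk-averse"


Q = 186.0            # fixed value of Q used in A2 (microlitres)
q_a = [64.0, 16.0]   # illustrative magnitudes from the Discussion (microlitres)
predictions = {
    "SHM": sum(q_a) / len(q_a),
    "MHM (a = 1)": qb50_mhm(q_a, Q),
    "modified MHM (a = 2.51)": qb50_mhm(q_a, Q, a=2.51),
    "EM (c = 1.91)": qb50_em(q_a, c=1.91),
}
for name, qb in predictions.items():
    print(f"{name:24s} qB(50) = {qb:5.1f} ul  {risk_attitude(qb, q_a)}")
```

Written this way, Eq. (A4) turns out to be the generalized mean applied directly to the magnitudes, whereas the modified MHM applies it to the hyperbolically transformed values; this is the separation of component valuation and combination that the following paragraph uses to argue for MHM.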
In terms of parsimony, EM has the edge, as it computes the overall value of the complex reinforcer, VA, with one parameter, c, whereas the modified MHM uses two: Q and a. Nevertheless, there may be theoretical grounds for favouring MHM over EM. By treating the computation of the values of the individual components of the complex reinforcer and the combination of these component values into the overall value as separate processes, the modified MHM offers a means of accounting for preference for variable over fixed reinforcers in the adjusting-magnitude schedule, and risk-proneness in risky-choice schedules, without sacrificing the empirically supported principle that the value of a food reinforcer is a negatively accelerated function of its size. Because EM funnels both these processes into one exponential parameter, it is obliged to accept qualitatively different size/value relations operating in different situations (e.g. the principle of diminishing marginal utility implies c < 1, and preference for variable reinforcers c > 1). It is difficult to envisage an experimentum crucis that would decisively discriminate between MHM and EM; however, the gradual accumulation of behavioural evidence may eventually tell which approach offers the greater potential for modelling choice behaviour. At this juncture it may be premature to discard either model out of hand.
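The simulation procedure outlined in A1, applied to the adjusting-magnitude schedule with the modified MHM, can be sketched as follows. This is a minimal illustration, not the author's simulation code: it combines the MHM size-value function, the generalized mean for VA, Eq. (A2), Eq. (A3), the Bernoulli response rule and the block-by-block adjustment of qB, but the number of free-choice trials per block, the multiplicative adjustment step, the starting value of qB, the values of smax and the learning-rate constant (called c_learn here to avoid confusion with the EM exponent c) are all hypothetical choices, as are the function names.

```python
import math
import random


def mhm_value(q: float, Q: float) -> float:
    """MHM size-value function without delay: V = 1 / (1 + Q/q)."""
    return 1.0 / (1.0 + Q / q)


def p_choose_b(v_a: float, v_b: float, s: float) -> float:
    """Eq. (A2): logistic psychometric function for choosing B."""
    return 1.0 / (1.0 + (v_a / v_b) ** s)


def simulate(q_a1, q_a2, Q, a, s_max, c_learn, n_blocks=1024,
             free_trials=2, step=0.1, seed=0):
    """Return the value of qB in force on each block (hypothetical settings)."""
    rng = random.Random(seed)
    # Value of the variable outcome A: generalized mean of the two component values.
    v_a = ((mhm_value(q_a1, Q) ** a + mhm_value(q_a2, Q) ** a) / 2) ** (1 / a)
    q_b = (q_a1 + q_a2) / 2                        # hypothetical starting value: qAm
    trace = []
    for n in range(1, n_blocks + 1):
        s = s_max * (1 - math.exp(-n / c_learn))   # Eq. (A3): practice-related slope
        p_b = p_choose_b(v_a, mhm_value(q_b, Q), s)
        # Bernoulli process: B is chosen when the random number is below p(B).
        n_b = sum(rng.random() < p_b for _ in range(free_trials))
        n_a = free_trials - n_b
        trace.append(q_b)
        if n_a > n_b:          # A preferred: increase qB in the next block
            q_b *= 1 + step
        elif n_b > n_a:        # B preferred: reduce qB in the next block
            q_b /= 1 + step
    return trace


trace = simulate(64.0, 16.0, Q=186.0, a=2.51, s_max=3.0, c_learn=100.0)
steady = trace[-256:]
print(f"mean qB over the final 256 blocks: {sum(steady) / len(steady):.1f} ul")
```

A multiplicative step keeps qB positive without explicit bounds; the mean of the final blocks estimates qB(50), and the same trace could be passed to a Fourier analysis of the kind described in A1.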
References
Adriani, W., Boyer, F., Gioiosa, L., Macrì, S., Dreyer, J.L., Laviola, G., 2009. Increased impulsive behavior and risk proneness following lentivirus-mediated dopamine transporter over-expression in rats' nucleus accumbens. Neuroscience 159, 47–58.
Akselrod, S., Gordon, D., Ubel, F.A., Shannon, D.C., Barger, A.C., Cohen, R.C., 1981. Power-spectrum analysis of heart rate fluctuation: a quantitative probe of beat-to-beat cardiovascular control. Science 213, 220–222.
Aparicio, C.F., Baum, W.M., 2009. Dynamics of choice: relative rate and amount affect local preference at three different time scales. J. Exp. Anal. Behav. 91, 293–317.
Bateson, M., Kacelnik, A., 1995. Preferences for fixed and variable food sources: variability in amount and delay. J. Exp. Anal. Behav. 63, 313–329.
Bigger, M., 1973. An investigation by Fourier analysis into the interaction between coffee leaf-miners and their larval parasites. J. Anim. Ecol. 42, 417–434.
Body, S., Bradshaw, C.M., Szabadi, E., 2017. Delayed reinforcement: neuroscience. In: Stein, J. (Ed.), Reference Module in Neuroscience and Biobehavioral Psychology. Elsevier, Amsterdam. http://dx.doi.org/10.1016/B978-0-12-809324-5.02713-9.
Bradshaw, C.M., Killeen, P.R., 2012. A theory of behaviour on progressive ratio schedules, with applications in behavioural pharmacology. Psychopharmacology (Berl.) 222, 549–564.
Caraco, T., Martindale, S., Whittam, T.S., 1980. An empirical demonstration of risk sensitive foraging preferences. Anim. Behav. 28, 820–830.
Cardinal, R.N., Howes, N.J., 2005. Effects of lesions of the nucleus accumbens core on choice between small certain rewards and large uncertain rewards in rats. BMC Neurosci. 6 (37), 1–19.
Chelonis, J.J., Logue, A.W., 1996. Effects of response type on pigeons' sensitivity to variation in reinforcer amount and reinforcer delay. J. Exp. Anal. Behav. 66, 297–309.
Cicerone, R.A., 1976. Preference for mixed versus constant delay of reinforcement. J. Exp. Anal. Behav. 25, 257–261.
Clements, K.C., 1990. Risk-aversion in the foraging blue jay, Cyanocitta cristata. Anim. Behav. 40, 182–195.
Cocker, P.J., Dinelle, K., Kornelson, R., Sossi, V., Winstanley, C.A., 2012. Irrational choice under uncertainty correlates with lower striatal D2/3 receptor binding in rats. J. Neurosci. 32, 15450–15457.
Craft, B.B., Church, A.C., Rohrbach, C.M., Bennett, J.M., 2011. The effects of reward quality on risk-sensitivity in Rattus norvegicus. Behav. Proc. 88, 44–46.
da Costa Araújo, S., Body, S., Hampson, C.L., Langley, R.W., Deakin, J.F.W., Anderson, I.M., Bradshaw, C.M., Szabadi, E., 2009. Effect of lesions of the nucleus accumbens core on inter-temporal choice: further observations with an adjusting-delay schedule. Behav. Brain Res. 202, 272–277.
da Costa Araújo, S., Body, S., Valencia-Torres, L., Olarte-Sánchez, C.M., Bak, V.K., Deakin, J.F.W., Anderson, I.M., Bradshaw, C.M., Szabadi, E., 2010. Choice between reinforcer delays versus choice between reinforcer magnitudes: differential Fos expression in the orbital prefrontal cortex and nucleus accumbens core. Behav. Brain Res. 213, 269–277.
Davison, M.C., 1969. Preference for mixed-interval versus fixed-interval schedules. J. Exp. Anal. Behav. 12, 247–252.
de Villiers, P.A., 1977. Choice in concurrent schedules and a quantitative formulation of the law of effect. In: Honig, K., Staddon, J.E.R. (Eds.), Handbook of Operant Behavior. Prentice Hall, Englewood Cliffs, NJ.
Essock, S.M., Rees, E.P., 1974. Preference for and effects of variable as opposed to fixed reinforcer duration. J. Exp. Anal. Behav. 21, 89–97.
Forzano, L.B., Logue, A.W., 1995. Self-control and impulsiveness in children and adults: effects of food preferences. J. Exp. Anal. Behav. 64, 33–46.
Green, L., Myerson, J., 2004. A discounting framework for choice with delayed and probabilistic rewards. Psychol. Bull. 130, 769–792.
Hastjarjo, X., Silberberg, A., Hursh, S.R., 1990. Risky choice as a function of amount and variance in food supply. J. Exp. Anal. Behav. 53, 155–161.
Hayter, A.J., 1986. The maximum family error rate of Fisher's least significant difference test. J. Am. Stat. Assoc. 81, 1001–1004.
Ho, M.-Y., Wogar, M.A., Bradshaw, C.M., Szabadi, E., 1997. Choice between delayed reinforcers: interaction between delay and deprivation level. Q. J. Exp. Psychol. 50B, 193–202.
Ho, M.-Y., Mobini, S., Chiang, T.-J., Bradshaw, C.M., Szabadi, E., 1999. Theory and method in the quantitative analysis of impulsive choice behaviour: implications for psychopharmacology. Psychopharmacology (Berl.) 146, 362–372.
Kacelnik, A., Bateson, M., 1996. Risky theories – the effects of variance on foraging decisions. Amer. Zool. 36, 402–434.
Kacelnik, A., El-Mouden, C., 2013. Triumphs and trials of the risk paradigm. Anim. Behav. 86, 1117–1129.
Kaminski, B.J., Ator, N.A., 2001. Behavioral and pharmacological variables affecting risky choice in rats. J. Exp. Anal. Behav. 75, 275–297.
Kheramin, S., Body, S., Mobini, S., Ho, M.-Y., Velazquez-Martinez, D.N., Bradshaw, C.M., Szabadi, E., Deakin, J.F.W., Anderson, I.M., 2003. Role of the orbital prefrontal cortex on choice between delayed and uncertain reinforcers: a quantitative analysis. Behav. Proc. 64, 239–250.
Killeen, P.R., 1985. Incentive theory: IV. Magnitude of reward. J. Exp. Anal. Behav. 43, 407–417.
Killeen, P.R., 2011. Models of trace decay, eligibility for reinforcement, and delay of reinforcement gradients, from exponential to hyperboloid. Behav. Proc. 87, 57–63.
Kirkpatrick, K., Marshall, A.T., Smith, A.P., 2015. Mechanisms of individual differences in impulsive and risky choice in rats. Comp. Cog. Behav. Revs. 10, 45–72.
Kirshenbaum, A.P., Szalda-Petree, A.D., Haddad, A.F., 2000. Risk-sensitive foraging in rats: the effects of response-effort and reward-amount manipulations on choice behavior. Behav. Proc. 50, 9–17.
Kirshenbaum, A.P., Szalda-Petree, A.D., Haddad, A.F., 2003. Increased effort requirements and risk sensitivity: a comparison of delay and magnitude manipulations. Behav. Proc. 61, 109–121.
Lakens, D., 2013. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front. Psychol. 4 (863), 1–21.
Larkin, J.D., Jenni, N.L., Floresco, S.B., 2016. Modulation of risk/reward decision making by dopaminergic transmission within the basolateral amygdala. Psychopharmacology (Berl.) 233, 121–136.
Locey, M.L., Dallery, J., 2009. Isolating behavioral mechanisms of inter-temporal choice: nicotine effects on delay discounting and delay sensitivity. J. Exp. Anal. Behav. 91, 213–224.
Mazur, J.E., Bondi, D.R., 2009. Delay-amount tradeoffs in choices by pigeons and rats: Hyperbolic versus exponential discounting. J. Exp. Anal. Behav. 91, 197–211.
Mazur, J.E., Herrnstein, R.J., 1988. On the functions relating delay, reinforcer value, and behavior. Behav. Brain Sci. 11, 690–691.
Mazur, J.E., 1984. Tests of an equivalence rule for fixed and variable reinforcer delays. J. Exp. Psychol.: Anim. Behav. Proc. 10, 426–436.
Mazur, J.E., 1985. Probability and delay of reinforcement as factors in discrete-trial choice. J. Exp. Anal. Behav. 43, 341–351.
Mazur, J.E., 1987. An adjusting procedure for studying delayed reinforcement. In: Commons, M.L., Mazur, J.E., Nevin, J.A., Rachlin, H.C. (Eds.), Quantitative Analyses of Behavior, Vol. 5: The Effects of Delay and Intervening Events on Reinforcement Value. Erlbaum, Mahwah, NJ, pp. 55–73.
Mazur, J.E., 1988. Choice between small certain and large uncertain reinforcers. Anim. Learn. Behav. 16, 199–205.
Mazur, J.E., 2004. Risky choice: selecting between certain and uncertain outcomes. Behav. Analyst Today 5, 190–202.
McClure, J., Podos, J., Richardson, H.N., 2014. Isolating the delay component of impulsive choice in adolescent rats. Front. Integr. Neurosci. 8 (3), 1–9.
Menlove, R.L., Inden, H.M., Madden, E.G., 1979. Preference for fixed over variable access to food. Anim. Learn. Behav. 7, 499–503.
Mobini, S., Chiang, T.-J., Ho, M.-Y., Bradshaw, C.M., Szabadi, E., 2000. Effects of central 5-hydroxytryptamine depletion on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl.) 152, 390–397.
Moschak, T.M., Mitchell, S.H., 2014. Partial inactivation of nucleus accumbens core decreases delay discounting in rats without affecting sensitivity to delay or magnitude. Behav. Brain Res. 268, 159–168.
Myerson, J., Green, L., 1995. Discounting of delayed rewards: models of individual choice. J. Exp. Anal. Behav. 64, 263–272.
Papastamatiou, Y.P., Lowe, C.G., Caselle, J.E., Friedlander, A.M., 2009. Scale-dependent effects of habitat on movements and path structure of reef sharks at a predator-dominated atoll. Ecology 90, 996–1008.
Reboreda, J.C., Kacelnik, A., 1991. Risk sensitivity in starlings: variability in food amount and food delay. Behav. Ecol. 2, 301–308.
Richardson, J.T.E., 2011. Eta squared and partial eta squared as measures of effect size in educational research. Educ. Res. Rev. 6, 135–147.
Rickard, J.F., Zhang, Z.-Q., Body, S., Bradshaw, C.M., Szabadi, E., 2009. Effect of reinforcer magnitude on performance maintained by progressive-ratio schedules. J. Exp. Anal. Behav. 91, 75–87.
Rider, D.P., 1983. Preference for mixed versus constant delays of reinforcement: effect of probability of the short, mixed delay. J. Exp. Anal. Behav. 39, 257–266.
Schultz, W., 2015. Neuronal reward and decision signals: from theories to data. Physiol. Rev. 95, 853–951.
St Onge, J.R., Floresco, S.B., 2009. Dopaminergic modulation of risk-based decision making. Neuropsychopharmacology 34, 681–697.
Staddon, J.E.R., Innis, N.K., 1966. Preference for fixed vs. variable amounts of reward. Psychon. Sci. 4, 193–194.
Stephens, D.W., 1981. The logic of risk-sensitive foraging preferences. Anim. Behav. 29, 628–629.
Sugam, J.A., Day, J.J., Wightman, R.M., Carelli, R.M., 2012. Phasic nucleus accumbens dopamine encodes risk-based decision-making behavior. Biol. Psychiatry 71, 199–205.
Valencia-Torres, L., da Costa Araújo, S., Olarte-Sánchez, C.M., Body, S., Bradshaw, C.M., Szabadi, E., 2011. Transitional and steady-state choice behavior under an adjusting-delay schedule. J. Exp. Anal. Behav. 95, 57–74.
Valencia-Torres, L., Olarte-Sánchez, C.M., da Costa Araújo, S., Body, S., Bradshaw, C.M., Szabadi, E., 2012. Nucleus accumbens and delay discounting in rats: evidence from a new quantitative protocol for analysing inter-temporal choice. Psychopharmacology (Berl.) 219, 271–283.
Valencia-Torres, L., Olarte-Sánchez, C.M., Body, S., Bradshaw, C.M., Szabadi, E., 2013. Investigations of the neurobiological bases of inter-temporal choice. Eur. J. Behav. Anal. 14, 215–239.
Wogar, M.A., Bradshaw, C.M., Szabadi, E., 1992. Choice between delayed reinforcers in an adjusting-delay schedule: the effects of absolute reinforcer size and deprivation level. Q. J. Exp. Psychol. 45B, 1–13.
Wunderle, J.M., O’Brien, T.G., 1985. Risk aversion in hand-reared bananaquits. Behav. Ecol. Sociobiol. 17, 371–380.
Young, J.S., 1981. Discrete-trial choice in pigeons: effects of reinforcer magnitude. J. Exp. Anal. Behav. 35, 23–29.