Learning and Motivation 53 (2016) 24–35
Learning and Motivation journal homepage: www.elsevier.com/locate/l&m
Reinforcer distributions affect timing in the free-operant psychophysical choice procedure☆
Sarah Cowie a,∗, Lewis A. Bizo b, K. Geoffrey White c
a University of Auckland, New Zealand
b University of New England, Australia
c University of Otago, New Zealand
Article info
Article history: Received 28 July 2015; Received in revised form 16 October 2015; Accepted 16 October 2015
Keywords: Timing; Reinforcers; Free-operant psychophysical choice; Pigeon
Abstract
In procedures used to study timing behavior, the availability of reinforcement changes according to time since an event. Manipulation of this reinforcer differential often produces violations of scalar timing, but it is unclear whether such effects arise because of a response bias or a change in temporal discrimination. The present experiment investigated the effects of the overall and relative probability of obtaining a reinforcer on performance in the free-operant psychophysical procedure. We arranged short and long trials with unequal reinforcer ratios, at high or low overall reinforcer rates. Changes in the overall reinforcer rate produced only small changes in timing behavior. Changes in relative reinforcer probability, which caused differences in the likely availability of reinforcers across time within a trial, produced a change in both bias and discrimination. We suggest reinforcers affect timing, and that discrimination in timing tasks depends on the distribution of reinforcers in time, as well as on the interval to be timed.
© 2015 Elsevier Inc. All rights reserved.
1. Introduction
The effect of reinforcers on performance in interval timing tasks appears to extend beyond the reinforcer’s function as a marker event. While these effects are evident in a range of different temporal-discrimination procedures (e.g., Bizo & White, 1994a,b, 1995; Doughty & Richards, 2002; Galtress & Kirkpatrick, 2009), they are particularly salient in the free-operant psychophysical procedure (FOPP; Stubbs, 1980) for studying immediate timing. In the FOPP, a two-key concurrent variable-interval (VI) extinction (EXT) schedule operates for the first half of each trial (with VI on the left and EXT on the right), reversing to a concurrent EXT VI schedule for the second half of each trial (with EXT on the left and VI on the right), as illustrated in Fig. 1. There is no exteroceptive discriminative stimulus to signal the transition from first to second halves of a trial. The likely availability of reinforcement in the first versus second half of a trial is therefore signaled by time since the beginning of the trial. Unlike procedures in which the likely availability of a reinforcer is signaled by a specific time interval, as in the peak procedure (e.g., Beam, Killeen, Bizo, & Fetterman, 1998), timing in the FOPP involves discrimination of when reinforcers are likely to be obtained for each of two responses. Thus, the FOPP permits the investigation of how both relative and absolute properties of reinforcers affect temporal-discrimination performance, independent of variations in the duration to be timed.
☆ Parts of this research were presented at the annual conference of the ABAI (Seattle, WA, May 2012).
∗ Corresponding author at: Department of Psychology, University of Auckland, Private Bag 92019 Auckland, New Zealand.
E-mail address: [email protected] (S. Cowie).
http://dx.doi.org/10.1016/j.lmot.2015.10.003
0023-9690/© 2015 Elsevier Inc. All rights reserved.
S. Cowie et al. / Learning and Motivation 53 (2016) 24–35
The psychometric function generated from responding on the FOPP relates responses made on one alternative (the right key in the present experiment) as a proportion of responses made on both alternatives (left and right keys) to time elapsed since trial onset (e.g., Bizo & White, 1994a, 1994b; Stubbs, 1980). In the procedure illustrated in Fig. 1, the pigeon typically responds on the left key at the beginning of the trial, and so the proportion of right responses is close to zero. As the trial continues, responding increasingly shifts to the right key and the proportion of right responses shows a sigmoidal increase as a function of time in the trial. Psychometric functions may differ in slope, or in their positioning along the time-since-trial-onset axis. A change in the slope of the function implies a change in accuracy of timing. A shift in the positioning of the function could imply a preference for left- versus right-response keys independently of a change in timing accuracy; that is, a response bias. It could also imply that the transition between reinforcement probabilities in the first and second halves of a trial is discriminated as occurring too soon or too late; that is, a bias towards right-key responding sooner or later. In the absence of changes in the duration to be timed, differences in the psychometric function may be produced by variations in overall reinforcer rate (Bizo & White, 1994a, 1994b), as well as by changes in the ratio of reinforcers for responding at different halves of a trial (Bizo & White, 1995). Increases in the overall reinforcer rate appear to produce steeper psychometric functions, suggesting that overall reinforcer rate affects timing.
In contrast, variations in the relative reinforcer rate appear to produce shifts in the positioning of the function consistent with a bias toward the response alternative associated with the relatively higher probability of reinforcement, as in standard concurrent choice procedures (e.g., Davison & McCarthy, 1987; Raslear, 1985; Stubbs, 1968). Such a bias is time-dependent, but independent of a change in discrimination. Similar shifts may also be achieved by varying the reinforcer rate in each quarter of a trial, but holding the overall reinforcer rate in the first- and second-half equal (Machado & Guilhardi, 2000). Thus, reinforcers may affect both the timing and decision-making component of temporal-discrimination tasks. The exact process underlying these effects of reinforcers remains unclear.
Theories of timing attribute these effects of reinforcers on temporal discrimination performance to a variety of mechanisms. The Behavioral Theory of Timing (BeT; Killeen & Fetterman, 1988; Fetterman & Killeen, 1991) assumes that the speed of a hypothetical pacemaker varies directly with overall reinforcer density. Thus, BeT predicts that increases in the overall reinforcer rate produce a change in timing, such that subjective time passes more quickly. Although differences in the relative reinforcer rate do not alter the overall reinforcer density, pacemaker period may be made sensitive to the effects of relative reinforcer rate by assuming that unequal relative rates bias the pacemaker (see Bizo & White, 1995). However, this relation between pacemaker period and relative rate cannot account for changes in psychometric functions that arise when rates differ across quarters, but not halves, of a FOPP trial (e.g., Machado & Guilhardi, 2000). Learning to Time (LeT; Machado, 1997), a derivative of BeT, also predicts that the overall reinforcer rate controls the speed at which the animal moves through subjective time.
According to LeT, a pacemaker controls the speed with which different states become active, and reinforcers obtained while a state is active increase the probability of that response being emitted while the state is active in the future. LeT thus interprets the effects of unequal reinforcer distributions across halves or quarters of a trial as the result of a bias toward emitting a particular response at a given time.
[Figure 1 schematic. First half of trial: left key VI, right key EXT. Second half of trial: left key EXT, right key VI. Each trial is followed by an inter-trial interval.]
Fig. 1. Diagram of the free-operant psychophysical procedure. The concurrent VI EXT schedule on left and right keys in the first half of the trial is reversed for the second half of the trial. Trial halves are discriminated by time since trial onset and not by an exteroceptive stimulus.
Some models of timing assume that discrimination of elapsed time is a process that is independent of the discriminated contingencies of reinforcement, and that reinforcers affect temporal-discrimination performance only because they cause a bias. The Behavioral Economic Model (BEM; Jozefowiez, Cerutti, & Staddon, 2009) predicts that the momentary probability of obtaining a reinforcer for a given response affects the decision criterion for switching from left-key to right-key responding in the FOPP, and therefore that all reinforcer-related effects on timing behavior result from a bias. While this approach predicts the effects of the relative reinforcer rate on the psychometric function observed by Bizo and White (1995) and Machado and Guilhardi (2000), it has difficulty accounting for the effects of manipulations which do not alter the momentary probability of obtaining a reinforcer (e.g., reinforcer magnitude—see Bizo & White, 1994a, 1994b). Thus, although some of the effects of reinforcers on responding arise independently of a change in timing, not all of them can be attributed to a response bias. LeT, BeT and BEM all assume that timing conforms to Weber’s law—that is, the error in duration discrimination should increase in proportion to the length of the duration (the scalar property; Gibbon, 1977). While this assumption has been shown to hold in a variety of procedures and across a variety of subjects, violations are occasionally observed (see Lejeune & Wearden, 2006 for a review). For example, Bizo and White (1997) showed that the relative mean and standard deviation of psychometric functions differed for short and long FOPP trials. While violations of scalar timing are uncommon, their apparently unpredictable occurrence indicates that our understanding of the processes mediating timing is incomplete. 
Accurate modeling of the processes mediating timing requires separation of the different mechanisms underlying the effects of reinforcers on temporal-discrimination performance. While substantial research has focused on the ability of different models of timing to account for variations in the contingencies of reinforcement (e.g., Bizo & White, 1994a,b; Machado & Guilhardi, 2000; Guilhardi, MacInnis, Church, & Machado, 2007), attempts to determine whether these variations result from a change in timing, a bias independent of timing, or a combination of these factors, are far less common. The present experiment therefore examined the extent to which bias and discrimination are affected by the relative and absolute probability of a reinforcer in the FOPP, in both short and long trials, in order to assess the theoretical assumptions that underlie our current understanding of timing. The influence of the absolute probability of obtaining a reinforcer was investigated by arranging two sets of conditions: Rich conditions, with a higher overall rate of reinforcement, and Lean conditions, with a lower overall rate of reinforcement. Both BeT and LeT predict that increasing the overall reinforcer rate should increase the speed of the pacemaker and decrease its variance, producing more accurate timing, and steeper psychometric functions. Because the overall reinforcer rate does not alter the momentary probability of a reinforcer for one response relative to the other, BEM predicts invariance in the psychometric functions across differing overall reinforcer rates. Within each set of Rich and Lean conditions, the arranged ratio of first- to second-half reinforcers across conditions was varied over three values: 5:1, 1:1 and 1:5. This allowed for investigation of the effect on timing behavior of the probability of obtaining a reinforcer for one response relative to the other. 
All three models predict that these manipulations would cause a bias, evident as a change in the point of subjective equality, but not a change in discrimination. To test whether timing behavior is affected by manipulations that do not alter the probability of reinforcement for a response, differentially-signaled short and long trials were arranged within a single session. All three models assume the scalar property of timing, and hence predict that psychometric functions plotted in terms of the proportion of trial duration would superpose. We therefore asked whether superposition of psychometric functions for short and long trials was influenced by overall or relative reinforcer rates.
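As a rough illustration of the superposition prediction, the sketch below assumes a logistic psychometric function whose mean and spread, in seconds, are both proportional to trial duration (the scalar property). The function name `p_right`, the logistic form, and the values 0.5 and 0.07 are illustrative assumptions, not quantities taken from this study.

```python
import math

def p_right(t, duration, mean_prop=0.5, sd_prop=0.07):
    """Hypothetical probability of a right-key response at time t (s).

    Scalar-timing assumption: the mean and the spread of the function,
    in seconds, are both proportional to the trial duration.
    """
    mean_s = mean_prop * duration                           # switch point (s)
    scale_s = sd_prop * duration * math.sqrt(3) / math.pi   # logistic scale from SD
    return 1.0 / (1.0 + math.exp(-(t - mean_s) / scale_s))

# Evaluate both trial durations at the same proportions of the trial.
proportions = [i / 10 for i in range(11)]
short = [p_right(p * 30.0, 30.0) for p in proportions]
long_ = [p_right(p * 100.0, 100.0) for p in proportions]

# Under the scalar property the two curves coincide on the relative time axis.
max_gap = max(abs(a - b) for a, b in zip(short, long_))
print(f"largest difference between 30-s and 100-s curves: {max_gap:.2e}")
```

Any systematic departure from this overlap, when obtained functions are plotted against proportion of trial duration, would count against the scalar assumption shared by the three models.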
2. Materials and methods
2.1. Subjects
Subjects were five locally-supplied experimentally naïve adult homing pigeons (Columba livia) labelled A1 to A5. The birds were only included in an experimental session if their weight immediately before an experimental session fell within 80% ± 10 g of their free-feeding weight. Supplementary feed, which consisted of a mixture of corn, maple peas, and wheat, was given after all experimental sessions had finished. The birds were housed individually and had free access to water and grit when not in the experimental chamber.
2.2. Apparatus
An interface panel was contained inside a sound-attenuating experimental chamber that was 290 mm high, 320 mm wide, and 350 mm deep. The interior of the chamber and the interface panel were painted matt black. Masking noise was provided by a ventilation fan that was mounted on the wall of the chamber opposite the interface panel. There were two Plexiglass response keys 29 mm in diameter mounted on the interface panel 220 mm above the chamber floor and 100 mm apart. Each effective key peck required a force of approximately 0.2 N to close a reed relay switch that was mounted behind the key, and resulted in a 50-ms blackout of the key. A central hopper opening provided 3-s access to wheat, and when wheat was available the white light in the ceiling of the hopper was illuminated. Experimental events were recorded and controlled by a PC computer, MED-PC 2.0 software, and interfacing in an adjacent room.
Table 1
List of conditions, relative reinforcer ratio, arranged VI schedules in each half of a trial, and the number of experimental sessions in each condition.

Condition   Ratio   First half (left key)   Second half (right key)   Sessions
Rich
1           1:1     VI 30 s                 VI 30 s                   24
2           5:1     VI 18 s                 VI 90 s                   33 (A1), 32 (A2–A5)
3           1:5     VI 90 s                 VI 18 s                   36
Lean
4           1:1     VI 180 s                VI 180 s                  36
5           5:1     VI 54 s                 VI 270 s                  52
6           1:5     VI 270 s                VI 54 s                   24

Note: the ‘Sessions’ column shows the number of sessions per condition. For all conditions except Condition 2, all birds experienced the same number of sessions per condition; hence only one number is listed in this column. Condition 2 ran for 33 sessions for Pigeon A1, and 32 for the other pigeons (Pigeons A2–A5).
2.3. Procedure
The pigeons were introduced to the main procedure following initial autoshaping to peck side keys and preliminary training for at least a month in the main procedure. Experimental sessions were scheduled for seven days a week. Each session was preceded and followed by a short blackout period of about 3 min, during which responding was not recorded, and began with the illumination of the two side keys. Trials lasted either 30 s or 100 s, with the two trial durations alternating across trials within the session. At the beginning of each trial, the left and right keys were illuminated, with the color of both keys (red or green) depending on trial duration (30 s or 100 s). Left and right keys were the same color throughout the trial, and the trial durations and associated colors alternated directly across the session. During the first half of a trial, the left key arranged reinforcer deliveries according to a constant-probability VI schedule (Fleshler & Hoffman, 1962), and the right key was in extinction. At the midpoint of the trial, the key-location of the VI and extinction schedule reversed, such that right-key reinforcers were scheduled according to a VI schedule, and the left key was in extinction for the remainder of the trial (Fig. 1). The VI schedules in the two halves of the trial were varied across conditions in both the ratio and absolute values of their intervals. In Conditions 1–3 (Rich), reinforcers could be obtained at three times the rate of those in Conditions 4–6 (Lean). In Conditions 1 and 4, the ratio of scheduled reinforcers for left-key and right-key responding in first and second halves of the trial was 1.0. In all other conditions, the reinforcer ratio (first:second) was either 1:5 or 5:1 (Table 1). The VI schedules ran independently, so that the VI schedule in the second half of the trial ran even when a reinforcer had been arranged but not obtained in the first half of the trial.
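A constant-probability VI schedule of the kind cited above is commonly generated from the Fleshler and Hoffman (1962) progression. The sketch below is a minimal illustration of that progression; the number of intervals (`n_intervals = 12`), the function names, and the shuffling step are assumptions, since the text does not report how the interval lists were constructed.

```python
import math
import random

def _xlogx(x):
    """x * ln(x), using the limiting value 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def fleshler_hoffman(mean_s, n_intervals=12):
    """Return n_intervals VI intervals (s) whose arithmetic mean is mean_s."""
    n = n_intervals
    return [
        mean_s * (1.0 + math.log(n) + _xlogx(n - k) - _xlogx(n - k + 1))
        for k in range(1, n + 1)
    ]

def vi_schedule(mean_s, n_intervals=12, seed=0):
    """The same intervals in a random order (hypothetical sampling scheme)."""
    intervals = fleshler_hoffman(mean_s, n_intervals)
    random.Random(seed).shuffle(intervals)
    return intervals

# Rich 5:1 condition from Table 1: VI 18 s (first half), VI 90 s (second half).
first_half = fleshler_hoffman(18.0)
second_half = fleshler_hoffman(90.0)
print(round(sum(first_half) / len(first_half), 6))    # mean of the VI 18-s list
print(round(sum(second_half) / len(second_half), 6))  # mean of the VI 90-s list
```

The progression yields many short intervals and a few long ones, which keeps the probability of reinforcer set-up roughly constant over time since the last reinforcer.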
If a reinforcer was arranged but not obtained in the first half of a trial, it was made available following the first response in the first half of the trial on the subsequent trial of the same trial duration. If a reinforcer was arranged but not obtained in the second half of a trial, it was made available following the first response in the second half of the trial on the subsequent trial of the same trial duration. This method of dealing with unobtained reinforcers is a standard procedure for multiple schedules (White, 1990) and was applied in the present procedure to the VI schedules in each half of the trial, separately for each trial duration.
2.4. Data analysis
Pecks to the left and right keys were recorded in 2.5-s bins and were summed across the last seven sessions of each condition. Individual data generally followed the same pattern, and were thus well represented by data averaged across all pigeons. Because individual reinforcers may function as time markers in the FOPP (e.g., Bizo & White, 1994a, 1994b), data used in the following analyses were taken from trials in which no reinforcers were obtained. In these trials, responding should be influenced by the general distribution of reinforcers across time, but not by the discriminative effects of individual reinforcers. Psychometric functions were plotted by taking right-key responses as a proportion of total left- and right-key responses in each 2.5-s bin. Differences in the psychometric functions in terms of their slope and position were examined. A change in the slope of the function would suggest a change in discrimination, whereas a shift in the function as a whole indicates an effect that may be independent of timing. In order to calculate measures of the position and slope of the psychometric functions, logistic approximations to cumulative normal distributions (sigmoidal functions) were fitted to the proportion data for each condition and for each pigeon.
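The construction of a psychometric function from binned pecks can be sketched as follows. The peck times are invented for illustration and the helper name `proportion_right` is hypothetical; only the 2.5-s bin width and the right/(left + right) ratio come from the text.

```python
def proportion_right(left_times, right_times, trial_s, bin_s=2.5):
    """Right-key pecks as a proportion of all pecks in each time bin.

    left_times / right_times: peck times (s since trial onset), pooled over
    the unreinforced trials being analysed. Bins with no pecks yield None.
    """
    n_bins = int(round(trial_s / bin_s))
    left = [0] * n_bins
    right = [0] * n_bins
    for counts, times in ((left, left_times), (right, right_times)):
        for t in times:
            idx = min(int(t // bin_s), n_bins - 1)  # clamp t == trial_s into last bin
            counts[idx] += 1
    return [
        r / (l + r) if (l + r) > 0 else None
        for l, r in zip(left, right)
    ]

# Made-up peck times for a 10-s stretch, purely to show the calculation.
left_pecks = [0.4, 1.1, 1.9, 3.0, 4.2, 6.1]
right_pecks = [4.8, 5.5, 6.3, 7.0, 8.2, 9.4]
result = proportion_right(left_pecks, right_pecks, trial_s=10.0)
print(result)
```

With these invented data the proportion rises from 0 in the first bin to 1 in the last, which is the general shape expected when responding shifts from the left to the right key.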
The sigmoidal functions had four free parameters, representing the minimum, maximum of the function, the slope, and the point at which choice was half way between its minimum and maximum. The minimum and maximum were judged unimportant and are not reported here, as differences in timing and bias were better reflected in the mean and standard deviation of the functions. The important parameters were the mean and standard deviation of the fitted functions, which we used to provide close approximations to the mean and standard deviations of the psychometric functions, without being influenced by changes in their minima or maxima. The mean and standard deviation were measured in terms of proportion of trial duration (ranging from 0 to 1.0). The mean provides a measure of the position of the psychometric function along the x axis. With an increasing bias towards responding on the right, the psychometric function moves to the left and the mean becomes
[Figure 2: three panels, left to right: VI 18 s VI 90 s (SD = 0.097, VAC = 97.7%); VI 30 s VI 30 s (SD = 0.067, VAC = 98.2%); VI 90 s VI 18 s (SD = 0.035, VAC = 98.8%). Y axis: proportion of right responses; x axis: proportion of trial. Legend: obtained, predicted, mean.]
Fig. 2. Psychometric functions (filled circles) and fits of the sigmoidal function to Pigeon A2’s data from Rich 30-s trials as a function of the proportion of time in a trial. The location of the mean of the fitted function is shown as a dashed vertical line, and the standard deviation (SD) and variance accounted for by each fitted function (VAC) are shown in each panel.
smaller. The standard deviation is the reciprocal of the slope of the psychometric function and provides a measure of timing accuracy. To illustrate how the fitted functions relate to obtained choice, Fig. 2 shows the proportion of responses to the right key made by Pigeon A2, and the functions fitted to these data, in each of the three Rich conditions for 30-s trials. These conditions were selected because they yielded noticeably different slopes and means, and thus illustrate clearly the relation between the parameters estimated from the fits and the features of the obtained data. From Fig. 2, it is apparent that the fitted functions provided an accurate description of obtained response proportions. Indeed, the fits were excellent in each case and on average over all pigeons and conditions, accounted for 91.7% of the variance in each psychometric function. Fig. 2 shows that the obtained psychometric functions were flattest in the condition arranging a 5:1 reinforcer ratio (VI 18 s VI 90 s), and steepest in the condition arranging a 1:5 ratio (VI 90 s VI 18 s). These differences are reflected in the standard deviation parameter from the fits (SD); Fig. 2 shows that the standard deviation was largest in the condition arranging a 5:1 reinforcer ratio, and smallest in the condition arranging a 1:5 ratio. The smaller the standard deviation, the steeper the slope, and the more abrupt the transition in responding from the left to the right key. Because the change in the reinforcer differential was abrupt (at the trial midpoint), a smaller standard deviation implies more accurate discrimination. In Fig. 2, the mean of the fitted function is plotted as a dashed vertical line. This value is a proportion of the total trial time, and measures the tendency or bias to respond on the right sooner or later in the trial. 
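One way to write the four-parameter sigmoid and the variance-accounted-for (VAC) statistic is sketched below. The exact parameterisation used for the fits is not given in the text, so the logistic form, the conversion between logistic scale and standard deviation, the use of bin midpoints, and the example parameter values are all assumptions.

```python
import math

def sigmoid(x, lo, hi, mean, sd):
    """Four-parameter logistic approximation to a cumulative normal.

    x, mean and sd are in proportion-of-trial units; lo and hi are the
    lower and upper asymptotes. The logistic scale is set to sd*sqrt(3)/pi
    so that the underlying logistic distribution has standard deviation sd.
    """
    scale = sd * math.sqrt(3) / math.pi
    return lo + (hi - lo) / (1.0 + math.exp(-(x - mean) / scale))

def variance_accounted_for(obtained, predicted):
    """VAC = 1 - SS_residual / SS_total, expressed as a proportion."""
    grand_mean = sum(obtained) / len(obtained)
    ss_total = sum((y - grand_mean) ** 2 for y in obtained)
    ss_resid = sum((y - p) ** 2 for y, p in zip(obtained, predicted))
    return 1.0 - ss_resid / ss_total

# Twelve 2.5-s bins of a 30-s trial, as bin midpoints divided by trial duration.
xs = [(i + 0.5) * 2.5 / 30.0 for i in range(12)]
params = dict(lo=0.02, hi=0.95, mean=0.45, sd=0.07)  # illustrative values only
predicted = [sigmoid(x, **params) for x in xs]

# At x == mean the function sits halfway between its two asymptotes.
print(sigmoid(0.45, **params))
```

In practice the four parameters would be estimated by a least-squares routine; a smaller fitted `sd` corresponds to a steeper function, and a smaller `mean` to an earlier shift toward the right key.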
In the VI 18 s VI 90 s condition, in which there was a greater tendency to respond on the right much later in the trial, the mean of the psychometric function is later in the trial, compared to the VI 90 s VI 18 s condition in which the mean is earlier in the trial and there is a strong tendency to respond on the right earlier in the trial. Because each pigeon experienced each of the possible conditions, repeated measures analyses of variance were conducted on proportion right-key response data and on mean and standard deviation parameters from the fitted functions.
3. Results
3.1. Overall reinforcer rate
Fig. 3 plots psychometric functions from Rich and Lean conditions separately for short and long trials. Similar patterns of responding were evident in both Rich and Lean conditions, and in both short and long trials: The proportion of right-key responses increased progressively as a function of trial time, for 30-s trial durations, F(11, 44) = 195.88, p < .001, MSE = .012, ηp² = .98, and for 100-s trial durations (with time in 5-s bins), F(19, 76) = 141.87, p < .001, MSE = .01, ηp² = .97. Psychometric functions from Rich and Lean conditions arranging the same relative reinforcer rates did not perfectly superpose, but fell close to one another. Repeated-measures ANOVAs showed that there were no significant effects of the overall rich/lean reinforcer rate on proportion of right responses for either 30-s trials, F < 1, or 100-s trials, F < 1, nor were there significant interactions between Rich/Lean and Time, both Fs < 1. Additionally, overall reinforcer rate did not have statistically significant effects on the mean of the psychometric functions, F(1, 4) = 2.70, p = .176, or standard deviation, F(1, 4) = 7.46, p = .052. The
[Figure 3: two panels (30-s trials, top; 100-s trials, bottom). Y axis: proportion of right responses; x axis: proportion of trial. Legend: VI 90 s VI 18 s, VI 30 s VI 30 s, VI 18 s VI 90 s, VI 270 s VI 54 s, VI 180 s VI 180 s, VI 54 s VI 270 s.]
Fig. 3. Psychometric functions for rich (unfilled symbols) and lean (filled symbols) reinforcer-rate conditions, plotted separately for 30-s (top panel) and 100-s (bottom panel) trials as a function of the proportion of time in a trial.
effect of overall reinforcer rate on the psychometric functions is therefore consistent with predictions made by BEM, but not those made by BeT and LeT.
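The effect-size values printed after each MSE in this section appear to be partial eta squared. Assuming so, they can be recovered from each F ratio and its degrees of freedom, as the short check below shows; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios reported for the effect of time within a trial (Section 3.1).
eta_30 = round(partial_eta_squared(195.88, 11, 44), 2)   # 30-s trials
eta_100 = round(partial_eta_squared(141.87, 19, 76), 2)  # 100-s trials
print(eta_30)   # 0.98
print(eta_100)  # 0.97
```

Both values match those reported alongside the corresponding F ratios.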
3.2. Relative reinforcer rate
When the reinforcer rate in the second half of a trial was lower than that in the first half of the trial, the psychometric function appeared considerably slower to change, and reached a lower maximum, than did the psychometric function from trials arranging equal rates of reinforcement, or from trials where reinforcer rates favored responding in the second half of a trial. Thus, unequal relative reinforcer rates displaced psychometric functions toward the relatively-richer response alternative (a shift in the maximum height of the function), and toward the times at which this response alternative was active. These conclusions were supported by significant main effects of reinforcer ratio for 30-s trials, F(2, 8) = 282.94, p < .001, MSE = .027, ηp² = .99, and for 100-s trials, F(2, 8) = 96.95, p < .001, MSE = .132, ηp² = .96, and also by significant interactions between reinforcer ratio and time for 30-s trials, F(22, 88) = 19.63, p < .001, MSE = .008, ηp² = .83, and for 100-s trials, F(38, 152) = 5.07, p < .001, MSE = .013, ηp² = .56. In order to assess whether these differences in the psychometric function resulted solely from a bias, or whether differences in the relative reinforcer rate caused a change in timing, the mean and standard deviation for each condition are plotted in Fig. 4. As the reinforcer ratio changed from favoring the first half to favoring the second half of the trial, means became significantly smaller, F(2, 8) = 61.17, p < .001, MSE = .001, ηp² = .94, and standard deviations became generally smaller, F(2, 8) = 4.80, p = .043, MSE = .0008, ηp² = .55. Standard deviations were influenced by a significant interaction between trial duration and reinforcer ratio, F(2, 8) = 8.33, p = .011, ηp² = .68. The main contribution to this interaction was a smaller standard deviation in the VI 18 s VI 90 s and VI 54 s VI 270 s conditions for 100-s trials than for 30-s trials.
This interaction can be seen in Fig. 5 for these reinforcer-ratio conditions, where the psychometric functions for 100-s trials are steeper than for 30-s trials. The bias in responding toward the richer alternative is correctly predicted by BeT, LeT and BEM, but the change in slope, indicative of a change in timing, is not predicted by any of the models.
[Figure 4: two panels (mean, top; standard deviation, bottom). X axis: the VI schedule pair arranged in each reinforcer-ratio condition. Legend: 100-s trials, 30-s trials.]
Fig. 4. Means and standard deviations of psychometric functions for 30-s and 100-s trials, as a function of relative reinforcer ratio. Means and standard deviations were measured in terms of proportion of time in the trial, and were estimated by the parameters of best fitting sigmoidal functions.
3.3. Trial duration
Fig. 4 shows that the means and standard deviations were larger for short trials than for long trials. This conclusion was supported by a repeated-measures ANOVA which showed that both measures were influenced by trial duration, with larger means, F(1, 4) = 197.10, p ≤ .001, MSE = .002, ηp² = .98, and larger standard deviations, F(1, 4) = 16.40, p = .015, MSE = .0008, ηp² = .80, for short trials than for long trials. Because the standard deviation was measured in terms of proportion of trial duration, rather than seconds, these differences indicate that the effect of trial duration on timing behavior was driven by a change in discrimination. The means of the psychometric functions (Fig. 4) were also influenced by a significant interaction between reinforcer ratio and trial duration, reflecting a larger effect of trial duration when the reinforcer ratio favored the first half of the trial than when it favored the second half, F(2, 8) = 8.33, p = .011, MSE = .0003, ηp² = .68. Thus, while trial duration and relative reinforcer rate both affected temporal discrimination, these effects depended to some extent on the time within a trial at which more reinforcers are arranged. These effects of trial duration on temporal discrimination are not predicted by BeT, LeT or BEM. To illustrate the effects of manipulations of trial duration on the psychometric functions, Fig. 5 plots the same data as in Fig. 3, with functions grouped according to the relative reinforcer rate. Psychometric functions from long 100-s trials began to change after a smaller proportion of the trial had elapsed than did functions from short 30-s trials; this difference appeared larger in conditions arranging more reinforcers in the second than the first half of a trial than in conditions arranging more reinforcers in the first than in the second half of a trial. When more reinforcers were arranged in the first half of a
[Figure 5: six panels, one per reinforcer-ratio condition: VI 180 s - VI 180 s, VI 30 s - VI 30 s, VI 90 s - VI 18 s, VI 18 s - VI 90 s, VI 270 s - VI 54 s, VI 54 s - VI 270 s. Y axis: proportion of right responses; x axis: proportion of trial. Legend: 100-s trials, 30-s trials.]
Fig. 5. Psychometric functions for 30-s and 100-s trials plotted separately for each reinforcer-ratio condition as a function of proportion of time in a trial.
trial, however, psychometric functions from long trials reached a lower maximum than those from short trials. These same effects of trial duration were evident, although to a lesser extent, in psychometric functions from conditions arranging equal numbers of reinforcers in each half of a trial.
3.4. Obtained reinforcers
Fig. 6 shows the proportion of total reinforcers obtained in each half of the trial, in each time bin, during short and long trials. For the purposes of comparison, these reinforcer proportions are plotted as a function of the proportion of trial time. Because reinforcers were obtained across a greater number of time bins in long trials than in short trials, proportions are higher in short trials than in long trials. However, it is the pattern of change, rather than the absolute value, of the proportions that is important. In both short and long trials, obtained reinforcer proportions followed a similar pattern. In almost all conditions, a greater proportion of reinforcers obtained in the first half of a trial were obtained at the beginning of the trial, during the first time bin; this difference was more noticeable in short trials than in long trials, and in conditions arranging a lower reinforcer rate in the first half of a trial than in the second. The pattern of change in the reinforcers obtained in the second half of a trial did not vary systematically with reinforcer rate, but where reinforcer proportions increased across time in the second half of a trial, the increases were generally larger and more systematic in short trials
[Figure 6: six panels, one per reinforcer-ratio condition: VI 180 s VI 180 s, VI 30 s VI 30 s, VI 90 s VI 18 s, VI 18 s VI 90 s, VI 270 s VI 54 s, VI 54 s VI 270 s. Y axis: proportion of reinforcers obtained; x axis: proportion of trial. Legend: 30-s trials, 100-s trials.]
Fig. 6. Proportion of total first- and second-half reinforcers obtained in each time bin for 30-s and 100-s trials of each reinforcer-ratio condition as a function of proportion of time in a trial.
than in long trials. Thus, reinforcer proportions plotted as a function of trial time did not perfectly superpose, particularly when more reinforcers were obtained in the first half of a trial relative to the second. These differences in the obtained reinforcers (Fig. 6) were the result of the dynamical interaction between choice and obtained reinforcers (see also Cowie, Elliffe, & Davison, 2013), and not of a programming decision: A reinforcer arranged in the first half of a trial, but not obtained, was made available immediately at the beginning of the subsequent trial (a standard procedure for multiple schedules, White, 1990). Because choice shifted away from the key active in the first half of a trial shortly before the midpoint of the trial, reinforcers arranged at later times in the first half of a trial were progressively less likely to be obtained, and therefore more likely to be reallocated to the beginning of the subsequent trial. Because responding at the beginning of a trial was generally exclusive to the key active in the first half of a trial, these reallocated reinforcers were highly likely to be obtained at the beginning of a trial, thus creating a spike in the local obtained reinforcer proportion at the beginning of a trial. Generally, the earlier the shift in preference away from the key active in the first half of a trial (Fig. 6), the greater the probability of a reinforcer being reallocated to the subsequent trial, and the more extreme the spike in local reinforcer proportions at the beginning of the trial. 4. Discussion Although the influence of reinforcers on timing behavior has long been acknowledged (e.g., Bizo & White, 1994a, 1994b; 1995; 1997; Machado & Guilhardi, 2000; Stubbs, 1968, 1980), the nature of such effects, and the mechanisms that underlie them, are not well understood. 
The present experiment therefore sought to examine the nature of the effects of reinforcers on temporal-discrimination performance—that is, whether differences in the relative and absolute frequency of reinforcers
cause changes in the psychometric function because of a decision-making bias due to unequal payoffs, or because of a change in discrimination. The present results highlight the importance of the reinforcers obtained for making a temporal discrimination. Variations in the relative reinforcer rate had the largest effects on the psychometric function, resulting from a bias toward the relatively-richer alternative and from a change in discrimination. Variations in trial duration also caused changes in discrimination, apparently because they were associated with differences in the distribution of reinforcers across time within a trial (Fig. 6). Discrimination was improved in long trials, and in trials arranging more reinforcers in the second half. In contrast, discrimination was relatively unaffected by changes in the overall reinforcer rate. These findings are generally not predicted by current theories of timing. How do reinforcers affect timing? Changes in discrimination – timing – were produced by manipulations that caused variations in the distribution of reinforcers across time within a trial. While such variations were explicitly arranged by varying the relative reinforcer rate, they occurred with variations in trial duration because of the dynamical interaction between choice and obtained reinforcers (Fig. 6). If the error in estimating a duration increases relative to the mean estimate of that duration (e.g., Gibbon, 1977), choice will differ across time, and so too will the proportion of reinforcers obtained at each time. Regardless of whether such variations were explicitly arranged, they appear to produce differences in temporal discrimination. This result suggests that temporal discrimination depends on the distribution of reinforcers across the relevant duration, rather than on extended-level variables such as the overall reinforcer density. 
Indeed, variations in the overall reinforcer rate, which would not be expected to produce differences in the distribution of reinforcers across time, had little effect on performance. The failure of superposition of psychometric functions from short and long trials observed in the present experiment (Fig. 5) and in a small handful of other timing studies (e.g., Bizo & White, 1997) therefore appears to be the result of differences in the distribution of reinforcers across time, produced indirectly by variations in trial duration. That is, variations in the duration to be timed are unlikely in themselves to cause violations of scalar timing; they simply increase the likelihood that the distribution of reinforcers across time will differ as a result of differences in choice across time. Certainly, this is consistent with the interaction between relative rate and trial duration in the present results. Because of their dynamical nature, changes in the distribution of reinforcers across time will differ in magnitude across procedures and individual subjects. It is therefore unsurprising that changes in trial duration very seldom produce violations of the scalar property. The effects of the dynamical interaction between choice and obtained reinforcers are particularly salient in the FOPP and other temporal discrimination procedures where reinforcers are obtained during, rather than after, a duration. Even so, the present results suggest that the superposition of psychometric functions in any timing task will depend on the distribution of obtained reinforcers across relative time. Even when reinforcers are arranged after a particular duration, as in the peak and time-left procedures, reinforcers that are not obtained immediately after being arranged will cause the obtained reinforcer differential to differ from the arranged reinforcer differential (e.g., see Davison, Cowie, & Elliffe, 2013). 
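As a point of reference for the superposition discussed above, the following minimal sketch shows why strictly scalar timing predicts that psychometric functions from short and long trials coincide when plotted against the proportion of trial time. It is an illustration under stated assumptions and not a model fitted in the present experiment: perceived time is assumed to be Gaussian with a standard deviation proportional to elapsed time, the switch between keys is assumed to occur when perceived time exceeds the trial midpoint, and the coefficient of variation (cv = 0.3) and the number of time bins (n_bins = 10) are arbitrary illustrative values.

```python
import math


def p_second_half(t, trial_duration, cv=0.3):
    """Probability of responding on the second-half key at elapsed time t.

    Assumption: perceived time is Gaussian with mean t and standard
    deviation cv * t (scalar variability), and the second-half key is
    chosen whenever perceived time exceeds the trial midpoint.
    """
    midpoint = trial_duration / 2.0
    z = (t - midpoint) / (cv * t)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def psychometric(trial_duration, n_bins=10, cv=0.3):
    """Evaluate the function at the centre of each of n_bins equal time bins."""
    centres = [(i + 0.5) / n_bins for i in range(n_bins)]
    return [p_second_half(c * trial_duration, trial_duration, cv) for c in centres]


short = psychometric(30.0)    # 30-s trials
long_ = psychometric(100.0)   # 100-s trials

# Largest difference between the two functions at matched proportions of the trial.
max_gap = max(abs(a - b) for a, b in zip(short, long_))
print(f"largest difference between functions: {max_gap:.2e}")
```

Because the argument of the Gaussian depends only on the ratio of elapsed time to trial duration, the two functions differ only by floating-point error. Any systematic departure from superposition, such as that in Fig. 5, must therefore come from a source other than scalar variability alone.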
Variation in the time at which reinforcers are obtained will increase as timing accuracy decreases, as per the scalar property of timing (Gibbon, 1977), which may further alter discrimination of the temporal contingency. Thus, responding in any timing task is the result not only of discrimination of elapsed time, but also of discrimination of the relation between elapsed time and obtained reinforcers. The mechanism by which reinforcer distributions control discrimination remains unclear. The inability of current models of timing to account for the present findings appears to relate primarily to two assumptions: that the overall reinforcer density mediates the speed of subjective time; and that differences in the distribution of reinforcers across time will only cause a response bias. As a result, these models cannot easily explain the finding that discrimination was improved in conditions arranging more reinforcers in the second than the first half of a trial (see also Bizo & White, 1995), and in longer trials (Fig. 4). Because these effects appear to relate to the distribution of reinforcers across time, rather than between halves, a simple modification such as biasing the pacemaker toward the relatively-richer portion of the trial (see Bizo & White, 1995) will not suffice. The effect of reinforcers on timing appears to operate at the most local level of control. In order to account for the present results, a theory of timing must predict that the frequency of each type of reinforcer obtained at each time within a trial, relative to the frequency at other times in a trial, affects discrimination. LeT captures this local-level control to some extent, because it assumes that the coupling between operant responses and active states increases with instances of reinforcement. 
Thus, the greater the number of reinforcers obtained while a state is active, the more likely the occurrence of that response when that state is active, and the less likely the occurrence of the other response. However, this effect of reinforcers serves only to bias decision-making, and not to change discrimination; LeT incorrectly attributes control of discrimination to the overall reinforcer density. While LeT may be modified so that the speed of the pacemaker is not affected by overall density (see Machado & Guilhardi, 2000), consistent with the present findings, such a modification does not aid the model in predicting the effect of the distribution of reinforcers across time on discrimination. An alternative approach is to consider temporal-discrimination performance as choice under stimulus control by elapsed time (e.g., Cowie et al., 2013; Cowie, Davison, & Elliffe, 2014; Davison et al., 2013). Where LeT would predict that a behavioral state occasions a timing response, a stimulus-control approach would assume that elapsed time functions as a discriminative stimulus signaling the response more likely to produce a reinforcer. Choice strictly matches the discriminated reinforcer differential, because it is the discriminated, rather than obtained, reinforcer differential that controls behavior (Davison & Nevin, 1999). Where elapsed time is imperfectly discriminated, estimates of the time at which reinforcers are obtained will occasionally be incorrect, which may cause the discriminated reinforcer differential to differ from the obtained reinforcer differential.
Such an approach predicts the sigmoidal shape of the psychometric function because some of the reinforcers obtained in one half of the trial will be discriminated to have been obtained in the other half of a trial, causing the discriminated reinforcer differential, and hence choice, to change progressively across time within a trial. Essentially, obtained reinforcers are redistributed across surrounding times, with the standard deviation of estimates increasing in proportion to the mean, to form the discriminated reinforcer differential. Larger numbers of reinforcers obtained at one time relative to others mean that those reinforcers have a greater effect on the discriminated reinforcer differential – and hence choice – than do reinforcers obtained at surrounding times. This account is similar to how LeT predicts that the probability of an operant response changes according to the states active, and the coupling of the response with each state, except that it is elapsed time, rather than active behavioral states, that occasions responding. Because of this difference, a stimulus-control approach is free from the constraints of a pacemaker whose speed varies with overall reinforcer density. A stimulus-control account essentially predicts that timing and discriminating the reinforcer differential in time are two separable elements of performance on temporal-discrimination tasks. A change in either of these elements will result in a change in the slope of the psychometric function. Certainly, such an approach explains the apparently counter-intuitive finding that discrimination was worse in shorter trials (Fig. 4). In short trials, more reinforcers were generally obtained immediately after than immediately before the midpoint of the trial. These reinforcers would shift the reinforcer differential in the first half of the trial toward the right key to a greater extent than in long trials, where numbers around the midpoint are more equal. 
As a result, choice might change earlier, and more progressively, in short trials than in long trials. Quantitative models of the stimulus-control approach (e.g., Cowie et al., 2014), however, fail to predict improved discrimination under conditions that favor responding in the second half of a trial. Because the error in estimating elapsed time increases proportionally with the mean, increasing numbers of reinforcers obtained in the second half should have an increasingly large effect on the reinforcer differential in the first half of a trial, resulting in weaker discrimination. Clearly, the earlier change in choice relates to a bias toward the relatively-richer response; because the transition occurred earlier, the scalar property of timing dictates a more accurate discrimination. Although increased numbers of reinforcers at later times may also make the temporal contingency more discriminable, it is unclear how a stimulus-control approach might be modified explicitly to predict such findings. The stimulus-control approach has some advantages over traditional models of timing, because it does not predict an effect of overall reinforcer rate, and because it has a mechanism which may predict changes in discrimination produced by variations in the distributions of reinforcers across the relevant duration. Despite these advantages, however, it still fails to explain the effects of unequal relative reinforcer rates on discrimination. To conclude, the present experiment demonstrated that control of timing behavior depends not only on the relative differences between the durations to be timed, but also on the relative differences between the reinforcers obtained at times within those durations. The distribution of reinforcers obtained across time within a trial affects discrimination of the time-based contingency, in addition to causing a response bias. 
Timing behavior therefore cannot be described accurately by models of timing that assign a simple response-biasing function to reinforcers, or by models of timing that do not account for the strong control by the distribution of reinforcers for each timing response across the relevant duration. Isolation of the mechanism for these effects requires further research arranging explicit differences in the distributions of reinforcers across elapsed time.
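The redistribution process at the core of the stimulus-control account discussed above can be sketched numerically. The sketch below is a simplified illustration and not the quantitative model of Cowie et al. (2014). It assumes that reinforcers obtained in each time bin are spread over all bins by a Gaussian kernel whose standard deviation is proportional to the bin's time, that the kernel is normalized within the trial so that reinforcers are conserved, and that choice strictly matches the resulting discriminated reinforcer differential. The function names, the coefficient of variation (cv = 0.3), the bin count and the reinforcer counts are all hypothetical.

```python
import math


def redistribute(obtained, centres, cv):
    """Spread the reinforcers obtained in each bin across all bins.

    Assumption: a Gaussian kernel centred on each bin's time, with standard
    deviation cv * time, normalized within the trial so that the total
    number of reinforcers is conserved.
    """
    discriminated = [0.0] * len(centres)
    for count, mean in zip(obtained, centres):
        sd = cv * mean
        weights = [math.exp(-0.5 * ((c - mean) / sd) ** 2) for c in centres]
        total = sum(weights)
        for i, w in enumerate(weights):
            discriminated[i] += count * w / total
    return discriminated


def predicted_choice(left_obtained, right_obtained, trial_duration, cv=0.3):
    """Proportion of right-key responses in each bin under strict matching
    to the discriminated (redistributed) reinforcer differential."""
    n = len(left_obtained)
    centres = [(i + 0.5) * trial_duration / n for i in range(n)]
    left = redistribute(left_obtained, centres, cv)
    right = redistribute(right_obtained, centres, cv)
    # Indifference (0.5) is assumed where no reinforcers are discriminated.
    return [r / (l + r) if (l + r) > 0 else 0.5 for l, r in zip(left, right)]


# Hypothetical obtained reinforcers: left key in the first half, right key in the second.
left_counts = [5, 5, 5, 5, 5, 0, 0, 0, 0, 0]
right_counts = [0, 0, 0, 0, 0, 5, 5, 5, 5, 5]

choice = predicted_choice(left_counts, right_counts, trial_duration=30.0)
print([round(p, 3) for p in choice])
```

Although the obtained reinforcers change abruptly at the midpoint, the predicted choice changes gradually across bins, because reinforcers obtained in one half are partly discriminated as having occurred in the other half. Changing the counts in individual bins, for example adding a spike to the first bin, shows how the distribution of obtained reinforcers across time alters the predicted function in this sketch.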
References
Beam, J. J., Killeen, P. R., Bizo, L. A., & Fetterman, J. G. (1998). The role of reinforcement context in temporal production and categorization. Animal Learning & Behavior, 26, 388–396.
Bizo, L. A., & White, K. G. (1994a). The behavioral theory of timing: reinforcer rate determines pacemaker rate. Journal of the Experimental Analysis of Behavior, 61, 19–33.
Bizo, L. A., & White, K. G. (1994b). Pacemaker rate in the behavioral theory of timing. Journal of Experimental Psychology: Animal Behavior Processes, 20, 308–321.
Bizo, L. A., & White, K. G. (1995). Biasing the pacemaker in the behavioral theory of timing. Journal of the Experimental Analysis of Behavior, 64, 225–235.
Bizo, L. A., & White, K. G. (1997). Training with controlled reinforcer density: implications for models of timing. Journal of Experimental Psychology: Animal Behavior Processes, 23, 44–55.
Cowie, S., Davison, M., & Elliffe, D. (2014). A model for food and stimulus changes that signal time-based contingency changes. Journal of the Experimental Analysis of Behavior, 102, 289–310.
Cowie, S., Elliffe, D., & Davison, M. (2013). Concurrent schedules: discriminating reinforcer-ratio reversals at a fixed time after the previous reinforcer. Journal of the Experimental Analysis of Behavior, 100, 117–134.
Davison, M., Cowie, S., & Elliffe, D. (2013). On the joint control of preference by time and reinforcer-ratio variation. Behavioural Processes, 95, 100–112.
Davison, M., & McCarthy, D. (1987). The interaction of stimulus and reinforcer control in complex temporal discrimination. Journal of the Experimental Analysis of Behavior, 49, 351–365.
Davison, M., & Nevin, J. A. (1999). Stimuli, reinforcers, and behavior: an integration. Journal of the Experimental Analysis of Behavior, 71, 439–482.
Doughty, A. H., & Richards, J. B. (2002). Effects of reinforcer magnitude on responding under differential-reinforcement-of-low-rate schedules of rats and pigeons. Journal of the Experimental Analysis of Behavior, 78, 17–30.
Fetterman, J. G., & Killeen, P. R. (1991). Adjusting the pacemaker. Learning and Motivation, 22, 226–252.
Fleshler, M., & Hoffman, H. S. (1962). A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior, 5, 529–530.
Galtress, T., & Kirkpatrick, K. (2009). Reward value effects on timing in the peak procedure. Learning and Motivation, 40, 109–131.
Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review, 84, 279–325.
Guilhardi, P., MacInnis, M. L. M., Church, R. M., & Machado, A. (2007). Shifts in the psychophysical function in rats. Behavioural Processes, 75, 167–175.
Jozefowiez, J., Staddon, J. E. R., & Cerutti, D. T. (2009). The behavioral economics of choice and interval timing. Psychological Review, 116, 519–539.
Killeen, P. R., & Fetterman, J. G. (1988). A behavioral theory of timing. Psychological Review, 95, 274–295.
Lejeune, H., & Wearden, J. H. (2006). Scalar properties in animal timing: conformity and violations. The Quarterly Journal of Experimental Psychology, 59, 1875–1908.
Machado, A. (1997). Learning the temporal dynamics of behavior. Psychological Review, 104, 241–265.
Machado, A., & Guilhardi, P. (2000). Shifts in the psychometric function and their implications for models of timing. Journal of the Experimental Analysis of Behavior, 74, 25–54.
Raslear, T. G. (1985). Perceptual bias and response bias in temporal bisection. Perception & Psychophysics, 38, 261–268.
Stubbs, A. (1968). The discrimination of stimulus duration by pigeons. Journal of the Experimental Analysis of Behavior, 11, 223–238.
Stubbs, D. A. (1980). Temporal discrimination and a free-operant psychophysical procedure. Journal of the Experimental Analysis of Behavior, 33, 167–185.
White, K. G. (1990). Delayed and current stimulus control in successive discriminations. Journal of the Experimental Analysis of Behavior, 54, 31–43.