Behavioural Processes 45 (1999) 115 – 127
Timing and choice in concurrent chains
Randolph C. Grace a,b,*, John A. Nevin a,b
a Department of Psychology, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
b University of New Hampshire, Durham, NH 03824, USA
Received 20 March 1998; received in revised form 1 October 1998; accepted 23 October 1998
Abstract To investigate the role of timing processes in choice, we used a new procedure that provided simultaneous measures of ongoing choice and timing behavior. Pigeons responded in a peak procedure in which the delays to reinforcement signaled by red and green center-key stimuli were 10 and 20, or 20 and 40 s. After 25 sessions of training, the peak procedure was embedded within concurrent chains: The inter-trial interval was replaced by a choice phase in which the two side keys were illuminated white; responses to the left and right keys occasionally changed the center-key to red or green, respectively; and the terminal links signaled by the center-key stimuli were identical to the trials of the peak procedure. The temporal control of responding on no-food trials was the same regardless of whether the no-food trials occurred in the peak procedure or as the terminal links of concurrent chains. After an intervening condition with the peak procedure in which the delay for the 10 s stimulus was changed to 40 s (or vice versa), the pigeons were returned to concurrent chains. Choice responding did not reflect the changed delay, despite the fact that the pigeons timed the delays in both terminal links accurately as indexed by responding on no-food trials. This result challenges current accounts of choice based on timing processes, such as scalar expectancy theory, which assume that choice responding is mediated by a representation of terminal link delays to reinforcement. Apparently, pigeons’ choice and timing behavior in a single session can be controlled by temporal information from different temporal epochs. © 1999 Elsevier Science B.V. All rights reserved. Keywords: Timing processes in choice; White Carneau pigeons; Peak procedure; Timing behavior
* Corresponding author. E-mail address: [email protected] (R.C. Grace)
1. Introduction
Choice, or how animals allocate their behavior among different sources of reinforcement, has been studied for many years. Most of the research in the operant tradition has employed recurrent choice paradigms, in which animals make many responses over an extended period of time and encounter outcomes repeatedly. The best-known empirical result in this area is the matching law, first reported for concurrent variable-interval (VI)
schedules by Herrnstein (1961; see Williams, 1988, 1994, for review). Under a VI schedule, the first response after an unpredictable delay has elapsed is reinforced with food. The matching law predicts that the proportion of responses to one of two concurrently-available VI schedules equals the proportion of reinforcers obtained from that schedule.
Because the independent variables in choice research are often temporal parameters of reinforcement, i.e. its rate or delay, it is not surprising that theories of timing have been extended to account for choice. A good example is scalar expectancy theory (SET) as applied to choice between delayed reinforcers in the concurrent chains procedure by Gibbon et al. (1988). In this procedure, subjects (typically pigeons) respond to two alternatives during a choice phase or ‘initial links’ (see Fig. 1 below). Responses during the initial links occasionally produce access to one of two mutually exclusive outcome schedules, called ‘terminal links’. Each terminal link is signaled by a distinctive stimulus. Responding during the terminal links is reinforced with food, after which the initial links are reinstated. The dependent variable is response allocation in the initial links, which depends primarily on the delay to reinforcement signaled by each terminal link.
Gibbon et al. (1988) extended SET to concurrent chains by assuming that pigeons maintain separate memories for the delays to reinforcement signaled by each terminal link. These memories are represented as normal distributions with the standard deviation proportional to the mean of the delay (which is termed the scalar property). During the initial links, a pigeon repeatedly samples from the memories for each terminal link, and responds to the initial link that corresponds to the terminal link with the shorter remembered delay. Gibbon et al.
showed that SET could predict many of the results from concurrent chains, including undermatching with VI terminal links, overmatching with fixed-interval (FI) terminal links (which reinforce the first response that occurs after a constant delay), and preference for variability with VI versus FI terminal links (but see Preston, 1994). For present purposes, the important feature of SET is that responding in the
initial links is mediated by a representation of the reinforcement delays signaled by each terminal link. Choice is thus explained in terms of a fundamental timing process that is manifested in a variety of other paradigms as well.
By contrast, models for choice based on the matching law have assumed that responding in the initial links is determined primarily by the conditioned reinforcement value of the terminal link stimuli (e.g. Squires and Fantino, 1971; Killeen, 1982; Vaughan, Jr., 1985; Grace, 1994). Value, in turn, is a function of the reinforcement delay signaled by a terminal link. Although the details of these models differ, they share the assumption that initial link responding is mediated by terminal link value. Terminal link value is similar to the delay memories posited by SET because it is a theoretical construct representing the reinforcement history associated with a particular stimulus.
The data to which both SET and matching law-based models have been applied are molar aggregates of steady-state responding in concurrent chains. In several dozen studies, pigeons have been exposed to a given set of schedule parameters for enough sessions for choice responding to have stabilized, and the data are sums or averages over the last several sessions. In contrast, virtually nothing is known about acquisition of preference in concurrent chains. Thus, despite the empirical success of matching law-based models in accounting for data from concurrent chains and other procedures (e.g. Grace, 1994, 1996), it is not known whether the assumption that initial link responding is determined by terminal link value can apply to choice responding during transitions.
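The SET choice rule summarized earlier in this section (normally distributed delay memories with a standard deviation proportional to the mean, and a response to the alternative whose sampled delay is shorter) can be illustrated with a small simulation. This is only a minimal sketch of that verbal description, not the full model of Gibbon et al. (1988); the function name, the coefficient of variation `cv = 0.3`, and the number of samples are assumptions chosen for illustration.

```python
import random


def set_choice_proportion(delay_left, delay_right, cv=0.3, n_samples=10000, seed=0):
    """Proportion of simulated choices made to the left initial link.

    Each terminal-link delay is remembered as a normal distribution whose
    standard deviation is cv * mean (the scalar property). On every simulated
    choice, one value is drawn from each memory and the alternative with the
    shorter sampled delay is chosen. cv is a hypothetical value, not an
    estimate taken from the source.
    """
    rng = random.Random(seed)
    left_choices = 0
    for _ in range(n_samples):
        sample_left = rng.gauss(delay_left, cv * delay_left)
        sample_right = rng.gauss(delay_right, cv * delay_right)
        if sample_left < sample_right:
            left_choices += 1
    return left_choices / n_samples


if __name__ == "__main__":
    # FI 10 s on the left versus FI 20 s on the right.
    print(set_choice_proportion(10, 20))
```

Under this rule, preference depends only on the current delay memories, so a change in a remembered delay would be expected to change simulated choice at once.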
Similarly, as a formulation of choice, SET is an extension of a theory based on steady-state data from timing procedures (Gibbon, 1977), and the evidence that responding in the initial links of concurrent chains is mediated by repeated sampling from memories for terminal link reinforcement delays is indirect. To test the assumption that initial link responding is mediated by a theoretical construct corresponding to reinforcement history in the presence of a distinctive stimulus, we introduce a novel procedure for the study of choice and timing. Our
Fig. 1. The upper panel shows the events on a trial in the peak procedure, used in conditions 1 and 3. The bottom panel shows the events on a trial in the procedure (peak + cc) used in conditions 2 and 4, in which the ITI of the peak procedure was replaced by the initial links of concurrent chains. See text for more details.
procedure combines concurrent chains and the peak procedure in order to allow for simultaneous measures of choice and timing. In effect, this method allows us to measure terminal link value, based on expected delay to reinforcement, independently of choice.
The peak procedure (Catania, 1970; Roberts, 1981) is similar to a discrete-trial FI schedule in which reinforcement is omitted on a percentage of trials (see Fig. 1). The duration of these no-food trials extends well past the time that reinforcement is delivered under the FI schedule on food trials. Because average response rate as a function of time on no-food trials typically rises to a maximum or peak at approximately the FI schedule value, the peak procedure has been viewed as providing a direct window on an internal clock or timing process (Roberts, 1981). Thus, the peak procedure gives a measure of the expected delay to reinforcement.
In the first condition of our experiment, pigeons were trained on a peak procedure in which two different delays to reinforcement arranged by FI schedules (10 and 20, or 20 and 40 s) were signaled by red or green illumination of the center key in a three-key operant chamber, separated by an inter-trial interval (ITI). On one-quarter of the trials no food was available, and the trial ended after a duration equal to three times the FI schedule value. In the second condition, the peak procedure was embedded within concurrent chains; the ITI was replaced by the initial links, signaled by white illumination of the side keys. Initial link responses occasionally gave access to the terminal links, which were identical in all respects to the trials of the peak procedure. By the end of the second condition, all pigeons responded much more frequently on the initial link preceding the terminal link that signaled the shorter delay. In the third condition, the pigeons were returned to the peak procedure and the FI 10 s (or 40 s) schedule was changed to FI 40 s (or 10 s).
Finally, in the fourth condition the peak procedure was again embedded within concurrent chains. The key issue was whether choice responding would immediately reflect the updated values of the terminal link stimuli, established in the third condition, as the unchanged FI 20 s terminal link
signaled the shorter delay in the second condition and the longer delay in the fourth (or vice versa).
2. Method
2.1. Subjects Four White Carneau pigeons, numbered 022, 031, 119, and 319, participated as subjects, and were maintained at 85% ad libitum weight ± 15 g. All birds had previous experience with concurrent chains, but were naïve to the peak procedure. They were housed individually in a vivarium with a 12:12 h light-dark cycle (lights on at 07:00 h). Water and grit were available continuously in the home cages.
2.2. Apparatus Four standard three-key operant chambers were used. The chambers measured 35 cm in length × 35 cm in width × 35 cm in height, and the keys were located 26 cm above the floor. The keys could be illuminated red, white or green. Chambers were equipped with a houselight 7 cm above the center key for general illumination, and a grain magazine with a 6 × 5 cm opening located 13 cm below the center key. The magazine was illuminated when wheat was made available. A force of approximately 0.10 N was required to operate each key, and effective responses produced an auditory feedback click. Chambers were enclosed in sound-attenuating boxes that were fitted with ventilation fans for masking extraneous noises. The experiment was controlled and data collected using a MED-PC system interfaced to an IBM-compatible microcomputer located in an adjacent room.
2.3. Procedure Because all subjects were experienced, training began immediately in the first condition on the peak procedure (Catania, 1970; Roberts, 1981). Sessions were conducted daily with few exceptions.
Table 1
Sequence of experimental conditions for each subject (a)

Condition  Procedure    Bird 022        Bird 031        Bird 119        Bird 319
                        Red    Green    Red    Green    Red    Green    Red    Green
1          Peak         FI 20  FI 40    FI 10  FI 20    FI 40  FI 20    FI 20  FI 10
2          Peak+cc (b)  FI 20  FI 40    FI 10  FI 20    FI 40  FI 20    FI 20  FI 10
3          Peak         FI 20  FI 10    FI 40  FI 20    FI 10  FI 20    FI 20  FI 40
4          Peak+cc (b)  FI 20  FI 10    FI 40  FI 20    FI 10  FI 20    FI 20  FI 40

(a) Listed are the FI schedule values (in s) for the red (left terminal link) and green (right terminal link) trials. All conditions lasted for 25 sessions.
(b) Peak+cc = concurrent chains in which the terminal links are identical to trials in the peak procedure.
2.3.1. Peak procedure Sessions consisted of 72 trials. The sequence of events on a trial is depicted in the upper panel of Fig. 1. Each trial was preceded by an ITI of 15 s during which all lights in the chamber were extinguished. At the start of a trial, the houselight was illuminated and the center key was lighted either red or green. The color was determined randomly with the restriction that in every eight trials, four were red and four were green. There were two types of trials: regular trials which could end in food reinforcement, and peak trials, which always ended independently of subjects’ responding and without food reinforcement. Trial types were determined randomly, with the restriction that of the four trials of each color in each group of eight, there was one peak and three regular trials. Thus, of the 36 trials of each color in each session, there were 27 regular and 9 peak trials. On regular trials, an FI schedule with a 5 s limited hold operated. The FI schedule values depended on the experimental condition and are listed in Table 1. The first response after the duration specified by the schedule had elapsed was reinforced, provided that response occurred within 5 s. Reinforcement consisted of presentation of the grain magazine for 3 s, during which the only source of illumination in the chamber was the magazine light. After reinforcement the chamber remained dark and the next ITI began. If a response did not occur within 5 s after the FI duration had elapsed, the keylight and houselight were extinguished and the trial ended without reinforcement. On peak trials, the red or green keylight was illuminated for a duration equal to three times the FI schedule. After this duration elapsed, the keylight and houselight were
extinguished and the trial ended. No reinforcement was available on peak trials. After 25 sessions had been completed with the peak procedure, subjects began training in the second condition on a procedure in which the trials of the peak procedure served as the terminal link schedules in concurrent chains, as described below.
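The trial-ordering constraints described above (72 trials per session; four red and four green trials in every group of eight; one peak and three regular trials per color in each group) can be sketched as follows. The original control program is not described beyond these constraints, so shuffling a fixed block of eight trials is an assumed implementation, and the function name `make_session` is hypothetical.

```python
import random


def make_session(n_trials=72, seed=None):
    """Return a list of (color, trial_type) tuples for one session.

    Every consecutive block of eight trials contains four red and four
    green trials; within each color there is one peak (no-food) trial and
    three regular trials. Shuffling a fixed block is an assumption about
    how the randomization was done.
    """
    rng = random.Random(seed)
    block = [
        (color, kind)
        for color in ("red", "green")
        for kind in ("peak", "regular", "regular", "regular")
    ]
    trials = []
    for _ in range(n_trials // len(block)):
        shuffled = block[:]
        rng.shuffle(shuffled)
        trials.extend(shuffled)
    return trials


if __name__ == "__main__":
    session = make_session(seed=1)
    print(session[:8])
```

With the default of 72 trials this yields nine blocks, and therefore 27 regular and 9 peak trials of each color, matching the counts given in the text.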
2.3.2. Peak procedure and concurrent chains Sessions consisted of 72 initial and terminal link cycles. The sequence of events in a cycle is shown in the bottom panel of Fig. 1. At the start of a cycle, the left and right keys were illuminated white, signifying the initial links or choice phase of the procedure. A terminal link entry was assigned randomly to either the left or right key, with the restriction that out of every eight terminal links, four were assigned to the left key and four to the right key. Thus the initial links represented a forced-choice procedure (Stubbs and Pliskoff, 1969). A response produced an entry into a terminal link if: (a) it was to the pre-selected key; (b) an interval selected from a VI 8 s schedule at the start of the cycle had timed out; and (c) a 1 s changeover delay (COD) was satisfied, i.e. at least 1 s had elapsed following a changeover to the key for which terminal link entry was arranged. The VI 8 s contained 12 intervals constructed from an arithmetic progression, a, a + d, a + 2d, ..., in which a equals one-twelfth and d equals one-sixth the schedule value. The intervals were sampled randomly without replacement. The initial link schedule was chosen so that the obtained time in the initial links, on average, would be approximately equal to the ITI in the peak procedure (15 s).
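The construction of the VI 8 s schedule can be checked numerically. The sketch below builds the 12 intervals from the arithmetic progression given in the text (a equal to one-twelfth and d equal to one-sixth of the schedule value) and samples them without replacement. The function names are hypothetical, and refilling the list once all 12 intervals have been used is an assumption, since the text does not say what happened after the list was exhausted.

```python
import random


def vi_intervals(schedule_value=8.0):
    """Twelve intervals a, a + d, ..., a + 11d with a = value/12, d = value/6."""
    a = schedule_value / 12
    d = schedule_value / 6
    return [a + k * d for k in range(12)]


def interval_sampler(intervals, seed=None):
    """Yield intervals in random order without replacement.

    The list is reshuffled after every interval has been used once
    (an assumption; the source does not describe this step).
    """
    rng = random.Random(seed)
    while True:
        pool = list(intervals)
        rng.shuffle(pool)
        yield from pool


if __name__ == "__main__":
    intervals = vi_intervals(8.0)
    print([round(x, 2) for x in intervals])
    print(sum(intervals) / len(intervals))  # mean interval, approximately 8 s
```

The mean of the progression is a + 5.5d = 8/12 + 5.5 × 8/6 = 8 s, so the list does average to the nominal schedule value.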
The onset of a terminal link was signaled by red or green illumination of the center key, coupled with darkening the side keys. Terminal links that were produced by responses to the left initial link key were signaled by red on the center key; those that were produced by right initial link responses were signaled by green. In all respects, red and green terminal links were identical to trials in the peak procedure; there were 36 presentations of each stimulus in a session, comprising 27 regular trials and nine peak trials. The same FI schedules operated with the same 5 s limited hold on regular trials; peak trials lasted for three times the duration of the FI schedule and ended without reinforcement. After the trial (i.e. terminal link) ended, the side keys were illuminated white signaling the initial links and the next cycle began. The houselight remained illuminated at all times except during reinforcement. In effect, the procedure consisted of a peak procedure in which the ITI was replaced by the initial links.
The sequence of experimental conditions for each subject is listed in Table 1. A peak procedure was used for conditions 1 and 3; a concurrent chains procedure with terminal links that were identical to the trials of the peak procedure from the preceding condition was used for conditions 2 and 4 (‘peak+cc’). In one pair of conditions, the schedules were FI 10 s and FI 20 s, and in the other pair the schedules were FI 20 s and FI 40 s. The assignment of schedules was counterbalanced, but for all subjects the initial link key that gave access to the shorter FI schedule was reversed between conditions 2 and 4, and the stimulus associated with the FI 20 s schedule did not change. All conditions lasted for 25 sessions. The primary dependent variables were the number of responses during each peak trial, tallied in 1 s bins, and responses during the initial links in the peak+cc procedure.
Initial link choice allocation was measured as the log (base 10) of the ratio of responses to the left and right keys.
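As a brief illustration of this measure, the log response ratio can be computed as below. The function name is hypothetical, and raising an error for zero counts is an assumption, since the text does not say how sessions with no responses on one key would have been handled.

```python
import math


def log_response_ratio(left_responses, right_responses):
    """Log (base 10) of the ratio of left-key to right-key responses.

    Positive values indicate preference for the left key, negative values
    preference for the right key, and 0 indicates indifference.
    """
    if left_responses <= 0 or right_responses <= 0:
        raise ValueError("both response counts must be positive")
    return math.log10(left_responses / right_responses)


if __name__ == "__main__":
    # Ten times as many left as right responses gives a log ratio of 1.
    print(log_response_ratio(500, 50))
```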
3. Results A comparison of timing data from the peak procedure (conditions 1 and 3) with those from
the terminal links comprised by the peak procedure embedded within concurrent chains (‘peak+cc’; conditions 2 and 4) will be considered first. Fig. 2 displays the response rates in the no-food trials or terminal links, aggregated across the last five sessions in each condition and averaged across subjects. The data for the peak procedure are shown in the upper panels; corresponding data for the peak+cc procedure are shown in the bottom panels. Different schedules are marked as noted in the legend. In general, the data show the pattern typical of the peak procedure, with response rates rising to a maximum near the FI schedule value and then decreasing (although not to zero). Variability in the distributions increased with the FI value. Visual inspection suggests that distributions for the peak and peak+cc procedures were substantially similar.
To facilitate a more precise comparison, several statistics were computed for the distributions in Fig. 2 for each subject. Peak times were calculated according to the trimmed median method described by Cheng and Roberts (1991). A cumulative response distribution was generated and the 1 s bin that contained the median response was located. The median time was calculated by linear interpolation. Then a new cumulative distribution was generated excluding all responses that occurred after twice that median. A new median was calculated, and the process was iterated until successive medians stopped changing. The final median time was taken as an estimate of the peak time. Once the median was obtained, the interquartile range was computed as the difference between the time values for the 75th and 25th percentile responses, each again calculated by linear interpolation. Fig. 3 shows the median times, averaged over subjects, for the peak (left panel) and peak+cc procedures (right panel). The interquartile range data are shown in Fig. 4.
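The trimmed median procedure described above can be sketched in code. This follows the verbal description in the text rather than any published implementation of Cheng and Roberts (1991). Several details are assumptions: bin i is taken to cover the interval from i to i + 1 s, a bin is kept if it starts before the cutoff of twice the median, and iteration stops when successive medians differ by less than a small tolerance. The function names are hypothetical.

```python
import math


def percentile_time(counts, fraction, bin_width=1.0):
    """Time by which `fraction` of responses occurred (linear interpolation)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("no responses in distribution")
    target = fraction * total
    cumulative = 0.0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            # Interpolate within bin i, assumed to span [i, i + 1) * bin_width.
            return (i + (target - cumulative) / count) * bin_width
        cumulative += count
    return len(counts) * bin_width


def trimmed_median(counts, bin_width=1.0, tol=1e-6, max_iter=100):
    """Return (peak time estimate, interquartile range) for binned responses."""
    kept = list(counts)
    median = percentile_time(kept, 0.5, bin_width)
    for _ in range(max_iter):
        # Exclude responses after twice the current median (assumed rule:
        # keep every bin that starts before the cutoff).
        n_keep = min(len(counts), math.ceil(2 * median / bin_width))
        kept = list(counts[:n_keep])
        new_median = percentile_time(kept, 0.5, bin_width)
        converged = abs(new_median - median) < tol
        median = new_median
        if converged:
            break
    iqr = percentile_time(kept, 0.75, bin_width) - percentile_time(kept, 0.25, bin_width)
    return median, iqr


if __name__ == "__main__":
    # Hypothetical response counts in 1 s bins with a late burst in bin 28.
    counts = [max(0, 5 - abs(i - 10)) for i in range(30)]
    counts[28] = 10
    print(trimmed_median(counts))
```

In this made-up example the late burst of responding pulls the first median upward, and trimming at twice the median removes it on the next pass, which is the purpose of the iteration.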
Fig. 2. Response rate as a function of time in the no-food trials in both the peak procedure (upper panels) and peak procedure embedded within concurrent chains (peak+cc; bottom panels). Data from the condition in which FI 10 s and FI 20 s schedules alternated are shown in the left panels; corresponding data from the FI 20 s and FI 40 s condition are displayed in the right panels. Data are aggregated over the last five sessions in each condition and averaged across subjects. Schedules are marked as noted in the legends.
According to Weber’s law, measures of peak time and variability should increase proportionately with the interval being timed. Regression lines were calculated that were constrained to pass through the origin for the data in each panel. These regressions accounted for between 93 and 98% of the variance in each case, suggesting that both measures increased proportionately in accord with Weber’s law. Moreover, the slopes for the peak and peak+cc data were nearly identical for both median and interquartile range. This is evidence that responding in the terminal links is a result of the same timing process as responding in the peak procedure.
Repeated-measures analyses of variance (ANOVAs) with procedure and FI schedule as factors were conducted on the median and interquartile range data. For the medians, there was a significant effect of schedule, F(3,9) = 131.98, P < 1 × 10⁻⁶, but the effect of procedure and the schedule–procedure interaction did not reach significance, F(1,3) = 0.04 and F(3,9) = 2.59, both NS. The results for interquartile range were similar: There was a significant effect of schedule, F(3,9) = 27.26, P < 1 × 10⁻⁴, but the effect of
procedure and the schedule–procedure interaction did not reach significance, F(1,3) = 0.05 and F(3,9) = 1.10, both NS. These analyses confirm the visual impression from Fig. 2, i.e. that the temporal patterns of responding did not differ significantly between the peak and peak+cc procedures. In addition, the absence of a significant interaction indicates that the median and interquartile range for the FI 20 s schedule were not systematically different depending on the alternated schedule.
The key issue in the present experiment is the relation between measures of choice and timing across the same sessions. The data from condition 2 are displayed in Fig. 5. For all subjects, the log of the initial link response rate ratio is plotted on the right y-axis, and the median and interquartile
Fig. 3. Medians of the response rate distributions in Fig. 2 as a function of FI schedule value. Medians were determined for distributions for individual pigeons and averaged. Data for the peak procedure are shown in the left panel; corresponding data for the peak+cc procedure are shown in the right panel. Error bars indicate ± 1 S.E. The diagonal line is the best-fitting regression constrained to pass through the origin. Different schedules are marked according to the legend. Note: FI 20 (10) = FI 20 s paired with FI 10 s; FI 20 (40) = FI 20 s paired with FI 40 s.
range for responding on the no-food presentations of each terminal link are plotted on the left y-axis. Different terminal link schedules are marked in the legend. The interquartile range is noted by the error bars extending above or below the median. The bars indicate one-half of the interquartile range (e.g. 50–75%, or 25–50%). Each data point represents a single session, except for the left-most points. The left-most median and interquartile ranges are averages of the last five sessions of condition 1. The left-most log initial link response ratio is the average of the last five sessions of a previous experiment in which the birds had participated; it ended just before condition 1 began and used a concurrent chains procedure with the same stimuli as the present experiment. At the end of that prior experiment, all subjects were responding more on the initial link key preceding the same terminal link that signaled the shorter delay in condition 2. Thus, the preference for the shorter terminal link shown in the first session of condition 2 may be the result of carry-over effects.
Fig. 5 shows that temporal control of behavior in the no-food trials was disrupted, to a varying extent across subjects, by the introduction of concurrent chains in condition 2. For Birds 022 and 119, the median for the longer schedule (FI 40) decreased in the first session, and subsequently recovered its previous value. For Bird 319, the median for the longer schedule (FI 20) increased and then recovered, whereas for Bird 031 the median for the longer schedule (FI 20) was unchanged. In general, medians for the shorter schedule were less changed. However, in all cases temporal control over terminal link responding recovered well before the end of condition 2, so that the average median and interquartile range over the last five sessions were not systematically different from condition 1.
Fig. 4. Interquartile ranges of the response rate distributions in Fig. 2 as a function of FI schedule value. Interquartile ranges were determined for distributions for individual pigeons and averaged. Data for the peak procedure are shown in the left panel; corresponding data for the peak+cc procedure are shown in the right panel. Error bars indicate ± 1 S.E. The diagonal line is the best-fitting regression constrained to pass through the origin. Different schedules are marked according to the legend. Note: FI 20 (10) = FI 20 s paired with FI 10 s; FI 20 (40) = FI 20 s paired with FI 40 s.
In condition 3 the initial links were discontinued and the birds were again trained on a peak procedure with the FI 10 s changed to FI 40 s (or vice versa), while the FI 20 s schedule remained constant. Thus, the critical question was whether, when the birds were returned to concurrent chains in condition 4, initial link choice would show an immediate effect of the schedule change in condition 3. The choice and timing data for condition 4 are shown in Fig. 6. The left-most data points represent, for the median and interquartile range, the average of the last five sessions in condition 3, and for the log initial link response ratio, the average of the last five sessions of condition 2 (i.e. the bird’s most recent history with each of the relevant stimuli). There was no evidence of an immediate effect of the schedule change on preference. All subjects initially demonstrated strong preferences for the initial link that had preceded the shorter terminal link in condition 2. These preferences changed gradually, so that by the end of condition 4 all subjects were responding more on the initial link that preceded the shorter terminal link. The change in preference over the course of condition 4 was very large, averaging 1.82 log (base 10) units. This change is in dramatic contrast with the overall stability of timing responding. There was less disruption at the outset of condition 4, compared with condition 2 (see Fig. 5); the median and interquartile range data did not change systematically for any subject, with the exception of the FI 40 s schedule for Bird 319. For this bird, the median increased at the beginning of condition 4 and decreased with continued training. However, this change is opposite in direction to the shift in preference for Bird 319, assuming that preference depends on a comparison of the
learned delays with reinforcement in the terminal links and that responding in the no-food trials measures that learning. The data from condition 4 demonstrate a clear dissociation between choice and timing: Whereas timing of the terminal link delays, as indexed by responding on the no-food trials, remained approximately constant from the beginning of condition 4, preference from the previous concurrent chains condition (2) persisted and relative initial link responding was not ordinally consistent with the timed delays until well into condition 4. This result challenges models for choice, such as Grace’s (1994) Contextual Choice Model (CCM) and SET (Gibbon et al., 1988), which assume that initial link responding reflects the current status of a theoretical construct (i.e. value or memories for delays) that represents the reinforcement history associated with a particular stimulus.
4. Discussion Measures of temporal control of responding on no-food trials—specifically, the median and interquartile range of response rate distributions—
Fig. 5. Session-by-session choice and timing data from condition 2. Data for individual birds are given as noted. Plotted according to the left y-axis are the median (unfilled symbols; schedules noted in legend) and interquartile range (error bars). The 50–75% response is indicated with plus error bars; the 50–25% response by minus error bars. The left-most median and interquartile range points are the average over the last five sessions of condition 1 (peak procedure). The log initial link response ratios are indicated with filled circles and are plotted against the right y-axis. The left-most filled circle represents the average log initial link response ratio over the last five sessions of each bird’s most recent history with concurrent chains (see text for more explanation).
were not systematically different regardless of whether those trials occurred within a peak procedure, or the peak procedure was embedded as the terminal links of concurrent chains (peak+cc). For both procedures, the median and interquartile range increased proportionately with the FI schedule value, as required by Weber’s law. The orderly temporal control of responding during the terminal links establishes our procedure as a valid technique for obtaining simultaneous measures of choice and timing. That the same temporal control characterizes responding during FI terminal links and the peak procedure is consistent with the notion that initial link choice can be explained in terms of timing the terminal link delays to reinforcement.
If choice can be explained in terms of a fundamental timing process (e.g. Gibbon et al., 1988), then converging operations (Garner et al., 1956) which assess ongoing choice and timing should reveal a concordance between these aspects of behavior. However, results from condition 4 demonstrated a clear dissociation between choice and timing. The pigeons were returned to concurrent chains in condition 4 after the delay for one stimulus in the peak procedure had been changed in condition 3. Because of this change, the position of the initial link that led to the shorter terminal link delay was reversed in condition 4, compared with the concurrent chains in condition 2. Rather than show an immediate shift consistent with the new delay, choice responding in the initial links for all subjects changed slowly over the course of a number of sessions. The gradual change in preference was obtained despite the fact that the birds continued to time both terminal link delays accurately, as indicated by their responding in the no-food trials (see Fig. 6). This result is difficult to reconcile with theories of choice that assume that initial link responding is mediated by a representation of the terminal link delays.
Fig. 6. Session-by-session choice and timing data from condition 4. Data for individual birds are given as noted. Plotted according to the left y-axis are the median (unfilled symbols; schedules noted in legend) and interquartile range (error bars). The 50–75% response is indicated with plus error bars; the 50–25% response by minus error bars. The left-most median and interquartile range points are the average over the last five sessions of condition 3 (peak procedure). The log initial link response ratios are indicated with filled circles and are plotted against the right y-axis. The left-most filled circle represents the average log initial link response ratio over the last five sessions of condition 2.
For example, according to scalar expectancy theory (Gibbon et al., 1988), pigeons maintain separate memories for the reinforcement delays signaled by the terminal link stimuli, and sample from these memories when deciding which initial link to choose. According to this view, the pigeons should have shown an immediate preference for the shorter terminal link in condition 4. Perhaps the lack of such a shift in preference could be explained if the initial link stimuli reinstated the delay memories from condition 2. But this would suggest that temporal control of responding during the no-food terminal links should have been severely disrupted; yet no such disruption occurred. Therefore, the question posed by the present data is why, given the undeniable fact that both initial link and terminal link responding are controlled by the terminal link reinforcement delays, responding in condition 4 was determined by delays from different temporal epochs (i.e. initial link responding by the delays from condition 2; terminal link responding by the delays from condition 3).
For similar reasons, the present data also challenge matching law-based models of concurrent chains, which maintain that responding in the initial links is determined by the current value of the terminal link stimuli (Grace, 1994). The relative value of the stimuli should have reversed during condition 3; yet this reversal was not reflected in choice performance. The assumption that choice matches the relative value of the terminal link stimuli might apply only to molar, steady-state responding, but not to choice in transition. Thus, the present results pose a challenge to current theories of choice, which are based largely on steady-state data, because any eventual complete theory will also have to account for behavior in transition.
Our results are consistent with those of Williams et al. (1995). They trained pigeons on a multiple chain schedule in which there were three components in each chain. The terminal link in each chain ended in food reinforcement. After baseline training, a devaluation phase occurred in which the terminal link stimuli were presented successively and one always ended in extinction.
To test whether the devalued terminal link stimulus would have an immediate decremental effect on responding in the earlier links, the pigeons were returned to the original chain, with both terminal links now in extinction. In the first extinction session, terminal link response rates were greater in the presence of the stimulus that had not been devalued, but there were no differences
in initial or middle link responding (although one did develop for the middle link with continued training). Thus, in Williams et al. (1995) and the present study, devaluing a terminal link stimulus outside the context of its chain schedule had no immediate impact on initial link performance, despite the fact that ongoing terminal link behavior suggested that the value of that stimulus had changed. The present experiment, and that of Williams et al. (1995), are similar in some respects to studies that have explored the impact of changing the value or effectiveness of a reinforcer. These studies have been taken as evidence that operant behavior is mediated by knowledge of the response–reinforcer relation (see Delamater and LoLordo, 1991; Dickinson, 1994, for review). For example, Adams and Dickinson (1981) trained rats to press a lever for one type of reinforcer (sucrose or food pellets) while the other type was presented independently of responding. They then devalued either the sucrose or food pellets by pairing with lithium chloride while the lever was removed from the chamber. In a subsequent extinction test, lever pressing decreased more when the response-contingent reinforcer had been devalued. Because responding was precluded during devaluation and the devalued reinforcer was never present during the extinction test, this result implies that the decrease in lever pressing reflected an updated knowledge about the value of the reinforcer. However, lever pressing did not fall to zero despite non-consumption of the devalued reinforcer, and this residual responding could be ascribed to the subjects’ histories of reinforcement before devaluation. Thus, in Adams and Dickinson’s study, behavior during extinction was jointly determined by events during two prior temporal epochs. 
These and related results led Dickinson (1994) to conclude that operant contingencies establish responding ‘partly as a goal-directed action, mediated by knowledge of the instrumental relation, and partly as an S-R habit impervious to outcome devaluation’ (p. 52). In Dickinson’s terms, our data suggest that initial link choice allocation early in condition 4 was impervious to the values of the terminal link schedules established during condition 3. However, we cannot conclude that the subjects’ histories in condition 3 had no effect on the rate of choice acquisition during condition 4. Comparison with a control group for which condition 3 was omitted or replaced with training on an irrelevant task might show that our subjects acquired their final choice allocation in condition 4 more rapidly because of their knowledge, gained during condition 3, of the FI values arranged in condition 4.

To summarize, our results showed that measures of the temporal control of behavior were the same regardless of whether no-food trials occurred in the peak procedure or as the terminal links of concurrent chains. This suggests that our new procedure, which combines the peak procedure and concurrent chains, is a valid technique for obtaining simultaneous measures of ongoing choice and timing behavior, and should therefore be useful for future research that explores the role of temporal processes in choice. However, the dissociation between choice and timing that we observed in condition 4, with initial link responding apparently controlled by the reinforcement delays from condition 2 while terminal link responding was controlled by the delays from condition 3, presents a problem for explanations of choice based on an underlying timing process.
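The two kinds of measure discussed above can be made concrete with a minimal Python sketch. It computes the median and interquartile range of response times on no-food trials (the timing measures) and a log initial link response ratio (the choice measure). Everything specific here is an assumption made for illustration: the function names `timing_summary` and `log_response_ratio`, the base-10 logarithm, the left/right direction of the ratio, and the numbers, which are hypothetical and are not data from the experiment.

```python
import math
import statistics


def timing_summary(response_times):
    """Return (median, interquartile range) of response times in seconds."""
    q1, q2, q3 = statistics.quantiles(response_times, n=4, method="inclusive")
    return q2, q3 - q1


def log_response_ratio(left_responses, right_responses):
    """Return the base-10 log of the left/right initial link response ratio."""
    return math.log10(left_responses / right_responses)


# Hypothetical times (s) of responses on no-food trials; not data from the study.
times_fi10 = [4, 6, 8, 10, 12, 14, 16]
times_fi20 = [8, 12, 16, 20, 24, 28, 32]

for label, times in (("FI 10 s", times_fi10), ("FI 20 s", times_fi20)):
    median, iqr = timing_summary(times)
    # Under Weber's law the ratio iqr / median is constant across schedules.
    print(label, "median:", median, "IQR:", iqr, "IQR/median:", iqr / median)

# Hypothetical initial link response counts (left key, right key).
print("log response ratio:", log_response_ratio(300, 100))
```

With the ratio defined this way, a positive value indicates more responding on the left initial link and a value of zero indicates indifference. A dissociation of the kind reported for condition 4 would appear as timing measures that track the current delays while the log ratio still reflects the earlier ones.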
Acknowledgements

This research was supported by NSF Grant IBN-9507584 to the University of New Hampshire.
References

Adams, C.D., Dickinson, A., 1981. Instrumental responding following reinforcer devaluation. Q. J. Exp. Psychol. 33B, 109–121.
Catania, A.C., 1970. Reinforcement schedules and psychophysical judgments: A study of some temporal properties of behavior. In: Schoenfeld, W.N. (Ed.), The Theory of Reinforcement Schedules. Appleton-Century-Crofts, New York, pp. 1–42.
Cheng, K., Roberts, W.A., 1991. Three psychophysical principles of timing in pigeons. Learn. Motiv. 22, 112–128.
Delamater, A.R., LoLordo, V.M., 1991. Event revaluation procedures and associative structures in Pavlovian conditioning. In: Dachowski, L., Flaherty, C.F. (Eds.), Current Topics in Animal Learning: Brain, Emotion, and Cognition. Lawrence Erlbaum, Hillsdale, NJ, pp. 55–94.
Dickinson, A., 1994. Instrumental conditioning. In: Mackintosh, N.J. (Ed.), Animal Learning and Cognition. Academic Press, San Diego, pp. 45–79.
Garner, W.R., Hake, H.W., Eriksen, C.W., 1956. Operationism and the concept of perception. Psychol. Rev. 63, 149–159.
Gibbon, J., 1977. Scalar expectancy theory and Weber’s law in animal timing. Psychol. Rev. 84, 279–325.
Gibbon, J., Church, R.M., Fairhurst, S., Kacelnik, A., 1988. Scalar expectancy theory and choice between delayed rewards. Psychol. Rev. 95, 102–114.
Grace, R.C., 1994. A contextual model of concurrent-chains choice. J. Exp. Anal. Behav. 61, 113–129.
Grace, R.C., 1996. Choice between fixed and variable delays to reinforcement in the adjusting-delay procedure and concurrent chains. J. Exp. Psychol.: Anim. Behav. Process. 22, 362–383.
Herrnstein, R.J., 1961. Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. 4, 267–272.
Killeen, P., 1982. Incentive theory: II. Models for choice. J. Exp. Anal. Behav. 38, 217–232.
Preston, R.A., 1994. Choice in the time-left procedure and in concurrent chains with a time-left terminal link. J. Exp. Anal. Behav. 61, 349–373.
Roberts, S., 1981. Isolation of an internal clock. J. Exp. Psychol.: Anim. Behav. Process. 7, 242–268.
Squires, N., Fantino, E., 1971. A model for choice in simple concurrent and concurrent-chains schedules. J. Exp. Anal. Behav. 15, 27–38.
Stubbs, D.A., Pliskoff, S.S., 1969. Concurrent responding with fixed relative rate of reinforcement. J. Exp. Anal. Behav. 12, 887–895.
Vaughan, Jr., W., 1985. Choice: A local analysis. J. Exp. Anal. Behav. 43, 383–405.
Williams, B.A., 1988. Reinforcement, choice, and response strength. In: Atkinson, R.C., Herrnstein, R.J., Lindzey, G., Luce, R.D. (Eds.), Stevens’ Handbook of Experimental Psychology, 2nd edn., Vol. 2, Learning and Cognition. Wiley, New York, pp. 167–244.
Williams, B.A., 1994. Reinforcement and choice. In: Mackintosh, N.J. (Ed.), Animal Learning and Cognition. Academic Press, San Diego, pp. 81–108.
Williams, B.A., Ploog, B.O., Bell, M.C., 1995. Stimulus devaluation and extinction of chain schedule performance. Anim. Learn. Behav. 23, 104–114.