LEARNING AND MOTIVATION 10, 364-381 (1979)

The Differential Reinforcement of Reward-Produced and Response-Produced Stimuli

STEVEN J. HAGGBLOOM

Arkansas State University

In two experiments, groups received successive large-reward trials on odd-numbered days and successive small-reward trials on even-numbered days in the same gray alleyway. This produced a discrimination problem in which the memory of large reward (SL) was reliably discriminative of large reward and the memory of small reward (Ss) was reliably discriminative of small reward. Intertrial interval (ITI) was varied both between and within groups. In the within-groups manipulation, the ITI separating S+ trials differed from that separating S- trials. Experimental groups learned the discrimination, running slower to Ss (S- cue) than to SL (S+ cue), and showed a negative contrast effect, running slower to S- than did a small-reward control group. Discrimination was somewhat faster at massed than at spaced trials. The within-groups manipulation of ITI suggested that the effects of ITI were mediated by time-dependent changes in internal cues produced by reward events and by the instrumental response. The control exercised by internal cues was shown to be associative rather than nonassociative, e.g., motivational. Similarities with, and implications for, conventional brightness differential conditioning were discussed.

Experiment 1 was submitted by the author to Purdue University in partial fulfillment of the requirements for the Ph.D. degree. The author would especially like to thank Professor E. J. Capaldi, who served as the committee chairperson, and committee members Elizabeth Capaldi, Dale Leonard, and Henry Roediger III for their many helpful comments and suggestions. Experiment 2 was conducted while the author was a Visiting Assistant Professor of Psychology at Purdue University, during which time Professor Capaldi generously supplied laboratory equipment, supplies, and space. Requests for reprints should be addressed to Steven J. Haggbloom, Psychology Department, Arkansas State University, P.O. Box 2127, State University, AR 72467.

0023-9690/79/030364-18$02.00/0 Copyright © 1979 by Academic Press, Inc. All rights of reproduction in any form reserved.

In differential instrumental conditioning, responses in the positive stimulus (S+) are associated with large reward and responses in the negative stimulus (S-) are associated with small reward or nonreward. The relevant discriminanda, e.g., brightness cues, such as a black versus a white runway, are presented on separate trials. Appropriate discriminative behavior occurs in the form of faster running in S+ than in S-. Discrimination learning experiments have traditionally emphasized the behavioral control exercised by external stimuli such as brightness cues,
circles versus triangles, etc. By contrast, the control exercised by internal stimuli, e.g., memories produced by reward outcomes (Capaldi, 1966, 1967) and by the instrumental response (Capaldi & Haggbloom, 1975), has been emphasized in partial and varied reinforcement experiments. Both discrimination learning and partial reinforcement involve intermittent or varied reinforcement, however, and the two situations appear to be highly interrelated (Capaldi, 1979; McHose & Blackwell, 1975). Thus, Capaldi, Berg, and Morris (1975) suggested that even though brightness cues may be relevant to solving a discrimination problem, reward-produced memories nevertheless gain substantial control over behavior. As in partial reinforcement, the control exercised by reward-produced cues in brightness differential conditioning is a function of trial sequence (Capaldi et al., 1975; Haggbloom, 1978) and intertrial interval (ITI) (Haggbloom, 1978). Internal stimulus control in differential conditioning may either facilitate or retard the development of appropriate discriminative responding to brightness cues. It thus constitutes an important key to understanding discrimination learning.

One approach to understanding the role played by internal stimuli in regulating discriminative behavior is to manipulate trial sequence and ITI in brightness differential conditioning as in the experiments cited above. An alternative approach, and that taken here, is to employ internal cues as the relevant discriminanda in a situation where external cues cannot be used to solve the discrimination problem. Single alternation experiments in which reward and nonreward occur on alternate trials represent one example of this approach. The particular task employed here, however, differs from single alternation and appears to have more in common with a conventional discrimination learning task such as brightness differential conditioning.
Because of this similarity, which will become evident as the task and data are described, the present situation may prove particularly useful not only for studying the discriminative stimulus properties of internal cues but also for untangling internal and external stimulus control of behavior in brightness differential conditioning.

The present experiments were also concerned with the effects of ITI on discrimination learning. In brightness differential conditioning, discrimination learning is impaired at a 24-hr ITI compared to massed trials (Haggbloom, in press), but manipulations of ITI within a day’s trials ordinarily have no effect on discriminative behavior, with one exception: when the trial sequence is arranged so that correct responding is simultaneously regulated by reward-produced cues and brightness cues, discrimination occurs more rapidly at massed trials than at spaced trials (Haggbloom, 1978). Single alternation learning, where reward-produced cues are the relevant discriminanda, occurs readily at massed trials but is more difficult to obtain as ITI increases (Jobe, Mellgren, Feinberg, Littlejohn, & Rigby, 1977).


The current tendency is to attribute ITI effects in instrumental learning to time-dependent changes in internal cues, e.g., part or all of a cue may dissipate over time, thereby decreasing its capacity to control behavior. Although time-dependent changes in the capacity of reward-produced cues to control behavior have been emphasized (Amsel, Wong, & Traupmann, 1971; Capaldi, 1972; Capaldi, Berg, & Sparling, 1971), internal stimuli produced by the instrumental response (SIR), which also change over time, may in many situations be at least as important, if not more so (Capaldi & Haggbloom, 1975). In the present experiments, ITI was manipulated in order to identify the control over discriminative behavior exercised by internal reward-produced and response-produced cues and to determine how the effects of ITI might be mediated by time-dependent changes in those cues.

In keeping with the approach taken by Capaldi and Haggbloom (1975), the ITI preceding a trial is said to determine the value of reward-produced and response-produced cues present on that trial. Small reward produces the internal cue Ss and large reward the internal cue SL. The internal cue SIR is produced by the instrumental response. Both classes of internal cues, reward-produced and response-produced, remain viable sources of stimulus control for as long as 30 min after being occasioned and thus have been described as memories, a description that does not preclude the possibility that they consist partly of fading traces or otherwise are modified over time (Capaldi & Haggbloom, 1975). Also, the response-produced cues of interest here, as in Capaldi and Haggbloom (1975), are traces or memories of responses previously executed rather than feedback stimuli from a response currently being executed.¹ It is important to note that SIR must occur as part of a stimulus compound also containing either Ss or SL.
For example, if small reward occurs on Trial T and the ITI is 2 min, the 2-min values of both SIR and Ss would be present on Trial T + 1. If large reward occurs on Trial T and the ITI is 2 min, the 2-min values of both SL and SIR would be present on Trial T + 1. At a 30-min ITI the stimulus compounds would consist of the 30-min values of SL and SIR or of Ss and SIR. The stimulus SIR is thus seen to be common to trials separated by a given ITI.

EXPERIMENT 1

In the discrimination situation used here all trials occurred in the same gray alleyway and the Ss or SL occasioned on the immediately preceding trial served as the S+ or S- discriminandum. Each of four differential conditioning groups received four large-reward trials on all odd-numbered days and four small-reward trials on all even-numbered days of training. Discriminative behavior was measured over Trials 2, 3, and 4 of each day, the discriminative cue for each trial having been provided by the reward outcome on the immediately preceding trial. The first trial of each day will be discussed later in this report. For the moment, view Trial 1 with respect to a singular function, that of providing the discriminative cue for Trial 2.

¹ It is of little consequence to the present analysis whether the stimulus SIR is produced by the instrumental response or instead derives partly or even completely from some other source. In any case, there appear to be cues other than those produced by goal events that are common to trials separated by a given ITI. To identify them as response-produced cues is partly a matter of expediency, but it is also in keeping with a prior analysis (Capaldi & Haggbloom, 1975).

According to the sequential view (Capaldi, 1966, 1967), the memory SL produced on Trials 1, 2, and 3 of odd-numbered days would be reinstated, respectively, on Trials 2, 3, and 4 of those days. Since large reward occurred on all four trials of odd-numbered days, responding in the presence of SL was always associated with large reward. Thus, SL would be expected to acquire properties of an S+ cue in much the same way as, say, a black alley always associated with large reward. The last three trials of even-numbered days were defined as S- trials. Those trials occurred in the presence of Ss, which served as the S- cue since it was always associated with small reward.

In Phase 1, all daily trials for two differential conditioning groups were separated by a 2- or a 30-min ITI. In the 2-min ITI condition, the 2-min value of SIR was common to S+ and S- trials. In the 30-min ITI condition, the 30-min value of SIR was common to S+ and S- trials. In two additional differential conditioning groups the ITI separating S+ trials differed from that separating S- trials. In one group all S+ trials were separated by a 30-min ITI, whereas all S- trials were separated by a 2-min ITI. Those conditions were reversed in another group; that is, the ITI separating S+ trials was 2 min and the ITI separating S- trials was 30 min.
If, as indicated by Capaldi and Haggbloom (1975), the 2- and 30-min values of SIR are discriminably different, the distinctiveness of the stimulus compounds present on S+ and S- trials would be increased, and hence discrimination learning facilitated, in groups given different ITIs between successive S+ trials versus successive S- trials. That is, SIR would function as an additional discriminative stimulus in the latter two groups where, unlike the former two groups, S+ and S- trials would not be accompanied by a common value of SIR.

Two control groups in Experiment 1 received small reward on all trials and either a 2- or a 30-min ITI. Let the letters “D” and “SC,” respectively, denote differential conditioning and small-reward control groups. For the differential conditioning groups, S+ ITI is denoted by the appropriate number followed by “+” and S- ITI by the appropriate number followed by “-.” The six groups in Experiment 1, then, were D:2+2-, D:30+30-, D:2+30-, D:30+2-, SC:2, and SC:30.

Following the initial stage of training (Phase 1), which lasted 30 days,
there were two ITI-shift phases in which the S- ITI was changed while the S+ ITI remained the same. ITI shifts in other situations have been found to be useful in identifying internal stimulus control of behavior (e.g., Amsel et al., 1971; Capaldi et al., 1971; Capaldi & Haggbloom, 1975). In Phase 2, Groups D:2+2- and D:30+2- received a 30-min ITI separating S- trials; S+ ITI was not changed. There was no ITI shift in Phase 2 for the two differential conditioning groups currently receiving a 30-min ITI between S- trials, Groups D:2+30- and D:30+30-. In Phase 3, all groups were shifted to massed S- trials (2-min ITI). Thus, Groups D:2+2- and D:30+2- were returned to their preshift ITI whereas Groups D:2+30- and D:30+30- experienced a 2-min ITI on S- trials for the first time.

Method

Subjects. The subjects were 60 male rats, approximately 90 days old upon arrival at the laboratory, obtained from the Holtzman Company, Madison, Wisconsin. The rats were housed individually and had access to water at all times.

Apparatus. The apparatus was a runway 208.4 cm long by 22.9 cm high and 10.2 cm wide. It had a hinged lid of 1.3-cm hardware cloth and was painted flat gray throughout. The initial 20.3 cm and final 30.5 cm of the runway constituted start and goal boxes, respectively. These were separated from the rest of the alley by manually operated guillotine doors. Three clocks (0.01 sec) recorded start, run, and goal times over respective distances of 5.1 cm from the start box door, the next 132.1 cm, and finally over a 39.4-cm section. Opening the start box door activated the first clock, whereas photoelectric circuitry controlled its offset and the operation of the remaining two clocks. A goal cup constructed by boring a circular hole with a 6-cm diameter in a block of wood was positioned against the end wall of the goal box.

General procedures. Two weeks before the beginning of the experiment the rats were placed on a deprivation schedule consisting of 12 g of rat chow per day, which continued throughout the experiment. On each of 3 days preceding the first day of runway training the rats were handled in squads of three for 3 min per squad and were given ten 45-mg Noyes pellets in their home cage. The number of pellets eaten either in the home cage or in the apparatus was always subtracted from the 12-g daily ration. Ten rats were randomly assigned to each of six groups. The trial administration procedures were such that rats given a short ITI received their training trials during the ITI for spaced groups. All trials were initiated by opening the start box door, regardless of the rat’s orientation, approximately 3 sec after placing the rat in the start box. On each trial the rat was removed from the goal box after the times had been recorded
(after approximately 10 sec) unless the reward had not yet been consumed. A maximum criterion time of 30 sec was allowed in each section of the alley. If an animal exceeded the criterion time in any one section, the additional time was subtracted from the criterion time allowed in the next section forward and added to the latency score of that section. When an animal refused to approach the goal cup within the criterion time, it was placed in the goal box and confined until the reward had been consumed. During experimental training each rat received four trials per day. The differential conditioning groups received ten 45-mg Noyes pellets (large reward) on all four trials of odd-numbered days and one pellet (small reward) on all four trials of even-numbered days. Two small-reward control groups, Groups SC:2 and SC:30, received one pellet on all trials of both odd- and even-numbered days.

Phase 1. Phase 1 consisted of the first 30 days of training or 15 cycles of S+ and S- trials. (A cycle consists of a single odd-numbered S+ day and the immediately following even-numbered S- day.) In Phase 1, two differential conditioning groups received a 2-min ITI on S+ days and either a 2-min ITI (Group D:2+2-) or a 30-min ITI (Group D:2+30-) on S- days. The remaining two differential conditioning groups received a 30-min ITI on S+ days and either a 2-min ITI (Group D:30+2-) or a 30-min ITI (Group D:30+30-) on S- days.

Phase 2. Phase 2 followed Phase 1 immediately and consisted of the next 6 days of training or three complete cycles of S+ and S- trials. In Phase 2, Groups D:2+30-, D:30+30-, SC:2, and SC:30 were treated as in Phase 1. Groups D:2+2- and D:30+2- experienced a change in the ITI separating S- trials from the 2-min ITI received in Phase 1 to a 30-min ITI. On S+ days the procedures for all groups were identical to those in Phase 1.

Phase 3. Phase 3 began immediately after Phase 2 and, like Phase 2, consisted of 6 days or three cycles of S+ and S- trials.
In Phase 3 Group SC:2 continued to receive a 2-min ITI and Groups D:2+2- and D:30+2- were shifted back to a 2-min ITI on S- days. Groups D:2+30- and D:30+30- experienced a shift to a 2-min ITI on S- days during Phase 3. Group SC:30 was not run in Phase 3. As in Phase 2 there was no change in the running procedures on S+ days except for the omission of Group SC:30.

Results

The last three trials of each day were defined as discrimination trials. Behavior on those trials was of major interest here and is described immediately below. Behavior on the first trial of the day will be considered separately. All analyses were performed on time scores transformed into speeds (m/sec). Total speeds were obtained by a transformation of the sum of the time scores over the three alley segments. Only total speeds are reported for Experiment 1, but some important behavioral differences across alley sections are noted. All statistics cited as significant or reliable had p < 0.05. One rat each from Groups SC:30 and D:30+2- died during the experiment and their data were discarded.

FIG. 1. Mean difference between speeds in S+ and speeds in S- over the total alley for the four differential conditioning groups over the 15 cycles of Phase 1 in Experiment 1. (Horizontal axis: cycles of S+ and S- days.)

Phase 1. Figure 1 shows the mean difference between speeds over successive cycles of S+ and S- days in Phase 1 for each of the four differential conditioning groups. Figure 2 shows the mean speeds collapsed across daily S- trials for the two differential conditioning groups given massed S- trials and the massed trial control group in the left panel and speeds for the two differential conditioning groups given spaced S- trials and the spaced trial control group in the right panel.
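As a rough illustration of the speed scores analyzed in this section, the sketch below converts the three clock times into section speeds and a total speed, and forms an S+ minus S- difference score of the kind plotted in Fig. 1. The report does not give the formula, so the sketch assumes that speed is simply timed distance divided by elapsed time (presumably in m/sec), using the 5.1-, 132.1-, and 39.4-cm timed distances from the Apparatus description. All function and variable names are hypothetical.

```python
# Timed distances for the three clocks, converted to metres
# (5.1 cm, 132.1 cm, 39.4 cm, taken from the Apparatus description).
SECTION_LENGTHS_M = {"start": 0.051, "run": 1.321, "goal": 0.394}


def section_speed(section: str, time_sec: float) -> float:
    """Speed (m/sec) in one alley section; ASSUMES speed = distance / time."""
    if time_sec <= 0:
        raise ValueError("time must be positive")
    return SECTION_LENGTHS_M[section] / time_sec


def total_speed(start_sec: float, run_sec: float, goal_sec: float) -> float:
    """Total speed (m/sec): summed timed distance over the summed section times."""
    total_time = start_sec + run_sec + goal_sec
    if total_time <= 0:
        raise ValueError("summed time must be positive")
    return sum(SECTION_LENGTHS_M.values()) / total_time


def discrimination_score(s_plus_speed: float, s_minus_speed: float) -> float:
    """Difference score: S+ speed minus S- speed (positive = slower in S-)."""
    return s_plus_speed - s_minus_speed
```

Under these assumptions, a rat that takes 2.0 sec in total to cover the 1.766 m of timed alley has a total speed of about 0.88 m/sec, which falls within the range of speeds shown on the vertical axis of Fig. 2.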

FIG. 2. Mean total speed in S- for the two differential conditioning groups given massed S- trials and the massed trial control group (left panel), and for the two differential conditioning groups given spaced S- trials and the spaced trial control group (right panel) in Phase 1 of Experiment 1. (Horizontal axis: S- days.)


As can be seen in Fig. 1, each of the differential conditioning groups came to run slower in S- than in S+. Discrimination developed first in Group D:30+2-, next in Groups D:2+2- and D:2+30-, and last in Group D:30+30-. This pattern was observed in each section of the alley. Within each group, discrimination learning was most rapid in the run and goal sections of the alley and slowest in the start section, occurring an average of four cycles later in the start than in the run and goal sections. Group differences in discriminative performance were greatly reduced by the end of training.

As can be seen in Fig. 2, each of the differential conditioning groups eventually ran slower than their respective small-reward control groups in S-, i.e., a negative contrast effect (NCE) developed on S- trials. At massed S- trials (left panel), NCE was facilitated by different S+ and S- ITIs, being apparent almost from the outset of training in Group D:30+2- whereas it developed more slowly in Group D:2+2-. By the end of training, the magnitude of NCE was nearly the same in both groups. At spaced S- trials, on the other hand, both the development and magnitude of NCE were similar for the two differential conditioning groups; there was only a very slight tendency for NCE to have been promoted by different S+ and S- ITIs.

The above observations regarding the development of discriminative behavior were supported by a 6 (groups) x 2 (discriminanda) x 3 (trials) analysis of variance applied separately to successive cycles of S+ and S- days. Observations concerning asymptotic behavior were evaluated by a similar analysis of variance applied to speeds over Cycles 13-15 and including days as a factor. The analyses showed that beginning with Cycle 3 and on each cycle thereafter, there was a significant Groups x Discriminanda interaction.
Partitioning those interactions into simple effects of discriminanda for each group showed that discrimination developed first in Group D:30+2-, being reliable by Cycle 3 [F(1, 52) = 11.99]. Discrimination developed last in Group D:30+30-, not being reliable until Cycle 12 [F(1, 52) = 8.94]. Reliable discrimination occurred by Cycle 6 in Group D:2+2- [F(1, 52) = 6.59] and by Cycle 5 in Group D:2+30- [F(1, 52) = 14.80]. Each of the differential conditioning groups continued to show reliable discrimination on each cycle following the first instance of significant differential responding. There was no evidence of discriminative behavior in either of the control groups. Newman-Keuls tests showed that NCE was reliable by Cycle 3 in Group D:30+2- and by Cycle 10 in the remaining three groups.

Except for trials, the only significant sources of variance in the ANOVA over Cycles 13-15 were a main effect of discriminanda [F(1, 104) = 85.09] and a Groups x Discriminanda interaction [F(5, 104) = 8.69]. The Groups x Discriminanda interaction was partitioned into simple effects of groups at S+ and S- and simple effects of discriminanda for
each group. The simple effects of groups at S+ and S- required a pooled error term. The df were calculated using the Satterthwaite method recommended by Winer (1971), which gave df = 5, 85. For the simple effect of discriminanda at each group all df = 1, 52. The simple effects of discriminanda for each group showed that S+ speeds were reliably faster than S- speeds in each of the four differential conditioning groups (smallest F = 31.26 for Group D:30+30-). The simple effect of groups within S+ did not reveal any differences among groups (F < 1) whereas there were substantial differences among groups in S- (F = 5.84). Group differences in S- were further evaluated using the Newman-Keuls procedure. Those tests supported what can be seen in Fig. 2, viz., late in training the differential conditioning groups all ran slower in S- than their respective small-reward control groups. There were no other differences among groups in S- at this late stage of training. None of the differential conditioning groups showed an NCE in the start section of the alley. In the run section, NCE was reliable only for the two groups given massed S- trials.

There was a decrement in speeds over successive S- trials in the differential conditioning groups in Experiment 1 (data not shown). This decrement occurred at both massed and spaced trials and followed the same developmental course as shown for discrimination in Fig. 1. That is, the decrement occurred first in Group D:30+2-, last in Group D:30+30-, and at an intermediate level of training in Groups D:2+2- and D:2+30-. There were no reliable differences among groups on the first trial of each day.

FIG. 3. Mean total speed on each S- day of Phase 2 for the four differential conditioning groups given spaced S- trials and the spaced trials control group (left panel), and for the four differential conditioning groups during Phase 3 when they received massed S- trials along with the massed trial control group (right panel).

Phase 2. The left panel of Fig. 3 shows the mean speed in S- for each of the four differential conditioning groups and Group SC:30 over the S- days of Phase 2 during which all groups (except SC:2) received spaced S- trials. During Phase 2, the two nonshifted groups continued to run faster in S+ than in S- and, as can be seen in Fig. 3, continued to show a sizable NCE. In groups shifted to spaced S- trials, on the other hand, S- speed increased markedly and discrimination and NCE were reduced or eliminated.

These observations were supported by a Groups x Discriminanda x Trials x Days analysis of variance applied to speeds in Phase 2 for all groups but Group SC:2. There was a reliable Groups x Discriminanda interaction in Phase 2 [F(4, 43) = 7.27], which was partitioned into simple effects of discriminanda for each group and simple effects of groups in S+ and S-. The simple effects of groups in S+ and S- required a pooled error term and adjusted df using the Satterthwaite method (Winer, 1971). The simple effects of discriminanda showed that S+ speeds were reliably faster than S- speeds in the two nonshifted groups [F(1, 43) = 46.14 for Group D:2+30- and F(1, 43) = 43.89 for Group D:30+30-]. After the shift, Group D:2+2- continued to run slower in S- than in S+ [F(1, 43) = 17.57] but the ITI shift eliminated discriminative behavior in Group D:30+2- [F(1, 43) = 2.18]. There were no differences between S+ and S- speeds in Group SC:30 [F(1, 43) = 2.58], and the ITI shift had no apparent effect on S+ speeds (data not shown), as the simple effect of groups at S+ was not reliable (F < 1). The simple effect of groups at S- was highly reliable [F(4, 72) = 5.39]. Newman-Keuls tests showed that NCE was reliable in both nonshifted groups, but that NCE was not reliable in either shifted group. Indeed, Group D:30+2- ran reliably faster in S- than did the two nonshifted groups. No other differences among groups in S- were reliable. Thus, a shift from massed to spaced S- trials disrupted discriminative behavior and eliminated NCE, and these effects were more pronounced when S+ trials were spaced than when S+ trials were massed.

Phase 3. The right panel of Fig.
3 shows mean speeds in S- in Phase 3 for each of the four differential conditioning groups and Group SC:2 over massed S- days. The figure reveals that the shift to massed S- trials further depressed speeds in Group D:30+30-. On the other hand, the shift to massed S- trials in Group D:2+30-, for which a 2-min ITI had previously been associated with S+ trials, produced a marked increase in S- speeds, disrupting discriminative behavior and eliminating NCE.

These observations were supported by an analysis of variance identical to that used to evaluate Phase 2 behavior. Again, the finding of importance was a highly reliable Groups x Discriminanda interaction [F(4, 44) = 39.45]. Simple effects of discriminanda for each of the four differential conditioning groups showed reliable discrimination in every case [smallest F(1, 44) = 39.47 for Group D:2+30-]. S+ speeds did not differ from S- speeds in Group SC:2 (F < 1), and the simple effect of groups in S+ was not reliable [F(4, 60) = 1.37]. In S-, however, there was a highly significant groups effect [F(4, 60) = 19.34], which was further evaluated
by Newman-Keuls tests. These tests showed that NCE was reliable in each of the differential conditioning groups except Group D:2+30-. They further showed that Group D:30+30- ran reliably slower in S- than Groups D:30+2- and D:2+2-, which in turn ran slower than Group D:2+30-.

Discussion

The results of Experiment 1 indicate that reward-produced and response-produced cues can very effectively serve as discriminative stimuli. Each of the differential conditioning groups learned to run slower on trials signaled by Ss, the S- cue, than on trials signaled by SL, the S+ cue, and discrimination learning was facilitated when response-produced cues, SIR, also functioned as discriminative stimuli. The occurrence of discrimination even at a 30-min ITI is consistent with the view (Capaldi & Haggbloom, 1975) that reward-produced and response-produced cues remain viable sources of stimulus control for relatively long time intervals.

The effects of ITI in Experiment 1 appeared to be largely, if not entirely, mediated by ITI-dependent changes in internal cues. Previous experiments have indicated that the rat’s aversive emotional reaction to a smaller than expected reward occasions an internal stimulus state identified as frustration. Stimuli produced by frustration appear to dissipate rapidly, being most intense, and thus having the greatest capacity to control behavior, immediately after the frustrative event and exercising increasingly less behavioral control as ITI increases (Amsel et al., 1971; Capaldi et al., 1971). Frustration is aversive and elicits goal avoidance. Thus, if frustration constitutes a substantial portion of the 2-min value of Ss, discrimination should be facilitated by massed S- trials relative to spaced S- trials. It was also suggested here that SIR contributes to both the S+ and S- stimulus compounds and that those compounds would be made less similar, and discrimination facilitated, if S+ and S- trials occurred following different ITIs, rather than the same ITI.

The results of Phase 1 were consistent with the propositions just described. Both discrimination learning and NCE appeared more rapidly when all trials within a day were separated by a 2-min ITI (Group D:2+2-) than when the ITI was 30 min (Group D:30+30-).
Discrimination, but not NCE, was also facilitated by different S+ and S- ITIs in Group D:2+30-. The combination of massed S- trials and different S+ and S- ITIs in Group D:30+2- promoted extremely rapid learning. The present results indicate that the 2- and 30-min values of both Ss and SIR were discriminably different and contributed to the stimulus compounds regulating behavior on S+ and S- trials. Indeed, results such as these could not have occurred otherwise. Just how strongly behavior was
controlled by ITI-dependent values of Ss and SIR was reflected by behavior in Phases 2 and 3. In Phase 2, the shift to spaced S- trials in Group D:30+2- involved the removal of an inhibitory component of Ss, frustration, from the stimulus complex governing behavior on S- trials and the introduction into that stimulus complex of a value of SIR previously associated only with S+. The combined effect was the elimination of both discriminative behavior and NCE in Group D:30+2- in Phase 2. Since the shift from massed to spaced S- trials in Group D:2+2- involved only the removal of the frustrative component of Ss, and not the introduction of a positive stimulus as well, the shift had a less disruptive effect on behavior, eliminating NCE but reducing rather than eliminating discrimination.

In Phase 3, the shift to massed S- trials in Group D:30+30- newly introduced both frustration and unconditioned values of SIR into the stimulus compound controlling S- behavior. This produced a marked decrement in S- speeds. The same shift to massed S- trials in Group D:2+30-, however, parallels to some degree the shift to spaced S- trials in Group D:30+2- in Phase 2. Here a value of SIR was introduced on S- trials that had previously been associated only with S+, and both discrimination and NCE were eliminated. It is noteworthy that the introduction of the unconditioned, frustrative component of Ss, which also would have attended the shift to massed S- trials, did not result in slow running in S- in Group D:2+30-. Apparently the control exercised by frustration, which may be largely motivational, was counteracted by the associative control exercised by SIR.

Differential conditioning groups in the present experiment showed a decrement in speeds over successive S- trials within a day. This decrement may be unique to situations in which behavior is under the control of reward-produced cues.
It has been reported in brightness differential conditioning experiments where the sequence of S+ and S- trials was arranged so that both brightness and reward-produced cues were relevant discriminative stimuli (Haggbloom, 1978). Whether this decrement over trials reflects an increasing availability of the reward-produced discriminative cue over successive S- trials, an accumulation of aversive properties of the cue, changes in inhibition over trials, or some other factor is not at this point clear.

Finally, on the first trial of each day performance was independent of reward outcome on that trial. This result is not surprising. Whatever discriminative cues may be present on Trial 1 would have relatively few opportunities to become differentially associated with large and small reward. Moreover, the stimulus complex present on Trial 1 of each day may be highly similar across days, i.e., there may be a paucity of discriminative cues available on Trial 1. Capaldi and Morris (1974) have suggested, for example, that the first trial of the day is a unique and salient event for the rat. If cues uniquely associated with Trial 1 exercise substantial control over behavior, that control may well be at the expense of discriminative cues (Haggbloom, in press).

EXPERIMENT 2

The purpose of Experiment 2 was to test the hypothesis that discriminative responding in Experiment 1 was due to the associative control exercised over behavior by cues such as Ss and SR. This experiment is necessary because of the possibility that discriminative behavior in Experiment 1 was due to differential motivational consequences of responding following large and small reward. Indeed, decreasing motivation over successive S- trials, or an increase in an aversive motivational state such as frustration, could account for the decrement in speeds over S- trials in Experiment 1 as well as for discrimination itself. The very large advantage to discrimination provided by spaced S+ trials combined with massed S- trials (Group D:30+2-) might be due to some process like Hull’s (1943) reactive inhibition affecting behavior at massed S- trials but not at spaced S+ trials, rather than to associative control by SR. In Experiment 2 the importance of associative control over discriminative behavior by Ss and SR was identified by systematically eliminating the discriminative stimulus properties of both Ss and SR in one group and eliminating either Ss or SR in two other groups. Stated differently, these cues were rendered irrelevant by causing them to occur with both the S+ and S- cue compounds.

As in the preceding experiment, groups received large-reward and small-reward trials on alternate days. Group D:30+2- and Group D:2+2- were treated essentially like the groups so designated in Experiment 1. In Group D:30+2- the 2-min ITI values of both Ss and SR would be relevant discriminanda. In Group D:2+2-, on the other hand, only the 2-min ITI value of Ss is a reliable predictor of small reward, the 2-min ITI value of SR being associated with both S+ and S-. With the 2-min ITI value of SR acting as an additional discriminative cue in Group D:30+2-, discrimination should occur more rapidly in that group than in Group D:2+2-, as was the case in Experiment 1.
The two differential conditioning groups unique to Experiment 2, and critical for the hypothesis being tested, also received a 2-min ITI between all daily S- trials. On S+ days Group LL received a 30-min ITI between all daily trials with one exception, that being a 2-min ITI between Trials 2 and 3 on some days or between Trials 3 and 4 on other days. In this way there was one occasion on each S+ day when the 2-min ITI value of SR was associated with large reward. SR would thus be irrelevant in Group LL as in Group D:2+2- and those groups would be expected to learn the discrimination at a similar rate. In Group SL small reward occurred on one trial on each S+ day followed at a 2-min ITI by a large-reward trial. In this way the 2-min ITI values of both Ss and SR would be irrelevant cues in Group SL since they would be associated with both large and small reward. Consequently, discrimination should be greatly impaired in Group SL.

Method

Subjects. The subjects were 32 male rats, 150 days old at the start of the experiment, obtained from the Holtzman Company.

Apparatus. The apparatus was the same as in Experiment 1.

Procedure. The procedures preceding the beginning of training were essentially the same as in Experiment 1. General runway procedures were also the same including the alternation of large-reward (odd-numbered) and small-reward (even-numbered) days. However, in Experiment 2, all rats received five trials per day. On the first day of training eight rats were randomly assigned to each of four groups. All rats received a 2-min ITI between all trials on small-reward (even-numbered) days. On large-reward (odd-numbered) days, Group D:30+2- received a 30-min ITI between all trials and Group D:2+2- received a 2-min ITI between all trials. Group LL received a 30-min ITI between all trials on large-reward days except between Trials 2 and 3 (Days 1, 5, 9, 13, and 17) or Trials 3 and 4 (Days 3, 7, 11, 15, and 19), when the ITI was 2 min. Group SL was treated exactly like Group LL except that small reward occurred on the trial preceding the 2-min ITI. As in Experiment 1, trials separated by a short ITI were run during the ITI for spaced groups. There were 20 days of training in Experiment 2.
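The daily schedule described in the Procedure can be restated compactly. The following Python sketch is only an illustration of the design as stated above, not part of the original report: the function name `day_schedule`, the coding of reward as "L" or "S", and the representation of ITIs as a list of minutes between adjacent trials are all assumptions introduced here.

```python
from typing import List, Tuple

N_TRIALS = 5          # five trials per day in Experiment 2
SHORT, LONG = 2, 30   # ITI values in minutes
GROUPS = ("D:30+2-", "D:2+2-", "LL", "SL")


def day_schedule(group: str, day: int) -> Tuple[List[str], List[int]]:
    """Return (rewards, itis) for one training day (hypothetical helper).

    rewards[k] is the reward on Trial k+1 ("L" = large, "S" = small).
    itis[k] is the interval, in minutes, between Trial k+1 and Trial k+2.
    """
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group}")

    # Even-numbered days: small reward on all trials, 2-min ITI, all groups.
    if day % 2 == 0:
        return ["S"] * N_TRIALS, [SHORT] * (N_TRIALS - 1)

    # Odd-numbered days: large-reward (S+) days.
    rewards = ["L"] * N_TRIALS
    if group == "D:2+2-":
        return rewards, [SHORT] * (N_TRIALS - 1)

    itis = [LONG] * (N_TRIALS - 1)
    if group == "D:30+2-":
        return rewards, itis

    # Groups LL and SL: one 2-min ITI per S+ day, between Trials 2 and 3
    # (Days 1, 5, 9, 13, 17) or between Trials 3 and 4 (Days 3, 7, 11, 15, 19).
    first_trial = 2 if day % 4 == 1 else 3
    itis[first_trial - 1] = SHORT
    if group == "SL":
        # Small reward on the trial that precedes the 2-min ITI.
        rewards[first_trial - 1] = "S"
    return rewards, itis
```

Under these assumptions, `day_schedule("SL", 1)` gives rewards `["L", "S", "L", "L", "L"]` with ITIs `[30, 2, 30, 30]`, which makes explicit that in Group SL a small-reward trial is followed at the 2-min ITI by a large-reward trial.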

Results

Of principal interest in Experiment 2 was behavior over the last four trials of each day. As in Experiment 1, all analyses were performed on time scores transformed to speeds (m/sec). Only total speeds are reported for Experiment 2. Figure 4 shows the mean difference between S+ and S- speeds for each of the four groups. As can be seen, discrimination developed first in Group D:30+2-, next in Groups D:2+2- and LL, and little or no discrimination occurred in Group SL. These observations were supported by separate analyses of variance, with groups as a between-subjects factor and discriminanda and trials as within-subjects factors, applied to successive cycles of S+ and S- days. Beginning on Cycle 3 and on each subsequent cycle there was a reliable Groups x Discriminanda interaction. Simple effects of discriminanda at each group showed that Group D:30+2- ran reliably faster in S+ than in S- by Cycle 4 [F(1, 27) = 18.56]; Groups D:2+2- and LL ran faster in S+ than in S- by Cycle 6 [F(1, 27) = 36.92 and F(1, 27) = 39.71, respectively]; and Group SL did not learn to respond faster in S+ than in S- [largest F(1, 27) = 2.98 for Cycle 9]. Additional analyses revealed that there were no differences among groups in S+ over the last six cycles (F < 1). Newman-Keuls tests showed that within S- Group SL ran faster than the remaining groups and there were no other group differences in S-. As in Experiment 1, discrimination was worse in start than in later alley sections. Also, there was a decrement in speeds over successive S- trials, as in Experiment 1, in all groups except Group SL.

FIG. 4. Mean difference between speeds in S+ and S- over the total alley for each of the groups in Experiment 2.

Discussion

The results of Experiment 2 clearly indicate the importance of Ss and SR as distinctive sources of discriminative stimuli. More importantly, the present results support a conditioning interpretation of discrimination under conditions employed in these experiments. Since all groups were treated alike on S- days, any motivational or other nonconditioning behavioral effects of Ss and SR that might have influenced S- behavior would have been the same for each group. The occurrence of SR (Groups D:2+2- and LL) or both SR and Ss (Group SL) on S+ trials, which would have conditioned approach tendencies in their presence, markedly interfered with discrimination learning relative to Group D:30+2-. Furthermore, discrimination was completely precluded, within the number of trials used here, in Group SL where both sources of discriminative stimuli, Ss and SR, were rendered irrelevant by their presence in both S+ and S-.

GENERAL DISCUSSION

Groups given days on which large reward occurred on all trials alternating with days on which small reward occurred on all trials eventually came to discriminate the late trials of small-reward days from the late trials of large-reward days. This situation was identified here as a differential conditioning problem wherein the discriminative stimuli are memories of large reward, SL, and small reward, Ss. Because SL is always followed by large reward and Ss is always followed by small reward (within a day), SL takes on properties of an S+ cue eliciting faster running than Ss, which takes on properties of an S- cue. Experiment 1 demonstrated that SL and Ss are viable sources of stimulus control in differential conditioning at relatively spaced trials as well as at massed trials, although discrimination occurs more rapidly at massed trials. Experiment 1 also suggested that the effects of ITI were mediated by time-dependent changes in the internal cues contributing to the discriminative stimulus complex and that some stimulus common to large- and small-reward days, e.g., response-produced cues or P, gains control over behavior. Experiment 2 demonstrated that the control exercised by reward-produced and response-produced cues was due to associative processes rather than to nonassociative processes such as motivation.

There were both similarities and differences between behavior in the present task and in conventional differential conditioning. Most notable among the similarities is the occurrence of a persistent NCE in both situations, suggesting that here, as is the case in brightness differential conditioning, the S- cue becomes inhibitory. This similarity is of some significance in establishing the present situation as one involving differential conditioning since NCE with successive shifts in reward magnitude is usually transient and repeated shifts reduce or eliminate NCE (Capaldi & Lynch, 1967; Capaldi, 1972).

In the present differential conditioning situation, discrimination and NCE appeared earlier in training and were larger in later alley sections than in the start section. In brightness differential conditioning, the reverse is generally true.
Ludvigson and Gay (1967, Experiment 2) showed that in the conventional case, discrimination and NCE are largest at the point in the alley where the rat receives the discriminative cue. Rats beginning each trial from a neutral gray start box leading to either a white or a black alley receive the discriminative cue upon entering the alley. Ludvigson and Gay (1967) reported better discrimination and larger NCE in the start section of the alley than in other alley sections when the start box and alley were not the same color. Discrimination and NCE in start were greatly reduced when the start box and alley were the same color so that the discriminative cue was received prior to entering the alley. A measure of latency to orient to the start box door revealed discrimination and NCE in the start box in the condition where the start box and alley were the same color.

Although there is a behavioral difference between the present task and brightness differential conditioning with respect to the location in the alley of best discrimination and NCE, there may nevertheless be a functional similarity. In the present situation the discriminative cues were memories of reward outcomes which, according to the sequential view (Capaldi, 1967), are stored in the context of cues associated with late alley segments. The alley cues are assumed to function as memory retrieval cues and, when present, to facilitate the retrieval of reward-produced memories stored in their context. Thus, reward-produced memories are more likely to be retrieved in late alley segments than in start. In other words, the location in the alley of best discrimination and largest NCE in the present task would also appear to be a function of the availability of discriminative cues.

In both experiments reported here, differential conditioning groups showed a decrement in responding over successive S- trials. This decrement has not been observed in the vast majority of differential conditioning experiments. However, in conventional differential conditioning S+ and S- trials are usually intermixed and there are rarely more than two consecutive S- trials in a schedule. Moreover, it has not been common practice to report trial-by-trial data in the conventional case. In two recent brightness differential conditioning experiments (Haggbloom, 1978), groups showed a decrement over successive S- trials similar to that obtained here under conditions where no S+ trials followed S- trials within a day (Experiment 1) or when massed S- trials were followed by an S+ trial only after a relatively long ITI (Experiment 2). When S+ trials followed S- trials at the same ITI separating successive S- trials, as in most conventional differential conditioning experiments, the decrement over successive S- trials was greatly reduced or did not occur at all (Haggbloom, 1978). Thus declining speeds over successive S- trials are not unique to the present task, although they may be unique to discriminations under internal stimulus control. Moreover, in brightness differential conditioning, as here, the decline in speeds over successive S- trials is greater and occurs earlier in training at massed trials than at spaced trials.
In both situations the effects of ITI are apparently mediated by ITI-dependent changes in internal stimulus control.

REFERENCES

Amsel, A., Wong, P. T. P., & Traupmann, K. L. Short-term and long-term factors in extinction and durable persistence. Journal of Experimental Psychology, 1971, 90, 90-95.
Capaldi, E. J. Partial reinforcement: A hypothesis of sequential effects. Psychological Review, 1966, 73, 459-477.
Capaldi, E. J. A sequential hypothesis of instrumental learning. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in theory and research. New York: Academic Press, 1967. Vol. 1.
Capaldi, E. J. Successive negative contrast effect: Intertrial interval, type of shift and four sources of generalization decrement. Journal of Experimental Psychology, 1972, 96, 433-438.
Capaldi, E. J. Latent discrimination learning under a regular schedule of partial reinforcement. Animal Learning and Behavior, 1979, 7, 63-68.
Capaldi, E. J., Berg, R. F., & Morris, M. D. Stimulus control of responding in the early trials of differential conditioning. Learning and Motivation, 1975, 6, 217-229.
Capaldi, E. J., Berg, R. F., & Sparling, D. L. Trial spacing and emotionality in the rat. Journal of Comparative and Physiological Psychology, 1971, 76, 290-299.
Capaldi, E. J., & Haggbloom, S. J. Response events as well as goal events as sources of animal memory. Animal Learning and Behavior, 1975, 3, 1-10.
Capaldi, E. J., & Lynch, D. Repeated shifts in reward magnitude: Evidence in favor of an associational and absolute (noncontextual) interpretation. Journal of Experimental Psychology, 1967, 75, 226-235.
Capaldi, E. J., & Morris, M. D. Reward schedule effects in extinction: Intertrial interval, memory and memory retrieval. Learning and Motivation, 1974, 5, 473-483.
Haggbloom, S. J. Intertrial interval effects on internal stimulus control of behavior in brightness differential conditioning. Learning and Motivation, 1978, 9, 347-358.
Haggbloom, S. J. Effects of a 24-hour intertrial interval on successive differential conditioning and simultaneous negative contrast. American Journal of Psychology, in press.
Hull, C. L. Principles of behavior. New York: Appleton-Century, 1943.
Jobe, J. B., Mellgren, R. L., Feinberg, R. A., Littlejohn, R. L., & Rigby, R. L. Patterning, partial reinforcement, and N-length effects at spaced trials as a function of reinstatement of retrieval cues. Learning and Motivation, 1977, 8, 77-97.
Ludvigson, H. W., & Gay, S. E. An investigation of conditions determining contrast effects in differential reward conditioning. Journal of Experimental Psychology, 1967, 75, 37-42.
McHose, J. H., & Blackwell, D. R. Performance in differential instrumental conditioning as a function of the pattern of partial S+ reward. Animal Learning and Behavior, 1975, 3, 63-66.
Winer, B. J. Statistical principles in experimental design. New York: McGraw-Hill, 1971, 2nd ed.

Received January 25, 1979
Revised June 26, 1979