Effects of Expectancies of Different Reward Magnitudes in Transfer from Noncontingent Pairings to Instrumental Performance

LEARNING AND MOTIVATION 7, 197-210 (1976)

ELIZABETH D. CAPALDI, JOHN R. HOVANCIK,¹ AND FRANK FRIEDMAN

Purdue University

In two experiments, rats received noncontingent pairings of two stimuli with food reward, one paired with small reward and the other with large reward, and received bar press training with large reward or with small reward. When the noncontingent stimuli (NS) were presented for test during subsequent rewarded bar pressing and during early extinction of bar pressing, responding for each group was faster in the presence of the NS which was paired with the same reward magnitude that group received in bar press training than in the presence of the NS which had been paired with a different reward magnitude. As extinction progressed, all groups responded more slowly in the presence of the NS which had been paired with the large reward than in the presence of the NS which had been paired with small reward. These results were interpreted as indicating that responding in the presence of an NS depends on: (i) whether the reward expectancy elicited by the NS has been conditioned to the instrumental response, and (ii) the relationship between the reward expected in the presence of the NS and that received in test.

Many theorists have proposed two learning processes to explain instrumental learning phenomena (e.g., Spence, 1956; Trapold & Overmier, 1972). One process is the formation of a connection between the stimuli present and the instrumental response (instrumental response learning). The second process is formation of a mediator² characteristic of the reward through pairing of stimuli with reward (termed here expectancy learning). If, indeed, as seems to be the case, two learning processes must be postulated to explain instrumental learning phenomena, it becomes of great importance to specify how these two processes interact. This was the problem of interest here. The transfer-of-control paradigm was employed to investigate the interaction between these two processes.

This research was supported in part by Grant MH 23446-01 from the National Institute of Mental Health to the first author. Reprints can be obtained from Elizabeth D. Capaldi, Department of Psychological Sciences, Purdue University, West Lafayette, Indiana 47907.

¹ Now at State University of New York College at Oswego.

² The term mediator is employed here as a general term for any representation of the reward which may be formed as a result of stimulus-reward pairings. As will be seen later, the present results suggest that this mediator is profitably conceptualized as an expectancy of reward.

Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.

In the transfer-of-control paradigm, customarily, three experimental phases are employed. In the first phase, a stimulus is paired with some reward event with no response being required (noncontingent phase). Presumably during this phase, a mediator characteristic of the reward is conditioned to the stimulus. In the second phase, an instrumental response is conditioned to another stimulus (contingent phase). In the third phase, the stimulus from the noncontingent phase (noncontingent stimulus, NS) is presented during instrumental responding. The purpose of this phase is to evaluate the effect of the NS and the mediator it elicits on instrumental performance. In general, it is found employing this paradigm that if the same motivation-reward is employed in both the noncontingent and contingent phases (e.g., hunger-food), test performance to the NS is facilitated relative to various control stimuli (e.g., Trapold & Winokur, 1967). If, however, a different motivation-reward system is employed in the noncontingent and contingent phases (e.g., shock avoidance in the contingent phase, hunger-food in the noncontingent phase), performance in the presence of the NS is inhibited (e.g., Grossen, Kostansek, & Bolles, 1969).

Two different explanations of these effects, proposing two different ways reward produced mediators could influence instrumental responding, have been suggested. One hypothesis (Trapold & Overmier, 1972) is that much, if not all, of the effects of reward produced mediators may be attributed to their distinctive cue properties. Assume that during the noncontingent phase a mediator characteristic of the reward is conditioned to the NS. Also assume that during the contingent phase a mediator characteristic of the reward employed in the contingent phase is formed, and stimuli produced by this mediator are conditioned to the instrumental response. Then, in the test phase when the NS is presented, it elicits the mediator conditioned to it.
If the stimulus characteristic of this mediator has been conditioned to the instrumental response (as is the case when the same motivation-reward is employed in both phases), responding in the presence of the NS will be facilitated. If the NS elicits a mediator which has not been conditioned to the instrumental response (as is the case when different motivation-rewards are employed in the two phases), responding will be reduced due to stimulus generalization decrement.

A not incompatible hypothesis is that during noncontingent pairing a mediator characteristic of the reward is formed which affects performance motivationally (e.g., Rescorla & Solomon, 1967; Spence, 1956). Assume that pairing an NS with food conditions positive incentive motivation to the NS, and that positively and negatively based incentive motivation interact subtractively. Then an NS paired with food will facilitate performance motivated by hunger-food, but will inhibit performance based on negative incentive motivation (e.g., avoidance responding).


The purpose of Experiment 1 was to separate the incentive motivational effects of reward produced mediators from effects attributable to their distinctive cue properties. To accomplish this, hunger-food was used as the motivation in both contingent and noncontingent phases in Experiment 1 and reward magnitude was varied. Group 1 received one pellet for bar pressing to SD while Group 5 received five pellets. Both groups received noncontingent pairings of two stimuli with reward. S1 was paired with one pellet and S5 with five pellets, and then these stimuli were presented on test trials with SD.

Within an incentive motivational view, if the mediator conditioned to the NS is within the same motivational system as the motivation underlying the instrumental response (e.g., food based incentive motivation), then responding should be facilitated by the NS, and more so the greater the incentive motivation elicited by the NS. Accordingly, in Experiment 1, responding should be faster to S5 than to S1 for both groups.

On the basis of associative properties of reward produced mediators, the important consideration is the similarity of the stimuli produced by the reward mediators in the contingent and noncontingent phases. Since the mediators characteristic of one and five pellets appear to be distinctively different (e.g., Flaherty & Davenport, 1968), in Experiment 1, Group 1 should respond faster to S1 than to S5 and Group 5 should respond faster to S5 than to S1. Responding should be facilitated when the NS presented has been paired with the reward used in instrumental training for a given group because the mediator characteristic of the reward should directly elicit the instrumental response. Responding should be reduced when the NS which has been paired with the reward magnitude not used in instrumental training is presented because the mediator should produce stimulus generalization decrement.
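The contrasting predictions of the two accounts for the S1 vs. S5 comparison can be summarized in a short Python sketch. This is only an illustrative restatement of the text. The function name `predicted_faster_stimulus` and the account labels "incentive" and "associative" are hypothetical names introduced here, and the sketch ignores the stimulus generalization decrement produced by the NS itself.

```python
def predicted_faster_stimulus(account: str, training_pellets: int) -> str:
    """Return the test stimulus ('S1' or 'S5') predicted to give faster responding.

    account: 'incentive' or 'associative' (labels chosen for this sketch).
    training_pellets: pellets per bar press in instrumental training (1 or 5).
    """
    if account == "incentive":
        # The NS paired with the larger reward elicits more incentive
        # motivation, so S5 should be faster for every group.
        return "S5"
    if account == "associative":
        # The NS paired with the training magnitude elicits the mediator
        # already conditioned to the response, so it should be faster.
        return f"S{training_pellets}"
    raise ValueError(f"unknown account: {account!r}")


for group in (1, 5):
    for account in ("incentive", "associative"):
        stimulus = predicted_faster_stimulus(account, group)
        print(f"Group {group}, {account} account: faster to {stimulus}")
```

The two accounts disagree only for Group 1, which is why that group's S1 vs. S5 comparison is the most diagnostic one in this design.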
Although the critical comparisons are between the test stimuli for each group, it is also of interest to compare test trial performance to performance when only SD is presented. Aside from the effects of the mediators produced by the NS, the NS themselves should produce stimulus generalization decrement because the instrumental response has not been conditioned to the NS. Thus within an incentive motivational account, no prediction can be made regarding SD vs. test-trial performance. Although the mediators produced by the NS should facilitate performance, the NS should produce stimulus generalization decrement. A similar difficulty exists in making predictions regarding SD vs. S1 performance for Group 1 and SD vs. S5 performance for Group 5 within an associative account. In these cases, within an associative account, the mediator elicited by the test stimulus should elicit the instrumental response, facilitating performance, but the NS itself should produce stimulus generalization decrement. However, it can be predicted within an associative account that SD performance should be superior to S5 performance for Group 1, and SD performance should be superior to S1 performance for Group 5, since in these cases both the NS and the mediator produced by the NS should produce stimulus generalization decrement.

EXPERIMENT 1

Method

Subjects

The subjects were 16 naive male albino rats approximately 90 days old upon arrival from the Holtzman Company, Madison, Wisconsin.

Apparatus

The apparatus consisted of two identical commercial operant conditioning chambers in conjunction with automatic programming equipment. Each chamber was equipped with two automatically retractable response bars with a food cup located midway between. The left bar of each chamber was permanently retracted during the experiment. A 2900-Hz tone generator and a 130-Hz buzzer were attached to each chamber. The chambers were enclosed in separate sound-insulated boxes and were illuminated by 7½-W incandescent light sources. Water was continuously available in the chamber throughout the course of the experiment.

Procedure

Following arrival in the laboratory, subjects were fed ad libitum for 4 days and were assigned to two groups matched on weight on the fourth day. On this day (Day 1), subjects were placed on a food deprivation schedule consisting of 12 g/day of Wayne Lab Blox. This schedule continued throughout the experiment. On days when food was received in the apparatus, the amount received was subtracted from the daily ration which was served approximately 30 min after return to the home cage. On Days 8-10 each rat was handled individually for 90 sec. Half of the subjects in each group were assigned to each of the two conditioning chambers for the duration of the experiment. The procedure may be divided into six phases: preliminary training, free operant training, noncontingent pairings, discrete-trial bar-press training, rewarded test phase, and extinction test phase.

Preliminary training. On Day 11, subjects were given 15 min of adaptation to the chambers with the response bar retracted. On each of Days 12-14, subjects were given magazine training during which each subject received 24 noncontingent reinforcements delivered with a variable interreinforcement interval averaging 30 sec. A random half of the reinforcements in each daily session consisted of a single 0.045-g pellet, while the other half consisted of five pellets delivered to the food cup at the rate of one each 0.5 sec.
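The magazine-training session just described, together with the ration adjustment it implies, can be sketched in Python as follows. The pellet weight, session size, mean interval, and 12-g ration come from the text. The uniform distribution of intervals is an assumption made only for this sketch, since the distribution actually used is not reported, and the function name `magazine_session` is hypothetical.

```python
import random

PELLET_G = 0.045        # weight of one pellet, from the text
DAILY_RATION_G = 12.0   # daily food ration, from the text


def magazine_session(n_reinforcements=24, mean_interval_s=30.0, seed=0):
    """Build one magazine-training session as (interval_s, pellets) pairs."""
    rng = random.Random(seed)
    half = n_reinforcements // 2
    # A random half of the deliveries are one pellet, the other half five.
    pellets = [1] * half + [5] * (n_reinforcements - half)
    rng.shuffle(pellets)
    # ASSUMPTION: intervals are drawn uniformly between 0 and twice the
    # mean, so that they average mean_interval_s in the long run.
    intervals = [rng.uniform(0.0, 2.0 * mean_interval_s) for _ in pellets]
    return list(zip(intervals, pellets))


session = magazine_session()
total_pellets = sum(p for _, p in session)
grams_in_apparatus = total_pellets * PELLET_G
home_cage_ration = DAILY_RATION_G - grams_in_apparatus
# 72 pellets, about 3.24 g eaten in the apparatus, about 8.76 g left over
print(total_pellets, round(grams_in_apparatus, 2), round(home_cage_ration, 2))
```

Under these figures a magazine-training day supplies roughly a quarter of the daily ration inside the apparatus, with the remainder given in the home cage.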


Free operant training. On Days 15-19, subjects received free operant bar-press training. The right response bars were inserted into the chambers and subjects were reinforced for each press for a total of 15 reinforcements on each day. Subjects in Group 1 received one pellet for each press and subjects in Group 5 received five pellets. The time necessary for each subject to complete 15 reinforced presses was recorded.

Noncontingent pairings. Following free operant bar-press training, the response bars were again retracted from the chambers. On Days 20-39, subjects were given 12 3-sec randomly intermixed presentations each of the tone and the buzzer stimuli on a VI 30-sec schedule, with the restriction that a minimum of 9 sec intervene between stimulus presentations. For half the subjects in each group the tone was S1 and the buzzer was S5, and for the other half of the subjects vice versa. Termination of the S1 stimulus was followed immediately by the delivery of a single 0.045-g pellet. S5 presentations were followed immediately by the delivery of five pellets at the rate of one each 0.5 sec.

Discrete trial bar-press training. On Days 40-47, the right response bars were reinserted into the chambers, and all subjects were trained on a discrete trial bar-pressing task, at the rate of 12 trials per day. Each trial began with the insertion of the response bar into the chamber. The first press after trial onset retracted the bar and delivered the reinforcement. Group 1 received a single reward pellet and Group 5, five pellets for each press. The time between the insertion of the bar and the subject's response was recorded as response latency. The intertrial interval was VI 30 sec with a minimum of 9 sec between trials.

Rewarded test phase. The procedure in this phase (Days 48-55) was identical to the preceding phase, except that on Trials 6 and 12 of each day the insertion of the bar was accompanied by the onset of either S1 or S5. The stimulus remained on until the subject's first response. As in the preceding phase, this response also retracted the bar and delivered reinforcement. The order of presentation of S1 and S5 was counterbalanced for each subject, half of the subjects in each group beginning with S1 (S1, S5, S1, S5, etc.) and half of the subjects beginning with S5.

Extinction test phase. This phase (Days 56-60) was identical to the two preceding phases with the following exceptions. Of the 12 daily discrete trials, insertion of the bar was accompanied by the onset of S1 on four trials and by the onset of S5 on four trials. The trials within a day on which each stimulus was presented were determined randomly. No food reward was given on any trial. The subject's response (or 120 sec of no response) resulted only in termination of the auditory stimulus (if any), and retraction of the bar.

Results

Data in both the free operant phase and discrete trial acquisition were analyzed in a between-within analysis of variance including reward magnitude as the between factor and days as the within factor. All analyses initially included buzzer vs. tone as a factor, but since this variable produced no significant effects it was dropped from the analyses.

Free operant acquisition. In free operant acquisition, Group 5 took significantly longer to complete 15 reinforced presses than Group 1 [F(1,14) = 130.53, p < .001], presumably because of the longer time necessary to eat five pellets. Also significant was the effect of days [F(4,56) = 4.15, p < .01], reflecting acquisition of the bar press.

Discrete trial acquisition. In each discrete trial phase (acquisition, rewarded test phase, extinction test phase), latencies for each trial were converted to log (latency + 1) for analysis. In discrete trial acquisition, the only significant difference was that due to latencies for all groups decreasing over days [F(7,98) = 7.36, p < .001].

Performance on SD alone trials in the test phases. In the rewarded test phase, Group 5 responded faster than Group 1 on SD trials, but this difference was not significant (F < 1). In extinction, Group 1 was more resistant to extinction than Group 5 on SD trials [F(1,14) = 12.74, p < .01].

Performance on test trials. Figure 1 shows the mean of log (latencies + 1) summed over all of the rewarded test phase (panel A) and summed over all of the extinction test phase (panel B) for each group on S1 and S5 test trials and trials when only the insertion of the bar signaled the start of the trial (SD trials). Data are presented summed over all of each test phase because days did not interact significantly with any other variable in either phase. As can be seen in Fig. 1, in both the rewarded test phase and the extinction test phase, Group 1 responded more rapidly when S1 was presented than when S5 was presented; and Group 5 responded more rapidly when S5 was presented than when S1 was presented.
That is, both groups responded more rapidly when the NS presented had been paired with the same reward magnitude that group had received in instrumental training. As can also be seen in Fig. 1, SD performance fell between performance to the two test stimuli except for Group 5 in extinction.

Data were analyzed in a between-within analysis of variance including groups as the between factor and stimulus presented (SD alone, SD + S1, SD + S5) and days as within factors. Considering first the rewarded test phase (panel A in Fig. 1), the Stimulus Presented × Groups interaction was significant [F(2,28) = 4.15, p < .03]. Considering just test trials, the S1 vs. S5 × Groups interaction was also significant [F(1,14) = 4.49, p < .05]. Subsequent Newman-Keuls tests indicated that Group 1 responded significantly faster to S1 than to S5 (p < .01), while the difference between S1 and S5 performance for Group 5 was not significant. SD performance did not differ significantly from test trial performance for either group.

Turning now to the extinction test phase (panel B in Fig. 1), the Stimulus Presented × Groups interaction was significant [F(2,28) = 4.41, p < .03]. Considering only test trials, the S1 vs. S5 × Groups interaction was also significant [F(1,14) = 12.85, p < .01]. Subsequent Newman-Keuls tests (p < .05) indicated that Group 1 responded significantly faster to S1 than to S5, and Group 5 responded significantly faster to S5 than to S1. Group 5 also responded significantly faster to S5 than to SD. No other differences between SD performance and test trial performance were significant.

FIG. 1. Mean log (latency + 1) to each of the test stimuli and to SD alone for each group summed over all of the rewarded test phase (Panel A) and summed over all of the extinction test phase (Panel B).

Discussion

The results of Experiment 1 may be interpreted as indicating that at least a portion of the response level in the presence of a stimulus which has been paired noncontingently with reward (NS) is due to the NS eliciting a mediator characteristic of the reward, a mediator with distinctive cue properties. Thus for animals which were trained with one pellet, responding was faster in the presence of an NS which was paired with one pellet, while for animals which were trained with five pellets responding was faster in the presence of an NS which was paired with five pellets, at least in extinction. These results are most easily attributed to each NS eliciting a mediator characteristic of the reward with which it was paired. If the stimulus characteristic of this mediator has been conditioned to the instrumental response, then responding in the presence of the NS will be more vigorous than if the stimulus produced by the mediator has not been conditioned to the instrumental response.


Consistent with this interpretation, SD performance in the rewarded test phase for both groups fell between performance to S1 and to S5. Within an associative interpretation, it would be expected that test trial performance to the NS paired with the reward magnitude which was not received in instrumental training would be inferior to SD performance due to stimulus generalization decrement associated with presentation of the NS and stimulus generalization decrement associated with the reward mediator elicited by the NS. This result occurred for both groups, although differences between SD performance and test trial performance were not significant. When the NS which was paired with the same reward magnitude as that received in instrumental training is presented, stimulus generalization decrement due to the NS is opposed by direct elicitation of the instrumental response by the mediator characteristic of the reward used in instrumental training. Since performance in the rewarded test phase to the NS paired with the same reward as that received in instrumental training was superior to SD performance for both groups (although not significantly so), it appears that direct elicitation of the instrumental response by the mediator outweighed stimulus generalization decrement due to the NS. Accordingly, differences between S1 and S5 performance, when both response elicitation by one mediator and stimulus generalization decrement due to the other mediator were operating, were larger than differences between SD performance and performance to either NS when only one of these factors was operating to produce a difference.

While the difference between S1 and S5 performance for Group 1 was significant in both the rewarded test phase and the extinction test phase, the difference between S1 and S5 performance for Group 5 was significant only in the extinction test phase. Since the extinction test phase followed the rewarded test phase where the groups received both S1 and S5 in conjunction with a particular reward magnitude, it is difficult to interpret the differences in extinction. For example, during the rewarded test phase, S1 was paired with 5 pellets for Group 5P and, thus, it is not clear what reward expectancy S1 would elicit in extinction for Group 5P. To clarify the extinction results, an additional experiment was run in which the rewarded test phase was eliminated.

EXPERIMENT 2

In Experiment 2, the reward magnitudes received in noncontingent pairings were either one and five pellets or one and 10 pellets. This variable was manipulated because of results obtained by Hyde, Trapold, and Gross (1968). These investigators gave bar press training with one pellet and then presented, during a rewarded test phase, a stimulus which had been noncontingently paired with one pellet and a stimulus which had been noncontingently paired with 10 pellets. Hyde et al. obtained no difference in performance to the two stimuli, while in Experiment 1, an analogous group (Group 1) responded significantly faster to S1 than to S5 during the rewarded test phase. Since one obvious difference between Group 1 in Experiment 1 and Hyde et al.'s group is the size of the large reward employed (five vs. 10 pellets), this variable was included in Experiment 2.

Method

Subjects

The subjects were 24 rats of the same description as those employed in Experiment 1.

Apparatus

Three operant conditioning chambers of the same description as those employed in Experiment 1 were used.

Procedure

Procedure in Experiment 2 was identical to that of Experiment 1 with the following exceptions. There were four groups. Two groups received pairings of S1 with one pellet and S5 with five pellets; one of these groups received instrumental training with one pellet, Group 1(5), and the other received instrumental training with five pellets, Group 5. Two additional groups received pairings of S1 with one pellet and S10 with 10 pellets; one of these groups received instrumental training with one pellet, Group 1(10), the other with 10 pellets, Group 10. Rats were handled individually for 60 sec on Days 8-10. Because some animals received a 10-pellet reward, the number of trials per day was reduced from that used in Experiment 1 to avoid the animals becoming satiated, and the time between trials was increased to give animals time to eat. Thus in magazine training, in noncontingent pairing, and in the discrete trial phases a VI 60-sec schedule was employed with a minimum interreinforcement interval in magazine training, a minimum interstimulus interval in noncontingent pairings, and a minimum intertrial interval in discrete trial training of 30 sec. In free operant training, there were 8 days of nine reinforcements per day. There were 24 days of noncontingent pairings with 12 presentations of each stimulus per day. There were 10 days of discrete trial acquisition with nine trials per day. There was no rewarded test phase. Following discrete trial acquisition, animals were given extinction trials for 6 days, nine trials per day. On a random three of the extinction trials each day, S1 was presented and on a random three trials S5 [for Groups 1(5) and 5] or S10 [for Groups 1(10) and 10] was presented. An extinction trial was terminated if the animal did not respond within 90 sec. Two subjects in Group 1(5) died during the course of the experiment and their data were discarded.
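The four-group design and the daily food load that motivated the reduced trial count can be illustrated with a short Python sketch. The group labels, trial counts, and presentation counts come from the text. The 0.045-g pellet weight is carried over from Experiment 1 on the assumption that the same pellets were used, and the names `GROUPS`, `training_grams`, and `pairing_grams` are hypothetical.

```python
PELLET_G = 0.045  # ASSUMPTION: same pellet weight as in Experiment 1

# Group label -> (pellets paired with the large-reward NS,
#                 pellets per bar press in instrumental training)
GROUPS = {
    "1(5)": (5, 1),
    "5": (5, 5),
    "1(10)": (10, 1),
    "10": (10, 10),
}


def training_grams(pellets_per_press, trials_per_day):
    """Food earned per day in discrete-trial training, in grams."""
    return pellets_per_press * trials_per_day * PELLET_G


def pairing_grams(large_pellets, presentations_per_stimulus=12):
    """Food per day in noncontingent pairings: S1 (one pellet) plus the large NS."""
    return presentations_per_stimulus * (1 + large_pellets) * PELLET_G


for label, (large, per_press) in GROUPS.items():
    print(label,
          round(training_grams(per_press, 9), 2),
          round(pairing_grams(large), 2))

# For comparison: 10 pellets per press at Experiment 1's 12 trials per day
print(round(training_grams(10, 12), 2))
```

On these assumptions, nine trials per day hold Group 10's training intake to about 4 g, against about 5.4 g at the 12 trials per day used in Experiment 1. The source states only that the reduction was meant to avoid satiation, so the comparison is illustrative.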


Results

In free operant acquisition, the difference due to groups was significant [F(3,18) = 87.35, p < .001], animals which received 10 pellets taking longer to complete nine reinforced presses than animals which received five pellets, which in turn took longer than those which received one pellet.

Latencies in discrete trial acquisition and the extinction test phase were converted to log (latency + 1) for analysis of variance (using the unweighted means solution for unequal n). Initially analyses included buzzer vs. tone as a variable, but there were no significant differences associated with this variable so it was dropped from the analyses. In discrete trial acquisition, the groups did not differ significantly [F(3,18) = 2.30, p < .10], all groups decreasing in latencies over days [F(9,162) = 23.29, p < .001]. In the extinction test phase, on SD trials the groups differed significantly [F(3,18) = 5.22, p < .01], Group 1(5) being significantly more resistant to extinction than the other groups, which did not differ significantly (Newman-Keuls, p < .05).

On the extinction test trials, the pattern of results was different early in extinction from late in extinction. Accordingly, data are presented separately in Fig. 2 for early extinction (Days 1-2, left panel) and late extinction (Days 3-6, right panel). As can be seen in Fig. 2, early in extinction each group responded more quickly in the presence of the NS which had been paired with the reward magnitude they received in instrumental training than in the presence of the other NS, and SD performance fell between performance to the two NS. As can also be seen in Fig. 2, late in extinction all groups responded more slowly in the presence of the NS which was paired with large reward magnitude than to the other NS or to SD.

Data on Days 1-2 and on Days 3-6 were analyzed in a between-within analysis of variance including reward magnitude received in instrumental training (small, one pellet, or large, five or 10 pellets), and size of the large reward magnitude (5 vs. 10 pellets) as between factors, and stimulus presented and days as within factors. Groups which received 10 pellets as the large reward magnitude [Groups 10 and 1(10)] were less resistant to extinction than groups which received five pellets as the large reward [Days 1-2, F(1,18) = 18.32, p < .001; Days 3-6, F(1,18) = 8.66, p < .01], but this variable did not interact significantly with the effects of the NS, indicating that the pattern of results did not differ as a function of size of the large reward magnitude.

In early extinction (Days 1-2), the interaction of stimulus presented and reward magnitude received in instrumental training was highly significant [F(2,36) = 17.83, p < .001]. Considering only test trials, the interaction between stimulus presented and reward magnitude received in instrumental training was also highly significant [F(1,18) = 23.37, p < .001]. Subsequent Newman-Keuls tests indicated that Group 1(5) responded significantly faster to S1 than to S5, and significantly faster to SD than to S5 (ps < .01). Also, Group 1(10) responded significantly faster to S1 than to S10 and significantly faster to SD than to S10 (ps < .01). Differences in responding to the three stimulus conditions for Groups 5 and 10 were not significant, although both groups tended to respond more quickly to the NS which was paired with the larger reward (i.e., the reward they had received in instrumental training) than to the other NS.

Late in extinction (right panel of Fig. 2) the Stimulus Presented × Reward Magnitude interaction was significant when all three stimulus conditions were included [F(2,36) = 6.03, p < .01], but was no longer significant considering only test trials [F(1,18) = 1.72]. As can be seen in the right panel of Fig. 2, late in extinction on test trials all groups responded more slowly in the presence of the NS which had been paired with the large reward magnitude than in the presence of S1 [F(1,18) = 8.35, p < .01]. Subsequent Newman-Keuls tests comparing test trial performance to SD performance indicated that Group 5 responded significantly faster to SD than to S5, and Group 10 responded significantly faster to SD than to S10. SD performance for Groups 1(5) and 1(10) fell between performance to the two NS and did not differ significantly from performance to either NS.

FIG. 2. Mean log (latency + 1) to each of the test stimuli and to SD alone for each group [Groups 1(5), 5, 1(10), and 10] summed over Days 1-2 of the extinction test phase (left panel) and summed over Days 3-6 of the extinction test phase (right panel). Horizontal axis: stimulus presented (SD, S1, S5 or S10).

Discussion

In Experiment 2, the pattern of results changed over extinction trials. Early in extinction the pattern of results was what would be expected on the basis of the distinctive cue properties of reward produced mediators elicited by the NS. Animals which were trained with one pellet [Groups 1(5) and 1(10)] responded significantly faster when S1 was presented than when the NS paired with the large reward was presented, and Groups 5 and 10 early in extinction tended to respond faster when the NS paired with the reward magnitude they received in instrumental training was presented than when S1 was presented, although not significantly so. SD performance for all groups was superior to performance to the NS which was paired with the reward magnitude different from that received in instrumental training, although these differences were not significant for Groups 5 and 10.

Later in extinction, however, all groups responded more slowly in the presence of the NS which was paired with the large reward magnitude than to the other NS or to SD, regardless of the reward magnitude they received during bar press training. These results are easily explained if the mediators characteristic of the reward magnitudes are explicitly specified to be expectancies of reward (e.g., Trapold, 1970). Within this view, since in extinction nonreward is given on every trial, nonreward is received in the presence of an expectancy of one pellet when S1 is presented, in the presence of an expectancy of five pellets when S5 is presented, and in the presence of an expectancy of 10 pellets when S10 is presented. Receiving nonreward in the presence of an expectancy of reward would be expected to produce frustration and/or inhibition (Amsel, 1958; Black, 1968), the amount of frustration and/or inhibition being greater the larger the reward expected. On this basis, extinction performance would be disrupted in the presence of the NS, and more so in the presence of S5 and S10 than in the presence of S1, regardless of the reward magnitude which was received in bar press acquisition.
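The expectancy account of late extinction just described can be sketched in Python as follows. The text claims only that frustration and/or inhibition grows with the size of the expected reward. Treating it as the simple shortfall of received relative to expected pellets is an assumption made for illustration, and the names `frustration_index` and `EXPECTANCY` are hypothetical.

```python
# Pellets expected in the presence of each noncontingent stimulus (from the text)
EXPECTANCY = {"S1": 1, "S5": 5, "S10": 10}


def frustration_index(expected_pellets, received_pellets=0):
    """Illustrative shortfall of received reward relative to expected reward.

    ASSUMPTION: a linear shortfall stands in for any monotonic relation;
    a reward equal to or larger than expected yields no frustration here.
    """
    return max(expected_pellets - received_pellets, 0)


# In extinction nothing is received, so the shortfall equals the expectancy.
ranking = sorted(EXPECTANCY, key=lambda s: frustration_index(EXPECTANCY[s]))
print(ranking)  # ['S1', 'S5', 'S10'], least to most disruption predicted
```

Because the index depends only on the expectancy elicited by the NS and not on the training magnitude, the sketch reproduces the stated prediction that every group should be slowed more by S5 or S10 than by S1 late in extinction.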
Assuming both a distinctive cue function of mediators and decremental effects of receiving a reward smaller than expected also provides an explanatory basis for the entire pattern of results obtained in Experiment 1. In the rewarded test phase of Experiment 1, Group 1 received one pellet on every trial, and thus in the presence of S1 received the reward expected, but in the presence of S5 received one pellet, i.e., a reward smaller than expected. Accordingly, during the rewarded test phase in Experiment 1, frustration and/or inhibition would work in addition to the distinctive cue function of the reward-produced mediator to produce slower responding to S5 than to S1 for Group 1. For Group 5, however, five pellets were received on every trial in the rewarded test phase. Thus in the presence of S1, five pellets are received in the presence of an expectancy of one pellet. If anything, this would tend to increase responding to S1 (positive contrast effect), working against the difference due to distinctive cue properties of the mediators (which favors responding to S5). This analysis thus accounts for the lack of difference for Group 5 in the rewarded test phase of Experiment 1. As can be seen, the present analysis suggests that a test phase with large reward and an extinction test phase both tend to work against finding faster responding to an NS paired with large reward than to an NS paired with small reward. In support of this analysis, the differences favoring S1 for groups trained with one pellet were clear and evident in all test phases in both experiments, while the differences favoring the NS paired with large reward for groups trained with large reward were small; the only significant difference for a group trained with large reward was that for Group 5 in the extinction test phase of Experiment 1.
Although animals which received 10 pellets were less resistant to extinction than animals trained with five pellets or one pellet, the effect of the NS in Experiment 2 did not vary as a function of the size of the large reward. Thus, this variable does not seem to be the source of the difference between the present results and those of Hyde et al. (1968). Nor does it seem likely that the difference between Hyde et al.'s findings and those reported here is due to the use of an extinction test phase here and of a one-pellet rewarded test phase by Hyde et al., since the extinction test phase in Experiment 2 produced the same pattern of results for Groups 1(5) and 1(10) as was obtained for Group 1 in the rewarded test phase of Experiment 1. Other variables which differed between these experiments and that of Hyde et al., and which thus could be responsible for the difference in results, are the SD employed (bar coming in vs light onset), the number of pairings (a larger number in Hyde et al.'s experiment than here), and the testing procedure (SD + NS here, NS instead of SD in Hyde et al.'s experiment). Regardless of the source of the difference between these results and those of Hyde et al., the present results provide clear evidence for distinctive cue effects of reward expectancies.
However, it is equally clear that one or more additional processes affect performance in the presence of the NS. This additional influence does not seem to be a result of incentive motivation being conditioned to the NS and producing increased responding directly, since within this view it would be expected that a main effect of the reward magnitude paired with the stimuli would have been obtained, producing faster responding in the presence of the NS which was paired with large reward than in the presence of the NS paired with small reward. This effect did not occur. There was no main effect of reward magnitude paired with the NS in Experiment 1, and in extinction in Experiment 2, responding was actually slower in the presence of the NS paired with large reward than in the presence of the NS paired with small reward. This result may be interpreted as indicating that it is not the absolute reward magnitude paired with the NS that is of importance, but rather the relationship between the reward received in the test phase and the reward expected in the presence of the NS. This analysis provides an explanation of why differences between performance to the NS paired with large reward and to the NS paired with small reward were small and largely nonsignificant for animals trained with large reward and were large and significant in all phases for animals trained with small reward. And this analysis explains why performance to the NS signifying large reward was slower than that to the NS signifying small reward for all groups late in extinction in Experiment 2. In addition, this analysis suggests that an important direction for future research is to vary the reward received in the test phase in relation to the reward paired with the NS.

REFERENCES

Amsel, A. The role of frustrative nonreward in noncontinuous reward situations. Psychological Bulletin, 1958, 55, 102-119.

Black, R. W. Shifts in magnitude of reward and contrast effects in instrumental and selective learning. Psychological Review, 1968, 75, 114-126.

Flaherty, C. F., & Davenport, J. W. Noncontingent pretraining in instrumental discrimination between amounts of reinforcement. Journal of Comparative and Physiological Psychology, 1968, 66, 707-711.

Grossen, N. E., Kostansek, D. J., & Bolles, R. C. Effects of appetitive discriminative stimuli on avoidance behavior. Journal of Experimental Psychology, 1969, 81, 340-343.

Hyde, T. S., Trapold, M. A., & Gross, D. M. Facilitative effect of a CS for reinforcement upon instrumental responding as a function of reinforcement magnitude: A test of incentive-motivation theory. Journal of Experimental Psychology, 1968, 78, 423-428.

Rescorla, R. A., & Solomon, R. L. Two process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 1967, 74, 151-182.

Spence, K. W. Behavior theory and conditioning. New Haven: Yale University Press, 1956.

Trapold, M. A. Are expectancies based upon different positive reinforcing events discriminably different? Learning and Motivation, 1970, 1, 129-140.

Trapold, M. A., & Overmier, J. B. The second learning process in instrumental learning. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II. New York: Appleton-Century-Crofts, 1972.

Trapold, M. A., & Winokur, S. W. Transfer from classical conditioning and extinction to acquisition, extinction and stimulus generalization of a positively reinforced instrumental response. Journal of Experimental Psychology, 1967, 73, 517-525.

Received May 20, 1975
Revised September 1, 1975