LEARNING AND MOTIVATION 18, 167-184 (1987)
Interaction of the Effects of Overshadowing and Reinforcer Devaluation Manipulations on Instrumental Performance

ROBERT ST. CLAIRE-SMITH
Wilfrid Laurier University

Rats were trained on three different responses, and then one of the two reinforcers of each response was devalued by pairings with lithium chloride. During instrumental training, a brief stimulus consistently signaled one of the reinforcers for one of the responses and the alternate reinforcer for one of the other responses. Neither reinforcer was signaled in the case of the third response. The major finding of the experiment is that the effect on subsequent extinction responding of postconditioning devaluation of one of two reinforcers of a response depends on whether one of the two reinforcers was signaled during training and, if so, whether the reinforcer signaled was the one subsequently devalued. Relative to the level of performance of a response where neither reinforcer was signaled during training, the level of performance of a different response, where one of the two reinforcers was signaled during training, was higher if the signaled reinforcer was the one later devalued and lower if it was the unsignaled reinforcer later devalued. © 1987 Academic Press, Inc.

Reprints may be obtained from Dr. Robert St. Claire-Smith, Department of Psychology, Wilfrid Laurier University, Waterloo, Ontario, Canada N2L 3C5. The research was supported by Grant A0547 from the Natural Sciences and Engineering Research Council of Canada to the author.
Although a number of diverse theories of instrumental learning have been advanced, most have attempted to account for the changes in behavior that predictably follow typical instrumental conditioning procedures by appealing to simple associative mechanisms variously incorporating, as associates, the stimulus in the presence of which the response-reinforcer contingency occurs, the response, and the reinforcer that is delivered contingent on the performance of the response. Within this associative tradition, several theories emphasize the role of associations formed to the stimulus as exclusive determinants of instrumental performance, and exclude either the encoding of the reinforcer as part of the cognitive structure elaborated during conditioning, as in the case of stimulus-response theories (e.g., Guthrie, 1952; Hull, 1943; Thorndike, 1898), or the learning of an association representing the response-reinforcer
contingency, as in the case of Pavlovian stimulus-reinforcer accounts (e.g., Bindra, 1974) and two-process theories (e.g., Rescorla & Solomon, 1967; Trapold & Overmier, 1972). In contrast to such views, an increasing number of theorists have argued that the demonstrable effectiveness of response-reinforcer contingencies in controlling instrumental behavior might reasonably be attributed to animals learning an association between the execution of a response and the occurrence of the reinforcer (e.g., Adams, 1980; Bolles, 1972; Dickinson, 1980; Mackintosh, 1974; St. Claire-Smith, 1979a, 1979b). The purpose of the present experiment was to examine the convergence of two different experimental procedures that have provided much of the current evidence in support of this latter position. The first of these procedures involves testing the effects on subsequent performance of an instrumental response of motivational or conditioning operations introduced following original training of the response and designed to independently modify the significance or value of the reinforcing stimulus. It is argued (e.g., Rozeboom, 1958) that postconditioning devaluation of the reinforcer could not affect subsequent execution of the instrumental response unless the reinforcer was encoded during original training as part of the cognitive structure mediating performance of the response. Although several failures to demonstrate an effect of devaluation operations have been reported (e.g., Adams, 1980, 1982; Holman, 1975; Morgan, 1974; Morrison & Collyer, 1974; Wilson, Sherman, & Holman, 1981), a substantial number of studies have found changes in instrumental responding consistent with the nature of the postconditioning changes in the significance of the reinforcer (e.g., Adams, 1982; Adams & Dickinson, 1981; Chen & Amsel, 1980; Colwill & Rescorla, 1985a, 1985b; Dickinson, Nicholas, & Adams, 1983; St. Claire-Smith & MacLaren, 1983). The second of the two procedures involves an attempt to apply to the analysis of instrumental learning the design and logic of studies of associative interference in classical conditioning. It is well established that a Pavlovian arrangement in which a US is signaled by multiple CSs can result in reciprocal interference whereby each stimulus present on a trial interferes with the capability of each of the other CSs present to develop an association with the US on that trial (see Mackintosh, 1975, for a review). It is also well documented that one stimulus may overshadow another if it is the more reliable or valid of the two as a predictor of the US, and even more strongly interfere with the degree to which the second stimulus enters into an association with the US (e.g., Holland, 1977; Kamin, 1968; Rescorla & Wagner, 1972; Wagner, 1969). A number of psychologists have argued that such an interference paradigm could be useful in assessing the associative structure of instrumental conditioning. For example, if a decrease in the rate of instrumental learning resulted when the reinforcer was signaled by a stimulus that predicted the reinforcer better than the response during training, such a result would indicate
the overshadowing of the instrumental response by a Pavlovian CS, and suggest interference with the development of an association between the response and the reinforcer necessary to the performance of the response. Although several failures to obtain expected interference effects have been reported (e.g., Roberts, Tarpy, & Lea, 1984; Tarpy, Lea, & Midgley, 1983; Tarpy & Roberts, 1985), a significant number of studies have reported results consistent with an overshadowing interpretation (e.g., Dickinson, Peters, & Shechter, 1984; Garrud, Goodall, & Mackintosh, 1981; Hall, 1982; Hall, Channell, & Pearce, 1981; Pearce & Hall, 1978; Shettleworth, 1981; St. Claire-Smith, 1979a, 1979b; Tarpy, Lea, & Midgley, 1983; Tarpy et al., 1984; Tarpy, St. Claire-Smith, & Roberts, 1986; Williams, 1975, 1978; Williams & Heyneman, 1982). Although these procedures are quite different in design and are based on quite different rationales, each is an attempt to provide a measure of the extent to which performance of a response is determined by the encoding of the reinforcer, as an associate, during instrumental training. The present experiment attempted to evaluate the two procedures as convergent operations by employing a mixed design incorporating the essential features of both the interference procedure and the reinforcer devaluation procedure. More specifically, the experiment was designed to test the prediction that postconditioning devaluation of a reinforcer that was signaled during instrumental training would have less impact on subsequent performance of the response than devaluation of a reinforcer that was unsignaled during original training. Rats were separately trained to make three different instrumental responses, each reinforced on an interval schedule that delivered randomly, but with an equal probability of occurrence throughout a session, one or the other of two different rewards contingent on the response. A visual stimulus, presented during a brief delay interval interposed between the operative occurrence of a response and the delivery of the reinforcer (the interference manipulation), consistently signaled one of the two reinforcers for one response, and the alternate reinforcer for a second response. The third response served as a control response and neither reinforcer was signaled. At an intermediate stage of conditioning, all animals received a single session of extinction during which all three manipulanda were present. Following response training, one of the reinforcers (but not the other) was paired with a toxin (the reinforcer devaluation operation) and all animals were then given a second extinction test with access to all three manipulanda (the stimulus was not presented during either test). Appropriate care was taken to ensure that the presentation of the reinforcers during aversion conditioning closely approximated the manner of their occurrence during instrumental training. A comparison of the results of the first and second extinction tests provided the basis for determining the effect of the devaluation manipulation on the relative
rates of the three responses. The first test was included, of course, to control for the possibility that the relative rates of the three responses when available singly would not be perfectly reflected by the rates during concurrent extinction. The logic of the experiment is reasonably straightforward. It would be expected that different cognitive structures would be elaborated during the unique training of each response. In the control case, both reinforcers should be associatively represented, and performance should be multiply determined by the current value of each. It would be predicted, therefore, that execution of the control response following aversion conditioning would be relatively independent of the particular reinforcer devalued. The presentation of a stimulus prior to the delivery of one of the reinforcers, however, would be expected to interfere with encoding of the relationship between execution of a response and the occurrence of that reinforcer. Thus, with regard to the responses trained with the overshadowing stimulus, it would be expected that aversion conditioning would have little impact on subsequent performance of the response for which the devalued reinforcer was signaled during training, but have a considerable impact on the performance of the response for which the devalued reinforcer was unsignaled. By contrast, if encoding of each reinforcer as a critical associate occurred to the same degree in each case, or was irrelevant to performance of the responses, selective effects of devaluation would not be anticipated. A critical feature of the experiment is that a within-subject design was employed and each animal experienced both the overshadowing and the devaluation operations. As noted by Colwill and Rescorla (1985a), the use of such a design provides inherent controls for various nonassociative factors that might influence test performance.

METHOD

Subjects
The subjects were 44 experimentally naive male Long-Evans rats from Charles River Laboratories, St. Constant, Quebec. They were about 130 days old at the start of the experiment. During the course of the experiment the animals were maintained in individual wire mesh cages at 85% of their free-feeding weight. Water was available on an ad libitum schedule throughout.

Apparatus
The apparatus consisted of four Coulbourn Instruments operant conditioning chambers. Each chamber measured 30 cm long x 25.5 cm wide x 29 cm high, and was enclosed in a sound-attenuating and light-resistant shell equipped with a ventilating fan. The end walls of each chamber were aluminum; the side walls and ceiling were clear acrylic plastic. The
floors consisted of 0.5-cm stainless-steel rods spaced 1.7 cm apart. Ambient illumination in each chamber was provided by a bottom-hooded 7.5-W, 125-V house light in the middle of the front panel 2 cm below the ceiling. A Coulbourn pellet trough was recessed in the front panel of each chamber 2 cm above the floor equidistant from each side wall. The chambers could be placed in any one of, or any combination involving, three response modes. The position of each of the three manipulanda was constant throughout the experiment and the same for each chamber. A standard Coulbourn lever was located 7 cm above the floor midway between the food magazine and the right side wall. The nose-poke manipulandum, which was a standard Coulbourn pigeon key recessed 0.9 cm behind a round opening 2.5 cm in diameter, was located to the left of the magazine 3 cm above the floor. A stainless-steel rod, 0.8 cm in diameter and 11 cm in length, was suspended from the ceiling 13 cm from the front panel and midway between the side walls. Access to any one of the manipulanda was prevented by removing it from the chamber. In the case of the nose key and the lever, an aluminum plate covered the opening in the front panel created by their absence. These manipulanda established three quite distinct response topographies. The rats typically used one of their front paws to press the lever and invariably their nose to depress the key. To reach the rod the animals had to rear: A push in any direction sufficient to displace the rod about 1.5 cm from vertical operated the rod microswitch. The overshadowing cue was a 0.5-s visual stimulus provided by three 7.5-W, 125-V opaque white jewel lamps. Two of the lamps were positioned on the front panel 4 cm from the ceiling; one was directly above the lever and the other was directly above the nose key. The third lamp was located on the ceiling 12 cm from the rod in the direction of the rear panel. The lamps were present in the chambers at all times, and all three were turned on simultaneously to provide the visual cue. Experimental events were controlled and recorded by a Coulbourn modular system located in an adjoining room.

Procedure
All subjects received 15-min sessions of magazine training in the morning and in the afternoon on each of the first 2 days. In each session, 15 reinforcers were delivered on a random-time 60-s schedule (RT 60) with a probability of reinforcement in a 3-s interval set at 1/20. For half the animals, in the morning session of Day 1 and the afternoon session of Day 2 the reinforcer was a standard 45-mg Bioserv dustless precision pellet (rat chow) while in the afternoon session of the first day and the morning session of the second day the reinforcer was a Bioserv 45-mg sucrose pellet (Product 004B2). This order was reversed for the remaining animals. The response manipulanda were not present during this stage of training.
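The random-time schedule used here is fully specified by the stated per-interval probability, and a small simulation may make the arithmetic concrete: with 300 three-second intervals in a 15-min session and a delivery probability of 1/20 per interval, the expected number of deliveries is 15. The following Python sketch is illustrative only; the function name and the cap at 15 deliveries are assumptions, not part of the original apparatus control program.

import random

def rt60_magazine_session(session_min=15, tick_s=3, p_per_tick=1 / 20, max_deliveries=15):
    # Random-time 60-s (RT 60) schedule: each 3-s interval delivers a pellet
    # with probability 1/20, independently of the animal's behavior.
    # 300 intervals x 1/20 gives an expected 15 deliveries per 15-min session.
    delivery_times = []
    for tick in range(int(session_min * 60 / tick_s)):
        if len(delivery_times) >= max_deliveries:
            break
        if random.random() < p_per_tick:
            delivery_times.append(tick * tick_s)  # time of delivery, in seconds
    return delivery_times

if __name__ == "__main__":
    print(rt60_magazine_session())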
Responses were trained in separate sessions throughout the experiment. Two general features of the response training procedure should be noted. First, a 0.5-s delay interval was interposed between the occurrence of the operative response and the operation of the pellet dispenser. This delay of reinforcement interval was in effect for all response training sessions. Second, the type of food pellet dispensed according to a schedule of reinforcement was randomly determined, with the delivery of a sucrose pellet or a standard pellet being equally probable as the response-contingent reinforcer on any particular occasion. Training of the rod response was begun the day following magazine training. Subjects were placed in the chambers, with the magazines preloaded with four food pellets (two of each type) and each rod push programmed to deliver a reinforcer, and permitted 30 min in which to earn 50 reinforcers. For each subject, these daily sessions continued until a criterion of 50 reinforcers earned in 30 min was met. Except for a few instances where hand shaping was necessary, each session terminated after 30 min, or upon the delivery of the 50th reinforcer. All rats met the criterion within 4 days. Over the next 4 days, the rod-push response was reinforced on a random-interval (RI) schedule. On the 1st and 2nd days, the probability of a reinforcer being available in each successive 3-s interval was set at 1/10 (RI 30). On the 3rd and 4th days, an RI 60-s reinforcement schedule was in effect, with the probability of reinforcement contingent on a response set at 1/20 for each 3-s interval. Daily sessions were 30 min in duration. Training to establish the lever-press and nose-poke responses occurred over the next 4 days. On the day immediately following the last day of RI training of the rod response the chambers were placed in the lever-response mode and all animals received a 30-min, RI 30-s, lever-press training session. The next day the same procedure was used to establish the nose-poke response. On each of the next 2 days, with the RI 60-s reinforcement schedule in effect, the rats received two 20-min training sessions, one in the morning and one in the afternoon. On the morning of the 1st day and the afternoon of the 2nd day the chambers were in the lever-response mode; during the afternoon session of the 1st day and the morning session of the 2nd day the chambers were in the nose-poke response mode. The overshadowing manipulation was introduced at the same time that nose-poke and lever-press response training was begun. For each animal, the visual stimulus signaled one type of reinforcer during nose-poke response sessions and the alternate reinforcer during lever-press response sessions. Figure 1 provides a complete description of the critical procedures in the experiment. For half the animals (Groups LVS+:N-S+ and L-F+:NVF+ in Fig. 1), the 0.5-s visual stimulus (V) occurred in the delay interval between the operative lever-press response (L) and the operation of the sucrose (S) pellet delivery mechanism, and in the delay interval between the operative nose-poke response (N) and the operation of the standard food (F) pellet dispenser.
Group        Response      Extinction   Response      Aversion       Extinction
             Training      Test 1       Retraining    Conditioning   Test 2

LVS+:N-S+    LVS, L-F      N, L, R      LVS, L-F      S-LiCl         N, L, R
             N-S, NVF                   N-S, NVF      F-Ø
             R-S, R-F                   R-S, R-F

L-S+:NVS+    L-S, LVF      N, L, R      L-S, LVF      S-LiCl         N, L, R
             NVS, N-F                   NVS, N-F      F-Ø
             R-S, R-F                   R-S, R-F

LVF+:N-F+    LVF, L-S      N, L, R      LVF, L-S      F-LiCl         N, L, R
             N-F, NVS                   N-F, NVS      S-Ø
             R-F, R-S                   R-F, R-S

L-F+:NVF+    L-F, LVS      N, L, R      L-F, LVS      F-LiCl         N, L, R
             NVF, N-S                   NVF, N-S      S-Ø
             R-F, R-S                   R-F, R-S

FIG. 1. Experimental procedures for each group. N = nose poke, R = rod push, L = lever press, V = visual stimulus (a 0.5-s light), (-) = a 0.5-s gap between a response and operation of the food magazine, S = sucrose pellet, F = food pellet, LiCl = lithium chloride, Ø = reinforcer presented without LiCl.
For the remaining animals (Groups L-S+:NVS+ and LVF+:N-F+ in Fig. 1), the opposite arrangement held, and sucrose pellets were signaled when contingent on nose-key responses and food pellets were signaled when contingent on lever responses. Neither sucrose nor food pellets were signaled in the case of the rod response. Over the next 6 days of training involving all three responses, morning and afternoon sessions were 20 min in length and all responding was reinforced on an RI 60-s schedule. For half the animals in each group, the order of the sessions was rod push, nose poke, lever press (cycle repeated); the order for the remaining animals was rod push, lever press, nose poke (cycle repeated). Thus, of a total of 12 training sessions, four sessions were devoted to each response, two in the morning and two in the afternoon. All animals were then given a 20-min extinction test during which all three manipulanda were available for responding. The purpose of the test was to provide a reference level of responding on each manipulandum against which the effects of subsequent reinforcer devaluation could be assessed. The visual stimulus was not presented during the test. Following the test, all animals again experienced the 6-day training sequence that preceded the test. Thus, an additional four sessions of training on the RI 60-s schedule were provided for each of the three responses.
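To make the training contingencies concrete, the following Python sketch mimics the logic described above for a single session: reinforcement is set up with probability 1/20 per 3-s interval (RI 60-s), the pellet type for each earned reinforcer is chosen with equal probability, and the 0.5-s visual cue fills the response-reinforcer delay only when the pellet type signaled for that manipulandum and group is about to be delivered. All names, and the placeholder model of the rat's responding, are illustrative assumptions rather than the original control program.

import random

def ri60_setup(p_per_tick=1 / 20):
    # A reinforcer becomes available in the current 3-s interval with p = 1/20.
    return random.random() < p_per_tick

def outcome_of_operative_response(signaled_type):
    # Pellet type is equiprobable; the 0.5-s cue is presented during the delay
    # only when the signaled pellet type ('S' = sucrose, 'F' = food) follows.
    pellet = random.choice(["S", "F"])
    cue_during_delay = (pellet == signaled_type)
    return pellet, cue_during_delay

if __name__ == "__main__":
    available = False
    for tick in range(400):                # 400 x 3 s = one 20-min session
        available = available or ri60_setup()
        responded = random.random() < 0.5  # placeholder for the rat's behavior
        if responded and available:
            pellet, cue = outcome_of_operative_response(signaled_type="S")
            print(f"t = {tick * 3:4d} s: {pellet} pellet, cue {'on' if cue else 'off'}")
            available = False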
After completion of response training, an aversion was conditioned to one of the reinforcers. This conditioning was carried out in the operant chambers, with the manipulanda removed, over 12 consecutive days (following the procedure reported by Colwill and Rescorla, 1985a). On the 1st day, and on each odd-numbered day, the reinforcer designated to be devalued was delivered on an RT 60-s schedule for 20 min or until the animal refused to consume it. At the end of these 20-min sessions, or 10 min after the consumption of the last reinforcer, subjects were administered a 0.5 mL/kg intraperitoneal (ip) injection of 0.6 M lithium chloride (LiCl) and then returned to their home cages. On the 2nd day, and on each even-numbered day, the other reinforcer was delivered on the RT 60-s schedule. At the end of these 20-min sessions, the subjects were returned to their home cages. During critical sessions, the animals were observed through viewscopes in the doors of the sound-attenuating shells. For half the animals, the aversion was conditioned to the food pellets; for the remaining animals, to the sucrose pellets. This procedure established the four experimental groups identified in Fig. 1. In the first two groups (Groups LVS+:N-S+ and L-S+:NVS+), subjects were averted to the sucrose pellets (S+); in the last two groups (Groups LVF+:N-F+ and L-F+:NVF+), the animals were averted to the food pellets (F+). Unfortunately, one animal died as a result of the injection procedure. To equate the size of the groups, one subject, randomly chosen, was dropped from each of the remaining three groups. On the day following completion of the reinforcer devaluation training sequence, a second extinction test was administered to all animals. This test was identical to the first test: During the 20-min session all three manipulanda were available, but responses were not reinforced and the overshadowing stimulus was not presented. The next day the effectiveness of the aversion conditioning treatment was assessed by two consumption tests, one in the morning and one in the afternoon. Subjects were placed in the chambers and 10 reinforcers were delivered on an RT 60-s schedule during each test session. Half of the subjects in each of the groups received the poisoned reinforcers during the morning session and the unpoisoned reinforcers during the afternoon session; the remaining subjects were given the unpoisoned reinforcers in the first session and the poisoned reinforcers in the second session. The response manipulanda were not in the chambers during these sessions.

RESULTS
The animals gradually increased their rate of responding on each manipulandum throughout RI training and recovered quickly from the effects of the first extinction test. Performance during response training was assessed by comparing the mean number of responses made on each manipulandum during the last RI training session prior to each extinction test.
TABLE 1
Mean Responses per Minute for Each Experimental Group on Each Manipulandum during the Second Extinction Test

                      Responses
Group           N        R        L
LVS+:N-S+       6.4      3.7      5.6
L-S+:NVS+      11.7      4.3      4.2
LVF+:N-F+       6.6      3.8      5.8
L-F+:NVF+      13.4      4.0      4.9

Note. N = nose poke, R = rod push, L = lever press.
Split-plot analyses of variance, with Groups (4) as the between-subject factor and Response Identity (3) as the within-subject factor, revealed no reliable differences between the groups but indicated a significant effect due to Response Identity prior to both the first [F(2, 72) = 6.36, p < .01] and second [F(2, 72) = 5.25, p < .01] test. The mean rate of responding (collapsed across groups) on the nose key, rod, and lever, respectively, was 27, 11.7, and 11.9 responses per minute prior to Test 1, and 36.5, 14.8, and 16.5 responses per minute prior to Test 2. Fisher's LSD multiple comparisons test confirmed that the animals made significantly more responses on the nose key than on either of the other two manipulanda during these RI sessions. Performance during extinction testing was reasonably robust, with all animals distributing responses across the three simultaneously available manipulanda during each test. The mean rate of responding (collapsed across groups) during Test 1 on the nose key, rod, and lever, respectively, was 9.9, 4.6, and 5.8 responses per minute. A split-plot analysis of variance revealed no significant differences between groups in mean total responses on each manipulandum during Test 1, but yielded a significant main effect for Response Identity [F(2, 72) = 8.15, p < .01]. A multiple comparisons test indicated that this effect was the result of the rats responding, as they had done during response training, significantly more on the nose-poke manipulandum than on either of the other two manipulanda (p < .05). Table 1 shows mean responses per minute for each group on each manipulandum during Test 2. Split-plot analysis of mean number of responses on each manipulandum during this test revealed a significant main effect for Response Identity [F(2, 72) = 5.61, p < .01] and a significant Groups x Response Identity interaction [F(6, 72) = 4.49, p < .01]. The multiple comparisons test revealed that the mean number of responses on the nose-key manipulandum for both Groups
FIG. 2. Mean relative score percentages for the nose-poke (N), rod-push (R), and lever-press (L) responses during the second extinction test for each of the four experimental groups. The left panel displays the performance of the two groups for whom the sucrose pellet was paired with LiCl (S+), while the right panel displays the performance of the groups for whom the devalued reinforcer was the standard food pellet (F+).
L-S+:NVS+ and L-F+:NVF+ was significantly higher than the mean number of responses on the rod and lever manipulanda for these groups and significantly higher than the mean number of responses on any of the manipulanda for either of the other two groups (p < .05). Thus, as Table 1 shows, and as these analyses confirm, devaluation of the reinforcer that was unsignaled during nose-poke training (Groups LVS+:N-S+ and LVF+:N-F+) eliminated the pattern characteristic of RI training and Test 1 performance of substantially greater responding on the nose key than on either of the other two manipulanda. The data of primary interest are presented in Figs. 2 and 3, which show the performance of each group on each manipulandum during Test 2 expressed as a function of Test 1 performance. In order to provide a direct comparison of response rates on the two tests, the number of responses made by each subject on a manipulandum during the second extinction test was converted to a percentage of the number of responses made by the subject on that manipulandum during the first test. Figure 2 shows the performance of each group on each manipulandum during Test 2 expressed as a mean relative score percentage of Test 1 responding. As base rates of responding increased substantially between the first and second test, a score above 100% is not necessarily surprising. As can be seen in Fig. 2, relative scores were lowest for the nose-poke response and highest for the lever-press response in the two groups (Groups LVS+:N-S+ and LVF+:N-F+) where the devalued reinforcer was unsignaled for the nose-poke response and signaled for the lever-press response.
FIG. 3. Mean relative proportion score percentages for the nose-poke (N), rod-push (R), and lever-press (L) responses during the second extinction test for each of the four experimental groups. Responses made by the rats on each manipulandum during each test were expressed as a proportion of responses made on the manipulandum during the prior RI training session, and the proportional scores for Test 2 were expressed as a percentage of proportional scores for Test 1.
Conversely, scores were highest for the nose-key response and lowest for the lever-press response in the two groups (Groups L-S+:NVS+ and L-F+:NVF+) where the devalued reinforcer was signaled for the nose-poke response and unsignaled for the lever-press response. Mean relative scores were intermediate for the rod-push response within each group. A split-plot analysis of variance of the mean scores revealed a significant Groups x Response Identity interaction [F(6, 72) = 4.77, p < .01]. Fisher's multiple comparisons test confirmed the significance of a number of differences among the 12 mean relative scores displayed in Fig. 2. Except for the rod versus lever comparisons in Groups L-S+:NVS+ and LVF+:N-F+, and the nose-key versus rod comparison in Group L-S+:NVS+, all nose-key, rod, and lever mean score within-group differences displayed in Fig. 2 were significant (p < .05). The multiple comparisons test revealed also that each of the mean scores for the nose-poke manipulandum in Groups LVS+:N-S+ and LVF+:N-F+, which did not differ significantly from each other, differed significantly (p < .05) from each of the mean scores for the same manipulandum in Groups L-S+:NVS+ and L-F+:NVF+, which were significantly
different from each other (p < .05). A similar pattern emerged for the between-group comparisons of mean relative scores for the lever-press manipulandum. Mean lever-response scores in Groups LVS+:N-S+ and LVF+:N-F+, while not significantly different from each other, were significantly different (p < .05) from the mean scores for the lever manipulandum in Groups L-S+:NVS+ and L-F+:NVF+, which did differ significantly from each other (p < .05). None of the other pairwise comparisons showed significant effects. In order to provide a comparison of performance on the two extinction tests corrected for differences in base rates of responding on the three manipulanda, a second analysis of Test 2 versus Test 1 performance was conducted. The number of responses made by each subject on each manipulandum during a test was converted to a proportional score that represented the number of occurrences of a particular response during a test as a percentage of the number of occurrences of that response during the preceding RI training session. Relative proportional scores were then calculated by expressing the Test 2 proportional score for each subject on each manipulandum as a percentage of the Test 1 score of the subject on the same manipulandum. Given the substantial differences that existed between the responses in RI training rates, it can be argued that the conversion of raw scores to proportional scores provides a more appropriate basis for comparisons of Test 2 versus Test 1 performance. Mean relative proportional scores for each of the three responses for the four experimental groups are displayed in Fig. 3. On this measure, a score of 100% indicates an equivalence of Test 1 and Test 2 proportional responding; that is, rates of responding on each test relative to acquisition rates are the same. The lower the score, the greater the magnitude of the decrease in responding from the pre-Test 2 training session to the Test 2 extinction session relative to the magnitude of the decrease in responding from the pre-Test 1 training session to the Test 1 extinction session. Of course, a score exceeding 100% would indicate greater resistance to extinction on Test 2 than on Test 1. As Fig. 3 shows, however, proportional scores were substantially lower for Test 2 than for Test 1. As can be seen, the relative proportional scores displayed in Fig. 3 show the same pattern of Test 2 versus Test 1 responding that characterized the relative scores presented in Fig. 2. Relative proportional scores were lowest for the nose-poke response and highest for the lever-press response when the devalued reinforcer was unsignaled for the nose-poke response and signaled for the lever-press response (Groups LVS+:N-S+ and LVF+:N-F+) and, conversely, highest for the nose-poke response and lowest for the lever-press response when the devalued reinforcer was signaled for the nose-poke response and unsignaled for the lever-press response (Groups L-S+:NVS+ and L-F+:NVF+). The scores for the rod-push response were intermediate for each group.
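The two extinction measures reported above reduce to simple ratios, and a short sketch may help keep them apart. The relative score (Fig. 2) is Test 2 responding on a manipulandum as a percentage of Test 1 responding; the relative proportional score (Fig. 3) first expresses each test as a proportion of the immediately preceding RI training session and then takes the Test 2 proportion as a percentage of the Test 1 proportion. The counts in the example below are hypothetical, chosen only to illustrate the calculation.

def relative_score(test2_responses, test1_responses):
    # Fig. 2 measure: Test 2 responses as a percentage of Test 1 responses.
    return 100.0 * test2_responses / test1_responses

def relative_proportional_score(test2_responses, pre_test2_training_responses,
                                test1_responses, pre_test1_training_responses):
    # Fig. 3 measure: each test expressed as a proportion of the preceding RI
    # training session, then Test 2 taken as a percentage of Test 1.
    prop_test2 = test2_responses / pre_test2_training_responses
    prop_test1 = test1_responses / pre_test1_training_responses
    return 100.0 * prop_test2 / prop_test1

# Hypothetical counts for one subject on one manipulandum:
print(relative_score(80, 120))                         # 66.7%
print(relative_proportional_score(80, 700, 120, 540))  # about 51.4%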
A split-plot analysis of variance of the mean relative proportional scores revealed a significant Groups x Response Identity interaction [F(6, 72) = 7.93, p < .01]. Fisher's multiple comparisons test indicated that all within-group nose-key, rod, and lever comparisons were significantly different (p < .05) except for the rod versus lever comparisons in Groups L-S+:NVS+ and LVF+:N-F+. Thus, except in two instances (where analysis of the mean relative scores presented in Fig. 2 also failed to reveal significant differences), all within-group differences displayed in Fig. 3 were significant. Further, the mean relative proportional scores for the nose-poke manipulandum in Groups LVS+:N-S+ and LVF+:N-F+, which did not differ from each other, differed significantly from each of the mean scores for the same manipulandum in Groups L-S+:NVS+ and L-F+:NVF+ (p < .05), which were not significantly different from each other. Mean lever-response scores in Groups LVS+:N-S+ and LVF+:N-F+, while not significantly different from each other, were significantly different from the mean scores for the lever manipulandum in Groups L-S+:NVS+ and L-F+:NVF+ (p < .05), which were not significantly different from each other. None of the other pairwise comparisons showed significant effects. The results of the consumption tests administered after Test 2 indicated that a Pavlovian discrimination had been successfully established during aversion conditioning. All animals consumed all 10 unpoisoned reinforcers delivered, while the mean number of poisoned reinforcers consumed for each of the groups in Figs. 2 and 3 (reading from left to right) was 0.4, 0.3, 0.2, and 0.2, and no animal ate more than 2 of these pellets. Aversion conditioning did not proceed quite as rapidly as reported in other studies (e.g., Colwill & Rescorla, 1985a): On the last day of conditioning the mean number of poisoned reinforcers consumed ranged between 0.9 and 1.4. Differences between the two types of reinforcers both in terms of rates of aversion conditioning and test results were too small to warrant statistical analysis.

DISCUSSION

The major finding of this experiment is that, when one of two reinforcers of a response is signaled during training, the rate of responding in extinction relative to the rate of responding during training is lower following postconditioning devaluation of the unsignaled as opposed to the signaled reinforcer. This result is entirely consistent with the proposition that, when instrumental conditioning involves the presentation of two different reinforcers of a response, a cognitive structure is elaborated during training incorporating associations involving each reinforcer, and performance of the response in extinction is multiply determined by the current value of the encoded representation of each reinforcer as a function of the degree to which each is established during training as an associate.
The overall pattern of results obtained in the present study is clearly predicted from an overshadowing analysis: The greatest decrease in responding would be expected when one reinforcer was signaled and the other reinforcer was devalued, because performance of the response would depend primarily on its association with the devalued reinforcer; a moderate decrease in responding would be expected when neither reinforcer was signaled, because devaluation would reduce or eliminate one of the two sources of support for the response; and the smallest decrease in responding would be expected when the signaled reinforcer was devalued, because performance of the response would be a function primarily of its association with the nondevalued reinforcer. Although the effect of signaling a reinforcer has been assumed in this experiment to be a decrease in the degree to which the reinforcer enters an association critical to the generation of the response, an alternative to the view of the overshadowing manipulation as a learning-decrementing procedure has been advanced by Tarpy and his colleagues, who have suggested that this manipulation might well be a learning-incrementing procedure. On the basis of a variety of evidence, including both the finding that signaling reinforcement on DRL and DRH schedules produces appropriately lower response rates on the first schedule and higher rates on the second schedule (Tarpy & Roberts, 1985), and the finding that signaling reinforcement increases resistance to disruption, as measured by relative rates of responding during extinction following post-training satiation (Roberts, Tarpy, & Lea, 1984), these investigators have proposed that signaling reward results in enhanced rather than attenuated learning. The pattern of results in the present study, however, would not appear to support the view that the overshadowing manipulation is a learning-enhancing procedure. Under the conditions of the present experiment, at least, there was no evidence that the animals learned more about the reinforcer when it was signaled, whether it was devalued or nondevalued, and substantial evidence that they learned less. While the present pattern of results is consistent with theories of instrumental learning that emphasize the importance of the encoding of the reinforcer, as an associate, to the control of instrumental responding, these findings do not permit a decision to be made concerning the nature of the critical associative structure incorporating the representation of the reinforcer. One possibility is that separate associations are formed between a response and each reinforcer contingent on the response, and performance of the response depends on the net effect of all reinforcers, each contributing according to its current value weighted by the strength of the association between it and the response. This was the hypothesis entertained in undertaking the experiment, and is an extension of the basic response-reinforcer model (e.g., Mackintosh & Dickinson, 1979; St. Claire-Smith, 1979a, 1979b). Another possibility is that the separate
associations formed are between contextual stimuli and each response-contingent reinforcer (because of the correlation of such stimuli and each reinforcer during instrumental training), and performance of the response is motivated or controlled by Pavlovian conditioned responses that reflect the current value of each reinforcer weighted by the strength of the association between it and the contextual stimuli. This latter possibility is, of course, consistent with various theories that emphasize the role of conditioning to contextual stimuli in mediating instrumental performance (e.g., Rescorla & Solomon, 1967; Trapold & Overmier, 1972). While the present data do not permit rejection of either of these two possibilities, as pointed out by Colwill and Rescorla (1985b), the use of the within-subject design does permit rejection of any interpretation of selective effects of experimental manipulations based on the influence of general background cues. As these authors note, conditioning to general background stimuli cannot account for the observation of selective effects of reinforcer devaluation under these conditions. Similarly, the selective effects of the overshadowing manipulation cannot be attributed to interference with the degree of conditioning to general background cues. Such effects can only be accounted for if, during response training, the critical associations established were between the reinforcers and each response or, in a Pavlovian account, were between the reinforcers and the local or defining features of each response manipulandum. On this last point, Hall, Channell, and Pearce (1981) found that the demonstration of the overshadowing effect requires response-contingent presentations of the signal-food sequence. As noncontingent presentations of the sequence did not produce the effect, even though the signal remained a better predictor of food than the response, Hall et al. concluded that the effect of the overshadowing manipulation cannot be attributed to overshadowing of context-reinforcer associations. Finally, it has been pointed out that the overshadowing effect could result if signaled reinforcers are simply less effective than unsignaled reinforcers (e.g., Rescorla & Holland, 1982) in stamping in stimulus-response (S-R) connections. However, the nature of the interactions between the devaluation and overshadowing operations observed in the present experiment is not consistent with such a view, nor, more broadly, with the view that instrumental performance is determined by S-R connections. If the role of a reinforcer was simply to promote an association between a response and the stimulus situation, and signaling a reinforcer simply reduced its effectiveness as a catalyst in promoting this association, then in the present experiment there would have been no difference in the strength of the S-R connections established for the nose-poke and lever-press responses at the end of RI training and, therefore, no basis for the observation of differences in the performance of these responses as a function of whether the unsignaled or signaled reinforcer was the one subsequently devalued.
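A toy calculation may clarify the ordering of predicted devaluation effects derived above under the weighted-sum response-reinforcer account. The association strengths below are hypothetical placeholders (signaling is assumed simply to weaken the signaled reinforcer's association, and devaluation to reduce that reinforcer's value to zero); only the qualitative ordering, not the numbers, is meant to correspond to the analysis in the text.

# Net response strength = sum over reinforcers of (current value x association strength).
def net_strength(w_sucrose, w_food, devalued=None):
    value = {"S": 0.0 if devalued == "S" else 1.0,
             "F": 0.0 if devalued == "F" else 1.0}
    return w_sucrose * value["S"] + w_food * value["F"]

W_UNSIGNALED, W_SIGNALED = 1.0, 0.3  # hypothetical association strengths

baseline_signaled = net_strength(W_SIGNALED, W_UNSIGNALED)      # 1.3 before devaluation
baseline_control = net_strength(W_UNSIGNALED, W_UNSIGNALED)     # 2.0 before devaluation

# Response trained with sucrose signaled (overshadowed) and food unsignaled:
print(net_strength(W_SIGNALED, W_UNSIGNALED, "S") / baseline_signaled)   # ~0.77: smallest decrease
print(net_strength(W_SIGNALED, W_UNSIGNALED, "F") / baseline_signaled)   # ~0.23: largest decrease
# Control response with neither reinforcer signaled:
print(net_strength(W_UNSIGNALED, W_UNSIGNALED, "S") / baseline_control)  # 0.50: intermediate decrease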
REFERENCES

Adams, C. D. (1980). Post-conditioning devaluation of an instrumental reinforcer has no effect on extinction performance. Quarterly Journal of Experimental Psychology, 32, 447-458.
Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34B, 77-98.
Adams, C. D., & Dickinson, A. (1981). Actions and habits: Variations in associative representations during instrumental learning. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 143-165). Hillsdale, NJ: Erlbaum.
Bindra, D. (1974). A motivational view of learning, performance, and behavior modification. Psychological Review, 81, 199-213.
Bolles, R. C. (1972). Reinforcement, expectancy, and learning. Psychological Review, 79, 394-409.
Chen, J. S., & Amsel, A. (1980). Recall (versus recognition) of taste and immunization against aversive taste anticipations based on illness. Science, 209, 851-853.
Colwill, R. M., & Rescorla, R. A. (1985a). Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 11, 120-132.
Colwill, R. M., & Rescorla, R. A. (1985b). Instrumental responding remains sensitive to reinforcer devaluation after extensive training. Journal of Experimental Psychology: Animal Behavior Processes, 11, 520-536.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge: Cambridge Univ. Press.
Dickinson, A., Nicholas, D. J., & Adams, C. D. (1983). The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 35B, 249-263.
Dickinson, A., Peters, R. C., & Shechter, S. (1984). Overshadowing of responding on ratio and interval schedules by an independent predictor of reinforcement. Behavioural Processes, 9, 421-429.
Garrud, P., Goodall, G., & Mackintosh, N. J. (1981). Overshadowing of a stimulus-reinforcer association by an instrumental response. Quarterly Journal of Experimental Psychology, 33B, 123-135.
Guthrie, E. R. (1952). The psychology of learning. New York: Harper.
Hall, G. (1982). Effects of a brief stimulus accompanying reinforcement on instrumental responding in pigeons. Learning and Motivation, 13, 26-43.
Hall, G., Channell, S., & Pearce, J. M. (1981). The effects of a signal for free or for earned reward: Implications for the role of response-reinforcer associations in instrumental performance. Quarterly Journal of Experimental Psychology, 33B, 95-107.
Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. Journal of Experimental Psychology: Animal Behavior Processes, 3, 77-104.
Holman, E. W. (1975). Some conditions for dissociation of consummatory and instrumental behavior in rats. Learning and Motivation, 6, 358-366.
Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.
Kamin, L. J. (1968). "Attention-like" processes in classical conditioning. In M. R. Jones (Ed.), Miami symposium on predictability, behavior and aversive stimulation (pp. 9-33). Coral Gables, FL: Univ. of Miami Press.
Mackintosh, N. J. (1974). The psychology of animal learning. Orlando/London: Academic Press.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.
Mackintosh, N. J., & Dickinson, A. (1979). Instrumental (Type II) conditioning. In A. Dickinson & R. A. Boakes (Eds.), Mechanisms of learning and motivation: A memorial volume to Jerzy Konorski (pp. 143-169). Hillsdale, NJ: Erlbaum.
Morgan, M. J. (1974). Resistance to satiation. Animal Behaviour, 22, 449-466.
Morrison, G. R., & Collyer, R. (1974). Taste-mediated aversion to an exteroceptive stimulus following LiCl poisoning. Journal of Comparative and Physiological Psychology, 86, 51-55.
Pearce, J. M., & Hall, G. (1978). Overshadowing the instrumental conditioning of a lever-press response by a more valid predictor of reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 4, 356-367.
Rescorla, R. A., & Holland, P. C. (1982). Behavioral studies of associative learning in animals. Annual Review of Psychology, 33, 265-308.
Rescorla, R. A., & Solomon, R. L. (1967). Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 74, 151-182.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Roberts, J. M., Tarpy, R. M., & Lea, S. E. G. (1984). Stimulus-response overshadowing: Effects of signaled reward on instrumental responding as measured by response rate and resistance to change. Journal of Experimental Psychology: Animal Behavior Processes, 10, 244-255.
Rozeboom, W. W. (1958). "What is learned?"-An empirical enigma. Psychological Review, 65, 22-33.
Shettleworth, S. J. (1981). Reinforcement and the organization of behavior in golden hamsters: Differential overshadowing of a CS by different responses. Quarterly Journal of Experimental Psychology, 33B, 241-255.
St. Claire-Smith, R. (1979a). The overshadowing of instrumental conditioning by a stimulus that predicts reinforcement better than the response. Animal Learning and Behavior, 7, 224-228.
St. Claire-Smith, R. (1979b). The overshadowing and blocking of punishment. Quarterly Journal of Experimental Psychology, 31, 51-61.
St. Claire-Smith, R., & MacLaren, D. (1983). Response preconditioning effects. Journal of Experimental Psychology: Animal Behavior Processes, 9, 41-48.
Tarpy, R. M., Lea, S. E. G., & Midgley, M. (1983). The role of response-US correlation in stimulus-response overshadowing. Quarterly Journal of Experimental Psychology, 35B, 53-65.
Tarpy, R. M., & Roberts, J. E. (1985). Effects of signaled reward in instrumental conditioning: Enhanced learning on DRL and DRH schedules of reinforcement. Animal Learning and Behavior, 13, 6-12.
Tarpy, R. M., Roberts, J. E., Lea, S. E. G., & Midgley, M. (1984). The stimulus-response overshadowing phenomenon with VI versus FI schedules of reinforcement. Animal Learning and Behavior, 12, 50-54.
Tarpy, R. M., St. Claire-Smith, R., & Roberts, J. E. (1986). The effect of informational stimuli on instrumental response rate: Signaling reward versus signaling the availability of reward. Quarterly Journal of Experimental Psychology, 38B, 173-189.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplements, 2, 1-109.
Trapold, M. A., & Overmier, J. B. (1972). The second learning process in instrumental learning. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 427-452). New York: Appleton-Century-Crofts.
Wagner, A. R. (1969). Stimulus validity and stimulus selection in associative learning. In N. J. Mackintosh & W. K. Honig (Eds.), Fundamental issues in associative learning. Halifax, Canada: Dalhousie Univ. Press.
Williams, B. A. (1975). The blocking of reinforcement control. Journal of the Experimental Analysis of Behavior, 24, 215-227.
Williams, B. A. (1978). Information effects on the response-reinforcer association. Animal Learning and Behavior, 6, 371-379.
Williams, B. A. (1982). Blocking the response-reinforcer association. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Vol. 3. Acquisition (pp. 427-445). New York: Appleton-Century-Crofts.
Williams, B. A., & Heyneman, N. (1982). Multiple determinants of "blocking" effects on operant behavior. Animal Learning and Behavior, 10, 72-76.
Wilson, C. L., Sherman, J. E., & Holman, E. W. (1981). An aversion to the reinforcer differentially affects conditioned reinforcement and instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 7, 165-174.

Received July 25, 1986
Revised December 12, 1986