Serial spatial reversal learning in rats: Comparison of instrumental and automaintenance procedures


Physiology & Behavior, Vol. 50, pp. 1145-1151. PergamonPress plc, 1991. Printed in the U.S.A.

0031-9384/91 $3.00 + .00

Serial Spatial Reversal Learning in Rats: Comparison of Instrumental and Automaintenance Procedures

PHILIP J. BUSHNELL³ AND MARK E. STANTON

Neurotoxicology Division, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711

Received 24 September 1990

BUSHNELL, P. J. AND M. E. STANTON. Serial spatial reversal learning in rats: Comparison of instrumental and automaintenance procedures. PHYSIOL BEHAV 50(6) 1145-1151, 1991. Serial reversals of a spatial discrimination were trained in rats under automaintenance conditions, in which food reward occurred regardless of responding. This automaintained reversal learning was compared to instrumental reversal learning in other rats trained under a similar procedure which required responding for reward. In the automaintenance (AU) procedure, rats received food after every retraction of a "positive" response lever (S+); retraction of a second, "neutral" lever (S°) was not paired with food delivery. Responses to the S+ were elicited at fairly constant rates during daily 100-trial conditioning sessions. Responses to the S° occurred early in each session but rapidly diminished across trials. When the valences of the levers were reversed, responding shifted to the new S+ and diminished on the new S°. Criterion for reversal was defined as a discrimination ratio (DR) of at least 90% responding to the S+ in two consecutive 10-trial blocks. With repeated reversals, acquisition of criterion performance occurred with increasing rapidity, reaching an asymptote below that required for the original discrimination. A second group of rats was trained on a similar instrumental schedule, in which at least one response to the S+ was required for food delivery. Response rates in this instrumental (IN) group were approximately double those of the AU group. However, ratios of S+ to S° response rates were similar to those of the AU group, and the serial reversal curves generated were qualitatively similar.
Thus rats can show improvement across serial reversals of a spatial discrimination based entirely on pairings of stimulus events (automaintenance), in a manner similar to that observed in instrumental procedures, in which reward is contingent upon correct responding.

Automaintenance; Instrumental conditioning; Pavlovian conditioning; Repeated acquisition; Serial reversals; Spatial discrimination

Rat; Reversal learning

TRADITIONALLY, studies of reversal learning have employed instrumental learning procedures; studies of serial reversal learning that have used other conditioning procedures are rare or nonexistent. Reversal of classically conditioned nictitating membrane responses in rabbits has been used to evaluate the role of hippocampal lesions (14), but only with a single reversal. Since Pavlovian learning procedures have facilitated solution of theoretical problems in the analysis of instrumental (operant) discrimination learning procedures [e.g., (12,19)], analysis of serial reversal learning may also benefit from this approach. It was therefore of interest to determine whether the changes in rate of reversal learning typical of instrumental tasks could be obtained using a classical conditioning approach.

Automaintenance, in which responding to a signal for reward is elicited by pairing the signal consistently with the occurrence of that reward, was used for this purpose. In typical studies of this type, rats press a single lever, the retraction of which reliably precedes food delivery; these presses are maintained by the pairing of the lever and the food (9,18). In contrast to instrumental tasks, no response is required for delivery of food; thus this conditioning procedure lacks any response-reward contingency, and relies instead upon stimulus-reward pairings to elicit responses from the animal.

We have shown previously that automaintained responding can be used as an index of reversal learning (3,4). In this work, two levers were repeatedly inserted into a test chamber for a brief time interval; the retraction of one lever reliably preceded pellet delivery, while that of the second did not. Under these conditions, rats emitted many responses to the first lever (S+) and few if any to the second lever. Because its retraction was temporally uncorrelated with pellet delivery, this second lever was called an S° (rather than an S-). When the contingencies between the levers were reversed, the rat's behavior shifted toward the new S+ and away from the new S°, resulting in an

¹ The research described in this article has been reviewed by the Health Effects Research Laboratory, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

² A portion of these data was presented at the annual meeting of the Society for Neuroscience, Phoenix, AZ, November, 1989.

³ Requests for reprints should be addressed to Philip J. Bushnell, Ph.D., Neurotoxicology Division, MD-74B, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711.


BUSHNELL AND STANTON


automaintained reversal. After a number of reversals, acquisition rates reached a steady state, in which a reversal was completed within 30-50 trials of a 100-trial session. Since reward was not contingent upon the occurrence of a response, differential reinforcement for choice accuracy did not occur. Previous reports with this automaintenance procedure utilized asymptotic reversal performance as a baseline against which the effects of p-xylene (3) and trimethyltin (4) were evaluated.

The purposes of this paper were 1) to demonstrate that repeated reversals of a spatial discrimination under automaintenance conditions yield progressive improvement in learning individual reversals, and 2) to compare reversal learning in this task with that generated by an analogous instrumental procedure.

METHOD

Subjects

Thirteen male Long-Evans rats (Charles River, Raleigh, NC), 5 to 6 months of age at the beginning of behavioral training, were housed individually in suspended plastic cages on heat-treated, shaved-pine bedding, under a 12:12-h L:D photoperiod with light onset at 0600 h. All testing occurred during the light phase of the cycle. Each animal was maintained at 350 g body weight by scheduled food (Ralston Purina, St. Louis, MO) delivery 1 to 6 h after daily behavioral testing; water was available ad lib in the home cage.

The two groups of rats consisted of the control groups from two ongoing studies of the effects of repeated oral styrene exposure on behavior. All rats in this study received daily oral doses of corn oil vehicle (1.0 ml/kg b.wt.), 5 days/week for 8 weeks, ending 2 months prior to the beginning of reversal training. Testing of the instrumental group (IN: n = 7) preceded that of the automaintained group (AU: n = 6) by 3 months.

Apparatus

Eight standard rat operant conditioning chambers (Coulbourn Instruments, Lancaster, PA) were each equipped with two retractable response levers 3.4 cm in width, mounted with inside edges 13 cm apart on one wall of the chamber. The levers were extended and retracted pneumatically with travel time of approximately 200 ms. The levers were modified to register depressions of less than 0.30 N; within a given chamber, the force varied by no more than 0.05 N. A cue light was mounted immediately above each lever. A food cup with a swinging plastic door (Campden Instruments, via Stoelting, Chicago, IL) was centered between the levers. A microswitch attached to the door registered nosepokes into the food cup. Each chamber was located in a sound-attenuating shell within which white noise (80 dB with flat attenuation, measured at the opening of the food cup) was provided. Reinforcers were 45-mg pellets (Bio-Serv, Frenchtown, NJ). Control of stimuli and recording of responses were accomplished by computer (PDP8/a, Digital Equipment, Maynard, MA) and SKED interface with SUPERSKED software (State Systems, Kalamazoo, MI).

Initial Training

Rats were trained to press one lever using a combined autoshaping-operant protocol described elsewhere (3). In brief, the lever was extended into the operant chamber and lit by its cue light. In the absence of a leverpress, the lever was retracted after 15 s and a pellet was delivered to the food cup. A leverpress during the 15-s period caused immediate retraction of the lever and pellet delivery. Trials were separated by a variable-time 45-s intertrial interval (ITI).

Reversal Learning

Instrumental group: Single-lever training. Each rat reliably pressed the response lever for food on the autoshaping-operant schedule within 18 50-trial sessions. Beginning with the 19th session, the lever remained inserted for the full 15-s trial period, regardless of the occurrence of a response. If at least one press (R+) occurred, a pellet was delivered upon retraction of the lever at the end of the 15-s period. If no press occurred, no pellet was delivered. Twenty 100-trial sessions on this schedule were administered, followed by a series of 10 sessions involving extinction (data not reported).

Instrumental group: Two-lever training. Beginning with the 53rd test session, the second lever (S°) was extended into the test chamber. The contingencies between the S+, R+, and pellet delivery continued unchanged. The S° was presented on a schedule with the same intervals as that of the S+, but S° presentations were temporally uncorrelated with the S+ schedule and with pellet delivery. Since pellet delivery was randomly timed with respect to the S°, it could by chance follow retraction of the S° on some trials. Each press to the S° (R°) was recorded but had no programmed consequence. The average value of the interstimulus intervals for both S+ and S° was reduced to 30 s (range, 1.6 to 99.1 s) at this time as well. Five 100-trial sessions on this schedule served to define the original discrimination (OD) between the S+ and the S°.

Automaintenance group: Single-lever training. When all rats reliably emitted leverpresses (16 50-trial sessions), the schedule was modified such that a leverpress no longer caused lever retraction, and a pellet was delivered at the end of the 15-s period regardless of the rat's behavior. As for the IN group, daily sessions were 100 trials on this automaintenance schedule, which remained in effect for 16 sessions. A series of tests of the effects of extinction (8 sessions) and changes in probability of reinforcement (12 sessions) followed (data not reported).

Automaintenance group: Two-lever training. Beginning with the 58th test session, the S° was presented for the first time to the AU group. It was extended and retracted in an identical manner to that described above for the S° in group IN. The average interstimulus interval was reduced to 30 s for this group at this time as well. Also like the IN group, each animal received five 100-trial sessions under these conditions, which served to define the OD.

Reversals

A reversal was defined as a change in the designation of the S+ and S°. At the beginning of a reversal session, the lever which was previously the S+ was designated the S°, and vice versa. No cue for this change was provided to the animal. The first reversal was programmed at the beginning of the 6th two-lever session for both groups. Lever designations were changed after all animals reached criterion (see below) on a given reversal, for a total of 16 reversals. Reversal 1 consisted of 500 trials, Reversal 2 of 600 trials, Reversal 3 of 400 trials, and succeeding reversals of progressively fewer trials, to a minimum of 100 trials (1 session). The OD and Reversals 1 to 9 were used to demonstrate the improvement in the rate of reversal learning as asymptotic learning was approached, and Reversals 11 to 16 were used to compare the groups' reversal performance after asymptotic reversal performance had been achieved.
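The reward rules of the two procedures, and the swap of lever designations at a reversal, can be summarized in a few lines of code. The following is only an illustrative sketch in Python: the names `Levers`, `pellet_delivered`, and `reverse_levers` are hypothetical and have nothing to do with the SKED/SUPERSKED programs actually used, and the timing of stimulus presentations is ignored.

```python
from dataclasses import dataclass


@dataclass
class Levers:
    """Current designation of the two levers (hypothetical labels)."""
    s_plus: str  # lever whose retraction is paired with pellet delivery
    s_zero: str  # lever whose retraction is uncorrelated with pellets


def pellet_delivered(procedure: str, presses_on_s_plus: int) -> bool:
    """Decide whether a pellet follows retraction of the S+ on one trial.

    AU: food follows every S+ retraction, regardless of responding.
    IN: food follows only if at least one press (R+) occurred.
    Presses on the S0 have no programmed consequence in either procedure,
    so they are not an input to this function.
    """
    if procedure == "AU":
        return True
    if procedure == "IN":
        return presses_on_s_plus >= 1
    raise ValueError(f"unknown procedure: {procedure!r}")


def reverse_levers(levers: Levers) -> Levers:
    """A reversal: the former S+ becomes the S0 and vice versa."""
    return Levers(s_plus=levers.s_zero, s_zero=levers.s_plus)
```

For example, `pellet_delivered("IN", 0)` is `False` while `pellet_delivered("AU", 0)` is `True`; this is the only programmed difference between the two groups.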

Dependent Measures

Response frequencies to each lever were summed across 10-trial blocks for each rat. These response frequencies, f(R+) and f(R°), were examined across trial blocks and sessions to quantify changes in response tendency toward the S+ and S°. A Discrimination Ratio (DR) was calculated from these frequencies as the proportion of total responses in a given trial block directed toward the S+, i.e., $DR = f(R^{+}) / [f(R^{+}) + f(R^{\circ})]$. The learning criterion for each reversal was a DR at or exceeding 0.9 for two consecutive 10-trial blocks.

To characterize improvement in reversal learning, the number of trials to reach this criterion was counted for each rat. Trials to criterion were then averaged across rats within groups and plotted as a function of reversal.

To characterize asymptotic reversal learning, daily reversals of the two groups were compared after asymptotic reversal performance had been achieved (Reversals 11 to 16). To quantify reversal acquisition at this stage of learning, parameters were estimated from these sessions by fitting nonlinear functions to DR values across 10-trial blocks for each rat within each reversal session. The equation of Hull (11) was modified to calculate an asymptote A, a deviation from asymptote D, and a (negative exponential) learning rate R. The equation (Equation 1) has the form $DR = A - D \cdot 10^{-Rx}$, where DR is the Discrimination Ratio (y value) calculated for each 10-trial block (x). X values were designated at the midpoint of each block (i.e., given values of 5, 15, 25 . . . 95 for the 100-trial session). With perfect discrimination, A is unity; the greater the deviation from A at the start of a reversal, the larger the absolute value of D; the faster the return to asymptote, the larger the absolute value of R (typically ranging from 0.01 to 0.20).
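The DR and the trials-to-criterion count described above can be sketched as follows. This is a minimal Python illustration, not the authors' analysis code. Two details are assumptions that the text does not specify: a block with no presses on either lever is treated as having an undefined DR that cannot satisfy the criterion, and trials are counted through the end of the second criterion block.

```python
from typing import Optional, Sequence, Tuple

BLOCK_SIZE = 10        # trials per block, as in the text
CRITERION_DR = 0.9     # DR at or exceeding 0.9
CRITERION_BLOCKS = 2   # two consecutive blocks


def discrimination_ratio(f_plus: int, f_zero: int) -> Optional[float]:
    """DR = f(R+) / [f(R+) + f(R0)] for one trial block."""
    total = f_plus + f_zero
    if total == 0:
        return None  # assumption: DR is undefined when no presses occurred
    return f_plus / total


def trials_to_criterion(blocks: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Count trials until the DR criterion is met.

    `blocks` holds (f(R+), f(R0)) pairs for successive 10-trial blocks.
    Returns None if the criterion is never met. Assumption: the count runs
    through the end of the second consecutive criterion block.
    """
    run = 0
    for index, (f_plus, f_zero) in enumerate(blocks):
        dr = discrimination_ratio(f_plus, f_zero)
        if dr is not None and dr >= CRITERION_DR:
            run += 1
        else:
            run = 0
        if run == CRITERION_BLOCKS:
            return (index + 1) * BLOCK_SIZE
    return None


# Hypothetical response counts for the first five blocks after a reversal.
example_blocks = [(2, 18), (10, 10), (28, 2), (30, 2), (31, 1)]
example_trials = trials_to_criterion(example_blocks)
```

With these made-up counts, the third and fourth blocks are the first two consecutive blocks at or above 0.9, so `example_trials` is 40.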

Statistical Analyses

Analysis of variance [ANOVA: SAS General Linear Model (15)] with groups as a between-subject factor and reversals as a within-subject factor was used to assess improvement in reversal learning (trials to criterion). Frequencies of R+ and R°, summed within trial blocks for the original discrimination, Reversal 1, and Reversals 11 to 16 combined, were subject to similar analyses with blocks and days as repeated measures. Group differences in the parameters of the nonlinear acquisition function, A, D, and R, were evaluated with t-tests. Greenhouse-Geisser df corrections were applied as necessary to repeated-measures factors in all ANOVAs; nominal df and correction factors (ε) are reported for the F ratios obtained. The overall α level for each ANOVA or t-test was 0.05.

RESULTS

Reversal Training: Procedural Limitations

During acquisition of Reversal 2, 2 of the 7 IN group rats ceased responding to the S° and did not begin responding to the S+. This extinction was remedied by continuing reversal training with 4 sessions using the AU procedure. Delivery of pellets in the absence of a R+ reinstated responding to the S+ in these rats. The animals were then returned to the IN procedure for the remainder of the study; however, their data were excluded from all analyses, leaving 5 rats in the IN group.

Also during acquisition of Reversal 2, 2 of the 6 AU rats' leverpress rates fell close to zero. Observation of the animals showed that they attended to the S+ and engaged in vigorous exploratory activity in its immediate vicinity, but did not emit detectable leverpress responses. In an attempt to shape their behavior to a topography more amenable to detection by the lever, these rats were placed on a single-lever variable-ratio (VR) schedule (1 day at VR5 followed by 1 day at VR10). Both rats responded appropriately on the VR schedule and continued to respond when returned to the AU schedule. However, their rates never approached those of the other rats in the group, and their behavior toward the manipulandum tended to drift back to its preferred mode over time. Their data, too, were excluded from all analyses, leaving 4 rats in the AU group.

Reversal Learning: Trials to Criterion

The original discrimination (OD) was acquired in an equivalent number of trials by both groups (Fig. 1), and learning improved across reversals in both groups. However, different patterns of improvement were obtained: the IN group required far fewer trials to reach criterion on Reversal 1 than did the AU group, but the AU group reversed more quickly later (see below). Overall analysis of trials to criterion on Reversals 1 to 9 revealed a significant effect of Reversal, F(9,63)=14.47, ε=0.345, p<0.0001, and a significant Group × Reversal interaction, F(9,63)=4.96, ε=0.345, p<0.0085; the effect of Group was not significant, F(1,7)<1. The IN group required fewer trials to reach criterion than did the AU group on Reversal 1, F(1,7)=11.58, p<0.02. In contrast, performance of the AU group was superior to that of the IN group late in the reversal series. Averaged across Reversals 11 to 16, the AU group required 32.1±2.9 trials to reach criterion, while the IN group required 52.7±8.0 trials, t(7)=2.42, p<0.05.
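Because nominal df are reported together with the Greenhouse-Geisser correction factor ε, the corrected df against which each F ratio was evaluated can be recovered by multiplying both the effect and the error df by ε. The short Python sketch below illustrates this arithmetic for the Reversal effect reported above; the function name `corrected_df` is hypothetical, and the sketch does not reproduce the SAS computations or the p values.

```python
from typing import Tuple


def corrected_df(df_effect: float, df_error: float, epsilon: float) -> Tuple[float, float]:
    """Greenhouse-Geisser correction: scale both df by epsilon (0 < epsilon <= 1)."""
    if not 0.0 < epsilon <= 1.0:
        raise ValueError("epsilon must lie in (0, 1]")
    return epsilon * df_effect, epsilon * df_error


# Reversal effect on trials to criterion: F(9,63), epsilon = 0.345
reversal_df = corrected_df(9, 63, 0.345)
```

Here `reversal_df` is approximately (3.1, 21.7), so the reported F of 14.47 is evaluated against roughly 3 and 22 df rather than the nominal 9 and 63.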

Reversal 1: Response Frequencies and Discrimination Ratios

Acquisition of Reversal 1 can be seen in Fig. 2 as a rise in R+ frequency and a fall in R° frequency across trials and sessions (five 10-block sessions are shown) for both groups. R+ frequency rose in both groups, F(49,343)=5.40, ε=0.070, p<0.005, but remained significantly lower in the AU group than in the IN group throughout the reversal [main effect of Group, F(1,7)=33.93, p<0.0006; Group × Block interaction, F(49,343)=1.71, ε=0.070, p>0.15]. R° frequency fell across trials in both groups [main effect of Block, F(49,343)=12.03, ε=0.066, p<0.0001] and remained statistically equivalent throughout the reversal, as neither the main effect of Group, F(1,7)=1.14, p>0.30, nor the Group × Block interaction, F(49,343)=1.3, ε=0.066, p>0.20, was significant.

Nevertheless, as measured by trials to criterion (Fig. 1) and discrimination ratios, Reversal 1 was acquired more slowly by the AU group than by the IN group. DR values across the reversal (Fig. 3) were higher for the IN group overall, F(1,7)=6.33, p<0.04, rose across blocks for both groups, F(49,343)=13.13, ε=0.076, p<0.0001, but rose more slowly for the AU group than for the IN group [Group × Block interaction, F(49,343)=3.83, ε=0.076, p<0.016].

FIG. 1. Serial reversal learning curves obtained from the Automaintained (open circles) and Instrumental (filled circles) groups. Values are mean ± SEM trials to criterion [criterion = 2 consecutive 10-trial blocks with a discrimination ratio (DR) ≥0.90] plotted as a function of Reversal number. OD stands for Original Discrimination.

Asymptotic Reversal Performance

After Reversal 10, all rats completed reversals within one or two 100-trial sessions (cf. Fig. 1). Performance at this stage of training is illustrated as response frequencies (Fig. 4) and DRs (Fig. 5) as a function of trial blocks, averaged across Reversals 11 to 16. The ANOVA on R+ frequencies showed that they remained lower in the AU group than in the IN group on all but the first trial block; that is, while the main effects of Group, F(1,7)=4.57, p<0.07, and Block, F(9,63)=1.95, ε=0.198, p>0.18, did not reach statistical significance, the Group × Block interaction did, F(9,63)=4.45, ε=0.198, p<0.04. Rates of responding to the S° differed between the groups more dramatically. The ANOVA on R° frequency showed that the IN group responded at a higher rate than did the AU group overall, F(1,7)=13.59, p<0.008, and their rate of R° responding fell at a slower rate than did that of the AU group as well [Group × Block interaction, F(9,63)=10.31, ε=0.130, p<0.02].

That the AU group reversed faster in Reversals 11 to 16 is clear from the relative changes in DR across trial blocks for the two groups (Fig. 5). This difference is seen as a significantly higher learning rate parameter (R in Equation 1) for the AU group, 0.114±0.034, versus 0.029±0.002 for the IN group, F(1,7)=8.03, p<0.03. No significant differences between the groups were found for these reversals in either asymptotic DR values (A in Equation 1: 0.972±0.015 for the AU group versus 0.977±0.018 for the IN group), or deviations from asymptote (D in Equation 1: 0.748±0.079 for the AU group versus 0.769±0.165 for the IN group).
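To make the fitted parameters concrete, the sketch below shows one way a curve of the form DR = A - D·10^(-Rx) could be fitted and then evaluated with the group means reported in the text. It is an illustrative Python sketch under stated assumptions, not the authors' fitting procedure, which the text does not describe beyond the equation: here R is chosen by a simple grid search (0.001 to 0.300), and for each candidate R the values of A and D follow from ordinary linear least squares, since the model is linear in 10^(-Rx). All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def learning_curve(x: float, a: float, d: float, r: float) -> float:
    """Equation 1: DR = A - D * 10**(-R * x)."""
    return a - d * 10.0 ** (-r * x)


def fit_learning_curve(
    xs: Sequence[float],
    ys: Sequence[float],
    r_grid: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float]:
    """Return (A, D, R) minimizing squared error over a grid of R values."""
    if r_grid is None:
        r_grid = [i / 1000.0 for i in range(1, 301)]  # assumed search range
    n = len(xs)
    mean_y = sum(ys) / n
    best: Optional[Tuple[float, float, float, float]] = None
    for r in r_grid:
        zs = [10.0 ** (-r * x) for x in xs]
        mean_z = sum(zs) / n
        szz = sum((z - mean_z) ** 2 for z in zs)
        if szz == 0.0:
            continue
        szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
        slope = szy / szz            # slope of DR on z equals -D
        a = mean_y - slope * mean_z  # intercept equals A
        d = -slope
        sse = sum((y - (a - d * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, d, r)
    if best is None:
        raise ValueError("no usable rate value in r_grid")
    return best[1], best[2], best[3]


def trials_to_reach(target: float, a: float, d: float, r: float) -> float:
    """Solve A - D * 10**(-R * x) = target for x (in trials)."""
    if not (a - d < target < a):
        raise ValueError("target must lie between A - D and A")
    return -math.log10((a - target) / d) / r


# Block midpoints used in the text: 5, 15, ..., 95
midpoints = [5 + 10 * i for i in range(10)]

# Noise-free curve generated from the reported AU group means, then refitted.
au_curve = [learning_curve(x, 0.972, 0.748, 0.114) for x in midpoints]
a_hat, d_hat, r_hat = fit_learning_curve(midpoints, au_curve)

# Trials at which each mean-parameter curve crosses DR = 0.9.
au_cross = trials_to_reach(0.9, 0.972, 0.748, 0.114)
in_cross = trials_to_reach(0.9, 0.977, 0.769, 0.029)
```

With the reported means, the AU curve crosses DR = 0.9 after about 9 trials and the IN curve after about 34 trials. These crossing points are not the trials-to-criterion values reported earlier (32.1 and 52.7), which require two consecutive whole blocks at criterion and were computed per rat; they only illustrate how a fourfold difference in R translates into the speed of approach to asymptote.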

DISCUSSION

These results demonstrate that rats improved in their rate of reversal acquisition across a series of spatial reversals when responding was maintained strictly by classically conditioned pairing of conditioned and unconditioned stimuli. Addition of an instrumental contingency, i.e., requiring the rats to respond to the S+ for food reward, increased response rates and changed the shape of the serial reversal curve in a quantitative, but not qualitative, manner (Fig. 1).

FIG. 2. Acquisition of Reversal 1: Response frequencies. Values are mean frequencies of responses to the S+ (circles) and S° (squares) for the Automaintained (open symbols) and the Instrumental (filled symbols) groups plotted across the 10 10-trial blocks of five daily sessions.

FIG. 3. Acquisition of Reversal 1: Discrimination ratios (DRs). Values are mean DRs for the Automaintained (open circles) and Instrumental (filled circles) groups plotted across the 10 10-trial blocks of five daily sessions.

Reversal learning and improvement across serial reversals have been described for instrumental procedures, in which reward is dependent upon emission of a particular correct response. Discrimination reversal based on classically conditioned responding has been described for nictitating membrane responses in rabbits (1,14) and for observational learning in rats (7). The present data are, however, the first to describe and characterize serial reversal learning based only on stimulus pairings, i.e., without an operant contingency between response and reward. The serial reversal curves in Fig. 1 resemble those previously reported for single-response procedures in mammals [e.g., (2, 5, 10, 16, 17)]. Thus the improvement in learning across reversals does not require standard conditions of discrimination reversal training, but rather generalizes both to an instrumental procedure in which multiple responses to each presentation of the S+ are

FIG. 4. Acquisition of Reversals 11-16: Response frequencies. Values are mean ± SE response frequencies, with symbols as in Fig. 2. Response frequencies for each rat were averaged across the 6 reversals and within training groups, and plotted across 10-trial blocks within the averaged reversal.

FIG. 5. Acquisition of Reversals 11-16: Discrimination Ratios (DRs). DRs for each rat were calculated and plotted as in Fig. 4. Points represent mean ± SE DRs at each 10-trial block, with symbols as in Fig. 3. Learning curves were fit to the data from each rat, using Equation 1; the plotted curves were drawn using averaged asymptote (A), deviation (D), and rate (R) parameters. Curves shown in the figure: Automaintenance, $DR = 0.972 - 0.748 \cdot 10^{-0.114X}$; Instrumental, $DR = 0.977 - 0.769 \cdot 10^{-0.029X}$.

recorded prior to reinforcement delivery, and also to an automaintenance procedure, in which responding to the S+ is not necessary for reinforcement.

The serial reversal curve obtained from the AU animals also resembles closely those observed in previous automaintenance studies from this laboratory, from which asymptotic reversal data were used to evaluate the effects of chemicals on learning (3,4). Thus the present data are not unique to this particular set of animals. Moreover, the animals in the prior studies did not experience trials without reward (extinction or reduced probability of reinforcement for presentations of a single response lever) before serial reversal training began. The similarity of the results from these studies indicates that this aspect of the present training history had little impact on reversal acquisition.

Despite qualitative similarity in the serial reversal curves, important quantitative differences were also apparent: the lack of response contingency in the automaintenance task slowed acquisition of Reversal 1, but enhanced asymptotic reversal learning (acquisition of Reversals 11 to 16), relative to the instrumental task. The source of these effects on the trials-to-criterion measure can be seen by examining changes in response rates and discrimination ratios at these phases of learning.

During acquisition of Reversal 1 (Fig. 2), the IN group shifted responding both to the new S+ and from the new S° more quickly than did the AU group, and responding to the S+ by the IN group increased simultaneously with reduction in responding to the S°. In contrast, the AU group first appeared to stop responding to the S°, and only after 2 sessions of very low rates of response to either stimulus did they begin to respond appreciably to the S+. Thus the major difference in acquisition of Reversal 1 involved acquisition of the new R+, rather than extinction of the old R°. Indeed, spontaneous recovery of R° at the beginning of each postreversal test session (seen as response rates on trial blocks 11, 21, 31, and 41 of Fig. 2) was equivalent under the two training conditions for several days. In contrast to the similarity of R° rates, R+ rates of the IN animals reached prereversal levels by the beginning of the second session, while those of the AU group did not do so until the fourth session. Clearly, the fact that reward was contingent upon responding to the S+ in the IN condition increased the rate at which R+s were emitted by the IN group compared to the AU group.

Acquisition rates across Reversals 11 to 16 again differed between the two groups (Figs. 4 and 5); at this stage of training, however, the AU group reversed faster than the IN group. Figure 4 shows that R+ rates exceeded R° rates in the AU group even at the beginning of the reversal, suggesting that these rats acquired the discrimination during the first 10-trial block. In contrast, IN animals generally required 3 blocks of 10 trials to achieve DRs similar to those of AU animals (Fig. 5). Unlike Reversal 1, the difference in reversal acquisition at this stage of training can be attributed to slower extinction of R° by the IN rats compared to the AU rats. Perhaps IN rats maintained higher response rates to both S+ and S° at the outset of each session to ensure emission of correct responses, and hence reward, on each trial. Extinction of R° then followed only when the relationship between S+ and reward (the "correctness" of the response) was clear from the pairing of the retraction of S+ with pellet delivery.

Derivation of Discrimination Ratios (DRs) from frequency data provided a means of quantifying reversal acquisition both during Reversal 1 (Fig. 3) and after asymptotic reversal rates had been achieved (Fig. 5). These ratios also provided the metric for reversal criterion and thus the trials-to-criterion measure for quantifying the serial reversal curve itself (Fig. 1). Finally, the DR reduced large differences between animals in overall response rate. These within-group rate differences obscured the acquisition of Reversal 1 in a statistical sense: if examined only on the basis of rate, no difference in acquisition was evident. However, rats with high R+ rates also tend to have high R°


rates; thus conversion of rate data to DR reduced the within-group variability and permitted the acquisition difference to emerge.

Response frequencies also provided important information regarding reversal acquisition, by identifying independent response tendencies to the S+ and the S°. Measurement of response frequencies to both stimuli enables one to view independently the processes of formation of the S+-reward association and extinction of the S°-reward association across trials (Figs. 2 and 4). This information is typically not available from single-response choice behavior, upon which most discrimination reversal studies are based, and from which the processes of reversal learning have in the past been inferred. Collecting multiple responses to each stimulus on each trial thus provides a direct index of the associative strength of each stimulus at each stage of training. From these data, it can be determined that the major determinant of asymptotic reversal rate is the decline in responding to S°, while changes in S+ responding play a lesser role [Fig. 4; see also (3,4)]. Olton and Samuelson (13) have proposed another approach to this issue, using response latencies to enrich interpretation of maze learning data.

The function fitted to the DR curves provides a quantitative index of acquisition which permits one to identify three possible differences in learning. In some cases, the asymptotic discrimination (A) is affected (4); in the present case, the rate of approach to that asymptote (R) was affected. These parameters may find application in studies of the effects of neurobiological manipulations on acquisition, as well as in dissociating the effects of environmental cues and contingencies.

Faster learning across repeated problems is the hallmark of a learning set, as defined by Harlow (8). Serial reversal learning shows this primary property of the learning set, but differs from

the original definition by being based upon repeated reversal of a single discrimination rather than upon learning a series of novel discriminations, with or without reversal. It has nevertheless been termed a "reversal learning set" in the literature [e.g., (5, 10, 16)]. The facts 1) that improvement occurs across reversals, 2) that asymptotic reversal acquisition often requires but a single trial in a standard discrimination format [e.g., (5)], and 3) that serial reversal training transfers positively to a standard multiproblem discrimination learning set (20) all argue that serial reversal learning reflects a learning set phenomenon. Regardless of the terminology, it should be noted that this type of "learning to learn" (8) has not to our knowledge been demonstrated previously with classical conditioning procedures.

In addition to their phenomenological interest, the present data may also have practical value. Instrumental learning procedures cause omission of reinforcement for incorrect responding, which generates differential reward across treatment groups if the groups differ in their choice accuracy. While it provides some of the impetus for behavior change, differential reward can be confounded with treatment, if the treatment reduces response accuracy. Intercalating reward between choice trials after incorrect responses can overcome this problem (6) at the cost of slowed acquisition rate, loss of efficiency, and additional interpretive problems. The automaintenance task, by eliminating the response-reward contingency and delivering food after every trial regardless of responding, provides another means by which this confound can be avoided.

ACKNOWLEDGEMENTS

We thank D. Dunn, D. New and K. Riggsbee for expert technical assistance, and D. A. Eckerman and M. Picker for reviews of the manuscript.

REFERENCES

1. Berger, T. W.; Orr, W. B. Hippocampectomy selectively disrupts discrimination reversal conditioning of the rabbit nictitating membrane response. Behav. Brain Res. 8:49-68; 1983.
2. Bitterman, M. E. Phyletic differences in learning. Am. Psychol. 20:396-410; 1965.
3. Bushnell, P. J. Behavioral effects of acute p-xylene inhalation in rats: Autoshaping, motor activity, and reversal learning. Neurotoxicol. Teratol. 10:569-577; 1988.
4. Bushnell, P. J. Delay-dependent impairment of reversal learning in rats treated with trimethyltin. Behav. Neural Biol. 54:75-89; 1990.
5. Bushnell, P. J.; Bowman, R. E. Reversal learning deficits in young monkeys exposed to lead. Pharmacol. Biochem. Behav. 10:733-742; 1979.
6. Bushnell, P. J.; Henry, K. R.; Bowman, R. E. The intercalated reinforcement technique: Learning without total differential reinforcement between groups. Behav. Res. Methods Instr. 5:337-339; 1973.
7. Denny, M. R.; Bell, R. C.; Clos, C. Two-choice, observational learning and reversal in the rat: S-S versus S-R effects. Anim. Learn. Behav. 11:223-228; 1983.
8. Harlow, H. F. The formation of learning sets. Psychol. Rev. 56:51-65; 1949.
9. Hearst, E.; Jenkins, H. M. Sign-tracking: The stimulus-reinforcer relation and directed action. Monogr. Psychon. Soc.; 1974.
10. Holmes, E. J.; Butters, N.; Jacobson, S.; Stein, B. M. An examination of the effects of mammillary-body lesions on reversal learning sets in monkeys. Physiol. Psychol. 11:159-165; 1983.
11. Hull, C. L. Principles of behavior. New York: Appleton-Century-Crofts; 1943:119-122.
12. Mackintosh, N. J. Conditioning and associative learning. Oxford: Oxford University Press; 1983.
13. Olton, D. S.; Samuelson, R. Decision making in the rat: Response-choice and response-time measures of discrimination reversal learning. J. Comp. Physiol. Psychol. 87:1134-1147; 1974.
14. Orr, W. B.; Berger, T. W. Hippocampectomy disrupts the topography of conditioned nictitating membrane responses during reversal learning. Behav. Neurosci. 99:35-45; 1985.
15. SAS. SAS user's guide: Statistics. Version 5 edition. Cary, NC: SAS Institute; 1985.
16. Slotnick, B. M. Olfactory stimulus control in the rat. Chem. Senses 9:157-165; 1984.
17. Stephens, D. N.; Weidmann, R.; Quartermain, D.; Sarter, M. Reversal learning in senescent rats. Behav. Brain Res. 17:193-202; 1985.
18. Terrace, H. S. Autoshaping and two-factor learning theory. In: Locurto, C. M.; Terrace, H. S.; Gibbon, J., eds. Autoshaping and conditioning theory. New York: Academic Press; 1981:1-18.
19. Wagner, A. R. Incidental stimuli and discrimination learning. In: Gilbert, R. M.; Sutherland, N. S., eds. Animal discrimination learning. London: Academic Press; 1969.
20. Warren, J. M. Reversal learning and the formation of learning sets by cats and rhesus monkeys. J. Comp. Physiol. Psychol. 61:421-428; 1966.