Variable reward and choice behavior of rats


LEARNING AND MOTIVATION 1, 276-280 (1970)

Variable Reward and Choice Behavior of Rats¹

T. GARY WALLER
University of Waterloo

Reward conditions for a correct response were varied for three groups of rats in a T maze. Groups 2-0 and 4-0 received 50% reward for correct responses with two and four pellets, respectively. Group 4-2 received four pellets for 50% of the correct responses and two pellets for the other 50%. Performance of Group 4-2 was predictable from performance of Groups 4-0 and 2-0.

The behavior of rats given a choice in a T maze between unequal amounts of reward can be predicted from the behavior of rats given a choice between something and nothing. For example, Clayton (1964) predicted the behavior of rats given a choice between four and two pellets from the behavior of two other groups of rats, one group given a choice between four and no pellets and another group given a choice between two and no pellets. The prediction procedure first required an index of acquisition rate, J, according to the equation:

$$J = \frac{q_1}{T} \tag{1}$$

where $q_1$ is the probability of an error on trial 1 and $T$ is the mean number of errors to the acquisition criterion. The acquisition rate of a group permitted to choose between $s$ and $t$ pellets, designated $J_{s,t}$, was predictable from the acquisition rates of two other groups according to the equation:

$$J_{s,t} = J_{s,0} - J_{t,0} \tag{2}$$

where $J_{s,0}$ and $J_{t,0}$ are the acquisition rates for groups permitted to choose between $s$ pellets and nothing and between $t$ pellets and nothing, respectively. As noted by Clayton, Eq. 1 is easily recognized as a solution of several mathematical models of simple learning: those studied by Bush and Mosteller (1955), where $J$ is equivalent to their $1 - \alpha$, and by Estes and Suppes (1959), where $J$ is equivalent to their $\theta$.

¹This research was supported by Grant APA from the National Research Council of Canada and by a University Research Grant from the University of Waterloo. Requests for reprints should be sent to the author, Department of Psychology, University of Waterloo, Waterloo, Ontario.

The present study sought to determine whether another prediction, similar to that originally made by Clayton (1964), would be supported. Specifically, magnitude and percentage of reward were manipulated to determine whether the choice behavior of Ss sometimes rewarded with $x$ pellets and sometimes with $y$ pellets could be predicted from the behavior of Ss partially rewarded with one or the other but not both. For example, could the choice behavior of rats sometimes rewarded with four pellets and sometimes with two pellets be predicted from the behavior of two other groups, one sometimes given four pellets and another sometimes given two pellets? The specific prediction was that:

$$J_{4,2} = J_{4,0} + J_{2,0} \tag{3}$$

where $J_{4,2}$, under these circumstances, is the acquisition rate (as defined in Eq. 1) for Ss that receive four pellets for 50% of their correct responses and two pellets for the other 50%. Similarly, $J_{4,0}$ represents the acquisition rate for animals that receive four pellets for 50% of their correct responses and nothing for the other 50%, a simple 50% partial reward condition. $J_{2,0}$ is defined in the same way for two pellets.

METHOD

Experimental Design

Three groups of 17 Ss each were tested. The three groups were treated identically except for the reward given for a correct response. Group 2-0 received two .045-g pellets on a randomly selected 50% of the correct responses and no reward on the other 50%. Group 4-0 received four pellets on half of the correct trials and nothing on the other half. Group 4-2 received four pellets for 50% of the correct responses and two pellets for the other 50%.

Subjects

The Ss were 51 experimentally naive male albino rats of the Sprague-Dawley strain supplied by Holtzman Co. Ss were approximately 100 days old on arrival at the laboratory.

Apparatus

The apparatus was an enclosed, wooden, single-unit T maze painted flat gray. It was 8 in. deep throughout and covered with Plexiglas. The start box was 6 × 8 in., the stem was 3t × 15 in., and each of the arms, which constituted the goal boxes, was 4 × 16 in. Reward pellets were placed in gray food cups 14 in. in diameter and f in. deep. Each food cup was placed behind a wooden barrier 14 in. high located 2 in. from the rear of the arm. The food cup was present even when the goal box was unbaited. A guillotine door located 1 in. into the stem separated the start box from the stem. Horizontally sliding doors located 13 in. into each arm separated the stem from each goal box.

Procedures

Ss arrived in five separate weekly squads. On arrival, each squad was housed in a colony cage and given food and water ad libitum for 3-4 days. Ss were then caged individually and placed on a 23-hr deprivation schedule: at the same time each day, all Ss were given access to wet mash for 1 hr. Each S received 4-10 days of prehandling. In each prehandling session, S was permitted to explore the top of a large black table for 2 min, was picked up and replaced five times by E, and was given access to three .045-g Noyes food pellets. During prehandling, the daily feeding was given approximately 30 min after the last S was prehandled.

In training, three Ss, one in each experimental condition, were run each day. Each S was given consecutive acquisition trials, separated by an intertrial interval of at least 10 sec, until that S reached an acquisition criterion of 12 consecutive correct responses. In other words, each S received all of his training trials without interruption on the same day. The order in which Ss were trained was determined by their behavior during prehandling. Ss were chosen for training in the order in which they began to eat prehandling pellets, except that no S was run until he had eaten a total of at least nine pellets. If more than one S met this prehandling criterion at any given time, one S was selected at random. When an S was selected for training, he was randomly assigned to one of the three treatment groups, subject to the restriction that one S be run in each condition on each day.
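The acquisition criterion above determines how $T$ in Eq. 1 is scored. The paper does not describe its scoring procedure. What follows is only a minimal sketch under stated assumptions. Each animal's record is assumed to be a list of booleans (True for a correct choice). The function names are hypothetical. $q_1$ is estimated as the proportion of animals that erred on trial 1.

```python
from typing import Optional, Sequence


def errors_to_criterion(outcomes: Sequence[bool], criterion: int = 12) -> Optional[int]:
    """Count errors made before `criterion` consecutive correct responses.

    `outcomes` is one animal's trial record (hypothetical encoding):
    True = correct choice, False = error. Returns None if the criterion
    is never reached.
    """
    errors = 0
    run = 0  # length of the current streak of correct responses
    for correct in outcomes:
        if correct:
            run += 1
            if run == criterion:
                return errors
        else:
            errors += 1
            run = 0  # an error resets the streak
    return None


def acquisition_rate(records: Sequence[Sequence[bool]], criterion: int = 12) -> float:
    """Eq. 1, J = q1 / T, estimated from a group of trial records.

    Assumption: q1 is the proportion of animals that erred on trial 1, and
    T is the group's mean errors to criterion. Animals that never reached
    the criterion are skipped.
    """
    scored = [(r, errors_to_criterion(r, criterion)) for r in records]
    finished = [(r, c) for r, c in scored if c is not None]
    if not finished:
        raise ValueError("no animal reached the criterion")
    q1 = sum(1 for r, _ in finished if not r[0]) / len(finished)
    t = sum(c for _, c in finished) / len(finished)
    if t == 0:
        raise ValueError("no errors before criterion; J is undefined")
    return q1 / t
```

For example, a record of error, correct, error, then 12 correct responses scores 2 errors to criterion. An error after 11 correct responses resets the streak, so those 11 do not count toward the criterion.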
During training, S was placed in the start box. When S oriented toward the guillotine door, the door was raised and S was permitted to enter the alley. The start box door was always closed behind S. When S entered a goal box, the sliding door was closed. Correction of errors was never permitted, and errors were never rewarded. On rewarded trials, S was confined until he had eaten the pellets. On nonrewarded trials (either errors or nonrewarded correct responses), S was confined for 10 sec. Between trials, S was returned to the home cage, which was located on a stand behind the start box. The intertrial interval was a minimum of 10 sec but was longer if S was drinking, since S was not run again until he had stopped drinking. Any S which refused to leave the start box, refused to choose, or refused to eat within 180 sec was discarded. Six Ss, two from each group, were discarded in the first 15 trials by these criteria.

RESULTS AND DISCUSSION

Statistical analysis was provided by an analysis of variance of errors to acquisition criterion. Means and standard deviations are given in Table 1. The overall analysis of variance indicated a significant groups effect (F = 18.08, df = 3/64, p < .01). Individual comparisons by the Newman-Keuls procedure indicated that all groups differed significantly from each other (all p < .01): Group 4-2 made fewer errors than Group 4-0, and Group 4-0 made fewer errors than Group 2-0. The superiority of Group 4-0 to Group 2-0 confirms previous findings that choice performance is a direct function of reward magnitude under conditions of partial reward as well as consistent reward (e.g., Clayton & Koplin, 1964).

The primary purpose of the study was to determine whether the performance of Ss receiving a combination of two reward magnitudes, each occurring on half the trials, could be predicted from the performance of other groups receiving partial reward with only one reward amount. For this study, the question was specifically whether the performance of Group 4-2 could be predicted from that of Groups 4-0 and 2-0. To test this prediction, $J_{4,0}$ and $J_{2,0}$ were estimated, using Eq. 1, to be .113 and .081, respectively. From Eq. 3, $J_{4,2}$ was estimated to be .194. Substituting $J_{4,2} = .194$ into Eq. 1, the mean errors to criterion for Group 4-2 was predicted to be 5.15. The obtained value was 5.29. Given a standard error of the mean for Group 4-2 of .58, this was well within the range of sampling error. The 90% confidence interval for the prediction is 4.19 to 6.11 errors (5.15 ± .96). It is recognized that the prediction is supported by acceptance of the null hypothesis (Binder, 1963; Grant, 1962), but it should be noted that the groups (n = 17) were reasonably large. Also, of the three groups involved, the prediction concerned the group with the smallest variability, Group 4-2.
Thus, errors in prediction were more likely to be detected than if the performance of Group 4-0 or 2-0 had been predicted.

TABLE 1
Mean and Standard Deviation of Errors to Acquisition Criterion

Group      M       SD
2-0      12.35    4.51
4-0       8.88    3.35
4-2       5.29    2.31
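The prediction reported above can be re-derived from the Table 1 means with Eqs. 1 and 3. This is a sketch under stated assumptions, not the author's computation. Eq. 1 needs $q_1$, which the text does not report. Here $q_1 = 1$ is our assumption, chosen because it reproduces the published .113 and .081 to three decimals. The 1.65 multiplier for the 90% interval is likewise inferred from the reported ± .96 and the standard error of .58. All names are hypothetical. The last function checks that, if errors after criterion are ignored, Eq. 1 gives the total expected errors of a single-operator linear model in which the error probability is multiplied by $1 - J$ on every trial.

```python
# Table 1: mean errors to criterion (M) and SD per group; n = 17 per group.
TABLE_1 = {"2-0": (12.35, 4.51), "4-0": (8.88, 3.35), "4-2": (5.29, 2.31)}
Q1 = 1.0        # assumed probability of an error on trial 1 (not reported)
SEM_4_2 = 0.58  # standard error of the mean for Group 4-2, from the text
Z_90 = 1.65     # assumed multiplier for the 90% confidence interval


def rate_from_errors(mean_errors: float, q1: float = Q1) -> float:
    """Eq. 1: J = q1 / T."""
    return q1 / mean_errors


def errors_from_rate(j: float, q1: float = Q1) -> float:
    """Eq. 1 solved for T: T = q1 / J."""
    return q1 / j


def linear_model_total_errors(j: float, q1: float = Q1, trials: int = 10_000) -> float:
    """Sum expected errors when the error probability shrinks by (1 - j) each trial."""
    q, total = q1, 0.0
    for _ in range(trials):
        total += q
        q *= 1.0 - j
    return total


j_4_0 = round(rate_from_errors(TABLE_1["4-0"][0]), 3)  # .113
j_2_0 = round(rate_from_errors(TABLE_1["2-0"][0]), 3)  # .081
j_4_2 = j_4_0 + j_2_0                                   # Eq. 3: .194
predicted = round(errors_from_rate(j_4_2), 2)           # 5.15
half_width = round(Z_90 * SEM_4_2, 2)                   # .96
low, high = predicted - half_width, predicted + half_width
obtained = TABLE_1["4-2"][0]

print(f"J(4,0)={j_4_0}, J(2,0)={j_2_0}, J(4,2)={j_4_2:.3f}")
print(f"predicted {predicted}, 90% CI {low:.2f} to {high:.2f}, obtained {obtained}")
```

Rounding each $J$ to three decimals before summing mirrors the values as printed in the text. The obtained 5.29 falls inside the 4.19 to 6.11 interval.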

One theoretical interpretation of the obtained relationship is provided by Estes and Suppes (1959), who suggested that, “. . . in our formulation, varying the amount of reward can modify only the probability of reinforcement” (p. 145). On this view, increasing the magnitude of reward increases the probability that a reinforcing event will occur. Relating that assumption to this experiment, the argument is that the reinforcement probability is .113 for Group 4-0 and .081 for Group 2-0. Since Group 4-2 receives both sets of reward conditions, the reinforcement probability for Group 4-2 should be approximately .194, the sum of .113 and .081. Using this summed value, .194, and making the appropriate substitutions into a simple linear model (Bush & Mosteller, 1955; Estes & Suppes, 1959), the obtained results, in terms of mean errors to acquisition criterion, can be predicted.

REFERENCES

BINDER, A. Further considerations on testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review, 1963, 70, 107-115.
BUSH, R. R., & MOSTELLER, F. Stochastic models for learning. New York: Wiley, 1955.
CLAYTON, K. N. T-maze choice learning as a joint function of the reward magnitude for the alternatives. Journal of Comparative and Physiological Psychology, 1964, 58, 333-338.
CLAYTON, K. N., & KOPLIN, SALLY T. T-maze learning as a joint function of probability and magnitude of reward. Psychonomic Science, 1964, 1, 381-382.
ESTES, W. K., & SUPPES, P. Foundations of linear models. In R. R. Bush & W. K. Estes (Eds.), Studies in mathematical learning theory. Stanford: Stanford Univ. Press, 1959. Pp. 137-179.
GRANT, D. A. Testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review, 1962, 69, 54-61.

(Received December 26, 1969)