Preference as a function of the correlation between stimuli and reinforcement outcomes

LEARNING AND MOTIVATION 11, 238-255 (1980)

LEONARD GREEN

Washington University

Pigeons’ preferences for stimuli that were to varying degrees correlated with outcomes were studied in two experiments using a concurrent-chain procedure. The pigeons chose between two terminal links, each ending with food reinforcement on half of the trials and with blackout on the other half. In the first experiment, one terminal link (the nonpredictive or unreliable link) provided stimuli completely uncorrelated with the outcomes, while the other terminal link (the predictive or reliable link) provided stimuli that were, to varying degrees, correlated with these outcomes. All pigeons showed increasing preferences for the predictive link as the reliability of the stimuli in that link increased. In the second experiment, the stimuli in both terminal links were correlated with the outcomes, but to differing degrees. The pigeons again preferred the more reliably correlated terminal link. The relation of these results to the delay-reduction hypothesis and to a conditioned-reinforcement account is noted. The behavioral value of predictive stimuli may lie in their permitting the organism to apportion its time more effectively between interim activities and terminal responses.

The author gratefully acknowledges the expert assistance of Sandra Schrader during all phases of the present research. Reprints may be obtained from Leonard Green, Department of Psychology, Washington University, St. Louis, MO 63130.

0023-9690/80/020238-18$02.00/0. Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.

There is, presently, a considerable body of research demonstrating that organisms prefer situations in which stimuli are correlated with outcomes to situations in which noncorrelated stimuli are provided. This preference is obtained even though both situations ultimately lead to the same reinforcement outcomes (e.g., Bower, McLean, & Meacham, 1966; Catania, 1975; Hendry, 1969; Hursh & Fantino, 1974). For example, Hendry (1969, Experiment B5) allowed his pigeons to choose between a multiple schedule, in which a red key signaled an FR 10 and a green key signaled an FR 90 schedule of reinforcement, and a mixed schedule, in which the red and green key lights were uncorrelated with the two fixed-ratio schedules. Preferences for the multiple schedule were, indeed, quite strong. Such results have been discussed within the context of an informational analysis, which posits that the reduction of uncertainty provided by the correlated stimuli is reinforcing (Hendry, 1969). This approach predicts that preferences for informative situations should vary appropriately as the amount of information that the stimuli provide about consequent outcomes is varied.

Intuitively, informational concepts are quite appealing. They suggest experimental manipulations and have guided much fruitful research. Unfortunately, results using the mathematical measure of information, the bit, have not been as successful as the concept’s intuitive appeal had led us to expect, and many data are inconsistent with an informational approach. For example, according to this approach “the strength of a stimulus is a function of its informativeness about primary reinforcement, i.e., how much uncertainty reduction it provides about reinforcement” (Fantino, 1977). This view posits that stimuli paired with negative outcomes should be just as reinforcing as stimuli paired with positive outcomes. That is, “ ‘bad news’ [is] just as much ‘news’ as [is] ‘good news’ ” (Bloomfield, 1972). The weight of the evidence clearly does not support such a view (Dinsmoor, Browne, & Lawrence, 1972; Jenkins & Boakes, 1973; Mulvaney, Dinsmoor, Jwaideh, & Hughes, 1974).

An alternative to the information-as-reinforcement theory is the delay-reduction hypothesis developed by Fantino (1977). This hypothesis states that “(1) organisms will choose the stimulus correlated with the greatest reduction in time to primary reinforcement and (2) preference will be greater the larger the difference in the delay reductions correlated with the chosen alternative” (p. 326). Furthermore, organisms should be indifferent between stimuli that are not correlated with a relative reduction in delay to primary reinforcement. If the subject is offered a choice between a mixed and a multiple VI EXT schedule, preference for the multiple schedule (in which stimuli are perfectly correlated with outcomes) can be predicted.
In this situation, presentation of the mixed stimulus does not reduce the expected time to primary reinforcement, whereas the signal for the VI component of the multiple schedule does. However, when the stimuli are not perfectly correlated with outcomes, it is not immediately obvious what the delay-reduction hypothesis predicts. In the following experiments, the correlation between stimuli and outcomes was varied, while the relative reduction in time to primary reinforcement was not necessarily affected by which stimuli were produced. The first experiment tested whether pigeons prefer situations in which stimuli are correlated with outcomes to a situation in which the stimuli are uncorrelated with the same outcomes. The second experiment assessed whether preference is influenced by the relative degree of correlation between stimulus situations; in this experiment, both situations provided stimuli that were correlated with the outcomes, but to differing degrees.
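For concurrent chains, the delay-reduction hypothesis is usually stated quantitatively in the form proposed by Fantino (1969): the predicted proportion of initial-link responses on the left key is (T − t_L) / [(T − t_L) + (T − t_R)], where T is the mean time to primary reinforcement measured from the onset of the initial links and t_L and t_R are the mean times to reinforcement from the onset of each terminal link. The sketch below is a minimal illustration of that formula, not a model of the present data, and the schedule values in its usage lines are hypothetical. It also shows why the prediction here is not obvious: in both present experiments the two terminal links have the same duration and the same .50 food probability, so any calculation made at the level of whole links yields equal delay reductions and predicts indifference.

```python
def delay_reduction_preference(T: float, t_left: float, t_right: float) -> float:
    """Predicted proportion of initial-link responses on the left key
    under Fantino's (1969) delay-reduction formula.

    T       -- mean time to primary reinforcement from initial-link onset
    t_left  -- mean time to primary reinforcement from left terminal-link onset
    t_right -- the same for the right terminal link
    """
    left = T - t_left    # delay reduction signaled by entering the left link
    right = T - t_right  # delay reduction signaled by entering the right link
    if left <= 0 and right <= 0:
        raise ValueError("formula is undefined when neither link reduces delay")
    if left <= 0:
        return 0.0       # only the right link signals a reduction: exclusive preference
    if right <= 0:
        return 1.0       # only the left link signals a reduction: exclusive preference
    return left / (left + right)


# Hypothetical example: conc VI 60-s VI 60-s initial links (about 30 s before
# either terminal link is entered) with VI 30-s vs. VI 60-s terminal links.
T = 30 + (30 + 60) / 2                          # 75 s
print(delay_reduction_preference(T, 30, 60))    # 0.75

# Equal terminal-link delays, as in the present experiments: indifference.
print(delay_reduction_preference(100, 40, 40))  # 0.5
```

Treating a negative or zero delay reduction as grounds for exclusive preference follows Fantino's original statement of the rule; the equal-delay case returns .50 for any T greater than the common terminal-link delay.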

EXPERIMENT 1

In an attempt to quantify how preferences for informative stimuli vary as the amount of information provided by the stimuli is varied, Green and Rachlin (1977) used a concurrent-chain procedure (Autor, 1969) in which pigeons chose between two terminal-link situations. Both terminal links ended in food reinforcement with probability p and in blackout with probability 1 − p. One terminal link, the noninformative link, was signaled by a stimulus uncorrelated with either food or blackout. The other terminal link, the informative link, provided stimuli perfectly correlated with these outcomes. The amount of information provided by these stimuli was varied across conditions by changing the probabilities of reinforcement and blackout. Although marked preferences for the informative link were obtained, these preferences did not vary as predicted by quantitative formulations of information theory. However, in those studies in which the amount of information was changed, other variables were necessarily altered: both the number of times and the amount of time the correlated stimuli were presented changed, and the probability of reinforcement varied across conditions. To control for the effects that such changes might have on choice, it is necessary to hold the rate of reinforcement constant while varying only the probabilistic relation between the stimuli and the outcomes. In the present experiments this was achieved by varying the extent to which stimuli were correlated with reinforcement and blackout while holding the probability of each outcome constant across experimental conditions: reinforcement occurred on half the trials, and blackout occurred on the other half.
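The contrast between the two designs can be made concrete with the standard Shannon measures; the paper does not give the formulas it has in mind, so the following is a minimal sketch under that assumption, with illustrative function names. With perfectly correlated stimuli, as in Green and Rachlin (1977), the stimuli remove all uncertainty about the outcome, so the information they carry equals the outcome entropy H(p) and can be changed only by changing p. In the present design p stays at .50; if reliability r means p(food | stimulus 1) = p(blackout | stimulus 2) = r and each stimulus appears on half the trials, the stimuli transmit 1 − H(r) bits, and the phi correlation from the 2 × 2 stimulus-by-outcome table reduces to 2r − 1.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a two-outcome event with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bits_perfect_stimuli(p_food: float) -> float:
    """Green & Rachlin-type design: perfectly correlated stimuli remove all
    outcome uncertainty, so transmitted information equals H(p_food)."""
    return binary_entropy(p_food)


def bits_present_design(r: float) -> float:
    """Present design (p_food = .50, each stimulus on half the trials):
    I = H(outcome) - H(outcome | stimulus) = 1 - H(r)."""
    return 1.0 - binary_entropy(r)


def phi_present_design(r: float) -> float:
    """Phi coefficient computed from the 2 x 2 table of cell proportions."""
    a = r / 2          # stimulus 1 and food
    b = (1 - r) / 2    # stimulus 2 and food
    c = (1 - r) / 2    # stimulus 1 and blackout
    d = r / 2          # stimulus 2 and blackout
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


for r in (0.50, 0.65, 0.75, 0.85, 1.00):
    print(f"r = {r:.2f}: phi = {phi_present_design(r):.2f}, "
          f"bits = {bits_present_design(r):.3f}")
```

Over the reliability values used in Experiment 1, phi rises in roughly equal steps (0.00, 0.30, 0.50, 0.70, 1.00), whereas transmitted information rises slowly at first and steeply near 1.00 (0.000, 0.066, 0.189, 0.390, 1.000 bits). The two measures order the conditions identically but space them very differently.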
A zero correlation between stimuli and outcomes is produced when both outcomes occur with equal probability in the presence of each stimulus (the reliability, or predictability, of the stimuli with respect to the outcome equals .50). If reinforcement always follows a red signal and blackout always follows a green signal, the stimuli are perfectly predictive (reliability equals 1.00). Three other points along the reliability continuum were studied in the first experiment: .65, .75, and .85.

Method

Subjects. Three experimentally naive male White Carneaux pigeons were maintained at 80% of their free-feeding weights. Water and grit were continuously available in their home cages.

Apparatus. A standard sound-insulated pigeon chamber measuring 36.2 cm long by 33.7 cm wide by 40.6 cm high contained two translucent response keys mounted 8.9 cm apart, each requiring a minimum force of 0.10 N to be activated. The left key could be transilluminated from behind
with either white, yellow, or blue light. The right key could be transilluminated with either white, red, or green light. Pecks on the response keys during the initial link, when both keys were illuminated with white light, produced an auditory feedback click by operating a 110-V AC relay. The food hopper, situated between and below the keys, provided access to mixed grain. When operated, the hopper was illuminated by two 7-W white lights. General chamber illumination was provided by two 7-W white lights mounted on the ceiling. White noise was continuously present, and electromechanical programming equipment was located in an adjacent room.

Procedure. The birds were trained to peck the white response keys using a modified autoshaping technique (Brown & Jenkins, 1968). Once reliable responding was established, the pigeons were placed on the concurrent-chain schedule. In the initial link, both keys were transilluminated with white light, and pecks produced an audible feedback click. Entrance into the terminal links was arranged by two independent VI timers on a concurrent variable-interval 1-min variable-interval 1-min schedule (conc VI 1-min VI 1-min). The 12 intervals used were derived from the distribution suggested by Fleshler and Hoffman (1962). When entrance was scheduled by either VI timer, that timer stopped operating, but the other VI timer continued to operate. The next response on the appropriate key produced the terminal-link stimulus associated with that key, and the other key was darkened. During a terminal link, both VI timers stopped. The relative rate of responding on the initial-link keys was the measure of preference for the more reliably correlated terminal link. When the peck that gained access to the terminal link was on the left key, it turned either yellow or blue and the right key was darkened. When the peck that gained access to the terminal link was on the right key, it turned either red or green and the left key was darkened.

Twenty seconds after the start of a terminal link, an outcome was presented: either reinforcement (5-s access to grain) or blackout (5 s with all lights extinguished). Any key peck during the final 10 s of the trial postponed the outcome until 10 s passed without a response. At the end of the trial outcome, both keys were reilluminated with white light, indicating the beginning of the next trial in the initial link. The probability of each of the terminal links ending in food delivery was .50 throughout the experiment; likewise, half of the trials in each terminal link ended with the blackout period. For two of the birds, the red and green key lights of the right terminal link were correlated with the reinforcement and blackout outcomes at a given reliability, while the yellow and blue key lights of the left terminal link were always uncorrelated with these outcomes. For the third bird the reverse was true. The order of conditions, which differed for each bird, and the number of sessions each bird was studied under a given reliability value are shown in Table 1.
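The Fleshler and Hoffman (1962) progression generates the n intervals of a VI schedule so that the momentary probability of an interval ending stays roughly constant over time. A minimal sketch of the standard formula, t_k = T[1 + ln n + (n − k) ln(n − k) − (n − k + 1) ln(n − k + 1)] for k = 1, ..., n, with T = 60 s and n = 12 as in the present procedure. The paper does not state how the intervals were ordered within a session, so the shuffled sequences at the end are an assumption, as is the function name.

```python
import math
import random


def fleshler_hoffman(mean_s: float, n: int = 12) -> list[float]:
    """Intervals (seconds, ascending) for a VI schedule with mean `mean_s`,
    following the Fleshler-Hoffman (1962) progression."""
    def x_ln_x(x: float) -> float:
        # x * ln(x) tends to 0 as x -> 0; the k = n term needs this limit.
        return 0.0 if x == 0 else x * math.log(x)

    return [
        mean_s * (1 + math.log(n) + x_ln_x(n - k) - x_ln_x(n - k + 1))
        for k in range(1, n + 1)
    ]


intervals = fleshler_hoffman(60.0)                   # VI 1-min, 12 intervals
print([round(t, 1) for t in intervals])
print(round(sum(intervals) / len(intervals), 6))     # the mean is 60 s

# Each independent initial-link timer draws its own sequence; sampling
# without replacement is an assumed ordering rule, not the paper's.
left_timer = random.sample(intervals, k=len(intervals))
right_timer = random.sample(intervals, k=len(intervals))
```

The terms telescope, so the n intervals always average exactly T; the shortest is only a few seconds and the longest is T(1 + ln n), about 209 s for VI 1-min.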

TABLE 1
Number of Sessions Each Bird Was Studied at Each Reliability Value in Experiment 1

Reliability   Bird 40    Bird 41    Bird 44
0.50          76 (1)     107 (1)    113 (1)
0.65          92 (5)     65 (2)     97 (4)
0.75          101 (4)    67 (3)     101 (2)
0.85          50 (3)     68 (4)     56 (5)
1.00          91 (2)     99 (5)     57 (3)
0.50*         188 (6)    225 (6)    220 (6)

Note. Numbers in parentheses give the order in which the experimental conditions were conducted for each bird.
* Return to the initial baseline condition.

Conditions were changed when the following criteria for stability were attained: (a) the bird had remained on a condition for at least 50 days; (b) on the 50th day and every day thereafter until stability had been reached, the last 15 days were divided into three successive blocks of five sessions each, and the medians of these blocks showed neither an upward nor a downward trend, i.e., neither $Md_1 < Md_2 < Md_3$ nor $Md_1 > Md_2 > Md_3$; and (c) there was no visible trend during the last 10 days on a condition for either relative or absolute rate of response.

The stimuli in one of the terminal links were always uncorrelated with the outcomes (reliability of .50). The reliability with which the stimuli in the other terminal link predicted the outcomes was manipulated. Five different reliability values were studied: .50 (the probability of receiving food given, e.g., red equals the probability of receiving blackout given red), .65, .75, .85, and 1.00 (the stimuli are perfectly reliable predictors of outcomes). Initially, all birds were studied with the key colors totally uncorrelated with outcomes in both terminal links (reliability equal to .50). Under this condition, reinforcement and blackout occur with the same probability in the presence of each stimulus. When key colors are somewhat reliable, say .75, then $p(\text{reinforcement} \mid \text{color}_1) = .75$ and $p(\text{reinforcement} \mid \text{color}_2) = .25$. Likewise, $p(\text{blackout} \mid \text{color}_1) = .25$ and $p(\text{blackout} \mid \text{color}_2) = .75$. $\text{Color}_1$ is now a reliable predictor of reinforcement on 75% of the trials and an unreliable one on 25% of the trials. With perfect reliability (1.00), $\text{color}_1$ is always followed by reinforcement (and never by blackout), and $\text{color}_2$ is always followed by blackout (and never by reinforcement).
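Stability criteria (a) and (b) are mechanical enough to check by program. A minimal sketch, assuming the block medians were computed on the daily relative response rate (the text does not say which measure criterion (b) used) and leaving criterion (c), visual inspection, to the experimenter; the function name is hypothetical. The defaults encode Experiment 1 (at least 50 sessions, three blocks of five); passing `min_sessions=30, block_size=3` gives the version used in Experiment 2.

```python
from statistics import median


def meets_stability_criteria(daily_values: list[float],
                             min_sessions: int = 50,
                             block_size: int = 5) -> bool:
    """Criteria (a) and (b): enough sessions on the condition, and no strictly
    monotonic trend across the medians of the last three blocks.
    Criterion (c), visual inspection, is not modeled."""
    # (a) minimum number of sessions on the condition
    if len(daily_values) < min_sessions:
        return False
    # (b) split the most recent 3 * block_size sessions into three successive blocks
    recent = daily_values[-3 * block_size:]
    md1, md2, md3 = (median(recent[i * block_size:(i + 1) * block_size])
                     for i in range(3))
    upward = md1 < md2 < md3
    downward = md1 > md2 > md3
    return not (upward or downward)


flat = [0.62] * 50                                 # no trend: stable
rising = [0.55] * 40 + [0.60] * 5 + [0.65] * 5     # block medians .55 < .60 < .65
print(meets_stability_criteria(flat), meets_stability_criteria(rising))
```

Because the rule requires a strict monotonic ordering to count as a trend, two equal block medians are enough to pass criterion (b).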
It must be noted that the bird is choosing between two terminal links: one presents stimuli correlated with the outcomes at the reliability value in effect, whereas the other always presents stimuli uncorrelated with the outcomes. In each link, half of the trials end with reinforcement and half end in blackout. Experimental sessions were conducted 7 days/week, with each session lasting until 80 outcomes were obtained. The sequence of reinforcement
and blackout outcomes was randomly determined by an eight-channel paper-tape reader, with the sequence of outcomes changed daily.

Results

Median relative rates during the free-choice initial links (proportion of pecks on the initial-link key leading to the more reliably correlated terminal link) from the last five sessions at each reliability value for each pigeon are shown in Fig. 1. Relative rate in the initial link is the measure of preference for the reliable stimuli in the terminal link. Points above the broken line at .50 indicate preference for the terminal link in which the more reliably correlated stimuli were presented. When the stimuli in both terminal links were completely uncorrelated with the outcomes, the pigeons were indifferent between the two links. As the reliability of the stimuli in one of the terminal links increased, so too did the pigeons’ preferences for that link. Preferences were highest when the stimuli were perfectly reliable. When the birds were returned to the initial baseline condition, in which the stimuli in both terminal links were uncorrelated with outcomes, each showed a substantial decline in preference, approaching its baseline value (relative rates equaled .536, .516, and .509 for birds 40, 41, and 44, respectively).

Figure 2 shows, for each pigeon, the absolute rates of responding on the initial-link keys leading to the reliable and unreliable terminal links at each reliability value, from which the relative rates were calculated. These response rates were obtained by dividing the total number of responses made to each key separately by the total time spent in the initial link. Rate of response on the initial-link key leading to the more reliable terminal link increased as reliability increased, whereas the rate on the key leading to the always unreliable terminal link decreased. Responses during the final 10 s of either terminal link reset a timer, thus postponing the outcome.
This procedure effectively suppressed responding to the key. The amount of time spent in the left and right terminal links was computed for each pigeon separately, and the median of the last five sessions on each condition was obtained. The increase in time spent in either the left or the right terminal link over the minimum time possible was never greater than 1%. Thus, there was no significant responding during either terminal link.

FIG. 1. Relative rate (proportion of pecks on the key leading to the more reliably correlated terminal link) during the initial link for each pigeon. The points are the median rates from the last five sessions at each reliability value. [Panels: birds 40, 41, and 44; x-axis: reliability (.50, .65, .75, .85, 1.00, and the return to .50); y-axis: relative rate.]

FIG. 2. Absolute rate of response for each pigeon on the initial-link keys leading to the reliable and unreliable terminal links at each reliability value, for the sessions shown in Fig. 1. [Panels: one per pigeon; x-axis: reliability; lines: reliable key (solid), unreliable key (dashed).]

Discussion

Clearly, preferences for the correlated stimuli were a function of the reliability of those stimuli: preferences increased as the correlation between the stimuli and the outcomes increased. The pigeons were indifferent between the terminal links when all stimuli were uncorrelated with the outcomes. When the stimuli were perfectly correlated with outcomes (reliability = 1.00), preferences ranged from .685 to .813. The absolute degree of preference appeared to be related to the order in which the conditions were experienced. The highest preference was shown by bird 40, which experienced the 1.00 condition first (after the initial baseline); the next highest by bird 44, which experienced that condition second; and the lowest by bird 41, which was placed on the perfect-reliability condition last. In the present experiment, neither the probability of reinforcement nor the probability of either stimulus was varied. Food and blackout occurred equally often, and each of the stimuli was presented an equal number of times in each session; only the correlation between the stimuli and the outcomes varied. Using an observing-response procedure in which the reliability of the stimuli was also varied between .50 and 1.00, Kendall (1973, Experiment 2) found that observing-response rate was an increasing function of the reliability of the stimuli. The present results extend this finding to
the use of a concurrent-chain technique measuring preferences for such stimuli. Responses during the final half of the terminal links kept resetting a timer until 10 s elapsed without a key peck. This procedure was effective in suppressing responding to the keys. Although certain authors (Fantino, 1969; Gollub, 1970; Squires & Fantino, 1971) have questioned the independence of initial-link choice from terminal-link responding, the procedure used here effectively eliminated any key responding in the terminal link that might have differentially influenced preference.

EXPERIMENT 2

In all previous experimental work using either the observing-response technique (Wyckoff, 1969) or a choice procedure, subjects were responding for stimuli that were to varying degrees correlated with outcomes, as opposed to stimuli completely uncorrelated with the outcomes. For example, with the observing-response technique, such responses change a mixed (uncorrelated) schedule into a multiple (correlated) schedule. In choice studies, responses to one manipulandum produce stimuli completely uncorrelated with outcomes, while responses to the alternative manipulandum produce stimuli that are, to varying extents, correlated with the outcomes. When the independent variable was amount of information or reliability of the stimuli, the choice was always between some correlation and none. Under such procedures, secondary-reinforcement accounts and the delay-reduction hypothesis predict that subjects will prefer, or make observing responses to produce, the multiple schedule. However, these accounts do not make unambiguous predictions about whether pigeons are sensitive to differing correlations between stimuli and outcomes when both terminal links provide reliably correlated stimuli. The second experiment, therefore, measured preferences between terminal links when both provided some, though differing, degrees of correlation.

Method

Subjects. Five experimentally naive male White Carneaux pigeons were maintained at 80% of their free-feeding weights, with water and grit continuously available in their home cages.

Apparatus. An experimental chamber identical to that used in Experiment 1 was used. The left key could be transilluminated with white, yellow, or blue light. The right key could be transilluminated with white, red, or green light. Pecks on the illuminated keys produced a feedback click. Reinforcement consisted of 5-s access to mixed grain, and blackout was a 5-s period during which all lights were extinguished. Other details were the same as in Experiment 1.

Procedure. After the birds were autoshaped to peck the white response
keys, they were placed on a concurrent schedule with independent variable-interval 1-min schedules providing reinforcement (conc VI 1-min VI 1-min). This concurrent schedule of reinforcement continued for 7 days, and all birds responded roughly equally to both keys (relative rate of responding to the left key equaled .49, .51, .58, .52, and .50 for birds 65-69, respectively, on the seventh day). Following initial training, the concurrent-chain procedure was instituted. As in Experiment 1, the initial-link keys were transilluminated with white light. Entrance into the terminal links was arranged by two independent VI timers on a concurrent variable-interval 1-min variable-interval 1-min schedule. When entrance was scheduled by either VI timer, that timer stopped operating while the other VI timer continued to operate. A changeover delay (COD) of 1 s was in effect during the initial link. Given the rather small differences between the terminal links in the degree of correlation of the stimuli with the outcomes to be used in this experiment, it was assumed that the COD might aid in the separation and differentiation of the terminal links. The COD specified a minimum interval of 1 s that had to elapse after a changeover from one key to the other before entrance into a terminal link could occur. The next effective response on the appropriate key (provided the COD had timed out) produced the terminal-link stimulus associated with that key, and the other key was darkened. During a terminal link, both VI timers stopped. When the peck that gained access to the terminal link was on the left key, it turned either yellow or blue and the right key was darkened. When the peck that gained access to the terminal link was on the right key, it turned either red or green and the left key was darkened. The terminal links lasted for 10 s, followed by the scheduled outcome (reinforcement or blackout).
Unlike the first experiment, key pecks were permitted during the terminal link, although they had no effect on the outcome: food and blackout were presented independently of responding. As in the previous experiment, food and blackout each occurred on half the trials in each terminal link. For birds 65, 66, and 69, the left terminal link provided stimuli that were reliably correlated with the outcomes on 85% of the trials. For birds 67 and 68, the right terminal link provided stimuli that were reliably correlated with outcomes on 85% of the trials. The reliability with which the stimuli in the alternative terminal link predicted the outcomes was manipulated. Four different reliability values were studied: 65, 75, 85, and 95%. Notice that the birds were always presented with stimuli that were to some extent correlated with outcomes. The order of conditions for each bird is shown in Table 2. Conditions were changed when the following criteria for stability were attained: (a) the bird had been on a given condition for a minimum of 30 days; (b) on the 30th day and every day thereafter until stability had been reached, the last 9 days were divided into three successive blocks of three sessions each, and the medians of these blocks showed neither an upward nor a downward trend; and (c) there was no visible trend during the last 5 days of the condition for either relative or absolute response rates. Sessions were conducted seven days a week, with each session lasting until 80 outcomes were obtained. The sequence of reinforcement and blackout outcomes was randomly determined daily.

TABLE 2
Number of Sessions Conducted at Each Value for Each Bird in Experiment 2

Reliability of
alternative link   Bird 65   Bird 66   Bird 67   Bird 68   Bird 69
0.65               55 (4)    55 (1)    64 (4)    90 (4)    60 (4)
0.75               52 (1)    65 (2)    76 (3)    47 (3)    55 (2)
0.85               75 (2)    70 (3)    51 (2)    57 (2)    55 (3)
0.95               62 (3)    55 (4)    50 (1)    45 (1)    34 (1)

Note. Numbers in parentheses indicate the order in which the conditions were conducted for each bird.

Results

The data of interest are the median relative rates during the free-choice initial links as a measure of preference for the constant (85% reliable) terminal link. Figure 3 presents these median relative rates (proportion of pecks on the initial-link key leading to the constant, .85-reliable, terminal link) from the last five sessions for each pigeon as a function of the reliability value of the alternative link. Points above the dashed line at .50 indicate a preference for the constant link, whereas points below the .50 line indicate a preference for the alternative link. The left-hand panel shows overall relative rates, whereas the data in the right-hand panel are based only on post-COD response distributions. Without exception, the pigeons showed decreasing preferences for the constant link as the reliability of the stimuli in the alternative link increased. In general, the birds preferred the constant link when its stimuli were more reliably correlated with outcomes than were the stimuli in the alternative link, and preferred the alternative link when its stimuli were the more reliably correlated. When both links provided stimuli that were equally reliable (.85), the pigeons were relatively indifferent between the terminal links.

FIG. 3. Relative rate (proportion of pecks on the key leading to the constant, .85, terminal link) during the initial link for each pigeon as a function of the reliability of the alternative terminal link. Points are the median rates from the last five sessions at each reliability value. Left-hand panel, overall relative rate; right-hand panel, post-COD relative rate. [x-axis: reliability of alternative key (.65, .75, .85, .95); y-axis: relative rate.]

The mean relative rates for all birds are shown in the left-hand part of Fig. 4. As was evident in the individual data, post-COD relative rate was a more sensitive indicator of preference than was overall relative rate. In addition to the distribution of responses, the amount of time the pigeons spent on each key during the initial link was also recorded. Time on a key was measured from the initial peck on that key until the bird either pecked the other key or entered a terminal link. Relative rate and relative time showed similar results. The right-hand part of Fig. 4 shows the mean rate of response on each of the initial-link keys as a function of the reliability value in the alternative terminal link.
Rate of response on the initial-link key leading to the constant terminal link decreased, while rate of response on the key leading to the alternative terminal link increased as the reliability of the stimuli in the alternative link increased. The mean rates are representative of the individual rates of responding.
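The difference between overall and post-COD relative rates comes down to which initial-link pecks are counted. A minimal sketch, assuming each peck is logged as (seconds of initial-link time, key) and that the peck completing a changeover, plus any further pecks within the next 1 s, belong to the COD period; the log format, the key labels, and the function names are hypothetical. Relative rate is the proportion of counted pecks on the key leading to the constant (.85) link, and a condition is summarized by the median of its last five sessions, as in Fig. 3.

```python
from statistics import median


def split_by_cod(pecks: list[tuple[float, str]], cod_s: float = 1.0):
    """Split chronologically ordered initial-link pecks into those emitted
    during the changeover delay and those emitted after it had timed out."""
    during_cod, post_cod = [], []
    last_key = None
    changeover_at = None
    for t, key in pecks:
        if last_key is not None and key != last_key:
            changeover_at = t                      # a changeover starts a new COD
        if changeover_at is not None and t - changeover_at < cod_s:
            during_cod.append((t, key))
        else:
            post_cod.append((t, key))
        last_key = key
    return during_cod, post_cod


def relative_rate(pecks: list[tuple[float, str]], key: str) -> float:
    """Proportion of pecks on `key`. Both keys' rates share the same time base
    (total initial-link time), so this equals rate(key) / summed rate."""
    if not pecks:
        raise ValueError("no pecks to summarize")
    return sum(1 for _, k in pecks if k == key) / len(pecks)


def median_of_last(session_values: list[float], n: int = 5) -> float:
    """Median of the last n sessions of a condition."""
    return median(session_values[-n:])


# Toy log; "L" stands for the key leading to the constant terminal link.
log = [(0.0, "L"), (0.5, "L"), (1.0, "R"), (1.4, "R"),
       (2.5, "R"), (3.0, "L"), (4.5, "L")]
cod, post = split_by_cod(log)
print(relative_rate(log, "L"))    # overall: 4 of 7 pecks
print(relative_rate(post, "L"))   # post-COD: 3 of 4 pecks -> 0.75
```

Excluding COD pecks removes the burst of responding that typically follows each changeover, which Silberberg and Fantino (1970) found to be distributed near .50; that is one way to see why the post-COD measure separates the conditions more sharply.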

FIG. 4. (Left) Mean relative rate and relative time during the initial links for the constant, .85, terminal link. (Right) Mean absolute rate of response on the initial-link keys leading to the constant and the alternative terminal links. [x-axis: reliability of alternative key (.65 to .95); left-panel legend: overall rate, post-COD rate, time.]

Although the outcomes were presented independently of responding in the terminal links, the birds did respond at considerable, though highly variable, rates. The only consistent result was more frequent key pecking when the stimulus more highly correlated with food was presented than when the less well correlated stimulus was presented. This result held for every bird in both links.

GENERAL DISCUSSION

The purpose of the present experiments was to determine whether pigeons can detect differences in the reliability with which stimuli are correlated with the outcomes of food delivery and blackout, and whether they show appropriate preferences for the more reliable stimulus situation. Experiment 1 clearly demonstrated that pigeons prefer somewhat reliable stimuli to completely unreliable ones, and that this preference increases as the difference in reliability increases. In the second experiment, the pigeons chose between a situation that provided stimuli reliably paired with the outcomes on 85% of the trials and a situation in which stimuli were reliably paired with outcomes on 65, 75, 85, or 95% of the trials. The results showed that even when all stimuli were to some degree reliably correlated with outcomes, the birds detected differences between the situations and preferred the one that provided the more reliably correlated stimuli. In both experiments, absolute rates of responding on the two initial-link keys showed qualitatively similar changes. Rate of response increased on the key leading to the more reliable terminal link as the reliability of the stimuli in that link increased. Conversely, rate of response on the key leading to the constant link (the unreliable link in Experiment 1; the .85-reliable link in Experiment 2) decreased as the reliability of the stimuli in the alternative link increased. In a previous experiment investigating pigeons’ preferences for informative stimuli (Green & Rachlin, 1977), amount of information was manipulated by varying the probability of reinforcement. In that experiment, the degree of preference was related to the amount of information provided by the stimuli, but not as predicted from quantitative formulations of information theory: preferences began to decrease only at extreme values. It is unlikely that this insensitivity arose simply because the differences in amount of information were comparatively small.
Pigeons are quite sensitive to changes in other variables (e.g., amount and duration of reinforcement), so it is hard to understand why, a priori, differences in amount of information should be more difficult to detect. The most likely reason for their insensitivity was that differences in the amount of reinforcement had an effect over and above that of changing the amount of information alone. To control for the effects that differential reinforcement rates have on preferences, the present experiments kept reinforcement frequency constant and varied the probabilistic relation between the stimuli and the outcomes. The present results demonstrate that pigeons can detect rather small differences in the reliability of stimuli and change their preferences accordingly.

In Experiment 2, post-COD relative rates were more sensitive indicators of degree of preference than were overall relative rates. Such a result is in keeping with Silberberg and Fantino’s (1970) findings that response rates during the COD period following a changeover from one key to the other were much higher than post-COD response rates, and that, consequently, relative response rates within the COD were close to .50 (indifference). This increased rate of responding was probably due to the direct relationship between the probability of reinforcement on one VI schedule soon after a changeover response and the length of time the bird had spent responding on the other key (Catania, 1966). The same increasing probability of entering a terminal link as a function of time just spent on the other key held for the present experiment as well, given the use of independent VI timers controlling entrance into the terminal links.

Given the marked tendency for the pigeons to prefer the more reliably correlated situation to less reliably correlated situations, of what value is it to an organism to know when reinforcement is due? We have proposed a behavior-allocation hypothesis in which the value of predictive or informative stimuli lies in their function as discriminative stimuli for various activities. Green and Rachlin (1977) found that the behaviors of the pigeon during the terminal link fit the distinction made by Staddon and Simmelhag (1971) between terminal responses and interim activities. During a period in which the probability of reinforcement was low, the animals engaged in interim activities, behaviors unrelated to food delivery (e.g., turning away from the magazine wall, wing flapping).
During a period in which the probability of reinforcement was high, the animals engaged in terminal responses, behaviors related to food presentation (e.g., pecking). Predictive stimuli may thus have value because they permit the pigeon to appropriately apportion its time between interim and terminal behaviors. Nonpredictive, uncorrelated stimuli do not permit such differentiation of behaviors. In the situation in which a hungry pigeon occasionally receives food and occasionally receives blackout, the following hypothetical hierarchy of values of terminal link situations may be constructed: (1) Food preceded by terminal responses; (2) food preceded by interim activities; (3) interim activities’ intrinsic value (followed by blackout); (4) terminal responses’ intrinsic value (followed by blackout). Predictive stimuli ensure that the pigeon will obtain the higher-valued situations (1) and (3) during food and blackout links and avoid situations (2) and (4). Nonpredictive stimuli, on the other hand, force the pigeon to experience the lower-valued situations (2) and (4) at least occasionally.
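The behavior-allocation argument can be made concrete with a small calculation. The sketch below is a hypothetical illustration, not a model proposed or fitted in this paper. It assumes that food and blackout are equiprobable (as in both experiments), that a link's reliability r is the probability that the food-correlated stimulus appears on a food trial and the blackout-correlated stimulus on a blackout trial, and that the pigeon makes terminal responses whenever the food-correlated stimulus is on and engages in interim activities otherwise. Under those assumptions it tallies the expected share of terminal-link trials falling into each of situations (1) through (4). All function and constant names are invented for this example. For equiprobable outcomes it also reports the phi correlation between stimulus and outcome, which works out to 2r - 1, so a completely uncorrelated link corresponds to r = .50.

```python
"""Hypothetical behavior-allocation sketch (illustration only, not the author's model).

Assumptions, all labeled as such:
  * food and blackout are equiprobable terminal-link outcomes;
  * reliability r = probability that the food-correlated stimulus appears on a
    food trial (and the blackout-correlated stimulus on a blackout trial);
  * the pigeon makes terminal responses while the food-correlated stimulus is on
    and engages in interim activities while the other stimulus is on.
"""

P_FOOD = 0.5  # assumption: food on half the trials, blackout on the other half

# Situations (1)-(4) from the hypothetical value hierarchy, highest value first.
SITUATIONS = {
    1: "food preceded by terminal responses",
    2: "food preceded by interim activities",
    3: "interim activities followed by blackout",
    4: "terminal responses followed by blackout",
}


def situation_frequencies(r: float) -> dict[int, float]:
    """Expected proportion of terminal-link trials falling in each situation."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability r must lie in [0, 1]")
    p_blackout = 1.0 - P_FOOD
    return {
        1: P_FOOD * r,              # food trial, food-correlated stimulus shown
        2: P_FOOD * (1.0 - r),      # food trial, blackout-correlated stimulus shown
        3: p_blackout * r,          # blackout trial, blackout-correlated stimulus shown
        4: p_blackout * (1.0 - r),  # blackout trial, food-correlated stimulus shown
    }


def higher_valued_share(r: float) -> float:
    """Share of trials spent in the higher-valued situations (1) and (3)."""
    freqs = situation_frequencies(r)
    return freqs[1] + freqs[3]


def stimulus_outcome_phi(r: float) -> float:
    """Phi correlation between stimulus and outcome; equals 2r - 1 only
    because the two outcomes are assumed to be equiprobable."""
    return 2.0 * r - 1.0


if __name__ == "__main__":
    # .50 = an uncorrelated link; .65-.95 = the reliabilities named for Experiment 2.
    for r in (0.50, 0.65, 0.75, 0.85, 0.95):
        freqs = situation_frequencies(r)
        print(
            f"r={r:.2f}  phi={stimulus_outcome_phi(r):+.2f}  "
            f"higher-valued share={higher_valued_share(r):.2f}  "
            f"(2)+(4) share={freqs[2] + freqs[4]:.2f}"
        )
```

Under these assumptions the share of trials in the higher-valued situations is simply r, so any increase in reliability shifts trials out of situations (2) and (4) and into (1) and (3). That is the direction of preference the behavior-allocation account predicts; the sketch says nothing about how large the resulting preference should be.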

Knowing when reinforcement is due restricts the period during which terminal responses need to be made and lengthens the period during which interim activities can be performed. Predictive stimuli, and thus information, are therefore valuable to the extent that they permit such behaviors to be distributed effectively: they enable terminal responses to be made before food and allow the animal to perform interim activities at other times. Consequently, informative stimuli are preferred to noninformative stimuli not because the stimuli themselves are valuable, but because they make the behavioral situation as a whole more valuable than one in which nonpredictive, noninformative stimuli are present. This behavior-allocation analysis of the value of information may also apply to the present experimental situations, in which the reliability of the stimuli was varied. Unreliable stimuli often force the animal to experience the lower-valued behavioral situations, such as food preceded by interim activities. Reliable stimuli, however, ensure that the pigeon experiences the higher-valued situations. The greater the reliability of the relation between stimuli and outcomes, the more often the higher-valued situations are experienced, and the more the pigeon should prefer that situation.

This account has certain features in common with an approach stressing preparatory responses (Perkins, 1955). Preparatory responses are assumed to occur when a stimulus signals reinforcement, and these responses are assumed to increase the magnitude or quality of the reward. When a noninformative stimulus is presented, optimal preparatory responses are assumed not to occur, which would explain the preference for informative stimuli. There are, however, certain difficulties, the most obvious being the measurement and demonstration of such responses: attempts to detect overt preparatory responses have not been successful.
These responses should somehow increase the value of the reward, the prototypic example being salivation before food presentation. Stein (1958), however, showed that a prereinforcement stimulus became a conditioned reinforcer when brain stimulation was the reward. No appropriate preparatory response is known that increases the value of electrical stimulation of the brain, although the possibility of some central, rather than peripheral, response has not been ruled out (but see Cantor, 1979, who suggested that Stein's animals may have been turning their heads during the signal, thereby altering the impedance of the electrode and receiving more current). The behavior-allocation account based on interim and terminal behaviors need not posit that a response prior to reward somehow increases the value of that reward over what it would be were no response made. Rather, the overall value of a situation in which appropriate behaviors occur is higher than that of a situation in which these behaviors are often performed at inappropriate times, as when stimuli are unreliable and nonpredictive about outcomes. Such an approach specifies what behaviors an organism will engage in at various time intervals, and these behaviors are measurable (Green & Rachlin, 1977; Staddon & Simmelhag, 1971). Predictive stimuli permit the organism to apportion its time between these two classes of behaviors. A situation permitting the organism to engage in appropriate interim and terminal behaviors would be more highly valued than one in which terminal responses are consistently performed although reinforcement is not consistently forthcoming, or in which interim activities are performed with food delivery on some occasions.

A secondary reinforcement account would posit that the conditioned reinforcing strength of a stimulus is related to the density of reinforcement occurring in its presence. Organisms' preferences for multiple over mixed schedules are assumed to result from a concave utility function relating the secondary reinforcement value of a cue to the probability of reinforcement in its presence (see Bower et al., 1966, and Wyckoff, 1959, for this analysis). Unfortunately, such a function "is inferred from the data in a post hoc manner" (Bower et al., 1966) and, as pointed out by Green and Rachlin (1977), suffers from a lack of precision, because almost any result could be explained through judicious changes in the shape of the curve. Moreover, this approach is even less precise when stimuli are not perfect predictors of outcomes, as when the probability of reinforcement is held constant and only the correlation between the stimuli and the outcomes is varied. Exactly how one is to average the secondary reinforcement strengths accruing to different stimuli that are sometimes followed by food and at other times by a blackout period is left unspecified. Consider the present case, in which one situation provides a stimulus (say, red) that predicts food 85% of the time, while the other stimulus (say, green) is followed by food only 15% of the time.
Now compare this situation with one in which a blue signal is followed by food on 65% of the occasions and a yellow signal on 35%. The red should be highly reinforcing and the green minimally so; the values of the blue and yellow signals fall somewhere in between. How can we predict which set of stimuli the organism will prefer? The conditioned reinforcement account remains disturbingly silent.

The present analysis of the value of informative stimuli complements both this account and Fantino's (1977) delay-reduction hypothesis of conditioned reinforcement by providing a behavioral basis for the obtained preferences. According to the delay-reduction hypothesis, "when an organism chooses between two stimuli correlated with different reductions in delay to primary reinforcement, its choice of either stimulus is a monotonic function of the relative reduction in average time to reinforcement correlated with that stimulus" (Fantino, 1977, p. 337). When the choice is between a multiple and a mixed schedule of reinforcement, the delay-reduction hypothesis correctly predicts preference for the multiple schedule. For example, consider a mixed VI 1-min EXT schedule with equiprobable components. In the presence of this mixed schedule, the average delay to reinforcement at the onset of a component is 120 sec. With an equivalent multiple schedule, however, the onset of the signal for food is correlated with an average reduction in delay to reinforcement of 60 sec (the delay falls to 60 sec, one-half that of the mixed schedule).

The delay-reduction account also predicts the preferences obtained in the first experiment. The uncorrelated stimuli do not reduce expected time to primary reinforcement, whereas the correlated stimuli sometimes do. As the correlation between stimuli and outcomes increases, the situation approaches the multiple versus mixed comparison, and preference would be predicted to increase concomitantly. However, under the concurrent-chain procedure used in Experiment 2, the stimuli in both terminal links were correlated to different degrees with the equiprobable outcomes but did not change the overall average reduction in delay to primary reinforcement. Difficulties in averaging again arise, and preferences in such a concurrent-chain procedure are not easily predicted. Consider the present situation, in which one link provides stimuli that are 85% reliable and the alternative link provides stimuli that are 65% reliable, with each link lasting 10 sec. Since reinforcement occurred on half the trials, and the initial link was a conc VI 1-min VI 1-min, average delay to reinforcement from the onset of the initial links was 80 sec. (Responses during the initial link produce entry into a terminal link on the average every 60 sec for each key. Thus, expected time to a terminal link is 30 sec. Time to an outcome from the onset of either the left or the right terminal link is 10 sec. Since the terminal links are equiprobable, the overall expected time to an outcome from onset of the initial links is $30 + [(\tfrac{1}{2})(10) + (\tfrac{1}{2})(10)] = 40$ sec.
However, half of the outcomes end in a blackout; consequently, expected time to reinforcement is twice this, or 80 sec.) When the red signal is presented, 85% of the time the bird is 70 sec closer to reward than it had been at the onset of the initial links. When green is presented, it is 70 sec closer to reward on 15% of the trials. For the other link, blue signals a reduction in time to reward of 70 sec on 65% of the trials, and yellow on 35%. At all other times, average delay to reinforcement is not reduced when the signal is presented. In the present experiment, this means that of the 20 food deliveries in each terminal link, 17 were signaled by a red key light and 3 by a green key light in the .85 link; in the .65 link, 13 of the food deliveries were preceded by a blue signal and 7 by a yellow. As with the traditional account, the delay-reduction hypothesis of conditioned reinforcement does not state how such values are to be averaged, so it, too, makes no obvious predictions. Indeed, Fantino (1977) has noted that "while the delay reduction hypothesis of conditioned reinforcement is consistent with all of the data from the observing response and concurrent chains literature when only the size of the IRI [interreinforcement interval] is manipulated, some events occurring during the IRI must also be considered for a complete account of choice" (p. 336). Such events include differential discriminative stimuli in the terminal links. The present approach provides a basis for understanding why pigeons come to prefer reliably correlated or informative stimuli to uncorrelated or less informative stimuli, even when the overall average reduction in time to primary reinforcement might not be affected by the production of these stimuli.

REFERENCES

Autor, S. M. The strength of conditioned reinforcers as a function of frequency and probability of reinforcement. In D. P. Hendry (Ed.), Conditioned reinforcement. Homewood, Ill.: Dorsey, 1969.

Bloomfield, T. M. Reinforcement schedules: Contingency or contiguity? In R. M. Gilbert & J. R. Millenson (Eds.), Reinforcement: Behavioral analyses. New York: Academic Press, 1972.

Bower, G., McLean, J., & Meacham, J. Value of knowing when reinforcement is due. Journal of Comparative and Physiological Psychology, 1966, 62, 184-192.

Brown, P. L., & Jenkins, H. M. Auto-shaping of the pigeon's key-peck. Journal of the Experimental Analysis of Behavior, 1968, 11, 1-8.

Cantor, M. B. Brain stimulation reinforcement: Implications of an electrode artifact. Science, 1979, 204, 1235-1236.

Catania, A. C. Concurrent operants. In W. K. Honig (Ed.), Operant behavior: Areas of research and application. New York: Appleton-Century-Crofts, 1966.

Catania, A. C. Freedom and knowledge: An experimental analysis of preference in pigeons. Journal of the Experimental Analysis of Behavior, 1975, 24, 89-106.

Dinsmoor, J. A., Browne, M. P., & Lawrence, C. E. A test of the negative discriminative stimulus as a reinforcer of observing. Journal of the Experimental Analysis of Behavior, 1972, 18, 79-85.

Fantino, E. Conditioned reinforcement, choice, and the psychological distance to reward. In D. P. Hendry (Ed.), Conditioned reinforcement. Homewood, Ill.: Dorsey, 1969.

Fantino, E. Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior. Englewood Cliffs, N.J.: Prentice-Hall, 1977.

Fleshler, M., & Hoffman, H. S. A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior, 1962, 5, 529-530.

Gollub, L. R. Information on conditioned reinforcement. A review of Conditioned reinforcement, edited by D. P. Hendry. Journal of the Experimental Analysis of Behavior, 1970, 14, 361-372.

Green, L., & Rachlin, H. Pigeons' preferences for stimulus information: Effects of amount of information. Journal of the Experimental Analysis of Behavior, 1977, 27, 255-263.

Hendry, D. P. Reinforcing value of information: Fixed-ratio schedules. In D. P. Hendry (Ed.), Conditioned reinforcement. Homewood, Ill.: Dorsey, 1969.

Hursh, S. R., & Fantino, E. An appraisal of preference for multiple versus mixed schedules. Journal of the Experimental Analysis of Behavior, 1974, 22, 31-38.

Jenkins, H. M., & Boakes, R. A. Observing stimulus sources that signal food or no food. Journal of the Experimental Analysis of Behavior, 1973, 20, 197-207.

Kendall, S. B. Effects of two procedures for varying information transmission on observing responses. Journal of the Experimental Analysis of Behavior, 1973, 20, 73-83.

McMillan, J. C. Average uncertainty as a determinant of observing behavior. Journal of the Experimental Analysis of Behavior, 1974, 22, 401-408.

Mulvaney, D. E., Dinsmoor, J. A., Jwaideh, A. R., & Hughes, L. H. Punishment of observing by the negative discriminative stimulus. Journal of the Experimental Analysis of Behavior, 1974, 21, 37-44.

Perkins, C. C., Jr. The stimulus conditions which follow learned responses. Psychological Review, 1955, 62, 341-348.

Silberberg, A., & Fantino, E. Choice, rate of reinforcement, and the changeover delay. Journal of the Experimental Analysis of Behavior, 1970, 13, 187-197.

Squires, N., & Fantino, E. A model for choice in simple concurrent and concurrent-chains schedules. Journal of the Experimental Analysis of Behavior, 1971, 15, 27-38.

Staddon, J. E. R., & Simmelhag, V. L. The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 1971, 78, 3-43.

Stein, L. Secondary reinforcement established with subcortical stimulation. Science, 1958, 127, 466-467.

Wyckoff, L. B., Jr. The role of observing responses in discrimination learning. In D. P. Hendry (Ed.), Conditioned reinforcement. Homewood, Ill.: Dorsey, 1969.

Received August 22, 1979. Revised January 2, 1980.