Anim. Behav., 1979, 27, 875-886
FORAGING AND REINFORCEMENT SCHEDULES IN THE PIGEON: OPTIMAL AND NON-OPTIMAL ASPECTS OF CHOICE
BY S. E. G. LEA
University of Exeter, Department of Psychology, Exeter EX4 4QG
Abstract. Pigeons were trained in a Skinner box on a reinforcement schedule that simulated a foraging situation. Pecks on a central key occasionally illuminated a side key which, if pecked, led to food reward after a delay that varied with the side-key colour. Reward durations, post-reward detention intervals, the probability of occurrence of the two side-key colours, and the time between side-key illuminations were all varied between conditions. The schedule allowed the pigeons to accept or reject any reward: they always accepted the side-key colour associated with less pre-reward delay, but accepted the other colour with a probability that varied between conditions. These variations were qualitatively but not quantitatively consistent with predictions from optimal foraging theory.

Introduction

If a forager has the sole aim of maximizing E/T, the net input of energy per unit time foraging, by varying the probabilities P_i of pursuing the ith type of prey whenever it is encountered, he should behave as follows (see, e.g. Schoener 1971; Charnov 1976).
(i) P_i values should be either zero or unity; intermediate values cannot be optimal. That is, the ith prey type is either never or always pursued.
(ii) Preference for a given prey item should depend only on the ratio E_i/h_i; the higher this ratio for any prey item for which P_i equals one, the higher E/T. Here E_i is the net energy worth of the ith prey type, and h_i is its handling time, the time taken to pursue and consume it.
(iii) P_i should be one if and only if E_i/h_i exceeds the E/T value resulting from always pursuing all prey types preferable to the ith, and never pursuing any prey type to which the ith is preferable.
(iv) As a consequence of (iii), whether or not the ith type of prey is pursued should not depend on its own density in the environment, but it should depend on the densities of all the more profitable prey types.
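Predictions (i) to (iv) can be illustrated with a short sketch. It assumes the usual rate expression E/T = sum(rate_i * E_i) / (1 + sum(rate_i * h_i)), which is not written out in the text above, and it measures energy in seconds of access to grain. The function names and the `experiment_prey` helper are hypothetical; the handling times (5 s and 20 s) and reward duration (2.5 s) are the values reported later in the Method.

```python
def optimal_diet(prey):
    """Return the prey types that should always be pursued (P_i = 1).

    prey: list of (name, encounter_rate, energy, handling_time) tuples.
    Assumes E/T = sum(rate*E) / (1 + sum(rate*h)) over the prey taken.
    """
    # Prediction (ii): rank prey types by profitability E_i / h_i.
    ranked = sorted(prey, key=lambda item: item[2] / item[3], reverse=True)
    diet = []
    gain = 0.0  # sum of rate_i * E_i over the diet so far
    cost = 0.0  # sum of rate_i * h_i over the diet so far
    for name, rate, energy, handling in ranked:
        rate_so_far = gain / (1.0 + cost)
        # Prediction (iii): take type i only if E_i/h_i exceeds the E/T
        # obtained from taking all more profitable types and nothing else.
        if diet and energy / handling <= rate_so_far:
            break
        diet.append(name)
        gain += rate * energy
        cost += rate * handling
    return diet


def experiment_prey(t, p, e_short=2.5, e_long=2.5, h_short=5.0, h_long=20.0):
    """Hypothetical helper: the two 'prey types' for search interval t and
    probability p that an encountered prey is of the long type."""
    return [("short", (1.0 - p) / t, e_short, h_short),
            ("long", p / t, e_long, h_long)]


if __name__ == "__main__":
    for t in (5, 10, 20, 40):
        print(t, optimal_diet(experiment_prey(t, 0.5)))
```

Under these assumptions the long prey type enters the diet only when the search interval is long enough, and its own encounter rate never enters the test applied to it, which is prediction (iv).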
If the forager's aim is more complex than maximization of E/T, more complex multiattribute models are required, e.g. Rapport (1971). But recent field and laboratory studies have shown that the predictions (i) to (iv) describe the behaviour of foraging animals surprisingly well. Many species have been shown to become more selective as overall prey density increases (e.g. trout, Ivlev 1961; spotted flycatchers foraging for Diptera and aphids, Davies 1977b, Figs. 4 and 5; great tits fed on mealworms, Krebs et al. 1977; redshank foraging for polychaete worms, Goss-Custard 1977a; bluegill sunfish fed on Daphnia, Werner & Hall 1974; blue and coal tits foraging for Ernarmonia conicolana larvae in pine cones, Gibb 1958, see Tullock 1971). This result is consistent with prediction (iii) above, and in several of these studies (Werner & Hall 1974; Goss-Custard 1977a; Krebs et al. 1977) prediction (iv) was also confirmed, although Goss-Custard did find a slight effect of the less profitable prey's own density on the probability that it would be taken.

Predictions from optimality are sometimes violated. The all-or-none rule (i) failed in the studies of Goss-Custard (1977a) and Krebs et al. (1977), and the order of preference between prey types is sometimes not well predicted by E_i/h_i (e.g. Goss-Custard 1977b; Smith & Follmer 1972). It remains the exception rather than the rule, however, for optimality to be brought into question by foraging data.

In contrast, there are a number of situations in which studies of learned instrumental behaviour of animals reliably find non-optimal choice. For example, when pigeons are exposed to concurrent schedules of reinforcement in a Skinner box, they show a preference for a schedule yielding small, frequent rewards over one that gives larger, rarer rewards even if the latter is better in E/T terms (Todorov 1973; similar results were obtained for rats in mazes by Logan 1965a). Logan also showed that rats prefer an immediate reward followed by a detention period to an identical reward preceded by an identical delay, even though the two situations are obviously equivalent in E/T terms. Similar results have been obtained by Ainslie (1974)
ANIMAL BEHAVIOUR, 27, 3
and Rachlin & Green (1972), although post-reward detention does affect preference to some extent (Logan 1965a, page 11; Sibly 1975). Another reliable failure of optimality arises where there is a choice between a fixed and a variable outcome of the same mean value: the variable outcome is preferred, even if the fixed outcome is then altered to make it preferable in E/T terms (e.g. Levanthal et al. 1959; Pubols 1962; Herrnstein 1964; Logan 1965b; Fantino 1967; Killeen 1968; Davison 1969; note that Logan's paper contains one exception to the general rule).

Why should foraging and conditioning experiments give such different results? It may be that optimal choice is only shown in the foraging situation to which animals have adapted (the ecological validity hypothesis). As alternatives, we might propose that the important difference lies in the structure of the choice situation (normally two options are offered simultaneously in conditioning experiments, whereas the forager must choose between one kind of prey that is offered now, and another that may be found later), or in the particular parameters of the prey types and choice options that have been studied up to this point. The experiment to be described here was designed to distinguish between these hypotheses so far as possible, by using a schedule of reinforcement in a Skinner box that had the same choice structure as the foraging situation, and introducing parameter settings that are typical of the foraging literature as well as those commonly used in operant conditioning experiments. The choice structure hypothesis clearly predicts uniformly optimal behaviour in this situation, while the parametric hypothesis predicts that behaviour will be optimal with typical foraging parameters, and non-optimal with typical conditioning parameters. The predictions from the ecological validity hypothesis depend on what factors we think are important in making a situation ecologically valid. If the physical environment is most important, behaviour should be uniformly non-optimal since the Skinner box is wholly unnatural, and this simple form of the ecological validity hypothesis is therefore open to disproof by the experiment. However, a different form of the ecological validity argument might be advanced to explain the importance of either choice structure or of parameter values, if the experiment should confirm the predictions of either of these hypotheses. Thus the ecological validity hypothesis as such cannot be disproved by the experiment proposed, but the meaning of ecological validity should be refined.

G. H. Collier & L. W. Kaufman (personal communication) have already carried out one simulation of foraging using operant conditioning. They found that rats became more selective between fixed-ratio schedules of reinforcement (Ferster & Skinner 1957) as the work required to gain access to the schedules decreased; but they did not investigate the quantitative optimality of the behaviour found.

Method

Subjects
Six homing pigeons (domesticated Columba livia) were maintained at 85% of their caged free-feeding weights. They had free access to water except in the test chamber. Several months before the present experiment they had been trained to peck keys in apparatus similar to that used here, and had experienced 2000 trials of a pattern discrimination training procedure.

Apparatus
A commercial pigeon operant test chamber was used. It measured 30.8 x 35.0 x 35.8 cm, and one side wall consisted of an intelligence panel on which were mounted three 2.5 cm diameter pecking keys, a solenoid-operated food hopper, a 2.8-W houselight to provide general illumination, and a 4-Ω loudspeaker through which white noise was played at all times. The houselight and one key were mounted centrally, 31.8 and 25.3 cm respectively above the grid floor, while the remaining keys were at the same level as the central key but 8.0 cm to each side of it. The right key could be transilluminated with red or green light, and the central key with white light; the left key was blanked off. The keys operated microswitches when struck with a force exceeding 0.1 N. An aperture 5.7 x 5.1 cm, directly below the central key and with its lower edge 9.9 cm from the floor, gave access to the grain hopper when the solenoid was energized, and at these times a 1-W traylight shone into the hopper aperture.
The remaining chamber walls formed part of a sound-resisting chest, which was ventilated by a fan. The chest was in a brick-built sound-resisting compartment, part of a large operant laboratory. Behaviour was monitored using electromagnetic counters and a cumulative recorder, but all experimental contingencies were implemented by a digital computer except during some early
LEA : FORAGING AND REINFORCEMENT SCHEDULES
conditions with one bird (L13), when electromechanical control equipment was used.

Pre-training
Despite their previous training, the pigeons did not peck the keys when first introduced into the apparatus. They were therefore exposed to several sessions during which the foraging schedule described below was used, but all parameters were set to minimal values, and transitions between states were made when the experimenter judged the pigeon to have made an approximation to a key peck. The standard of approximation required was raised until the pigeons pecked the keys reliably, and the schedule parameters were then gradually increased to the values used in the initial condition for each bird. Pre-training was complete within three sessions for all birds.

Foraging Schedule
Figure 1 is a flow chart of the schedule of reinforcement used to simulate the foraging situation. The pigeons were introduced into the chamber with the houselight and keylights off and the session was started by turning on the houselight and the centre key light, so initiating the search state. Completion of a fixed-interval (FI) t-s schedule of reinforcement (Ferster & Skinner 1957) was required for entry into the choice state; that is, the first centre-key peck after t s caused the state change, any previous pecks being ineffective. On entry to the choice state one of two 'prey types' was selected by the computer program, and (according to which was selected) the right key was lit up red or green.
The pigeon could now either (i) continue to peck on the white key: three pecks were sufficient to re-initiate the search state, the requirement being set at three rather than one to make sure that the pigeon had a chance to see the illumination of the right key; (ii) stop responding altogether, which eventually had the same effect as pecking the white key (this contingency was invoked only two or three times in the entire experiment); or (iii) peck on the coloured key, which initiated a handling state, turned off the central key light, and initiated a further FI schedule of hs or hL seconds, depending on the 'prey type'. Completion of this FI schedule brought Es or EL s access to grain, which in the case of one prey type but not the other was followed by d s detention in which neither key light was lit. The 'prey types' thus differed in the key colour, the FI requirement at the handling stage, the time of access to grain given as reward for completing the FI, and the presence or absence of post-reward detention. In the present experiment, hs was always 5 s and hL 20 s; Es and EL were normally both 2.5 s, but were varied as noted below; and d was normally zero, but was set to 30 s for some conditions as described below. For birds L11, L13, and L15, the shorter 'handling time' (hs) was associated with the green light on the right key, as in Fig. 1, but the opposite contingency was used for the other three birds.

Sessions consisted of 80 trials, that is to say 80 entries to the search state. When entry to the 81st search state would otherwise have occurred, the houselight and keylights were extinguished, and the bird was removed. Each bird experienced one session per day, six days per week so far as possible.
[Figure 1: flow chart of the schedule. Recoverable labels: states HANDLING, REWARD and DETENTION; transition captions '1 side key peck', 'FI hs secs side key', 'Es secs' and 'd secs'; key legend giving colour (white, red, green, dark) by location (centre, side).]
Fig. 1. Flowchart of the reinforcement schedule used to simulate a foraging situation. The rectangles indicate different states of the schedule; the captions by the arrows indicate the condition required for state transitions. The red and green stimuli were reversed for pigeons L10, L12 and L14.
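The schedule described above can be sketched as a simple loop over search, choice, handling, reward and detention states. This is an idealized illustration and not the control program used in the experiment: it assumes that every fixed interval is completed with no response latency, that the short 'prey type' is always accepted, and that the long type is accepted with a fixed probability `q_long`. The function name, its arguments and the seeded random generator are all hypothetical.

```python
import random


def simulate_session(t, p, q_long, h_s=5.0, h_l=20.0, e_s=2.5, e_l=2.5,
                     d=0.0, trials=80, seed=0):
    """Return seconds of grain access per second of session time.

    t: search fixed interval (s); p: probability that an encountered prey
    is of the long type; q_long: assumed probability of accepting it.
    """
    rng = random.Random(seed)
    clock = 0.0  # total session time in seconds
    food = 0.0   # total access to grain in seconds
    for _ in range(trials):           # one trial = one entry to the search state
        clock += t                    # search state: FI t s on the centre key
        is_long = rng.random() < p    # choice state: prey type selected
        if is_long:
            if rng.random() < q_long:
                clock += h_l + e_l    # handling FI, then access to grain
                food += e_l
            # otherwise the prey is rejected and search starts again
        else:
            clock += h_s + e_s + d    # detention d follows the short reward
            food += e_s
    return food / clock


if __name__ == "__main__":
    # Compare never and always accepting the long prey at two search intervals.
    for t in (5, 40):
        print(t, simulate_session(t, 0.5, 0.0), simulate_session(t, 0.5, 1.0))
```

Because a rejected prey simply returns the bird to the search state, each pass through the loop corresponds to one of the 80 entries to the search state that made up a session.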
Experimental Design
The experiment consisted of a series of conditions, where a condition was defined by settings of the parameters t, p, Es, EL, and d (see Fig. 1). The conditions were arranged in blocks, each block being designed to test a specific hypothesis about foraging. The blocks used are described below, and Table I summarizes the schedule parameters used. Each condition was maintained for at least five sessions, and conditions were not changed if the probability of accepting the 'prey type' of longer 'handling time' either (i) ranged over more than 0.2 in the last three sessions or (ii) showed a consistent upward or downward trend over the last three sessions. This was a mild stability criterion by the standards of operant experiments on choice; the maximum and minimum number of sessions
run under each condition are included in Table I. Conditions were never changed within a session. In some cases, the same condition was required in successive blocks, and to save time the condition was not always repeated, the same data being used for two different analyses (see Note b to Table II). Conversely, conditions were sometimes repeated within a block, and often in different blocks, in order that crucial comparisons could be made between conditions that had been run consecutively or nearly so. Within each block, the order in which the various conditions were used was varied between birds.

Testing the Effects of General 'Prey' Density
For the first block of conditions the only parameter adjusted was t, the search state fixed-interval value. Es and EL were both set at
Table I. The Conditions Used in the Experiment, the Encounter Rates (per Second in the Search State) They Generated, the Order in which They Were Used, and the Maximum and Minimum Number of Sessions in Each

        Schedule parameters*           Encounter rates        Range across birds of
                                                              numbers of sessions used
Block   t      p     Es    EL    d     Short prey  Long prey  Minimum  Maximum
1       5      0.5   2.5   2.5   0     0.10        0.10       5        7
        7.5    0.5   2.5   2.5   0     0.067       0.067      5        5
        10     0.5   2.5   2.5   0     0.05        0.05       5        5
        20     0.5   2.5   2.5   0     0.025       0.025      5        11
        30     0.5   2.5   2.5   0     0.017       0.017      6        6
        40     0.5   2.5   2.5   0     0.0125      0.0125     5        11
        80     0.5   2.5   2.5   0     0.0063      0.0063     7        8
        110    0.5   2.5   2.5   0     0.0046      0.0046     5        5
2       36     0.9   2.5   2.5   0     0.003       0.025      5        6
        20     0.5   2.5   2.5   0     0.025       0.025      5        6
        20     0.5   2.5   2.5   0     0.025       0.025      5        12
        4      0.1   2.5   2.5   0     0.225       0.025      6        8
3       36     0.1   2.5   2.5   0     0.025       0.003      5        17
        20     0.5   2.5   2.5   0     0.025       0.025      5        9
        20     0.5   2.5   2.5   0     0.025       0.025      5        9
        4      0.9   2.5   2.5   0     0.025       0.225      5        8
4       45     0.5   2.5   2.5   0     0.011       0.011      5        8
        9      0.1   2.5   2.5   0     0.10        0.011      5        9
        5      0.5   2.5   2.5   0     0.10        0.10       5        6
        9      0.9   2.5   2.5   0     0.011       0.10       5        9
        5      0.5   2.5   2.5   0     0.10        0.10       5        9
5       5      0.5   2.5   2.5   0     0.10        0.10       5        8
        5      0.5   2     8     0     0.10        0.10       5        16
6       5      0.5   2.5   2.5   0     0.10        0.10       -        8
        5      0.5   2.5   2.5   30    0.10        0.10       -        12

*Schedule parameters are abbreviated as follows: t, centre key FI (search time) in seconds; p, probability that a 'prey' encountered would have long (20 s) pre-reward handling time; Es and EL, duration of access to grain following short and long pre-reward handling time respectively (in seconds); d, duration of blackout (detention) following short handling time reward (in seconds).
2.5 s, d was set to zero, and p to 0.5. The aim was to demonstrate a change in selectivity as prey density varied from high values (low t values) to low (high t). The interval t was set at 20 s for all birds in the first condition, and then increased or decreased consistently until a range of probabilities of accepting the prey of longer handling time had been observed; finally an extra condition was run in which t was set very low if generally high t values had been used up to that point, and very high if generally low values had been used hitherto.
Testing the Effect of Density of One Prey Type with the Density of the Other Held Constant
Provided that the pigeons completed the search fixed-interval schedule with negligible delay, which in practice they nearly always did, the encounter rates with the two 'prey types' can be calculated from the parameters of the foraging schedule. The encounter rate with the prey type of longer handling time, per unit time spent in the search state, equals p/t; the encounter rate with the other prey type equals (1 - p)/t. These two quantities are referred to as 'long density' and 'short density' below. By suitable manipulation of p and t it was possible to adjust them independently, and three tests were run in different blocks of conditions. In block 2 long density was held constant at 0.025 Hz (occurrences per second in the search state), while short density took values of 0.003, 0.025, and 0.225 Hz. In block 3, short density was held constant at 0.025 Hz while long density took values of 0.003, 0.025 and 0.225 Hz. Block 4 was a factorial design in which long and short densities both took values of 0.10 and 0.011 Hz.
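The relation between the schedule parameters and the two densities can be checked numerically. The sketch below computes the encounter rates from t and p, and inverts the relation to recover the t and p needed for a chosen pair of densities; both function names are hypothetical, and negligible delay in completing the search interval is assumed, as in the text.

```python
def encounter_rates(t, p):
    """Return (short density, long density) in Hz for search interval t (s)
    and probability p that an encountered prey is of the long type."""
    long_rate = p / t
    short_rate = (1.0 - p) / t
    return short_rate, long_rate


def schedule_for(short_rate, long_rate):
    """Invert encounter_rates: return the (t, p) giving the two densities."""
    total = short_rate + long_rate   # overall encounter rate is 1 / t
    return 1.0 / total, long_rate / total


if __name__ == "__main__":
    # Block 2 settings: long density stays at 0.025 Hz while short density varies.
    for t, p in ((36, 0.9), (20, 0.5), (4, 0.1)):
        print(t, p, encounter_rates(t, p))
    print(schedule_for(0.225, 0.025))
```

With t = 36 s and p = 0.9, for instance, the long density is 0.9/36 = 0.025 Hz and the short density is 0.1/36, about 0.003 Hz, which agrees with the first block 2 value given above.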
Testing the Relative Effectiveness of Size and Delay of Reward
With p set to 0.5, d to zero and t to 5 s, a condition was run with Es and EL both at 2.5 s
Table II. Time Spent (s) and Responses Made in Various Phases of the Schedule by One Bird (L12). Data are the Means over the Last Three Sessions Run in Each Condition of the Experiment

                                                                 Time and responses
                 Schedule parameters(a)                     'Short' prey                  'Long' prey
                                             Search         Choices       Handling       Choices       Handling
Block  Condition  t    p    Es   EL   d     Time   Resp.   Time  Resp.   Time  Resp.    Time  Resp.   Time   Resp.
1      1          20   0.5  2.5  2.5  0     21.5   13.6    3.1   0.0     5.3   14.1     4.4   0.3     20.2   39.3
       2          40   0.5  2.5  2.5  0     40.7   20.0    0.6   0.0     5.1   16.2     6.3   0.5     20.2   50.7
2      18         36   0.9  2.5  2.5  0     39.0   15.1    0.9   0.0     5.2   12.9     6.0   0.1     20.2   39.0
       17         20   0.5  2.5  2.5  0     20.6   11.0    0.9   0.0     5.2   14.2     7.7   0.2     20.4   43.3
       12         20   0.5  2.5  2.5  0     20.6   13.9    0.6   0.0     5.2   11.2     3.0   0.1     20.2   47.4
       13         4    0.1  2.5  2.5  0     5.7    1.6     0.7   0.0     5.2   14.8     2.4   0.1     20.4   28.1(d)
3      7          36   0.1  2.5  2.5  0     36.9   16.7    0.6   0.0     5.2   16.5     4.8   0.3     20.2   42.2
       8          20   0.5  2.5  2.5  0     21.3   12.7    0.6   0.0     5.1   16.2     6.9   0.5     20.1   44.7
       10         20   0.5  2.5  2.5  0     20.7   13.0    0.6   0.0     5.2   15.7     3.8   0.4     20.2   42.8
       11         4    0.9  2.5  2.5  0     5.2    3.3     0.6   0.0     5.1   14.3     4.4   0.3     20.1   46.6
4      16         45   0.5  2.5  2.5  0     45.5   31.2    0.8   0.0     5.3   12.2     3.6   0.3     20.2   43.6
       14         9    0.1  2.5  2.5  0     9.8    6.1     0.9   0.0     5.1   15.1     4.9   0.7     20.8   43.6
       15         5    0.5  2.5  2.5  0     5.5    5.5     0.7   0.0     5.1   16.0     5.2   0.4     20.2   44.8
       5          9    0.9  2.5  2.5  0     11.8   3.6     0.7   0.0     5.1   15.8     4.7   0.3     20.2   41.0
       6          5    0.5  2.5  2.5  0     5.5    6.7     0.7   0.0     5.1   15.7     4.0   0.0(d)  20.2   28.0(d)
5      6          5    0.5  2.5  2.5  0     See Block 4 above(b)
       4          5    0.5  2    8    0     5.5    6.2     0.6   0.0     5.1   15.7     5.9   0.2     20.1   43.6
6      6          5    0.5  2.5  2.5  0     See Block 4 above(b)
       3          5    0.5  2.5  2.5  30    5.4    2.5     0.6   0.0     5.1   16.1     3.6   0.7     20.1   48.3

(a) As note * to Table I.
(b) Data from block 4 were re-used for this condition since an identical condition had been used there.
(c) The means for responses in the choice state are taken only over trials in which the given prey type was accepted; values for other trials were necessarily three.
(d) Aberrant values for long prey data under these conditions are due to very small numbers of trials on which the long prey was accepted, leading to unstable means.
as before. This was compared with another condition in which, with the same p, d, and t values, Es was set to 2 s and EL to 8 s. Since pigeons may be assumed to obtain at least four times as much grain in the latter condition as in the former, the two 'prey types' were thus made equivalent in E/T terms. These conditions comprised block 5.

Testing the Effect of Post-reward Detention
With p set to 0.5, t to 5 s, and Es and EL both at 2.5 s, a condition was run with d set to 30 s. The effect of this was to make the 'prey type' with the longer (20 s) pre-reward 'handling time' FI objectively the more profitable of the two, in E/T terms. For comparison, another condition was run with all the parameter settings identical except that d was set to zero. These conditions comprised block 6.

Results

General Observations on Behaviour
Once the birds had been shaped to peck both keys, they adjusted rapidly to the foraging schedule. In all conditions, all birds shifted immediately to the right key on every trial when it lit up with the colour associated with the 'prey type' of shorter pre-reward 'handling time'. Two extremely minor exceptions to this rule are noted below. Behaviour towards the other 'prey type' was more variable, and depended upon the schedule parameters, as will be shown below. The birds commonly paused for some seconds when the right key was lit with the colour associated with this 'prey type', and on occasion pecked the centre key once or twice before switching to the side key, thereby committing themselves to obtaining reward on that trial. Once committed, the birds usually pecked continuously at the right key regardless of its colour, though there was occasionally some acceleration of response toward the end of the 'handling time'. Pecking at the centre key, on the other hand, was characteristic of fixed-interval schedule performance, with a pause after reinforcement followed by a gradually accelerating response rate (cf. Ferster & Skinner 1957, pages 157 to 174), and a faster initial response rate when a 'prey' had just been rejected so that in effect a reinforcement had been omitted (cf. Staddon & Innis 1969). Table II shows the mean values of several peck counts and phase durations in all conditions for one bird (L12), as an example of some of the trends just described.
Although the probability of accepting the prey type of shorter pre-reward 'handling time' was always one, the probability of accepting the other 'prey type' did not take only the values zero and one predicted by optimal foraging theory: the extreme probabilities did occur, but so did intermediate probabilities. The mean value of this probability over the last three sessions run in each condition, called hereafter p (peck long), is therefore the major dependent variable in this experiment. All further presentation and discussion of results focusses on it.

Effect of Overall Prey Density
Figure 2 shows that there was, as predicted by optimal foraging theory, an increase in p (peck long) as t, the centre key fixed interval, was increased. But the increase did not take the form of a step function, nor did p (peck long) switch from below 0.5 to above 0.5 at the t value (7.5 s) where, given optimal foraging, it should have switched from zero to one.

Effect of Density of One Prey Type with Density of the Other Type Constant
Figure 3 shows that when the density of the prey with longer handling time ('long density') was held constant at 0.025 Hz while 'short density' varied from 0.003 Hz to 0.025 Hz, and from 0.025 to 0.225 Hz, p (peck long) declined systematically for all six birds. The mean Kendall's tau correlation between short density and p (peck long) was -0.97, which is significant (P < 0.001) using a two-tailed version of the test described by Jonckheere (1954). The effect of varying 'long density' over the
[Figure 2: one curve per pigeon (L10 to L15). Abscissa: CENTRE KEY FI (secs), logarithmic, marked at 10, 20, 40 and 80. Ordinate: p (peck long), marked at .25, .50, .75 and 1.00.]
Fig. 2. Probability of each pigeon pecking the side key (and thus choosing to take the current 'prey') when the prey type of longer handling time was offered, as a function of the time between presentations of a prey item. Note the logarithmic scale on the abscissa.
same range while short density was held constant at 0.025 Hz is shown in Fig. 4. The relationship was clearly less consistent (mean tau -0.30), and fell short of significance (P = 0.18). One pigeon failed to accept the 'short prey' in one or two trials under the condition where its density was highest.

Figure 5 shows the results of the factorial test of the effects of long density and short density. From the results shown in Figs. 3 and 4, it would be predicted that p (peck long) would be greatest when both long density and short density were low, next greatest when long
[Figure 3: one curve per pigeon. Abscissa: 'SHORT' PREY ENCOUNTER RATE (Hz), logarithmic, marked at .003, .025 and .225. Ordinate: p (peck long), marked at .25, .50, .75 and 1.00.]
Fig. 3. Probability of each pigeon accepting the prey type of longer handling time, as a function of the frequency of occurrence of the other prey type per unit time in the search state. The density of the prey type of longer handling time was constant at 0.025 Hz. Note the logarithmic scale on the abscissa.
density was high and short density was low, next greatest when the densities were reversed, and least when both densities were high. This was indeed the dominant pattern observed, and the correlation with the predicted order was high and significant (using the Jonckheere test, mean tau = 0.73, P < 0.001), but the difference in the effects of the two density variables was not significant, since for two of the six birds (L11 and L14) changing the long density from 0.011 to 0.1 affected behaviour more than changing the short density by the same amount.

A further analysis was made to compare the effects of the two variables more sensitively, using data from all the conditions in blocks 1-4. Multiple regression was used in an attempt to predict p (peck long) from the long and short densities, in order to see which had the dominant role. At the same time, three further variables were included in the regression, in case they were contributing to the results. These were the number of the condition within the experiment, the number of sessions run under that condition and the mean p (peck long) value from the immediately previous condition. Table III shows the regression coefficients obtained for each variable and each bird.
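The rank correlations quoted in this section are Kendall's tau values computed within each bird and then averaged. The sketch below shows how one such value could be obtained. It implements the simple tau-a form, which makes no correction for ties; the text does not say which variant was used, and the acceptance probabilities in the example are invented for illustration and are not data from the experiment.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    concordant = 0
    discordant = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # tied pairs (sign == 0) count towards neither total
    return (concordant - discordant) / (n * (n - 1) // 2)


if __name__ == "__main__":
    short_density = [0.003, 0.025, 0.225]   # Hz, the block 2 values
    p_peck_long = [0.9, 0.5, 0.1]           # hypothetical values for one bird
    print(kendall_tau(short_density, p_peck_long))
```

A bird whose p (peck long) fell at every step up in short density would score -1 on this measure, so a mean close to -1 across birds indicates an almost perfectly consistent decline.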
[Figure 4: one curve per pigeon. Abscissa: 'LONG' PREY ENCOUNTER RATE (Hz), logarithmic, marked at .003, .025 and .225. Ordinate: p (peck long), marked at .25, .50, .75 and 1.00.]
Fig. 4. Probability of each pigeon accepting the prey type of longer handling time as a function of its frequency of occurrence per unit time in the search state. The density of the prey type of shorter handling time was constant at 0.025 Hz. Note the logarithmic scale on the abscissa.

[Figure 5: bar chart with one panel per pigeon. Legend: ENCOUNTER RATES (Hz) for 'LONG' PREY and 'SHORT' PREY, each .011 or .10.]
Fig. 5. Probability of each pigeon accepting the prey type of longer handling time in a block of conditions during which the frequency of occurrence of both prey types was varied factorially. The condition with both frequencies at 0.10 Hz was tested twice: the left half of the relevant bar gives data for sessions run close in time to those with high 'long' frequency and low 'short' frequency, and the right half gives data for sessions run close in time to those with low 'long' frequency and high 'short' frequency.
The effect of short density was significantly negative for all six birds; the effects of long density were negative for all birds except L11, but significantly so only for L12. For all birds except L12, the regression coefficient for short density was larger in magnitude than that for long density, and using the method for comparing the coefficients of the same regression given by Surrey (1974, page 31), it was found that the difference between the coefficients was significant for birds L11, L13 and L14 (P < 0.05 at least, 2-tailed tests). A further regression was run in which the regression coefficients but not the intercept (constant) were forced to the same values for each bird; results from this overall regression are included as the right-hand column of Table III, and led to the same conclusions as the individual regressions: increases in short density and long density both tended significantly (P < 0.001 and 0.01 respectively) to decrease p (peck long), but the effect of short density was significantly (P < 0.001) the stronger of the two.

Effects of Variation in Reward Duration
Figure 6 shows the change in p (peck long) on changing from a condition in which both 'prey types' produced the same duration (2.5 s) of access to grain, to a condition where the 'prey type' of longer pre-reward 'handling time' (20 as against 5 s) led to compensatingly longer access to reward (8 as against 2 s). Since it took the pigeons a finite if small time to reach the food after the hopper solenoid was operated, the ratio of food obtained in the two conditions was probably slightly better than 4:1 in practice. All six birds showed an increase in p (peck long) under conditions of unequal access to reward (the increase is therefore significant by a 2-tailed binomial test: P = 0.03), but in no case did p (peck long) reach 1.00. The probability of accepting the prey type of shorter pre-reward 'handling time' remained at 1.00 for all birds in all sessions under these conditions.

Effects of Imposing a Post-reward Detention Interval
Figure 7 shows the change in p (peck long) on changing from a condition with no post-reward detention interval to a condition where a detention interval of 30 s was imposed for the 'prey type' of shorter pre-reward 'handling time', thus making its total handling time greater than that of the other 'prey type'. Although the mean value of p (peck long) was slightly increased (from 0.16 to 0.28) by this manipulation, two of the six birds showed a decrease in p (peck long), and the change was therefore not significant. Pigeon L14 failed to peck the key associated with the 'prey type' of shorter pre-reward 'handling time' on one trial in one session with the 30-s post-reward detention interval; otherwise that 'prey type' was always accepted.
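Two pieces of arithmetic behind these last two subsections can be checked directly: the profitabilities of the two 'prey types' in blocks 5 and 6, and the two-tailed binomial (sign) test on the direction of change across six birds. The sketch below is illustrative only. It treats seconds of access to grain as the energy measure and divides by pre-reward handling time plus any detention, which is an assumption about how E/T was reckoned; the function names are hypothetical. The text does not state which test was applied to the detention data, so the second probability printed is simply what a sign test would give for four increases out of six.

```python
from math import comb


def profitability(reward_s, handling_s, detention_s=0.0):
    """Seconds of grain access per second of handling plus detention."""
    return reward_s / (handling_s + detention_s)


def two_tailed_sign_test(k, n):
    """Two-tailed binomial probability of a split at least as extreme as
    k out of n when both directions of change are equally likely."""
    extreme = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


if __name__ == "__main__":
    # Block 5: 2 s reward after 5 s against 8 s reward after 20 s.
    print(profitability(2, 5), profitability(8, 20))
    # Block 6: 30 s detention after the short prey's reward.
    print(profitability(2.5, 5, 30), profitability(2.5, 20))
    # All six birds changing in the same direction, then four of six.
    print(two_tailed_sign_test(6, 6), two_tailed_sign_test(4, 6))
```

Under these assumptions the two prey types are equally profitable in the unequal-reward condition (0.4 each), the long prey type is the more profitable once detention is imposed, and six changes out of six in one direction give P = 2/64, about 0.03, as reported.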
Table III. Results of Regression Analyses Taking p (Peck Long) as the Dependent Variable (after an Arcsine Transformation). The Table Shows Regression Coefficients for Each Bird; Prey Densities Were Log-transformed to Equalize their Spacing. Data from Condition Blocks 1-4 were Used.
Birds L15
All birds together
-- 0 . 369*** - 0.206**
- 0 .269**
- 0 . 220***
-0-055
-0-049
-0-090
- 0 .083
0 . 011
0 . 003
0.022
0 .016
0 .003
0 . 046
-0-032
-0-053
-0-050
-0-016
0 .368
-0-022
0 .277
- 0 .228
0 . 163
0.031
L10
L11
L12
Short prey density
- 0-266**
- 0-175**
- 0-205**
Long prey density
-0-144
0 .028
- 0 . 238**
Condition number
-0-001
0.030*
0 .067 -0-081
Regressor
Sessions run Previous p (peck long) Degrees of freedom Variance accounted for (adjusted for df)
10 45 .6
L13
L14
9
9
8
9
9
63 . 2
77 .8
74 .9
57 . 8
56 . 6
*, **, *** Significant at the 0.05, 0.01 or 0.001 level for a two-tailed test.
0 . 014**
79 59 . 7
Discussion

The results show features of both optimal and non-optimal behaviour, in ways that are consistent with previous literature both on foraging and on operant conditioning. Of the three hypotheses proposed in the introduction to explain why behaviour is sometimes optimal and sometimes not, the choice structure hypothesis and the 'physical environment' version of the ecological validity hypothesis are therefore clearly disproved. We are left with the parametric hypothesis, and the remainder of this discussion is an attempt to discover why particular parameter settings should lead to optimal choice while others do not. Considerations of ecological validity may be relevant, but there are other possibilities.

Results Consistent with Optimality
These include the virtually universal acceptance of the prey type of shorter pre-reward handling time, at least for all conditions with equal reward duration; the increase in p (peck long) as the centre key FI increased, and hence both long and short density decreased (Fig. 2; cf. Ivlev 1961 and other studies cited in the introduction); the dominance of short density over long density in determining p (peck long) (Figures 3 to 5 and Table III; cf. Davies 1977a,
1977b; Goss-Custard 1977a and Werner & Hall 1974 for more natural foraging data where the energy worth of single types of prey varied).

Departures from Optimality

Most obviously, p (peck long) was not restricted to zero and one, but took intermediate values; that is to say, the pigeons showed stochastic rather than exclusive preference. This result has been found in foraging studies (Goss-Custard 1977a; Krebs et al. 1977) as well as in a previous laboratory simulation of foraging (Collier & Kaufman, personal communication), and it occurs commonly in the operant laboratory, for example in studies of the `matching law' that relates the distribution of responses between two concurrently available schedules of reinforcement to the distribution of reinforcements between them (see Herrnstein 1961, 1970; Baum 1974; Lobb & Davison 1975; Myers & Myers 1977). However, with the concurrent interval schedules used in most studies of matching, exclusive preference may not be optimal, because time does not have to be committed to one schedule rather than the other (Shimp 1969; Rachlin et al. 1976). But failure of exclusive preference certainly does not involve any conflict of results between foraging and conditioning studies.

Secondly, the pigeons showed a marked bias (in the sense of Baum 1974) towards rejecting the `long prey'. When the centre key FI was 7.5 s with the two reward types equiprobable, the
Fig. 6. Probability of each pigeon accepting the prey type of longer handling time as a function of the ratio of the reward durations for the two prey types (horizontal axis: reward duration ratio E_L/E_S). The ratio of 1 was obtained by both prey types giving 2.5 s access to grain; the ratio of 4 was obtained by the prey type of longer pre-reward handling time giving 8 s access and the other prey type 2 s. The time in the search state between prey presentations was 5 s.

Fig. 7. Probability of each pigeon accepting the prey type of longer pre-reward handling time as a function of the detention interval imposed after reward for the other prey type (horizontal axis: post-reward detention d, in seconds). The time in the search state between prey presentations was 5 s.
mean time to the next reward would be the same (20 s) if the pigeon accepted the long prey, or if it rejected it and all subsequent long prey until a short prey next occurred. It might be predicted (for example from the matching law) that the probability of accepting the long prey would therefore be 0.5; in fact it was consistently less than 0.5 in that region. This result is consistent with the bias towards a variable-interval schedule of reinforcement over a fixed-interval schedule of the same mean observed in several operant conditioning studies (Herrnstein 1964; Killeen 1968; Davison 1969), though the bias seems unusually large. Although this aspect of the results has no parallel in the foraging literature, the parallel condition has not been much investigated. Only Werner & Hall (1974) have claimed to show shifts in selectivity at precisely the densities that optimality predicts, other studies merely showing that the shift occurs within some range of densities that is often as wide as the gap between the observed and optimal location for the shift seen in Fig. 2.

Thirdly, the pigeons were affected by the density of the worse prey type. This effect was significant overall in the regression analysis, though as Table III shows it varied considerably between birds. Goss-Custard (1977a) has also reported an effect of the density of the non-preferred prey on the probability that it would be taken by redshank, but in the opposite direction to the present result: the rarer the worse prey type, the less likely redshank were to take it. However, R. Bonser & J. H. Lawton (personal communication) have found non-optimalities in the foraging behaviour of water boatmen (Notonecta glauca) that may be explained by a tendency like that shown in the present results.

The fourth failure of optimality, the dominance of immediacy over reward duration seen in Fig. 6, is consistent with the results obtained by Todorov (1973) and Rachlin & Green (1972) using pigeons in operant choice situations (and also with what social psychologists call `failure to delay gratification', e.g. Mischel & Metzner 1962). Its consistency with the foraging literature is harder to establish, because although investigations have been concerned with the relative effects of E_i and h_i on foraging, the emphasis has mainly been on demonstrating handling time effects, preference for larger prey being taken almost for granted. Thus Davies (1977a) reports that wagtails foraging on Scatophagidae did not take the largest available flies, but an intermediate size class that had a larger E_i/h_i, and data reported by Kear (1962) show that finches' preferences for seeds were better predicted by (kernel weight)/(husking time) than by kernel weight alone. Results obtained by Willson (1971) and Willson & Harmeson (1973) are consistent with the present data: they found that cardinals' and song sparrows' seed preferences were better predicted by husking time than by calorie intake rates (corresponding to E_i/h_i).

The final failure of optimality is the failure of post-reward detention to compensate for pre-reward handling time differences. This is consistent with the data Rachlin & Green (1972) obtained in an operant preference study, in which pigeons chose 2 s' access to grain followed by 6 s' blackout in preference to 4 s' access preceded by 4 s' blackout. There are no parallel investigations in the foraging literature, and in this connection it would be interesting to study foragers whose prey sometimes takes a long time to digest. Some snakes might be suitable subjects.

The Origins of Optimal Behaviour

Since there seems to be no conflict between operant and foraging experiments, we need to develop a theory of choice that is sufficiently wide to explain both optimal and non-optimal behaviour, whether it is found in foraging or in the operant laboratory. The failures of optimality observed here seem to fall into two classes, one of effects that may be due to `sampling' and one of effects of `time preference'. In the first group may be placed the occurrence of stochastic rather than exclusive preference, and the effect of `long density' on long prey acceptance rate. Both would be explained if we assume that pigeons tend to monitor alternative behaviour types when that is not too costly, and do so at an approximately constant rate. `Time preference', or preference for immediacy, clearly describes the dominance of reward delay over reward duration (Fig. 6) and the ineffectiveness of post-reward detention; more tentatively, it may explain why the pigeons showed a bias towards rejecting the `long' prey. If the shortest delays of reward were given excessive weight, acceptance of the long prey might not become typical until rejecting it could no longer ever produce quicker access to food, i.e. until the centre-key FI exceeded 15 s.

Why should optimality fail in these two ways? Possibly what Tolman (1955) called the `principles of performance', the laws that relate
response probabilities to reward magnitudes, frequencies, and so forth, are just such as to produce these particular sets of non-optimalities. Analysis along these lines has been pursued with some success for the case of excessive attention to immediacy (Logan 1965a; Rachlin & Green 1972; Ainslie 1975).

However, it is worth noting that both sampling and time preference may be adaptive tendencies. The selective advantage of monitoring the environment is obvious (Shettleworth 1978 has pointed out that monitoring ought to be maintained even if it is consistently punished, while Krebs et al. 1978 consider the optimal amount of time that should be spent sampling). And preference for immediacy, too, has its advantages. Economists from Böhm-Bawerk (1891) on have recognized that `time preference proper' might be rational given a finite life span; similarly we might propose that in the limit, Darwinian fitness is not served by waiting an indefinitely long time for an indefinitely large prey, for faced with that prospect it would be better to leave foraging for food altogether, and perhaps go and procreate.

It is therefore possible that these apparent failures of optimality in fact reflect the limits of the single-attribute approach to foraging. This is not inconsistent with the idea that foraging, optimal and otherwise, can be explained by the principles of performance, but perhaps that argument should be turned on its head. Animals' learning capacities seem to divide into two types. There are specific capacities, often very highly developed, that suit particular species for their ecological niches. Examples are song-learning in birds and the learning of routes by migratory species. On the other hand, there are the capacities that are revealed in conditioning experiments.
These are usually much less impressive in terms of what can be learned, but they can be mobilized for practically any purpose, and apparently exist in all vertebrates at least (there are of course differences in the rates of learning and the patterns of performance in different species and different situations: see the review by Hogan & Roper 1978). Foraging, too, is a very widespread and general kind of behaviour. It might not be too far-fetched to suppose that the laws of operant behaviour have evolved to ensure efficient, if not optimal, foraging.

Acknowledgments

Results from this experiment were presented to the meetings of the Association for the Study of Animal Behaviour in London, December 1977,
and the Experimental Analysis of Behaviour Group in Manchester, April 1978. Thanks are due for many useful comments made at those meetings. Thanks are also due to Mrs R. M. Kirby for computer programming, to Mr S. J. Webber for help in running the experiment, and to Dr J. Cherfas, Miss S. Dow and Dr T. J. Roper for commenting on the manuscript.

REFERENCES

Ainslie, G. W. 1974. Impulse control in pigeons. J. exp. Anal. Behav., 21, 485-489.
Ainslie, G. 1975. Specious reward: a behavioral theory of impulsiveness and impulse control. Psychol. Bull., 82, 463-496.
Baum, W. M. 1974. On two types of deviation from the matching law: bias and undermatching. J. exp. Anal. Behav., 22, 231-242.
Böhm-Bawerk, E. von 1891. Capital and Interest (W. Smart, Trans.). New York: Stechert.
Charnov, E. L. 1976. Optimal foraging: attack strategy of a mantid. Am. Nat., 110, 141-151.
Davies, N. B. 1977a. Prey selection and social behaviour in wagtails (Aves: Motacillidae). J. Anim. Ecol., 46, 37-57.
Davies, N. B. 1977b. Prey selection and the search strategy of the spotted flycatcher (Muscicapa striata): a field study of optimal foraging. Anim. Behav., 25, 1016-1033.
Davison, M. 1969. Preference for mixed-interval versus fixed-interval schedules. J. exp. Anal. Behav., 12, 247-252.
Fantino, E. J. 1967. Preference for mixed- versus fixed-ratio schedules. J. exp. Anal. Behav., 10, 35-44.
Ferster, C. B. & Skinner, B. F. 1957. Schedules of Reinforcement. New York: Appleton-Century-Crofts.
Gibb, J. A. 1958. Predation by tits and squirrels on the eucosmid Ernarmonia conicolana (Heyl.). J. Anim. Ecol., 27, 375-396.
Goss-Custard, J. D. 1977a. Optimal foraging and the size selection of worms by redshank, Tringa totanus, in the field. Anim. Behav., 25, 10-29.
Goss-Custard, J. D. 1977b. The energetics of prey selection by redshank, Tringa totanus (L.), in relation to prey density. J. Anim. Ecol., 46, 1-19.
Herrnstein, R. J. 1961. Relative and absolute strength of response as a function of frequency of reinforcement. J. exp. Anal. Behav., 4, 267-272.
Herrnstein, R. J. 1964. Aperiodicity as a factor in choice. J. exp. Anal. Behav., 7, 179-182.
Herrnstein, R. J. 1970. On the law of effect. J. exp. Anal. Behav., 13, 243-266.
Hogan, J. A. & Roper, T. J. 1978. A comparison of the properties of different reinforcers. In: Advances in the Study of Behaviour (Ed. by J. S. Rosenblatt, R. A. Hinde, E. Shaw & C. Beer), Vol. 8. New York: Academic Press.
Ivlev, V. A. 1961. Experimental Ecology of the Feeding of Fishes. New Haven: Yale University Press.
Jonckheere, A. R. 1954. A test of significance for the relation between m rankings and k ranked categories. Br. J. statist. Psychol., 7, 93-100.
Kear, J. 1962. Food selection in finches with special reference to interspecific differences. Proc. zool. Soc. Lond., 138, 163-204.
Killeen, P. 1968. On the measurement of reinforcement frequency in the study of preference. J. exp. Anal. Behav., 11, 263-269.
Krebs, J. R., Erichsen, J. T., Webber, M. I. & Charnov, E. L. 1977. Optimal prey selection in the great tit (Parus major). Anim. Behav., 25, 30-38.
Krebs, J. R., Kacelnik, A. & Taylor, P. 1978. Test of optimal sampling in foraging great tits. Nature, Lond., 275, 27-30.
Levanthal, A. M., Morrell, R. F., Morgan, E. J. & Perkins, C. C. 1959. The relation between mean reward and mean reinforcement. J. exp. Psychol., 57, 284-287.
Lobb, B. & Davison, M. C. 1975. Performance on concurrent variable-interval schedules: a systematic replication. J. exp. Anal. Behav., 24, 191-197.
Logan, F. A. 1965a. Decision-making by rats: delay versus amount of reward. J. comp. physiol. Psychol., 59, 1-12.
Logan, F. A. 1965b. Decision-making by rats: uncertain outcome choices. J. comp. physiol. Psychol., 59, 246-251.
Mischel, W. & Metzner, R. 1962. Preference for delayed reward as a function of age, intelligence and length of delay interval. J. abnorm. soc. Psychol., 64, 425-431.
Myers, D. L. & Myers, L. E. 1977. Undermatching: a reappraisal of performance on concurrent variable-interval schedules of reinforcement. J. exp. Anal. Behav., 27, 203-214.
Pubols, B. H. 1962. Constant versus variable delay of reinforcement. J. comp. physiol. Psychol., 55, 52-56.
Rachlin, H. & Green, L. 1972. Commitment, choice and self-control. J. exp. Anal. Behav., 17, 15-22.
Rachlin, H., Green, L., Kagel, J. H. & Battalio, R. C. 1976. Economic demand theory and psychological studies of choice. In: The Psychology of Learning and Motivation (Ed. by G. H. Bower), Vol. 10, pp. 129-154. New York: Academic Press.
Rapport, D. J. 1971. An optimization model of food selection. Am. Nat., 105, 575-587.
Schoener, T. W. 1971. Theory of feeding strategies. Ann. Rev. Ecol. Syst., 2, 369-404.
Shettleworth, S. J. 1978. Reinforcement and the organization of behavior in golden hamsters: punishment of three action patterns. Learning and Motivation, 9, 99-123.
Shimp, C. P. 1969. Optimal behavior in free-operant experiments. Psychol. Rev., 76, 97-112.
Sibly, R. 1975. How incentive and deficit determine feeding tendency. Anim. Behav., 23, 437-446.
Smith, C. C. & Follmer, D. 1972. Food preferences of squirrels. Ecology, 53, 82-91.
Staddon, J. E. R. & Innis, N. K. 1969. Reinforcement omission on fixed-interval schedules. J. exp. Anal. Behav., 12, 689-700.
Surrey, M. J. C. 1974. An Introduction to Econometrics. Oxford: Clarendon Press.
Todorov, J. C. 1973. Interaction of frequency and magnitude of reinforcement on concurrent performances. J. exp. Anal. Behav., 19, 451-458.
Tolman, E. C. 1955. Principles of performance. Psychol. Rev., 62, 315-326.
Tullock, G. 1971. The coal tit as a careful shopper. Am. Nat., 105, 77-80.
Werner, E. E. & Hall, D. J. 1974. Optimal foraging and the size selection of prey by the bluegill sunfish (Lepomis macrochirus). Ecology, 55, 1042-1052.
Willson, M. F. 1971. Seed selection in some North American finches. Condor, 73, 415-429.
Willson, M. F. & Harmeson, J. C. 1973. Seed preferences and digestive efficiency of cardinals and song sparrows. Condor, 75, 225-234.

(Received 26 June 1978; revised 25 October 1978; MS. number: 1782)