THEORETICAL POPULATION BIOLOGY 18, 44-56 (1980)

Learning to Forage-Optimally?

J. G. OLLASON

Culterty Field Station, Newburgh, Ellon, Aberdeenshire, AB4 0AA Scotland

Received January 29, 1979
Though there has been remarkably little interest in how animals might achieve the feat, copious evidence has been marshalled to support the conjecture that animals forage optimally. For a recent review see Pyke et al. (1977). Krebs et al. (1974) investigated the predictions of a hunting by expectation model due to Gibb (1962) and claimed to show that foraging chickadees (Paridae) behave in ways more consistent with the predictions of optimal foraging theory than with those of the hunting by expectation model, a finding which raises the question: Is it generally true that hunting by expectation in patchy environments leads to behaviour that is qualitatively different from optimal foraging? This paper attempts to provide an answer by presenting a model of hunting by expectation in patchy environments with the following properties:

(i) It makes no appeal directly to the theory of evolution for its conception; it is therefore independent of any circular definitions of optimality, and it depends only on aspects of animal behaviour which can be observed now.
(ii) It shows all the qualitative properties of Charnov's (1976) model of optimal foraging in a patchy environment, and it allows the animal to adjust its behaviour in response to a changing environment, where Charnov's model simply specifies the optimal strategy for a particular set of patches.
(iii) The performance of the model depends on only one parameter which can be easily estimated experimentally.
(iv) The model approximates closely to the predictions of the optimal model for a wide range of values of the single parameter.
In a patchy environment, any animal, whether it is foraging optimally or not, needs to use a rule to decide when to leave the patch at which it is currently feeding. Such a rule might be to compare its feeding rate at the current patch with some kind of weighted average of its own, not necessarily optimal, feeding rate over the environment as a whole, and to stay in the patch as long as it is feeding faster than it remembers doing. A model of foraging incorporating this rule is by definition a model of hunting by expectation because the experience of the animal alone determines the time it will stay at any patch.

0040-5809/80/040044-13$02.00/0 Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.

For the purposes of the model discussed below it is convenient to distinguish between the memory of the animal, that is, the apparatus the animal uses to remember, and the remembrance of the animal, what the animal remembers at a given time. The memory of the animal may be regarded as a cylindrical vessel with a hole in the bottom (Fig. 1). When the animal is feeding, a fluid is caused to flow into this vessel at a rate proportional to the animal's feeding rate. The fluid flows out at a rate proportional to its height in the vessel whether the animal is feeding or not. The volume of fluid contained in the vessel at any moment is proportional to the amount of food the animal remembers eating, and the rate of change of volume of fluid is proportional to an exponentially smoothed average of the animal's feeding rate. A single parameter $k$ determines the absolute rate at which fluid flows out of the vessel; this can be regarded as being proportional to the area of the base of the vessel. When $k$ is small the height of a given volume of fluid is large and it flows out of the vessel faster than it would if $k$ were large and the height small. The rate of change of the volume of fluid in the cylinder and, by analogy, the rate of change of remembrance of the animal while it is feeding in a patch may be represented by the following differential equation:

$$\frac{dm}{dt} = \frac{dF}{dt} - \frac{m}{k}, \tag{1}$$

where $m(t)$ is the remembrance of the food eaten as a function of time and $dF/dt$ is the rate of feeding. The parameter $k$ determines the importance of a given feeding event as it recedes into the past: if $k$ is small, the relative importance of recent events is greater in the remembrance than when $k$ is large.
FIG. 1. Diagram showing the hydraulic analogue of the memory model. Fluid flows into the memory and leaks away through the hole in the bottom. The rate of input of fluid is $dF/dt$, the rate of outflow is $m/k$, and the volume of the fluid corresponding to the remembered food is $m$.
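The leaky-vessel picture of Eq. (1) can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the function name `simulate_memory`, the forward Euler scheme, the step size and the parameter values are all assumptions chosen for illustration. It integrates Eq. (1) for a constant feeding rate and compares the result with the closed form $m = rk(1 - \exp(-t/k))$ that Eq. (1) gives in that special case.

```python
import math

def simulate_memory(feeding_rate, k, duration, dt=0.01, m0=0.0):
    """Integrate Eq. (1), dm/dt = dF/dt - m/k, with forward Euler steps.

    feeding_rate is a function of time returning dF/dt (hypothetical input);
    k is the memory parameter and m0 the remembrance at t = 0.
    """
    m = m0
    steps = int(round(duration / dt))
    for i in range(steps):
        inflow = feeding_rate(i * dt)   # fluid entering the vessel
        outflow = m / k                 # leak through the hole in the bottom
        m += (inflow - outflow) * dt
    return m

K = 100.0     # assumed value, for illustration only
RATE = 2.0    # assumed constant feeding rate
T = 50.0      # assumed feeding duration

m_numeric = simulate_memory(lambda t: RATE, K, T)
# With a constant feeding rate r, Eq. (1) integrates to m = r k (1 - exp(-t/k)).
m_exact = RATE * K * (1.0 - math.exp(-T / K))
print(m_numeric, m_exact)
```

With a small step the two values should agree closely; a smaller `k` makes the remembrance track recent feeding more quickly, as described in the text.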
In order to distinguish between the time spent travelling and the time spent feeding at patches the following notation is used: the time taken to travel between patches, from the moment of leaving one patch to the moment of arriving at the next, is represented by $t_t$. The time spent at a patch from the moment of arrival to the moment of departure is represented by $t_c$. In discussions of remembrance, at the moment of leaving a patch the variable $t$ is set to zero, and $m(0)$ is defined as the value of $m(t_c)$ at the moment of leaving that patch; it retains this value until the moment of arrival at the next patch, when $t$ is again set to zero and $m(0)$ is redefined as the current value of $m(t_t)$ (Fig. 2). The animal forages in a patchy environment and stays at a patch as long as it is feeding faster than it remembers doing, that is, as long as $dm/dt > 0$. On leaving a patch the animal stops feeding and Eq. (1) simplifies to

$$\frac{dm}{dt} = -\frac{m}{k}. \tag{1a}$$

If the animal searches at random in a patch and eats everything it finds, the exploitation equation can be written

$$F(t) = F_t\,(1 - \exp(-vt)) \tag{2}$$

and

$$\frac{dF}{dt} = F_t\,v\exp(-vt), \tag{2a}$$

where $F_t$ is the total amount of food present in the patch and $v$ is the exploitation rate constant.
FIG. 2. This illustrates the timing convention. The symbol ':=' means 'takes the value of' or 'becomes'. This convention allows the symbol $m(0)$ to take different values within and between patches, and allows the staying times and travelling times to be treated separately.
Equations (2) and (2a) imply the following assumptions:

(i) At a patch the animal never has the opportunity to become satiated.
(ii) Food at a patch is exploited faster than it can be replaced.
(iii) As the food at a patch is exploited the rate of feeding of the animal decreases, cf. Charnov (1976).

Equation (1) incorporating this kind of feeding can be rewritten

$$\frac{dm}{dt} = F_t\,v\exp(-vt) - \frac{m}{k}. \tag{3}$$

The properties of the model may be described by solving (3), for example by the method of Laplace transforms, to give

$$m(t) = \frac{F_t k v}{kv - 1}\,\bigl(\exp(-t/k) - \exp(-vt)\bigr) + m(0)\exp(-t/k), \tag{4}$$

where $m(0)$ is the memory on arriving at a patch. On leaving the patch at $t_c$, and redefining $m(0)$ to be equal to $m(t_c)$, Eq. (4) simplifies to

$$m(t_t) = m(0)\exp(-t_t/k), \tag{4a}$$

where $t_t$ is the time spent travelling between patches, cf. (1a). Substituting (4) into (3), equating $dm/dt$ to zero, and solving for $t$ gives

$$t_c = \frac{k}{1 - kv}\,\ln\!\left(\frac{1}{kv} + \frac{m(0)}{F_t k v}\left(1 - \frac{1}{kv}\right)\right), \tag{5}$$
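Equations (4) and (5) can be checked against Eq. (3) numerically. The sketch below is illustrative only: the function names and the value of $m(0)$ are assumptions, $F_t$ and $v$ are the values quoted later for Cowie's data, and the code assumes $kv \neq 1$ (where the closed forms are singular). It evaluates $t_c$ from Eq. (5) and confirms that $dm/dt$ from Eq. (3), with $m$ taken from Eq. (4), vanishes at that time.

```python
import math

def remembrance(t, m0, F_t, v, k):
    """Eq. (4): remembrance after feeding for time t, starting from m(0) = m0."""
    return (F_t * k * v / (k * v - 1.0)) * (math.exp(-t / k) - math.exp(-v * t)) \
        + m0 * math.exp(-t / k)

def staying_time(m0, F_t, v, k):
    """Eq. (5): time at which dm/dt reaches zero (assumes k*v != 1)."""
    if F_t * v <= m0 / k:
        # dm/dt <= 0 already on arrival, so t_c <= 0; it is set to zero.
        return 0.0
    arg = 1.0 / (k * v) + (m0 / (F_t * k * v)) * (1.0 - 1.0 / (k * v))
    return (k / (1.0 - k * v)) * math.log(arg)

def dm_dt(t, m0, F_t, v, k):
    """Eq. (3), with m(t) supplied by Eq. (4)."""
    return F_t * v * math.exp(-v * t) - remembrance(t, m0, F_t, v, k) / k

# F_t and v as quoted for Cowie's data; m0 and k are assumed for illustration.
F_T, V, K, M0 = 6.538, 0.0081, 100.0, 2.0
t_c = staying_time(M0, F_T, V, K)
print(t_c, dm_dt(t_c, M0, F_T, V, K))
```

The early return for $F_t v \le m(0)/k$ also keeps the argument of the logarithm positive in every case that reaches it.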
where $t_c$ is the time after arrival that the animal leaves the patch. There is no mathematical reason why $t_c$ should not take negative values, but since no animal can logically occupy a patch for a negative time, in such cases $t_c$ is set to zero. This property is biologically significant because it implies that in an environment containing patches of widely varying types there may be certain patches containing little food or with small exploitation rate constants ($F_t$ or $v$ or both small) which, though visited and sampled instantaneously, will not be exploited because on arrival at such patches the rate of change of remembrance, $dm/dt$, is less than or equal to zero, implying that $t_c$ is also less than or equal to zero. The learning model therefore provides a method not only of determining how long to stay at a patch but also of deciding whether or not the patch is worth exploiting at all; it is thus directly relevant to the theory of food choice and optimal diets because it predicts quantitatively which of the patches visited will be exploited: if on arriving at a patch $F_t v$ is less than or equal to $m(0)/k$ the animal will leave it immediately. Notice that the animal leaves the patch when $dm/dt \le 0$; it is monitoring the rate of change of its remembrance and not the magnitude of its remembrance.
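The acceptance rule at the end of this paragraph reduces to a single comparison. As a small illustration (the patch values, $m(0)$ and $k$ below are hypothetical and not taken from the paper), a patch is exploited only if $F_t v > m(0)/k$ on arrival:

```python
def is_exploited(F_t, v, m0, k):
    """True when dm/dt > 0 on arrival, i.e. F_t * v > m(0) / k."""
    return F_t * v > m0 / k

# Hypothetical (F_t, v) pairs, for illustration only.
patches = [(1000.0, 0.02), (200.0, 0.02), (50.0, 0.05), (10.0, 0.02)]
m0, k = 300.0, 100.0          # assumed values; threshold m0 / k = 3.0

decisions = [is_exploited(F_t, v, m0, k) for F_t, v in patches]
print(decisions)
```

Here the initial feeding rates $F_t v$ are 20, 4, 2.5 and 0.2, so under these assumed values only the first two patches exceed the threshold and would be exploited.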
The model works as follows:

(i) $m(0)$ is calculated at the time of arrival at a patch (Eq. (4a)).
(ii) $t_c$ is calculated (Eq. (5)).
(iii) $m(t_c)$ is calculated to give the magnitude of the remembrance at the time of leaving the patch (Eq. (4)).
(iv) The animal leaves the patch. At the same time $m(0)$ is set to $m(t_c)$. The animal travels $t_t$ time units to the next patch and the process continues.

The model learns about its environment and adjusts its staying time according to $F_t$, $v$, and $t_t$. Given a series of patches of food of the same type, $F_t$, $v$, and $t_t$ constant, the model learns to stay at each patch a constant time. The results shown in Figs. 3-6 were produced by presenting the model with such series of patches. If $F_t$ alone is increased or decreased, $t_c$ is increased or decreased correspondingly at first but it returns to the same constant value, so that $t_c$ is independent of $F_t$ (Fig. 3); but this is a special case, see the discussion and Eq. (8) below. If $v$ alone is increased, the equilibrium value of $t_c$ is decreased (Fig. 4) (the faster food can be removed from the patch the shorter the time the animal will stay there). If $t_t$ is increased, $t_c$ rises to another stationary value (Fig. 5 and Table I), cf. Cowie (1977). The stationary value to which $t_c$ converges increases with increasing $k$, but the time taken by $t_c$ to converge after perturbation also increases with increasing $k$ (Fig. 6).

The memory model may be tested in the following way: Let $M(t_c)$ be the memory at $t_c$ when $t_c$ is stationary for given constant $F_t$, $v$, and $t_t$. Then on arrival at a patch, from Eq. (4a),

$$m(0) = M(t_c)\exp(-t_t/k). \tag{6}$$
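Steps (i)-(iv) of the model can be sketched as a short loop. This is an illustrative reading of the procedure rather than the author's program: the function names, the starved starting state $m = 0$, the stopping tolerance and the iteration cap are assumptions, and the code assumes $kv \neq 1$. $F_t$ and $v$ are the values quoted for Cowie's data in Table I.

```python
import math

def remembrance(t, m0, F_t, v, k):
    """Eq. (4)."""
    return (F_t * k * v / (k * v - 1.0)) * (math.exp(-t / k) - math.exp(-v * t)) \
        + m0 * math.exp(-t / k)

def staying_time(m0, F_t, v, k):
    """Eq. (5), with non-positive staying times set to zero."""
    if F_t * v <= m0 / k:
        return 0.0
    arg = 1.0 / (k * v) + (m0 / (F_t * k * v)) * (1.0 - 1.0 / (k * v))
    return (k / (1.0 - k * v)) * math.log(arg)

def forage(F_t, v, t_t, k, m_leave=0.0, tol=1e-9, max_patches=100000):
    """Visit identical patches until t_c changes by less than tol."""
    previous = None
    t_c = 0.0
    for n in range(1, max_patches + 1):
        m0 = m_leave * math.exp(-t_t / k)            # (i)   Eq. (4a)
        t_c = staying_time(m0, F_t, v, k)            # (ii)  Eq. (5)
        m_leave = remembrance(t_c, m0, F_t, v, k)    # (iii) Eq. (4); (iv) m(0) := m(t_c)
        if previous is not None and abs(t_c - previous) < tol:
            return t_c, n
        previous = t_c
    return t_c, max_patches

t_eq, n_patches = forage(F_t=6.538, v=0.0081, t_t=4.0, k=100.0)
print(t_eq, n_patches)
```

For $k = 100$ and $t_t = 4$ the loop should settle close to the corresponding entry of Table I (28.50); the table itself was produced with a coarser stopping rule of 0.01 time units.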
FIG. 3. The staying time $t_c$ is plotted against $F_t$, the total amount of food in the patch. $k = 100$, $t_t = 100$, $v = 0.02$. The model adjusts $t_c$ at consecutive patches after reaching equilibrium with $F_t = 200$ when it encounters patches with $F_t = 1000$. It reaches equilibrium and then readjusts $t_c$ when it encounters patches with $F_t = 200$.
FIG. 4. The equilibrium staying time $t_c$ plotted against $v$, the exploitation rate constant. $F_t = 1000$, $k = 100$, $t_t = 100$.

FIG. 5. The equilibrium staying time $t_c$ plotted against $t_t$, the travelling time. $F_t = 1000$, $k = 100$, $v = 0.02$.

Substituting (6) into (4) and simplifying gives

$$M(t_c) = \frac{F_t k v}{kv - 1}\;\frac{\exp(-t_c/k) - \exp(-v t_c)}{1 - \exp(-(t_c + t_t)/k)}. \tag{7}$$

Substituting (7) into (5) gives

$$t_c = \frac{k}{1 - kv}\,\ln\!\left(\frac{1}{kv}\;\frac{1 - \exp(-(v t_c + t_t/k))}{1 - \exp(-(t_c + t_t)/k)}\right). \tag{8}$$
Notice that $F_t$ has cancelled out of this equation; this justifies the statement above that in the special case of feeding at a repeated series of patches of the same type, $t_c$ converges to a value independent of $F_t$, cf. Eq. (10).
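Equation (8) is implicit in $t_c$, so it has to be solved numerically. One possible approach, sketched below, is bisection on the difference between $t_c$ and the right-hand side of Eq. (8); the paper does not say which numerical method was used, so the bisection, the function names and the bracket $[0, 10^4]$ are assumptions. The sketch also assumes $t_t > 0$ and $kv \neq 1$.

```python
import math

def eq8_rhs(t_c, t_t, v, k):
    """Right-hand side of Eq. (8) evaluated at a trial t_c."""
    num = -math.expm1(-(v * t_c + t_t / k))   # 1 - exp(-(v t_c + t_t/k))
    den = -math.expm1(-(t_c + t_t) / k)       # 1 - exp(-(t_c + t_t)/k)
    return (k / (1.0 - k * v)) * math.log(num / (k * v * den))

def equilibrium_staying_time(t_t, v, k, hi=1.0e4, iterations=200):
    """Bisection for t_c = rhs(t_c); assumes the root lies in [0, hi]."""
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid - eq8_rhs(mid, t_t, v, k) < 0.0:
            lo = mid      # trial t_c is still below the stationary value
        else:
            hi = mid
    return 0.5 * (lo + hi)

# v as quoted for Cowie's data; note that F_t does not appear at all.
t_small_k = equilibrium_staying_time(t_t=4.0, v=0.0081, k=100.0)
t_large_k = equilibrium_staying_time(t_t=4.0, v=0.0081, k=800.0)
print(t_small_k, t_large_k)
```

That $F_t$ is not an argument of either function reflects the cancellation noted above. For $k = 100$ and $t_t = 4$ the result should lie close to the 28.50 reported in Table I.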
FIG. 6. This shows the convergence of $t_c$ to its stationary value after the animal has been starved until $m(0)$ is zero and is then placed into a repeating series of patches with the following parameters: $F_t = 6.538$, $v = 0.0081$, $t_t = 20$. When $k$ is small not only does $t_c$ converge in a smaller number of patches, but because the animal spends less time at each patch, the time to converge is much less than when $k$ is large. (Ordinate: $t_c$; abscissa: number of patches visited; separate curves for different values of $k$, including $k = 800$ and $k = 3200$.)
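The dependence of convergence speed on $k$ described in the caption can be explored with a sketch like the following. It is an assumption-laden illustration: the function names are invented, the animal starts with $m = 0$ as in Fig. 6, and convergence is taken to mean that $t_c$ changes by less than 0.01 between consecutive patches, the criterion the paper uses for Table I. The code assumes $kv \neq 1$.

```python
import math

def remembrance(t, m0, F_t, v, k):
    """Eq. (4)."""
    return (F_t * k * v / (k * v - 1.0)) * (math.exp(-t / k) - math.exp(-v * t)) \
        + m0 * math.exp(-t / k)

def staying_time(m0, F_t, v, k):
    """Eq. (5), with non-positive staying times set to zero."""
    if F_t * v <= m0 / k:
        return 0.0
    arg = 1.0 / (k * v) + (m0 / (F_t * k * v)) * (1.0 - 1.0 / (k * v))
    return (k / (1.0 - k * v)) * math.log(arg)

def patches_to_converge(k, F_t=6.538, v=0.0081, t_t=20.0, tol=0.01, limit=100000):
    """Count patches visited until |t_c(n) - t_c(n-1)| < tol, starting starved."""
    m_leave, previous = 0.0, None
    for n in range(1, limit + 1):
        m0 = m_leave * math.exp(-t_t / k)
        t_c = staying_time(m0, F_t, v, k)
        m_leave = remembrance(t_c, m0, F_t, v, k)
        if previous is not None and abs(t_c - previous) < tol:
            return n
        previous = t_c
    return limit

n_small = patches_to_converge(k=100.0)
n_large = patches_to_converge(k=3200.0)
print(n_small, n_large)
```

Under these assumptions the count for the larger $k$ should exceed that for the smaller one, in line with the trend shown in Fig. 6.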
The only unknown in (8) is $k$. This gives the clue to the experiment which will test the model: for any given series of patches with $F_t$, $v$, and $t_t$ constant, the animal, using this model, will eventually come to stay at each patch for a constant equilibrium staying time, $t_c$; when this value has been obtained, $k$ can be estimated numerically from Eq. (8). If $k$ is not constant for all equilibrium values of $t_c$ generated by particular values of $F_t$, $v$, and $t_t$, then the model is invalid. The approximate equilibrium values of $t_c$ with their optimal values for different travelling times and different values of $k$, using Cowie's (1977) data on Great Tits, are shown in Table I, the entries of which were calculated by presenting the model with a series of similar patches separated by constant travelling times, using Eqs. (4), (4a), and (5) repetitively until the change in $t_c$, the staying time, in consecutive patches was less than 0.01 time units. Table I suggests that the stationary value of $t_c$ converges to its optimal value as $k$ increases. The validity of this suggestion may be proved as follows: Equation (8) can be rewritten

$$\exp(t_c/k - v t_c) = \frac{1 - \exp(-(v t_c + t_t/k))}{\bigl(1 - \exp(-(t_c + t_t)/k)\bigr)\,kv}. \tag{9}$$

Taking limits as $k \to \infty$,

$$\lim_{k \to \infty} k\,\bigl(1 - \exp(-(t_c + t_t)/k)\bigr) = t_c + t_t,$$

since the Maclaurin expansion of $\exp(-(t_c + t_t)/k)$ is $1 - (t_c + t_t)/k + \cdots$, and in the limit (9) can be written

$$v\exp(-v t_c) = \frac{1 - \exp(-v t_c)}{t_c + t_t}. \tag{10}$$

TABLE I
Comparison of the Performance of the Learning Model with the Optimal Foraging Strategy in Great Tits^a

Equilibrium $t_c$ for travelling time $t_t$ =
k            4      8     12     16     20     24     28     32     36      n
100      28.50  38.68  45.88  51.54  56.24  60.26  63.75  66.86  69.63     10
200      29.38  40.22  48.16  54.55  59.97  64.67  68.86  72.64  76.07     18
400      29.81  40.99  49.36  56.16  61.97  67.09  71.69  75.85  76.69     35
800      30.15  41.33  49.93  56.96  63.00  68.33  73.14  77.53  81.58     60
1600     30.50  41.39  50.12  57.28  63.45  68.90  73.83  78.34  82.51    104
3200     30.99  41.18  50.03  57.28  63.53  69.07  74.07  78.64  82.88    179
6400     31.92  40.57  49.60  56.96  63.29  68.89  73.94  78.58  82.86    297
Optimal  30.15  41.93  50.71  57.95  64.20  69.74  74.77  79.38  83.66

^a The equilibrium values of $t_c$ were obtained by training the memory with a series of repeating patches, $F_t = 6.538$, $v = 0.0081$ (data from Cowie (1977)), separated by a constant travelling time $t_t$. The training was repeated until $t_{c,n}$, defined as the staying time at the $n$th patch visited, lay in the range $t_{c,n-1} \pm 0.01$. The optimal values were calculated using the cost free model, cf. Cowie's tangent model. A measure of the speed of convergence of $t_c$ to its equilibrium was obtained for each value of $k$ by allowing the model to equilibrate with $t_t = 4$. It was then allowed to equilibrate with $t_t = 8$, then from $t_t = 4$ to $t_t = 12$ and so on to 36. The average number of patches visited as $t_c$ converged, $n$, is shown in the final column. A large increase in $k$ makes very little difference to $t_c$ but it delays the convergence to equilibrium.
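Equation (10) has no closed-form solution for $t_c$, but it is easy to solve numerically. The sketch below uses bisection, which is an assumption (the paper does not state how the "Optimal" row of Table I was computed); the function name and the bracket are likewise illustrative. It uses $v = 0.0081$ from Cowie's data.

```python
import math

def optimal_staying_time(t_t, v, hi=1.0e4, iterations=200):
    """Solve Eq. (10), v exp(-v t) (t + t_t) = 1 - exp(-v t), by bisection."""
    def g(t):
        # Positive while the marginal rate still exceeds the average rate.
        return v * math.exp(-v * t) * (t + t_t) - (1.0 - math.exp(-v * t))

    lo = 0.0                      # g(0) = v * t_t > 0 for t_t > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

opt_4 = optimal_staying_time(t_t=4.0, v=0.0081)
opt_36 = optimal_staying_time(t_t=36.0, v=0.0081)
print(round(opt_4, 2), round(opt_36, 2))
```

The two values should agree with the first and last entries of the "Optimal" row of Table I (30.15 and 83.66) to within rounding.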
Equation (10) also defines $t_c$ as the optimal staying time in Charnov's sense provided costs of foraging and travelling are ignored, using Eq. (2) to define the process of exhaustion of the single type of patch. The strict interpretation of this finding is: As $k$, the decay constant of the memory, increases, the stationary value of $t_c$ converges to that value which maximises the average rate of extraction of energy from the environment as a whole. The learning model can never be strictly optimal for $k < \infty$. Therefore if the animal is learning about the environment at a significant rate it can at best be foraging only approximately optimally, and conversely if it is foraging truly optimally it cannot be learning about the environment. See Fig. 6 and Table I.

Numerical evidence (Table II) shows that in an environment containing up to eight types of patch mixed randomly together, randomly spaced apart, $t_{ci}$, the average time spent at the $i$th type of patch, also converges to its optimal value for all $i$ types of patch as $k$ increases, when the model is close to the steady state. (The optimal values of $t_{ci}$ were calculated numerically using the "star" method described by Acton (1973, pp. 451 and 452).) Some of the discrepancies between the observed mean values of $t_{ci}$ and their corresponding optima can be accounted for by the fact that the values of the memories at the starts of the experiments, $m(0)$, are in some cases rather different from the corresponding values of $m(f)$, the memories at the endings of the experiments. This implies that the model had not quite reached the steady state at the start of some of the experiments.

For a wide range of values of $k$ it will be difficult to distinguish statistically between the behaviour of the learning model and of the cost-free optimal model: there are no qualitative differences between their predictions and therefore the qualitative criteria used by Krebs et al. (1974) to distinguish between optimal foraging and hunting by expectation are not generally valid.

The learning model can also explain Cowie's findings. Table I shows the optimal values of $t_c$ for Great Tits computed from Cowie's data. In most cases the observed values of staying times in Cowie's experiments are greater than the equilibria predicted by both the learning model and the simplified cost-free optimal model. Cowie accounts for these discrepancies by relating the net rate of energy intake from the patch to the net rate of intake of energy from the environment as a whole, taking account of costs of travelling and foraging, reverting to Charnov's original marginal value equation, but the departures of $t_c$ from the equilibria predicted by the learning model can be accounted for by the fact that the Great Tits were starved for 2 h before the start of the experiments. This may be interpreted as a travelling time of at least 7200 sec before the animal arrives at the first patch. At the beginning of the experiment $t_c$ will be likely to be much greater than its equilibrium for a shorter travelling time in the range 4-36 sec, cf. Fig. 3, while at the end of each experiment only 10 min later, $t_c$ may not yet have declined to equilibrium. Consequently the mean of the observed $t_c$ is likely to be higher than its equilibrium value for a given $t_t$. Also the speed with which the staying time converged implies that $k$ must have been small and that the equilibrium value of $t_c$ would be rather less than its optimum (Fig. 6).

TABLE II
The Relationship Between the Mean Values of the Observed Staying Times, $t_{ci}$, the Staying Time in the $i$th Type of Patch Estimated over a Sample of 1000 Patches, for Eight Types of Patch, for Four Values of $k$^a

No. patches    Patch type         k = 80          k = 800         k = 8000        k = 80000          Opt
of each type   F_t      v       Mean    SE      Mean    SE      Mean    SE      Mean    SE       1       2
130            400    0.05     29.44  0.44     33.56  0.26     33.94  0.08     33.80  0.02    34.12   33.70
127            400    0.02     42.08  1.33     41.22  0.72     39.70  0.17     38.77  0.05    39.49   38.42
133            600    0.05     31.65  0.31     41.08  0.26     42.15  0.08     41.94  0.02    42.23   41.80
124            600    0.02     48.16  1.01     59.23  0.60     59.50  0.18     59.09  0.05    59.77   58.70
132            800    0.05     33.42  0.21     46.06  0.20     47.75  0.08     47.64  0.02    47.99   47.56
116            800    0.02     52.59  0.75     71.60  0.65     73.85  0.21     73.35  0.06    74.15   73.08
110           1000    0.05     34.19  0.22     49.20  0.21     52.04  0.07     52.13  0.02    52.45   52.02
128           1000    0.02     55.43  0.46     80.93  0.57     84.83  0.18     84.52  0.46    85.31   84.24
Total 1000
m(0)                             300            3000            30000           300000
m(f)                             674            3589            28872           291437

^a The different patches occur in the environment equiprobably, $p_i = 0.125$, where $p_i$ is the probability of occurrence of the $i$th type of patch. They are separated from each other by travelling times which are rectangularly distributed with a mean of 100 time units. The number of patches of each type in the sample is shown in the first column and the observed mean travelling time among the patches of the sample is 101.98 time units. Two sets of optima have been calculated: column 1 gives the optima calculated as though the sample exactly represented the environment; column 2 gives the optima using the theoretical values of $p_i$ and $t_t$.
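The optima in column 2 of Table II can be approximated with a short calculation. The sketch below does not use Acton's "star" method cited in the text; instead it applies the marginal value condition of Charnov (1976), under which each patch is left when its instantaneous feeding rate $F_t v \exp(-vt)$ falls to the overall average rate $\lambda$, and finds $\lambda$ by bisection. The function names, the bisection and the symbol $\lambda$ are assumptions introduced for illustration; the patch parameters, equal probabilities and mean travelling time of 100 are those stated for Table II.

```python
import math

def optimal_staying_times(patches, probs, mean_travel, iterations=200):
    """Cost-free optimum for several patch types.

    patches: list of (F_t, v); probs: encounter probabilities.
    Returns (lam, times) where lam is the overall average intake rate.
    """
    def times(lam):
        # Leave when F_t v exp(-v t) = lam; skip patches that start below lam.
        return [max(0.0, math.log(F * v / lam) / v) for F, v in patches]

    def surplus(lam):
        ts = times(lam)
        gain = sum(p * F * (1.0 - math.exp(-v * t))
                   for p, (F, v), t in zip(probs, patches, ts))
        time = mean_travel + sum(p * t for p, t in zip(probs, ts))
        return gain - lam * time      # zero when lam equals gain / time

    lo, hi = 1e-9, max(F * v for F, v in patches)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, times(lam)

PATCHES = [(400.0, 0.05), (400.0, 0.02), (600.0, 0.05), (600.0, 0.02),
           (800.0, 0.05), (800.0, 0.02), (1000.0, 0.05), (1000.0, 0.02)]
PROBS = [0.125] * 8

lam, t_opt = optimal_staying_times(PATCHES, PROBS, mean_travel=100.0)
print(round(lam, 3), [round(t, 2) for t in t_opt])
```

The resulting staying times should reproduce column 2 of the optima in Table II to within rounding, for example about 33.70 for the patch type with $F_t = 400$, $v = 0.05$ and about 84.24 for $F_t = 1000$, $v = 0.02$.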
CONCLUSION
Perhaps the most important conclusion to be drawn from the analysis of the learning model is that to forage in a patchy environment in a way that approximates closely to optimality, an animal need not be omniscient; it does not need to sample; it does not need to perform numerical analyses to find the maxima of functions of many variables; all it needs to do is to remember and to leave each patch if it is not feeding as fast as it remembers doing. The fact that both the optimal model and the learning model account for the observations Cowie made indicates that the Great Tits may be foraging optimally or they may be learning to forage. Cowie shows that the behaviour of the Great Tits does not depart significantly from optimality. He implies therefore that the departures of his observations from the predictions of the optimal model are due to stochasticity in the experimental system and experimental error. The learning model suggests that any measurable departure from optimality may be due to the animals foraging using this model, as well as to the other sources of error. It is therefore meaningless to write in this context about insignificant departures from the optimal behaviour since any departure may be due to experimental error, or to hunting by expectation, or to both. In contrast to the optimal model, the learning model makes quantitative predictions about the degree of departure from optimality that can be expected in the behaviour of animals that are learning about their environments, and provides a method of quantifying the animal's view of its own environment, whereas the optimal model simply allows the comparison of what an observer imagines that the animal ought to perceive of its environment with what its behaviour implies about its perceptions.

Although the predictions of the optimal model and of the learning model can be qualitatively and quantitatively similar, the models describe foraging behaviour from very different philosophical positions. Even defenders of the optimality paradigm admit the hypothesis that animals optimize in some way is unfalsifiable (Maynard Smith, 1978); nevertheless investigation of such hypotheses is held to give insight into animal behaviour, insight which is obtained, presumably, by using some scheme of the type illustrated in Fig. 7.
Because nobody can be certain a priori what function will be optimized, path B can be avoided and quantitative predictions can be obtained by refining the definition of optimal behaviour in the light of experience so that it approximates more and more closely to the observed behaviour of the animal; but this is an ad hoc process which is ultimately unfalsifiable because it is teleological, inferring the animal's goal from its observed behaviour. Such descriptions are less useful than mechanistic ones because while mechanistic models can be interpreted teleologically, teleological descriptions cannot necessarily be interpreted mechanistically. For example, it is possible, in retrospect, to interpret the learning model within the paradigm of optimal foraging theory: if $k$, the decay constant of the memory, is regarded as defining a kind of time horizon, then for any value of $k$ the model defines optimal foraging with respect to the universe as the animal perceives it within the time horizon. But conversely, given that an animal appears to be foraging optimally, this fact alone cannot imply any mechanism by which it does so. Furthermore, of two mechanistic descriptions of a system, one can be said to be better than the other if it describes a wider range of the properties of the system than the other does. No such objective distinctions can be made between teleological descriptions since the goals of the system, which are necessarily occult in a living system, have to be inferred from its observable properties, and an arbitrary distinction has to be made between those properties that are necessary for the pursuit of the goal and those that are merely contingent upon this.

The Panglossian solution to this problem is to assume natural selection has optimized every living system. The optimists say they do not believe this; they say they are using the optimality paradigm to look for "insight" (Pyke et al., 1977) or to "understand the diversity of life" (Maynard Smith, 1978); but the majority seem to conduct their research as if they do.

The learning model described in this paper is consistent with some observed foraging behaviour; it is falsifiable; and it is an example of the mechanistic approach to foraging behaviour that escapes from the metaphysical process of teleological speculation that is the basis of optimal foraging theory.

FIG. 7. Flow diagram showing the alternative pathways in a study of the optimality of foraging behaviour.

ACKNOWLEDGMENT

I have discussed the ideas contained in this paper with many people: I wish to thank all of them but especially Dr. J. R. Krebs whose work first stimulated me to take an interest in this field.
REFERENCES

ACTON, F. S. 1973. "Numerical Methods That Work," Harper & Row, New York.
CHARNOV, E. L. 1976. Optimal foraging; the marginal value theorem. Theoret. Pop. Biol. 9, 129-136.
COWIE, R. J. 1977. Optimal foraging in great tits (Parus major). Nature 268, 137-139.
GIBB, J. A. 1962. L. Tinbergen's hypothesis of the role of specific search images. Ibis 104, 106-111.
KREBS, J. R., RYAN, J. C., AND CHARNOV, E. L. 1974. Hunting by expectation or optimal foraging? A study of patch use by chickadees. Animal Behav. 22, 953-964.
KREBS, J. R., KACELNIK, A., AND TAYLOR, P. 1978. Test of optimal sampling by foraging great tits. Nature 275, 27-31.
MAYNARD SMITH, J. 1978. Optimization theory in evolution. Ann. Rev. Syst. Ecol. 9, 31-56.
PYKE, G. H., PULLIAM, H. R., AND CHARNOV, E. L. 1977. Optimal foraging: a selective review of theory and tests. Quart. Rev. Biol. 52, 137-153.