J. theor. Biol. (1983) 105, 175-178

LETTER TO THE EDITOR

Comments on “Learning the Evolutionarily Stable Strategy”

Harley (1981) discusses a simple rule for learning the evolutionarily stable strategy (ESS). An ESS is defined as a strategy which, if adopted by a population, means that no other strategy can invade the population (Maynard Smith, 1974). Harley introduces the concept of the evolutionarily stable (ES) learning rule, which he defines as a rule “such that all mutant individuals differing only in their learning rule will have a lower than average fitness when considered among a population of individuals who possess the ES learning rule. In other words, the ES learning rule is uninvadable” (p. 612). Some of Harley’s conclusions are discussed here.

Harley’s Assumptions

(a) The games considered possess ESSs, “all of which are in principle capable of being learnt. . .”
(b) “Games are played against the environment or against randomly chosen members of a finite population.”
(c) “For each game, the animal has a repertoire of simple behaviours B_i (i = 1, 2, . . . , n; n ≥ 2) which it may display.”
(d) “The payoff matrix may vary in time or location, but it does so slowly enough to allow learning rules to establish stable frequencies of behaviours.”
(e) “The learning period is short compared to the subsequent period of stable behaviours, i.e. the payoffs during the initial phase of learning are negligible compared to the total cumulative payoff in a game.”
(f) The payoff P_i(t) > 0 that is received when behaviour B_i is displayed on trial t is evaluated in units of fitness. If B_i is not displayed on trial t, then P_i(t) = 0.
(g) “The learning rule defines for each game the n probabilities f_i(t) of displaying behaviour B_i (i = 1, 2, . . . , n) at each trial t as a function of the previous payoffs P_i(τ) (τ < t).”
0022-5193/83/210175+04 $03.00/0 © 1983 Academic Press Inc. (London) Ltd.


The Equilibrium of the ES Learning Rule

Harley states that the ES learning rule has the following form at equilibrium:

$$f_i(t) \to \frac{\sum_{\tau=1}^{t-1} P_i(\tau)}{\sum_{j=1}^{n} \sum_{\tau=1}^{t-1} P_j(\tau)} \qquad (1)$$

where “→” is read “asymptotically approaches”. Harley’s proof begins by introducing t_i = total number of times behaviour B_i is displayed in t trials. It is then argued that, at equilibrium, and for t sufficiently large,

$$f_i(t) \to t_i/t,$$

i.e. f_i(t) tends to the frequency of behaviour i. Thus equation (1) is equivalent to the matching law (Herrnstein, 1970), which says that the relative frequency of a behaviour equals the relative frequency with which it has been rewarded. The fact that matching does not necessarily result in the maximization of reward rate (Herrnstein & Heyman, 1979; Houston & McNamara, 1981; Staddon, Hinson & Kram, 1981) suggests that Harley’s proof cannot be correct. Apart from the two mathematical results

$$f_i(t) \to t_i/t$$

and

$$E[P_i(t)] \to \sum_{\tau=1}^{t-1} P_i(\tau)\,/\,t_i,$$

where E[P_i(t)] is the expected payoff to B_i on trial t, Harley’s proof depends only on the following equation:

$$E[P_i(t)] = E[P_j(t)] \quad \text{for all } i, j \text{ (at equilibrium)}. \qquad (2)$$
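Equation (1) can be made concrete with a short simulation. The sketch below (the function names are my own, not Harley’s) implements the asymptotic form directly: on each trial, the probability of displaying B_i is set to B_i’s share of the cumulative payoff received so far.

```python
import random

def payoff_share(cum_payoffs):
    """Asymptotic form of equation (1): f_i is behaviour i's share of
    the total payoff accumulated over all previous trials."""
    total = sum(cum_payoffs)
    n = len(cum_payoffs)
    if total == 0:
        return [1.0 / n] * n  # no payoffs yet: display uniformly at random
    return [p / total for p in cum_payoffs]

def simulate(payoff_fn, n_behaviours, n_trials, seed=0):
    """Run a learner whose display probabilities follow equation (1).

    payoff_fn(i, t) returns P_i(t), the payoff for displaying B_i on
    trial t; payoffs to undisplayed behaviours are zero (assumption (f)).
    """
    rng = random.Random(seed)
    cum = [0.0] * n_behaviours   # sum over tau of P_i(tau)
    counts = [0] * n_behaviours  # t_i, the number of times B_i was displayed
    for t in range(n_trials):
        f = payoff_share(cum)
        i = rng.choices(range(n_behaviours), weights=f)[0]
        counts[i] += 1
        cum[i] += payoff_fn(i, t)
    return counts, cum
```

No claim is made here about what such a rule converges to in any particular game; the point of the argument above is precisely that matching payoff shares need not coincide with maximizing total payoff.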

Harley takes equation (2) from ESS theory, where it refers to the fitness of all coexisting strategies, and requires them to be equal if there is to be an equilibrium over evolutionary time. It is not valid, however, to apply this condition to the equilibrium between behaviours during an animal’s lifetime. What Harley is saying is that, even in games against a constant environment, the optimal equilibrium occurs when the payoffs from all behaviours are equal, rather than when the total payoff is at its maximum. Sometimes these two conditions coincide, but this is not always the case. When they do not coincide, an optimal rule should maximize fitness; this


will (in games against nature) result in an uninvadable equilibrium over evolutionary time. To put things another way, it is expected that natural selection will equalize the fitness of various phenotypes (Slatkin, 1978), but Harley is claiming that the contributions of various behaviours to a phenotype will also be equal.

Rather than belabour this point in the abstract, consider the following discrete-trial game that does not have equation (1) as its optimal solution. Let there be two behaviours: B_1 = choosing alternative 1, and B_2 = choosing alternative 2. (For simplicity, assume that the choices are made at regular intervals.) After nine choices of alternative 1, a reward is always obtained, and if alternative 2 is then chosen, a reward will be obtained. These are the only ways in which rewards can be obtained. It is obvious that the uninvadable equilibrium strategy is to make nine responses on alternative 1, followed by one on alternative 2. Thus f_2(t) = 0.1 and Σ R_2 / Σ (R_1 + R_2) = 0.5, which contradicts equation (1). In terms of expected payoffs (equations (2) and (3)), E(R_1) = 1/9 ≠ E(R_2) = 1. Of course, in this example the payoff to one behaviour depends on the frequency with which the other is chosen, but Harley does not rule out this possibility: indeed, the variable-interval example that he discusses has this property. (It could be argued that non-rewarded choices of alternative 1 make a contribution to fitness, in that they enable rewards to be obtained from alternative 2. This is true, but it means that the value of a behaviour cannot be determined without finding the optimal behaviour; equation (1) no longer says anything, because the payoffs are not known. Although this raises interesting questions, I think the counterexample is sufficient, because Harley uses equation (1) with the P_i standing just for rewards.)
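The arithmetic of this counterexample is easy to verify. The sketch below is a hypothetical harness (each reward counted as one unit of payoff) that plays the uninvadable nine-then-one cycle and recovers the quantities quoted above:

```python
from fractions import Fraction

def play_cycles(n_cycles):
    """Play the uninvadable strategy: nine responses on alternative 1
    (only the ninth is rewarded), then one rewarded response on alternative 2."""
    choices = []            # sequence of alternatives chosen
    rewards = {1: 0, 2: 0}  # rewards earned on each alternative
    for _ in range(n_cycles):
        for k in range(9):
            choices.append(1)
            if k == 8:      # reward follows the ninth choice of alternative 1
                rewards[1] += 1
        choices.append(2)   # alternative 2 now pays off
        rewards[2] += 1
    f2 = Fraction(choices.count(2), len(choices))           # frequency of B_2
    share2 = Fraction(rewards[2], rewards[1] + rewards[2])  # R_2's payoff share
    e1 = Fraction(rewards[1], choices.count(1))             # E(R_1)
    e2 = Fraction(rewards[2], choices.count(2))             # E(R_2)
    return f2, share2, e1, e2
```

Running this for any number of cycles gives f_2 = 1/10 while R_2’s payoff share is 1/2, contradicting equation (1), and E(R_1) = 1/9 ≠ E(R_2) = 1, contradicting equation (2).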

The Constraint 0 < f_i(t) < 1

It is Harley’s view that the ES learning rule does not “allow fixation or deletion of behaviours” (p. 614). In other words, it is required that 0 < f_i(t) < 1.

it may not be possible to say a lot about the ES learning rule without making particular assumptions.

University of Cambridge,
Department of Zoology,
Downing Street,
Cambridge CB2 3EJ

ALASDAIR HOUSTON

(Received 9 October 1982)

REFERENCES

HARLEY, C. B. (1981). J. theor. Biol. 89, 611.
HERRNSTEIN, R. J. (1970). J. exp. anal. Behav. 13, 243.
HERRNSTEIN, R. J. & HEYMAN, G. M. (1979). J. exp. anal. Behav. 31, 209.
HOUSTON, A. I. & MCNAMARA, J. M. (1981). J. exp. anal. Behav. 35, 367.
MAYNARD SMITH, J. (1974). J. theor. Biol. 47, 209.
MAYNARD SMITH, J. (1978). A. Rev. Ecol. Syst. 9, 31.
SLATKIN, M. (1978). Am. Nat. 112, 845.
STADDON, J. E. R. (1980). In Limits to Action (Staddon, J. E. R., ed.). New York: Academic Press.
STADDON, J. E. R., HINSON, J. M. & KRAM, R. (1981). J. exp. anal. Behav. 35, 397.