|a|. Using Binmore and Samuelson's formulation, a is not a MESS. Even if b makes up an arbitrarily small portion of the population, its higher payoffs versus itself always dominate its additional costs of complexity. Using the NSS formulation, a is an NSS. With an arbitrarily small portion of the population, the invader b cannot gain enough from playing itself to pay for its added complexity. This rules out the use of "secret handshake" strategies, which lead to Binmore and Samuelson's main result. The difference between this paper and Binmore and Samuelson can also be given a dynamic interpretation, with the differing definitions capturing the relative magnitudes of the costs of complexity versus the size of mutations.

My conclusions should not be taken to mean that Binmore and Samuelson are somehow "wrong." I am agnostic as to whether it is more valid to think of costs of complexity as being finite or lexicographic. One can think of these approaches as capturing two different ways in which costs of complexity might be incurred. Suppose that costs of complexity are incurred only in building the automaton. As one-time costs, they play no role in time-average payoffs. Thus, one would employ lexicographic costs of complexity. On the other hand, suppose that the costs of complexity include a cost for maintaining the states each time the automaton is used. This leads to an infinite stream of costs which do play a role in time-average payoffs. In this case, one would use finite costs of complexity.

Alternatively, the two approaches can be viewed as highlighting two different roles of complexity. Costs of complexity cause unused states to atrophy. They also prevent automata from sprouting new states. These can be thought of as costs of memory and costs of innovation. A treatment with lexicographic costs of complexity emphasizes the role of costs of memory. Unused punishments are eliminated, opening the door for
³ The weak inequality is needed to prevent trivial non-existence. With a strong inequality, any MESS could be broken by entrants which only differ off the equilibrium path.

games with finite complexity costs

Fig. 1. The prisoner's dilemma.
coordination on the efficient outcome. With finite costs of complexity, costs of innovation come to the fore. Automata can no longer expand to employ the sort of secret handshake strategies which lead to efficiency. The number of possible entrants is limited; necessarily, this expands the set of stable strategies. Rather than arguing that one approach is correct, the point of this paper is that the stability results for models of finite automata playing the infinitely repeated prisoners' dilemma depend sensitively on how costs of complexity are incorporated in the players' fitnesses.

The organization of this paper is as follows. In Section II, I lay out the model and notation to be used. In Section III, I derive the folk theorem result using NSS as an equilibrium concept.
II. The Model

For simplicity of comparison, I will try to duplicate the notation of Binmore and Samuelson as much as possible. I restrict myself to consideration of the prisoner's dilemma. All of my main ideas are captured by this example.⁴ The general version of the prisoner's dilemma I will be using is shown in Fig. 1. Assume δ > α > β > γ and 2α > γ + δ. The stage game G is defined as consisting of a pair {S, π}, where S = {A, B} is the strategy set available to both players and π is the players' payoff function. Let s_i be the member of S chosen by player i. Note that play of B by both players is the unique Nash equilibrium.
⁴ The results can be generalized in the following manner. Take any game which has at least one symmetric Nash equilibrium. Let the payoff from the worst symmetric Nash equilibrium be π_N. Let π* be the highest possible payoff some strategy achieves versus itself. Consider any π such that π_N ≤ π ≤ π* and any ε > 0. There exists some k′ > 0 such that, for k < k′, there exists an NSS whose payoff versus itself lies within ε of π.
david j. cooper
The supergame, G∞ = {R, P}, is constructed from G in the ordinary manner. The payoff function P is a time-average payoff function. Let h_t be the history up to and including time t. Let r_i(h_t) be the member of S chosen by Player i in period t+1 as a function of h_t.⁵

P_i(r_1, r_2) = lim_{T→∞} (1/T) Σ_{t=0}^{T−1} π(r_1(h_t), r_2(h_t)).   (1)
As in Binmore and Samuelson, I follow Abreu and Rubinstein in describing the automaton selection game G_m. The strategy space for this game consists of the set of finite automata, A, which take on the role of players in G∞. The finite automaton, or Moore machine, chosen by Player i is denoted by a quadruple (Q_i, q_i^1, λ_i, μ_i). Q_i is the finite set of states for the automaton a_i. The term q_i^1 is the initial state: q_i^1 ∈ Q_i. The third member of the quadruple, λ_i, is a function which gives which strategy will be played given the current state: λ_i : Q_i → S. The final member of the quadruple, μ_i, is the transition function for the machine. This determines the machine's state in the next period, subject to its current state and the strategy choice of its opponent: μ_i : Q_i × S → Q_i. Any two finite automata matched against each other will produce a deterministic sequence of strategy choices and states. Following Abreu and Rubinstein, I refer to the sequence of states as the play and the sequence of actions as the action-play. The complexity of a machine is determined by the number of states it contains; let |a_i| be the number of states contained in the finite automaton a_i. Any finite automaton can be thought of as implementing some member of R, the set of supergame strategies. Let r(a) be the supergame strategy implemented by automaton a. Let U_i(a, b) be Player i's profit for using automaton a while the other player employs automaton b. Let k be the cost of a single state. Players' profits for G_m are determined by the following:

U_i(a_i, a_j) = P_i(r(a_i), r(a_j)) − k|a_i|.   (2)
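The objects defined above lend themselves to a direct computational sketch. The following Python is illustrative only, not from the paper: it encodes a Moore machine, computes the time-average payoff P of equation (1) exactly by detecting the cycle that deterministic play must eventually enter, and nets out the complexity cost k|a_i| as in equation (2). The payoff parameters are the α = 2, β = 1, γ = 0, δ = 3 of the example used later in the paper.

```python
# Illustrative sketch of the automaton selection game G_m: Moore machines
# playing the repeated prisoner's dilemma, exact time averages via cycle
# detection, and profits U = P - k|a|. Parameters from the paper's example.

DELTA, ALPHA, BETA, GAMMA = 3, 2, 1, 0  # temptation, reward, punishment, sucker

def stage_payoff(s1, s2):
    """Stage-game payoff to player 1; A = cooperate, B = defect."""
    table = {("A", "A"): ALPHA, ("B", "B"): BETA,
             ("B", "A"): DELTA, ("A", "B"): GAMMA}
    return table[(s1, s2)]

class Automaton:
    """Moore machine (Q, q1, lam, mu) with states 0..n-1 and initial state 0."""
    def __init__(self, n_states, lam, mu):
        self.n = n_states   # |a|, the complexity of the machine
        self.lam = lam      # lam[q] in {"A", "B"}: action played in state q
        self.mu = mu        # mu[(q, opponent_action)] -> next state

def time_average_payoff(a1, a2):
    """P_1(r(a1), r(a2)): the limit in (1), computed exactly. Play is
    deterministic, so some pair of states must repeat; only the resulting
    cycle matters in the time average."""
    q1, q2, seen, history = 0, 0, {}, []
    while (q1, q2) not in seen:
        seen[(q1, q2)] = len(history)
        s1, s2 = a1.lam[q1], a2.lam[q2]
        history.append(stage_payoff(s1, s2))
        q1, q2 = a1.mu[(q1, s2)], a2.mu[(q2, s1)]
    cycle = history[seen[(q1, q2)]:]          # the infinitely repeated part
    return sum(cycle) / len(cycle)

def profit(a1, a2, k):
    """U_1(a1, a2) = P_1 - k|a1|: payoff net of finite complexity costs."""
    return time_average_payoff(a1, a2) - k * a1.n
```

For instance, the one-state always-defect machine earns the time-average payoff β = 1 against itself, and profit 1 − k.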
For the remainder of the paper, I refer to P_i(r(a_i), r(a_j)) as Player i's payoff and U(a_i, a_j) as Player i's profit. Definitions for ESS and NSS in the automaton selection game follow in the standard fashion (Maynard Smith [6, 7]).⁶

Definition 1. The automaton a ∈ A is an evolutionarily stable strategy (ESS) if for any b ∈ A:

(1) U(a, a) > U(b, a), OR
(2) U(a, a) = U(b, a) and U(a, b) > U(b, b).

Definition 2. The automaton a ∈ A is a neutral stable strategy (NSS) if for any b ∈ A:

(1) U(a, a) > U(b, a), OR
(2) U(a, a) = U(b, a) and U(a, b) ≥ U(b, b).

Note that the set of automata which are ESS is a subset of those which are NSS.

⁵ For the game with finite automata, the limit defined in (1) will always exist due to cycling.
⁶ The definitions above can be extended to allow for polymorphous populations; this has no effect upon the results.
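The two definitions can be restated as predicates. In the sketch below (an illustration, not from the paper), invasion is checked only against a supplied finite list of mutants, whereas the definitions quantify over all of A, so a True result is necessary rather than sufficient; U can be any profit function, such as the one in equation (2).

```python
# Hedged sketch of Definitions 1 and 2 as predicates over a finite mutant list.

def resists_as_ess(U, a, mutants):
    """Definition 1 (ESS): for every b, either U(a,a) > U(b,a), or
    U(a,a) = U(b,a) and U(a,b) > U(b,b)."""
    return all(
        U(a, a) > U(b, a) or (U(a, a) == U(b, a) and U(a, b) > U(b, b))
        for b in mutants)

def resists_as_nss(U, a, mutants):
    """Definition 2 (NSS): the second condition weakens to U(a,b) >= U(b,b)."""
    return all(
        U(a, a) > U(b, a) or (U(a, a) == U(b, a) and U(a, b) >= U(b, b))
        for b in mutants)
```

The inclusion of ESS within NSS is visible directly: the strict inequality in the second condition of Definition 1 implies the weak one in Definition 2. A profit function that is constant across all pairings gives an NSS that is not an ESS.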
III. A Folk Theorem Result

Binmore and Samuelson have shown that if costs of complexity are lexicographic, the only possible stable outcomes are utilitarian. For the prisoners' dilemma, this means both players earn α in a stable outcome. The following theorem shows that this result is radically altered when costs of complexity are finite.

Theorem 1. Consider any π such that β ≤ π ≤ α and any ε > 0. There exists some k′ > 0 such that, for k < k′, there exists an automaton which is an NSS and whose payoff versus itself lies within ε of π.

Proof. See Appendix.
The strategy of the proof is fairly mechanistic. I construct an automaton which has a payoff versus itself of (approximately) π and is a best response to itself. Due to finite costs of complexity, only automata with lesser complexity could enter successfully versus this automaton. The final step is to show that the candidate automaton does strictly better versus itself than any less complex automaton.

For example, consider the prisoner's dilemma with α = 2, β = 1, γ = 0, and δ = 3. Suppose π = 3/2. This can be supported by the candidate automaton sketched in Fig. 2. The candidate automaton initially plays B for three periods. It then alternates between playing B and A. If any deviation from this pattern is observed, it returns to its initial state. The candidate automaton is a best response to itself. Since the costs of complexity are finite, only automata with fewer than five states can invade. It is easily determined that no such automaton exists. Note that the candidate automaton does not fulfill the second criterion of Definition 1. I have only shown existence of an NSS, not an ESS.
Fig. 2. Candidate automaton for π = 3/2.
Appendix A

Proof of Theorem. For any given π and ε, let x₁(π, ε) and x₂(π, ε) be the pair of positive integers which solves the problem of minimizing (x₁ + x₂) subject to the restriction given in (3):⁷

|(x₁β + x₂α)/(x₁ + x₂) − π| < ε.   (3)
Let v(π, ε) be the smallest integer such that (4) holds:

v > ((x₁ + x₂)/x₂) · ((δ − α)/(α − β)).   (4)
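These first two steps of the construction amount to a small integer program. The following sketch is illustrative; the search bound `max_total` is my assumption, not the paper's. It finds the minimizing pair x₁, x₂ for (3) and the smallest integer v satisfying (4).

```python
# Illustrative search for the proof's constants: x1, x2 minimize x1 + x2
# subject to |(x1*beta + x2*alpha)/(x1+x2) - pi| < eps, then v is the
# smallest integer exceeding the bound in (4).

import math

def choose_x(pi, eps, alpha, beta, max_total=200):
    """Minimize x1 + x2 over positive integers subject to (3).
    max_total is an assumed search bound for illustration."""
    for total in range(2, max_total + 1):
        for x1 in range(1, total):
            x2 = total - x1
            if abs((x1 * beta + x2 * alpha) / total - pi) < eps:
                return x1, x2
    raise ValueError("no (x1, x2) within the search bound")

def choose_v(x1, x2, alpha, beta, delta):
    """Smallest integer v with v > ((x1+x2)/x2) * ((delta-alpha)/(alpha-beta))."""
    bound = ((x1 + x2) / x2) * ((delta - alpha) / (alpha - beta))
    return math.floor(bound) + 1
```

With the example parameters α = 2, β = 1, δ = 3 and π = 3/2, the search returns x₁ = x₂ = 1 and v = 3.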
A candidate automaton to support an appropriate NSS payoff π is now constructed. Q_i contains x₁ + x₂ + v states: Q_i = {q_i^j}, j = 1, 2, ..., x₁ + x₂ + v. As usual, q_i^1 is the initial state. The strategy function λ_i and the transition function μ_i are defined in (5):

λ_i(q_i^j) = B  if j ≤ v + x₁
           = A  if j > v + x₁   (5a)

μ_i(q_i^j, s_{−i}) = q_i^{j+1}  if s_{−i} = λ_i(q_i^j) and j ≠ v + x₁ + x₂
                   = q_i^{v+1}  if s_{−i} = λ_i(q_i^j) and j = v + x₁ + x₂
                   = q_i^1      if s_{−i} ≠ λ_i(q_i^j).   (5b)
The candidate automaton starts by playing B for v periods. It then cycles between playing B for x₁ periods and playing A for x₂ periods. If at any point its opponent does not match its own play, it returns to the initial state.

⁷ Unless there is some ambiguity, the arguments of x₁(π, ε) and x₂(π, ε) will be suppressed.
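The construction in (5a) and (5b) can be written out and checked by simulation. The sketch below is illustrative, not the paper's: it builds the candidate automaton with 0-indexed states and verifies that, against itself, it settles into the cycle of x₁ plays of B and x₂ plays of A, giving the time-average payoff (x₁β + x₂α)/(x₁ + x₂); the payoff numbers α = 2, β = 1, γ = 0, δ = 3 follow the paper's example.

```python
# Illustrative implementation of the candidate automaton of (5a)-(5b),
# 0-indexed: states 0..v+x1-1 play B, the rest play A; on a match the state
# advances (wrapping from the last state back to state v), on a mismatch the
# machine resets to the initial state.

def candidate(v, x1, x2):
    n = v + x1 + x2
    lam = {q: ("B" if q + 1 <= v + x1 else "A") for q in range(n)}
    mu = {}
    for q in range(n):
        for s in ("A", "B"):
            if s == lam[q]:
                mu[(q, s)] = v if q == n - 1 else q + 1  # advance / wrap to q^{v+1}
            else:
                mu[(q, s)] = 0                           # reset to q^1
    return lam, mu

def self_play_average(lam, mu):
    """Time-average payoff of the candidate against itself (cycle detection)."""
    payoff = {("A", "A"): 2, ("B", "B"): 1, ("B", "A"): 3, ("A", "B"): 0}
    q1 = q2 = 0
    seen, hist = {}, []
    while (q1, q2) not in seen:
        seen[(q1, q2)] = len(hist)
        s1, s2 = lam[q1], lam[q2]
        hist.append(payoff[(s1, s2)])
        q1, q2 = mu[(q1, s2)], mu[(q2, s1)]
    cycle = hist[seen[(q1, q2)]:]
    return sum(cycle) / len(cycle)
```

For v = 3, x₁ = x₂ = 1, self-play enters the (B, B), (A, A) cycle and averages (β + α)/2 = 3/2, matching the example.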
Any two finite automata playing each other eventually settle into an infinitely repeated cycle.⁸ Since payoffs are determined by time averages, only the infinitely repeated sequence affects the evolutionary fitness of the players. In the limit, the effect of any finite sequence of actions goes to zero. The candidate automaton will settle into a cycle of x₁ periods of (B, B) followed by x₂ periods of (A, A) when it plays versus itself. By definition of x₁ and x₂, the payoff for the candidate automaton versus itself must satisfy the conditions of the theorem. What remains to be confirmed is that no other automaton can successfully enter versus the candidate automaton. The following lemma is used to prove this.

Lemma 2. Setting k = 0, the candidate automaton is a weak best response to itself.

Proof. In order to be a strict best response to the candidate automaton, an invading automaton must, as part of an infinitely repeated cycle, induce the candidate automaton to play A and then play B in a period that the candidate plays A. By construction of the candidate automaton, any such cycle must include v + x₁ periods of (B, B), followed by up to x₂ − 1 rounds of (A, A), and completed by one round of (B, A).⁹ The invader's strategy is listed first, and the candidate's strategy is listed second. I begin by showing that the best such cycle for the invader is to deviate as late as possible. Let y be the number of rounds in which (A, A) is played before a deviation. The following inequality compares the payoff from deviating after y rounds with the payoff from deviating after y + 1 rounds of play:

((v + x₁)β + yα + δ)/(v + x₁ + y + 1) < ((v + x₁)β + (y + 1)α + δ)/(v + x₁ + y + 2).   (6)
By doing some algebra, (6) can be simplified to the following:

v + x₁ > (δ − α)/(α − β).   (7)
By the definition of v, this relationship must hold. Given that the best possible infinitely repeated cycle for the invader is to deviate as late as possible, it is sufficient to check that the candidate
⁸ There are only a finite number of pairs of states which the two automata can reach. Eventually, some pair of states must repeat. Since the play of the game is deterministic, the same sequence of moves must be played as followed the first time this pair of states was reached. Thus, the play will cycle.

⁹ One could add subsequences to the sequence in which there were x₁ rounds of (B, B) followed by x₂ rounds of (A, A). As such subsequences give the same average payoff as the candidate automaton earns versus itself, these can be deleted without loss of generality.
automaton does better versus itself than an automaton which imitates it for the first (v + x₁ + x₂ − 1) rounds, and then deviates. This is captured by the following inequality:

((v + x₁)β + (x₂ − 1)α + δ)/(v + x₁ + x₂) < (x₁β + x₂α)/(x₁ + x₂).   (8)
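Inequalities (6)–(8) can be sanity-checked numerically. The sketch below is an illustration for the example parameters α = 2, β = 1, δ = 3 with x₁ = x₂ = 1 and v = 3; the proof itself of course holds symbolically for all admissible parameters.

```python
# Numeric sanity check of (6)-(8) for the example parameters (an
# illustration only; the lemma is proved symbolically in the text).

ALPHA, BETA, DELTA = 2, 1, 3
x1, x2, v = 1, 1, 3

def deviator_payoff(y):
    """Invader's cycle average when it deviates after y rounds of (A, A):
    v+x1 rounds of (B,B), y rounds of (A,A), one round of (B,A), as in (6)."""
    return ((v + x1) * BETA + y * ALPHA + DELTA) / (v + x1 + y + 1)

# (6)/(7): deviating later is better, because v + x1 > (delta-alpha)/(alpha-beta).
assert v + x1 > (DELTA - ALPHA) / (ALPHA - BETA)
assert all(deviator_payoff(y) < deviator_payoff(y + 1) for y in range(10))

# (8): even the latest deviation (y = x2 - 1) earns less than the candidate
# earns against itself, (x1*beta + x2*alpha)/(x1 + x2).
assert deviator_payoff(x2 - 1) < (x1 * BETA + x2 * ALPHA) / (x1 + x2)
```

Here the latest deviator averages 7/5 per period, strictly below the candidate's self-play average of 3/2, so no deviating cycle is profitable even before complexity costs are counted.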
By doing some algebra, this reduces to the same condition given in (4). By definition of v, this must hold. Thus, the candidate automaton must be a weak best response to itself, ignoring any costs of complexity. Q.E.D.

It follows immediately from Lemma 2 that no automaton with more than v + x₁ + x₂ states can successfully invade the candidate automaton for any k > 0. The candidate always does as well as the invader in terms of payoff, and always beats it on costs of complexity. By Lemma 2, no automaton with fewer than v + x₁ + x₂ states can be a strict best response to the candidate automaton. Suppose such an invader were a weak best response to the candidate. By the logic used in proving Lemma 2, such an invader would need to be observationally equivalent to the candidate.¹⁰ This requires at least v + x₁ + x₂ states. It follows that any finite automaton with fewer states than the candidate automaton must earn a strictly lower payoff versus the candidate. There must exist some k which is sufficiently small that it also earns a strictly lower profit. Thus, no automaton with fewer than v + x₁ + x₂ states can successfully invade the candidate automaton for sufficiently small k.

It is easily shown that there exist automata with the same number of states as the candidate which achieve the best response payoff versus the candidate automaton. The candidate automaton responds to any move which does not match its own by returning to its initial state. Versus itself, these transitions are never used. Altering these transitions does not alter the payoff of the automaton versus the candidate automaton. These transitions can be considered as off-equilibrium strategies which are never observed, and therefore have no effect upon fitness. Any such automaton must be observationally equivalent to the candidate automaton. Since it has exactly v + x₁ + x₂ states, there are no extra states available beyond those used to mimic the candidate automaton. Thus, any such automaton achieves exactly the same payoff versus itself as versus the candidate automaton. By the second criterion in Definition 2, the candidate automaton cannot be invaded successfully by such an automaton. The theorem is now proved. Q.E.D.
¹⁰ In other words, when matched with the candidate automaton, any such automaton produces the same action-play as would be produced by the candidate.
References

1. D. Abreu and A. Rubinstein, The structure of Nash equilibrium in repeated games with finite automata, Econometrica 56 (1988), 1259–1281.
2. J. Banks and R. Sundaram, Repeated games, finite automata, and complexity, Games Econ. Behav. 2 (1990), 97–117.
3. E. Ben-Porath, Repeated games with finite automata, J. Econ. Theory 59 (1993), 17–32.
4. K. Binmore and L. Samuelson, Evolutionary stability in repeated games played by finite automata, J. Econ. Theory 57 (1992), 278–305.
5. E. Kalai and W. Stanford, Finite rationality and interpersonal complexity in repeated games, Econometrica 56 (1988), 397–410.
6. J. Maynard Smith, The theory of games and the evolution of animal conflicts, J. Theor. Biol. 47 (1974), 209–221.
7. J. Maynard Smith, "Evolution and the Theory of Games," Cambridge Univ. Press, Cambridge, UK, 1982.
8. A. Rubinstein, Finite automata play the repeated prisoner's dilemma, J. Econ. Theory 39 (1986), 83–96.