Supergames Played by Finite Automata with Finite Costs of Complexity in an Evolutionary Setting


Journal of Economic Theory 68, 266-275 (1996), article no. 0015

Supergames Played by Finite Automata with Finite Costs of Complexity in an Evolutionary Setting*

David J. Cooper
University of Pittsburgh, Pittsburgh, Pennsylvania 15260

Received October 28, 1993; revised October 25, 1994

I examine a model of finite automata playing the repeated prisoner's dilemma in an evolutionary setting, similar to that developed by Binmore and Samuelson. The only alteration made to the model is the use of finite costs of complexity, as opposed to lexicographic costs of complexity. The results of Binmore and Samuelson are not robust to this change. Using finite costs of complexity, a folk theorem result is proved in place of the uniqueness result of Binmore and Samuelson. Journal of Economic Literature Classification Number: C70. © 1996 Academic Press, Inc.

I. Introduction


In this paper, I examine a model of finite automata playing the repeated prisoner's dilemma in an evolutionary setting. This model is quite similar to that developed by Binmore and Samuelson [4]. The model is altered only by changing the way in which costs of complexity enter players' preferences. Instead of lexicographic costs of complexity, players have finite costs of complexity. The main result of Binmore and Samuelson is not robust to this change; instead of getting a uniqueness result, I prove a folk theorem.

A fruitful approach to equilibrium selection in supergames has been to consider play of the infinitely repeated prisoner's dilemma with time-average payoffs by finite automata. Work in this area was pioneered by Rubinstein [8] and Abreu and Rubinstein [1].¹ Their approach allows complexity to play an explicit role in the choice of strategies. Using finite automata, it is shown that the set of equilibrium payoffs is greatly reduced from the entire feasible set as given by the folk theorem.

* I would like to thank Masaki Aoyagi, Abhijit Banerjee, Al Roth, Ariel Rubinstein, Larry Samuelson, Joel Sobel, and seminar participants at the University of Pittsburgh and the Midwest Mathematical Economics Meeting for their helpful discussions. I am responsible for any errors in this text.

¹ See Kalai and Stanford [5], Banks and Sundaram [2], and Ben-Porath [3] for additional approaches to this topic.

0022-0531/96 \$18.00. Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

In Abreu and Rubinstein, the existence of players who choose the finite automata to be used makes for a strange mix of bounded and unbounded rationality. Binmore and Samuelson point out that there need not be any such "metaplayers" to select automata. Rather, they suggest that it is natural to think of automata as being selected by an evolutionary process. Using modified ESS (MESS) as an equilibrium concept, they show that only utilitarian outcomes will result.² Intuitively, suppose a non-utilitarian outcome is reached; types should invade the population which use a "secret handshake" to achieve higher payoffs versus themselves. This result suggests that cooperative outcomes should be the norm.

I too examine a situation in which finite automata play a supergame. For simplicity, analysis is limited to the repeated prisoner's dilemma. Like Binmore and Samuelson, I use players' time-average payoffs to determine their evolutionary fitness, and index complexity by the number of states an automaton possesses. The only way in which I alter their model is that complexity has finite costs, as opposed to lexicographic costs. This change in the model has an enormous effect on the results.

Using NSS (a weaker variant of ESS) as the equilibrium concept, a folk theorem result is derived. Consider any payoff level $\pi$ between the symmetric Pareto efficient payoff and the single-shot Nash equilibrium payoff. Take any neighborhood of $\pi$. As the cost of complexity goes to zero, there exists an NSS with a payoff within this neighborhood. The results with finite costs of complexity contrast strongly with the results for lexicographic costs. For any finite cost of complexity, there are only a finite number of possible NSS outcomes.
As the cost of complexity becomes smaller and smaller, the set of NSS outcomes becomes denser and denser within the interval between the symmetric Pareto efficient payoff and the single-shot Nash equilibrium payoff. In contrast, with lexicographic costs the only stable outcome is the symmetric Pareto efficient payoff.

² A utilitarian outcome is one which maximizes the sum of the two players' payoffs.

To see how these differing results arise, it is useful to start by comparing the definition of MESS with the definition of NSS used in this paper. In selecting between two possible automata, Binmore and Samuelson consider three levels of criteria. Following their notation, let $P(a, b)$ be the payoff for strategy $a$ versus strategy $b$. Let $|a|$ and $|b|$ be the sizes of the automata needed to implement, respectively, strategies $a$ and $b$; the size of an automaton is measured by the number of states used. The first two criteria for some strategy $a$ to be a MESS are standard from the definition of ESS: $P(a, a) > P(b, a)$, OR $P(a, a) = P(b, a)$ and $P(a, b) > P(b, b)$, for any $b$. These criteria can be combined into a single criterion: there exists $\varepsilon^* > 0$ such that, for $0 < \varepsilon < \varepsilon^*$, $(1 - \varepsilon) P(a, a) + \varepsilon P(a, b) > (1 - \varepsilon) P(b, a) + \varepsilon P(b, b)$ for any $b$. Complexity only enters via the third criterion: $P(a, a) = P(b, a)$, $P(a, b) = P(b, b)$, and $|a| \le |b|$.³

³ The weak inequality is needed to prevent trivial non-existence. With a strong inequality, any MESS could be broken by entrants which only differ off the equilibrium path.

In contrast, in the definition of NSS employed below, costs of complexity enter directly into the players' fitnesses. Let $k$ be the cost of each additional state, and let $U(a, b) = P(a, b) - k|a|$. A strategy $a$ is an NSS if it fulfills the following criterion: there exists $\varepsilon^* > 0$ such that, for $0 < \varepsilon < \varepsilon^*$, $(1 - \varepsilon) U(a, a) + \varepsilon U(a, b) \ge (1 - \varepsilon) U(b, a) + \varepsilon U(b, b)$ for any $b$.

To see why putting the cost of complexity directly into a strategy's fitness changes the results, consider an example with an incumbent $a$ and an invader $b$. Suppose $P(a, a) = P(b, a)$, $P(a, b) < P(b, b)$, and $|b| > |a|$. Using Binmore and Samuelson's formulation, $a$ is not a MESS. Even if $b$ makes up an arbitrarily small portion of the population, its higher payoffs versus itself always dominate its additional costs of complexity. Using the NSS formulation, $a$ is an NSS. With an arbitrarily small portion of the population, the invader $b$ cannot gain enough from playing itself to pay for its added complexity. This rules out the use of "secret handshake" strategies which lead to Binmore and Samuelson's main result.

The difference between this paper and Binmore and Samuelson can also be given a dynamic interpretation in which differing definitions are interpreted as capturing the relative magnitudes of costs of complexity versus the size of mutations.

My conclusions should not be taken to mean that Binmore and Samuelson are somehow "wrong." I am agnostic as to whether it is more valid to think of costs of complexity as being finite or lexicographic. One can think of these approaches as capturing two different ways in which costs of complexity might be incurred. Suppose that costs of complexity are incurred only in building the automaton. As one-time costs, they play no role in time-average payoffs. Thus, one would employ lexicographic costs of complexity. On the other hand, suppose that the costs of complexity include a cost for maintaining the states each time the automaton is used. This leads to an infinite stream of costs which do play a role in time-average payoffs. In this case, one would use finite costs of complexity.

Alternatively, the two approaches can be viewed as highlighting two different roles of complexity. Costs of complexity cause unused states to atrophy. They also prevent automata from sprouting new states. These can be thought of as costs of memory and costs of innovation. A treatment with lexicographic costs of complexity emphasizes the role of costs of memory. Unused punishments are eliminated, opening the door for coordination on the efficient outcome. With finite costs of complexity, costs of innovation come to the fore. Automata can no longer expand to employ the sort of secret handshake strategies which lead to efficiency. The number of possible entrants is limited; necessarily, this expands the set of stable strategies.

Rather than arguing that one approach is correct, the point of this paper is that the stability results for models of finite automata playing the infinitely repeated prisoner's dilemma depend sensitively on how costs of complexity are incorporated in the players' fitnesses.

The organization of this paper is as follows. In Section II, I lay out the model and notation to be used. In Section III, I derive the folk theorem result using NSS as an equilibrium concept.

Fig. 1. The prisoner's dilemma.

II. The Model

For simplicity of comparison, I will try to duplicate the notation of Binmore and Samuelson as much as possible. I restrict myself to consideration of the prisoner's dilemma. All of my main ideas are captured by this example.⁴ The general version of the prisoner's dilemma I will be using is shown in Fig. 1. Each player earns $\alpha$ when both play $A$ and $\beta$ when both play $B$; a player who plays $B$ against $A$ earns $\delta$, and the opponent earns $\gamma$. Assume $\delta > \alpha > \beta > \gamma$ and $2\alpha > \gamma + \delta$. The stage game $G$ is defined as consisting of a pair $\{S, \pi\}$, where $S = \{A, B\}$ is the strategy set available to both players and $\pi$ is the players' payoff function. Let $s_i$ be the member of $S$ chosen by player $i$. Note that play of $B$ by both players is the unique Nash equilibrium.

⁴ The results can be generalized in the following manner. Take any game which has at least one symmetric Nash equilibrium. Let the payoff from the worst symmetric Nash equilibrium be $\pi^N$. Let $\pi^*$ be the highest possible payoff some strategy achieves versus itself. Consider any $\pi$ such that $\pi^N \le \pi \le \pi^*$ and any $\varepsilon > 0$. There exists some $k' > 0$ such that, for $k < k'$, there exists an NSS with a payoff within $\varepsilon$ of $\pi$.

The supergame, $G^\infty = \{R, P\}$, is constructed from $G$ in the ordinary manner. The payoff function $P$ is a time-average payoff function. Let $h_t$ be the history up to and including time $t$. Let $r_i(h_t)$ be the member of $S$ chosen by Player $i$ in period $t + 1$ as a function of $h_t$.⁵

$$P_i(r_1, r_2) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \pi(r_1(h_t), r_2(h_t)). \tag{1}$$

As in Binmore and Samuelson, I follow Abreu and Rubinstein in describing the automaton selection game $G^m$. The strategy space for this game consists of the set of finite automata, $\mathcal{A}$, which take on the role of players in $G^\infty$. The finite automaton, or Moore machine, chosen by Player $i$ is denoted by a quadruple $\langle Q_i, q_i^1, \lambda_i, \mu_i \rangle$. $Q_i$ is the finite set of states for the automaton $a_i$. The term $q_i^1$ is the initial state: $q_i^1 \in Q_i$. The third member of the quadruple, $\lambda_i$, is a function which gives which strategy will be played given the current state: $\lambda_i : Q_i \to S$. The final member of the quadruple, $\mu_i$, is the transition function for the machine. This determines the machine's state in the next period, subject to its current state and the strategy choice of its opponent: $\mu_i : Q_i \times S \to Q_i$.

Any two finite automata matched against each other will produce a deterministic sequence of strategy choices and states. Following Abreu and Rubinstein, I refer to the sequence of states as the play and the sequence of actions as the action-play. The complexity of a machine is determined by the number of states it contains; let $|a_i|$ be the number of states contained in the finite automaton $a_i$.

Any finite automaton can be thought of as implementing some member of $R$, the set of supergame strategies. Let $r(a)$ be the supergame strategy implemented by automaton $a$. Let $U_i(a, b)$ be Player $i$'s profit for using automaton $a$ while the other player employs automaton $b$. Let $k$ be the cost of a single state. Players' profits for $G^m$ are determined by the following:

$$U_i(a_i, a_j) = P_i(r(a_i), r(a_j)) - k|a_i|. \tag{2}$$
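The payoff in (1) and the profit in (2) can be computed directly, because two deterministic machines must eventually revisit a pair of states and cycle from there. The following is a minimal sketch under stated assumptions: the class name `MooreMachine`, the numeric stage payoffs, and the two sample machines are illustrative choices, not part of the original model.

```python
from dataclasses import dataclass
from fractions import Fraction

# Stage-game payoffs to the first-listed player. The numeric values are
# illustrative assumptions satisfying delta > alpha > beta > gamma.
ALPHA, BETA, GAMMA, DELTA = Fraction(2), Fraction(1), Fraction(0), Fraction(3)
STAGE_PAYOFF = {("A", "A"): ALPHA, ("B", "B"): BETA,
                ("B", "A"): DELTA, ("A", "B"): GAMMA}


@dataclass(frozen=True)
class MooreMachine:
    """A finite automaton with states 0..n-1; state 0 is the initial state."""
    output: tuple      # output[q] is the action played in state q
    transition: tuple  # transition[q] maps the opponent's action to the next state

    def size(self):
        return len(self.output)


def time_average_payoff(a, b):
    """Time-average payoff to machine a against machine b.

    Play is deterministic, so some pair of states must repeat; only the
    repeating cycle matters for the limit of the average.
    """
    qa, qb = 0, 0
    first_seen = {}  # state pair -> period in which it was first reached
    payoffs = []     # a's stage payoff in each period so far
    while (qa, qb) not in first_seen:
        first_seen[(qa, qb)] = len(payoffs)
        sa, sb = a.output[qa], b.output[qb]
        payoffs.append(STAGE_PAYOFF[(sa, sb)])
        qa, qb = a.transition[qa][sb], b.transition[qb][sa]
    cycle = payoffs[first_seen[(qa, qb)]:]
    return sum(cycle) / len(cycle)


def profit(a, b, k):
    """Profit as in (2): payoff minus k for each state of the machine itself."""
    return time_average_payoff(a, b) - k * a.size()


# Two familiar machines, used here only as examples.
ALWAYS_B = MooreMachine(output=("B",), transition=({"A": 0, "B": 0},))
TIT_FOR_TAT = MooreMachine(output=("A", "B"),
                           transition=({"A": 0, "B": 1}, {"A": 0, "B": 1}))
```

Exact rational arithmetic (`Fraction`) is used so that the equality comparisons needed for stability checks are not disturbed by floating-point rounding.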

For the remainder of the paper, I refer to $P_i(r(a_i), r(a_j))$ as Player $i$'s payoff and $U_i(a_i, a_j)$ as Player $i$'s profit. Definitions for ESS and NSS in the automaton selection game follow in the standard fashion (Maynard Smith [6, 7]).⁶

Definition 1. The automaton $a \in \mathcal{A}$ is an evolutionarily stable strategy (ESS) if for any $b \in \mathcal{A}$

(1) $U(a, a) > U(b, a)$, OR

(2) $U(a, a) = U(b, a)$ and $U(a, b) > U(b, b)$.

Definition 2. The automaton $a \in \mathcal{A}$ is a neutrally stable strategy (NSS) if for any $b \in \mathcal{A}$

(1) $U(a, a) > U(b, a)$, OR

(2) $U(a, a) = U(b, a)$ and $U(a, b) \ge U(b, b)$.

Note that the set of automata which are ESS is a subset of those which are NSS.

⁵ For the game with finite automata, the limit defined in (1) will always exist due to cycling.

⁶ The definitions above can be extended to allow for polymorphous populations; this has no effect upon the results.
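As an illustration of how Definitions 1 and 2 operate once the cost of complexity enters profits directly, the sketch below checks an incumbent against a single challenger. The function names and all numbers are hypothetical. The challenger `b` mirrors the scenario described in the Introduction (equal payoff against the incumbent, a higher payoff against itself, and more states), and `c` is an equal-sized machine that earns the same payoffs as the incumbent.

```python
from fractions import Fraction


def profit(payoff, size, k, x, y):
    """U(x, y) = P(x, y) - k * |x|, with P and sizes given as lookup tables."""
    return payoff[(x, y)] - k * size[x]


def resists_ess(payoff, size, k, a, b):
    """True if incumbent a meets the conditions of Definition 1 against b."""
    def U(x, y):
        return profit(payoff, size, k, x, y)
    if U(a, a) > U(b, a):
        return True
    return U(a, a) == U(b, a) and U(a, b) > U(b, b)


def resists_nss(payoff, size, k, a, b):
    """True if incumbent a meets the conditions of Definition 2 against b."""
    def U(x, y):
        return profit(payoff, size, k, x, y)
    if U(a, a) > U(b, a):
        return True
    return U(a, a) == U(b, a) and U(a, b) >= U(b, b)


# Hypothetical payoffs: P(a,a) = P(b,a), P(a,b) < P(b,b), and |b| > |a|.
payoff = {("a", "a"): Fraction(1), ("b", "a"): Fraction(1),
          ("a", "b"): Fraction(1), ("b", "b"): Fraction(2),
          # c earns exactly what a earns in every match and has the same size.
          ("a", "c"): Fraction(1), ("c", "a"): Fraction(1),
          ("c", "c"): Fraction(1)}
size = {"a": 1, "b": 3, "c": 1}

small_cost = Fraction(1, 100)
```

With any positive `k`, the larger challenger `b` is held off by the first condition alone, because it pays more for its states while earning the same payoff against the incumbent. With `k = 0` that protection disappears. The equal-sized `c` shows the gap between the two definitions: the incumbent satisfies the weak inequality of Definition 2 against it but not the strict inequality of Definition 1.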

III. A Folk Theorem Result

Binmore and Samuelson have shown that if costs of complexity are lexicographic, the only possible stable outcomes are utilitarian. For the prisoner's dilemma, this means both players earn $\alpha$ in a stable outcome. The following theorem shows that this result is radically altered when costs of complexity are finite.

Theorem 1. Consider any $\pi$ such that $\beta \le \pi \le \alpha$ and any $\varepsilon > 0$. There exists some $k' > 0$ such that, for $k < k'$, there exists an NSS with a payoff within $\varepsilon$ of $\pi$.

Proof. See Appendix.

The strategy of the proof is fairly mechanistic. I construct an automaton which has a payoff versus itself of (approximately) $\pi$ and is a best response to itself. Due to finite costs of complexity, only automata with lesser complexity could enter successfully versus this automaton. The final step is to show that the candidate automaton does strictly better versus itself than any less complex automaton.

For example, consider the prisoner's dilemma with $\alpha = 2$, $\beta = 1$, $\gamma = 0$, and $\delta = 3$. Suppose $\pi = 3/2$. This can be supported by the candidate automaton sketched in Fig. 2. The candidate automaton initially plays $B$ for one period. It then alternates between playing $B$ and $A$. If any deviation from this pattern is observed, it returns to its initial state. The candidate automaton is a best response to itself. Since the costs of complexity are finite, only automata with fewer than three states can invade. It is easily determined that no such automaton exists. Note that the candidate automaton does not fulfill the second criterion of Definition 1. I have only shown existence of an NSS, not an ESS.

Fig. 2. Candidate automaton for $\pi = 3/2$.

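Appendix A

Proof of Theorem 1. For any given $\pi$ and $\varepsilon$, let $x_1(\pi, \varepsilon)$ and $x_2(\pi, \varepsilon)$ be the pair of positive integers which solves the problem of minimizing $(x_1 + x_2)$ subject to the restriction given in (3),⁷

$$\left| \frac{x_1 \beta + x_2 \alpha}{x_1 + x_2} - \pi \right| < \varepsilon. \tag{3}$$

Let $v(\pi, \varepsilon)$ be the smallest integer such that (4) holds,

$$v > \left( \frac{x_1 + x_2}{x_2} \right) \left( \frac{\delta - \alpha}{\alpha - \beta} \right). \tag{4}$$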
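The two selection problems above can be solved by direct search. The sketch below is one way to do so under stated assumptions: the function names and the search cap `max_total` are hypothetical, ties between pairs with the same sum are broken by taking the smallest $x_1$ (the text does not specify a rule), and the bound on $v$ is taken in the form obtained by reducing inequality (8), with $x_2$ in the denominator of the first factor.

```python
from fractions import Fraction
from math import floor


def minimal_cycle(pi, eps, alpha, beta, max_total=10_000):
    """Positive integers (x1, x2) with the smallest sum that satisfy (3)."""
    for total in range(2, max_total + 1):
        for x1 in range(1, total):
            x2 = total - x1
            average = Fraction(x1 * beta + x2 * alpha) / total
            if abs(average - pi) < eps:
                return x1, x2
    raise ValueError("no pair found below max_total")


def punishment_length(x1, x2, alpha, beta, delta):
    """Smallest integer v with v > ((x1 + x2) / x2) * (delta - alpha) / (alpha - beta)."""
    bound = Fraction(x1 + x2, x2) * Fraction(delta - alpha) / Fraction(alpha - beta)
    # floor(bound) + 1 is the smallest integer strictly above the bound,
    # including the case where the bound is itself an integer.
    return floor(bound) + 1
```

For instance, with $\alpha = 2$, $\beta = 1$, $\delta = 3$ and a target of $\pi = 3/2$, `minimal_cycle(Fraction(3, 2), Fraction(1, 100), 2, 1)` returns `(1, 1)` and `punishment_length(1, 1, 2, 1, 3)` returns `3`. Integer or `Fraction` inputs keep the arithmetic exact.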

A candidate automaton to support an appropriate NSS payoff $\pi'$ is now constructed. $Q_i$ contains $x_1 + x_2 + v$ states: $Q_i = \{q_i^j\}$, $j = 1, 2, \ldots, x_1 + x_2 + v$. As usual, $q_i^1$ is the initial state. The strategy function $\lambda_i$ and the transition function $\mu_i$ are defined in (5) as

$$\lambda_i(q_i^j) = \begin{cases} B & \text{if } j \le v + x_1 \\ A & \text{if } j > v + x_1 \end{cases} \tag{5a}$$

$$\mu_i(q_i^j, s_{-i}) = \begin{cases} q_i^{j+1} & \text{if } s_{-i} = \lambda_i(q_i^j) \text{ and } j \ne v + x_1 + x_2 \\ q_i^{v+1} & \text{if } s_{-i} = \lambda_i(q_i^j) \text{ and } j = v + x_1 + x_2 \\ q_i^1 & \text{if } s_{-i} \ne \lambda_i(q_i^j). \end{cases} \tag{5b}$$

The candidate automaton starts by playing $B$ for $v$ periods. It then cycles between playing $B$ for $x_1$ periods and playing $A$ for $x_2$ periods. If at any point its opponent does not match its own play, it returns to the initial state.

⁷ Unless there is some ambiguity, the arguments for $x_1(\pi, \varepsilon)$ and $x_2(\pi, \varepsilon)$ will be suppressed.


Any two finite automata playing each other eventually settle into an infinitely repeated cycle.⁸ Since payoffs are determined by time averages, only the infinitely repeated sequence affects the evolutionary fitness of the players. In the limit, the effect of any finite sequence of actions goes to zero. The candidate automaton will settle into a cycle of $x_1$ periods of $(B, B)$ followed by $x_2$ periods of $(A, A)$ when it plays versus itself. By definition of $x_1$ and $x_2$, the payoff for the candidate automaton versus itself must satisfy the conditions of the theorem. What remains to be confirmed is that no other automaton can successfully enter versus the candidate automaton. The following lemma is used to prove this.

Lemma 2. Setting $k = 0$, the candidate automaton is a weak best response to itself.

Proof. In order to be a strict best response to the candidate automaton, an invading automaton must, as part of an infinitely repeated cycle, induce the candidate automaton to play $A$ and then play $B$ in a period in which the candidate plays $A$. By construction of the candidate automaton, any such cycle must include $v + x_1$ periods of $(B, B)$, followed by up to $x_2 - 1$ rounds of $(A, A)$, and completed by one round of $(B, A)$.⁹ The invader's strategy is listed first, and the candidate's strategy is listed second.

I begin by showing that the best such cycle for the invader is to deviate as late as possible. Let $y$ be the number of rounds in which $(A, A)$ is played before a deviation. The following inequality compares the payoff from deviating after $y$ rounds with the payoff from deviating after $y + 1$ rounds of play:

$$\frac{(v + x_1)\beta + y\alpha + \delta}{v + x_1 + y + 1} < \frac{(v + x_1)\beta + (y + 1)\alpha + \delta}{v + x_1 + y + 2}. \tag{6}$$

By doing some algebra, (6) can be simplified to the following:

$$v + x_1 > \frac{\delta - \alpha}{\alpha - \beta}. \tag{7}$$
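The algebra behind this simplification is short enough to sketch. The shorthand $m = v + x_1$ is introduced here for brevity and is not part of the original notation; the last step uses $\alpha > \beta$.

```latex
% Shorthand (not in the original): m = v + x_1.
\begin{align*}
\frac{m\beta + y\alpha + \delta}{m + y + 1}
  &< \frac{m\beta + (y + 1)\alpha + \delta}{m + y + 2} \\
\bigl(m\beta + y\alpha + \delta\bigr)(m + y + 2)
  &< \bigl(m\beta + y\alpha + \delta + \alpha\bigr)(m + y + 1) \\
m\beta + y\alpha + \delta
  &< \alpha\,(m + y + 1) \\
\delta - \alpha
  &< m\,(\alpha - \beta) \\
v + x_1 = m
  &> \frac{\delta - \alpha}{\alpha - \beta}.
\end{align*}
```

The resulting condition does not involve $y$, so once it holds, postponing the deviation by one more round raises the invader's average payoff at every $y$.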

By the definition of $v$, (7) must hold. Given that the best possible infinitely repeated cycle for the invader is to deviate as late as possible, it is sufficient to check that the candidate automaton does better versus itself than an automaton which imitates it for the first $(v + x_1 + x_2 - 1)$ rounds, and then deviates. This is captured by the following inequality:

$$\frac{(v + x_1)\beta + (x_2 - 1)\alpha + \delta}{v + x_1 + x_2} < \frac{x_1 \beta + x_2 \alpha}{x_1 + x_2}. \tag{8}$$

⁸ There are only a finite number of pairs of states which the two automata can reach. Eventually, some pair of states must repeat. Since the play of the game is deterministic, the same sequence of moves must be played as followed the first time this pair of states was reached. Thus, the play will cycle.

⁹ One could add subsequences to the sequence in which there were $x_1$ rounds of $(B, B)$ followed by $x_2$ rounds of $(A, A)$. As such subsequences give the same average payoff as the candidate automaton earns versus itself, these can be deleted without loss of generality.
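For readers who want the intermediate steps, the inequality comparing the latest-possible deviation with the candidate's own cycle can be reduced to a lower bound on $v$ as follows. The shorthands $n = x_1 + x_2$ and $S = x_1 \beta + x_2 \alpha$ are introduced here for brevity and are not part of the original notation; the last step uses $\alpha > \beta$.

```latex
% Shorthands (not in the original): n = x_1 + x_2 and S = x_1\beta + x_2\alpha.
\begin{align*}
\frac{(v + x_1)\beta + (x_2 - 1)\alpha + \delta}{v + n}
  &< \frac{S}{n} \\
n\bigl(v\beta + S + \delta - \alpha\bigr)
  &< (v + n)\,S \\
n\,(\delta - \alpha)
  &< v\,(S - n\beta) = v\,x_2\,(\alpha - \beta) \\
v
  &> \left(\frac{x_1 + x_2}{x_2}\right)\left(\frac{\delta - \alpha}{\alpha - \beta}\right).
\end{align*}
```

Because $(x_1 + x_2)/x_2 \ge 1$, any $v$ satisfying this bound also satisfies $v + x_1 > (\delta - \alpha)/(\alpha - \beta)$.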

By doing some algebra, (8) reduces to the same condition given in (4). By definition of $v$, this must hold. Thus, the candidate automaton must be a weak best response to itself, ignoring any costs of complexity. Q.E.D.

It follows immediately from Lemma 2 that no automaton with more than $v + x_1 + x_2$ states can successfully invade versus the candidate automaton for any $k > 0$. The candidate always does as well as the invader in terms of payoff, and always beats it on costs of complexity.

By Lemma 2, no automaton with fewer than $v + x_1 + x_2$ states can be a strict best response to the candidate automaton. Suppose such an invader was a weak best response to the candidate. By the logic used in proving Lemma 2, such an invader would need to be observationally equivalent to the candidate.¹⁰ This requires at least $v + x_1 + x_2$ states. It follows that any finite automaton with fewer states than the candidate automaton must earn a strictly lower payoff versus the candidate. There must exist some $k$ which is sufficiently small that it also earns a strictly lower profit. Thus, no automaton with fewer than $v + x_1 + x_2$ states can successfully invade versus the candidate automaton for sufficiently small $k$.

It is easily shown that there exist automata with the same number of states as the candidate which achieve the best response payoff versus the candidate automaton. The candidate automaton responds to any move which does not match its own by returning to its initial state. Versus itself, these transitions are never used. Altering these transitions does not alter the payoff of the automaton versus the candidate automaton. These transitions can be considered as off-equilibrium strategies which are never observed, and therefore have no effect upon fitness. Any such automaton must be observationally equivalent to the candidate automaton. Since it has exactly $v + x_1 + x_2$ states, there are no extra states available beyond those used to mimic the candidate automaton. Thus, any such automaton achieves exactly the same payoff versus itself as versus the candidate automaton. By the second criterion in Definition 2, the candidate automaton cannot be invaded successfully by such an automaton. The theorem is now proved. Q.E.D.

¹⁰ In other words, when matched with the candidate automaton, any such automaton produces the same action-play as would be produced by the candidate.

References

1. D. Abreu and A. Rubinstein, The structure of Nash equilibrium in repeated games with finite automata, Econometrica 56 (1988), 1259-1281.
2. J. Banks and R. Sundaram, Repeated games, finite automata, and complexity, Games Econ. Behav. 2 (1990), 97-117.
3. E. Ben-Porath, Repeated games with finite automata, J. Econ. Theory 59 (1993), 17-32.
4. K. Binmore and L. Samuelson, Evolutionary stability in repeated games played by finite automata, J. Econ. Theory 57 (1992), 278-305.
5. E. Kalai and W. Stanford, Finite rationality and interpersonal complexity in repeated games, Econometrica 56 (1988), 397-410.
6. J. Maynard Smith, The theory of games and the evolution of animal conflicts, J. Theor. Biol. 47 (1974), 209-221.
7. J. Maynard Smith, "Evolution and the Theory of Games," Cambridge Univ. Press, Cambridge, UK, 1982.
8. A. Rubinstein, Finite automata play the repeated prisoner's dilemma, J. Econ. Theory 39 (1986), 83-96.