Imitation dynamics in the repeated Prisoners’ Dilemma: an exploratory example

Journal of Economic Behavior & Organization Vol. 40 (1999) 81–104

Christopher S. Ruebeck
Department of Economics, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
Received 16 June 1997; received in revised form 12 January 1999; accepted 27 January 1999

Abstract

This paper investigates a deterministic evolutionary process governing the adoption of strategies for playing the Repeated Prisoners’ Dilemma (RPD). Agents playing unsuccessful strategies attempt to imitate the strategies of successful agents. Because agents’ strategies are unobservable, they must be inferred from a memory of pairwise play and a knowledge of the strategy space. As a result, winning strategies can be confused with other, inferior strategies, and this imperfect imitation can enhance the growth (or slow the decline) of under-performing strategies. In contrast to results obtained under payoff-monotonic dynamics such as the replicator dynamic or an analysis of Neutrally Stable Strategies (NSS), cooperation is eliminated in the long run; agents’ inability to observe the strategies of successful players can fundamentally change the evolutionary dynamics. ©1999 Elsevier Science B.V. All rights reserved.

JEL classification: C72

Keywords: Repeated game; Replicator dynamic; Observable strategy; Limit of means

1. Introduction

The large literature on the evolution of cooperation in populations playing the Repeated Prisoners’ Dilemma (RPD) 1 pays little attention to the transmission or observability of strategies, other than specifying that transmission is true and (in most models) proportional to success. What is missing is the specification of how agents decide what strategies to use and upon what knowledge that decision is based. Often, the dynamics are left unspecified;

∗ E-mail address: [email protected] (C.S. Ruebeck)
1 Early papers are Maynard Smith (1974), Axelrod and Hamilton (1981), and Axelrod (1984). For a review of this literature, see Dion and Axelrod (1988).

0167-2681/99/$ – see front matter ©1999 Elsevier Science B.V. All rights reserved. PII: S0167-2681(99)00043-8


even when the dynamics of a system’s evolution are explicitly modeled, as in Weibull (1995), little attention is paid to agents’ ability to observe strategies. Where uncertainty is modeled, the focus has been on noisy observations of agents’ payoffs rather than their inability to observe others’ strategies. 2 The ability of agents to observe and imitate each other would seem equally important in determining the evolutionary success of a strategy. In other words, the extent to which a strategy’s performance translates into its evolutionary success may depend very much on the specification of how agents decide to adopt strategies. 3 In this paper, an imitation dynamic is developed that breaks the direct link between fitness and payoff by assuming that strategies are not directly observable. Agents use a publicly available knowledge of payoff rankings and their own private memories of interaction with other agents. For the RPD, the resulting global attractor is ‘always-defect:’ all agents adopt a strategy of unconditional defection. If, instead, the conventional assumption of observable strategies is made, both ‘tit-for-tat’ (cooperate and then imitate your opponent) and always-defect are steady states, but only tit-for-tat – the unique Neutrally Stable Strategy (NSS) 4 – is locally stable. We see that agents’ inability to directly observe successful strategies can fundamentally change the evolutionary dynamics and the characteristics of the resulting steady state. When strategies are unobservable and some agents imperfectly recall the history of play, tit-for-tat’s weakness is that other strategies confuse it with ‘suspicious’ tit-for-tat (defect and then imitate your opponent), creating a dynamic that is not payoff-monotonic: the growth of an underperforming strategy’s use may be faster (or its decline slower) than that of a more successful strategy. This feature of the model illustrates that the conventional practice of equating performance and success may not be innocuous. It is not surprising that there has been little study of imitation dynamics that are not payoff-monotonic. It is only natural to expect that more successful strategies grow faster than less successful ones, an expectation reinforced by the large variety of underlying imitation and best response dynamics that have been shown to be payoff-monotonic. 5 Yet there may be equally important (and perhaps more relevant) classes of non-payoff-monotonic dynamics. Payoff-monotonicity does not always hold in the model presented here because strategies are unobservable, and this is enough to derail the evolution of cooperation.

2. The model and its state equations

Consider a large population of agents characterized by the strategy they use for playing the RPD.

2 For example, Molander (1985) and Dixit (1987). An example closely related in theme to the model presented here is Cooper (1996), in which strategies are not observed and the strategy space consists of two-state stochastic Moore machines.
3 This is in direct contrast to the accepted approach in the literature, as explicitly illustrated by Bendor and Swistak (1995), to define fitness as a strategy’s payoff summed over all pairwise matchings in the population. Harrington (1999), an exception, investigates an endogenous link between fitness and payoff.
4 For a definition of NSS, see footnote 11.
5 See Chapter 4 of Weibull (1995) and the references included there.


Fig. 1. Payoffs to the Prisoners’ Dilemma, used in the exposition of the paper (left), and in general (right). The symmetric Prisoners’ Dilemma is defined by the above matrix where Z > X > W > Y and X > (Y + Z)/2. The last inequality is important in the RPD because it requires that joint cooperation be superior to alternating between cooperation and defection out-of-synch with one’s partner.

Table 1
Strategies available to agents

Label  Strategy
C      Always cooperate
D      Always defect
CT     Start with cooperate, then mimic opponent’s previous move
DT     Start with defect, then mimic opponent’s previous move

Each period, the agents receive the payoff from using their strategy against all agents in the population (a ‘round-robin’), where each encounter with another player is an infinitely repeated Prisoners’ Dilemma. 6 The payoffs to the one-shot Prisoners’ Dilemma are defined in Fig. 1, both in general and in the particular form used in the exposition of the paper. The results are unchanged when using the general form, but use of the particular game here reduces cumbersome notation. 7 The payoffs to an encounter between two players are the limit of means (LOM) payoffs to infinitely repeated play. LOM payoffs are an approximation to averaging over a large number of plays, as follows: if the one-shot payoff in period t to an agent using strategy i against strategy j is V_t(i, j), then the LOM payoff from using strategy i when playing against strategy j is

lim_{T→∞} (1/T) Σ_{t=1}^{T} V_t(i, j).

Strategies i and j are rules which players follow. For the sake of tractability, agents’ choices are limited to the strategy space {C, D, CT, DT}, as defined in Table 1. CT is ‘tit-for-tat,’ while DT is commonly referred to as ‘suspicious tit-for-tat’ because it is a ‘nasty’ strategy (it defects on the first move). Although a serious limitation, the consideration of only these four strategies is sufficient to establish that the assumption of unobservable strategies significantly impacts a dynamic analysis.

6 This is meant to approximate a large number of players interacting a large number of times, and is used to simplify the analysis. Since payoffs only change slightly when players interact for tens of rounds instead of an infinite number of rounds, the payoffs in Fig. 2 will not change significantly with a finite number of interactions. Under such a modification, players can still be unsure of winners’ strategies.
7 An exposition of the results using the general form is available from the author.
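Since Fig. 2 itself does not survive in this text, the LOM payoffs it tabulates can be recovered numerically. Below is a minimal Python sketch; the one-shot payoffs (X = 1, Z = 2, W = 0, Y = −1) are an assumption chosen to be consistent with the Table 2 formulas rather than a reproduction of the paper’s Fig. 1 values, and, as footnote 6 suggests, a few thousand rounds approximate the infinite-horizon average well.

```python
# One-shot payoffs to the row player, keyed by (own move, opponent's move).
# X = 1, Z = 2, W = 0, Y = -1 is an assumed parameterization satisfying
# Z > X > W > Y and X > (Y + Z)/2; the paper's Fig. 1 values are not visible here.
ONE_SHOT = {('C', 'C'): 1, ('C', 'D'): -1, ('D', 'C'): 2, ('D', 'D'): 0}

# Each strategy maps the opponent's previous move (None on round 1) to a move.
STRATEGIES = {
    'C':  lambda prev: 'C',                            # always cooperate
    'D':  lambda prev: 'D',                            # always defect
    'CT': lambda prev: 'C' if prev is None else prev,  # tit-for-tat
    'DT': lambda prev: 'D' if prev is None else prev,  # suspicious tit-for-tat
}

def lom_payoff(i, j, rounds=1000):
    """Approximate the limit-of-means payoff to strategy i against j."""
    prev_i = prev_j = None
    total = 0
    for _ in range(rounds):
        move_i = STRATEGIES[i](prev_j)   # i reacts to j's previous move
        move_j = STRATEGIES[j](prev_i)
        total += ONE_SHOT[(move_i, move_j)]
        prev_i, prev_j = move_i, move_j
    return total / rounds

# E.g. CT vs. DT averages about 0.5: out-of-synch alternation earns (Y + Z)/2.
for i in STRATEGIES:
    print(i, {j: round(lom_payoff(i, j), 2) for j in STRATEGIES})
```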


Fig. 2. The (LOM) payoffs to the infinitely repeated Prisoners’ Dilemma.

Table 2
Payoffs to each strategy after round-robin LOM (‘limit of means’) game

Strategy  Fraction of population  Payoff against the population
C         a                       a − b + c + d
D         b                       2a
CT        c                       a + c + d/2
DT        d                       a + c/2

Given the payoffs to the one-shot game defined in Fig. 1 and the strategies in Table 1, Fig. 2 describes the payoffs from an encounter between two agents. An agent’s encounters with every member of the population then lead to the average payoffs listed in Table 2 when the period’s population consists of fractions a, b, c, and d of strategies C, D, CT, and DT, respectively. A continuum of agents is assumed; necessarily, a + b + c + d = 1. The identities of the players with the highest average payoff are public information. Players do not know or do not use the information contained in either Fig. 2 or Table 2; they know and use the identities of the winners, a memory of play against others, and the list of possible strategies (Table 1). For any strategy pair in a given encounter, the players’ actions will eventually enter a cycle; the play during (one of) these cycles will be called ‘cycle play.’ Two types of agents exist: ‘forgetful’ agents whose only memory of their interaction with others is cycle play, 8 and ‘non-forgetful’ players who remember the entire history of play with others. The results derived here only require that the fraction of agents who are forgetful is positive. The result of being forgetful is that an agent using C or D cannot differentiate between an opponent using CT and one using DT: all she remembers is that her opponent mimicked her own behavior.

8 This assumes that players suffer from imperfect recall, a legitimate concern which is recently explored by Piccione and Rubinstein (1995). This assumption also resembles assumptions and results of Rubinstein (1986) and Binmore and Samuelson (1992). Their notions of equilibrium and use of LOM payoffs to determine fitness create the result that, in equilibrium, machines contain no states that are used a finite number of times. This is a direct consequence of introducing a small cost to (or lexicographic preference against) increasing the number of states used by an agent. The authors recognize the starkness of their assumptions, and call attention to the observation that, in practice, people tend to forget those parts of a rule which they do not often use. The assumption made here is instead that (some) players forget the infrequently observed actions of others.
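For later reference, here is the last column of Table 2 as a function of the population state; a minimal helper reused in the sketches below, restating the table rather than adding model content.

```python
def payoffs(a, b, c, d):
    """Average round-robin LOM payoff of each strategy (Table 2's last column)."""
    return {'C': a - b + c + d,
            'D': 2 * a,
            'CT': a + c + d / 2,
            'DT': a + c / 2}
```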


Table 3
Realized play between a forgetful agent and all possible opponents, and the resulting equivalence classes: ‘cooperate’ means that the cycle play of the opponent consists solely of cooperation, ‘defect’ means the opponent only defects during cycle play, and ‘cycle’ means that during cycle play each player alternates playing cooperate and defect out of synch with their opponent. The payoff to this last play conforms to the standard RPD definition mentioned in Fig. 1: it is less than always cooperating.

Realized ‘cycle play’ against each opponent’s strategy:

Agent’s strategy  C          D       CT         DT         Equivalence classes
C                 Cooperate  Defect  Cooperate  Cooperate  {C, CT, DT}, {D}
D                 Cooperate  Defect  Defect     Defect     {C}, {D, CT, DT}
CT                Cooperate  Defect  Cooperate  Cycle      {C, CT}, {D}, {DT}
DT                Cooperate  Defect  Cycle      Defect     {C}, {D, DT}, {CT}
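These equivalence classes can be generated mechanically from remembered play. A sketch, reusing STRATEGIES from the earlier LOM example (the two-round cutoff standing in for ‘cycle play’ is my own shortcut; for these four strategies every pairing settles into its cycle by round 3):

```python
def transcript(i, j, rounds=50):
    """The sequence of j's moves as observed by an agent using strategy i."""
    prev_i = prev_j = None
    seen = []
    for _ in range(rounds):
        move_i = STRATEGIES[i](prev_j)
        move_j = STRATEGIES[j](prev_i)
        seen.append(move_j)
        prev_i, prev_j = move_i, move_j
    return seen

def equivalence_class(i, j, forgetful=True, rounds=50):
    """Opponent strategies an agent using i cannot rule out after facing j."""
    def memory(k):
        t = transcript(i, k, rounds)
        return t[2:] if forgetful else t   # forgetful agents recall only cycle play
    return {k for k in STRATEGIES if memory(k) == memory(j)}

print(equivalence_class('C', 'CT'))   # {'C', 'CT', 'DT'}: first row of Table 3
print(equivalence_class('DT', 'D'))   # {'D', 'DT'}: last row of Table 3
```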

Note that, regardless of whether or not he is forgetful, an agent using C or CT cannot differentiate ex post between being matched with an agent using C and one using CT; likewise, any agent using D or DT cannot differentiate between opponents using D and DT. These facts are summarized in Table 3, which lists the realized play between a forgetful agent and any opponent, followed by the resulting ‘equivalence classes’ that can be inferred. For example, consider an agent (call him i) using the strategy C, playing against an opponent using CT (call her j); this involves the first row of realized play in Table 3. Agent i does not observe agent j’s strategy, only her play against him: she always cooperated. Forgetful or not, agent i is unsure whether j used C or CT. And because agent i is forgetful, he can only remember cycle play, which means that agent j may have defected on the first round because her strategy was DT. So i knows only that his opponent’s strategy is in the set {C, CT, DT}.

Call the agents with the highest LOM payoff against the population in the most recent period the ‘winners.’ Before the next period, a fraction (1 − α) of the agents that did not receive the highest payoff (the ‘non-winners’) attempt to emulate the winners’ previous strategies. Agents use the public knowledge of the winners’ identities and their private knowledge of play with the winners, but they do not have a knowledge of the play that occurred between other players. They also do not use knowledge of the payoff structure, for example that represented in the third column of Table 2, to algebraically ‘back out’ hypotheses of the winners’ strategies. When the available public and private knowledge is enough to identify the winners’ strategy, the player adopts it; if that information is not enough to identify the winners’ strategy, the player randomizes over the possible alternatives. (This key step is item 2(a(iB)) below.) Formally, the population’s strategies change as follows.

1. Agents with the highest LOM payoff (the winners) retain their current strategies, and a fraction α ∈ (0, 1) of the players who did not have the highest LOM payoff also retain their current strategies.
2. A fraction (1 − α) of the agents who did not have the highest LOM payoff do change their strategies. All agents know who the winners are; a fraction (1 − ε) are not forgetful: they can remember all play in which they were involved; the remaining fraction ε > 0


of the agents are forgetful: they can only remember cycle play. The central assumption of this paper is that agents (in particular those that are adopting a new strategy) cannot directly observe the strategy/ies of the winners.
(a) Consider first the adoption decision of a forgetful player.
(i) Suppose that the player considering a change in strategy did not observe winners using strategies from more than one equivalence class.
(A) If an agent using strategy i can uniquely identify the winners’ strategy j, then she adopts it. This happens either when her equivalence class containing j is a singleton, or when her equivalence class containing j is {i, j}. In the second case, the agent knows that she didn’t win with strategy i, so it can be eliminated from the set {i, j} of possible winning strategies, leaving only j.
(B) If the agent cannot uniquely identify the winners’ strategy, then he randomly chooses between the strategies in the equivalence class containing the winners’ strategy, after eliminating his own (unsuccessful) strategy. With a continuum of agents, the result is an equal allocation over those strategies. (The results are robust to bias which creates an unequal allocation among strategies, as long as all strategies are represented.) For example, again using the first row of Table 3, if a forgetful agent playing C (always cooperate) was not a winner and sees that winners always cooperated with him (in terms of cycle play), he cannot tell whether the winners are playing C, CT, or DT. Knowing that his own strategy did not make him a winner, the agent thus eliminates C, and chooses between CT and DT with equal likelihood.
(ii) If any players considering a change observed some winners using a strategy from one equivalence class, and other winners using strategies from another equivalence class, then one or more strategies must have tied as winners. The description here is similar to that above but is found in Appendix A because it adds little to the character of the dynamics.
(b) Now consider non-forgetful agents. Due to the restricted strategy space employed here, all non-forgetful agents that are changing strategy can always uniquely identify the winners’ strategies; they fall into category 2(a(iA)). (If there are ties, as in 2(a(ii)), these non-forgetful agents adopt strategies according to their proportion among the winners; see Appendix A.) Thus, only forgetful agents are unsure of winners’ strategies, but only when a winner is using CT or DT and the forgetful agent is using C or D.

Returning to the example considered in 2(a(iB)) of an agent playing C when CT is the best-performing strategy: of the 1 − α agents that change strategy, 1 − ε of those using C are not forgetful and so are able to correctly choose CT, while half of the remaining ε agents choose CT correctly by chance, leaving ε/2 agents that adopt DT incorrectly by chance. (The α agents who did not consider changing their strategies continue using C.) This, by the way, is the derivation of the terms that include the state variable a in Case 3 of the state equations below.

To derive the equations of motion describing a population’s evolution, we must first identify the winning strategy/ies for every possible population. As defined in Table 2, the


state variables a, b, c, and d refer to the fractions of the population using strategies C, D, CT, and DT, respectively. Let V̄(X) denote the average round-robin payoff to strategy X, suppressing its dependence on a, b, c, and d. From Table 2,

if b < d/2, then V̄(C) > V̄(CT);
if a < c + d/2, then V̄(CT) > V̄(D);
if a + b < c + d, then V̄(C) > V̄(D);

and V̄(DT) < V̄(CT) when c > 0 or d > 0. When all strategies are represented initially, these inequalities partition the state space into seven cases. These cases are listed below according to the winning strategy or strategies, followed by the inequalities (and/or equalities) that must hold for them to be winners. In Cases 1–3, there is a unique winner. Cases 4–7 (listed in Appendix A) consider the two-way ties and three-way tie referred to in 2(a(ii)) above; these are relegated to the appendix due to their non-generic nature when the population is modeled as a continuum. For each case, a table is provided which lists the strategies that will be adopted in the following period by the fraction (1 − α) of the agents in the population using a non-winning strategy in the current period. Next to each table are the resulting state equations, where the state variables are defined to be the population fractions using each strategy (a, b, c, and d). A prime denotes the next period’s state. Recall that all players with a winning strategy do not change their strategy.

Case 1. C wins when V̄(C) > V̄(CT) and V̄(C) > V̄(D): b < d/2 and a + b < c + d.

Current strategy  Agents adopt
D                 C
CT                C
DT                C

State equations:
a′ = a + (1 − α)(b + c + d)
b′ = αb
c′ = αc
d′ = αd

Case 2. D wins when V̄(D) > V̄(C) and V̄(D) > V̄(CT): a + b > c + d and a > c + d/2.

Current strategy  Agents adopt
C                 D
CT                D
DT                D

State equations:
a′ = αa
b′ = (1 − α)(a + c + d) + b
c′ = αc
d′ = αd

Case 3. CT wins when V̄(CT) > V̄(C) and V̄(CT) > V̄(D): b > d/2 and a < c + d/2.

Current strategy  Agents adopt
C                 CT, DT
D                 CT, DT
DT                CT

State equations:
a′ = αa
b′ = αb
c′ = (1 − ε/2)(1 − α)(a + b) + c + (1 − α)d
d′ = (ε/2)(1 − α)(a + b) + αd
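The three generic cases translate directly into a one-period update. A minimal sketch of the state equations above (the non-generic ties, Cases 4–7, are deliberately left unimplemented here):

```python
def step(a, b, c, d, alpha, eps):
    """One period of the imitation dynamic for the generic Cases 1-3."""
    if b < d / 2 and a + b < c + d:          # Case 1: C wins
        return a + (1 - alpha) * (b + c + d), alpha * b, alpha * c, alpha * d
    if a + b > c + d and a > c + d / 2:      # Case 2: D wins
        return alpha * a, b + (1 - alpha) * (a + c + d), alpha * c, alpha * d
    if b > d / 2 and a < c + d / 2:          # Case 3: CT wins
        switchers = (1 - alpha) * (a + b)    # changing ex-C and ex-D agents
        return (alpha * a, alpha * b,
                (1 - eps / 2) * switchers + c + (1 - alpha) * d,
                (eps / 2) * switchers + alpha * d)
    raise ValueError("tied winners: see Cases 4-7 in Appendix A")
```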

In Cases 1 and 2, all agents can identify the winning strategy, and so all agents that are considering a change in strategy adopt the best-performing strategy. Thus the non-winning


strategies decrease at rate α as long as the population continues to satisfy the inequalities for either of these cases. This decrease in the losing strategies’ proportion of the population goes directly to increasing the representation of the winning strategy. In Case 3, on the other hand, some agents mistakenly adopt DT, as reflected by the fact that in the state equations, d (the proportion of agents using DT) is not decreasing at rate α, but at a slower rate—it may even be increasing, as in Example 3 below. These are the forgetful agents who had been using either C or D and can’t differentiate between CT and DT, as described in item 2(a(iB)) above. The four remaining cases, in which there are ties, are not important to the proof of the theorem in Section 3 and are briefly discussed in Appendix A.

3. The evolution of the system

The initial proportions of the population using C, D, CT, and DT are denoted a0, b0, c0, and d0. Likewise, an, bn, cn, and dn refer to population sizes in period n. If all strategies are represented initially, then the population evolves to homogeneous unconditional defection, all agents adopting D.

Theorem 1. If a0, b0, c0, d0, ε > 0, then lim_{n→∞} bn = 1.

The proof, found in Appendix A, addresses each case in turn. First, when agents using C receive the highest payoff (Case 1), the population eventually contains so many of these agents that unconditional defection, D, receives a higher payoff, entering Case 2. Once in Case 2, agents using D continue to win and the population evolves towards one in which all agents defect unconditionally. If the population begins in Case 3 (agents using CT receive the highest payoff), then the use of D decreases faster than the use of DT (in some instances the use of DT will grow). While agents using C are not hindered by the suspicious first-period defection of DT, agents using CT get caught in a cycle of defection and punishment with DT. 9 Thus the population eventually enters Case 1 when the payoff to C exceeds the payoff to CT. The remaining cases (4–7) are those in which there are ties. They simply move to one of the above cases at most three periods later when all ties are broken.

Example 1. To illustrate the evolution of a population toward all agents using D, Fig. 3 depicts the dynamics of the system when strategies initially have equal representation: a0 = b0 = c0 = d0 = 1/4. Half the agents are forgetful (ε = 0.5), and 20 percent of the population of non-winners consider changing their strategy each period (α = 0.80). From the inequalities at the end of Section 2 derived from Table 2, we can see that the strategy initially with the highest payoff is CT (Case 3) because b0 > d0/2 and a0 < c0 + d0/2. C and D are tied initially (a0 + b0 = c0 + d0), but this is of no consequence. DT ranks last because a0 > c0/2, yet the size of the population using DT decreases more slowly than the subpopulations using C and D. Twenty percent of the agents using C, D, and DT are changing strategies, but one quarter (ε/2 = 0.25) of these using C and D incorrectly

9 The consideration by other authors of noisy payoffs also highlights this detrimental effect on CT’s interaction with strategies similar to itself. See references in footnote 2. But note that adding noisy payoffs to the specification of this model would not help agents using C or D to differentiate between others using CT or DT; it would only affect the payoffs to using CT or DT against CT or DT.


Fig. 3. Evolution of a population with a0 = b0 = c0 = d0 = 0.25, ε = 0.5, α = 0.8.

guess that the winners’ strategy is DT instead of CT. During the first phase of evolution (n = 0 to 8), it pays to cooperate conditionally (as a NSS analysis would indicate): the use of CT becomes more prevalent. But the use of DT does not decrease as fast as the use of D, creating a dynamic that is not payoff-monotonic (to be defined formally below). Thus, in contrast with conventional analyses, eventually (when n = 8) C is no longer dominated by CT because the relative presence of DT to D has grown enough to destabilize CT’s superiority over C. More specifically, the loss to C from being exploited by D is less than the loss to CT from its inability to coordinate with DT. After one period (n = 8) of a tie between C and CT, the system enters its second phase of evolution (n = 9 to 11). From this point on, the evolution is similar to that under a payoff-monotonic dynamic: no strategy’s decline is slower than another that receives a higher payoff. There is no longer imperfect imitation because all players can identify the winners’ strategies, a sufficient, but not necessary, condition for short-term payoff-monotonicity under imitation. Continuing the description of the dynamics, when C is ranked highest the growth in its use sows the seeds for its own demise (as predicted by NSS theory) because agents using D take advantage of the growing adoption of unconditional cooperation. Starting with n = 12, although the payoffs to all agents are decreasing, D is ranked highest. Its exploitation of C is never outweighed by the advantages to CT’s conditional cooperation because the populations using C, CT, and DT are all decreasing at the same rate. ■

As we have seen, under this specification of an imitation process with unobserved strategies, a population in which all strategies are initially represented evolves toward one in which there is uniform unconditional defection. 10 The results obtained by use of the more standard approach, evolutionary game theory, differ in two ways: D (uniform defection) is evolutionarily unstable and cooperation is expected to prevail because CT is a NSS. 11

10 Complete results, which include the evolution from other initial populations (a0, b0, c0, and/or d0 = 0), are described following the proof in Appendix A.
11 A strategy i is a NSS iff ∀j ≠ i, either (a) V(i, i) > V(j, i), or (b) V(i, i) = V(j, i) and V(i, j) ≥ V(j, j) hold. The stability concept more commonly employed in evolutionary game theory, an Evolutionarily Stable Strategy (ESS), is similar to a NSS; the only difference is that the final inequality is strict. Maynard Smith (1982) originally proposed the definition of a NSS, calling such a strategy ‘evolutionarily stable.’
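Example 1’s trajectory can be reproduced by iterating the `step` sketch from Section 2. The exact tie b = d/2 that the text reports at n = 8 (Case 4) is non-generic, so this sketch sidesteps it by perturbing b0 slightly (my device, not the paper’s); the CT, then C, then D phase sequence is unaffected.

```python
state = (0.25, 0.25 + 1e-9, 0.25, 0.25)   # Example 1's a0, b0, c0, d0, perturbed
for n in range(60):
    state = step(*state, alpha=0.8, eps=0.5)
print([round(x, 4) for x in state])        # b has risen toward 1, as in Fig. 3
```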


It is straightforward to show (and is well-known) that the unique NSS for this model is CT (tit-for-tat), and that none of the strategies in the space {C, D, CT, DT} are Evolutionarily Stable Strategies (ESS). 12 These static solution concepts have been shown to be related to payoff-monotonic dynamics, a class of dynamics which includes the replicator dynamic. Payoff-monotonic dynamics associate high growth rates with high payoffs. If for every possible state of the system x_n and strategies i, j,

V(x_n^i) > V(x_n^j) ⟺ g(x_n^i) > g(x_n^j), where g(x_n^i) = (x_{n+1}^i − x_n^i)/x_n^i,

then the dynamic is payoff-monotonic. It is proved by Weibull that suitable definitions of stability are implied by ESS and NSS if the evolution of the system is determined by the replicator dynamic, although the converse is not true. He also shows that for continuous versions of the replicator dynamic and payoff-monotonic definitions, the set of stationary states (those states x which satisfy x_n = x for all n > 0 when x_0 = x) are identical for all payoff-monotonic dynamics. 13 It is appropriate, then, to consider examples of the instability of CT and the stability of D when strategies are unobserved to highlight the features of this model’s dynamics that differ from the intuition of a NSS analysis or the evolution under payoff-monotonic dynamics.

The intuition behind the static ESS/NSS characterizations is typically described in terms of the invasion of a population by ‘mutants,’ agents who have adopted strategies which were not previously represented in the population. If a strategy (or mix of strategies) repels the invasion by driving the mutants to extinction, then the conditions for an ESS are said to have been satisfied. If instead the population share of the mutants does not grow, we have a NSS. 14 Example 2 illustrates the dynamics that destroy the dominance of CT; Example 3 illustrates the short-run effects of a ‘mutant invasion.’

Example 2. Consider the evolution from a state in the neighborhood of all agents using CT. In Fig. 4, the initial conditions are a0 = 10^−6, b0 = 10^−4, c0 = 1 − 2 × 10^−4, and d0 = 0.99 × 10^−4; almost all of the agents are using CT. The parameters are again α = 0.80 and ε = 0.5. The NSS solution concept presumes that because CT is such a large portion of the population, because it does as well against itself as any other strategy does against

12 Several authors have shown that there are advantages to using NSS over ESS, notably Bendor and Swistak (1995), who argue for NSS over ESS due to considerations of the dynamics implicitly assumed in the definitions of these static equilibrium concepts. Abreu and Rubinstein (1988) illustrated the non-existence of ESS when strategies are restricted to finite automata; see also footnote 14. Various authors have modified the definition of ESS to address non-existence; for example, Binmore and Samuelson.
13 Continuous versions of the following propositions: If a state x is a NSS, then every neighborhood B of x contains a neighborhood B′ such that ∀x_0 ∈ B′, x_n ∈ B ∀n ≥ 0. And if state x is an ESS, then additionally there exists a B¹ such that lim_{n→∞} x_n = x ∀x_0 ∈ B¹.
14 As recognized by Selten and Hammerstein (1984), CT is not in general an ESS because a ‘neutral mutant’ such as C can invade the population of CT; the second condition of the ESS definition is not met. Bendor and Swistak note that every pure strategy i in a repeated game has neutral mutants (any strategy j with some variation on the strategy i such that V(i, i) = V(j, i) and V(j, j) = V(i, j)).


Fig. 4. Evolution of a population with a0 = 10^−6, b0 = 10^−4, c0 = 1 − 2 × 10^−4, d0 = 0.99 × 10^−4, ε = 0.5, α = 0.8.

it, and because it does as well against any other strategy as that strategy does against itself, it will continue to dominate the population. We now see that this outcome is not realized when strategies are unobservable. Central to the ultimate dominance of D is that the presence of C becomes sufficiently great, so the system must evolve from its initial state of a growing CT subpopulation to one in which the presence of unconditional cooperation (C) spreads faster than that of conditional cooperation (CT). How does the payoff to agents using C come to dominate the payoff to those using CT? Because the agents using C and CT cooperate with each other, the difference in their payoffs is determined by their interaction with agents using D and DT. Even though initially the difference in payoffs between C and CT is determined by very small D and DT populations (their sizes actually can’t be seen on the scale of Fig. 4), there will still be a period (n = 17 in this example) during which the D population has decreased enough relative to DT so that C gets a higher payoff than CT. The advantage to an agent of using C against DT over using CT against DT is that DT turns quickly to continual cooperation with C, instead of entering into a cycle of alternating cooperation and defection as it does against CT. To consider the non-payoff-monotonic forces operating here, note that the initial ranking of payoffs is V̄(CT) > V̄(C) > V̄(DT) > V̄(D). Yet the decline of DT is slower than the decline of C, which impacts the dominance of CT because it decreases the percentage of agents in the population which fully cooperate with CT, as compared to the percentage that cooperate with C. The remaining description of the evolution from these initial conditions is the same as in Example 1. In particular, the phase during which winners are using C (n = 17 to 20) is in agreement with the NSS/ESS concept, as mentioned in the previous example. Even though C does at least as well against every other strategy as it does against itself, it does not do as well against itself as D does against it, and thus eventually the superiority of C is overturned by the agents using D. On the other hand, the final phase (beginning at n = 21) in which D is winning does not coincide with NSS concepts: D does not do as well against CT in pairwise play as CT does against itself; CT is the unique NSS of this model because it is supposed to have the ability to repel ‘mutant invasions,’ while it seems (using the principles of evolutionary game theory) that D should not be able to repel them. ■
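The non-monotonicity described in Examples 1 and 2 can be tested directly against the formal definition above. A sketch reusing `payoffs` (Table 2) and `step` from Section 2: at the uniform state of Example 1, DT earns less than C and D yet declines more slowly, so the check fails.

```python
def is_payoff_monotonic(old, new):
    """True iff strictly higher payoffs coincide with strictly higher growth."""
    names = ('C', 'D', 'CT', 'DT')
    v = payoffs(*old)
    g = {s: (x1 - x0) / x0 for s, x0, x1 in zip(names, old, new)}
    return all((v[i] > v[j]) == (g[i] > g[j]) for i in names for j in names if i != j)

old = (0.25, 0.25, 0.25, 0.25)
print(is_payoff_monotonic(old, step(*old, alpha=0.8, eps=0.5)))   # False
```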


Fig. 5. Evolution when a0 = 4.4 × 10^−7, b0 = 1 − 8.8 × 10^−7, c0 = 3.9 × 10^−7, d0 = 1.1 × 10^−7, ε = 0.5, α = 0.8.

The next example illustrates, when strategies are unobservable, the difference between the effect of a mutant invasion in the short run and in the long run. As opposed to the evolution under a payoff-monotonic dynamic, a population consisting of almost all agents using D is not permanently destabilized by a ‘mutant invasion’ of agents using CT. 15

Example 3. The evolution now starts with a population near the equilibrium of ‘always-defect’ (D), yet even though there is a very large population using D, the initial conditions are chosen so that CT is ranked highest: we are observing a small deviation away from the evolution towards uniform unconditional defection. As can be seen from the inequalities that define Case 3, similar conditions can be satisfied for any initial proportion of the population using D (i.e. arbitrarily close to 100 percent unconditional defection). In general, it is not the size of the population using D that matters; when enough agents are using CT and DT as compared to those using C, then CT will prevail over D – but only in the short run, as can be seen in Fig. 5. The system moves through CT winning (n = 0 to 31), C winning (n = 32 to 36), and in the long run (starting at n = 37) D wins, rejoining the evolution towards all agents adopting D. Although the description of the dynamics is the same as in the previous examples, note that in the first phase of evolution (while CT is winning), the population using DT actually grows at first due to the large number of agents who previously used D and are uncertain whether winners use DT or CT. Initially V̄(D) > V̄(DT) when CT is the winning strategy, yet the growth of DT’s use is faster than the growth (actually, decline) of D’s use because some agents incorrectly guess that winners are using DT instead of correctly guessing CT. This again illustrates that the dynamic created by unobservable strategies contradicts both a static NSS analysis and a conventional payoff-monotonic dynamic

15 The usual definition of a mutant invasion would indicate a small number of agents switching to CT in an otherwise homogeneous population of agents using D. The use of the term here refers either to a heterogeneous status quo population or a heterogeneous invasion population. An invasion by CT of a population in which all agents are using D will not be repelled; such a specification of initial conditions would evolve towards all using CT. As is detailed after the proof in Appendix A, a population cannot evolve to all agents using CT as long as a > 0; if there are initially any agents using C, their subpopulation will eventually outperform CT (and later be exploited by D).


analysis such as the replicator dynamic. While it is true that an invasion of mutants using CT can ‘destabilize’ the evolution towards D (as predicted by NSS analysis), it is only in the short run; in the long run the system reverts back to evolving toward all agents defecting unconditionally. ■

As illustrated here, the assumption of non-observability has the potential to introduce a plausible, well-defined, non-payoff-monotonic dynamic. The non-monotonicity of the dynamic arises from agents mistaking CT for DT. Even though agents using DT may receive a lower payoff than those using C and CT, the use of DT can grow faster (or decrease more slowly) than the use of C or D, destabilizing the superiority of tit-for-tat (CT). We have seen that weakening the connection between fitness and payoff can break the link between NSS and stability that would exist if the dynamic were payoff-monotonic.

4. Extensions of the model

4.1. Observable strategies

The non-observability of strategies is a central assumption, for if strategies are observable then there exist initial conditions such that all agents evolve to using CT (see Appendix B). The assumption of observability implies a payoff-monotonic dynamic, fulfilling the requirements for NSS to imply stability as discussed in the examples of the previous section. Furthermore, if strategies are observable, then all agents using D (b = 1) is no longer locally stable in the sense that in any small neighborhood around b = 1, there is a set of positive measure which diverges from b = 1: an arbitrarily small subpopulation of ‘mutants’ switching to CT now causes the population to evolve toward everyone using CT in the long run, as opposed to the behavior highlighted in Example 3. The Achilles heel of CT (its inability to coordinate with DT) is no longer a liability when strategies are observable because the subpopulation of agents using DT never declines more slowly than the subpopulations using more successful strategies.

4.2. All agents adopt at once

It is also illuminating to consider what happens when all agents try to adopt the winners’ strategies every period (α = 0). As shown in Appendix C, the system can evolve to all C and all CT as well as all D. Why this striking difference from the case of α > 0? When C is initially ranked highest (Case 1), and α = 0, there are no remaining players using D to exploit the large C population. In Case 3 (when CT is initially ranked highest), it is because there are no players using C remaining to take advantage of CT’s poor performance against DT. The assumption that some non-winners do not change (α > 0) is ‘sharp’ precisely because some non-winners remain to take advantage of the winners’ weaknesses (and CT’s weakness is due to the assumption of unobservable strategies).

4.3. Alternative specifications of agents’ experimentation

We have seen that agents’ inability to perfectly emulate CT, as highlighted in the descriptive examples of Section 3, directly impacts the success of CT. It has been assumed


that agents attempting to emulate the success of those using CT adopt CT or DT with equal likelihood, but (as mentioned in 2(a(i)) of Section 2) the theorem holds regardless of the fraction of agents guessing DT instead of CT, as long as the fraction is fixed and non-zero. So the extent to which forgetful agents exist in the population does not bear on the theorem’s result. We might instead assume that the fractions of forgetful agents guessing DT and CT are, respectively, proportional to the fractions of the population currently using DT and CT. Although the motivation for this specification is questioned below, its consideration demonstrates the continued importance of non-payoff-monotonic dynamics. Simulations using appropriately modified state equations (see Appendix D) indicate that now both CT and D are attractors. This new result is due to the accelerated decline in agents adopting DT as the use of DT falls. Note, though, that the non-payoff-monotonic nature of the dynamic still affects outcomes: there are still initial conditions (including those in which CT is initially the winning strategy) that lead eventually to all agents adopting D. Yet it is questionable whether such a specification is even appropriate here. It is not clear how agents attempting to mimic successful strategies are to be more influenced by more prevalent strategies when the non-observability of strategies is a basic assumption. That agents must actually observe the strategies of others before they can emulate them is the focus of this inquiry. Finally, note that in the unmodified model of Section 2 any agent who adopts DT will, the next time this player considers changing strategy, accurately perceive the strategy of any winner. The drawback to an individual agent of being confused does not long detract from her ability to imitate the winner. But it is the fact that imitators test their hypotheses during the course of play that makes CT’s dominance difficult. The confusion would not affect the dynamics at all if the agents could have a ‘test bed’ for their strategies outside of their interaction with others (so that they no longer affect others’ payoffs during the ‘trial period’ – especially the payoff to winners using CT). The very act of ‘experimenting’ with possible strategies during the course of play is what causes the system’s dynamics not to be payoff-monotonic.

4.4. Winners change strategy

As a final extension of the model, consider the consequences of a fraction (1 − β) of the winners experimenting with other strategies in their own (winners’) equivalence class. An experimenting winner using D adopts CT or DT (see Table 3), because he cannot tell whether other winners are using D (the same strategy as this winner), using CT, or using DT. The result is that uniform adoption of D is no longer stable, as can be argued using the early evolution illustrated in Example 3. Any small fraction of winners who adopt CT and/or DT will destabilize the superiority of D when there are sufficiently few players using C. Assuming that some fraction of winners change strategy means that some of the winners using D will adopt CT or DT, the strategies that perform best in an environment of defection. Fig. 6 shows that even when only 1 percent of winners change strategy, the system enters a cycle: the growth of CT in the population eventually leads to the growth of C, which then is exploited by agents using D, which now brings the system back to CT winning, and so


Fig. 6. Evolution of a population with a0 = 0.25, b0 = 0.25, c0 = 0.25, d0 = 0.25, α = 0.8, and β = 0.99.

on. This and other simulations suggest a discontinuity at β = 0 because the length of the cycle appears to decrease (perhaps toward a limiting finite value) as β increases, rather than approaching the long run equilibrium of uniform defection (no cycling) when β = 0. This extension is interesting due to the sharpness of the assumption that winners don’t change, the periodic emergence of cooperation, and the possibility of heterogeneous steady states, but additional characterization will be required for an analytic description of the dynamics.

5. Concluding remarks

The standard approach to both dynamic and static analyses of a population’s evolution is founded upon equating payoff and fitness, which may seem appealing when a more successful strategy can be assumed to produce more offspring. But in a sociological context, we are not usually considering the passive transmission of characteristics from one generation to the next. It may then be more appealing to imagine a population of agents attempting to discern the most beneficial strategy to adopt. The results presented above analyze the transmission of strategies in a social (non-biological) context with particular interest in the effects of agents’ inability to observe strategies.

An evolutionary process has been explored which directly models the imitation mechanism. It has been assumed that agents know the outcome of their own personal encounters with other agents but do not observe the outcomes of encounters between other players, let alone any agent’s strategy. Since some agents are not sure what strategy the successful members of the population have used, an underperforming strategy’s representation in the population may increase relative to a more successful strategy’s representation. This is in contrast with the implications of a static NSS/ESS analysis or results obtained under the assumption of payoff-monotonic dynamics – in particular, the replicator dynamic, a standard starting point in evolutionary game theory.

For the sake of tractability, this analysis has focused on a very small strategy space, but this simple strategy space is sufficient to establish that the non-observability of strategies can affect the qualitative results of a dynamic analysis. Although a richer strategy space might result in the selection of some other steady state(s), I would conjecture that when strategies


are unobservable, non-payoff-monotonic forces will continue to be present. As the model presented here has shown, the link between a successful strategy’s payoff and its fitness is weakened when imitation is imperfect. The reinforcement of a winning strategy’s use in future periods is diluted due to imitators’ inability to perfectly infer the strategies of others. We have seen that if a strategy with an inferior payoff interacts with other underperforming strategies in a manner similar to the winners’ strategy, the inferior strategy’s fitness could be improved due to the non-winners’ perception that winners may have actually been using that strategy. This would seem to be a general consideration introduced by the unobservability of strategies, regardless of the size of the strategy space. Although it may seem intuitive that naturally occurring dynamics are payoff-monotonic (the use of no strategy grows faster than the use of any more successful strategy), the results here illustrate the error of such intuition, in particular under imitation. It is unlikely that agents can directly observe the strategies used by successful agents; typically, those strategies will have to be inferred from a mixture of public and private information. By investigating an underlying mechanism which specifies how agents decide which strategies to adopt, we have seen that the evolution of social interaction may be very different from that predicted by either static or dynamic conventional analyses. When fitness is not equivalent to payoff, performance need not be the only determinant of success.

Acknowledgements

I would like to thank Joe Harrington for recommending this subject for study and for his many helpful comments. Thanks also for comments from two referees and from the participants at the Stony Brook Summer Festival on Game Theory.

Appendix A. Proof of theorem

Consider the evolution from each of the possible cases that partition the set of initial conditions.

A.1. Case 1 evolves to Case 2

If the conditions for Case 1 are satisfied for periods 0, . . . , n, the following reduced form equations hold:

b0 < d0/2
a0 + b0 < c0 + d0
an = a0 + (1 − α^n)(b0 + c0 + d0)
bn = α^n b0
cn = α^n c0
dn = α^n d0.


The inequality bn < dn/2 holds in all periods: bn = α^n b0 < α^n d0/2 = dn/2. Case 1 evolves to Case 2 when an + bn < cn + dn no longer holds. The existence of the critical period is derived as follows: 16

an + bn − (cn + dn) = a0 + b0 + c0 + d0 − α^n(b0 + c0 + d0) + α^n b0 − α^n(c0 + d0)
= 1 − 2α^n(c0 + d0) ≥ 0 when α^n ≤ 1/(2(c0 + d0)).

Thus, Case 1 will evolve to Case 2 after a sufficient number of periods because bn < dn/2 and an + bn > cn + dn together imply an > cn + dn/2.

A.2. The population does not leave Case 2

Re-numerate so that the first period a population is in Case 2 is period 0. If the population then remains in Case 2 for n additional periods, the following reduced form equations must hold:

a0 + b0 > c0 + d0
a0 > c0 + d0/2
an = α^n a0
bn = b0 + (1 − α^n)(a0 + c0 + d0)
cn = α^n c0
dn = α^n d0.

The inequality an > cn + dn/2 holds in all periods because an − (cn + dn/2) = α^n(a0 − c0 − d0/2) > 0 for all n. To see that once in Case 2 the population stays there, all that remains is to show that an + bn > cn + dn is true for all n:

an + bn − (cn + dn) = a0 + b0 + (1 − α^n)(c0 + d0) − α^n(c0 + d0)
= [a0 + b0 − α^n(c0 + d0)] + (1 − α^n)(c0 + d0) > 0

for all n because a0 + b0 − α^n(c0 + d0) ≥ a0 + b0 − (c0 + d0) > 0. Thus, after a population enters Case 2, all strategies other than D eventually die out because

lim_{n→∞} bn = lim_{n→∞} [b0 + (1 − α^n)(a0 + c0 + d0)] = a0 + b0 + c0 + d0 = 1.

16 If an + bn = cn + dn before an + bn > cn + dn, then Case 6 is next, but Case 2 follows one period later when ties are broken, as discussed later.


A.3. Case 3 evolves to Case 1

When the conditions for Case 3 are satisfied for periods 0, . . . , n, we have immediately from the state equations that

an = α^n a0
bn = α^n b0.

These are used to derive

dn = αd_{n−1} + (ε/2)(1 − α)(a_{n−1} + b_{n−1})
= α^2 d_{n−2} + (ε/2)(1 − α)(a_{n−1} + αa_{n−2} + b_{n−1} + αb_{n−2})
...
= α^n d0 + (ε/2)(1 − α)(a_{n−1} + αa_{n−2} + · · · + α^{n−1}a0 + b_{n−1} + αb_{n−2} + · · · + α^{n−1}b0)
= α^n d0 + (ε/2)(1 − α)(α^{n−1} + αα^{n−2} + α^2 α^{n−3} + · · · + α^{n−1})(a0 + b0)
= α^n d0 + (ε/2)(1 − α)nα^{n−1}(a0 + b0),

and then

cn = 1 − (an + bn + dn)
= a0 + b0 + c0 + d0 − α^n(a0 + b0 + d0) − (ε/2)(1 − α)nα^{n−1}(a0 + b0)
= c0 + (1 − α^n)d0 + [1 − α^n − (ε/2)(1 − α)nα^{n−1}](a0 + b0).

From these derivations, if a population is in Case 3 during periods 0, . . . , n, it obeys the following reduced form equations:

b0 > d0/2
a0 < c0 + d0/2
an = α^n a0
bn = α^n b0
cn = c0 + (1 − α^n)d0 + [1 − α^n − (ε/2)(1 − α)nα^{n−1}](a0 + b0)
dn = α^n d0 + (ε/2)(1 − α)nα^{n−1}(a0 + b0).

Case 3 evolves to Case 1, because there exists n > 0 for which bn < dn/2, while an < cn + dn/2 holds for all n. These two inequalities imply that for some n, an + bn < cn + dn, which satisfies the conditions for Case 1. 17

17 If bn = dn/2 before bn < dn/2, then Case 4 is next, followed by either of Cases 1 and 2, or 6, as shown when ties are considered later in the proof.


The first condition requires that

dn/2 − bn = (α^n/2)d0 + (nε/4)(1 − α)α^{n−1}a0 + [(nε/4)(1 − α)α^{n−1} − α^n]b0 > 0.

This is satisfied for n large enough:

(nε/4)(1 − α) − α > 0, or n > 4α/(ε(1 − α)).

That an < cn + dn/2 holds for all n will be proved by induction, beginning with the given inequality a0 < c0 + d0/2. It is easy to see from an = α^n a0 that a1 < a0. It remains to be shown that cn + dn/2 > c_{n−1} + d_{n−1}/2 for all n:

cn + dn/2 = c0 + (1 − α^n)d0 + [1 − α^n − (ε/2)(1 − α)nα^{n−1}](a0 + b0) + (α^n/2)d0 + (ε/4)(1 − α)nα^{n−1}(a0 + b0)
= c0 + (1 − α^n/2)d0 + [1 − α^n − (ε/4)(1 − α)nα^{n−1}](a0 + b0)
> c0 + (1 − α^{n−1}/2)d0 + [1 − α^{n−1} − (ε/4)(1 − α)(n − 1)α^{n−2}](a0 + b0)
= c_{n−1} + d_{n−1}/2.

The inequality follows from 1 − α^n/2 > 1 − α^{n−1}/2 and

[1 − α^n − (ε/4)(1 − α)nα^{n−1}] − [1 − α^{n−1} − (ε/4)(1 − α)(n − 1)α^{n−2}]
= (1 − α)[α^{n−1} − (nε/4)α^{n−1} + ((n − 1)ε/4)α^{n−2}]
= (1 − α)α^{n−2}[α + (ε/4)(n − 1 − nα)]
= ((1 − α)/4)α^{n−2}[4α + ε(n − 1)(1 − α) − εα]
= ((1 − α)/4)α^{n−2}[ε(n − 1)(1 − α) + (4 − ε)α]
> 0 for all n > 0.

Thus, an < cn + dn/2 for all n and Case 3 evolves to Case 1.
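The closed forms for cn and dn can also be checked against direct iteration. A small verification sketch, reusing `step` from Section 2 (the initial shares are my own arbitrary values satisfying the Case 3 inequalities, which hold here through n = 5):

```python
a0, b0, c0, d0, alpha, eps = 0.1, 0.3, 0.35, 0.25, 0.8, 0.5
state = (a0, b0, c0, d0)
for n in range(1, 6):
    state = step(*state, alpha=alpha, eps=eps)
    dn = alpha**n * d0 + (eps / 2) * (1 - alpha) * n * alpha**(n - 1) * (a0 + b0)
    cn = (c0 + (1 - alpha**n) * d0
          + (1 - alpha**n - (eps / 2) * (1 - alpha) * n * alpha**(n - 1)) * (a0 + b0))
    assert abs(state[2] - cn) < 1e-12 and abs(state[3] - dn) < 1e-12
```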

A.4. Breaking ties

Finally, it is necessary to show that there are no steady states in which winners tie. (Note that, with a continuum of agents, these cases are not essential to the dynamic analysis.)


Next follows the description of the adoption process relevant to these cases to fill in the description of point 2(a(ii)) of Section 2, then the remaining cases are listed, and finally the evolution under each of these cases is shown to lead to one of the non-tie cases above. Whenever a subpopulation is referred to as changing below, it is meant that (1 − α) of the subpopulation is changing, and the notation Γ//Δ indicates that the player can differentiate between strategy subsets Γ and Δ.

The description for point 2(a(ii)) of Section 2: Agents choose among equivalence classes according to their proportion among the winners, and then with equal likelihood within each equivalence class. 18 For example (referring to the first and third rows of Table 3), assume that agents using C and CT are the winners. Agents using C and CT do not change because they are winners. For agents using DT, both equivalence classes containing C and CT are singletons, so the agents that change from DT choose C and CT according to their proportion in the winners’ subpopulation. For forgetful agents using D, however, the equivalence class containing CT also contains DT. Thus, the agents that change from D choose C according to its proportion in the winners’ subpopulation, and split between CT and DT according to the proportion of the agents that are forgetful (the more that can distinguish between CT and DT, the more that choose CT)—see the terms that include the state variables b and d in Case 4.

Case 4. C and CT tie when V̄(C) = V̄(CT) and V̄(CT) > V̄(D): b = d/2 and a < c + d/2.

Current strategy  Agents adopt
D                 C//{CT, DT}
DT                C//CT

State equations:
a′ = a + [a/(a + c)](1 − α)b + [a/(a + c)](1 − α)d
b′ = αb
c′ = (1 − ε/2)[c/(a + c)](1 − α)b + c + [c/(a + c)](1 − α)d
d′ = (ε/2)[c/(a + c)](1 − α)b + αd

Case 5. D and CT tie when V̄(D) = V̄(CT) and V̄(CT) > V̄(C): a = c + d/2 and b > d/2.

Current strategy  Agents adopt
C                 D//{CT, DT}
DT                D//CT

State equations:
a′ = αa
b′ = [b/(b + c)](1 − α)a + b + [b/(b + c)](1 − α)d
c′ = (1 − ε/2)[c/(b + c)](1 − α)a + c + [c/(b + c)](1 − α)d
d′ = (ε/2)[c/(b + c)](1 − α)a + αd

18 The results are robust to having players uniformly randomize without regard to equivalence classes’ proportion in the winners’ subpopulation. The assumption made is meant to appeal to the strongest possible notion of ‘rationality’ within the model’s framework; many other schemes would seem to work as well due to the non-generic conditions under which ties occur.


Case 6. C and D tie when V̄(C) = V̄(D) and V̄(D) > V̄(CT): a + b = c + d and a > c + d/2.

Current strategy  Agents adopt
CT                C//D
DT                C//D

State equations:
a′ = a + [a/(a + b)](1 − α)(c + d)
b′ = b + [b/(a + b)](1 − α)(c + d)
c′ = αc
d′ = αd

Case 7. C, D, and CT all tie when V̄(C) = V̄(D) = V̄(CT): b = d/2 and a = c + d/2.

Current strategy  Agents adopt
DT                C//D//CT

State equations:
a′ = a + [a/(a + b + c)](1 − α)d
b′ = b + [b/(a + b + c)](1 − α)d
c′ = c + [c/(a + b + c)](1 − α)d
d′ = αd

In Case 4, ε/2 of the D population switches to DT instead of one of the winning strategies, while all of the DT population adopts a winning strategy. Agents using C and CT do not change strategy, so the equality b = d/2 changes to b < d/2. Because the inequality a < c + d/2 may stay the same, reverse, or become an equality, the next state may be either of Cases 1 and 2, or 6.

The equality a = c + d/2 is broken in Case 5 because the combined CT and DT subpopulation does not decrease, while ε/2 of the C subpopulation chooses DT. Now a < c + d/2. The status of the inequality b > d/2 in the next state is unclear, so the next state is one of Cases 1 and 3, or 4.

For Case 6, the equality a + b = c + d becomes a + b > c + d because some agents using CT and DT change to C and D. The inequality a > c + d/2 remains true for the same reason. Thus the next state is Case 2. In Case 7 the inequality b = d/2 changes to b > d/2 and a = c + d/2 changes to a > c + d/2 because some of the agents using DT change to D. The next state is Case 2.

An inspection of these results for ties (and footnotes 16 and 17) reveals no possibility of cycling through any group of cases without reaching Case 2, which the system never leaves. The following complete results, which include the evolution from other initial populations (a0, b0, c0, and/or d0 = 0), can be derived:

Evolution to all D when a0 > 0, b0 > 0, c0 ≥ 0, d0 ≥ 0.
Evolution to all CT when a0 = 0, b0 ≥ 0, c0 > 0, d0 > 0; or when a0 = 0, b0 > 0, c0 > 0, d0 ≥ 0.
Evolution to all C when a0 > 0, b0 = 0, c0 ≥ 0, d0 > 0.
No change in the population when a0 = 0, b0 ≥ 0, c0 = 0, d0 ≥ 0; or when a0 ≥ 0, b0 = 0, c0 ≥ 0, d0 = 0.

Appendix B. A model with observable strategies

What if agents playing CT are the winners and all agents can observe that the winners are playing CT? Then the initial conditions and state equations are


b0 > d0/2
a0 < c0 + d0/2
an = α^n a0
bn = α^n b0
cn = c0 + (1 − α^n)(d0 + a0 + b0)
dn = α^n d0,

and the population continues to emulate CT until all agents have adopted that strategy. With the only change to the model being that strategies are observable, cooperation becomes possible in equilibrium. The proof is that the inequalities bn > dn/2 and an < cn + dn/2 remain true for all n:

bn = α^n b0 > α^n d0/2 = dn/2,

and

cn + dn/2 − an = c0 + (1 − α^n)(1 − c0) + α^n(d0/2 − a0)
= 1 − α^n + α^n(c0 + d0/2 − a0) > 0.

Assuming observable strategies does not change the results of either Case 1 or 2, but it is easy to see from the early periods of Fig. 5 that Case 2 is not stable to an arbitrarily small population of mutants switching to CT.
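A sketch of the observable-strategies update in Case 3 (every switcher now identifies CT exactly, so the ε terms disappear); iterating it shows convergence to all agents using CT, as the proof above establishes.

```python
def step_case3_observable(a, b, c, d, alpha):
    """Case 3 update when winners' strategies are directly observable."""
    return alpha * a, alpha * b, c + (1 - alpha) * (a + b + d), alpha * d

state = (0.1, 0.3, 0.35, 0.25)        # satisfies b > d/2 and a < c + d/2
for _ in range(50):
    state = step_case3_observable(*state, alpha=0.8)
print([round(x, 3) for x in state])   # [0.0, 0.0, 1.0, 0.0]: all CT
```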

Appendix C. Outcomes with total adjustment

When α = 0, all the players adjust immediately. The evolution under the assumption of unobservable strategies is now extremely simple, but it is interesting because there is a discontinuity with α > 0. Cases 1 and 3 are not unstable as they are when α > 0: when α = 0, a population that contains all of the possible strategies can now evolve to one where all agents cooperate. The key to this result is that all players react to the winning strategy. The strategies that exploit winners’ weaknesses are eradicated immediately; within at most three periods, there is no segment of the population, not even a small one, that continues using the unsuccessful strategies. When there are clear winners playing C (Case 1) or D (Case 2), the entire population immediately adopts the winners’ strategy because all non-winners can uniquely identify the winning strategy and winners don’t change. When the winner is CT (Case 3), all players using C or D split immediately between CT and DT, CT continues to win, and in two periods all players are playing CT.


If there is a tie between C and D (Case 6), then players with strategies CT and DT immediately split between C and D. This causes D to win in the next period because a + b > c + d = 0 and a > c + d/2 = 0. The entire population then (as in Case 2) adopts D. If C and CT tie (Case 4), then the next period’s population contains no D (b = 0), causing C to win, i.e. Case 1. If D and CT tie (Case 5), the next period’s population contains no C (a = 0), causing CT to win and placing the system in Case 3. If C, D, and CT tie (Case 7), then DT disappears (d = 0) and the population goes to one of Cases 2, 3, 5, or 6.

Appendix D. Varying ratio of DT- to CT-adoption by forgetful agents

The state equations under this new specification require that the percentage of forgetful agents adopting CT in Case 3 vary with the proportion of agents using CT (c) relative to the proportion using either DT or CT (c + d). 19 The state equations in Case 3 are the only ones that change; when γ = c/(c + d) they become

a′ = αa
b′ = αb
c′ = (1 − (1 − γ)ε)(1 − α)(a + b) + c + (1 − α)d
d′ = (1 − γ)ε(1 − α)(a + b) + αd.

Note that these state equations specify the ratio of agents adopting DT to those adopting CT to be decreasing in the proportion using CT.

19 The proof in Appendix A assumes that half of these agents choose CT and half choose DT (all non-forgetful agents correctly choose CT), hence the factors of ε/2 and 1 − ε/2 in the state equations when CT wins, Case 3.
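As a sketch, the modified Case 3 update reads as follows (the function name is mine); it requires c + d > 0, and the mistaken DT-adoption term vanishes as d shrinks, which is what makes CT an attractor under this specification.

```python
def step_case3_proportional(a, b, c, d, alpha, eps):
    """Appendix D's Case 3 update: forgetful guesses track current CT/DT shares."""
    gamma = c / (c + d)                 # fraction of forgetful switchers choosing CT
    switchers = (1 - alpha) * (a + b)
    return (alpha * a, alpha * b,
            (1 - (1 - gamma) * eps) * switchers + c + (1 - alpha) * d,
            (1 - gamma) * eps * switchers + alpha * d)
```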


References

Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibrium in repeated games with finite automata. Econometrica 56, 1259–1281.
Axelrod, R., 1984. The Evolution of Cooperation. Basic Books, New York.
Axelrod, R., Hamilton, W., 1981. The evolution of cooperation. Science 211, 1390–1398.
Bendor, J., Swistak, P., 1995. Types of evolutionary stability and the problem of cooperation. Proceedings of the National Academy of Science 92, 3596–3600.
Binmore, K., Samuelson, L., 1992. Evolutionary stability in repeated games played by finite automata. Journal of Economic Theory 57, 278–305.
Cooper, B., 1996. Copying Fidelity. Presented at The Seventh Stony Brook Summer Festival on Game Theory.
Dion, D., Axelrod, R., 1988. The further evolution of cooperation. Science 242, 1385–1390.
Dixit, A., 1987. How should the United States respond to other countries’ trade policies? In: Stern, R.M. (Ed.), US Trade Policies in a Changing World. MIT Press, Cambridge, MA.
Harrington, J.E. Jr., 1999. Rigidity of social systems. Journal of Political Economy 107, 40–64.
Maynard Smith, J., 1974. The theory of games and the evolution of animal conflict. Journal of Theoretical Biology 47, 209–221.
Maynard Smith, J., 1982. Evolution and The Theory of Games. Cambridge University Press, Cambridge.
Molander, P., 1985. The optimal level of generosity in a selfish, uncertain environment. Journal of Conflict Resolution 29, 611–618.
Piccione, M., Rubinstein, A., 1995. On the Interpretation of Decision Problems with Imperfect Recall. Tel Aviv University, photocopy.
Rubinstein, A., 1986. Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory 39, 83–96.
Selten, R., Hammerstein, P., 1984. Gaps in Harley’s argument on evolutionarily stable learning rules and the logic of tit for tat. Behavioral and Brain Science 7, 115–116.
Weibull, J., 1995. Evolutionary Game Theory. MIT Press, Cambridge, MA.