Review of Economic Dynamics 3, 311–329 (2000)
doi:10.1006/redy.1999.0078, available online at http://www.idealibrary.com
Correlation Learning and the Robustness of Cooperation¹
Nicola Dimitri
Dipartimento di Economia Politica, Università di Siena, 53100 Siena, Italy
E-mail:
[email protected] Received December 1998
In the Prisoner’s Dilemma stage game, one line of research pursued to justify the cooperative outcome is based upon some idea of correlation. This paper aims at testing whether correlation could support cooperative behavior in the long run, by embedding the infinitely repeated game within a simple evolutionary framework. In particular, the main theorem states that just two agents who are born cooperative might remain cooperative forever with strictly positive probability. This robustness result appears to be particularly strong since the model allows cooperative agents to switch strategy and start defecting from a certain time onward, but not vice versa. Journal of Economic Literature Classification Numbers: C72, D83. © 2000 Academic Press
Key Words: cooperation; correlation; learning.
¹ Financial support from C.N.R. and the British Council, and hospitality of Clare Hall College (Cambridge, UK) is gratefully acknowledged. I also thank Prof. David Levine and an anonymous referee for suggestions that much improved an earlier version of this paper. Comments from Tillman Börgers, Pierpaolo Battigalli, Marco Dardi, Frank Hahn, Pierluigi Sacco, and Stefano Vannucci have also been very useful. The usual disclaimers apply.

1094-2025/00 $35.00 Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

The question of how the “cooperative” outcome can be achieved in a Prisoner’s Dilemma stage game has attracted much attention since the outcome associated with the strictly dominating strategy, which is also the unique Nash and correlated equilibrium of the game, is Pareto inferior to it. Several of the theoretical efforts undertaken to justify cooperation can be seen to fall within two very general approaches. The first consists of contributions based on widely accepted game-theoretic principles, essentially founded on Expected Utility Maximization, which, by incorporating appropriate assumptions, are capable of characterizing cooperation as an equilibrium phenomenon. The second, broadly speaking, is founded on some idea of correlation between choices and in so doing departs from game-theoretic standards.

Within the former group the following are some of the main results. In the infinitely repeated game, the Folk Theorem (Fudenberg and Tirole, 1991) formalizes the idea that cooperation might emerge, as a Nash equilibrium solution, as long as agents are not too impatient. Binmore and Samuelson (1992) show that a version of the Evolutionary Stability criterion can select the cooperative outcome, as the unique equilibrium of the infinitely repeated game. Cooperation can also be proved to emerge if agents make mistakes (Fudenberg and Maskin, 1990), though with small probabilities, in choosing their strategies. In the finitely repeated game, instead, it is known (Fudenberg and Tirole, 1991) that the only subgame perfect equilibrium coincides with the Nash solution of the static game. This result was recently confirmed, also according to the Evolutionary Stability criterion as applied to games with a known number of repetitions (Cressman, 1996). Kreps et al. (1982) show, however, that with finite repetitions a specific form of reputation effect can entail cooperation, while Neyman (1985) and Zemel (1989) indicate that in such a context cooperation can indeed be justified when the game is played by automata exhibiting appropriate computational limitations.

As far as the latter approach is concerned, Nozick (1993) and Binmore (1994) produce a fairly accurate account of results and related discussions. In particular, one can observe that a form of correlation appears to be lying at the heart of the controversial Newcomb and Twins Paradoxes. Since the paper is based upon an idea of correlation, it may be worth summarizing briefly the type of coordination behind these two situations. In Newcomb’s Paradox a decision maker tackles the following problem. She faces two boxes (I and II).
In box I there is always 1 dollar, while in II there could be either 0 or 2 dollars. Her problem is whether to choose both boxes or box II only. Dollars are put into the boxes by a being who perfectly forecasts what the choice of the individual will be. When the being forecasts that both boxes will be chosen she puts in 0 dollars, while if the forecast is that II will be chosen she puts in 2 dollars. If choosing only box II is interpreted as a cooperative strategy, cooperation would obtain when the decision maker perfectly anticipates the being’s strategy. The Paradox of the Twins relates directly to the Prisoner’s Dilemma. In it cooperation is enhanced by a symmetric reasoning (on the part of the players) of the following type. Each agent assumes that both she and the opponent are rational. Hence, every choice she makes will also be made by the other and so, under this configuration, the most rewarding strategy for them is to cooperate.
In comparing the equilibrium and correlation lines of research, one issue emerges in a natural way: while in the first set of works some of the contributions explicitly incorporate dynamic features through repetitions of the game, the same does not seem to be so much the case within the second group. Taking the above consideration as a starting point, this paper will not argue about the relative attractiveness of the two approaches, attempting instead to contribute to a more thorough evaluation of the second theoretical strand by investigating the effect of correlation inside a dynamic framework.

In particular, the conceptual point guiding the article is the following. We have seen that, according to standard game-theoretic solutions, time could operate as a device for obtaining cooperation. Conversely, with the other approach cooperation is achievable in a static context, but it is far from clear whether correlation could sustain cooperation, and if it could, under what conditions, when time is explicitly incorporated into the model. In principle, one could not exclude the possibility that in this case time may (even more than) countervail the effect of correlation. The scope of this paper is, precisely and exclusively, to build up a framework for “testing this hypothesis” by constructing a relatively simple dynamic model that accommodates the appropriate ingredients.

We shall perform the “test” by taking the specific stance of investigating the robustness of cooperation. More specifically, we analyze an infinitely repeated Prisoner’s Dilemma in which a certain number of players start playing cooperatively (they are “born” cooperative) and explore the circumstances under which such cooperative behavior can survive in the long run. Our setup is characterized by some of the main features distinguishing evolutionary modeling (Mailath, 1992), but with two important exceptions.
On the one hand it is true in fact that we do examine a context in which boundedly rational and unsophisticated agents (according to the taxonomy of Mailath), from a finite population, are randomly matched in pairs at each time to play the stage game. On the other hand, however, unlike most game-theoretic evolutionary models, we do not incorporate the hypothesis that players are best replying against the distribution of strategies in the population, and, moreover, the role of mutations (noise) is not systematically investigated. In particular, the former is dealt with by introducing a local information structure, namely players who can only observe what their opponents have chosen. Then we imagine that learning about other players’ behavior unfolds through specific rules formulated in the spirit of fictitious play. These tools enable us to deliver the main issue of the paper. Indeed, we can formalize correlation in a rather “natural” way, assuming that at each
stage agents choose their own strategies by considering estimates of conditional probabilities, rather than marginal (unconditional) probabilities, concerning the opponent’s choice. Finally, a point which is worth stressing is that the model should not be considered one of pure correlation, in the sense that at each stage the players’ goal will still be to maximize (though in its conditional version) next-period estimated expected utility.

Absence of an additional random factor (noise), which in evolutionary models is typically included to formalize the cumulative effects of mutations, tremblings, births, and deaths, etc., is justified by the fact that in this paper we want to isolate the effect of correlation with respect to its “systematic activity.” Indeed, as is well known (Kandori et al., 1993), the action of the additional stochastic component reverberates along two main directions: first, it solves indeterminacy in the long-run behavior of the system due to alternative specifications of the initial conditions, and second, it may prevent the system from getting stuck in “unattractive” positions. These issues are not our central concern in this work, and this explains why noise has not been investigated in the model. Nonetheless, we thought it useful to include in Section 3 a short discussion on how our work would have to proceed for us to investigate the matter.

The main result of the paper says that, asymptotically, the players’ presumption of choices being correlated might support a cooperative attitude with positive probability. More explicitly, we show that if just two agents start playing cooperatively, there may be a strictly positive probability that not every player will eventually defect from cooperation. It should be observed that in a sense this robustness result is as strong as it could possibly be, since the model allows “cooperative born” agents to be infected and to start defecting, but not vice versa.
Not surprisingly, we shall see that survival of cooperation is governed by the interaction of two parameters.

a. The probability that cooperative agents are matched at each stage.
b. A ratio of differences between two pairs of payoffs.

The intuition behind this is rather simple. The cooperative-born players will remain cooperative over time if and only if they encounter each other sufficiently often that the estimated conditional probability of being matched to a cooperative player, given that they cooperate, remains higher than the threshold parameter hinted at in b.

It must be said that, conceptually speaking, the type of correlation which we consider is obviously different from both the one supporting the
Newcomb and Twins Paradoxes and the notion of Correlated Equilibrium (Aumann, 1974, 1987). Indeed, the kind of agents’ limited rationality that we model incorporates the idea of individuals who do not engage in “very sophisticated” reasoning on how the opponent will choose. This is justified by the assumption of random matching and anonymity; in the evolutionary framework that we explore, agents pursue neither deductive nor eductive reasoning. They base their choices on simple, manageable learning rules and decision criteria, so that, if anything, they could be seen to reason inductively (Milgrom and Roberts, 1991; Arthur, 1994).

However, in spite of this wider range of modeling possibilities that, with respect to game-theoretic standards, are offered by evolutionary contexts, it is important to be specific about the class of Prisoner’s Dilemma games that could be captured by our story. The games that we have in mind have at least two distinguishing features, which we make explicit below in an informal way. They are played simultaneously by rational agents (pursuing expected utility maximization), but

c. Individuals do not know that the game is simultaneous.
d. Individuals do not know whether the opponent is rational.

Indeed, if c or d did not hold, we think that the inclusion of a correlation mechanism, of the sort that we have incorporated into the model, would lack meaningful game-theoretic interpretations. The following example should render our point transparent. Consider the original Prisoner’s Dilemma story; the two individuals are taken into separate rooms and told the payoffs of the game (say months in prison), which vary, depending on whether they confess that the other is guilty or not. It is standard to model this situation as a simultaneous game, with the simultaneity being mutually known by the two individuals.
However, players here do not observe each other while playing, and it is perfectly plausible to imagine that, if not imposed by assumption, the timing of the game may not be known by the agents. The game itself will be simultaneous, but agents might not be sure that, after having decided whether to confess or not, their decision will not be reported to the opponent before she plays. Moreover, if at this point the players are not supposed to know whether the opponent is rational or not, it is easy to imagine how correlation, as modeled by agents who think that their own actions might influence the opponent’s, could play a role. This should clarify why evolution, per se, will not be enough to encapsulate our idea of correlation, even though it will play an important role in making point d plausible.
2. THE MODEL

Consider a finite even² number of players $N > 2$ ($N$ is also the set of players), who at time $t$, with $t = 1, 2, \ldots$, meet to play the same Prisoner’s Dilemma game. At each stage agents are matched in pairs randomly, and independently over time, according to a sampling-without-replacement type of scheme. Matchings are then determined by consecutive “draws”; more precisely, those agents who will play against each other are the first and the second drawn, the third and the fourth drawn, etc. Notice that this matching scheme is different from the ones proposed, for example, in some of the evolutionary game-theoretic models (Kandori et al., 1993; Ellison, 1993).

² This hypothesis is borrowed from the evolutionary game-theoretic literature (Kandori et al., 1993). The analysis could of course be undertaken also with an odd number of players. In the matching scheme adopted here, however, at each stage there would be an unmatched player. A possible proposal might be that the unmatched agent simply does not play at that particular stage. However, the exploration of this case would not seem to offer meaningful conceptual improvements to the work.

Agents play a symmetric game, the payoff matrix of which is represented in Table 1. In each pair of figures, the left one indicates the payoff of player I and the right figure that of player II. Payoffs are ordered as follows: $d > a > b > c$. The outcome associated with the pair of strategies $(\alpha, \alpha)$ is Pareto superior to the one associated with $(\beta, \beta)$, but the former pair is not a Nash equilibrium, i.e., it is not self-enforcing. Moreover, $(\beta, \beta)$ is the unique Nash equilibrium of this game, and strategy $\beta$ strongly dominates $\alpha$. This is a standard formulation of the Prisoner’s Dilemma game (Fudenberg and Tirole, 1991), a game in which we ask how players could have an incentive to cooperate and play $(\alpha, \alpha)$.

TABLE 1
Prisoner’s Dilemma Payoff Matrix

              α         β
    α       a, a      c, d
    β       d, c      b, b

A full description of the model is provided by the infinite history of strategies chosen by each individual. To explain how these histories are determined, we illustrate the criterion assumed to be adopted by players when facing their choice problem. The model works as follows. Time is discrete; at $t = 0$ “nature” determines the strategy played by an agent at the beginning, i.e., at $t = 1$. Then
for every integer $t > 1$ agents will choose strategies, assuming that players can only observe at every stage, and recall in the future, their own actions and those of their opponents. Strategy choices are based on an estimate of the probabilities which formalize beliefs about the next opponent’s action. This estimate is constructed in the spirit of the fictitious play scheme (Brown, 1951; Fudenberg and Kreps, 1993). As already anticipated, however, the main result will be crucially dependent upon the modification, with respect to more standard versions of fictitious play, that we are introducing. To be more explicit, some definitions and further notation are now necessary.

Let $n \in N$ be a generic player and let $n^- \in N$ (for every $t$) be her generic opponent; moreover, let $s(n)_t$ be the strategy played by $n$ at time $t$. Agents are paired anonymously (Ellison, 1994), so that at every stage players cannot recognize their opponents. Preplay communication could also be imagined, as in Matsui (1991); note that cheap talk, too, might be a way to justify the presence of correlation. However, since it is not within the scope of this paper to theorize about this, for expositional simplicity we shall not include this in our construct. Let us define now the random variables

$$
p_t^{(n)}(i \mid j) =
\begin{cases}
\dfrac{\sum_k I\big(s(n^-)_k = i,\ s(n)_k = j\big)/t}{\sum_k I\big(s(n)_k = j\big)/t} & \text{if } \sum_k I\big(s(n)_k = j\big) \neq 0 \quad \text{(i)} \\[2ex]
I(i = \beta) & \text{if } \sum_k I\big(s(n)_k = j\big) = 0 \quad \text{(ii)},
\end{cases}
\tag{1}
$$

with $k = 1, 2, \ldots, t$; $t = 1, 2, 3, \ldots$; $i, j = \alpha, \beta$, and $n \in N$. $I(s(n^-)_k = i,\ s(n)_k = j)$, $I(s(n)_k = j)$, and $I(i = \beta)$ are standard indicator functions. Moreover, let $p_t^{(n)}(i, j)$ and $p_t^{(n)}(j)$ be defined as

$$p_t^{(n)}(i, j) = \sum_k I\big(s(n^-)_k = i,\ s(n)_k = j\big)/t$$

$$p_t^{(n)}(j) = \sum_k I\big(s(n)_k = j\big)/t.$$

The above random functions estimate the relevant probabilities which will appear in the model. In particular, $p_t^{(n)}(i \mid j)$ estimates the conditional probability that $n^-$ would play strategy $i$ at time $t + 1$, whenever $n$ chooses strategy $j$ at time $t + 1$. As we already suggested, its introduction is based on the idea that agents think choices might be correlated over players and behave accordingly by formalizing their beliefs about the opponent’s strategies with an estimate of the conditional probability. The
interpretation of the other random variables is analogous; $p_t^{(n)}(i, j)$ could be seen as an estimate of the joint probability that $n^-$ would play strategy $i$ and $n$ would play strategy $j$ at time $t + 1$. Finally, $p_t^{(n)}(j)$ could be interpreted as an estimate of the marginal (unconditional) probability that player $n$ would play strategy $j$ at time $t + 1$. Moreover, some short remarks on (ii) are also in order. It simply stipulates that whenever a player has no data on a certain strategy (she never played it), she estimates that the opponent would play the dominant strategy. Therefore, unless cooperation has been experimented with and kept alive by the correlation conjecture, “in principle” every agent thinks that the opponent will defect.

Define now $\Pi_t(n; i)$, with $i = \alpha, \beta$, to be the expected payoff of agent $n$ (calculated at time $t$ after having played, for time $t + 1$) conditional on her playing strategy $i$ (at time $t + 1$). We then assume that $s(n)_{t+1} = \alpha$ if and only if³

$$\Pi_t(n; \alpha) \ge \Pi_t(n; \beta), \tag{2}$$

where $t = 1, 2, 3, \ldots$,

$$\Pi_t(n; \alpha) = a\, p_t^{(n)}(\alpha \mid \alpha) + c\,\big(1 - p_t^{(n)}(\alpha \mid \alpha)\big),$$

and

$$\Pi_t(n; \beta) = d\, p_t^{(n)}(\alpha \mid \beta) + b\,\big(1 - p_t^{(n)}(\alpha \mid \beta)\big).$$

Solving (2), we have $s(n)_{t+1} = \alpha$ iff

$$p_t^{(n)}(\alpha \mid \alpha) \ge B\, p_t^{(n)}(\alpha \mid \beta) + D, \tag{3}$$

where $B = (d - b)/(a - c)$ and $D = (b - c)/(a - c)$, so that $B > 0$ and $0 < D < 1$. Let $C(t)$ be the number of agents playing $\alpha$ at time $t$, with $t = 1, 2, \ldots$, where $C(1)$ is the number of players who initially play cooperatively.

The first point which is important to mention concerns the possibility that players will switch strategies. The result corroborates, in our slightly modified version of fictitious play, some already established propositions linking this learning scheme to strict Nash equilibria (Fudenberg and Kreps, 1993).

³ The assumption says that even in the case of an equality in expected payoff, players behave cooperatively. The main result, however, is substantially robust with respect to (2) holding as a strict inequality.
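The step from (2) to (3) can be spelled out as follows. This is only a worked rearrangement of the two expected payoffs defined in the text; the shorthand $p$ and $q$ is introduced here for readability and is not part of the original notation.

```latex
% Shorthand (assumption of this note): p = p_t^{(n)}(\alpha \mid \alpha), q = p_t^{(n)}(\alpha \mid \beta).
\begin{align*}
\Pi_t(n;\alpha) \ge \Pi_t(n;\beta)
  &\iff a\,p + c\,(1 - p) \ \ge\ d\,q + b\,(1 - q) \\
  &\iff (a - c)\,p + c \ \ge\ (d - b)\,q + b \\
  &\iff p \ \ge\ \frac{d - b}{a - c}\,q + \frac{b - c}{a - c}
        \qquad \text{(dividing by } a - c > 0\text{)} \\
  &\iff p \ \ge\ B\,q + D .
\end{align*}
% From d > a > b > c: d - b > 0 and a - c > 0 give B > 0,
% while 0 < b - c < a - c gives 0 < D < 1.
```

In particular, for a player who has never defected, case (ii) of (1) gives $q = 0$, so the rule reduces to $p \ge D$.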
LEMMA 1. If $s(n)_T = \beta$ then $s(n)_t = \beta$, with $T = 1, 2, \ldots$ and $t = T + 1, T + 2, \ldots$
Proof. Suppose $n$ starts playing cooperatively. Then, recalling that $p_t^{(n)}(\alpha \mid \beta) = 0$ if defection has not been chosen by $n$ until time $t$, the result follows immediately from (3). Consequently, in this case player $n$’s decision would depend only on the parameter $D$ and not on $B$. Instead, if $n$ defects at the first stage, it is easy to see that she will never change strategy in the future. Q.E.D.

Summarizing, the above lemma implies that $C(t) \le C(1)$ for all $t = 1, 2, 3, \ldots$, and that a state where all agents play $\beta$ is a steady (absorbing) state, meaning that, with certainty, such a configuration will no longer change over time. As a consequence we can see immediately that if $C(1) = 1$, the only player who initially plays cooperatively switches to $\beta$ at $t = 2$, and the system is immediately absorbed in $N$ agents playing $\beta$.⁴

⁴ In what follows, by “absorption of the system” we shall always mean in a state where all players play noncooperatively. Indeed, the only possible exception would be $C(1) = N$, a trivial case which we will not consider.

From the above consideration the main issue here emerges in a natural way. More explicitly, we want to determine the minimum positive number of players strictly less than $N$ (and we will see that it exists), such that if they start playing cooperatively, a “cooperative attitude” may still be observed asymptotically with positive probability. The main result of the work provides an answer to this question, and, as already anticipated, $C(1) = 2$ could suffice to prevent the alleged absorption.

Let $C(1) = 2$ and, for all $D \in (0, 1)$, define events $A_t(D)$ as

$$A_t(D) = \{C(1) = 2,\ C(2) = 2,\ \ldots,\ C(t - 1) = 2,\ C(t) = 0,\ C(t + 1) = 0,\ \ldots\},$$

with $t = 1, 2, \ldots$ These events represent absorption of the system precisely at time $t$, given the initial condition $C(1) = 2$. Therefore events $B_T(D)$ and $A(D)$, defined as

$$B_T(D) = \bigcup_t A_t(D), \qquad t = 1, 2, \ldots, T \tag{4}$$

$$A(D) = \bigcup_t A_t(D), \qquad t = 1, 2, \ldots, \tag{4a}$$

indicate, respectively, the event that absorption will occur no later than $T$, and that absorption will ever occur. Furthermore, if we let $S = \{\alpha, \beta\}$
then, for all $n \in N$, $S^T$ is the space of all possible sample paths, and

$$s(T) = \{s(n^-)_1,\ s(n^-)_2,\ \ldots,\ s(n^-)_T\}$$

represents the strategies chosen by $n^-$, up to time $T$. Finally, we assume $n$ and $m$ to be the only two players who initially play cooperatively, i.e., $s(n)_1 = \alpha = s(m)_1$; moreover, note that $P(A_1(D)) = 0 = P(B_1(D))$ and that

$$P\big(I(s(n^-)_{k+1} = \beta) = 1 \mid C(i) = 2;\ \forall i = 1, \ldots, k\big) = (N - 2)/(N - 1)$$

for all $D \in (0, 1)$ and $k = 1, 2, \ldots$, where the symbol $P(\cdot)$ stands for an objective probability. Since $n$ and $m$’s choices are exactly symmetric, in what follows we shall deal with only one of them (in particular with $n$). We are now ready to state two lemmata which provide some interesting preliminary characterizations of the model.

LEMMA 2 (Weak Monotonicity of $P(B_T(D))$). If $D^L$ and $D^U$ are such that $0 < D^L < D^U < 1$, then $P(B_T(D^L)) \le P(B_T(D^U))$, $\forall T = 2, 3, \ldots$

Proof. To study the probability of absorption, it is easier to evaluate the complementary probability. From (3) we observe that to calculate the probability of no absorption before or at $T$, we need to evaluate the probability of those sample paths $s(T)$ such that $p_t^{(n)}(\alpha \mid \alpha)$ is never strictly lower than $D$ strictly before $T$. Notice, however, that if $p_i^{(n)}(\alpha \mid \alpha) \ge D$, $\forall i = 1, \ldots, t$, then by Lemma 1, $p_t^{(n)}(\alpha \mid \alpha) = p_t^{(n^-)}(\alpha)$, so that what we really have to investigate is the behavior of $p_t^{(n^-)}(\alpha)$. Fix a $T$ and consider $D^U$. If absorption occurs before or at $T$, for all sample paths $s(T)$, the result follows immediately. Suppose instead that absorption can occur strictly after $T$; then there must exist at least one vector $s(T) \in S^T$, the components of which satisfy

$$p_t^{(n^-)}(\alpha) = \sum_i I\big(s(n^-)_i = \alpha\big)/t \ge D^U, \qquad i = 1, \ldots, t;\ t = 1, \ldots, T - 1,$$

for all such $t$. This of course implies that the above inequality holds true also for $D = D^L$, which means that every sample path preventing absorption when $D = D^U$ prevents it for $D = D^L$ as well. Therefore $P(B_T(D^U)^c) \le P(B_T(D^L)^c)$, where, for any $D \in (0, 1)$, $B_T(D)^c$ stands for the complement of $B_T(D)$,⁵ and the result follows. Q.E.D.

Weak Monotonicity of $P(B_T(D))$ is not really a surprising result in the model. We shall see that it nicely complements the main finding of the paper whenever its application is not trivial. However, before going to it we formulate the second preliminary lemma.

LEMMA 3. $P(B_T(D)) < 1$ for each finite $T = 2, 3, \ldots$ and for all $D \in (0, 1)$.
Proof. Fix $D \in (0, 1)$ and suppose there exists a $T$ such that $P(B_T(D)) = 1$, which implies that $P(A_t(D)) = 0$ for $t = T + 1, T + 2, \ldots$ However, consider the vector $s(k) \in S^k$, defined as

$$s(k) = \{s(n^-)_1 = \alpha,\ s(n^-)_2 = \alpha,\ \ldots,\ s(n^-)_T = \alpha,\ s(n^-)_{T+1} = \beta,\ \ldots,\ s(n^-)_k = \beta\},$$

where $k > T$ is the smallest positive integer such that

$$p_{k-1}^{(n^-)}(\alpha) = T/(k - 1) \ge D > T/k = p_k^{(n^-)}(\alpha).$$

Obviously, the existence of $k$ is guaranteed by the assumption $D > 0$. It is simple to check that $s(k)$ entails absorption of the system precisely at time $k + 1$, with strictly positive probability $(N - 2)^{k - T}/(N - 1)^k$. This contradicts the initial implication and proves the lemma. Q.E.D.

We now formulate the main result of the paper.

THEOREM. $P(A(D)) = 1$ for all $D \in (1/(N - 1), 1)$ and $P(A(D)) < 1$ for all $D \in (0, 1/(N - 1))$.

Proof. (a) Take $D = 0$. In this case the stochastic process $I(s(n^-)_t = \alpha)$ is made of i.i.d. Bernoulli variables with parameter $1/(N - 1)$; hence, by the SLLN the sequence of random variables $p_t^{(n^-)}(\alpha)$ converges to $1/(N - 1)$ (the expected value of $I(s(n^-)_t = \alpha)$) with probability one. Now consider a number $D^* \in (1/(N - 1), 1)$; then the event

$$A(D^*)^c = \{p_1^{(n^-)}(\alpha) \ge D^*,\ p_2^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_t^{(n^-)}(\alpha) \ge D^*,\ \ldots\},$$

where $A(D^*)^c$ is the complement of $A(D^*)$, has probability zero, and this proves the first part of the theorem.

(b) Take $D = 0$ and consider now a number $D^* \in (0, 1/(N - 1))$. We want to evaluate

$$P\big(p_t^{(n^-)}(\alpha) \ge D^*,\ p_{t-1}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_1^{(n^-)}(\alpha) \ge D^*\big). \tag{5}$$

By the SLLN there exists a $\bar{t}$ such that the event

$$\{p_t^{(n^-)}(\alpha) \ge D^*,\ p_{t-1}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_{\bar{t}+1}^{(n^-)}(\alpha) \ge D^*\}$$

is true for all $t > \bar{t}$ with probability one. Moreover, noticing that the sequence

$$\{s(n^-)_1 = \alpha,\ \ldots,\ s(n^-)_{\bar{t}} = \alpha\},$$

occurring with probability $1/(N - 1)^{\bar{t}}$, implies that

$$P\big(p_{\bar{t}}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_1^{(n^-)}(\alpha) \ge D^*\big) > 0,$$

it can immediately be concluded that for all $t > \bar{t}$, (5) is given by

$$
\begin{aligned}
& P\big(p_t^{(n^-)}(\alpha) \ge D^*,\ p_{t-1}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_{\bar{t}+1}^{(n^-)}(\alpha) \ge D^* \mid p_{\bar{t}}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_1^{(n^-)}(\alpha) \ge D^*\big) \\
& \qquad \times\ P\big(p_{\bar{t}}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_1^{(n^-)}(\alpha) \ge D^*\big) \\
& = P\big(p_t^{(n^-)}(\alpha) \ge D^*,\ p_{t-1}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_{\bar{t}+1}^{(n^-)}(\alpha) \ge D^*\big)\ P\big(p_{\bar{t}}^{(n^-)}(\alpha) \ge D^*,\ \ldots,\ p_1^{(n^-)}(\alpha) \ge D^*\big) > 0.
\end{aligned}
$$

Q.E.D.

⁵ For completeness one should say that the state space $\Omega$ which we are effectively dealing with is an $N!$-dimensional vector. Each element of $\Omega$ is a sequence of $N$ (matched) players, and so $\Omega^T$ has dimension $N!^T$. However, to avoid overburdening the exposition with too much notation in what follows, we shall not refer to the whole space explicitly but only to the relevant events.

The above result could be generalized to a situation in which $L < N$ players start playing cooperatively. In particular, we can now also investigate the conditions under which these $L$ initially cooperative players will all change strategy with probability one. Prior to this analysis, however, we observe that should $L > 2$ be an odd number, then, with certainty, one cooperative player will become defective at $t = 2$. Indeed, in this case, at $t = 1$ at least one of them will be matched to a defective player, and so she will change attitude from $t = 2$ onward. As a consequence, without loss of generality, in what follows we shall assume $L$ to be an even number of players.
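The dynamics behind the theorem can be explored numerically. The following is a minimal Monte Carlo sketch, assuming only what the text specifies: random pairing of consecutive draws, the estimator (1) with its case (ii), and the decision rule (3). All function and variable names (`simulate`, `survival_frequency`, `counts`, and so on) are hypothetical, and a finite horizon can only suggest, not establish, the asymptotic statement.

```python
import random

ALPHA, BETA = "alpha", "beta"  # cooperate / defect


def cond_prob_alpha(counts, own):
    """Estimate p_t(alpha | own) as in (1).

    counts[own][opp] is the number of past periods in which the player
    chose `own` and met an opponent choosing `opp`. If `own` was never
    played, case (ii) applies and the estimate is I(alpha = beta) = 0.
    """
    total = counts[own][ALPHA] + counts[own][BETA]
    return counts[own][ALPHA] / total if total else 0.0


def next_strategy(counts, B, D):
    """Rule (3): cooperate iff p(alpha|alpha) >= B * p(alpha|beta) + D."""
    lhs = cond_prob_alpha(counts, ALPHA)
    rhs = B * cond_prob_alpha(counts, BETA) + D
    return ALPHA if lhs >= rhs else BETA


def simulate(n_players, n_coop, B, D, horizon, rng):
    """Return [C(1), ..., C(horizon)], the number of cooperators per period."""
    if n_players % 2:
        raise ValueError("the model assumes an even number of players")
    strategy = [ALPHA] * n_coop + [BETA] * (n_players - n_coop)
    counts = [{ALPHA: {ALPHA: 0, BETA: 0}, BETA: {ALPHA: 0, BETA: 0}}
              for _ in range(n_players)]
    path = []
    for _ in range(horizon):
        path.append(strategy.count(ALPHA))
        # Sampling without replacement: first vs. second draw, third vs. fourth, ...
        order = list(range(n_players))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            counts[a][strategy[a]][strategy[b]] += 1
            counts[b][strategy[b]][strategy[a]] += 1
        # Every player updates her choice for the next period.
        strategy = [next_strategy(c, B, D) for c in counts]
    return path


def survival_frequency(n_players, n_coop, B, D, horizon, runs, seed=0):
    """Share of runs in which at least one cooperator is left at the horizon."""
    rng = random.Random(seed)
    alive = sum(simulate(n_players, n_coop, B, D, horizon, rng)[-1] > 0
                for _ in range(runs))
    return alive / runs
```

For instance, `survival_frequency(4, 2, B=1.0, D=0.2, horizon=200, runs=1000)` uses a threshold below $1/(N - 1) = 1/3$, while `D=0.5` lies above it; comparing the two frequencies gives a rough feel for the two regimes of the theorem. Floating-point comparison is used for simplicity, so exact ties in (3) may be sensitive to rounding.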
Let $n$ still be one of the $L$ initially cooperative players (with $n^-$ as her opponent), and again let $A(D)$ be the event that $n$ will start defecting from a certain time onward. Moreover, with a little notational abuse, $L$ will stand for both the cardinality and the set of initially cooperative individuals. Then, conditionally on $l \le L$ players acting cooperatively (with $n$ being one of the $l$ players) at date $i$, it is easy to verify that now the (objective) probability that $n$ meets a defective player is given by

$$P\big(I(s(n^-)_{k+1} = \beta) = 1 \mid C(i) = l;\ \forall i = 1, \ldots, k \text{ and } l = 1, \ldots, L\big) = (N - l)/(N - 1).$$

Finally, let $A_r(D)$ be the event that, among the players in $L - \{n\}$, exactly $r \le L - 1$ of them eventually switch to defection. Having set the notation to generalize the theorem, we are now in the position to state the following lemma.

LEMMA 4. $P(A(D)) = 1$ for all $D \in ((L - 1)/(N - 1), 1)$ and $P(A(D)) < 1$ for all $D \in (0, (L - 1)/(N - 1))$.

Proof. First of all notice that Lemma 1 holds in this case, too; namely, a cooperative player can become defective but not vice versa.

(a) Take $D \in ((L - 1)/(N - 1), 1)$. Since events $A_r(D)$ are disjoint,

$$P(A(D)) = \sum_r P\big(A(D) \cap A_r(D)\big) \qquad \text{with } r = 0, 1, \ldots, L - 1. \tag{$*$}$$

We claim that $P(A(D) \cap A_r(D)) = 0$, for $r < L - 1$, and $P(A(D) \cap A_{L-1}(D)) = 1$. Indeed, take first $(A(D) \cap A_r(D))$, with $r < L - 1$. Because from a certain point in time onward the probability for $n$ of being matched to a cooperative player, when exactly $r$ players have switched to defection, is $(L - r - 1)/(N - 1)$, following a reasoning analogous to the one put forward in the proof of the main theorem, we obtain that agent $n$, too, will eventually switch to defection with probability one. This being the case, however, it cannot be that only $r + 1$ players switch to defection. In fact, since, from a certain date onward, there would now be $L - r - 1$ cooperative players in the system, letting player $n^*$ be one of these players and iterating the above argument, we obtain that $n^*$ too switches to defection with probability one and, continuing along this line, that $P(A(D) \cap A_r(D)) = 0$ for all $r < L - 1$.
Finally consider event $(A(D) \cap A_{L-1}(D))$. Since, according to the above reasoning, $P(A_{L-1}(D)) = 1$, it follows that

$$P\big(A(D) \cap A_{L-1}(D)\big) = P\big(A(D) \mid A_{L-1}(D)\big)\, P\big(A_{L-1}(D)\big) = 1.$$

(b) Now take $D \in (0, (L - 1)/(N - 1))$. Without loss of generality, let $i = 1, \ldots, L$ be the $L$ initially cooperative players. Then consider the event

{the first $L$ players drawn at time $t$ are (in this precise order) given by the vector $\{1, 2, 3, \ldots, L\}$}

at each $t = 1, \ldots, \bar{t}$ with $\bar{t}$ sufficiently large. The above event clearly implies that all cooperative players will meet among themselves, up to time $\bar{t}$, with strictly positive probability $[(N - L)!/N!]^{\bar{t}}$. Moreover, in analogy to the case of two initially cooperative players, the SLLN implies that when $D = 0$ the random variable $p_t^{(n^-)}(\alpha)$, for all $n \in L$, converges to $(L - 1)/(N - 1)$ with probability one. Hence, sample paths in which all initially cooperative players do not switch to defection have strictly positive probability. Q.E.D.

The lemma suggests immediately how $P(A(D))$ behaves as $N$ gets large. Indeed from it we deduce that $(0, (L - 1)/(N - 1))$, the range of $D$ within which there could be positive probability of the initially cooperative players remaining as such, tends to the empty interval. Hence, the probability that all of the $L$ players will remain cooperative tends to zero over the whole range of $D$. Intuitively, as for the case of two players, $N$ should not be “too large” for a cooperative attitude to be preserved in all of the $L$ players, since otherwise the chance for cooperative individuals to meet each other would become too low and defection would take place with probability one.

The above consideration makes explicit the existence of a tradeoff between $D$ and $N$. From this point of view, an alternative useful way to look at the matter is by interpreting $(L - 1)/(N - 1)$ as a cutoff point (for $D$) for preserving cooperation with positive probability. More particularly, starting with $L \ge 2$ cooperative players, it is clear that the greatest $N$ preventing full switching to defection is the positive integer solving the following simple maximization problem:

$$\max N \quad \text{such that} \quad N \le (L - 1 + D)/D. \tag{6}$$
The upper bound to $N$ for cooperation to survive, namely $(L - 1 + \Delta)/\Delta$, is then a decreasing function of $\Delta$. This suggests, for example, that when payoffs $b$ and $c$ get close to each other (i.e., when $\Delta$ is near zero), the maximum $N$ preserving cooperation is very large, while when payoffs $b$ and $a$ are close (namely, $\Delta$ is next to one), $N$ and $L$ tend to coincide. In the
former case a sufficiently small $\Delta$ allows $N$ to be large, since the payoff structure represents a "weaker" incentive to defect, and a high number of defectors in the population would not "easily" induce a cooperator to change attitude. In the latter case, on the contrary, when $\Delta$ gets larger, $N$ must be small because now defection is attractive and this could more easily induce a change in a cooperator's attitude. Hence again, for cooperation to survive, there should be a low chance that cooperative individuals are matched with defective players.
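As a rough illustration of problem (6), the following Python sketch computes the cutoff $(L-1)/(N-1)$ and the largest population size compatible with a given threshold. The function names `cutoff` and `max_population` are hypothetical and not part of the paper; exact fractions are used only to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction


def cutoff(L: int, N: int) -> Fraction:
    """Cutoff (L - 1)/(N - 1) for the threshold parameter (hypothetical helper)."""
    if not 2 <= L <= N:
        raise ValueError("assumes 2 <= L <= N")
    return Fraction(L - 1, N - 1)


def max_population(L: int, delta: Fraction) -> int:
    """Largest integer N with N <= (L - 1 + delta)/delta, as in problem (6)."""
    if L < 2 or not 0 < delta < 1:
        raise ValueError("assumes L >= 2 and 0 < delta < 1")
    return math.floor((L - 1 + delta) / delta)


if __name__ == "__main__":
    # The bound shrinks as the threshold grows, here for L = 2 initial cooperators.
    for delta in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        print(delta, max_population(2, delta))
```

With $L = 2$, a threshold of 1/10 allows up to 11 players, while a threshold of 9/10 allows only 2, so that $N$ and $L$ coincide, in line with the discussion above.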
3. MUTATIONS

In evolutionary game theory an important part of the analysis is often played by mutations (noise), namely by the possibility that, at each date, with some exogenous positive probability, a player would switch strategy. As anticipated earlier, it is not a goal of this paper to undertake a full investigation of the matter. Nonetheless, it may be of interest to discuss briefly the fact that the introduction of noise in this model cannot in general be studied by means of the analytical techniques most widely used in the established literature (Kandori et al., 1993; Young, 1993, 1998; Fudenberg and Levine, 1998), notably via Markov chains.

When mutations are contemplated, the noise process superimposes on the underlying (in our case) stochastic decision process. Typically if, at each date, $\varepsilon > 0$ is the mutation probability of every agent, the analysis looks for the stochastically stable states of the system, namely those states that, as $\varepsilon \to 0$, receive (possibly) probability one by the limit of the (parametrized by $\varepsilon$) unique invariant distribution of the Markov chain. Since the learning rules introduced in this paper are of the fictitious play type, transition probabilities between states (defined by the number of cooperative players) are time-dependent, and, what is more, the resulting system is non-Markovian. Moreover, as the underlying stochastic decision process depends upon the "threshold" parameter $\Delta$, transition probabilities would also vary according to its value. Below we illustrate the above issues with the aid of a simple example.

Let $t$ still be the time index, with $t = 0, 1, 2, \ldots$; furthermore, to start with, suppose $C(0) = 2$. This means that at $t = 1$ two players will start playing cooperatively. However, still at $t = 1$, immediately after having played but before $t = 2$, players can all mutate with probability $\varepsilon > 0$ (independently of each other); this implies that $C(1)$ could actually be any number between 0 and $N$, with strictly positive probability. Finally, assume that once a player mutates, the value of her learning rule will be like that of an individual who would just start playing the new strategy, and that mutations at any date $t$ will follow the same pattern illustrated for mutations at $t = 1$.

3.1. Nonstationarity and Dependence on $\Delta$
To discuss the issue we calculate $P(C(1) = 0 \mid C(0) = 2)$ and $P(C(2) = 0 \mid C(1) = 2)$ and show that they may differ. First consider $\Delta > 1/2$ and notice that

$$P(C(1) = 0 \mid C(0) = 2) = \frac{1}{N-1}\left[\varepsilon^2 (1-\varepsilon)^{N-2} + (N-2)(1-\varepsilon)^{N}\right].$$

Indeed, from $P(C(0) = 2) = 1$, it follows that

$$P(C(1) = 0 \mid C(0) = 2) = P(C(1) = 0 \cap C(0) = 2)/P(C(0) = 2) = P(C(1) = 0 \cap C(0) = 2),$$

where the event $(C(1) = 0 \cap C(0) = 2)$ can occur iff either

{the two cooperative players meet at $t = 1$ and then mutate, while no defective player mutates}

or

{the two cooperative players do not meet at $t = 1$ and then no defective player mutates}

obtains, which explains the above probability. Following a similar reasoning, one can check that $P(C(2) = 0 \mid C(1) = 2) = P(C(2) = 0 \mid C(1) = 2 \cap C(0) = 2)$ is given by

$$\frac{[1/(N-1)]^{2}\left[f(\varepsilon) + (N-2)(1-\varepsilon)^{2N}\right]}{[1/(N-1)]\left[g(\varepsilon) + (1-\varepsilon)^{N}\right]},$$

where $f(\varepsilon)$ and $g(\varepsilon)$ are continuous functions such that $f(0) = 0 = g(0)$, which coincides with $P(C(1) = 0 \mid C(0) = 2)$, implying that transition probabilities, in this case, are time-independent.
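The one-step expression for $P(C(1) = 0 \mid C(0) = 2)$ given above for a threshold larger than 1/2 can be evaluated numerically. The Python sketch below simply adds up the two branches described in the text (the cooperators meet, or they do not); the function name and the argument `eps` for the mutation probability are hypothetical, and the code is a transcription of the stated formula rather than a simulation of the model.

```python
def p_no_cooperators_after_one_step(N: int, eps: float) -> float:
    """P(C(1) = 0 | C(0) = 2) for a threshold above 1/2, as stated in the text."""
    if N < 2 or not 0.0 <= eps <= 1.0:
        raise ValueError("assumes N >= 2 and 0 <= eps <= 1")
    p_meet = 1.0 / (N - 1)  # probability that the two cooperators are matched
    # They meet and stay cooperative, so both must mutate while no defector does.
    meet_branch = p_meet * eps**2 * (1.0 - eps) ** (N - 2)
    # They do not meet and switch to defection, so no player may mutate.
    miss_branch = (1.0 - p_meet) * (1.0 - eps) ** N
    return meet_branch + miss_branch


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        print(eps, p_no_cooperators_after_one_step(4, eps))
```

Setting `eps = 0` removes the first branch and leaves $(N-2)/(N-1)$, the probability that the two cooperators fail to meet.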
Suppose now $\Delta \le 1/2$; in this case $P(C(1) = 0 \mid C(0) = 2)$ and $P(C(1) = 2 \mid C(0) = 2)$ do not change with respect to the case $\Delta > 1/2$, but the two terms in the numerator of $P(C(2) = 0 \mid C(1) = 2)$ when $\Delta > 1/2$ are now replaced by the term $h(\varepsilon)$, where $h(\varepsilon)$ is a continuous function such that $h(0) = 0$. From this we observe that, in general, the transition probabilities of the system are nonstationary and dependent on $\Delta$. It is also worth noticing that here time dependence also characterizes the underlying stochastic decision process. Indeed, if in the expressions of the conditional probabilities we put $\varepsilon = 0$, we obtain

$$P(C(1) = 0 \mid C(0) = 2) = (N-2)/(N-1) \quad \text{and} \quad P(C(2) = 0 \mid C(1) = 2) = 0.$$

3.2. Non-Markovianity
To see this we now consider $C(0) = 0$ and show that $P(C(2) = 0 \mid C(1) = 2 \cap C(0) = 0)$ differs from $P(C(2) = 0 \mid C(1) = 2 \cap C(0) = 2)$. Indeed,

$$P(C(2) = 0 \mid C(1) = 2 \cap C(0) = 0) = \frac{1}{N-1}\,\varepsilon^{2}(1-\varepsilon)^{N-2} + \frac{N-2}{N-1}\,(1-\varepsilon)^{N},$$

which in general differs from $P(C(2) = 0 \mid C(1) = 2 \cap C(0) = 2)$, hence entailing the absence of the memoryless property, the distinguishing feature of Markovian processes.

The above example suggests that the process with mutation probability $\varepsilon > 0$ is considerably more articulated than the underlying stochastic decision process. This is essentially because of the very purpose of introducing noise, namely the possibility that at each date the number of cooperative players could be any integer between 0 and $N$ with positive probability, an impossible contingency when the decision process alone is considered.
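To make the last point concrete, consider only the mutation step, starting from a state with no cooperative players. Since every player switches strategy independently with the same probability, the number of cooperators right after mutation is binomially distributed, so every integer between 0 and $N$ has positive probability. The Python sketch below covers only this special case, not the full process; the function name and the argument `eps` are hypothetical.

```python
from math import comb


def cooperators_after_mutation(N: int, eps: float) -> list[float]:
    """Distribution of the number of cooperators right after one mutation step,
    assuming all N players were defectors just before mutating."""
    if N < 1 or not 0.0 < eps < 1.0:
        raise ValueError("assumes N >= 1 and 0 < eps < 1")
    # Exactly k of the N defectors switch to cooperation: Binomial(N, eps).
    return [comb(N, k) * eps**k * (1.0 - eps) ** (N - k) for k in range(N + 1)]


if __name__ == "__main__":
    for k, p in enumerate(cooperators_after_mutation(4, 0.1)):
        print(k, p)
```

Without noise, a population of defectors stays at zero cooperators forever, because the model does not allow defectors to switch back; with any positive mutation probability every entry of the list is strictly positive.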
4. CONCLUSIONS

This paper investigates the possibility that a cooperative attitude may "survive" within an infinitely repeated Prisoner's Dilemma. More specifically, the model aims at providing a relatively simple dynamic evolutionary framework within which to "test" whether a presumed (on the part of the players) correlation between agents' strategies could support cooperation in the long run. The main result states that just two players born cooperative (out of $N$) might be enough to prevent an almost sure absorption of the system in a state where all agents are noncooperative. Indeed, a presumed correlation may generate a statistical correlation, and the conjecture could then sustain cooperation with positive probability.
REFERENCES

Arthur, B. (1994). "Inductive Reasoning and Bounded Rationality," American Economic Review 84, 406–411.
Aumann, R. (1974). "Subjectivity and Correlation in Randomized Strategies," Journal of Mathematical Economics 1, 67–96.
Aumann, R. (1987). "Correlated Equilibrium as an Expression of Bayesian Rationality," Econometrica 55, 1–18.
Binmore, K. (1994). Game Theory and the Social Contract. Vol. I, Playing Fair. Cambridge, MA: MIT Press.
Binmore, K., and Samuelson, L. (1992). "Evolutionary Stability in Repeated Games Played by Finite Automata," Journal of Economic Theory 57, 278–305.
Brown, G. W. (1951). "Iterative Solution of Games by Fictitious Play," in Activity Analysis of Production and Allocation. New York: Wiley.
Cressman, R. (1996). "Evolutionary Stability in the Finitely Repeated Prisoner's Dilemma," Journal of Economic Theory 68, 234–248.
Ellison, G. (1993). "Learning, Local Interaction, and Coordination," Econometrica 61, 1047–1071.
Ellison, G. (1994). "Cooperation in the Prisoner's Dilemma with Anonymous Random Matching," Review of Economic Studies 61, 567–588.
Fudenberg, D., and Kreps, D. (1993). "Learning Mixed Equilibria," Games and Economic Behavior 5, 320–367.
Fudenberg, D., and Levine, D. (1998). The Theory of Learning in Games. Cambridge, MA: MIT Press.
Fudenberg, D., and Maskin, E. (1990). "Evolution and Cooperation in Noisy Repeated Games," American Economic Review 80, 274–279.
Fudenberg, D., and Tirole, J. (1991). Game Theory. Cambridge, MA: MIT Press.
Kandori, M., Mailath, G., and Rob, R. (1993). "Learning, Mutation, and Long Run Equilibria in Games," Econometrica 61, 29–56.
Kreps, D., Milgrom, P., Roberts, J., and Wilson, R. (1982). "Rational Cooperation in the Finitely Repeated Prisoner's Dilemma," Journal of Economic Theory 27, 245–252.
Mailath, G. (1992). "Introduction: Symposium on Evolutionary Game Theory," Journal of Economic Theory 57, 259–277.
Matsui, A. (1991). "Cheap Talk and Cooperation in Society," Journal of Economic Theory 54, 245–258.
Milgrom, P., and Roberts, J. (1991). "Adaptive and Sophisticated Learning in Normal Form Games," Games and Economic Behavior 3, 82–100.
Neyman, A. (1985). "Bounded Complexity Justifies Cooperation in the Finitely Repeated Prisoners' Dilemma," Economics Letters 19, 227–229.
Nozick, R. (1993). The Nature of Rationality. Princeton, NJ: Princeton Univ. Press.
Young, P. (1993). "The Evolution of Conventions," Econometrica 61, 57–83.
Young, P. (1998). Individual Strategy and Social Structure. Princeton, NJ: Princeton Univ. Press.
Zemel, E. (1989). "Small Talk and Cooperation: A Note on Bounded Rationality," Journal of Economic Theory 49, 1–9.