Available online at www.sciencedirect.com
Journal of Economic Theory 148 (2013) 409–417 www.elsevier.com/locate/jet
Notes
Imitating cooperation and the formation of long-term relationships
Heiner Schumacher ¹
Goethe-University Frankfurt, Germany
Received 9 December 2007; final version received 28 March 2012; accepted 16 July 2012
Available online 13 December 2012

Abstract
We study the infinitely repeated prisoner’s dilemma with the option to maintain or to quit relationships. We show that if agents imitate successful strategies infrequently, defection is not dynamically stable and cooperation emerges regardless of the initial distribution of strategies.
© 2012 Elsevier Inc. All rights reserved.
JEL classification: C70; C72
Keywords: Prisoner’s dilemma; Imitation; Random matching; Relationships
E-mail address: [email protected].
¹ I thank Christian Hellwig (the editor), a referee and an associate editor for superb advice; Chaim Fershtman, Guido Friebel, Michael Kosfeld, Jörg Oechssler, Ernst-Ludwig von Thadden, and the session participants of the European Winter Meeting of the Econometric Society for useful discussions. All remaining errors are my own.
0022-0531/$ – see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jet.2012.12.008

1. Introduction
This paper studies the infinitely repeated prisoner’s dilemma (PD) where agents have the option to maintain or to quit relationships. Evolution takes place via imitation. We show that if the imitation rate is sufficiently small, then defection is not dynamically stable, i.e., it is imitated only finitely often, regardless of the initial distribution of strategies. The intuition is that the population dynamics draw cooperators into relationships, while defectors are mostly paired up with other defectors. In our model, every agent from an infinite population plays with some opponent in each period. After observing the opponent’s action choice, an agent chooses whether to stay with that
opponent or not. If both agents maintain the relationship, they play the game together in the next period. Otherwise, they return to a pool of unmatched agents, from which new pairs are formed randomly at the beginning of the next period. In a given period, each agent is either a cooperator or a defector. A cooperator (defector) cooperates (defects) and maintains the relationship as long as the opponent cooperates. Thus, only cooperators maintain relationships. Agents may change their strategy through experimentation and imitation. In each period, with probability ε, an agent experiments and then chooses randomly whether to continue as cooperator or defector. Those who do not experiment imitate with probability μ and then adopt the strategy that currently obtains the highest average payoff, unless their current payoff is as large or even larger. We show that if μ is sufficiently small, and ε is sufficiently small relative to μ, then, after finitely many periods, imitating agents always become cooperators. Cooperation then spreads in the population. The intuition for this is as follows.
(1) Within the pool, the share of cooperators becomes small if μ and ε are small, and defectors then earn on average a payoff close to that of mutual defection. The reason is that cooperators sustain relationships when matched with each other (except when experimentation or imitation occurs), while defectors immediately go back to the pool.
(2) Within the population of cooperators, the share of unmatched cooperators becomes small. The reason is that matched cooperators will never imitate defection (because of (1)). So the flow of currently matched cooperators that go to the pool is solely due to experimentation. As a result, even when the population of cooperators is small (thanks to experimentation, though, it cannot remain smaller than ε/(2μ)), that flow remains small compared to the flow of unmatched cooperators that meet each other (and hence durably get out of the pool).
(3) Once most agents in the pool are defectors and most cooperators are in relationships, defectors start imitating cooperators, and cooperation quickly spreads: new cooperators quickly find each other and get out of the pool. Of course, experimentation breaks these ties, but when experimentation is small relative to imitation, cooperation spreads to the whole population.
The previous literature ² suggests that agents have to “start small” in order to implement cooperation in the infinitely repeated PD with the option to maintain or to quit relationships: at the beginning of a new relationship both agents defect and start to cooperate in later periods. Whenever an agent deviates from this path of play, the opponent quits the relationship. Thus, any gain from deviation is wiped out by the subsequent phase of low payoffs in the next relationship. In our model, “starting small” is not needed to establish large-scale cooperation. If μ is sufficiently small, and ε is sufficiently small relative to μ, cooperators end up in relationships, while defectors stay in the pool where they are mostly paired up with other defectors. Cooperators then earn on average more than defectors and hence cooperation spreads in the population through imitation.
Several papers consider the idea that defection is not a stable outcome in evolutionary models. Binmore and Samuelson [2] show this for the automaton selection game. There, strategies like “tit-for-tat” earn a higher average payoff than defection: “tit-for-tat” is almost never exploited by defection and earns the payoff from mutual cooperation when paired up with itself. Eshel et al. [5] analyze a spatial model where each agent imitates the strategy of successful neighbors. The local interaction structure allows cooperators to group together so that the benefits of cooperation are mostly enjoyed by cooperators.
It is shown that there will be a sizeable share of cooperators in the population, regardless of the initial distribution of strategies.
² See Datta [4], Fujiwara-Greve and Okuno-Fujiwara [6], Ghosh and Ray [7] and Kranton [9].
The reason why defection is not
dynamically stable for small $\mu$ in our framework is that cooperators can avoid interaction with defectors by building up relationships.
This paper is organized as follows: in the next section, we develop the model, state the main result and provide its proof. Section 3 briefly discusses robustness and extensions of the model.

2. Imitating cooperation
2.1. Model
Time is discrete and denoted by $t \in \{0, 1, 2, \ldots\}$. The population is a mass 1 continuum of agents. In each period, each agent plays the following PD with some opponent (rows: own action, columns: opponent's action):

$$\begin{array}{c|cc} & D & C \\ \hline D & 1,\,1 & H,\,0 \\ C & 0,\,H & G,\,G \end{array}$$

We set $G, H \in \mathbb{R}_+$ with $1 < G < H < 2G$, so that the sum of payoffs is maximal if both agents choose $C$. After observing their opponent’s action choice, agents have the option to maintain, $M$, or to quit, $Q$, the relationship. If both agents choose $M$, they play the PD against each other in the next period. We then say that these agents are in a relationship. If at least one agent chooses $Q$, then both return to the pool of unmatched agents from which new pairs are formed randomly in the next period. Therefore, an agent is in one of two states at the beginning of a period: either in a relationship or in the pool.
In each period, an agent is either a cooperator or a defector. A cooperator plays strategy $\sigma^C$: “choose $C$; if your opponent cooperated, choose $M$, otherwise choose $Q$”. A defector plays strategy $\sigma^D$: “choose $D$; if your opponent cooperated, choose $M$, otherwise choose $Q$”. Thus, only cooperators form relationships.
Let $y_C(t)$ be the share of cooperators, and $y_D(t)$ be the share of defectors at the beginning of period $t$. The share of matched cooperators is $y_C^m(t)$ and the share of unmatched cooperators is $y_C^u(t)$. The share of agents in the pool is given by $y^u(t) = y_D(t) + y_C^u(t)$. Within the pool, the share of cooperators is given by

$$s(t) = \frac{y_C^u(t)}{y^u(t)}, \tag{1}$$

and within the population of cooperators, the share of matched cooperators is given by

$$h(t) = \frac{y_C^m(t)}{y_C(t)}. \tag{2}$$

Let $\Delta^2$ be the standard 2-dimensional simplex. The distribution of states and strategies in period $t$ is given by $Y(t) \in \Delta^2$, where

$$Y(t) = \begin{pmatrix} y_C^m(t) \\ y_C^u(t) \end{pmatrix}. \tag{3}$$

The average payoff of defectors in period $t$ is

$$\bar U_D(t) = s(t)H + 1 - s(t). \tag{4}$$

The average payoff of unmatched cooperators in period $t$ is $s(t)G$. Matched cooperators earn $G$. Thus, the average payoff of cooperators in period $t$ is

$$\bar U_C(t) = h(t)G + \big(1 - h(t)\big)s(t)G. \tag{5}$$
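As a quick numerical illustration of definitions (1), (2) and payoffs (4), (5), the following Python sketch computes $s(t)$, $h(t)$ and the two average payoffs from a state $(y_C^m, y_C^u)$. The function name `average_payoffs` and the example values $G = 2$, $H = 3$ are assumptions chosen for illustration and do not appear in the paper; the guards for an empty pool or an empty cooperator population are likewise a convention of this sketch.

```python
def average_payoffs(y_cm, y_cu, G, H):
    """Return (s, h, U_D, U_C) for a state (y_C^m, y_C^u), following eqs. (1), (2), (4), (5)."""
    y_c = y_cm + y_cu              # share of cooperators
    y_d = 1.0 - y_c                # share of defectors (population has mass 1)
    pool = y_d + y_cu              # share of agents in the pool, y^u
    # Convention of this sketch: shares over an empty group are set to 0.
    s = y_cu / pool if pool > 0 else 0.0   # eq. (1): cooperators within the pool
    h = y_cm / y_c if y_c > 0 else 0.0     # eq. (2): matched among cooperators
    u_d = s * H + 1.0 - s                  # eq. (4)
    u_c = h * G + (1.0 - h) * s * G        # eq. (5)
    return s, h, u_d, u_c


if __name__ == "__main__":
    # Illustrative state: 40% matched cooperators, 10% unmatched cooperators.
    print(average_payoffs(0.4, 0.1, G=2.0, H=3.0))
```

In this illustrative state most cooperators are matched, so cooperators earn more on average than defectors, which is the situation in which defectors imitate cooperation.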
In each period, an agent experiments with probability $\varepsilon$ and then becomes (with equal chance) either a cooperator or a defector. An agent who does not experiment acts with probability $\mu$ according to the following rule:
Imitation rule. Switch to the strategy with the highest average payoff if and only if this payoff exceeds your current payoff (if $\sigma^C$ and $\sigma^D$ have the same average payoff, then choose $\sigma^D$). Otherwise, do not change your strategy.
The sequence of events in each period is as follows: (i) agents in the pool are paired up randomly; (ii) agents play the PD; (iii) payoffs are realized and agents observe the action choice of their opponent; (iv) with probability $\varepsilon$, an agent experiments; agents who do not experiment act with probability $\mu$ according to the imitation rule; (v) agents who experimented or who changed their strategy through imitation choose $Q$ ³; all other agents choose $Q$ or $M$ according to their strategy.
For given $Y(0)$, this defines a deterministic sequence $\{Y(t)\}_{t=1}^{\infty}$. The equations of motion are in Appendix A. For given $\mu$, a strategy $\sigma^i$, $i \in \{D, C\}$, is dynamically stable if there exists a $Y \in \Delta^2$ and an $\bar\varepsilon$ such that for $Y(0) = Y$ and $\varepsilon \leq \bar\varepsilon$ strategy $\sigma^i$ is imitated infinitely often. If a strategy is not dynamically stable for given $\mu$, it disappears for $\varepsilon \to 0$.

2.2. Main result
We show that $\sigma^D$ is not dynamically stable for small $\mu$. Consider first what happens if $\mu = \varepsilon = 0$, such that no agent ever changes strategy. Pick any $Y(0) \in \Delta^2$. Cooperators then end up in relationships, while defectors remain in the pool. As a result, $\bar U_D$ converges to 1, while $\bar U_C$ converges to $G$.
Now let there be imitation and experimentation. It then could be that cooperators, even those in relationships, want to imitate defection. In particular, this happens if within the pool the share of cooperators is large. However, when $\mu$ and $\varepsilon$ are sufficiently small, then after a while this cannot happen.
Unmatched cooperators either become defectors (through experimentation or imitation) or get matched to each other. Defectors and matched cooperators become unmatched cooperators only through experimentation and imitation. Hence, if $\mu$ and $\varepsilon$ are small, then within the pool the share of cooperators becomes small so that defectors earn a payoff close to 1. Matched cooperators will then never imitate defection.
It could also be that defectors never imitate cooperation. In particular, this happens if within the population of cooperators the share of matched cooperators remains small. However, we can show that if $\mu$ is sufficiently small and $\varepsilon$ small compared to $\mu$, defectors will imitate cooperation after finite time. As discussed above, matched cooperators will not imitate defection. Therefore, cooperative relationships are broken up only due to experimentation. The flow of currently matched cooperators that go to the pool in period $t$ is less than $2\varepsilon h(t) y_C(t)$, while the flow of unmatched cooperators that enter a relationship is at least $(1-\varepsilon)^2 (1-h(t))^2 y_C(t)^2$ (the square is because it takes two cooperators in the pool to meet). Besides, the whole population of cooperators remains at least comparable to $\frac{\varepsilon}{2\mu}$.⁴
³ This assumption is made for simplicity and does not affect our results.
⁴ Among agents who consider changing strategies (a fraction $\varepsilon + (1-\varepsilon)\mu$ do so in any period), the proportion that chooses cooperation is at least $\frac{\varepsilon/2}{\varepsilon + (1-\varepsilon)\mu}$, hence at least comparable to $\frac{\varepsilon}{2\mu}$ when $\varepsilon$ is small compared to $\mu$.
So for a given $h(t) < 1$, if $\mu$ is sufficiently small, we have

$$(1-\varepsilon)^2 \big(1-h(t)\big)^2 y_C(t)^2 \;\gtrsim\; (1-\varepsilon)^2 \big(1-h(t)\big)^2 \frac{\varepsilon}{2\mu}\, y_C(t) \;>\; 2\varepsilon h(t) y_C(t), \tag{6}$$
which implies that the share of matched cooperators grows in period $t$. Consequently, if $\mu$ is sufficiently small, then within the population of cooperators the share of matched cooperators becomes large enough so that after finitely many periods defectors imitate cooperation.
Finally, it could be that unmatched cooperators imitate defection in infinitely many periods even if in some periods defectors imitate cooperation. This cannot happen if $\varepsilon$ is sufficiently small relative to $\mu$. When defectors imitate cooperation in period $t$, the share of cooperators in relationships in period $t+2$ is at least comparable to $\mu^2$. Suppose that unmatched cooperators imitate defection in the following periods. If $\varepsilon$ is small relative to $\mu$, the share of unmatched cooperators decreases much faster (through imitation and the formation of relationships) than the share of matched cooperators (who break up the relationship only due to experimentation). There exists then a finite number $T$ so that in at least one period $\tau$ with $t < \tau \leq t + T + 2$ defectors again imitate cooperation. Consequently, defectors imitate cooperation at least once in each segment of $T+2$ periods so that the share of cooperators in relationships grows close to 1. Once this share is sufficiently large, defectors imitate cooperation in each period.

Theorem 1. If $\mu$ is sufficiently small, then $\sigma^D$ is not dynamically stable. In this case, we have $\lim_{\varepsilon \to 0} \liminf_{t \to \infty} y_C(t) = 1$.

2.3. Proof of Theorem 1
Pick a $\lambda > 0$ such that $G > 1 + \lambda$. From (4) we get that $\bar U_D(t) < 1 + \lambda$ if $s(t) < \frac{\lambda}{H-1}$, and from (5) we get that $\bar U_C(t) > 1 + \lambda$ if $h(t) > \frac{1+\lambda}{G}$. Define $\bar s = \frac{\lambda}{H-1}$ and $\bar h = \frac{1+\lambda}{G}$. We have $\bar U_C(t) > \bar U_D(t)$ when $s(t) < \bar s$ and $h(t) > \bar h$. Our goal is to prove that, after finite time, we have $s(t) < \bar s$ and $h(t) > \bar h$ in all periods $t$.
We derive a lower bound on the share of cooperators and defectors in the population. For each $t$ and $i \in \{D, C\}$ we have

$$y_i(t+1) > (1-\varepsilon)(1-\mu)\, y_i(t) + \tfrac{1}{2}\varepsilon. \tag{7}$$

Hence, there is a period, say $t_0$, such that in all periods $t \geq t_0$ we have

$$y_i(t) > \frac{\tfrac{1}{3}\varepsilon}{1 - (1-\varepsilon)(1-\mu)} \tag{8}$$

for $i \in \{D, C\}$. We will use this inequality later in the proof. Note that if $\varepsilon$ is sufficiently small relative to $\mu$, we cannot have $s(t) > \frac{G-1}{H-1}$ (so that all cooperators imitate $\sigma^D$) in all periods $t$. Therefore, we may assume that $s(0) \leq \frac{G-1}{H-1}$ and that the inequality in (8) holds in all periods $t$. The rest of the proof proceeds by steps.

Step 1. We show that if $\mu$ is sufficiently small and $\varepsilon \leq \mu$, then there is a period, say $t_1$, such that $s(t) < \bar s$ in all periods $t \geq t_1$. If $s(t) \leq \frac{G-1}{H-1}$, then we have

$$y_C^u(t+1) < \big(1 - s(t)\big) y_C^u(t) + \mu\, y_D(t) + \tfrac{3}{2}\varepsilon, \tag{9}$$

$$y^u(t+1) > \big(1 - s(t)\big) y_C^u(t) + y_D(t). \tag{10}$$
We divide both inequalities by $y^u(t)$ and use (8), $y^u(t) \geq y_D(t)$, and $\varepsilon \leq \mu$ to obtain $\frac{y_C^u(t+1)}{y^u(t)} < \big(1 - s(t)\big)s(t) + 10\mu$ and $\frac{y^u(t+1)}{y^u(t)} > 1 - s(t)^2$. We therefore have

$$s(t+1) = \frac{y_C^u(t+1)}{y^u(t)} \cdot \frac{y^u(t)}{y^u(t+1)} < \frac{s(t)}{1 + s(t)} + \frac{10\mu}{1 - s(t)^2}. \tag{11}$$
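The right-hand side of (11) can be explored numerically. The Python sketch below iterates this upper bound starting from $s = \frac{G-1}{H-1}$, with $\mu$ set to half of the threshold $\frac{\bar s^2}{40}\frac{H-G}{H-1}$ used in Step 1. The names `step1_bound` and `periods_until_below` and the values $G = 2$, $H = 3$, $\lambda = 0.5$ are illustrative assumptions, not taken from the paper. Because the bound is increasing in $s$ on $[0, 1)$, its iterates dominate any path that satisfies (11); this is only an illustration of the contraction, not a substitute for the argument in the proof.

```python
def step1_bound(s, mu):
    """Right-hand side of (11): an upper bound on s(t+1) given s(t) = s."""
    return s / (1.0 + s) + 10.0 * mu / (1.0 - s * s)


def periods_until_below(s0, s_bar, mu, max_iter=10_000):
    """Iterate the bound from s0; return the first iteration index at which
    the iterate is below s_bar, or None if that never happens within max_iter."""
    s = s0
    for t in range(max_iter):
        if s < s_bar:
            return t
        s = step1_bound(s, mu)
    return None


# Illustrative parameters (assumptions of this sketch).
G, H, lam = 2.0, 3.0, 0.5
s_bar = lam / (H - 1.0)                              # \bar{s} = lambda / (H - 1)
mu_max = s_bar ** 2 / 40.0 * (H - G) / (H - 1.0)     # threshold on mu from Step 1
mu = 0.5 * mu_max                                    # strictly below the threshold
s0 = (G - 1.0) / (H - 1.0)                           # largest s(t) for which (9), (10) apply

if __name__ == "__main__":
    print("s_bar =", s_bar, "mu =", mu)
    print("iterations until the bound is below s_bar:", periods_until_below(s0, s_bar, mu))
```

For these parameters the bound falls below $\bar s$ after a handful of iterations, consistent with the claim that $s(t) < \bar s$ holds from some period $t_1$ onward.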
Observe from (11) that $s(t+1) < s(t)$ if $\mu$ is sufficiently small for given $s(t)$. Hence, by using (11), we can show that if $\mu < \frac{\bar s^2}{40}\frac{H-G}{H-1}$, then (i) $s(t) \in [0, \frac{1}{2}\bar s]$ implies $s(t+1) < \bar s$, and (ii) $s(t) \in (\frac{1}{2}\bar s, \frac{G-1}{H-1}]$ implies $s(t+1) + \xi < s(t)$ for some small $\xi > 0$ (see Appendix A for details). Taken together, statements (i) and (ii) imply the result. In the following, we may assume that $s(t) < \bar s$ in all periods $t$.

Step 2. We show that if $\mu$ is sufficiently small and $\varepsilon \leq \mu$, then in some period, say $t_2$, we have $h(t_2) > \bar h$ so that defectors imitate cooperation in period $t_2$. Assume by contradiction that $h(t) \leq \bar h$ in all periods $t$. We have

$$y_C^m(t+1) = (1-\varepsilon)^2 y_C^m(t) + s(t)(1-\varepsilon)^2 y_C^u(t). \tag{12}$$
By using this equation as well as the inequalities $s(t) > y_C^u(t)$ and (8) we obtain

$$\begin{aligned} y_C^m(t+1) - y_C^m(t) &> -2\varepsilon\, y_C^m(t) + (1-\varepsilon)^2 \big(y_C^u(t)\big)^2 \\ &\geq -2\varepsilon \bar h\, y_C(t) + (1-\varepsilon)^2 (1-\bar h)^2 \big(y_C(t)\big)^2 \\ &\geq y_C(t)\,\varepsilon \left[ -2\bar h + (1-\varepsilon)^2 (1-\bar h)^2 \frac{\tfrac{1}{3}}{1 - (1-\varepsilon)(1-\mu)} \right]. \end{aligned} \tag{13}$$

The term in the last line is positive if $\mu < \frac{(1-\bar h)^2}{2 + 12\bar h}$ and $\varepsilon \leq \mu$. Hence, if $\mu$ is sufficiently small and $\varepsilon \leq \mu$, then $h(t) \leq \bar h$ in all periods $t$ implies $y_C^m(t) \to \infty$, a contradiction. In the following, we may assume that $h(0) > \bar h$.

Step 3. Choose $T \in \mathbb{N}$ large enough such that $\frac{(1-\mu)^T}{\mu^2 (1-\bar s)^2} < \frac{1-\bar h}{2\bar h}$. We show that if $\varepsilon$ is sufficiently small relative to $\mu$, then $\bar U_C(t) > \bar U_D(t)$ implies that $\bar U_C(\tau) > \bar U_D(\tau)$ in at least one period $\tau \in \{t+1, \ldots, t+T+2\}$, and so agents regularly imitate $\sigma^C$. Assume by contradiction that this is not true. If $\bar U_C(t) > \bar U_D(t)$, we have $s(t+1) \geq y_C^u(t+1) > \mu(1-\varepsilon)(1-\bar s)\big(1 - y_C^m(t)\big)$, and therefore $y_C^m(t+2) > \mu^2 (1-\varepsilon)^4 (1-\bar s)^2$. Since $\bar U_C(\tau) \leq \bar U_D(\tau)$ in all periods $\tau \in \{t+1, \ldots, t+T+2\}$, we have

$$y_C^u(t+T+2) < (1-\mu)^T + \tfrac{3}{2} T \varepsilon, \tag{14}$$

$$y_C^m(t+T+2) > \mu^2 (1-\varepsilon)^{2(T+2)} (1-\bar s)^2. \tag{15}$$

Note that $\frac{y_C^u(t+T+2)}{y_C^m(t+T+2)} < \frac{1-\bar h}{\bar h}$ is equivalent to $h(t+T+2) > \bar h$ and $\frac{y_C^u(t+T+2)}{y_C^m(t+T+2)} < \frac{(1-\mu)^T}{\mu^2 (1-\bar s)^2}$ if we set $\varepsilon = 0$. Hence, by the choice of $T$, if $\varepsilon$ is sufficiently small relative to $\mu$, then $h(t+T+2) > \bar h$, which implies $\bar U_C(t+T+2) > \bar U_D(t+T+2)$, a contradiction.

Step 4. We are now ready to complete the proof of Theorem 1. If defectors imitate cooperation at least once in every segment of $T+2$ periods, and if $\varepsilon$ is sufficiently small relative to $\mu$, then the share of matched cooperators becomes large (since $T$ is independent of $\varepsilon$) so that there is a period, say $t_3$, where $y_C^m(t_3) \geq \frac{2}{3} + \frac{1}{3}\bar h$. Note that $y_C^m(t) > \bar h$ implies $h(t) > \bar h$ and therefore $\bar U_C(t) > \bar U_D(t)$. It remains to make sure that the share of matched cooperators exceeds $\bar h$ in all
periods $t \geq t_3$. This follows from two observations: (i) We have $y_C^m(t+1) \geq (1-\varepsilon)^2 y_C^m(t)$. Hence, if $\varepsilon$ is sufficiently small, then $y_C^m(t) \in [\frac{2}{3} + \frac{1}{3}\bar h, 1]$ implies $y_C^m(t+1) > \frac{1}{3} + \frac{2}{3}\bar h$, and $y_C^m(t) \in [\frac{1}{3} + \frac{2}{3}\bar h, \frac{2}{3} + \frac{1}{3}\bar h)$ implies $y_C^m(t+1) > \bar h$; (ii) If $y_C^m(t) \in (\bar h, \frac{2}{3} + \frac{1}{3}\bar h)$, the share of unmatched cooperators in period $t+1$ is at least $(1-\varepsilon)(1-\bar s)\mu\big(1 - \frac{2}{3} - \frac{1}{3}\bar h\big)$. Hence, if $\varepsilon$ is sufficiently small relative to $\mu$, then $y_C^m(t), y_C^m(t+1) \in (\bar h, \frac{2}{3} + \frac{1}{3}\bar h)$ implies $y_C^m(t+2) > y_C^m(t+1)$. Taken together, observations (i) and (ii) imply that if $\varepsilon$ is sufficiently small relative to $\mu$, the share of matched cooperators exceeds $\bar h$ in all periods $t \geq t_3$. In all periods $t \geq t_3$, we then have $s(t) < \bar s$ and

$$y_D(t+1) < \big(1 - s(t)\big)(1-\mu)\, y_D(t) + s(t)\, y_D(t) + \tfrac{1}{2}\varepsilon \tag{16}$$

so that $\liminf_{t \to \infty} y_C(t) \geq 1 - \frac{\varepsilon}{2(1-\bar s)\mu}$, which completes the proof.

3. Discussion
The essential feature of our model is that cooperators in relationships do not imitate $\sigma^D$ when defectors earn on average more than cooperators, but less than cooperators in relationships. Relationships are then only broken up through experimentation, so that the share of matched cooperators relative to all cooperators becomes large even if agents continuously imitate defection. Our result may also hold for other imitation rules,⁵ but not for the simplest one: “Always switch to the strategy with the highest average payoff”. Suppose that agents act according to this rule. If $s(0)$ is small and $h(0)$ is small enough such that $\bar U_C(0) < \bar U_D(0)$, few cooperators enter a relationship, while a constant share of relationships is broken up through imitation. Thus, $s$ and $h$ remain small such that $\bar U_C(t) < \bar U_D(t)$ in all periods $t$. Defection is then dynamically stable even for small $\mu$.
It is not straightforward to extend our analysis to a framework with more strategies.
For example, assume that we add the “starting small” strategy $\sigma^{DC}$ to our model, where $\sigma^{DC}$ is “choose $D$ and $M$ whenever you meet a new opponent; in subsequent periods, choose $C$; if your opponent cooperated, choose $M$, otherwise choose $Q$”. Note that $\sigma^C$ and $\sigma^{DC}$ earn $G$ when paired with themselves (except in the first period of a relationship). Whether $\sigma^C$ earns on average more or less than $\sigma^{DC}$ is therefore very sensitive to the distribution of strategies in the pool. Strategy $\sigma^{DC}$ has the advantage that it cannot be exploited by $\sigma^D$ in the pool. Nevertheless, if most cooperators are in relationships, while most “starting small” players are in the pool, then $\sigma^C$ earns on average nearly $G$, while $\sigma^{DC}$ earns on average an amount close to 1. Hence, both $\sigma^C$ and $\sigma^{DC}$ may be dynamically stable for given $\mu$.
Further research may focus on two dimensions. First, for many applications it is plausible that agents do not only care about the realized action profile, but also about the match value (as, for example, in Jackson and Watts [8]). If the match value between two agents is especially high, it may be rational for them to maintain a relationship even if both agents continuously defect. In this case, the population dynamics can be quite different. Second, a generalization of the informational setting is desirable. We assumed that agents have information about the average payoff of each strategy. Alternatively, agents may base their decisions upon a limited number of observations that are obtained through a “reference network” (as in Cartwright [3]). The model can be generalized by combining the ideas of a reference network, where agents exchange information, and a pool, where agents meet new opponents.
⁵ For an overview, see Apesteguia et al. [1] and Schlag [10].
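Appendix A

Equations of motion. If $\bar U_D(t) > G \geq \bar U_C(t)$, we have

$$y_D(t+1) = (1-\varepsilon)\, y_D(t) + (1-\varepsilon)\mu\, y_C(t) + \tfrac{1}{2}\varepsilon, \tag{17}$$

$$y_C^m(t+1) = (1-\varepsilon)^2 (1-\mu)^2 y_C^m(t) + (1-\varepsilon)^2 (1-\mu)^2 s(t)\, y_C^u(t), \tag{18}$$

$$\begin{aligned} y_C^u(t+1) ={}& \big(1 - s(t)\big)(1-\varepsilon)(1-\mu)\, y_C^u(t) + s(t)(1-\varepsilon)(1-\mu)\big[\varepsilon + (1-\varepsilon)\mu\big] y_C^u(t) \\ &+ (1-\varepsilon)(1-\mu)\big[\varepsilon + (1-\varepsilon)\mu\big] y_C^m(t) + \tfrac{1}{2}\varepsilon. \end{aligned} \tag{19}$$

If $G \geq \bar U_D(t) \geq \bar U_C(t)$, we have

$$y_D(t+1) = (1-\varepsilon)\, y_D(t) + \big(1 - s(t)\big)(1-\varepsilon)\mu\, y_C^u(t) + \tfrac{1}{2}\varepsilon, \tag{20}$$

$$y_C^m(t+1) = (1-\varepsilon)^2 y_C^m(t) + s(t)(1-\varepsilon)^2 y_C^u(t), \tag{21}$$

$$y_C^u(t+1) = \big(1 - s(t)\big)(1-\varepsilon)(1-\mu)\, y_C^u(t) + s(t)\,\varepsilon(1-\varepsilon)\, y_C^u(t) + \varepsilon(1-\varepsilon)\, y_C^m(t) + \tfrac{1}{2}\varepsilon. \tag{22}$$

If $G \geq \bar U_C(t) > \bar U_D(t)$, we have

$$y_D(t+1) = \big(1 - s(t)\big)(1-\varepsilon)(1-\mu)\, y_D(t) + s(t)(1-\varepsilon)\, y_D(t) + \tfrac{1}{2}\varepsilon, \tag{23}$$

$$y_C^m(t+1) = (1-\varepsilon)^2 y_C^m(t) + s(t)(1-\varepsilon)^2 y_C^u(t), \tag{24}$$

$$\begin{aligned} y_C^u(t+1) ={}& \big(1 - s(t)\big)(1-\varepsilon)\, y_C^u(t) + s(t)\,\varepsilon(1-\varepsilon)\, y_C^u(t) + \varepsilon(1-\varepsilon)\, y_C^m(t) \\ &+ \big(1 - s(t)\big)(1-\varepsilon)\mu\, y_D(t) + \tfrac{1}{2}\varepsilon. \end{aligned} \tag{25}$$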
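The equations of motion can be iterated directly. The following Python sketch implements the three regimes (17)–(19), (20)–(22) and (23)–(25) as a single update and runs it from a chosen initial state. The names `step` and `simulate`, the parameter values, and the conventions $s = 0$ for an empty pool and $h = 0$ when there are no cooperators are assumptions of this sketch, not part of the paper. No particular outcome is claimed here; the code is only meant as a tool for exploring how $y_C(t)$ evolves when $\mu$ is small and $\varepsilon$ is small relative to $\mu$.

```python
def step(y_d, y_cm, y_cu, G, H, mu, eps):
    """One period of the dynamics: returns (y_D, y_C^m, y_C^u) at t + 1."""
    y_c = y_cm + y_cu
    pool = y_d + y_cu
    s = y_cu / pool if pool > 0 else 0.0      # eq. (1); 0 by convention if pool is empty
    h = y_cm / y_c if y_c > 0 else 0.0        # eq. (2); 0 by convention if no cooperators
    u_d = s * H + 1.0 - s                     # eq. (4)
    u_c = h * G + (1.0 - h) * s * G           # eq. (5)
    keep = 1.0 - eps                          # probability of not experimenting

    if u_d > G:
        # All cooperators, matched or not, may imitate defection: eqs. (17)-(19).
        stay = keep * (1.0 - mu)              # neither experiments nor imitates
        change = eps + keep * mu              # experiments or imitates
        new_d = keep * y_d + keep * mu * y_c + 0.5 * eps
        new_cm = stay ** 2 * (y_cm + s * y_cu)
        new_cu = ((1.0 - s) * stay * y_cu + s * stay * change * y_cu
                  + stay * change * y_cm + 0.5 * eps)
    elif u_d >= u_c:
        # Only cooperators who met a defector may imitate defection: eqs. (20)-(22).
        new_d = keep * y_d + (1.0 - s) * keep * mu * y_cu + 0.5 * eps
        new_cm = keep ** 2 * (y_cm + s * y_cu)
        new_cu = ((1.0 - s) * keep * (1.0 - mu) * y_cu + s * eps * keep * y_cu
                  + eps * keep * y_cm + 0.5 * eps)
    else:
        # Defectors who met a defector may imitate cooperation: eqs. (23)-(25).
        new_d = (1.0 - s) * keep * (1.0 - mu) * y_d + s * keep * y_d + 0.5 * eps
        new_cm = keep ** 2 * (y_cm + s * y_cu)
        new_cu = ((1.0 - s) * keep * y_cu + s * eps * keep * y_cu
                  + eps * keep * y_cm + (1.0 - s) * keep * mu * y_d + 0.5 * eps)
    return new_d, new_cm, new_cu


def simulate(y_d, y_cm, y_cu, G, H, mu, eps, periods):
    """Iterate `step` and return the list of states, including the initial one."""
    path = [(y_d, y_cm, y_cu)]
    for _ in range(periods):
        y_d, y_cm, y_cu = step(y_d, y_cm, y_cu, G, H, mu, eps)
        path.append((y_d, y_cm, y_cu))
    return path


if __name__ == "__main__":
    # Illustrative run starting from a population of defectors only.
    path = simulate(1.0, 0.0, 0.0, G=2.0, H=3.0, mu=0.01, eps=0.0001, periods=5000)
    y_d, y_cm, y_cu = path[-1]
    print("final share of cooperators:", y_cm + y_cu)
```

One useful sanity check is that the three shares sum to 1 after every update in each regime; this follows algebraically from the equations above and can be verified on the simulated path.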
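Omitted details from Step 1. We prove statement (i). If $s(t) \leq \frac{1}{2}\bar s$, then

$$\frac{s(t)}{1 + s(t)} + \frac{10\mu}{1 - s(t)^2} \leq \frac{\frac{1}{2}\bar s}{1 + \frac{1}{2}\bar s} + \frac{10\mu}{1 - \frac{1}{4}\bar s^2}. \tag{26}$$

Note that

$$\frac{\frac{1}{2}\bar s}{1 + \frac{1}{2}\bar s} + \frac{10\mu}{1 - \frac{1}{4}\bar s^2} < \bar s \;\Leftrightarrow\; \frac{10\mu}{1 - \frac{1}{4}\bar s^2} < \frac{\bar s}{2} \cdot \frac{1 + \bar s}{1 + \frac{1}{2}\bar s} \;\Leftrightarrow\; 10\mu < \frac{\bar s}{2}(1 + \bar s)\Big(1 - \frac{1}{2}\bar s\Big). \tag{27}$$

This inequality is implied by $\mu < \frac{\bar s}{40}$. Statement (i) then follows from the fact that

$$\mu < \frac{\bar s^2}{40} \cdot \frac{H-G}{H-1} < \frac{\bar s}{40}. \tag{28}$$

We prove statement (ii). Note that

$$\frac{s(t)}{1 + s(t)} + \frac{10\mu}{1 - s(t)^2} < s(t) \;\Leftrightarrow\; \frac{10\mu}{1 - s(t)^2} < \frac{s(t)^2}{1 + s(t)} \;\Leftrightarrow\; 10\mu < s(t)^2 \big(1 - s(t)\big). \tag{29}$$

When $s(t) \in (\frac{1}{2}\bar s, \frac{G-1}{H-1}]$, this inequality is implied by

$$\mu < \frac{\bar s^2}{40}\Big(1 - \frac{G-1}{H-1}\Big) = \frac{\bar s^2}{40} \cdot \frac{H-G}{H-1}, \tag{30}$$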
which yields statement (ii).

References
[1] J. Apesteguia, S. Huck, J. Oechssler, Imitation – Theory and experimental evidence, J. Econ. Theory 136 (2007) 217–235.
[2] K. Binmore, L. Samuelson, Evolutionary stability in repeated games played by finite automata, J. Econ. Theory 57 (1992) 278–305.
[3] E. Cartwright, Imitation, coordination and the emergence of Nash equilibrium, Int. J. Game Theory 36 (2007) 119–135.
[4] S. Datta, Building Trust, STICERD – Theoretical Economics Paper Series, London School of Economics, 1996.
[5] I. Eshel, L. Samuelson, A. Shaked, Altruists, egoists, and hooligans in a local interaction model, Amer. Econ. Rev. 88 (1998) 157–179.
[6] T. Fujiwara-Greve, M. Okuno-Fujiwara, Voluntarily separable prisoner’s dilemma, Rev. Econ. Stud. 76 (2009) 993–1021.
[7] P. Ghosh, D. Ray, Cooperation in community interaction without information flows, Rev. Econ. Stud. 63 (1996) 491–519.
[8] M. Jackson, A. Watts, Equilibrium existence in bipartite social games: A generalization of stable matchings, Econ. Bull. 12 (2008) 1–8.
[9] R. Kranton, The formation of cooperative relationships, J. Law, Econ., Organ. 12 (1996) 214–233.
[10] K. Schlag, Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits, J. Econ. Theory 78 (1998) 130–156.