Memory does not necessarily promote cooperation in dilemma games


Physica A 395 (2014) 218–227


Tao Wang a,b,∗, Zhigang Chen b,c, Kenli Li a, Xiaoheng Deng b, Deng Li b

a College of Information Science and Engineering, Hunan University, Changsha 410082, China
b School of Information Science and Engineering, Central South University, Changsha 410083, China
c School of Software, Central South University, Changsha 410083, China

Highlights

• Memory increases cooperation in the prisoner's dilemma.
• Memory inhibits cooperation in the snowdrift and stag hunt games when the cost/benefit ratio is small.
• Cooperation is analyzed in terms of R, ST, and P reciprocity.
• Cooperation is analyzed for 16 strategies.

Article info

Article history: Received 14 October 2012; received in revised form 23 August 2013; available online 15 October 2013.
Keywords: Updating rules; Tit for tat; Win-stay, lose-shift; Synchronous mode

Abstract

Evolutionary games can model dilemmas for which cooperation can exist in rational populations. According to intuition, memory of the game history can help individuals to overcome the dilemma and increase cooperation. However, here we show that no such general prediction can be made for dilemma games with memory. Agents play repeated prisoner's dilemma, snowdrift, or stag hunt games in well-mixed populations or on a lattice. We compare the cooperation ratio and fitness for systems with and without memory. An interesting result is that cooperation is demoted in the snowdrift and stag hunt games with memory when the cost-to-benefit ratio is low, while system fitness still increases with memory in the snowdrift game. To illustrate this phenomenon, two further experiments were performed to study R, ST, and P reciprocity and to investigate the 16 agent strategies for one-step memory. The results show that memory plays different roles in different dilemma games.

1. Introduction

Cooperation is needed for the evolution of new levels of organization. Genomes, organisms, social insects, and human society are all based on cooperation [1]. Thus, understanding the emergence and persistence of cooperative behavior among rational individuals is important. Evolutionary game theory provides a suitable theoretical framework for addressing the subtleties of cooperation in dilemmas [2–4]. Seminal work by Nowak and May in 1992 showed that cooperation can exist on a lattice in the prisoner's dilemma game [5]. However, the Nash equilibrium of the prisoner's dilemma is defection, and cooperation cannot exist in a well-mixed population. Since then, a large number of studies have searched for mechanisms that promote cooperative behavior [6–8].

There are three types of two-strategy, two-player dilemma game: the prisoner's dilemma (PD) game, the snowdrift (SD) game, and the stag hunt (SH) game. All of these games can model the dilemma of how cooperation can be maintained by rational individuals.



Corresponding author at: College of Information Science and Engineering, Hunan University, Changsha, 410082, China. Tel.: +86 13787001221. E-mail addresses: [email protected], [email protected] (T. Wang).

© 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.physa.2013.10.014


Table 1
Coding examples for strategies for m = 1.

Game history for the two players   Payoff for the first player   Next step for the first player
                                                                 ALLC   ALLD   TFT   WSLS
00 (defect, defect)                P                             1      0      0     1
01 (defect, cooperate)             T                             1      0      1     0
10 (cooperate, defect)             S                             1      0      0     0
11 (cooperate, cooperate)          R                             1      0      1     1
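The coding in Table 1 can be made concrete with a short sketch (in Python; this example is ours, not part of the paper, and the helper name next_move is illustrative). A 4-bit strategy, read with the high bit on the left, stores the next move for each of the four one-step histories and reproduces the ALLC, ALLD, TFT, and WSLS columns of Table 1.

```python
# Sketch of the one-step-memory strategy coding of Table 1 (our illustration).
# A strategy is a 4-bit string; the leftmost bit answers history 11 and the
# rightmost bit answers history 00, with 1 = cooperate and 0 = defect.

STRATEGIES = {"ALLC": "1111", "ALLD": "0000", "TFT": "1010", "WSLS": "1001"}

def next_move(strategy: str, my_last: int, opp_last: int) -> int:
    """Return the next move (1 = C, 0 = D) given both players' last moves."""
    history = 2 * my_last + opp_last        # 0, 1, 2, 3 encode histories 00, 01, 10, 11
    return int(strategy[3 - history])       # leftmost bit corresponds to history 11

if __name__ == "__main__":
    for name, code in STRATEGIES.items():
        # Rows in the same order as Table 1: histories 00, 01, 10, 11
        column = [next_move(code, h // 2, h % 2) for h in range(4)]
        print(name, code, column)           # e.g. WSLS 1001 -> [1, 0, 0, 1]
```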

Landmark work by Axelrod and Hamilton in the 1980s involved computer tournaments to find the best strategy in a population PD game [9,10]. A tit-for-tat (TFT) strategy was identified as the best. In 1987, Axelrod coded game history and strategy as binary strings and applied a genetic algorithm to study the PD game when agents have memory of more than one step [11]. He found that some patterns in the strategy code are similar to TFT. However, in 1993 Nowak demonstrated that a win-stay, lose-shift (WSLS) strategy outperforms TFT, and WSLS evolved as the dominant strategy in simulations [12]. Both TFT and WSLS are based on memory. Several groups have studied PD games with memory for different network structures [13–16]. Strategy selection has also been widely investigated [17–23]. Some other memory mechanisms have also been evaluated. Studies of the PD game with accumulated payoffs revealed that memory can boost cooperation [24–27]. Wang et al. investigated the SD game with accumulated strategies (cooperation or defection) and found some non-monotonic phenomena [28].

Although many researchers have studied memory and cooperation, challenges still remain. Most research has focused on comparing different strategies or studying cooperation in a structured network with different memory mechanisms. Few studies have considered cooperation in a completely connected network with memory. It has been widely shown that network structure has an impact on cooperation [6–8]. However, the effect of memory for PD, SD, and SH games in a well-mixed population is still unclear. Here we investigate evolutionary dilemma games with a memory mechanism, whereby agents base their decisions on the game history, in a completely connected network.

In a two-person, two-strategy game, there are three types of reciprocity, as indicated by the payoff matrix $\begin{pmatrix} R & S \\ T & P \end{pmatrix}$. For R reciprocity, the two players cooperate and both obtain payoff R. For ST reciprocity, one player cooperates and the other defects; the cooperator obtains S and the defector gets T. For P reciprocity, both players defect and obtain payoff P. The cooperation ratio for a population thus comprises two parts: all of the R reciprocity and half of the ST reciprocity. Reciprocity has been studied by various researchers and can help in understanding cooperation [29–31]. When agents make decisions according to history, the strategies they apply (such as TFT or WSLS) determine the cooperation ratio for the system [9–12]. Thus, the strategy distribution for a population can be used to explain how cooperation arises and is maintained.

The remainder of the paper is organized as follows. Section 2 describes our model of an evolutionary game with a memory mechanism. Section 3 presents our simulations and analysis. First, we investigate the cooperation ratio and average fitness for PD, SD, and SH games with differing memory length in a well-mixed population or on a lattice network. To exclude the effect of the network, subsequent work focuses on a well-mixed population (completely connected network). Second, we show the ratios of R, P, and ST reciprocity and observe the strategy selection by agents, which affects the cooperation ratio. Section 4 concludes.

2. The model

2.1. Game matrix

The payoff matrix for a two-strategy, two-player game is $\begin{pmatrix} R & S \\ T & P \end{pmatrix}$, where both players obtain R if they cooperate with each other, both players obtain P if they both defect, and the cooperator obtains S and the defector obtains T in a cooperator–defector pair. For a dilemma game there are two indicators [20]: T − R > 0 and P − S > 0. There are three 2 × 2 dilemma games: the PD game (T − R > 0, P − S > 0), the SD game (also called chicken or hawk–dove; T − R > 0), and the SH game (P − S > 0). Without loss of generality, we reduce the four parameters to one, the cost/benefit ratio r. Thus, the payoff matrix for the PD game is $\begin{pmatrix} 1 & 0 \\ 1+r & 0.1 \end{pmatrix}$ (1 > r > 0); this is a strict PD game and the result is similar to that for the matrix $\begin{pmatrix} 1 & 0 \\ b & 0 \end{pmatrix}$ (2 > b > 1) [5]. The matrix for the SD game is $\begin{pmatrix} 1 & 1-r \\ 1+r & 0 \end{pmatrix}$ (1 > r > 0) [28]. The matrix for the SH game is $\begin{pmatrix} 1 & -r \\ r & 0 \end{pmatrix}$ (1 > r > 0) [32].
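For concreteness, the three single-parameter matrices of Section 2.1 can be written out as follows (Python; our own illustrative encoding, with moves coded 1 = cooperate and 0 = defect; the function name is hypothetical, not from the paper).

```python
def payoff_matrix(game, r):
    """Return {(my_move, opp_move): my_payoff} for the PD, SD, or SH game (0 < r < 1)."""
    if game == "PD":
        R, S, T, P = 1.0, 0.0, 1.0 + r, 0.1
    elif game == "SD":
        R, S, T, P = 1.0, 1.0 - r, 1.0 + r, 0.0
    elif game == "SH":
        R, S, T, P = 1.0, -r, r, 0.0
    else:
        raise ValueError(game)
    return {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}

# Dilemma indicators: the PD has T > R and P > S, the SD only T > R, the SH only P > S.
print(payoff_matrix("SD", 0.3))   # {(1, 1): 1.0, (1, 0): 0.7, (0, 1): 1.3, (0, 0): 0.0}
```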

2.2. The memory device

Following previous work [11,12,29], we design a memory mechanism to code the game history and strategies. One-step memory means that agents can remember the history of the last game. Cooperation is coded as "1" and defection as "0". Thus, there are four possible historical interactions for one-step memory (Table 1): 00 (the focused player defects and the opponent defects), 01 (the focused player defects and the opponent cooperates), 10, and 11. Depending on the history, agents choose to cooperate or defect. Thus, a strategy can be coded as ****, where * represents 0 or 1 and the position of * indicates a particular history. This 4-bit encoding can represent 16 different strategies. For example, the code is 1001 for the WSLS strategy and 1010 for TFT (high bit on the left). Table 1 shows the codes for the always-cooperate (ALLC), always-defect (ALLD), TFT, and WSLS strategies.

Table 2
Strategy update modes.

         Updating rule                  Synchronicity
Mode 1   Proportional imitation rule    Synchronous
Mode 2   Unconditional imitation rule   Synchronous
Mode 3   Proportional imitation rule    Asynchronous
Mode 4   Unconditional imitation rule   Asynchronous

2.3. Memory-based evolutionary game

Now we study evolutionary games with or without memory in which agents play against others in a well-mixed population. This helps us to understand agents' behaviors without the influence of a spatial structure [33,34]. Updating rules play an important role in evolutionary games [8,35,36]. Here we use a replicator rule and an unconditional imitation rule for strategy updating in the memory-based evolutionary game (MBG). The replicator rule is also called a proportional imitation rule [35] and is consistent with the subsequent theoretical analysis. When a round of games has finished, agent x imitates a random neighbor y with probability

$w(s_x \to s_y) = \begin{cases} (\pi_y - \pi_x)/\Phi, & \pi_y > \pi_x \\ 0, & \pi_y \le \pi_x, \end{cases}$    (1)

where Φ = max(k_x, k_y)(max(R, T) − min(P, S)) ensures that w(s_x → s_y) ∈ [0, 1]; s_x and s_y are the strategies and π_x and π_y the accumulated payoffs of agents x and y, respectively, in a round, and k_x is the degree of agent x. According to the unconditional imitation rule [35], each player adopts the strategy of the neighbor with the greatest payoff if it is greater than his own:

$w(s_x \to s_y) = \begin{cases} 1, & \pi_y > \pi_x \\ 0, & \pi_y \le \pi_x. \end{cases}$    (2)
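For concreteness, the two updating rules of Eqs. (1) and (2) can be sketched as follows (Python; a minimal illustration under the definitions above, with variable names of our own choosing rather than code from the paper).

```python
import random

def proportional_imitation(pi_x, pi_y, k_x, k_y, R, S, T, P):
    """Eq. (1): agent x copies a random neighbour y with probability (pi_y - pi_x)/Phi."""
    phi = max(k_x, k_y) * (max(R, T) - min(P, S))   # normalization keeping w in [0, 1]
    if pi_y <= pi_x:
        return False
    return random.random() < (pi_y - pi_x) / phi

def unconditional_imitation(pi_x, pi_best):
    """Eq. (2): agent x copies the best-performing neighbour if it beats x's own payoff."""
    return pi_best > pi_x
```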

There are two possibilities for updating synchronicity: for synchronous updating the agents update their strategies simultaneously, while for asynchronous updating the agents take turns in random order to update their strategies. Table 2 lists the four modes formed by combining updating rule and synchronicity.

The following steps were executed. A total of N = 1000 agents were placed at the nodes of a complete graph (well-mixed population) and Lr = 1000 rounds of the game were played. Agents cooperate or defect randomly in the first round, and thereafter they follow the history H and strategy St. In every round, all pairs of connected agents repeat the game t = 100 times simultaneously. After these t repetitions, agents imitate their neighbors' strategies St according to the updating rule, and then the next round begins. Each bit of strategy St has probability p = 0.01 of being reversed when agents update their strategy. All subsequent statistical data are based on 100 samples.

2.4. Theoretical analysis

In this section we model an evolutionary process with memory in a well-mixed population. To study the competition between cooperation and defection from an evolutionary perspective, the payoffs obtained by playing the game are considered as fitness and Darwinian dynamics is introduced to promote the fittest strategy. When agents have no memory [35], the classic framework to use is replicator dynamics [2], which assumes that each individual plays with all the others in an infinite and well-mixed population. Let x be the density of cooperators and let f_c and f_d be the fitness of a cooperator and a defector, respectively. Replicator dynamics posits that x evolves as [35,37]

$\dot{x} = x(1 - x)(f_c - f_d).$    (3)

If cooperators are doing better than defectors, their density increases accordingly, and the opposite occurs if they are doing worse. Provided that the initial density of cooperators x_0 is different from 0 and 1, the asymptotic state of this dynamical system is: full defection in the PD game (x* = 0); full cooperation if x_0 > x_e or full defection if x_0 < x_e in the SH game; and a mixed population with x* = x_e in the SD game, regardless of the initial density x_0, where x* is the asymptotic density of cooperators [35,37]. For the SH and SD games, the mixed equilibrium x_e has the value [35]

$x_e = \dfrac{S}{S + T - 1}.$    (4)


Fig. 1. Sketch map of P, R, S, T sequences for any two of the 32 strategies [29]. The payoff matrix M is defined by an element value divided by the period (for RST in the figure, the payoff is (R + S + T)/3). Note that the sketch map uses a different strategy coding mode.

Note that the game matrix here is defined as $\begin{pmatrix} 1 & S \\ T & 0 \end{pmatrix}$, where it is assumed that R = 1 and P = 0.
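As a numerical check of Eqs. (3) and (4) (our own sketch, not from the paper), the snippet below integrates the memoryless replicator equation with R = 1 and P = 0 and shows that, for the SD game of Section 2.1, the density of cooperators approaches x_e = S/(S + T − 1) from any interior initial condition.

```python
def replicator_trajectory(S, T, x0, steps=20000, dt=0.01):
    """Euler integration of Eq. (3) with R = 1, P = 0 (well-mixed, no memory)."""
    x = x0
    for _ in range(steps):
        fc = x * 1.0 + (1.0 - x) * S      # expected payoff of a cooperator
        fd = x * T                        # expected payoff of a defector (P = 0)
        x += dt * x * (1.0 - x) * (fc - fd)
    return x

if __name__ == "__main__":
    r = 0.3
    S, T = 1.0 - r, 1.0 + r               # SD game of Section 2.1
    x_e = S / (S + T - 1.0)               # Eq. (4): mixed equilibrium, here 0.7
    for x0 in (0.1, 0.5, 0.9):
        print(x0, round(replicator_trajectory(S, T, x0), 3), x_e)
```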

When agents have memory, replicator dynamics still works in well-mixed populations [29]. To obtain the payoff matrix, we add 1 bit in front of the 4-bit strategy to indicate the player's status (C or D) in the previous step. For example, the TFT strategy can be coded as 11010 or 01010. The number of strategies then expands from 16 to 32. Assuming that the game is infinitely iterated, we can predict a periodic steady state of game outcomes for any pair among the 32 strategies. We present a similar game matrix from Tanimoto and Sagara [29] in Fig. 1. Assuming that the number of players is sufficiently large, we can apply replicator dynamics to the game evolution [29,37]:

$\dot{s}_i = s_i \left( {}^{t}s_i M s - {}^{t}s M s \right).$    (5)

Both s_i and s are 32-row vectors. The former indicates the ith strategy, expressed as s_i ∈ S = {(10…0), …, (0…01)}; the latter is the strategy distribution at a certain time step, expressed as s = (s_1 s_2 … s_32). The superscript t indicates transposition, and M is the 32 × 32 payoff matrix. If the ratios of the 32 strategies can be calculated from Eq. (5), it is easy to obtain the ratios for the 16 strategies by adding the two corresponding strategies together; for example, the ratio for TFT (1010) is the sum of the ratios for 01010 and 11010. However, Eq. (5) is difficult to solve [29,37]. In general, when there is no memory, the cooperation ratio is easy to analyze; when memory is taken into account, it is necessary to use a computer to simulate the evolutionary process.

3. Experiments and analysis

3.1. Cooperation ratio and system fitness

The cooperation ratio r_c and average fitness π_ave for a system are defined as follows:

$r_c = n_c/(n_c + n_d)$    (6)

$\pi_i = Pf_i/(k \cdot t)$    (7)

$\pi_{ave} = \sum_{i=1}^{N} \pi_i / N,$    (8)

where n_c (n_d) is the number of cooperation (defection) strategies adopted by the agents in a game round, π_i is the fitness of agent i, Pf_i is the total payoff agent i obtains in a round, k is the degree of agent i, t is the number of times the game is repeated in a round, and N is the number of agents.

First we compare systems with memory length 0 and 1 in a well-mixed population (Fig. 2). In the PD game, cooperation cannot exist without memory; by contrast, when agents have memory, all the updating modes lead to the emergence of cooperation, although the cooperation ratio generally decreases with increasing r (except for mode 3). To the best of our knowledge, memory may be the only underlying mechanism that leads to cooperation in a completely connected network. In the SD game, there is an asynchrony between payoff and cooperation. In Fig. 2b, the cooperation ratio for modes 1, 2, and 4 with memory is lower than the ratio in the absence of memory (m = 0) when r is small (0–0.4). In Fig. 2e, the fitness for modes 1, 2, and 4 is not less than the fitness in the absence of memory (m = 0).
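To illustrate how the observables in Eqs. (6)–(8) are obtained from one simulation round of Section 2.3, the following sketch (Python; our own simplified illustration, with variable names that are not from the paper) lets every pair in a small well-mixed population play t repetitions with one-step-memory strategies and then computes the cooperation ratio and average fitness.

```python
import itertools
import random

def play_round(strategies, payoff, t=100):
    """One round of the well-mixed memory-based game (m = 1), returning rc and pi_ave.

    strategies: list of 4-bit strings coded as in Table 1 (high bit = history 11).
    payoff: dict {(my_move, opp_move): my_payoff} with 1 = cooperate, 0 = defect.
    """
    n = len(strategies)
    k = n - 1                                    # degree in a complete graph
    total = [0.0] * n                            # accumulated payoff Pf_i
    coop = defect = 0
    for i, j in itertools.combinations(range(n), 2):
        mi, mj = random.randint(0, 1), random.randint(0, 1)   # random first moves
        for _ in range(t):
            total[i] += payoff[(mi, mj)]
            total[j] += payoff[(mj, mi)]
            coop += mi + mj
            defect += 2 - mi - mj
            # next moves from the 4-bit strategies and the last interaction
            mi, mj = (int(strategies[i][3 - (2 * mi + mj)]),
                      int(strategies[j][3 - (2 * mj + mi)]))
    rc = coop / (coop + defect)                  # Eq. (6), counted over the round's moves
    fitness = [pf / (k * t) for pf in total]     # Eq. (7)
    return rc, sum(fitness) / n                  # Eq. (8)

if __name__ == "__main__":
    r = 0.3
    sd = {(1, 1): 1.0, (1, 0): 1.0 - r, (0, 1): 1.0 + r, (0, 0): 0.0}   # SD matrix
    population = ["1001"] * 5 + ["1010"] * 5     # WSLS and TFT agents (illustrative)
    print(play_round(population, sd))
```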


Fig. 2. Cooperation ratio (top) and average fitness (bottom) for m = 0 (red lines) and m = 1 (blue lines) in a well-mixed population. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

In the SH game, the cooperation ratio and average fitness both decrease with memory for updating modes 2 and 4. In the PD and SH games, the cooperation ratio is usually proportional to fitness, but this is not observed for the SD game.

When there is no memory (m = 0), almost all the updating modes follow the theoretical prediction. For example, in the PD and SH games, all four updating modes perform similarly, so for these cases we only show one red line in the plot. However, in the SD game, modes 1, 3, and 4 perform according to the theoretical prediction, while mode 2 behaves differently. For updating mode 2 with a mutation rate of 0.01, mutant agents can always perform better than most of the others, so all agents imitate them. The whole system oscillates between cooperation and defection, so the average cooperation ratio is approximately 0.5 (Fig. 2b).

We then investigated cooperation and fitness in the presence of memory and a network structure. A lattice is a common structure used to simulate the relationships among agents. Fig. 3 shows that memory increases cooperation and fitness in the PD and SH games, while in the SD game cooperation is still demoted when r is small. The cooperation for the SD game differs slightly from previous results [34] because our model has a mutation rate of 0.01.

We also investigated the effect of memory length on cooperation. When m = 2, agents make decisions according to the last two game steps. We compared cooperation and fitness for memory lengths from 0 to 2. Only modes 1 and 2 are compared in Fig. 4 for clarity. The results show that when the memory length increases from 1 to 2, both cooperation and fitness increase. In our previous work, increases in memory did not clearly promote cooperation on a scale-free network [15], since the cooperation ratio was close to 1.0 and there was little room for improvement. However, comparison of m = 2 with m = 0 reveals that cooperation decreases for some modes in the SD and SH games when r is small.

In general, the three simulation experiments revealed that memory can greatly increase cooperation in the PD game; however, in the SD and SH games, memory decreases cooperation when the cost/benefit ratio r is small for some updating modes. This phenomenon exists widely across different strategy updating modes, different network structures, and different memory lengths. How does the system evolve? Does cooperation mean the same to agents in different games? What strategies do agents favor? We try to answer these questions in two ways. The first involves calculating the cooperation ratio by dividing cooperation into R reciprocity and ST reciprocity [28] (for the game matrix $\begin{pmatrix} R & S \\ T & P \end{pmatrix}$, agents have three reciprocities with each other: R, ST, and P). This can reveal the composition of cooperation in different games. The second involves analyzing the strategies that agents adopt during evolution. There are 16 strategies available to the agents, and the strategies chosen by the agents may be the essential factor that causes changes in cooperation.


Fig. 3. Cooperation ratio (top) and average fitness (bottom) for m = 0 and 1 on a lattice. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Cooperation ratio (top) and average fitness (bottom) for m = 0, 1, 2 in a completely connected network. For clarity, only modes 1 and 2 are presented. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 5. R, ST, and P distributions for the PD game (top), SD game (middle), and SH game (bottom) for m = 0 and the four updating modes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Next, we analyze cooperation and fitness in two separate steps. To exclude the effect of the network, we consider a well-mixed population. In Section 3.2 we assess the R, ST, and P distributions for the different games, and in Section 3.3 we investigate the strategy distributions for the different games.

3.2. Three types of reciprocity

In a two-player, two-strategy game, players have the choice of whether to cooperate (C) or defect (D). There are then three types of reciprocity between two players: R, P, and ST, as explained above. The cooperation ratio includes all the R reciprocity and half of the ST reciprocity [29,38,39]. The relationships between the cooperation ratio and the R, ST, and P ratios can be described as follows:

$r_c = r_R + r_{ST}/2$    (9)

$r_d = r_P + r_{ST}/2$    (10)

$r_{ST} = r_S + r_T = 2 r_S = 2 r_T,$    (11)

where r_c is the cooperation ratio, r_d is the defection ratio, and r_R, r_S, r_T, r_P, and r_ST are the R, S, T, P, and ST reciprocity ratios, respectively. Thus, a clear distribution of the R, ST, and P ratios can help us to understand the composition of cooperation. The results are shown in Fig. 5.

In the PD game, P reciprocity dominates when m = 0 and cooperation cannot exist. When m = 1, R reciprocity greatly increases for all four modes, so cooperation emerges. In the SD game, when m = 0, Fig. 5f shows modes 1, 3, and 4, while Fig. 6 shows mode 2. When m = 1, for modes 1, 2, and 4, ST reciprocity increases greatly when r is small, which makes R reciprocity decrease. The cooperation ratio therefore decreases when r is small, while the system's fitness does not decrease because the agents obtain payoffs through ST reciprocity. In the SH game, ST reciprocity cannot persist in any of the situations. Compared to m = 0, memory for updating modes 1 and 3 leads to a decrease in R reciprocity only later, at large r, but interestingly modes 2 and 4 make R reciprocity decrease earlier. Thus, the cooperation ratio and fitness decrease for modes 2 and 4 in the SH game.
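The bookkeeping behind Eqs. (9)–(11) can be sketched as follows (Python; an illustrative helper of our own, assuming the joint outcome of each interacting pair is available). It counts R, ST, and P pairs and recovers the cooperation ratio as r_c = r_R + r_ST/2.

```python
def reciprocity_ratios(pair_outcomes):
    """pair_outcomes: list of (move_x, move_y) with 1 = cooperate, 0 = defect."""
    n = len(pair_outcomes)
    r_R = sum(1 for a, b in pair_outcomes if a == 1 and b == 1) / n   # both cooperate
    r_P = sum(1 for a, b in pair_outcomes if a == 0 and b == 0) / n   # both defect
    r_ST = 1.0 - r_R - r_P                    # mixed cooperator-defector pairs
    rc = r_R + r_ST / 2                       # Eq. (9)
    rd = r_P + r_ST / 2                       # Eq. (10)
    return r_R, r_ST, r_P, rc, rd

# Example: two mutual-cooperation pairs, one mixed pair, one mutual-defection pair
print(reciprocity_ratios([(1, 1), (1, 1), (1, 0), (0, 0)]))
# -> (0.5, 0.25, 0.25, 0.625, 0.375)
```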


Fig. 6. R, ST, and P ratios for the SD game with updating mode 2 and m = 0. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Strategy selection for different updating modes with m = 1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.3. Strategy selection

When agents have memory, how they adopt strategies is an interesting issue. Strategy selection may be the fundamental cause of the changes in cooperation and fitness. We used the 16-strategy model presented in Section 2.2 for strategy selection by agents. Fig. 7 shows the strategy distribution over 1000 generations of the system. From Fig. 7 we can see that 1001 (WSLS) dominates for mode 3 in the PD, SD, and SH games, so mode 3 leads to greater cooperation and fitness compared to the other modes. However, 1010 (TFT) does not prevail in any mode. The results show that WSLS outperforms TFT in the evolutionary process.


Fig. 8. Strategies classified into four types in the snowdrift game. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

In the SH game, we are interested in modes 2 and 4, for which cooperation decreases. The 1000 strategy dominates these modes, so cooperation and fitness decrease.

Detailed exploration of the results in Fig. 7 is interesting but complex. As an example we consider Fig. 7b. It is evident that 0000 and 0100 dominate the game. In a well-mixed population, synchronous unconditional imitation (mode 2) means that all agents imitate the agent who obtains the highest payoff in a generation. It is not easy for strategies such as WSLS to take hold, since they can be exploited by the 0000 strategy. The 0100 strategy, whereby a focal agent cooperates only with a defector and defects in all other cases, seems strange at first. However, in the well-mixed population, a focal agent may cooperate with some defectors and defect against some cooperators, depending on the game history. Thus, the focal agent can obtain payoffs of S, T, and P, which is better than mutual defection (obtaining only P). Therefore, the 0000 and 0100 strategies dominate this mode.

ST reciprocity is an important characteristic of the SD game. To obtain a clearer view of the strategy characteristics in the SD game, we classified the 16 strategies into four types: *00*, *01*, *10*, and *11* [29,38,39], where * is a wildcard that can be 0 or 1. The second and third bits from the right indicate the game history: 01 indicates that the player defects and the opponent cooperates, and 10 indicates that the player cooperates and the opponent defects. Thus, for a *01* strategy the player and the opponent exchange roles (cooperator, defector) alternately, whereas for a *10* strategy the player and the opponent remain in the same roles (cooperator or defector). The results are shown in Fig. 8.

4. Conclusions

Most of the recent literature has focused on evolutionary games on a lattice or in complex networks. Few studies have considered games in a completely connected network, for which evolution can be properly modeled using replicator dynamics theory. We studied evolutionary dilemma games in a completely connected network or on a lattice with a memory mechanism, and found some interesting phenomena that are not easily observed by theoretical analysis.

In the PD game, memory greatly increases cooperation. To the best of our knowledge, memory may be the only mechanism that can significantly promote cooperation in a completely connected network. In the SD game, memory decreases cooperation when the cost/benefit ratio is small; however, memory increases the average population fitness. In the SH game with unconditional imitation, memory decreases cooperation when the cost/benefit ratio is small, and it also decreases the average population fitness. ST reciprocity causes the asynchrony between cooperation and fitness in the SD game. The WSLS strategy promotes cooperation in all three games. To summarize, memory does not necessarily promote cooperation in evolutionary dilemma games.

The PD, SD, and SH games are often used to explain cooperative behavior in a rational population. However, our study reveals that these three models differ greatly. Thus, it is important to select the right model to illustrate cooperation in different situations. Some interesting phenomena require more detailed study, such as how different updating rules and synchronicity modes affect cooperation, and how to use the three reciprocities and the strategies to analyze cooperation in different models. Synchronous unconditional imitation in a well-mixed population may seem slightly strange.
In fact, all agents would select the strategy with the highest payoff in the second generation, so all other strategies would become extinct. In our model, strategy mutation helps the system avoid being trapped in a local optimum too early, and large samples (≥100) also help in obtaining reliable results. Thus, the results are still reasonable.

Acknowledgments

We thank the editors for scrupulously polishing the paper. We thank Yi Liu and Jun Tanimoto for advice on the paper. This work was partly supported by the National Natural Science Foundation of China under Grants 61379110, 61173036, 61133005, 61103202, and 61073186, and by Priority Development Areas Project 20120162130008 of the Chinese Ministry of Education.


References

[1] M.A. Nowak, Five rules for the evolution of cooperation, Science 314 (2006) 1560–1563.
[2] H. Gintis, Game Theory Evolving, Princeton University Press, Princeton, NJ, 2000.
[3] J. Maynard Smith, Evolution and the Theory of Games, Cambridge University Press, Cambridge, 1982.
[4] J.W. Weibull, Evolutionary Game Theory, MIT Press, Cambridge, MA, 1995.
[5] M.A. Nowak, R.M. May, Evolutionary games and spatial chaos, Nature 359 (1992) 826–829.
[6] G. Szabó, G. Fáth, Evolutionary games on graphs, Physics Reports 446 (2007) 97–216.
[7] C.P. Roca, J.A. Cuesta, A. Sanchez, Evolutionary game theory: temporal and spatial effects beyond replicator dynamics, Physics of Life Reviews 6 (2009) 208–249.
[8] M. Perc, A. Szolnoki, Coevolutionary games—a mini review, Biosystems 99 (2010) 109–125.
[9] R. Axelrod, W.D. Hamilton, The evolution of cooperation, Science 211 (1981) 1390–1396.
[10] R. Axelrod, The Evolution of Cooperation, Basic Books, New York, 1984.
[11] R. Axelrod, Genetic Algorithms and Simulated Annealing, Pitman, London, 1987.
[12] M. Nowak, K. Sigmund, A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game, Nature 364 (1993) 56–58.
[13] H. Lin, C.-X. Wu, Evolution of strategies based on genetic algorithm in the iterated prisoner's dilemma on complex networks, Acta Physica Sinica 56 (2007) 4313–4318.
[14] Y. Liu, et al., Memory-based prisoner's dilemma on square lattices, Physica A 389 (2010) 2390–2396.
[15] T. Wang, Z.-G. Chen, X.-H. Deng, J. Zhang, Properties of prisoner's dilemma game based on genetic algorithm, in: Proceedings of the 29th Chinese Control Conference, 2010, pp. 4732–4736.
[16] X.-H. Deng, Y. Liu, Z.-G. Chen, Memory-based evolutionary game on small-world network with tunable heterogeneity, Physica A 389 (2010) 5173–5181.
[17] M. Posch, Win-stay, lose-shift strategies for repeated games—memory length, aspiration levels and noise, Journal of Theoretical Biology 198 (1999) 183–195.
[18] D.B. Neill, Optimality under noise: higher memory strategies for the alternating prisoner's dilemma, Journal of Theoretical Biology 211 (2001) 159–180.
[19] L.A. Imhof, D. Fudenberg, M.A. Nowak, Tit-for-tat or win-stay, lose-shift? Journal of Theoretical Biology 247 (2007) 574–580.
[20] J. Tanimoto, H. Sagara, Relationship between dilemma occurrence and the existence of a weakly dominant strategy in a two-player symmetric game, Biosystems 90 (2007) 105–114.
[21] D.P. Kraines, V.Y. Kraines, Natural selection of memory-one strategies for the iterated prisoner's dilemma, Journal of Theoretical Biology 203 (2000) 335–355.
[22] Y. Liu, et al., Win-stay–lose-learn promotes cooperation in the spatial prisoner's dilemma game, PLoS ONE 7 (2012) e30689.
[23] X. Chen, F. Fu, L. Wang, Promoting cooperation by local contribution under stochastic win-stay–lose-shift mechanism, Physica A 387 (2008) 5609–5615.
[24] R. Alonso-Sanz, Memory boosts cooperation in the structurally dynamic prisoner's dilemma, International Journal of Bifurcation and Chaos 19 (2009) 2899–2926.
[25] R. Alonso-Sanz, Spatial order prevails over memory in boosting cooperation in the iterated prisoner's dilemma, Chaos 19 (2009) 023102.
[26] R. Alonso-Sanz, M. Martin, Memory boosts cooperation, International Journal of Modern Physics C 17 (2006) 841–852.
[27] S.-M. Qin, et al., Effect of memory on the prisoner's dilemma game in a square lattice, Physical Review E 78 (2008) 041129.
[28] W.-X. Wang, et al., Memory-based snowdrift game on networks, Physical Review E 74 (2006) 056113.
[29] J. Tanimoto, H. Sagara, A study on emergence of alternating reciprocity in a 2 × 2 game with 2-length memory strategy, Biosystems 90 (2007) 728–737.
[30] L. Browning, A.M. Colman, Evolution of coordinated alternating reciprocity in repeated dyadic games, Journal of Theoretical Biology 229 (2004) 549–557.
[31] J. Tanimoto, H. Sagara, Relationship between dilemma occurrence and the existence of a weakly dominant strategy in a two-player symmetric game, Biosystems 90 (2007) 105–114.
[32] M. Perc, A. Szolnoki, Coevolutionary games—a mini review, Biosystems 99 (2010) 109–125.
[33] M. Nowak, K. Sigmund, Chaos and the evolution of cooperation, Proceedings of the National Academy of Sciences of the United States of America 90 (1993) 5091–5094.
[34] C. Hauert, M. Doebeli, Spatial structure often inhibits the evolution of cooperation in the snowdrift game, Nature 428 (2004) 643–646.
[35] C.P. Roca, J.A. Cuesta, A. Sanchez, Effect of spatial structure on the evolution of cooperation, Physical Review E 80 (2009) 046106.
[36] A. Yamauchi, J. Tanimoto, A. Hagishima, An analysis of network reciprocity in prisoner's dilemma games using full factorial designs of experiment, Biosystems 103 (2011) 85–92.
[37] J. Hofbauer, K. Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.
[38] P.H. Crowley, Dangerous games and the emergence of social structure: evolving memory-based strategies for the generalized hawk–dove game, Behavioral Ecology 12 (2001) 753–760.
[39] L. Browning, A.M. Colman, Evolution of coordinated alternating reciprocity in repeated dyadic games, Journal of Theoretical Biology 229 (2004) 549–557.