Physics Letters A 375 (2011) 3557–3561
q-Strategy spatial prisoner's dilemma game

Zhi-Hua Li a,*, Hong-Yi Fan a, Wen-Long Xu b, Han-Xin Yang c

a Department of Material Science and Engineering, University of Science and Technology of China, Hefei 230026, China
b School of Computer Science and Technology, Beihang University, Beijing 100083, China
c Department of Modern Physics, University of Science and Technology of China, Hefei 230026, China
Article info

Article history:
Received 22 April 2011
Received in revised form 11 July 2011
Accepted 2 August 2011
Available online 26 August 2011
Communicated by C.R. Doering

Keywords:
Evolutionary game theory
Prisoner's dilemma game
q-Strategy game
Intermediate strategies
Abstract

We generalize the usual two-strategy prisoner's dilemma game to a multi-strategy game, in which the strategy variable s is allowed to take q different fractional values lying between 0 and 1. The fractional-valued strategies signify that individuals are not absolutely cooperative or defective; instead they can adopt intermediate strategies. Simulation results on 1D and 2D lattices show that, compared with the binary-strategy game, the multi-strategy game can sustain cooperation in more stringent defective environments. We give a comprehensive analysis of the distributions of the survived strategies and compare pairwise the relative strength and weakness of different strategies. It turns out that some intermediate strategies survive against pure defection because they reduce their exposure to exploitation while still benefiting from the spatial reciprocity effect. Our work may shed some light on intermediate behaviors in human society.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Cooperation is ubiquitous in the real world, ranging from unicellular organisms to human beings [1]. In the past decades, evolutionary game theory [2] has become a powerful framework for studying the evolution of cooperation among self-interested individuals. In particular, the prisoner's dilemma game (PDG) is one of the most prominent games [3,4]. In the original PDG, two players simultaneously decide whether to cooperate or defect. They receive R if both cooperate and P if both defect; a defector exploiting a cooperator receives T and the exploited cooperator receives S, with T > R > P > S. It is thus easy to see that, in a one-round game, defection is the better choice irrespective of the opponent's choice, though cooperation is much better for the flourishing of the whole population. The spatial game is a notable extension [5], in which players are located on the vertices of a network and cooperators can survive thanks to the spatial reciprocity effect; it has attracted much interest from the physics community [6,7]. In this line of study, extensive research has been conducted on, for example, the heterogeneity of the network topology [8–13], the effect of noise [14–18], preferential selection [19,20], social diversity [21–23], coevolution of network and strategy [24–30] (see [31] for a recent review) and, lately, the mobility of individuals [32,33], just to name a few.
* Corresponding author. E-mail address: [email protected] (Z.-H. Li).

doi:10.1016/j.physleta.2011.08.049
In this work, we propose a game model generalizing the spatial PDG, which is also inspired by physics. Evolutionary games on lattice networks resemble in some ways the lattice statistical models of statistical mechanics, such as the Ising model [34]. In that model, the spin variables can take two values, +1 and −1, corresponding to spin up and down. In statistical mechanics there is another model, the q-state Potts model [35], which generalizes the Ising model from two spin values to q values, say {1, 2, 3, …, q}. Inspired by that, we generalize the usual two-strategy game s = 0, 1 to a q-strategy game: the strategy variable s is allowed to take values in the set {0, 1/(q−1), 2/(q−1), …, (q−2)/(q−1), 1}. We then define the payoffs such that a player whose strategy s is closer to 1 is more cooperative, while one whose s is closer to 0 is more defective. This generalization is of practical value: in many real-life situations, under the same temptation to defect, one does not have to choose between complete cooperation and complete defection; rather, many intermediate strategies are affordable, so that different people can behave rather differently. To our knowledge this phenomenon has rarely been addressed in the framework of spatial games. Note that what we call intermediate strategies coincide with what are conventionally called mixed strategies in the literature [8]. While mixed strategies are usually interpreted as the probability of an individual defecting or cooperating over many deals, they can also be interpreted as the extent to which an individual cooperates or defects in a single deal. We use "intermediate strategy" in this work to emphasize the latter interpretation. One fundamental difference between the Potts model and the current game model lies in that, in the former the q states are
symmetric, while in the latter the q strategies are not identical, so that weaker strategies will be defeated and replaced by stronger ones. We are therefore interested in which strategies survive the competition, i.e., the survived strategies (SSs). It is easy to see that in a well-mixed population the only SS is pure defection, no matter whether the game has two strategies or q strategies. We implement the game on networks with the simplest structure, i.e., 1D and 2D lattices. We find that the multi-strategy game model can still sustain cooperation when the temptation to defect b becomes larger. We understand this phenomenon by analyzing the distributions of the SSs for different b and q values. As b increases, the intermediate SSs shift closer to pure defection. Though the intermediate strategies are not as "good" as pure cooperation, they can effectively resist the overspreading of the "worst" pure defection and sustain cooperation.

2. Model

In the standard spatial PDG, N players are located on a spatial network. Two players play the game whenever they are linked to each other. Each player can adopt only two strategies: cooperate C or defect D; that is, for a player on site i, the strategy variable s_i takes the binary value 1 or 0. To calculate the total payoff of a player, it is convenient to map the strategy variable s_i into the two-component vector form φ_i = (1, 0)^T or (0, 1)^T. Using the rescaled weak version of the PDG with T = b > 1, R = 1, and P = S = 0 [5], the total payoff of player i is computed as:

P_i = Σ_{j ∈ N_i} φ_i^T M φ_j,   with M ≡ ( 1 0 ; b 0 ),   (1)

where the 2×2 matrix M has rows (1, 0) and (b, 0), b ∈ [1, 2), and N_i denotes the set of neighbors of player i.

In the present model, more strategies than just 0 and 1 are affordable; namely, we let the strategy variable s_i take one of q values, s_i ∈ Ω_q ≡ {0, 1/(q−1), 2/(q−1), …, (q−2)/(q−1), 1} (q ⩾ 2 an integer). To define the payoffs for the fractional-valued strategies, we also map s_i into the vector form φ_i ≡ (s_i, 1−s_i)^T. The payoff of player i in our q-strategy game is then defined by the same formula, Eq. (1), as in the two-strategy game above, but with the strategy set extended from {0, 1} to Ω_q (in the corresponding vector forms). The q-strategy game is thus a direct generalization of the classical spatial game. Note that when q = 2 the model obviously degenerates to the standard PDG, while in the limit q → ∞, s assumes continuous values in the interval [0, 1].

We now prove that this generalization still preserves the meaning of the PDG. To do so, we calculate explicitly the payoff matrix for any strategy subset {s, s'} ⊂ Ω_q with s > s'. For the ordered strategy pair (s, s'), the payoff for the strategy s is just the sucker's payoff

S_{s,s'} = φ_s^T M φ_{s'} = ss' + b(1−s)s'.

The payoffs of the first player for the other three possible ordered strategy pairs (s, s), (s', s) and (s', s') can be obtained likewise, and they define the remaining payoff matrix entries: R_{s,s'} = s² + b(1−s)s, T_{s,s'} = s's + b(1−s')s and P_{s,s'} = s'² + b(1−s')s'. It is easy to verify that the condition for the weak PDG, T > R > P ⩾ S, strictly holds. So, in the q-strategy game, any two neighboring players whose strategies are not equal are always playing a prisoner's dilemma game, with one player exploiting the other. The main difference between the q-strategy game and the two-strategy game is that the intensity of the exploitation varies with the strategy configuration.

After a full cycle of the game, the players update their strategies. A focal player i randomly selects one neighbor j and compares its payoff with that of j; the probability that i changes its strategy to s_j in the next round is given by [36]:

W(s_j → s_i) = 1 / (1 + exp[(P_i − P_j)/K]).   (2)

Following previous studies, we set K = 0.1. We adopt 2D and 1D regular lattice networks with periodic boundary conditions as the structure of the population. To avoid being cumbersome, we only present results for the case of node degree k = 4 on both networks, i.e., the 2D lattice with nearest-neighbor interactions and the 1D lattice with nearest- and next-nearest-neighbor interactions. We have also checked that the features of the results are alike for several other cases such as k = 8, 12, 16, with the exception of the k = 2 1D lattice, where ρ_c vanishes identically for all b and q values. We set N = 100 × 100 for the 2D lattice and N = 2000 for the 1D lattice. Initially the players are distributed uniformly over the q strategies. The total cooperation level is evaluated by the cooperation density ρ_c ≡ (1/N) Σ_{i=1}^{N} s_i with s_i ∈ Ω_q. Note that when q = 2, this definition degenerates to the usual definition of ρ_c in the two-strategy game, so the two cases are directly comparable. In all simulations, ρ_c is obtained by averaging over the last 5000 generations of 10^5 generations in total. Each data point results from an average over 100 runs with independent initial conditions, and the model is simulated with synchronous update.

3. Results
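Before turning to the simulation results, the payoff entries and the Fermi update rule of the model can be sanity-checked numerically. The following is an illustrative sketch (the function names and sampled parameter values are ours, not from the Letter):

```python
import math
import random

def payoff(s, sp, b):
    """Payoff earned by a player with strategy s against a neighbor with
    strategy sp: phi_s^T M phi_sp with phi_s = (s, 1-s)^T and M = ((1,0),(b,0)),
    i.e. s*sp + b*(1-s)*sp, as in Eq. (1)."""
    return s * sp + b * (1.0 - s) * sp

def fermi(p_i, p_j, K=0.1):
    """Eq. (2): probability that focal player i adopts neighbor j's strategy."""
    return 1.0 / (1.0 + math.exp((p_i - p_j) / K))

# Verify the weak-PDG ordering T > R > P >= S for random strategy pairs s > sp.
random.seed(0)
for _ in range(1000):
    b = random.uniform(1.001, 1.999)
    sp, s = sorted(random.sample([k / 10 for k in range(11)], 2))  # s > sp
    S = payoff(s, sp, b)    # sucker's payoff of the more cooperative player
    R = payoff(s, s, b)     # mutual payoff among s-players
    T = payoff(sp, s, b)    # temptation earned by the more defective player
    P = payoff(sp, sp, b)   # mutual payoff among sp-players
    assert T > R > P >= S
print("weak-PDG ordering holds for all sampled pairs")
```

The check confirms that every unequal strategy pair in Ω_q still constitutes a weak prisoner's dilemma, with equality P = S occurring only when the more defective strategy is exactly 0.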
Fig. 1 shows ρ_c as a function of b for different values of q on both 2D and 1D lattices. On the 2D lattice (see Fig. 1(a)), at q = 2, which corresponds to the original two-strategy game, ρ_c decays very fast, while for q > 2 cooperation can be sustained for much larger values of the temptation to defect. On the 1D lattice (see Fig. 1(b)), for small q values, interestingly, ρ_c varies with b in a step-like fashion, i.e., ρ_c does not change with b in certain intervals. The insets of Fig. 1(a) and (b) show the critical value b_cr, where ρ_c changes from a non-zero value to zero, as a function of q. One can see that b_cr increases as q increases; when q is large enough, b_cr reaches a steady value. This indicates that multiple strategies help cooperators survive when the temptation to defect is large.

For an intuitive picture, in Fig. 2 we plot snapshots of the spatial distributions of the strategies on the 2D lattice for two q values at the same b = 1.03. When q = 2, the pure cooperators are so fragile under the invasion of the dangerous pure defectors that they soon go extinct, leaving the entire lattice occupied by pure defectors (see Fig. 2(a)–(c)). When q = 11 (see Fig. 2(d)–(i)), each strategy tends to cluster together and the number of distinct strategies steadily decreases. Eventually, two SSs are left: one is the pure defection s = 0, the other is at s ≈ 0.5. Comparing the result of q = 11 with q = 2, we see that though s ≈ 0.5 is not as "good" as pure cooperation, it effectively resists the overspread of the "worst" pure defection. This illustrates the important role of the intermediate strategies.

To illustrate the evolution process of the q-strategy game, we plot the number of strategies as a function of the Monte Carlo (MC) time steps for q = 1001 on both 2D and 1D lattices in Figs. 3(a) and 3(b), respectively. From Fig. 3, one can see that eventually only one or two SSs are left in the system, depending on the network structure and the value of b. As reflected in Figs. 2 and 3, the q strategies are not equal; only the most competitive s values survive. It is expected that the
Fig. 3. Log-log plot of the time series of the number of strategies on (a) 2D and (b) 1D lattices, for b = 1.02 and 1.10 at q = 1001.

Fig. 1. ρ_c as a function of the temptation to defect b for different values of q on (a) the 2D lattice and (b) the 1D lattice. Insets in (a) and (b) show the critical value b_cr, where ρ_c changes from a non-zero value to zero, as a function of small q values in the respective dimensions.

Fig. 4. Distribution of SSs for different values of q and b on the 2D lattice. Each column corresponds to the same b (indicated on top of the figure) and each row to the same q (indicated on the left of the figure).

Fig. 5. Distribution of SSs for different values of q and b on the 1D lattice. Parameters are set as in Fig. 4.

Fig. 2. Snapshots of the system at different time steps when the temptation to defect b = 1.03: (a)–(c) q = 2 at t = 0, 50, 500, respectively; (d)–(i) q = 11 at t = 0, 50, 500, 2000, 10 000, 50 000, respectively. The values of the strategies are encoded by the color spectrum from black (s = 0) to yellow (s = 1). (For interpretation of colors in this figure, the reader is referred to the web version of this Letter.)
cooperation behavior of the q-strategy game is essentially determined by the SSs. In a structured population, the relative strength and weakness of these strategies are determined by two effects: the spatial reciprocity effect and the exploitation effect. Spatial reciprocity [5] depends on the underlying topology of the network and is more beneficial for more cooperative strategies (larger s values). The exploitation effect depends on the temptation to defect b: larger b is more beneficial for more defective strategies (smaller s values). The parameter q determines the affordable strategy set. In the following, we therefore investigate the distributions of the SSs, which depend on three factors: the network topology, the temptation to defect and the parameter q, as summarized in Figs. 4 and 5. Along the way, we shall also show that most of the cooperative behaviors of the system can be explained by these distribution figures.

First, we consider the impact of the network topology. We know from statistical mechanics that the dimensionality plays an
Fig. 6. Stationary density of the strategy s' for pairwise evolution of the two strategies {0, s'}, with s' varying in Ω_q \ {0}, for different values of b. The initial densities of 0 and s' are both 0.5.
important role [34]: in different dimensions, the same physical model, such as the Potts model, belongs to different universality classes and usually behaves rather differently. In the current game model, the dimensionality is also very important. Comparing Fig. 4 with Fig. 5, one significant difference is that on the 2D lattice there are typically two SSs, one always at s = 0 and the other an intermediate strategy, while on the 1D lattice there is typically only one SS left. This difference leads to the rather different behaviors of the cooperation density as a function of b shown in Fig. 1(a) and (b). The fact that only very few SSs are left on the 2D and 1D lattices is related to the homogeneous nature of both networks.

Second, consider the impact of the temptation to defect. As the temptation to defect increases, more defective strategies gain the advantage. Taking Fig. 4(f)–(j) for example, as b increases, the position of the intermediate SS shifts from s = 1 towards s = 0. This means that as the entire environment gets even worse for cooperation, the intermediate SS is driven to behave more defectively; though it becomes more defective, it is at the same time more immune to the invasion of the strongest and worst pure defection. This explains the cooperation-sustaining trait of the q-strategy model shown in Fig. 1(a) and (b).

Third, consider the impact of q. In the q-state Potts model, different q values lead to quite different models and need separate analytical treatments [35]. In the current game model, the q values are also important. This is manifest on the 1D lattice: at q = 3, as b increases from 1.0 to 1.18, the SS stays unchanged over certain intervals of b (see Fig. 5(a)–(e)), while at q = 1001, as b increases from 1.0 to 1.18, the SS always varies with b, moving towards the 0 end (see Fig. 5(f)–(j)).
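The shift of the intermediate SS towards s = 0 for larger b can be made quantitative directly from the payoff entries derived in the model section: against a pure defector (s' = 0), a strategy-s player receives S = 0 while the defector collects T = b·s per link, so lowering s scales down the exploitation income linearly, whereas the within-cluster payoff R(s) = s² + b(1−s)s falls more slowly. A small illustrative computation (our own sketch; b = 1.1 is an arbitrary sample value, not a parameter singled out in the Letter):

```python
def R(s, b):
    """Mutual payoff per link inside a cluster of strategy-s players."""
    return s * s + b * (1.0 - s) * s

def T_exploit(s, b):
    """Income per link of a pure defector (s' = 0) exploiting a strategy-s player."""
    return b * s

b = 1.1  # sample temptation value (illustrative)
for s in (1.0, 0.75, 0.5, 0.25):
    ratio = R(s, b) / T_exploit(s, b)
    print(f"s = {s:4.2f}:  R = {R(s, b):.3f},  "
          f"defector's gain = {T_exploit(s, b):.3f},  R/T = {ratio:.3f}")
```

The ratio R/T = (s + b(1−s))/b grows as s decreases, so a more intermediate strategy improves a cluster's income relative to what boundary defectors extract, at the cost of a lower absolute payoff — consistent with the intermediate SSs observed in Figs. 4 and 5.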
We can infer that small q values constrain the number of affordable intermediate strategies, so that when b varies within certain intervals, some strategy is always stronger than the others and its position remains unchanged. This explains why the step-like behavior of ρ_c vs. b appears only for small q values, as shown in Fig. 1(b). The same reasoning explains why, when b is small, the cooperation level is lower for large q, as also shown in Fig. 1(b). So the small-q regime is in some sense more interesting on the 1D lattice.

We have shown above that the features of the q-strategy game are directly reflected in the distributions of the SSs. To better understand these distributions, it is instructive to compare pairwise the relative strength and weakness of the strategies in the strategy set Ω_q. We do this by starting from pairs of strategies {s, s'} ⊂ Ω_q and comparing their stationary densities ρ_s and ρ_{s'} (with ρ_s + ρ_{s'} = 1). Given that pure defection is always an SS on the 2D lattice, here we simply consider the cases with fixed s = 0 and s' varying in Ω_q \ {0}. The result for q = 21 and different values
of b is shown in Fig. 6. One can find that when b is a bit larger, i.e., when b = 1.03, 1.05 or 1.07, s' can coexist with the pure defectors only when it lies in an intermediate interval. We may assert that these strategies are optimized by, on the one hand, reducing the exploitation by pure defection and, on the other hand, keeping the spatial reciprocity effect not too low. This is the fuller reason why some intermediate strategies survive when b is larger, as shown in Fig. 4. One can also find in Fig. 6 that the upper bound of the coexistence interval decreases as b increases, while the lower bound always stays at around s = 0.2. We have checked several other strategy pairs {s, s'} with 0 < s < s' ⩽ 1 and found that, similar to the {0, s'} case, s' can coexist with s only if s' is neither too close to nor too far from s. This feature also gives some hint as to why there are so few SSs: in Fig. 4(h), for example, there are merely two SSs, s = 0 and s ≈ 0.53. Strategies larger than 0.53 are all completely defeated, which is easy to understand, as they are too cooperative. Surprisingly, strategies between 0 and 0.53 cannot survive either; this may be because their excessive defection makes them no longer able to support each other. Therefore much narrower ranges of strategies in Ω_q can coexist with each other, eventually leaving very few SSs. Of course, this is a rough argument, since starting from q strategies, the emergence of the final SSs is a complex process involving not merely the competition between 0 and one s' as in Fig. 6, but also the competition among all other strategies in Ω_q.

4. Conclusion

In this work we have generalized the usual two-strategy spatial prisoner's dilemma game to a fractional-valued q-strategy game. We have proved that this extension still constitutes a dilemma game. The fractional-valued strategy diminishes the absolute distinction between cooperation and defection and is thus more realistic.
We have found that the multi-strategy game can promote cooperation compared with the two-strategy game. We have analyzed the distribution of the final strategies and found that it is the intermediate strategies that sustain cooperation. The reason is that the intermediate strategies lie in optimized regions where they can reduce the exploitation by more defective players while keeping the spatial reciprocity effect not too low. We hope this work can help to resolve the puzzle of social cooperation.

Finally, we compare our work with several previous works. In [37], Fort studied a model in which the system evolves from a completely heterogeneous distribution of payoff matrices, with players able to adopt each other's payoff matrices. The similarity between our work and [37] is that in both, unlike many other works where the whole population shares a unique payoff matrix, the payoff matrix can vary from player pair to player pair; moreover, in both works only very few payoff matrices can coexist once the system reaches the stationary state. The difference is that in [37] the heterogeneity of the payoff matrices is realized by randomly assigning payoff matrices to the players, while in our work it is realized by recasting the original binary-strategy payoff matrix through the fractional-valued strategies. A similar phenomenon of disappearing heterogeneity in the population was reported in [38], where the authors studied models with player-specific noise levels and found that the system evolves to a stationary state with only one noise value selected. In [39], Vukov et al. considered the effects of incipient cognition in the spatial PDG. In that work, players' strategies are characterized by two parameters (p, q) which determine how a player reacts to its counterpart according to its incipient cognition.
The strategy parameters (p, q) can take many different fractional values, which is similar to the fractional strategy s in our work. But the meanings of the strategies in the two works are
quite different: in [39] strategies are reactive in nature, that is, players behave differently towards different neighbors, while in our work we emphasize intermediate behavior, and in each time step a player holds the same strategy towards all neighbors. In [40], Hauert et al. studied the effects of spatial structure in snowdrift games. They considered both pure and mixed strategies and found that in either case spatial structure often inhibits the evolution of cooperation. In contrast, in this work we systematically investigated mixed strategies in the spatial PDG and found that mixed strategies do play a positive role compared with pure strategies: they can further promote cooperation by reducing the exploitation by pure defectors while to some extent retaining the spatial reciprocity effect.

Acknowledgements

We would like to thank Dr. Wen-Bo Du for useful discussions. Z.-H.L. and H.-Y.F. acknowledge support from the National Natural Science Foundation of China under Grant No. 10874174.

References

[1] E. Pennisi, Science 325 (2009) 1196.
[2] R. Axelrod, W.D. Hamilton, Science 211 (1981) 1390.
[3] J. Maynard Smith, Evolution and the Theory of Games, Cambridge University Press, 1982.
[4] M. Nowak, O. Pekonen, J. Pastor, Math. Intell. 30 (2008) 64.
[5] M.A. Nowak, R.M. May, Nature 359 (1992) 826.
[6] G. Szabó, J. Vukov, A. Szolnoki, Phys. Rev. E 72 (2005) 047107.
[7] C. Hauert, G. Szabó, Am. J. Phys. 73 (2005) 405.
[8] G. Szabó, G. Fath, Phys. Rep. 446 (2007) 97.
[9] N. Masuda, K. Aihara, Phys. Lett. A 313 (2003) 55.
[10] F. Santos, J. Pacheco, Phys. Rev. Lett. 95 (2005) 98104.
[11] J. Vukov, G. Szabó, A. Szolnoki, Phys. Rev. E 73 (2006) 067103.
[12] J. Vukov, G. Szabó, A. Szolnoki, Phys. Rev. E 77 (2008) 026109.
[13] W. Du, H. Zheng, M. Hu, Physica A 387 (2008) 3796.
[14] J. Ren, W. Wang, F. Qi, Phys. Rev. E 75 (2007) 045101.
[15] M. Perc, New J. Phys. 8 (2006) 22.
[16] M. Perc, New J. Phys. 8 (2006) 183.
[17] M. Perc, M. Marhl, New J. Phys. 8 (2006) 142.
[18] A. Szolnoki, M. Perc, G. Szabó, Phys. Rev. E 80 (2009) 056109.
[19] Z. Wu, X. Xu, Z. Huang, S. Wang, Y. Wang, Phys. Rev. E 74 (2006) 021107.
[20] A. Szolnoki, G. Szabó, EPL 77 (2007) 30004.
[21] M. Perc, A. Szolnoki, Phys. Rev. E 77 (2008) 011904.
[22] W. Du, X. Cao, L. Zhao, M. Hu, Physica A 388 (2009) 4509.
[23] H. Yang, W. Wang, Z. Wu, Y. Lai, B. Wang, Phys. Rev. E 79 (2009) 056107.
[24] M. Zimmermann, V. Eguíluz, M. San Miguel, Phys. Rev. E 69 (2004) 065102.
[25] M. Zimmermann, V. Eguíluz, Phys. Rev. E 72 (2005) 056118.
[26] J. Pacheco, A. Traulsen, M. Nowak, Phys. Rev. Lett. 97 (2006) 258103.
[27] A. Szolnoki, M. Perc, Z. Danku, EPL 84 (2008) 50007.
[28] A. Szolnoki, M. Perc, New J. Phys. 11 (2009) 093033.
[29] A. Szolnoki, M. Perc, EPL 86 (2009) 30007.
[30] F. Fu, T. Wu, L. Wang, Phys. Rev. E 79 (2009) 036101.
[31] M. Perc, A. Szolnoki, BioSystems 99 (2010) 109.
[32] D. Helbing, W. Yu, Proc. Natl. Acad. Sci. USA 106 (2009) 3680.
[33] S. Meloni, A. Buscarino, L. Fortuna, M. Frasca, J. Gómez-Gardeñes, V. Latora, Y. Moreno, Phys. Rev. E 79 (2009) 067101.
[34] K. Huang, Statistical Mechanics, 2nd ed., Wiley India Pvt. Ltd., 2008.
[35] F. Wu, Rev. Mod. Phys. 54 (1982) 235.
[36] G. Szabó, C. Toke, Phys. Rev. E 58 (1998) 69.
[37] H. Fort, EPL 81 (2008) 48008.
[38] A. Szolnoki, J. Vukov, G. Szabó, Phys. Rev. E 80 (2009) 056112.
[39] J. Vukov, F.C. Santos, J.M. Pacheco, PLoS ONE 6 (2011) e17939.
[40] C. Hauert, M. Doebeli, Nature 428 (2004) 643; a simulation of the spatial PDG with mixed strategies can be found at http://www.univie.ac.at/virtuallabs/Snowdrift/struct.mixed.pd.html.