Applied Mathematics and Computation 277 (2016) 44–53
Two population three-player prisoner's dilemma game

Essam EL-Seidy a,∗, Entisarat M. Elshobaky a, Karim M. Soliman b

a Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, Egypt
b Department of Mathematics, Zewail City of Science and Technology, Giza, Egypt
Keywords: Iterated games; Prisoner's dilemma; Payoff matrix; Symmetric games; Asymmetric games; Evolutionary games
Abstract

Due to the computational advantage of symmetric games, most research has focused on symmetric games rather than asymmetric ones, which need more computation. In this paper, we present a prisoner's dilemma game involving three players and suppose that two of the players form an alliance against the third by choosing either to cooperate together or to defect together at each round. Under this assumption, the game is transformed from a symmetric three-player model into an asymmetric two-player model, in which the identities of the players cannot be interchanged without interchanging the payoffs of the strategies. Each strategy in the resulting model is expressed by a two-state automaton. We determine the payoff matrix corresponding to all possible strategies, and we notice that, for some strategies, it is better to be a player of the first type (the independent player) than of the second type (the allies).
1. Introduction

Game theory has become a key tool across many disciplines. The prisoner's dilemma (PD) is a traditional game model for the study of decision-making and self-interest [1,2]. It is only one of many illustrative examples of the logical reasoning and complex decisions involved in game theory; the mechanisms that drive the (PD) are the same as those faced by marketers, military strategists, poker players, and many other types of competitors [3–5], and this dilemma can multiply into hundreds of other more complex dilemmas. A plethora of disciplines have studied the game, including artificial intelligence, economics [6,7], biology [8], physics, networks [9], business [10], mathematics [11,12], philosophy, public health, ecology, traffic engineering [13], sociology and computer science [14,15].

In the prisoner's dilemma, two players are faced with a choice: they can either cooperate or defect. Each player is awarded points (called the payoff) depending on his/her choice compared to the choice of the opponent. Each player's decision must be made without knowledge of the other player's next move, and there can be no prior agreement between the players concerning the game. If both players cooperate, they both receive a reward, R. If both players defect, they both receive a punishment, P. If one player defects and the other cooperates, the defector receives the temptation payoff, T,
while the player who cooperated is punished with the sucker's payoff, S [16]. The payoff matrix can be represented as follows:
\[
\begin{array}{c|cc}
 & C & D \\ \hline
C & R & S \\
D & T & P
\end{array}
\tag{1}
\]

where T > R > P > S must be satisfied [17].
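As a small illustration (our sketch, not part of the original argument), the dominance reasoning discussed next can be checked mechanically; the concrete values T, R, P, S = 5, 3, 1, 0 are Axelrod's classic choices and are an assumption here:

```python
# Sketch: with payoffs satisfying T > R > P > S (Axelrod's classic values
# assumed here), D is the best response to both C and D, so mutual
# defection (D, D) is the Nash equilibrium of the one-shot game.
T, R, P, S = 5, 3, 1, 0
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

def best_response(opponent_move):
    """The move maximizing one's own payoff against a fixed opponent move."""
    return max('CD', key=lambda m: payoff[(m, opponent_move)])

print(best_response('C'), best_response('D'))  # -> D D
```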
If a rational player thinks that his/her opponent will cooperate, he/she will defect to receive T points, as opposed to cooperation, which would have earned him/her only R points. Moreover, if the rational player thinks that his/her opponent will defect, he/she will also defect and receive P points rather than cooperate and receive the sucker's payoff of S points. Therefore, the rational decision is to always defect [18]. But assuming the other player is also rational, he/she will come to the same conclusion as the first player. Thus, both players will always defect, earning P points rather than the R points that mutual cooperation could have yielded: defection is the dominant strategy of this game (the Nash equilibrium). This holds true as long as the payoffs follow the relationship T > R > P > S and the gain from mutual cooperation exceeds the average score for defection and cooperation, R > (S + T)/2.

The iterated prisoner's dilemma (IPD) is an interesting variant of the (PD): the dominance of mutual defection relies on the fact that the one-shot game has no future. The key to the (IPD) is that the two players may meet each other again, which allows them to develop their strategies based on the previous game interactions [19]. A player's move now may therefore affect how his/her opponent behaves in the future, and thus affect the player's future payoffs. This removes the single dominant strategy of mutual defection, because the players use more complex strategies that depend on the game history to maximize the payoffs they receive. In fact, under the right circumstances mutual cooperation can emerge [10,20].

Xia et al. have focused on the weak prisoner's dilemma on random and scale-free (SF) networks, have shown that degree-uncorrelated activity patterns on scale-free networks significantly impair the evolution of cooperation, and have studied how heterogeneous coupling strength affects the evolution of cooperation in the prisoner's dilemma game under two types of coupling schemes (symmetric and asymmetric) [21]. The symmetric coupling-strength setup leads to higher cooperation than the asymmetric one; that is, asymmetric coupling loses the evolutionary advantage. Their results convincingly demonstrate that the emergence or persistence of cooperation within many real-world systems can be accounted for by the interdependency between meta-populations or sub-systems [22]. Moreover, they put forward an improved traveler's dilemma game model on two coupled lattices to investigate the effect of coupling on the evolution of cooperation, where the coupling between the two lattices enters the strategy imitation process, and they showed that the cooperation behavior can vary greatly compared to that obtained on traditional single lattices [23]. These results are conducive to understanding the cooperation behavior of the traveler's dilemma game in many real-world systems, especially coupled and interdependent networked systems. Integrating the coupling effect between corresponding players on two lattices, they noticed that the coupling or correlation strength between the two lattices observably influences the process of strategy imitation, and further changes the persistence and emergence of cooperation in the whole system [24].
Perc and Szolnoki were also interested in the enhancement of cooperation and in the impact of diverse activity patterns on the evolution of cooperation in evolutionary social dilemmas [25–28]. Wang et al. studied the evolution of public cooperation on two interdependent networks connected by means of a utility function, which determines to what extent payoffs in one network influence the success of players in the other network [9,29]. They also showed that the percolation threshold of an interaction graph constitutes the optimal population density for the evolution of public cooperation, demonstrating this with outcomes of the public goods game on the square lattice with and without an extended imitation range, as well as on the triangular lattice [30–32]. Importantly, they found that for cooperation to be optimally promoted, the interdependence should stem only from an intermediate fraction of links connecting the two networks, and that those links should affect the utility of players significantly [33]. More recently, they studied the evolution of cooperation in the public goods game on interdependent networks subject to interconnectedness by means of a network-symmetric definition of utility, where strategy imitation is allowed only between players residing on the same network, not between players on different networks. They showed that, in general, increasing the weight of the average payoff of nearest neighbors at the expense of individual payoffs in the evaluation of utility increases the survivability of cooperators [34,35], and that the interdependence between networks self-organizes so as to yield optimal conditions for the evolution of cooperation [36].

Game theory has been extended into evolutionary biology, which has generated great insight into the evolution of strategies under both biological and cultural evolution. The replicator equation, a set of differential equations describing how the strategies of a population evolve over time under selective pressure, has also been used to study learning in various scenarios [37–39]. There are various approaches to constructing dynamics in repeated games [40–42]. Kleimenov and Schneider proposed an approach to constructing dynamics in the repeated three-person game as a tool for solving various optimization problems, for example the problem of minimizing the time of using abnormal behavior types; in their approach, two players act in the class of mixed strategies and the third player acts in the class of pure strategies [43,44]. Matsushima and Ikegami discussed the similarity between a noisy 2p-IPD and a noiseless 3p-IPD game, where the role of noise in
the two-person game is replaced by the third player in the three-person game. It is known that, due to noise, Tit for Tat loses its robustness in the noisy 2p-IPD game and is taken over by more complex strategies; in the 3p-IPD game, even without noise, Tit for Tat likewise loses its robustness and is taken over by more complex strategies [45]. They found that similar strategies take over from Tit for Tat in both situations, and that game strategies in automaton form can be understood as a combination of defensive and offensive substructures; recognizing these substructures enabled them to study the mechanism of robustness in the strategies of the 3p-IPD game.

2. Three-player prisoner's dilemma game (3p-PD)

2.1. One shot game

Most game theory research on the prisoner's dilemma has focused on two-player games, but the game can be created with three or even more players, and the strategies from the two-player game do not necessarily extend to a three-player game in a natural way. We consider a simple game with three players in which each player has two pure strategies, C and D, and each round leads to one of the eight possible outcomes CCC, CCD, CDC, CDD, DCC, DCD, DDC or DDD, where the first position represents the player under consideration and the second and third positions represent the opponents. For example, DCD represents the payoff to a defecting player if one of his two opponents cooperates and the other defects. Since we assume a symmetric game matrix, XCD can be written as XDC, where X may be C or D. These outcomes are assigned the payoffs R, K, S, T, L and P, numbered i = 1, 2, 3, 4, 5, 6, respectively. We impose three rules on the payoffs of the (3p-PD):

1. Defection should be the dominant choice for each player: it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints:
   (a) DCC > CCC (T > R) (both opponents cooperate).
   (b) DCD > CCD (L > K) (one opponent cooperates, the other defects).
   (c) DDD > CDD (P > S) (both opponents defect).
2. A player should always be better off if more of his opponents choose to cooperate. This rule gives two constraints:
   (a) CCC > CCD > CDD (R > K > S).
   (b) DCC > DCD > DDD (T > L > P).
3. If one player's choice is fixed, the other two players should be left in a (2p-PD). This rule gives two constraints:
   (a) CCD > DDD (K > P).
   (b) CCC > DCD (R > L).

Finally, suppose the payoff matrix of the (3p-PD) is
\[
\begin{array}{c|ccc}
 & CC & CD & DD \\ \hline
C & R & K & S \\
D & T & L & P
\end{array}
\tag{2}
\]

where T > R > L > K > P > S.
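As a quick check (our sketch, not from the original text), the three rules of Section 2.1 can be verified mechanically for a candidate payoff assignment; the concrete values below are the Axelrod-like payoffs adopted later in Section 3:

```python
# Sketch: verify the three (3p-PD) rules of Section 2.1 for given payoffs.
def is_3p_pd(T, R, L, K, P, S):
    rule1 = T > R and L > K and P > S   # defection dominates
    rule2 = R > K > S and T > L > P     # more cooperating opponents is better
    rule3 = K > P and R > L             # an embedded (2p-PD) remains
    return rule1 and rule2 and rule3

# Axelrod-like values used in Section 3; they satisfy T > R > L > K > P > S.
print(is_3p_pd(T=9, R=7, L=5, K=3, P=1, S=0))  # -> True
```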
Assume a rational player under consideration who wants to maximize his/her reward and thinks that his/her opponents will cooperate; then he/she will defect to receive T points, as opposed to cooperation, which would have earned him/her only R points. If the rational player thinks that his/her opponents will defect, then he/she will also defect and receive P points rather than cooperate and receive the sucker's payoff of S points. Moreover, if the rational player thinks that one of his/her opponents will defect and the other will cooperate, then he/she will also defect and receive L points rather than cooperate and receive K points. Therefore, the rational decision is to always defect. But assuming the opponents are also rational, they will come to the same conclusion and defect. Thus, all players will always defect, earning P points rather than the R points that cooperation could have yielded: defection is the dominant strategy of the (3p-PD) (the Nash equilibrium). This holds true as long as the payoffs follow the relationship T > R > L > K > P > S.

2.2. Infinitely iterated game (3p-IPD)

Consider now the iterated game in which the simple game is repeated infinitely, i.e. with probability 1 of repeating the game. In the above discussion of the (3p-PD), the dominant defection strategy relies on the fact that it is a one-shot game with no future. The key to the (3p-IPD) is that the three players may meet each other again, which allows them to develop strategies based on the previous game interactions. A player's move now may therefore affect how his/her opponents behave in the future, and thus affect the player's future payoffs. This removes the single dominant strategy of defection, because the players use more complex strategies dependent on game history to maximize the payoffs they will receive. We assume that the three players take their decisions according to the opponents' last choice only; under this assumption we call our game a (3p-IPD) with memory one. The length of the (3p-IPD) (i.e. the number of repetitions of the dilemma) must not be known to the players; otherwise all players would defect.
Fig. 1. Some examples of automata: (a) automaton representing (TFT1) S36: it keeps playing C only if its two opponents both play C, it moves from state C to state D if even one of its two opponents plays D, and it keeps playing D unless its two opponents both play C again. (b) Automaton representing (TFT2) S38: it keeps playing C only if its two opponents both play C, it moves from state C to state D if even one of its two opponents plays D, and it keeps playing D only if its two opponents both play D again. (c) Automaton representing (TFT3) S52: it keeps playing C if at least one of its two opponents plays C, it moves from state C to state D if both opponents play D, and it keeps playing D unless its two opponents both play C again. (d) Automaton representing (TFT4) S54: it keeps playing C if at least one of its two opponents plays C, it moves from state C to state D if both opponents play D, and it keeps playing D unless at least one of its two opponents plays C again. (e) Automaton representing (ALLC) S63: it plays C forever. (f) Automaton representing (ALLD) S0: it plays D forever.
In the (3p-IPD), every player has two choices, to defect or to cooperate, after each of the six outcomes T, R, L, K, P, S, so the total number of strategies is 2^6 = 64. The 64 possible strategies can be labeled by sextuples (u1, u2, u3, u4, u5, u6) of zeros and ones. Here, ui is 1 if the player plays C and 0 if he/she plays D after outcome i (i = 1, 2, ..., 6). For convenience, we label these rules by Sj, where j ranges from 0 to 63 and is the integer given in binary notation by u1 u2 u3 u4 u5 u6 [11,17]. We can describe our strategies by finite state automata, more precisely by two-state automata. Each of the three players is now an automaton which can be in one of two states in any given round of the (3p-IPD); these states correspond to the two possible moves C and D. The state of the player in the following round depends on the present state and on the opponents' moves. Hence, each such automaton is specified by a graph with two nodes C and D and three oriented edges issuing from each node, labeled by the opponents' joint moves CC, CD and DD, which specify the transition from the current state to the state in the next round [17]. For example, the transition rule (1, 0, 0, 1, 1, 0) represents the strategy S38 shown in Fig. 1. How one rule fares against another depends, of course, on the initial conditions [10]. Let us consider, for instance, an automaton with rule S36 (a retaliator that never relents after a defection by any one of its opponents unless they both cooperate again) against the two automata S38 (a retaliator, but more forgiving than S36) and S52 (slower to anger than S36, but just as slow to forgive).

(a) If all three automata start with C, they will keep playing C forever. The sequence looks as follows:
S36: C C C C C C C C C ...
S38: C C C C C C C C C ...
S52: C C C C C C C C C ...
(b) If all three automata start with D, the sequence looks as follows:

S36: D D D D D D D D D ...
S38: D D D D D D D D D ...
S52: D D D D D D D D D ...
(c) If S36 and S38 start with C while S52 starts with D, the sequence looks as follows:

S36: C D D D D D D D D ...
S38: C D C D D D D D D ...
S52: D C D D D D D D D ...
(d) If S36 and S52 start with C while S38 starts with D, the sequence looks as follows:

S36: C D C D C D C D C ...
S38: D C D C D C D C D ...
S52: C C C C C C C C C ...
(e) If S38 and S52 start with D while S36 starts with C, the sequence looks as follows:

S36: C D D D D D D D D ...
S38: D C D D D D D D D ...
S52: D D D D D D D D D ...
(f) If S38 and S52 start with C while S36 starts with D, the sequence looks as follows:

S36: D C D C D C D C D ...
S38: C D C D C D C D C ...
S52: C C C C C C C C C ...
(g) If S36 and S52 start with D while S38 starts with C, the sequence looks as follows:

S36: D D D D D D D D D ...
S38: C D D D D D D D D ...
S52: D D D D D D D D D ...
(h) If S36 and S38 start with D while S52 starts with C, the sequence looks as follows:

S36: D D D D D D D D D ...
S38: D C D D D D D D D ...
S52: C D D D D D D D D ...
The payoff in the infinitely repeated game is simply the average payoff per round. In our example, for the player using strategy S36, the payoff is R in case (a), (K + T)/2 in cases (d) and (f), and P in cases (b), (c), (e), (g) and (h). The same holds for the player using strategy S38: R in case (a), (K + T)/2 in cases (d) and (f), and P in the remaining cases. For the player using strategy S52, the payoff is R in case (a), K in cases (d) and (f) (it cooperates in every round while its two opponents alternate, one cooperating and the other defecting), and P in cases (b), (c), (e), (g) and (h). Note that the payoffs are independent of the players' moves in the first round. We can use a more direct approach [17], where the eight possible initial conditions lead (in unperturbed runs) to three possible regimes A, B and E: A denotes the run where all three players play C; B is the run where the S52-player always plays C while, of the other two players, one plays C and the other plays D in each round; and E denotes the run where all three players play D. Suppose we are in regime A; a rare perturbation causes one of the three players to play D, which initiates scenario (f), (d) or (c), and hence leads after a few steps with probability 2/3 to regime B and with probability 1/3 to regime E. A perturbation in regime B leads with probability 1/3 to regime A and with probability 2/3 to regime E. A perturbation in regime E always leads back to regime E. The corresponding transition matrix is
\[
\begin{pmatrix}
0 & \frac{2}{3} & \frac{1}{3} \\
\frac{1}{3} & 0 & \frac{2}{3} \\
0 & 0 & 1
\end{pmatrix}
\tag{3}
\]
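The stationary distribution stated next can be checked numerically; a minimal NumPy sketch (our illustration), with the regimes ordered A, B, E:

```python
import numpy as np

# Left eigenvector of the transition matrix (3) for eigenvalue 1,
# i.e. the stationary distribution over the regimes A, B and E.
M = np.array([[0, 2/3, 1/3],
              [1/3, 0, 2/3],
              [0, 0, 1]])
w, v = np.linalg.eig(M.T)                     # left eigenvectors of M
pi = np.real(v[:, np.isclose(w, 1)]).ravel()
print(pi / pi.sum())                          # -> [0. 0. 1.]
```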
The corresponding stationary distribution vector (the eigenvector corresponding to the eigenvalue 1) is (0, 0, 1). This means that an iterated game between S36, S38 and S52 will always end up in regime E; thus the S36-player receives an average payoff P per round. This argument, repeated for each of the 64 × 64 × 64 = 262,144 strategy triples, yields 64 payoff matrices, each of size 64 × 64. In [46], we calculate all payoffs corresponding to the 64 different strategies using an algorithm implemented in a programming language.

3. Two population (3p-IPD)

In this section, we suppose that two players act as one unit (allies) by choosing either C together or D together against the player under consideration (the independent player), who can play either C or D. This assumption reduces the 64 strategies to only 16 and transforms the symmetric (3p-IPD) into an asymmetric (2p-IPD) game. For example, the four kinds of TFTY, Y = 1, 2, 3, 4, defined in Fig. 1 all behave as S10 defined in Fig. 2. An asymmetric game is a strategic confrontation between two players, called a non-zero-sum game because the interests of the players are not required to be exactly opposed to each other. Therefore, we have a game with two types of players: the first type is the player under consideration, denoted player I, who plays either C or D, while the second type is the pair of allies, denoted player II, who play C together (written C∗) or D together (written D∗). Suppose that this game is infinitely repeated, and let E and F denote the payoff matrices of the first and second type, respectively; then we have:
\[
E = \begin{array}{c|cc}
 & C^* & D^* \\ \hline
C & R & S \\
D & T & P
\end{array}
\tag{4}
\]
and

\[
F = \begin{array}{c|cc}
 & C & D \\ \hline
C^* & R & K \\
D^* & L & P
\end{array}
\tag{5}
\]
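For concreteness, the bimatrix (E, F) can be written out with the Axelrod-like values used later in this section (a sketch; the variable names are ours):

```python
import numpy as np

# The payoff matrices (4) and (5) with T=9, R=7, L=5, K=3, P=1, S=0.
T, R, L, K, P, S = 9, 7, 5, 3, 1, 0
E = np.array([[R, S],   # player I : rows C, D;   columns C*, D*
              [T, P]])
F = np.array([[R, K],   # player II: rows C*, D*; columns C, D
              [L, P]])
```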
Fig. 2. Some examples of automata: (a) S8, (b) S14, (c) S10 are some strategies of the first type, where S10 is Tit-for-Tat of the first type; (d) S8∗, (e) S14∗, (f) S10∗ are some strategies of the second type, where S10∗ is Tit-for-Tat of the second type.
Each round leads to one of the four possible outcomes H1 = (C, C∗), H2 = (C, D∗), H3 = (D, C∗) or H4 = (D, D∗), where the first and second positions denote the options chosen by player I and player II, respectively. Numbering the outcomes H1, H2, H3 and H4 by i = 1, 2, 3, 4, there are 16 possible transition rules for each player, labeled by quadruples (u1, u2, u3, u4) for player I and (v1, v2, v3, v4) for player II, where the ui, vi are zeros and ones. Here, ui is 1 if player I plays C after outcome i, and similarly vi is 1 if player II plays C∗ after outcome i. The quadruple (u1, u2, u3, u4) can be read in binary notation; for example, (1, 0, 1, 0) is the tenth strategy, denoted S10. We denote the strategies of player I by Sj and those of player II by Sj∗, j = 0, 1, ..., 15. Fig. 2 shows automaton diagrams for some strategies of the first and second type. Suppose that player I, adopting strategy S = (u1, u2, u3, u4), is matched against player II using S∗ = (v1, v2, v3, v4); then, for the outcomes H1, H2, H3 and H4 of the players' strategies, the Markov matrix of the transition rules between these outcomes is given by:
\[
M = \begin{pmatrix}
u_1 v_1 & u_1(1-v_1) & (1-u_1)v_1 & (1-u_1)(1-v_1) \\
u_2 v_2 & u_2(1-v_2) & (1-u_2)v_2 & (1-u_2)(1-v_2) \\
u_3 v_3 & u_3(1-v_3) & (1-u_3)v_3 & (1-u_3)(1-v_3) \\
u_4 v_4 & u_4(1-v_4) & (1-u_4)v_4 & (1-u_4)(1-v_4)
\end{pmatrix}
\tag{6}
\]
For example, (1 − u2 ) (1 − v2 ) means that, the transition probability from state H2 to state H4 , where player I play C with probability (1 − u2 ) after outcome H2 , while player II play C∗ with probability (1 − v2 ) after outcome H2 . The probability vector π = (π1 , π2 , π3 , π4 ) represents the stationary distribution of player I for n (number of rounds) → ∞, where π i is greater than or equal to 0 for all i = 1, 2, 3, 4 with total sum equal to 1. Thus, the probability vector π is a left eigenvector of the matrix M corresponding to the eigenvalue 1, so we have π M = π . If ui , vi are inside of the strategy cube, then all Table 1 From Table 1 we can calculate the payoff values for player I and II as explained in Section 3. For example, the element (0122) in the fifth row and second column is leading to probability vector π = (0, 15 , 25 , 25 ) for a player I of first type using the strategy S4 against a player II of second type using the strategy S1∗ such that, the expected payoff of the player I = 0 + 15 S + 25 T + 25 P, and the expected payoff of the player II = 0 + 15 L + 25 K + 25 P.
Table 1
The quadruples (n1 n2 n3 n4) for strategy Sj of player I (rows) against strategy Sl∗ of player II (columns), from which the payoff values for players I and II are calculated as explained in Section 3. For example, the element (0122) in the fifth row and second column leads to the probability vector π = (0, 1/5, 2/5, 2/5) for a player I of the first type using strategy S4 against a player II of the second type using strategy S1∗, so that the expected payoff of player I is (1/5)S + (2/5)T + (2/5)P and the expected payoff of player II is (1/5)L + (2/5)K + (2/5)P.

       S0*   S1*   S2*   S3*   S4*   S5*   S6*   S7*   S8*   S9*   S10*  S11*  S12*  S13*  S14*  S15*
S0     0001  0011  0001  0011  0012  0010  0011  0010  0001  0011  0001  0011  0011  0010  0021  0010
S1     0101  1001  0111  1001  0212  1011  0010  1021  0101  1011  0111  1011  0121  0010  0010  0010
S2     0001  0111  0112  0110  0001  1011  0001  1011  0001  0111  0111  0110  1012  1010  2021  1010
S3     0101  1001  0110  1111  0101  1001  1111  1001  0101  1111  0110  0110  1111  1010  1010  1010
S4     0102  0122  0001  0011  0112  0010  0012  0010  0102  0122  0001  0011  0132  0010  0021  0010
S5     0100  1101  1101  1001  0100  1111  1111  1011  0100  1111  1111  1011  0110  0010  0010  0010
S6     0101  0100  0001  1111  0102  1111  0001  1011  0201  0100  1111  1110  1111  2120  2021  1010
S7     0100  1201  1101  1001  0100  1101  1101  1001  0100  0100  1110  1110  1210  2120  1010  1010
S8     0001  0011  0001  0011  0012  0010  0021  0010  0001  1022  0001  1022  1023  1020  1021  1020
S9     0101  1101  0111  1111  0212  1111  0010  0010  1202  1000  1111  1000  1111  2010  1020  1010
S10    0001  0111  0111  0110  0001  1111  1111  1110  0001  1111  1111  1110  1001  1000  1000  1000
S11    0101  1101  0110  0110  0101  1101  1110  1110  1202  1000  1110  2110  2101  1000  1000  1000
S12    0101  0211  1102  1111  0312  0110  1111  1120  1203  1111  1001  2011  1111  2130  3021  1010
S13    0100  0100  1100  1100  0100  0100  2210  2210  1200  2100  1000  1000  2310  2110  2010  2010
S14    0201  0100  2201  1100  0201  0100  2201  1100  1201  1200  1000  1000  3201  2100  1000  1000
S15    0100  0100  1100  1100  0100  0100  1100  1100  1200  1100  1000  1000  1100  2100  1000  1000
Table 2
The expected payoff values for player I, using the values T = 9, R = 7, P = 1, S = 0. For example, the expected payoff of a player I of the first type using strategy S4 against a player II of the second type using strategy S1∗ is equal to 4, placed in the fifth row and second column.
[Table 2 body: the 16 × 16 matrix of expected payoffs for player I (rows S0–S15, columns S0∗–S15∗); each entry is obtained from the corresponding Table 1 quadruple as π1 R + π2 S + π3 T + π4 P.]
Now we use another, more direct technique to compute the probability vector π, illustrated by the following example. Consider player I using strategy S8 with the rule (1, 0, 0, 0) against player II using strategy S14∗ with the rule (1, 1, 1, 0); for the different initial states we have:

1. If player I starts with C and player II starts with C∗, the sequence looks as follows:
S8:   C  C  C  C  C  C  C  C  C  ...
S14*: C* C* C* C* C* C* C* C* C* ...
In this case player I has the payoff R and player II has the payoff R.

2. If player I starts with C and player II starts with D∗, the sequence looks as follows:
S8:   C  D  D  D  D  D  D  D  D  ...
S14*: D* C* C* C* C* C* C* C* C* ...
In this case player I has the payoff T and player II has the payoff K.

3. If player I starts with D and player II starts with C∗, the sequence looks as follows:
S8:   D  D  D  D  D  D  D  D  D  ...
S14*: C* C* C* C* C* C* C* C* C* ...
In this case player I has the payoff T and player II has the payoff K.
Table 3
The expected payoff values for player II, using the values R = 7, L = 5, K = 3, P = 1. For example, the expected payoff of a player II of the second type using strategy S1∗ against a player I of the first type using strategy S4 is equal to 13/5, placed in the second row and fifth column.
[Table 3 body: the 16 × 16 matrix of expected payoffs for player II (rows S0∗–S15∗, columns S0–S15); each entry is obtained from the corresponding Table 1 quadruple as π1 R + π2 L + π3 K + π4 P.]
Table 4
The payoff values for a player I of the first type using strategy Sm against a player II of the second type using Sm∗, for every m = 0, 1, ..., 15. Player I's payoff is always greater than or equal to player II's, so it is at least as good, and for several strategies strictly better, to be a player of the first type than of the second type.

  m     Player I payoff           Player II payoff
  0     P = 1                     P = 1
  1     (R+P)/2 = 4               (R+P)/2 = 4
  2     (S+T+2P)/4 = 11/4         (K+L+2P)/4 = 10/4
  3     (R+S+T+P)/4 = 17/4        (R+K+L+P)/4 = 16/4
  4     (S+T+2P)/4 = 11/4         (K+L+2P)/4 = 10/4
  5     (R+S+T+P)/4 = 17/4        (R+K+L+P)/4 = 16/4
  6     P = 1                     P = 1
  7     (R+P)/2 = 4               (R+P)/2 = 4
  8     P = 1                     P = 1
  9     R = 7                     R = 7
  10    (R+S+T+P)/4 = 17/4        (R+K+L+P)/4 = 16/4
  11    (2R+S+T)/4 = 23/4         (2R+K+L)/4 = 22/4
  12    (R+S+T+P)/4 = 17/4        (R+K+L+P)/4 = 16/4
  13    (2R+S+T)/4 = 23/4         (2R+K+L)/4 = 22/4
  14    R = 7                     R = 7
  15    R = 7                     R = 7
Fig. 3. Some strategy behaviors: (a) the behavior of a player of the first type using the tenth strategy S10 against all strategies of the second type; (b) the behavior of a player of the second type using the tenth strategy S10∗ against all strategies of the first type; (c) the behavior of a player of the first type using the strategy S0 against all strategies of the second type; (d) the behavior of a player of the second type using the strategy S0∗ against all strategies of the first type. We can see that it is better to be a player of the first type than of the second type.
4. If player I starts with D and player II starts with D∗, the sequence looks as follows:
S8:   D  D  D  D  D  D  D  D  D  ...
S14*: D* D* D* D* D* D* D* D* D* ...
In this case player I has the payoff P and player II has the payoff P.

We can use a more direct approach [10], where the four possible initial conditions lead (in unperturbed runs) to three possible regimes A, B and E: A denotes the run where player I always plays C and player II always plays C∗; B the run where player I always plays D and player II always plays C∗; and E the run where player I always plays D and player II always plays D∗. Suppose we are in regime A; a rare perturbation always leads to regime B. Suppose now that
a perturbation occurs in regime B; it leads with probability 1/2 to regime A and with probability 1/2 to regime E. A perturbation in regime E always leads to regime B. The corresponding transition matrix is:
\[
\begin{pmatrix}
0 & 1 & 0 \\
\frac{1}{2} & 0 & \frac{1}{2} \\
0 & 1 & 0
\end{pmatrix}
\tag{7}
\]
The corresponding stationary distribution vector is (1/4, 1/2, 1/4). Thus, the expected payoff for player I will be (R + 2T + P)/4, and the expected payoff for player II will be (R + 2K + P)/4. By repeating this technique for all combinations of the two players' strategies, we get 256 probability vectors, one for each strategy Sj of player I against strategy Sl∗ of player II, j, l = 0, 1, 2, ..., 15. These probability vectors are computed from the entries of Table 1 via the relation πi = ni/(n1 + n2 + n3 + n4), i = 1, 2, 3, 4, where (n1 n2 n3 n4) is the element in the (j + 1)th row and (l + 1)th column of Table 1. Hence, the expected payoff of player I is π1 R + π2 S + π3 T + π4 P, while that of player II is π1 R + π2 L + π3 K + π4 P. We consider payoffs similar to Axelrod's, T = 9, R = 7, L = 5, K = 3, P = 1, S = 0, and calculate the expected payoffs for player I in Table 2 and for player II in Table 3. For example, if a player I of the first type using S1 plays against a player II of the second type using S4∗, we look up the element in the second row and fifth column of Table 1, which is (0212); this leads to the probability vector π = (0, 2/5, 1/5, 2/5). Therefore, the expected payoff of player I is (2/5)S + (1/5)T + (2/5)P = 11/5 (see Table 2), while the expected payoff of player II is (2/5)L + (1/5)K + (2/5)P = 15/5 = 3 (see Table 3).

If player I uses a certain strategy Sm against player II who uses the same strategy Sm∗, we find that player I receives a payoff greater than or equal to that of player II, for every m = 0, 1, 2, ..., 15. For example, if three players use the standard Tit-for-Tat strategy, one of them being player I (S10) and the other two being player II (S10∗), then player I receives (R + S + T + P)/4 = 17/4 while player II receives (R + L + K + P)/4 = 16/4; see Table 4 and Fig. 3.
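The entries of Tables 2–4 can be regenerated directly from the Table 1 quadruples; the sketch below (our illustration) performs exactly the computation described above, and the diagonal list is copied from Table 1:

```python
from fractions import Fraction

T, R, L, K, P, S = 9, 7, 5, 3, 1, 0

def payoffs(cell):
    """Expected payoffs (player I, player II) from a Table 1 cell 'n1n2n3n4'."""
    n = [int(d) for d in cell]
    pi = [Fraction(x, sum(n)) for x in n]
    return (pi[0]*R + pi[1]*S + pi[2]*T + pi[3]*P,   # player I
            pi[0]*R + pi[1]*L + pi[2]*K + pi[3]*P)   # player II

print(payoffs('0212'))   # (S1, S4*): player I gets 11/5, player II gets 3

# Diagonal of Table 1 (Sm against Sm*, m = 0..15), reproducing Table 4:
diagonal = ['0001', '1001', '0112', '1111', '0112', '1111', '0001', '1001',
            '0001', '1000', '1111', '2110', '1111', '2110', '1000', '1000']
assert all(payoffs(c)[0] >= payoffs(c)[1] for c in diagonal)
```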
4. Conclusion

We studied the three-player prisoner's dilemma game (3p-PD), in which each player has two pure strategies C and D. Each round of the game leads to one of the six possible payoffs R = (C, C, C), K = (C, C, D) = (C, D, C), S = (C, D, D), T = (D, C, C), L = (D, C, D) = (D, D, C) or P = (D, D, D), where the first position represents the player under consideration and the second and third positions represent the opponents. For the one-shot game, we showed that defection is always the dominant strategy of the (3p-PD); this holds true as long as the payoffs follow the relationship T > R > L > K > P > S. We then considered the infinitely iterated prisoner's dilemma (3p-IPD), in which the three players may meet each other again and can develop their strategies based on previous game interactions, so that a move now affects how the opponents behave in the future. The single dominant strategy of defection disappears, because the players use more complex strategies dependent on the game history in order to maximize the payoffs they will receive. We assumed that the three players take their decisions according to the opponents' last choice only, i.e. a (3p-IPD) with memory one, and we described the players' strategies by two-state automata whose states correspond to the two possible moves C and D. In [46] we calculated all payoffs corresponding to the 64 different strategies using an algorithm implemented in a programming language; in this paper, we instead supposed that two players act as one unit by choosing either C together or D together against the player under consideration, who can play either C or D. This assumption reduced the 64 strategies to only 16 and transformed the symmetric (3p-IPD) into an asymmetric (2p-IPD) game. We therefore dealt with two types of players: the first type is the player under consideration, denoted player I, who plays either C or D, while the second type is the other two players, denoted player II, who play C together (C∗) or D together (D∗). We supposed that this game is infinitely repeated and calculated the quadruples determining the expected payoffs for players I and II in Table 1. Using payoffs similar to Axelrod's, T = 9, R = 7, L = 5, K = 3, P = 1, S = 0, we calculated the expected payoffs for player I in Table 2 and for player II in Table 3. We found that it is never worse, and for some strategies strictly better, to be a player of the first type than of the second type.

References
[1] R. Axelrod, The Evolution of Cooperation, Basic Books, New York, 2006.
[2] R. Axelrod, W. Hamilton, The evolution of cooperation, Science 211 (1981) 1390.
[3] L.A. Imhof, D. Fudenberg, M.A. Nowak, Tit-for-tat or win-stay, lose-shift? J. Theor. Biol. 247 (2007) 574–580.
[4] M. Nowak, K. Sigmund, A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner's dilemma game, Nature 364 (1993) 56–58.
[5] A. Szolnoki, M. Mobilia, L. Jiang, B. Szczesny, A.M. Rucklidge, M. Perc, Cyclic dominance in evolutionary games: a review, J. R. Soc. Interface 11 (2014) 20140735.
[6] J. von Neumann, O. Morgenstern, Theory of Games and Economic Behaviour, Princeton University Press, Princeton, NJ, 1944.
[7] J.W. Friedman, Game Theory with Applications to Economics, Oxford University Press, New York, 1990.
[8] P. Hammerstein, R. Selten, Game theory and evolutionary biology, in: Handbook of Game Theory with Economic Applications, vol. 2, 1994, pp. 929–993.
[9] Z. Wang, A. Szolnoki, M. Perc, Evolution of public cooperation on interdependent networks: the impact of biased utility functions, EPL 97 (2012) 48001.
[10] A. Haider, Using Genetic Algorithms to Develop Strategies for the Prisoner's Dilemma, University Library of Munich, Germany, 2006.
[11] E. ElSeidy, H.K. Arafat, M.A. Taha, The trembling hand approach to automata in iterated games, Math. Theory Model. 3 (2013) 47–65.
[12] F. Chen, A mathematical analysis of public avoidance behavior during epidemics using game theory, J. Theor. Biol. 302 (2012) 18–28.
[13] J. Tanimoto, T. Fujiki, Z. Wang, A. Hagishima, N. Ikegaya, Dangerous drivers foster social dilemma structures hidden behind a traffic flow with lane changes, J. Stat. Mech. Theory Exp. 11 (2014) P11027.
[14] Y. Shoham, Computer science and game theory, Commun. ACM 8 (2008) 74–79.
[15] J. Golbeck, Evolving strategies for the prisoner's dilemma, in: Fuzzy Systems and Evolutionary Computation, Advances in Intelligent Systems, 306, 2002, p. 299.
[16] A. Rapoport, A. Chammah, The Prisoner's Dilemma: A Study in Conflict and Cooperation, Univ. of Michigan Press, Ann Arbor, 1965.
[17] M.A. Nowak, K. Sigmund, E. ElSedy, Automata repeated games and noise, Math. Biol. 33 (1995) 703–722.
[18] J. Andreoni, J.H. Miller, Rational cooperation in the finitely repeated prisoner's dilemma: experimental evidence, Econ. J. 103 (1993) 570–585.
[19] A. Errity, Evolving Strategies for the Prisoner's Dilemma, Dublin City University, Ireland, 2003.
[20] C. Hilbe, M.A. Nowak, K. Sigmund, Evolution of extortion in iterated prisoner's dilemma games, Proc. Natl. Acad. Sci. 110 (2013) 6913–6918.
[21] C.Y. Xia, S. Meloni, M. Perc, Y. Moreno, Dynamic instability of cooperation due to diverse activity patterns in evolutionary social dilemmas, EPL 109 (2015) 58002.
[22] C.Y. Xia, X.K. Meng, Z. Wang, Heterogeneous coupling between interdependent lattices promotes the cooperation in the prisoner's dilemma game, PLoS One 10 (2015) e0129542.
[23] C.Y. Xia, Q. Miao, J. Wang, S. Ding, Evolution of cooperation in the traveler's dilemma game on two coupled lattices, Appl. Math. Comput. 246 (2014) 389–398.
[24] M. Perc, A. Szolnoki, Coevolutionary games—a mini review, BioSystems 99 (2010) 109–125.
[25] M. Perc, A. Szolnoki, Social diversity and promotion of cooperation in the spatial prisoner's dilemma game, Phys. Rev. E 77 (2008) 011904.
[26] M. Perc, Does strong heterogeneity promote cooperation by group interactions? New J. Phys. 13 (2011) 123027.
[27] M. Perc, A. Szolnoki, Coevolution of teaching activity promotes cooperation, New J. Phys. 10 (2008) 043036.
[28] M. Perc, Evolution of cooperation on scale-free networks subject to error and attack, New J. Phys. 11 (2009) 033027.
[29] Z. Wang, M. Perc, Aspiring to the fittest and promotion of cooperation in the prisoner's dilemma game, Phys. Rev. E 82 (2010) 021115.
[30] Z. Wang, A. Szolnoki, M. Perc, Percolation threshold determines the optimal population density for public cooperation, Phys. Rev. E 85 (2012) 037101.
[31] Z. Wang, A. Szolnoki, M. Perc, Interdependent network reciprocity in evolutionary games, Sci. Rep. 3 (2013) 1183.
[32] A. Szolnoki, Z. Wang, M. Perc, Wisdom of groups promotes cooperation in evolutionary social dilemmas, Sci. Rep. 2 (2012) 576.
[33] Z. Wang, A. Szolnoki, M. Perc, Optimal interdependence between networks for the evolution of cooperation, Sci. Rep. 3 (2013) 2470.
[34] M. Perc, Z. Wang, Heterogeneous aspirations promote cooperation in the prisoner's dilemma game, PLoS ONE 5 (2010) e15117.
[35] Z. Wang, L. Wang, A. Szolnoki, M. Perc, Evolutionary games on multilayer networks: a colloquium, Eur. Phys. J. B 88 (2015) 1–15.
[36] Z. Wang, A. Szolnoki, M. Perc, Self-organization towards optimally interdependent networks by means of coevolution, New J. Phys. 16 (2014) 033041.
[37] P.D. Taylor, L.B. Jonker, Evolutionary stable strategies and game dynamics, Math. Biosci. 40 (1978) 145–156.
[38] K. Sigmund, M.A. Nowak, Evolutionary game theory, Curr. Biol. 9 (1999) R503–R505.
[39] Z. Wang, M.A. Andrews, Z.X. Wu, L. Wang, C.T. Bauch, Coupled disease-behavior dynamics on complex networks: a review, Phys. Life Rev. (2015), doi:10.1016/j.plrev.2015.07.006.
[40] J.M. Smith, Evolution and the Theory of Games, Cambridge University Press, Cambridge, 1982.
[41] J. Hofbauer, K. Sigmund, The Theory of Evolution and Dynamic Systems, Cambridge Univ. Press, Cambridge, 1988.
[42] D. Friedman, Evolutionary games in economics, Econometrica 59 (1991) 637–666.
[43] A.F. Kleimenov, Construction of dynamics in repeated three-person game with two strategies for players, in: Proceedings of the IFAC Workshop on Generalized Solutions in Control Problems, GSCP-04, 2004, pp. 132–137.
[44] A.F. Kleimenov, M.A. Schneider, Cooperative dynamics in a repeated three-person game with finite number of strategies, in: Proceedings of the Sixteenth IFAC World Congress, Prague, Czech Republic, 2005.
[45] M. Matsushima, T. Ikegami, Evolution of strategies in the three-person iterated prisoner's dilemma game, J. Theor. Biol. 195 (1998) 53–67.
[46] E. ElSeidy, K.M. Soliman, Iterated symmetric three-players prisoner's dilemma game, Appl. Math. Comput. (2016) (AMC-D-15-00788R1).