Economics Letters 38 (1992) 405-409
North-Holland

Learning and equilibrium selection in 2 × 2 games: Some simulation results *

Debora Di Gioacchino
Trinity College, Cambridge, UK

Received 21 November 1991
Accepted 14 February 1992

Correspondence to: Debora Di Gioacchino, Via Campo Ligure 30, 00168 Roma, Italy.
* I would like to thank my Ph.D. supervisor, David Canning, for his helpful advice and comments on this note.
We simulate a 'fictitious play' model with bounded memory and mistakes by agents. This model gives global convergence to a unique limit distribution. Our simulations suggest that in 2 × 2 games the risk-dominant equilibrium is selected, both in the case of two players and in that of a large population with random matching.
1. Introduction

In this paper we consider the dynamic process by which players learn to coordinate their actions in games with a multiplicity of equilibria. We assume that players have bounded memory and try to maximize expected utility given their beliefs about opponents' behaviour. Expectations are backward looking; we assume, as in fictitious play, that agents use the empirical distribution of past plays by their opponents as their expectation. We also assume that, with a small probability, players choose randomly, putting equal probability on each choice. This can be interpreted either as a trembling hand or as experimentation.

Without mistakes each equilibrium point is locally stable under our learning dynamic. With mistakes Canning (1990) has shown that the system tends to a unique limit distribution and, if the mistake probabilities are small, this limit distribution will be close to one of the equilibria of the game without mistakes. Canning solves this model analytically for one-period memory. In this paper we simulate this learning dynamic, for longer memories, on the class of 2 × 2 matrix games with two Nash equilibria in pure strategies and one in mixed strategies.

Our simulation results suggest that the equilibrium selected satisfies Harsanyi and Selten's (1988) criterion of risk dominance. In games where both equilibria are ranked equally, players coordinate and play each of the two pure-strategy Nash equilibria half the time, a correlated equilibrium. When we move from two players to random matching in a large population (with private memories), coordination becomes harder and risk dominance remains the selection criterion,
but in the case where equilibria are equally ranked the outcome is close to the mixed-strategy Nash equilibrium. Our results are consistent with Kandori, Mailath and Rob (1991), who study a model in which a large population plays with one-period memory but agents observe all the games played in the period. There the limit distribution puts probability one on the risk-dominant equilibrium or, in symmetric games without a risk-dominant equilibrium, on the mixed-strategy equilibrium.
2. The simulation results
The model consists of H agents. Each period two agents are matched at random and play a 2 × 2 normal-form game. Given the bounded memory of the players, the state of the system at any time is an N-vector for each player, representing the last N actions played by his opponents. When playing, agents choose an action so as to maximise their expected payoff given the belief that the probability distribution over their opponents' choices is given by the empirical distribution they remember. With a small probability, p, however, they make a mistake, playing each of their actions with probability 1/2. The parameters we can vary in studying our learning process are N, the length of memory, p, the probability of mistakes, and H, the number of players in the pool.

We begin by concentrating on the case of two players. Each simulation consists of 20,000 repetitions of the basic game.
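The following sketch illustrates the two-player dynamic just described. It is a minimal reconstruction, not the author's original code; the names (PAYOFF, best_reply, simulate) are ours, the payoff matrix shown is game 1 below, and ties in expected payoff are broken in favour of the first action, an assumption the paper does not specify.

```python
import random
from collections import deque

# Game 1 below: PAYOFF[row][col] = (row player's payoff, column player's payoff).
PAYOFF = [[(3, 3), (0, 0)],
          [(0, 0), (1, 1)]]

def best_reply(history, payoff, player, p):
    """Best reply to the empirical distribution of the opponent's remembered
    actions; with probability p the player mistakes and randomises uniformly."""
    if random.random() < p:
        return random.randint(0, 1)
    q = sum(history) / len(history)  # remembered frequency of opponent's action 1
    def ev(a):
        # expected payoff of own action a against beliefs (1 - q, q)
        if player == 0:  # row player
            return (1 - q) * payoff[a][0][0] + q * payoff[a][1][0]
        return (1 - q) * payoff[0][a][1] + q * payoff[1][a][1]
    return 0 if ev(0) >= ev(1) else 1  # ties broken towards action 0 (our choice)

def simulate(N=2, p=0.2, T=20000):
    # Initial history: each player remembers N plays of action 1,
    # i.e. the bottom-right (1, 1) equilibrium.
    mem = [deque([1] * N, maxlen=N), deque([1] * N, maxlen=N)]
    counts = [[0, 0], [0, 0]]
    for _ in range(T):
        a0 = best_reply(mem[0], PAYOFF, 0, p)
        a1 = best_reply(mem[1], PAYOFF, 1, p)
        mem[0].append(a1)  # each player remembers only the opponent's action
        mem[1].append(a0)
        counts[a0][a1] += 1
    return [[c / T for c in row] for row in counts]  # outcome frequencies
```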
Consider the following game, in which there are two pure-strategy Nash equilibria (top-left and bottom-right):

Game 1

          Left     Right
Top       3, 3     0, 0
Bottom    0, 0     1, 1
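In Harsanyi and Selten's (1988) terms, the top-left equilibrium is risk dominant here: the Nash product of the players' deviation losses is (3 - 0)(3 - 0) = 9 at (top, left), against (1 - 0)(1 - 0) = 1 at (bottom, right).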
Without mistakes the long-run outcome depends on the initial position. If the initial history consists of more than 1/4 of the (3, 3) outcome, the game converges to the Pareto-optimal equilibrium. If the initial history has more than 3/4 of the (1, 1) outcome, the players converge to always playing (1, 1). However, if there is a positive probability of mistakes and players have bounded memory, the game has a unique limit distribution, and Canning (1990) shows that the empirical frequencies of outcomes converge, with probability one, to this limit for any initial conditions. We run the simulations starting from an initial history of all bottom-right [the (1, 1) equilibrium in this case] for memory N = 2, 4, and 10 and for p = 0.2, 0.1, 0.05, and 0.01 [Canning (1990) has shown that for memory one the result is that all four outcomes occur equally often for any value of p].
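The 1/4 and 3/4 thresholds follow from the mixed-strategy equilibrium of game 1: a player whose remembered frequency of the opponent's first action is q prefers his own first action whenever 3q ≥ 1 - q, i.e. whenever q ≥ 1/4, and the 3/4 threshold for leaving (3, 3) is the mirror image of this condition.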
Table 1
Percentage outcomes, game 1, two players. Each cell gives the frequencies of the four outcomes: top row, (top, left) and (top, right); bottom row, (bottom, left) and (bottom, right).

N\p    0.2          0.1          0.05         0.01
2      0.79 0.10    0.90 0.05    0.95 0.03    0.99 0.00
       0.10 0.02    0.05 0.00    0.02 0.00    0.01 0.01
4      0.81 0.09    0.90 0.05    0.95 0.02    0.98 0.01
       0.09 0.01    0.05 0.00    0.02 0.01    0.01 0.00
10     0.81 0.09    0.90 0.05    0.94 0.02    0.00 0.01
       0.09 0.01    0.05 0.01    0.02 0.01    0.00 0.99
The results are summarised in table 1, where the four numbers in each cell give the frequencies of the four possible outcomes of the game. For example, in game 1, with N = 2 and p = 0.2, the proportion of times the agents play top-left and get (3, 3) is 0.79. For N = 2 and N = 4, as p → 0 the proportion of times top-left is played approaches unity, and its value is not significantly different from 1 - p. Looking at the outcomes for N = 10 when p is low, we see that there does not seem to be convergence to the risk-dominant equilibrium, though the outcome does approximate it for p reasonably large. It can be argued that there has not been enough time for the players to switch to the risk-dominant equilibrium; this was confirmed when we repeated the simulation for longer times. There is thus a trade-off in the choice of p: high values give faster convergence, but low values mean that the limit is closer to the efficient outcome.

The process may be described as follows. Starting from an initial history of all bottom-right, the players' optimal response is to choose bottom and right; but, because they make mistakes, they sometimes play the risk-dominant equilibrium. Once, by chance, the proportion of times players have chosen the risk-dominant equilibrium exceeds 1/4, they start playing top-left. A similar story holds for the transition from top-left to bottom-right, but this transition is harder because it requires more mistakes. As the probability of mistakes becomes smaller, the difference in transition times becomes larger. With a small probability of mistakes it is relatively more difficult to get out of the initial equilibrium, but once the risk-dominant equilibrium has been reached it is relatively more difficult to go back.

Pareto efficiency, however, is not the general selection criterion. Consider the following game:
Game 2

          Left     Right
Top       9, 9     0, 8
Bottom    8, 0     7, 7

which is equivalent, in terms of the best-response structure, to

          Left     Right
Top       1, 1     0, 0
Bottom    0, 0     7, 7
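Again in Harsanyi and Selten's (1988) terms, the (7, 7) equilibrium is risk dominant: the Nash product of deviation losses is (7 - 0)(7 - 0) = 49 at (bottom, right), against (9 - 8)(9 - 8) = 1 at (top, left), so here risk dominance and Pareto efficiency pull in opposite directions.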
Game 2 has two pure-strategy Nash equilibria. Table 2 shows that, after 20,000 repetitions, the fraction of times the risk-dominant equilibrium, giving (7, 7), has been played approaches unity, though again there is the difficulty that with long memory the initial condition can influence the outcome.
Table 2
Percentage outcomes, game 2, two players (cells as in table 1).

N\p    0.2          0.1          0.05         0.01
2      0.01 0.10    0.00 0.05    0.00 0.02    0.01 0.01
       0.10 0.79    0.05 0.90    0.03 0.95    0.00 0.98
4      0.01 0.09    0.00 0.04    0.00 0.03    0.00 0.01
       0.09 0.81    0.05 0.90    0.03 0.95    0.01 0.99
10     0.01 0.09    0.01 0.05    0.02 0.02    0.86 0.01
       0.09 0.81    0.05 0.90    0.05 0.91    0.01 0.12

Now consider a game without a risk-dominant equilibrium:

Game 3

          Left     Right
Top       1, 1     0, 0
Bottom    0, 0     1, 1
Table 3
Percentage outcomes, game 3, two players (cells as in table 1).

N\p    0.2          0.1          0.05         0.01
3      0.38 0.13    0.40 0.07    0.44 0.04    0.00 0.01
       0.13 0.36    0.07 0.46    0.04 0.48    0.01 0.98
4      0.36 0.14    0.35 0.07    0.43 0.04    0.98 0.00
       0.10 0.40    0.06 0.52    0.03 0.50    0.01 0.00
6      0.38 0.10    0.42 0.05    0.45 0.03    0.00 0.01
       0.09 0.42    0.05 0.48    0.02 0.50    0.01 0.97
The game has two Nash equilibria in pure strategies and one in mixed strategies. The results, summarised in table 3, indicate that the game does not converge to either equilibrium but, as the memory gets longer, players tend to coordinate on one of the two equilibrium outcomes most of the time. For p relatively large each equilibrium occurs equally often, but for N high and p low the players may get stuck in one of the equilibria for a long time, though the influence of the initial condition disappears if we run the simulation for long enough. The following game is similar in not having a risk-dominant equilibrium, but it lacks the common-interest property:
Game 4

          Left     Right
Top       0, 0     1, 4
Bottom    4, 1     0, 0
The game has two Nash equilibria in pure strategies (top-right and bottom-left) and one in mixed strategies, [(1/5, 4/5) and (1/5, 4/5)]: each player is indifferent when his opponent puts probability 1/5 on the first action, since 1 - q = 4q at q = 1/5. As shown in table 4, both the Nash equilibria are played, but a lot of discoordination occurs, particularly on the (bottom, right) outcome. However, as memory gets longer, players tend to coordinate more and almost always play one of the two equilibria. This suggests again that a correlated equilibrium will occur, but that this coordination is more difficult without common interest. For non-symmetric games the simulation results are similar; risk dominance remains the selection criterion.

We now turn to the results obtained in the simulation of the game with more than two players. Each period two players are selected at random from a pool of ten players. They do not know who their opponent is, i.e. they do not recognise each other. Each of them remembers the outcomes of the N previous games they played, but players do not observe the outcome when they are not one of the matched pair. Results for game 1 are reported in table 5. Again the (3, 3) outcome is selected, and the same conclusion applies to game 2. Quite a different result is obtained in the games without a risk-dominant equilibrium, e.g. game 4. As can be seen in table 6, players seem unable to coordinate and play a correlated equilibrium.
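A minimal sketch of this random-matching variant follows; again it is our reconstruction with hypothetical names, reusing best_reply and PAYOFF from the two-player sketch above.

```python
def simulate_population(H=10, N=2, p=0.2, T=20000):
    # H players, each with a private memory of the last N actions faced in
    # his own matches; two players are drawn at random and play anonymously.
    mems = [deque([1] * N, maxlen=N) for _ in range(H)]
    counts = [[0, 0], [0, 0]]
    for _ in range(T):
        i, j = random.sample(range(H), 2)   # anonymous random match
        ai = best_reply(mems[i], PAYOFF, 0, p)
        aj = best_reply(mems[j], PAYOFF, 1, p)
        mems[i].append(aj)                  # only the matched pair observe
        mems[j].append(ai)                  # the outcome of their game
        counts[ai][aj] += 1
    return [[c / T for c in row] for row in counts]
```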
Table 4
Percentage outcomes, game 4, two players (cells as in table 1).

N\p    0.2          0.1          0.05         0.01
2      0.14 0.28    0.12 0.33    0.12 0.36    0.12 0.30
       0.30 0.29    0.29 0.25    0.28 0.24    0.34 0.24
4      0.09 0.27    0.07 0.28    0.07 0.32    0.07 0.33
       0.28 0.36    0.33 0.31    0.31 0.30    0.33 0.27
10     0.07 0.37    0.05 0.46    0.02 0.26    0.01 0.67
       0.29 0.27    0.37 0.12    0.67 0.05    0.31 0.01
Table 5
Percentage outcomes, game 1, ten players (cells as in table 1).

N\p    0.2          0.1          0.05         0.01
2      0.79 0.10    0.90 0.05    0.95 0.03    0.99 0.01
       0.10 0.01    0.05 0.00    0.03 0.00    0.01 0.00
4      0.80 0.09    0.90 0.05    0.95 0.02    0.99 0.01
       0.09 0.01    0.05 0.00    0.02 0.00    0.01 0.00
10     0.80 0.09    0.87 0.05    0.64 0.03    0.99 0.00
       0.09 0.01    0.05 0.03    0.02 0.31    0.00 0.00

Table 6
Percentage outcomes, game 4, ten players (cells as in table 1).

N\p    0.2          0.1          0.05         0.01         0
2      0.15 0.24    0.14 0.24    0.15 0.24    0.14 0.25    0.14 0.24
       0.24 0.37    0.25 0.37    0.24 0.38    0.24 0.37    0.24 0.38
4      0.09 0.21    0.08 0.21    0.07 0.20    0.07 0.20    0.07 0.20
       0.21 0.49    0.20 0.51    0.21 0.51    0.21 0.51    0.21 0.52
10     0.09 0.22    0.08 0.22    0.08 0.22    0.08 0.22    0.08 0.21
       0.22 0.48    0.22 0.48    0.21 0.48    0.21 0.49    0.21 0.49
All outcomes occur with positive probability. This result is hardly surprising; the greater the number of players in the pool, the more difficult it is to coordinate. Note that the limit distribution of outcomes in this case is close to the mixed-strategy Nash equilibrium.
3. Conclusions

The results reported above, and a large number of additional simulations (including asymmetric games) that have been run with similar results, lead to the following conjectures:

(1) Our learning rule selects the risk-dominant equilibrium, both in the two-player case and in the random-matching case, for any length of memory greater than one.

(2) In games without a risk-dominant equilibrium, the equilibrium selected depends on the number of players involved in the game. In the case of two players the equilibrium selected is a correlated equilibrium, in which each pure-strategy equilibrium is played a proportion of the time. In the case of random matching the outcome is close to the Nash equilibrium in mixed strategies.
References

Canning, D., 1990, Average behaviour in learning models, Economic Theory Discussion Paper no. 156 (Cambridge University, Cambridge).

Harsanyi, J. and R. Selten, 1988, A general theory of equilibrium selection in games (MIT Press, Cambridge, MA).

Kandori, M., G.J. Mailath and R. Rob, 1991, Learning, mutation, and long run equilibria in games, CARESS Working Paper 91-01.