Adaptation and complexity in repeated games


Games and Economic Behavior 63 (2008) 166–187 www.elsevier.com/locate/geb

Eliot Maenner
Department of Economics, University of Copenhagen, Studiestræde 6, DK-1455 Copenhagen K, Denmark

Received 7 April 2004; available online 5 October 2007

Abstract

The paper presents a learning model for two-player infinitely repeated games. In an inference step players construct minimally complex inferences of strategies based on observed play, and in an adaptation step players choose minimally complex best responses to an inference. When players randomly select an inference from a probability distribution with full support, the set of steady states is a subset of the set of Nash equilibria in which only stage game Nash equilibria are played. When players make 'cautious' inferences, the set of steady states is the subset of self-confirming equilibria with Nash outcome paths. When players use different inference rules, the set of steady states can lie between the previous two cases.

JEL classification: C72; D83

Keywords: Repeated games; Learning; Complexity; Bounded rationality

1. Introduction

The paper presents a learning model for two-player infinitely repeated games. Players proceed through recurrent stages where they observe realized play, construct a certain class of inferences of their coplayer's strategy based upon observed play, and choose new strategies based upon the inferences that they constructed. These recurring stages define a dynamic system that generates infinite sequences of repeated game strategy profiles. The steady states and convergence properties of these sequences are studied. The theorems demonstrate that the behavior of the dynamics depends crucially upon the inference rules that the players use to model each other.

At each recurring stage players are assumed to (i) observe; (ii) infer; and (iii) choose. The strategies they choose in each stage determine a mode of play in the game.


For instance, when the stage game is the Prisoners' Dilemma, a particular pair of strategies may lead to the outcome path (C, D), (D, C), (C, D), (D, C), ... Players observe only the path of play and not the rule of behavior chosen by their opponents. On the basis of this observation, each player must try to infer what strategy the opponent has chosen. Continuing with the Prisoners' Dilemma example, suppose player 1 chooses the "Tit-For-Tat" strategy and tries to infer what strategy player 2 might be playing. Player 1 could think that player 2 is following a strategy which calls on player 2 to do the same as what player 1 did in the previous period; such a guess would be consistent with the observed path. On the other hand, he could infer that player 2 is following a strategy that calls on player 2 to defect in odd numbered periods, and this would also be consistent with the observed path. Having constructed an inference about the strategy followed by player 2, in a manner that will be specified in detail in Section 2, player 1 now chooses a new strategy that is a best response against the inferred strategy of player 2. The new strategy leads to a new outcome path, and the entire process is repeated, thereby generating a sequence of strategy profiles.

Simplicity considerations constrain both the strategies that players choose and the inferences they form about each other's strategies. First, in choosing which strategy to adopt, a player selects one from among those that maximize his payoff given his inference, but prefers simple decision rules to more complicated ones. Second, in inferring which strategy the other player may have chosen, a player selects a belief that is consistent with observed play, but prefers simple beliefs to more complicated ones. This inference rule corresponds to Occam's Razor: choose the simplest explanation that fits the observed facts. Simplicity as a belief-selection criterion addresses the concern that it may be implausible for players to put positive probability on all consistent beliefs, or even on a large number of them.

The two types of simplicity considerations play different roles in the model. The objective of using simplicity as a belief-selection criterion is to study how it affects the adaptive behavior of the players and the steady states of the dynamic system. Given their beliefs, simplicity as a criterion for selecting a best reply determines the players' strategy choices. This latter simplicity consideration has been studied previously, beginning with influential papers by Rubinstein (1986) and Abreu and Rubinstein (1988), who define a machine game in which players have a preference for simple strategies and strategies are represented by finite state automata. Its well-known analytical properties provide useful baseline results; however, it is not necessary to use simplicity of beliefs and simplicity of best responses together, since simplicity of beliefs is an applicable criterion under many preferences and payoff functions.

Although the set of minimal state inferences is finite, it is not in general uniquely determined. The steady states of the system are characterized under different rules for selecting a minimal state inference.
When both players randomly select a minimal state inference from a probability distribution that has full support, it is shown that the set of steady states is a subset of the set of Nash equilibria of the machine game in which only stage game Nash equilibria are played. When the probability distribution does not have full support, other steady states are possible. Under 'cautious inferences,' a player orders the minimal state inferences in terms of the payoffs of their best responses and selects a lowest ranked inference. When both players adapt under cautious inferences, the set of steady states is the subset of self-confirming equilibria with Nash outcome paths; hence the set of steady state payoffs corresponds to the set of Nash equilibrium payoffs of the machine game. When one player randomly selects an inference and the coplayer uses cautious inferences, the set of steady states need not coincide with either of the previous two systems, although it can still be possible to preserve some of the cooperative outcomes of the second dynamic.


Eliaz (2003) and Spiegler (2002, 2004) use simplicity as a belief-selection criterion in the context of equilibrium models instead of a learning model. Eliaz introduces an equilibrium concept, Equilibrium with Stable Forecasts, in which an equilibrium is a mutual best response and, furthermore, there does not exist a belief simpler than the coplayer's strategy which has a best response that is also a best response to the coplayer's strategy. The players economize on beliefs when a best reply to a simpler belief would appear to achieve the same outcome as the more complex belief. This incentive to switch to simpler beliefs serves as an effective equilibrium selection tool in repeated 2 × 2 symmetric games.

Spiegler (2002) constructs a model of reason-based choice by supposing there is a post-game evaluation of the strategies in which players must propose beliefs to justify the strategies they chose. The evaluation stage is captured in the equilibrium concept, Equilibrium in Justifiable Strategies, in which players must defend a belief by demonstrating that profitable alternative responses to the belief would have yielded inferences against which the original response outperforms the alternative response. When beliefs are formed by Occam's Razor the equilibrium supports plausible behavior in the Centipede Game and the Chain Store Game that cannot be supported by subgame perfect equilibrium. Spiegler (2004) uses a version of Equilibrium in Justifiable Strategies in a game in which players move in alternate periods and choose between making a concession or not making a concession. The objective of a player is to maximize the number of concessions granted by the coplayer. In a large class of strategies the equilibria are such that either neither player makes any concessions or one player makes all of them. Crucial to this strong selection result is that simplicity of beliefs leads players to believe it is their delay, rather than their concessions, that is motivating the coplayer to make concessions.

Other equilibrium models have used complexity criteria in various ways to select equilibria in dynamic games. Banks and Sundaram (1990) study how further simplicity considerations for selecting best replies may impact the set of Nash equilibria of the machine game. Binmore and Samuelson (1992) and Volij (2002) use approaches based upon Evolutionary Stable Strategies to study equilibrium selection in the machine game. In Chatterjee and Sabourian (2000) players in an n-person unanimity bargaining game have preferences over the degree of complexity of their strategies and only choose stationary strategies in equilibrium. Sabourian (2003) is able to reduce the set of equilibria in a dynamic matching and bargaining game to only stationary strategies, all of which induce the competitive price.

An alternative approach to using complexity as a belief-selection criterion is Jéhiel's (1995, 1998, 2001) models of players who have limited foresight. Limited foresight results when payoffs are evaluated over a finite horizon in an infinite horizon game. The finite horizon in effect serves to bound the set of possible beliefs about continuation play extending up to the horizon, and beliefs about continuation play past the horizon, even if they are very simple, are inadmissible.
In the 2001 version, which includes subjective beliefs over continuation payoffs beyond the finite horizon, conditions are provided for the set of Limited Forecast Equilibria of the infinitely repeated Prisoners' Dilemma to include simple nonstationary strategies that induce cooperation but no stationary strategies.

Section 2 introduces the formal model and the general dynamics. The steady state results in Section 3 hold for any two-player finite action stage game when the set of strategies is the set of finite state automata. Section 4 studies the steady states when at least one of the players departs from the full support assumption of the general dynamics. In Section 5 convergence properties are studied when both players adapt under the full support assumption. Theorem 3 proves that for any finite action stage game the system converges to a steady state or an absorbing set from any initial condition.


Theorem 4 shows that when the stage game is the Prisoners' Dilemma the system converges globally to the unique steady state. Theorem 5 shows that when the stage game is Matching Pennies there is a unique absorbing set; that is, the system converges globally to an absorbing set in one-state automata.

2. The model

2.1. The underlying repeated game

The game played in each period is a two-player, finite-action, normal-form game. The action set of player i is denoted A_i and the payoffs are given by u_i : A_1 × A_2 → R, i = 1, 2. It is assumed that the two players i and −i choose strategies that can be implemented by finite state automata. An automaton is a 4-tuple M_i = {Q_i, q_i^1, f_i, τ_i}, where Q_i is a finite set of states, f_i : Q_i → A_i is an output function, and τ_i : Q_i × A_{-i} → Q_i is a transition function that maps states and the coplayer's actions into states. One of the states, denoted q_i^1, is a given initial state. The set of finite state automata is denoted M. For any automaton M_i the number of states in Q_i is denoted |M_i|.

Let q^t = (q_1^t, q_2^t) be the pair of states and f(q^t) = (f_1(q_1^t), f_2(q_2^t)) the chosen actions at time t. The superscript on a state is reserved for a period in time, while the subscripts refer to the identity of the player or the identity of a state in an automaton (the subscript for the player is often omitted when speaking of a particular automaton). Every pair of automata (M1, M2) generates a sequence of states {q^1, q^2, q^3, ...}. The finiteness of both the action sets and the sets of states implies that this sequence must eventually cycle after an introductory phase (possibly empty):

    {q^1, ..., q^{t_1 − 1}, q^{t_1}, ..., q^{t_2}, q^{t_1}, ..., q^{t_2}, ...},

where the block q^{t_1}, ..., q^{t_2} is the cycle that repeats forever.

The outcome path that corresponds to the sequence of states, denoted π(M1, M2), equals {f(q^1), f(q^2), f(q^3), ...}.

Besides the restriction to choosing a strategy in the set of automata, players also have an explicit preference for simplicity within the set of automata. Following Abreu and Rubinstein (1988), complexity is measured by the number of states in the automaton. Provided two automata attain the same payoff, a player prefers the less complex automaton, where complexity is increasing in the number of states. This preference is represented by lexicographic preferences, denoted ≿_i, i = 1, 2, in which discounted payoffs are the primary consideration and complexity costs are secondary. The payoff in the infinitely repeated game, denoted U_i(π(M1, M2)), is the discounted average (0 < δ < 1) of the sequence of payoffs derived from the outcome path:

    U_i(π(M1, M2)) = (1 − δ) Σ_{t=1}^∞ δ^{t−1} u_i(f(q^t)).

The strict preference relation ≻_i on the set of pairs of finite state automata is defined by (M1′, M2′) ≻_i (M1, M2) if (i) U_i(π(M1′, M2′)) > U_i(π(M1, M2)), or (ii) U_i(π(M1′, M2′)) = U_i(π(M1, M2)) and |M_i′| < |M_i|. The automaton M1* is said to be a best response to an automaton M2 if (M1*, M2) ≿_1 (M, M2) for all automata M, where ≿_1 is the weak relation induced by ≻_1. A pair of automata is a Nash equilibrium when each automaton is a best response to the other. Given any automaton M_{-i} of player −i, the set of best responses to M_{-i}, denoted B_i(M_{-i}), is well-defined and consists of automata that have the same number of states and attain the same payoff. Following the terminology of Rubinstein, instead of stating that (M1, M2) is a Nash equilibrium of the repeated game with preferences ≿_i, we will simply say that (M1, M2) is a Nash equilibrium of the machine game.
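Since everything that follows manipulates these objects, a short executable sketch may help fix ideas. The representation is our own (the paper works purely in notation): states and actions are strings, a finite horizon stands in for the infinite path (legitimate because play cycles), and the TIT-FOR-TAT machine anticipates the state table given later in Table 1.

```python
from dataclasses import dataclass

@dataclass
class Automaton:
    initial: str      # the initial state q_i^1
    output: dict      # f_i : Q_i -> A_i
    transition: dict  # tau_i : Q_i x A_{-i} -> Q_i

def play(m1: Automaton, m2: Automaton, horizon: int = 50) -> list:
    """Outcome path of a pair of automata. The joint state space is finite,
    so the path is an introductory phase followed by a cycle; a long finite
    horizon is enough to expose it."""
    q1, q2 = m1.initial, m2.initial
    path = []
    for _ in range(horizon):
        a1, a2 = m1.output[q1], m2.output[q2]
        path.append((a1, a2))
        # each transition function reads the coplayer's action
        q1, q2 = m1.transition[(q1, a2)], m2.transition[(q2, a1)]
    return path

def discounted_average(stream, delta: float) -> float:
    """(1 - delta) * sum_{t>=1} delta^(t-1) u_t, truncated at the horizon."""
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(stream))

# TIT-FOR-TAT: start with C, then repeat the coplayer's previous action.
TFT = Automaton(
    initial="q1",
    output={"q1": "C", "q2": "D"},
    transition={("q1", "C"): "q1", ("q1", "D"): "q2",
                ("q2", "C"): "q1", ("q2", "D"): "q2"},
)

# Player 1's Prisoners' Dilemma payoffs from Table 2 with g = l = 1.
u1 = {("C", "C"): 1, ("C", "D"): -1, ("D", "C"): 2, ("D", "D"): 0}

path = play(TFT, TFT)
print(path[:3])  # [('C', 'C'), ('C', 'C'), ('C', 'C')]
print(round(discounted_average([u1[p] for p in path], 0.9), 3))  # ~0.995, -> 1 as the horizon grows
```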


2.2. The dynamic system

We now describe the sequence of events that defines the dynamic system. A decision period is the frequency T at which players are permitted to switch their strategies during the course of play. It is convenient to assume T = ∞ since players make inferences based on the outcome path. This simplifying assumption implies that the outcome path generated by two automata is never truncated, and hence yields the maximum possible information about the players' rules of behavior. Decision periods are indexed by s = 1, 2, ... to distinguish them from ordinary periods. Formally, the game played in each decision period is an infinitely repeated game.

At the start of the process, in the first decision period, play is determined by some initial pair of finite state automata, (M_1^1, M_2^1). It is assumed that player 1 can switch strategies only in even numbered decision periods (s = 2, 4, ...) and player 2 can switch strategies only in odd numbered decision periods (s = 3, 5, ...). This assumption of asynchronous adaptation can be interpreted as a simplified version of the assumption that players do not always switch their strategies simultaneously. The player who updates in decision period s chooses a strategy denoted M_i^s, while the coplayer keeps M_{-i}^s = M_{-i}^{s-1}. At the beginning of decision period s, s > 1, players observe the outcome path π(M_1^{s-1}, M_2^{s-1}) of decision period s − 1 and use it to construct inferences of each other. The set of inferences that player i has about the true strategy of player −i, after observing the outcome path, is denoted I_{-i}^{s-1}. Player i chooses M_i^s by taking a best response to an inference selected from I_{-i}^{s-1}. In this way the system generates infinite sequences of automata and sets of inferences:

    {M_1^1, M_2^1, M_1^2, M_2^2, ..., M_1^s, M_2^s, ...},
    {I_1^1, I_2^1, I_1^2, I_2^2, ..., I_1^s, I_2^s, ...}.

Next we define I_{-i}^{s-1} and how an inference is selected from it. A player's inferences about the coplayer are automata which could have generated the sequence of observed actions, given the player's own strategy. This consistency condition states that for any outcome path π(M_1^{s-1}, M_2^{s-1}) generated by a pair of finite state automata, an inference M ∈ I_{-i}^{s-1}(M_1^{s-1}, M_2^{s-1}) must satisfy π(M_i^{s-1}, M) = π(M_i^{s-1}, M_{-i}^{s-1}). Note that it is assumed that when players observe the outcome path and construct inferences of each other they do not observe each other's strategies.

A second condition on inferences is that all inferences about a coplayer's strategy except those that have a common minimal number of states are inadmissible. These minimal state automata cannot have their behavior duplicated by automata which have fewer states. Following Birkhoff and Bartee (1970), an automaton M is said to be a minimal state automaton if there does not exist another automaton M′ with the same action set such that M and M′ yield identical output sequences for each input sequence and |M′| < |M|. More formally, for any outcome path π(M1, M2) generated by a pair of finite state automata, define I_i(M1, M2) to be the set of all minimal state automata with C_i states such that (i) if M ∈ I_i(M1, M2), then π(M, M_{-i}) = π(M_i, M_{-i}), and (ii) there does not exist an automaton M′ with fewer than C_i states such that π(M′, M_{-i}) = π(M_i, M_{-i}). The sets I_1 and I_2 are called the sets of minimal state inferences for players 1 and 2, respectively. The dependence of these sets on (M1, M2) will be omitted from the notation when it is clear from context.
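To make the definition concrete, here is a brute-force sketch of how the sets I_i could be computed for tiny games: enumerate candidate automata by state count and keep the smallest ones that reproduce the observed path. The dict-based encoding is ours, the search is exponential (suitable only as an illustration), and isomorphic relabelings of the same machine are not merged. The final line reproduces Example 1 below: the only minimal state inference about TIT-FOR-TAT playing itself is the one-state COOPERATE machine.

```python
from itertools import product

def all_automata(n: int, actions) -> list:
    """Every n-state automaton over the given action alphabet (state 0 initial)."""
    states, keys = range(n), [(q, a) for q in range(n) for a in actions]
    return [{"output": dict(zip(states, outs)),
             "transition": dict(zip(keys, targets))}
            for outs in product(actions, repeat=n)
            for targets in product(states, repeat=len(keys))]

def run(mine: dict, other: dict, horizon: int = 30) -> list:
    """Outcome path (my action, coplayer action) of two dict-encoded automata."""
    q, r, path = 0, 0, []
    for _ in range(horizon):
        a, b = mine["output"][q], other["output"][r]
        path.append((a, b))
        q, r = mine["transition"][(q, b)], other["transition"][(r, a)]
    return path

def minimal_state_inferences(mine: dict, observed: list, actions) -> list:
    """Smallest automata M with run(mine, M) matching the observed path
    (the consistency condition), searching state counts 1, 2, 3."""
    for n in range(1, 4):
        hits = [m for m in all_automata(n, actions)
                if run(mine, m, len(observed)) == observed]
        if hits:
            return hits
    return []

TFT = {"output": {0: "C", 1: "D"},
       "transition": {(0, "C"): 0, (0, "D"): 1, (1, "C"): 0, (1, "D"): 1}}
observed = run(TFT, TFT)  # (C, C) forever
print(len(minimal_state_inferences(TFT, observed, ("C", "D"))))  # 1: COOPERATE
```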


For any pair of finite state automata (M1, M2) the corresponding sets I_1 and I_2 are nonempty, unique, and finite, and π(M1′, M2′) = π(M1, M2) for all (M1′, M2′) ∈ I_1 × I_2.

An implicit assumption in the inference formation procedure is that the only information the players use in their decision period s inference problem comes from the outcome path π(M_1^{s-1}, M_2^{s-1}) of decision period s − 1. After the inferences are constructed, all other historical data is effectively discarded.

In all of the dynamic systems we study the players observe the outcome path, construct sets of minimal state inferences, and update their strategies in alternate decision periods, as described above. The systems differ in how the players evaluate their minimal state inferences. We first investigate the dynamics when each player randomly selects a minimal state inference and then takes a best response to this random selection (Condition r). When the set of minimal state inferences is I_{-i}^{s-1}, let p(M_{-i}; I_{-i}^{s-1}) denote the probability that player i believes that the true strategy of player −i is M_{-i} ∈ I_{-i}^{s-1}. It is assumed that every distribution {p(M_{-i}; I_{-i}^{s-1})}_{M_{-i} ∈ I_{-i}^{s-1}} has full support, i.e. p(M_{-i}; I_{-i}^{s-1}) > 0 for all M_{-i} ∈ I_{-i}^{s-1}.

Condition r (Selection from the minimal state inference set). M_i^s ∈ B_i(M_{-i}) for an M_{-i} ∈ I_{-i}^{s-1} that is randomly selected with probability p(M_{-i}; I_{-i}^{s-1}).

The best response set, B_i(M_{-i}), will be multi-valued in general. It may be that the present strategy, M_i^{s-1}, of the player who updates in decision period s is in the set B_i(M_{-i}) referred to in Condition r. In this case, it is of interest to study the dynamics when the adapting player sets M_i^s = M_i^{s-1} to minimize switching costs. If it is not the case that M_i^{s-1} ∈ B_i(M_{-i}) and more than one strategy satisfies Condition r, then M_i^s is chosen randomly from B_i(M_{-i}). These two selection rules for the reply are formalized in Condition R.

Condition R (Selection from the best response set). Let B_i(M_{-i}) denote the possible selections for M_i^s. If M_i^{s-1} ∈ B_i(M_{-i}), then M_i^s := M_i^{s-1}. Suppose B_i(M_{-i}) has x elements. If M_i^{s-1} ∉ B_i(M_{-i}), then M_i^s := M ∈ B_i(M_{-i}) with probability 1/x.

The assumptions that the players update their strategies in alternate decision periods and construct sets of minimal state inferences are common to all of the dynamic systems we study and hence are not repeated in the definitions of each dynamic system.

Generalized Dynamic System (GDS). M_i^s is selected by Condition r followed by Condition R for both players i = 1, 2.

Let (M_1^1, M_2^1) be an arbitrary pair of finite state automata. Given this initial condition, GDS generates sequences of automata, {(M_1^s, M_2^s)}_{s=1}^∞. The first example illustrates that a minimal state automaton is not necessarily a minimal state inference. The example also uses a state table representation of a finite state automaton (Table 1). These tables are useful for computations. The initial state is the first state listed in the current state column.

Example 1. Let M1 = M2 = TIT-FOR-TAT. Then π(M1, M2) = {(C, C), (C, C), ...} and I_1 = I_2 = {COOPERATE}, where COOPERATE denotes the one-state automaton that always plays the action C.


Table 1
A state table representation of TIT-FOR-TAT

Current state    1st Input = C           2nd Input = D           Output
q_i^1            τ_i(q_i^1, C) = q_i^1   τ_i(q_i^1, D) = q_i^2   f_i(q_i^1) = C
q_i^2            τ_i(q_i^2, C) = q_i^1   τ_i(q_i^2, D) = q_i^2   f_i(q_i^2) = D

Table 2
The Prisoners' Dilemma (g, l > 0)

      C            D
C     1, 1         −l, 1 + g
D     1 + g, −l    0, 0

Thus, even though TIT-FOR-TAT is a minimal state automaton, it is not a minimal state inference.

Example 2. The example illustrates GDS when the stage game is the Prisoners' Dilemma. The payoff matrix is depicted in Table 2. Both players start with an automaton that has a one period "show-of-strength" and then cooperates provided the coplayer cooperates:

            C    D    Output
SOS = q1    q1   q2    D
      q2    q2   q1    C

The outcome path in the first decision period is

    π(SOS, SOS) = {(D, D), (C, C), (C, C), ...}.

The set of minimal state inferences that each player forms about play in the first decision period consists of 4 strategies:

    I = { [q1: C→q1, D→q2, play D;  q2: C→q2, D→q1, play C],
          [q1: C→q1, D→q2, play D;  q2: C→q2, D→q2, play C],
          [q1: C→q2, D→q2, play D;  q2: C→q2, D→q1, play C],
          [q1: C→q2, D→q2, play D;  q2: C→q2, D→q2, play C] }.

For conciseness and computational purposes it is convenient to represent the set of minimal state inferences, I, with the following incompletely specified automaton (Birkhoff and Bartee, 1970):

          C    D    Output
I = q1    −a   q2    D
    q2    q2   −b    C

where −a and −b could be any state. We compare the following best response sets:

(1) When −b = q2 the best response against either inference is DEFECT and has the payoff δ(1 + g)/(1 − δ), where DEFECT is the one-state strategy that always plays D.
(2) When −b = q1 the best response against either inference is: (a) DEFECT with the payoff δ(1 + g)/(1 − δ²), if g ≥ δ, and (b) any strategy in the set given by I, yielding the payoff δ/(1 − δ), if g < δ.
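The expressions in (1) and (2) follow directly from the cycle structure of each response; a worked derivation (our addition, written with the undiscounted sums that the stated expressions use):

```latex
% -b = q_2: after (D,D) the inferred machine cooperates forever while
% DEFECT keeps defecting:
U = \sum_{t=2}^{\infty} \delta^{t-1}(1+g) = \frac{\delta(1+g)}{1-\delta}.
% -b = q_1, respond with DEFECT: the inferred machine alternates q_1, q_2,
% so the payoff 1+g arrives every second period:
U = \sum_{t=1}^{\infty} \delta^{2t-1}(1+g) = \frac{\delta(1+g)}{1-\delta^{2}}.
% -b = q_1, respond with a strategy in I: (D,D) once, then mutual cooperation:
U = \sum_{t=2}^{\infty} \delta^{t-1} = \frac{\delta}{1-\delta}.
% Comparing the last two: \delta(1+g)/(1-\delta^{2}) \ge \delta/(1-\delta)
% iff 1+g \ge 1+\delta, i.e. iff g \ge \delta, which is the case split in (2).
```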


Starting at the initial condition (SOS, SOS), if g ≥ δ then GDS will generate the following sequence (with probability 1), which converges to a steady state:

    (SOS, SOS), (DEFECT, SOS), (DEFECT, DEFECT), (DEFECT, DEFECT), ...

If g < δ, let p be the probability that a player infers −b = q2. Then the probability that neither player has chosen DEFECT by decision period s is (1 − p)^s. Once a player chooses DEFECT the other player has a unique minimal state inference and the system converges to a steady state. Hence, even though (SOS, SOS) can be a Nash equilibrium of the machine game, the players have an incentive to depart from it: it is not a steady state. What the players infer about the latent behavior of the coplayer can create an incentive to switch strategies when the players place positive probability on the most optimistic inferences. These "optimistic inferences" occur when −b = q2 is inferred, and they are exactly the inferences whose best responses yield a payoff at least as high as the payoff from taking a best response to any of the other minimal state inferences.

Two common types of behavior exhibited by agents in GDS playing the Prisoners' Dilemma are responses that anticipate unpunished defection from mutual cooperation and responses that anticipate mutual cooperation. When g ≥ δ only the former type of behavior is manifested, but when g < δ the latter type is also manifested. However, when g < δ any mutual cooperation will eventually be destroyed as soon as a player draws one of the optimistic inferences.

In Example 2 the elimination by GDS of Nash equilibria involving cooperation suggests that players are underestimating their coplayer's willingness or ability to punish: what in fact is a sufficient deterrent is not anticipated, even though play of D is observed in the first period of the decision period. Some researchers have proposed that an automaton which starts with a "show-of-strength" allows the opponent to learn about its punishment capability, thereby providing a deterrent. However, in GDS, where there is a positive probability of drawing an optimistic inference, the players do not learn this way and fail to be convinced that they will actually be punished.

3. Steady states

The solution concept is a steady state of the dynamic system.

Definition 1. A pair of finite state automata, (M_1^*, M_2^*), is said to be a steady state of the dynamic system if whenever (M_1^S, M_2^S) = (M_1^*, M_2^*) then {(M_1^s, M_2^s)}_{s=S}^∞ = {(M_1^*, M_2^*), (M_1^*, M_2^*), ...}.

If GDS is in a steady state then, due to the full support assumption in Condition r, M_i^* is a best response for player i against any automaton M_{-i} in the inference set I_{-i}(M_1^*, M_2^*). It follows from Definition 1 that the sets of minimal state inferences in a steady state are the same in each decision period: {I_1^s, I_2^s} = {I_1^*, I_2^*} for all s.

The parts of the proofs which require obtaining sets of minimal state best responses involve solving a Markov Decision Problem (MDP) and then constructing minimal state automata that implement the solution to the MDP. In the MDP of player i when player −i is using automaton M_{-i}, a sequence of actions {a_i^{*t}}_{t=1}^∞ is chosen that maximizes (1 − δ) Σ_{t=1}^∞ δ^{t−1} u_i(a_i^t, f_{-i}(q_{-i}^t)) under the law of motion q_{-i}^{t+1} = τ_{-i}(q_{-i}^t, a_i^t) with initial state q_{-i}^1. It is well known that a solution of the form f_i : Q_{-i} → A_i exists.

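This MDP is a standard dynamic program; a minimal value-iteration sketch (our own, reusing the dict encoding of the earlier sketches; the (1 − δ) normalization is dropped since it does not affect the maximizer):

```python
def best_response_values(M: dict, u_i, actions_i, delta: float,
                         iters: int = 2000) -> dict:
    """Value iteration for player i's MDP against a fixed coplayer automaton M.
    The MDP states are the coplayer's states; u_i(a, b) is the stage payoff."""
    V = {q: 0.0 for q in M["output"]}
    for _ in range(iters):  # a contraction with modulus delta, so this converges
        V = {q: max(u_i(a, M["output"][q]) + delta * V[M["transition"][(q, a)]]
                    for a in actions_i)
             for q in V}
    return V

def greedy_policy(M: dict, u_i, actions_i, delta: float, V: dict) -> dict:
    """A solution f_i : Q_{-i} -> A_i recovered from the value function."""
    return {q: max(actions_i,
                   key=lambda a: u_i(a, M["output"][q])
                                 + delta * V[M["transition"][(q, a)]])
            for q in V}

# The Example 2 inference with -a = q1 and -b = q2 (payoffs with g = l = 1):
M = {"output": {"q1": "D", "q2": "C"},
     "transition": {("q1", "C"): "q1", ("q1", "D"): "q2",
                    ("q2", "C"): "q2", ("q2", "D"): "q2"}}
u1 = lambda a, b: {("C", "C"): 1, ("C", "D"): -1,
                   ("D", "C"): 2, ("D", "D"): 0}[(a, b)]
V = best_response_values(M, u1, ("C", "D"), 0.9)
print(greedy_policy(M, u1, ("C", "D"), 0.9, V))  # {'q1': 'D', 'q2': 'D'}: DEFECT
```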

The solution to the MDP has |Q_{-i}| states; however, we are interested in solutions that have a minimal number of states. An automaton M_i implements the solution to the MDP if the outcome path π(M_i, M_{-i}) equals the sequence {(f_i(q_{-i}^t), f_{-i}(q_{-i}^t))}_{t=1}^∞ which is generated by this solution. An automaton in the best response set, B_i(M_{-i}), will attain the optimal value of player i's MDP but in general will have fewer than |Q_{-i}| states. This best response set can always be constructed with the following two steps: (a) for each solution, f_i : Q_{-i} → A_i, of the MDP construct the outcome path, {(f_i(q_{-i}^t), f_{-i}(q_{-i}^t))}_{t=1}^∞, that is generated when f_i(q_{-i}^t) is the input into M_{-i}; (b) the set of minimal state best responses, B_i(M_{-i}), is obtained by constructing the set of minimal state inferences about player i for each outcome path in (a) and then keeping only those automata that have a common minimal number of states.

Two lemmas will be proved before the theorems are presented. Lemma 1 derives necessary conditions for the dynamic system to be at a steady state. These conditions are: (a) the strategies of both players have the same number of states and one of the minimal state inferences of the coplayer is the true strategy of the coplayer, and (b) there is a pair of minimal state inferences which is a Nash equilibrium of the machine game. The property in condition (a) that the strategies of both players have the same number of states is also a property of equilibria of the machine game. When the system is not at a steady state there are numerous counterexamples to the conclusions of Lemma 1. Lemma 2 establishes a property of the set of minimal state inferences in a steady state: there must be a minimal state inference in which the transitions {τ_i(q, a_{-i})}_{a_{-i} ∈ A_{-i}} in each state q are identical for all inputs a_{-i} ∈ A_{-i}.

Lemma 1. For any finite action stage game, if (M_1^*, M_2^*) is a steady state of GDS then:

(a) |M_1^*| = |M_2^*| and M_i^* ∈ I_i^*, for i = 1, 2, and
(b) there is a pair of minimal state inferences (M1, M2) ∈ I_1^* × I_2^* that is a Nash equilibrium of the machine game.

Proof. Since the pair (M_1^*, M_2^*) generates the outcome path π(M_1^*, M_2^*), it follows that M_i^*, for i = 1, 2, has at least as many states as any minimal state inference of the strategy of player i: |M_i^*| ≥ C_i^*. Property (a) will follow once it is shown that |M_i^*| = C_i^*, for i = 1, 2. Since player i chooses M_i^*, it must be a best response to some M_{-i} ∈ I_{-i}^*. A solution to the MDP based on the transition function of M_{-i} is well-defined and has |M_{-i}| = C_{-i}^* states. Since M_i^* is chosen by player i, this implies that M_i^* attains the same payoff that is attained by the solution to the MDP and has no more states: |M_i^*| ≤ C_{-i}^*. It follows from |M_i^*| ≥ C_i^* and |M_i^*| ≤ C_{-i}^* that |M_i^*| ≥ C_i^* ≥ |M_{-i}^*| ≥ C_{-i}^* ≥ |M_i^*|, so all of these quantities are equal.

For property (b), since (M_1^*, M_2^*) is a steady state, Condition r implies that M_1^* ∈ B_1(M2) for some M2 ∈ I_2^* and M_2^* ∈ B_2(M1) for some M1 ∈ I_1^*. From (a) we have that every inference in I_i^* has the same number of states as M_i^* and generates the same payoff when playing M_{-i}; thus, I_i^* ⊂ B_i(M_{-i}). Hence, M1 ∈ B_1(M2) and M2 ∈ B_2(M1), which means that (M1, M2) is a Nash equilibrium of the machine game. □

Lemma 2. If (M1, M2) is a Nash equilibrium of the machine game and (M1′, M2′) satisfies π(M1′, M2′) = π(M1, M2), then there exists M_{-i}′ ∈ I_{-i}(M1′, M2′) for which any best response to M_{-i}′ plays a best response in the stage game in each state of M_{-i}′.

Proof. Suppose (M1, M2) generates the sequence of states

    {q^1, ..., q^{t_1 − 1}, q^{t_1}, ..., q^{t_2}, q^{t_1}, ..., q^{t_2}, ...},


with t_1 possibly one, and the outcome path π(M1, M2) = {f(q^1), f(q^2), f(q^3), ...}. By Theorem 1 in Abreu and Rubinstein (1988), if (M1, M2) is a Nash equilibrium of the machine game then the states of M1 (respectively M2) which appear in the first t_2 periods are distinct. Therefore, there exists a minimal state inference, M_i′, which has exactly t_2 states and has the following structure:

                 a_{-i,1}   a_{-i,2}   ...   a_{-i,k}   Output
M_i′ := q^1      q^2        q^2        ...   q^2        f_i(q^1)
        q^2      q^3        q^3        ...   q^3        f_i(q^2)
        ...      ...        ...        ...   ...        ...
        q^{t_2}  q^{t_1}    q^{t_1}    ...   q^{t_1}    f_i(q^{t_2})

The k actions of player −i are the inputs and the set of states for each player is represented by the common set Q = {q^1, q^2, ..., q^{t_2}}. M_i′ is an inference of the strategy of player i that imitates the states and realized transitions of M_{-i} and plays the actions of player i in the order that they appear in the outcome path. By construction |M_i′| = t_2 and π(M_i′, M_{-i}) = π(M_i, M_{-i}). Thus, M_i′ is a minimal state inference of the strategy of player i: M_i′ ∈ I_i(M1, M2). Since the transition function τ_i(q, a_{-i,h}) is constant across inputs {a_{-i,h}}_{h=1}^k, it follows that in any best response to M_i′, in each state q ∈ Q it is optimal to play a best response to the stage game action f_i(q). □

In Theorem 1 it is shown that in any steady state of GDS the players will play a Nash equilibrium of the stage game in each period. An implication of Theorem 1 is that there are Nash equilibria of the machine game that are not steady states of GDS. This selection occurred in Example 2 because when players do not observe their coplayer's strategies their inferences may differ from the true strategies, even if the true strategies constitute a Nash equilibrium. Example 3 establishes that Theorem 1 does not imply that every Nash equilibrium of the machine game in which only Nash equilibria of the stage game are played in each period is a steady state of GDS.

Theorem 1 (Steady states of GDS). For any finite action stage game, in any steady state of GDS a Nash equilibrium of the stage game is played in each period.

Proof. By Lemma 1 the steady state (M_1^*, M_2^*) has an outcome path that can also be generated by some Nash equilibrium of the machine game. Assuming that (M_1^*, M_2^*) were to involve play that is not a Nash equilibrium of the stage game, we construct an inference which permits an improvement over M_i^*, thereby leading to a contradiction. Under Condition r, M_i^* ∈ B_i(M_{-i}) for some M_{-i} ∈ I_{-i}^*. Suppose that in state q of M_{-i}, M_i^* plays action a_i instead of the stage game best response a_i^*. By Lemma 1 and Lemma 2 there exists a class of minimal state inferences in which only the action a_i is observed in state q and the transition τ_{-i}(q, a_i) is the only observed transition in state q. Hence, we can construct an automaton M_{-i}′, analogous to the construction in Lemma 2, that is a minimal state inference and is identical to M_{-i} except that τ_{-i}(q, a_i^*) = τ_{-i}(q, a_i). When M_{-i}′ is inferred, the best response of player i plays a_i^* in state q instead of a_i. Under Condition r this inference M_{-i}′ is chosen with probability p(M_{-i}′; I_{-i}^*) > 0; therefore, if the conclusion of Theorem 1 did not hold for GDS, there would be a positive probability of leaving the steady state (M_1^*, M_2^*) each decision period, a contradiction. □

The minimal state inferences specify the latent behavior of the players in situations not encountered on the observed outcome path. Some of these inferences, in particular those used to prove Theorem 1, have best responses that optimize period-by-period play of the stage game.
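The imitation automaton at the heart of Lemma 2 (and reused in the proof of Theorem 1) can be written down mechanically; a sketch in the dict encoding used earlier, with the cycle bounds t_1, t_2 taken as given:

```python
def imitation_inference(outputs: list, t1: int, t2: int, actions) -> dict:
    """The t2-state inference from the proof of Lemma 2: state j plays the
    j-th action on the observed outcome path, and EVERY input moves state j
    to j + 1, wrapping from t2 back to the start of the cycle at t1."""
    nxt = lambda j: j + 1 if j < t2 else t1
    return {"output": {j: outputs[j - 1] for j in range(1, t2 + 1)},
            "transition": {(j, a): nxt(j)
                           for j in range(1, t2 + 1) for a in actions}}

# A path that alternates C, D from period 1 (t1 = 1, t2 = 2):
print(imitation_inference(["C", "D"], 1, 2, ("C", "D")))
```

Because the transitions ignore the input, a best response can do no better in each state than a stage game best response to that state's fixed action, which is exactly the property the lemma needs.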


The inference behavior in Condition r, which places positive probability on all inferences, only sustains steady states consistent with these particular inferences. To emphasize the significance of these particular inferences in GDS, it is helpful to observe that they correspond to the inferences selected by a preference relation that orders the minimal state inferences in terms of the payoffs of their best responses. More precisely, the set of steady states obtained in Theorem 1 when players randomly select an inference from a distribution with full support can also be obtained by letting both players choose beliefs under another inference rule, optimistic inferences: (i) M_i^s ∈ B_i(M_{-i}) for some M_{-i} ∈ I_{-i}^{s-1}, and (ii) (M_i^s, M_{-i}) ≿_i (B_i(M_{-i}′), M_{-i}′) for all M_{-i}′ ∈ I_{-i}^{s-1}. The optimistic player selects an inference M_{-i} ∈ I_{-i}^{s-1} whose best response yields a payoff at least as high as the payoffs obtained from taking best responses to the other inferences in I_{-i}^{s-1}. Indeed, the full support assumption of Theorem 1 could be slightly weakened to supports that place positive probability on the optimistic inferences.

In long-term competition players may know little about the latent behavior of their coplayers. They form beliefs about each other's resolve to punish and retaliate. It is the types of players who are overly optimistic, or even a little optimistic, about opportunities for making short-run profits who have more difficulty preserving a cooperative relationship: their cooperative relationships are unstable and will eventually break down. In the next section an inference rule, cautious inferences, is presented in which the players tend to infer a relatively high ability and resolve to retaliate on the part of their coplayers. Players who are cautious about opportunities for making short-run profits are more likely to maintain a cooperative relationship.

A corollary of Theorem 1 is that when the stage game is the Prisoners' Dilemma the unique steady state of GDS is the one-state strategy DEFECT that we saw in Example 2. Figure 1 depicts the set of feasible payoffs in a Prisoners' Dilemma (the diamond-shaped region). The folk theorem (Fudenberg and Maskin, 1986) implies that any feasible and individually rational payoff, i.e. any feasible payoff that Pareto dominates (0, 0), can arise as an equilibrium of the infinitely repeated game. The set of Nash equilibrium payoffs of the machine game is contained in the two diagonals that Pareto dominate (0, 0). The only steady state payoff of GDS is the point (0, 0). Example 3 illustrates that there are Nash equilibria of the machine game in which only Nash equilibria of the stage game are played that are not steady states of GDS.

Fig. 1. Set of payoffs in a Prisoners’ Dilemma.


Table 3
Game of common interests

      C       D
C     9, 9    0, 8
D     8, 0    6, 6

Example 3 (Game of common interests). The stage game in Table 3 is a game of common interests. Let the initial strategy of both players be

          C    D    Output
M = q1    q2   q2    C
    q2    q1   q1    D

The pair (M, M) is a Nash equilibrium of the machine game in which only Nash equilibria of the stage game are played. From the outcome path π(M, M) the set of minimal state inferences is

          C    D    Output
I = q1    q2   −a    C
    q2    −b   q1    D

When the discount factor δ is greater than 1/2 and player 1 immediately infers that −a = q1, the path generated by GDS would be

    (M, M), (DEFECT, M), (DEFECT, DEFECT), (DEFECT, DEFECT), ...

Hence, let δ > 1/2 and let p be the probability that a player infers −a = q1. Then the probability that neither player has chosen DEFECT by decision period s is (1 − p)^s. The Nash equilibrium (M, M) is not a steady state because when a player makes an optimistic inference (i.e. −a = q1) the best response is to switch to DEFECT.
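A worked check of the 1/2 threshold (our addition; undiscounted sums as before). Under the optimistic inference the coplayer never leaves q1 and keeps playing C, so perpetual defection harvests 8 each period, while sticking with M earns the alternating 9, 6 stream:

```latex
U(M, M) = 9 + 6\delta + 9\delta^{2} + 6\delta^{3} + \cdots
        = \frac{9 + 6\delta}{1-\delta^{2}},
\qquad
U(\textsc{defect} \mid -a = q_1) = \sum_{t=1}^{\infty} 8\,\delta^{t-1}
        = \frac{8}{1-\delta}.
% Defection looks better iff 8(1+\delta) > 9 + 6\delta, i.e. iff \delta > 1/2.
```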

4. Alternative dynamics

In this section we investigate the sensitivity of the dynamics to alternative adaptation rules. The previous section focused on characterizing the steady states when the probability distributions on the sets of minimal state inferences had full support. The steady state characterization of GDS is robust to relaxing the full support assumption as long as positive probability is placed on certain inferences called optimistic inferences. Further insight can be obtained by relaxing the full support assumption in a different way. In another inference rule, cautious inferences, the players order the minimal state inferences in terms of the payoffs of their best responses and choose a lowest ranked inference. When both players adapt under cautious inferences the set of steady states includes all self-confirming equilibria that have Nash outcome paths. This dynamic, DS-c, and Example 4 (adaptation under DS-c with the Prisoners' Dilemma stage game) illustrate how the dynamics can vary considerably once the full support assumption is relaxed. Cautious inferences are of further interest due to the relationship between the steady states and self-confirming equilibria.

Under cautious inferences a player selects an inference M_{-i} which has a best response whose payoff is least preferred relative to the payoffs obtained from the best responses to the other inferences in I_{-i}^{s-1}. The player who updates switches to a strategy that is a best response to this inference M_{-i}. Cautious inferences are the polar opposite of optimistic inferences, discussed in the previous section, to the extent that they select the least preferred and most preferred minimal state inferences, respectively, when the inferences are ordered by the payoffs obtained from taking best responses. Formally, if M_i^s is chosen at decision period s under the cautious inference rule it must satisfy Condition c.

Condition c (Cautious inferences). M_i^s ∈ B_i(M_{-i}) for some M_{-i} ∈ I_{-i}^{s-1}, and (B_i(M_{-i}′), M_{-i}′) ≿_i (M_i^s, M_{-i}) for all M_{-i}′ ∈ I_{-i}^{s-1}.

When B_i(M_{-i}′) is multi-valued, the notation in Condition c should be interpreted as requiring that the conditions hold for each element of B_i(M_{-i}′).
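Schematically, Conditions r and c (and the optimistic rule of Section 3) differ only in how one element is drawn from the same inference set; a sketch, where br_payoff(M) is the common payoff of the best responses to M (computable, e.g., with the value-iteration sketch above):

```python
import random

def select_inference(inferences: list, br_payoff, rule: str):
    """Draw one minimal state inference; ties are broken arbitrarily."""
    if rule == "cautious":    # Condition c: a lowest ranked inference
        return min(inferences, key=br_payoff)
    if rule == "optimistic":  # the rule discussed after Theorem 1
        return max(inferences, key=br_payoff)
    if rule == "random":      # Condition r with a uniform full-support draw
        return random.choice(inferences)
    raise ValueError(rule)
```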

Dynamic System c (DS-c). M_i^s is selected by Condition c followed by Condition R for both players i = 1, 2.

Example 4. From the set of minimal state inferences, I, found in Example 2 and the best response sets (1), (2a) and (2b) we can perform the calculation of the cautious players. For the case g < δ, we compare (1) with (2b) and find that the cautious player chooses a best response from (2b), because this best response gives a lower payoff than the best response in (1). Hence, the cautious player chooses a two-state automaton in I and, by Condition R, must re-choose SOS. Starting at the initial condition (SOS, SOS), DS-c generates a single constant sequence. Observe that even though the cautious players construct the same set of inferences as the players in GDS, the Nash equilibrium (SOS, SOS) is a steady state in DS-c but not in GDS. Indeed, (SOS, SOS) is a Nash equilibrium of the machine game, and all such equilibria are steady states of DS-c (Theorem 2).

DS-c permits the players to sustain more cooperative outcomes than does GDS, outcomes that cannot be obtained by playing only Nash equilibria of the stage game. Theorem 2 shows that even though there are steady states of DS-c which are not Nash equilibria of the machine game, a steady state of DS-c is a Nash equilibrium in beliefs; the players' inferences constitute a Nash equilibrium of the machine game. Hence, the set of steady state payoffs of DS-c is identical to the set of Nash equilibrium payoffs in the machine game.

Theorem 2 (Steady states of DS-c). For any finite action stage game, if (M1, M2) is a Nash equilibrium of the machine game then (M1, M2) is a steady state of DS-c. Moreover, any (M1′, M2′) ∈ I_1(M1, M2) × I_2(M1, M2) is also a steady state of DS-c.

Proof. It must be shown that if (M_1^s, M_2^s) = (M1, M2) then (M_1^{s+1}, M_2^{s+1}) = (M_1^s, M_2^s). First we demonstrate that if the decision period s strategies are a Nash equilibrium of the machine game then they are minimal state inferences: if (M_1^s, M_2^s) is a Nash equilibrium then (M_1^s, M_2^s) ∈ I_1^s × I_2^s. Suppose (M_1^s, M_2^s) is a Nash equilibrium of the machine game and M_i^s is not a minimal state inference. Then there is an automaton M_i′ ≠ M_i^s that has fewer states, and since π(M_i′, M_{-i}^s) = π(M_1^s, M_2^s) it attains the same repeated game payoff, a contradiction.

This establishes that M_1^s and M_2^s are both best responses to minimal state inferences, one requirement of Condition c. If it is shown that they are least preferred responses in the decision period s + 1 problem, then due to Condition R they will be selected at decision period s + 1. Suppose that player i updates at decision period s + 1. Notice that all pairs of inferences in the set I_1^s × I_2^s yield the same outcome path and payoff.


This implies that for all inferences M_{-i} ∈ I_{-i}^s it is true that (M_i^s, M_{-i}) ∼_i (M_i^s, M_{-i}^s). Since (B_i(M_{-i}′), M_{-i}′) ≿_i (M_i^s, M_{-i}′), it follows, by transitivity, that (B_i(M_{-i}′), M_{-i}′) ≿_i (M_i^s, M_{-i}) for all M_{-i}′ ∈ I_{-i}^s. Thus, all requirements of Condition c are satisfied.

To obtain the second conclusion, note that since all pairs of inferences in the set I_1^s × I_2^s yield the same outcome path and payoff, and since (M1, M2) is a Nash equilibrium of the machine game, it follows that M1′ ∈ B_1(M2′) and M2′ ∈ B_2(M1′), and the rest of the argument is identical to the first argument. □

Theorem 2 also implies that every steady state is a self-confirming equilibrium in the machine game. This follows solely from the steady state property π(M_i^*, M_{-i}) = π(M_i^*, M_{-i}^*), where M_i^* ∈ B_i(M_{-i}) for some M_{-i} ∈ I_{-i}^*. That is, each player's inference is confirmed when (M_1^*, M_2^*) is played next decision period, even though the inference M_{-i} may not be the true strategy of player −i. In a self-confirming equilibrium players choose a best response to their beliefs, and these beliefs only have to be consistent with the equilibrium path of play (Fudenberg and Levine, 1993). To make the relationship between steady states of DS-c and self-confirming equilibria more explicit we define a stability criterion based upon the concept of self-confirming equilibrium. The stability criterion requires that once a player finds an inference which repeatedly matches his observations and the response is chosen optimally, then the response is reselected.

Definition 2. A pair of automata (M_1^*, M_2^*) and a pair of inferences (M̄_1, M̄_2) is said to be stable if (a) M_i^* ∈ B_i(M̄_{-i}), and (b) M̄_{-i} ∈ I_{-i}(M_i^*, M_{-i}^*).

Since every inference M_{-i} ∈ I_{-i}(M_i^*, M_{-i}^*) satisfies π(M_i^*, M_{-i}) = π(M_i^*, M_{-i}^*), the consistency condition of self-confirming equilibrium is satisfied by a stable pair {(M_1^*, M_2^*), (M̄_1, M̄_2)}. The definitions of GDS and DS-c directly imply that every steady state of GDS and DS-c is stable. Conversely, if a pair of automata is stable then it is a steady state of DS-c. When a pair of automata (M_1^*, M_2^*) is stable, part (a) of Definition 2 implies that (B_i(M_{-i}), M_{-i}) ∼_i (M_i^*, M̄_{-i}) for all inferences M_{-i} ∈ I_{-i}, because when (M_1^*, M_2^*) is played all pairs of inferences in the set I_1 × I_2 yield the same outcome path and payoff. Hence, Condition c is satisfied and M_i^* will be reselected in DS-c.

Corollary 1. For any finite action stage game, {(M_1^*, M_2^*), (M̄_1, M̄_2)} is stable if and only if (M_1^*, M_2^*) is a steady state of DS-c.

Now we study the dynamics when one player adapts under Condition r and the other player adapts under Condition c.

Dynamic System rc (DS-rc). M_1^s is selected by Condition r followed by Condition R for player 1. M_2^s is selected by Condition c followed by Condition R for player 2.

There is no general result equating the set of steady state payoffs in DS-rc to the steady state payoffs of GDS or DS-c. The conclusion of Theorem 1 need not hold for DS-rc because in DS-rc player 2 might not be playing a best response of the stage game in each state of player 1's strategy. Moreover, the conclusion of Theorem 2 need not hold for DS-rc either. In DS-c, but not DS-rc, there can be steady states that are Nash equilibria of the machine game in which player 1 does not always play a best response of the stage game in each state of player 2's strategy.


It can be proved for symmetric stage games with two actions that the set of steady states of DS-rc is the same as the set of steady states of GDS; however, this result does not generalize to asymmetric stage games or to symmetric stage games with more than two actions.

The dynamics of DS-rc introduce several questions. Could a player profit from switching to a different inference rule, given the inference rule of the coplayer? Although switching player types is not a strategic choice in the model, a comparison of the dynamics under different types is a useful exercise for investigating how to manage a relationship against a fixed type, or as a first step towards addressing the problem of endogenous types. A related question is whether a particular steady state is robust to switching one of the players' types. The results for GDS imply that when both players make the most optimistic inferences with positive probability they will not be able to sustain cooperative outcomes. And the results for DS-c imply that players who make cautious inferences can sustain a wide range of cooperative outcomes. In DS-rc, where the players' types are heterogeneous, it can be possible for the players to obtain some of the cooperative outcomes of DS-c, but without the additional constraint that both players make cautious inferences. Some of the cooperative outcomes are robust to permitting one of the players to choose a minimal state inference at random. In a symmetric stage game with three actions, Example 5 illustrates that with only one cautious player all steady states of DS-rc which are not steady states of GDS are Pareto preferred to any steady state of GDS. In this case the presence of a cautious coplayer can be to the advantage of both behavioral types, to the extent that only a single cautious player is necessary to maintain a cooperative outcome. Although the existence of a single cautious player can augment the set of steady states relative to GDS, this does not in general imply that the additional steady states Pareto improve over the steady states of GDS. There exist stage games for which all of the steady states of DS-rc which are not also steady states of GDS are worse for the cautious player and better for the random player relative to any steady state of GDS. Thus, the general answer to the question of whether a particular type of player has an advantage will depend on the structure of the stage game as well as on the type of the coplayer.

Example 5 (DS-rc). The symmetric stage game is displayed in Table 4. The action profile (a1, a1) is the unique pure strategy Nash equilibrium and the profile of minmax actions as well. The point (0, 0) is the only steady state payoff under GDS. Consider the initial strategies

           a1   a2   a3   Output
M1 = q1    q2   q1   q1    a1
     q2    q1   q2   q1    a3

           a1   a2   a3   Output
M2 = q1    q2   q1   q1    a1
     q2    q2   q2   q2    a2

Table 4
A symmetric stage game

      a1        a2        a3
a1    0, 0      −1, −1    3, −1
a2    −1, −1    −1, −1    2, 2
a3    −1, 3     2, 2      −1, −1
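As a cross-check on the example, the sketch below simulates (M1, M2) directly (dict encoding as before; δ = 0.9 is an arbitrary choice above the 1/2 threshold) and reproduces the equilibrium-path payoff 2δ/(1 − δ) claimed next:

```python
M1 = {"init": "q1", "out": {"q1": "a1", "q2": "a3"},
      "tr": {("q1", "a1"): "q2", ("q1", "a2"): "q1", ("q1", "a3"): "q1",
             ("q2", "a1"): "q1", ("q2", "a2"): "q2", ("q2", "a3"): "q1"}}
M2 = {"init": "q1", "out": {"q1": "a1", "q2": "a2"},
      "tr": {("q1", "a1"): "q2", ("q1", "a2"): "q1", ("q1", "a3"): "q1",
             ("q2", "a1"): "q2", ("q2", "a2"): "q2", ("q2", "a3"): "q2"}}
# Player 1's payoffs from Table 4 (the game is symmetric).
u1 = {("a1", "a1"): 0, ("a1", "a2"): -1, ("a1", "a3"): 3,
      ("a2", "a1"): -1, ("a2", "a2"): -1, ("a2", "a3"): 2,
      ("a3", "a1"): -1, ("a3", "a2"): 2, ("a3", "a3"): -1}

delta, total = 0.9, 0.0
q1, q2 = M1["init"], M2["init"]
for t in range(300):  # (a1, a1) once, then (a3, a2) forever
    a, b = M1["out"][q1], M2["out"][q2]
    total += delta ** t * u1[(a, b)]
    q1, q2 = M1["tr"][(q1, b)], M2["tr"][(q2, a)]
print(round(total, 3), 2 * delta / (1 - delta))  # both approximately 18.0
```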


When δ > 1/2 the pair (M1, M2) is a Nash equilibrium of the machine game with payoff 2δ/(1 − δ). The minimal state inferences of player 2's strategy are

           a1   a2   a3   Output
I2 = q1    q2   −a   −b    a1
     q2    −c   −d   q2    a2

Since M1 is a best response to all of these inferences, player 1 will continue to use M1. The minimal state inferences of player 1's strategy are

           a1   a2   a3   Output
I1 = q1    q2   −e   −f    a1
     q2    −g   q2   −h    a3

For any inference in which player 2 infers −g = q2 the best response is the one-state automaton which always plays a1 and has payoff 3δ/(1 − δ). For any inference in which player 2 infers −g = q1 and δ > 1/2 the best response is a two-state automaton that has payoff 2δ/(1 − δ). Player 2, who uses cautious inferences, will choose a best response which gives a payoff lower than the payoff of any other best response to an inference in I1. The initial strategy M2 is re-chosen, and it follows that (M1, M2) is a steady state of DS-rc. Comparing the payoff of (M1, M2) to the steady state payoff of GDS, it is evident that player 1, who uses inference rule r, is better off when the coplayer uses cautious inferences, and player 2, who uses cautious inferences, would not find an incentive to switch to inference rule r.

5. Convergence

The convergence results characterize the dynamics of play when the system is not at a steady state. When we investigated steady states in Section 3 we were not concerned with the multiple paths that could be generated by GDS, because a steady state pair of automata is an absorbing set of the system. When the system is not at a steady state, however, Conditions r and R imply that there is a probability distribution over the set of possible paths. The system cannot diverge because the minimal state property of the best responses implies that the number of states in the players' automata will not increase across decision periods. Consequently, for any finite action stage game GDS converges, with probability one, either to a steady state or to another absorbing set. This global convergence proposition is proved in Theorem 3. The main question that remains for convergence is to determine all of the absorbing sets of GDS for particular stage games; this exercise is illustrated in Theorems 4 and 5.

Let Pr((M_1^s, M_2^s) = (M1, M2) | (M_1^1, M_2^1)) denote the probability that GDS is in state (M1, M2) in decision period s given that the initial condition is (M_1^1, M_2^1). Hence, Pr((M_1^s, M_2^s) | (M_1^1, M_2^1)) is a probability distribution on the set of sequences of the form {(M_1^S, M_2^S)}_{S=1}^s, where the probabilities are sums and products of the probabilities of the inferences (from Condition r) and the probabilities of the best responses (from Condition R). An infinite sequence generated by GDS for an initial condition (M_1^1, M_2^1) is a fixed infinite sequence of pairs {(M_1^s, M_2^s)}_{s=1}^∞ such that any truncation to the first s terms is an event that occurs with positive probability under GDS. An absorbing set,

    X_A = {(M_1^1, M_2^1), (M_1^2, M_2^2), ..., (M_1^j, M_2^j)},

is a set of pairs of automata that the system does not leave once it is entered and in which each element of X_A is recurring. If j = 1 then the absorbing set is called a steady state.


Definition 3. X_A is an absorbing set if (a) Pr((M_1^S, M_2^S) ∈ X_A | (M_1^s, M_2^s) ∈ X_A) = 1 for all decision periods S = s + 1, s + 2, ..., and (b) there exists an η > 0 such that for any S and (M1, M2), (M1′, M2′) ∈ X_A there exists an S⁺ > S such that Pr((M_1^{S⁺}, M_2^{S⁺}) = (M1′, M2′) | (M_1^S, M_2^S) = (M1, M2)) ≥ η.

First we prove that GDS must converge to an absorbing set and then, in Lemma 3, we characterize some properties of absorbing sets.

Theorem 3 (Convergence of GDS). For any finite action stage game and any initial condition (M_1^1, M_2^1), there is a limit set Ω such that:

(a) GDS converges to Ω: lim_{s→∞} Pr((M_1^s, M_2^s) ∉ Ω | (M_1^1, M_2^1)) = 0,
(b) Ω is finite and it is either an absorbing set or a union of absorbing sets,
(c) if (M1, M2) ∈ Ω then it is periodically played with positive probability: lim sup_{s→∞} Pr((M_1^s, M_2^s) = (M1, M2) | (M_1^1, M_2^1)) > 0.

Proof. See Appendix A. □

In Lemma 3(a) the minimal state property of the best responses will imply that all automata in an absorbing set have the same number of states; this generalizes a property of steady states. This is a strong restriction on an absorbing set, but it does not imply that the only absorbing sets are steady states, since players could choose different strategies which have the same number of states (see Theorem 5, Matching Pennies). If it can be demonstrated that a player will select an automaton with fewer than k states, k > 1, from within a candidate absorbing set, then Lemma 3(a) also provides a useful approach for proving that absorbing sets in k-state automata do not exist.

A second property of general absorbing sets, Lemma 3(b), is that the selection M_i^s of the player who updates in decision period s is a minimal state inference formed by player −i; this is weaker than the analogous property for steady states in Lemma 1(a). Hence, in any absorbing set the true strategy of the player who updates will be inferred with positive probability. This property is useful for establishing that the only absorbing set of GDS with the Prisoners' Dilemma stage game is a steady state (Theorem 4). By Condition r, M_i^s is a best response to some minimal state inference M_{-i} ∈ I_{-i}^{s-1}; however, it could be that M_{-i} is an incorrect inference. In this case (M_i^s, M_{-i}^s), not (M_i^s, M_{-i}), is played in decision period s and the inferences in I_i^s(M_i^s, M_{-i}^s), save M_i^s, are not necessarily best responses to any strategies. Hence, the advantage of working with M_i^s over other inferences in I_i^s(M_i^s, M_{-i}^s) is that M_i^s has the additional properties of a minimal state best response. Applying this same reasoning in the next decision period, by Lemma 3(b) and Condition r there is a positive probability that M_{-i}^{s+1} is a best response to M_i^s. This cross-decision period link makes it possible to establish that |M_{-i}^{s+1}| < k with positive probability, where k > 1 is the number of states in a strategy in some candidate absorbing set.

Lemma 3 (Properties of an absorbing set). For any finite action stage game, if X_A is an absorbing set of GDS:

(a) All strategies of both players in the absorbing set must have the same number of states: |M_i| = |M_{-i}′| for all pairs (M1, M2), (M1′, M2′) ∈ X_A.


(b) A decision period s selection of player i (the player who can update) is in the minimal state inference set at decision period s (M_1^s ∈ I_1^s for s = 2, 4, ..., and M_2^s ∈ I_2^s for s = 3, 5, ...).

Proof. See Appendix A. □

Stronger global convergence propositions than Theorem 3, in particular propositions which prove that every absorbing set is a steady state or that there is a unique absorbing set, can be obtained if one introduces assumptions on the structure of the stage game. Although stronger global convergence propositions are analytically intractable for most stage games, Theorems 4 and 5 illustrate how assumptions on the stage game can be sufficient to obtain such propositions. Theorem 4 proves that there is global convergence to the unique steady state in GDS when the stage game is the Prisoners' Dilemma; hence it establishes that the only absorbing set is the unique steady state found in Section 3. Theorem 5 proves that there is a unique absorbing set in GDS when the stage game is Matching Pennies. The proofs of Theorems 4 and 5 make use of the fact that there are always certain k-state strategies M_i^s selected under Condition R which, with positive probability, result in the selection M_{-i}^{s+1} having fewer than k states along any path that is not the unique cycle or steady state.

Theorem 4. For the Prisoners' Dilemma, the unique absorbing set under GDS is the steady state (DEFECT, DEFECT), for any g, l > 0 and all 0 < δ < 1.

Proof. See Appendix A. □

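To illustrate Theorem 4's steady state numerically, the following sketch brute-forces best responses to DEFECT over all automata with at most two states. It assumes the encoding from the earlier sketch, a truncated discounted payoff, and one standard payoff normalization consistent with the theorem's parameters, u(C, C) = 1, u(D, D) = 0, u(D, C) = 1 + g, u(C, D) = −l with g = l = 0.5; none of these concrete choices are taken from the paper.

```python
from itertools import product

U = {('C', 'C'): 1.0, ('D', 'D'): 0.0, ('D', 'C'): 1.5, ('C', 'D'): -0.5}  # g = l = 0.5

def discounted_payoff(m1, m2, delta=0.9, horizon=200):
    # Truncated discounted sum; the truncation error is negligible here
    # because play between finite automata is eventually cyclic.
    q1, q2, total = m1[1], m2[1], 0.0
    for t in range(horizon):
        a1, a2 = m1[2][q1], m2[2][q2]
        total += (delta ** t) * U[(a1, a2)]
        q1, q2 = m1[3][(q1, a2)], m2[3][(q2, a1)]
    return total

def all_machines(max_states=2):
    # Machines encoded as (n_states, initial, outputs, transitions).
    for n in range(1, max_states + 1):
        for init, out in product(range(n), product('CD', repeat=n)):
            for tr in product(range(n), repeat=2 * n):
                trans = {(q, a): tr[2 * q + i]
                         for q in range(n) for i, a in enumerate('CD')}
                yield (n, init, out, trans)

DEFECT = (1, 0, ('D',), {(0, 'C'): 0, (0, 'D'): 0})
best = max(discounted_payoff(m, DEFECT) for m in all_machines())
one_state_best = [m for m in all_machines() if m[0] == 1
                  and abs(discounted_payoff(m, DEFECT) - best) < 1e-9]
print(round(best, 6), one_state_best)
# 0.0 [(1, 0, ('D',), {(0, 'C'): 0, (0, 'D'): 0})]: against DEFECT the best
# attainable payoff is 0, and the one-state all-D machine attains it with a
# single state, so (DEFECT, DEFECT) is a steady state.
```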
Since Matching Pennies (Table 5) does not have any pure strategy Nash equilibria, Theorem 1 implies that there are no steady states in GDS. Let the one-state strategy which always plays H be denoted by HEAD and let the one-state strategy which always plays T be denoted by TAIL.

Table 5
Matching Pennies stage game

        H        T
H     1, −1   −1, 1
T     −1, 1    1, −1

Theorem 5. For Matching Pennies, the unique absorbing set under GDS is
{(TAIL, HEAD), (HEAD, TAIL), (HEAD, HEAD), (TAIL, TAIL)}.

Proof. See Appendix A. □

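The cycle in Theorem 5 can be illustrated directly: a one-state opponent is always inferred exactly, so within one-state automata the dynamic reduces to alternating stage-game best responses, with player 1 matching and player 2 mismatching. The reduction, the starting profile, and the updating order in this sketch are our illustrative assumptions; the simulated orbit visits exactly the four profiles in the theorem.

```python
def best_response(player, opponent_action):
    if player == 1:                      # row player wants to match
        return opponent_action
    return 'H' if opponent_action == 'T' else 'T'   # column player mismatches

profile = ('T', 'H')                     # start at (TAIL, HEAD)
updater = 1                              # player 1 updates first
seen = []
for _ in range(8):
    seen.append(profile)
    if updater == 1:
        profile = (best_response(1, profile[1]), profile[1])
    else:
        profile = (profile[0], best_response(2, profile[0]))
    updater = 3 - updater                # players alternate updates

print(seen)
# [('T','H'), ('H','H'), ('H','T'), ('T','T'), ('T','H'), ('H','H'), ...]
# -- the four profiles of the unique absorbing set, visited cyclically.
```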
6. Concluding remarks

The problems of learning and coordination on strategies in infinitely repeated games were addressed in a dynamic system in which players can begin the game with any feasible strategies represented by finite state automata and are permitted to explicitly switch their strategies during the course of play. The rules that define the mode of adaptation are a set of behavioral assumptions, and the model departs from fully rational players in several respects. In particular, the players' behavior is constrained by complexity considerations, both in the strategies that they choose and in the inferences that they construct of their coplayer's strategies.
Players make inferences about the latent behavior of their coplayers, but they construct only the simplest inferences that are consistent with observed play. And, perhaps paradoxically for a game with an infinite horizon (T = ∞), players have limited foresight, because they do not anticipate how their coplayers will adapt in the next decision period. The results demonstrated that the conditions for the steady states to support cooperative outcomes depend crucially on the behavioral rules that the players use to select beliefs from their sets of simple inferences.

Acknowledgments

I wish to thank Kalyan Chatterjee and Vijay Krishna for numerous discussions and the associate editor and referee at Games and Economic Behavior for their suggestions. I am also grateful for comments from Drew Fudenberg, Jim Jordan, Dmitriy Kvasov, Eric Maskin, Ariel Rubinstein, Hamid Sabourian, Karl Schlag and Tomas Sjöström. Financial support from the European University Institute in 2004–2005 is gratefully acknowledged.

Appendix A

Proof of Theorem 3. First we establish some general properties of the limit set Ω that assist in the remainder of the proof. Ω is not empty, because it must contain at least one state of GDS. Ω is a finite set because GDS converges to a finite set in which each automaton has no more states than $\max\{|M_1^1|, |M_2^1|\}$. To prove the last claim, consider any decision period s > 1; optimization implies that the selection of player i in decision period s has no more states than the number of states in any minimal state inference about player −i at decision period s − 1 ($C_{-i}^{s-1} \geq |M_i^s|$). The selection of player −i in decision period s − 1 has at least as many states as any minimal state inference about player −i ($|M_{-i}^{s-1}| \geq C_{-i}^{s-1}$). Hence, for any infinite sequence $\{(M_1^s, M_2^s)\}_{s=1}^{\infty}$ generated by GDS, and for s = 2, 3, . . . ,

$$|M_{-i}^{s-1}| \;\geq\; |M_i^{s}| \;\geq\; |M_{-i}^{s+1}| \;\geq\; \cdots,$$

where i denotes the player who updates in decision period s. This permits us to make the assumption, for the remainder of the proof, that the state space of GDS is finite. A consequence is that GDS can be transformed into a discrete time, finite state, Markov chain. Each state of the Markov chain has the form $((M_1, M_2), i)$ where i, the player index, is either 1 or 2. The transition probabilities of the Markov chain are then determined by Conditions r and R and by whether the decision period is an odd or even number. Using the terminology for Markov chains in Feller (1968), for a given stage game and initial condition $(M_i^1, M_{-i}^1)$ of GDS we define Ω to be the set of all pairs $(M_i, M_{-i})$ in GDS that can be reached in the Markov chain (for player 1 or 2) from the initial condition and are not transient in the Markov chain (for players 1 and 2). A theorem in Feller (p. 392) establishes that the states of any finite state Markov chain can be partitioned into a set of transient states and a finite number of irreducible sets of states. Ω then corresponds to the irreducible sets of the Markov chain that can be reached from the initial condition; the pairs in Ω do not carry the additional player index which the states of the Markov chain have. This implies that Ω is an absorbing set or a union of absorbing sets and that Ω is exactly the set of states of GDS that are played with positive probability in the limit. The periodic nature of GDS implies that the characterization of the limiting probabilities for a state $(M_1, M_2) \in \Omega$ can be given as $\limsup_{s \to \infty} \Pr((M_1^s, M_2^s) = (M_1, M_2)) > 0$. □

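The Feller decomposition invoked here is algorithmically elementary: the irreducible closed sets of a finite chain are the strongly connected components with no outgoing edges, and every other state is transient. The generic sketch below (a hypothetical helper, not the paper's construction) computes them from the support of the transition kernel via Kosaraju's two-pass depth-first search.

```python
from collections import defaultdict

def closed_classes(support):
    """support: dict mapping each state to the set of states reachable in one
    step with positive probability (every state must appear as a key).
    Returns the irreducible closed sets; states outside them are transient."""
    order, seen = [], set()

    def dfs(u, graph, out):
        # Iterative depth-first search recording states in post-order.
        stack = [(u, iter(graph[u]))]
        seen.add(u)
        while stack:
            node, it = stack[-1]
            advanced = False
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(graph[v])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                out.append(node)

    for u in support:                      # first pass: finishing order
        if u not in seen:
            dfs(u, support, order)
    reverse = defaultdict(set)             # transpose graph
    for u, vs in support.items():
        for v in vs:
            reverse[v].add(u)
    seen, comps = set(), []
    for u in reversed(order):              # second pass: extract the SCCs
        if u not in seen:
            comp = []
            dfs(u, reverse, comp)
            comps.append(set(comp))
    # An SCC is an irreducible closed set iff no transition leaves it.
    return [c for c in comps if all(support[u] <= c for u in c)]

# Toy chain: a, b are transient; {c, d} is the unique irreducible closed set.
chain = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'d'}, 'd': {'c'}}
print(closed_classes(chain))               # [{'c', 'd'}]
```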

Proof of Lemma 3. First we establish the property that for any infinite sequence $\{(M_1^s, M_2^s)\}_{s=1}^{\infty}$ generated by GDS there exists an S > 0 such that for all s, s' > S it is true that $|M_i^s| = |M_{-i}^s| = |M_i^{s'}| = |M_{-i}^{s'}|$. In the proof of Theorem 3 we established, for any infinite sequence $\{(M_1^s, M_2^s)\}_{s=1}^{\infty}$ generated by GDS, and for s = 2, 3, . . . , that

$$|M_{-i}^{s-1}| \;\geq\; |M_i^{s}| \;\geq\; |M_{-i}^{s+1}| \;\geq\; \cdots.$$

This is a bounded monotonic sequence. Since each term in the sequence, after the first term, can assume only one of a finite number of integers, there exists a tail of the sequence, after some decision period S, in which all terms are the same integer. Since $|M_1^{s+1}| = |M_1^s|$ if s is even, and $|M_2^{s+1}| = |M_2^s|$ if s is odd, it follows that $|M_1^{S+s}| = |M_2^{S+s}|$ for all s.

For the proof of Lemma 3(a), let $X_A$ be an absorbing set and suppose, for simplicity, that the initial condition of GDS is a member of the absorbing set, $(M_1^1, M_2^1) \in X_A$. Take any infinite sequence generated by GDS; then there is a tail of the sequence, beginning in some decision period S, that satisfies the previous convergence property. Choose an s > S; then $(M_1^s, M_2^s) = (M_1, M_2)$ for some $(M_1, M_2) \in X_A$, and the property on infinite sequences implies that $|M_1| = |M_2|$. We claim that there exists some infinite sequence generated by GDS in which (i) $(M_1^s, M_2^s) = (M_1, M_2)$, (ii) $(M_1^{s'}, M_2^{s'}) = (M_1^s, M_2^s)$ for some s' > s, and (iii) for each $(M_1', M_2') \in X_A$ there exists an s'' such that s' > s'' > s and $(M_1^{s''}, M_2^{s''}) = (M_1', M_2')$. Given that $(M_1^s, M_2^s) = (M_1, M_2)$, this claim follows from condition b in Definition 3; this condition says that there is a positive probability that any member of $X_A$ can be reached, in a finite number of decision periods, from any member of $X_A$. Since this is a positive probability event it will correspond to some infinite sequence generated by GDS.

Since $(M_1^{s'}, M_2^{s'}) = (M_1^s, M_2^s) = (M_1, M_2)$ we know that $|M_1^s| = |M_2^s|$ and $|M_1^{s'}| = |M_2^{s'}|$. This implies Lemma 3(a), because then $|M_i^s| = |M_{-i}^{s+1}| = \cdots = |M_{-j}^{s'-1}| = |M_j^{s'}|$, where j is either i or −i depending on whether s' is odd or even; and we already know that $|M_1^{s+1}| = |M_1^s|$ if s is even, and $|M_2^{s+1}| = |M_2^s|$ if s is odd.

For the proof of Lemma 3(b), since $|M_i^s| = |M_{-i}^{s+1}|$ it also follows that $|M_i^s| = C_i^s$ and $M_i^s \in I_i^s$. □

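For concreteness, the bounded monotone sequence step can be written out with illustrative machine sizes (the numbers are hypothetical, not from the paper):

```latex
% Suppose |M_2^1| = 3. Updating alternates, and each new selection is no
% larger than the previous one, so the size sequence might run
\[
\underbrace{|M_2^1|}_{3} \ \geq\ \underbrace{|M_1^2|}_{3} \ \geq\ \underbrace{|M_2^3|}_{2}
\ \geq\ \underbrace{|M_1^4|}_{2} \ \geq\ \underbrace{|M_2^5|}_{2} \ \geq\ \cdots
\]
% A non-increasing sequence of integers bounded below by 1 is eventually
% constant: past some decision period S both machines have the same size.
```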
Proof of Theorem 4. Assuming that there exists an absorbing set $X_A$ in k-state automata, k > 1, we derive a contradiction by showing that there is a positive probability that the players will choose strategies that have fewer than k states. Let $M_i^s \in B_i(M_{-i})$ be a selection when the system is in $X_A$ at decision period s. The minimal state best response property of i implies that the outcome path $\pi(M_i^s, M_{-i})$ either (a) consists only of the terms (D, D) and (C, C), or (b) consists only of the terms (C, D) and (D, C). We refer to best responses in which (a) occurs as cooperative solutions and to best responses in which (b) occurs as competitive solutions. The proof of Theorem 4 is divided into two cases based on whether the best response $M_i^s$ is a cooperative solution or a competitive solution. The properties of the transitions of $M_i^s$ in these two cases are further specified in Cases (a) and (b) below. Since Lemma 3(b) and inference rule r together imply that player −i will infer $M_i^s$ with positive probability, these properties can be used to analyze the selection in decision period s + 1.

Case (a). Assume $M_i^s$ is a k-state cooperative solution. The minimal state best response property of i implies that the set $B_i(M_{-i})$ has multiple elements. For states $q_D$ in which $f_i(q_D) = D$, it follows that $\tau_i(q_D, C)$ is a variable which takes a value in $Q = \{q^1, q^2, \ldots, q^k\}$, and for states $q_C$ in which $f_i(q_C) = C$, it follows that $\tau_i(q_C, D)$ is a variable which takes a value in Q. Hence, there is an $\bar{M}_i \in B_i(M_{-i})$ for which $\tau_i(q_C, D) = q_C$ and $\tau_i(q_D, C) = \tau_i(q_D, D)$. It will be shown that $|M_{-i}^{s+1}| < k$ when $M_i^s = \bar{M}_i$. There are other best responses and beliefs that will lead to a decision period s + 1 selection with fewer than k states, but it suffices to use this case. By Lemma 3(b) the true strategy will be in the minimal state inference set, and thus by Condition r the minimal state inference of player −i is the same as the true strategy, $\bar{M}_i$, with probability $p(\bar{M}_i; I_i^s)$. Since the system is not at a steady state, Condition R implies that a best response is chosen at random from the best response set instead of continuing with the selection from the previous decision period. By Condition R, $M_i^s = \bar{M}_i$ with probability 1/x. Then $M_{-i}^{s+1}$ is not a k-state solution, because the transitions of $\bar{M}_i$ imply that DEFECT is a best response to $\bar{M}_i$. By Lemma 3(a) there is no $(M_1, M_2) \in X_A$ for which $M_1$ or $M_2$ has fewer than k states. Thus, in any decision period s in which $M_i^s$ is a k-state cooperative solution there is a positive probability of exiting the candidate absorbing set.

Case (b). Assume $M_i^s$ is a k-state competitive solution. The set $B_i(M_{-i})$ has multiple elements. For states $q_D$ in which $f_i(q_D) = D$, it follows that $\tau_i(q_D, D)$ is a variable which takes a value in Q, and for states $q_C$ in which $f_i(q_C) = C$, it follows that $\tau_i(q_C, C)$ is a variable which takes a value in Q. Hence, there is an $\bar{M}_i \in B_i(M_{-i})$ for which $\tau_i(q_C, C) = q_C$ and $\tau_i(q_D, D) = \tau_i(q_D, C)$. Now it is shown that when $M_i^s = \bar{M}_i$ then with positive probability either $|M_{-i}^{s+1}| < k$ or $|M_i^{s+2}| < k$. By Condition r, the minimal state inference of player −i is the same as the true strategy, $\bar{M}_i$, with probability $p(\bar{M}_i; I_i^s)$, and by Condition R, $M_i^s = \bar{M}_i$ with probability 1/x. Depending on the stage game parameter g, the set of best responses to $\bar{M}_i$ is either DEFECT or the set of k-state cooperative solutions. If $M_{-i}^{s+1}$ is DEFECT then the system has exited what was assumed to be an absorbing set. If $M_{-i}^{s+1}$ is a k-state cooperative solution then Case (a) applies with respect to $M_i^{s+2}$, or it can be proved directly that $|M_i^{s+2}| < k$. By Lemma 3(a) there is no $(M_1, M_2) \in X_A$ for which $M_1$ or $M_2$ has fewer than k states. Thus, in any decision period s in which $M_i^s$ is a k-state competitive solution there is a positive probability of exiting the absorbing set.

It has been shown that in each decision period there is a positive probability of leaving the candidate absorbing set, a contradiction. □
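The key step in Case (a) can be checked on a concrete instance: take a two-state machine with the transition pattern of $\bar{M}_i$ (its cooperative state absorbs after a defection, and its defection state transitions identically after C and D). Against it, perpetual defection keeps the machine cooperating and extracts the stage-game maximum 1 + g every period, so the one-state DEFECT machine is a best response. The encoding follows the earlier sketches, and the specific machine below is our own example, not one constructed in the paper.

```python
def path(m1, m2, periods):
    # Outcome path for machines encoded as (n_states, initial, outputs, transitions).
    q1, q2, out = m1[1], m2[1], []
    for _ in range(periods):
        a1, a2 = m1[2][q1], m2[2][q2]
        out.append((a1, a2))
        q1, q2 = m1[3][(q1, a2)], m2[3][(q2, a1)]
    return out

DEFECT = (1, 0, ('D',), {(0, 'C'): 0, (0, 'D'): 0})
M_BAR = (2, 0, ('C', 'D'),
         {(0, 'C'): 1, (0, 'D'): 0,   # cooperative state: absorbs on a defection
          (1, 'C'): 0, (1, 'D'): 0})  # defection state: same transition after C and D

print(path(DEFECT, M_BAR, 5))
# [('D', 'C')] * 5: the defector earns the stage-game maximum 1 + g forever,
# so no machine of any size does better, and one state suffices.
```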


Proof of Theorem 5. Assuming there is an absorbing set $X_A$ in k-state automata, k > 1, we derive a contradiction. In Matching Pennies a player's (pure strategy) minimax payoff is 1 and equals the maximum payoff that can be obtained in the stage game. This implies that the payoff of any best response to a minimal state inference is 1. Let player 1 be the row player and let $I_1^s$ and $I_2^s$ be any minimal state inference sets that appear when the system is in $X_A$. For any $M_2 \in I_2^s$, any best response $B_1(M_2)$ only plays H for states q in which $f_2(q) = H$ and only plays T for states $q'$ in which $f_2(q') = T$. And, for any $M_1 \in I_1^s$, any best response $B_2(M_1)$ only plays T for states q in which $f_1(q) = H$ and only plays H for states $q'$ in which $f_1(q') = T$.

Suppose $M_i^{s+1}$ is a decision period s + 1 selection and is in the set $B_i(M_{-i})$ for some $M_{-i} \in I_{-i}^s$. It follows from the previously stated properties of a best response that when i = 1, $\tau_1(q, T)$ can be any state in $Q = \{q^1, q^2, \ldots, q^k\}$ for states q in which $f_2(q) = H$, and for states $q'$ in which $f_2(q') = T$, $\tau_1(q', H)$ can be any state in Q. When i = 2, $\tau_2(q, H)$ can be any state in Q for states q in which $f_1(q) = T$, and for states $q'$ in which $f_1(q') = H$, $\tau_2(q', T)$ can be any state in Q.

Under Condition R the existence of the variable transitions implies there is a positive probability p > 0 that $M_i^{s+1}$ is chosen so that $|B_{-i}(M_i^{s+1})| < k$. There are many possible combinations of the variable transitions for which this is true. For example, if $q^1$ is the initial state of $M_2^{s+1}$ and $f_2(q^1) = H$, then for any $M_2^{s+1}$ in which the variable transition $\tau_2(q^1, H)$ equals $q^1$ it follows that $|B_1(M_2^{s+1})| = 1$. By Lemma 3(b), $M_i^{s+1} \in I_i^{s+1}$. It follows that under Condition r in GDS, $|M_{-i}^{s+2}| \leq |B_{-i}(M_i^{s+1})| < k$. This establishes that there is a positive probability that $(M_1^s, M_2^s) \notin X_A$ for some s, contradicting the assumption that $X_A$ is absorbing. □

References

Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibria in repeated games with finite automata. Econometrica 56, 1259–1282.
Banks, J., Sundaram, R., 1990. Repeated games, finite automata, and complexity. Games Econ. Behav. 2, 97–117.
Binmore, K., Samuelson, L., 1992. Evolutionary stability in repeated games played by finite automata. J. Econ. Theory 57, 278–305.
Birkhoff, G., Bartee, T., 1970. Modern Applied Algebra. McGraw–Hill, New York.
Chatterjee, K., Sabourian, H., 2000. Multiperson bargaining and strategic complexity. Econometrica 68, 1491–1509.
Eliaz, K., 2003. Nash equilibrium when players account for the complexity of their forecasts. Games Econ. Behav. 44, 286–310.
Feller, W., 1968. An Introduction to Probability Theory and Its Applications, vol. 1, third ed. Wiley, New York.
Fudenberg, D., Levine, D., 1993. Self-confirming equilibrium. Econometrica 61, 523–545.
Fudenberg, D., Maskin, E., 1986. The Folk Theorem in repeated games with discounting or with incomplete information. Econometrica 54, 533–554.
Jéhiel, P., 1995. Limited horizon forecast in repeated alternate games. J. Econ. Theory 67, 497–519.
Jéhiel, P., 1998. Learning to play limited forecast equilibria. Games Econ. Behav. 22, 274–298.
Jéhiel, P., 2001. Limited foresight may force cooperation. Rev. Econ. Stud. 68, 369–391.
Rubinstein, A., 1986. Finite automata play the repeated prisoner's dilemma. J. Econ. Theory 39, 83–96.
Sabourian, H., 2003. Bargaining and markets: Complexity and the Walrasian outcome. J. Econ. Theory 116, 189–228.
Spiegler, R., 2002. Equilibrium in justifiable strategies: A model of reason-based choice in extensive-form games. Rev. Econ. Stud. 69, 691–706.
Spiegler, R., 2004. Simplicity of beliefs and delay tactics in a concession game. Games Econ. Behav. 47, 200–220.
Volij, O., 2002. In defense of DEFECT. Games Econ. Behav. 39, 309–321.