Games and Economic Behavior 86 (2014) 58–66


A folk theorem for stochastic games with private almost-perfect monitoring ✩

Katsuhiko Aiba
Institute for Microeconomics, University of Bonn, Adenauerallee 24-42, 53113 Bonn, Germany

Article info
Article history: Received 10 November 2011; available online 26 March 2014.
JEL classification: C72; C73; D82.
Keywords: Stochastic games; Private monitoring; Folk theorem.

Abstract
We prove a folk theorem for stochastic games with private, almost-perfect monitoring and observable states when the limit set of feasible and individually rational payoffs is independent of the state. This asymptotic state independence holds, for example, for irreducible stochastic games. Our result establishes that the sophisticated construction of Hörner and Olszewski (2006) for repeated games can be adapted to stochastic games, reinforcing our conviction that much knowledge and intuition about repeated games carries over to the analysis of irreducible stochastic games. © 2014 Elsevier Inc. All rights reserved.

1. Introduction

The class of stochastic games includes models in which persistent shocks, stock variables representing human or natural resources, technological innovations, or capital play an important role. Stochastic games are extensively used in economics since they capture dynamic interactions in rich and changing environments.

It is sometimes natural to assume that players cannot observe others' actions in these dynamic interactions. For example, consider an oligopolistic market in which firms set prices with their customers bilaterally in each period (Stigler, 1964). The firms cannot observe competitors' price offers, but they obtain some information about these from their own sales, which are unobservable to competitors. This is an example of imperfect private monitoring, where players cannot directly observe others' actions but receive some private signals, which are imperfect indicators of the action taken in the current period. Such imperfect monitoring is extensively studied in repeated games, which one can view as stochastic games whose state variable is fixed.

In this paper, we prove a folk theorem for stochastic games with private almost-perfect monitoring and observable states under the assumption that the limit set of feasible and individually rational payoffs is independent of the state (we call the assumption asymptotic state independence). In general, when we analyze a repeated game with private monitoring in which there is no available public signal for players to coordinate on, there is no obvious recursive structure available, because players' beliefs about opponents' histories become increasingly complex as time proceeds. To avoid these obstacles, the literature has focused on belief-free equilibria, introduced by Piccione (2002), and simplified and extended by Ely and Välimäki (2002) and Ely et al. (2005).

✩ I am grateful to Johannes Hörner, Antonio Penta, Marzena Rostek, Bill Sandholm, Tadashi Sekiguchi, Ricardo Serrano-Padial, Lones Smith, referees, and seminar audiences at the Midwest Economic Theory Meeting Fall 2010 and University of Wisconsin–Madison for valuable comments and discussions. I also thank Peter Wagner for proofreading. E-mail address: [email protected].



In this class of equilibria, any continuation play is optimal whatever private histories other players might have observed, so that players need not form beliefs about opponents' histories in order to compute their optimal behaviors. For illustration, suppose that John and Susie repeatedly play the following prisoners' dilemma.

          C        D
    C   2, 2    −1, 3
    D   3, −1    0, 0
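As a concrete illustration of the indifference argument sketched in the next paragraph, the following minimal computation derives the probabilities with which Susie should continue in the cooperative regime tomorrow. It assumes perfect monitoring, a common discount factor δ > 1/3, and illustrative target continuation values of 2 and 0 for the cooperative and punishment regimes; none of these specific numbers come from the paper.

```python
# Belief-free indifference in the prisoners' dilemma (illustrative numbers only).
# Stage payoffs for John when (John, Susie) play (a, b):
u = {('C', 'C'): 2, ('C', 'D'): -1, ('D', 'C'): 3, ('D', 'D'): 0}

delta = 0.8          # common discount factor (must exceed 1/3 for these targets)
v_bar, v_low = 2, 0  # John's continuation value in Susie's reward / punish regime

def continue_prob(target, stage_payoff):
    """Probability that Susie is in the reward regime tomorrow, chosen so that
    (1 - delta) * stage_payoff + delta * (p * v_bar + (1 - p) * v_low) = target."""
    return (target - (1 - delta) * stage_payoff - delta * v_low) / (delta * (v_bar - v_low))

# Susie in the reward regime (she plays C today): John's total payoff must equal v_bar.
alpha = {a: continue_prob(v_bar, u[(a, 'C')]) for a in ('C', 'D')}
# Susie in the punish regime (she plays D today): John's total payoff must equal v_low.
beta = {a: continue_prob(v_low, u[(a, 'D')]) for a in ('C', 'D')}

print(alpha)  # roughly {'C': 1.0, 'D': 0.875}: John is indifferent between C and D
print(beta)   # roughly {'C': 0.125, 'D': 0.0}

# Verify the indifference: John's payoff is v_bar (resp. v_low) whatever he plays today.
for a in ('C', 'D'):
    assert abs((1 - delta) * u[(a, 'C')] + delta * (alpha[a] * v_bar + (1 - alpha[a]) * v_low) - v_bar) < 1e-12
    assert abs((1 - delta) * u[(a, 'D')] + delta * (beta[a] * v_bar + (1 - beta[a]) * v_low) - v_low) < 1e-12
```

Under almost-perfect private monitoring these probabilities would be conditioned on Susie's noisy signal rather than on John's actual action, which perturbs them only slightly.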

Note that Susie can ensure that John receives at least 2 when she plays C, and that he receives at most 0 when she plays D. Then John might have an incentive to play C today if Susie is more likely to play C tomorrow when she receives a sufficiently informative signal that John played C today. By suitably choosing the probability with which she plays C tomorrow as a function of her private information today, Susie can ensure that John is indifferent between C and D. In turn, because he is indifferent between C and D, it is optimal for him to condition his play tomorrow on his private information so as to make Susie indifferent between C and D. Strategy profiles of this sort constitute sequential equilibria with the belief-free property. Ely and Välimäki (2002) use this belief-free approach to prove the folk theorem in repeated prisoners' dilemmas. But Ely et al. (2005) show that Ely and Välimäki's construction does not extend to general stage games.

Hörner and Olszewski (2006) (hereafter HO) exploit the essential feature of belief-free equilibrium to prove the folk theorem for general stage games. To do so, they divide the repeated game into a sequence of T-period block games. Provided that T is sufficiently large, the payoff structure of the prisoners' dilemma can be recovered from any stage game, using two T-period block strategies for each player: a "good" strategy and a "bad" strategy, which correspond to C and D in the prisoners' dilemma above. John might prefer to play the good strategy if Susie is more likely to play her good strategy in the next block when she observes a sufficiently informative signal that John played his good strategy in the current block. By suitably choosing the probability with which she plays the good strategy in the next block as a function of her current block's private histories, Susie can ensure that John is indifferent between the good and bad strategies at the beginning of each block regardless of what she observed before, and can also ensure the sequential rationality of his strategy during each block. It is the latter requirement that creates additional complications in HO's construction relative to the construction of belief-free equilibrium.

1.1. Our construction

The heart of this paper lies in the adaptation of HO's construction from repeated games to stochastic games. To begin, we divide the entire stochastic game into consecutive T-period block games, for which we define a "good" strategy and a "bad" strategy. Then, we introduce L-period Markov strategies that are repeatedly played within the block. It is well known (Blackwell, 1965) that any extreme point of the set of feasible payoffs in a stochastic game is achieved by a pure Markov strategy profile. Hence, any targeted payoff profile in the set is approximately achieved by repeatedly playing the L-period Markov strategy profile that consists of playing a finite sequence of pure Markov strategy profiles.¹ Although strategies' payoffs in stochastic games can be heavily dependent on the initial state, it is shown that the payoffs generated by the L-period Markov strategies depend little on the initial and final state during the L-period game when asymptotic state independence holds.² Hence, an L-period game in which these L-period Markov strategies are used is an "almost invariant stage game", and the repeated play of these almost invariant stage games can be regarded as an "almost repeated game".
By using these L-period strategies of the "almost repeated game", we are able to construct "good" and "bad" strategies as in HO. We ensure that these strategies exhibit the belief-free property at the beginning of each block, and sequential rationality during each block, by carefully defining the transition probabilities between the good and bad strategies as one block ends and the next begins.

Another minor difficulty presented by the stochastic game environment is that altering current actions affects not only the distribution of private signals, but also the distribution of future states. This implies that a player may have an incentive to deviate solely to ensure an advantageous distribution of future states. For example, if, during a play of a strategy profile, a player's continuation payoffs are higher from state ω^{t+1} than from state ω̃^{t+1}, then in period t the player has an incentive to choose an action that makes ω^{t+1} more likely. To contend with this issue, one can ensure that, when an opponent's observations suggest that a player has chosen this kind of advantageous action, the probability that the opponent plays the bad strategy in the next block goes up. By carefully conditioning the transition probabilities on the realized states, we are able to maintain both the belief-free property and the within-block sequential rationality.

The folk theorem for stochastic games has already been proved under both perfect monitoring and imperfect public monitoring. In the case of perfect monitoring, Dutta (1995) proved the folk theorem for stochastic games when the asymptotic state independence holds.

1 More specifically, any payoff profile v in the set of feasible payoffs can be written as a convex combination v = Σ_k λ_k v^k, with an extreme point v^k being achieved by a pure Markov strategy profile g^k. Then, by choosing L_k and L such that L_k/L ≈ λ_k and L = Σ_k L_k, consider the following L-period Markov strategy profile: play g^1 for L_1 periods, followed by g^2 for L_2 periods, and so on. Now v is approximately attained by repeatedly playing this L-period Markov strategy profile.
2 The proof follows from some results in Dutta (1995).
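A minimal sketch of the block-length construction in footnote 1; the largest-remainder rounding rule is an arbitrary choice for illustration and is not specified in the paper.

```python
# Approximate a convex combination sum_k lambda_k * v^k by playing the pure Markov
# profile g^k for L_k periods, with L_k / L ~= lambda_k and sum_k L_k = L.
def block_lengths(weights, L):
    """Split L periods into integer blocks roughly proportional to the given weights."""
    raw = [w * L for w in weights]
    lengths = [int(r) for r in raw]
    # distribute the leftover periods to the largest fractional parts
    remainder = L - sum(lengths)
    order = sorted(range(len(raw)), key=lambda k: raw[k] - lengths[k], reverse=True)
    for k in order[:remainder]:
        lengths[k] += 1
    return lengths

weights = [0.5, 0.3, 0.2]      # the convex weights lambda_k (invented)
L = 12
Ls = block_lengths(weights, L)
print(Ls, sum(Ls))             # [6, 4, 2] 12  -> L_k / L approximates lambda_k
print([Lk / L for Lk in Ls])   # [0.5, 0.333..., 0.166...]
```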


More recently, Hörner et al. (2011) and Fudenberg and Yamamoto (2011) independently proved the folk theorem for stochastic games with imperfect public monitoring under a similar assumption.

Section 2 describes the model and result. In Section 3 we provide a sketch of the proof for the two-player case and describe the adaptation of HO's construction from repeated games to stochastic games. Finally, the paper is concluded in Section 4. All the details of the proof and the cases with more than two players are left to the supplementary material that is available online.

2. Model and result

This paper considers n-player stochastic games with private monitoring, where the stage game is played in each period over an infinite horizon.

2.1. The stochastic game

The stage game in each period proceeds as follows:

(i) Each player i publicly observes the state ω from the finite set Ω.
(ii) Each player i independently chooses an action a_i from a finite set A_i.
(iii) Each player i receives a stochastic private signal σ_i from a finite set Σ_i.
(iv) The game moves stochastically to next period's state ω′ ∈ Ω.

The action a_i and the signal σ_i are private information that only player i can observe. The transition probability from the current state ω to next period's state ω′ is denoted by p(ω′ | ω, a), which depends on the current action profile a = (a_1, …, a_n). A profile σ = (σ_1, …, σ_n) of private signals is drawn with probability m(σ | ω, a) given the current state ω and action profile a. Let m_i(σ_i | ω, a) denote the marginal probability that player i observes the signal σ_i. After choosing a_i and observing σ_i, each player i collects a realized payoff ũ_i(ω, a_i, σ_i), so that the ex-ante payoff in the stage game is given by

\[ u_i(\omega, a) = \sum_{\sigma_i \in \Sigma_i} \tilde u_i(\omega, a_i, \sigma_i)\, m_i(\sigma_i \mid \omega, a). \]

The payoff function u_i is naturally extended from pure action profiles a ∈ A := A_1 × ⋯ × A_n to mixed action profiles α ∈ ΔA := ΔA_1 × ⋯ × ΔA_n.

In this paper we study the case in which private signals are sufficiently informative that cooperation is possible. For this purpose, and to simplify the analysis, we assume the following about the monitoring technology.

Assumption 2.1 (Canonical signal space). Σ_i = A_{−i} for each i.

Assumption 2.2 (ε-Perfect monitoring). For each ε ≥ 0, each player i, each state ω ∈ Ω, and each action profile a ∈ A, m_i(σ_i = a_{−i} | ω, a) ≥ 1 − ε.

Assumption 2.3 (Full support monitoring). m(σ | ω, a) > 0 for all ω ∈ Ω, a ∈ A, and σ ∈ Σ.
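A minimal sketch of the stage-game primitives under Assumptions 2.1–2.3, using an invented two-state, two-action example; it computes the ex-ante payoff u_i(ω, a) from the realized payoffs and the marginal signal distribution and checks ε-perfect monitoring. All names and numbers below are hypothetical.

```python
import itertools

states  = ['w0', 'w1']
actions = ['C', 'D']          # the same action set for both players
eps     = 0.05                # monitoring noise

# Assumption 2.1 (canonical signals): player i's signal space is the opponent's action set.
# Marginal monitoring distribution m_i(sigma_i | w, a): the correct signal a_{-i} is observed
# with probability 1 - eps (Assumption 2.2); errors have full support (Assumption 2.3).
# (Here the noise does not depend on the state or on own action, a simplification.)
def m_i(sigma_i, w, a_i, a_minus_i):
    return 1 - eps if sigma_i == a_minus_i else eps / (len(actions) - 1)

# Realized payoff u~_i(w, a_i, sigma_i): invented numbers with a prisoners'-dilemma flavour.
def u_tilde(w, a_i, sigma_i):
    table = {('C', 'C'): 2, ('C', 'D'): -1, ('D', 'C'): 3, ('D', 'D'): 0}
    return table[(a_i, sigma_i)] + (1 if w == 'w1' else 0)

# Ex-ante stage payoff u_i(w, a) = sum_{sigma_i} u~_i(w, a_i, sigma_i) m_i(sigma_i | w, a).
def u(w, a_i, a_minus_i):
    return sum(u_tilde(w, a_i, s) * m_i(s, w, a_i, a_minus_i) for s in actions)

for w, a_i, a_j in itertools.product(states, actions, actions):
    assert m_i(a_j, w, a_i, a_j) >= 1 - eps          # eps-perfect monitoring
    print(w, (a_i, a_j), round(u(w, a_i, a_j), 3))
```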

Note that the random variables ω′ and σ are conditionally independent given (ω, a), so that players do not update their beliefs about their opponents' signals after publicly observing the state ω′. Moreover, as ε → 0, the monitoring distribution m changes but the state transition probability p remains fixed.

We denote by ω^t, a_i^t, and σ_i^t the realized state, player i's realized action, and player i's observed signal in period t, respectively. Player i's private history up to and including period t is then h_i^t := (ω^1, a_i^1, σ_i^1, …, ω^t, a_i^t, σ_i^t) ∈ H_i^t := (Ω × A_i × Σ_i)^t. Let h_i^0 = ∅ denote the null history before the initial state is drawn. A (behavior) strategy for player i is a mapping s_i : H_i × Ω → ΔA_i, where H_i = ∪_{t=0}^∞ H_i^t, that is, a mapping taking a past private history h_i^{t−1} and a current realized state ω^t to a mixed action. We denote by S_i the set of all strategies for player i. Note that S_i includes the (stationary) Markov strategies, which are functions of the current state only.

Given a strategy profile s = (s_1, …, s_n) ∈ S := S_1 × ⋯ × S_n and an initial state ω, let U_i(ω, s; δ) denote the discounted average payoff for player i with discount factor δ, and let U_i(ω^t, s | h_i^{t−1}; δ) denote the expected continuation payoff after observing (h_i^{t−1}, ω^t), given player i's consistent belief about opponents' private histories h_{−i}^{t−1}. Then a strategy profile s is a sequential equilibrium³ of the stochastic game if no player has a profitable deviation after any private history, that is, U_i(ω^t, s | h_i^{t−1}; δ) ≥ U_i(ω^t, s_i′, s_{−i} | h_i^{t−1}; δ) for all (h_i^{t−1}, ω^t) and all strategies s_i′ of every player i.
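For a stationary Markov strategy profile, the discounted average payoff U_i(ω, s; δ) defined above can be evaluated by solving the linear system V = (1 − δ) u_s + δ P_s V. A minimal sketch with invented primitives follows; the arrays p and u are placeholders, not data from the paper.

```python
import numpy as np

# Invented two-state example: states 0, 1; each player chooses action 0 or 1.
# p[w, a1, a2] is the distribution over next states; u[i, w, a1, a2] is player i's stage payoff.
p = np.zeros((2, 2, 2, 2))
p[0, :, :, :] = [[[0.9, 0.1], [0.5, 0.5]], [[0.5, 0.5], [0.2, 0.8]]]
p[1, :, :, :] = [[[0.3, 0.7], [0.6, 0.4]], [[0.6, 0.4], [0.8, 0.2]]]
u = np.array([[[[2, -1], [3, 0]], [[3, 0], [4, 1]]],      # player 1
              [[[2,  3], [-1, 0]], [[3, 4], [0, 1]]]])    # player 2

def discounted_average(policy, i, delta):
    """Value V_i(w) of a pure stationary Markov profile; policy[w] = (a1, a2)."""
    P = np.array([p[w][policy[w]] for w in range(2)])      # induced state transition matrix
    r = np.array([u[i][w][policy[w]] for w in range(2)])   # induced stage payoffs
    return np.linalg.solve(np.eye(2) - delta * P, (1 - delta) * r)   # V = (1-d) r + d P V

policy = {0: (0, 0), 1: (1, 1)}   # play (0,0) in state 0 and (1,1) in state 1
print(discounted_average(policy, i=0, delta=0.95))
```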

2.2. Feasible and individually rational payoffs

The set of feasible payoffs for repeated games is defined in terms of the stage game payoffs, since any feasible payoff is a convex combination of the extreme points attained by repeatedly playing a constant action in the stage game. But the stage game can change from period to period in stochastic games.

3 In our finite setting, a stationary Markov perfect equilibrium always exists (Sobel, 1971, Theorem 1). Hence, the existence of a sequential equilibrium is guaranteed, since a Markov perfect equilibrium is also a sequential equilibrium.


Hence, the set of feasible payoffs is defined in terms of discounted average expected payoffs as V(ω; δ) = co{(U_1(ω, s; δ), …, U_n(ω, s; δ)) for some s ∈ S}, given the initial state ω and discount factor δ. Note that, unlike in repeated games, V(ω; δ) varies with the discount factor and the initial state. Since the folk theorem is an asymptotic result, it is natural to focus on the limit set of feasible discounted average payoffs V(ω; δ) as δ approaches 1. Dutta (1995) shows that this limit can be expressed relatively simply as the set V(ω) of the feasible limit-average payoffs⁴:

Lemma 2.4. As δ → 1, V(ω; δ) → V(ω) in the Hausdorff metric for all states ω.

We now define the set of individually rational payoffs. In the stochastic game with initial state ω, player i's min–max payoff is given by v*_i(ω; δ) = min_{s_{−i}∈S_{−i}} max_{s_i∈S_i} U_i(ω, s_i, s_{−i}; δ). From Neyman and Sorin (2003), for discount factors close to 1, the min–max payoff v*_i(ω; δ) is approximately equal to the limit-average min–max payoff v*_i(ω)⁵:

Lemma 2.5. As δ → 1, v*_i(ω; δ) → v*_i(ω) for all states ω and players i.

We call a payoff vector v = (v_1, …, v_n) individually rational in terms of the discounted average criterion if v_i > v*_i(ω; δ), and individually rational in terms of the limit-average criterion if v_i > v*_i(ω).

Stochastic games can generate incentive problems that are absent from repeated games. For example, suppose that there is an absorbing state. Then a low individually rational payoff for player i starting from another state might not be supportable as an equilibrium if player i can guide play to the absorbing state in which her payoffs are high. Alternatively, suppose that there is no absorbing state, but that player i can force play to remain in a certain state ω′. Again, a low individually rational payoff for player i starting from another state might not be supportable as an equilibrium, since player i might deviate in a way that leads play to state ω′, and then earn high payoffs by maintaining state ω′ forever. In order to rule out these possibilities, which would make the folk theorem fail, the following assumptions are made.

Assumption 2.6 (Asymptotic state independence). We assume:
(i) The set of limit-average feasible payoffs V(ω) is independent of the initial state ω.
(ii) The limit-average min–max payoff v*_i(ω) is independent of the initial state ω for all i.

This state independence assumption is satisfied for irreducible stochastic games in which, for all players i and all pairs of states ω and ω′, there is some finite sequence of action profiles of players −i that moves the state from ω to ω′ with positive probability, independent of player i's actions.⁶ Examples of irreducible stochastic games are very common in economics; see, e.g., Rotemberg and Saloner (1986) and Besanko et al. (2010). By Assumption 2.6 we can let V(ω) = V and v*_i(ω) = v*_i for all states ω. We write V*(ω; δ) and V* for the interior of the set of feasible and individually rational payoffs in terms of the discounted average criterion and the limit-average criterion, respectively:





\[ V^*(\omega; \delta) = \operatorname{int}\bigl\{\, v \in V(\omega; \delta) \mid v_i > v_i^*(\omega; \delta)\ \forall i \,\bigr\} \qquad\text{and}\qquad V^* = \operatorname{int}\bigl\{\, v \in V \mid v_i > v_i^*\ \forall i \,\bigr\}. \]

It follows from Lemmas 2.4 and 2.5 that, as δ goes to 1, V*(ω; δ) converges to V* for all states ω.

4 The limit-average payoff for initial state ω, strategy profile s, and time sequence {T_k}_{k=1}^∞ with 0 < T_1 < T_2 < T_3 < ⋯ is defined as
\[ U_i\bigl(\omega, s, \{T_k\}_{k=1}^{\infty}\bigr) = \lim_{T_k \to \infty} E_{\omega, s}\Bigl[\tfrac{1}{T_k} \textstyle\sum_{t=1}^{T_k} u_i\bigl(\omega^t, a^t\bigr)\Bigr], \]
provided that the limit exists for all players i. We denote by T(ω, s) the set of sequences {T_k}_{k=1}^∞ such that the limit of the averages exists for initial state ω and strategy profile s. Then V(ω) = co{(U_1(ω, s, {T_k}_{k=1}^∞), …, U_n(ω, s, {T_k}_{k=1}^∞)) for some s ∈ S and {T_k}_{k=1}^∞ ∈ T(ω, s)}.
5 The precise definition is v*_i(ω) = min_{s_{−i}∈S_{−i}} max_{s_i∈S_i} inf_{{T_k}_{k=1}^∞ ∈ T(ω, s_i, s_{−i})} U_i(ω, s_i, s_{−i}, {T_k}_{k=1}^∞).
6 The intuition why irreducibility implies Assumption 2.6 is as follows. Given the finiteness of the state space, the assumption of irreducibility ensures that the probability of a transition from any initial state to any other state in finite time is one. Since payoffs during a finite number of periods are negligible in terms of the limit-average criterion, the payoff attained from any state can also be attained from any other state.

2.3. Main result

While in general the set of equilibria of a stochastic game with imperfect monitoring depends on the exact specification of the monitoring structure, our main conclusion does not. We denote by E(ω, δ, ε) the set of discounted average payoff vectors in the stochastic game with initial state ω that are sequential equilibrium payoff vectors for all ε-perfect monitoring structures. Our main result is:

Theorem 2.7 (The folk theorem). For any v ∈ V* and any ε-perfect monitoring structure, there exists a sequential equilibrium in which player i obtains discounted average payoff v_i, so long as the discount factor δ is sufficiently close to 1 and the noise level ε is sufficiently close to 0. That is,


\[ \forall v \in V^*,\ \exists\, \underline\delta < 1,\ \bar\varepsilon > 0,\ \ \forall \delta \in (\underline\delta, 1),\ \forall \varepsilon \in (0, \bar\varepsilon),\ \forall \omega \in \Omega, \quad v \in E(\omega, \delta, \varepsilon). \]

Theorem 2.7 shows that in stochastic games any payoff in V* can be supported in a sequential equilibrium if players are sufficiently patient and receive noisy but highly precise private information about opponents' behavior. In particular, sufficiently patient players are able to achieve approximately Pareto-efficient payoffs.

The remainder of the paper is devoted to the sketch of the proof of Theorem 2.7 for the two-player case. The reader may refer to the Online Appendix for the details of the proof and a discussion of the cases with more than two players.

3. Proof for two players

In this section, we sketch the proof of Theorem 2.7 for the two-player case. Before that, we briefly review HO's argument for repeated games with private monitoring. HO divide the repeated game into consecutive T-period blocks. They construct two T-period strategies s_i^B, s_i^G such that, in equilibrium, player i is indifferent between the two strategies and weakly prefers them to all others, no matter what each player's private history before the current block is, if player −i uses either of the two strategies s_{−i}^B, s_{−i}^G. This can be done by a suitable choice of the probability with which player −i uses one of the two strategies within the current block (the transition probability) as a function of her recent history and of her recent strategy within the previous block. This notable idea makes sure that beliefs about private histories are irrelevant at the beginning of each block and that the strategies s_i^B, s_i^G depend only on recent histories within the current block. This approach makes the analysis tractable and actually enables them to prove the folk theorem in the repeated game with a general stage game under almost-perfect monitoring. Note that when T = 1, s_i^B = D, and s_i^G = C, this argument reduces to a belief-free equilibrium in the repeated prisoners' dilemma of Ely and Välimäki (2002). Hence, HO's idea is an extension of the belief-free concept.

In the stochastic game considered in this paper, we modify the above construction in two respects. First, HO use constant or deterministic sequences of stage game actions within a block to achieve payoffs surrounding a target payoff, and they derive the strategies s_i^B, s_i^G from those. With stochastic games, the stage game payoffs can be very different depending on the state. Therefore, we introduce an L-period stochastic game using L-period Markov strategies, whose payoff matrices are almost identical regardless of the initial state. Hence this L-period stochastic game can be regarded as an "almost invariant stage game" and its repeated play as an "almost repeated game". Then we use a finite sequence of these L-period Markov strategies within a block to approximate payoffs surrounding a target payoff and derive the strategies s_i^B, s_i^G from those. During the construction we also use a finite sequence of L-period (not necessarily Markov) strategies to approximate min–max payoffs. Second, in stochastic games current actions affect both the distribution of private signals and the distribution of future states. This implies that a player may deviate to make the state distribution advantageous for her. For example, suppose that, during a play of a strategy profile, the continuation payoffs are higher for player i from state ω^{t+1} than from state ω̃^{t+1}. Then player i has an incentive in period t to choose a current action which makes ω^{t+1} more likely. Hence, if we make sure that, given a state, player −i's observations suggesting that player i is choosing this kind of advantageous action are more likely to trigger a punishment (bad) strategy in the next block, then we will be able to make player i indifferent between s_i^B and s_i^G, as desired. We accomplish this by carefully specifying the above transition probability as a function not only of private signals but also of the realized states.

In what follows, we will consider the finite T-period (or L-period) stochastic game. The corresponding history sets and strategy sets are denoted H_i^T and S_i^T, and t(T) refers to a time period in the T-period stochastic game. The discounted average payoff in this game with initial state ω is denoted U_i^T(ω, s; δ), where s ∈ S^T := S_1^T × S_2^T. Moreover, we will consider the finite T-period stochastic game with payoffs augmented by a transfer π_i : H_{−i}^T → R at the end of the last period. Player i's payoff in this auxiliary scenario is defined as U_i^A(ω, s, π_i; δ) := U_i^T(ω, s; δ) + (1 − δ)δ^T E{π_i | ω, s}.

3.1. Target payoff and "almost invariant stage games"

We fix a target payoff vector v ∈ V* throughout. Consider payoff vectors {w^{XY}}_{X,Y∈{G,B}} surrounding v such that

\[ w_i^{GG} > v_i > w_i^{BB}, \quad i = 1, 2, \qquad\text{and}\qquad w_1^{GB} > v_1 > w_1^{BG}, \quad w_2^{BG} > v_2 > w_2^{GB}. \]

See Fig. 1. Therefore, there exist v̲_i and v̄_i such that v*_i < v̲_i < v_i < v̄_i and
\[ [\underline v_1, \bar v_1] \times [\underline v_2, \bar v_2] \subset \operatorname{int} \operatorname{co}\bigl\{ w^{GG}, w^{GB}, w^{BG}, w^{BB} \bigr\}. \]
In the following lemma, under asymptotic state independence we construct finite L-period strategies with payoffs arbitrarily close to w^{XY} or to the min–max payoffs v* in order to generate an "almost invariant stage game".

Lemma 3.1. Consider the perfect monitoring case, i.e., ε = 0. Then, for any η > 0, there exist L̄ and δ̲ such that for every L ≥ L̄ and δ ≥ δ̲, the following hold:


Fig. 1. Payoffs.

(i) For each XY ∈ {G, B}², there exists an L-period pure Markov strategy g^{XY} such that
\[ \bigl| U_i^L\bigl(\omega, g^{XY}; \delta\bigr) - w_i^{XY} \bigr| < \eta \qquad \forall \omega \in \Omega,\ \forall i; \]
(ii) For each player i, there exists an L-period strategy g_{−i}^{i} for player −i such that, for any L-period strategy s_i of player i,
\[ U_i^L\bigl(\omega, s_i, g_{-i}^{i}; \delta\bigr) < v_i^* + \eta \qquad \forall \omega \in \Omega. \]

The lemma tells us that the L-period payoff matrix generated by the L-period Markov strategy profiles {g^{GG}, g^{GB}, g^{BG}, g^{BB}} is arbitrarily close to the payoff matrix {w^{GG}, w^{GB}, w^{BG}, w^{BB}}, whatever the initial state is. So an almost repeated game consists of the infinite repetition of these L-period stochastic games, which can be regarded as almost invariant stage games. The strategy g_i^{−i} is used as the punishment for player −i in the almost invariant stage game. In the following we apply HO's technique to our almost repeated game, accounting for the effect of actions on state transitions.
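A minimal sketch of the geometric requirement behind Fig. 1 and the conditions preceding Lemma 3.1: the invented payoff vectors w^{XY} below satisfy the ordering conditions, and each corner of the box [v̲_1, v̄_1] × [v̲_2, v̄_2] is verified to lie in co{w^{GG}, w^{GB}, w^{BG}, w^{BB}} by a small feasibility LP. Strict interiority is not checked, the numbers are illustrative, and scipy is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

# Invented block payoffs w^{XY} surrounding a target v (cf. Fig. 1).
v = (1.5, 1.5)
w = {'GG': (2.5, 2.5), 'GB': (3.0, 0.5), 'BG': (0.5, 3.0), 'BB': (0.5, 0.5)}
v_lo, v_hi = (1.0, 1.0), (2.0, 2.0)       # the box [v_1lo, v_1hi] x [v_2lo, v_2hi]

# Ordering conditions: w_i^GG > v_i > w_i^BB, w_1^GB > v_1 > w_1^BG, w_2^BG > v_2 > w_2^GB.
assert all(w['GG'][i] > v[i] > w['BB'][i] for i in (0, 1))
assert w['GB'][0] > v[0] > w['BG'][0] and w['BG'][1] > v[1] > w['GB'][1]

def in_hull(point, vertices):
    """Feasibility LP: is `point` a convex combination of `vertices`?"""
    A_eq = np.vstack([np.array(vertices).T, np.ones(len(vertices))])
    b_eq = np.append(point, 1.0)
    return linprog(c=np.zeros(len(vertices)), A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).success

corners = [(x, y) for x in (v_lo[0], v_hi[0]) for y in (v_lo[1], v_hi[1])]
print(all(in_hull(c, list(w.values())) for c in corners))   # True: the box lies in co{w^XY}
```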

3.2. Block games, and "good" and "bad" strategies

Now we divide the stochastic game into consecutive T-period stochastic games, each of which we call a block game. Take T such that T − 1 is a multiple of L. The block game consists of the following two rounds.

(i) Coordination Round (t = 1): Let G and B be a partition of A_i. Player i sends the good message M_i = G if she chooses an action from G; otherwise player i sends the bad message M_i = B.
(ii) Main Round (t = 2, …, T): If player i observes a message profile (M_1, M_2) ∈ {G, B}² in the coordination round, he plays the L-period pure Markov strategy g_i^{M_2 M_1} repeatedly. However, if player i sent the bad message and observed a signal indicating that player −i unilaterally deviated from g_{−i}^{M_2 M_1}, then player i plays the L-period min–max strategy g_i^{−i} repeatedly in the remaining periods of the block game.

In this description of the block game, we denote by s_i^G ∈ S_i^T (s_i^B ∈ S_i^T) the good (bad) strategy in which player i sends a good (bad) message in the coordination round and follows the prescribed Markov strategy g_i^{M_2 M_1} in the main round.

Consider the perfect monitoring case⁷ with ε = 0. Observe that if player −i uses strategy s_{−i}^B, no matter what player i does in the block game, she earns strictly less than v̲_i in every L-period interval of the main round except at most one interval in which player i deviates from g_i^{M_2 M_1}. Hence, if T is sufficiently large, U_i^T(ω, s_i, s_{−i}^B; δ) in the block game cannot exceed v̲_i for any s_i ∈ S_i^T and any initial state ω. Similarly, if T is sufficiently large, U_i^T(ω, s_i, s_{−i}^G; δ) cannot fall below v̄_i for any s_i ∈ {s_i^B, s_i^G} and any initial state ω. Hence, for such T and all ω ∈ Ω,
\[ \min_{s_i \in \{s_i^B, s_i^G\}} U_i^T\bigl(\omega, s_i, s_{-i}^G; \delta\bigr) \;>\; \bar v_i \;>\; v_i \;>\; \underline v_i \;>\; \max_{s_i \in S_i^T} U_i^T\bigl(\omega, s_i, s_{-i}^B; \delta\bigr). \]

Now consider the imperfect private monitoring case. If ε > 0 is sufficiently small given T, an error event is very unlikely, and hence its influence on the T-period payoffs is so negligible that the above inequalities still hold.

7 In perfect monitoring, players surely receive a correct message profile in the coordination round and precisely observe opponent’s unilateral deviation in the main round.
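A schematic sketch of the block strategy just described, for the perfect monitoring case. The helpers msg_action, opp_good, g_markov, g_punish, and opp_markov are hypothetical stand-ins for the coordination-round messages, the opponent's message partition, the Lemma 3.1 Markov strategies g^{M_2 M_1}, the min–max strategy g_i^{−i}, and the opponent's prescribed play; none of them are defined in the paper in this form.

```python
class BlockStrategy:
    """Schematic block strategy for player i (perfect monitoring, canonical signals).

    msg_action maps a message ('G' or 'B') to a coordination-round action;
    opp_good is the cell of the opponent's action partition that means 'good';
    g_markov(messages, state, t), g_punish(state, t), and opp_markov(messages, state, t)
    are hypothetical callbacks for own Markov play, own min-max play, and the opponent's
    prescribed Markov play.
    """

    def __init__(self, kind, msg_action, opp_good, g_markov, g_punish, opp_markov):
        self.kind = kind                  # 'G' for the good strategy, 'B' for the bad one
        self.msg_action = msg_action
        self.opp_good = opp_good
        self.g_markov, self.g_punish, self.opp_markov = g_markov, g_punish, opp_markov
        self.messages = None              # (own message, opponent's message), set at t = 2
        self.punishing = False

    def act(self, t, state, prev_state, prev_signal):
        if t == 1:                        # coordination round: the chosen action is the message
            return self.msg_action[self.kind]
        if t == 2:                        # prev_signal reveals the opponent's period-1 action
            self.messages = (self.kind, 'G' if prev_signal in self.opp_good else 'B')
        elif self.kind == 'B' and not self.punishing:
            # only the bad-message sender monitors: punish after an observed unilateral deviation
            if prev_signal != self.opp_markov(self.messages, prev_state, t - 1):
                self.punishing = True
        return self.g_punish(state, t) if self.punishing else self.g_markov(self.messages, state, t)
```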


3.3. Perfect monitoring

In this subsection we provide a sketch of the proof for the perfect monitoring case, in which i's signal is equal to −i's actual action, and hence h_i^t equals h_{−i}^t up to the ordering of signals and actions. We abuse notation to denote this by h_{−i}^t = h_i^t. As in HO, we shall define transfers π_i^B, π_i^G : H_{−i}^T → R such that, in the block game, player i is indifferent between the strategies s_i^B, s_i^G and weakly prefers them to all others if player −i uses either of the two strategies s_{−i}^G, s_{−i}^B.

More specifically, by backward induction from period T to 1, we define θ^B(h_{−i}^{t−1}, ω^t, a_i^t) as the difference of the continuation payoffs from (h_{−i}^{t−1}, ω^t) between some strategy r_i^B ∈ S_i^T and the strategy of playing a_i^t in period t and reverting to r_i^B afterwards, provided that opponent −i plays the bad strategy s_{−i}^B. Now let



\[ \pi_i^B\bigl(h_{-i}^T\bigr) := \frac{1}{\delta^T} \sum_{t=1}^{T} \delta^{\,t-1}\, \theta^B\bigl(h_{-i}^{t-1}, \omega^t, a_i^t\bigr). \tag{1} \]

Then the transfer (1), when added at the end of the block, guarantees that player i is indifferent across all her strategies in the T-period stochastic game. In stochastic games, altering current actions affects not only the distribution of private signals, but also the distribution of future states. The transfer θ^B accounts for that effect. If, for example, a change in player i's action from what r_i^B prescribes lowers her continuation payoff through a change in the distribution of future states, then the transfer compensates her so that she remains indifferent among all actions. Letting the transfer θ^B depend on the realized state enables us to keep this indifference, which is how we adapt the construction of HO to deal with stochastic games. Similarly, define



\[ \pi_i^G\bigl(h_{-i}^T\bigr) := \frac{1}{\delta^T} \sum_{t=1}^{T} \delta^{\,t-1}\, \theta^G\bigl(h_{-i}^{t-1}, \omega^t, a_i^t\bigr), \tag{2} \]

where θ^G is defined⁸ so that it adjusts the continuation payoffs for the strategies s_i^B, s_i^G and other strategies s_i in S_i^T to make sure that, when the transfer (2) is added at the end of the block, player i is indifferent between s_i^B and s_i^G, and prefers them to all other strategies in the T-period stochastic game, provided that opponent −i plays the good strategy s_{−i}^G. By adjusting the transfers a little further and enlarging, if necessary, [v̲_1, v̄_1] × [v̲_2, v̄_2], we may assume without loss of generality that, for all ω ∈ Ω,

\[ U_i^A\bigl(\omega, s_i, s_{-i}^G, \pi_i^G; \delta\bigr) = \bar v_i \ \text{ for all } s_i \in \{s_i^B, s_i^G\}, \qquad\text{and}\qquad U_i^A\bigl(\omega, s_i, s_{-i}^B, \pi_i^B; \delta\bigr) = \underline v_i \ \text{ for all } s_i \in S_i^T. \]
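A minimal numerical sketch of what the transfer accomplishes, in an infinite-horizon simplification rather than the paper's finite-block formulas (1)–(2): with an invented two-state game in which the opponent's bad Markov strategy is already folded into the primitives, a per-period compensation θ^B(ω, a_i) is chosen so that every Markov strategy of player i earns exactly the value of the reference strategy r_i^B once the compensation is added. The dependence of θ^B on the realized state is what handles the effect of actions on state transitions discussed above.

```python
import itertools
import numpy as np

# Invented primitives: two states, two actions for player i; the opponent's fixed bad
# Markov strategy is already folded into p and u below.
p = np.array([[[0.8, 0.2], [0.3, 0.7]],      # p[w, a_i] = distribution over next states
              [[0.5, 0.5], [0.1, 0.9]]])
u = np.array([[1.0, 2.0],                    # u[w, a_i] = player i's stage payoff
              [0.0, 3.0]])
delta = 0.9
r_B = [0, 1]                                  # reference strategy r_i^B: action in states 0 and 1

def value(policy, theta=None):
    """Normalized value of a Markov policy of player i, with optional per-period transfers."""
    r = np.array([u[w, policy[w]] + (0.0 if theta is None else theta[w, policy[w]])
                  for w in (0, 1)])
    P = np.array([p[w, policy[w]] for w in (0, 1)])
    return np.linalg.solve(np.eye(2) - delta * P, (1 - delta) * r)

W = value(r_B)                                # continuation value of the reference strategy

# theta^B(w, a_i) compensates both the stage payoff and the effect of a_i on the next state,
# so that (1 - delta) * (u + theta) + delta * E[W(w')] = W(w) for every action a_i.
theta = np.array([[(W[w] - (1 - delta) * u[w, a] - delta * p[w, a] @ W) / (1 - delta)
                   for a in (0, 1)] for w in (0, 1)])

# Every Markov strategy now earns exactly W: player i is indifferent among all of them.
for policy in itertools.product((0, 1), repeat=2):
    assert np.allclose(value(list(policy), theta), W)
print(np.round(theta, 4), W)
```

The paper's construction instead works with the finite T-period block, defines θ^B by backward induction on the opponent's histories, and conditions it on realized states and signals; the sketch only isolates the indifference mechanism.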

Note that player −i determines player i's payoff in the auxiliary scenario, depending on whether she plays s_{−i}^G or s_{−i}^B. Given the target payoff v, the equilibrium strategies are described as follows: in the first T-period block, player −i picks strategy s_{−i}^G with probability q and strategy s_{−i}^B otherwise, where q ∈ [0, 1] solves v_i = q v̄_i + (1 − q) v̲_i. After the initial randomization, player −i sticks to the resulting strategy s_{−i}^G or s_{−i}^B throughout the block. If player −i plays s_{−i}^B and observes history h_{−i}^T, then the new target payoff from the next block on becomes v′_i = v̲_i + (1 − δ)π_i^B(h_{−i}^T). If player −i plays s_{−i}^G and observes history h_{−i}^T, then the new target payoff becomes v′_i = v̄_i + (1 − δ)π_i^G(h_{−i}^T). Now in the second T-period block player −i randomizes between the two strategies with probability q′ such that v′_i = q′ v̄_i + (1 − q′) v̲_i, and so on.⁹ Notice that if players follow this strategy, player i's total average payoffs when player −i plays s_{−i}^G and s_{−i}^B respectively are, for s_i ∈ {s_i^B, s_i^G},

 









\[ \bigl(1 - \delta^T\bigr)\, U_i^T\bigl(\omega, s_i, s_{-i}^G; \delta\bigr) + \delta^T \bigl[\, \bar v_i + (1 - \delta)\, \pi_i^G\bigl(h_{-i}^T\bigr) \bigr] = \bar v_i, \]
and
\[ \bigl(1 - \delta^T\bigr)\, U_i^T\bigl(\omega, s_i, s_{-i}^B; \delta\bigr) + \delta^T \bigl[\, \underline v_i + (1 - \delta)\, \pi_i^B\bigl(h_{-i}^T\bigr) \bigr] = \underline v_i, \]
respectively.

Thus, with the initial randomization probability q, the target payoff v is actually achieved.

Now we claim that these strategies form a sequential equilibrium. From the transfer scheme (1), we can see that, for each private history in each block, player i is indifferent among all actions when player −i plays s_{−i}^B. On the other hand, from the transfer scheme (2), for each private history in each block, it is optimal for player i to follow any strategy in {s_i^B, s_i^G} when player −i plays s_{−i}^G. Hence, given player −i's strategy, any strategy in which player i plays some s_i ∈ {s_i^B, s_i^G} in each block is a best reply. So the strategies described constitute a sequential equilibrium.

3.4. Imperfect private monitoring

In this subsection, we consider private almost-perfect monitoring, constructing block equilibria similar to those of the perfect monitoring case. We would like to design transfers that give the desired incentives. While h_{−i}^t = h_i^t in the

8 We omit the details, which are more complicated than those of θ^B. See the Online Appendix.
9 The transfers shall be defined so that v′_i ∈ [v̲_i, v̄_i].


perfect monitoring case, players may form non-trivial beliefs Pr{h_{−i}^{t−1} | h_i^{t−1}, ω^t} in the private monitoring case. Thus, we construct transfers (1) and (2) such that, in the block game, player i is indifferent between {s_i^B, s_i^G} and it is sequentially optimal to follow them given his beliefs Pr{h_{−i}^{t−1} | h_i^{t−1}, ω^t} if player −i uses either of the two strategies s_{−i}^B, s_{−i}^G.¹⁰ For that purpose, let H_i^E be the set of all histories (h_i^{t−1}, ω^t) ∈ H_i^{t−1} × Ω, t ≥ 1, that cannot be reached by any strategy profile in {s_1^B, s_1^G} × {s_2^B, s_2^G} when there is no error (ε = 0), i.e., monitoring is perfect. H_i^R := (H_i^{T−1} × Ω) \ H_i^E is the set of histories that can be reached by some strategy profile in {s_1^B, s_1^G} × {s_2^B, s_2^G}. We call the former the set of erroneous histories and the latter the set of regular histories. Note that in the private monitoring case (ε > 0) an erroneous history can be reached with positive probability. We denote by s_i|_{H_i^R} and s_i|_{H_i^E} the strategies on regular histories and erroneous histories, respectively.

For a regular history (h_i^{t−1}, ω^t) ∈ H_i^R, it is not difficult to give the desired incentives using the transfers as in the perfect monitoring case, since Pr{h_{−i}^{t−1} = h_i^{t−1} | h_i^{t−1}, ω^t} → 1 as ε → 0. However, if player i reaches an erroneous history, he knows that someone has observed an erroneous signal. For example, suppose that the signal σ_i^{t−1} is such that, given (h_i^{t−2}, ω^{t−1}) ∈ H_i^R, it cannot happen on a regular history. Then player i thinks that either player i might have observed an erroneous signal in the current period or opponent −i might have observed an erroneous signal in the previous period, even though opponent −i followed the prescribed strategy in both cases. Which event is likely to have occurred depends on the likelihood under the monitoring probability m. Moreover, there is another case that is absent in repeated games, since the public state variable may give additional information about players' actions together with private signals.¹¹ Suppose that h_i^{t−1} can be reached on a regular history but ω^t is such that it cannot happen on a regular history. Thus, ω^t contradicts σ_i^{t−1}, so that player i believes that the signal σ_i^{t−1} is erroneous. In this case player i thinks that opponent −i observed an erroneous signal in the previous period and chose an action that is different from the one he would have picked had he observed a correct signal. Again, which signal was likely to have been observed and which action was likely to have been chosen depends on the fine details of the transition probability p and the monitoring probability m, which makes it hard to specify optimal strategies on erroneous histories under generic assumptions. Hence, as in HO, we leave s_i^B|_{H_i^E} and s_i^G|_{H_i^E} unspecified and jointly determine the transfers (π_i^B, π_i^G) and the strategies (s_i^B|_{H_i^E}, s_i^G|_{H_i^E}) by a fixed point argument, still keeping the incentives to follow s_i^B|_{H_i^R} and s_i^G|_{H_i^R} on regular histories. Now that the transfers (π_i^B, π_i^G) and block strategies (s_i^B, s_i^G) are determined, the equilibrium strategy is constructed as in the perfect monitoring case, which we leave to the Online Appendix.

10 Moreover, in the private monitoring case it is important that the transfer π_i^B makes sure that if player −i plays s_{−i}^B, player i is indifferent across all her actions, conditional on any history within the block. Because of this property, player i may assume, for the sake of computing her best replies, that her opponent is playing s_{−i}^G, independently of her own private history. Otherwise player i would have to form her belief about −i's current strategy, and hence i's strategy would have to depend on her own history in the entire stochastic game before the current block, which makes the analysis intractable.
11 Because of this, one might be concerned that if ω^t is strongly correlated with σ_{−i}^{t−1} and ω^t suggests that σ_{−i}^{t−1} is erroneous, then player i may revise his beliefs about σ_{−i}^{t−1} based on ω^t. However, this is not an issue here because of the conditional independence between the transition probability and the monitoring probability.

4. Conclusion

We prove the folk theorem for stochastic games with private almost-perfect monitoring when asymptotic state independence holds. In particular, our result shows that players can cooperate in a broader class of economic environments than those described by repeated games. Moreover, our result establishes that the folk theorem proven by Dutta (1995) for the perfect monitoring case is robust to a perturbation from perfect monitoring toward private monitoring, and that the sophisticated construction of Hörner and Olszewski (2006) for repeated games can be adapted to stochastic games, reinforcing our conviction that much knowledge and intuition about repeated games carries over to the analysis of stochastic games.

Appendix A. Supplementary material

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.geb.2014.03.007.

References

Besanko, D., Doraszelski, U., Kryukov, Y., Satterthwaite, M., 2010. Learning-by-doing, organizational forgetting, and industry dynamics. Econometrica 78, 453–508.
Blackwell, D., 1965. Discounted dynamic programming. Ann. Math. Statist. 36, 226–235.
Dutta, P.K., 1995. A folk theorem for stochastic games. J. Econ. Theory 66, 1–32.
Ely, J., Hörner, J., Olszewski, W., 2005. Belief-free equilibria in repeated games. Econometrica 73, 377–415.
Ely, J., Välimäki, J., 2002. A robust folk theorem for the prisoner's dilemma. J. Econ. Theory 102, 84–105.
Fudenberg, D., Yamamoto, Y., 2011. The folk theorem for irreducible stochastic games with imperfect public monitoring. J. Econ. Theory 146, 1664–1683.
Hörner, J., Olszewski, W., 2006. The folk theorem for games with private almost-perfect monitoring. Econometrica 74, 1499–1544.
Hörner, J., Sugaya, S., Takahashi, S., Vieille, N., 2011. Recursive methods in discounted stochastic games: an algorithm for δ → 1 and a folk theorem. Econometrica 79, 1277–1318.
Neyman, A., Sorin, S. (Eds.), 2003. Stochastic Games and Applications. NATO ASI Series.
Piccione, M., 2002. The repeated prisoner's dilemma with imperfect private monitoring. J. Econ. Theory 102, 70–83.
Rotemberg, J., Saloner, G., 1986. A supergame-theoretic model of price wars during booms. Amer. Econ. Rev. 76, 390–407.
Sobel, M., 1971. Noncooperative stochastic games. Ann. Math. Statist. 42, 1930–1935.
Stigler, G.J., 1964. A theory of oligopoly. J. Polit. Economy 72, 44–61.