An efficiency result in a repeated prisoner’s dilemma game under costly observation with nonpublic randomization


Yoshifumi Hino¹
Business Administration Program, Vietnam–Japan University, Luu Huu Phuoc Street, My Dinh 1 Ward, Nam Tu Liem District, Hanoi, 12015, Viet Nam
E-mail address: [email protected].
¹ I would like to thank my advisor, Yasuyuki Miyahara. I am also grateful to Hideo Suehiro and Hisao Hisamoto for their valuable comments and suggestions, and to participants of the 2017 Japanese Economic Association Spring Meeting at Ritsumeikan University and seminars at Vietnam–Japan University and Kobe University. Furthermore, I would like to thank Vietnam–Japan University and the Japan International Cooperation Agency (JICA). This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. All remaining errors are my responsibility.

Highlights

• I analyze an infinitely repeated prisoner's dilemma under costly observation.
• I introduce a correlated signal (a nonpublic randomization device) at the beginning of each stage game.
• I show an efficiency result when the observation cost is small.
• The results hold whether the signal is arbitrarily strongly or arbitrarily weakly correlated.

Article history: Received 30 October 2017; Received in revised form 20 June 2019; Accepted 22 June 2019; Available online 29 June 2019.

Abstract: We consider an infinitely repeated prisoner's dilemma game with costly observation, where each player chooses a pairing of an action and an observational decision. If the player observes the opponent, the player pays an observation cost and observes the action just played by the opponent. Otherwise, the player cannot obtain any information about the action chosen by the opponent. We then introduce a correlated signal at the beginning of each stage game (nonpublic randomization) and prove an efficiency result without any implicit communication, on the condition that the discount factor is sufficiently close to one and the observation cost is sufficiently small. The result holds whether the signal is arbitrarily strongly or arbitrarily weakly correlated. © 2019 Elsevier B.V. All rights reserved.

1. Introduction

In this paper, we consider an infinitely repeated prisoner's dilemma game with costly observation. In our game, each player chooses whether to observe the opponent in each period, but cannot obtain any information about the action chosen by the opponent without observation. This observation incurs a positive observation cost. It has been an open question whether efficiency is achievable in such an infinitely repeated prisoner's dilemma game. We show that if the players receive private and correlated signals at the beginning of each period, then efficiency can be approximated by a sequential equilibrium. To address this, we introduce ''nonpublic randomization'' in the form of correlated signals at the beginning of each stage game

where each player receives a binary signal. As players receive these signals before the choice of action, the signals do not contain any information about the action chosen by the opponent. We demonstrate the effects of this nonpublic randomization throughout the paper.

Our main contribution is to provide an efficiency result without communication in a repeated prisoner's dilemma game. This is in contrast to the existing literature, which yields an efficiency result only when communication is available. An additional contribution is the construction of sequential equilibria. Our construction of the strategy, as explained at the end of this section, is quite simple and uses only a three-state automaton, whereas other studies require substantially more complicated constructions (e.g., Miyagawa et al. (2008) employ a six-state automaton). Moreover, our propositions require only (1) that the distribution of the signals is not independent, and (2) that the distribution of the signals has full support. Our signal requirement is therefore not restrictive and can approximate two kinds of crucial situations: an independent signal and a perfectly correlated signal (public randomization). If the correlation coefficient is arbitrarily close to zero, the signals are almost independent; if the correlation coefficient is arbitrarily close to either −1 or 1, the signal is close to a perfectly correlated signal.


As discussed, we use nonpublic randomization in this analysis, which can be interpreted as a special type of mediator. Some previous works show that efficiency is achievable if a dynamic mediator is available. Aoyagi (2005) uses dynamic mediated strategies under ε-perfect monitoring. Costly observation, our monitoring structure, is not ε-perfect, so his result does not apply to our model. In contrast, Rahman and Obara (2010) and Rahman (2012) consider a contract and a mediator, with Rahman and Obara (2010) assuming that the mediator is endogenous, while Rahman (2012) provides some propositions given the mediator. In both studies, the entire profile of recommendations is revealed publicly at the end of each stage game, whereas in our model the recommendation is never made public, and so the players' continuation strategies do not rely on recommendations made in the past. In addition, unlike our propositions, most of the existing literature on mediators requires strongly correlated recommendations to establish its results.

Costly observation is a kind of imperfect monitoring. Some studies on imperfect monitoring confine their attention to public monitoring, with Abreu et al. (1990) characterizing public perfect equilibria, Fudenberg et al. (1994) providing conditions for a folk theorem, and Fudenberg and Levine (1994) describing the limit set of public perfect equilibrium payoffs. However, when the monitoring structure is private, the analysis is much more difficult. The seminal work is Sekiguchi (1997), who shows a nearly efficient sequential equilibrium under private monitoring given that the private signals are almost perfect and players are patient. We can divide subsequent studies on private monitoring into three kinds of approaches: belief-based equilibrium analysis, belief-free equilibrium analysis, and communication-based equilibrium analysis.

The present paper and most of the existing literature on costly observation employ belief-based equilibrium analysis. Miyagawa et al. (2003) analyze the same monitoring structure as this paper. They assume a sufficiently small observation cost and demonstrate a Nash folk theorem; that is, any payoff vector that Pareto-dominates a stage-game Nash payoff vector can be achieved by a sequential equilibrium. Their sufficient condition for the folk theorem requires that each player can choose at least three actions so that players can communicate via mixed actions. However, their result does not cover an infinitely repeated prisoner's dilemma game, where each player can choose only two kinds of actions.

The monitoring structure in Flesch and Perea (2009) is close to that in this paper, and likewise assumes that players cannot obtain any information when they do not observe the opponent. In their model, players can purchase precise information about the actions taken in the past by the other player. They show that a folk theorem holds without any randomization device or cheap talk when at least three players (resp. four players) are involved and each player has at least four actions (resp. three actions). On this basis, Flesch and Perea (2009) conjecture that an efficiency result does not hold in a two-player prisoner's dilemma game. However, our result (Proposition 2) proves that an efficiency result holds in their monitoring structure when a nonpublic randomization device is available.

In other related work, Lehrer and Solan (2018) and Kandori and Obara (2004) assume that the observational decision is observable.
Lehrer and Solan (2018) suppose that the observational decision is common knowledge and analyze a high frequency repeated game, where they characterize the limit set of public perfect equilibrium payoffs as the observation cost tends to zero and prove that the set is a strict subset of the set of feasible and individually rational payoffs. Alternatively, Kandori and Obara (2004) assume that each player can obtain almost perfect information about the observational decision of the opponent when observing the opponent, and provide an efficiency result for any level of observation cost.

Elsewhere, Miyagawa et al. (2008) introduce a free signal about the actions taken by players. In their model, players obtain an imperfect signal even when they do not observe the opponent. They also assume that players choose an observational decision after the choice of action and that public randomization devices are available just before each decision, and by doing so they demonstrate a folk theorem for any level of observation cost.

One belief-free equilibrium analysis that covers costly observation is Sugaya (2011), who considers repeated games under general private monitoring. He shows that a folk theorem holds when the number of each player's signals is sufficiently large. However, as the number of signals that players can observe under costly observation is small, his result does not cover costly observation.

Many studies (e.g., Compte (1998), Kandori and Matsushima (1998), Fudenberg and Levine (2007), and Obara (2009)) implement communication-based equilibrium analysis. Communication enables players to share information without cost and sometimes helps them coordinate with each other. For example, Ben-Porath and Kahneman (2003) show that a folk theorem holds under costly observation, and Sugaya and Wolitzky (2016) consider an infinitely repeated game under general private monitoring and show a simple sufficient condition for the existence of a recursive upper bound on the sequential equilibrium payoff set in two-player repeated games. These settings are unlike our model, in which there is no communication.

Let us now briefly explain our equilibrium. Consider a simple repeated prisoner's dilemma game where each player chooses Ci or Di every period. As mentioned, a private signal zi ∈ {1, 2} for player i is realized at the beginning of the stage game. Assume that Pr(zj = 2 | zi = 1) > Pr(zj = 2 | zi = 2) for each i, j = 1, 2. Our strategy has only two states on the equilibrium path, a cooperation state and a defection state, and one state off the path, an error state. In the cooperation state, player i chooses Ci if zi = 1, and mixes Ci and Di if zi = 2; player i observes the opponent in the cooperation state if and only if he chooses Di. Player i chooses Di and does not observe the opponent in the defection state, and chooses an optimal action and observational decision in the error state given his belief and the state of the opponent. The state transition of each player i is conditional not only on what the player observes but also on the player's own action. If player i chooses Ci or observes Cj, the state remains the same, but if player i chooses action Di and observes Dj, then the state moves to the defection state in the next period. The states never change in the defection and error states. That is, this strategy is a variant of the grim strategy whose trigger is playing Di and observing Dj in the same period.

The role of nonpublic randomization is to change the belief about the action chosen by the opponent and thereby the best-response action in the cooperation state. Player i must be indifferent between actions Ci and Di when the player receives zi = 2 in the cooperation state, because player i randomizes between Ci and Di there. Player i believes that zj = 2 is realized with a higher probability when the player receives zi = 1 than when the player receives zi = 2, because Pr(zj = 2 | zi = 1) > Pr(zj = 2 | zi = 2).
This implies that, upon receiving zi = 1, player i expects player j to choose Dj with a higher probability. Given that player i wants to remain in the cooperation state, player i then has a stronger incentive to choose Ci when the player receives zi = 1 than when zi = 2. Therefore, action Di is suboptimal for player i when zi = 1 is realized and his state is the cooperation state. The fact that action Di is suboptimal for player i when the player receives zi = 1 in the cooperation state yields an incentive to observe the opponent.

Assume that player i does not observe the opponent when choosing action Di in the cooperation state. Player i then cannot know to which state he should move, and we show that no matter what the player chooses in the next period, he must pay an opportunity cost when receiving zi = 1. Accordingly, action Di is suboptimal for player i when zi = 1 and his state is the cooperation state, which means that the player must pay an opportunity cost of choosing action Di if the action of the opponent was Cj in the previous period. Action Ci is likewise costly for player i if the action of the opponent was Dj in the previous period. Therefore, player i must pay a positive (opportunity) cost irrespective of his action in the next period. Thus, if the observation cost is sufficiently small compared with these costs, player i prefers to observe the opponent when choosing action Di in the cooperation state. Owing to the nonpublic signal, players thus have an incentive to observe the opponent.

Now suppose a lack of nonpublic randomization. Players are then always indifferent between Ci and Di in the cooperation state, and players prefer Di in the defection state. This means that one of the optimal continuation strategies with respect to actions is choosing Di every period irrespective of what players observe in that period. Therefore, this strategy does not constitute a sequential equilibrium.

The rest of this paper is organized as follows. Section 2 introduces the model of a repeated prisoner's dilemma with costly observation. Section 3 discusses our results and Section 4 provides some concluding remarks.

2. Model

The base game is the prisoner's dilemma. Each player i (i = 1, 2) chooses an action, Ci or Di. Let ai be the action for player i and a = (a1, a2) be an action profile. Let us denote by Ai ≡ {Ci, Di} the set of actions for player i. The set of action profiles is denoted by A = A1 × A2. Given an action profile a, the base game payoff for player i, ui(a), is as in Table 1.

Table 1. Prisoner's dilemma.

            C2             D2
C1          1, 1           −ℓ, 1 + g
D1          1 + g, −ℓ      0, 0

We make a standard assumption on the above payoff matrix.

Assumption 1. (i) g > 0 and ℓ > 0; (ii) 1 + g − ℓ < 2.

The first condition implies that Ci is dominated by Di for each player i, and the second condition ensures that the payoff vector of action profile (C1, C2) is Pareto-efficient.

The stage game proceeds as follows. First, each player i receives a private signal zi ∈ {1, 2}. The distribution of private signals is given by Table 2.

Table 2. The probability distribution of private signals.

            z2 = 1             z2 = 2
z1 = 1      p                  (1 − p − q)/2
z1 = 2      (1 − p − q)/2      q

We impose an assumption on the distribution.

Assumption 2. (i) p > 0, q > 0 and p + q < 1; (ii) 4pq ≠ (1 − p − q)².

The first assumption is a full support assumption and the second ensures that the signals are not independent. The signal is unaffected by the action chosen by the opponent because the signal is realized before the choice of action.
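To make the signal structure concrete, here is a minimal Python sketch (illustrative only; the parameter values p = q = 0.1 are my own choice, not the paper's, and satisfy 4pq < (1 − p − q)², the case analyzed first in Section 3). It encodes the joint distribution of Table 2, checks both parts of Assumption 2, and draws signal pairs.

```python
import random

# Joint distribution of the private signals (z1, z2) from Table 2:
# Pr(1,1) = p, Pr(2,2) = q, Pr(1,2) = Pr(2,1) = (1 - p - q)/2.
def signal_distribution(p, q):
    r = (1.0 - p - q) / 2.0
    return {(1, 1): p, (1, 2): r, (2, 1): r, (2, 2): q}

def check_assumption_2(p, q):
    full_support = p > 0 and q > 0 and p + q < 1        # part (i)
    not_independent = 4 * p * q != (1 - p - q) ** 2     # part (ii)
    return full_support and not_independent

def draw_signals(p, q, rng=random):
    dist = signal_distribution(p, q)
    u, acc = rng.random(), 0.0
    for pair, prob in dist.items():
        acc += prob
        if u < acc:
            return pair
    return (2, 2)  # numerical guard against rounding

if __name__ == "__main__":
    p, q = 0.1, 0.1   # example parameters (mine, not from the paper)
    assert check_assumption_2(p, q)
    print([draw_signals(p, q) for _ in range(5)])
```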


After each player i receives signal zi, each player i chooses a pair of an action and an observational decision simultaneously. Let mi represent the observational decision of player i and Mi ≡ {0, 1} be the set of observational decisions for player i, where mi = 1 represents ''to observe the opponent'' and mi = 0 represents ''not to observe the opponent''. If player i observes the opponent, the player incurs an observation cost λ > 0 and receives complete information about the action chosen by the opponent at the end of the stage game. If player i does not observe the opponent, the player does not incur any cost and therefore obtains no information about the opponent's action. We assume that the observational decision of a player is unobservable, but that irrespective of player i's observational decision, the player can observe his own action. A stage behavior for player i is the pair of the base game action ai and the observational decision mi of player i, and is denoted by bi = (ai, mi). An outcome of the stage game is a pair of b1 and b2. Let Bi ≡ Ai × Mi be the set of pairs of actions and observational decisions for player i, and let B ≡ B1 × B2 be the set of outcomes of the stage game. Given an outcome of the stage game b ∈ B, the stage game payoff πi(b) for player i is given by

πi(b) ≡ ui(a) − mi · λ.

We denote by G(λ) the above stage game given observation cost λ. For any observation cost λ > 0, stage game G(λ) has a unique Nash equilibrium outcome, b∗ = ((D1, 0), (D2, 0)).

Next, we define the infinitely repeated game. Players play the stage game G(λ) repeatedly over periods t = 1, 2, . . . . Let δ ∈ (0, 1) be a common discount factor and Γ(G(λ), δ) be the infinitely repeated game with the stage game G(λ) and the common discount factor δ. Given a sequence of stage game outcomes (b^t)_{t=1}^∞, player i's payoff from the repeated game is the average discounted payoff

(1 − δ) Σ_{t=1}^∞ δ^{t−1} πi(b^t).

Players maximize the expected payoff of the repeated game. Players receive no information about the action chosen by the opponent when they do not observe the opponent. This implies that players do not receive the base game payoffs in the course of play. As in Miyagawa et al. (2003), we interpret the discount factor as the probability with which the repeated game continues, and we assume that each player receives the sum of the payoffs when the repeated game ends. Consequently, the assumption of no free signal regarding the actions is less problematic.

Let oi ∈ Aj ∪ {φi} be an observation result for player i. Observation result oi = aj ∈ Aj means that player i chose observational decision mi = 1 and observed aj. Observation result oi = φi means that player i chose mi = 0; that is, the player obtained no information about the action chosen by the opponent.

Let h^t_i be a (private) history of player i at the beginning of period t ≥ 2: h^t_i = (z^k_i, a^k_i, o^k_i)_{k=1}^{t−1}. This is the sequence of the player's received signals, own chosen actions, and observation results up to period t − 1. We omit the observational decisions from h^t_i because observation result o^k_i implies the observational decision m^k_i for any k. Let H^t_i denote the set of all histories for player i at the beginning of period t ≥ 1, where H^1_i is an arbitrary singleton set. We denote ∪_{t=1}^∞ H^t_i by Hi.

The (behavior) strategy for player i in the repeated game is a function from pairs of histories and received signals for player i to the player's stage behavior: σi : Hi × {1, 2} → ∆(Bi), where ∆(Bi) is the set of probability distributions over Bi. The belief ψ^t_i of player i in period t is a function from the history h^t_i of player i at period t to a probability distribution over the set of histories and signals for player j at period t.
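As a small illustration of the payoff definitions above, the following sketch (my own rendering, with hypothetical parameter values) computes the stage payoff πi(b) = ui(a) − mi·λ from Table 1 and a finite truncation of the average discounted payoff.

```python
# Base game payoff u_i(a) from Table 1, player 1's perspective;
# g and l (= the loss ell) satisfy Assumption 1.
def u1(a1, a2, g, l):
    table = {("C", "C"): 1.0, ("C", "D"): -l,
             ("D", "C"): 1.0 + g, ("D", "D"): 0.0}
    return table[(a1, a2)]

# Stage payoff pi_i(b) = u_i(a) - m_i * lam, where m_i = 1 means "observe".
def stage_payoff(a1, a2, m1, g, l, lam):
    return u1(a1, a2, g, l) - m1 * lam

# Average discounted payoff (1 - delta) * sum_t delta^(t-1) * pi_i(b^t),
# truncated at the length of the given outcome stream.
def average_discounted_payoff(outcomes, delta, g, l, lam):
    return (1 - delta) * sum(
        delta ** t * stage_payoff(a1, a2, m1, g, l, lam)
        for t, (a1, a2, m1) in enumerate(outcomes))

if __name__ == "__main__":
    # Example (parameters mine): permanent mutual cooperation, no observation;
    # the truncated average payoff is close to the stage payoff of 1.
    stream = [("C", "C", 0)] * 200
    print(average_discounted_payoff(stream, delta=0.95, g=0.5, l=0.6, lam=0.01))
```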

Let ψi ≡ (ψ^t_i)_{t=1}^∞ be a belief of player i, and let ψ = (ψ1, ψ2) denote a system of beliefs. A strategy profile σ is a pair of strategies σ1 and σ2. Given a strategy profile σ, a tremble is a sequence of completely mixed behavior strategy profiles (σ^n)_{n=1}^∞ that converges to σ. Each completely mixed behavior strategy profile σ^n induces a unique system of beliefs ψ^n. The solution concept is sequential equilibrium. We say that a system of beliefs ψ is consistent with σ if there exists a tremble (σ^n)_{n=1}^∞ such that the corresponding sequence of systems of beliefs (ψ^n)_{n=1}^∞ converges to ψ. Given the system of beliefs ψ, strategy profile σ is sequentially rational if, for each player i, the continuation strategy from each history is optimal given his belief at the history and the opponent's strategy. A strategy profile σ is a sequential equilibrium if there exists a consistent system of beliefs ψ for which σ is sequentially rational.

3. Main results and proofs

In this section, we provide our efficiency results. The section proceeds as follows. First, we show an efficiency result when the discount factor is moderately high and the observation cost is sufficiently low (Proposition 1). In this proposition, we fix an upper bound for the observation cost depending on the discount factor, such that the upper bound converges to zero as the discount factor converges to one. Second, we extend Proposition 1 by showing an efficiency result when the discount factor is sufficiently high and the observation cost is sufficiently low (Proposition 2). The upper bound of the observation cost in Proposition 2 does not converge to zero, even when the discount factor converges to one.

Proposition 1. Suppose that Assumptions 1 and 2 are satisfied. For any ε > 0, there exist δ0 ∈ (g/(1 + g), 1) and λ̄ > 0 such that for any discount factor δ ∈ (δ0, (δ0 + 1)/2) and for any observation cost λ ∈ (0, λ̄), there exists a sequential equilibrium whose payoff vector (V1∗, V2∗) satisfies Vi∗ ≥ 1 − ε for each i = 1, 2.

A history h^t_i is said to be on the path of strategy σ if the history is realized with a positive probability given σ. A history is said to be off the path of strategy σ if it is not on the path of strategy σ. We elaborate upon the proposition using the following six steps: (1) the choice of the discount factor and the observation cost, (2) the automaton, (3) the payoff, (4) the incentive for action choice on the path of the equilibrium strategy, (5) the incentive for observation choice on the path of the equilibrium strategy, and (6) the sequential rationality off the path of the equilibrium strategy.

3.1. The choice of the discount factor and the observation cost

Fix any ε > 0. We define δ0 and λ̄ as

δ0 ≡ 1 / (1 + εpq(1 − p − q)|4pq − (1 − p − q)²| / (128(2 + g + ℓ + ε)²)),    λ̄ ≡ ((1 − δ0)/6) g.

We fix an arbitrary discount factor δ ∈ (δ0, (δ0 + 1)/2) and an arbitrary observation cost λ ∈ (0, λ̄).
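The two definitions translate directly into code. The sketch below (example parameter values are mine, not the paper's) evaluates δ0, the admissible interval (δ0, (δ0 + 1)/2), and λ̄.

```python
def delta0(eps, g, l, p, q):
    # delta0 = 1 / (1 + eps*p*q*(1-p-q)*|4pq - (1-p-q)^2| / (128*(2+g+l+eps)^2))
    num = eps * p * q * (1 - p - q) * abs(4 * p * q - (1 - p - q) ** 2)
    return 1.0 / (1.0 + num / (128.0 * (2 + g + l + eps) ** 2))

def lam_bar(d0, g):
    # lam_bar = ((1 - delta0) / 6) * g
    return (1 - d0) * g / 6.0

if __name__ == "__main__":
    eps, g, l, p, q = 0.05, 0.5, 0.6, 0.1, 0.1  # my example values
    d0 = delta0(eps, g, l, p, q)
    # Admissible discount factors lie in (d0, (d0 + 1)/2); costs in (0, lam_bar).
    print(d0, (d0 + 1) / 2, lam_bar(d0, g))
```

Note how conservative these thresholds are: with the example values, δ0 is within about 2 × 10⁻⁷ of one and λ̄ is on the order of 10⁻⁸.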

3.2. The automaton

To define the desired sequential equilibrium, we define an auxiliary automaton (ωiC, Ωi, fi, τi). The automaton has three states: a cooperation state ωiC, a defection state ωiD, and an error state ωiE. The state space is denoted by Ωi ≡ {ωiC, ωiD, ωiE}, and the initial state is the cooperation state ωiC.

First, we describe output function fi : Ωi → ∆(Bi) and transition function τi : Ωi × Ai × Oi → Ωi in the cooperation state. In what follows, we confine our attention to the case of 4pq < (1 − p − q)². To describe the automaton, we define the following signal ri:²

ri = Ci if zi = 1;    ri = Di if zi = 2.

² When 4pq > (1 − p − q)², we define r1 = C1 if z1 = 1 and r1 = D1 if z1 = 2, and r2 = C2 if z2 = 2 and r2 = D2 if z2 = 1.

The probability distribution ρr(ri, rj) of (r1, r2) is summarized in Table 3. Let ρr(ri) be the marginal probability of ri.

Table 3. The probability distribution of ρr.

            r2 = C2            r2 = D2
r1 = C1     p                  (1 − p − q)/2
r1 = D1     (1 − p − q)/2      q

In the cooperation state, if the signal ri is Ci, then player i chooses (Ci, 0). If the signal ri is Di, player i chooses (Ci, 0) with probability 1 − βi and chooses (Di, 1) with probability βi. The value of βi is defined later as the solution of Eq. (2). The state in the next period remains the cooperation state if ai = Ci or oi = Cj is realized in the current period. The state moves to the defection state ωiD when (ai, oi) = (Di, Dj) is realized, and the state in the next period is the error state ωiE if player i chooses (Di, 0).

Next, let us consider the defection state ωiD. Player i chooses (Di, 0) for certain, and the state remains the defection state ωiD irrespective of (ai, oi). Finally, we consider the error state ωiE. Player i chooses (Di, 0) for certain, as in the defection state, and the state remains the error state ωiE irrespective of (ai, oi). The state transition rule is summarized in Fig. 1.

Fig. 1. State transition τi. (Figure omitted.)

Let σ̂ = (σ̂1, σ̂2) be the strategy profile represented by the above automaton (ωiC, Ωi, fi, τi). We define a system of beliefs consistent with strategy σ̂ by considering the following tremble.


If player i receives signal ri = Ci in the cooperation state, he chooses

η[(Ci, 1)] + (1 − 2η − η^{1/η})[(Ci, 0)] + η[(Di, 1)] + η^{1/η}[(Di, 0)].

If player i receives signal ri = Di in the cooperation state, he chooses

η[(Ci, 1)] + (1 − βi)(1 − η − η^{1/η})[(Ci, 0)] + βi(1 − η − η^{1/η})[(Di, 1)] + η^{1/η}[(Di, 0)].

In the defection and the error states, player i chooses

η[(Ci, 1)] + η[(Ci, 0)] + η[(Di, 1)] + (1 − 3η)[(Di, 0)].

By the above tremble and Bayes' rule, we can define the belief system ψη for any small η > 0, and we define the consistent belief system ψ∗ as the limit of ψη as η approaches zero from above. If player i has not yet observed (Di, Dj) until period T, the belief ψ∗ is derived from Bayes' rule, and ψ∗ assigns probability one to the event ωj ≠ ωjE in period T. This is because the event ωj = ωjE is off the path of strategy σ̂, but player i does not observe an off-the-path event until period T. Only when player i observed (Di, Dj) in the past may the event be off the path of strategy σ̂. The construction of the tremble ensures that if player i observed (Di, Dj) in a past period T′ < T, the belief ψ∗ assigns probability one to the event ωj = ωjD in period T.

To explain this fact, let us consider the following history ĥTi in period T: player i observed (Di, Dj) in period 1, but observed (Ci, Cj) from period 2 to T − 1. Let us confine our attention to the belief of ωj = ωjE in period T, which is derived as follows:

Pr(ωj = ωjE | ĥTi) = Pr(ωj = ωjE, ĥTi) / [Pr(ωj = ωjE, ĥTi) + Pr(ωj ≠ ωjE, ĥTi)].

First, we consider an upper bound of Pr(ωj = ωjE, ĥTi). If ωj in period T is the error state, then player j chose (Dj, 0) in period 1 and moved to the error state at the end of period 1. Therefore, the probability Pr(ωj = ωjE, ĥTi) has the upper bound η^{1/η}, which is the probability that player j chooses (Dj, 0) in the cooperation state.

Next, we consider a lower bound of Pr(ωj ≠ ωjE, ĥTi). There are many histories of player j at which ωj in period T is not the error state. One of those histories is the following history h̃Tj: player j chooses (Dj, 1) in period 1 and moves to the defection state in period 2; he then chooses (Cj, 0) from period 2 until period T − 1 through mistakes. Let us derive the probability of this history h̃Tj. Player j chooses (Dj, 1) in period 1 with a probability of at least ρr(rj = Dj)βj(1 − η − η^{1/η}) in the cooperation state, and the probability that player j chooses (Cj, 0) in the defection state is η. Therefore, the probability of history h̃Tj is bounded below by ρr(rj = Dj)βj(1 − η − η^{1/η})η^{T−2}. Hence, we obtain a lower bound of Pr(ωj ≠ ωjE, ĥTi) as follows:

Pr(ωj ≠ ωjE, ĥTi) > Pr(ωj ≠ ωjE, ĥTi, h̃Tj) > ρr(rj = Dj)βj(1 − η − η^{1/η})η^{T−2}.

Using the above bounds, we finally derive an upper bound of the probability Pr(ωj = ωjE | ĥTi):

Pr(ωj = ωjE | ĥTi) = Pr(ωj = ωjE, ĥTi) / [Pr(ωj = ωjE, ĥTi) + Pr(ωj ≠ ωjE, ĥTi)]
  < η^{1/η} / (η^{1/η} + ρr(rj = Dj)βj(1 − η − η^{1/η})η^{T−2}).

The upper bound approaches zero as η approaches zero from above. Therefore, given beliefs ψ∗ and ĥTi, player i is sure that ωj is not ωjE.

Next, let us define our strategy σ∗ using transition function τi. The transition function τi maps (ωi, ri, ai, oi) to a state in Ωi. Therefore, for any history h^t_i, the transition function τi specifies a state of player i in period t on the assumption that player i follows τi. Let us define the following extended transition function τ̂i : Hi → Ωi and the partitions of Hi:

τ̂i(h^1_i) = ωiC,
τ̂i((r^1_i, a^1_i, o^1_i)) ≡ τi(ωiC, r^1_i, a^1_i, o^1_i),
τ̂i((r^k_i, a^k_i, o^k_i)_{k=1}^{t−1}) ≡ τi(τ̂i((r^k_i, a^k_i, o^k_i)_{k=1}^{t−2}), (r^{t−1}_i, a^{t−1}_i, o^{t−1}_i)),
Ĥi(ωi) ≡ {h^t_i ∈ Hi | τ̂i(h^t_i) = ωi},  ∀ωi ∈ Ωi.

Therefore, τ̂i is a function of history h^t_i that prescribes a state ωi ∈ Ωi based on transition function τi.

We now define our strategy σ∗, which is proved to be a sequential equilibrium. If h^t_i ∈ Ĥi(ωiC) ∪ Ĥi(ωiD), then strategy σi∗ prescribes following σ̂i. For any history h^t_i ∈ Hi, given the belief ψi∗(h^t_i) and given that the opponent follows σ̂j, player i has a belief about the opponent's continuation strategies. For any history h^t_i ∈ Ĥi(ωiE), strategy σi∗ prescribes the best response continuation strategy to the probability distribution over the opponent's continuation strategies, given the belief ψi∗(h^t_i) and given that the strategy of the opponent is σ̂j.³

In our strategy σ∗, the stage behavior (or continuation strategy) at a history h^t_i ∈ Ĥi(ωiE) is not defined explicitly. Our strategy does not need an explicit definition at h^t_i ∈ Ĥi(ωiE). The first reason is the construction of the belief ψ∗. At any history h^t_i, the belief ψi∗ puts zero probability on the event h^t_j ∈ Ĥj(ωjE). This means that for any history h^t_i, the definition of the strategy at h^t_j ∈ Ĥj(ωjE) has no effect when we consider the sequential rationality of player i. In addition, the event h^t_i ∈ Ĥi(ωiE) is off the path of σ∗; hence, the definition at h^t_i ∈ Ĥi(ωiE) has no effect on the equilibrium payoffs. Furthermore, the sequential rationality of player i at a history h^t_i ∈ Ĥi(ωiE) will be shown in Section 3.6 using the (implicit) definition of the strategy.

First, we determine β1 and β2. We choose βi so that each player j is indifferent between (Cj, 0) and (Dj, 1) when the player receives rj = Dj in the cooperation state. In the cooperation state, player j is certain that the state of player i is the cooperation state ωiC. Let ρr(ri | rj) be the conditional probability of ri given rj, and let Vj∗ denote the payoff for player j induced by strategy profile σ∗. Then, if player j receives rj = Dj and chooses (Cj, 0), the payoff is given by

(1 − δ){(1 − ρr(Di | Dj)βi) − ρr(Di | Dj)βi ℓ} + δVj∗.

If player j receives rj = Dj and chooses (Dj, 1), the payoff is given by

(1 − δ){(1 − ρr(Di | Dj)βi)(1 + g) − λ} + δ(1 − ρr(Di | Dj)βi)Vj∗.

Equating the two payoffs, we have βi as follows:

βi = ((1 − δ)/δ) · (1/ρr(Di | Dj)) · (g − λ) / (Vj∗ + ((1 − δ)/δ)(g − ℓ)).
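As a numerical illustration of the belief bound derived above, the following check (η, T, and the constant standing in for ρr(rj = Dj)βj are values of my choosing) confirms that the upper bound vanishes as η approaches zero from above, because η^{1/η} shrinks faster than η^{T−2} for any fixed T.

```python
def belief_upper_bound(eta, T, rho_beta):
    tiny = eta ** (1.0 / eta)             # probability of the (D_j, 0) tremble
    c = rho_beta * (1.0 - eta - tiny)     # stands for rho_r(r_j = D_j) * beta_j * (...)
    return tiny / (tiny + c * eta ** (T - 2))

if __name__ == "__main__":
    # T = 10 periods and rho_beta = 0.1 are example values of mine.
    for eta in (0.1, 0.05, 0.01):
        print(eta, belief_upper_bound(eta, T=10, rho_beta=0.1))
```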

³ Some previous literature (e.g., Miyagawa et al. (2008)) uses the same technique to define strategies, in order to avoid complex discussions regarding sequential rationality off the path.


3.3. The payoff

We derive Vi∗. In the cooperation state ωiC, the strategy profile σ∗ prescribes that player i play (Ci, 0) with a positive probability irrespective of ri. Moreover, when player i chooses (Ci, 0) in the cooperation state, the state remains the cooperation state. Let ρr(rj) be the marginal probability of rj. Payoff Vi∗ is given by

Vi∗ = (1 − δ){(1 − ρr(Dj)βj) · 1 − ρr(Dj)βj ℓ} + δVi∗,

or,

Vi∗ = 1 − ρr(Dj)βj(1 + ℓ).    (1)

Hence, we have

βi = ((1 − δ)/δ) · (1/ρr(Di | Dj)) · (g − λ) / (1 − ρr(Di)βi(1 + ℓ) + ((1 − δ)/δ)(g − ℓ)).    (2)

Taking δ ≥ δ0 into account, we have

(1 − δ)/δ ≤ (1 − δ0)/δ0 = εpq(1 − p − q)|4pq − (1 − p − q)²| / (128(2 + g + ℓ + ε)²).

Likewise, we have ρr(Di | Dj) > (1 − p − q)/2. Therefore, we find that if βi = (2/3)((1 − δ)/δ)(g − λ)/ρr(Di | Dj), then the right-hand side of (2) is greater than the left-hand side of (2). Conversely, if βi = (4/3)((1 − δ)/δ)(g − λ)/ρr(Di | Dj), then the left-hand side of (2) is greater than the right-hand side of (2). Therefore, βi has the following upper and lower bounds:

(2/3)((1 − δ)/δ)(g − λ)/ρr(Di | Dj) < βi < (4/3)((1 − δ)/δ)(g − λ)/ρr(Di | Dj).

Taking δ ∈ (δ0, (δ0 + 1)/2) and λ ∈ (0, λ̄) into account, we find that βi is well defined for each i = 1, 2. Given that βj is bounded above by (4/3)((1 − δ)/δ)(g − λ)/ρr(Dj | Di), we have Vi∗ > 1 − ε by Eq. (1), δ > δ0, and λ < λ̄.

3.4. The incentive for action choice on the path

We show that it is optimal for player i to follow σi∗ with respect to the action choice. By the construction of strategy σ∗, the following facts are immediate: (i) player i strictly prefers (Di, 0) in the defection state because player i is certain that the opponent chooses Dj from then on; (ii) player i is indifferent between (Ci, 0) and (Di, 1) when the player receives ri = Di in the cooperation state.

We now show that player i prefers action Ci when receiving ri = Ci in the cooperation state. Let σ̃i∗ be the best response to σj∗; that is, σ̃i∗ is the best response when player i is certain that ωj = ωjC. Let Ṽi∗ be the payoff for player i when the players play (σ̃i∗, σj∗). By the definition of σ̃i∗, we have Ṽi∗ ≥ Vi∗. If player i chooses (Ci, 0), then the payoff is

(1 − δ){(1 − ρr(Dj | Ci)βj) − ρr(Dj | Ci)βj ℓ} + δṼi∗.

If player i receives ri = Ci and chooses (Di, 0) or (Di, 1), the payoff is bounded above by

(1 − δ){(1 − ρr(Dj | Ci)βj)(1 + g)} + δ(1 − ρr(Dj | Ci)βj)Ṽi∗.

The difference between the two payoffs is given by

δρr(Dj | Ci)βj{Ṽi∗ + ((1 − δ)/δ)(g − ℓ)} − (1 − δ)g.

The above value has a lower bound:

δρr(Dj | Ci)βj{Ṽi∗ + ((1 − δ)/δ)(g − ℓ)} − (1 − δ)g
  ≥ δρr(Dj | Ci)βj{Vi∗ + ((1 − δ)/δ)(g − ℓ)} − (1 − δ)g
  = (1 − δ){(ρr(Dj | Ci)/ρr(Dj | Di))(g − λ) − g} > 0.    (3)

Eq. (2) ensures the equality. Taking ρr(Dj | Ci)/ρr(Dj | Di) > 1 and λ ∈ (0, λ̄) into account, the last inequality holds.

3.5. The incentive for observation choice on the path

Next, we consider the observational decision. If player i chooses Ci in the cooperation state, then player i is certain that the state of the opponent in the next period is the cooperation state. Therefore, player i never prefers (Ci, 1) in the cooperation state. Likewise, player i never prefers mi = 1 in the defection state because the state of the opponent in the next period is the defection state for certain. Therefore, all we have to prove is that player i prefers (Di, 1) to (Di, 0) in the cooperation state.

If player i chooses (Di, 0), then the player cannot know the state of the opponent in the next period and might therefore choose a suboptimal stage behavior in that period. This is the opportunity cost of choosing (Di, 0). We show that this opportunity cost is strictly greater than the observation cost λ.

Suppose that player i receives signal r^t_i and chooses (Di, 0) in the cooperation state in period t. Let s(r^t_i) ≡ ρr(r^t_j = Dj | r^t_i) be the belief of player i that the signal of the opponent in period t is Dj given signal r^t_i. Then, in the next period, the player believes that the state of the opponent is the cooperation state with probability 1 − s(r^t_i)βj (> 0) and the defection state with probability s(r^t_i)βj.

Let us consider the opportunity cost of choosing (Di, 0), which is the payoff loss from period t + 1 caused by choosing (Di, 0), and let us confine our attention to the loss in period t + 1 when player i receives r^{t+1}_i = Ci. If player i chooses action Ci, then player i must pay an opportunity cost of at least (1 − δ)ρr(Ci)s(r^t_i)βj ℓ, given that the opponent is in the defection state ωjD with probability s(r^t_i)βj. Now consider the opportunity cost when player i chooses (Di, 0) or (Di, 1) upon receiving r^{t+1}_i = Ci. We already know that the loss from choosing Di when ri = Ci against an opponent in the cooperation state is bounded below by (3). Let us denote (ρr(Dj | Ci)/ρr(Dj | Di))(g − λ) − g by Li. Player i must then pay a positive opportunity cost of at least (1 − δ)ρr(Ci)(1 − s(r^t_i)βj)Li when choosing Di in period t + 1. The opportunity cost of choosing (Di, 0) in period t is therefore bounded below by

δ(1 − δ)ρr(Ci) min{s(r^t_i)βj ℓ, (1 − s(r^t_i)βj)Li}.

The value (1 − s(r^t_i)βj)Li is greater than s(r^t_i)βj ℓ, as follows:

(1 − s(r^t_i)βj)Li − s(r^t_i)βj ℓ = Li − s(r^t_i)βj(Li + ℓ)
  ≥ Li − (4/3)((1 − δ)/δ)((g − λ)/ρr(Dj | Di))(Li + ℓ) > 0.

The first inequality holds because s(r^t_i)βj ≤ βj and βj is bounded above as in Section 3.3; the last inequality follows from δ ∈ (δ0, (δ0 + 1)/2) and λ ∈ (0, λ̄). Because the belief s(r^t_i) attains its minimum ρr(Dj | Di) when player i receives r^t_i = Di, the opportunity cost of choosing (Di, 0) is bounded below by

δ(1 − δ)ρr(Ci)ρr(Dj | Di)βj ℓ.

Thus, a sufficient condition for (Di, 0) to be suboptimal in the cooperation state is

λ ≤ δρr(Ci)ρr(Dj | Di)βj ℓ.

The above condition is ensured by λ ∈ (0, λ̄). The above inequality shows the relationship between the discount factor δ and the observation cost λ, because βi is a decreasing function of δ. When the discount factor δ is high, the incentive for player i to choose action Ci in the cooperation state is also high. To decrease this incentive and make player i indifferent between actions Ci and Di when player i receives ri = Di, βi must be small. However, making βi small decreases the incentive to observe the opponent. Therefore, the observation cost must be small when the discount factor is high.
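Continuing the numeric sketch from Section 3.3 (same illustrative parameters of mine, with β set near its computed fixed point), one can check both Li > 0 and the sufficient condition λ ≤ δρr(Ci)ρr(Dj | Di)βjℓ.

```python
def observation_incentive(delta, g, l, lam, p, q, beta):
    rho_C = p + (1 - p - q) / 2                  # rho_r(C_i)
    rho_D = q + (1 - p - q) / 2                  # rho_r(D_i)
    rho_D_given_C = ((1 - p - q) / 2) / rho_C    # rho_r(D_j | C_i)
    rho_D_given_D = q / rho_D                    # rho_r(D_j | D_i)
    # L_i = (rho_r(D_j|C_i)/rho_r(D_j|D_i)) * (g - lam) - g
    L = (rho_D_given_C / rho_D_given_D) * (g - lam) - g
    # Sufficient condition: lam <= delta * rho_r(C_i) * rho_r(D_j|D_i) * beta * l
    cost_bound = delta * rho_C * rho_D_given_D * beta * l
    return L > 0, lam <= cost_bound

if __name__ == "__main__":
    delta, g, l, lam, p, q = 0.9995, 0.5, 0.6, 1e-5, 0.1, 0.1
    beta = 1.25e-3   # roughly the fixed point computed in the previous sketch
    print(observation_incentive(delta, g, l, lam, p, q, beta))  # (True, True)
```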


3.6. The sequential rationality off the path

In the error state, strategy σ∗ prescribes the optimal continuation strategy given ψ∗ and given that the opponent follows σ̂j. The difference between σj∗ and σ̂j arises only when h^t_j ∈ Ĥj(ωjE). By the construction of the belief, player i at any history is sure that h^t_j ∉ Ĥj(ωjE). Hence, the best response given that the opponent follows σj∗ coincides with the best response given that the opponent follows σ̂j. Therefore, it is optimal for player i to follow σ∗. Proposition 1 is proved.

The next proposition extends Proposition 1 and provides the efficiency result for a sufficiently large discount factor.

Proposition 2. Suppose that Assumptions 1 and 2 are satisfied. For any ε > 0, there exist δ̲ ∈ (0, 1) and λ̄ > 0 such that for any discount factor δ ∈ (δ̲, 1) and for any λ ∈ (0, λ̄], there exists a sequential equilibrium whose payoff vector (V1∗, V2∗) satisfies Vi∗ ≥ 1 − ε for each i = 1, 2.

Proof of Proposition 2. We fix the same δ0 and the same λ̄ as in the proof of Proposition 1, and we define δ̲ ≡ 2δ0/(δ0 + 1). Fix any discount factor δ ∈ (δ̲, 1). Then, we can choose some integer n∗ so that δ^{n∗} lies in (δ0, (δ0 + 1)/2). We divide the repeated game into n∗ distinct repeated games: the first repeated game is played in periods 1, n∗ + 1, 2n∗ + 1, . . . , the second repeated game is played in periods 2, n∗ + 2, 2n∗ + 2, . . . , and so on. As each of these games can be regarded as a repeated game with discount factor δ^{n∗}, strategy σ∗ is a sequential equilibrium in each game, and hence playing σ∗ in every one of the n∗ games is a sequential equilibrium of the original repeated game. Given that the equilibrium payoff of each game is greater than 1 − ε, the equilibrium payoff of this strategy is greater than 1 − ε. □

4. Concluding remarks

1. In our equilibrium, the observation cost decreases efficiency, and our results require a sufficiently small observation cost to limit this efficiency loss. Miyagawa et al. (2008), who likewise use a variant of the grim trigger strategy, show that this efficiency loss can be avoided by a public randomization device (sunspot). They suppose that each player chooses an observational decision after choosing an action and that the sunspot is realized immediately before the choice of the observational decision; that is, players can condition their observational decision on the realized sunspot. Players then have an incentive to choose the cooperative action even when, for some realized sunspot, the opponent might not observe the player at all. We may be able to apply this result to our model and thereby reduce the efficiency loss.

2. Our strategy is a sequential equilibrium under the monitoring structure in Flesch and Perea (2009), although they do not include the game where players can choose only two kinds of actions. In our strategy, each player chooses a continuation strategy depending only on the previous action–observation pair on the equilibrium path. Therefore, even if players can acquire additional information about past action–observation pairs, players do not have an incentive to do so on the path.

References

Abreu, Dilip, Pearce, David, Stacchetti, Ennio, 1990. Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58 (5), 1041–1063. http://dx.doi.org/10.2307/2938299.
Aoyagi, Masaki, 2005. Collusion through mediated communication in repeated games with imperfect private monitoring. http://dx.doi.org/10.2307/25055890.
Ben-Porath, Elchanan, Kahneman, Michael, 2003. Communication in repeated games with costly monitoring. Games Econom. Behav. 44 (2), 227–250. http://dx.doi.org/10.1016/S0899-8256(03)00022-8.
Compte, Olivier, 1998. Communication in repeated games with imperfect private monitoring. Econometrica 66 (3), 587–626. http://dx.doi.org/10.2307/2998576.
Flesch, János, Perea, Andrés, 2009. Repeated games with voluntary information purchase. Games Econom. Behav. 66 (1), 126–145. http://dx.doi.org/10.1016/j.geb.2008.04.015.
Fudenberg, Drew, Levine, David K., 1994. Efficiency and observability with long-run and short-run players. J. Econom. Theory 62 (1), 103–135. http://dx.doi.org/10.1006/jeth.1994.1006.
Fudenberg, Drew, Levine, David K., 2007. The Nash-threats folk theorem with communication and approximate common knowledge in two player games. J. Econom. Theory 132 (1), 461–473. http://dx.doi.org/10.1016/j.jet.2005.08.006.
Fudenberg, Drew, Levine, David K., Maskin, Eric, 1994. The folk theorem with imperfect public information. Econometrica 62 (5), 997–1039. http://dx.doi.org/10.2307/2951505.
Kandori, Michihiro, Matsushima, Hitoshi, 1998. Private observation, communication and collusion. Econometrica 66 (3), 627–652. http://dx.doi.org/10.2307/2998577.
Kandori, Michihiro, Obara, Ichiro, 2004. Endogenous monitoring. UCLA Economics Online Papers, 398. URL: http://www.econ.ucla.edu/people/papers/Obara/Obara398.pdf.
Lehrer, Ehud, Solan, Eilon, 2018. High frequency repeated games with costly monitoring. Theor. Econ. 13 (1), 87–113. http://dx.doi.org/10.3982/TE2627.
Miyagawa, Eiichi, Miyahara, Yasuyuki, Sekiguchi, Tadashi, 2003. Repeated Games with Observation Costs. Columbia University Academic Commons, pp. 203–214. http://dx.doi.org/10.7916/D8VX0TRW.
Miyagawa, Eiichi, Miyahara, Yasuyuki, Sekiguchi, Tadashi, 2008. The folk theorem for repeated games with observation costs. J. Econom. Theory 139 (1), 192–221. http://dx.doi.org/10.1016/j.jet.2007.04.001.
Obara, Ichiro, 2009. Folk theorem with communication. J. Econom. Theory 144 (1), 120–134. http://dx.doi.org/10.1016/j.jet.2007.08.005.
Rahman, David, 2012. But who will monitor the monitor? Amer. Econ. Rev. 102 (6), 2767–2797. http://dx.doi.org/10.1257/aer.102.6.2767.
Rahman, David, Obara, Ichiro, 2010. Mediated partnerships. Econometrica 78 (1), 285–308. http://dx.doi.org/10.3982/ECTA6131.
Sekiguchi, Tadashi, 1997. Efficiency in repeated prisoner's dilemma with private monitoring. J. Econom. Theory 76 (2), 345–361. http://dx.doi.org/10.1006/jeth.1997.2313.
Sugaya, Takuo, 2011. Folk theorem in repeated games with private monitoring. Economic Theory Center Working Paper, 011. https://doi.org/10.2139/ssrn.1789775.
Sugaya, Takuo, Wolitzky, Alexander, 2016. Bounding equilibrium payoffs in repeated games with private monitoring. Theor. Econ. 12 (2), 691–729. http://dx.doi.org/10.3982/TE2270.