JOURNAL OF ECONOMIC THEORY 45, 32-52 (1988)
Demons and Repentance*
PAUL S. SEGERSTROM
Department of Economics, Michigan State University, East Lansing, Michigan 48824
Received October 9, 1986; revised May 15, 1987
This paper presents a new explanation for the stability of cartels. For a large class of repeated Cournot duopoly games with discounting, strategies are constructed which have the property that cheating on the cartel is followed by repentance. It is shown that these repentance strategies are subgame perfect equilibrium strategies and that in the presence of demons (infrequently irrational behavior on the part of both players), they lead to Pareto superior expected discounted payoffs in comparison with either Friedman's trigger strategies or Abreu's "stick and carrot" strategies. Journal of Economic Literature Classification Numbers: 022, 026. © 1988 Academic Press, Inc.
Accidents never happen in a perfect world.... I never lie. I never cry.
- Deborah Harry of Blondie
1. INTRODUCTION
One of the major results of the supergame literature is that players can achieve cooperative outcomes in games that are structurally noncooperative (see Friedman [4] and Abreu [2]). In principle, firms in an industry can use Friedman's or Abreu's strategies to police, for example, joint profit maximizing behavior. With Friedman's trigger strategies, cheating on the cartel agreement is deterred by the threat of reverting to a single-period Nash noncooperative equilibrium for the rest of time, once cheating has been detected. With Abreu's "stick and carrot" strategies, cheating is deterred by an even harsher threat. This threat involves all the players (not just the cheater) punishing themselves severely for one period and then rewarding each other for going through with the punishment by reverting back to collusive behavior. Even though these strategies are subgame perfect Nash noncooperative equilibrium strategies for discount parameters sufficiently close to one, it is only natural to have reservations about the credibility of such "grim" threats. After all, if a player cheats, then by following either of these strategies (in symmetric models), all the players suffer equally while the cheater is being punished. If a potential cheater has any doubts about whether or not the other players will actually carry out their punishment threat, that is going to affect each player's incentive to cheat and weaken the cartel agreement.
This paper explores the extent to which cheaters can be credibly punished without the punishers themselves suffering in the process. The "repentance" strategies that are introduced in this paper represent, I believe, a more attractive and plausible explanation for the stability of cartels. To facilitate comparison with Abreu [2], I analyze the same symmetric repeated Cournot [3] duopoly games with discounting.¹ The players are defined to behave collusively if they each produce half the output of an ordinary monopolist. Each player's discount parameter is assumed to be sufficiently close to one that both Abreu's and Friedman's strategies (for enforcing this collusive behavior) are subgame perfect Nash noncooperative equilibrium strategies. In these circumstances, I show that repentance strategies can be constructed that have the following properties: (1) these strategies are also subgame perfect Nash noncooperative equilibrium strategies; (2) both players abide by the collusive output levels until cheating is detected; (3) once cheating has been detected, the cheater voluntarily "repents" by setting relatively low output levels in the next $\bar{i} \ge 1$ periods; (4) the entire punishment for cheating is carried out in these $\bar{i}$ periods, after which the players revert back to collusive behavior; and (5) during each of these $\bar{i}$ periods, the punisher receives higher single-period payoffs than the player who cheated.
* I thank Robert Axelrod, Carl Davidson, Joseph Harrington, Robert Porter, and especially James Friedman for helpful comments concerning this research in various stages of its development. The suggestions of an anonymous referee are also greatly appreciated. Of course, any errors that remain are my own responsibility.
0022-0531/88 $3.00. Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.
To better understand these repentance strategies, suppose that the players are about to make their period $t$ output choices and that in period $t-1$, one of the players cheated on the cartel agreement. The cheater is expected to repent in period $t$ by setting a relatively low output level. The punisher sets a relatively high (single-period best reply) output level in period $t$, so as to maximize his single-period payoff if repentance occurs and also to reduce the cheater's single-period incentive to not repent. If the cheater does not repent in period $t$, then before the period $t+1$ output decisions are made, both players know that the cheater has failed to repent. The punisher chooses the same relatively high output level in period $t+1$ and the cheater is given a second chance to repent. This process goes on until the cheater has repented in a predetermined manner (this might involve $\bar{i} > 1$ periods of repentance, depending on the structure of the single-period game being repeated and on the harshness of the punishment inflicted on the cheater). Then both players revert back to their collusive output levels.
The repentance strategies are constructed so that it is in neither player's self-interest to cheat on the cartel agreement, but if cheating does occur (one of the players behaves irrationally in some period $t$), it is in the self-interest of the cheater to repent in the predetermined manner immediately (in periods $t+1$ through $t+\bar{i}$). The repentance behavior is constructed so that, subsequent to cheating on the cartel agreement, the cheater's discounted payoff is greater than it would have been if either Friedman's or Abreu's strategies were used. But more importantly, the punisher benefits from the use of these strategies because not only are collusive payoffs restored after $\bar{i}$ periods, but during these $\bar{i}$ periods, the punisher gets higher than single-period Cournot-Nash equilibrium payoffs. In fact, if $\bar{i} > 1$, then during the first $i = \bar{i} - 1$ periods of the punishment the cheater sets an output level of zero and the punisher receives monopoly payoffs.
By adopting these repentance strategies, the players are forgiving, but repentant behavior must precede forgiveness. In contrast, with Friedman's trigger strategies, there is no forgiveness for cheating on the cartel agreement. The players abandon any hope of a return to collusive behavior and revert to noncooperative Cournot-Nash behavior for the rest of time. With Abreu's "stick and carrot" strategies, the cheater is presumed to resist punishment and, like a Victorian schoolmaster dealing with a recalcitrant student, the punisher must fight to impose punishment on the cheater. The reward of future collusive payoffs is enough to induce the punisher to endure the present pain associated with fighting the cheater.
¹ Actually Abreu analyzed the n player case, but the extension of the analysis in this paper from 2 to n players is straightforward.
The important insight of this paper is that the reward of future collusive payoffs (the "carrot") can be used to induce the cheater to meekly accept punishment, reducing the payoff losses to the punisher. This is the distinguishing feature of repentance strategies. It is important to remember, however, that in equilibrium, cheating never occurs when the players are adopting any of these strategies. The punishment threats are never carried out and the players' repeated game payoffs are the same regardless of which strategies they are using. Thus the question naturally arises: as long as the punishment is sufficiently harsh to deter the crime, what difference does it make how cheaters are punished? Alternatively stated, why would players find it more desirable to use repentance strategies to enforce collusive behavior than either Friedman's or Abreu's strategies?
To answer these questions, it is helpful to consider a simple game in extensive form due to Rosenthal [8]. See Fig. 1. The outcomes are assumed to be expressed in U.S. dollars and $x$ and $y$ are dollar values known to both players. If player 1 chooses Left, he receives $x$ dollars and player 2 receives $y$ dollars; if player 1 chooses Right, then the outcome is either (0, 0) or (1 million, 1) depending on player 2's choice.
FIGURE 1. [Game tree: player 1 moves first; Left ends the game with outcome (x, y); Right gives the move to player 2, whose choice leads to either (0, 0) or (1 million, 1).]
If the game is played only once and $x$ is less than 1 million dollars, then the only subgame perfect Nash noncooperative equilibrium is for player 1 to choose Right and for player 2 to choose Right. But does this represent reasonable behavior of reasonable people? Suppose $x$ = $990,000. Would not player 1 choose Left and guarantee himself $990,000 rather than choose Right and risk the possibility that player 2 might behave irrationally and choose Left, in which case player 1 gets nothing at all? My guess is that most people, if put in the position of being player 1 in this game, would choose Left. As Rosenthal puts it, "when the stakes are sufficiently high, people will usually not act as though the hypothesis that other people are utility maximizers is absolutely dependable."
When it comes to firms policing a cartel agreement, the stakes are high. Rosenthal argued that more reasonable behavior on the part of each player (or firm) would be to assign a positive subjective probability to irrational behavior on the part of the other player in each time period and to optimize given this subjective probability. There is an important bounded rationality lesson to be learned from this example. When it comes to choosing among the various strategies available for policing a cartel agreement, reasonable players expect cheating on the cartel agreement to occur once in a while, even if the strategies adopted are such that neither player has any (dynamic) incentive to cheat (the strategies are subgame perfect Nash noncooperative equilibrium strategies).
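The Rosenthal example can be made concrete with a small calculation. The sketch below is an illustration only: it assumes player 1 is risk neutral and assigns a subjective probability `eps` to player 2 irrationally choosing Left, neither of which the text commits to. The function names are hypothetical.

```python
def expected_payoff_right(eps: float, high: float = 1_000_000.0) -> float:
    """Player 1's expected payoff from Right if player 2 errs (plays Left) with probability eps.

    Assumption: risk neutrality; the outcome after an error pays player 1 zero.
    """
    return (1.0 - eps) * high


def break_even_eps(x: float, high: float = 1_000_000.0) -> float:
    """Error probability at which a risk-neutral player 1 is indifferent between Left and Right."""
    return 1.0 - x / high


def player1_prefers_left(x: float, eps: float) -> bool:
    """True if the sure payoff x from Left beats the expected payoff from Right."""
    return x > expected_payoff_right(eps)


if __name__ == "__main__":
    # With x = $990,000, Left is preferred once eps exceeds roughly one percent.
    print(break_even_eps(990_000.0))
    print(player1_prefers_left(990_000.0, eps=0.02))
```

Under these assumptions, even a small subjective probability of irrational play is enough to overturn the subgame perfect prediction when $x$ is close to 1 million; a risk-averse player 1 would switch to Left at an even smaller probability.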
Thus for reasonable players, in assessing the desirability of various strategies, the harshness of the punishment for cheating is going to be important, because with some positive probability, this threat of punishment will have to be carried out. In this paper, I use the concept of an $\varepsilon$-demon to explore the implications of this bounded rationality lesson. A player is said to be possessed by an $\varepsilon$-demon relative to a strategy profile for a repeated Cournot duopoly game if, in each time period, with positive probability $\varepsilon$, this player deviates from the output choice given by his strategy and chooses instead a single-period best reply to the output choice given by the other player's strategy. I am only interested in infrequently irrational behavior, so I only consider $\varepsilon$ in an arbitrarily small neighborhood of zero. Relative to this neighborhood, I show that when both players are possessed by $\varepsilon$-demons, repentance strategies are demon-superior to either Friedman's trigger strategies or Abreu's "stick and carrot" strategies; that is, repentance strategies give rise to Pareto superior expected discounted payoffs relative to these two alternatives. Thus in a world of infrequently irrational behavior, reasonable players would prefer to police a cartel agreement using repentance strategies rather than the strategies of either Friedman or Abreu.
I now proceed to the formal analysis. Section 2 provides notation, definitions, assumptions, and some useful preliminary results. In Section 3, I show that appropriately constructed repentance strategies are subgame perfect equilibrium strategies. In Section 4, this result is illustrated using a repeated Cournot duopoly model with linear demand. Section 5 contains the demons analysis and finally, in Section 6, the robustness of the conclusions of Section 3 is examined when each player's discount parameter is relatively low.
2. NOTATION, DEFINITIONS, ASSUMPTIONS AND SOME PRELIMINARY RESULTS
Let $G = (S_1, S_2, \pi_1, \pi_2)$ denote a single-period, simultaneous noncooperative game with two players.² $S_i$ is a pure strategy set for player $i$ and has typical element $q_i$. $S \equiv S_1 \times S_2$. $\pi_i: S \to R$ is the $i$th player's payoff function. $R$ denotes the real line. In this paper, to facilitate comparison with Abreu [2], and to allow for as simple an expression of the main ideas as possible, I will assume that $G$ is the Cournot duopoly model, where the players are firms; $q_i$ is the output level of firm $i$; and $\pi_i(q_1, q_2)$ is the single-period profit function for firm $i$. The detailed assumptions about the game $G$ are:
A1. For $i = 1, 2$, $S_i = R_+$.
A2. For $i = 1, 2$, $\pi_i(q_1, q_2) = (p(q_1 + q_2) - c)\,q_i$, where the industry inverse demand function $p(q)$ is continuous, differentiable for all $q > 0$ such that $p(q) > 0$, with $p'(q) < 0$, $\lim_{q \to \infty} p(q) = 0$, and $p(0) > c > 0$.
For $i, j = 1, 2$, $i \ne j$, let $q_i^*(q_j)$ be a single-period best response to $q_j \ge 0$, i.e., $q_i^*(q_j)$ satisfies $\pi_i(q_i^*(q_j), q_j) \ge \pi_i(q_i, q_j)$ for all $q_i \ge 0$.
² See Friedman [5] for a good introduction to the theory of games and economic applications.
A3. For $i, j = 1, 2$, $i \ne j$, $q_i^*(q_j)$ is a well-defined, unique, continuous, nonincreasing function.
$2q^m$ is the monopoly output level, i.e., $q^m$ satisfies $\pi_1(2q^m, 0) \ge \pi_1(q, 0)$ for all $q \ge 0$. Firms are defined to behave collusively if they choose $(q^m, q^m) \in S$.
A4. $q^m$ is unique, strictly positive, $\pi_i(q, q)$ declines monotonically as output $q = q_1 = q_2$ increases beyond $q^m$ or falls below $q^m$, and $q_i^*(q^m) > q^m$, $i = 1, 2$.
Assumption A1 states that firms can choose any non-negative output levels. Assumption A2 states that the Cournot duopolists have the same constant marginal cost $c$ and face a downward-sloping industry demand curve which cuts the price axis and either cuts or is asymptotic to the quantity axis. Assumption A3 states that as the output level of firm $j$ increases, firm $i$'s single-period best response output level must not also increase. Assumption A4 guarantees that symmetric joint profit maximizing behavior is uniquely defined. Furthermore, each firm always has a single-period incentive to cheat on this collusive behavior by expanding its output level. Assumptions A1-A4 in this paper correspond to assumptions A1-A5, A7, and A8 in Abreu [2]. The following lemma is slightly stronger than assumption A6 in Abreu [2].
LEMMA 1. Given A1-A4, $G$ has exactly one Cournot-Nash equilibrium $(q_1^{CN}, q_2^{CN})$ and it is symmetric ($q^{CN} \equiv q_1^{CN} = q_2^{CN}$).
Proof. Let $\bar{q} > 0$ satisfy $p(\bar{q}) = c$. Since $q_1^*(0) = 2q^m > 0$ and $q_1^*(\bar{q}) = 0 < \bar{q}$, the continuity of $q_1^*(q_2)$ assures that $q_1^*(\cdot)$ has a fixed point in the interval $[0, \bar{q}]$. Because $q_1^*(\cdot)$ is nonincreasing, such a fixed point is unique. Thus we have a unique symmetric equilibrium since $q_1^*(\cdot) = q_2^*(\cdot)$. Suppose $0 < q_1^{CN} < q_2^{CN}$ for some Cournot-Nash equilibrium. Using the differentiability of $p(q)$, $q_1^{CN} = q_1^*(q_2^{CN})$ and $q_2^{CN} = q_2^*(q_1^{CN})$ imply
$$p'(q_1^{CN} + q_2^{CN})\,q_1^{CN} + p(q_1^{CN} + q_2^{CN}) - c = 0$$
and
$$p'(q_1^{CN} + q_2^{CN})\,q_2^{CN} + p(q_1^{CN} + q_2^{CN}) - c = 0.$$
Since $p'(q) < 0$, we get a contradiction. All equilibria must be symmetric. Q.E.D.
The Cournot duopolists play an infinitely repeated game with discounting. Let $G^\infty(\alpha)$ be the repeated game obtained by repeating $G$ infinitely often. $\alpha$ is the discount parameter³ for each player; it is assumed that $\alpha \in (0, 1)$. $\sigma_i$ denotes a pure strategy for player $i$. It is a sequence of functions $\sigma_i(1), \sigma_i(2), \sigma_i(3), \ldots, \sigma_i(t), \ldots$, one for each time period $t$; the function for time period $t$ determines player $i$'s output choice at $t$ as a function of the output choices of all players in all previous periods.⁴ Formally, $\sigma_i(1) \in S_i$ and for $t = 2, 3, \ldots$, $\sigma_i(t): S^{t-1} \to S_i$, where $H(t) = (q(1), q(2), \ldots, q(t-1)) \in S^{t-1}$ denotes the history of the repeated game at time $t \ge 2$ and $q(t) \equiv (q_1(t), q_2(t))$. $\Sigma_i$ denotes player $i$'s strategy set for $G^\infty(\alpha)$. $\Sigma = \Sigma_1 \times \Sigma_2$ is the set of strategy profiles, and a generic element of $\Sigma$ is $\sigma = (\sigma_1, \sigma_2)$. A stream of output choices $\{q(t)\}_{t=1}^{\infty}$ is referred to as an output path or punishment and is denoted by $Q$. $\Omega \equiv S^\infty$ is the set of output paths. Any strategy profile $\sigma \in \Sigma$ generates an output path denoted $Q(\sigma) = \{q(\sigma)(t)\}_{t=1,2,3,\ldots}$, which is defined inductively as
$$q(\sigma)(t) = \sigma(t)\big(q(\sigma)(1), \ldots, q(\sigma)(t-1)\big).$$
$v_i: \Omega \to R$ defines the $i$th player's payoff from an output path. For $Q = \{q(t)\}_{t=1}^{\infty} \in \Omega$, $v_i(Q) = \sum_{t=1}^{\infty} \alpha^{t-1} \pi_i(q(t))$. $\tilde{\pi}_i: \Sigma \to R$ is the $i$th player's payoff function, with $\tilde{\pi}_i(\sigma) = v_i(Q(\sigma))$. $\sigma \in \Sigma$ is a Nash [7] noncooperative equilibrium if, for all $i, j = 1, 2$, $i \ne j$, $\tilde{\pi}_i(\sigma_i, \sigma_j) \ge \tilde{\pi}_i(\sigma_i', \sigma_j)$ for all $\sigma_i' \in \Sigma_i$.
The strategy profiles that are examined in this paper all share a common feature; they are all "simple" strategy profiles. Let $Q^i = \{q^i(t)\}_{t=1}^{\infty} \in \Omega$, $i = 0, 1, 2$. A simple strategy profile $\sigma = \sigma(Q^0, Q^1, Q^2)$ is defined as follows: Let $\sigma(1) = q^0(1)$, $\rho(1) = 0$, and $\mu(1) = 1$. For $t = 2, 3, \ldots$, the functions $\sigma(t): S^{t-1} \to S$, $\rho(t): S^{t-1} \to \{0, 1, 2\}$, $\mu(t): S^{t-1} \to Z$, and the correspondence $\gamma(t): S^{t-1} \to N$ are defined inductively by
$$\rho(t)(H(t)) = \begin{cases} j & \text{if } \gamma(t)(H(t)) = \{j\} \text{ is a singleton} \\ 0 & \text{if } \gamma(t)(H(t)) = \{1, 2\} \\ \rho(t-1)(H(t-1)) & \text{otherwise,} \end{cases}$$
$$\mu(t)(H(t)) = \begin{cases} 1 & \text{if } \gamma(t)(H(t)) \ne \emptyset \\ \mu(t-1)(H(t-1)) + 1 & \text{otherwise,} \end{cases}$$
and
$$\sigma(t)(H(t)) = q^{\rho(t)(H(t))}\big(\mu(t)(H(t))\big), \qquad \gamma(t)(H(t)) = \{\, j \in \{1, 2\} \mid q_j(t-1) \ne \sigma_j(t-1)(H(t-1)) \,\}.$$
³ The discount parameter is inversely related to the interest rate.
⁴ The assumption that the repeated game is a game of perfect recall will turn out to be critical to the construction of repentance strategies. If a player cheats on the cartel agreement at time $t$, then at time $t+1$, both players know that cheating occurred at time $t$ and exactly who cheated.
Stated less formally, the simple strategy profile $\sigma = \sigma(Q^0, Q^1, Q^2)$ specifies that (i) the players play $Q^0$ until deviation from $Q^0$ occurs; (ii) if the $j$th player deviated singly from $Q^i$, where $Q^i$ ($i = 0, 1, 2$) is an ongoing previously specified output path, then the players play $Q^j$ until deviation from $Q^j$ occurs; and (iii) if both players deviate from $Q^i$, where $Q^i$ ($i = 0, 1, 2$) is an ongoing previously specified output path, then the players play $Q^0$ until deviation from $Q^0$ occurs. Thus the players' strategies are completely specified by specifying three output paths.
Using Lemma 1, Friedman's trigger strategies are defined by the simple strategy profile $\sigma^F = \sigma(Q^0, Q^1, Q^2)$, where
$$Q^0 = \{(q^m, q^m)\}_{t=1}^{\infty} \tag{1}$$
and
$$Q^1 = Q^2 = \{(q^{CN}, q^{CN})\}_{t=1}^{\infty}. \tag{2}$$
Let $\pi^m = \pi_1(q^m, q^m)$, $\pi^{ch} = \sup_{q \ge 0} \pi_1(q, q^m)$, and $\pi^{CN} = \pi_1(q^{CN}, q^{CN})$. Of course $\pi^{ch} > \pi^m$ by A4 and $\pi^m > \pi^{CN}$ by Lemma 1 and A4. Then Friedman's trigger strategies are subgame perfect equilibrium strategies if and only if
$$\frac{\pi^m}{1-\alpha} \ge \pi^{ch} + \frac{\alpha \pi^{CN}}{1-\alpha}. \tag{3}$$
This reduces to a condition on the discount parameter $\alpha$:
LEMMA 2 (Friedman [4]). For any game $G^\infty(\alpha)$ satisfying A1-A4, Friedman's trigger strategies are subgame perfect equilibrium strategies if and only if
$$\alpha \ge \frac{\pi^{ch} - \pi^m}{\pi^{ch} - \pi^{CN}}. \tag{4}$$
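The informal description of a simple strategy profile, with Friedman's trigger strategies as a special case, can be sketched as a small state machine. This is an illustration under stated assumptions rather than the paper's formal definition: the state `(rho, mu)` records which path is being followed and which period of it comes next, the function name `prescriptions` is hypothetical, and the output levels 0.25 and 1/3 are illustrative values for $q^m$ and $q^{CN}$.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Output = Tuple[float, float]      # (q_1, q_2)
Path = Callable[[int], Output]    # period k >= 1 of a path -> prescribed outputs


def prescriptions(paths: Dict[int, Path], history: Sequence[Output]) -> List[Output]:
    """Outputs prescribed in periods 1, ..., len(history) + 1 by sigma(Q^0, Q^1, Q^2)."""
    rho, mu = 0, 1                        # start on Q^0 at its first period
    prescribed = [paths[rho](mu)]
    for observed in history:
        target = paths[rho](mu)
        deviators = [j for j in (1, 2) if observed[j - 1] != target[j - 1]]
        if len(deviators) == 1:           # (ii) single deviation by j: start Q^j
            rho, mu = deviators[0], 1
        elif len(deviators) == 2:         # (iii) joint deviation: restart Q^0
            rho, mu = 0, 1
        else:                             # no deviation: continue the ongoing path
            mu += 1
        prescribed.append(paths[rho](mu))
    return prescribed


# Illustrative values only (assumed, not taken from the text at this point).
Q_M, Q_CN = 0.25, 1.0 / 3.0
friedman_paths: Dict[int, Path] = {
    0: lambda k: (Q_M, Q_M),      # Eq. (1): collude forever
    1: lambda k: (Q_CN, Q_CN),    # Eq. (2): Cournot-Nash forever
    2: lambda k: (Q_CN, Q_CN),
}

if __name__ == "__main__":
    # Player 1 cheats in period 2; Cournot-Nash play is prescribed from period 3 on.
    print(prescriptions(friedman_paths, [(Q_M, Q_M), (0.375, Q_M), (Q_CN, Q_CN)]))
```

Swapping in other paths for `paths[1]` and `paths[2]` gives other simple strategy profiles without changing the state machine.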
Abreu [2] showed that, for repeated games $G^\infty(\alpha)$ satisfying A1-A4, subgame perfect equilibrium strategies involving symmetric punishment for cheating can always be constructed which involve punishments harsher than Friedman's Cournot-Nash punishment. Furthermore, Abreu showed that the harshest symmetric punishment can be achieved with the following simple structure: both players follow a symmetric output path characterized by a "stick" in the first period (a high output level $\hat{q}$ for each player) followed by a "carrot" from the second period on (the most collusive behavior that can be supported by subgame perfect equilibrium strategies involving symmetric punishments for cheating). Since I am interested in strategies which support the collusive output path $\{(q^m, q^m)\}_{t=1}^{\infty}$ as a subgame perfect equilibrium output path, as a sufficient condition (see Lemma 2), I will assume that the repeated game $G^\infty(\alpha)$ satisfies
A5. $\alpha > \alpha^F$, where $\alpha^F = (\pi^{ch} - \pi^m)/(\pi^{ch} - \pi^{CN})$.
This assumption about the extent to which players value future payoffs is made so that the repentance strategies constructed in the next section can be compared with both Friedman's and Abreu's strategies. However, since Abreu's "stick and carrot" strategies can potentially be used to support $\{(q^m, q^m)\}_{t=1}^{\infty}$ as a subgame perfect equilibrium outcome path for some discount parameters below $\alpha^F$, assumption A5 will be relaxed in the last section. It remains to formally define Abreu's "stick and carrot" strategies. For any game $G^\infty(\alpha)$ satisfying A1-A5, Abreu [2] showed that there exists a largest "stick" $\hat{q} > 0$ such that both
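$$\pi^{ch} + \alpha\,\pi_1(\hat{q}, \hat{q}) + \frac{\alpha^2 \pi^m}{1-\alpha} \le \frac{\pi^m}{1-\alpha} \tag{5}$$
and
$$\pi_1(q_1^*(\hat{q}), \hat{q}) + \alpha\,\pi_1(\hat{q}, \hat{q}) + \frac{\alpha^2 \pi^m}{1-\alpha} \le \pi_1(\hat{q}, \hat{q}) + \frac{\alpha \pi^m}{1-\alpha} \tag{6}$$
hold. Given $\hat{q}$, Abreu's "stick and carrot" strategies are defined by the simple strategy profile $\sigma^A = \sigma(Q^0, Q^1, Q^2)$, where $Q^0$ is given by Eq. (1) and
$$Q^1 = Q^2 = \{(\hat{q}, \hat{q})_{t=1}, (q^m, q^m)_{t=2,3,4,\ldots}\}. \tag{7}$$
As defined, these strategies are subgame perfect and have the property that
$$\pi_1(\hat{q}, \hat{q}) + \frac{\alpha \pi^m}{1-\alpha} < \frac{\pi^{CN}}{1-\alpha}. \tag{8}$$
That is, Abreu's punishment for cheating is harsher than Friedman's punishment for cheating.
3. REPENTANCE STRATEGIES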
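Let $B = \{\, b > 0 \mid \text{Eqs. (9) and (10) both hold} \,\}$, where these equations are
$$\frac{\alpha(\pi^m - \pi^{CN})}{1-\alpha} > b \tag{9}$$
and
$$b > \frac{\pi^{ch} - \pi^m}{\alpha} + \pi^{CN} - \pi^m. \tag{10}$$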
In the Appendix (Lemma A) it is shown that $B \ne \emptyset$ for any repeated game $G^\infty(\alpha)$ satisfying A1-A5. For any repeated game $G^\infty(\alpha)$ satisfying A1-A5 and any $b \in B$, I will construct repentance strategies which support the collusive output path as a subgame perfect equilibrium output path. Since $b$ can take on a range of values, there exists a range of repentance strategies. Each $b \in B$ represents a different degree to which the players are punished for deviating from collusive behavior, and each of these punishments is milder than Friedman's Cournot-Nash punishment.
Fix $b \in B$. First, I will consider the case where $b \le \pi^{CN}$. Let $q^R = \inf_{q \ge 0} \{\, q \mid \pi_1(q, q_2^*(q)) = \pi^{CN} - b \,\}$. By A2 and A3,
$$\pi_1(q^R, q_2^*(q^R)) = \pi^{CN} - b. \tag{11}$$
For this case, repentance strategies take the form of a simple strategy profile $\sigma^R = \sigma(Q^0, Q^1, Q^2)$, where $Q^0$ is given by Eq. (1),
$$Q^1 = \{(q^R, q_2^*(q^R))_{t=1}, (q^m, q^m)_{t=2,3,\ldots}\}, \tag{12}$$
and
$$Q^2 = \{(q_1^*(q^R), q^R)_{t=1}, (q^m, q^m)_{t=2,3,\ldots}\}. \tag{13}$$
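I claim that these strategies are subgame perfect equilibrium strategies. This reduces to showing⁵ that there is no incentive for player 1 to cheat on the output path $Q^0$ given the punishment path $Q^1$ will follow, that is,
$$\frac{\pi^m}{1-\alpha} > \pi^{ch} + \alpha\left[\pi_1(q^R, q_2^*(q^R)) + \frac{\alpha \pi^m}{1-\alpha}\right], \tag{14}$$
there is no incentive for player 1 to cheat on the punishment path $Q^1$ given this punishment path will be reimposed, that is,
$$\pi_1(q^R, q_2^*(q^R)) + \frac{\alpha \pi^m}{1-\alpha} > \pi_1\big(q_1^*(q_2^*(q^R)), q_2^*(q^R)\big) + \alpha\left[\pi_1(q^R, q_2^*(q^R)) + \frac{\alpha \pi^m}{1-\alpha}\right], \tag{15}$$
and there is no incentive for player 2 to cheat on $Q^1$ given the punishment path $Q^2$ will follow, that is,
$$\pi_2(q^R, q_2^*(q^R)) + \frac{\alpha \pi^m}{1-\alpha} > \pi_2(q^R, q_2^*(q^R)) + \alpha\left[\pi_2(q_1^*(q^R), q^R) + \frac{\alpha \pi^m}{1-\alpha}\right]. \tag{16}$$
Equation (14) follows from Eq. (10) and the definition of $q^R$. Equation (9) implies that
$$\pi^{CN} - b + \frac{\alpha \pi^m}{1-\alpha} > \pi^{CN} + \alpha\left[\pi^{CN} - b + \frac{\alpha \pi^m}{1-\alpha}\right]. \tag{17}$$
By the definition of $q^R$, $q^R < q^{CN}$. By A3, $q_2^*(q^R) \ge q^{CN}$. Thus $\pi_1\big(q_1^*(q_2^*(q^R)), q_2^*(q^R)\big) \le \pi^{CN}$. Substituting this and Eq. (11) into Eq. (17), Eq. (15) follows. Finally, note that $q^R < q^{CN}$ implies that $\pi_2(q_1^*(q^R), q^R) < \pi^{CN} < \pi^m$. Equation (16) immediately follows. Therefore subgame perfection holds for these repentance strategies when $b \le \pi^{CN}$ and $b \in B$.
⁵ Subgame perfection would hold for these strategies even if the strict inequalities in Eqs. (14), (15), and (16) were changed to weak inequalities. However, the bounded rationality results of Section 5 (Theorems 2 and 3) would then cease to hold.
Now take the case where $b > \pi^{CN}$ and $b \in B$. I claim that I can always choose a unique integer $i \ge 1$ such that
$$\frac{\alpha^{i+1} \pi^m}{1-\alpha} \le \pi^{CN} - b + \frac{\alpha \pi^m}{1-\alpha} < \frac{\alpha^{i} \pi^m}{1-\alpha}. \tag{18}$$
The right-hand inequality holds for $i = 1$. Since $\lim_{i \to \infty} (\alpha^{i+1}/(1-\alpha)) = 0$, the claim is verified since Eq. (9) implies that
$$\pi^{CN} - b + \frac{\alpha \pi^m}{1-\alpha} > \frac{\pi^{CN}}{1-\alpha} > 0. \tag{19}$$
Given $b > \pi^{CN}$ and $i \ge 1$ satisfying Eq. (18), clearly I can redefine $q^R \in [0, q^m)$ so that
$$\pi^{CN} - b + \frac{\alpha \pi^m}{1-\alpha} = \alpha^{i}\,\pi_1(q^R, 2q^m - q^R) + \frac{\alpha^{i+1} \pi^m}{1-\alpha}. \tag{20}$$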
I will now use $b$, $i$, and $q^R$ to construct repentance strategies which are subgame perfect equilibrium strategies. These repentance strategies are defined by the simple strategy profile $\sigma^R = \sigma(Q^0, Q^1, Q^2)$, where $Q^0$ is given by Eq. (1),
$$Q^1 = \{(0, 2q^m)_{t=1,2,\ldots,i}, (q^R, 2q^m - q^R)_{t=i+1}, (q^m, q^m)_{t=i+2,i+3,\ldots}\}, \tag{21}$$
and
$$Q^2 = \{(2q^m, 0)_{t=1,2,\ldots,i}, (2q^m - q^R, q^R)_{t=i+1}, (q^m, q^m)_{t=i+2,i+3,\ldots}\}. \tag{22}$$
For $i, j = 1, 2$, $i \ne j$, let $\pi_i^*(q_i, q_j) = \pi_i(q_i^*(q_j), q_j)$.
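To show that these repentance strategies are subgame perfect, it suffices to show that there is no incentive for player 1 to cheat on the output path $Q^0$ given that the punishment path $Q^1$ will follow, that is,
$$\frac{\pi^m}{1-\alpha} > \pi^{ch} + \frac{\alpha\,\pi_1(0, 2q^m)(1 - \alpha^{i})}{1-\alpha} + \alpha^{i+1}\,\pi_1(q^R, 2q^m - q^R) + \frac{\alpha^{i+2} \pi^m}{1-\alpha}, \tag{23}$$
there is no incentive for player 1 to cheat on the punishment path $Q^1$ given that the punishment path $Q^1$ will be reimposed, that is,
$$V^R \equiv \frac{\pi_1(0, 2q^m)(1 - \alpha^{i})}{1-\alpha} + \alpha^{i}\,\pi_1(q^R, 2q^m - q^R) + \frac{\alpha^{i+1} \pi^m}{1-\alpha} > \pi_1^*(0, 2q^m) + \alpha V^R, \tag{24}$$
$$\frac{\pi_1(0, 2q^m)(1 - \alpha^{t})}{1-\alpha} + \alpha^{t}\,\pi_1(q^R, 2q^m - q^R) + \frac{\alpha^{t+1} \pi^m}{1-\alpha} > \pi_1^*(0, 2q^m) + \alpha V^R, \qquad t = i-1, i-2, \ldots, 1, \tag{25}$$
and
$$\pi_1(q^R, 2q^m - q^R) + \frac{\alpha \pi^m}{1-\alpha} > \pi_1^*(q^R, 2q^m - q^R) + \alpha V^R, \tag{26}$$
and there is no incentive for player 2 to cheat on the output path $Q^1$ given that the punishment path $Q^2$ will follow, that is,
$$\frac{\pi_2(0, 2q^m)(1 - \alpha^{t})}{1-\alpha} + \alpha^{t}\,\pi_2(q^R, 2q^m - q^R) + \frac{\alpha^{t+1} \pi^m}{1-\alpha} > \pi_2^*(0, 2q^m) + \alpha V^R, \qquad t = i, i-1, \ldots, 1, \tag{27}$$
and
$$\pi_2(q^R, 2q^m - q^R) + \frac{\alpha \pi^m}{1-\alpha} > \pi_2^*(q^R, 2q^m - q^R) + \alpha V^R. \tag{28}$$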
Equation (23) follows from Eqs. (10) and (20). Since $q_1^*(0) = 2q^m$ and $q_1^*(q^{CN}) = q^{CN} > 0$, A3 implies that $2q^m \ge q^{CN}$. Therefore $\pi_1^*(0, 2q^m) \le \pi^{CN} = \pi_1^*(q^{CN}, q^{CN})$. Using this result, Eq. (24) follows from Eqs. (9) and (20). For $t = i-1, i-2, \ldots, 1$, the left-hand side of Eq. (25) is larger than the left-hand side of Eq. (24). Therefore Eq. (25) trivially holds for $t = i-1, i-2, \ldots, 1$. Equation (26) follows immediately from Eq. (25) with $t = 1$ and Lemma B in the Appendix. Equation (27) follows immediately from the obvious observations that $\pi_2(0, 2q^m) = \pi_2^*(0, 2q^m) = 2\pi^m > \pi_1(0, 2q^m) = 0$, $\pi_2(q^R, 2q^m - q^R) > \pi_1(0, 2q^m)$, and $\pi^m > \pi_1(q^R, 2q^m - q^R)$. Finally, Eq. (28) follows from Eq. (23) and Lemma B in the Appendix. Thus I can conclude with the following theorem:
THEOREM 1. For any game $G^\infty(\alpha)$ satisfying A1-A5 and for any $b \in B \ne \emptyset$, there exist repentance strategies which are subgame perfect equilibrium strategies and these strategies support the collusive output path $Q^0 = \{(q^m, q^m)_{t=1,2,3,\ldots}\}$. Let $i \in \{0, 1, 2, 3, \ldots\}$ be defined by
$$\frac{\alpha^{i+1} \pi^m}{1-\alpha} \le \pi^{CN} - b + \frac{\alpha \pi^m}{1-\alpha} < \frac{\alpha^{i} \pi^m}{1-\alpha}. \tag{29}$$
If $i > 0$, then these repentance strategies take the form of a simple strategy profile $\sigma(Q^0, Q^1, Q^2)$, where Eqs. (21), (22), and (20) define $Q^1$, $Q^2$, and $q^R$, respectively. If $i = 0$, then these repentance strategies take the form of a simple strategy profile $\sigma(Q^0, Q^1, Q^2)$, where Eqs. (12), (13), and (11) define $Q^1$, $Q^2$, and $q^R$, respectively.
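The construction in Theorem 1 can be traced numerically. The sketch below is an illustration, not part of the paper: the function name `repentance_plan` is hypothetical, exact rational arithmetic is used to avoid rounding at the boundaries of Eq. (29), and for $i \ge 1$ it uses the identity $\pi_1(q, 2q^m - q) = (\pi^m / q^m)\,q$, which follows from A2 because total output, and hence price, is fixed at the monopoly level.

```python
from fractions import Fraction


def repentance_plan(alpha, b, pi_m, pi_cn, q_m):
    """Return (i, q_R) for given alpha and b, following Eqs. (29) and (20).

    For i = 0 the repentance output q_R is defined by Eq. (11), which needs the
    demand curve, so None is returned in its place.
    """
    # Cheater's discounted payoff on the punishment path: the middle term of Eq. (29).
    continuation = pi_cn - b + alpha * pi_m / (1 - alpha)
    if continuation <= 0:
        raise ValueError("continuation value must be positive (implied by Eq. (9))")
    i = 0
    while alpha ** (i + 1) * pi_m / (1 - alpha) > continuation:
        i += 1                          # smallest i satisfying the left inequality of (29)
    if i == 0:
        return i, None
    # Eq. (20): alpha^i * pi_1(q_R, 2q_m - q_R) + alpha^(i+1) pi_m / (1 - alpha) = continuation
    profit_needed = (continuation - alpha ** (i + 1) * pi_m / (1 - alpha)) / alpha ** i
    return i, q_m * profit_needed / pi_m


if __name__ == "__main__":
    # Illustrative values (assumed): pi_m = 1/8, pi_CN = 1/9, q_m = 1/4, alpha = 19/20.
    args = (Fraction(1, 8), Fraction(1, 9), Fraction(1, 4))
    print(repentance_plan(Fraction(19, 20), Fraction(1, 20), *args))  # mild punishment
    print(repentance_plan(Fraction(19, 20), Fraction(1, 5), *args))   # harsher punishment
```

With these illustrative numbers the milder choice of $b$ falls in the one-period case ($i = 0$), while the harsher one requires a period of zero output before the final repentance period ($i = 1$).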
4. AN ILLUSTRATION USING LINEAR DEMAND
Suppose that demand is linear, that is,
$$p(q) = \begin{cases} a - bq & 0 \le q < a/b \\ 0 & a/b \le q, \end{cases} \tag{30}$$
where $a, b > 0$ and $a > c$. Straightforward calculations show that A1-A4 hold and A5 as well if $\alpha > 9/17$. Furthermore, for any $\alpha > 9/17$, by choosing $b \in B$ to be sufficiently small, $i = 0$, that is, repentance takes only one period, after which the players revert back to collusive behavior. For any $\alpha > 8/9$, by choosing $b \in B$ to be sufficiently large, $i > 0$, that is, repentance takes more than one time period. Thus for this numerical example, in the construction of subgame perfect equilibrium repentance strategies, both the $i = 0$ and $i > 0$ cases are of importance.
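The thresholds 9/17 and 8/9 can be checked with exact arithmetic. The sketch below is only a verification aid: the function names are hypothetical, the demand slope is called `slope` to keep it apart from the punishment parameter $b \in B$, and the parameter values $a = 2$, $c = 1$, `slope` $= 1$ are an arbitrary illustrative choice.

```python
from fractions import Fraction


def price(q, a, slope):
    """Linear inverse demand of Eq. (30), truncated at zero."""
    return max(a - slope * q, Fraction(0))


def profit(q_own, q_other, a, c, slope):
    """Single-period profit (p(q_1 + q_2) - c) * q_i from assumption A2."""
    return (price(q_own + q_other, a, slope) - c) * q_own


def best_reply(q_other, a, c, slope):
    """Single-period best response on the downward-sloping part of demand."""
    return max((a - c - slope * q_other) / (2 * slope), Fraction(0))


def benchmarks(a, c, slope):
    """Collusive, cheating and Cournot-Nash profits, plus alpha^F from A5."""
    q_m = (a - c) / (4 * slope)       # half the monopoly output
    q_cn = (a - c) / (3 * slope)      # Cournot-Nash output
    pi_m = profit(q_m, q_m, a, c, slope)
    pi_ch = profit(best_reply(q_m, a, c, slope), q_m, a, c, slope)
    pi_cn = profit(q_cn, q_cn, a, c, slope)
    return {"pi_m": pi_m, "pi_ch": pi_ch, "pi_cn": pi_cn,
            "alpha_F": (pi_ch - pi_m) / (pi_ch - pi_cn)}


def b_interval(alpha, bm):
    """Open interval (lower, upper) of punishment parameters b allowed by Eqs. (10) and (9)."""
    lower = (bm["pi_ch"] - bm["pi_m"]) / alpha + bm["pi_cn"] - bm["pi_m"]
    upper = alpha * (bm["pi_m"] - bm["pi_cn"]) / (1 - alpha)
    return lower, upper


if __name__ == "__main__":
    bm = benchmarks(Fraction(2), Fraction(1), Fraction(1))
    print(bm["alpha_F"])                      # threshold in A5
    print(b_interval(Fraction(9, 10), bm))    # B for alpha = 9/10
```

In this parameterization the interval is empty exactly at $\alpha = \alpha^F$, and its upper end reaches $\pi^{CN}$ exactly at $\alpha = 8/9$, so values $b > \pi^{CN}$ (the $i > 0$ case) are available only for $\alpha > 8/9$.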
5. THE DEMONS
In this section, I am motivated by the bounded rationality discussion in Section 1, and therefore I examine the performance of various subgame perfect equilibrium strategies when the players are infrequently irrational. To make the concepts of "infrequently irrational behavior" and "performance" precise, I need to define some new concepts. Throughout this section, I will assume that the repeated game $G^\infty(\alpha)$ under consideration satisfies A1-A5. Fix $\varepsilon$ such that $0 <$ [...] both players are possessed by $\varepsilon$-demons relative to $(\sigma_1, \sigma_2) \in \Sigma$. Finally, let $\sigma_i|_{H(t)}$ be player $i$'s induced strategy on the subgame defined by the history $H(t)$. For $i, j = 1, 2$, $i \ne j$, $(\sigma_i, \sigma_j) \in \Sigma$ is demon-superior to $(\omega_i, \omega_j) \in \Sigma$ if there exists $\bar{\varepsilon}$ between zero and one such that for all $\varepsilon$ satisfying $0 <$ [...] (31)
and [...] (32)
In keeping with the symmetric structure of the repeated Cournot duopoly game, I consider the case where both players are possessed by $\varepsilon$-demons, that is, both players behave irrationally in each time period $t$ with probability $\varepsilon$. Even though both players are adopting strategies which support collusive behavior, neither player is completely dependable in abiding by the cartel agreement, even though it is in each player's self-interest to be completely dependable. Since I am only interested in infrequently irrational behavior, I only consider $\varepsilon$ in a positive neighborhood of zero.
The demon-superior definition provides me with a criterion for ranking alternative subgame perfect equilibrium strategies which support the cartel agreement. Condition (31) in the demon-superior definition is slightly stronger than subgame perfection. For any subgame, player $i$'s induced strategy on that subgame must be a best reply to the "behavior" of the other player, given that the other player is possessed by an $\varepsilon$-demon (condition (31) with $\varepsilon = 0$ corresponds to subgame perfection). Thus each player is choosing a strategy which is a best reply to the strategy of the other player, taking into account the infrequently irrational behavior on the part of the other player. Condition (32) in the demon-superior definition states that the demon-superior strategy profile gives both players higher expected discounted payoffs when both players are possessed by $\varepsilon$-demons. Applied to strategies which support collusive behavior, in a world of infrequently irrational behavior, both players would find strategy profile A to be a more attractive means of policing the cartel agreement than strategy profile B if A was demon-superior to B.
I will now show that repentance strategies are demon-superior to Friedman's trigger strategies. For any $b \in B$, let $R(b)$ denote repentance strategies constructed from $b$ and let $F$ denote Friedman's trigger strategies.
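Let $V_{i\varepsilon}^{jk}(F)$ denote the expected discounted payoff to player $i$ when both players are adopting Friedman's trigger strategies, the other player is possessed by an $\varepsilon$-demon, and the history of the game is such that the players are about to begin the $k$th period of the punishment path $Q^j$. Let $V_{i\varepsilon\varepsilon}^{jk}(F)$, $V_{i\varepsilon}^{jk}(R(b))$, and $V_{i\varepsilon\varepsilon}^{jk}(R(b))$ be defined accordingly. I will first show that $V_{1\varepsilon}(F)$ is continuous in $\varepsilon \ge 0$.
$$V_{1\varepsilon}(F) = (1-\varepsilon)\pi^m + \varepsilon\,\pi_1(q^m, q_2^*(q^m)) + \alpha(1-\varepsilon)V_{1\varepsilon}(F) + \alpha\varepsilon\frac{\pi^{CN}}{1-\alpha}. \tag{33}$$
Rearranging terms, I get that
$$V_{1\varepsilon}(F) = \frac{(1-\varepsilon)\pi^m + \varepsilon\,\pi_1(q^m, q_2^*(q^m)) + \alpha\varepsilon\,(\pi^{CN}/(1-\alpha))}{1 - \alpha(1-\varepsilon)}. \tag{34}$$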
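The conclusion follows. Similar calculations yield that $V_{i\varepsilon}(R(b))$, $V_{i\varepsilon}^{jk}(R(b))$, $V_{i\varepsilon\varepsilon}(R(b))$, $V_{i\varepsilon\varepsilon}^{jk}(R(b))$, $V_{i\varepsilon}(F)$, $V_{i\varepsilon}^{jk}(F)$, $V_{i\varepsilon\varepsilon}(F)$, and $V_{i\varepsilon\varepsilon}^{jk}(F)$ are all continuous in $\varepsilon$ for $i, j = 1, 2$, $k = 1, 2, 3, \ldots$, and $1 \ge \varepsilon \ge 0$. The proofs are left to the reader. Suppose that player 2 is possessed by an $\varepsilon$-demon relative to Friedman's trigger strategies and that player 1 deviates in the first period from his trigger strategy by adopting the cheating output level $q^{ch} = q_1^*(q^m) \ne q^m$. Since player 2 is going to revert to output choices $q^{CN}$ in all future periods (with probability $1-\varepsilon$), player 1 has nothing better than to choose output $q^{CN}$ also in all future periods. Thus player 1's cheating payoff is
$$V_{1\varepsilon}^{ch}(F) = (1-\varepsilon)\,\pi_1(q^{ch}, q^m) + \varepsilon\,\pi_1(q^{ch}, q_2^*(q^m)) + \alpha(1-\varepsilon)\frac{\pi^{CN}}{1-\alpha} + \alpha\varepsilon V_{1\varepsilon}(F). \tag{35}$$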
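Equations (33)-(35) can be evaluated directly to see how the comparison between abiding and cheating behaves for small $\varepsilon$. The sketch below is illustrative only: the function names are hypothetical, and the numerical profits are those of a linear-demand example normalized so that $\pi^m = 1/8$, $\pi^{ch} = 9/64$, $\pi^{CN} = 1/9$, and $\pi_1(q^m, q_2^*(q^m)) = \pi_1(q^{ch}, q_2^*(q^m)) = 3/32$, which is an assumption of the example rather than something stated at this point in the text.

```python
from fractions import Fraction


def v_abide(eps, alpha, pi_m, pi_hit, pi_cn):
    """Closed form of Eq. (34): player 1 abides, player 2 has an eps-demon.

    pi_hit is pi_1(q^m, q_2*(q^m)), player 1's profit when the demon strikes.
    """
    numerator = (1 - eps) * pi_m + eps * pi_hit + alpha * eps * pi_cn / (1 - alpha)
    return numerator / (1 - alpha * (1 - eps))


def v_cheat(eps, alpha, pi_ch, pi_ch_hit, pi_cn, v):
    """Eq. (35): player 1 cheats once, then plays Cournot-Nash unless both deviated.

    pi_ch_hit is pi_1(q^ch, q_2*(q^m)); v is the abiding value from Eq. (34).
    """
    return ((1 - eps) * pi_ch + eps * pi_ch_hit
            + alpha * (1 - eps) * pi_cn / (1 - alpha) + alpha * eps * v)


if __name__ == "__main__":
    alpha, eps = Fraction(3, 4), Fraction(1, 100)    # alpha above 9/17, small eps
    pi_m, pi_ch, pi_cn, pi_hit = Fraction(1, 8), Fraction(9, 64), Fraction(1, 9), Fraction(3, 32)
    v = v_abide(eps, alpha, pi_m, pi_hit, pi_cn)
    print(float(v), float(v_cheat(eps, alpha, pi_ch, pi_hit, pi_cn, v)))
```

At $\varepsilon = 0$ the abiding value reduces to $\pi^m/(1-\alpha)$, and for the small $\varepsilon$ used here abiding still yields more than cheating in this example.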
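By the continuity of $V_{1\varepsilon}(F)$ and A5 there exists $\varepsilon_1 > 0$ such that for $\varepsilon$ satisfying $0 < \varepsilon < \varepsilon_1$, $V_{1\varepsilon}(F) > V_{1\varepsilon}^{ch}(F)$ and therefore condition (31) holds for Friedman's trigger strategies. Using similar reasoning and the strict inequalities in Eqs. (14), (15), (16), (23), (24), (25), (26), (27), and (28), it follows that for any $b \in B$, there exists $\varepsilon_2(b) > 0$ such that, for $\varepsilon$ satisfying $0 <$ [...]
$$V_{i\varepsilon\varepsilon}(F) = (1-\varepsilon)^2\pi^m + (1-\varepsilon)\varepsilon\,\pi_i(q_i = q^m, q_j^*(q^m)) + \varepsilon(1-\varepsilon)\pi^{ch} + \varepsilon^2\,\pi_i(q_i^*(q^m), q_j^*(q^m)) + \alpha(1-\varepsilon)^2 V_{i\varepsilon\varepsilon}(F) + \alpha(1-\varepsilon)\varepsilon\frac{\pi^{CN}}{1-\alpha} + \alpha\varepsilon(1-\varepsilon)\frac{\pi^{CN}}{1-\alpha} + \alpha\varepsilon^2 V_{i\varepsilon\varepsilon}(F) \tag{36}$$
and for $b \in B$,
$$V_{i\varepsilon\varepsilon}(R(b)) = (1-\varepsilon)^2\pi^m + (1-\varepsilon)\varepsilon\,\pi_i(q_i = q^m, q_j^*(q^m)) + \varepsilon(1-\varepsilon)\pi^{ch} + \varepsilon^2\,\pi_i(q_i^*(q^m), q_j^*(q^m)) + \alpha(1-\varepsilon)^2 V_{i\varepsilon\varepsilon}(R(b)) + \alpha(1-\varepsilon)\varepsilon V_{i\varepsilon\varepsilon}^{j1}(R(b)) + \alpha\varepsilon(1-\varepsilon) V_{i\varepsilon\varepsilon}^{i1}(R(b)) + \alpha\varepsilon^2 V_{i\varepsilon\varepsilon}(R(b)). \tag{37}$$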
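To show that condition (32) in the demon-superior definition holds, that is, there exists $\varepsilon_3(b) > 0$ such that for $\varepsilon$ satisfying $0 < \varepsilon < \varepsilon_3(b)$, $V_{i\varepsilon\varepsilon}(R(b)) > V_{i\varepsilon\varepsilon}(F)$, it suffices to show that for $\varepsilon$ satisfying $0 < \varepsilon < \varepsilon_3(b)$,

$$V^{j1}_{i\varepsilon\varepsilon}(R(b)) + V^{i1}_{i\varepsilon\varepsilon}(R(b)) > \frac{\pi^{CN}}{1-\alpha} + \frac{\pi^{CN}}{1-\alpha}. \tag{38}$$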
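But both $V^{j1}_{i\varepsilon\varepsilon}(R(b))$ and $V^{i1}_{i\varepsilon\varepsilon}(R(b))$ (player $i$’s payoffs at the start of the punishment paths $Q^j$ and $Q^i$, $j \ne i$) are continuous in $\varepsilon$, and $V^{j1}_{i\varepsilon\varepsilon}(R(b)) > V^{i1}_{i\varepsilon\varepsilon}(R(b)) > \pi^{CN}/(1-\alpha)$ when $\varepsilon = 0$ by Eqs. (11), (19), and (20). Thus Eq. (38) holds and by choosing $\bar{\varepsilon}(b) = \min(\varepsilon_2(b), \varepsilon_3(b))$, I get

THEOREM 2. For any game $G^\infty(\alpha)$ satisfying A1-A5 and for any $b \in B \ne \emptyset$, the repentance strategies constructed from $b$ are demon-superior to Friedman’s trigger strategies.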
Since Abreu’s “stick and carrot” strategies provide a punishment for cheating even harsher than Friedman’s trigger strategies (Eq. (8)), the following theorem should not be surprising:

THEOREM 3. For any game $G^\infty(\alpha)$ satisfying A1-A5, and for any $b \in B$, both Friedman’s trigger strategies and the repentance strategies constructed from $b$ are demon-superior to Abreu’s “stick and carrot” strategies.
Since the proof of Theorem 3 parallels so closely the proof of Theorem 2, it is left to the reader. The repentance strategies defined in Section 3 were carefully constructed so that one-shot deviations by any player, given any past history, always strictly lower that player’s discounted payoff. It is this property that guarantees that these strategies are best replies to each other even when the players take into account their infrequent irrationality. If I had instead solved for subgame perfect equilibrium strategies which support collusive behavior and, for example, maximize the discounted payoff of the punisher subsequent to a deviation (by the other player) from collusive behavior, these strategies would not have satisfied Eq. (31) and therefore would not be demon-superior to either Friedman’s or Abreu’s strategies. Finally, although I have not explicitly modelled the bargaining game
between players over what strategies to use to enforce collusive behavior, presumably their subjective probabilities (or beliefs) concerning irrational behavior would play a critical role in determining the outcome of this bargaining game. For this reason, I have not attempted to identify a single repentance strategy (and a corresponding b E B) as being “best.”
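As a numerical illustration of Eqs. (34) and (35), the following Python sketch evaluates player 1's collusive and cheating payoffs under Friedman's trigger strategies when player 2 is possessed by an ε-demon. The linear inverse demand p = max(1 - q1 - q2, 0), the marginal cost c = 0.001, the discount parameter α = 0.6, and all function names are assumptions made only for this sketch; none of them is part of the general argument.

```python
# Sketch only: the linear demand and all parameter values are assumptions.
C = 0.001          # constant marginal cost (assumed)
A = 1.0 - C        # demand intercept net of marginal cost


def profit(q_own, q_other):
    """Single-period profit under the assumed demand p = max(1 - Q, 0)."""
    price = max(1.0 - q_own - q_other, 0.0)
    return (price - C) * q_own


def trigger_payoffs(eps, alpha):
    """Return (V, V_ch): player 1's collusive payoff as in Eq. (34) and
    cheating payoff as in Eq. (35) when player 2 has an eps-demon."""
    q_m = A / 4.0              # each firm's share of the monopoly output
    q_cn = A / 3.0             # Cournot-Nash output
    q_ch = (A - q_m) / 2.0     # best reply to q_m under linear demand
    pi_m = profit(q_m, q_m)
    cournot_forever = profit(q_cn, q_cn) / (1.0 - alpha)

    # Eq. (34): closed form for the collusive-path payoff.
    v = ((1 - eps) * pi_m + eps * profit(q_m, q_ch)
         + alpha * eps * cournot_forever) / (1 - alpha * (1 - eps))
    # Eq. (35): payoff from cheating in the first period.
    v_ch = ((1 - eps) * profit(q_ch, q_m) + eps * profit(q_ch, q_ch)
            + alpha * (1 - eps) * cournot_forever + alpha * eps * v)
    return v, v_ch


for eps in (0.0, 0.001, 0.01):
    v, v_ch = trigger_payoffs(eps, alpha=0.6)
    print(f"eps={eps}: V={v:.6f}, V_ch={v_ch:.6f}, deterred={v > v_ch}")
```

Under these assumed values the discount parameter exceeds Friedman's threshold, so cheating is deterred at ε = 0 and, by continuity, for small positive ε as well.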
6. LOW DISCOUNT PARAMETERS AND REPENTANCE
One of the virtues of Abreu’s “stick and carrot” strategies is that they can be used to support collusive behavior for some discount parameters $\alpha < \alpha^F$ because Abreu’s punishment for cheating is harsher than Friedman’s Cournot-Nash punishment for cheating (Eq. (8)). For a repeated Cournot duopoly game with discounting $G^\infty(\alpha)$ satisfying A1-A4, let $\alpha^A$ denote the lowest discount parameter for which Abreu’s strategies support collusive behavior. $\alpha^A =$
inf (Z.YlC.4
CI ’
where +an,(q,q)+-
a7cm and l-a
When $\alpha^F > \alpha \ge \alpha^A$, it is still possible to construct repentance strategies which support collusive behavior as a subgame perfect equilibrium output path and have the property that the punisher (firm $i$) enjoys the punishment path $Q^j$ ($j \ne i$) more than the cheater (firm $j$). However, it is no longer necessarily the case that the punisher earns higher than Cournot equilibrium profits while the cheater is being punished. In fact, the punisher may have to be content with negative profits while the cheater is being punished.

An example will suffice to illustrate this point. Consider a repeated Cournot duopoly game with discounting where the inverse demand function is
and each firm has the same constant marginal cost $c = 0.001$ and the same discount parameter $\alpha \in (0, 1)$. For this example, it is easily verified that $q^m = 0.2497$, $\pi^m = 0.1247$, $q^{CN} = 0.333$, $\pi^{CN} = 0.1108$, $q_i^*(q^m) = 0.3746$,
$\pi^{ch} = 0.1403$, $\alpha^F = 0.5294$, and $\alpha^A = 0.1111$. When $\alpha = \alpha^A = 0.1111$, Abreu’s “stick and carrot” strategies for supporting collusive behavior are given by the simple strategy profile $\sigma^A = \omega(Q^0, Q^1, Q^2)$, where $\bar{q} = 15.5937$. In order for the symmetric punishment for cheating to be sufficiently harsh to deter cheating, each firm must set an outrageously high output level (15.5937) in the first period of the punishment path. This is because the marginal cost of each firm is so low (0.001) that each firm must produce a very large output in order to earn sufficiently negative profits ($\pi_1(\bar{q}, \bar{q}) = -0.0155$). The equilibrium discounted profit that each firm earns subsequent to cheating is the harshest imaginable ($\pi_1(\bar{q}, \bar{q}) + \alpha^A\pi^m/(1-\alpha^A) = 0$). Thus, for this example, there do not exist any subgame perfect equilibrium strategies that support collusive behavior for $\alpha < \alpha^A$.

As an alternative to adopting Abreu’s “stick and carrot” strategies when $\alpha = \alpha^A$, the players could adopt the repentance strategies given by the simple strategy profile $\omega(Q^0, Q^1, Q^2)$, where

$$Q^0 = \{(q^m, q^m)_{t=1,2,3,\ldots}\},$$
$$Q^1 = \{(q^R, q^P)_{t=1},\ (q^m, q^m)_{t=2,3,\ldots}\},$$
$$Q^2 = \{(q^P, q^R)_{t=1},\ (q^m, q^m)_{t=2,3,\ldots}\},$$

$q^m = 0.2497$, $q^R = 15.5937$, and $q^P = 0.999$. The punisher produces enough in the first period of the punishment path ($q^P = 0.999$) to guarantee that the cheater cannot make positive profits in that period. Given $q^P$, the cheater has nothing better to do than “repent” by setting the high output level $q^R = 15.5937$. These strategies are subgame perfect and the punisher enjoys the punishment path more than the cheater ($\pi_1(q^P, q^R) = -0.000999 > \pi_1(q^R, q^P) = -0.0155$). However, the punisher hardly “enjoys” punishing the cheater since negative profits are earned in the first period of the punishment path. This is unavoidable because the punisher must produce an output of at least 0.999 in the first period of the punishment path in order to guarantee that the cheater’s discounted punishment payoff is nonpositive. Thus the results of Section 3 must be modified when assumption A5 is relaxed.
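The numbers in this example can be checked with a short Python sketch. The inverse demand function is not legible in this copy, so the sketch assumes p = max(1 - q1 - q2, 0), which appears to reproduce every reported value; that assumption, the variable names, and the closed-form output levels for linear demand are not taken from the text. The reported figures seem to be truncated rather than rounded to four decimals.

```python
# Sketch only: the demand function below is an assumption, not from the text.
C = 0.001                      # constant marginal cost, as stated in the text
A = 1.0 - C                    # demand intercept net of marginal cost


def profit(q_own, q_other):
    """Single-period profit under the assumed demand p = max(1 - Q, 0)."""
    price = max(1.0 - q_own - q_other, 0.0)
    return (price - C) * q_own


q_m = A / 4.0                  # each firm's share of the monopoly output
q_cn = A / 3.0                 # Cournot-Nash output
q_ch = (A - q_m) / 2.0         # best reply to q_m
pi_m = profit(q_m, q_m)
pi_cn = profit(q_cn, q_cn)
pi_ch = profit(q_ch, q_m)

# Lowest discount parameter for Friedman's trigger strategies.
alpha_f = (pi_ch - pi_m) / (pi_ch - pi_cn)

# Value of alpha^A reported in the text (0.1111), taken here to be 1/9.
alpha_a = 1.0 / 9.0

# Output that drives the discounted punishment payoff to zero:
# -C * q_bar + alpha_a * pi_m / (1 - alpha_a) = 0.
q_bar = alpha_a * pi_m / ((1.0 - alpha_a) * C)

# Punisher's output that pushes the price down to marginal cost at most.
q_p = A

print("q_m, pi_m:", q_m, pi_m)
print("q_cn, pi_cn:", q_cn, pi_cn)
print("q_ch, pi_ch:", q_ch, pi_ch)
print("alpha_f, q_bar:", alpha_f, q_bar)
print("punisher, cheater profit:", profit(q_p, q_bar), profit(q_bar, q_p))
```

At the assumed α = 1/9, the collusive payoff π^m/(1 - α) equals the one-period cheating profit π^ch, which is consistent with cheating being just deterred when the discounted punishment payoff is zero.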
APPENDIX
LEMMA A. For any game $G^\infty(\alpha)$ satisfying A1-A5

Proof. Assumption A5 implies that

$$\frac{\pi^m}{1-\alpha} > \pi^{ch} + \frac{\alpha\pi^{CN}}{1-\alpha}.$$

Rearranging terms we get

Dividing both sides by $\alpha$, we get

The conclusion follows since $\pi^m > \pi^{CN}$ by A4. Q.E.D.

LEMMA B. Given A1-A4, for $i, j = 1, 2$, $i \ne j$, $\pi_i(q_i^*, q_j = 2q^m - q_i) - \pi_i(q_i, q_j = 2q^m - q_i)$ is nonincreasing in $q_i$ for $q_i \in [0, 2q^m]$.
Lemma B states that the single-period return to cheating for a firm does not increase as the firm’s share of monopoly profits increases.

Proof. First, I will show that $q_1^*(2q^m - q_1) > q_1$ for all $q_1 < 2q^m$. By A2 and the definition of $q^m$,

$$\frac{\partial\pi_1(q_1, q_2 = 2q^m - q_1)}{\partial q_1} = p'(q_1 + 2q^m - q_1)\,q_1 + p(q_1 + 2q^m - q_1) - c > p'(2q^m)(q_1 + 2q^m - q_1) + p(2q^m) - c = 0.$$

The first conclusion follows. Lemma B now follows from Lemma 26 in Abreu [2]. Q.E.D.
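Lemma B can be illustrated numerically for one particular game. The Python sketch below assumes the linear inverse demand p = max(1 - q1 - q2, 0) with marginal cost c = 0.001; this demand, the grid, and the function names are assumptions for illustration and say nothing about the general case covered by A1-A4. For this assumed demand the return to cheating works out to (2q^m - q_i)^2 / 4, which is nonincreasing on [0, 2q^m].

```python
# Sketch only: linear demand is an assumption used to illustrate Lemma B.
C = 0.001
A = 1.0 - C
Q_M = A / 4.0                  # each firm's share of the monopoly output


def profit(q_own, q_other):
    """Single-period profit under the assumed demand p = max(1 - Q, 0)."""
    price = max(1.0 - q_own - q_other, 0.0)
    return (price - C) * q_own


def cheating_gain(q_i):
    """Best-reply profit minus current profit when q_j = 2*Q_M - q_i."""
    q_j = 2.0 * Q_M - q_i
    q_best = max((A - q_j) / 2.0, 0.0)   # best reply under linear demand
    return profit(q_best, q_j) - profit(q_i, q_j)


# Evaluate the gain on an evenly spaced grid over [0, 2*Q_M].
grid = [2.0 * Q_M * k / 200 for k in range(201)]
gains = [cheating_gain(q) for q in grid]
nonincreasing = all(b <= a + 1e-12 for a, b in zip(gains, gains[1:]))
print("gain is nonincreasing on the grid:", nonincreasing)
```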
REFERENCES
1. D. ABREU, “Infinitely Repeated Games with Discounting: A General Theory,” Working Paper, Harvard University, September 1984.
2. D. ABREU, Extremal equilibria of oligopolistic supergames, J. Econ. Theory 39 (1986), 191-225.
3. A. COURNOT, “Researches into the Mathematical Principles of the Theory of Wealth” (Nathaniel T. Bacon, Trans.), Macmillan Co., New York/London, 1897.
4. J. FRIEDMAN, A non-cooperative equilibrium for supergames, Rev. Econ. Stud. 38 (1971), 1-12.
5. J. FRIEDMAN, “Game Theory with Applications to Economics,” Oxford Univ. Press, New York, 1986.
6. E. GREEN AND R. PORTER, Noncooperative collusion under imperfect price information, Econometrica 52 (1984), 87-100.
7. J. NASH, Noncooperative games, Ann. of Math. 54 (1951), 286-295.
8. R. ROSENTHAL, Games of perfect information, predatory pricing and the chain-store paradox, J. Econ. Theory 25 (1981), 92-100.
9. R. SELTEN, Reexamination of the perfectness concept for equilibrium points in extensive games, Int. J. Game Theory 4 (1975), 25-55.