GAMES AND ECONOMIC BEHAVIOR 2, 273-290 (1990)

Informational Requirements and Strategic Complexity in Repeated Games*

BARTON L. LIPMAN AND SANJAY SRIVASTAVA

GSIA, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Received August 30, 1989
We propose a measure of strategic complexity in repeated games based on the idea that strategies requiring more detailed information are more complex. Our measure often yields a more intuitive ranking of strategies than other measures. In repeated games, the simplest strategy which produces a given equilibrium path is a grim-trigger strategy. We characterize the set of Nash and perfect equilibrium payoffs in n-player repeated games in which preferences over game payoffs and complexity costs are lexicographic. Journal of Economic Literature Classification Numbers: 020, 026. © 1990 Academic Press, Inc.
I. INTRODUCTION
It is well known that complete rationality is an extreme assumption and can lead to implausible predictions (Simon, 1955, 1976; Binmore, 1988). For example, complete rationality predicts a unique outcome in chess, a uniqueness which is certainly not observed; indeed, it is not even known what this outcome is. By contrast, the backward induction argument applied to a simple game, such as a one-offer, one-response bargaining game, is quite compelling. Similar arguments apply to strategies in infinitely repeated games: simple strategies are more plausible than highly complex strategies. Many strategies are not even computable (Megiddo and Wigderson, 1985). In fact, commonly imposed restrictions on strategies, such as stationarity or symmetry, are often motivated by the plausibility of simple strategies.

* We thank a referee and an associate editor for helpful comments and acknowledge financial support from the National Science Foundation through NSF Grants SES-85202% and SES-8608118.
The elimination of implausible theoretical predictions requires the theory to distinguish between simple and complex forms of behavior. To do so requires, first, a definition of complexity which is consistent with our intuitive notions and, second, a model of choice behavior which takes this complexity into account. A simple way to address the second issue is to assume directly that complexity yields disutility, as in Rubinstein (1986).

In this paper, we propose a measure of the complexity of a strategy based on the intuition that a strategy which requires extremely precise information is difficult to understand. For example, the implications of following a strategy which requires no information at all (e.g., "always cooperate") are trivial to comprehend. On the other hand, it is relatively difficult to evaluate the consequences of using a strategy which specifies a different action for each possible situation. In a repeated game, the information used by a strategy is the history of the game, so that the precision of the information a strategy requires is determined by the sensitivity of the strategy to histories. Therefore, we measure the complexity of a strategy by calculating the frequency with which small perturbations of histories change the induced strategy.

This formulation of complexity has several appealing features. First, we measure history dependence in a more direct way than approaches in the literature. The recent literature on complexity (Rubinstein, 1986; Abreu and Rubinstein, 1988; Kalai and Stanford, 1988) has focused on the cost of implementing a strategy, defined to be the number of different induced strategies or "states." This does measure how many different ways the strategy conditions on the history, but not how much of the history is used in the conditioning. By contrast, we explicitly measure the amount of information in the history necessary for the conditioning. Second, our measure generally ranks strategies in an intuitive fashion. We give several examples in which identifying complexity with number of states leads to counterintuitive results and where our measure provides a more intuitive ranking of strategies.

Our complexity measure has striking and surprisingly robust implications for the structure of pure strategy equilibria in repeated games. If complexity yields disutility, then equilibrium strategies must be the simplest strategies which produce the equilibrium path. Our characterization starts by specifying a (proposed) equilibrium path and identifying the simplest strategy which produces this path. Our main result is that any simplest strategy is a grim-trigger strategy. This result is quite robust, holding for a large class of alternative versions of our complexity measure. In Section IV, we use this result to study equilibria where players trade off game payoffs and complexity costs lexicographically. We characterize the set of attainable payoffs and show that it is the same as that identified
by the Folk Theorem iff a “mutual minimax” condition (see Fudenberg and Maskin (1986)) is satisfied. We show by example that with more general preferences, the set of Folk Theorem payoffs neither contains nor is contained by the set of equilibrium payoffs in a complexity game. We conclude with a characterization of subgame perfect equilibrium payoffs.
II. NOTATION
Let G = (A, u) denote an I-player game in normal form, where A_i is the (finite) set of feasible actions for player i, A = A_1 × ··· × A_I, and u_i: A → R is the payoff function for i. Let NE(G) be the set of pure strategy Nash equilibria of G. Given a discount parameter δ ∈ [0, 1), the infinitely repeated game G^∞(δ) is defined as follows. Let H_n = A^n denote the set of histories of length n, where H_0 contains only e, the "empty history." H^N is the set of histories of length up to N, and we denote H^∞, the set of all histories, simply by H. For any h ∈ H and r between 1 and the length of h, let P_r(h) ∈ A denote the projection of h onto its rth coordinate. Given h, h' ∈ H, the concatenation of h and h' is denoted h · h' and is the element of H given by h followed by h'.

Usually, a strategy specifies a player's action for every possible history. Hence the player conditions on his own past actions, possibly nontrivially. This is not obviously appropriate in this context: if players wish to avoid having to condition too finely, a simple thing for them to do is to ignore their own past actions. Since players never "accidentally" choose the wrong action, they never need such conditioning. We maintain the usual definition of an individual strategy, but our main results are entirely unaffected if we define strategies as functions only of the opponents' actions. Hence an individual strategy for player i is a function σ_i: H → A_i. We only consider pure strategies. S_i denotes the set of strategies for i and S = S_1 × ··· × S_I. For any σ ∈ S, one can easily define the sequence of actions generated by σ, say (a^1, a^2, . . .). Call this sequence h*(σ), the path generated by σ, which is an infinite history. Payoffs in G^∞(δ) are given by
u_i^δ(σ) = (1 − δ) Σ_{n=1}^{∞} δ^{n−1} u_i(a^n).

The infinitely repeated game, then, is (S, u^δ). We refer to this as the standard game to distinguish it from the game we consider later in which complexity yields disutility.
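As a concrete illustration (ours, not from the paper), the following Python sketch generates the path h*(σ) for a strategy profile and evaluates the discounted payoff above, truncated at T periods; the Prisoners' Dilemma payoffs in the example are hypothetical stand-ins.

def path(strategies, T):
    """First T action profiles of the path h*(sigma) generated by a
    profile of strategies (each a function from histories to actions)."""
    h = []
    for _ in range(T):
        h.append(tuple(s(tuple(h)) for s in strategies))
    return h

def discounted_payoff(u_i, h, delta):
    """(1 - delta) * sum over n of delta^(n-1) * u_i(a^n), truncated at len(h)."""
    return (1 - delta) * sum(delta ** n * u_i(a) for n, a in enumerate(h))

# Hypothetical stage game: a standard Prisoners' Dilemma.
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def grim(i):
    # cooperate until the opponent has ever defected, then defect forever
    return lambda h: "D" if any(a[1 - i] == "D" for a in h) else "C"

h = path([grim(0), grim(1)], T=500)
print(discounted_payoff(lambda a: PD[a][0], h, delta=0.9))  # approx. 2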
In this section and the next, we discuss individual strategies but suppress the i subscript. Given σ and h, let σ|h denote the strategy induced by σ at h, defined by (σ|h)(h') = σ(h · h') for every h' ∈ H. For each σ and h ∈ H, let E_σ(h) = {h' ∈ H: σ|h = σ|h'}. E_σ(h) is the set of histories which are equivalent to h under σ. The sets of equivalent histories form a partition of H, ℋ(σ). An element of ℋ(σ) is called a state. A strategy which has only one state is a constant strategy. If h' ∈ E_σ(h), then σ(h) = σ(h') and for any a ∈ A, h' · a ∈ E_σ(h · a). A strategy can therefore be completely described by the states, the action prescribed at each state, and the way the strategy transits between states. An automaton is precisely a set of states, an initial state, a function mapping states into actions, and a function mapping states and one-period histories into states. See Kalai and Stanford (1988) for a proof of the equivalence between strategies and automata. A strategy can thus be represented as a graph. The nodes of the graph correspond to the states of the strategy. The directed arcs of the graph indicate that there is a transition from one state to the other. These arcs are labeled according to the one-period history inducing the transition.
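The automaton view is easy to make concrete. The sketch below is our illustration, not the authors' formalism: an initial state, an output map from states to actions, and a transition map driven by one-period histories. As in the paper's examples, the one-period history is reduced here to the opponent's action, and the two-state GRIM TRIGGER machine anticipates the strategy discussed in Section III.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Automaton:
    """An automaton representing a repeated-game strategy: an initial
    state, an output map (state -> action), and a transition map
    (state, one-period observation -> state)."""
    initial: str
    output: Callable[[str], str]
    transition: Callable[[str, str], str]

    def state_at(self, history: Iterable[str]) -> str:
        """State reached after reading a history of one-period observations."""
        q = self.initial
        for a in history:
            q = self.transition(q, a)
        return q

    def action(self, history: Iterable[str]) -> str:
        """The action the induced strategy plays after `history`."""
        return self.output(self.state_at(history))

# GRIM TRIGGER on the opponent's actions: cooperate until the first D,
# then defect forever (a two-state machine; the punishment state is absorbing).
grim = Automaton(
    initial="coop",
    output=lambda q: "C" if q == "coop" else "D",
    transition=lambda q, a: "punish" if (q == "punish" or a == "D") else "coop",
)

print(grim.action(["C", "C"]))       # C
print(grim.action(["C", "D", "C"]))  # D: the trigger never resets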
III. HISTORY DEPENDENCE AND COMPLEXITY
Our approach to measuring complexity is based on the idea that strategies that depend in a very fine way on the history of the game are complex. In considering the use of a strategy, a player needs to think about what the strategy entails in different circumstances. Our measure formalizes the idea that this is difficult if the strategy is sensitive to small details of the history. Previous attempts to characterize simple strategies (e.g., limited memory (Rosenthal, 1979; Aumann, 1981), Abreu's (1988) simple penal codes, and finite automata) have also relied on this idea. For example, the number of states of an automaton measures the number of different ways a strategy conditions on history. Unfortunately, this measure does not fully reflect the amount of information used in the conditioning, as illustrated by the following two-player examples. In these examples, strategies only depend on the actions of the other player (though our formal analysis does not require this). Let A_i = {C, D} for each i, referred to as cooperation and defection, respectively.

Consider the strategies in Fig. 1, C4D∞ and LEAVE ON D. Each state is labeled by the action chosen at that state. The number of states measure implies that C4D∞ is more complex than LEAVE ON D. But C4D∞ counts to four and then defects forever, so that it conditions on history almost trivially. LEAVE ON D requires more conditioning. First, it cooperates until the opponent defects. It then defects until the opponent's
[Figure 1: Automata for C4D∞ and LEAVE ON D.]
next defection. At this point, it returns to cooperation and begins again. In evaluating the consequences of using C4D∞ given a history, one essentially only needs to check if the length of the history exceeds four. By contrast, LEAVE ON D requires one to determine whether the number of D's in a given history is odd or even; for a long history, this is more difficult than checking whether the length exceeds four. Intuitively, C4D∞ is simple because for any given history, it is trivial to deduce which state it lies in. On the other hand, LEAVE ON D requires unbounded memory. This indicates that the complexity of strategies which involve counting is exaggerated by the number of states measure. A similar problem is encountered with Markovian strategies. A strategy which conditions nontrivially on the last k components of the history has 2^k states, so the number of states explodes as k increases. On the other hand, the informational requirements of a Markovian strategy increase very slowly with k.

These examples highlight the intuition that what contributes to complexity is the sensitivity of a strategy to the details of the history. Intuitively, a strategy depends on a small detail of a history if changing that detail affects the induced strategy. We propose to measure the complexity of a strategy by asking how often small perturbations of the history change the induced strategy. To formalize this notion, we must define (1) a small perturbation of the history, (2) how these affect the induced strategy, and (3) an aggregation procedure. The third problem is particularly
hard to resolve as it raises several difficult philosophical issues discussed below.

An intuitive definition of a small perturbation of a history is that it alters one component of the history by deleting one period of the history, changing the actions chosen in one period, or adding a new period of history. We call this type of perturbation a one-perturbation. So for example, the possible outcomes of one-perturbations of CC are C, DC, CD, DCC, CDC, CCD, and CCC. Note that two different one-perturbations lead to C, since we can delete either the first or the second C. Similarly, three one-perturbations lead to CCC. Formally, a one-perturbation of h is t(h) = (ĥ, h', h'', a, b) such that

h = h' · a · h'',  ĥ = h' · b · h'',
where a, b ∈ A ∪ {e}, a ≠ b, and h' and h'' are histories. Thus h is transformed into ĥ by replacing a with b. If a = e, then we have inserted b. If b = e, then we have deleted a. If neither is the empty history, then we have changed the vector of actions a into the vector b. As noted above, there will often be many ways to transform h into ĥ. Note also that, by definition, one-perturbations of different histories are necessarily different one-perturbations, independent of whether or not they lead to the same transformed history. In what follows, we frequently refer to the transformed history, ĥ, simply as t(h). To emphasize the fact that t(h) specifies h as well as how it is transformed, we sometimes write t(h) as (t, h).

We emphasize that perturbations are not akin to "trembles"; they do not occur in the play of the game. Instead, they formalize the notion of closeness of histories. Intuitively, h and t(h) are close together if (t, h) is a one-perturbation, so t(h) is a small change of h. This notion of closeness allows us to ask how sensitive the strategy is to small changes of the history. We therefore measure the complexity of a strategy by asking how often one-perturbations change the induced strategy. This requires us to aggregate across (t, h)'s, which raises some difficult philosophical problems.

First, how should we aggregate over one-perturbations of a given history? It is plausible that some kinds of history dependence contribute to complexity more than others. For example, one might argue that strategies which depend on the history only through its length are less history dependent than those which depend on what actions were taken at certain dates, so that changes should count more heavily than insertions or deletions.

Second, how should we aggregate across histories? The answer depends on how one believes that complexity considerations are related to
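For concreteness, here is a small sketch (ours; histories are strings over a stand-in two-action set, as in the paper's examples) that enumerates one-perturbations and reproduces the CC example above.

ACTIONS = ["C", "D"]  # stand-in action set, as in the paper's examples

def one_perturbations(h):
    """All one-perturbations of history h, with multiplicity: delete one
    component, change one component, or insert one component. Distinct
    perturbations may yield the same transformed history."""
    out = []
    n = len(h)
    for r in range(n):                     # deletions (b = e)
        out.append(h[:r] + h[r + 1:])
    for r in range(n):                     # changes (a, b both nonempty)
        for b in ACTIONS:
            if b != h[r]:
                out.append(h[:r] + b + h[r + 1:])
    for r in range(n + 1):                 # insertions (a = e)
        for b in ACTIONS:
            out.append(h[:r] + b + h[r:])
    return out

# Reproduces the text's example: C appears twice, CCC three times, and
# DC, CD, DCC, CDC, CCD once each.
print(sorted(one_perturbations("CC")))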
equilibrium choices. There are at least two possible views. First, one can view complexity as a primitive notion which must be defined before one can discuss the choices agents make. Under this view, the aggregation procedure need not be tied to histories the player expects to be relevant in equilibrium. Thus a strategy which is very dependent on the details of certain histories is complex, independent of the probability these histories arise in the play of the game. Another view is that complexity should only be considered relative to the equilibrium path. Traditionally, we ask if a strategy maximizes payoffs against the opponents' strategies, not how it does against other strategies. Similarly, should histories never reached in equilibrium really matter for determining the complexity of the strategy? This view suggests that we should only include histories which are "close" to the equilibrium path.

We consider a very wide class of aggregation procedures; there are procedures in this class consistent with either view of how complexity affects equilibrium behavior. We compute a weighted sum of the (t, h)'s which change the induced strategy. The weighting scheme can vary across players, it can weight different perturbations of a given history differently, it can weight different histories differently, and it can do so as a function of the equilibrium path. More formally, let h* be an infinite history, which we refer to as the (proposed) equilibrium path. Let 0 ≤ β_i(t, h, h*) ≤ 1 be the weight player i puts on (t, h) given that the equilibrium path is h*. For each N, let K^N(σ_i) = {(t, h): t(h) ∉ E_{σ_i}(h) and h ∈ H^N}. Finally, let

Ψ^N(σ_i, h*) = Σ_{(t,h) ∈ K^N(σ_i)} β_i(t, h, h*).

DEFINITION. σ_i is strictly more complex than σ_i' given β_i and σ_{-i}, denoted σ_i >_{β_i,σ_{-i}} σ_i', if Ψ^N(σ_i, h*(σ_i, σ_{-i})) > Ψ^N(σ_i', h*(σ_i', σ_{-i})) for all but finitely many values of N.
Comments. (1) This definition allows for both views of the relationship between complexity and equilibrium histories discussed above. To allow for the view that the complexity of a strategy depends on the anticipated equilibrium path, we must take account of the fact that different strategies will generate different infinite paths. Since the path generated depends on σ_{-i}, the complexity of σ_i can depend on σ_{-i}. However, we never require β_i to depend nontrivially on h* and thus the complexity of σ_i need not depend on σ_{-i}. Hence our formulation is consistent with the view that complexity is a primitive notion.

(2) It may seem odd that we define the ordering on sequences rather than on their limits. However, as can be seen in the examples below, normalizing the sums (say, by dividing by N or N²) will not yield strict
comparisons of many strategies. Any such normalization will lead to most strategies having a limiting complexity measure of either zero or infinity.

(3) This ordering shares an interesting property with the number of states measure. If σ_i and σ_i' are two strategies and ℋ(σ_i) is a refinement of ℋ(σ_i'), then σ_i has more states than σ_i'. Also, it is clear that K^N(σ_i') ⊂ K^N(σ_i) for all N, so that σ_i is more complex than σ_i' under either measure.

(4) The ordering depends only on counting one-perturbations changing the induced strategy. This can lead to difficulties. For example, consider the strategy which chooses C on every history of odd length and D otherwise and the strategy which plays CDCCDCCCD···. For either strategy, the only one-perturbations changing the induced strategy are those changing the length of the history. Intuitively, though, the second is more complex. This problem can be eliminated by considering "higher-order" perturbations. A two-perturbation can be defined as a one-perturbation of a one-perturbation; n-perturbations can be defined in the obvious manner. It is natural to think of a one-perturbation as "smaller" than a 10-perturbation, so we should put more weight on the former than the latter. One way to do this is to compare the effects of perturbations lexicographically, comparing the effects of two-perturbations only when the strategies are tied on one-perturbations, and so on. It is easy to show that the second strategy has many more two-perturbations changing the induced strategy than the first. In this paper, we only compare the effects of one-perturbations since this is sufficient for our results.

To provide some insight into both the computation and implications of our measure, consider the examples in Fig. 2. Since these strategies depend only on the action of the opponent, we compute the measure as if the history of the opponent's actions is the entire history. Also, for simplicity, these computations take β_i to be identically 1. (It is straightforward to show that the same comparisons hold as long as the β_i's are uniformly bounded away from zero on at least N³ histories for each N.) Clearly, the constant strategy is the simplest strategy since no one-perturbation changes the induced strategy. We obtain the following ranking of the other strategies: LEAVE ON D > TIT FOR TAT > GRIM TRIGGER.

Consider GRIM TRIGGER. If h has two or more defections, no one-perturbation changes the induced strategy since t(h) must contain at least one D. If h has exactly one D, then the only one-perturbations that change the induced strategy are those which change this D to a C or eliminate the D. There are exactly n histories of length n with only one D. Finally, consider the n-length history of all C's. There are exactly 2n + 1 one-perturbations changing the induced strategy: changing any C to a D or inserting a D in any of n + 1 places. Therefore,

Ψ^N = Σ_{n=0}^{N} (2n + 1) + Σ_{n=1}^{N} 2n = 2N² + 3N + 1.
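The count just derived is easy to check by brute force. In the sketch below (ours; it assumes β_i ≡ 1 and opponent-only histories over {C, D}, and it identifies induced strategies with the states of a minimal automaton, which is valid for GRIM TRIGGER), Ψ^N matches 2N² + 3N + 1.

from itertools import product

def one_perturbations(h):
    """All one-perturbations of h over {C, D}, with multiplicity."""
    out, n = [], len(h)
    for r in range(n):
        out.append(h[:r] + h[r + 1:])                                   # delete
        out.append(h[:r] + ("D" if h[r] == "C" else "C") + h[r + 1:])   # change
    for r in range(n + 1):
        for b in "CD":
            out.append(h[:r] + b + h[r:])                               # insert
    return out

def grim_state(h):
    """Minimal-automaton state of GRIM TRIGGER after opponent history h;
    induced strategies at h and h' coincide iff the states coincide."""
    return "D" in h

def psi(state, N):
    """Brute-force Psi^N with weights identically 1."""
    total = 0
    for n in range(N + 1):
        for h in map("".join, product("CD", repeat=n)):
            total += sum(state(t) != state(h) for t in one_perturbations(h))
    return total

for N in range(7):
    assert psi(grim_state, N) == 2 * N**2 + 3 * N + 1
print("Psi^N matches 2N^2 + 3N + 1 for N = 0..6")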
[Figure 2: Automata for the constant strategy, GRIM TRIGGER, TIT FOR TAT, and LEAVE ON D.]
TIT FOR TAT is also straightforward to compute. Consider any history of length n where the opponent's last two actions are the same. In this case, the only one-perturbations that alter the induced strategy are those which change the last action by the opponent or insert the opposite of the last action at the end. If the last two actions are not the same, then there are three ways to alter the induced strategy since one can also delete the last component. There are 2^{n−1} histories of each type, so
Ψ^N = 1 + Σ_{n=1}^{N} [2^{n−1}(2) + 2^{n−1}(3)] = 5(2^N) − 4.
Clearly, then, TIT FOR TAT > GRIM TRIGGER. This ranking emerges because while TIT FOR TAT only conditions on a small part of a given history, it conditions nontrivially for every history. By contrast, GRIM TRIGGER is constant on most histories.

LEAVE ON D depends only on the number of D's in the history. If h has k D's and n − k C's, there are 2n + k + 1 one-perturbations altering the induced strategy: changing any component, eliminating any D, or inserting a D anywhere. The number of such histories is the binomial coefficient C(n, k). Hence

Ψ^N = Σ_{n=0}^{N} Σ_{k=0}^{n} C(n, k)(2n + k + 1) = Σ_{n=0}^{N} 2^{n−1}(5n + 2),
implying LEAVE ON D > TIT FOR TAT. Even though these strategies have the same graph up to labeling of arcs, LEAVE ON D is more history dependent than TIT FOR TAT. The latter conditions only on the last component of the history, while the former requires knowledge of both the last component and the previous state. We note that it is possible to redefine LEAVE ON D by conditioning on one's own past action in a way which makes it a first-order Markovian strategy. This does not alter the rankings above.

We turn to the implications of our complexity measure for equilibrium strategies in repeated games when players trade off game payoffs and complexity costs. It is immediate that in any such equilibrium, each player must choose the simplest strategy which produces his equilibrium actions. Hence we need to characterize the simplest way to produce equilibrium actions given β_i. Let h* be the proposed equilibrium path. Let h_i* denote the element of A_i^∞ corresponding to i's actions in h* and let H* denote the set of all subhistories of h*:

H* = {h ∈ H: h · h' = h* for some infinite history h'}.
For each n, let h_n* denote the (unique) n-length history in H*. Define s_i: H* → A_i by s_i(h_n*) = P_{n+1}(h_i*) for each h_n*. Let S_i(h*) denote the set of strategies for i whose restriction to H* equals s_i and define S_{-i}(h*) analogously.

DEFINITION. σ_i ∈ S_i(h*) is a simplest strategy for h* given β_i and σ_{-i} ∈ S_{-i}(h*) if there is no other strategy σ_i' ∈ S_i(h*) such that σ_i >_{β_i,σ_{-i}} σ_i'.
Note that different specifications of β_i generate different complexity rankings. Hence characterizing the simplest strategies for h* requires us to make some assumptions on the β_i's. A natural restriction is to require β_i(t, h, h*) > 0 whenever h ∈ H*. Our main result requires a little more structure. We show that only requiring positive weight on equilibrium histories and histories "near" the equilibrium path leads to a powerful result. Under this condition, we show that the set of simplest strategies in S_i(h*) is the set of GRIM TRIGGER strategies, GT_i(h*), defined by: σ_i ∈ GT_i(h*) if there exists a_i ∈ A_i such that

σ_i(h) = s_i(h) if h ∈ H*;  a_i otherwise.
The following theorem shows that these are the simplest strategies to produce a given equilibrium path. The first hypothesis of the theorem requires each player to switch actions an infinite number of times along the path (this does not restrict our analysis of equilibrium payoffs). The second hypothesis of the theorem is that β_i(t, h, h*) is uniformly bounded away from zero for all h ∈ Ĥ(h*), where Ĥ(h*) is defined as follows. For any set of histories, H', let T(H') = {h: h = t(h'), h' ∈ H'}. Then

Ĥ(h*) = H* ∪ T(H*) ∪ T(T(H*)).

Hence Ĥ(h*) is the set of histories "close" to the equilibrium path: the set of equilibrium histories and the histories which are one- and two-perturbations of equilibrium histories. As discussed above, it is not clear that histories far from the equilibrium path should "count" in the determination of complexity; our result only requires strictly positive weight on histories in Ĥ(h*).

THEOREM 1. If
(1) for all n, there exists k ≥ n with P_k(h_i*) ≠ P_{k+1}(h_i*),
(2) there exists b_i > 0 such that β_i(t, h, h*) ≥ b_i for all (t, h) with h ∈ Ĥ(h*),
then the set of simplest strategies for i for h* given β_i and σ_{-i} ∈ S_{-i}(h*) is GT_i(h*).

Proof. See Appendix.
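As an illustration of the GT_i(h*) construction (a sketch under our own encoding, with a hypothetical cyclic path of the kind used in the proof of Theorem 2 below), the strategy follows the path on its subhistories and plays a single fixed action everywhere else.

def make_grim_trigger(path, punish):
    """Build sigma_i in GT_i(h*): given the path as a function
    path(n) -> action profile of period n+1, play i's prescribed action
    after any subhistory of the path, and the fixed action `punish`
    at every other history."""
    def strategy(i, history):
        on_path = all(a == path(n) for n, a in enumerate(history))
        if on_path:
            return path(len(history))[i]   # s_i(h_n*): i's next path action
        return punish                      # constant off-path action a_i
    return strategy

# Hypothetical cyclic path: four periods of a1, one period of a2, repeated.
a1, a2 = ("C", "C"), ("D", "D")
cycle_path = lambda n: a1 if n % 5 < 4 else a2

sigma = make_grim_trigger(cycle_path, punish="D")
print(sigma(0, [a1, a1]))           # on path: next action is C
print(sigma(0, [a1, ("C", "D")]))   # off path: punish with D forever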
To see the intuition behind this result, note that our definition of complexity measures how finely the strategy distinguishes between histories. The requirement that the player follow the equilibrium path forces him to make certain distinctions between equilibrium histories. However, it does not require him to distinguish among disequilibrium histories and, consequently, so long as enough disequilibrium histories receive weight in the analysis, it is not optimal for the player to do so. It is worth emphasizing that one does not need to include many disequilibrium histories for this to be true. With a bimatrix game, the number of n-length histories is 4^n, while the number of n-length histories in Ĥ(h*) is much smaller, approximately 16n².

Theorem 1 is easily extended to the case where strategies are a function of the opponents' actions only. One can replace H with H(i), where H(i) gives the history except for i's action, and redefine strategy sets, etc., in
the obvious way. The proof of Theorem 1 is entirely unaltered for this formulation except that A_{-i} replaces A everywhere.

The proof depends critically on our assumptions about β_i. If, for example, β_i(t, h, h*) = 0 whenever t changes the length of h, then the simplest strategy is σ_i(h) = s_i(h_k*), where k is the length of h. As another example, suppose β_i(t, h, h*) = 1 if h ∈ H* and zero otherwise. Then Ψ^N is quadratic in N for any GRIM TRIGGER strategy. However, if the equilibrium can be played by a strategy which only conditions on the last k components of the history, the only one-perturbations of equilibrium histories changing the induced strategy are those affecting these k components. It is easy to show that this implies that Ψ^N is linear in N for such strategies. What form the simplest strategy takes with such β_i's is an interesting open question. Note that this points to a discontinuity in the correspondence giving simplest strategies as a function of β_i. Suppose β_i(t, h, h*) equals 1 if h is an equilibrium history, ε if h is not an equilibrium history but is in Ĥ(h*), and zero otherwise. Then for all ε > 0, the simplest strategies are the GRIM TRIGGER strategies. However, if h* can be played by a finite-memory strategy, this is not true at ε = 0.
IV. EQUILIBRIUM PAYOFFS
We turn to the implications of our complexity measure for Nash equilibria where players trade off game payoffs and complexity costs lexicographically. We make the following assumptions.

(A1) i strictly prefers (σ_i', σ_{-i}) to (σ_i, σ_{-i}) if u_i^δ(σ_i', σ_{-i}) > u_i^δ(σ_i, σ_{-i}).
(A2) i strictly prefers (σ_i', σ_{-i}) to (σ_i, σ_{-i}) if u_i^δ(σ_i, σ_{-i}) = u_i^δ(σ_i', σ_{-i}) and σ_i >_{β_i,σ_{-i}} σ_i'.

(A1) and (A2) do not completely specify i's preferences over strategy tuples, but are sufficient to characterize equilibrium payoffs. We assume that β_i satisfies (2) of Theorem 1 for all i. A Nash equilibrium in the complexity game is σ such that there is no i, σ_i' with (σ_i', σ_{-i}) strictly preferred to (σ_i, σ_{-i}). Aside from the complexity measure, our analysis differs from that of Rubinstein (1986) and Abreu and Rubinstein (1988) in that we are not limited to two-player games and in that Abreu and Rubinstein do not impose the lexicographic assumption. We discuss these assumptions below. For any a' and i, let u_i*(a') = max_{a_i ∈ A_i} u_i(a_i, a'_{-i}), let u*(a') be the vector of these payoffs, and let U be the convex hull of u(A).
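The objects just defined, together with the pure strategy minimax payoffs and the "mutual minimax" condition used below, are straightforward to compute for small games. A sketch (ours; the Prisoners' Dilemma payoffs are hypothetical stand-ins):

from itertools import product

def u_star(A, u, a):
    """Vector of best unilateral-deviation payoffs u_i*(a)."""
    return tuple(
        max(u[a[:i] + (ai,) + a[i + 1:]][i] for ai in A[i])
        for i in range(len(A))
    )

def pure_minimax(A, u, i):
    """i's pure strategy minimax payoff: opponents jointly minimize
    i's best-response payoff."""
    others = product(*(A[j] for j in range(len(A)) if j != i))
    def full(a_minus, ai):
        a = list(a_minus)
        a.insert(i, ai)
        return tuple(a)
    return min(max(u[full(am, ai)][i] for ai in A[i]) for am in others)

A = (("C", "D"), ("C", "D"))
u = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
     ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

minimax = tuple(pure_minimax(A, u, i) for i in range(2))          # (1, 1)
mutual = any(u_star(A, u, a) == minimax for a in product(*A))
print(minimax, mutual)  # (1, 1) True: (D, D) mutually minimaxes here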
THEOREM 2. For all v ∈ U such that v > u*(a) for some a, and for all sufficiently small ε > 0, there is an equilibrium in the complexity game for δ sufficiently large with payoffs within ε of v.
Proof. Since v ∈ U, there exist a^1, . . . , a^k such that for each i there is a j with a_i^j ≠ a_i^{j+1} and where the arithmetic average of the payoffs over the sequence is close to v. Hence, for large δ, repeating the cycle infinitely yields payoffs close to v. Let h* be this infinite history and define H* as above. For each i, let σ_i* ∈ GT_i(h*) satisfy σ_i*(h) = a_i for all h ∉ H*. By Theorem 1, each i is choosing a simplest strategy in S_i(h*). We now prove that these strategies form a standard Nash equilibrium. Consider any deviation by i in period m where m' is the beginning of the next cycle along the equilibrium path. Then the deviation yields a gain of less than m' − m times some finite number. However, it yields an average per period loss of v_i − u_i*(a) > 0 from m' onward. For δ sufficiently close to one, the loss must exceed the gain. ∎
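To see the gain/loss comparison in the last step numerically, consider the familiar standard-game deterrence calculation for grim trigger in a Prisoners' Dilemma with hypothetical payoffs (our sketch, not the paper's example):

# Hypothetical payoffs: cooperation yields 2 per period, a one-period
# deviation yields 3, and mutual punishment yields 1 thereafter.
# Deviating is unprofitable iff (1 - d)*3 + d*1 <= 2, i.e. d >= 1/2.
coop, deviate, punish = 2.0, 3.0, 1.0

def deviation_profitable(d):
    # normalized payoff of deviating once vs. conforming forever
    return (1 - d) * deviate + d * punish > coop

print(deviation_profitable(0.4))  # True: deviation pays for small delta
print(deviation_profitable(0.6))  # False: deterred once delta >= 1/2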
Clearly, repeating any one-shot Nash equilibrium is also an equilibrium of the complexity game. It can also be shown that for any v ∈ U, if for all a there exists i such that v_i < u_i*(a), then v is not near an equilibrium payoff of the complexity game for large δ. Hence the payoffs described by Theorem 2 plus the one-shot Nash payoffs are essentially (with the exception of boundary problems discussed in Lipman and Srivastava (1987)) all the equilibrium payoffs in the complexity game for large δ.

Let v̲ denote the (pure strategy) minimax payoffs. The mutual minimax condition (Fudenberg and Maskin, 1986) states that there exists a such that u*(a) = v̲. If this condition holds, Theorem 2 implies that every feasible, individually rational payoff is an equilibrium payoff in the complexity game. This is the standard Folk Theorem, except that individual rationality refers to the pure strategy minimax payoffs. This condition holds in all two-player games, Cournot oligopoly, and the n-person Prisoners' Dilemma.

The key to Theorem 2 is that σ_i*|h is the same constant strategy for every h ∉ H*, so every deviation must be punished the same way. Under (A1) and (A2), we then only need to consider payoffs which can be supported by GRIM TRIGGER strategies in the standard game. This implies that in the repeated Prisoners' Dilemma, cooperation forever is not an equilibrium when complexity matters but cooperate except every billion periods is. This is paradoxical in that complexity considerations rule out a simple equilibrium path but not the more complex path. This is due to the lexicographic assumption and would not occur with different assumptions. For example, consider "reverse" lexicographic preferences, where players prefer less complex strategies and only consider game payoffs when two strategies are equally complex. Clearly, such players will only use con-
stant strategies, so eternal defection is the only equilibrium in the repeated Prisoners' Dilemma.

It is difficult to characterize attainable payoffs with nonlexicographic preferences. However, it can be shown (as in Theorem 1(a) of Abreu and Rubinstein) that if I = 2 and β_1 = β_2, a payoff is attainable with some monotonic preferences iff it is attainable with lexicographic preferences and hence iff it is a Folk Theorem payoff. What is surprising is that there are attainable payoffs with nonlexicographic preferences which are not Folk Theorem payoffs. Consider the following example in which player 1 chooses the row, player 2 the column, and player 3 the matrix, and all β_i's satisfy the conditions of Theorem 1.
Players 1 and 2 have lexicographic preferences. Player 3 has reverse lexicographic preferences. Players 1 and 2 play a^1 for n periods, a^2 for one period, and then repeat. Player 3 plays a^1 forever. Upon any deviation, players 1 and 2 play a^2 forever. These strategies form a Nash equilibrium in the complexity game. If player 1 or 2 deviates, his payoff is 0 in the period of deviation and 1 in each subsequent period. Since he gets 1 every period in equilibrium, he must be worse off for any δ < 1. Theorem 1 now implies that players 1 and 2 are using best replies. Since player 3 cares first about complexity, the only alternative strategy he will consider is a^2 forever. This yields 1 in the period of deviation and −100 in each succeeding period, making him worse off for large δ. This is not a Nash equilibrium in the standard game as player 3 could play a^2 in the first period and a^1 thereafter. Player 3's pure strategy minimax payoff is 1, while his equilibrium payoff is v/(1 − δ^{n+1}) < 1 for large n. Hence this payoff vector is outside the set described in Theorem 2. His correlated minimax payoff is strictly positive, so this payoff is not even individually rational. This is also an equilibrium payoff with the number of states measure: players 1 and 2 use (n + 1)-state strategies which play the equilibrium cycle and transit to the a^2 state upon any deviation. One can show that this is an equilibrium in the Abreu-Rubinstein complexity game with these preferences.

To conclude, we consider the problem of subgame perfect equilibria in the complexity game with lexicographic preferences. Subgame perfection
poses difficult problems in the context of complexity games (see Abreu and Rubinstein, 1988). One problem with the number of states measure is that equilibria may not be subgame perfect along the equilibrium path. Since every state must be used in equilibrium of this complexity game, players typically have to employ "setup" states. These states are used to punish the opponent if he deviates. Since they must be used along the equilibrium path, each player begins with a series of defections, "proving" that the punishment states are available, but these states are never used again. Hence if a player could drop these states after this phase, he would, so the induced strategies do not form an equilibrium in the induced game. In contrast, this is not a problem with our measure: every strategy induced by GRIM TRIGGER is the simplest way to produce the rest of the equilibrium path. If these strategies form a subgame perfect equilibrium of the standard game, they form a subgame perfect equilibrium of the complexity game. This holds iff the punishment actions are a static Nash equilibrium. Formally, σ is a subgame perfect equilibrium of the complexity game if for every h, σ|h is a Nash equilibrium of the complexity game. For a different approach to the issue of subgame perfection and complexity, see Kalai and Neme (1989).

THEOREM 3. For all feasible v > u*(a) for some a ∈ NE(G), and for all sufficiently small ε > 0, there is a subgame perfect equilibrium in the complexity game for δ sufficiently large with payoffs within ε of v.
This is precisely the Friedman-type Folk Theorem (see Fudenberg and Maskin, 1986, Theorem B; Friedman, 1971). If there is a Nash equilibrium a with payoffs u*(a) = v̲, then this is the same as the standard subgame perfect Folk Theorem. The Prisoners' Dilemma, for example, satisfies this condition. One can show that the payoffs described in Theorem 3 plus the one-shot Nash payoffs are essentially the entire set of perfect equilibrium payoffs in the complexity game for large discount factors.
APPENDIX: PROOF OF THEOREM 1
Notation. We drop i subscripts. Let σ* ∈ GT(h*). For any n and j, let h_j*(n) be the unique j-length history such that h_n* · h_j*(n) = h_{n+j}*. Let M = #A and let a^k be a repeated k times. We show that if σ ∈ S(h*)\GT(h*), then σ >_{β,σ_{-i}} σ*. Suppose not.

LEMMA 1. Ψ^N(σ*) ≤ 2MN² + 4MN + 2M.

Proof. If h ∈ H*, then t(h) ∉ E_{σ*}(h). First, if t(h) ∉ H*, then σ*|t(h) is a constant strategy. Since the equilibrium path never becomes constant, σ*|h ≠ σ*|t(h). Second, if t(h) ∈ H*, t(h) must be either the history before or the history after h on the equilibrium path. Then t(h) ∈ E_{σ*}(h) iff h' · a ∈ E_{σ*}(h') for some a and h' ∈ H*. But then for any k, h' · a^k ∈ E_{σ*}(h'). However, h' · a^k ∉ H* for large enough k by assumption, so t(h) ∉ E_{σ*}(h). Similarly, if h ∉ H* and t(h) ∈ H*, then t(h) ∉ E_{σ*}(h). However, if t(h) ∉ H*, the induced strategy at t(h) is the same constant strategy as that at h, so t(h) ∈ E_{σ*}(h). Summarizing, t(h) ∉ E_{σ*}(h) iff h ∈ H* or t(h) ∈ H*. The number of such perturbations is less than twice the number of one-perturbations of equilibrium histories. There are 2Mn + M one-perturbations of an n-length history, so

Ψ^N(σ*) ≤ 2 Σ_{n=0}^{N} (2Mn + M) = 2MN² + 4MN + 2M. ∎

DEFINITION A1. A set of histories E is absorbing if for every h ∈ E and every h' ∈ H, h · h' ∈ E.

LEMMA 2. There exists h ∈ T(H*)\H* such that h ∈ E_σ(h') for some h' ∈ H*.

Proof. Suppose not. As in Lemma 1, we can establish that t(h) ∉ E_σ(h) if h ∈ H* or if t(h) ∈ H*; these are the same one-perturbations changing the induced strategy as for σ*. However, there exist additional (t, h)'s such that t(h) ∉ E_σ(h), since σ ∉ GT(h*) implies either (1) there is h ∉ H* which is not in an absorbing state or (2) there is more than one absorbing state. In case (1), suppose h' ∉ H* is not in an absorbing state. Clearly, h' = h_k* · a · h'', where h_k* is the equilibrium history of length k, but a ≠ h_1*(k). Clearly, h_k* · a ∈ Ĥ(h*). Since h' is not in an absorbing state, neither is h_k* · a, so there exists a' such that h_k* · a · a' ∉ E_σ(h_k* · a). Hence σ >_{β,σ_{-i}} σ*, a contradiction. In case (2), suppose E_1 ≠ E_2 are two absorbing states. Since (1) does not hold, ℋ(σ*) is strictly coarser than ℋ(σ). Therefore, σ* is no more complex than σ. Further, there exist m and n such that h_m* · a ∈ E_1 and h_n* · b ∈ E_2. If m ≥ n, then h_m* · b ∈ E_2 is a one-perturbation of h_m* · a followed by the rest of h_n* · b. Note that the history being perturbed is an element of T(H*) and is in E_1. Hence σ >_{β,σ_{-i}} σ*, a contradiction. ∎

DEFINITION A2. Histories h', h'' ∈ T(H*) are distinct if neither is a subhistory of the other.

LEMMA 3. {h ∈ T(H*): h ∈ E_σ(h'), h' ∈ H*} has finitely many distinct elements.

Proof. Suppose not. Then for all K, there exist n and h_1, . . . , h_K, all distinct, with h_j ∈ T(H*) ∩ E_σ(h_n*) for all j. By (1) of the theorem, h_i · h_j*(n) is not in an absorbing state for all i, j. Also, all these histories are distinct. Let (t, h_i · h_j*(n)) change one component of h_j*(n). Either (1) t(h_i · h_j*(n)) ∉ E_σ(h_i · h_j*(n)) or (2) E_σ(t(h_i · h_j*(n))) is not absorbing. Case (1) yields a one-perturbation changing the induced strategy. In case (2), there exists a such that t(h_i · h_j*(n)) · a ∉ E_σ(t(h_i · h_j*(n))). For every i and j, this yields j one-perturbations of histories in Ĥ(h*) changing the induced strategy. Since the h_i · h_j*(n) are distinct, these one-perturbations are distinct. Summing over i and j, we see that Ψ^N ≥ KβN²/2 (plus a constant times N plus another constant). Choosing K ≥ 5(2M + 1)/β, Lemma 1 implies that σ >_{β,σ_{-i}} σ*, a contradiction. ∎

Next, note that Lemma 3 implies there exists n such that for all j and t,

h_n* · t(h_j*(n)) ∉ E_σ(h_{n+j}*).   (*)

By Lemma 2, there exists h' ∈ T(H*) ∩ E_σ(h_n*). Let q < ∞ be the number of one-perturbations of h_n* or a subhistory of h_n* yielding a history in an equilibrium state. We may have t(h_n*) · h_j*(n) ∈ E_σ(h_n* · h_j*(n)), but this cannot occur for σ*. The number of such perturbations and their inverses (i.e., changing the history in an equilibrium state into h_n* or a subhistory) is at most 2qN. Every other one-perturbation changing the induced strategy for GRIM TRIGGER must also affect this strategy. Any such one-perturbation changes h ∈ H* to t(h) ∉ E_σ(h') for any h' ∈ H*, or changes h with h ∉ E_σ(h') for any h' ∈ H* to t(h) ∈ H*. On the other hand, consider a one-perturbation of the form h' = t(h_j*(n)) where this changes a component. Since h' ∈ E_σ(h_n*), h' · t(h_j*(n)) ∉ E_σ(h' · h_j*(n)) iff h_n* · t(h_j*(n)) ∉ E_σ(h_n* · h_j*(n)). Hence, by (*), h' · t(h_j*(n)) ∉ E_σ(h' · h_j*(n)). However, h' · t(h_j*(n)) ∈ E_{σ*}(h' · h_j*(n)) as both are in the absorbing state. Thus for each j, we get (M − 1)j one-perturbations changing the induced strategy for σ and not σ*. Summing over j implies that there are at least (M − 1)N²/2 such one-perturbations. Since β(M − 1)N²/2 > 2qN for large enough N, σ >_{β,σ_{-i}} σ*, a contradiction. ∎

REFERENCES

ABREU, D. (1988). "On the Theory of Infinitely Repeated Games with Discounting," Econometrica 56, 383-396.
ABREU, D., AND RUBINSTEIN, A. (1988). "The Structure of Nash Equilibrium in Repeated Games with Finite Automata," Econometrica 56, 1259-1282.
AUMANN, R. (1981). "Survey of Repeated Games," in Essays in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern. Zurich: Bibliographisches Institut.
BINMORE, K. (1987-1988). "Modeling Rational Players: Parts I and II," Econ. Philos. 3-4.
FRIEDMAN, J. (1971). "A Noncooperative Equilibrium for Supergames," Rev. Econ. Stud. 38, 1-12.
FUDENBERG, D., AND MASKIN, E. (1986). "The Folk Theorem in Repeated Games with Discounting and with Incomplete Information," Econometrica 54, 533-554.
KALAI, E., AND NEME, A. (1989). "The Strength of a Little Perfection," Northwestern University working paper.
KALAI, E., AND STANFORD, W. (1988). "Finite Rationality and Interpersonal Complexity in Repeated Games," Econometrica 56, 397-410.
LIPMAN, B., AND SRIVASTAVA, S. (1987). "Informational Requirements and Strategic Complexity in Repeated Games," Carnegie Mellon University working paper.
MEGIDDO, N., AND WIGDERSON, A. (1985). "On Play by Means of Computing Machines," working paper.
ROSENTHAL, R. (1979). "Sequences of Games with Varying Opponents," Econometrica 47, 1353-1366.
RUBINSTEIN, A. (1986). "Finite Automata Play the Repeated Prisoners' Dilemma," J. Econ. Theory 39, 83-96.
SIMON, H. (1955). "A Behavioral Model of Rational Choice," Quart. J. Econ. 69, 99-118.
SIMON, H. (1976). "From Substantive to Procedural Rationality," in Method and Appraisal in Economics (S. Latsis, Ed.), pp. 129-148. Cambridge: Cambridge Univ. Press.