Journal of Mathematical Psychology

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/jmp
From the Luce Choice Axiom to the Quantal Response Equilibrium

Daniel T. Jessie, Donald G. Saari ∗

International Institute for Applied Systems Analysis, Laxenburg, Austria
Institute for Mathematical Behavioral Sciences, University of California, Irvine, CA 92697-5100, United States
Highlights
• Historical development of Luce’s choice theory in econometric and game theory models.
• New mathematical analysis uniquely decomposes games into selfish/cooperative aspects.
• Shows common behavioral model ignores cooperative aspects to which subjects respond.
Abstract

The Luce Choice Axiom, which has led to advances in several areas, currently is being used to explain subject behavior from experimental game theory. Our critical analysis of one such approach, called the Quantal Response Equilibrium, uses a new way to uniquely decompose games into the portion that encourages individuals to seek personally preferred payoffs and the portion that requires cooperation among players. An open problem concerning how to model the actual behavior of players is described.

© 2015 Elsevier Inc. All rights reserved.
1. Introduction

Among R.D. Luce’s remarkable contributions during the 1950s are two of his books. The first, coauthored with H. Raiffa (Luce & Raiffa, 1957), is on game theory, a theme primarily directed toward understanding group decisions. This was not the first book in this area, but it might be the first generally readable one. The influence this publication has had on subsequent contributions to game theory cannot be overstated. This was strongly demonstrated during a January 2008 conference entitled ‘‘Luce and Raiffa after 50 years: What is next?’’ held at the IMBS, the Institute for Mathematical Behavioral Sciences (Luce was the IMBS founding director), when several prominent speakers, including four Nobel recipients, expressed their debt to this book. (Videos of their presentations are on the conference link of http://www.imbs.uci.edu.)

In the different direction of individual decisions, in 1959 Luce (1959) introduced his choice axioms (LCA). Again, this small book has proved to be a foundational piece of mathematical psychology that has been applied to numerous research areas. An example of the impact it has had beyond psychology is how, as described below, McFadden credits the influence of LCA on his Nobel Prize-winning research.
∗ Corresponding author at: Institute for Mathematical Behavioral Sciences, University of California, Irvine, CA 92697-5100, United States.
E-mail addresses: [email protected] (D.T. Jessie), [email protected] (D.G. Saari).
http://dx.doi.org/10.1016/j.jmp.2015.10.001
0022-2496/© 2015 Elsevier Inc. All rights reserved.
Luce’s 1977 survey paper (Luce, 1977) outlined areas in which LCA had been applied, and since that twentieth anniversary article the use of LCA has expanded into other disciplines. Of particular interest is a recent development that connects these two interests of Luce from the 1950s. This is where LCA, which emphasizes individual decisions, is being combined with game theory, which describes group decisions. A goal of this article is to trace a particular thread of this new theme to indicate why LCA is being used in experimental game theory to model subject behavior and, in particular, with models of bounded rationality such as the Quantal Response Equilibrium (QRE) (McKelvey & Palfrey, 1995).

This new development reflects the central role played by the Nash Equilibrium in game theory. To review, each agent attempts to optimize personal payoffs, where the set of options available to an agent is determined by the other players’ choices. A Nash point is where each player’s decision (which affects what other players can do) defines a personal optimal value. This ‘‘personally optimal’’ property creates a sense of stability because it is to a player’s disadvantage to unilaterally change strategy. This optimization property makes it reasonable to expect that the players’ choices will gravitate to a Nash equilibrium. Indeed, the acceptance of this Nash behavior has helped to advance our understanding of several disciplines. But a serious limitation, which could even question the reliability of its practical applications, comes from experiments: It is repeatedly noted that players do not necessarily adopt the Nash equilibrium! Thus a
new and central research issue is to understand why this is so. To address this concern, notice that each agent’s choice is an individual decision. This observation already suggests why versions of LCA are being incorporated into game theory. Justification for using LCA in this manner reflects Luce’s comments (Luce, 1977) when he addressed the use of his axiom for modeling subject behavior: ‘‘Although it is clear from many experiments that the conditions under which the choice axiom holds are surely delicate, the need for simple, rational underpinnings in complex theories, as in economics and sociology, leads one to accept assumptions that are at best approximate’’. A concern explored here is whether, in this relatively new research direction from experimental game theory, these LCA approximations lead to reasonable predictions of subject behavior, or whether the approaches must be seriously modified. This paper has two main parts. The first is a brief historical outline of the choice axiom to indicate how LCA came to be used in game-theoretic choice models. The second analyzes these models; it shows how Luce’s statement of nearly forty years ago applies to this current research. A twist on this discussion is that our main tool comes from our recent discovery (Jessie & Saari, 0000) of how to uniquely decompose games into their strategic and behavioral components.
2. Historical outline

2.1. LCA
Our description of LCA reflects that of Saari (2005), which describes ways to generalize LCA and offers a perspective of how LCA was incorporated into game-theoretic choice. The basic idea (Saari, 2008, Chap. 1) comes from comparing Arrow’s ‘‘Impossibility Theorem’’ (Arrow, 1951) (also published in the 1950s) with LCA. This is done by modeling each approach as a mapping from a domain, which, typically, is the space of all complete transitive preferences, to the range or outcome space, which also is the space of complete transitive preferences. With individual decision theory, however, the domain describes the likelihood an individual will select a particular ranking. What allows Arrow’s and Luce’s approaches to be compared is that these probabilities can be identified with the proportion of voters who have these particular preferences, which is the kind of domain used in group decision theory.

From the three components of (1) the domain, (2) the mapping, and (3) the range, Arrow emphasized the structure of the mappings. By imposing what seemed to be reasonable conditions, Arrow proved his negative impossibility theorem asserting that no such mapping exists; only decisions made by a single individual (a dictator) would always satisfy his conditions. The problem is caused by Arrow’s Independence of Irrelevant Alternatives (IIA), which is a consistency condition whereby each pair’s pairwise ranking agrees with its relative ranking within the full ranking of all alternatives. Subsequent research tended to treat Arrow’s conditions on mappings as the fixed given; the structure of the domain and range were modified to seek positive conclusions: While no reasonable theoretical approach has emerged with IIA, a slight change of IIA admits usable methods (Saari, 2008, Chap. 2; Saari, 2015).

Luce, on the other hand, obtained positive conclusions by emphasizing the ‘‘range’’ (rather than the ‘‘mapping’’) component; he required the outcome space to satisfy particular, desired properties. In describing his theory, Luce used specific mappings (that is, specific probability measures) with accompanying domain restrictions, but all of this can be generalized. To do so, treat LCA as the fixed given: it defines the space of outcomes. The choice of mappings (probability measures) with their associated domain restrictions become variables that are free to be determined as long as they are consistent with LCA. In this manner, a richer collection of mappings and properties, along with answers to some questions that Luce raised, follow (Saari, 2005).

To review, Luce considered settings where a subject places weights, or intensities, on each alternative, and an alternative is chosen according to these weights. But not any choice of weights or probability measure is permitted; part of the strength of Luce’s theory is that it requires consistent decisions across the possible sets from which an item could be chosen, even sets that are currently unimaginable. The reason is, as Luce (2008) noted, ‘‘When a person chooses among alternatives, very often their responses appear to be governed by probabilities that are conditioned on the choice set. But ordinary probability theory with its standard definition of conditional probability does not seem to be quite what is needed’’. LCA removes this reliance on a specified choice set.

In formal notation, letters in the early part of the alphabet A, B, C, D correspond to alternatives, R, S, T represent sets of alternatives, and lower-case letters represent probabilities; e.g., PT(A) = a. The probability measure PT satisfies the usual axioms for space T:
(1) For S ⊂ T, 0 ≤ PT(S) ≤ 1.
(2) PT(∅) = 0; PT(T) = 1.
(3) If R, S ⊂ T and R ∩ S = ∅, then PT(R ∪ S) = PT(R) + PT(S).

For S ⊂ T, PS denotes the conditional probability measure defined over the set S, and P(A, B) represents P{A,B}(A), which is the probability of selecting A in S = {A, B}. The choice axiom imposes consistency by requiring the probabilities in any set T to determine all consistent subset outcomes. An important feature is that a universal set is not required; the axiom requires consistency as alternatives are added or subtracted from a choice set.
Axiom 1 (Luce’s Choice Axiom). For any n ≥ 2, let T = {A1, A2, . . . , An} be a set of n alternatives. A probability measure PT satisfies the choice axiom if the following are true:
(1) For every non-empty S ⊂ T, PS is defined.
(2) If P(Ai, Aj) ≠ 0, 1 for all Ai, Aj ∈ T, then for R ⊂ S ⊂ T, PT(R) = PS(R) PT(S).
(3) If P(Ai, Aj) = 0 for some Ai, Aj ∈ T, then for every S ⊂ T, PT(S) = P_{T−{Ai}}(S − {Ai}).

The effect of LCA is to endow each alternative with an intrinsic level of likelihood that is independent of the particular set from which it is chosen. Mathematically speaking, in Luce’s formulation, the choice axiom implies the existence of a weight function v(A) for an alternative A in which the probability of selecting A can be written as

P_T(A) = e^{v(A)} / Σ_{B∈T} e^{v(B)},    (1)
where the sum in the denominator is taken over all alternatives in T (Luce, 1959). Because function v(·) does not depend upon the set T , it defines an intrinsic weight of the alternative. An important special case is when v(·) is a linear function; here Eq. (1) defines the multinomial logit model. Also, the problems that Arrow raised are avoided because whatever mapping and associated domain restriction are selected to accompany LCA, this intrinsic weight property requires the mapping to satisfy Arrow’s IIA condition.1
1 There are binary and general probability measures that have no domain restrictions (Saari, 2005).
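As a concrete illustration, Eq. (1) and the LCA consistency property can be sketched in a few lines of code. The function and weight values below are our own illustrative choices, not from the original:

```python
import math

def luce_probabilities(weights):
    """Eq. (1): P_T(A) = exp(v(A)) / sum of exp(v(B)) over all B in T."""
    exps = {a: math.exp(v) for a, v in weights.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

# Hypothetical intrinsic weights v(.) for three alternatives.
v = {"A": 1.0, "B": 0.5, "C": 0.0}
p_full = luce_probabilities(v)                              # choices from T = {A, B, C}
p_pair = luce_probabilities({a: v[a] for a in ("A", "B")})  # choices from S = {A, B}

# LCA consistency: P_{A,B}(A) = P_T(A) / (P_T(A) + P_T(B)), so removing C
# rescales the remaining probabilities without reordering them.
assert abs(p_pair["A"] - p_full["A"] / (p_full["A"] + p_full["B"])) < 1e-12
```

Because v(·) is set-independent, the ratio P_T(A)/P_T(B) is the same for every choice set containing both A and B; this is the intrinsic-weight property that forces Arrow’s IIA condition to hold.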
2.2. McFadden and the LCA

The interest of economists in LCA stems not from its ability to describe actual human behavior, but from its applicability to the discrete choices being made by a rational (i.e., optimizing) agent. Traditionally, economics had considered an agent’s choices as being represented in terms of real variables, such as the demanded quantity of butter or milk, which, with their divisibility properties, can be described as an element r ∈ R. Beginning around the 1970s, there was a growing interest in those decision problems that are more appropriately modeled by outcomes in a finite or countable set; e.g., level of education, the decision to get married, voting behavior, or a method of transportation. This setting is referred to as a quantal (or discrete) choice problem.

Microeconomic theory was concerned with the behavior of a rational agent, where rationality is defined in terms of a consistent, utilitarian decision procedure. With this structure, the failure to match empirical or experimental results was not seen as a flaw. This is because the theory centered around utility functions, which are not observable in the real world or in experiments. Given that LCA defines consistent choice procedures from finite sets, it offered a natural starting point for an economic theory of quantal choice. Indeed, McFadden (1976, 0000) described the foundation of his Nobel-winning research as working out an econometric model based upon Luce’s work. The result, which is widely used in the social sciences, was the multinomial logit model (Eq. (1) with a linear choice of v) and its relation to the random utility model. Limitations imposed by LCA (such as the requirement that IIA is satisfied, which Debreu (1960) noted) restrict its applicability, but the multinomial logit model is computationally simpler than many alternatives,2 and it is consistent with the classical economic setting of sampling from a population of utility maximizers.
In other words, LCA provides a theoretical foundation for the creation of a computationally tractable model of quantal choice that is consistent with the already well-developed theory of rational consumer choice.

3. Strategic choice and bounded rationality

Game theory differs from McFadden’s setting of individual choice by representing a situation of ‘‘mutually interdependent choices’’. This is where each player makes a decision, and this choice determines (in part) what outcomes are available to the other players, from which they make their choices. To illustrate these comments, consider the following game played between the Row and Column players.
G1 = [ (6, 6)   (0, 4)
       (4, −4)  (2, 0) ].    (2)
Each player’s payoff is determined by the combined choice of a row and column. If, for instance, Row chose Top and Column chose Right, the outcome is (0, 4) where Row receives 0 and Column receives 4. To indicate how a player’s choice restricts which options are available to the other players, if Row chose Top, then Column must select from the admissible outcomes of 6 (by selecting Left) and 4 (by selecting Right). Similarly, if Column selects Right, then Row’s options are between 0 (by selecting Top) and 2 (by selecting Bottom). So, rather than Top and Right, each player would receive a higher payoff by adopting a different strategy. But care is needed; if Column reacts by selecting Left while Row selects Bottom, Column is punished with the outcome of −4. The challenge is to identify whether optimal options exist; it is to determine how a rational player should select from
2 For generalizations, see Train (2003).
among the available strategies. What adds complexity is that each player must consider personally preferred payoffs and possible actions/reactions of opponents. To illustrate, if Column selected Right to obtain the payoff of 4, Row’s optimization reaction of selecting Bottom would leave Column with the payoff of 0, not 4.

John Nash’s solution, which has become a foundational piece of modern game theory, builds on this expectation that each player wants to maximize his or her personal payoff. The Nash equilibrium for an n-person game is where this happens simultaneously: Each player’s choice, (s1, s2, . . . , sn), attains a personal maximum, so no player’s payoff can be improved by unilaterally changing strategies. The challenge of this obviously desired setting was to discover whether it exists. To prove that it does, Nash allowed the choices to include probability distributions over options (such as in matching pennies where, rather than selecting Heads or Tails, the coin is flipped to ensure a random outcome). By doing so, Nash’s celebrated theorem proves that any game, with a finite number of players where each has a finite number of choices, always has at least one such equilibrium.

To determine the Nash outcome, remember that Row’s options to obtain a personal maximal payoff are subject to the choice made by Column. If Column adopts the probabilistic, or mixed, strategy choice of (q, 1 − q), where q ∈ [0, 1] is the likelihood of Column choosing Left, then Row selects from the following two expected values:

E(Top) = 6q + 0(1 − q) = 6q,
E(Bottom) = 4q + 2(1 − q) = 2q + 2.    (3)
According to these expressions, if q > 0.5, Row’s optimal choice is Top (often called the ‘‘Nash best response’’); if q < 0.5, Bottom is optimal. If q = 0.5, then Top and Bottom yield equal expected values, so Row is indifferent between either choice. Similarly, should Row’s mixed strategy be (p, 1 − p), where p ∈ [0, 1] is the likelihood of playing Top, then Column’s selection is between the two expected values of E(Left) = 6p + (1 − p)(−4) = 10p − 4 and E(Right) = 4p + (1 − p)(0) = 4p. Thus Column is indifferent if the two expected values agree, 10p − 4 = 4p, or if Row selects p = 2/3. But if Row plays Top with probability p > 2/3, then Column’s preferred strategy is to play Left. In this manner the (p, q) choices that define Nash equilibria are (1, 1), (0, 0), and (2/3, 1/2).

Assuming that each player seeks an optimal outcome, it is reasonable to expect that they would play according to Nash’s concept. But experimentally determined results, which may not be surprising to anyone who has taught a game theory course, show that subjects generally do not play according to the Nash equilibrium. This behavior may, for instance, reflect the complexity of computational or conceptual aspects of the Nash concept. Finding a mathematical theory to explain these deviations is an important open issue in modern experimental game theory, and a range of models has been proposed.

3.1. Quantal response equilibrium

A prominent example of a ‘‘bounded rationality’’ approach, which is directed toward providing a complexity explanation, is the quantal response equilibrium (QRE). To motivate QRE with Eq. (2), suppose Column’s strategy has probability q = 0.5 + ε with an arbitrarily small ε > 0 value. According to the above, Row’s maximal outcome requires playing p = 1 (Top). But should Column choose q = 0.5 − ε, then Row’s optimal choice flips to the other extreme to play p = 0 (Bottom).
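This best-response discontinuity is easy to verify numerically. A minimal sketch of the G1 analysis (helper names are ours, not from the paper):

```python
def expected_values_row(q):
    """Row's Eq. (3) expected payoffs in G1 when Column plays Left with probability q."""
    return 6 * q, 2 * q + 2            # E(Top), E(Bottom)

def row_best_response(q):
    """Row's Nash best response: probability of playing Top (None when indifferent)."""
    e_top, e_bottom = expected_values_row(q)
    if e_top > e_bottom:
        return 1.0                      # play Top
    if e_top < e_bottom:
        return 0.0                      # play Bottom
    return None                         # indifferent, which happens at q = 0.5

eps = 1e-6
assert row_best_response(0.5 + eps) == 1.0   # Top
assert row_best_response(0.5 - eps) == 0.0   # an arbitrarily small shift flips the reply
assert row_best_response(0.5) is None

# Column's indifference: E(Left) = 10p - 4 equals E(Right) = 4p at p = 2/3,
# giving the mixed Nash equilibrium (p, q) = (2/3, 1/2).
p = 2 / 3
assert abs((10 * p - 4) - 4 * p) < 1e-12
```

The jump of the best response between p = 1 and p = 0 across q = 0.5 is precisely the knife-edge sensitivity discussed next.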
This example underscores the unfortunate reality that playing the Nash best response requires the players to be able to distinguish between arbitrarily small differences in expected values of strategy choices. More reasonable models of human behavior allow flexibility, which suggests using a probabilistic choice function where the
higher the perceived expected value associated with a strategy, the more likely it is that the subject will choose it. This is the motivation used by McKelvey and Palfrey in their development of the QRE (McKelvey & Palfrey, 1995). Instead of perfectly observing the payoffs for a given strategy, assume that the subjects perceive the expected value plus a stochastic term ε/λ, where ε is a random variable and λ is a magnitude parameter. In this framework, Row strategically chooses Top if
E(Top) + ε1/λ > E(Bottom) + ε2/λ ⇔ λ[E(Top) − E(Bottom)] > ε2 − ε1.    (4)
That is, Row’s choice of a probabilistic strategy combines the actual expected values with a measure of how accurately they are recognized. A common assumption about the εi is that they are independent and identically distributed type-I extreme value random variables. In the general case of several strategies, the probability pi that strategy si is chosen is given by

p_i = e^{λE(s_i)} / Σ_j e^{λE(s_j)}.    (5)

This expression is identical to the LCA multinomial logit expression in Eq. (1) with v(·) being a strategy’s expected payoff, with an emphasis on preserving the LCA structure. The ε/λ form of the stochastic term in Eq. (4) makes it reasonable to treat λ as a measure of ‘‘rationality’’. This is because with small λ values, the random terms dominate; indeed, with λ = 0 (Eq. (5)), the subjects are completely insensitive to differences in expected value, so strategies are selected from a uniform distribution. In contrast, the effect of the random terms diminishes as λ → ∞, which means that the subjects become increasingly responsive to payoff differences; in the limit they reach the Nash equilibrium.

QRE models raise theoretical (some are developed below) and experimental issues. With the latter, the effect of an experimental or empirical failure of QRE models differs from that of McFadden’s econometric models. This is because McFadden assumed that the agents can be modeled as utility maximizers. The goal of analyzing QRE models in experimental game theory, however, is to describe actual subject behavior rather than to postulate what an ideal subject would do. Because of this criterion, these QRE theories must be evaluated according to their ability to predict subject behavior.

3.2. Decomposition

The approach we developed (Jessie & Saari, 0000) to analyze QRE and related concerns partitions the set of games into equivalence classes that have the same specified structure. As an example from Jessie and Saari (0000), the three Eq. (6) games evoke very different behaviors:

G2 = [ (0, 0)   (4, 2)
       (2, 4)   (6, 6) ],

G3 = [ (4, 4)   (−2, 6)
       (6, −2)  (0, 0) ],

G4 = [ (8, 0)   (0, 2)
       (10, 2)  (2, 4) ].    (6)

The G2 goal of achieving Bottom-Right to attain the mutually optimal outcomes is immediate; the simplicity of this G2 structure leaves little to analyze. For G3, which is a Prisoner’s Dilemma game, a goal may be to reach the Pareto superior (i.e., where a player cannot get a larger value without some other player getting a smaller value) position of Top-Left, while in G4, should side payments be allowed, a goal may be to attain Bottom-Left with its superior sum. But when viewed from the Nash perspective, all three games are equivalent. Although the different games require varying degrees of ability and analysis to find their Nash point, each has the same Nash strategy of Bottom-Right. To carry out the analysis, we proved the following:

Theorem 1 (Jessie & Saari, 0000). For n ≥ 2 players, any k1 × k2 × · · · × kn game G, where ki is the number of pure strategies for the ith player, i = 1, . . . , n, can be uniquely decomposed into its Nash strategic component, a behavioral component, and a kernel component. The strategic component, GN, contains all Nash information about G. As the behavioral component, GB, contains no Nash information, any desired GB outcomes cannot be obtained by an individual player’s strategic approach; they require cooperation among the players. The kernel component, GK, adds the same value to each of a player’s entries in G, so it has no effect on the behavioral or strategic portions of a game.

The decomposition, which is surprisingly simple to use and described in the Appendix, defines the equivalence relationship. To illustrate Theorem 1, when the approach is applied to the Eq. (6) games, the decompositions, in the G = GN + GB + GK order, are:

G2 = [ (−1, −1)  (−1, 1)     [ (−2, −2)  (2, −2)     [ (3, 3)  (3, 3)
       (1, −1)   (1, 1) ]  +   (−2, 2)   (2, 2) ]  +   (3, 3)  (3, 3) ],    (7)

G3 = [ (−1, −1)  (−1, 1)     [ (3, 3)   (−3, 3)      [ (2, 2)  (2, 2)
       (1, −1)   (1, 1) ]  +   (3, −3)  (−3, −3) ] +   (2, 2)  (2, 2) ],    (8)

G4 = [ (−1, −1)  (−1, 1)     [ (4, −1)  (−4, −1)     [ (5, 2)  (5, 2)
       (1, −1)   (1, 1) ]  +   (4, 1)   (−4, 1) ]  +   (5, 2)  (5, 2) ].    (9)
All three games have precisely the same GN form, so Nash considerations for these three games must agree. (For 2 × 2 games, a pure Nash equilibrium outcome always is where both players’ GN entries are positive.) This leads to our equivalence relationship:

Definition 1. Two games, Gi and Gj, are Nash equivalent if and only if they have identical Nash components; i.e., GNi = GNj. This is represented by Gi ∼N Gj.

As explained later, for any k1 × · · · × kn game, each of a player’s
GK entries is the average of all of the player’s payoffs. Selecting a
pure strategy for each of the other players identifies a specific array of options for a given player: The player’s GN entry in this array is the difference between the entry’s G value and the array’s average. By being differences from the average, the sum of a player’s GN entries for each array is zero. The player’s GB entries in this array are the difference between the array’s average and the player’s common GK value. As it turns out, the sum of all of a player’s GB entries also is zero. For each player, GK adds a fixed value to each of the player’s payoffs.

While GK plays no role in a strategic or behavioral analysis, if the payoffs describe transferables, such as money, GK can influence approaches such as the possibility of side payments. With G4, for instance, Row’s larger GK4 values introduce this possibility. As described and demonstrated above, each GB row is the same for the Row player; each column is the same for the Column player. Because of this structure, GB admits no personal strategic opportunities for either player, so the GB components contain no Nash information. But there could be GB outcomes that both players wish to have. Illustrating with GB3, both players may prefer the Top-Left GB3 outcome (because these terms create the Top-Left Pareto superior G3 outcome). This objective, however, can be achieved only if both players cooperate: There is no way for either player to individually ensure this outcome, but as a unit, there are cooperative reasons and ways to do so.
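The construction just described is simple enough to implement directly. A minimal sketch for one player of a 2 × 2 game (function names are ours), checked against Row’s portion of the G3 decomposition in Eq. (8):

```python
def decompose_player(M, is_row_player):
    """Split one player's 2x2 payoff matrix M into (GN, GB, K).
    An 'array' is the pair of payoffs the player chooses between once the
    opponent's pure strategy is fixed: a column for Row, a row for Column."""
    K = (M[0][0] + M[0][1] + M[1][0] + M[1][1]) / 4   # GK: average of all payoffs
    GN = [[0.0, 0.0], [0.0, 0.0]]
    GB = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            if is_row_player:
                avg = (M[0][j] + M[1][j]) / 2         # average over the column
            else:
                avg = (M[i][0] + M[i][1]) / 2         # average over the row
            GN[i][j] = M[i][j] - avg                  # deviation from the array average
            GB[i][j] = avg - K                        # array average minus the kernel
    return GN, GB, K

# Row's payoffs in the Prisoner's Dilemma G3 of Eq. (6).
G3_row = [[4, -2], [6, 0]]
GN, GB, K = decompose_player(G3_row, is_row_player=True)
assert GN == [[-1.0, -1.0], [1.0, 1.0]]   # Row's Nash component, as in Eq. (8)
assert GB == [[3.0, -3.0], [3.0, -3.0]]   # Row's behavioral component
assert K == 2.0                           # Row's kernel value
```

Note that within each array the GN entries sum to zero, and the GB entries sum to zero overall, exactly as stated above.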
Stated in words, our decomposition uniquely separates aspects of a game G that encourage individual strategic actions to achieve a personal optimal outcome (the GN component) from aspects that require cooperative behavior (the GB terms). This difference is captured by G3 where personal interests (GN3) drive strategies toward the GN3 Bottom-Right Nash entry with its maximal outcome for each player, while combined interests (GB3) encourage cooperative behavior to attain the GB3 Pareto superior Top-Left entry. These opposite, conflicting forces capture the Prisoner’s Dilemma difficulty whereby the cooperative action required to attain the Pareto superior Top-Left outcome is jeopardized by personal greed.

Each of the GB bi-matrices in Eqs. (7)–(9) has a Pareto superior and inferior entry. This GB property holds for all two player games, but it does not extend to all games with three or more players. A consequence of the property not extending is that accepted tit-for-tat conclusions about creating cooperation in games can differ with n ≥ 3 players (Jessie & Saari, 2014).

3.3. Equivalent QRE games

QRE features are analyzed by extending the equivalence relation structure to other solution concepts. Namely, for any given solution concept SC, games Gi and Gj satisfy the binary relationship ∼SC iff they have an identical SC structure. Our equivalence definition for QRE must involve λ. This is because, with Eqs. (4), (5) and however λ is interpreted, λ clearly plays a central role in the QRE solution concept. For this reason, the ∼QRE,λ equivalence notation reflects λ’s importance.

Definition 2. Two k1 × k2 QRE games, Qi and Qj, are ∼QRE,λ equivalent iff they both have the same solution concept given by Eq. (5). Because Eq. (5) holds for n ≥ 2 player k1 × · · · × kn games, so does Definition 2.
Differences among ∼QRE,λ equivalence sets (Definition 2) must be expected to identify those particular game theoretic structures that explain differences in QRE predictions and, of importance, in the ability of subjects to find a Nash equilibrium. To illustrate with extreme settings, it takes minimal effort to find the Nash solution for game G2 (Eq. (6)). Game G4 requires more effort to recognize that Bottom-Right is the sole Nash point. The most complex of these three games is G3, with its Prisoner’s Dilemma structure. The distinct differences in effort and ability needed to recognize the Nash structure of these three games make it reasonable to anticipate that they cannot be equivalent for any λ > 0. If they could, then it must be expected that it would require a subject with sharp expertise, which corresponds to QRE models with a large λ value. A way to address these issues is to examine whether any λ > 0 exists so that G2 ∼QRE,λ G4, or, more extreme, whether G2 ∼QRE,λ G3. The unexpected answer is based on the next theorem.

Theorem 2. For any λ > 0, two k1 × · · · × kn games Gi and Gj are QRE equivalent, Gi ∼QRE,λ Gj, if and only if they are Nash equivalent, Gi ∼N Gj.

The surprising conclusion is that, for any λ > 0, it is true that
G2 ∼QRE,λ G3 ∼QRE,λ G4. The proof (Appendix) shows, for any λ > 0, that the Eq. (5) QRE choice probabilities are completely independent of a game’s behavioral and kernel components. Rather than involving the precise game theoretic features that affect a subject’s behavior (an intent of QRE), QRE probabilities ignore them by relying strictly on the GN Nash structure! For this reason, QRE treats all three Eq. (6) games identically in spite of their obviously varying complexities and properties.
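This invariance can be checked numerically for the Eq. (6) games: in the logit response of Eq. (5), the GB and GK terms cancel inside every payoff difference, so each player’s quantal response depends only on GN. A sketch (our own naming) for the Row player:

```python
import math

def row_logit_response(M, q, lam):
    """Eq. (5) for Row in a 2x2 game: probability of Top when Column plays Left w.p. q."""
    e_top = q * M[0][0] + (1 - q) * M[0][1]
    e_bot = q * M[1][0] + (1 - q) * M[1][1]
    z_top = math.exp(lam * e_top)
    return z_top / (z_top + math.exp(lam * e_bot))

# Row's payoff matrices from Eq. (6): G2, G3 (Prisoner's Dilemma), and G4.
G2_row = [[0, 4], [2, 6]]
G3_row = [[4, -2], [6, 0]]
G4_row = [[8, 0], [10, 2]]

# GB is column-constant and GK is constant for Row, so E(Top) - E(Bottom) is
# the same function of q in all three games; the logit responses coincide.
for lam in (0.5, 1.0, 5.0):
    for q in (0.0, 0.3, 0.7, 1.0):
        p2 = row_logit_response(G2_row, q, lam)
        p3 = row_logit_response(G3_row, q, lam)
        p4 = row_logit_response(G4_row, q, lam)
        assert abs(p2 - p3) < 1e-9 and abs(p3 - p4) < 1e-9
```

With the λ interpretation given earlier, λ = 0 returns the uniform distribution and large λ approaches the Nash best response; but at every λ the response ignores GB and GK entirely.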
By using Theorem 2 along with the structure outlined under Definition 1, it is easy to construct as many other illustrative examples as desired; just add appropriate GB and GK components to a given GN. A matching pennies game, for instance, is given by G5, where Row wins if both agree (e.g., Heads–Heads or Tails–Tails), and Column wins with disagreement:
G5 = GN5 = [ (1, −1)   (−1, 1)
             (−1, 1)   (1, −1) ].
Compare G5 with G6, which has the identical Nash structure, so G5 ∼N G6.
G6 = [ (12, −6)  (−2, −4)     [ (1, −1)   (−1, 1)      [ (6, −5)  (−6, −5)     [ (5, 0)  (5, 0)
       (10, 6)   (0, 4) ]  =    (−1, 1)   (1, −1) ]  +   (6, 5)   (−6, 5) ]  +   (5, 0)  (5, 0) ].
While the G5 commonly used ‘‘flipping pennies’’ mixed strategy can be found without undue effort, it is arguable that finding Nash strategies for G6 is more difficult because of its distracting Bottom-Left GB6 influence. Yet, according to Theorem 2, for any λ > 0, G5 ∼QRE,λ G6, which QRE would interpret as being equivalent. The conflict is caused by the GB6 and GK6 components that can significantly influence whether a subject can discover a game’s Nash structure. Thus, a limitation of QRE is that these terms always are absent from Eq. (5). Clearly, a more viable theory must incorporate the behavioral GB term. In two person games, for instance, the behavioral term always has a Pareto superior entry, so appropriately integrating this term with the expected value in the v(·) function is one of several possibilities.

3.4. Experimental evidence

As shown, games within a Nash or QRE equivalence class can differ widely in their theoretical properties and in the ways in which they are analyzed in the game theory literature. The experimental issue is to determine what happens in practice. If subject behavior consistently changes when faced with different QRE equivalent games, this would indicate that QRE has crucial structural flaws when used to explain and differentiate the behavior of subjects in finding Nash equilibria. The recent work of Jessie and Kendall (0000), which is based on our decomposition, shows that subjects do play differently in QRE equivalent games. The experiments were designed by presenting subjects with sets of games that have identical GN components. According to Theorem 2, all of the games are ∼QRE,λ equivalent, which, according to QRE, should predict identical behavior. Similar to how G6 is constructed from G5, the manner in which the games within a set differ is by appropriately selecting GB + GK components. Namely, these GB + GK terms have become new variables to be used in experimental economics.
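The experimental construction just described, fixing GN and varying GB + GK, can be sketched as follows (helper names are ours); the particular components below reproduce G6 from the matching-pennies component GN5:

```python
def assemble(GN, GB, GK):
    """Build the bimatrix game G = GN + GB + GK, adding (Row, Column) pairs entrywise."""
    return [[tuple(n + b + k for n, b, k in zip(GN[i][j], GB[i][j], GK[i][j]))
             for j in range(2)] for i in range(2)]

# Matching pennies: G5 = GN5.
GN5 = [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]]

# A behavioral component (Row's entries depend only on the column, Column's
# only on the row) and a kernel component (one fixed value per player):
GB6 = [[(6, -5), (-6, -5)], [(6, 5), (-6, 5)]]
GK6 = [[(5, 0), (5, 0)], [(5, 0), (5, 0)]]

G6 = assemble(GN5, GB6, GK6)
assert G6 == [[(12, -6), (-2, -4)], [(10, 6), (0, 4)]]   # the G6 compared with G5 above
```

Any other admissible GB + GK choice yields another game that is ∼QRE,λ equivalent to G5, which is how the experimental game sets were generated.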
The consistent result is that subjects do not react to QRE-equivalent games in a similar manner. As predicted (and as suggested by the design of G6), they play, with significantly higher probability, the strategy that could lead to the Pareto superior GB outcome. This research also shows that many generalizations of QRE and models of bounded rationality (for example, the truncated QRE and Noisy Introspection models) exhibit the same structural flaw: rather than incorporating all of the payoff information that affects a game's properties, these models focus entirely on a particular aspect of the game's structure. What makes these experimental results surprising is that the theoretical approach used to develop QRE (e.g., as reflected by Eqs. (4), (5)) suggests that the model would perform as anticipated. Showing otherwise, that is, capturing the subtleties of what the QRE structures actually reflect and explaining why QRE need not do as expected (Theorem 2), required our decomposition of games. The obvious next step is to develop new mathematical
approaches that capture how subjects make strategic game-theoretic decisions. Toward this end, the decomposition identifies required mathematical properties.

It is interesting to consider what role LCA may play in future developments. As described, LCA implies there exists some value function that determines choice probabilities according to Eq. (1); the QRE failure demonstrated here lies in its particular choice of that function. This means that LCA might still form the foundation for strategic choice, but with a more appropriate v(·) function that must involve GB effects. However, heeding Luce's warning that the choice axiom can lead to poor models of human behavior, one must be cautious about simply replacing the expected value function in Eq. (5) with a more sophisticated expression that contains more information.

4. Summary

The Luce Choice Axiom is a seminal result in individual choice theory that is being used as a foundation for a number of models of strategic choice. From a theoretical perspective, LCA provides a rigorous foundation for developing theories consistent with the economic definition of a rational agent. But as Luce (1977) cautioned, its connection to actual human behavior has been tenuous because of the restrictive settings required to satisfy the LCA properties. (Some restrictions are relaxed by generalizing LCA; see Saari, 2005.) In particular, the Quantal Response Equilibrium model analyzed here uses a special case of the axiom. But our mathematical results, which extract the structures of games (Jessie & Saari, 0000), prove and explain why QRE is lacking as a model of subject behavior. Our decomposition and analysis also identify mathematical properties that must be reflected by future models designed to replace QRE; e.g., while it is not clear how to incorporate information from the GB component, this information cannot be ignored. Furthermore, the historical development of why LCA is used in game theory (and Eq.
(4)) suggests that LCA should be more closely examined by experimental game theorists. This comment echoes Debreu's (1960) review of Luce's 1959 book, where "its emphasis on the conceptual and experimental difficulties associated with any theory of choice, and its constant concern for empirical verification deserve the attention of our profession".

Appendix

The decomposition

The objective is to extract from a game G all of the features needed, and nothing more, to determine the Nash equilibria. To illustrate with G1, re-expressing Eq. (3) as one equation, Row's optimal choice is determined by the sign of

$$E(\text{Top}) - E(\text{Bottom}) = (6q + 0(1-q)) - (4q + 2(1-q)) = (6-4)q + (0-2)(1-q) = 4q - 2. \qquad (10)$$

With a positive value, Top is optimal; with a negative value, Bottom is optimal. Similarly, if Row's strategy is to play Top with probability p ∈ [0, 1], then Column's strategy is determined by the sign of

$$E(\text{Left}) - E(\text{Right}) = (6p - 4(1-p)) - (4p + 0(1-p)) = (6-4)p + (-4-0)(1-p). \qquad (11)$$

The Eqs. (10), (11) arrangements identify that certain differences in payoffs are what determine the Nash equilibria of the game. With Eq. (10), for example, replacing the q coefficient of 6 − 4 with 16 − 14 or −9 − (−11) preserves the difference of 2, so Row makes the same Nash decision. Similarly, rather than the actual payoffs, Column's Nash strategy is determined by the differences (6 − 4) and (−4 − 0). So, with the very different games of

$$G_1' = \begin{pmatrix} (16, 16) & (0, 14) \\ (14, 0) & (2, 4) \end{pmatrix}, \qquad G_1'' = \begin{pmatrix} (2, 3) & (14, 1) \\ (0, 13) & (16, 17) \end{pmatrix}, \qquad (12)$$

the Row and Column players face precisely the same differences as with game G1, which requires the Nash strategies to remain the same. For example, Row in G′′1 has

$$E(\text{Top}) - E(\text{Bottom}) = (2q + 14(1-q)) - (0q + 16(1-q)) = (2-0)q + (14-16)(1-q) = 4q - 2,$$

which defines the same strategy choice for any given q ∈ [0, 1]. But while these Nash equilibrium strategies remain the same, their properties vary greatly: in G′1 the (Top, Left) equilibrium payoffs Pareto dominate the (Bottom, Right) equilibrium payoffs, while this relationship is reversed in G′′1. As shown below, these differences are due to the GB terms (Eq. (14)), which require cooperative efforts to attain the desired outcome.

According to Eqs. (10), (11), the G1 Nash information requires only that differences between appropriate terms have specified values. There are many ways to select these values, but the unique way (Jessie & Saari, 0000) to extract all of the Nash information is to replace each of a player's G1 entries in an array with its difference from the array's average. With G1, this becomes

$$G_1 = \begin{pmatrix} (1, 1) & (-1, -1) \\ (-1, -2) & (1, 2) \end{pmatrix} + \begin{pmatrix} (5, 5) & (1, 5) \\ (5, -2) & (1, -2) \end{pmatrix}, \qquad (13)$$

where the first bi-matrix is GN1 and the second consists of the averages of the appropriate arrays. For instance, Row selecting Bottom identifies Column's G1 row options of −4 and 0, with an average of −2; this common −2 value defines the array's two entries in the second Eq. (13) bi-matrix. The differences of the array's G1 entries of −4 and 0 from this average, which are −2 and 2, define the array's GN1 entries. Similarly, Column's choice of Right defines Row's column array with entries 0 and 2 and average 1, which becomes this array's entries in the second bi-matrix; the differences of the G1 entries from this average are the GN1 entries. With this construction, each strategy Row (or Column) selects has precisely the same G1 − GN1 outcomes; whether Row selects Top or Bottom, the G1 − GN1 array is 5 and 1. For this reason, G1 − GN1 carries no Nash information; all of the G1 Nash information resides in GN1.

A further refinement separates the G1 − GN1 information that merely inflates each player's payoffs from the terms that encourage cooperative behavior. Define each of a player's GK1 entries as the average of the player's G1 payoffs (3 for Row, 1.5 for Column). Each GB1 entry is the difference between the player's G1 − GN1 entry and the player's average value. In this manner the decomposition replaces G1 − GN1 with GB1 + GK1, where

$$G_1^B + G_1^K = \begin{pmatrix} (2, 3.5) & (-2, 3.5) \\ (2, -3.5) & (-2, -3.5) \end{pmatrix} + \begin{pmatrix} (3, 1.5) & (3, 1.5) \\ (3, 1.5) & (3, 1.5) \end{pmatrix}.$$

So the different equilibrium properties of G′1 and G′′1 described above, which have the same GN1 component, are caused by the differences in their behavioral GB terms

$$G_1'^B = \begin{pmatrix} (7, 6.5) & (-7, 6.5) \\ (7, -6.5) & (-7, -6.5) \end{pmatrix}, \qquad G_1''^B = \begin{pmatrix} (-7, -6.5) & (7, -6.5) \\ (-7, 6.5) & (7, 6.5) \end{pmatrix}, \qquad (14)$$

with G′B1's emphasis on Top-Left and G′′B1's emphasis on Bottom-Right. (Here, G′K1 = G′′K1 with 8 for each of Row's entries and 8.5 for each of Column's entries.)
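The construction just described is mechanical, so it can be sketched in code. The following is a minimal illustration (our own; the function names are not from the paper) that splits any two-player game into its GN, GB, and GK components, and that reads the pure Nash equilibria off GN:

```python
def decompose(R, C):
    """Split a two-player game (Row payoffs R, Column payoffs C) into the
    Nash (GN), behavioral (GB), and kernel (GK) components, entrywise."""
    m, n = len(R), len(R[0])
    # Row's arrays are columns of R; Column's arrays are rows of C.
    col_avg = [sum(R[i][j] for i in range(m)) / m for j in range(n)]
    row_avg = [sum(C[i][j] for j in range(n)) / n for i in range(m)]
    kR = sum(map(sum, R)) / (m * n)   # Row's kernel: average of all Row payoffs
    kC = sum(map(sum, C)) / (m * n)
    GN_R = [[R[i][j] - col_avg[j] for j in range(n)] for i in range(m)]
    GN_C = [[C[i][j] - row_avg[i] for j in range(n)] for i in range(m)]
    GB_R = [[col_avg[j] - kR for j in range(n)] for i in range(m)]
    GB_C = [[row_avg[i] - kC for j in range(n)] for i in range(m)]
    return (GN_R, GN_C), (GB_R, GB_C), (kR, kC)

def pure_nash(R, C):
    """Pure equilibria: cells where each player's GN entry is maximal in its
    array (i.e., each payoff is a best response to the other's choice)."""
    (GN_R, GN_C), _, _ = decompose(R, C)
    m, n = len(R), len(R[0])
    return [(i, j) for i in range(m) for j in range(n)
            if GN_R[i][j] == max(GN_R[k][j] for k in range(m))
            and GN_C[i][j] == max(GN_C[i][l] for l in range(n))]

# G1 from the appendix: Row payoffs R1 and Column payoffs C1.
R1 = [[6, 0], [4, 2]]
C1 = [[6, 4], [-4, 0]]
(GN_R, GN_C), (GB_R, GB_C), (kR, kC) = decompose(R1, C1)
```

For G1 this recovers the GN1 entries of Eq. (13), the GB1 and GK1 values (kernels 3 and 1.5), and `pure_nash(R1, C1)` returns the (Top, Left) and (Bottom, Right) cells `[(0, 0), (1, 1)]`.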
$$p_{\text{Top}} = \frac{e^{\lambda([\eta_{1,1}q+\eta_{2,1}(1-q)]+[\beta_{1,1}q+\beta_{2,1}(1-q)]+\kappa_1)}}{e^{\lambda([\eta_{1,1}q+\eta_{2,1}(1-q)]+[\beta_{1,1}q+\beta_{2,1}(1-q)]+\kappa_1)} + e^{\lambda([-\eta_{1,1}q-\eta_{2,1}(1-q)]+[\beta_{1,1}q+\beta_{2,1}(1-q)]+\kappa_1)}}$$
$$= \frac{e^{\lambda([\beta_{1,1}q+\beta_{2,1}(1-q)]+\kappa_1)}\, e^{\lambda(\eta_{1,1}q+\eta_{2,1}(1-q))}}{e^{\lambda([\beta_{1,1}q+\beta_{2,1}(1-q)]+\kappa_1)}\left(e^{\lambda(\eta_{1,1}q+\eta_{2,1}(1-q))} + e^{\lambda(-\eta_{1,1}q-\eta_{2,1}(1-q))}\right)} = \frac{e^{\lambda(\eta_{1,1}q+\eta_{2,1}(1-q))}}{e^{\lambda(\eta_{1,1}q+\eta_{2,1}(1-q))} + e^{\lambda(-\eta_{1,1}q-\eta_{2,1}(1-q))}}, \qquad (17)$$

Box I.
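The cancellation in Eq. (17) can also be verified numerically. This sketch (ours, with illustrative parameter values) computes Row's logit probability of Top from the full η, β, κ payoffs and from the η terms alone:

```python
import math

def p_top(eta11, eta21, beta11, beta21, kappa1, q, lam):
    """Left-hand side of Eq. (17): Row's logit probability of Top, with the
    expected values built from the eta (GN), beta (GB), and kappa (GK) pieces."""
    e_top = (eta11 + beta11 + kappa1) * q + (eta21 + beta21 + kappa1) * (1 - q)
    e_bot = (-eta11 + beta11 + kappa1) * q + (-eta21 + beta21 + kappa1) * (1 - q)
    return math.exp(lam * e_top) / (math.exp(lam * e_top) + math.exp(lam * e_bot))

def p_top_eta_only(eta11, eta21, q, lam):
    """Right-hand side of Eq. (17): the same probability from the GN terms alone."""
    z = lam * (eta11 * q + eta21 * (1 - q))
    return math.exp(z) / (math.exp(z) + math.exp(-z))

# The beta and kappa values do not matter: they enter both expected values
# identically, so they factor out of the ratio exactly as in Eq. (17).
for beta11, beta21, kappa1 in [(0, 0, 0), (2, -2, 3), (7, -7, 8)]:
    for q in (0.1, 0.5, 0.8):
        assert abs(p_top(1, -1, beta11, beta21, kappa1, q, 1.5)
                   - p_top_eta_only(1, -1, q, 1.5)) < 1e-12
```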
This construction holds for any game G; it requires each player's GN entries in each array to sum to zero, and the player's GB entries to sum to zero. Conversely, any bi-matrix with these properties serves as the appropriate component for some game. To illustrate, it can be difficult to find the pure Nash equilibria of the following 4 × 3 game, displayed in its decomposed form:

[4 × 3 bi-matrix display of G = GN + (G − GN)]
But the equilibria immediately emerge from GN (the only cells in which both players' entries are positive) as Top-Left and Third Down-Middle. What complicates the search for Nash equilibria is the GB term; GB is obtained by subtracting 2 from each of Row's entries in the last bi-matrix (G − GN) and −0.5 from each of Column's entries. (These values define GK.)

Proof of Theorem 2

The above structure allows any 2 × 2 game to be decomposed into the G = GN + GB + GK form as

$$G = \begin{pmatrix} (\eta_{1,1}, \eta_{1,2}) & (\eta_{2,1}, -\eta_{1,2}) \\ (-\eta_{1,1}, \eta_{2,2}) & (-\eta_{2,1}, -\eta_{2,2}) \end{pmatrix} + \begin{pmatrix} (\beta_{1,1}, \beta_{1,2}) & (\beta_{2,1}, \beta_{1,2}) \\ (\beta_{1,1}, \beta_{2,2}) & (\beta_{2,1}, \beta_{2,2}) \end{pmatrix} + \begin{pmatrix} (\kappa_1, \kappa_2) & (\kappa_1, \kappa_2) \\ (\kappa_1, \kappa_2) & (\kappa_1, \kappa_2) \end{pmatrix}, \qquad (15)$$
where a term's second subindex identifies the player and β1,j + β2,j = 0. With this notation and Column's strategy of q ∈ [0, 1], Row's expected values are

$$E(\text{Top}) = (\eta_{1,1} + \beta_{1,1} + \kappa_1)q + (\eta_{2,1} + \beta_{2,1} + \kappa_1)(1-q) = [\eta_{1,1}q + \eta_{2,1}(1-q)] + [\beta_{1,1}q + \beta_{2,1}(1-q)] + \kappa_1,$$
$$E(\text{Bottom}) = (-\eta_{1,1} + \beta_{1,1} + \kappa_1)q + (-\eta_{2,1} + \beta_{2,1} + \kappa_1)(1-q) = [-\eta_{1,1}q - \eta_{2,1}(1-q)] + [\beta_{1,1}q + \beta_{2,1}(1-q)] + \kappa_1, \qquad (16)$$
which underscores the fact that all differences in expected values come from the GN component: the βi,j and κj terms are identical in each computation. Using these expressions with Eq. (5) to compute selection probabilities yields Eq. (17) (see Box I), where, as should be expected, terms that are common to both Eq. (16) expected values factor out of the expression and cancel. The final expression proves that the selection probabilities are independent of the GB and GK components. As such, QRE depends strictly on the GN component; QRE predicts invariant behavior within strategically equivalent classes of games, no matter how varied they may be, such as G1, G′1, and G′′1. Our decomposition ensures that a canceling phenomenon similar to Eq. (17) holds for all k1 × k2 × · · · × kn games, because, for each agent, the behavioral and kernel terms in each of the expected value computations are precisely the same.

References

Arrow, K. (1951). Social choice and individual values. New York: Wiley (2nd ed., 1963).
Debreu, G. (1960). Review of R. D. Luce, Individual choice behavior: A theoretical analysis. American Economic Review, 50, 186–188.
Jessie, D., & Kendall, R. (0000). Decomposing models of bounded rationality. IMBS Technical Report MBS 15-06.
Jessie, D., & Saari, D. G. (0000). Strategic and behavioral decomposition of games. Submitted for publication; available as IMBS Technical Report MBS 15-05.
Jessie, D., & Saari, D. G. (2014). Cooperation in n-player repeated games. In M. Jones (Ed.), AMS Contemporary Mathematics Series: Vol. 624. The mathematics of decisions, elections, and games (pp. 189–206).
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.
Luce, R. D. (2008). Luce's choice axiom. Scholarpedia, 3(12), 8077.
Luce, R. D., & Raiffa, H. (1957). Games and decisions: Introduction and critical survey. New Jersey: John Wiley and Sons.
McFadden, D. (0000). Daniel L. McFadden: Biographical. Nobelprize.org, Nobel Media AB 2014. Web, 14 April 2015.
McFadden, D. (1976). Quantal choice analysis: A survey. Annals of Economic and Social Measurement, 5(4).
McKelvey, R., & Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10, 6–38.
Saari, D. G. (2005). The profile structure for Luce's choice axiom. Journal of Mathematical Psychology, 49, 226–253.
Saari, D. G. (2008). Disposing dictators; demystifying voting paradoxes. New York: Cambridge University Press.
Saari, D. G. (2015). From Arrow's theorem to 'dark matter'. British Journal of Political Science.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge University Press.