Strategic reasoning in p-beauty contests

Strategic reasoning in p-beauty contests

Games and Economic Behavior 75 (2012) 555–569 Contents lists available at SciVerse ScienceDirect Games and Economic Behavior www.elsevier.com/locate...

753KB Sizes 5 Downloads 56 Views

Games and Economic Behavior 75 (2012) 555–569

Contents lists available at SciVerse ScienceDirect

Games and Economic Behavior www.elsevier.com/locate/geb

Strategic reasoning in p-beauty contests Yves Breitmoser ∗,1 EUV, Postfach 1786, 15207 Frankfurt (Oder), Germany

a r t i c l e

i n f o

a b s t r a c t

Article history: Received 13 January 2010 Available online 22 February 2012 JEL classification: C44 C72 Keywords: Beauty contest Logit equilibrium Noisy introspection Level-k

This paper analyzes strategic choice in p-beauty contests. First, I show that it is not generally a best reply to guess the expected target value, even in games with n > 2 players, and that iterated best response strictly applied does not induce a choice sequence approximating pk · 0.5. Second, I argue that the beliefs and actions of players typically considered to be level 2–4 are intuitively captured also by high-level concepts such as quantal response equilibrium and noisy introspection. Third, I analyze this hypothesis econometrically. The results concur. In six different data sets, the choices are described more adequately as mixtures of quantal response equilibrium and noisy introspection than as level-k mixtures. © 2012 Elsevier Inc. All rights reserved.

1. Introduction In the p-beauty contest, n players pick numbers xi ∈ [0, 1] and the player whose “guess” is closest to p ∈ (0, 1) times the mean of all guesses wins a fixed prize. Nagel (1995) reported the first experimental investigation of such p-beauty contests and advanced a theory of “iterated best responses” to explain her observations. Simultaneously, Stahl and Wilson (1995) developed the level-k model of iterated reasoning. Accordingly, the population consists of level-0 types who randomize uniformly, of level-1 types who assume all their opponents are level 0, of level-2 types who assume all their opponents are level 1, and so on. The level-k model has a variety of plausible and attractive features and is currently considered the leading theory of strategic choice in novel situations.2 The present study challenges the level-k interpretation of strategic choice in p-beauty contests, however, and presents evidence in favor of high-level concepts such as equilibrium and rationalizability to explain the choices. The arguably most convincing feature of level-k models is that it is based on the existence of non-strategic players (level 0) and of players assuming their opponents are non-strategic (level 1). The existence of level-1 players is particularly convincing in novel situations, where one acts under uncertainty. Having to formulate subjective beliefs about the opponents’ choice probabilities as a basis for the own choice, the belief that one’s opponents would randomize uniformly follows (if nothing else) from the principle of insufficient reason.

*

Fax: +49 3355534 2390. E-mail address: [email protected]. 1 I thank two anonymous referees, the advisory editor, Friedel Bolle, Daniel Zizzo, and participants at ESA’s 2009 European meeting for many helpful comments, and I gratefully appreciate the support of Rosemarie Nagel by sharing the data. 2 For example, see Ho et al. (1998), Costa-Gomes et al. (2001), Crawford and Iriberri (2007), and Stahl and Haruvy (2008) for level-k analyses and Camerer et al. (2004), Kübler and Weizsäcker (2004), and Rogers et al. (2009) for related analyses. 0899-8256/$ – see front matter doi:10.1016/j.geb.2012.02.010

© 2012

Elsevier Inc. All rights reserved.

556

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

Note: The vertical axes are scaled such that the aggregate areas under the histograms are 1. The two additional lines are kernel density estimates, using Gaussian smoothing kernels, either with bandwidth according to Silverman’s rule of thumb (“Kernel-1.0”) or half this value (“Kernel-0.5”). The value of p was 2/3 in all cases, and I excluded one of the Newspaper experiments of Bosch-Domenech et al. (2002), as it restricted the choice set to natural numbers ( 1). Fig. 1. Overview of the data from one-shot beauty contests compiled by Bosch-Domenech et al. (2002).

The existence of players on level 2 and higher, i.e. of players who go further and believe all their opponents are at least level 1, seems less obvious, however. I will argue that actions and beliefs of these subjects—traditionally attributed to be level 2 and higher—can be captured more accurately by higher-level concepts. To outline my argument, let us look at Fig. 1, which reviews the data compiled by Bosch-Domenech et al. (2002) in six different sets of p-beauty contest experiments (with p = 2/3). The most focal observations in the data are probably the modes at 0.333 and 0.222. These modes are typically referred to as the choices of level-1 and level-2 players, respectively, since p · 0.5 = 0.333 and p 2 · 0.5 = 0.222 (with p = 2/3 and 0.5 being the level-0 mean). First, in Section 2, I show that level-2 strictly applied does not yield a cluster of choices around 0.222. Secondly, these modes combine for less than 20% of the observations in all cases, they rarely have the highest density in these samples (in most cases, the Nash equilibrium has higher density), and their exact location varies between data sets (see in particular the “Laboratory” data). Most importantly, the strength of the mode around 0.222 varies drastically between data sets, it is hardly noticeable in any of the density estimates, and higher-level modes do not exist. Thus, it seems possible to explain the choice patterns without including level-k components for k  2 in the strategic model.3 Thirdly, as Bosch-Domenech et al. (2002) report, many subjects in p-beauty contests explicitly pick numbers they consider “intermediate,” i.e. the subjects themselves believe that some opponents will choose smaller numbers and that others will choose higher numbers. Such beliefs are “balanced” in that they account for the whole distribution of opponents, and they contradict the level-k model for k > 1 if applied strictly—where players believe to be exactly one step ahead of their opponents. Balanced beliefs are implied if players behave according to noisy, high-level concepts such as equilibrium or rationalizability. Needless to say, players trying to formulate “balanced” subjective beliefs are likely not to determine a quantal

3

Coincidentally, Georganas et al. (2010) find psychometric support for the existence of level-1 behavior but not for higher levels.

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

557

response equilibrium (QRE) exactly, but the overall shape of a QRE may well be closer to their subjective beliefs than the level-k model for k  2. For, it captures their objective in formulating beliefs more accurately, and in the end, this may yield a more accurate description of their behavior. The present paper investigates this hypothesis based on the data set compiled by Bosch-Domenech et al. (2002). Its organization and main results are as follows. Section 2 revisits the computation of iterated best responses and level-k choices in p-beauty contests. Section 3 defines the three logistic models of strategic choice to be contrasted. Besides level-k, those are quantal response equilibrium (McKelvey and Palfrey, 1995) and noisy introspection (NI, Goeree and Holt, 2004). Section 4 analyzes their descriptive adequacy. Both QRE and NI improve upon the level-k model overall, but the most adequate models of behavior tend to be mixtures between them. The majority of subjects form components well described as QRE, and the various cluster points (including the Nash equilibrium) correspond more closely with NI than with level-k. Section 5 concludes. The supplementary material contains, amongst others, all parameter estimates and plots illustrating the goodness-of-fit of all models. 2. The level-k theory of choice in p-beauty contests The set of players is denoted as N = {1, . . . , n}, typical players are denoted as i , j ∈ N and actions as xi ∈ [0, 1] for all i ∈ N. In the p-beauty contest, all players move simultaneously, and the payoff of i ∈ N, denoted as πi (x) for all x = (xi )i ∈ N , is



πi (x) =

1/| D (x)|, 0,

if i ∈ D (x), otherwise,

using D (x) = arg min |x j − t |, j∈N

and t =



x j ∗ p /n.

(1)

j∈N

The level-k model rests on the assumption that subjective beliefs of (some) inexperienced players about their opponents’ strategies are simple heuristics such as uniform randomization (see Stahl and Wilson, 1994). More sophisticated players may anticipate this and believe that their opponents’ are level 1, level 2, and so on. This basic level-k model is called “original level-k” in the following, to distinguish it from alternative level-k models defined further below. Definition 2.1 (“Original LevK”). The set of player types is K = {0, . . . , K }; the type shares in the population are (ρk )k∈K . Players of type k = 0 randomize uniformly on [0, 1], and for all k  1, type-k players best respond to level k − 1, randomizing uniformly in cases of indifference. Intuitively, the choices of level-k players approximate the sequence ( pk /2)k1 .4 As the following shows, this does not obtain if subjects randomize uniformly in cases of indifference, which arguably is the most consistent assumption to make in this case. It is the equivalent to assuming uniform randomization at level 0, it is the limiting distribution as random utility perturbations (as in logistic models) disappear, and any refinement of iterated responses implies an incomplete theory as discussed shortly. 2.1. General structure of the level-k choices −1 · p · (0 + 1)/2 (for the following Assuming uniform randomization at level 0, the approximate level-1 response is L 1 = nn− p illustration, this is sufficiently precise). In response to players choosing L 1 , all actions in the interval (BRinf ( L 1 ), L 1 ) are best responses, using



BRinf (xk ) = max

n ∗ (2 − 1/ p ) − 2 n − 2p

 · pxk , 0 .

(2)

BRinf (xk ) is the smallest number x ∈ [0, 1] such that, given the target value t (x) := p · ((n − 1) · xk + x)/n, the distance between x and t (x) is not greater than the distance between xk and t (x). Thus, level 2 randomizes uniformly on (BRinf ( L 1 ), L 1 ). In case n is large, the level-k choices correspond with L 1 = 0.33 −1 · p · (0.11 + 0.33)/2 = and L 2 randomizing on (0.11, 0.33). The best response of level-3 players is approximately L 3 = nn− p 0.148, and level 4 randomizes on (BRinf ( L 3 ), L 3 ) = (0.0492, 0.148). Fig. 2 illustrates these choices. Generally, players at odd levels play pure strategies, those at even levels play mixed strategies, and the supports of the strategies of even levels overlap. The pure strategy of level 1 corresponds with the mode in the data sets around 0.333, but modal choices around 0.222 cannot be explained this way. It seems that refinement of the best responses of even levels yields point predictions approximating ( pk /2)k1 . Actually, this does not lead us far. On the one hand, only rather specific refinement assumptions yield this effect. For example,

4 Most analyses of beauty contests are based on ( pk /2)k1 or closely related sequences, see e.g. Ho et al. (1998), Stahl (1998), Bosch-Domenech et al. (2002), and Kocher and Sutter (2005), but not all claim that this sequence results from the level-k model (e.g. Nagel, 1995, and Stahl, 1996).

558

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

Fig. 2. Level-k predictions for best-responding players.

trembling-hand perfection in general does not, and “uniform perfection,” robustness with respect to uniform trembles, does so only if n is large (for example, the “uniform perfect” response sequence converges above 0.1 if n = 16 as in Laboratory games). On the other hand, point predictions do not explain the actual distribution of choices. Predominantly, subjects do not choose ( pk /2)k1 , and in order to explain the actual choices, substantial noise is required. Substantial noise, in turn, dominates any such refinement concept. The limiting distribution as say logistic errors disappear would be the uniform distribution (on the best responses), rather than a refined choice, however. The level-k model as it is defined above avoids this disconnect and additionally makes for a complete model of beauty contest choices in its own right (i.e. without the necessity to add errors). 2.2. Best responses to randomizing players Next, to illustrate the derivation of best responses to mixed strategies, assume you play a beauty contest against op−1 · px, accounting for the ponents randomizing on [0, 1]. If their strategies have mean x, the intuitive best response is nn− p weight of the own guess, or p · x for simplicity. That is, xi seems to be a best response if it minimizes the distance to the expected target value. Guessing the expected target value is known to be sub-optimal in two-player games, where xi = 0 is generally optimal (under full support, assuming p < 1). The following shows that it is not generally optimal for n = 3 either, and similar arguments apply for larger n, although the bias vanishes as n increases, depending on the distribution of opponents’ choices. In the following, I focus on the response of level-1 players, but responses to other mixed strategies are derived similarly. That is, fix n = 3, p < 1, and consider player i = 1 in response to two opponents randomizing uniformly on [0, 1]. The set of (x2 , x3 ) ∈ [0, 1]2 in response to which a given x1 is closest to the target value is called win region of x1 . Without loss of generality, assume x2 < x3 . Two cases have to be distinguished. On the one hand, in case x1  x2 < x3 , player 1 wins (a share of) the prize if the target value is closer to x1 than to x2 . Using α = p /3, this holds if

1

α (x1 + x2 + x3 )  (x1 + x2 ) 2



x3 

1 − 2α 2α

x1 +

1 − 2α 2α

x2 .

(3)

On the other hand, in case x2 < x1  x3 , player 1 again wins if the target value is closer to x1 than to x2 , but the formal condition is now

1

α (x1 + x2 + x3 )  (x1 + x2 ) 2



x3 

1 − 2α 2α

x1 +

1 − 2α 2α

x2 .

(4)

Note that p < 1 implies that the target value is closer to x1 than to x3 in either case. These two cases correspond with two disjoint win regions in [0, 1]2 and are illustrated in Fig. 3. A second set of restrictions (and win regions) applies when x3 < x2 . Fig. 4 depicts the aggregate win regions in various constellations. In the case of uniformly randomizing opponents, the expected payoff of guessing x1 simply equates with the aggregate area size of the win regions associated with x1 . The expected payoff can be computed for all x1 in closed form, but due to the case distinctions involved, it is relegated to Appendix A. The following results. Proposition 2.2. Assume n = 3 and p ∈ (0, 1). The payoff-maximizing choice in response to two uniformly randomizing opponents is, using α := p /3,

x∗1 = x∗1 =

4α if p < 0.75, 7 − 16α 2α (1 − 2α )

(4 − 7α )(1 − 3α ) + 3α 2

(5) if 0.75  p  0.908,

(6)

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

559

Note: The probability of winning with x1 = 0.5 in this case (p = 0.9 and n = 3) is 0.5417; the size of the shaded area is half this value (0.2708). The probability of winning with the best response x∗1 = 0.5217 in this case is 0.5435. Fig. 3. Win regions if player 1 chooses x1 = 0.5 (in case x2 < x3 and p = 0.9).

Fig. 4. Win regions for various x1 in the cases p = 0.9 and p = 0.67.

x∗1 = 2

 

4−

(2 − 6α )3 α (1 − 2α )(4α − 1)

 if p > 0.908.

(7)

In the extreme case, as p is equal to 1, player 1 is best off picking the number that is between the opponents’ guesses with highest-possible probability, i.e. 0.5 to maximize the chances to win in case x2 < x1 < x3 or x3 < x1 < x2 . As p decreases, player 1 also has a chance to win when his guess ends up being below both opponents’ guesses. In this case, the 2α (x1 + x2 ), see Fig. 3, becomes steeper and the central win region, i.e. winning in case boundary of the win regions 1− 2α x1  x2 < x3 , opens up. In these cases, where 0.91 < p < 1, the players start to gain from emphasizing their chances to win conditional on x1  x2 < x3 . The x1 results in a tradeoff to win in either of these cases, x1  x2 < x3 or x2 < x1 < x3 or x3 < x1 < x2 , and turns out to be decreasing in p, i.e. x1 increases as p decreases. In turn, the optimal choice is not monotonically increasing in p. Initially, the structure of the win region is akin to Fig. 4(a) (although x1 > 0.5), for p = 0.908 it reaches Fig. 4(b), and as p falls further, the best responses start falling again. The structure of the win regions then corresponds with either Fig. 4(c), if 0.75  p  0.908, or Fig. 4(d), if p < 0.75. −1 · px. The most counterProposition 2.2 shows that best response functions are considerably more complex than x∗i = nn− p intuitive case for n = 3 seems to be p = 0.9, which happens to have been chosen in three-player treatments by Ho et al.

560

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

(1998). In case p = 0.9, the payoff-maximizing choice in response to uniformly randomizing players is x∗1 = 0.5217—the level-1 choice is greater than the average guess of level-0 players in this case. In relation to the expected target value, which is α ∗ (0.5 + 0.5 + x∗1 ) = 0.457, the optimal x∗1 is on the “wrong side” of the opponents’ means.

−1 The best response to uniformly randomizing opponents converges to nn− · px, and hence to px, as n approaches infinity. p This convergence suggests that strategic effects become negligible when n is greater than say 10. Ho et al. (1998), however, found that subjects behave as if their opponents’ guesses would be correlated with ρ = 1. Hence, the strategic effects inducing deviations from px do not become negligible as n grows. Such correlation will be investigated in the next section.

3. Logistic models of strategic choice As Fig. 1 shows, the exact location of the empirical “level-1” mode varies across data sets. These shifts of the modes cannot be explained by the original level-k model, as L 1 ≈ 0.33 for all n  10. In order to explain the shifts, we have to weaken the assumption that players are rational, i.e. the assumption that they play best responses to their subjective beliefs. Clearly, reducing the precision of responses, from best responses to say logit responses, does not induce a shift of the mode of choices. The best response(s) would still be chosen most frequently. An alternative approach to explain the shifts of the modes was proposed by Ho et al. (1998). They argue that subjects determine their (best) responses in thought experiments, to simulate the choices of their opponents, but with a limited number of independent draws. Specifically, they found that the opponent’s draws are correlated in the thought experiment (Costa-Gomes et al., 2009, report a similar result). Correlation of truncated random variables (be they uniform, random, or logistic) is difficult to formalize precisely, however, as multivariate distribution functions for arbitrary correlation matrices are not available. Therefore, I adopt an alternative approach to model the thought experiment. In this model, subjects estimate expected payoffs based on a small number m of independent draws, say m = 3 or m = 4, and these draws are then extrapolated to the actual number n − 1 of opponents. Unbiased behavior results if m = n − 1, but the hypothesis m = n − 1 will be rejected in all cases. Notably, in addition to explaining shifts of the mode of choices, this model is compatible with two well-known biases from choice under risk and uncertainty. It is compatible with overconfidence, as m < n − 1 implies that one overestimates the individual probability of winning, and with the “believe in small numbers,” as such players extrapolate based on small numbers of observations. Formally, if i’s opponents randomize (i.i.d.) according to the density f o on [0, 1], i estimates the expected payoff in a thought experiment using m  n − 1 draws from f o . Let f m (·| f o ) denote the joint density of m i.i.d. random variables with density f o , and define i’s payoff from xi as

 f m (y| f o ) p i (xi , y) dy,

π (xi |m, f o ) =

(8)

[0,1]m

where p i (xi , y) indicates whether xi wins in response to y

 p i (xi , y) =

1,

if |xi − t (xi , y)|  | y j − t (xi , y)|

0,

otherwise,

∀ j = 1, . . . , m ,

assuming the target value t (xi , y) is the weighted average of xi and y

t (xi , y) =

1 n

 xi +

n−1  m

 yj .

j m

The scaling factor (n − 1)/m implies that i judges his own weight accurately. Finally, the expected payoff for real m ∈ R is interpolated as follows: the expected payoff is the weighted average of using m and m , with weights m − m and m − m, respectively. I assume m  2 for computational reasons. In addition to such sampling, I allow for the possibility of logistic errors in the payoff estimates (i.e. for multinomial logit choice functions). The main alternative approach to such random utility models are random behavior models, i.e. the addition of say normal errors to the response (Stahl, 1996; Ho et al., 1998). I choose the random utility model, because it ensures that the probability of deviations from the best response is decreasing in the loss occurred thereby (as observed by Battalio et al., 2001), and because utility perturbations can be interpreted as mistakes as they occur in computations of expected payoffs, while behavior perturbations are usually interpreted as trembles. The idea of mistakes in the stage of payoff estimation seems more intuitive than trembles during the execution of one’s (correctly optimized) choice.5 Definition 3.1 (“Logistic LevK”). For all k  1, players of type k randomize on [0, 1] according to the density (using λk  0)

5 Note that I abstain from truncating the support of higher-level choices to subsets of the strategy set (see e.g. Ho et al., 1998). For, all actions but the upper bound are rationalizable in continuous beauty contests, and all choices but the k highest ones are level-k rationalizable in discrete beauty contests.

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

561

Note: The plots are “proportional” representations of the choice probabilities in the discrete beauty contest with 1001 choices (0, 0.001, 0.002, . . . , 1). The actual choice probabilities had been taken to the power of 0.25, as they often are very close to zero and hardly visible on a [0, 1] axis. Fig. 5. Level-k predictions for various precision levels (n = 30 and m = n − 1).



 f i (xi ) = exp λk · π (xi |m, f k−1 )

1





exp λk · π (˜xi |m, f k−1 ) d x˜ i .

(9)

0

The population as a whole is modeled as a finite mixture (McLachlan and Peel, 2000) of K discrete subject types (“components”). This approach is particularly feasible to describe the discrete structure of iterated reasoning and has therefore become standard practice in level-k and related analyses. Formally, σk , k ∈ K , denotes the parameter(s) defining type k and ρk denotes its relative frequency in the population, with k∈ K ρk = 1. The log-likelihood of the finite mixture model (σk , ρk )k∈K given the observations o = (om )m=1,..., M is

LL(σ , ρ |o) =

M  m =1

ln



ρk · f k (om |σk ),

(10)

k0

using f k (x|σk ) as the density of the level-k strategy. Since the density of a pure strategy approximates infinity, likelihood and log-likelihood of any “pure” level-k model approximate infinity—for almost all parameterizations—if this pure strategy is observed at least once. This renders likelihood maximization in continuous beauty contests inadequate, but the continuity assumption does not seem reasonable in the first place. For subjects picking numbers such as 0.32335 do not seem to deviate from say 0.323 based on expected payoff calculations. Digits beyond the third one seem to be chosen randomly and can therefore be rounded off. In the following, I assume that the smallest choice unit is 0.001 and that the strategy set is {0, 0.001, 0.002, . . . , 1}. Observations off this grid are rounded toward the nearest number on the grid. Fig. 5 illustrates how the logistic level-k model approaches the original level-k model (Fig. 2) as λ is increased uniformly for all components (below, I will allow for level-specific precision λk ). It complements the above analysis—as precision increases, the peak of level 1 becomes thinner, approaching the cluster at 0.33 observed in the data sets (Fig. 1), but the distribution of level-2 choices becomes flatter, and thus the overall choice pattern actually diverges from the observations. Further, a plethora of level-k components would be required to explain the overall distribution of choices, while high-level concepts as defined next may be able to explain these non-clustered choices jointly, as part of a single component. The two best-known concepts of high-level reasoning in this context are quantal response equilibrium (QRE, McKelvey and Palfrey, 1995) and noisy introspection (NI, Goeree and Holt, 2004). The QRE concept considered here is the logit equilibrium, which adds (extreme-value distributed) utility perturbations to Nash equilibrium, and NI adds utility perturbations to an iterative concept similar to rationalizability. To define these concepts formally, maintain π (xi |m, σ ) as the expected

562

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

Note: Similarly to Fig. 5, the choice probabilities are rescaled (by taking them to the power of 0.25) to accentuate the effects on the [0, 1] scale. Further, m = 4 in the NI models. Fig. 6. QRE and NI predictions.

payoff of choosing xi ∈ X := {0, 0.001, 0.002, . . . , 1} if all opponents play according to the strategy the discrete logit choice probabilities for given λ and m as







σ xi |λ, σ = exp λ · π xi |m, σ

 





exp λ · π x i |m, σ



.

σ ∈ ( X ), and define (11)

x i ∈ X

The quantal response equilibrium is defined as follows. (In the analysis, I will consider mixture models with up to K = 4 such QRE components.) Definition 3.2 (QRE). Given λ ∈ R+ , the QRE choice probabilities

σ ∈ ( X ) satisfy σ = σ (·|λ, σ ).

In p-beauty contests, it seems plausible to assert uniqueness of QREs for moderate λ, but a general result confirming this assertion is not available. The QREs underlying my analysis are the ones located along the principal branch, which is a result of using the following simple homotopy method (for more elaborate methods, see Turocy, 2005, 2010). Starting at a known solution (i.e. at the one for λ = 0 initially), I increase λ in steps of at most 0.25, and after each increase of λ, I solve for the respective QRE by function iteration. If the function iteration does not converge, the step size is reduced. Thanks to the simplicity of the beauty contest game, this simple method worked well. Noisy introspection, in contrast, is defined as follows. Definition 3.3 (NI). Fix λ ∈ R+ , all k  0, σk = σ (·|λk , σk+1 ).

μ ∈ [0, 1), and for all k ∈ N0 define λk = λ · μk . The NI choice probabilities are σ0 , where for

Essentially, NI captures a continuum of subjective beliefs ranging from uniform randomization (μ = 0) to equilibrium (μ ≈ 1). The unifying idea is that players believe their precision exceeds their opponents’ precision by factor μ−1 , who in turn believe to be μ−1 as precise as their opponents (including me), and so on, consistently for an infinite number of steps. In the analysis, I will assume a maximum of K = 100 induction steps and μ  0.95. In order to model choices based on a larger number of induction steps or larger μ, the equilibrium concept QRE can be adopted virtually without loss. Fig. 6 illustrates the differences between QRE and noisy introspection for various λ and μ ∈ {0.1, 0.3, 0.6, 0.9}. Basically, QRE distributions have larger variance than NI distributions, and for m  7 and intermediate λ, they exhibit an increasingly pronounced bimodal shape. Thus, some players may play the right tail, while other players play those playing the right

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

563

Table 1 Evaluation of the models with up to K = 4 components. K

Orig LevK

Log LevK

NI

QRE

Laboratory 1 2 3 4

−594.16 −585.47 −585.47 −585.47

<<< <<< <<< <<<

−565.42 −560.91 −559.26 −559.18

<< <<< <<< <<<

−562.51 −550.05 −543 −542.65

>> >>> >>> >>>

−566.35 −566.34 −566.33 −566.34

−822.14 −783.06 −782.59 −780.72

<<< <<< <<< <<<

−797.19 −768.49 −759.52 −759.09

<<< <<< <<< <<<

−759.44 −738.18 −723.96 −717.99

= < >>> >>>

−759.43 −735.84 −735.86 −735.82

−953.41 −926.42 −926.42 −925.73

<<< <<< <<< <<<

−901.92 −873.73 −873.73 −861.59

<<< <<< <<< <<<

−896.2 −858.99 −847.29 −842.09

>>> >>> >>> >>>

−909.92 −887.32 −881.21 −879.79

−999.24 −998.63 −998.63 −987.16

= = = =

−999.24 −998.43 −998.11 −986.38

<<< <<< <<< <<<

−891.17 −823.49 −820.62 −818.56

<<< >>> >>> =

−873.97 −838.3 −830.03 −819.38

−1036.38 −1006.71 −1006.71 −1006.71

<<< <<< <<< <<<

−985.98 −966.5 −966.2 −966.11

<<< <<< <<< <<<

−931.05 −878.07 −846.41 −820.3

< >> >>> >>>

−929.13 −884 −883.05 −883.02

−43 546.2 −42 827.54 −42 788.56 −42 489.07

<<< <<< <<< <<<

−43 358.55 −42 657.98 −42 654.35 −42 302.12

<<< <<< <<< <<<

−43 273.48 −41 088.15 −40 394.9 −40 052.94

<<< >>> >>> >>>

−43 250.26 −42 028.55 −41 383.48 −41 381.76

Classroom 1 2 3 4 Takehome 1 2 3 4 Theorists 1 2 3 4 Newsgroup 1 2 3 4 Newspaper 1 2 3 4

Note: The relation signs indicate the level of significance in likelihood ratio (Vuong) tests for nested or non-nested models, as appropriate. In non-nested cases, the test statistic includes the BIC correction term following Eq. (5.9) of Vuong (1989). The sign = indicates insignificance, < indicates significance at the 0.1 level in two-sided tests, << indicates significance at 0.02, and <<< indicates significance at 0.001.

tail—all within a single QRE component. These two strategies are not equally profitable, as shown by the differences in their choice probabilities, but as λ is intermediate, neither of them dominates. The NI distribution closest to bimodality in Fig. 6 is the one for (λ, μ) = (10, 0.9), but as λ increases, the NI distributions approximate pure strategies (thin peaks) rather than the QRE distributions (flat peaks). This difference allows us to distinguish QRE and NI types. 4. Analysis I start with a brief description of the econometric procedure underlying the analysis. The models are estimated by maximizing the full information likelihood jointly over all parameters using the gradient-free NEWUOA algorithm (Powell, 2008), verifying convergence using a Newton–Raphson algorithm, and ensuring global maximization using a variety of starting values per model (on the order of 100 per model). Standard errors are obtained from the information matrix. Expected payoffs are computed by Quasi Monte Carlo integration (using Niederreiter sequences with 10.000 · m2 random numbers, which is rather large in this context). All parameter estimates are provided as supplementary material. In the analysis, I focus on models with up to four components. This implies that level-k models cannot explain observations close to the Nash equilibrium. In order to explain these observations, however, “high-level” concepts such as NI and QRE appear more intuitive than explicitly accounting for levels 7–10. Thus, to explain the choices of all subjects, mixtures of concepts, between say level-k and QRE, would be required, or NI models.6 Tables 1 and 2 summarize the results. Table 1 displays the log-likelihoods of all models and the results of likelihood ratio tests between “adjacent” models.7 Table 2 displays the results of likelihood ratio tests between the models with the estimated number of components. The tests are nested or non-nested Vuong (1989) tests, as appropriate, using the BIC

6 Detailed information on such mixtures are provided in the supplementary material. They do not improve on the goodness-of-fit of the best models (NI). 7 Concerning these likelihood ratio test results, transitivity in the sense a < b ∧ b < c ⇒ a < c holds (using the notation of Table 1). However, a < b ∧ b = c  a < c. For this reason, Table 2 is included.

564

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

Table 2 Likelihood ratio tests between the models with the estimated numbers of components, their pseudo- R˜ 2 , and the respective shares of level-0 players. Laboratory Orig LevK 2 Log LevK 2 QRE 1 NI 3

Orig LevK 2

−585.47 −560.91 −566.35 −543

Classroom Orig LevK 2 Log LevK 3 QRE 2 NI 4

−783.06 −759.52 −735.84 −717.99

Takehome Orig LevK 2 Log LevK 4 QRE 3 NI 4

−926.42 −861.59 −881.21 −842.09

Theorists Orig LevK 4 Log LevK 4 QRE 4 NI 2

−987.16 −986.38 −819.38 −823.49

Newsgroup Orig LevK 2 Log LevK 2 QRE 2 NI 4

−1006.71 −966.5 −884 −820.3

Newspaper Orig LevK 4 Log LevK 4 QRE 3 NI 4

−42 489.07 −42 302.12 −41 383.48 −40 052.94

Log LevK 2

QRE 1

NI 3

<<<

<<< =

<<< <<< <<<

>>> >>> >>>

= >>>

Orig LevK 2

Log LevK 3

QRE 2

NI 4

<<<

<<< <

<<< <<< <<<

>>>

>>> >>> >>>

> >>>

Orig LevK 2

Log LevK 4

QRE 3

NI 4

<<<

<<< =

<<< <<< <<<

>>>

R˜ 2 0.094 0.402 0.388 0.489 R˜ 2 0.438 0.538 0.713 0.707 R˜ 2 0.274 0.636 0.407 0.693

ρ0 0.731 0.273 0 0.034

ρ0 0.511 0.27 0 0.011

ρ0 0.629 0.484 0 0.243

>>> >>> >>>

= >>>

>>>

Orig LevK 4

Log LevK 4

QRE 4

NI 2

R˜ 2

ρ0

=

<<< <<<

<<< <<< =

0.146 −0.002 0.898 0.9

0.777 0.773 0.024 0.172

R˜ 2

ρ0

= >>> >>>

>>> >>>

=

Orig LevK 2

Log LevK 2

QRE 2

NI 4

<<<

<<< <<

<<< <<< <<<

>>> >>> >>>

>> >>>

>>>

Orig LevK 4

Log LevK 4

QRE 3

NI 4

<<<

<<< <<<

<<< <<< <<<

>>> >>> >>>

>>> >>>

>>>

0.28 0.47 0.825 0.913 R˜ 2 0.442 0.47 0.602 0.736

0.628 0.609 0 0.129

ρ0 0.588 0.565 0 0.002

Note: The “estimated numbers of components” (given next to the model label) is the highest number of components that is not significantly improved upon at α = 0.1. More details on the test results concerning the number of components are provided as supplementary material. The testing procedure is the same as in Table 1, now evaluating the hypothesis H 0 : LL(Row Model) = LL(Col Model). The sign = indicates insignificance, < indicates significance in favor of “Col Model” at the 0.1 level in two-sided tests, << indicates significance in favor of “Col Model” at 0.02, and <<< indicates significance at 0.001. The signs >, >>, >>> respectively indicate significance in favor of “Row Model”. The value R˜ 2 is the adjusted pseudo-R 2 (following Cox and Snell) of the L (Unif ) respective logistic level-k model. That is, R˜ 2 = 1 − ( exp(BIC) )2/ N , with L (Unif ) as the likelihood of the uniform distribution, BIC = −LL + #Pars · log( N )/2, and N as the number of observations. The value ρ0 is the share of level-0 players according to the ML estimates of the logistic level-k model.

correction term, as suggested by Vuong to account for differences in the dimensionalities in the parameter spaces in nonnested cases. Table 2 additionally reports the adjusted pseudo-R 2 of the fitted models, which adjusts for the number of parameters by using the Bayes information criterion (BIC, Schwarz, 1978) rather than the log-likelihood (for its definition, see Table 2). Thus, its values may be negative if its explanatory content does not justify its number of parameters, and more generally, R˜ 2 puts the BIC into perspective and shows how much of the variance the model actually explains. Note that the model that maximizes R˜ 2 simultaneously minimizes BIC. Table 1 shows that noisy introspection improves upon logistic level-k highly significantly, for all model dimensions and in all data sets. In turn, logistic level-k improves highly significantly upon the original level-k model in all cases but Theorists, to which the level-k models with k  4 do not fit (the corresponding pseudo- R˜ 2 is negative in Table 2). The relation between NI and QRE models is slightly mixed. If the number of components is set to be 1, then the models tend to fit the overall distribution of choices. This is the case in Classroom, Theorists, Newsgroup, and Newspaper. With respect to these overall distributions, the QRE model fits at least as good as, and sometimes better than, the NI model restricted to μ  0.95. Thus, the overall distribution actually resembles a noisy equilibrium. However, as further components are added, such as level 1, NI attains an overall fit that is better than QRE’s. Table 3, which reports the parameter estimates of the estimated models, confirms this observation. In all data sets, the majority of subjects belong to a component where the estimated μ is close to the assumed upper bound 0.95, which essentially reflects equilibrium behavior.8 Further, in all data sets but Laboratory, there is a second component with “high-precision QREs” (λ > 30 and ρ ≈ 0.95), which is a second equilibrium component, in this case the Nash equilibrium.

8 The supplementary material provides estimates for models containing both QRE and NI components. These models confirm the hypothesis that at least one NI component can be substituted by a QRE component without loss.

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

565

Table 3 Parameter estimates of the estimated NI models.

ρ0

Component 1

Component 2

Component 3

Component 4

ρ

λ, μ

ρ

λ, μ

ρ

λ, μ

ρ

0.11 (0.04)

640.27, 0 (122.2, 0)

0.09 (0)

5405.76, 0 (0.24, 0)

0.77 (0.18)

3.8, 0.86 (0.01, 0)

0.79 (0.07)

9.16, 0.93 (0.03, 0)

0.06 (0)

51.35, 0.95 0.44, 0

0.08 (0)

1883.56, 0 (200.56, 0)

0.06 (0)

0.14 (0.03)

9509.16, 0 (953.17, –)

0.08 (0)

852.77, 0.01 (0.57, 0)

0.05 (0)

63.6, 0.95 (1.57, 0.03)

0.51 (0.08)

0.14 (0.08)

12.55, 0.95 (0.02, 0)

0.69 (0.08)

24.28, 0.95 (0, 0.02)

0.09 (0)

77 883.29, 0 (–, 0)

0.05 (0)

43.78, 0.84 (0.04, 0)

0.17 (0.03)

60.48, 0.95 (1.51, 0.02)

0.56 (0.07)

0.32 (0.01)

38.99, 0.93 (1.19, 0.03)

0.08 (0)

2118.57, 0 (12.5, 0)

0.57 (0.01)

6.33, 0.95 (0.01, 0)

0.04 (0)

m

LL

λ, μ

Laboratory 0.0338

2.4836 (2e−04)

−543

968.66, 0.01 0.83, 0

3.0435 (0)

−717.99

8.71, 0.94 (0.01, 0)

3.0793 (9e−04)

−842.09

2.0202 (0)

−823.49

9.26, 0.95 (0.01, 0)

3.579 (0.0027)

−820.27

2209.58, 0 (0.06, 0)

2.9402 (0)

Classroom 0.0109 Takehome 0.2269 Theorists 0.1716 Newsgroup 0.1289 Newspaper 0.0019

−40 052.94

The other estimated components tend to fit the mass points at 0.33 and 0.22, i.e. the observations typically attributed to be of level 1 and 2. A mass point at 0.33 results for high λ and μ close to zero.9 A mass point at 0.22 results for high λ and μ slightly above zero (e.g. around 0.01). Essentially, this second group responds with high precision to the belief that their opponents’ choices have high variance and mean close to 0.33. The NI model is flexible enough to explain such beliefs, while the (logistic) level-k model is not (see Fig. 2). This does not rule out that alternative level-k specifications may explain such beliefs. The observations that the majority of the subjects play a noisy equilibrium and that a second group plays the Nash equilibrium imply that most choices are described more adequately using QRE than using level-k. Overall, Table 2 shows that the estimated QRE models fit at least as good as the estimated level-k models in all data sets. However, QRE and level-k tend to capture different subjects, and only NI can capture all of them. Correspondingly it improves upon both level-k and QRE in all data sets but Theorists (which seem to be equilibrium players). Fig. 7 illustrates the attained fit. Overall, the estimated NI models fit well both visually, i.e. qualitatively, as well as quantitatively. The adjusted R˜ 2 reported in Table 2 are 0.5 in the Laboratory data, and either 0.7 or 0.9 in the other data sets. Thus, most of the observed variance is explained by noisy introspection and quantal response equilibrium. Finally, we can compute the posterior classifications of the subjects using the model estimates. Given a data set o and model estimates (ρk , σk ), the probability that subject s is of type k conditional on its guess o s is10



Pr(type = k|o s ) = ρk · f k (o s )

ρk · f k (o s |σk ).

(12)

k ∈K

In general, choices close to zero are high-precision (Nash) equilibrium choices and choices around 0.33 are NI choices with μ = 0, i.e. level 1. However, the ranges of choices that classify as level 1 is very thin in all cases. The neighboring choices on both sides of this range are much more likely to have been made by subjects belonging to the noisy equilibrium component discussed above. These observations falsify an assumption frequently made in the literature: a low guess does not imply (ex post) a high level of reasoning. Since the majority of subjects belong to a single component, the noisy equilibrium, the ex post most likely explanation for choices such as 0.1 or 0.4, which are not particularly close to either of the mass points, is that they have been made by members of the QRE component. That is, an ex post distinction based on beauty contest choices seems largely uninformative. 5. Concluding discussion In this paper, I analyzed p-beauty contests asking whether choices of subjects that do not classify as level 1 are modeled adequately using high-level concepts. The qualitative observations in favor of high-level concepts are that the mode of the

9 10

The entries in Table 3 are rounded. The exact values are available from the author. The corresponding plots are provided as supplementary material.

566

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

Note: The predictions of the models with the “estimated numbers of components” (highest number of components that is not significantly improved upon at α = 0.1) are plotted on top of the histograms of the respective data set, and the predictions are aggregated to match the bins of the histograms. The broken line is the kernel estimate labeled “Kernel-0.5” in Fig. 1. Fig. 7. Predictions of the Noisy Introspection models with the estimated numbers of components.

observations around 0.222 is not compatible with level-k if applied strictly (Fig. 2), that this mode or any higher-level mode are not noticeable in the empirical density estimates (Fig. 1), and that players choosing numbers such as 0.2 seem to explicitly choose “intermediate” numbers (as reported by Bosch-Domenech et al., 2002). The latter is not compatible with the level-k idea where players believe to be exactly one step ahead of their opponents, while it is compatible with noisy, high-level concepts such as quantal response equilibrium and noisy introspection. The quantitative analysis confirmed the hypothesis (see Tables 1 and 2). The majority of subjects belong to a quantal response equilibrium component. The remaining subjects distribute across three mass points (around 0.333, 0.222, and at or near 0) that reflect highly precise responses to specific beliefs. The first two mass points relate intuitively to level-k beliefs, since they follow from beliefs with means around 0.5 (leading to choices of 0.333 if n is large) and means around 0.333 (leading to choices of 0.222). The latter cannot be explained adequately as level 2, however, even if we allow for logistic choice. The best response to a belief with mean 0.333 and low variance is not near 0.222, and level-1 choices with high variance (low precision) do not have mean 0.333. Since modes at 0.222 are compatible with noisy introspection and the remaining choices (including level 3 and higher) are found to form the QRE component, only the level-1 choices 0.333 are left to be classified. Level-1 choices are included in many model families as a special case, e.g. level-k, NI, and heterogeneous QREs (Rogers et al., 2009), which underlines their epistemological robustness in addition to their psychometric relevance (Georganas et al., 2010). Thus, the underlying question is along which model family rooted in level-1 (the “principle of insufficient reason”) strategic reasoning develops as it becomes sophisticated. By the above results, this seems to be the NI family rather than the level-k family. Since the estimated models fit well, with pseudo-R 2 between 0.5 and 0.9, this conclusion appears robust. Further research may investigate the robustness of these results with respect to other beauty contest variants (varying e.g. p) and with respect to related games where level-k reasoning has been shown to improve upon standard models (such as auctions, see Crawford and Iriberri, 2007). Also, one may relax the independence of irrelevant alternatives (IIA) assumption underlying multinomial logit, as in beauty contests, the choice sets are ordered and can be partitioned into subsets by iteratively eliminating dominated strategies. If subjects do not ignore such orderedness or partitioning, their choices violate IIA and can be modeled more accurately using ordered GEV or nested logit models (see e.g. McFadden, 1984).

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

567

Fig. 8. The cases distinguished in Proposition 2.2 for p  0.75.

Appendix A. Proof of Proposition 2.2 A.1. Best responses in case p  0.75 ⇔ α  0.25 Fix x1 and define f and g as follows.

f (x) =

1 − 2α 2α





(x1 + x),



g ∈ x| f (x) = x

g=

1 − 2α 4α − 1

(13)

x1 .

2α Note that α  0.25 implies 1−  1. The complementary case α < 0.25 is analyzed below. 2α Both f and g can be interpreted easily in relation to Fig. 3. On the one hand, f is the lower-right boundary of the area characterized by Eq. (4) and the upper-left boundary of the one characterized by Eq. (3). On the other hand, g is the fixed point of f , i.e. the point where it intersects with the diagonal. These functions greatly simplify the notation relating to the conditions (3) and (4). The payoff-maximizing x1 is derived in three steps. First, I show that x1  1−α2α , i.e. x1 is not optimal if implies win regions as in Fig. 8(c).

Lemma A.1. The best response satisfies x∗1  1−α2α . Proof. If x1  1−α2α , see Fig. 8(c), the expected payoff is





π (x1 ) = (1 − x1 )2 + f −1 (x1 ) + f −1 (1) ∗ (1 − x1 ).

(14)

This equates with

π (x1 ) = and implies

1 − (3 − 8α ) ∗ x1 1 − 2α

∗ (1 − x1 )

π (x1 ) < 0 for all x1 

α

1−2α

.

(15)

2

Second, Lemma A.2 establishes the condition differentiating the cases displayed in Figs. 8(a) and 8(b). −1 Lemma A.2. The best response satisfies x∗1  41α −2α if and only if α  0.3026. −1 Proof. If x1  41α −2α , see Fig. 8(a), the expected payoff is





 





π (x1 ) = 2 ∗ x1 ∗ (1 − x1 ) − x1 − f −1 (x1 ) ∗ f (x1 ) − x1 + f (x1 ) − x1 ∗ [ g − x1 ] = 2 ∗ x1 ∗ (1 − x1 ) −

(2 − 6α )2 2 (2 − 6α )2 2 x1 + x , (1 − 2α )2α 2α (4α − 1) 1

(16)

−1 and the first derivative of it is positive for all x1  41α −2α if and only if

2α (1 − 2α )2 + (2 − 6α )3 − 4α · (1 − 2α )(4α − 1) > 0,

(17)

4α −1 . 1−2α

α < 0.3026. Hence, α  0.3026 implies x1  −1 α If 41α −2α  x1  1−2α , see Fig. 8(b), the expected payoff is

i.e. iff



 





 



π (x1 ) = 1 − x21 − x1 − f −1 (x1 ) ∗ f (x1 ) − x1 − f −1 (1) − x1 ∗ 1 − f (x1 ) = 1 − x21 −

(2 − 6α )2 2 [2α − 2x1 (1 − 2α )]2 x − . (1 − 2α )2α 1 (1 − 2α )2α

(18)

568

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

Fig. 9. The cases distinguished in Proposition 2.2 for p < 0.75.

It is easy to verify that its first derivative is non-positive for all x1 satisfying satisfied, i.e. iff

α  0.3026. Hence, α  0.3026 implies x1 

4α −1 . 1−2α

2

4α −1 1−2α

 x1 

α

1−2α

if and only if (17) is not

α  0.3026, then the payoff-maximizing choice is the zero of the first derivative of Eq. (16), which is       (2 − 6α )2 (2 − 6α )2 (2 − 6α )3 ∗ =2 4− , x1 = 2 4 + − (1 − 2α )α α (4α − 1) α (1 − 2α )(4α − 1)

Thus, if

and if

(19)

α < 0.3026, it is the zero of the first derivative of Eq. (18), which is x∗1 =

4α (1 − 2α )

(2 − 6α

)2

+ 2(2 − 3α )(1 − 2α )

=

2α (1 − 2α )

(4 − 7α )(1 − 3α ) + 3α 2

(20)

.

A.2. Best responses in case p < 0.75 ⇔ α < 0.25 2α Fix x1 and define f as above, i.e. f (x) = 1− (x1 + x) for all x ∈ [0, 1]. Now, 1−2α2α > 1, and the cases to be distinguished 2α are the ones depicted in Fig. 9. The proof that choices x1 > α /(1 − 2α ) are not optimal is very similar to Lemma A.1 and therefore skipped. Thus, x1  α /(1 − 2α ), and in this case, the expected payoff is











π (x1 ) = x1 1 − f (0) + 1 − f (x1 ) + (1 − x1 )2 − 1 − f (x1 ) f −1 (1) − x1      1 − 2α 1 − 2α 2α = x1 · 2 − · 3x1 + (1 − x1 )2 − 1 − 2x1 − 2x1 . 2α 2α 1 − 2α By the first-order condition, the optimal choice is the zero of



2−

1 − 2α 2α

     1 − 2α 1 − 2α 2α · 6x1 − 2(1 − x1 ) + 2 1 − 2x1 + 2 · − 2x1 2α 2α 1 − 2α

which yields x1 = 4α /(7 − 16α ). Supplementary material The online version of this article contains additional supplementary material. Please visit doi:10.1016/j.geb.2012.02.010. References Battalio, R., Samuelson, L., Huyck, J., 2001. Optimization incentives and coordination failure in laboratory stag hunt games. Econometrica 69 (3), 749–764. Bosch-Domenech, A., Montalvo, J., Nagel, R., Satorra, A., 2002. One, two, (three), infinity, . . .: Newspaper and lab beauty-contest experiments. Amer. Econ. Rev. 92 (5), 1687–1701. Camerer, C., Ho, T., Chong, J., 2004. A cognitive hierarchy model of games. Quart. J. Econ. 119 (3), 861–898. Costa-Gomes, M., Crawford, V., Broseta, B., 2001. Cognition and behavior in normal-form games: An experimental study. Econometrica 69 (5), 1193–1235. Costa-Gomes, M., Crawford, V., Iriberri, N., 2009. Comparing models of strategic thinking in Van Huyck, Batallio, and Beil’s coordination games. J. Europ. Econometrica Assoc. 7 (2–3), 365–376. Crawford, V., Iriberri, N., 2007. Level-k auctions: Can a nonequilibrium model of strategic thinking explain the winner’s curse and overbidding in privatevalue auctions? Econometrica 75 (6), 1721–1770. Georganas, S., Healy, P.J., Weber, R.A., 2010. On the persistence of strategic sophistication. Working paper. Goeree, J., Holt, C., 2004. A model of noisy introspection. Games Econ. Behav. 46 (2), 365–382. Ho, T., Camerer, C., Weigelt, K., 1998. Iterated dominance and iterated best response in experimental “p-beauty contests”. Amer. Econ. Rev. 88 (4), 947–969. Kocher, M., Sutter, M., 2005. The decision maker matters: Individual versus group behaviour in experimental beauty-contest games. Econ. J. 115 (500), 200–223. Kübler, D., Weizsäcker, G., 2004. Limited depth of reasoning and failure of cascade formation in the laboratory. Rev. Econ. Stud. 71 (2), 425–441. McFadden, D., 1984. Econometric analysis of qualitative response models. In: Handbook of Econometrics, vol. 2, pp. 1395–1457. McKelvey, R., Palfrey, T., 1995. Quantal response equilibria for normal form games. Games Econ. Behav. 10 (1), 6–38.

Y. Breitmoser / Games and Economic Behavior 75 (2012) 555–569

McLachlan, G., Peel, D., 2000. Finite Mixture Models. Wiley Series in Probability and Statistics. Nagel, R., 1995. Unraveling in guessing games: An experimental study. Amer. Econ. Rev. 85 (5), 1313–1326. Powell, M., 2008. Developments of NEWUOA for minimization without derivatives. IMA J. Numer. Anal. 28 (4), 649. Rogers, B., Palfrey, T., Camerer, C., 2009. Heterogeneous quantal response equilibrium and cognitive hierarchies. J. Econ. Theory 144 (4), 1440–1467. Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6 (2), 461–464. Stahl, D., 1996. Boundedly rational rule learning in a guessing game. Games Econ. Behav. 16 (2), 303–330. Stahl, D., 1998. Is step-j thinking an arbitrary modelling restriction or a fact of human nature? J. Econ. Behav. Organ. 37 (1), 33–51. Stahl, D., Haruvy, E., 2008. Level-n bounded rationality in two-player two-stage games. J. Econ. Behav. Organ. 65 (1), 41–61. Stahl, D., Wilson, P., 1994. Experimental evidence on players’ models of other players. J. Econ. Behav. Organ. 25 (3), 309–327. Stahl, D., Wilson, P., 1995. On players’ models of other players: Theory and experimental evidence. Games Econ. Behav. 10 (1), 218–254. Turocy, T., 2005. A dynamic homotopy interpretation of the logistic quantal response equilibrium correspondence. Games Econ. Behav. 51 (2), 243–263. Turocy, T., 2010. Computing sequential equilibria using agent quantal response equilibria. Econ. Theory 42 (1), 255–269. Vuong, Q., 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57 (2), 307–333.

569