True overconfidence: The inability of rational information processing to account for apparent overconfidence


Organizational Behavior and Human Decision Processes 116 (2011) 262–271


Organizational Behavior and Human Decision Processes journal homepage: www.elsevier.com/locate/obhdp

Christoph Merkle a,⇑, Martin Weber a,b

a Lehrstuhl für Bankbetriebslehre, University of Mannheim, L5, 2, 68131 Mannheim, Germany
b Centre for Economic Policy Research (CEPR), London, UK

Article info

Article history: Received 14 September 2009; Accepted 22 July 2011; Available online 31 August 2011; Accepted by William Bottom

Keywords: Overconfidence; Better-than-average effect; Overplacement; Bayesian updating; Belief distribution

Abstract

The better-than-average effect describes the tendency of people to perceive their skills and virtues as being above average. We derive a new experimental paradigm to distinguish between two possible explanations for the effect, namely rational information processing and overconfidence. Experiment participants evaluate their relative position within the population by stating their complete belief distribution. This approach sidesteps recent methodology concerns associated with previous research. We find that people hold beliefs about their abilities in different domains and tasks which are inconsistent with rational information processing. Both on an aggregated and an individual level, they show considerable overplacement. We conclude that overconfidence is not only apparent overconfidence but rather the consequence of a psychological bias.

© 2011 Elsevier Inc. All rights reserved.

⇑ Corresponding author. Fax: +49 6211811534. E-mail address: [email protected] (C. Merkle).
0749-5978/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.obhdp.2011.07.004

Introduction

Overconfidence is not just an artifact of psychological experiments but seems present in many real-life situations where considerable stakes are involved. Overconfident decision making has been observed in financial markets (Odean, 1998), in corporations (Malmendier & Tate, 2005), in business entries (Cooper, Woo, & Dunkelberg, 1988), and even in marriages (Mahar, 2003). Indeed, overconfidence is perhaps the behavioral bias most readily embraced by academic researchers in economics and finance. In particular, the better-than-average effect, which is the tendency of people to rate their skills and virtues favorably relative to a comparison group, yields direct predictions for economic decision making.

In a recent paper, Benoît and Dubra (2009) challenge the notion of overconfidence as it was previously analyzed in psychology and economics. The subject of their criticism is the research methodology conventionally used to demonstrate the better-than-average effect. In a signaling framework, Benoît and Dubra show that rational information processing can lead to the very results formerly interpreted as evidence for overconfidence. This does not rule out true overconfidence as an explanation for these findings, but instead also allows for straightforward Bayesian updating as an alternative explanation.¹

¹ In accordance with Benoît and Dubra, we use the term “true overconfidence” for truly biased self-evaluations. In contrast, “apparent overconfidence” stands for data that seem to reflect overconfidence, but where it is not possible to prove the presence of a better-than-average effect. The term “apparent overconfidence” thus includes cases of true overconfidence and other possible causes such as rational information processing.

Despite this setback for the overconfidence literature, it is not sufficient to take a methodological viewpoint on the matter; we have to ask ourselves about the psychological reality of this bias and its relation to other self-serving biases. The assertion that people are overconfident is an appealing explanation for behavior, both in financial markets and elsewhere. In contrast, rational updating is demanding in terms of people’s information processing capacity and the underlying signal structure necessary to produce apparent overconfidence. It therefore seems worthwhile to design a research strategy that can demonstrate the presence of true overconfidence by improving previous research methodology in such a way that it withstands the criticism of Benoît and Dubra (2009).

We identify the aggregation of beliefs as the feature most damaging to the interpretational value of the traditional experimental setting. The simplest setup asks people to judge whether they believe themselves to be above average in a certain domain, as for example in the famous account of driving ability by Svenson (1981). More advanced designs ask participants to specify the percentile of a distribution they believe themselves to belong to (e.g. Dunning, Meyerowitz, & Holzberg, 1989). Both approaches have the common feature that, as they retrieve only a single estimate, a lot of information gets lost, thus leaving room for alternative explanations. Many sets of beliefs can produce the same result when aggregated in this manner; Bayesian posteriors and true overconfidence are just two of them.

We design two experiments to elicit more detailed beliefs of participants concerning a number of domains that have previously been associated with overconfidence, overoptimism, or underconfidence. Self-evaluations are given along a quantile scale that describes the ability distribution relative to a peer group. Along this scale, participants provide estimates representing their subjective probability of falling into each skill quantile. The extended assessment allows us to directly test whether the findings are in line with rational information processing.

Our central result is that considerable overconfidence is present in the belief distributions of experiment participants. We test various conditions for population averages of these probability distributions and find them incompatible with rational information processing. Bayesian updating can be rejected as an explanation for apparent overconfidence at conventional significance levels. Most people find it highly probable that they rank among the higher quantiles of the ability distribution and not at all likely that they are below average. On an individual level, they often fall short of their expectations, and especially the unskilled exhibit pronounced overconfidence. We conclude that true overconfidence is the main driver of our results.

Types of overconfidence

While this is not the place to review the abundant overconfidence research in psychology and economics (see e.g. Glaser & Weber, 2010), it is nevertheless useful to divide the field into three subareas, which can be summarized following Moore and Healy (2008):

1. Judgments of one’s absolute performance or ability (overestimation).
2. Confidence in the precision of one’s estimates (miscalibration or overprecision).
3. Appraisal of one’s relative skills and virtues (better-than-average effect or overplacement).

Overestimation is diagnosed if people’s absolute evaluation of their own performance (e.g. correct answers in a knowledge test) exceeds their actual performance (Lichtenstein, Fischhoff, & Phillips, 1982; Moore & Healy, 2008). Miscalibration or overprecision denotes the observation that people choose overly narrow confidence intervals when asked for a range that is supposed to contain a true value with a certain probability (Alpert & Raiffa, 1982; Russo & Schoemaker, 1992). Overplacement often occurs when people try to evaluate their competence in a certain domain relative to others. Typically, most people rate themselves above average, which is why this effect is also called the better-than-average effect (Alicke & Govorun, 2005). The relationship between these different forms of overconfidence is discussed, for instance, in Glaser, Langer, and Weber (2009), Healy and Moore (2007), and Larrick, Burson, and Soll (2007). Apart from the aforementioned, overoptimism (Weinstein, 1980) and illusion of control (Langer, 1975) are associated with overconfidence in a broad interpretation of the term. We will concentrate on the better-than-average effect (overplacement) and occasionally on overoptimism, as the elicitation techniques for these biases are similar.

Criticism which has been raised against all types of overconfidence is usually directed either at research methodology and experimental design or at the underlying concept itself; the list of authors in psychology who have questioned the reality of overconfidence or the research design includes Gigerenzer (1991), Gigerenzer, Hoffrage, and Kleinbölting (1991), Juslin (1994), Erev, Wallsten, and Budescu (1994), Dawes and Mulford (1996), and Klayman, Soll, González-Vallejo, and Barlas (1999). In economics, where the rationality assumption was long prevalent, the emphasis was a different one: in recent years, various approaches were pursued to reconcile overconfidence with rational behavior (Bénabou & Tirole, 2002; Brocas & Carrillo, 2002; Compte & Postlewaite, 2004; Healy & Moore, 2007; Köszegi, 2006; Santos-Pinto & Sobel, 2005; Van den Steen, 2004; Zábojník, 2004). These models differ mainly in their assumptions, their relevance for different forms of overconfidence and the degree of rationality they are based on. In many ways, this literature has contributed to improving and clarifying methodology, but the debate whether overconfidence exists at all is far from being settled.

Criticism by Benoît and Dubra

The reasoning of Benoît and Dubra (2009) to some extent combines the two mentioned strands of criticism. They identify a problematic feature in the conventional procedure to demonstrate the better-than-average effect, namely relatively imprecise inquiries for an appraisal of relative skills and virtues. Based on a parsimonious signaling model, they then employ rational Bayesian argumentation to illustrate that this kind of research cannot show overconfidence in the form of the better-than-average effect. We will now examine their reasoning in detail.

Probably the most prominent account of the better-than-average effect is given in Svenson (1981), who finds that a great majority of subjects rated themselves to be safer drivers than the median driver (77% of his Swedish and 87% of his US sample).
He explains his findings by a general tendency of people to view themselves more favorably than they view others, possibly accompanied by cognitive effects such as low availability of negative memories. Similar results could be reproduced for other domains, for example for people evaluating their personal virtues relative to others (Alicke, Klotz, Breitenbecher, Yurak, & Vredenburg, 1995).

These overplacement studies have a common research methodology, which often simply consists of asking participants whether they view themselves as better or worse than the median or average of a comparison group with respect to some skill or virtue. Researchers occasionally require more precise estimates, i.e. other quantiles (often percentiles or deciles) are used instead of the median. Overconfidence is usually diagnosed if significantly more than half of the participants place themselves above the median, or more generally if more than x% place themselves above the (100 − x)-percentile.

Some concerns regarding this design were raised earlier; for instance, people may interpret the skill in question differently or they may lack information about its distribution within the population. Additionally, the sample of participants might not be representative of the population, and the meaning of “average” can be understood in various ways. These problems can nevertheless be addressed by a more careful experimental design including precise and unambiguous formulation of questions and a fairly large and representative choice of subjects. Combined with the assumption that participants use best estimates of their own and others’ abilities and skills, the general result remains valid: it seems intuitive that no more than a certain fraction of the population can rate themselves above a respective percentile. However, Benoît and Dubra (2009) show that exactly this is possible even when people update their beliefs in a perfectly rational manner.
In order to illustrate this, we shall briefly reproduce their example, which draws on Svenson’s study of driving ability. In a uniformly distributed population of low, medium and highly skilled drivers, people are assumed to evaluate their driving skills depending on whether they have previously had an accident or not. Probabilities for causing an accident are given as $p_L = .8$, $p_M = .4$ and $p_H = 0$ for the different groups. If drivers do not know their initial skill level but interpret the occurrence of an accident as a signal, they will update their beliefs according to Bayes’ law. Given the prior probabilities, all people who did not experience an accident will arrive at a posterior probability of 5/9 that they are of high skill; it seems reasonable for this group to rate themselves above average. As 60% of all drivers have had no accident, this implies that these 60% are expected to regard themselves as highly skilled. Beyond this concrete example, Benoît and Dubra (2009) show that, within the traditional experimental design, almost any distribution of respondents on the percentile scale can be explained by rational information processing.

There are some immediate concerns with the Benoît–Dubra criticism. One concern refers to the way people deal with signals in classic overconfidence domains. A very early study by Preston and Harris (1965) suggests that even drivers hospitalized after an accident exhibit the same overplacement patterns when asked about their driving performance. The authors find no evidence that participants adjust their evaluation according to the received signal. Benoît and Dubra (2009) discuss this study and some more recent evidence on how people interpret adverse signals. Although the results are quite mixed, a general self-serving bias in the perception of signals is well documented (Bradley, 1978; Zuckerman, 1979). People tend to ascribe bad outcomes to external forces rather than to their own performance or ability. For this reason, it is at the least unclear whether good and bad signals are perceived symmetrically within a Bayesian model.
The framework of Benoît and Dubra also imposes some requirements on signal structure. It is obvious that if the number of signals becomes large (or alternatively the quality of signals very good), overconfidence can no longer be explained rationally. With perfect signals and people allocating themselves reasonably to the percentiles, the distribution will correspond to underlying probabilities. If one extends the aforementioned driving setup by an additional period and maintains the same probabilities as before, this becomes clear: after observing two signals, only 47% would reasonably consider themselves as highly skilled and 27% each as medium or low skill drivers. After ten periods, the Bayesian result is practically indistinguishable from the real probabilities. If, for instance, ten signal observations are interpreted as ten years of driving experience, there is no room for overconfidence afterwards. While it is possible to construct different examples that still show rational overconfidence after many periods, this comes at the cost of a highly asymmetric signal structure with the rare occurrence of (very) negative signals. Indeed, the asymmetric signal structure is one key ingredient to the emergence of rationally explicable overconfidence in Benoît and Dubra (2009), but this is a good portrayal of reality only for some domains (e.g. driving). However, with respect to signal frequency and quality, feedback is far from perfect in many situations—even in financial markets where new information arrives almost continuously. It would therefore be premature to dismiss the Benoît–Dubra-criticism solely on these grounds. We instead design an experiment to distinguish between two possible sources of apparent overconfidence, namely rational information processing and true overconfidence.
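The updating in the driving example can be reproduced numerically. The following Python sketch is illustrative only: the function name `posterior_after_periods` is ours, and we assume that accidents are independent across periods with the same per-period probabilities, which the text implies but does not spell out. For each possible accident count, it returns the share of drivers observing that count and their Bayesian posterior over skill levels.

```python
from fractions import Fraction
from math import comb

# Per-period accident probabilities from the example (low, medium, high skill).
ACCIDENT_PROB = {"L": Fraction(4, 5), "M": Fraction(2, 5), "H": Fraction(0)}
PRIOR = {skill: Fraction(1, 3) for skill in ACCIDENT_PROB}  # uniform population


def posterior_after_periods(periods):
    """Map each accident count to (share of drivers, posterior over skill).

    Assumption (ours): accidents are independent across periods.
    """
    result = {}
    for accidents in range(periods + 1):
        # Joint probability of skill level and this accident count (binomial).
        joint = {
            skill: PRIOR[skill]
            * comb(periods, accidents)
            * p ** accidents
            * (1 - p) ** (periods - accidents)
            for skill, p in ACCIDENT_PROB.items()
        }
        share = sum(joint.values())
        if share > 0:
            result[accidents] = (share, {s: j / share for s, j in joint.items()})
    return result


one = posterior_after_periods(1)
two = posterior_after_periods(2)
print(one[0])  # share without accident and their posterior over L, M, H
print({count: float(share) for count, (share, _) in two.items()})
```

With one period, the accident-free group makes up 3/5 of drivers and holds a posterior of 5/9 for high skill. With two periods, the shares of drivers with zero, one and two accidents are 7/15, 4/15 and 4/15, which matches the 47% and 27% figures quoted above.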

Derivation of an alternative experimental design

The distinction of how people arrive at overconfident judgments is crucial, as Bayesian updaters are not biased in a particular direction; whether they appear over- or underconfident is simply a result of prior probabilities and signal distribution. Any claim that human beings are persistently overconfident must be based on a non-rational formation of beliefs. Overconfidence is consequently mostly modeled as over- or underreaction relative to (and thus distinct from) rational Bayesian updating (cf. e.g. Odean, 1998). Classic experiments, however, are unable to distinguish between apparent and true overconfidence.

To overcome this problem, Benoît and Dubra (2009) propose using a stronger requirement to test for overplacement. Based on their proof that at most 2x% can rationally rate themselves among the top x% of the population, they suggest using this hurdle for future experiments. This rule is unsuitable for the often used median condition and represents a very strong requirement to find overconfidence for other percentiles. For example (following this logic), more than 60% of the subjects must place themselves among the top 30% of a population before one can deduce a better-than-average effect. Although many studies observe overconfidence among their participants, it is rarely this pronounced. Even for the large levels of overconfidence observed by Svenson (1981), this rule allows overconfidence to be identified only for some intervals of his US subject sample. Furthermore, this rule applies only if people indeed use the median of their own beliefs to arrive at their rating. Another perfectly prudent way of answering such a question is to take the average of one’s beliefs; in that case almost any possible distribution of self-evaluations can be rationalized (for a proof, see again Benoît & Dubra, 2009).

The difficulty in showing true overconfidence within the traditional framework lies in the aggregation of information resulting from subjects placing themselves in one specific half, decile, quartile or other category. In the example mentioned, drivers who experienced no accident had a posterior probability of 5/9 of being of high skill, 1/3 of being of medium skill and 1/9 of being of low skill. This distributional information gets lost if one observes only a point estimate. Fig. 1 shows how a rich belief distribution containing information about probabilities for all deciles is represented by a single rating. First, subjects enjoy some discretion concerning how they determine this rating given their beliefs, i.e. how to summarize their beliefs in a single parameter. Second, and more importantly, the resultant ratings yield much less information to distinguish between true overconfidence and alternative explanations. This is why one can often only speak of “apparent overconfidence” in such situations.

Fig. 1. Aggregation of beliefs. Notes: The top panel of the figure shows a belief distribution over ten deciles that a person may hold about a skill or ability. The typical assessment of the better-than-average effect asks people to aggregate this belief distribution into a single point estimate.

The experimental setting proposed here is to ask subjects directly for the probabilities with which they would place themselves in the different quantiles (e.g. deciles). This avoids the complication of different aggregation methods and preserves the additional information coming from people’s distributional beliefs. The setup imposes clear restrictions on what is possible under rational Bayesian updating. Posterior probabilities calculated by Bayes’ law, weighted by their occurrence, must add up to the relative frequencies within the population. In a quantile framework, these real probabilities are simply defined by the chosen partition of the scale: for instance, for any decile, 10% of the population belong to that decile. To determine whether the conditional beliefs for a state A are consistent with Bayesian updating, one has to check whether

$$\sum_i P(A \mid S_i)\, P(S_i) = P(A) \tag{1}$$

where the signals $S_i$ form a disjoint partition of the universe. To make this restriction clearer, we will again refer to the driving skill example, in which 60% of the population had no accident. These people share the beliefs mentioned above: $P(H \mid \text{no accident}) = 5/9$, $P(M \mid \text{no accident}) = 1/3$, and $P(L \mid \text{no accident}) = 1/9$. Among the 40% who experienced an accident, posterior probabilities are 0 for being highly skilled, and $P(M \mid \text{accident}) = 1/3$ and $P(L \mid \text{accident}) = 2/3$ for being of medium and low skill. To translate the example into Eq. (1), signal $S_1$ corresponds to “no accident” and signal $S_2$ to “accident”. Together, these two signals describe all possible scenarios. If we plug in the different skill levels for A, we arrive at the following equations:

$$P(H \mid \text{no accident})\, P(\text{no accident}) + P(H \mid \text{accident})\, P(\text{accident}) \overset{!}{=} P(H)$$

$$P(M \mid \text{no accident})\, P(\text{no accident}) + P(M \mid \text{accident})\, P(\text{accident}) \overset{!}{=} P(M)$$

$$P(L \mid \text{no accident})\, P(\text{no accident}) + P(L \mid \text{accident})\, P(\text{accident}) \overset{!}{=} P(L)$$

We know from the given distribution of driving skill within the population that $P(H) = P(M) = P(L) = 1/3$. This provides us with three conditions that have to hold when beliefs are updated rationally. As the posterior probabilities stated above were calculated by Bayes’ law, the conditions are of course met.

Note that in the example, the signal and the probability of the signal for each group were known; this is not necessarily the case. In an experimental setting, the $P(S_i)$ are unobservable and signals may be much more complicated than the binary “accident” versus “no accident”. We treat the $S_i$ as elements of a set of possible signals $S$. One may think of these signals as idiosyncratic life-time experiences in a certain domain. We do not impose any further restrictions on these signals except for the standard assumption that signal realizations for experiment participants are randomly drawn from $S$.

We now define $K$ ability quantiles $Q_k$ to provide a common understanding of skill levels. The probability $P(Q_k)$ of falling into each quantile is evaluated conditional on the observed signal $S_i$, and subjects in an experiment will thus report $P(Q_k \mid S_i)$. Inserting this into Eq. (1) we obtain:

$$\sum_i P(Q_k \mid S_i)\, P(S_i) \overset{!}{=} P(Q_k) = \frac{1}{K} \tag{2}$$

The right-hand side of Eq. (2) is defined by the choice of the scale’s partition. Conditional probabilities $P(Q_k \mid S_i)$ may differ from $1/K$ but, weighted by the probability of the signals in the population, they must equal $P(Q_k)$. As $P(S_i)$ corresponds to the fraction of subjects observing signal $S_i$, the average reported probability for a quantile must again equal $1/K$. We arrive at $K$ equations of this form, as the condition needs to be satisfied for each quantile. In an experimental setting, concrete signals and signal probabilities are usually unknown. However, under random sampling, for the number of participants $n(S_i)$ observing each signal $S_i$ it holds that $E[n(S_i)] = P(S_i) \cdot n$; we can thus replace $P(S_i)$ by $E[n(S_i)/n]$.
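For the driving example, the restriction in Eqs. (1) and (2) can be verified exactly. The Python sketch below is a minimal illustration; the dictionary layout and the helper name `signal_weighted_beliefs` are our own choices. It weights each group's posterior beliefs by the share of the population observing that signal and compares the result with the population frequencies of 1/3.

```python
from fractions import Fraction as F

# Share of the population observing each signal in the driving example.
signal_prob = {"no accident": F(3, 5), "accident": F(2, 5)}

# Posterior beliefs P(skill | signal) held by each group.
posterior = {
    "no accident": {"H": F(5, 9), "M": F(1, 3), "L": F(1, 9)},
    "accident": {"H": F(0), "M": F(1, 3), "L": F(2, 3)},
}


def signal_weighted_beliefs(posterior, signal_prob):
    """Left-hand side of Eq. (1): sum over signals of P(A | S_i) * P(S_i)."""
    states = next(iter(posterior.values())).keys()
    return {
        state: sum(posterior[s][state] * signal_prob[s] for s in signal_prob)
        for state in states
    }


weighted = signal_weighted_beliefs(posterior, signal_prob)
# Rational updating requires each weighted belief to equal the base rate 1/3.
print(all(value == F(1, 3) for value in weighted.values()))
```

Reported beliefs that fail this identity, for example because both groups shift probability mass towards high skill, cannot stem from Bayesian updaters who share the stated prior.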

Moving the expectation operator outside the sum, we arrive at the following equation:

$$E\left[\sum_i P(Q_k \mid S_i)\, \frac{n(S_i)}{n}\right] \overset{!}{=} \frac{1}{K} \tag{3}$$

Our final simplification is to assume that $n(S_i)$ equals one, which corresponds to the notion of idiosyncratic signals; we nevertheless allow for several subjects to observe the same signal or for elements of $S$ not to be observed. In the experiments we will mostly use a decile setup. Eq. (3) then generates ten conditions of the form:

$$E\left[\frac{1}{n}\sum_{j=1}^{n} P(Q_k \mid S_j)\right] = 0.1, \qquad k = 1, \ldots, 10 \tag{4}$$

The left-hand side represents the average reported probability for each ability decile, which in expectation equals 0.1 for a population of perfect Bayesian updaters. It enables us to compare the realized average belief distribution in the experiments to the uniform benchmark distribution.

To test for true overconfidence, we will additionally rely on two conditions: first, the average reported probability mass for the upper half of the ability quantiles should not exceed 50%. This follows directly from Eq. (1) if A represents the state of “being above average”. Likewise, in the decile setup of Eq. (4), the probability of the union of the top five deciles must in expectation equal 0.5 across participants. In contrast, true overconfidence would predict that

$$\sum_{k=6}^{10} \frac{1}{n}\sum_{j=1}^{n} P(Q_k \mid S_j) > 0.5 \tag{5}$$

Of course, similar relations also exist for the top 30%, top 20% and other fractions of the scale. Since the better-than-average effect takes its name from the notion of being above average, we will mainly concentrate on (5) but report other results occasionally. As a second indicator of overconfidence, we consider the mean of the individual belief distribution means. This mean of means should correspond to the middle point of the ability scale in a population of rational updaters, which follows from the definition of the mean for individual belief distributions, $\sum_{k=1}^{K} P(Q_k \mid S_i)\, k$, and the aggregation using the signal weights:

$$\sum_i P(S_i) \sum_{k=1}^{K} P(Q_k \mid S_i)\, k = \sum_{k=1}^{K} \sum_i P(Q_k \mid S_i)\, P(S_i)\, k = \sum_{k=1}^{K} P(Q_k)\, k = \frac{K+1}{2} \tag{6}$$

The second equality uses Eq. (2) and shows that the mean of the belief distribution means must equal (K + 1)/2. For a decile scale, the mean of individual belief distribution means should thus be 5.5. True overconfidence predicts a mean of means greater than 5.5.

Experiment one

Method

Participants

Experiment one was conducted in 2008 at the University of Mannheim. A total of 68 business students completed the paper-based questionnaire; 69% of the participants were male, and the median age was 24. We excluded four participants who left substantial portions of the questionnaire blank.


Procedure

Subjects answered questions about their skills and abilities in several domains. We selected various domains of skills and abilities to reflect different levels of overconfidence. Subjects were asked about their performance as a student, their abilities in choosing investments, their ability to get along with other students, their programming skills, their sense of humor and their risk of suffering a heart attack before the age of 40. The ability to get along with other people is a domain in which people are prone to high overconfidence (Moore & Cain, 2007). The same applies to sense of humor (Kruger & Dunning, 1999). In contrast, performance as a student is regularly and objectively reported by a relatively efficient feedback mechanism (grades); thus less overconfidence is expected here. For investment abilities, we anticipate considerable overconfidence in line with the behavioral finance literature (e.g. Odean, 1998). Computer skills were previously related to underconfidence (Kruger, 1999), and the risk of a heart attack is one of many incidents where overoptimism has been observed (Weinstein, 1980).

Appendix A contains part of the questionnaire used in the experiment, which explains to subjects the meaning of the quantile scale and how to fill out the input fields. They were asked to state their probabilities for the quantiles of the scale according to their belief distribution. We alternately used a quartile and a decile scale; the decile scale has the advantage of being more precise at the expense of being more demanding to complete.

It is crucial for our analysis of belief distributions that people understood the scale and answered the question about their probabilities of falling into each quantile reasonably. They were informed by the instructions that the probabilities had to add up to one (see Appendix A). For 96% of the entries, people obeyed this rule. In the remaining cases, the sum of probabilities was almost always close to one, suggesting mistakes in calculation and not in comprehension; we nevertheless exclude these cases.

Additionally, we asked subjects for a point estimate along the quantile scale following the traditional approach of demonstrating overconfidence; they thus made both judgments as illustrated in Fig. 1. This enables us to compare the two evaluations and, most likely, infer how subjects tend to aggregate their beliefs. We varied the order of point and probability judgments, and for two domains we used a between-group design in which one group was asked only for probabilities while the other stated only a point estimate; this allows us to test for order effects and interdependencies between the two types of evaluations.

Results and discussion

Classic overconfidence

Subjects classifying themselves by point estimates into ability quantiles for the different domains corresponds to the traditional way of testing for overplacement. We can conclude whether there is apparent overconfidence or not and, more importantly, later compare whether true overconfidence shows up in the same domains. Table 1 shows the results. In line with earlier research, participants appear to be exceedingly overconfident when judging their sense of humor or their ability to get along with others. Most people see themselves above average (96% and 84%, respectively) and many place themselves in the 7th or 8th decile of the ability distribution. However, one needs to keep in mind that even this extreme case is not sufficient for proving the existence of true overconfidence; for instance, with respect to sense of humor, exactly 60% place themselves among the top 30% of the distribution, which does not violate the condition set up by Benoît and Dubra (2009). As discussed before, their requirement is too strong for the ratios typically found in overconfidence experiments.
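The comparison with the Benoît and Dubra condition can be written as a simple check. This Python sketch uses integer percentages to avoid rounding issues; the function name is hypothetical, and the rule encoded is the requirement described earlier, namely that more than 2x% of subjects must place themselves among the top x% before rational updating can be ruled out.

```python
def exceeds_rational_bound(share_pct, top_pct):
    """Return True if more than 2*x% place themselves in the top x%.

    share_pct: percentage of subjects placing themselves in the top group.
    top_pct:   size of that top group in percent (x). For x >= 50 the bound
               reaches 100%, so the rule can never be exceeded there.
    """
    return share_pct > 2 * top_pct


# Sense of humor: exactly 60% place themselves among the top 30%.
print(exceeds_rational_bound(60, 30))  # False: the 60% bound is met, not exceeded
# Median condition: 96% above the median cannot exceed the 100% bound.
print(exceeds_rational_bound(96, 50))  # False
```

The second call illustrates why the rule is unsuitable for the median condition: no observed share can exceed a bound of 100%.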

For study performance (a domain where better feedback is available), overconfidence is less pronounced, but still mean, median and percentage of participants viewing themselves to be above average are significantly greater than the corresponding neutral values. We find only slight underconfidence for programming skills and a neutral result for investing abilities. A young student population accustomed to computers may feel more competent in programming while at the same time having little financial market experience. Kruger (1999) shows that the self-assessed ability in a domain is a strong driver of overplacement. In particular Glaser et al. (2009) find higher levels of overconfidence for finance professionals than for students. The result for risk of a heart attack is again as anticipated: most participants are overoptimistic and assess their personal risk as lower compared to their peers; in fact, 75% of participants believe that their risk is below average. We emphasize that—for the purpose of this article—it is less important whether the results for each domain match precisely the expectations derived from the literature as we are primarily interested in the relationship between apparent and true overconfidence. Probability assessments The key data of our study consist of the probability assessments supplied by experiment participants. While participants exhibit many shapes of belief distributions—some skewed and others symmetric, some flat and others very steep—the distributions have in common that they are unimodal, i.e. exhibiting a single probability peak in one quantile or several adjacent quantiles sharing the same probability. We almost never observe a first peak followed by a drop in probability followed by another peak; this makes sense intuitively as one mostly feels either skillful or not for any given domain. 
The tightness of the distribution hints at how sure subjects are about their self-assessment: very often, subjects indicate a zero probability for several deciles, i.e. they are sure that they could conceivably fall only into a certain range of the scale. When we ask for the whole distribution of beliefs, it is no longer possible for a rational population to be predominantly above average, in the sense that neither the mean of the individual distribution means nor a major part of the probability mass of the distributions can be significantly above average. In a decile framework, the mean of individual distribution means must be at 5.5, in a quartile setup it must lie at 2.5; the aggregated probability mass must be split equally between the lower and upper half of the quantiles. In fact, in a population of true Bayesians, Eq. (2) has to hold in expectation for every quantile. This is far more restrictive than the direct assessment analyzed before, where people only indicated the quantile of the distribution to which they believed themselves to belong. The results for the probability assessment in Table 2 appear similar to those of Table 1. We again find strong overconfidence for sense of humor and the ability to get along with others, and somewhat weaker overconfidence for study performance, all significant at the 1% level (t-test). The rightmost column of Table 2 refers directly to Eq. (5). Values significantly above 50% indicate true overconfidence (overplacement). Overplacement is found in all expected domains with the exception of “financial market investment”. Analogously, underplacement can be diagnosed for programming skills and risk of a heart attack (overoptimism). We did not find any order effects, neither for the order of domains nor for the order of probability estimate and point estimate. 
The between-subject domains (“sense of humor” and “programming skills”) reveal that the degree of overplacement is similar between point estimates and probability estimates even if compared across groups. To test whether the elicited probabilities coincide approximately with the rational benchmark (and thus might have been


Table 1
Point estimates of own skills and virtues on quantile scales (experiment one).

| Domain | Expectation | Scale | n | Mean | Median | % Above avg. |
|---|---|---|---|---|---|---|
| Study performance | Overplacement (>5.5) | Deciles | 64 | 6.39*** | 6*** | 78.1*** |
| Financial markets | Overplacement (>2.5) | Quartiles | 61 | 2.56 | 3 | 54.1 |
| Sense of humor | Overplacement (>5.5) | Deciles | 25 | 7.80*** | 8*** | 96.0*** |
| Programming skills | Underplacement (<2.5) | Quartiles | 25 | 2.32 | 2 | 32.0* |
| Getting along with others | Overplacement (>5.5) | Deciles | 64 | 7.08*** | 7*** | 84.4*** |
| Risk of heart attack | Overoptimism (<2.5) | Quartiles | 63 | 1.83*** | 2*** | 25.4*** |

Notes: The table shows the tested skill domains, the experimental expectation (including its numerical meaning), and the partition of the scale used for each domain in experiment one. It contains number of observations, mean, median and percentage of subjects that placed themselves above average. For the decile scale the midpoint is 5.5, for the quartile scale 2.5. We use a two-tailed t-test (mean), Wilcoxon signed-rank test (median), and binomial probability test (percentage above average). * p < .1. ** p < .05. *** p < .01.

Table 2
Population averages of probability assessments for skills and virtues (experiment one).

| Domain | Scale | n | Mean of distr. means | Total probability mass above average (%) |
|---|---|---|---|---|
| Study performance | Deciles | 64 | 6.20*** | 69.0*** |
| Financial markets | Quartiles | 63 | 2.51 | 50.0 |
| Sense of humor | Deciles | 39 | 7.08*** | 80.6*** |
| Programming skills | Quartiles | 38 | 2.15*** | 36.2*** |
| Getting along with others | Deciles | 64 | 7.08*** | 80.6*** |
| Risk of heart attack | Quartiles | 64 | 2.02*** | 31.0*** |

Notes: The table shows the tested skill domains and scale used for each domain in experiment one. It contains number of observations, mean of individual distribution means, and the total probability mass above average in %. For the decile scale the midpoint is 5.5, for the quartile scale 2.5. Two-sided t-test: * p < .1. ** p < .05. *** p < .01.
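To see why a rational population cannot be above average in aggregate, the following sketch works through a small exact calculation. It assumes a uniform prior over ten deciles and a hypothetical binary signal whose chance of being positive rises linearly with the true decile (t/11); neither this signal structure nor the function names come from the article. The signal-weighted average of the posteriors returns to the uniform prior, so the mean of distribution means is 5.5 and half of the probability mass lies above average, which are the neutral values against which Table 2 is tested.

```python
from fractions import Fraction

DECILES = range(1, 11)
PRIOR = {t: Fraction(1, 10) for t in DECILES}  # neutral prior over deciles


def p_positive(t):
    """Hypothetical likelihood of a positive signal for true decile t (assumption)."""
    return Fraction(t, 11)


def posterior(signal_positive):
    """Bayes' rule: returns (posterior over deciles, probability of the signal)."""
    joint = {
        t: PRIOR[t] * (p_positive(t) if signal_positive else 1 - p_positive(t))
        for t in DECILES
    }
    p_signal = sum(joint.values())
    return {t: joint[t] / p_signal for t in DECILES}, p_signal


def mean_decile(dist):
    """Expected decile under a belief distribution."""
    return sum(t * p for t, p in dist.items())


post_pos, w_pos = posterior(True)
post_neg, w_neg = posterior(False)

# Population average of beliefs, weighting each signal group by its share.
average_belief = {t: w_pos * post_pos[t] + w_neg * post_neg[t] for t in DECILES}
mass_above_average = sum(average_belief[t] for t in DECILES if t > 5)

print(float(mean_decile(post_pos)), float(mean_decile(post_neg)))
print(float(mean_decile(average_belief)), float(mass_above_average))
```

In this illustration, individuals with a positive signal rationally expect a higher decile than those with a negative signal, yet the population average stays at the neutral benchmark.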

Table 3
Tests for compatibility of average belief distributions with the prediction of rational information processing (experiment one).

| Domain | p-value, χ²-test | p-value, KS-test |
|---|---|---|
| Study performance | .000 | .002 |
| Financial markets | .005 | .351 |
| Sense of humor | .000 | .001 |
| Programming skills | .041 | .211 |
| Getting along with others | .000 | .000 |
| Risk of heart attack | .004 | .009 |

Notes: The table reports two tests of whether the average probabilities submitted by experiment participants correspond to the theoretical prediction of Bayesian updating. It shows the p-values of a chi-square test with nine degrees of freedom (deciles) and three degrees of freedom (quartiles) and the p-values of a Kolmogorov–Smirnov test for all domains of experiment one.

Limitations of experiment one It has been argued that ambiguity is a problem in questions concerning skills and virtues (Dunning et al., 1989; Van den Steen, 2004). People might interpret the skill in question differently, consequently allowing everyone to rightfully reach the conclusion that they are above average with respect to their subjective definition of the skill in question. There also were no monetary incentives in experiment one, primarily because a convincing incentive scheme was not available. However, it has been demonstrated that behavioral biases may disappear with proper incentivization, although evidence in psychological and economic experiments is mixed (Camerer & Hogarth, 1999; Hertwig & Ortman, 2001). To account for these possibilities, we design a second experiment in which subjects have to evaluate their performance in incentivized laboratory tasks. Experiment two

derived by Bayesian updating), we use a chi-square goodness-of-fit test and a Kolmogorov–Smirnov test for equality of distributions. Table 3 shows the p-values of these tests for the skills and abilities used in experiment one. For four domains, both tests reject rational information processing at the 1% level; for the remaining two domains, at least the chi-square test is significant.² The generally pronounced asymmetry of the average belief distribution cannot be reconciled with the normative result we derived before. While in the theoretical driving skill example optimistic assessments of those who received a positive signal were counterbalanced by the beliefs of those with a negative signal, this does not seem to happen in the experiment. We will analyze individual belief distributions in more detail in experiment two.

² The differences between the two tests arise from the fact that the chi-square test penalizes any deviation from the distribution, while the Kolmogorov–Smirnov test is sensitive to deviations in the cumulative distribution function.
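A minimal sketch of the two test statistics follows. The article does not spell out how the statistics were formed from the elicited probabilities, so the construction here (treating n times the average probability as an observed count per decile, and comparing cumulative averages with the uniform cumulative distribution) is an assumption, and the example distribution and sample size are made up for illustration. It shows the contrast described in the footnote: the chi-square statistic adds up deviations in every decile, whereas the Kolmogorov–Smirnov distance looks only at the largest cumulative gap.

```python
from itertools import accumulate


def chi_square_statistic(avg_probs, n):
    """Pearson statistic against a uniform benchmark with n / k expected per bin."""
    k = len(avg_probs)
    expected = n / k
    return sum((n * p - expected) ** 2 / expected for p in avg_probs)


def ks_distance(avg_probs):
    """Largest absolute gap between cumulative average beliefs and the uniform CDF."""
    k = len(avg_probs)
    cumulative = accumulate(avg_probs)
    return max(abs(c - (i + 1) / k) for i, c in enumerate(cumulative))


# Hypothetical average belief distribution over deciles (worst to best); n is made up.
example = [0.02, 0.03, 0.05, 0.08, 0.12, 0.15, 0.18, 0.17, 0.12, 0.08]
chi2 = chi_square_statistic(example, n=64)
ks = ks_distance(example)

# 21.666 is the 1% critical value of a chi-square distribution with 9 degrees of freedom.
print(round(chi2, 3), chi2 > 21.666, round(ks, 2))
```

Turning these statistics into p-values requires the corresponding reference distributions, which are left out here to keep the sketch free of external dependencies.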

Method Participants Experiment two took place at the University of Mannheim in 2010. 50 students of various faculties (50% business and economics) were recruited via an online recruitment system for economic experiments (ORSEE; Greiner, 2004). 48% of the participants were male, the median age was 24. Experiment two was computerbased and programmed in z-tree (Fischbacher, 2007). Procedure In this experiment we elicited probability assessments for four tasks conducted in the laboratory. We chose tests for intelligence, memory, creativity, and general knowledge as tasks for the experiment (see Appendix B); these domains should represent meaningful and desirable qualities for our subjects. Trait desirability often goes along with self-serving ability assessments (Alicke, 1985); in particular, Burks, Carpenter, Goette,


Table 4
Average probabilities assigned to deciles (experiment two).

| Domain | 1 (worst 10%) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 (best 10%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Intelligence | .014 (.061) | .020 (.061) | .045 (.105) | .073 (.094) | .120 (.127) | .140 (.126) | .189 (.152) | .185 (.147) | .129 (.146) | .085 (.165) |
| Memory | .016 (.065) | .023 (.074) | .014 (.049) | .025 (.055) | .076 (.131) | .115 (.135) | .167 (.149) | .200 (.132) | .189 (.169) | .176 (.245) |
| Creativity | .056 (.130) | .084 (.135) | .088 (.119) | .115 (.125) | .125 (.113) | .147 (.128) | .155 (.153) | .132 (.135) | .058 (.101) | .040 (.082) |
| Knowledge | .033 (.083) | .071 (.131) | .078 (.135) | .062 (.093) | .073 (.096) | .107 (.129) | .152 (.159) | .188 (.167) | .153 (.153) | .085 (.162) |

Notes: The table shows the average probabilities assigned to the deciles of the ability scale for the four tasks of experiment two. Standard deviations are in parentheses.
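The decile averages in Table 4 can be condensed into the two summary statistics used throughout the paper. The sketch below, with hypothetical function names, computes the mean of an average belief distribution and its probability mass above the scale midpoint of 5.5. Because averaging is linear, the mean of the average distribution equals the mean of the individual distribution means; the table values are rounded to three decimals, so results can differ from the published figures in the last digit.

```python
# Average probabilities per decile (worst 10% to best 10%), copied from Table 4.
TABLE_4 = {
    "Intelligence": [.014, .020, .045, .073, .120, .140, .189, .185, .129, .085],
    "Memory": [.016, .023, .014, .025, .076, .115, .167, .200, .189, .176],
    "Creativity": [.056, .084, .088, .115, .125, .147, .155, .132, .058, .040],
    "Knowledge": [.033, .071, .078, .062, .073, .107, .152, .188, .153, .085],
}


def distribution_mean(probs):
    """Expected decile (1-based) under the given probabilities."""
    return sum(rank * p for rank, p in enumerate(probs, start=1))


def mass_above_average(probs):
    """Probability assigned to the upper half of the scale (deciles 6 to 10)."""
    half = len(probs) // 2
    return sum(probs[half:])


summary = {
    task: (distribution_mean(p), mass_above_average(p))
    for task, p in TABLE_4.items()
}
for task, (mean, mass) in summary.items():
    print(f"{task}: mean decile {mean:.2f}, mass above average {100 * mass:.1f}%")
```

For the intelligence task this gives a mean decile of roughly 6.7 and about 72.8% of the probability mass above average, which matches the corresponding summary row for experiment two up to rounding.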

Table 5
Population averages of probability assessments in experimental tasks (experiment two).

| Domain | Scale | n | Mean of distr. means | Total probability mass above average (%) | Proportion of correct responses (%) |
|---|---|---|---|---|---|
| Intelligence | Deciles | 48 | 6.74*** | 72.8*** | 69.5 |
| Memory | Deciles | 48 | 7.50*** | 84.7*** | 85.9 |
| Creativity | Deciles | 48 | 5.52 | 53.2 | 32.8 |
| Knowledge | Deciles | 48 | 6.45*** | 68.4*** | 67.8 |

Notes: The table shows the experimental tasks and the scale used for each domain of experiment two. It contains number of observations, the mean of individual distribution means, the total probability mass above average in %, and the proportion of correct responses for each task. For the decile scale the midpoint is 5.5. Two-sided t-test: * p < .1. ** p < .05. *** p < .01.

and Rustichini (2009) find overplacement in an IQ test and Moore and Healy (2008) demonstrate a similar effect in knowledge tests. It has been further shown that in social comparisons, easy tasks typically produce more pronounced overplacement than do difficult tasks (Larrick et al., 2007; Moore & Healy, 2008), as people seem to focus on their own result and do not fully account for the task being easy or difficult for most of the other participants as well.3 We thus expect true overconfidence of subjects for the test domains, probably moderated by increasing task difficulty. As subjects had given appropriate responses in the more precise decile framework in experiment one, we used solely this design in experiment two. The wording of the experimental instructions remained the same (see Appendix A). It was automatically checked whether probabilities summed to one and subjects were prompted to correct their entries if not. Two participants repeatedly failed to correct their answers and were excluded from the analysis. Participants completed tasks prior to their evaluation of probabilities, implying that at the time they had to make their judgments, there was little ambiguity about which performance they should evaluate. We used a quadratic scoring rule to incentivize subjects (Selten, 1998), as a quadratic scoring rule makes it optimal for (risk-neutral) subjects to submit their true belief distributions. If anything, risk-averse subjects would bias their response to a more uniform distribution which would counteract our results. In overconfidence research, Moore and Healy (2008) apply the quadratic scoring rule in a similar experimental design. Participants were told that tied scores would be resolved by chance. Results and discussion Probability assessments Participants find it most likely that they rank between the sixth and ninth decile for the tasks in experiment two. 
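The incentive property of the quadratic scoring rule can be checked numerically. The sketch below uses one common form of the rule, 2·r_k − Σ r_i² for a report r when decile k is realized; the payoff scaling actually used in the experiment is not given in this section, so this form, the function names, and the example beliefs are assumptions. For a risk-neutral subject with true belief q, the expected score of any report r falls short of truthful reporting by exactly the squared distance between r and q.

```python
def quadratic_score(report, realized):
    """Score for a reported distribution when the bin with index `realized` occurs."""
    return 2 * report[realized] - sum(r * r for r in report)


def expected_score(belief, report):
    """Expected score of `report` under the subject's true `belief`."""
    return sum(q * quadratic_score(report, k) for k, q in enumerate(belief))


def squared_distance(a, b):
    """Sum of squared differences between two distributions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical true belief over deciles and two alternative reports.
belief = [0.0, 0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.15, 0.10, 0.0]
uniform = [0.1] * 10
overstated = [0.0, 0.0, 0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.15, 0.10]

truthful = expected_score(belief, belief)
for name, report in [("uniform", uniform), ("overstated", overstated)]:
    loss = truthful - expected_score(belief, report)
    print(name, round(loss, 4), round(squared_distance(belief, report), 4))
```

Both printed columns agree for each report, which illustrates that any deviation from the true belief, whether toward a flatter or a more optimistic distribution, lowers the expected payoff.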
³ This finding is a reversed form of the classic hard-easy effect (Lichtenstein et al., 1982; Juslin, Winman, & Olsson, 2000).

Table 4 shows the average probabilities that participants stated for the ability scale used. Relatively few subjects believe their performance to be in the very top decile compared to their peers. While the seventh and eighth decile are the most popular choice (with average probabilities assigned to these deciles mostly exceeding 0.15), participants submit very low probabilities for the bottom deciles, sometimes as low as between 0.01 and 0.03 for the domains of intelligence and memory. Kruger and Dunning (1999) show that for many tasks, very few people believe that they perform very badly compared to their peers. We add that people do not even find it probable that they could be bad. Table 5 generalizes these results to the statistics of the distribution that we are especially interested in. In three out of four domains, we find significant overplacement of participants measured by both the mean of the distribution means and the average probability mass above the middle point of the scale. The extent of overplacement is comparable to the untested abilities of experiment one. Ambiguity may have contributed to overconfidence in domains such as “sense of humor” (cp. Dunning et al., 1989), but true overconfidence is present also in controlled tasks with little interpretational flexibility. Extending the analysis to other thresholds than the average or middle point of the quantile scale reveals patterns which were already suggested by the descriptive statistics. For the top 40%, we still find overplacement similar to the presented results for being above average. For high quantiles, however, overconfidence becomes markedly weaker or even disappears altogether. Only in one domain of experiment two (memory test), overplacement is still significant for the top 20%; the better-than-average effect thus appears to be a “slightly-better-than-average effect”. 
The pattern of overplacement found is in line with the reversed hard-easy effect for relative judgments (Larrick et al., 2007; Moore & Healy, 2008). The right column of Table 5 shows the proportion of correct answers given in the tasks. With only 32.8% correct responses, the creativity test clearly qualifies as hard, and we find no significant overplacement; on the other hand, overplacement

Table 6
Test for compatibility of average belief distributions with the prediction of rational information processing (experiment two).

| Domain | p-value, χ²-test | p-value, KS-test |
|---|---|---|
| Intelligence | .000 | .002 |
| Memory | .000 | .000 |
| Creativity | .013 | .343 |
| Knowledge | .007 | .033 |

Notes: The table reports two tests of whether the average probabilities submitted by experiment participants correspond to the theoretical prediction of Bayesian updating. It shows the p-values of a chi-square test with nine degrees of freedom (deciles) and three degrees of freedom (quartiles) and the p-values of a Kolmogorov–Smirnov test for all domains of experiment two.

is most pronounced in the easiest task (memory). We explain this finding by the egocentric nature of relative judgments (Kruger, 1999; Moore & Cain, 2007): subjects react more strongly to variations in their own performance than to possible variations in the performance of other participants. We again test whether the submitted probabilities coincide approximately with the rational benchmark, and hence might have been derived by Bayesian updating. Table 6 shows the p-values of the chi-square test and of the Kolmogorov–Smirnov test for the tasks used in experiment two. For three out of four domains, rational information processing can clearly be rejected. The asymmetry of belief distributions already visible in Table 4 is again not compatible with the uniform distribution postulated by Bayesian updating. The statistical tests suggest that deviations are too strong to be a result of imperfect sampling of experiment participants alone, as this type of randomness should be small in magnitude and not systematic in a manner observed in the presented results. Overconfidence on an individual level Besides population averages, the controlled tasks allow us to test for overconfidence on an individual level. In the probability framework, participants provide a range of deciles they believe to be possible for themselves with different probabilities. If the actual result is below or above this range, this is far stronger evidence for over- or underplacement than if it were to fall short of—or exceed— a point estimate. Table 7 shows the fraction of participants who ranked below their worst expectation or above their best expectation. For the domains of intelligence, memory, and general knowledge, about a third of the subjects end up in a decile below all of the deciles they had assigned a probability greater than zero, i.e. they perform worse than they had even considered possible. 
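The individual-level classification can be sketched as follows. Worst and best expectations are taken as the lowest and highest decile with a strictly positive stated probability, as in the notes to Table 7; the function name, the example beliefs, and the realized ranks are hypothetical.

```python
def classify_outcome(belief, actual_decile):
    """Compare a realized decile (1-10) with the support and mean of a stated belief."""
    support = [rank for rank, p in enumerate(belief, start=1) if p > 0]
    worst, best = min(support), max(support)
    mean = sum(rank * p for rank, p in enumerate(belief, start=1))
    return {
        "below_worst": actual_decile < worst,
        "below_mean": actual_decile < mean,
        "above_best": actual_decile > best,
    }


# Hypothetical subject who rules out the bottom four deciles and the top decile.
belief = [0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0]

print(classify_outcome(belief, actual_decile=3))
print(classify_outcome(belief, actual_decile=6))
print(classify_outcome(belief, actual_decile=10))
```

A realized rank in decile 3 falls below every decile this subject considered possible, which is the stronger form of overplacement discussed here; a rank in decile 6 only falls short of the mean expectation.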
The other extreme—reaching a decile above one’s best expectation— happens far less often (between 2% and 10%). Except for the domain of creativity, the difference between the two proportions is strongly significant (z-test). This impression of asymmetry is backed by the fraction of subjects reaching a decile below their mean expectation. It is rather common for subjects to fall short of their mean expectation, especially for the intelligence and memory test. We have previously in parts explained the failure of average belief distributions to match the rational benchmark by the pronounced overplacement of unskilled participants (cp. Ehrlinger, Johnson, Banner, Dunning, & Kruger, 2008; Kruger & Dunning, 1999): those who receive a negative signal should adjust their probabilities accordingly, and (as in the driving skill example) should hence submit high probabilities for low quantiles. We thus now examine the belief distributions of two specific groups, namely the skilled and the unskilled, where we define the groups as those who finish in the top three and bottom three deciles in


each task, respectively. We assume that participants hold neutral priors before they enter the tasks.4 A good or bad performance in the task should then inflate their subjective probabilities of falling into low or high quantiles, at least if subjects interpret their task performance correctly and update their beliefs in a rational manner. However, Table 8 shows that unskilled participants recognize their negative signal only partially: with the exception of the memory task, they understand that they are less likely to reach the top 30% but only slightly and occasionally increase their probability for the bottom 30%. For two tasks, the intelligence and memory test, they state even smaller probabilities than the neutral prior probability of .3 for the bottom 30%. Skilled participants seem to react more strongly to their positive signal: while they assign high probabilities to the top 30%, they regard the bottom 30% as almost impossible. This asymmetry in signal processing is one major cause for the disparity between the belief distributions and the rational benchmark. The right column of Table 8 displays the average probability subjects assign to the correct decile, i.e. the decile they actually fall into according to their task performance. With the exception of creativity, skilled participants here submit higher probabilities and are thus better able to recognize their true skill level; they consequently earn more under the quadratic scoring rule regime. This finding is consistent with the idea that poor performers also lack metacognitive skills (Ehrlinger et al., 2008). While part of the result may be due to a regression effect (cp. Burson, Larrick, & Klayman, 2006), it cannot explain the asymmetry we observe between the judgments of unskilled and skilled participants. General discussion We propose a new methodology to measure overconfidence. 
Experiment participants evaluate their relative position within the population for different skills and tasks by stating their complete corresponding belief distribution; they provide probability estimates for each decile or quartile instead of a single point estimate. This approach avoids many problems that were shown to be detrimental to previous research in overconfidence. Belief distributions yield clear restrictions as to what is possible for a population of rational Bayesian updaters.

There is considerable overplacement in the belief distributions of experiment participants. Probability estimates closely resemble results based on traditionally used point estimates. Population averages for different characteristics of belief distributions are inconsistent with Bayesian updating. Participants on average state high probabilities for quantiles above average while they regard it as unlikely that they should fall into the bottom quantiles. Because of this pattern, the aggregated belief distribution fails to match the rational benchmark. Individual level results confirm these observations, with people often underperforming even their worst expectations. Overplacement is particularly pronounced for unskilled participants who apparently do not fully account for the negative signals they receive.

⁴ This is a conservative assumption; if anything, experience should already have shifted the skilled and unskilled towards more realistic priors.

Causes of true overconfidence

We believe that motivational and non-motivational factors account for the existence of overconfidence (and in particular overplacement). It has been argued that positive illusions contribute to mental health and well-being (Taylor & Brown, 1988). They foster self-esteem and enhance the motivation to act (Bénabou & Tirole, 2002). Among the non-motivational factors, selective


Table 7
Individual overplacement in experimental tasks (experiment two).

| Domain | Ranked below worst expectation (% of subjects) | Ranked below mean expectation (% of subjects) | Ranked above best expectation (% of subjects) |
|---|---|---|---|
| Intelligence | 27.1 | 70.8 | 10.4** |
| Memory | 37.5 | 70.8 | 2.1*** |
| Creativity | 10.4 | 45.8 | 8.3 |
| Knowledge | 31.2 | 56.3 | 4.2*** |

Notes: The table shows the proportion of subjects for which their actual decile rank is below their worst expectations, below their mean expectations, and above their best expectations in experiment two. Worst (best) expectations are defined as the lowest (highest) decile for which subjects submit a probability > 0. Asterisks stand for significant differences between the proportion below worst expectation and above best expectation using a two-sample z-test of proportion. * p < .1. ** p < .05. *** p < .01.

recruitment of information, focalism, and egocentrism have been put forward (cp. Alicke & Govorun, 2005). As discussed before, ambiguity, desirability, and controllability of the judgment item moderate the degree of overplacement. We favor these explanations as they examine the psychological roots of the phenomenon and seem more plausible than a logically rigorous but less realistic model. Healy and Moore (2007) provide such a rational explanation for the occurrence of the reversed hard-easy effect, which we observe in experiment two. In their model, people hold incorrect prior beliefs but then update these beliefs rationally. If subjects perform better or worse than their expectation, they will attribute this partly to chance and partly to their ability; we cannot fully exclude this possibility. However, we use tests that relate to abilities and virtues such as memory or creativity for which subjects should have more accurate prior beliefs (compared to the trivia quizzes used in Healy & Moore, 2007). To additionally reduce surprise potential in these tasks, we mostly use questions of a type that people may have seen before, e.g. typical IQ-test questions. In the postexperiment questionnaire, subjects rate the tasks according to their perceived reliability to test for the ability in question. Predominantly high ratings support the impression that the test content was in line with the expectation of participants.

Remaining caveats

A caveat to the proposed methodology is that participants may have difficulties with meaningfully completing the probability evaluation. People possess underlying beliefs about their skills but may not be able to express them in a probability distribution. We tried to address this concern by a careful analysis of what subjects actually do in the experiment: individual responses seem reasonable (as submitted belief distributions are unimodal without jumps or breaks), but this is of course only indicative evidence. Additionally, the incentive scheme in experiment two should motivate subjects to represent their true beliefs as closely as possible. We further do not measure priors directly in experiment two and also cannot observe the signals participants receive from having completed the tasks. It is thus hard to determine precisely at what point during the information processing procedure the biases occur. However, in combination with experiment one (which elicits unconditional beliefs in several domains) our impression is that both priors and interpretation of signals are biased.

Implications

Overconfidence is among the behavioral biases most readily adopted by academic researchers in economics and finance. In the literature, it is related to excessive trading volume (Barber & Odean, 2000; Glaser & Weber, 2007; Odean, 1998), to the emergence of stock market bubbles (Scheinkman & Xiong, 2003; Shiller, 2002), to corporate investment decisions (Gervais, Heaton, & Odean, 2003; Malmendier & Tate, 2005), and to the predictability of market returns (Daniel, Hirshleifer, & Subrahmanyam, 1998). Most of these articles take overconfidence as a given result from psychology and not as a subject of further scrutiny. For instance, Odean (1998) states that “a substantial literature in cognitive psychology establishes that people are usually overconfident and, specifically, that they are overconfident about the precision of their knowledge (p. 1888).” Some caution seems to be appropriate here: whereas excessive trading for instance is an observed reality, its link to overconfidence is established only on argumentative grounds; it relies on the existence of overconfidence as a robust feature of human behavior. Consequently, if the existence of overconfidence is challenged in psychology, this will directly affect the mentioned research in economics and finance. Alternative explanations appear less compelling in many situations, thus without overconfidence these results lose much of their appeal. On that account, our findings contribute to behavioral explanations built on overconfidence remaining intact. They still might inspire some research to directly relate behavioral phenomena to economic reality.

Conclusion

The evidence collected suggests that the theoretically valid criticism of Benoît and Dubra (2009) has only little practical

Table 8
Probability assessment of skilled and unskilled participants (experiment two).

| Domain | Skill level | Estimated probability to be in top 30% | Estimated probability to be in bottom 30% | Estimated probability for actual decile |
|---|---|---|---|---|
| Intelligence | Skilled | .54** | .04*** | .20** |
| Intelligence | Unskilled | .20 | .19 | .06* |
| Memory | Skilled | .83*** | .00*** | .33*** |
| Memory | Unskilled | .39 | .07*** | .04** |
| Creativity | Skilled | .42 | .05*** | .15 |
| Creativity | Unskilled | .12** | .46* | .18* |
| Knowledge | Skilled | .65*** | .03*** | .22** |
| Knowledge | Unskilled | .17 | .42* | .08 |

Notes: The table shows the average estimated probabilities of skilled and unskilled participants for different ranges of the decile scale in experiment two. We test for deviations from neutral prior probability using a Wilcoxon signed-rank test. * p < .1. ** p < .05. *** p < .01.


consequences for overconfidence research. In general, apparent overconfidence represents underlying true overconfidence which is reflected in belief distributions. It is not necessary to discard the literature on the better-than-average effect or to redo the entire research with a methodology that is robust against this objection. For future research, scientists may want to adopt a design like ours to avoid potential concerns.

Appendices A and B. Supplementary material

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.obhdp.2011.07.004.

References

Alicke, M. D. (1985). Global self-evaluation as determined by the desirability and controllability of trait adjectives. Journal of Personality and Social Psychology, 49, 1621–1630.
Alicke, M. D., & Govorun, O. (2005). The better-than-average effect. In M. D. Alicke, D. A. Dunning, & J. I. Krueger (Eds.), The self in social judgment (pp. 83–106). New York, NY: Psychology Press.
Alicke, M. D., Klotz, M. L., Breitenbecher, D. L., Yurak, T. J., & Vredenburg, D. S. (1995). Personal contact, individuation, and the better-than-average effect. Journal of Personality and Social Psychology, 68, 804–825.
Alpert, M., & Raiffa, H. (1982). A progress report on the training of probability assessors. In A. Tversky & D. Kahneman (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 294–305). New York, NY: Cambridge University Press.
Barber, B. M., & Odean, T. (2000). Trading is hazardous to your wealth: The common stock investment performance of individual investors. The Journal of Finance, 55, 773–806.
Bénabou, R., & Tirole, J. (2002). Self-confidence and personal motivation. The Quarterly Journal of Economics, 117, 871–915.
Benoît, J., & Dubra, J. (2009). Overconfidence? Working paper. SSRN.
Bradley, G. W. (1978). Self-serving biases in the attribution process: A reexamination of the fact or fiction question. Journal of Personality and Social Psychology, 36, 56–71.
Brocas, I., & Carrillo, J. D. (2002). Are we all better drivers than average? Self-perception and biased behavior. CEPR discussion paper No. 3603.
Burks, S. V., Carpenter, J. P., Goette, L., & Rustichini, A. (2009). Is overconfidence a judgment bias? Theory and evidence. Working paper.
Burson, K. A., Larrick, R. P., & Klayman, J. (2006). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality and Social Psychology, 90, 60–77.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital–labor–production framework. Journal of Risk and Uncertainty, 19, 7–42.
Compte, O., & Postlewaite, A. (2004). Confidence-enhanced performance. The American Economic Review, 94, 1536–1557.
Cooper, A. C., Woo, C. Y., & Dunkelberg, W. C. (1988). Entrepreneurs’ perceived chances for success. Journal of Business Venturing, 3, 97–108.
Daniel, K., Hirshleifer, D., & Subrahmanyam, A. (1998). Investor psychology and security market under- and overreactions. The Journal of Finance, 53, 1839–1885.
Dawes, R. M., & Mulford, M. (1996). The false consensus effect and overconfidence: Flaws in judgment or flaws in how we study judgment? Organizational Behavior and Human Decision Processes, 65, 201–211.
Dunning, D., Meyerowitz, J. A., & Holzberg, A. D. (1989). Ambiguity and self-evaluation: The role of idiosyncratic trait definitions in self-serving assessments of ability. Journal of Personality and Social Psychology, 57, 1082–1090.
Ehrlinger, J., Johnson, K., Banner, M., Dunning, D., & Kruger, J. (2008). Why the unskilled are unaware: Further explorations of (absent) self-insight among the incompetent. Organizational Behavior and Human Decision Processes, 105, 98–121.
Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, 101, 519–527.
Fischbacher, U. (2007). Z-tree – Zurich toolbox for readymade economic experiments. Experimental Economics, 10, 171–178.
Gervais, S., Heaton, J. B., & Odean, T. (2003). Overconfidence, investment policy, and executive stock options. Rodney L. White Center for Financial Research Working paper no. 15-02. SSRN.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond heuristics and biases. European Review of Social Psychology, 2, 83–115.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98, 506–528.
Glaser, M., Langer, T., & Weber, M. (2009). Overconfidence of professionals and laymen: individual differences within and between tasks? Working paper.
Glaser, M., & Weber, M. (2007). Overconfidence and trading volume. The GENEVA Risk and Insurance Review, 32, 1–36.
Glaser, M., & Weber, M. (2010). Overconfidence. In H. K. Baker & J. Nofsinger (Eds.), Behavioral finance - Investors, corporations, and markets (pp. 241–258). Hoboken, NJ: John Wiley & Sons.
Greiner, B. (2004). An online recruitment system for economic experiments. In K. Kremer & V. Macho (Eds.), Forschung und wissenschaftliches Rechnen 2003 (pp. 79–93). Göttingen: Ges. für Wiss. Datenverarbeitung.
Healy, P. J., & Moore, D. A. (2007). Bayesian overconfidence. Working paper. SSRN.
Hertwig, R., & Ortman, A. (2001). Experimental practices in economics: A methodological challenge for psychologists? Behavioral and Brain Sciences, 24, 383–403.
Juslin, P. (1994). The overconfidence phenomenon as a consequence of informal experimenter-guided selection of almanac items. Organizational Behavior and Human Decision Processes, 57, 226–246.
Juslin, P., Winman, A., & Olsson, H. (2000). Naive empiricism and dogmatism in confidence research: A critical examination of the hard-easy effect. Psychological Review, 107, 384–396.
Klayman, J., Soll, J. B., González-Vallejo, C., & Barlas, S. (1999). Overconfidence: It depends on how, what, and whom you ask. Organizational Behavior and Human Decision Processes, 79, 216–247.
Köszegi, B. (2006). Ego utility, overconfidence, and task choice. Journal of the European Economic Association, 4, 673–707.
Kruger, J. (1999). Lake Wobegon be gone! The below average effect and the egocentric nature of comparative ability judgements. Journal of Personality and Social Psychology, 77, 221–232.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134.
Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311–328.
Larrick, R. P., Burson, K. A., & Soll, J. B. (2007). Social comparison and confidence: When thinking you’re better than average predicts overconfidence (and when it does not). Organizational Behavior and Human Decision Processes, 102, 76–94.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state of the art to 1980. In A. Tversky & D. Kahneman (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 306–351). New York, NY: Cambridge University Press.
Mahar, H. (2003). Why are there so few prenuptial agreements? Harvard Law School John M. Olin Center for Law, Economics and Business. Discussion paper No. 436.
Malmendier, U., & Tate, G. (2005). CEO overconfidence and corporate investment. The Journal of Finance, 60, 2661–2700.
Moore, D. A., & Cain, D. M. (2007). Overconfidence and underconfidence: When and why people underestimate (and overestimate) the competition. Organizational Behavior and Human Decision Processes, 103, 197–213.
Moore, D. A., & Healy, P. J. (2008). The trouble with overconfidence. Psychological Review, 115, 502–517.
Odean, T. (1998). Volume, volatility, price, and profit when all traders are above average. The Journal of Finance, 53, 1887–1934.
Preston, C. E., & Harris, S. (1965). Psychology of drivers in traffic accidents. Journal of Applied Psychology, 49, 284–288.
Russo, J. E., & Schoemaker, P. J. H. (1992). Managing overconfidence. Sloan Management Review, 33, 7–17.
Santos-Pinto, L., & Sobel, J. (2005). A model of positive self-image in subjective assessments. The American Economic Review, 95, 1386–1402.
Scheinkman, J. A., & Xiong, W. (2003). Overconfidence and speculative bubbles. Journal of Political Economy, 111, 1183–1219.
Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1, 43–62.
Shiller, R. J. (2002). Bubbles, human judgment and expert opinion. The Financial Analysts Journal, 58, 18–26.
Svenson, O. (1981). Are we all less risky and more skillful than our fellow drivers? Acta Psychologica, 94, 143–148.
Taylor, S. E., & Brown, J. D. (1988). Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin, 103, 193–210.
Van den Steen, E. (2004). Rational overoptimism (and other biases). The American Economic Review, 94, 1141–1151.
Weinstein, N. D. (1980). Unrealistic optimism about future life events. Journal of Personality and Social Psychology, 39, 806–820.
Zábojník, J. (2004). A model of rational bias in self-assessments. Economic Theory, 23, 259–282.
Zuckerman, M. (1979). Attribution of success and failure revisited, or: The motivational bias is alive and well in attribution theory. Journal of Personality, 47, 245–287.