Journal of Economic Behavior & Organization Vol. 41 (2000) 277–297

Do non-expected utility choice patterns spring from hazy preferences? An experimental study of choice ‘errors’

D.J. Butler
Department of Economics, University of Western Australia, Nedlands, WA 6907, Australia

Received 27 January 1997; accepted 25 February 1999

Abstract

Individuals often have only incompletely known preferences when choosing between pair-wise gambles. Particular presentations of the choice problem may then passively encourage the use of some choice method to clarify the preference. Different presentational displays can then lead to choice patterns predicted by one or other Generalised Expected Utility theory. When a preference is not or cannot be constructed, choices will be arbitrary. I run an experiment that uses three different presentational displays and incorporates a ‘strength of preference’ indicator. The experiment investigates regret theory as an example of a Generalised Expected Utility theory. As preference strength is found to vary by display, regret effects, event-splitting effects and choice reversals are all found to be display dependent. It is suggested that the evidence is best explained by assuming incomplete EU preferences which are clarified by constructive heuristics, rather than by some Generalised Expected Utility model coupled with an error term. ©2000 Elsevier Science B.V. All rights reserved.

JEL classification: C91; D81

Keywords: Expected utility; Regret; Event-splitting effects; Incompleteness; Errors

1. Introduction

Although Expected Utility (EU) theory remains the basis for economists’ work in the area of choice under uncertainty, there is widespread recognition of its descriptive limitations (see Camerer, 1995). Economists have responded to this weakness by developing more general non-expected utility theories that are still utility theories. But as Friedman (1989) pointed out, the strategy of refining preference structures is a retrograde one, because the predictive power of the original theory is being weakened rather than strengthened. Psychologists’ efforts to explain systematic violations of the Expected Utility axioms led to the alternative strategy of incorporating perception effects, and to the development of Behavioural Decision Research. This strategy presupposes that human cognitive processes are limited and proposes various non-utility models of decision processes. There are two compensatory choice processes, and numerous non-compensatory processes (see Payne et al., 1993). Compensatory decision processes confront the trade-offs between probabilities and consequences and can lead to choices consistent with Expected Utility and other utility theories. Non-compensatory rules, for example Majority of Confirming Decisions (MCD) and Elimination By Aspects (EBA), are usually less cognitively expensive but are harder to defend as rational.

Buschena and Zilberman (1995) show that both the economists’ and the psychologists’ approaches alone are inadequate. They argue that a more general paradigm is needed to capture choice behaviour and account for choice reversals. A number of papers collected in Georgescu-Roegen (1966) address the issue of psychological thresholds in perception, and the consequences of assuming the individual is not a ‘. . . perfect choosing instrument’. This paper contends that one of those consequences could provide the foundation for a more general paradigm. By incorporating perception effects into Expected Utility theory as the psychologists advocate, the process of preference construction may give rise both to choice patterns consistent with Generalised Expected Utility (GEU) and to choice reversals (errors). Section 2 of this paper presents the argument and Section 3 describes an exploratory experiment to test some of the implications. Section 4 discusses the findings in light of the recent literature, and concludes.

2. Consequences of incomplete preferences

2.1. Constructing preferences

Walley (1991) argues that a preference is an underlying behavioural disposition to make a certain choice. The choice made will reflect this disposition when it is apparent which alternative is implied by it. Savage (1954) defined preference as revealed by choice. His approach is valid if we can ‘read’ our disposition costlessly, but if choices are required where preferences are incomplete, this simple identification can no longer be satisfactory. The utility theories assume that individuals possess complete preferences between all pairs of choice objects. The recent economics literature (see Section 4) acknowledges that choices may involve error, and represents it by an appropriately specified error term added to the relevant underlying model. But this strategy would also be called into question if both the generalised utility theories and the error were caused by the same factor, as eliminating that factor would remove not just the error but the theory. This paper departs from Savage’s view when:
1. a preference is not costlessly accessible and the attempt to access or create the preference generates Generalised Expected Utility choice patterns;
2. the clarity of the preference remains too weak to conclude that any one choice object is preferred, even after an attempt to identify the preference. [1]

[1] Uncertainty regarding one’s preference is conceptually distinct from indifference, wherein the utilities of the alternatives are known precisely, but happen to be equal.


Expected Utility may then be an appropriate model only when there are minimal cognitive constraints present in a choice environment, limiting the scope of the theory. There would be a need for descriptive and normative models of choice in those cases when one’s preference is not immediately obvious. Our limited cognitive processing power prevents us from maintaining an up-to-date, fully defined preference ordering. Even the attempt to define one more closely may be judged too costly when compared with the additional benefits a more complete ordering would offer over a less complete one. Camerer (1995) and Buschena and Zilberman (1996) assert that even if some experimental design offered great incentives for accuracy, there is a cognitive limit to the fine-grading of preferences. It should not then be possible to force all behaviour to conform to Expected Utility simply by raising incentives sufficiently high.

Imagine an Expected Utility index against which risky choice objects are measured for their utility. Suppose our ability to discriminate immediately between different values on this index is restricted to a coarse grading, consistent with Fechner’s (1966) psychophysical concept of ‘just-noticeable differences’ (JNDs). Unless the choice objects are more than a certain distance apart, it will not be immediately apparent which object is preferred. If they are sufficiently distant, our underlying disposition shines through brightly and preferences between the choice objects are clear. Choices should be very confident, although occasional errors could still occur due to white noise. Choice patterns should be consistent with Expected Utility theory, plus a white-noise term. [2] Hence:

H1a: A preference between a pair of lotteries where the (generalised) expected utilities are close will be weaker in comparison with another lottery pair where the expected utilities are further apart.

This hypothesis is consistent with evidence from Butler and Loomes (1988), who found that the difficulty of the choice problem increased as the utility difference between the choice objects decreased. It seems reasonable to assume that when a subject reports a choice to be more difficult, their preference is less clear.

If the choice objects are less than a certain distance apart, cognitive effort may then be expended to interpret the choice in a way that allows the underlying disposition to make itself felt. This is equivalent to seeking a finer grading on a utility index, possibly rendering our underlying disposition sufficiently clear to permit a preference-consistent, rather than an arbitrary, choice. [3] There are different ways in which this cognitive ‘work’ can be done, so we may expect a method (choice rule) that seems most appropriate given the choice context to be chosen. But these different interpretations of the problem emphasise different aspects of the choice from which our preference is then formed: that is, the ranking of the choice objects within a just-noticeable difference can be affected by the method used to assign that ranking. Or, as Schick (1984) argued, the process of choosing may not simply reveal our pre-existing preferences; it can help to create them.

[2] Persons using the expected value rule, which is a special case of EU, should have very confident preferences even when the choice objects are located very close together on the utility index. This is because the just-noticeable differences given by the rule are infinitely small.
[3] Georgescu-Roegen (1966, p. 152) makes the related point that the greater the time spent perceiving the stimuli before formulating a judgement, the smaller will be the perceptual threshold.
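To make the coarse-grading idea concrete, the following minimal sketch (my illustration, not the paper’s; the noise scale sigma and the utility gaps are assumed for the example) implements a Fechner-style discrimination model in which the probability of a preference-consistent choice rises with the utility gap:

    import math

    def p_consistent(utility_gap: float, sigma: float = 1.0) -> float:
        """Probability that the higher-utility lottery is chosen when the
        perceived utility difference is blurred by Gaussian noise of scale
        sigma (a Fechner model). Gaps well beyond one 'just-noticeable
        difference' (roughly sigma) make the choice near-certain; tiny gaps
        make it close to a coin flip."""
        # Standard normal CDF at gap/sigma, written via the error function.
        return 0.5 * (1.0 + math.erf(utility_gap / (sigma * math.sqrt(2.0))))

    for gap in (3.0, 1.0, 0.2):  # wide, borderline and sub-JND utility gaps
        p = p_consistent(gap)
        print(f"gap {gap:.1f}: consistent choice {p:.2f}, reversal risk {1 - p:.2f}")

On this reading, H1a amounts to the claim that reported preference strength tracks this discrimination probability.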


Aumann (1962) showed that the completeness axiom is not necessary to derive a utility that represents preferences, but that the resulting utility is no longer unique. He refers to ‘. . . the limitation of our discriminatory capacity’ requiring the detail of the preference to be constructed. Leland (1994) and Buschena and Zilberman’s (1995) ‘similarity’ theory maintains that as the risk aversion embodied by such a utility function is masked, fewer risk-averse choices may occur in the more similar lottery pair. This aspect of their theory is also consistent with the arguments of this paper. If the instruction from an underlying utility function is partly smothered, actual choices can then to a degree reflect the impact of preference-constructing heuristics. Hence:

H1b: If H1a holds, lottery pairs where (G)EUs are close will exhibit a smaller proportion of risk-averse choices.

When preferences are not strong, error rates should exceed those from white noise, but be lower than if choices were arbitrary. Of course, ‘errors’ would then be a misleading term to use if there is no underlying model of complete preferences against which mistakes can be measured. The neutral term ‘choice reversals’ should be preferred. If the description of the choice problem is particularly ill suited to the construction of a preference, non-compensatory rules might be used. Choices made using such rules may not reflect an underlying preference. But it is also possible they help to construct a (weakly felt) preference, rather than simply produce a choice. If we still have no preference between two objects even after using some choice process to pursue a finer-grained discrimination, our ultimate choice cannot be said to be preference-revealing (or preference-creating). Such choices will essentially be arbitrary. [4] If the utility difference between the objects is sufficiently small, the true preference cannot be accessed (behaviourally equivalent to it not existing), and either choice object would stand a good chance of being selected. Choice reversal rates at the extreme could approach the 50 percent of fully random choices. The implication is that choice errors result from the remaining gaps in our preference ordering, after whatever cognitive effort to reduce the coarse grading in our utility index has been attempted.

Hypothesis 1 alone does not imply that the GEU models cannot be core theories of preference, but it does suggest that the extra sources of utility GEU models permit in the choice process are quickly outweighed by parameter changes which increase both EU and GEU gaps between the choice objects. Descriptively, EU would then be the more parsimonious model in all cases except where the utilities of lotteries are close, just as Newtonian mechanics is still used in preference to the more general relativistic version except at velocities approaching light speed. It also raises the possibility that GEU models are not core theories at all; the effects they purport to explain might be the consequence of preference-constructing heuristics, as the latter would also lead to choice reversals only when the utilities of the choice objects are close.

If this is so, the choice process then used may lead to violations of one or more of the Expected Utility axioms. Sugden (1995) gave two examples: complementarities may arise in evaluating the various attributes within an object, violating the sure-thing principle; or, if complementarities arise when attributes are evaluated across objects, the transitivity axiom may be violated. Generalised Expected Utility choice patterns are here predicted to be a consequence of choice rules such as these being used to reduce the size of the just-noticeable differences. [5] Since the various choice processes may be used with different frequencies under different descriptions of the choice problem or procedures for eliciting choice, the use of some Generalised Expected Utility model based on a particular process might be display-dependent. Display dependence would be a general feature of tests of the axioms of Expected Utility theory if GEU models only predict behaviours that result from honing preferences, rather than representing core preference structures. Keller (1985) investigated display effects for compliance with the Sure-Thing and Substitution principles and found significant differences, as Harless (1992) did for regret theory. A second set of hypotheses is thus implied:

H2a: If displays differ in the ease with which individuals can form clear preferences, the proportion of choices based on a clear preference for a fixed lottery pair will vary by display.

The same logic also implies that:

H2b: Given H2a, the proportion of risk-averse choices will be smaller in displays where preferences are less clear.

If GEU theories can be core preference structures, then altering the displays of a fixed lottery pair should not affect the frequency of observed GEU effects. If, however, they result from display-affected preference-constructing heuristics, suitably distinct displays should lead to different frequencies of these effects:

H2c: Given H2a, the proportion of GEU effects and choice reversals will be greater in displays where preferences are less clear.

In the experiment reported below, regret theory (Loomes and Sugden, 1987) is chosen as an example of a Generalised Expected Utility model. Regret theory was selected because Butler (1998) has shown it is equivalent to a parameter restriction in an additive-difference choice model, which in turn is prompted to varying degrees in some displays when preferences need constructing. [6] The arguments in Butler (1998) demonstrate that, at least in theory, choice heuristics can account for the anomalies predicted by one GEU model. Regret effects rely on rather sophisticated information processing, which imposes substantial cognitive costs on an individual. Starmer and Sugden (1993) and Humphrey (1995) show that some experimental evidence identifying regret effects was actually conflated with ‘event-splitting effects’ (ESEs). This latter effect refers to the tendency of some individuals to be more inclined to choose those acts where the consequence is shown as the outcome for contiguous events, rather than one large event, even though the probability of the consequence being realised is unchanged (see Fig. 3). Such effects are inconsistent with all choice theories derived from a compensatory choice process, though they are consistent with the non-compensatory ‘majority of confirming decisions’ (MCD) rule in the choice problems of interest to us. They are in themselves highly suggestive of incomplete preferences. A test for ESEs is designed into the experiment.

[4] Though perhaps some pattern may be evidenced if a low-effort, non-compensatory choice process is used as a tie-breaker.
[5] Jarvenpaa (1989) provides evidence that different displays lead to the use of different choice rules, and argues for a cognitive cost/benefit interpretation of the findings, as did Payne et al. (1992). As those authors point out, this is unlikely to be the result of a meta-choice process, as hardly anyone is consciously aware either of the set of alternative choice rules or of the different outcomes they would generate. Instead, it stems from the passive encouragement of suitable rules offered by the choice context.
[6] Refer to the former article for a description of regret theory, and to the latter for a justification of the choice-rule interpretation. It is speculated that other GEU theories could be reduced to their choice-rule foundations in a similar manner.
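To see why event-splitting effects sit naturally with the MCD rule, consider this sketch (the acts, payoffs and event partition are hypothetical, chosen only to illustrate the mechanism): MCD counts the displayed events in which each act wins, so splitting one event into two contiguous sub-events can tip the choice even though no probability or payoff has changed.

    def mcd_choice(act_a, act_b):
        """Majority of Confirming Decisions: choose whichever act pays more in
        the greater number of displayed events, ignoring event probabilities."""
        wins_a = sum(1 for a, b in zip(act_a, act_b) if a > b)
        wins_b = sum(1 for a, b in zip(act_a, act_b) if b > a)
        return "A" if wins_a > wins_b else "B" if wins_b > wins_a else "tie"

    # Coarse display: two events with probabilities 0.6 and 0.4 (payoffs per event).
    A_coarse, B_coarse = [10, 0], [0, 12]
    print(mcd_choice(A_coarse, B_coarse))  # -> "tie": one winning event each

    # Same lotteries, but B's 0.4 event is shown split into two 0.2 events.
    A_split, B_split = [10, 0, 0], [0, 12, 12]
    print(mcd_choice(A_split, B_split))    # -> "B": the split alone tips MCD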

3. An exploratory experiment and results

3.1. The experiment

I designed a computerised experiment to shed some light on the hypotheses. The experiment was written in Borland C++ version 3.0. An important innovation in the design was the attempt to observe the strength of an individual’s preference in each of the pair-wise choices, using a graphical ‘strength of preference’ indicator. This indicator has strong similarities to the ‘confidence’ indicator used in Butler and Loomes (1988). I assume a clear preference to exist when a high number (≥7) is selected on the preference indicator.

Two pairs of lotteries were employed, termed the low gap and high gap lotteries. The set of lotteries used in the 16 questions is detailed in Fig. 1. The testing of H1 and H2 used a low gap lottery pair that offered a 30 percent chance of $44 and a 70 percent chance of $0, or a 70 percent chance of $16 and a 30 percent chance of $0. The (G)EUs of these are assumed to be close because the EV of the high-variance (risky) lottery exceeds the EV of the risk-averse lottery. If risk aversion is a typical characteristic of clear utility functions, this combination should for many people lead to reasonably equal utilities of the lotteries presented. The high gap lottery pair used to test H1 and H2 offered a 40 percent chance of $33 and a 60 percent chance of $0, or a 60 percent chance of $26 and a 40 percent chance of $0. This time the EV of the risk-averse lottery exceeds that of the risky lottery, which should reinforce any natural tendency toward risk aversion. The information and practice question screens used in the experiment are illustrated in Fig. 2.

Ninety-four students from the University of Western Australia were recruited by cross-campus advertising. A total of 16 experimental sessions were held. The experimenter assigned subjects to a terminal and gave a brief description of the experiment and the payment mechanism. As part of this explanation, subjects completed three practice questions, one from each display that they would encounter. The experimenter ensured that all subjects understood how to use the preference indicator before the experiment proper began.

There were three very different displays in which the gamble pairs were presented. Examples are illustrated in Fig. 2, and they will be referred to here as the pie, strip, and bar displays. The pie and strip displays have often been used in choice experiments; the bar display has not. Subjects had to select either the risky or the risk-averse option. Subjects may have had more difficulty forming a preference using the bar display in this experiment, as the value of the outcomes was not explicitly stated. As the lottery outcomes were stated clearly in the strip and pie displays, this suggests the bar display used here was fundamentally different. If so, then we cannot be certain whether to attribute differences in choice behaviour to some inherent property of the bar display or to the ease with which the outcomes were discerned. A slightly different design could address this issue: simply print the dollar value on each bar.
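As a quick arithmetic check on the gap construction just described, here is a minimal sketch computing the expected values of the four lotteries (figures as stated above):

    # Expected values of the two lottery pairs used to test H1 and H2.
    pairs = {
        "low gap":  {"risky": (0.30, 44.0), "risk-averse": (0.70, 16.0)},
        "high gap": {"risky": (0.40, 33.0), "risk-averse": (0.60, 26.0)},
    }
    for name, pair in pairs.items():
        evs = {label: p * prize for label, (p, prize) in pair.items()}
        print(name, {k: f"${v:.2f}" for k, v in evs.items()})
    # low gap:  risky $13.20 vs risk-averse $11.20 (EV gap works against risk aversion)
    # high gap: risky $13.20 vs risk-averse $15.60 (EV gap reinforces risk aversion)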


Fig. 1. Details of the choice problems. A total of 18 sub-groups used the payoff values for u, v and the probabilities for A, B, C listed above. The question numbers, types and displays indicated are those for one sub-group. For examples of each question type, see Fig. 3a, b and c. Questions 7–10 were used for the monotonicity test. The hypotheses were explored using the high and low gap lottery pairs in questions 1–6 and 11–16.

The three display groups were then split so that half had the labels ‘A’ and ‘B’ on the lotteries reversed, and finally the resulting six groups were each broken into three different question orderings. In total this gave 18 subgroups; subjects were randomly assigned to these subgroups over the various sessions. The purpose of the labelling and question-order subgroups was to minimise other possible sources of bias. Camerer (1989) found that choice reversal rates can depend directly on the gap between the first encounter and the second, although the reversal rate stabilises if the repetition is separated by a gap of more than five other choice problems. Consequently, all pure repeats in this experiment were kept six choices apart to lessen the role of memory in lowering reversal rates.

Each subject’s response to a question was converted into a score from 1 to 18. On this scale, a choice of gamble ‘A’ coupled with a maximum preference score of 9 was assigned an extreme value of 1. A choice of gamble ‘B’ coupled with a maximum preference of 9 was assigned the opposite extreme value of 18, thereby creating a continuum stretching from 1 to 18 denoting decreasing preference for ‘A’. Hence, a subject choosing ‘A’ combined with a preference strength of just 3 would have a score of 7 for that question; a subject choosing ‘B’ with a preference strength of 7 would have a score of 16 assigned. This method allows both the subject’s choice and strength of preference for a question to be combined into a single indicator.


Fig. 2. Experimental displays used.


Fig. 3. The juxtaposition of lottery consequences: strip display. (Other displays used equivalent transformations; see Figs. 1 and 2.)

There are two possible ways of recording ‘errors’ in this design. First, like other researchers, the paper focuses on those cases where the choice between ‘A’ and ‘B’ reverses. I will refer to this as the ‘standard’ measure. Assuming subjects who selected a 7, 8 or 9 (on the 9-point preference scale) had a clear preference, the dependence of choice reversals (errors) on preference clarity could be investigated. The indicator was designed with this definition in mind (see Fig. 2). Additionally, we can look to see the proportion of subjects who made a change of 9 or more points on the 18-point preference scale. We will call this a ‘9+ reversal rate’. Where relevant, both sets of results will be reported; the latter definition is clearly the more stringent, though it is not directly comparable to the results of previous research.

Each subject also faced pairs of choices that:
1. controlled for displays and event-splitting effects but changed the juxtaposition of consequences;
2. controlled for displays and juxtapositions but changed the number of event splits;
3. controlled for displays but allowed the juxtaposition and the number of event splits to change simultaneously.

This design (see Fig. 3) permits the display-dependence of regret and event-splitting effects to be investigated. Finally, subjects were told that payment would depend on one of their chosen gambles being played out for real, and that payments could range from zero to A$50.
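The mapping from a choice plus a 1–9 strength rating onto the 18-point continuum, together with the two reversal measures defined in the preceding paragraphs, can be sketched as follows (a reconstruction from the worked examples in the text, not the experiment’s own code):

    def scale_score(choice: str, strength: int) -> int:
        """Map a choice ('A' or 'B') with a 1-9 preference strength onto the
        18-point continuum: 'A' at strength 9 scores 1, 'B' at strength 9
        scores 18."""
        assert choice in ("A", "B") and 1 <= strength <= 9
        return 10 - strength if choice == "A" else 9 + strength

    def standard_reversal(first_choice: str, repeat_choice: str) -> bool:
        """'Standard' measure: the chosen lottery switches between repetitions."""
        return first_choice != repeat_choice

    def nine_plus_reversal(first_score: int, repeat_score: int) -> bool:
        """'9+' measure: a swing of nine or more points on the 18-point scale."""
        return abs(first_score - repeat_score) >= 9

    # Worked examples from the text: 'A' at strength 3 scores 7; 'B' at 7 scores 16.
    assert scale_score("A", 3) == 7 and scale_score("B", 7) == 16
    print(standard_reversal("A", "B"), nine_plus_reversal(7, 16))  # True True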


Table 1
Proportions of risky choices, by display and clarity

High gap lottery pair (a)
Display   Risky (%)   Risk averse (%)   Total   % of risky on unclear   % of risk averse on unclear   Overall average preference strength
Strip     4.3         95.6              184     75.0                    18.2                          7.40
Pie       8.8         91.2              192     94.1                    30.3                          6.76
Bar       15.5        84.5              180     78.6                    48.7                          6.27
Average   9.5         90.5              556     83.0                    31.6                          6.81

Low gap lottery pair (b)
Display   Risky (%)   Risk averse (%)   Total   % of risky on unclear   % of risk averse on unclear   Overall average preference strength
Strip     42.4        57.6              186     72.1                    57.0                          5.83
Pie       31.7        68.3              180     75.4                    55.3                          6.12
Bar       42.7        57.3              178     68.4                    57.8                          5.98
Average   39.0        61.0              544     71.7                    56.6                          5.97

(a) χ² = 13.46 > 5.99; therefore the proportion of risky choices is display dependent. χ² = 2.23 < 5.99; therefore the proportion of risky choices on unclear preferences is not display dependent. χ² = 35.3 > 5.99; therefore the proportion of risk-averse choices on unclear preferences is display dependent.
(b) χ² = 6.03 > 5.99; therefore the proportion of risky choices is display dependent. χ² = 0.80 < 5.99; therefore the proportion of risky choices on unclear preferences is not display dependent. χ² = 0.16 < 5.99; therefore the proportion of risk-averse choices on unclear preferences is not display dependent.

At the end of the experiment a random device selected one of the 16 questions the subject had faced. The lottery was then played for real by allowing the subject to draw a lottery ticket numbered from 1 to 100 from a box, and payment occurred in accordance with their lottery choice for that ticket. Overall, around half of the subjects won money, the average sum being A$26 for less than an hour of their time. The experimental design follows the incentive-compatible method developed by Becker et al. (1964), but it is not completely uncontroversial (Machina, 1989). This is because a subject’s answers to any one choice problem need to be independent of their responses to any other choice problems in the experiment. If not, they may reduce what then becomes a two-stage lottery to a one-stage lottery, thereby undermining the incentive structure. However, Starmer and Sugden (1991) provide strong evidence against this reduction process in experiments of this type. I shall follow the bulk of the literature in assuming that the Becker, DeGroot, Marschak method is basically sound.

3.2. Experimental results

Hypotheses 1a and 1b, and Hypothesis 2b, can be investigated with reference to Table 1. As six otherwise identical questions were asked in each of the three displays for both the high and low gap pair, we can compare the average confidence scores for each display for these questions. It is clear that preferences in the low gap lottery pair (Table 1) are much hazier than for the high gap pair, as H1a proposed. Table 1 also shows a near-perfect negative rank correlation between the average preference strength for the question and the degree of pro-risk behaviour, as H1b implies. The great majority of risky choices occurred on unclear preferences, a far higher proportion than for risk-averse choices, especially for the high gap pair. Note that in general there is no additional display effect on the proportion of risky choices on unclear preferences. This is to be expected, as the display effect operates by changing the ease of preference formation, and as a consequence the proportions of risky choices. [8]

[8] The puzzling exception (see Table 1) involved risk-averse choices in the high gap pair.


Table 2
Strength of preference by presentational display (a): percentage of subjects indicating clear choices both times the same lottery pair was presented (b)

Display   Clear (%)   Total
Pie       46.1        102
Bar       31.1        90
Strip     63.7        102
Average   47.3        294

(a) Note: The 294 identical repeats comprise three repeats for each of the 94 subjects plus an additional unplanned repeat for 12 subjects. The latter was due to a typographical error in a question shown to some sub-groups, which also explains why some of the other totals reported may not be divisible by 94.
(b) χ² = 19.3 > χ²(0.05, 2) = 5.99; therefore the display affects the proportion of clear preferences.

If GEU models are not core preference theories, this suggests that use of a low gap lottery pair has the potential to be more conducive to any choice switches, be they errors, GEU effects or event-splitting effects, than would a high gap pair. After all, if no risky choices are made at all, then all choice-switching rates must be zero.

Hypothesis 2 focuses directly on the core preferences issue. Table 2 shows the clarity of preferences by display for the questions repeated identically, all using the high gap lottery pair. These results suggest that the proportion of subjects with clear preferences did indeed vary by display, as Table 1 also suggested. There is a clear ranking of the displays used, with the strip display promoting the clearest preferences. These findings are consistent with H2a. Given the success of H2a, it is not surprising that Hypothesis 2b (on the frequency of risky choices) is also supported, as can be seen in Table 1.

These results suggest we may find support for our explanation of choice errors and GEU effects, as hypothesised in H2c. Choice reversals should occur predominantly for those choices where subjects are least sure of their preference, rather than being spread across preference levels as Harless and Camerer’s (1994) ‘white noise’ theory would imply. The breakdown of the 294 questions that were repeated identically is as follows (not shown in Table 2). Of the 139 times that a confident choice was made on both the original and the repeat question, there are only three choice reversals, giving a reversal rate of 2.1 percent. Of the remaining 155 times where a subject was less confident of their choice on one or both occasions, there are 32 choice reversals, giving a reversal rate of 20.6 percent. This difference is easily significant: χ² = 23.9 > χ²(0.05, 1) = 3.84. Overall, there are 35 reversals, representing 11.9 percent of the total 294 questions.

For the same 294 repetitions, the 35 reversals were spread across displays as shown in Table 3. Given the relatively small number of reversals, these display differences narrowly fail to be significant at the 5 percent level, although they are easily significant at the 10 percent level. However, the ranking of choice reversal rates by display is consistent with Hypothesis 2: the strip display has the largest proportion of clear preferences and the fewest reversals, the bar display the least clear preferences and the most reversals.
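The clear/unclear contrast reported above can be reproduced directly from the stated counts (3 reversals out of 139 clear-on-both-occasions repeats versus 32 out of 155 less confident ones); a sketch using scipy (an assumed dependency, not part of the original analysis), which recovers the reported χ² of roughly 23.9:

    from scipy.stats import chi2_contingency

    # Rows: clear preference on both presentations / unclear on at least one.
    # Columns: choice reversed / choice unchanged (counts as reported in the text).
    observed = [[3, 136],
                [32, 123]]
    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(f"chi2 = {chi2:.1f} (dof = {dof}), p = {p:.1e}")  # chi2 approx 23.9 > 3.84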

Table 3
Choice reversals by display (a)

          Standard measure         9+ measure
Display   Reversals (%)   Total   Reversals (%)   Total
Pie       14.7            102     8.8             102
Bar       15.5            90      14.4            90
Strip     5.9             102     4.9             102
Total     11.9            294     9.2             294

(a) χ² = 5.42 < χ²(0.05, 2) = 5.99; therefore choice reversals narrowly fail to be display dependent at the 5% level.

These results show a lower reversal rate than those found by others, for example Hey and Orme (1994). One plausible explanation is that we used only the high gap lottery pair, whereas Hey and Orme’s choice pairs generally involved lotteries with more similar expected values.

The experiment also offered precisely the same gamble to a subject twice, but in a different display each time. The findings for the three display groups, all of which used the low gap lottery pair, are shown in Table 4. The overall rate of 33.8 percent is well in excess of the 11.9 percent reported previously for the high gap pair when the display was unchanged. Using the 18-point scale, exactly the same score was chosen by just 25.4 percent (65/256) for choices repeated in different displays. This compares with 42.5 percent (114/268) when the choice was repeated within a display on the high gap pair. The dramatic difference in choice reversal rates between Tables 3 and 4 is explained both by the use of the low gap rather than the high gap lottery pair in Table 4 and by the trans-display comparison. It is not possible to separate the impacts of the display change and the (G)EU-gap change in this particular study; however, that now seems a worthwhile question for future work.

Continuing with Hypothesis 2, we now look at the choice-rule effects. There are three plausible ways of measuring such effects in this experiment. First, there is the traditional count of choice switches when the lotteries’ consequences are juxtaposed, and/or the number of events split. Secondly, the number of 9+ switches could be counted. Thirdly, the number of shifts in the direction predicted by these theories on the 18-point scale can be calculated. Each of these scores can then be contrasted with the number of moves in the opposite direction.

Table 4
Choice reversals as display changes (a)

                Standard measure         9+ measure
Display pairs   Reversals (%)   Total   Reversals (%)   Total
Strip and bar   38.4            86      24.4            86
Bar and pie     31.7            80      15.0            80
Strip and pie   31.2            90      14.4            90
Total           33.8            256     18.0            256

(a) χ² = 1.3 < χ²(0.05, 2) = 5.99; therefore there are no display-pair differences in reversal rates.


Table 5
Combined juxtaposition/event-splitting effects (i): number of subjects changing choices following the above change, by clarity (a)

          Predicted (%)   Unpredicted (%)   Neither (%)   Total
Clear     5.1             1.7               93.2          117
Unclear   19.4            6.1               74.5          247
Total     14.8            4.7               80.5          364

(a) χ² = 17.63 > χ²(0.05, 2) = 5.99; therefore combined juxtaposition effects are concentrated on unclear preferences.

The first general result is that, overall, the frequency of combined juxtaposition/event-splitting, pure regret and pure event-splitting effects is rather low in comparison with other studies. Again, the difference in expected values of the lotteries was somewhat higher than in other experiments; this should not have prevented movements in the predicted direction on the 18-point scale, though. Half of the comparisons used the high gap pair and half the low gap pair, for each test and each display.

Tables 5 and 6 give the results for combined juxtaposition effects (i.e. a regret test without standardising for event-splitting effects; compare problems a and c in Fig. 3). If the preference-construction explanation of GEU effects is correct, we would need to see the great majority occur on choices where preferences were unclear. This is what we find. However, it is important to state that although the evidence is consistent with that view, by themselves the results in Table 5 do not constitute evidence against the core theory alternative. This is because a person may have GEU preferences without producing a GEU ‘effect’ if the utilities of the lotteries are sufficiently different. If so, then such a person would report clear preferences and their choice would appear to be consistent with EU, despite their true preferences being GEU. Of course, the greater the evidence consistent with the preference-constructing view, the more the onus shifts to advocates of GEU models to demonstrate why their interpretation is the more plausible.

Table 6 investigates the display dependence of these effects. The best display in which to observe them is the pie display. Although the display differences are not significant at the 5 percent level, they are so at the 10 percent level. Using the 18-point scale, of the 61.5 percent of responses that moved, 67 percent favour the predicted direction overall, comprising a 60 percent share in the bar display, 63 percent in the strip display, and an overwhelming 79 percent in the pie display. Taken together, these results for combined effects suggest that the juxtaposition phenomenon may be a milder but more widespread phenomenon than has previously been realised.

Table 6
Combined juxtaposition/event-splitting effects (ii): numbers of subjects changing choices following a juxtaposition of consequences, by display (a, b)

          Standard measure         9+ measure               18-point moves
Display   P (%)   U (%)   N (%)   P (%)   U (%)   N (%)    P (%)   U (%)   N (%)   T
Pie       18.5    0.8     80.7    11.3    0.8     87.9     45.2    12.1    42.7    124
Bar       12.1    7.2     80.7    8.9     4.0     87.1     37.1    25.0    37.9    124
Strip     13.8    6.0     80.2    8.6     6.0     85.4     41.4    24.1    34.5    116
Total     14.8    4.7     80.5    9.6     3.6     86.8     41.2    20.3    38.5    364

(a) χ² = 8.52 < χ²(0.05, 4) = 9.48; therefore combined juxtaposition effects narrowly fail to be display dependent.
(b) P = predicted, U = unpredicted and N = neither.


Table 7
Regret effects (i): number of subjects changing choices following a pure juxtaposition of consequences, by clarity (a)

          Predicted (%)   Unpredicted (%)   Neither (%)   Total
Clear     3.4             1.3               95.2          146
Unclear   11.9            8.1               80.0          210
Total     8.4             5.3               86.2          356

(a) χ² = 16.96 > χ²(0.05, 2) = 5.99; therefore regret effects are concentrated on hazy preferences.

Tables 7 and 8 look at the pure regret effects (i.e. juxtaposition effects after standardising for event splits). Again, Table 7 shows that observed effects are several times more likely to occur on unclear preferences. By display, the presentations this time do have a statistically significant impact on these effects. The predicted effects are concentrated in the bar display. Of the 58 percent who move on the 18-point scale under a pure regret test, 59 percent move in the predicted direction. These results suggest that the regret effect is less pronounced than the juxtaposition effect, and is more concentrated in the bar display. The regret effect does not appear to be strong, though the asymmetry on the 18-point scale suggests it may be a genuine, though very mild, factor in some people’s decisions.

Tables 9 and 10 show the results for pure event-splitting effects. Using the conventional choice-count measure, Table 9 shows that event-splitting effects are also concentrated where preferences were hazy. The displays, however, are not the cause of a statistically significant difference in these effects, though a larger experiment may have shown them to be. The pie display is the most suited to these effects, and the bar display the least suited. Using the more general test of the number of shifts of any magnitude on the 18-point scale, we find that for event-splitting effects, predicted shifts comprise 61 percent of the 63 percent of movements; this conceals a 53 percent share in the bar display, and a 66 percent share for the other displays.

Table 8
Regret effects (ii): number of subjects changing choices following a pure juxtaposition of consequences, by display (a, b)

          Standard measure         9+ measure               18-point moves
Display   P (%)   U (%)   N (%)   P (%)   U (%)   N (%)    P (%)   U (%)   N (%)   T
Pie       5.8     1.7     92.5    3.3     1.7     95.0     38.3    19.2    42.5    120
Bar       13.1    7.9     78.9    10.5    5.3     84.2     36.8    22.8    40.3    114
Strip     6.5     6.5     86.9    4.9     4.1     91.0     27.9    29.5    42.6    122
Total     8.4     5.3     86.2    6.2     3.6     90.2     34.2    23.9    41.8    356

(a) χ² = 10.5 > χ²(0.05, 4) = 9.48; therefore regret effects are display dependent.
(b) P = predicted, U = unpredicted and N = neither.


Table 9
Event-splitting effects (i): number of subjects changing choices following a splitting of events, by preference strength (a)

          Predicted (%)   Unpredicted (%)   Neither (%)   Total
Clear     3.5             2.6               93.8          113
Unclear   19.6            9.4               71.0          235
Total     14.4            7.2               78.4          348

(a) χ² = 23.4 > χ²(0.05, 2) = 5.99; therefore event splits are concentrated on hazy preferences.

Table 10
Event-splitting effects (ii): number of subjects changing choices following a splitting of events, by display (a, b)

          Standard measure         9+ measure               18-point moves
Display   P (%)   U (%)   N (%)   P (%)   U (%)   N (%)    P (%)   U (%)   N (%)   T
Pie       18.7    4.5     76.7    12.5    2.7     84.8     42.0    21.4    36.6    112
Bar       11.4    11.4    77.2    8.8     8.8     82.4     33.3    29.8    36.8    114
Strip     13.1    5.7     81.1    8.2     4.9     86.9     41.0    22.1    36.9    122
Total     14.4    7.2     78.4    9.8     5.5     84.7     38.8    24.4    36.8    348

(a) χ² = 6.8 < χ²(0.05, 4) = 9.48; therefore event splits are not display dependent.
(b) P = predicted, U = unpredicted and N = neither.

An unexpected result was also found. Four questions (7–10 in Fig. 1), unconnected to the tests for choice reversals or for juxtaposition/event-splitting effects, were also faced by each subject. Each was identical to one of the other questions, except for an improvement to either the probability or the dollar value of a positive consequence. Assuming event-wise monotonicity, the idea was to build into the experimental design a test of the preference indicator. For instance, if a subject chose gamble ‘A’ over gamble ‘B’ in one question, then faced another question that had improved either the probability of a favourable outcome or the consequence concerned, then ceteris paribus their confidence in having chosen ‘A’ should improve. Similarly, if they had chosen ‘B’ on the first question, their confidence in that choice should now fall, and for some their choice would switch to ‘A’.

The results, however, confounded this plan. Table 11 shows that out of 542 valid comparisons, 402 do not change their choice and 117 others switch their choice in the expected direction. But no fewer than 23, or 4.2 percent, switch their choices in the wrong direction! Of these 23, 16 also switch the wrong way using the 9+ measure. Using the 18-point scale to check whether the preference strength moved in the expected direction (regardless of whether the choices switch or not), there are 315 moves in the predicted direction but 84 in the opposite direction. Thus only 79 percent of the 399 movements are in favour of monotonicity. No existing normative theory of choice under risk could defend violations of the event-wise monotonicity axiom, but such violations are not inconsistent with the general thrust of this paper. Consider, for example, the within-display choice-reversal rate of 11.9 percent on the high gap lottery pair reported above. If an individual has unusually poor knowledge of their preferences, it should be possible for a fraction of those people to violate event-wise monotonicity if an unsuitable choice rule is prompted.


Table 11
Monotonicity violations (a): percentage of subjects changing choices for the questions listed (b)

               Standard measure         9+ measure
Problem pair   P (%)   U (%)   N (%)   P (%)   U (%)   N (%)   Total
7 and 3/13     5.7     4.0     90.3    5.7     4.0     90.3    176
9 and 3/13     35.7    4.4     59.9    33.0    3.3     63.7    182
8 and 2        25.0    2.2     72.8    23.9    0.0     76.1    92
10 and 2       20.6    6.5     72.9    13.0    3.3     83.7    92
Total          21.6    4.2     74.1    19.2    3.0     77.8    542

(a) Note: Questions 3 and 13 were identical, so have been combined here. Each subject potentially provided six observations, and 542 of the 564 resulting comparisons were valid.
(b) P = predicted, U = unpredicted and N = neither.
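The directional classification used in the monotonicity test can be sketched as follows, here on the 18-point scale (the standard and 9+ measures apply the same logic to choice switches); the scale conventions are as defined earlier, and the scores in the usage example are hypothetical:

    def monotonicity_move(score_before: int, score_after: int, improved: str) -> str:
        """Classify an 18-point-scale move after one gamble is improved. Scores
        run from 1 (strongest preference for 'A') to 18 (strongest preference
        for 'B'), so improving 'A' should push the score down, and improving
        'B' should push it up."""
        delta = score_after - score_before
        if delta == 0:
            return "neither"
        moved_toward_a = delta < 0
        as_predicted = moved_toward_a if improved == "A" else not moved_toward_a
        return "predicted" if as_predicted else "unpredicted"

    # A subject at 7 (mild preference for 'A') whose score falls to 4 after 'A' improves:
    print(monotonicity_move(7, 4, improved="A"))   # -> "predicted"
    # The same subject drifting to 11 would count against event-wise monotonicity:
    print(monotonicity_move(7, 11, improved="A"))  # -> "unpredicted"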

But even so, it should make economists uneasy that ceteris paribus improvements to the expected value of one gamble of the order of A$2–A$6 should result in a choice switch away from that gamble some 4 percent of the time. [9] It suggests that the haziness and uncertainty surrounding our preferences may be deeper than has hitherto been supposed.

[9] All expected values were under A$20.

4. Discussion and conclusion

4.1. Relevant literature

As in this paper, similarity theory (Leland, 1994) applies only when the utilities of the choice options are close; for other choices Expected Utility theory is assumed to hold. When the lotteries are similar, preferences are said to be unclear and a simple series of comparisons is made in a particular order to produce a choice. Choice reversals are predicted as a by-product of gaps in the choice rule. Similarity theory assumes individuals apply a lexicographic semi-order, which is a limit case of the compensatory additive-difference choice rule (see Suppes et al., 1989). Being lexicographic (and therefore non-compensatory), it involves fewer cognitive costs than the general case of the additive-difference model, while still generating some of the predictions of Generalised Expected Utility theories based on that model, such as regret theory. But there are weaknesses; measuring the similarity of choice objects is not always easy. Buschena and Zilberman (1996) argue that it is unclear how to apply the theory to lotteries with multiple non-zero outcomes. Also, its inability to confront trade-offs prevents it from making any prediction other than an arbitrary choice for some choice pairs, when arbitrariness may not otherwise seem to be required. Similarity theory’s predictions when the presentational displays are altered have also not been formalised, as the theory is defined on lotteries. But similarity theory has received some experimental support. It is probable that similarity has a role to play in explaining the clarity of a subject’s preference: the more similar the lotteries are perceived to be, the less likely is a confident choice, and the more likely is the risky gamble to be selected.


But the correspondence is far from perfect; for example, the expected value rule can produce a clear preference even for very similar lotteries. Also, Butler and Loomes (1988) found that increasing the variance of one lottery in a pair-wise choice problem led to less confident choices, despite the options becoming less similar. Although no complete list of items likely to impact on preference clarity is offered here, factors such as the displays used to present the lotteries and their presentation within a display, the variance of the riskier lottery and the magnitude of differences in expected value are likely candidates. This is a task for future research. Butler and Loomes also report evidence of extensive haziness of preference, even for simple, well-defined choice problems. They suggest that individuals do not have cognitively cheap access to a clear preference ordering, even over relatively straightforward lotteries. Loomes (1998) presents further evidence that subjects’ responses are constructed, rather than being a reflection of well-defined underlying preferences.

Loomes (1988) provides evidence that alternative procedures for eliciting preferences lead to different certainty-equivalent valuations of risky options. When asked to state their certainty equivalents for various simple lotteries, most individuals employed very coarse-grained rounding for those values. He found that one method of eliciting preferences, the ‘iterative choice and valuation’ method, led to notably less rounding of certainty equivalents, which in the current paper would imply relatively small just-noticeable differences, a clearer view of one’s underlying disposition and hence fewer choice reversals. As heterogeneity in the extent of rounding used by subjects was also observed, it may be that different individuals will exhibit Expected Utility and Generalised Expected Utility patterns and choice reversal rates to different degrees.

I make the normative presumption that, for a given expenditure of cognitive resources, the occurrence of choice reversals should be minimised. The more often the option with the higher utility can be recognised and selected, the greater will be total utility. This is equivalent to prescribing the use of displays and elicitation methods that prompt the use of the most suitable choice rules to improve the clarity of the preferences. Payne et al. (1993) made a related argument for choice in general, using a broader definition of accuracy: “. . . better decisions can be encouraged by designing displays that passively encourage more accurate strategies by making them easier to execute”.

Both Loomes et al. (1997) and Ballinger and Wilcox (1997) assume a family of stochastic models, each incorporating a core Generalised Expected Utility theory, which they describe as the deterministic special case as the stochastic element reduces to zero. Thus, regret theory could be a core theory just as easily as could Expected Utility theory. Choice reversals are then measured against these underlying core preference structures. Loomes et al. divide the explanations of choice reversals in the literature into three parts, depending on the stage at which the randomness is introduced: a preference-selection stage, a computation stage, and an action stage. They suggest viewing the Harless and Camerer (1994) model as one of ‘white noise’ located in the action stage, which we have here argued plays a limited role in explaining the extent and distribution of choice reversals. They then argue that Hey and Orme (1994) assume the randomness occurs at the computation stage, where individuals are liable to processing errors.


Loomes et al. prefer a random preference model, in which individuals possess a stochastic utility function rather than one single true utility function. That is, they locate the error in the initial preference-selection stage; once a utility function has been selected, there is no further error. Although the Harless and Camerer model can be appended to the other models, the remaining theories are more clearly rivals to each other. By contrast, I have not argued that our ‘underlying disposition’ (utility function) is random, but that there is no uniquely defined utility function within a just-noticeable difference. It is the incompleteness of a stable utility function that leads to attempts at preference construction, and to any remaining choice errors. The results of the present experiment offer support to this explanation of errors. Encouragingly, event-splitting effects and regret effects also seem to be disproportionately concentrated in lottery pairs and displays where subjects are less sure of their preference. Although not all of the latter results were statistically significant, a larger sample size might have made them so. These findings suggest that the Generalised Expected Utility theories’ claims to represent core preference structures are questionable, making the theoretical results in Butler (1998) more pertinent.

5. Conclusions

The experimental work in this paper is largely exploratory. Future research could follow up a variety of the results reported herein to check for their robustness and clarify a number of remaining issues. Would use of other displays, such as the matrix display, have led to different results? Just how different do the expected values of gambles have to be before violations of monotonicity drop to zero, or are they explained by a white-noise term? To what extent would a ‘similarity’ indicator correlate with a ‘strength of preference’ indicator? What exactly are the correlates of preference strength?

If the approach taken in this paper is correct, decision theories can still be based on assumptions about rationality, but the definition of rationality should reflect:
1. the paucity of available information;
2. the often conflicting nature of motivations, which can promote the use of non-compensatory rules;
3. the cognitive limitations of the choosing individual;
4. the extent of the benefits expected from greater cognitive effort.

It is when these factors matter that the theory of behaviour suggested by the von Neumann and Morgenstern and Savage axiomatic systems falters. Important implications for industries such as advertising and marketing concern the fraction of our choices that are manipulable by choice of presentation, and to what extent. For example, a recent study by North et al. (1997) showed that relative sales of German and French wine in a British supermarket were drastically affected according to whether Parisian accordion music or German bierkeller pieces were being played on a tape deck near the drinks display. If consumer preferences are not invariant even under such trivial manipulations of the choice context, what are the implications for the economist’s theories of the rational consumer? We need to find new models of choice to resolve these anomalies.


A tentative conclusion is that the prescriptive role of decision theory should be limited to assistance in the choice of presentational formats for particular problems. This is because, given incomplete preferences, Generalised Expected Utility choice patterns can be justified insofar as they assist in narrowing the set of arbitrary choices. If our preferences can be made clearer through the choice of appropriate displays and elicitation methods, then non-expected utility choice patterns and choice reversals will be reduced. Research from evolutionary psychology (e.g. Cosmides and Tooby, 1992) proposes that our brains evolved numerous modules for dealing with problems common in the ancestral environment. Their work shows that subjects can cope well with decision problems expressed in a manner conducive to those modules, while failing badly on logically equivalent problems expressed in a different way. These findings have hinted at a possible future discipline of ‘cognitive engineering’, which would seek to improve human decision-making by expressing problems in a manner conducive to our evolved capacities. The findings of this paper fit nicely with such a strategy.

Acknowledgements

Much of this paper was written while visiting the University of Arizona. Thanks to Paul Miller, Peter Walley, Robert Sugden, Chris Starmer, James Cox, John Hey, Graham Loomes, Simon Grant, the Editor and a very helpful referee for their comments in the preparation of this paper. Thanks also to Jason Wood for writing the computer program.

References

Aumann, R.J., 1962. Utility theory without the completeness axiom. Econometrica 30, 445–462.
Ballinger, T.P., Wilcox, N.T., 1997. Decisions, error and heterogeneity. Economic Journal 107, 1090–1105.
Becker, G.M., DeGroot, M.H., Marschak, J., 1964. Measuring utility by a single-response sequential method. Behavioral Science 9, 226–232.
Buschena, D., Zilberman, D., 1995. Performance of the similarity hypothesis relative to existing models of risky choice. Journal of Risk and Uncertainty 11, 233–262.
Buschena, D., Zilberman, D., 1996. Predictive Value of Incentives, Decision Difficulty, and Expected Utility Theory for Risky Choices. Mimeo.
Butler, D.J., 1998. A choice-rule formulation of intransitive utility theory. Economics Letters 59, 323–329.
Butler, D.J., Loomes, G.C., 1988. Decision difficulty and imprecise preferences. Acta Psychologica 68, 183–196.
Camerer, C., 1989. An experimental test of several generalised utility theories. Journal of Risk and Uncertainty 2, 61–104.
Camerer, C., 1995. Individual decision-making. In: Kagel, J., Roth, A.E. (Eds.), Handbook of Experimental Economics. Princeton University Press, Princeton, pp. 587–703.
Cosmides, L., Tooby, J., 1992. Cognitive adaptations for social exchange. In: Barkow, J., Cosmides, L., Tooby, J. (Eds.), The Adapted Mind. Oxford University Press, New York, pp. 163–229.
Fechner, G., 1966. Elements of Psychophysics. Howes, D., Boring, E. (Eds.). Holt, Rinehart and Winston, New York.
Friedman, D., 1989. The S-shaped value function as a constrained optimum. American Economic Review 79, 1243–1248.
Georgescu-Roegen, N., 1966. Analytical Economics. Harvard University Press, Cambridge.
Harless, D., 1992. Actions versus prospects: the effect of problem representation on regret. American Economic Review 82, 634–649.


Harless, D., Camerer, C., 1994. The predictive utility of generalised expected utility theories. Econometrica 62, 1251–1290.
Hey, J.D., Orme, C.D., 1994. Investigating generalisations of expected utility theory using experimental data. Econometrica 62, 1291–1326.
Humphrey, S., 1995. Regret aversion or event-splitting effects? More evidence under risk and uncertainty. Journal of Risk and Uncertainty 11, 263–274.
Jarvenpaa, S., 1989. The effect of task demands and graphical format on information processing strategies. Management Science 35, 285–303.
Keller, L.R., 1985. The effects of problem representation on the sure-thing and substitution principles. Management Science 31, 738–751.
Leland, J., 1994. Generalized similarity judgments: an alternative explanation for choice anomalies. Journal of Risk and Uncertainty 9, 151–172.
Loomes, G.C., 1988. Different experimental procedures for obtaining valuations of risky actions: implications for utility theory. Theory and Decision 25, 1–23.
Loomes, G.C., 1998. Probabilities vs. money: a test of some fundamental assumptions about rational decision making. Economic Journal 108, 477–489.
Loomes, G.C., Moffatt, P., Sugden, R., 1997. A Microeconometric Test of Expected Utility Theory. Mimeo.
Loomes, G.C., Sugden, R., 1987. Some implications of a more general form of regret theory. Journal of Economic Theory 41, 270–287.
Machina, M., 1989. Dynamic consistency and non-expected utility models of choice under uncertainty. Journal of Economic Literature 27, 1622–1668.
North, A., Hargreaves, D.J., McKendrick, J., 1997. In-store music affects product choice. Nature 390, 132.
Payne, J., Bettman, J., Johnson, E., 1992. Behavioral decision research: a constructive processing perspective. Annual Review of Psychology 43, 87–131.
Payne, J., Bettman, J., Johnson, E., 1993. The Adaptive Decision-Maker. Cambridge University Press, Cambridge.
Savage, L.J., 1954. The Foundations of Statistics. Wiley, New York.
Schick, F., 1984. Having Reasons. Princeton University Press, Princeton.
Starmer, C., Sugden, R., 1991. Does the random lottery incentive system elicit true preferences? An experimental investigation. American Economic Review 81, 971–978.
Starmer, C., Sugden, R., 1993. Testing for juxtaposition and event-splitting effects. Journal of Risk and Uncertainty 6, 235–254.
Sugden, R., 1995. Alternatives to Expected Utility Theory. Mimeo.
Suppes, P., Krantz, D., Luce, R., Tversky, A., 1989. Foundations of Measurement, Vol. II. Academic Press, San Diego.
Walley, P., 1991. Statistical Reasoning with Imprecise Probabilities. Chapman & Hall, London.