Journal of Health Economics 25 (2006) 803–820
Further evidence of preference reversals: Choice, valuation and ranking over distributions of life expectancy
Adam Oliver ∗
LSE Health and Social Care, Department of Social Policy, London School of Economics and Political Science, Houghton Street, London WC2A 2AE, UK
Received 10 August 2004; received in revised form 22 August 2005; accepted 23 August 2005
Available online 23 September 2005
∗ Tel.: +44 20 7955 6471/6840; fax: +44 20 7955 6803.
0167-6296/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jhealeco.2005.08.004
Abstract
Moving between choice and direct valuation procedures can cause substantial, systematic preference reversals, which is problematic because it leaves us unsure as to which procedure (if any) is ‘best’. This article reports a study that tests whether using a ranking procedure to infer values rather than more conventional direct valuation can negate this phenomenon. The results suggest that the ranking procedure can render strict preference reversals both less substantial and less systematic. This could have important implications for the valuation of health outcomes, and in attempting to ascertain whether choice, valuation or ranking best uncovers people’s preferences, similar tests in the context of more realistic health outcome descriptions are recommended.
© 2005 Elsevier B.V. All rights reserved.
JEL classification: C91; D81; I10
Keywords: Preference reversals; Choice; Valuation; Ranking; Life expectancy distributions
1. Introduction
The assumption of procedural invariance underlies standard economic theory. That is, it is assumed that an individual’s preferences will be unaffected by the preference elicitation procedure used. If preferences over goods could be altered simply by changing the elicitation
procedure, we would face a dilemma: which procedure (if any) most accurately elicits underlying preferences? This dilemma is not merely theoretical. For example, under the assumptions of expected utility theory, the probability equivalents and certainty equivalents versions of the standard gamble ought to elicit the same values for any particular health state, but it has been demonstrated in numerous studies that the former procedure generates significantly higher values than the latter (Hershey and Schoemaker, 1985; Pope, 2004). Moreover, over the past forty years, much evidence has been collected on a further violation of procedural invariance that is generally termed ‘preference reversal’ (for a review of the preference reversal literature, see Seidl, 2002). The most well known and frequently replicated form of preference reversal mirrors those uncovered by psychologists in the 1960s (Lindman, 1965; Slovic and Lichtenstein, 1968) and involves two bets, commonly termed the $-bet and the P-bet. The $-bet offers a modest probability of winning a relatively large amount, the P-bet offers a high probability of winning a modest amount, and the two bets have similar expected values. Respondents are asked to choose directly between the $-bet and the P-bet, and are also asked for their certainty equivalents of each of the two bets, usually by eliciting their selling prices. In a large number of studies a substantial percentage of respondents – sometimes as high as 70–80% – choose the P-bet over the $-bet, but value the $-bet higher than the P-bet (Seidl, 2002). As an example, consider the following bets, taken from Lichtenstein and Slovic (1971):
$-bet: ($16, 11/36; −$1.50, 25/36)
P-bet: ($4, 35/36; −$1, 1/36)
Here, the $-bet offers an 11/36 chance of winning $16 and a 25/36 chance of losing $1.50. The P-bet can be similarly read. Lichtenstein and Slovic (1971) undertook three tests of preference reversal within their study, and reported that between 51 and 83% of their respondents chose the P-bet but placed a higher value on the $-bet. It is important to emphasize that in addition to being substantial, preference reversals are generally found to be systematic. That is, preference reversals in the direction of P-bet ≻ $-bet and v($-bet) > v(P-bet), where ≻ denotes the preference relation ‘is preferred to in a choice task’ and v(.) denotes the monetary value of a bet, are far more common than those in the direction of $-bet ≻ P-bet and v(P-bet) > v($-bet).1 Systematic violations of theoretical propositions cannot adequately be attributed to random error. Several explanations for the observance of preference reversals have been suggested, but the explanation that probably carries the most weight in the research community at present is that which attributes most of the phenomenon to the use of different heuristics across elicitation procedures.2 That is, choice and valuation tasks may well be driven by different cognitive
1 For example, Lichtenstein and Slovic (1971) observed that between 6 and 27% of their respondents demonstrated preference reversals in this latter direction (which are now often referred to as ‘unpredicted’ – as opposed to the more commonly observed ‘predicted’ – preference reversals).
2 See Seidl (2002) for an extensive literature survey on preference reversals and the reasons why they may occur. Other than the ‘differential heuristic’ explanation, Seidl notes that it is sometimes claimed that the ‘endowment effect’ will often cause people to overstate their selling prices (or ‘willingness to accept’), and thus preference reversals may be a peculiar feature of selling price elicitation. However, Seidl cites a number of studies where significant predicted preference reversals occur when certainty equivalents are elicited via buying prices (or ‘willingness to pay’). A further explanation for preference reversals is that people possess an intransitive preference ordering, although Seidl cites evidence that demonstrates that genuine intransitive preferences are likely to explain only 10–20% of the phenomenon (see also Tversky et al., 1990).
processes (or ‘rules of thumb’); as first noted by Slovic and Lichtenstein (1968), choice tasks might encourage greater focus on the probability of winning – which favours the P-bet, whilst (monetary) valuation tasks may tend to focus attention on the (usually, money) payoffs – which favours the $-bet.3 More specifically, it has been proposed that a likely reason for why the $-bet tends to be valued higher than the P-bet is because, as a starting point when valuing the $-bet, people often ‘anchor’ on its best outcome, but then fail to adjust the overall value of this bet downwards sufficiently to take account of its other attributes4 (e.g., Bateman et al., 2002; Lichtenstein and Slovic, 1971). The suggestion is that this consequently causes an ‘overpricing’ or ‘overvaluation’ of the $-bet. By accepting the plausibility of the heuristic explanation for preference reversals, Bateman et al. (2002) hypothesised that the use of a ranking procedure to obtain respondents’ valuations for bets might lessen this phenomenon (although given that 10–20% of preference reversals are seemingly the result of intransitivity (Tversky et al., 1990), it is perhaps not reasonable to expect ranking to eliminate them entirely). Valuations can be obtained through a ranking exercise by asking respondents to rank a bet alongside a number of ‘sure amounts’. The value of the bet is then inferred from the values of the two sure amounts that it is placed between. The idea is that by engaging the respondents in a task where they are encouraged to consider explicitly a broad range of sure amounts when considering their value for a $-bet, they will be more likely to adjust downwards their anchoring on the best outcome in the bet to take into account its other attributes. Bateman et al. (2002) tested their hypothesis with four sets of differentially constructed $-bets and P-bets. 
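To make the inference step concrete, the following minimal Python sketch shows one way a value could be read off a completed ranking. It assumes the midpoint rule that this article adopts for its own ranking exercise (Section 2.2). The function name, the data layout and the sure amounts in the example are hypothetical; they are not taken from Bateman et al. (2002) or from the questionnaire used here.

```python
from typing import Optional, Sequence, Union

Item = Union[float, str]  # a sure amount (number) or the label of a risky option


def inferred_value(ranking: Sequence[Item], bet_label: str) -> Optional[float]:
    """Infer a bet's value from a ranking ordered from worst to best.

    Returns the midpoint of the nearest sure amounts ranked below and
    above the bet, or None if there is no sure amount on one side
    (the bet's value then cannot be bracketed).
    """
    position = ranking.index(bet_label)
    below = [x for x in ranking[:position] if isinstance(x, (int, float))]
    above = [x for x in ranking[position + 1:] if isinstance(x, (int, float))]
    if not below or not above:
        return None
    return (below[-1] + above[0]) / 2


# Hypothetical ranking (worst to best); the sure amounts are illustrative only.
example = [62, 64, "other risky option", "$-bet", 66, 68, 70]
value = inferred_value(example, "$-bet")  # midpoint of 64 and 66
```

Other risky options in the list are skipped when looking for the bracketing sure amounts, so only the two nearest ‘certainties’ determine the inferred value.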
They discovered that whilst the ranking procedure (compared to the conventional valuation procedure), when coupled with a choice task, generated preference reversals that were noticeably less substantial and systematic in two of their tests, in the other two tests the anomaly remained almost as substantial and at least as systematic. Bateman et al. tentatively attributed their complex pattern of observed preference reversals to the range-frequency effect (Parducci and Weddell, 1986); that is, the effect that people will often rank an option higher when it is compared to a lot of worse options than when it is compared to a lot of better options. Specifically, Bateman et al. asked their respondents to rank each $-bet and each P-bet against ten sure amounts and nine other risky options, but the set of risky options differed across each ranking exercise. Consequently, some $-bets were ranked with a set that contained a lot of better risky options, and some were ranked against a lot of worse risky options; similarly for the P-bets. Bateman et al. report that in those tests where preference reversals were made less substantial by ranking, the $-bets were ranked with many better outcomes and the P-bets were ranked with many worse outcomes, suggesting that if the range-frequency effect was at work, the value of the $-bet would be depressed and the value of the P-bet would be enhanced, which is indeed likely to reduce the number of preference reversals. Conversely, in the tests where preference reversals were largely unaffected by ranking, the relative rank of the $-bets was the same or higher than the relative rank of the P-bets, implying that the range-frequency effect would not be working to depress the value of the $-bet relative to the value of the P-bet, offering a plausible explanation for the persistent substantial and systematic preference reversals in these ranking exercises. 3 “In general, choices . . . 
tend to reveal preferences for gambles in which the more favourable outcome is most likely. Conversely, selling prices tend to reveal a preference for the gamble for which the more favourable outcome has the largest value” (Lindman, 1971, p. 396; quoted in Seidl, 2002, p. 623). 4 For example, the large chance of the poor outcome.
To summarise, in economic theory it is assumed that choice, valuation and ranking tasks will generate consistent preferences. Although, strictly speaking, the different tasks are all forms of choice, they may engender different mindsets depending on whether a pairwise (or multiwise) choice is given (as in choice and ranking exercises) or has to be constructed (as in valuation exercises). The main objective of the study reported in this article is to test whether ranking can reduce preference reversals in circumstances where the $-bet and the P-bet are both ranked against a number of almost identical options. Thus, the possible confounding influence of introducing a large set of different ‘risky’ options in each ranking exercise – and thus the potential for the range-frequency effect to ‘bias’ differentially the results across the ranking exercises – is largely absent, although the possibility remains of a ‘steady’ bias across the ranking exercises, due to the respondents being guided by the options in each ranking choice set. Explicitly, the theoretical core of this study is the assumption that people do have expected utilities for outcomes that can be expressed with a good deal of precision through their preferences (as is assumed under expected utility theory and generalised theories of rational choice including cumulative prospect theory), but that systematic preference reversals occur if people make mistakes when asked questions in particular ways.5 The study also has two other relatively ‘innovative’ features. 
The first relates to the fact that preference reversals in the context of health are largely unexplored, which is an important oversight, particularly because valuations in health and health care often employ methodologies based on different (and possibly systematically incompatible) elicitation procedures; for example, choice with respect to discrete choice experiments (Ryan and Gerard, 2002), and valuation with respect to contingent valuation (Donaldson and Shackley, 2002). If choice tasks do indeed encourage a greater focus on probabilities it is possible that in a discrete choice question, a relatively safe but unspectacular treatment will be preferred over a riskier treatment that, if successful, pays great dividends; but if valuation tasks focus attention away from probabilities and towards the best attainable outcome it is possible that this ‘preference’ will be reversed in a willingness to pay study. This study is a rare attempt at uncovering preference reversals in a health-related context (where ‘health’ is measured by life-expectancy).6 Second, although preference reversals are most often observed over risky ‘personal’ bets, there is some evidence that they also occur when using similarly constructed options over income distributions (Amiel et al., 2003). This study, by using different distributions of life expectancy, supplements the literature that aims to place the study of preference reversals in a wider ‘social’ context. 5 See Lopes (1991) for a view on how such mistakes may have been deliberately induced and reported in the literature to give a misleading perception on the poor state of human judgement abilities. 6 As noted earlier, the certainty equivalents version of the standard gamble is a valuation task where people often give extremely low values to health states, at least compared to the probability equivalents version of the standard gamble (Hershey and Schoemaker, 1985; Pope, 2004). 
I am grateful to one of the referees for pointing out to me that, at first sight, this seems to undermine the claim that valuation tasks will tend to lead to overvaluations. However, it may be that the probability version of the standard gamble encourages people to state very high values, not least due to their desire to avoid accepting a significant probability of death. Moreover, to replicate specifically the $-bet in the preference reversal literature, the probability attached to full health in the certainty equivalents standard gamble would have to be low, and the probability attached to death would have to be high. An experiment that elicits values with this particular construct of the standard gamble might prove interesting, as it may show that people do not adjust their valuations downwards sufficiently to account for the high probability of death, although it is perhaps worth keeping in mind that the certainty equivalents form of the standard gamble is rarely used in practical health economics-related circumstances. Nonetheless, as Bateman et al. (2002) observed, we certainly cannot rule out the possibility that the propensity for overvaluation and undervaluation is highly sensitive to the elements in the choice set.
2. Methods
2.1. Sample
Thirty-six respondents, recruited from the Department of Social Policy at the London School of Economics, participated in the study. Each respondent was paid £6 for attending one face-to-face interview, all of which were conducted by the author. No specific sample selection criteria were applied, although any respondent who clearly did not understand what was being asked of them would have been excluded from the analysis. Following a practice session (detailed below), it was not felt necessary to exclude any of the respondents. Twenty-one respondents were postgraduate students, with the rest comprising staff educated up to postgraduate level. Twenty-one respondents were female, 23 were aged 18–30 years (with 11 aged 31–45 years, 1 aged 46–60 years and 1 aged >60 years), 30 had a social science background (with 3 having a science background and 3 having an ‘other’ background), and 18 stated that they were familiar with decision theory. The respondents represented a mix of different nationalities.
2.2. Questionnaire
The respondents were presented with a total of eight questions, three of which were practice questions. They were required to answer the practice questions at the beginning of the interview to ensure that they understood their tasks before answering the questions in the main part of the questionnaire. The main questions did not replicate any of the practice questions. The respondents were encouraged to ask questions during the practice session, and were informed at the start that there are no right or wrong answers to any of the questions. All of the respondents appeared to understand the practice questions, and explicitly stated that this was indeed the case. After completing the practice questions, they were required to answer the five questions in the main part of the questionnaire without asking any questions so as to reduce the possibility of interviewer bias.
The order in which the five questions were presented was randomised across respondents. The respondents were not allowed to return to previous questions in order to revise their answers, because, given the quite small number of questions, allowing them to do so may have led some of the respondents to search for an ‘artificial’ level of consistency in their answers. The respondents were required to answer three types of question (hence the three practice questions), which took the form of choice, valuation and ranking. Examples of each type of question, replicated from the main part of the questionnaire, are respectively presented in Figs. 1–3. Fig. 1 replicates the choice question used in the main part of the questionnaire. As indicated in Section 1, the questions in this study are framed in terms of the distribution of health, where ‘health’ is defined by life expectancy. Fig. 1 represents a straight choice between living in country A, which has a modest probability of a high life expectancy (the $-bet), and country B, which has a high probability of a modest life expectancy (the P-bet).7 The respondents were all educated, and thus if they took their own position into account, it can reasonably be surmised that most would think that they would have a good chance of being in the high life expectancy groups. They were all therefore asked to imagine that their real-life socio-economic position was of no relevance to 7 A life expectancy is of course a mean of a distribution, and the respondents’ perceptions of what the distribution around the mean was could have posed problems. However, it was hoped that the respondents would interpret a life expectancy of x years as meaning that everyone knows that they will die at x years of age. From listening to the respondents in the practice questions, we can be confident that this is what they did think when answering the questions.
Fig. 1. The choice question.
the life expectancy opportunities offered in each question (i.e., they were placed behind a veil of ignorance, insofar as this is possible). In the practice choice question, the respondents were informed that they should assume that they would be born in the country that they chose, with the implication being that each stated percentage gives a good indication of the chance that the respondent would have the associated life expectancy. As noted in Section 1, with all of the questions in this study the intention was for the respondents to place themselves in a ‘social’ context. However, within this social context of life expectancy distributions, where respondents are implicitly being asked to consider implications for others as well as for themselves, there is a possibility that the respondents could have answered from the viewpoint of their own personal welfare, or from the viewpoint of a social planner who takes into account the general welfare of the country. Indeed, the explanations that the respondents gave for their answers (on which more later)
Fig. 2. A valuation question.
Fig. 3. A ranking question.
suggest that some adopted a personal perspective, whilst others adopted a planner perspective. Different perspectives may influence the results; for example, risk averse respondents may not necessarily be inequality averse, suggesting that whilst they may choose the P-bet from a personal perspective, they might choose the $-bet from a societal perspective. The varying perspectives are, in hindsight, a caveat of the study, and it may be advisable if efforts are undertaken in future studies to ensure that all respondents adopt a single perspective. By indicating that they have no preference between the two countries, the respondents were allowed to express indifference in the choice question. Fig. 2 replicates the question that was designed to elicit the respondents’ ‘direct’ valuation for the $-bet (and hence is meant to reflect the conventional valuation procedure generally used in the preference reversal literature). The valuation question aims to elicit the life expectancy required by the respondents for them to be (close to) indifferent between this ‘certain’ life expectancy and
the distribution in country A.8 Consequently, the respondents’ answers to the valuation question approximate their ‘certainty equivalents’ for the unequal distribution of life expectancy in country A. The valuation for the P-bet given in Fig. 1 was similarly elicited. The question intended to value Fig. 1’s $-bet through a ranking exercise is replicated in Fig. 3.9 The respondents were asked to consider the distribution of life expectancy in eight countries (Countries A–H), one of which (country D) is the $-bet and five of which represent a situation where everyone in the country has the same life expectancy. Each respondent was asked to place all of the countries on a straight line from what they considered to be the worst country to the best country, and the value of the $-bet was taken as the mid-point of the two ‘certainties’ it was placed between. The P-bet used in Fig. 1 was similarly valued, with minor exceptions; although the P-bet ranking exercise contained the same ‘certain’ options as the $-bet ranking exercise, the two additional ‘uncertain’ options differed slightly. In Fig. 3, country B dominates country D (the $-bet) and was therefore used in a simple test of dominance, and country H has a slightly narrower range of outcomes than country D, and was included to try to stimulate the respondents to think about their answers. The P-bet ranking exercise contained a similarly constructed ‘dominated’ (rather than ‘dominating’) option and a similar ‘stimulant’ option, but because the P-bet differs from the $-bet, these additional uncertain options necessarily had to differ from those in Fig. 3. Nonetheless, as noted in Section 1, the $-bet and P-bet are both ranked against a number of almost identical options, and neither bet is ranked against a large number of better or worse uncertain options. Thus, compared to the study by Bateman et al. 
(2002), the possibility for the range-frequency effect to have a differential confounding influence across the two ranking exercises, although still plausible, was probably minimal. In the choice question and the conventional valuation questions, the respondents were also required to write down briefly their reasons for their answers. This was done both in order to try to obtain some qualitative understanding of any conventional predicted preference reversals, and in the hope of further raising the respondents’ level of ‘engagement’ with the questions. Written explanations were not elicited in the ranking questions, as it was felt that simply answering these particular questions would extend the respondents to the limits of their patience, and thus attempts at eliciting explanations in these questions might be detrimental to the experiment as a whole.
2.3. Tests
The three practice and five main questions are summarised in Table 1. As stated above, the practice questions were used to familiarise the respondents with the three different types of task, and are summarised in Table 1 purely for reference purposes. The
8 ‘Minimum life expectancies’ are used to approximate certainty equivalents because it was felt that the respondents would find these easier to conceptualise than ‘indifference life expectancies’. The implication of using these approximations is that the elicited certainty equivalents for both the $-bet and the P-bet described later in this article will be marginally above the ‘true’ certainty equivalents, but since this (marginally) affects both bets, it is unlikely to have any effect on the main findings of the study.
9 Some may take the view that the ordering of the various options in the ranking exercises is unnecessarily unnatural, and may have induced the respondents into giving ‘incorrect’ answers if they answered too quickly or are inept at reorganising the list in a more natural way (cf. Lopes, 1991).
However, the scenarios were presented in this way so as to encourage the respondents to spend more time and thought on their answers. By making the ranking exercise slightly more challenging, it was hoped that the respondents might deliberate more than they would otherwise have done, and as a consequence offer answers that are better in tune with their underlying preferences.
Table 1 Summary of questions
i The options are listed in the order in which they were presented to the respondents.
ii $-bet.
iii P-bet.
iv Dominating option.
v Dominated option.
$-bet and the P-bet in the main questions are the focus for our attention, and took the form:
$-bet: (64 years, 70%; 84 years, 30%)
P-bet: (65 years, 3%; 70 years, 97%)
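As a quick arithmetic check on these two distributions, the short Python sketch below computes the mean life expectancy of each country, treating the stated population percentages as probabilities. The function and variable names are illustrative only, and exact fractions are used simply to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_value(bet):
    """Return the probability-weighted mean of a bet.

    bet is a list of (outcome, probability) pairs whose
    probabilities must sum to exactly 1.
    """
    if sum(p for _, p in bet) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(Fraction(outcome) * p for outcome, p in bet)


# Life expectancy (years) and share of the population, as given in the text.
dollar_bet = [(64, Fraction(70, 100)), (84, Fraction(30, 100))]
p_bet = [(65, Fraction(3, 100)), (70, Fraction(97, 100))]

ev_dollar = expected_value(dollar_bet)  # 70 years
ev_p = expected_value(p_bet)            # 1397/20 = 69.85 years
```

On this reading the two means differ by 0.15 years, which is the sense in which the two options have almost identical expected values.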
These options were designed to approximate the construct of the bets used by Lichtenstein and Slovic (1971), summarised in Section 1.10 That is, it was assumed that the respondents would at
10 i.e., $-bet = ($16, 11/36; −$1.50, 25/36) and P-bet = ($4, 35/36; −$1, 1/36).
face value take the ‘average’ life expectancy in each of these two countries to be about 68–69 years, and therefore the $-bet involves a reasonably large chance of a ‘loss’ and a moderate chance of a large ‘gain’, and the P-bet involves a much smaller chance of a slightly smaller loss and a larger chance of a more modest gain. The ‘expected’ life expectancies of the $-bet and the P-bet, at 70 years and 69.85 years respectively, are almost identical. With reference to the main questions summarised in Table 1, the answers to questions 1–3 offer a conventional test of preference reversal. If a respondent chooses A (the $-bet) in question 1, standard economic theory would require his/her certainty equivalent to be higher in question 2 than in question 3, and vice-versa if s/he chooses B (the P-bet) in question 1. Predicted preference reversal (at least in its ‘strict’ form, where the respondent demonstrates a definite preference for the $-bet over the P-bet or vice-versa) would require the respondent to choose B in question 1 but offer a certainty equivalent that is higher in question 2 than in question 3, and unpredicted preference reversal would require him/her to choose A in question 1 but offer a certainty equivalent that is higher in question 3 than in question 2. It was hypothesised that by using this conventional valuation approach there would be substantial preference reversals, and that they would be systematically in the direction of the predicted pattern. The answers to questions 1, 4 and 5 offer a test of preference reversal with use of a ranking procedure. Similar to the conventional procedure, if a respondent chooses A in question 1, standard economic theory would require his/her inferred value for the $-bet in question 4 to be higher than his/her inferred value for the P-bet in question 5, and vice-versa if s/he chooses B in question 1. 
Predicted preference reversal would require the respondent to choose B in question 1 but have a higher inferred value for the $-bet in question 4 than for the P-bet in question 5, and unpredicted preference reversal would require him/her to choose A in question 1 but have a higher inferred value for the P-bet in question 5 than for the $-bet in question 4. Since the main objective of the study is to test whether ranking can reduce preference reversals, it was hypothesised that there would be substantially fewer preference reversals compared to the conventional approach, and that the pattern of any observed preference reversals would not be as systematic.
3. Results
3.1. Quantitative results
As mentioned in Section 2.2, the two ranking questions each allow one test of dominance. Specifically, with reference to Table 1, dominance is violated if D ≻ B in question 4 or if C ≻ F in question 5. Dominance was violated by seven respondents (19%) in question 4, and by two respondents (6%) in question 5.11 However, it is not obvious that these violations are ‘irrational’. For example, a preference for D over B in question 4 could possibly be explained by a preference for a country where there is less of a sense that a relatively small proportion of the population are being ‘left too far behind’, an argument that could also be applied to explain a preference for C over F in question 5 (albeit to a lesser extent since the possible life expectancies in these latter two countries are relatively close together). This particular explanation is motivated by a concern about societal inequality, but the respondents may also plausibly violate dominance due
11 One of the violations in question 5 was a weak violation, in that the respondent expressed indifference between C and F.
to issues pertaining to personal risk. For example, in question 4, some respondents may anticipate quite severe disappointment (Loomes and Sugden, 1986) if in B they end up in the group with the lower life expectancy, but much less disappointment if this were to happen in D, because in D they had a higher expectation of ending up in the group with the lower life expectancy. These anticipated feelings, which fall outside the scope of expected utility theory, could in theory modify the expected utilities of B and D to the extent that D is preferred over B. In the ranking exercises the respondents were not asked for written explanations for their answers, and therefore it is not possible to deduce whether or not they reasoned in these ways. However, it is plausible that people may often ‘rationalise’ in ways that seem both reasonable, and that fall beyond the scope of conventional theories of rational choice. Even if these violations of dominance are entirely the result of respondent error, they would only cause a significant problem in terms of the interpretation of the results if they generated a notable shift from or to one particular preference pattern when comparing the conventional and ranking procedures (and particularly if they caused a large number of predicted preference reversals in the ranking procedure that were not evident in the conventional procedure, or vice-versa). This was not the case.12 For these reasons, the respondents who violated dominance are not excluded from the analysis.
A summary of the main results of the study is given in Table 2. In Table 2, strict preference patterns are those that indicate that the respondents had a definite preference for either the $-bet or the P-bet in the choice, valuation and ranking questions, and are denoted $$, PP, P$ and $P. There are a number of striking features with respect to these strict patterns of preference.
In terms of the preference patterns that conform with standard economic theory, the ranking procedure generated fewer responses where the respondents consistently preferred the $-bet than the conventional procedure, and more responses where they consistently preferred the P-bet. Moreover, and perhaps more importantly, compared to the conventional procedure (where strict predicted preference reversals are the modal preference pattern) the ranking procedure produced far fewer strict predicted preference reversals (preference pattern P$), and overall preference reversals were not as systematically unidirectional. Therefore, with respect to strict preference patterns, the hypothesis that ranking will generate fewer, less systematic preference reversals cannot be rejected. However, weak preference reversals, whilst being of roughly equal number in the conventional and ranking procedures, are more systematically in the direction of predicted reversals when using the ranking procedure. This is due to the greater number of respondents expressing preference pattern P= in the ranking procedure. The explanation for this observation is that a number of respondents who demonstrated strict predicted preference reversal pattern P$ in the conventional procedure gave a lower value for the $-bet in the ranking question compared to the valuation question, but the value was not low enough to remove the preference reversal completely (and hence generated more weak predicted preference reversals). This can be observed in the
12 For the seven respondents who violated dominance in question 4, it is possible to discern from the results that one respondent demonstrated a predicted preference reversal in the conventional procedure but not in the ranking procedure, and two respondents demonstrated a predicted preference reversal in the ranking procedure but not in the conventional procedure. It could not be discerned whether either of the two respondents who violated dominance in question 5 demonstrated such a pattern of preferences, because neither of these respondents gave answers that made it possible to detect whether they valued the $-bet higher than the P-bet (or vice-versa) in the ranking procedure (in cases where a respondent values both bets lower than 62 years or higher than 70 years in the ranking questions – respectively the lowest and highest certainty equivalents on offer in these questions – it is not possible to compare the respondent’s $-bet and P-bet values).
Table 2
Summary of results

Preference pattern [a] | Number of observations | Interpretation

Conventional procedure
$$ | 7 (19%) | Consistent with economic theory
PP | 7 (19%) | Consistent with economic theory
P$ | 13 (36%) | Predicted preference reversals
$P | 0 (0%) | Unpredicted preference reversals
I$ | 1 (3%) | Weak predicted reversals [b]
P= | 5 (14%) | Weak predicted reversals [c]
IP | 1 (3%) | Weak unpredicted reversals [d]
$= | 2 (6%) | Weak unpredicted reversals [e]
I= | 0 (0%) | Consistent with economic theory

Ranking procedure
$$ | 2 (6%) | Consistent with economic theory
PP | 10 (28%) | Consistent with economic theory
P$ | 5 (14%) | Predicted preference reversals
$P | 2 (6%) | Unpredicted preference reversals
I$ | 0 (0%) | Weak predicted reversals [b]
P= | 9 (25%) | Weak predicted reversals [c]
IP | 0 (0%) | Weak unpredicted reversals [d]
$= | 2 (6%) | Weak unpredicted reversals [e]
I= | 1 (3%) | Consistent with economic theory
Not enough information | 5 (14%) [f] |

Notes:
[a] ‘$$’ indicates that the $-bet is chosen in the choice question, and is also valued higher in the valuation or ranking questions, whereas ‘$P’ indicates that the $-bet is chosen in the choice question, but the P-bet is valued higher in the valuation or ranking questions. All other preference patterns can be read similarly (‘I’ denotes indifference in the choice question and ‘=’ denotes the elicitation of equal certainty equivalents or inferred values in the valuation or ranking questions).
[b] The answer to the choice question indicates indifference, but the answers to the valuation or ranking questions are consistent with predicted preference reversals. This preference pattern is therefore classified as a weak predicted preference reversal.
[c] The answer to the choice question is consistent with predicted preference reversals, but the answers to the valuation or ranking questions indicate that an equal value is placed on the $-bet and the P-bet. This preference pattern is therefore classified as a weak predicted preference reversal.
[d] The answer to the choice question indicates indifference, but the answers to the valuation or ranking questions are consistent with unpredicted preference reversals. This preference pattern is therefore classified as a weak unpredicted preference reversal.
[e] The answer to the choice question is consistent with unpredicted preference reversals, but the answers to the valuation or ranking questions indicate that an equal value is placed on the $-bet and the P-bet. This preference pattern is therefore classified as a weak unpredicted preference reversal.
[f] If a respondent values both the $-bet and the P-bet lower than the lowest ‘certain’ option or higher than the highest ‘certain’ option in the ranking exercise, there is insufficient information to determine whether s/he values the $-bet higher than the P-bet, the P-bet higher than the $-bet, or both bets equally. In such circumstances it is not possible to tell whether the respondent has reversed his or her preferences.
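The classification scheme set out in Table 2 and its notes can be sketched in code. The following is a minimal illustration rather than the procedure actually used in the study: the function names `compare_values` and `classify_pattern`, the single-character labels, and the representation of out-of-range ranking answers as the strings `"<62"` and `">70"` are all assumptions made for the example. Only the pattern codes and their interpretations are taken from Table 2.

```python
from typing import Optional, Tuple, Union

# A value is a number of years, or one of the censored markers "<62" / ">70"
# (assumed notation for ranking answers outside the range of sure options).
Value = Union[int, float, str]

# Pattern codes and interpretations as listed in Table 2.
INTERPRETATION = {
    "$$": "Consistent with economic theory",
    "PP": "Consistent with economic theory",
    "P$": "Predicted preference reversal",
    "$P": "Unpredicted preference reversal",
    "I$": "Weak predicted reversal",
    "P=": "Weak predicted reversal",
    "IP": "Weak unpredicted reversal",
    "$=": "Weak unpredicted reversal",
    "I=": "Consistent with economic theory",
    "NEI": "Not enough information",
}


def _bounds(value: Value) -> Tuple[float, float]:
    """Return the (lower, upper) bounds implied by an exact or censored value."""
    if value == "<62":
        return float("-inf"), 62.0
    if value == ">70":
        return 70.0, float("inf")
    return float(value), float(value)


def compare_values(v_dollar: Value, v_p: Value) -> Optional[str]:
    """Return '$', 'P' or '=' for the higher-valued bet, or None if undetermined."""
    lo_d, hi_d = _bounds(v_dollar)
    lo_p, hi_p = _bounds(v_p)
    if lo_d == hi_d and lo_p == hi_p:  # both values are exact
        if lo_d > lo_p:
            return "$"
        if lo_d < lo_p:
            return "P"
        return "="
    if lo_d >= hi_p:  # the $-bet lies strictly above the P-bet
        return "$"
    if hi_d <= lo_p:  # the $-bet lies strictly below the P-bet
        return "P"
    return None  # both censored on the same side: cannot be compared


def classify_pattern(choice: str, v_dollar: Value, v_p: Value) -> str:
    """Combine a choice ('$', 'P' or 'I') with a value comparison into a Table 2 code."""
    if choice not in ("$", "P", "I"):
        raise ValueError("choice must be '$', 'P' or 'I'")
    comparison = compare_values(v_dollar, v_p)
    return "NEI" if comparison is None else choice + comparison


# Respondent 1 in Table 3: chose the $-bet; valuation (70, 69); ranking (68, 69).
print(classify_pattern("$", 70, 69))  # $$
print(classify_pattern("$", 68, 69))  # $P
```

Treating the censored answers as open intervals means that a pair such as (>70, 69) is still classified, whereas (>70, >70) falls into the "not enough information" category described in note f.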
breakdown of individual data presented in Table 3, where it is shown that five of the 13 respondents who demonstrated strict predicted preference reversals in the conventional procedure offered weak predicted preference reversals in the ranking procedure (with five and three of the remaining eight respectively showing no preference reversals and strict predicted preference reversals in the ranking procedure). The implication is that the ranking exercise may be partially successful at
Table 3
Breakdown of results for each respondent

Respondent | Choice question | Valuation questions | Ranking questions
1 | $-bet | $-bet (70, 69) | P-bet (68, 69)
2* | P-bet | $-bet (72, 68) | Ind [a] (67, 67)
3 | Ind | $-bet (92, 80) | NEI [b] (>70, >70)
4* | P-bet | $-bet (69, 67) | Ind (67, 67)
5* | P-bet | $-bet (75, 70) | $-bet (69, 67)
6* | P-bet | $-bet (72, 69) | Ind (69, 69)
7 | P-bet | Ind (70, 70) | Ind (69, 69)
8 | $-bet | Ind (70, 70) | Ind (69, 69)
9 | P-bet | Ind (70, 70) | $-bet (>70, 69)
10 | P-bet | P-bet (68, 69) | Ind (69, 69)
11 | $-bet | $-bet (74, 70) | NEI (>70, >70)
12 | P-bet | P-bet (64, 69) | Ind (67, 67)
13 | P-bet | P-bet (60, 65) | Ind (65, 65)
14* | P-bet | $-bet (72, 70) | $-bet (>70, 69)
15 | P-bet | Ind (70, 70) | $-bet (>70, 69)
16 | P-bet | Ind (70, 70) | P-bet (65, 69)
17* | P-bet | $-bet (73, 66) | Ind (65, 65)
18* | P-bet | $-bet (70, 69.5) | P-bet (65, 69)
19 | Ind | P-bet (70, 72) | Ind (69, 69)
20 | $-bet | $-bet (75, 68) | NEI (>70, >70)
21 | $-bet | Ind (70, 70) | Ind (69, 69)
22 | $-bet | $-bet (80, 70) | NEI (>70, >70)
23 | $-bet | $-bet (70, 68) | P-bet (65, 67)
24 | $-bet | $-bet (84, 70) | $-bet (>70, 69)
25* | P-bet | $-bet (74, 70) | P-bet (67, 69)
26* | P-bet | $-bet (80, 75) | P-bet (67, >70)
27* | P-bet | $-bet (70, 69) | Ind (69, 69)
28 | $-bet | $-bet (71, 70) | $-bet (>70, 69)
29* | P-bet | $-bet (70, 69) | P-bet (67, 69)
30 | P-bet | Ind (70, 70) | P-bet (65, 69)
31 | P-bet | P-bet (70, 71) | P-bet (65, 69)
32 | P-bet | P-bet (60, 68) | P-bet (<62, 67)
33 | P-bet | P-bet (68, 75) | P-bet (63, 69)
34* | P-bet | $-bet (70, 69) | P-bet (67, 69)
35* | P-bet | $-bet (70, 69) | $-bet (69, 67)
36 | P-bet | P-bet (64, 70) | NEI (<62, <62)

Notes: The respondent numbers affixed with an asterisk (*) are those respondents who demonstrated a strict predicted preference reversal in the conventional procedure (i.e., P-bet ≻ $-bet and v($-bet) > v(P-bet)). The words under each column denote each respondent’s preference in each elicitation mode. For example, for respondent 1, ‘$-bet’ under ‘Valuation questions’ indicates that s/he placed a higher certainty equivalent on the $-bet than on the P-bet in the conventional valuation procedure. The figures in parentheses are the values implied from the answers given by each respondent. The value for the $-bet is always stated first. Thus, for example, in the conventional valuation questions, the respective values for the $-bet and P-bet offered by respondent 1 were 70 years and 69 years.
[a] Ind: The respondent was indifferent between or attached equal value to the $-bet and P-bet.
[b] NEI: There is not enough information to make a judgment on the respondent’s preference.
negating the problem of systematic preference reversal. A further indication of this partial success is that when the strict and weak preference reversals are combined, the predicted to unpredicted reversal ratio is 19:3 in the conventional procedure and 14:4 in the ranking procedure, a difference between predicted and unpredicted reversals that is statistically significant at the 1% level in the former procedure but not in the latter (conventional procedure: χ² = 11.64 > χ²_{0.01}(1) = 6.63; ranking procedure: χ² = 5.56 < χ²_{0.01}(1) = 6.63).13 The possibility that many of the respondents would offer a lower value for the $-bet in the ranking procedure is exactly the effect that was hypothesised by Bateman et al. (2002), and explains why there are a lower number of $$ as well as P$ preference patterns in the ‘ranking’ versus the ‘conventional’ results reported in Table 2. This hypothesis is supported by a further analysis of the individual data, where it was discovered that although 23 respondents for both the $-bet and the P-bet offered a lower value in the ranking questions than in the direct valuation questions (compared to five and three respondents, respectively, who gave a higher value), the average reduction for the $-bet was 4 expected life years compared to only 1.5 expected life years for the P-bet.14
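The two test statistics quoted above can be reproduced with a chi-square goodness-of-fit test with one degree of freedom, under the null hypothesis that predicted and unpredicted reversals are equally likely. The sketch below is illustrative only: the text does not spell out how the test was constructed, so the equal-split null is an assumption (albeit one that reproduces the reported figures), and the function name `chi_square_equal_split` is hypothetical. The 1% critical value of 6.63 is the one given in the text.

```python
def chi_square_equal_split(predicted: int, unpredicted: int) -> float:
    """Chi-square statistic for H0: both kinds of reversal are equally likely."""
    expected = (predicted + unpredicted) / 2  # expected count in each cell under H0
    return ((predicted - expected) ** 2 + (unpredicted - expected) ** 2) / expected


CRITICAL_1PCT_DF1 = 6.63  # critical value at the 1% level, 1 degree of freedom

# Predicted : unpredicted reversals (strict and weak combined), from Table 2.
ratios = {"conventional": (19, 3), "ranking": (14, 4)}

for procedure, (predicted, unpredicted) in ratios.items():
    statistic = chi_square_equal_split(predicted, unpredicted)
    significant = statistic > CRITICAL_1PCT_DF1
    print(f"{procedure}: chi2 = {statistic:.2f}, significant at 1%: {significant}")

# conventional: chi2 = 11.64, significant at 1%: True
# ranking: chi2 = 5.56, significant at 1%: False
```

With only two categories the statistic reduces to (predicted − unpredicted)² / (predicted + unpredicted), which gives 256/22 and 100/18 for the two procedures.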
3.2. Qualitative results

Written explanations were elicited from the respondents for their answers to the choice question and conventional valuation questions in an attempt to understand why people might systematically reverse their preferences across these types of questions. Table 4 represents an attempt to summarise in a few words the explanations offered by each respondent who demonstrated strict or weak predicted preference reversals.15 In Table 4, the respondent numbers correspond to those given in Table 3. For instance, in Table 3, it can be observed that respondent 2 preferred the P-bet in the choice question, but valued the $-bet higher than the P-bet in the conventional valuation question, and thus demonstrated a strict predicted preference reversal.

Thirteen of the 19 respondents who demonstrated predicted preference reversals seemed to be motivated by personal risk aversion in preferring the P-bet in the choice question. These respondents tended to emphasise the small chance of the good outcome or, less frequently, the high chance of the poor outcome in the $-bet. Four of the 19 respondents were motivated in choosing the P-bet by social inequality aversion.

The explanations for the $-bet valuations are a little more varied. Four respondents were seemingly motivated by wanting to go higher than the poor outcome in the $-bet, two indicated explicitly an anchoring on the good outcome, four apparently chose a simple ‘halfway’ or ‘somewhere in between’ heuristic (which for some might have implicitly involved an anchoring on the good outcome), and seven indicated that their valuation was motivated by something close to their perception of the expected value of the $-bet. The explanations for the P-bet valuations followed two main, and possibly related, lines of thought.
Three respondents went for something close to their perception of the expected value of the P-bet, and ten respondents perceived the P-bet to be an almost certain 70 years, and thus valued this bet at, or marginally below, 70 years. The very general explanation for ‘conventional’ predicted preference reversals in this study therefore seems to be that a significant minority of the study’s respondents were risk or inequality
13 I am grateful to one of the referees for pointing this test out to me.
14 The 23 respondents who gave the $-bet a higher value in the conventional valuation procedure than in the ranking procedure came from the 28 respondents where it was possible to compare values. The comparable figures for the P-bet are 23 from 34 respondents. It is possible that when respondents are asked to ‘create’ their certainty equivalents for these bets (as they are when they are asked to answer the conventional valuation questions), they enter a bargaining mindset that serves to inflate their certainty equivalents of what many perceive to be risky options.
15 The complete set of qualitative explanations as reported by the respondents is available at: http://www.lse.ac.uk/collections/LSEHealthAndSocialCare/documents/ADAMOLIVER/PreferencereversalsAugust2005.xls.
Table 4
Qualitative explanations of those with predicted preference reversals

Respondent | Choice question | Valuation of $-bet | Valuation of P-bet
2* | PRA [a] with a focus on high chance of poor outcome in $-bet | About halfway between 64 and 84 years | About half way between 65 and 70 years
4* | PRA | Slightly less than the EV [b] of the $-bet | Substantially less than the EV of the P-bet
5* | PRA with a focus on high chance of poor outcome in $-bet | About halfway between 64 and 84 years | Approximates the P-bet as almost 70 years for certain
6* | PRA with a focus on high chance of poor outcome in $-bet | Somewhat higher than poor outcome in $-bet | Approximates the P-bet as almost 70 years for certain
14* | PRA with a focus on small chance of good outcome in $-bet | Close to the EV of the $-bet | Close to the EV of the P-bet
17* | Concern about social inequality | Preference for a more equal country pushed value up | Preference for a more equal country pushed value up
18* | Concern about social inequality | Close to the EV of the $-bet | Approximates the P-bet as almost 70 years for certain
25* | PRA with a focus on small chance of good outcome in $-bet | A focus on the good outcome in $-bet pushed value up | Approximates the P-bet as almost 70 years for certain
26* | PRA with a focus on certainty of 65–70 years in P-bet | A focus on the good outcome in $-bet pushed value up | Not clear
27* | PRA with a focus on small chance of good outcome in $-bet | Close to the EV of the $-bet | Approximates the P-bet as almost 70 years for certain
29* | PRA with a focus on small chance of good outcome in $-bet | Somewhat higher than poor outcome in $-bet | Approximates the P-bet as almost 70 years for certain
34* | Concern about social inequality | Somewhat higher than poor outcome in $-bet | Approximates the P-bet as almost 70 years for certain
35* | PRA | About halfway between 64 and 84 years | No reason stated
3** | Not clear | Lack of data implies a bad country: high life expect. required | Lack of data implies a bad country: high life expect. required
7*** | Does not care for life beyond 70 years | Close to the EV of the $-bet | Close to the EV of the P-bet
9*** | PRA with a focus on small chance of good outcome in $-bet | Close to the EV of the $-bet | Approximates the P-bet as almost 70 years for certain
15*** | PRA with a focus on small chance of good outcome in $-bet | Close to the EV of the $-bet | Close to the EV of the P-bet
16*** | Concern about social inequality | Somewhere between 64 and 84 years | Approximates the P-bet as almost 70 years for certain
30*** | PRA with a focus on small chance of good outcome in $-bet | Somewhat higher than poor outcome in $-bet | Approximates the P-bet as almost 70 years for certain

Notes: The respondent numbers correspond to those listed in Table 3. The respondent numbers affixed with an asterisk (*) are those respondents who demonstrated a strict predicted preference reversal (i.e., P-bet ≻ $-bet and v($-bet) > v(P-bet)). Those affixed with a double asterisk (**) demonstrated the first type of weak predicted preference reversal listed in Table 2 (i.e., P-bet indifferent to $-bet and v($-bet) > v(P-bet)). Those affixed with a triple asterisk (***) demonstrated the second type of weak predicted preference reversal listed in Table 2 (i.e., P-bet ≻ $-bet and v($-bet) = v(P-bet)).
[a] PRA: the respondent’s answer was driven by personal risk aversion.
[b] EV: expected value.
averse in the choice question and, for a variety of reasons (for example, a tendency to place approximate expected values on the bets,16 the use of very simple heuristics so as to save time and/or energy, and/or a possible anchoring on the good outcome in the $-bet), valued the $-bet at least as high as, and often higher than, the 70 years (or marginally lower) value they placed on the P-bet.

4. Discussion

The main objective of the study reported in this article was to test whether encouraging respondents to consider a broad range of sure amounts when valuing ‘bets’, by means of an exercise whereby the $-bet and P-bet were ranked against a set of almost identical options, would generate fewer and less systematic preference reversals than an exercise that more closely mirrors the conventional ‘preference reversal’ procedure. On the basis of the answers given by the respondents used, this hypothesis cannot be rejected, particularly if focus is limited to the strict preference reversals. Moreover, the study demonstrated that substantial and systematic strict preference reversals can be observed with a conventional procedure when the questions are framed in a ‘health’ context, and also that the phenomenon remains apparent when the bets are constructed as ‘social’ distributions, rather than ‘lottery tickets’ that have implications entirely for oneself.

As intimated in Section 1, the apparent fact that in some circumstances different elicitation procedures can cause substantial numbers of respondents to vary their preferences systematically has important potential implications for health economists. The evidence in this article lends some (albeit contestable) support to the conjecture that direct valuations can equate to the overvaluation of bets that involve moderate chances of relatively large gains. The respondents used in the study were highly educated, and half of them stated that they were familiar with decision theory.
On the basis of their qualitative explanations, many of the respondents certainly appeared to be familiar with the concept of expected value. It is therefore telling that such a group expressed, on the whole, systematic predicted preference reversals in the conventional procedure, in that one might have expected such a group of individuals to be more conventionally ‘rational’ than a random sample of the general public (although, admittedly, the qualitative results imply that their familiarity with the concept of expected value may have caused some of the predicted preference reversals). However, the sample size was relatively small, and thus further evidence, using more realistic health outcome descriptions on larger and, if possible, more varied samples, is required to confirm the findings of the study.

In addition to acknowledging the small sample size, a further caveat of the study is that the respondents’ answers may have been dependent upon the choice set used in the ranking exercises.17 In the wider economics and philosophy literatures, choice set dependence is sometimes termed ‘menu dependence’ (Pettit, 2005; Sen, 2002), for which there are quite a few published examples. For instance, Black (1986) has observed that preference for a team may depend on the team’s differential likelihood of winning against different opponents, a finding that may bear some resonance with the range-frequency effect mentioned earlier (Parducci and Weddell, 1986). In the elicitation of health state values with use of a rating scale, the range-frequency effect has been termed context bias, where the value for any particular health state may depend on the number of better or worse health states it is rated against (Bleichrodt and Johannesson, 1997; Robinson et al., 2001).18 In a further example of menu dependence, Pettit (1991) has argued that an action, such as taking an apple from a bowl of fruit when offered, will vary in its identity as an object of preference, depending on the context. That is, it may be a polite action if there are other fruit in the bowl, but impolite if it is the only remaining fruit. Menu dependence with respect to the ranking questions may have occurred because the respondents were asked to rank the bets against certainty equivalents that fell within the range of 62–70 years. Therefore, many of them may have felt compelled to rank the bets within this range, which might have ‘artificially’ suppressed (or, less likely, enhanced) the value of the $-bet in particular.19

Despite the caveats of the study, and given that most health care treatments involve an element of risk, it should be possible to test for preference reversals over ‘realistic’ health care treatment options that offer moderate chances of large gains or high chances of smaller gains (i.e., that mirror the typical construct of the $-bet and P-bet). If preference reversals are confirmed, there may be a case for arguing that direct valuation methods (such as the direct elicitation of willingness to pay values) are in some circumstances upwardly biased. Ranking exercises might serve to reduce this bias, although the study reported in this article indicates that they still may come at the cost of systematic weak predicted preference reversals, and thus further empirical testing on whether this method sufficiently negates any overvaluation of $-bet-type contexts is required. If ranking also proves flawed, we might conclude that direct pairwise choice-based methods are best equipped to uncover respondents’ preferences.

16 For example, a combination of the quantitative and qualitative results suggests that those who demonstrated weak predicted preference reversals may have simply rounded their expected values of both bets at 70 years.
17 I thank one of the referees for drawing my attention to the general problem of choice set dependence, and Marjon van der Pol for intimating how the study’s ranking exercise may to some extent suffer from this problem.
Of course, this conclusion relies on the assumption that the systematic preference reversals observed in the literature are due to internal bias that occurs only in the valuation tasks. At this point in time, it is not possible to conclude definitively that this is the case. Perhaps preferences are always constructed by the question that is asked, and depend entirely upon such factors as frame, elicitation mode and choice set. If this is the case, then, given the motivation, researchers may well be able to guide respondent preferences in any direction they wish. Consequently, systematic violations of conventional theories of rational choice – which generally assume that preferences are innate rather than constructed – are hardly surprising, and any consistencies across elicitation modes (e.g., across choice, valuation and ranking) may simply be that: consistencies, rather than an indication of underlying preferences. If preferences are constructed, then it is possible that all of the revealed preference techniques that have been used in health and health care give a distorted impression of what people really want; moreover, the possibility of designing a method that identifies highly specific preferences to which we can attach a large degree of confidence may remain remote.

The extent to which preferences are constructed has yet to be discovered. If they are wholly constructed from the question design, then there is not much hope for preference elicitation exercises of any sort. If they are partially constructed then there are limits to their accuracy as an indication of what people really do value, and the analyst ought to strive to acknowledge the different conclusions that could be reached within their margin of error. Given that revealed preference techniques are now being used at the practical health policy making level in several countries, the extent to which preferences are constructed is an issue that perhaps deserves more attention from the health economics community than it has thus far received.

18 Specifically, if a health state is presented together with many better health states, its value tends to be depressed, and if it is presented with many worse health states, its value tends to be enhanced.
19 In Table 3, it can be observed that in the more open-ended conventional valuation questions, 13 of the 36 respondents valued the $-bet higher than 70 years, and six of these reduced their values to below 70 years in the ranking exercise. All six respondents demonstrated strict predicted preference reversals in the conventional procedure, but five of the six no longer demonstrated strict predicted reversals in the ranking procedure. It is possible (although by no means certain) that these five strict reversals may have disappeared due to these respondents artificially compressing their valuations of the $-bet to the 62–70 year range in the ranking exercise. Thus, given the small sample size, the possibility that a menu dependent bias in the ranking exercises might have impacted on the results cannot be entirely ruled out.

Acknowledgements

I am grateful for comments received from participants at seminars held at the London School of Economics, King’s College London, and the University of Aberdeen. I am also grateful for very helpful comments offered by two anonymous referees. All mistakes remain my own.

References

Amiel, Y., Cowell, F.A., Davidovitz, L., Polovin, A., 2003. Preference reversals and the analysis of income distributions. Distributional Analysis Discussion Paper 66, STICERD, London School of Economics, London.
Bateman, I., Day, B., Loomes, G., Orr, S., Sugden, R., 2002. Does a ranking procedure eliminate the usual violations of expected utility theory? Paper presented at a meeting of the Preference Elicitation Group, London School of Economics, December 11th.
Black, M., 1986. Some questions about Bayesian decision theory. In: Daboni, A., Montesano, A., Lines, M. (Eds.), Recent Developments in the Foundations of Utility and Risk Theory. Reidel, Dordrecht.
Bleichrodt, H., Johannesson, M., 1997. An experimental test of the theoretical foundation of rating-scale valuations. Medical Decision Making 17 (2), 208–216.
Donaldson, C., Shackley, P., 2002. Willingness to pay for health care. In: Scott, A., Maynard, A., Elliot, R. (Eds.), Advances in Health Economics. John Wiley and Sons, Chichester.
Hershey, J.C., Schoemaker, P.J.H., 1985. Probability versus certainty equivalence methods in utility measurement: are they equivalent? Management Science 31 (10), 1213–1231.
Lichtenstein, S., Slovic, P., 1971. Reversals of preferences between bids and choices in gambling decisions. Journal of Experimental Psychology 89 (July), 46–55.
Lindman, H.R., 1965. The Measurement of Utilities and Subjective Probabilities. University of Michigan, Unpublished Doctoral Dissertation.
Loomes, G.C., Sugden, R., 1986. Disappointment and dynamic consistency in choice under uncertainty. Review of Economic Studies 53 (2), 271–282.
Lopes, L.L., 1991. The rhetoric of irrationality. Theory and Psychology 1 (1), 65–82.
Parducci, A., Weddell, D., 1986. The category effect with rating scales: number of categories, number of stimuli and method of presentation. Journal of Experimental Psychology: Human Perception and Performance 12 (4), 496–516.
Pettit, P., 1991. Decision theory and folk psychology. In: Bacharach, M., Hurley, S. (Eds.), Essays in the Foundations of Decision Theory. Blackwell, Oxford.
Pettit, P., 2005. Construing Sen on commitment. Economics and Philosophy 21 (1), 15–32.
Pope, R.E., 2004. Biases from omitted risk effects in standard gamble utilities. Journal of Health Economics 23 (4), 695–735.
Robinson, A., Loomes, G., Jones-Lee, M., 2001. Visual analog scales, standard gambles and relative risk aversion. Medical Decision Making 21 (1), 17–27.
Ryan, M., Gerard, K., 2002. Using discrete choice methods in health economics: moving forward. In: Scott, A., Maynard, A., Elliot, R. (Eds.), Advances in Health Economics. John Wiley and Sons, Chichester.
Seidl, C., 2002. Preference reversal. Journal of Economic Surveys 16 (5), 621–655.
Sen, A., 2002. Rationality and Freedom. Harvard University Press, Cambridge and London.
Slovic, P., Lichtenstein, S., 1968. Relative importance of probabilities and payoffs in risk taking. Journal of Experimental Psychology 78 (3), 1–18.
Tversky, A., Slovic, P., Kahneman, D., 1990. The causes of preference reversals. American Economic Review 80 (1), 204–217.