A test of independence of discounting from quality of life


Journal of Health Economics 31 (2012) 22–34


Journal of Health Economics journal homepage: www.elsevier.com/locate/econbase

A test of independence of discounting from quality of life☆

Arthur E. Attema*, Werner B.F. Brouwer

iBMG/iMTA, Erasmus University Rotterdam, PO Box 1738, 3000 DR Rotterdam, The Netherlands

Article info

Article history: Received 6 June 2011; received in revised form 21 November 2011; accepted 12 December 2011; available online 20 December 2011

JEL classification: D90; I10

Abstract

The quality-adjusted life-years (QALY) model assumes quality and quantity of life can be multiplied into a single index and requires quality and quantity to be mutually independent, which need not hold empirically. This paper proposes a new test for measuring independence of utility of life duration from quality of life in a riskless setting. We use a large representative sample of Dutch citizens and include two health states generally considered better than dead (BTD) and one health state considered worse than dead (WTD). Independence cannot be rejected when comparing the BTD health states, but is rejected when comparing the BTD states with the WTD state. In particular, utility of life duration becomes more concave for the WTD state. This may suggest that independence holds only for BTD health states. This has implications for the QALY model and would require using sign-dependent utility of life duration functions.

Keywords: Discounting; QALY model; Utility of life duration

☆ This research was made possible through a grant from The Netherlands Organization for Health Research and Development (ZonMW), project number 152002010. We thank Jose Luis Pinto for useful comments on an earlier version of this manuscript. The usual disclaimer applies.
* Corresponding author. Tel.: +31 10 4089129; fax: +31 10 4089081. E-mail addresses: [email protected] (A.E. Attema), [email protected] (W.B.F. Brouwer).
0167-6296/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jhealeco.2011.12.001

1. Introduction

The quality-adjusted life years (QALY) model is often used in economic evaluations and medical decision making and is becoming increasingly important in health policy making. This is exemplified by the binding recommendations, (also) based on (cost per) gained QALY estimates, of decision making bodies such as the National Institute for Health and Clinical Excellence (NICE) as to which treatments should be reimbursed. Such use and influence of QALY measurement and valuation require it to be an appropriate measure of health preferences. One of the key properties of the QALY model is independence of the function for quality of life and the function for life duration (Bleichrodt and Pinto, 2005). This implies that separate utility functions can be defined over the attributes quality and duration (Bleichrodt et al., 2011). As a result of this, QALYs can be computed by multiplying utility of life duration with utility of the health states, where both are independent of each other. Hence, if T denotes life duration and Q a chronic health state that lasts the full amount of T, then the number of (discounted) QALYs can be represented by the function U(Q,T) = V(Q) × W(T). Advantages of independence are that it makes the decision problem more tractable and that it simplifies the measurement of preferences for health. For example, one can measure the utility of life duration using just one impaired health state and full health, and the resulting utility function will be valid for all other health states as well.

The following example may clarify this. Suppose an individual lives ten more years in a chronic, imperfect health state. Suppose, moreover, that this health state receives a fixed quality weight of, say, 0.5 on a scale of 0 (death) to 1 (full health), implying that this individual values one year in this health state as half a year in full health. The total number of remaining QALYs for this individual can be computed by multiplying the utility of the next ten years by the utility of the health state (i.e., 0.5). If we assume, for simplicity, that utility of life duration is linear, we obtain: 10 × 0.5 = 5 QALYs. If lifetime utility is represented by the function U(Q,T) = V(Q) × W(T) without W(T) being linear, independence simply requires that the degree of discounting (utility curvature) of life duration does not depend on the health state in which one spends the time. So, the discounting function W(T) is similar for all Q. Moreover, it also means that the quality weight attached to a state (e.g., the value ‘0.5’ in the example) does not depend on the duration of the state. The value ‘0.5’ should be appropriate whether the health state lasts one day or ten years. Hence, the discounting function W(T) and the utility of health function V(Q) are mutually independent. This assumption is useful in many applications. For example, in a


time tradeoff (TTO) elicitation, one can correct the estimates for utility of life duration using the utility of life duration function that was elicited for one particular health state, and use this function to correct as many TTO health state utilities as desired (Attema and Brouwer, 2009).

It is not evident, however, whether independence indeed represents people’s preferences. The empirical evidence about the descriptive validity of independence is mixed, as reported in the next section. If independence is violated in empirical studies, measurements of the QALY model that assume independence will be biased, and policy recommendations based on the QALY model will not reflect people’s genuine interests. For example, the utility of a health state may depend on its duration, so that the value function for health states becomes V(Q,T) instead of V(Q). The function V(·) elicited for a particular value of T may then not be valid for other values of T. Such a problem may, for instance, be caused by maximum endurable time (MET; Dolan and Stalmeier, 2003; Stalmeier et al., 1996). MET entails the phenomenon that some health states can be regarded as better than dead (BTD, i.e., V(Q) > 0) for a short duration (i.e., small values of T) but considered worse than dead (WTD, i.e., V(Q) < 0) for longer durations (i.e., higher values of T). In that case, alternative models and methods have to be used that better reflect the preferences of individuals. Therefore, it is important to carefully test the validity of the independence property.

Most previous studies on this topic investigated independence of quality of life from life duration or the independence of utility of life duration from quality of life in the context of risk. However, it is not a priori clear that independence of quality from life duration necessarily implies independence in the opposite direction. For example, MET implies that the utility of a health state depends on its duration, but still the utility of life duration need not depend on the health state. Similarly, people may for some reason discount bad health states more heavily than good states, whereas the utility of a health state is constant irrespective of its duration. Moreover, it is unclear whether the results obtained in a risky environment can be transferred to a risk-free environment (Abellán-Perpinán et al., 2009b; Attema et al., forthcoming).

This paper aims to test independence of the utility of life duration from quality of life in a risk-free situation, using a within-subjects design, in a large sample representative of the Dutch adult population. To the best of our knowledge, it is the first to do so. In what follows, Section 2 introduces notation and the method and discusses related literature. Section 3 describes the experimental design. Section 4 presents the results and Section 5 discusses them and concludes.
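
The multiplicative form U(Q,T) = V(Q) × W(T) used in the ten-year example of this introduction can be illustrated with a short sketch. It is not part of the original study: the function names are hypothetical, and the constant-rate discount weights 1/(1 + rate)^t are an assumption chosen only to show a non-linear W(T). W(T) is left unnormalized here so that the linear case reproduces 10 × 0.5 = 5 QALYs.

```python
def discounted_duration(years, annual_rate=0.0):
    """W(T): sum of yearly discount weights, assumed here to be 1 / (1 + rate)^t."""
    return sum(1.0 / (1.0 + annual_rate) ** t for t in range(1, years + 1))


def qalys(quality_weight, years, annual_rate=0.0):
    """U(Q, T) = V(Q) * W(T) for a chronic health state lasting `years`."""
    return quality_weight * discounted_duration(years, annual_rate)


# Ten years at quality weight 0.5 with linear utility of life duration.
undiscounted = qalys(0.5, 10)

# Same profile with a hypothetical 3% annual discount rate (an assumption).
discounted = qalys(0.5, 10, annual_rate=0.03)

# Under independence, the value relative to full health equals V(Q)
# whatever the horizon T, because W(T) cancels out.
ratios = [qalys(0.5, T, 0.03) / qalys(1.0, T, 0.03) for T in (1, 5, 20)]

print(undiscounted, discounted, ratios)
```

If independence fails, the ratio in the last step would vary with T or the discount weights would vary with Q, which is the kind of pattern the test in this paper is designed to detect.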

2. Methods

2.1. Notation

Before we introduce notation, it is worthwhile to emphasize that there are different axiomatizations of the QALY model, depending on whether a risky or riskless decision problem is analyzed. Since we consider a riskless decision problem, we use an axiomatization for the riskless situation. Characterizations for the risky situation can, for example, be found in Pliskin et al. (1980) and Bleichrodt and Miyamoto (2003). We let $h = (Q_1, \ldots, Q_T)$ denote a health profile, where $Q_t$ represents the health state in period $t = 1, \ldots, T$, with $T$ the final period. A preference relation over health states is defined from preferences over constant profiles, i.e., $\gamma$ is weakly preferred to $\beta$ if and only if $(\gamma, \ldots, \gamma)$ is weakly preferred to $(\beta, \ldots, \beta)$. Further, $V(Q_t)$ is a value function that represents the individual’s preferences over health quality and $\delta(t)$ denotes the corresponding weight attached to the value in this period. We term the QALY value of a period spent in full health the utility of a period of life and denote periods by their beginning and their end, e.g., $[2,5]$ for time points 2–5 (Attema et al., forthcoming). We write $W[2,5]$ for the utility of this period. We suppress the beginning of a period if it is 0 and, hence, write $W(10)$ instead of $W[0,10]$. In fact, $W(10)$ is the sum of the discount weights given to the period between $t = 1$ and $t = 10$ (assuming the health state is constant during that period). We normalize the utility function such that $W(0) = 0$ and $W(T) = 1$.

The QALY model states that $h$ is weakly preferred to $h'$ if and only if $\sum_{t=1}^{T} \delta(t)V(Q_t) \geq \sum_{t=1}^{T} \delta(t)V(Q'_t)$ (Bleichrodt and Gafni, 1996). That is, health profiles are evaluated by the function $U(T, Q_t) = \sum_{t=1}^{T} \delta(t)V(Q_t)$. For constant profiles, this expression becomes $U(T,Q) = W(T)V(Q)$, with $W(T) = \sum_{t=1}^{T} \delta(t)$. The QALY model implies that the utility of a health state and the utility of life duration are independent. Here, $V(Q)$ depends only on the health state irrespective of the number of life years and $W(T)$ depends only on the number of life years irrespective of the health state (Dolan, 2000).

2.2. Measurement method

Because of independence, the QALY model can be measured quite straightforwardly. One robust measurement method that relies on these properties is the Direct Method (DM) proposed by Attema et al. (forthcoming). The DM lets a subject compare two simple health profiles with horizon $T$, which are both combinations of two health states, e.g., $\gamma$ and $\beta$, with $\gamma$ strictly preferred to $\beta$. The difference between the profiles is that one starts with the better health state $\gamma$ and ends with the worse state $\beta$: $(\gamma_1, \ldots, \gamma_t, \beta_{t+1}, \ldots, \beta_T)$; whereas the other starts with $\beta$, followed by an improvement toward $\gamma$: $(\beta_1, \ldots, \beta_t, \gamma_{t+1}, \ldots, \gamma_T)$. Now, the purpose is to elicit the point $t = d_{1/2}$ such that an individual is indifferent between the two profiles, i.e., $(\gamma_1, \ldots, \gamma_t, \beta_{t+1}, \ldots, \beta_T) \sim (\beta_1, \ldots, \beta_t, \gamma_{t+1}, \ldots, \gamma_T)$. The period $[0,d_{1/2}]$ then has the same utility as $[d_{1/2},T]$:

$$W(d_{1/2})V(\gamma) + W[d_{1/2}, T]V(\beta) = W(d_{1/2})V(\beta) + W[d_{1/2}, T]V(\gamma). \tag{1}$$

If we denote the health improvement from $\beta$ to $\gamma$ as $X = V(\gamma) - V(\beta)$, we get:

$$W(T)V(\beta) + W(d_{1/2})X = W(T)V(\beta) + W[d_{1/2}, T]X. \tag{2}$$

After simplification, we obtain:

$$W(d_{1/2}) = W[d_{1/2}, T]. \tag{3}$$

And because $W(d_{1/2}) + W[d_{1/2},T] = W(T) = 1$, we have:

$$W(d_{1/2}) = \frac{1}{2}. \tag{4}$$

Hence, we need not know the quality of life weights of the health states involved. After sufficient elicitations, this method allows for a measurement of the complete utility function for life duration. For example, we can next find $d_{1/4}$ such that $W[0,d_{1/4}] = W[d_{1/4},d_{1/2}]$ and, hence, $W(d_{1/4}) = 1/4$, etc. If an individual would not discount the future, his value of $d_{1/2}$ would be $d_{1/2} = 1/2 \times T$ and his utility function would accordingly be linear.

The situation changes if the utility of a health state and the utility of life duration are not mutually independent. The utility of life duration may for example be different for different health states. In principle, the DM is then still able to produce valid results, but the elicited utility for life duration function cannot be transferred to other health states. Instead, the discounting function would become health state dependent, $\delta(t,Q_t)$, and utility for life duration functions should be elicited several times, one function for each health state. This paper tests this assumption by using the DM to measure utility for life duration for three different health states.

In addition, the utility of a health state may depend on its duration. MET, for example, may cause a health state to be evaluated BTD for the first few years but during later years the health state may be considered WTD (Sutherland et al., 1982). If we then erroneously assume that the utility of this health state is constant during the entire period, apparently lower discounting may result, whereas the subject in fact has the same discounting function for this health state as for all other health states. Appendix 2 exemplifies how this possibility might occur in our experimental procedure. Thus, a time-dependent utility function over health states, for example caused by MET, may be confused with changing discounting patterns over time.

2.3. Related literature

Published studies in this area differ with respect to what they exactly tested in terms of independence and how they did so. Fewer studies have tested whether the utility of life duration is independent from the health state in which this life is spent than vice versa, i.e., whether the utility of a health state is independent from its duration. Furthermore, some studies employed a riskless context, whereas others used a risky context, and some considered varying rather than chronic health states. Here, we will only discuss studies concerning chronic health states.¹

In order to test whether the estimated utility of a health state is the same when using different time horizons in a riskless case, the TTO method can be used. The TTO method elicits preference for health states by letting a subject imagine living T more years in an imperfect health state. 
The subject then has to indicate the remaining lifetime $x < T$ in full health such that he is indifferent between living $T$ years in the imperfect health state and living $x$ years in full health. According to the QALY model, the resulting indifference can be evaluated by:

$$V(Q)W(T) = V(FH)W(x). \tag{5}$$

Normalizing $V(Q)$ such that $V(FH) = 1$ leaves us with:

$$V(Q) = \frac{W(x)}{W(T)}. \tag{6}$$

Investigators using TTO often assume the linear QALY model, i.e., $W(t) = t/T$, which implies a simplification of Eq. (6) to:

$$V(Q) = \frac{x}{T}. \tag{7}$$
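
The difference between Eq. (6) and Eq. (7) can be made concrete with a small sketch. It is illustrative only: the function names are hypothetical, and the power form W(t) = t^r with r = 0.8 is an assumed example of a concave utility of life duration, not an estimate from this study.

```python
def duration_utility(t, r=1.0):
    """Assumed power utility of life duration, W(t) = t**r (r = 1 is linear)."""
    return t ** r


def tto_value(x, T, r=1.0):
    """Health state utility from a TTO indifference, V(Q) = W(x) / W(T) as in Eq. (6)."""
    return duration_utility(x, r) / duration_utility(T, r)


# A subject is indifferent between 10 years in state Q and 5 years in full health.
linear_value = tto_value(5, 10)            # Eq. (7): x / T
corrected_value = tto_value(5, 10, r=0.8)  # Eq. (6) with a concave W

# Doubling both durations leaves the value unchanged under the power form.
scaled_value = tto_value(10, 20, r=0.8)

print(linear_value, corrected_value, scaled_value)
```

With a concave W, the corrected value lies above x/T, so assuming linearity when subjects discount would understate V(Q) in this sketch.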

¹ See, for example, Guerrero and Herrero (2005) and Spencer and Robinson (2007) for the case of health states that vary over time.

The utility of health state Q is constant in the QALY model and, hence, should not depend on its duration. Therefore, multiplying the denominator by a factor z should be accompanied by a multiplication of the numerator by the same factor z. This condition is often termed constant proportional tradeoff (CPTO). CPTO also holds under the power QALY model, i.e., with $W(T) = T^r$ (Pliskin et al., 1980). However, if the QALY model is neither linear nor power, a rejection of CPTO need not necessarily imply a violation of the generalized QALY model; it may also be due to a discounting function not belonging to the power family (Attema and Brouwer, 2010). Table 1 presents some details of published CPTO studies. Most studies (implicitly) assumed no discounting. Of these, the majority found a violation of CPTO (Buckingham et al., 1996; Dolan and Stalmeier, 2003; Sackett and Torrance, 1978; Stalmeier et al., 1997, 2001; Stiggelbout et al., 1995; Unic et al., 1998), but some others did not find a violation (Bleichrodt and Johannesson, 1997; Cook et al., 1994; Hall et al., 1992). Stalmeier et al. (1997, 2001) attributed the violation of CPTO in their experiments to MET combined with a proportional heuristic; i.e., subjects regard some poor health states as BTD for some time but WTD afterwards, whereas when responding to a TTO task they simply double their answer when the time horizon has been doubled. This implies a spurious satisfaction of CPTO that is inconsistent with the subjects’ preference for living the shorter number of years in the considered health state over living the larger number of years in it.

The evidence regarding CPTO in studies that corrected for discounting is inconclusive. Attema and Brouwer (2010), Bleichrodt et al. (2003) and Martin et al. (2000) reported violations of CPTO, whereas Attema and Brouwer (forthcoming) found mixed evidence, and Stalmeier et al. (1996) and van der Pol and Roux (2005) found no violation.

If one instead aims to test independence of utility of a health state from time horizon in a risky setting, the standard gamble (SG) method (also known as ‘probability equivalence’) may be used. In this method, the subject has to imagine being in an imperfect health state and has to consider two alternatives. One is a risky treatment with a probability p that the subject returns to perfect health and will live for T additional years, and a complementary probability 1 − p of immediate death. The other alternative involves the certainty that the current health state will persist for the rest of his life (T years again). The probability p is then varied until the subject is indifferent between these alternatives. Using the QALY model under expected utility and normalization, we get V(Q)W(T) = pW(T), so V(Q) = p and p represents the utility of the considered imperfect health state. 
Now, by changing the length of T while holding the health state constant, one can test whether the utility of the health state as measured by p remains the same. A few studies have investigated this hypothesis. Bleichrodt and Johannesson (1997) found no violation of independence when taking into account imprecision of preferences. On the other hand, Bala et al. (1999) concluded that independence was violated by a majority of their subjects. Martin et al. (2000) reported a significant negative relation between utility and time frame, hence also violating independence (Table 2). Independence of utility of life duration from quality of life can be investigated in a risky context by employing, for example, the certainty equivalence (CE) method. This method is similar to the SG except that it keeps the probability p fixed and instead varies the amount of remaining life time in the current health state, t, of the certain alternative. Moreover, the CE method uses the same health states in both alternatives, but involves a gamble on remaining life time in that health state. The CE method also relies on expected utility and enables an estimation of the utility of life duration. For example, if a subject indicates to be indifferent between 4 years in full health and a lottery treatment giving 10 years in full health with p = 0.5 and immediate death otherwise, we can infer (under expected utility) that the next 4 years give her as much utility as the 6 years after that. By changing the health state (for example from full health to an inferior health state), we can test whether the certainty equivalence stays 4 years (another answer would entail more or less discounting than in the initial setting). The CE approach was used by Miyamoto and Eraker (1988), who reported health states to have no significant main effect on the CEs in a small sample of patients. However, they did observe a slight tendency for CEs to be greater in the poor health condition when using a sign test. 
They concluded that the majority of subjects had no underlying effect of health quality. In addition, Abellán-Perpinán et al. (2009a) used CEs in a Spanish general population sample. They also could not reject independence. It should be noted, however,


Table 1
Overview of studies testing independence of utility of health state from (utility of) duration under certainty (using TTO method).

| Study | Sample | Health state | Violation of independence (yes/no) |
|---|---|---|---|
| Sackett and Torrance (1978) | General population n = 246; patients on home dialysis n = 29 | Various health conditions | Yes |
| Pliskin et al. (1980) | University staff n = 10 | Severe or mild angina pain | No |
| Hall et al. (1992) | Women aged 40–70, of which n = 60 breast cancer patients and n = 44 without breast cancer | Breast cancer treatment outcomes | No |
| Cook et al. (1994) | n > 500 patients | Gallstone disease | No |
| Stiggelbout et al. (1995) | Cancer patients: n = 54 testicular cancer patients; n = 72 disease-free colorectal cancer patients; n = 29 incurable patients | Current health | Yes |
| Buckingham et al. (1996) | General population n = ±1500 | Own health | Yes |
| Stalmeier et al. (1996) | Students n = 86; Exp. 1: n = 19 female students; Exp. 2: n = 29 female students; Exp. 3: n = 38 female high school students | Genetic counseling, prophylactic mastectomy and metastasized breast cancer | Mixed |
| Bleichrodt and Johannesson (1997) | Students n = 172 | Back pain and rheumatism | No |
| Stalmeier et al. (1997) | Exp. 1a: n = 48 female students; Exp. 1b: n = 39 high school students; Exp. 2: n = 28 high school students; Exp. 3: n = 64 high school students | Migraine, genetic counseling, prophylactic mastectomy and metastasized breast cancer | Yes |
| Unic et al. (1998) | Healthy women with suspected genetic predisposition to breast cancer n = 54 | Prophylactic mastectomy | Yes |
| Kirsch and McGuire (2000) | General population n = 64 | Several health states | Mixed |
| Martin et al. (2000) | n = 201 cardiovascular disease patients | Own current health | Yes |
| Stalmeier et al. (2001) | n = 176 students and patients; n = 14 patients with migraine; n = 27 patients with esophageal cancer | Living with migraines; living with metastasized cancer | Yes |
| Bleichrodt et al. (2003) | n = 51 students | Back pain | Yes |
| Dolan and Stalmeier (2003) | n = 91 students | EQ-5D state 21223 | Yes |
| van der Pol and Roux (2005) | n = 111 students | Increase in body weight | No |
| Attema and Brouwer (2010) | n = 56 students | EQ-5D state 11221 | Yes |
| Attema and Brouwer (forthcoming) | n = 76 students | EQ-5D state 11221 | Mixed |

that expected utility is often descriptively violated and, hence, this causes distortions in CE estimates as well. An alternative is the tradeoff (TO) method (Wakker and Deneffe, 1996). The TO method elicits the utility function for life duration by determining a sequence of life durations that are equally spaced in utility terms. This is accomplished by asking the subject to compare two lotteries. One lottery consists of a probability p of living x0 more years and a probability 1 − p of living

Table 2
Overview of studies testing independence of ...

| Study | Sample | Method | Health state | Violation of independence (yes/no) |
|---|---|---|---|---|
| ... utility of health state from (utility of) duration under risk | | | | |
| McNeil et al. (1981) | Volunteers n = 37 | SG | Speech loss | Yes |
| Bleichrodt and Johannesson (1997) | Students n = 172 | SG | Back pain and rheumatism | Yes (but not after accounting for imprecision) |
| Bala et al. (1999) | Elderly people n = 114 | SG | Severe shingles pain | Yes |
| Martin et al. (2000) | Cardiovascular disease patients n = 201 | SG | Own current health | Yes |
| ... utility of duration from health state under risk | | | | |
| Miyamoto and Eraker (1988) | Hospital inpatients (various diseases) n = 64 | CE | Own current health | No |
| Bleichrodt and Pinto (2005) | Students n = 51 | TO | Full health, back pain, migraine | No |
| Abellán-Perpinán et al. (2009a) | Representative sample of Spanish adult population n = 656 | CE | 18 EQ-5D states | No |
| ... utility of duration from health state under certainty | | | | |
| Abellán-Perpinán et al. (2006) | Representative samples of Spanish adult population n = 977; n = 300 | TTO | 43 EQ-5D states | Yes |


(a usually higher amount of) R more years (assume the health state is full health everywhere). The other lottery consists of the same probability, p, of living x1 more years and probability 1 − p of living r more years (with R > r). The health state has to be the same for all these durations, for example full health. The amounts of p, x0 , R and r are fixed and the subject is asked for the amount of x1 such that he is indifferent between the two lotteries. For example, if p = 0.5, x0 = 0, R = 50 and r = 40, the subject may indicate to be indifferent if x1 = 5 (note that monotonicity implies that x1 should be greater than x0 , in order to compensate for r being smaller than R). Under several utility theories (including expected utility and prospect theory), it can be shown that this indifference can be evaluated by: U(x1 = 5) − U(x0 = 0) = U(R = 50) − U(r = 40). Then x0 = 0 is substituted by the elicited number x1 = 5 and the value x2 is elicited so that the subject is indifferent between the lottery giving 5 years with p = 0.5 and R = 50 years otherwise, and the lottery giving x2 years with p = 0.5 and r = 40 years otherwise. Suppose x2 = 12 is elicited. This indifference can then be evaluated by U(x2 = 12) − U(x1 = 5) = U(R = 50) − U(r = 40) and it follows that U(x2 = 12) − U(x1 = 5) = U(x1 = 5) − U(x0 = 0). Hence, we know that the utility difference between 0 and 5 years is regarded as equally high as the utility difference between 5 and 12 years, implying diminishing marginal utility over life years. If one repeats this procedure, a utility function over life duration can be elicited. By changing the health state, one can test whether the elicited standard sequence is the same for different health states, as predicted by the QALY model. Bleichrodt and Pinto (2005) used the TO method to consider independence of utility of life duration from quality of life for convenience samples of university students. 
They elicited three utility of life duration functions in a risky situation, using full health and two mild disease states as their health states, and compared the results, which did not reject independence. Finally, the case of independence of utility of life duration from quality of life in a risk-free situation is the one that has been least explored. To date, only one empirical study appears to have investigated this issue (Abellán-Perpinán et al., 2006). This study rejected independence for a sample of the general public in Spain and found more discounting for more severe health states. To reach this conclusion, Abellán-Perpinán et al. (2006) used an indirect method to elicit discounting. They first elicited TTO scores in one sample (n = 977). Then, in another sample (n = 300), they elicited an amount of time such that the subject was indifferent between two health profiles. The TTO scores of the first sample were used to compute for which amount of discounting the elicited values in the second sample were predicted best. Hence, there was no direct elicitation of discounting and it had to be estimated indirectly, with this estimate possibly distorted by other influences. In addition, only a between-subjects comparison was possible. Finally, the general profiles that they compared differed only in the first year, which is a rather short time horizon. In this study we directly investigate utility of life duration by means of the DM (Attema et al., forthcoming). This method directly elicits discounting by asking subjects to compare improving and deteriorating health profiles, which we do here for three different EQ-5D health states, including one very poor health state. This allows for a within-subjects test of independence of discounting from quality of life in a context of certainty, which, to the best of our knowledge, has not been done before. 
We employ a relatively long time horizon of twenty years, thereby capturing the greater influence of discounting for longer time frames. This was also recommended by Abellán-Perpinán et al. (2006, p. 674). Finally, we use a large general population sample, so that inferences for the entire population are possible. The latter is important, since the general public

is a common source of health state valuations to derive QALY weights. 3. Experiment 3.1. Subjects The subject pool consisted of a large sample (n = 1448) of the Dutch adult population. 3.2. Procedure We hired a professional internet sampling company (Survey Sampling International) to program and conduct our experiment. This company has much experience with internet surveys and a large representative database of subjects. The subjects are rewarded with a monetary amount to be given to a charity fund of their choice and a small monetary amount for themselves upon completion of the questionnaire. We first pilot tested our design with 100 subjects, in order to investigate whether or not possible problems might arise (the aforementioned subject pool of n = 1448 does not include these 100 pilot subjects). However, we did not find such problems and the results of the pilot test were similar to those of the full sample. All indifferences were elicited by using sequences of five binary choices, similar to the procedure in Attema et al. (forthcoming), because indifference by choice tends to cause fewer inconsistencies than indifference by matching (Bostic et al., 1990; Hey et al., 2009). Because of the impossibility to use real incentives in the context of health, hypothetical incentives had to be employed, as is common in health economics (e.g., Bleichrodt and Pinto, 2005; Doctor et al., 2004; Wakker and Deneffe, 1996). Hence, we could not pay subjects according to their answers, as can be done in experimental studies involving monetary outcomes. Such studies may use a random incentive system, paying subjects according to one randomly selected choice (Baltussen et al., forthcoming; Cubitt et al., 1998; Morone, 2010; Morone and Schmidt, 2008). 
The evidence regarding the influence of using hypothetical rather than real incentives is mixed (Anderson and Mellor, 2009; Beattie and Loomes, 1997), with some studies finding significant differences (Holt and Laury, 2002) and others not (Abdellaoui et al., 2011; Coller and Williams, 1999), or only increased noise for hypothetical incentives (Camerer and Hogarth, 1999). An advantage of our design, however, is that it is not possible to improve one’s choice set in follow-up questions by strategically misrepresenting one’s answer in the starting questions (e.g., Harrison, 1986). 3.3. Stimuli We classified health states according to the EQ-5D system. This system classifies health states using five dimensions, each consisting of three levels. We elicited the utility of life duration for three different ˇs, namely ˇ = health state 11221 (‘G’, abbreviating good), ˇ = 32211 (‘M’, abbreviating mediocre) and ˇ = 33333 (‘B’, abbreviating bad). The first was chosen because it is a relatively mild health state. The more severe state M was selected to investigate whether a more severe state had an influence on discounting. Finally, the DM was used with state 33333 (B) as stimulus for those subjects who had valued this state as WTD in a TTO task. This allowed for investigating the discounting function in case of WTD health states.  was taken to be state 11111 (full health) in all three elicitations. Appendix 1 gives complete descriptions of these four health states. We investigated the utility function over the next T = 20 years. Therefore, we normalized W(20) = 1. The subjects were told that after this period both options would give the same health state,


without specifying it further. For state G [M, B] we first elicited g1/2 [m1/2, b1/2] such that W(g1/2 [m1/2, b1/2]) = 1/2 × W(20) = 1/2, in the same way as explained for d1/2 above. We further elicited g1/4 [m1/4, b1/4], g1/8 [m1/8, b1/8], g3/4 [m3/4, b3/4] and g7/8 [m7/8, b7/8], with respective utilities 1/4, 1/8, 3/4 and 7/8. Indifferences were elicited up to a precision of one month and analyzed in terms of months. The order of G and M was randomized across subjects. The elicitation of B was always at the end of the questionnaire (or not included at all if the state was evaluated as BTD). The questions of the different tasks were not interspersed; the tasks were administered successively.

In order to exploit the variety in background characteristics of our sample, we ran regressions trying to explain both discounting and differences in discounting for different health states. The data are non-normally distributed (Kolmogorov–Smirnov test: p < 0.01 for all data), so we only report nonparametric statistical tests (i.e., Wilcoxon signed ranks tests). In case of conflicting conclusions, we also give parametric results (i.e., paired t-tests).

3.4. Analysis

Table 3 shows summary statistics of the subjects. These show that our sample was representative in terms of age and gender (CBS, 2011). Our subjects were slightly more likely to have a higher education, however, whereas there were somewhat fewer people with an income above 4000 euro per month. The average health status corresponded to that reported in Lamers et al. (2006), who also used a Dutch general public sample, although the subjects of our sample reported being a little unhealthier according to both the VAS and the five dimensions of the EQ-5D. There was one subject who had missing data for all three investigated health states, so that 1447 subjects were included in the analysis. Of these, 1110 subjects indicated a negative TTO score for B (i.e., considered it to be WTD). Hence, discounting with state B as health state was elicited only for these 1110 subjects. Moreover, 9 subjects did not complete all questions for G, while 11 [3] did not satisfy monotonicity for G [M]. These results were excluded, leaving usable data of 1428 [1445] subjects for G [M]. However, it was not always possible to compute all PMs for all subjects, because about 20% had denominators equal to 0 for PM1/8 (footnote 2) and PM7/8 (footnote 3). This occurred because these subjects gave extreme answers, i.e., always choosing A or always choosing B. These preferences are remarkable, since they may for example imply a preference of (B1, . . ., B230, FH231, . . ., FH240) over (FH1, . . ., FH230, B231, . . ., B240). An explanation for these answer patterns is that (some of) these subjects did not read or understand the instructions, just wanted to finish as soon as possible (this was confirmed in a probit analysis by the presence of a significant negative effect of interview duration on the likelihood of always choosing the same option, p < 0.01), or both. Therefore, we reran the analyses excluding these subjects, but this did not change the conclusions.
Below, we therefore only report the results of the analyses excluding the subjects with extreme answers. The results of a ranking exercise, as well as the results of a VAS, revealed that the majority of subjects indeed regarded G to be the best state, followed by M and B (Ranking: G > B: 95.3%; G > M: 91.7%; M > B: 91.7%; VAS: G > B: 98.5%; G > M: 96.5%; M > B: 96.0%). Table 4 shows summary statistics for the PM estimates. These make clear that the PMs were indeed quite similar, with the PMs of B generally being slightly lower than those of G and M. Moreover, for all states, the PMs were lower at the higher end of the utility curve. In other words, discounting mainly occurred for longer delays, whereas PMs were higher than 0.5 for PM1/8 and PM1/4, indicating negative discounting in this range. Table 5 [6] presents the results of the main tests of this study for the full [reduced] sample. The PMs of B were significantly smaller than the values of G and M for many equations (p < 0.01, except for: PM1/8 [M vs. B], PM3/4 [M vs. B] and PM7/8 [M vs. B and G vs.

If discounting does not depend on the stimulus health state, the values of g for a given superscript should be the same as the values of m and b for the corresponding superscript. However, except for 1/2, the measurements are chained, so g [m, b]-values depend on previous answers. The direct tests therefore do not constitute independent tests and, hence, we compare the results for the three health states using proportional matches (PMs). These were introduced by Miyamoto and Eraker (1988) and later on employed by Abellán-Perpinán et al. (2009a), Attema et al. (forthcoming), and van Osch et al. (2006). A PM corrects for the value of an input, if this is an answer to an earlier question, by an appropriate division. For example, the PM of d1/4 expresses the value of d1/4 relative to the value of d1/2, since T = d1/2 is used in the elicitation of d1/4. This way we correct for the possibility of d1/4 being higher merely because T = d1/2 happened to be higher. We test the equality of the following five PMs:

\[
\begin{aligned}
\mathrm{PM}_{1/2}:&\quad g^{1/2} = m^{1/2} = b^{1/2} \\
\mathrm{PM}_{1/4}:&\quad \frac{g^{1/4}}{g^{1/2}} = \frac{m^{1/4}}{m^{1/2}} = \frac{b^{1/4}}{b^{1/2}} \\
\mathrm{PM}_{3/4}:&\quad \frac{g^{3/4} - g^{1/2}}{240 - g^{1/2}} = \frac{m^{3/4} - m^{1/2}}{240 - m^{1/2}} = \frac{b^{3/4} - b^{1/2}}{240 - b^{1/2}} \\
\mathrm{PM}_{1/8}:&\quad \frac{g^{1/8}}{g^{1/4}} = \frac{m^{1/8}}{m^{1/4}} = \frac{b^{1/8}}{b^{1/4}} \\
\mathrm{PM}_{7/8}:&\quad \frac{g^{7/8} - g^{3/4}}{240 - g^{3/4}} = \frac{m^{7/8} - m^{3/4}}{240 - m^{3/4}} = \frac{b^{7/8} - b^{3/4}}{240 - b^{3/4}}
\end{aligned}
\tag{8}
\]
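As an illustration of the proportional matches in Eq. (8), the following minimal Python sketch computes the five PMs for one subject and one stimulus health state. The function and key names are hypothetical. Dividing the 1/2 value by the 240-month horizon is an assumption inferred from the scale of the PM1/2 values in Table 4, and returning `None` for a zero denominator is only one possible way to handle the extreme answers described in the text.

```python
HORIZON = 240  # total horizon in months (20 years)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def proportional_matches(d):
    """Compute the five PMs for one subject and one health state.

    `d` maps a utility level ("1/8", "1/4", "1/2", "3/4", "7/8") to the
    elicited indifference value in months (hypothetical input layout).
    """
    return {
        # Assumption: normalized by the horizon, as the Table 4 scale suggests.
        "PM1/2": d["1/2"] / HORIZON,
        "PM1/4": _ratio(d["1/4"], d["1/2"]),
        "PM3/4": _ratio(d["3/4"] - d["1/2"], HORIZON - d["1/2"]),
        "PM1/8": _ratio(d["1/8"], d["1/4"]),
        "PM7/8": _ratio(d["7/8"] - d["3/4"], HORIZON - d["3/4"]),
    }


# A subject with linear utility of life duration has every PM equal to 0.5.
linear = {"1/8": 30, "1/4": 60, "1/2": 120, "3/4": 180, "7/8": 210}
print(proportional_matches(linear))
```

Under this sketch, PMs below 0.5 correspond to positive discounting in the relevant range and PMs above 0.5 to negative discounting.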

The number ‘240’ stems from the total horizon under consideration (remember that we analyzed in terms of months, so g1 = T = 20 years = 240 months). We also compare the areas under the normalized utility functions that result from linear interpolation between known utilities. A higher magnitude of this area is associated with a higher degree of concavity. Therefore, this allows investigating differences in the degree of concavity among different stimulus health states. The maximum value of this area is 240 (240 months × 1 utility unit) and if an individual has a linear utility function for life duration, the magnitude of her area is 1/2 × 240 = 120, reflecting the 45° line that represents a linear function. Following Attema et al. (forthcoming), the areas are computed by:

\[
240 - 240 \times \left( \tfrac{1}{8}\, d^{1/8} + \tfrac{3}{16}\, d^{1/4} + \tfrac{1}{4}\, d^{1/2} + \tfrac{3}{16}\, d^{3/4} + \tfrac{1}{8}\, d^{7/8} \right)
\tag{9}
\]

However, keep in mind that comparing the areas under the curve may be biased because of the chaining procedure, hence overstating differences due to an initial difference in d1/2. Therefore, we approximate this measure by combining the five PMs and comparing these between the different health states.
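The area measure can be illustrated with a short Python sketch. It integrates the linearly interpolated utility curve directly with the trapezoid rule rather than transcribing Eq. (9), so it should be read as one possible implementation under that assumption. The function name and input layout are hypothetical.

```python
HORIZON = 240  # months
LEVELS = (1 / 8, 1 / 4, 1 / 2, 3 / 4, 7 / 8)  # utilities of the elicited points


def area_under_utility(d_values):
    """Area under the piecewise-linear normalized utility curve.

    `d_values` holds the elicited months for utilities 1/8, 1/4, 1/2, 3/4
    and 7/8, in that order. The curve is anchored at (0, 0) and (240, 1).
    """
    times = [0.0, *d_values, float(HORIZON)]
    utils = [0.0, *LEVELS, 1.0]
    area = 0.0
    for i in range(len(times) - 1):
        width = times[i + 1] - times[i]
        mean_height = (utils[i] + utils[i + 1]) / 2
        area += width * mean_height  # trapezoid between adjacent points
    return area


linear_area = area_under_utility([30, 60, 120, 180, 210])
concave_area = area_under_utility([20, 45, 100, 165, 200])
print(linear_area, concave_area)
```

For a linear utility function the sketch returns the benchmark of 120 stated in the text, and earlier indifference points (a more concave curve) give a larger area.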

4. Results

2 G: 275 (of which 15 had the same problem for PM7/8) of 1428 = 19.3%; M: 298 (of which 3 had the same problem for PM7/8) of 1445 = 20.6%; B: 299 (of which 9 had the same problem for PM7/8) of 1110 = 26.9%.

3 G: 355 of 1428 = 24.9%; M: 297 of 1445 = 20.6%; B: 212 of 1110 = 19.1%.


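Table 3 Summary statistics (n = 1447).

| Variable | Percentage | Mean | SD | Minimum | Maximum |
|---|---|---|---|---|---|
| Age | | 45.8 | 16.2 | 18 | 88 |
| Gender (% male) | 48.2 | | | | |
| Children (% yes) | 57.2 | | | | |
| Number of children (among people with children, n = 816) | | 2.22 | | 1 | 15 |
| Income groups ([…] 3999), five brackets in listed order | 12.9; 34.1; 29.4; 14.9; 8.8 | | | | |
| Education: Lower | 23.8 | | | | |
| Education: Middle | 38.1 | | | | |
| Education: Higher | 38.1 | | | | |
| Health status: EQ-5D (Dutch tariff) | | 0.86 | 0.22 | −0.33 | 1 |
| Health status: VAS | | 78.1 | 17.4 | 0 | 100 |
| Completion time (min) | | 32.1 | 14.2 | 6.1 | 125.7 |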

Table 4 PM values.

| Health state | Measure | PM1/8 | PM1/4 | PM1/2 | PM3/4 | PM7/8 |
|---|---|---|---|---|---|---|
| G | Mean (variance) | 0.56 (0.07) | 0.58 (0.06) | 0.46 (0.05) | 0.37 (0.09) | 0.30 (0.08) |
| G | Median | 0.51 | 0.52 | 0.48 | 0.45 | 0.24 |
| G | Interquartile range (IQR) | 0.45–0.76 | 0.48–0.77 | 0.33–0.52 | 0.05–0.52 | 0.02–0.50 |
| M | Mean (variance) | 0.57 (0.07) | 0.57 (0.06) | 0.44 (0.04) | 0.35 (0.08) | 0.31 (0.09) |
| M | Median | 0.52 | 0.52 | 0.45 | 0.36 | 0.24 |
| M | IQR | 0.45–0.77 | 0.45–0.76 | 0.33–0.52 | 0.05–0.52 | 0.02–0.51 |
| B | Mean (variance) | 0.55 (0.06) | 0.54 (0.06) | 0.40 (0.04) | 0.34 (0.08) | 0.32 (0.09) |
| B | Median | 0.50 | 0.52 | 0.45 | 0.38 | 0.32 |
| B | IQR | 0.44–0.67 | 0.44–0.64 | 0.23–0.48 | 0.05–0.52 | 0.02–0.51 |

Table 5 Results of Wilcoxon signed ranks tests.

| Test | G vs. M: Z | G vs. M: n | G vs. B: Z | G vs. B: n | M vs. B: Z | M vs. B: n |
|---|---|---|---|---|---|---|
| PM1/8 | 1.01 | 1130 | 2.54b | 805 | 0.52 | 815 |
| PM1/4 | 1.30 | 1428 | 5.62a | 1096 | 3.71a | 1108 |
| PM1/2 | 1.75d | 1428 | 7.41a | 1096 | 5.17a | 1108 |
| PM3/4 | 2.25c | 1428 | 3.70a | 1096 | 0.96 | 1108 |
| PM7/8 | 0.14 | 1097 | 0.58 | 838 | 1.04 | 877 |
| Area | −2.83a | 1428 | −8.64a | 1096 | −5.89a | 1108 |

a Significant at the 1% level.
b Significant at the 5% level according to the Wilcoxon signed ranks test, but not significant according to the paired t-test.
c Significant at the 5% level according to the Wilcoxon signed ranks test, and at the 1% level according to the paired t-test.
d Not significant according to the Wilcoxon signed ranks test, but significant at the 5% level according to the paired t-test.

Table 6 Results of Wilcoxon signed ranks tests excluding subjects with extreme answers.

| Test | G vs. M: Z | G vs. M: n | G vs. B: Z | G vs. B: n | M vs. B: Z | M vs. B: n |
|---|---|---|---|---|---|---|
| PM1/8 | −0.65 | 688 | 1.60 | 480 | 1.31 | 485 |
| PM1/4 | −0.67 | 688 | 3.15a | 480 | 2.85b | 485 |
| PM1/2 | 0.78 | 688 | 6.41a | 480 | 5.58a | 485 |
| PM3/4 | 1.86d | 688 | 0.61 | 480 | −0.73 | 485 |
| PM7/8 | −0.65 | 688 | −3.05a | 480 | −1.84d | 485 |
| Area | −2.13c | 688 | −7.10a | 480 | −5.53a | 485 |

a Significant at the 1% level.
b Significant at the 1% level according to the Wilcoxon signed ranks test, and at the 5% level according to the paired t-test.
c Significant at the 5% level according to the Wilcoxon signed ranks test, and at the 1% level according to the paired t-test.
d Not significant according to the Wilcoxon signed ranks test, but significant at the 5% level according to the paired t-test.


Fig. 1. Mean utility function (vertical axis: U(X), from 0 to 1; horizontal axis: X, from 0 to 1; series: State G, State M, State B).

Table 7 Median split.

| | TTO G above median | TTO G below median |
|---|---|---|
| TTO M above median | 1 | 2 |
| TTO M below median | 3 | 4 |

B]; p > 0.30; see footnote 4). However, the values of G did not differ significantly from the values of M for all but one comparison (p > 0.08, except for PM3/4, p = 0.02). After Bonferroni correction, that difference was also no longer significant. Hence, although the evidence of heavier discounting of B is robust, the evidence for heavier discounting of M than of G is mixed (Table 6). Fig. 1 shows the means of the three elicited utility of life duration functions. All three areas were significantly higher than 120 and, hence, linear utility was rejected in favor of concave utility (i.e., positive discounting, p < 0.01). The area under the normalized utility of life duration function was highest for B, confirming the results of comparing the separate PMs. The difference was significant both when compared to M and when compared to G (p < 0.01). In addition, the area under M was significantly higher than the area under G (p < 0.01), but, as highlighted before, comparing areas may overstate the differences in discounting. This is confirmed by a nonsignificant difference between G and M when comparing all PMs together for the reduced sample (n = 3440, p = 0.825, but not for the full sample [n = 6511, p = 0.014]). However, the conclusion regarding the comparison of B with G and M did not change when comparing all PMs together (p < 0.01 for both samples). We performed an additional test of independence for states G and M by using the elicited TTO scores for these states. In particular, we used a median split, dividing the sample into two halves, with one half containing subjects with TTO G above the median and the other containing the subjects with TTO G below the median. We did the same for TTO M, resulting in Table 7. If state M were to be discounted more heavily because of it being a more severe state than state G, this should be reflected by a higher violation of independence rate for subjects who perceived more ‘distance’ between these two health states.
That is, the subjects in group 2 [3] of Table 7 should be less [more] susceptible to a violation of independence. Therefore, we performed the same comparison of

4 The lack of significance for PMs 1/8 and 7/8 may be related to the possibly reduced answer space for these tasks.


PMs for these separate groups, as reported in Table 8. It makes clear that the hypothesized ‘distance effect’ was not present. Instead, subjects whose TTO M was below the median were significantly more likely to violate independence than those whose score was above the median, irrespective of their TTO G. Since the median TTO M was close to 0 (−0.02), this suggests that it is the nature of the health state (i.e., being better or worse than dead) that matters, instead of the distance per se. Our results lend support to the assertion that MET preferences may distort the estimated discounting functions. The valuation of state B in particular appears to be strongly time-dependent. Although most subjects evaluated this health state as WTD according to a TTO task (82.4%), only 16.4% of the subjects did so in the rating task (VAS). The same is true, albeit to a smaller degree, for states G (VAS: 6.2%; TTO: 31.5%) and M (VAS: 10.0%; TTO: 50.6%). An explanation for this observation may be that individuals tend to neglect duration in the VAS and prefer it above death in that case, whereas in a TTO they start to realize that after some time the health state will be tough to handle (Robinson et al., 1997). Therefore, we cannot exclude the possibility that pure time preference is actually independent from quality of life, but that instead quality of life is dependent on duration (and in fact may switch from BTD to WTD), or, which may be most likely, that they are mutually dependent. On the other hand, subjects may have regarded the utility of B as negative even for short durations, but thought the state would be followed by full health in the VAS, and therefore gave it a positive value. Table 9 presents the results of stepwise OLS regressions explaining the variables g1/2 , m1/2 and b1/2 , and the areas under the normalized utility functions for G, M and B, by the available background variables (the results for the full sample are presented in Table A2 of Appendix 3). 
In general, women discount more than men, and older people discount more than younger people. In addition, people with at least one child discount more than people without children for state G, and the VAS rating of own health had a negative effect on discounting of state B, i.e., healthier people discount less. The TTO score of the state had a highly significant negative impact on discounting for states M and B. That is, the worse one evaluated the health state according to the TTO task, the more steeply it was discounted. This reinforces the finding of steeper discounting of more severe states: people tend to have stronger preferences for postponement of severe health states than of milder health states. Furthermore, subjects who had problems washing or dressing themselves discounted state G less heavily than subjects who did not have those problems; subjects who had a BTD VAS score for state M, but a WTD TTO score (perhaps due to MET), discounted less if state M was the stimulus health state, as expected; finally, individuals with a higher income and a low response time had a lower discount rate in case of state B. Moreover, we performed a regression to examine whether differences in background had predictive ability regarding violation of independence. To this end, we conducted an OLS regression of the absolute value of the difference of the PMs on the background variables, giving the results shown in Table 10 (the results for the full sample are presented in Table A3 of Appendix 3). They highlight the importance of task effort, since subjects who spent less than fifteen minutes on the questionnaire were clearly more likely to violate independence (note that this analysis already excludes the subjects with extreme answers, who had a significantly shorter response time). Moreover, higher educated people had a significantly smaller deviation from independence than lower educated ones. This stresses that eliciting discounting is quite a cognitively burdensome task.
Further, older subjects as well as anxious or depressed subjects had a higher tendency to violate independence. To the contrary, subjects with pain had smaller violations of


Table 8 Comparison PMX and PMY according to TTO score.

| Test | Group 1: Z | Group 1: n | Group 2: Z | Group 2: n | Group 3: Z | Group 3: n | Group 4: Z | Group 4: n |
|---|---|---|---|---|---|---|---|---|
| PM1/8 | −0.64 | 426 | −1.21 | 141 | 1.12 | 89 | 2.07b | 285 |
| PM1/4 | −0.50 | 477 | −0.04 | 172 | 3.36c | 116 | 0.74 | 413 |
| PM1/2 | −1.19 | 477 | 0.58 | 172 | 2.02b | 116 | 1.41 | 413 |
| PM3/4 | 1.74a | 477 | 1.47 | 172 | 1.07 | 116 | 1.03 | 413 |
| PM7/8 | 0.22 | 382 | −0.11 | 132 | 0.17 | 86 | 0.38 | 312 |
| All PMs | −0.54 | 2239 | −0.17 | 789 | 3.23c | 523 | 2.44b | 1836 |
| Area | 0.27 | 477 | −0.83 | 172 | −2.83c | 116 | −2.37b | 413 |

a Significant at 10% level.
b Significant at 5% level.
c Significant at 1% level.

Table 9 Results of OLS explaining discounting.

2

Adj. R Age TTO score state Dummy female VAS Own health Dummy at least one child Problems washing or dressing yourself VAS M BTD; TTO M WTD Dummy high income Dummy duration <15 min

G0.5 (n = 782)

M0.5 (n = 839)

B0.5 (n = 607)

Area G (n = 782)

Area M (n = 839)

Area B (n = 607)

0.01

0.041

0.047 T = −3.04 (p < 0.01) T = 2.78 (p < 0.01)

0.009 T = 2.78 (p < 0.01)

0.067 T = 3.23 (p < 0.01) T = −6.5 (p < 0.01) T = 2.79 (p < 0.01)

0.066 T = 4.03 (p < 0.01) T = −2.11 (p = 0.036) T = 3.41 (p < 0.01) T = −2.50 (p = 0.013)

T = 5.78 (p < 0.01) T = −2.43 (p = 0.015)

T = 2.09 (p = 0.037) T = −2.25 (p = 0.025) T = 2.19 (p = 0.029) T = −2.75 (p < 0.01)

T = 2.46 (p = 0.014) T = 2.35 (p = 0.019)

independence when comparing G to B. Higher TTO scores for states G and M were negatively related to the absolute difference between discounting for state G and discounting for state M; hence, the higher the TTO score, the lower the absolute difference. Finally, if state G or state M had a BTD VAS score but also a WTD TTO score, the violation of independence was higher. The values of d1/2 may give some indication of the height of the implied discount rates. The mean estimate of g1/2 was 110.1 months. Assuming exponential discounting, this implies

\[
\int_{0}^{110.1} e^{-rt}\,dt = \int_{110.1}^{240} e^{-rt}\,dt,
\]

which can be solved for r, yielding r = 0.017 per year. Similarly, the mean estimate of m1/2 [b1/2] was 106.4 [96.2] months, implying r = 0.023 [0.041] per year.
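The implied discount rate can be recovered numerically. The sketch below is a minimal illustration, not the authors' procedure: it multiplies the indifference condition by r to obtain 2e^{−rd} − e^{−rT} − 1 = 0, solves for the monthly rate by bisection, and annualizes by multiplying by 12 (an assumption about how the yearly figure is obtained). The function name is hypothetical.

```python
import math


def implied_annual_rate(midpoint, horizon=240.0):
    """Annual exponential discount rate implied by a utility midpoint in months.

    Solves 2*exp(-r*d) - exp(-r*T) - 1 = 0 for the monthly rate r > 0, which
    is the indifference condition after integrating exp(-r*t) on both sides.
    Only midpoints below half the horizon imply positive discounting.
    """
    if not 0 < midpoint < horizon / 2:
        raise ValueError("midpoint must lie strictly between 0 and horizon / 2")

    def f(r):
        return 2 * math.exp(-r * midpoint) - math.exp(-r * horizon) - 1

    lo, hi = 1e-9, 1.0  # f(lo) > 0 and f(hi) < 0 for valid midpoints
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    monthly = (lo + hi) / 2
    return 12 * monthly  # assumption: simple conversion from months to years


for label, months in (("G", 110.1), ("M", 106.4), ("B", 96.2)):
    print(label, round(implied_annual_rate(months), 3))
```

A lower midpoint implies a higher rate, which matches the ordering of the rates reported for G, M and B.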

5. Discussion Our results confirm the findings of Abellán-Perpinán et al. (2006) for a different elicitation method and another country. Therefore, heavier discounting of WTD states than BTD states appears to be a robust phenomenon in medical decision making under certainty. However, comparing the two least severe health states (G and M) we find no robust violation of independence. The consideration of a health state as better or WTD seems to be of particular importance. This may help explain why Bleichrodt and Pinto (2005) did not find violations of independence for mild health states. The results of Abellán-Perpinán et al. (2009a), contrary to those of Abellán-Perpinán et al. (2006) and ours, indicated no violations of independence using a within-subjects design for a Spanish general population sample. However, Abellán-Perpinán et al. (2009a) only compared the subjects who valued both states as BTD (they did not report any test comparing subjects who valued both states WTD). These results are therefore not completely at odds with ours: we also could not firmly reject independence when comparing only BTD health states (although we did find more concavity for M than for G when comparing the areas under the normalized utility functions). It seems that the consideration of states

regarded as either BTD or WTD causes a systematic change in people’s decision making processes. Another explanation for the discrepancy between the results of Abellán-Perpinán et al. (2009a) and those of Abellán-Perpinán et al. (2006) and ours may be the difference in elicitation methods, i.e., certainty equivalents (risk) in the former versus health profiles (no risk) in the latter two. Several factors come into play that may affect our choice situation. One factor is MET. This would imply that discounting does not (only) depend on the health state, but instead (also) the utility function is time dependent, especially for poor health states. This highlights the need for a better understanding of preferences for health states around death. It seems desirable to first determine whether the value of a poor health state is approximately constant and if not, to elicit the switching time point where the health state value changes. This may for example avoid erroneously accepting CPTO (Dolan and Stalmeier, 2003) or, in our case, erroneously rejecting universality of the discount function. However, as shown in Appendix 2, neglecting MET would generate apparently lower discounting for health states that are more susceptible to MET, whereas we find the opposite pattern of more discounting for more severe states (which are, arguably, more affected by MET). Our conclusion therefore seems not to be confounded by MET. An issue that research on independence (including ours) tends to neglect is that both functions are interdependent and, consequently, the performed tests are actually not valid. For example, if we test whether the utility function for life duration is independent from the health state by eliciting utility of life duration for different health states, we are assuming that the value of these health states does not depend on their duration.
If the utility functions for life duration and health quality both depend on each other, then we are not able to separate them and have to consider the function U(T,Q) in its totality, without unraveling its components. One possibly influential factor is related to prospect theory, which states that people form reference points and evaluate outcomes as deviations from these reference points (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992). Positive deviations are considered gains and negative deviations are considered losses,

T = 2.51 (p = 0.012)

T = 2.10 (p = 0.036)

T = 2.24 (p = 0.026)

0.059 T = 2.78 (p < 0.01) T = −3.84 (p < 0.01)

T = 2.41 (p = 0.016) T = −2.76 (p < 0.01) T = 2.81 (p < 0.01)

T = −2.10 (p = 0.036)

T = 2.60 (p = 0.01)

T = 3.03 (p < 0.01)

T = −2.22 (p = 0.027)

0.007 0.052 T = 2.60 (p = 0.01) T = −2.49 (p = 0.013) 0.059 T = 3.99 (p < 0.01) T = −2.78 (p < 0.01) 0.028 T = 2.91 (p < 0.01)

0.063 T = 3.00 (p < 0.01) T = −2.08 (p = 0.038) T = 2.29 (p = 0.022) T = 2.67 (p < 0.01)

Adj. R Dummy response time <15 min Dummy higher education Age Dummy depressed Dummy pain VAS G BTD; TTO GWTD VAS M BTD; TTO M WTD TTO score state G TTO score state M

Abs (Area G − Area B) (n = 464) Abs (Area G − Area M) (n = 661) Abs (M1/2 − B1/2 ) (n = 482) Abs (g1/2 − B1/2 ) (n = 464) Abs (g1/2 − M1/2 ) (n = 661)

Table 10 Results of OLS explaining differences in discounting of different health states (T-values).

2

Abs (Area M − Area B) (n = 482)


with losses looming larger than gains. This phenomenon is termed loss aversion and causes losses to receive more weight than commensurate gains (Tversky and Kahneman, 1991). Loss aversion may also be relevant for sequences of outcomes, such as a stream of monetary or health outcomes. Consider a series of wage receipts. People often tend to regard their last earned income as their reference point and the subsequent payment will be compared to this reference income. If it is higher, it will be seen as a gain and if it is lower, it will be considered a loss. Due to loss aversion, deteriorating sequences get less overall utility than improving sequences, even when the former have higher net present values for any positive discount rate than the latter (Kapteyn and Teppa, 2003; Loewenstein and Prelec, 1991, 1993; Manzini et al., 2010; Read and Powell, 2002). In other words, loss aversion is then equivalent to a negative time preference rate. However, since all health states are losses compared to full health, this would imply a similar preference for improving sequences for all three health states, in contrast to our findings. Another explanation is that decision makers compare a sequence to a reference sequence, i.e., to their expectations about how outcomes usually change over time. Because people often experience increasing incomes over time, this may explain their preference for increasing income sequences, while at the same time preferring deteriorating health sequences, with health generally expected to worsen over time (Chapman, 1996a). In particular, people have expectations regarding their health state at different ages (Stolk et al., 2002; Tsuchiya, 2000) and subjects may form different reference health states for different ages (Brouwer et al., 2005). An 80-year-old man is not expected to have the same health state as a 20-year-old. Consequently, the reference QALY weight given to the 80-year-old may be, say, 0.5, whereas the QALY weight given to the 20-year-old may be 1.
In our choice task, a subject may expect his health to deteriorate in the next 20 years and therefore may choose the deteriorating sequence over the improving sequence, implying positive discounting. Because all three of our tasks compared an imperfect health state to full health, a preference for a deteriorating sequence would imply similar patterns in all three tasks. This suggests that the existence of expectations cannot explain our results. The formation of reference points may also have been such that subjects considered state B a loss and state G a gain. There exists empirical evidence suggesting that losses are discounted at a lower rate than gains, both for money (Abdellaoui et al., 2010; Benzion et al., 1989; Thaler, 1981) and for health (Chapman, 1996b; Hardisty and Weber, 2009; Khwaja et al., 2007). However, this would be a reason for less discounting of state B than of states G and M, instead of more. WTD health states may also activate a different problem-solving mechanism, automatically leading to a different kind of answer. Perhaps people have difficulties imagining being in such a miserable health state as EQ-5D state 33333, causing them to prefer farther postponement of this state. This will increase observed discounting. Furthermore, individuals could have regarded the health states G and M as more realistic for themselves than state B, reducing the wish to postpone them. On the other hand, they may not realistically expect to regain perfect health after being in the worst possible state and therefore prefer the deteriorating health profile over the improving one. Finally, subjects may have considered not actually (or consciously) having to live in state B if first living in the better health state, potentially considering sedation or even suicide, to avoid really experiencing B for longer periods of time. This would increase the tendency to choose the option starting with full health.
A limitation of this study is that the use of internet may have decreased response reliability. Because effort could not be


Table A1 Choices of an imaginary subject in our experiment.

| Stimuli option 1 | Utility option 1 | Stimuli option 2 | Utility option 2 | Choice |
|---|---|---|---|---|
| (FH1, . . ., FH120, B121, . . ., B240) | W(120) × V(FH) + [W(240) − W(120)] × V+(B) = 0.525 × 1 + (1 − 0.525) × 0.2 = 0.62 | (B1, . . ., B120, FH121, . . ., FH240) | W(120) × V+(B) + [W(240) − W(120)] × V(FH) = 0.525 × 0.2 + (1 − 0.525) × 1 = 0.58 | Option 1 |
| (FH1, . . ., FH60, B61, . . ., B240) | W(60) × V(FH) + [W(180) − W(60)] × V+(B) + [W(240) − W(180)] × V−(B) = 0.269 × 1 + (0.768 − 0.269) × 0.2 + (1 − 0.768) × −0.3 = 0.299 | (B1, . . ., B60, FH61, . . ., FH240) | W(60) × V+(B) + [W(240) − W(60)] × V(FH) = 0.269 × 0.2 + (1 − 0.269) × 1 = 0.785 | Option 2 |
| (FH1, . . ., FH90, B91, . . ., B240) | W(90) × V(FH) + [W(210) − W(90)] × V+(B) + [W(240) − W(210)] × V−(B) = 0.399 × 1 + (0.886 − 0.399) × 0.2 + (1 − 0.886) × −0.3 = 0.462 | (B1, . . ., B90, FH91, . . ., FH240) | W(90) × V+(B) + [W(240) − W(90)] × V(FH) = 0.399 × 0.2 + (1 − 0.399) × 1 = 0.681 | Option 2 |
| (FH1, . . ., FH105, B106, . . ., B240) | W(105) × V(FH) + [W(225) − W(105)] × V+(B) + [W(240) − W(225)] × V−(B) = 0.462 × 1 + (0.943 − 0.462) × 0.2 + (1 − 0.943) × −0.3 = 0.541 | (B1, . . ., B105, FH106, . . ., FH240) | W(105) × V+(B) + [W(240) − W(105)] × V(FH) = 0.462 × 0.2 + (1 − 0.462) × 1 = 0.630 | Option 2 |
| (FH1, . . ., FH112, B113, . . ., B240) | W(112) × V(FH) + [W(232) − W(112)] × V+(B) + [W(240) − W(232)] × V−(B) = 0.492 × 1 + (0.970 − 0.492) × 0.2 + (1 − 0.970) × −0.3 = 0.578 | (B1, . . ., B112, FH113, . . ., FH240) | W(112) × V+(B) + [W(240) − W(112)] × V(FH) = 0.492 × 0.2 + (1 − 0.492) × 1 = 0.607 | Option 2 |

Indifference value: 116
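The utilities in Table A1 can be checked with a short Python sketch. It assumes the exponential W(T) with r = 0.2 from Appendix B, V(FH) = 1, V+(B) = 0.2 and V−(B) = −0.3 as used in the table, and a switch from V+ to V− after 120 consecutive months in state B, which is inferred from the terms in the table rather than stated here. All function names are hypothetical.

```python
import math

HORIZON = 240  # months
R = 0.2        # discount parameter of the example decision maker
MET = 120      # months in state B before its value turns negative (inferred)
V_FH, V_B_POS, V_B_NEG = 1.0, 0.2, -0.3


def W(t):
    """Normalized exponential utility of life duration, with W(240) = 1."""
    return (1 - math.exp(-R * t / HORIZON)) / (1 - math.exp(-R))


def weight(start, end):
    """Utility weight of the period from month `start` to month `end`."""
    return W(end) - W(start)


def option1_utility(d):
    """Full health for the first d months, then state B until month 240."""
    switch = min(d + MET, HORIZON)  # month at which B becomes worse than dead
    return (weight(0, d) * V_FH
            + weight(d, switch) * V_B_POS
            + weight(switch, HORIZON) * V_B_NEG)


def option2_utility(d):
    """State B for the first d months, then full health until month 240."""
    switch = min(d, MET)
    return (weight(0, switch) * V_B_POS
            + weight(switch, d) * V_B_NEG
            + weight(d, HORIZON) * V_FH)


def choice(d):
    """Option with the higher utility for a given switching month d."""
    return "Option 1" if option1_utility(d) > option2_utility(d) else "Option 2"


for d in (120, 60, 90, 105, 112):
    print(d, round(option1_utility(d), 3), round(option2_utility(d), 3), choice(d))
```

Small differences in the third decimal relative to the table can arise because the table works with rounded W values.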

monitored and it could not be verified whether subjects fully understood the choice task, some subjects may not have taken the task seriously or misunderstood the instructions. However, we think that these problems are outweighed by the cheaper and more efficient way of data assembly using an internet panel. This also enabled a large and representative sample, which would be prohibitively expensive if all subjects had been interviewed personally. Finally, the subjects were paid only a very small sum and, hence, were likely to be intrinsically motivated. Moreover, while our results indicate that motivation (in terms of completion time) was important, removing subjects with short durations did not alter our main results. The use of an internet experiment also implied the lack of any qualitative analyses. Hence, we were not able to test what factors respondents were considering in their decision making. Future studies should, therefore, preferably use personalized interviews, allowing for an exploration of the factors that may underpin people’s decision making process. Another limitation of our study is that our test included only three health states, of which only one was considered WTD on average. Therefore, our conclusion regarding differential discounting for BTD and WTD states is not very robust and may instead be influenced by particular peculiarities of the tested health state. A test that includes a larger number of WTD states would thus be useful. A drawback of the elicitation method is that it may be prone to the earlier mentioned sequencing effects. That is, if respondents have a preference for improving sequences that is distinct from time preference, the method is not able to disentangle these two preferences. This may have biased the elicited discount rates downwards.
In addition, since our experiment used health outcomes, we were not able to implement real incentives, let alone an incentive compatible payment scheme (e.g., a scheme comparable to the one of Carbone and Hey, 2004). This may have reduced the salience of the experimental design, but we are not aware of a more salient design for health outcomes. However, when applying the Direct Method to monetary outcomes we recommend the use of more salient, real incentives. We conclude that the assumption of independence underlying the QALY model appears to hold for health states BTD, but this does not need to be the case for states WTD. This finding has implications for using the QALY model and may require using state-dependent utility of life duration functions, at least one for BTD states and one for WTD states.

Appendix A. Health state descriptions

State G:

• You have no problems in walking about.
• You have no problems washing or dressing yourself.
• You have some problems with performing your usual activities.
• You have moderate pain or other discomfort.
• You are not anxious or depressed.

State M:

• You are confined to bed.
• You have some problems washing or dressing yourself.
• You have some problems with performing your usual activities.
• You have no pain or other discomfort.
• You are not anxious or depressed.

State B:

• You are confined to bed.
• You are unable to wash or dress yourself.
• You are unable to perform your usual activities.
• You have extreme pain or other discomfort.
• You are extremely anxious or depressed.

Full health:

• You have no problems in walking about.
• You have no problems washing or dressing yourself.
• You have no problems with performing your usual activities.
• You have no pain or other discomfort.
• You are not anxious or depressed.

Table A2 Results of OLS explaining discounting (full sample).
Columns: G0.5 (n = 1361) | M0.5 (n = 1395) | B0.5 (n = 1165) | Area G (n = 1350) | Area M (n = 1395) | Area B (n = 1109)
Rows: Adj. R2 | Age | TTO score state | Dummy left option is most impatient | Dummy female | Dummy high education | VAS state
Adj. R2: 0.003 | 0.033 | 0.003 | 0.055 | 0.05 | 0.046
T-statistics, in source order:
T = 2.23 (p = 0.026)
T = 5.04 (p < 0.01); T = −4.75 (p < 0.01)
T = −3.91 (p < 0.01); T = 3.06 (p < 0.01); T = −6.15 (p < 0.01)
T = 2.99 (p < 0.01); T = −5.22 (p < 0.01); T = 5.69 (p < 0.01)
T = 4.39 (p < 0.01); T = −2.67 (p < 0.01); T = 4.63 (p < 0.01)
T = 2.30 (p = 0.022)
T = 3.12 (p < 0.01)
T = 2.12 (p = 0.034); T = −2.06 (p = 0.04)
T = −2.17 (p = 0.03)

Table A3 Results of OLS explaining differences in discounting of different health states (full sample).
Columns: Abs (G1/2 − M1/2) (n = 1330) | Abs (G1/2 − B1/2) (n = 1055) | Abs (M1/2 − B1/2) (n = 1086) | Abs (Area G − Area M) (n = 1330) | Abs (Area G − Area B) (n = 1055) | Abs (Area M − Area B) (n = 1086)
Rows: Adj. R2 | Dummy higher education | Dummy middle education | Age | Dummy depressed | VAS G BTD; TTO G WTD | VAS M BTD; TTO M WTD | TTO score state G | VAS score B | Dummy high income | Dummy left option is most impatient option for G
Adj. R2: 0.054 | 0.044 | 0.042 | 0.064 | 0.052 | 0.038
T-statistics, in source order:
T = −4.67 (p < 0.01)
T = −2.92 (p < 0.01)
T = −5.45 (p < 0.01)
T = −4.53 (p < 0.01)
T = −3.24 (p < 0.01)
T = −4.81 (p < 0.01)
T = −2.99 (p < 0.01)
T = −2.21 (p = 0.027); T = 2.28 (p = 0.023)
T = −2.45 (p = 0.014); T = 2.65 (p < 0.01); T = 2.26 (p = 0.024); T = 6.014 (p < 0.01)
T = −2.18 (p = 0.03)
T = 2.17 (p = 0.03); T = 6.65 (p < 0.01); T = 2.40 (p = 0.017)
T = −5.23 (p < 0.01)
T = −6.32 (p < 0.01); T = 3.77 (p < 0.01)
T = −2.86 (p < 0.01)
T = 1.98 (p = 0.048)
T = 3.45 (p < 0.01); T = −3.05 (p < 0.01)

Appendix B. Example demonstrating that MET preferences may generate apparently lower discounting

Suppose a decision maker has an exponential utility function for life duration: $W(T) = (1 - e^{-rT/240})/(1 - e^{-r})$, where r denotes the discount rate, which is equal to 0.2 for this decision maker. In addition, she has a maximum endurable time of 10 years for health state B, i.e., this state has a QALY weight of 0.2 during the first 10 years, but after that the state becomes WTD with a QALY weight of −0.3. Health state G, on the other hand, has a value of V(G) = 0.7 irrespective of duration and, hence, is not subject to MET. Table A1 shows the choices this subject would have made in one part of our experiment. Because we erroneously assume that the value of V(B) is independent of its duration, we evaluate the obtained indifference by the following equation:

$$W(116) \times V(FH) + [W(240) - W(116)] \times V(B) = W(116) \times V(B) + [W(240) - W(116)] \times V(FH) \iff W(116) = 0.5. \qquad \text{(A1)}$$
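The decision maker described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: durations are taken to be in months over a 240-month horizon (so that the 10-year maximum endurable time is 120 months), and the function names `W`, `value_in_B`, and `value_in_G` are hypothetical helpers, not part of the original analysis.

```python
import math

HORIZON = 240.0  # assumed: durations in months, 20-year horizon
MET = 120.0      # assumed: maximum endurable time of 10 years, in months


def W(T: float, r: float = 0.2) -> float:
    """Exponential utility of life duration, with W(0) = 0 and W(HORIZON) = 1."""
    return (1.0 - math.exp(-r * T / HORIZON)) / (1.0 - math.exp(-r))


def value_in_B(T: float, r: float = 0.2,
               weight_before: float = 0.2, weight_after: float = -0.3) -> float:
    """Value of T months in state B from now on; the weight changes sign after MET."""
    before = W(min(T, MET), r)
    after = W(T, r) - W(MET, r) if T > MET else 0.0
    return weight_before * before + weight_after * after


def value_in_G(T: float, r: float = 0.2, weight: float = 0.7) -> float:
    """Value of T months in state G, whose weight does not depend on duration."""
    return weight * W(T, r)


if __name__ == "__main__":
    for months in (60, 120, 180, 240):
        print(months, round(value_in_B(months), 4), round(value_in_G(months), 4))
```

Under these assumptions, the value of state B rises with duration up to the maximum endurable time and falls thereafter, whereas the value of state G rises over the whole horizon. An analysis that treats V(B) as a constant ignores this duration dependence.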

Assuming the exponential discounting function, we then obtain: $W(116) = (1 - e^{-116r/240})/(1 - e^{-r}) = 0.5$. Solving for r gives r = 0.113. Hence, we underestimate the true value of r, which in reality is 0.2.

Appendix C. Regression results when including the full sample

See Tables A2 and A3.

References

Abdellaoui, M., Attema, A.E., Bleichrodt, H., 2010. Intertemporal tradeoffs for gains and losses: an experimental measurement of discounted utility. The Economic Journal 120, 845–866.
Abdellaoui, M., L’Haridon, O., Paraschiv, C., 2011. Experienced vs described uncertainty: do we need two prospect theory specifications? Management Science 57, 1879–1895.
Abellán-Perpiñán, J.M., Bleichrodt, H., Pinto-Prades, J.L., 2009a. The predictive validity of prospect theory versus expected utility in health utility measurement. Journal of Health Economics 28, 1039–1047.

Abellán-Perpiñán, J.M., Pinto, J.L., Méndez-Martínez, I., Badia-Llach, X., 2006. Towards a better QALY model. Health Economics 15, 665–676.
Abellán-Perpiñán, J.M., Sánchez-Martínez, F.I., Martínez-Pérez, J.E., Méndez-Martínez, I., 2009. The QALY model which came in from a general population survey: roughly multiplicative, broadly nonlinear and sometimes context-dependent. Economic Working Papers at Centro de Estudios Andaluces E2009/04.
Anderson, L.R., Mellor, J.M., 2009. Are risk preferences stable? Comparing an experimental measure with a validated survey-based measure. Journal of Risk and Uncertainty 39, 137–160.
Attema, A.E., Bleichrodt, H., Wakker, P.P. A direct method for measuring discounting and QALYs more easily and reliably. Medical Decision Making, forthcoming.
Attema, A.E., Brouwer, W.B.F., 2009. The correction of TTO-scores for utility curvature using a risk-free utility elicitation method. Journal of Health Economics 28, 234–243.
Attema, A.E., Brouwer, W.B.F., 2010. On the (not so) constant proportional tradeoff in TTO. Quality of Life Research 19, 489–497.
Attema, A.E., Brouwer, W.B.F. Constantly proving the opposite? A test of CPTO using a broad horizon and correcting for discounting. Quality of Life Research, forthcoming, doi:10.1007/s11136-011-9917-4.
Bala, M.V., Wood, L.L., Zarkin, G.A., Norton, E.C., Gafni, A., O’Brien, B.J., 1999. Are health states “timeless”? The case of the standard gamble method. Journal of Clinical Epidemiology 52, 1047–1053.
Baltussen, R., Post, T., van den Assem, M.J., Wakker, P.P. Random incentive systems in a dynamic choice experiment. Experimental Economics, forthcoming, doi:10.1007/s10683-011-9306-4.
Beattie, J., Loomes, G., 1997. The impact of incentives upon risky choice. Journal of Risk and Uncertainty 14, 155–168.
Benzion, U., Rapoport, A., Yagil, J., 1989. Discount rates inferred from decisions: an experimental study. Management Science 35, 270–284.
Bleichrodt, H., Doctor, J.N., Filko, M., Wakker, P.P., 2011. Utility independence of multiattribute utility theory is equivalent to standard sequence invariance of conjoint measurement. Journal of Mathematical Psychology 55, 430–450.
Bleichrodt, H., Gafni, A., 1996. Time preference, the discounted utility model and health. Journal of Health Economics 15, 49–66.
Bleichrodt, H., Johannesson, M., 1997. The validity of QALYs: an empirical test of constant proportional tradeoff and utility independence. Medical Decision Making 17, 21–32.
Bleichrodt, H., Miyamoto, J., 2003. A characterization of quality-adjusted life-years under cumulative prospect theory. Mathematics of Operations Research 28, 181–193.


Bleichrodt, H., Pinto, J.L., 2005. The validity of QALYs under non-expected utility. The Economic Journal 115, 533–550.
Bleichrodt, H., Pinto, J.L., Abellán-Perpiñán, J.M., 2003. A consistency test of the time trade-off. Journal of Health Economics 22, 1037–1052.
Bostic, R., Herrnstein, R.J., Luce, R.D., 1990. The effect on the preference-reversal phenomenon of using choice indifferences. Journal of Economic Behavior & Organization 13, 193–212.
Brouwer, W.B.F., van Exel, N.J.A., Stolk, E.A., 2005. Acceptability of less than perfect health states. Social Science & Medicine 60, 237–246.
Buckingham, J.K., Birdsall, J., Douglas, J.G., 1996. Comparing three versions of the time tradeoff: time for a change. Medical Decision Making 16, 335–347.
Camerer, C.F., Hogarth, R.M., 1999. The effect of financial incentives in experiments: a review and capital–labor–production framework. Journal of Risk and Uncertainty 19, 7–42.
Carbone, E., Hey, J.D., 2004. The effect of unemployment on consumption: an experimental analysis. The Economic Journal 114, 660–683.
CBS. Key demographic figures. (accessed 29.03.11).
Chapman, G.B., 1996a. Expectations and preferences for sequences of health and money. Organizational Behavior and Human Decision Processes 67, 59–75.
Chapman, G.B., 1996b. Temporal discounting and utility for health and money. Journal of Experimental Psychology: Learning, Memory & Cognition 22, 771–791.
Coller, M., Williams, M., 1999. Eliciting individual discount rates. Experimental Economics 2, 107–127.
Cook, J., Richardson, J., Street, A., 1994. A cost utility analysis of treatment options for gallstone disease: methodological issues and results. Health Economics 3, 157–168.
Cubitt, R., Starmer, C., Sugden, R., 1998. On the validity of the random lottery incentive system. Experimental Economics, 115–131.
Doctor, J.N., Bleichrodt, H., Miyamoto, J., Temkin, N.R., Dikmen, S., 2004. A new and more robust test of QALYs. Journal of Health Economics 23, 353–367.
Dolan, P., 2000. The measurement of health-related quality of life for use in resource allocation decisions in health care. In: Culyer, A.J., Newhouse, J.P. (Eds.), Handbook of Health Economics, vol. 1. Elsevier, North Holland, pp. 1723–1760.
Dolan, P., Stalmeier, P.F.M., 2003. The validity of time trade-off values in calculating QALYs: constant proportional time trade-off versus the proportional heuristic. Journal of Health Economics 22, 445–458.
Guerrero, A.M., Herrero, C., 2005. Utility independence in health profiles: an empirical study. In: Schmidt, U., Traub, S. (Eds.), Advances in Public Economics: Utility, Choice and Welfare, vol. 38. Springer, Dordrecht, pp. 135–150.
Hall, J., Gerard, K., Salkeld, G., Richardson, J., 1992. A cost utility analysis of mammography screening in Australia. Social Science & Medicine 34, 993–1004.
Hardisty, D.J., Weber, E.U., 2009. Discounting future green: money versus the environment. Journal of Experimental Psychology: General 138, 329–340.
Harrison, G.W., 1986. An experimental test for risk aversion. Economics Letters 21, 7–11.
Hey, J.D., Morone, A., Schmidt, U., 2009. Noise and bias in eliciting preferences. Journal of Risk and Uncertainty 39, 213–235.
Holt, C.A., Laury, S.K., 2002. Risk aversion and incentive effects. American Economic Review 92, 1644–1655.
Kahneman, D., Tversky, A., 1979. Prospect theory: an analysis of decision under risk. Econometrica 47, 263–291.
Kapteyn, A., Teppa, F., 2003. Hypothetical intertemporal consumption choices. The Economic Journal 113, 140–152.
Khwaja, A., Silverman, D., Sloan, F., 2007. Time preference, time discounting, and smoking decisions. Journal of Health Economics 26, 927–949.
Kirsch, J., McGuire, A., 2000. Establishing health state valuations for disease specific states: an example from heart disease. Health Economics 9, 149–158.
Lamers, L.M., McDonnell, J., Stalmeier, P.F.M., Krabbe, P.F.M., Busschbach, J.J.V., 2006. The Dutch tariff: results and arguments for an effective design for national EQ-5D valuation studies. Health Economics 15, 1121–1132.
Loewenstein, G., Prelec, D., 1991. Negative time preference. American Economic Review 81, 347–352.

Loewenstein, G., Prelec, D., 1993. Preferences for sequences of outcomes. Psychological Review 100, 91–108.
Manzini, P., Mariotti, M., Mittone, L., 2010. Choosing monetary sequences: theory and experimental evidence. Theory and Decision 69, 327–354.
Martin, A.J., Glasziou, P.P., Simes, R.J., Lumley, T., 2000. A comparison of standard gamble, time trade-off, and adjusted time trade-off scores. International Journal of Technology Assessment in Health Care 16, 137–147.
McNeil, B.J., Weichselbaum, R., Pauker, S.G., 1981. Speech and survival: tradeoffs between quality and quantity of life in laryngeal cancer. New England Journal of Medicine 305, 982–987.
Miyamoto, J.M., Eraker, S.A., 1988. A multiplicative model of the utility of survival duration and health quality. Journal of Experimental Psychology: General 117, 3–20.
Morone, A., 2010. On price data elicitation: a laboratory investigation. The Journal of Socio-Economics 39, 540–545.
Morone, A., Schmidt, U., 2008. An experimental investigation of alternatives to expected utility using pricing data. Economics Bulletin 4, 1–12.
Pliskin, J.S., Shepard, D., Weinstein, M.C., 1980. Utility functions for life years and health status. Operations Research 28, 206–224.
Read, D., Powell, M., 2002. Reasons for sequence preference. Journal of Behavioral Decision Making 15, 433–460.
Robinson, A., Dolan, P., Williams, A., 1997. Valuing health status using VAS and TTO: what lies behind the numbers. Social Science & Medicine 45, 1289–1297.
Sackett, D.L., Torrance, G.W., 1978. The utility of different health states as perceived by the general public. Journal of Chronic Diseases 31, 697–704.
Spencer, A., Robinson, A., 2007. Tests of utility independence when health varies over time. Journal of Health Economics 26, 1003–1013.
Stalmeier, P.F., Bezembinder, T.G., Unic, I.J., 1996. Proportional heuristics in time tradeoff and conjoint measurement. Medical Decision Making 16, 36–44.
Stalmeier, P.F.M., Chapman, G.B., de Boer, A.G.E.M., van Lanschot, J.J.B., 2001. A fallacy of the multiplicative QALY model for low-quality weights in students and patients judging hypothetical health states. International Journal of Technology Assessment in Health Care 17, 488–496.
Stalmeier, P.F.M., Wakker, P.P., Bezembinder, T.G.G., 1997. Preference reversals: violations of unidimensional procedure invariance. Journal of Experimental Psychology: Human Perception and Performance 23, 1196–1205.
Stiggelbout, A.M., Kiebert, G.M., Kievit, J., Leer, J.W., Habbema, J.D., De Haes, J.C., 1995. The utility of the time trade-off method in cancer patients: feasibility and proportional trade-off. Journal of Clinical Epidemiology 48, 1207–1214.
Stolk, E.A., Brouwer, W.B.F., Busschbach, J.J.V., 2002. Rationalising rationing: economic and other considerations in the debate about funding of Viagra. Health Policy 59, 53–63.
Sutherland, H.J., Llewellyn-Thomas, H., Boyd, N.F., Till, J.E., 1982. Attitudes toward quality of survival. Medical Decision Making 2, 299–309.
Thaler, R.H., 1981. Some empirical evidence on dynamic inconsistency. Economics Letters 8, 201–207.
Tsuchiya, A., 2000. QALYs and ageism: philosophical theories and age weighting. Health Economics 9, 57–68.
Tversky, A., Kahneman, D., 1991. Loss aversion in riskless choice: a reference-dependent model. Quarterly Journal of Economics 106, 1039–1061.
Tversky, A., Kahneman, D., 1992. Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty 5, 297–323.
Unic, I., Stalmeier, P.F., Verhoef, L.C., van Daal, W.A., 1998. Assessment of the time-tradeoff values for prophylactic mastectomy of women with a suspected genetic predisposition to breast cancer. Medical Decision Making 18, 268–277.
van der Pol, M., Roux, L., 2005. Time preference bias in time trade-off. European Journal of Health Economics 6, 107–111.
van Osch, S.M.C., van den Hout, W.B., Stiggelbout, A.M., 2006. Exploring the reference point in prospect theory: gambles for length of life. Medical Decision Making 26, 338–346.
Wakker, P., Deneffe, D., 1996. Eliciting von Neumann–Morgenstern utilities when probabilities are distorted or unknown. Management Science 42, 1131–1150.