Pergamon
0277-9536(94)E0109-6
Soc. Sci. Med. Vol. 40, No. 6, pp. 767-776, 1995. Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0277-9536/95 $9.50 + 0.00
PREFERENCES FOR OUTCOMES IN ECONOMIC EVALUATION: AN ECONOMIC APPROACH TO ADDRESSING ECONOMIC PROBLEMS

AMIRAM GAFNI and STEPHEN BIRCH
Centre for Health Economics and Policy Analysis and the Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada

Abstract--In this paper we critically appraise the appropriateness and validity from an economic perspective of alternative preference-based approaches to measuring outcomes in economic evaluations of health care interventions. We describe the properties of an outcome measure for economic evaluation to make it compatible with the principles of economics when applied to the problem of resource allocation. We also describe the differences and similarities between the psychometric and the economic approaches for the measurement of outcome. Using these properties we critically appraise the use of QALY and HYE methods of measuring individual and social preferences for health outcome. We argue that the most advanced measure currently available that meets these required properties is the HYE. Because the HYE, unlike the QALY, has its foundations in utility theory under uncertainty, it neither assumes particular formulations of the individual utility function, nor is it incompatible with the principles of economics. As such it represents a further stage in the continuing development of methods for economic evaluation of health care programmes.

Key words--economic evaluation, cost utility analysis, QALYs, utility theory
1. INTRODUCTION

[H]ealth economics as a discipline does not exist independently of economics as a discipline (and) ... economics is not the only discipline applicable to ... topics within the general topic of health (Culyer, [1, p. 4]).

The application of cost-effectiveness analysis (CEA) to problems of resource allocation in health care has represented a major development in the decision-making processes over the last thirty years. However, the nature of the commodity health presents particular challenges to the use of CEA techniques. In particular, the 'biggest bang for the buck' philosophy that underlay the application of CEA in other areas was easily and meaningfully translated into a healthcare context as being concerned with ensuring that "... for any given level of resources available, society ... wishes to maximize the total aggregate health benefits conferred" [2, p. 717]. But whereas the products of alternative uses of limited resources were, in other areas in which CEA had been applied, generally measured along a single dimension, the multi-dimensional nature of health outcomes (e.g. quality and quantity of life effects) represented a constraint on the use of CEA for evaluating healthcare interventions. So, for example, it was noted [3] that

... in order to carry out a cost-effectiveness analysis one or the other of the following conditions must hold:
(a) That there is one unambiguous objective of the intervention(s) and therefore a clear dimension along which effectiveness can be assessed; or
(b) That there are many objectives, but that the alternative interventions are thought to achieve these to the same extent (p. 74).
Because these conditions represented a major limitation on both the application and policy relevance of CEA, further research focused on developing methods for combining the measurement of the different dimensions of health outcomes into a unidimensional measurement scale, or health status index, which would allow these conditions to be relaxed. Many approaches to the measurement of health outcomes had already been developed in the health services research literature (see for example a comprehensive bibliography of quality of life indexes [4]). Such diversity is both inevitable and appropriate because researchers may seek to measure changes in different domains of health and/or they may have a variety of objectives that require different instruments or combinations of instruments. However, few if any of these existing measures were intended to bridge all health domains. So, although they could be used as broader measures of effectiveness within health domains, they could not be used to compare outcomes of interventions involving different health domains (e.g. hip replacements compared to treatment for schizophrenia). The usefulness of economics in general, and CEA in particular, as a way of thinking about allocating scarce resources [1] was therefore limited by the absence of a global outcome measure to assess outcomes both within and among health domains.

Alternative approaches have been developed for combining dimensions of health outcomes which can then be used for assessments of health care interventions. Moreover, in some cases these approaches use individuals' preferences for outcomes as a basis for combining dimensions of outcomes, and the term cost-utility analysis (CUA) has been used to distinguish methods of evaluation using preference-based weights as the approach for combining the dimensions of outcome from other methods of deriving a single dimension for health outcomes. The goal remains the maximization of health benefits from given resources, but the intention is to measure these benefits in terms of the utility associated with the outcome of alternative interventions. Where interest lies in solving resource allocation problems, i.e. economic evaluations, it is important that the methods used to combine the dimensions are consistent with the discipline of economics. The purpose of this paper is to analyze current approaches to measuring outcomes in the economic evaluation literature, to consider the extent to which these approaches are consistent with the principles underlying the economics discipline and hence are suitable as measurement methods for use in economic evaluations.

2. ECONOMIC EVALUATION, COST-UTILITY ANALYSIS AND IMPLICATIONS FOR OUTCOME MEASUREMENT
It was shown [5] that the principles underlying CUA are concerned with the simultaneous satisfaction of efficiency in both production and product mix. Resource allocation will be optimal (i.e. top-level efficiency achieved) when the social utility function is tangential to the production possibilities frontier. In this way, CUA is consistent with the principles of welfare economics theory and a 'welfarist' approach to economics [1]. But in order to achieve the goal of maximizing health-related well-being (i.e. utility associated with health benefits) from given resources, the methods used to measure health-related well-being must be consistent with the theories on which the welfarist approach, and the principles of cost-utility analysis, are based. In this section we identify several conceptual aspects of outcome measurement that emerge from the welfarist approach. This enables us to identify requirements of measurement methods, in order that these methods are valid ways of measuring outcomes for use in economic evaluations, based on the welfarist approach to economics in general, and cost-utility analysis in particular.
2.1. The internal structure of preference formulation (i.e. the individual's utility function)

Under the welfarist approach to economics an individual's preferences are embodied in that individual's utility function. Thus for a measure of outcome to be consistent with the welfarist approach, and hence cost-utility analysis principles, it must be consistent with a theory of utility. Utility is defined in economics as the value of a function that represents an ordering--specifically a preference ordering of different combinations of goods and services consumed. One prospect is at least as good as another if, and only if, it has at least as great a utility. Different utility theories exist (i.e. different preference theories) based on different (but not necessarily mutually exclusive) fundamental axioms (e.g. von Neumann-Morgenstern utility theory, Alpha-NU choice theory, Anticipated utility, Lottery dependent theory; see Ref. [6]). But the method for measuring utility is determined by the particular theory being followed.

We might choose between alternative theories based on how we would like individuals to behave, e.g. a theory which leads to all individuals choosing healthier lifestyles might be adopted based on its normative appeal, even though it provides a poor basis for explaining how individuals actually behave. So, as noted [7]: "[E]ven if people act in a particular manner (descriptively), they may not want to act that way (normatively)" (p. 218). Choosing a theory based on its normative appeal is also justified as follows: "[T]he whole idea of a normative model arises when we are not satisfied with our functioning ... in view of the many early demonstrated lapses in human decision-making ... who would want to rely on unaided judgement for a complex and important decision problem" [8, p. 682].

Alternatively, we might choose to be guided by the paretian principle of the individual being the best judge of his or her own welfare and hence choose a theory based on its accuracy in measuring individuals' preferences, irrespective of how we feel about these preferences. But if we choose to follow a paretian approach, i.e. believe that individuals' own preferences should be adopted as the basis of measuring individual welfare, then the method of measuring benefits must accurately reflect the individual's preferences, even if these are not the preferences we feel they should have.

2.2. Attitudes towards risk
Preferences can be measured under conditions of certainty (i.e. outcomes under consideration are deterministic) or uncertainty (i.e. outcomes are probabilistic) and preferences need not be the same under these two conditions (for a more detailed discussion see Ref. [9]). In health-care interventions decisions are made under conditions of uncertainty (i.e. outcomes are probabilistic) by patients, providers, managers and funders. Hence methods for the measurement of individual preferences among alternative uncertain outcomes must capture individuals' attitude toward risk (i.e. measure utility under uncertainty). But where projects are approved from a societal perspective, it has been suggested [10] that risk to the individual can be ignored and mean values can be used for a measure of outcome, i.e. based on the statistical 'law of large numbers' that benefits are the
total outcome produced over an entire population which is deterministic and can be redistributed among members of that population, for each of whom separately the outcome was probabilistic. Under such conditions ex post losers can be compensated and the project becomes riskless for each individual as well. But in the case of health-care programmes, health outcomes are intrinsic to individuals and cannot be redistributed among individuals. Hence, social decision-making concerning health outcomes must incorporate individuals' attitudes to risk if it is to reflect individuals' preferences [11].
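The distinction between using mean values and measuring utility under uncertainty can be illustrated with a small numerical sketch. The square-root utility function below is purely an assumption chosen to represent a risk-averse individual; it is not taken from the paper, and the 50/50 lottery over 30 healthy years versus immediate death is a hypothetical example.

```python
import math


def expected_utility(outcomes, probs, utility):
    """Probability-weighted utility of a lottery over outcomes."""
    return sum(p * utility(x) for x, p in zip(outcomes, probs))


# Assumed (hypothetical) concave utility over healthy life-years,
# i.e. a risk-averse individual.
def u(years):
    return math.sqrt(years)


def u_inverse(value):
    return value ** 2


years = [30.0, 0.0]  # full health for 30 years vs immediate death
probs = [0.5, 0.5]

# What a risk-neutral, "mean value" analysis would report.
mean_years = sum(p * x for x, p in zip(years, probs))

# Number of certain healthy years this individual regards as
# equivalent to the lottery.
certainty_equivalent = u_inverse(expected_utility(years, probs, u))

print(mean_years, certainty_equivalent)
```

Under this assumed utility function the individual values the lottery at fewer certain years than its mean, which is the sense in which a measure based on mean outcomes alone would not reflect the individual's attitude toward risk.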
2.3. Aggregation of individual preferences

Several models exist for aggregating preferences, each of which is based on restrictions imposed on the set of preferences and/or the aggregation rule [12]. In particular the aggregation rule should satisfy a set of 'mild' assumptions (e.g. where all individuals prefer A to B under uncertainty then the societal preference is for A). But the aggregation of individuals' utilities of the (probabilistic) outcomes (i.e. evaluation of the total benefit to the community) necessarily involves attaching weights to the utilities of different individuals or groups. In other words, equity considerations are an intrinsic part of any social utility function [13-15]. Hence, the calculation of a social utility function (i.e. method of aggregation of individual utilities) must reflect the equity criterion adopted in the analysis. As already shown [14], this may involve 'correcting' the individuals' utility scores for inconsistencies between the method of measurement and the social utility function that underlies the aggregation rule. Moreover, any aggregation of the utility of outcomes over individuals must take into account externalities, i.e. one person's health status may affect another person's utility. It has been suggested that the concept of external effects is much more applicable in the case of health care consumption than for most other commodities because of the special nature of the commodity health that health care is expected to produce [e.g. 16-19]. Hence such effects should be included when measuring outcomes [20].
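The point that any aggregation rule embodies weights can be made concrete with a minimal sketch. The weighted-sum form, the utility scores and both weighting schemes below are hypothetical illustrations rather than a rule proposed in the paper; externalities are ignored for simplicity.

```python
def social_utility(utilities, weights):
    """Weighted sum of individual utilities (one assumed aggregation rule)."""
    if len(utilities) != len(weights):
        raise ValueError("one weight is needed per individual")
    return sum(w * u for u, w in zip(utilities, weights))


# Hypothetical utility scores for three individuals.
individual_utilities = [0.9, 0.6, 0.3]

# A simple sum is itself a weighting scheme: everyone gets weight 1.
equal_weights = [1.0, 1.0, 1.0]

# Hypothetical equity criterion giving more weight to the worse off.
priority_weights = [0.5, 1.0, 1.5]

unweighted_total = social_utility(individual_utilities, equal_weights)
weighted_total = social_utility(individual_utilities, priority_weights)

print(unweighted_total, weighted_total)
```

The two totals differ only because the weights differ, so choosing to "just add up" scores is already a choice of equity criterion.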
2.4. The meaning of validity in economic approaches to outcome measurement

The act of measurement is an essential component of scientific research, whether in the natural, social or health sciences. The basic concepts which are involved in determining the quality of a measurement instrument (or method) are validity (i.e. does an instrument measure what it is intended to measure) and reliability (i.e. does the scale measure something in a reproducible or consistent fashion) [21]. The intent of this section is to explain how these concepts are used in the economic discipline, i.e. are measures that are found to be valid and reliable also compatible with the economics goal? Measures to be used in economic analysis should
be first tested to determine whether the measurement task is feasible and acceptable (e.g. clarity of the presentation, length of the interview, etc.). This is done to ensure subjects' participation and completion of the interview. Measures to be used in economic analysis should also be reliable. More about the different tests that one can use to establish reliability of an instrument can be found elsewhere [e.g. 21]. The difference between the 'classical' psychometric approach and the economic approach is in the way that the validity of an instrument is determined. In economics the validity of the instrument stems from the validity of the theory from which the instrument is derived. Thus instead of determining the validity of the instrument itself (the typical case when one uses the classical psychometric approach) one has to establish the validity of the underlying theory. As explained above (see section 2.1), under the welfarist approach, measurement of outcome must accurately reflect individuals' preferences. To establish the validity of the measurement method we need to test the validity of the assumptions underlying the utility theory to which it relates. If individuals behave in a way that violates the assumptions (i.e. axioms) underlying the theory, then this theory or the measurement method derived from the theory cannot be a good descriptor of individuals' behaviour. We might find that individual behaviour violates all or most of the underlying axioms of a theory and yet we would like to use this theory based on the criterion of normative appeal. The important point is that when a theory is chosen based on the normative criterion alone, one does not have to establish validity using classical psychometric methods. As stated, for example, in [22] "... health state utilities are claimed to be utilities obeying the axioms of von Neumann-Morgenstern utility theory ...
In this case, the standard gamble measurement technique is valid by definition because it is based directly on the axioms ..." (p. 598). It is often the case that researchers introduce additional assumptions to simplify the presentation of an argument or to simplify the measurement process. The validity of the simplified measurement process should also be based on descriptive accuracy and/or normative appeal. However, unlike the case of the fundamental axioms underlying utility theory, if one or all of the additional assumptions are rejected, based on descriptive validity or normative appeal, it does not necessarily undermine the chosen theory. In the following sections we shall examine how QALYs and HYEs coincide with the above mentioned requirements.

3. QALYS: PRINCIPLES, PRACTICE AND ECONOMICS
The most commonly used measure of outcome for economic evaluation is the change in QALYs resulting from a particular intervention. By combining aspects of quality and quantity of life the QALY
enables comparison of interventions that affect both of these dimensions. Moreover it allows us to compare effects on different dimensions of quality of life (e.g. distress vs pain). In addition, the QALY is intuitively appealing for decision makers in terms of being expressed as a proportion of a standard, constant-health period of time. It is presumably the lack of intuitive appeal that inhibited the use of the utility function directly (i.e. the meaning of a util is not so easily understood by decision makers). The concept of the QALY is described as follows:

The first approach to this problem falls under the rubric of "health status indexes". A health status index is essentially a weighting scheme: each definable health status, ranging from death to coma to varying degrees of disability and discomfort to full health, and accounting for age difference, is assigned a weight from zero to one, and the number of years spent at a given health status, $Y_s$, is multiplied by the corresponding weight, $\lambda_s$, to yield a number $\lambda_s Y_s$ that might be thought of as an equivalent number of years with full health--a number of quality-adjusted-life-years (QALYs). The source of these weights is ultimately subjective ... [2, pp. 718-719].

QALYs also play an important role in the broader policy context. As described elsewhere [23],

The policy objective underlying the QALY literature is the maximization of the community's health. An individual's health is measured in terms of QALYs and the community's health is measured as the sum of QALYs ... Maximizing health is argued to be a natural objective to want to pursue, given the desire to see resources deployed efficiently (p. 22).

Several different methods have been proposed and used to measure the weights ($\lambda_s$).
Moreover, comparisons of weights derived from the different methods have found poor correlations among assessment methods, and large discrepancies between the numerical values generated by the different methods, both at the individual and group level [24]. Similar findings are reported in many studies [e.g. 25-28]. This implies that the different methods cannot be measuring the same thing (i.e. preferences for probabilistic health outcomes) and the notion of the QALY toolkit [29] is both ambitious and problematic.
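The QALY weighting scheme described in the quotation above amounts to multiplying each period's duration by its weight and summing. A minimal sketch follows; the three-state profile and its weights are invented for illustration and do not come from any of the measurement methods discussed.

```python
def qalys(profile):
    """Sum of weight * years over the health states in a profile.

    profile: iterable of (weight, years) pairs, with weight in [0, 1]
    where 0 is death and 1 is full health.
    """
    total = 0.0
    for weight, years in profile:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weights must lie between zero and one")
        total += weight * years
    return total


# Hypothetical profile: 5 years in full health, 10 years at weight 0.7,
# then 3 years at weight 0.4.
profile = [(1.0, 5.0), (0.7, 10.0), (0.4, 3.0)]
total_qalys = qalys(profile)

print(total_qalys)
```

Because the result depends entirely on the weights, the discrepancies between weighting methods reported above translate directly into different QALY totals for the same profile.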
3.1. Conditions for equating QALYs and utilities: validity revisited

In choosing between the alternative methods for the purpose of selecting a measure for use in economic evaluations it is appropriate to consider first the relationship between the QALY measure and utility theory. It is widely recognized [e.g. 30-32] that only under restrictive conditions would the QALY index describe how individuals behave if they are expected utility maximizers [33]. These additional conditions are as follows:

The two attributes of quality and quantity must be mutually utility independent (preference for gambles on the one attribute are independent of the other attribute), the tradeoff of quantity must exhibit the constant proportional trade-off property (the proportion of remaining life that one would trade-off for a specific quality improvement is independent of the amount of remaining life), and the single-attribute utility function for additional healthy life-years must be linear with time (for a fixed quality level one's utilities are directly proportional to longevity, a property also referred to as risk neutrality with respect to time) [32, p. 569].

Satisfaction of these conditions is necessary and sufficient to equate the QALY index with a vNM utility function (i.e. expected utility theory) only for the case of a permanent chronic health state (i.e. being in the same health state for the rest of one's life). It was shown [15, 34] that an additional condition must be satisfied for the more general case of a lifetime health profile (i.e. an individual who might experience several different health states during his or her remaining life). In particular "... in the person's preferences, qualities of life at different times are strongly separable ... (i.e.,) a person's preferences about the qualities of her life in any particular group of years are independent of the qualities of her life in other years" [15, p. 152].

To what extent do the additional assumptions have normative appeal and/or constitute an accurate description of individuals' behaviour? For the case of a chronic health state, even proponents of the QALY method acknowledge that the assumptions needed to equate the QALY to a utility are very strong and presumably unrealistic [e.g. 32, 35]. Moreover it was concluded [36] that, based on available empirical evidence, none of these assumptions hold. It is thus not surprising that proponents of this method argue that "These conditions ... are uncommon in practice, and thus generally a utility weighted QALY is not in itself a utility" [32, p. 569]. With respect to the strong separability condition required for the more general case, it is neither normatively appealing nor an accurate way of describing individuals' behaviour.
The condition requires that the utility of health state in period A is independent of the utility of health in period B, and hence the two utilities can be added over time. But intertemporal additivity in utility functions is not accepted by economists as compelling from either a normative or descriptive perspective (see, e.g. [37]). In particular, there are many situations where consumption at one point in time might be expected to influence the marginal utility of consumption at another, and there is no a priori reason why health does not fit this type of behaviour. Furthermore, a large body of research on the demand for health [e.g. 38-43] acknowledges, both at a theoretical level as well as empirical level, that the relationships between consumption in different points in time are important in understanding individuals' behaviour. Finally, although few studies have addressed directly the validity of this assumption from an empirical perspective in the context of health (for studies that refute this assumption in other settings see Refs [37, 44]), recent work [45] found that: ... the utility associated with a complex (health) scenario may not be accurately calculated by a weighted average of the utilities of the constituent states and the assumption of
a reasonable time preference. It appears that the prognosis has a significant effect upon the assessment of preceding health states and therefore the holistic approach must be adopted to obtain valid results (p. 996).

In the absence of any normatively-appealing reason or empirical support for this strong assumption it is not surprising to find that Broome [15] labels it as "[T]he most dubious condition ..." (p. 152). It was argued that "We need to know whether the imposition of the QALY assumptions does more injustice to 'real' preferences, than asking people to make valuations of future health prospects out of ignorance. It is no defence to reply that we are not entitled to force our judgements of these matters onto the public" [46, p. 306 our emphasis]. But in the absence of empirical evidence or normatively-appealing reason to justify the conditions underlying the QALY model it is hard to see why we would ever want to 'impose' the QALY model as a way of measuring individual preferences [47]. Instead it seems reasonable when asking the public to assist in the determination of health priorities to choose measurement techniques that allow the public to reveal their true preferences. If not, why do we bother asking them at all?
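To make the strong separability condition of this section concrete, the additive form it permits can be written out. This is a sketch in assumed notation rather than a formula quoted from the sources: $\lambda(q_t)$ stands for a per-period quality weight, periods are of equal length, and discounting is ignored.

```latex
% Additive (QALY-style) valuation of a lifetime health profile.
% Assumed notation: \lambda(q_t) is the weight of the state occupied in period t.
\[
  U(q_1, \ldots, q_T) = \sum_{t=1}^{T} \lambda(q_t)
\]
% What separability implies in a two-period case: the ranking of states
% a and b in period 1 cannot depend on which state (c or d) follows.
\[
  U(a, c) \ge U(b, c) \iff U(a, d) \ge U(b, d)
\]
```

The finding quoted above, that prognosis affects the assessment of preceding health states, is a case in which the second line would fail.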
3.2. The choice of utility theory

Expected utility theory has been proposed as the normative standard in the health care field for rational decision-making, under conditions of uncertainty, both at the individual level [e.g. 48] and the group level [e.g. 32, 49]. Numerous empirical studies have found that people often do not behave in a manner consistent with the axioms of expected utility theory, and when asked to reconcile their actual behaviour with the axioms, many (while accepting the appeal of these axioms) preferred not to change their original behaviour. Hence the theory has limited validity in terms of descriptive accuracy. Moreover, those findings have led many economists and decision scientists to question expected utility theory as the normative standard for rational decision-making under uncertainty [e.g. 6, 50, 51]. It is important to emphasize, however, that:

(i) if someone chooses expected utility theory as the normative standard the measurement method used must be consistent with this theory. In this way, preferences for health outcomes must be measured in a way which is consistent with this theory, i.e. using lottery-based questions known also as the standard gamble (SG) method [52]. As also stated elsewhere "... only the standard gamble method directly measures vNM utilities. The other instruments at best can be viewed as approximations" [32, p. 566].
(ii) accepting the basic assumptions of the expected utility theory does not mean that one
must also accept the additional assumptions which are imposed by the QALY structure.

Yet those implications are often overlooked. For example it was argued that "... the role of the standard gamble question is just to introduce misunderstanding and error ... even if we were quantifying the uncertain health state that exists in the real world, the argument that the standard gamble is the appropriate valuation method is rather like insisting that the appropriate way of expressing the value of copper is in terms of gold, because they are both metals. We do not need to express the value of risky prospects in terms of risk" [46, pp. 307-308]. It seems that the author [46] sees the goal of expected utility theory as appropriate but is not comfortable about measuring outcomes in ways consistent with this goal. He provides no alternative approach for measuring utilities compatible with the expected utility model.
3.3. Aggregating individual preferences

In terms of the implications for societal preferences, the health-related well-being of the community is calculated by summing individuals' QALY values. But because individuals' QALY scores do not represent individuals' preferences, a simple aggregation of QALY scores is meaningless as a measure of societal preferences (i.e. it fails to meet the unanimity requirement). Assume a hypothetical community of individuals with identical preferences. We need ask only one individual for his/her preferences in order to know the community's preferences. But, if the method used to measure preferences does not represent the individual's true preference, it cannot represent the community's preference. Moreover it was shown [14] that in the case of QALY calculations, existing methods of measurement (mainly relating to the SG technique) are frequently inconsistent with the equity criteria adopted by the researchers. For example, we may choose to set the utility of full healthy life from birth to death to be equal for each individual, but the measurement technique used (e.g. the Standard Gamble) sets the rest of life (i.e. years until death) as the measurement standard that is equated across individuals. Furthermore, the same measurement procedures are sometimes used even though different equity criteria are being pursued, although none of them are in fact consistent with this measurement procedure [14, 15]. Adjustment algorithms have been derived to take account of the chosen equity criteria in the methodology for measuring individual utility scores [14]. Finally, it was noted that "the exclusion of externalities may bias program ranking in unpredictable ways, leading to a non-optimal allocation of resources" [20, p. 259] and further argued that it is not clear whether "existing utility measurement tools ... can be modified to incorporate this information" (ibid., p. 274). But this would represent a major
limitation of the QALY as a measure to be used in the context of decision-making about resource allocation.
4. HYES: ECONOMIC PRINCIPLES AS A BASIS FOR OUTCOME ASSESSMENT
The HYE concept is derived directly from the theoretical foundations of utility under uncertainty [53]. Like the QALY, it combines outcomes of both quality of life (morbidity) and survival (mortality) and thus can serve as a common unit of measure for all programmes, hence allowing comparisons across programmes. It also preserves the appealing intuitive notion that the QALY has.
4.1. HYE: definition of the concept

Let $Q$ and $T$ denote two attributes of a chronic health state under consideration ($Q$ = health state of the individual, $T$ = duration of state in years). Let $\bar{Q}$ represent the state of full health and $\underline{Q}$ represent death, such that $\bar{Q} > Q \geq \underline{Q}$. Let $U(Q, T)$ be a utility function under conditions of uncertainty that describes the utility, as viewed now by the individual, of being in a given health state $Q$, starting now, for a period of $T$ years, followed by death. Let $H$ be the healthy-years-equivalent of $(Q, T)$. $H$ is defined as: Find $H$ such that

$$U(\bar{Q}, H) = U(Q, T). \tag{1}$$
The solution to the above equation yields a hypothetical combination of $H$ years in full health ($\bar{Q}$) that is equivalent, in terms of the individual's utility under uncertainty, to living $T$ years in health state $Q$. Note, no assumptions about separability or linearity are required because the individual is making a direct comparison of the packages of quality and duration.

For the general case (i.e. an individual who will be in more than one health state over his/her lifetime) a lifetime health profile of an individual can be described as a vector $Q = [q_i]$ where $q_i$ is the $i$th element of the vector. Let $q_i$ be the health state of the individual at the $i$th period (measured for example in years). For the sake of simplicity but without loss of generality, we assume that all periods are of equal duration (e.g. one year). Denote $\bar{q}$ as perfect health during a period and $\underline{q}$ as death. Assume a potential lifetime health profile $Q_T = [q_1, \ldots, q_T]$ where $T$ = remaining life-years. Let $U(Q_T)$ be a utility function under conditions of uncertainty over the individual's lifetime health profile. Let $H$ be the healthy-years-equivalent of $Q_T$. $H$ is defined as follows: Find $H$ such that

$$U(Q_H) = U(Q_T) \tag{2}$$

where $Q_H = [q_i]$ such that

$$q_i = \begin{cases} \bar{q} & \text{for } i = 1, \ldots, H \\ \underline{q} & \text{otherwise.} \end{cases}$$
4.2. The choice of utility theory

It is important to distinguish between the definition of the HYE concept and the specific method suggested to measure the concept. The HYE concept does not require that the individual subscribes to expected utility theory. But for those who accept expected utility theory as the normative standard, a simple two-stage SG procedure can be used to measure the HYE [34]. This procedure is the same for both the chronic health state and the general case of a lifetime health profile. For the sake of simplicity we describe here the measurement of HYE for a chronic health state (see Fig. 1). Assume that the individual has additional life expectancy of 30 years.

In the first stage the individual is offered two alternatives. The first is a lottery with two possible outcomes: full health for 30 years (probability P) or immediate death (probability 1 - P). The second alternative offers a certain outcome of living in the chronic state (e.g. unable to walk without a walking frame) for the remaining 30 years [Fig. 1(a)]. Probability P is varied until the person is indifferent between the two alternatives. Denoting the utility of full health for 30 years as 1.0 and the utility of death as 0.0, at the indifference point the person's preference value (utility) of living 30 years in the chronic state is equal to P* (the value of the probability at the indifference point).

In the second stage, lottery questions are used to convert the time in ill health (i.e. the chronic state) to time in full health (HYEs). The person is offered two alternatives. The first is a lottery with two possible outcomes: full health for 30 years (probability P*) or immediate death (probability 1 - P*). The second alternative is the certain outcome of living H more years in full health. H is varied until the person is indifferent between the alternatives [Fig. 1(b)]. H* denotes the value of H at the indifference point and represents the hypothetical number of years in full health that is equivalent to the person living 30 years in the defined chronic state.

Fig. 1. The measurement of HYE for a chronic health state (an example). (a) Stage I: a lottery offering full health for the rest of life (30 years) with probability P or immediate death with probability 1 - P, versus the certain outcome of being unable to walk without a walking frame for the rest of life (30 years); P* is the value of the probability at the indifference point. (b) Stage II: a lottery offering full health for the rest of life (30 years) with probability P* or immediate death with probability 1 - P*, versus the certain outcome of full health for H years followed by death; H* denotes the value of H at the indifference point, so that U(Full health, H*) = U(Chronic state, 30 years). H* is the number of healthy years equivalent to living 30 years unable to walk without a walking frame.

Because the definition of the HYE does not require that an individual subscribe to expected utility theory (i.e. vNM utility theory), other types of utility function (non-vNM) can be used as the basis for generating algorithms to measure HYEs. Although, to the best of our knowledge, no such algorithm has been proposed, that does not imply that a non-vNM-based algorithm cannot be derived. Such algorithms are likely to be more complex and to result in a greater burden of measurement. This is, however, the price to pay for a measurement method that is less restrictive than the vNM-based one and allows for better representation of the true range of individuals' preferences. Researchers in this field will have to decide whether they want to choose a method that is easy to use but probably less representative of the individual's true preferences or a method that is more complicated but more representative.
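The two-stage SG procedure for a chronic state can be sketched as a simulated interview. This is an illustration under stated assumptions rather than the published algorithm: the respondent is replaced by a hypothetical expected-utility maximiser whose utility function `respondent_utility` (with its levels and curvatures) is invented for the example, and the interviewer, who only observes choices, searches for each indifference point by bisection.

```python
HORIZON = 30.0  # remaining life expectancy in years, as in the text


def respondent_utility(state: str, years: float) -> float:
    """Hypothetical utility of living `years` in `state`, then dying.

    Scaled so that full health for 30 years is 1.0 and immediate death is 0.0.
    The levels and curvatures are assumptions made only for this sketch.
    """
    level = {"full": 1.0, "chronic": 0.75}[state]
    curvature = {"full": 0.6, "chronic": 0.9}[state]
    return level * (years / HORIZON) ** curvature


def prefers_lottery(p: float, state: str, years: float) -> bool:
    """True if the respondent picks the lottery (full health for 30 years
    with probability p, immediate death otherwise) over the certain outcome."""
    expected_utility = p * respondent_utility("full", HORIZON)  # death adds 0
    return expected_utility > respondent_utility(state, years)


def stage_one(state: str, tol: float = 1e-10) -> float:
    """Vary P until indifference with 30 certain years in `state`; returns P*."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p, state, HORIZON):
            hi = p  # lottery too attractive: lower the chance of full health
        else:
            lo = p
    return (lo + hi) / 2


def stage_two(p_star: float, tol: float = 1e-10) -> float:
    """Vary H until indifference with the P* lottery; returns H*."""
    lo, hi = 0.0, HORIZON
    while hi - lo > tol:
        h = (lo + hi) / 2
        if prefers_lottery(p_star, "full", h):
            lo = h  # lottery still preferred: offer more certain healthy years
        else:
            hi = h
    return (lo + hi) / 2


p_star = stage_one("chronic")
h_star = stage_two(p_star)
print(round(p_star, 3), round(h_star, 2))
```

In a real interview the choices would come from the respondent, so `prefers_lottery` would be replaced by the elicited answers; the search logic in `stage_one` and `stage_two` would stay the same.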
4.3. HYE (SG based) vs QALY (TTO based)

Several authors have recently suggested that the HYE measured using the two-stage SG procedure is equivalent to a QALY measure that uses the time-trade-off technique [46, 54, 55]. But, as we show elsewhere, this argument is invalid on several counts [47, 56]. We argue that this claim fails to distinguish between choice under uncertainty and choice under certainty and is a classic example of misunderstanding the relation between a measurement technique and the utility theory from which it stems. It is also an example of confusion between subscribing to the basic axioms of a theory and subscribing to additional behavioral assumptions (i.e. assumptions that go beyond the basic axioms). Due to space limitations we summarize the arguments using a simple example.

The time-trade-off (TTO) technique was proposed [57] as an empirical substitute for the standard gamble (SG) technique that would offer a simpler measurement approach and would (hopefully) provide the same results as the SG technique. Unlike the SG, whose use is related to expected utility theory (vNM utility theory), the TTO was not related to any behavioral theory by its proponents. But without such a theoretical basis we are unable to interpret or understand what the technique measures, or why empirical studies report large discrepancies in the ratings obtained by the two methods [e.g. 24, 28]. The only attempt known to us to link TTO with a utility theory is in [9]. Following [9, 34], both the two-stage lottery used to measure HYE and the TTO are aimed at identifying two points on an individual's indifference curve. The difference between the two methods is that the TTO deals with decisions under certainty but the two-stage lottery is concerned with decisions under uncertainty. The claim for equivalence was made based on the transitivity axiom. Although the transitivity axiom underlies both approaches, this does not mean that the scores derived from the two distinct methods are the same. This is a classic example where two preference theories share some (but not all) of the underlying axioms. The following example illustrates our point.

Consider the case of an individual with the prospect of living 15 years with chronic renal failure followed by death. Using the TTO technique, suppose we find that the individual is indifferent between living 11 years in full health and 15 years with chronic renal failure (i.e. a healthy-years-equivalent under certainty of X* = 11). Now calculate the healthy-years-equivalent under uncertainty. Using the two-stage lottery technique, suppose we find in stage 1 that the utility of living 15 years with chronic renal failure is equal to, say, 0.7. In stage 2 we find that the utility of living 8 years in full health is also equal to 0.7. This implies that the healthy-years-equivalent under uncertainty (H*) is equal to 8 years. Note that using the two procedures, each of them stemming from a different preference theory, we arrive in this hypothetical example at two different values (i.e. X* = 11, H* = 8).

Both Buckingham [46] and Johannesson et al. [54] argue that this type of example cannot occur. They acknowledge the fact that the SG method allows individuals to reveal preferences under uncertainty but they claim that the risk is introduced in one stage and then cancelled in the second stage. In other words, the effects of the two lotteries are equal and opposite. This argument contains two parts (i.e. equal and opposite effects of risk attitude) neither of which is supported by either theoretical (i.e.
mathematical) or empirical proof. The debate also highlights the importance of distinguishing between the basic axioms of a theory and the additional assumptions. The fact that an individual is indifferent between 15 years in renal failure and 11 years in full health (as in our example) does not imply that his preference value for the health state 'renal failure' is equal to 11/15 (as currently assumed when using the TTO as recommended by Torrance et al. [57]). This is an additional assumption that has nothing to do with the basic axioms of the preference theory it stems from [9] and is described by others as being 'heroic' [15, p. 152].

Finally, because the HYE is a measure of outcome which is derived directly from the individual utility function, a community health-related well-being measure can be generated which is based on the theoretical foundations of welfare economic theory. For example, such a measure can be derived by employing an additive social welfare function (see for
example [58, 59]) to evaluate HYEs across individuals* (i.e. a simple aggregation rule in which the health-related well-being of the community is calculated by summing individuals' HYE values). Different equity criteria can be introduced into the measurement procedures using simple algorithms [14]. The suggested method of measurement does not, as it stands, capture external benefits and costs. However, because the HYE is derived directly from the individual's utility function, it should be possible to capture such benefits and costs, and the challenge is to develop an algorithm that will allow us to measure them.

In summary, from a welfarist economics perspective (i.e. the perspective that underlies economic evaluation), the validity of the HYE can be established. Current challenges to the validity of the HYE require the introduction of additional assumptions which themselves lack validity in terms of either normative appeal or empirical observation. In other words, attempts to equate the HYE with the QALY do so by assuming conditions necessary and sufficient to establish equivalence.

5. DISCUSSION
In this paper we address the question of what is the appropriate measure to be used in the context of economic evaluations where the outcome is a single, non-monetary and preference-based measure (i.e. cost-utility analysis). We describe the properties of an outcome measure for economic evaluation to make it compatible with the principles of the discipline of economics when applied to the problem of resource allocation. Using these properties we critically appraise the use of QALY and HYE methods of measuring individual and social preference for health outcomes.

With respect to the most commonly used measure of outcome, QALYs gained, we distinguish between three groups of measures: those which are associated with utility theory under uncertainty, those associated with utility theory under certainty and those which are not related to utility theory. In the first group only one measure exists: QALYs calculated using the SG method to measure the weights. However, only under very strong assumptions, which have not been validated from either an empirical or a normative perspective, is a QALY equal to a utility function. Similar problems arise with the multi-attribute utility approach when used to estimate the QALY weights [e.g. 3]. In this method, instead of measuring every potential outcome directly using the SG method, a multi-attribute scale is developed and a utility function over attributes is estimated, using the SG method. However, the multi-attribute utility approach does not imply any reduction in the nature, number or strength of assumptions needed to equate QALYs to utility. On the contrary, it involves adding to the existing set of assumptions without establishing validity for any of the existing assumptions.

A common method to measure the weights for the QALY calculations in the second group (associated with utility theory under certainty) is the time-trade-off (TTO) method. But because a welfarist approach requires outcome to be measured under conditions of uncertainty, the TTO alone cannot provide us with appropriate measures.

The third group of QALY indexes, as proposed by the proponents of the EuroQol [60], is subject to even greater limitations. For example, it was shown [61] that in addition to all of the conditions required by the utility-based QALYs the EuroQol suffers from two additional problems: (1) the implicit assumption that the effects of health-care interventions are deterministic as opposed to probabilistic, (2) inability to address the validity of the conditions that must be satisfied to equate EuroQol scores with utilities because the conditions have not been derived (i.e. it is assumed that the EuroQol represents a valid measurement of individual preferences).

The implications of the limitations of the QALY concept are neither academic nor trivial. Misrepresentation of individuals' preferences can result in either preference reversal [53] or biased estimates of the magnitude of the strength of preferences which can affect the result of an economic evaluation [62, 63]. Moreover, attempts to discard some of these examples [53] have resulted in generating other examples of preference reversal [47].

*Other theories of social welfare exist, although none has been utilized to date in the context of health-care decision making.

†Although we focus attention in this paper on the methods for measuring outcomes in economic evaluations of health-care programmes, the use of the measures in cost-utility studies has also been shown to be both inconsistent with welfare-economic theory [5] and potentially misleading from a decision-making perspective.
Notwithstanding the conceptual limitations of each of these approaches as a method of measuring individual (or societal) preferences, the use of alternative (and distinct) approaches to derive the 'utility weights' generates further problems for the QALY approach. As a consequence, health-care programmes might be evaluated as being a 'burden' or a 'bargain' depending on the particular method of measurement chosen [24, 62]. Moreover, the criteria used to derive the weights might be related to the outcome of the evaluation. In so far as the 'utility weights' differ according to the particular method of measurement, the QALY scores which are based on these weights will also differ, indicating that the QALY need not represent the " . . . common unit of measure . . . " [3, p. 113] that it was intended to represent. This therefore invalidates comparisons between cost-utility studies in which differing methods of measurement are used.† As stated by Drummond et al. [64] "[S]ince it is known that different approaches to the estimation of utilities for health states generate different values, this has the potential to reduce greatly the comparability of the six source studies" (p. 35).

In this paper the HYE has been proposed as an alternative method of measurement for probabilistic health outcomes. In developing or refining the HYE we seek to enhance the validity of economic evaluation from a welfarist economic perspective. Thus we advocate the use of an outcome measure derived from a utility theory under conditions of uncertainty. We relax the restrictive assumptions of the existing utility-based QALY model and develop a measurement procedure based solely on the axioms of vNM utility theory. The definition of the HYE does not require that the individual subscribe to vNM utility theory. Other algorithms to measure HYE can be developed which stem from other preference theories under uncertainty.

It is important to emphasize that relaxing the restrictive assumptions of the utility-based QALY model comes with a price tag, i.e. a more complex and time-consuming measurement technique. This has resulted in a debate between those who are willing to add assumptions (typically invalid assumptions) to ease the measurement burden and those who would like to relax as many assumptions as possible even at the price of a more complex measurement technique [65]. As Johannesson et al. [54] state " . . . the trade-off is between invoking a behavioral assumption (additive independence) in order to simplify the assessment task" (p. 286). Like H. L. Mencken we believe that "[T]o every complex question there is a simple answer . . . and it is wrong". Thus we argue that efforts should be directed to developing better algorithms to measure individuals' and societal preferences in ways that reduce the burden of measurement, but not at the expense of invoking invalid assumptions.
It is important that measures of outcome used in economic evaluations be compatible with the principles of economics underlying the economic evaluation approach and avoid simplifying assumptions that are known to be invalid based on the research findings from within the economics discipline (e.g. the demand for health). Failing to do so implies failure to recognize that "health economics as a discipline does not exist independently of economics as a discipline" [Culyer, 1, p. 4].
REFERENCES

1. Culyer A. J. Health, economics and health economics. In Health, Economics and Health Economics (Edited by van der Gaag J. and Perlman M.), pp. 3-11. North Holland, Amsterdam, 1981.
2. Weinstein M. C. and Stason W. B. Foundations of cost-effectiveness analysis for health and medical practices. N. Engl. J. Med. 296, 716, 1977.
3. Drummond M. F., Stoddart G. L. and Torrance G. W. Methods of Economic Evaluation of Health Care Programmes. Oxford University Press, Oxford, 1987.
4. Spilker B., Molinek F. R., Johnston K. A., Simpson R. L. and Tilson H. H. Quality of life bibliography and indexes. Med. Care 28 (supplement), DS1, 1990.
5. Birch S. and Gafni A. Cost-effectiveness/utility analyses: do current decision rules lead us to where we want to be? J. Hlth Econ. 11, 279, 1992.
6. Karni E. and Schmeidler D. Utility theory with uncertainty. In Handbook of Mathematical Economics (Edited by Hildenbrand W. and Sonnenschein H.). Elsevier Science Publishers, Amsterdam, 1991.
7. Weinstein M. C. Time-preference studies in the health care context. Med. Decision Making 13, 218, 1993.
8. Howard R. A. Decision analysis: practice and promise. Management Sci. 34, 679, 1988.
9. Mehrez A. and Gafni A. Evaluating health related quality of life: an indifference curve interpretation for the time trade-off technique. Soc. Sci. Med. 31, 1281, 1990.
10. Arrow K. J. and Lind R. C. Uncertainty and the evaluation of public investment decisions. Am. Econ. Rev. 60, 364, 1970.
11. Ben Zion U. and Gafni A. Evaluation of public investment in health care: is the risk irrelevant? J. Hlth Econ. 2, 161, 1983.
12. Arrow K. J. Social Choice and Individual Values, 2nd Edn. Yale University Press, New Haven, CT, 1963.
13. Bator F. The simple analytics of welfare maximization. Am. Econ. Rev. 47, 22, 1957.
14. Gafni A. and Birch S. Equity considerations in utility-based measures of health outcomes in economic appraisals: an adjustment algorithm. J. Hlth Econ. 10, 329, 1991.
15. Broome J. QALYs. J. Public Econ. 50, 149, 1993.
16. Musgrave R. A. The Theory of Public Finance. McGraw-Hill, New York, 1959.
17. Tobin J. On limiting the domain of inequality. J. Law Econ. 13, 263, 1990.
18. Walzer M. Spheres of Justice. Basic Books, New York, 1983.
19. Evans R. G. Strained Mercy: The Economics of Canadian Health Care. Butterworths, Toronto, 1984.
20. Labelle R. J. and Hurley J. E. Implications of basing health care resource allocation on cost-utility analysis in the presence of externalities. J. Hlth Econ. 11, 259, 1992.
21. Streiner D. L. and Norman G. R. Health Measurement Scales: a Practical Guide to their Development and Use. Oxford University Press, Oxford, 1988.
22. Torrance G. W. Utility approach to measuring health related quality of life. J. Chron. Dis. 40, 593, 1987.
23. Wagstaff A. QALYs and the equity-efficiency trade-off. J. Hlth Econ. 10, 21, 1991.
24. Hornberger J. C., Redelmeier D. A. and Peterson J. Variability among methods to assess patients' well-being and consequent effect on a cost-effectiveness analysis. J. Clin. Epidemiol. 45, 505, 1992.
25. Nord E. Methods for quality adjustment of life-years. Soc. Sci. Med. 34, 559, 1992.
26. Nord E. Toward quality assurance in QALY calculations. Int. J. Techn. Assessment in Hlth Care 9, 37, 1993.
27. Gerard K. Cost-utility in practice: a policy maker's guide to the state of the art. Hlth Policy 21, 249, 1992.
28. Read J. L., Quinn R. J., Berwick D. M., Fineberg H. V. and Weinstein M. C. Preference for health outcomes: comparison of assessment methods. Med. Decision Making 4, 315, 1984.
29. Gudex C. and Kind P. The QALY toolkit. Discussion paper 38, Centre for Health Economics, Health Economics Consortium, University of York, York, U.K., 1988.
30. Pliskin J. S., Shepard D. S. and Weinstein M. C. Utility functions for life-years and health status. Oper. Res. 28, 206, 1980.
31. Weinstein M. C., Fineberg H. V., Elstein A. S. et al. Clinical Decision Analysis. W. B. Saunders, Philadelphia, 1980.
32. Torrance G. W. and Feeny D. Utilities and quality-adjusted life-years. Int. J. Techn. Assessment Hlth Care 5, 559, 1989.
33. Von Neumann J. and Morgenstern O. Theory of Games and Economic Behaviour. John Wiley, New York, 1953.
34. Mehrez A. and Gafni A. The healthy-years-equivalents: how to measure them using the standard gamble approach. Med. Decision Making 11, 140, 1991.
35. Miyamoto J. M. and Eraker S. A. Parameter estimates for a QALY utility model. Med. Decision Making 5, 191, 1985.
36. Loomes G. and McKenzie L. The use of QALYs in health care decision making. Soc. Sci. Med. 28, 299, 1989.
37. Loewenstein G. and Prelec D. Negative time preference. Am. Econ. Rev. 81, 347, 1991.
38. Grossman M. The Demand for Health: a Theoretical and Empirical Investigation. National Bureau of Economic Research, New York, 1972.
39. Grossman M. The demand for health after a decade. J. Hlth Econ. 1, 1, 1982.
40. Muurinen J. M. Demand for health: a generalized Grossman model. J. Hlth Econ. 1, 5, 1982.
41. Wagstaff A. The demand for health: some new empirical evidence. J. Hlth Econ. 5, 195, 1986.
42. Wagstaff A. The demand for health: an empirical reformulation of the Grossman model. Hlth Econ. 2, 189, 1993.
43. Birch S. and Stoddart G. Incentives to be healthy: an economic model of health-related behaviour. In Incentives in Health Systems (Edited by Lopez-Casasnovas G.). Springer-Verlag, Berlin, 1991.
44. Loewenstein G. and Prelec D. Preferences for sequences of outcomes. Psychol. Rev. 100, 91, 1993.
45. Hall J., Gerard K., Salkeld G. and Richardson J. A cost-utility analysis of mammography screening in Australia. Soc. Sci. Med. 34, 993, 1992.
46. Buckingham K. A note on HYE (Healthy Years Equivalent). J. Hlth Econ. 12, 301, 1993.
47. Gafni A., Birch S. and Mehrez A. Economics, health and health economics: HYEs versus QALYs. J. Hlth Econ. 12, 325, 1993.
48. Pauker S. G. and Kassirer J. P. Decision-analysis. N. Engl. J. Med. 316, 250, 1987.
49. Torrance G. W. Measuring of health state utilities for economic appraisal: a review. J. Hlth Econ. 5, 1, 1986.
50. Machina M. J. Choice under uncertainty: problems solved and unsolved. Econ. Perspectives 1, 121, 1987.
51. Fishburn P. C. Expected utility: an anniversary and a new era. J. Risk Uncertainty 1, 267, 1988.
52. Farquhar P. H. Utility assessment methods. Management Sci. 30, 1283, 1984.
53. Mehrez A. and Gafni A. Quality-adjusted life-years, utility theory, and healthy-years-equivalents. Med. Decision Making 9, 142, 1989.
54. Johannesson M., Pliskin J. S. and Weinstein M. C. Are healthy-years-equivalents an improvement over quality-adjusted life-years? Med. Decision Making 13, 281, 1993.
55. Culyer A. J. and Wagstaff A. QALYs versus HYEs. J. Hlth Econ. 12, 311, 1993.
56. Mehrez A. and Gafni A. Healthy-years-equivalents versus quality-adjusted life-years: in pursuit of progress. Med. Decision Making 13, 287, 1993.
57. Torrance G. W., Thomas W. H. and Sackett D. L. A utility maximization model for evaluation of health care programmes. Hlth Serv. Res. 7, 118, 1972.
58. Fleming M. A cardinal concept of welfare. Q. J. Econ. 66, 366, 1952.
59. Harsanyi J. C. Cardinal welfare, individualistic ethics and interpersonal comparisons of utility. J. Political Econ. 63, 309, 1955.
60. The EuroQol Group. A new facility for the measurement of health-related quality of life. Hlth Policy 16, 199, 1990.
61. Gafni A. and Birch S. Searching for a common currency: critical appraisal of the scientific basis underlying European harmonization of the measurement of health related quality of life (EuroQol). Hlth Policy 23, 219, 1993.
62. Gafni A. and Zylak C. J. Ionic versus nonionic contrast media: a burden or bargain? Can. Med. Assoc. J. 143, 475, 1990.
63. Gafni A. Measuring the adverse effects of unnecessary hypertension drug therapy: QALYs versus HYEs. Clin. Investigative Med. 14, 266, 1991.
64. Drummond M. F., Torrance G. W. and Mason J. Cost-effectiveness league tables: more harm than good? Soc. Sci. Med. 37, 33, 1993.
65. Fryback D. G. QALYs, HYEs, and the loss of innocence. Med. Decision Making 13, 271, 1993.