A limited-sample benchmark approach to assess and improve the performance of risk equalization models


Journal of Health Economics 29 (2010) 426–437


Journal of Health Economics journal homepage: www.elsevier.com/locate/econbase

Pieter J.A. Stam a,b,∗, René C.J.A. van Vliet a, Wynand P.M.M. van de Ven a

a Institute of Health Policy and Management, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands
b SiRM – Strategies in Regulated Markets, P.O. Box 24355, 3007 DJ Rotterdam, The Netherlands

Article info

Article history: Received 18 April 2008; received in revised form 26 January 2010; accepted 8 February 2010; available online 17 February 2010.

JEL classification: G22; I10; I11; I18

Abstract

A new method is proposed to assess and improve the performance of risk equalization models in competitive markets for individual health insurance, where compensation is intended for variation in observed expenditures due to so-called S(ubsidy)-type risk factors but not for variation due to other, so-called N(on-subsidy)-type risk factors. Given the availability of a rich subsample of individuals for which normative expenditures, $Y^{\mathrm{NORM}}$, can be accurately determined, we make two contributions: (a) any risk equalization scheme applied to the entire population, $Y^{\mathrm{REF}}$, should be evaluated through its performance in the subsample, by comparing $Y^{\mathrm{REF}}$ with $Y^{\mathrm{NORM}}$ (not by comparing $Y^{\mathrm{REF}}$ with observed expenditures, $Y$, in the entire population, as commonly done); (b) conventional risk equalization schemes can be improved by the subsample regression of $Y^{\mathrm{NORM}}$, rather than $Y$, on the risk adjusters that are observable in the entire population. This new method is illustrated by an application to the 2004 Dutch risk equalization model. © 2010 Elsevier B.V. All rights reserved.

Keywords: Health insurance; Managed competition; Subsidies; Risk equalization; Optimal risk adjustment

1. Introduction

In several countries, competition among health insurers is used to stimulate efficiency and responsiveness to consumers’ preferences in the health care sector.1 The ultimate goal is to stimulate health insurance companies to act as prudent purchasers or providers of care for their members. At the same time, financial transfers are needed in such markets for individual health insurance in order to avoid problems of access to coverage for those at high risk. The first and best solution in this case is to organize a system of risk-adjusted equalization payments (Van de Ven et al., 2000) distributed by a sponsor via a so-called Risk Equalization Fund (REF). In European countries, the role of the sponsor is played by the government. Risk adjustment is usually based on a regression model relating observed expenditures to risk factors.

∗ Corresponding author at: Institute of Health Policy and Management, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. Tel.: +31 10 4088584; fax: +31 10 4089094. E-mail address: [email protected] (P.J.A. Stam).
1 For example, Belgium, Germany, Switzerland, The Netherlands and USA (Van de Ven and Ellis, 2000).
0167-6296/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jhealeco.2010.02.001

Variation in observed health care expenditures will be determined by various risk factors, not all of which the sponsor may want to subsidize. In general, the total set of potential risk factors can therefore be divided into two categories: the subset of risk factors that cause variation in expenditures which the sponsor decides to subsidize, the S(ubsidy)-type risk factors, and the subset that causes variation in expenditures which the sponsor does not want to subsidize, the N(on-subsidy)-type risk factors (Van de Ven and Ellis, 2000, pp. 768–769). In most countries, up to a certain extent, gender, health status, and age will probably be considered as S-type risk factors. Examples of potential N-type risk factors are a high propensity for medical consumption, living in a region with high prices and/or overcapacity resulting in supply-induced demand, or using providers with an inefficient practice-style (Van de Ven et al., 2000). The selection of S-type risk factors plays a crucial role in the scientific and political debate.2 Ultimately, if government is the sponsor, this

2 The need for a society to make its goals explicit is in accordance with the WHO recommendations to make societal goals for countries explicit (Murray and Frenk, 2000).


categorization will be determined by value judgments in society. Given a specific categorization of S-type and N-type risk factors, adequate measures thereof should be found in order to be able to implement a system of risk-adjusted equalization payments. However, although it may be relatively easy to collect information on age and gender, it often proves difficult to find direct measures of health status that can be made available for every insured individual. As a consequence, a rather limited set of indirect health status measures is often used instead, which may not only lead to undercompensation for expenditure variation caused by S-type risk factors, but also increase the risk of undesired compensation for expenditure variation caused by N-type risk factors. For example, working status may be used as an indirect measure of health status, although expenditure differences between employees and self-employed people may be partly caused by an N-type risk factor such as time price (“no time to visit a doctor”) and the resulting propensity for visiting a doctor. Although there are econometric techniques to avoid compensation for N-type variation, such an approach is seldom applied because it often turns out to be an even bigger challenge to find adequate measures of the N-type risk factors for every insured individual than to find adequate measures of the S-type risk factors. To the extent that the sponsor might find (more) precise measures of S-type and N-type risk factors for a limited subsample of insured people, a methodology to exploit this additional information to improve the equalization payments for the total population has so far been lacking. The conventional method to determine the performance of a risk equalization model is to compare model predicted expenditures to observed expenditures.
This study introduces a new method to assess the extent to which a given set of risk adjusters generates risk-adjusted equalization payments as intended by the sponsor. The basic idea of this new method is to develop a comprehensive “risk equalization” model for a subsample of insured people for whom more (precise) measures of S-type and/or N-type risk factors can be collected (for example, from tailor-made health surveys) than for the larger population of insured people on which the risk equalization model is estimated by the sponsor. We claim that the performance of any risk equalization model should be assessed in the subsample as the difference between the expenditures predicted by the comprehensive model (instead of observed expenditures) and the expenditures predicted by the sponsor's model. A second innovation is that the results of this exercise can also be used to improve the performance of conventional risk equalization models, even though the extensive array of direct measures of the S-type and N-type risk factors is available for those in the limited subsample alone. To this end, predicted expenditures from the comprehensive model in the subsample (instead of observed expenditures) should be regressed on the risk adjusters that are observable in the entire population. The approach proposed in this study is relevant for all sponsors who need to assess and improve the extent to which their system of risk-adjusted equalization payments functions in accordance with their policy goals.

In Section 2 the conceptual framework of our new approach is described. Section 3 gives an illustration by applying this method to the 2004 Dutch risk equalization model. Section 4 presents the results, and Section 5 discusses them.

2. Method

2.1. The calculation of the normative expenditures

Theoretically, the calculation of the risk-adjusted equalization payments should be based on acceptable costs, i.e. the costs of services that follow from a quality, intensity and (demand and supply) price level of treatment that the sponsor considers to be acceptable to be subsidized (Van de Ven and Ellis, 2000). In practice, however, such costs are hard to determine and therefore acceptable costs are usually based on observed expenditures instead of need-based costs. We follow this convention in this study. The calculation of acceptable costs should then be based on a prediction model of observed expenditures that includes (current or prior year) measures of both S-type and N-type risk factors as explanatory variables. This prediction model is denoted as follows:

$$Y = \beta_0 + \sum_{l=1}^{L} \beta_l S_l + \sum_{m=1}^{M} \gamma_m N_m + \eta \tag{1a}$$

where $Y$ is the health care expenditures observed during some period in time, $S_l$ is the $l$th S-type adjuster, $l = 1, \ldots, L$, $N_m$ is the $m$th N-type adjuster, $m = 1, \ldots, M$, and $\eta$ is an independent and identically distributed error term. The S-type and N-type adjusters may be observed prior to the observation period for expenditures or during the same period, i.e. the risk equalization model may be either prospective or concurrent, respectively. The variables $Y$, $S_l$, $N_m$, and $\eta$ are $N \times 1$ vectors, the elements of which contain the observations with respect to insured individuals $i = 1, \ldots, N$. The coefficients may vary over time. After the estimation of the coefficients $\beta$ and $\gamma$ by ordinary least squares (OLS), the acceptable costs are approximated by

$$Y^{\mathrm{NORM}} = \hat{\beta}_0 + \sum_{l=1}^{L} \hat{\beta}_l S_l + \sum_{m=1}^{M} \hat{\gamma}_m \bar{N}_m \tag{1b}$$

where the values of the N-type adjusters are set equal to some level desired by (or: ‘acceptable’ to) the sponsor according to the so-called Schokkaert approach, and are often set equal to the overall sample mean $\bar{N}_m$ (Carr-Hill et al., 1994; Schokkaert et al., 1998; Schokkaert and Van de Voorde, 2000, 2004). Acceptable costs derived this way are called normative expenditures in this study. This $Y^{\mathrm{NORM}}$ is the best definition of normative expenditures that the sponsor can obtain at the time equation (1b) is constructed, given the available measurement set of S-type and N-type risk factors.

It often turns out to be quite a challenge to find adequate measures of the S-type and N-type risk factors for every insured individual. The difficulty of finding precise measures of S-type and N-type risk factors arises because these measures should ideally satisfy the criteria of fair payments, appropriate incentives, and feasible data (Van de Ven and Ellis, 2000) for every insured individual in the population. For example, the latter criterion requires that the measures are feasible to obtain for all individuals without undue expenditures of time or money. As a consequence, the measurement set shrinks as the population of insured people for which the individual risk-adjusted equalization payments should be calculated grows. Usually, estimation is based on as large a “sample” of individuals as possible, ideally the total population of insured people. In the context of the Dutch REF this amounts to (nearly) the entire population of over 16 million people; for the Medicare system in the USA a sample of 5% of the relevant population is used, amounting to several millions. As a consequence, the calculation of the normative expenditures is often based on a limited set of potentially inaccurate measures of the S-type and N-type risk factors.

2.2. The calculation of the REF predicted expenditures

In most countries with a system of risk equalization, an approximation of normative expenditures follows from the estimation of the coefficients $\alpha_j$ by OLS in the following linear equation:

$$Y = \alpha_0 + \sum_{j=1}^{J} \alpha_j X_j + \varepsilon \tag{2a}$$

where the $j$th REF adjuster $X_j$ is a potentially imperfect measure of the S-type risk factors, $j = 1, \ldots, J$, and $\varepsilon$ is an independent and identically distributed error term. Notice that measures of the N-type risk factors are not included at all. The approximation of normative expenditures, called REF predicted expenditures in this study, is calculated by the sponsor as

$$Y^{\mathrm{REF}}(\hat{\alpha}) = \hat{\alpha}_0 + \sum_{j=1}^{J} \hat{\alpha}_j X_j \tag{2b}$$

where the $\hat{\alpha} = \{\hat{\alpha}_0, \hat{\alpha}_1, \ldots, \hat{\alpha}_J\}$ are the weights of the REF adjusters when calculating REF predicted expenditures.3 Note that inclusion of any $X_j$ in Eq. (2a) is a normative choice by the sponsor as all observed expenditure variation will be cross-subsidized among the subgroups defined by $X_j$, despite the possibility that $X_j$ may also capture N-type expenditure variation to some extent. This normative choice may be made if the sponsor expects $X_j$ to primarily (though potentially not exclusively) capture S-type expenditure variation. Also note that, by property of OLS, total observed health care expenditures sum up to total REF predicted expenditures (and normative expenditures, see above) in the study sample.

3 $\hat{\alpha}$ is introduced as an argument to the notation of REF predicted expenditures in order to distinguish them from REF predicted expenditures when using an alternative set of regression coefficients, see Eqs. (4) and (5b).

If measures of the N-type risk factors can be made available for the total population of insured people, REF predicted expenditures can be improved by including these measures during the estimation and setting the N-type risk factors to their average values when calculating the equalization payments. To the best of our knowledge, this Schokkaert approach is only applied by the sponsor in Belgium. Eq. (2a) is then replaced by

$$Y = \omega_0 + \sum_{j=1}^{J} \omega_j X_j + \sum_{k=1}^{K} \delta_k Z_k + \xi \tag{3a}$$

where the $k$th measure $Z_k$ is a potentially imperfect measure of the N-type risk factors, $k = 1, \ldots, K$, $\omega_j$ and $\delta_k$ are unknown coefficients to be estimated and $\xi$ is an independent and identically distributed error term. REF predicted expenditures are thus given by:

$$Y^{\mathrm{REF}}(\hat{\omega}, \hat{\delta}) = \hat{\omega}_0 + \sum_{j=1}^{J} \hat{\omega}_j X_j + \sum_{k=1}^{K} \hat{\delta}_k \bar{Z}_k \tag{3b}$$

where the arguments $\hat{\omega} = \{\hat{\omega}_0, \hat{\omega}_1, \ldots, \hat{\omega}_J\}$ and $\hat{\delta} = \{\hat{\delta}_1, \ldots, \hat{\delta}_K\}$ represent the vectors of estimated regression coefficients. A sponsor may choose to include $Z_k$ if it is expected to primarily (though potentially not exclusively) capture N-type expenditure variation. Note that biasedness of $\hat{\alpha}$ in Eq. (2b) implies that $\hat{\alpha}_0 \neq \hat{\omega}_0 + \sum_{k=1}^{K} \hat{\delta}_k \bar{Z}_k$ and $\hat{\alpha}_j \neq \hat{\omega}_j$ for all $j = 1, \ldots, J$ if the correlation between $X_j$ and $Z_k$ is non-zero for any $j$ and $k$.

Two problems may arise due to the limited availability of adequate measures of the S-type and N-type risk factors. The first is that imperfect measures of the S-type risk factors may lead to undercompensation for S-type expenditure variation. A strategy to reduce this problem, besides a continuous search for new such measures, is to apply some form of ex-post risk sharing. However, this solution comes at the expense of introducing disincentives for efficiency (Van Barneveld et al., 2001). The second problem is that imperfect measures of the N-type risk factors do not adequately remove direct and indirect N-type expenditure variation, the latter induced by S-type risk factors. Consequently, the Schokkaert approach as a strategy to reduce the problem of unintended compensation for N-type expenditure variation will be less effective in case of imperfect measures of the N-type risk factors. If the $Z_k$ are available, the relevance of applying the Schokkaert approach can be determined by performing a generalized Hausman (1978) specification test of the null hypothesis that the coefficients of the S-type measures are equal to those from an alternative regression that excludes the measures of the N-type risk factors. The Schokkaert approach should be applied if this null hypothesis is rejected.
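The difference between Eqs. (2a)–(2b) and Eqs. (3a)–(3b) can be made concrete with a minimal Python sketch. The eight-person data set, the single REF adjuster `x` (say, a chronic-condition flag), the single N-type measure `z` and the helper names `solve` and `ols` are illustrative assumptions, not taken from the study; expenditures are generated without noise so that the coefficients can be checked by hand.

```python
# Hypothetical toy data: x is one REF adjuster X_1, z is one N-type measure Z_1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [0, 0, 0, 1, 0, 1, 1, 1]          # z is correlated with x
# Noise-free expenditures: 2000 per unit of x (S-type), 500 per unit of z (N-type).
y = [1000 + 2000 * xi + 500 * zi for xi, zi in zip(x, z)]


def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS with an intercept via the normal equations (adequate for tiny examples)."""
    design = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(k)]
    return solve(xtx, xty)


# Eq. (2a)/(2b): regress y on x only; the coefficient on x absorbs part of z's effect.
a0, a1 = ols([x], y)
y_ref_alpha = [a0 + a1 * xi for xi in x]

# Eq. (3a)/(3b): regress y on x and z, then predict with z fixed at its sample mean.
w0, w1, d1 = ols([x, z], y)
z_bar = sum(z) / len(z)
y_ref_omega = [w0 + w1 * xi + d1 * z_bar for xi in x]

print("coefficient on x without z:", round(a1))
print("coefficients on x and z:", round(w1), round(d1))
print("totals:", round(sum(y)), round(sum(y_ref_alpha)), round(sum(y_ref_omega)))
```

In this constructed case the coefficient on `x` is about 2250 when `z` is omitted and about 2000 when `z` is included, so roughly 250 of the conventional payment difference between the two subgroups reflects N-type variation. Both sets of predictions still add up to total observed expenditures, in line with the OLS property noted above.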

2.3. Assessment and improvement of REF predicted expenditures

Although it may be difficult to find measures that satisfy the criteria of fair payments, appropriate incentives, and feasible data (Van de Ven and Ellis, 2000) for every insured individual in the total population, it will probably be much easier for the sponsor to find such measures for a representative subsample of insured individuals. Because of the restricted sample size it may be possible to collect a rich set of measures of the S-type and N-type risk factors from claims data and other data sources. For example, a tailor-made health survey can be conducted among a limited sample of insured people, from which a more precise measure of the S-type risk factor health status may be derived than the limited set of health measures that are used by the sponsor to derive the risk-adjusted equalization payments for the total population. Furthermore, it will also be easier to find measures of the N-type risk factors from additional data sources when using a limited subsample of insured people, and to apply the Schokkaert approach in order to prevent omitted variables bias in the estimated coefficients of the risk equalization equation. Acceptable costs for this limited subsample of insured people can then be based on a regression analysis of observed expenditures on these relatively precise measures of the S-type and N-type risk factors. Assuming that the norms of the sponsor are reflected as accurately as possible, normative expenditures may then function as a benchmark against which the sponsor can assess the performance of the actual risk equalization model as well as that of alternative model specifications. Furthermore, these results can be used to better align the risk-adjusted equalization payments across the subgroups defined by the REF adjusters with the policy goals of the sponsor.

2.3.1. Assessment of REF predicted expenditures

Traditionally, the performance of the REF equation is judged in terms of individual differences between REF predicted expenditures and observed expenditures. For example, these individual differences are used to calculate $R^2$, mean absolute deviations (MAD) and averages for selected subgroups of relatively unhealthy insured people. However, using observed expenditures as a benchmark is no longer a valid strategy if the sponsor intends to subsidize S-type variation in expenditures alone. In that case, normative expenditures instead of observed expenditures should serve as a reference point against which to test the performance of REF predicted expenditures. REF predicted expenditures will deviate from normative expenditures to the extent that S-type expenditure variation remains uncaptured and/or N-type expenditure variation is unintentionally compensated for. Therefore, given the specification of the normative equation, any existing gap between REF predicted expenditures and normative expenditures can be interpreted as a misalignment of the risk equalization model to the policy goals of the sponsor.

The following performance measure can be defined as an alternative to $R^2$ and MAD:

$$\Phi_k(\hat{\theta}) = 1 - \frac{\sum_{i=1}^{N} \left| y_i^{\mathrm{REF}}(\hat{\theta}) - y_i^{\mathrm{NORM}} \right|^k}{\sum_{i=1}^{N} \left| y_i^{\mathrm{NORM}} - \bar{y}^{\mathrm{NORM}} \right|^k} \tag{4}$$

where $\bar{y}^{\mathrm{NORM}} = \frac{1}{N} \sum_{i=1}^{N} y_i^{\mathrm{NORM}}$ and either $\hat{\theta} = \hat{\alpha}$ or $\hat{\theta} = (\hat{\omega}, \hat{\delta})$, depending on the sponsor’s choice to derive REF predicted expenditures according to Eq. (2b) or (3b), respectively. $\Phi_1$ is a sign-reversed, normalized version of the MAD and $\Phi_2$ is an alternative to $R^2$. A noticeable difference between $\Phi_1$ and $\Phi_2$ is that the influence of an incidental large difference between $y_i^{\mathrm{REF}}$ and $y_i^{\mathrm{NORM}}$ is smaller if performance is measured in terms of $\Phi_1$ than in terms of $\Phi_2$. The theoretical range of $\Phi_k$ is $(-\infty, 1]$, where model performance is better the closer $\Phi_k$ is to 1.4 If $y_i^{\mathrm{REF}} = y_i^{\mathrm{NORM}}$ for every insured $i = 1, \ldots, N$, then $\Phi_k = 1$. In that case, it may be concluded that the performance of the REF model is as intended by the sponsor, under the assumption that the normative Eq. (1a) is correctly specified. Negative values of $\Phi_k$ imply that observed variation between REF predicted and normative expenditures is larger than observed variation of normative expenditures across individuals. In that case, the average sample value of normative expenditures may actually serve as a better estimate of normative expenditures than REF predicted expenditures, and the sponsor should be advised to drop every REF adjuster except the constant term from the REF equation: it is better not to induce any cross-subsidies at all, than to induce false ones.

The performance of the REF equation at the subgroup level of analysis is calculated as the average difference between REF predicted and normative expenditures for the subgroups defined by the REF adjusters. Ideally, this difference equals zero for each of these subgroups. Any deviation from normative expenditures implies unjustified over- or undercompensation of the insured belonging to this subgroup to the extent of this deviation.

2.3.2. Improvement of the REF predicted expenditures

We propose a new method that can improve the performance of the REF equation. It exploits the extra information of the more (precise) measures of the S-type and N-type risk factors available in the subsample to the extent that these measures are not already included in the specification of the risk equalization model by the sponsor. The proposed procedure starts with regressing normative expenditures $Y^{\mathrm{NORM}}$ (instead of observed expenditures $Y$) on the limited set of REF adjusters for the limited subsample as follows:

$$Y^{\mathrm{NORM}} = \lambda_0 + \sum_{j=1}^{J} \lambda_j X_j + \nu. \tag{5a}$$
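As a hedged numerical illustration of the performance measure in Eq. (4) and of the regression in Eq. (5a), consider the following Python sketch. The toy values, the single binary REF adjuster `x` and the function names `performance_measure` and `simple_ols` are assumptions made for illustration only; `y_norm` stands in for normative expenditures that would in practice come from the comprehensive subsample model.

```python
def performance_measure(y_ref, y_norm, k):
    """Eq. (4): 1 - sum|y_ref - y_norm|^k / sum|y_norm - mean(y_norm)|^k."""
    mean_norm = sum(y_norm) / len(y_norm)
    num = sum(abs(r - n) ** k for r, n in zip(y_ref, y_norm))
    den = sum(abs(n - mean_norm) ** k for n in y_norm)
    return 1 - num / den


def simple_ols(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical subsample: one binary REF adjuster and assumed normative expenditures.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y_norm = [1250, 1250, 1250, 1250, 3250, 3250, 3250, 3250]
# Assumed observed expenditures, containing N-type variation correlated with x.
y_obs = [1000, 1000, 1000, 1500, 3000, 3500, 3500, 3500]

# Conventional coefficients: regress observed expenditures on x, as in Eq. (2a).
a0, a1 = simple_ols(x, y_obs)
y_ref_conventional = [a0 + a1 * xi for xi in x]

# Proposed adjustment: regress normative expenditures on x, as in Eq. (5a).
l0, l1 = simple_ols(x, y_norm)
y_ref_adjusted = [l0 + l1 * xi for xi in x]

for label, pred in [("conventional", y_ref_conventional), ("adjusted", y_ref_adjusted)]:
    print(label, performance_measure(pred, y_norm, 1), performance_measure(pred, y_norm, 2))

# Subgroup-level check: mean over- or undercompensation for the x = 1 subgroup.
gap = sum(r - n for r, n, xi in zip(y_ref_conventional, y_norm, x) if xi == 1) / sum(x)
print("mean gap for x = 1:", gap)
```

In this toy case the conventional coefficients give a value of 0.875 for k = 1 and about 0.984 for k = 2, and they overcompensate the `x = 1` subgroup by 125 on average. The coefficients from the regression on normative expenditures reach a value of 1 only because a single binary adjuster reproduces the assumed normative expenditures exactly; with real data the improvement would be partial.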

Estimation of Eq. (5a) generates an estimate of normative expenditures that can be seen as an alternative to $Y^{\mathrm{REF}}(\hat{\alpha})$:5

$$Y^{\mathrm{REF}}(\hat{\lambda}) = \hat{\lambda}_0 + \sum_{j=1}^{J} \hat{\lambda}_j X_j \tag{5b}$$

where $\hat{\lambda}_j$ reflects S-type expenditure variation alone. Therefore, $\hat{\alpha}_j - \hat{\lambda}_j$ is an estimate of the marginal effect of the REF adjuster $X_j$ on observed expenditures for which the sponsor does not desire cross-subsidization. This marginal effect is generally non-zero. If the normative equation is correctly specified, unintended compensation for expenditure variation among the subgroups defined by the REF adjusters can thus be avoided entirely by using $\hat{\lambda}_j$ instead of $\hat{\alpha}_j$ to calculate the risk-adjusted equalization payments, not only for those individuals in the limited subsample but even for the total population of insured people. In this way, risk-adjusted equalization payments can be calculated that are better aligned with the policy goals of the sponsor than when using the unadjusted coefficients. Our new approach can be interpreted as a procedure to adjust for omitted S-type and/or N-type variables bias in the estimated coefficients from REF equation (2a).6 The performance measures are denoted by $\Phi_1(\hat{\lambda})$ and $\Phi_2(\hat{\lambda})$ with $\hat{\lambda} = \{\hat{\lambda}_0, \hat{\lambda}_1, \ldots, \hat{\lambda}_J\}$, where $\Phi_2(\hat{\lambda})$ equals the $R^2$ associated with Eq. (5a).

4 This range is the same as the range of Cronbach’s (1951) alpha coefficient of reliability in classical measurement theory.
5 The explained variance ($R^2$) in Eq. (5b) is expected to be much larger than in Eq. (2), because at the individual level the variance of normative expenditures in Eq. (5a) is much smaller than observed expenditures in Eq. (1).

Alternatively, if measures of N-type risk factors can be made available for the total population of insured individuals, then the $Z_k$ from Eq. (3a) must be added as N-type adjusters to Eqs. (5a) and (5b). This procedure then starts by estimation of the equation

$$Y^{\mathrm{NORM}} = \pi_0 + \sum_{j=1}^{J} \pi_j X_j + \sum_{k=1}^{K} \mu_k Z_k + \zeta \tag{6a}$$

and boils down to a normative adjustment of the coefficients in Eq. (2b) as described by

$$Y^{\mathrm{REF}}(\hat{\pi}, \hat{\mu}) = \hat{\pi}_0 + \sum_{j=1}^{J} \hat{\pi}_j X_j + \sum_{k=1}^{K} \hat{\mu}_k Z_k. \tag{6b}$$

This procedure, if feasible, should always be preferred to omitting the Z-variables. The performance measures are denoted by $\Phi_1(\hat{\pi}, \hat{\mu})$ and $\Phi_2(\hat{\pi}, \hat{\mu})$ with $\hat{\pi} = \{\hat{\pi}_0, \hat{\pi}_1, \ldots, \hat{\pi}_J\}$ and $\hat{\mu} = \{\hat{\mu}_1, \ldots, \hat{\mu}_K\}$, where $\Phi_2(\hat{\pi}, \hat{\mu})$ equals the $R^2$ associated with Eq. (6a). Note that, contrary to the procedure described by Eq. (3b), the individual values of the $Z_k$ are not set equal to some level desired by the sponsor when calculating this alternative to $Y^{\mathrm{REF}}(\hat{\alpha})$. The explanation for this different procedure is that $\hat{\pi}$ and $\hat{\mu}$ capture S-type expenditure variation alone, under the assumption that the normative equation is correctly specified.

3. Empirical illustration

3.1. Study population and data source

The population studied is the 2001/2002 Dutch population of sickness fund enrollees, who constitute about two-thirds of the total Dutch population. They were legally obliged to enroll with a sickness fund (together with their family members) because they earned an annual wage below an income threshold of €30,700 (2002). The study sample is based on the individual claims data of the largest sickness fund at the time, called Agis Health Insurance (market share: 16%). These claims data include 2002 health care expenditures for general practitioner (GP) care, inpatient room and board, inpatient and outpatient specialist care, prescription drugs, dental care, obstetrics, physical therapy, medical devices, sick-transport, and maternity care. Every member had the same insurance coverage without deductibles or co-payments and was insured during the 2001–2002 period, 3.9% of them for less than two years. The mean expenditures in 2002 are €1753 with a coefficient of variation of about 3.4. Only 3.7% of the members had zero expenditures in 2002.

6 This notion is in line with the suggestion in some recent theoretical papers that optimal risk adjustment does not generally require the capitation payments to equal average expenditures for the subgroups defined by the REF adjusters (Ellis, 1998; Frank et al., 2000; Glazer and McGuire, 2000, 2002; Sappington and Lewis, 1999).


Table 1
Measurement sets X and S of the S-type risk factors and Z and N of the N-type risk factors.

Risk factor              X                        S                  Z = N
S-type risk factors
  Age                    Age                      Age                –
  Gender                 Gender                   Gender             –
  Health status          (Single) PCGs            (Multiple) PCGs    –
                         (Single) DCGs            (Multiple) DCGs    –
                         Insurance eligibility    –                  –
                         Region                   –                  –
                         –                        CCI                –
                         –                        SF-36              –
                         –                        OECD               –
N-type risk factors
  Output price           –                        –                  Hospital output price
  Access and time price  –                        –                  Distance to hospital
                         –                        –                  Distance to GP

All data on expenditures refer to actual charges. The individual claims data also include 2001 age, gender, pharmacy-based and diagnostic health measures (PCGs and DCGs) (Lamers, 1998, 1999; Pope et al., 2000; Van Vliet and Prinsze, 2003), insurance eligibility, region, hospital output price, distance to the GP, and distance to the hospital. These claims data are combined with the answers of respondents (N = 18,617) to a health survey conducted in the last quarter of 2001. This special-purpose survey was conducted to collect self-reported measures of general health status, functional status, and long-term diseases and conditions not present in the claims administration. A stratified sampling procedure was designed to overrepresent people with a poor health status (Stam, 2007). In the analysis, the strata are weighted back to population proportions to take account of the stratified sampling procedure.

3.2. The specification of the REF and normative equation

The specification of the REF equation that was used for risk equalization among Dutch sickness funds in 2004 and 2005 is used to illustrate the new method. This means that Eq. (2a) contains the administrative variables age, gender, insurance eligibility (i.e. subgroups of disabled, (self-)employed, unemployed, retired insured people and those on social welfare), region, PCGs and DCGs.7 The (single) PCGs and (single) DCGs are rank-ordered, and an individual may belong to one PCG and/or one DCG at most (Lamers and Van Vliet, 2003). The ex-post risk sharing arrangements between insurers and the sponsor are not applied here, as we focus on (ex-ante) risk adjustment.8,9 Under the 2006 Dutch Health Insurance Act, the purpose of the REF model is to compensate for differences in health status among insured people that are caused by age, sex, and objective

7 Appendix A describes the construction of region.
8 The actual specification of the 2004 Dutch REF equation differs from its implementation in this study in some respects. The REF adjuster age defines ten-year instead of five-year classes of age, and interactions between the REF adjusters insurance eligibility and age are absent in this study. From a separate analysis it appears that the correlation between 2004 REF predicted expenditures for the study sample and those for the total population of 10 million Dutch sickness fund insured equals 0.973.
9 These types of REF adjusters are also included in the risk equalization model that holds after the convergence of social health insurance and private health insurance enacted by the 2006 Dutch Health Insurance Act.

measures of health status (Hoogervorst, 2005). In this empirical illustration, it is therefore assumed that the Dutch government desires risk-adjusted equalization payments to equalize the variation in predicted expenditures that is caused by the S-type risk factors age, sex and health status alone. Table 1 lists the measurement sets of the S-type and N-type risk factors, given this choice of the Dutch sponsor. Table 1 shows that DCGs, PCGs, insurance eligibility and region constitute the measurement set of the S-type risk factor health status in the REF equation. DCGs and PCGs are direct measures of health status, whereas insurance eligibility and region can be interpreted as indirect measures (Stam et al., 2010). Insurance eligibility and region are included in the 2004 Dutch REF equation to capture health status differences not already captured by the PCGs and DCGs. For example, given age, gender, PCG and DCG classification, it is expected that disabled people have to cope with worse health conditions than those who are self-employed. As a consequence of the inclusion of insurance eligibility and region as REF adjusters in the REF equation, the risk-adjusted equalization payments will compensate for the full discrepancy in observed expenditures between disabled and self-employed individuals, which is justified only if these differences cannot be attributed to any N-type risk factor. The development of our normative approach facilitates a test of this yet untested assumption. Furthermore, this normative approach can be applied in order to validate the political decision to include the employed and self-employed individuals as separate subgroups in the REF equation since 2004. Given the choice about the S-type risk factors of the Dutch sponsor, age and gender are included in both the REF equation (2a) and the normative equation (1a). 
Table 1 shows that the implementation of the S-type risk factor health status differs between these equations in this study, however. The motivation for this difference is that the normative equation should be restricted to direct measures of health status (or, more generally: direct measures of S-type and N-type risk factors) in order to avoid, by construction, the undesired compensation for N-type expenditure variation that might occur with insurance eligibility or region. Other direct measures of health status are the SF-36 questionnaire, a count of seven specific OECD (auditive, visual and mobility) limitations, and a Chronic Conditions Index (CCI) based on 20 self-reported long-term diseases and conditions (see Appendix B), being self-reports of perceived health status, functional health status and chronic conditions, respectively (e.g. Kautter and Pope, 2005; Stam et al., 2010). (Multiple) PCGs and (multiple) DCGs are added to the normative equation, as medical care expenditures are not necessarily

P.J.A. Stam et al. / Journal of Health Economics 29 (2010) 426–437

larger for lower scores on the above-mentioned health status indicators (Newhouse et al., 1989). An insured individual can belong to multiple PCGs and/or multiple DCGs in order to improve the prediction of expenditure variation resulting from co-morbidities. This implementation differs from that in REF equation (2a), where an insured individual belongs to a single PCG and/or a single DCG. Due to sample size limitations, we did not estimate the coefficients of the self-reported health measures separately for each included PCG and DCG, which would capture expenditure variation caused by intensity-of-treatment effects (Hornbrook and Goodman, 1996). Hospital output price, distance to the GP and distance to the nearest hospital are also included in the estimation of the normative equation (3a) as measures of the N-type risk factors. This measurement set of the N-type risk factors is identical to that in REF equation (3a).¹⁰

4. Results

4.1. Assessing the performance of the REF equation

REF predicted expenditures follow from Eq. (2b) using the set of REF adjusters X listed in Table 1. The estimated coefficients are shown in the third-to-last column of Table 2. The corresponding R² equals 17.9%, which is in line with results on the Dutch REF equation reported elsewhere (e.g. Van de Ven et al., 2004). An F-test of the estimated coefficients being zero (χ²(3) = 6.31, p = 0.097) indicated that hospital output prices, distance to the GP and distance to the hospital, included as measures of the N-type risk factors using Eq. (3b), do not contribute significantly to the prediction of expenditure variation. Furthermore, a generalized Hausman specification test (χ²(54) = 2.89, p = 1.000) showed that their inclusion does not lead to significant deviations from the estimated coefficients in REF equation (2b).¹¹ Therefore, the choice of the Dutch sponsor to adhere to the specification of REF equation (2b) cannot be refuted, given this measurement set of the N-type risk factors.
Nonetheless, for illustrative purposes, Tables 3 and 4 also present the results based on Eq. (3b), under the assumption that the sponsor manages to make the N-type adjusters available for every insured individual in the total population. The last column of Table 2 contains the estimated coefficients from normative equation (1b); the second-to-last column lists the estimated coefficients when the PCGs and DCGs are omitted from this equation. The predictive power of the PCGs and DCGs is illustrated by an increase of R² from 7.3% to 19.6%.¹² Furthermore, most of the estimated coefficients corresponding to the SF-36 health status scales, the OECD limitations and, in particular, the CCI are reduced after inclusion of the PCGs and DCGs. A generalized Hausman specification test (χ²(54) = 8.37, p = 1.000) showed, however, that these reductions (and the changes in the other coefficients) are not significantly different from zero.¹³

10 The REF adjuster region is not included in the normative equation because it includes (to an important extent) N-type information. Note that the regions are not merely zip-codes, contrary to what is often the case in the literature (see Appendix A).
11 The p-value of 1.000 can be explained by the very low value of the test statistic, which is based on two sets of estimated coefficients that appear to be very close to each other indeed (results available upon request).
12 Note that, in this study, maximization of the conventional R² is not an inherent goal. The REF coefficients must be determined such that REF predicted expenditures are as close as possible to normative expenditures (instead of observed expenditures).
13 The estimated MH coefficient does not have the expected sign, whether the PCGs and DCGs are included or not, but remember that the presented coefficients reflect partial effects on health care expenditures such that part of the expected effect associated with mental health may already have been captured by other variables

An F-test of zero coefficients for hospital output prices, distance to the GP and distance to the hospital (χ²(3) = 8.48, p = 0.037) indicated that these N-type adjusters add to the prediction of expenditure variation. However, a generalized Hausman specification test (χ²(54) = 3.94, p = 1.000) shows that their inclusion does not lead to a significant change in the coefficients of the S-type adjusters in the normative equation. This means that, given the current set of N-type adjusters, it cannot be shown that the coefficients of the S-type adjusters suffer from omitted-variables bias. Nonetheless, for illustrative purposes, we adhere to the specification of the normative equation that includes this set of N-type adjusters.

For each of the eight scales, the SF-36 scores are heavily skewed to the right (results not presented here). Therefore, transformations of these scales to dummy variables were also tested. For each scale, dummy variables were created for the first, second and third quartile of the continuous scale values. Only 6 out of 24 estimated coefficients were significantly different from zero when included in the normative equation (3a) (two-sided t-test, p ≤ 0.05). Only 4 out of 24 coefficients were significant in an alternative variant with interactions between these dummy variables and their corresponding continuous scale values. Furthermore, for both transformations, an F-test of equality between the coefficients of the three quartile variables could not be rejected for 4 out of the 8 scales (p ≤ 0.05), and explained variance was smaller than the 19.6% reported in Table 2. Therefore, the untransformed SF-36 health status scales were preferred as S-type adjusters in the normative equation.

Table 3 shows that the performance of the 2004 Dutch REF equation equals 90.2% in terms of performance measure 2.
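The two performance measures reported in Table 3 compare REF predicted expenditures with normative expenditures. As a rough sketch of how such measures might be computed, the code below assumes an R²-type quadratic measure and an analogous measure based on absolute deviations. Both functional forms, the function name `performance_measures` and the toy figures are assumptions made for illustration; the paper's own definitions are given with its equations and may differ.

```python
def performance_measures(y_ref, y_norm):
    """Return (measure_1, measure_2) of REF predictions against normative expenditures.

    Assumed, illustrative forms (not taken from the paper's equations):
      measure_1 = 1 - sum|ref - norm|    / sum|norm - mean(norm)|
      measure_2 = 1 - sum(ref - norm)**2 / sum(norm - mean(norm))**2
    """
    if len(y_ref) != len(y_norm) or len(y_norm) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_norm = sum(y_norm) / len(y_norm)
    abs_gap = sum(abs(r - n) for r, n in zip(y_ref, y_norm))
    abs_total = sum(abs(n - mean_norm) for n in y_norm)
    sq_gap = sum((r - n) ** 2 for r, n in zip(y_ref, y_norm))
    sq_total = sum((n - mean_norm) ** 2 for n in y_norm)
    return 1 - abs_gap / abs_total, 1 - sq_gap / sq_total


# Hypothetical normative expenditures (euros) with one incidental high-cost case.
y_norm = [500.0, 800.0, 1200.0, 9500.0]
mean_only = [sum(y_norm) / len(y_norm)] * len(y_norm)  # constant-only model
misses_peak = [500.0, 800.0, 1200.0, 3000.0]           # routine costs right, peak missed

print(performance_measures(mean_only, y_norm))
print(performance_measures(misses_peak, y_norm))
```

Under these assumed forms a constant-only model scores zero on both measures, in line with the first row of each panel in Table 3. In the second call the single missed high-cost case leaves the absolute measure at 0.5 but pulls the quadratic measure down to about 0.25, which illustrates why incidental deviations weigh more heavily in a quadratic measure.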
Of course, this result only holds under the assumption that normative expenditures as derived in this study adequately reflect acceptable costs. Table 3 also shows the performance of alternative model specifications, in which subsets of the 2004 Dutch REF adjusters are included in Eq. (1a). Performance measure 2 equals 18.6% for the demographic model, and increases to 39.0% after inclusion of the PCGs or even to 79.9% if the DCGs are added to the demographic model instead. If performance is determined in terms of performance measure 1, then the improvement after adding PCGs to the demographic model is close to the improvement after adding DCGs instead. The contribution of the DCGs is thus larger than that of the PCGs if measured in terms of measure 2, but about equal if measured in terms of measure 1. This may be explained by the fact that DCGs better capture the costs of hospitalizations, which have a more incidental character than the chronic diseases described by the PCGs. Incidental deviations from normative expenditures are weighted quadratically when calculating measure 2 and non-quadratically when calculating measure 1.

4.2. Improving the performance of the REF equation

Under the assumption that the N-type adjusters can be made available for every insured individual in the population, the REF coefficients can be adjusted according to Eq. (3b). However, Table 3 shows that the performance of the 2004 Dutch REF equation using these adjusted coefficients barely differs from the situation when applying the unadjusted coefficients, at least not given the set of

included in the regression. This partial effect may be a reflection of a substitution effect that results from care provided inside mental care institutions. This type of care was not part of the sickness fund benefits package. In this way, a poorer mental health status could lead to reduced sickness fund expenditures. Moreover, if the PCS and MCS scores were included instead of the eight SF-36 subscales, the estimated coefficients with respect to PCS and MCS both have the appropriate negative sign.


Table 2. Weighted sizes and estimated regression coefficients for subgroups included in the REF equation (2b) and two versions of the normative equation (1b).

| Explanatory variables | Weighted size of subgroup (a) | REF equation (2b), (single) PCGs and DCGs included | Normative equation (1b), (multiple) PCGs and DCGs excluded | Normative equation (1b), (multiple) PCGs and DCGs included |
|---|---|---|---|---|
| M 15–24 (reference category) | 4.2% | – | 0 | 0 |
| M 25–34 | 7.1% | −209 | −147 | −196 |
| M 35–44 | 6.9% | −316 | −429 | −459 |
| M 45–54 | 6.3% | 464 | 445 | 368 |
| M 55–64 | 5.7% | 380 | 483 | 243 |
| M 65–74 | 5.0% | 1493* | 1647* | 1102* |
| M 75–84 | 2.3% | 2796* | 2388* | 2065* |
| M ≥ 85 | 0.3% | 1598 | 800 | 695 |
| F 15–24 | 6.2% | −109 | −167 | −200 |
| F 25–34 | 11.3% | 345 | 221 | 277 |
| F 35–44 | 12.8% | 0 | −171 | −122 |
| F 45–54 | 11.3% | 194 | −150 | −136 |
| F 55–64 | 8.9% | 274 | −33 | −49 |
| F 65–74 | 7.2% | 1003* | 483 | 382 |
| F 75–84 | 4.0% | 2289* | 1142* | 1172* |
| F ≥ 85 | 0.7% | 1662* | −242 | 70 |
| No PCG (reference category) | 91.2% | – | – | 0 |
| Asthma/COPD | 3.4% | 1883* | – | 1537* |
| Epilepsy | 0.5% | 1803* | – | 1393* |
| Crohn/colitis ulcerosa | 0.2% | 1098 | – | 181 |
| Cardiac disease | 2.8% | 2001* | – | 1109* |
| Rheumatism | 0.3% | 3848* | – | 2865* |
| Parkinson | 0.1% | 3199* | – | 2048 |
| Diabetes (Type I) | 1.2% | 3366* | – | 2343* |
| Transplantation | 0.1% | 7791* | – | 6587* |
| Cystic fibrosis | 0.0% | 3823 | – | 3105 |
| Neuromuscular disorder | 0.1% | 8030* | – | 6608* |
| HIV/AIDS | 0.1% | 11895* | – | 11773* |
| Renal disease/ESRD | 0.0% | 20748* | – | 19468* |
| No DCG (reference category) | 97.2% | – | – | 0 |
| DCG01 | 0.4% | 1356* | – | 651 |
| DCG02 | 0.5% | 6319* | – | 4572* |
| DCG03 | 0.4% | 3565* | – | 3442* |
| DCG04 | 0.4% | 5591* | – | 4656* |
| DCG05 | 0.3% | 4262* | – | 2313* |
| DCG06 | 0.1% | 7820* | – | 6243* |
| DCG07 | 0.2% | 6038* | – | 3753* |
| DCG08 | 0.2% | 8869* | – | 6876* |
| DCG09 | 0.0% | 7983* | – | 5524* |
| DCG10 | 0.1% | 18152* | – | 15037* |
| DCG11 | 0.1% | 12626* | – | 7217* |
| DCG12 | 0.1% | 9050* | – | 8711* |
| DCG13 | 0.0% | 77982* | – | 74847* |
| Disabled | 9.0% | 1437* | – | – |
| Employed (reference category) | 59.5% | – | – | – |
| Social welfare | 4.1% | 211 | – | – |
| Unemployed | 4.2% | 214 | – | – |
| Retired | 20.5% | 341 | – | – |
| Self-employed | 2.8% | −197 | – | – |
| ZIP-code cluster 1 | 7.2% | 457* | – | – |
| ZIP-code cluster 2 | 20.6% | 262 | – | – |
| ZIP-code cluster 3 | 9.5% | 138 | – | – |
| ZIP-code cluster 4 | 9.2% | 239 | – | – |
| ZIP-code cluster 5 | 14.6% | 37 | – | – |
| ZIP-code cluster 6 | 9.9% | −121 | – | – |
| ZIP-code cluster 7 | 16.7% | −31 | – | – |
| ZIP-code cluster 8 | 2.8% | −164 | – | – |
| ZIP-code cluster 9 | 3.3% | 29 | – | – |
| ZIP-code cluster 10 (reference category) | 6.1% | – | – | – |
| PF (physical functioning) (b) | 0.80 | – | −2121* | −1659* |
| RP (role-physical) (b) | 0.72 | – | −370* | −162 |
| BP (bodily pain) (b) | 0.73 | – | 502* | −97 |
| GH (general health) (b) | 0.67 | – | −1726* | −756* |
| VT (vitality) (b) | 0.64 | – | 195 | 431 |
| SF (social functioning) (b) | 0.81 | – | −763* | −593* |
| RE (role-emotional) (b) | 0.79 | – | −53 | −29 |
| MH (mental health) (b) | 0.74 | – | 1698* | 971* |
| One self-reported OECD limitation | 10.3% | – | 347* | 376* |
| Two self-reported OECD limitations | 4.9% | – | 587* | 619* |
| Three or more self-reported OECD limitations | 5.2% | – | 1182* | 881* |
| Chronic Conditions Index (weighted) (c) | 0.04 | – | 8000* | 4441* |
| Hospital output prices (in 100 euro) | 5.94 | – | 76 | 43 |
| Distance to the general practitioner (in km) | 0.16 | – | −69 | −71 |
| Distance to the hospital (in km) | 4.38 | – | −2 | −8 |
| Intercept | 1 | 623* | 2696* | 2291* |
| R² | | 17.9% | 7.3% | 19.6% |

Note: See Van Vliet and Prinsze (2003) for a description of the DCG classification used in this study. In Eq. (2b) an insured can belong to one PCG and/or one DCG only; in the normative Eq. (1b) an insured can belong to multiple PCGs and/or multiple DCGs.
a The sizes of the rank-ordered (single) PCGs and DCGs are presented here; those of the (multiple) PCGs and DCGs are not reported in this table. Weighted means are presented instead of subgroup sizes for the continuous eight SF-36 subscales, the self-reported chronic conditions, hospital output prices, distance to the GP and distance to the hospital.
b The ranges of the eight SF-36 scales of [0,100] are rescaled to [0,1] so that the size of the estimated coefficients is comparable to that of the estimated coefficients corresponding to the regression dummy variables. People with higher scale scores are healthier.
c See Appendix B. The scale is 0–1.
* The estimated coefficient is statistically significantly different from zero (two-sided t-test, p ≤ 0.05).

Table 3. The performance of the 2004 Dutch REF equation (2b) ("Demographic + PCGs + DCGs") and three adjusted REF equations (N = 18,617; 1 = €1).a

| REF adjusters included | REF equation (2b) | Adjusted REF equation (3b) | Adjusted REF equation (5b) | Adjusted REF equation (6b) |
|---|---|---|---|---|
| Performance measure 1 | | | | |
| No REF adjusters (only a constant included) | 0.0% | 0.0% | 0.0% | 0.0% |
| Demographic | 28.6% | 28.8% | 29.2% | 29.2% |
| Demographic + PCGs | 43.0% | 43.1% | 43.9% | 43.9% |
| Demographic + DCGs | 44.3% | 44.4% | 45.0% | 45.0% |
| Demographic + PCGs + DCGs | 57.0% | 57.1% | 58.4% | 58.4% |
| Performance measure 2 | | | | |
| No risk adjusters (only a constant included) | 0.0% | 0.0% | 0.0% | 0.1% |
| Demographic | 18.6% | 18.7% | 19.1% | 19.1% |
| Demographic + PCGs | 39.0% | 39.1% | 39.6% | 39.6% |
| Demographic + DCGs | 79.9% | 79.9% | 80.6% | 80.6% |
| Demographic + PCGs + DCGs | 90.2% | 90.3% | 91.0% | 91.0% |

Note: PCG = (Single) Pharmacy-based Cost Group and DCG = (Single) Diagnosis-based Cost Group.
a The measures are evaluated at the unadjusted REF coefficients in the column for Eq. (2b), and at the REF coefficients adjusted following Eqs. (3b), (5b) and (6b), respectively, in the other columns.

N-type adjusters used in this empirical example. Furthermore, performance measures 1 and 2 improve slightly if the adjusted coefficients from Eq. (5b) or (6b) are applied instead. Again, using the N-type adjusters in Eq. (6b) does not make a difference compared to using Eq. (5b).

Table 4 shows the performance of the REF equation for the subgroups defined by the REF adjusters insurance eligibility and region. Equality between REF predicted and normative expenditures holds for the age and gender subgroups by construction, as these 0/1 dummy variables are included in both the REF equation (2b) and the normative equation (1b).¹⁴ However, REF predicted and normative expenditures differ significantly from each other for more than one-third of the tabulated subgroups defined by the REF adjusters insurance eligibility and region (p ≤ 0.05). For example, REF predicted expenditures are above normative expenditures for disabled individuals. This result implies that, conditional on the specific composition of the subgroup of disabled individuals in the study sample, risk-adjusted equalization payments should

14 In case of a continuous scale, this equality does not hold at the subgroup level but only at the level of the total population.

be based on €2791 instead of €3204 in order to satisfy the policy goals of the Dutch government. Table 4 also reveals that, although there is no statistically significant difference between REF predicted expenditures and normative expenditures for the subgroup of employed people (p ≤ 0.05), REF predicted expenditures are €173 below normative expenditures for the self-employed. Therefore, the hypothesis of the Dutch government that the difference in observed expenditures between self-employed and employed individuals was caused by S-type risk factors must be rejected.¹⁵ Furthermore, people on social welfare appear to be undercompensated, whereas the discrepancies between REF predicted and normative expenditures for the other subgroups defined by insurance eligibility are not significantly different from zero (p ≤ 0.05).
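The subgroup gaps discussed above can be recomputed from the averages reported in Table 4. The sketch below does so for the insurance-eligibility subgroups; the dictionary layout and the function name `subgroup_gaps` are illustrative choices, and the significance flags in Table 4 cannot be reproduced from these averages alone.

```python
# Average REF predicted and normative expenditures (euros) per subgroup,
# copied from Table 4: (REF predicted, normative).
table4_eligibility = {
    "Disabled": (3204, 2791),
    "Employed": (965, 988),
    "Social welfare": (1689, 2009),
    "Unemployed": (1597, 1715),
    "Retired": (3573, 3576),
    "Self-employed": (839, 1012),
}


def subgroup_gaps(averages):
    """Return REF predicted minus normative expenditures for each subgroup."""
    return {group: ref - norm for group, (ref, norm) in averages.items()}


gaps = subgroup_gaps(table4_eligibility)
for group, gap in gaps.items():
    # A positive gap means REF predicted expenditures exceed normative expenditures.
    print(f"{group:15s} {gap:+d}")
```

Because the inputs are rounded averages, a recomputed gap can differ by one euro from the tabulated value (for example −320 here against −321 in Table 4 for social welfare).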

15 Note that this does not necessarily mean that the decision to define a separate subgroup for self-employed individuals was wrong. In particular, it would come at the expense of REF predicted expenditures for the employed individuals if it was decided otherwise. The best strategy to overcome the misalignment of REF predicted expenditures with normative expenditures for the self-employed is to adjust the coefficients following our normative approach.


Table 4. Average observed, REF predicted and normative expenditures for subgroups of survey respondents defined by the REF adjusters from the 2004 Dutch REF equation (2b) (N = 18,617; 1 = €1; average expenditures: €1753).a

| Subgroups of enrollees | Weighted size of subgroup (%) | Average observed expenditures | Average REF predicted expenditures | Average normative expenditures | REF predicted − normative, REF equation (2b) | REF predicted − normative, adjusted REF equation (3b) | REF predicted − normative, adjusted REF equations (5b), (6b) |
|---|---|---|---|---|---|---|---|
| Insurance eligibility | | | | | | | |
| Disabled | 9.0% | 3204 | 3204 | 2791 | 413* | 412* | 0 |
| Employed | 59.5% | 965 | 965 | 988 | −23 | −21 | 0 |
| Social welfare | 4.1% | 1689 | 1689 | 2009 | −321* | −360* | 0 |
| Unemployed | 4.2% | 1597 | 1597 | 1715 | −118 | −112 | 0 |
| Retired | 20.5% | 3573 | 3573 | 3576 | −3 | −4 | 0 |
| Self-employed | 2.8% | 839 | 839 | 1012 | −173* | −156* | 0 |
| Region | | | | | | | |
| ZIP-code cluster 1 | 7.2% | 2052 | 2052 | 1800 | 252* | 197* | 0 |
| ZIP-code cluster 2 | 20.6% | 1976 | 1976 | 1893 | 83 | 52 | 0 |
| ZIP-code cluster 3 | 9.5% | 1639 | 1639 | 1591 | 48 | 19 | 0 |
| ZIP-code cluster 4 | 9.2% | 1847 | 1847 | 1756 | 91 | 62 | 0 |
| ZIP-code cluster 5 | 14.6% | 1699 | 1699 | 1730 | −31 | −47 | 0 |
| ZIP-code cluster 6 | 9.9% | 1582 | 1582 | 1779 | −197* | −193* | 0 |
| ZIP-code cluster 7 | 16.7% | 1649 | 1649 | 1747 | −99* | −85 | 0 |
| ZIP-code cluster 8 | 2.8% | 1335 | 1335 | 1518 | −183* | −105 | 0 |
| ZIP-code cluster 9 | 3.3% | 1550 | 1550 | 1627 | −77 | −16 | 0 |
| ZIP-code cluster 10 | 6.1% | 1682 | 1682 | 1682 | 0 | 181* | 0 |
| Total | 100.0% | 1753 | 1753 | 1753 | 0 | 0 | 0 |

Note: The zeros in the last column indicate that REF predicted expenditures are equal to normative expenditures, which is a result by construction. A conclusion that these adjusted REF predicted expenditures are identical to acceptable costs for the tabulated subgroups is conditional on the assumption that normative expenditures as derived in this study reflect acceptable costs adequately.
a Observed expenditures for a subgroup are calculated as $(1/n_X)\sum_{i \in I_X} y_i$ and normative expenditures as $(1/n_X)\sum_{i \in I_X} y_i^{NORM}$, where $I_X$ constitutes the set of indices $i$ of the individuals who belong to subgroup $X$ as defined by the REF adjusters in REF equation (2b), and $n_X$ equals subgroup size.
* Difference between average observed expenditures and normative expenditures is statistically significant (two-sided t-test, p ≤ 0.05).

In the last two columns of Table 4, results are presented for the scenarios in which the adjusted coefficients are used to calculate REF predicted expenditures. The gap between REF predicted expenditures and normative expenditures hardly changes for subgroups defined by insurance eligibility if the N-type adjusters are taken into account during the estimation phase using Eq. (3b). This approach appears to remove only slightly the cost variation caused by N-type risk factors; for the subgroup of people on social welfare it even worsens the undercompensation. By construction, an adjustment of the REF coefficients using Eq. (5b) or (6b) removes the entire gap for each subgroup.

Table 4 also compares REF predicted to normative expenditures for subgroups defined by the 2004 regional REF adjuster. REF predicted expenditures for people living in the first cluster of ZIP-codes appear to be €252 above normative expenditures, whereas ZIP-code clusters 6, 7 and 8 contain individuals with REF predicted expenditures that lie between €99 and €197 below normative expenditures. Furthermore, the deviations from normative expenditures are reduced if REF equation (3b) is applied instead of (2b), except for those living in the fifth ZIP-code cluster and especially not for those living in the tenth cluster. Again, an adjustment of the REF coefficients using Eq. (5b) or (6b) removes the entire gap for each subgroup.

5. Conclusion

A new method is introduced to assess and improve the performance of risk equalization models in case the sponsor intends subsidization for S-type risk factors alone, and not for N-type risk factors. As an empirical illustration, the new method is applied to the 2004 Dutch REF model. The S-type adjusters in the normative equation are drawn from a broad array of health status indicators, which were made available by merging administrative data with data from a tailor-made health survey for almost 19,000 respondents.

At the population level of analysis, the performance of the 2004 Dutch REF model appears to be 90.2% in terms of performance measure 2. This means that 90.2% of the variation in normative expenditures is captured by the REF adjusters. Of course, this result only holds under the assumption that normative expenditures in this empirical illustration reflect acceptable costs adequately. If the REF coefficients are adjusted following the normative approach developed in this study, then performance at the population level of analysis increases to 91.0%, which seems to be a modest improvement. Furthermore, performance hardly improves at all from including measures of N-type risk factors in the REF equation, at least not given the set of N-type adjusters used in this empirical example. This result is independent of the choice between an adjustment of the coefficients according to the Schokkaert approach or according to the normative adjustment approach developed in this study. The conclusion must be that performance of the 2004 Dutch REF equation can be improved by using adjusted REF coefficients according to the normative approach proposed in this study, mainly because this procedure exploits the extra information of the more (precise) measures of the S-type risk factors available in the subsample.

At the subgroup level of analysis, substantial differences exist between REF predicted expenditures and normative expenditures for the subgroups defined by the REF adjusters. If the cross-subsidies are based on REF predicted expenditures, then the average risk-adjusted equalization payment for disabled insured
people is higher than the sponsor intends, whereas for those on social welfare and being self-employed it is lower than intended. Furthermore, cross-subsidies among regions appear also to be misaligned to some extent. Any such difference is an estimate of the marginal effect on observed expenditures for which the sponsor does not desire cross-subsidization. Thus, at the subgroup level of analysis, there is ample room for improvement of the REF model by application of the new method proposed in this study. Moreover, if the REF coefficients are adjusted following our approach and applied instead of the unadjusted REF coefficients, then the risk-adjusted equalization payments to the subgroups defined by the REF adjusters insurance eligibility and region will even be exactly aligned to normative expenditures as defined for the study sample. Therefore, to the extent that normative expenditures reflect acceptable costs adequately, unintended subsidies among the subgroups defined by the REF adjusters can be avoided entirely if the sponsor uses these adjusted REF coefficients instead of the unadjusted ones in the REF equation.
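Why adjusted coefficients align payments with normative expenditures by construction can be seen in a deliberately simplified setting. The sketch below assumes a single set of mutually exclusive 0/1 subgroup dummies, in which case a least-squares regression on those dummies reduces to subgroup means. The data, the names and this single-adjuster simplification are assumptions for illustration; the code is not the paper's Eq. (5b).

```python
from collections import defaultdict


def fit_group_means(groups, y):
    """Least-squares fit of y on one set of exclusive group dummies: the group means."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, value in zip(groups, y):
        totals[g] += value
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical subsample: subgroup, observed and normative expenditures (euros).
groups = ["disabled", "disabled", "employed", "employed", "employed"]
y_obs = [3500.0, 2900.0, 900.0, 1000.0, 1010.0]
y_norm = [3000.0, 2600.0, 950.0, 1000.0, 1020.0]

unadjusted = fit_group_means(groups, y_obs)   # coefficients fitted on observed expenditures
adjusted = fit_group_means(groups, y_norm)    # coefficients fitted on normative expenditures
norm_avg = fit_group_means(groups, y_norm)    # subgroup averages of normative expenditures

gap_unadjusted = {g: unadjusted[g] - norm_avg[g] for g in norm_avg}
gap_adjusted = {g: adjusted[g] - norm_avg[g] for g in norm_avg}
print(gap_unadjusted)
print(gap_adjusted)
```

In this toy subsample the unadjusted payments overshoot normative expenditures for one subgroup and undershoot for the other, while the adjusted gaps are zero for every subgroup. With several sets of dummies the fitted coefficients are no longer simple means, so this sketch only conveys the intuition.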

6. Discussion

The method developed in this study applies to any retrospective and prospective system of risk-adjusted equalization payments that is implemented in a competitive market for individual health insurance with mandatory contributions to a REF for a specified benefits package. The performance of any risk equalization model can be assessed against a benchmark model that is developed for a limited sample of insured people. Furthermore, performance can be improved by an adjustment of the REF coefficients that are used to calculate the risk-adjusted equalization payments, following the normative approach proposed in this study. Moreover, it is recommended that the sponsor always uses these adjusted REF coefficients instead of the unadjusted REF coefficients in the REF equation. A noticeable difference between this approach and the Schokkaert approach is that the REF coefficients are adjusted to better predict both S-type and N-type expenditure variation following the normative approach, whereas the Schokkaert approach adjusts for N-type expenditure variation alone.

The method developed in this study is independent of the context of the health insurance markets. It may be applied if insurers are free to and capable of setting their premiums to an individual's risk, but also if premiums are regulated (for example, community-rating). This contrasts with the conventional method for measuring performance, which cannot be applied if insurers are allowed to risk-rate their premiums. The conventional method is to determine the extent to which incentives for risk selection are removed by the risk equalization model under study. This is done by tabulation of regulation- or transaction-costs-induced predictable profits and losses, with a focus on those for subgroups of high-risk insured people. However, if insurers are free to and capable of risk-rating their premiums, then predictable profits and losses will not exist and the conventional method does not apply.
Under premium rate regulation, predictable profits and losses will be zero for subgroups defined by the REF adjusters. However, if the coefficients are adjusted following our normative approach in order to improve performance, this will no longer be the case to the extent that the unadjusted and adjusted coefficients differ. In order to avoid incentives for selection and their adverse effects with respect to the subgroups defined by the REF adjusters, application of these adjusted coefficients should therefore go hand in hand with the ability of insurers to risk-rate
their premiums across the subgroups defined by the REF adjusters. Thus, premium rate regulation with respect to the REF adjusters should be avoided. The procedure that is used to test the REF equation that a sponsor applies can also be used to test alternative specifications. The risk-adjusted equalization payments can possibly be made more effective not only by adding new risk adjusters to the REF equation (Stam, 2007), but also by the construction of an ex-post risk sharing arrangement (Van Barneveld et al., 2001), or choosing an alternative functional specification of the REF equation, for example. Notice that if the REF equation is supplemented by an ex-post risk sharing arrangement between insurers and the sponsor, it comes at the expense of insurers’ incentives for efficiency in production. Therefore, in case of ex-post risk-sharing, there is a trade-off between performance of the risk-adjusted equalization payments and efficiency. Within a competitive health insurance market without risk equalization, premium rebates under the option of voluntary deductibles will reflect expenditure variation caused by S-type risk factors as a consequence of adverse selection (Van Kleef et al., 2006). This will still be the case if subsidies are based on imperfect REF adjusters. The extent to which this is the case can be determined following the new method developed in this study. Furthermore, the level of the voluntary deductible chosen by the insured people may be included in the REF equation as a proxy for health status. Although implementation of this proxy as a new REF adjuster may also induce undesired subsidies for N-type expenditure variation such as caused by moral hazard, these effects can be explicitly weighed against each other by application of the theoretical framework developed in this study. An adjustment of the corresponding REF coefficient can be applied in order to avoid compensation for these N-type effects. 
In general, a sponsor may choose any type of model, apply any estimation technique and use any available dataset in order to define normative expenditures. For example, a Generalized Linear Model (GLM) or even a non-linear model may be specified, the coefficients of which can be estimated from a panel dataset by maximum likelihood. However, for ease of exposition, we assumed in our theoretical framework that the sponsor confines herself to a linear specification, the OLS estimation technique and a single cross-section, as is commonly used in the context of risk equalization.

The new method for measuring and improving risk equalization models is illustrated by an empirical example, but the data have certain drawbacks. First, the data in this study are representative of only one Dutch insurer. Second, the S-type adjusters included in the normative equation (1a) in this study may not capture expenditure variation caused by S-type risk factors to the full extent. The range of health indicators may be expanded in future studies, depending upon availability. In this sense, the new method developed in this study produces an upper bound on the extent to which the REF equation induces the subsidies that the sponsor desires, if one expects that better S-type adjusters would induce more variation in normative expenditures. In that case, the values of performance measure 2 will probably turn out to be smaller than those presented in Table 3. In other words, given the implementation of normative expenditures in this study, the performance of the REF models as determined here will probably indicate a maximum for the extent to which the policy goals of the Dutch government are met. Third, endogeneity of the S-type adjusters may still be an issue. The set of N-type adjusters included in the empirical illustration here may still be too limited (think of lifestyle-related health effects, for example).


In sum, the new method for measuring and improving the performance of risk equalization models as developed in this study can be applied by a sponsor in several ways, and it is relevant to all countries with competitive health insurance markets. This new method is recommended to assess and improve the subsidies of risk equalization models in these cases.

Acknowledgements

We would like to thank Agis Health Insurance for kindly providing data and human resources, their members for taking the time to fill out the Agis Health Survey 2001 and Xander Koolman for comments on earlier drafts. We are grateful to two anonymous referees for careful and thoughtful comments and suggestions which helped to improve the manuscript.

Appendix A. Construction of the REF adjuster region and the three N-type adjusters

A.1. The REF adjuster region

Since 2002, the Dutch regional REF adjuster is no longer the result of a classification of the about 4500 four-digit ZIP-codes into five regions according to the degree of urbanization, but into ten clusters of ZIP-codes of not necessarily adjacent geographic areas instead. The construction of these ten clusters starts with calculating the individual level differences between actual expenditures and REF predicted expenditures, where the set of REF adjusters consists of age, gender, membership eligibility, PCGs and DCGs (i.e. exclusive of any regional variable). The second step is to aggregate these individual level differences to the four-digit ZIP-code level and subsequently regress these on ZIP-code level data on health status (Standardized Mortality Rate (SMR), for example), consumer preferences (degree of urbanization, for example) and health care supply (number of hospital beds, for example) for which the sponsor assumes that these are related to health status or supply-side characteristics which in the short term can (almost) not be influenced by insurers' policies.
The third and last step is to cluster the predictions that result from this latter regression into ten groups of four-digit ZIP-codes by application of the Ward (1963) clustering method.¹⁶

A.2. The three N-type adjusters

Hospital output price 2002 is defined as a weighted average of hospital fees for a one day hospital stay per ZIP-code, where the weights are the number of 2001 outpatient contacts that Agis enrollees living in that ZIP-code had with hospitals.¹⁷ Hospital output prices differ substantially between hospitals and may be seen as N-type adjusters that induce cost variation for which insurers can be held responsible. The distance to the nearest health care facility, i.e. hospital location or GP office, may be an important determinant of individual health care use in that it measures health care accessibility and time price of health care use for an insured individual. Distances to the nearest hospital and GP are measured between the centroids of four-digit ZIP-codes in 2001. They are equal to zero if the health care facility's ZIP-code equals that of the insured individual.
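The hospital output price for a ZIP-code is a contact-weighted average of hospital fees, which can be sketched as follows. The hospital names, fees and contact counts are invented for illustration, and the function name `weighted_average_fee` is a hypothetical helper rather than anything defined in the study.

```python
def weighted_average_fee(fees, contacts):
    """Contact-weighted average fee for one ZIP-code.

    fees:     hospital -> fee for a one-day hospital stay (euros)
    contacts: hospital -> number of outpatient contacts by enrollees in the ZIP-code
    """
    total_contacts = sum(contacts.values())
    if total_contacts == 0:
        raise ValueError("no outpatient contacts recorded for this ZIP-code")
    weighted_sum = sum(fees[hospital] * n for hospital, n in contacts.items())
    return weighted_sum / total_contacts


# Hypothetical figures for a single ZIP-code.
fees = {"hospital_A": 550.0, "hospital_B": 650.0}
contacts_in_zip = {"hospital_A": 30, "hospital_B": 10}

price = weighted_average_fee(fees, contacts_in_zip)
print(price)  # (550 * 30 + 650 * 10) / 40 = 575.0
```

A ZIP-code whose enrollees mostly visit the cheaper hospital thus receives an output price closer to that hospital's fee.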

16 The clustering with respect to the 2004 Dutch risk equalization model is based on national claims data prior to 2002.
17 These weights are based on the outpatient contacts of the 1.6 million Agis enrollees in 2001.
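The three-step construction of the regional clusters described in Appendix A.1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the sponsor's actual procedure: the ZIP-code-level regression uses a single regressor (a hypothetical SMR value) where the sponsor uses several, the toy data are grouped into two clusters rather than ten, and the function names and all figures are hypothetical. The clustering step applies Ward's minimum-variance criterion to the one-dimensional predictions.

```python
from collections import defaultdict


def zip_mean_residuals(records):
    """Steps 1 and 2a: actual minus REF-predicted expenditures, averaged per ZIP-code.

    records: iterable of (zip_code, actual, predicted) tuples.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for zip_code, actual, predicted in records:
        sums[zip_code] += actual - predicted
        counts[zip_code] += 1
    return {z: sums[z] / counts[z] for z in sums}


def ols_predictions(y, x):
    """Step 2b: fitted values from a one-regressor OLS of y on x (dicts keyed by ZIP-code)."""
    zips = sorted(y)
    n = len(zips)
    mean_x = sum(x[z] for z in zips) / n
    mean_y = sum(y[z] for z in zips) / n
    sxx = sum((x[z] - mean_x) ** 2 for z in zips)
    slope = sum((x[z] - mean_x) * (y[z] - mean_y) for z in zips) / sxx
    return {z: mean_y + slope * (x[z] - mean_x) for z in zips}


def ward_clusters(values, k):
    """Step 3: agglomerative clustering of 1-D values into k groups (Ward's criterion).

    At each iteration the two clusters whose merger gives the smallest increase
    in the within-cluster sum of squares are merged.
    """
    clusters = [([key], 1, float(v)) for key, v in values.items()]  # (members, size, mean)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                _, ni, mi = clusters[i]
                _, nj, mj = clusters[j]
                cost = ni * nj / (ni + nj) * (mi - mj) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (a, na, ma), (b, nb, mb) = clusters[i], clusters[j]
        merged = (a + b, na + nb, (na * ma + nb * mb) / (na + nb))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _, _ in clusters]


# Hypothetical individual records: (ZIP-code, actual, REF-predicted expenditures).
records = [
    ("1011", 1400.0, 1200.0), ("1011", 1000.0, 1000.0),
    ("1012", 1120.0, 1000.0),
    ("3011", 900.0, 1000.0),
    ("3012", 920.0, 1000.0),
]
smr = {"1011": 1.20, "1012": 1.25, "3011": 0.80, "3012": 0.85}  # hypothetical SMR values

residuals = zip_mean_residuals(records)
predictions = ols_predictions(residuals, smr)
clusters = ward_clusters(predictions, k=2)
```

Because the clusters are formed on regression predictions rather than on geography, ZIP-codes that end up in the same group need not be adjacent, which is consistent with the description of the ten clusters in A.1.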

It should be noted that in 2004, the Dutch sponsor used distance to the GP and distance to the hospital as explanatory variables in the second step of the three-step ZIP-code clustering procedure described above. The assumption was that any simultaneity between aggregate expenditures and distance at the ZIP-code level of analysis would reflect the common influence of average regional health status differences not yet captured by the REF adjusters. Simultaneity bias may be assumed negligible at an individual level of analysis, however, because it is unrealistic to expect that individual health status will be a determinant of distance to health care facilities in a ZIP-code region. Therefore, given the individual level of analysis in our study, the distance variables can be interpreted as N-type adjusters that avoid estimation bias in the S-type coefficients caused by variation, across ZIP-codes, in individual expenditures given individual health status.

Appendix B. Description of the health survey data

Following the Dillman (1978) mailing procedure guidelines, the postal health survey was sent to a stratified sample of 50,022 non-institutionalized sickness fund members between 16 and 90 years of age. Gross response was 46.3% (23,163 respondents). Nonresponse analysis showed that standardization for selective nonresponse is not needed for the purpose of our study (Stam, 2007). A net response of 22,029 records remained after validity and completeness checks, following the CAHPS 3.0 Adult Commercial Questionnaire (CAHPS, 2002) guidelines.

The eight SF-36 Likert (1932) scales are general health measures: physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role-emotional (RE) and mental health (MH) (Ware and Sherbourne, 1992; McHorney et al., 1993; McHorney et al., 1994; Ware et al., 2000). They are derived for 18,617 respondents and appear to be both reliable and valid (Stam, 2007).
Sample size is relatively large compared with other studies (Stam, 2007). The eight scales are used instead of the SF-36 Physical (PCS) and Mental (MCS) Component Summary scales to capture the maximum extent of systematic variation present in this study sample.

Functional status is measured by responses to seven items from the OECD Questionnaire, which concern communication, visual and mobility problems (McWhinnie, 1981; Van Sonsbeek, 1988). Each OECD item asks how much effort it takes to perform a specific task. The functional status measure is a count of the number of questions answered with “with a lot of effort” or “I cannot do this”. The interpretation is that functional status worsens as this count approaches seven.

Finally, a weighted so-called Chronic Conditions Index (CCI) was developed that takes into account both the number and the importance of 20 self-reported long-term diseases and conditions (Van Vliet, 1992). The relative importance weight of a condition is based on the average 2001 expenditures of respondents reporting that condition. Some examples of the chronic conditions are: diabetes mellitus, stroke, heart conditions, cancer, hypertension, urinary incontinence, hernia, and osteoarthritis. A chronic condition is only taken into account if the respondent indicates that he/she still has complaints or is under treatment at the time of filling out the questionnaire.

References

Carr-Hill, R.A., Hardman, G., Martin, S., Peacock, S., Sheldon, T.A., Smith, P., 1994. A Formula for Distributing NHS Revenues based on Small Area Use of Hospital Beds. University of York, York.
Dillman, D.A., 1978. Mail and Telephone Surveys: The Total Design Method. John Wiley & Sons, New York.
Ellis, R.P., 1998. Creaming, skimping and dumping: provider competition on the intensive and extensive margins. Journal of Health Economics 17 (5), 537–556.

Frank, R., Glazer, J., McGuire, T., 2000. Measuring adverse selection in managed health care. Journal of Health Economics 19, 829–854.
Glazer, J., McGuire, T., 2000. Optimal risk adjustment of health insurance premiums: an application to managed care. American Economic Review 90 (4), 1055–1071.
Glazer, J., McGuire, T., 2002. Setting health plan premiums to ensure efficient quality in health care: minimum variance optimal risk adjustment. Journal of Public Economics 84 (2), 153–175.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46 (6), 1251–1271.
Hoogervorst, J.F., 2005. Besluit Zorgverzekering. Ministry of Health, Welfare and Sports, vol. 389. SDU Uitgevers, The Hague, pp. 23.
Hornbrook, M.C., Goodman, M.J., 1996. Chronic disease, functional health status, and demographics: a multi-dimensional approach to risk adjustment. Health Services Research 31 (1), 283–307.
Kautter, J., Pope, G.C., 2005. CMS frailty adjustment model. Health Care Financing Review 26 (2), 1–19.
Lamers, L.M., 1998. Risk-adjusted premium subsidies: developing a diagnostic cost groups classification for the Dutch situation. Health Policy 45, 15–32.
Lamers, L.M., 1999. Pharmacy Costs Groups: a risk-adjuster for capitation payments based on the use of prescribed drugs. Medical Care 37 (8), 824–830.
Lamers, L.M., Van Vliet, R.C.J.A., 2003. Health based risk adjustment: improving the pharmacy-based cost group model to reduce gaming possibilities. European Journal of Health Economics 4 (2), 107–114.
Likert, R., 1932. A technique for the measurement of attitudes. Archives of Psychology 140, 5–55.
McHorney, C.A., Ware, J.E., Lu, R.L., Sherbourne, D., 1994. The MOS 36-item Short Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Medical Care 32 (1), 40–66.
McHorney, C.A., Ware, J.E., Raczek, A.E., 1993. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Medical Care 31 (3), 247–263.
McWhinnie, J.R., 1981. Disability assessment in population surveys: results of the O.E.C.D. common development effort. Revue d’Epidemiologie et de Santé Publique 29, 413–419.
Murray, C.J., Frenk, J., 2000. A framework for assessing the performance of health systems. Bulletin of the World Health Organization 78 (6), 717–731.
Newhouse, J.P., Manning, W.G., Keeler, E.B., Sloss, E.M., 1989. Adjusting capitation rates using objective health measures and prior utilization. Health Care Financing Review 10 (3), 41–54.
Pope, G.C., Ellis, R.P., Ash, A.S., Liu, C.F., Ayanian, J.Z., Bates, D.W., Burstin, H., Iezzoni, L.I., Ingber, M.J., 2000. Principal inpatient diagnostic cost group model for Medicare risk adjustment. Health Care Financing Review 21 (3), 93–118.
Sappington, D.E.M., Lewis, T.R., 1999. Using subjective risk adjusting to prevent patient dumping in the health care industry. Journal of Economics and Management Strategy 8 (3), 351–382.


Schokkaert, E., Dhaene, G., Van de Voorde, C., 1998. Risk adjustment and the tradeoff between efficiency and risk selection: an application of the theory of fair compensation. Health Economics 7, 465–480.
Schokkaert, E., Van de Voorde, C., 2000. Risk adjustment and the fear of markets: the case of Belgium. Health Care Management Science 3, 121–130.
Schokkaert, E., Van de Voorde, C., 2004. Risk selection and the specification of the conventional risk adjustment formula. Journal of Health Economics 23, 1237–1259.
Stam, P.J.A., 2007. Testing the effectiveness of risk equalization models in health insurance. Ph.D. Dissertation. Erasmus University Rotterdam, Rotterdam. Available at: http://www.stamonline.nl/phdthesis.
Stam, P.J.A., Van Vliet, R.C.J.A., Van de Ven, W.P.M.M., 2010. Diagnostic, pharmacy-based and self-reported health measures in risk equalization models. Medical Care 48 (5).
Van Barneveld, E.M., Lamers, L.M., Van Vliet, R.C.J.A., Van de Ven, W.P.M.M., 2001. Risk sharing as a supplement to imperfect capitation: a tradeoff between selection and efficiency. Journal of Health Economics 20, 147–168.
Van de Ven, W.P.M.M., Ellis, R.P., 2000. Risk adjustment in competitive health plan markets. In: Culyer, A.J., Newhouse, J.P. (Eds.), Handbook of Health Economics, vol. 1. Elsevier Science BV, Amsterdam, pp. 755–845.
Van de Ven, W.P.M.M., Van Vliet, R.C.J.A., Schut, F.T., Van Barneveld, E.M., 2000. Access to coverage for high-risks in a competitive individual health insurance market: via premium rate restrictions or risk-adjusted premium subsidies? Journal of Health Economics 19, 311–339.
Van de Ven, W.P.M.M., Van Vliet, R.C.J.A., Lamers, L.M., 2004. Health-adjusted premium subsidies in the Netherlands. Health Affairs 23, 45–55.
Van Kleef, R.C., Van de Ven, W.P.M.M., Van Vliet, R.C.J.A., 2006. A voluntary deductible in social health insurance with risk equalization: community-rated or risk-rated premium rebate? The Journal of Risk and Insurance 73 (3), 529–550.
Van Sonsbeek, J.L.A., 1988. Gezondheidsenquêtes: methodische en inhoudelijke aspecten van de OESO-indicator betreffende langdurige beperkingen in het lichamelijk functioneren. CBS Maandbericht Gezondheidsstatistiek 7, 4–17.
Van Vliet, R.C.J.A., Prinsze, F.J., 2003. Eindrapportage: Onderhoud FKG’s en nader vervolgonderzoek naar DKG’s voor toepassing in het ZFW-verdeelmodel 2004. Erasmus University Rotterdam, Rotterdam.
Ward, J.H., 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58 (2), 236–244.
Ware, J.E., Sherbourne, C.D., 1992. The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Medical Care 30 (6), 473–483.
Ware, J.E., Snow, K.K., Kosinski, M., 2000. SF-36 Health Survey: Manual and Interpretation Guide. QualityMetric Incorporated, Lincoln RI.