Soc. Sci. Med. Vol. 34, No. 9, pp. 1057-1068, 1992
0277-9536/92 $5.00 + 0.00
Copyright © 1992 Pergamon Press Ltd
Printed in Great Britain. All rights reserved
INTERNATIONAL HEALTH SPENDING FORECASTS: CONCEPTS AND EVALUATION

THOMAS E. GETZEN¹ and JEAN-PIERRE POULLIER²

¹Health Care Finance, Temple University, Philadelphia, PA 19122, U.S.A. and ²Directorate for Social Affairs, Manpower and Education, Organization for Economic Co-operation and Development, 2 rue André Pascal, 75775 Paris Cedex 16, France

Abstract: Health care depends on the organizational and financial decisions which constituted each national system. Since those decisions were made at various times over the preceding years under different macroeconomic conditions, current expenditures are a distributed lag function of GDP growth and inflation rates. The accuracy of forecasts from such causal econometric models is compared to that of exponential smoothing, moving average, and ARIMA methods. Data for 19 OECD countries 1965-79 are used for calibration, and then ex ante forecasts are generated for 1980-87 so that actual forecast accuracy can be tested. The greatest reduction in mean absolute error was obtained with the econometric model estimated in aggregate across all 19 countries, although single-country models, exponential smoothing and international averaging were also effective. A combination of all four forecasts was more accurate than any one alone, reducing MAE by 25% relative to a constant growth projection.
Key words: forecasting, national health expenditures, international comparisons
Forecasts of national health expenditures (NHE) are required by the governments which pay for public health systems, the administrators who must plan for construction and labor force expansion, and the ancillary supply industries (medical equipment, pharmaceuticals) whose production and inventories must be adjusted to accommodate future sales. Although numerous forecasts have been made in different countries over the years, they have used different definitions and methods from year to year, and were frequently based on or adjusted by ‘professional judgement’, so that no consistent evaluation of accuracy can be made. Even in those few cases where a series could be constructed comparing forecast and actual values, there are not enough observations to make generalizations regarding relative merits.

This article reports the first systematic study of health expenditure forecast accuracy. Six methods are calibrated ex post over the 1965-79 period, and then used to make ex ante forecasts for the 1980-87 period. The mean absolute error (MAE) of each forecast is reported. The OECD health data files for 19 countries provide 280 observations for parameter calibration and 151 observations for testing accuracy.

Measurement of forecast accuracy focuses attention on what policy makers do, and do not, know about the workings of the health system. The impact of service reorganization, new technology, cost controls or physician restrictions can be determined only by comparison with an existing accurate forecast of what the trend in expenditures would have been in the absence of new initiatives.* Evaluation and organizational analysis therefore require forecasting tools almost as much as does planning.

*A common error occurs when cost controls are slapped on in response to a temporary surge in spending or a recession-induced fall in revenues, and the government then claims that such controls ‘worked’ when the rate of increase moderates, as it almost inevitably does under these conditions with or without policy intervention.

FORECAST METHODS
Forecasting methods can be ranked by the amount of theory and information required. The simplest, a null hypothesis of ‘no change’ since the last observation, is the ‘naive’ forecast universally accepted as the standard against which other methods can be measured. In this study, the naive forecast is that the percentage growth rate in real (inflation adjusted) health expenditures this year will be the same as last year.

The second method is to use more historical observations to discover trends and cycles among the data. While the analysis may be of varying sophistication (moving average, exponential smoothing, regression, ARIMA), all are limited to the information contained in past values of the series itself. Model selection is ‘data driven’ and has no a priori conceptual basis. Hence these univariate analyses are ‘atheoretical’ in that they do not determine causality, only patterns. All four of the above were considered to be variants of the second forecasting method. Exponential smoothing provided the best fit to the data and therefore those results are presented in detail, with the other forms being more briefly discussed.

Projecting each country’s growth rate using the average for all countries, the third method examined here, makes a limited, but important, theoretical assertion: that health systems are subject to some common forces and should move together. The next step is to discover what those forces are and build a structural model. Yet identifying the causes of national health spending will not aid in prediction unless (a) the causal variables can be predicted in advance or (b) there is a lag between cause and effect. Macroeconomic conditions drive health care spending, and form the basis for the causal econometric models developed for the fourth and fifth forecast methods.* The health system is conceptualized as an implicit contract between politicians, professionals and the public. Since the decisions which define the health care system were made at varying times in the past under differing macroeconomic conditions, current spending is a distributed lag function of income growth and inflation.

The sixth forecast method evaluated here is a combination of the econometric, smoothing and international average methods.

The applications of univariate forecast methods such as exponential smoothing or ARIMA modelling, while unfamiliar to some health economists, are well treated in standard texts such as Granger and Newbold [1] or Makridakis, Wheelwright and McGee [2]. The lagged adjustment model [3] offers more theoretical insight into long-run organizational change, and will therefore be described at some length before turning to the empirical results.

*Univariate ARIMA analysis is sometimes mistakenly called ‘econometric’ because it is somewhat complex and usually applied by those trained in econometrics, but it should not be. To qualify as ‘econometric’ a model must have some economics, some independent variable which can cause changes in the dependent variable, a condition which no univariate analysis can meet. Used correctly, the terms ‘causal’ and ‘econometric’ both require that there be at least two variables in the equation, not just past values of the same variable. The semantic distinction is worth maintaining since the logical distinction which it signifies is quite important.

†The lag function is: $\text{lag} = 0.20 \times \text{year}_{0} + 0.05 \times \text{year}_{-1} + 0.10 \times \text{year}_{-2} + 0.13 \times \text{year}_{-3} + 0.37 \times \text{year}_{-4} + 0.15 \times \text{year}_{-5}$

Fig. 1. Annual percentage change in real per capita income and health expenditures, U.S. 1966-87. Panel A: one-year income. Panel B: weighted seven-year average income.

A MACROECONOMIC MODEL OF NATIONAL HEALTH EXPENDITURE
Growth in GDP is the most important source of growth in national health expenditures [4-8]. The actual process of budget adjustment, however, is not well understood and possibly quite complex. Spending plans are based on organizational dynamics, accumulated surpluses and deficits, expectations of future growth in revenues, wages, supply costs, technology, etc. The planned level of spending is in turn displaced by unanticipated inflation, strikes, recession, epidemics, etc. To say that spending will adjust to GDP opens a number of questions. Adjust to nominal or real GDP growth? Is adjustment instantaneous or delayed? Are there differences between nations or between types of medical care in the temporal adjustment process? Does adjustment ever overshoot? Do nominal prices and real quantities adjust at the same rate?

The health sector tends to lag the rest of the economy, so that its growth in any particular year is a function of income and inflation over several previous years [9]. Since all these effects from different years are confounded together, it is not surprising that the contemporaneous association between GDP growth and health spending is essentially nil (Fig. 1A). A distributed lag model [10] allows for the inclusion of responses based on permanent income, rational expectations of future values, and slow or delayed adjustment, and the dominant long run income effects are clearly revealed (Fig. 1B).† Inflation adjustments have less complex temporal dynamics and must be made prior to examining real effects, so they are discussed before the more elaborate GDP model is presented.

Inflation effects
It has long been recognized that inflation is responsible for most nominal growth in health expenditures for all countries, but has little real effect in the long run as both costs and revenues rise by the same percentage. Dividing income and expenditures by the GDP deflator factors out inflation and allows for comparison of real amounts over time. Yet if the rate of inflation is temporarily different in the health sector than in the rest of the economy, this process may impart significant, albeit transitory, fluctuations. Consider the following hypothetical example. Budgets are made assuming that the number of nurses remains constant and that inflation will be 5% annually, with a wage contract renegotiated each December. 1996 goes according to plan, but in 1997 inflation is, unexpectedly, 12%.
                    1995     1996     1997     1998
Price index          100      105      117      122
Nominal spending   $1000    $1050    $1100    $1220
Real spending       1000     1000      940     1000
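The deflation arithmetic in this hypothetical example can be checked with a short sketch. The price index (100, 105, 117, 122) and nominal spending ($1000, $1050, $1100, $1220) for 1995-98 are taken from the example; the function name `real_spending` and the base-100 convention are assumptions made here for illustration.

```python
# Minimal sketch: deflate nominal spending by a price index (base year = 100).
# The figures are those of the hypothetical 1995-98 example in the text;
# the helper name and signature are illustrative assumptions.

def real_spending(nominal, price_index, base=100.0):
    """Return nominal amounts expressed in base-year prices."""
    return [n * base / p for n, p in zip(nominal, price_index)]

years = [1995, 1996, 1997, 1998]
price_index = [100, 105, 117, 122]
nominal = [1000, 1050, 1100, 1220]

real = real_spending(nominal, price_index)
for year, value in zip(years, real):
    print(year, round(value))
```

Only 1997, the year in which inflation outruns the budgeted 5%, shows real spending falling below 1000; the catch-up in 1998 restores it.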
The excess unanticipated inflation in 1997 causes a temporary decline in ‘real’ deflated health expenditures. The deficit is made up during the next year by a wage increase larger than the rate of inflation, so the ‘inflation effect’ is entirely transitory.

The U.S. Bureau of Labor Statistics health care wage index for 1973-87 rose on average f% faster than the GNP deflator, but with a lag. Sixty-two percent of the rise in inflation from one year to the next was not incorporated into health wages until the following year, and 11% was still missing after 2 years, indicating substantial contractual rigidity. Workers bear the brunt of the decline in real health spending to the extent that wages are slow to adjust when inflation accelerates. The number of workers and the quantity of supplies, equipment, food, etc. are also likely to decline as fixed expenditure budgets buy fewer resources than anticipated. Conversely, when the rate of inflation turns out to be lower than expected, wage increases negotiated in prior years provide workers with windfall gains, and prospective budgets allow excessive employment and equipment purchase to occur.

Some lag in spending appears to occur for most consumption expenditures. Deaton [11] estimates that 1/2 of all unanticipated inflation is involuntarily left unspent by consumers in the U.S. and U.K. economies, although with a much shorter quarterly lag. Health care prices appear to adjust more slowly than most other prices, yet the extent of the lag probably varies widely between countries due to differences in organization, accounting and financing methods. Conceivably, health prices could lead rather than lag in some countries if large hospital wage settlements initiated new inflation.
Difficulties in the timing of financial reports can create similar fluctuations in measured spending even if there are no real differences. If some reports are filed late, the total expenditures will be low during a period of rising inflation, and artificially high during deflation. Many organizations report expenses for a fiscal year ending in June. If national health expenditures are accumulated for a calendar year ending in December, a 6-month lag is built in, and excess inflation will not distort measured real spending.

Budgeting in physical units (manpower FTEs, beds constructed) fixes the quantity of real resources applied to medical care. However, the monetary amount must then fluctuate with the price level. This can be accommodated only within flexible financing systems, such as the U.S. period of full-cost reimbursement from 1965-80, or economies which routinely index all contracts to neutralize price changes, Brazil for example. The service contracts which are characteristic of most public and private health insurance are even more subject to nominal revision, since dollar expenditures must be varied to accommodate not only price changes but also changes in the intensity of services (real resources per person).

Time series estimates can be improved by making adjustments for the transitory effect of price and accounting lags. The change between the OECD Outlook forecast and actual inflation, and the change between this year and last, are used as proxy measures of unanticipated inflation. These ups and downs tend to offset each other, netting out to zero over time, but on a year-to-year basis may cause large fluctuations in the rate of health expenditure growth. The long run trend, however, is unaffected by inflation and is fundamentally related to real economic growth as measured by deflated permanent GDP [12].

Adjustment to real GDP: the decision process
The health system is constituted by the decisions which define its organizational and financial structure, a sort of implicit contract accumulated over many years between government, health institutions and professions, employers, and the public. The health sector cannot respond immediately to changes in GDP. Budgets are set in advance, cannot be easily cut (or expanded), and may reflect deficits accumulated over several years. In Spain, for example, the rate of GDP growth dropped sharply from 5% annually in 1960-74 to 0.6% annually in 1974-84. Health spending grew by 12% annually 1964-74, continued at an 8% rate 1974-76, then 4% 1976-78, and fell to 0.5% 1978-84.

The lag structure arises from three conceptually distinct, but sometimes empirically indistinguishable, temporal responses to a change in GDP: delayed, current, and anticipatory. Consider some type of current expenditures, for example nursing home services. The organizational framework for spending is largely set by decisions already made at varying times in the past: provision of social insurance benefits, funding of training programs, staffing regulations, construction of facilities, etc. These decisions are not immutable, but are slow to change, and very hard to change by much in the short term. Even if a facility could legally terminate half its staff or remove patients’ televisions, it would probably be disastrous to do so. The sluggishness of individual behaviors, organizations, financing mechanisms and government policy creates variable delays between the time when decisions are made and their consequences in national health spending.

Previous decisions keep the structure relatively fixed, but the amount actually spent will be strongly affected by current wages, unemployment and other economic conditions. Some changes (in energy and international pharmaceutical prices, rises in payroll taxes, etc.) will be passed through from the general economy to the health sector entirely; others will be only partially transmitted. Current fiscal conditions will always be influential because administrators can use their powers of interpretation to bring about more or less generous results from existing legislation.

Anticipations matter because long term decisions must be made with primary regard to macroeconomic conditions in the future, when the implied financial obligations must be met. How large an increase in medical costs the country can afford depends on whether a boom or a recession is expected. Expectations of the future are, however, based mostly on the past patterns of GDP growth. Furthermore, the decisions made last year were based on what expectations were then. Thus while future GDP is more relevant than current GDP, the expectations which affected decision-making are mostly old, and were formed based on what was already past history then.
A model of the lag structure

Medical expenditures in this year depend upon the decisions (contracts) made during previous years, based upon expectations of the future economic conditions which were then present. The realized cost of those obligations also depends upon current economic conditions as the implicit service contracts are fulfilled. Let $H_t$ represent the rate of health expenditure growth in year $t$, and $G_t$ the rate of real GDP per capita growth. These are the observed actual values. Let $E_t$ represent the expectations in year $t$ of incomes in all future years $t+1$, $t+2$, $t+3$, and so on, e.g. $E_{1990}$ represents our current set of expectations for GDP in all future years. While specifying estimators for expectations is still the subject of considerable econometric controversy, all take the same general form: a function of prior values of the series and some ‘other’ relevant information. For simplicity, assume that expectations are formed adaptively and hence are fully defined by a lag function of prior GDP growth.

Let ‘d’ represent the delay factor, so that $d_{t-2}$ is the fraction of all expenditure decisions accounted for by decisions made two years previously. Finally let ‘c’ be the current factor, the fraction of expenditures determined by current realized conditions (prices, disposable income). Note that income in the current year, $G_t$, has two conceptually distinct effects on this year’s expenditure: it affects the anticipation of future GNP and hence the decisions made in this year, and it acts directly upon the realized value of those decisions through its current effect on prices and disposable income.

$H_t$ = % change in real per capita health expenditures;
$G_t$ = % change in real per capita GDP;
$E_t$ = expected values of $G_{t+1}$, $G_{t+2}$, $G_{t+3}$, etc.;
$e_i$ = weights on previous years’ income $G_{t-i}$ used to construct $E_t$;
$c$ = fraction of $H$ dependent on current economy in $t$;
$d_t$ = fraction of expenditures dependent upon decisions made in year $t$ ($\sum d_t = 1$);
$l_i$ = observed (estimated) lag coefficient.

Health expenditures are a weighted function of decisions made in prior years according to expectations prevalent then.

$$H_t = H(E_t, E_{t-1}, E_{t-2}, \ldots, d_t, d_{t-1}, d_{t-2}, \ldots, c, G_t)$$

Using simple adaptive expectations:

$$E_t = e_0 G_t + e_1 G_{t-1} + e_2 G_{t-2} + \cdots + e_n G_{t-n} \qquad \left(\sum e_i = 1\right)$$

Substituting (with the current year written as year 0):

$$\begin{aligned}
H_0 &= cG_0 + [1-c]\,[d_0 E_0 + d_1 E_{-1} + d_2 E_{-2} + \cdots + d_n E_{-n}] \\
    &= cG_0 + [1-c]\,[\,d_0(e_0 G_0 + e_1 G_{-1} + e_2 G_{-2} + \cdots + e_n G_{-n}) \\
    &\qquad\qquad + d_1(e_0 G_{-1} + e_1 G_{-2} + e_2 G_{-3} + \cdots + e_n G_{-n-1}) \\
    &\qquad\qquad + d_2(e_0 G_{-2} + e_1 G_{-3} + e_2 G_{-4} + \cdots + e_n G_{-n-2}) \\
    &\qquad\qquad + \cdots \\
    &\qquad\qquad + d_n(e_0 G_{-n} + e_1 G_{-n-1} + e_2 G_{-n-2} + \cdots + e_n G_{-2n})\,]
\end{aligned}$$

Rearranging terms:

$$\begin{aligned}
H_0 &= cG_0 + [1-c]\,[d_0 e_0]\,G_0 \\
    &\quad + [1-c]\,[d_0 e_1 + d_1 e_0]\,G_{-1} \\
    &\quad + [1-c]\,[d_0 e_2 + d_1 e_1 + d_2 e_0]\,G_{-2} \\
    &\quad + [1-c]\,[d_0 e_3 + d_1 e_2 + d_2 e_1 + d_3 e_0]\,G_{-3} \\
    &\quad + \cdots \\
    &\quad + [1-c]\,[d_0 e_n + d_1 e_{n-1} + \cdots + d_n e_0]\,G_{-n} \\
    &\quad + \cdots \\
    &\quad + [1-c]\,[d_{n-1} e_n + d_n e_{n-1}]\,G_{-2n+1} \\
    &\quad + [1-c]\,[d_n e_n]\,G_{-2n}
\end{aligned}$$

Each row is a lag term in the observed estimate:

$$\begin{aligned}
l_0 &= c + [1-c]\,[d_0 e_0] \\
l_1 &= [1-c]\,[d_0 e_1 + d_1 e_0] \\
l_2 &= [1-c]\,[d_0 e_2 + d_1 e_1 + d_2 e_0] \\
l_3 &= [1-c]\,[d_0 e_3 + d_1 e_2 + d_2 e_1 + d_3 e_0] \\
    &\;\;\vdots \\
l_n &= [1-c]\,[d_0 e_n + d_1 e_{n-1} + \cdots + d_n e_0] \\
    &\;\;\vdots \\
l_{2n} &= [1-c]\,[d_n e_n]
\end{aligned}$$
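As a compact restatement, offered here as a reading aid rather than as the authors’ own notation, the rows above can be collapsed into a single convolution formula. The indicator $\mathbf{1}[k=0]$ and the convention that $d_i$ and $e_j$ are zero outside $0, \ldots, n$ are assumptions introduced for this sketch:

$$l_k = c\,\mathbf{1}[k=0] + (1-c)\sum_{i=0}^{k} d_i\, e_{k-i}, \qquad k = 0, 1, \ldots, 2n$$

Under these conventions the coefficients sum to one whenever the delay and expectation weights each sum to one:

$$\sum_{k=0}^{2n} l_k = c + (1-c)\left(\sum_i d_i\right)\left(\sum_j e_j\right) = 1$$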
Thus the observed lag terms are not estimates of expectations, delays or current effects of income, but rather a compound of all three. The lag coefficients are reverse cross products of the delay terms ‘d’ and the expectation formation terms ‘e’. Unless both the delay and the expectations terms decay rapidly, this cross multiplication and accumulation of terms will create a ‘hump-shaped’ response function, with an initial peak at lag 0 due to the current effects, and another several years later when the accumulation is greatest. Consider the following numerical example: current = 0.25, expectation weights = 0.4, 0.3, 0.2, 0.1, decision delays = 0.28, 0.26, 0.24, 0.22. The observable lag structure would be: 0.33, 0.14, 0.17, 0.18, 0.11, 0.05, 0.02 (Fig. 2).

This simplified model assumes that the expectations $E_t$ are functions of prior values of GNP alone. Although it would seem that explanatory power could be greatly improved by including forecast GDP as well, published forecasts are in fact primarily a function of past values, adding little new information. Also, to the extent that an old forecast was correct, that information is already included in the lag values, except for the most recent year. The influence of anticipations per se is clearly visible only when those anticipations were wrong and consumption decisions were made on them. Empirical tests using historical GDP estimates reported in the OECD Outlook indicate that such ‘other’ expectation factors do have a small positive effect on expenditures.

Comparison of the single country models with the pooled model encompassing all countries provides a measure of the importance of country specific parameters vs more precise but more general parameters obtained by combining all observations from all countries together. A compromise is to use a pooled analysis but include country specific dummy variables for intercept or slope. Similarly, dummy variables could be used for sets of related countries (e.g. Scandinavian, English-speaking, etc.). In general, such dummy variables did not have sufficient explanatory power to improve adjusted $R^2$. Alternative functional forms, specification, and error diagnostics are discussed at length in Getzen [9]. Many of these issues are also addressed elsewhere [13].
Fig. 2. Lag structure of income effects. (Components shown: current (1990) effects; 1990, 1989, 1988 and 1987 decisions; cumulative.)
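The numerical example behind this lag structure (current = 0.25, expectation weights 0.4, 0.3, 0.2, 0.1, decision delays 0.28, 0.26, 0.24, 0.22) can be reproduced with a few lines of code. The sketch below treats the observed coefficients as the cross products of the delay and expectation weights, scaled by one minus the current factor, with the current factor added at lag 0; the function name `observed_lags` is a hypothetical helper, not something defined in the paper.

```python
# Minimal sketch of the compound lag structure: each observed coefficient
# accumulates (1 - c) * d[i] * e[j] at lag i + j, and lag 0 also receives
# the current factor c. Inputs are the numerical example from the text;
# the helper name is an illustrative assumption.

def observed_lags(c, delays, expectation_weights):
    """Return the observed lag coefficients l_0 .. l_2n."""
    lags = [0.0] * (len(delays) + len(expectation_weights) - 1)
    for i, d_i in enumerate(delays):
        for j, e_j in enumerate(expectation_weights):
            lags[i + j] += (1.0 - c) * d_i * e_j
    lags[0] += c  # current-year effect enters only at lag 0
    return lags

lags = observed_lags(0.25, [0.28, 0.26, 0.24, 0.22], [0.4, 0.3, 0.2, 0.1])
for k, value in enumerate(lags):
    print(f"lag {k}: {value:.4f}")
```

The unrounded coefficients are 0.334, 0.141, 0.1725, 0.18, 0.105, 0.051 and 0.0165, which match the quoted structure up to rounding and show the two peaks (lag 0 and lag 3) of the hump-shaped response.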
Other variables
The parsimonious econometric model using only two terms, past and expected future income, appears adequate to capture the effects of changes in GDP. What about other factors such as aging, physician and hospital supply, or technological change, which are also thought to influence national health expenditures? What about governmental and political changes, the organization of the health insurance system, and the degree of public financing, which might affect the relationship between national income and expenditures?

Aging is frequently mentioned as a cause of higher medical expenditure. And indeed, the percentage of the population over age 65 is positively correlated with the share of GNP spent on health. Yet increased longevity and reduced birthrates are themselves attributable in part to economic development. Once per capita income effects are included in a fully specified model, population aging no longer has any significant independent effect on health spending [9]. Analysts are misled by a fallacy of composition into the plausible assumption that since individual age affects the allocation of health expenditures between persons, the average population age must also affect the total amount of national health expenditures, which empirically it does not. Tests of physician and hospital supply, and of the percentage of total health expenditures publicly financed, similarly reveal a lack of independent significant association with health expenditures.

Policy change is not only difficult to quantify, but historically its impact on spending has depended more on the strength of administrative enforcement than on the language of the law [14]. Some elections, such as those of Ronald Reagan in the United States and Margaret Thatcher in the United Kingdom, mark clear changes. Yet the shift to fiscal conservatism had virtually no discernible effect upon the growth of health expenditures in these two countries after accounting for income effects. Culyer [15] identifies ‘political will’ as the missing factor which was responsible for cost control in Canada, but explicitly recognizes that the construct is somewhat tautological and lacks predictive capability. In sum, a large number of studies have failed to discover any other factors which have a reliable association even ex post with spending differentials, much less the ability to improve ex ante forecasts.
METHODS AND DATA

The data are from the national health accounts maintained by the Directorate of Social Affairs, Manpower and Education of the Organization for Economic Cooperation and Development [16, 17], revised and updated through 1 July 1990. Many statisticians, budgetary officers and health analysts contribute to make this the largest consistent international data series on health care costs. The annual percentage change in real (deflated) national health expenditures is forecast using information available in December. Thus the 1980 health forecast is based on actual 1978 health expenditures, GDP and prices, and estimates of 1979 and 1980 GDP and inflation published in the December 1979 issue of the OECD Outlook. Forecast accuracy is measured by mean absolute error (MAE), the average difference between forecast and actual percentage growth rates.

$$\text{Mean Absolute Error (MAE)} = \frac{\sum \left|\text{Actual} - \text{Forecast}\right|}{N}$$

Average error is readily comprehensible by policy makers and less subject to distortions based on a single observation, and hence MAE is typically preferred as a practical measure of forecast accuracy to RMSE, Theil U, or other measures [18]. The relative performance results presented here are essentially the same for each, thus only MAE is reported. Comparison is made between the chosen forecast method and the baseline ‘naive forecast’ that growth next year will remain the same as it was last year (i.e. the 1980 estimate is the 1978 actual, 1979 being not yet available). This procedure follows that used by Theil, Smyth and McNees [19-21] to measure the accuracy of annual macroeconomic forecasts.

The series begins in 1960, thus growth rates are defined for 1961 on. Ex post forecasts are made for 1965-79, and used to select the optimal smoothing models, select variables for the econometric models, and estimate parameters. Since 1961-64 values are undefined for some methods (e.g. a 5-year moving average requires 5 previous years), summary statistics are reported only for the 1965-79 span to achieve consistency. Genuine ex ante forecasts for 1980-87 are used to evaluate the relative accuracy of the different approaches.

The quality of the OECD data is in general excellent. However, shifts to new methods of health accounting and the unavailability of old data mean that there are breaks in some of the series. Nine outliers were deleted: initial 1961 values for Australia, Denmark and Japan, 1966 Switzerland, 1970 Netherlands, 1971 Ireland, 1975 Austria and Belgium, and 1980 Italy. The calibration period 1965-79 thus has 280 rather than 285 observations, and the ex ante evaluation period 1980-87 has 151 rather than 152. A re-analysis which included the omitted observations showed larger variance but did not change any major conclusions.
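The MAE measure and the naive baseline described in this section can be sketched in a few lines. The growth rates below are made-up illustrative numbers, not OECD data, and the two-year gap (forecast for year t taken from the actual value in year t - 2) is one reading of the remark that the 1980 estimate rests on the 1978 actual; both function names are hypothetical.

```python
# Minimal sketch: mean absolute error of a naive forecast.
# `growth` holds hypothetical annual % growth rates (NOT OECD data).
# The gap of 2 years reflects the assumption that the latest actual value
# available when forecasting year t is the one for year t - 2.

def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def naive_forecast(growth, gap=2):
    """Forecast for each year is the actual growth `gap` years earlier."""
    return growth[:-gap]  # aligned with growth[gap:]

growth = [6.0, 4.0, 7.0, 5.0, 3.0, 4.0]  # hypothetical growth rates, in %
gap = 2
forecast = naive_forecast(growth, gap)
actual = growth[gap:]
mae = mean_absolute_error(actual, forecast)
print(f"naive MAE = {mae:.2f} percentage points")
```

Any other method can be scored the same way by swapping in its forecasts, and its percentage reduction in MAE relative to this baseline follows as `1 - mae_method / mae_naive`.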
A BASELINE FORECAST: STEADY GROWTH
The growth of health expenditures across all 19 countries from 1965 to 1987 averaged 5.8% per year. For single countries, the average growth 1965-87 ranged from 4.0% (Denmark, U.K.) to 8.5% (Japan). Growth in single years ranged from -9% (Denmark, 1976) to +20% (Spain, 1965 and 1967). However, all 19 countries had higher growth rates during 1965-79 (average, 7.2%) than 1980-87 (average, 3.3%).

The mean absolute error of naive steady growth forecasts for all countries over the entire period is 3.6%. Although the growth rate for 1965-79 is over twice that of the recent period, the MAE of 3.9% is only slightly larger than the MAE of 3.0% for 1980-87. The average growth rates in real health expenditures over the entire period 1965-87, and the MAE of a combined forecast model, are presented in Fig. 3 to provide a graphical display of the scale of the problem, and of the results.

Fig. 3. National health expenditures, 1965-87. (Countries shown: Japan, Spain, Italy, Canada, U.S., Netherlands, France, Norway, Greece, Finland, Ireland, Australia, Switzerland, Belgium, Austria, Sweden, Germany, Denmark, U.K.)

UNIVARIATE SMOOTHING

Some of the error in a naive forecast which uses last year’s growth to project next year is purely random. An average over several years will, to an extent, let those random errors offset each other. However, if the health care system is changing over time, older data are less relevant in predicting future trends. A decision must be made regarding how many years to include, or how to reduce the weighting of earlier years. Although smoothing forecasts use functional forms which are fixed in advance and do not require parameter estimation, as do ARIMA and regression models, the existence of many choices implies that some empirical fitting will have to take place. Two-, three- and five-year moving averages, and exponentially declining weighted averages (exponential smooths) with α = 0.2, 0.3, 0.4, 0.5, 0.6 and 0.7 were evaluated. The lowest MAE was obtained with an exponential smooth using α = 0.3. For all 19 countries over all 23 years the forecast error is reduced from 3.6% to 3.0%. The percentage reduction in MAE for each country, averaged over all 19 countries, is 15% 1965-79 and 13% 1980-87 (Appendix, Tables A1 and A2).

Greater variability implies more random error which can be removed by smoothing, and a 1% larger MAE is associated with 0.16% more reduction in forecast error. However, the amount of smoothing required to obtain the greatest reduction in forecast error is not related to the variability of the time series within a country. The United States, with the lowest naive error (1.8%), obtained the largest percentage reduction with the greatest amount of smoothing, while the country with the next lowest variability, The Netherlands (1.9%), did best with no smoothing at all. More extensive testing confirmed the lack of correlation between magnitude of MAE and optimal degree of smoothing (magnitude of α or length of moving average).

An attempt was made to reduce error by using a different smoothing function for each country, choosing the model that had the lowest MAE for the 1965-79 period. The average 1980-87 MAE obtained by doing so was worse than if α = 0.3 was used across the board; forecast error increased in eleven countries, and fell for only seven. It appears that such country specific adjustment is more often a response to a random fluctuation than to a persistent pattern, and therefore average forecast accuracy declines.

ARIMA modelling

Box-Jenkins methods are able to extract more information from a time series than smoothing, and univariate ARIMA models have been used successfully in many macroeconomic forecasting applications [1, 21]. However, since the model-fitting process is data driven, good results depend upon the existence of a clear pattern and a relatively large number of observations. Health expenditures are available only on an annual basis, and only since 1960. The 20 observations available before 1980 were not sufficient to determine models which could improve on a naive forecast. Smyth [20] reports a similar lack of success in applying Box-Jenkins methods to annual OECD forecasts.

For all of the countries, average growth rates declined between 1960 and 1980. It can now be seen that this downward trend was temporary, growth rates having stabilized after 1976. ARIMA modelling commences by differencing the data to achieve stationarity. First differences (growth rates) for the period 1960-80 will usually fail stationarity tests because of this decline. Therefore second differences (rates of change in growth rates) must be used, and a secular trend is built into the model. Only countries whose growth rates continued to fall through most of 1980-87, and had a clear pattern, would provide good ARIMA models. These criteria were met for one country, France, whose data followed an ARIMA (2,2,0). For this one country, the ARIMA forecasts were substantially better than any others. Yet good results in 1 out of 19 series would appear to be nothing more than chance. It is necessary to wait for more data in longer time series before ARIMA methods can be systematically applied and evaluated on OECD health expenditures.

INTERNATIONAL AVERAGES

Averaging over several years removes some of the random fluctuation in this year’s observed rate of health expenditure growth, making the smoothed rate a more accurate predictor. It is also possible to average over countries, rather than years. Such international averaging is useful only if there is some relationship between health expenditures among the countries; otherwise it will just add random error and become a worse guide to future growth rates. Empirically, an ‘international average’ forecast that next year’s growth rate in any particular country will be equal to last year’s average rate across all 19 OECD countries is more accurate than a naive forecast that each country will continue at its own growth rate for 16 of 19 countries 1965-79, and 15 of 19 1980-87. The average gains in MAE are 17% and 13% respectively (Table 1). The international cross-section can also be smoothed over time to remove random fluctuations, but since it is already an average of 19 observations, such smoothing yields little additional benefit. The optimal smooth is α = 0.8, with 80% of the weight on the most recent year and only a 0.5% incremental improvement in MAE.
Table 1. Comparison of 6 methods of forecasting the annual growth rate in real (deflated) national health expenditures

                                      Mean absolute percentage    Comparative performance: average
                                      forecast errors (MAE)       reduction relative to naive
Method                                1965-79     1980-87         1965-79     1980-87
1. Naive (forecast = actual t-1)      3.92        2.99
2. Exponential smooth (α = 0.3)       3.23        2.52            15%         13%
3. International average              3.14        2.34            17%         13%
4. Econometric (single country)       2.83        2.52            26%         8%
5. Econometric (all countries)        2.95        2.36            22%         17%
6. Adjusted combination               2.82        2.19            27%         25%

Values are averages across 19 countries. First two columns are mean absolute errors (MAE) in %. Third and fourth columns are the percentage reduction in MAE relative to the naive forecast that next year’s growth rate will be the same as last year’s.
The international average has a lower MAE than the single-country time series exponential smooth with α = 0.3 for 10 of 19 countries 1965-79, and 15 of 19 countries 1980-87 (Appendix Table A1). The gains from using more recent observations outweigh the costs of using observations from other countries. The growth rates of health expenditures in other countries are more accurate predictors of a single country’s future health expenditures than its own past rates of health expenditure growth. This suggests that there are common factors which drive health expenditures across all countries.
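The three simplest methods compared above can be sketched in a few lines of Python. This is an illustration only, not the authors' computation: the growth rates are invented, the function names are hypothetical, and the smoothing recursion is the textbook simple exponential smooth with weight alpha on the most recent year.

```python
# Illustrative sketch: invented growth rates, not OECD data.

def naive_forecast(history):
    """Next year's growth rate equals last year's observed rate."""
    return history[-1]


def exp_smooth_forecast(history, alpha=0.3):
    """Simple exponential smoothing; alpha is the weight on the newest year."""
    level = history[0]
    for rate in history[1:]:
        level = alpha * rate + (1 - alpha) * level
    return level


def international_average_forecast(last_year_rates):
    """Every country is forecast at last year's cross-country mean rate."""
    return sum(last_year_rates) / len(last_year_rates)


def mean_absolute_error(forecasts, actuals):
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def evaluate(growth, first_test_year=2):
    """MAE of one-step-ahead forecasts for each method, pooled over countries."""
    n_years = len(next(iter(growth.values())))
    preds = {"naive": [], "exp": [], "intl": []}
    actuals = []
    for t in range(first_test_year, n_years):
        last_year = [series[t - 1] for series in growth.values()]
        for series in growth.values():
            history = series[:t]  # only data observed before year t
            preds["naive"].append(naive_forecast(history))
            preds["exp"].append(exp_smooth_forecast(history))
            preds["intl"].append(international_average_forecast(last_year))
            actuals.append(series[t])
    return {name: mean_absolute_error(p, actuals) for name, p in preds.items()}


# Hypothetical real growth rates (% per year) for three countries.
growth = {
    "A": [6.0, 4.0, 7.0, 3.0, 5.0],
    "B": [5.0, 6.0, 4.0, 5.0, 4.0],
    "C": [8.0, 3.0, 6.0, 4.0, 6.0],
}
mae = evaluate(growth)
```

Dividing each method's MAE by the naive MAE and subtracting from one gives the kind of percentage reduction reported in Table 1; which method wins depends entirely on the data supplied.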
ECONOMETRIC MODEL
The econometric model estimated on 431 observations for 19 countries 1965-87 is given in line I of Table 2. There is a 0.7% secular rising trend, a cumulative income elasticity of 1.39 distributed over seven years, and a reduction for delayed adjustment to (unanticipated) changes in inflation of -0.20 for the last year, and -0.10 for the year before. The coefficient on expected future GDP growth, 0.10, reflects only the additional incremental effect of new information contained in the forecast, as most of the anticipated GDP growth is already reflected in the lag income coefficients. Income effects are bimodal, with peaks at lag 0 and lag 2, similar to the theoretical lag structure derived above. Inflation lags are dependent on institutional structure and much more variable than income lags. Estimating over a pooled sample of 19 very different countries tends to blur away the strong price adjustment lags found in some individual country time series (Canada, Finland, Netherlands, Spain, U.S.).

Distributed lags are able to explain much of the variance in the level and rates of change in annual health spending. The question is, can such a retrospectively accurate model produce good predictions of future spending? Ex post in 1988, actual 1986 and 1987 GDP is known. Yet to make ex ante forecasts before the fact in 1986, these terms must be replaced with estimated values or left out. Inflation adjustments are even more troublesome since by definition ‘unanticipated’ inflation cannot have been forecast. The historical health expenditure function is truncated from 10 parameters to 3 when used for estimating the pooled forecasting model (line II, Table 2), and only 2 for single-country models (constant and moving average income term).

The forecast procedure is standardized by assuming that all forecasts for year t are made in December of year t-1. Actual health expenditures and GDP for year t-2 and prior are available, as is forecast GDP for years t and t-1. Unrestricted regression over the lags 2-10 for the period 1965-79 gives significant positive coefficients to lags 2-5. The simplifying assumption that all four lag coefficients were equal could not be rejected by a Chow test, so these four terms are replaced by a 4-year moving average. The Almon lag procedure, applied to the first 7 lags, gave nearly identical coefficients. The reduction in residual error obtained was insufficient to justify estimation of the additional parameters required to fit a third-degree polynomial. A Koyck lag with geometrically declining weights was clearly rejected.

The forecast equation for any given year t is obtained by regressing real health expenditure growth from 1961 to year t-2 on a 4-year moving average of real GDP growth lagged two years. The income average for the next period (actual t-2, t-3, t-4 and forecast t-1) is then substituted into the regression equation. Thus the functional form is the same for each year, but the estimated constant and income elasticity are slightly different. The individual country parameters differ widely from the pooled estimate, and from each other [9]. ‘Forecasts’ for 1965-79 are made by fitting the model to the data ex post. They are not true forecasts and the errors in the early period must be interpreted in this light. Estimates of future (year t) GDP growth rates which were available in year t-2 were obtained from back issues of the OECD Outlook, and are included in the pooled forecasting model (line II, Table 2).
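The refit-and-substitute procedure described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' estimation code: the data are synthetic, the function names are hypothetical, the regression has a single income term (a 4-year moving average of GDP growth over lags 2-5), and the forecast-GDP term of the pooled model is omitted.

```python
# Sketch under stated assumptions: synthetic data, one-regressor least squares.

def lagged_moving_average(gdp_growth, t, window=4, lag=2):
    """Average GDP growth over years t-lag back to t-lag-window+1."""
    start = t - lag - window + 1
    return sum(gdp_growth[start:t - lag + 1]) / window


def fit_line(x, y):
    """Least-squares constant and slope for y = constant + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast_health_growth(health_growth, gdp_growth, t, window=4, lag=2):
    """Refit on data through year t-2, then forecast year t."""
    first = window + lag - 1            # earliest year with a full lagged average
    fit_years = range(first, t - 1)     # years first .. t-2
    x = [lagged_moving_average(gdp_growth, s, window, lag) for s in fit_years]
    y = [health_growth[s] for s in fit_years]
    constant, elasticity = fit_line(x, y)
    return constant + elasticity * lagged_moving_average(gdp_growth, t, window, lag)


# Hypothetical real GDP growth (% per year), years indexed 0..11.
gdp = [5.0, 4.0, 6.0, 3.0, 5.0, 2.0, 4.0, 1.0, 3.0, 2.0, 4.0, 3.0]
# Health growth built as 1.0 + 0.8 * lagged average, so the fit recovers it.
# The first five entries are placeholders that the regression never uses.
health = [0.0] * 5 + [1.0 + 0.8 * lagged_moving_average(gdp, s) for s in range(5, 12)]
prediction = forecast_health_growth(health, gdp, t=11)
```

Because the regression is refitted for every target year, the constant and income elasticity drift slightly from year to year while the functional form stays fixed, which mirrors the behaviour described in the text.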
Table 2. Regressions estimating 1987 health expenditures: pooled econometric forecast model estimation

I. ex post Historical
   Health_t = 0.7%
     + 0.10 [forecast GDP] + 0.30 [lag 0] + 0.12 [lag 1] + 0.34 [lag 2] + 0.20 [lag 3] + 0.21 [lag 4] + 0.12 [lag 5]   (Income effects)
     - 0.20 [last year] - 0.11 [year before]   (Inflation effects)

II. ex ante Forecast
   Health_t = 1.3% + 0.50 [forecast GDP] + 0.74 [moving average of income, lags 2-5]

I is retrospective and uses all data available as of December 1988, including observed change in inflation rates 87-86 and 86-85. N = 431, R^2 = 0.327. II is the forecast equation and is therefore limited to data available in December 1986. N = 393, R^2 = 0.311. Pooled estimate for 19 countries. Incomes and expenditures are percentage changes in real (deflated) currencies, 1965-87. Sources: Health Care Financing Review 8, (4), 1-36, 1987; OECD Outlook.

In the 19 separate single-country time series, each regression has fewer than 30 observations, estimated income elasticity is much less stable, and forecast GDP is significantly associated with health expenditures in only 3. Therefore only past actual GDP is included in the single-country econometric health forecasts. The ability to forecast change in inflation 2 years in the future is essentially nil, and there is no inflation adjustment term in any of the forecasts. The use of such a term at the time the forecast was made would be limited to adjusting the (as yet unobserved)
year t-1 health expenditure base, not the rate of change in spending from year t-1 to year t.

The pooled econometric model is able to reduce ex post 1965-79 MAE for 16 of 19 countries compared to a naive forecast, and for 14 of 19 compared to an exponential smooth (Appendix Tables A1 and A2). However, these errors are obtained over the data used to fit the model, and hence do not provide a true test. The 1980-87 ex ante forecasts do provide a genuine test. There the pooled econometric model is again better than naive for 16 of 19 countries, and better than an exponential smooth for 11. On average, the pooled econometric model performs better than the naive, exponential smooth or international average methods in an actual forecast situation (Table 1).

Single-country econometric models necessarily have smaller ex post errors since 19 different regression constants and income elasticities are fitted to the data, but they perform significantly worse than the aggregate pooled model in ex ante forecasting. For 1980-87, single-country econometric model MAEs are better than naive forecasts for 11 countries, better than exponential smooths for 6, and better than the pooled model for only 4 (Appendix Table A1). For 3 countries, Greece, Ireland and Norway, the income coefficient was less than one standard deviation from zero for the calibration period, and hence no single-country econometric forecast was made. Even in the other 16 countries, fluctuation in the regression constants and coefficients indicates that 18 (for 1980) to 25 (for 1987) observations are not sufficient to estimate stable model parameters. Yet only single-country parameter estimates can capture the structural differences between countries. While all nations respond to a recession (boom) by lowering (raising) health expenditures, the magnitude of the response is clearly different across countries, as is the size of the underlying secular trend in spending. The difficulty is that such ‘true’ differences are confounded with random variations. On average, the losses from estimating a model with a small number of observations outweigh the gains from particularizing the parameters for each country.

COMBINED FORECASTS
The choice between using a pooled model whose parameters are stable because of the large number of aggregate observations, or many single-country models whose parameters are unstable due to the small number of observations available for each, need not be dichotomous. A combination of the two may be better than either one alone [18]. If their errors are uncorrelated, they will partially offset each other, reducing MAE. An average of the pooled and single-country econometric forecasts reduces MAE 15%, vs 17% and 8% respectively for each separately. Similarly, the choice between an international average across countries at a point in time and an exponential smooth within a single-country time series can be replaced by a combination, reducing MAE by 18% vs 13% for each method alone.

Combinations usually reduce MAE most when the information sets are most different. The single-country regression uses no international data, and the international average uses no income data. A combination of these two reduces MAE 22% relative to naive. There is less to be gained from pairing similar forecast methods. Although the four forecasts overlap to some extent, none appears to be entirely redundant on theoretical grounds, and each improves average forecast accuracy within the limited set of 151 test observations available.

Much of the benefit from smoothing arises from the fact that averaging over several years allows random errors to average out. Combining several forecasts also averages out random errors. The individual country time series therefore requires less smoothing when used in a combination than when used alone. More weight can be put on the most recent observation. Using 1965-79 to calibrate the model, the optimal degree of exponential smoothing is raised to α = 0.6 when combined with the international average, single-country and all-country econometric forecasts, lifting MAE reduction from 21% to 23%.

During a period of declining GDP growth rates, the trend of health expenditure growth will also be downward. Forecasts using averaged past values (the international average and the exponential smooth) will therefore systematically overestimate. The magnitude of the bias is about +0.6% for both over 1965-79. To remove this bias from a combined forecast, a downward adjustment of 0.3% is called for (0.6% times 1/4 for exponential smooth and 1/4 for international average). GDP growth continued to decline through 1982, and appeared to revert back to a stationary international mean of 2-3% real growth after that. Adjustment of forecasts through 1984 brings MAE reduction from 23% to 25%.
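The error-offsetting argument and the bias arithmetic can be illustrated with a short Python sketch. The forecasts and actual values are invented for the illustration and the function names are hypothetical; only the equal weighting and the 0.6% x (1/4 + 1/4) adjustment come from the text.

```python
# Illustrative sketch: invented forecasts, equal weights as described in the text.

def mean_absolute_error(forecasts, actuals):
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def combine(forecast_sets, bias_adjustment=0.0):
    """Equal-weight average of several forecast series, less a bias adjustment."""
    n = len(forecast_sets)
    return [sum(values) / n - bias_adjustment for values in zip(*forecast_sets)]


# Hypothetical actual growth rates and two forecasts whose errors tend to
# have opposite signs, so that averaging lets them partially cancel.
actual = [4.0, 3.0, 5.0, 2.0]
method_a = [5.0, 2.5, 6.5, 1.0]   # errors +1.0, -0.5, +1.5, -1.0
method_b = [3.5, 4.0, 4.5, 3.5]   # errors -0.5, +1.0, -0.5, +1.5

combined = combine([method_a, method_b])

mae_a = mean_absolute_error(method_a, actual)
mae_b = mean_absolute_error(method_b, actual)
mae_combined = mean_absolute_error(combined, actual)

# Bias adjustment for a four-way combination in which two components
# (exponential smooth and international average) each overestimate by 0.6%.
component_bias = 0.6
bias_adjustment = component_bias * (1 / 4 + 1 / 4)   # 0.3 percentage points
```

In this constructed case the combined MAE is lower than that of either component; with positively correlated errors the gain would be smaller, which is consistent with the observation that pairing similar methods helps less.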
The ‘GDP bias adjustment’ is not the same as using a smoothing method with trend, such as linear exponential smoothing. The adjustment is based on a causal interpretation of the linkage between GDP and health expenditure. That is why no adjustment is called for in the years from 1985 on.

The adjusted combined forecast using four methods (exponential smooth, international average, single- and all-country econometric model) is more accurate overall than any one alone (Table 1). The combined forecast is better than any of its components for 8 of 19 countries (Appendix Table A1). The strength of the combined forecast lies primarily in its ability to avoid large errors. In only one of 19 countries does the combined forecast have larger MAE than the naive. In 15, the gains are greater than 10%, and for 9 the gains are 30% or more.

Several attempts were made to find a more selective method of combining the forecasts. None were successful. Over (under) weighting the forecast which was better (worse) in the earlier period did not improve average MAE. Note, for example, that the international average, which was the most accurate method for the U.S. 1965-79, was 92% worse than naive 1980-87. It was considered that weights based on the entire earlier period might be too crude and too outdated to be useful, so a continuous adjustment model was constructed which weighted each forecast by the reciprocal of its error in the last observed period. Error weights were averaged over 1, 2 and 3 years, and varied by a power function from -0.1 to -100 to fine tune the model. Again, no adjustment was able to significantly improve on an unweighted combination of forecasts. Any large deviation from equal weights made forecasting accuracy worse. Averaging forecasts is a simple and powerful method for reducing error which is remarkably robust, and hard to beat on a consistent basis.

Rates of growth in real (deflated) health expenditures are the most meaningful theoretically, and most readily comparable between countries. However, health and finance officials in each country must forecast next year’s budget in nominal currencies. Also, macroeconomic forecasts of nominal GDP are often more accurate than real GDP forecasts. Using raw data unadjusted for inflation did not improve forecasts of real expenditures. Furthermore, nominal budgets were more accurately forecast by adding inflation to the real health expenditure growth forecast than by estimating directly from unadjusted nominal expenditures.

DISCUSSION
The rate of growth of national health spending is related to that of prior years, and of other countries, so that smoothing across time or nations increases the accuracy of forecasts. Further gains can be obtained with a causal econometric model relating health spending to past and expected future GDP growth when estimated using a pooled cross-section of time series including all 19 countries, but such a model is less accurate if fitted only to the short series available for each country. Averaging different forecasts together improves accuracy, reducing MAE by 25% relative to the naive steady growth forecast. Other studies of forecast accuracy [18, 22-24] document similar gains.

Systematic measurement of predictive power, and recognition of the range of residual uncertainty, is a useful corrective to undue analysis of small fluctuations. At the same time, it focuses attention on behavior which is markedly different from the norm and deserves further study: drastic cuts by Denmark since 1980, excessive growth in the United States. While it is arguably more important to understand the structural factors which determine health care cost functions than to arrive at an estimate of next year’s spending, the development of such analytic capability depends upon our ability to make a good forecast against which policy outcomes can be measured. One goal of health services research is to be able to examine several options, and then to project for the government the expected cost of following option A, B or C. Without a strong and flexible model of inertial trends, international cross-influences, and fundamental macroeconomic effects, to which policy effects can be added, we cannot do so. Indeed, forecast accuracy provides a very practical and exacting test, in objective terms, of our understanding of the health system.

For short term forecasts such as those evaluated here, it is possible to project forward incrementally from the past without inquiring what defines some spending as ‘medical’ (care of the elderly, alcoholism, or physician education) or how that might shift over time due to changes in technology and social organization. Comparison of 1990 with 1950 or 1890 is not possible without considering such ‘theoretical’ issues. Before the 20th century, the organization and financing of care was fragmented and costs rose slowly. A ‘modern’ medical care system was created in most developed countries between 1900 and 1950, accompanied by moderate expenditure growth. Since then, costs have risen quite rapidly, leading some to question the sustainability of expanding medical benefits and technology. As the year 2000 approaches, will high cost, high technology trends continue, or should we expect a shift to more constrained ‘postmodern’ medicine? Thorough, detailed and consistent data have been collected only for the last thirty years, and growth in this period may not be fully representative of the long run.

Acknowledgements-The authors would like to thank Joel Leon Telles, David Barton Smith and the two anonymous referees for their helpful comments. The analyses and conclusions made in this study are the views of the authors, and not of any organization.

REFERENCES
1. Granger C. W. J. and Newbold P. Forecasting Economic Time Series. Academic Press, New York, 1977.
2. Makridakis S., Wheelwright S. and McGee V. Forecasting: Methods and Applications, 2nd edn. Wiley, New York, 1983.
3. Getzen T. E. Macro forecasting of national health expenditures. Adv. Hlth Econ. Hlth Services Res. 11, 27-48, 1990.
4. Abel-Smith B. An International Study of Health Expenditure. World Health Organization, Geneva, 1967.
5. Newhouse J. P. Medical care expenditure: a cross-national survey. J. Human Res. 12, (1), 115-125, 1977.
6. Maxwell R. Health and Wealth. Lexington Books, Lexington, MA, 1981.
7. Leu R. The public-private mix and international health care costs. In Public and Private Health Services (Edited by Culyer A. J. and Jönsson B.), pp. 41-63. Blackwell, London, 1986.
8. Pfaff M. Differences in health care spending across countries: statistical evidence. J. Hlth Polit. Policy Law 15, (1), 1-67, 1990.
9. Getzen T. E. The adjustment of health expenditures to GNP: variability and lags among OECD nations. Eur. Econ. Rev. In press.
10. Dhrymes P. Distributed Lags, 2nd edn. North-Holland, Amsterdam, 1981.
11. Deaton A. Involuntary saving through unanticipated inflation. Am. Econ. Rev. 67, (5), 899-910, 1977.
12. Getzen T. E. and Poullier J.-P. An income-weighted international average for comparative analysis of health expenditures. Int. J. Hlth Plan. Management 6, 3-22, 1991.
13. Gerdtham U.-G., Søgaard J., Jönsson B. and Andersson F. A pooled cross-section analysis of the health care expenditures of the OECD countries. Proceedings of the Second World Congress on Health Economics, Zurich, September 10-14, 1990.
14. Glaser W. Paying for Hospitals. Jossey-Bass, Boston, 1988.
15. Culyer A. J. Health Care Expenditures in Canada: Myth and Reality; Past and Future. Canadian Tax Paper No. 82. Canadian Tax Foundation, Toronto, 1988.
16. OECD Compendium of health care expenditure and other data. Health Care Financing Review, 1989 Annual Supplement, pp. 111-194. Also published as Health Care in Transition. Paris, OECD, 1990.
17. OECD Measuring Health Care 1960-1983: Expenditures, Costs and Performance. Paris, OECD, 1985.
18. Armstrong J. S. Long-range Forecasting. Wiley, New York, 1985.
19. Theil H. Applied Economic Forecasting. North-Holland, Amsterdam, 1966.
20. Smyth D. J. Short-run macroeconomic forecasting: the OECD performance. J. Forecast. 2, (1), 37-49, 1983.
21. McNees S. K. How accurate are macroeconomic forecasts? N. Engl. Econ. Rev. 15-36, July/August 1988.
22. Ascher W. Forecasting: an Appraisal for Policy Makers and Planners. Johns Hopkins University Press, Baltimore, 1978.
23. Holden K., Peel D. and Sandhu B. The accuracy of OECD forecasts. Empirical Econ. 12, 175-186, 1987.
24. Ash J. C. K., Smyth D. J. and Heravi S. M. The accuracy of OECD forecasts of the international economy. (mimeo) Department of Economics, University of Reading, U.K., 1989.
APPENDIX

Table A1. Comparison of forecast errors: national health expenditures

Mean absolute error (MAE) in %

                naive    exp      econ-s   econ-a   intl     combined
1965-79
U.S.            2.1      1.6      1.4      1.2      1.9      1.3
Australia       8.4      5.8      4.7      4.6      4.7      5.0
Austria         1.7      1.4      1.2      1.6      1.6      1.1
Belgium         4.0      3.2      2.8      2.9      3.1      2.9
Canada          2.8      2.2      2.4      2.5      2.4      2.2
Denmark         5.7      4.2      3.9      4.3      4.3      3.8
Finland         4.6      3.7      3.4      3.1      3.7      3.2
France          3.1      2.4      2.2      2.1      2.2      2.0
Germany         4.3      3.4      2.5      2.5      2.9      2.7
Greece          4.8      3.6      *        3.0      3.2      3.4
Ireland         4.9      3.7      *        3.3      2.8      3.3
Italy           2.8      3.1      3.0      2.4      2.6      2.5
Japan           3.6      3.3      3.7      3.6      3.7      2.9
Netherlands     2.1      2.7      1.5      2.5      1.9      1.9
Norway          5.0      3.8      *        3.8      3.6      3.7
Spain           5.1      5.2      3.9      5.3      5.8      4.6
Sweden          3.3      2.6      2.1      2.5      2.6      2.2
Switzerland     3.3      3.0      2.3      2.7      2.7      2.5
U.K.            3.1      2.5      2.2      2.1      4.0      2.3
Average         3.92%    3.23%    2.83%    2.95%    3.14%    2.82%

1980-87
U.S.            1.2      0.8      0.8      1.2      2.3      1.3
Australia       2.3      2.0      2.1      1.6      2.0      1.5
Austria         2.4      2.4      2.7      2.0      2.1      2.0
Belgium         1.9      2.2      1.3      1.3      1.3      1.2
Canada          1.9      1.5      1.0      1.2      2.2      1.4
Denmark         2.0      1.6      2.7      3.4      2.8      2.0
Finland         2.2      1.1      1.5      1.2      1.6      1.0
France          1.7      1.8      1.6      1.4      1.5      1.3
Germany         3.3      2.2      2.5      2.2      1.9      2.2
Greece          5.7      4.9      *        4.9      4.4      4.7
Ireland         5.7      5.4      *        5.6      4.2      4.9
Italy           3.8      2.6      3.5      2.7      2.1      2.5
Japan           2.7      3.0      4.0      2.5      2.3      2.2
Netherlands     1.7      2.4      2.5      1.6      1.9      1.3
Norway          5.8      4.6      *        3.7      3.8      3.7
Spain           4.4      3.1      2.4      1.9      2.7      2.5
Sweden          2.0      2.2      2.5      2.4      2.0      2.0
Switzerland     2.6      1.5      2.2      1.8      1.4      1.7
U.K.            3.8      2.3      1.9      2.1      2.2      2.2
Average         2.99%    2.52%    2.52%    2.36%    2.34%    2.19%

* No single-country econometric forecast was made (see text).

Mean Absolute Errors (MAE) of forecast annual % growth rate of real health expenditures for five models; naive (next year same as last), exp (exponential smooth α = 0.3), share (trended share of GDP on forecast GDP), econ (function of lagged moving average per capita real income, single-country time series), all (econometric model estimated with all 19 countries), intl (international average of 19 countries last year) and combined (an adjusted combination of exp, econ and all). Sources: OECD Outlook, HCFA (1989).

Table A2. Comparison of forecast models relative performance

                MAE      Relative % reduction
                naive    exp      econ-s   econ-a   intl     combined
1965-79
U.S.            2.1%     26       37       45       10       38
Australia       8.4%     31       44       45       44       40
Austria         1.7%     18       33       9        6        35
Belgium         4.0%     20       31       29       22       28
Canada          2.8%     21       15       10       16       23
Denmark         5.7%     27       32       25       25       33
Finland         4.6%     19       27       32       18       30
France          3.1%     21       28       31       29       35
Germany         4.3%     21       42       42       33       37
Greece          4.8%     25       *        38       34       29
Ireland         4.9%     24       *        31       43       32
Italy           2.8%     -11      -10      14       6        10
Japan           3.6%     6        -3       -1       -5       17
Netherlands     2.1%     -31      26       -19      11       1?
Norway          5.0%     24       *        25       29       26
Spain           5.1%     -2       23       -5       -14      9
Sweden          3.3%     21       35       23       22       31
Switzerland     3.3%     8        31       17       16       24
U.K.            3.1%     21       30       32       -30      27
Average         3.92%    15%      26%      22%      17%      27%

1980-87
U.S.            1.2%     31       34       -2       -92      -5
Australia       2.3%     11       6        30       11       34
Austria         2.4%     0        -10      16       15       17
Belgium         1.9%     -17      31       31       29       34
Canada          1.9%     21       48       38       -18      26
Denmark         2.0%     16       -39      -75      -45      0
Finland         2.2%     51       30       44       26       56
France          1.7%     -6       7        15       12       23
Germany         3.3%     33       24       32       43       34
Greece          5.7%     13       *        14       21       16
Ireland         5.7%     5        *        3        26       14
Italy           3.8%     32       9        30       45       34
Japan           2.7%     -12      -50      6        15       17
Netherlands     1.7%     -45      -49      1        -16      19
Norway          5.8%     21       *        36       34       36
Spain           4.4%     29       44       57       49       43
Sweden          2.0%     -10      -23      -18      1        3
Switzerland     2.6%     43       16       31       46       35
U.K.            3.8%     38       50       44       40       41
Average         2.99%    13%      8%       17%      13%      25%

* No single-country econometric forecast was made (see text).

Relative % reduction in MAE of annual % growth rate forecasts relative to naive (next year same as last) for exp (exponential smooth α = 0.3), econometric (function of lagged moving average per capita real income, single-country time series), all (econometric model estimated with all 19 countries), intl (international average of 19 countries) and combined (an adjusted combination of exp, econ, all and intl). Sources: OECD Outlook, HCFA (1989).