Journal of Health Economics 24 (2005) 839–854
Health care expenditure and GDP: Are they broken stationary?
Josep Lluís Carrion-i-Silvestre∗
“Anàlisi Quantitativa Regional” Research Group, Parc Científic de Barcelona, Universitat de Barcelona, 690 Avd. Diagonal, 08034 Barcelona, Spain
Received 1 February 2004; received in revised form 1 October 2004; accepted 1 January 2005
Available online 7 March 2005
Abstract

In this paper, we analyse the stationarity of the real per capita health care expenditure (HCE) and real per capita GDP for a sample of OECD countries, allowing for the presence of multiple structural breaks. One novelty of the paper is that it permits the presence of structural breaks that affect both the level and the slope of the time series. After the cross-section dependence is accounted for, we have found that these variables can be characterised as stationary processes evolving around a broken trend.

© 2005 Elsevier B.V. All rights reserved.

JEL classification: C12; C22; C23; I10

Keywords: Multiple structural breaks; Panel data stationarity test; Cross-section dependence
∗ Tel.: +34 93 402 18 26; fax: +34 93 402 18 21. E-mail address: [email protected] (J.L. Carrion-i-Silvestre).
0167-6296/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jhealeco.2005.01.001

1. Introduction

The relationship between real per capita health care expenditure (HCE) and real per capita income (GDP) has been extensively analysed since the publication of the seminal papers in Kleiman (1974) and Newhouse (1977). This literature has argued that there is not only a strong positive correlation between the HCE and GDP of the developed economies, but also that the GDP explains a high percentage of the variation of the HCE. The availability of a long-range, good database that makes it possible to establish international comparisons – i.e. the OECD health database – has encouraged the appearance of further analyses aimed at determining whether this relationship holds in the long-run. In this framework, the application of the developments that deal with non-stationary time series is of special interest.

These studies can be classified depending on the type of statistical tools and the approaches that they apply. A first group is made up of those analyses that work at an individual level – i.e. country-by-country. Among them, we can highlight papers by Hansen and King (1996), Blomqvist and Carter (1997), and Gerdtham and Löthgren (2000), where both the real per capita HCE and GDP are found to be non-stationary. In addition, in just a few cases these time series give rise to a cointegration relationship, which casts doubts on the empirical analyses that have argued the presence of a long-run relationship.

The second group of studies has applied panel data techniques to assess the stochastic properties of the real per capita HCE and GDP. The supposition here is that the combination of the individual evidence in a panel data test can imply a more powerful analysis. Blomqvist and Carter (1997), McCoskey and Selden (1998), Roberts (1998) and Gerdtham and Löthgren (2000) compute panel data based unit root and stationarity tests, finding evidence of non-stationarity.1 This reinforces previous results based on univariate time series techniques. However, the main difference between this group of studies and the one described above is that panel data cointegration tests point to the existence of a long-run relationship between HCE and GDP – see Gerdtham and Löthgren (2000).

In all this literature the unit root and stationarity hypothesis testing omits the presence of structural breaks. However, it is well known that this kind of misspecification error can lead to spurious non-stationarity.
Indeed, a number of studies have found evidence in favour of the stationarity of output once the breaking-trend specification is introduced in the analysis – see Ben-David and Papell (1995) and Ben-David et al. (1996) for the real GDP and GDP per capita and Perron (1997) for the real GNP or GDP in a sample of developed countries. Due to the strong correlation between HCE per capita and GDP per capita – see Newhouse (1977) – it should not be surprising that, if structural breaks can change the conclusions of the stochastic properties analysis of the real GDP per capita, the non-stationarity of the HCE per capita can also be affected by the presence of these structural breaks. In fact, Hansen and King (1996) mention that this possibility should be considered when establishing the order of integration. This is addressed in Jewell et al. (2003), who test the null hypothesis of non-stationary panel data allowing for the presence of structural breaks. They apply the LM test statistic in Im and Lee (2001) allowing for up to two level shifts. One of the appealing properties of this test is that its limiting distribution does not depend on the presence of level shifts. Besides, the empirical size of the test is not affected by misspecification errors on the dates of the breaks, although there might be some effects on the power of the test in finite samples. Jewell et al. (2003) conclude that both real per capita HCE and GDP are stationary around a broken trend – they specify a time trend which might be affected by level shifts. Notwithstanding, it should be borne in mind that their analysis is restricted, on the one hand, to the level shift specification, and on the other hand, to the presence of up to two structural breaks.

1 McCoskey and Selden (1998) characterised these variables as stationary when the deterministic function of the ADF test was given by a constant, although evidence that favoured non-stationarity was reported when specifying a time trend. As pointed out in Hansen and King (1998), since both the real per capita HCE and GDP exhibit a trending behaviour the conclusion that should prevail is the one of non-stationarity.

In this paper, we contribute to this literature on four different fronts. First, we apply the stationarity test in Carrion-i-Silvestre et al. (2003), which makes the deterministic specification more flexible by incorporating both level and slope shifts. This is of special interest, since the structural breaks do not have to be restricted just to cause level shifts. They can also change the slope of the time series – for instance, see Perron (1997) for the GNP and GDP in a sample of countries. Second, our analysis allows for multiple structural breaks and, hence, we do not restrict it to just two structural breaks. Third, this test specifies stationarity as the null hypothesis, so that it complements the previous evidence and, to some extent, acts as a confirmatory analysis. Finally, none of the proposals mentioned above has dealt with cross-section dependence in a satisfactory way. Although it is observed that cross-section dependence is quite important, the existing evidence relies only on the application of test statistics that assume constant cross-section dependence; common practice employs cross-section demeaning in order to remove this dependence. However, when doing so we are implicitly assuming that the cross-section dependence is common to all the individuals. This is important because all these panel data unit root and stationarity test statistics obtain their limiting distributions assuming cross-section independence. We compute the bootstrap distribution of the panel data unit root and stationarity tests in order to take into account any kind of cross-section dependence.

The rest of the paper is organised as follows.
In Section 2, we give a brief overview of the panel data unit root and stationarity tests that are applied in this paper, and highlight their main advantages and limitations. Section 3 presents the empirical results of the stochastic properties analysis of the HCE and GDP panel sets. At the first stage we focus on the standard tests and after that we motivate and present the results for the stationarity test that includes the effects of multiple structural breaks. Finally, Section 4 concludes with a summary of the main results.
2. Panel data unit root and stationarity tests

2.1. Tests without structural breaks

Some of the most popular panel data unit root tests are the ones in Im et al. (1997, 2003) – hereafter, IPS tests. These tests assume that data are generated in the time series according to the finite AR($p_i + 1$) process:

$$y_{i,t} = \beta_{mi}' d_{mt} + \sum_{k=1}^{p_i+1} \phi_{i,k}\, y_{i,t-k} + \varepsilon_{i,t}, \tag{1}$$

$t = 1, \ldots, T$, $i = 1, \ldots, N$, where $d_{mt}$ denotes the vector that collects the deterministic regressors – i.e. a constant term, $d_{mt} = 1$, when we allow for individual effects, and a constant and a time trend, $d_{mt} = (1, t)'$, when allowing for individual and time effects. The stationarity of $y_{i,t}$ implies that all the roots of the polynomial in the lag operator $\phi_{p_i+1}(L) = (1 - \phi_{i,1} L - \cdots - \phi_{i,p_i+1} L^{p_i+1})$ lie outside the unit circle, i.e. $|L| = 1$ is not a root of $\phi_{p_i+1}(L)$. Note
that (1) can be rewritten as

$$\Delta y_{i,t} = \alpha_{mi}' d_{mt} + \delta_i\, y_{i,t-1} + \sum_{k=1}^{p_i} \gamma_{i,k}\, \Delta y_{i,t-k} + \varepsilon_{i,t}, \tag{2}$$

where $\delta_i = -\phi_{p_i+1}(1)$ and $\gamma_{i,k} = -\sum_{h=k+1}^{p_i+1} \phi_{i,h}$, with $\phi_{p_i+1}(1) = (1 - \phi_{i,1} - \cdots - \phi_{i,p_i+1})$. If $L = 1$ is a root of $\phi_{p_i+1}(L)$ this will imply that $(1 - \phi_{i,1} - \cdots - \phi_{i,p_i+1}) = 0$, that is, $\delta_i = 0$. Therefore, the stochastic properties of the panel data set can be assessed by looking at the statistical significance of $\delta_i$ in (2). Thus, the null hypothesis of non-stationary panel data is given by $H_0: \delta_i = 0 \;\forall i$, while the alternative hypothesis is $H_1: \delta_i < 0$, $i = 1, \ldots, N_1$; $\delta_i = 0$, $i = N_1 + 1, \ldots, N$. Note that the null hypothesis is rejected if there is a subset ($N_1$) of stationary individuals. As a result, unit root hypothesis testing can be conducted allowing for a higher degree of heterogeneity, since under the alternative hypothesis a common autoregressive parameter is not required. In addition, it accounts for idiosyncratic dynamics since different lag lengths for the parametric correction can be specified for each individual.

These authors propose two test statistics. The first statistic is the standardised group-mean Lagrange multiplier (LM) bar test statistic – the LM test – and the second one is the standardised group-mean t-bar test statistic – the $\bar{t}$ test. For instance, the LM test is given by:

$$\mathrm{LM} = \frac{\sqrt{N}\left[\overline{\mathrm{LM}} - N^{-1}\sum_{i=1}^{N} E(\mathrm{LM}_i)\right]}{\sqrt{N^{-1}\sum_{i=1}^{N} \mathrm{Var}(\mathrm{LM}_i)}}, \tag{3}$$

with $\overline{\mathrm{LM}} = N^{-1}\sum_{i=1}^{N} \mathrm{LM}_i$, where $\mathrm{LM}_i$ denotes the individual LM test for testing $\delta_i = 0$ in (2), and $E(\mathrm{LM}_i)$ and $\mathrm{Var}(\mathrm{LM}_i)$ are obtained by means of Monte Carlo simulation. The $\bar{t}$ test has a similar expression replacing $\mathrm{LM}_i$ by $t_i$ in (3), where $t_i$ denotes the individual pseudo t-ratio for testing $\delta_i = 0$ in (2). Under the assumption that the individuals are cross-section independent, it can be shown that both tests converge to the standard Normal distribution once they have been properly standardised. This approach has a non-parametric counterpart.
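The standardisation in (3) can be sketched in a few lines. This is a minimal illustration only: the function name `standardised_group_mean` and its argument names are hypothetical, and the moments passed in stand for the Monte Carlo values of $E(\mathrm{LM}_i)$ and $\mathrm{Var}(\mathrm{LM}_i)$, which the paper does not tabulate.

```python
import math

def standardised_group_mean(stats, means, variances):
    """Standardise the cross-section average of individual statistics, as in (3).

    stats     : individual statistics (LM_i, or t_i for the t-bar version)
    means     : their expected values under the null (e.g. simulated)
    variances : their variances under the null (e.g. simulated)
    """
    n = len(stats)
    if n == 0 or not (n == len(means) == len(variances)):
        raise ValueError("inputs must be non-empty and of equal length")
    bar = sum(stats) / n            # group mean of the individual statistics
    mean_bar = sum(means) / n       # N^-1 * sum of E(LM_i)
    var_bar = sum(variances) / n    # N^-1 * sum of Var(LM_i)
    return math.sqrt(n) * (bar - mean_bar) / math.sqrt(var_bar)
```

Under cross-section independence the returned value would be compared with standard Normal critical values; the inputs in any call are placeholders to be replaced by the simulated moments for the relevant $T$ and lag order.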
Thus, we can test the unit root hypothesis computing the test as in Maddala and Wu (1999), where instead of combining the individual $t_i$ they suggest pooling the individual p-values. Under the null hypothesis and assuming cross-section independence, the test statistic is given by $\mathrm{MW} = -2\sum_{i=1}^{N} \ln(\pi_i) \sim \chi^2_{2N}$, where $\pi_i$ denotes the p-value of the pseudo t-ratio for testing $\delta_i = 0$ in (2).2

2 In order to facilitate computation of $\pi_i$ we have carried out 100,000 replications to obtain the empirical percentiles for the ADF test for a DGP given by a random walk without drift. Then a response surface has been estimated to approximate the corresponding p-values using the logistic functional form given by

$$\pi_i = \frac{\exp\{x_i'\beta\}}{1 + \exp\{x_i'\beta\}},$$

where $x_i'\beta = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \beta_4 x_i^4$, with $x_i$ being the value of the ADF test and $\pi_i$ the corresponding percentile.

One concern about ADF-based panel estimates is how to interpret failures to reject the null of non-stationarity. Hadri (2000) provides an alternative panel test in which the null hypothesis is stationarity allowing for heterogeneous and serially correlated errors. The test in Hadri (2000) assumes that the individual time series $y_{i,t}$ is generated according to the following unobserved component model:

$$y_{i,t} = \alpha_{mi}' d_{mt} + r_{i,t} + \varepsilon_{i,t} \tag{4}$$

$$r_{i,t} = r_{i,t-1} + u_{i,t}, \tag{5}$$

where $\varepsilon_{i,t}$ is assumed to be a stationary process and $u_{i,t}$ is i.i.d.$(0, \sigma^2_{u,i})$ $\forall i$, $i = 1, \ldots, N$, with $\varepsilon_{i,t}$ and $u_{i,t}$ being mutually independent. The stationarity of $y_{i,t}$ implies that the random walk component in (4) collapses into a constant or, equivalently, that $\sigma^2_{u,i} = 0$ so that $r_{i,t} = r_{i,0}$, a constant. The null hypothesis is that all the variables are stationary, so that for the $N$ elements of the panel the variance of the errors of the random walk component is such that $H_0: \sigma^2_{u,1} = \cdots = \sigma^2_{u,N} = 0$ against the alternative hypothesis that some $\sigma^2_{u,i} > 0$. In order to test the null hypothesis of stationarity Hadri (2000) proposes to use the panel version of the test in Kwiatkowski et al. (1992) applied in the univariate context – hereafter KPSS test. In its heterogeneous version the test statistic is given by:

$$\eta_k = N^{-1} \sum_{i=1}^{N} \left( \hat{\omega}_i^{-2}\, T^{-2} \sum_{t=1}^{T} S_{i,t}^2 \right), \tag{6}$$

$k = \{\mu, \tau\}$, where $S_{i,t} = \sum_{j=1}^{t} \hat{\varepsilon}_{i,j}$ denotes the partial sum process obtained from the estimated OLS residuals when regressing the individual time series on a constant – $\eta_\mu$ test – or on a time trend – $\eta_\tau$ test. We define $\hat{\omega}_i^2$ as a consistent estimate of the long-run variance of $\varepsilon_{i,t}$, $\omega_i^2 = \lim_{T \to \infty} T^{-1} S_{i,T}^2$, $i = 1, \ldots, N$. Note that the specification in (6) assumes heterogeneous long-run variances across individuals, although it is possible to impose homogeneity replacing $\hat{\omega}_i^2$ in (6) by $\hat{\omega}^2 = N^{-1} \sum_{i=1}^{N} \hat{\omega}_i^2$. The non-parametric method described by Newey and West (1994) and the parametric method in Shin and Snell (2000) can be applied to obtain consistent estimates of $\omega_i^2$. However, some caution has to be maintained when applying the non-parametric methods jointly with the use of optimal lag selection for the bandwidth. As Lee (1996) and Kurozumi (2002) have shown, the procedure of lag selection in Andrews and Monahan (1992) should not be applied to compute the long-run variance for the KPSS test as it leads to inconsistency in the test. In this paper, we follow the suggestion in Kurozumi (2002) and estimate the long-run variance non-parametrically with the bandwidth of the Bartlett kernel fixed according to:

$$\hat{l} = \min\left\{ 1.1447 \left[ \frac{4\hat{a}^2 T}{(1+\hat{a})^2 (1-\hat{a})^2} \right]^{1/3},\; 1.1447 \left[ \frac{4k^2 T}{(1+k)^2 (1-k)^2} \right]^{1/3} \right\},$$

where $\hat{a}$ is the estimate of the autoregressive parameter obtained with the Andrews (1991) method. The simulations that he carries out suggest using $k = 0.7$ or $k = 0.8$ as values that maintain a compromise between the empirical size and power of the test. As in McCoskey and Kao (1998) and Hadri (2000), it is not necessary to assume homogeneity of the long-run variance across individuals, so that the expression (6) can include separate estimates for the long-run variance of each individual. After suitable standardisation, the tests are shown to
converge to the standard Normal distribution. However, as for the panel data unit root tests presented above, this result is obtained assuming cross-section independence.

2.2. Test with structural breaks

The stationarity test in Hadri (2000) can be modified to allow for multiple structural breaks through the incorporation of dummy variables in the deterministic specification of the model. In this case, under the null hypothesis the DGP for the variable is assumed to be:

$$y_{i,t} = \alpha_i + \sum_{k=1}^{m_i} \theta_{i,k}\, DU_{i,k,t} + \beta_i t + \sum_{k=1}^{m_i} \gamma_{i,k}\, DT^{*}_{i,k,t} + \varepsilon_{i,t}, \tag{7}$$

with the dummy variable $DT^{*}_{i,k,t} = t - T^i_{b,k}$ for $t > T^i_{b,k}$ and 0 elsewhere, $k = \{1, \ldots, m_i\}$, $m_i \geq 1$. The model in (7) includes individual effects, individual structural break effects – that is, shifts in the mean caused by the structural breaks –, temporal effects – if $\beta_i \neq 0$ – and temporal structural break effects – if $\gamma_{i,k} \neq 0$, that is, when there are shifts in the individual time trend. This specification is the panel data counterpart of models with breaks proposed in the univariate framework. Thus, when $\beta_i = \gamma_{i,k} = 0$ the model in (7) is the counterpart of the one analysed by Perron and Vogelsang (1992), whereas when $\beta_i \neq \gamma_{i,k} \neq 0$ we revert to the specification given by Perron (1989)’s model C. Although other specifications might be adopted, e.g. the panel data counterparts of models A and B in Perron (1989), the limiting distribution of the test proposed below for those cases cannot be distinguished from the one with $\beta_i \neq \gamma_{i,k} \neq 0$. Thus, these models can be rewritten in a way that their representation becomes equivalent, therefore sharing the limit distribution – see Carrion-i-Silvestre et al. (2003).

The specification given by (7) is general enough to allow for the structural breaks to have different effects on each individual time series – the effects are measured by $\theta_{i,k}$ and $\gamma_{i,k}$ – and to be located at different dates, since we do not restrict the dates of the breaks to satisfy $T^i_{b,k} = T_{b,k}$, $\forall i = \{1, \ldots, N\}$. Finally, it permits the individuals to have a different number of structural breaks, $m_i \neq m_j$, $\forall i \neq j$, $\{i, j\} = \{1, \ldots, N\}$. The test of the null hypothesis of a stationary panel follows the proposal of Hadri (2000), with expression given by:

$$LM(\lambda) = N^{-1} \sum_{i=1}^{N} \left( \hat{\omega}_i^{-2}\, T^{-2} \sum_{t=1}^{T} S_{i,t}^2 \right), \tag{8}$$

where $S_{i,t} = \sum_{j=1}^{t} \hat{\varepsilon}_{i,j}$ denotes the partial sum process that is obtained using the estimated OLS residuals of (7), and where $\hat{\omega}_i^2$ is a consistent estimate of the long-run variance of $\varepsilon_{i,t}$. As before, the homogeneity of the long-run variance across individuals can be imposed. Finally, $\lambda$ is used in (8) to denote the dependence of the test on the dates of the break. For each individual $i$ it is defined as the vector $\lambda_i = (\lambda_{i,1}, \ldots, \lambda_{i,m_i})' = (T^i_{b,1}/T, \ldots, T^i_{b,m_i}/T)'$, which indicates the relative positions of the dates of the breaks on the entire time period, $T$.

The estimation of the number of structural breaks and their position is made through the procedure in Bai and Perron (1998) that computes the global minimisation of the sum of squared residuals (SSR). Here we made use of this procedure and chose as the estimate of the dates of the breaks the argument that minimises the sequence of individual
$SSR(T^i_{b,1}, \ldots, T^i_{b,m_i})$ computed from (7):

$$(\hat{T}^i_{b,1}, \ldots, \hat{T}^i_{b,m_i}) = \arg\min_{T^i_{b,1}, \ldots, T^i_{b,m_i}} SSR(T^i_{b,1}, \ldots, T^i_{b,m_i}).$$

Once the dates for all possible $m_i \leq m^{\max}$, $i = \{1, \ldots, N\}$, have been estimated, the point is to select the suitable number of structural breaks for each $i$, if there are any, that is, to obtain the optimal $m_i$. Bai and Perron (1998) address this concern using two different procedures. Briefly speaking, the first procedure relies on the use of information criteria – the Bayesian information criterion (BIC) and the modified Schwarz information criterion (LWZ) of Liu et al. (1997). The second procedure is based on the sequential computation – and detection – of structural breaks with the application of pseudo F-type test statistics, though the asymptotic distribution of these test statistics is only derived for the case of non-trending regressors. Bai and Perron (2001) compare both procedures, concluding that the latter performs better. Therefore, following their recommendations, when the model under the null hypothesis of panel stationarity does not include trending regressors, our suggestion is to estimate the number of structural breaks using the sequential procedure. For trending regressors the number of structural breaks should be estimated using the information criterion. Bai and Perron (2001) conclude that the LWZ criterion performs better than the BIC criterion.
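The SSR minimisation for a given number of breaks can be illustrated with a brute-force search for a single series. This is a sketch under stated assumptions, not the Bai and Perron algorithm itself, which relies on dynamic programming: all function names are hypothetical, the level dummy is assumed to be $DU_{k,t} = 1$ for $t > T_{b,k}$ and 0 elsewhere, and the minimum regime length `trim` is an illustrative choice. The number of breaks `m` is taken as given, so the selection step based on information criteria is not covered.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def design(T, breaks):
    """Rows (1, t, DU_1, DT*_1, ..., DU_m, DT*_m) of model (7) for t = 1..T."""
    rows = []
    for t in range(1, T + 1):
        row = [1.0, float(t)]
        for tb in breaks:
            # Assumed convention: both dummies switch on strictly after the break date.
            row += [1.0 if t > tb else 0.0, float(t - tb) if t > tb else 0.0]
        rows.append(row)
    return rows

def ssr(y, breaks):
    """Sum of squared OLS residuals of (7) for the given break dates."""
    x = design(len(y), breaks)
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * c for b, c in zip(beta, r))) ** 2 for r, v in zip(x, y))

def estimate_breaks(y, m, trim=3):
    """Break dates minimising the SSR over all admissible m-tuples."""
    T = len(y)
    best = None
    for cand in combinations(range(trim, T - trim + 1), m):
        if any(b - a < trim for a, b in zip(cand, cand[1:])):
            continue  # regimes shorter than `trim` observations are excluded
        s = ssr(y, cand)
        if best is None or s < best[0]:
            best = (s, cand)
    return best[1], best[0]
```

For instance, `estimate_breaks(y, 2)` would return the pair of dates and the minimised SSR for a series `y` held as a list of floats. The exhaustive search is only practical for short series and few breaks, which is the situation here ($T = 38$).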
3. Empirical results

The database that has been used in the paper is the one in Gerdtham and Löthgren (2000) and Jewell et al. (2003), and comes from the OECD (1998). Specifically, it includes data for 20 OECD developed countries on real HCE per capita and real GDP per capita (1990 constant prices) covering the time period from 1960 to 1997.3 All the variables have been taken in natural logarithms. Given their trending nature – see Fig. 1 for the HCE of a sample of eight countries – we have included a time trend in the deterministic function of the regression equations on which the different tests are based, i.e. $d_{mt} = (1, t)'$ in (2) and (4) for the IPS, MW and $\eta_\tau$ tests, respectively, and $\beta_i \neq \gamma_{i,k} \neq 0$ in (7) for the $LM(\lambda)$ test. Tables 1 and 2 report the country-by-country and panel data test statistics, respectively, for the unit root and stationarity tests that do not allow for the presence of structural breaks. Following Gerdtham and Löthgren (2000), we have specified $p_{\max} = 8$ as the maximum lag order for the autoregressive correction, and the final number of lags is fixed according to the t-sig criterion in Ng and Perron (1995). The estimation of the long-run variance that is required for the KPSS test follows the suggestions in Kurozumi (2002) as described above with $k = 0.7$. For the individual tests, the finite sample critical values are drawn from the response surfaces in MacKinnon (1991) and Sephton (1995) for the ADF and KPSS tests, respectively.

3 Although the OECD has published updated versions of the Health Data File, it does not provide homogeneous long time series to carry out the analysis. The reason is that for some countries and some years the information about health data has not been adapted to the accounting system reform that took place in 2000. Besides, the use of this sample period will allow us to compare previous results in the literature with ours. We kindly thank M. Tieslau for providing us with the data set.
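The individual trend-stationarity statistic with the bandwidth rule of Section 2.1 can be sketched as follows. Several details are assumptions rather than choices documented in the paper: $\hat{a}$ is taken to be the OLS AR(1) coefficient of the detrended residuals, the bandwidth is truncated to an integer, and all function names are hypothetical.

```python
import math

def detrend(y):
    """OLS residuals from regressing y on a constant and a linear trend."""
    T = len(y)
    t_bar = (T + 1) / 2.0
    y_bar = sum(y) / T
    sxx = sum((t - t_bar) ** 2 for t in range(1, T + 1))
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(range(1, T + 1), y))
    slope = sxy / sxx
    return [v - y_bar - slope * (t - t_bar) for t, v in zip(range(1, T + 1), y)]

def kurozumi_bandwidth(a_hat, T, k=0.7):
    """Smaller of the data-driven term (a_hat) and the capped term (k)."""
    def term(a):
        denom = (1.0 + a) ** 2 * (1.0 - a) ** 2
        if denom == 0.0:
            return math.inf  # |a| = 1: the capped term decides
        return 1.1447 * (4.0 * a * a * T / denom) ** (1.0 / 3.0)
    return int(min(term(a_hat), term(k)))  # integer truncation is an assumption

def bartlett_lrv(e, lags):
    """Long-run variance estimate with Bartlett weights 1 - s / (lags + 1)."""
    T = len(e)
    lrv = sum(v * v for v in e) / T
    for s in range(1, lags + 1):
        gamma = sum(e[t] * e[t - s] for t in range(s, T)) / T
        lrv += 2.0 * (1.0 - s / (lags + 1.0)) * gamma
    return lrv

def kpss_stat(e, lrv):
    """T^-2 times the sum of squared partial sums, scaled by the long-run variance."""
    T = len(e)
    s, acc = 0.0, 0.0
    for v in e:
        s += v
        acc += s * s
    return acc / (T * T * lrv)

def kpss_tau(y, k=0.7):
    """Trend-stationarity statistic for one series (one term of the average in (6))."""
    e = detrend(y)
    a_hat = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e[:-1])
    return kpss_stat(e, bartlett_lrv(e, kurozumi_bandwidth(a_hat, len(e), k)))
```

With $T = 38$ and $k = 0.7$ the capped term equals about 7.5, so under these assumptions the bandwidth never exceeds 7 lags in this sample. Averaging `kpss_tau` over countries gives the unstandardised panel statistic in (6).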
Fig. 1. Logarithm of real per capita HCE.
Table 1
Individual ADF and KPSS tests with no breaks

| Country | HCE ADF | HCE p | HCE KPSS | GDP ADF | GDP p | GDP KPSS |
|---|---|---|---|---|---|---|
| Australia | −0.515 | 0 | 0.146b | −1.635 | 0 | 0.150a |
| Austria | −1.150 | 5 | 0.145b | −1.472 | 3 | 0.160a |
| Belgium | 0.549 | 0 | 0.156a | −1.758 | 5 | 0.159a |
| Canada | 0.337 | 1 | 0.163a | 0.503 | 6 | 0.167a |
| Denmark | −2.427 | 0 | 0.165a | −3.016 | 0 | 0.149a |
| Finland | −1.062 | 0 | 0.166a | −0.136 | 6 | 0.157a |
| France | −3.368a | 8 | 0.170a | −1.772 | 0 | 0.163a |
| Germany | −1.938 | 3 | 0.143b | −2.917 | 8 | 0.061 |
| Greece | −2.372 | 0 | 0.131b | −2.038 | 0 | 0.161a |
| Iceland | −0.739 | 0 | 0.163a | −1.286 | 7 | 0.154a |
| Ireland | −3.172b | 5 | 0.143b | −1.791 | 1 | 0.099 |
| Italy | −0.179 | 0 | 0.167a | −1.629 | 5 | 0.169a |
| Japan | −3.254a | 1 | 0.165a | −2.617 | 5 | 0.123b |
| Netherlands | −1.738 | 2 | 0.153a | −2.848 | 5 | 0.138b |
| Norway | 0.066 | 0 | 0.162a | −0.562 | 3 | 0.150a |
| Spain | −1.789 | 5 | 0.155a | −2.305 | 1 | 0.143b |
| Sweden | −1.521 | 0 | 0.168a | −2.741 | 7 | 0.163a |
| Switzerland | −1.340 | 0 | 0.158a | −1.690 | 1 | 0.149a |
| United Kingdom | −1.626 | 0 | 0.140b | −3.405 | 5 | 0.096 |
| United States | −1.542 | 1 | 0.137b | −3.532a | 1 | 0.146b |

The order of the autoregressive correction (p) for the ADF test has been chosen according to the Ng and Perron (1995) t-sig criterion, with an initial maximum lag order of pmax = 8. The finite sample critical values for the ADF test are −3.531 (5% level) and −3.197 (10% level) and are taken from MacKinnon (1991). The finite sample critical values for the KPSS test are 0.122 (10%) and 0.149 (5%) and are given in Sephton (1995). The superscripts a and b denote rejection of the null hypothesis at the 5 and 10% levels, respectively.
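The individual ADF tests in Table 1 are the inputs to the Maddala and Wu (1999) statistic of Section 2.1, which pools their p-values. A minimal sketch of that combination follows; the function names are hypothetical, and the closed-form upper tail used here is valid only because the $\chi^2$ distribution has an even number of degrees of freedom ($2N$).

```python
import math

def chi2_upper_tail_even(x, n):
    """P(chi-square with 2n degrees of freedom > x), using the closed form
    exp(-x/2) * sum_{j=0}^{n-1} (x/2)^j / j!, valid for even degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= half / j
        total += term
    return math.exp(-half) * total

def maddala_wu(p_values):
    """MW = -2 * sum(ln p_i) and its p-value under cross-section independence."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    mw = -2.0 * sum(math.log(p) for p in p_values)
    return mw, chi2_upper_tail_even(mw, len(p_values))
```

As a check on the tail function, `chi2_upper_tail_even(20.173, 20)` and `chi2_upper_tail_even(33.760, 20)` give roughly 0.996 and 0.746, in line with the p-values reported for the MW test in Panel A of Table 2.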
At first sight, the individual test statistics offer mixed results. In most cases they point to non-stationarity, which agrees with the previous evidence – see Gerdtham and Löthgren (2000). The main exception is for the US real GDP per capita, for which the ADF test rejects the null hypothesis and the KPSS test does not at the 5% level. This leads to a conclusion in favour of stationarity for this variable. However, there are three countries, France, Ireland and Japan, for which mildly contradictory evidence is found when analysing the HCE: for these countries both test statistics reject their respective null hypotheses. One potential source of these contradictions can be the lack of power shown by these tests when they are applied in finite samples. Thus we could argue that the number of time periods covered by our data set is not sufficient to warrant good finite sample properties for the tests. In this situation the panel data tests are of great help, since they allow an increase in the power of the order of integration analysis by combining the cross-section and temporal dimensions. Looking at Table 2, we conclude that both the real HCE and GDP per capita are non-stationary panels. The IPS and MW tests do not reject the null hypothesis of unit root, while the Hadri test rejects the null hypothesis of stationarity at the 5% level. This result is reached irrespective of whether cross-section independence is assumed. Thus Panel B of Table 2 displays the percentiles of the bootstrap distribution as described in Maddala and Wu (1999). We have performed 20,000 replications for the parametric bootstrap.

Table 2
Panel data unit root and stationarity tests without structural breaks

Panel A: assuming cross-section independence

| Test | HCE statistic | HCE p-value | GDP statistic | GDP p-value |
|---|---|---|---|---|
| $\bar{t}$ | 3.736 | 1.000 | 0.805 | 0.790 |
| LM | −2.424 | 0.992 | −0.422 | 0.663 |
| MW | 20.173 | 0.996 | 33.760 | 0.746 |
| Hadri (hom) | 9.611 | 0.000 | 9.244 | 0.000 |
| Hadri (het) | 9.446 | 0.000 | 8.158 | 0.000 |

Panel B: bootstrap distribution (%)

| Variable | Test | 1 | 2.5 | 5 | 10 | 90 | 95 | 97.5 | 99 |
|---|---|---|---|---|---|---|---|---|---|
| HCE | $\bar{t}$ | −3.180 | −2.539 | −2.025 | −1.488 | 2.493 | 3.184 | 3.804 | 4.644 |
| HCE | LM | −3.415 | −2.878 | −2.463 | −1.988 | 1.502 | 2.043 | 2.547 | 3.143 |
| HCE | MW | 20.337 | 24.087 | 27.216 | 31.120 | 69.223 | 77.014 | 84.535 | 95.155 |
| HCE | Hadri (hom) | −3.337 | −3.030 | −2.709 | −2.295 | 5.071 | 6.999 | 8.872 | 11.042 |
| HCE | Hadri (het) | −3.124 | −2.827 | −2.505 | −2.107 | 4.721 | 6.540 | 8.349 | 10.470 |
| GDP | $\bar{t}$ | −4.634 | −3.974 | −3.453 | −2.852 | 1.656 | 2.479 | 3.275 | 4.341 |
| GDP | LM | −2.995 | −2.380 | −1.865 | −1.254 | 2.799 | 3.407 | 3.924 | 4.573 |
| GDP | MW | 19.709 | 24.594 | 29.264 | 34.689 | 81.701 | 90.814 | 98.976 | 109.216 |
| GDP | Hadri (hom) | −3.151 | −2.827 | −2.510 | −2.078 | 4.659 | 6.525 | 8.434 | 10.808 |
| GDP | Hadri (het) | −2.742 | −2.441 | −2.151 | −1.740 | 4.126 | 5.748 | 7.348 | 9.421 |

Hadri (hom) and Hadri (het) denote the Hadri KPSS test assuming homogeneity and heterogeneity, respectively, in the estimation of the long-run variance.

However, the lack of power is not the only potential source of discrepancy between the individual unit root and stationarity tests. As pointed out in Cheung and Chinn (1994), a misspecification error of the deterministic component of the ADF and KPSS tests can lead to inconclusive results. This misspecification error can be considered in terms of failing to take into account the presence of structural breaks. This is supported by the evidence found in Jewell et al. (2003), where the Im and Lee (2001) unit root test is applied allowing for up to two level shifts. These authors conclude that the unit root hypothesis can be strongly rejected once the level shifts are accounted for. Some comments on this result are in order. First, although the unit root hypothesis is rejected, they constrain the analysis to up to two structural breaks and mention that their analysis can be extended to more breaks, though this is very cumbersome. Second, we feel that allowing only for level shifts can give a simplified picture of HCE and GDP behaviour. Thus, we have to bear in mind that, beyond the rejection of the unit root hypothesis, we need to interpret the dates of the breaks, since further analysis can take the estimated breakpoints as given. In this regard, we think that allowing for both level and slope shifts can provide a more realistic picture of the effects that these trending variables might be undergoing. Third, we think that a weakened structure of cross-section dependence should be included when performing the hypothesis testing, so
bootstrap distribution is the best choice. Finally, the application of a panel data stationarity test with structural breaks can complement the evidence drawn from the panel data unit root test in Jewell et al. (2003). All these considerations have led us to apply the test in Carrion-i-Silvestre et al. (2003). One of the desirable properties of this test is that it considers structural breaks that can affect both the level and the slope of the time series, a specification that is consistent with the pictures of the time series – see Fig. 1. The empirical analysis has specified a maximum of $m^{\max} = 5$ structural breaks, which seems to be reasonable given the number of time observations ($T = 38$). Following the suggestion in Bai and Perron (2001), the number of structural breaks associated with each individual is estimated using the LWZ criterion. Table 3 displays the results on HCE. We find that the stationarity null hypothesis is rejected for Japan (the test is on the boundary), Netherlands, Spain and Switzerland at the 5% level, but not for the other countries – see Panel A in Table 3. The finite sample critical values reported in the table are computed by means of Monte Carlo simulations using 20,000 replications. One noticeable characteristic is that most of the time series are affected by multiple breaks. Norway and the United Kingdom exhibit one break, for eight countries we have found two breaks, and at least three breaks are detected for the rest of the countries. At first sight, we note that this is in sharp contrast with the evidence reported in Jewell et al. (2003). But this is not surprising since, first, they only accommodate up to two structural breaks, and second, they analyse a different set of time series. The latter might seem to contradict our previous comment regarding the source of the data, but it does not. Let us give more details on this point.
As we have mentioned above, our database is the same as the one used by these authors, but the point is that they work with the cross-sectionally demeaned data instead of using the raw data. In doing so, they account for the presence of a sort of cross-sectional dependence – the one that assumes it to be common to all the individuals – but the collateral effect is that they get rid of the common structural breaks. The fact that we are working with raw data might be the reason for our finding more structural breaks than they did. When the individual information is introduced into the panel data test and the individuals are assumed to be cross-section independent, the stationarity hypothesis is strongly rejected – see Panel B in Table 3. However, independence is not a realistic assumption given the fact that health expenditures in each country are contemporaneously correlated. In order to control for any cross-sectional dependencies in the data, we approximate the bootstrap distribution of the tests. Now the evidence is mixed. The null hypothesis is still rejected when the long-run variance is supposed to be homogeneous. However, this assumption can be problematic here, because of the substantial heteroskedasticity that exists across countries – see McCoskey and Selden (1998). When this fact is taken into account in the estimation of the long-run variance, the null hypothesis cannot be rejected at the 5% level using the bootstrap critical values for the $LM(\lambda)$ test. Taken together, our results suggest that the panel data set of HCE is stationary after the structural breaks are introduced into the model. Looking at the estimated break points we realise that most of these dates are associated with reforms aimed at extending the coverage and benefits of health care.4 The extension of

4 For instance, in the case of Australia the government introduced subsidies to encourage the inclusion of nursing accommodation in homes for the aged in 1966. In 1975 the National Health Insurance scheme was introduced.
Table 3
Panel data stationarity test with structural breaks for the HCE

Panel A: country-by-country tests
                                       Finite sample critical values (%)
Country          KPSS     m    Tb,1    90       95       97.5     99
Australia        0.024    3    1966    0.035    0.039    0.043    0.048
Austria          0.041    2    1974    0.079    0.097    0.117    0.142
Belgium          0.026    3    1971    0.057    0.069    0.081    0.096
Canada           0.037    2    1972    0.070    0.084    0.098    0.116
Denmark          0.030    2    1968    0.050    0.058    0.064    0.073
Finland          0.029    3    1969    0.048    0.056    0.064    0.075
France           0.046    2    1966    0.051    0.058    0.066    0.076
Germany          0.032    3    1969    0.048    0.056    0.065    0.076
Greece           0.046    2    1971    0.065    0.076    0.087    0.102
Iceland          0.050    2    1968    0.058    0.067    0.075    0.086
Ireland          0.032    3    1970    0.048    0.058    0.068    0.082
Italy            0.022    4    1972    0.061    0.077    0.094    0.113
Japan            0.055a   2    1967    0.048    0.054    0.061    0.068
Netherlands      0.038a   5    1964    0.022    0.024    0.026    0.028
Norway           0.066    1    1977    0.104    0.131    0.156    0.190
Spain            0.044a   5    1964    0.022    0.024    0.026    0.028
Sweden           0.037    2    1970    0.056    0.066    0.075    0.086
Switzerland      0.043a   3    1965    0.033    0.037    0.040    0.045
United Kingdom   0.076    1    1973    0.087    0.103    0.116    0.135
United States    0.043    3    1969    0.054    0.063    0.071    0.082

Break dates Tb,2 to Tb,5 (listed in source order; their alignment with the country rows is not preserved): 1974 1980 1976 1991 1979 1977 1977 1975 1990 1989 1979 1979 1979 1969; 1984; 1990; 1992 1990; 1989 1984; 1991; 1975; 1983; 1990; 1970 1980 1975; 1977; 1985; 1990; 1987; 1992; 1984

Panel B: panel data stationarity test: assuming cross-section independence
Test           Statistic    p-Value
LM(λ) (hom)    12.039       0.000
LM(λ) (het)    8.520        0.000

Panel C: bootstrap distribution (%)
Percentile     1        2.5      5        10       90       95       97.5     99
LM(λ) (hom)    3.682    3.987    4.263    4.590    7.165    7.577    7.920    8.318
LM(λ) (het)    4.218    4.560    4.882    5.244    8.189    8.657    9.116    9.657

The finite sample critical values are computed by means of Monte Carlo simulations using 20,000 replications. LM(λ) (hom) and LM(λ) (het) denote the Carrion-i-Silvestre et al. (2003) KPSS test assuming homogeneity and heterogeneity, respectively, in the estimation of the long-run variance. The superscript a denotes rejection of the null hypothesis at the 5% level of significance.
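The rejections marked with the superscript a in Panel A follow from comparing each KPSS statistic with its own finite sample 95% critical value. A minimal Python sketch of that comparison, using the figures of Table 3 (the variable names are illustrative only):

```python
# KPSS statistics and 95% finite sample critical values for the HCE (Table 3, Panel A).
countries = [
    "Australia", "Austria", "Belgium", "Canada", "Denmark", "Finland", "France",
    "Germany", "Greece", "Iceland", "Ireland", "Italy", "Japan", "Netherlands",
    "Norway", "Spain", "Sweden", "Switzerland", "United Kingdom", "United States",
]
kpss = [
    0.024, 0.041, 0.026, 0.037, 0.030, 0.029, 0.046, 0.032, 0.046, 0.050,
    0.032, 0.022, 0.055, 0.038, 0.066, 0.044, 0.037, 0.043, 0.076, 0.043,
]
cv95 = [
    0.039, 0.097, 0.069, 0.084, 0.058, 0.056, 0.058, 0.056, 0.076, 0.067,
    0.058, 0.077, 0.054, 0.024, 0.131, 0.024, 0.066, 0.037, 0.103, 0.063,
]

# The null of stationarity is rejected when the statistic exceeds the critical value.
rejected = [c for c, stat, cv in zip(countries, kpss, cv95) if stat > cv]
print(rejected)
```

The comparison also shows why Japan is described as being on the boundary: its statistic (0.055) exceeds the 95% critical value (0.054) only marginally.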
Finally, in 1983 Medicare was introduced. The new system provoked a backlash from medical practitioners, and in 1984 there were protests from the doctors. The government decided to increase the remuneration for doctors treating Medicare patients in public hospitals. In the case of the UK, the National Health Service underwent major changes in 1973. For the US, regulations were issued by the Secretary of Health, Education and Welfare in 1969, followed by the Omnibus Reconciliation Act of 1987 and Clinton’s health system reform in 1993. Further details on reforms of health systems can be found on the webpage of the European section of the World Health Organization (http://www.euro.who.int/observatory). A brief overview of these reforms is available from the author upon request.
Table 4
Panel data stationarity test with structural breaks for the GDP

Panel A: country-by-country tests
                                       Finite sample critical values (%)
Country          KPSS     m    Tb,1    90       95       97.5     99
Australia        0.028    3    1974    0.074    0.093    0.113    0.138
Austria          0.028    4    1968    0.037    0.044    0.051    0.06
Belgium          0.042    1    1972    0.084    0.098    0.112    0.131
Canada           0.067    2    1981    0.138    0.179    0.217    0.276
Denmark          0.050    1    1973    0.087    0.102    0.118    0.137
Finland          0.043    2    1975    0.084    0.106    0.126    0.153
France           0.028    3    1974    0.075    0.095    0.115    0.143
Germany          0.034    3    1966    0.041    0.048    0.054    0.061
Greece           0.031    2    1973    0.072    0.088    0.104    0.125
Iceland          0.041    2    1967    0.047    0.053    0.058    0.065
Ireland          0.038    2    1977    0.097    0.124    0.15     0.181
Italy            0.023    3    1968    0.041    0.048    0.055    0.064
Japan            0.023    4    1966    0.029    0.033    0.037    0.042
Netherlands      0.031    2    1973    0.071    0.087    0.102    0.124
Norway           0.055    1    1987    0.205    0.266    0.329    0.412
Spain            0.023    4    1964    0.027    0.029    0.032    0.035
Sweden           0.042    2    1969    0.063    0.074    0.084    0.096
Switzerland      0.029    2    1974    0.076    0.094    0.113    0.14
United Kingdom   0.058    2    1980    0.127    0.165    0.205    0.254
United States    0.049    1    1965    0.094    0.113    0.134    0.16

Break dates Tb,2 to Tb,4 (listed in source order; their alignment with the country rows is not preserved): 1981 1974; 1989 1980; 1988; 1989 1990 1979 1973 1980 1980 1986 1981 1974 1981 1973 1991 1988 1990; 1987 1990; 1991 1981; 1988; 1984; 1991

Panel B: panel data stationarity test: assuming cross-section independence
Test           Statistic    p-Value
LM(λ) (hom)    5.706        0.000
LM(λ) (het)    4.315        0.000

Panel C: bootstrap distribution (%)
Percentile     1         2.5       5         10        90        95        97.5      99
LM(λ) (hom)    15.505    15.803    16.110    16.438    19.086    19.513    19.907    20.349
LM(λ) (het)    12.218    12.468    12.701    12.974    15.409    15.840    16.241    16.781

The finite sample critical values are computed by means of Monte Carlo simulations using 20,000 replications. LM(λ) (hom) and LM(λ) (het) denote the Carrion-i-Silvestre et al. (2003) KPSS test assuming homogeneity and heterogeneity, respectively, in the estimation of the long-run variance.
the coverage and the increase in the cost of medical care due to technological advances in medicine lead to an increase in HCE. However, in some cases the economic crises of the 1970s led governments to reconsider the extension of health coverage – for instance, this is the case of Belgium, Italy and Japan – which might have reduced the growth of the HCE. This is in line with the idea that governments play a major role in the financing of HCE in most of the OECD countries, and is therefore a consequence of the strong correlation between HCE and GDP – see Jewell et al. (2003).
With regard to the GDP, the individual KPSS test points to the stationarity of all the countries, since the null hypothesis cannot be rejected at the 5% level for any of them – see Panel A in Table 4. As before, the combination of the individual information, both for
homogeneous and heterogeneous long-run variance, points to the non-stationarity of the panel data set. However, this conclusion is reversed when cross-section dependence is taken into account. Thus, the null of stationarity cannot be rejected by either the homogeneous or the heterogeneous long-run variance version of the test if the bootstrap critical values are used – see Panel C in Table 4. Now four countries present just one structural break, nine countries exhibit two structural breaks, and at least three breaks are detected for the rest of the countries. As before, most of the estimated break points are around the time of the oil crises.
In all, we have found evidence that favours the stationarity of both the HCE and GDP. This result agrees with the one in Jewell et al. (2003) and supports the idea that these time series have been affected by multiple structural breaks. It should be stressed that this finding is robust to the presence of cross-section dependence, since it is based on the use of bootstrap critical values.
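The reversal of the panel conclusions can be traced directly in Panels B and C of Tables 3 and 4. The Python sketch below compares each panel statistic with the 95th percentile of its bootstrap distribution, and also recomputes an upper-tail p-value under cross-section independence. The standard normal reference distribution and the upper-tail rejection rule are assumptions made here for illustration, chosen because they are consistent with the reported p-values and with the conclusions in the text.

```python
import math

# (panel statistic, bootstrap 95th percentile) from Panels B and C of Tables 3 and 4.
panel_tests = {
    ("HCE", "hom"): (12.039, 7.577),
    ("HCE", "het"): (8.520, 8.657),
    ("GDP", "hom"): (5.706, 19.513),
    ("GDP", "het"): (4.315, 15.840),
}

def upper_tail_normal_pvalue(z):
    """P(Z > z) for a standard normal Z (assumed reference under independence)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Decision under independence (5% level) and with bootstrap critical values.
reject_independence = {
    key: upper_tail_normal_pvalue(stat) < 0.05 for key, (stat, _) in panel_tests.items()
}
reject_bootstrap = {key: stat > cv95 for key, (stat, cv95) in panel_tests.items()}

for key in panel_tests:
    print(key, reject_independence[key], reject_bootstrap[key])
```

Under the independence assumption all four statistics reject stationarity, whereas with the bootstrap critical values only the homogeneous version for the HCE does.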
4. Conclusions
In this paper, we have shown that the panel data sets of real per capita HCE and GDP are stationary around a broken trend that exhibits multiple structural breaks, which reinforces recent results obtained from the use of panel data unit root tests. Moreover, we have emphasised the fact that these structural breaks are allowed to affect both the level and the slope of the time series. This is especially convenient, since it offers a better fit of the pattern that is behind these time series. It should be stressed that our analysis is robust to cross-sectional dependence, since we have based the inference on the bootstrap distribution. Up to now this point has not been addressed in the literature in a satisfactory way.
Our paper sheds light on some important questions that were raised in the past. In this regard, Hansen and King (1996) noted that some sort of misspecification in the model relating HCE and GDP could affect their conclusions about the existence of an equilibrium relationship – they were especially concerned about non-linearity. It is well known that a misspecification error in the deterministic component of the regressions that are used to test either the unit root or the stationarity hypothesis can bias the conclusions of the analyses towards non-stationarity. It has been shown that non-linearities have to be addressed when modelling the relationship between HCE and GDP, since the behaviour of these variables has been affected by structural changes due to either reforms in the health systems – supply side interventions aimed at promoting and extending health coverage and benefits – or modifications of the economic relationships in a general way – for instance, interventions that tried to control health expenditure due to recessions caused by the oil shocks. Failing to allow for these structural breaks can lead analysts to the erroneous conclusion that the panels are non-stationary when in fact they are stationary.
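One way to see why bootstrap inference can accommodate cross-section dependence is that whole time periods can be resampled, so that each bootstrap draw keeps the contemporaneous correlation across countries intact. The following Python sketch shows only this resampling step on simulated residuals. The function name, the i.i.d. resampling of rows and the simulated data are assumptions made for illustration; the sketch does not reproduce the bootstrap procedure used in the paper.

```python
import numpy as np

def resample_time_periods(residuals, rng):
    """Draw T time periods with replacement, keeping each cross-section row intact."""
    T = residuals.shape[0]
    idx = rng.integers(0, T, size=T)  # resampled time indices
    return residuals[idx, :], idx

# Hypothetical residuals for T = 38 years and N = 20 countries that share
# a common shock, so that countries are contemporaneously correlated.
rng = np.random.default_rng(42)
T, N = 38, 20
common = rng.normal(size=(T, 1))
residuals = 0.8 * common + 0.6 * rng.normal(size=(T, N))

boot, idx = resample_time_periods(residuals, rng)

# Average pairwise correlation across countries in the bootstrap sample.
corr = np.corrcoef(boot, rowvar=False)
mean_offdiag = (corr.sum() - N) / (N * (N - 1))
print(round(float(mean_offdiag), 2))
```

Because every resampled row is an observed cross-section, the dependence between countries carries over to the bootstrap sample, and the percentiles of a statistic recomputed over many such draws reflect that dependence.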
More interestingly, when these panels were characterised as non-stationary, the investigation continued using panel cointegration tests. However, these techniques are not robust to neglected structural breaks, which can lead to the conclusion that the residuals are non-stationary, i.e. point to a lack of cointegration. The fact that these two panel data sets are stationary once the structural breaks are taken into account implies that meaningful advice for policy makers can be drawn from panel regressions that relate HCE and GDP without the application of cointegration techniques.
The goal is to take into account the presence of these unusual events, i.e. structural breaks, when performing the panel regression estimation. Furthermore, our results indicate that the parameters of the relationship between these variables might have changed during the time period analysed. This feature is relevant for policy advice that is based on the use of regression models, because the inference and prediction that emerge from these models can be more accurate if these structural breaks are allowed for. For instance, the debate about health care as a luxury good should be addressed taking these structural breaks into account; otherwise the estimates of the parameters, i.e. the income elasticity, might be biased. Finally, since some of the structural breaks that have been estimated for the HCE and GDP are located at similar dates, they might cancel out when regressing HCE against GDP. This phenomenon, known as cobreaking, might be found, for instance, for the structural breaks that correspond to the oil crises. However, the fact that some of the structural breaks detected for the HCE are due to policies coming from the supply side of the market for health care indicates that some instability should be present when modelling the relationship. Future research should investigate this concern.

Acknowledgements
Financial support is acknowledged from the Ministerio de Ciencia y Tecnología under grant SEC2001-3672.

References
Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.
Andrews, D.W.K., Monahan, J.C., 1992. An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, 953–966.
Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple structural changes. Econometrica 66 (1), 47–78.
Bai, J., Perron, P., 2001. Multiple Structural Change Models: A Simulation Analysis. Technical Report.
Ben-David, D., Papell, D.H., 1995. The great wars, the great crash, and steady growth: some new evidence about an old stylized fact. Journal of Monetary Economics 36, 453–475.
Ben-David, D., Lumsdaine, R.L., Papell, D.H., 1996. Unit Roots, Postwar Slowdowns and Long-Run Growth: Evidence from Two Structural Breaks. Technical Report, The Foerder Institute, pp. 33–96.
Blomqvist, A.G., Carter, R.A.L., 1997. Is health care really a luxury? Journal of Health Economics 16, 207–229.
Carrion-i-Silvestre, J.L., Del Barrio, T., López-Bazo, E., 2003. Breaking the Panels. An Application to the GDP Per Capita. University of Barcelona, Download from: http://www.ub.es/div2/recerca/documents/papers/e97.pdf.
Cheung, Y.-W., Chinn, M.D., 1994. Further investigation of the Uncertain Unit Root in GNP. Technical Report 288, University of California, Santa Cruz.
Gerdtham, U.G., Löthgren, M., 2000. On stationarity and cointegration of international health expenditure and GDP. Journal of Health Economics 19, 461–475.
Hadri, K., 2000. Testing for stationarity in heterogeneous panel data. Econometrics Journal 3, 148–161.
Hansen, P., King, A., 1996. The determinants of health care expenditure: a cointegration approach. Journal of Health Economics 15, 127–137.
Hansen, P., King, A., 1998. Health care expenditure and GDP: panel data unit root test results – comment. Journal of Health Economics 17, 377–381.
Im, K.S., Lee, J., 2001. Panel LM Unit Root Test with Level Shifts. Technical Report, Department of Economics, University of Central Florida.
Im, K.S., Pesaran, M.H., Shin, Y., 1997. Testing for Unit Roots in Heterogeneous Panels. Technical Report, Department of Applied Economics, University of Cambridge.
Im, K.S., Pesaran, M.H., Shin, Y., 2003. Testing for unit roots in heterogeneous panels. Journal of Econometrics 115, 53–74.
Jewell, T., Lee, J., Tieslau, M., Strazicich, M.C., 2003. Stationarity of health expenditures and GDP: evidence from panel unit root tests with heterogeneous structural breaks. Journal of Health Economics 22, 313–323.
Kleiman, E., 1974. The Determinants of National Outlay on Health. Macmillan, London.
Kurozumi, E., 2002. Testing for stationarity with a break. Journal of Econometrics 108, 63–99.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P.J., Shin, Y., 1992. Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root? Journal of Econometrics 54, 159–178.
Lee, J., 1996. On the power of stationarity tests using optimal bandwidth estimates. Economics Letters 51, 131–137.
Liu, J., Wu, S., Zidek, J.V., 1997. On segmented multivariate regressions. Statistica Sinica 7, 497–525.
MacKinnon, J.G., 1991. Critical Values for Cointegration Tests. In: Engle, R.F., Granger, C.W.J. (Eds.), Long-Run Economic Relationships. Readings in Cointegration, Oxford 267–276.
Maddala, G.S., Wu, S., 1999. A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test. Oxford Bulletin of Economics and Statistics. Special Issue 631–652.
McCoskey, S., Kao, C., 1998. A residual-based test of the null of cointegration in panel data. Econometric Reviews 17, 57–84.
McCoskey, S., Selden, T.M., 1998. Health care expenditures and GDP: panel data unit root test results. Journal of Health Economics 17, 369–376.
Newey, W.K., West, K.D., 1994. Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61, 631–653.
Newhouse, J.P., 1977. Medical care expenditures: a cross-national survey. Journal of Human Resources 12, 115–125.
Ng, S., Perron, P., 1995. Unit root test in ARMA models with data-dependent methods for the selection of the truncation lag. Journal of the American Statistical Association 90, 268–281.
OECD, 1998. OECD Health Care Data: A Software Package for International. OECD.
Perron, P., 1989. The great crash, the oil price shock and the unit root hypothesis. Econometrica 57 (6), 1361–1401.
Perron, P., 1997. Further evidence on breaking trend functions in macroeconomic variables. Journal of Econometrics 80, 355–385.
Perron, P., Vogelsang, T., 1992. Nonstationarity and level shifts with an application to purchasing power parity. Journal of Business & Economic Statistics 10 (3), 301–320.
Roberts, J., 1998. Spurious regression problems in the determinants of health care expenditure: a comment on Hitiris (1997). Applied Economics Letters.
Sephton, P.S., 1995. Response surfaces estimates of the KPSS stationary test. Economics Letters 47, 255–261.
Shin, Y., Snell, A., 2000. Testing for Stationarity in Heterogeneous Panels with Serially Correlated Errors. Technical Report, Department of Economics, University of Edinburgh.