International Journal of Forecasting 26 (2010) 326–347 www.elsevier.com/locate/ijforecast
Bayesian forecasting using stochastic search variable selection in a VAR subject to breaks

Markus Jochmann (a), Gary Koop (a), Rodney W. Strachan (b)
(a) University of Strathclyde, UK
(b) The University of Queensland, Australia
Abstract

This paper builds a model which has two extensions over a standard VAR. The first of these is stochastic search variable selection, which is an automatic model selection device that allows coefficients in a possibly over-parameterized VAR to be set to zero. The second extension allows for an unknown number of structural breaks in the VAR parameters. We investigate the in-sample and forecasting performance of our model in an application involving a commonly-used US macroeconomic data set. In a recursive forecasting exercise, we find moderate improvements over a standard VAR, although most of these improvements are due to the use of stochastic search variable selection rather than to the inclusion of breaks.
© 2009 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Keywords: Vector autoregressive model; Predictive density; Over-parameterization; Structural break; Shrinkage
1. Introduction

Since the so-called Minnesota revolution associated with the work of Chris Sims and colleagues at the University of Minnesota (see Doan, Litterman, & Sims, 1984; Litterman, 1986; Sims, 1980), Bayesian vector autoregressions (VARs) have become a popular and successful forecasting tool (see, among a myriad of others, Andersson & Karlsson, 2008; Kadiyala & Karlsson, 1993). Geweke and Whiteman (2006)
provide an excellent survey of the development of Bayesian VAR forecasting. Despite the success of VARs relative to many alternatives, two problems are widely thought to undermine their forecast performance. These are the problems of structural breaks and overparameterization. The purpose of this paper is to investigate whether extending a VAR to allow for structural breaks and using stochastic search variable selection will help overcome these problems. With regard to the problem of over-parameterization, VARs have so many parameters that over-fitting is a serious risk. A good in-sample model fit will not necessarily lead to a good forecasting performance.
Furthermore, in a typical VAR, many coefficients are close to zero and/or are estimated imprecisely. This can lead to a great deal of forecast uncertainty (e.g. large predictive standard deviations). As a result, most Bayesian VARs involve informative priors (e.g. the Minnesota prior) that allow for shrinkage. Indeed, a wide variety of Bayesian and non-Bayesian approaches have emphasized the importance of shrinkage for improving the forecast performance. Potential problems with over-fitting become even greater when structural breaks are allowed for, increasing the need for shrinkage. Geweke and Whiteman (2006) provide a general discussion of this issue, and empirical examples include Diebold and Pauly (1990) and Koop and Potter (2004).

In this paper, we investigate an alternative way of dealing with the over-parameterization problem: stochastic search variable selection (SSVS). SSVS can be thought of as a hierarchical prior where each of the parameters in the VAR is drawn from one of two normal distributions. The first of these has a zero mean and a variance very near to zero (i.e. the parameter is virtually zero and the corresponding variable is effectively excluded from the model). The second also has a zero mean but has a large variance, thus implying a relatively non-informative prior (i.e. the parameter is non-zero and the corresponding variable is allowed to enter the model). Such "spike and slab" priors have become popular for model selection or model averaging in regression models (see, e.g., Ishwaran & Rao, 2005). George, Sun, and Ni (2008) develop methods for implementing SSVS in VAR models and Korobilis (2008) presents an application in a factor model. In this paper we use a VAR with SSVS, using the same approach as George et al. (2008), but extended to allow for structural breaks in a recursive forecasting exercise. Note that this approach can be thought of as a way of doing shrinkage. That is, some coefficients are shrunk virtually to zero, while others are left relatively unconstrained and are estimated in a data-based fashion.

With regard to the problem of structural breaks, several recent papers have highlighted the fact that structural instability seems to be present in a wide variety of macroeconomic and financial time series (e.g., Ang & Bekaert, 2002; Stock & Watson, 1996). The negative consequences of ignoring this instability for inference and forecasting have been
stressed by Clements and Hendry (1998, 1999), Koop and Potter (2001, 2007) and Pesaran, Pettenuzzo, and Timmermann (2006), among many others. If a structural break occurs, then pre-break data can potentially contaminate forecasts. In this paper, we adopt an approach similar to that of Chib (1998) for modelling the break process. This approach assumes that there is a constant probability of a break in every time period, and implies that our forecasts use only data since the most recent break. Furthermore, similarly to Koop and Potter (2007) and Pesaran et al. (2006), such an approach allows for the possibility that a structural break might hit during the forecast period.

After building our model, which extends the VAR to allow for structural breaks and uses the SSVS hierarchical prior, we use it in a recursive forecasting exercise involving US data. In particular, we work with three variables: the unemployment rate, the interest rate and the inflation rate. The original data run from 1953Q1 to 2006Q3. We compare our full model (the VAR with SSVS plus breaks) with three restricted versions that use SSVS: one restricts breaks to affect only the VAR coefficients; the second allows breaks to occur only in the error covariance matrix; and the third is a standard VAR with SSVS. We also compare the results with those associated with two benchmark VARs, one based on a relatively non-informative prior and the other using the standard Minnesota prior. Our recursive forecasting evidence shows that SSVS offers an appreciable improvement in forecast performance, but evidence that the inclusion of structural breaks improves the forecast performance in this data set is weaker. In general, the full model which allows for breaks in all of the parameters forecasts poorly, but the restricted version that allows for breaks only in the error covariance matrix forecasts well.

2. The VAR with the SSVS prior and structural breaks

The models used in this paper all begin with an unrestricted VAR, which we write as

y_{t+h} = X_{t−1} α + ε_t,   (1)

where y_{t+h} is an n × 1 vector of observations on the dependent variables at time t + h; α is a K × 1 vector of coefficients on the explanatory variables; ε_t
are independent N(0, Σ) errors for t = 1, ..., T; and X_{t−1} is an n × K matrix containing lagged dependent variables appropriate for forecasting (i.e., X_{t−1} contains the information available at time t), and an intercept (arranged appropriately to define a VAR(p)), such that K = n(1 + pn). If we define the n × n identity matrix as I_n, then we can write X_{t−1} as

X_{t−1} = [ I_n, (y′_{t−1} ⊗ I_n), (y′_{t−2} ⊗ I_n), ..., (y′_{t−p} ⊗ I_n) ].

This standard specification of a VAR provides us with one-step-ahead forecasts when h = 0. When we allow h > 0, the above specification allows us to use the direct method to forecast at horizon h.1 For a discussion on the direct method and some evidence of its advantages over multi-step iterative forecasts, see Marcellino, Stock, and Watson (2006). To this foundation, we add the SSVS prior and allow for structural breaks. In this section, we describe how this is done, with Appendix A providing full details of our Bayesian econometric methods, including the Markov chain Monte Carlo (MCMC) algorithm.

1 The use of the direct method is a great simplification, since it means that we can always use standard analytical results for the one-step-ahead predictive. In models such as ours, the multi-step-ahead predictive does not have a simple analytical form, and a computationally-demanding predictive simulation would be required. The drawback of the direct method is that we have to redo the forecasting exercise for different values of h.

2.1. Stochastic search variable selection

2.1.1. Informal description of SSVS

SSVS is usually used as a method for selecting variables in a regression model (e.g., George & McCulloch, 1993, 1997). However, it can also be interpreted as a hierarchical prior for Eq. (1). In this section, we will describe the basic ideas of SSVS as it relates to the VAR coefficients, α = (α_1, ..., α_K)′. However, we also apply SSVS to parameters relating to the off-diagonal elements of Σ, allowing for each of these to be shrunk virtually to zero (we do not allow for the diagonal elements of Σ to be shrunk to zero, since then it would be singular). Full details are given in the following section. The SSVS prior for each VAR coefficient is a mixture of two normals:

α_j | γ_j ∼ (1 − γ_j) N(0, κ²_{0j}) + γ_j N(0, κ²_{1j}),   (2)
where γ_j is a dummy variable. If γ_j equals one then α_j is drawn from the second normal, and if it equals zero then α_j is drawn from the first normal. The prior is hierarchical, since γ = (γ_1, ..., γ_K)′ is treated as an unknown vector of parameters and estimated in a data-based fashion. The SSVS aspect of this prior arises by choosing the first prior variance, κ²_{0j}, to be "small" (so that the coefficient is virtually zero), and the second prior variance, κ²_{1j}, to be "large" (implying a relatively non-informative prior for the corresponding coefficient). Just how "small" or "large" the prior variances can be chosen to be is discussed by George and McCulloch (1993, 1997), for example. Basically, the small prior variance is chosen to be small enough that the corresponding explanatory variable is, to all intents and purposes, deleted from the model. In this paper, we use what George et al. (2008) call a "default semi-automatic approach", which requires no subjective prior information from the researcher.2 In order to understand how SSVS can be used in forecasting, note that γ_j can be interpreted as an indicator of whether the jth VAR coefficient is in the model or not. At each point in time in a recursive forecasting exercise, SSVS methods allow for the calculation of Pr(γ_j = 0|Data). SSVS can be used either as a model selection device (e.g., we could choose to forecast using the restricted VAR that includes only the coefficients for which Pr(γ_j = 1|Data) > 0.5), or for performing model averaging. In our empirical work, we perform model averaging. That is, our MCMC algorithm provides us with draws of γ_j for j = 1, ..., K. Each draw implies a particular restricted VAR which we can use for forecasting. By averaging over all MCMC draws we are doing Bayesian model averaging. We stress that Pr(γ_j = 0|Data) can be different at different points in time in the recursive forecasting exercise, and thus, the weights attached to the different models change over time.
2 Thus far, we have referred to SSVS as allowing for coefficients to be "virtually zero" and explanatory variables to be "to all intents and purposes zero", since κ²_{0j} is not precisely zero. From this point onwards, we will omit such qualifying words and phrases as "virtually" or "to all intents and purposes".
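To make the mixture prior in Eq. (2) concrete, the following short Python sketch (our illustration, not the authors' code; all names are ours) draws a coefficient from the spike or the slab component according to γ_j. The spike and slab standard deviations are arbitrary example values.

import numpy as np

rng = np.random.default_rng(0)

def draw_coefficient(gamma_j, kappa0_j, kappa1_j):
    """Draw alpha_j from the SSVS mixture prior of Eq. (2):
    N(0, kappa0_j^2) if gamma_j = 0 ('spike'), N(0, kappa1_j^2) if gamma_j = 1 ('slab')."""
    sd = kappa1_j if gamma_j == 1 else kappa0_j
    return rng.normal(0.0, sd)

# Prior draws: with q_j = 0.5 each coefficient is a priori in or out with equal probability.
q_j = 0.5
gammas = rng.binomial(1, q_j, size=10_000)
alphas = np.array([draw_coefficient(g, kappa0_j=0.01, kappa1_j=1.0) for g in gammas])

# In the MCMC output, Pr(gamma_j = 1 | Data) would be estimated by the corresponding
# fraction of retained posterior draws with gamma_j = 1.
print("share of draws from the slab:", gammas.mean())
print("spike draw std dev:", alphas[gammas == 0].std(), " slab draw std dev:", alphas[gammas == 1].std())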
2.1.2. Formal description of the SSVS prior for VARs

In this section, we formally define the model and SSVS prior used in the empirical section when we do not allow for structural breaks. Our treatment is the same as that of George et al. (2008). The VAR in Eq. (1) can be written in matrix form as:

Y = X A + ε,   (3)

where Y is a T × n matrix with the tth row given by y′_{t+h}, and X is a T × (1 + pn) matrix containing lagged dependent variables and an intercept, such that the tth row of X is given by the vector (1, y′_{t−1}, ..., y′_{t−p}). A is the matrix of coefficients and ε is a T × n matrix with the tth row given by ε′_t. In Eq. (1) we stacked the VAR coefficients in a vector containing K elements that we can formally define as α = vec(A), and this gave us K = n(1 + pn), where p is the lag length. We denote the elements of this vector by α_j for j = 1, ..., K. The earlier distributional assumption about ε_t defines the likelihood function. SSVS can be interpreted as defining a hierarchical prior for all of the elements of A and Σ. The prior for α given in Eq. (2) can be written more compactly as:

α | γ ∼ N(0, DD),   (4)

where γ is a K × 1 vector of unknown parameters with typical element γ_j ∈ {0, 1}, and D is a diagonal matrix with the (j, j)th element given by d_j, where

d_j = κ_{0j} if γ_j = 0,   d_j = κ_{1j} if γ_j = 1.   (5)

Note that this prior implies a mixture of two normals, as written in Eq. (2). Our main empirical results are based on the use of what George et al. (2008) call the "default semi-automatic approach" to selecting the prior hyperparameters κ_{0j} and κ_{1j}, and the reader is referred to this paper for additional justifications for this approach. Basically, κ_{0j} should be selected so that α_j is essentially zero and κ_{1j} should be selected so that α_j is empirically substantive. The default semi-automatic approach involves setting κ_{0j} = c_0 √(v̂ar(α_j)) and κ_{1j} = c_1 √(v̂ar(α_j)), where v̂ar(α_j) is an estimate of the variance of the coefficient in an unrestricted VAR (e.g., the ordinary least squares quantity or an estimate based on a preliminary MCMC run of the VAR using a non-informative prior). The pre-selected constants c_0
and c_1 must satisfy c_0 ≪ c_1, and we set c_0 = 0.1 and c_1 = 10. Appendix C discusses prior sensitivity and contains results using subjectively-elicited choices for κ_{0j} and κ_{1j}. For γ, the SSVS prior posits that each element has a Bernoulli form (independent of the other elements of γ), and hence, for j = 1, ..., K, we have

Pr(γ_j = 1) = q_j,   Pr(γ_j = 0) = 1 − q_j.   (6)
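As a sketch of the default semi-automatic approach just described, the following Python fragment computes κ_{0j} and κ_{1j} from OLS estimates of an unrestricted VAR. The function and variable names are ours, and the OLS-based variance is only one of the options the authors mention.

import numpy as np

def semi_automatic_kappas(Y, X, c0=0.1, c1=10.0):
    """kappa_0j = c0 * sqrt(var_hat(alpha_j)), kappa_1j = c1 * sqrt(var_hat(alpha_j)),
    with var_hat(alpha_j) taken from OLS on the unrestricted VAR Y = X A + eps."""
    T, n = Y.shape
    k = X.shape[1]                               # k = 1 + p*n regressors per equation
    A_hat = np.linalg.solve(X.T @ X, X.T @ Y)    # OLS coefficients, k x n
    resid = Y - X @ A_hat
    Sigma_hat = resid.T @ resid / (T - k)        # residual covariance, n x n
    XtX_inv = np.linalg.inv(X.T @ X)
    # var(vec(A_hat)) = Sigma_hat kron (X'X)^{-1}; only its diagonal is needed,
    # in the same ordering as alpha = vec(A) (equation by equation).
    var_alpha = np.kron(np.diag(Sigma_hat), np.diag(XtX_inv))
    se_alpha = np.sqrt(var_alpha)                # length K = n*(1 + p*n)
    return c0 * se_alpha, c1 * se_alpha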
We set q_j = 0.5 for all j. This is a natural default choice, as it implies that each coefficient is a priori equally likely to be included or excluded.

The covariances in Σ can also be given an SSVS prior. To see how this is done, note that we can decompose the error covariance matrix as:

Σ^{−1} = ΨΨ′,   (7)

where Ψ is upper-triangular. The SSVS prior involves using a standard gamma prior for the square of each of the diagonal elements of Ψ and the SSVS mixture of normals prior for each element above the diagonal. Note that this implies that the diagonal elements of Ψ are always included in the model, ensuring a positive definite error covariance matrix. This form for the prior also greatly simplifies posterior computation. Precise details are provided in the next paragraph. Let the non-zero elements of Ψ be labelled as ψ_{ij}, and define ψ = (ψ_{11}, ..., ψ_{nn})′, η_j = (ψ_{1j}, ..., ψ_{j−1,j})′ for j = 2, ..., n and η = (η′_2, ..., η′_n)′. For the diagonal elements of Ψ, we assume prior independence from each other and:

ψ²_{jj} ∼ G(a_j, b_j),   (8)

where G(a_j, b_j) denotes the gamma distribution with mean a_j/b_j and variance a_j/b²_j. We specify the relatively non-informative choices of a_j = 2.2 and b_j = 0.24. Note that all of the variables in our empirical application are rates, measured as percentages, so that this prior is centered in a sensible region for such variables but is quite diffuse. Our empirical results are robust to moderately large changes in these prior hyperparameters. The hierarchical prior for the off-diagonal elements of Ψ takes the same mixture of normals form as α.
In particular, the SSVS prior has

η_j | ω_j ∼ N(0, F_j F_j),   (9)

where ω_j = (ω_{1j}, ..., ω_{j−1,j})′ is a vector of unknown parameters with typical element ω_{ij} ∈ {0, 1}, and F_j = diag(f_{1j}, ..., f_{j−1,j}), where

f_{ij} = κ_{0ij} if ω_{ij} = 0,   f_{ij} = κ_{1ij} if ω_{ij} = 1,   (10)

for j = 2, ..., n and i = 1, ..., j − 1. Note that this prior implies a mixture of two normals for each off-diagonal element of Ψ:

ψ_{ij} | ω_{ij} ∼ (1 − ω_{ij}) N(0, κ²_{0ij}) + ω_{ij} N(0, κ²_{1ij}).   (11)

As we did with κ_{0j} and κ_{1j} (the prior hyperparameters in the SSVS prior on the VAR coefficients), we use a semi-automatic default approach to selecting κ_{0ij} and κ_{1ij}. That is, we set κ_{0ij} = c_0 √(v̂ar(ψ_{ij})) and κ_{1ij} = c_1 √(v̂ar(ψ_{ij})), where v̂ar(ψ_{ij}) is an estimate of the variance of the appropriate off-diagonal element of Ψ from an unrestricted VAR (e.g., the ordinary least squares quantity or the posterior variance from a preliminary run of the unrestricted VAR using a non-informative prior). The pre-selected constants c_0 and c_1 are set (as before) to be c_0 = 0.1 and c_1 = 10. For ω = (ω′_2, ..., ω′_n)′, the SSVS prior posits that each element has a Bernoulli form (independent of the other elements of ω), and hence we have

Pr(ω_{ij} = 1) = q_{ij},   Pr(ω_{ij} = 0) = 1 − q_{ij}.   (12)

We make the default choice of q_{ij} = 0.5 for all j and i, so that, a priori, each parameter is equally likely to be included or excluded. This completes our discussion of the SSVS hierarchical prior for the VAR. In our empirical section, for comparison purposes, we also include two other unrestricted VARs. To ensure the comparability of the priors across the VARs, one of our unrestricted VARs is a limiting case of the VAR with the SSVS prior. That is, our unrestricted VAR is as above except that γ_j = ω_{ij} = 1 for all i and j. The second of these is an unrestricted VAR with the commonly-used Minnesota prior. This is implemented exactly as described by Canova (2007, p. 378ff).
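Before moving on, the reparameterization in Eq. (7) can be made concrete with a few lines of Python (our illustration only, with made-up numbers): an upper-triangular Ψ satisfying Σ^{−1} = ΨΨ′ can be obtained from the Cholesky factor of Σ.

import numpy as np

# Example 3x3 error covariance matrix (illustrative numbers only).
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])

C = np.linalg.cholesky(Sigma)        # lower triangular, Sigma = C C'
Psi = np.linalg.inv(C).T             # upper triangular, so that Psi Psi' = Sigma^{-1}

assert np.allclose(Psi, np.triu(Psi))                    # Psi is upper triangular
assert np.allclose(Psi @ Psi.T, np.linalg.inv(Sigma))    # Sigma^{-1} = Psi Psi'
# The squared diagonal elements psi_jj^2 receive gamma priors; the elements above
# the diagonal receive the spike-and-slab prior described in the text.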
2.2. Allowing for structural breaks

Given the finding of widespread structural instabilities in many macroeconomic time series models, it is potentially important to allow for structural breaks (in both the conditional mean and the conditional variance) of the VAR. Accordingly, we extend the VAR with the SSVS prior to allow breaks to occur. Our approach will allow for a constant probability of a break in each time period. With such an approach, the number of breaks, M − 1, is unknown (and estimated in the model). This leads to a model with structural breaks at times t_1, ..., t_{M−1} (and therefore M regimes exist). Thus, Eq. (1) is replaced by:

y_{t+h} = X_{t−1} α^{(1)} + ε_t,   ε_t ∼ N(0, Σ^{(1)})   for t = 1, ..., t_1,
y_{t+h} = X_{t−1} α^{(2)} + ε_t,   ε_t ∼ N(0, Σ^{(2)})   for t = t_1 + 1, ..., t_2,
...
y_{t+h} = X_{t−1} α^{(M)} + ε_t,   ε_t ∼ N(0, Σ^{(M)})   for t > t_{M−1}.   (13)

Within each regime, we use the SSVS prior described previously. There are many models that could be used for the break process (see, among many others, Chib, 1998; Kim, Nelson, & Piger, 2004; Koop & Potter, 2007, 2009; Maheu & Gordon, 2008; Maheu & McCurdy, 2009; Martin, 2000; McCulloch & Tsay, 1993; Pastor & Stambaugh, 2001; Pesaran et al., 2006). In this paper, we adopt a specification closely related to that used by Chib (1998). It assumes that there is a constant probability of a structural break occurring, q, in every time period. This is a simple but attractive choice, since (unlike many other approaches) it does not impose a fixed number of structural breaks on the model.3 Furthermore, it allows us to predict the probability of a structural break occurring during our forecast period. Thus, it helps address one of the major concerns of macroeconomic forecasters: how to forecast when structural breaks might be occurring.

3 Formally, our algorithm requires the choice of a maximum number of breaks (with the actual number of breaks occurring in-sample being estimated from the data). We choose this maximum to be large enough so as not to unreasonably constrain the number of breaks.
In practice, our method will involve (with probability 1 − q) forecasting assuming that a break has not occurred (i.e., using the VAR model that holds at time τ, the period in which the forecast is being made), and (with probability q) forecasting assuming that a break has occurred (during the forecast horizon). In a spirit similar to that of Maheu and McCurdy (2009) and Maheu and Gordon (2008), we assume that if a break occurs then past data provide no help with forecasting, and accordingly forecasts are produced using only prior information.4 The approach of Chib (1998) has been used and extended by many authors, including Kim et al. (2004), Pastor and Stambaugh (2001) and Pesaran et al. (2006). It involves the characterization of regimes defined by the structural breaks through discrete random variables, s_t (for t = 1, ..., T), that take on values {1, 2, ..., M}. To implement this setup, we begin by selecting M, the maximum number of regimes. We assume that s_t follows a Markov process of a particular form. That is,

Pr(s_t = j | s_{t−1} = i) = 1 − q if j = i ≠ M;   q if j = i + 1;   1 if i = j = M;   0 otherwise.   (14)

This structure for s_t ensures that the time series variable goes from regime to regime. Once it has gone through the mth regime, there is no returning to this regime. It goes through regimes sequentially, so that it is not possible to skip from regime i to regime i + 2. Once it reaches the Mth regime it stays there. Eq. (14) can be thought of as defining a hierarchical prior for s_t, which we complete with a definition of the prior for q below. The difference between the prior used in this paper and that used by Chib (1998) is that we choose M to be a large value and do not assume that every regime is visited before time T.5 In this sense, our approach does not impose a fixed number of breaks in the observed data (since some could occur after time T). Thus, we are allowing for breaks occurring during the period being forecast. The advantages of such an approach are discussed by Koop and Potter (2009).
4 In contrast, Koop and Potter (2007) and Pesaran et al. (2006) develop different models where past data have some predictive power, even after a structural break has occurred. Which alternative is preferable depends on the nature of the break process in the empirical problem under study.

5 The empirical results in this paper use M = 4, although using higher values yields virtually the same results.
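The transition structure in Eq. (14) amounts to an M × M matrix. The sketch below (ours, with M = 4 as in footnote 5 and an arbitrary value of q) builds that matrix for illustration.

import numpy as np

def break_transition_matrix(q, M=4):
    """P[i, j] = Pr(s_t = j+1 | s_{t-1} = i+1) for the break process in Eq. (14):
    stay in the current regime with probability 1 - q, move to the next regime
    with probability q, and stay forever once regime M is reached."""
    P = np.zeros((M, M))
    for i in range(M - 1):
        P[i, i] = 1.0 - q
        P[i, i + 1] = q
    P[M - 1, M - 1] = 1.0            # absorbing final regime
    return P

P = break_transition_matrix(q=0.05)
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a proper probability distribution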
To complete the model, a prior for q is required. A popular choice is a Beta prior: B(δ_1, δ_2). We select δ_1 = 0.01 and δ_2 = 1.0, which is a relatively non-informative choice. Appendix C considers the issue of prior sensitivity to this choice. Details of the MCMC algorithm for this model are provided in Appendix A.

3. Empirical results

In this section, we present our empirical results using a standard set of macroeconomic variables. Specifically, our US data set runs from 1953Q1 to 2006Q3 and contains the unemployment rate (seasonally adjusted civilian unemployment rate, all workers over age 16), the interest rate (yield on the three-month Treasury bill rate) and the inflation rate (the annual percentage change in a chain-weighted GDP price index).6

6 The data were obtained from the Federal Reserve Bank of St. Louis website, http://research.stlouisfed.org/fred2/.

This set of variables (possibly transformed) has previously been used by Cogley and Sargent (2005), Primiceri (2005) and Koop, León-González, and Strachan (2009), among many others. We allow for 4 lags in our VAR. This choice of a large lag length is motivated by our use of SSVS. That is, even if the true lag length is less than 4, the use of SSVS means that the coefficients on longer lags can be set to zero. We use forecast horizons of 1 quarter, 1 year and 2 years (which corresponds to h = 0, 3 and 7 in our timing convention established in Eq. (1)). Note that y_{t+h} is our dependent variable and we repeat our entire empirical exercise three times, once for each forecasting horizon. Our in-sample results are for the h = 0 case. Note also that we are working with VARs, as opposed to models that allow for cointegration. Ignoring unit root and cointegration issues is common practice in Bayesian macroeconomic studies. For instance, most of the work in the tradition of Christopher Sims' VARs (see, e.g., the citations at the beginning of the introduction, or, more recently,
extensions such as that of Sims & Zha, 2006) treats cointegration as a particular parametric restriction that may or may not hold and is likely to have little relevance for forecasting at short to medium horizons. Similar considerations have presumably motivated recent influential empirical macroeconomic papers such as those of Cogley and Sargent (2001, 2005) and Primiceri (2005), who work with extensions of VARs (although some forecasting papers, such as that of Hall, Anderson, & Granger, 1992, argue for the benefits of including cointegration restrictions). Nevertheless, we make one small alteration to the SSVS prior in order to maintain comparability with the Minnesota prior which is centered over a random walk. For the coefficient on the own first lag in each equation in the VAR, we use a prior mean of one in both elements of the normal mixture given in Eq. (2), instead of the zero typically used with SSVS in Eq. (2). SSVS always selects the second element in the normal mixture, so that this procedure involves centering the SSVS prior over a random walk (although with the associated prior variance left at its large, relatively non-informative value). We present empirical results for six models (with acronyms given in parentheses): the VAR with SSVS and breaks in all of the parameters (SSVS+BREAKS), the VAR with SSVS but no breaks (SSVS−VAR), the VAR with SSVS and breaks only in the VAR coefficients (SSVS+BREAKS COEFF), the VAR with SSVS and breaks only in the error covariance matrix (SSVS+BREAKS COV), the VAR with a Minnesota prior (MINN−VAR), and a standard constant coefficient VAR using a restricted version of the SSVS prior that simply imposes γ j = 1 for all parameters (BENCHMARK VAR). The model with no breaks (SSVS−VAR) is obtained by setting q = 0, while the ones that allow for breaks in only some of the parameters (i.e., SSVS+BREAKS COEFF and SSVS+BREAKS COV) are obtained by restricting appropriate blocks of parameters (i.e., the error covariance matrix and VAR coefficients respectively) to be unchanged when a break occurs. Thus, four of the six models can be interpreted as being nested within the SSVS+BREAKS model. For these models, we use the same prior as in the SSVS+BREAKS model, conditional on the restriction being imposed. Finally, for the model that uses the Minnesota prior, we specify the prior exactly as was done by Canova
(2007). We divide our discussion of the empirical results into two parts. In the first, we briefly present in-sample estimation results using the full sample. In the second, we present the results of our recursive forecasting exercise. We have confirmed the convergence of our MCMC algorithm using standard convergence diagnostics on the MCMC output for our most general model (SSVS+BREAKS) with the full sample. Estimates of one of these diagnostics, that of Gelman and Rubin (1992), are presented in Appendix D.

3.1. In-sample results

We begin by briefly discussing the performance of SSVS using the full sample and h = 0. We focus on three models: SSVS−VAR, SSVS+BREAKS and SSVS+BREAKS COV. To preview our forecasting exercise, the last model is found to forecast particularly well. A complete set of point estimates and standard deviations for the VAR coefficients and the parameters determining the error covariance matrix (Ψ) in these models is given in Appendix B. Note that, when using SSVS, we can calculate the posterior probability of inclusion of each parameter (i.e., the posterior probability that γ_j = 1 for the VAR coefficients or the posterior probability that ω_{ij} = 1 for the off-diagonal elements of Ψ can be calculated), and these are also given in Appendix B. Here we summarize a few noteworthy aspects of the results in Appendix B. In this discussion we will refer to a parameter as being included (excluded) if its posterior inclusion probability is greater than (less than) 0.5. Note first that, in models that allow for breaks, we find the most evidence in favor of two particular breaks. Fig. 1 plots the probability that a break occurs in each time period for the SSVS+BREAKS model (i.e., it plots Pr(s_t = j | s_{t−1} = j − 1, Y) against time). There is evidence of two breaks, one in the mid- to late-1960s and one in the mid-1980s. The latter is the timing associated with the Great Moderation of the business cycle. Fig. 1 implies that the cumulative probability that a break occurs sometime in the period 1964–1970 is high, and the same holds for the period 1983–1987. Accordingly, the posterior results in Appendix B present the VAR coefficients and Ψ in the three regimes indicated by Fig. 1. A key finding in all of the models that use SSVS is that this prior results in the exclusion of many coefficients.
[Fig. 1. Posterior probability of a break occurring. The figure plots the break probability against time, 1960–2000, on a vertical scale from 0.00 to 0.30.]
SSVS−VAR includes only 12 of the 36 possible VAR coefficients and 1 of the 3 possible off-diagonal elements of Ψ. SSVS+BREAKS includes only 23 of the 144 possible VAR coefficients and 3 of the 9 possible elements in Ψ (in the three regimes). SSVS+BREAKS COV includes only 9 of the 36 possible VAR coefficients and 3 of the 9 possible off-diagonal elements in Ψ. This indicates that SSVS is an effective way of ensuring parsimony in over-parameterized VARs with breaks. The results also make it clear that a conventional lag length selection procedure would not ensure parsimony in a similar way. SSVS always includes the own first lag and also tends to include the first lag of most of the other variables (although there are several exceptions to this). However, there are many "holes" in the lag distribution. For instance, in the SSVS−VAR, the unemployment rate equation includes the first two lags of the unemployment rate, but only the third lag of the interest rate and no lags at all of the inflation rate. Similar non-sequential patterns are found for many of the other equations in other models. The models SSVS is uncovering are very different from those that would be obtained using sequential lag length selection procedures. With regard to Ψ, all of our models (in all of our regimes) reveal one striking finding: only the off-diagonal element of Ψ relating to the correlation between the errors in the interest and unemployment rate equations is included. In models that allow for breaks, we are finding substantial changes in some of the elements of Ψ. These seem to be mainly related to the Great Moderation of the business cycle (remember
that Ψ is defined as in Eq. (7), so that larger diagonal elements indicate smaller error variances). In our most general model, SSVS+BREAKS, we find evidence in some cases that individual coefficients differ across regimes, but this evidence is not as strong as that which was found for Ψ. In general, with SSVS+BREAKS we are finding that posterior standard deviations for the VAR coefficients are becoming larger (relative to SSVS−VAR or SSVS+BREAKS COV), and many of the posterior inclusion probabilities are near neither zero nor one (indicating uncertainty over whether the coefficients should be excluded or included). This provides some informal evidence that even SSVS is having some trouble dealing with the proliferation of parameters in the VAR(4) with multiple breaks in all of the parameters. Overall, these results suggest that the error covariance matrix is associated with the breaks to a greater extent than the VAR coefficients.

3.2. Forecasting

In this section, we present results from a recursive forecasting exercise. The recursive exercise will involve using data available up to time τ − 1 to forecast at time τ + h for h = 0, 3 and 7. These are one-, four- and eight-step-ahead forecasts, respectively. This will be carried out for τ = τ_0, ..., T − h − 1, where τ_0 is 1974Q4, for a total of 128 − h − 1 quarters for each h. We use a notation where y*_{τ+h} is the random variable we are wishing to forecast and y_{τ+h} is the actual realization of y*_{τ+h}. Thus, p(y*_{τ+h} | X_{τ−1}, ..., X_1) is the predictive density of interest. However, in models that have structural breaks, we only use post-break data to construct this predictive density. Let τ* be the timing of the most recent break before τ. Then p(y*_{τ+h} | X_{τ−1}, ..., X_1) will simplify to p(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}). Remember that we are using a timing convention such that X_{τ−1} contains information through to time τ − 1 (i.e., our explanatory variables are all lags of dependent variables). The properties of the predictive density can be estimated using our MCMC algorithm. We let θ denote all of the parameters of the VAR with breaks, θ_m denote the VAR parameters in the mth regime, and θ_m^{(r)} denote the MCMC replications (subsequent to the burn-in replications) for r = 1, ..., R. Then, if no structural break occurs at time τ, standard
MCMC theory and the structure of our model imply that

p̂(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}) = (1/R) Σ_{r=1}^{R} p(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}, θ_m^{(r)})   (15)

will converge to p(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}) as R → ∞. Note that p(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}, θ_m^{(r)}) has a standard analytical form:

p(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}, θ_m^{(r)}) = f_N(y*_{τ+h} | X_{τ−1} α_m^{(r)}, Σ_m^{(r)}),   (16)

where f_N denotes the normal density. However, our model also allows for uncertainty about whether a break occurs at time τ. To account for this we proceed as follows. At each draw of our MCMC algorithm, we have the VAR coefficients and error covariance matrix in the last regime (i.e., using data since the last structural break), and a draw of q (which is the probability that a break occurs at time τ). With probability 1 − q, the predictive density is given by Eq. (16). With probability q, a break has occurred in the period we are forecasting, and the predictive density is based on the SSVS prior (see Appendix A for details).
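Schematically, the predictive density in Eqs. (15) and (16) is approximated by averaging over MCMC draws, with weight 1 − q on the no-break normal predictive and weight q on a prior-based predictive. The Python sketch below is our illustration only: the prior-based predictive is left as a user-supplied function, and all names are hypothetical.

import numpy as np
from scipy.stats import multivariate_normal

def predictive_density(y_star, X_tau_minus_1, draws, prior_predictive_pdf):
    """Estimate p(y*_{tau+h} | data) as in Eq. (15), mixing the no-break normal
    predictive of Eq. (16) (weight 1 - q) with a prior-based predictive (weight q).
    `draws` is a list of dicts holding the last-regime parameters from each MCMC draw."""
    vals = []
    for d in draws:
        mean = X_tau_minus_1 @ d["alpha_m"]                   # X_{tau-1} alpha_m^{(r)}
        no_break = multivariate_normal.pdf(y_star, mean=mean, cov=d["Sigma_m"])
        with_break = prior_predictive_pdf(y_star)             # used if a break hits
        vals.append((1.0 - d["q"]) * no_break + d["q"] * with_break)
    return np.mean(vals)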
Once we have p(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}), we have to choose some methods for evaluating the forecast performance. For this purpose we use the mean squared forecast error (MSFE), the mean absolute forecast error (MAFE), predictive likelihoods, and hit rates. The first two of these are based on point forecasts and are defined as:

MSFE = Σ_{τ=τ_0}^{T−h} [ y_{τ+h} − E(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}) ]² / (T − h − τ_0 + 1)

and

MAFE = Σ_{τ=τ_0}^{T−h} | y_{τ+h} − E(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}) | / (T − h − τ_0 + 1).
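For concreteness, these point-forecast summaries can be computed as follows (our sketch only; y_actual holds the realized values and y_point the predictive means for one variable over the evaluation period).

import numpy as np

def forecast_error_summaries(y_actual, y_point):
    """MSFE and MAFE over the evaluation sample tau_0, ..., T - h, plus the squared
    bias, so that MSFE = bias^2 + variance of the forecast errors."""
    errors = np.asarray(y_actual) - np.asarray(y_point)
    msfe = np.mean(errors ** 2)
    mafe = np.mean(np.abs(errors))
    bias_sq = np.mean(errors) ** 2
    return {"MSFE": msfe, "MAFE": mafe, "bias_sq": bias_sq, "error_var": msfe - bias_sq}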
The mean squared forecast error can be decomposed into the variance of the forecast errors plus their bias squared. The bias provides insights as to whether the point forecasts are, on average, correct. The bias squared is defined as:

[ Σ_{τ=τ_0}^{T−h} ( y_{τ+h} − E(y*_{τ+h} | X_{τ−1}, ..., X_{τ*}) ) / (T − h − τ_0 + 1) ]².

To aid in interpretation, we also present this bias squared (with the variance term being MSFE minus this). Predictive likelihoods are the most common method of Bayesian forecast comparison. Note that one great advantage of predictive likelihoods is that they evaluate the forecasting performance of the entire predictive density, not just the accuracy of the point forecasts (see Corradi & Swanson, 2006, for a thorough review of methods for evaluating the entire predictive density). Predictive likelihoods have been described and their motivation discussed by many, such as Geweke and Amisano (2007). The basic predictive likelihood is the predictive density of y*_τ evaluated at the actual outcome y_τ. We use the sum of the log predictive likelihoods for forecast evaluation:

Σ_{τ=τ_0}^{T−h} log p(y*_{τ+h} = y_{τ+h} | X_{τ−1}, ..., X_{τ*}),   (17)
where each term in the summation can be approximated using the MCMC output in a method similar to Eq. (15). To aid in interpretation, note that this sum of log predictive likelihoods is closely related to the log of the marginal likelihood. In fact, if we had set τ0 = 1 they would have been the same (see, e.g., Geweke & Amisano, 2007). Following standard practice (e.g., Geweke & Amisano, 2007), we do not set τ0 = 1, since the early terms on the sum can be extremely sensitive to the prior. Hence, an alternative interpretation of Eq. (17) is as a model selection device, with an interpretation similar to that of the log of the marginal likelihood, but with less sensitivity to the prior than the marginal likelihood. Table 1 presents MSFEs and bias squared values, Table 2 presents the MAFEs, and Table 3 presents the sum of the log of the predictive likelihoods for our six models, three variables and three forecast horizons. They all tell a similar story. Adding SSVS to the benchmark VAR typically improves the forecasting performance, but adding breaks does not do so unless
Table 1
MSFEs and bias² values for different models.

Mean squared forecast error
h  DepVar     BENCHMARK VAR  MINN−VAR  SSVS−VAR  SSVS+BREAKS  SSVS+BREAKS COEFF  SSVS+BREAKS COV
0  Unemp      0.105          0.097     0.085     0.110        0.101              0.087
0  Interest   1.001          0.973     0.963     1.053        0.910              0.913
0  Inflation  0.115          0.111     0.108     0.119        0.138              0.092
3  Unemp      0.615          0.534     0.481     1.106        0.641              0.595
3  Interest   4.180          4.088     3.716     5.671        3.577              3.906
3  Inflation  1.914          1.935     1.985     2.464        1.893              1.277
7  Unemp      1.459          1.072     1.347     2.541        1.573              1.036
7  Interest   8.182          7.668     7.799     12.001       7.374              7.289
7  Inflation  3.502          3.665     4.641     12.029       3.741              2.781

Bias squared
h  DepVar     BENCHMARK VAR  MINN−VAR  SSVS−VAR  SSVS+BREAKS  SSVS+BREAKS COEFF  SSVS+BREAKS COV
0  Unemp      0.00062        0.00057   0.00004   0.00039      0.00014            0.00003
0  Interest   0.00018        0.00002   0.00001   0.00412      0.00027            0.00287
0  Inflation  0.00365        0.00380   0.00407   0.00714      0.00460            0.00195
3  Unemp      0.00001        0.00000   0.00001   0.00196      0.00652            0.02876
3  Interest   0.01331        0.01303   0.01135   0.01801      0.01918            0.00041
3  Inflation  0.15090        0.16205   0.19621   0.51580      0.11417            0.00421
7  Unemp      0.00268        0.00022   0.01648   0.01045      0.03049            0.04020
7  Interest   0.01494        0.04365   0.08595   0.42897      0.07996            0.00258
7  Inflation  0.26029        0.37914   0.62274   1.30047      0.31403            0.00148
Note: The acronyms for the six models are given at the beginning of Section 3.

Table 2
Mean absolute forecast errors for different models.
h  DepVar     BENCHMARK VAR  MINN−VAR  SSVS−VAR  SSVS+BREAKS  SSVS+BREAKS COEFF  SSVS+BREAKS COV
0  Unemp      0.210          0.203     0.198     0.232        0.217              0.200
0  Interest   0.609          0.590     0.600     0.642        0.580              0.587
0  Inflation  0.248          0.241     0.240     0.257        0.250              0.232
3  Unemp      0.633          0.601     0.578     0.762        0.585              0.591
3  Interest   1.510          1.490     1.443     1.629        1.423              1.491
3  Inflation  0.898          0.896     0.910     1.005        0.877              0.861
7  Unemp      0.905          0.810     0.878     1.044        0.971              0.819
7  Interest   2.166          2.132     2.170     2.532        2.160              2.042
7  Inflation  1.271          1.292     1.425     2.346        1.356              1.205
Table 3
Log predictive likelihoods.
h  BENCHMARK VAR  MINN−VAR  SSVS−VAR  SSVS+BREAKS  SSVS+BREAKS COEFF  SSVS+BREAKS COV
0  0.317          0.326     0.342     0.325        0.210              0.307
3  0.025          0.026     0.025     0.021        0.020              0.021
7  0.009          0.010     0.010     0.007        0.007              0.008
the breaks are restricted to affect only some of the parameters. We elaborate on these points next.
Tables 1 and 2 provide metrics of the forecast performance based on point forecasts. We begin by
discussing the three models that do not allow for breaks. The columns labelled BENCHMARK VAR, MINN−VAR and SSVS−VAR all present results for constant parameter VARs, but using different priors. With one exception (forecasting inflation at long horizons), we find that shrinkage helps, in the sense that SSVS and the Minnesota prior both lead to forecasts with lower MSFEs and MAFEs than a VAR with a relatively non-informative normal prior. Relative to the results of other forecasting studies, a 10% reduction in the square root of the MSFE indicates a substantial improvement in the forecast performance.7 For the unemployment rate, SSVS leads to improvements of this magnitude or more (particularly at short and medium horizons) relative to the VAR. For the interest rate, there are also appreciable improvements in forecast performance. Even for the inflation rate (a variable that is notoriously difficult to forecast), including SSVS does lead to some forecast improvements at short horizons relative to the VAR with the relatively non-informative prior. It is harder to make such strong statements when comparing SSVS to the Minnesota prior. Nevertheless, there are many cases where SSVS does appreciably better than the Minnesota prior (especially when forecasting at short or medium horizons), and only in a very few cases does the opposite occur. Overall, Tables 1 and 2 indicate the value of using the SSVS prior when forecasting with VARs relative to other popular priors. The final three columns of Tables 1 and 2 are obtained from models containing breaks. Here the results are more nuanced. The full model (SSVS+BREAKS), which allows for all of the parameters in the model to change when a break occurs, does not improve the forecast performance in this data set. In fact, it leads to a deterioration in the forecast performance. This sort of finding is common in forecasting studies (see, e.g., Dacco & Satchell, 1999). Models with high dimensional parameter spaces which fit well in-sample often do not forecast well. Even the use of the SSVS prior, which is intended to help overcome the problems caused by over-parameterization, does not fully overcome this problem in this data set. 7 As one example, among many, see Stock and Watson (2002), who report improvements of this magnitude in the MSFE relative to a simple benchmark.
The results in Table 1 relating to the forecast bias shed more light on the poor forecasting performance of the model with breaks. It can be seen that, at least at short and medium horizons, all of our models produce forecasts that are approximately unbiased on average. It is clearly the variance component that is driving the MSFE. This is consistent with the idea that the predictive density for the SSVS+BREAKS model is too dispersed. After a break occurs, our model begins estimating a new VAR using only data since the break, which leads to very imprecise predictive densities. The advantages and disadvantages of such a strategy have already been discussed in the literature. A particularly lucid exposition is provided by Pastor and Stambaugh (2001) in an empirical exercise involving the equity premium: In standard approaches to models that admit structural breaks, estimates of current parameters rely on data only since the most recent estimated break. Discarding the earlier data reduces the risk of contaminating an estimate . . . with data generated under a different [process]. That practice seems prudent, but it contends with the reality that shorter histories typically yield less precise estimates. Suppose a shift in the equity premium occurred a month ago. Discarding virtually all of the historical data on equity returns would certainly remove the risk of contamination by prebreak data, but it hardly seems sensible in estimating the current equity premium. Completely discarding the pre-break data is appropriate only when the premium might have shifted to such a degree that the pre-break data are no more useful . . . than, say, pre-break rainfall data, but such a view almost surely ignores economics. (Pastor & Stambaugh, 2001, pp. 1207–1208) Our approach, by discarding all pre-break data, reduces the risk that our forecasts will be contaminated by data from a different data generating process, but our resulting predictives are less precisely estimated. At least in the present application, this latter factor is clearly causing problems. Note that other Bayesian approaches exist that allow for pre-break data to play at least some role in post-break estimation (thus potentially leading to more precise forecasts). Examples include Pesaran et al. (2006) and Koop and Potter (2007). This line of reasoning suggests that restricted variants of our models, which allow only some
Table 4
Hit rates.
h  Event  BENCHMARK VAR  MINN−VAR  SSVS−VAR  SSVS+BREAKS  SSVS+BREAKS COEFF  SSVS+BREAKS COV
0  I      0.732          0.727     0.729     0.722        0.756              0.770
0  II     0.790          0.797     0.754     0.811        0.797              0.823
0  III    0.840          0.844     0.807     0.857        0.823              0.857
3  I      0.700          0.690     0.709     0.674        0.735              0.752
3  II     0.548          0.533     0.531     0.539        0.524              0.528
3  III    0.660          0.665     0.665     0.678        0.660              0.676
7  I      0.746          0.730     0.732     0.730        0.685              0.766
7  II     0.558          0.542     0.498     0.497        0.503              0.583
7  III    0.595          0.617     0.558     0.594        0.604              0.655

parameters to change when a break occurs, may forecast better than our SSVS+BREAKS model. The final two columns of Tables 1 and 2 strongly indicate that this is the case. With our three dependent variables and three forecasting horizons, we have nine forecasting exercises with results documented in Table 1. In five of these nine exercises, the model with SSVS but breaks only in the error covariance matrix (SSVS+BREAKS COV) has the lowest MSFE. In two cases, the model with SSVS but breaks only in the VAR coefficients has the lowest MSFE, and in the remaining two cases, SSVS with no breaks yields the lowest MSFE. Thus, in every case in Table 1, a model with SSVS forecasts best (in terms of MSFE). However, the best model never allows all of the parameters to change when a break occurs. The most common finding is that a constant parameter VAR with breaks in the error covariance matrix is the best model for forecasting. This is consistent with the findings of Sims and Zha (2006), who found switches in volatility to be predominant in their empirical analysis involving US macroeconomic data. Other authors, such as Primiceri (2005) and Koop et al. (2009), working with similar data sets, have also found error covariance matrices to be changing over time. However, these latter authors find some change in the VAR coefficients as well. Our discussion so far has focussed on point forecasts. Our results so far indicate that most of the evidence for breaks seems to arise from breaks in the conditional variance (i.e., the error covariances are changing over time), not the conditional mean (i.e., the VAR coefficients are exhibiting less change over time). Loosely speaking, point forecasts largely reflect the conditional mean, not the conditional variance. Hence, the breaks in the conditional variance are not contaminating our point forecasts from the VARs without breaks. However, the log predictive likelihoods incorporate all aspects of the predictive distribution, and hence, it is possible that models with breaks will perform better in terms of predictive likelihoods than in terms of MSFEs or MAFEs. Table 3 presents some evidence for this story, in the sense that the SSVS+BREAKS model (which allows all of the parameters to change when a break occurs) performs much better in Table 3 than in Tables 1 and 2. However, overall, the log predictive likelihoods indicate that there is not much difference in forecasting performances across models. Remember that the sum of log predictive likelihoods can be interpreted in a similar fashion to the log of the marginal likelihood. If we were to create Bayes factors using the numbers in Table 3 as approximations to log marginal likelihoods, we would never find these Bayes factors to be very different from one. Hence, Table 3 does not present strong evidence that one model strongly dominates the others in terms of overall forecasting performance. However, for h = 0 and 7, the model that does best in terms of log-predictive likelihoods is the constant-parameter VAR with SSVS. For h = 3 we find that the Minnesota prior does slightly better than SSVS−VAR. As a final exercise, in order to see how well our predictive densities perform in fitting the tails of the distribution, we calculate hit rates for three different rare events. Event I is that the unemployment and inflation rates both rise. Event II is that inflation rises by more than 0.25, and Event III is that the interest
Table B.1 Results for SSVS−VAR (VAR with SSVS without breaks). Mean
Sd
P(Inc)
0.401 1.289 −0.384 −0.028 0.013 −0.013 −0.007 0.056 −0.002 0.003 0.005 0.006 0.005
0.100 0.086 0.118 0.088 0.042 0.025 0.023 0.039 0.016 0.010 0.014 0.015 0.013
0.996 1.000 0.956 0.143 0.129 0.297 0.193 0.814 0.156 0.048 0.030 0.036 0.075
0.026 −0.604 0.573 −0.008 0.056 0.789 0.001 0.111 −0.001 0.393 −0.245 0.009 −0.042
0.124 0.276 0.318 0.111 0.112 0.082 0.034 0.092 0.025 0.209 0.259 0.120 0.091
0.130 0.910 0.823 0.104 0.241 1.000 0.096 0.663 0.094 0.947 0.552 0.117 0.209
0.221 −0.148 0.076 0.022 0.018 −0.001 0.002 −0.000 −0.001 1.508 −0.513 −0.003 −0.000
0.154 0.096 0.100 0.047 0.036 0.009 0.010 0.006 0.006 0.072 0.082 0.025 0.015
0.763 0.819 0.458 0.219 0.227 0.064 0.055 0.041 0.057 1.000 1.000 0.059 0.064
3.049 1.414 3.201 1.289
0.151 0.073 0.165 0.227
1.000
DepVar: Unemp constant unemp(−1) unemp(−2) unemp(−3) unemp(−4) interest(−1) interest(−2) interest(−3) interest(−4) inflation(−1) inflation(−2) inflation(−3) inflation(−4) DepVar: Interest constant unemp(−1) unemp(−2) unemp(−3) unemp(−4) interest(−1) interest(−2) interest(−3) interest(−4) inflation(−1) inflation(−2) inflation(−3) inflation(−4) DepVar: Inflation constant unemp(−1) unemp(−2) unemp(−3) unemp(−4) interest(−1) interest(−2) interest(−3) interest(−4) inflation(−1) inflation(−2) inflation(−3) inflation(−4) Ψ -matrix unemp, unemp interest, interest inflation, inflation unemp, interest
unemp, inflation interest, inflation
Mean
Sd
P(Inc)
0.006 −0.047
0.074 0.088
0.083 0.277
rate falls by more than 0.75. A “hit” is defined as occurring if the event occurs and the point forecast lies in the region defined by the event. The “hit rate” is the proportion of hits. Table 4 presents the hit rates for our various events, models and forecast horizons. Each reported hit rate has been computed from 5000 draws, and so has a maximum asymptotic numerical standard error no greater than 0.007. These hit rates are all reasonably high, being in excess of 0.70 except at long horizons. In Table 4 we find evidence in favor of using SSVS priors in general (in the sense that the models that use SSVS tend to have the highest hit rates). In addition, Table 4 provides us with the strongest evidence so far in favor of models with breaks. That is, with one exception (forecasting inflation one year ahead), SSVS+BREAKS COV (the model with breaks only in the error covariance matrix) yields the highest hit rates. Furthermore, in many cases the hit rates arising from this model are substantially higher than those from any other model. A story consistent with this finding is that there are periods where the homoskedastic models do not model the tails of the predictive distribution well. In this section, we have examined the forecasting performances of various models using a wide variety of metrics. We have strong evidence in favor of our main story: that SSVS is of benefit in improving the forecast performance. With regard to the benefits of including structural breaks, our story is weaker (having many exceptions and qualifications). Overall, there seem to be some benefits from allowing for structural breaks, but the gains in forecasting performance are most marked when the break process is restricted to allow only some parameters to change. For instance, we often find good forecasting performance from a heteroskedastic VAR where the error covariance changes but the VAR coefficients do not. However, a completely unrestricted model where both the VAR coefficients and the error covariance change rarely forecasts well.
Table B.2 Results for SSVS+BREAKS (SSVS with breaks in both the coefficients and the error covariance matrix). Regime I Mean
Sd
P(inc)
Regime II Mean
Sd
P(inc)
Regime III Mean
Sd
P(inc)
1.442 0.831 −0.105 −0.082 0.037 −0.193 0.040 0.102 0.152 −0.037 −0.041 −0.074 0.165
0.477 0.180 0.166 0.148 0.103 0.157 0.128 0.148 0.155 0.108 0.140 0.230 0.218
0.994 1.000 0.389 0.331 0.250 0.759 0.367 0.474 0.658 0.249 0.208 0.318 0.512
0.018 0.986 −0.071 −0.042 −0.008 −0.007 −0.002 0.083 0.008 0.026 0.012 0.006 0.002
0.086 0.131 0.126 0.080 0.041 0.022 0.015 0.042 0.028 0.044 0.027 0.021 0.015
0.134 1.000 0.327 0.287 0.179 0.172 0.105 0.877 0.184 0.293 0.112 0.060 0.063
0.120 0.980 −0.017 −0.029 −0.011 −0.074 −0.024 0.080 0.022 0.057 0.033 0.013 0.003
0.168 0.116 0.075 0.078 0.047 0.064 0.065 0.077 0.043 0.066 0.062 0.047 0.035
0.430 1.000 0.151 0.195 0.161 0.657 0.271 0.603 0.300 0.508 0.276 0.147 0.130
0.018 −0.025 0.010 0.010 0.092 0.919 −0.179 0.039 0.056 0.002 0.006 0.006 0.005
0.195 0.080 0.058 0.054 0.107 0.180 0.216 0.120 0.115 0.030 0.040 0.043 0.034
0.160 0.136 0.061 0.078 0.476 1.000 0.536 0.234 0.311 0.046 0.038 0.038 0.062
0.018 −1.109 0.419 0.039 0.487 0.515 0.008 0.128 0.290 0.431 −0.105 0.016 −0.066
0.268 0.565 0.581 0.317 0.356 0.139 0.055 0.160 0.206 0.241 0.260 0.187 0.156
0.184 0.915 0.457 0.223 0.758 0.999 0.139 0.493 0.791 0.905 0.266 0.128 0.237
0.031 −0.091 0.009 −0.113 0.225 1.180 −0.111 −0.084 −0.049 0.025 0.009 0.010 −0.018
0.136 0.198 0.114 0.273 0.289 0.152 0.172 0.128 0.087 0.073 0.057 0.066 0.075
0.127 0.243 0.091 0.246 0.472 1.000 0.389 0.396 0.324 0.138 0.058 0.056 0.098
0.973 −0.159 0.013 0.014 0.006 0.000 0.000 0.008 −0.021 1.121 −0.136 −0.093 −0.033
0.386 0.077 0.048 0.035 0.022 0.025 0.026 0.041 0.054 0.164 0.206 0.133 0.078
0.949 0.893 0.150 0.134 0.106 0.150 0.121 0.155 0.251 1.000 0.421 0.428 0.305
0.298 −0.028 0.004 −0.005 0.017 0.001 0.005 −0.001 −0.003 1.522 −0.533 −0.024 −0.007
0.279 0.080 0.066 0.069 0.062 0.016 0.024 0.017 0.024 0.136 0.190 0.091 0.045
0.687 0.199 0.103 0.104 0.146 0.099 0.101 0.100 0.141 1.000 0.934 0.151 0.118
0.035 −0.021 0.015 0.016 0.004 0.002 0.000 −0.001 −0.001 1.145 −0.104 −0.042 −0.046
0.090 0.076 0.071 0.054 0.026 0.013 0.015 0.013 0.009 0.125 0.157 0.090 0.079
0.205 0.128 0.093 0.085 0.082 0.074 0.064 0.065 0.062 1.000 0.391 0.244 0.347
2.311 2.668 4.514
0.259 0.304 0.497
3.065 1.109 2.305
0.277 0.106 0.198
5.191 2.795 4.675
0.447 0.286 0.387
DepVar: Unemp Constant Unemp(−1) Unemp(−2) Unemp(−3) Unemp(−4) Interest(−1) Interest(−2) Interest(−3) Interest(−4) Inflation(−1) Inflation(−2) Inflation(−3) Inflation(−4) DepVar: Interest Constant Unemp(−1) Unemp(−2) Unemp(−3) Unemp(−4) Interest(−1) Interest(−2) Interest(−3) Interest(−4) Inflation(−1) Inflation(−2) Inflation(−3) Inflation(−4) DepVar: Inflation Constant Unemp(−1) Unemp(−2) Unemp(−3) Unemp(−4) Interest(−1) Interest(−2) Interest(−3) Interest(−4) Inflation(−1) Inflation(−2) Inflation(−3) Inflation(−4) Ψ -matrix Unemp, unemp Interest, interest Inflation, inflation
Table B.2 (continued)
Unemp, interest Unemp, inflation Interest, inflation
Regime I Mean
Sd
P(inc)
Regime II Mean
Sd
P(inc)
Regime III Mean
Sd
P(inc)
1.408 0.052 −0.039
0.429 0.203 0.208
0.987 0.179 0.288
2.020 0.009 −0.025
0.423 0.160 0.074
1.000 0.136 0.181
2.892 0.289 −0.205
0.698 0.581 0.297
0.997 0.360 0.468
4. Conclusions

In this paper, we develop a model that has two major extensions over a standard VAR. The first of these is SSVS, and the second is breaks in the model parameters. The motivation for the first is that VARs have a large number of parameters (with many of them being insignificant in empirical applications). This can lead to over-parameterization problems: over-fitting in-sample, along with imprecise inference. Our hope is that SSVS will mitigate these problems. The motivation for the second is that structural breaks are often observed to occur in macroeconomic data sets, and a forecasting procedure that ignores them could be seriously misleading. We investigate the forecasting performance of our model in an application involving a standard trivariate VAR with three popular macroeconomic variables. Our findings are mixed, but are moderately encouraging. In our recursive forecasting exercise, the use of SSVS in a VAR without breaks brings substantial improvements relative to a standard VAR and some improvements relative to a VAR with a Minnesota prior. However, the evidence that the addition of breaks is required is more nuanced. That is, the model that allows all of the parameters to change when a break occurs tends to forecast poorly. However, a model that restricts breaks to occur only in the error covariance matrix forecasts well.

Acknowledgements

The authors are grateful to two anonymous referees, and particularly to the editor of the special edition, Gael Martin, for many useful comments and suggestions that have improved this paper. The authors would like to thank the Leverhulme Trust for financial support under Grant F/00 273/J. The authors are Fellows at the Rimini Centre for Economic Analysis.
Appendix A. Technical details of the MCMC algorithms

MCMC algorithm for the VAR with the SSVS prior

Posterior computation in the VAR with the SSVS prior can be carried out using the MCMC algorithm described by George et al. (2008). This model and prior (and the associated notation) are described in Section 2.1. The MCMC algorithm involves the following posterior conditional distributions. We use a notation where θ denotes all of the parameters in the VAR with the SSVS prior and θ_(−a) denotes all of the parameters except for a.

For the VAR coefficients we have

\alpha \mid Y, \theta_{(-\alpha)} \sim N(\bar{\alpha}, \bar{V}_\alpha),   (18)

where

\bar{V}_\alpha = \left[ (\Psi \Psi') \otimes (X'X) + (DD)^{-1} \right]^{-1},
\qquad
\bar{\alpha} = \bar{V}_\alpha \left[ (\Psi \Psi') \otimes (X'X) \right] \hat{\alpha},

with \hat{\alpha} = \mathrm{vec}(\hat{A}) and \hat{A} = (X'X)^{-1} X'Y.

The conditional posterior for γ has the γ_j being independent (for all j) Bernoulli random variables:

\Pr\left[ \gamma_j = 1 \mid Y, \theta_{(-\gamma_j)} \right] = \bar{q}_j,
\qquad
\Pr\left[ \gamma_j = 0 \mid Y, \theta_{(-\gamma_j)} \right] = 1 - \bar{q}_j,   (19)

where

\bar{q}_j = \frac{ \dfrac{1}{\kappa_{1j}} \exp\left( -\dfrac{\alpha_j^2}{2\kappa_{1j}^2} \right) q_j }
                 { \dfrac{1}{\kappa_{1j}} \exp\left( -\dfrac{\alpha_j^2}{2\kappa_{1j}^2} \right) q_j
                   + \dfrac{1}{\kappa_{0j}} \exp\left( -\dfrac{\alpha_j^2}{2\kappa_{0j}^2} \right) (1 - q_j) }.
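To make this step concrete, the sketch below draws the indicators γ_j from the Bernoulli probabilities q̄_j of Eq. (19), computing q̄_j in log space to avoid numerical underflow. It is a minimal illustration rather than the authors' code; the function name draw_gamma and the example inputs are assumptions of the sketch.

```python
# Minimal sketch (not the authors' code) of the gamma_j draw in Eq. (19).
import numpy as np

def draw_gamma(alpha, kappa0, kappa1, q, rng):
    """One Gibbs step for the SSVS inclusion indicators gamma_j.

    q_bar_j of Eq. (19) is computed in log space so that the mixture
    weights do not underflow when alpha_j is far out in either tail.
    """
    # log of (1/kappa) * exp(-alpha^2 / (2 kappa^2)) times the prior weight
    log_num = -np.log(kappa1) - 0.5 * (alpha / kappa1) ** 2 + np.log(q)
    log_den = -np.log(kappa0) - 0.5 * (alpha / kappa0) ** 2 + np.log(1.0 - q)
    # q_bar_j = num / (num + den), evaluated stably
    q_bar = 1.0 / (1.0 + np.exp(log_den - log_num))
    gamma = (rng.uniform(size=alpha.shape) < q_bar).astype(int)
    return gamma, q_bar

# Illustrative usage with arbitrary values (all inputs are hypothetical):
rng = np.random.default_rng(0)
alpha = np.array([0.02, 1.10, -0.40])   # current draw of three VAR coefficients
kappa0 = np.full(3, 0.1)                # "spike" prior standard deviations
kappa1 = np.full(3, 10.0)               # "slab" prior standard deviations
q = np.full(3, 0.5)                     # prior inclusion probabilities
gamma, q_bar = draw_gamma(alpha, kappa0, kappa1, q, rng)
```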
Table B.3
Results for SSVS+BREAKS COV (SSVS with breaks in the error covariance matrix).

VAR coefficients (common to all regimes)

                       DepVar: Unemp                DepVar: Interest             DepVar: Inflation
                       Mean      Sd      P(inc)     Mean      Sd      P(inc)     Mean      Sd      P(inc)
Constant                0.385    0.089   0.995       0.008    0.064   0.071       0.236    0.125   0.867
Unemp(−1)               1.149    0.109   1.000      −0.187    0.200   0.551      −0.145    0.066   0.931
Unemp(−2)              −0.218    0.144   0.775       0.161    0.223   0.382       0.044    0.081   0.319
Unemp(−3)              −0.055    0.105   0.283      −0.002    0.077   0.083       0.053    0.060   0.467
Unemp(−4)               0.016    0.046   0.163       0.063    0.092   0.335       0.015    0.033   0.228
Interest(−1)           −0.040    0.040   0.577       0.999    0.114   1.000      −0.001    0.006   0.057
Interest(−2)           −0.014    0.031   0.266      −0.089    0.108   0.477       0.000    0.007   0.040
Interest(−3)            0.073    0.040   0.873       0.004    0.031   0.089      −0.001    0.006   0.042
Interest(−4)           −0.001    0.011   0.109      −0.004    0.023   0.087      −0.001    0.005   0.037
Inflation(−1)           0.025    0.033   0.344       0.099    0.135   0.439       1.392    0.080   1.000
Inflation(−2)           0.011    0.024   0.089      −0.030    0.127   0.158      −0.397    0.101   0.984
Inflation(−3)           0.003    0.018   0.046       0.007    0.043   0.044      −0.009    0.041   0.088
Inflation(−4)           0.002    0.013   0.063      −0.002    0.029   0.051      −0.001    0.017   0.071

Ψ-matrix (P(inc) reported for off-diagonal elements only)

                       Regime I                     Regime II                    Regime III
                       Mean      Sd      P(inc)     Mean      Sd      P(inc)     Mean      Sd      P(inc)
Unemp, unemp            2.303    0.207    –          2.692    0.266    –          5.072    0.416    –
Interest, interest      2.407    0.233    –          0.855    0.084    –          2.437    0.246    –
Inflation, inflation    4.259    0.389    –          2.085    0.209    –          4.583    0.345    –
Unemp, interest         1.014    0.392   0.953       1.309    0.457   0.977       2.823    0.629   1.000
Unemp, inflation        0.013    0.108   0.114       0.004    0.158   0.147       0.116    0.360   0.253
Interest, inflation    −0.093    0.215   0.323      −0.038    0.082   0.241      −0.027    0.126   0.202
The conditional posterior for ψ can be obtained by noting that the conditional posteriors of the ψ²_jj (for j = 1, …, n) are independent of one another, with

\psi_{jj}^2 \mid Y, \theta_{(-\psi_{jj})} \sim G\left( a_j + \frac{T}{2},\, \bar{b}_j \right),   (20)

where

\bar{b}_j =
\begin{cases}
b_1 + \dfrac{v_{11}}{2}, & \text{if } j = 1, \\[1ex]
b_j + \dfrac{1}{2}\left\{ v_{jj} - v_j' \left[ V_{j-1} + (D_j D_j)^{-1} \right]^{-1} v_j \right\}, & \text{if } j = 2, \ldots, n.
\end{cases}

The preceding equation uses notation where V = (Y − XA)'(Y − XA) has elements v_ij, v_j = (v_{1j}, …, v_{j−1,j})', and V_j is the upper left j × j block of V.

The conditional posterior of η has the conditional posteriors of the η_j (for j = 2, …, n) being independent of one another, with

\eta_j \mid Y, \theta_{(-\eta_j)} \sim N(\bar{\eta}_j, \bar{V}_j),   (21)

where

\bar{V}_j = \left[ V_{j-1} + (D_j D_j)^{-1} \right]^{-1}
\qquad \text{and} \qquad
\bar{\eta}_j = -\psi_{jj} \bar{V}_j v_j.

Finally, the conditional posterior for ω has the ω_ij being independent (for all i, j) Bernoulli random variables:

\Pr\left[ \omega_{ij} = 1 \mid Y, \theta_{(-\omega_{ij})} \right] = \bar{q}_{ij},
\qquad
\Pr\left[ \omega_{ij} = 0 \mid Y, \theta_{(-\omega_{ij})} \right] = 1 - \bar{q}_{ij},   (22)

where

\bar{q}_{ij} = \frac{ \dfrac{1}{\kappa_{1ij}} \exp\left( -\dfrac{\psi_{ij}^2}{2\kappa_{1ij}^2} \right) q_{ij} }
                    { \dfrac{1}{\kappa_{1ij}} \exp\left( -\dfrac{\psi_{ij}^2}{2\kappa_{1ij}^2} \right) q_{ij}
                      + \dfrac{1}{\kappa_{0ij}} \exp\left( -\dfrac{\psi_{ij}^2}{2\kappa_{0ij}^2} \right) (1 - q_{ij}) }.
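As an illustration of Eqs. (20) and (21), the sketch below performs the ψ²_jj and η_j draws for one sweep over the columns of Ψ (the ω_ij draws are analogous to the γ_j sketch given earlier). It is a minimal illustration rather than the authors' code: the function name, the layout of the prior inputs a, b and D, and the shape/rate parameterization of the Gamma draw are assumptions of the example.

```python
# Minimal sketch (not the authors' code) of the draws in Eqs. (20) and (21).
import numpy as np

def draw_psi_eta(Y, X, A, a, b, D, rng):
    """One Gibbs sweep over the columns of Psi.

    Y: T x n data, X: T x k regressors, A: k x n current VAR coefficients,
    a, b: length-n Gamma prior parameters, D: list where D[j] holds the prior
    standard deviations of eta_j (length j; D[0] is unused). The Gamma draw
    uses b_bar_j as a rate, matching Eq. (20).
    """
    T, n = Y.shape
    E = Y - X @ A
    V = E.T @ E                            # V = (Y - XA)'(Y - XA)
    psi = np.zeros(n)                      # psi[j] stores psi_jj (square root of the Gamma draw)
    eta = [None] * n
    for j in range(n):                     # python index j corresponds to column j+1
        if j == 0:
            b_bar = b[0] + 0.5 * V[0, 0]
        else:
            v_j = V[:j, j]                 # (v_1j, ..., v_{j-1,j})'
            V_bar = np.linalg.inv(V[:j, :j] + np.diag(1.0 / D[j] ** 2))
            b_bar = b[j] + 0.5 * (V[j, j] - v_j @ V_bar @ v_j)
        # Eq. (20): psi_jj^2 | .  ~  Gamma(a_j + T/2, rate = b_bar_j)
        psi[j] = np.sqrt(rng.gamma(a[j] + 0.5 * T, 1.0 / b_bar))
        if j > 0:
            # Eq. (21): eta_j | .  ~  N(-psi_jj * V_bar v_j, V_bar)
            eta[j] = rng.multivariate_normal(-psi[j] * V_bar @ v_j, V_bar)
    return psi, eta
```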
In summary, an MCMC algorithm for the VAR with SSVS involves sequentially drawing from Eqs. (18)–(22).

MCMC algorithm for the VAR with the SSVS prior and structural breaks

For the parameters within each regime, denoted by α^(j) and Σ^(j) for j = 1, …, M in Eq. (13), the MCMC algorithm described by George et al. (2008) applies, but using only data within the regime. Hence, we need only describe an MCMC algorithm for drawing the parameters of the break process defined by Eq. (14). Let S_i = (s_1, …, s_i)' and S^{i+1} = (s_{i+1}, …, s_T)'. Using our previously-given notation for the dependent and explanatory variables, the likelihood function for our model implies p(y_{t+h} | X_{t−1}, s_t = m) = p(y_{t+h} | X_{t−1}, θ_m) for m = 1, …, M. We use the generic notation θ_m to denote the parameters in regime m. In our VAR, θ_m = {α^(m), γ^(m), ψ^(m), η^(m), ω^(m)}, where the parameters have the same definitions as above except that we are using (m) superscripts to denote the parameters in each regime. If Θ = (θ_1', …, θ_M')', then the algorithm of Chib (1998) proceeds by sequentially drawing from

\Theta \mid Y, S_T, q,   (23)

S_T \mid Y, \Theta, q   (24)

and

q \mid Y, \Theta, S_T.   (25)

We have already described how to draw from Eq. (23).
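To fix ideas, the overall three-block structure of Eqs. (23)–(25) can be sketched as follows. This is a minimal skeleton rather than the authors' code: draw_regime_parameters and draw_states are placeholder callables standing in for the within-regime SSVS draws and for the state draw described next, and the remaining names (change_point_gibbs, n_draws, delta1, delta2) are illustrative.

```python
# Minimal skeleton (not the authors' code) of the three-block sampler in Eqs. (23)-(25).
import numpy as np

def change_point_gibbs(Y, X, M, delta1, delta2, n_draws,
                       draw_regime_parameters, draw_states, rng):
    T = Y.shape[0]
    s = 1 + (M * np.arange(T)) // T      # crude initial split into M contiguous regimes
    q = 0.5
    for _ in range(n_draws):
        # Eq. (23): SSVS draws for each regime, using only that regime's data.
        theta = [draw_regime_parameters(Y[s == m], X[s == m], rng)
                 for m in range(1, M + 1)]
        # Eq. (24): forward-filter, backward-sample the states s_1, ..., s_T.
        s = draw_states(Y, X, theta, q, rng)
        # Eq. (25): Beta conditional for q, with parameters as given in the text.
        q = rng.beta(delta1 + T - s[-1], delta2 + s[-1] - 1)
        yield theta, s, q
```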
Table C.1
Posterior results for the SSVS+BREAKS model with a subjective prior.

DepVar: Unemp          Regime I                     Regime II
                       Mean      Sd      P(inc)     Mean      Sd      P(inc)
Constant                1.254    0.526   0.964       0.135    0.123   0.632
Unemp(−1)               0.915    0.196   1.000       1.223    0.094   1.000
Unemp(−2)              −0.179    0.201   0.561      −0.180    0.176   0.581
Unemp(−3)              −0.079    0.134   0.368      −0.113    0.115   0.555
Unemp(−4)               0.006    0.061   0.134      −0.000    0.029   0.094
Interest(−1)           −0.058    0.122   0.287       0.002    0.011   0.033
Interest(−2)            0.024    0.114   0.208       0.002    0.012   0.039
Interest(−3)            0.105    0.160   0.425       0.027    0.025   0.360
Interest(−4)            0.162    0.172   0.575       0.002    0.011   0.036
Inflation(−1)          −0.034    0.113   0.207       0.009    0.015   0.079
Inflation(−2)          −0.047    0.161   0.240       0.005    0.015   0.053
Inflation(−3)          −0.059    0.238   0.337       0.002    0.012   0.037
Inflation(−4)           0.140    0.218   0.408       0.001    0.010   0.026

DepVar: Interest       Regime I                     Regime II
                       Mean      Sd      P(inc)     Mean      Sd      P(inc)
Constant                0.018    0.281   0.328       0.039    0.173   0.270
Unemp(−1)              −0.040    0.104   0.224      −0.967    0.286   1.000
Unemp(−2)               0.020    0.098   0.177       0.635    0.447   0.806
Unemp(−3)               0.014    0.088   0.243       0.080    0.274   0.366
Unemp(−4)               0.133    0.116   0.684       0.223    0.217   0.632
Interest(−1)            0.844    0.182   1.000       0.673    0.071   1.000
Interest(−2)           −0.183    0.220   0.529      −0.002    0.036   0.100
Interest(−3)            0.030    0.116   0.216       0.234    0.088   0.938
Interest(−4)            0.040    0.107   0.220       0.012    0.048   0.137
Inflation(−1)           0.004    0.054   0.114       0.463    0.177   0.986
Inflation(−2)           0.012    0.073   0.134      −0.227    0.298   0.525
Inflation(−3)           0.006    0.063   0.117       0.131    0.296   0.403
Inflation(−4)           0.005    0.047   0.103      −0.218    0.200   0.689

DepVar: Inflation      Regime I                     Regime II
                       Mean      Sd      P(inc)     Mean      Sd      P(inc)
Constant                0.774    0.460   0.862       0.110    0.141   0.471
Unemp(−1)              −0.130    0.091   0.753      −0.037    0.081   0.220
Unemp(−2)               0.013    0.057   0.173       0.015    0.067   0.127
Unemp(−3)               0.015    0.040   0.155       0.001    0.040   0.086
Unemp(−4)               0.006    0.022   0.086       0.013    0.039   0.110
Interest(−1)           −0.000    0.019   0.060       0.000    0.009   0.021
Interest(−2)           −0.000    0.019   0.057       0.002    0.009   0.026
Interest(−3)            0.002    0.029   0.070      −0.001    0.009   0.025
Interest(−4)           −0.015    0.045   0.137      −0.005    0.010   0.037
Inflation(−1)           1.214    0.206   1.000       1.551    0.074   1.000
Inflation(−2)          −0.219    0.257   0.563      −0.559    0.090   0.999
Inflation(−3)          −0.097    0.140   0.437      −0.004    0.034   0.092
Inflation(−4)          −0.021    0.066   0.215      −0.001    0.017   0.058

Ψ-matrix (P(inc) reported for off-diagonal elements only)
                       Regime I                     Regime II
                       Mean      Sd      P(inc)     Mean      Sd      P(inc)
Unemp, unemp            2.315    0.277    –          3.675    0.206    –
Interest, interest      2.535    0.307    –          1.397    0.080    –
Inflation, inflation    4.368    0.533    –          2.961    0.166    –
Unemp, interest         1.055    0.482   0.942       1.759    0.301   1.000
Unemp, inflation        0.019    0.226   0.298       0.038    0.170   0.279
Interest, inflation    −0.001    0.200   0.293      −0.038    0.082   0.251
Simulation from Eq. (24) is done using a method developed by Chib (1996). This involves noting that

p(S_T \mid Y, \Theta, q) = p(s_T \mid Y, \Theta, q)\, p(s_{T-1} \mid Y, S^{T}, \Theta, q) \cdots p(s_t \mid Y, S^{t+1}, \Theta, q) \cdots p(s_1 \mid Y, S^{2}, \Theta, q).   (26)

Draws of s_t can be obtained using the fact (see Chib, 1996) that

p(s_t \mid Y, S^{t+1}, \Theta, q) \propto p(s_t \mid y^t, \Theta, q)\, p(s_{t+1} \mid s_t, q).   (27)

Since p(s_{t+1} | s_t, q) is the transition probability and the integrating constant can easily be obtained (conditional on the value of s_{t+1}, s_t can take on only two values), we only need to worry about p(s_t | y^t, Θ, q). Chib (1996) recommends the following recursive strategy. Given knowledge of p(s_{t−1} = m | X_{t−1}, Θ, q) (and remembering that, in the VAR case, X_{t−1} contains an intercept and lags of the dependent variable), we can obtain

p(s_t = j \mid y^t, \Theta, q) = \frac{ p(s_t = j \mid X_{t-1}, \Theta, q)\, p(y_{t+h} \mid X_{t-1}, \theta_j) }{ \sum_{m=1}^{M} p(s_t = m \mid X_{t-1}, \Theta, q)\, p(y_{t+h} \mid X_{t-1}, \theta_m) },   (28)

using the fact that

p(s_t = j \mid X_{t-1}, \Theta, q) = q \cdot p(s_{t-1} = j - 1 \mid X_{t-1}, \Theta, q) + (1 - q) \cdot p(s_{t-1} = j \mid X_{t-1}, \Theta, q),   (29)

for j = 1, …, M.
Thus, the algorithm proceeds by calculating Eq. (27) for every time period, using Eqs. (28) and (29), beginning at t = 1 and going forward in time. Then the states themselves are drawn using Eq. (26), beginning at period T and going backwards. To be precise, the algorithm begins with p(s_1 = 1 | X_0, Θ, q) = 1 and then uses Eq. (29) to calculate p(s_t = k | X_{t−1}, Θ, q) for t = 2, …, T. This allows everything in Eq. (28) to be calculated except for p(y_{t+h} | X_{t−1}, θ_m). However, this is just the normal p.d.f. from the SSVS–VAR likelihood for the t-th observation (evaluated at the MCMC draw of θ_m). Thus, Eq. (28) can be evaluated and substituted into Eq. (27) to obtain the discrete p.d.f. p(s_t | Y, S^{t+1}, Θ, q) for t = 1, …, T. Draws from these p.d.f.s can be used in the ordering of Eq. (26) to draw the states (conditional on all of the other parameters in the model).

Finally, derivations in McCulloch and Tsay (1993) imply that the conditional posterior for q is Beta:

q \mid Y, s_T \sim B(\bar{\delta}_1, \bar{\delta}_2),

where \bar{\delta}_1 = \delta_1 + T - s_T and \bar{\delta}_2 = \delta_2 + s_T - 1.

With regard to prediction, with the exception of the model with structural breaks, the strategy outlined in the body of the text using Eqs. (15) and (16) is applied. When breaks are present, the predictive density, conditional on the MCMC draws of the parameters, is a mixture of Eq. (16) (with probability 1 − q) and the prior-based quantity (with probability q). Since the posterior for q indicates that it is very small, this prior quantity has little effect, but it cannot be evaluated analytically due to the hierarchical structure of the prior. For ease of computation, we approximate it using f_N(y*_τ | \tilde{\mu}, \tilde{\Sigma}), where \tilde{\mu} and \tilde{\Sigma} are the mean vector and covariance matrix of a large number of draws from the prior predictive density.
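The following sketch (again, an illustration rather than the authors' code) implements the forward-filtering, backward-sampling logic of Eqs. (26)–(29) for the non-decreasing regime indicators, i.e., a concrete version of the draw_states step in the earlier skeleton. To keep it self-contained, it takes a precomputed T × M matrix of log densities loglik[t, m] = log p(y_{t+h} | X_{t−1}, θ_{m+1}) as input rather than (Y, X, Θ) directly; the final regime is handled by the same recursion in Eq. (29) together with the normalization in Eq. (28).

```python
# Minimal sketch (not the authors' code) of the state draw in Eqs. (26)-(29).
import numpy as np

def draw_states(loglik, q, rng):
    """Draw s_1, ..., s_T (values 1, ..., M) for the change-point model."""
    T, M = loglik.shape
    # Row-wise rescaling of the likelihoods; the scale cancels in Eq. (28).
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))

    # Forward pass: filtered probabilities p(s_t = j | y^t, Theta, q),
    # starting from p(s_1 = 1 | X_0, Theta, q) = 1 and using Eq. (29).
    filt = np.zeros((T, M))
    pred = np.zeros(M)
    pred[0] = 1.0
    for t in range(T):
        if t > 0:
            prev = filt[t - 1]
            pred = (1.0 - q) * prev
            pred[1:] += q * prev[:-1]          # Eq. (29)
        num = pred * lik[t]
        filt[t] = num / num.sum()              # Eq. (28)

    # Backward pass: draw s_T from filt[T-1], then use Eq. (27) to draw
    # s_{T-1}, ..., s_1 given the state just drawn, in the ordering of Eq. (26).
    s = np.zeros(T, dtype=int)
    s[T - 1] = rng.choice(M, p=filt[T - 1]) + 1
    for t in range(T - 2, -1, -1):
        k = s[t + 1]
        # Given s_{t+1} = k, s_t can only equal k (stay) or k - 1 (a break occurred at t+1).
        w_stay = (1.0 - q) * filt[t, k - 1]
        w_move = q * filt[t, k - 2] if k > 1 else 0.0
        p_stay = w_stay / (w_stay + w_move)
        s[t] = k if rng.uniform() < p_stay else k - 1
    return s
```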
Table D.1
Gelman and Rubin (1992) convergence diagnostics for SSVS+BREAKS.

                       DepVar: Unemp               DepVar: Interest            DepVar: Inflation
                       Reg. I   Reg. II  Reg. III  Reg. I   Reg. II  Reg. III  Reg. I   Reg. II  Reg. III
Constant               1.01     1.00     1.01      1.00     1.01     1.01      1.02     1.00     1.01
Unemp(−1)              1.00     1.00     1.01      1.02     1.04     1.02      1.01     1.01     1.03
Unemp(−2)              1.00     1.00     1.01      1.00     1.01     1.03      1.01     1.01     1.04
Unemp(−3)              1.01     1.00     1.00      1.02     1.01     1.00      1.00     1.00     1.00
Unemp(−4)              1.01     1.00     1.02      1.00     1.02     1.00      1.00     1.00     1.02
Interest(−1)           1.00     1.02     1.00      1.00     1.06     1.00      1.00     1.01     1.00
Interest(−2)           1.00     1.01     1.00      1.00     1.01     1.00      1.00     1.00     1.01
Interest(−3)           1.00     1.00     1.00      1.00     1.00     1.00      1.01     1.02     1.00
Interest(−4)           1.00     1.01     1.00      1.00     1.02     1.00      1.01     1.01     1.02
Inflation(−1)          1.00     1.01     1.01      1.00     1.02     1.01      1.00     1.00     1.00
Inflation(−2)          1.01     1.01     1.00      1.00     1.01     1.01      1.00     1.00     1.00
Inflation(−3)          1.00     1.00     1.03      1.00     1.02     1.02      1.00     1.00     1.00
Inflation(−4)          1.01     1.02     1.01      1.01     1.01     1.05      1.00     1.00     1.00

Ψ-matrix               Regime I  Regime II  Regime III
Unemp, unemp           1.00      1.00       1.00
Interest, interest     1.00      1.01       1.00
Inflation, inflation   1.00      1.00       1.00
Unemp, interest        1.00      1.00       1.00
Unemp, inflation       1.00      1.00       1.00
Interest, inflation    1.00      1.00       1.00
Appendix B. Additional empirical results

We begin this appendix with posterior results using the full sample for three of the models. We present the posterior mean and standard deviation of each parameter, along with the probability of inclusion (i.e., the posterior probability that γ_j = 1 for each of the VAR coefficients, and the posterior probability that ω_ij = 1 for the off-diagonal elements of Ψ), which is denoted by P(inc) in the tables (Tables B.1–B.3).

Appendix C. Prior sensitivity analysis

We have carried out a prior sensitivity analysis in many dimensions in order to confirm that our results are robust to reasonable changes in the prior. We have focussed on prior sensitivity in our most general model (SSVS+BREAKS), reasoning that if there is sensitivity to the prior, it is most likely to show up in the model with the most parameters. For the sake of brevity, we do not present all of our prior sensitivity results, since they mostly show very little sensitivity. For instance, changing the B(0.01, 1.0) prior for q to B(0.001, 10) – a very substantial change – results in posterior results that are virtually identical to those in Table B.2, and hence are not worth reporting.

Unsurprisingly, since there are so many VAR coefficients, the greatest sensitivity is found with regard to their prior. However, even here the empirical results are qualitatively little changed, even by large changes in the prior. As an illustration of this, Table C.1 contains results which are comparable to those in Table B.2, but where we replace the semi-automatic default prior of George et al. (2008) with a subjective prior that sets the variances in the normal mixture priors to κ_0j = κ_0ij = 0.01 and κ_1j = κ_1ij = 1.0. These values reflect our prior beliefs about what "big" and "small" prior variances in the SSVS prior should be, but a comparison with the semi-automatic default prior shows that (especially for the VAR coefficients) the subjective prior is much less informative. Due to the issues associated with Bartlett's paradox, it is not surprising that this much more non-informative prior tends to select
more parsimonious models. In particular, with this prior we find less evidence for two breaks occurring. Accordingly, in Table C.1 we present posterior results for the VAR coefficients and Ψ for only two regimes. Despite this change, the coefficients in Table C.1 still tend to be very similar to those in Table B.2, and features such as impulse responses are quite similar. Thus, although we find some sensitivity to large changes in the SSVS prior, the implications for the main qualitative conclusions of the paper are small.

Appendix D. MCMC convergence diagnostics

Table D.1 presents the commonly-used MCMC convergence diagnostic of Gelman and Rubin (1992) for the posterior mean of every VAR coefficient and element of Ψ for the most general model used in the paper (SSVS+BREAKS), using the full sample, h = 0 and 25,000 replications (with the first 5000 discarded as burn-in replications). Table B.2 presents the posterior means and standard deviations of the parameters of this model. This convergence diagnostic involves running the algorithm many times with over-dispersed starting values, and then comparing the within-sequence variation to the between-sequence variation. The precise formula is given by Gelman and Rubin (1992) and Koop (2003, p. 68). If the sampler has converged, then these should be the same and the convergence diagnostic should be one. The results in Table D.1 provide strong evidence that the sampler has converged.
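As an illustration of the diagnostic being described, the sketch below computes a common simplified version of the Gelman and Rubin (1992) statistic for a single scalar parameter from several chains started from over-dispersed values. It is not the exact formula used in the paper (which follows Gelman and Rubin, 1992, and Koop, 2003); the function name and array layout are assumptions of the example.

```python
# Minimal sketch of a basic Gelman-Rubin diagnostic for one scalar parameter.
import numpy as np

def gelman_rubin(draws):
    """draws is an (m, n) array: m chains of n retained draws each."""
    m, n = draws.shape
    chain_means = draws.mean(axis=1)
    W = draws.var(axis=1, ddof=1).mean()      # average within-chain variance
    B = n * chain_means.var(ddof=1)           # between-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_plus / W)              # close to 1 when the chains agree

# Example: two well-mixed chains of white noise give a value very close to 1.
rng = np.random.default_rng(1)
print(gelman_rubin(rng.standard_normal((2, 20000))))
```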
References

Andersson, M., & Karlsson, S. (2008). Bayesian forecast combination for VAR models. In S. Chib, W. Griffiths, G. Koop, & D. Terrell (Eds.), Bayesian econometrics — Advances in econometrics, volume 23 (pp. 501–524). Emerald Group Publishing.
Ang, A., & Bekaert, G. (2002). Regime switches in interest rates. Journal of Business and Economic Statistics, 20, 163–182.
Canova, F. (2007). Methods for applied macroeconomic research. Princeton: Princeton University Press.
Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics, 75, 79–97.
Chib, S. (1998). Estimation and comparison of multiple change-point models. Journal of Econometrics, 86, 221–241.
Clements, M., & Hendry, D. (1998). Forecasting economic time series. Cambridge: Cambridge University Press.
Clements, M., & Hendry, D. (1999). Forecasting non-stationary economic time series. Cambridge: The MIT Press.
Cogley, T., & Sargent, T. (2001). Evolving post-World War II inflation dynamics. NBER Macroeconomics Annual, 16, 331–373.
Cogley, T., & Sargent, T. (2005). Drifts and volatilities: Monetary policies and outcomes in the post WWII US. Review of Economic Dynamics, 8, 262–302.
Corradi, V., & Swanson, N. (2006). Predictive density evaluation. In C. W. J. Granger, G. Elliott, & A. Timmermann (Eds.), Handbook of economic forecasting (pp. 197–284). Amsterdam: Elsevier.
Dacco, R., & Satchell, S. (1999). Why do regime-switching models forecast so poorly? International Journal of Forecasting, 18, 1–16.
Diebold, F., & Pauly, P. (1990). The use of prior information in forecast combination. International Journal of Forecasting, 6, 503–508.
Doan, T., Litterman, R., & Sims, C. (1984). Forecasting and conditional projection using realistic prior distributions. Econometric Reviews, 3, 1–100.
Gelman, A., & Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
George, E., & McCulloch, R. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 85, 398–409.
George, E., & McCulloch, R. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339–373.
George, E., Sun, D., & Ni, S. (2008). Bayesian stochastic search for VAR model restrictions. Journal of Econometrics, 142, 553–580.
Geweke, J., & Amisano, J. (2007). Hierarchical Markov normal mixture models with applications to financial asset returns. Manuscript available at: http://www.biz.uiowa.edu/faculty/jgeweke/papers/paperA/paper.pdf.
Geweke, J., & Whiteman, C. (2006). Bayesian forecasting. In G. Elliott, C. W. J. Granger, & A. Timmermann (Eds.), Handbook of economic forecasting (Chapter 1). Amsterdam: Elsevier.
Hall, A., Anderson, H., & Granger, C. (1992). A cointegration analysis of treasury bill yields. Review of Economics and Statistics, 74, 116–126.
Ishwaran, H., & Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Annals of Statistics, 33, 730–773.
Kadiyala, K., & Karlsson, S. (1993). Forecasting with generalized Bayesian vector autoregressions. Journal of Forecasting, 12, 365–378.
Kim, C.-J., Nelson, C., & Piger, J. (2004). The less volatile US economy: A Bayesian investigation of timing, breadth, and potential explanations. Journal of Business and Economic Statistics, 22, 80–93.
Koop, G. (2003). Bayesian econometrics. UK: John Wiley and Sons.
Koop, G., León-González, R., & Strachan, R. W. (2009). On the evolution of monetary policy. Journal of Economic Dynamics and Control, 33, 997–1017.
Koop, G., & Potter, S. (2001). Are apparent findings of nonlinearity due to structural instability in economic time series? The Econometrics Journal, 4, 37–55.
Koop, G., & Potter, S. (2004). Forecasting in dynamic factor models using Bayesian model averaging. The Econometrics Journal, 7, 550–565.
Koop, G., & Potter, S. (2007). Estimation and forecasting in models with multiple breaks. Review of Economic Studies, 74, 763–789.
Koop, G., & Potter, S. (2009). Prior elicitation in multiple change-point models. International Economic Review, 50(3), 751–772.
Korobilis, D. (2008). Forecasting in vector autoregressions with many predictors. In S. Chib, W. Griffiths, G. Koop, & D. Terrell (Eds.), Bayesian econometrics — Advances in econometrics, volume 23 (pp. 403–432). Emerald Group Publishing.
Litterman, R. (1986). Forecasting with Bayesian vector autoregressions—Five years of experience. Journal of Business and Economic Statistics, 4, 25–38.
Maheu, J., & Gordon, S. (2008). Learning, forecasting and structural breaks. Journal of Applied Econometrics, 23(5), 553–583.
Maheu, J., & McCurdy, T. (2009). How useful are historical data for forecasting the long-run equity return distribution? Journal of Business and Economic Statistics, 27, 95–112.
Marcellino, M., Stock, J., & Watson, M. (2006). A comparison of direct and iterated multistep AR methods for forecasting macroeconomic series. Journal of Econometrics, 135, 499–526.
Martin, G. M. (2000). US deficit sustainability: A new approach based on multiple endogenous breaks. Journal of Applied Econometrics, 15, 83–105.
McCulloch, R., & Tsay, R. (1993). Bayesian inference and prediction for mean and variance shifts in autoregressive time series. Journal of the American Statistical Association, 88, 968–978.
Pastor, L., & Stambaugh, R. (2001). The equity premium and structural breaks. Journal of Finance, 56, 1207–1239.
Pesaran, M. H., Pettenuzzo, D., & Timmermann, A. (2006). Forecasting time series subject to multiple structural breaks. Review of Economic Studies, 73, 1057–1084.
Primiceri, G. (2005). Time varying structural vector autoregressions and monetary policy. Review of Economic Studies, 72, 821–852.
Sims, C. (1980). Macroeconomics and reality. Econometrica, 48, 1–48.
Sims, C., & Zha, T. (2006). Were there regime switches in US monetary policy? American Economic Review, 96, 54–81.
Stock, J., & Watson, M. (1996). Evidence on structural instability in macroeconomic time series relations. Journal of Business and Economic Statistics, 14, 11–30.
Stock, J., & Watson, M. (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20(2), 147–162.