Journal of Economic Dynamics & Control 30 (2006) 1063–1079 www.elsevier.com/locate/jedc
Assessing Markov chain approximations: A minimal econometric approach

Pedro Silos
Department of Economics, University of Iowa, USA

Received 11 October 2004; accepted 11 April 2005. Available online 20 July 2005.
Abstract

Markov chain approximations of continuous state-space processes are common in economic models. Increasing the dimension of the state space is costly; this paper develops a procedure to evaluate the tradeoff between the number of dimensions devoted to modelling dynamics and those devoted to modelling the contemporaneous state space. The methodology borrows from a previous literature which formalizes inference within calibrated models. Approximations for post-war real per capita US consumption growth are compared. Standard business cycle theory is used to generate needed information regarding state transition probabilities. In this application it is useful to trade some accuracy in defining the state space for more realistic dynamics.
© 2005 Elsevier B.V. All rights reserved.

JEL classification: C11; C15; C52
Keywords: Markov chain approximations; Bayesian model comparison
[Author's present address: Research Department, Federal Reserve Bank of Atlanta, 1000 Peachtree St. NE, Atlanta, GA 30309, USA. Tel.: +1 404 498 8630; fax: +1 404 498 8956. E-mail address: [email protected].]

1. Introduction

Markov chain approximations of macroeconomic variables have been used frequently in dynamic stochastic general equilibrium models. Examples include
approximations of income processes as in Aiyagari (1994), who uses a seven-state Markov chain to approximate labor endowment. Also, many papers in the asset pricing literature approximate real per capita consumption growth (hereafter, consumption growth), from Mehra and Prescott (1985) to Otrok et al. (2002). The reasons for using such approximations are usually computational, in that subsequent calculations are more convenient to carry out on a discrete grid. For example, in consumption-based asset pricing models, some preference specifications are simple enough that returns and prices can be calculated even if consumption growth is described as a continuous state autoregression; for recursive preferences, discretization decreases the computational burden.

A well-known method for computing Markov chain approximations of continuous state space univariate or vector first-order autoregressions is due to Tauchen (1986). Although not the only one available (see e.g. Tauchen and Hussey, 1991), this method is popular due to its simplicity. If a given time series can be modelled as an autoregressive process, there are two choices to be made when approximating it as a Markov chain. One pertains to the lag length: the optimal model might have p lags, versus a model with p+1 or p-1 lags. The other is the dimension of the state space for the variable. Let N denote the number of states the variable can take. Choices of N and p imply a transition probability matrix of dimension N^p x N^p, which can be quite large even for moderate N if p > 1.

Table 1 and Figs. 1 and 2 illustrate the tradeoffs involved in modelling consumption growth. The first column in Table 1 shows the sample values for the mean, variance and the first three autocorrelations of the observed consumption growth series. The second and third columns show the same moments obtained by simulating a long time series from a transition matrix calculated for consumption growth using Tauchen's method for the p = 2, N = 3 case (second column) and the p = 1, N = 9 case (third column). Fig. 1 plots the unconditional density (left axis) for a continuous state space AR(3) model against the unconditional mass function (right axis) obtained from the transition matrix for the p = 2, N = 3 Markov chain approximation, while Fig. 2 compares the same continuous state space density with a Markov chain approximation where p = 1, N = 9. Although the number of states (N) is rather large in the second case, the model does not match the second and third autocorrelations closely.
Table 1
Consumption growth moments: models and data

                    Actual data     p = 2, N = 3    p = 1, N = 9
Mean                1.005           1.005           1.005
Variance            2.9 x 10^-5     3.5 x 10^-5     3 x 10^-5
First autocorr.     0.22            0.19            0.22
Second autocorr.    0.14            0.10            0.05
Third autocorr.     0.16            0.03            0.01
[Figure: densities, continuous state space vs. MC approximation (p = 2, N = 3); continuous density on the left axis, discrete mass function on the right axis.]
Fig. 1. p = 2, N = 3 vs. continuous state space AR(3).
[Figure: densities, continuous state space vs. MC approximation (p = 1, N = 9); continuous density on the left axis, discrete mass function on the right axis.]
Fig. 2. p = 1, N = 9 vs. continuous state space AR(3).
However, with a clearly worse approximation to the unconditional density, the Markov chain model with p = 2 comes closer to matching the second and third autocorrelations, but to keep the dimension of the transition matrix fixed, N has to be decreased.

With N^p limited by computational considerations, how should one choose N and p? Clearly, this depends upon the economic questions being addressed. More detail in the unconditional marginal distribution of the random variable can be captured with greater N; greater p is necessary to capture more features of the dynamics of the process. Inasmuch as both volatility (unconditional variance) and persistence (first- and higher-order dynamics) are often of interest in dynamic macroeconomics, it is useful to have a formal procedure for assessing the tradeoffs involved. In general, the classical statistical procedure for doing this would involve
assuming that one of the candidate approximations represents the 'correct' specification,¹ and comparing the maximized value of the likelihood under that specification to those under other specifications. Such a procedure is not feasible here: the likelihood of a continuous state-space time series is zero under any Markov chain approximation. This state of affairs is common in the calibrated-model literature, and several methods have been proposed recently for introducing formalism into model assessment procedures (see, e.g., the survey by Canova and Ortega, 2000). Traditionally, studies that have used calibrated models to examine economic issues have been informal in assessing the fit of those models, and their conclusions are usually based on some implicit measure of distance between some of the moments of model-simulated data and the corresponding sample moments from the observed data (a prototypical example is Cooley and Prescott (1995)). Following the notation in Canova and Ortega (2000), let y_t represent the data and x_t = f(γ, z_t) a simulated value of the corresponding vector from the model, which is a function of a set of parameters γ and a driving process z_t. Besides the informal approach of comparing moments, examples in the literature are Watson (1993), who focuses on the properties of the implicit 'residual term' between the observed variable y_t and the simulated values x_t; Gregory and Smith (1991), who take sampling uncertainty into account in their treatment of z_t, but treat the data as given; Cechetti et al. (1994), who base their estimation and calibration method on the sampling variability in the data, treating γ and the parameters of the z_t process as known; and finally a Bayesian approach developed by DeJong et al. (1996) and Geweke (1999), who treat model and data uncertainty symmetrically. The 'minimal econometric approach' developed by Geweke permits the comparison of multiple models without the need to evaluate the likelihood functions of the models entertained. The methodology only requires being able to simulate from a given model, i.e. to generate a sequence of random draws from the implied distribution of a model. The approach, described in more detail in Section 2 of the paper, is implemented in Section 3 to assess alternative Markov chain approximations for the consumption growth series, which is used routinely in asset pricing models; the assessment provided here may thus be of broader use.

¹ An exception within the classical framework for comparing misspecified models using a likelihood-based approach is Vuong (1989). In the Bayesian framework, Fernández-Villaverde and Rubio-Ramírez (2004) also allow for a likelihood-based comparison of misspecified models.
2. Model comparison

To present the details of the comparison procedure, I borrow heavily from Geweke (1999). Let A_1 and A_2 be any two models to be compared. These can be general equilibrium models or Markov chain approximations. What is common between these models is that they do not describe the data in an exact way: variables that are sometimes approximated as Markov chains can take more than a finite
number of states. Therefore, the value of the likelihood of a discrete specification for a continuous state-space time series will be zero. Let h denote the vector of moments (or functions of moments) that these models claim to describe; how the models perform in describing h will be the basis of comparison between A_1 and A_2.² This vector of moments is a function of a model's set of underlying parameters. However, this set might not be the same across the models considered, and therefore we need a common framework (h) that enables the researcher to make the comparison. The specification of a prior distribution over the set of underlying parameters induces a density over h. Denoting this density p(h | A_j) within model A_j, realizations from this distribution are obtained via simulation. Let model D (D for data) be a statistical model, a reduced-form model independent of A_1 and A_2 that will serve as a link between these models and the data. It will also be endowed with a prior distribution p(h | D), and the assumption of normality in the errors implies a likelihood function for the data, denoted p(y | h, D). The distribution of interest for the vector h in the context of model D is the posterior distribution p(h | y, D), given by

p(h \mid y, D) \propto p(y \mid h, D)\, p(h \mid D).    (1)

² It is worth noting that in likelihood-based comparisons, there is no need to specify a vector of moments.
Simulated values of h from this distribution will be generated via a posterior simulator described below. The final goal is to obtain a numerical measure that allows the comparison between any two given models. This measure will be given by p(A_1 | y, D)/p(A_2 | y, D), the ratio of model probabilities given the observed data and the model D used to conduct the comparison. Conditional on model D, that quotient can be interpreted as a posterior odds ratio. The remainder of the section derives this numerical measure, relating it to draws of h from models A_1 and A_2 and from the posterior distribution of model D, p(h | y, D), and also discusses some computational issues. Let me begin with the following two assumptions:

Assumption 1. p(y | h, A_1, D) = p(y | h, A_2, D) = p(y | h, D).

This assumption implies that A_1 and A_2 bring no new information about the observables y if h is known in the context of model D, since the purpose of those models is precisely to describe h. A straightforward result from Assumption 1 is

p(A_1 \mid h, y, D) = \frac{p(y \mid h, A_1, D)\, p(A_1 \mid h, D)}{p(y \mid h, D)} = p(A_1 \mid h, D).    (2)

An equivalent result holds for model A_2. The sequence of observables y is not needed to make any comparison between A_1 and A_2 if the vector h is already known in the context of model D. That is, given model D, h is sufficient to distinguish A_1 from A_2. The second assumption we need is the following:

Assumption 2. p(h | D) ∝ constant, p(h | A_1, D) = p(h | A_1), and p(h | A_2, D) = p(h | A_2).
This assumption means that, prior to observing data, model D brings no information about h, either directly through the prior distribution of h given model D, or indirectly through either A_1 or A_2. Now note that under Assumptions 1 and 2, we have

p(A_1 \mid y, D) = \int p(A_1 \mid h, y, D)\, p(h \mid y, D)\, dh = \int p(A_1 \mid h, D)\, p(h \mid y, D)\, dh
               = \int \frac{p(h \mid A_1, D)\, p(A_1 \mid D)}{p(h \mid D)}\, p(h \mid y, D)\, dh
               = \frac{p(A_1 \mid D)}{p(h \mid D)} \int p(h \mid A_1)\, p(h \mid y, D)\, dh.    (3)

The last integral, ∫ p(h | A_1) p(h | y, D) dh, is the expectation of p(h | A_1) under the posterior distribution p(h | y, D), which is proportional to the likelihood because of Assumption 2. Although the support of both densities is the same, this quantity will be small whenever simulations from model D and simulations from model A_1 congregate in different regions of that support. The farther away these regions are from one another, the smaller the quantity; the greater the overlap between these regions, the greater the quantity. An analogous expression can be derived for model A_2, and therefore a posterior odds ratio can be constructed:

\frac{p(A_1 \mid y, D)}{p(A_2 \mid y, D)} = \frac{p(A_1 \mid D)}{p(A_2 \mid D)} \cdot \frac{p(y \mid A_1, D)}{p(y \mid A_2, D)},    (4)

which, by virtue of (3), becomes

\frac{p(A_1 \mid y, D)}{p(A_2 \mid y, D)} = \frac{p(A_1 \mid D)}{p(A_2 \mid D)} \cdot \frac{\int p(h \mid A_1)\, p(h \mid y, D)\, dh}{\int p(h \mid A_2)\, p(h \mid y, D)\, dh}.    (5)

Comparing (5) and (4), the key ingredient in the model comparison is the Bayes factor, a quotient of marginal likelihoods:

\frac{p(y \mid A_1, D)}{p(y \mid A_2, D)} = \frac{\int p(h \mid A_1)\, p(h \mid y, D)\, dh}{\int p(h \mid A_2)\, p(h \mid y, D)\, dh}.    (6)

Given that the ratio of prior odds is set to one throughout the applications in the paper, this ratio provides a numerical value for comparing models A_1 and A_2. The higher the Bayes factor, the closer simulations from model A_1 are to simulations from model D, relative to model A_2, and vice versa. Given simulations {h^{(s_1)}}_{s_1=1}^{M_1} and {h^{(s_2)}}_{s_2=1}^{M_2} from models A_1 and A_2 respectively, I can estimate the Bayes factor by

\frac{p(y \mid A_1, D)}{p(y \mid A_2, D)} \approx \frac{(1/M_1) \sum_{s_1=1}^{M_1} p(h^{(s_1)} \mid y, D)}{(1/M_2) \sum_{s_2=1}^{M_2} p(h^{(s_2)} \mid y, D)}.    (7)
The computation of the marginal likelihoods amounts to measuring the overlap between the densities from models A_1 and A_2 and the posterior density from model D. This is done in two steps. First, a long sequence of draws is obtained from both p(h | A_1) and p(h | A_2). Second, evaluate the posterior density p(h | y, D) at the draws
obtained from p(h | A_j) and compute the average of those evaluations. This average is the approximate marginal likelihood of model A_j. An example may help clarify the procedure. Suppose a researcher wants to compare two asset pricing models that differ in, say, preferences or the endowment process. The researcher needs a formal method to carry out the comparison between the two economic models. This is analogous to what is being done in this paper, except that instead of economic models I am comparing two Markov chain approximations. The comparison is based on a set of moments of interest. For comparing two economic models, such moments might include average ratios of key economic variables, standard deviations of some of those variables, etc. In the Markov chain example, I focus on the intercept, slope coefficients, and error variance of the implied continuous-state autoregression. The competing models are used to simulate large samples of model output, drawn from the joint distribution of the objects of interest for the models considered. These distributions are judged in light of the analogous joint distribution produced by a reduced-form, statistical model of the actual observed data. Specifically, the models are compared via a 'Bayes factor': one model 'beats' another if the expected value of its joint density, taken with respect to the data-induced distribution, is larger than the other's.
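As an illustration of this two-step computation, the sketch below estimates the marginal likelihoods and the Bayes factor of Eq. (7). It assumes the user supplies arrays of draws from p(h | A_1) and p(h | A_2) and a function that evaluates the posterior density p(h | y, D); the names are purely illustrative and not part of the paper.

import numpy as np

def marginal_likelihood(prior_draws, posterior_density):
    """Average of the posterior density p(h | y, D) evaluated at draws h^(s)
    simulated from p(h | A_j); this approximates p(y | A_j, D) as in eq. (7)."""
    vals = np.array([posterior_density(h) for h in prior_draws])
    nse = vals.std(ddof=1) / np.sqrt(len(vals))  # numerical standard error
    return vals.mean(), nse

def bayes_factor(draws_A1, draws_A2, posterior_density):
    """Posterior odds of A_1 versus A_2 when the prior odds are set to one."""
    ml1, _ = marginal_likelihood(draws_A1, posterior_density)
    ml2, _ = marginal_likelihood(draws_A2, posterior_density)
    return ml1 / ml2

The only model-specific inputs are the simulator for each A_j and the posterior density evaluator for D, which is the sense in which the approach is 'minimal'.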
3. Comparing Markov chain approximations

The 'minimal econometric' approach described in Section 2 will be used to explore the significance of the tradeoff between the number of lags in the original autoregressive model and the number of gridpoints that defines the state space for the variable. The main elements are two models that the researcher wants to compare (A_1 and A_2), an auxiliary model denoted D, and a vector h, the 'object of interest' that the models claim to describe. Denoting by y_t the variable of interest, I assume that it can be modelled as

y_t = \phi_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2).    (8)
All Markov chain approximations will be computed using Tauchen's method. To apply that method the modeller needs to assume that the original variable follows an autoregressive process such as (8).³ This equation implies that y_t has a continuous state space, and furthermore y_t is not yet a Markov process. However, defining the vector X_t = {y_{t-1}, ..., y_{t-p}}, it is clear that X_t depends only on X_{t-1}. The details of going from a continuous state space autoregression to a Markov chain are described in the Appendix, but the main idea is to define a grid of N points, {ȳ^1, ȳ^2, ..., ȳ^N}, which are the states that the variable y_t can take. A simple way of doing this is to center the grid at the mean of the variable and let its range be a multiple of the standard deviation of y_t.

³ As seems natural, model D, the auxiliary model, will have the same structure: it will be an AR process with p_D lags.
The state space for y_t will be the same as the state space for any lag of y_t, and hence the total dimension of the 'extended' state space for X_t will be N^p. Using the notation from the previous section, models A_1 and A_2 will be any two Markov chain models that approximately describe the above process and that differ in the choices of p and N. The 'extended' state space will be described by a transition matrix of dimension N^p with typical element R_{ik}, denoting the probability of transiting from state i to state k. This transition matrix will satisfy the conditions for a limiting (stationary) distribution π_i, i = 1, ..., N, to exist, with each π_i being a function of the R_{ik}. With both the transition probabilities and the limiting distribution, we can find any population moment: mean, variance, jth-order autocovariance, etc. As an example, denoting the state space for the discretized variable y_t by {ȳ^1, ȳ^2, ..., ȳ^N}, the mean μ, variance γ_0 and first autocovariance γ_1 are given by

\mu = \sum_{i=1}^{N} \pi_i \bar{y}^i,    (9)

\gamma_0 = \sum_{i=1}^{N} \pi_i (\bar{y}^i - \mu)^2,    (10)

\gamma_1 = \sum_{i=1}^{N} \pi_i \left\{ \sum_{k=1}^{N} R_{ik} (\bar{y}^i - \mu)(\bar{y}^k - \mu) \right\}.    (11)
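For illustration, the sketch below computes the stationary distribution and the moments in Eqs. (9)-(11) from a transition matrix R and a grid of state values. It treats each state as carrying a single grid value for y_t (the first-order case); for p > 1 the same computation applies to the extended state space, with the contemporaneous grid value attached to each composite state. The helper names are illustrative.

import numpy as np

def stationary_distribution(R):
    """Left eigenvector of the transition matrix R associated with the unit
    eigenvalue, normalized to sum to one (the limiting distribution pi)."""
    eigvals, eigvecs = np.linalg.eig(R.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

def markov_moments(R, y_grid):
    """Population mean, variance and first autocovariance implied by a Markov
    chain with transition matrix R and state values y_grid (eqs. (9)-(11))."""
    pi = stationary_distribution(R)
    mu = pi @ y_grid                                   # eq. (9)
    gamma0 = pi @ (y_grid - mu) ** 2                   # eq. (10)
    gamma1 = sum(pi[i] * sum(R[i, k] * (y_grid[i] - mu) * (y_grid[k] - mu)
                             for k in range(len(y_grid)))
                 for i in range(len(y_grid)))          # eq. (11)
    return mu, gamma0, gamma1

Higher-order autocovariances follow the same pattern, with R replaced by its jth power when computing the jth-order term.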
I now need to determine what type of econometric model I will use to link any two Markov chains (i.e. I need to choose model D) and choose the vector of moments that will be the basis of comparison between models A_1 and A_2. The econometric model will be a continuous state space autoregression of order p_D, since one would want the Markov chain to represent such a process. The purpose of this paper is to compare discrete approximations of a continuous state space time series; an approximation is more accurate when it better describes the statistical properties of the original series. Therefore, an initial and sensible choice for h would be a vector whose elements are population moments of the model, say, the mean, variance and first autocorrelation. However, it will be more convenient to work with an alternative choice, namely h = (φ_0, φ_1, φ_2, ..., φ_{p_D}, σ²)'. The reason is that, given that the prior on h needs to be flat (see Assumption 2), I would otherwise need to find a prior distribution on (φ_0, φ_1, φ_2, ..., φ_{p_D}, σ²)' that induces a flat prior on the vector of mean, variance and autocorrelations. It is easier to start with a flat prior on the intercept, slopes and variance of the error term, since the density p(h | y, D) will be a function of these parameters. Given a transition probability matrix, the relationship between consumption growth population moments and h is given by

\gamma_j = \begin{cases} \phi_1 \gamma_{j-1} + \phi_2 \gamma_{j-2} + \cdots + \phi_{p_D} \gamma_{j-p_D} & \text{if } j = 1, 2, \ldots, \\ \phi_1 \gamma_1 + \phi_2 \gamma_2 + \cdots + \phi_{p_D} \gamma_{p_D} + \sigma^2 & \text{if } j = 0, \end{cases}    (12)
where γ_j is the jth-order autocovariance, and

\phi_0 = \mu (1 - \phi_1 - \cdots - \phi_{p_D}),    (13)
where μ is the population mean of the series. The mapping from the R_{ik}, the elements of the transition matrix, to h should be clear now. Eqs. (9)-(11) and their extensions to higher-order autocovariances provide the link between the R_{ik} and π_i and the population moments of the Markov chain model, while the inverse Yule-Walker equations (12)-(13) map these population moments into elements of h. Hence, by specifying a prior distribution on the transition probabilities for a given model A_j, a prior distribution is induced on h through the stationary distribution and the inverse Yule-Walker equations. This distribution will be the density p(h | A_j). The description of where these priors come from is the purpose of the next section.

3.1. Priors

It has already been mentioned that p(h | D) ∝ constant, and therefore the choice of a prior distribution affects only the parameters of the two Markov chain models A_1 and A_2. The final goal is to induce priors on the elements of the object of interest h through the elements of the transition matrix R_{ik}, in order to generate draws (i.e. to simulate) from the density p(h | A_j). I proceed by starting from a dynamic economic model that replicates certain features of the observed consumption growth process; I use it to induce priors on the elements of the transition matrix and hence a density on the elements of h. The model chosen is a standard (Hansen, 1985) real business cycle model, suitably modified to incorporate growth. This model is well known, having been studied by several economists such as Campbell (1994) and Uhlig (1999), so I give only a short introduction, to fix notation and to clarify the mapping from this model to the transition matrix. The description follows Uhlig (1999) closely.

The model has a representative agent who maximizes lifetime utility over consumption and leisure. The agent makes the two standard choices in simple macroeconomic models: how to allocate his endowment of time between labor and leisure, and what proportion of the physical good to invest and how much to consume. More formally, the maximization problem can be written as

\max_{\{c_t, k_{t+1}, n_t\}_{t=0}^{\infty}} \; E_0 \sum_{t=0}^{\infty} \beta^t \big( \log(c_t) + H (1 - n_t) \big), \qquad \beta \in (0,1), \; H > 0,

subject to

i_t + c_t = z_t k_{t-1}^{\alpha} n_t^{1-\alpha}, \qquad 0 < \alpha < 1,    (14)

k_t = (1 - \delta) k_{t-1} + i_t, \qquad 0 < \delta < 1,    (15)

l_t + n_t = 1,    (16)
z_{t+1} = e^{\mu + \epsilon_{t+1}} z_t, \qquad \epsilon_{t+1} \sim N(0, \sigma_\epsilon^2),    (17)

k_0 > 0 \text{ given.}    (18)
The notation is standard: c_t, l_t and n_t denote consumption, leisure and employment at time t; labor and capital, k_{t-1}, are used in the production process, which is also affected by a productivity shock z_t that follows a random walk with drift parameter μ. Due to the uncertainty about the true values of the parameters β, γ, δ, α, ρ, μ and σ_ε², I specify prior distributions for these parameters instead of fixing their values. By generating draws from these distributions and solving the model for each of the draws, these priors induce a distribution for the R_{ik}, the parameters of the transition matrix.

Standard solution techniques require stationarity of the model economy. Due to the presence of a unit root in the technology process, the model presented above is non-stationary. The transformation into a stationary economy involves scaling variables dated t (except hours worked and leisure) by the quantity z_{t-1}^{1/(1-α)}: for any variable x_t, define the new variable x̃_t = x_t / z_{t-1}^{1/(1-α)}. To derive decision rules (policy functions) for consumption, investment and labor, I solve the log-linearized version of the problem; i.e., instead of solving for the decision rules in the scaled original variables c̃_t, ĩ_t, n_t, etc., the policy functions are constrained to be linear in their logarithms and are obtained in terms of deviations from the steady state. For a variable x_t, let x̂_t denote its percentage deviation from its steady-state value x̄, i.e. x̂_t = log x_t - log x̄. In log-linear form, the resource constraint and the Euler equation are
0 = A \hat{k}_t + B \hat{k}_{t-1} + C \hat{w}_t + D \epsilon_t,    (19)

0 = E_t \big[ J \hat{w}_{t+1} + K \hat{w}_t + F \epsilon_t \big],    (20)

\epsilon_{t+1} \sim N(0, \sigma_\epsilon^2).    (21)
The matrices in this system are functions only of the parameters of the model. In this simple problem there is only one endogenous state variable, k̂_{t-1}, one exogenous state variable, ε_t, and a vector ŵ_t of three control variables with elements î_t, ĉ_t and n̂_t. The solution is a set of two (matrix) equations that describe the equilibrium laws of motion for the steady-state deviations of the endogenous state variable and the control variables:

\hat{k}_t = P \hat{k}_{t-1} + Q \epsilon_t,    (22)

\hat{w}_t = R \hat{k}_{t-1} + S \epsilon_t.    (23)
The solutions for the matrices P, Q, R and S are obtained using methods developed by Blanchard and Kahn (1980) and Sims (2002), briefly described by Uhlig (1999). These matrices are also functions of the behavioral and technology parameters β, γ, δ, α, ρ, μ and σ_ε²; hence, given a value for the initial deviation of the capital stock from its steady-state value, realizations for the shock disturbance ε_t, and a value for the parameters, simulation from the system (22)-(23) is straightforward.
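As an illustration, given numerical values for P, Q, R and S (produced, for instance, by Uhlig's toolkit; they are not computed here and are assumed as inputs), simulating the system (22)-(23) takes only a few lines. The sketch below is a minimal version under that assumption.

import numpy as np

def simulate_loglinear(P, Q, R, S, sigma_eps, T, seed=None):
    """Simulate steady-state deviations from the laws of motion (22)-(23):
    k_hat_t = P k_hat_{t-1} + Q eps_t and w_hat_t = R k_hat_{t-1} + S eps_t,
    where w_hat collects the controls (investment, consumption, hours)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_eps, size=T)
    R, S = np.atleast_1d(R), np.atleast_1d(S)
    k_hat = np.zeros(T + 1)            # k_hat[t] plays the role of k_hat_{t-1} in period t
    w_hat = np.zeros((T, R.shape[0]))  # control deviations
    for t in range(T):
        w_hat[t] = R * k_hat[t] + S * eps[t]       # eq. (23)
        k_hat[t + 1] = P * k_hat[t] + Q * eps[t]   # eq. (22)
    return k_hat[1:], w_hat

Undoing the log deviation and the z_{t-1}^{1/(1-α)} scaling then recovers a consumption series in levels, from which growth rates are computed.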
Table 2
Prior distributions

Parameter    Prior
β            N(0.988, (0.01)^2) I_[0,1](β)
δ            N(0.026, (0.002)^2) I_[0,1](δ)
α            N(0.40, (0.02)^2) I_[0,1](α)
μ            N(0.003, (0.001)^2)
σ_ε          N(0.007, (0.001)^2) I_[0,1](σ_ε)
Once I obtain a simulated series for the stationary economy, I undo the transformation to retrieve moments of the original (unscaled) variables. To specify prior distributions for the set of parameters, I took means from previous work and also carried out some prior predictive analysis. I wanted to ensure that the model roughly matched consumption growth moments, without being too concerned about how the model did in matching observations for investment, employment, interest rates, etc. All parameters were endowed with (truncated) normal prior distributions, and the exact values for the means and standard deviations are given in Table 2.⁴

The process of simulating one draw from the density p(h | A_j) can be summarized in the following steps (a condensed sketch in code appears below):

Step 1: Draw one value for the parameters of the business cycle model from the prior distributions given in Table 2. Solve the model and generate a time series of length 1000 for consumption growth.

Step 2: Fit an autoregression of order p by least squares to this time series and take it to be the population autoregression for the induced consumption growth process in the real business cycle model.

Step 3: Specify a value for N (the number of states in the Markov chain) and the grid for consumption growth (centered at the mean of the consumption growth series generated in Step 1, with a range equal to a multiple of the standard deviation, i.e. states are added in the center of the distribution), and apply Tauchen's method to the autoregression fit in Step 2, yielding a transition matrix of order N* = N^p with typical element R_{ik}. Impose restrictions on this matrix to yield a symmetric unconditional distribution. The choice of these two elements, the number of states N and the number of lags p, is what defines a particular Markov chain approximation A_j.

Step 4: From the transition matrix and the grid for consumption growth obtained in Step 3, calculate population moments such as μ, γ_0, γ_1, etc., as in Eqs. (9)-(11), and from these retrieve the elements of h via the inverse Yule-Walker equations (12)-(13).
⁴ The value of H was determined so that in a steady state the proportion of time that an individual spends working is 1/3.
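The following is a minimal sketch of Steps 1-4 for the simplest case of a first-order chain (p = 1), with the vector of interest reduced accordingly to h = (φ_0, φ_1, σ²)'; the paper's application instead uses p_D = 3 and higher-order autocovariances. The functions rbc_prior_draw and solve_and_simulate stand in for the prior draw and the solution/simulation of the real business cycle model and are placeholders, not part of the paper; markov_moments is the helper sketched after Eqs. (9)-(11).

import numpy as np
from scipy.stats import norm

def tauchen(N, phi0, phi1, sigma, m=3.0):
    """Tauchen (1986) discretization of y_t = phi0 + phi1*y_{t-1} + eps_t,
    eps_t ~ N(0, sigma^2): N equally spaced states spanning m unconditional
    standard deviations around the unconditional mean."""
    mean_y = phi0 / (1.0 - phi1)
    std_y = sigma / np.sqrt(1.0 - phi1 ** 2)
    grid = np.linspace(mean_y - m * std_y, mean_y + m * std_y, N)
    step = grid[1] - grid[0]
    R = np.empty((N, N))
    for i in range(N):
        cond_mean = phi0 + phi1 * grid[i]
        cdf = norm.cdf((grid - cond_mean + step / 2.0) / sigma)
        # transition probabilities as differences of the conditional cdf
        R[i] = np.diff(np.concatenate(([0.0], cdf[:-1], [1.0])))
    return grid, R

def draw_h(N, rbc_prior_draw, solve_and_simulate, T=1000):
    """One draw from p(h | A_j), Steps 1-4, for the first-order case."""
    # Step 1: draw RBC parameters and simulate consumption growth.
    params = rbc_prior_draw()              # e.g. the truncated normals of Table 2
    dc = solve_and_simulate(params, T)     # length-T consumption growth series
    # Step 2: fit an AR(1) by ordinary least squares.
    X = np.column_stack((np.ones(T - 1), dc[:-1]))
    b, ssr, *_ = np.linalg.lstsq(X, dc[1:], rcond=None)
    sigma = np.sqrt(ssr[0] / (T - 3))
    # Step 3: Tauchen discretization with N states.
    grid, R = tauchen(N, b[0], b[1], sigma)
    # Step 4: population moments of the chain (eqs. (9)-(11)), then the
    # inverse Yule-Walker equations (12)-(13).
    mu, g0, g1 = markov_moments(R, grid)
    phi1 = g1 / g0                 # gamma_1 = phi_1 * gamma_0
    phi0 = mu * (1.0 - phi1)       # eq. (13)
    s2 = g0 * (1.0 - phi1 ** 2)    # eq. (12) with j = 0
    return np.array([phi0, phi1, s2])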
[Figure: prior densities for the mean, standard deviation, and first, second and third autocorrelations (γ_1/γ_0, γ_2/γ_0, γ_3/γ_0) of consumption growth.]
Fig. 3. Priors for moments of consumption growth (solid line). Observed data value (dotted line).
These four steps result in one draw from the density p(h | A_j). The procedure is repeated M times to trace out the entire density. Fig. 3 shows how the model does (with the priors for the parameters specified above) in matching certain moments of consumption growth over 5000 simulations, along with the values obtained from the actual consumption growth series.⁵ It is important to note that results will be sensitive to the mean values in these distributions. The implied consumption growth process would look very different with substantial changes in parameters such as the discount factor or the coefficient of relative risk aversion.

⁵ When simulating the model, the first autocorrelation was constrained to be smaller than 0.3, so as not to have too much persistence in the series. The standard deviation was also bounded below at a value of 0.0009. These constraints can be seen as additional properties of my prior distribution for consumption growth within the RBC model.

3.2. Posterior simulation

As mentioned, the posterior density p(h | y, D) is the density to which simulations from a given model will be compared, providing a link between two models and
enabling the computation of a Bayes factor. A kernel for this density is given by Bayes' theorem, which I rewrite here for convenience:

p(h \mid y, D) \propto p(y \mid h, D)\, p(h \mid D).    (24)

Assumption 2, under which the prior is flat, ensures that this posterior is proportional to the likelihood alone. For the comparison of Markov chain approximations it is convenient to specify an autoregression as model D. In this case, I can make use of the results in Zellner (1971) and generate draws from the posterior in the following way. Suppose I specify model D to be

y_t = \phi_0 + \phi_1 y_{t-1} + \cdots + \phi_{p_D} y_{t-p_D} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2).    (25)
Denote β = (φ_0, φ_1, ..., φ_{p_D})', so that h = (β, σ²)'. Given Assumption 2, in which the prior distribution for h is constrained to be proportional to a constant, the posterior distribution is proportional to the likelihood:

p(\beta, \sigma^2 \mid y, D) \propto \left( \frac{1}{\sigma^2} \right)^{T/2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ S^2 + (\beta - \hat{\beta})' X'X (\beta - \hat{\beta}) \right] \right\} I_{(s)}(\beta).    (26)

To clarify the notation in this expression, T denotes the total number of observations, β̂ denotes the ordinary least-squares estimate of β, S² is the sum of squared residuals (y - Xβ̂)'(y - Xβ̂), y is a T x 1 vector with all the observations, X is the data matrix with p_D lagged values of y_t for all T observations, and I_{(s)}(β) is an indicator function that assigns a value of zero to the density whenever any eigenvalue of the autoregression is outside the unit circle. From the previous expression it is clear that

\beta \mid \sigma^2, y, D \sim \text{MVN}\big(\hat{\beta},\, (X'X)^{-1}\sigma^2\big)\, I_{(s)}(\beta).    (27)
Since σ² is usually unknown, I need the marginal posterior density for σ², p(σ² | y, D), which is obtained by integrating β out of the joint posterior:

p(\sigma^2 \mid y, D) = \int p(\beta, \sigma^2 \mid y, D)\, d\beta,    (28)

which yields σ² | y ~ Inverse-χ²(S², T - d - 2), where d is the dimension of the vector β. Hence, to simulate from this posterior, a draw of σ² is made from σ² | (y, D) and, conditional on this draw, β is generated from β | (σ², y, D).⁶

⁶ With this approach, each draw comes from the posterior distribution and there is no convergence (burn-in) phase, as is the case with Gibbs sampling, for example.

Before discussing the results, let me summarize the main points of Section 3. The goal is to compare Markov chain approximations, which are defined by a choice of p (lag order) and a choice of N (number of gridpoints in the state space). To undertake the comparison, the modeller needs to specify a vector of interest, h, which is the object that the approximations being assessed should describe well. The
first part of the section explains the mapping between h and the moments of an autoregressive process of order p. For a given Markov chain approximation A_j, this mapping is used to draw from the density p(h | A_j), with the real business cycle model serving as a prior, as described in the four steps at the end of Section 3.1. To compute the marginal likelihood for the approximation A_j, it is necessary to draw h from the density p(h | y, D) in order to compute posterior moments and to be able to evaluate the posterior density at the draws from models A_1 and A_2, as explained at the end of Section 2. This step is done using the standard posterior simulator described in Section 3.2.
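A minimal sketch of that posterior simulator follows, assuming y is the vector of consumption growth observations; the inverse-χ² draw is implemented as S² divided by a χ² variate, and the stationarity indicator I_(s)(β) is enforced by rejection. Function and variable names are illustrative.

import numpy as np

def posterior_draws(y, p_D=3, M=5000, seed=None):
    """Draw (beta, sigma^2) from the posterior of the AR(p_D) model D under a
    flat prior: sigma^2 | y ~ Inverse-chi^2(S^2, T - d - 2), then
    beta | sigma^2, y ~ MVN(beta_hat, (X'X)^{-1} sigma^2), truncated to the
    stationary region (eqs. (26)-(28))."""
    rng = np.random.default_rng(seed)
    Y = y[p_D:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[p_D - j:-j] for j in range(1, p_D + 1)])
    T, d = len(y), p_D + 1
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    S2 = np.sum((Y - X @ beta_hat) ** 2)
    draws = []
    while len(draws) < M:
        sigma2 = S2 / rng.chisquare(T - d - 2)              # scaled inverse-chi^2
        beta = rng.multivariate_normal(beta_hat, XtX_inv * sigma2)
        roots = np.roots(np.concatenate(([1.0], -beta[1:])))  # companion eigenvalues
        if np.all(np.abs(roots) < 1.0):                      # indicator I_(s)(beta)
            draws.append(np.concatenate((beta, [sigma2])))
    return np.array(draws)

Evaluating the (normalized) posterior density at the draws from p(h | A_1) and p(h | A_2), as in Eq. (7), then delivers the marginal likelihoods reported below.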
4. Results

Having discussed how to obtain simulations from models D and A_j, and how to compute a marginal likelihood, I present some results in this section. The definition of consumption used here is total real personal consumption expenditures, converted to per capita terms by dividing by the total noninstitutionalized population 16 years or older. Real consumption is taken from the DRI database (original source: Bureau of Economic Analysis) and population is taken from the Bureau of Labor Statistics. The sample covers the period 1947-2001; there are 212 quarterly observations. A continuous state space AR of order p_D = 3 has been chosen as model D, because, according to Otrok et al. (2002), this seems to be the best representation for consumption growth among autoregressive models with between 1 and 5 lags:

\Delta c_t = \phi_0 + \phi_1 \Delta c_{t-1} + \phi_2 \Delta c_{t-2} + \phi_3 \Delta c_{t-3} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2).    (29)
Throughout, the number of draws from the posterior distribution of this autoregression and the number of simulations from the prior for the real business cycle model were both set at 5000. Posterior means and standard deviations from model D are displayed in Table 3.

The main results are presented in Table 4. The auxiliary model D was always an autoregression of order p_D = 3, and one simulation from this model was used to compute posterior moments. Instead of presenting Bayes factors, the table shows marginal likelihoods⁷ (along with numerical standard errors) for several choices of p, the order of the population autoregression in the Markov chain model, and N, the number of gridpoints in the state space. For most combinations of N and p, the performance of the approximations increases as both N and p get large. However, for N sufficiently large (about 4 or 5), gains in accuracy are small (if any) and there is considerable overlap between the distributions of the marginal likelihoods. For example, taking p = 1 and N = 20, a two-standard-error region around the marginal likelihood includes the computed marginal likelihood for the case p = 1 and N = 5.⁸

⁷ The Bayes factor between any two models is the ratio of their marginal likelihoods.
Table 3
Posterior means and standard deviations (model D parameters)

         Mean           Standard deviation
φ_0      0.7229         0.1045
φ_1      0.1053         0.0678
φ_2      0.0843         0.0686
φ_3      0.0915         0.0678
σ²       4.2 x 10^-5    4.2 x 10^-6
Table 4
Marginal likelihoods (standard errors in parentheses); model D: AR(3); h = (φ_0, φ_1, φ_2, φ_3, σ²)'

          p = 1                          p = 2                          p = 3
N = 2     4.0 x 10^-60 (4.0 x 10^-60)    5.5 x 10^-17 (5.5 x 10^-17)    6.2 x 10^-9 (6.2 x 10^-9)
N = 3     1.7 x 10^8 (1.1 x 10^7)        5.4 x 10^8 (3.5 x 10^7)        1.5 x 10^9 (9.2 x 10^7)
N = 4     2.2 x 10^8 (1.5 x 10^7)        6.7 x 10^8 (3.5 x 10^7)        1.9 x 10^9 (1.1 x 10^8)
N = 5     2.9 x 10^8 (1.6 x 10^7)        6.4 x 10^8 (3.4 x 10^7)        2.7 x 10^9 (1.4 x 10^8)
N = 10    3.2 x 10^8 (1.7 x 10^7)        7.4 x 10^8 (3.6 x 10^7)        —
N = 15    3.4 x 10^8 (1.7 x 10^7)        —                              —
N = 20    3.1 x 10^8 (1.5 x 10^7)        —                              —
The large gains in the adequacy of the approximation also occur at small N when p = 2. The combinations p = 2, N = 4 and p = 2, N = 5 have very close marginal likelihoods, and both overlap considerably with the distribution of the marginal likelihood for the case p = 2, N = 10.

The interesting exercise is to look at the optimal choice among different models when the dimension of the transition matrix, N* = N^p, is fixed. For N* = 4, there are two choices: either p = 1, N = 4 or p = 2, N = 2. Clearly, the choice favors p = 1, with a Bayes factor of the order of almost 10^24. Increasing N* to about 8 or 10 gives three possibilities: p = 1, N = 8; p = 2, N = 3; and p = 3, N = 2. The choice now favors p = 2, with a marginal likelihood of roughly the same order of magnitude as p = 1 and much larger than p = 3. If I increase N* to 15 or 16, the two competing models are p = 1, N = 15 and p = 2, N = 4. Now the procedure favors the p = 2 case, with a Bayes factor of about 2 (see the worked ratio below).
⁸ For computational reasons the cases p = 2, N > 10 and p = 3, N > 5 were not considered, and marginal likelihoods are not reported.
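For instance, the last comparison can be read directly off Table 4 using footnote 7:

\frac{p(y \mid A_{p=2,\,N=4}, D)}{p(y \mid A_{p=1,\,N=15}, D)} \approx \frac{6.7 \times 10^{8}}{3.4 \times 10^{8}} \approx 2.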
Overall it seems that modelling consumption growth as an autoregression of order 2 prior to its discrete state approximation yields satisfactory results with only 3 or 4 states and a total dimension for the transition probability matrix of 9 or 16.
5. Conclusion

Although Markov chain approximations have been used extensively in economics, the tradeoff between the two fundamental choices involved in modelling a time series as a Markov chain, namely the number of lags (p) in the original model and the dimension of the state space (N), has been (to my knowledge) ignored. In this paper I have approached this issue using a methodology recently employed to compare calibrated models in dynamic macroeconomics, which amounts to measuring the degree of overlap between multivariate densities of a vector of interest that the models claim to describe. The particular series chosen to address the question was US post-war real per capita consumption growth. For a chosen vector of interest, optimal choices of N and p were calculated for several dimensions of the final transition matrix. In this case, it turned out to be advantageous to sacrifice some detail in the state space (N) in order to obtain a better representation of the dynamics.
Acknowledgements

I thank John Geweke, Beth Ingram and especially Charles Whiteman for their help. Two referees gave suggestions that substantially improved a previous version of the paper. I also thank Grace Chan, Simon Lee, Linnea Polgreen, and seminar participants at the University of Iowa for useful comments. Finally, I thank Chris Otrok for providing me with the computer code for the multivariate version of Tauchen's procedure. The views expressed here are the author's and not necessarily those of the Federal Reserve Bank of Atlanta or the Federal Reserve System.
References

Aiyagari, R., 1994. Uninsured idiosyncratic risk and aggregate saving. Quarterly Journal of Economics 109, 659-684.
Blanchard, O.J., Kahn, C.M., 1980. The solution of linear difference models under rational expectations. Econometrica 48, 1305-1312.
Campbell, J.Y., 1994. Inspecting the mechanism: an analytical approach to the stochastic growth model. Journal of Monetary Economics 33, 463-506.
Canova, F., Ortega, E., 2000. Testing calibrated general equilibrium models. In: Mariano, R., Schuerman, T., Weeks, M. (Eds.), Simulation-based Inference in Econometrics. Cambridge University Press, Cambridge, pp. 400-436.
Cechetti, S., Lam, P., Mark, N., 1994. The equity premium and the risk free rate: matching the moments. Journal of Monetary Economics 31, 21-45.
Cooley, T., Prescott, E., 1995. Economic growth and business cycles. In: Cooley, T. (Ed.), Frontiers of Business Cycle Research. Princeton University Press, Princeton, pp. 1-38.
DeJong, D., Ingram, B., Whiteman, C., 1996. A Bayesian approach to calibration. Journal of Business and Economic Statistics 14, 1-10.
Fernández-Villaverde, J., Rubio-Ramírez, J., 2004. Comparing dynamic equilibrium models to data: a Bayesian approach. Journal of Econometrics 123, 153-187.
Geweke, J., 1999. Computational experiments and reality. Federal Reserve Bank of Minneapolis Staff Report.
Gregory, A., Smith, G.W., 1991. Calibration as testing: inference in simulated macroeconomic models. Journal of Business and Economic Statistics 9, 297-303.
Hansen, G., 1985. Indivisible labor and the business cycle. Journal of Monetary Economics 16, 309-327.
Mehra, R., Prescott, E.C., 1985. The equity premium: a puzzle. Journal of Monetary Economics 10, 335-359.
Otrok, C., Ravikumar, B., Whiteman, C.H., 2002. Evaluating asset pricing models using the Hansen-Jagannathan bound: a Monte Carlo investigation. Journal of Applied Econometrics 17, 149-174.
Sims, C., 2002. Solving linear rational expectations models. Computational Economics 20, 1-20.
Tauchen, G., 1986. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters 20, 177-181.
Tauchen, G., Hussey, R., 1991. Quadrature-based methods for obtaining approximate solutions to nonlinear asset pricing models. Econometrica 59, 371-396.
Uhlig, H., 1999. A toolkit for analysing nonlinear dynamic stochastic models easily. In: Marimon, R., Scott, A. (Eds.), Computational Methods for the Study of Dynamic Economies. Oxford University Press, Oxford, pp. 30-61.
Vuong, Q.H., 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 307-333.
Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. John Wiley and Sons.