Journal of Economic Dynamics & Control 74 (2017) 56–86
DSGE pileups☆
Stephen D. Morris
Department of Economics, Bowdoin College, 9700 College Station, Brunswick, ME 04011, United States
Article info
Article history: Received 14 March 2016; received in revised form 4 November 2016; accepted 11 November 2016; available online 17 November 2016.
JEL classification: C32, C51, E12, E52
Keywords: Global identification; Maximum likelihood estimation; Posterior probability

Abstract
The sampling distribution of estimators for DSGE structural parameters tends to be nonnormal and/or pile up on the boundary of the theoretically admissible parameter space. This calls into question both the reliability of asymptotic approximations and the presumption of correct specification. This paper seeks to develop a conceptual framework for understanding how these phenomena arise, and to provide pragmatic methods for dealing with them in practice. The results are presented in three examples and a medium scale DSGE model.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction

This paper considers dynamic stochastic general equilibrium (DSGE) models, a core empirical tool of monetary policymakers and academics alike. Important policies are routinely devised on the basis of estimated DSGE structural parameters, which may include the discount rate, coefficient of relative risk aversion, indices of price and wage stickiness, and other theoretical objects.1 Despite their wide use, a common empirical concern remains the typically unwieldy character of the small sample likelihood function.2 When the likelihood is multimodal or otherwise “not well behaved”, the distribution of the maximum likelihood estimator (MLE) may be as well.3 In other cases, the MLE may lie on the boundary of the theoretically admissible parameter space, casting a shadow over the primitive assumption of correct specification.4 In either case, conventional Gaussian asymptotics yield poor approximations to the true sampling distribution. Moreover, the posterior probability may inherit some of these properties, making it difficult to compute.5

☆ Previous versions of this paper circulated as Chapter 1 of the author's 2014 UC San Diego dissertation and subsequent working papers. The author gratefully acknowledges the support of his advisor, James D. Hamilton, and dissertation committee. Seminar participants at UC San Diego, Bowdoin College, the 2014 Annual Meeting of the Society for Economic Dynamics at the University of Toronto, and the 2014 Identification in Macroeconomics Workshop at the National Bank of Poland provided useful feedback. Finally, the paper benefitted from suggestions from Thomas Lubik (the editor) and two anonymous referees. Any errors are the author's own. Comprehensive replication materials are freely available on the author's website. E-mail address: [email protected].
1 Among the many central banks that openly use DSGE models to inform policy decisions are the Swedish Sveriges Riksbank, the Norwegian Norges Bank, and the US Federal Reserve. See also Christiano et al. (2010). In terms of academic diffusion, as of November 2016, the representative paper of Smets and Wouters (2003) had over 3700 citations on Google Scholar.
2 See for example Andreasen (2010).
3 See the Technical Appendix in Bårdsen and Fanelli (2015), Figures 1 and 2, for a bimodal distribution for the MLE.
4 For instance, it is common to find an estimated discount factor which wishes to exceed the upper bound of 1.
http://dx.doi.org/10.1016/j.jedc.2016.11.002
0165-1889/© 2016 Elsevier B.V. All rights reserved.

By now, the identification and estimation of DSGE models is in many respects well understood (Fernández-Villaverde et al., 2015). But the source of these particular outcomes remains largely mysterious, with weak and global identifiability sometimes proffered as relevant concerns.6 This paper seeks to develop a conceptual framework for understanding how these phenomena arise, and to provide pragmatic methods for dealing with them in practice.

Related to the first objective is the identifiability of the structural parameters. When we study the identifiability of a DSGE model, there are two dimensions to be considered (Canova and Sala, 2009). The first is binary yes/no local identifiability (Iskrev, 2010; Komunjer and Ng, 2011; Qu and Tkachenko, 2012) or global identifiability (Fukač et al., 2007; Kociȩcki and Kolasa, 2014; Qu and Tkachenko, 2016). Both of these concern whether a point has observationally equivalent points elsewhere in the parameter space. The second is weak identification (Andrews and Mikusheva, 2015; Dufour et al., 2013; Guerron-Quintana et al., 2013; Inoue and Rossi, 2011; Qu, 2014). Weak identification occurs when structural parameters are point identifiable, but lie near enough to regions of non-identifiability that conventional Gaussian asymptotics yield poor approximations, even in large finite samples.7

The term “pileups” was first used to describe outcomes of weak identification in DSGE models by Kleibergen and Mavroeidis (2014) with respect to Bayesian estimation. They use the term to illustrate how the posterior probability may accumulate near regions of nonidentifiability when priors are not informative enough to compensate for lack of information in the data.
This paper documents how such phenomena may arise with respect to classical estimators as a consequence of weak identifiability via the choice of parameter bounds and issues pertaining to global identifiability. In three well-known models, three distinct and commonly observed, yet sometimes incompletely understood phenomena are shown to arise: Boundary estimates, skewed distributions, and multiple modes. Ultimately, we also connect these observations to such outcomes for the posterior probability. The second objective of this paper is to introduce an estimation methodology which is well-suited to these concerns. As Kleibergen and Mavroeidis note (p. 1184), the methodology they introduce is intended for the limited information analysis of single equations which come from a DSGE model with reduced form vector autoregression (VAR) representation. This procedure is not amenable to medium scale DSGE models, which are generally large systems of equations without finite order VAR representation. One natural way to go about the analysis of weakly identifiable parameters is with robust statistics, which are available for DSGE models in the general case (e.g. Andrews and Mikusheva, 2015). But classical estimators for DSGE parameters often lie on or near the boundary of the admissible parameter space, which causes even weak identification robust asymptotic covariances to become degenerate. Thus, one is forced to manually suppress the MLE from obtaining the boundary to make these results operable (see Andrews and Mikusheva, p. 137). On the other hand, computing the small sample distribution for the MLE by bootstrap may be infeasible, since one is confronted with a speed vs. reliability tradeoff with respect to numerical minimizers (Andreasen, 2010). This paper presents a different empirical approach, the minimum chi-squared estimator (MCSE), which is efficient in the sense that it is asymptotically equivalent to the MLE, yet is easier to compute in practice. 
It is based on the VARMA representation of a DSGE model, which exists in the general case (Morris, 2016b), and contrary to conventional wisdom (i.e. Hannan, 1971), may in fact be an identifiable reduced form representation under reasonable parameterizations (Zadrozny, 2016). This facilitates the computation of small sample bootstrapped confidence intervals, for example. Moreover, an immediate byproduct of this framework is two tests for model specification. This allows one to differentiate between symptoms of weak identification, compared with symptoms of misspecification, which may otherwise be ambiguous. This paper is structured as follows: Section 2 discusses technical background related to the pileup phenomenon. Section 3 develops the econometric framework to be utilized in the remainder of the paper. Section 4 puts this methodology to work in the context of three examples. Section 5 discusses the consequences of the findings in these examples for the posterior probability with uninformative, and informative priors. Section 6 carries forth the approach to the widely utilized medium scale DSGE model of Smets and Wouters (2007). Section 7 concludes.
2. Background

The most well known usage of the term “pileups” refers to peculiarities in the moving average model $y_t = u_t + \theta u_{t-1}$ for $u_t \sim \text{i.i.d. } N(0, \sigma^2)$. It is helpful to review this case for context. Broadly speaking $|\theta|$ may take on any value in $(0, \infty)$, but in economic applications, analysts have traditionally imposed the invertibility condition that $|\theta| \in \Theta = (0, 1)$.8 This theoretically motivated condition is also convenient in that it serves as an identifying restriction. The concentrated likelihood function has the property that $L(\theta) = L(1/\theta)$ $\forall\, |\theta| \in (0, \infty)$, so $|\theta|$ is only globally identifiable on $(0, \infty)$ if $|\theta| = 1$.9 In other words, $\theta$ and $\theta^{*} = 1/\theta$ are observationally equivalent (Rothenberg, 1971). Imposing invertibility yields global identifiability on $\Theta$ for $|\theta| \in (0, 1) \cup (1, \infty)$ because only one of $|\theta|$ or $|\theta^{*}|$ may lie on $\Theta$. The analyst, having found a theoretically inadmissible MLE $|\hat\theta| \in (1, \infty)$, simply transforms it to the observationally equivalent, but theoretically sensible estimator $1/\hat\theta \in \Theta$. Therein $|\hat\theta| \in (1, \infty)$ is identifiable on $\Theta$ despite being inconsistent with theory.

The trouble lies in the seeming paradox that the set of potential likelihood maximizers under these restrictions, $\{\Theta, 1\}$, is larger than the theoretically motivated space $\Theta$. Indeed, 1 is not just inconsistent with the theory underlying $\Theta$, but also not identifiable on $\Theta$, as it is neither an element nor becomes one when inverted. The phenomenon observed in Monte Carlo simulations by Kang (1975) and subsequent studies was that the MLE “piled up” on 1 with positive probability in small samples, even when $\theta_0 \in \Theta$ in population.10 Thus, the resulting distribution had two modes, at $\theta_0$ and 1.11 Similar results were later described with respect to higher-order MA and ARMA processes and unobservable components models.12 Fig. 1 summarizes this discussion by categorizing points along $(0, \infty)$ as consistent with theory or globally identifiable on $\Theta$, with or without the transformation $|1/\theta|$.

Fig. 1. Pileups in the moving average model.

5 See An and Schorfheide (2007) for the bimodal posterior probability for a small scale New Keynesian model. In terms of computational effort, if the posterior is multimodal, Markov Chain Monte Carlo algorithms tend to undercover as they become “stuck” on one mode. See Creal (2012), Chib and Ramamurthy (2010), Herbst and Schorfheide (2014), Lanne and Luoto (2015), and Waggoner et al. (2016) for alternative robust posterior computation approaches.
6 Mikusheva et al. (2014) Figure 1 depicts the bimodal distribution of the MLE arising from weak identification. Herbst and Schorfheide (2014) demonstrate how a bimodal posterior may arise when parameters are not globally identified.
7 See Nelson and Startz (1990) and Bound et al. (1995), which consider instrumental variables estimators.
8 $|\theta| < 1$ guarantees that $(1 + \theta L)^{-1}$ exists so that $u_t$ is a function of current and past observables only.

We now return to DSGE models which, in contrast, depend on multiple structural parameters. These are henceforth collected in the $k \times 1$ vector $\theta$.
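The observational equivalence in the moving average model reviewed above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `ma1_autocovariances` and the numerical values are illustrative assumptions. It evaluates the autocovariances of the MA(1) process, which are sufficient statistics for the Gaussian likelihood, at $(\theta, \sigma^2)$ and at $(1/\theta, (\theta\sigma)^2)$.

```python
import math


def ma1_autocovariances(theta, sigma2, max_lag=3):
    """Autocovariances of y_t = u_t + theta * u_{t-1}, u_t ~ iid N(0, sigma2).

    Returns [gamma_0, gamma_1, ..., gamma_max_lag]; all lags >= 2 are zero.
    """
    gammas = [(1.0 + theta ** 2) * sigma2, theta * sigma2]
    gammas += [0.0] * (max_lag - 1)
    return gammas


# Illustrative (hypothetical) parameter values.
theta, sigma2 = 0.5, 2.0

original = ma1_autocovariances(theta, sigma2)
# Replace (theta, sigma2) by (1/theta, (theta*sigma)^2).
flipped = ma1_autocovariances(1.0 / theta, (theta ** 2) * sigma2)

same = all(math.isclose(a, b) for a, b in zip(original, flipped))
print(original, flipped, same)
```

Both parameter points imply the same second moments, so a Gaussian likelihood cannot distinguish between them. Only at $|\theta| = 1$ does the transformation map a point to itself.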
As in the moving average model, restrictions for the parameter space $\theta \in \Theta \subseteq \mathbb{R}^k$ have been imposed essentially since the inception of DSGE estimation, and have traditionally catered to what values for the structural parameters are theoretically reasonable.13 Furthermore, more recently, the identifiability of DSGE parameters has also been called into question.14 In response to this concern, several strands of literature have emerged which provide the analyst with the means of further restricting the parameter space so that each element is observationally equivalent to only itself in said space.15 Thus, like the moving average model, DSGE models are empirically dependent upon restrictions ensuring both theoretic feasibility and identifiability.

There are additional parallels with the moving average model. Later in this paper, each value of the structural parameters $\theta$ in the canonical New Keynesian model is proven to have exactly one observationally equivalent point $\theta^{*}$ outside of its locality. In other words, these parameters are not generally globally identifiable in $\mathbb{R}^k$. Such observational equivalence may occur between points of determinacy and/or indeterminacy, both of which therefore must be considered. But as in the moving average model with its invertibility condition, identifiability in $\mathbb{R}^k$ is not so relevant as is identifiability within $\Theta \subseteq \mathbb{R}^k$, the theoretically admissible space. The positive result of this analysis is that theoretical restrictions on $\Theta$ alone may prove enough to partition observationally equivalent points, therein ensuring global identifiability.

So why is it that, as we will see, a pileup phenomenon occurs for the New Keynesian model regardless? DSGE pileups are differentiated from MA pileups in three respects, summarized diagrammatically in comparison with Fig. 1 by Fig. 2. First, in the moving average model, 1 is the only point which is neither consistent with theory nor has an observationally equivalent point which is.
In DSGE models there are entire regions of $\mathbb{R}^k$ with these characteristics (□). Therefore, it is not surprising that an estimator such as the MLE should fall in these regions in small samples. Second, in DSGE models there is an entire secondary category of problematic points. These are theoretically admissible points which are observationally equivalent to other theoretically admissible points, and therein may not be distinguished between on that basis alone ( ). The union of these regions makes up the “pileup region.” Since pileups are a consequence of weak identifiability, that is, the closeness of parameter points to non-identifiable regions, the closeness of the population parameter value to this region will dictate the severity of the phenomenon. Third, the MA pileup point 1 is observationally equivalent to only itself. But in DSGE models, points in the pileup region may be observationally equivalent to other nonequivalent points. Thus, DSGE pileups are often realized as multimodal distributions. Finally, note that regions of the parameter space may be determinate or indeterminate (possessing sunspot equilibria). This will further be addressed in the coming examples.

Fig. 2. Pileups in DSGE models. Notes: “$A \cup B$” denotes the union of sets $A$ and $B$ and “$A'$” denotes the complement of $A$.

9 The covariances which are sufficient statistics for the likelihood, $E(y_t^2) = (1 + \theta^2)\sigma^2$, $E(y_t y_{t-1}) = \theta\sigma^2$, and $E(y_t y_{t-n}) = 0$ $\forall\, n \ge 2$, are identical when $(\theta, \sigma^2)$ are replaced by $(1/\theta, (\theta\sigma)^2)$.
10 Along with Kang, Cooper and Thompson (1977) and others presented simulations. Sargan and Bhargava (1983) provided a formal mathematical explanation for this phenomenon.
11 See DeJong and Whiteman (1993) Figure 2, column 1 for a graphical depiction.
12 See Stock and Watson (1998). Stock (1994) provides a summary review of this literature, and pileups more generally.
13 See Ireland (2004, Section 2) for an early example of defining $\theta$ on the basis of economic theory.
14 Aside from the references in the Introduction, Kleibergen and Mavroeidis (2009) consider weak identification of coefficients in the Phillips curve and Cochrane (2011) considers non-identification of coefficients in the Taylor rule.
15 Iskrev (2010), Komunjer and Ng (2011), and Qu and Tkachenko (2012) consider local identification of DSGE parameters. These results suggest which subset of the structural parameters must be fixed to constants to achieve conditional local identifiability of the complement subset at a point. Fukač et al. (2007), Kociȩcki and Kolasa (2014), and Qu and Tkachenko (2016) consider global identification. The results of these papers may be used to determine parameter spaces in which global identifiability is known.

How may one approach this problem? In practice, one must be careful to discern whether structural parameters are globally identifiable, and where observationally equivalent points lie. But this is typically no easy task, since the state space parameters which constitute DSGE models are not themselves identifiable. Moreover, it remains unclear how to compute a relatively realistic small sample distribution of a given estimator, regardless. The next section presents a framework which is useful in this otherwise complicated situation.
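The moving average pileup reviewed at the start of this section can be reproduced in a small simulation. The sketch below is illustrative only and rests on stated assumptions: it uses a method-of-moments estimator based on the first sample autocorrelation rather than the MLE studied in the literature cited above, and the names `theta_from_rho`, `sample_rho1`, `simulate_ma1`, as well as the values of `theta0`, `n_obs`, and `n_reps`, are hypothetical choices. Since $\rho_1 = \theta/(1+\theta^2)$ cannot exceed $1/2$ in absolute value, any sample autocorrelation at or beyond that bound has no invertible solution and the estimate is placed on the boundary.

```python
import math
import random


def theta_from_rho(rho):
    """Map a first-order autocorrelation into theta restricted to [-1, 1]."""
    if abs(rho) >= 0.5:
        # No invertible solution exists: the estimate sits on the boundary.
        return math.copysign(1.0, rho)
    if rho == 0.0:
        return 0.0
    # Invertible root of rho * theta^2 - theta + rho = 0.
    return (1.0 - math.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)


def sample_rho1(y):
    """First-order sample autocorrelation of the series y."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    gamma0 = sum(v * v for v in d) / n
    gamma1 = sum(d[t] * d[t - 1] for t in range(1, n)) / n
    return gamma1 / gamma0


def simulate_ma1(theta, n_obs, rng):
    """Draw n_obs observations of y_t = u_t + theta * u_{t-1}, u_t ~ N(0, 1)."""
    u = [rng.gauss(0.0, 1.0) for _ in range(n_obs + 1)]
    return [u[t] + theta * u[t - 1] for t in range(1, n_obs + 1)]


rng = random.Random(0)  # fixed seed so the experiment is repeatable
theta0, n_obs, n_reps = 0.8, 50, 500  # hypothetical design

estimates = [
    theta_from_rho(sample_rho1(simulate_ma1(theta0, n_obs, rng)))
    for _ in range(n_reps)
]
share_at_boundary = sum(1 for e in estimates if e == 1.0) / n_reps
print(f"share of estimates piled up at 1: {share_at_boundary:.2f}")
```

With a population value close to the boundary and a short sample, a nontrivial share of replications lands exactly on 1 while the rest spread out below it, which is the two-mode pattern described above.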
3. Framework

3.1. Model

Consider DSGE models with state space representation,

$$\begin{aligned} x_t &= c_x + A_x x_{t-1} + A_y y_{t-1} + B\varepsilon_t \\ y_t &= c_y + C_x x_{t-1} + C_y y_{t-1} + D\varepsilon_t \end{aligned} \tag{1}$$

$y_t$ is an $(n \times 1)$ vector of observables, $x_t$ is an $(s \times 1)$ vector of possibly unobservable states, and $\varepsilon_t$ is a column vector of errors with dimension at least 1. All matrix coefficients are conformable. A general class of models has this representation; for example, (1) may always be written in the so-called ABCD form by defining the state vector $z_t' = [x_t' \;\; y_t']$ (Fernández-Villaverde et al., 2007). In this case, the state and observation matrices are, respectively,

$$A = \begin{bmatrix} A_x & A_y \\ C_x & C_y \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} C_x & C_y \end{bmatrix}. \tag{2}$$

This paper restricts attention to the case of (1) which meets four assumptions:
Assumption 1 (Square and Gaussian). $\varepsilon_t$ is $(n \times 1)$ with $\varepsilon_t \sim \text{i.i.d. } N(0_{n \times 1}, \Sigma)$.
Assumption 2 (Regular). $\Sigma$ is $(n \times n)$ positive definite.
Assumption 3 (Stationary). For any $z \in \mathbb{C}$, if $\det(z I_{s+n} - A) = 0$ then $|z| < 1$.
Assumption 4 (Left-invertible). $\det(H(z)) \neq 0$ in $|z| > 1$ for impulse-response function $H(L^{-1}) = D + C(I_{s+n} - AL)^{-1}BL$ and lag operator $L$.

Under Assumptions 1 and 3, $\{y_t\}_{t=1}^{T}$ is weakly stationary with $E(y_t) = [0_{n \times s} \;\; I_n](I_{n+s} - A)^{-1}c$ for the $j$-dimensional identity matrix $I_j$ and $c' = [c_x' \;\; c_y']$. Assuming momentarily that $c = 0_{(s+n) \times 1}$ for conciseness, $y_t$ also has causal, infinite-order vector moving average (VMA($\infty$)) representation $y_t = H(L^{-1})\varepsilon_t$.16 Assumption 4 ensures that this is the Wold representation, and so there do not exist infinitely many observationally equivalent VMA($\infty$) representations (Lippi and Reichlin, 1994). In addition, it implies that $D$ is full rank (Kociȩcki and Kolasa, 2014, Appendix Proposition). This yields two substantive implications for the analysis: First, $\{y_t\}_{t=1}^{T}$ also has VMA($\infty$) representation $y_t = H(L^{-1})D^{-1}u_t$ for $u_t = D\varepsilon_t$ reduced form errors with $u_t \sim \text{i.i.d. } N(0_{n \times 1}, \Omega)$ and $\Omega = D\Sigma D'$. Second, given also Assumption 2, $\Omega$ is positive definite (Abadir and Magnus, 2005, p. 222). Therefore, it has Cholesky decomposition $\Omega = LL'$ for $L$ lower triangular and nonsingular. Finally, the parameters $c$, $A$, $B$, $C$, $D$, and $\Sigma$ typically depend on a set of underlying structural parameters. This feature is abstracted from momentarily but will be returned to.

3.2. Reduced form representations of the model

Assumption 4 implies that $H(L^{-1})$ is invertible, and therefore that (1) has pure vector autoregressive (VAR) formulation $\Phi(L)y_t = u_t$ for $\Phi(L)^{-1} = H(L^{-1})D^{-1} = I_n + \Phi_1 L + \Phi_2 L^2 + \cdots + \Phi_p L^p$ and each $\Phi_i$ $(n \times n)$. In special cases, the order $p$ of this VAR($p$) may be finite.
This feature is useful when true, since the finitely many reduced form parameters $\Phi_1, \ldots, \Phi_p$, and $\mathrm{vech}(L)$ fully characterize the likelihood function (Rothenberg, 1971) and form the basis of impulse-response matching estimators, for example Christiano et al. (2005). Explicitly, as reduced form parameters, these terms are known to be globally identifiable (via the closed-form Yule-Walker equations). This makes them considerably easier to work with than the ABCD parameters of (1), which are subject to equivalent similarity transformations in the spectral density matrix, and thus not reduced form parameters in the traditional sense (Komunjer and Ng, 2011). Unfortunately, as discussed by Ravenna (2007), (1) only necessarily has equivalent VAR($\infty$) representation. This subjects empirical approaches which presume finite VAR representation a priori to not inconsequential truncation biases.

Even when (1) does not have finite order VAR representation, it does however typically have finite order vector autoregressive moving average (VARMA) representation (Hannan and Deistler, 1988, Chapter 2). Such a VARMA($p$,$q$) model emerges from Wold representation via another decomposition of the VMA($\infty$) parameters: $\Phi(L)^{-1}M(L) = H(L^{-1})D^{-1}$. In other words, now again allowing for $c \neq 0_{(s+n) \times 1}$ in general,

$$y_t = \mu + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + u_t + M_1 u_{t-1} + \cdots + M_q u_{t-q} \tag{3}$$
for $\mu$ $(n \times 1)$ and each $M_i$ $(n \times n)$. And yet, while finite order VARMA representation may exist when finite order VAR does not, VARMA representation is not generally utilized towards the identification and estimation of DSGE models (outside of a few exceptions, e.g. Paccagnini and Rossi, 2012). This is due to the dueling and formidable complications of (i) obtaining concise finite order VARMA representation from a given ABCD model and (ii) ensuring that this representation is identifiable. In other words, the modeling challenge is to find VARMA parameters which may be considered reduced form parameters in the traditional sense.

In consideration of the first point, not all Wold representations imply finite order $p$ and $q$ VARMA models (Lütkepohl, 2006). Specifically, finite order VARMA representation exists if and only if the infinite dimensional Hankel (mirror image Toeplitz) matrix of block autocovariances (second moments) $\Gamma_i = E(y_t y_{t-i}') = \Gamma_{-i}'$ $\forall\, i \ge 1$ has finite rank (see Box et al., 2015, p. 540). The rank of the Hankel matrix, known as the McMillan degree $\le n \max\{p, q\}$, is typically difficult to ascertain directly; one must be able to show that all second moments above some finite order are linear combinations of lower order second moments. Moreover, the second moments of DSGE models are generally nonlinear and/or not analytical functions of possibly many underlying structural parameters. These features make determining the McMillan degree directly difficult.

16 The assumptions that the model is square (not stochastically singular) and Gaussian in Assumption 1 are overly strong for VMA($\infty$) representation to exist. However, they imply that the likelihood function is closed-form, which will be utilized in this paper. These restrictions are consistent with a large subset of the literature; for example, squareness is frequently achieved by augmenting a given model with serially correlated errors (Ireland, 2004). The results of this paper could possibly be extended to the non-Gaussian case by application of particle filtering, for example Fernández-Villaverde and Rubio-Ramírez (2007).

In consideration of the second point, even if the McMillan degree were known, the resulting finitely many VARMA parameters would not necessarily be identifiable. This is due to the well known problem of common left AR and MA factors (Hannan, 1971). Specifically, if one were to premultiply $\Phi(L)$ and $M(L)$ by any invertible operator $C(L) = C_0 + C_1 L + \cdots$ satisfying $\det(C_0) \neq 0$ and $\det(C(z)) \neq 0$ for $z \le 1$, they would obtain an equivalent VARMA decomposition of the Wold
parameters $[C(L)\Phi(L)]^{-1}C(L)M(L) = \Phi(L)^{-1}M(L)$. For this reason, exclusion restrictions are usually placed on the AR and MA parameters to ensure unique representations which cancel common factors, i.e. which ensure that the operator $[\Phi(L) : M(L)]$ is left coprime. Such canonical representations include the echelon and final equations forms of VARMA models, which are well known and widely used outside of the DSGE case (see Lütkepohl, 2005, Chapter 12.1.2). But determining the McMillan degree of a DSGE model which now in addition is a canonical form seems to be entirely infeasible.

Certainly these difficulties with respect to VARMA representation of DSGE models are potentially daunting. But the possibility that (1) may have finite VARMA representation which is left coprime by construction seems not to have even been considered. If this were true, it would be useful. In fact, it is not difficult to determine a finite order VARMA representation for (1), should it exist, and whether the resulting parameters are indeed identifiable reduced form parameters. This only requires a few additional, but generally satisfied, assumptions.

3.3. Reduced form VARMA parameters

We now make an assumption about (1) pertaining to the number of states and observables:

Assumption 5. $s \ge n$.

This assumption may always trivially be satisfied by adding elements of $y_t$ also to $x_t$. Otherwise, persistent shocks, MA errors, or more lags of states may be added to the model. Later, we will see that any such restriction ultimately becomes a testable hypothesis. Regardless, as we will also see, in medium-scale models this assumption will generally be satisfied by construction. Given Assumptions 4 and 5, the results of Morris (2016b) guarantee that the observables $\{y_t\}$ of the state space (1) have VARMA representation. For convenience, the substantive implication of Proposition 1 and Corollary 1 of that paper is restated here as a single Proposition.

Proposition 1 (Morris, 2016b). Eq. (1) has VARMA($p$, $p-1$) representation (3) for $p = i + 2$ and $i \ge 0$ if $F_i$ exists and is full column rank $n$ for

$$\underset{(s \times n)}{F_i} = \left( \sum_{j=0}^{i} F_x(j)' F_x(j) \right)^{-1} F_x(i)' \tag{4}$$

$$\underset{(n \times s)}{F_x(j)} = \frac{\partial y_t}{\partial x_{t-1-j}} = \begin{cases} C_x & \text{if } j = 0, \\ C A^{j-1} \begin{bmatrix} A_x' & C_x' \end{bmatrix}' & \text{if } j > 0. \end{cases} \tag{5}$$
This result is summarized in Appendix A, where the closed form functional correspondence between the state space and VARMA representations (1) and (3) is also given in Eq. (A.1). In comparison with the results of Ravenna (2007), which may yield large $p$ and $q$ orders for the companion VARMA form of a state space model, the above result may be used to pragmatically obtain the most concise possible VARMA representation. In practice, one simply begins with $i = 0$ and successively computes the rank of $F_i$ until it is found to exist and be full column rank $n$. The iteration may be terminated at least at $i = T$ (the time sample of interest), or any other finite number above which a finite order representation is not useful in the application of interest. Note, since $i \ge 0$, the smallest possible representation under this result is VARMA(2,1); Corollaries 2 and 3 of Morris (2016b) describe further restrictions on the parameters of (1) which imply that VARMA(1,1) or VAR(1) representations exist. In practice, the subjective decision of how to specify the variables of (1) so as to satisfy Assumption 5 will affect the rank of $F_i$ for each $i$, and/or whether these additional Corollaries are applicable. In the interest of obtaining the most concise representation, a good practice for larger-scale models is to first reduce the dimensionality of the model (1) to its minimal form. Komunjer and Ng (2011) provide guidance to obtain this representation by simply deleting rows and columns from parameters and variables.

Let us assume a VARMA($p$,$q$) representation (3) has been distinguished for a given model (1) using Proposition 1, or any other method. Generally, such a VARMA model will be subject to linear (e.g. exclusion) restrictions,

$$[\mu' \;\; \phi' \;\; m' \;\; l']' = R\pi + r \tag{6}$$

$\phi = [(\mathrm{vec}(\Phi_1))', \ldots, (\mathrm{vec}(\Phi_p))']'$ is $(pn^2 \times 1)$, $m = [(\mathrm{vec}(M_1))', \ldots, (\mathrm{vec}(M_q))']'$ is $(qn^2 \times 1)$, and $l = \mathrm{vech}(L)$ is $(n(n+1)/2 \times 1)$. $R$ is a known, conformable matrix with $h \le n + (p+q)n^2 + n(n+1)/2$ columns, and full column rank $h$. $r$ is a known vector. $\pi$ is an $(h \times 1)$ vector of parameters possibly in common among $\mu$, $\phi$, $m$, and $l$.

As previously noted, the identifiability of $\pi$ is by no means certain in general. Yet the results of Zadrozny (2016) allow one to ascertain whether $\pi$ are identifiable, and thus reduced form parameters in the traditional sense. This result does not require one to find the McMillan degree, or associated canonical form, directly. To utilize it, let us assume that the VARMA($p$,$q$) model (3) under consideration is a VARMA($p$, $p-1$). This is conveniently the implication of Proposition 1, but in general any
VARMA($p$,$q$) model may trivially be written in this form by including AR or MA terms which are $0_{n \times n}$. Given this $p$ define,

$$\underset{(np \times np)}{\mathbf{\Phi}} = \begin{bmatrix} \Phi_1 & I_n & \cdots & 0_{n \times n} \\ \vdots & \vdots & \ddots & \vdots \\ \Phi_{p-1} & 0_{n \times n} & \cdots & I_n \\ \Phi_p & 0_{n \times n} & \cdots & 0_{n \times n} \end{bmatrix}, \qquad \underset{(n(p-1) \times n(p-1))}{\mathbf{M}} = \begin{bmatrix} M_1 & I_n & \cdots & 0_{n \times n} \\ \vdots & \vdots & \ddots & \vdots \\ M_{p-2} & 0_{n \times n} & \cdots & I_n \\ M_{p-1} & 0_{n \times n} & \cdots & 0_{n \times n} \end{bmatrix} \tag{7}$$

In terms of these matrices we make three final assumptions.

Assumption 6. $\mathrm{rank}\left[\mathbf{\Phi} - I_{np}\lambda,\; [I_n, M_1', \ldots, M_{p-1}']' L\right] = np$ for any real or complex scalar $\lambda$.
Assumption 7. $\mathrm{rank}\left[\mathbf{\Phi}' - I_{np}\lambda,\; [I_n, 0_{n \times n(p-1)}]'\right] = np$ for any real or complex scalar $\lambda$.
Assumption 8. $\mathbf{M}$ has $n(p-1)$ distinct eigenvalues.

Under Assumptions 1–8, we have the following implication. Since this result is really a Corollary to the results of Zadrozny (2016), it is explicitly proven in Appendix B. Yet it follows directly from Zadrozny's result under the stated assumptions.

Corollary 1 (Zadrozny, 2016). $\pi$ are identifiable reduced form parameters.

The fact that $\pi$ are identifiable also means they may be consistently estimated. Appendix C defines a consistent instrumental variables (IV) estimator $\hat\pi_{IV}$. Appendix D writes the likelihood function for the VARMA model in closed form, along with the asymptotic information matrix $\mathcal{I}$. $\hat\pi_{IV}$ also serves as a convenient starting point for the numerical maximization of the likelihood function, a process which yields the efficient maximum likelihood estimator $\hat\pi_{MLE}$. This estimator has asymptotic covariance matrix $\mathcal{I}^{-1}$.

3.4. Identification and estimation of structural parameters

Thus far we have claimed to have considered representation, identification, and estimation of a DSGE model with state space formulation (1). Yet, we have not explicitly considered a key feature: that the ABCD parameters, and therefore the $(h \times 1)$ reduced form parameters $\pi$, generally depend on an underlying $(k \times 1)$ vector of structural parameters, $\theta$. This correspondence takes the form of a generally nonlinear and not analytically tractable, but we assume continuously differentiable, function $g$. It is defined on what is typically a non-convex parameter space of interest, $\Theta \subseteq \mathbb{R}^k$.

$$\pi = g(\theta) \tag{8}$$
Under Assumptions 1–8, $\pi$ are reduced form parameters. Therefore, several identifiability results from Rothenberg (1971) are directly applicable. For instance, a necessary condition for the identification of $\theta$ is $k \le h$. We say $\theta$ is locally identifiable at a point $\theta_0$ if and only if the $(h \times k)$ Jacobian $G(\theta_0) = \partial g / \partial \theta' |_{\theta = \theta_0}$ has full column rank $k$. $\theta$ is globally identifiable at a point $\theta_0 \in \Theta$ in $\Theta$ if and only if $\pi(\bar{\theta}_0) = \pi(\theta_0)$ for $\bar{\theta}_0 \in \Theta$ implies that $\bar{\theta}_0 = \theta_0$. If $\theta$ is indeed globally identifiable in $\Theta$ at what we henceforth denote the population value $\theta_0$, then an efficient estimator for $\theta$ is the maximum likelihood estimator $\hat{\theta}_{MLE}$. However, maximizing the likelihood function for DSGE models in particular is typically difficult (Andreasen, 2010). Specifically, even if $\theta$ is identifiable, the likelihood function may contain flat regions, for instance as a result of an unfortunate choice of observables in $y_t$. Otherwise, it may contain discontinuities and boundary points as a result of imposed parametric restrictions for $\theta$. These may include but are not limited to theoretical restrictions on values $\theta$ may take on, sign restrictions, determinacy, or stability of solutions (Assumption 3). These same features are also why the parameter space $\Theta$ on which $g$ is defined need not be convex. But as Rothenberg (1973, p. 24) observes, the maximum likelihood estimator is by no means the only applicable efficient estimator in many settings. Given that $\hat{\pi}_{MLE}$ and $\hat{\mathcal{I}}$ (the asymptotic information matrix with respect to $\pi$ evaluated at $\hat{\pi}_{MLE}$) are available, a possible alternative is the minimum chi-square estimator,

$$\hat{\theta}_{MCSE} = \arg\min_{\theta \in \Theta \subseteq \mathbb{R}^k} T \left(\hat{\pi}_{MLE} - g(\theta)\right)' \hat{\mathcal{I}} \left(\hat{\pi}_{MLE} - g(\theta)\right) \tag{9}$$
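As a rough illustration of the rank condition for local identification and of criterion (9), the following sketch uses a hypothetical two-parameter mapping `g` (it is not a DSGE model), a made-up positive definite information matrix `info`, and a finite-difference Jacobian. All names and numbers are assumptions introduced only for this example.

```python
# Toy illustration of local identification and the minimum chi-square criterion.
# The mapping g is hypothetical (h = k = 2); it stands in for pi = g(theta).

def g(theta):
    """Hypothetical reduced-form mapping pi = g(theta)."""
    t1, t2 = theta
    return [t1 + t2, t1 * t2]

def jacobian(theta, step=1e-6):
    """Central finite-difference approximation to G(theta) = dg/dtheta' (h x k)."""
    h = len(g(theta))
    cols = []
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += step
        dn[j] -= step
        cols.append([(a - b) / (2 * step) for a, b in zip(g(up), g(dn))])
    # cols[j][i] holds d g_i / d theta_j; transpose so rows index elements of pi
    return [[cols[j][i] for j in range(len(theta))] for i in range(h)]

def det2(m):
    """Determinant of a 2 x 2 matrix (nonzero means full column rank here)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mcse_criterion(theta, pi_hat, info, T):
    """T * (pi_hat - g(theta))' I (pi_hat - g(theta)), the objective in (9)."""
    d = [p - q for p, q in zip(pi_hat, g(theta))]
    n = len(d)
    return T * sum(d[i] * info[i][j] * d[j] for i in range(n) for j in range(n))

theta0 = (0.5, 0.2)
pi_hat = g(theta0)                 # pretend pi is estimated without error
info = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical positive definite matrix
T = 225

det_at_theta0 = det2(jacobian(theta0))         # equals t1 - t2, so full rank
det_on_diagonal = det2(jacobian((0.3, 0.3)))   # about zero: rank condition fails
crit_at_truth = mcse_criterion(theta0, pi_hat, info, T)
crit_elsewhere = mcse_criterion((0.4, 0.2), pi_hat, info, T)
```

In this just-identified toy case the criterion is zero at the data-generating point and strictly positive elsewhere, while the Jacobian loses rank wherever the two parameters coincide.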
Hamilton and Wu (2012) applied this estimator to Gaussian affine term structure models (ATSM), which generally have VAR representation. Their proposal is to approximate the variance of $\hat{\theta}_{MCSE}$ with $\widehat{\mathrm{Avar}}(\hat{\theta}_{MCSE}) = T^{-1}[G(\hat{\theta}_{MCSE})' \hat{\mathcal{I}} G(\hat{\theta}_{MCSE})]^{-1}$. As they prove, this is equivalent to the usual asymptotic variance for $\hat{\theta}_{MLE}$, meaning there is no basis to conclude that the MLE has better properties. The MCSE offers several pragmatic advantages over the MLE in the case of ATSM which carry over to the case of DSGE models. For example, the fact that “half” of the required numerical search is carried out with respect to the first step of estimating $\pi$ may ease the numerical burden introduced by $g$. Moreover, in the special case that $h = k$, then $\hat{\theta}_{MCSE} = \arg\min_{\theta \in \Theta \subseteq \mathbb{R}^k} (\hat{\pi}_{MLE} - g(\theta))'(\hat{\pi}_{MLE} - g(\theta))$ is identical to $\hat{\theta}_{MLE}$. In this case the MCSE also has the additional benefit that the criterion will be identically zero at the optimum, reconfirming to the analyst that the MCSE = MLE has indeed been found. There are two primary advantages of proceeding this way with respect to pileups. The first is that the numerical ease of the MCSE allows one to compute the bootstrap distribution of the estimator without substantial numerical headaches. This is particularly useful in the context of pileups which make conventional asymptotic results poor approximations, or perplex
weak identification robust statistics with boundary estimates. Of course, the bootstrap is no silver bullet, and in particular, the unreliability of the bootstrap in the context of weak instrumental variables has been well-documented by Rothenberg (1984), Horowitz (2001, Section 4.5), and Hahn et al. (2004). The fact that pileups are an outcome of weak identification should at least give pause to the application of the bootstrap herein. Yet, as Stock (2008) notes, this outcome in weak IV, related to failure of Edgeworth expansion (Hall, 1992, Chapter 3), has to do with the fact that the distribution of the IV estimator depends on a nuisance parameter which is not even consistently estimable with weak instruments. In all examples of pileups we consider, all parameters are globally identifiable in $\Theta$ and may be consistently estimated. Thus, the bootstrap should be reliable, and there is precedent for this assumption. For example, as Moreira et al. (2004) show, the bootstrap for the Anderson–Rubin statistic is valid under weak identification due to its lack of dependence on such nuisance parameters. The second advantage of this framework is that it provides two immediate tests of specification, as applied to ATSM by Hamilton and Wu (2014). First, the restrictions on VARMA representation (6) typically take the form of zeros. When these zeros appear on the AR coefficients, they amount to Granger causality restrictions which are directly testable with a conventional F test. Second, the chi-squared statistic (9) is also a test of the $h - k$ overidentifying restrictions embodied in $g$. When no pileups are present, this statistic has a $\chi^2_{h-k}$ distribution. Otherwise, the bootstrap is again useful. Why are these tests particularly useful in the context of pileups? In some situations, unwieldy distributions of estimators may be construed to be evidence of misspecification.
For example, if one finds in many time samples that the data favors a discount factor greater than 1, they may be tempted to conclude that the model is poorly formulated. But as we will soon see, though seemingly suggestive of misspecification, such pileups may occur even when the model is in fact correctly specified.
4. Examples

4.1. Basic stochastic growth model

First, we consider the classic stochastic growth model presented by Brock and Mirman (1972). In this model the social planner solves,

$$\max_{\{C_t\}} \sum_{t=0}^{\infty} \beta^t \ln C_t \quad \text{subject to } Y_t = A_t K_t^{\alpha} \text{ and } K_{t+1} = Y_t - C_t. \tag{10}$$
$Y_t$ is output, $K_t$ is capital, $C_t$ is consumption, and $A_t$ is productivity in period $t$. For simplicity normalize $A = 1$ and assume that output is measured with error $\varepsilon_t \sim iid\,N(0, \sigma^2)$. Where lowercase letters represent logs, the solution to this model is $k_t = \ln(\alpha\beta) + \alpha k_{t-1}$ and $y_t = \alpha k_t$. These equations have the form of (1),

$$\underbrace{k_t}_{x_t} = \underbrace{\ln(\alpha\beta)}_{c_x} + \underbrace{\alpha}_{A_x} x_{t-1} + \underbrace{0}_{A_y} y_{t-1} + \underbrace{0}_{B} \varepsilon_t$$

$$y_t = \underbrace{\alpha \ln(\alpha\beta)}_{c_y} + \underbrace{\alpha^2}_{C_x} x_{t-1} + \underbrace{0}_{C_y} y_{t-1} + \underbrace{1}_{D} \varepsilon_t \tag{11}$$
In terms of the notation of (1), the dimensions of this model are $s = 1$ and $n = 1$, and $\Sigma = \sigma^2$. The $k = 3$ structural parameters in $\theta' = (\alpha, \beta, \sigma)$ are assumed to belong to the parameter space $\Theta \subset \mathbb{R}^3$ defined by $0 < \alpha < 1$, $0 < \beta < 1$, and $0 < \sigma < 1$. Therefore, Assumptions 1–5 are satisfied. In the ensuing experiments we will consider the following calibrated, pseudo-population value:

$\theta_0$: $\alpha_0 = 1/3$, $\beta_0 = 0.99$, $\sigma_0 = 0.1$.
To obtain the most concise reduced form representation of this model, simply note that $1/C_x$ exists for all $\theta \in \Theta$, and therefore $x_{t-1} = (1/C_x) y_t - (c_y/C_x) - (1/C_x)\varepsilon_t$. Inserting this into the state equation and rearranging yields the ARMA(1,1) model,

$$y_t = \underbrace{(c_x C_x + c_y(1 - A_x))}_{\mu} + \underbrace{A_x}_{\Phi_1} y_{t-1} + \underbrace{\varepsilon_t}_{u_t} + \underbrace{(-A_x)}_{M_1} \underbrace{\varepsilon_{t-1}}_{u_{t-1}} \tag{12}$$
Here, $\Omega = \sigma^2$ so $L = \sigma$. The $h = 3$ reduced form parameters are collected in $\pi' = (\mu, \Phi_1, L)$; $M_1 = -\Phi_1$ is redundant and so not a separate reduced form parameter. In terms of the notation introduced in (6),

$$\begin{bmatrix} \mu \\ \Phi_1 \\ M_1 \\ L \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{R} \underbrace{\begin{bmatrix} \mu \\ \Phi_1 \\ L \end{bmatrix}}_{\pi} \tag{13}$$
Fig. 3. “Boundary estimates.” Monte Carlo for Brock and Mirman (1972) model: T = 225, N = 1000.
The Zadrozny (2016) conditions Assumptions 6–8 are sufficient for the identifiability of all VARMA parameters $\phi$, $m$, and $L$, and therefore $\pi$. However, they are by no means generally necessary for the identification of $\pi$ under restrictions $R$. Here, due to the restrictions in (13), $M_1$ is identifiable exactly when $\Phi_1$ is. Using $y_{t-2}$ as an instrument, $\Phi_1$ is identifiable if the autocovariance $\Gamma_1 = E[(y_t - \bar{y})(y_{t-1} - \bar{y})]$ is nonzero for $\bar{y} = E(y_t)$. Since $\Gamma_1 = \alpha\sigma^2$, it is nonzero in all of $\Theta$. So, $\Phi_1 = A_x$ and $M_1 = -A_x$ are identifiable in all of $\Theta$. By similar reasoning, $\mu = (1 - \Phi_1)\bar{y}$ and $L$ are also identifiable in all of $\Theta$. In sum, $\pi$ is globally identifiable in all of $\Theta$. Given an efficient estimator for $\pi$ such as the MLE $\hat{\pi}_{MLE}$, the MCSE is the argument $\theta$ which minimizes (9). However, because both $\theta$ and $\pi$ are 3-dimensional, $\theta$ is just identified, meaning that the MCSE $\hat{\theta}_{MCSE}$ may be retrieved from $\hat{\pi}_{MLE}$ as the solution to $\hat{\pi}_{MLE} = g(\theta)$. This takes the form of an analytical one-to-one mapping,

$$\alpha = \Phi_1, \qquad \beta = (1/\Phi_1)\exp\{\mu/\Phi_1\}, \qquad \sigma = |L|.$$
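The one-to-one mapping can be checked numerically. The sketch below assumes $\mu = \alpha\ln(\alpha\beta)$, which follows from substituting the coefficients of (11) into the intercept of (12); the function names and the size of the perturbation are hypothetical choices made only for illustration.

```python
import math

def g(alpha, beta, sigma):
    """Structural to reduced form: (mu, Phi1, L) implied by (11) and (12)."""
    mu = alpha * math.log(alpha * beta)
    return mu, alpha, sigma

def g_inverse(mu, phi1, L):
    """Reduced form to structural, using the analytical mapping in the text."""
    alpha = phi1
    beta = math.exp(mu / phi1) / phi1
    sigma = abs(L)
    return alpha, beta, sigma

theta0 = (1.0 / 3.0, 0.99, 0.1)
pi0 = g(*theta0)
theta_back = g_inverse(*pi0)       # round trip should return theta0

# Hypothetical estimation error: mu_hat slightly above mu_0, other elements exact.
mu_perturbed = pi0[0] + 0.005
beta_perturbed = g_inverse(mu_perturbed, pi0[1], pi0[2])[1]
```

Under these assumptions, an error of 0.005 in the intercept alone is enough to push the implied discount factor above one, which gives some sense of how close $\theta_0$ sits to the boundary $\beta = 1$ in terms of the reduced form.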
In this special case, the MCSE is also identically the MLE $\hat{\theta}_{MLE}$. Thus, any phenomena we observe are true for that estimator as well. In that light, 1000 data series of length T = 225, approximately the quarterly time sample of post-War data, are generated from $\theta_0$. In a first step, $\hat{\pi}_{MLE}$ is obtained by likelihood maximization, which is easy since an initial IV estimate for $\Phi$ gives a suitable starting point for numerical maximization. The distribution of these ARMA estimates appears in the first row of Fig. 3. In a second step, the structural parameter estimators $\hat{\theta}_{MCSE} = \hat{\theta}_{MLE}$ are obtained. These appear in the second row of Fig. 3. Kolmogorov–Smirnov tests at each of the 1, 5, and 10% levels flatly reject the hypothesis that any of these distributions is Gaussian.17 The pileup in this case takes the form of a consistent boundary estimate for $\beta$. In practice, how would an analyst, or audience to an empirical study, react to such estimates? They might conclude that this model is misspecified, as there is strong theoretical rationale for $\beta < 1$. And yet, this model is not misspecified. Rather, estimates for both $\alpha$ and $\beta$ tend to be biased upward in small samples. In other words, the first type of pileup one may observe in practice is boundary estimates, and these by no means indicate that the model is necessarily misspecified.
4.2. Search and matching in the labor market

Mortensen and Pissarides's (1994) search and matching in the labor market model has recently been extended to the form of an estimable medium scale DSGE model (Christiano et al., 2016). Krause and Lubik (2010) derive a simple version of this model for calibration purposes with observables $y_t$ (output), and two states: $n_t$ (employment) and $\theta_t$ (labor market tightness). In this formulation, $y_t = n_t$ (Equation (1), p. 261). Thus, under risk neutrality (IES = 0; Equation (29), p. 266), the
17 The simplicity of this model, in contrast with the remaining models to be considered in this paper, makes it feasible to compute the true coverage probability of the 95% bootstrap confidence interval by Monte Carlo simulation. Following the procedure of Horowitz (2001, Section 3.4), for each of $i = 1, \ldots, B$ Monte Carlo draws $\hat{\theta}_i$ from a dataset of length T = 225 with data-generating $\theta_0$, errors $\{\hat{u}_{it}\}_{t=1}^{225}$ are inferred assuming $\hat{u}_{i,1} = 0$. We then sample with replacement $B$ times from each of these $i$ sets of errors to create another $j = 1, \ldots, B$ synthetic data sets for each $i$, labeled $\{\{y_{j,i,t}\}_{t=1}^{225}\}_{j=1}^{B}$. For each $i$, we compute the estimators $\{\hat{\theta}_{j,i}\}_{j=1}^{B}$ and record a 1 if $\theta_0$ lies in the 95% confidence interval, or zero otherwise. The true coverage probability of the 95% bootstrap is $100 \cdot (\#1\text{'s})/B$. This experiment is conducted informally with multiple values for $B$ and the coverage probability appears to be approaching the correct magnitude as $B$ grows large.
solution to this model may be written as follows, assuming that $y_t$ is measured with error $\varepsilon_t \sim iid\,N(0, \sigma^2)$.

$$\underbrace{\theta_t}_{x_t} = \underbrace{0}_{c_x} + \underbrace{\frac{\xi}{\beta(1-\rho)\left[\xi - \eta\,\frac{\rho}{1-\rho}\,\frac{1-u}{u}\right]}}_{A_x} x_{t-1} + \underbrace{0}_{A_y} y_{t-1} + \underbrace{0}_{B} \varepsilon_t$$

$$y_t = \underbrace{0}_{c_y} + \underbrace{\rho(1-\xi)}_{C_x} x_{t-1} + \underbrace{\frac{u-\rho}{u}}_{C_y} y_{t-1} + \underbrace{1}_{D} \varepsilon_t \tag{14}$$
The $k = 6$ structural parameters of this model are collected in $\theta' = (\rho, \xi, \beta, \eta, u, \sigma)$.18 $0 < \rho < 1$ is the separation rate. $0 < \xi < 1$ is the match elasticity. $0 < \beta < 1$ is the discount factor. $0 < \eta < 1$ is the Nash bargaining parameter. $0 < u < 0.2$ is steady state unemployment. $0 < \sigma < 1$. Aside from $\xi_0 = 0.1$, which is suggested by Cooley and Quadrini (1999), the calibration considered by Krause and Lubik is the following, which we take to be the population value in our experiments:

$\theta_0$: $\rho_0 = 0.1$, $\xi_0 = 0.1$, $\beta_0 = 0.99$, $\eta_0 = 0.5$, $u_0 = 0.12$, $\sigma_0 = 0.1$.
Population steady state unemployment $u_0$ is set to the relatively high value of 12% to include both those unable to find work and those marginally attached to the labor force. Krause and Lubik show that in certain regions of the parameter space, the solution to the model is indeterminate, i.e. exogenous uncertainty not attributable to the fundamental errors $\varepsilon_t$, known as “sunspots”, may drive the evolution of the endogenous variables. In practice, indeterminacy causes numerical solution algorithms for larger scale models to yield multiple solutions, as we consider explicitly in the next example. At $\theta_0$, Assumptions 1–5 are satisfied. The observation equation implies $x_{t-1} = (1/C_x) y_t - (C_y/C_x) y_{t-1} - (1/C_x)\varepsilon_t$ which, when plugged into the state equation, yields the ARMA(2,1) model,

$$y_t = \underbrace{(C_y + A_x)}_{\Phi_1} y_{t-1} + \underbrace{(-A_x C_y)}_{\Phi_2} y_{t-2} + \underbrace{\varepsilon_t}_{u_t} + \underbrace{(-A_x)}_{M_1} \underbrace{\varepsilon_{t-1}}_{u_{t-1}} \tag{15}$$
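$\Omega = \sigma^2$ so $L = \sigma$. The $h = 4$ reduced form parameters are collected in $\pi' = (\Phi_1, \Phi_2, M_1, L)$. No elements of $\pi$ are linearly redundant, so $R = I_4$. In addition, $C_x$ is not identifiable, as it does not appear in the ARMA parameters. It is a priori at least unclear whether $A_x$ and $C_y$ are uniquely identifiable; solving for these two parameters from $\Phi_1$ and $\Phi_2$ yields two solutions. In order for these parameters to be identifiable, we must first check that $\pi$ is indeed identifiable. This amounts to checking that Assumptions 6–8 are satisfied. Since $M_1$ is a scalar, Assumption 8 is satisfied by construction, meaning it suffices to show that, for any real or complex scalar $\lambda$,

$$\mathrm{rank}\underbrace{\left[\begin{bmatrix} \Phi_1 & 1 \\ \Phi_2 & 0 \end{bmatrix} - I_2\lambda,\ \begin{bmatrix} 1 \\ M_1 \end{bmatrix} L\right]}_{(2 \times 3)} = 2 \quad \text{and} \quad \mathrm{rank}\underbrace{\left[\begin{bmatrix} \Phi_1 & \Phi_2 \\ 1 & 0 \end{bmatrix} - I_2\lambda,\ \begin{bmatrix} 1 \\ 0 \end{bmatrix}\right]}_{(2 \times 3)} = 2.$$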
This is true at $\theta_0$ and other points in $\Theta$. In order to estimate this model, we first obtain the MLE for $\pi$. The distribution of these estimators is given in the first row of Fig. 4. In a second step, we use the MCSE to recover estimators for $A_x$ and $C_y$. This estimator is written as

$$\begin{bmatrix} \hat{A}_{x,MCSE} \\ \hat{C}_{y,MCSE} \end{bmatrix} = \arg\min_{[A_x, C_y]'} T \left( \begin{bmatrix} \hat{\Phi}_{1,MLE} \\ \hat{\Phi}_{2,MLE} \\ \hat{M}_{1,MLE} \end{bmatrix} - \begin{bmatrix} C_y + A_x \\ -A_x C_y \\ -A_x \end{bmatrix} \right)' \hat{\mathcal{I}}_{\psi\psi} \left( \begin{bmatrix} \hat{\Phi}_{1,MLE} \\ \hat{\Phi}_{2,MLE} \\ \hat{M}_{1,MLE} \end{bmatrix} - \begin{bmatrix} C_y + A_x \\ -A_x C_y \\ -A_x \end{bmatrix} \right) \tag{16}$$

for $\hat{\mathcal{I}}_{\psi\psi}$ the upper-left block of the full ARMA information matrix $\mathcal{I}$ described in Appendix D, Eq. (D.9) evaluated at $\hat{\pi}_{MLE}$. Third, we may use the analytical correspondence between the state space and structural parameters to back out 2 of the structural parameters from $A_x$ and $C_y$. These are identically the MCSE estimators for the structural parameters. One of these free variables must be $u$ or $\rho$, since those are the only two parameters appearing in $C_y$. There are 7 choices of fixing 3 of $\rho$, $\xi$, $\beta$, $\eta$, and $u$ to choose from. Each choice results in the complement 2 parameters being globally identifiable. In the third row of Fig. 4, we observe the distribution of the MCSE for $\rho$ and $\xi$. In this example, we observe a long left tail in the MCSE for $\xi$. This is skew rather than a boundary of $\Theta$; recall $0 < \xi < 1$ but estimates pile up at 0.1. This result may appear somewhat unintuitive at first, since $A_x$ and $C_y$ are both seemingly well-behaved. But as the two right-most scatter plots in the third row of Fig. 4 indicate, the correspondence between $A_x$ and $C_y$ and $\xi$ is a cone. This functional form results in a natural boundary for values $\xi$ may take on. In other words, while estimates may pile up at imposed boundaries such as $\Theta$, they may also pile up at boundary points implied by the functional form of the model itself.

18 Krause and Lubik consider a structural parameter vector in which $u$ is replaced by $b$. The reparameterization here is used to avoid the need for the nonlinear solution described in Equations (18)–(22) on p. 264.
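To make the identification argument for $A_x$ and $C_y$ concrete, the sketch below works directly with the mapping in (15). It assumes the reduced form is known exactly, uses illustrative values for $A_x$ and $C_y$ rather than the calibration above, and the helper names are hypothetical. It also counts the admissible choices of free structural parameters by enumeration.

```python
import itertools
import math

def reduced_form(a_x, c_y):
    """(Phi1, Phi2, M1) implied by the ARMA(2,1) representation (15)."""
    return c_y + a_x, -a_x * c_y, -a_x

def candidates_from_ar(phi1, phi2):
    """A_x and C_y solve z^2 - Phi1 z - Phi2 = 0, so the AR terms alone
    cannot tell which root is A_x and which is C_y."""
    disc = math.sqrt(phi1 ** 2 + 4.0 * phi2)
    z_hi, z_lo = (phi1 + disc) / 2.0, (phi1 - disc) / 2.0
    return [(z_hi, z_lo), (z_lo, z_hi)]

def recover(phi1, phi2, m1):
    """With exact reduced form, M1 = -A_x settles the labeling."""
    a_x = -m1
    return a_x, phi1 - a_x

a_x0, c_y0 = 0.6, 0.2   # illustrative values only, not the text's calibration
phi1, phi2, m1 = reduced_form(a_x0, c_y0)
two_labelings = candidates_from_ar(phi1, phi2)
unique = recover(phi1, phi2, m1)

# Free pairs of structural parameters must include u or rho (they determine C_y).
params = ["rho", "xi", "beta", "eta", "u"]
free_pairs = [p for p in itertools.combinations(params, 2)
              if "rho" in p or "u" in p]
```

The enumeration reproduces the count of 7 admissible choices: of the 10 possible pairs, the 3 drawn only from $\xi$, $\beta$, and $\eta$ are excluded.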
Fig. 4. “Skew.” Monte Carlo for Krause and Lubik (2010) model: T = 225, N = 1000.
4.3. New Keynesian model

Next we consider An and Schorfheide's (2007) 6-equation, canonical New Keynesian model,

$$z_t = \rho_z z_{t-1} + \varepsilon_{zt} \tag{17}$$

$$g_t = \rho_g g_{t-1} + \varepsilon_{gt} \tag{18}$$

$$r_t = \rho_r r_{t-1} + (1 - \rho_r)\psi_\pi \pi_t + (1 - \rho_r)\psi_y (y_t - g_t) + \varepsilon_{rt} \tag{19}$$

$$y_t = E_t y_{t+1} + g_t - E_t g_{t+1} - (1/\tau)(r_t - E_t \pi_{t+1} - E_t z_{t+1}) \tag{20}$$

$$\pi_t = \beta E_t \pi_{t+1} + \kappa (y_t - g_t) \tag{21}$$

$$c_t = y_t - g_t \tag{22}$$
In log differences from steady state, $z_t$ is total factor productivity, $g_t$ is government spending, $r_t$ is the nominal interest rate, $y_t$ is output, $\pi_t$ is inflation, and $c_t$ is consumption. $\Theta \subset \mathbb{R}^{14}$, the theoretically admissible parameter space for the model, is defined liberally to account for the range of opinion. $0 < \tau \le 5$ is the CRRA. $0.995 \le \beta < 1$ is the discount factor. $0 < \kappa \le 1$ is the slope of the Phillips curve. $1.25 < \psi_\pi \le 1.75$ and $0 < \psi_y < 5$ are Taylor rule coefficients. $0 < \rho_i < 1$ are shock persistence parameters for $i = z$, $g$, and $r$. These 8 structural parameters in total are collected in

$$\underset{(1 \times 8)}{\theta_s'} = (\rho_g, \beta, \kappa, \tau, \rho_z, \psi_y, \psi_\pi, \rho_r) \tag{23}$$
[Table 1. Determinacy and indeterminacy in New Keynesian model. Columns: Solution, $\Phi$, 1e5 $\Omega$, $\lambda$, Stable? Upper panel, “Determinacy: Active monetary policy $\theta_0$”: Solutions 1 and 2 are not stable, Solution 3 is stable. Lower panel, “Indeterminacy: Passive monetary policy $\theta_1$” (same as $\theta_0$, but with $\psi_\pi = 0.1$): Solution 1 is not stable, Solutions 2 and 3 are stable.]

The vector of shocks $\varepsilon_t' = (\varepsilon_{zt}, \varepsilon_{gt}, \varepsilon_{rt})$ is distributed $iid\,N(0, \Sigma)$ for

$$\underset{(1 \times 6)}{\mathrm{vech}(\Sigma)'} = \theta_\sigma' = (\sigma_z^2, \sigma_{gz}, \sigma_g^2, \sigma_{rz}, \sigma_{rg}, \sigma_r^2) \tag{24}$$
The standard deviations are normalized so that $0 < \sigma_i < 0.1$ for $i = z$, $g$, and $r$ (they are nonnegative, a benign restriction discussed formally by Hamilton et al., 2007), and the correlations $-1 < \rho_{ij} = \sigma_{ij}/(\sigma_i \sigma_j) < 1$. Note that in assuming these correlations may be nonzero we are generalizing An and Schorfheide's (2007) original assumptions. Finally, we combine (23) and (24) to obtain the $(14 \times 1)$ vector of aggregate structural parameters, $\theta = (\theta_s', \theta_\sigma')'$. A common presumption for models of this scale is that numerical algorithms are necessary to obtain a solution. In fact, this model may be solved analytically using the method of undetermined coefficients (see e.g. Galí, 2008). Taking as given the ABCD representation of this model presented by Komunjer and Ng (2011, Table 1), let us guess that the solution has the following form for $Y_t' = (r_t, y_t, \pi_t)$ and $X_t' = (z_t, g_t, r_t)$:

$$E_t Y_{t+1} = \underbrace{\begin{bmatrix} c_{rz} & c_{rg} & c_{rr} \\ c_{yz} & c_{yg} & c_{yr} \\ c_{\pi z} & c_{\pi g} & c_{\pi r} \end{bmatrix}}_{C} X_t \tag{25}$$
Each element $c_{ij}$ depends on $\theta_s$ in some a priori unknown way. Plugging $E_t y_{t+1}$, $E_t \pi_{t+1}$, and $E_t z_{t+1} = \rho_z z_t$ into aggregate demand (20),

$$y_t = \underbrace{\left(c_{yz} + \frac{1}{\tau} c_{\pi z} + \frac{\rho_z}{\tau}\right)}_{f_{yz}} z_t + \underbrace{\left(c_{yg} + \frac{1}{\tau} c_{\pi g} + 1 - \rho_g\right)}_{f_{yg}} g_t + \underbrace{\left(c_{yr} + \frac{1}{\tau} c_{\pi r} - \frac{1}{\tau}\right)}_{f_{yr}} r_t \tag{26}$$

Plugging in the implied expression for $E_t \pi_{t+1}$, along with (26), into the Phillips curve (21),

$$\pi_t = \underbrace{\left(\kappa c_{yz} + \left(\beta + \frac{\kappa}{\tau}\right) c_{\pi z} + \frac{\kappa \rho_z}{\tau}\right)}_{f_{\pi z}} z_t + \underbrace{\left(\kappa c_{yg} + \left(\beta + \frac{\kappa}{\tau}\right) c_{\pi g} - \rho_g \kappa\right)}_{f_{\pi g}} g_t + \underbrace{\left(\kappa c_{yr} + \left(\beta + \frac{\kappa}{\tau}\right) c_{\pi r} - \frac{\kappa}{\tau}\right)}_{f_{\pi r}} r_t \tag{27}$$
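The coefficients in (26) and (27) are simple enough to code directly, which also gives a cheap way to check the algebra. The sketch below is only illustrative: the function name, the placeholder guess for $C$, and the parameter values are assumptions, and the guess is not a solution of the model.

```python
def f_rows(C, tau, beta, kappa, rho_z, rho_g):
    """Rows f_y and f_pi of F implied by (26) and (27).

    C is a 3x3 nested list with rows (r, y, pi) and columns (z, g, r), as in (25).
    """
    _, c_y, c_pi = C
    f_y = [
        c_y[0] + c_pi[0] / tau + rho_z / tau,
        c_y[1] + c_pi[1] / tau + 1.0 - rho_g,
        c_y[2] + c_pi[2] / tau - 1.0 / tau,
    ]
    b = beta + kappa / tau
    f_pi = [
        kappa * c_y[0] + b * c_pi[0] + kappa * rho_z / tau,
        kappa * c_y[1] + b * c_pi[1] - rho_g * kappa,
        kappa * c_y[2] + b * c_pi[2] - kappa / tau,
    ]
    return f_y, f_pi

# Placeholder numbers purely for illustration; they do not solve the model.
C_guess = [[0.1, 0.0, 0.6], [0.3, 0.9, -0.4], [0.2, 0.0, -0.1]]
params = dict(tau=2.0, beta=0.9975, kappa=0.33, rho_z=0.9, rho_g=0.95)
f_y, f_pi = f_rows(C_guess, **params)

# Cross-check against the Phillips curve (21):
# pi_t = beta * E_t pi_{t+1} + kappa * (y_t - g_t), coefficient by coefficient.
g_selector = [0.0, 1.0, 0.0]
phillips = [params["beta"] * C_guess[2][j]
            + params["kappa"] * (f_y[j] - g_selector[j]) for j in range(3)]
```

For any guess of $C$, the entries of `f_pi` should agree with `phillips`, since (27) is just (21) with (26) substituted in.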
[Table 2. Observational equivalence in New Keynesian model. Upper panel, “Observationally equivalent point $\bar{\theta}_0$ (indeterminacy)”, with columns: Solution, $\Phi$, 1e5 $\Omega$, $\lambda$, Stable? Solution 1 is not stable; Solutions 2 and 3 are stable. Notes: Lower panel lists comparative monetary policy impulse-responses from a 1% increase in nominal interest rates at $\theta_0$ vs its observationally equivalent point $\bar{\theta}_0$.]
Therefore, collecting (26) and (27), and using the implicitly defined terms $f_{ij}(\theta_s)$,

$$\underbrace{\begin{bmatrix} r_t \\ y_t \\ \pi_t \end{bmatrix}}_{Y_t} = \underbrace{\begin{bmatrix} 0 & 0 & 1 \\ f_{yz} & f_{yg} & f_{yr} \\ f_{\pi z} & f_{\pi g} & f_{\pi r} \end{bmatrix}}_{F} \underbrace{\begin{bmatrix} z_t \\ g_t \\ r_t \end{bmatrix}}_{X_t} \tag{28}$$
Finally, using also the Taylor rule we have the system of implicit functions 2
1
6 F ðC; θs Þ ¼ 0: F 4 0 0
ð1 ρr Þψ y 1 κ
ð1 ρr Þψ π
3
2
0
6 7 1=τ 5C 4 ρz =τ κρz =τ β þκ=τ
ð1 ρg Þψ y ρg 1 ρg κ=ρg
1 þ ρr
3
1=τ 7 5¼0 κ=τ
ð29Þ
$\mathcal{F}$ makes up the complete system of 9 equations and 9 unknowns necessary to solve for the elements of $C$. Although it is infeasible to do this by hand, it is straightforward to make use of symbolic computation software for this purpose. Since the mapping is in fact a cubic, there are consequently three solutions. Aside from $c_{rg} = 0$ and $c_{\pi g} = 0$, expressions for $\{c_{ij}\}$ in terms of $\theta_s$ are too complicated to provide human intuition. But they are known to the computer in closed form. To verify these derivations are correct, $\theta_0$ is chosen to be the value also utilized by Komunjer and Ng (2011) in Table 1, where additionally we let $\rho_{gz} = 0.5$, $\rho_{rz} = 0.5$, and $\rho_{rg} = 0.5$. The values of the ABCD parameters from one of these analytical solutions are the same as those which emerge from Sims's (2002) numerical GENSYS routine. What does it mean that there are three analytical solutions, and only one matches the numerical solution? Does it imply that there is indeterminacy, or multiple equilibria, at all points? To begin to understand this situation, additionally note that the analytical solution is not the only useful simplification of the model. Given (28), Corollary 3 from Morris
Fig. 5. “Multiple modes.” New Keynesian model with unconstrained parameter space Monte Carlo: T = 225, N = 1000.
(2016b) also ensures that this ABCD model has VAR(1) representation of the form,

$$\begin{bmatrix} r_t \\ y_t \\ \pi_t \end{bmatrix} = \underbrace{\begin{bmatrix} \phi_{rr} & 0 & \phi_{r\pi} \\ \phi_{yr} & \rho_g & \phi_{y\pi} \\ \phi_{\pi r} & 0 & \phi_{\pi\pi} \end{bmatrix}}_{\Phi} \begin{bmatrix} r_{t-1} \\ y_{t-1} \\ \pi_{t-1} \end{bmatrix} + D\varepsilon_t \tag{30}$$

for $\Phi = CAC^{-1}$ and $D$ each $(3 \times 3)$ matrices coming directly from ABCD representation and $u_t = D\varepsilon_t \sim iid\,N(0, \Omega)$ for $\Omega = D\Sigma D'$. The zeros in $\Phi$ and repeated elements in $\Omega$ mean there are 7 + 6 = 13 total reduced form parameters in $\pi$. Given this setup, we may show that the fact that there are three solutions for each $\theta$ does not necessarily imply indeterminacy for all points, since all three solutions are not necessarily stable. Stability is ensured only if the eigenvalues $\{\lambda\}$ of $\Phi$ are less than 1 in modulus. They are defined as the solutions to the characteristic equation,

$$(\lambda - \rho_g)\left((\lambda - \phi_{rr})(\lambda - \phi_{\pi\pi}) - \phi_{r\pi}\phi_{\pi r}\right) = 0 \tag{31}$$
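Because (31) factors into a linear and a quadratic term, the stability check requires nothing more than the quadratic formula. The following is a minimal sketch; the function names and the numerical entries of $\Phi$ are hypothetical and are not taken from Table 1.

```python
import cmath

def var_eigenvalues(phi_rr, phi_rpi, phi_pir, phi_pipi, rho_g):
    """Eigenvalues of Phi in (30): rho_g plus the two roots of the quadratic
    factor of the characteristic equation (31)."""
    trace = phi_rr + phi_pipi
    det = phi_rr * phi_pipi - phi_rpi * phi_pir
    disc = cmath.sqrt(trace * trace - 4.0 * det)   # complex-safe square root
    return [rho_g, (trace + disc) / 2.0, (trace - disc) / 2.0]

def is_stable(eigenvalues):
    """Stable when every eigenvalue lies strictly inside the unit circle."""
    return all(abs(lam) < 1.0 for lam in eigenvalues)

# Hypothetical coefficient values, chosen only to show both outcomes.
eigs_a = var_eigenvalues(0.8, 0.1, 0.2, 0.5, 0.95)
eigs_b = var_eigenvalues(1.3, 0.1, 0.2, 0.5, 0.95)
stable_a = is_stable(eigs_a)
stable_b = is_stable(eigs_b)
```

Using `cmath.sqrt` rather than `math.sqrt` keeps the check valid when the quadratic has complex conjugate roots, in which case the modulus is what matters for stability.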
The solutions are $\rho_g$ and the roots of $\lambda^2 - (\phi_{rr} + \phi_{\pi\pi})\lambda + (\phi_{rr}\phi_{\pi\pi} - \phi_{r\pi}\phi_{\pi r}) = 0$. In the top panel of Table 1, all three VAR solutions are listed at $\theta_0$. Note that the only solution which matches the numerical solution is the third. The other two fail to be stable. What if we consider alternative data generating points such as those representing “passive monetary policy,” i.e. a weak reaction to inflation? Such points should result in multiple stable solutions (Lubik and Schorfheide, 2004). Indeed, in the bottom panel of Table 1, we observe that at the alternative parameterization $\theta_1$ with a weak policy response to inflation, the second solution is stable as well. Is there any way of understanding why there are precisely three analytical solutions from the perspective of numerical solutions and sunspot equilibria? Sims's (2002) numerical algorithm for solving linear rational expectations models begins
Fig. 6. New Keynesian model with constrained parameter space Monte Carlo: T = 225, N = 1000.
with the presumed representation $\Gamma_0 w_t = \Gamma_1 w_{t-1} + \Psi\varepsilon_t + \Pi\eta_t$ for $w_t$ a vector of endogenous variables, $\Gamma_0$ and $\Gamma_1$ square conformable matrices, $\varepsilon_t$ a vector of fundamental errors, and $\eta_t$ an $(e \times 1)$ vector of forecast errors with elements $\eta_{it} = w_{it} - E_{t-1} w_{it}$. The first step in this solution algorithm is the QZ decomposition $\Gamma_0 = QSZ'$ and $\Gamma_1 = QTZ'$. The diagonal elements of $S$ and $T$, labeled $\{s_{ii}\}$ and $\{t_{ii}\}$, are known as the generalized eigenvalues of $\{\Gamma_0, \Gamma_1\}$, and the QZ decomposition is not unique; Sims's algorithm chooses the solution which places the ratio $|t_{ii}|/|s_{ii}|$ in increasing order for consecutive rows $i$. The number of these rows for which $|t_{ii}|/|s_{ii}| \ge 1$ is equivalently the number of unstable roots in the system. Farmer et al. (2015) call the difference $di = e - (\text{number of unstable roots})$ the “degrees of indeterminacy” in the system. When $di = 0$, Sims's algorithm returns a stable solution, but otherwise will complain that the solution is indeterminate, i.e. there are possibly many solutions. Farmer et al.'s suggestion for solving indeterminate models such as these is to choose $di$ elements of $\eta$ to be included as “new fundamental” errors in a newly expanded error vector $\tilde{\varepsilon}_t$, and simply reapply Sims's algorithm. As the authors prove (Lemma 1), the new fundamental errors are linear combinations of an underlying set of $di$ sunspot shocks and the endogenous variables $w_t$. Moreover, their procedure is equivalent to that previously presented by Lubik and Schorfheide (2004, Theorem 2). Farmer et al.'s framework for solving indeterminate models is applied to An and Schorfheide's (2007) model in Appendix E. Given this setup, $e = 2$, and the number of unstable roots depends on the parameterization of the model. There are 2 unstable roots at $\theta_0$, resulting in $di = 0$ and therefore a single stable solution using Sims's algorithm (Table 1, Solution 3). However, at $\theta_1$, there is only 1 unstable root and therefore $di = 1$.
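The counting behind the degrees of indeterminacy can be sketched in a few lines. The snippet assumes the ratios $|t_{ii}|/|s_{ii}|$ have already been computed by a QZ routine; the function names and the listed ratios are hypothetical and serve only to reproduce the two cases $di = 0$ and $di = 1$.

```python
import itertools

def degrees_of_indeterminacy(ratios, e):
    """di = e - (number of unstable roots), where ratios holds |t_ii| / |s_ii|."""
    unstable = sum(1 for r in ratios if r >= 1.0)
    return e - unstable

def sunspot_choices(e, di):
    """Index sets of forecast errors that could be moved into the expanded
    vector of fundamental errors when there are di degrees of indeterminacy."""
    return list(itertools.combinations(range(e), di))

e = 2
ratios_determinate = [0.51, 0.90, 0.95, 1.20, 1.50]     # hypothetical: 2 unstable roots
ratios_indeterminate = [0.62, 0.81, 0.90, 0.95, 1.50]   # hypothetical: 1 unstable root

di_determinate = degrees_of_indeterminacy(ratios_determinate, e)
di_indeterminate = degrees_of_indeterminacy(ratios_indeterminate, e)
choices_determinate = sunspot_choices(e, di_determinate)
choices_indeterminate = sunspot_choices(e, di_indeterminate)
```

With $di = 0$ there is a single (empty) choice, and with $di = 1$ and $e = 2$ there are two, one for each forecast error that could be reclassified.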
Thus, there are 2 ways to choose $di = 1$ element from the $e = 2$ dimensional vector $\eta_t$, and therefore 2 stable solutions (Table 1, Solutions 2 and 3), each containing $di = 1$ sunspot. In summary, in Table 1, analytical Solutions 2 and 3 are both potential solutions when sunspots may be present. On the other hand, Solution 1 is a solution which arises from the mathematics, but is never stable, and does not correspond to a sunspot. Since there are $k = 14$ structural parameters but only $h = 13$ reduced form parameters, at least one element of $\theta$ must be fixed to its value in $\theta_0$ for the remaining parameters to have a fighting chance of identifiability. As Komunjer and Ng (2011)
S.D. Morris / Journal of Economic Dynamics & Control 74 (2017) 56–86
71
show, fixing $\psi_\pi$ to its value in $\theta_0$ yields local identifiability of the remaining parameters. But in fact, this does not yield global identifiability in all of $\mathbb{R}^{13}$. Using the fact that the number of reduced form and free structural parameters are the same, we may solve for $\theta$ in terms of $\pi$. There are two solutions, i.e. observational equivalence in the reals. This equivalence is limited to five parameters in particular,

Observational equivalence in $\tau$, $\rho_z$, $100\sigma_z$, $\rho_{gz}$ and $\rho_{rz}$.

While this may appear to be a problem, the observationally equivalent pair to $\theta_0$, labeled $\bar{\theta}_0$, is not contained in $\Theta$. In fact, the CRRA takes on a value of $-45.4$ at $\bar{\theta}_0$, which is completely contrary to sensible economic theory. Table 2 depicts the solutions at $\bar{\theta}_0$, as well as monetary policy impulse-responses from a 1% increase in nominal interest rates. The first item to note is that once again, both Solutions 2 and 3 are stable. In other words, the point $\bar{\theta}_0$ is indeterminate, while its observationally equivalent point $\theta_0$ has just one stable solution. Therefore, in order to detect failure of global identification, it is important to consider all possibly indeterminate regions of the parameter space, making Qu and Tkachenko's (2016) approach especially useful. Second, the economically non-sensible point $\bar{\theta}_0$ also yields economically contradictory monetary policy impulse responses. Thus, on the basis of either the parameter bounds or the sign restrictions which define $\Theta$, $\theta$ is nonetheless globally identifiable in $\Theta$. Since $\theta$ is just identified, the MCSE is once again equivalently the MLE. To begin to understand the effects of this observational equivalence on the distribution of either estimator, Fig. 5 depicts the results of a Monte Carlo experiment without the bounds $\Theta$ imposed. This represents the true distribution of the estimator when an analyst has not imposed bounds on the problem, and/or does not realize there is a potential observational equivalence problem. What we observe is multimodality.
For example, the CRRA $\tau$ has two modes, one in the economically sensible region near the true data-generating value of 2, and one below zero due to the observationally equivalent point $\bar{\theta}_0$ with its infeasible CRRA $= -45.4$. Multimodality is only evident in those five parameters which are subject to the observational equivalence problem. What happens when the parameter space is constrained by $\Theta$? This would reflect the distribution of the estimator when the rational analyst has discarded economically nonsensical points. The results of this exercise are depicted in Fig. 6. Importantly, despite the fact that $\bar{\theta}_0$ does not lie in $\Theta$, we still observe multimodality in several parameters, including $\rho_z$ and $\rho_{gz}$. In other words, while $\theta$ is globally identifiable in $\Theta$ at $\theta_0$, the closeness of $\bar{\theta}_0$ to $\Theta$ causes the pileup symptom of weak identification to occur (Fig. 2). Moreover, despite the fact that multimodality in some parameters such as $\tau$ has been resolved, we now observe boundary pileups. Therefore, even if one type of pileup is ameliorated by a given restriction, there is no guarantee that another will not be created. We now expand on this observation in the context of Bayesian estimation.
5. Posterior probability

Prior probabilities are widely used to address identification problems, such as those illustrated in the above examples. For example, DeJong and Whiteman (1993) demonstrated that in the case of the moving average model, uniform priors for $\theta$ on $(0, 1)$ ensured no evidence of the pileup phenomenon in the posterior. Does this result also follow in the current situation, particularly with respect to the New Keynesian model? Before proceeding to the general case, we must consider this important question.

5.1. Uninformative priors

Sims (2003), when reviewing the work of Smets and Wouters (2003), suggested a hypothetical approach to easily computing DSGE posterior probabilities. This proposal involved computing the posterior for a reduced form VAR model, and inferring the posterior for the structural parameters of an underlying, just identified DSGE model, indirectly. As in the current paper, Kleibergen and Mavroeidis (2014, p. 1190) also study a just identified New Keynesian model. As they show using Sims's (2003) proposed procedure, a pileup in the posterior probability for the structural parameters occurs at a point of non-identification when flat priors for the VAR are utilized. However, they show that this pileup is resolved with Jeffreys (1961), or other more informative priors. Here, we wish to repeat this experiment with respect to Jeffreys priors using the New Keynesian model considered in Section 4.3 and its unique global identifiability concerns. Following Tiao and Zellner (1964), such priors may be written as

$$p(\phi, \Omega) = p(\phi)\,p(\Omega), \qquad p(\phi) = \text{constant}, \qquad p(\Omega) \propto |\Omega|^{-2}. \tag{32}$$

The posterior probability for $\phi$ given $\Omega$ and the $3T \times 1$ data $Y = (Y_1', \ldots, Y_T')'$ is

$$p(\phi \mid \Omega, Y) \sim N\left(\hat{\phi}(\Omega), \hat{\sigma}(\Omega)\right) \tag{33}$$

for $N$ multivariate Gaussian and $\hat{\phi}(\Omega)$ the generalized least squares (GLS) estimator for $\phi$ depending on $\Omega$ and $\hat{\sigma}(\Omega)$ a consistent estimator for its variance–covariance matrix. The marginal posterior probability for $\Omega$ meanwhile has the form

$$p(\Omega \mid Y) \sim IW(\Omega, T - 7) \tag{34}$$

where $IW$ is the inverse-Wishart distribution with $T - 7$ the degrees of freedom, given 7 the dimension of $\phi$. The implied posterior for $\theta$ is depicted in Fig. 7. As we see, the multimodal pileup in $\rho_z$ is very much apparent, even with such priors.
Fig. 7. New Keynesian model posterior: Jeffreys priors: T = 225.
Thus, while relatively uninformative priors such as Jeffreys may be useful for reducing pileups in some circumstances, this cannot generally be assumed.

5.2. Informative priors

What if relatively more informative priors are utilized? Assuming a conditionally normal prior for ϕ and inverse Wishart for Ω implies a conjugate posterior. The priors considered here are

\[
\Omega \sim \mathrm{IW}(\Omega(\theta_0), T - 7) \tag{35}
\]

\[
\phi \sim N\big(\phi(\theta_0),\; R^{+}(\Omega \otimes cI_3)R^{+\prime}\big) \tag{36}
\]

for $R^{+} = (R'R)^{-1}R'$ the Moore–Penrose pseudo-inverse of the selection matrix R in $\mathrm{vec}(\Phi) = R\phi$, and c a tuning parameter which we adjust to test varying degrees of information in the prior. Woźniak (2016) gives the functional form of the posterior probability, which is calculated for two values, c = 1000 (relatively little information about ϕ) and c = 100 (a relative preponderance of information about ϕ). The results are depicted in Fig. 8. While the multimodality in $\rho_z$ previously viewed in the context of Jeffreys priors is eliminated, the double-edged sword of utilizing informative priors is that they may place weight on the wrong area of the parameter space. For example, the posterior probability for $\rho_{rz}$ now accumulates on the wrong side of zero. In summary, while utilizing a prior distribution may alleviate the pileup in the context of non-identification of parameters, such as those scenarios considered by Kleibergen and Mavroeidis (2014), it is by no means a foolproof solution in other cases, such as when global identifiability is a concern.
Fig. 8. New Keynesian model posterior: Informative priors: T = 225. Notes: Blue: c = 1000. Orange: c = 100. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
6. Medium scale DSGE model

The simple example models considered in Section 4 have several convenient features in common: they have analytical solutions, and either a reduced form VAR or a concise and identifiable VARMA representation. In practice, however, an analyst will generally be confronted with larger scale models which do not have analytical solutions, and may not appear to have a concise VARMA representation. How should one proceed under these more realistic circumstances? What one needs in order to apply the methodology suggested in this paper is a state space representation which satisfies Assumptions 1–4. The steps to be followed are then:

(1) Obtain the VARMA representation using Proposition 1.
(2) Check that the VARMA parameters π are identifiable at theoretically reasonable structural parameter values $\theta_0$ using Corollary 1.
(3) If Step 2 is satisfied, check that the structural parameters θ are globally identifiable at $\theta_0$ in the theoretically admissible parameter space Θ by searching for other values $\theta' \neq \theta_0$ in Θ which yield equivalent values for π.
(4) If Step 3 is satisfied, estimate $\hat{\theta}_{MCSE}$ using the approach in Section 3.4.
(5) Reapply Steps 2 and 3 to $\hat{\theta}_{MCSE}$. Then, compute small sample distributions and test statistics.

Why would one use the above approach, rather than maximum likelihood, for example? There are three key advantages. First, π are reduced form parameters, which allow one to search for observationally equivalent points and thereby assess global identifiability. This is more difficult to do from a likelihood perspective, and no analytical solutions are required. Second, numerically finding the MCSE may be less computationally burdensome than finding the likelihood maximizer, expediting the computation of small sample statistics. Third, tests of specification are immediately available, which is useful since pileups may otherwise be wrongly construed as evidence of misspecification.
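The search in Step 3 can be illustrated with a deliberately small stand-in. The following is a minimal sketch, not the paper's procedure: it assumes a scalar MA(1) model $y_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with $\mathrm{Var}(\varepsilon_t) = \sigma^2$, takes the reduced form to be the first two autocovariances, and looks for points other than $(\theta_0, \sigma_0^2)$ in a hypothetical admissible interval that reproduce the same reduced form. The function names, the interval, the grid size and the tolerances are all illustrative assumptions.

```python
def reduced_form(theta, sigma2):
    """Autocovariances (gamma_0, gamma_1) of the toy MA(1) model."""
    return (sigma2 * (1.0 + theta ** 2), sigma2 * theta)


def find_equivalent_points(theta0, sigma2_0, lo, hi, n_grid=600, tol=1e-10):
    """Search theta in [lo, hi] (lo > 0 assumed) for points with the same
    reduced form as (theta0, sigma2_0), excluding theta0 itself."""
    gamma0, gamma1 = reduced_form(theta0, sigma2_0)

    def gap(theta):
        # Pick sigma2 to match gamma_1 exactly, then measure the miss in gamma_0.
        return (gamma1 / theta) * (1.0 + theta ** 2) - gamma0

    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    found = []
    for a, b in zip(grid[:-1], grid[1:]):
        fa, fb = gap(a), gap(b)
        if fa * fb > 0.0:
            continue  # no sign change, so no root is bracketed in [a, b]
        while b - a > tol:  # bisection on the bracketed root
            mid = 0.5 * (a + b)
            fm = gap(mid)
            if fa * fm <= 0.0:
                b = mid
            else:
                a, fa = mid, fm
        root = 0.5 * (a + b)
        if abs(root - theta0) > 1e-6:  # keep only points distinct from theta0
            found.append((root, gamma1 / root))
    return found
```

With $\theta_0 = 0.5$ and $\sigma_0^2 = 1$, calling `find_equivalent_points(0.5, 1.0, 0.05, 3.0)` returns a single point close to $(2, 0.25)$, the familiar non-invertible twin. Restricting the interval to `(0.05, 1.0)` returns an empty list, which mirrors how the choice of Θ determines whether global identification holds.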
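Table 3. Elements of Smets and Wouters (2007) model.

Structural parameters (41)

| No. | Symbol | Description |
|---|---|---|
| 1 | $\tau$ | CRRA |
| 2 | $\ell$ | Utility/Disutility of labor |
| 3 | $\beta$ | Discount factor |
| 4 | $h$ | Habit formation in cons. |
| 5 | $s$ | Investment adj. cost |
| 6 | $\alpha$ | Capital intensity in prod. |
| 7 | $\gamma$ | Avg. gr. rate of product. |
| 8 | $\Phi$ | Fixed cost of production |
| 9 | $\delta$ | Depreciation of capital |
| 10 | $u$ | Capital utilization intensity |
| 11 | $\iota_p$ | Price indexation |
| 12 | $\xi_p$ | Calvo price prob. |
| 13 | $\zeta_p$ | Kimball price agg. curv. |
| 14 | $\iota_w$ | Wage indexation |
| 15 | $\xi_w$ | Calvo wage prob. |
| 16 | $\zeta_w$ | Kimball wage agg. curv. |
| 17 | $\lambda_w$ | Wage markup |
| 18 | $\hat{Y}$ | St. state detrended output |
| 19 | $\Pi$ | St. state inflation |
| 20 | $G$ | St. state gov. spending |
| 21 | $\psi_\pi$ | Taylor rule inflation coeff. |
| 22 | $\psi_y$ | Taylor rule output coeff. |
| 23 | $\psi_\Delta$ | Tay. rule output chg. coeff. |
| 24 | $\rho_r$ | $r_t$ persistence |
| 25 | $\rho_b$ | $b_t$ persistence |
| 26 | $\rho_e$ | $e_t$ persistence |
| 27 | $\rho_f$ | $f_t$ persistence |
| 28 | $\rho_z$ | $z_t$ persistence |
| 29 | $\rho_g$ | $g_t$ persistence |
| 30 | $\rho_p$ | $m^p_t$ persistence |
| 31 | $\rho_w$ | $m^w_t$ persistence |
| 32 | $\vartheta_{gz}$ | $\varepsilon^z_t$ coeff. for $g_t$ AR(1) |
| 33 | $\vartheta_p$ | MA(1) coeff. for $m^p_t$ process |
| 34 | $\vartheta_w$ | MA(1) coeff. for $m^w_t$ process |
| 35 | $\sigma_b$ | $\varepsilon^b_t$ std dev |
| 36 | $\sigma_e$ | $\varepsilon^e_t$ std dev |
| 37 | $\sigma_f$ | $\varepsilon^f_t$ std dev |
| 38 | $\sigma_z$ | $\varepsilon^z_t$ std dev |
| 39 | $\sigma_g$ | $\varepsilon^g_t$ std dev |
| 40 | $\sigma_p$ | $\varepsilon^p_t$ std dev |
| 41 | $\sigma_w$ | $\varepsilon^w_t$ std dev |

Endogenous (33)

| Symbol | Description |
|---|---|
| $z_t$ | Total factor prod. |
| $g_t$ | Government spending |
| $e_t$ | Price of inv. vs. cons. |
| $b_t$ | Bond premium |
| $f_t$ | Fed shock to int. rate |
| $m^p_t$ | Shock to price markup |
| $m^w_t$ | Shock to wage markup |
| $\lambda^p_t$ | Func. of $m^p_t$ and $\varepsilon^p_t$ |
| $\lambda^w_t$ | Func. of $m^w_t$ and $\varepsilon^w_t$ |
| $k_t$ | Installed capital |
| $c_t$ | Real consumption |
| $i_t$ | Real investment |
| $\pi_t$ | Inflation |
| $w_t$ | Real wage |
| $\mu^p_t$ | Price markup |
| $\mu^w_t$ | Wage markup |
| $q_t$ | Tobin's Q |
| $r_t$ | Nominal interest rate |
| $r^k_t$ | Real rental rate on $k_t$ |
| $s_t$ | Utilized capital |
| $u_t$ | Capacity utilization |
| $l_t$ | Labor hours |
| $y_t$ | Real output |
| $k^*_t$ | Natural level inst. cap. |
| $c^*_t$ | Natural real consump. |
| $i^*_t$ | Natural real invest. |
| $w^*_t$ | Natural real wage |
| $q^*_t$ | Natural Tobin's Q |
| $r^*_t$ | Natural nom. int. rate |
| $r^{k*}_t$ | Natural real r.r. on $k_t$ |
| $s^*_t$ | Natural utilized cap. |
| $l^*_t$ | Natural labor hrs. |
| $y^*_t$ | Natural real output |

Shocks (7)

| Symbol | Shock to |
|---|---|
| $\varepsilon^z_t$ | $z_t$ |
| $\varepsilon^g_t$ | $g_t$ |
| $\varepsilon^e_t$ | $e_t$ |
| $\varepsilon^b_t$ | $b_t$ |
| $\varepsilon^f_t$ | $f_t$ |
| $\varepsilon^p_t$ | $m^p_t$ |
| $\varepsilon^w_t$ | $m^w_t$ |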
Moreover, a concise, reduced form VARMA representation is generally available, even for larger scale models. To motivate this claim, and to show that the approach proposed above may be useful in practice, we now apply this methodology to a DSGE model of the scale and type commonly used in policy deliberations. Appendix F presents the fundamental equations of the Smets and Wouters (2007) model, which has widely influenced model formulation in both academic and policymaking circles. This model is set up with n = 7 observables in $y_t = [y_{1t}', y_{2t}']'$ for

\[
y_{1t} = [c_t, i_t, \pi_t, w_t]', \qquad y_{2t} = [r_t, y_t, l_t]'
\]

Variable and parameter names are listed in Table 3 in Appendix F. Morris (2016b) shows that this model has a VARMA(3,2) representation with three columns of restrictions on the third lagged AR parameter, written as

\[
\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix}
= \begin{bmatrix} \Phi_{1A} & \Phi_{1B} \\ \Phi_{1C} & \Phi_{1D} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} \Phi_{2A} & \Phi_{2B} \\ \Phi_{2C} & \Phi_{2D} \end{bmatrix}
\begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix}
+ \begin{bmatrix} \Phi_{3A} \\ \Phi_{3C} \end{bmatrix} y_{1,t-3}
+ U_t + M_1 U_{t-1} + M_2 U_{t-2} \tag{37}
\]

for $U_t \sim \text{iid } N(0, \Omega)$. Assumptions 1–8 are satisfied at the parameterization $\theta_0$ considered in Morris (2016b). Therefore, the $(252 \times 1)$ parameter π is identifiable. There are 41 total structural parameters, but at most 36 are identifiable (Komunjer and Ng, 2011). One set of five parameters to fix to constants which yields identifiability of the complement 36 is $(\delta, \zeta_p, \zeta_w, \hat{Y}, \hat{\Pi})$, which are maintained for the remainder of the analysis. Bounds $\Theta \subset \mathbb{R}^{36}$ are defined to be at least as large as the 95% posterior in Smets and Wouters (2007, Tables 1A and 1B). For simplicity only considering the determinate regions of the parameter space, several numerical searches are first conducted to determine whether there are any observationally equivalent points to $\theta_0$ in Θ. The reduced form parameters π are useful for this purpose since if there are observationally equivalent points $\theta'$, they must satisfy $\pi(\theta') = \pi(\theta_0)$.
In this experiment, only the region of determinacy is considered, but indeterminate regions could also be examined using Farmer et al.'s (2015) solution method, as in Appendix E. No observationally equivalent points are found. In order to investigate the distribution of the MCSE for this model, a Monte Carlo experiment with T = 225 and N = 1000 is once again conducted. The results are depicted in Fig. 9. As in the preceding examples, three types of pileups are observed: Boundary estimates ($\rho_z$: Row 4, column 2), skew (s: Row 1, column 5), and to a lesser extent, multimodality (γ: Row 2, column 1).

Fig. 9. Smets and Wouters (2007) model Monte Carlo: T = 225, N = 1000.

How would an analyst discern these are indeed benign pileups in practice and not evidence of misspecification? First, the exclusion restrictions on $y_{2,t-3}$ in (37) are testable by simple application of an F test. Second, the MCSE presents an immediate test of overidentifying restrictions. While the distribution of this test statistic may not be assumed to be chi-squared, it may be bootstrapped. Another issue to be considered in practice is the choice of observables (Canova et al., 2014). Could it be that the pileups we observe here are simply the result of an unfortunate choice of variables? To consider this thesis, we focus on the Phillips curve relation in this model, which is known to be weakly identifiable (Kleibergen and Mavroeidis, 2009). Specifically, we wish to include the markup $\mu_t$ in the Phillips curve as a new observable to potentially help alleviate the pileup problem.
Fig. 10. Smets and Wouters (2007) model Monte Carlo under alternative choice of observables: T = 225, N = 1000.
Following Morris (2016a), because a profit-maximizing firm will charge a fixed markup of average price over marginal cost, the markup is also the negative of real marginal costs, $\mu_t = -mc_t$. Real marginal cost, in turn, is linearly related to the output gap: $mc_t = c(y_t - y_t^*)$ for $c > 0$ the output elasticity of marginal cost (see Galí, 2008). Therefore, in empirical analysis of the Phillips curve, it is common to replace the markup term with $-\lambda\mu_t = \lambda mc_t = \alpha(y_t - y_t^*)$, $\alpha = c\lambda > 0$, and estimate α using empirical estimates of the output gap. However, as Galí and Gertler (1999) explain, a direct proxy for marginal cost is also available in $mc_t = \ln S_t - \ln E S_t$ for $S_t$ the labor income share $S_t = (W_t/P_t)(L_t/Y_t)$. $W_t/P_t$ is the real wage, $L_t$ labor hours, and $Y_t$ the nominal output. Thus, a suitable proxy for markup is readily available in the data. One may as such simply replace $l_t$ with $\mu_t$ in $y_{2t}$, yielding a new set of 7 observables. The new state space has the same VARMA(3,2) representation, and exclusion restrictions, as was the case with the original set of observables.
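The construction of the markup proxy from the labor income share can be sketched as follows. This is an illustrative sketch only: the function name and the input series are hypothetical, and the expectation $E S_t$ is replaced by the sample mean of the share, which is an assumption made here rather than something stated in the text.

```python
import math


def markup_proxy(real_wage, hours, output):
    """Return mu_t = -(ln S_t - ln mean(S)), with S_t = real_wage * hours / output.

    The sample mean of S_t stands in for E(S_t); that substitution is an
    assumption of this sketch.
    """
    shares = [w * l / y for w, l, y in zip(real_wage, hours, output)]
    log_mean_share = math.log(sum(shares) / len(shares))
    # Marginal cost proxy is the log deviation of the share; the markup is its negative.
    return [-(math.log(s) - log_mean_share) for s in shares]
```

A period in which the labor share is below its average therefore maps to a positive markup deviation, and a constant labor share maps to a series of zeros.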
The result of recomputing this experiment with the new set of observables is depicted in Fig. 10. A few of the parameters which contribute to the Phillips curve in the model are β, $\iota_p$ and γ. While β and $\iota_p$ are seemingly unaffected by this change of dataset, γ now has majority mass on the boundary. Pileups in other parameters not directly contributing to the Phillips curve likewise seem not to be affected. Thus, while we cannot rule out the hypothesis that an unfortunate choice of observables contributes to the pileup phenomenon in the case of this model, we also cannot conclude that there is an easy solution to the problem via this route.19
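Comparisons such as the one between Figs. 9 and 10 turn on how much mass sits on a boundary and how asymmetric a sampling distribution is. One way to put numbers on this, offered here only as a sketch, is to compute the share of Monte Carlo estimates at the bounds and the sample skewness. The helper names, the bounds passed in, and the tolerance are hypothetical; they are not taken from the paper's replication materials.

```python
def boundary_share(estimates, lower, upper, tol=1e-8):
    """Fraction of estimates lying within tol of either bound."""
    hits = sum(1 for e in estimates if e - lower <= tol or upper - e <= tol)
    return hits / len(estimates)


def sample_skewness(estimates):
    """Third standardized sample moment (population-style normalization)."""
    n = len(estimates)
    mean = sum(estimates) / n
    m2 = sum((e - mean) ** 2 for e in estimates) / n
    m3 = sum((e - mean) ** 3 for e in estimates) / n
    return m3 / m2 ** 1.5
```

For instance, a list of draws in which half the values equal the upper bound gives a boundary share of 0.5, which would correspond to "majority mass on the boundary" only once the share exceeds one half. Detecting multimodality would need a separate tool, such as a kernel density estimate, and is not attempted here.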
7. Conclusion This paper has sought to develop a conceptual framework for understanding how pileup phenomena arise in DSGE models, and to provide pragmatic methods for dealing with them in practice. In three examples, three distinct types of pileups were described: boundary estimates, skew, and multimodality. These issues are intimately related to the identifiability properties of the model, as pileups are a consequence of weak identification. They are also all apparent in a medium scale model. Moreover, while the Bayesian approach offers a promising way to deal with this phenomenon, it by no means results in well-behaved or otherwise single modal posterior probabilities in all cases, even with relatively informative priors. The empirical approach presented in this paper is useful in these situations, offering the analyst a pragmatic way to compute the true sampling distribution of estimators in the very common situation that such pileups should arise.
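Appendix A. VARMA representation

This section outlines the proof of Proposition 1 and Corollary 1 of Morris (2016b), which jointly say that if $F_i$ (4), an $(s \times n)$ matrix, exists and is full column rank n for some $i \geq 0$, the model (1) has a VARMA$(i+2, i+1)$ representation. The purpose of conducting this exercise here is to promote ease of use of the result and support that the functional correspondence between (1) and (3) is indeed closed form. Assuming momentarily $c = 0_{(s+n) \times 1}$, (1) implies that $\forall\, i \geq 0$,

\[
\begin{bmatrix} y_{t+i} \\ \vdots \\ y_t \end{bmatrix}
= \begin{bmatrix} CA^i \\ \vdots \\ C \end{bmatrix}
\begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix}
+ \begin{bmatrix} D & \dots & 1_{i \geq 1} CA^{i-1}[B' \; D']' \\ \vdots & \ddots & \vdots \\ 0_{n \times n} & \dots & D \end{bmatrix}
\begin{bmatrix} \varepsilon_{t+i} \\ \vdots \\ \varepsilon_t \end{bmatrix}
\]

where 1 is the indicator function. Henceforth, let i be understood to be the smallest $i \geq 0$ so that $F_i$ exists and is full column rank n.

Because $F_i = \big(\sum_{j=0}^{i} F_x(j)'F_x(j)\big)^{-1} F_x(i)'$ exists for $F_x(i)$ an $(n \times s)$ matrix defined by (5), then surely so does $F_{i,k} = \big(\sum_{j=0}^{i} F_x(j)'F_x(j)\big)^{-1} F_x(k)'$ $\forall\, k = 0, \dots, i$ $(F_{i,i} \equiv F_i)$. So, the states may be written entirely in terms of observables and errors,

\[
x_{t-1} = F_i y_{t+i} + F_{i,i-1} y_{t+i-1} + \dots + F_{i,0} y_t - F_{lag}\, y_{t-1} - G_i \varepsilon_{t+i} - \dots - G_0 \varepsilon_t
\]

where additionally $F_{lag} = \big(\sum_{j=0}^{i} F_x(j)'F_x(j)\big)^{-1} \sum_{j=0}^{i} F_x(j)'F_y(j)$ for

\[
\underset{(n \times n)}{F_y(j)} = \frac{\partial y_t}{\partial y_{t-1-j}'} =
\begin{cases} C_y & \text{if } j = 0 \\ CA^{j-1}[A_y' \; C_y']' & \text{if } j > 0 \end{cases}
\]

and $G_q = \big(\sum_{j=0}^{i} F_x(j)'F_x(j)\big)^{-1} \sum_{k=0}^{i-q} F_x(i-k)'\, G_\varepsilon(i-q-k)$ for $q = 0, \dots, i$ and

\[
\underset{(n \times n)}{G_\varepsilon(j)} = \frac{\partial y_t}{\partial \varepsilon_{t-j}'} =
\begin{cases} D & \text{if } j = 0 \\ CA^{j-1}[B' \; D']' & \text{if } j > 0 \end{cases}
\]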
Inserting the implied expressions for $x_t$ and $x_{t-1}$ into $x_t = A_x x_{t-1} + A_y y_{t-1} + B_x \varepsilon_t$ and simplifying yields the following VARMA$(p, p-1)$ representation for $p = i + 2$. Note, all coefficients begin with the Moore–Penrose pseudoinverse $F_i^{+} = (F_i'F_i)^{-1}F_i'$. This exists because, as was assumed, $F_i$ is full column rank. The assumption that $F_i$ is full rank is not unlike the assumption that $A_0$ is invertible in the structural VAR $A_0 y_t = \sum_{i=1}^{p} A_i y_{t-i} + \varepsilon_t$, which would in that case yield the reduced form VAR $y_t = \sum_{i=1}^{p} A_0^{-1}A_i y_{t-i} + A_0^{-1}\varepsilon_t$, with in-common terms in coefficients $A_0^{-1}$:

\[
\begin{aligned}
y_t ={}& \underbrace{F_i^{+}(A_x F_i - 1_{i \geq 1} F_{i,i-1})}_{\tilde{\Phi}_1}\, y_{t-1}
+ \underbrace{1_{i \geq 1} F_i^{+}(A_x F_{i,i-1} - 1_{i \geq 2} F_{i,i-2})}_{\tilde{\Phi}_2}\, y_{t-2}
+ \dots + \underbrace{F_i^{+} F_{lag}}_{\tilde{\Phi}_{i+1}}\, y_{t-i-1} \\
&+ \underbrace{F_i^{+}(A_y - A_x F_{lag})}_{\tilde{\Phi}_{i+2}}\, y_{t-i-2}
+ \underbrace{D\varepsilon_t}_{u_t}
+ \underbrace{F_i^{+}(-A_x G_i + 1_{i \geq 1} G_{i-1})D^{-1}}_{\tilde{M}_1}\, \underbrace{D\varepsilon_{t-1}}_{u_{t-1}} \\
&+ \underbrace{1_{i \geq 1} F_i^{+}(-A_x G_{i-1} + 1_{i \geq 2} G_{i-2})D^{-1}}_{\tilde{M}_2}\, \underbrace{D\varepsilon_{t-2}}_{u_{t-2}}
+ \dots + \underbrace{F_i^{+}(B_x - A_x G_0)D^{-1}}_{\tilde{M}_{i+1}}\, \underbrace{D\varepsilon_{t-i-1}}_{u_{t-i-1}}
\end{aligned} \tag{A.1}
\]

19 Alternatively, one could investigate the result of “turning off” shocks to price and wage markups, and preferences, along the lines of Canova et al. (2014). This would however result in a stochastically singular model, violating Assumption 1. One could utilize a GMM estimator or similar in this case.
~ j þM ~ 8 j ¼ 1; …; p 1 ¼ i þ 1 define the final parameters in (3). ~ j þΦ ~ j 8 j ¼ 1; …; p ¼ iþ 2 and Mj ¼ M Φj ¼ Φ j Because iZ 0, the smallest possible AR order of this VARMAðp; p 1Þ is p ¼ i þ 2 ¼ 0 þ 2 ¼ 2, and thus a VARMAð2; 1Þ model. But this result is in fact only possible in special cases. Recall, Assumption 5 is sZ n. Since F 0 ¼ ðC 0x C x Þ 1 C 0x , the matrix F0 may only exist if s¼n; otherwise C 0x C x is not full rank, because Cx is ðn sÞ. Thus, the minimum possible VARMAðp; p 1Þ order using Proposition 1 of Morris (2016b) for sZ n in general is in fact VARMA(3,2). VARMA(2,1) may only exist if s ¼n, in which case F 0 ¼ C x 1 ; this is Morris (2016b, Corollary 1). Note from the functional form of (A.1) that if in addition to s¼n and ~ 1 disappear. Thus, in this case, the model is further ~ 1 and M i¼ 0, we have Ay ¼ 0sn and C y ¼ 0nn , and so all terms but Φ reduced to a VARMA(1,1). This is Morris (2016b, Corollary 2). If in addition there exists an invertible ðn nÞ matrix H such ~ 1 ¼ 0nn and this further reduces to VAR(1). This is Corollary 3 of Morris (2016b). that yt ¼ Hxt, in turns out that M 0 Finally, if the original model were stated in terms of non-mean zero variable zt ¼ c þ Azt 1 þ B0 D0 εt for z0t ¼ x0t y0t , then the VARMAðp; p 1Þ expression (A.1) must be amended to include an ðn 1Þ constant μ on the right-hand side, as in (3). It is written as ! p X μ ¼ In Φi ½0ns I n ðI n þ s AÞ 1 c ðA:2Þ |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} i¼1
Eðyt Þ
Appendix B. Identifiability of VARMA parameters

Zadrozny (2016) assumes that the model has VARMA$(p, p-1)$ form $y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + B_0 v_t + B_1 v_{t-1} + \dots + B_{p-1} v_{t-p+1}$ for $B_0$ lower-triangular and nonsingular and $v_t \sim \text{iid}(0_{n \times 1}, I_n)$. Assuming momentarily $\mu = 0_{n \times 1}$, the setup of this paper implies VARMA$(p, p-1)$ form $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + u_t + M_1 u_{t-1} + \dots + M_{p-1} u_{t-p+1}$ with $u_t \sim \text{iid } N(0_{n \times 1}, \Omega)$ (for example, as a result of Proposition 1). Under Assumption 2, Ω has Cholesky decomposition $\Omega = LL'$ for L lower-triangular and nonsingular. Thus, the setup here is just the special case of Zadrozny in which $v_t$ is Gaussian, since $B_0 = L$, $B_1 = M_1 L$, etc. Given this setup, Zadrozny makes assumptions (Conditions I, II, and III) which are equivalent to Assumptions 2–4 in this paper. In addition to these assumptions, Zadrozny assumes that the VARMA model under consideration is controllable and observable (Conditions IV and V), and that the matrix M (7) is diagonalizable (Condition VI). Under Conditions I–VI, Zadrozny shows that the AR and MA parameters of the presumed VARMA representation are identifiable. Assumptions 6–8 are equivalent to the as of yet missing Conditions IV–VI (Hautus, 1970). So $\Phi_1, \dots, \Phi_p, L, M_1 L, \dots, M_{p-1}L$ are identifiable here by the same reasoning. We know $M_1, \dots, M_{p-1}$ to be individually identifiable because L is invertible. μ is identifiable from (A.2) because $E(y_t)$ is identifiable, and now all AR parameters are known to be as well. Finally, the entire set of what are therefore reduced form parameters π is identifiable because R (6) is full column rank h.
Appendix C. Consistent estimation of reduced form parameters

Consider the model (3) written in VARMA$(p, p-1)$ form. Since $u_t$ are iid, $y_{t-i}$ $\forall\, i \geq p$ are uncorrelated with all lags of the errors, and so are valid instruments for the AR parameters. In particular, a subset of the Yule Walker equations, written in terms of $\Gamma_i = E(y_t y_{t-i}') = \Gamma_{-i}'$ $\forall\, i \geq 1$, is

\[
\begin{bmatrix} \Gamma_p & \dots & \Gamma_{2p-1} \end{bmatrix}
= \begin{bmatrix} \Phi_1 & \dots & \Phi_p \end{bmatrix}
\begin{bmatrix} \Gamma_{p-1} & \Gamma_p & \dots & \Gamma_{2p-2} \\ \vdots & & \ddots & \vdots \\ \Gamma_0 & \Gamma_1 & \dots & \Gamma_{p-1} \end{bmatrix} \tag{C.1}
\]

A consistent estimator for the vectorized AR parameters $\phi = [(\mathrm{vec}(\Phi_1))', \dots, (\mathrm{vec}(\Phi_p))']'$ $(pn^2 \times 1)$ is obtained by simply estimating sample analogues to the autocovariances, plugging in to (C.1), and inverting the right-most matrix; the inverse exists in population because the AR parameters are identifiable (see Appendix B).

The second step is to obtain a consistent estimator of the vectorized MA coefficients $m = [(\mathrm{vec}(M_1))', \dots, (\mathrm{vec}(M_{p-1}))']'$ $((p-1)n^2 \times 1)$ and Cholesky decomposition of error covariances $l = \mathrm{vech}(L)$ $(n(n+1)/2 \times 1)$. This is less straightforward, but a “feasible” approach does exist. Multiplying both sides of the VARMA$(p, p-1)$ now by $y_{t-i}'$ $\forall\, i < p$ and taking expectations yields,

\[
\begin{aligned}
y_{t-(p-1)}:\quad \Gamma_{p-1} &= \Phi_1\Gamma_{p-2} + \dots + \Phi_{p-2}\Gamma_1 + \Phi_{p-1}\Gamma_0 + \Phi_p\Gamma_1' + M_{p-1}\Omega \\
y_{t-(p-2)}:\quad \Gamma_{p-2} &= \Phi_1\Gamma_{p-3} + \dots + \Phi_{p-2}\Gamma_0 + \Phi_{p-1}\Gamma_1' + \Phi_p\Gamma_2' + M_{p-2}\Omega + M_{p-1}((\Phi_1 + M_1)\Omega)' \\
y_{t-(p-3)}:\quad \Gamma_{p-3} &= \Phi_1\Gamma_{p-4} + \dots + \Phi_{p-2}\Gamma_1' + \Phi_{p-1}\Gamma_2' + \Phi_p\Gamma_3' + M_{p-3}\Omega + M_{p-2}((\Phi_1 + M_1)\Omega)' \\
&\quad + M_{p-1}((\Phi_1(\Phi_1 + M_1) + \Phi_2 + M_2)\Omega)'
\end{aligned}
\]

and so forth, where $M_0 = I_n$. In theory, the above expressions may be used to solve for consistent estimators for $\{M_i\}$ and Ω in terms of the estimators $\{\hat{\Gamma}_i\}$, which are directly available from the data, and $\{\hat{\Phi}_i\}$, which are given by the Yule Walker equations above. Yet, the above expressions are difficult to solve in practice since they comprise a matrix quadratic equation. Therefore, the suggestion of this paper is to solve for estimates of $\{M_i\}$ and Ω by iteration. In particular, by beginning with an assumption about Ω, one may calculate estimates of $M_1, \dots, M_{p-1}$ using the first $p - 1$ equations, and then update the estimate of Ω with the p-th equation. For example, if p = 3, we have:

\[
\begin{aligned}
\Gamma_2 &= \Phi_1\Gamma_1 + \Phi_2\Gamma_0 + \Phi_3\Gamma_1' + M_2\Omega \\
\Gamma_1 &= \Phi_1\Gamma_0 + \Phi_2\Gamma_1' + \Phi_3\Gamma_2' + M_1\Omega + M_2((\Phi_1 + M_1)\Omega)' \\
\Gamma_0 &= \Phi_1\Gamma_1' + \Phi_2\Gamma_2' + \Phi_3\Gamma_3' + \Omega + M_1((\Phi_1 + M_1)\Omega)' + M_2((\Phi_1(\Phi_1 + M_1) + \Phi_2 + M_2)\Omega)'
\end{aligned}
\]

Begin with iteration i = 1. By guessing $\hat{\Omega}(i-1) = \hat{\Omega}(0) = I_n$, or any other positive definite matrix, the first equation gives a closed-form expression for $\hat{M}_2(1)$:

\[
\hat{M}_2(i) = \big(\hat{\Gamma}_2 - \hat{\Phi}_1\hat{\Gamma}_1 - \hat{\Phi}_2\hat{\Gamma}_0 - \hat{\Phi}_3\hat{\Gamma}_1'\big)\,\hat{\Omega}(i-1)^{-1}
\]

The above expression exists because all positive definite matrices are invertible. The second equation gives an expression for a conditional estimator $\hat{M}_1(1)$:

\[
\mathrm{vec}(\hat{M}_1(i)) = \Big\{\big[\hat{\Omega}(i-1) \otimes I_n\big] + \big[I_n \otimes (\hat{M}_2(i)\hat{\Omega}(i-1))\big]K_{n,n}\Big\}^{-1}
\mathrm{vec}\big(\hat{\Gamma}_1 - \hat{\Phi}_1\hat{\Gamma}_0 - \hat{\Phi}_2\hat{\Gamma}_1' - \hat{\Phi}_3\hat{\Gamma}_2' - \hat{M}_2(i)\hat{\Omega}(i-1)\hat{\Phi}_1'\big)
\]

for $K_{n,n}$ the commutation matrix. The third (p-th) equation defines the update for Ω:

\[
\mathrm{vech}(\hat{\Omega}(i)) = D_n^{+}\Big\{I_{n^2} + \big[(\hat{\Phi}_1 + \hat{M}_1(i)) \otimes \hat{M}_1(i)\big] + \big[(\hat{\Phi}_1(\hat{\Phi}_1 + \hat{M}_1(i)) + \hat{\Phi}_2 + \hat{M}_2(i)) \otimes \hat{M}_2(i)\big]\Big\}^{-1}
\mathrm{vec}\big(\hat{\Gamma}_0 - \hat{\Phi}_1\hat{\Gamma}_1' - \hat{\Phi}_2\hat{\Gamma}_2' - \hat{\Phi}_3\hat{\Gamma}_3'\big)
\]

for $D_n^{+} = (D_n'D_n)^{-1}D_n'$ the Moore–Penrose pseudoinverse of the duplication matrix $D_n$. One may iterate on this 3-equation sequence once, or until the estimates no longer change appreciably.
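To make the two-step logic concrete, the following is a minimal sketch for the simplest scalar case, n = 1 and p = 2, that is an ARMA(2,1) model $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + u_t + m\,u_{t-1}$ with $\mathrm{Var}(u_t) = \omega$. It is not the paper's code: the function names are hypothetical, population autocovariances are used in place of sample analogues so that the result can be checked exactly, and convergence of the iteration is only observed for the example values, not guaranteed in general.

```python
def arma21_autocovariances(phi1, phi2, m, omega, n_lags=4, n_terms=2000):
    """Population autocovariances gamma_0..gamma_{n_lags-1} via truncated MA(inf) weights."""
    psi = [1.0, phi1 + m]
    for _ in range(n_terms):
        psi.append(phi1 * psi[-1] + phi2 * psi[-2])
    return [omega * sum(psi[j] * psi[j + k] for j in range(len(psi) - k))
            for k in range(n_lags)]


def estimate_arma21(gamma, n_iter=200):
    """Recover (phi1, phi2, m, omega) from autocovariances gamma[0..3]."""
    g0, g1, g2, g3 = gamma
    # Step 1: scalar version of (C.1), using lags p and p + 1 as instruments:
    #   g2 = phi1*g1 + phi2*g0   and   g3 = phi1*g2 + phi2*g1
    det = g1 * g1 - g0 * g2
    phi1 = (g2 * g1 - g0 * g3) / det
    phi2 = (g1 * g3 - g2 * g2) / det
    # Step 2: iterate between the lag-1 equation (gives m given omega)
    # and the lag-0 equation (updates omega given m).
    omega = 1.0  # initial guess; any positive number
    m = 0.0
    for _ in range(n_iter):
        m = (g1 - phi1 * g0 - phi2 * g1) / omega
        omega = (g0 - phi1 * g1 - phi2 * g2) / (1.0 + m * (phi1 + m))
    return phi1, phi2, m, omega
```

Feeding in the autocovariances implied by $(\phi_1, \phi_2, m, \omega) = (0.5, 0.2, 0.4, 2)$ returns those same values up to numerical error. In this example the iteration settles on the invertible MA root; the observationally equivalent non-invertible root is also a fixed point of the two equations, so the starting value for ω can matter in other parameterizations.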
Appendix D. Likelihood function and its properties

Likelihood function: Here we will derive the likelihood function for the Gaussian VARMA(p,q) model (3). See also Lütkepohl (2005) (p. 467 onward) for similar steps. Recall from (A.2) that $E(y_t) = \big(I_n - \sum_{i=1}^{p}\Phi_i\big)^{-1}\mu$. Defining $\psi' = [\phi', m']$, and assuming initial conditions $y_{-p+1} - E(y_t) = \dots = y_0 - E(y_t) = u_{-q+1} = \dots = u_0 = 0_{n \times 1}$ for simplicity, the likelihood of (3) is proportional to

\[
L_0(\mu, \psi, \Omega) = |\Omega|^{-T/2} \exp\left\{ -\frac{1}{2}\sum_{t=1}^{T} u_t(\mu, \psi)'\,\Omega^{-1}\,u_t(\mu, \psi) \right\} \tag{D.1}
\]

where $u_t(\mu, \psi) = (y_t - E(y_t)) - \sum_{i=1}^{t-1}\Pi_i(\psi)(y_{t-i} - E(y_t))$ and $\Pi_i$ are the coefficients of the pure VAR(∞) representation $y_t - E(y_t) = \sum_{i=1}^{\infty}\Pi_i(y_{t-i} - E(y_t)) + u_t$. These are recursively defined as $\Pi_i = \Phi_i + M_i - 1_{i > 1}\sum_{j=1}^{i-1} M_{i-j}\Pi_j$ with $M_0 = I_n$ and $M_{i-j} = 0_{n \times n}$ for $i - j > q$.

Normal equations: The normal equations for μ corresponding to the exact log-likelihood $\ell(\mu, \psi, \Omega) = \ln(\text{constant}) + \ln L(\mu, \psi, \Omega)$ are written in terms of the pure VAR representation coefficients $\{\Pi_i\}$ and VARMA AR coefficients $\{\Phi_i\}$,

\[
\frac{\partial \ell}{\partial \mu'} = -\sum_{t=1}^{T} u_t'\,\Omega^{-1}\frac{\partial u_t}{\partial E(y_t)'}\frac{\partial E(y_t)}{\partial \mu'}
= \sum_{t=1}^{T} u_t'\,\Omega^{-1}\left(I_n - \sum_{i=1}^{t-1}\Pi_i(\psi)\right)\left(I_n - \sum_{i=1}^{p}\Phi_i\right)^{-1} \tag{D.2}
\]

Similarly, the normal equations for ψ are

\[
\frac{\partial \ell}{\partial \psi'} = -\sum_{t=1}^{T} u_t'\,\Omega^{-1}\frac{\partial u_t}{\partial \psi'} \tag{D.3}
\]

where $u_t = (y_t - E(y_t)) - [\Phi_1(y_{t-1} - E(y_t)) + \dots + \Phi_p(y_{t-p} - E(y_t)) + M_1 u_{t-1} + \dots + M_q u_{t-q}]$. $\partial u_t/\partial m'$ are defined recursively under the assumption that the initial conditions hold, and as a consequence, that $\partial u_t/\partial m' = 0_{1 \times (p-1)n^2}$ for all $t \leq 0$:
∂ut ∂ut ∂ut ¼ ∂ψ 0 ∂ϕ0 ∂m0
ðD:4Þ
∂ut 0 0 0 ¼ ½ðyt 1 Eðyt ÞÞ ; …; ðyt p Eðyt ÞÞ I n ∂ϕ
h i ∂ut q ∂ut ∂ut 1 ¼ u0t 1 ; …; u0t q I n M 1 … M q 0 ∂m ∂m0 ∂m0 The normal equations corresponding to l ¼ vechðLÞ for the Cholesky decomposition Ω ¼ LL0 are " # T 1X 0 1 ∂ℓ T ∂vechðΩÞ 1 0 0 1 vecðΩ ¼ Þ þ u Ω Ω u Dn 0 0 t t 2 2t ¼1 ∂l ∂l ∂vechðΩÞ ¼ Dnþ L I n þ ðI n LÞK n;n Dn 0 ∂l
ðD:5Þ
ðD:6Þ
Maximum likelihood estimator: A consistent estimator for Ω is always available from

\[
\hat{\Omega} = \frac{1}{T}\sum_{t=1}^{T} u_t(\hat{\mu}, \hat{\psi})\,u_t(\hat{\mu}, \hat{\psi})' \tag{D.7}
\]

where $\hat{\mu}$ and $\hat{\psi}$ are consistent estimators for μ and ψ, respectively. In the special case that these are the maximum likelihood estimators $\hat{\mu}_{MLE}$ and $\hat{\psi}_{MLE}$, the above estimator for Ω is likewise the MLE $\hat{\Omega}_{MLE}$. While one approach to obtaining the MLE is to numerically maximize ℓ by searching over values of μ, ψ, and l, a well-known result with respect to Gaussian likelihood functions is that the MLE for Ω may be plugged into ℓ to yield the concentrated log-likelihood in terms of μ and ψ alone. Given this expression, in place of maximizing ℓ directly, one may equivalently simply seek to minimize $\det(\hat{\Omega}(\mu, \psi))$ by searching over values of μ and ψ numerically. Useful starting points are given by instrumental variables estimators (Appendix C). The minimizers are identically $\hat{\mu}_{MLE}$ and $\hat{\psi}_{MLE}$ and given these, $\hat{\Omega}_{MLE}$ is then also known.

Information matrix: A consistent estimator of the asymptotic information matrix with respect to the reduced form parameters π, labeled $\hat{I}$, may be written in terms of any consistent estimators $\hat{\mu}$, $\hat{\psi}$, and $\hat{\Omega}$ as follows. The below results follow from the normal equations (D.2), (D.3), and (D.5) and elementary matrix algebra operations unless noted. R is the restriction matrix defined in (6):

\[
\hat{I} = R'\begin{bmatrix}
\hat{I}_{\mu\mu} & \hat{I}_{\mu\psi} & \hat{I}_{\mu l} \\
\hat{I}_{\mu\psi}' & \hat{I}_{\psi\psi} & \hat{I}_{\psi l} \\
\hat{I}_{\mu l}' & \hat{I}_{\psi l}' & \hat{I}_{ll}
\end{bmatrix} R \tag{D.8}
\]
The top-left block simply comes from the next derivative of the normal equations (D.2): !0 1 !1 !0 ! p p T TX 1 TX 1 X X 1X ^ 1 In ^i ^i I^ μμ ¼ In In Π i ðψ^ Þ Ω Π i ðψ^ Þ In Φ Φ Tt¼1 i¼1 i¼1 i¼1 i¼1 The middle block is the following, using the form of ∂ut =∂ψ 0 from (D.4): !0 T 1X ∂ut ^ ^ 1 ∂ut Ω I ψψ ¼ 0 0 T t ¼ 1 ∂ψ ψ ¼ ψ^ ∂ψ ψ ¼ ψ^
ðD:9Þ
Note, the previous equation does not have a term containing ∂2 ut =∂ψ∂ψ 0 , which should be surprising given the normal equations with respect to ψ (D.3) contain ∂ut =∂ψ 0 . This is because this term disappears using the “trick” described on Lütkepohl (2005, p. 473). In short, the coefficient of this second derivative contains ut, which is mean zero, and uncorrelated from the second derivative. The same trick implies the following two equalities: !0 !0 1 p T TX 1 X 1X ^ 1 ∂ut ^i I^ μψ ¼ In In Π i ðψ^ Þ Ω Φ I^ μl ¼ 0nnðn þ 1Þ=2 T ∂ψ 0 t¼1
i¼1
i¼1
ψ ¼ ψ^
Beginning with (D.3), basic matrix calculus and algebra operations yield, " !0 # T
1 1X ∂ut 0 ^ ^ 1 Dn ∂vechðΩÞ ^ Ω I^ ψl ¼ u ð μ; ^ ψ Þ Ω t 0 ^ 0 Tt¼1 ∂ψ ψ ¼ ψ^ ∂l l¼l Pt P 1 1 1 0 Finally, beginning with dð∂ℓ=∂ΩÞ ¼ T=2dΩ þ 1=2dΩ ð t ¼ 1 ut ut ÞΩ þ Ω 1 ð tt ¼ 1 ut u0t ÞdΩ 1 , 0 h 1 1 1 ∂vechðΩÞ ^ ^ 1 I^ ll ¼ D0n T Ω Ω 0 T2 ^ ∂l l¼l
"
T X
^ 1 Ω
# ! ! ^ 1 Ω ^ 1 ut ðμ; ^ ψ^ Þut ðμ; ^ ψ^ Þ0 Ω
t¼1
^ 1 … Ω
" ^ 1 Ω
# !!# ∂vechðΩÞ ^ 1 ut ðμ; ^ ψ^ Þut ðμ; ^ ψ^ Þ0 Ω Dn 0 ^ ∂l l¼l t¼1 T X
Appendix E. Indeterminate solution to An and Schorfheide (2007). 0 The model (17)–(22) has the form of Sims (2002, Equation (1)), Γ 0 wt ¼ Γ 1 wt þ Ψ εt þΠηt for wt ¼ v0t ðι0 Et vt þ 1 Þ0 is ð8 1Þ 0 0 0 where vt ¼ zt g t r t yt π t ct is ð6 1Þ and ι0 Et vt þ 1 ¼ Et yt þ 1 Et π t þ 1 is ð2 1Þ. εt ¼ εzt εgt εrt is ð3 1Þ, and η0t ¼ ηyt ηπt is e¼2 dimensional for ηyt ¼ yt Et 1 yt and ηπt ¼ π t Et 1 π t . The parameters are: 2 3 1 0 0 0 0 0 0 0 2 3 6 0 0 0 1 0 0 0 0 0 0 7 6 7 6 7 60 07 6 7 Þψ 1 ð1 ρ Þψ 0 0 0 0 0 ð1 ρ r y r y 6 7 6 7 6 7 6 ρ =τ ð1 ρ Þ 1=τ 7 60 07 1 0 0 1 1=τ z g 7 7 Γ0 ¼ 6 ι ¼6 6 7 6 1 0 7 ð88Þ 6 0 7 ð62Þ κ 0 κ 1 0 0 β 6 7 6 7 6 7 6 0 7 40 15 1 0 1 0 1 0 0 6 7 6 7 4 0 0 0 1 0 0 0 0 5 0 0 2 6 6 6 6 6 6 6 Γ1 ¼ 6 6 ð88Þ 6 6 6 6 4
ρz
0
0
0
0
0 0 0
0
3
0
ρg
0
0
0
0
0
0 0
0 0
ρr 0
0 0
0 0
0 0
0 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
07 7 7 07 7 07 7 7 07 7 07 7 7 05
0
0
0
0
0
0
0
1
0
0 2
1
0
60 6 6 60 6 60 6 Ψ ¼6 60 ð83Þ 6 60 6 6 40
1
0
0 3 0 07 7 7 17 7 07 7 7 07 7 07 7 7 05
0
0
0
0 0 0 0
1
0 0 2 3 0 0 60 07 6 7 6 7 60 07 6 7 60 07 6 7 Π ¼6 7 60 07 ð82Þ 6 7 60 07 6 7 6 7 41 05 0
0
1
At θ0 there is 1 unstable root. So, the degrees of indeterminacy are di ¼ e ðunstablerootsÞ ¼ 2 1 ¼ 1, and Sims's algorithm will complain that the point yields an indeterminate solution. Farmer et al.'s (2015) solution is to choose di¼1 element from ηt at a time and use it to augment the set of shocks εt. This yields two alternative representations of the model which will each have a stable solution using Sims's algorithm. The first model is Γ 0 wt ¼ Γ 1 wt þΨ 1 ε1t þ Π 1 ηπt for ε01t ¼ ε0t ηyt and the 0 0 second Γ 0 wt ¼ Γ 1 wt þ Ψ 2 ε2t þΠ 2 ηyt for ε2t ¼ εt ηπt where 2 2 3 2 2 3 3 3 1 0 0 0 0 1 0 0 0 0 60 1 0 07 607 60 1 0 07 607 6 6 7 6 6 7 7 7 6 6 7 6 6 7 7 7 60 0 1 07 607 60 0 1 07 607 6 6 7 6 6 7 7 7 60 0 0 07 607 60 0 0 07 607 6 6 7 6 6 7 7 7 Ψ1 ¼ 6 7 Π1 ¼ 6 7 Ψ 2 ¼ 6 7 Π2 ¼ 6 7 6 6 6 607 7 7 7 0 0 0 0 0 0 0 0 0 ð84Þ 6 7 ð81Þ 6 7 ð84Þ 6 7 ð81Þ 6 7 60 0 0 07 607 60 0 0 07 607 6 6 7 6 6 7 7 7 6 6 7 6 6 7 7 7 40 0 0 15 405 40 0 0 05 415 0
0
0
0
1
0
0
0
1
0
The first model's solution has the following form; note the repeated elements ρvy and ρyy: 2 3 2 3 ρvv σ vε ρvy " ρvy " " # # # vt 6 ð66Þ ð61Þ 7 vt 1 6 ð63Þ ð61Þ 7 εt ¼4 ρ þ Solution to model 1: 5 4 5 ρyy σ yε ρyy ηyt Et yt þ 1 Et 1 yt yv ð16Þ
x0t ¼ zt g t r t v02t ¼ yt π t ct v0t ¼ x0t v02t 2 3 2 3 2 3 0 ρxx ρxy σ xε ð33Þ 6 ð33Þ 7 6 ð31Þ 7 6 ð33Þ 7 ρvv ¼ 4 ρvy ¼ 4 ρ 5 σ vε ¼ 4 σ 5 ρ2x 0 5 2ε 2y ð61Þ ð66Þ ð33Þ
ð33Þ
ð61Þ
ð31Þ
ð13Þ
ð11Þ
ð33Þ
ð31Þ
" ρyv ¼ ð16Þ
# ρyx ð13Þ
0
ð13Þ
ðE:1Þ
We wish to enable reduced form analysis across all solutions, and so must simplify the indeterminate solution (E.1) to be entirely in terms of states $v_t$ and errors $\varepsilon_t$ (as in the determinate solution). Since $\eta^y_t = y_t - E_{t-1}y_t$, the first block (left-hand side $x_t$) may be simplified to $x_t = \rho_{xx}x_{t-1} + \rho_{xy}y_t + \sigma_{x\varepsilon}\varepsilon_t$ and the third block (left-hand side $E_t y_{t+1}$) to $E_t y_{t+1} = \rho_{yx}x_{t-1} + \rho_{yy}y_t + \sigma_{y\varepsilon}\varepsilon_t$. Guess $y_t = \rho_y' x_{t-1} + \sigma_y'\varepsilon_t$ for some $(3 \times 1)$ vectors $\rho_y$ and $\sigma_y$ so that $E_t y_{t+1} = \rho_y' x_t$. Plugging this into the left-hand side of the third block and multiplying both sides of the first block by $\rho_y'$ yields two equations with $\rho_y' x_t$ on the left-hand side. Comparing coefficients on $x_{t-1}$ yields,

\[
\rho_y' = \rho_{yx}\,\rho_{xx}^{-1}
\]

Note, since $y_t = \rho_y' x_{t-1} + \sigma_y'\varepsilon_t$, then $\eta^y_t = y_t - E_{t-1}y_t = y_t - \rho_y' x_{t-1} = \sigma_y'\varepsilon_t$. Therefore, the covariance of the forecast errors $\eta^y_t$ with the fundamental errors $\varepsilon_t$ is $E(\eta^y_t\varepsilon_t') = \sigma_y'\Sigma$. Thus, $\sigma_y$ is a $(3 \times 1)$ vector of new structural parameters which dictate the macro-forecast relationship, and must be estimated directly (related to $\tilde{M}$ in Lubik and Schorfheide (2004) and $M_z$ in Farmer et al. (2015)). Plugging in yields the simplified solution in terms of $v_t$ and $\varepsilon_t$ only:

\[
\text{Solution to model 1, simplified:}\quad
\underbrace{\begin{bmatrix} x_t \\ v_{2t} \end{bmatrix}}_{v_t}
= \begin{bmatrix} \rho_{xx} + \rho_{xy}\rho_y' \\ \rho_{2x} + \rho_{2y}\rho_y' \end{bmatrix} x_{t-1}
+ \begin{bmatrix} \sigma_{x\varepsilon} + \rho_{xy}\sigma_y' \\ \sigma_{2\varepsilon} + \rho_{2y}\sigma_y' \end{bmatrix} \varepsilon_t \tag{E.2}
\]
An analogous approach may be used for the solution to model 2, replacing y's with π's.
Appendix F. Smets and Wouters (2007) appendix Equilibrium with real rigidities: The model is composed of two separate equilibria. The first, corresponding to the theoretical setting in which prices and wages are sticky, is represented by 14 unique equations paired with 14 endogenous variables. Nominal, detrended “installed” capital obeys kt ¼
I^ 1 δ 1 I^ kt 1 þ it þ sγ 1 þ βγ 1 τ et γ γ K^ K^
ðF:1Þ
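$\hat{I}/\hat{K} = \gamma - (1-\delta)$ is the ratio of steady state investment to capital. The Euler equation arising from household optimization may be written as follows. $c_t$ is the log deviation of real, detrended consumption from its natural rate:
$$c_t = \frac{\gamma}{\gamma+h} E_t c_{t+1} + \frac{h}{\gamma+h} c_{t-1} - \frac{1-\tau}{\tau}\frac{\gamma}{\gamma+h}\kappa_e \left(l_t - E_t l_{t+1}\right) - \frac{1}{\tau}\frac{\gamma-h}{\gamma+h}\left(r_t - E_t \pi_{t+1} + b_t\right) \tag{F.2}$$
$$\kappa_e = \frac{1}{1+\lambda_w}\frac{1-\alpha}{\alpha} R^K \frac{\hat{K}}{\hat{Y}}\frac{\hat{Y}}{\hat{C}}, \qquad \frac{\hat{Y}}{\hat{C}} = \left(1 - \frac{G}{\hat{Y}} - \frac{\hat{I}}{\hat{K}}\frac{\hat{K}}{\hat{Y}}\right)^{-1}, \qquad \frac{\hat{I}}{\hat{K}} = \gamma - (1-\delta)$$
$$\frac{\hat{K}}{\hat{Y}} = \frac{\hat{Y}+\Phi}{\hat{Y}}\left(\frac{L}{\hat{K}}\right)^{\alpha-1}, \qquad \frac{L}{\hat{K}} = \frac{1-\alpha}{\alpha}\frac{R^K}{W}$$
$$W = \left(\frac{\hat{Y}}{\hat{Y}+\Phi}\,\frac{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}{(R^K)^{\alpha}}\right)^{1/(1-\alpha)}, \qquad R^K = \frac{\gamma^{\tau}}{\beta} - (1-\delta)$$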
$W$ is the steady state of detrended real wages and $R^K$ is the steady state of the rental rate on depreciable capital. The parameters $(\hat{Y}, G)$ are steady state values that appear in the structural parameter $\theta$. $(\hat{K}, \hat{C}, \hat{I}, L)$ are steady state parameters which are functions of the structural parameters, as described below. Hats correspond to detrended variables. Investment is real, detrended, and determined in part by Tobin's Q:
$$i_t = \frac{\beta\gamma^{1-\tau}}{1+\beta\gamma^{1-\tau}} E_t i_{t+1} + \frac{1}{1+\beta\gamma^{1-\tau}} i_{t-1} + \frac{1}{s\gamma^{2}}\frac{1}{1+\beta\gamma^{1-\tau}} q_t + e_t \tag{F.3}$$
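The Phillips curve is written as
$$\pi_t = \frac{\beta\gamma^{1-\tau}}{1+\iota_p\beta\gamma^{1-\tau}} E_t \pi_{t+1} + \frac{\iota_p}{1+\iota_p\beta\gamma^{1-\tau}} \pi_{t-1} - \kappa_p \mu^p_t + m^p_t \tag{F.4}$$
$$\kappa_p = \frac{\left(1-\xi_p\beta\gamma^{1-\tau}\right)\left(1-\xi_p\right)}{\xi_p\left(1+\zeta_p(\Phi/\hat{Y})\right)\left(1+\iota_p\beta\gamma^{1-\tau}\right)}$$
The wage relation is
$$w_t = \frac{\beta\gamma^{1-\tau}}{1+\beta\gamma^{1-\tau}}\left(E_t w_{t+1} + E_t \pi_{t+1}\right) + \frac{1}{1+\beta\gamma^{1-\tau}}\left(w_{t-1} + \iota_w \pi_{t-1}\right) - \frac{1+\iota_w\beta\gamma^{1-\tau}}{1+\beta\gamma^{1-\tau}} \pi_t - \kappa_w \mu^w_t + m^w_t \tag{F.5}$$
$$\kappa_w = \frac{\left(1-\xi_w\beta\gamma^{1-\tau}\right)\left(1-\xi_w\right)}{\xi_w\left(1+\zeta_w\lambda_w\right)\left(1+\iota_w\beta\gamma^{1-\tau}\right)}$$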
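The slope coefficients $\kappa_p$ and $\kappa_w$ are simple functions of the structural parameters, and evaluating them shows how stickier prices or wages flatten the two curves. The sketch below is illustrative only: the function names and the parameter values are hypothetical, and the formulas follow the definitions of $\kappa_p$ and $\kappa_w$ as read from the text above.

```python
def phillips_slope(xi_p, iota_p, zeta_p, beta, gamma, tau, phi_over_y):
    """kappa_p, the Phillips curve slope on the price markup (phi_over_y is Phi / Y-hat)."""
    d = beta * gamma ** (1.0 - tau)   # growth-adjusted discount factor beta * gamma^(1 - tau)
    return ((1.0 - xi_p * d) * (1.0 - xi_p)
            / (xi_p * (1.0 + zeta_p * phi_over_y) * (1.0 + iota_p * d)))


def wage_slope(xi_w, iota_w, zeta_w, lambda_w, beta, gamma, tau):
    """kappa_w, the wage relation slope on the wage markup."""
    d = beta * gamma ** (1.0 - tau)
    return ((1.0 - xi_w * d) * (1.0 - xi_w)
            / (xi_w * (1.0 + zeta_w * lambda_w) * (1.0 + iota_w * d)))


# Hypothetical parameter values, chosen only to exercise the functions.
common = dict(beta=0.99, gamma=1.004, tau=1.4)
kp_flexible = phillips_slope(xi_p=0.5, iota_p=0.25, zeta_p=10.0, phi_over_y=0.6, **common)
kp_sticky = phillips_slope(xi_p=0.9, iota_p=0.25, zeta_p=10.0, phi_over_y=0.6, **common)
kw = wage_slope(xi_w=0.7, iota_w=0.5, zeta_w=10.0, lambda_w=0.5, **common)
print(kp_flexible, kp_sticky, kw)
```

A higher Calvo parameter lowers both factors of the numerator and raises the denominator, so the slope falls as $\xi_p$ (or $\xi_w$) rises toward one.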
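Wages are real and detrended. $\mu^p_t$ and $\mu^w_t$ are the markups to prices and wages, respectively:
$$\mu^p_t = \alpha\left(s_t - l_t\right) - w_t + z_t \tag{F.6}$$
$$\mu^w_t = w_t - \ell l_t - \frac{\gamma}{\gamma-h} c_t + \frac{h}{\gamma-h} c_{t-1} \tag{F.7}$$
where markups are nominal and not detrended. The same is true for Tobin's Q, which obeys
$$q_t = \beta(1-\delta)\gamma^{-\tau} E_t q_{t+1} + E_t \pi_{t+1} + \left(1-\beta(1-\delta)\gamma^{-\tau}\right) E_t r^k_{t+1} - r_t - b_t \tag{F.8}$$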
Taylor's rule for the nominal interest rate is written as follows. Starred variables are those arising from the equilibrium derived without real rigidities. $y^*_t$ will be known as the “natural” rate of output. $f_t$ is the idiosyncratic component of Fed policy not captured by the Taylor rule:
$$r_t = \rho_r r_{t-1} + (1-\rho_r)\psi_\pi \pi_t + \left((1-\rho_r)\psi_y + \psi_\Delta\right)\left(y_t - y^*_t\right) - \psi_\Delta\left(y_{t-1} - y^*_{t-1}\right) + f_t \tag{F.9}$$
while the real but not detrended rental rate for installed capital $k_t$ is given by
$$r^k_t = l_t + w_t - s_t \tag{F.10}$$
where utilized capital $s_t$ is related to installed capital (both nominal, detrended) by
$$s_t = k_{t-1} + u_t \tag{F.11}$$
where $u_t$ is nominal and not detrended capacity utilization, simply defined by
$$u_t = \frac{1-u}{u} r^k_t \tag{F.12}$$
Nominal and not detrended labor hours are given by
$$l_t = \frac{1}{1-\alpha}\frac{\hat{Y}}{\hat{Y}+\Phi} y_t - \frac{\alpha}{1-\alpha} s_t - \frac{1}{1-\alpha} z_t \tag{F.13}$$
and finally, real and detrended output is defined by the aggregate accounting equality
$$y_t = \frac{\hat{C}}{\hat{Y}} c_t + \frac{\hat{I}}{\hat{K}}\frac{\hat{K}}{\hat{Y}} i_t + R^K \frac{\hat{K}}{\hat{Y}}\frac{1-u}{u} r^k_t + g_t \tag{F.14}$$
Eqs. (F.1)–(F.14) define rules of motion for the 14 variables $k_t$, $c_t$, $i_t$, $\pi_t$, $w_t$, $\mu^p_t$, $\mu^w_t$, $q_t$, $r_t$, $r^k_t$, $s_t$, $u_t$, $l_t$, and $y_t$. Equilibrium without real rigidities defines the rule of motion for the natural rate of output, $y^*_t$, in the Taylor rule. The remaining 7 variables used above, $z_t$, $g_t$, $e_t$, $b_t$, $f_t$, $m^p_t$, and $m^w_t$, will be given reduced form rules of motion in the section after next.

Equilibrium without real rigidities: The second set of equilibrium conditions defining the model corresponds to the theoretical setting in which prices and wages are flexible, that is, to the “natural” rates. There are 10 equations paired with 10 endogenous variables. The natural level of installed capital follows
$$k^*_t = \frac{1-\delta}{\gamma} k^*_{t-1} + \frac{1}{\gamma}\frac{\hat{I}}{\hat{K}} i^*_t + \frac{\hat{I}}{\hat{K}} s\gamma\left(1+\beta\gamma^{1-\tau}\right) e_t \tag{F.15}$$
while the natural rate of consumption $c^*_t$ follows
$$c^*_t = \frac{\gamma-h}{\gamma} w^*_t - \frac{\gamma-h}{\gamma}\ell l^*_t + \frac{h}{\gamma} c^*_{t-1} \tag{F.16}$$
This expression for $c^*_t$ is distinct from the consumption Euler equation for $c_t$ in Eq. (F.2). This is because the flexible price/wage analogue of Eq. (F.2) is used to define a rule of motion for $r^*_t$ in the absence of a flexible price/wage Taylor rule. The natural rate of investment follows
$$i^*_t = \frac{\beta\gamma^{1-\tau}}{1+\beta\gamma^{1-\tau}} E_t i^*_{t+1} + \frac{1}{1+\beta\gamma^{1-\tau}} i^*_{t-1} + \frac{1}{s\gamma^{2}}\frac{1}{1+\beta\gamma^{1-\tau}} q^*_t + e_t \tag{F.17}$$
Given that the price markup is zero in this case, we have the following wages and Tobin's q:
$$w^*_t = \alpha\left(s^*_t - l^*_t\right) + z_t \tag{F.18}$$
$$q^*_t = \beta(1-\delta)\gamma^{-\tau} E_t q^*_{t+1} + \left(1-\beta(1-\delta)\gamma^{-\tau}\right) E_t r^{k*}_{t+1} - r^*_t - b_t \tag{F.19}$$
and as noted, in place of a Taylor rule, $r^*_t$ is defined by a rearranged consumption Euler equation
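$$r^*_t = -\frac{\tau}{\gamma-h}\left((\gamma+h)\, c^*_t - \gamma E_t c^*_{t+1} - h c^*_{t-1}\right) - \frac{\gamma}{\gamma-h}(1-\tau)\kappa_e\left(l^*_t - E_t l^*_{t+1}\right) - b_t \tag{F.20}$$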
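$\kappa_e$ is defined following Eq. (F.2), and the natural rental rate for installed capital $k^*_t$ is
$$r^{k*}_t = l^*_t + w^*_t - s^*_t \tag{F.21}$$
Natural utilized capital $s^*_t$ is related to the natural level of installed capital by
$$s^*_t = k^*_{t-1} + \frac{1-u}{u} r^{k*}_t \tag{F.22}$$
Finally, natural labor hours and the natural output are, respectively,
$$l^*_t = \frac{1}{1-\alpha}\frac{\hat{Y}}{\hat{Y}+\Phi} y^*_t - \frac{\alpha}{1-\alpha} s^*_t - \frac{1}{1-\alpha} z_t \tag{F.23}$$
$$y^*_t = \frac{\hat{C}}{\hat{Y}} c^*_t + \frac{\hat{I}}{\hat{K}}\frac{\hat{K}}{\hat{Y}} i^*_t + R^K \frac{\hat{K}}{\hat{Y}}\frac{1-u}{u} r^{k*}_t + g_t \tag{F.24}$$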
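To conclude, the 10 numbered equations (F.15)–(F.24) define rules of motion for the 10 natural rates and levels $k^*_t$, $c^*_t$, $i^*_t$, $w^*_t$, $q^*_t$, $r^*_t$, $r^{k*}_t$, $s^*_t$, $l^*_t$, and $y^*_t$.

Exogenous processes: Seven variables $z_t$, $g_t$, $e_t$, $b_t$, $f_t$, $m^p_t$, and $m^w_t$ do not have equilibrium conditions. Instead, five of these are AR(1) and two are ARMA(1,1):
$$z_t = \rho_z z_{t-1} + \varepsilon_{zt} \tag{F.25}$$
$$g_t = \rho_g g_{t-1} + \varepsilon_{gt} + \vartheta_{gz}\varepsilon_{zt} \tag{F.26}$$
$$e_t = \rho_e e_{t-1} + \varepsilon_{et} \tag{F.27}$$
$$b_t = \rho_b b_{t-1} + \varepsilon_{bt} \tag{F.28}$$
$$f_t = \rho_f f_{t-1} + \varepsilon_{ft} \tag{F.29}$$
$$m^p_t = \rho_p m^p_{t-1} + \varepsilon_{pt} - \vartheta_p \varepsilon_{p,t-1} \tag{F.30}$$
$$m^w_t = \rho_w m^w_{t-1} + \varepsilon_{wt} - \vartheta_w \varepsilon_{w,t-1} \tag{F.31}$$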
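All errors $\varepsilon_{(i)t}$ are iid Gaussian with mean zero and variance $\sigma^2_{(i)}$. It will be convenient to write each of the ARMA(1,1) processes (F.30) and (F.31) as 2-dimensional VAR(1)'s. Specifically, defining
$$\lambda^p_t = \rho_p m^p_t - \vartheta_p \varepsilon_{pt}, \qquad \lambda^w_t = \rho_w m^w_t - \vartheta_w \varepsilon_{wt}$$
then (F.30) and (F.31) state that each process equals the lagged value of its $\lambda$ plus the current innovation, and substituting this back into the definitions gives the law of motion for $\lambda$:
$$m^p_t = \lambda^p_{t-1} + \varepsilon_{pt} \tag{F.32}$$
$$m^w_t = \lambda^w_{t-1} + \varepsilon_{wt} \tag{F.33}$$
$$\lambda^p_t = \rho_p \lambda^p_{t-1} + (\rho_p - \vartheta_p)\varepsilon_{pt} \tag{F.34}$$
$$\lambda^w_t = \rho_w \lambda^w_{t-1} + (\rho_w - \vartheta_w)\varepsilon_{wt} \tag{F.35}$$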
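The equivalence between an ARMA(1,1) recursion and its two-equation form can be checked by simulation. The sketch below is illustrative: the values of `rho` and `theta` are hypothetical, and the timing assumed is that with $\lambda_t = \rho m_t - \vartheta\varepsilon_t$ the process satisfies $m_t = \lambda_{t-1} + \varepsilon_t$ and $\lambda_t = \rho\lambda_{t-1} + (\rho - \vartheta)\varepsilon_t$, started from $\lambda_{-1} = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, theta, T = 0.9, 0.7, 200        # hypothetical AR and MA coefficients
eps = rng.standard_normal(T)         # iid Gaussian innovations

# Direct ARMA(1,1) recursion: m_t = rho * m_{t-1} + eps_t - theta * eps_{t-1},
# initialised at m_0 = eps_0 (no pre-sample history).
m_arma = np.zeros(T)
m_arma[0] = eps[0]
for t in range(1, T):
    m_arma[t] = rho * m_arma[t - 1] + eps[t] - theta * eps[t - 1]

# Two-equation form with auxiliary state lam_t = rho * m_t - theta * eps_t:
#   m_t   = lam_{t-1} + eps_t
#   lam_t = rho * lam_{t-1} + (rho - theta) * eps_t
lam = np.zeros(T)
m_var = np.zeros(T)
lam_prev = 0.0                       # lam_{-1} = 0 matches m_0 = eps_0
for t in range(T):
    m_var[t] = lam_prev + eps[t]
    lam[t] = rho * lam_prev + (rho - theta) * eps[t]
    lam_prev = lam[t]

print(np.abs(m_arma - m_var).max())  # the two paths should agree up to rounding
```

Because the auxiliary state absorbs the lagged innovation, the pair $(m_t, \lambda_t)$ depends only on $\lambda_{t-1}$ and $\varepsilon_t$, which is what allows the ARMA(1,1) shocks to enter a first-order state-space system.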
Steady state conditions: The steady states of output, inflation, and government spending, $\hat{Y}$, $\Pi$, and $G$, are included in the structural parameters, and $R = (\Pi\gamma^{\tau})/\beta$. Using the definitions following Eq. (F.2), we already have explicit functions for the steady states of the rental rate $R^K$ and wage $W$. Given those definitions we may also define
$$\hat{K} = \frac{\hat{K}}{\hat{Y}}\hat{Y}, \qquad \hat{I} = \left(\gamma - (1-\delta)\right)\hat{K}, \qquad L = \frac{1-\alpha}{\alpha}\frac{R^K}{W}\hat{K}, \qquad \hat{C} = \left(1 - \frac{G}{\hat{Y}} - \frac{\hat{I}}{\hat{K}}\frac{\hat{K}}{\hat{Y}}\right)\hat{Y}$$
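Since the steady state is recursive in the structural parameters, it can be computed in a few lines. The following is a sketch under stated assumptions: the function name `steady_state` and all parameter values are hypothetical, $G$ is treated as a level that enters through the ratio $G/\hat{Y}$, and the formulas are those read from the definitions following Eq. (F.2).

```python
def steady_state(alpha, beta, gamma, delta, tau, lambda_w, Phi, Y, G):
    """Steady-state objects as functions of structural parameters (illustrative sketch)."""
    RK = gamma ** tau / beta - (1.0 - delta)            # rental rate R^K
    fixed_cost = (Y + Phi) / Y                          # (Y-hat + Phi) / Y-hat
    W = (alpha ** alpha * (1.0 - alpha) ** (1.0 - alpha)
         / (fixed_cost * RK ** alpha)) ** (1.0 / (1.0 - alpha))   # real wage W
    L_over_K = (1.0 - alpha) / alpha * RK / W           # labor-capital ratio
    K_over_Y = fixed_cost * L_over_K ** (alpha - 1.0)   # capital-output ratio
    I_over_K = gamma - (1.0 - delta)                    # investment-capital ratio
    C_over_Y = 1.0 - G / Y - I_over_K * K_over_Y        # consumption share of output
    kappa_e = ((1.0 / (1.0 + lambda_w)) * ((1.0 - alpha) / alpha)
               * RK * K_over_Y / C_over_Y)
    K = K_over_Y * Y
    return {"RK": RK, "W": W, "K": K, "I": I_over_K * K,
            "L": L_over_K * K, "C": C_over_Y * Y, "kappa_e": kappa_e}


# Hypothetical parameter values, chosen only so that all steady states are positive.
params = dict(alpha=0.19, beta=0.9984, gamma=1.0043, delta=0.025, tau=1.38,
              lambda_w=0.5, Phi=0.6, Y=1.0, G=0.18)
ss = steady_state(**params)
print(ss)
```

Two quick consistency checks are that $\hat{C} + \hat{I} + G = \hat{Y}$ and that the factor payment ratio $WL/(R^K\hat{K})$ equals $(1-\alpha)/\alpha$.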
References

Abadir, K.M., Magnus, J.R., 2005. Matrix Algebra. Cambridge University Press, Cambridge, UK.
An, S., Schorfheide, F., 2007. Bayesian analysis of DSGE models. Econom. Rev. 26 (2–4), 113–172.
Andreasen, M.M., 2010. How to maximize the likelihood function for a DSGE model. Comput. Econ. 35 (2), 127–154.
Andrews, I., Mikusheva, A., 2015. Maximum likelihood inference in weakly identified DSGE models. Quant. Econ. 6.
Bårdsen, G., Fanelli, L., 2015. Frequentist evaluation of small DSGE models. J. Bus. Econ. Stat. 33 (3), 307–322.
Bound, J., Jaeger, D.A., Baker, R., 1995. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. J. Am. Stat. Assoc. 90, 443–450.
Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M., 2015. Time Series Analysis: Forecasting and Control. John Wiley & Sons, Hoboken, New Jersey, USA.
Brock, W.A., Mirman, L.J., 1972. Optimal economic growth and uncertainty: the discounted case. J. Econ. Theory 4 (3), 479–513.
Canova, F., Ferroni, F., Matthes, C., 2014. Choosing the variables to estimate singular DSGE models. J. Appl. Econom. 29 (7), 1099–1117.
Canova, F., Sala, L., 2009. Back to square one: identification issues in DSGE models. J. Monet. Econ. 56 (4), 431–449.
Chib, S., Ramamurthy, S., 2010. Tailored randomized block MCMC methods with application to DSGE models. J. Econom. 155 (1), 19–38.
Christiano, L.J., Eichenbaum, M., Evans, C.L., 2005. Nominal rigidities and the dynamic effects of a shock to monetary policy. J. Polit. Econ. 113 (1), 1–45.
Christiano, L.J., Eichenbaum, M.S., Trabandt, M., 2016. Unemployment and business cycles. Econometrica 84 (4).
Christiano, L.J., Trabandt, M., Walentin, K., 2010. DSGE Models for Monetary Policy Analysis. NBER Working Paper No. 16074.
Cochrane, J., 2011. Determinacy and identification with Taylor Rules. J. Polit. Econ. 119 (3), 565–615.
Cooley, T.F., Quadrini, V., 1999. A Neoclassical model of the Phillips curve relation. J. Monet. Econ. 44 (2), 165–193.
Cooper, D., Thompson, R., 1977. Note concerning the Akaike and Hannan estimation procedures for an autoregressive-moving average process—a note on the estimation of the parameters of the autoregressive-moving average process. Biometrika 64 (3), 625–628.
Creal, D., 2012. Sequential Monte Carlo samplers for Bayesian DSGE models. Econom. Rev. 31 (3), 245–296.
DeJong, D.N., Whiteman, C.H., 1993. Estimating moving average parameters: classical pileups and Bayesian posteriors. J. Bus. Econ. Stat. 11 (3), 311–317.
Dufour, J.-M., Khalaf, L., Kichian, M., 2013. Identification-robust analysis of DSGE and structural macroeconomic models. J. Monet. Econ. 60 (3), 340–350.
Farmer, R.E., Khramov, V., Nicolò, G., 2015. Solving and estimating indeterminate DSGE models. J. Econ. Dyn. Control 54, 17–36.
Fernández-Villaverde, J., Rubio-Ramírez, J., Sargent, T.J., Watson, M.W., 2007. ABCs (and Ds) of understanding VARs. Am. Econ. Rev. 97 (3), 1021–1026.
Fernández-Villaverde, J., Rubio-Ramírez, J.F., 2007. Estimating macroeconomic models: a likelihood approach. Rev. Econ. Stud. 74 (4), 1059–1087.
Fernández-Villaverde, J., Rubio-Ramírez, J.F., Schorfheide, F., 2015. Solution and estimation methods for DSGE models. In: Handbook of Macroeconomics, vol. 2, pp. 15–42.
Fukač, M., Waggoner, D.F., Zha, T., 2007. Local and Global Identification of DSGE Models: A Simultaneous-Equation Approach. Working Paper.
Galí, J., 2008. Monetary Policy, Inflation, and the Business Cycle. Princeton University Press, Princeton, New Jersey, USA.
Galí, J., Gertler, M., 1999. Inflation dynamics: a structural econometric analysis. J. Monet. Econ. 44 (2), 195–222.
Guerron-Quintana, P., Inoue, A., Kilian, L., 2013. Frequentist inference in weakly identified dynamic stochastic general equilibrium models. Quant. Econ. 4 (2), 197–229.
Hahn, J., Hausman, J., Kuersteiner, G., 2004. Estimation with weak instruments: accuracy of higher-order bias and MSE approximations. Econom. J. 7 (1), 272–306.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Hamilton, J.D., Waggoner, D., Zha, T., 2007. Normalization in econometrics. Econom. Rev. 26 (2–4), 221–252.
Hamilton, J.D., Wu, J.C., 2012. Identification and estimation of Gaussian affine term structure models. J. Econom. 168 (2), 315–331.
Hamilton, J.D., Wu, J.C., 2014. Testable implications of affine term structure models. J. Econom. 178, 231–242.
Hannan, E., 1971. The identification problem for multiple equation systems with moving average errors. Econometrica 39, 751–765.
Hannan, E., Deistler, M., 1988. The Statistical Theory of Linear Systems. Wiley, New York, USA.
Hautus, M., 1970. Stabilization controllability and observability of linear autonomous systems. In: Indagationes Mathematicae (Proceedings), vol. 73. Elsevier, Amsterdam, pp. 448–455.
Herbst, E., Schorfheide, F., 2014. Sequential Monte Carlo sampling for DSGE models. J. Appl. Econom. 29 (7), 1073–1098.
Horowitz, J.L., 2001. The bootstrap. In: Handbook of Econometrics, vol. 5, pp. 3159–3228.
Inoue, A., Rossi, B., 2011. Testing for weak identification in possibly nonlinear models. J. Econom. 161 (2), 246–261.
Ireland, P., 2004. A method for taking models to the data. J. Econ. Dyn. Control 28, 1205–1226.
Iskrev, N., 2010. Local identification in DSGE models. J. Monet. Econ. 57 (2), 189–202.
Jeffreys, H., 1961. Theory of Probability, vol. 3. Oxford University Press, Oxford, UK.
Kang, K.M., 1975. A Comparison of Estimators of Moving Average Processes. Australian Bureau of Statistics, Manuscript.
Kleibergen, F., Mavroeidis, S., 2009. Weak instrument robust tests in GMM and the New Keynesian Phillips curve. J. Bus. Econ. Stat. 27 (1), 293–311.
Kleibergen, F., Mavroeidis, S., 2014. Identification issues in limited-information Bayesian analysis of structural macroeconomic models. J. Appl. Econom. 29 (7), 1183–1209.
Kociȩcki, A., Kolasa, M., 2014. Global Identification of Linearized DSGE Models. Working Paper.
Komunjer, I., Ng, S., 2011. Dynamic identification of DSGE models. Econometrica 79 (6), 1995–2032.
Krause, M.U., Lubik, T.A., 2010. Instability and indeterminacy in a simple search and matching model. Fed. Reserve Bank Richmond Econ. Q. 96 (3).
Lanne, M., Luoto, J., 2015. Estimation of DSGE Models Under Diffuse Priors and Data-driven Identification Constraints. CREATES Research Papers 37.
Lippi, M., Reichlin, L., 1994. VAR analysis, nonfundamental representations, Blaschke matrices. J. Econom. 63 (1), 307–325.
Lubik, T., Schorfheide, F., 2004. Testing for indeterminacy: an application to U.S. monetary policy. Am. Econ. Rev. 94 (1), 190–217.
Lütkepohl, H., 2005. New Introduction to Multiple Time Series Analysis. Springer, Berlin.
Lütkepohl, H., 2006. Forecasting with VARMA models. In: Handbook of Economic Forecasting, vol. 1, pp. 287–325.
Mikusheva, A., et al., 2014. Estimation of dynamic stochastic general equilibrium models. Quantile 12, 1–21. (in Russian).
Moreira, M., Porter, J.R., Suarez, G.A., 2004. Bootstrap and Higher-order Expansion Validity When Instruments May be Weak. NBER Technical Working Paper 302.
Morris, S.D., 2016a. Efficient estimation of macroeconomic equations with latent states. Econ. Model., 60, http://www.sciencedirect.com/science/article/pii/S0264999316306022. in press.
Morris, S.D., 2016b. VARMA representation of DSGE models. Econ. Lett. 138, 30–33.
Mortensen, D.T., Pissarides, C.A., 1994. Job creation and job destruction in the theory of unemployment. Rev. Econ. Stud. 61 (3), 397–415.
Nelson, C., Startz, R., 1990. Some further results on the exact small sample properties of the instrumental variables estimator. Econometrica 58, 967–976.
Paccagnini, A., Rossi, R., 2012. The VARMA Representation of a DSGE Model: A Sign-Restriction Application. Working Paper.
Qu, Z., 2014. Inference in DSGE models with possible weak identification. Quant. Econ. 5, 457–494.
Qu, Z., Tkachenko, D., 2012. Identification and frequency domain quasi-maximum likelihood estimation of linearized dynamic stochastic general equilibrium models. Quant. Econ. 3 (1), 95–132.
Qu, Z., Tkachenko, D., 2016. Local and global parameter identification in DSGE models allowing for indeterminacy. Rev. Econ. Stud., in press.
Ravenna, F., 2007. Vector Autoregressions and Reduced Form Representations of DSGE Models. J. Monet. Econ. 54 (7), 2048–2064.
Rothenberg, T., 1971. Identification in parametric models. Econometrica 39 (3), 577–591.
Rothenberg, T., 1973. Efficient Estimation With A Priori Information. Yale University Press, New Haven, Connecticut, USA.
Rothenberg, T.J., 1984. Approximating the distributions of econometric estimators and test statistics. In: Handbook of Econometrics, vol. 2, pp. 881–935.
Sargan, J.D., Bhargava, A., 1983. Maximum likelihood estimation of regression models with first order moving average errors when the root lies on the unit circle. Econometrica, 799–820.
Sims, C., 2002. Solving linear rational expectations models. Comput. Econ. 20 (1–2), 1–20.
Sims, C., 2003. Comments on Smets and Wouters. Mimeo.
Smets, F., Wouters, R., 2003. An estimated dynamic stochastic general equilibrium model of the Euro Area. J. Eur. Econ. Assoc. 1 (5), 1123–1175.
Smets, F., Wouters, R., 2007. Shocks and frictions in US business cycles: a Bayesian DSGE approach. Am. Econ. Rev. 97 (3), 586–606.
Stock, J.H., 1994. Unit roots, structural breaks and trends. In: Handbook of Econometrics, vol. 4, pp. 2739–2841.
Stock, J.H., 2008. Weak Instruments, Weak Identification, and Many Instruments, Part II. NBER Summer Institute: What's New in Econometrics? Time Series, Lecture 4.
Stock, J.H., Watson, M.W., 1998. Median unbiased estimation of coefficient variance in a time-varying parameter model. J. Am. Stat. Assoc. 93 (441), 349–358.
Tiao, G.C., Zellner, A., 1964. On the Bayesian estimation of multivariate regression. J. R. Stat. Soc., 277–285.
Waggoner, D.F., Wu, H., Zha, T., 2016. Striated Metropolis-Hastings sampler for high-dimensional models. J. Econom. 192 (2), 406–420.
Woźniak, T., 2016. Bayesian vector autoregressions. Aust. Econ. Rev. 49 (3), 365–380.
Zadrozny, P.A., 2016. Extended Yule–Walker identification of VARMA models with single-or mixed-frequency data. J. Econom. 193 (2), 438–446.