Computational Statistics and Data Analysis 73 (2014) 69–86
Asymptotic distributions for quasi-efficient estimators in echelon VARMA models

Jean-Marie Dufour (a), Tarek Jouini (b,*)

(a) Department of Economics, McGill University, Leacock Building, Room 519, 855 Sherbrooke Street West, Montréal, Québec H3A 2T7, Canada
(b) Department of Economics, University of Windsor, 1177 Chrysler Hall North Building, 401 Sunset Avenue, Windsor, Ontario N9B 3P4, Canada
Article history: Received 10 December 2012; received in revised form 31 July 2013; accepted 1 November 2013; available online 13 November 2013.

Keywords: Stationary invertible VARMA; echelon form; Kronecker indices; truncation lag; linear estimation; simulation.
Abstract

Two linear estimators for stationary invertible vector autoregressive moving average (VARMA) models in echelon form (used to achieve parameter unicity, i.e. identification) with known Kronecker indices are studied. It is shown that both estimators are consistent and asymptotically normal with strong innovations. The first estimator is a generalized least squares (GLS) version of the two-step ordinary least squares (OLS) estimator studied in Dufour and Jouini (2005). The second is an asymptotically efficient estimator which is computationally much simpler than the Gaussian maximum likelihood (ML) estimator, which requires highly nonlinear optimization, and than the ‘‘efficient linear estimators’’ proposed earlier (Hannan and Kavalieris, 1984; Reinsel et al., 1992; Poskitt and Salau, 1995). It is a new, relatively simple three-step estimator based on a linear regression involving innovation estimates which take into account the truncation error of the first-stage long autoregression. The complex dynamic structure of the associated residuals is then exploited to derive an efficient covariance matrix estimator of the VARMA innovations, which is of order T^{-1} more accurate than the one obtained at the fourth stage of Hannan and Kavalieris' procedure. Finally, finite-sample simulation evidence shows that, overall, the suggested asymptotically efficient estimator outperforms its competitors in terms of bias and mean squared error (MSE) for the models studied.
1. Introduction

Vector autoregressive (VAR) modeling has received considerable attention, especially in time series econometrics; see Lütkepohl (2001, 2005), Hamilton (1994) and Dhrymes (1998). This popularity is due to the fact that such models are easy to estimate and can account for relatively complex dynamic phenomena. However, besides often requiring a very large number of parameters to produce a good fit, the VAR specification is not invariant to many basic linear transformations. For instance, subvectors of a VAR follow VARMA models. Temporal and contemporaneous aggregations lead to mixed VARMA processes (Lütkepohl, 1987). Also, trend and seasonal adjustments lead to models outside the VAR class (Maravall, 1993). The VARMA structure includes VAR models as a special case and can reproduce in a more parsimonious way a broader class of autocovariances and data generating processes, which can improve estimation and forecasting; see Lütkepohl (2006) and Athanasopoulos and Vahid (2008).
VARMA modeling was proposed long ago (see Hillmer and Tiao, 1979, Tiao and Box, 1981, Reinsel, 1997 and Lütkepohl, 2005) but has been of little use in practice. Indeed, besides the potentially complex restrictions that must be fulfilled to achieve identifiability, the task is compounded by the multivariate nature of the data. Once an identifiable specification has been formulated, different estimation methods can be considered. The most studied is ML with strong Gaussian errors; see Hannan (1969a), Hannan et al. (1986), Mauricio (2002, 2006), and Gallego (2009), among others. However, maximizing the exact likelihood in stationary invertible VARMA models is computationally burdensome. Tiao and Box (1981) stressed that it is much easier to maximize a conditional likelihood, though numerical problems still occur in high-dimensional systems in the absence of suitable initial values. Recently, Metaxoglou and Smith (2007) studied the identification and ML estimation of VARMA models using EM algorithm-based state-space methods. Although this can yield improvements over earlier ML approaches, we note that recovering the echelon VARMA coefficient estimates from the state-space formulation may not necessarily lead to stationary and invertible models. Further, Gaussian ML estimation of VARMA models still requires potentially lengthy iterative optimization over a high-dimensional parameter space. Thus, in high-dimensional systems, nonlinear estimation procedures cannot compete with linear methods from the computational cost viewpoint, especially when simulation-based inference is required.

Recursive linear regression methods, initially proposed by Hannan and Rissanen (1982) for ARMA models, have been extended to the VARMA case; see Hannan and Kavalieris (1984), Reinsel et al. (1992) and Poskitt and Salau (1995). The approach consists in estimating, by OLS, the innovations of the VARMA process from a long autoregression; these estimates are then used as regressors to estimate the VARMA parameters. Finally, a linear regression on transformed regressors involving newly filtered residuals is performed to achieve efficiency. Note that this multistep linear estimation was initially introduced for model selection and for obtaining consistent estimates which can be used to initialize nonlinear methods, such as ML. The seminal paper by Hannan and Kavalieris (1984) proposed a four-step linear procedure for specifying and estimating stationary ARMAX systems. The first three steps focus on model specification and on providing initial estimates, using Toeplitz regressions based on the Levinson–Whittle algorithm. However, these estimates are substantially biased, especially when the ratio of the autoregression order to the sample size is too large (Hannan and Deistler, 1988). Finally, using a GLS regression, the fourth stage yields asymptotically efficient estimates. Reinsel et al. (1992) analyzed the ML estimation of VARMA models from a GLS viewpoint. Modulo some approximations allowing for the asymptotic equivalence between GLS and ML, they derived a linear regression with error terms following a moving average (MA) process. However, their analysis underscores the heavy computational burden of the method since it systematically requires the inversion of a high-dimensional weighting matrix. Inspired by Koreisha and Pukkila (1990), Poskitt and Salau (1995) investigated the relationship between the GLS and Gaussian estimation of echelon form VARMA models.
Although asymptotically equivalent to ML, their estimates are substantially biased in finite samples. In a simulation study comparing selected linear methods on the quality of the estimates and the accuracy of implied forecasts and impulse responses, Kascha (2012) highlighted the overall superiority of the fourth-stage linear estimation procedure of Hannan and Kavalieris (1984), while noting situations where the investigated methods do not perform very well.

To make VARMA modeling practical, one needs estimation methods that are simple, quick and easy to implement with standard software. This is all the more important because large-sample-approximation-based inference in high-dimensional dynamic models is unreliable, so that simulation-based procedures, such as bootstrap techniques, are rather recommended. However, such methods are impractical if computing the estimator is difficult or time consuming. In this paper, we study two linear estimators for stationary invertible echelon form VARMA models with known Kronecker indices. We focus on the echelon form since it often delivers a relatively more parsimonious parameterization (involving fewer free parameters) than equivalent identification schemes, such as the final equations form; see Lütkepohl (2005). Our setup easily adapts to cointegrated VARMA and VARMAX frameworks and to alternative identifying schemes. The first estimator is a GLS version of the two-step OLS estimator studied in Dufour and Jouini (2005), using a more general setup. The second is a new, relatively simple three-step linear estimator which is asymptotically equivalent to ML. Unlike its predecessors, it relies on using, among the regressors, filtered residuals which take into account the truncation error of the first-stage long autoregression, based on a newly proposed recursive scheme with consistent initial values. It can also be interpreted as a one-step estimator obtained by the scoring method, starting from a √T-consistent two-step linear estimator. The proposed estimator is computationally much simpler and more practical than the ML estimator and earlier asymptotically efficient ‘‘linear’’ estimators, namely those suggested by Hannan and Kavalieris (1984), Reinsel et al. (1992), and Poskitt and Salau (1995). As such, both of the estimators studied provide a handy basis for applying resampling inference methods (e.g., bootstrapping).

We show that both estimators are consistent and asymptotically normal with strong innovations. Besides being computationally simpler, our efficient estimator comes with a distributional theory giving explicit formulae for its asymptotic covariance matrix, which is relatively simple and easy to estimate for inference purposes. Also, exploiting the complex dynamic structure of the third-stage regression residuals, we derive an efficient covariance estimator of the VARMA innovations, which is of order T^{-1} more accurate than the one obtained at the fourth stage of Hannan and Kavalieris (1984). Finally, finite-sample simulation evidence shows that two versions of our fully efficient estimator outperform the multistep linear estimators studied.

The paper proceeds as follows. Section 2 presents the echelon form VARMA setup. Section 3 derives the two-step GLS estimator and gives its properties such as convergence and asymptotic normality. Section 4 provides a heuristic derivation of the three-step GLS estimator and then states its convergence and asymptotic efficiency. Section 5 shows a comparative
simulation study of the finite-sample performance of the competing procedures. Finally, Section 6 concludes. Proofs are given in the Appendix.

2. Framework

Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional random process with the echelon-form VARMA representation
$$\Phi(L)\,y_t = \mu_\Phi + \Theta(L)\,u_t, \qquad (2.1)$$

where $\Phi(L) = \Phi_0 - \sum_{i=1}^{\bar p} \Phi_i L^i$, $\Theta(L) = \Theta_0 + \sum_{j=1}^{\bar p} \Theta_j L^j$ and $\bar p = \max(p_1, \ldots, p_k)$, given a vector of Kronecker indices $(p_1, \ldots, p_k)'$; $L$ denotes the lag operator, $\Theta_0 = \Phi_0$, with $\Phi_0$ a lower-triangular matrix whose diagonal elements are all equal to one, $\mu_\Phi = \Phi(1)\,\mu_y$, with $\mu_y = \mathrm{E}(y_t)$, and $\{u_t : t \in \mathbb{Z}\}$ is a sequence of multivariate innovations. The echelon VARMA operators $\Phi(L) = [\phi_{lm}(L)]_{l,m=1,\ldots,k}$ and $\Theta(L) = [\theta_{lm}(L)]_{l,m=1,\ldots,k}$ are left coprime and satisfy a set of restrictions such that, on any given row $l$ of $\Phi(L)$ and $\Theta(L)$, $\phi_{lm}(L)$ and $\theta_{lm}(L)$ have the same degree $p_l$, with

$$\phi_{lm}(L) = 1 - \sum_{i=1}^{p_l} \phi_{ll,i} L^i \;\;\text{if } l = m, \qquad \phi_{lm}(L) = -\sum_{i=p_l - p_{lm} + 1}^{p_l} \phi_{lm,i} L^i \;\;\text{if } l \neq m, \qquad (2.2)$$

$$\theta_{lm}(L) = \sum_{j=0}^{p_l} \theta_{lm,j} L^j \quad \text{with } \Theta_0 = \Phi_0, \qquad (2.3)$$

$$p_{lm} = \begin{cases} \min(p_l + 1,\, p_m) & \text{for } l \geq m, \\ \min(p_l,\, p_m) & \text{for } l < m, \end{cases} \qquad (2.4)$$

for $l, m = 1, \ldots, k$. Note that $p_l = p_{ll}$ is the number of freely varying coefficients in the $l$-th diagonal element of $\Phi(L)$ as well as the order of the polynomials in the corresponding row of $\Theta(L)$, while $p_{lm}$ specifies the number of free coefficients in the operator $\phi_{lm}(L)$ for $l \neq m$. $\sum_{l=1}^{k} p_l$ is the McMillan degree and $P = [p_{lm}]_{l,m=1,\ldots,k}$ is the matrix formed by the Kronecker indices. This leads to $\sum_{l=1}^{k}\sum_{m=1}^{k} p_{lm}$ autoregressive (AR) and $k\sum_{l=1}^{k} p_l$ MA free coefficients, respectively. For proofs of the uniqueness of the echelon form and other identification conditions, one should consult Hannan (1969b, 1970, 1976, 1979), Deistler and Hannan (1981), Hannan and Deistler (1988), and Lütkepohl (2005, Chapter 12).

The process (2.1) is said to be stationary and invertible with respective pure infinite-order AR and MA representations:
$$\Pi(L)\,y_t = \mu_\Pi + u_t \quad \text{and} \quad y_t = \mu_y + \Psi(L)\,u_t, \qquad (2.5)$$

where $\Pi(L) = \Theta(L)^{-1}\Phi(L) = I_k - \sum_{\tau=1}^{\infty} \Pi_\tau L^\tau$, $\Psi(L) = \Phi(L)^{-1}\Theta(L) = I_k + \sum_{v=1}^{\infty} \Psi_v L^v$ and $\mu_\Pi = \Pi(1)\,\mu_y$, if respectively $\det\{\Phi(z)\} \neq 0$ and $\det\{\Theta(z)\} \neq 0$ for all $|z| \leq 1$ ($z \in \mathbb{C}$), with $\det\{\Pi(z)\} \neq 0$ and $\det\{\Psi(z)\} \neq 0$ for all $|z| \leq 1$. Further, real constants $C > 0$ and $\rho \in (0,1)$ exist such that

$$\|\Pi_\tau\| \leq C\rho^{\tau} \quad \text{and} \quad \|\Psi_v\| \leq C\rho^{v}, \qquad (2.6)$$

where $\|\cdot\|$ is Schur's norm, i.e. $\|A\|^2 = \mathrm{tr}[A'A]$ for any matrix $A$. Also, let $\sum_{\tau=0}^{\infty} \Lambda_\tau(\eta)\, z^\tau = \Theta(z)^{-1}$. Then, by invertibility, $\|\Lambda_\tau(\eta)\| \leq C\rho^{\tau}$, where $\eta$ is the vector of all freely varying parameters implied by the echelon form, as shall be specified below. Now, set $v_t = y_t - u_t$. Then the latter is uncorrelated with the error term $u_t$ since

$$v_t = \Phi_0^{-1}\Big[\mu_\Phi + \sum_{i=1}^{\bar p} \Phi_i\, y_{t-i} + \sum_{j=1}^{\bar p} \Theta_j\, u_{t-j}\Big]. \qquad (2.7)$$
Also, let $X_t = \big[1, v_t', y_{t-1}', \ldots, y_{t-\bar p}', u_{t-1}', \ldots, u_{t-\bar p}'\big]'$ and $\beta = \mathrm{vec}\big[\mu_\Phi, \bar\Phi_0, \Phi_1, \ldots, \Phi_{\bar p}, \Theta_1, \ldots, \Theta_{\bar p}\big]$, where $\bar\Phi_0 = I_k - \Phi_0$, be two vectors of respective sizes $kh + 1$ and $k^2 h + k$, with $h = 2\bar p + 1$. Then the echelon restrictions (2.1)–(2.4) imply a unique $(k^2 h + k) \times r$ full-rank matrix $R$, formed by $r$ selected distinct columns of the identity matrix $I_{k^2 h + k}$, such that $R'R = I_r$ and $\beta = R\eta$, where $\eta$ is an $r$-dimensional vector of freely varying parameters with $r < k^2 h + k$, so that (2.1) takes the form

$$y_t = \big[X_t' \otimes I_k\big] R\,\eta + u_t, \qquad (2.8)$$

where $\big[X_t' \otimes I_k\big] R$ is a $k \times r$ matrix. Further, under the assumption that the process is regular (that is, with a nonsingular covariance matrix of the innovations in the Wold decomposition, so that the process is not linearly predictable and has a nonsingular instantaneous covariance matrix) with continuous distribution, the echelon form ensures that $R'\big[X_t \otimes I_k\big]$ has a nonsingular covariance matrix, so that $\mathrm{rank}\big(R'\big[\Gamma_X \otimes I_k\big]R\big) = r$, where $\Gamma_X = \mathrm{E}\big[X_t X_t'\big]$.
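To make the echelon-form bookkeeping in (2.2)–(2.4) concrete, the following minimal sketch (ours, in Python; it is not part of the original paper, and the function name is illustrative) computes the matrix $P = [p_{lm}]$ implied by a vector of Kronecker indices, together with the free-coefficient counts and the McMillan degree.

```python
import numpy as np

def echelon_orders(p):
    """Given Kronecker indices p = (p_1, ..., p_k), return the matrix
    P = [p_lm] of free-coefficient counts implied by (2.4), the number of
    free AR and MA coefficients, and the McMillan degree."""
    p = np.asarray(p, dtype=int)
    k = p.size
    P = np.zeros((k, k), dtype=int)
    for l in range(k):
        for m in range(k):
            if l >= m:
                P[l, m] = min(p[l] + 1, p[m])   # l >= m case of (2.4)
            else:
                P[l, m] = min(p[l], p[m])       # l <  m case of (2.4)
    n_ar = int(P.sum())        # sum_{l,m} p_lm free AR coefficients
    n_ma = int(k * p.sum())    # k * sum_l p_l free MA coefficients
    mcmillan = int(p.sum())    # McMillan degree
    return P, n_ar, n_ma, mcmillan

# Example: the Kronecker indices (1, 2) used in the simulation study.
P, n_ar, n_ma, deg = echelon_orders([1, 2])
print(P)                 # [[1 1]
                         #  [1 2]]
print(n_ar, n_ma, deg)   # 5 6 3
```

The counts 5 and 6 match the numbers of free AR and MA coefficients reported for this model in Table 1.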
3. Generalized two-step linear estimation

Let $y_{-n_T+1}, \ldots, y_T$ be a random sample of size $n_T + T$, where $n_T$ is a sequence, function of $T$, such that $n_T \to \infty$ as $T \to \infty$. Further, consider the infinite-order autoregression (2.5) ‘‘truncated’’ at the lag order $n_T$, precisely

$$y_t = \mu_\Pi(n_T) + \sum_{\tau=1}^{n_T} \Pi_\tau\, y_{t-\tau} + u_t(n_T), \qquad (3.1)$$

where $\mu_\Pi(n_T)$ and $u_t(n_T)$ stand respectively for a constant term and a compound innovation, such that $\mu_\Pi(n_T) = \big[I_k - \sum_{\tau=1}^{n_T} \Pi_\tau\big]\mu_y$ and $u_t(n_T) = \sum_{\tau=n_T+1}^{\infty} \Pi_\tau\big[y_{t-\tau} - \mu_y\big] + u_t$. The following assumptions on the VARMA innovations $u_t$ and the truncation order $n_T$ of the long autoregression are needed to establish the consistency and asymptotic distribution of the linear estimators studied below.
Assumption 3.1. The vectors $u_t$, $t \in \mathbb{Z}$, are independent and identically distributed (i.i.d.) with mean zero, positive definite (p.d.) covariance matrix $\Sigma_u = \mathrm{E}(u_t u_t')$ and continuous distribution.

Assumption 3.2. There is a finite constant $m_4$ such that $\mathrm{E}\,|u_{i,t} u_{j,t} u_{r,t} u_{s,t}| \leq m_4 < \infty$, for all $t$ and all $1 \leq i, j, r, s \leq k$.

Assumption 3.3. $n_T$ is a function of $T$ such that $n_T \to \infty$ and $n_T^2/T \to 0$ as $T \to \infty$, and, for some $c > 0$, $0 < \delta_1 < 1/2$ and $T$ sufficiently large, $n_T \geq cT^{\delta_1}$.

Assumption 3.4. $n_T$ is a function of $T$ such that $n_T \to \infty$ and $n_T^4/T \to 0$ as $T \to \infty$, and, for some $c > 0$, $0 < \delta_2 < 1/4$ and $T$ sufficiently large, $n_T \geq cT^{\delta_2}$.

Assumption 3.1 entails a strong VARMA process, while Assumption 3.2 ensures that the empirical autocovariances of the process have finite variances. Assumptions 3.3 and 3.4 state alternative conditions on the truncation lag $n_T$ of the first-stage long autoregression, which are required to ensure convergence and asymptotic normality of the suggested estimators. These assumptions state that $n_T$ should grow towards infinity neither too fast nor too slowly. Further, by invertibility, $\|\Pi_\tau\|$ decays at an exponential rate (see (2.6)). Hence, for some $\bar\delta > 0$, whenever $n_T = cT^{\delta}$ for some $c > 0$ and $\delta > 0$,

$$T^{\bar\delta} \sum_{\tau=n_T+1}^{\infty} \|\Pi_\tau\| \to 0 \quad \text{as } T \to \infty. \qquad (3.2)$$
Let $\tilde\Pi(n_T) = \big[\tilde\mu_\Pi(n_T), \tilde\Pi_1(n_T), \ldots, \tilde\Pi_{n_T}(n_T)\big] = \tilde W_Y(n_T)\,\tilde\Gamma_Y(n_T)^{-1}$ be the OLS estimator of the coefficient matrix $\Pi(n_T) = \big[\mu_\Pi(n_T), \Pi_1, \ldots, \Pi_{n_T}\big]$, where $\tilde W_Y(n_T) = T^{-1}\sum_{t=1}^{T} y_t\, Y_t(n_T)'$ and $\tilde\Gamma_Y(n_T) = T^{-1}\sum_{t=1}^{T} Y_t(n_T)\, Y_t(n_T)'$, with $Y_t(n_T) = \big[1, y_{t-1}', \ldots, y_{t-n_T}'\big]'$. Further, let

$$\tilde u_t(n_T) = y_t - \tilde\mu_\Pi(n_T) - \sum_{\tau=1}^{n_T} \tilde\Pi_\tau(n_T)\, y_{t-\tau}, \qquad t = 1, \ldots, T, \qquad (3.3)$$

be the OLS residuals of the long autoregression (3.1), and set $\tilde\Sigma_u(n_T) = T^{-1}\sum_{t=1}^{T} \tilde u_t(n_T)\,\tilde u_t(n_T)'$. Then, under Assumptions 3.1–3.3 and (3.2), Dufour and Jouini (2010) showed that $\big(T^{1/2}/n_T\big)\big[\tilde u_t(n_T) - u_t\big]$ is stochastically bounded, uniformly in $t = 1, \ldots, T$, that is,

$$\tilde u_t(n_T) - u_t = O_p\big(n_T/T^{1/2}\big), \quad \text{uniformly in } t = 1, \ldots, T. \qquad (3.4)$$

Then

$$\big\|\tilde\Sigma_u(n_T) - \Sigma_u\big\|,\; \big\|\tilde\Sigma_u(n_T)^{-1} - \Sigma_u^{-1}\big\| = O_p\big(n_T/T^{1/2}\big). \qquad (3.5)$$
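As an illustration of the first stage, here is a minimal sketch (ours, in Python with NumPy; not the authors' code) of the long autoregression (3.1): OLS of $y_t$ on a constant and $n_T$ lags, returning the residuals $\tilde u_t(n_T)$ of (3.3) and the estimator $\tilde\Sigma_u(n_T)$.

```python
import numpy as np

def long_var_residuals(y, n_T):
    """First-stage long autoregression (3.1): OLS of y_t on a constant and
    y_{t-1}, ..., y_{t-n_T}; returns the residuals (T x k), eq. (3.3), and
    Sigma_tilde_u(n_T). y is an (n_T + T) x k array, the first n_T rows
    serving as presample values."""
    y = np.asarray(y, dtype=float)
    T = y.shape[0] - n_T
    k = y.shape[1]
    # Regressor matrix: Y_t(n_T) = [1, y_{t-1}', ..., y_{t-n_T}']'
    Y = np.ones((T, 1 + k * n_T))
    for tau in range(1, n_T + 1):
        Y[:, 1 + k*(tau-1): 1 + k*tau] = y[n_T - tau: n_T - tau + T, :]
    y_dep = y[n_T:, :]
    # OLS, equation by equation (same regressors for every equation)
    coef, *_ = np.linalg.lstsq(Y, y_dep, rcond=None)
    u_tilde = y_dep - Y @ coef
    Sigma_u = u_tilde.T @ u_tilde / T
    return u_tilde, Sigma_u
```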
The asymptotic equivalence stated above suggests that we may be able to estimate consistently the VARMA parameters in (2.8) by replacing the unobserved innovations in $X_t$ with their respective first-stage estimates. Thus, modulo some manipulations, (2.8) can equivalently be rewritten as

$$y_t = \big[\tilde X_t(n_T)' \otimes I_k\big] R\,\eta + e_t(n_T), \qquad (3.6)$$

where $\tilde X_t(n_T) = \big[1, \tilde v_t(n_T)', y_{t-1}', \ldots, y_{t-\bar p}', \tilde u_{t-1}(n_T)', \ldots, \tilde u_{t-\bar p}(n_T)'\big]'$, $\tilde v_t(n_T) = y_t - \tilde u_t(n_T)$ and

$$e_t(n_T) = \tilde u_t(n_T) + \sum_{j=0}^{\bar p} \Theta_j\big[u_{t-j} - \tilde u_{t-j}(n_T)\big]. \qquad (3.7)$$
Noting that $e_t(n_T) - \tilde u_t(n_T) = O_p\big(n_T/T^{1/2}\big)$, in view of (3.7) and (3.4), an explicit two-step (feasible) GLS estimator of $\eta$ is simply

$$\tilde\eta = \arg\min_{\eta} \sum_{t=1}^{T} e_t(n_T)'\,\tilde\Sigma_u(n_T)^{-1}\, e_t(n_T) = \tilde Q_X(n_T)\,\tilde W_X(n_T), \qquad (3.8)$$

where $\tilde Q_X(n_T) = \big[R'\big(\tilde\Gamma_X(n_T) \otimes \tilde\Sigma_u(n_T)^{-1}\big)R\big]^{-1}$, with $\tilde\Gamma_X(n_T) = T^{-1}\sum_{t=1}^{T} \tilde X_t(n_T)\,\tilde X_t(n_T)'$, and $\tilde W_X(n_T) = T^{-1}\sum_{t=1}^{T} R'\big[\tilde X_t(n_T) \otimes \tilde\Sigma_u(n_T)^{-1}\big] y_t$. In addition, let $Q_X = \big[R'\big(\Gamma_X \otimes \Sigma_u^{-1}\big)R\big]^{-1}$. Then, under suitable conditions, Assumptions 3.1–3.4, and (3.2), Dufour and Jouini (2010) have shown that

$$\|\tilde\eta - \eta\| = O_p\big(T^{-1/2}\big) \qquad (3.9)$$

and

$$T^{1/2}\big(\tilde\eta - \eta\big) \xrightarrow[T\to\infty]{d} N\big(0,\, Q_X\big). \qquad (3.10)$$

Further, they suggested $\tilde Q_X(n_T)$ as a consistent estimator of $Q_X$.

Now, let $\tilde\Sigma_e(n_T) = (T - \bar p)^{-1}\sum_{t=\bar p+1}^{T} \tilde e_t(n_T)\,\tilde e_t(n_T)'$, where

$$\tilde e_t(n_T) = y_t - \big[\tilde X_t(n_T)' \otimes I_k\big] R\,\tilde\eta, \qquad t = \bar p + 1, \ldots, T. \qquad (3.11)$$

Then, using (2.8), (3.4), (3.9) and (3.11), Dufour and Jouini (2010) showed that

$$\tilde e_t(n_T) - u_t = O_p\big(n_T/T^{1/2}\big), \quad \text{uniformly in } t = \bar p + 1, \ldots, T. \qquad (3.12)$$

Hence

$$\big\|\tilde\Sigma_e(n_T) - \Sigma_u\big\|,\; \big\|\tilde\Sigma_e(n_T)^{-1} - \Sigma_u^{-1}\big\| = O_p\big(n_T/T^{1/2}\big). \qquad (3.13)$$
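A minimal sketch of the two-step GLS estimator (3.8), assuming the first-stage residuals, the selection matrix R and $\tilde\Sigma_u(n_T)$ are already available (ours, in Python; the ordering of $\tilde X_t(n_T)$ is assumed to match that of R, and for simplicity the sums start at $t = \bar p + 1$ so that no presample values are needed):

```python
import numpy as np

def two_step_gls(y, u_tilde, R, p_bar, Sigma_u):
    """Two-step GLS estimator (3.8): GLS regression of y_t on
    [X_tilde_t(n_T)' kron I_k] R with weight Sigma_u^{-1}.
    y, u_tilde: T x k arrays aligned on t = 1, ..., T (u_tilde are the
    first-stage residuals); R: (k^2 h + k) x r echelon selection matrix."""
    T, k = y.shape
    v_tilde = y - u_tilde
    Su_inv = np.linalg.inv(Sigma_u)
    r = R.shape[1]
    A = np.zeros((r, r))
    b = np.zeros(r)
    for t in range(p_bar, T):             # t = p_bar + 1, ..., T (0-based)
        x = np.concatenate(
            [[1.0], v_tilde[t]]
            + [y[t - i] for i in range(1, p_bar + 1)]
            + [u_tilde[t - j] for j in range(1, p_bar + 1)]
        )                                  # X_tilde_t(n_T), length k*h + 1
        Zt = np.kron(x[None, :], np.eye(k)) @ R   # k x r regressor matrix
        A += Zt.T @ Su_inv @ Zt
        b += Zt.T @ Su_inv @ y[t]
    return np.linalg.solve(A, b)           # eta_tilde
```

The normal equations solved here are algebraically the same as (3.8), since $R'(\tilde\Gamma_X(n_T) \otimes \tilde\Sigma_u(n_T)^{-1})R = T^{-1}\sum_t Z_t'\,\tilde\Sigma_u(n_T)^{-1} Z_t$ with $Z_t = [\tilde X_t(n_T)' \otimes I_k]R$.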
4. Asymptotic efficiency

The two-step linear estimator described above is not efficient. To achieve efficiency, a further linear regression is needed. As will be shown below, this is done by exploiting the nonlinear structure of the VARMA innovations as functions of the model parameters. Unlike the procedure of Hannan and Kavalieris (1984), which is heavy to implement even in small systems and whose fourth-stage (efficient) estimator does not explicitly exhibit the echelon-form restrictions, we obtain a simple and compact efficient estimator together with a simple estimator of its covariance matrix. However, a brief description of its competitors is required first.

4.1. Competing procedures

Using our setup, we stress that running OLS on (3.6) corresponds to the third-stage and the second-stage estimation procedures of Hannan and Kavalieris (1984) and Reinsel et al. (1992), respectively. Denote by $\tilde\eta$ the resulting estimators and let $\tilde\mu_\Phi$, $\tilde\Phi_i$ and $\tilde\Theta_j$ be the implied two-step OLS estimates of $\mu_\Phi$, $\Phi_i$ and $\Theta_j$, respectively. Further, designate by $\tilde u_t$ the implied ‘‘implicit’’ VARMA innovation estimates or residuals such that
$$\tilde\Phi(L)\, y_t = \tilde\mu_\Phi + \tilde\Theta(L)\,\tilde u_t, \qquad (4.1)$$

where $\tilde\Phi(L) = \tilde\Phi_0 - \sum_{i=1}^{\bar p} \tilde\Phi_i L^i$ and $\tilde\Theta(L) = \sum_{j=0}^{\bar p} \tilde\Theta_j L^j$, with $\tilde\Phi_0 = \tilde\Theta_0$. Solving for $\tilde u_t$, one gets

$$\tilde u_t = \sum_{\tau=0}^{\infty} \Lambda_\tau(\tilde\eta)\Big[\tilde\Phi_0\, y_{t-\tau} - \sum_{i=1}^{\bar p} \tilde\Phi_i\, y_{t-i-\tau} - \tilde\mu_\Phi\Big], \qquad (4.2)$$

where $\sum_{\tau=0}^{\infty} \Lambda_\tau(\tilde\eta) L^\tau = \tilde\Theta(L)^{-1}$. As suggested in the literature (see Hannan and Kavalieris, 1984 and Reinsel et al., 1992), these implicit residuals $\tilde u_t$ are approximated (or filtered) with

$$\varepsilon_t(\tilde\eta) = \sum_{\tau=0}^{t+n_T-1} \Lambda_\tau(\tilde\eta)\Big[\tilde\Phi_0\, y_{t-\tau} - \sum_{i=1}^{\bar p} \tilde\Phi_i\, y_{t-i-\tau} - \tilde\mu_\Phi\Big], \qquad t = -n_T+1, \ldots, T. \qquad (4.3)$$
Hannan–Kavalieris (HK) procedure. Let $V_t(\tilde\eta) = \big[1, y_t' - \varepsilon_t(\tilde\eta)', y_{t-1}', \ldots, y_{t-\bar p}', \varepsilon_{t-1}(\tilde\eta)', \ldots, \varepsilon_{t-\bar p}(\tilde\eta)'\big]'$ be the regressor vector based on the two-step OLS residuals $\varepsilon_t(\tilde\eta)$ defined above. Also, set $W_t(\tilde\eta) = \sum_{\tau=0}^{t+n_T-1} R'\big[V_{t-\tau}(\tilde\eta) \otimes \Lambda_\tau(\tilde\eta)'\big]$. Then, the efficient estimator of Hannan and Kavalieris (1984) for $\eta$ is

$$\hat\eta_{HK} = \tilde\eta + \Big[\sum_{t=-n_T+1}^{T} W_t(\tilde\eta)\,\tilde\Sigma_e(n_T)^{-1}\, W_t(\tilde\eta)'\Big]^{-1} \sum_{t=-n_T+1}^{T} W_t(\tilde\eta)\,\tilde\Sigma_e(n_T)^{-1}\,\varepsilon_t(\tilde\eta), \qquad (4.4)$$

where $\tilde\eta$ and $\tilde\Sigma_e(n_T)$ are the respective OLS estimators of $\eta$ and $\Sigma_u$ obtained from model (3.6). These authors have then proposed $\tilde\Sigma_\varrho(\tilde\eta, \hat\eta_{HK}) = (n_T + T - \bar p)^{-1}\sum_{t=-n_T+1+\bar p}^{T} \varrho_t(\tilde\eta, \hat\eta_{HK})\,\varrho_t(\tilde\eta, \hat\eta_{HK})'$, where $\varrho_t(\tilde\eta, \hat\eta_{HK}) = \varepsilon_t(\tilde\eta) - W_t(\tilde\eta)'\big(\hat\eta_{HK} - \tilde\eta\big)$, as the fourth-stage estimator of $\Sigma_u$.

Reinsel–Basu–Yap (RBY) procedure. Manipulating (2.1), the GLS estimator of Reinsel et al. (1992) is obtained from the linear regression

$$y_t(\tilde\eta) = \big[V_t(\tilde\eta)' \otimes I_k\big] R\,\eta + \sum_{j=0}^{\bar p} \tilde\Theta_j\, u_{t-j} + D_t(\tilde\eta, \eta), \qquad t = -n_T+1, \ldots, T, \qquad (4.5)$$

where $y_t(\tilde\eta) = y_t - \varepsilon_t(\tilde\eta) + \sum_{j=0}^{\bar p} \tilde\Theta_j\,\varepsilon_{t-j}(\tilde\eta)$ and $D_t(\tilde\eta, \eta) = \sum_{j=0}^{\bar p} \big(\Theta_j - \tilde\Theta_j\big)\big[\varepsilon_{t-j}(\tilde\eta) - u_{t-j}\big]$. Dropping the compound term $D_t(\tilde\eta, \eta)$ (considered as being negligible) from model (4.5), then setting $y(\tilde\eta) = \big[y_{-n_T+1}(\tilde\eta)', \ldots, y_T(\tilde\eta)'\big]'$, $V(\tilde\eta) = \big[V_{-n_T+1}(\tilde\eta), \ldots, V_T(\tilde\eta)\big]$ and $\tilde\Theta = \sum_{j=0}^{\bar p} L_j \otimes \tilde\Theta_j$, where $L_j$ stands for an $(n_T + T) \times (n_T + T)$ lag matrix which has ones on the $j$th diagonal below the main diagonal and zeros elsewhere ($L_0$ reduces to the identity matrix), we get the stacked-form model

$$y(\tilde\eta) = \big[V(\tilde\eta)' \otimes I_k\big] R\,\eta + \tilde\Theta\, u, \qquad (4.6)$$

where $u = \big[u_{-n_T+1}', \ldots, u_T'\big]'$, with $\tilde\Theta\, u$ having covariance matrix estimator $\tilde\Xi_{\varepsilon(\tilde\eta)} = \tilde\Theta\big[I_{n_T+T} \otimes \tilde\Sigma_{\varepsilon(\tilde\eta)}\big]\tilde\Theta'$, where $\tilde\Sigma_{\varepsilon(\tilde\eta)} = (n_T + T)^{-1}\sum_{t=-n_T+1}^{T} \varepsilon_t(\tilde\eta)\,\varepsilon_t(\tilde\eta)'$ and $\tilde\Theta$ is a $k(n_T+T) \times k(n_T+T)$ matrix based on the two-step OLS estimates. Therefore, the GLS estimator of Reinsel et al. (1992) is

$$\hat\eta_{RBY} = \Big[R'\big(V(\tilde\eta) \otimes I_k\big)\,\tilde\Xi_{\varepsilon(\tilde\eta)}^{-1}\,\big(V(\tilde\eta)' \otimes I_k\big) R\Big]^{-1} R'\big(V(\tilde\eta) \otimes I_k\big)\,\tilde\Xi_{\varepsilon(\tilde\eta)}^{-1}\, y(\tilde\eta), \qquad (4.7)$$

thus requiring the burdensome task, even in small samples, of inverting the $k(n_T+T) \times k(n_T+T)$ high-dimensional matrix $\tilde\Xi_{\varepsilon(\tilde\eta)}$. An improved version of this estimator is obtained by deleting the first $k\bar p$ components of $y(\tilde\eta)$ and $\bar p$ columns of $V(\tilde\eta)$ and only retaining the $k(n_T+T-\bar p) \times k(n_T+T-\bar p)$ lower right corner block of $\tilde\Xi_{\varepsilon(\tilde\eta)}$, but it still requires the systematic inversion of a large matrix.

Poskitt–Salau (PS) procedure. The second-stage estimation procedure of Poskitt and Salau (1995) consists in running OLS on a variant of (3.6), precisely
$$\tilde v_t(n_T) = \big[\tilde X_t(n_T)' \otimes I_k\big] R\,\eta + \zeta_t, \qquad (4.8)$$

where $\zeta_t = \sum_{j=0}^{\bar p} \Theta_j\,\xi_{t-j}$, with $\xi_t = u_t - \tilde u_t(n_T)$. Further, set $\tilde v(n_T) = \big[\tilde v_1(n_T)', \ldots, \tilde v_T(n_T)'\big]'$, $\tilde X(n_T) = \big[\tilde X_1(n_T), \ldots, \tilde X_T(n_T)\big]$ and $\zeta = \big[\zeta_1', \ldots, \zeta_T'\big]'$, where $\zeta = \Theta\,\xi$ and $\xi = \big[\xi_1', \ldots, \xi_T'\big]'$. Then, the efficient GLS estimator of Poskitt and Salau (1995) is

$$\hat\eta_{PS} = \Big[R'\big(\tilde X(n_T) \otimes I_k\big)\,\tilde\Xi_u(n_T)^{-1}\,\big(\tilde X(n_T)' \otimes I_k\big) R\Big]^{-1} R'\big(\tilde X(n_T) \otimes I_k\big)\,\tilde\Xi_u(n_T)^{-1}\,\tilde v(n_T), \qquad (4.9)$$

where, again, one has to invert a $kT \times kT$ high-dimensional matrix, $\tilde\Xi_u(n_T) = \tilde\Theta\big[I_T \otimes \tilde\Sigma_u(n_T)\big]\tilde\Theta'$ (estimating the covariance matrix of $\zeta$), with $\tilde\Theta$ now corresponding to the OLS moving-average parameter estimates from model (4.8). An improved version of $\hat\eta_{PS}$ is obtained in a similar way to $\hat\eta_{RBY}$.

4.2. Our procedure

Having shown how practical and flexible our setup is in adapting to alternative procedures, we now derive our efficient linear estimator. In view of (3.7), the two-step feasible GLS (possibly two-step OLS) residuals (3.11) are such that

$$\tilde e_t(n_T) = \tilde u_t(n_T) + \sum_{j=0}^{\bar p} \tilde\Theta_j\big[\tilde u_{t-j} - \tilde u_{t-j}(n_T)\big], \qquad (4.10)$$
where, similarly, $\tilde u_t$ are the implicit VARMA residuals or estimates of $u_t$ matching the two-step GLS (possibly OLS) estimator $\tilde\eta$, since (4.10) can be expressed as (4.1). Indeed, because the error terms $e_t(n_T)$ in (3.7) are functions of the actual innovations $u_t$, it follows that by estimating $e_t(n_T)$ one implicitly and simultaneously estimates $u_t$. More importantly, (4.10) reveals that these implicit estimates $\tilde u_t$ are endogenous functions not only of the two-step GLS moving average coefficient estimates $\tilde\Theta_j$ and the resulting residuals $\tilde e_t(n_T)$, but also of the first-stage OLS autoregressive residuals $\tilde u_t(n_T)$. Hence, using the fact that $\tilde\Theta(L)^{-1} = \sum_{\tau=0}^{\infty} \Lambda_\tau(\tilde\eta) L^\tau$, one sees that

$$\tilde u_t = \tilde u_t(n_T) + \sum_{\tau=0}^{\infty} \Lambda_\tau(\tilde\eta)\big[\tilde e_{t-\tau}(n_T) - \tilde u_{t-\tau}(n_T)\big]. \qquad (4.11)$$

This paper proposes a new recursive filtering scheme for approximating these implicit residuals with

$$u_t(\tilde\eta) = \tilde u_t(n_T) + \sum_{\tau=0}^{t-1} \Lambda_\tau(\tilde\eta)\big[\tilde e_{t-\tau}(n_T) - \tilde u_{t-\tau}(n_T)\big], \qquad t = 1, \ldots, T, \qquad (4.12)$$

initializing with $\tilde e_t(n_T) = \tilde u_t(n_T)$ [hence $u_t(\tilde\eta) = \tilde u_t(n_T)$] for $1 \leq t \leq \bar p$. Precisely, our scheme describes the pointwise adjustment mechanism through which the approximate (or filtered) implicit VARMA residuals $u_t(\tilde\eta)$ are recursively computed around $\tilde u_t(n_T)$.

Corollary 4.1. Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional stationary invertible stochastic process with the VARMA representation in echelon form given by (2.1)–(2.4). Let also $\tilde u_t$ be the implicit VARMA innovation estimates matching the two-step estimator $\tilde\eta$, as equivalently defined in (4.2) or (4.11) but respectively approximated with (4.3) and (4.12). Then, under Assumptions 3.1–3.4,

$$\tilde u_t - \varepsilon_t(\tilde\eta) = O_p\big(\rho^{\,t+n_T}\big) \quad \text{and} \quad \tilde u_t - u_t(\tilde\eta) = O_p\big(\rho^{\,t}\, n_T/T^{1/2}\big). \qquad (4.13)$$
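The recursive filter (4.12) can be implemented along the following lines (a sketch of ours, in Python; not the authors' code): the coefficients $\Lambda_\tau(\tilde\eta)$ of $\tilde\Theta(L)^{-1}$ are obtained by the usual power-series recursion, and the first-stage residuals are then adjusted using the second-stage residuals.

```python
import numpy as np

def lambda_weights(Theta, n_terms):
    """Power-series coefficients Lambda_tau of Theta(L)^{-1}, where
    Theta(L) = Theta[0] + Theta[1] L + ... + Theta[p_bar] L^{p_bar}.
    Theta is a list of k x k matrices; returns n_terms matrices."""
    k = Theta[0].shape[0]
    Theta0_inv = np.linalg.inv(Theta[0])
    Lam = [Theta0_inv]
    p_bar = len(Theta) - 1
    for tau in range(1, n_terms):
        acc = np.zeros((k, k))
        for j in range(1, min(tau, p_bar) + 1):
            acc += Theta[j] @ Lam[tau - j]
        Lam.append(-Theta0_inv @ acc)
    return Lam

def filtered_residuals(u_ar, e_gls, Theta, p_bar):
    """Recursive filter (4.12): u_t(eta_tilde) = u_ar[t] +
    sum_{tau=0}^{t-1} Lambda_tau ( e_gls[t-tau] - u_ar[t-tau] ),
    with e_gls[t] set to u_ar[t] for t = 1, ..., p_bar (initial values).
    u_ar: first-stage residuals; e_gls: second-stage residuals (both T x k)."""
    T, k = u_ar.shape
    e = e_gls.copy()
    e[:p_bar] = u_ar[:p_bar]              # initialization, 1 <= t <= p_bar
    Lam = lambda_weights(Theta, T)
    diff = e - u_ar                        # e_t(n_T) - u_tilde_t(n_T)
    u_filt = np.empty_like(u_ar)
    for t in range(T):                     # 0-based index for t = 1, ..., T
        acc = np.zeros(k)
        for tau in range(t + 1):           # tau = 0, ..., t-1 in the paper
            acc += Lam[tau] @ diff[t - tau]
        u_filt[t] = u_ar[t] + acc
    return u_filt
```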
Obviously, the recursive schemes (4.3) and (4.12) yield approximations with different (pointwise) convergence speeds towards the implicit VARMA residuals $\tilde u_t$, regardless of the persistence degree of the process and of the estimation method (OLS or GLS) used for obtaining the two-step VARMA parameter estimates. However, while noting that we lose $n_T$ observations with our recursive scheme, we stress that this is compensated by the use of better initial values, namely the first-stage autoregressive residuals, which we know are consistent; see (3.4). Of course, the recursive schemes above are asymptotically equivalent only when the Kronecker indices are all equal, namely when GLS reduces to OLS. Similarly, it is worth emphasizing that the VARMA innovation $u_t$ can be expressed from (3.7) as

$$u_t = \tilde u_t(n_T) + \sum_{\tau=0}^{\infty} \Lambda_\tau(\eta)\big[e_{t-\tau}(n_T) - \tilde u_{t-\tau}(n_T)\big], \qquad (4.14)$$

and then be approximated with

$$u_t(\eta) = \tilde u_t(n_T) + \sum_{\tau=0}^{t-1} \Lambda_\tau(\eta)\big[e_{t-\tau}(n_T) - \tilde u_{t-\tau}(n_T)\big], \qquad t = 1, \ldots, T. \qquad (4.15)$$

Hence, $\|u_t - u_t(\eta)\| = O_p\big(\rho^{\,t}\, n_T/T^{1/2}\big)$, in view of (3.4). Also, let $\tilde\Sigma_u(\tilde\eta) = T^{-1}\sum_{t=1}^{T} u_t(\tilde\eta)\, u_t(\tilde\eta)'$. Then its rate of convergence to $\Sigma_u$ follows.

Proposition 4.1. Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional stationary invertible stochastic process with the VARMA representation in echelon form given by (2.1)–(2.4). Then, under Assumptions 3.1–3.4,

$$\big\|\tilde\Sigma_u(\tilde\eta) - \Sigma_u\big\|,\; \big\|\tilde\Sigma_u(\tilde\eta)^{-1} - \Sigma_u^{-1}\big\| = O_p\big(T^{-1/2}\big). \qquad (4.16)$$
Now, let $X_t(\tilde\eta) = \big[1, v_t(\tilde\eta)', y_{t-1}', \ldots, y_{t-\bar p}', u_{t-1}(\tilde\eta)', \ldots, u_{t-\bar p}(\tilde\eta)'\big]'$ with $v_t(\tilde\eta) = y_t - u_t(\tilde\eta)$. Further, set $Z_t^{\circ}(\tilde\eta, \eta) = \sum_{\tau=0}^{t-1} R'\big[X_{t-\tau}(\tilde\eta) \otimes \Lambda_\tau(\eta)'\big]$. Then, manipulating (4.15) and (4.12), one gets

$$u_t(\tilde\eta) - u_t(\eta) = -Z_t^{\circ}(\tilde\eta, \eta)'\big(\tilde\eta - \eta\big). \qquad (4.17)$$

The latter expression can further be rearranged to obtain the linear regression model

$$\omega_t(\tilde\eta) = Z_t(\tilde\eta)'\,\eta + \epsilon_t(\tilde\eta, \eta), \qquad (4.18)$$

where

$$\omega_t(\tilde\eta) = u_t(\tilde\eta) + Z_t(\tilde\eta)'\,\tilde\eta \quad \text{and} \quad \epsilon_t(\tilde\eta, \eta) = u_t(\eta) + \big[Z_t(\tilde\eta) - Z_t^{\circ}(\tilde\eta, \eta)\big]'\big(\tilde\eta - \eta\big), \qquad (4.19)$$
with $Z_t(\tilde\eta) = \sum_{\tau=0}^{t-1} R'\big[X_{t-\tau}(\tilde\eta) \otimes \Lambda_\tau(\tilde\eta)'\big]$. Note that (4.17) is an identity obtained by exploiting the nonlinear structure of the VARMA innovations in the model parameters, so it does not stand for a Taylor expansion. More importantly, the complex dynamic structure of the error terms $\epsilon_t(\tilde\eta, \eta)$ driving the process (4.18), missed by Hannan and Kavalieris (1984) in their fourth stage, is completely specified up to the unknown parameter vector $\eta$; see (4.19). Hence, once estimated, these errors provide a closed-form solution for accurately computing the approximate implicit VARMA residuals or innovation estimates matching the three-step efficient linear estimator that we shall define below. Such a result has not been established yet in the literature. In view of (4.17) and (4.19) (or (4.18) and (4.19)), one sees, by Lemma 2.2 of Kreiss and Franke (1992) and (3.9), that $\epsilon_t(\tilde\eta, \eta) - u_t(\tilde\eta) = O_p\big(T^{-1/2}\big)$, which suggests obtaining a third-stage GLS (fully efficient) linear estimator of $\eta$, say $\hat\eta$, such that

$$\hat\eta = \arg\min_{\eta} \sum_{t=1}^{T} \epsilon_t(\tilde\eta, \eta)'\,\tilde\Sigma_u(\tilde\eta)^{-1}\,\epsilon_t(\tilde\eta, \eta) = \tilde Q_X(\tilde\eta)\,\tilde W_X(\tilde\eta), \qquad (4.20)$$

where $\tilde Q_X(\tilde\eta) = \big[T^{-1}\sum_{t=1}^{T} Z_t(\tilde\eta)\,\tilde\Sigma_u(\tilde\eta)^{-1}\, Z_t(\tilde\eta)'\big]^{-1}$ and $\tilde W_X(\tilde\eta) = T^{-1}\sum_{t=1}^{T} Z_t(\tilde\eta)\,\tilde\Sigma_u(\tilde\eta)^{-1}\,\omega_t(\tilde\eta)$. Further, let $\tilde\Omega_X(\tilde\eta) = T^{-1}\sum_{t=1}^{T} Z_t(\tilde\eta)\,\tilde\Sigma_u(\tilde\eta)^{-1}\, u_t(\tilde\eta)$. Then, in view of $\omega_t(\tilde\eta)$ (see (4.19)),

$$\hat\eta = \tilde\eta + \tilde Q_X(\tilde\eta)\,\tilde\Omega_X(\tilde\eta). \qquad (4.21)$$
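Given the arrays $Z_t(\tilde\eta)$, $u_t(\tilde\eta)$ and $\tilde\Sigma_u(\tilde\eta)$, the one-step update (4.21) is immediate; the following sketch (ours, in Python; not the authors' code) makes the computation explicit.

```python
import numpy as np

def third_stage_update(eta_tilde, Z, u_filt, Sigma_u):
    """One-step efficient update (4.21): eta_hat = eta_tilde + Q_X * Omega_X,
    with Q_X = [T^{-1} sum_t Z_t Sigma_u^{-1} Z_t']^{-1} and
    Omega_X = T^{-1} sum_t Z_t Sigma_u^{-1} u_t(eta_tilde).
    Z: length-T list of r x k matrices Z_t(eta_tilde); u_filt: T x k array."""
    T = len(Z)
    r = Z[0].shape[0]
    Su_inv = np.linalg.inv(Sigma_u)
    Q_inv = np.zeros((r, r))
    Omega = np.zeros(r)
    for t in range(T):
        Q_inv += Z[t] @ Su_inv @ Z[t].T
        Omega += Z[t] @ Su_inv @ u_filt[t]
    return eta_tilde + np.linalg.solve(Q_inv, Omega)   # T^{-1} factors cancel
```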
Clearly, our third-stage GLS estimators differ from their competitors since alternative regressors and an alternative weighting matrix are used in their computation. Precisely, we exploit the explicit form of the second-stage regression residuals to derive a new recursive filtering scheme for approximating the implicit VARMA residuals matching the two-step estimator (see (4.12)). These well-behaved approximate residuals stand for ‘‘new regressors’’ which, unlike their predecessors (see (4.3)), depend on consistent (better) initial values, and explicitly take into account the truncation error of the first-stage autoregression along with some adjustments with respect to the second-stage regression residuals. Finally, it is noteworthy that $\hat\eta$ is asymptotically equivalent to ML under Gaussian errors since $\partial u_t(\eta)/\partial\eta'\big|_{\eta=\tilde\eta} = -Z_t(\tilde\eta)'$ (see (4.17)), and that it corresponds to an iteration of the scoring algorithm starting from $\tilde\eta$, in view of (4.21). Another feature characterizing the computation of our fully efficient estimators, compared to those of Hannan and Kavalieris (1984) and Poskitt and Salau (1995), with the exception of Reinsel et al. (1992), is the use of a weighting matrix exhibiting a faster rate of convergence, hence better sample properties; see (3.13) and (3.5) versus Proposition 4.1. However, we stress that, although the procedure of Reinsel et al. (1992) relies on a refined weighting matrix, it still uses filtered residuals from an alternative scheme.

Now, let $\tilde Q_X^{\circ}(\tilde\eta) = \big[T^{-1}\sum_{t=1}^{T} Z_t^{\circ}(\tilde\eta, \eta)\,\tilde\Sigma_u(\tilde\eta)^{-1}\, Z_t^{\circ}(\tilde\eta, \eta)'\big]^{-1}$ and $Q_X(\eta) = \big[\mathrm{E}\big(Z_t\,\Sigma_u^{-1}\, Z_t'\big)\big]^{-1}$, with $Z_t = \sum_{\tau=0}^{\infty} R'\big[X_{t-\tau} \otimes \Lambda_\tau(\eta)'\big]$. Also, denote by $\|A\|_1^2$ the largest eigenvalue of $A'A$, for any matrix $A$.

Proposition 4.2. Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional stationary invertible stochastic process with the VARMA representation in echelon form given by (2.1)–(2.4). Then, under Assumptions 3.1–3.4,

$$\big\|\tilde Q_X^{\circ}(\tilde\eta) - Q_X(\eta)\big\|_1,\; \big\|\tilde Q_X(\tilde\eta) - Q_X(\eta)\big\|_1 = O_p\big(T^{-1/2}\big). \qquad (4.22)$$
The next theorems establish the convergence and the asymptotic normality of our efficient estimator.

Theorem 4.1. Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional stationary invertible stochastic process with the VARMA representation in echelon form given by (2.1)–(2.4). Then, under Assumptions 3.1–3.4,

$$\|\hat\eta - \eta\| = O_p\big(T^{-1/2}\big). \qquad (4.23)$$

Theorem 4.2. Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional stationary invertible stochastic process with the VARMA representation in echelon form given by (2.1)–(2.4). Then, under Assumptions 3.1–3.4,

$$T^{1/2}\big(\hat\eta - \eta\big) \xrightarrow[T\to\infty]{d} N\big(0,\, Q_X(\eta)\big). \qquad (4.24)$$

A consistent estimator of its asymptotic covariance matrix is then $\big[\sum_{t=1}^{T} Z_t(\tilde\eta)\,\tilde\Sigma_u(\tilde\eta)^{-1}\, Z_t(\tilde\eta)'\big]^{-1}$. As mentioned above with respect to (4.19), we suggest filtering accurately, from the third-stage regression residuals $\epsilon_t(\tilde\eta, \hat\eta)$, well-behaved VARMA innovation estimates in finite samples, say $u_t(\hat\eta)$, such that

$$\epsilon_t(\tilde\eta, \hat\eta) = u_t(\hat\eta) + \big[Z_t(\tilde\eta) - Z_t^{\circ}(\tilde\eta, \hat\eta)\big]'\big(\tilde\eta - \hat\eta\big), \qquad t = \bar p + 1, \ldots, T, \qquad (4.25)$$
where $\epsilon_t(\tilde\eta, \hat\eta) = \omega_t(\tilde\eta) - Z_t(\tilde\eta)'\,\hat\eta$ and $Z_t^{\circ}(\tilde\eta, \hat\eta) = \sum_{\tau=0}^{t-1} R'\big[X_{t-\tau}(\tilde\eta) \otimes \Lambda_\tau(\hat\eta)'\big]$. Finally, let $\tilde\Sigma_u(\hat\eta) = (T - \bar p)^{-1}\sum_{t=\bar p+1}^{T} u_t(\hat\eta)\, u_t(\hat\eta)'$ be the resulting third-stage efficient estimator of the VARMA innovation covariance matrix $\Sigma_u$. Then its rate of convergence follows.

Proposition 4.3. Let $\{y_t : t \in \mathbb{Z}\}$ be a k-dimensional stationary invertible stochastic process with the VARMA representation in echelon form given by (2.1)–(2.4). Then, under Assumptions 3.1–3.4,

$$\big\|\tilde\Sigma_u(\hat\eta) - \Sigma_u\big\| = O_p\big(T^{-1/2}\big). \qquad (4.26)$$

To roughly show to which extent $\tilde\Sigma_\varrho(\tilde\eta, \hat\eta_{HK})$ is less accurate than $\tilde\Sigma_u(\hat\eta)$ in estimating $\Sigma_u$ in finite samples, assume for simplicity that the HK procedure uses $u_t(\tilde\eta)$ and $\tilde\Sigma_u(\tilde\eta)$ instead of $\varepsilon_t(\tilde\eta)$ and $\tilde\Sigma_e(n_T)$. Therefore, $W_t(\tilde\eta) = Z_t(\tilde\eta)$, $\hat\eta_{HK} = \hat\eta$ and then $\varrho_t(\tilde\eta, \hat\eta_{HK}) = \epsilon_t(\tilde\eta, \hat\eta)$. Hence, in view of (4.25), our well-behaved error covariance estimator suggested above is of order $T^{-1}$ more accurate than the one obtained at the fourth stage of Hannan and Kavalieris (1984) in estimating the VARMA innovation covariance matrix $\Sigma_u$, since

$$\varrho_t(\tilde\eta, \hat\eta_{HK}) - u_t = u_t(\hat\eta) - u_t + O_p\big(T^{-1}\big), \qquad t = \bar p + 1, \ldots, T. \qquad (4.27)$$
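For completeness, a short sketch (ours, in Python; the helper signature is hypothetical) of how the third-stage innovation estimates $u_t(\hat\eta)$ and the covariance estimator $\tilde\Sigma_u(\hat\eta)$ can be recovered from (4.19) and (4.25):

```python
import numpy as np

def third_stage_innovations(eta_tilde, eta_hat, Z, Z0, u_filt, p_bar):
    """Recover u_t(eta_hat) from the identity (4.25) and form the
    third-stage covariance estimator Sigma_tilde_u(eta_hat).
    Z[t] = Z_t(eta_tilde), Z0[t] = Z_t^o(eta_tilde, eta_hat), both r x k;
    u_filt[t] = u_t(eta_tilde) from the recursive filter (4.12)."""
    T, k = u_filt.shape
    diff = eta_tilde - eta_hat
    u_hat = np.empty_like(u_filt)
    for t in range(T):
        # omega_t(eta_tilde) = u_t(eta_tilde) + Z_t(eta_tilde)' eta_tilde, see (4.19)
        omega_t = u_filt[t] + Z[t].T @ eta_tilde
        eps_t = omega_t - Z[t].T @ eta_hat           # third-stage residual
        u_hat[t] = eps_t - (Z[t] - Z0[t]).T @ diff   # invert (4.25)
    u_eff = u_hat[p_bar:]                             # t = p_bar + 1, ..., T
    Sigma_u_hat = u_eff.T @ u_eff / (T - p_bar)
    return u_hat, Sigma_u_hat
```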
5. Simulation study

The small-sample performance of our proposed estimators is studied with Monte Carlo (MC) simulations. We only focus on the fully efficient estimates since they stand for the major contribution of the paper. Specifically, we consider a comparative study involving those suggested by Hannan and Kavalieris (1984) (HK), Reinsel et al. (1992) (RBY) and Poskitt and Salau (1995) (PS), respectively. In these simulations, the improved versions of the last two estimators described above were used. In addition, two versions of our proposed three-step estimator, say TS1 and TS2, were considered. The first one relies on the two-step GLS estimator given in (3.8), while the second is based on the two-step OLS estimator studied in Dufour and Jouini (2005). Obviously, TS1 and TS2 are identical when the Kronecker indices characterizing the echelon canonical form are all equal. While a two-step OLS estimation has been used for obtaining the GLS estimators of Hannan and Kavalieris (1984) and Reinsel et al. (1992), those of Poskitt and Salau (1995) were obtained by implementing their three-step procedure in full. Of course, all competing (fully efficient) estimators are asymptotically equivalent to ML estimators since they roughly correspond to one iteration of the Gauss–Newton algorithm, starting from a √T-consistent estimator. Finally, ML estimation was omitted in the simulations for the following reasons. First, its finite-sample properties have been extensively studied in the literature and were found more or less satisfactory given the model at hand. Second, besides the fact that state-space-formulation-based ML estimation of VARMA models still requires potentially many evaluations of the EM algorithm, especially in big or persistent systems, it also fails to handle the parsimonious echelon form parameterization since it is not guaranteed that the resulting estimated echelon VARMA models are stationary and invertible. Third, in big systems, nonlinear estimation procedures cannot compete with linear methods from the computational cost viewpoint, especially for simulation-based inference using bootstrap methods or maximized Monte Carlo (MMC) tests (see Dufour, 2006 and Dufour and Jouini, 2006). Finally, as the paper deals with efficient linear estimation methods for VARMA models, we only studied the finite-sample performance of the main procedures compared to the ones we suggested.

We simulate two bivariate stationary invertible Gaussian VARMA processes with drifts and respective Kronecker indices (1, 2) and (2, 1), using sample sizes 100 and 200. Simulation results on the bias (in absolute value) and MSE of the estimates for each procedure are given in Tables 1–4. These tables also show the MSE ratios of the alternative fully efficient estimators with respect to TS1. These results are based on 1000 replications using the GAUSS random number generator. To avoid numerical problems due to initialization, an extra 100 initial pseudo-data points were generated and then discarded. Trials associated with estimates implying noninvertible VARMA processes are discarded and replaced. In all simulations, the rate of replacement did not exceed 5% in the worst case. The two-step echelon parameter estimates were obtained from models using, as regressors, autoregressive residuals associated with an autoregression truncation lag set to the integer part of ln T and then of T^{1/2}, since it has been recommended in the literature to choose the truncation order between these two values.
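The design just described can be reproduced along the following lines (a sketch of ours in Python rather than GAUSS; the simulator and the truncation-lag rules below are only illustrative of the setup stated in the text):

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_varma(Phi, Theta, mu_Phi, Sigma_u, T, burn=100):
    """Simulate y_t from Phi(L) y_t = mu_Phi + Theta(L) u_t with Gaussian
    innovations; the first `burn` observations are discarded, as in the study.
    Phi = [Phi_0, Phi_1, ..., Phi_pbar], Theta = [Theta_0, ..., Theta_pbar]."""
    k = Sigma_u.shape[0]
    p_bar = len(Phi) - 1
    Phi0_inv = np.linalg.inv(Phi[0])
    n = T + burn
    u = rng.multivariate_normal(np.zeros(k), Sigma_u, size=n + p_bar)
    y = np.zeros((n + p_bar, k))
    for t in range(p_bar, n + p_bar):
        rhs = mu_Phi + Theta[0] @ u[t]
        for i in range(1, p_bar + 1):
            rhs += Phi[i] @ y[t - i] + Theta[i] @ u[t - i]
        y[t] = Phi0_inv @ rhs
    return y[p_bar + burn:]

# Truncation lags for the first-stage autoregression, as described in the text:
T = 100
n_T_short = int(np.log(T))    # integer part of ln T    -> 4  when T = 100
n_T_long = int(np.sqrt(T))    # integer part of T^{1/2} -> 10 when T = 100
```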
This strategy was adopted to assess the effect of the first-stage autoregression lag-order choice on the finite-sample properties of the echelon parameter estimates. The error covariance matrix with σ11 = 0.49, σ22 = 0.29 and σ12 = σ21 = −0.14 is used for both simulated models. The parameter values of the simulated echelon VARMA models as well as the resulting eigenvalues (describing the persistence degree of the model) are given in the tables. For a better comparison with the HK and RBY procedures, the latter are computed after discarding the first nT values of the residuals εt(η̃) (namely, ε−nT+1(η̃), ..., ε0(η̃); see (4.3)) to avoid, though partially, problems due to initialization, since preliminary simulations (that we omit) showed poor HK estimates otherwise. For both models, simulation evidence shows that, unlike the TS1, TS2 and RBY methods, whose respective estimates show small to moderate bias, the HK and PS procedures yield estimates with substantial bias associated with relatively significant MSE for T = 100 (see the upper panels of Tables 1 and 3). These biases decrease with the sample size (see Tables 1–4). It is suspected that the bias associated with the PS procedure is due to the weighting matrix used in the computation of the estimates. Poskitt
Table 1. Echelon VARMA model with Kronecker indices (1, 2): a comparative simulation study on alternative fully efficient GLS estimators (sample size T = 100).

Coefficient:    µΦ,1 µΦ,2 φ11,1 φ12,1 φ22,1 φ21,2 φ22,2 θ11,1 θ21,1 θ12,1 θ22,1 θ21,2 θ22,2
True value:     0.000 0.000 1.200 0.240 0.400 −0.900 −0.270 0.800 0.500 0.400 0.400 0.340 0.850

nT = 4
Bias TS1:       0.009 0.003 0.020 0.000 0.005 0.005 0.002 0.015 0.007 0.018 0.037 0.035 0.073
Bias TS2:       0.010 0.004 0.020 0.000 0.000 0.008 0.000 0.014 0.004 0.017 0.030 0.028 0.067
Bias HK:        0.001 0.001 0.005 0.003 0.015 0.013 0.005 0.025 0.025 0.104 0.059 0.004 0.204
Bias RBY:       0.011 0.004 0.018 0.001 0.000 0.006 0.002 0.013 0.002 0.024 0.033 0.030 0.078
Bias PS:        0.009 0.000 0.008 0.009 0.135 0.089 0.048 0.210 0.081 0.213 0.168 0.334 0.406
MSE TS1:        0.200 0.145 0.056 0.046 0.111 0.078 0.068 0.096 0.090 0.117 0.135 0.165 0.159
MSE TS2:        0.200 0.146 0.057 0.046 0.106 0.074 0.068 0.097 0.089 0.118 0.127 0.154 0.153
MSE HK:         0.257 0.179 0.079 0.062 0.134 0.094 0.086 0.111 0.102 0.185 0.160 0.164 0.261
MSE RBY:        0.204 0.151 0.057 0.047 0.115 0.077 0.073 0.097 0.095 0.126 0.148 0.166 0.159
MSE PS:         0.190 0.123 0.051 0.044 0.181 0.121 0.083 0.219 0.115 0.239 0.220 0.376 0.418
MSE ratio TS2:  1.000 1.006 1.001 1.001 0.956 0.950 1.000 1.004 0.994 1.006 0.941 0.935 0.960
MSE ratio HK:   1.286 1.231 1.390 1.361 1.205 1.212 1.270 1.153 1.136 1.583 1.185 0.993 1.639
MSE ratio RBY:  1.020 1.039 1.012 1.018 1.033 0.987 1.082 1.013 1.060 1.081 1.102 1.004 0.997
MSE ratio PS:   0.952 0.845 0.896 0.969 1.622 1.558 1.224 2.274 1.274 2.043 1.630 2.274 2.626

nT = 10
Bias TS1:       0.002 0.005 0.024 0.000 0.006 0.014 0.003 0.012 0.000 0.011 0.028 0.019 0.050
Bias TS2:       0.003 0.005 0.025 0.000 0.003 0.012 0.002 0.009 0.001 0.009 0.028 0.022 0.051
Bias HK:        0.002 0.002 0.024 0.003 0.003 0.005 0.001 0.024 0.005 0.025 0.032 0.039 0.076
Bias RBY:       0.002 0.004 0.023 0.000 0.003 0.007 0.000 0.000 0.006 0.003 0.026 0.026 0.033
Bias PS:        0.002 0.003 0.022 0.000 0.020 0.003 0.007 0.053 0.009 0.040 0.056 0.066 0.090
MSE TS1:        0.206 0.169 0.062 0.046 0.105 0.081 0.064 0.100 0.090 0.122 0.126 0.160 0.147
MSE TS2:        0.208 0.168 0.062 0.047 0.102 0.078 0.064 0.100 0.090 0.124 0.122 0.156 0.148
MSE HK:         0.266 0.210 0.079 0.073 0.110 0.088 0.075 0.102 0.096 0.161 0.127 0.161 0.154
MSE RBY:        0.211 0.169 0.064 0.050 0.111 0.079 0.070 0.103 0.095 0.135 0.132 0.172 0.153
MSE PS:         0.199 0.155 0.060 0.046 0.107 0.078 0.064 0.106 0.094 0.133 0.127 0.189 0.156
MSE ratio TS2:  1.009 0.994 1.009 1.007 0.968 0.960 0.996 1.006 0.999 1.019 0.974 0.974 1.005
MSE ratio HK:   1.291 1.240 1.284 1.564 1.046 1.082 1.174 1.020 1.066 1.321 1.006 1.004 1.043
MSE ratio RBY:  1.025 1.003 1.038 1.068 1.056 0.983 1.090 1.031 1.051 1.109 1.047 1.077 1.037
MSE ratio PS:   0.969 0.919 0.971 0.988 1.018 0.970 0.990 1.062 1.036 1.092 1.007 1.185 1.054

Note. TS1 and TS2 are the three-step GLS estimators based on the two-step GLS estimator and the two-step OLS estimator, respectively; HK, RBY and PS stand for the fully efficient GLS estimators suggested by Hannan and Kavalieris (1984), Reinsel et al. (1992) and Poskitt and Salau (1995), respectively. MSE ratios are taken with respect to TS1. These estimates are obtained with 1000 replications. The eigenvalues of the model are real 0.900, 0.400 and 0.300 for the autoregressive (AR) operator, and real 0.824 and conjugate −0.188 ∓ 0.790i (0.813 in norm) for the moving-average (MA) operator. Recall that the number of eigenvalues in each of the AR and MA operators is equal to the McMillan degree.
and Salau (1995) argued that the error term in their linear regression follows a moving-average process of order $\bar p$, namely $\zeta_t = \sum_{j=0}^{\bar p} \Theta_j\,\xi_{t-j}$ with $T^{-1}\sum_{t=1}^{T} \xi_t\xi_t' = O_p\big(n_T/T\big)\Sigma_u$ (see Hannan and Kavalieris, 1986 and Poskitt and Salau, 1995), but instead they used $\tilde\Sigma_u(n_T)$, which we know is $O_p(1)\Sigma_u$. The bias associated with the HK procedure may be attributed to the use of more or less well-behaved filtered residuals in finite samples, and to a weighting matrix mismatching the one iteration of the scoring algorithm (starting from the two-step OLS estimates). In fact, they use the third-stage error covariance estimator of their procedure instead of the one with the faster rate of convergence, namely the one based on the filtered residuals necessary for their fourth-stage estimation. Although the RBY procedure uses the same filtering scheme as the HK method, it delivers estimates with relatively satisfactory finite-sample properties. This is perhaps due to using an error covariance estimator with better small-sample properties as a weighting matrix in their GLS linear regression. It is well known that approximating VARMA models having highly persistent MA operators usually requires autoregressions with many lags, and vice versa. Also, approximating nonpersistent VARMA models with autoregressions using many lags would result in estimates with higher bias and/or MSE. This is exactly what occurs with the TS1 and TS2 procedures for the echelon VARMA model with Kronecker indices (2, 1), since the dominant eigenvalue associated with the MA operator, namely 0.681 (in norm), is not considered as persistent; see Tables 3 and 4. The same tables show that, given the sample size, increasing the lag order nT reduces the large bias of the HK and PS procedures, and yields parameter estimates with decreasing MSE for the HK procedure but with a mixed tendency for the PS method. Further, while the RBY estimates are characterized by a slight increase in bias, they exhibit MSE with a mixed tendency. Besides noting that the bias generally decreases with nT for all methods for the echelon VARMA model with Kronecker indices (1, 2), we stress that the overall tendency of the MSE is not pronounced. This is perhaps due to the fact that the largest eigenvalue associated with the MA operator, namely 0.824, cannot characterize the model as either weakly or highly persistent; see Tables 1 and 2. Simulation results show that, overall, the TS1, TS2 and RBY methods outperform those of HK and PS by far. To get a better idea of which procedure provides estimates with better sample properties (since those of the RBY procedure behave in a way quite similar to ours), we compute the ratio of the MSE of each procedure's estimates relative to those associated with TS1. Obviously, with the exception of the TS2 and PS procedures, those of RBY and (to a large extent) HK provide estimates with MSE ratios, overall, greater than unity. Note that the cases where the MSE ratios of the PS estimates are less than unity are somehow (indirectly) attributed to the relatively substantial biases characterizing some of the echelon parameter estimates. These cases also match situations where the reduction in the standard deviation of the estimates outweighs the increase in the square of the associated bias. Further, the frequency
Table 2. Echelon VARMA model with Kronecker indices (1, 2): a comparative simulation study on alternative fully efficient GLS estimators (sample size T = 200).

Coefficient:    µΦ,1 µΦ,2 φ11,1 φ12,1 φ22,1 φ21,2 φ22,2 θ11,1 θ21,1 θ12,1 θ22,1 θ21,2 θ22,2
True value:     0.000 0.000 1.200 0.240 0.400 −0.900 −0.270 0.800 0.500 0.400 0.400 0.340 0.850

nT = 5
Bias TS1:       0.001 0.000 0.012 0.001 0.000 0.005 0.000 0.002 0.000 0.005 0.011 0.008 0.030
Bias TS2:       0.001 0.000 0.012 0.001 0.002 0.006 0.001 0.000 0.000 0.003 0.008 0.005 0.028
Bias HK:        0.003 0.000 0.004 0.013 0.011 0.003 0.003 0.035 0.005 0.081 0.029 0.041 0.118
Bias RBY:       0.001 0.000 0.010 0.000 0.004 0.001 0.001 0.002 0.005 0.009 0.011 0.016 0.031
Bias PS:        0.000 0.000 0.011 0.002 0.012 0.003 0.015 0.111 0.028 0.093 0.023 0.063 0.206
MSE TS1:        0.114 0.094 0.038 0.030 0.062 0.044 0.040 0.060 0.058 0.075 0.073 0.097 0.087
MSE TS2:        0.114 0.094 0.038 0.030 0.060 0.044 0.039 0.061 0.058 0.076 0.070 0.094 0.087
MSE HK:         0.177 0.119 0.046 0.046 0.073 0.053 0.045 0.078 0.064 0.134 0.080 0.118 0.157
MSE RBY:        0.116 0.093 0.038 0.032 0.065 0.046 0.041 0.064 0.062 0.085 0.081 0.104 0.094
MSE PS:         0.110 0.094 0.037 0.030 0.076 0.051 0.046 0.123 0.065 0.130 0.080 0.136 0.218
MSE ratio TS2:  1.001 1.003 0.999 0.999 0.968 0.982 0.984 1.018 0.999 1.007 0.961 0.966 0.993
MSE ratio HK:   1.547 1.268 1.208 1.488 1.189 1.196 1.148 1.301 1.085 1.781 1.103 1.212 1.794
MSE ratio RBY:  1.016 0.996 1.016 1.048 1.056 1.029 1.038 1.058 1.063 1.132 1.095 1.068 1.074
MSE ratio PS:   0.961 1.006 0.972 0.991 1.232 1.157 1.164 2.038 1.117 1.720 1.101 1.391 2.490

nT = 14
Bias TS1:       0.004 0.005 0.011 0.000 0.000 0.005 0.000 0.001 0.001 0.000 0.010 0.009 0.021
Bias TS2:       0.004 0.005 0.011 0.000 0.000 0.005 0.000 0.004 0.001 0.002 0.008 0.008 0.020
Bias HK:        0.004 0.005 0.012 0.001 0.000 0.005 0.001 0.000 0.002 0.002 0.010 0.010 0.027
Bias RBY:       0.003 0.004 0.011 0.000 0.001 0.004 0.000 0.011 0.002 0.008 0.006 0.005 0.009
Bias PS:        0.004 0.004 0.010 0.000 0.013 0.002 0.005 0.019 0.005 0.017 0.020 0.030 0.038
MSE TS1:        0.115 0.093 0.038 0.031 0.062 0.047 0.039 0.063 0.060 0.079 0.072 0.096 0.088
MSE TS2:        0.116 0.094 0.039 0.031 0.060 0.046 0.039 0.064 0.059 0.079 0.071 0.094 0.087
MSE HK:         0.170 0.115 0.052 0.045 0.063 0.048 0.043 0.064 0.064 0.086 0.074 0.097 0.086
MSE RBY:        0.120 0.097 0.040 0.033 0.063 0.048 0.041 0.067 0.062 0.083 0.072 0.100 0.089
MSE PS:         0.112 0.090 0.038 0.032 0.067 0.049 0.040 0.061 0.063 0.083 0.071 0.109 0.089
MSE ratio TS2:  1.006 1.000 1.006 1.001 0.975 0.985 0.985 1.015 0.997 1.004 0.981 0.982 0.984
MSE ratio HK:   1.472 1.232 1.351 1.433 1.017 1.028 1.085 1.017 1.077 1.096 1.032 1.016 0.981
MSE ratio RBY:  1.037 1.036 1.038 1.057 1.020 1.025 1.029 1.062 1.035 1.054 1.003 1.042 1.011
MSE ratio PS:   0.969 0.968 0.985 1.012 1.081 1.035 1.014 0.972 1.056 1.058 0.991 1.138 1.008

Note. TS1 and TS2 are the three-step GLS estimators based on the two-step GLS estimator and the two-step OLS estimator, respectively; HK, RBY and PS stand for the fully efficient GLS estimators suggested by Hannan and Kavalieris (1984), Reinsel et al. (1992) and Poskitt and Salau (1995), respectively. MSE ratios are taken with respect to TS1. These estimates are obtained with 1000 replications. The eigenvalues of the model are real 0.900, 0.400 and 0.300 for the autoregressive (AR) operator, and real 0.824 and conjugate −0.188 ∓ 0.790i (0.813 in norm) for the moving-average (MA) operator. Recall that the number of eigenvalues in each of the AR and MA operators is equal to the McMillan degree.
of these below-unity ratios is generally increasing with nT and decreasing with the sample size. Finally, it is noteworthy that, while TS2 generally dominates RBY, TS1 has a slight advantage over TS2. So, choosing either TS1 or TS2 would have no significant effect on the small-sample behavior of the resulting echelon VARMA parameter estimates for the models studied.

6. Conclusion

This paper proposes a new three-step linear estimation procedure for stationary invertible echelon VARMA models. It can be extended to VARMAX or integrated and cointegrated VARMA models as well. The estimation method focuses on the echelon form parameterization as it tends to deliver relatively parsimonious models, but it may easily adapt to other parameterizations such as the final equations form. Our setup provides simplified and practical echelon parameter estimates that are easier to obtain than those of Hannan and Kavalieris (1984), Reinsel et al. (1992), and Poskitt and Salau (1995). We extend the results of Dufour and Jouini (2005) to the two-step GLS estimator and show its consistency and asymptotic normality with strong innovations. Exploiting the explicit form of the second-stage regression residuals, we propose a new recursive filtering scheme based on consistent (hence better) initial values for obtaining the well-behaved pseudo-regressors necessary for our third-stage GLS (fully efficient) estimation. These filtered residuals, which approximate the implicit VARMA innovation estimates matching the two-step linear estimator, are functions of both the first-stage autoregression residuals and the second-stage regression residuals. So they take into account the truncation error associated with the long autoregression used in the first stage, along with some adjustments involving the residuals from the first two steps. Besides this novelty, our third-stage linear regression is derived by exploiting the nonlinear structure of the VARMA innovations in the model parameters without using a Taylor expansion. As such, the resulting three-step GLS estimator, for which we establish consistency and asymptotic normality with strong innovations and then show asymptotic equivalence to ML (hence efficiency) under Gaussian innovations, provides an intuitive interpretation of nonlinear estimation methods such as nonlinear GLS and ML. Although our three-step linear estimation procedure is asymptotically equivalent to those of Hannan and Kavalieris (1984), Reinsel et al. (1992) and Poskitt and Salau (1995), it is computationally much simpler. In addition, the asymptotic covariance estimators we give for the second- and third-stage echelon VARMA parameter estimators are simple and practical for inference, especially in the context of simulation-based techniques such as bootstrap methods or MMC tests. Finally, by examining the complex dynamic structure of the third-stage regression residuals, we provide an efficient estimator of the
Table 3. Echelon VARMA model with Kronecker indices (2, 1): a comparative simulation study on alternative fully efficient GLS estimators (sample size T = 100).

Coefficient:    µΦ,1 µΦ,2 φ21,0 φ11,1 φ21,1 φ22,1 φ11,2 φ12,2 θ11,1 θ21,1 θ12,1 θ22,1 θ11,2 θ12,2
True value:     0.000 0.000 0.500 1.800 −0.400 0.800 −0.360 −0.900 0.330 −0.180 −0.200 −0.400 −0.200 0.920

nT = 4
Bias TS1:       0.001 0.004 0.003 0.002 0.037 0.064 0.005 0.012 0.055 0.016 0.021 0.072 0.061 0.024
Bias TS2:       0.001 0.004 0.004 0.001 0.041 0.069 0.007 0.015 0.055 0.016 0.022 0.080 0.064 0.032
Bias HK:        0.001 0.006 0.005 0.030 0.016 0.038 0.157 0.251 0.024 0.085 0.066 0.126 0.098 0.334
Bias RBY:       0.000 0.005 0.005 0.000 0.045 0.075 0.011 0.021 0.043 0.000 0.014 0.071 0.055 0.015
Bias PS:        0.001 0.001 0.003 0.010 0.038 0.067 0.053 0.083 0.118 0.087 0.111 0.188 0.070 0.191
MSE TS1:        0.158 0.188 0.033 0.034 0.096 0.144 0.111 0.169 0.130 0.108 0.141 0.176 0.138 0.205
MSE TS2:        0.159 0.189 0.033 0.034 0.100 0.149 0.112 0.169 0.131 0.109 0.144 0.184 0.140 0.210
MSE HK:         0.237 0.199 0.062 0.063 0.179 0.252 0.246 0.382 0.214 0.209 0.183 0.336 0.255 0.473
MSE RBY:        0.166 0.190 0.036 0.039 0.112 0.166 0.130 0.196 0.143 0.128 0.154 0.202 0.158 0.243
MSE PS:         0.164 0.180 0.033 0.034 0.095 0.142 0.116 0.176 0.160 0.132 0.169 0.233 0.129 0.253
MSE ratio TS2:  1.007 1.006 1.020 1.009 1.033 1.029 1.007 1.002 1.012 1.008 1.019 1.043 1.014 1.027
MSE ratio HK:   1.503 1.059 1.873 1.817 1.852 1.746 2.212 2.261 1.651 1.924 1.297 1.905 1.849 2.307
MSE ratio RBY:  1.053 1.015 1.093 1.134 1.155 1.148 1.169 1.163 1.105 1.179 1.090 1.146 1.145 1.185
MSE ratio PS:   1.040 0.958 1.004 1.001 0.982 0.982 1.042 1.042 1.237 1.213 1.201 1.320 0.940 1.234

nT = 10
Bias TS1:       0.000 0.000 0.000 0.001 0.043 0.081 0.022 0.046 0.061 0.013 0.031 0.095 0.062 0.071
Bias TS2:       0.000 0.000 0.001 0.002 0.047 0.086 0.020 0.043 0.061 0.013 0.030 0.100 0.063 0.073
Bias HK:        0.005 0.002 0.000 0.000 0.039 0.076 0.034 0.066 0.051 0.003 0.032 0.097 0.040 0.107
Bias RBY:       0.000 0.000 0.001 0.005 0.047 0.085 0.011 0.031 0.052 0.007 0.025 0.086 0.068 0.043
Bias PS:        0.001 0.001 0.006 0.005 0.030 0.069 0.056 0.098 0.075 0.000 0.045 0.089 0.034 0.122
MSE TS1:        0.173 0.208 0.040 0.038 0.116 0.172 0.115 0.173 0.139 0.123 0.148 0.213 0.143 0.226
MSE TS2:        0.173 0.208 0.040 0.040 0.118 0.175 0.119 0.176 0.141 0.123 0.152 0.219 0.148 0.239
MSE HK:         0.198 0.217 0.048 0.046 0.142 0.208 0.145 0.218 0.181 0.162 0.168 0.258 0.187 0.265
MSE RBY:        0.175 0.209 0.041 0.044 0.121 0.179 0.128 0.187 0.146 0.133 0.157 0.224 0.158 0.248
MSE PS:         0.183 0.208 0.041 0.039 0.112 0.169 0.127 0.195 0.151 0.123 0.158 0.213 0.132 0.237
MSE ratio TS2:  1.002 0.999 1.005 1.040 1.018 1.017 1.031 1.021 1.015 0.993 1.025 1.024 1.035 1.058
MSE ratio HK:   1.143 1.042 1.190 1.197 1.220 1.208 1.259 1.260 1.302 1.313 1.132 1.208 1.311 1.170
MSE ratio RBY:  1.011 1.004 1.011 1.139 1.040 1.043 1.112 1.082 1.049 1.076 1.062 1.049 1.103 1.095
MSE ratio PS:   1.060 1.001 1.029 1.028 0.966 0.984 1.103 1.126 1.084 0.994 1.064 0.997 0.926 1.048

Note. TS1 and TS2 are the three-step GLS estimators based on the two-step GLS estimator and the two-step OLS estimator, respectively; HK, RBY and PS stand for the fully efficient GLS estimators suggested by Hannan and Kavalieris (1984), Reinsel et al. (1992) and Poskitt and Salau (1995), respectively. MSE ratios are taken with respect to TS1. These estimates are obtained with 1000 replications. The eigenvalues of the model are real 0.800 and a double root 0.900 for the autoregressive (AR) operator, and real −0.530 and conjugate −0.350 ∓ 0.584i (0.681 in norm) for the moving-average (MA) operator. Recall that the number of eigenvalues in each of the AR and MA operators is equal to the McMillan degree.
covariance matrix of the VARMA innovations, which is of order T^{-1} more accurate than the one obtained at the fourth stage of Hannan and Kavalieris (1984). The small-sample performance of our efficient linear estimators is studied in comparison with competing ones, namely those of Hannan and Kavalieris (1984), Reinsel et al. (1992), and Poskitt and Salau (1995). Simulation evidence shows that, in most cases, our fully efficient estimators outperform their competitors in terms of bias and MSE for the models studied. It also stresses the sensitivity of the small-sample properties of the echelon VARMA parameter estimates to the truncation order of the first-stage autoregression approximating the true innovations. This suggests that further investigation should be devoted to developing efficient model selection procedures to estimate accurately the autoregression truncation lag in finite samples. Indeed, such a truncation may severely affect, through the echelon VARMA parameter estimates, the finite-sample behavior of highly dynamic or smooth functions of the VARMA slope parameters and innovation variances, such as impulse responses, error variance decompositions, predictability measures or long-term forecasts, usually of interest in most applied work.

Acknowledgments

The authors thank Vadim Marmer, Peter Phillips, two anonymous referees, an anonymous Associate Editor and the Editor Stanley Azen for several useful comments. This work was supported by the Canada Research Chair Program (Chair in Econometrics, Université de Montréal), the Alexander-von-Humboldt Foundation (Germany), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Canada Council for the Arts (Killam Fellowship), the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, and the Fonds FCAR (Government of Québec).

Appendix. Proofs
Proof of Corollary 4.1. Let Φ̃(p̄) = [−μ̃_Φ, Φ̃_0, −Φ̃_1, …, −Φ̃_p̄], Φ(p̄) = [−μ_Φ, Φ_0, −Φ_1, …, −Φ_p̄] and Y_t(p̄) = [1, y′_t, y′_{t−1}, …, y′_{t−p̄}]′.
Table 4
Echelon VARMA model with Kronecker indices (2, 1): a comparative simulation study on alternative fully efficient GLS estimators. Sample size T = 200. Rows are grouped by the truncation lag nT of the first-stage long autoregression. Within each group, entries are listed coefficient by coefficient, in the order given on the "Coefficient" line: the true values, the bias and MSE of TS1, TS2, HK, RBY and PS, and the MSE ratios of TS2, HK, RBY and PS relative to TS1.

nT = 5
Coefficient: µΦ,1 µΦ,2 φ21,0 φ11,1 φ21,1 φ22,1 φ11,2 φ12,2 θ11,1 θ21,1 θ12,1 θ22,1 θ11,2 θ12,2
Value: 0.000 0.000 0.500 1.800 −0.400 0.800 −0.360 −0.900 0.330 −0.180 −0.200 −0.400 −0.200 0.920
Bias, TS1: 0.000 0.000 0.001 0.002 0.016 0.028 0.000 0.002 0.025 0.009 0.005 0.025 0.026 0.004
Bias, TS2: 0.000 0.000 0.001 0.001 0.017 0.030 0.000 0.002 0.026 0.010 0.006 0.029 0.028 0.006
Bias, HK: 0.002 0.000 0.002 0.016 0.012 0.027 0.076 0.119 0.032 0.008 0.053 0.103 0.000 0.192
Bias, RBY: 0.000 0.000 0.001 0.001 0.017 0.030 0.001 0.005 0.019 0.003 0.000 0.021 0.029 0.000
Bias, PS: 0.000 0.000 0.001 0.004 0.016 0.027 0.025 0.040 0.051 0.052 0.047 0.089 0.037 0.105
MSE, TS1: 0.078 0.083 0.019 0.023 0.059 0.087 0.073 0.109 0.080 0.069 0.095 0.107 0.082 0.136
MSE, TS2: 0.079 0.083 0.019 0.023 0.060 0.088 0.073 0.109 0.081 0.070 0.096 0.109 0.084 0.138
MSE, HK: 0.103 0.084 0.023 0.035 0.067 0.099 0.136 0.208 0.097 0.099 0.113 0.162 0.108 0.275
MSE, RBY: 0.080 0.083 0.020 0.025 0.062 0.092 0.081 0.121 0.089 0.077 0.104 0.119 0.094 0.155
MSE, PS: 0.081 0.081 0.020 0.023 0.062 0.092 0.075 0.112 0.090 0.082 0.105 0.136 0.085 0.161
MSE ratio, TS2: 1.002 1.003 1.004 1.003 1.013 1.012 1.007 1.007 1.009 1.011 1.014 1.022 1.015 1.013
MSE ratio, HK: 1.306 1.013 1.185 1.498 1.132 1.132 1.858 1.910 1.214 1.421 1.194 1.515 1.308 2.013
MSE ratio, RBY: 1.019 0.999 1.031 1.080 1.055 1.054 1.112 1.110 1.103 1.107 1.094 1.119 1.140 1.133
MSE ratio, PS: 1.027 0.978 1.046 1.008 1.058 1.050 1.027 1.026 1.123 1.190 1.110 1.277 1.032 1.179

nT = 14
Coefficient: µΦ,1 µΦ,2 φ21,0 φ11,1 φ21,1 φ22,1 φ11,2 φ12,2 θ11,1 θ21,1 θ12,1 θ22,1 θ11,2 θ12,2
Value: 0.000 0.000 0.500 1.800 −0.400 0.800 −0.360 −0.900 0.330 −0.180 −0.200 −0.400 −0.200 0.920
Bias, TS1: 0.003 0.003 0.001 0.000 0.016 0.028 0.007 0.015 0.022 0.007 0.011 0.033 0.026 0.020
Bias, TS2: 0.002 0.003 0.001 0.001 0.018 0.031 0.006 0.013 0.021 0.008 0.011 0.038 0.027 0.021
Bias, HK: 0.003 0.003 0.001 0.001 0.017 0.029 0.004 0.010 0.021 0.009 0.012 0.037 0.028 0.020
Bias, RBY: 0.002 0.004 0.001 0.001 0.018 0.031 0.002 0.008 0.020 0.008 0.010 0.033 0.031 0.006
Bias, PS: 0.003 0.004 0.000 0.005 0.014 0.029 0.033 0.056 0.028 0.003 0.021 0.045 0.012 0.063
MSE, TS1: 0.082 0.089 0.021 0.024 0.062 0.090 0.075 0.112 0.081 0.072 0.096 0.117 0.083 0.140
MSE, TS2: 0.082 0.089 0.021 0.025 0.062 0.091 0.077 0.114 0.082 0.072 0.098 0.121 0.084 0.145
MSE, HK: 0.082 0.089 0.021 0.025 0.062 0.090 0.077 0.115 0.082 0.074 0.097 0.119 0.085 0.144
MSE, RBY: 0.081 0.089 0.021 0.025 0.062 0.091 0.078 0.116 0.082 0.074 0.099 0.120 0.088 0.147
MSE, PS: 0.084 0.090 0.023 0.025 0.067 0.099 0.083 0.125 0.085 0.076 0.102 0.126 0.080 0.150
MSE ratio, TS2: 1.005 1.005 1.003 1.017 1.011 1.011 1.019 1.018 1.010 0.998 1.021 1.029 1.014 1.029
MSE ratio, HK: 1.003 1.001 1.005 1.027 1.004 1.001 1.027 1.025 1.011 1.020 1.018 1.016 1.021 1.028
MSE ratio, RBY: 0.999 1.005 1.002 1.027 1.010 1.010 1.033 1.031 1.015 1.017 1.033 1.025 1.056 1.043
MSE ratio, PS: 1.028 1.015 1.057 1.038 1.079 1.100 1.098 1.113 1.050 1.045 1.066 1.078 0.966 1.065
Note—TS1 and TS2 are the three-step GLS estimators based on the two-step GLS estimator and the two-step OLS estimator, respectively, while HK, RBY and PS stand for the fully efficient GLS estimators suggested by Hannan and Kavalieris (1984), Reinsel et al. (1992) and Poskitt and Salau (1995), respectively. The results are based on 1000 replications. The eigenvalues of the model are a real eigenvalue 0.800 and a double eigenvalue 0.900 for the autoregressive (AR) operator, and a real eigenvalue −0.530 and a complex conjugate pair −0.350 ± 0.584i (0.681 in modulus) for the moving-average (MA) operator. Recall that the number of eigenvalues in each of the AR and MA operators is equal to the McMillan degree.
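The statement in the note that the number of eigenvalues of each operator equals the McMillan degree can be checked numerically by computing the eigenvalues of the companion matrix associated with the AR (or MA) operator. The sketch below is illustrative only: the matrices Phi0, Phi1 and Phi2 are arbitrary placeholders, not the echelon-form operators of Table 4, and Phi0 is assumed invertible.

```python
import numpy as np

# Placeholder echelon-form AR operator matrices (not the model of Table 4).
Phi0 = np.array([[1.0, 0.0], [0.5, 1.0]])
Phi1 = np.array([[0.9, 0.1], [0.0, 0.7]])
Phi2 = np.array([[0.05, 0.0], [0.0, 0.0]])

# Normalize by Phi0 and stack the implied VAR(2) coefficients into a companion matrix.
A1 = np.linalg.solve(Phi0, Phi1)
A2 = np.linalg.solve(Phi0, Phi2)
k = Phi0.shape[0]
companion = np.zeros((2 * k, 2 * k))
companion[:k, :k] = A1
companion[:k, k:] = A2
companion[k:, :k] = np.eye(k)

# The nonzero eigenvalues are the inverse roots of det(Phi0 - Phi1 z - Phi2 z^2);
# stationarity requires all of them to lie strictly inside the unit circle.
eigenvalues = np.linalg.eigvals(companion)
print(np.sort(np.abs(eigenvalues))[::-1])
```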
Then, in view of (4.2), (4.3) and (3.9), and using the multivariate version of Lemma 2.2 of Kreiss and Franke (1992), we show, for t = −n_T + 1, …, T, that

‖ũ_t − ε_t(η̃)‖ ≤ Σ_{τ=t+n_T}^{∞} {[‖Λ_τ(η̃)‖ + ‖Λ_τ(η)‖] ‖Φ̃(p̄) − Φ(p̄)‖ + ‖Λ_τ(η̃) − Λ_τ(η)‖ ‖Φ(p̄)‖} ‖Y_{t−τ}(p̄)‖ = O_p(ρ^{t+n_T}).   (A.1)

On the other hand, using (4.11), (4.12), (3.9) and Lemma 2.2 of Kreiss and Franke (1992), we show, for t = 1, …, T, that

‖ũ_t − u_t(η̃)‖ ≤ Σ_{τ=t}^{∞} [‖Λ_τ(η̃)‖ + ‖Λ_τ(η̃) − Λ_τ(η)‖] [‖ẽ_{t−τ}(n_T) − u_{t−τ}‖ + ‖u_{t−τ} − ũ_{t−τ}(n_T)‖] = O_p(ρ^t n_T/T^{1/2}),   (A.2)

since ‖u_{t−τ} − ũ_{t−τ}(n_T)‖ and ‖ẽ_{t−τ}(n_T) − u_{t−τ}‖ are both O_p(n_T/T^{1/2}) in view of (3.4) and (3.12), respectively.
Proof of Proposition 4.1. By the triangle inequality,

‖Σ̃_{u(η̃)} − Σ_u‖ ≤ (1/T) Σ_{t=1}^{T} {‖u_t(η̃) − u_t‖ ‖u_t(η̃)‖ + ‖u_t‖ ‖u_t(η̃) − u_t‖} + O_p(T^{−1/2}),   (A.3)

where ‖u_t(η̃) − u_t‖ ≤ ‖u_t(η̃) − u_t(η)‖ + ‖u_t(η) − u_t‖ with ‖u_t(η) − u_t‖ = O_p(ρ^t n_T/T^{1/2}). Using (4.15) and (4.12),

‖u_t(η̃) − u_t(η)‖ ≤ Σ_{τ=0}^{t−1} {‖Λ_τ(η̃)‖ ‖ẽ_{t−τ}(n_T) − e_{t−τ}(n_T)‖ + ‖Λ_τ(η̃) − Λ_τ(η)‖ ‖e_{t−τ}(n_T) − ũ_{t−τ}(n_T)‖}.   (A.4)
On substituting e_{t−τ}(n_T) and ẽ_{t−τ}(n_T) with their expressions in (3.6) and (3.11), and using (3.4), (3.9) and Lemma 2.2 of Kreiss and Franke (1992), we have:

Σ_{τ=0}^{t−1} ‖Λ_τ(η̃)‖ ‖ẽ_{t−τ}(n_T) − e_{t−τ}(n_T)‖ ≤ k^{1/2} ‖R‖ ‖η̃ − η‖ Σ_{τ=0}^{t−1} ‖Λ_τ(η̃)‖ ‖X̃_{t−τ}(n_T)‖ = O_p(T^{−1/2}),   (A.5)

Σ_{τ=0}^{t−1} ‖Λ_τ(η̃) − Λ_τ(η)‖ ‖e_{t−τ}(n_T) − ũ_{t−τ}(n_T)‖ ≤ Σ_{τ=0}^{t−1} ‖Λ_τ(η̃) − Λ_τ(η)‖ Σ_{j=0}^{p̄} ‖Θ_j‖ ‖u_{t−τ−j} − ũ_{t−τ−j}(n_T)‖ = O_p(n_T/T).   (A.6)

Hence, ‖u_t(η̃) − u_t(η)‖ = O_p(T^{−1/2}) and then ‖u_t(η̃) − u_t‖ = O_p(T^{−1/2}) + O_p(ρ^t n_T/T^{1/2}), for t = 1, …, T. Thus, ‖Σ̃_{u(η̃)} − Σ_u‖₁ and ‖Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}‖₁ are both O_p(T^{−1/2}).

Proof of Proposition 4.2. Note that Q_{X(η)}^{−1} is positive definite by definition and let

Q̄_X(η) = [(1/T) Σ_{t=1}^{T} Z_t Σ_u^{−1} Z_t′]^{−1}.   (A.7)

Then
‖Q̃°_{X(η̃)}^{−1} − Q_{X(η)}^{−1}‖₁ ≤ ‖Q̃°_{X(η̃)}^{−1} − Q_{X(η)}^{−1}‖ ≤ ‖Q̃°_{X(η̃)}^{−1} − Q̄_{X(η)}^{−1}‖ + ‖Q̄_{X(η)}^{−1} − Q_{X(η)}^{−1}‖,   (A.8)

where ‖Q̄_{X(η)}^{−1} − Q_{X(η)}^{−1}‖ = ‖(1/T) Σ_{t=1}^{T} {Z_t Σ_u^{−1} Z_t′ − E[Z_t Σ_u^{−1} Z_t′]}‖ = O_p(T^{−1/2}). Further,

‖Q̃°_{X(η̃)}^{−1} − Q̄_{X(η)}^{−1}‖ ≤ Q₁ + Q₂ + Q₃,   (A.9)

where, specifically, Q₁ = ‖T^{−1} Σ_{t=1}^{T} Z_t Σ_u^{−1} [Z°_t(η̃, η) − Z_t]′‖, Q₂ = ‖T^{−1} Σ_{t=1}^{T} Z_t [Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}] Z°_t(η̃, η)′‖ and, finally, Q₃ = ‖T^{−1} Σ_{t=1}^{T} [Z°_t(η̃, η) − Z_t] Σ̃_{u(η̃)}^{−1} Z°_t(η̃, η)′‖. In particular, Q₁ ≤ ‖Σ_u^{−1}‖ T^{−1} Σ_{t=1}^{T} ‖Z_t‖ ‖Z°_t(η̃, η) − Z_t‖, where, by invertibility of the VARMA process, E‖Z_t‖² = O(1). Further,

‖Z°_t(η̃, η) − Z_t‖ ≤ ‖R′‖ {‖Σ_{τ=0}^{t−1} [X_{t−τ}(η̃) − X_{t−τ}] ⊗ Λ_τ(η)′‖ + ‖Σ_{τ=t}^{∞} X_{t−τ} ⊗ Λ_τ(η)′‖},   (A.10)

with

E‖Σ_{τ=t}^{∞} X_{t−τ} ⊗ Λ_τ(η)′‖² ≤ Σ_{τ₁=t}^{∞} Σ_{τ₂=t}^{∞} |tr{Γ_X(τ₁ − τ₂) ⊗ [Λ_{τ₁}(η)′ Λ_{τ₂}(η)]}| = O(ρ^{2t}),   (A.11)

‖Σ_{τ=0}^{t−1} [X_{t−τ}(η̃) − X_{t−τ}] ⊗ Λ_τ(η)′‖ ≤ Σ_{τ=0}^{t−1} ‖X_{t−τ}(η̃) − X_{t−τ}‖ ‖Λ_τ(η)‖,   (A.12)

and ‖X_{t−τ}(η̃) − X_{t−τ}‖² = Σ_{j=0}^{p̄} ‖u_{t−j−τ}(η̃) − u_{t−j−τ}‖². Hence, ‖Z°_t(η̃, η) − Z_t‖ = O_p(T^{−1/2}) + O_p(ρ^t) and then Q₁ = O_p(T^{−1/2}). Using Proposition 4.1, we also show that Q₂ and Q₃ are both O_p(T^{−1/2}), since ‖Z°_t(η̃, η)‖ is O_p(1). Hence, ‖Q̃°_{X(η̃)}^{−1} − Q̄_{X(η)}^{−1}‖, ‖Q̃°_{X(η̃)}^{−1} − Q_{X(η)}^{−1}‖ and ‖Q̃°_{X(η̃)}^{−1} − Q_{X(η)}^{−1}‖₁ are all O_p(T^{−1/2}). Finally,

‖Q̃°_{X(η̃)} − Q_{X(η)}‖₁ = O_p(T^{−1/2}).   (A.13)

On the other hand,

‖Q̃_{X(η̃)}^{−1} − Q̃°_{X(η̃)}^{−1}‖₁ ≤ (1/T) Σ_{t=1}^{T} ‖Σ̃_{u(η̃)}^{−1}‖ {‖Z_t(η̃)‖ ‖Z_t(η̃) − Z°_t(η̃, η)‖ + ‖Z_t(η̃) − Z°_t(η̃, η)‖ ‖Z°_t(η̃, η)‖},   (A.14)

where, by Proposition 4.1, ‖Σ̃_{u(η̃)}^{−1}‖ = O_p(1). Further, by Lemma 2.2 of Kreiss and Franke (1992),

‖Z_t(η̃) − Z°_t(η̃, η)‖ ≤ ‖R‖ Σ_{τ=0}^{t−1} {‖X_{t−τ}(η̃) − X_{t−τ}‖ ‖Λ_τ(η̃)‖ + ‖X_{t−τ}‖ ‖Λ_τ(η̃) − Λ_τ(η)‖} = O_p(T^{−1/2}).   (A.15)
Then, ‖Z_t(η̃)‖ ≤ ‖Z_t(η̃) − Z°_t(η̃, η)‖ + ‖Z°_t(η̃, η)‖ = O_p(1). Hence, ‖Q̃_{X(η̃)}^{−1} − Q̃°_{X(η̃)}^{−1}‖ and ‖Q̃_{X(η̃)}^{−1} − Q̃°_{X(η̃)}^{−1}‖₁ are both O_p(T^{−1/2}). Therefore,

‖Q̃_{X(η̃)} − Q̃°_{X(η̃)}‖₁ = O_p(T^{−1/2}).   (A.16)
Proof of Theorem 4.1. Note that (4.21) can be rewritten as η̂ − η = Q̃_{X(η̃)} Ω̃°_{X(η̃)} + Q̃_{X(η̃)} [Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}], where Ω̃_{X(η̃)} = T^{−1} Σ_{t=1}^{T} Z_t(η̃) Σ̃_{u(η̃)}^{−1} u_t(η̃), Ω̃°_{X(η̃)} = T^{−1} Σ_{t=1}^{T} Z°_t(η̃, η) Σ̃_{u(η̃)}^{−1} u_t(η̃) and Ω̃•_{X(η̃)} = T^{−1} Σ_{t=1}^{T} Z°_t(η̃, η) Σ̃_{u(η̃)}^{−1} u_t(η). In addition, let Ω_{X(η)} = T^{−1} Σ_{t=1}^{T} Z_t Σ_u^{−1} u_t. Then, by the triangle inequality,

‖η̂ − η‖ ≤ ‖Q_{X(η)}‖₁ ‖Ω_{X(η)}‖ + ‖Q̃°_{X(η̃)} − Q_{X(η)}‖₁ ‖Ω̃•_{X(η̃)}‖ + ‖Q_{X(η)}‖₁ ‖Ω̃•_{X(η̃)} − Ω_{X(η)}‖ + ‖Q̃°_{X(η̃)}‖₁ ‖Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}‖ + ‖Q̃_{X(η̃)} − Q̃°_{X(η̃)}‖₁ ‖Ω̃°_{X(η̃)}‖,   (A.17)

where ‖Q_{X(η)}‖₁ = O_p(1), ‖Ω_{X(η)}‖ = O_p(T^{−1/2}), while ‖Q̃°_{X(η̃)} − Q_{X(η)}‖₁ and ‖Q̃_{X(η̃)} − Q̃°_{X(η̃)}‖₁ are both O_p(T^{−1/2}). In addition, set S₁ = ‖T^{−1} Σ_{t=1}^{T} Z_t Σ_u^{−1} [u_t(η) − u_t]‖, S₂ = ‖T^{−1} Σ_{t=1}^{T} Z_t [Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}] u_t(η)‖ and S₃ = ‖T^{−1} Σ_{t=1}^{T} [Z°_t(η̃, η) − Z_t] Σ̃_{u(η̃)}^{−1} u_t(η)‖. Then

‖Ω̃•_{X(η̃)} − Ω_{X(η)}‖ ≤ S₁ + S₂ + S₃.   (A.18)

Using the fact that ‖u_t − u_t(η)‖ = O_p(ρ^t n_T/T^{1/2}) and that ‖vec B‖ = ‖B‖, we show that

E S₁ ≤ ‖R‖ (1/T) Σ_{t=1}^{T} Σ_{τ=0}^{∞} ‖Λ_τ(η)‖ ‖Σ_u^{−1}‖ [E‖X_{t−τ}‖²]^{1/2} [E‖u_t(η) − u_t‖²]^{1/2} = O(n_T/T^{3/2}).   (A.19)

Moreover,

S₂ ≤ ‖(1/T) Σ_{t=1}^{T} Z_t [Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}] [u_t(η) − u_t]‖ + ‖(1/T) Σ_{t=1}^{T} Z_t [Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}] u_t‖,   (A.20)

where, as in (A.19), ‖(1/T) Σ_{t=1}^{T} Z_t [Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}] [u_t(η) − u_t]‖ = O_p(n_T/T²). Further, using Proposition 4.1,

‖(1/T) Σ_{t=1}^{T} Z_t [Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}] u_t‖ ≤ ‖Σ̃_{u(η̃)}^{−1} − Σ_u^{−1}‖ Σ_{τ=0}^{∞} ‖Λ_τ(η)‖ ‖(1/T) Σ_{t=1}^{T} u_t X_{t−τ}′‖ = O_p(T^{−1}),   (A.21)

since ‖(1/T) Σ_{t=1}^{T} u_t X_{t−τ}′‖ = O_p(T^{−1/2}), by the VARMA structure of y_t. Hence, S₂ = O_p(T^{−1}). Finally,

S₃ ≤ Ω¹_{Z(η̃)} + Ω²_{Z(η̃)},   (A.22)

where Ω¹_{Z(η̃)} = ‖T^{−1} Σ_{t=1}^{T} [Z°_t(η̃, η) − Z_t] Σ̃_{u(η̃)}^{−1} [u_t(η) − u_t]‖ and Ω²_{Z(η̃)} = ‖T^{−1} Σ_{t=1}^{T} [Z°_t(η̃, η) − Z_t] Σ̃_{u(η̃)}^{−1} u_t‖. Also,

Ω¹_{Z(η̃)} ≤ Ω¹¹_{Z(η̃)} + Ω¹²_{Z(η̃)} + Ω¹³_{Z(η̃)},   (A.23)

with

Ω¹¹_{Z(η̃)} = ‖(1/T) Σ_{t=1}^{T} R′ [Σ_{τ=t}^{∞} X_{t−τ} ⊗ Λ_τ(η)′] Σ̃_{u(η̃)}^{−1} [u_t(η) − u_t]‖,   (A.24)

Ω¹²_{Z(η̃)} = ‖(1/T) Σ_{t=1}^{T} R′ [Σ_{τ=0}^{t−1} (X_{t−τ}(η) − X_{t−τ}) ⊗ Λ_τ(η)′] Σ̃_{u(η̃)}^{−1} [u_t(η) − u_t]‖,   (A.25)

Ω¹³_{Z(η̃)} = ‖(1/T) Σ_{t=1}^{T} R′ [Σ_{τ=0}^{t−1} (X_{t−τ}(η̃) − X_{t−τ}(η)) ⊗ Λ_τ(η)′] Σ̃_{u(η̃)}^{−1} [u_t(η) − u_t]‖,   (A.26)

where X_t(η) = [1, v_t(η)′, y′_{t−1}, …, y′_{t−p̄}, u_{t−1}(η)′, …, u_{t−p̄}(η)′]′ and v_t(η) = y_t − u_t(η). Similarly, we have

Ω¹¹_{Z(η̃)} ≤ ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ ‖(1/T) Σ_{t=1}^{T} Σ_{τ=t}^{∞} Λ_τ(η) [u_t(η) − u_t] X_{t−τ}′‖ = O_p(n_T/T^{3/2}),   (A.27)
since E‖(1/T) Σ_{t=1}^{T} Σ_{τ=t}^{∞} Λ_τ(η) [u_t(η) − u_t] X_{t−τ}′‖ = O(n_T/T^{3/2}). Further,

Ω¹²_{Z(η̃)} ≤ ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ (1/T) Σ_{t=1}^{T} ‖u_t(η) − u_t‖ [Σ_{τ=0}^{t−1} ‖Λ_τ(η)‖²]^{1/2} [Σ_{τ=0}^{t−1} ‖X_{t−τ}(η) − X_{t−τ}‖²]^{1/2},   (A.28)

where ‖X_{t−τ}(η) − X_{t−τ}‖² = Σ_{j=0}^{p̄} ‖u_{t−j−τ}(η) − u_{t−j−τ}‖², with E‖u_{t−j−τ}(η) − u_{t−j−τ}‖ = O(ρ^{t−j−τ} n_T/T^{1/2}). Hence, ‖X_{t−τ}(η) − X_{t−τ}‖² = O_p(ρ^{2(t−τ)} n_T²/T) and then Σ_{τ=0}^{t−1} ‖X_{t−τ}(η) − X_{t−τ}‖² = O_p(n_T²/T). Therefore, Ω¹²_{Z(η̃)} is O_p(n_T²/T²). On the other hand,

Ω¹³_{Z(η̃)} ≤ ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ (1/T) Σ_{t=1}^{T} Σ_{τ=0}^{t−1} ‖Λ_τ(η)‖ ‖u_t(η) − u_t‖ ‖X_{t−τ}(η̃) − X_{t−τ}(η)‖,   (A.29)

with ‖X_{t−τ}(η̃) − X_{t−τ}(η)‖² = Σ_{j=0}^{p̄} ‖u_{t−j−τ}(η̃) − u_{t−j−τ}(η)‖². Therefore, we get Ω¹³_{Z(η̃)} = O_p(n_T/T²), since ‖u_t(η) − u_t‖ = O_p(ρ^t n_T/T^{1/2}) and ‖u_t(η̃) − u_t(η)‖ = O_p(T^{−1/2}). Hence, Ω¹_{Z(η̃)} = O_p(n_T/T^{3/2}). Further,

Ω²_{Z(η̃)} ≤ Ω²¹_{Z(η̃)} + Ω²²_{Z(η̃)} + Ω²³_{Z(η̃)},   (A.30)

with

Ω²¹_{Z(η̃)} = ‖(1/T) Σ_{t=1}^{T} R′ [Σ_{τ=t}^{∞} X_{t−τ} ⊗ Λ_τ(η)′] Σ̃_{u(η̃)}^{−1} u_t‖,   (A.31)

Ω²²_{Z(η̃)} = ‖(1/T) Σ_{t=1}^{T} R′ [Σ_{τ=0}^{t−1} (X_{t−τ}(η) − X_{t−τ}) ⊗ Λ_τ(η)′] Σ̃_{u(η̃)}^{−1} u_t‖,   (A.32)

Ω²³_{Z(η̃)} = ‖(1/T) Σ_{t=1}^{T} R′ [Σ_{τ=0}^{t−1} (X_{t−τ}(η̃) − X_{t−τ}(η)) ⊗ Λ_τ(η)′] Σ̃_{u(η̃)}^{−1} u_t‖.   (A.33)

Similarly,

Ω²¹_{Z(η̃)} ≤ ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ ‖(1/T) Σ_{t=1}^{T} Σ_{τ=t}^{∞} Λ_τ(η) u_t X_{t−τ}′‖ = O_p(T^{−1}),   (A.34)

since E‖(1/T) Σ_{t=1}^{T} Σ_{τ=t}^{∞} Λ_τ(η) u_t X_{t−τ}′‖ = O(T^{−1}). Hence, Ω²¹_{Z(η̃)} = O_p(T^{−1}). Further,

Ω²²_{Z(η̃)} ≤ ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ ‖(1/T) Σ_{τ=0}^{T−1} Σ_{t=1}^{T−τ} Λ_τ(η) u_{t+τ} [X_t(η) − X_t]′‖ = ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ ∆ = O_p(n_T/T^{3/2}),   (A.35)

since E∆ ≤ (1/T) Σ_{τ=0}^{T−1} Σ_{t=1}^{T−τ} ‖Λ_τ(η)‖ [E‖u_{t+τ}‖²]^{1/2} [E‖X_t(η) − X_t‖²]^{1/2} = O(n_T/T^{3/2}). Further,

Ω²³_{Z(η̃)} ≤ ‖R‖ ‖Σ̃_{u(η̃)}^{−1}‖ Σ_{τ=0}^{T−1} ‖Λ_τ(η)‖ ‖(1/T) Σ_{t=τ+1}^{T} u_t [X_{t−τ}(η̃) − X_{t−τ}(η)]′‖,   (A.36)

where ‖(1/T) Σ_{t=τ+1}^{T} u_t [X_{t−τ}(η̃) − X_{t−τ}(η)]′‖² = Σ_{j=0}^{p̄} ‖(1/T) Σ_{t=τ+1}^{T} u_t [u_{t−j−τ}(η̃) − u_{t−j−τ}(η)]′‖², with

‖(1/T) Σ_{t=τ+1}^{T} u_t [u_{t−τ}(η̃) − u_{t−τ}(η)]′‖ ≤ ∆₁ + ∆₂,   (A.37)

where

∆₁ = ‖(1/T) Σ_{t=τ+1}^{T} Σ_{ν=0}^{t−τ−1} u_t [e_{t−τ−ν}(n_T) − ũ_{t−τ−ν}(n_T)]′ [Λ_ν(η̃) − Λ_ν(η)]′‖ ≤ ∆₁₁ Σ_{ν=0}^{T−τ} ‖Λ_ν(η̃) − Λ_ν(η)‖,

∆₂ = ‖(1/T) Σ_{t=τ+1}^{T} Σ_{ν=0}^{t−τ−1} u_t [ẽ_{t−τ−ν}(n_T) − e_{t−τ−ν}(n_T)]′ Λ_ν(η̃)′‖ ≤ ∆₂₂ Σ_{ν=0}^{T−τ} ‖Λ_ν(η̃)‖,   (A.38)
with

∆₁₁ = (1/T) Σ_{t=τ+ν+1}^{T} ‖u_t [e_{t−τ−ν}(n_T) − ũ_{t−τ−ν}(n_T)]′‖ = O_p(n_T/T),   (A.39)

∆₂₂ = ‖(1/T) Σ_{t=τ+ν+1}^{T} u_t [η − η̃]′ R′ [X̃_{t−τ−ν}(n_T) ⊗ I_k]‖ = O_p(T^{−1}).   (A.40)

Therefore, using Lemma 2.2 of Kreiss and Franke (1992), ∆₁ = O_p(n_T/T^{3/2}) and ∆₂ = O_p(T^{−1}). It follows that ‖(1/T) Σ_{t=τ+1}^{T} u_t [u_{t−τ}(η̃) − u_{t−τ}(η)]′‖, ‖(1/T) Σ_{t=τ+1}^{T} u_t [X_{t−τ}(η̃) − X_{t−τ}(η)]′‖ and Ω²³_{Z(η̃)} are all O_p(T^{−1}). Hence, Ω²_{Z(η̃)} and S₃ are both O_p(T^{−1}). Thus, ‖Ω̃•_{X(η̃)} − Ω_{X(η)}‖ = O_p(T^{−1}). Likewise,
‖Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}‖ ≤ ‖R′‖ [Ω¹_{X(η̃)} + Ω²_{X(η̃)} + Ω³_{X(η̃)} + Ω⁴_{X(η̃)}],   (A.41)

with

Ω¹_{X(η̃)} = ‖vec{(1/T) Σ_{t=1}^{T} Σ_{τ=0}^{t−1} [Λ_τ(η̃) − Λ_τ(η)]′ Σ̃_{u(η̃)}^{−1} [u_t(η̃) − u_t] [X_{t−τ}(η̃) − X_{t−τ}]′}‖,   (A.42)

Ω²_{X(η̃)} = ‖vec{(1/T) Σ_{t=1}^{T} Σ_{τ=0}^{t−1} [Λ_τ(η̃) − Λ_τ(η)]′ Σ̃_{u(η̃)}^{−1} [u_t(η̃) − u_t] X_{t−τ}′}‖,   (A.43)

Ω³_{X(η̃)} = ‖vec{(1/T) Σ_{t=1}^{T} Σ_{τ=0}^{t−1} [Λ_τ(η̃) − Λ_τ(η)]′ Σ̃_{u(η̃)}^{−1} u_t [X_{t−τ}(η̃) − X_{t−τ}]′}‖,   (A.44)

Ω⁴_{X(η̃)} = ‖vec{(1/T) Σ_{t=1}^{T} Σ_{τ=0}^{t−1} [Λ_τ(η̃) − Λ_τ(η)]′ Σ̃_{u(η̃)}^{−1} u_t X_{t−τ}′}‖.   (A.45)

Using the same arguments as before, we see that Ω¹_{X(η̃)} = O_p(T^{−3/2}), while Ω²_{X(η̃)}, Ω³_{X(η̃)} and Ω⁴_{X(η̃)} are all O_p(T^{−1}). Thus, ‖Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}‖ = O_p(T^{−1}). Similarly, we show that ‖Ω̃°_{X(η̃)} − Ω_{X(η)}‖ = O_p(T^{−1}). Further, using the fact that

‖Ω̃_{X(η̃)}‖ ≤ ‖Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}‖ + ‖Ω̃°_{X(η̃)}‖ ≤ ‖Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}‖ + ‖Ω̃°_{X(η̃)} − Ω_{X(η)}‖ + ‖Ω_{X(η)}‖,   (A.46)

we conclude that ‖Ω̃_{X(η̃)}‖ = O_p(T^{−1/2}). Finally, η̂ − η = O_p(T^{−1/2}).

Proof of Theorem 4.2. Let the random vectors S̃_X(η̃) = T^{1/2} {Q̃_{X(η̃)} [Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}] + Q̃_{X(η̃)} Ω̃°_{X(η̃)}} and S_X(η) = T^{1/2} Q_{X(η)} Ω_{X(η)}. Then,

‖S̃_X(η̃) − S_X(η)‖ ≤ T^{1/2} {‖Q̃°_{X(η̃)} − Q_{X(η)}‖₁ ‖Ω̃•_{X(η̃)}‖ + ‖Q_{X(η)}‖₁ ‖Ω̃•_{X(η̃)} − Ω_{X(η)}‖ + ‖Q̃°_{X(η̃)}‖₁ ‖Ω̃_{X(η̃)} − Ω̃°_{X(η̃)}‖ + ‖Q̃_{X(η̃)} − Q̃°_{X(η̃)}‖₁ ‖Ω̃°_{X(η̃)}‖}.   (A.47)

Using Proposition 4.2 and Theorem 4.1, we show that ‖S̃_X(η̃) − S_X(η)‖ = O_p(T^{−1/2}). Therefore, by the central limit theorem for stationary processes (see Anderson, 1971, Section 7.7, Scott, 1973, Theorem 2 and Chung, 2001, Theorem 9.1.5) and the assumption of independence between u_t and Z_t, we have T^{1/2} Ω_{X(η)} →d N[0, Q_{X(η)}^{−1}] as T → ∞. Hence,

T^{1/2} (η̂ − η) = S̃_X(η̃) →d N[0, Q_{X(η)}] as T → ∞.   (A.48)
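As a standard implication of (A.48) (an illustration, not a step of the proof), and since Proposition 4.2 shows that Q̃_{X(η̃)} is consistent for Q_{X(η)}, approximate standard errors for the components of the estimator follow from the diagonal of that matrix,

se(η̂_i) ≈ T^{−1/2} [Q̃_{X(η̃)}]_{ii}^{1/2},   i = 1, …, dim(η),

with the usual normal critical values for approximate confidence intervals.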
Proof of Proposition 4.3. Note that (4.25) reduces to

u_t(η̂) = u_t(η̃) + Z°_t(η̃, η̂)′ [η̃ − η̂],   (A.49)

since ϵ_t(η̃, η̂) = ω_t(η̃) − Z_t(η̃)′ η̂ and ω_t(η̃) = u_t(η̃) + Z_t(η̃)′ η̃. Therefore, using Lemma 2.2 of Kreiss and Franke (1992), Proposition 4.1, (3.9) and Theorem 4.1, we get:

‖u_t(η̂) − u_t‖ ≤ ‖u_t(η̃) − u_t‖ + ‖Z°_t(η̃, η̂)‖ [‖η̃ − η‖ + ‖η̂ − η‖] = O_p(T^{−1/2}) + O_p(ρ^t n_T/T^{1/2}),   (A.50)

for t = p̄ + 1, …, T. Then, as in Proposition 4.1, we show that ‖Σ̃_{u(η̂)} − Σ_u‖ = O_p(T^{−1/2}).
References

Anderson, T.W., 1971. The Statistical Analysis of Time Series. John Wiley & Sons, New York.
Athanasopoulos, G., Vahid, F., 2008. VARMA versus VAR for macroeconomic forecasting. J. Bus. Econom. Statist. 26, 237–252.
Chung, K.L., 2001. A Course in Probability Theory, third ed. Academic Press, New York.
Deistler, M., Hannan, E.J., 1981. Some properties of the parameterization of ARMA systems with unknown order. J. Multivariate Anal. 11, 474–484.
Dhrymes, P.J., 1998. Time Series, Unit Roots, and Cointegration. Academic Press, San Diego, California, USA.
Dufour, J.-M., 2006. Monte Carlo tests with nuisance parameters: a general approach to finite-sample inference and nonstandard asymptotics in econometrics. J. Econometrics 133 (2), 443–477.
Dufour, J.-M., Jouini, T., 2005. Asymptotic distribution of a simple linear estimator for VARMA models in echelon form. In: Duchesne, P., Rémillard, B. (Eds.), Statistical Modeling and Analysis for Complex Data Problems. Kluwer, Springer-Verlag, Canada, pp. 209–240 (Chapter 11).
Dufour, J.-M., Jouini, T., 2006. Finite-sample simulation-based tests in VAR models with applications to Granger causality testing. J. Econometrics 135 (1–2), 229–254.
Dufour, J.-M., Jouini, T., 2010. Asymptotic distributions for some quasi-efficient estimators in echelon-form VARMA models. Discussion Paper, Department of Economics, McGill University.
Gallego, J.L., 2009. The exact likelihood function of a vector autoregressive moving average process. Statist. Probab. Lett. 79 (6), 711–714.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, Princeton, New Jersey.
Hannan, E.J., 1969a. The estimation of mixed moving average autoregressive systems. Biometrika 56 (3), 579–593.
Hannan, E.J., 1969b. The identification of vector mixed autoregressive-moving average systems. Biometrika 57, 223–225.
Hannan, E.J., 1970. Multiple Time Series. John Wiley & Sons, New York.
Hannan, E.J., 1976. The identification and parameterization of ARMAX and state space forms. Econometrica 44 (4), 713–723.
Hannan, E.J., 1979. The statistical theory of linear systems. In: Krishnaiah, P.R. (Ed.), Developments in Statistics, Vol. 2. Academic Press, New York, pp. 83–121.
Hannan, E.J., Deistler, M., 1988. The Statistical Theory of Linear Systems. John Wiley & Sons, New York.
Hannan, E.J., Kavalieris, L., 1984. Multivariate linear time series models. Adv. Appl. Probab. 16, 492–561.
Hannan, E.J., Kavalieris, L., 1986. Regression, autoregression models. J. Time Ser. Anal. 7 (1), 27–49.
Hannan, E.J., Kavalieris, L., Mackisack, M., 1986. Recursive estimation of linear systems. Biometrika 73 (1), 119–133.
Hannan, E.J., Rissanen, J., 1982. Recursive estimation of mixed autoregressive-moving average order. Biometrika 69 (1), 81–94; Errata, 1983, Biometrika 70, 303.
Hillmer, S.C., Tiao, G.C., 1979. Likelihood function of stationary multiple autoregressive moving average models. J. Amer. Statist. Assoc. 74 (367), 652–660.
Kascha, C., 2012. A comparison of estimation methods for vector autoregressive moving-average models. Econometric Rev. 31 (3), 297–324.
Koreisha, S.G., Pukkila, T.M., 1990. A generalized least-squares approach for estimation of autoregressive-moving-average models. J. Time Ser. Anal. 11 (2), 139–151.
Kreiss, J.-P., Franke, J., 1992. Bootstrapping stationary autoregressive moving-average models. J. Time Ser. Anal. 13 (4), 297–317.
Lütkepohl, H., 1987. Forecasting Aggregated Vector ARMA Processes. Springer-Verlag, Berlin.
Lütkepohl, H., 2001. Vector autoregressions. In: Baltagi, B. (Ed.), Companion to Theoretical Econometrics. Blackwell Companions to Contemporary Economics, Basil Blackwell, Oxford, UK, pp. 678–699 (Chapter 32).
Lütkepohl, H., 2005. New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin, Heidelberg, Germany.
Lütkepohl, H., 2006. Forecasting with VARMA models. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting, Vol. 1. Elsevier, Netherland, pp. 287–325 (Chapter 6).
Maravall, A., 1993. Stochastic linear trends: models and estimators. J. Econometrics 56, 5–37.
Mauricio, J.A., 2002. An algorithm for the exact likelihood of a stationary vector autoregressive-moving average model. J. Time Ser. Anal. 23 (4), 473–486.
Mauricio, J.A., 2006. Exact maximum likelihood estimation of partially nonstationary vector ARMA models. Comput. Statist. Data Anal. 50 (12), 3644–3662.
Metaxoglou, K., Smith, A., 2007. Maximum likelihood estimation of VARMA models using a state-space EM algorithm. J. Time Ser. Anal. 28 (5), 666–685.
Poskitt, D.S., Salau, M.O., 1995. On the relationship between generalized least squares and Gaussian estimation of vector ARMA models. J. Time Ser. Anal. 16 (6), 617–645.
Reinsel, G.C., 1997. Elements of Multivariate Time Series Analysis, second ed. Springer-Verlag, New York.
Reinsel, G.C., Basu, S., Yap, S.F., 1992. Maximum likelihood estimators in the multivariate autoregressive moving-average model from a generalized least squares viewpoint. J. Time Ser. Anal. 13 (2), 133–145.
Scott, D.J., 1973. Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach. Adv. Appl. Probab. 5, 119–137.
Tiao, G.C., Box, G.E.P., 1981. Modeling multiple time series with applications. J. Amer. Statist. Assoc. 76 (376), 802–816.