Economics Letters 62 (1999) 265–270
Efficient GMM and MD estimation of autoregressive models
Yangseon Kim (a), Hailong Qian (b), Peter Schmidt (a,*)
(a) Department of Economics, Michigan State University, East Lansing, MI 48824, USA
(b) School of Economics and Finance, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
Received 27 July 1998; accepted 4 September 1998
Abstract
This paper considers minimum distance estimation of the AR($p$) model. Given the first $p$ autocorrelations, higher-order autocorrelations are shown to be redundant. Thus, given non-normality, improvements to the normal quasi-MLE must depend on something other than autocorrelations. © 1999 Elsevier Science S.A. All rights reserved.
Keywords: Autoregressive model; GMM; Minimum distance
JEL classification: C13; C22
1. Introduction
In this paper, we consider minimum distance (MD) and generalized method of moments (GMM) estimation of the univariate AR($p$) process:
\[ y_t = \phi^0_1 y_{t-1} + \phi^0_2 y_{t-2} + \cdots + \phi^0_p y_{t-p} + \varepsilon_t, \tag{1} \]
where $\varepsilon_t$ is a white-noise process with mean zero and variance $\sigma^2_\varepsilon$; $\phi^0 \equiv (\phi^0_1, \ldots, \phi^0_p)'$ is the $p \times 1$ vector of parameters to be estimated; and $\phi^0$ satisfies the usual 'roots outside the unit circle' requirement so that the process $y_t$ is stationary. We consider the MD estimator that minimizes the optimal quadratic form in the difference $\hat\rho - \rho(\phi)$, where $\rho(\phi)$ is the vector of the first $m$ ($m \ge p$) population autocorrelations and $\hat\rho$ contains the corresponding sample autocorrelations. We show that this estimator is asymptotically equivalent to the GMM estimator that asserts uncorrelatedness of $y_{t-j}$ and $\varepsilon_t$, for $j = 1, \ldots, m$. We then show the following redundancy result: for $m > p$, the last $(m - p)$ moment conditions are redundant, in the sense that they do not increase efficiency. That is, the MD (or GMM) estimator based on all $m$ autocorrelations is no more efficient than the estimator using the first $p$ autocorrelations.
*Corresponding author. Tel.: +1-517-355-8381; fax: +1-517-432-1068. E-mail address: [email protected] (P. Schmidt)
0165-1765/99/$ – see front matter. PII: S0165-1765(99)00017-8
The novelty and importance of this result lie in the fact that it does not depend on normality. It is well known that the normal quasi-MLE relies only on the first p autocorrelations, and is efficient if normality holds. If normality does not hold, we expect there to be estimators that are more efficient than the normal quasi-MLE, and these estimators will rely on features of the data other than just the first p autocorrelations. However, our result indicates that, even under non-normality, there are no efficiency gains to be had from considering higher-order autocorrelations; any such efficiency gains must come from considering features of the data other than just the autocorrelations.
2. Notation and statement of the results
The parameters to be estimated are $\phi = (\phi_1, \ldots, \phi_p)'$, with true value $\phi^0$. For $m \ge p$, define
\[ y_t^{(m)} = (y_t, y_{t-1}, \ldots, y_{t-m})', \qquad Z_t^{(m)} = y_{t-1}^{(m-1)} = (y_{t-1}, y_{t-2}, \ldots, y_{t-m})'. \tag{2} \]
Consider the moment conditions $E[h(y_t^{(m)}, \phi^0)] = 0$, where
\[ h(y_t^{(m)}, \phi) = Z_t^{(m)} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) = \begin{bmatrix} y_{t-1} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \\ \vdots \\ y_{t-m} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \end{bmatrix}. \tag{3} \]
Define $V = E(Z_t^{(m)} Z_t^{(m)\prime})$, an $m \times m$ matrix with $V_{ij} = \gamma_{|i-j|}$, where $\gamma_s$ is the $s$-period autocovariance of the $y_t$. Let $\bar h_T(\phi) = T^{-1} \sum_{t=m+1}^{T} h(y_t^{(m)}, \phi)$. Then, we define $\hat\phi$ to be the GMM estimator based on the moment conditions $E[h(y_t^{(m)}, \phi^0)] = 0$ and using the optimal weighting matrix $(\sigma^2_\varepsilon V)^{-1}$; that is, $\hat\phi$ minimizes $\bar h_T(\phi)' V^{-1} \bar h_T(\phi)$. (In practice, a consistent estimator of $V$ would be used.)
We next wish to define the MD estimator $\tilde\phi$. For $m \ge p$, define $\hat\rho = (\hat\rho_1, \ldots, \hat\rho_m)'$ to be the vector of the first $m$ sample autocorrelations, so that $\hat\rho_j = \hat\gamma_j / \hat\gamma_0$ with $\hat\gamma_i \equiv T^{-1} \sum_{t=i+1}^{T} y_t y_{t-i}$. Let $\rho(\phi)$ be the corresponding $m \times 1$ vector of population autocorrelations implied by the parameter value $\phi$, with true value $\rho^0 \equiv \rho(\phi^0)$. Let $C$ be the asymptotic variance matrix of $\sqrt{T}(\hat\rho - \rho^0)$, as given by Bartlett's formula (see, for example, Brockwell and Davis, 1991, p. 221). Then the MD estimator $\tilde\phi$ minimizes $[\hat\rho - \rho(\phi)]' C^{-1} [\hat\rho - \rho(\phi)]$.
The following theorem establishes the asymptotic equivalence (in the sense relevant for our main result) of the GMM estimator and the minimum distance estimator.
Theorem 1. The GMM estimator $\hat\phi$ based on the moment condition (3) and the MD estimator $\tilde\phi$ based on the difference $(\hat\rho - \rho)$ have the same asymptotic distribution.
We next state our redundancy result, for the GMM estimator $\hat\phi$. By Theorem 1, this implies that the corresponding redundancy result holds for the MD estimator $\tilde\phi$.
Theorem 2. For $m > p$, let $\hat\phi$ be the GMM estimator based on the $m$ moment conditions in (3) above, and let $\check\phi$ be the GMM estimator based on just the first $p$ moment conditions. Then $\hat\phi$ and $\check\phi$ have the same asymptotic distribution; the last $(m - p)$ moment conditions are redundant.
3. Proof of the results
We begin by proving Theorem 2. The moment conditions are $E[h(y_t^{(m)}, \phi^0)] = 0$, as given explicitly in Eq. (3) above. It is easy to see (using the law of iterated expectations) that $h(y_t^{(m)}, \phi^0)$ is a white-noise sequence with variance matrix $\sigma^2_\varepsilon V$ (where, as above, $V_{ij} = \gamma_{|i-j|}$). Therefore, the optimal weighting matrix is the inverse of
\[ \lim_{T \to \infty} E[T \cdot \bar h_T(\phi^0) \bar h_T(\phi^0)'] = \sigma^2_\varepsilon V. \tag{4} \]
Next, define the expected derivative matrix $D$ ($m \times p$):
\[ D = \operatorname{plim} \frac{\partial \bar h_T(\phi^0)}{\partial \phi'} = E\, \frac{\partial h(y_t^{(m)}, \phi^0)}{\partial \phi'} = -E[Z_t^{(m)} Z_t^{(p)\prime}]. \tag{5} \]
Thus $D$, like $V$, has typical element $D_{ij} = -\gamma_{|i-j|}$, and indeed $D$ is just minus the submatrix composed of the first $p$ columns of $V$.
Finally, to compare the estimators using all ($\hat\phi$) and some ($\check\phi$) of the moment conditions, we partition $D$ and $V$:
\[ D = \begin{bmatrix} D_1 \\ D_2 \end{bmatrix}, \qquad V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \tag{6} \]
where $D_1$ and $V_{11}$ are $p \times p$; $D_2$ and $V_{21}$ are $(m - p) \times p$; $V_{12}$ is $p \times (m - p)$; and $V_{22}$ is $(m - p) \times (m - p)$. Then we have the standard results that $\sqrt{T}(\hat\phi - \phi^0) \to N(0, \Delta)$ with $\Delta = \sigma^2_\varepsilon (D' V^{-1} D)^{-1}$, and $\sqrt{T}(\check\phi - \phi^0) \to N(0, \Delta_1)$ with $\Delta_1 = \sigma^2_\varepsilon (D_1' V_{11}^{-1} D_1)^{-1}$. According to Theorem 1 of Breusch et al. (1998), a necessary and sufficient condition for $\Delta = \Delta_1$ is $D_2 = V_{21} V_{11}^{-1} D_1$. But this condition clearly holds in the present case, since $D_1 = -V_{11}$ and $D_2 = -V_{21}$.
We next wish to prove Theorem 1. Recall that $\hat\phi$ is the GMM estimator based on the moment condition $E[h(y_t^{(m)}, \phi^0)] = 0$. Equivalently, $\hat\phi$ is a MD estimator based on minimizing the optimal quadratic form in the sample quantity $\bar h_T(\phi)$; this is a legitimate interpretation since a MD estimator would consider the difference between $\bar h_T(\phi)$ and the population quantity $\operatorname{plim} \bar h_T(\phi^0) = 0$. Now define $\bar g_T(\phi) = \hat\gamma_0^{-1} \bar h_T(\phi)$, and let $\ddot\phi$ be the MD estimator that minimizes the optimal quadratic form in $\bar g_T(\phi)$. Then it is easy to show that $\hat\phi$ and $\ddot\phi$ have the same asymptotic distribution, since the expected derivative matrix and the optimal weighting matrix change by scale factors that cancel in the asymptotic variance formula. (In fact, with the estimated weighting matrices calculated in the usual way based on the same initial consistent estimate of $\phi$, $\hat\phi$ and $\ddot\phi$ would be numerically identical.) Therefore, it is sufficient to show that $\ddot\phi$ has the same asymptotic distribution as the MD estimator $\tilde\phi$.
As a matter of notation, let $\rho^*(\phi)$ and $\hat\rho^*$ be the $p \times 1$ leading subvectors of the $m \times 1$ vectors $\rho(\phi)$ and $\hat\rho$, respectively. Define the following matrices:
\[ A(\phi) = \begin{bmatrix}
1 - \phi_2 & -\phi_3 & -\phi_4 & \cdots & -\phi_{p-2} & -\phi_{p-1} & -\phi_p & 0 \\
-(\phi_1 + \phi_3) & 1 - \phi_4 & -\phi_5 & \cdots & -\phi_{p-1} & -\phi_p & 0 & 0 \\
-(\phi_2 + \phi_4) & -(\phi_1 + \phi_5) & 1 - \phi_6 & \cdots & -\phi_p & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
-\phi_{p-1} & -\phi_{p-2} & -\phi_{p-3} & \cdots & -\phi_3 & -\phi_2 & -\phi_1 & 1
\end{bmatrix}, \tag{7a} \]
\[ B(\phi) = \begin{bmatrix}
-\phi_p & -\phi_{p-1} & -\phi_{p-2} & \cdots & -\phi_2 & -\phi_1 \\
0 & -\phi_p & -\phi_{p-1} & \cdots & -\phi_3 & -\phi_2 \\
0 & 0 & -\phi_p & \cdots & -\phi_4 & -\phi_3 \\
\vdots & \vdots & \vdots & & \vdots & \vdots
\end{bmatrix}, \tag{7b} \]
\[ C(\phi) = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
-\phi_1 & 1 & 0 & 0 & \cdots & 0 & 0 \\
-\phi_2 & -\phi_1 & 1 & 0 & \cdots & 0 & 0 \\
-\phi_3 & -\phi_2 & -\phi_1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\cdots & \cdots & \cdots & \cdots & -\phi_1 & 1 & 0 \\
\cdots & \cdots & \cdots & \cdots & -\phi_2 & -\phi_1 & 1
\end{bmatrix}, \tag{7c} \]
\[ H(\phi) = \begin{bmatrix} A(\phi) & 0 \\ B(\phi) & C(\phi) \end{bmatrix}. \tag{7d} \]
These matrices arise naturally in considering the relationship between $\phi$ and $\rho(\phi)$. For example, the Yule–Walker equations state that, for $i = 1, \ldots, p$,
\[ \rho_i(\phi^0) - \phi^0_1 \rho_{i-1}(\phi^0) - \cdots - \phi^0_{i-1} \rho_1(\phi^0) - \phi^0_i - \phi^0_{i+1} \rho_1(\phi^0) - \cdots - \phi^0_p \rho_{p-i}(\phi^0) = 0, \tag{8} \]
which can be written as
\[ A(\phi^0) \rho^*(\phi^0) = \phi^0. \tag{9} \]
Furthermore, given $\phi^0$, $\rho^*(\phi^0)$ is uniquely determined (see, for example, Brockwell and Davis, 1991, p. 94); therefore, $A(\phi^0)$ is nonsingular. Similarly, for $i = p + 1, \ldots, m$ (with $m > p$), we have
\[ \rho_i(\phi^0) - \phi^0_1 \rho_{i-1}(\phi^0) - \cdots - \phi^0_p \rho_{i-p}(\phi^0) = 0, \tag{10} \]
and (8) and (10) can be written together as
\[ H(\phi^0) \rho(\phi^0) = b(\phi^0), \tag{11} \]
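where $b(\phi) = (\phi', 0_{1 \times (m-p)})'$. We note that $C(\phi^0)$ is nonsingular (because it is a lower triangular matrix with nonzero diagonal elements) and therefore $H(\phi^0)$ is nonsingular (because it is a lower block triangular matrix with nonsingular blocks).
We now return to the problem of showing that $\ddot\phi$ and $\tilde\phi$ have the same asymptotic distribution. Recall that $\ddot\phi$ is the optimal MD estimator based on
\begin{align}
\bar g_T(\phi) &= \hat\gamma_0^{-1} \bar h_T(\phi), \tag{12a} \\
&= \hat\gamma_0^{-1} \cdot T^{-1} \sum_{t=m+1}^{T} \begin{bmatrix}
y_{t-1} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \\
\vdots \\
y_{t-p} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \\
y_{t-(p+1)} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \\
\vdots \\
y_{t-m} (y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p})
\end{bmatrix}, \tag{12b} \\
&= \frac{1}{\hat\gamma_0} \begin{bmatrix}
\hat\gamma_1 - \phi_1 \hat\gamma_0 - \cdots - \phi_p \hat\gamma_{p-1} \\
\vdots \\
\hat\gamma_p - \phi_1 \hat\gamma_{p-1} - \cdots - \phi_p \hat\gamma_0 \\
\hat\gamma_{p+1} - \phi_1 \hat\gamma_p - \cdots - \phi_p \hat\gamma_1 \\
\vdots \\
\hat\gamma_m - \phi_1 \hat\gamma_{m-1} - \cdots - \phi_p \hat\gamma_{m-p}
\end{bmatrix} + O_p(T^{-1}), \tag{12c} \\
&= \begin{bmatrix}
\hat\rho_1 - \phi_1 - \cdots - \phi_p \hat\rho_{p-1} \\
\vdots \\
\hat\rho_p - \phi_1 \hat\rho_{p-1} - \cdots - \phi_p \\
\hat\rho_{p+1} - \phi_1 \hat\rho_p - \cdots - \phi_p \hat\rho_1 \\
\vdots \\
\hat\rho_m - \phi_1 \hat\rho_{m-1} - \cdots - \phi_p \hat\rho_{m-p}
\end{bmatrix} + O_p(T^{-1}), \tag{12d} \\
&= [H(\phi) \hat\rho - b(\phi)] + O_p(T^{-1}). \tag{12e}
\end{align}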
(The $O_p(T^{-1})$ term arises because the sums in (12b) run over $t$ from $m + 1$ to $T$, while the sum in the definition of $\hat\gamma_j$ runs over $t$ from $j + 1$ to $T$. The finite number of omitted terms, from $t = j + 1$ to $m$, is $O_p(T^{-1})$ once divided by $T$.) Substituting (11), evaluated at $\phi$, into (12e), we obtain
\[ \bar g_T(\phi) = H(\phi)[\hat\rho - \rho(\phi)] + O_p(T^{-1}). \tag{13} \]
Thus the optimal MD estimator $\ddot\phi$ based on $\bar g_T(\phi)$ has the same asymptotic distribution as the optimal MD estimator based on $H(\phi)[\hat\rho - \rho(\phi)]$.¹ Finally, it is well known (and easy to verify) that, for nonsingular $H(\phi)$, the optimal MD estimator $\ddot\phi$ based on $H(\phi)[\hat\rho - \rho(\phi)]$ has the same asymptotic distribution as the optimal MD estimator $\tilde\phi$ based on $[\hat\rho - \rho(\phi)]$. Thus $\hat\phi$ and $\tilde\phi$ have the same asymptotic distribution, because each has the same asymptotic distribution as $\ddot\phi$.
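Two algebraic facts carry the proofs in this section: the identity $H(\phi)\rho(\phi) = b(\phi)$ in (11), and the redundancy condition $D_2 = V_{21} V_{11}^{-1} D_1$. The sketch below checks both numerically for one stationary AR(3) example. It is an illustration under stated assumptions rather than part of the original argument: the parameter values, the helper names `ar_autocovariances` and `build_H`, and the truncation length of the MA($\infty$) expansion are all choices made here, and $\sigma^2_\varepsilon$ is set to 1. The autocovariances are computed from the MA($\infty$) representation so that the check of (11) does not simply reuse the Yule–Walker equations.

```python
import numpy as np

def ar_autocovariances(phi, n_lags, sigma2=1.0, n_terms=2000):
    """gamma_0, ..., gamma_{n_lags} from a truncated MA(infinity) expansion."""
    p = len(phi)
    psi = np.zeros(n_terms)
    psi[0] = 1.0
    for j in range(1, n_terms):
        k = min(j, p)
        # psi_j = phi_1 psi_{j-1} + ... + phi_k psi_{j-k}
        psi[j] = phi[:k] @ psi[j - k:j][::-1]
    return sigma2 * np.array(
        [psi[:n_terms - s] @ psi[s:] for s in range(n_lags + 1)]
    )

def build_H(phi, m):
    """H(phi) of (7d): row i collects the coefficients on rho_1..rho_m in
    rho_i - sum_k phi_k rho_{|i-k|}; the k = i term (rho_0 = 1) goes into b."""
    p = len(phi)
    H = np.eye(m)
    for i in range(1, m + 1):
        for k in range(1, p + 1):
            j = abs(i - k)
            if j >= 1:
                H[i - 1, j - 1] -= phi[k - 1]
    return H

phi = np.array([0.5, -0.3, 0.1])  # assumed values; sum of |phi_k| < 1, so stationary
p, m = len(phi), 6

gamma = ar_autocovariances(phi, m)
rho = gamma[1:] / gamma[0]                    # rho_1, ..., rho_m
H = build_H(phi, m)
b = np.concatenate([phi, np.zeros(m - p)])
resid_11 = H @ rho - b                        # should vanish by (11)

# V_ij = gamma_{|i-j|}, and D is minus the first p columns of V.
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
V = gamma[lags]
D = -V[:, :p]
D1, D2 = D[:p], D[p:]
V11, V21 = V[:p, :p], V[p:, :p]

cond_gap = D2 - V21 @ np.linalg.solve(V11, D1)             # redundancy condition
Delta = np.linalg.inv(D.T @ np.linalg.solve(V, D))         # all m moments
Delta1 = np.linalg.inv(D1.T @ np.linalg.solve(V11, D1))    # first p moments

print(np.max(np.abs(resid_11)), np.max(np.abs(cond_gap)))
print(np.max(np.abs(Delta - Delta1)))
```

All three printed discrepancies should be at the level of floating-point rounding, which is what (11) and Theorem 2 predict for any stationary choice of $\phi$ and any $m > p$.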
¹ More precisely, we should multiply (13) by $\sqrt{T}$ so that $\sqrt{T}\, \bar g_T(\phi)$ and $H(\phi) \cdot \sqrt{T} [\hat\rho - \rho(\phi)]$ are $O_p(1)$. The last term is then $O_p(T^{-1/2})$, and hence asymptotically irrelevant.

4. Concluding remarks
We have considered MD estimation of the parameters of the AR($p$) model, based on the distance between the first $m$ ($m \ge p$) theoretical and sample autocorrelations. Our main result is that the last $(m - p)$ autocorrelations are redundant, in the sense that they do not increase asymptotic efficiency of estimation. This result does not depend on normality, and shows that any estimator that is more efficient than the normal quasi-MLE under non-normality must exploit some feature of the data other than the autocorrelations.
No similar result appears to hold for MA or ARMA processes. For example, for the MA(1) model, it is possible to use the results of Breusch et al. (1998) to show that, given the first $m$ autocorrelations (with $m$ arbitrary), autocorrelations $m + 1$ and $m + 2$ are not redundant. Thus no finite number of autocorrelations will lead to asymptotic efficiency. This negative conclusion is also supported by the calculations of Chung (1997).
Acknowledgements
The second author acknowledges financial support from the Faculty of Commerce and Administration of Victoria University of Wellington. The third author gratefully acknowledges financial support from the National Science Foundation.
References
Breusch, T., Qian, H., Schmidt, P., Wyhowski, D., 1998. Redundancy of moment conditions. Journal of Econometrics (forthcoming).
Brockwell, P., Davis, R., 1991. Time Series: Theory and Methods, 2nd ed. Springer, New York.
Chung, H., 1997. Minimum distance estimation for ARMA and GARCH processes. Unpublished Ph.D. dissertation, Michigan State University.