Mathematics and Computers in Simulation 39 (1995) 265-271
Estimating the rank of co-integration when the order of a vector autoregression is unknown
Kimio Morimune *, Akihisa Mantani
Institute of Economic Research, Kyoto University, Sakyo-ku, Kyoto 606, Japan
1. Introduction
In this paper, we study the small sample properties of the maximum likelihood tests for determining the rank of co-integration that were developed for vector autoregressive (VAR) processes with unit roots (Johansen [3] and [4]). The lag order of the VAR process is first determined by one of AIC, BIC [10], the likelihood ratio (LR) test [6], the Wald test [9], or the t-test [6]; Johansen's LR test is then applied. Among the order selection procedures for autoregressive processes, AIC is known to be inconsistent [11], whereas BIC is consistent [10] for univariate stationary processes. This result was extended to multivariate non-stationary autoregressions [7]. Lütkepohl [5] studied the small sample properties of several criteria for stationary VAR processes, calculating the average frequency of choosing the correct lag order across one thousand processes. Morimune and Mantani [6] repeated the simulation ten thousand times for each process and derived the empirical null distributions of the criteria. Toda [12] studied the finite sample properties of the LR tests for co-integration when the lag order of a VAR process with a constant term is known. See also [8] and [13]. The model and test procedures are explained in Section 2. The design of the experiments and the simulation results are summarized in Section 3.
* Corresponding author. © 1995 Elsevier Science B.V. All rights reserved.
2. Model and test procedures
The $p$-dimensional vector autoregressive process $\{y_t\}$ is generated by

$$ y_t = \sum_{i=1}^{k+1} B_i\, y_{t-i} + u_t, \qquad t = k+2, \ldots, n, \tag{1} $$
where $\{u_t\}$ is an innovation sequence with covariance matrix $\Sigma$. The analysis of co-integration, particularly in [3], is developed for this VAR model, which does not include a constant term. Model (1) leads to the error correction representation

$$ \Delta y_t = \Pi\, y_{t-1} - \sum_{i=1}^{k} \Gamma_i\, \Delta y_{t-i} + u_t, \qquad t = k+2, \ldots, n, \tag{2} $$

where $\Pi = \Gamma_0 - I$ and $\Gamma_i = \Gamma_{i+1} + B_{i+1}$, $i = 0, 1, \ldots, k$, with $\Gamma_{k+1} = 0$. Let $r$ be the rank of $\Pi$, which is less than $p$; then $\Pi$ can be written as $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $p \times r$ matrices of rank $r$. Johansen's maximum likelihood method of finding the rank $r$ depends on the lag order of the VAR process, since the error term is serially correlated if the lag order is understated. The lag order can be determined by the Wald test [9], the likelihood ratio test, the t-test, or selection criteria such as AIC and BIC. The null hypothesis for the last lag order is $H_0: B_{k+1} = 0$, and the alternative is $H_a: B_{k+1} \neq 0$. The Wald test statistic is
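$$ W = \operatorname{vec}(\hat B_{k+1})' \left( \hat\Sigma_{k+1} \otimes \left( X_{k+1}' \bar P_{X_0} X_{k+1} \right)^{-1} \right)^{-1} \operatorname{vec}(\hat B_{k+1}) \tag{3} $$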
where $\operatorname{vec}(\hat B_{k+1})$ is a column vector stacking the $p$ row vectors of the least squares estimator of $B_{k+1}$; $X_{k+1}$ is the $(n-k-1) \times p$ matrix of observed values of $y_{t-k-1}$, $t = k+2, \ldots, n$; $X_0$ is the $(n-k-1) \times pk$ matrix of observed values of $(y_{t-1}', \ldots, y_{t-k}')$, $t = k+2, \ldots, n$, which are the explanatory variables under $H_0$; $\bar P_{X_0} = I - X_0 (X_0' X_0)^{-1} X_0'$; and $\hat\Sigma_{k+1}$ is the sum of squared vector residuals of regression (1) or (2) divided by $n - p(1+k)$. Under $H_0$, $W$ is asymptotically distributed as $\chi^2$ with $p^2$ degrees of freedom. The LR test statistic is $LR = n \ln(\det\hat\Sigma_k / \det\hat\Sigma_{k+1})$, which has the same asymptotic distribution as $W$ under $H_0$.

The coefficients of the highest lag term can also be tested individually by t-ratios [6]. This t-test should also be applied sequentially, starting from the highest lag order: t-ratios of coefficients of the same lag order are correlated with each other but independent of the t-ratios of higher lag orders. Moreover, if the size of the test for a particular order is $c$, the size of each t-test should be $c/p^2$, since there are $p^2$ t-ratios of the same lag order. (These $p^2$ t-ratios are correlated, but this may not cause much bias in the size of the test.) Further, AIC is defined as $\mathrm{AIC}(k) = n \ln(\det\hat\Sigma_k) + 2(p^2 k + p)$, and the Schwarz criterion as $\mathrm{BIC}(k) = n \ln(\det\hat\Sigma_k) + \ln(n)(p^2 k + p)$.

The sequential testing procedures must start from the highest lag order, as explained in [1]. Researchers test the hypotheses $H_{k+1}: B_{k+1} = 0$, $H_k: B_k = 0$, and so on, until one of the null hypotheses is rejected. The starting and ending lag orders must be determined prior to testing, and the overall size of the procedure is not the sum of the sizes assigned to each lag order up to the ending lag order but $1 - (1-c)^m$, where $m$ is the number of lag orders tested.
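The statistics above can be computed directly from least squares fits. The sketch below is a minimal numpy version, assuming no constant term and an $(n \times p)$ data array `y`; the helper names `lagged`, `resid`, `lag_order_stats`, `info_criteria` and `overall_size` are hypothetical. It obtains $\hat B_{k+1}$ by partialling $X_0$ out of $X_{k+1}$ (Frisch-Waugh-Lovell), uses the divisor $n - p(1+k)$ for $\hat\Sigma_{k+1}$ in the Wald statistic as stated above, and, as our own choice, computes AIC and BIC for all orders on a common sample that starts after `kmax` lags.

```python
import numpy as np

def lagged(y, order, start):
    """Y stacks y_t for t >= start (0-based); X stacks (y_{t-1}', ..., y_{t-order}')."""
    Y = y[start:]
    if order == 0:
        return Y, np.empty((len(Y), 0))
    X = np.hstack([y[start - i: len(y) - i] for i in range(1, order + 1)])
    return Y, X

def resid(Y, X):
    """Least squares residuals of Y on X (Y itself when X has no columns)."""
    if X.shape[1] == 0:
        return Y
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

def lag_order_stats(y, k):
    """Wald statistic (3) and LR statistic for H0: B_{k+1} = 0, no constant term."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    Y, X = lagged(y, k + 1, start=k + 1)        # t = k+2, ..., n in 1-based terms
    X0, Xk1 = X[:, :p * k], X[:, p * k:]        # lags 1..k and lag k+1
    E1, E0 = resid(Y, X), resid(Y, X0)          # unrestricted / restricted residuals
    Xk1_perp = resid(Xk1, X0)                   # Pbar_{X0} X_{k+1}
    Q = Xk1_perp.T @ Xk1_perp                   # X_{k+1}' Pbar_{X0} X_{k+1}
    B_hat = np.linalg.solve(Q, Xk1_perp.T @ Y).T  # row i holds equation i's coefficients
    Sigma = E1.T @ E1 / (n - p * (1 + k))       # divisor as stated in the text
    b = B_hat.reshape(-1)                       # stacks the p rows of B_hat
    V = np.kron(Sigma, np.linalg.inv(Q))        # covariance of the stacked rows
    W = float(b @ np.linalg.solve(V, b))
    LR = float(n * np.log(np.linalg.det(E0.T @ E0) / np.linalg.det(E1.T @ E1)))
    return W, LR

def info_criteria(y, kmax):
    """(AIC(k), BIC(k)) for k = 1..kmax on the common sample t = kmax+1, ..., n."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    out = {}
    for k in range(1, kmax + 1):
        Y, X = lagged(y, k, start=kmax)
        E = resid(Y, X)
        logdet = np.log(np.linalg.det(E.T @ E / len(Y)))
        npar = p * p * k + p
        out[k] = (n * logdet + 2 * npar, n * logdet + np.log(n) * npar)
    return out

def overall_size(c, m):
    """Overall size of m sequential tests of size c each, assuming independence."""
    return 1 - (1 - c) ** m
```

A sequential search from a maximum order would call `lag_order_stats(y, k)` for decreasing `k` and stop at the first rejection, comparing against the $\chi^2_{p^2}$ critical value (about 13.28 for $p = 2$ at one percent). For example, `overall_size(0.01, 8)` is about 0.077.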
Once the order of the VAR process is determined, Johansen's LR test is applied to find the rank of co-integration. To save space, we use only the trace test for this purpose. Let $R_{0t}$ and $R_{1t}$ denote the residuals from the regressions of $\Delta y_t$ and $y_{t-1}$, respectively, on all lagged differenced variables, and define $S_{ij} = (1/n) \sum_{t=k+2}^{n} R_{it} R_{jt}'$, $i, j = 0, 1$. Next, the roots of the characteristic equation $\det(\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}) = 0$ are calculated and denoted $\hat\lambda_1 \ge \hat\lambda_2$. The trace test statistic for the hypothesis that there are at most $r$ co-integrating vectors, and thus $s = 2 - r$ unit roots (equivalently, $s$ zero characteristic roots), is

$$ \lambda_{\text{trace}}(r) = -n \sum_{i=r+1}^{2} \ln(1 - \hat\lambda_i). $$

The asymptotic distribution of this test statistic depends only on $s$; the 5% critical value is 3.84 for $s = 1$ and 12.53 for $s = 2$. (See [2]. Neither the DGP nor the model includes a constant term.)
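A minimal numpy sketch of the trace test just described, assuming no deterministic terms and an $(n \times p)$ array `y`; `partial_out`, `johansen_trace`, `select_rank` and `CRIT_5PCT` are hypothetical names. The roots of $\det(\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}) = 0$ are found as eigenvalues of the symmetric matrix $L^{-1} S_{10} S_{00}^{-1} S_{01} L'^{-1}$, where $S_{11} = LL'$; dividing the $S_{ij}$ by the number of usable observations rather than $n$ leaves the roots unchanged. The sequential rule in `select_rank` (accept the first $r$ whose hypothesis is not rejected) is the usual practice, assumed here rather than taken from the text.

```python
import numpy as np

# 5% critical values quoted in the text, keyed by s = p - r (p = 2, no constant).
CRIT_5PCT = {1: 3.84, 2: 12.53}

def partial_out(Z, X):
    """Residuals from regressing the columns of Z on X (Z itself if X is None)."""
    if X is None:
        return Z
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return Z - X @ coef

def johansen_trace(y, k):
    """Ordered roots and trace statistics for (2) with k lagged differences."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    dy = np.diff(y, axis=0)                      # dy[j] = y[j+1] - y[j]
    t = np.arange(k + 1, n)                      # 0-based t, i.e. t = k+2, ..., n
    Z0 = dy[t - 1]                               # Delta y_t
    Z1 = y[t - 1]                                # y_{t-1}
    Z2 = np.hstack([dy[t - 1 - i] for i in range(1, k + 1)]) if k > 0 else None
    R0, R1 = partial_out(Z0, Z2), partial_out(Z1, Z2)
    m = len(t)                                   # scaling does not affect the roots
    S00, S01, S11 = R0.T @ R0 / m, R0.T @ R1 / m, R1.T @ R1 / m
    A = S01.T @ np.linalg.solve(S00, S01)        # S10 S00^{-1} S01 (symmetric)
    L = np.linalg.cholesky(S11)                  # S11 = L L'
    M = np.linalg.solve(L, np.linalg.solve(L, A).T)   # L^{-1} A L'^{-1}
    lam = np.sort(np.linalg.eigvalsh((M + M.T) / 2))[::-1]
    trace = np.array([-n * np.log(1 - lam[r:]).sum() for r in range(p)])
    return lam, trace

def select_rank(trace, crit):
    """Test r = 0, 1, ... in turn; return the first r that is not rejected."""
    p = len(trace)
    for r in range(p):
        if trace[r] < crit[p - r]:
            return r
    return p
```

With $p = 2$, `select_rank(trace, CRIT_5PCT)` returns 0, 1 or 2 using the 5% critical values quoted above.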
3. Design and results of experiments
We explain the results of simulations for representative cases. The first data generating process (DGP) is the two-dimensional first-order vector autoregression $y_t = B_1 y_{t-1} + u_t$, denoted VAR(1), where the error term is independently distributed as $N_2(0, \Sigma)$. See [12] for a canonical form; our experiments do not cover all possible cases. In particular, the variances of the error terms are restricted to one in this report. The simulation was repeated five thousand times for each process, and the search for the lag order of the autoregression started from the eighth lag. Each of the three sequential tests has a nominal size of about one percent at each lag and about eight percent overall. The time series were recorded after discarding the first one hundred observations. Table 1 gives the results of the lag order tests and the subsequent trace test for the five criteria when the sample size is fifty. Simulation results of the trace test are also tabulated for sample sizes of one hundred and four hundred. The lag order of the VAR process is determined by the five criteria, but the real sizes of the t and Wald tests are about ten and twelve percent, respectively, which exceed the nominal size. The real size of the LR test is about six percent, which is smaller than the nominal size. BIC chooses the correct lag order almost perfectly. AIC chooses the second and higher lag orders with non-negligible frequencies. We repeat here our conclusions from [6] on the order determination criteria: (1) BIC is very accurate when the highest lag order term has large coefficient values. (2) LR gives the most accurate real size among the three test procedures.
Table 1 Lag order test and the effect of the sample size on the trace test in VAR(1) process
W t LR BIC AIC
lag order selection (n = 50) 1 2 3 4 5 6 7 8
rank (n = 50) 0 1 2
rank (n = 100) 0 1 2
rank (n = 400) 0 1 2
88 90 94 99 82
2 2 1 0 1
1% 2% 0% 0% 1%
77 78 78 79 76
35 35 35 34 35
0 0 0 0 0
1 1 1 1 9
2 1 1 0 3
2 1 1 0 2
2 1 1 0 1
2 2 1 0 1
DGP: B_1 = diag(1, 0.8), Σ = I, 5,000 replications.
20 19 19 18 21
3% 3% 3% 3% 3%
60 59 60 60 60
5% 6% 5% 6% 5%
94 94 94 94 94
6% 6% 6% 6% 6%
Table 2 Lag order test and the effect of the sample size on the trace test in VAR(2) process.
W t LR BIC AIC
lag order selection (n = 50) 1 2 3 4 5 6 7 8
rank (n = 50) 0 1 2
rank (n = 100) 0 1 2
rank (n = 400) 0 1 2
76 76 85 92 51
2 2 1 0 1
2 2 1 0 1
2% 2% 0% 0% 1%
74 75 78 81 65
26 25 28 33 16
0 0 0 0 0
14 15 10 8 35
2 1 1 0 7
2 1 1 0 3
2 1 1 0 2
22 22 19 16 31
4% 3% 4% 3% 4%
68 69 67 61 78
6% 6% 5% 6% 6%
94 94 94 94 94
6% 6% 6% 6% 6%
DGP: B_1 = diag(1, 1), B_2 = diag(0, -0.25), Σ = I, 5,000 replications.
(3) The t-test is useful since it tests the significance of individual coefficients in the highest order term. (4) The Wald test is not useful because it is less accurate than LR while serving the same purpose. (5) AIC is the least parsimonious of the five, and it can be used to find the highest plausible lag. Table 1 reconfirms (1) to (5). Further, differences in lag order selection affect the trace test, but the differences among the five procedures are much smaller in the trace test than in the lag order test. The frequency of selecting AR(1) by BIC is seventeen percentage points higher than the lowest frequency, which is given by AIC. However, the BIC frequency of choosing rank 1 in the trace test is three percentage points lower than the AIC frequency when the sample size is fifty. This difference in the trace test disappears when the sample size is one hundred. Therefore, as long as we confine ourselves to the five criteria analyzed in this study, the choice of lag order selection criterion has little influence on the trace test: the criteria can give different lag orders, but the results of the trace test may be very close. This becomes more evident as the sample size increases. It can also be seen that the frequency of choosing the correct rank of co-integration converges to the level implied by the nominal size as the sample size increases, but one hundred observations are not enough to support the asymptotic distribution of the trace test. Table 2 shows the effect of a stationary second-order coefficient in the second component process. The first process is a simple random walk, but the second process is a stationary AR(2). The lag order chosen by the criteria is mostly AR(1), because the second order coefficient of the stationary process is small. All criteria simply neglect the second order coefficient when the sample size is fifty. The t-test does not find the second order coefficient significant either.
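For the DGP of Tables 2 and 3 the recursion given with (2) can be checked numerically. The numpy sketch below (the helper names `ecm_matrices` and `companion_eigenvalues` are hypothetical) maps $B_1, \ldots, B_{k+1}$ to $\Pi$ and $\Gamma_1, \ldots, \Gamma_k$; for $B_1 = \mathrm{diag}(1, 1)$ and $B_2 = \mathrm{diag}(0, -0.25)$ it gives $\Pi = \mathrm{diag}(0, -0.25)$, of rank one. The companion matrix has one unit eigenvalue, from the random walk, and a double eigenvalue of 0.5 from the stationary AR(2) component, since $z^2 - z + 0.25 = (z - 0.5)^2$.

```python
import numpy as np

def ecm_matrices(B):
    """Map B_1, ..., B_{k+1} in (1) to Pi and Gamma_1, ..., Gamma_k in (2)."""
    B = [np.asarray(B_i, dtype=float) for B_i in B]
    K, p = len(B), B[0].shape[0]            # K = k + 1
    Gamma = [None] * (K + 1)
    Gamma[K] = np.zeros((p, p))              # Gamma_{k+1} = 0
    for i in range(K - 1, -1, -1):
        Gamma[i] = Gamma[i + 1] + B[i]       # Gamma_i = Gamma_{i+1} + B_{i+1}
    Pi = Gamma[0] - np.eye(p)                # Pi = Gamma_0 - I
    return Pi, Gamma[1:K]                    # Gamma_1, ..., Gamma_k

def companion_eigenvalues(B):
    """Eigenvalues of the VAR companion matrix; unit-modulus ones are unit roots."""
    B = [np.asarray(B_i, dtype=float) for B_i in B]
    K, p = len(B), B[0].shape[0]
    C = np.zeros((p * K, p * K))
    C[:p, :] = np.hstack(B)                  # first block row: [B_1 ... B_{k+1}]
    C[p:, :-p] = np.eye(p * (K - 1))         # identity blocks shift the lags
    return np.linalg.eigvals(C)

B = [np.diag([1.0, 1.0]), np.diag([0.0, -0.25])]   # DGP of Tables 2 and 3
Pi, Gamma = ecm_matrices(B)
print(Pi)                                    # diag(0, -0.25)
print(np.linalg.matrix_rank(Pi))             # 1
print(np.sort(np.abs(companion_eigenvalues(B)))[::-1])   # approximately [1, 0.5, 0.5, 0]
```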
The Wald and t procedures give similar lag order selections, and LR performs worse than Wald and t. BIC is the worst of the five, consistent with its well-known tendency to under-fit. AIC seems to perform the best, but this may only reflect its well-known over-fitting property. AIC gives the most augmented process, and the process to be chosen should, at least on average, be less parsimonious than the AIC result. The trace test, however, presents quite a different picture: the five criteria do not give very different results even when the sample size is fifty. In fact, the selection frequency of the correct rank of co-integration is worst for BIC, then LR, and best for AIC, but the effect of lag order selection is much smaller than the lag order selection frequencies would suggest. It is also natural that the frequency of zero rank of co-integration is relatively high for
Table 3 Lag order test and trace test in VAR(2) process when n is 400
W t LR BIC AIC
lag order selection (n = 400) 1 2 3 4 5 6 7 8
rank (n = 400) 0 1 2
3 2 3 30 1
1 1 1 0 1
1 1 1 0 0
1 1 1 0 0
1% 1% 1% 0% 0%
0 0 0 0 0
6% 6% 6% 6% 6%
91 92 91 70 87
1 1 1 0 9
1 1 1 0 2
94 94 94 94 94
DGP: B_1 = diag(1, 1), B_2 = diag(0, -0.25), Σ = I, 5,000 replications.
all criteria because the process includes only the $B_1$ matrix when the second-order term is eliminated; both variables then follow random walks. The difference among the five criteria decreases as the sample size increases, and BIC is no longer clearly inferior when the sample size is one hundred. The zero rank is chosen less frequently because the correct lag order is chosen more frequently (not shown in the table). BIC and the three testing procedures give rather similar results, and AIC performs the best. The difference in the trace test disappears, and the zero rank is not chosen, when the sample size is four hundred: once AR(2) is selected, the second variable can be found to be stationary. Table 3 concerns the same process as Table 2, but gives the details of lag order selection when the sample size is four hundred. The three test procedures select the lag order almost perfectly, and their real size does not differ from the nominal size. BIC still chooses AR(1) thirty percent of the time, and AIC chooses AR(3) nine percent of the time. These under-fitting and over-fitting properties of BIC and AIC do not disappear even when the sample size is four hundred. However, as shown in Table 2, the simulation results of the trace test are the same for all criteria. The trace test finds that the true process is non-stationary with one co-integrating vector, and it chooses the stationary process six percent of the time, as expected from the asymptotic distribution of the trace test. Table 4 analyzes the effect of the stationary coefficient $\theta$ in the second process. The coefficient value is changed from 0.2 to 0.9, and the results of the rank tests are tabulated. The frequency of the correct choice is higher when the stationary coefficient is closer to zero. The zero rank is chosen with very high frequencies when the stationary coefficient is 0.9. This is natural, since the VAR process is close to a vector random walk when $\theta$ is 0.9.
However, the frequency of choosing wrong ranks disappears as the sample size increases. As the last three columns in the
Table 4 Effect of the stationary coefficient on the trace test of co-integration
W t LR BIC AIC
θ = 0.2 (n = 50) 0 1 2
θ = 0.5 (n = 50) 0 1 2
θ = 0.8 (n = 50) 0 1 2
θ = 0.9 (n = 50) 0 1 2
θ = 0.9 (n = 400) 0 1 2
4 4 2 0 3
14 14 12 11 14
77 78 78 79 76
88 89 89 91 88
1 1 1 0 1
90 90 92 94 90
6% 6% 6% 6% 7%
80 80 82 83 80
6% 6% 6% 6% 6%
20 19 19 18 21
3% 3% 3% 3% 3%
DGP: Rank = 1, B_1 = diag(1, θ), n = 50, Σ = I, 5,000 replications.
10 9 9 8 10
2% 2% 2% 1% 2%
93 93 93 94 93
6% 6% 6% 6% 6%
Table 5 Effect of the correlation coefficient on the trace test of co-integration
ρ = 0.0 (n = 50) 0 1 2
ρ = 0.3 (n = 50) 0 1 2
ρ = 0.6 (n = 50) 0 1 2
ρ = 0.9 (n = 50) 0 1 2
ρ = 0.9 (n = 400) 0 1 2
W t LR BIC AIC
3 2 1 0 2
0 0 0 0 0
14 14 12 11 14
80 80 82 83 80
6% 6% 6% 6% 6%
10 10 8 7 10
84 84 86 87 84
6% 7% 6% 6% 6%
5 4 3 1 4
89 89 91 93 89
6% 7% 6% 6% 7%
90 91 92 93 91
7% 7% 7% 7% 7%
94 94 94 94 94
6% 6% 6% 6% 6%
DGP: Rank = 1, B_1 = diag(1, 0.5), n = 50, Σ = (1, ρ; ρ, 1), 5,000 replications.
Table 4 show, the correct rank is chosen about ninety-three percent of the time, which is close to the null probability given by the asymptotic distribution of the trace test. Among the five criteria, BIC may give the most accurate selection probabilities, since the DGP is AR(1) and the first order coefficient is not negligible. However, it may be most important to see that all five criteria give very similar results in the trace test. Table 5 shows the effect of the correlation coefficient between the error terms on the trace test of co-integration. The DGP is a VAR(1) with one random walk. As the columns for the zero rank show, the correlation coefficient helps to identify the correct rank of co-integration: the frequencies of the zero rank fall from about thirteen percent to three percent as the correlation increases from zero to 0.9. Similarly, in the second columns, the frequencies of the correct choice increase from a few percentage points above eighty to a few percentage points above ninety as the correlation increases from zero to 0.9. The last columns show that the stationary process is selected with roughly the same frequencies.
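The DGPs of Tables 4 and 5 are VAR(1) processes with $B_1 = \mathrm{diag}(1, \theta)$ and unit-variance errors with correlation $\rho$ ($\rho = 0$ in Table 4; $\theta = 0.5$ in Table 5). For these, $\Pi = B_1 - I = \mathrm{diag}(0, \theta - 1) = \alpha\beta'$ with $\alpha = (0, \theta - 1)'$ and $\beta = (0, 1)'$, so the rank is one for $\theta < 1$, and the loading $\theta - 1 = -0.1$ at $\theta = 0.9$ is close to zero, in line with the remark that the process is then close to a vector random walk. The numpy sketch below draws correlated errors through a Cholesky factor of $\Sigma$; the function name `simulate_table_dgp`, the zero starting value and the 100-observation burn-in default (taken from the design described in Section 3) are our assumptions, not the authors' code.

```python
import numpy as np

def simulate_table_dgp(theta, rho, n, burn_in=100, seed=None):
    """Hypothetical helper: VAR(1) with B_1 = diag(1, theta), Sigma = (1, rho; rho, 1).

    Returns the last n observations and Pi = B_1 - I from (2).
    """
    rng = np.random.default_rng(seed)
    B1 = np.diag([1.0, theta])
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    L = np.linalg.cholesky(Sigma)                        # Sigma = L L'
    u = rng.standard_normal((n + burn_in, 2)) @ L.T      # rows are N(0, Sigma) draws
    y = np.zeros((n + burn_in, 2))                       # y_0 = 0 (assumption)
    for t in range(1, n + burn_in):
        y[t] = B1 @ y[t - 1] + u[t]
    return y[burn_in:], B1 - np.eye(2)
```

For instance, `simulate_table_dgp(theta=0.5, rho=0.9, n=50, seed=0)` generates one replication in the setting of the fourth panel of Table 5.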
4. Conclusion
In practice, researchers should use the LR test, BIC, AIC, and the t-test when they choose the order of a VAR process. AIC and BIC may give the highest and the lowest lag orders, respectively. If the t-test finds a few significant coefficients at a lag order higher than the LR or BIC order, but lower than or equal to the AIC order, then that order should be included in the process. All five procedures, including the Wald method, can be used without knowing whether the process is stationary or non-stationary. Therefore, this approach allows us to determine the lag order of a VAR process prior to the co-integration analysis. Furthermore, and most importantly, our simulation shows that the result of the trace test for finding the rank of co-integration does not depend much on the lag order selection for the VAR process. Whichever procedure we use to find the lag order of a VAR process, the trace or maximum eigenvalue test may give a similar rank of co-integration on average.
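The order-choice rule above can be written as a small function. This is one possible reading of the rule, not the authors' code: `combined_lag_order` is a hypothetical name, and `base_order` stands for the order chosen by the LR test or BIC, whichever the researcher adopts.

```python
def combined_lag_order(base_order, aic_order, t_significant_lags):
    """One reading of the rule in the conclusion (hypothetical helper).

    base_order: lag order chosen by the LR test or BIC.
    aic_order: lag order chosen by AIC (typically the largest of the criteria).
    t_significant_lags: lag orders at which the t-test found significant coefficients.
    """
    # Keep only t-test lags above the base order but not above the AIC order.
    candidates = [lag for lag in t_significant_lags if base_order < lag <= aic_order]
    # Including lag L in the VAR means the order must be at least L.
    return max(candidates, default=base_order)
```

For example, with an LR order of 1, an AIC order of 3 and a significant t-ratio at lag 2, the function returns 2; a significant t-ratio at lag 5 would be ignored because it exceeds the AIC order.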
References
[1] T.W. Anderson, The Statistical Analysis of Time Series (Wiley, New York, 1971).
[2] A. Banerjee, J. Dolado, J. Galbraith and D. Hendry, Co-integration, Error-correction, and the Econometric Analysis of Non-stationary Data (Oxford University Press, Oxford, 1993) 269.
[3] S. Johansen, Statistical analysis of cointegration vectors, J. Econom. Dynamics Control 12 (1988) 231-254.
[4] S. Johansen, Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models, Econometrica 59 (1991) 1551-1580.
[5] H. Lütkepohl, Comparison of criteria for estimating the order of a vector autoregressive process, J. Time Series Analysis 6 (1985) 35-52.
[6] K. Morimune and A. Mantani, The order of the vector autoregressive processes with unit roots, mimeographed, 1993.
[7] J. Paulsen, Order determination of multivariate autoregressive time series with unit roots, J. Time Series Analysis 5 (1984) 115-127.
[8] J.M. Podivinsky, Small sample properties of tests of linear restrictions on cointegration vectors and their weights, Economics Lett. 39 (1992) 13-18.
[9] C.A. Sims, J.H. Stock and M.W. Watson, Inference in linear time series models with some unit roots, Econometrica 58 (1991) 113-144.
[10] G. Schwarz, Estimating the dimension of a model, Ann. Statist. 6 (1978) 461-464.
[11] R. Shibata, Selection of the order of an autoregressive model by Akaike's information criterion, Biometrika 63 (1976) 117-126.
[12] H. Toda, Finite sample properties of likelihood ratio tests for cointegration ranks in vector autoregressions, mimeographed, 1993.
[13] H. Toda and P.C.B. Phillips, Vector autoregression and causality: a theoretical overview and simulation study, Econom. Rev. (1993).