Computational Statistics & Data Analysis 41 (2002) 185 – 193 www.elsevier.com/locate/csda
Numerically stable cointegration analysis

Jurgen A. Doornik a,∗, R.J. O'Brien b

a Nuffield College, University of Oxford, OX1 1NF, Oxford, UK
b Department of Economics, University of Southampton, Southampton, UK

Received 1 January 2002
Abstract

Cointegration analysis involves the solution of a generalized eigenproblem involving moment matrices and inverted moment matrices. These formulae are unsuitable for actual computations because the condition numbers of the resulting matrices are unnecessarily increased. The special structure of the cointegration procedure is used to achieve numerically stable computations, based on QR and singular value decompositions. © 2002 Elsevier Science B.V. All rights reserved.

MSC: C32; C63

Keywords: Cointegration; QR decomposition; Reduced rank; Singular-value decomposition
1. Introduction

Many observed time series appear to be non-stationary. A simple form of non-stationarity arises when the first differences of a series are white noise. In that case we say that the series, say y_t, is integrated of order one, I(1), and for the first differences: Δy_t ∼ I(0). An important step in the statistical analysis of integrated series was the realization that it is also possible for a linear combination of I(1) variables to be I(0). These variables are then said to cointegrate.

A simple framework for the statistical analysis is the vector autoregression (VAR). For example, when using one lag:

    y_t = Π_1 y_{t−1} + ε_t,    ε_t ∼ N_n(0, Ω),    (1)

where y_t is an n × 1 vector and Π_1 an n × n matrix. The disturbances ε_t are independent and identically distributed, with N_n denoting the n-dimensional normal distribution.
∗ Corresponding author. Tel.: +01865-278500; fax: 01865-278621. E-mail address: [email protected] (J.A. Doornik).
This system can be written as a vector equilibrium-correction model (VECM), by subtracting y_{t−1} from both sides:

    Δy_t = (Π_1 − I_n)y_{t−1} + ε_t,

using I_n for the n × n identity matrix. Alternatively, writing Π = Π_1 − I_n:

    Δy_t = Πy_{t−1} + ε_t.

This shows that the matrix Π determines how the level of the process y enters the system: for example, when Π = 0, the dynamic evolution does not depend on the levels of any of the variables. The statistical hypothesis of cointegration is

    H(r): rank(Π) ≤ r.

Under this hypothesis, Π can be written as the product of two matrices:

    Π = αβ′,

where α and β have dimension n × r, and vary freely. As suggested by Johansen (1988) (also see Johansen, 1995), such a restriction can be analyzed by maximum likelihood methods. Maximum likelihood estimation requires solving a generalized eigenproblem.

The purpose of this note is to show that the standard method for solving this eigenproblem, as proposed in the statistical literature and discussed in the next section, is numerically undesirable. We then consider several alternatives which are numerically more stable. We also consider an example to illustrate the impact of the different implementations. But, before we continue, we summarize the two orthogonal decompositions that are used.

The singular value decomposition (SVD) of an m × n matrix A, rank A = r, is A = UΣV′. Assuming m ≥ n gives a 'thin' version: U is m × n, V is n × n, U′U = V′V = I_n, and Σ = diag(σ_1, ..., σ_n). The singular values are real, non-negative, and ordered: σ_1 ≥ ··· ≥ σ_r > σ_{r+1} = ··· = σ_n = 0. So the rank of A corresponds to the number of non-zero singular values.

The QR decomposition of a full rank m × n matrix A, m ≥ n, is A = QR. Here Q is m × m, Q′Q = I_m, and R is an upper triangular m × n matrix with positive diagonal elements. Since only the top n rows of R are non-zero, it is customary to use the thin version, in which R is reduced to its first n rows and Q to its first n columns, so that A = QR still holds with Q an m × n matrix satisfying Q′Q = I_n and R an n × n upper triangular matrix. Instead of storing Q, only the m × n matrix of Householder transformations is stored. R′R is the Choleski decomposition of A′A. Normally, the QR decomposition is augmented with column pivoting to handle rank-deficient A, see Golub and Van Loan (1996, Chapter 5).

2. Standard algorithm for cointegration analysis

A more general version of (1) allows for longer lags, and deterministic variables such as constant and trend. From a statistical point of view, the treatment of the constant
and trend is very important (see, e.g. Doornik et al., 1998). The trend, if present, is usually restricted to lie inside the cointegration space, leading to the general VECM:

    Δx_t = Πx*_{t−1} + Σ_{j=1}^{k−1} Γ_j Δx_{t−j} + Φq_t + ε_t,    t = 1, ..., T,    (2)
where {x_t} is an n-vector time series, the starting values (x_{1−k}, ..., x_0) are fixed, {ε_t} is i.i.d. N_n(0, Ω), and x*_{t−1} = (x′_{t−1}, d′_t)′, where d_t and q_t are deterministic regressors. T is the sample size after allowing for lags and differences.

The maximum likelihood method for estimating the cointegration space, proposed by Johansen (1988), writes Π = αβ′, where α is n × r, and β is n_1 × r. In this case, there are n_1 − n variables d_t restricted to lie in the cointegrating space. The unrestricted deterministic variables q_t and lagged differences (and possibly additional non-modelled variables) can be partialled out (i.e. removed through a prior regression) from Δx_t and x*_{t−1}, giving y_{0t} and y_{1t}:

    y_{0t} = αβ′y_{1t} + ε_t,    t = 1, ..., T.    (3)
Stacking the partialled data, Y_0 = (y_{01}, ..., y_{0T})′ and Y_1 = (y_{11}, ..., y_{1T})′, gives the matrix equivalent of (3):

    Y_0 = Y_1βα′ + E.    (4)
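As a concrete illustration of the partialling step, the sketch below builds Y_0 and Y_1 for a VAR(2) with an unrestricted constant, using a thin QR decomposition for the prior regression. This is an illustrative NumPy sketch, not the paper's Ox code; the function and variable names are ours, and a pivoted QR could be substituted when the regressors may be rank deficient.

```python
import numpy as np

def partial_out(A, W):
    """Residuals from regressing the columns of A on W, computed via a thin QR
    of W rather than by inverting W'W."""
    Q, _ = np.linalg.qr(W)              # thin QR: W = QR
    return A - Q @ (Q.T @ A)            # A - W(W'W)^{-1}W'A

def build_partialled_data(x):
    """Y0 and Y1 of (3)/(4) for a VAR(2) in the levels x (a (T+2) x n array),
    with an unrestricted constant and no restricted deterministic terms."""
    dx = np.diff(x, axis=0)             # first differences
    Y0_raw = dx[1:]                     # dx_t,   t = 1, ..., T
    Y1_raw = x[1:-1]                    # x_{t-1}
    W = np.column_stack([dx[:-1],       # lagged difference dx_{t-1}
                         np.ones(len(Y0_raw))])   # unrestricted constant q_t
    return partial_out(Y0_raw, W), partial_out(Y1_raw, W)
```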
The likelihood is maximized by solving the generalized eigenvalue problem:

    |λS_11 − S_10 S_00^{−1} S_01| = 0,    (5)
writing S_ij = T^{−1} Y_i′Y_j, for i, j = 0, 1. This can be translated into a standard eigenvalue problem through pre- and post-multiplication by S_11^{−1/2}, see Johansen (1995, p. 95), giving n eigenvalues 1 ≥ λ̂_1 ≥ ··· ≥ λ̂_n ≥ 0. The remaining (if any) n_1 − n eigenvalues are zero. The hats on the λ's indicate that these have been estimated. The 'trace' statistic tests whether Π has rank ≤ r, given that it has rank n or less: −T Σ_{i=r+1}^{n} log(1 − λ̂_i). Under the null hypothesis, the trace test has a distribution which is a functional of multivariate Brownian motions. Full details on the statistical analysis are given in Johansen (1995). Simple restrictions of the form α = Bφ and β = Hψ can be incorporated with only small adjustments (as can calculation of Wald tests for different r). Let B = Q_bR_b be the QR decomposition of B. Then (3) can be written as

    Q_b′y_{0t} = R_bφψ′H′y_{1t} + Q_b′ε_t.

The following algorithm has often been used to implement the Johansen procedure:

Algorithm 1. If Y_0, Y_1 have full column rank, then ML estimation of (4) can be achieved by
(a) Second moment matrices S_00, S_01, S_11: S_ij ← T^{−1} Y_i′Y_j;
(b) Choleski decomposition of S_11: S_11 = PP′;
(c) Solve symmetric eigenproblem: P^{−1}S_10S_00^{−1}S_01P′^{−1} = HΛH′.
Λ is a diagonal matrix with the eigenvalues λ̂_i. The remaining quantities are

    β̂ = P′^{−1}H,    α̂ = S_01β̂.
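For concreteness, a minimal NumPy rendering of Algorithm 1 is given below (the paper's own computations were done in Ox; the function name and the descending sort are ours). It is included only as a baseline, since the point of this note is that this moment-matrix formulation is numerically undesirable.

```python
import numpy as np

def johansen_moment(Y0, Y1):
    """Algorithm 1 (baseline sketch): eigenvalues, alpha-hat and beta-hat from
    the moment matrices.  Y0 is T x n, Y1 is T x n1."""
    T = Y0.shape[0]
    S00 = Y0.T @ Y0 / T
    S01 = Y0.T @ Y1 / T                      # S10 = S01'
    S11 = Y1.T @ Y1 / T
    P = np.linalg.cholesky(S11)              # S11 = PP'
    M = np.linalg.solve(P, S01.T) @ np.linalg.solve(S00, S01) @ np.linalg.inv(P).T
    lam, H = np.linalg.eigh(M)               # symmetric eigenproblem M = H Lambda H'
    order = np.argsort(lam)[::-1]            # eigh returns ascending eigenvalues
    lam, H = lam[order], H[:, order]
    beta = np.linalg.solve(P.T, H)           # beta-hat = P'^{-1} H
    alpha = S01 @ beta                       # alpha-hat = S01 beta-hat
    return lam, alpha, beta
```

For a chosen r, the trace statistic −T Σ_{i=r+1}^{n} log(1 − λ̂_i) can then be computed directly from the returned eigenvalues.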
Given β, α can be computed from α̂(β) = S_01β(β′S_11β)^{−1} (Johansen, 1995, p. 91). Since β̂′S_11β̂ = I, α̂ follows as given. Note that α and β are not uniquely determined: αβ′ = αV^{−1}Vβ′ for any non-singular V.

Solving a generalized eigenproblem |λA − B| = 0 with A, B symmetric and positive definite is analyzed in Golub and Van Loan (1996, Section 8.7). However, in the cointegration case there is more structure in A and B. Consequently, Algorithm 1 is numerically undesirable, in the same way that regression is not implemented by inverting a second moment matrix. The next three sections present alternative algorithms which are numerically more stable. In the remainder we concentrate on (4). Note that it is essential that the move from (2) to (3) is done in a numerically stable way, for example using the QR decomposition with column pivoting (this is the default procedure for OLS in most software packages).

3. Algorithm 2: QR decomposition

We first consider an alternative implementation of the Johansen procedure that uses the QR algorithm with pivoting on both data matrices.

Algorithm 2. If Y_0, Y_1 have full column rank, then ML estimation of (4) can be achieved by
(a) Thin QR decomposition of Y_0: Y_0P_0 = Q_0R_0;
(b) Computation of the T × n_1 matrix W: W ← Q_0′Y_1;
(c) QR decomposition of W: WP_2 = Q_2R_2;
(d) Thin SVD of Z = Q_2(1 : n, :)′: Z = UΣV′.
The required quantities are

    λ̂_i = σ_i²,    R_2β̃ = T^{1/2}U,    β̂ = P_2β̃,    α̃ = T^{−1/2}R_0′Z′U,    α̂ = P_0α̃,

where Σ = diag(σ_1, ..., σ_n). V is not required. P_i are permutation matrices.
Proof. Note that the algorithm is identical to computing the thin QR of Y_i: Y_iP_i = Q_iR_i, and then the SVD of Q_1′Q_0. We present the algorithm in an alternate format because most QR implementations do not return Q, but the sequence of Householder transformations instead, complemented by a procedure to compute Q′A. To see that the algorithm solves (5) without pivoting (i.e. setting P_i to the identity matrix), write Y_i = Q_iR_i and substitute into (5):

    |T^{−1/2}R_1′(λI − Q_1′Q_0Q_0′Q_1)T^{−1/2}R_1| = 0.

Using ZZ′ = UΣ²U′, we find that the squared singular values of Q_1′Q_0 are the required eigenvalues. Next, T^{−1/2}R_1β̃ = U gives the cointegrating vectors. Note that this does not
require inversion, and can be solved by simple backsubstitution, yielding β̃. Applying the permutation gives β̂. Finally, substitution in α̂(β̂) = S_01β̂ yields α̂.

The algorithm is similar to that given in Björck and Golub (1973) and analyzed in Golub and Zha (1994) (also see Golub and Van Loan, 1996, Section 12.4.3). The adjustment for pivoting to (4) is

    Y_0P_0 = (Y_1P_1)(P_1′βα′P_0) + EP_0.

Using the Cauchy–Schwarz inequality: σ_1² = ‖Z‖_2² ≤ ‖Q_1‖_2²‖Q_0‖_2² ≤ 1. So all eigenvalues are between zero and one.
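In NumPy/SciPy terms, Algorithm 2 can be sketched as follows, using the equivalent formulation noted in the proof (thin pivoted QR of each data matrix, then an SVD of Q_1′Q_0) rather than the Householder-based organization of steps (a)-(d). The function name is ours and the code is illustrative, not the Ox implementation used in the paper.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def johansen_qr2(Y0, Y1):
    """Algorithm 2 (sketch): pivoted thin QR of Y0 and Y1, then SVD of Q1'Q0."""
    T = Y0.shape[0]
    Q0, R0, p0 = qr(Y0, mode='economic', pivoting=True)   # Y0[:, p0] = Q0 R0
    Q1, R1, p1 = qr(Y1, mode='economic', pivoting=True)   # Y1[:, p1] = Q1 R1
    Z = Q1.T @ Q0                                          # n1 x n
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    lam = s ** 2                                           # eigenvalues, all in [0, 1]
    beta_t = solve_triangular(R1, np.sqrt(T) * U)          # back-substitution: R1 beta~ = T^(1/2) U
    alpha_t = R0.T @ (Z.T @ U) / np.sqrt(T)                # alpha~ = T^(-1/2) R0' Z' U
    beta = np.empty_like(beta_t)
    beta[p1, :] = beta_t                                   # undo pivoting: beta-hat = P1 beta~
    alpha = np.empty_like(alpha_t)
    alpha[p0, :] = alpha_t                                 # alpha-hat = P0 alpha~
    return lam, alpha, beta
```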
4. Algorithm 3: QR decomposition

The next alternative procedure is also QR based, but works on the combined data matrix.

Algorithm 3. If Y_0, Y_1 have full column rank, then ML estimation of (4) can be achieved by
(a) Thin QR decomposition of the T × (n_1 + n) matrix Y = (Y_1 : Y_0): Y = QR, with

    R = [ R_11  R_10 ]
        [  0    R_00 ];
(b) Computation of the n_1 × n matrix Z: Z ← R_10R_00^{−1};
(c) Thin SVD of Z: Z = UΣV′.

The required quantities are

    λ̂_i = σ_i²/(1 + σ_i²),    R_11β̂ = T^{1/2}U,    α̂ = T^{−1/2}R_10′U,
where Σ = diag(σ_1, ..., σ_n); Q and V are not required.

Proof. The second-moment matrix of Y using the QR decomposition without pivoting from step (a) contains the S_ij:

    T^{−1}Y′Y = T^{−1} [ R_11′R_11   R_11′R_10             ]  =  [ S_11  S_10 ]
                       [ R_10′R_11   R_10′R_10 + R_00′R_00 ]     [ S_01  S_00 ].

We need a standard result for symmetric partitioned matrices. Define

    [ A   B ]
    [ B′  D ],

where A and D have full rank. Then (A − BD^{−1}B′)^{−1} = A^{−1} + A^{−1}B(D − B′A^{−1}B)^{−1}B′A^{−1}. When A = I, D = −R_00′R_00 and B = R_10:

    [I + R_10(R_00′R_00)^{−1}R_10′]^{−1} = I − R_10(R_10′R_10 + R_00′R_00)^{−1}R_10′.
Substituting this into (5) gives an alternative symmetric eigenproblem:

    0 = |λS_11 − S_10S_00^{−1}S_01|
      = |T^{−1/2}R_11′{λI − R_10(R_10′R_10 + R_00′R_00)^{−1}R_10′}T^{−1/2}R_11|
      = |T^{−1/2}R_11′{(λ − 1)I + [I + R_10(R_00′R_00)^{−1}R_10′]^{−1}}T^{−1/2}R_11|
      = |T^{−1/2}R_11′{(λ − 1)^{−1}I + I + R_10(R_00′R_00)^{−1}R_10′}T^{−1/2}R_11|
      = |T^{−1/2}R_11′{λ(λ − 1)^{−1}I + R_10(R_00′R_00)^{−1}R_10′}T^{−1/2}R_11|
      = (−1)^{n_1}|T^{−1/2}R_11′{−λ(λ − 1)^{−1}I − R_10(R_00′R_00)^{−1}R_10′}T^{−1/2}R_11|.

In the fourth line we use the fact that the eigenvalues of an inverted symmetric matrix are the reciprocal eigenvalues of the original matrix. If μ_i are the eigenvalues solving the term in the middle, then λ_i = μ_i/(1 + μ_i) (and hence 0 ≤ λ_i < 1). Also see Johansen (1995, p. 94). R_11 is again an upper-triangular matrix, but R_10 is not.

Like Algorithm 2, this algorithm uses a QR decomposition for the first step, and the singular value decomposition to solve the symmetric eigenvalue problem. The main difference is that Algorithm 2 only uses Q from the QR step, whereas Algorithm 3 only uses R. Implementations of the QR decomposition would normally signal rank deficiencies in Y_i. In that case, both algorithms should abort with an error message. Algorithm 2 will produce an answer for the singular case, but this does not solve the original problem. Algorithm 3 is numerically less reliable, because it does not use pivoting, and requires the inverse of R_00.
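A compact NumPy/SciPy sketch of Algorithm 3 follows (illustrative only; the function name is ours and, as in the text, no pivoting is used, so full column rank of Y_0 and Y_1 is assumed).

```python
import numpy as np
from scipy.linalg import solve_triangular

def johansen_qr3(Y0, Y1):
    """Algorithm 3 (sketch): one thin QR of (Y1 : Y0), then an SVD of R10 R00^{-1}."""
    T, n = Y0.shape
    n1 = Y1.shape[1]
    R = np.linalg.qr(np.hstack([Y1, Y0]), mode='r')     # (n1+n) x (n1+n) upper triangular
    R11, R10, R00 = R[:n1, :n1], R[:n1, n1:], R[n1:, n1:]
    Z = solve_triangular(R00, R10.T, trans='T').T       # Z = R10 R00^{-1}
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    lam = s ** 2 / (1.0 + s ** 2)
    beta = solve_triangular(R11, np.sqrt(T) * U)        # R11 beta-hat = T^(1/2) U
    alpha = R10.T @ U / np.sqrt(T)                      # alpha-hat = T^(-1/2) R10' U
    return lam, alpha, beta
```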
5. Algorithm 4: the singular case

If either Y_0 or Y_1 has reduced rank, the previous algorithms break down. A more reliable solution is to use the SVD throughout:

Algorithm 4. ML estimation of (4) can be achieved by
(a) Thin SVD of Y_i, i = 0, 1: Y_i = U_iΣ_iV_i′;
(b) Computation of the n_1 × n matrix Z: Z ← U_1′U_0;
(c) Thin SVD of Z: Z = U_zΣ_zV_z′.

The required quantities are

    λ̂_i = σ_i²,    β̂ = T^{1/2}V_1Σ̃_1^{−1}U_z,    α̂ = T^{−1/2}V_0Σ_0Z′U_z    (≡ T^{−1/2}Y_0′U_1U_z),

where Σ̃_1^{−1} = diag(σ̃_1^{−1}, ..., σ̃_{n_1}^{−1}), with σ̃_i^{−1} = σ_i^{−1} when σ_i, the ith singular value of Y_1, exceeds ε_y, and σ̃_i^{−1} = 0 otherwise.
Proof. This is similar to Algorithm 3:

    |T^{−1/2}V_1Σ_1(λI − U_1′U_0U_0′U_1)T^{−1/2}Σ_1V_1′| = 0.

The choice of ε_y would depend on the machine precision. We suggest using ε_y = 10^{−9}‖Y_1‖_∞, which is appropriate for double precision (8-byte) arithmetic.
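A NumPy sketch of Algorithm 4 follows (illustrative; the function name and the `tol` argument are ours, with the default tolerance taken from the suggestion above).

```python
import numpy as np

def johansen_svd(Y0, Y1, tol=None):
    """Algorithm 4 (sketch): SVDs throughout, so rank-deficient Y0 or Y1 is tolerated."""
    T = Y0.shape[0]
    U0, s0, V0t = np.linalg.svd(Y0, full_matrices=False)
    U1, s1, V1t = np.linalg.svd(Y1, full_matrices=False)
    Z = U1.T @ U0                                    # n1 x n
    Uz, sz, Vzt = np.linalg.svd(Z, full_matrices=False)
    lam = sz ** 2
    if tol is None:
        tol = 1e-9 * np.abs(Y1).sum(axis=1).max()    # 10^-9 ||Y1||_inf (max absolute row sum)
    s1_inv = np.zeros_like(s1)
    s1_inv[s1 > tol] = 1.0 / s1[s1 > tol]            # truncated inverse of Sigma_1
    beta = np.sqrt(T) * (V1t.T * s1_inv) @ Uz        # beta-hat = T^(1/2) V1 Sigma1~^{-1} Uz
    alpha = (V0t.T * s0) @ (Z.T @ Uz) / np.sqrt(T)   # alpha-hat = T^(-1/2) V0 Sigma0 Z' Uz
    return lam, alpha, beta
```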
6. Operation count

We follow Golub and Van Loan (1996) in the definition of floating point operation (flop) count, counting a scalar addition, multiplication, or division as one flop. Golub and Van Loan (1996, p. 225) give the flop count of the thin Householder QR decomposition of an m × n matrix as 2mn² − 2n³/3. In this note, m corresponds to the sample size T, and is assumed to be significantly larger than n. Therefore, we take the number of flops of the thin QR as 2Tn². Application of the Householder transformations to compute Q′A when A is T × q uses 4Tnq flops (Golub and Van Loan, 1996, pp. 212-213). In some cases we also require the first q columns of Q. This is most efficiently done through backward accumulation (Golub and Van Loan, 1996, p. 213), and can be shown to take approximately 4qmn − 2(m + n)q² + (4/3)q³ operations. Therefore, when m = T dominates, and q = n, this reduces to 2Tn², the same as for the QR decomposition itself. Using the same large-T argument, we assume that the thin SVD takes 6Tn² flops (Golub and Van Loan, 1996, p. 254), regardless of whether V is required or not, because the term with T is assumed to dominate. Computation of A′B when A and B are T × n matrices takes 2Tn² flops, and A′A half that.

Assume that n_1 = n and that the sample size T is sufficiently large to make Tn² the dominant term. Then Algorithm 1 forms three S_ij moment matrices, taking Tn² (1 for S_11 + 2 for S_10 + 1 for S_00) = 4Tn². Algorithm 2 takes two QR decompositions at about 2Tn² flops each, as well as a multiplication with Q_0′ at 4Tn² and extraction of Q_2 at another 2Tn², resulting in a total of 10Tn². This can be reduced to 8Tn² if Q_2(1 : n, :) is found from (WP_2)(1 : n, :) = Q_2(1 : n, :)R_2, but this has some negative effect on the numerical stability, making it comparable to Algorithm 3. If the algorithm is implemented as the QR of Y_i: Y_iP_i = Q_iR_i, followed by the SVD of Q_1′Q_0, it would also take 10Tn² flops. Algorithm 3 does one QR of a T × 2n matrix, which takes 2T(2n)² flops. Finally, Algorithm 4 requires two SVD decompositions and the computation of Z. The approximate operation counts under those assumptions are:

Algorithm    Flops when T ≫ n
1            4Tn²
2            10Tn²
3            8Tn²
4            14Tn²
Table 1
Largest eigenvalue for the VAR(2) in (y_t, y_t + u_t 10^{−m})′

Alg.    m = 1            m = 3            m = 5            m = 10
1a      0.35091940555    0.35089570155    0.05949553867    failed
1b      0.35091940557    0.35089589051    0.13273076746    failed
2       0.35091938503    0.35091938503    0.35091938542    0.01335798065^a
3       0.35091938503    0.35091938503    0.35091938540    failed
4       0.35091938503    0.35091938503    0.35091938494    0.00487801748^a

Incorrect digits in bold.
^a A warning signalling singularity is also printed.
7. Illustration

To illustrate the different properties of the algorithms, we estimate a two-equation VAR(2) in which both equations are asymptotically identical, using the data vector

    x_t = (y_t, y_t + u_t 10^{−m})′,    where u_t ∼ N(0, 1).

The estimated ECM has no deterministic terms:

    Δx_t = αβ′x_{t−1} + Γ_1Δx_{t−1} + ε_t.

We select y_t as the CONS variable from a tutorial data set of Ox (Doornik, 2001). Ox was used for all computations, and we created u_t using the default random number generator. For m not too large, this problem will yield the same cointegration eigenvalues as (y_t, u_t)′, namely 0.35091938503 and 0.00450356266. As m gets very large, the system reduces to (y_t, y_t)′. The single-equation VAR(2) involving y_t has eigenvalue 0.00487801748.

The results are in Table 1 for selected values of m, obtained with Ox version 2.20 (the rounding errors accumulate slightly differently in Ox 3.00). Algorithms 1a and 1b only differ in the order of evaluation: the former has P^{−1}[S_10S_00^{−1}S_01]P′^{−1}, while the latter uses [P^{−1}S_10]S_00^{−1}[S_01P′^{−1}]. Both versions are clearly much less stable than those using QR and singular value decompositions. At m = 5 they have broken down, with completely wrong eigenvalues. In this case there is no indication of failure: only at m = 6 does the Choleski decomposition break down. Note that with Algorithm 1 it is actually possible for badly conditioned problems to find eigenvalues which are negative or greater than one! The fact that the algorithm is so sensitive to the ordering of the computations is also a sign of instability.

Algorithm 2 signals problems from m = 6 onwards, warning that Y_0 has reduced column rank, but keeps producing an answer. Algorithm 3 also signals problems from m = 6 onwards, but is unable to provide a solution in that case. Algorithm 4 is designed to handle singularities. From m = 10 onwards it gives the correct answer as if there were only one y. Between m = 7 and 10 it moves to that value: for m = 6 the largest eigenvalue is 0.350917, but for m = 7 it is 0.004914. The value of ε_y in this example is 2.1 × 10^{−6}; from m = 7 onwards, the algorithm signals the singularity.

Algorithms 2-4 are clearly superior to Algorithm 1, still producing essentially the correct answer at m = 5.
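The experiment can be reproduced in outline as follows. This is a hedged sketch: the CONS series from the Ox tutorial data is not reproduced here, so a simulated random walk stands in for y_t, and johansen_moment and johansen_svd are the illustrative functions sketched in Sections 2 and 5. Exact figures will therefore differ from Table 1; the qualitative loss of accuracy of the moment-matrix version as m grows is the point of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 150
y = np.cumsum(rng.standard_normal(T + 2))            # I(1) stand-in for the CONS series
u = rng.standard_normal(T + 2)

for m in (1, 3, 5):
    x = np.column_stack([y, y + u * 10.0 ** (-m)])   # levels (y_t, y_t + u_t 10^-m)'
    dx = np.diff(x, axis=0)
    Y0_raw, Y1_raw, W = dx[1:], x[1:-1], dx[:-1]     # dx_t, x_{t-1}, dx_{t-1}; no deterministics
    Q, _ = np.linalg.qr(W)                           # partial out the lagged difference
    Y0 = Y0_raw - Q @ (Q.T @ Y0_raw)
    Y1 = Y1_raw - Q @ (Q.T @ Y1_raw)
    print(m, johansen_moment(Y0, Y1)[0][0], johansen_svd(Y0, Y1)[0][0])
```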
8. Conclusion

We presented and analyzed several algorithms for maximum likelihood estimation of cointegration which are numerically more stable than the standard form which uses second moment matrices.

For a general purpose implementation, where singularity could be an issue, we recommend using the SVD-based Algorithm 4. Otherwise, for example in Monte Carlo experiments, Algorithm 3 could be used as the fastest algorithm that can detect near-singularity, or Algorithm 2 as the more stable algorithm. Most importantly, all alternative algorithms are an improvement on the standard method, because they will signal problems instead of reporting potentially spurious results.

Acknowledgements

We wish to thank two anonymous referees for their comments. Financial support from the UK Economic and Social Research Council (Grant R000237500) is gratefully acknowledged by JAD.

References

Björck, Å., Golub, G.H., 1973. Numerical methods for computing angles between linear subspaces. Math. Comput. 27, 579-594.
Doornik, J.A., 2001. Object-Oriented Matrix Programming Using Ox, 4th Edition. Timberlake Consultants Press, London.
Doornik, J.A., Hendry, D.F., Nielsen, B., 1998. Inference in cointegrated models: UK M1 revisited. J. Econom. Surveys 12, 533-572.
Golub, G.H., Van Loan, C.F., 1996. Matrix Computations, 3rd Edition. The Johns Hopkins University Press, Baltimore.
Golub, G.H., Zha, H., 1994. Perturbation analysis of the canonical correlations of matrix pairs. Linear Algebra Appl. 210, 3-28.
Johansen, S., 1988. Statistical analysis of cointegration vectors. J. Econom. Dynamics Control 12, 231-254.
Johansen, S., 1995. Likelihood-based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, Oxford.