Journal of Econometrics 69 (1995) 111-132

Identifying restrictions of linear equations with applications to simultaneous equations and cointegration

Søren Johansen

Institute of Mathematical Statistics, University of Copenhagen, 2100 Copenhagen, Denmark

Abstract

The identification problem for simultaneous equations is solved by the well-known rank condition, which gives a necessary and sufficient condition for the parameters to be uniquely or statistically identified by linear restrictions. This paper formulates and solves another problem: given a set of linear restrictions, which conditions should they satisfy for most parameter values to be identified? The main result of the paper is a simple algebraic condition on a set of linear restrictions that guarantees that most parameters satisfying the restrictions are uniquely identified.

Key words: Identification; Cointegration; Simultaneous equations; FIML; Hall's theorem
JEL classification: C32

1. Introduction and basic definitions

The problem of identification is met in econometrics in connection with the construction of systems of simultaneous linear equations that allow the coefficients to be uniquely identified. Consider as an example the simultaneous linear

1 I have benefited from discussions with Katarina Juselius on the topic of identification, and Jon Faust and David Hendry helped with some of the econometric formulations. The references to the results by Rado and Hall were pointed out to me by Lawrence Wolsey, Steffen Lauritzen, and Erik Kjaer Jorgensen. I have had interesting comments from Agustin Maravall, Teun Kloek, Gordon Kemp, and Grant Hillier. Throughout the period of this work my research has been supported by the Danish Social Sciences Research Foundation act. 5.20.31.05.

0304-4076/95/$09.50 © 1995 Elsevier Science S.A. All rights reserved. SSDI 030440769401664L


equations

$$\vartheta' X_t = \Gamma' y_t + B' z_t = \varepsilon_t, \qquad (1)$$

where the $p$-dimensional process $X_t = (y_t', z_t')'$ is decomposed into $r$ endogenous and $k$ predetermined variables, $p = r + k$, and the parameters are collected into the $r \times (r + k)$ matrix $\vartheta' = (\Gamma', B')$, where the $r \times r$ matrix $\Gamma$ is assumed to have full rank. The errors are assumed to be independent Gaussian variables with mean zero and variance $\Omega$, such that likelihood inference is available. Furthermore $\varepsilon_t$ is assumed independent of $z_t$, such that (1) determines the joint distribution of the variables $y_t$ conditional on the predetermined variables $z_t$. A parameter point $(\vartheta, \Omega)$ is uniquely identified if for any other value $(\vartheta_1, \Omega_1)$ the corresponding probability measures are distinct, and it is in this sense that the word identified will be used in the present paper. The parameters $\Gamma$ and $B$ are not uniquely identified in Eq. (1) without further restrictions. In order to be able to recognize or identify uniquely the individual relations we formulate linear restrictions $R_i$ as full rank $p \times r_i$ matrices and assume that the coefficients in the $i$th equation, $\vartheta_i$, satisfy

$$R_i' \vartheta_i = 0, \qquad i = 1, \ldots, r. \qquad (2)$$

We also work with a slightly different formulation by defining $H_i = R_{i\perp}$, that is, we define a $p \times (p - r_i) = p \times s_i$ matrix of full rank such that $H_i' R_i = 0$. In this case (2) is replaced by $\vartheta_i = H_i \varphi_i$ for some $s_i$-dimensional vector $\varphi_i$. If $H_i$ contains zeros and ones only, then $H_i$ can be considered the selection matrix which indicates which elements of $\vartheta_i$ vary unrestrictedly. Thus a direct parameterization is

$$\vartheta = (H_1 \varphi_1, \ldots, H_r \varphi_r). \qquad (3)$$

The set of restrictions defines a linear statistical model by the parameter space

$$\mathcal{L} = \{\vartheta_{p \times r}, \Omega \mid R_i' \vartheta_i = 0, \ i = 1, \ldots, r\}.$$

The classical result on identification (see Sargan, 1988, p. 29; Goldberger, 1964, p. 29) is given by the rank condition.

Theorem 1. A necessary and sufficient condition that a parameter value $(\vartheta, \Omega)$ is identified is that

$$\mathrm{rank}(R_i' \vartheta) = r - 1, \qquad i = 1, \ldots, r. \qquad (4)$$

Thus when applying the restrictions of one equation to the other $r - 1$ equations we get a matrix of rank $r - 1$. Hence it is not possible, by taking linear combinations of, for instance, $\vartheta_2, \ldots, \vartheta_r$, to construct a vector which is restricted in the same way as $\vartheta_1$ and in this sense could be confused with the equation defined by $\vartheta_1$. Hence $\vartheta_1$ can be recognized among all linear combinations of $\vartheta_1, \ldots, \vartheta_r$ as the only one that is in $\mathrm{sp}(H_1)$, the space spanned by the columns of $H_1$, or the only one that satisfies the restrictions $R_1$. Since in general $\vartheta$ is unknown, the condition in Theorem 1 is not operational (Goldberger, 1964, p. 316). We therefore focus on the restrictions rather than the parameter point, since the restrictions, and not the parameter, are given by economic theory. Instead we ask which conditions the restrictions $R_1, \ldots, R_r$ should satisfy in order that the corresponding statistical model $\mathcal{L}$ contains identified parameter values. Together with $\mathcal{L}$ we consider another statistical model, namely the set of identified parameters
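The rank condition (4) is easy to check numerically. The following sketch (assuming numpy; the matrices are hypothetical illustrations, not taken from the paper) verifies that each coefficient vector satisfies its own restrictions and then tests $\mathrm{rank}(R_i'\vartheta) = r - 1$:

```python
import numpy as np

# Hypothetical p = 4, r = 2 system: R_i are the p x r_i restriction
# matrices and the columns of theta are the coefficient vectors.
R1 = np.array([[1., 0.], [1., 0.], [0., 0.], [0., 1.]])
R2 = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 0.]])
theta = np.array([[1., 2.], [-1., 0.], [3., -2.], [0., 1.]])

r = theta.shape[1]
identified = []
for i, R in enumerate([R1, R2]):
    # theta_i must satisfy its own restrictions, R_i' theta_i = 0 ...
    assert np.allclose(R.T @ theta[:, i], 0)
    # ... and is identified when rank(R_i' theta) = r - 1.
    identified.append(np.linalg.matrix_rank(R.T @ theta) == r - 1)

print(identified)  # [True, True]
```

Both equations pass the test for this parameter value, so the point is identified in the sense of Theorem 1.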

$$\mathcal{M} = \{\vartheta_{p \times r}, \Omega \mid R_i' \vartheta_i = 0, \ \mathrm{rank}(R_i' \vartheta) = r - 1, \ i = 1, \ldots, r\}.$$

Clearly $\mathcal{M} \subset \mathcal{L}$, and our first result states that $\mathcal{M}$ is a large subset of $\mathcal{L}$, in the sense that $\mathcal{M}$ is an open dense subset of $\mathcal{L}$ or, in other words, that $\mathcal{L} \setminus \mathcal{M}$ is a nowhere dense subset, unless of course $\mathcal{M}$ is empty.

Theorem 2. If $\mathcal{L}$ contains an identified parameter value, that is, $\mathcal{M}$ is nonempty, then $\mathcal{M}$ is an open dense subset of $\mathcal{L}$.

The proof is given in the Appendix. The theorem can be considered a formulation of the discussion in Fisher (1966, p. 45), where the basic idea is given. Note that $\mathcal{L}$ is a linear model but $\mathcal{M}$ is not linear, since the rank condition is a nonlinear restriction. The fact that $\mathcal{M}$, when nonempty, is dense in $\mathcal{L}$ means that the likelihood function cannot be used to distinguish between the models: the MLE derived from the linear model $\mathcal{L}$ satisfies the rank condition and hence corresponds to a point in $\mathcal{M}$ with probability 1. For a given set of restrictions either all parameters in $\mathcal{L}$ defined by the restrictions are unidentified, or 'almost all' are identified. The paper by McManus (1992) contains a number of general results about identification of parametric models, emphasizing the generic property of the subset of identified parameters in a given model. The emphasis is on nonlinear models, showing that in some sense all the problems are created by focusing on linear models. The purpose of this note is to discuss the linear models, because we need them in the autoregressive formulation of dynamic models and in the discussion of identification of cointegrating relations. Note that a given set of linear restrictions defines the statistical model $\mathcal{L}$. Thus the properties of the model are given by the properties of the restrictions. We therefore define a set of identifying restrictions as a set of restrictions that gives rise to a linear model in which the identified parameters are dense, as discussed in Theorem 2.


Definition 1. The restrictions $R_1, \ldots, R_r$ and the linear statistical model $\mathcal{L}$ are called identifying (generically identifying) if there exists a parameter value $\vartheta$ such that the rank condition is satisfied, or equivalently such that $\mathcal{M}$ is nonempty.

Theorem 2 gives an immediate criterion for determining whether a given model $\mathcal{L}$ (or equivalently a given set of restrictions) is identifying, since one can just check the rank condition for a randomly selected parameter value, where the random selection is such that the probability of any open set is positive. The main result of this paper, however, is a simple algebraic condition (see Theorem 3) for a linear statistical model to be (generically) identifying, so that a condition on the restrictions can be checked before a linear model is estimated. Instead of asking, for a given set of restrictions, whether a given parameter value is identified, that is, whether it satisfies the rank condition, we ask the opposite question, namely what properties the restrictions should have in order that they can be used to construct a linear model with the property that most parameter values are uniquely identified. Note that we treat equation-by-equation restrictions, in particular zero restrictions, but no cross-equation restrictions and no restrictions on the covariance matrix. The reason for this is that we do not have a result like that of Theorem 3 for these cases. We conclude this section by pointing out a number of situations where the identification problem is met in the analysis of the vector autoregressive model for cointegration, written in the structural error correction form:

$$A \Delta Y_t = a \beta' Y_{t-1} + G \Delta Y_{t-1} + \mu + \varepsilon_t. \qquad (5)$$
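The randomized criterion of Theorem 2 can be mechanized directly. A minimal sketch (numpy assumed; the helper names are ours, not the paper's): draw $\vartheta_i = H_i \varphi_i$ with random $\varphi_i$ and test the rank condition, where $H_i = R_{i\perp}$ is computed from the restrictions.

```python
import numpy as np

def perp(R, tol=1e-10):
    """Orthonormal basis H = R_perp for {x : R'x = 0}, computed by SVD."""
    _, s, vt = np.linalg.svd(R.T)
    rank = int(np.sum(s > tol))
    return vt[rank:].T  # p x (p - rank)

def is_identifying(R_list, n_draws=20, seed=0):
    """Check whether restrictions are (generically) identifying by drawing
    random parameter values theta_i = H_i phi_i and testing the rank
    condition; generically a single draw settles the question, and a few
    draws guard against a (measure-zero) unlucky hit."""
    rng = np.random.default_rng(seed)
    r = len(R_list)
    H_list = [perp(R) for R in R_list]
    for _ in range(n_draws):
        theta = np.column_stack(
            [H @ rng.standard_normal(H.shape[1]) for H in H_list])
        if all(np.linalg.matrix_rank(R.T @ theta) == r - 1 for R in R_list):
            return True
    return False

R1 = np.array([[1., 0.], [1., 0.], [0., 0.], [0., 1.]])
R2 = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 0.]])
print(is_identifying([R1, R2]))   # True: the model contains identified points
print(is_identifying([R1, R1]))   # False: identical restrictions never identify
```

The second call fails for every draw because both columns then satisfy the same restrictions, so $R_1'\vartheta = 0$ identically.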

Here the $\varepsilon_t$ are independent Gaussian with mean zero and variance $\Omega$. For notational simplicity we have left out further lags. The reduced form error correction model is

$$\Delta Y_t = \alpha \beta' Y_{t-1} + \Gamma \Delta Y_{t-1} + \mu + \varepsilon_t, \qquad (6)$$

where $\alpha = A^{-1} a$, $\Gamma = A^{-1} G$, etc. Notice that the $\beta$ vectors are the same in the reduced form as in the structural form of the model. The first identification problem is for the cointegrating relations $\beta' Y_t$. If it is assumed that $Y_t$ is I(1), it follows from (6) that $\beta' Y_t$ is stationary. Thus,

$$\beta' Y_t = u_t, \qquad (7)$$

where $u_t$ is stationary. Clearly any linear combination of these equations will produce new stationary relations, and hence there is a need for identifying restrictions, as in (1). The representation theorem of Granger (see, for instance, Johansen, 1991) shows that the solution of (6) under the assumption that $Y_t$ is I(1) is given by

$$Y_t = C \left( \sum_{i=1}^{t} \varepsilon_i + \mu t \right) + U_t, \qquad (8)$$


where $U_t$ is a stationary process and

$$C = \beta_\perp (\alpha_\perp' (I - \Gamma) \beta_\perp)^{-1} \alpha_\perp',$$

so that the common trends are given by $\alpha_\perp' \sum_{i=1}^{t} \varepsilon_i$. Any set of linear transformations of these $p - r$ variables can also serve as common trends, so that again restrictions are required to identify the trends uniquely. The third identification problem occurs in the short-term dynamics, after the long-run relations have been identified. If in model (5) we define $X_t$ by stacking $\Delta Y_t$, $\beta' Y_{t-1}$, and $\Delta Y_{t-1}$, then the equations have the form

$$(A, a, G) X_t = \varepsilon_t. \qquad (9)$$

In this case the identification problem of the simultaneous effects and short-term dynamics involved in the matrices $A$, $a$, and $G$ can also be formulated as for Eq. (1). Examples of the identification problem in connection with (7) and (9) are given in Johansen and Juselius (1994). The main result in Section 2 of this note does not involve any assumptions on the probability properties of the process, but relies entirely on the fact that we are interested in linear relations, which we want to identify in the sense that we want to recognize the equations on the basis of suitable linear restrictions on the individual equations. The results can therefore be applied to any of the four different situations sketched above, that is, simultaneous equations, cointegrating relations, common trends, and short-term dynamics.

2. A necessary and sufficient condition for a set of restrictions to be generically identifying

Suppose that we have given a linear model defined by the restrictions $R_1, \ldots, R_r$. A parameter point has the representation $\vartheta = (H_1 \varphi_1, \ldots, H_r \varphi_r)$. If it is identified, it satisfies the rank condition

$$\mathrm{rank}(R_i' H_1 \varphi_1, \ldots, R_i' H_r \varphi_r) = r - 1, \qquad i = 1, \ldots, r,$$

and in this case the $r - 1$ column vectors $\{R_i' H_j \varphi_j, \ j = 1, \ldots, r, \ j \neq i\}$ are linearly independent. This implies a condition on the restrictions which does not involve the parameters:

$$\mathrm{rank}(R_i' H_{i_1}, \ldots, R_i' H_{i_k}) \geq k,$$

for any set of $k$ indices $(i_1, \ldots, i_k)$ not containing $i$. It turns out that this condition is also sufficient for the model $\mathcal{L}$ to be generically identifying. This result is formulated in Theorem 3, and we illustrate in this section how it can be applied. The proof of sufficiency, which is based on some classical results in mathematics by Hall (1935) and Rado (1942), is discussed in detail in the Appendix.


Theorem 3. The linear statistical model $\mathcal{L}$ defined by the restrictions $R_1, \ldots, R_r$ is identifying if and only if for $k = 1, \ldots, r - 1$ and any set of indices $1 \leq i_1 < \cdots < i_k \leq r$ not containing $i$, it holds that

$$\mathrm{rank}(R_i' H_{i_1}, \ldots, R_i' H_{i_k}) \geq k. \qquad (10)$$

If (10) is satisfied for a particular value of $i$, then the restrictions identify that equation.

As an example of how to apply Theorem 3 we consider the situation with $r = 3$, where

$$\vartheta = (H_1 \varphi_1, H_2 \varphi_2, H_3 \varphi_3). \qquad (11)$$

To see if $R_1$, $R_2$, and $R_3$ identify the first equation we check the conditions

$$\mathrm{rank}(R_1' H_2) \geq 1, \qquad \mathrm{rank}(R_1' H_3) \geq 1, \qquad \mathrm{rank}(R_1'(H_2 : H_3)) \geq 2.$$

In terms of the matrices

$$M_{ij.k} = H_i' H_j - H_i' H_k (H_k' H_k)^{-1} H_k' H_j, \qquad i, j, k = 1, 2, 3,$$

the conditions become

$$\mathrm{rank}(M_{22.1}) \geq 1, \qquad \mathrm{rank}(M_{33.1}) \geq 1, \qquad \mathrm{rank}\begin{pmatrix} M_{22.1} & M_{23.1} \\ M_{32.1} & M_{33.1} \end{pmatrix} \geq 2. \qquad (12)$$
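Condition (10) can be verified mechanically by enumerating the index sets. A sketch (numpy; the function name and the example matrices are ours):

```python
import numpy as np
from itertools import combinations

def condition_10(R_list, H_list):
    """Theorem 3's condition: for each i and each set of k indices
    i_1 < ... < i_k not containing i, rank(R_i'[H_i1 ... H_ik]) >= k."""
    r = len(R_list)
    for i in range(r):
        others = [j for j in range(r) if j != i]
        for k in range(1, r):
            for subset in combinations(others, k):
                block = np.hstack([R_list[i].T @ H_list[j] for j in subset])
                if np.linalg.matrix_rank(block) < k:
                    return False
    return True

I = np.eye(3)
# Diagonal system (p = r = 3): theta_i proportional to e_i, so
# H_i = e_i and R_i collects the other two unit vectors.
H = [I[:, [0]], I[:, [1]], I[:, [2]]]
R = [I[:, [1, 2]], I[:, [0, 2]], I[:, [0, 1]]]
print(condition_10(R, H))  # True

# Failing case: sp(H_2) contained in sp(H_1) violates (10), since R_1'H_2 = 0.
H_bad = [I[:, [0, 1]], I[:, [0]], I[:, [2]]]
R_bad = [I[:, [2]], I[:, [1, 2]], I[:, [0, 1]]]
print(condition_10(R_bad, H_bad))  # False
```

The failing case is exactly the situation described in the text: when $\mathrm{sp}(H_2) \subset \mathrm{sp}(H_1)$ the choice $i = 1$, $k = 1$, $i_1 = 2$ gives rank zero.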

Since the matrices $M_{ij.k}$ are positive semidefinite, an easy way of checking this is to calculate the eigenvalues of the matrices in (12). The system is not identified if, for instance, $\mathrm{sp}(H_2) \subset \mathrm{sp}(H_1)$, since then $R_1' H_2 = 0$ and condition (10) is violated by the choice $i = 1$, $k = 1$, and $i_1 = 2$. One can interpret condition (10) as follows: instead of applying the restrictions $R_i$ to a parameter value $\vartheta_1, \ldots, \vartheta_r$, we apply $R_i$ to all possible parameter vectors as given by $H_1, \ldots, H_r$. This turns the focus away from whether a given parameter value is identified to whether a linear statistical model in general contains identified parameters, and Theorem 2 then states that if it does, then almost all parameter values are identified.

Now suppose that we have an identifying statistical model $\mathcal{L}$ as above. It is apparent that if we restrict the spaces $\mathrm{sp}(H_i)$ further, that is, formulate overidentifying restrictions, we may decrease the rank in condition (10), thus allowing the possibility that a linear submodel of the identifying model is not identifying. An example of this is the hypothesis

$$\vartheta = (H_1 \varphi_1, (H_2 \cap H_1) \varphi_2, H_3 \varphi_3), \qquad (13)$$

where $H_2 \cap H_1$ is a matrix such that $\mathrm{sp}(H_2 \cap H_1) = \mathrm{sp}(H_2) \cap \mathrm{sp}(H_1)$. This is a formulation of the hypothesis that the second vector is not only in $\mathrm{sp}(H_2)$ but also in $\mathrm{sp}(H_1)$, which indicates that the first equation is not characterized by the restrictions $R_1$, since evidently $\vartheta_2$ also satisfies the same restrictions. Thus, if we know where to look for lack of identification, we can formulate this as an overidentifying restriction on the model. The methods are illustrated and exemplified in Johansen and Juselius (1994) on money demand in Australia. As a simple illustration consider the model defined by the linear restrictions

$$\vartheta = \begin{pmatrix} a & c \\ -a & 0 \\ b & -c \\ 0 & d \end{pmatrix}, \qquad (a, b, c, d) \in \mathbf{R}^4.$$

In this case we find

$$R_1 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad R_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{pmatrix},$$

so that

$$\mathrm{rank}(R_1' H_2) = 2 \geq 1 \qquad \text{and} \qquad \mathrm{rank}(R_2' H_1) = 2 \geq 1.$$

Thus the conditions (10) for generic identification are satisfied. A given parameter $\vartheta$ in $\mathcal{L}$ is identified if

$$\mathrm{rank}(R_1' \vartheta_2) = \mathrm{rank}(c, d) = 1 \qquad \text{and} \qquad \mathrm{rank}(R_2' \vartheta_1) = \mathrm{rank}(a + b, -a) = 1.$$

That is, the parameter $\vartheta$ is identified if $(c, d) \neq (0, 0)$ and $(a, b) \neq (0, 0)$. The linear model $\mathcal{L}$ in this case contains the nonidentified parameters given by either $(a, b) = (0, 0)$ or $(c, d) = (0, 0)$. Note that in the context of cointegration these exceptional parameters correspond to one of the cointegrating vectors being


zero, that is, that the cointegrating rank is not 2 but 1. Note also that this set of restrictions cannot be reduced to zero restrictions by a linear transformation of the observations. If instead $\vartheta$ is given by

$$\vartheta = \begin{pmatrix} a & c \\ -a & -c \\ b & 0 \\ 0 & d \end{pmatrix}, \qquad (a, b, c, d) \in \mathbf{R}^4,$$

then $R_1$ and $H_1$ are as before, but

$$R_2 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix},$$

in which case the model is still identifying, but $R_1' \vartheta_2 = (0, d)'$ and $R_2' \vartheta_1 = (0, b)'$. Thus the parameter $\vartheta$ is identified if $d \neq 0$ and $b \neq 0$. The exceptional values in this case do not decrease the cointegrating rank, unless they are both zero. In this example, however, we could introduce $X_{1t} + X_{2t}$ and $X_{1t} - X_{2t}$ as new variables, and with this choice the restrictions become zero restrictions.
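The rank computations of the first illustration can be checked numerically. The sketch below (numpy; one consistent choice of $\vartheta$, $R_1$, $R_2$ matching the stated ranks, with arbitrary nonzero values of $a, b, c, d$):

```python
import numpy as np

# First illustration: theta_1 = (a, -a, b, 0)', theta_2 = (c, 0, -c, d)'.
a, b, c, d = 1.0, 2.0, 3.0, 4.0
theta = np.array([[a, c], [-a, 0.], [b, -c], [0., d]])
R1 = np.array([[1., 0.], [1., 0.], [0., 0.], [0., 1.]])
R2 = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 0.]])

assert np.allclose(R1.T @ theta[:, 0], 0)   # own restrictions hold
assert np.allclose(R2.T @ theta[:, 1], 0)
print(R1.T @ theta[:, 1])   # (c, d): rank 1 whenever (c, d) != (0, 0)
print(R2.T @ theta[:, 0])   # (a + b, -a): rank 1 whenever (a, b) != (0, 0)
```

With these values the two printed vectors are $(3, 4)$ and $(3, -1)$, both nonzero, so the parameter point is identified.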

3. An algorithm for estimating simultaneous equations under identifying linear restrictions and an expression for the asymptotic variance of the estimator

We consider the situation described in (1). The likelihood function is given by

$$\log L(\vartheta, \Omega) = -\tfrac{1}{2} T \log|\Omega| + \tfrac{1}{2} T \log|\Gamma' \Gamma| - \tfrac{1}{2} \sum_{t=1}^{T} X_t' \vartheta \Omega^{-1} \vartheta' X_t = -\tfrac{1}{2} T \log|\Omega| + \tfrac{1}{2} T \log|\Gamma' \Gamma| - \tfrac{1}{2} T \, \mathrm{tr}\{\Omega^{-1} \vartheta' S_{xx} \vartheta\},$$

where $S_{xx} = T^{-1} \sum_{t=1}^{T} X_t X_t'$ (see Sargan, 1988). Maximizing over $\Omega$ gives $\hat\Omega = \vartheta' S_{xx} \vartheta$ and, up to a constant factor,

$$L_{\max}^{-2/T}(\vartheta) = |\vartheta' S_{xx} \vartheta| / |\vartheta' M \vartheta|, \qquad (14)$$

where

$$M = S_{xx} - S_{xz} S_{zz}^{-1} S_{zx}, \qquad (15)$$

with $S_{xz} = T^{-1} \sum_{t=1}^{T} X_t z_t'$ and $S_{zz} = T^{-1} \sum_{t=1}^{T} z_t z_t'$.


We want to minimize (14) under the restrictions $\vartheta = (H_1 \varphi_1, \ldots, H_r \varphi_r)$, where we assume that the restrictions $H_1, \ldots, H_r$ are identifying, that is, that they satisfy conditions (10). We can rewrite (14) such that the dependence on $\varphi_1$ is more apparent, and then apply a switching algorithm which cyclically minimizes with respect to one variable keeping the others fixed. In order to describe the successive steps we define the matrices $\tau = (\vartheta_2, \ldots, \vartheta_r)$,

$$S_{xx.\tau} = S_{xx} - S_{xx}\tau(\tau' S_{xx}\tau)^{-1}\tau' S_{xx}, \qquad M_{.\tau} = M - M\tau(\tau' M\tau)^{-1}\tau' M,$$

and decompose

$$|\vartheta' S_{xx} \vartheta| = |\tau' S_{xx} \tau| \, |\varphi_1' H_1' S_{xx.\tau} H_1 \varphi_1| \qquad \text{and} \qquad |\vartheta' M \vartheta| = |\tau' M \tau| \, |\varphi_1' H_1' M_{.\tau} H_1 \varphi_1|,$$

and insert into (14). This shows that the likelihood function depends on $\varphi_1$ in such a way that the estimate of $\varphi_1$ is given by a variance ratio estimate, which is found by solving the eigenvalue problem

$$|\lambda H_1' S_{xx.\tau} H_1 - H_1' M_{.\tau} H_1| = 0,$$

for eigenvalues $\lambda_1 > \cdots > \lambda_{s_1}$ and eigenvectors $u_1, \ldots, u_{s_1}$. We get the maximum likelihood estimator of $\varphi_1$, for fixed values of the other parameters $\tau$, as the eigenvector corresponding to the largest eigenvalue $\lambda_1$, and hence $\hat\vartheta_1 = H_1 \hat\varphi_1$. Next we fix $\hat\vartheta_1, \vartheta_3, \ldots, \vartheta_r$ and maximize over $\varphi_2$ using a similar eigenvalue procedure, and we continue until convergence. We do not know that this algorithm converges to the global maximum of the likelihood function, but we have found that it works in practice and that it converges fast if given reasonable starting values. The algorithm is probably slower than Newton-Raphson iteration, but has the advantage that it is explicit in each step. Once the eigenvalue problem has been programmed each step is easy to perform, the likelihood function increases in each step, and the algorithm works for general linear restrictions on the individual equations, not just zero restrictions. It is a tempting idea to use the unrestricted estimate $\vartheta^*$ of $\vartheta$ as initial value. This, however, is not a good idea, since the numbering of the unrestricted vectors may not correspond to the numbering given by $H_1, \ldots, H_r$. Instead we want to find the vector in $\mathrm{sp}(\vartheta^*)$ that is as close as possible to $\mathrm{sp}(H_i)$ and use this as a starting value for $\vartheta_i$.
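One switching step amounts to a generalized symmetric eigenvalue problem $|\lambda A - B| = 0$ with $A > 0$. A sketch of such a step (numpy; function names are ours; Cholesky whitening is one standard way to solve the generalized problem):

```python
import numpy as np

def partial_out(S, tau):
    """S_{.tau} = S - S tau (tau' S tau)^{-1} tau' S."""
    St = S @ tau
    return S - St @ np.linalg.solve(tau.T @ St, St.T)

def switching_step(H1, tau, Sxx, M):
    """Update phi_1 for fixed tau = (theta_2, ..., theta_r): the maximizer of
    phi'H1'M_{.tau}H1 phi / phi'H1'S_{xx.tau}H1 phi, i.e. the eigenvector of
    the largest root of |lambda H1'S_{xx.tau}H1 - H1'M_{.tau}H1| = 0."""
    A = H1.T @ partial_out(Sxx, tau) @ H1
    B = H1.T @ partial_out(M, tau) @ H1
    L = np.linalg.cholesky(A)               # A = L L'
    Linv = np.linalg.inv(L)
    lam, v = np.linalg.eigh(Linv @ B @ Linv.T)
    phi = np.linalg.solve(L.T, v[:, -1])    # back-transform largest-root vector
    return phi / np.linalg.norm(phi)

# Toy check: with these matrices the problem reduces to diag(3, 1),
# so the update picks (up to sign) the first basis vector.
Sxx = np.eye(4)
M = np.diag([3., 1., 2., 0.])
tau = np.eye(4)[:, [2]]
H1 = np.eye(4)[:, [0, 1]]
print(switching_step(H1, tau, Sxx, M))
```

The full algorithm would loop this update over $\varphi_1, \ldots, \varphi_r$ until the likelihood stops increasing.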


Thus we first solve (14) without restrictions on $\vartheta$, that is, we solve

$$|\lambda S_{xx} - M| = 0,$$

for eigenvalues $\lambda_1 > \cdots > \lambda_r > \lambda_{r+1} = \cdots = \lambda_p = 0$ and eigenvectors $w_1, \ldots, w_p$, giving $\vartheta^* = (w_1, \ldots, w_r)$. Next we find a vector in $\mathrm{sp}(\vartheta^*)$ which is closest to the space $\mathrm{sp}(H_i)$, that is, we solve the problem

$$\max_{w, \varphi} \frac{(w' \vartheta^{*\prime} H_i \varphi)^2}{(w' \vartheta^{*\prime} \vartheta^* w)(\varphi' H_i' H_i \varphi)},$$

or equivalently the eigenvalue problem

$$|\lambda \vartheta^{*\prime} \vartheta^* - \vartheta^{*\prime} H_i (H_i' H_i)^{-1} H_i' \vartheta^*| = 0.$$

Let $u$ be the eigenvector corresponding to the largest eigenvalue; then we choose $\vartheta_i = \vartheta^* u$ as the starting value for $\vartheta_i$ in the iteration. Note that the eigenvalue formulation gives a normalization $\vartheta_i' \vartheta_i = u' \vartheta^{*\prime} \vartheta^* u = 1$, which may be changed for economic interpretation. This choice of initial value also has the advantage that if $H_1, \ldots, H_r$ is a just identifying set of restrictions, then no iteration is needed.

In order to discuss the asymptotic distribution and the expression for the variance it is necessary to normalize the vectors. A general formulation of the normalization is $\vartheta_i = h_i + H^i \psi_i$, for some $(s_i - 1)$-dimensional vector $\psi_i$, where $\mathrm{sp}(h_i, H^i) = \mathrm{sp}(H_i)$. If in particular we want to normalize on one of the coefficients, the first say, then we take a vector $h_i$ in $\mathrm{sp}(H_i)$ with a nonzero first coefficient, and make sure that $H^i$ is chosen such that its first row is zero. By differentiating the concentrated likelihood function (14) we derive the expression for the derivative of $\log L_{\max}$ with respect to $\psi_i$,

$$\partial \log L_{\max} / T \partial \psi_i = e_i' (\vartheta' S_{xx} \vartheta)^{-1} \vartheta' S_{xx} H^i - e_i' (\vartheta' M \vartheta)^{-1} \vartheta' M H^i \qquad (16)$$
$$= e_i' \hat\Omega^{-1} S_{\varepsilon z} \vartheta_\pi' H^i,$$

where $\vartheta_\pi' = (-B\Gamma^{-1}, I)$ is given by the reduced form coefficients (i.e. $\vartheta_\pi = (\Pi', I)'$ with $\Pi = -\Gamma'^{-1}B'$), $S_{\varepsilon z} = T^{-1}\sum_{t=1}^{T}(\vartheta' X_t) z_t'$, and $e_i$ is the $i$th unit vector. The second derivative is given by

$$\partial^2 \log L_{\max} / T \partial \psi_i \partial \psi_j' = -e_i' \hat\Omega^{-1} e_j H^{i\prime} \vartheta_\pi S_{zz} \vartheta_\pi' H^j + o_P(1). \qquad (17)$$

In order to formulate Taylor expansions we introduce some notation: let $x_i$ be vectors and $M_{ij}$ and $M_i$ matrices. We let $\{x_i\}$ be the stacked vector, $\{M_{ij}\}$ the block matrix with blocks $M_{ij}$, and finally $\mathrm{diag}\{M_i\}$ the block diagonal matrix with blocks $M_i$ along the diagonal.


We conclude this section by proving the well-known result on the asymptotic distribution of the FIML estimator (see Sargan, 1988; Amemiya, 1985, p. 234).

Theorem 4. The asymptotic distribution of the maximum likelihood estimator of the simultaneous equations (1) under the identifying restrictions $R_1, \ldots, R_r$ and the normalization $\vartheta_i = h_i + H^i \psi_i$ is given by

$$T^{1/2} \{\hat\psi_i - \psi_i\} \xrightarrow{d} N\big(0, \{e_i' \Omega^{-1} e_j H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j\}^{-1}\big), \qquad (18)$$

provided $\vartheta$ is identified and $T^{-1} \sum_{t=1}^{T} z_t z_t' \xrightarrow{P} \Sigma > 0$. The asymptotic variance is consistently estimated by

$$T \big\{e_i' \hat\Omega^{-1} e_j H^{i\prime} \textstyle\sum_{t=1}^{T} \hat X_t \hat X_t' H^j\big\}^{-1},$$

where $\hat X_t = (\hat y_t', z_t')'$ with $\hat y_t = -\hat\Gamma'^{-1} \hat B' z_t$.

Proof. By an expansion of the likelihood function we find from (16) and (17) the representation

$$\{e_i' \hat\Omega^{-1} e_j H^{i\prime} \vartheta_\pi S_{zz} \vartheta_\pi' H^j\} \{\hat\psi_i - \psi_i\} \approx \{H^{i\prime} \vartheta_\pi S_{z\varepsilon} \hat\Omega^{-1} e_i\}. \qquad (19)$$

Since $S_{zz} \xrightarrow{P} \Sigma > 0$ and $\hat\Omega \xrightarrow{P} \Omega$, we find

$$\{e_i' \hat\Omega^{-1} e_j H^{i\prime} \vartheta_\pi S_{zz} \vartheta_\pi' H^j\} \xrightarrow{P} \{e_i' \Omega^{-1} e_j H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j\}, \qquad (20)$$

and

$$T^{1/2} \{H^{i\prime} \vartheta_\pi S_{z\varepsilon} \Omega^{-1} e_i\} \xrightarrow{d} N\big(0, \{e_i' \Omega^{-1} e_j H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j\}\big). \qquad (21)$$

Since the parameter is assumed to be identified, the asymptotic covariance matrix is nonsingular. To see this, let $\{K_i\}$ be a vector such that

$$0 = \{K_i\}' \{e_i' \Omega^{-1} e_j H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j\} \{K_j\} = \sum_{i,j} e_i' \Omega^{-1} e_j K_i' H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j K_j = \mathrm{tr}\big(\{e_i' \Omega^{-1} e_j\} \{K_i' H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j K_j\}\big);$$

then

$$K_i' H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^i K_i = 0, \qquad i = 1, \ldots, r.$$

This implies that $\vartheta_\pi' H^i K_i = 0$, such that the vector $H^i K_i$, which is clearly in $\mathrm{sp}(H_i)$, is a linear combination of the vectors in $\vartheta$. Since the parameter is identified, there is only one such combination in the space spanned by $H_i$, namely $\vartheta_i = h_i + H^i \psi_i$; hence $H^i K_i = c(h_i + H^i \psi_i)$, or $H^i(K_i - \psi_i c) = c h_i$, which is clearly impossible unless $K_i = 0$ and $c = 0$, since $h_i$ is linearly independent of $H^i$. Hence the matrix is nonsingular, and we find from (19), (20), and (21) the result that

$$T^{1/2} \{\hat\psi_i - \psi_i\} \xrightarrow{d} N\big(0, \{e_i' \Omega^{-1} e_j H^{i\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^j\}^{-1}\big),$$

which proves (18). A consistent estimator of the asymptotic variance is given by replacing $\Sigma$, $\Omega$, and $\vartheta$ by their estimators. Note that $\vartheta_\pi$ involves the estimated reduced form coefficients, so that

$$\hat X_t = \begin{pmatrix} -\hat\Gamma'^{-1} \hat B' z_t \\ z_t \end{pmatrix} = \begin{pmatrix} \hat y_t \\ z_t \end{pmatrix}$$

is the predicted value of $X_t$. From (18) we can derive the marginal distribution of $T^{1/2}(\hat\psi_1 - \psi_1)$, say. We give the result for $r = 2$ for notational reasons. Further let $e_i' \Omega^{-1} e_j = \omega^{ij}$; then $T^{1/2}(\hat\psi_1 - \psi_1)$ is asymptotically Gaussian with mean zero and inverse variance

$$\omega^{11} H^{1\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^1 - H^{1\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^2 (H^{2\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^2)^{-1} H^{2\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^1 \, \omega^{12} \omega^{21} / \omega^{22}. \qquad (22)$$

This expression is valid if both $\vartheta_1$ and $\vartheta_2$ are identified. If only $\vartheta_1$ is identified, then $H^{2\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^2$ is singular, but then one notes that

$$P_2(\Sigma) = \vartheta_\pi' H^2 (H^{2\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^2)^{-1} H^{2\prime} \vartheta_\pi \Sigma$$

is just the projection onto the space $\mathrm{sp}(\vartheta_\pi' H^2)$ with respect to the matrix $\Sigma$. Hence the expression (22) can be written

$$\omega^{11} H^{1\prime} \vartheta_\pi \Sigma \vartheta_\pi' H^1 - H^{1\prime} \vartheta_\pi \Sigma P_2(\Sigma) \vartheta_\pi' H^1 \, \omega^{12} \omega^{21} / \omega^{22},$$

which is the general expression for the asymptotic covariance for the parameters of the first identified relation, whether or not the second is identified. The proof is based on (19), (20), and (21) as before, just noticing that whenever the left-hand side of (19) is singular then so is the right-hand side. This has the consequence that whenever only the first relation is identified, one can choose a just identification of the remaining equations and apply the general formula (18). This discussion is clearly not sufficient, and a more careful treatment will appear elsewhere.

4. Estimation of cointegration parameters and common trends under identifying restrictions

We give here a discussion of the estimation problem for identified cointegrating relations in a vector autoregressive model. It turns out that the same estimation procedure as discussed in Section 3 can be applied for the identified common trends, so we comment briefly on this at the end of this section. Let $Y_t$ be generated by the reduced form equations (6),

$$\Delta Y_t = \alpha \beta' Y_{t-1} + \Gamma \Delta Y_{t-1} + \mu + \varepsilon_t, \qquad (23)$$

let $\alpha$ and $\beta$ be $p \times r$, and assume that the cointegrating relations are restricted by

$$\beta = (H_1 \varphi_1, \ldots, H_r \varphi_r). \qquad (24)$$

For ease of notation we have left out further lags. The cointegrating relations are estimated without restrictions by reduced rank regression of $\Delta Y_t$ on $Y_{t-1}$, corrected for $\Delta Y_{t-1}$ and 1 (see Johansen, 1988). This procedure gives residuals $R_{0t}$ and $R_{1t}$ and product moment matrices

$$S_{ij} = T^{-1} \sum_{t=1}^{T} R_{it} R_{jt}', \qquad i, j = 0, 1.$$

The estimation problem is solved by the eigenvalue problem

$$|\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0, \qquad (25)$$

for eigenvalues $1 > \lambda_1 > \cdots > \lambda_p > 0$ and eigenvectors $u_1, \ldots, u_p$; an estimator of $\beta$ is given by $\hat\beta = (u_1, \ldots, u_r)$, normalized by $\hat\beta' S_{11} \hat\beta = I$. This way of estimating the subspace spanned by $\beta$ implies an identification of the individual coefficients, namely as the coefficients of the eigenvectors of a suitable eigenvalue problem involving the population quantities corresponding to $S_{ij}$. It is seen from the eigenvalue problem that, since the eigenvectors diagonalize $S_{10} S_{00}^{-1} S_{01}$, a total of $r(r-1)/2$ restrictions is imposed, and since $\hat\beta' S_{11} \hat\beta = I$ a further $r(r+1)/2$ conditions are imposed, that is, in total $r^2$ conditions, as is usually done. This identification is in general not interesting from an economic point of view, where the usual procedure is to impose linear restrictions on each equation to achieve identification. When the restrictions (24) are imposed, the estimation procedure is no longer explicit, and the switching algorithm described in Section 3 can be applied: for fixed values of $\varphi_2, \ldots, \varphi_r$, or equivalently $\beta_2, \ldots, \beta_r$, we define $\tau = (\beta_2, \ldots, \beta_r)$ and decompose $\alpha$ into $(\alpha_1, \alpha_2)$, where $\alpha_1$ is $p \times 1$. Then

$$\Delta Y_t = \alpha_1 \varphi_1' H_1' Y_{t-1} + \alpha_2 \tau' Y_{t-1} + \Gamma \Delta Y_{t-1} + \mu + \varepsilon_t. \qquad (26)$$

The solution for fixed $\tau$ is given by reduced rank regression of $\Delta Y_t$ on $H_1' Y_{t-1}$, corrected for $\Delta Y_{t-1}$, $\tau' Y_{t-1}$, and 1. Hence we solve the eigenvalue problem

$$|\lambda H_1' S_{11.\tau} H_1 - H_1' S_{10.\tau} S_{00.\tau}^{-1} S_{01.\tau} H_1| = 0,$$

for eigenvalues $\lambda_i$ and eigenvectors $\hat\varphi_i$, and choose $\hat\beta_1 = H_1 \hat\varphi_1$, corresponding to the largest eigenvalue. Here

$$S_{11.\tau} = S_{11} - S_{11} \tau (\tau' S_{11} \tau)^{-1} \tau' S_{11},$$

with similar definitions of $S_{10.\tau}$ and $S_{00.\tau}$.
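The reduced rank regression eigenvalue problem (25) can be sketched as follows (numpy; the function name is ours; Cholesky whitening reduces the generalized problem to an ordinary symmetric one):

```python
import numpy as np

def rrr_eig(S00, S01, S11, r):
    """Solve |lambda S11 - S10 S00^{-1} S01| = 0 (cf. (25)) and return the
    eigenvalues plus beta_hat = (u_1, ..., u_r), normalized so that
    beta_hat' S11 beta_hat = I."""
    S10 = S01.T
    B = S10 @ np.linalg.solve(S00, S01)
    L = np.linalg.cholesky(S11)            # S11 = L L'
    Linv = np.linalg.inv(L)
    lam, v = np.linalg.eigh(Linv @ B @ Linv.T)
    order = np.argsort(lam)[::-1]          # largest eigenvalues first
    beta = Linv.T @ v[:, order[:r]]        # then beta' S11 beta = I
    return lam[order], beta

# Toy check with diagonal product moments: the eigenvalues are the
# squared canonical correlations 0.81, 0.25, 0.01.
S00, S11 = np.eye(3), np.eye(3)
S01 = np.diag([0.9, 0.5, 0.1])
lam, beta = rrr_eig(S00, S01, S11, r=1)
print(np.round(lam, 2))   # [0.81 0.25 0.01]
```

The restricted step for fixed $\tau$ is the same computation with $H_1' S_{ij.\tau} H_1$ in place of $S_{ij}$.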


The algorithm then continues by fixing $\hat\beta_1, \beta_3, \ldots, \beta_r$ and solving for $\beta_2$ by reduced rank regression. The process is repeated until convergence, by cyclically maximizing over one of the parameters $\varphi_i$ keeping all others fixed. If further submodels are formulated in the form (24), then the likelihood ratio test is asymptotically $\chi^2$ distributed. This follows from the general result that hypotheses on $\beta$ can be tested by the likelihood ratio procedure using the $\chi^2$ distribution (see Johansen, 1991; Phillips, 1991; Ahn and Reinsel, 1990). Without identifying restrictions the coefficients of the common trends can be estimated from the eigenvalue problem

$$|\lambda S_{00} - S_{01} S_{11}^{-1} S_{10}| = 0,$$

which is solved for eigenvalues $1 > \lambda_1 > \cdots > \lambda_p > 0$ and eigenvectors $w_1, \ldots, w_p$. The unrestricted maximum likelihood estimator of $\alpha_\perp$ is given by $\hat\alpha_\perp = (w_{r+1}, \ldots, w_p)$, and under identifying restrictions on $\alpha_\perp$ the switching algorithm described above can be applied.

The algorithm discussed here and in Section 3 is an attempt to work with reduced rank regression techniques as far as possible. The only justification for the algorithm is that once the eigenvalue routine is programmed it is easy to iterate the calculations. One could as well apply a general optimization algorithm, but the basic problem of convergence to a global maximum remains unsolved. The same algorithm can be applied for simultaneous equations and cointegrating relations, but a difference exists in the setup, namely the special structure of the matrix $M$, see (15), compared to $S_{11}$. It is not difficult to see that the unrestricted estimates of $\vartheta$ from the eigenvalue problem

$$|\lambda S_{xx} - M| = 0$$

give eigenvectors $w_1, \ldots, w_p$ with the property that

$$\mathrm{sp}(w_1, \ldots, w_r) = \mathrm{sp}(I, -\hat\Pi)',$$

where $\hat\Pi$ is the reduced form regression estimate found from the regression of $y_t$ on $z_t$. This simple result does not hold for the cointegration analysis, where $S_{11}$ is used instead of $M$.

The maximization with respect to $\varphi_1$ keeping other parameters fixed leads to a variance ratio estimator, and hence resembles LIML estimation. This estimation method was introduced by Anderson and Rubin (1949) with the purpose of estimating a structural relation without imposing structure on the remaining equations. In the present language this can be expressed as the hypothesis $\vartheta = (H_1 \varphi_1, \vartheta_2, \ldots, \vartheta_r)$. They showed that this estimation problem is explicit, since the remaining parameters can be eliminated from the likelihood function by regression, and what remains is a variance ratio estimator for $\varphi_1$. In


cointegration analysis the similar hypothesis cannot be estimated explicitly but needs the switching algorithm (or something else), again due to the fact that the matrix $S_{11}$ is more complicated than the matrix $M$. We formulate in the next section the asymptotic distribution of the maximum likelihood estimator of $\beta$.

5. The asymptotic distribution of the estimated identified cointegrating relations

The general result about the limit distribution is given in the above references. It is proved there that the limit distribution is mixed Gaussian and that inference concerning the cointegrating vectors can be performed by the $\chi^2$ distribution if likelihood ratio tests are applied. We first give some notation and then prove the result in the present context. The estimation is most conveniently performed with the individual vectors unnormalized, but for the asymptotic distribution we normalize them, usually on one of the variables. A general formulation is as before of the form

$$\beta_i = h_i + H^i \psi_i,$$

for some $(s_i - 1)$-dimensional vector $\psi_i$. The representation of $Y_t$ under the assumption that the process is I(1) is given by (8). It follows that

$$H^{i\prime} Y_t = H^{i\prime} C \sum_{j=1}^{t} \varepsilon_j + \tau_i t + H^{i\prime} U_t,$$

for $\tau_i = H^{i\prime} C \mu$, of dimension $(s_i - 1) \times 1$. Thus the process is asymptotically linear in the direction $\tau_i$, and orthogonally to this direction it behaves like a random walk. Note that no combination makes $H^{i\prime} Y_t$ stationary when $\beta$ is identified; see the proof of the nonsingularity of the covariance matrix in the proof of Theorem 4. Let $W$ be the Brownian motion defined on the unit interval as the limit of the normalized sums of the $\varepsilon$'s, that is,

$$T^{-1/2} \sum_{j=1}^{[Tu]} \varepsilon_j \xrightarrow{d} W_u,$$

and let $\bar W = \int_0^1 W_u \, \mathrm{d}u$. Let $\gamma_i = \tau_{i\perp}$, of dimension $(s_i - 1) \times (s_i - 2)$, and define $\delta_i = \tau_i (\tau_i' \tau_i)^{-1}$; then it follows easily from the representation (8) that

$$T^{-1} \delta_i' H^{i\prime} (Y_{[Tu]} - \bar Y) \xrightarrow{d} (u - \tfrac{1}{2}) = G_1(u),$$
$$T^{-1/2} \gamma_i' H^{i\prime} (Y_{[Tu]} - \bar Y) \xrightarrow{d} \gamma_i' H^{i\prime} C (W_u - \bar W) = \gamma_i' H^{i\prime} G_2(u), \qquad (27)$$


say, and we define $G' = (G_1, G_2')$ and

$$G_{2.1}(u) = G_2(u) - \left(\int_0^1 G_2 G_1 \, \mathrm{d}s\right) \left(\int_0^1 G_1^2 \, \mathrm{d}s\right)^{-1} G_1(u),$$

that is, the Brownian motion corrected for a trend. With these results we can formulate the main result:

Theorem 5. The asymptotic distribution of the maximum likelihood estimator of the cointegrating relations under the identifying restrictions $R_1, \ldots, R_r$ and the normalization $\beta_i = h_i + H^i \psi_i$ is mixed Gaussian and given by

$$T \{\hat\psi_i - \psi_i\} \xrightarrow{d} \mathrm{diag}\{\gamma_i\} \left\{\alpha_i' \Omega^{-1} \alpha_j \, \gamma_i' H^{i\prime} \int_0^1 G_{2.1} G_{2.1}' \, \mathrm{d}u \, H^j \gamma_j\right\}^{-1} \left\{\gamma_i' H^{i\prime} \int_0^1 G_{2.1} (\mathrm{d}W)' \Omega^{-1} \alpha_i\right\}, \qquad (28)$$

provided $\beta$ is identified. The asymptotic conditional variance is consistently estimated by

$$T^2 \left\{\hat\alpha_i' \hat\Omega^{-1} \hat\alpha_j \, H^{i\prime} \sum_{t=1}^{T} R_{1t} R_{1t}' H^j\right\}^{-1}. \qquad (29)$$

Proof. The asymptotic distribution is most conveniently derived from the likelihood function concentrated with respect to α. We also use the general result that inference on β can be conducted as if the other parameters were known (see Johansen, 1991). Under the identifying restrictions and the normalization, the derivative with respect to ψ_i is given by

∂log L/∂ψ_i' = α_i' Ω^{-1} Σ_{t=1}^T ε_t (Y_{t−1} − Ȳ_{−1})' H_i = T α_i' Ω^{-1} S_{ε1} H_i,

and

∂²log L/∂ψ_i ∂ψ_j' = − α_i' Ω^{-1} α_j H_i' Σ_{t=1}^T (Y_{t−1} − Ȳ_{−1})(Y_{t−1} − Ȳ_{−1})' H_j = − T α_i' Ω^{-1} α_j H_i' S_{11} H_j,

where S_{ε1} = T^{-1} Σ_{t=1}^T ε_t (Y_{t−1} − Ȳ_{−1})' and S_{11} = T^{-1} Σ_{t=1}^T (Y_{t−1} − Ȳ_{−1})(Y_{t−1} − Ȳ_{−1})'. An expansion of the likelihood function gives the expression

ψ̂ − ψ = {− ∂²log L/∂ψ∂ψ'}^{-1} {∂log L/∂ψ} (1 + o_P(1)),   ψ = (ψ_1', …, ψ_r')'.    (30)


We next determine the asymptotic properties of the product moments appearing in the derivatives. For B_{iT} = (T^{-1} τ̄_i, T^{-1/2} γ_i) we find from (27) that

B_{iT}' H_i' S_{11} H_j B_{jT} →_w ∫_0^1 G^{(i)} G^{(j)'} du,    (31)

T^{1/2} B_{iT}' H_i' S_{ε1}' →_w ∫_0^1 G^{(i)} (dW)',    (32)

where G^{(i)'} = (G_1, G_2' H_i γ_i). Applying the results (31) and (32) in the expansion (30), we then get the joint limit distribution of the normalized components of ψ̂_i − ψ_i, and hence the marginal distribution of the last component is

T{γ_i'(ψ̂_i − ψ_i)} →_w {α_i' Ω^{-1} α_j γ_i' H_i' ∫_0^1 G_{2.1} G_{2.1}' du H_j γ_j}^{-1} {γ_i' H_i' ∫_0^1 G_{2.1} (dW)' Ω^{-1} α_i}.

This determines the asymptotic distribution of T{ψ̂_i − ψ_i}, since

T{ψ̂_i − ψ_i} = T^{-1/2} τ̄_i {T^{3/2} τ_i'(ψ̂_i − ψ_i)} + γ̄_i {T γ_i'(ψ̂_i − ψ_i)},

which shows that the first term vanishes in the limit. This proves (28). Thus the asymptotic distribution is mixed Gaussian with an asymptotic conditional variance given by

diag{γ̄_i} {α_i' Ω^{-1} α_j γ_i' H_i' ∫_0^1 G_{2.1} G_{2.1}' du H_j γ_j}^{-1} diag{γ̄_i'},

which is consistently estimated by (29). This completes the proof of Theorem 5.

Thus in practice, ignoring finite sample problems, we can act as if T{ψ̂_i − ψ_i} is Gaussian with the variance matrix estimated by (29). The square roots of the diagonal elements of this estimate provide the 'standard deviations' which are used for constructing Wald tests for specific values of individual coefficients if the likelihood ratio test is not worked out. Note that the distribution of β̂ is mixed Gaussian, such that the distribution has heavy tails. Note also that tests of hypotheses on β are not based upon the asymptotic distribution of β̂ alone but on the joint limit distribution of β̂ and the


observed information in the sample, − ∂²log L/∂ψ∂ψ', suitably normalized. Hence the finite sample properties of tests of hypotheses on β should not be investigated by simulating the distribution of β̂ itself, but by simulating the distribution of the test statistics or the t-ratios for the individual coefficients. Since the 'asymptotic variance', or the observed Fisher information, is random even in the limit, it may happen that we get a very small value of the information and hence a large value of the asymptotic variance. In this case the maximum likelihood estimate may appear to lie far from the true value, but in reality we are simply in a situation where the data contain little information about the true parameter. Thus deviations between β̂ and β should not be measured by the variance of β̂ but by the observed information (29), or the conditional variance (see Johansen, 1995). The limit distribution of the t-ratio is Gaussian, such that one would expect much thinner tails in finite samples than are found in simulations of the marginal distribution of β̂.
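The contrast between the heavy-tailed mixed Gaussian estimator and the Gaussian t-ratio can be illustrated by a toy simulation (a sketch of the general phenomenon, not of the cointegration estimator itself): if X = σZ with Z standard Gaussian and σ an independent random 'conditional standard deviation' mimicking the random limit information, then X has excess kurtosis while X/σ is exactly N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sigma = np.exp(rng.normal(size=n))   # hypothetical random 'conditional sd'
z = rng.normal(size=n)
x = sigma * z                        # mixed Gaussian draw: heavy tails
t = x / sigma                        # the 't-ratio': exactly N(0, 1)

kurt_x = np.mean(x**4) / np.mean(x**2)**2   # equals 3.0 for a Gaussian
print(round(float(kurt_x), 1), round(float(np.mean(np.abs(t) > 1.96)), 3))
```

The kurtosis of the mixed Gaussian draw far exceeds 3, while the t-ratio rejects at the nominal 5% rate; it is the latter quantity, conditioning on the realized information, whose finite sample behaviour should be simulated.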

6. An illustrative example

The paper by Juselius (1995) contains an analysis of the long-run foreign transmission effects between Denmark and Germany. The variables analysed are (p₁, p₂, e₁₂, i₁, i₂), that is, the log consumer prices in Denmark and Germany, the exchange rate, and the two bond rates. The data are quarterly, 1972:1 to 1991:4, and the detailed analysis can be found in the above-mentioned paper. It was found that the variables

(p₁ − p₂, Δp₁, e₁₂, i₁, i₂)

contained no I(2) variables, and a VAR model with three cointegrating relations was analysed. As discussed in Juselius (1995), the basic long-run relations of interest are the PPP relation and the UIP relation. These are expressed as p₁t − p₂t − e₁₂t being stationary, or equivalently that the vector (1, 0, −1, 0, 0)' ∈ sp(β), and as i₁t − i₂t being stationary, or that (0, 0, 0, 1, −1)' ∈ sp(β). The unrestricted estimates of the cointegrating vectors are given in Table 1. It was found that the hypothesis that PPP was stationary was accepted, whereas the stationarity of UIP was strongly rejected (Juselius, 1995, Table 8). As the decision between stationarity, I(0), and nonstationarity, I(1), is not always so easy, it was decided to formulate the hypothesis that suitable combinations and modifications of these basic relations were stationary. This is formulated as linear restrictions on the cointegrating relations. Thus β was estimated under the restrictions

β = (H₁φ₁, H₂φ₂, H₃φ₃),


Table 1
The unrestricted cointegrating vectors

Variable     β̂₁        β̂₂        β̂₃
p₁ − p₂    −0.007    −0.732     1.000
Δp₁        −0.125    −0.683     1.260
e₁₂         1.000     0.169    −0.985
i₁          0.024     1.000    −4.427
i₂         −0.133    −1.591     4.008

where

H₁' = [ 1  0  −1  0   0
        0  0   0  1  −1 ],

H₂' = [ 1  0  0  0  0
        0  0  1  0  0
        0  0  0  0  1 ],

H₃' = [ 0  1  0  0  0
        0  0  0  1  0 ].

The first describes the relation between the real exchange rate and the interest differential, the second is a modified PPP relation, and the third describes a relation between Danish price inflation and the Danish nominal interest rate. The rank condition would tell us whether these restrictions identify the parameter values determined by the data. Since this parameter is unknown, we ask instead whether the restrictions will in general identify the parameter values uniquely. We therefore calculate the matrices

R₁'H₂ = [ 1  1  0          R₁'(H₂ : H₃) = [ 1  1  0  0  0
          0  0  0                           0  0  0  1  0
          0  0  1 ],                        0  0  1  0  1 ],   etc.,

and find

rank(R₁'H₂) = 2,   rank(R₁'H₃) = 2,   rank(R₁'(H₂ : H₃)) = 3,
rank(R₂'H₁) = 1,   rank(R₂'H₃) = 2,   rank(R₂'(H₁ : H₃)) = 2,
rank(R₃'H₁) = 2,   rank(R₃'H₂) = 3,   rank(R₃'(H₁ : H₂)) = 3.
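For matrices of this size the rank conditions are conveniently verified numerically. The sketch below recomputes the nine ranks with numpy, using the H_i as reconstructed above (the exact layout of the H_i is an assumption of this sketch; R_i is computed as an orthogonal complement of H_i, so the entries of R_i'H_j depend on the choice of basis, but the ranks do not).

```python
import numpy as np

# Restriction (design) matrices as reconstructed in the text; variable
# ordering is (p1 - p2, Dp1, e12, i1, i2).
H1 = np.array([[ 1,  0],
               [ 0,  0],
               [-1,  0],
               [ 0,  1],
               [ 0, -1]], dtype=float)   # real exchange rate and interest differential
H2 = np.array([[1, 0, 0],
               [0, 0, 0],
               [0, 1, 0],
               [0, 0, 0],
               [0, 0, 1]], dtype=float)  # modified PPP relation
H3 = np.array([[0, 0],
               [1, 0],
               [0, 0],
               [0, 1],
               [0, 0]], dtype=float)     # Danish inflation and Danish bond rate

def orth_complement(H):
    """Basis R of sp(H)-perp (so R'H = 0), via the full SVD."""
    u = np.linalg.svd(H)[0]
    return u[:, H.shape[1]:]

Hs = [H1, H2, H3]
Rs = [orth_complement(H) for H in Hs]

ranks = []
for i in range(3):
    others = [Hs[j] for j in range(3) if j != i]
    row = [int(np.linalg.matrix_rank(Rs[i].T @ Hj)) for Hj in others]
    row.append(int(np.linalg.matrix_rank(Rs[i].T @ np.hstack(others))))
    ranks.append(row)
print(ranks)  # [[2, 2, 3], [1, 2, 2], [2, 3, 3]]
```

Each rank(R_i'H_j) is at least 1 and each rank(R_i'(H_j : H_k)) is at least 2, so condition (10) holds.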


Thus the conditions (10) are satisfied, and the parameters defined by these restrictions are 'almost all' identified in the sense described in Theorems 2 and 3. In Johansen and Juselius (1994) we make the distinction between generic identification, empirical identification, and economic identification. This paper is concerned only with the first notion, generic identification. Thus in the above example we have established that, in trying to estimate the restricted parameters, we do not meet a problem that is singular in its very formulation. What remains is to check that the parameters thus found are also empirically identified, that is, that we cannot accept a hypothesis of overidentifying restrictions on the parameters. Finally, the analysis should contain a discussion of the economic interpretation of the estimated relations; see the above-mentioned paper by Juselius (1995).

Appendix

A.1. Proof of Theorem 2

We can formulate the rank condition by defining the matrix A_k(φ) with entries

A_k(φ)_{ij} = φ_i' H_i' R_k R_k' H_j φ_j,   i, j ≠ k.

Then

rank(R_k'β) = rank(A_k(φ)),

such that the rank condition is equivalent to

det(A_k(φ)) = |A_k(φ)| ≠ 0.

Now the function φ → |A_k(φ)| is a polynomial, and it is well known that a polynomial which is zero on an open set is identically zero; on the other hand, if there exists a vector φ such that |A_k(φ)| ≠ 0, then the set of zeros {φ : |A_k(φ)| = 0} contains no open set. Under the assumptions of Theorem 2 the parameter set contains an identified parameter value φ, such that |A_k(φ)| ≠ 0 for all k. In this case the set of unidentified parameter values is equal to the closed set

∪_{k=1}^r {φ : |A_k(φ)| = 0},

which does not contain any interior points. Thus the complement, the set of identified parameter values, is an open dense set.
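The proof suggests a practical check: since |A_k(φ)| is a polynomial in φ, it is nonzero at a randomly drawn φ with probability one whenever it is nonzero somewhere, so generic identification can be decided by evaluating the rank condition at a single random parameter point. A minimal sketch, with hypothetical restriction matrices H_i and dimensions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 5, 3
s = [2, 3, 2]                      # hypothetical dimensions s_i

# One random draw of phi decides generic identification, since the
# determinant |A_k(phi)| is a polynomial in phi.
H = [rng.normal(size=(p, si)) for si in s]                    # hypothetical H_i
R = [np.linalg.svd(Hi)[0][:, si:] for Hi, si in zip(H, s)]    # R_i = H_i-perp
phi = [rng.normal(size=si) for si in s]                       # random parameter point

ok = all(
    np.linalg.matrix_rank(
        R[k].T @ np.column_stack([H[j] @ phi[j] for j in range(r) if j != k])
    ) == r - 1
    for k in range(r)
)
print(ok)
```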


A.2. Proof of Theorem 3

A.2.1. Necessity of the algebraic rank condition
It is easily seen that if the set of restrictions identifies the first equation, say, then one can take vectors δ_j ∈ sp(H_j), j ≠ 1, such that they satisfy the rank condition (4), and hence such that the vectors R₁'δ_j are linearly independent. This means that, for any set of indices 2 ≤ i₁ < ⋯ < i_k ≤ r, the matrix

(R₁'H_{i₁} : ⋯ : R₁'H_{i_k})

contains at least k linearly independent columns. Thus condition (10) is necessary, but the sufficiency is not so simple.

A.2.2. Sufficiency of the algebraic rank condition for zero restrictions

The result follows for zero restrictions from Hall's theorem, a classical result from combinatorics (see Hall, 1935).

Theorem 6 (Hall, 1935). Let M_i, i = 1, …, n, be finite sets; then a necessary and sufficient condition that we can take distinct elements m_i ∈ M_i is that for all k = 1, …, n and all sets of indices 1 ≤ i₁ < ⋯ < i_k ≤ n,

card(M_{i₁} ∪ ⋯ ∪ M_{i_k}) ≥ k.    (33)
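For small systems, Hall's condition (33) and a system of distinct representatives can be checked by brute force. A sketch, where the sets M_i are hypothetical index sets of the kind arising from zero restrictions:

```python
from itertools import combinations, product

def hall_condition(sets):
    """Check card(M_i1 u ... u M_ik) >= k for every subset of indices."""
    n = len(sets)
    return all(
        len(set().union(*(sets[i] for i in idx))) >= k
        for k in range(1, n + 1)
        for idx in combinations(range(n), k)
    )

def distinct_representatives(sets):
    """Brute-force search for a system of distinct representatives."""
    for choice in product(*sets):
        if len(set(choice)) == len(sets):
            return choice
    return None

M = [{1, 3}, {2}, {2, 4}]   # hypothetical sets M_j
print(hall_condition(M), distinct_representatives(M))
```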

What is characteristic for zero restrictions is that the vector space spanned by H_j is determined by the subset of the unit vectors that it contains. Hence the vector spaces sp(R₁'H_j), j = 2, …, r, are characterized by subsets M_j of {1, …, p}, and condition (10) translates into condition (33). Under this condition Hall's theorem states that there exist distinct representative elements m_j ∈ M_j, j = 2, …, r. The unit vectors corresponding to these are linearly independent, and hence guarantee that the rank condition is satisfied for this choice of parameter values. The connection between Hall's theorem and the identification of econometric models using instrumental variable techniques has been investigated by Maravall (1993).

A.2.3. Sufficiency in the general case

The proof follows from a generalization of Hall's theorem due to Rado (1942).

Theorem 7 (Rado, 1942). Let M_i, i = 1, …, n, be finite sets of vectors from a given vector space; then a necessary and sufficient condition that one can take linearly independent vectors v_i ∈ M_i, i = 1, …, n, is that for all k = 1, …, n and all sets of indices 1 ≤ i₁ < ⋯ < i_k ≤ n,

dim{sp(M_{i₁} ∪ ⋯ ∪ M_{i_k})} ≥ k.


With this result the proof of Theorem 3 goes as follows: We define M_i as the set of columns of R₁'H_{i+1}, i = 1, …, r − 1, and let n = r − 1. Then (10) implies, by Rado's theorem, that we can choose linearly independent vectors v_i ∈ M_i. These have the form v_i = R₁'H_{i+1}φ_{i+1} for some unit vector φ_{i+1}. The vectors β_i = H_iφ_i, i = 2, …, r, then satisfy the rank condition, and hence the first equation is identified by R₁. The reason for proving the result of Theorem 3 first for zero restrictions and then in the general case is to point out the connection with the two important results from operations research. A systematic exposition of the theory of matroids derived from Rado's theorem is given by Welsh (1976), and the applications of Hall's theorem in the theory of flows in networks are given by Ford and Fulkerson (1962).
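Rado's selection of linearly independent vectors, one from each set, can likewise be found by brute force for small problems. In the sketch below the sets are the columns of R₁'H₂ and R₁'H₃ from the illustrative example as reconstructed here (the basis chosen for R₁ is an assumption):

```python
import numpy as np
from itertools import product

def independent_transversal(sets):
    """Brute force: one vector from each set, jointly linearly independent."""
    n = len(sets)
    for choice in product(*sets):
        if np.linalg.matrix_rank(np.column_stack(choice)) == n:
            return choice
    return None

# Columns of R1'H2 and R1'H3 from the illustrative example.
M1 = [np.array([1., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 0., 1.])]
M2 = [np.array([0., 1., 0.]), np.array([0., 0., 1.])]
v1, v2 = independent_transversal([M1, M2])
print(v1, v2)
```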

References

Ahn, S.K. and C.G. Reinsel, 1990, Estimation for partially nonstationary multivariate autoregressive models, Journal of the American Statistical Association 85, 813-823.
Amemiya, T., 1985, Advanced econometrics (Basil Blackwell, Oxford).
Anderson, T.W. and H. Rubin, 1949, Estimation of the parameters of a single equation in a complete system of stochastic equations, Annals of Mathematical Statistics 20, 46-63.
Fisher, F.M., 1966, The identification problem in econometrics (McGraw-Hill, New York, NY).
Ford, L.R. and D.R. Fulkerson, 1962, Flows in networks (Princeton University Press, Princeton, NJ).
Hall, P., 1935, On representatives of subsets, Journal of the London Mathematical Society 10, 26-30.
Johansen, S., 1988, Statistical analysis of cointegration vectors, Journal of Economic Dynamics and Control 12, 231-254.
Johansen, S., 1991, Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models, Econometrica 59, 1551-1580.
Johansen, S., 1995, The role of ancillarity in inference for non-stationary variables, Economic Journal 105, 302-320.
Johansen, S. and K. Juselius, 1994, Identification of the long-run and the short-run structure: An application to the ISLM model, Journal of Econometrics 63, 7-36.
Juselius, K., 1995, Do purchasing power parity and uncovered interest rate parity hold in the long run? An example of likelihood inference in a multivariate time series model, Journal of Econometrics, this issue.
Maravall, A., 1993, A note on transversal theory and identification of econometric models, Mimeo. (European University Institute, San Domenico di Fiesole).
McManus, D.A., 1992, How common is identification in parametric models?, Journal of Econometrics 53, 5-23.
Phillips, P.C.B., 1991, Optimal inference in cointegrated systems, Econometrica 59, 283-306.
Rado, R., 1942, A theorem on independence relations, Quarterly Journal of Mathematics, Oxford 13, 83-89.
Sargan, J.D., 1988, Lectures on advanced econometric theory (Basil Blackwell, Oxford). Welsh, D.J.A., 1976, Matroid theory (Academic Press, London).