Simultaneous equations with covariance restrictions

Thomas J. ROTHENBERG and Paul A. RUUD

University of California, Berkeley

Journal of Econometrics 44 (1990) 25-39. North-Holland

Full-information maximum-likelihood and minimum-distance estimators are derived for the simultaneous-equations model with covariance restrictions. Linearization yields estimators that are easy to compute and have an instrumental-variables interpretation. The efficient three-stage least-squares estimator in the presence of covariance restrictions is shown to be a special case. Modified estimators which take into account possible nonnormality in the errors are also discussed.

1. Introduction

We reconsider the linear simultaneous-equations system with covariance restrictions as described by Hausman, Newey, and Taylor (1987). Using both maximum-likelihood and minimum-distance approaches, we explore several estimation strategies. The augmented three-stage least-squares estimator proposed by these authors is derived as a special case. First, we review the efficient estimation of the linear simultaneous-equations model without covariance restrictions; this outlines the approach that we will use in the presence of covariance restrictions and also provides a useful contrast. Second, we give an alternative derivation of the asymptotic covariance matrix of a subset of parameters estimated by maximum likelihood, which turns out to be much simpler than applying the partitioned inverse formula to the information matrix. Third, we derive the efficient 3SLS estimator in the presence of covariance restrictions as a linearized maximum-likelihood and as a linearized minimum-distance estimator. Finally, we discuss the properties of our estimators when the errors are not normally distributed.

2. No covariance restrictions

A complete linear simultaneous-equations model involving $G$ endogenous and $K$ predetermined variables for a sample of size $T$ can be written compactly as

$$YB + X\Gamma = U, \tag{1}$$

where $Y$ is the $T \times G$ matrix of observations on the endogenous variables, $X$ is the $T \times K$ matrix of observations on the predetermined variables, and $U$ is the $T \times G$ matrix of unobserved random errors. The $T$ rows of $U$ are mutually independent, identically distributed random vectors with mean zero and positive definite covariance matrix $\Sigma$. The matrix $B$ is assumed to be nonsingular so that $Y$ has the reduced-form representation

$$Y = X\Pi + V,$$

where $\Pi = -\Gamma B^{-1}$ and the rows of $V = UB^{-1}$ are independent with mean zero and covariance matrix $B'^{-1}\Sigma B^{-1}$. We shall further assume that $X$ has rank $K$.

Some of the elements of $B$ and $\Gamma$ are known a priori and do not have to be estimated from the data. Let $A = (B'\ \Gamma')'$ be the $(G+K) \times G$ matrix of structural coefficients. Defining $\delta$ to be the $q$-dimensional vector of unknown elements of $A$ and defining $r$ to be the $GK + G^2 - q$ vector of known elements, we can write¹

$$r = R_\delta' \operatorname{vec}(A) \quad \text{and} \quad \delta = S_\delta' \operatorname{vec}(A). \tag{2}$$

The selection matrices $R_\delta$ and $S_\delta$ consist of zeros and ones and satisfy $R_\delta' S_\delta = 0$ and $R_\delta R_\delta' + S_\delta S_\delta' = I$. Using this fact, the prior restriction $R_\delta' \operatorname{vec}(A) = r$ can be solved as

$$\operatorname{vec}(A) = S_\delta \delta + R_\delta r.$$

Since eq. (1) is equivalent to $[I_G \otimes (Y\ X)]\operatorname{vec}(A) = \operatorname{vec}(U)$, the simultaneous-equations model with linear restrictions can be written in the familiar form of one stacked-vector equation in $GT$ dimensions:

$$y = Z\delta + u,$$

where $u = \operatorname{vec}(U)$ has mean zero and covariance matrix $(\Sigma \otimes I_T)$; the variables $y$ and $Z$ are given by

$$y = [I_G \otimes (Y\ X)]R_\delta r, \qquad Z = -[I_G \otimes (Y\ X)]S_\delta.$$

¹The operator 'vec' transforms a matrix into a vector by stacking successive columns. We shall also use the operator 'vech' which stacks the columns of the lower triangle (including diagonal) of a symmetric matrix, thereby removing the redundant elements. For matrices of appropriate dimensions, $\operatorname{vec}(CDE) = (E' \otimes C)\operatorname{vec}(D)$, where $\otimes$ represents the Kronecker product.

Assuming $\delta$ is identifiable, one estimation strategy is to search for a $TG \times q$ matrix $W$ that is highly correlated with $Z$ and (approximately) uncorrelated with $u$. Then, if the inverse exists, a reasonable estimate would be $(W'Z)^{-1}W'y$. This instrumental-variables (IV) approach will not be explored directly, but instead will emerge from our analysis of maximum-likelihood and minimum-distance estimation. We discuss each in turn.

2.1. The maximum-likelihood approach

If the errors were normally distributed, the log-likelihood function for the simultaneous-equations system would be

$$L(B, \Gamma, \Sigma; Y, X) = -\tfrac{1}{2}TG\log(2\pi) + T\log|\det(B)| - \tfrac{1}{2}T\log\det(\Sigma) - \tfrac{1}{2}\operatorname{tr}\left[\Sigma^{-1}(YB + X\Gamma)'(YB + X\Gamma)\right]. \tag{3}$$

For the remainder of sections 2 and 3 we shall assume that this is the actual distribution of the data. In section 4, we explore the consequences of relaxing this assumption. According to the method of maximum likelihood, estimators are obtained by maximizing (3) over the values of $B$, $\Gamma$, and $\Sigma$ satisfying the prior restrictions. This can be performed by calculating the derivatives of $L$ with respect to the unrestricted parameters and setting these scores to zero. Define $L_\delta = \partial L/\partial\delta$ and $L_\sigma = \partial L/\partial\sigma$, where $\sigma = \operatorname{vech}(\Sigma)$ is the $\tfrac{1}{2}G(G+1)$ vector containing the distinct unknown elements of $\Sigma$. Then, if $H$ is the matrix of zeros and ones such that $\operatorname{vec}(\Sigma) = H\operatorname{vech}(\Sigma)$, the scores for $\delta$ and $\sigma$ are

$$L_\delta = S_\delta' \operatorname{vec}\left[T(B^{-1}\ 0)' - (Y\ X)'U\Sigma^{-1}\right], \tag{4}$$

$$L_\sigma = -\tfrac{1}{2}H'(\Sigma^{-1} \otimes \Sigma^{-1})\operatorname{vec}(T\Sigma - U'U), \tag{5}$$

where $U = YB + X\Gamma$ and $(B^{-1}\ 0)$ is the $G \times (G+K)$ matrix whose last $K$ columns are all zero.
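To fix ideas, here is a minimal numpy sketch of the stacked-system bookkeeping just described. The function names and the boolean-mask construction of the selection matrices are ours, purely for illustration, and all vec operations are column-major to match the convention of footnote 1.

```python
import numpy as np

def selection_matrices(known_mask):
    """Split the coordinates of vec(A) into known (R) and unknown (S) parts.
    known_mask: boolean array, True where the element of vec(A) is known.
    The columns of [R S] are columns of the identity, so R'S = 0 and
    RR' + SS' = I."""
    I = np.eye(known_mask.size)
    return I[:, known_mask], I[:, ~known_mask]

def stacked_system(Y, X, S_delta, R_delta, r):
    """Rewrite YB + X Gamma = U, with vec(A) = S_delta delta + R_delta r,
    as y = Z delta + u, using y = [I_G (x) (Y X)] R_delta r and
    Z = -[I_G (x) (Y X)] S_delta."""
    G = Y.shape[1]
    W = np.hstack([Y, X])           # the data matrix (Y X), T x (G+K)
    big = np.kron(np.eye(G), W)     # I_G (x) (Y X); maps vec(A) to vec((Y X)A)
    return big @ (R_delta @ r), -big @ S_delta
```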

Since $H'(\Sigma^{-1} \otimes \Sigma^{-1})H$ is invertible, the equation $L_\sigma = 0$ has the unique solution $\hat{\Sigma} = U'U/T$. Unfortunately, the equation $L_\delta = 0$ is highly nonlinear and cannot be easily solved. However, an interesting interpretation of this equation has been pointed out by Durbin (1988), Hausman (1975), and Hendry (1976). Inserting the first-order condition $T\Sigma - U'U = 0$ into (4), we obtain the concentrated score² for $\delta$,

$$L_\delta^* = -S_\delta' \operatorname{vec}\left[(X\Pi\ X)'U\Sigma^{-1}\right] = -S_\delta'\left[\Sigma^{-1} \otimes (X\Pi\ X)\right]'u = \bar{Z}'(\Sigma^{-1} \otimes I_T)u, \tag{6}$$

where $\bar{Z} = -[I_G \otimes (X\Pi\ X)]S_\delta$.

²For the log-likelihood $L(\delta, \sigma)$, the concentrated function is defined as $L^*(\delta) = L[\delta, h(\delta)]$, where $L_\sigma[\delta, h(\delta)] = 0$ so that $\partial h'/\partial\delta = -L_{\delta\sigma}L_{\sigma\sigma}^{-1}$. Differentiation of $L^*$ gives $L_\delta^* = L_\delta[\delta, h(\delta)]$ and $L_{\delta\delta}^* = L_{\delta\delta} - L_{\delta\sigma}L_{\sigma\sigma}^{-1}L_{\sigma\delta}$. Thus, (6) is the score of the concentrated likelihood function and its derivative (when inverted) is $L^{\delta\delta}$, the block of the inverse Hessian of $L$ corresponding to $\delta$.

Assuming the appropriate matrices are invertible, the full-information maximum-likelihood (FIML) estimator which satisfies the equation $L_\delta^* = 0$ has the instrumental-variables representation

$$\hat{\delta}_{ML} = [\hat{Z}'(\hat{\Sigma}^{-1} \otimes I_T)Z]^{-1}\hat{Z}'(\hat{\Sigma}^{-1} \otimes I_T)y, \tag{7}$$

where $\hat{Z} = -[I_G \otimes (X\hat{\Pi}\ X)]S_\delta$ and $\hat{\Sigma} = \hat{U}'\hat{U}/T$ are functions of $\hat{\delta}_{ML}$. According to the definition of $\hat{Z}$, the endogenous variables in $Z$ are replaced by their FIML fitted values to form instruments which, when combined using generalized least squares, are interpreted as optimal instruments.

Rothenberg and Leenders (1964) derived the asymptotic covariance matrix of the FIML estimator by taking derivatives of the concentrated-score function (6). Alternatively, it can be obtained from the asymptotic variance of the concentrated score. That is, the asymptotic covariance matrix for $\sqrt{T}(\hat{\delta}_{ML} - \delta)$ is given by the inverse of

$$\operatorname{Avar}(T^{-1/2}L_\delta^*) = S_\delta'\left[\Sigma^{-1} \otimes (\Pi\ I_K)'\operatorname{plim}(T^{-1}X'X)(\Pi\ I_K)\right]S_\delta. \tag{8}$$

Eq. (7) is only an implicit function for the FIML estimator, but other asymptotically equivalent estimators take the same form and are easier to compute. For example, the three-stage least-squares (3SLS) estimator proposed by Zellner and Theil (1962) and the full-information instrumental-variables estimator (FIVE) proposed by Brundy and Jorgenson (1971) can both be written as

$$[\tilde{Z}'(\tilde{\Sigma}^{-1} \otimes I_T)Z]^{-1}\tilde{Z}'(\tilde{\Sigma}^{-1} \otimes I_T)y,$$

where

$$\tilde{Z} = -[I_G \otimes (X\tilde{\Pi}\ X)]S_\delta, \qquad \tilde{\Sigma} = \tilde{U}'\tilde{U}/T,$$

and $\tilde{\Pi}$ and $\tilde{U}$ are calculated from consistent but inefficient estimators of $B$ and $\Gamma$. They can be interpreted as linearized maximum-likelihood estimators (LMLE) since they are of the form

$$\hat{\delta}_L = \tilde{\delta} + T^{-1}\tilde{\Phi}^{-1}\tilde{L}_\delta^*,$$

where $\tilde{\delta}$ is some initial estimator, $T^{-1}\tilde{\Phi}^{-1}$ is some estimate of the asymptotic variance of the MLE, and $\tilde{L}_\delta^*$ is the concentrated score for $\delta$ evaluated at $\tilde{\delta}$.³ The asymptotic equivalence of these estimators with FIML can be demonstrated if appropriate regularity conditions are satisfied.

³Although the method was introduced by Rothenberg and Leenders (1964), the LMLE interpretation of 3SLS and FIVE was made by Hendry (1976).

2.2. The minimum-distance approach

Rather than work with the likelihood function, a distance function in sufficient statistics provides an alternative route to asymptotically efficient estimators. Let $M = I - X(X'X)^{-1}X'$ and $n = GK + \tfrac{1}{2}G(G+1)$. Then, under normality, the distinct elements of the matrices $(X'X)^{-1}X'Y$ and $Y'MY/(T-K)$ are sufficient for the unknown elements of $(A, \Sigma)$. These statistics can be written as an $n$-dimensional vector $s$ with mean $\mu$:

$$\mu(A, \Sigma) = \begin{pmatrix} -\operatorname{vec}(\Gamma B^{-1}) \\ \operatorname{vech}(B'^{-1}\Sigma B^{-1}) \end{pmatrix}.$$

The vector $\sqrt{T}(s - \mu)$ has mean zero and variance matrix $V_s$ given by

$$V_s = \begin{pmatrix} B'^{-1}\Sigma B^{-1} \otimes T(X'X)^{-1} & 0 \\ 0 & 2J'(B'^{-1}\Sigma B^{-1} \otimes B'^{-1}\Sigma B^{-1})J \end{pmatrix},$$

where $J = H(H'H)^{-1}$ has the property that $\operatorname{vech}(\Sigma) = J'\operatorname{vec}(\Sigma)$. The minimum-distance approach finds parameter values that make the estimated mean $\mu(A, \Sigma)$ as close as possible to the observed $s$. If $\hat{V}_s$ is some consistent estimate of $V_s$, it is natural to consider estimates that minimize

$$T[s - \mu(A, \Sigma)]'\hat{V}_s^{-1}[s - \mu(A, \Sigma)],$$

subject to $R_\delta'\operatorname{vec}(A) = r$. Under regularity conditions, this minimum-distance (MD) procedure produces consistent and asymptotically efficient estimates [cf. Malinvaud (1970, pp. 348-360) and Rothenberg (1973, pp. 24-25)]. As in maximum likelihood, the actual solution can be rather difficult because $\mu$ is a complicated nonlinear function.

A modification of the minimum-distance approach is more tractable. Let $g(s, A, \Sigma)$ be a vector of $n$ smooth functions. Suppose that, for all parameter values $(A^*, \Sigma^*)$ in an open neighborhood of the true value, $g(s, A^*, \Sigma^*)$ is a locally one-to-one function of $s$. Suppose further that, as the sample size $T$ tends to infinity, $\operatorname{plim} g(s, A, \Sigma) = 0$ and $\sqrt{T}g(s, A, \Sigma)$ is asymptotically normal with zero mean and nonsingular covariance matrix $V_g(A, \Sigma)$. For some consistent estimate $\hat{V}_g$, minimize

$$T[g(s, A, \Sigma)]'\hat{V}_g^{-1}[g(s, A, \Sigma)],$$

subject to the constraints. Under appropriate regularity conditions, the solution to this problem is also consistent and asymptotically efficient. By cleverly choosing the function $g$, we can construct a computationally convenient estimate of the structural parameters. The natural choice for $g$ turns out to be

$$g = \begin{pmatrix} g_1 \\ g_2 \end{pmatrix} = \begin{pmatrix} T^{-1}\operatorname{vec}(X'YB + X'X\Gamma) \\[4pt] \operatorname{vech}\left(\dfrac{B'Y'MYB}{T-K} - \Sigma\right) \end{pmatrix} = \begin{pmatrix} T^{-1}(B' \otimes X'X) & 0 \\ 0 & J'(B \otimes B)'H \end{pmatrix}(s - \mu),$$

with

$$V_g = \begin{pmatrix} \Sigma \otimes \operatorname{plim}(T^{-1}X'X) & 0 \\ 0 & 2J'(\Sigma \otimes \Sigma)J \end{pmatrix}.$$

In this case, the original problem has been simplified by 'cancelling' the common terms involving $B^{-1}$ that appear in both $\mu$ and $V_s$. For some consistent estimator $\tilde{\Sigma}$, the resulting modified distance function is⁴

$$T^2 g_1'(\tilde{\Sigma} \otimes X'X)^{-1}g_1 + \tfrac{1}{2}(T-K)g_2'H'(\tilde{\Sigma} \otimes \tilde{\Sigma})^{-1}Hg_2. \tag{9}$$

When there are no restrictions on $\Sigma$, the minimization of (9) is very simple. For any value of $A$, $g_2$ can be set to zero by varying $\Sigma$. Hence, minimizing (9) over $A$ and $\Sigma$ is equivalent to minimizing

$$[\operatorname{vec}(X'U)]'(\tilde{\Sigma} \otimes X'X)^{-1}\operatorname{vec}(X'U)$$

over $\delta$ alone.⁵ Denoting the idempotent matrix $X(X'X)^{-1}X'$ by $N_x$, the solution is the three-stage least-squares estimator

$$\hat{\delta}_{3SLS} = [Z'(\tilde{\Sigma}^{-1} \otimes N_x)Z]^{-1}Z'(\tilde{\Sigma}^{-1} \otimes N_x)y,$$

which is a LMLE using the preliminary estimates $\tilde{\Sigma}$ and $\tilde{\Pi} = (X'X)^{-1}X'Y$.

3. Partially restricted covariance matrix

Suppose $p$ of the elements of $\operatorname{vech}(\Sigma)$ are known to be zero. Then, as in (2), this information can be expressed in terms of selection matrices:

$$R_\sigma'\operatorname{vec}(\Sigma) = 0, \qquad S_\sigma'\operatorname{vec}(\Sigma) = \sigma, \tag{10}$$

where $\sigma$ is the vector containing the $\tfrac{1}{2}G(G+1) - p$ unknown elements.⁶ Again, an explicit solution is given by $\operatorname{vec}(\Sigma) = S_\sigma\sigma$. The total prior information on the structural parameters (which we assume to be sufficient for identifiability) can be expressed either by the pair of constraint equations

$$R_\delta'\operatorname{vec}(A) = r, \qquad R_\sigma'\operatorname{vec}(\Sigma) = 0,$$

or, equivalently, by the pair of equations relating the constrained parameters to the free parameters $\delta$ and $\sigma$:

$$\operatorname{vec}(A) = S_\delta\delta + R_\delta r, \qquad \operatorname{vec}(\Sigma) = S_\sigma\sigma.$$

⁴We use the fact that the inverse of $J'(\Sigma \otimes \Sigma)J$ is $H'(\Sigma \otimes \Sigma)^{-1}H$. Cf. Richard (1975) and Rothenberg (1973, pp. 87-88). Hausman (1983, p. 421) questions this way of representing the precision of a variance matrix, apparently because he does not use our definition of $J$.

⁵In general, let $Q = x_1'V^{11}x_1 + x_2'V^{22}x_2 + 2x_1'V^{12}x_2$, where the $V^{ij}$ are appropriate blocks of the inverse variance matrix. If $x_2$ is freely variable, the minimizing value of $x_2$ is $-(V^{22})^{-1}V^{21}x_1$, and hence the minimum of $Q$ is obtained by minimizing $x_1'[V^{11} - V^{12}(V^{22})^{-1}V^{21}]x_1 = x_1'V_{11}^{-1}x_1$.

⁶$R_\sigma$ and $S_\sigma$ are assumed to lie in the column space of $H$ so that $\sigma_{ij}$ and $\sigma_{ji}$ are treated symmetrically in (10).

Our problem now is to maximize the likelihood function (3) or to minimize the modified distance function (9) subject to these restrictions.

3.1. Maximum likelihood

Without covariance restrictions, concentrating the log-likelihood function with respect to $\Sigma$ yields the IV interpretation of FIML. It is also a route to simplifying the derivation of linearized approximations to the maximum-likelihood estimator. In particular, the asymptotic covariance matrix for $\hat{\delta}_{ML}$, a key ingredient in developing such an approximation, can be calculated from the score of the concentrated log-likelihood function, rather than from a partitioned inverse of the full information matrix of the likelihood function. In the presence of covariance restrictions, concentration of the log-likelihood is much less convenient since an explicit closed-form solution of the normal equations for $\sigma$ does not appear to be possible. However, such solutions are not necessary to take advantage of the LML method or to obtain an IV interpretation of the FIML estimates. The score for $\delta$ given in eq. (4) remains valid in the presence of covariance restrictions, but eq. (5) must be replaced by the score for the free parameters $\sigma$:

$$L_\sigma = -\tfrac{1}{2}TS_\sigma'(\Sigma^{-1} \otimes \Sigma^{-1})\operatorname{vec}(\Sigma - U'U/T). \tag{11}$$

Although equating $L_\sigma$ to zero does not yield a simple solution for $\Sigma$ in terms of $U$ when $S_\sigma$ has rank less than $\tfrac{1}{2}G(G+1)$, an implicit relationship can still be obtained. Substituting $\operatorname{vec}(\Sigma) = S_\sigma\sigma$ into (11), we find the 'solution'

$$\sigma = [S_\sigma'(\Sigma^{-1} \otimes \Sigma^{-1})S_\sigma]^{-1}S_\sigma'(\Sigma^{-1} \otimes \Sigma^{-1})\operatorname{vec}(U'U/T),$$

and hence⁷

$$\operatorname{vec}(T\Sigma - U'U) = \left(S_\sigma[S_\sigma'(\Sigma^{-1} \otimes \Sigma^{-1})S_\sigma]^{-1}S_\sigma'(\Sigma^{-1} \otimes \Sigma^{-1}) - I_{G^2}\right)H\operatorname{vech}(U'U)$$
$$= -(\Sigma \otimes \Sigma)R_\sigma[R_\sigma'(\Sigma \otimes \Sigma)R_\sigma]^{-1}R_\sigma'\operatorname{vec}(U'U).$$

⁷The final expression follows, after a rotation of the coordinate system, from the identity $H(H'H)^{-1}H' = R_\sigma(R_\sigma'R_\sigma)^{-1}R_\sigma' + S_\sigma(S_\sigma'S_\sigma)^{-1}S_\sigma'$, which describes the fact that the column space of $H$ is the direct sum of the column space of $R_\sigma$ and the column space of $S_\sigma$.
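The 'solution' above suggests a natural fixed-point iteration for the restricted covariance matrix given a residual matrix. The following numpy sketch is our own construction (the diagonal starting value and fixed iteration count are arbitrary choices), and it assumes $S_\sigma$ respects the symmetry convention of footnote 6.

```python
import numpy as np

def restricted_sigma(U, S_sigma, n_iter=50):
    """Iterate sigma = [S'(Sig^{-1} (x) Sig^{-1})S]^{-1} S'(Sig^{-1} (x) Sig^{-1}) vec(U'U/T)
    to a fixed point. U: T x G residuals; S_sigma: selection matrix for the
    free elements of vec(Sigma), assumed to lie in the column space of H."""
    T, G = U.shape
    W = (U.T @ U) / T                      # unrestricted MLE of Sigma
    Sigma = np.diag(np.diag(W))            # crude restricted starting value
    for _ in range(n_iter):
        P = np.kron(np.linalg.inv(Sigma), np.linalg.inv(Sigma))
        sigma = np.linalg.solve(S_sigma.T @ P @ S_sigma,
                                S_sigma.T @ P @ W.reshape(-1, order='F'))
        Sigma = (S_sigma @ sigma).reshape(G, G, order='F')
    return Sigma
```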

Substituting this expression into eq. (4) gives, as a generalization of (6), the concentrated-score function

$$L_\delta^* = -S_\delta'\left\{[\Sigma^{-1} \otimes (X\Pi\ X)]' + [I_G \otimes (\Sigma B^{-1}\ 0)]'\Xi(I_G \otimes U')\right\}u = (\bar{Z} + \bar{W})'(\Sigma^{-1} \otimes I_T)u, \tag{12}$$

where

$$\Xi = R_\sigma[R_\sigma'(\Sigma \otimes \Sigma)R_\sigma]^{-1}R_\sigma',$$
$$\bar{Z} = -[I_G \otimes (X\Pi\ X)]S_\delta,$$
$$\bar{W} = -(\Sigma \otimes U)\Xi[I_G \otimes (\Sigma B^{-1}\ 0)]S_\delta.$$

As noted by Hausman, Newey, and Taylor (1987), the generalized IV form of FIML in (7) changes only in the instruments for the endogenous explanatory variables. With covariance restrictions, we have

$$\hat{\delta}_{ML} = [(\hat{Z} + \hat{W})'(\hat{\Sigma}^{-1} \otimes I_T)Z]^{-1}(\hat{Z} + \hat{W})'(\hat{\Sigma}^{-1} \otimes I_T)y,$$

where $\hat{Z}$, $\hat{W}$, and $\hat{\Sigma}$ are the FIML estimates of $\bar{Z}$, $\bar{W}$, and $\Sigma$. The matrix $\bar{Z} + \bar{W}$ has a simple interpretation as an optimal instrument for $Z$. Using the transformation matrix $\Lambda = (\Sigma^{-1/2} \otimes I_T)$, where $\Sigma^{1/2}$ is the symmetric square root of $\Sigma$, our model can be written as $\Lambda y = \Lambda Z\delta + \Lambda u$, where $\Lambda u$ has a scalar covariance matrix. The transformed matrix of predetermined variables provides valid instruments since $\mathrm{E}\{[\Lambda(I_G \otimes X)]'\Lambda u\} = 0$. In addition, the covariance restrictions imply that $\mathrm{E}\{[\Lambda(\Sigma \otimes U)R_\sigma]'\Lambda u\} = 0$. The population projection of $\Lambda Z$ onto the space spanned by the $GK + p$ columns of $[\Lambda(I_G \otimes X)\ \ \Lambda(\Sigma \otimes U)R_\sigma]$ yields $\Lambda(\bar{Z} + \bar{W})$ as the linear combination most highly correlated with $\Lambda Z$. In the presence of covariance restrictions, structural disturbances that are uncorrelated with the errors augment the reduced-form regression function for the endogenous variables, thereby increasing the amount of variation in the instruments.

The asymptotic covariance matrix for the maximum-likelihood estimator is, under suitable regularity assumptions, given by the inverse of the limiting Hessian of the concentrated likelihood function. Again, an easier route is via the asymptotic variance of the concentrated score. From (12), $L_\delta^*$ is the sum of two uncorrelated terms:

$$L_{\delta 1}^* = -S_\delta'[\Sigma^{-1} \otimes (X\Pi\ X)]'u = \bar{Z}'(\Sigma^{-1} \otimes I_T)u,$$
$$L_{\delta 2}^* = -S_\delta'[I_G \otimes (\Sigma B^{-1}\ 0)]'\Xi\operatorname{vec}(U'U) = \bar{W}'(\Sigma^{-1} \otimes I_T)u.$$

Using the fact that $\operatorname{var}[\operatorname{vech}(U'U)]$ is $2TJ'(\Sigma \otimes \Sigma)J$ under normality, the asymptotic variance for $\sqrt{T}(\hat{\delta}_{ML} - \delta)$ is given by the inverse of

$$S_\delta'\left\{\left[\Sigma^{-1} \otimes (\Pi\ I_K)'\operatorname{plim}(T^{-1}X'X)(\Pi\ I_K)\right] + 2[I_G \otimes (\Sigma B^{-1}\ 0)]'\Xi[I_G \otimes (\Sigma B^{-1}\ 0)]\right\}S_\delta = \Phi_1 + 2\Phi_2. \tag{13}$$

Note that the instruments $\bar{Z}$ and $\bar{W}$ contribute differently to the variance. Although the rows of $\Lambda\bar{W}$ are uncorrelated with the corresponding rows of $\Lambda u$, $\bar{W}$ and $u$ are not independent. Hence, the variance of $\bar{W}'(\Sigma^{-1} \otimes I_T)u$ depends on the fourth moments of the error distribution. Under normality, this variance is exactly twice the term that would occur if $\bar{W}$ were exogenous.
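For concreteness, here is a hedged numpy sketch of the two pieces of (13), evaluated at given parameter values with $\operatorname{plim}(T^{-1}X'X)$ replaced by its sample analogue; the function name and argument layout are ours. The inverse of `Phi1 + 2*Phi2` then estimates the asymptotic variance of $\sqrt{T}(\hat{\delta}_{ML} - \delta)$.

```python
import numpy as np

def phi_matrices(B, Gamma, Sigma, X, S_delta, R_sigma):
    """Phi_1 = S'[Sig^{-1} (x) (Pi I_K)'(X'X/T)(Pi I_K)]S and
    Phi_2 = S'[I_G (x) (Sig B^{-1} 0)]' Xi [I_G (x) (Sig B^{-1} 0)]S,
    with Xi = R[R'(Sig (x) Sig)R]^{-1}R'."""
    G, K, T = B.shape[0], Gamma.shape[0], X.shape[0]
    Pi = -Gamma @ np.linalg.inv(B)                       # reduced form, K x G
    PiI = np.hstack([Pi, np.eye(K)])                     # (Pi  I_K), K x (G+K)
    Mxx = X.T @ X / T
    Phi1 = S_delta.T @ np.kron(np.linalg.inv(Sigma), PiI.T @ Mxx @ PiI) @ S_delta
    SB0 = np.hstack([Sigma @ np.linalg.inv(B), np.zeros((G, K))])  # (Sig B^{-1} 0)
    C = np.kron(np.eye(G), SB0) @ S_delta                # [I_G (x) (Sig B^{-1} 0)]S
    SS = np.kron(Sigma, Sigma)
    Xi = R_sigma @ np.linalg.solve(R_sigma.T @ SS @ R_sigma, R_sigma.T)
    Phi2 = C.T @ Xi @ C
    return Phi1, Phi2
```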

3.2. Feasible IV estimation

When there are no covariance constraints, linearized maximum likelihood simply involves replacing in (7) the FIML values $\hat{\Pi}$ and $\hat{\Sigma}$ by ones based on preliminary consistent (but inefficient) estimates. By analogy, one is tempted to form the estimator

$$[(\tilde{Z} + \tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)Z]^{-1}(\tilde{Z} + \tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)y,$$

where $\tilde{Z}$, $\tilde{W}$, and $\tilde{\Sigma}$ are estimates of $\bar{Z}$, $\bar{W}$, and $\Sigma$ based on preliminary consistent estimates of $\delta$. This, however, will not lead to an efficient estimator in the present situation because the dependence between $\bar{W}$ and $u$ is ignored. A linearized maximum-likelihood estimator for $\delta$ has the form

$$\hat{\delta}_L = \tilde{\delta} + T^{-1}(\tilde{\Phi}_1 + 2\tilde{\Phi}_2)^{-1}\tilde{L}_\delta^*,$$

where $\tilde{\delta}$ is some initial estimator, $T^{-1}(\tilde{\Phi}_1 + 2\tilde{\Phi}_2)^{-1}$ is some estimate of the variance of the MLE, and $\tilde{L}_\delta^*$ is the (concentrated) score for $\delta$ evaluated at $\tilde{\delta}$. Although $(\tilde{Z} + \tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)u$ is the correct score, $(\tilde{Z} + \tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)Z/T$ converges in probability to $\Phi_1 + \Phi_2$ instead of the required $\Phi_1 + 2\Phi_2$. An appropriate LMLE is

$$\hat{\delta} = \tilde{\delta} + [(\tilde{Z} + 2\tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)Z]^{-1}(\tilde{Z} + \tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)(y - Z\tilde{\delta})$$
$$= [(\tilde{Z} + 2\tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)Z]^{-1}\left[(\tilde{Z} + 2\tilde{W})'(\tilde{\Sigma}^{-1} \otimes I_T)y - \tilde{W}'(\tilde{\Sigma}^{-1} \otimes I_T)\tilde{u}\right], \tag{14}$$

where $\tilde{u} = y - Z\tilde{\delta}$.
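A minimal numpy sketch of the one-step update (14); all tilde inputs come from a preliminary consistent estimate, and the function name is ours.

```python
import numpy as np

def lmle_step(y, Z, Z_tilde, W_tilde, Sigma_tilde, delta_tilde):
    """delta_hat = delta_tilde + [(Zt + 2Wt)'(Sig^{-1} (x) I_T) Z]^{-1}
                                 (Zt + Wt)'(Sig^{-1} (x) I_T)(y - Z delta_tilde)."""
    T = y.size // Sigma_tilde.shape[0]
    A = np.kron(np.linalg.inv(Sigma_tilde), np.eye(T))   # (Sig^{-1} (x) I_T)
    lhs = (Z_tilde + 2 * W_tilde).T @ A @ Z
    rhs = (Z_tilde + W_tilde).T @ A @ (y - Z @ delta_tilde)
    return delta_tilde + np.linalg.solve(lhs, rhs)
```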

3.3. Minimum distance

When there are covariance restrictions, it is no longer true that the minimum-distance estimator is 3SLS. The second term in (9) must now be taken into account. Since $R_\sigma$ and $S_\sigma$ are selection matrices, the elements of the vector $g$ can be grouped into three subvectors:

$$g_1 = \operatorname{vec}\frac{X'U}{T}, \qquad g_{\sigma 1} = R_\sigma'\operatorname{vec}\frac{U'MU}{T-K}, \qquad g_{\sigma 2} = S_\sigma'\operatorname{vec}\frac{U'MU}{T-K} - \sigma.$$

Note that $g_1$ and $g_{\sigma 1}$ depend only on $\delta$ and that, for any value taken by $\delta$, $g_{\sigma 2}$ can be set arbitrarily by varying $\sigma$. Hence, by the argument given in footnote 5, $g_{\sigma 2}$ can be concentrated out of (9). Then, with

$$\tilde{V} = \begin{pmatrix} \tilde{\Sigma} \otimes (X'X/T) & 0 \\ 0 & 2R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma \end{pmatrix}, \tag{15}$$

the appropriate distance function is

$$Q(\delta) = [\operatorname{vec}(X'U)]'[\tilde{\Sigma} \otimes X'X]^{-1}\operatorname{vec}(X'U) + \frac{1}{2(T-K)}[R_\sigma'\operatorname{vec}(U'MU)]'[R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma]^{-1}R_\sigma'\operatorname{vec}(U'MU). \tag{16}$$

In the presence of covariance restrictions, the problem involves minimizing a quartic function in $\delta$. Given the additive form of the objective function, the solution should not be difficult to calculate.
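Here is a hedged numpy sketch of the quartic objective (16) as a function of $\delta$, which could be handed to any numerical optimizer (e.g., scipy.optimize.minimize); `md_objective` and its argument layout are our own.

```python
import numpy as np

def md_objective(delta, Y, X, S_delta, R_delta, r, R_sigma, Sigma_tilde):
    """Evaluate Q(delta) from (16). U(delta) = YB + X Gamma are the structural
    residuals implied by delta; M annihilates the columns of X."""
    T, G = Y.shape
    K = X.shape[1]
    A = (S_delta @ delta + R_delta @ r).reshape(G + K, G, order='F')
    U = np.hstack([Y, X]) @ A                              # U = YB + X Gamma
    XU = (X.T @ U).reshape(-1, order='F')                  # vec(X'U)
    term1 = XU @ np.linalg.solve(np.kron(Sigma_tilde, X.T @ X), XU)
    M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
    m = R_sigma.T @ (U.T @ M @ U).reshape(-1, order='F')   # R' vec(U'MU)
    B = R_sigma.T @ np.kron(Sigma_tilde, Sigma_tilde) @ R_sigma
    term2 = m @ np.linalg.solve(B, m) / (2 * (T - K))
    return term1 + term2
```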

The asymptotic variance of the minimum-distance estimator is given by twice the inverse of the average Hessian of the distance function. That is, it is the inverse of

$$\operatorname{plim}\frac{1}{T}Z'(\Sigma^{-1} \otimes N_x)Z + 2\operatorname{plim}\frac{1}{T^2}Z'(I_G \otimes U)\Xi(I_G \otimes U)'Z, \tag{17}$$

which is the same expression as obtained in (13) for the FIML estimator since

$$\operatorname{plim}\frac{1}{T}(I_G \otimes U)'Z = -[I_G \otimes (\Sigma B^{-1}\ 0)]S_\delta.$$

Just as with maximum likelihood, there exists an efficient linearized minimum-distance estimator that is easier to calculate. Let $\tilde{U} = (Y\ X)\tilde{A}$ be the residuals based on some consistent estimator $\tilde{A}$. Then, using a first-order approximation to the quadratic function, we have

$$R_\sigma'\operatorname{vec}(U'MU) \cong R_\sigma'\operatorname{vec}(\tilde{U}'MU + U'M\tilde{U} - \tilde{U}'M\tilde{U}) = R_\sigma'(I_G \otimes M\tilde{U})'(2u - \tilde{u}),$$

which can be viewed as a linearization of $g_{\sigma 1}$ around the value $A = \tilde{A}$. With $R_\sigma'\operatorname{vec}(U'MU)/\sqrt{T-K}$ replaced by the asymptotically equivalent $R_\sigma'\operatorname{vec}(U'U)/\sqrt{T}$, we have the simple quadratic distance function

$$[\operatorname{vec}(X'U)]'[\tilde{\Sigma} \otimes X'X]^{-1}\operatorname{vec}(X'U) + \frac{1}{2T}(2u - \tilde{u})'(I_G \otimes \tilde{U})R_\sigma[R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma]^{-1}R_\sigma'(I_G \otimes \tilde{U})'(2u - \tilde{u}). \tag{18}$$

The resulting linearized minimum-distance estimator is identical to the linearized maximum-likelihood estimator (14) if $\tilde{Z} = [I_G \otimes X(X'X)^{-1}X']Z$ and $\tilde{W} = (1/T)(\tilde{\Sigma} \otimes \tilde{U})\tilde{\Xi}(I_G \otimes \tilde{U})'Z$ are used to form the latter.

The linearized minimum-distance estimator is also equal to a version of the augmented three-stage least-squares (A3SLS) estimator proposed by Hausman, Newey, and Taylor (1987). The demonstration, however, requires some additional notation. Let $u_t$ be the $t$th column of $U'$ and define $e_t = R_\sigma'\operatorname{vec}(u_tu_t')$. Form the $p \times T$ matrix $E' = [e_1, \ldots, e_T]$. Then, denoting the $T$-dimensional vector of ones by $\iota$ and setting $e = \operatorname{vec}(E)$, we have $R_\sigma'\operatorname{vec}(U'U) = E'\iota = (I_p \otimes \iota)'e$. Hence, if $R_\sigma'\operatorname{vec}(U'MU)/\sqrt{T-K}$ is replaced by $R_\sigma'\operatorname{vec}(U'U)/\sqrt{T}$, eq. (16) is equal to

$$[\operatorname{vec}(X'U)]'[\tilde{\Sigma} \otimes X'X]^{-1}\operatorname{vec}(X'U) + \tfrac{1}{2}e'\left([R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma]^{-1} \otimes N_\iota\right)e,$$

where $N_\iota = \iota(\iota'\iota)^{-1}\iota'$. Corresponding to the linearization of $R_\sigma'\operatorname{vec}(U'U)$ around $R_\sigma'\operatorname{vec}(\tilde{U}'\tilde{U})$, there is an equivalent linearization of $e$ around $\tilde{e}$. Defining $Z_e = -\partial e/\partial\delta'$, $e$ is approximated by $\tilde{e} - \tilde{Z}_e(\delta - \tilde{\delta}) \equiv y_e - \tilde{Z}_e\delta$, where $y_e = \tilde{e} + \tilde{Z}_e\tilde{\delta}$. In this notation, (18) is identical to

$$[\operatorname{vec}(X'U)]'[\tilde{\Sigma} \otimes X'X]^{-1}\operatorname{vec}(X'U) + \tfrac{1}{2}(y_e - \tilde{Z}_e\delta)'\left([R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma]^{-1} \otimes N_\iota\right)(y_e - \tilde{Z}_e\delta), \tag{19}$$

which is the A3SLS objective function when normality is imposed on the error-covariance matrix.

4. Estimation when the errors are not normal

The quasi-maximum-likelihood estimator that maximizes (3) and the minimum-distance estimator based on (16) are still consistent but are no longer asymptotically efficient if the errors are not normal. Furthermore, the expressions for the asymptotic variance in (13) and (17) may be wrong. Some adjustments to our statistics can get around these problems. It will be more convenient to work within the minimum-distance framework, but analogous results are available using the likelihood approach.

Two potential problems occur in our analysis in section 3.3 when the errors are not normal. First, $s$ is no longer a vector of sufficient statistics for the unknown parameters. Optimal estimates need to make use of additional sample data. Unfortunately, unless we make further parametric assumptions (or switch to semiparametric methods), it is not clear what additional sample data are relevant. Even if we restrict our attention to estimators that are functions only of $s$, there is a second problem. The minimum-distance estimates behave asymptotically like linear functions of $s$, where the weights depend on the elements of the variance matrix $\tilde{V}$ in eq. (15). Since we have used the normality assumption to form $\tilde{V}$, the estimators derived here may be inefficient because the wrong weights are employed.

Note that $T^{-1}(\tilde{\Sigma} \otimes X'X)$ is a consistent estimate of $V_{11}$ for any error distribution as long as $\tilde{\Sigma}$ is consistent for $\Sigma$. Hence, if $V_{12}$ is zero (because, for example, the errors are assumed to have a symmetric probability distribution), the distance function (16) is correct except for the use of the matrix $2R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma$ as an estimate of $V_{22}$. Assuming fourth moments of the error distribution exist, this matrix can be replaced in eqs. (16)-(19) by a more robust estimate, say $\tilde{E}'\tilde{E}/T$. No other changes need be made to obtain the optimal minimum-distance estimator based on $s$ and the correct asymptotic variance.

If asymmetry of the error distribution is suspected, matters are considerably more complicated. As pointed out by Hausman, Newey, and Taylor (1987),

when third moments are not a priori zero, the fact that the elements of $X$ are uncorrelated with certain products of the errors is new information that can be exploited. Using the matrix $E$ defined above, it can be expressed by the fact that both $X'E$ and $X'U$ have mean zero. If $\iota$ is a column of $X$, the elements of $g_1$ and $g_{\sigma 1}$ are a subset of the elements of $(1/T)\operatorname{vec}[X'(U\ E)]$. It is natural then to consider a minimum-distance estimator based on the expanded set of moment restrictions. The $T(G+p)$ vector $\varepsilon = \operatorname{vec}(U\ E)$ has mean zero and variance matrix

$$\operatorname{Var}[\varepsilon] = \Omega \otimes I_T, \qquad \text{for} \qquad \Omega = \operatorname{Var}\begin{pmatrix} u_t \\ e_t \end{pmatrix} = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.$$

Of course, $\Omega_{11}$ is just $\Sigma$ and $\Omega_{22}$ is $V_{22}$; preliminary estimates for them have been discussed already. The matrix $\Omega_{12}$ can be estimated by $\tilde{U}'\tilde{E}/T$. Since $\operatorname{vec}[X'(U\ E)] = (I \otimes X)'\varepsilon$ has variance matrix $(\Omega \otimes X'X)$, the appropriate distance function is

$$\{\operatorname{vec}[X'(U\ E)]\}'[\tilde{\Omega} \otimes X'X]^{-1}\operatorname{vec}[X'(U\ E)]. \tag{20}$$

If $e$ is linearized as in (19), the resulting minimum-distance estimator is the A3SLS estimator.⁸ The asymptotic covariance matrix for the MD estimator based on $X'(U\ E)$ is $[\operatorname{plim}(1/T)Z_a'(\tilde{\Omega}^{-1} \otimes N_x)Z_a]^{-1}$, where $Z_a = -\partial\varepsilon/\partial\delta'$. Using the same notation, the asymptotic covariance matrix for the best MD estimator based on the subset $(X'U\ \ \iota'E)$ can be written as the inverse of the same matrix minus a correction term.

Since the term being subtracted is positive semidefinite (and nonzero if $\Omega_{12} \neq 0$), the estimator using third moments of the data in addition to the second moments is strictly more efficient whenever $u_t$ is correlated with $e_t$.

In practice, third and fourth sample moments of regression residuals are often unreliable estimates of the corresponding population moments. Robust estimates of $\delta$ using these sample moments will probably not be an improvement over estimates based on normality unless the sample size is quite large. Furthermore, covariance restrictions often arise from the belief that certain error terms are independent of other error terms. Under independence, however, cross third moments are zero and the efficiency gain from exploiting $\mathrm{E}(X'E) = 0$ disappears. Hence, it is an open question whether the estimators using third and fourth moments will be useful in practice.

⁸Cf. Hausman, Newey, and Taylor (1987, eq. (4.5a)).
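To illustrate the augmented-moment machinery, here is a hedged numpy sketch that builds the matrix $E$, the robust replacement $\tilde{E}'\tilde{E}/T$ for $2R_\sigma'(\tilde{\Sigma} \otimes \tilde{\Sigma})R_\sigma$, and the blocks of $\Omega$; the function name and return layout are ours.

```python
import numpy as np

def augmented_moments(U, R_sigma):
    """U: T x G residual matrix; R_sigma: selection matrix for the restricted
    (zero) covariances. Returns E (T x p) with rows e_t = R' vec(u_t u_t'),
    and the estimated blocks of Omega."""
    T, G = U.shape
    # e_t = R' vec(u_t u_t'): rank-one outer products, vectorized column-major
    E = np.array([R_sigma.T @ np.outer(u, u).reshape(-1, order='F') for u in U])
    Omega11 = U.T @ U / T      # estimate of Sigma
    Omega12 = U.T @ E / T      # cross third moments of the errors
    Omega22 = E.T @ E / T      # robust estimate replacing 2R'(Sig (x) Sig)R
    return E, Omega11, Omega12, Omega22
```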

5. Conclusion

The full-information maximum-likelihood and minimum-distance estimators for a simultaneous-equations model with covariance restrictions are rather complicated. But linearization yields estimators that are easy to compute and have an instrumental-variables interpretation. All of these estimators are functions of the second-order moment matrices of the data and have a simple method-of-moments interpretation: the $p$ covariance restrictions are added to the usual $GK$ exogeneity conditions to form mean functions for approximate minimum-distance estimation. Assuming a symmetric error distribution and a sufficiently large sample size, the estimators can easily be modified to make them robust to nonnormality. When symmetry is dropped, third-order moments of the data may be used to construct improved estimators. It appears that $K(G+p)$ moment conditions are needed when cross third moments of the errors are nonzero.

References

Brundy, J.M. and D.W. Jorgenson, 1971, Efficient estimation of simultaneous equations by instrumental variables, Review of Economics and Statistics 53, 207-224.
Durbin, James, 1988, Maximum likelihood estimation of the parameters of a system of simultaneous regression equations, Econometric Theory 4, 159-170.
Hausman, J.A., 1975, An instrumental variable approach to full information estimators for linear and certain non-linear econometric models, Econometrica 43, 727-738.
Hausman, J.A., 1983, Specification and estimation of simultaneous equation models, Ch. 7 in: Z. Griliches and M. Intriligator, eds., Handbook of econometrics, Vol. I (North-Holland, Amsterdam).
Hausman, J.A., W.K. Newey, and W. Taylor, 1987, Efficient estimation and identification of simultaneous equation models with covariance restrictions, Econometrica 55, 849-874.
Hendry, D.F., 1976, The structure of simultaneous equations estimators, Journal of Econometrics 4, 51-88.
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Richard, J.-F., 1975, A note on the information matrix of the multivariate normal distribution, Journal of Econometrics 3, 57-60.
Rothenberg, T.J. and C.T. Leenders, 1964, Efficient estimation of simultaneous equation systems, Econometrica 32, 57-76.
Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press, London).
Zellner, A. and H. Theil, 1962, Three-stage least squares: Simultaneous estimation of simultaneous equations, Econometrica 30, 54-78.