Journal of Econometrics 44 (1990) 25-39. North-Holland

SIMULTANEOUS EQUATIONS WITH COVARIANCE RESTRICTIONS

Thomas J. ROTHENBERG and Paul A. RUUD

University of California, Berkeley
Full-information maximum-likelihood and minimum-distance estimators are derived for the simultaneous-equations model with covariance restrictions. Linearization yields estimators that are easy to compute and have an instrumental-variables interpretation. The efficient three-stage least-squares estimator in the presence of covariance restrictions is shown to be a special case. Modified estimators which take into account possible nonnormality in the errors are also discussed.
1. Introduction

We reconsider the linear simultaneous-equations system with covariance restrictions as described by Hausman, Newey, and Taylor (1987). Using both maximum-likelihood and minimum-distance approaches, we explore several estimation strategies. The augmented three-stage least-squares estimator proposed by these authors is derived as a special case. First, we review the efficient estimation of the linear simultaneous-equations model without covariance restrictions. This provides the outline of the approach that we will use in the presence of covariance restrictions, and it also serves as a useful contrast. Second, we provide an alternative derivation of the asymptotic covariance matrix of a subset of parameters estimated by maximum likelihood. This turns out to be much simpler than using the partitioned inverse formula on the information matrix. Third, we derive the efficient 3SLS estimator in the presence of covariance restrictions as a linearized maximum-likelihood and as a linearized minimum-distance estimator. Finally, we discuss the properties of our estimators when the errors are not normally distributed.
2. No covariance restrictions

A complete linear simultaneous-equations model involving G endogenous and K predetermined variables for a sample of size T can be written compactly as

YB + XΓ = U,    (1)
where Y is the T × G matrix of observations on the endogenous variables, X is the T × K matrix of observations on the predetermined variables, and U is the T × G matrix of unobserved random errors. The T rows of U are mutually independent, identically distributed random vectors with mean zero and positive definite covariance matrix Σ. The matrix B is assumed to be nonsingular so that Y has the reduced-form representation
Y = XΠ + V,
where Π = -ΓB^{-1} and the rows of V = UB^{-1} are independent with mean zero and covariance matrix B'^{-1}ΣB^{-1}. We shall further assume that X has rank K.

Some of the elements of B and Γ are known a priori and do not have to be estimated from the data. Let A = (B' Γ')' be the (G+K) × G matrix of structural coefficients. Defining δ to be the q-dimensional vector of unknown elements of A and defining r to be the GK + G² - q vector of known elements, we can write¹

r = R_δ' vec(A)    and    δ = S_δ' vec(A).    (2)

The selection matrices R_δ and S_δ consist of zeros and ones and satisfy S_δS_δ' + R_δR_δ' = I. Using this fact, the prior restriction R_δ' vec(A) = r can be solved as vec(A) = S_δδ + R_δr.

Since eq. (1) is equivalent to [I_G ⊗ (Y X)]vec(A) = vec(U), the simultaneous-equations model with linear restrictions can be written in the familiar form of one stacked-vector equation in GT dimensions:

y = Zδ + u,
where u = vec(U) has mean zero and covariance matrix (Σ ⊗ I_T);
¹The operator 'vec' transforms a matrix into a vector by stacking successive columns. We shall also use the operator 'vech', which stacks the columns of the lower triangle (including diagonal) of a symmetric matrix, thereby removing the redundant elements. For matrices of appropriate dimensions, vec(CDE) = (E' ⊗ C)vec(D), where ⊗ represents the Kronecker product.
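As a concrete illustration of these operators, the following small numpy sketch (ours, not part of the original article; the function names are hypothetical) builds vec, vech, and the zero-one matrix H satisfying vec(Σ) = H·vech(Σ), and checks the identity vec(CDE) = (E' ⊗ C)vec(D).

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector."""
    return M.reshape(-1, order="F")

def vech(M):
    """Stack the columns of the lower triangle (including diagonal)."""
    G = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(G)])

def duplication_H(G):
    """Zero-one matrix H with vec(S) = H @ vech(S) for symmetric S."""
    pairs = [(i, j) for j in range(G) for i in range(j, G)]
    H = np.zeros((G * G, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        H[i + G * j, k] = 1.0
        H[j + G * i, k] = 1.0
    return H

G = 3
rng = np.random.default_rng(0)
S = rng.standard_normal((G, G)); S = S @ S.T        # a symmetric Sigma
H = duplication_H(G)
J = H @ np.linalg.inv(H.T @ H)                      # J = H(H'H)^{-1}
assert np.allclose(H @ vech(S), vec(S))             # vec(S) = H vech(S)
assert np.allclose(J.T @ vec(S), vech(S))           # vech(S) = J' vec(S)

C, D, E = (rng.standard_normal((G, G)) for _ in range(3))
assert np.allclose(vec(C @ D @ E), np.kron(E.T, C) @ vec(D))
```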
The variables y and Z are given by

y = [I_G ⊗ (Y X)]R_δr,    Z = -[I_G ⊗ (Y X)]S_δ.

Assuming δ is identifiable, one estimation strategy is to search for a TG × q matrix W that is highly correlated with Z and (approximately) uncorrelated with u. Then, if the inverse exists, a reasonable estimate would be (W'Z)^{-1}W'y. This instrumental-variables (IV) approach will not be explored directly, but instead will emerge from our analysis of maximum-likelihood and minimum-distance estimation. We discuss each in turn.

2.1. The maximum-likelihood approach

If the errors were normally distributed, the log-likelihood function for the simultaneous-equations system would be

L(B, Γ, Σ; Y, X) = -½GT log(2π) + T log|det(B)| - ½T log det(Σ) - ½ tr[Σ^{-1}(YB + XΓ)'(YB + XΓ)].    (3)

For the remainder of sections 2 and 3 we shall assume that this is the actual distribution of the data. In section 4, we explore the consequences of relaxing this assumption. According to the method of maximum likelihood, estimators are obtained by maximizing (3) over the values of B, Γ, and Σ satisfying the prior restrictions. This can be performed by calculating the derivatives of L with respect to the unrestricted parameters and setting these scores to zero. Define L_δ = ∂L/∂δ and L_σ = ∂L/∂σ, where σ ≡ vech(Σ) is the ½G(G+1) vector containing the distinct unknown elements of Σ. Then, if H is the matrix of zeros and ones such that vec(Σ) = H·vech(Σ), the scores for δ and σ are

L_δ = -S_δ' vec[(Y X)'UΣ^{-1} - T(B'^{-1} 0)'],    (4)

L_σ = -½H'(Σ^{-1} ⊗ Σ^{-1})H·vech(TΣ - U'U),    (5)

where U = YB + XΓ and (B'^{-1} 0) is the G × (G+K) matrix whose last K columns are identically zero.
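For readers who want to check the pieces numerically, here is a minimal numpy version of the log-likelihood (3). This is our illustrative sketch under the stated notation, not code from the paper.

```python
import numpy as np

def loglik(B, Gamma, Sigma, Y, X):
    """Gaussian log-likelihood (3) for the system Y B + X Gamma = U."""
    T, G = Y.shape
    U = Y @ B + X @ Gamma
    logdetB = np.linalg.slogdet(B)[1]        # log |det B|
    logdetS = np.linalg.slogdet(Sigma)[1]    # log det Sigma
    quad = np.trace(np.linalg.solve(Sigma, U.T @ U))
    return (-0.5 * G * T * np.log(2 * np.pi) + T * logdetB
            - 0.5 * T * logdetS - 0.5 * quad)
```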
Since H’( Z- ’ @ Z-‘)H is invertible, the equation solution 2 = U’U/T. Unfortunately, the equation L, and cannot be easily solved. However, an interesting equation has been pointed out by Durbin (1988), Hendry (1976). Inserting the first-order condition TZ obtain the concentrated score2 for 6,
L,* = - S,l vec[ (XII
restrictions
L, = 0 has the unique = 0 is highly nonlinear interpretation of this Hausman (1975), and - U’U = 0 into (4), we
X)VE-l]
= -S,‘[Z-‘~(xnx)],u=~(z-‘sr,)u,
(6)
where Z = - [( IG @ (XII X)]S,. Assuming the appropriate matrices are invertible, the full-information maximum-likelihood (FIML) estimator which satisfies the equation L $ = 0 has the instrumental-variables representation 8,,
δ̂_ML = [Ẑ'(Σ̂^{-1} ⊗ I_T)Z]^{-1}Ẑ'(Σ̂^{-1} ⊗ I_T)y,    (7)

where Ẑ = -[I_G ⊗ (XΠ̂ X)]S_δ and Σ̂ = Û'Û/T are functions of δ̂_ML. According to the definition of Ẑ, the endogenous variables in Z are replaced by their FIML fitted values to form instruments which, when combined using generalized least squares, are interpreted as optimal instruments.

Rothenberg and Leenders (1964) derived the asymptotic covariance matrix of the FIML estimator by taking derivatives of the concentrated-score function (6). Alternatively, it can be obtained from the asymptotic variance of the concentrated score. That is, the asymptotic covariance matrix for √T(δ̂_ML - δ) is given by the inverse of

Avar(T^{-1/2}L*_δ) = S_δ'[Σ^{-1} ⊗ (Π I_K)' plim(T^{-1}X'X)(Π I_K)]S_δ.
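The IV representation (7) suggests a natural fixed-point computation: alternate between evaluating (Π̂, Σ̂) at the current δ̂ and re-solving the GLS-IV normal equations. The sketch below is our construction, not the authors' code; the helper name is hypothetical and the dense Kronecker products are only suitable for small systems.

```python
import numpy as np

def fiml_iv(Y, X, S_d, R_d, r, delta0, n_iter=25):
    """Iterate the IV form (7): instruments use fitted values X @ Pi."""
    T, G = Y.shape
    K = X.shape[1]
    W_obs = np.hstack([Y, X])                        # (Y X)
    y = np.kron(np.eye(G), W_obs) @ (R_d @ r)        # stacked left-hand side
    Z = -np.kron(np.eye(G), W_obs) @ S_d             # stacked regressors
    delta = np.asarray(delta0, dtype=float)
    for _ in range(n_iter):
        A = (S_d @ delta + R_d @ r).reshape(G + K, G, order="F")
        B, Gam = A[:G, :], A[G:, :]
        U = Y @ B + X @ Gam
        Sig = U.T @ U / T
        Pi = -Gam @ np.linalg.inv(B)
        Zhat = -np.kron(np.eye(G), np.hstack([X @ Pi, X])) @ S_d
        P = np.kron(np.linalg.inv(Sig), np.eye(T))   # Sigma^{-1} (x) I_T
        delta = np.linalg.solve(Zhat.T @ P @ Z, Zhat.T @ P @ y)
    return delta
```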
Eq. (7) is only an implicit function for the FIML estimator, but other asymptotically equivalent estimators take the same form and are easier to compute. For example, the three-stage least-squares (3SLS) estimator proposed by Zellner and Theil (1962) and the full-information instrumental-variables estimator (FIVE) proposed by Brundy and Jorgenson (1971) can both be

²For the log-likelihood L(δ, σ), the concentrated function is defined as L*(δ) = L[δ, h(δ)], where L_σ[δ, h(δ)] = 0 so that ∂h'/∂δ = -L_δσL_σσ^{-1}. Differentiation of L* gives L*_δ = L_δ[δ, h(δ)] and L*_δδ = L_δδ - L_δσL_σσ^{-1}L_σδ. Thus, (6) is the score of the concentrated likelihood function and its derivative (when inverted) is L^δδ, the block of the inverse Hessian of L corresponding to δ.
written as

[Z̃'(Σ̃^{-1} ⊗ I_T)Z]^{-1}Z̃'(Σ̃^{-1} ⊗ I_T)y,

where

Z̃ = -[I_G ⊗ (XΠ̃ X)]S_δ,    Σ̃ = Ũ'Ũ/T,

and Π̃ and Ũ are calculated from consistent but inefficient estimators of B and Γ. They can be interpreted as linearized maximum-likelihood estimators (LMLE) since they are of the form
δ̂_LML = δ̃ + T^{-1}Φ̃^{-1}L̃*_δ,

where δ̃ is some initial estimator, T^{-1}Φ̃^{-1} is some estimate of the asymptotic variance of the MLE, and L̃*_δ is the concentrated score for δ evaluated at δ̃.³ The asymptotic equivalence of these estimators with FIML can be demonstrated if appropriate regularity conditions are satisfied.
2.2. The minimum-distance approach

Rather than work with the likelihood function, a distance function in sufficient statistics provides an alternative route to asymptotically efficient estimators. Let M = I_T - X(X'X)^{-1}X' and n = GK + ½G(G+1). Then, under normality, the distinct elements of the matrices (X'X)^{-1}X'Y and Y'MY/(T-K) are sufficient for the unknown elements of (A, Σ). These statistics can be written as an n-dimensional vector s with mean μ:
μ(A, Σ) = [vec(-ΓB^{-1})', vech(B'^{-1}ΣB^{-1})']'.
The vector √T(s - μ) has mean zero and variance matrix V_s given by the block-diagonal matrix

V_s = diag[ (B'^{-1}ΣB^{-1}) ⊗ T(X'X)^{-1},  2J'(B'^{-1}ΣB^{-1} ⊗ B'^{-1}ΣB^{-1})J ],
³Although the method was introduced by Rothenberg and Leenders (1964), the LMLE interpretation of 3SLS and FIVE was made by Hendry (1976).
where J = H(H'H)^{-1} has the property that vech(Σ) = J'vec(Σ). The minimum-distance approach finds parameter values that make the estimated mean μ(A, Σ) as close as possible to the observed s. If V̂_s is some consistent estimate of V_s, it is natural to consider estimates that minimize

T[s - μ(A, Σ)]'V̂_s^{-1}[s - μ(A, Σ)],
subject to R_δ' vec(A) = r. Under regularity conditions, this minimum-distance (MD) procedure produces consistent and asymptotically efficient estimates [cf. Malinvaud (1970, pp. 348-360) and Rothenberg (1973, pp. 24-25)]. As in maximum likelihood, the actual solution can be rather difficult because μ is a complicated nonlinear function.

A modification of the minimum-distance approach is more tractable. Let g(s, A, Σ) be a vector of n smooth functions. Suppose that, for all parameter values (A*, Σ*) in an open neighborhood of the true value, g(s, A*, Σ*) is a locally one-to-one function of s. Suppose further that, as the sample size T tends to infinity, plim g(s, A, Σ) = 0 and √T g(s, A, Σ) is asymptotically normal with zero mean and nonsingular covariance matrix V_g(A, Σ). For some consistent estimate V̂_g, minimize

T[g(s, A, Σ)]'V̂_g^{-1}[g(s, A, Σ)],
subject to the constraints. Under appropriate regularity conditions, the solution to this problem is also consistent and asymptotically efficient. By cleverly choosing the function g, we can construct a computationally convenient estimate of the structural parameters. The natural choice for g turns out to be
g(s, A, Σ) = [g₁', g_σ']' = [ T^{-1}[vec(X'YB + X'XΓ)]',  [vech(B'Y'MYB/(T-K)) - vech(Σ)]' ]',

with the block-diagonal variance matrix

V_g = diag[ Σ ⊗ plim(T^{-1}X'X),  2J'(Σ ⊗ Σ)J ].
In this case, the original problem has been simplified by 'cancelling' the common terms involving B^{-1} that appear in both μ and V_s. For some consistent estimator Σ̃, the resulting modified distance function is⁴
T²g₁'(Σ̃ ⊗ X'X)^{-1}g₁ + ½(T-K)g_σ'H'(Σ̃ ⊗ Σ̃)^{-1}Hg_σ.    (9)
When there are no restrictions on Σ, the minimization of (9) is very simple. For any value of A, g_σ can be set to zero by varying Σ. Hence, minimizing (9) over A and Σ is equivalent to minimizing

T²g₁'(Σ̃ ⊗ X'X)^{-1}g₁

over δ alone.⁵ Denoting the idempotent matrix X(X'X)^{-1}X' by N_X, the solution is the three-stage least-squares estimator

[Z'(Σ̃^{-1} ⊗ N_X)Z]^{-1}Z'(Σ̃^{-1} ⊗ N_X)y,

which is a LMLE using the preliminary estimates Σ̃ and Π̃ = (X'X)^{-1}X'Y.
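In code, the 3SLS solution amounts to a single GLS-IV solve with the projection N_X. A compact, purely illustrative numpy version (our naming, small systems only):

```python
import numpy as np

def three_sls(Y, X, S_d, R_d, r, Sig_tilde):
    """3SLS: [Z'(Sig^{-1} (x) N_X)Z]^{-1} Z'(Sig^{-1} (x) N_X)y."""
    T, G = Y.shape
    Nx = X @ np.linalg.solve(X.T @ X, X.T)       # projection onto col(X)
    W_obs = np.hstack([Y, X])
    y = np.kron(np.eye(G), W_obs) @ (R_d @ r)
    Z = -np.kron(np.eye(G), W_obs) @ S_d
    Psi = np.kron(np.linalg.inv(Sig_tilde), Nx)
    return np.linalg.solve(Z.T @ Psi @ Z, Z.T @ Psi @ y)
```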
3. Partially restricted covariance matrix

Suppose p of the elements of vech(Σ) are known to be zero. Then, as in (2), this information can be expressed in terms of selection matrices:

R_σ' vec(Σ) = 0,    S_σ' vec(Σ) = σ,    (10)

where σ is the vector containing the ½G(G+1) - p unknown elements.⁶ Again, an explicit solution is given by vec(Σ) = S_σσ. The total prior information on the structural parameters (which we assume to be sufficient for identifiability) can be expressed either by the pair of constraint equations

R_δ' vec(A) = r,    R_σ' vec(Σ) = 0,

or, equivalently, by the pair of equations relating the constrained parameters
⁴We use the fact that the inverse of J'(Σ ⊗ Σ)J is H'(Σ ⊗ Σ)^{-1}H. Cf. Richard (1975) and Rothenberg (1973, pp. 87-88). Hausman (1983, p. 421) questions this way of representing the precision of a variance matrix, apparently because he does not use our definition of J.

⁵In general, let Q = x₁'V¹¹x₁ + x₂'V²²x₂ + 2x₁'V¹²x₂, where the Vⁱʲ are appropriate blocks of the inverse variance matrix. If x₂ is freely variable, the minimizing value of x₂ is -(V²²)^{-1}V²¹x₁, and hence the minimum of Q is obtained by minimizing x₁'[V¹¹ - V¹²(V²²)^{-1}V²¹]x₁ = x₁'V₁₁^{-1}x₁.

⁶R_σ and S_σ are assumed to lie in the column space of H so that σᵢⱼ and σⱼᵢ are treated symmetrically in (10).
to the free parameters δ and σ:

vec(A) = S_δδ + R_δr,    vec(Σ) = S_σσ.
Our problem now is to maximize the likelihood function (3) or to minimize the modified distance function (9) subject to these restrictions.

3.1. Maximum likelihood
Without covariance restrictions, concentrating the log-likelihood function with respect to Σ yields the IV interpretation of FIML. It is also a route to simplifying the derivation of linearized approximations to the maximum-likelihood estimator. In particular, the asymptotic covariance matrix for δ̂_ML, a key ingredient in developing such an approximation, can be calculated from the score of the concentrated log-likelihood function, rather than from a partitioned inverse of the full information matrix of the likelihood function. In the presence of covariance restrictions, concentration of the log-likelihood is much less convenient since an explicit closed-form solution of the normal equations for σ does not appear to be possible. However, such solutions are not necessary to take advantage of the LML method or to obtain an IV interpretation of the FIML estimates. The score for δ given in eq. (4) remains valid in the presence of covariance restrictions, but eq. (5) must be replaced by the score for the free parameters σ:
L_σ = -½S_σ'(Σ^{-1} ⊗ Σ^{-1}) vec(TΣ - U'U).    (11)
Although equating L_σ to zero does not yield a simple solution for Σ in terms of U when S_σ has rank less than ½G(G+1), an implicit relationship can still be obtained. Substituting vec(Σ) = S_σσ into (11), we find the 'solution'

σ = [S_σ'(Σ^{-1} ⊗ Σ^{-1})S_σ]^{-1}S_σ'(Σ^{-1} ⊗ Σ^{-1}) vec(U'U/T),

and hence⁷

vec(TΣ - U'U) = (S_σ[S_σ'(Σ^{-1} ⊗ Σ^{-1})S_σ]^{-1}S_σ'(Σ^{-1} ⊗ Σ^{-1}) - I_{G²})H·vech(U'U)
= -(Σ ⊗ Σ)R_σ[R_σ'(Σ ⊗ Σ)R_σ]^{-1}R_σ' vec(U'U).

⁷The final expression follows, after a rotation of the coordinate system, from the identity H(H'H)^{-1}H' = R_σ(R_σ'R_σ)^{-1}R_σ' + S_σ(S_σ'S_σ)^{-1}S_σ', which describes the fact that the column space of H is the direct sum of the column space of R_σ and the column space of S_σ.
Substituting this expression into eq. (4) gives, as a generalization of (6), the concentrated-score function

L*_δ = -S_δ'{[Σ^{-1} ⊗ (XΠ X)]'u + [I_G ⊗ (ΣB^{-1} 0)]'Ξ(I_G ⊗ U)'u} = (Z̄ + W̄)'(Σ^{-1} ⊗ I_T)u,    (12)

where

Ξ = R_σ[R_σ'(Σ ⊗ Σ)R_σ]^{-1}R_σ',

Z̄ = -[I_G ⊗ (XΠ X)]S_δ,

W̄ = -(Σ ⊗ U)Ξ[I_G ⊗ (ΣB^{-1} 0)]S_δ.
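To make the construction concrete, the following sketch (ours, with hypothetical helper names) evaluates Ξ and the extra instrument block W̄ at given estimates of (B, Γ):

```python
import numpy as np

def augmented_instruments(Y, X, B, Gam, S_d, R_s):
    """Evaluate Xi = R_s [R_s'(Sig (x) Sig) R_s]^{-1} R_s' and
    W = -(Sig (x) U) Xi [I_G (x) (Sig B^{-1} 0)] S_d at (B, Gam)."""
    T, G = Y.shape
    K = X.shape[1]
    U = Y @ B + X @ Gam
    Sig = U.T @ U / T
    SS = np.kron(Sig, Sig)
    Xi = R_s @ np.linalg.solve(R_s.T @ SS @ R_s, R_s.T)
    SB0 = np.hstack([Sig @ np.linalg.inv(B), np.zeros((G, K))])
    W = -np.kron(Sig, U) @ Xi @ np.kron(np.eye(G), SB0) @ S_d
    return Xi, W
```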
As noted by Hausman, Newey, and Taylor (1987), the generalized IV form of FIML in (7) changes only in the instruments for the endogenous explanatory variables. With covariance restrictions, we have

δ̂_ML = [(Ẑ + Ŵ)'(Σ̂^{-1} ⊗ I_T)Z]^{-1}(Ẑ + Ŵ)'(Σ̂^{-1} ⊗ I_T)y,

where Ẑ, Ŵ, and Σ̂ are the FIML estimates of Z̄, W̄, and Σ. The matrix Z̄ + W̄ has a simple interpretation as an optimal instrument for Z. Using the transformation matrix Λ = (Σ^{-1/2} ⊗ I_T), where Σ^{1/2} is the symmetric square root of Σ, our model can be written as Λy = ΛZδ + Λu, where Λu has a scalar covariance matrix. The transformed matrix of predetermined variables provides valid instruments since E[Λ(I_G ⊗ X)]'Λu = 0. In addition, the covariance restrictions imply that E[Λ(Σ ⊗ U)R_σ]'Λu = 0. The population projection of ΛZ onto the space spanned by the GK + p columns of [Λ(I_G ⊗ X)  Λ(Σ ⊗ U)R_σ] yields Λ(Z̄ + W̄) as the linear combination most highly correlated with ΛZ. In the presence of covariance restrictions, structural disturbances that are uncorrelated with the errors augment the reduced-form regression function for the endogenous variables, thereby increasing the amount of variation in the instruments.

The asymptotic covariance matrix for the maximum-likelihood estimator is, under suitable regularity assumptions, given by the inverse of the limiting Hessian of the concentrated likelihood function. Again, an easier route is via the asymptotic variance of the concentrated score. From (12), L*_δ is the sum of two uncorrelated terms:

L*_δ1 = -S_δ'[Σ^{-1} ⊗ (XΠ X)]'u = Z̄'(Σ^{-1} ⊗ I_T)u,

L*_δ2 = -S_δ'[I_G ⊗ (ΣB^{-1} 0)]'Ξ vec(U'U) = W̄'(Σ^{-1} ⊗ I_T)u.
Using the fact that var[vech(U'U)] is 2TJ'(Σ ⊗ Σ)J, the asymptotic variance for √T(δ̂_ML - δ) is given by the inverse of

S_δ'{[Σ^{-1} ⊗ (Π I_K)' plim(T^{-1}X'X)(Π I_K)] + 2[I_G ⊗ (ΣB^{-1} 0)]'Ξ[I_G ⊗ (ΣB^{-1} 0)]}S_δ = Φ₁ + 2Φ₂.    (13)
Note that the instruments Z̄ and W̄ contribute differently to the variance. Although the rows of ΛW̄ are uncorrelated with the corresponding rows of Λu, W̄ and u are not independent. Hence, the variance of W̄'(Σ^{-1} ⊗ I_T)u depends on the fourth moments of the error distribution. Under normality, this variance is exactly twice the term that would occur if W̄ were exogenous.
3.2. Feasible IV estimation

When there are no covariance constraints, linearized maximum likelihood simply involves replacing in (7) the FIML values Ẑ and Σ̂ by ones based on preliminary consistent (but inefficient) estimates. By analogy, one is tempted to form the estimator

[(Z̃ + W̃)'(Σ̃^{-1} ⊗ I_T)Z]^{-1}(Z̃ + W̃)'(Σ̃^{-1} ⊗ I_T)y,
where 2, #‘, and 2 are estimates of 2, W, and 2 based on preliminary consistent estimates of 6. This, however, will not lead to an efficient estimator in the present situation because the dependence between W and u is ignored. A linearized maximum-likelihood estimator for 6 has the form 8 .,=8+T-‘(m1+2~,)-l~:, where 8 is some initial estimator, T-‘(6, + 2&,-’ is some estimate of the variance of the MLE, and i$ is the (concentrated) score for 6 evaluated at 8. Although (2 + W)‘(Z-1 @ Z,)u is the correct score, (2 + W)‘(z-’ ~3 ZT)Z/T converges in probability to O1 + @z instead of the required @i + 2@,. An
appropriate LMLE is
δ̂ = δ̃ + [(Z̃ + 2W̃)'(Σ̃^{-1} ⊗ I_T)Z]^{-1}(Z̃ + W̃)'(Σ̃^{-1} ⊗ I_T)(y - Zδ̃)
= [(Z̃ + 2W̃)'(Σ̃^{-1} ⊗ I_T)Z]^{-1}[(Z̃ + 2W̃)'(Σ̃^{-1} ⊗ I_T)y - W̃'(Σ̃^{-1} ⊗ I_T)ũ].    (14)
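A one-step implementation of (14) is then immediate. The sketch below is ours and assumes that Z̃, W̃, and Σ̃ have already been formed from a preliminary consistent δ̃:

```python
import numpy as np

def lmle_step(delta_t, y, Z, Z_t, W_t, Sig_t):
    """One-step update (14) with Psi = Sig_tilde^{-1} (x) I_T."""
    T = Z.shape[0] // Sig_t.shape[0]
    Psi = np.kron(np.linalg.inv(Sig_t), np.eye(T))
    lhs = (Z_t + 2.0 * W_t).T @ Psi @ Z
    rhs = (Z_t + W_t).T @ Psi @ (y - Z @ delta_t)
    return delta_t + np.linalg.solve(lhs, rhs)
```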
3.3. Minimum distance

When there are covariance restrictions, it is no longer true that the minimum-distance estimator is 3SLS. The second term in (9) must now be taken into account. Since R_σ and S_σ are selection matrices, the elements of the vector g can be grouped into three subvectors:

g₁ = vec(X'U/T),    g_σ1 = R_σ' vec[U'MU/(T-K)],    g_σ2 = S_σ' vec[U'MU/(T-K)] - σ.
Note that g₁ and g_σ1 depend only on δ and that, for any value taken by δ, g_σ2 can be set arbitrarily by varying σ. Hence, by the argument given in footnote 5, g_σ2 can be concentrated out of (9). Then, with
Ṽ_g = diag[ Σ̃ ⊗ (X'X/T),  2R_σ'(Σ̃ ⊗ Σ̃)R_σ ],    (15)

the appropriate distance function Q(δ) is
= [vec(X’U)]‘[2@Xx’X]-‘vec(X’U)
+
2(T1 btU'MwR. K)
In the presence of covariance restrictions, the problem involves quartic function in 8. Given the additive form of the objective solution should not be difficult to calculate.
minimizing a function, the
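Because (16) is a quartic in δ, in practice it is easiest to hand to a generic numerical optimizer. The following illustrative evaluator (our code, not the authors') maps δ to Q(δ):

```python
import numpy as np

def Q(delta, Y, X, S_d, R_d, r, R_s, Sig_t):
    """Evaluate the distance function (16) at delta."""
    T, G = Y.shape
    K = X.shape[1]
    A = (S_d @ delta + R_d @ r).reshape(G + K, G, order="F")
    U = np.hstack([Y, X]) @ A                          # U = Y B + X Gamma
    M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
    m1 = (X.T @ U).reshape(-1, order="F")              # vec(X'U)
    t1 = m1 @ np.linalg.solve(np.kron(Sig_t, X.T @ X), m1)
    m2 = R_s.T @ (U.T @ M @ U).reshape(-1, order="F")  # R_s' vec(U'MU)
    t2 = m2 @ np.linalg.solve(R_s.T @ np.kron(Sig_t, Sig_t) @ R_s, m2)
    return t1 + t2 / (2.0 * (T - K))
```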
The asymptotic variance of the minimum-distance estimator is given by twice the inverse of the average Hessian of the distance function. That is, it is the inverse of

plim T^{-1}Z'(Σ^{-1} ⊗ N_X)Z + 2 plim T^{-2}Z'(I_G ⊗ U)Ξ(I_G ⊗ U)'Z,    (17)

which is the same expression as obtained in (13) for the FIML estimator, since

plim T^{-1}(I_G ⊗ U)'Z = -[I_G ⊗ (ΣB^{-1} 0)]S_δ.

Just as with maximum likelihood, there exists an efficient linearized minimum-distance estimator that is easier to calculate. Let Ũ = (Y X)Ã be the residuals based on some consistent estimator Ã. Then, using a first-order approximation to the quadratic function, we have

R_σ' vec(U'MU) ≅ R_σ' vec(Ũ'MU + U'MŨ - Ũ'MŨ) = R_σ'(I_G ⊗ MŨ)'(2u - ũ),

which can be viewed as a linearization of g_σ1 around the value A = Ã. With R_σ' vec(U'MU)/√(T-K) replaced by the asymptotically equivalent R_σ' vec(U'U)/√T, we have the simple quadratic distance function

(y - Zδ)'(Σ̃^{-1} ⊗ N_X)(y - Zδ) + (2T)^{-1}(2u - ũ)'(I_G ⊗ Ũ)R_σ[R_σ'(Σ̃ ⊗ Σ̃)R_σ]^{-1}R_σ'(I_G ⊗ Ũ)'(2u - ũ).    (18)
The resulting linearized minimum-distance estimator is identical to the linearized maximum-likelihood estimator (14) if Z̃ = [I_G ⊗ X(X'X)^{-1}X']Z and W̃ = T^{-1}(I_G ⊗ Ũ)Ξ̃(I_G ⊗ Ũ)'Z are used to form the latter.

The linearized minimum-distance estimator is also equal to a version of the augmented three-stage least-squares (A3SLS) estimator proposed by Hausman, Newey, and Taylor (1987). The demonstration, however, requires some additional notation. Let u_t be the tth column of U' and define e_t = R_σ' vec(u_tu_t'). Form the p × T matrix E' = [e₁, ..., e_T]. Then, denoting the T-dimensional vector of ones by ι and setting e = vec(E), we have R_σ' vec(U'U) = E'ι = (I_p ⊗ ι)'e. Hence, if R_σ' vec(U'MU)/√(T-K) is replaced by R_σ' vec(U'U)/√T, eq. (16) is equal to

[vec(X'U)]'[Σ̃ ⊗ X'X]^{-1}vec(X'U) + ½e'([R_σ'(Σ̃ ⊗ Σ̃)R_σ]^{-1} ⊗ N_ι)e,
where N_ι = ι(ι'ι)^{-1}ι'. Corresponding to the linearization of R_σ' vec(U'U) around R_σ' vec(U'Ũ), there is an equivalent linearization of e around ẽ. Defining Z_e = -∂e/∂δ', e is approximated by ẽ - Z̃_e(δ - δ̃) = f̃ - Z̃_eδ, where f̃ = ẽ + Z̃_eδ̃. In this notation, (18) is identical to

(y - Zδ)'(Σ̃^{-1} ⊗ N_X)(y - Zδ) + ½(f̃ - Z̃_eδ)'([R_σ'(Σ̃ ⊗ Σ̃)R_σ]^{-1} ⊗ N_ι)(f̃ - Z̃_eδ),    (19)

which is the A3SLS objective function when normality is imposed on the augmented error-covariance matrix.

4. Estimation when the errors are not normal
The quasi-maximum-likelihood estimator that maximizes (3) and the minimum-distance estimator based on (16) are still consistent but are no longer asymptotically efficient if the errors are not normal. Furthermore, the expressions for the asymptotic variance in (13) and (17) may be wrong. Some adjustments to our statistics can get around these problems. It will be more convenient to work within the minimum-distance framework, but analogous results are available using the likelihood approach.

Two potential problems occur in our analysis in section 3.3 when the errors are not normal. First, s is no longer a vector of sufficient statistics for the unknown parameters. Optimal estimates need to make use of additional sample data. Unfortunately, unless we make further parametric assumptions (or switch to semiparametric methods), it is not clear what additional sample data are relevant. Even if we restrict our attention to estimators that are functions only of s, there is a second problem. The minimum-distance estimates behave asymptotically like linear functions of s, where the weights depend on the elements of the variance matrix V_g in eq. (15). Since we have used the normality assumption to form V_g, the estimators derived here may be inefficient because the wrong weights are employed.

Note that T^{-1}(Σ̂ ⊗ X'X) is a consistent estimate of V_g1 for any error distribution as long as Σ̂ is consistent for Σ. Hence, if the covariance V_g1,gσ is zero (because, for example, the errors are assumed to have a symmetric probability distribution), the distance function (16) is correct except for the use of the matrix 2R_σ'(Σ̂ ⊗ Σ̂)R_σ as an estimate of V_gσ. Assuming fourth moments of the error distribution exist, this matrix can be replaced in eqs. (16)-(19) by a more robust estimate, say Ẽ'Ẽ/T. No other changes need be made to obtain the optimal minimum-distance estimator based on s and the correct asymptotic variance.

If asymmetry of the error distribution is suspected, matters are considerably more complicated. As pointed out by Hausman, Newey, and Taylor (1987),
when third moments are not a priori zero, the fact that the elements of X are uncorrelated with certain products of the errors is new information that can be exploited. Using the matrix E defined above, it can be expressed by the fact that both X'E and X'U have mean zero. If ι is a column of X, the elements of g₁ and g_σ1 are a subset of the elements of (1/T)vec[X'(U E)]. It is natural then to consider a minimum-distance estimator based on the expanded set of moment restrictions. The T(G+p) vector ε = vec[(U E)] has mean zero and variance matrix

var(ε) = Ω ⊗ I_T,    for    Ω = var[(u_t', e_t')'] = [ Ω₁₁ Ω₁₂ ; Ω₂₁ Ω₂₂ ].
Of course, Ω₁₁ is just Σ and Ω₂₂ is V_gσ; preliminary estimates for them have been discussed already. The matrix Ω₂₁ can be estimated by Ẽ'Ũ/T. Since vec[X'(U E)] = (I ⊗ X)'ε has variance matrix (Ω ⊗ X'X), the appropriate distance function is

[vec X'(U E)]'(Ω̂ ⊗ X'X)^{-1}[vec X'(U E)].    (20)

If e is linearized as in (19), the resulting minimum-distance estimator is the A3SLS estimator.⁸ The asymptotic covariance matrix for the MD estimator based on X'(U E) is [plim(1/T)Z_a'(Ω^{-1} ⊗ N_X)Z_a]^{-1}, where Z_a = ∂ε/∂δ'. Using the same notation, the asymptotic covariance matrix for the best MD estimator based on the subset (X'U ι'E) can be written as the inverse of the same matrix less a correction term.
Since the term being subtracted is positive semidefinite (and nonzero if Ω₁₂ ≠ 0), the estimator using third moments of the data in addition to the second moments is strictly more efficient whenever u_t is correlated with e_t.

In practice, third and fourth sample moments of regression residuals are often unreliable estimates of the corresponding population moments. Robust estimates of δ using these sample moments will probably not be an improvement over estimates based on normality unless the sample size is quite large. Furthermore, covariance restrictions often arise from the belief that certain error terms are independent of other error terms. Under independence, however, cross third moments are zero and the efficiency gain from exploiting

⁸Cf. Hausman, Newey, and Taylor (1987, eq. (4.5a)).
b( X’E) = 0 disappears. Hence, it is an open question whether the estimators using third and fourth moments will be useful in practice. 5. Conclusion The full-information maximum-likelihood and minimum-distance estimators for a simultaneous-equations model with covariance restrictions are rather complicated. But linearization yields estimators that are easy to compute and have an instrumental-variables interpretation. All of these estimators are functions of the second-order moment matrices of the data and have a simple method-of-moments interpretation: the p covariance restrictions are added to the usual GK exogeneity conditions to form mean functions for approximate minimum-distance estimation. Assuming a symmetric error distribution and a sufficiently large sample size, the estimators can easily be modified to make them robust to nonnormality. When symmetry is dropped, third-order moments of the data may be used to construct improved estimators. It appears that K(G + p) moment conditions are needed when cross third moments of the errors are nonzero. References Brundy, J.M. and D.W. Jorgenson, 1971, Efficient estimation of simultaneous equations by instrumental variables, Review of Economics and Statistics 53, 207-224. Durbin, James. 1988. Maximum likelihood estimation of the parameters of a system of simultaneous regression equations, Econometric Theory 4, 159-170. Hausman, J.A.. 1975, An instrumental variable approach to full information estimators for linear and certain non-linear econometric models, Econometrica 43, 727-738. Hausman, J.A.. 1983, Specification and estimation of simultaneous equation models, Ch. 7 in: Z. Griliches and M. Intriligator, eds., Handbook of econometrics, Vol. I (North-Holland, Amsterdam). Hausman, J.A., W.K. Newey, and W. Taylor, 1987, Efficient estimation and identification of simultaneous equation models with covariance restrictions, Econometrica 55, 849-874. Hendry, D.F., 1976, The structure of simultaneous equations estimators, Journal of Econometrics 4, 51-88. Malinvaud. E.. 1970, Statistical methods of econometrics (North-Holland, Amsterdam). Richard, J.-F., 1975. A note on the information matrix of the multivariate normal distribution, Journal of Econometrics 3, 57-60. Rothenberg, T.J. and C.T. Leenders, 1964, Eficient estimation of simultaneous equation systems, Econometrica 32, 57-76. Rothenberg, T.J.. 1973, Efficient estimation with a priori information (Yale University Press, London). Zellner, A. and H. Theil. 1962. Three-stage least squares: Simultaneous estimation of simultaneous equations, Econometrica 30, 54-78.