Journal of Econometrics 51 (1992) 151-181. North-Holland
Simultaneous equations and panel data*
Christopher Cornwell
University of Georgia, Athens, GA 30601, USA
Peter Schmidt
Michigan State University, East Lansing, MI 48824-1038, USA
Donald Wyhowski
Memphis State University, Memphis, TN 38152, USA
Received July 1989, final version received October 1990
This paper considers a simultaneous equations model, with panel data and unobservable individual effects in each structural equation. The effects may be fixed or random. In the fixed effects case, a conditional likelihood approach leads to the within transformation, just as in the single equation setting. In the random effects case, we allow an arbitrary number of the exogenous variables to be correlated with the effects, and provide efficient GMM estimators along the lines of two-stage and three-stage least squares. The case of different instruments in different equations is also considered.
*The second author gratefully acknowledges the support of the National Science Foundation. We also wish to thank two referees for their helpful comments.
0304-4076/92/$03.50 © 1992 Elsevier Science Publishers B.V. All rights reserved
1. Introduction
This paper considers the usual type of simultaneous equations model, but with panel data and with an unobservable individual effect in each structural equation. The existing treatments of this topic almost invariably cover the case of random effects that are uncorrelated with the exogenous variables; for example, see Chamberlain and Griliches (1975), Baltagi (1981), Prucha (1985), and Hsiao (1986). These papers generalize the usual structural estimators, such as 2SLS, LIML, 3SLS, and FIML, to models with panel data and with the error structure implied by random effects. In this paper, we consider both fixed effects and random effects that may be correlated with some or all of the exogenous variables. These are standard
cases considered in the single equation (linear regression model) literature. In the case of the regression model with fixed effects, it is well known that the model can be estimated by OLS after a within transformation and that, given normality of the errors, the within estimator is both the MLE and the conditional MLE. We establish essentially the same results for the simultaneous equations model. Similarly, in the linear regression model, the implications of correlation between the explanatory variables and random effects have largely been sorted out. Relevant references include Mundlak (1978), Hausman and Taylor (1981), Amemiya and MaCurdy (1986), and Breusch, Mizon, and Schmidt (1989). These papers provide estimators that reduce to the fixed effects (within) estimator if all of the explanatory variables are correlated with the effects, and to the random effects (GLS) estimator if none of the explanatory variables are correlated with the effects. However, with the exception of Amemiya and MaCurdy, none of these papers considers the case in which some of the explanatory variables are endogenous (in the sense of being correlated with the noise component of the error, as well as the individual effect). Furthermore, Amemiya and MaCurdy consider only limited information (2SLS) estimation, and their model is restrictive in some ways that ours is not. In this paper, we generalize the estimators from the single equation literature just cited to the simultaneous equations case. We provide limited information (2SLS) and full information (3SLS) estimators. These estimators reduce to the fixed effects estimators when all exogenous variables are correlated with the effects, and they reduce to previous estimators for the random effects model when none of the exogenous variables are correlated with the effects. Finally, we discuss the case in which different variables are correlated with the effects in different structural equations.
The plan of the paper is as follows. Section 2 considers the fixed effects case, while section 3 considers the case of random effects correlated with the exogenous variables. Section 4 gives our concluding remarks. An appendix contains the proofs of some results.
2. Fixed effects
In this section we consider a standard linear simultaneous equations model, but with a fixed individual effect in every structural equation. The effects can be removed by a within transformation, and in this section we provide two results that provide a statistical justification for doing so. First, we show that the MLE of the model is the same as the MLE after a within transformation. Second, we show that the means of the endogenous variables are sufficient statistics for the effects, and conditioning on these leads to a conditional likelihood that depends only on the within variation in the data; thus the conditional MLE is the same as MLE after the within transformation.
We first define some notation. Let $i = 1, \ldots, N$ index individuals and $t = 1, \ldots, T$ index time periods. There will be $G$ equations and $K$ time-varying exogenous variables. (We rule out lagged endogenous variables, and in this section we also rule out time-invariant exogenous variables, since they would disappear under the within transformation.) For structural equation $j$ ($j = 1, \ldots, G$) we write
$$y_j = Y_j\delta_j + X_j\beta_j + \alpha_j + \varepsilon_j, \qquad j = 1, 2, \ldots, G. \tag{1}$$
The observations are ordered as in Baltagi (1981) or Hausman and Taylor (1981), so that
$$y_j = (y_{j11}, \ldots, y_{j1T}, \ldots, y_{jN1}, \ldots, y_{jNT})', \tag{2}$$
for example. We observe the dependent variable $y_j$, the data matrix of included endogenous variables $Y_j$, and the data matrix of included time-varying exogenous variables $X_j$. We do not observe the vector of individual effects $\alpha_j$, nor the random error ('noise') $\varepsilon_j$. The coefficients $\delta_j$ and $\beta_j$ are parameters to be estimated. We assume that the individual effects in $\alpha_j$ vary over individual ($i$) but not time ($t$):
$$\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jN})' \otimes e_T, \tag{3}$$
where $e_T$ is a $T \times 1$ vector of ones. On the other hand, the noise $\varepsilon_j$ varies over both $i$ and $t$, so that $\varepsilon_j$ has the same structure as is given for $y_j$ in eq. (2). We rewrite the $j$th structural equation as
$$y_j = R_j\xi_j + \alpha_j + \varepsilon_j, \tag{4}$$
where $R_j = (Y_j, X_j)$ and $\xi_j' = (\delta_j', \beta_j')$. We then stack the equations as given in (4) into the form usual in consideration of a SUR system (or of 3SLS):
$$y = R\xi + \alpha + \varepsilon, \tag{5}$$
where $y' = (y_1', \ldots, y_G')$, and similarly for $\xi$, $\alpha$, and $\varepsilon$, and where
$$R = \begin{bmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_G \end{bmatrix}. \tag{6}$$
We assume that the random disturbances for person $i$ at time $t$, $(\varepsilon_{1it}, \ldots, \varepsilon_{Git})'$, are iid $N(0, \Sigma)$. This implies $\varepsilon \sim N(0, \Sigma \otimes I_{NT})$. We define the following standard notation for projections. For any matrix $A$, let $P_A$ be the projection onto the column space of $A$, so that $P_A = A(A'A)^{-1}A'$ if the inverse exists. Also let $Q_A = I - P_A$ be the projection onto the null space of $A'$. Let $V = I_N \otimes e_T$ be a matrix of individual dummy variables, so that
$$P_V = I_N \otimes e_T e_T'/T \tag{7}$$
and $Q_V = I_N \otimes [I_T - e_T e_T'/T]$ are the orthogonal idempotent matrices that perform the between and within transformations. The within transformation will be represented by a superscript 'w'; for example, $y^w = (I_G \otimes Q_V)y$ has typical block $y_j^w = Q_V y_j$, where $Q_V y_j$ has typical element $y_{jit} - \bar{y}_{ji}$. Also note $\alpha_j^w = Q_V\alpha_j = 0$, so that $\alpha^w = 0$. Similarly, the between transformation will be represented by a superscript 'b'; for example, $y^b = (I_G \otimes P_V)y$, where $y_j^b$ has typical element $\bar{y}_{ji}$, and $\alpha^b = \alpha$.
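The between and within projections can be checked numerically. The following is a minimal NumPy sketch, assuming the data are ordered with $t$ running fastest within each individual as in eq. (2); the function name `between_within` and the toy dimensions are illustrative and do not come from the paper.

```python
import numpy as np

def between_within(N, T):
    """Return (P_V, Q_V) for N individuals observed over T periods.

    Rows are assumed ordered with t running fastest within each individual.
    """
    e_T = np.ones((T, 1))
    V = np.kron(np.eye(N), e_T)                  # NT x N matrix of individual dummies
    P_V = np.kron(np.eye(N), e_T @ e_T.T / T)    # replaces each value by the individual mean
    Q_V = np.eye(N * T) - P_V                    # replaces each value by its deviation from that mean
    assert np.allclose(P_V, V @ np.linalg.inv(V.T @ V) @ V.T)
    return P_V, Q_V

N, T = 4, 3
rng = np.random.default_rng(0)
y = rng.normal(size=N * T)
P_V, Q_V = between_within(N, T)

y_panel = y.reshape(N, T)                        # row i holds y_i1, ..., y_iT
means = y_panel.mean(axis=1, keepdims=True)
y_w = Q_V @ y                                    # within transformation
y_b = P_V @ y                                    # between transformation
alpha = np.kron(rng.normal(size=N), np.ones(T))  # an effect that is constant over t
```

In applied work one would normally demean by group directly rather than form the $NT \times NT$ matrices; the explicit matrices are used here only to mirror the notation.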
Theorem 1. The MLE of the structural parameters $\xi$ in (5) is the same as the MLE of the system after a within transformation.
Proof. To prove this result, we factor the normal (log) likelihood into its within and between components:
$$L = W + B, \tag{8}$$
where
$$L = \text{constant} - (NT/2)\ln|\Sigma| + NT\ln|A| - \tfrac{1}{2}(y - R\xi - \alpha)'(\Sigma^{-1} \otimes I_{NT})(y - R\xi - \alpha), \tag{9a}$$
$$W = \text{constant} - (NT/2)\ln|\Sigma| + NT\ln|A| - \tfrac{1}{2}(y^w - R^w\xi)'(\Sigma^{-1} \otimes I_{NT})(y^w - R^w\xi), \tag{9b}$$
$$B = -\tfrac{1}{2}(y^b - R^b\xi - \alpha)'(\Sigma^{-1} \otimes I_{NT})(y^b - R^b\xi - \alpha), \tag{9c}$$
and where $A$ is the $G \times G$ matrix of coefficients of endogenous variables. We now make the simple observation that, at the point of maximization of the likelihood, $B = 0$. The between portion ($B$) serves only to determine the effects $\alpha$ in terms of the other parameters as
$$\alpha = y^b - R^b\xi. \tag{10}$$
Thus, for the parameters $\xi$ and $\Sigma$, the between portion of the likelihood is irrelevant and maximization of $L$ is equivalent to maximization of the within likelihood $W$.
We now consider a conditional likelihood approach. For this approach we require a sufficient statistic for the individual effects ($\alpha$).
Lemma 1. A sufficient statistic for $\alpha$ is $y^b$. (That is, a sufficient statistic for $\alpha_{ji}$ is $\bar{y}_{ji} = T^{-1}\sum_t y_{jit}$, with the summation over $t$ from 1 to $T$.)
Proof. See appendix 1.
The conditional likelihood function is the density of $y$ conditional on $y^b$, the sufficient statistic for $\alpha$. We have:
Lemma 2. The conditional likelihood function is the same as the within likelihood $W$ in (9b), except that $(T - 1)$ replaces $T$.
Proof. See appendix 1.
Theorem 2. The conditional MLE of $\xi$ is the same as the MLE. The conditional MLE of $\Sigma$ is $T/(T - 1)$ times the MLE.
Proof. This follows from Lemma 2. For the structural parameters $\xi$, the presence of $(T - 1)$ in place of $T$ in (9b) is irrelevant. For the covariance matrix $\Sigma$, the conditional MLE uses the divisor $(T - 1)$ while the MLE uses $T$, but the matrix of sums of squares and cross products being divided is the same.
These results have been generalized in Cornwell and Schmidt (1990) to a model in which exogenous variables in addition to the intercept may have coefficients that vary over individuals, provided that the set of such variables is the same in every structural equation. These results are relevant when one assumes a (different) fixed effect for each individual, and when one has panel data in which the number of individuals ($N$) is large relative to the number of time periods ($T$), so that the appropriate asymptotic theory is as $N \to \infty$ with $T$ fixed. As originally pointed out by Neyman and Scott (1948), and as emphasized by Andersen (1973), Chamberlain (1980), and others, the maximum likelihood estimator (MLE) is in general inconsistent in such a case; the difficulty is that the number of parameters increases with the sample size. This is the so-called 'incidental parameters problem'. A standard solution is to find sufficient statistics for the effects and to condition on these to obtain the conditional likelihood function. The conditional likelihood will not depend on the incidental parameters, and so the conditional MLE can be expected to be consistent.
In general, we may not be able to find fixed-dimensional sufficient statistics for the effects; for example, Chamberlain shows that this is possible for the logit model but not for the probit model. Even when such sufficient statistics exist, the conditional MLE may not be the same as the MLE. In particular, when the distribution of the sufficient statistics for the incidental parameters is informative about the parameters of interest, conditioning on the sufficient statistics may entail a loss of information, and the conditional MLE may be inefficient relative to the MLE. See, for example, Kalbfleisch and Sprott (1973) or Liang (1984). However, the results of this section show that this general pessimism is irrelevant for the linear simultaneous equations model, just as it is for the linear regression model. The fact that the MLE is also the conditional MLE implies that it is consistent. Furthermore, the fact that the conditional MLE is also the MLE should also imply that it is efficient. Some positive results on the efficiency of the conditional MLE are provided by Andersen (1973) and Pfanzagl (1982). Andersen's results are not useful in this paper because he considers the case in which the distribution of the sufficient statistics for the effects is independent of the parameters of interest, a condition that is not satisfied in the present context. However, Pfanzagl (1982, ch. 14) establishes the asymptotic efficiency of the conditional MLE among the class of estimators that are consistent in the presence of individual effects that are correlated with observed variables in an arbitrary way. Pfanzagl notes (p. 235) that his treatment accommodates the case of fixed effects. His treatment assumes some regularity conditions and restricts the class of estimates considered to ones that are sufficiently 'well-behaved', but in the present context these are not restrictive conditions.
Of course, the within transformation is straightforward enough that the statistical properties of standard estimation procedures (2SLS, 3SLS, etc.) applied to the within-transformed data could be established from first principles. The main implication of the conditional likelihood interpretation of the within transformation is to confirm the intuition that, in the absence of assumptions about the effects, we cannot do better than an efficient estimator (such as 3SLS or MLE) after a within transformation.
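As a minimal sketch of what "2SLS applied to the within-transformed data" means for a single equation, the NumPy example below demeans the dependent variable, the regressors, and the exogenous variables by individual and then runs ordinary 2SLS. The function names, the simulated design, and all coefficient values are illustrative assumptions, not part of the paper. The check at the end uses an exact property: changing the individual effects arbitrarily leaves the estimate unchanged, because the within transformation removes them.

```python
import numpy as np

def demean(M, N, T):
    """Within transformation: subtract each individual's time mean (t runs fastest)."""
    M3 = np.asarray(M, dtype=float).reshape(N, T, -1)
    return (M3 - M3.mean(axis=1, keepdims=True)).reshape(N * T, -1)

def within_2sls(y, R, X, N, T):
    """2SLS on within-transformed data, with the demeaned exogenous variables X as instruments."""
    yw, Rw, Xw = demean(y, N, T), demean(R, N, T), demean(X, N, T)
    R_hat = Xw @ np.linalg.lstsq(Xw, Rw, rcond=None)[0]   # first-stage fitted values
    return np.linalg.solve(R_hat.T @ Rw, R_hat.T @ yw).ravel()

# Synthetic illustration (all numbers are made up).
rng = np.random.default_rng(1)
N, T = 200, 4
alpha = np.repeat(rng.normal(size=N), T)                 # individual effect, constant over t
X = rng.normal(size=(N * T, 2)) + alpha[:, None]         # exogenous variables correlated with the effect
eps, v = rng.normal(size=(2, N * T))
Y2 = X @ np.array([0.5, 1.0]) + alpha + v + 0.8 * eps    # endogenous regressor: correlated with eps
R = np.column_stack([Y2, X[:, 0]])                       # the equation excludes the second exogenous variable
y = R @ np.array([0.7, -0.4]) + alpha + eps

xi_hat = within_2sls(y, R, X, N, T)
xi_shifted = within_2sls(y + 5.0 * alpha, R, X, N, T)    # same data with different effects
```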
3. Random effects correlated with exogenous variables
3.1. Introduction
We now turn to the case of random effects that are potentially correlated with some or all of the exogenous variables. Our treatment will be a generalization of the single equation analyses of Hausman and Taylor (1981), Amemiya and MaCurdy (1986), and Breusch, Mizon, and Schmidt (1989). In particular, we will present limited information (2SLS) and full information
(3SLS) estimators, using instrument sets suggested by these earlier papers. The correspondence of these estimators to the estimators for the cases of fixed effects or random effects uncorrelated with the exogenous variables will also be made explicit.
Our model is the same as in the last section, except for assumptions about the effects and the presence of time-invariant exogenous variables. Thus to eq. (1) we add the term $Z_j\gamma_j$, where $Z_j$ is the data matrix of time-invariant explanatory variables and $\gamma_j$ is the corresponding vector of coefficients. Thus the $j$th structural equation is
$$y_j = R_j\xi_j + u_j, \tag{11}$$
with $R_j = (Y_j, X_j, Z_j)$, $\xi_j' = (\delta_j', \beta_j', \gamma_j')$, and with $u_j = (\alpha_j + \varepsilon_j)$ to emphasize that the effects $\alpha_j$ are now treated as random. The entire system is still written as $y = R\xi + u$, as in (5) and with $R$ as given in (6), but with the understanding that the blocks $R_j$ now may contain $Z_j$. We make the following standard random effects assumptions:
Assumption 1. (i) The individual effects for person $i$, $(\alpha_{1i}, \ldots, \alpha_{Gi})'$, are iid $(0, \Sigma_\alpha)$. (ii) The random errors for person $i$ at time $t$, $(\varepsilon_{1it}, \ldots, \varepsilon_{Git})'$, are iid $(0, \Sigma_\varepsilon)$. (iii) All elements of $\alpha$ are uncorrelated with all elements of $\varepsilon$. (iv) $\Sigma_\varepsilon$ and $(\Sigma_\varepsilon + T\Sigma_\alpha)$ are nonsingular.
These assumptions imply that the $GNT \times GNT$ covariance matrix of $u = (\alpha + \varepsilon)$ has the form:
$$\Omega = \mathrm{cov}(u) = \Sigma_\varepsilon \otimes I_{NT} + T\Sigma_\alpha \otimes P_V, \tag{12}$$
where as before $P_V = I_N \otimes e_T e_T'/T$. It is another standard result that
$$\Omega = \Sigma_1 \otimes Q_V + \Sigma_2 \otimes P_V, \tag{13}$$
where $\Sigma_1 = \Sigma_\varepsilon$, $\Sigma_2 = \Sigma_\varepsilon + T\Sigma_\alpha$, and as before $Q_V = I_{NT} - P_V$. [In this section, as in the last section, we consider asymptotic properties of estimators when $N \to \infty$ with $T$ fixed, so that $(\Sigma_1, \Sigma_2)$ can be regarded as a reparameterization of $(\Sigma_\alpha, \Sigma_\varepsilon)$ even though $\Sigma_2$ depends on $T$.] Furthermore,
$$\Omega^{-1} = \Sigma_1^{-1} \otimes Q_V + \Sigma_2^{-1} \otimes P_V, \qquad \Omega^{-1/2} = \Sigma_1^{-1/2} \otimes Q_V + \Sigma_2^{-1/2} \otimes P_V. \tag{14}$$
Note also that $\Omega$ is the covariance matrix for the stacked error $u = (\alpha + \varepsilon)$. For a single equation, such as the $j$th equation in (11), the relevant covariance matrix is the $j$th $NT \times NT$ diagonal block of $\Omega$; namely,
$$\Omega_{jj} = \sigma^2_{1j}Q_V + \sigma^2_{2j}P_V, \tag{15}$$
where $\sigma^2_{1j} \equiv \Sigma_{1,jj}$ and $\sigma^2_{2j} \equiv \Sigma_{2,jj}$. Thus,
$$\Omega_{jj}^{-1} = \frac{1}{\sigma^2_{1j}}Q_V + \frac{1}{\sigma^2_{2j}}P_V, \qquad \Omega_{jj}^{-1/2} = \frac{1}{\sigma_{1j}}Q_V + \frac{1}{\sigma_{2j}}P_V. \tag{16}$$
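The error covariance structure under Assumption 1 is easy to verify numerically. The sketch below builds $\Omega$ both as $\Sigma_\varepsilon \otimes I_{NT} + T\Sigma_\alpha \otimes P_V$ and in its spectral form $\Sigma_1 \otimes Q_V + \Sigma_2 \otimes P_V$, and then forms the inverse and inverse square root block by block. The numerical values of $\Sigma_\varepsilon$ and $\Sigma_\alpha$ and the helper `mat_power` are illustrative assumptions.

```python
import numpy as np

N, T, G = 3, 4, 2
P_V = np.kron(np.eye(N), np.ones((T, T)) / T)
Q_V = np.eye(N * T) - P_V

# Illustrative (made-up) positive definite covariance matrices.
Sigma_eps = np.array([[1.0, 0.3], [0.3, 2.0]])
Sigma_alpha = np.array([[0.5, 0.1], [0.1, 0.4]])
Sigma_1 = Sigma_eps
Sigma_2 = Sigma_eps + T * Sigma_alpha

def mat_power(S, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(S)
    return (U * w**p) @ U.T

# Covariance of u = alpha + eps written two ways.
Omega_direct = np.kron(Sigma_eps, np.eye(N * T)) + np.kron(T * Sigma_alpha, P_V)
Omega_spectral = np.kron(Sigma_1, Q_V) + np.kron(Sigma_2, P_V)

# Because Q_V and P_V are orthogonal idempotents, powers act on Sigma_1 and Sigma_2 separately.
Omega_inv = np.kron(mat_power(Sigma_1, -1.0), Q_V) + np.kron(mat_power(Sigma_2, -1.0), P_V)
Omega_inv_half = np.kron(mat_power(Sigma_1, -0.5), Q_V) + np.kron(mat_power(Sigma_2, -0.5), P_V)
```

The practical point is that only the $G \times G$ matrices $\Sigma_1$ and $\Sigma_2$ ever need to be inverted, never the full $GNT \times GNT$ matrix.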
Up to this point our model is consistent with Baltagi's or Prucha's, except that we consider only individual effects whereas their models have both individual effects and time effects. We also follow Baltagi and Prucha (and the usual literature on simultaneous equations) in assuming that the exogenous variables are uncorrelated with statistical noise ($\varepsilon$). However, we will now distinguish between exogenous variables which are correlated with the effects and exogenous variables which are not correlated with the effects, whereas Baltagi and Prucha do not allow such possible correlation. We will follow the conventional usage in the simultaneous equations literature by calling a variable endogenous if it is correlated with the statistical noise in the model ($\varepsilon$, above) and exogenous if it is uncorrelated with the statistical noise in the model. Furthermore, we will assume that endogenous variables are also correlated with the individual effects in the model ($\alpha$, above). However, we will establish an unconventional usage by calling an exogenous variable singly exogenous if it is correlated with the effects and doubly exogenous if it is uncorrelated with the effects. We therefore distinguish three types of variables. Endogenous variables are correlated with the statistical noise and the individual effects. Doubly exogenous variables are uncorrelated with the statistical noise and the individual effects. Singly exogenous variables are uncorrelated with the statistical noise but correlated with the individual effects. We motivate this distinction with an argument given earlier by Breusch, Mizon, and Schmidt (1989). Our structural model has statistical noise and an individual effect in every structural equation.
By standard algebra this model implies a reduced form, and each reduced form equation has an individual effect which is a linear combination of the individual effects in the structural equations as well as a time-varying error which is a linear combination of the time-varying errors in the structural equations. Therefore the solution for every endogenous variable will in general involve every structural error and also the individual effect from every structural equation. It is a standard argument that the nature of this solution implies that every endogenous variable should be correlated with every structural error, and it follows by exactly the same argument that every endogenous variable should be correlated with the individual effect from every structural equation. The importance of this argument is that we have only three types of variables to consider, not four; endogenous variables must be correlated with both statistical noise and individual effects.
However, there is nothing in the above argument to indicate whether or not exogenous variables should be correlated with the individual effects. We therefore are free to distinguish singly exogenous and doubly exogenous variables. In doing so we are simply following the same path as was followed earlier in the single equation literature by Mundlak, Hausman and Taylor, and Amemiya and MaCurdy. All of these papers considered variables that are uncorrelated with the statistical noise but either correlated or uncorrelated with the individual effects. We will distinguish between singly exogenous and doubly exogenous variables as follows. Let $X$ and $Z$ be the data matrices of all time-varying and time-invariant exogenous variables in the system. We partition these:
$$X = [X_{(1)}, X_{(2)}], \qquad Z = [Z_{(1)}, Z_{(2)}], \tag{17}$$
where $X_{(1)}$ and $Z_{(1)}$ are doubly exogenous (uncorrelated with the effects) while $X_{(2)}$ and $Z_{(2)}$ are singly exogenous (correlated with the effects). Incidentally, to avoid confusion, note the difference between $X_{(1)}$ and $X_1$: $X_{(1)}$ is the set of all doubly exogenous variables appearing in the system, while $X_1$ is the set of all (singly or doubly) exogenous variables that appear in the first structural equation. Obviously the same distinction applies between $X_{(2)}$ and $X_2$, $Z_{(1)}$ and $Z_1$, and $Z_{(2)}$ and $Z_2$. As a matter of notation, we recall that the number of equations in the simultaneous system is $G$, and we let $K$ and $L$ represent the column dimensions of $X$ and $Z$, respectively. Similarly, the column dimensions of $Y_j$, $X_j$, and $Z_j$ will be $G_j$, $K_j$, and $L_j$, for $j = 1, \ldots, G$. On the other hand, $K_{(1)}$ and $L_{(1)}$ will represent the column dimensions of $X_{(1)}$ and $Z_{(1)}$.
3.2. Two-stage least squares
In this section we provide 2SLS estimates of a particular structural equation. We may as well consider the first equation, which we write here as
$$y_1 = R_1\xi_1 + u_1, \tag{18}$$
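adopting the notation of (11) above. It is convenient to transform the equation by $\Omega_{11}^{-1/2}$, where $\Omega_{11} = \mathrm{cov}(u_1)$; the form of $\Omega_{11}^{-1/2}$ is given in (16) above. Thus we have
$$\Omega_{11}^{-1/2}y_1 = \Omega_{11}^{-1/2}R_1\xi_1 + \Omega_{11}^{-1/2}u_1. \tag{19}$$
All of our 2SLS estimators are IV of (19), using instruments of the form
$$A = [Q_V X, P_V B]. \tag{20}$$
Thus, once the equation is transformed as in (19), the 2SLS estimator can be calculated using standard 2SLS software. We will consider three forms of the matrix $B$, below. For now we simply note that our 2SLS estimators will be of the algebraic form
$$\hat\xi_1 = (R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}R_1)^{-1}R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}y_1. \tag{21}$$
Since $P_V$ and $Q_V$ are orthogonal, $P_A = P_{Q_V X} + P_{P_V B}$. Furthermore, since
$$\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2} = \frac{1}{\sigma^2_{11}}P_{Q_V X} + \frac{1}{\sigma^2_{21}}P_{P_V B}, \tag{22}$$
(21) can be rewritten as follows:
$$\hat\xi_1 = \left[R_1'\left(\frac{1}{\sigma^2_{11}}P_{Q_V X} + \frac{1}{\sigma^2_{21}}P_{P_V B}\right)R_1\right]^{-1}R_1'\left(\frac{1}{\sigma^2_{11}}P_{Q_V X} + \frac{1}{\sigma^2_{21}}P_{P_V B}\right)y_1. \tag{23}$$
Our 2SLS estimator can be derived as a GMM estimator.
Theorem 3. Given the instrument set $A = (Q_V X, P_V B)$, the 2SLS estimator (21) can be derived as a GMM estimator, either from the orthogonality condition
$$\mathrm{plim}\,\frac{1}{NT}A'(y_1 - R_1\xi_1) = 0 \tag{24}$$
or from the orthogonality condition
$$\mathrm{plim}\,\frac{1}{NT}A'\Omega_{11}^{-1/2}(y_1 - R_1\xi_1) = 0. \tag{25}$$
Proof. See appendix 2.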
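The 2SLS estimator of this section can be sketched numerically in two equivalent ways: as IV on the equation transformed by $\Omega_{11}^{-1/2}$ with instruments $A = [Q_V X, P_V B]$, and as a weighted combination of the projections onto $Q_V X$ and $P_V B$ with weights $1/\sigma^2_{11}$ and $1/\sigma^2_{21}$. The NumPy sketch below computes both and lets one confirm that they coincide. The function names, the data, and the treatment of $\sigma_{11}$ and $\sigma_{21}$ as known are assumptions made for illustration; the data are arbitrary, since only the algebra is being checked.

```python
import numpy as np

def proj(M):
    """Orthogonal projection onto the column space of M (pinv tolerates rank deficiency)."""
    return M @ np.linalg.pinv(M)

def panel_2sls(y1, R1, X, B, N, T, s1, s2):
    """Two algebraic forms of the 2SLS estimator with instruments A = [Q_V X, P_V B].

    s1 and s2 stand for sigma_11 and sigma_21 and are treated as known here.
    """
    P_V = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q_V = np.eye(N * T) - P_V
    Om_inv_half = Q_V / s1 + P_V / s2                 # inverse square root of cov(u_1)
    A = np.hstack([Q_V @ X, P_V @ B])
    M = Om_inv_half @ proj(A) @ Om_inv_half           # IV on the transformed equation
    iv_form = np.linalg.solve(R1.T @ M @ R1, R1.T @ M @ y1)
    M2 = proj(Q_V @ X) / s1**2 + proj(P_V @ B) / s2**2  # weighted within and between projections
    weighted_form = np.linalg.solve(R1.T @ M2 @ R1, R1.T @ M2 @ y1)
    return iv_form, weighted_form

# Arbitrary synthetic data, used only to check the algebra.
rng = np.random.default_rng(2)
N, T = 30, 3
alpha = np.repeat(rng.normal(size=N), T)
X = rng.normal(size=(N * T, 3))                        # time-varying exogenous variables
Z = np.repeat(rng.normal(size=(N, 1)), T, axis=0)      # time-invariant exogenous variable
Y1 = X @ np.array([0.5, -0.3, 1.0]) + alpha + rng.normal(size=N * T)
R1 = np.column_stack([Y1, X[:, :2], Z])
y1 = R1 @ np.array([0.6, 1.0, -1.0, 0.5]) + alpha + rng.normal(size=N * T)
B = np.column_stack([X[:, :1], Z])                     # first column of X taken as doubly exogenous

iv_form, weighted_form = panel_2sls(y1, R1, X, B, N, T, s1=1.0, s2=2.0)
```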
We now discuss the choice of instruments. Obviously we want instruments that satisfy the moment condition (24), which will also be part of the list of assumptions given below when we do asymptotics formally. For $Q_V X$, legitimacy follows immediately from exogeneity of $X$ with respect to $\varepsilon$ and the fact that $X'Q_V\alpha = 0$. However, the choice of $B$ depends on the meaning that one is willing to attach to the statement that '$X_{(1)}$ and $Z_{(1)}$ are uncorrelated with the effects' as well as on what structure (if any) is attached to the correlation of $X_{(2)}$ and $Z_{(2)}$ with the effects. We therefore discuss alternative possible choices of $B$, following the existing practice in the single equation literature.
Our first choice of $B$ is analogous to the instrument set of Hausman and Taylor (1981). Their model had no endogenous variables; and because it was a single equation model every exogenous variable in the system appeared in the equation ($X_1 = X$ and $Z_1 = Z$, in our notation). They proposed the instrument set $[Q_V, X_{(1)}, Z_{(1)}]$, which does not fit the form of $A$ in (20) and which indeed would not lead to a consistent estimator in the presence of endogenous regressors. However, Breusch, Mizon, and Schmidt (1989) show that the Hausman-Taylor estimator is equivalently obtained with the instrument set $A_{HT} = [Q_V X, P_V X_{(1)}, Z_{(1)}]$. (This is not the same instrument set as $[Q_V, X_{(1)}, Z_{(1)}]$, but it yields the same estimator for their model.) Obviously
$$A_{HT} = [Q_V X, P_V B_{HT}], \quad \text{where } B_{HT} = [X_{(1)}, Z_{(1)}], \tag{26}$$
and so it is of the form of our instrument set (20) above. We will call our 2SLS estimator using this instrument set HT-2SLS. The HT-2SLS estimator is apparently new, but two special cases of it are familiar. First, if there are no doubly exogenous variables, the instrument set is $(Q_V X)$ and the estimator is just 2SLS after a within transformation (of the system). Clearly this corresponds to a fixed effects treatment. Second, if there are no singly exogenous variables (all exogenous variables are uncorrelated with the effects), we have the usual random effects treatment. In this case $A_{HT} = [Q_V X, P_V X, Z]$ and HT-2SLS is the same as Baltagi's EC2SLS estimator [Baltagi (1981, p. 193, eq. (13))] specialized to the case of individual effects only.
Our second choice of $B$ is motivated by the work of Amemiya and MaCurdy (1986). None of their models is exactly the same as ours. For example, their model 2 has time-varying and time-invariant endogenous variables, but all exogenous variables are doubly exogenous. Their model 3, which is the closest to our model, has time-invariant endogenous variables, time-varying singly exogenous variables, and time-varying doubly exogenous variables. Translating into our notation and allowing for the fact that some of our exogenous variables are time-invariant, their instrument set is
$$A_{AM} = [Q_V X, X^*_{(1)}, Z_{(1)}] = [Q_V X, P_V B_{AM}], \tag{27}$$
where
$$B_{AM} = [X^*_{(1)}, Z_{(1)}]. \tag{28}$$
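As explained by Breusch, Mizon, and Schmidt (1989) the matrix $X^*_{(1)}$ displays each variable separately for $t = 1, 2, \ldots, T$. That is, for an $NT \times L$ panel data matrix $S$ with rows $s_{it}'$, the $NT \times LT$ matrix $S^*$ is defined by
$$S = \begin{bmatrix} s_{11}' \\ \vdots \\ s_{1T}' \\ \vdots \\ s_{N1}' \\ \vdots \\ s_{NT}' \end{bmatrix}, \qquad S^* = \begin{bmatrix} s_{11}' & s_{12}' & \cdots & s_{1T}' \\ \vdots & \vdots & & \vdots \\ s_{11}' & s_{12}' & \cdots & s_{1T}' \\ \vdots & \vdots & & \vdots \\ s_{N1}' & s_{N2}' & \cdots & s_{NT}' \\ \vdots & \vdots & & \vdots \\ s_{N1}' & s_{N2}' & \cdots & s_{NT}' \end{bmatrix}. \tag{29}$$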
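The starred matrix can be built with a reshape followed by a row repeat. This is a minimal NumPy sketch, assuming rows are ordered with $t$ running fastest within each individual; the function name `star` and the toy matrix are illustrative.

```python
import numpy as np

def star(S, N, T):
    """Return S* (NT x LT): every row for individual i holds (s_i1', ..., s_iT'),
    so each period's values become separate, time-invariant columns."""
    L = S.shape[1]
    wide = S.reshape(N, T * L)            # row i = (s_i1', ..., s_iT')
    return np.repeat(wide, T, axis=0)     # copy that row to each of individual i's T periods

# Toy example with N = 2 individuals, T = 2 periods, L = 1 variable.
S = np.array([[1.0], [2.0], [3.0], [4.0]])
S_star = star(S, N=2, T=2)
```

Here `S_star` has rows `[1, 2]`, `[1, 2]`, `[3, 4]`, `[3, 4]`: the two periods of each individual's variable appear as two columns that do not vary over time.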
Note that the instrument set (27) is of the form (20) with $B_{AM}$ as given by (28). We will call our 2SLS estimator using this instrument set AM-2SLS. Clearly the AM-2SLS instrument set differs from the HT-2SLS instrument set only in its treatment of the time-varying doubly exogenous variables: $A_{HT}$ includes the means of such a variable as a single instrument, whereas $A_{AM}$ includes each of the $T$ time-period values of such a variable as a separate instrument.
Our third estimator follows directly from Breusch, Mizon, and Schmidt (1989). We define the BMS-2SLS estimator as our 2SLS estimator above, using the instrument set
$$A_{BMS} = [Q_V X, X^*_{(1)}, Z_{(1)}, (Q_V X_{(2)})^*] = [Q_V X, P_V B_{BMS}], \tag{30}$$
where
$$B_{BMS} = [X^*_{(1)}, Z_{(1)}, (Q_V X_{(2)})^*]. \tag{31}$$
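The three instrument sets are nested, which is what drives the efficiency ranking of the estimators. The sketch below builds $A_{HT}$, $A_{AM}$, and $A_{BMS}$ for one doubly exogenous and one singly exogenous time-varying variable plus one doubly exogenous time-invariant variable, and checks the nesting of their column spaces by least squares. All names and the random data are illustrative assumptions; no claim is made about the legitimacy of the instruments for these data, only about the spans.

```python
import numpy as np

def star(S, N, T):
    """S* as in eq. (29): each period's values become separate time-invariant columns."""
    L = S.shape[1]
    return np.repeat(S.reshape(N, T * L), T, axis=0)

def instrument_sets(X1, X2, Z1, N, T):
    """Build A = [Q_V X, P_V B] for the HT, AM, and BMS choices of B."""
    P_V = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q_V = np.eye(N * T) - P_V
    QX = Q_V @ np.hstack([X1, X2])
    B = {
        "HT": np.hstack([X1, Z1]),
        "AM": np.hstack([star(X1, N, T), Z1]),
        "BMS": np.hstack([star(X1, N, T), Z1, star(Q_V @ X2, N, T)]),
    }
    return {name: np.hstack([QX, P_V @ b]) for name, b in B.items()}

def spanned_by(A_small, A_big):
    """True if every column of A_small lies in the column space of A_big."""
    coef = np.linalg.lstsq(A_big, A_small, rcond=None)[0]
    return bool(np.allclose(A_big @ coef, A_small))

rng = np.random.default_rng(3)
N, T = 20, 3
X1 = rng.normal(size=(N * T, 1))                       # doubly exogenous, time-varying
X2 = rng.normal(size=(N * T, 1))                       # singly exogenous, time-varying
Z1 = np.repeat(rng.normal(size=(N, 1)), T, axis=0)     # doubly exogenous, time-invariant
A = instrument_sets(X1, X2, Z1, N, T)
```

Note that $(Q_V X_{(2)})^*$ has only $T - 1$ linearly independent columns per variable, because deviations from individual means sum to zero over $t$.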
Once again this is of the form $A = (Q_V X, P_V B)$ as in (20). Our present model differs from the model of Breusch, Mizon, and Schmidt (1989) only in that our model includes endogenous variables; but these do not contribute to the instrument set, so $A_{BMS}$ is exactly as given in their paper.
The exogeneity conditions necessary for the consistency of these estimators are discussed in some detail in Breusch, Mizon, and Schmidt (1989, pp. 698-700). For all three estimators, the exogenous variables ($X$ and $Z$) must be uncorrelated with the statistical noise ($\varepsilon$). For HT-2SLS, the individual means of the doubly exogenous variables must be uncorrelated with the individual effects. For AM-2SLS, the values of the doubly exogenous variables for each $t$ considered separately must be uncorrelated with the effects. For BMS-2SLS, we have the additional condition that the correlation between the time-varying singly exogenous variables and the effects must be due solely to a time-invariant component in the singly exogenous variables, so that their deviations from means are separately legitimate instruments. This is equivalent to the condition that $\mathrm{cov}[x_{(2)it}, \alpha_i]$ must be the same for all $t$. If these exogeneity conditions are met and if certain other regularity conditions are satisfied, our 2SLS estimators will be consistent and asymptotically normal. Furthermore, BMS-2SLS will be efficient relative to AM-2SLS, which in turn will be efficient relative to HT-2SLS. This efficiency ranking is a reflection of the standard result that using extra legitimate instruments can never lead to a decrease in the asymptotic efficiency of an IV estimator.
In order to provide a rigorous treatment of the asymptotic properties of our estimators, we make the following additional assumptions:
Assumption 2. (i) $C = \mathrm{plim}\,[Q_V X, (Q_V X)^*, P_V X, Z]'[Q_V X, (Q_V X)^*, P_V X, Z]/NT$ exists. (ii) $C_A = \mathrm{plim}\,A'A/NT$ is nonsingular. (iii) $A'\Omega_{11}^{-1/2}u_1/\sqrt{NT}$ converges in distribution to $N(0, C_A)$; or, equivalently, $A'u_1/\sqrt{NT}$ converges in distribution to $N[0, \mathrm{plim}\,A'\Omega_{11}A/NT]$. (iv) $\mathrm{plim}\,A'R_1/NT$ has full column rank.
Condition (i) is the usual stationarity condition for the exogenous variables. It is stronger than needed for HT-2SLS or AM-2SLS; for example, for HT-2SLS we would need only that $\mathrm{plim}\,(Q_V X, P_V X, Z)'(Q_V X, P_V X, Z)/NT$ exists. Condition (ii) is also standard; the probability limit exists, by (i), but must also be nonsingular. Assumption (iii) says that the instruments are legitimate ($\mathrm{plim}\,A'u_1/NT = 0$) and regular enough for $A'u_1$ to satisfy a central limit theorem. Finally, assumption (iv) is essentially an identification condition. It obviously requires that there be at least as many instruments as there are explanatory variables in the equation. We show in appendix 3 that it requires that the usual rank condition for identification by exclusion restrictions holds, an expected result; and also that it requires that $\mathrm{plim}\,B'Z_1/NT$ be of full column rank, for which the corresponding order condition is that there be at least as many time-invariant instruments as there are time-invariant regressors.
Theorem 4. Under Assumptions 1 and 2, the 2SLS estimator is consistent, and
$$\sqrt{NT}(\hat\xi_1 - \xi_1) \to N[0, (\mathrm{plim}\,R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}R_1/NT)^{-1}]. \tag{32}$$
Proof. See appendix 3.
As noted above, given Theorem 4, the efficiency comparisons of HT-2SLS, AM-2SLS, and BMS-2SLS follow immediately. The larger the instrument set $A$, the smaller the asymptotic covariance matrix of the estimator, as usual.
As it stands, the 2SLS estimator (21) depends on the unknown parameters $\sigma^2_{11}$ and $\sigma^2_{21}$. In practice we will need to use a feasible version of (21) that replaces these parameters by consistent estimators, say $\hat\sigma^2_{11}$ and $\hat\sigma^2_{21}$. Let $\hat u_1$ be the residuals from IV estimation of the untransformed equation (18), using instruments $A$. Then
$$\hat\sigma^2_{11} = \hat u_1'Q_V\hat u_1/N(T - 1), \qquad \hat\sigma^2_{21} = \hat u_1'P_V\hat u_1/N \tag{33}$$
are consistent estimators of $\sigma^2_{11}$ and $\sigma^2_{21}$, by essentially the same algebra that applies in the single equation case.
3.3. Three-stage least squares
We now turn to 3SLS estimation of the system (5). We again consider instrument sets of the form $A = (Q_V X, P_V B)$, as given above. Our 3SLS estimators will be of the form
$$\hat\xi = [R'(\Sigma_1^{-1} \otimes P_{Q_V X} + \Sigma_2^{-1} \otimes P_{P_V B})R]^{-1}R'(\Sigma_1^{-1} \otimes P_{Q_V X} + \Sigma_2^{-1} \otimes P_{P_V B})y. \tag{34}$$
We will refer to the estimators HT-3SLS, AM-3SLS, and BMS-3SLS, corresponding to the cases that $B = B_{HT}$, $B = B_{AM}$, or $B = B_{BMS}$, as in the previous section. The estimator (34) is new, but (as with our 2SLS estimator) some special cases of it are familiar. First, if there are no doubly exogenous variables, $B$ is empty and $\hat\xi$ is just the usual 3SLS after a within transformation of the system. Second, if there are no singly exogenous variables, we have the usual random effects treatment. Here $B = [X, Z]$ and HT-3SLS is the same as Baltagi's EC3SLS estimator [Baltagi (1981, p. 196, eq. (24))], specialized to the case of no time effects.
The 3SLS estimator $\hat\xi$ can be viewed as an obvious generalization of the 2SLS estimator $\hat\xi_1$. Alternatively, it can be derived as an orthogonality condition estimator, as in the following result.
Theorem 5. Given the instrument set $A = (Q_V X, P_V B)$, the 3SLS estimator (34) can be derived as a GMM estimator, either from the orthogonality condition
$$\mathrm{plim}\,\frac{1}{NT}(I_G \otimes A)'(y - R\xi) = \mathrm{plim}\,\frac{1}{NT}\begin{bmatrix} A'(y_1 - R_1\xi_1) \\ A'(y_2 - R_2\xi_2) \\ \vdots \\ A'(y_G - R_G\xi_G) \end{bmatrix} = 0 \tag{35}$$
or from the orthogonality condition
$$\mathrm{plim}\,\frac{1}{NT}(I_G \otimes A)'\Omega^{-1/2}(y - R\xi) = 0. \tag{36}$$
Proof. See appendix 4.
To derive the asymptotic properties of the 3SLS estimator, we need to strengthen Assumption 2 slightly.
Assumption 3.
$$(I_G \otimes A)'u/\sqrt{NT} \to N[0, \mathrm{plim}\,(I_G \otimes A)'\Omega(I_G \otimes A)/NT], \tag{37}$$
or equivalently,
$$(I_G \otimes A)'\Omega^{-1/2}u/\sqrt{NT} \to N[0, I_G \otimes \mathrm{plim}\,A'A/NT]. \tag{38}$$
Theorem 6. Under Assumptions 1 and 2 (for each equation) and Assumption 3, the 3SLS estimator (34) is consistent and
$$\sqrt{NT}(\hat\xi - \xi) \to N\left(0, \left\{\mathrm{plim}\,R'[\Sigma_1^{-1} \otimes P_{Q_V X} + \Sigma_2^{-1} \otimes P_{P_V B}]R/NT\right\}^{-1}\right). \tag{39}$$
Proof. See appendix 5.
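A minimal NumPy sketch of a 3SLS estimator of this type follows, using the weighting matrix $\Sigma_1^{-1} \otimes P_{Q_V X} + \Sigma_2^{-1} \otimes P_{P_V B}$ that appears in the asymptotic covariance matrix of Theorem 6. The function names, the two-equation synthetic system, and the treatment of $\Sigma_1$ and $\Sigma_2$ as known are assumptions for illustration. The example uses diagonal $\Sigma_1$ and $\Sigma_2$, in which case the weighting matrix is block diagonal and the system estimate coincides exactly with equation-by-equation estimation; the data are arbitrary and serve only to check that algebra.

```python
import numpy as np

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.pinv(M)

def block_diag(blocks):
    """Stack matrices along the diagonal, as in the definition of R."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def panel_3sls(y_list, R_list, X, B, N, T, Sigma_1, Sigma_2):
    """System estimator with instruments A = [Q_V X, P_V B]; Sigma_1, Sigma_2 treated as known."""
    P_V = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q_V = np.eye(N * T) - P_V
    W = (np.kron(np.linalg.inv(Sigma_1), proj(Q_V @ X))
         + np.kron(np.linalg.inv(Sigma_2), proj(P_V @ B)))
    R, y = block_diag(R_list), np.concatenate(y_list)
    return np.linalg.solve(R.T @ W @ R, R.T @ W @ y)

# Arbitrary synthetic two-equation system.
rng = np.random.default_rng(4)
N, T = 25, 3
X = rng.normal(size=(N * T, 3))
Z = np.repeat(rng.normal(size=(N, 1)), T, axis=0)
B = np.hstack([X[:, :1], Z])
alpha = np.repeat(rng.normal(size=(N, 2)), T, axis=0)
Y = X @ rng.normal(size=(3, 2)) + Z + alpha + rng.normal(size=(N * T, 2))
R_list = [np.column_stack([Y[:, 1], X[:, 0], Z]), np.column_stack([Y[:, 0], X[:, 1], Z])]
y_list = [Y[:, 0], Y[:, 1]]

S1, S2 = np.diag([1.0, 2.0]), np.diag([3.0, 5.0])       # diagonal case
joint = panel_3sls(y_list, R_list, X, B, N, T, S1, S2)
single = np.concatenate([
    panel_3sls([y_list[j]], [R_list[j]], X, B, N, T, S1[j:j + 1, j:j + 1], S2[j:j + 1, j:j + 1])
    for j in range(2)
])
```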
The 3SLS estimator is at least as efficient asymptotically as the 2SLS estimator, and it is generally more efficient. This can be proved directly, using
tedious algebra, but it is also easy to set as a standard result in the GMM framework. Specifically, because the 3SLS estimator is derived from the joirlt consideration of the orthogonality conditions that separately yield 2SLS. it follows from standard results on GMM estimation that 3SLS is cthcicnt relative to 2SLS. This comparison is treated in more detail in appendix 4. Of course, 3SLS and 2SLS are equally asymptotically efficient when Z‘,, and 5, arc diagonal. However, unlike the usual case for 2SLS and 3SLS, our 2SLS and 3SLS estimates are generally not identical under conditions of exact identification. This fact was noted by Baltagi (19X 1. pp. lO7- 19X) for the cast of no singly exogenous variables and it also holds (for the same reason) in oui case. However, an exception is the special case in which there arc no doubly exogenous variables. As noted above, this is the ‘fixed effects’ cast in which our estimators reduce to the usual 2SLS and 3SLS. but with data in deviations from means. The usual relationships between 2SLS and 3SLS therefore must hold in this case. As was the case for 2SLS. in practice WC will need a feasible version of the 3SLS estimator that_replaces the unknown parameters 2, and X2 by consistent estimates. say 2, and Z2. To derive one such set of consistent cstimators, we follow essentially the same path as in the previous section. We first estimate each (untransformed) structural equation by IV, using instruments A. Let the residuals from this estimation be t,, j = I, 2,. , G. The consistent estimates of the elements of 2’, and 2, arc
$$\hat\Sigma_{1,ij} = \hat u_i' Q_V \hat u_j / [N(T-1)], \qquad \hat\Sigma_{2,ij} = \hat u_i' P_V \hat u_j / N. \tag{40}$$

It is easy to show that the use of consistent estimates $\hat\Sigma_1$ and $\hat\Sigma_2$ in place of the unknown $\Sigma_1$ and $\Sigma_2$ does not affect the asymptotic properties of the 3SLS estimator.
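The variance-component estimates in (40) can be sketched numerically. The following is a minimal illustration, assuming the IV residuals are stacked in an NT x G array sorted by individual and then by time; the function names and the simulated residuals are hypothetical and not taken from the paper.

```python
import numpy as np

def within_between(N, T):
    """Return (Q_V, P_V) for data sorted by individual, then by time."""
    P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))  # replaces values by individual means
    Q_V = np.eye(N * T) - P_V                           # deviations from individual means
    return Q_V, P_V

def estimate_sigmas(U, N, T):
    """U is NT x G; column j holds the IV residuals of equation j."""
    Q_V, P_V = within_between(N, T)
    sigma1_hat = U.T @ Q_V @ U / (N * (T - 1))  # within-based estimate
    sigma2_hat = U.T @ P_V @ U / N              # between-based estimate
    return sigma1_hat, sigma2_hat

# Hypothetical residuals: an individual effect plus noise, for G = 2 equations.
rng = np.random.default_rng(0)
N, T, G = 50, 4, 2
effects = np.repeat(rng.normal(size=(N, G)), T, axis=0)  # constant within individual
U = effects + rng.normal(size=(N * T, G))
sigma1_hat, sigma2_hat = estimate_sigmas(U, N, T)
```

Because $Q_V + P_V = I_{NT}$, the two estimates together account for all of the residual cross products, and purely time-invariant residuals contribute nothing to the within-based estimate.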
3.4. Different instruments in different equations

In the previous two sections we considered 2SLS and 3SLS when the same instruments are appropriate in every equation. This is certainly the case most commonly considered in the literature, but there are some exceptions. For example, Amemiya (1977) and Hausman, Newey, and Taylor (1987) consider cases in which different instruments apply to different equations. It is relatively straightforward to extend their treatment to the panel data model. In the present context, our instrument set depends on the classification of exogenous variables into singly and doubly exogenous variables. In the previous sections of the paper this classification was done only once, so that it was implicitly assumed that the classification of exogenous variables as singly or doubly exogenous is the same for every structural equation. In other
words, it was assumed that an exogenous variable was either correlated with the individual effect in every equation or that it was uncorrelated with the individual effect in every equation. This may be a reasonable assumption for many models, but there does not appear to be any basis for asserting that it must be true in every possible application. The case in which the available instrument set varies across equations therefore seems worth considering, at least briefly. As a specific example, consider the model of Schmidt (1988), which is a (nonlinear) system consisting of a Cobb-Douglas production function and the first-order conditions for cost minimization, with individual effects in every equation. The exogenous variables are the output level and input prices. The individual effects in the production function and in the first-order conditions may have very different interpretations, such as soil quality versus ability to choose the correct tractor and quantity of fertilizer in an agricultural setting. Different variables may be correlated with these rather different things; for example, it may be more reasonable to assume that the price of fertilizer is uncorrelated with soil quality than that it is uncorrelated with the farmer’s percentage of systematic under- or overutilization of fertilizer. If so, the instrument set varies across equations. To proceed, suppose that for equation j (j = 1,2,. . . , G) the appropriate instrument set is
$$A_j = [Q_V X,\ P_V B_j], \tag{41}$$
where $B_j$ varies over $j$. Here $B_j$ may be of the HT, AM, or BMS forms, but the algebra is the same for each. The 2SLS estimator of a single equation is of the same form as above; for example, 2SLS of the first equation depends only on the orthogonality condition
$$\operatorname{plim}\,(NT)^{-1} A_1'(y_1 - R_1\xi_1) = 0 \tag{42}$$
and yields an estimator of the form of $\hat\xi_1$ in (20), except that $A_1$ replaces $A$ in that equation. However, 3SLS now takes a rather different form than $\hat\xi$ in (34). Following Amemiya (1977), Hausman, Newey, and Taylor (1987), and Schmidt (1990), consider the orthogonality condition
$$\operatorname{plim}\,(NT)^{-1}\begin{bmatrix} A_1'(y_1 - R_1\xi_1) \\ A_2'(y_2 - R_2\xi_2) \\ \vdots \\ A_G'(y_G - R_G\xi_G) \end{bmatrix} = \operatorname{plim} A^{*\prime}(y - R\xi)/NT = 0, \tag{43}$$
where $A^*$ is a block-diagonal matrix with $j$th block $A_j$. This yields the 3SLS estimator
$$\hat\xi = [R'A^*(A^{*\prime}\Omega A^*)^{-1}A^{*\prime}R]^{-1} R'A^*(A^{*\prime}\Omega A^*)^{-1}A^{*\prime}y. \tag{44}$$
If Assumptions 1 and 2 hold for every equation, and if we replace Assumption 3 by the similar assumption that $A^{*\prime}\varepsilon/\sqrt{NT} \to N[0, \operatorname{plim} A^{*\prime}\Omega A^*/NT]$, where $A^*$ denotes the block-diagonal matrix with $j$th block $A_j$, then $\hat\xi$ is consistent and
$$\sqrt{NT}(\hat\xi - \xi) \to N\bigl(0,\ \operatorname{plim}\{R'A^*(A^{*\prime}\Omega A^*)^{-1}A^{*\prime}R/NT\}^{-1}\bigr), \tag{45}$$
which is proved along exactly the same lines as in appendix 4. Furthermore, the 3SLS estimator is efficient relative to the 2SLS estimator because it uses the relevant orthogonality conditions jointly (and efficiently) rather than separately; the argument is again essentially the same as in appendix 4.

When the classification of exogenous variables is the same for every equation, the instrument set $A_j$ is the same for every equation, and so $A^* = I_G \otimes A$ as in section 3.3. The estimator (44) then simplifies to our previous 3SLS estimator, because
$$(I_G \otimes A)[(I_G \otimes A)'\Omega(I_G \otimes A)]^{-1}(I_G \otimes A)' = \Sigma_1^{-1} \otimes P_{(Q_V X)} + \Sigma_2^{-1} \otimes P_{(P_V B)}, \tag{46}$$
as was shown previously. But in the general case of this section there is not too much scope for simplifying (44). For example, the $i,j$ block of $A^{*\prime}\Omega A^*$ equals
$$\Sigma_{1,ij}A_i'Q_V A_j + \Sigma_{2,ij}A_i'P_V A_j = \begin{bmatrix} \Sigma_{1,ij}X'Q_V X & 0 \\ 0 & \Sigma_{2,ij}B_i'P_V B_j \end{bmatrix}. \tag{47}$$
We could reorder the rows and columns of $A^{*\prime}\Omega A^*$ so that it is block-diagonal, with one block equal to $\Sigma_1 \otimes X'Q_V X$, for which the analytical inverse $\Sigma_1^{-1} \otimes (X'Q_V X)^{-1}$ is available. However, the other block would have entries of the form $\Sigma_{2,ij}B_i'P_V B_j$ ($i, j = 1, \ldots, G$) and still could not be inverted analytically.
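The estimator (44) can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: the data are simulated noise, the helper names (`block_diag`, `proj`, `three_sls`) are hypothetical, and the instrument sets are taken to be $A_j = [Q_V X, P_V B_j]$. The last lines check that, when every equation uses the same instruments, (44) coincides with the estimator built from the weighting matrix $\Sigma_1^{-1} \otimes P_{(Q_V X)} + \Sigma_2^{-1} \otimes P_{(P_V B)}$ that appears in Theorem 6.

```python
import numpy as np

def block_diag(blocks):
    """Stack matrices along the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

def three_sls(y, R_blocks, A_blocks, Omega):
    """Estimator (44) with equation-specific instrument sets A_j."""
    R, A = block_diag(R_blocks), block_diag(A_blocks)
    W = np.linalg.inv(A.T @ Omega @ A)
    RAW = R.T @ A @ W
    return np.linalg.solve(RAW @ A.T @ R, RAW @ A.T @ y)

rng = np.random.default_rng(0)
N, T, G = 20, 3, 2
P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Q_V = np.eye(N * T) - P_V
X = rng.normal(size=(N * T, 3))
B1, B2 = rng.normal(size=(N * T, 2)), rng.normal(size=(N * T, 3))
Sigma1 = np.array([[1.0, 0.3], [0.3, 1.5]])
Sigma2 = np.array([[2.0, 0.5], [0.5, 3.0]])
Omega = np.kron(Sigma1, Q_V) + np.kron(Sigma2, P_V)
R_blocks = [rng.normal(size=(N * T, 2)) for _ in range(G)]
y = rng.normal(size=G * N * T)

# General case: different instrument sets in the two equations.
A1, A2 = np.hstack([Q_V @ X, P_V @ B1]), np.hstack([Q_V @ X, P_V @ B2])
xi_general = three_sls(y, R_blocks, [A1, A2], Omega)

# Common instruments: (44) should coincide with the common-instrument form.
xi_common = three_sls(y, R_blocks, [A1, A1], Omega)
M = (np.kron(np.linalg.inv(Sigma1), proj(Q_V @ X))
     + np.kron(np.linalg.inv(Sigma2), proj(P_V @ B1)))
R = block_diag(R_blocks)
xi_section33 = np.linalg.solve(R.T @ M @ R, R.T @ M @ y)
```

In the general case the inverse of $A^{*\prime}\Omega A^*$ is simply computed numerically, which reflects the remark that little analytical simplification is available.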
4. Conclusions

This paper has considered simultaneous equations models with panel data and individual effects in each structural equation. It has attempted to integrate two previously separate strands of the literature. The first of these
[Mundlak (1978), Hausman and Taylor (1981), Amemiya and MaCurdy (1986), Breusch, Mizon, and Schmidt (1989)] considers the linear regression model. It distinguishes fixed and random effects and, when the effects are random, distinguishes the case in which they are uncorrelated with the regressors from the case in which they are correlated with the regressors. If enough of the regressors are correlated with the effects, the random effects case effectively becomes the fixed effects case. The second strand of the literature [Baltagi (1981), Prucha (1985), Hsiao (1986)] considers structural models, but under the assumption that the effects are random and uncorrelated with the exogenous variables. We combine these perspectives by considering a structural system in which the effects may be treated as either fixed or random. When they are treated as random, they may be correlated with an arbitrary number of the exogenous variables. For the fixed effects case, we show that the MLE and the conditional MLE are equal and take the form of MLE after the within transformation, thus extending to the simultaneous equations model a result previously known to hold for the regression model. For the case of random effects correlated with some or all of the exogenous variables, we propose 2SLS and 3SLS estimators which are similar to the estimators of Baltagi, but which make use of the instrument sets of Hausman and Taylor, Amemiya and MaCurdy, and Breusch, Mizon, and Schmidt. These estimators reduce to a fixed effects treatment if all exogenous variables are correlated with the effects, and the estimators using the Hausman-Taylor instrument set reduce to Baltagi's estimators if no exogenous variables are correlated with the effects.
We also consider the case in which some exogenous variables may be correlated with the individual effects in some equations but not others, so that the set of available instruments varies from equation to equation; this leads to a 3SLS estimator of somewhat unusual form.

Appendix 1

Proof of Lemma 1
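Using fairly standard notation, we write the system of equations as
$$Y\Delta + X\beta + \tilde\alpha + \tilde\varepsilon = 0, \tag{A1.1}$$
where $Y$ is $NT \times G$, $\Delta$ is $G \times G$, $X$ is $NT \times K$, $\beta$ is $K \times G$, $\tilde\alpha$ is $NT \times G$, and $\tilde\varepsilon$ is $NT \times G$. In terms of the notation established in the text, $Y$, $\tilde\alpha$, and $\tilde\varepsilon$ are just rearrangements of $y$, $\alpha$, and $\varepsilon$ of eq. (5); in particular, $Y = (y_1, y_2, \ldots, y_G)$, and similarly for $\tilde\alpha$ and $\tilde\varepsilon$, so that
$$y = \operatorname{vec}(Y), \qquad \alpha = \operatorname{vec}(\tilde\alpha), \qquad \varepsilon = \operatorname{vec}(\tilde\varepsilon). \tag{A1.2}$$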
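The reduced form corresponding to (A1.1) is
$$Y = X\Pi + \tilde\lambda + \tilde V, \tag{A1.3}$$
where $\Pi = -\beta\Delta^{-1}$, $\tilde\lambda = -\tilde\alpha\Delta^{-1}$, $\tilde V = -\tilde\varepsilon\Delta^{-1}$, or
$$y = (I_G \otimes X)\operatorname{vec}\Pi + \lambda + v, \tag{A1.4}$$
with $y = \operatorname{vec} Y$, $\lambda = \operatorname{vec}\tilde\lambda$, $v = \operatorname{vec}\tilde V$. Define $\mu = (I_G \otimes X)\operatorname{vec}\Pi + \lambda$, so that $y \sim N(\mu, \Phi \otimes I_N \otimes I_T)$ with $\Phi = (\Delta')^{-1}\Sigma\Delta^{-1}$. Now define the $GN \times 1$ vectors
$$\bar y = (I_{GN} \otimes T^{-1}e_T')y = [I_{GN} \otimes (1, 0, \ldots, 0)]y^b, \qquad \bar\mu = (I_{GN} \otimes T^{-1}e_T')\mu = [I_{GN} \otimes (1, 0, \ldots, 0)]\mu^b. \tag{A1.5}$$
Note that $\bar y$ contains the same information as $y^b$, but each element of the form $\bar y_{ji}$ is displayed only once instead of $T$ times. It is clear that $\bar y \sim N(\bar\mu, T^{-1}\Phi \otimes I_N)$. The vector consisting of $y$ and $\bar y$ is also normally distributed:
$$\begin{bmatrix} y \\ \bar y \end{bmatrix} \sim N\left(\begin{bmatrix} \mu \\ \bar\mu \end{bmatrix}, \begin{bmatrix} \Phi \otimes I_N \otimes I_T & T^{-1}\Phi \otimes I_N \otimes e_T \\ T^{-1}\Phi \otimes I_N \otimes e_T' & T^{-1}\Phi \otimes I_N \end{bmatrix}\right). \tag{A1.6}$$
Thus the conditional distribution of $y$ given $\bar y$ is (singular) normal, with mean and covariance matrix given by standard formulae for conditional normals. Specifically,
$$\begin{aligned}
\Theta = \operatorname{cov}(y \mid \bar y) &= (\Phi \otimes I_N \otimes I_T) - (T^{-1}\Phi \otimes I_N \otimes e_T)(T^{-1}\Phi \otimes I_N)^{-1}(T^{-1}\Phi \otimes I_N \otimes e_T') \\
&= \Phi \otimes [I_N \otimes (I_T - T^{-1}e_T e_T')] \\
&= \Phi \otimes Q_V,
\end{aligned} \tag{A1.7}$$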
and similarly,
$$\begin{aligned}
E(y \mid \bar y) &= \mu + (T^{-1}\Phi \otimes I_N \otimes e_T)(T^{-1}\Phi \otimes I_N)^{-1}(\bar y - \bar\mu) \\
&= \mu + (I_{GN} \otimes e_T)(\bar y - \bar\mu) \\
&= \mu + (I_{GN} \otimes e_T)(I_{GN} \otimes T^{-1}e_T')(y - \mu) \quad [\text{using (A1.5)}] \\
&= (I_G \otimes P_V)y + (I_G \otimes Q_V)\mu.
\end{aligned} \tag{A1.8}$$
It is obvious that the covariance matrix $\Theta$ in (A1.7) does not depend on the effects ($\alpha$, or equivalently $\lambda$). The same is true of the mean in (A1.8), since
$$(I_G \otimes Q_V)\mu = (I_G \otimes Q_V X)\operatorname{vec}\Pi, \tag{A1.9}$$
in light of $(I_G \otimes Q_V)\lambda = 0$. Thus the distribution of $y$ conditional on $\bar y$ is independent of the effects. ∎
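The two facts used in this proof can be checked numerically: the conditional covariance of $y$ given the vector of individual means equals $\Phi \otimes Q_V$, and the within transformation annihilates time-invariant effects. The sketch below assumes small, arbitrary dimensions and an arbitrary positive definite $\Phi$; all variable names are illustrative.

```python
import numpy as np

G, N, T = 2, 3, 4
rng = np.random.default_rng(1)
L = rng.normal(size=(G, G))
Phi = L @ L.T + G * np.eye(G)                 # an arbitrary positive definite Phi

e_T = np.ones((T, 1))
P_V = np.kron(np.eye(N), e_T @ e_T.T / T)
Q_V = np.eye(N * T) - P_V

cov_y = np.kron(Phi, np.eye(N * T))           # Phi (x) I_N (x) I_T
M = np.kron(np.eye(G * N), e_T.T / T)         # ybar = M y: individual means
cov_ybar = M @ cov_y @ M.T                    # covariance of the means
cov_y_ybar = cov_y @ M.T                      # covariance between y and the means

# Standard conditional-normal covariance formula.
Theta = cov_y - cov_y_ybar @ np.linalg.solve(cov_ybar, cov_y_ybar.T)

# Time-invariant effects: each individual's effect repeated T times, per equation.
lam = np.kron(rng.normal(size=G * N), np.ones(T))
within_lam = np.kron(np.eye(G), Q_V) @ lam
```

Here `Theta` reproduces $\Phi \otimes Q_V$ and `within_lam` is a vector of zeros, which is why neither the conditional covariance nor the conditional mean involves the effects.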
Proof of Lemma 2

The conditional likelihood is just the distribution of $y$ given $\bar y$. This is the normal distribution with mean (A1.8) and covariance matrix (A1.7). We now use the standard result [e.g., Rao (1973, p. 528)] that if $z \sim N(\gamma, \Theta)$, with $\operatorname{rank}(\Theta) = \rho$, then
$$f(z) = (2\pi)^{-\rho/2}R^{-1/2}\exp\{-\tfrac{1}{2}(z - \gamma)'\Theta^{+}(z - \gamma)\}, \tag{A1.10}$$
where $\Theta^{+}$ is the generalized inverse of $\Theta$ and $R$ is the product of the positive eigenvalues of $\Theta$. In the present case,
$$\Theta^{+} = \Phi^{-1} \otimes Q_V, \tag{A1.11}$$
because $Q_V$ is idempotent. Also, the eigenvalues of $\Theta$ are the products of the eigenvalues of $Q_V$ and of $\Phi$. Since the eigenvalues of $Q_V$ are one ($N(T-1)$ repetitions) and zero, the product of the positive eigenvalues of $\Theta$ is simply $|\Phi|^{N(T-1)}$. Furthermore,
$$\begin{aligned}
y - E(y \mid \bar y) &= y - (I_G \otimes P_V)y - (I_G \otimes Q_V)\mu \\
&= (I_G \otimes Q_V)(y - \mu) \\
&= y^w - \mu^w.
\end{aligned} \tag{A1.12}$$
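The generalized inverse and the eigenvalue product used in this step can be verified numerically. The sketch below assumes an arbitrary $2 \times 2$ positive definite $\Phi$ and the full $Q_V = I_N \otimes (I_T - T^{-1}e_T e_T')$, which has $N(T-1)$ unit eigenvalues, so that $\Theta$ has rank $GN(T-1)$; the variable names are illustrative.

```python
import numpy as np

G, N, T = 2, 3, 4
Phi = np.array([[2.0, 0.5], [0.5, 1.0]])      # arbitrary positive definite matrix
Q_V = np.eye(N * T) - np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Theta = np.kron(Phi, Q_V)                     # singular covariance matrix

Theta_plus = np.linalg.pinv(Theta)            # Moore-Penrose generalized inverse
eigvals = np.linalg.eigvalsh(Theta)           # Theta is symmetric
positive = eigvals[eigvals > 1e-10]           # drop the (numerically) zero eigenvalues
log_R = np.sum(np.log(positive))              # log of the product of positive eigenvalues
```

With these dimensions, `Theta_plus` matches $\Phi^{-1} \otimes Q_V$, and `log_R` matches $N(T-1)\log|\Phi|$.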
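Thus the conditional likelihood is simply
$$f(y \mid \bar y) = (2\pi)^{-GN(T-1)/2}\,|\Phi|^{-N(T-1)/2}\exp\{-\tfrac{1}{2}(y^w - \mu^w)'(\Phi^{-1} \otimes Q_V)(y^w - \mu^w)\}. \tag{A1.13}$$
This can be simplified. First, $|\Phi| = |\Delta|^{-2}|\Sigma|$ so that
$$|\Phi|^{-N(T-1)/2} = |\Delta|^{N(T-1)}\,|\Sigma|^{-N(T-1)/2}. \tag{A1.14}$$
Second, the quadratic form in (A1.13) is
$$\begin{aligned}
(y - \mu)'(I_G \otimes Q_V)(\Phi^{-1} \otimes Q_V)(I_G \otimes Q_V)(y - \mu) &= (y^w - \mu^w)'(\Phi^{-1} \otimes I_{NT})(y^w - \mu^w) \\
&= (y^w - R^w\xi)'(\Sigma^{-1} \otimes I_{NT})(y^w - R^w\xi),
\end{aligned} \tag{A1.15}$$
where the last equality follows from standard algebra and $\alpha^w = 0$. Thus we have
$$f(y \mid \bar y) = (2\pi)^{-GN(T-1)/2}\,|\Delta|^{N(T-1)}\,|\Sigma|^{-N(T-1)/2}\exp[-\tfrac{1}{2}(y^w - R^w\xi)'(\Sigma^{-1} \otimes I_{NT})(y^w - R^w\xi)]. \tag{A1.16}$$
This is the same as the likelihood in eq. (9b) except that $(T - 1)$ replaces $T$. ∎

Appendix 2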
Proof of Theorem 3

First, consider the GMM estimator based on the orthogonality condition $A'\Omega_{11}^{-1/2}(y_1 - R_1\xi_1) = \psi_1$. The covariance matrix of $\psi_1$ can be taken to be
$$C_1 = A'\Omega_{11}^{-1/2}\,\Omega_{11}\,\Omega_{11}^{-1/2}A = A'A \tag{A2.1}$$
in the usual sense that $A'A/NT$ converges in probability to the population covariance of the moment condition, $\operatorname{cov}[A'\Omega_{11}^{-1/2}u_1/\sqrt{NT}]$. Thus the GMM estimator minimizes
$$\psi_1'C_1^{-1}\psi_1 = (y_1 - R_1\xi_1)'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}(y_1 - R_1\xi_1). \tag{A2.2}$$
It is straightforward that the solution to this minimization is $\hat\xi_1$ as given by (21) of the text.
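Second, consider the GMM estimator based on $\psi_2 = A'(y_1 - R_1\xi_1)$. This has covariance matrix that can be taken to be
$$C_2 = A'\Omega_{11}A, \tag{A2.3}$$
and so the estimator minimizes
$$\psi_2'C_2^{-1}\psi_2 = (y_1 - R_1\xi_1)'A(A'\Omega_{11}A)^{-1}A'(y_1 - R_1\xi_1). \tag{A2.4}$$
In general (A2.2) and (A2.4) are different and lead to different estimators. However, in our case
$$A(A'\Omega_{11}A)^{-1}A' = [Q_V X,\ P_V B]\begin{bmatrix} \Sigma_{1,11}X'Q_V X & 0 \\ 0 & \Sigma_{2,11}B'P_V B \end{bmatrix}^{-1}\begin{bmatrix} X'Q_V \\ B'P_V \end{bmatrix} = \Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}; \tag{A2.5}$$
see eq. (22) of the text for the last equality. Therefore the two estimators are identical.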
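The equality of the two GMM weighting matrices can be illustrated numerically. The sketch below assumes the instrument set $A = [Q_V X, P_V B]$ and a single-equation covariance of the form $\Omega_{11} = s_1 Q_V + s_2 P_V$; the scalars `s1` and `s2` and the simulated `X` and `B` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 15, 3
P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Q_V = np.eye(N * T) - P_V
X = rng.normal(size=(N * T, 3))
B = rng.normal(size=(N * T, 2))
A = np.hstack([Q_V @ X, P_V @ B])                    # instrument set [Q_V X, P_V B]

s1, s2 = 0.8, 2.5                                    # hypothetical variance components
Omega11 = s1 * Q_V + s2 * P_V
Omega11_inv_half = Q_V / np.sqrt(s1) + P_V / np.sqrt(s2)
P_A = A @ np.linalg.solve(A.T @ A, A.T)              # projection onto the columns of A

lhs = A @ np.linalg.solve(A.T @ Omega11 @ A, A.T)    # weighting matrix in (A2.4)
rhs = Omega11_inv_half @ P_A @ Omega11_inv_half      # weighting matrix in (A2.2)
```

The two matrices agree because $Q_V X$ and $P_V B$ are orthogonal and $\Omega_{11}^{-1/2}$ merely rescales each block; with a generic instrument matrix that lacks this structure the equality would not be expected to hold.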
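Appendix 3

Proof of Theorem 4

We establish some preliminary results and define the following preliminary notation. First, we define the scaling matrix $S$:
$$S = \begin{bmatrix} \Sigma_{1,11}^{1/2} I_K & 0 \\ 0 & \Sigma_{2,11}^{1/2} I_b \end{bmatrix}, \tag{A3.1}$$
where $b$ = column dimension of $B$. Note $A'\Omega_{11}A = S'A'AS$. Second, we define selection matrices $\psi_X$ and $\psi_Z$ such that $X_1 = X\psi_X$ and $Z_1 = Z\psi_Z$. Thus $\psi_X$ consists of $K_1$ columns of $I_K$ and $\psi_Z$ consists of $L_1$ columns of $I_L$. We also define
$$H = [Q_V X,\ (Q_V X)^*,\ P_V X,\ Z], \tag{A3.2}$$
so that, by Assumption 2(i), $C = \operatorname{plim} H'H/NT$ exists. As a matter of notation, let
$$C = \begin{bmatrix} C_Q & 0 \\ 0 & C_P \end{bmatrix}, \tag{A3.3}$$
where $C_P = \operatorname{plim}[(Q_V X)^*, P_V X, Z]'[(Q_V X)^*, P_V X, Z]/NT$ and $C_Q = \operatorname{plim} X'Q_V X/NT$. Also define $\phi_B$ and
$$\psi_B = \begin{bmatrix} I_K & 0 \\ 0 & \phi_B \end{bmatrix} \tag{A3.4}$$
such that $P_V B = [(Q_V X)^*, P_V X, Z]\phi_B$ and $A = H\psi_B$. Thus,
$$C_A = \operatorname{plim} A'A/NT = \psi_B'C\psi_B = \begin{bmatrix} C_Q & 0 \\ 0 & \phi_B'C_P\phi_B \end{bmatrix}, \tag{A3.5}$$
and $\phi_B$ has full column rank so that $C_A$ is nonsingular. Finally, we write the reduced form equations for $Y_1$ as
$$Y_1 = X\Pi_1 + Z\Pi_2 + V_1 = X_1\Pi_{11} + X_2\Pi_{12} + Z_1\Pi_{21} + Z_2\Pi_{22} + V_1, \tag{A3.6}$$
and define
$$\pi = \begin{bmatrix} \Pi_1 & \psi_X & 0 \\ 0 & 0 & 0 \\ \Pi_1 & \psi_X & 0 \\ \Pi_2 & 0 & \psi_Z \end{bmatrix}, \tag{A3.7}$$
so that
$$R_1 = (Y_1, X_1, Z_1) = [Q_V X,\ (Q_V X)^*,\ P_V X,\ Z]\pi + (V_1, 0, 0) = H\pi + (V_1, 0, 0). \tag{A3.8}$$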
Necessary conditions for Assumption 2(k)

Here we derive two necessary conditions for $\operatorname{plim} A'R_1/NT$ to have full column rank.

Lemma 3.1.
$$\operatorname{plim} A'R_1/NT = \psi_B'C\pi. \tag{A3.9}$$

Proof. Use the representations $A = H\psi_B$ and $R_1 = H\pi + (V_1, 0, 0)$ so that
$$\operatorname{plim} A'R_1/NT = \psi_B'(\operatorname{plim} H'H/NT)\pi + \psi_B'\operatorname{plim} H'(V_1, 0, 0)/NT = \psi_B'C\pi. \quad ∎$$

Theorem 3.1. A necessary condition for $\operatorname{plim} A'R_1/NT$ to have full column rank is:
$$\operatorname{plim} B'Z_1/NT \text{ has full column rank.} \tag{A3.10}$$

Proof. From (A3.9), for $\operatorname{plim} A'R_1/NT$ to have full column rank it is necessary that
$$\operatorname{plim} B'P_V Z_1/NT = \operatorname{plim} B'Z_1/NT \tag{A3.11}$$
have full column rank. ∎

We note that (A3.10) reflects the obvious condition that there be at least as many time-invariant instruments as time-invariant regressors.

Theorem 3.2. A necessary condition for $\operatorname{plim} A'R_1/NT$ to have full column rank is that the equation satisfy the usual rank condition for identification by exclusion restrictions.
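Proof. Since $\operatorname{plim} A'R_1/NT = \psi_B'C\pi$, from (A3.9), we must require $\pi$ to have full column rank. But the rank of $\pi$ is the same as the rank of
$$\begin{bmatrix} \Pi_1 & \psi_X & 0 \\ \Pi_2 & 0 & \psi_Z \end{bmatrix}.$$
Rearranging rows, we have
$$\begin{bmatrix} \Pi_{11} & I_{K_1} & 0 \\ \Pi_{21} & 0 & I_{L_1} \\ \Pi_{12} & 0 & 0 \\ \Pi_{22} & 0 & 0 \end{bmatrix}.$$
Define
$$\Pi_* = \begin{bmatrix} \Pi_{12} \\ \Pi_{22} \end{bmatrix}, \tag{A3.12}$$
the matrix of coefficients of the excluded exogenous variables in the reduced form equations for $Y_1$. Then
$$\operatorname{rank}(\pi) = K_1 + L_1 + \operatorname{rank}(\Pi_*). \tag{A3.13}$$
Thus $\operatorname{rank}(\pi) = G_1 + K_1 + L_1$ requires $\operatorname{rank}(\Pi_*) \ge G_1$, which is the usual rank condition for identification by exclusion restrictions. ∎

Consistency and asymptotic distribution of 2SLS

Lemma 3.2. $\operatorname{plim} R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}R_1/NT$ is nonsingular.

Proof.
$$\begin{aligned}
\operatorname{plim} R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}R_1/NT &= (\operatorname{plim} R_1'\Omega_{11}^{-1/2}A/NT)(\operatorname{plim} A'A/NT)^{-1}(\operatorname{plim} A'\Omega_{11}^{-1/2}R_1/NT) \\
&= (\operatorname{plim} R_1'A/NT)S^{-1}(\operatorname{plim} A'A/NT)^{-1}S^{-1}(\operatorname{plim} A'R_1/NT) \quad [\text{because } A'\Omega_{11}^{-1/2}R_1 = S^{-1}A'R_1] \\
&= J'S^{-1}C_A^{-1}S^{-1}J,
\end{aligned}$$
where $J = \operatorname{plim} A'R_1/NT$. This is nonsingular because $S^{-1}C_A^{-1}S^{-1}$ is nonsingular, and $J$ has full column rank. ∎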
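We can now prove Theorem 4, as follows. Write $\hat\xi_1$ as
$$\hat\xi_1 = \xi_1 + [R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}R_1/NT]^{-1}(R_1'\Omega_{11}^{-1/2}A/NT)(A'A/NT)^{-1}(A'\Omega_{11}^{-1/2}u_1/NT),$$
and therefore
$$\operatorname{plim}\hat\xi_1 = \xi_1 + [J'S^{-1}C_A^{-1}S^{-1}J]^{-1}J'S^{-1}C_A^{-1}\cdot 0 = \xi_1, \tag{A3.14}$$
because $\operatorname{plim} A'\Omega_{11}^{-1/2}u_1/NT = 0$ by Assumption 2(iii). This proves consistency. Similarly,
$$\sqrt{NT}(\hat\xi_1 - \xi_1) = [R_1'\Omega_{11}^{-1/2}P_A\Omega_{11}^{-1/2}R_1/NT]^{-1}(R_1'\Omega_{11}^{-1/2}A/NT)(A'A/NT)^{-1}(A'\Omega_{11}^{-1/2}u_1/\sqrt{NT}),$$
and so
$$\begin{aligned}
\sqrt{NT}(\hat\xi_1 - \xi_1) &\to N\bigl(0,\ [J'S^{-1}C_A^{-1}S^{-1}J]^{-1}J'S^{-1}C_A^{-1}\,C_A\,C_A^{-1}S^{-1}J[J'S^{-1}C_A^{-1}S^{-1}J]^{-1}\bigr) \\
&= N[0,\ (J'S^{-1}C_A^{-1}S^{-1}J)^{-1}].
\end{aligned} \tag{A3.15}$$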
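Appendix 4

Proof of Theorem 5

Define $\psi_3 = (I_G \otimes A)'(y - R\xi)$ as in eq. (35) of the text. Treating $\operatorname{cov}(\psi_3)$ as $C_3 = (I_G \otimes A)'\Omega(I_G \otimes A) = \Sigma_1 \otimes A'Q_V A + \Sigma_2 \otimes A'P_V A$, the GMM estimator minimizes
$$\psi_3'C_3^{-1}\psi_3 = (y - R\xi)'(I_G \otimes A)[(I_G \otimes A)'\Omega(I_G \otimes A)]^{-1}(I_G \otimes A)'(y - R\xi). \tag{A4.1}$$
However, with $\Omega = \Sigma_1 \otimes Q_V + \Sigma_2 \otimes P_V$ and $A = [Q_V X, P_V B]$, it is easy to establish
$$(I_G \otimes A)[(I_G \otimes A)'\Omega(I_G \otimes A)]^{-1}(I_G \otimes A)' = \Omega^{-1/2}(I_G \otimes P_A)\Omega^{-1/2} = \Sigma_1^{-1} \otimes P_{(Q_V X)} + \Sigma_2^{-1} \otimes P_{(P_V B)}, \tag{A4.2}$$
using essentially the same algebra as in eq. (A2.5). Thus the estimator minimizes
$$(y - R\xi)'[\Sigma_1^{-1} \otimes P_{(Q_V X)} + \Sigma_2^{-1} \otimes P_{(P_V B)}](y - R\xi), \tag{A4.3}$$
and the solution is the 3SLS estimator $\hat\xi$ as in eq. (34). Furthermore, the fact that the GMM estimator based on (36) leads to the same estimator follows from the first equality in (A4.2). ∎

Appendix 5

Proof of Theorem 6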
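The 3SLS estimator is given by eq. (34) of the text. However, using eq. (A4.2) we can write it as
$$\hat\xi = \{R'(I_G \otimes A)[(I_G \otimes A)'\Omega(I_G \otimes A)]^{-1}(I_G \otimes A)'R\}^{-1}R'(I_G \otimes A)[(I_G \otimes A)'\Omega(I_G \otimes A)]^{-1}(I_G \otimes A)'y \tag{A5.1}$$
or
$$\begin{aligned}
\hat\xi = \xi + {} & \{R'(I_G \otimes A)/NT\,[(I_G \otimes A)'\Omega(I_G \otimes A)/NT]^{-1}(I_G \otimes A)'R/NT\}^{-1} \\
& \cdot R'(I_G \otimes A)/NT\,[(I_G \otimes A)'\Omega(I_G \otimes A)/NT]^{-1}(I_G \otimes A)'\varepsilon/NT.
\end{aligned} \tag{A5.2}$$
It is obvious from Assumption 3 that $\operatorname{plim}(I_G \otimes A)'\varepsilon/NT = 0$. Therefore the 3SLS estimator will be consistent provided that the matrices that are inverted in (A5.2) have nonsingular probability limits. To show this, first consider
$$\begin{aligned}
\operatorname{plim}(I_G \otimes A)'\Omega(I_G \otimes A)/NT &= \operatorname{plim}(I_G \otimes A)'(\Sigma_1 \otimes Q_V + \Sigma_2 \otimes P_V)(I_G \otimes A)/NT \\
&= \Sigma_1 \otimes \operatorname{plim} A'Q_V A/NT + \Sigma_2 \otimes \operatorname{plim} A'P_V A/NT \\
&= \Sigma_1 \otimes \begin{bmatrix} \operatorname{plim} X'Q_V X/NT & 0 \\ 0 & 0 \end{bmatrix} + \Sigma_2 \otimes \begin{bmatrix} 0 & 0 \\ 0 & \operatorname{plim} B'P_V B/NT \end{bmatrix}.
\end{aligned} \tag{A5.3}$$
By Assumption 2(ii), $\operatorname{plim} X'Q_V X/NT$ and $\operatorname{plim} B'P_V B/NT$ exist and are nonsingular. Therefore the matrix in (A5.3) is nonsingular; in fact, its inverse is
$$\Sigma_1^{-1} \otimes \begin{bmatrix} (\operatorname{plim} X'Q_V X/NT)^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \Sigma_2^{-1} \otimes \begin{bmatrix} 0 & 0 \\ 0 & (\operatorname{plim} B'P_V B/NT)^{-1} \end{bmatrix}. \tag{A5.4}$$
We next consider the probability limit of the matrix in braces { } in eq. (A5.2). This probability limit exists and will be nonsingular provided that $\operatorname{plim}(I_G \otimes A)'R/NT$ is of full column rank. But $(I_G \otimes A)'R$ is a block-diagonal matrix with $j$th block equal to $A'R_j$, so we have full column rank provided that $\operatorname{plim} A'R_j/NT$ is of full column rank for all $j$. And this follows from Assumption 2. We therefore conclude that $\hat\xi$ is consistent, and turn to its asymptotic distribution. Here we have
$$\begin{aligned}
\sqrt{NT}(\hat\xi - \xi) = {} & \{R'(I_G \otimes A)/NT\,[(I_G \otimes A)'\Omega(I_G \otimes A)/NT]^{-1}(I_G \otimes A)'R/NT\}^{-1} \\
& \cdot R'(I_G \otimes A)/NT\,[(I_G \otimes A)'\Omega(I_G \otimes A)/NT]^{-1}(I_G \otimes A)'\varepsilon/\sqrt{NT}.
\end{aligned} \tag{A5.5}$$
From Assumption 3,
$$(I_G \otimes A)'\varepsilon/\sqrt{NT} \to N[0,\ \operatorname{plim}(I_G \otimes A)'\Omega(I_G \otimes A)/NT], \tag{A5.6}$$
and therefore $\sqrt{NT}(\hat\xi - \xi)$ is asymptotically normal with zero mean and covariance matrix
$$\operatorname{plim}\{R'(I_G \otimes A)[(I_G \otimes A)'\Omega(I_G \otimes A)]^{-1}(I_G \otimes A)'R/NT\}^{-1} = \operatorname{plim}\{R'[\Sigma_1^{-1} \otimes P_{(Q_V X)} + \Sigma_2^{-1} \otimes P_{(P_V B)}]R/NT\}^{-1}, \tag{A5.7}$$
using (A4.2). ∎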
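Efficiency of 3SLS relative to 2SLS

The asymptotic covariance matrix of our 3SLS estimator is
$$\operatorname{As.cov.}(\hat\xi) = (R'[\Sigma_1^{-1} \otimes P_{(Q_V X)} + \Sigma_2^{-1} \otimes P_{(P_V B)}]R)^{-1}. \tag{A5.8}$$
To see that 3SLS is efficient relative to 2SLS, define $\psi_3 = (I_G \otimes A)'(y - R\xi)$ as above and note that the 2SLS estimates minimize $\psi_3'C_4^{-1}\psi_3$, where
$$C_4 = \begin{bmatrix} A'\Omega_{11}A & 0 & \cdots & 0 \\ 0 & A'\Omega_{22}A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A'\Omega_{GG}A \end{bmatrix}. \tag{A5.9}$$
Explicitly, the two-stage least squares estimator can be written as
$$\tilde\xi = [R'(I \otimes A)C_4^{-1}(I \otimes A)'R]^{-1}R'(I \otimes A)C_4^{-1}(I \otimes A)'y, \tag{A5.10}$$
and it follows (given Assumptions 1, 2, and 3) that its asymptotic covariance matrix is
$$\begin{aligned}
\operatorname{As.cov.}(\tilde\xi) = {} & [R'(I \otimes A)C_4^{-1}(I \otimes A)'R]^{-1}R'(I \otimes A)C_4^{-1} \\
& \cdot (I \otimes A)'\Omega(I \otimes A)C_4^{-1}(I \otimes A)'R \\
& \cdot [R'(I \otimes A)C_4^{-1}(I \otimes A)'R]^{-1}.
\end{aligned} \tag{A5.11}$$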
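As a numerical illustration of the two covariance formulas, the sketch below evaluates finite-sample analogues of the 3SLS covariance and of the 2SLS sandwich form (A5.11), without the plim and without the $1/NT$ scaling, on simulated regressors and instruments. The function name `asy_covs`, the dimensions, and the chosen variance matrices are assumptions made for illustration only.

```python
import numpy as np

def block_diag(blocks):
    """Stack matrices along the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

rng = np.random.default_rng(3)
N, T, G = 20, 3, 2
P_V = np.kron(np.eye(N), np.full((T, T), 1.0 / T))
Q_V = np.eye(N * T) - P_V
A = np.hstack([Q_V @ rng.normal(size=(N * T, 3)), P_V @ rng.normal(size=(N * T, 2))])
IA = np.kron(np.eye(G), A)                               # I_G (x) A
D = IA.T @ block_diag([rng.normal(size=(N * T, 2)) for _ in range(G)])  # (I (x) A)'R

def asy_covs(Sigma1, Sigma2):
    """Return finite-sample analogues of the 3SLS and 2SLS covariance matrices."""
    Omega = np.kron(Sigma1, Q_V) + np.kron(Sigma2, P_V)
    C3 = IA.T @ Omega @ IA                               # joint (3SLS) weighting
    C4 = block_diag([A.T @ (Sigma1[j, j] * Q_V + Sigma2[j, j] * P_V) @ A
                     for j in range(G)])                 # equation-by-equation (2SLS) weighting
    cov_3sls = np.linalg.inv(D.T @ np.linalg.solve(C3, D))
    C4inv_D = np.linalg.solve(C4, D)
    bread = np.linalg.inv(D.T @ C4inv_D)
    cov_2sls = bread @ C4inv_D.T @ C3 @ C4inv_D @ bread  # sandwich form as in (A5.11)
    return cov_3sls, cov_2sls

# Correlated equations: 2SLS covariance minus 3SLS covariance should be PSD.
c3, c2 = asy_covs(np.array([[1.0, 0.6], [0.6, 1.5]]), np.array([[2.0, 1.0], [1.0, 3.0]]))
diff = c2 - c3
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()

# Diagonal Sigma_1 and Sigma_2: the two covariance matrices coincide.
d3, d2 = asy_covs(np.diag([1.0, 1.5]), np.diag([2.0, 3.0]))
```

In this sketch `min_eig` is nonnegative up to rounding error, and `d3` and `d2` agree, matching the usual GMM result that the jointly weighted estimator is at least as efficient as any other weighting of the same moment conditions.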
The asymptotic covariance matrix in (A5.11) is at least as large as the one in (A5.8), in the sense that their difference is positive semi-definite, and it is generally larger. The exception to the last statement is the case in which $\Omega$ is block-diagonal (that is, $\Sigma_1$ and $\Sigma_2$ are diagonal), in which case $(I \otimes A)'\Omega(I \otimes A) = C_4$, (A5.11) is identical to (A5.8), and 2SLS and 3SLS are equally asymptotically efficient.

References

Amemiya, T., 1977, The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equations model, Econometrica 45, 955-968.
Amemiya, T. and T.E. MaCurdy, 1986, Instrumental variable estimation of an error components model, Econometrica 54, 869-881.
Andersen, E.B., 1973, Conditional inference and models for measuring (Mentalhygiejnisk Forlag, Copenhagen).
Baltagi, B.H., 1981, Simultaneous equations with error components, Journal of Econometrics 17, 189-200.
Breusch, T.S., G.E. Mizon, and P. Schmidt, 1989, Efficient estimation using panel data, Econometrica 57, 695-700.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies 47, 225-238.
Chamberlain, G. and Z. Griliches, 1975, Unobservables with a variance-components structure: Ability, schooling and the economic success of brothers, International Economic Review 16, 422-450.
Cornwell, C. and P. Schmidt, 1990, Models for which the MLE and the conditional MLE coincide, Empirical Economics, forthcoming.
Hausman, J.A. and W.E. Taylor, 1981, Panel data and unobservable individual effects, Econometrica 49, 1377-1398.
Hausman, J.A., W.K. Newey, and W.E. Taylor, 1987, Efficient estimation and identification of simultaneous equation models with covariance restrictions, Econometrica 55, 849-874.
Hsiao, C., 1986, Analysis of panel data (Cambridge University Press, Cambridge).
Kalbfleisch, J.D. and D.A. Sprott, 1973, Marginal and conditional likelihoods, Sankhya A 35, 311-328.
Liang, K., 1984, The asymptotic efficiency of conditional likelihood methods, Biometrika 71, 305-343.
Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-86.
Neyman, J. and E.L. Scott, 1948, Consistent estimates based on partially consistent observations, Econometrica 16, 1-32.
Pfanzagl, J., 1982, Contributions to a general asymptotic statistical theory, Vol. 13, Lecture notes in statistics (Springer-Verlag, New York, NY).
Prucha, I., 1985, Maximum likelihood and instrumental variables estimation in simultaneous equation systems with error components, International Economic Review 26, 491-506.
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York, NY).
Schmidt, P., 1988, Estimation of a fixed-effect Cobb-Douglas system using panel data, Journal of Econometrics 37, 361-380.
Schmidt, P., 1990, Three stage least squares with different instruments for different equations, Journal of Econometrics 43, 389-394.