SOCIAL SCIENCE RESEARCH 6, 197-210 (1977)
Dynamic Models and Cross-Sectional Data: Estimation of Dynamic Parameters in Cross-Sectional Data

RONALD SCHOENBERG

University of Arizona
Serious difficulties in the interpretation of structural coefficients may arise when a static, cross-sectional model is applied to data generated by a dynamic process. One alternative is to collect data at several points in time, and specify the dynamic features that are proposed to exist. Another alternative is discussed here. Overidentifying restrictions in a static simultaneous equation system may, under certain conditions, be used to identify dynamic parameters (autoregression and autocorrelation coefficients) when data are available at only one point in time.
On occasion an investigator may be faced with the necessity of proposing a static linear structural model when a dynamic model is called for, simply because the data are cross-sectional. Such a compromise, however, may have serious consequences for the interpretation of the estimated coefficients. If, for example, the following dynamic model underlies the generation of the observed data,

[Path diagram of the dynamic model relating y_t and x_t to y_{t-1}, x_{t-1}, u_t, and v_t]

which may be represented in equation form:

y_t = a_1 y_{t-1} + b x_{t-1} + u_t,
x_t = a_2 x_{t-1} + v_t,
where t is an arbitrary unit of time; x and y are normally distributed variables which, for computational convenience, are measured from their means; the a_i are autoregression or "stability" coefficients (Jöreskog and Sörbom, 1976; Heise, 1975); b is the structural coefficient of the relation of y on x; and u_t and v_t are residual error terms. Given that 0 < a_i < 1 and that the behavior of u_t and v_t conforms to the following assumptions: E(u_t) = E(v_t) = E(u_t v_s) = 0, and E(u_t u_s) = E(v_t v_s) = 0 for t ≠ s, that is, u_t and v_t are serially uncorrelated, then we may conclude that the following
expressions hold for the expected variances and covariances of x and y at time t:

\sigma_{y,t}^2 = \frac{(1 + a_1 a_2)\, b^2 \sigma_{x,t}^2 / (1 - a_1 a_2) + \sigma_u^2}{1 - a_1^2},

\sigma_{xy,t} = \frac{a_2 b\, \sigma_v^2}{(1 - a_1 a_2)(1 - a_2^2)},

\sigma_{x,t}^2 = \frac{\sigma_v^2}{1 - a_2^2}.
Suppose that an investigator had at hand cross-sectional sample estimates of the variances and the covariance, s_{xy}, s_x^2, s_y^2, say, and proposed the following static model, x → y with coefficient b*, or

y = b^* x + e,

and proceeded to estimate b^*:

b^* = s_{xy} / s_x^2.
But since E(S) = Σ, then

E(b^*) = \frac{a_2}{1 - a_1 a_2}\, b,

the estimated variance of b^* is inflated correspondingly, and the explained variance will, on the average, be

E(R^2) = \frac{a_2^2 (1 - a_1^2)\, b^2 \sigma_x^2}{(1 - a_1^2 a_2^2)\, b^2 \sigma_x^2 + (1 - a_1 a_2)^2\, \sigma_u^2}.
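These expressions are straightforward to evaluate numerically. The sketch below is illustrative only (the parameter values and the helper name are not from the paper); it computes the implied E(b*) and E(R²) for several values of the stability coefficients.

```python
# Sketch: implied static-model quantities under the dynamic model
#   y_t = a1*y_{t-1} + b*x_{t-1} + u_t,   x_t = a2*x_{t-1} + v_t,
# using the stationary moments given above (illustrative values only).
def implied_static_estimates(a1, a2, b, var_u=1.0, var_v=1.0):
    var_x = var_v / (1.0 - a2**2)
    cov_xy = a2 * b * var_x / (1.0 - a1 * a2)
    var_y = ((1.0 + a1 * a2) * b**2 * var_x / (1.0 - a1 * a2) + var_u) / (1.0 - a1**2)
    b_star = cov_xy / var_x                    # expected value of the static slope b*
    r_squared = cov_xy**2 / (var_x * var_y)    # expected explained variance
    return b_star, r_squared

for a1, a2 in [(0.3, 0.3), (0.8, 0.8), (0.95, 0.9), (0.99, 0.99)]:
    b_star, r2 = implied_static_estimates(a1, a2, b=0.4)
    print(f"a1={a1}, a2={a2}: E(b*)={b_star:.2f}, E(R^2)={r2:.3f}")
```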
We can immediately see that as the stability coefficients approach 1, b^* becomes very large, its standard error becomes very large, and the explained variance approaches 0, whatever the magnitude of the structural coefficient b. And when the stability coefficients a_1 and a_2 are close to 1, very small differences between them have enormous consequences for the estimates of the parameters of the static model.

Given the theoretical specification of a dynamic model and a cross-sectional data set, one might conclude from these results, then, that any analysis at all is hopeless. However, one should not despair. We shall see that the problem is not necessarily one of having data collected at only one point in time, but that it may be conceived of as a problem in
identification. Given a sufficiently powerful theoretical specification, we may place enough restrictions on the dynamic model that its parameters may be estimated in cross-sectional data. In effect, we will be trading off the lack of a data set at a second point in time against extraneous information generated by theory.
THE MODEL
Let Z_t be a vector of p random variables at time t, A be a p × p matrix of coefficients, and Y_t be a vector of residual variables, such that

Z_t = A Z_{t-1} + Y_t.   (1)

Furthermore, let E(Y_t) = 0 and

E(Y_t Y_s') = Ξ if t = s, and 0 if t ≠ s;

that is, the elements of Y_t are not autocorrelated, and they have the variance-covariance matrix Ξ.

The sort of dynamic process that we shall consider here must be stable. This is necessary since the variance-covariance matrix of the variables must converge, as the process develops, in the probability limit to one that is entirely a function of the parameters of the system. Dynamic processes for which the variance-covariance matrix of the variables does not converge are unstable, and for individual cases are short-lived. The only analytical alternative for unstable processes is the analysis of individual cases across time, and most likely not even two points in time would be enough. Since most large-scale social processes appear, however, to contain no violent instabilities, at least not over the time ranges available to investigators, the models discussed here would seem to have general applicability (Zeeman, 1976, contains discussion of some newly developed models that combine the features of unstable and stable models).

The mathematical equivalent of assuming stability is to specify that the characteristic roots, or eigenvalues, of the square matrix A are less than 1 in modulus, and that A is nonsingular. If the roots are equal to 1, or if A is singular (i.e., at least one root is 0), the system is a function of the initial state of the system (substantively a very undesirable feature, since for any such process, no matter how long ago initiated, we must then treat the present state as a consequence of some initial state about which we have no information; however, this is a mathematically desirable feature, since then the static parameters would be unbiased, but not efficient, estimates
of their dynamic counterparts). The roots may be negative or complex, which would imply that the system was cyclical or oscillatory. If the roots are greater than 1, the system is exponentially unstable, and if less than -1, the system is unstable in an oscillatory manner.

The specifications regarding the behavior of Y_t, combined with the stipulations regarding the roots of A, imply that the system is stationary, that is, that the autocovariance matrix of the variables in Z_t, i.e., E(Z_t Z'_{t+s}), is a function only of s. Some social processes are not stationary and will require different models than those discussed here. If, say, we were studying individuals and age was included in the model, it would not be stationary. However, age as an aggregated property of some collective, a census tract, for example, may be considered stationary. In some situations the nonstationary variables may be transformed to stationary variables.

The solution of Eq. (1) is (Miller, 1968, p. 87)

Z_t = \sum_{r=0}^{\infty} A^r Y_{t-r}.   (2)
Then

E(Z_t Z_t') = \Sigma = \sum_{r=0}^{\infty} A^r \Xi (A')^r,   (3)

E(Z_{t+k} Z_t') = \sum_{r=0}^{\infty} A^{r+k} \Xi (A')^r = A^k \Sigma.   (4)
From Eq. (1),

Y_t Y_t' = (Z_t - A Z_{t-1})(Z_t - A Z_{t-1})'
         = Z_t Z_t' - A Z_{t-1} Z_t' - Z_t Z_{t-1}' A' + A Z_{t-1} Z_{t-1}' A',

but from Eqs. (3) and (4),

E(Z_t Z_{t-1}') = A E(Z_t Z_t') = A \Sigma,
E(Z_{t-1} Z_t') = E(Z_t Z_t') A' = \Sigma A',

and therefore

E(Y_t Y_t') = \Xi = \Sigma - A \Sigma A' - A \Sigma A' + A \Sigma A' = \Sigma - A \Sigma A'.   (5)
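Equation (5) is a discrete Lyapunov equation in Σ, so for any trial A and Ξ the implied Σ can be computed and the identity checked. Below is a minimal sketch (the matrices are made up for illustration) using SciPy's Lyapunov solver:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Made-up stable coefficient matrix and residual covariance matrix.
A = np.array([[0.6, 0.3],
              [0.0, 0.5]])
Xi = np.array([[1.0, 0.2],
               [0.2, 0.8]])

# Stability: all characteristic roots of A less than 1 in modulus.
assert np.max(np.abs(np.linalg.eigvals(A))) < 1

# Sigma satisfies Sigma = A Sigma A' + Xi, i.e., Xi = Sigma - A Sigma A' (Eq. 5).
Sigma = solve_discrete_lyapunov(A, Xi)
print(np.allclose(Xi, Sigma - A @ Sigma @ A.T))   # True
```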
We have in Eq. (5) a parameterization of a dynamic process involving the coefficient matrix, and the variance-covariance matrices of the residual variables and the observed variables at only a single point in time. However, while a given set of parameters and residual variances and covariances will generate a unique variance-covariance matrix of the
variables, a given variance-covariance matrix of the variables may have been generated by a great number of possible sets of parameters (indeed, an infinite number). This is referred to as the identification problem, and it can be solved by constraining elements of A and Ξ in various ways: to zero, to some nonzero value, to each other, to functions of each other, and so on. The constraints may be such as to identify (i.e., produce a unique estimate of) some parameters but not others. Exact methods of determining which parameters are identified when not all of them are identified have not been developed, though linear algebra may be feasible in many cases. If the constraints are such as to produce one and only one estimate of each parameter, then the model is said to be exactly identified. In this case linear algebra suffices to calculate best linear unbiased estimates, given a sample variance-covariance matrix and provided that the variables are multinormally distributed. If, however, the constraints produce more than one estimate for some parameters, then the model is overidentified, and maximum likelihood estimation is the most efficient way of calculating unbiased estimates of the parameters, though this requires optimization of a nonlinear function of the parameters and the observed variance-covariance matrix.

Maximum-Likelihood Estimation of the Model

Let the elements of Z_t in Eq. (1) be multinormally distributed in the population, and let their variance-covariance matrix be Σ. Furthermore, let S be a variance-covariance matrix of the variables calculated from an appropriately drawn sample of size N from that population. Thus plim(S) = Σ, and the log-likelihood of the sample, apart from an irrelevant constant, is (Jöreskog and Goldberger, 1975; Anderson, 1958, p. 159)

L = -\frac{N}{2}\left[\log|\Sigma| + \operatorname{tr}(\Sigma^{-1} S)\right].

To maximize L we minimize

F = -\frac{2}{N} L = \log|\Sigma| + \operatorname{tr}(\Sigma^{-1} S).   (6)
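For reference, the fit function of Eq. (6) is simple to evaluate numerically. The sketch below is illustrative (the helper name and matrices are not from the paper); the resulting function could be handed to any iterative minimizer.

```python
import numpy as np

def fit_function(Sigma, S):
    """F = log|Sigma| + tr(Sigma^{-1} S), as in Eq. (6)."""
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        return np.inf          # Sigma must be positive definite
    return logdet + np.trace(np.linalg.solve(Sigma, S))

# Illustration with made-up matrices; F attains its minimum when Sigma = S.
S = np.array([[2.0, 0.5], [0.5, 1.0]])
print(fit_function(S, S), fit_function(1.5 * S, S))
```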
Clearly, in order to calculate F we will need to solve Eq. (5) for Σ. Let σ* be a column vector, r(r + 1)/2 by 1, of the lower-left nonredundant elements of Σ, an r by r symmetric matrix, stored row-wise. Similarly, let ξ* be the r(r + 1)/2 by 1 column vector of the lower-left nonredundant elements of Ξ. Then an r(r + 1)/2 by r(r + 1)/2 square matrix P exists, made up of the elements of A, such that

\sigma^* - P \sigma^* = \xi^*,

and therefore

\sigma^* = (I - P)^{-1} \xi^*.   (7)
Employing an iterative scheme, a computer program may be devised to select values for the free (i.e., unconstrained) elements of A and Ξ that minimize F in Eq. (6). (A computer program to accomplish this is in progress.) The elements of P are related to the elements of A in the following manner (where p_{mn} is the m,nth element of P, and a_{ij} is the ijth element of A):

p_{mn} = (1 - \delta/2)(a_{ik} a_{jl} + a_{il} a_{jk}),   (8)

where δ = 1 if k = l, and 0 if k ≠ l, and m = i(i - 1)/2 + j, n = k(k - 1)/2 + l, with i, j, k, and l taken successively over the row and column indices of the lower-left portion of Σ, in row-wise order. Further details of this transformation appear in Appendix I.

To minimize F in Eq. (6) we will need the first-order partials of that function with respect to the unconstrained parameters, and to calculate the standard errors of the parameters (in addition to aiding the minimization) we shall need the second-order partials of the function. Both of these sets of partials will also provide information with regard to the identification of the model and, since no general rules exist to help determine identifiability, as they do in simultaneous-equation static models, the second-order partials will be invaluable in preventing us from treating an underidentified model as identified. Jöreskog and Goldberger (1975) have shown that the first-order partials of the function in Eq. (6) are
\partial F / \partial \theta_{ij} = \operatorname{tr}[\Sigma^{-1}(\Sigma - S)\Sigma^{-1}(\partial \Sigma / \partial \theta_{ij})],   (9)

and the second-order partials are

\operatorname{plim}(\partial^2 F / \partial \theta_{ij}\, \partial \theta_{kl}) = \operatorname{tr}[\Sigma^{-1}(\partial \Sigma / \partial \theta_{ij})\Sigma^{-1}(\partial \Sigma / \partial \theta_{kl})],   (10)

where θ_{ij} and θ_{kl} denote free parameters, that is, unconstrained elements of A or Ξ.
To evaluate either of these expressions we will require the first-order partial of Σ (or σ*) with respect to the parameters. First,

\partial \sigma^* / \partial a_{ij} = (I - P)^{-1} (\partial P / \partial a_{ij}) (I - P)^{-1} \xi^*,   (11)

\partial \sigma^* / \partial \xi^*_i = (I - P)^{-1} e_i,   (12)

where e_i is an r(r + 1)/2 element column vector whose ith element is 1 and all other elements are 0.
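The pieces above can be assembled into a short computational check. The following is an illustrative sketch, not the author's program (helper names and numerical values are invented): it builds P from A by Eq. (8), recovers σ* from Eq. (7), and compares the result with the half-vectorization of the Σ implied by Eq. (5).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def row_pairs(r):
    """Index pairs (i, j), i >= j, in the row-wise order used for sigma* and xi*."""
    return [(i, j) for i in range(r) for j in range(i + 1)]

def half_vec(M):
    """Lower-left nonredundant elements of a symmetric matrix, stored row-wise."""
    return np.array([M[i, j] for i, j in row_pairs(M.shape[0])])

def P_matrix(A):
    """P of Eq. (8): p_mn = (1 - delta/2)(a_ik a_jl + a_il a_jk), delta = 1 iff k = l."""
    pairs = row_pairs(A.shape[0])
    P = np.empty((len(pairs), len(pairs)))
    for m, (i, j) in enumerate(pairs):
        for n, (k, l) in enumerate(pairs):
            delta = 1.0 if k == l else 0.0
            P[m, n] = (1.0 - delta / 2.0) * (A[i, k] * A[j, l] + A[i, l] * A[j, k])
    return P

# Made-up A and Xi (same illustrative values as before).
A = np.array([[0.6, 0.3], [0.0, 0.5]])
Xi = np.array([[1.0, 0.2], [0.2, 0.8]])

P = P_matrix(A)
sigma_star = np.linalg.solve(np.eye(P.shape[0]) - P, half_vec(Xi))   # Eq. (7)
Sigma = solve_discrete_lyapunov(A, Xi)                               # Eq. (5)
print(np.allclose(sigma_star, half_vec(Sigma)))                      # True
```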
∂P/∂a_{ij} is calculated as follows:

(\partial P / \partial a_{ij})_{mn} = \beta a_{kl},   (13)

where

m = i(i - 1)/2 + k for k ≤ i, and m = k(k - 1)/2 + i for k > i,   (14)

n = j(j - 1)/2 + l for l ≤ j, and n = l(l - 1)/2 + j for l > j,   (15)

β = 2 if k = i, and 1 if k ≠ i, and all elements of ∂P/∂a_{ij} that are not specified by Eqs. (14) and (15) are equal to 0.

The proposed model in Eq. (5) may be tested against an unconstrained alternative by means of the maximum-likelihood ratio statistic. The minimum of Eq. (6) under the unconstrained alternative is

F_0 = \log|S| + r.

Letting F_1 be the minimum of Eq. (6) under the proposed model, then

\chi^2 = N(F_1 - F_0) = N(F_1 - \log|S| - r),
with degrees of freedom equal to the number of variances and covariances in S minus the number of estimated parameters in A and Ξ.

A Structural Equation Model Approach to the Specification of the Model

The specification of particular models will be made more convenient if we distinguish between exogenous and endogenous variables. Suppose that we have in mind a model that has the following static form,

B Y = \Gamma X + U,

where Y is a vector of p endogenous variables, X is a vector of q exogenous variables, U is a vector of p residual variables, B is a p × p matrix of coefficients of the relationships among the endogenous variables with ones on the diagonal, and Γ is the p × q matrix of the coefficients of the relationships of the endogenous variables to the exogenous variables. The dynamic analogue of this model is

Y_t = B Y_{t-1} + \Gamma X_{t-1} + U_t,   (16)

X_t = C X_{t-1} + V_t,   (17)

where now the diagonal of B contains the autoregression coefficients of the endogenous variables, and C is a diagonal matrix of the autoregression coefficients of the exogenous variables. Putting these equations into matrix form we get

\begin{pmatrix} Y_t \\ X_t \end{pmatrix} =
\begin{pmatrix} B & \Gamma \\ 0 & C \end{pmatrix}
\begin{pmatrix} Y_{t-1} \\ X_{t-1} \end{pmatrix} +
\begin{pmatrix} U_t \\ V_t \end{pmatrix},   (18)
which is the form of Eq. (1). Letting

\Sigma_{yy} = (1/N) Y Y',                   \Psi_{uu} = (1/N) U U',
\Sigma_{xy} = (1/N) X Y' = \Sigma_{yx}',    \Psi_{uv} = (1/N) U V' = \Psi_{vu}',
\Sigma_{xx} = (1/N) X X',                   \Psi_{vv} = (1/N) V V',

then

\begin{pmatrix} \Psi_{uu} & \Psi_{uv} \\ \Psi_{vu} & \Psi_{vv} \end{pmatrix} =
\begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} -
\begin{pmatrix} B & \Gamma \\ 0 & C \end{pmatrix}
\begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix}
\begin{pmatrix} B & \Gamma \\ 0 & C \end{pmatrix}',   (19)
which is in the form of Eq. (5). Obtaining maximum-likelihood estimates of the elements of B, Γ, C, Ψ_{uu}, Ψ_{uv}, and Ψ_{vv} amounts to, then, assembling these matrices as elements of a combined matrix A and a combined matrix Ξ; assembling Σ_{yy}, Σ_{xy}, and Σ_{xx} into a combined variance-covariance matrix Σ; and then choosing values for the elements of the combined parameter matrices that minimize Eq. (6).

The characteristic roots of A will be the characteristic roots of B and C, since

|A - \lambda I| = |B - \lambda I|\,|C - \lambda I|,

from which it also follows that the diagonal elements of C must be nonzero and have absolute value less than 1. The characteristic roots of B depend, of course, on the particular specification with regard to the relationships among the endogenous variables. If, for instance, the relationships among the endogenous variables were "fully recursive," then these variables might be so ordered that B is triangular, and thus the characteristic roots in this case would be the diagonal elements of B. Note that this implies that recursive dynamic models of the type described in this paper will not be oscillating across time, since complex roots will not be possible.
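This block structure is easy to confirm numerically. The sketch below uses invented coefficient values; it checks that the roots of the combined A are exactly the roots of B together with the diagonal of C.

```python
import numpy as np

# Made-up example with p = 2 endogenous and q = 2 exogenous variables.
B = np.array([[0.7, 0.2],
              [0.1, 0.6]])
Gamma = np.array([[0.4, 0.0],
                  [0.0, 0.3]])
C = np.diag([0.5, 0.8])

A = np.block([[B, Gamma],
              [np.zeros((2, 2)), C]])

roots_A = np.sort_complex(np.linalg.eigvals(A))
roots_B_and_C = np.sort_complex(np.concatenate([np.linalg.eigvals(B), np.diag(C)]))
print(np.allclose(roots_A, roots_B_and_C))   # True
```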
FIGURE 1. [Path diagram of the static model: exogenous variables x_1 and x_2, endogenous variables y_1 through y_4, and residuals u_1 through u_4.]

FIGURE 2. [Path diagram of the corresponding dynamic model, adding an autoregression path from each variable at time t - 1 to the same variable at time t.]
Suppose that we have specified the relationships among four endogenous variables and two exogenous variables illustrated in the path diagram in Fig. 1. And further suppose that we wish to estimate autoregression parameters for each of the variables; in other words, we wish to specify the dynamic model illustrated in the path diagram in Fig. 2 and estimate its parameters in cross-sectional data. The equations represented by the model in Fig. 2, written in the matrix form of Eq. (18), imply, together with Fig. 2, that

A = \begin{pmatrix}
a_1 & 0   & 0   & 0   & \gamma_1 & 0 \\
b_1 & a_2 & b_2 & 0   & 0        & 0 \\
0   & b_3 & a_3 & b_4 & 0        & 0 \\
0   & 0   & 0   & a_4 & 0        & \gamma_2 \\
0   & 0   & 0   & 0   & c_1      & 0 \\
0   & 0   & 0   & 0   & 0        & c_2
\end{pmatrix}

and

\Xi = \begin{pmatrix}
\psi_{u_1}     &                &            &            &                & \\
0              & \psi_{u_2}     &            &            &                & \\
0              & \psi_{u_2 u_3} & \psi_{u_3} &            &                & \\
\psi_{u_1 u_4} & 0              & 0          & \psi_{u_4} &                & \\
0              & 0              & 0          & 0          & \psi_{v_1}     & \\
0              & 0              & 0          & 0          & \psi_{v_1 v_2} & \psi_{v_2}
\end{pmatrix},
where the redundant portion of Ξ has been omitted. With four endogenous and two exogenous variables we have 6(6 + 1)/2 = 21 equations in the matrix Eq. (19) and 21 unknowns in A and Ξ; therefore we have zero degrees of freedom and the model appears to be exactly identified. The characteristic roots of A are a_1, a_4, c_1, c_2, and

(1/2)\{a_2 + a_3 \pm [(a_2 - a_3)^2 + 4 b_2 b_3]^{1/2}\}.

If 4 b_2 b_3 < -(a_2 - a_3)^2, then two of the roots will be complex conjugates and the system will be oscillatory.
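A numerical sketch of this example follows; the coefficient values are invented for illustration, with b_2 b_3 taken negative so that the quadratic pair is complex.

```python
import numpy as np

# Invented coefficient values for the model of Figs. 1 and 2.
a1, a2, a3, a4 = 0.8, 0.6, 0.5, 0.7
b1, b2, b3, b4 = 0.3, 0.4, -0.5, 0.2
g1, g2 = 0.5, 0.4
c1, c2 = 0.6, 0.9

A = np.array([[a1, 0,  0,  0,  g1, 0 ],
              [b1, a2, b2, 0,  0,  0 ],
              [0,  b3, a3, b4, 0,  0 ],
              [0,  0,  0,  a4, 0,  g2],
              [0,  0,  0,  0,  c1, 0 ],
              [0,  0,  0,  0,  0,  c2]])

roots = np.linalg.eigvals(A)
print(np.round(roots, 3))

# Quadratic pair: (1/2)*(a2 + a3 +/- sqrt((a2 - a3)**2 + 4*b2*b3));
# it is complex (oscillatory) exactly when 4*b2*b3 < -(a2 - a3)**2.
disc = (a2 - a3) ** 2 + 4 * b2 * b3
print("oscillatory" if disc < 0 else "non-oscillatory", round(disc, 3))
```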
CONCLUSION

In this paper I have proposed a method for the estimation of parameters of first-order dynamic models in cross-sectional data. I must emphasize, however, that this is not accomplished without cost. In essence the investigator must "trade off" some a priori information against the failure to produce data at a second point in time. First, we must assume that the coefficient matrix A is constant across time. This assumption is usually relaxed when data are available at several points in time (for example, see Jöreskog and Sörbom, 1976). The second type of a priori information required is in the form of identifying restrictions. The necessary condition of identification is degrees of freedom greater than or equal to zero. And to achieve this in the type of models described here one must make stronger claims than those usually required for comparable static models. And the nature of these claims is such that identification can be reached only in the context of simultaneous equation models; thus the possibility does not even exist to identify dynamic models that are analogous to the bivariate or multivariate regression static models (sufficient, that is, to estimate their
parameters in cross-sectional data), since they do not contain overidentifying restrictions that can be "used," so to speak, to identify the dynamic parameters.

In general, a priori information leading to overidentifying restrictions in structural models can be used in many ways. It may be used to identify measurement errors in variables, correlation of residual terms, and nonrecursive feedback, as well as dynamic parameters, as described in this paper. Again, this occurs only in the context of simultaneous structural equation models. I believe, therefore, that the solution to many of our analytical or methodological problems is better theoretical specification. Simply including an exogenous variable in the model that is left out of all but one equation will identify all sorts of possibilities; in fact, with many such "instrumental" variables in the model, the problem becomes one of deciding which to estimate: measurement error parameters (see Blalock, 1969; several indicators of a variable are another way of estimating measurement error parameters, and would release overidentifying restrictions for other uses; see Jöreskog, 1973, for example), nonrecursive feedback parameters, correlations among residuals, or, as I have indicated in this paper, dynamic parameters.

APPENDIX I
Suppose that

\Sigma - A \Sigma A' = \Xi,

where Σ and Ξ are positive definite symmetric matrices of order r, and A is an r by r square matrix of coefficients. Then let σ* be an r(r + 1)/2 by 1 column vector of the lower-left nonredundant elements of Σ, stored row-wise, and ξ* be a similarly dimensioned column vector of the lower-left nonredundant elements of Ξ, stored row-wise; then an r(r + 1)/2 by r(r + 1)/2 square matrix P exists such that

\sigma^* - P \sigma^* = \xi^*,

and therefore

\sigma^* = (I - P)^{-1} \xi^*.

The elements of P are related to the elements of A in the following manner (where p_{mn} is the m,nth element of P, and a_{ij} is the ijth element of A):

p_{mn} = (1 - \delta/2)(a_{ik} a_{jl} + a_{il} a_{jk}),

where δ = 1 if k = l, and 0 if k ≠ l,
and

m = i(i - 1)/2 + j,
n = k(k - 1)/2 + l.

The simplest way of determining the values of i, j, k, and l, given values of m and n, is to construct a table:

m or n = 1 2 3 4 5 6 7 8 9 ...
i or k = 1 2 2 3 3 3 4 4 4 ...
j or l = 1 1 2 1 2 3 1 2 3 ...

or, alternatively,

              j or l
               1    2    3    4   ...
i or k   1     1
         2     2    3
         3     4    5    6
         4     7    8    9   10
        ...
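The same mapping is easy to code directly; the small helper below is illustrative only and uses the 1-based convention of the table.

```python
def pair_from_index(m):
    """Return (i, j), 1-based, with m = i*(i - 1)/2 + j and 1 <= j <= i."""
    i = 1
    while i * (i + 1) // 2 < m:
        i += 1
    j = m - i * (i - 1) // 2
    return i, j

def index_from_pair(i, j):
    """Inverse mapping: m = i*(i - 1)/2 + j, assuming j <= i."""
    return i * (i - 1) // 2 + j

print([pair_from_index(m) for m in range(1, 10)])
# [(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)]
```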
Suppose that A is the following 2 by 2 matrix,

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix};

then

P_2 = \begin{pmatrix}
a_{11}^2      & 2 a_{11} a_{12}               & a_{12}^2 \\
a_{11} a_{21} & a_{11} a_{22} + a_{12} a_{21} & a_{12} a_{22} \\
a_{21}^2      & 2 a_{21} a_{22}               & a_{22}^2
\end{pmatrix}.
Lower-order P matrices are always upper-left submatrices in all higher-order P matrices. Thus, if

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},

then

P_3 = \begin{pmatrix}
a_{11}^2     & 2a_{11}a_{12}               & a_{12}^2     & 2a_{11}a_{13}               & 2a_{12}a_{13}               & a_{13}^2 \\
a_{11}a_{21} & a_{11}a_{22} + a_{12}a_{21} & a_{12}a_{22} & a_{11}a_{23} + a_{13}a_{21} & a_{12}a_{23} + a_{13}a_{22} & a_{13}a_{23} \\
a_{21}^2     & 2a_{21}a_{22}               & a_{22}^2     & 2a_{21}a_{23}               & 2a_{22}a_{23}               & a_{23}^2 \\
a_{11}a_{31} & a_{11}a_{32} + a_{12}a_{31} & a_{12}a_{32} & a_{11}a_{33} + a_{13}a_{31} & a_{12}a_{33} + a_{13}a_{32} & a_{13}a_{33} \\
a_{21}a_{31} & a_{21}a_{32} + a_{22}a_{31} & a_{22}a_{32} & a_{21}a_{33} + a_{23}a_{31} & a_{22}a_{33} + a_{23}a_{32} & a_{23}a_{33} \\
a_{31}^2     & 2a_{31}a_{32}               & a_{32}^2     & 2a_{31}a_{33}               & 2a_{32}a_{33}               & a_{33}^2
\end{pmatrix}
and we see that P_2 is an upper-left 3 by 3 submatrix in P_3.

Let A be partitioned so

A = \begin{pmatrix} B & \Gamma \\ 0 & C \end{pmatrix},
where B is a p by p matrix of coefficients, Γ is a p by q matrix of coefficients, and C is a q by q diagonal matrix of coefficients. A will be partitioned in this manner if we distinguish between endogenous and exogenous variables. If we further decide to focus on "recursive" structural models, then the endogenous variables may be so ordered that B is an upper triangular matrix with zeros below the diagonal, that is, b_{ij} = 0 for i > j. Then A will be upper triangular, and P as well. Therefore,

|A| = \prod_{i=1}^{p} b_{ii} \prod_{j=1}^{q} c_{jj},

and, since p_{mm} = a_{ii} a_{jj},

|P| = |A|^{r+1}.
We may conclude that, in this case, P is nonsingular if and only if A is nonsingular. Moreover, since we have assumed that the characteristic roots of A are all less than 1 in modulus, in the triangular case the diagonal elements of A are necessarily less than 1 in absolute value, and it follows therefore that I - P is nonsingular as well (which must be established in order to calculate σ*). A generalization of these results to the nontriangular case has not been worked out.

REFERENCES

Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, John Wiley and Sons, New York.
Blalock, H. M., Jr. (1969), "Multiple indicators and the causal approach to measurement error," American Journal of Sociology 75, 264-272.
Heise, D. R. (1975), Causal Analysis, John Wiley and Sons, New York.
Jöreskog, K. G. (1973), "A general method of estimating a linear structural equation system," in Structural Equation Models in the Social Sciences (A. S. Goldberger and O. D. Duncan, Eds.), Seminar Press, New York.
Jöreskog, K. G., and Goldberger, A. S. (1975), "Estimation of a model with multiple indicators and multiple causes of a single latent variable," Journal of the American Statistical Association 70, 631-639.
Jöreskog, K. G., and Sörbom, D. (1976), "Statistical models and methods for analysis of longitudinal data," in Latent Variables in Socioeconomic Models (D. J. Aigner and A. S. Goldberger, Eds.), North-Holland, Amsterdam.
Miller, K. S. (1968), Linear Difference Equations, W. A. Benjamin, New York.
Zeeman, E. C. (1976), "Catastrophe theory," Scientific American 234, 65-84.