P. R. Krishnaiah and L. N. Kanal, eds., Handbook of Statistics, Vol. 2 ©North-Holland Publishing Company (1982) 747-771
Multivariate Analysis with Latent Variables*
P. M. Bentler and D. G. Weeks

*Preparation of this chapter was facilitated by a Research Scientist Development Award (K02DA00017) and a research grant from the U.S. Public Health Service (DA01070).
1. Introduction
Data encountered in statistical practice often represent a mixture of variables based on several levels of measurement (nominal, ordinal, interval, and ratio), but the two major classes of variables studied with multivariate analysis are clearly discrete (nominal) and continuous (interval or ratio) (e.g., Anderson, 1958; Bishop, Fienberg, and Holland, 1975; Dempster, 1969; Goodman, 1978). Distinctions have been made in both domains between manifest and latent variables: manifest or measured variables are observable realizations of a statistical process, while latent variables are not observable, even in the population. Although latent variables are not observable, certain of their effects on manifest variables are observable and hence subject to study. The range of possible multivariate models is quite large in view of the fact that both manifest and latent variables can be discrete or continuous. This chapter will concentrate on models in which both manifest and latent variables are continuous; this restriction still generates a large class of models when they are considered simultaneously in several populations, and when certain variables are considered fixed rather than random. Historically, latent variable models are most closely identified with latent structure analysis (Lazarsfeld and Henry, 1968), mental test theory (Lord and Novick, 1968), and factor analysis (Anderson and Rubin, 1956; Lawley and Maxwell, 1971). Only the latter topic has any special relevance to this chapter. The field has progressed far beyond the simple factor analytic approach per se, to include such diverse areas as extensions of factor analysis to arbitrary covariance structure models (Bentler, 1976; Jöreskog, 1973; McDonald, 1978), path analysis (Tukey, 1954; Wright, 1934), simultaneous equation models (Geraci, 1977; Hausman, 1977; Hsiao, 1976), structural equation models (Aigner and Goldberger, 1977; Bentler and Weeks, 1980; Goldberger and Duncan, 1973; Jöreskog, 1977), errors in variables models, including multiple regression (Bhargava, 1977; Feldstein, 1974; Robinson, 1974), and studies of structural and
functional relations (Anderson, 1976; Gleser, 1979; Robinson, 1977). Although each of these areas of multivariate analysis utilizes concepts of latent variables, and although each has its own specialized approaches to the statistical problems involved, only recently has it been noted that certain basic ideas can serve a unifying function so as to generate a field of multivariate analysis with latent variables. This chapter focuses on those principles associated with model simplicity and generality as well as large sample consistent, normal, and efficient estimators of model parameters. Further references to some of the voluminous literature on theory and applications in the above domains can be found in Bentler (1980). In order to coherently define a field of multivariate analysis with latent variables, certain further limitations are imposed for convenience. We impose the restriction of linearity and structure in models, that is, we deal with models whose range of potential parameters, their interrelations, and their relations to variables are specified such that manifest variables (MVs) and latent variables (LVs) are linearly related to each other via explicit ('structured') matrix expressions. In view of the existence of an explicit parameter structure that generates the MVs, the first and second MV moments become explicitly structured in terms of the model parameters; hence, the label moment structure models would also be appropriate to define the field. As a consequence of this limitation, certain very general models are not considered in this chapter. For example, consider a model of the form
f_i(y_t, x_t, α_i) = u_it  (i = 1,…,j)    (1.1)
(Amemiya, 1977). In this nonlinear simultaneous equation system for the tth observation there are j equations, where y_t is a j-dimensional vector of dependent variables, x_t is a vector of independent variables, α_i is a vector of unknown parameters, and u_it is a disturbance whose j-dimensional vector has an independent and identically distributed multivariate normal distribution. It is the nonlinearity that excludes the current model from consideration, but some models described below may be considered as nonlinear in parameters provided this nonlinearity is explicit (see Jennrich and Ralston, 1978). Among structured linear models, the statistical problems involved in estimation and testing are furthermore considerably simplified if the assumption is made that the random variables associated with the models are multivariate normally distributed. While such an assumption is not essential, as will be shown, it guarantees that the first and second moments contain the important statistical information about the data.
1.1. Concept of analysis with latent variables

Multivariate analysis with latent variables generally requires more theoretical specification than multivariate analysis with manifest variables. Latent variables are hypothetical constructs invented by a scientist for the purpose of understanding a research area; generally, there exists no operational method for directly
measuring these constructs. The LVs are related to each other in certain ways as specified by the investigator's theory. When the relations among all LVs and the relation of all LVs to MVs are specified in mathematical form--here simply a simultaneous system of highly restricted linear structural equations--one obtains a model having a certain structural form and certain unknown parameters. The model purports to explain the statistical properties of the MVs in terms of the hypothesized LVs. The primary statistical problem is one of optimally estimating the parameters of the model and determining the goodness-of-fit of the model to sample data on the MVs. If the model does not acceptably fit the data, the proposed model is rejected as a possible candidate for the causal structure underlying the observed variables. If the model cannot be rejected statistically, it may provide a plausible representation of the causal structure. Since different models typically generate different observed data, carefully specified competing models can be compared statistically. As mentioned above, factor analysis represents the structured linear model whose latent variable basis has had the longest history, beginning with Spearman (1904). Although it is often discussed as a data exploration method for finding important latent variables, its recent development has focused more on hypothesis testing as described above (Jöreskog, 1969). In both confirmatory and exploratory modes, however, it remains apparent that the concept of latent variable is a difficult one to communicate unambiguously. For example, Dempster (1971) considers linear combinations of MVs as LVs. Such a concept considers LVs as dependent variables. However, a defining characteristic of LV models is that the LVs are independent variables with respect to the MVs; that is, MVs are linear combinations of LVs and not vice-versa. There is a related confusion: although factor analysis is typically considered to be a prime method for dimension reduction, in fact it is just the opposite. If the MVs are drawn from a p-variate distribution, then LV models can be defined by the fact that they describe a (p + k)-variate distribution (see Bentler, 1982). Although fewer than p of the LVs are usually considered important, it is inappropriate to focus only on these LVs. In factor analysis the k common factors are usually of primary interest, but the p unique factors are an equally important part of the model. It should not surprise the reader to hear that the concept of drawing inferences about (p + k) variates based on only p MVs has generated a great deal of controversy across the years. While the MVs are uniquely defined, given the hypothetical LVs, the reverse can obviously not be true. As a consequence, the very concept of LV modeling has been questioned. McDonald and Mulaik (1979), Steiger and Schönemann (1978), Steiger (1979), and Williams (1978) review some of the issues involved. Two observations provide a positive perspective on the statistical use of LV multivariate analysis (Bentler, 1980). Although there may be interpretive ambiguity surrounding the true 'meaning' of a hypothesized LV, it may be proposed that the statistical evaluation of models would not be affected. First, although an infinite set of LVs can be constructed under a given LV model to be consistent with given MVs, the goodness-of-fit of the LV model to data (as indexed, for example, by a χ² statistic) will be identical under all possible choices
of such LVs. Consequently, the process of evaluating the fit of a model to data and comparing the relative fit of competing models is not affected by LV indeterminacy. Thus, theory testing via LV models remains a viable research strategy in spite of LV indeterminacy. Second, although the problem has been conceptualized as one of LV indeterminacy, it equally well can be considered one of model indeterminacy. Bentler (1976) showed how LV and MV models can be framed in a factor analytic context to yield identical covariance structures; his proof is obviously relevant to other LV models. While the LVs and MVs have different properties, there is no empirical way of distinguishing between the models. Hence, the choice of representation is arbitrary, and the LV model may be preferred on the basis of the LVs' simpler theoretical properties. Models with MVs only can be generated from LV models. For example, traditional path analysis or simultaneous equation models can be executed with newer LV models. As a consequence, LV models are applicable to most areas of multivariate analysis as traditionally conceived, for example, canonical correlation, multivariate analysis of variance, and multivariate regression. While the LV analogues or generalizations of such methods are only slowly being worked out (e.g., Bentler, 1976; Jöreskog, 1973; Rock, Werts, and Flaugher, 1978), the LV approach in general requires more information to implement. For example, LVs may be related to each other via a traditional multivariate model (such as canonical correlation), but such a model cannot be evaluated without the additional imposition of a measurement model that relates MVs to LVs. If the measurement model is inadequate, the meaning of the LV relations is in doubt.
1.2. Importance of latent variable models

The apparent precision and definiteness associated with traditional multivariate analysis masks a hidden feature of MV models and methods that makes them less than desirable in many scientific contexts: their parameters are unlikely to be invariant over various populations and experimental contexts of interest. For example, the β weights of simple linear regression--a prototype of most MV models--can be arbitrarily increased or decreased in size by manipulation of the independent variables' reliabilities (or percent random error variance). Thus, questions about the relative importance of β_i or β_j cannot be answered definitively without knowledge of reliability (Cochran, 1970). While MV multivariate analysis is appropriate to problems of description and prediction, its role in explanation and causal understanding is somewhat limited. Since MVs only rarely correspond in a one-to-one fashion with the constructs of scientific interest, such constructs are best conceived as LVs that are in practice measured with imprecision. Consequently, certain conclusions obtained from an MV model cannot be relied upon, since various theoretical effects will of necessity be estimated in a biased manner. They will also not replicate in other studies that are identical except for the level of precision (or error) in the variables. Thus the main virtues of LV models are their ability to separate error from meaningful effects and the associated parametric invariance obtainable under various circumstances.
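As a small numerical illustration of this attenuation effect, the following sketch (all values invented) simulates a simple regression in which the predictor is measured with increasing amounts of random error; the estimated weight shrinks by roughly the predictor's reliability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 1.0
xi = rng.normal(size=n)                      # error-free predictor (an LV)
y = beta * xi + rng.normal(size=n)           # criterion

for err_var in (0.0, 0.5, 1.0):              # increasing unreliability
    x = xi + rng.normal(scale=np.sqrt(err_var), size=n)   # fallible MV
    c = np.cov(x, y)
    beta_hat = c[0, 1] / c[0, 0]             # simple regression weight
    rel = 1.0 / (1.0 + err_var)              # reliability of x
    print(f"reliability {rel:.2f}: beta_hat = {beta_hat:.3f}")
```

Here beta_hat tracks reliability × beta, which is why two MV regressions that differ only in measurement precision can yield different 'structural' conclusions.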
2. Moment structure models: A review

The simplest model to be considered is the classical common factor model

x = μ + Λξ + ε,    (2.1)
where x is a (p × 1) random vector of scores on p observed variables, μ is a vector of means, Λ is a (p × k) matrix of structural coefficients (factor loadings), ξ is a (k × 1) random vector of factor scores, and ε is a (p × 1) random vector of residual or unique scores. When the model is written for only a single population, as in (2.1), the means vector μ may be suppressed without loss of generality. It is assumed that E(ξ) = 0 and E(ε) = 0. The ξ and ε represent p + k LVs, as mentioned previously, and x represents the vector of MVs. The covariances of the MVs are given by

Σ = ΛΦΛ′ + Ψ,    (2.2)
where Φ = E(ξξ′), Ψ = E(εε′), and E(ξε′) = 0. A further common assumption is that Ψ is diagonal, i.e., that the unique components of the LVs are uncorrelated. It is apparent that in this simple model there is no structure among the LVs. Rather, they are either independent or simply correlated. The MVs are functions of the LVs, rather than the reverse. Jöreskog (1969) provided the first successful applications of the factor analysis model considered as a hypothesis-testing LV model, based on the ideas of Anderson and Rubin (1956) and Lawley (1940). Multiple correlation with LVs as independent variables follows naturally from the model (see also Lawley and Maxwell, 1973). Bentler (1976) provided a parametrization for (2.1) and (2.2), such that Σ = Z(AΦA′ + I)Z with Z² = Ψ. The parameters A, but not Λ, are scale invariant. When model (2.1) is generalized to apply to several populations simultaneously, the means of the random vectors become an important issue. A model for factor analysis with structured means (Sörbom, 1974) generalizes confirmatory factor analysis. A random vector of observations for the gth group can be represented as

x^g = ν^g + Λ^g ξ^g + ε^g    (2.3)
with expectations E(ξ^g) = θ^g and E(x^g) = μ^g = ν^g + Λ^g θ^g, and covariance matrix Σ^g = Λ^g Φ^g Λ^g′ + Ψ^g. The group covariance matrices thus have confirmatory factor analytic representations, with factor loading parameters Λ^g, factor intercorrelations or covariances Φ^g, and unique covariance matrices Ψ^g. It is generally necessary to impose constraints on parameters across groups to achieve an identified model, e.g., Λ^g = Λ for all g. Similar models were considered by Jöreskog (1971) and Please (1973) in the context of simultaneous factor analysis in several populations. In a model such as (2.3), there is an interdependence of first and second moment parameters. The MVs' means are decomposed into basic parameters that may also affect the
covariance structure. These models are particularly appropriate to studies of multiple populations or groups of subjects, and to the analysis of experimental data. Jöreskog (1970) proposed a model for the analysis of covariance structures that may be considered as a confirmatory second-order factor analytic model that allows structured means. The model can be developed from the point of view of a random vector model

x = μ + BΛξ + Bζ + ε,    (2.4)
or as a model for n observations on p variables with data matrix X (Jöreskog, 1973). The variables have expectation E(X) = AΞP, where Ξ is a parameter matrix and A and P are known matrices, and have the covariance structure

Σ = B(ΛΦΛ′ + Ψ²)B′ + Θ².    (2.5)
The covariance structure thus decomposes the factor intercorrelation matrix of first-order factors by a further confirmatory factor analytic structure. The model is extremely general, containing such special cases as the models of Bock and Bargmann (1966) and Wiley, Schmidt, and Bramble (1973), as well as the MIMIC model of Jöreskog and Goldberger (1975) and numerous other models such as MANOVA or patterned covariance matrices (Browne, 1977). This model introduces the idea of higher order LVs, which was first explicated by Thurstone (1947). In such a model some LVs have no 'direct' effect on MVs. Model (2.4) allows two levels of LVs for the common factors, namely ξ and ζ, as well as unique LVs ε as in model (2.1). These LVs may be interpreted as follows. The unique factors ε and the common factors ζ are seen to affect the MVs directly, via the parameter matrix B in the latter case. However, the common factors ξ affect the MVs indirectly via the product of parameter matrices BΛ. As before, there are more LVs than MVs. As a covariance structure model, the model has had several applications (Jöreskog, 1978), but it has seen few applications as a model with interdependent first and second moment parameters. A model developed by Bentler (1976) can be written for the gth population (g = 1,…,m) as

x^g = μ^g + ∑_j (∏_{i=1}^{j} Λ_i^g) ξ_j^g,    (2.6)

where the notation ∏_{i=1}^{j} Λ_i^g refers to the matrix product Λ_1^g Λ_2^g ⋯ Λ_j^g, and the (p × n) random matrix x^g of n observations on p variables has expectation

E(x^g) = μ^g = T^g Ξ^g U^g + V^g Ω^g W^g.    (2.7)

If n = 1, the model can be considered to be a random vector model with random vectors ξ_j^g; frequently these vectors are unobserved LVs such as factors. The
parameters of the model are T^g, Ξ^g, Ω^g, and the matrices Λ_i^g, while U^g, V^g, and W^g are known constant matrices. In some important applications the T^g can be written as functions of the Λ_i^g. The columns of x^g are independently distributed with covariance matrix Σ^g. For simplicity it may be assumed that the ξ_j^g have covariance matrices Φ_j^g and are independent of ξ_{j′}^g, where j ≠ j′. It follows that

Σ^g = ∑_j (∏_{i=1}^{j} Λ_i^g) Φ_j^g (∏_{i=1}^{j} Λ_i^g)′.    (2.8)
It is apparent that this model introduces LVs of arbitrarily high order, while allowing for an interdependence between first and second moment parameters. Alternatively, in the case of a single population one may write model (2.8) as Σ = Λ_1Λ_2 ⋯ Λ_k Φ Λ_k′ ⋯ Λ_2′Λ_1′, where Φ is block-diagonal containing all of the Φ_j matrices in (2.8) (McDonald, 1978). It is possible to write Tucker's (1966) generalization of principal components to three 'modes' of measurement via the random vector form as x = (A⊗B)Γξ, where x is a (pq × 1) vector of observations; A, B, and Γ are parameter matrices of order (p × a), (q × b), and (ab × c) respectively, and ξ is of order (c × 1). The notation (A⊗B) refers to the right Kronecker product of matrices (A⊗B) = [a_ij B]. Bentler and Lee (1979a) have considered an extended factor analytic version of this model as

x = (A⊗B)Γξ + ζ,    (2.9)

where ζ is a vector of unique factors, and they developed statistical properties of the model in both exploratory and confirmatory contexts. The covariance matrix of (2.9) is given by

Σ = (A⊗B)ΓΦΓ′(A′⊗B′) + Z²,    (2.10)
where E ( ~ ' ) = ~, E ( ~ " ) = Z 2, and E ( ~ ' ) = 0 . A more specialized version of (2.10) is described by Bentler and Lee (1978a). These models are applicable to situations in which MVs are completely crossed, as in an experimental design, and the LVs ~ and ~" are presumed to have the characteristics of the ordinary factor model (2.1). Krishnaiah and Lee (1974) studied the more general Kronecker covariance structure ~=(G1®271)+ . . . +(GkQ~k), where the Gi are known matrices. This model has applications in testing block-sphericity and blockintraclass correlation, but structured latent random variable interpretations of this model have not yet been investigated. An extremely powerful model growing out of the econometric literature is the structural equation system with a measurement structure developed by Keesling (1972), Wiley (1973), and J0reskog (1977). In this model, there are two sets of
observed random variables having the measurement structure
x = Λ_x ξ + δ  and  y = Λ_y η + ε,    (2.11)
where ξ and η are latent random variables and δ and ε are vectors of errors of measurement that are independent of each other and of the LVs. All vectors have as expectations the null vector. The measurement model (2.11) is obviously a factor-analytic type of model, but the latent variables are furthermore related by a linear structural matrix equation

η = B*η + Γξ + ζ,    (2.12)
where E(ξζ′) = 0 and B = (I − B*) is nonsingular. It follows that Bη = Γξ + ζ, which is the form of simultaneous equations preferred by Jöreskog (1977). The matrices B* and Γ contain structural coefficients for predicting η's from other η and from ξ LVs. Consequently,
y = Λ_y B⁻¹(Γξ + ζ) + ε.    (2.13)
The covariance matrices of the observed variables are given by
Σ_xx = Λ_x Φ Λ_x′ + Θ_δ,    Σ_yx = Λ_y B⁻¹ Γ Φ Λ_x′,  and
Σ_yy = Λ_y B⁻¹(ΓΦΓ′ + Ψ)B′⁻¹Λ_y′ + Θ_ε,    (2.14)

where Φ = E(ξξ′), Ψ = E(ζζ′), Θ_δ = E(δδ′), and Θ_ε = E(εε′). Models of a similar nature have been considered by Hausman (1977), Hsiao (1976), Geraci (1977), Robinson (1977), and Wold (1980), but the Jöreskog-Keesling-Wiley model, also known as LISREL, has received the widest attention and application. The model represented by (2.11)-(2.14) represents a generalization of econometric simultaneous equation models. When variables have no measurement structure (2.11), the simultaneous equation system (2.12) can generate path analytic models, multivariate regression, and a variety of other MV models. Among such models are recursive and nonrecursive structures. Somewhat paradoxically, nonrecursive structures are those that allow true 'simultaneous' or reciprocal causation between variables, while recursive structures do not. Recursivity is indicated by a triangular form for the parameters B* in (2.12). Recursive structures have been favored as easier to interpret causally (Strotz and Wold, 1960), and they are less difficult to estimate. Models with a measurement structure (2.11) allow recursivity or nonrecursivity at the level of LVs. When the covariates in analysis of covariance (ANCOVA) are fallible--the usual case--it is well known that ANCOVA does not make an accurate adjustment for the effects of the covariate (e.g., Lord, 1960). A procedure for analysis of covariance with a measurement model for the observed variables was developed by Sörbom (1978). This model is a multiple-population structural equation
model with measurement error, with

x^g = ν_x + Λ_x ξ^g + δ^g,  y^g = ν_y + Λ_y η^g + ε^g,    (2.15)
where common factor LVs are independent of error of measurement LVs. The vector of criterion variables in the gth group is y^g, and x^g is the vector of covariates. The latent variables η^g and ξ^g are related by

η^g = α^g + Γ^g ξ^g + ζ^g,    (2.16)

where α^g provides the treatment effect. Consequently,

y^g = ν_y + Λ_y α^g + Λ_y(Γ^g ξ^g + ζ^g) + ε^g.    (2.17)
The expected values of the latent variables are E(ξ^g) = μ_ξ^g and E(η^g) = μ_η^g. Consequently one obtains the expectations E(x^g) = μ_x^g = ν_x + Λ_x μ_ξ^g and E(y^g) = μ_y^g = ν_y + Λ_y μ_η^g. Then, rewriting (2.15) and (2.17), one obtains

x^g = μ_x^g + Λ_x ξ^g* + δ^g,  y^g = μ_y^g + Λ_y(Γ^g ξ^g* + ζ^g) + ε^g,    (2.18)

where ξ^g = μ_ξ^g + ξ^g* and the expected values of ξ^g*, ζ^g, ε^g, and δ^g are null vectors. The covariances of (2.18) are taken to be

Σ_xx^g = Λ_x Φ^g Λ_x′ + Θ_δδ^g,    Σ_xy^g = Λ_x Φ^g Γ^g′ Λ_y′ + Θ_δε^g,  and
Σ_yy^g = Λ_y(Γ^g Φ^g Γ^g′ + Ψ^g)Λ_y′ + Θ_εε^g,    (2.19)
where the covariance matrices of the ξ^g* and ζ^g are given by Φ^g and Ψ^g, and where the various Θ^g matrices represent covariances among the errors. The structural equation models described above have been conceptualized as limited in their application to situations involving latent structural relations in which the MVs are related to LVs by a first-order factor analytic model. Causal relations involving higher-order factors, such as 'general intelligence', have not been considered. Weeks (1978) has developed a comprehensive model that overcomes this limitation. The measurement model for the gth population is given by

x = μ_x + Λ_x ξ  and  y = μ_y + Λ_y η,    (2.20)
where the superscript g for the population has been omitted for simplicity of notation. The components of the measurement model (2.20) are of the form (2.6) and (2.7), but they are written in supermatrix form. For example, Λ_x = [Λ_x^1 ⋯ Λ_x^k, Λ_x^1 ⋯ Λ_x^{k−1}, …, Λ_x^1] and, similarly, ξ′ = [ξ^k′, ξ^{k−1}′, …, ξ^1′], where the
superscripts 1,…,k represent measurement levels. The lowest level corresponds to unique factors, which may be correlated; levels higher than k = 3, allowing second-order common factor LVs, will not often be needed. The latent variables at all levels are related by a structural matrix equation for the gth population (superscript omitted)

T_y η = α + B_r T_y η + Γ T_x ξ + ζ,    (2.21)

where E(ξ) = μ_ξ, E(η) = μ_η, E(ζ) = 0, α is a regression intercept vector, and ζ is a multivariate residual. The multivariate relation (2.21) is specified such that if T_y = I and T_x = I, the equations relate the variables η and ξ, which may be considered to be latent factors orthogonalized across factor levels. On the other hand, if T_y and/or T_x are specified in terms of various Λ^j matrices, (2.21) involves regression among the corresponding primary factors. That is, T_x is structured such that [T_x ξ]′ = [π^k′, …, π^j′, …, π^1′], where π^j is either an orthogonalized factor ξ^j or else a primary factor τ^j, which can be expressed by a factor model such as τ^2 = Λ^3 ξ^3 + ξ^2. The matrices B_r and Γ represent coefficients, as before, but in most instances B_r will be block-diagonal (thus not allowing cross-level regressions among the η's). The covariance matrix generated by (2.20) and (2.21) is represented by
Σ_xx = Λ_x Φ Λ_x′,    Σ_yx = Λ_y T_y⁻¹ B⁻¹ Γ T_x Φ Λ_x′,  and
Σ_yy = Λ_y T_y⁻¹ B⁻¹(Γ T_x Φ T_x′ Γ′ + Ψ)B′⁻¹ T_y′⁻¹ Λ_y′,    (2.22)
where E((ξ − μ_ξ)(ξ − μ_ξ)′) = Φ, E(ζζ′) = Ψ, B = (I − B_r), and where Φ and Ψ are typically block-diagonal. Although (2.22) has a relatively simple representation due to the supermatrix notation, quite complex models are subsumed by it. It may be noted that (2.22) is similar in form to the Jöreskog-Keesling-Wiley structure (2.19), but the matrices involved are supermatrices and one has the flexibility of using primary or multilevel orthogonalized factors in structural relations. See Bentler (1982) for a further discussion and Weeks (1980) for an application of higher-order LVs, and Bentler and Weeks (1979) for algebraic analyses that evaluate the generality and specialization possible among models (2.1)-(2.22). It is apparent that a variety of LV models exist, and that the study of higher-level LVs and more complex causal structures has typically been associated with increasingly complex mathematical representations. It now appears that arbitrarily complex models can be handled by very simple representations, based on the idea of classifying all variables, including MVs and LVs, into independent or dependent sets. As a consequence, a coherent field of multivariate analysis with latent variables can be developed, based on linear representations that are not more complex than those of traditional multivariate analysis.
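To make the structures reviewed in this section concrete, the following sketch (all values invented) evaluates the Jöreskog-Keesling-Wiley covariance blocks (2.14) for a toy system with one exogenous and one endogenous LV, each measured by two indicators.

```python
import numpy as np

Lx = np.array([[1.0], [0.8]])          # Lambda_x: 2 indicators of xi
Ly = np.array([[1.0], [0.9]])          # Lambda_y: 2 indicators of eta
Bstar = np.array([[0.0]])              # B*: no eta -> eta paths here
Gam = np.array([[0.5]])                # Gamma: xi -> eta coefficient
Phi = np.array([[1.0]])                # cov(xi)
Psi = np.array([[0.3]])                # cov(zeta)
Td = 0.2 * np.eye(2)                   # Theta_delta
Te = 0.2 * np.eye(2)                   # Theta_epsilon

Binv = np.linalg.inv(np.eye(1) - Bstar)
Sxx = Lx @ Phi @ Lx.T + Td                                  # Sigma_xx
Syx = Ly @ Binv @ Gam @ Phi @ Lx.T                          # Sigma_yx
Syy = Ly @ Binv @ (Gam @ Phi @ Gam.T + Psi) @ Binv.T @ Ly.T + Te
Sigma = np.block([[Syy, Syx], [Syx.T, Sxx]])
print(Sigma)
```

A model-fitting program would search the free elements of these matrices so that Sigma approaches the sample covariance matrix.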
3. A simple general model
We shall develop a complete linear relations model for LV multivariate analysis by considering separately a structural equation model and a selection model, then combining these parts into a single model. It has been shown that this model is capable of representing all of the models discussed in the preceding section (Bentler and Weeks, 1979, 1980).
3.1. The structural equation model
Consider the structural equation model

η = β₀η + γξ,    (3.1)
where η is an (m × 1) random vector of dependent variables, ξ is an (n × 1) random vector of independent variables, and where β₀ and γ are (m × m) and (m × n) parameter matrices governing the linear relations of all variables involved in the m structural equations. The parameters in γ represent weights for predicting dependent from independent variables, while the parameters in β₀ represent weights for predicting dependent variables from each other. Typically, but not necessarily, the diagonal of β₀ contains known zero weights. Letting β = (I − β₀), (3.1) yields the form βη = γξ. In general, the coefficients of (3.1) consist of (a) fixed parameters that are assigned given values, usually zero; (b) free parameters that are unknown; and (c) constrained parameters, such that for known constants w_i and w_j and any parameters θ_i and θ_j, w_iθ_i = w_jθ_j. These constraints, taken from Bentler and Weeks (1978), are more general than found in Jöreskog (1977) but more restricted than those of Robinson (1977). Eq. (3.1) is similar to (2.12), but it omits the residual variates ζ. This difference is extended in the very conceptualization of the random variables η and ξ. In the Jöreskog-Keesling-Wiley model, the simultaneous eq. (2.12) relates only latent variables. Eq. (3.1), on the other hand, relates all variables within the theoretical linear system under consideration, whether manifest or latent. Each variable in the system is categorized as belonging either to the vector η or ξ: it is included as one of the m dependent variables in the system if that variable is ever considered to be a dependent variable in any structural equation, and it is considered as one of the n independent variables in the system otherwise. Independent or nondependent variables are explanatory predictor variables that may be nonorthogonal. The vector η consists of all manifest variables of the sort described in (2.11), namely those variables that are presumed to have a factor analytic decomposition. In addition, η contains those latent variables or unmeasured (but measurable) variables that are themselves linear functions of other variables, whether manifest or latent. As a consequence, η contains 'primary' common factors of any level that are decomposed into higher-order and residual, orthogonalized factors. Thus, we might define η′ = [y′, τ′], where the random vector y represents MVs that are dependent variables and τ represents all other LV dependent variables in the
system. Obviously, the vector η represents more than the 'endogenous' variables discussed in econometrics, and β₀ represents all coefficients for structural relations among dependent variables, including the coefficients governing the relation of lower-order factors to higher-order factors, excepting those residuals and the highest-order factors that are never dependent variables in any equation. The vector ξ contains those MVs and LVs that are not functions of other manifest or latent variables, and typically it will consist of three types of variables, ξ′ = [x′, ζ′, ε′], namely, the random vector x of MVs that are 'exogenous' variables as conceived in econometrics, residual LV variables ζ or orthogonalized factors, and errors of measurement or unique LV factors ε. Note that in a complete LV model, where every MV is decomposed into latent factors, there will be no 'x' variables. While the conceptualization of residual variables and errors of measurement as independent variables in a system is a novel one, particularly because these variables are rarely if ever under experimental control, this categorization of variables provides a model of great flexibility. In this approach, since γ represents the structural coefficients for the effects of all independent variables, the coefficients for residual and error independent variables are typically known, having fixed unit values.
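As a concrete instance of this bookkeeping (with invented numbers), the factor model (2.1) with p = 4 MVs and k = 1 common factor puts every MV into η and puts the common and unique factors into ξ; the unique-factor coefficients in γ are fixed at unity.

```python
import numpy as np

p, k = 4, 1
Lam = np.array([[0.9], [0.8], [0.7], [0.6]])     # free loadings (invented)

# eta (m = 4): the MVs.  xi (n = k + p = 5): common factor, then uniques.
beta0 = np.zeros((p, p))               # no MV predicts another MV here
gamma = np.hstack([Lam, np.eye(p)])    # [loadings | fixed unit weights]
print(gamma)                           # the (m x n) coefficient matrix of (3.1)
```

Each row of gamma is one structural equation, x_i = λ_i ξ + ε_i.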
3.2. The selection model

Since (3.1) includes measured and unmeasured variables, it is desirable to provide an explicit representation for the relation between the variables in (3.1) and the measured variables. We shall assume that this relation is given by
y = μ_y + G_y η  and  x = μ_x + G_x ξ,    (3.2)
where G_x and G_y are known matrices with zero entries except for a single unit in each row to select y from η and x from ξ. For definiteness we shall assume that there are p observed dependent variables and q observed independent variables. Vectors μ_y (p × 1) and μ_x (q × 1) are vectors of means. Letting z′ = [y′, x′], the selection model (3.2) can be written more compactly as
z = μ + Gv,    (3.3)
where μ′ = [μ_y′, μ_x′], v′ = [η′, ξ′], and G is a 2 × 2 supermatrix containing the rows [G_y, 0] and [0, G_x].
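For the same invented example, the selection matrices are trivial: all η's are observed MVs and no ξ is directly observed, so G_y is an identity matrix and G_x is empty.

```python
import numpy as np

m, n = 4, 5                            # dependent / independent variables
Gy = np.eye(m)                         # y = eta: each row selects one eta
Gx = np.zeros((0, n))                  # no xi is directly observed here

G = np.block([[Gy, np.zeros((m, n))],
              [np.zeros((0, m)), Gx]]) # rows [Gy, 0] and [0, Gx]
print(G.shape)                         # (p + q) x (m + n) = (4, 9)
```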
3.3. The complete model

We assume that the expected values of (3.3) are given by

E(z) = μ + GTΞU,    (3.4)

where E(v) = μ_v = TΞU, with T and Ξ being parameter matrices of fixed, free, or constrained elements and with U being a known vector. The use of means that are
structured in terms of other parameters is useful in several applications (e.g., Jöreskog, 1973; Sörbom, 1978), but this topic will not be pursued here. Combining (3.1) with (3.2) yields the resulting expression y = μ_y + G_y β⁻¹γξ, where β = (I − β₀) is assumed to be nonsingular. The covariance matrix of the MVs is thus given by the matrix elements

Σ_yy = G_y β⁻¹ γ Φ γ′ β′⁻¹ G_y′,    Σ_yx = G_y β⁻¹ γ Φ G_x′,  and  Σ_xx = G_x Φ G_x′,    (3.5)
where Φ is the covariance matrix of the independent variables ξ. Eq. (3.5) may be more simply represented as
Σ = G(I − B₀)⁻¹ Γ Φ Γ′ (I − B₀)′⁻¹ G′ = G B⁻¹ Γ Φ Γ′ B′⁻¹ G′,    (3.6)
where Γ′ = [γ′, I], B₀ has rows [β₀, 0] and [0, 0], and B = I − B₀. The orders of the matrices in (3.6) are given by G (r × s), B (s × s), Γ (s × n), and Φ (n × n), where r = p + q and s = m + n. In general, a model of the form (3.1)-(3.6) can be formulated for each of several populations, and the equality of parameters across populations can be evaluated. Such a multiple-population model is relevant, for example, to factor analysis in several populations (e.g., Sörbom, 1974) or to the analysis of covariance with latent variables (Sörbom, 1978), but these developments will not be pursued here. We concentrate on a single population with the structure (3.4) and (3.6), with μ_v = 0. It is possible to drop the explicit distinction between dependent and independent variables (Bentler and Weeks, 1979). All structural relations would be represented in β₀, and all variables with a null row in β₀ would be independent variables. The matrix Φ will now be of order equal to the number of independent plus dependent variables. The rows and columns (including diagonal elements) of Φ corresponding to dependent variables will be fixed at zero. The model is simpler in terms of number of matrices:
Σ = G B⁻¹ Φ B′⁻¹ G′.    (3.7)
It can be obtained from (3.6) by setting Γ = I.
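The following sketch assembles (3.6) for the invented one-factor example and confirms that it reproduces the factor analytic structure (2.2); it then rebuilds the same covariance matrix in the three-matrix form (3.7) by enlarging Φ and moving all structural relations into β₀.

```python
import numpy as np

p, k = 4, 1
Lam = np.array([[0.9], [0.8], [0.7], [0.6]])
Psi = np.diag([0.19, 0.36, 0.51, 0.64])          # unique variances
m, n = p, k + p                                  # s = m + n = 9

Phi = np.zeros((n, n))                           # cov of xi
Phi[:k, :k] = np.eye(k)
Phi[k:, k:] = Psi

gamma = np.hstack([Lam, np.eye(p)])              # coefficients of (3.1)
Gamma = np.vstack([gamma, np.eye(n)])            # Gamma' = [gamma', I]
B = np.eye(m + n)                                # beta_0 = 0, so B = I
G = np.hstack([np.eye(m), np.zeros((m, n))])     # selection matrix (q = 0)

Sigma = G @ np.linalg.inv(B) @ Gamma @ Phi @ Gamma.T \
        @ np.linalg.inv(B).T @ G.T               # eq. (3.6)
print(np.allclose(Sigma, Lam @ Lam.T + Psi))     # True: matches (2.2)

# (3.7): null rows/cols of Phi for dependents; relations moved into beta_0
Phi_s = np.zeros((m + n, m + n))
Phi_s[m:, m:] = Phi
B0_s = np.zeros((m + n, m + n))
B0_s[:m, m:] = gamma                             # eta = gamma xi
Bs_inv = np.linalg.inv(np.eye(m + n) - B0_s)
Sigma2 = G @ Bs_inv @ Phi_s @ Bs_inv.T @ G.T     # eq. (3.7)
print(np.allclose(Sigma, Sigma2))                # True
```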
3.4. Comparison with alternative models

It is easy to demonstrate that model (3.1)-(3.6) incorporates the seemingly more complex model (2.20)-(2.22) developed by Weeks (1978). First, it should be noted that the measurement model (2.20) amounts to a nested series of linear equations. That is, letting μ = 0 with k = 3, (2.20) yields x = Λ_1Λ_2Λ_3ξ^3 + Λ_1Λ_2ξ^2 + Λ_1ξ^1. But this structure can be generated by the equations x = τ_0 = Λ_1τ_1 + 0, τ_1 = Λ_2τ_2 + ξ^1, and τ_2 = Λ_3ξ^3 + ξ^2; thus, it is possible to reparameterize Weeks' measurement model to yield simple linear equations in which all variables can be
classified as independent or dependent. Next, it should be noted that (2.21) represents a linear structure as well, where the T_x and T_y matrices simply serve to redefine, if desired, the variables in a given structural equation. In particular, these matrices allow a choice among primary or residual factors in the structure (2.21), which translates in the current model into a given definition for independent and dependent variables and their relationship via structural equations. Obviously the proposed model has a simpler mathematical form. Although the basic definitions underlying (3.1)-(3.6) are radically different from those of the Jöreskog-Keesling-Wiley model (2.11)-(2.14), it may be shown that when the current conceptualization is adopted under their mathematical structure, i.e., ignoring the explicit focus on a first-order factor analytic measurement model in (2.11) as well as their definitions of variables, the present model covariance structure (3.5) (but not the means structure (3.4)) is obtained. That is, if one takes G_y = Λ_y, β = B, γ = Γ, G_x = Λ_x, Ψ = 0, Θ_δ = 0, and Θ_ε = 0 in (2.14), one obtains model (3.5). Thus it is possible to use the model (2.14) to obtain applications that were not intended, such as linear structural relations among higher-order factors. However, model (3.5) contains only three matrices with unknown parameters while (2.14) contains eight. Consequently, the mathematics involved in application are simpler in the current representation, and the model is easier to communicate. The Geraci (1977) model is of the form (2.11)-(2.12), with Λ_y = I, Θ_ε = 0, and Λ_x = I. Consequently, it is not obvious how it could be reconceptualized to yield (3.5). Similarly, Robinson's (1977) model is of the form (2.11)-(2.12), with Λ_x = I, Λ_y = I, and ζ = 0. Although this model allows nonlinear constraints on parameters in the linear relation (2.12), it does not seem to be able to be redefined so as to yield (3.5) as conceptualized here. The problem of imposing arbitrary constraints on parameters in LV models has been addressed by Lee and Bentler (1980), and is discussed further below. Krishnaiah and Lee (1974) studied covariance structures of the form Σ = U_1Σ_1U_1′ + ⋯ + U_kΣ_kU_k′, where the U_i are known matrices and the Σ_i are unknown. This structure, which arises in cases such as the multivariate components of variance model, can be obtained from (3.6) by setting G = I, B = I, Γ = [U_1,…,U_k], and Φ as block-diagonal with elements Σ_i (see also Rao, 1971; Rao and Kleffe, 1980).
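A quick numerical check of this last mapping, with invented U_i and Σ_i:

```python
import numpy as np

U1, U2 = np.eye(3), np.ones((3, 1))    # known design matrices
S1 = np.diag([1.0, 2.0, 3.0])          # unknown Sigma_1 (invented values)
S2 = np.array([[0.5]])                 # unknown Sigma_2 (invented values)

Gamma = np.hstack([U1, U2])            # Gamma = [U_1, ..., U_k]
Phi = np.zeros((4, 4))                 # block-diagonal Phi
Phi[:3, :3], Phi[3:, 3:] = S1, S2
Sigma = Gamma @ Phi @ Gamma.T          # (3.6) with G = B = I
print(np.allclose(Sigma, U1 @ S1 @ U1.T + U2 @ S2 @ U2.T))   # True
```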
4. Parameter identification
LV models cannot be statistically tested without an evaluation of the identification problem. Identifiability depends on the choice of mathematical representation as well as the particular specification in a given application, and it refers to the uniqueness of the parameters underlying the distribution of MVs. A variety of general theoretical studies of identification have been made in recent years (Deistler and Seifert, 1978; Geraci, 1976; Monfort, 1978), but these studies are not very helpful to the applied researcher. While there exist known conditions that an observable process must satisfy in order to yield almost sure consistent
estimability, it remains possible to find examples showing that identifiability does not necessarily imply the existence of a consistent estimator (Gabrielsen, 1978) and that multiple solutions are possible for locally identified models (Fink and Mabee, 1978). In practice, then, it may be necessary to use empirical means to evaluate the situation: Jöreskog and Sörbom (1978) propose that a positive definite information matrix almost certainly implies identification, and McDonald and Krane (1977) state that parameters are unambiguously locally identified if the Hessian is nonsingular. Such a pragmatic stance implies that identification is, in practice, the handmaiden of estimation, a point of view also taken by Chechile (1977) from the Bayesian perspective. Although such an empirical stance may be adopted to evaluate complex models, it is theoretically inadequate. Identification is a problem of the population, independent of sampling considerations. Thus, data-based evaluations of identification may be incorrect, as shown in Monte Carlo work by McDonald and Krane (1979), who retract their earlier claims on unambiguous identifiability. In the case of model (3.1)-(3.6), identifiability depends on the choice of specification in a given application and refers to the uniqueness of parameters underlying the distribution of observed variables, specifically, the second moments (3.6) when U of (3.4) is null and there is no interdependence of means and covariance parameters. The moment structure model (3.6) must be specified with known values in the parameter matrices B₀, Γ, and Φ such that a knowledge of Σ allows a unique inference about the unknown elements in these matrices. However, it is obvious that it is possible to rewrite (3.6) using nonsingular transformation matrices T₁, T₂, and T₃ as
Σ = G* B*⁻¹ Γ* Φ* Γ*′ B*′⁻¹ G*′,    (4.1)
where G* = GT₁, B*⁻¹ = T₁⁻¹B⁻¹T₂, Γ* = T₂⁻¹ΓT₃, and Φ* = T₃⁻¹ΦT₃′⁻¹. The parameters of the model are identified when the only nonsingular transformation matrices that allow (3.6) and (4.1) to be true simultaneously are identity matrices of the appropriate order. A necessary condition for this to occur is that the number of unknown parameters of the model is less than the number of different elements in the variance-covariance matrix Σ, but even the well-known rank and order conditions and their generalization (Monfort, 1978) do not provide a simple, practicable method for evaluating identification in the various special cases that might be entertained under the general model. Even the simplest special cases, such as factor analysis models, are only recently becoming understood (Algina, 1980).
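In the pragmatic spirit discussed above, a rough numerical check of local identification compares the rank of the Jacobian of the distinct elements of Σ(θ) with the number of free parameters; full column rank at a trial point is suggestive, though, as noted, not a proof. A sketch for the invented one-factor model:

```python
import numpy as np

def sigma_vech(theta):
    lam, psi = theta[:4], theta[4:]    # one-factor model, p = 4
    S = np.outer(lam, lam) + np.diag(psi)
    return S[np.tril_indices(4)]       # p(p + 1)/2 = 10 distinct moments

theta0 = np.array([0.9, 0.8, 0.7, 0.6, 0.19, 0.36, 0.51, 0.64])
eps = 1e-6
J = np.column_stack([(sigma_vech(theta0 + eps * e)
                      - sigma_vech(theta0 - eps * e)) / (2 * eps)
                     for e in np.eye(len(theta0))])
print(np.linalg.matrix_rank(J), len(theta0))   # 8 8: full column rank
```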
5. Estimation and testing: Statistical basis
The statistical theory involved in multivariate LV models exists in rudimentary form. Only large sample theory has been developed to any extent, and the relevance of this theory to small samples has not been established. Although the
statistical theory associated with LV models based on multinormally distributed MVs already existed (cf. Anderson and Rubin, 1956), Jöreskog (1967, 1973, 1977) must be given credit for establishing that maximum likelihood (ML) estimation could be practically applied to LV models. While various researchers were studying specialized statistical problems and searching for estimators that might be easy to implement, Jöreskog showed that complex models could be estimated by difficult ML methods based on a standard covariance structure approach. The most general alternative approach to estimation in LV and other models was developed by Browne (1974). Building upon the work of Jöreskog and Goldberger (1972) and Anderson (1973), who had developed generalized least squares (GLS) estimators for the factor analytic model and for linear covariance structures, Browne showed that a class of GLS estimators could be developed that have many of the same asymptotic properties as ML estimators, i.e., consistency, normality, and efficiency. He also developed the associated goodness of fit tests. Lee (1977) showed that ML and GLS estimators are asymptotically equal. Swain (1975), Geraci (1977) and Robinson (1977) introduced additional estimators with optimal asymptotic properties. Some of the Geraci and Robinson estimators can be easier to compute than ML or GLS estimators; the Browne and Robinson estimators do not necessarily require multivariate normality of the MVs to yield their minimal sampling variances. Unfortunately, the empirical meaning of loosening the normality assumption is open to question, since simple procedures for evaluating the less restrictive assumption (that fourth-order cumulants of the distribution of the variables are zero) do not appear to be available. Although certain GLS estimators are somewhat easier to compute than ML estimators, there is some evidence that they may be more biased than ML estimators (Jöreskog and Goldberger, 1972; Browne, 1974). Virtually nothing is known about the relative robustness of these estimators to violation of assumptions or about their relative small sample properties. We now summarize certain asymptotic statistical theorems for multivariate analysis with latent variables. The basic properties for unconstrained ML estimators are well known; Browne (1974) and Lee (1977) developed parallel large sample properties for GLS estimators. Let Σ₀ = Σ(θ₀) be a p × p population covariance matrix whose elements are differentiable real-valued functions of a true though unknown (q × 1) vector of parameters θ₀. Let S represent the sample covariance matrix obtained from a random sample of size N = n + 1 from a multivariate normal population with mean vector 0 and covariance matrix Σ₀. We regard the vector θ as mathematical variables and Σ = Σ(θ) as a matrix function of θ. The generalized least squares estimators, provided they exist, minimize the function

Q(θ) = tr{[(S − Σ)W]²}/2,    (5.1)
where the weight matrix W is either a positive definite matrix or a stochastic matrix possibly depending on S which converges in probability to a positive
definite matrix as N tends to infinity. In most applications W is chosen so that it converges to Σ⁻¹ in probability, e.g., W = S⁻¹. It was proven by Browne (1974) and Lee (1977) that the estimator θ̂ that minimizes (5.1) based on a W that converges to Σ⁻¹ possesses the following asymptotic properties: (a) it is consistent; (b) it is asymptotically equivalent to the maximum likelihood estimator; (c) it is a 'best generalized least-squares' estimator in the sense that for any other generalized least-squares estimator θ⁺, cov(θ⁺) − cov(θ̂) is positive semidefinite; (d) its asymptotic distribution is multivariate normal with mean vector θ₀ and covariance matrix 2n⁻¹[(∂Σ₀/∂θ)(Σ₀⁻¹ ⊗ Σ₀⁻¹)(∂Σ₀/∂θ)′]⁻¹; (e) the asymptotic distribution of nQ(θ̂) is chi-square with degrees of freedom equal to p(p + 1)/2 − q. The latter property enables one to test the hypothesis that Σ₀ = Σ(θ₀) against the alternative that Σ₀ is any symmetric positive definite matrix. More general statistical results deal with models whose parameters are subject to arbitrary constraints. Although Aitchison and Silvey (1958) provided the statistical basis for obtaining and evaluating constrained ML estimates, their methods were not applied to moment structure models or to LV models until the work of Lee and Bentler (1980). Their GLS results presented below extend Aitchison and Silvey's ML results. Of course models without constraints on parameters can be evaluated as a special case. Suppose θ₀ satisfies r (< q) constraints h(θ₀) = 0, where h is a continuously differentiable vector-valued function. The constrained GLS estimator θ̃ of θ₀ is defined as the vector which satisfies h(θ̃) = 0 and minimizes Q(θ) of (5.1). From the first order necessary condition, if a θ̃ exists, there corresponds a vector of Lagrange multipliers λ̃ such that
Q̇(θ̃) + L̃′λ̃ = 0,  h(θ̃) = 0,    (5.2)
where Q̇ = (∂Q/∂θ) is the gradient vector of Q(θ), L = (∂h/∂θ) is an r × q matrix of partial derivatives, and L̃ = L(θ̃). Such a definition is paralleled with constrained ML estimators. The constrained ML estimator θ̂ of θ₀ is defined as the vector which satisfies h(θ̂) = 0 and minimizes the function
F(θ) = log det(Σ) + tr(SΣ⁻¹) − log det(S) − p.    (5.3)
Similarly, from the first order necessary condition, if a θ̂ exists, there corresponds a vector of Lagrange multipliers λ̂′ = (λ̂₁,…,λ̂_r) such that

Ḟ(θ̂) + L̂′λ̂ = 0,  h(θ̂) = 0,    (5.4)
where Ḟ = (∂F/∂θ) is the gradient of F(θ) and L̂ = L(θ̂). Lee and Bentler (1980) assume a number of standard regularity conditions that are typically satisfied in practice in order to obtain their results. Major use is
made of the information matrix 2⁻¹M with elements M(θ)(i, j) = tr Σ⁻¹Σ̇_iΣ⁻¹Σ̇_j (see Lee and Jennrich, 1979). The results include the following six propositions. (a) The generalized least squares estimator θ̃ is consistent. (b) The joint asymptotic distribution of the random variables n^{1/2}(θ̃ − θ₀) and n^{1/2}λ̃ is multivariate normal with zero mean vector and covariance matrix 2 diag(P₀, R₀), where the supermatrix with rows [P₀, T₀] and [T₀′, −R₀] is the inverse of the supermatrix with rows [M₀, L₀′] and [L₀, 0], with M₀ = M(θ₀) and L₀ = L(θ₀). (c) The generalized least squares estimator (θ̃, λ̃) is asymptotically equivalent to the maximum likelihood estimator (θ̂, λ̂). (d) The asymptotic distribution of nQ(θ̃) is chi-square with degrees of freedom
p(p + 1)/2 − (q − r). Suppose h(θ) = (h*(θ), h**(θ)), where h*(θ) = (h₁(θ),…,h_j(θ)), h**(θ) = (h_{j+1}(θ),…,h_r(θ)), and j < r.
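The information-type matrix M(θ) that appears throughout these propositions can be approximated numerically for any covariance structure; a sketch for the invented one-factor model, with numerical derivatives standing in for the analytic Σ̇_i:

```python
import numpy as np

def sigma(theta):                      # one-factor model, p = 4
    lam, psi = theta[:4], theta[4:]
    return np.outer(lam, lam) + np.diag(psi)

def info_matrix(theta, eps=1e-6):
    Sig_inv = np.linalg.inv(sigma(theta))
    q = len(theta)
    dS = [(sigma(theta + eps * e) - sigma(theta - eps * e)) / (2 * eps)
          for e in np.eye(q)]
    M = np.empty((q, q))
    for i in range(q):
        for j in range(q):             # M(i, j) = tr(S^-1 dS_i S^-1 dS_j)
            M[i, j] = np.trace(Sig_inv @ dS[i] @ Sig_inv @ dS[j])
    return M

theta0 = np.array([0.9, 0.8, 0.7, 0.6, 0.19, 0.36, 0.51, 0.64])
print(np.all(np.linalg.eigvalsh(info_matrix(theta0)) > 0))   # True here
```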
6. Estimation and testing: Nonlinear programming basis
For any choice of error function and for almost any LV model with parameters subject to no constraints, or to simple equality and proportionality constraints, parameter estimates can be obtained by one of several nonlinear programming algorithms. Certain algorithms commonly used in moment structures analysis will be briefly considered. All algorithms to be considered here may be written as
θ_{k+1} = θ_k − αN_k g_k,    (6.1)
where θ_k is the vector of parameter estimates at the kth iteration. The vector θ has as its number of elements the number of nondependent parameters, i.e., the number of free parameters after considering equality and proportionality constraints. N_k is a square symmetric positive definite matrix, and g_k is the gradient
at iteration k. The stepsize parameter α is, in general, chosen to minimize f(θ_{k+1}), where f(θ_{k+1}) = Q(θ_{k+1}) or F(θ_{k+1}), depending on the function chosen. Some algorithms are defined with α = 1, but in practice one must generally allow the option of reducing α in order to prevent divergence. The steepest descent algorithm is defined by setting N = I. Steepest descent is usually very effective when the initial parameter estimates are far from the solution, but it is very slow to converge. The Newton-Raphson algorithm is defined by setting N_k = [∂²f/∂θ∂θ′]⁻¹, the inverse of the Hessian matrix (see, e.g., Bentler and Lee, 1979b). Typically the Newton-Raphson algorithm converges very rapidly when near the solution. However, with less than optimal starting values, the Hessian is often not positive definite. For this reason, and since second derivatives may be difficult to obtain and expensive to compute, the Newton-Raphson algorithm is relatively unattractive. The Fletcher-Powell algorithm is defined by N_{k+1} = N_k + ΔN_k, where
ΔN = (Δθ′Δg)⁻¹ΔθΔθ′ − (Δg′NΔg)⁻¹NΔgΔg′N′,

with Δθ = αNg, Δg = g(θ + Δθ) − g(θ), and typically N₁ = I. The Fletcher-Powell algorithm is the algorithm most commonly used at this time in covariance structure analysis. For maximum likelihood estimation the Fisher scoring algorithm is defined by taking N as the Fisher information matrix. For the least squares error function, the Gauss-Newton algorithm is defined by N = (HH′)⁻¹, where H = ∂Σ(θ)/∂θ. The basic Gauss-Newton algorithm can be modified to minimize the generalized least squares error function by setting N = [H(W⊗W)H′]⁻¹. It has been shown that when W = Σ⁻¹, the information matrix is proportional to H(W⊗W)H′, in which case a step of the modified Gauss-Newton algorithm is equivalent to a step of the Fisher scoring algorithm applied to the maximum likelihood error function (Lee, 1977; Lee and Jennrich, 1979). A step in the modified Gauss-Newton algorithm may be written as
θ_{k+1} = θ_k + α[H(W⊗W)H′]⁻¹H(W⊗W)Vec(S − Σ),    (6.2)
where Vec stacks the elements of the subsequent matrix into a vector. Thus under an appropriate choice of W, one may obtain least-squares (W = I), generalized least-squares (W = S⁻¹), or maximum-likelihood (W = Σ̂⁻¹) estimates from the modified Gauss-Newton algorithm. In an empirical comparison of the algorithms considered here (except steepest descent) for the orthogonal factor model, Lee and Jennrich (1979) found the modified Gauss-Newton algorithm to be a cost-efficient statistical optimizer. For most moment structure models, the constrained generalized least squares estimator θ̃ and the corresponding Lagrange multipliers λ̃ cannot be solved for in closed form; thus, some nonlinear iterative procedure has to be used. Among the other methods, the penalty function technique developed by Fiacco and McCormick (1968) has been accepted as an effective method in constrained optimization. Based on this technique, Lee and Bentler proposed an algorithm as
follows: (a) Choose a scalar c₁ > 0 and initial values of θ. (b) Given c_k > 0 and θ_k, by means of the Gauss-Newton algorithm (6.2) search for a minimum point θ_{k+1} of the function

Q_k(θ) = Q(θ) + c_k ∑_{t=1}^{r} φ(h_t(θ)),    (6.3)
where φ is a real-valued differentiable function such that φ(x) ≥ 0 for all x and φ(x) = 0 if and only if x = 0. (c) Update k, increase c_{k+1}, and return to (b) with θ_{k+1} as the initial values. The process is terminated when the absolute values of
max_i |θ_{k+1}(i) − θ_k(i)|  and  max_t |h_t(θ_{k+1})|    (6.4)
are less than ε, where ε is a predetermined small real number. The algorithm will converge to the constrained generalized least squares estimator, if it exists. It has been shown by Fiacco and McCormick (1968) and Luenberger (1973) that if the algorithm converges at θ_k, the corresponding Lagrange multipliers are given by

λ̃′ = (c_k φ̇(h₁(θ_k)),…,c_k φ̇(h_r(θ_k))),    (6.5)
where φ̇ denotes the derivative of φ. An algorithm for obtaining the constrained maximum likelihood estimator can be developed by a similar procedure. In this case, during (b), the Fisher scoring algorithm is applied in minimizing the appropriately defined F_k(θ) analogous to (6.3). Lee (1980) applied the penalty function technique to obtain estimators in confirmatory factor analysis.
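A bare-bones sketch of steps (a)-(c), with φ(x) = x² and a generic numerical minimizer standing in for the Gauss-Newton iteration of step (b); this illustrates the penalty idea on a toy problem rather than reproducing the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def penalized_gls(Q, h, theta, c=1.0, grow=10.0, tol=1e-6, max_outer=12):
    """Minimize Q subject to h(theta) = 0 via Q_k = Q + c_k * sum h_t^2."""
    for _ in range(max_outer):
        res = minimize(lambda t: Q(t) + c * np.sum(h(t) ** 2), theta)
        step = np.max(np.abs(res.x - theta))
        theta = res.x
        if step < tol and np.max(np.abs(h(theta))) < tol:   # test (6.4)
            break
        c *= grow                      # step (c): increase c_k and repeat
    lam = 2.0 * c * h(theta)           # (6.5) with phi-dot(x) = 2x
    return theta, lam

# Toy use: minimize (t0 - 1)^2 + (t1 - 2)^2 subject to t0 = t1.
theta, lam = penalized_gls(lambda t: (t[0] - 1) ** 2 + (t[1] - 2) ** 2,
                           lambda t: np.array([t[0] - t[1]]),
                           np.zeros(2))
print(theta, lam)                      # theta ~ [1.5, 1.5], lambda ~ [-1.0]
```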
Partial derivatives for a general model

The only derivatives required to implement (6.2) to estimate parameters of the general model (3.6) are the elements of ∂Σ/∂θ. These can be obtained by various methods, including matrix differentiation. See Bentler and Lee (1978b) for a recent overview of basic results in this area and Nel (1980) for a comprehensive discussion of recent developments in this field. The derivatives were reported by Bentler and Weeks (1980) as
∂Σ/∂Φ = (Γ′B′⁻¹G′ ⊗ Γ′B′⁻¹G′),
∂Σ/∂Γ = (B′⁻¹G′ ⊗ ΦΓ′B′⁻¹G′)(I + E_rr),    (6.6)
∂Σ/∂B₀ = (B′⁻¹G′ ⊗ B⁻¹ΓΦΓ′B′⁻¹G′)(I + E_rr),

where E_rr denotes ∂X′/∂X for X (r × r) with no constant or functionally dependent elements. This matrix is a permutation matrix with (0, 1) elements. In (6.6) the symmetry of Φ and other constraints on parameters have been ignored. The complete set (6.6) of matrix derivatives can be stacked into a single matrix ∂Σ/∂θ* with matrix elements (∂Σ/∂θ*)′ = [(∂Σ/∂Φ)′, (∂Σ/∂Γ)′, (∂Σ/∂B₀)′]. It
follows that the elements of the unreduced gradient g* are stacked into the vector g*′ = [g(Φ)′, g(Γ)′, g(B₀)′], whose vector components are given by

g(Φ) = Vec[Γ′A′(Σ − S)AΓ],
g(Γ) = 2 Vec[A′(Σ − S)AΓΦ],    (6.7)
g(B₀) = 2 Vec[A′(Σ − S)AΓΦΓ′B′⁻¹],
where A = WGB⁻¹ and the symmetry of Φ has not been taken into account. The corresponding matrix N* is a 3 × 3 symmetric supermatrix

(∂Σ/∂θ*)(W⊗W)(∂Σ/∂θ*)′,

whose lower triangular matrix elements, taken row by row, are

(Γ′VΓ ⊗ Γ′VΓ),
2(VΓ ⊗ C′VΓ),  2[(V ⊗ C′VC) + E_rr(C′V ⊗ VC)],    (6.8)
2(VΓ ⊗ DVΓ),  2[(V ⊗ DVC) + E_rr(DV ⊗ VC)],  2[(V ⊗ DVD′) + E_rr(DV ⊗ VD′)].
In (6.8), C = ΓΦ, D = B⁻¹ΓΦΓ′, and V = B′⁻¹G′WGB⁻¹. The matrix ∂Σ/∂θ* contains derivatives with respect to all possible parameters in the general model. In specific applications certain elements of Φ, Γ, and B₀ will be known constants, and the corresponding rows of ∂Σ/∂θ* must be eliminated. In addition, certain parameters may be constrained, as mentioned above. For example, Φ is a symmetric matrix, so that off-diagonal equalities must be introduced. The effect of constraints is to delete rows of ∂Σ/∂θ* corresponding to constrained parameters and to transform a row i of ∂Σ/∂θ* into a weighted sum of rows i, j for the constraint w_iθ_i = w_jθ_j. These manipulations performed on (6.7) transform it into the (q × 1) vector g, and when carried into the rows and columns of (6.8) they transform it into the (q × q) matrix N, where q is the number of nondependent parameters. The theory of Lee and Bentler (1980) for estimation with arbitrarily constrained parameters, described above, can be used with the proposed penalty function technique to yield a wider class of applications of the general model (3.6) than have yet appeared in the literature.
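The gradient expressions (6.7) are straightforward to transcribe and to check against a numerical derivative; the sketch below does this for one element of g(Φ) in the invented one-factor setup, with W = I for simplicity.

```python
import numpy as np

m, n = 4, 5
Lam = np.array([[0.9], [0.8], [0.7], [0.6]])
gamma = np.hstack([Lam, np.eye(m)])
Gamma = np.vstack([gamma, np.eye(n)])            # as in (3.6)
Binv = np.eye(m + n)                             # beta_0 = 0 here
G = np.hstack([np.eye(m), np.zeros((m, n))])
Phi = np.diag([1.0, 0.19, 0.36, 0.51, 0.64])
W = np.eye(m)

def model_sigma(Phi_):
    return G @ Binv @ Gamma @ Phi_ @ Gamma.T @ Binv.T @ G.T

Sigma = model_sigma(Phi)
S = Sigma + 0.01                                 # a perturbed 'sample' matrix
A = W @ G @ Binv                                 # as defined under (6.7)
g_Phi = Gamma.T @ A.T @ (Sigma - S) @ A @ Gamma  # Vec of this is g(Phi)

def Q(Phi_):                                     # GLS function (5.1)
    R = (S - model_sigma(Phi_)) @ W
    return np.trace(R @ R) / 2.0

eps = 1e-6
E = np.zeros((n, n)); E[0, 0] = eps
num = (Q(Phi + E) - Q(Phi - E)) / (2 * eps)      # numerical dQ/dPhi_00
print(np.isclose(num, g_Phi[0, 0]))              # True
```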
7. Conclusion
The field of multivariate analysis with continuous latent and measured random variables has made substantial progress in recent years, particularly from mathematical and statistical points of view. Mathematically, clarity has been achieved in understanding representation systems for structured linear random variable models. Statistically, large sample theory has been developed for a variety of competing estimators, and the associated hypothesis testing procedures
have been developed. However, much statistical work remains to be completed. For example, small-sample theory is virtually unknown, and reliance is placed upon Monte Carlo work (cf. Geweke and Singleton, 1980). A theory of estimation and model evaluation that is completely distribution-free is only now being worked out (Browne, 1982). The applied statistician who is concerned with utilizing the above theory in empirical applications will quickly find that 'causal modeling', as the above procedures are often called, is a very finicky methodology having many pitfalls. For example, parameter estimates for variances may be negative; suppressor effects yielding unreasonable structural coefficients may be found; theoretically identified models may succumb to 'empirical' underidentification with sampling variances being undeterminable; iterative computer methods may be extremely expensive to utilize; goodness of fit tests may be 'unduly' sensitive to sample size. Many of these issues are discussed in the voluminous literature cited previously. Alternative approaches to model evaluation, beyond those of the simple goodness of fit chi-square test, are discussed by Bentler and Bonett (1980).
References

Aigner, D. J. and Goldberger, A. S., eds. (1977). Latent Variables in Socioeconomic Models. North-Holland, Amsterdam.
Aitchison, J. and Silvey, S. D. (1958). Maximum likelihood estimation of parameters subject to restraint. Ann. Math. Statist. 29, 813-828.
Algina, J. (1980). A note on identification in the oblique and orthogonal factor analysis models. Psychometrika 45, 393-396.
Amemiya, T. (1977). The maximum likelihood and the nonlinear three-stage least squares estimator in the general nonlinear simultaneous equation model. Econometrica 45, 955-968.
Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Ann. Statist. 1, 135-141.
Anderson, T. W. (1976). Estimation of linear functional relationships: Approximate distributions and connections with simultaneous equations in econometrics. J. Roy. Statist. Soc. Ser. B 38, 1-20. Discussion, ibid. 20-36.
Anderson, T. W. and Rubin, H. (1956). Statistical inference in factor analysis. Proc. 3rd Berkeley Symp. Math. Statist. Prob. 5, 111-150.
Bentler, P. M. (1976). Multistructure statistical model applied to factor analysis. Multivariate Behav. Res. 11, 3-25.
Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Ann. Rev. Psychol. 31, 419-456.
Bentler, P. M. (1982). Linear systems with multiple levels and types of latent variables. In: K. G. Jöreskog and H. Wold, eds., Systems under Indirect Observation. North-Holland, Amsterdam [in press].
Bentler, P. M. and Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psych. Bull. 88, 588-606.
Bentler, P. M. and Lee, S. Y. (1978a). Statistical aspects of a three-mode factor analysis model. Psychometrika 43, 343-352.
Bentler, P. M. and Lee, S. Y. (1978b). Matrix derivatives with chain rule and rules for simple, Hadamard, and Kronecker products. J. Math. Psych. 17, 255-262.
Bentler, P. M. and Lee, S. Y. (1979a). A statistical development of three-mode factor analysis. British J. Math. Statist. Psych. 32, 87-104.
Bentler, P. M. and Lee, S. Y. (1979b). Newton-Raphson approach to exploratory and confirmatory maximum likelihood factor analysis. J. Chin. Univ. Hong Kong 5, 562-573.
Bentler, P. M. and Weeks, D. G. (1978). Restricted multidimensional scaling models. J. Math. Psych. 17, 138-151.
Bentler, P. M. and Weeks, D. G. (1979). Interrelations among models for the analysis of moment structures. Multivariate Behav. Res. 14, 169-185.
Bentler, P. M. and Weeks, D. G. (1980). Linear structural equations with latent variables. Psychometrika 45, 289-308.
Bhargava, A. K. (1977). Maximum likelihood estimation in a multivariate 'errors in variables' regression model with unknown error covariance matrix. Comm. Statist. A--Theory Methods 6, 587-601.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis. MIT Press, Cambridge, MA.
Bock, R. D. and Bargmann, R. E. (1966). Analysis of covariance structures. Psychometrika 31, 507-534.
Browne, M. W. (1974). Generalized least-squares estimators in the analysis of covariance structures. South African Statist. J. 8, 1-24.
Browne, M. W. (1977). The analysis of patterned correlation matrices by generalized least squares. British J. Math. Statist. Psych. 30, 113-124.
Browne, M. W. (1982). Covariance structures. In: D. M. Hawkins, ed., Topics in Applied Multivariate Analysis. Cambridge University Press, London.
Chechile, R. (1977). Likelihood and posterior identification: Implications for mathematical psychology. British J. Math. Statist. Psych. 30, 177-184.
Cochran, W. G. (1970). Some effects of errors of measurement on multiple correlation. J. Amer. Statist. Assoc. 65, 22-34.
Deistler, M. and Seifert, H. G. (1978). Identifiability and consistent estimability in econometric models. Econometrica 46, 969-980.
Dempster, A. P. (1969). Elements of Continuous Multivariate Analysis. Addison-Wesley, Reading, MA.
Dempster, A. P. (1971). An overview of multivariate data analysis. J. Multivariate Anal. 1, 316-346.
Feldstein, M. (1974). Errors in variables: A consistent estimator with smaller MSE in finite samples. J. Amer. Statist. Assoc. 69, 990-996.
Fiacco, A. V. and McCormick, G. P. (1968). Nonlinear Programming. Wiley, New York.
Fink, E. L. and Mabee, T. I. (1978). Linear equations and nonlinear estimation: A lesson from a nonrecursive example. Sociol. Methods Res. 7, 107-120.
Gabrielsen, A. (1978). Consistency and identifiability. J. Econometrics 8, 261-263.
Geraci, V. J. (1976). Identification of simultaneous equation models with measurement error. J. Econometrics 4, 263-283.
Geraci, V. J. (1977). Estimation of simultaneous equation models with measurement error. Econometrica 45, 1243-1255.
Geweke, J. F. and Singleton, K. J. (1980). Interpreting the likelihood ratio statistic in factor models when sample size is small. J. Amer. Statist. Assoc. 75, 133-137.
Gleser, L. J. (1981). Estimation in a multivariate "errors in variables" regression model: Large sample results. Ann. Statist. 9, 24-44.
Goldberger, A. S. and Duncan, O. D., eds. (1973). Structural Equation Models in the Social Sciences. Academic Press, New York.
Goodman, L. A. (1978). Analyzing Qualitative/Categorical Data. Abt Books, Cambridge, MA.
Hausman, J. A. (1977). Errors in variables in simultaneous equation models. J. Econometrics 5, 389-401.
Hsiao, C. (1976). Identification and estimation of simultaneous equation models with measurement error. Internat. Econom. Rev. 17, 319-339.
Jennrich, R. I. and Ralston, M. L. (1978). Fitting nonlinear models to data. Ann. Rev. Biophys. Bioeng. 8, 195-238.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika 32, 443-482.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34, 183-202.
Jöreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika 57, 239-251.
Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika 36, 409-426.
Jöreskog, K. G. (1973). Analysis of covariance structures. In: P. R. Krishnaiah, ed., Multivariate Analysis III, 263-285. Academic Press, New York.
Jöreskog, K. G. (1977). Structural equation models in the social sciences: Specification, estimation and testing. In: P. R. Krishnaiah, ed., Applications of Statistics, 265-287. North-Holland, Amsterdam.
Jöreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika 43, 443-477.
Jöreskog, K. G. and Goldberger, A. S. (1972). Factor analysis by generalized least squares. Psychometrika 37, 243-260.
Jöreskog, K. G. and Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. J. Amer. Statist. Assoc. 70, 631-639.
Jöreskog, K. G. and Sörbom, D. (1978). LISREL IV Users Guide. Nat. Educ. Res., Chicago.
Keesling, W. (1972). Maximum likelihood approaches to causal flow analysis. Ph.D. thesis. University of Chicago, Chicago.
Krishnaiah, P. R. and Lee, J. C. (1974). On covariance structures. Sankhyā 38, 357-371.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proc. R. Soc. Edinburgh 60, 64-82.
Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method. Butterworth, London.
Lawley, D. N. and Maxwell, A. E. (1973). Regression and factor analysis. Biometrika 60, 331-338.
Lazarsfeld, P. F. and Henry, N. W. (1968). Latent Structure Analysis. Houghton-Mifflin, New York.
Lee, S. Y. (1977). Some algorithms for covariance structure analysis. Ph.D. thesis. Univ. Calif., Los Angeles.
Lee, S. Y. (1980). Estimation of covariance structure models with parameters subject to functional restraints. Psychometrika 45, 309-324.
Lee, S. Y. and Bentler, P. M. (1980). Some asymptotic properties of constrained generalized least squares estimation in covariance structure models. South African Statist. J. 14, 121-136.
Lee, S. Y. and Jennrich, R. I. (1979). A study of algorithms for covariance structure analysis with specific comparisons using factor analysis. Psychometrika 44, 99-113.
Lord, F. M. (1960). Large-sample covariance analysis when the control variable is fallible. J. Amer. Statist. Assoc. 55, 307-321.
Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, MA.
Luenberger, D. G. (1973). Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, MA.
McDonald, R. P. (1978). A simple comprehensive model for the analysis of covariance structures. British J. Math. Statist. Psych. 31, 59-72.
McDonald, R. P. and Krane, W. R. (1977). A note on local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British J. Math. Statist. Psych. 30, 198-203.
McDonald, R. P. and Krane, W. R. (1979). A Monte-Carlo study of local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British J. Math. Statist. Psych. 32, 121-132.
McDonald, R. P. and Mulaik, S. A. (1979). Determinacy of common factors: A nontechnical review. Psych. Bull. 86, 297-306.
Monfort, A. (1978). First-order identification in linear models. J. Econometrics 7, 333-350.
Nel, D. G. (1980). On matrix differentiation in statistics. South African Statist. J. 14, 137-193.
Olsson, U. and Bergman, L. R. (1977). A longitudinal factor model for studying change in ability structure. Multivariate Behav. Res. 12, 221-242.
Please, N. W. (1973). Comparison of factor loadings in different populations. British J. Math. Statist. Psych. 26, 61-89.
Rao, C. R. (1971). Minimum variance quadratic unbiased estimation of variance components. J. Multivariate Anal. 1, 445-456.
Rao, C. R. and Kleffe, J. (1980). Estimation of variance components. In: P. R. Krishnaiah and L. N. Kanal, eds., Handbook of Statistics, Vol. 1, 1-40. North-Holland, Amsterdam.
Robinson, P. M. (1974). Identification, estimation and large-sample theory for regressions containing unobservable variables. Internat. Econom. Rev. 15, 680-692.
Robinson, P. M. (1977). The estimation of a multivariate linear relation. J. Multivariate Anal. 7, 409-423.
Rock, D. A., Werts, C. E. and Flaugher, R. L. (1978). The use of analysis of covariance structures for comparing the psychometric properties of multiple variables across populations. Multivariate Behav. Res. 13, 403-418.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British J. Math. Statist. Psych. 27, 229-239.
Sörbom, D. (1978). An alternative to the methodology for analysis of covariance. Psychometrika 43, 381-396.
Spearman, C. (1904). The proof and measurement of association between two things. Amer. J. Psych. 15, 72-101.
Steiger, J. H. (1979). Factor indeterminacy in the 1930's and the 1970's: Some interesting parallels. Psychometrika 44, 157-167.
Steiger, J. H. and Schönemann, P. H. (1978). A history of factor indeterminacy. In: S. Shye, ed., Theory Construction and Data Analysis. Jossey-Bass, San Francisco.
Strotz, R. H. and Wold, H. O. A. (1960). Recursive vs. nonrecursive systems: An attempt at synthesis. Econometrica 28, 417-427.
Swain, A. J. (1975). A class of factor analytic estimation procedures with common asymptotic sampling properties. Psychometrika 40, 315-335.
Thurstone, L. L. (1947). Multiple Factor Analysis. Univ. of Chicago Press, Chicago.
Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279-311.
Tukey, J. W. (1954). Causation, regression, and path analysis. In: O. Kempthorne, T. A. Bancroft, J. W. Gowen and J. L. Lush, eds., Statistics and Mathematics in Biology, 35-66. Iowa State University Press, Ames, IA.
Weeks, D. G. (1978). Structural equation systems on latent variables within a second-order measurement model. Ph.D. thesis. Univ. of Calif., Los Angeles.
Weeks, D. G. (1980). A second-order longitudinal model of ability structure. Multivariate Behav. Res. 15, 353-365.
Wiley, D. E. (1973). The identification problem for structural equation models with unmeasured variables. In: Goldberger and Duncan, eds., Structural Equation Models in the Social Sciences, 69-83. Academic Press, New York.
Wiley, D. E., Schmidt, W. H. and Bramble, W. J. (1973). Studies of a class of covariance structure models. J. Amer. Statist. Assoc. 68, 317-323.
Williams, J. S. (1978). A definition for the common-factor analysis model and the elimination of problems of factor score indeterminacy. Psychometrika 43, 293-306.
Wold, H. (1980). Model construction and evaluation when theoretical knowledge is scarce: An example of the use of partial least squares. In: J. Kmenta and J. Ramsey, eds., Evaluation of Econometric Models. Academic Press, New York.
Wright, S. (1934). The method of path coefficients. Ann. Math. Statist. 5, 161-215.