Linear structural relations: Gradient and Hessian of the fitting function

Statistics & Probability Letters 11 (1991) 57-61
North-Holland, January 1991

Heinz Neudecker
Instituut voor Actuariaat en Econometrie, University of Amsterdam, Amsterdam, The Netherlands

Albert Satorra *
Department of Economics, Universitat Pompeu Fabra, Barcelona, Spain

Received September 1989
Revised November 1989

Abstract: We have derived the Hessian matrix of a general type of fitting function commonly used in the analysis of linear latent-variable models.

Keywords: Covariance structure, fitting function, Kronecker product, matrix differential, first and second derivatives.

1. Introduction

Latent-variable models are now frequently used in economic, social, and behavioral studies for analyzing structural relations among variables. Perhaps the most popular of these models is the so-called LISREL model (Jöreskog, 1977; Wiley, 1973), the analysis of which has become standard practice through Jöreskog and Sörbom's (1983) computer program, also named LISREL. The LISREL model integrates the classical simultaneous-equation model developed in econometrics with the factor-analysis model developed by psychometricians. The classical 'errors-in-variables' model is also a particular case of LISREL. Alternative formulations of latent-variable models, such as Bentler and Weeks' (1980) model implemented in computer programs such as EQS (Bentler, 1985) or the COSAN model of McDonald (1980), have been recognized to be equivalent specifications of the LISREL model. A particular type of analysis of the LISREL model suited for discrete data is implemented in Muthén's (1987) program LISCOMP. A review of latent-variable models can be found in Anderson (1984), and Aigner, Hsiao, Kapteyn and Wansbeek (1984). A maximum-likelihood analysis, under the assumption that the vector of observed variables is normally distributed, as well as more general distribution-free methods, are implemented in most of the above-mentioned computer programs.

Covariance-structure analysis is a general framework for fitting latent-variable models. Within this framework, if θ denotes the vector containing all the unknown parameters of the model and Σ the population covariance matrix of the vector of all observed variables, then the covariance structure Σ = Σ(θ) implied by the specified model is fitted to the corresponding sample covariance matrix S. The model is fitted by minimizing with respect to θ a (non-negative) real-valued function F = F(S, Σ(θ)) of θ. Numerical iteration methods are required to achieve such a fit.
As far as we know, all the above-mentioned programs use only the first derivatives of F; thus quasi-Newton optimization methods and expected Hessians are used. Browne (1984), Shapiro (1986) and Satorra (1989) deal with different theoretical aspects of covariance-structure analysis. Lee and Jennrich (1979) study several algorithms which are used in the practice of covariance-structure analysis.

This paper provides an expression for the second derivatives of a general type of fitting function used in the analysis of the LISREL model. Standard matrix differential calculus, as in Magnus and Neudecker (1986), will be used. Although the expressions for the first derivatives are known and widely used, they will also be given, since they are obtained on the way to the second derivatives. The expressions obtained for the second derivatives are a novelty and have practical implications. For instance, the expressions for the Hessian would be needed to implement true Newton fitting algorithms, or to compute statistics using observed Hessians instead of expected ones.

The paper is structured as follows. Section 2 describes the model. Sections 3 and 4 obtain the first and second derivatives, respectively, of the matrix-valued function Σ = Σ(δ). Finally, Section 5 derives the desired gradient and Hessian expressions.

* The research of this author was made possible by a grant from the Netherlands Organization for Scientific Research (NWO).

0167-7152/91/$03.50 © 1991 - Elsevier Science Publishers B.V. (North-Holland)

2. The model

A general linear statistical relation that combines measurement and structural equations is the following:

z = Λη + ε,   η = B_0 η + ξ,   (1)

where z represents a p-vector of observable variables, and η, ε and ξ are random vectors such that ε is uncorrelated with ξ. The matrices Λ and B_0, and the variance matrices of ε and ξ, Ψ and Φ respectively, contain the parameters of the model. The specific form of these parameter matrices gives rise to particular models. It can be shown that (1) is both a specific case and a generalization of the LISREL model (see, e.g., Bollen (1989), p. 396). Equations (1), with the assumption of zero correlation between ε and ξ, imply the following covariance structure for the matrix Σ of variances of z:

Σ = ΛB^{-1}ΦB^{-T}Λ' + Ψ,   (2)

where B = I − B_0 is supposed to be non-singular. Prior information with regard to the form of B_0, Λ, Φ and Ψ yields more specific models. For instance, in LISREL one may restrict some of the elements of the matrices B_0, Λ, Φ and Ψ to have equal specific values, or to be equal among themselves. Given a specific model, the distinct and functionally unrelated unknown elements of the matrices B_0, Λ, Φ and Ψ will be assembled into, say, a q-dimensional parameter vector θ. We will also consider the following vector δ of unrestricted parameters of model (1):

δ = [(vec Λ)', (vech Φ)', (vec B)', (vech Ψ)']',

where the 'vec' operator transforms a matrix into a vector by stacking the columns of the matrix one underneath the other, and the 'vech' operator does the same but eliminates the supradiagonal elements of the matrix. Usually, without further restrictions added to (1), the parameter vector δ will not be identified by Σ. A specific model will induce a function δ = δ(θ) expressing δ in terms of a parameter vector θ of smaller dimension. For model (1) with the type of restrictions considered, the function δ = δ(·) will be regular enough to be twice continuously differentiable. In fact, in LISREL the derivative matrix Δ = ∂δ/∂θ' is a constant matrix of zeroes, ones and minus ones. Moreover, Eq. (2) implies that Σ is a function of δ, hence of θ.
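The covariance structure (2) is straightforward to evaluate numerically. The following sketch computes the implied Σ for given Λ, B_0, Φ and Ψ; the dimensions and parameter values are purely illustrative, not from the paper.

```python
import numpy as np

# Illustrative sketch of the covariance structure (2):
#   Sigma = Lam B^{-1} Phi B^{-T} Lam' + Psi,  with  B = I - B0.
# All numbers below are invented for demonstration.
rng = np.random.default_rng(0)

p, m = 4, 2                      # p observed variables, m latent variables
Lam = rng.normal(size=(p, m))    # factor loadings (Lambda)
B0 = np.array([[0.0, 0.3],       # structural coefficients; B = I - B0 must be non-singular
               [0.0, 0.0]])
Phi = np.array([[1.0, 0.2],      # Var(xi), symmetric positive definite
                [0.2, 1.0]])
Psi = 0.5 * np.eye(p)            # Var(eps), here diagonal

def implied_sigma(Lam, B0, Phi, Psi):
    """Covariance structure (2): Sigma = Lam B^{-1} Phi B^{-T} Lam' + Psi."""
    B = np.eye(B0.shape[0]) - B0
    Binv = np.linalg.inv(B)
    return Lam @ Binv @ Phi @ Binv.T @ Lam.T + Psi

Sigma = implied_sigma(Lam, B0, Phi, Psi)
assert np.allclose(Sigma, Sigma.T)            # Sigma is symmetric
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # and positive definite here
```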


3. First derivatives of σ = σ(δ)

Vectorizing (2) and taking differentials, we obtain

d vec Σ = (I + K)(ΛB^{-1}ΦB^{-T} ⊗ I) d vec Λ − (I + K)(ΛB^{-1}ΦB^{-T} ⊗ ΛB^{-1}) d vec B + (ΛB^{-1} ⊗ ΛB^{-1}) d vec Φ + d vec Ψ,

where 'd' denotes differential, '⊗' denotes the (right) Kronecker product, I is the identity matrix of appropriate order, K is a commutation matrix, and '^{-T}' denotes the transposed inverse of a matrix. Now using vec A = D vech A and vech A = L vec A for symmetric A, where D and L are respectively the duplication and elimination matrices of Magnus and Neudecker (1986), we get

d vech Σ = L(I + K)(ΛB^{-1}ΦB^{-T} ⊗ I) d vec Λ − L(I + K)(ΛB^{-1}ΦB^{-T} ⊗ ΛB^{-1}) d vec B + L(ΛB^{-1} ⊗ ΛB^{-1})D d vech Φ + d vech Ψ;

which implies that d vech Σ = G dδ, with G defined as the following partitioned matrix:

G = [L(I + K)(ΛB^{-1}ΦB^{-T} ⊗ I),  L(ΛB^{-1} ⊗ ΛB^{-1})D,  −L(I + K)(ΛB^{-1}ΦB^{-T} ⊗ ΛB^{-1}),  I].   (3)
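As a numerical sanity check on (3), one can build the duplication, elimination and commutation matrices explicitly and compare G with finite differences of vech Σ taken with respect to δ. The sketch below does this; all dimensions and parameter values are illustrative, not from the paper.

```python
import numpy as np

def vec(A):  return A.reshape(-1, order="F")
def vech(A): return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def K_mat(m, n):
    """Commutation matrix: K_mat(m, n) @ vec(A) == vec(A.T) for A of shape (m, n)."""
    K = np.zeros((m*n, m*n))
    for i in range(m):
        for j in range(n):
            K[i*n + j, j*m + i] = 1.0
    return K

def D_mat(n):
    """Duplication matrix: vec(A) == D_mat(n) @ vech(A) for symmetric A."""
    D = np.zeros((n*n, n*(n+1)//2)); h = 0
    for j in range(n):
        for i in range(j, n):
            D[j*n + i, h] = 1.0; D[i*n + j, h] = 1.0; h += 1
    return D

def L_mat(n):
    """Elimination matrix: vech(A) == L_mat(n) @ vec(A)."""
    L = np.zeros((n*(n+1)//2, n*n)); h = 0
    for j in range(n):
        for i in range(j, n):
            L[h, j*n + i] = 1.0; h += 1
    return L

# illustrative dimensions and parameter values (not from the paper)
rng = np.random.default_rng(1)
p, m = 3, 2
Lam = rng.normal(size=(p, m))
B   = np.eye(m) - np.array([[0.0, 0.4], [0.0, 0.0]])   # B = I - B0
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Psi = np.diag([0.5, 0.6, 0.7])

def sigma_vech(Lam, Phi, B, Psi):
    Binv = np.linalg.inv(B)
    return vech(Lam @ Binv @ Phi @ Binv.T @ Lam.T + Psi)

def G_mat(Lam, Phi, B):
    """The partitioned matrix G of (3): d vech Sigma = G d delta."""
    Binv = np.linalg.inv(B)
    Lp, Dm = L_mat(p), D_mat(m)
    IK = np.eye(p*p) + K_mat(p, p)
    A1 = Lam @ Binv @ Phi @ Binv.T                     # Lam B^{-1} Phi B^{-T}
    G_Lam = Lp @ IK @ np.kron(A1, np.eye(p))
    G_Phi = Lp @ np.kron(Lam @ Binv, Lam @ Binv) @ Dm
    G_B   = -Lp @ IK @ np.kron(A1, Lam @ Binv)
    G_Psi = np.eye(p*(p+1)//2)
    return np.hstack([G_Lam, G_Phi, G_B, G_Psi])

G = G_mat(Lam, Phi, B)

# central finite differences of vech Sigma in delta = [vec Lam; vech Phi; vec B; vech Psi]
def unvech(v, n):
    return (D_mat(n) @ v).reshape(n, n, order="F")

def sigma_of_delta(d):
    a, b, c = p*m, p*m + m*(m+1)//2, p*m + m*(m+1)//2 + m*m
    return sigma_vech(d[:a].reshape(p, m, order="F"), unvech(d[a:b], m),
                      d[b:c].reshape(m, m, order="F"), unvech(d[c:], p))

d0 = np.concatenate([vec(Lam), vech(Phi), vec(B), vech(Psi)])
eps, num = 1e-6, np.empty_like(G)
for k in range(d0.size):
    e = np.zeros_like(d0); e[k] = eps
    num[:, k] = (sigma_of_delta(d0 + e) - sigma_of_delta(d0 - e)) / (2*eps)
assert np.allclose(G, num, atol=1e-5)
```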

4. Second derivatives of Σ = Σ(δ)

Differentiating the expression of d vec Σ obtained in the section above, we get the following partitioned matrix of second derivatives for the ijth element σ_{ij} of Σ:

∂²σ_{ij}/∂δ ∂δ' =

    [ H_{11}    H_{12}    H_{13}    0 ]
    [ H_{12}'   0         H_{23}    0 ]
    [ H_{13}'   H_{23}'   H_{33}    0 ]    (4)
    [ 0         0         0         0 ]

(the zero blocks arise because Σ is linear in Φ and Ψ), where

H_{11} = B^{-1}ΦB^{-T} ⊗ T_{ij},

H_{12} = (B^{-1} ⊗ T_{ij}ΛB^{-1})D,

H_{13} = −(B^{-1}ΦB^{-T} ⊗ T_{ij}ΛB^{-1}) − K(T_{ij}ΛB^{-1}ΦB^{-T} ⊗ B^{-1}),

H_{23} = −D'(B^{-T} ⊗ B^{-1}Λ'T_{ij}ΛB^{-1}),

H_{33} = (B^{-1}ΦB^{-T} ⊗ B^{-T}Λ'T_{ij}ΛB^{-1}) + K(B^{-T}Λ'T_{ij}ΛB^{-1}ΦB^{-T} ⊗ B^{-1}) + (B^{-1}ΦB^{-T}Λ'T_{ij}ΛB^{-1} ⊗ B^{-T})K,

where E_{ij} = e_i e_j', with e_i the ith column of the identity matrix of dimension p × p, and T_{ij} = E_{ij} + E_{ji}.
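The blocks of (4) can likewise be checked numerically. The sketch below, with illustrative dimensions and values (not from the paper), compares H_{11} and H_{33} with a finite-difference Hessian of σ_{ij}, taken with respect to vec Λ and vec B respectively:

```python
import numpy as np

def vec(A): return A.reshape(-1, order="F")

def K_mat(m, n):
    # commutation matrix: K_mat(m, n) @ vec(A) == vec(A.T) for A of shape (m, n)
    K = np.zeros((m*n, m*n))
    for i in range(m):
        for j in range(n):
            K[i*n + j, j*m + i] = 1.0
    return K

rng = np.random.default_rng(2)
p, m = 3, 2
Lam = rng.normal(size=(p, m))
B   = np.eye(m) - np.array([[0.0, 0.4], [0.2, 0.0]])   # B = I - B0, non-singular
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
i, j = 0, 1                                    # which element sigma_ij to differentiate
T = np.zeros((p, p)); T[i, j] += 1.0; T[j, i] += 1.0   # T_ij = E_ij + E_ji

Binv = np.linalg.inv(B)
C  = Binv @ Phi @ Binv.T                       # B^{-1} Phi B^{-T}
N  = Binv.T @ Lam.T @ T @ Lam @ Binv           # B^{-T} Lam' T_ij Lam B^{-1}
C1 = C @ Lam.T @ T @ Lam @ Binv                # B^{-1} Phi B^{-T} Lam' T_ij Lam B^{-1}
Km = K_mat(m, m)

H11 = np.kron(C, T)
H33 = np.kron(C, N) + Km @ np.kron(C1.T, Binv) + np.kron(C1, Binv.T) @ Km

def sigma_ij(Lam_, B_):
    Bi = np.linalg.inv(B_)
    return (Lam_ @ Bi @ Phi @ Bi.T @ Lam_.T)[i, j]     # Psi drops out of derivatives

def num_hessian(f, x0, eps=1e-4):
    n = x0.size; H = np.zeros((n, n)); E = np.eye(n) * eps
    for k in range(n):
        for l in range(n):
            H[k, l] = (f(x0 + E[k] + E[l]) - f(x0 + E[k] - E[l])
                       - f(x0 - E[k] + E[l]) + f(x0 - E[k] - E[l])) / (4 * eps**2)
    return H

H11_num = num_hessian(lambda v: sigma_ij(v.reshape(p, m, order="F"), B), vec(Lam))
H33_num = num_hessian(lambda v: sigma_ij(Lam, v.reshape(m, m, order="F")), vec(B))
assert np.allclose(H11, H11_num, atol=1e-5)
assert np.allclose(H33, H33_num, atol=1e-5)
```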

5. Gradient and Hessian of the fitting function

In the practice of covariance-structure analysis it is usual to minimize the (weighted) least-squares fitting function

F(θ) = (s − σ(θ))'W(s − σ(θ)),   (5)

where s = vech S, with S the sample covariance matrix, σ = vech Σ, and W is a possibly stochastic (positive semi-definite) matrix. This is the fitting function used in generalized least-squares estimation implemented in most of the computer programs for covariance-structure analysis (EQS, LISREL, LISCOMP, etc.). Also, maximum-likelihood estimation (under the assumption that z is normally distributed) has been shown to yield the same estimators as a weighted least-squares analysis where W is updated after each iteration (see, e.g., Browne, 1974; Lee and Jennrich, 1979; Fuller, 1987, Section 4.2.2). The weighted least-squares function (5), with W supposed not to depend on θ, implies the following expression for the gradient of F:

∂F/∂δ' = −2(s − σ)'W(∂σ/∂δ') = −2(s − σ)'WG,   (6)

where G was defined in (3) above. Differentiating dF, we get

d²F = 2(dσ)'W dσ − 2(s − σ)'W d²σ = 2(dσ)'W dσ − 2 ∑_{r=1}^{p*} ((s − σ)'W)_r (d²σ)_r,

where the sum runs over the indexes ij corresponding to the p* = p(p + 1)/2 distinct elements of Σ. Now using dσ = G dδ and (d²σ)_r = (dδ)'H_r dδ, where G is the matrix of (3) and H_r is the matrix of (4) with corresponding ij indexes, we get

d²F = 2(dδ)'[G'WG − ∑_{r=1}^{p*} ((s − σ)'W)_r H_r] dδ;

hence, the Hessian of F is expressed as

∂²F/∂δ ∂δ' = 2[G'WG − ∑_{r=1}^{p*} {(vech(S − Σ))'W}_r H_r],   (7)

where

((s − σ)'W)_r = ∑_h (s − σ)_h w_{hr}.

Here w_{hr} is the hrth element of W, and H_r = ∂²σ_{ij}/∂δ ∂δ' for the corresponding index ij. In the case of normality of z, maximum-likelihood (ML) estimation yields the following fitting function to be minimized:

F_ML(θ) = ln |Σ(θ)| + trace S(Σ(θ))^{-1},   (8)

which is [−2/(n − 1)] times the log-likelihood function of the data, with n being the sample size. The first derivative of F_ML(θ) is the same as the right-hand side of (6) with the matrix W being defined as

W = ½ D'(Σ^{-1} ⊗ Σ^{-1})D.   (9)
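Formulas (6) and (7) can be verified numerically one parameter block at a time. The sketch below (illustrative values, not from the paper) treats F of (5) as a function of vec B alone, with W fixed, and checks the corresponding block of the gradient and Hessian against finite differences:

```python
import numpy as np

def vec(A):  return A.reshape(-1, order="F")
def vech(A): return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def K_mat(m, n):
    # commutation matrix: K_mat(m, n) @ vec(A) == vec(A.T) for A of shape (m, n)
    K = np.zeros((m*n, m*n))
    for i in range(m):
        for j in range(n):
            K[i*n + j, j*m + i] = 1.0
    return K

def L_mat(n):
    # elimination matrix: vech(A) == L_mat(n) @ vec(A)
    L = np.zeros((n*(n+1)//2, n*n)); h = 0
    for j in range(n):
        for i in range(j, n):
            L[h, j*n + i] = 1.0; h += 1
    return L

rng = np.random.default_rng(3)
p, m = 3, 2
Lam = rng.normal(size=(p, m))
B   = np.eye(m) - np.array([[0.0, 0.4], [0.2, 0.0]])   # B = I - B0, non-singular
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Psi = np.diag([0.5, 0.6, 0.7])
ps  = p*(p+1)//2

A0 = rng.normal(size=(ps, ps))
W  = A0 @ A0.T / ps + np.eye(ps)      # a fixed positive definite weight matrix
S  = np.cov(rng.normal(size=(200, p)), rowvar=False)   # any sample covariance matrix
s  = vech(S)

def sigma_vech(B_):
    Bi = np.linalg.inv(B_)
    return vech(Lam @ Bi @ Phi @ Bi.T @ Lam.T + Psi)

def F(vB):   # the WLS fitting function (5), viewed as a function of vec B only
    d = s - sigma_vech(vB.reshape(m, m, order="F"))
    return d @ W @ d

Binv = np.linalg.inv(B)
Km   = K_mat(m, m)
A1   = Lam @ Binv @ Phi @ Binv.T
G_B  = -L_mat(p) @ (np.eye(p*p) + K_mat(p, p)) @ np.kron(A1, Lam @ Binv)
resW = (s - sigma_vech(B)) @ W        # the row vector (s - sigma)'W

def H33(i, j):   # the vec B block of H_r in (4), for the element sigma_ij
    T = np.zeros((p, p)); T[i, j] += 1.0; T[j, i] += 1.0
    C  = Binv @ Phi @ Binv.T
    N  = Binv.T @ Lam.T @ T @ Lam @ Binv
    C1 = C @ Lam.T @ T @ Lam @ Binv
    return np.kron(C, N) + Km @ np.kron(C1.T, Binv) + np.kron(C1, Binv.T) @ Km

pairs = [(i, j) for j in range(p) for i in range(j, p)]   # vech ordering of indices ij
grad  = -2 * resW @ G_B                                                     # from (6)
hess  = 2 * (G_B.T @ W @ G_B
             - sum(resW[r] * H33(i, j) for r, (i, j) in enumerate(pairs)))  # from (7)

vB0 = vec(B)
g_num = np.array([(F(vB0 + 1e-6*e) - F(vB0 - 1e-6*e)) / 2e-6 for e in np.eye(m*m)])

def fd_hess(f, x0, eps=1e-4):
    n = x0.size; H = np.zeros((n, n)); E = np.eye(n) * eps
    for k in range(n):
        for l in range(n):
            H[k, l] = (f(x0 + E[k] + E[l]) - f(x0 + E[k] - E[l])
                       - f(x0 - E[k] + E[l]) + f(x0 - E[k] - E[l])) / (4*eps*eps)
    return H

assert np.allclose(grad, g_num, rtol=1e-4, atol=1e-6)
assert np.allclose(hess, fd_hess(F, vB0), rtol=1e-4, atol=1e-3)
```

With the full G and all four blocks of H_r assembled in the same way, the identical check applies to the complete vector δ; this block-wise version keeps the sketch short.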

The Hessian of F_ML(θ) is obtained by adding to the right-hand side of (7), with W given in (9), the following term:

2G'D'(Σ^{-1} ⊗ Σ^{-1}(S − Σ)Σ^{-1})DG.

Note that from the gradient and Hessian given above, we immediately obtain the gradient and Hessian of a specific model. Effectively, such a specific model is associated with a function δ = δ(θ) with corresponding derivative matrix Δ = ∂δ(θ)/∂θ'. Linear restrictions imply that the matrix Δ is constant; hence the corresponding gradient and Hessian will be the product matrices [∂F/∂δ']Δ and Δ'[∂²F/∂δ ∂δ']Δ, respectively.
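This final chain-rule step can be illustrated in isolation. In the abstract sketch below, the quadratic test function f and the selection matrix Delta are invented purely for demonstration; Delta plays the role of the constant matrix Δ of zeroes, ones and minus ones:

```python
import numpy as np

rng = np.random.default_rng(5)
n_delta, n_theta = 5, 2

# a smooth test function of delta with known gradient and Hessian
M = rng.normal(size=(n_delta, n_delta)); M = M + M.T
c = rng.normal(size=n_delta)

def f(d):      return 0.5 * d @ M @ d + c @ d
def grad_f(d): return M @ d + c            # plays the role of dF/d delta'
hess_f = M                                 # plays the role of d2F/d delta d delta'

# linear restriction delta = Delta @ theta + delta0, with Delta constant
Delta  = np.array([[ 1.0,  0.0],
                   [ 0.0,  1.0],
                   [ 1.0,  0.0],   # an element restricted to equal the first one
                   [ 0.0,  0.0],   # an element fixed (no free parameter)
                   [ 0.0, -1.0]])  # a minus-one entry, as allowed in the text
delta0 = rng.normal(size=n_delta)

theta = rng.normal(size=n_theta)
d = Delta @ theta + delta0

grad_theta = grad_f(d) @ Delta             # [dF/d delta'] Delta
hess_theta = Delta.T @ hess_f @ Delta      # Delta' [d2F/d delta d delta'] Delta

# finite-difference confirmation of the restricted gradient
eps = 1e-6
g_num = np.array([(f(Delta @ (theta + eps*e) + delta0)
                   - f(Delta @ (theta - eps*e) + delta0)) / (2*eps)
                  for e in np.eye(n_theta)])
assert np.allclose(grad_theta, g_num, atol=1e-6)
```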

References

Aigner, D.J., C. Hsiao, A. Kapteyn and T. Wansbeek (1984), Latent variable models in econometrics, in: Z. Griliches and M.D. Intriligator, eds., Handbook of Econometrics, Vol. 2 (North-Holland, Amsterdam) pp. 1321-1393.
Anderson, T.W. (1984), Estimating linear statistical relationships, Ann. Statist. 12 (1), 1-45.
Bentler, P.M. (1985), Theory and Implementation of EQS, a Structural Equations Program (BMDP Statistical Software, Los Angeles).
Bentler, P.M. and D.G. Weeks (1980), Linear structural equations with latent variables, Psychometrika 45, 289-308.
Bollen, K.A. (1989), Structural Equations with Latent Variables (Wiley, New York).
Browne, M.W. (1974), Generalized least squares estimators in the analysis of covariance structures, South African Statist. J. 8, 1-24.
Browne, M.W. (1984), Asymptotically distribution-free methods for the analysis of covariance structures, British J. Math. Statist. Psych. 37, 62-83.
Fuller, W. (1987), Measurement Error Models (Wiley, New York).
Jöreskog, K.G. (1977), Structural equation models in the social sciences: Specification, estimation and testing, in: P.R. Krishnaiah, ed., Applications of Statistics (North-Holland, Amsterdam) pp. 265-287.
Jöreskog, K.G. and D. Sörbom (1983), LISREL V User's Guide. Analysis of Structural Equation Relationships by Maximum Likelihood and Least Squares Methods (International Educational Services, Chicago).
Lee, S.-Y. and R.I. Jennrich (1979), A study of algorithms for covariance structure analysis with specific comparison using factor analysis, Psychometrika 44, 99-113.
Magnus, J.R. and H. Neudecker (1986), Symmetry, 0-1 matrices and Jacobians, Econometric Theory 2, 157-190.
McDonald, R.P. (1980), A simple comprehensive model for the analysis of covariance structures, British J. Math. Statist. Psych. 31, 59-72.
Muthén, B. (1987), LISCOMP: Software for Advanced Analysis of L.S.E. with a Comprehensive Measurement Model (Scientific Software, Mooresville, IN).
Satorra, A. (1989), Alternative test criteria in covariance structure analysis: a unified approach, Psychometrika 54 (1), 131-151.
Shapiro, A. (1986), Asymptotic theory of overparameterized structural models, J. Amer. Statist. Assoc. 81, 142-149.
Wiley, D.E. (1973), The identification problem for structural equation models with unmeasured variables, in: A.S. Goldberger and O.D. Duncan, eds., Structural Equation Models in the Social Sciences (Seminar Press, New York) pp. 69-83.
