In!. J. Non-Lineur Mechanics, Printed in Great Britain
Vol. 22, No. 3, pp. 251-260,
1987 0
0020-7462/87 1987 Pcrgamon
$3.00 + 0.00 lournals Ltd
EXTENSION OF SECOND MOMENT ANALYSIS TO VECTOR-VALUED AND MATRIX-VALUED FUNCTIONS FAI MA* University
of California, (Received 15
Mechanical
Engineering,
Berkeley,
August 1984; in revised form 27
CA 94720, U.S.A.
October
1986)
Abstract-In this paper vector and matrix transformations of random variables are considered. Techniques from matrix calculus and Kronecker algebra are employed to systematically develop generalized formulae for second moment analysis. The results derived are easily amenable to computational procedures.
1. INTRODUCTION
Suppose g is a function of a scalar random variable z. Lacking complete knowledge of the probability density function of z, one attempts to characterize g(z) by utilizing only information on moments of z. It is well-known that a second-moment analysis, which uses the first and second moments of z, generates the following approximate formulae [l] for the mean and variance of g(z) E(g(z)) = g(E(z)) + ;Var (z) 2
0
(1)
2
Var (g(z)) = Var (z) 2
dg where the derivatives d7and
d2g dzZare
(2)
evaluated at E(z). Equation (1) is accurate to second
order, and equation (2) to first order. Both expressions are obtained from the Taylor expansion of g(z) about E(z). If g is linear, expressions (1) and (2) are exact. Moreover, if the variance of z is sufficiently small, the above approximate formulae should be adequate for most practical purposes even when g is non-linear [l]. Quantitatively, Cornell [2] has suggested that if the coefficient of variation is not greater than 0.2, then second moment analysis is applicable to moderately non-linear functions. Since the question of adequate approximation and the measure of being sufficiently small vary from one engineering model to another, we shall not dwell upon these questions here. After all, the above approximations may be successively improved by including higher order terms. The method of second moment analysis ha’s been extended to real-valued functions defined on subsets of R”, with n 2 1. Let h be a function of several random variables zl, z2,. . . , z,. Then the following approximate formulae can be established [l]
EMzl,...,z.)) = WW,),..., E(zn)) +
ii$j$l(g)COv (zi,zj) 1
J
Var(h(z,,...,
(4)
where all derivatives are evaluated at E(z,), . . . , E(z,). The conditions for the applicability of these formulae are similar to those stated for the case of one variable. Again, the above
l
Formerly
associated
with IBM Watson
Research
Center, 251
where this research
was inspired.
252
approximations
F. MA
may be improved
by incorporating
higher order terms. Since
Var (zi) = Cov (zi, zj) for all i = j, equations
(5)
(3) and (4) become
E(h(zl, . . , 4) = NE(zl),
, E(z,)) + i ,f: r-l
2 Var
(zi)
1
2
Var(h(z,,...,
Z,))
=
f
i=j
(1 E
ZZi
Var(ZJ
if zi and zj are uncorrelated whenever i # j. All the above formulae are generally useful in engineering design [3]. For instance, equation (4) is the basis of error propagation analysis in measurement theory. This paper aims at generalizing second moment analysis to vector-valued and matrixvalued functions. The motivation for and the principles upon which the generalization is made will be addressed. The practical applications and the algorithmic coding of the results will appear elsewhere. 2. PROBLEM
FORMULATION
For the purpose of motivation, let us consider an example from computational mechanics. The flow of a single fluid through a porous medium may be described by the principle of mass conservation and Darcy’s law [4]. For a slightly compressible fluid such as water, the governing differential equation for steady-state flow in one dimension is
(K(x)21=
g
0
where 4 is the hydraulic head and K is the hydraulic conductivity. Here, attention is focused on steady-state flow for the ease of discussion; it is understood that the transient case is more interesting and significant. On account of the innate variability in geologic formations, the conductivity K(x) is realistically regarded as a spatial stochastic process. In the Monte Carlo simulation of system (8) over a bounded domain, we may first partition the flow domain [o,LJ into n equal subintervals [S, 63. A set of n random samples distribution of K. By using Ki as the {K 1,. . . , K,} is then chosen from the probability hydraulic conductivity value for the ith subinterval, the set {K 1,. . . , K,} may serve as a conductivity realization of K(x) over the bounded domain. For this particular realization of K(x), the solution over [o,L] may be calculated by utilizing the various interlayer boundary conditions [SJ. This process may be continued by selecting other sets of random samples of K(x), and the results of the repetitive calculations may then be interpreted statistically. However, the random samples Ki selected by the computer are necessarily independent. The above procedure, therefore, generates output with the implicit assumption that the hydraulic conductivity values in adjacent blocks of the discretized flow domain are statistically independent. To attain a closer representation of reality, a correlation relation should be constructed over [o,L]. This amounts to, in mathematical terms, devising a transformation f from R” to R” such that
f-W
1,.
.
,
KJT) = (~1,. . , z,JT
(9)
where the values Zi are correlated in a desired manner. If the random vector (zi, . . . , z,) is used as conductivity realization of K(x) over Co,,!,], then the resulting numerical model will possess the desired spatial correlation. For the system (8), a correlation structure may be generated by employing a time series type transformation [6], which we shall not pursue. The previous discussion, together with many other examples [7,8], show clearly that it is realistic to study the approximate properties of transformations from R” to Rp
Second
moment
analysis
253
between random vectors. We shall, instead, attempt to investigate the seemingly more difficult problem of generalizing second-moment analysis to matrix-valued transformations. Let X(s x t) and Z(p x q) be random matrices with dimensions as indicated. That means the elements of X and Z are random variables which can be continuous or discrete. The elements of each of the matrices X and Z may also be correlated. Suppose F is a transformation between matrices such that F(X) = Z.
(10)
The objective of this investigation is to characterize the mean and variance of Z by utilizing the first two moments of the elements of X. Any such characterization should be capable of being sharpened when additional information on the higher moments of the input is incorporated. In developing a matrix second-moment analysis with the above in mind, there are two basic questions that need to be examined. First, it is necessary to know how the moments of a matrix should be conveniently defined. Since Kronecker algebra has been found, among other things, to be particularly suited for the study of stochastic systems [9, lo], we shall define the pth moment of a random matrix as the expectation of the pth Kronecker power. This proposal will be clarified in subsequent sections. The second question is how a random matrix may be cast into an infinite series involving all the matrix moments. It has been recognized that the lack of a structural Taylor formula for tensor transformations is a barrier against extending linear analysis into the domain of higher order tensors [l 11. However, Vetter has recently derived the Taylor expansion for matrices [12], which is based upon Kronecker algebra. For this reason, we shall draw upon the methods of matrix calculus and Kronecker algebra in the development of matrix second-moment analysis. Results corresponding to the transformations of random vectors from 17%” to (wpwill be deduced as a corollary to our general analysis of matrix transformations. With an appropriate modification in the definitions, the extension of second-moment analysis given in [ 131 can be shown to be a particular case of the corollary. The transformation (10) arises naturally in the investigation of fluid flow through a twodimensional porous medium [14]. In extending system (8) to two dimensions, the flow domain is discretized into planar cells instead of subintervals. Suppose the planar cells are indexed as a rectangular array, with the random conductivity value in the (i,j)th cell denoted by Kij. Then a transformation between matrices of random variables results in the process of correlating Kij within the bounded two-dimensional domain, for which X =
(Kij)
in equation (lo), and Z is the matrix of random variables with the desired planar spatial correlation. It is hoped that the brief discussion presented so far would impart sufficient motivation for the practicality and utility of studying (10); additional supportive examples may be found in [3,15]. 3. MATHEMATICAL
PRELIMINARIES
In this section we shall attempt to delve into the methods of matrix calculus and Kronecker algebra. Matrix calculus concerns limiting operations on matrices. A consistent and constructive matrix calculus would be extremely useful in fields such as multivariate analysis, system control, and quantum physics. Of course, expressions in matrix system forms can always be written in terms of sets of scalar equations; and further manipulations can then be performed by ordinary calculus. However, the complexity of such a process has precluded many theoretical advances. The compactness of relations in terms of matrices not only leads to a better understanding of the problem involved, but also helps to point to the directions in which further research efforts should be made. There are several formulations of matrix calculus. The version suggested by Turnbull [ 16,171 is very inspiring. However, we shall employ the definitions proposed by Vetter [12]. The theory of matrix calculus as developed by Vetter is consistent and remarkably complete. A portion of this section is adopted from [183. One of the fundamental problems that needs to be examined before any matrix calculus
254
F. MA
can be formulated is the non-commutativity of matrix multiplications. As suggested by Bellman [19], the use of Kronecker products might be one possible means to circumvent the above problem; and the formulation of matrix calculus by Vetter is based upon Kronecker algebra. For this reason we shall first survey several aspects of Kronecker algebra that are needed in our investigation. We believe that we have occasionally obtained a new interpretation or extension of a known result. Let A(p x 4) and B(s x t) be two given matrices. The Kronecker product of the two matrices, denoted by A @ B, is a ps x qt matrix defined by [20]
.. Since A and B need not be conformable to possess a Kronecker product, the above definition represents a generalization of matrix multiplication. Let E;,“; denote the permutation matrix of order pq, consisting of q x p array of p x q dimensional elementary submatrices
E,P,“;=
11;;
1;:
1;;
1;
1
where the p x q dimensional elementary matrix E’j has 1 in the (i, j)th position other elements are zero. Then it can be shown that B @ A = E,P,“;(A@ B)E;;‘: so that the structure
of B @ A is determined
(12)
by that of A @ B. In addition
A@(B@C)=(A@B)@C
(13)
(A+B)@(C+D)=AOC+A@D+B@C+BOD
provided the various recursively
quantities
and all
(14)
(A @ B)(C @ D) = (AC) @ (BD)
(15)
(A @ B)T = AT @ BT
(16)
(A@B)-‘=A-‘@B-l
(17)
A@l=l@A=A
(18)
exist. We shall also consider
Kronecker
powers and define
At’] = A At”+ 11= A @ Atkl, Then it is apparent
that AN + 11= At’1 @ At’]
(19)
(AB)tkl = AtklBt’l
(20)
Second moment analysis
255
If A and B do not commute, in general (AB)k # A’B’; although from identity (15) (A @ B)’ = A’ 8.B’.
(21)
Equations (12), (20) and (21) are probably what prompt us to think that Kronecker multiplication may be a means to bypass the commutativity question. Suppose Aci represents the ith column of A, so that
A = (&~,&z,...,Acq). We define the column string cs(A)~ Rpq of A by
es(A) =
The concept of row string and additional identities on Kronecker products may be found in [12, 21, 221. Let us now turn to matrix calculus as developed by Vetter, which utilizes Kronecker algebra. The derivative of a matrix A(p x q) with respect to another matrix B(s x t) is a matrix of dimension ps x qt defined by
c-1 aA
aA
2-i =
abij
.
(22)
Suppose C(u x u), D(q x u) are matrices with dimensions as indicated. The following identities can be established
-l
(4 a"A
aB"
anAT =aBT”
(23)
for every positive integer n, and
(24)
G-9) where I, represents the identity matrix of order n. Notice that there is a slight error in [ 121, wherein the last permutation matrix in (25) is given as El;Zz. Furthermore, if A = A(C) and C = C(B) so that A = A(C(B)), the following chain rule applies aA
aB=
wmT (---@Ip)(4@&). aB
(26)
These definitions and identities for matrix derivatives represent a generalization and complementation of gradient vector and gradient matrix concepts [21,22]. One of the more significant applications of the various definitions presented in this section lies in the derivation of the Taylor series for matrices. If A(p x q) is a function of another matrix
256 B(s
F. MA x f),
then the Taylor
expansion
of A about
A(B) = A(R) + i IA,((cs(B k=l k! where (cs(B - @)[“I is the kth Kronecker
a nominal
value B of B is given by [ 121
- R))tkl @ IJ + R,, 1(B, B)
(27)
power of cs(B - R) as previously
defined,
and
(28) R,+,(R,B)
= O((cs(B - @)[“+‘I).
(29)
In order to exhibit the structural details of (27), let us consider the 2 x 2 case where the algebra is more transparent. If we expand a square matrix A(B) of order 2 about B, where B, R E R2; and if we denote A = A(B), then with the usual notation b,-6,
0
b, -6
0
b2-F2
0
0 32 aI2
Z2a,,
c?b:
ab:
2b,ab,
(?2a2, ___
d2a2, ___
Z2a2,
t?b:
ab:
ab,db,
72
6 __ aI1
L
~
a2a,,
-
d2a,, __
ii2a12 ~
ab2ab,
ab2ab,
db;
ah:
a2azl
d2a2, ~
?2a,, __
Z2az2 __
ablab,
ab,ab,
i3b:
d2az2
ab,ab,
~
Jb:
(b, - 6)=
0
0
i
(b,
-
6&2
-
-
0
52) (b,
62Nb,
-
aij(B)
=
ai@)
+
,$,
h
(b, [(
-
61)
62)’
+
(b2
-
+ O((b, - 6,)“+‘) + O((b, - 6J(b2
62)
+
i’ 6212
’
terms for each component,
&
,,.
- 6,)
0
(b2 -
62)
1
-
(b2 - h)(b,
at R. Collecting
$ ’
6Jb2
0
0 where all derivative matrices are evaluated is found that for 1 d i,j d 2
-
6,)
0 (b, -
FA2
(b, -
0 lb2
52
d2a,,
d2a12 ab,ab,
b2
Jij@)lB
it
= J
- 6,)) + ... + O(bz - F;2)“+1)
which is the usual Taylor formula for the multivariable case [l 11. Thus expression (27) is indeed a generalization of scalar calculus. The Taylor expansion will be used in the next section for the approximate evaluation of mean and variance. A discussion of the concepts of matrix differential and matrix integral, also introduced by Vetter, may be found in [ 121. It is obvious that the discussion presented in this section can be extended to complex matrices at the expense of increased mathematical complexity. 4. GENERALIZATION
OF SECOND
Suppose X(s x t) and Z(p x q) are matrices (10). For simplicity, we also write
MOMENT
of random
variables
ANALYSIS
related by transformation
Second
moment z
Denoting
251
analysis
= Z(X).
by
z = (Zij) it is obvious that E(Z) = (E(zij)). Moreover, the matrix power of Z, contains all the homogeneous monomials contains all the pth moments of the elements of Z
E(Z[“) =
(E(tzy))7 ZPij
ZIP], which is the pth Kronecker of degree p in zij. Hence E(Zfpl)
=
P9
Pij
2
O.
It seems appropriate, therefore, to refer to E(ZcP1)as the pth moment Z. The variance of Z is defined by [9,10,18] Var(Z)
of the random
= E((Z - E(Z))r21)
matrix
(30)
which includes all the variances and covariances of zij. Higher order central moments are similarly defined. We now proceed to derive the approximate mean and variance of the transformation (10). By expanding Z(X) in a Taylor series about E(X), we have
Z(X) = Z@(X)) +
where the deterministic
,gl;
matrices
Z,((cs(X -
E(X)))[kl 0 Iq) + O((cs(X - E(X)))[” +“)
Z, are given by
akz ‘k =atcs(x))Tk X=EtXJ’ Notice that each Z, is a matrix of dimension second moments of X, it is easy to see that E(Z(X)) = Z(E(X)) +
definition
(32)
p x qsktk. By retaining
(30), the above expression
-
cs(E(x))][*l0 Iq).
can be cast in the form
a2z E(Z(X))= Z(E(X))+ j’ a(cs(x))T*
(Var (es(X)) 0 I,)
where the derivative matrix is evaluated at E(X). Equation (33) is accurate If the series (31) is truncated to first order, the resulting equation is
Z(X) = Z@(X)) + Z,(cs(X - E(X)) 0 IJ It follows that (Z(X) -
only the first and
4 Z,E((cs(X - E(X)))[21 0 I,)
= Z(E(X)) + ) Z,E([cs(X) Recalling
(31)
Z(E(X)))[21 = (Z,(cs(X - E(X)) @ I,))[21 = Z[:‘(cs(X- E(X))
0 I 4 )I21
(33)
to second order.
258
where identity define by
F. MA
(20) has been used. To make subsequent
algebra
more transparent,
let us
Y = es(X) - cs(E(X)).
In this case (cs(X - E(X)) @ 14)[21 = (Y 0 14) 0 (Y 0 I,) = Y @ (I, 0 Y) 0 I, = (E:;',",::
((I, 0 Y) 0 Y)E: ,”ff 0 I,
= {I%:“,:: (I4 0 (Y 0 Y))I,] 0 I, = {E:;:“,:: (I, 0 Ytzl)j 0 I, where E$:“,z: is a constant permutation matrix whose elements are either 1 or 0, and several identities of the previous section have been exploited. Since the variance of Z accurate to first order is equal to E(Z(X)) - Z(E(X)))t’], we immediately have
[21{CEtfiXq2(Iq 0 Var (cs(X)))l0 I,}
(34)
where, again, the derivative matrix is evaluated at E(X). At this point, it seems convenient to summarize what has been established so far. Theorem. The mean and variance of the matrix transformation (10) derived by second moment analysis are given by expressions (33) and (34). It is clear from (33) and (34) that only the first and second moments of the elements of X are involved as inputs. From the discussion in a previous section, it is conceivable that if the coefficients of variation of the elements of X are sufficiently small, then formulae (33) and (34) may generally be adequate. Equivalently, this condition can be reflected by requiring that the ratio of the norm of Var(X) to the norm of E(X) be sufficiently small. The corresponding formulae for second moment analysis of vector transformations from R” to W can now be readily deduced. Setting t = q = 1 in equations (33) and (34), we have
WYW = Z(-W)) +
1 62z 2 (7~~2
Var (X)
(35)
(36)
t2r E: c : Var (X)
where all derivatives Xt2] dictates that
are evaluated
at E(X). However,
with X E R”, the specific structure
of
E” x S$21 = $21 sx.7
which implies Ez z: Var (X) = Var (X). Hence j$
Var (Z(X)) = (
tzl Var (X)
(37)
>
The following result can now be stated. Corollary. The mean and variance for the second moment
analysis
of vector transforma-
Second moment
analysis
259
tions are given by expressions (35) and (37). In addition, the formula (35) is accurate to second order, and formula (37) to first order. Further generalizations of the above theorem and corollary are straightforward. For instance, it can be shown that the variance corresponding to (37), but accurate to second order, is given by Var (Z(X)) = Z\‘l Var (X) - a Z[,zl(Var(X))rZ1 +
$z, 0
z, + z, @Z,)E(X
-
E(x)y31+
;zyqx - E(X)y41
(38)
where XE R”, ZE W’; and Z,, Z, are defined by (32). In the scalar case, the second order approximation for the variance corresponding to (2) has the form [l]
0
2
Var (g(z)) = Var (z) 2
3dgd2g 1 + E(z - E(z)) &G + aE(z - E(z))4
(39)
an expression to which (38) reduces upon setting p = s = 1. A different direction for generalization lies in the derivation of higher order moments. As an example, the first order formula for the third moment of Z E lRphas the form E(Z -
E(z)y31 = Z[:‘E(X - E(X)y31
where X E R”. It is obvious that any attempt to derive higher moments or expressions for E(Z) and Var(Z) accurate to a higher order would introduce Kronecker powers of increasing order. It also becomes apparent by now that matrix calculus and Kronecker algebra could serve as standard tools in stochastic analysis. After all, they enter naturally in many diverse areas of investigation in science and engineering. The manipulation of Kronecker matrices may be facilitated by the analysis presented in [23]. Practical applications and algorithmic coding of the formulae derived will be the subject of a future paper. CONCLUSION
In this paper vector and matrix transformations of random variables have been considered. Practical examples have been cited to motivate the problem formulation. Techniques from matrix calculus and Kronecker algebra are employed in the investigation, and generalized formulae for second moment analysis have been developed. The applicability of the formulae has been quantitatively discussed. The results obtained are easily amenable to computational procedures. An attempt has been made to demonstrate the utility of Kronecker algebra in stochastic analysis. Acknowledgement~I would
like to thank
The Standard
Oil Company
for supporting
this research.
REFERENCES I. A. H-S. Ang and W. H. Tang, Probability Concepts in Engineering PIarming and Design, Vol. I, Basic Principles. Wiley, New York (1975). 2. C. A. Cornell, First order analysis of model and parameter uncertainty, Int. Symposium on Uncertainties in Hydrologic and Water Resource Systems, Univ. of Arizona, Tucson, (1972). 3. A. H.-S. Ang and W. H. Tang, Probability Concepts in Engineering Planning and Design, Vol. II, Decision, Risk and Reliability. Wiley, New York (1984). 4. R. A. Greenkorn, Flow Phenomena in Porous Media. Marcel Dekker, New York (1983). 5. R. A. Freeze, A stochastic-conceptual analysis of one-dimensional groundwater flow in non-uniform homogeneous media. Water Resow. Res. 11, 725-741 (1975). 6. L. Smith and R. A. Freeze, Stochastic analysis of steady state groundwater flow in a bounded domain, 1. One-dimensional simulations. Water Resow. Res. 15, 521-528 (1979). 7. M. Shinozuka, Monte Carlo solution of structural dynamics. Comput. Struct. 2, 855-874 (1972). 8. H. Contreras, The stochastic finite-element method. Comput. Struct. 12, 341-348 (1980).
260
F. MA
9. F. Ma and T. K. Caughey, On the stability of linear and non-linear stochastic transformations. Int. J. Control 34, 501-511 (1981). 10. F. Ma, Stability theory of stochastic difference systems. In Probabilistic Analysis and Related Topics, Vol. 3, (Edited by A. T. Bharucha-Reid), pp. 127-160. Academic Press, New York (1983). Il. T. M. Apostol, Mathematical Analysis, 2nd Edn. Addison-Wesley, Reading, MA (1974). 12. W. J. Vetter, Matrix calculus operations and Taylor expansions. SIAM Rev. 15, 352-369 (1973). 13. M. D. Dettinger and J. L. Wilson, First order analysis of uncertainty in numerical models of groundwater flow. Part 1. Mathematical development. Water Resow. Res. 17, 149-161 (1981). 14. L. Smith and R. A. Freeze, Stochastic analysis of steady state groundwater flow in a bounded domain, 2. Two-dimensional simulations, Water Resow. Res. IS, 154331559 (1979). 15. J. C. Davis, Statistics and Data Analysis in Geology. Wiley, New York (1973). 16. H. W. Turnbull, On differentiating a matrix, Proc. Edinburgh Math. Sot. Ser. 2, 11 l-128 (1927). 17. H. W. Turnbull, A matrix form of Taylor’s theorem, Proc. Edinburgh Math. Sot. Ser. 2, 33-54 (1930). 18. F. Ma, Approximate analysis of a class of linear stochastic systems with colored noise parameters. Int. J. Eng. Sci. 24, 19-34 (1986). 19. R. Bellman, Limit theorems for non-commutative operations. Duke Math. J. 21, 491-500 (1954). 20. R. Bellman, Introduction to Matrix Analysis. McGraw-Hill, New York (1960). 21. J. W. Brewer, Kronecker products and matrix calculus in system theory. IEEE Trans. Circuits and Systems CAS-25, 772-781 (1978). 22. A. Graham, Kronecker Products and Matrix Calculus wirh Applications. Ellis Horwood, England (1981). 23. G. Seroussi and F. Ma, On complexity of matrix Kronecker powers. Inf. Process. Letr. 17, 145-148 (1983).