This appendix lists basic formulas along with useful vector, matrix, and tensor identities. Note that these formulas were selected to support the derivations in this book and do not form an exhaustive collection of vector, matrix, and tensor identities.
A.1 EXPECTATION

E_{(a)}[f(a) + g(a)] = E_{(a)}[f(a)] + E_{(a)}[g(a)]   (A.1)
E_{(a)}[b f(a)] = b E_{(a)}[f(a)]   (A.2)
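As a quick sanity check of (A.1) and (A.2), the following minimal NumPy sketch (an illustration added here, not part of the original text) approximates the expectations by Monte Carlo averages over samples of a; the choices of f, g, b, and the sample size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.normal(size=100_000)        # samples of the random variable a
    f, g, b = np.exp, np.square, 3.0    # arbitrary f, g and constant b

    # (A.1): E[f(a) + g(a)] = E[f(a)] + E[g(a)]
    print(np.mean(f(a) + g(a)), np.mean(f(a)) + np.mean(g(a)))
    # (A.2): E[b f(a)] = b E[f(a)]
    print(np.mean(b * f(a)), b * np.mean(f(a)))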
A.2 JENSEN'S INEQUALITY

For a concave function f, a random variable X drawn from a distribution, and an arbitrary function g(X), we have the following inequality:

f(E_{(X)}[g(X)]) ≥ E_{(X)}[f(g(X))].   (A.3)

In the special case of the concave function f(·) = log(·), (A.3) is rewritten as follows:

log(E_{(X)}[g(X)]) ≥ E_{(X)}[log(g(X))].
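The following minimal NumPy sketch (added here as an illustration, not from the original text) checks the log form of Jensen's inequality by Monte Carlo; the distribution of X, the function g, and the sample size are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # samples of X
    g = lambda t: t ** 2 + 1.0                         # arbitrary positive g

    lhs = np.log(np.mean(g(x)))   # log E[g(X)]
    rhs = np.mean(np.log(g(x)))   # E[log g(X)]
    print(lhs, rhs, lhs >= rhs)   # Jensen: lhs >= rhs (up to sampling noise)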
A.6 DERIVATIVE

∂ log|A| / ∂A = (A^⊤)^{-1}   (A.15)
∂(a^⊤ b)/∂a = ∂(b^⊤ a)/∂a = b   (A.16)
∂(a^⊤ C b)/∂C = a b^⊤   (A.17)
∂ tr(AB)/∂A = B^⊤   (A.18)
∂ tr(ACB)/∂C = A^⊤ B^⊤   (A.19)
∂ tr(A C^{-1} B)/∂C = −(C^{-1} B A C^{-1})^⊤   (A.20)
∂(a^⊤ C a)/∂a = (C + C^⊤) a   (A.21)
∂(b^⊤ A^⊤ D A c)/∂A = D^⊤ A b c^⊤ + D A c b^⊤   (A.22)
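As an illustration (added here, not from the original text), the sketch below checks (A.15) and (A.21) numerically against finite differences; the matrix sizes and the step size eps are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    n, eps = 4, 1e-6
    A = rng.normal(size=(n, n)) + n * np.eye(n)   # shift to keep A comfortably invertible
    a = rng.normal(size=n)
    C = rng.normal(size=(n, n))
    E = np.eye(n)

    # (A.15): d log|A| / dA = (A^T)^{-1}, checked entry-wise by finite differences
    num = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            Ap = A.copy(); Ap[i, j] += eps
            num[i, j] = (np.log(abs(np.linalg.det(Ap))) - np.log(abs(np.linalg.det(A)))) / eps
    print(np.allclose(num, np.linalg.inv(A).T, atol=1e-4))

    # (A.21): d(a^T C a)/da = (C + C^T) a
    num_a = np.array([((a + eps * E[k]) @ C @ (a + eps * E[k]) - a @ C @ a) / eps
                      for k in range(n)])
    print(np.allclose(num_a, (C + C.T) @ a, atol=1e-4))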
A.7 COMPLETE SQUARE

When A is a symmetric matrix, we have

x^⊤ A x − 2 x^⊤ b + c = (x − u)^⊤ A (x − u) + v,   (A.23)

where

u ≜ A^{-1} b,   (A.24)
v ≜ c − b^⊤ A^{-1} b.

By using the above complete square formula, we can also derive the following formula when the matrices A_1 and A_2 are symmetric:

(x − b_1)^⊤ A_1 (x − b_1) + (x − b_2)^⊤ A_2 (x − b_2)
  = x^⊤ (A_1 + A_2) x − 2 x^⊤ (A_1 b_1 + A_2 b_2) + b_1^⊤ A_1 b_1 + b_2^⊤ A_2 b_2
  = (x − u)^⊤ (A_1 + A_2) (x − u) + v,   (A.25)

where A_1 + A_2, A_1 b_1 + A_2 b_2, and b_1^⊤ A_1 b_1 + b_2^⊤ A_2 b_2 play the roles of A, b, and c in (A.23), respectively, so that

u = (A_1 + A_2)^{-1} (A_1 b_1 + A_2 b_2),
v = b_1^⊤ A_1 b_1 + b_2^⊤ A_2 b_2 − (A_1 b_1 + A_2 b_2)^⊤ (A_1 + A_2)^{-1} (A_1 b_1 + A_2 b_2).
A.8 TENSOR ALGEBRA

• The norm of a given tensor X ∈ R^{I_1×I_2×···×I_N} is the square root of the sum of the squares of its elements:

  ‖X‖ = √( Σ_{i_1=1}^{I_1} Σ_{i_2=1}^{I_2} ··· Σ_{i_N=1}^{I_N} x²_{i_1 i_2 ··· i_N} ) = √⟨X, X⟩.   (A.27)

• The inner product of two tensors X, Y ∈ R^{I_1×I_2×···×I_N} is the sum of the element-wise products of their entries:

  ⟨X, Y⟩ = Σ_{i_1} Σ_{i_2} ··· Σ_{i_N} x_{i_1 i_2 ··· i_N} y_{i_1 i_2 ··· i_N}.   (A.28)

• If a tensor X ∈ R^{I_1×I_2×···×I_N} can be written as the outer product of vectors, it is called a rank-one tensor:

  X = u^{(1)} ∘ u^{(2)} ∘ ··· ∘ u^{(N)},   (A.29)

  where ∘ denotes the outer product.
• If a tensor X ∈ R^{I_1×I_2×···×I_N} has nonzero values only where i_1 = i_2 = ··· = i_N, all other elements being zero, it is called a diagonal tensor.

• The mode-n (matrix) product of a tensor X ∈ R^{I_1×I_2×···×I_N} with a matrix U ∈ R^{J×I_n} is denoted by X ×_n U. Each mode-n fiber of X is multiplied by the matrix U, which can be expressed element-wise as

  (X ×_n U)_{i_1 ··· i_{n−1} j i_{n+1} ··· i_N} = Σ_{i_n=1}^{I_n} x_{i_1 i_2 ··· i_N} u_{j i_n}.
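The NumPy sketch below (added here as an illustration, not from the original text) demonstrates the tensor norm, the inner product, a rank-one tensor built from an outer product, and a mode-n product; the shapes and the choice n = 2 are arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(3, 4, 5))
    Y = rng.normal(size=(3, 4, 5))

    # (A.27)/(A.28): tensor norm and inner product
    inner = np.sum(X * Y)                        # <X, Y>
    norm = np.sqrt(np.sum(X * X))                # ||X|| = sqrt(<X, X>)

    # (A.29): rank-one tensor u1 o u2 o u3 via an outer product
    u1, u2, u3 = rng.normal(size=3), rng.normal(size=4), rng.normal(size=5)
    rank_one = np.einsum('i,j,k->ijk', u1, u2, u3)

    # Mode-2 product: every mode-2 fiber of X is multiplied by U in R^{J x I_2}
    U = rng.normal(size=(6, 4))
    X_x2_U = np.einsum('ijk,lj->ilk', X, U)      # result has shape (3, 6, 5)
    print(norm, inner, rank_one.shape, X_x2_U.shape)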