J. Math. Anal. Appl. 441 (2016) 1–10
An extended Hamiltonian algorithm for the general linear matrix equation

Xiaomin Duan a,b, Xinyu Zhao a,*, Chunyuan Shi a

a School of Materials Science and Engineering, Dalian Jiaotong University, Dalian 116028, China
b School of Science, Dalian Jiaotong University, Dalian 116028, China

Article history: Received 16 January 2016; Available online 7 April 2016; Submitted by W. Sarlet.
Keywords: Extended Hamiltonian algorithm; Linear matrix equation; Differential geometry.
Abstract: A second-order learning algorithm based on differential geometry is used to numerically solve the linear matrix equation Q = x + \sum_{i=1}^{m} A_i^T x A_i - \sum_{i=1}^{m} B_i^T x B_i. An extended Hamiltonian algorithm is proposed based on the manifold of symmetric positive definite matrices. The algorithm is compared with the traditional coupled fixed-point algorithm. Numerical experiments illustrate that the convergence speed of the proposed algorithm is faster than that of the coupled fixed-point algorithm.
1. Introduction

The linear matrix equation under consideration is as follows:
    Q = x + \sum_{i=1}^{m} A_i^T x A_i - \sum_{i=1}^{m} B_i^T x B_i,    (1)
where Q is an n × n positive definite matrix and A_i, B_i are arbitrary n × n matrices. A_i^T and B_i^T denote the transposes of A_i and B_i, respectively. Equation (1) is a general case of the following generalized Lyapunov equation:
    M Y S^* + S Y M^* + \sum_{k=1}^{t} N_k Y N_k^* + C C^* = 0.    (2)
The positive definite solution of Equation (2) is the controllability Gramian of the bilinear control system (see [4,5,22] for more details)
    M \dot{x}(t) = S x(t) + \sum_{k=1}^{t} N_k x(t) u_k(t) + C u(t).    (3)
Special cases of Equation (1) have been investigated previously. Several numerical methods, such as the Bartels–Stewart, Schur decomposition, QR decomposition and Hessenberg–Schur methods, have been proposed in [3] and [12] to solve the well-known Lyapunov equation Q = x + A^T x A. Sufficient conditions for the existence of a unique symmetric positive definite solution of the equations Q = x + \sum_{i=1}^{m} A_i^T x A_i and Q = x - \sum_{i=1}^{m} B_i^T x B_i are given in [17,18], based on the Kronecker product and the fixed point theorem in partially ordered sets. Duan used matrix differentiation to obtain a precise perturbation bound for the positive definite solution of Equation (1) [5]. Berzig provided a sufficient condition for the existence of a unique positive definite solution and proposed an iterative algorithm (the coupled fixed-point algorithm, CFPA) based on the Bhaskar–Lakshmikantham coupled fixed-point theorem [2].

The solution of Equation (1) is a symmetric positive definite matrix. Moreover, the set of all symmetric positive definite matrices can be regarded as a manifold, so the solution problem can be treated conveniently using the geometric structures on this manifold [6–8]. Fiori developed the extended Hamiltonian algorithm (EHA), a second-order learning algorithm on a manifold [11]. The inclusion of a momentum term remarkably enhances the rate of convergence [16,19]. Thus, the EHA, based on the geometric structures of the Riemannian manifold, is proposed here to numerically solve Equation (1).

Section 2 describes some fundamental concepts of the manifold discussed in the study. Section 3 introduces the EHA for Equation (1) on the manifold. Section 4 compares the behavior of the EHA with that of the CFPA using simulations.

2. Preliminaries

M(n) denotes the set of n × n real matrices and GL(n) its subset of non-singular matrices. GL(n) is a Lie group, that is, a group which is also a differentiable manifold and on which group multiplication and inversion are smooth. The tangent space at the identity of GL(n) is called the corresponding Lie algebra; it comprises all linear transformations, namely M(n). Additionally, x > 0 means that x is a symmetric positive definite matrix, and x > y means x − y > 0. The manifold consisting of all symmetric positive definite matrices is defined as follows:

    S^+(n) = {x ∈ R^{n×n} | x^T = x, x > 0}.

The tangent space at a point x ∈ S^+(n) is given by

    T_x S^+(n) = {v ∈ R^{n×n} | v^T = v}.

The manifold S^+(n) is not a Lie subgroup of GL(n), but it is a submanifold of GL(n). Different metrics can be defined on S^+(n), so that different geometric structures are established on this manifold. The geometric structures used in the following sections are briefly introduced here; their detailed description can be found in [1,9,14,15,20].

2.1. Tangent space on manifold S^+(n)

The exponential map of a matrix v ∈ M(n) is given by the following convergent series:

    \exp(v) = \sum_{m=0}^{\infty} \frac{v^m}{m!}.    (4)

The inverse map, that is, the logarithmic map, is defined as follows:
    \log(x) = \sum_{m=1}^{\infty} (-1)^{m+1} \frac{(x - I)^m}{m},    (5)
for x in a neighborhood of the identity I of GL(n).

The space of all n × n symmetric matrices is denoted by Sym(n). The exponential map from Sym(n) to S^+(n) is one-to-one and onto. Hence, the exponential map of any symmetric matrix is a symmetric positive definite matrix, and the logarithmic map of any symmetric positive definite matrix is a symmetric matrix. The tangent space T_x S^+(n) is identified with Sym(n) for each x ∈ S^+(n), because S^+(n) is an open subset of Sym(n).

2.2. Frobenius inner product and Euclidean distance

Given that S^+(n) is a subset of the Euclidean space M(n), the Frobenius inner product on S^+(n) is defined as follows:

    ⟨x_1, x_2⟩ = tr(x_1^T x_2),    (6)

where x_1, x_2 ∈ S^+(n) and tr denotes the trace of a matrix. The associated norm ‖x‖_F = ⟨x, x⟩^{1/2} is used to define the Euclidean distance on M(n) as follows:

    d_F(x_1, x_2) = ‖x_1 - x_2‖_F.    (7)
Moreover, the geodesic passing through v ∈ S^+(n) in the direction of u ∈ Sym(n) with respect to the Frobenius inner product is as follows:

    γ(t) = v + t u,    0 ≤ t < δ,    (8)
for some δ > 0. This geodesic remains within the manifold S^+(n) when t is not too large.

2.3. Riemannian metric and geodesic distance

The Riemannian metric at a point v on the manifold S^+(n) is defined as follows:

    g_v(x_1, x_2) = g_I(v^{-1} x_1, v^{-1} x_2) = tr(v^{-1} x_1 v^{-1} x_2),    (9)

where I denotes the identity matrix and x_1, x_2 ∈ Sym(n). The positive definiteness of this metric is guaranteed by

    g_v(x, x) = tr(v^{-1/2} x v^{-1/2} v^{-1/2} x v^{-1/2}) = ‖v^{-1/2} x v^{-1/2}‖_F^2 ≥ 0.    (10)
The manifold S^+(n) with the Riemannian metric in Equation (9) then becomes a Riemannian manifold. Moreover, S^+(n) is a curved manifold, because this metric is invariant under the group action of GL(n). The geodesic distance on the manifold S^+(n) is defined using Equation (9). Let [0, 1] be a closed interval in R, and let φ : [0, 1] → S^+(n) be a sufficiently smooth curve on the manifold S^+(n). The length of φ(t) is defined as follows:

    ℓ(φ) := \int_0^1 \sqrt{g_{φ(t)}(\dot{φ}(t), \dot{φ}(t))}\, dt = \int_0^1 \big[ tr\big( (φ(t)^{-1} \dot{φ}(t))^2 \big) \big]^{1/2} dt.    (11)
The geodesic distance between two matrices x_1 and x_2 on the manifold S^+(n) is the infimum of the lengths of the curves connecting them:

    d_g(x_1, x_2) := \inf\{ ℓ(φ) \mid φ : [0, 1] → S^+(n), φ(0) = x_1, φ(1) = x_2 \}.    (12)
The length-minimizing smooth curves are geodesics, so the infimum in Equation (12) is achieved by a geodesic. The Hopf–Rinow theorem implies that the manifold S^+(n) is geodesically complete [13], which means that a geodesic defined on the interval [0, 1] can be extended to (−∞, +∞). Therefore, for any given pair x_1, x_2 ∈ S^+(n) a geodesic γ(t) can be found such that γ(0) = x_1 and γ(1) = x_2, by taking the initial velocity

    \dot{γ}(0) = x_1^{1/2} \log(x_1^{-1/2} x_2 x_1^{-1/2}) x_1^{1/2} ∈ Sym(n).

Moreover, this geodesic γ(t) can be expressed as follows:

    γ(t) = x_1^{1/2} \exp\big(t \log(x_1^{-1/2} x_2 x_1^{-1/2})\big) x_1^{1/2} = x_1^{1/2} (x_1^{-1/2} x_2 x_1^{-1/2})^t x_1^{1/2}.    (13)
The geodesic in Equation (13) is entirely contained in the manifold. Therefore, according to Equation (11), the geodesic distance d_g(x_1, x_2) can be computed explicitly as follows:
    d_g(x_1, x_2) = \big\| \log(x_1^{-1/2} x_2 x_1^{-1/2}) \big\|_F = \Big( \sum_{i=1}^{n} \ln^2(λ_i) \Big)^{1/2},    (14)

where the λ_i are the eigenvalues of x_1^{-1/2} x_2 x_1^{-1/2}. The matrix square root x_1^{-1/2} is not necessary in practice, because the λ_i are also the eigenvalues of x_1^{-1} x_2.
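For readers who wish to experiment numerically, Equation (14) can be evaluated directly from the eigenvalues of x_1^{-1} x_2, as noted above. The following Python sketch (using NumPy/SciPy; the function name is illustrative) shows one way to do this.

```python
import numpy as np
from scipy.linalg import eigh

def geodesic_distance(x1, x2):
    """Geodesic distance (14) on S+(n): sqrt(sum_i ln(lambda_i)^2), where the
    lambda_i are the eigenvalues of x1^{-1} x2, obtained here from the
    generalized symmetric-definite problem x2 w = lambda x1 w."""
    lam = eigh(x2, x1, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

# sanity check: the distance from an SPD matrix to itself is zero
x = np.array([[2.0, 0.3], [0.3, 1.0]])
print(geodesic_distance(x, x))  # approximately 0
```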
3. Extended Hamiltonian algorithm for the solution of the linear matrix equation

3.1. Problem formulation

Equation (1) has a unique positive definite solution if A_i, B_i and Q satisfy the following condition [2]:

    \sum_{i=1}^{m} A_i^T Q A_i < \frac{Q}{2},    \qquad \sum_{i=1}^{m} B_i^T Q B_i < \frac{Q}{2}.    (15)
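Condition (15) can be verified numerically by checking that Q/2 − \sum_i A_i^T Q A_i and Q/2 − \sum_i B_i^T Q B_i are both positive definite; a small sketch (the coefficient matrices below are placeholders) follows.

```python
import numpy as np

def satisfies_condition_15(Q, A_list, B_list):
    """Check (15): sum_i A_i^T Q A_i < Q/2 and sum_i B_i^T Q B_i < Q/2,
    i.e. both differences are symmetric positive definite."""
    SA = sum(A.T @ Q @ A for A in A_list)
    SB = sum(B.T @ Q @ B for B in B_list)
    is_spd = lambda M: bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) > 0))
    return is_spd(Q / 2 - SA) and is_spd(Q / 2 - SB)

# placeholder data: small coefficient matrices easily satisfy (15)
Q = np.eye(3)
print(satisfies_condition_15(Q, [0.1 * np.eye(3)], [0.2 * np.eye(3)]))  # True
```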
For convenience, define A(x) := Q + \sum_{i=1}^{m} A_i^T x A_i and B(x) := x + \sum_{i=1}^{m} B_i^T x B_i; both A(x) and B(x) are symmetric positive definite for any x ∈ S^+(n). A matrix x ∈ S^+(n) is sought such that A(x) is as close as possible to B(x). The key points in designing such an algorithm are as follows:

1. For a given x, the difference between the symmetric positive definite matrices A(x) and B(x) must first be measured.
2. The EHA can then be used to adjust x so as to minimize this difference.

The Riemannian manifold S^+(n) is geodesically complete. Hence, the geodesic distance between A(x) and B(x) is taken as the objective function J(x):

    J(x) = d_g^2\big(A(x), B(x)\big) = tr\big[ \log^2\big( A(x)^{-1/2} B(x) A(x)^{-1/2} \big) \big].    (16)

Thus, the exact solution of Equation (1) may be characterized as follows:

    x^* = \arg\min_{x ∈ S^+(n)} J(x).    (17)
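A direct way to evaluate the objective (16), reusing the eigenvalue formulation of the geodesic distance from Section 2, might look as follows (a sketch; function names are illustrative). J vanishes exactly when A(x) = B(x).

```python
import numpy as np
from scipy.linalg import eigh

def A_map(x, Q, A_list):
    """A(x) = Q + sum_i A_i^T x A_i."""
    return Q + sum(A.T @ x @ A for A in A_list)

def B_map(x, B_list):
    """B(x) = x + sum_i B_i^T x B_i."""
    return x + sum(B.T @ x @ B for B in B_list)

def objective_J(x, Q, A_list, B_list):
    """J(x) = d_g^2(A(x), B(x)) = sum_i ln(mu_i)^2, where the mu_i are the
    eigenvalues of A(x)^{-1} B(x) (generalized problem B(x) w = mu A(x) w)."""
    Ax, Bx = A_map(x, Q, A_list), B_map(x, B_list)
    mu = eigh(Bx, Ax, eigvals_only=True)
    return float(np.sum(np.log(mu) ** 2))
```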
3.2. Riemannian gradient of the objective function

The Riemannian gradient of J, x ↦ ∇_x J, is a map from S^+(n) to Sym(n) that satisfies g_x(∇_x J, s) = ⟨∂_x J, s⟩ for every s ∈ Sym(n), where ∂_x J is the Euclidean gradient of J at x. In what follows, a Riemannian manifold is denoted by M and its tangent space at a point x ∈ M by T_x M. Given a regular function f : M → R, its Riemannian gradient ∇_x f, paired with a vector v ∈ T_x M through the metric, measures the rate of change of f in the direction v. The following properties facilitate the computation of ∇_x J [21].

Lemma 3.1. Let A, B denote constant matrices and X, Y denote matrix functions. Then

    (1) d\,tr(X) = tr(dX),
    (2) dA = 0,
    (3) d(AXB) = A (dX) B,
    (4) d(X ± Y) = dX ± dY,
    (5) d(XY) = (dX) Y + X (dY),    (18)
where d denotes the differential of a matrix (scalar) function.

Lemma 3.2. Given any smooth curve φ_{x,v} : [0, 1] → M such that φ_{x,v}(0) = x ∈ M and \dot{φ}_{x,v}(0) = v, the Riemannian gradient ∇_x f is the unique vector in T_x M satisfying

    g_x(v, ∇_x f) = \frac{d}{dt} f(φ_{x,v}(t)) \Big|_{t=0}.    (19)
The Riemannian gradient of the objective function J(x) is now computed. In Equation (19), the generic smooth curve φ_{x,v} may be replaced with a geodesic curve. The following theorem is therefore obtained.

Theorem 3.1. The Riemannian gradient of the objective function J(x) at a point x is given by

    ∇_x J(x) = 2x \Big[ \log(A(x)^{-1} B(x)) B(x)^{-1} - \sum_{j=1}^{m} A_j \log(A(x)^{-1} B(x)) A(x)^{-1} A_j^T + \sum_{j=1}^{m} B_j \log(A(x)^{-1} B(x)) B(x)^{-1} B_j^T \Big] x.    (20)
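Before turning to the proof, note that Equation (19) offers a direct numerical check of (20): the derivative of J along the geodesic s(t) emanating from x with velocity v should equal g_x(v, ∇_x J). A self-contained sketch of this check (the random data and helper names are purely illustrative) is given below.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv, eigh

def A_map(x, Q, As): return Q + sum(A.T @ x @ A for A in As)
def B_map(x, Bs):    return x + sum(B.T @ x @ B for B in Bs)

def J(x, Q, As, Bs):
    Ax, Bx = A_map(x, Q, As), B_map(x, Bs)
    return float(np.sum(np.log(eigh(Bx, Ax, eigvals_only=True)) ** 2))

def grad_J(x, Q, As, Bs):
    """Riemannian gradient as stated in (20)."""
    Ax, Bx = A_map(x, Q, As), B_map(x, Bs)
    L = logm(inv(Ax) @ Bx).real
    inner = (L @ inv(Bx)
             - sum(A @ L @ inv(Ax) @ A.T for A in As)
             + sum(B @ L @ inv(Bx) @ B.T for B in Bs))
    return 2 * x @ inner @ x

rng = np.random.default_rng(0)
n = 3
Q = np.eye(n)
As = [0.1 * rng.standard_normal((n, n)) for _ in range(2)]
Bs = [0.1 * rng.standard_normal((n, n)) for _ in range(2)]
C = rng.standard_normal((n, n)); x = np.eye(n) + 0.1 * C @ C.T   # an SPD point
v = rng.standard_normal((n, n)); v = (v + v.T) / 2               # a direction in Sym(n)

xh = sqrtm(x).real; xhi = inv(xh)
s = lambda t: xh @ expm(t * xhi @ v @ xhi) @ xh   # geodesic with s(0) = x, s'(0) = v
t = 1e-5
fd = (J(s(t), Q, As, Bs) - J(s(-t), Q, As, Bs)) / (2 * t)   # d/dt J(s(t)) at t = 0
gv = np.trace(inv(x) @ v @ inv(x) @ grad_J(x, Q, As, Bs))   # g_x(v, grad J), cf. (9)
print(fd, gv)  # the two values should agree to several digits
```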
Proof. First, the derivative of the real-valued function

    J(s(t)) = tr\big[ \log^2\big( A(s(t))^{-1/2} B(s(t)) A(s(t))^{-1/2} \big) \big]    (21)

with respect to t is determined, where s(t) = x^{1/2} \exp(t x^{-1/2} v x^{-1/2}) x^{1/2} is the geodesic curve emanating from x in the direction v = \dot{s}(0). Using Equation (18) and Proposition 2.1 in [14], the following is obtained:

    \frac{d}{dt} J(s(t)) \Big|_{t=0}
    = 2\, tr\Big[ \Big( \log(A(x)^{-1} B(x)) B(x)^{-1} - \sum_{j=1}^{m} A_j \log(A(x)^{-1} B(x)) A(x)^{-1} A_j^T + \sum_{j=1}^{m} B_j \log(A(x)^{-1} B(x)) B(x)^{-1} B_j^T \Big) v \Big]
    = g_x\Big( v,\ 2x \Big( \log(A(x)^{-1} B(x)) B(x)^{-1} - \sum_{j=1}^{m} A_j \log(A(x)^{-1} B(x)) A(x)^{-1} A_j^T + \sum_{j=1}^{m} B_j \log(A(x)^{-1} B(x)) B(x)^{-1} B_j^T \Big) x \Big).
Thus, based on Equation (19), Equation (20) follows. □

3.3. Extended Hamiltonian algorithm on S^+(n)

Fiori proposed the extended Hamiltonian algorithm (EHA) on a manifold [10]. Specifically, the EHA on the manifold S^+(n) can be expressed as follows:

    \dot{x} = v,
    \dot{v} = v x^{-1} v - ∇_x V - μ v,    (22)
where x ∈ S^+(n), v denotes the instantaneous learning velocity, V : S^+(n) → R denotes an objective function, ∇_x V denotes the Riemannian gradient of V, and the constant μ > 0 denotes a viscosity coefficient. The kinetic energy of the particle at the point x is K_x = (1/2)\, tr[(v x^{-1})^2]. System (22) may be implemented as follows [11]:

    x_{k+1} = x_k^{1/2} \exp(η x_k^{-1/2} v_k x_k^{-1/2}) x_k^{1/2},
    v_{k+1} = η (v_k x_k^{-1} v_k - ∇_{x_k} V) + (1 - ημ) v_k,    (23)
where η is a small positive number known as the learning rate. The effectiveness of the algorithm is ensured if and only if

    2 λ_{max} < μ < \frac{1}{η},    (24)
where λ_{max} denotes the largest eigenvalue of the Hessian matrix of the cost function V(x). A more detailed description can be found in [10,11].

The geodesic distance is the intrinsic (shortest) distance on the manifold S^+(n), so the squared geodesic distance (16) is taken as the objective function V(x) for Equation (1):

    V(x) = d_g^2\big(A(x), B(x)\big) = \big\| \log\big( A(x)^{-1/2} B(x) A(x)^{-1/2} \big) \big\|_F^2.    (25)
Substituting the gradient from Theorem 3.1 into Equation (23) yields the final EHA on the manifold S^+(n):

    \dot{x} = v,
    a := B(x)^{-1/2} \log\big( B(x)^{1/2} A(x)^{-1} B(x)^{1/2} \big) B(x)^{-1/2},
    b := 2x \Big( a - \sum_{j=1}^{m} A_j a A_j^T + \sum_{j=1}^{m} B_j a B_j^T \Big) x,
    \dot{v} = v x^{-1} v - b - μ v,    (26)

which is implemented as follows:
    x_{k+1} = x_k^{1/2} \exp(η x_k^{-1/2} v_k x_k^{-1/2}) x_k^{1/2},
    a_k := B(x_k)^{-1/2} \log\big( B(x_k)^{1/2} A(x_k)^{-1} B(x_k)^{1/2} \big) B(x_k)^{-1/2},
    b_k := 2 x_k \Big( a_k - \sum_{j=1}^{m} A_j a_k A_j^T + \sum_{j=1}^{m} B_j a_k B_j^T \Big) x_k,
    v_{k+1} = η (v_k x_k^{-1} v_k - b_k) + (1 - ημ) v_k,    (27)
where η > 0 denotes the learning rate and the constant μ denotes a viscosity coefficient. Thus, the EHA for the numerical solution of Equation (1) can be stated explicitly as follows.

Algorithm 3.1. For any x in the manifold S^+(n), the EHA proceeds as follows:

1. Set x_0 as the initial matrix x and v_0 as the initial velocity, and choose a desired tolerance ε > 0.
2. Compute ρ(A(x_k) − B(x_k)), where ρ(·) denotes the spectral radius of a matrix.
3. If ρ(A(x_k) − B(x_k)) < ε, stop.
4. Update x_k and v_k using Equation (27) and return to step 2.

3.4. Convergence speed of the extended Hamiltonian algorithm

The convergence speed of the EHA is shown in [16] and [10] to be nearly quadratic. Therefore, a positive number r < 1 exists such that

    \limsup_{k \to \infty} \sqrt[k]{\|x^* - x_k\|} \le r.    (28)
If H = \big( \frac{∂^2 V}{∂x_i ∂x_j} \big) denotes the Hessian of V, then H decomposes as H = QΛQ^T (here Q denotes an orthogonal matrix, not the coefficient matrix in (1)), where Λ = diag(λ_1, ..., λ_n) with all λ_i > 0. Then

    \|x^* - x_k\|^2 \le \sum_{i=1}^{n} \exp\big( -\big| \mathrm{Re}\{ -μ + \sqrt{μ^2 - 4λ_i} \} \big| η k \big).    (29)

Write \sqrt{μ^2 - 4λ_i} = a_i + \mathrm{i} b_i and let |a_0 - μ| = \min_{1 \le i \le n} |a_i - μ|; then |a_0 - μ| > 0. Otherwise, there would exist j satisfying a_j = μ, so that

    \sqrt{μ^2 - 4λ_j} = μ + \mathrm{i} b_j,    (30)

    λ_j = \frac{b_j^2 - 2\mathrm{i} μ b_j}{4},    (31)
which contradicts λ_j > 0. Finally, the following estimate is obtained:

    \limsup_{k \to \infty} \sqrt[k]{\|x^* - x_k\|} \le \Big( \sum_{i=1}^{n} \exp(-|a_i - μ| η k) \Big)^{\frac{1}{2k}}
    \le \Big( \sum_{i=1}^{n} \exp(-|a_0 - μ| η k) \Big)^{\frac{1}{2k}} \to \exp\Big( -\frac{1}{2} |a_0 - μ| η \Big) < 1, \quad \text{as } k \to \infty.
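In practice, the rate r in (28) can be estimated from recorded errors ‖x^* − x_k‖ by forming k-th roots; a tiny sketch with a synthetic, geometrically decaying error sequence (purely illustrative) follows.

```python
import numpy as np

# synthetic error sequence ||x* - x_k|| decaying geometrically (illustrative only)
errors = [0.5 * 0.3 ** k for k in range(1, 21)]

# empirical estimate of limsup_k ||x* - x_k||^(1/k)
roots = [e ** (1.0 / k) for k, e in enumerate(errors, start=1)]
print(roots[-1])  # tends to the true ratio (0.3 here) as k grows
```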
Fig. 1. Contours of the number of iterations as functions of η and μ in the EHA.
4. Numerical experiments

Numerical experiments are performed to compare the convergence speed of the EHA with that of the CFPA. The error criterion used in these simulations is ρ(A(x) − B(x)) < 10^{-15}. The following matrix equation is considered:

    Q = x + A_1^T x A_1 + A_2^T x A_2 - B_1^T x B_1 - B_2^T x B_2,    (32)

where Q, A_1, A_2, B_1 and B_2 are as follows:

    Q = \begin{pmatrix} 1 & 0.2 & 0.2 \\ 0.2 & 1 & 0.2 \\ 0.2 & 0.2 & 1 \end{pmatrix},
    A_1 = \begin{pmatrix} 0.1 & 0.05 & 0.05 \\ 0.05 & 0.1 & 0.05 \\ 0.05 & 0.05 & 0.1 \end{pmatrix},
    A_2 = \begin{pmatrix} 0.5 & -0.02 & -0.02 \\ -0.02 & 0.5 & -0.02 \\ -0.02 & -0.02 & 0.5 \end{pmatrix},
    B_1 = \begin{pmatrix} 0.01 & 0.001 & 0.01 \\ 0.001 & 0.01 & 0.001 \\ 0.01 & 0.001 & 0.01 \end{pmatrix},
    B_2 = \begin{pmatrix} 0.1413 & 0.008294 & 0.1413 \\ 0.008294 & 0.1997 & 0.008294 \\ 0.1413 & 0.008294 & 0.1413 \end{pmatrix},

which satisfy conditions (15) and (24). The EHA is compared with the traditional CFPA, and Q is taken as the initial value x_0 in the subsequent simulation. For the EHA, the contours of the number of iterations as functions of η and μ are shown in Fig. 1. The number of iterations decreases gradually as η and μ approach a critical value and then increases again beyond it. Therefore, this critical value can be selected experimentally as the optimal parameter pair and used in the EHA. Fig. 2 shows the convergence comparison of the EHA with the traditional CFPA. The EHA needs 14 iterations to obtain the following numerical solution of Equation (32):
    \begin{pmatrix} 1.2772 & 0.2323 & 0.1770 \\ 0.2323 & 1.2847 & 0.2323 \\ 0.1770 & 0.2323 & 1.2772 \end{pmatrix}.    (33)
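For completeness, the sketch below applies iteration (27) to Equation (32) with the matrices above. The learning rate and viscosity values, as well as the initial velocity v_0 = 0, are illustrative placeholders rather than the experimentally tuned values behind Fig. 1, so the iteration count and final residual need not match the figures reported here.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

Q  = np.array([[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]])
A1 = np.array([[0.1, 0.05, 0.05], [0.05, 0.1, 0.05], [0.05, 0.05, 0.1]])
A2 = np.array([[0.5, -0.02, -0.02], [-0.02, 0.5, -0.02], [-0.02, -0.02, 0.5]])
B1 = np.array([[0.01, 0.001, 0.01], [0.001, 0.01, 0.001], [0.01, 0.001, 0.01]])
B2 = np.array([[0.1413, 0.008294, 0.1413],
               [0.008294, 0.1997, 0.008294],
               [0.1413, 0.008294, 0.1413]])
As, Bs = [A1, A2], [B1, B2]

A_map = lambda x: Q + sum(A.T @ x @ A for A in As)
B_map = lambda x: x + sum(B.T @ x @ B for B in Bs)

def eha_step(x, v, eta, mu):
    """One step of iteration (27)."""
    Ax, Bx = A_map(x), B_map(x)
    Bh = sqrtm(Bx).real; Bhi = inv(Bh)
    a = Bhi @ logm(Bh @ inv(Ax) @ Bh).real @ Bhi
    b = 2 * x @ (a - sum(A @ a @ A.T for A in As)
                   + sum(B @ a @ B.T for B in Bs)) @ x
    xh = sqrtm(x).real; xhi = inv(xh)
    x_new = xh @ expm(eta * xhi @ v @ xhi) @ xh           # stays in S+(n)
    v_new = eta * (v @ inv(x) @ v - b) + (1 - eta * mu) * v
    return x_new, v_new

eta, mu = 0.1, 1.0            # illustrative values; the paper tunes these via Fig. 1
x, v = Q.copy(), np.zeros_like(Q)  # x_0 = Q as in the experiments; v_0 = 0 is an arbitrary choice
for k in range(500):
    x, v = eha_step(x, v, eta, mu)
    residual = np.max(np.abs(np.linalg.eigvals(A_map(x) - B_map(x))))  # spectral radius
    if residual < 1e-15:
        break
print(k + 1, residual)        # residual should shrink when eta, mu lie in the admissible range (24)
print(np.round(x, 4))
```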
Fig. 2. Comparison of convergence speeds (Log Scale).
The CFPA reaches the prescribed error criterion after 29 iterations. Thus, the convergence speed of the EHA is faster than that of the CFPA.

5. Conclusion

The geometric structures of the manifold consisting of all symmetric positive definite matrices are used to develop a second-order EHA for computing the numerical solution of the linear matrix equation Q = x + \sum_{i=1}^{m} A_i^T x A_i - \sum_{i=1}^{m} B_i^T x B_i. The geodesic distance on the curved Riemannian manifold is taken as the objective function in the EHA. Finally, the convergence speed of the EHA is compared with that of the traditional CFPA using numerical examples; the EHA converges faster than the CFPA.

Acknowledgments

The authors wish to express their appreciation to the reviewers for their helpful suggestions, which greatly improved the presentation of this paper. This project is supported by the National Natural Science Foundation of China (No. 61401058) and the General Financial Grant from the China Postdoctoral Science Foundation (No. 2015M581323).

References

[1] F. Barbaresco, Interactions between symmetric cones and information geometrics: Bruhat–Tits and Siegel spaces models for high resolution autoregressive Doppler imagery, Lecture Notes in Comput. Sci. 5416 (2009) 124–163.
[2] M. Berzig, Solving a class of matrix equations via the Bhaskar–Lakshmikantham coupled fixed point theorem, Appl. Math. Lett. 25 (2012) 1638–1643.
[3] C.Y. Chiang, E.K.W. Chu, W.W. Lin, On the ⋆-Sylvester equation AX ± X^⋆B^⋆ = C, Appl. Math. Comput. 218 (2012) 8393–8407.
[4] T. Damm, Direct methods and ADI-preconditioned Krylov subspace methods for generalized Lyapunov equations, Numer. Linear Algebra Appl. 15 (2008) 853–871.
[5] X. Duan, Q. Wang, Perturbation analysis for the matrix equation x + \sum_{i=1}^{m} A_i^T x A_i - \sum_{i=1}^{n} B_i^T x B_i = I, J. Appl. Math. 2012 (2012) 784620.
[6] X. Duan, H. Sun, L. Peng, X. Zhao, A natural gradient descent algorithm for the solution of discrete algebraic Lyapunov equations based on the geodesic distance, Appl. Math. Comput. 219 (2013) 9899–9905.
[7] X. Duan, H. Sun, Z. Zhang, A natural gradient descent algorithm for the solution of Lyapunov equations based on the geodesic distance, J. Comput. Math. 32 (2014) 96–103.
[8] X. Duan, H. Sun, X. Zhao, Riemannian gradient algorithm for the numerical solution of linear matrix equations, J. Appl. Math. 2014 (2014) 507175.
[9] S. Fiori, Learning the Fréchet mean over the manifold of symmetric positive-definite matrices, Cogn. Comput. 1 (2009) 279–291.
[10] S. Fiori, Extended Hamiltonian learning on Riemannian manifolds: theoretical aspects, IEEE Trans. Neural Netw. 22 (2011) 687–700.
[11] S. Fiori, Extended Hamiltonian learning on Riemannian manifolds: numerical aspects, IEEE Trans. Neural Netw. Learn. Syst. 23 (2012) 7–21.
[12] Z. Gajic, M.T.J. Qureshi, Lyapunov Matrix Equation in System Stability and Control, Academic Press, London, UK, 1995.
[13] J. Jost, Riemannian Geometry and Geometric Analysis, third edition, Springer, Berlin, 2002.
[14] M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl. 26 (2005) 735–747.
[15] M. Moakher, On the averaging of symmetric positive-definite tensors, J. Elasticity 82 (2006) 273–296.
[16] N. Qian, On the momentum term in gradient descent learning algorithms, Neural Netw. 12 (1999) 145–151.
[17] A.C.M. Ran, M.C.B. Reurings, The symmetric linear matrix equation, Electron. J. Linear Algebra 9 (2002) 93–107.
[18] M.C.B. Reurings, Symmetric matrix equations, PhD Thesis, Vrije Universiteit, Amsterdam, The Netherlands, 2003.
[19] D. Rumelhart, J. McClelland, Parallel Distributed Processing, MIT Press, Cambridge, MA, 1986.
[20] A. Schwartzman, Random ellipsoids and false discovery rates: statistics for diffusion tensor imaging data, PhD Thesis, Stanford University, 2006.
[21] X.D. Zhang, Matrix Analysis and Application, Springer, Beijing, 2004.
[22] L. Zhang, J. Lam, On H2 model reduction of bilinear systems, Automatica 38 (2002) 205–216.