Applied Mathematics and Computation 175 (2006) 557–573 www.elsevier.com/locate/amc
Krylov subspace methods for the generalized Sylvester equation

Liang Bao^a, Yiqin Lin^b, Yimin Wei^c,*

^a Department of Mathematics, Fudan University, Shanghai 200433, PR China
^b School of Mathematics and Computational Science, Zhongshan University, Guangzhou 510275, PR China
^c Department of Mathematics, Fudan University, Shanghai 200433, PR China
Abstract

In this paper we propose Galerkin and minimal residual methods for iteratively solving generalized Sylvester equations of the form $AXB - X = C$. The algorithms use Krylov subspaces for which orthonormal bases are generated by the Arnoldi process, and they reduce the required storage by exploiting the structure of the matrix. We give some convergence results and present numerical experiments for large problems to show that our methods are efficient.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Galerkin method; Generalized Sylvester equation; Minimal residual method; Krylov subspace
This work is supported by NSFC project 10471027 and Shanghai Education Commission.
* Corresponding author.
E-mail addresses: [email protected] (L. Bao), [email protected] (Y. Lin), [email protected] (Y. Wei).

0096-3003/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.amc.2005.07.041
1. Introduction

In this paper we present Krylov subspace methods for solving the generalized Sylvester matrix equation

$AXB - X = C,$  (1.1)

where $A \in \mathbb{R}^{N\times N}$, $B \in \mathbb{R}^{M\times M}$, and $C \in \mathbb{R}^{N\times M}$ are given matrices, and $X \in \mathbb{R}^{N\times M}$ is the solution matrix sought. The matrix equation (1.1) plays an important role in control and communications theory; see [1] and the references therein. The analytical solution of the matrix equation (1.1) has been considered by many authors; see [3]. Direct methods for solving the matrix equation (1.1), such as those proposed in [3,4], are attractive if the matrices are of small size. These methods are based on the Schur decomposition, by which the original equation is transformed into a form that is easily solved by a forward substitution.

Let $X = [x_1, x_2, \ldots, x_M]$, where $x_i$ is the $i$th column of $X$. We define the linear operator $\mathrm{vec}: \mathbb{R}^{N\times M} \to \mathbb{R}^{MN}$ by

$\mathrm{vec}(X) = [x_1^T, x_2^T, \ldots, x_M^T]^T.$

Hence the generalized Sylvester equation (1.1) can be written as the system of linear equations

$\mathcal{A}\,\mathrm{vec}(X) = \mathrm{vec}(C),$  (1.2)

where $\mathcal{A} = B^T \otimes A - I_{MN} \in \mathbb{R}^{MN\times MN}$, $I_{MN}$ denotes the $MN \times MN$ identity matrix and $\otimes$ denotes the Kronecker product; see [2]. Let $\lambda(A)$ be the spectrum of $A$, and $\lambda(B)$ the spectrum of $B$. We have

$\lambda(\mathcal{A}) = \{\lambda_i\lambda_j - 1 : \lambda_i \in \lambda(A),\ \lambda_j \in \lambda(B),\ i = 1, 2, \ldots, N,\ j = 1, 2, \ldots, M\},$

which shows that the generalized Sylvester equation (1.1) has a unique solution if and only if $\lambda_i\lambda_j \neq 1$ for all $\lambda_i \in \lambda(A)$, $\lambda_j \in \lambda(B)$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M$. In this paper we assume that this condition is satisfied.

For two matrices $X, Y \in \mathbb{R}^{m\times n}$, we define the inner product $\langle X, Y\rangle_F = \mathrm{tr}(X^T Y)$, where $\mathrm{tr}(\cdot)$ denotes the trace of the matrix $X^T Y$. The associated norm is the Frobenius norm, denoted by $\|\cdot\|_F$. The vector $e_n^{(n)}$ denotes the $n$th coordinate vector of $\mathbb{R}^n$.

In this paper we propose Galerkin and minimal residual algorithms for iteratively solving the generalized Sylvester matrix equation (1.1). The methods are based on the Arnoldi process [2], which allows us to construct an orthonormal basis of a certain Krylov subspace and simultaneously reduce the order of the generalized Sylvester equation (1.1). The small generalized Sylvester equation obtained can be solved by direct or iterative methods. In our presentation and analysis of the algorithms, we use the form (1.2) of (1.1).
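The equivalence between the matrix equation (1.1) and its Kronecker form (1.2) is easy to check numerically. The following is a minimal sketch in Python/NumPy (not from the paper; the helper name `kron_operator` and the random test data are illustrative assumptions):

```python
import numpy as np

def kron_operator(A, B, M, N):
    """Return the MN x MN matrix B^T (x) A - I_MN of Eq. (1.2)."""
    return np.kron(B.T, A) - np.eye(M * N)

# Small random example; generic random data satisfies lambda_i * lambda_j != 1.
rng = np.random.default_rng(0)
N, M = 5, 4
A = rng.standard_normal((N, N))
B = rng.standard_normal((M, M))
X = rng.standard_normal((N, M))
C = A @ X @ B - X                       # generalized Sylvester equation (1.1)

# vec stacks the columns of a matrix; Fortran order matches the definition above.
vec = lambda Z: Z.flatten(order="F")
lhs = kron_operator(A, B, M, N) @ vec(X)
print(np.allclose(lhs, vec(C)))         # True: (1.1) and (1.2) agree
```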
The remainder of the paper is organized as follows. In Section 2, we propose a Galerkin algorithm and a minimal residual algorithm for the solution of the generalized Sylvester equation (1.1). In Section 3, we discuss results on the convergence of the algorithms introduced in Section 2. In Section 4, we present some numerical tests. Finally, we draw conclusions.
2. Galerkin method and minimal residual method for the generalized Sylvester equation

The full orthogonalization method is a Galerkin method introduced by Saad [8], and the GMRES method is a minimal residual method introduced by Saad and Schultz [9]. Both methods use the Arnoldi process to compute an orthonormal basis of a certain Krylov subspace $\mathcal{K}_k(A, v) := \mathrm{span}\{v, Av, \ldots, A^{k-1}v\}$.

2.1. The Arnoldi process

In this subsection we review the Arnoldi process [2]. The algorithm is described as follows.

Algorithm 2.1 (Arnoldi process)
1. Choose $v$; let $v_1 = v/\|v\|_2$;
2. For $j = 1, 2, \ldots, k$
     $w = Av_j$;
     $h_{i,j} = v_i^T w$, $w = w - h_{i,j}v_i$, for $i = 1, 2, \ldots, j$;
     $h_{j+1,j} = \|w\|_2$; if $h_{j+1,j} = 0$, then stop;
     $v_{j+1} = w/h_{j+1,j}$;
   End For

The vectors $v_j$ computed by the Arnoldi process are often called Arnoldi vectors. In the algorithm and below, $\|\cdot\|_2$ denotes the Euclidean vector norm or the subordinate matrix norm. We will also use the Frobenius matrix norm, denoted by $\|\cdot\|_F$. Algorithm 2.1 uses the modified Gram-Schmidt process to compute the orthonormal basis $\{v_1, v_2, \ldots, v_k\}$ of the Krylov subspace $\mathcal{K}_k(A, v)$. We introduce the matrices

$V_j := [v_1, v_2, \ldots, v_j], \qquad j \in \{k, k+1\},$
and let $H_k \in \mathbb{R}^{k\times k}$ be the upper Hessenberg matrix whose nontrivial elements are given by the coefficients $h_{i,j}$ determined by Algorithm 2.1. We also define the Hessenberg-like matrix $\bar{H}_k \in \mathbb{R}^{(k+1)\times k}$, whose nontrivial entries are $h_{i,j}$, $1 \le i \le j+1$, $1 \le j \le k$. Then

$H_k = V_k^T A V_k, \qquad AV_k = V_{k+1}\bar{H}_k.$
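A straightforward implementation of Algorithm 2.1 is sketched below in Python/NumPy, together with a check of the relation $AV_k = V_{k+1}\bar{H}_k$. The function name `arnoldi` and the breakdown handling are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def arnoldi(A, v, k):
    """Arnoldi process with modified Gram-Schmidt (Algorithm 2.1).

    Returns V of shape (N, k+1) and the Hessenberg-like Hbar of shape (k+1, k)
    such that A @ V[:, :k] = V @ Hbar (when no breakdown occurs).
    """
    N = A.shape[0]
    V = np.zeros((N, k + 1))
    Hbar = np.zeros((k + 1, k))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt orthogonalization
            Hbar[i, j] = V[:, i] @ w
            w -= Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        if Hbar[j + 1, j] == 0.0:              # breakdown: an invariant subspace was found
            raise RuntimeError("Arnoldi breakdown at step %d" % (j + 1))
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V, Hbar

# Quick check of A V_k = V_{k+1} Hbar_k on random data.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
V, Hbar = arnoldi(A, rng.standard_normal(50), 8)
print(np.allclose(A @ V[:, :-1], V @ Hbar))    # True
```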
2.2. Galerkin method and minimal residual method for the generalized Sylvester equation

Because the matrices in (1.2) can in applications be very large, it is difficult to store a basis of the Krylov subspace $\mathcal{K}_k(B^T \otimes A - I, r_0)$ in fast computer memory for values of $k$ large enough to yield an acceptable rate of convergence. In this section, we propose Galerkin and minimal residual algorithms for the solution of (1.2) that reduce the required storage by using the structure of the matrix in (1.2). We replace the Krylov subspace $\mathcal{K}_k(B^T \otimes A - I, r_0)$ with a subspace of the form $\mathcal{K}_m(B^T, g) \otimes \mathcal{K}_n(A, f)$, where $f \in \mathbb{R}^N$ and $g \in \mathbb{R}^M$. A subspace of this form was used for solving the Sylvester equation in [7].

By using the Arnoldi process, we generate orthonormal bases $\{v_j\}_{j=1}^{n}$ and $\{w_j\}_{j=1}^{m}$ of $\mathcal{K}_n(A, f)$ and $\mathcal{K}_m(B^T, g)$, respectively. In Section 2.3 we will discuss strategies for selecting $f$ and $g$. We introduce the matrices $V_i := [v_1, v_2, \ldots, v_i]$ and $W_j := [w_1, w_2, \ldots, w_j]$ for $i \in \{n, n+1\}$ and $j \in \{m, m+1\}$. The Arnoldi process yields the entries of the upper Hessenberg matrices $H_A$ and $H_B$ and of the Hessenberg-like matrices $\bar{H}_A$ and $\bar{H}_B$ that satisfy

$H_A = V_n^T A V_n, \qquad H_B = W_m^T B^T W_m,$
$AV_n = V_{n+1}\bar{H}_A, \qquad B^T W_m = W_{m+1}\bar{H}_B,$
$AV_n = V_n H_A + h_{n+1,n}^{(A)} v_{n+1}(e_n^{(n)})^T,$
$B^T W_m = W_m H_B + h_{m+1,m}^{(B)} w_{m+1}(e_m^{(m)})^T.$

Thus, the columns of the matrix $W_m \otimes V_n$ form an orthonormal basis of $\mathcal{K}_m(B^T, g) \otimes \mathcal{K}_n(A, f)$. Let $x_0$ be an initial approximate solution of the linear system (1.2), and introduce the residual vector

$r_0 = c - (B^T \otimes A)x_0 + x_0,$

where $c = \mathrm{vec}(C)$. We wish to determine a correction $z_0 \in \mathcal{K}_m(B^T, g) \otimes \mathcal{K}_n(A, f)$ of $x_0$ and obtain a new approximate solution $x_1 = x_0 + z_0$ of (1.2). The correction $z_0$ can be written as

$z_0 = (W_m \otimes V_n)y,$
where $y \in \mathbb{R}^{mn}$. Thus, the new residual vector becomes

$r_1 = r_0 - (B^T \otimes A)(W_m \otimes V_n)y + (W_m \otimes V_n)y$
$\quad\; = r_0 - ((B^T W_m) \otimes (AV_n))y + (W_m \otimes V_n)y$
$\quad\; = r_0 - ((W_{m+1}\bar{H}_B) \otimes (V_{n+1}\bar{H}_A))y + (W_m \otimes V_n)y.$  (2.1)

According to the Galerkin condition

$r_1 \perp \mathrm{span}\{W_m \otimes V_n\},$

we obtain

$(W_m \otimes V_n)^T r_0 = (W_m \otimes V_n)^T\big(((W_{m+1}\bar{H}_B) \otimes (V_{n+1}\bar{H}_A))y - (W_m \otimes V_n)y\big).$

So we have

$(W_m \otimes V_n)^T r_0 = (H_B \otimes H_A)y - y,$  (2.2)

i.e.,

$H_A Y H_B^T - Y = V_n^T R_0 W_m,$  (2.3)

where $R_0 \in \mathbb{R}^{N\times M}$ satisfies $r_0 = \mathrm{vec}(R_0)$ and $Y \in \mathbb{R}^{n\times m}$ satisfies $y = \mathrm{vec}(Y)$. In order to get $y$, we need to solve the smaller linear system (2.2) or the smaller generalized Sylvester equation (2.3).

If the initial approximation is $x_0 = 0$ and $c = g \otimes f$, the following result shows that the approximation $X_1 = V_n Y W_m^T$ is an exact solution of a perturbed generalized Sylvester equation.

Theorem 2.1. Assume the initial approximation $x_0 = 0$ and $c = g \otimes f$. If $n$ steps of the Arnoldi algorithm for the matrix $A$ and $m$ steps of the Arnoldi algorithm for the matrix $B^T$ have been run, then

$(A - F_A)X_1(B - F_B) - X_1 = C,$  (2.4)

where $F_A = h_{n+1,n}^{(A)} v_{n+1}v_n^T$ and $F_B = h_{m+1,m}^{(B)} w_m w_{m+1}^T$.
Proof. Multiply Eq. (2.3) on the left by $V_n$ and on the right by $W_m^T$. Using the relations $AV_n = V_n H_A + h_{n+1,n}^{(A)} v_{n+1}(e_n^{(n)})^T$ and $B^T W_m = W_m H_B + h_{m+1,m}^{(B)} w_{m+1}(e_m^{(m)})^T$, we get

$\big(AV_n - h_{n+1,n}^{(A)} v_{n+1}(e_n^{(n)})^T\big)\, Y\, \big(B^T W_m - h_{m+1,m}^{(B)} w_{m+1}(e_m^{(m)})^T\big)^T - V_n Y W_m^T = C.$

Now, as $V_n$ and $W_m$ have orthonormal columns, it follows that

$AX_1B - h_{m+1,m}^{(B)} AX_1 W_m e_m^{(m)} w_{m+1}^T - h_{n+1,n}^{(A)} v_{n+1}(e_n^{(n)})^T V_n^T X_1 B$
$\quad + h_{n+1,n}^{(A)} h_{m+1,m}^{(B)} v_{n+1}(e_n^{(n)})^T V_n^T X_1 W_m e_m^{(m)} w_{m+1}^T - X_1 = C.$
Define $F_A = h_{n+1,n}^{(A)} v_{n+1}v_n^T$ and $F_B = h_{m+1,m}^{(B)} w_m w_{m+1}^T$; therefore

$(A - F_A)X_1(B - F_B) - X_1 = C.$  □

Remark 2.2. If $h_{n+1,n}^{(A)} = h_{m+1,m}^{(B)} = 0$, then $X_1$ is just the exact solution of the generalized Sylvester equation (1.1).

It follows from (2.1) that

$r_1 = r_0 - \big((W_{m+1} \otimes V_{n+1})(\bar{H}_B \otimes \bar{H}_A)y - (W_m \otimes V_n)y\big)$
$\quad\; = \big(I - (W_{m+1} \otimes V_{n+1})(W_{m+1} \otimes V_{n+1})^T\big)r_0$
$\qquad + (W_{m+1} \otimes V_{n+1})\big((W_{m+1} \otimes V_{n+1})^T r_0 - (\bar{H}_B \otimes \bar{H}_A)y + (I_{m+1,m} \otimes I_{n+1,n})y\big),$  (2.5)

where $I_{m+1,m} = [\delta_{j,k}]_{1\le j\le m+1,\,1\le k\le m}$ and $\delta_{j,k}$ is the Kronecker $\delta$-function. Since the two terms on the right-hand side of (2.5) are orthogonal, the minimal residual method requires

$\|r_1\|_2^2 = \min_{y\in\mathbb{R}^{mn}}\Big\|\big(I - (W_{m+1}\otimes V_{n+1})(W_{m+1}\otimes V_{n+1})^T\big)r_0 + (W_{m+1}\otimes V_{n+1})\big((W_{m+1}\otimes V_{n+1})^T r_0 - (\bar{H}_B \otimes \bar{H}_A)y + (I_{m+1,m}\otimes I_{n+1,n})y\big)\Big\|_2^2$
$\quad\; = \big\|\big(I - (W_{m+1}\otimes V_{n+1})(W_{m+1}\otimes V_{n+1})^T\big)r_0\big\|_2^2 + \min_{y\in\mathbb{R}^{mn}}\big\|(W_{m+1}\otimes V_{n+1})^T r_0 - (\bar{H}_B \otimes \bar{H}_A)y + (I_{m+1,m}\otimes I_{n+1,n})y\big\|_2^2.$
In order to get $y$, we only need to solve the linear least squares problem

$\min_{y\in\mathbb{R}^{mn}}\big\|(W_{m+1}\otimes V_{n+1})^T r_0 - (\bar{H}_B \otimes \bar{H}_A)y + (I_{m+1,m}\otimes I_{n+1,n})y\big\|_2.$

Obviously, we can equivalently solve the linear least squares problem

$\min_{Y\in\mathbb{R}^{n\times m}}\big\|V_{n+1}^T R_0 W_{m+1} - \bar{H}_A Y \bar{H}_B^T + I_{n+1,n} Y I_{m+1,m}^T\big\|_F.$
We define the linear operator $S: \mathbb{R}^{n\times m} \to \mathbb{R}^{(n+1)\times(m+1)}$,

$S(Y) := \bar{H}_A Y \bar{H}_B^T - I_{n+1,n} Y I_{m+1,m}^T.$

With respect to the inner product $\langle\cdot,\cdot\rangle_F$, the adjoint operator of $S$ is $S^T: \mathbb{R}^{(n+1)\times(m+1)} \to \mathbb{R}^{n\times m}$,

$S^T(Y) := \bar{H}_A^T Y \bar{H}_B - I_{n+1,n}^T Y I_{m+1,m},$

where $Y \in \mathbb{R}^{(n+1)\times(m+1)}$. So we get the normal equation

$S^T(S(Y)) = S^T(V_{n+1}^T R_0 W_{m+1}),$  (2.6)
where

$S^T(S(Y)) = \bar{H}_A^T S(Y)\bar{H}_B - I_{n+1,n}^T S(Y) I_{m+1,m}$
$\quad\; = \bar{H}_A^T\big[\bar{H}_A Y \bar{H}_B^T - I_{n+1,n} Y I_{m+1,m}^T\big]\bar{H}_B - I_{n+1,n}^T\big[\bar{H}_A Y \bar{H}_B^T - I_{n+1,n} Y I_{m+1,m}^T\big]I_{m+1,m}$
$\quad\; = \bar{H}_A^T\bar{H}_A Y \bar{H}_B^T\bar{H}_B - H_A^T Y H_B - H_A Y H_B^T + Y$

and

$S^T(V_{n+1}^T R_0 W_{m+1}) = \bar{H}_A^T V_{n+1}^T R_0 W_{m+1}\bar{H}_B - V_n^T R_0 W_m.$

In order to solve the normal equation (2.6), we can use the following global CG method [5]. Let $\mathcal{C}$ denote a linear operator $\mathcal{C}: \mathbb{R}^{n\times m} \to \mathbb{R}^{n\times m}$ that is symmetric positive definite with respect to the inner product $\langle\cdot,\cdot\rangle_F$; here $\mathcal{C}(Y) = S^T(S(Y))$ and $D = S^T(V_{n+1}^T R_0 W_{m+1})$. The global CG algorithm for the solution of the equation $\mathcal{C}(Y) = D$, where $Y, D \in \mathbb{R}^{n\times m}$, is as follows.

Algorithm 2.2 (Global CG)
1. Choose an initial matrix $Y_0$ and compute the initial residual matrix $R_0 = D - \mathcal{C}(Y_0)$; set $P_0 = R_0$;
2. For $j = 0, 1, \ldots, j_{\max}$
     $\alpha_j = \langle R_j, R_j\rangle_F / \langle \mathcal{C}(P_j), P_j\rangle_F$;
     $Y_{j+1} = Y_j + \alpha_j P_j$;
     $R_{j+1} = R_j - \alpha_j\mathcal{C}(P_j)$;
     $\beta_j = \langle R_{j+1}, R_{j+1}\rangle_F / \langle R_j, R_j\rangle_F$;
     $P_{j+1} = R_{j+1} + \beta_j P_j$.
   End

We give the restarted minimal residual method for solving the generalized Sylvester equation as follows.

Algorithm 2.3 (Restarted minimal residual method)
1. Start: Choose an initial approximate solution $X_0$, a tolerance $\epsilon > 0$, and $m, n > 0$.
2. Compute the initial residual matrix $R_0 := C - AX_0B + X_0$; if $\|R_0\|_F \le \epsilon$, then stop.
3. Choose $f \in \mathbb{R}^N$, $g \in \mathbb{R}^M$ by using Algorithm 2.4 below. Apply the Arnoldi process to compute orthonormal bases of $\mathcal{K}_{n+1}(A, f)$ and $\mathcal{K}_{m+1}(B^T, g)$. We obtain the matrices $\bar{H}_A$, $\bar{H}_B$, $V_n$, $V_{n+1}$, $W_m$, $W_{m+1}$.
4. Form the approximate solution $X_1 = X_0 + V_n Y W_m^T$, where $Y$ is the solution of the normal equation (2.6).
5. Compute the new residual matrix $R_1 = R_0 - V_{n+1}\bar{H}_A Y \bar{H}_B^T W_{m+1}^T + V_n Y W_m^T$. If $\|R_1\|_F \le \epsilon$, then stop; else set $X_0 := X_1$, $R_0 := R_1$, and go to 3.
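To make the inner solve in step 4 concrete, the following Python/NumPy sketch implements the operator $S$, its adjoint $S^T$, and the global CG iteration of Algorithm 2.2 applied to the normal equation (2.6). The data ($\bar{H}_A$, $\bar{H}_B$ and the projected right-hand side) are generated randomly here only to make the sketch self-contained; in Algorithm 2.3 they come from the Arnoldi process. Function names such as `global_cg` are illustrative, not from the paper:

```python
import numpy as np

def S(Y, HbarA, HbarB):
    """S(Y) = HbarA Y HbarB^T - I_{n+1,n} Y I_{m+1,m}^T (pad Y by one zero row/column)."""
    n, m = Y.shape
    Ypad = np.zeros((n + 1, m + 1))
    Ypad[:n, :m] = Y
    return HbarA @ Y @ HbarB.T - Ypad

def St(Z, HbarA, HbarB):
    """Adjoint S^T(Z) = HbarA^T Z HbarB - I_{n+1,n}^T Z I_{m+1,m} (take the leading block)."""
    n, m = HbarA.shape[1], HbarB.shape[1]
    return HbarA.T @ Z @ HbarB - Z[:n, :m]

def global_cg(apply_C, D, Y0, maxit=200, tol=1e-10):
    """Global CG (Algorithm 2.2) for C(Y) = D with the Frobenius inner product."""
    Y, R = Y0.copy(), D - apply_C(Y0)
    P = R.copy()
    for _ in range(maxit):
        CP = apply_C(P)
        alpha = np.sum(R * R) / np.sum(CP * P)
        Y += alpha * P
        Rnew = R - alpha * CP
        if np.linalg.norm(Rnew) <= tol:
            return Y
        beta = np.sum(Rnew * Rnew) / np.sum(R * R)
        P = Rnew + beta * P
        R = Rnew
    return Y

# Self-contained data standing in for the Arnoldi output of one cycle.
rng = np.random.default_rng(2)
n, m = 6, 5
HbarA = np.triu(rng.standard_normal((n + 1, n)), -1)   # Hessenberg-like, (n+1) x n
HbarB = np.triu(rng.standard_normal((m + 1, m)), -1)   # Hessenberg-like, (m+1) x m
G = rng.standard_normal((n + 1, m + 1))                 # stands in for V_{n+1}^T R_0 W_{m+1}

apply_C = lambda Y: St(S(Y, HbarA, HbarB), HbarA, HbarB)
Y = global_cg(apply_C, St(G, HbarA, HbarB), np.zeros((n, m)))

# Compare with a dense least-squares solve of min ||G - S(Y)||_F via vec/Kronecker.
M = np.kron(HbarB, HbarA) - np.kron(np.eye(m + 1, m), np.eye(n + 1, n))
y_ls, *_ = np.linalg.lstsq(M, G.flatten(order="F"), rcond=None)
print(np.linalg.norm(Y.flatten(order="F") - y_ls))      # close to zero
```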
Remark 2.3. The Galerkin algorithm is the same as the minimal residual algorithm, except that in step 4 $Y$ is the solution of (2.3).

2.3. Choice of subspace

Hu and Reichel [7] proposed how to choose the two vectors $f$ and $g$ in step 3 of Algorithm 2.3. Ideally, we would like to choose $f$ and $g$ so that $r_0 \in \mathcal{K}_m(B^T, g) \otimes \mathcal{K}_n(A, f)$. However, such vectors are difficult to determine. Instead, we may seek to determine vectors $f \in \mathbb{R}^N$, $g \in \mathbb{R}^M$ so that $\|r_0 - g \otimes f\|_2$ is small, or equivalently, so that $\|R_0 - fg^T\|_F$ is small, where $\mathrm{vec}(R_0) = r_0$. Let $\sigma_1$ be the largest singular value of $R_0$, and let $q_l$ and $q_r$ be the associated left and right singular vectors of unit length. Then

$\min_{f,g}\|R_0 - fg^T\|_F = \|R_0 - \sigma_1 q_l q_r^T\|_F.$  (2.7)

The solution of the minimization problem (2.7) requires, in general, too much arithmetic work to be attractive. An approximate solution can be computed as follows. Assume that $f \neq 0$ and $g \neq 0$. Introduce the function

$T(f, g) := \|R_0 - fg^T\|_F^2.$  (2.8)

Then

$\dfrac{\partial T}{\partial f_j} = 0\ \ \forall j \iff f = \dfrac{R_0 g}{\|g\|_2^2}$  (2.9)

and

$\dfrac{\partial T}{\partial g_j} = 0\ \ \forall j \iff g = \dfrac{R_0^T f}{\|f\|_2^2}.$  (2.10)

Thus, given a vector $g$, we obtain from (2.9) a vector $f$ that minimizes $f \mapsto T(f, g)$. Conversely, given a vector $f$, we obtain from (2.10) a vector $g$ that minimizes $g \mapsto T(f, g)$. Algorithm 2.4 uses Eqs. (2.9) and (2.10) to determine vectors $f$ and $g$ that provide an approximate solution of (2.7). A small numerical illustration of this alternating choice is sketched below; the algorithm for choosing $f$ and $g$, taken from [7], then follows.
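The following minimal sketch (Python/NumPy, with a hypothetical helper name `choose_f_g`) applies the updates (2.9) and (2.10) starting from a column or row of $R_0$. As a simplification, the column/row selection here compares Euclidean column and row norms rather than the $\|R_0\|_1$ versus $\|R_0\|_\infty$ test used in the algorithm below:

```python
import numpy as np

def choose_f_g(R0, sweeps=1):
    """Approximate rank-one fit R0 ~ f g^T via the alternating updates (2.9)-(2.10).

    Illustrative simplification: start from the column (or row) of R0 with the
    largest Euclidean norm instead of comparing the 1-norm and infinity-norm of R0.
    """
    col_norms = np.linalg.norm(R0, axis=0)
    row_norms = np.linalg.norm(R0, axis=1)
    if col_norms.max() >= row_norms.max():
        f = R0[:, np.argmax(col_norms)].copy()
        g = R0.T @ f / (f @ f)                 # Eq. (2.10)
    else:
        g = R0[np.argmax(row_norms), :].copy()
        f = R0 @ g / (g @ g)                   # Eq. (2.9)
    for _ in range(sweeps - 1):                # optional extra alternations
        f = R0 @ g / (g @ g)
        g = R0.T @ f / (f @ f)
    return f, g

rng = np.random.default_rng(3)
# Nearly rank-one residual matrix: rank-one part plus a small perturbation.
R0 = rng.standard_normal((8, 1)) @ rng.standard_normal((1, 6)) + 0.05 * rng.standard_normal((8, 6))
f, g = choose_f_g(R0)
print(np.linalg.norm(R0 - np.outer(f, g), "fro") / np.linalg.norm(R0, "fro"))  # small for nearly rank-one R0
```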
Algorithm 2.4 (Choose f and g). Given $R_0$, compute approximate solution vectors $f$, $g$.
• If $\|R_0\|_1 \ge \|R_0\|_\infty$, then let $f$ be a column of $R_0$ of largest norm, and determine $g$ from (2.10).
• If $\|R_0\|_1 < \|R_0\|_\infty$, then let $g$ be a row of $R_0$ of largest norm, and determine $f$ from (2.9).

3. Analysis of the convergence of the minimal residual method

We consider the convergence of Algorithm 2.3 of Section 2 in the special case when the residual vector satisfies

$r_0 = g \otimes f.$  (3.1)

We derive an upper bound for the residual error (2.1). This bound shows how the rate of convergence depends on $\lambda(A)$ and $\lambda(B)$. Our results are most easily obtained by expressing the residual error (2.1) as a matrix. Define the matrix $R_0 \in \mathbb{R}^{N\times M}$, which satisfies $r_0 = \mathrm{vec}(R_0)$. It follows that $R_0 = fg^T = \|f\|_2\|g\|_2 v_1 w_1^T$. So

$R_1 = V_{n+1}\big(\|f\|_2\|g\|_2 e_1^{(n+1)}(e_1^{(m+1)})^T - \bar{H}_A Y \bar{H}_B^T + I_{n+1,n} Y I_{m+1,m}^T\big)W_{m+1}^T.$

We have

$\|R_1\|_F^2 = \big\|\|f\|_2\|g\|_2 e_1^{(n+1)}(e_1^{(m+1)})^T - \bar{H}_A Y \bar{H}_B^T + I_{n+1,n} Y I_{m+1,m}^T\big\|_F^2$
$\quad\; = \big\|\|f\|_2\|g\|_2 e_1^{(n)}(e_1^{(m)})^T - H_A Y H_B^T + Y\big\|_F^2 + \big\|h_{m+1,m}^{(B)} H_A Y e_m^{(m)}\big\|_F^2$
$\qquad + \big\|h_{n+1,n}^{(A)}(e_n^{(n)})^T Y H_B^T\big\|_F^2 + \big(h_{n+1,n}^{(A)} h_{m+1,m}^{(B)}(e_n^{(n)})^T Y e_m^{(m)}\big)^2.$  (3.2)

The minimal residual method determines a matrix $\widehat{Y}$ that solves

$\min_{Y\in\mathbb{R}^{n\times m}}\big\|\|f\|_2\|g\|_2 e_1^{(n+1)}(e_1^{(m+1)})^T - \bar{H}_A Y \bar{H}_B^T + I_{n+1,n} Y I_{m+1,m}^T\big\|_F.$

For future reference, we define the residual matrix

$\widehat{R}_1 := V_{n+1}\big(\|f\|_2\|g\|_2 e_1^{(n+1)}(e_1^{(m+1)})^T - \bar{H}_A\widehat{Y}\bar{H}_B^T + I_{n+1,n}\widehat{Y} I_{m+1,m}^T\big)W_{m+1}^T.$

We also introduce matrices $Y^{(a)} \in \mathbb{R}^{n\times m}$ for $a \in \mathbb{R}$ that solve

$H_A Y H_B^T - Y = a\|f\|_2\|g\|_2 e_1^{(n)}(e_1^{(m)})^T,$  (3.3)
and define the associated residual matrices

$R_1^{(a)} := V_{n+1}\big(\|f\|_2\|g\|_2 e_1^{(n+1)}(e_1^{(m+1)})^T - \bar{H}_A Y^{(a)}\bar{H}_B^T + I_{n+1,n} Y^{(a)} I_{m+1,m}^T\big)W_{m+1}^T.$  (3.4)

We study the matrices $R_1^{(a)}$ and thereby obtain an upper bound for $\|\widehat{R}_1\|_F$, since

$\|\widehat{R}_1\|_F \le \min_{a\in\mathbb{R}}\|R_1^{(a)}\|_F.$  (3.5)

Note that the Galerkin algorithm determines the matrix $Y^{(1)}$ when (3.1) holds.

First we derive expressions for the last three terms in (3.2) when $Y = Y^{(a)}$ in terms of the characteristic polynomials of $H_A$ and $H_B$. We assume that all subdiagonal elements $h_{k+1,k}^{(A)}$ and $h_{k+1,k}^{(B)}$ are nonzero. Multiplying both sides of (3.3) by $e_m^{(m)}$, we obtain

$H_A Y^{(a)}\big(h_{m,m-1}^{(B)} e_{m-1}^{(m)} + h_{m,m}^{(B)} e_m^{(m)}\big) - Y^{(a)} e_m^{(m)} = 0.$

It follows that

$\big(h_{m,m}^{(B)} H_A - I_n\big)Y^{(a)} e_m^{(m)} = -h_{m,m-1}^{(B)} H_A Y^{(a)} e_{m-1}^{(m)}.$

Thus,

$\big(H_A^{-1} - h_{m,m}^{(B)} I\big)H_A Y^{(a)} e_m^{(m)} = h_{m,m-1}^{(B)} H_A Y^{(a)} e_{m-1}^{(m)}.$  (3.6)

After introducing the polynomial $p_m(t) := (h_{m,m-1}^{(B)})^{-1}(t - h_{m,m}^{(B)})$, (3.6) can be rewritten as

$p_m(H_A^{-1})H_A Y^{(a)} e_m^{(m)} = H_A Y^{(a)} e_{m-1}^{(m)}.$

Multiplying both sides of (3.3) by $e_{m-1}^{(m)}$, we obtain

$H_A Y^{(a)}\big(h_{m-1,m-2}^{(B)} e_{m-2}^{(m)} + h_{m-1,m-1}^{(B)} e_{m-1}^{(m)} + h_{m-1,m}^{(B)} e_m^{(m)}\big) - Y^{(a)} e_{m-1}^{(m)} = 0.$

It follows that

$\big(h_{m-1,m-1}^{(B)} H_A - I_n\big)Y^{(a)} e_{m-1}^{(m)} + h_{m-1,m}^{(B)} H_A Y^{(a)} e_m^{(m)} = -h_{m-1,m-2}^{(B)} H_A Y^{(a)} e_{m-2}^{(m)}.$

Therefore,

$\big(H_A^{-1} - h_{m-1,m-1}^{(B)} I\big)H_A Y^{(a)} e_{m-1}^{(m)} - h_{m-1,m}^{(B)}\big(p_m(H_A^{-1})\big)^{-1}H_A Y^{(a)} e_{m-1}^{(m)} = h_{m-1,m-2}^{(B)} H_A Y^{(a)} e_{m-2}^{(m)}.$  (3.7)

Let $p_{m-1}(t) := (h_{m-1,m-2}^{(B)})^{-1}\big(t - h_{m-1,m-1}^{(B)} - h_{m-1,m}^{(B)}(p_m(t))^{-1}\big)$; (3.7) can be written as

$p_{m-1}(H_A^{-1})H_A Y^{(a)} e_{m-1}^{(m)} = H_A Y^{(a)} e_{m-2}^{(m)}.$
Similarly, we obtain for $k = m-2, m-3, \ldots, 2$ that

$p_k(H_A^{-1})H_A Y^{(a)} e_k^{(m)} = H_A Y^{(a)} e_{k-1}^{(m)},$

where

$p_k(t) := (h_{k,k-1}^{(B)})^{-1}\Big(t - h_{k,k}^{(B)} - \sum_{i=k+1}^{m} h_{k,i}^{(B)}\prod_{j=k+1}^{i}\big(p_j(t)\big)^{-1}\Big).$  (3.8)

Finally,

$p_1(H_A^{-1})H_A Y^{(a)} e_1^{(m)} = a\|f\|_2\|g\|_2 e_1^{(n)},$

where $p_1(t)$ is given by (3.8) with $k = 1$ and $h_{1,0}^{(B)} = 1$. We define the function

$\Phi_B(t) := (h_{m+1,m}^{(B)})^{-1}\prod_{i=1}^{m} p_i(t).$  (3.9)

Then

$\Phi_B(H_A^{-1})\big(h_{m+1,m}^{(B)} H_A Y^{(a)} e_m^{(m)}\big) = a\|f\|_2\|g\|_2 e_1^{(n)}.$  (3.10)

In [7], the following two lemmas are given.

Lemma 3.1. The function $\Phi_B$ defined by (3.9) can be expressed as

$\Phi_B(t) = \dfrac{\Psi_{H_B}(t)}{\prod_{i=2}^{m+1} h_{i,i-1}^{(B)}},$  (3.11)

where $\Psi_{H_B}(t) := \det(tI - H_B)$ is the characteristic polynomial of $H_B$.

Lemma 3.2.

$\prod_{i=2}^{m+1} h_{i,i-1}^{(B)} = \|\Psi_{H_B}(B^T)w_1\|_2.$  (3.12)

With (3.10)–(3.12), we can show that

$\big\|h_{m+1,m}^{(B)} H_A Y^{(a)} e_m^{(m)}\big\|_2^2 = a^2\|f\|_2^2\|g\|_2^2\,\|\Psi_{H_B}(B^T)w_1\|_2^2\,\big\|\big(\Psi_{H_B}(H_A^{-1})\big)^{-1}e_1^{(n)}\big\|_2^2$  (3.13)

and

$\big(h_{m+1,m}^{(B)} h_{n+1,n}^{(A)}(e_n^{(n)})^T Y^{(a)} e_m^{(m)}\big)^2 = a^2\|f\|_2^2\|g\|_2^2\,\|\Psi_{H_B}(B^T)w_1\|_2^2\,\big(h_{n+1,n}^{(A)}\big)^2\big((e_n^{(n)})^T H_A^{-1}\big(\Psi_{H_B}(H_A^{-1})\big)^{-1}e_1^{(n)}\big)^2.$  (3.14)
L. Bao et al. / Appl. Math. Comput. 175 (2006) 557–573
Analogously, we obtain ðAÞ
T
2
2
2
2
1 ðmÞ 2
khnþ1;n ðenðnÞ Þ Y ðaÞ H TB k2 ¼ a2 kf k2 kgk2 kWH A ðAÞv1 k2 kðWA ðH T B ÞÞ e1 k2 . ð3:15Þ Theorem 3.3. Let Y(a) be a solution of (3.3), and let ðnÞ
ðmÞ
1 2 2 2 T 1 c ¼ kWH B ðBT Þw1 k22 kðWB ðH 1 A ÞÞ e1 k2 þ kWH A ðAÞv1 k2 kðWA ðH B ÞÞ e1 k2 2 2 ðAÞ 2 T 1 1 1 ðnÞ þ kWH B ðBT Þw1 k2 ðhnþ1;n Þ ðeðnÞ Þ H ðW ðH ÞÞ e . B 1 n A A
Then ðBÞ
ðAÞ
2 ðnÞ T ðaÞ T 2 khmþ1;m H A Y ðaÞ eðmÞ H B k2 m k2 þ khnþ1;n ðen Þ Y 2 ðAÞ ðBÞ T 2 2 2 þ hnþ1;n hmþ1;m ðenðnÞ Þ Y ðaÞ eðmÞ m ¼ a ckf k2 kgk2 .
ð3:16Þ
Proof. The result follows from (3.13)–(3.15). h The expression (3.16) with a = 1 bounds the norm of the residual matrix ROR 1 obtained by the Galerkin algorithm. Let the initial residual matrix R0 satisfy r0 = vec(R0), where r0 is defined by (3.1). Then kR0 kF ¼ kf k2 kgk2 and (3.16) yields 2
kROR 1 kF kR0 k2F
¼ c.
Thus, the Galerkin method converges rapidly if c is small. Theorem 3.3 yields ^ 1 . We obtain an upper bound for the residual error R ^ 1 k2 6 ðð1 aÞ2 þ a2 cÞkR0 k2 . kR F F
ð3:17Þ
The right-hand side of (3.17) is minimized for a = (1 + c)1. Substituting this value of a into (3.17) yields 2
^1k kR F 2
kR0 kF
6
c . 1þc
ð3:18Þ
The inequality (3.18) shows that the minimal residual algorithm yields rapid convergence when c is small.
L. Bao et al. / Appl. Math. Comput. 175 (2006) 557–573
569
4. Numerical experiments

In this section, we present some numerical examples to illustrate the effectiveness of Algorithm 2.3 for large and sparse generalized Sylvester equations. Let SYL-MR(n, m) denote Algorithm 2.3, where $n$ and $m$ denote the dimensions of the Krylov subspaces for $A$ and $B^T$, respectively. The GMRES(k) method is applied to the system of linear equations $(B^T \otimes A - I_{MN})\mathrm{vec}(X) = \mathrm{vec}(C)$, where $k$ denotes the dimension of the Krylov subspace for $B^T \otimes A - I_{MN}$. In the following examples, we compare SYL-MR(n, m) with the restarted GMRES(k) method. We call the iterations between restarts a cycle.

The main advantage of SYL-MR(n, m) compared to GMRES(k) is in expense and storage in each cycle. We compare these below. Table 4.1 gives a rough count of the expenses of SYL-MR(n, m) and GMRES(k) in one cycle; only the major expenses are considered. First, we compare the storage requirements of the two methods. SYL-MR(n, m) needs $2NM + O(M) + O(N)$ storage locations, while GMRES(k) needs $(k+3)NM + O(NM)$ locations. It is not difficult to see that the storage requirement of GMRES(k) is much greater than that of SYL-MR(n, m). Second, we compare the computational expenses. Let mvps denote the number of matrix-vector products in each cycle, and let flops denote the other computational flop count for the two methods in each cycle. From Table 4.1, we can see that GMRES(k) needs many more mvps and flops than SYL-MR(n, m) in each cycle. So SYL-MR(n, m) is better than GMRES(k) in both storage requirement and computational cost in one cycle.

All numerical experiments are performed on an AMD 1.4 GHz PC with 512 MB of main memory. We use MATLAB 6.5 with machine precision $\mu = 2.22 \times 10^{-16}$. We use the zero initial approximation for both GMRES(k) and SYL-MR(n, m). The stopping criterion for both methods is

$\dfrac{\|\widehat{R}_1\|_F}{\|R_0\|_F} \le 10^{-6}.$
Table 4.1
mvps and flops in each cycle

Method          mvps for A    mvps for B    flops
SYL-MR(n, m)    n             m             6mNM + 4n^2m(n^2 + 5mn + 4m^2)
GMRES(k)        kM            kN            2k^2NM
Example 1. As a test matrix, we generate a 64 by 64 bidiagonal matrix $A$ with entries 2, 2, 3, 4, ..., 64 on the main diagonal and 1 in each superdiagonal position. The matrix $B$ is the same as $A$, so the matrix $B^T \otimes A - I$ is of size $4096 \times 4096$. The right-hand side of the generalized Sylvester equation is $C = fg^T$, where the elements of the vectors $f$ and $g$ are random values uniformly distributed on $[0, 1]$. GMRES(k) uses $k = 5$ and SYL-MR(n, m) uses $n = m = 5$. See Fig. 4.1 for a graph of the convergence. SYL-MR(5, 5) needs 580 iterations for convergence, while GMRES(5) needs 1860 iterations; see Table 4.2. So SYL-MR(5, 5) is better than GMRES(5) for this example.

[Figure: $\log(\|R\|_F/\|R_0\|_F)$ versus number of iterations for GMRES(5) and SYL-MR(5, 5).]

Fig. 4.1. Comparison for Example 1.
Table 4.2
Example 1, SYL-MR(5, 5) vs. GMRES(5)

Method          Iter    mvps for A    mvps for B    CPU     Res
SYL-MR(5, 5)    580     580           580           1.07    9.76e-7
GMRES(5)        1860    119,040       119,040       4.74    9.99e-7
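For reference, the test matrices used in the examples are easy to reproduce. The following Python/SciPy sketch builds the Example 1 bidiagonal matrix and the tridiagonal matrix of Examples 2 and 3; the experiments in the paper were run in MATLAB, so this construction is only an illustrative transcription:

```python
import numpy as np
import scipy.sparse as sp

def example1_matrix(n=64):
    """64 x 64 bidiagonal matrix: main diagonal 2, 2, 3, 4, ..., 64; superdiagonal 1."""
    main = np.concatenate(([2.0], np.arange(2.0, n + 1.0)))
    return sp.diags([main, np.ones(n - 1)], [0, 1], format="csr")

def example23_matrix(d, n=64):
    """64 x 64 tridiagonal matrix: 4 on the diagonal, 1 - d above, 1 + d below."""
    return sp.diags([(1 + d) * np.ones(n - 1), 4.0 * np.ones(n), (1 - d) * np.ones(n - 1)],
                    [-1, 0, 1], format="csr")

rng = np.random.default_rng(4)
A = example1_matrix(); B = A.copy()
f = rng.uniform(size=64); g = rng.uniform(size=64)
C = np.outer(f, g)                       # rank-one right-hand side C = f g^T
A2 = example23_matrix(d=5.0)             # Example 2 uses d = 5; Example 3 uses d = 8
print(A.shape, A2.shape, C.shape)        # (64, 64) (64, 64) (64, 64)
```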
Example 2. For the second experiment, we use a 64 by 64 tridiagonal matrix $A$ with $1 - d$ in each superdiagonal position, $1 + d$ in each subdiagonal position, and 4 in each main diagonal position. This matrix is considered by Jbilou [6]. The matrix $B$ is again the same as $A$. We set $d = 5$. The right-hand side of the generalized Sylvester equation is $C = fg^T$, where the elements of the vectors $f$ and $g$ are all ones. GMRES(k) uses $k = 25$ and SYL-MR(n, m) uses $n = m = 5$, so subspaces of the same dimension are used. See Fig. 4.2 for a graph of the convergence. In Table 4.3, we give the results obtained with the two methods. We compare the number of iterations, the CPU time (s) required for convergence, and the relative residual $\|R\|_F/\|R_0\|_F$. It is clearly seen from Table 4.3 that the performance of SYL-MR(5, 5) is much better than that of GMRES(25).

[Figure: $\log(\|R\|_F/\|R_0\|_F)$ versus number of iterations for GMRES(25) and SYL-MR(5, 5).]

Fig. 4.2. Comparison for Example 2.
Table 4.3
Example 2, SYL-MR(5, 5) vs. GMRES(25)

Method          Iter    mvps for A    mvps for B    CPU      Res
SYL-MR(5, 5)    1260    1260          1260          1.58     9.96e-7
GMRES(25)       3825    244,800       244,800       14.53    9.43e-7
Example 3. We now perform the same experiment as in the previous example, except that $d$ is 8 instead of 5. See Fig. 4.3 for the computational results. After about 2375 iterations, SYL-MR(5, 5) has a relative residual norm of 9.91e-7, compared to 1.2e-1 for GMRES(25). The convergence of SYL-MR(5, 5) is fairly rapid, but GMRES(25) almost stalls in this situation; see Table 4.4.

[Figure: $\log(\|R\|_F/\|R_0\|_F)$ versus number of iterations for GMRES(25) and SYL-MR(5, 5).]

Fig. 4.3. Comparison for Example 3.
Table 4.4
Example 3, SYL-MR(5, 5) vs. GMRES(25)

Method          Iter    mvps for A    mvps for B    CPU     Res
SYL-MR(5, 5)    2375    2375          2375          3.22    9.91e-7
GMRES(25)       -       -             -             -       0.1174
5. Conclusion

In this paper we propose two iterative methods for the solution of the generalized Sylvester equation (1.1). These methods reduce the given generalized Sylvester equation to a generalized Sylvester equation of smaller size by applying the Arnoldi process. We present Galerkin and minimal residual methods and obtain results on the convergence of the two methods; this approach is more appropriate for large and sparse generalized Sylvester equations than the usual Krylov subspace methods applied to (1.2). Numerical experiments are presented to show the effectiveness of the proposed methods.
References

[1] B.N. Datta, K. Datta, Theoretical and computational aspects of some linear algebra problems in control theory, in: C.I. Byrnes, A. Lindquist (Eds.), Computational and Combinatorial Methods in Systems Theory, Elsevier, Amsterdam, 1986, pp. 201–212.
[2] J.W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
[3] G.H. Golub, S. Nash, C.F. Van Loan, A Hessenberg–Schur method for the problem AX + XB = C, IEEE Trans. Automat. Control 24 (1979) 909–913.
[4] G.H. Golub, C.F. Van Loan, Matrix Computations, third ed., Johns Hopkins University Press, Baltimore, 1996.
[5] A. El Guennouni, K. Jbilou, A.J. Riquet, Block Krylov subspace methods for solving large Sylvester equations, Numer. Algorithms 29 (2002) 75–96.
[6] K. Jbilou, Block Krylov subspace methods for large algebraic Riccati equations, Numer. Algorithms 34 (2003) 339–353.
[7] D.Y. Hu, L. Reichel, Krylov-subspace methods for the Sylvester equation, Linear Algebra Appl. 172 (1992) 283–313.
[8] Y. Saad, Krylov subspace methods for solving large unsymmetric linear systems, Math. Comput. 37 (1981) 105–126.
[9] Y. Saad, M.H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput. 7 (1986) 856–869.