Applied Mathematics and Computation 98 (1999) 199-208
A new implementation of EN method

Weiguo Gao *, Jungong Xue, Yanyun Qu

Institute of Mathematics, Fudan University, Shanghai 200433, China

* Corresponding author.
Abstract

The EN method proposed by Eirola and Nevanlinna can be more efficient than GMRES in some cases, but not in general, as has been studied by Vuik and van der Vorst: a comparison of the efficiency of the two methods in practice shows that GMRES is preferable. In the present paper a new implementation of the EN method is introduced, with significant savings in both work and memory requirements. Numerical experiments indicate that this new version is competitive. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: EN method; GMRES method; Nonsymmetric linear system
1. Introduction

For solving the large sparse nonsymmetric linear system
$$Ax = b, \qquad A \in \mathbb{R}^{N \times N}, \qquad x, b \in \mathbb{R}^N,$$
with $A$ nonsingular, Eirola and Nevanlinna [1] proposed a new iterative method which takes a different splitting of the matrix, $A = H_k^{-1} - R_k$, at each iteration step. It can be illustrated as follows:
Algorithm I (EN method)
1. Start: Given $x_0$, $H_0$, compute $r_0 = b - Ax_0$.
2. Iterate: For $k = 0, 1, 2, \ldots$
2.1. $\alpha_i = c_i^T(r_k - AH_0r_k)$, for $i = 0, \ldots, k-1$,
$$\eta_k = H_0r_k + \sum_{i=0}^{k-1}\alpha_iu_i, \qquad \xi_k = r_k - AH_0r_k - \sum_{i=0}^{k-1}\alpha_ic_i,$$
2.2. $\beta_i = c_i^TAH_0\xi_k$, for $i = 0, \ldots, k-1$,
$$u_k = \tau\Bigl(H_0\xi_k - \sum_{i=0}^{k-1}\beta_iu_i\Bigr), \qquad c_k = \tau\Bigl(AH_0\xi_k - \sum_{i=0}^{k-1}\beta_ic_i\Bigr),$$
where $\tau$ is such that $\|c_k\|_2 = 1$,
2.3. $x_{k+1} = x_k + \eta_k + u_kc_k^T\xi_k$, $r_{k+1} = \xi_k - c_kc_k^T\xi_k$; if $\|r_{k+1}\|_2$ is small enough, then STOP.

Note that in steps 2.1 and 2.2 of Algorithm I the vectors $\xi_k$ and $c_k$ are made orthogonal by the Gram-Schmidt process. Vuik and van der Vorst [2] suggested that using the modified Gram-Schmidt process instead of the classical one gives better numerical behavior. An alternative computation of $\xi_k$ and $c_k$, i.e. $\xi_k = r_k - A\eta_k$, $c_k = Au_k$, can be used in situations where it is more efficient to multiply one vector by $A$ than to compute a linear combination of $k + 1$ vectors. In exact arithmetic, Algorithm I gives the solution in at most $N$ steps under the assumption that all $H_k$ are nonsingular, where $AH_k = I - \bigl(I - \sum_{i=0}^{k-1}c_ic_i^T\bigr)(I - AH_0)$. The following property may be helpful in checking whether this assumption is satisfied.
Proposition 1 [1]. If $AH_0$ is positive (or negative) definite, then all $H_k$ are nonsingular.
Vuik and van der Vorst pointed out that Algorithm I is not scaling invariant, and they specified a scaling invariant version which seems to be better than Algorithm I; for details, we refer to [2]. A comparison of the efficiency of EN and GMRES [3] can also be found in [2]. We outline it here briefly. Denote by EN(k) $k$ steps of EN and by GMRES(2k) $2k$ steps of GMRES, respectively. EN(k) always needs slightly less work and memory. The inner products and vector updates of the EN method can be computed in parallel. On the other hand, we have $\|r_k^{EN}\|_2 \geq \|r_{2k}^{GMRES}\|_2$, so in general the EN method requires more matrix-vector multiplications than GMRES to achieve the same accuracy. Another drawback of the EN method is that there are examples for which GMRES converges but EN fails to converge, because of the singularity of $H_k$ described before. Vuik and van der Vorst have proved the following proposition.

Proposition 2 [2]. If $AH_0$ is neither positive nor negative definite on $\mathrm{Im}(I - AH_0)$, then there exists a right-hand side vector $b$ such that $H_1$ is singular.
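To make the iteration concrete, here is a minimal NumPy sketch of Algorithm I using the modified Gram-Schmidt variant suggested in [2]. It is a sketch only: the name `en_method`, the dense-matrix setting and the stopping tolerance are our own choices, not from the paper, and no check on the nonsingularity of the $H_k$ is performed.

```python
# A minimal sketch of Algorithm I (EN method); en_method, tol and maxit
# are our own names/choices. H0 is the initial approximate inverse.
import numpy as np

def en_method(A, b, x0, H0, tol=1e-10, maxit=100):
    x = x0.copy()
    r = b - A @ x
    us, cs = [], []                       # stored u_i and c_i (c_i = A u_i)
    for k in range(maxit):
        H0r = H0 @ r
        eta = H0r.copy()
        xi = r - A @ H0r
        for u_i, c_i in zip(us, cs):      # step 2.1, modified Gram-Schmidt
            a = c_i @ xi
            eta += a * u_i
            xi -= a * c_i
        u = H0 @ xi                       # step 2.2
        c = A @ u                         # equals A H0 xi_k
        for u_i, c_i in zip(us, cs):
            bt = c_i @ c
            u -= bt * u_i
            c -= bt * c_i
        tau = 1.0 / np.linalg.norm(c)     # so that ||c_k||_2 = 1
        u, c = tau * u, tau * c
        us.append(u)
        cs.append(c)
        gamma = c @ xi                    # step 2.3
        x = x + eta + gamma * u
        r = xi - gamma * c
        if np.linalg.norm(r) < tol:
            break
    return x
```

Each iteration performs two products with $A$ (one for $AH_0r_k$ and one for $c_k = Au_k$) and stores both $u_k$ and $c_k$; avoiding the storage of the $u_k$ is precisely what the implementation developed in Sections 2-4 achieves.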
2. Motivation

In order to avoid additional computation, we need to save both the vectors $\{u_k\}$ and $\{c_k\}$, although the simple relations $c_k = Au_k$ hold for all $k$.
The natural question arises: can we save only one of $u_k$ and $c_k$ at each iteration? A similar phenomenon appears in some other methods, such as ORTHOMIN [4], ORTHODIR [5], Axelsson's method [6], GCR [7], etc. GMRES [3] is exactly the method which gives the same solution mathematically but needs only half the memory. Indeed, GMRES first constructs a basis $\{v_0, v_1, \ldots, v_{k-1}\}$ of the Krylov subspace
$$\mathscr{K}_k(A, r_0) := \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$$
without computing $x_i$, $i = 1, \ldots, k-1$. When GMRES has converged, say at the $k$th iteration, we then compute $x_k = x_0 + (v_0, v_1, \ldots, v_{k-1})y_k$, where $y_k$ is such that the corresponding residual norm $\|r_k\|_2$ is minimized. Motivated by the same idea, we concentrate on finding a basis of a suitable Krylov subspace. Since $H_0$ can be viewed as a preconditioner, we have
$$x_{k+1} = x_k + \eta_k + u_kc_k^T\xi_k = x_0 + \sum_{i=0}^{k}\bigl(\eta_i + u_ic_i^T\xi_i\bigr) = (x_0 + H_0r_0) + u_0c_0^T\xi_0 + \sum_{i=1}^{k}\bigl(\eta_i + u_ic_i^T\xi_i\bigr)$$
$$= (x_0 + H_0r_0) + H_0\bigl(H_0^{-1}u_0, H_0^{-1}\eta_1, H_0^{-1}u_1, \ldots, H_0^{-1}\eta_k, H_0^{-1}u_k\bigr)\,y_{2k+1},$$
where
$$y_{2k+1} = \bigl(c_0^T\xi_0,\ 1,\ c_1^T\xi_1,\ 1,\ \ldots,\ 1,\ c_k^T\xi_k\bigr)^T.$$
Fortunately, it is easy to verify the following proposition.
Proposition 3. If $H_k$ is nonsingular and $\xi_k \neq 0$, then
$$\mathscr{K}_{2k}(AH_0, \xi_0) = \mathrm{span}\{\xi_0, c_0, \xi_1, c_1, \ldots, \xi_{k-1}, c_{k-1}\} = \mathrm{span}\{\xi_0, c_0, AH_0c_0, c_1, AH_0c_1, \ldots, AH_0c_{k-2}, c_{k-1}\}$$
$$= \mathrm{span}\{H_0^{-1}u_0, H_0^{-1}\eta_1, H_0^{-1}u_1, H_0^{-1}\eta_2, H_0^{-1}u_2, \ldots, H_0^{-1}u_{k-1}, H_0^{-1}\eta_k\},$$
$$\mathscr{K}_{2k+1}(AH_0, \xi_0) = \mathrm{span}\{\xi_0, c_0, \xi_1, c_1, \ldots, c_{k-1}, \xi_k\} = \mathrm{span}\{\xi_0, c_0, AH_0c_0, c_1, AH_0c_1, \ldots, c_{k-1}, AH_0c_{k-1}\}$$
$$= \mathrm{span}\{H_0^{-1}u_0, H_0^{-1}\eta_1, H_0^{-1}u_1, \ldots, H_0^{-1}\eta_k, H_0^{-1}u_k\}.$$
Proof. By induction on $k$. For instance, for $k = 1$ the first chain reduces to $\mathscr{K}_2(AH_0, \xi_0) = \mathrm{span}\{\xi_0, AH_0\xi_0\} = \mathrm{span}\{\xi_0, c_0\}$, which holds since $\beta_{00}c_0 = AH_0\xi_0$ with $\beta_{00} \neq 0$. $\square$
Remark. Since $\xi_0 = r_0 - AH_0r_0 = b - A(x_0 + H_0r_0)$, the above proposition improves the result in [2],
$$\mathrm{span}\{c_0, \ldots, c_k\} \subset \mathrm{span}\{(AH_0)r_0, \ldots, (AH_0)^{2k+2}r_0\}.$$
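Proposition 3 is also easy to check numerically. The sketch below reruns the $c$-side recurrences of Algorithm I (which never need the $u_i$) on a small random test problem and compares the proposed basis with the Krylov matrix of $AH_0$ applied to $\xi_0$; the problem size, the seed and the rank test are our own illustrative choices.

```python
# A numerical check of Proposition 3 (sketch; sizes and seed are ours).
import numpy as np

rng = np.random.default_rng(0)
N, k = 30, 4
A = rng.standard_normal((N, N)) + N * np.eye(N)   # well-conditioned test matrix
H0 = np.eye(N)
r = rng.standard_normal(N)                        # plays the role of r_0

cs, xi0 = [], None
for step in range(k):                             # c-side recurrences only
    xi = r - A @ (H0 @ r)
    for c_i in cs:
        xi -= (c_i @ xi) * c_i
    if step == 0:
        xi0 = xi.copy()
    c = A @ (H0 @ xi)
    for c_i in cs:
        c -= (c_i @ c) * c_i
    c /= np.linalg.norm(c)
    cs.append(c)
    r = xi - (c @ xi) * c

# Basis from the proposition: {xi_0, c_0, AH0 c_0, ..., AH0 c_{k-2}, c_{k-1}}
cols = [xi0]
for c in cs[:-1]:
    cols += [c, A @ (H0 @ c)]
cols.append(cs[-1])
V = np.column_stack(cols)

K = [xi0]                                         # Krylov basis of K_{2k}(AH0, xi_0)
for _ in range(2 * k - 1):
    K.append(A @ (H0 @ K[-1]))
W = np.column_stack(K)

print(np.linalg.matrix_rank(V),
      np.linalg.matrix_rank(W),
      np.linalg.matrix_rank(np.hstack([V, W])))   # all three should equal 2k
```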
3. Matrix relations

For the sake of simplicity of the theoretical analysis, we rewrite Algorithm I as

Algorithm I'
1. Start: Given $x_0$, $H_0$, compute $r_0 = b - Ax_0$.
2. Iterate: For $k = 0, 1, 2, \ldots$
2.1. $\alpha_{ik} = c_i^T(r_k - AH_0r_k)$, for $i = 0, \ldots, k-1$,
$$\eta_k = r_k + \sum_{i=0}^{k-1}\alpha_{ik}u_i, \tag{1}$$
$$\xi_k = r_k - AH_0r_k - \sum_{i=0}^{k-1}\alpha_{ik}c_i, \tag{2}$$
2.2. $\beta_{ik} = c_i^TAH_0\xi_k$, for $i = 0, \ldots, k-1$,
$$\beta_{kk}u_k = \xi_k - \sum_{i=0}^{k-1}\beta_{ik}u_i, \tag{3}$$
$$\beta_{kk}c_k = AH_0\xi_k - \sum_{i=0}^{k-1}\beta_{ik}c_i, \tag{4}$$
where $\beta_{kk}$ is such that $\|c_k\|_2 = 1$;
2.3.
$$\gamma_k = c_k^T\xi_k, \tag{5}$$
$$x_{k+1} = x_k + H_0(\eta_k + \gamma_ku_k), \qquad r_{k+1} = \xi_k - \gamma_kc_k,$$
if $\|r_{k+1}\|_2$ is small enough, then STOP.

Here $\eta_k$, $u_k$ of Algorithm I have been replaced by $H_0^{-1}\eta_k$, $H_0^{-1}u_k$, respectively, and some other symbols are also changed. Substituting Eqs. (4) and (5) into Eqs. (1) and (2), we deduce that
$$\eta_k = \xi_{k-1} - \gamma_{k-1}c_{k-1} + \sum_{i=0}^{k-1}\alpha_{ik}u_i, \tag{6}$$
$$\xi_k = \xi_{k-1} - \gamma_{k-1}c_{k-1} - \sum_{i=0}^{k-1}\beta_{i,k-1}c_i - \sum_{i=0}^{k-1}\alpha_{ik}c_i + \gamma_{k-1}AH_0c_{k-1}. \tag{7}$$

Together with Eq. (3), we get the equivalent matrix form
$$(u_0, \eta_1, u_1, \eta_2, u_2, \ldots, \eta_k, u_k)\,B_{2k+1} = (\xi_0, c_0, \xi_1, c_1, \xi_2, \ldots, c_{k-1}, \xi_k)\,C_{2k+1},$$
$$(\xi_0, c_0, \xi_1, c_1, \xi_2, \ldots, c_{k-1}, \xi_k)\,D_{2k+1} = (\xi_0, c_0, \gamma_0AH_0c_0, c_1, \gamma_1AH_0c_1, \ldots, c_{k-1}, \gamma_{k-1}AH_0c_{k-1}),$$
where
$$B_{2k+1} = \begin{pmatrix}
\beta_{00} & -\alpha_{01} & \beta_{01} & -\alpha_{02} & \beta_{02} & \cdots & -\alpha_{0k} & \beta_{0k} \\
 & 1 & & & & & & \\
 & & \beta_{11} & -\alpha_{12} & \beta_{12} & \cdots & -\alpha_{1k} & \beta_{1k} \\
 & & & 1 & & & & \\
 & & & & \beta_{22} & \cdots & -\alpha_{2k} & \beta_{2k} \\
 & & & & & \ddots & \vdots & \vdots \\
 & & & & & & 1 & \\
 & & & & & & & \beta_{kk}
\end{pmatrix},$$

$$C_{2k+1} = \begin{pmatrix}
1 & 1 & & & & & & \\
 & -\gamma_0 & & & & & & \\
 & & 1 & 1 & & & & \\
 & & & -\gamma_1 & & & & \\
 & & & & \ddots & \ddots & & \\
 & & & & & 1 & 1 & \\
 & & & & & & -\gamma_{k-1} & \\
 & & & & & & & 1
\end{pmatrix},$$

$$D_{2k+1} = \begin{pmatrix}
1 & & -1 & & & & & \\
 & 1 & \alpha_{01}{+}\beta_{00}{+}\gamma_0 & & \alpha_{02}{+}\beta_{01} & \cdots & & \alpha_{0k}{+}\beta_{0,k-1} \\
 & & 1 & & -1 & & & \\
 & & & 1 & \alpha_{12}{+}\beta_{11}{+}\gamma_1 & \cdots & & \alpha_{1k}{+}\beta_{1,k-1} \\
 & & & & 1 & \ddots & & \vdots \\
 & & & & & \ddots & & -1 \\
 & & & & & & 1 & \alpha_{k-1,k}{+}\beta_{k-1,k-1}{+}\gamma_{k-1} \\
 & & & & & & & 1
\end{pmatrix}.$$
Thus, we have
$$(u_0, \eta_1, u_1, \ldots, \eta_k, u_k) = (\xi_0, c_0, \gamma_0AH_0c_0, \ldots, c_{k-1}, \gamma_{k-1}AH_0c_{k-1})\,D_{2k+1}^{-1}C_{2k+1}B_{2k+1}^{-1}.$$
Clearly, denoting by $D_{2k}$, $C_{2k}$, $B_{2k}$ the matrices obtained by deleting the last row and column of $D_{2k+1}$, $C_{2k+1}$, $B_{2k+1}$, respectively, we get
$$(u_0, \eta_1, u_1, \ldots, u_{k-1}, \eta_k) = (\xi_0, c_0, \gamma_0AH_0c_0, \ldots, \gamma_{k-2}AH_0c_{k-2}, c_{k-1})\,D_{2k}^{-1}C_{2k}B_{2k}^{-1}.$$
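For reference, the following sketch assembles $B_{2k+1}$, $C_{2k+1}$ and $D_{2k+1}$ from the scalar coefficients of Algorithm I'. The helper name `build_BCD`, the 0-based indexing and the array layout (`alpha[i, j]` $= \alpha_{ij}$, `beta[i, j]` $= \beta_{ij}$, `gamma[j]` $= \gamma_j$) are our own conventions. Rows of $B$ follow the ordering $(u_0, \eta_1, u_1, \ldots, \eta_k, u_k)$; rows of $C$ and $D$ follow $(\xi_0, c_0, \xi_1, c_1, \ldots, c_{k-1}, \xi_k)$.

```python
# Sketch: assemble B_{2k+1}, C_{2k+1}, D_{2k+1} (helper name and layout ours).
import numpy as np

def build_BCD(alpha, beta, gamma, k):
    n = 2 * k + 1
    B, C, D = np.zeros((n, n)), np.zeros((n, n)), np.eye(n)
    for j in range(k + 1):              # column 2j carries Eq. (3) at step j
        for i in range(j + 1):
            B[2 * i, 2 * j] = beta[i, j]
        C[2 * j, 2 * j] = 1.0           # xi_j
    for j in range(1, k + 1):           # column 2j-1 carries Eq. (6) at step j
        B[2 * j - 1, 2 * j - 1] = 1.0   # coefficient of eta_j
        for i in range(j):
            B[2 * i, 2 * j - 1] = -alpha[i, j]
        C[2 * j - 2, 2 * j - 1] = 1.0   # xi_{j-1}
        C[2 * j - 1, 2 * j - 1] = -gamma[j - 1]
    for j in range(k):                  # column 2j+2: gamma_j AH0 c_j, via Eq. (7)
        D[2 * j, 2 * j + 2] = -1.0      # -xi_j
        for i in range(j):
            D[2 * i + 1, 2 * j + 2] = alpha[i, j + 1] + beta[i, j]
        D[2 * j + 1, 2 * j + 2] = alpha[j, j + 1] + beta[j, j] + gamma[j]
    return B, C, D                      # np.eye already placed the diagonal 1's of D
```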
4. New implementation

First, we mention that if $\|\xi_k\|_2$ is small enough (in particular, if $\xi_k = 0$), an approximate (exact) solution $x_k + \eta_k$ is reached. We denote this new solution by $x_{2k+1}$, and the old $x_{k+1}$ of Algorithm I by $x_{2k+2}$. The derivation in Section 3 leads to the following algorithm.
Algorithm II
1. Start: Given $x_0$, $H_0$, compute $r_0 = b - Ax_0$, $x_1 = x_0 + H_0r_0$.
2. Iterate: For $k = 0, 1, 2, \ldots$
2.1. $\alpha_{ik} = c_i^T(r_k - AH_0r_k)$, for $i = 0, \ldots, k-1$,
$$\xi_k = r_k - AH_0r_k - \sum_{i=0}^{k-1}\alpha_{ik}c_i,$$
if $\|\xi_k\|_2$ is small enough, then GOTO step 3,
2.2. $\beta_{ik} = c_i^TAH_0\xi_k$, for $i = 0, \ldots, k-1$,
$$\beta_{kk}c_k = AH_0\xi_k - \sum_{i=0}^{k-1}\beta_{ik}c_i,$$
where $\beta_{kk}$ is such that $\|c_k\|_2 = 1$,
2.3. $\gamma_k = c_k^T\xi_k$, $r_{k+1} = \xi_k - \gamma_kc_k$; if $\|r_{k+1}\|_2 = \sqrt{\|\xi_k\|_2^2 - \gamma_k^2}$ is small enough, then GOTO step 3.
3. Form the approximate solution: if $\|\xi_k\|_2$ is small,
$$x_{2k+1} = x_1 + H_0(\xi_0, c_0, \gamma_0AH_0c_0, \ldots, \gamma_{k-2}AH_0c_{k-2}, c_{k-1})\,D_{2k}^{-1}C_{2k}B_{2k}^{-1}y_{2k},$$
else if $\|r_{k+1}\|_2$ is small,
$$x_{2k+2} = x_1 + H_0(\xi_0, c_0, \gamma_0AH_0c_0, \ldots, c_{k-1}, \gamma_{k-1}AH_0c_{k-1})\,D_{2k+1}^{-1}C_{2k+1}B_{2k+1}^{-1}y_{2k+1}.$$

Note that in this new implementation:
(i) The computation of $\eta_k$ and $u_k$ is omitted.
(ii) The cost of computing $y_{2k}$ or $y_{2k+1}$ can be neglected.
(iii) Only the vectors $x_1, \xi_0, c_0, c_1, \ldots, c_k$ and the coefficients $\alpha_{ij}, \beta_{ij}, \gamma_i$, $j \geq i$, $i, j = 0, \ldots, k$, need to be saved.
(iv) To form the approximate solution, one additional matrix-vector multiplication and two vector updates are required.

Next we compare Algorithm II with Algorithm I and the GMRES method. In order to compare the efficiency of the three methods, we need an estimate of the amount of work and memory required by each. For this purpose we have listed in Table 1 the work and memory requirements for k steps of EN and 2k steps of GMRES.

Table 1
One-cycle cost of EN(k) and GMRES(2k)

            Algorithm I-EN(k)            Algorithm II-EN(k)           GMRES(2k)^a
            Multiplications  A-products  Multiplications  A-products  Multiplications  A-products
Step 1      0                1           0                1           N                1
Step 2      (3k^2+6k)N       2k          (2k^2+4k)N       2k          (4k^2+6k)N       2k
Step 3      0                0           2kN              1           2kN              0
Total       (3k^2+6k)N       2k+1        (2k^2+6k)N       2k+2        (4k^2+8k+1)N     2k+1
Memory      (2k+3)N                      (k+3)N                       (2k+2)N

^a For the details of the implementation of GMRES(2k), we refer to [3], Algorithm 4.

A sketch of Algorithm II is given below. At the end of this section, we also give a scaling invariant scheme.
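The following is a NumPy sketch of Algorithm II, reusing the `build_BCD` helper sketched at the end of Section 3. The function name, the single tolerance used for both stopping tests, and the fallback when the iteration limit is reached are our own choices, not the paper's. The final loop groups all $\gamma_jAH_0c_j$ columns together so that forming the solution costs one extra product with $A$, as claimed in note (iv).

```python
# Sketch of Algorithm II (names/tolerances ours); uses build_BCD above.
import numpy as np

def en_algorithm2(A, b, x0, H0, tol=1e-10, maxit=50):
    r = b - A @ x0
    x1 = x0 + H0 @ r
    cs, xi0, m = [], None, 0
    alpha = np.zeros((maxit + 1, maxit + 1))
    beta = np.zeros((maxit + 1, maxit + 1))
    gamma = np.zeros(maxit + 1)
    for k in range(maxit):
        xi = r - A @ (H0 @ r)
        for i, c_i in enumerate(cs):          # step 2.1, modified Gram-Schmidt
            alpha[i, k] = c_i @ xi
            xi -= alpha[i, k] * c_i
        if k == 0:
            xi0 = xi.copy()
        if np.linalg.norm(xi) < tol:          # xi_k ~ 0: form x_{2k+1}
            m = 2 * k
            break
        c = A @ (H0 @ xi)                     # step 2.2
        for i, c_i in enumerate(cs):
            beta[i, k] = c_i @ c
            c -= beta[i, k] * c_i
        beta[k, k] = np.linalg.norm(c)
        c /= beta[k, k]
        cs.append(c)
        gamma[k] = c @ xi                     # step 2.3
        r = xi - gamma[k] * c
        m = 2 * k + 1                         # candidate order for x_{2k+2}
        if np.linalg.norm(r) < tol:
            break
    if m == 0:                                # xi_0 was already small
        return x1
    B, C, D = build_BCD(alpha, beta, gamma, m // 2)
    y = np.empty(m)                           # y = (gamma_0, 1, gamma_1, 1, ...)
    y[0::2] = gamma[: (m + 1) // 2]
    y[1::2] = 1.0
    z = np.linalg.solve(D[:m, :m],
                        C[:m, :m] @ np.linalg.solve(B[:m, :m], y))
    # W z = z_0 xi_0 + sum z_{2j+1} c_j + AH0 (sum z_{2j+2} gamma_j c_j):
    # a single extra A-product for the whole combination (note (iv)).
    v = z[0] * xi0
    s = np.zeros_like(xi0)
    for j, c_j in enumerate(cs):
        if 2 * j + 1 < m:
            v += z[2 * j + 1] * c_j
        if 2 * j + 2 < m:
            s += z[2 * j + 2] * gamma[j] * c_j
    v += A @ (H0 @ s)
    return x1 + H0 @ v
```

For a quick sanity check one can compare the result against `np.linalg.solve(A, b)` on a small dense problem.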
Algorithm III
1. Start: Given $x_0$, $H_0$, compute $r_0 = b - Ax_0$.
2. Iterate: For $k = 0, 1, 2, \ldots$
2.1. $\rho_k = (AH_0r_k)^Tr_k/\|AH_0r_k\|_2^2$,
$\alpha_{ik} = c_i^T(r_k - \rho_kAH_0r_k)$, for $i = 0, \ldots, k-1$,
$$\xi_k = r_k - \rho_kAH_0r_k - \sum_{i=0}^{k-1}\alpha_{ik}c_i,$$
if $k = 0$ then $x_1 = x_0 + \rho_0H_0r_0$,
if $\|\xi_k\|_2$ is small enough, then GOTO step 3,
2.2. $\beta_{ik} = c_i^TAH_0\xi_k$, for $i = 0, \ldots, k-1$,
$$\beta_{kk}c_k = AH_0\xi_k - \sum_{i=0}^{k-1}\beta_{ik}c_i,$$
where $\beta_{kk}$ is such that $\|c_k\|_2 = 1$,
2.3. $\gamma_k = c_k^T\xi_k$, $r_{k+1} = \xi_k - \gamma_kc_k$;
if $\|r_{k+1}\|_2 = \sqrt{\|\xi_k\|_2^2 - \gamma_k^2}$ is small enough, then GOTO step 3.
3. Form the approximate solution:
if $\|\xi_k\|_2$ is small,
$$x_{2k+1} = x_1 + H_0(\xi_0, c_0, \gamma_0AH_0c_0, \ldots, \gamma_{k-2}AH_0c_{k-2}, c_{k-1})\,\tilde{D}_{2k}^{-1}\tilde{C}_{2k}\tilde{B}_{2k}^{-1}y_{2k},$$
else if $\|r_{k+1}\|_2$ is small,
$$x_{2k+2} = x_1 + H_0(\xi_0, c_0, \gamma_0AH_0c_0, \ldots, c_{k-1}, \gamma_{k-1}AH_0c_{k-1})\,\tilde{D}_{2k+1}^{-1}\tilde{C}_{2k+1}\tilde{B}_{2k+1}^{-1}y_{2k+1},$$
where $\tilde{B}_{2k+1} = B_{2k+1}$, $\tilde{C}_{2k+1} = C_{2k+1}\,\mathrm{diag}(1, \rho_1, 1, \rho_2, 1, \ldots, \rho_k, 1)$,
$$\tilde{D}_{2k+1} = \mathrm{diag}(1, 1, \rho_1, 1, \rho_2, \ldots, 1, \rho_k)\begin{pmatrix}
1 & & -1 & & & & & \\
 & 1 & \alpha_{01}{+}\rho_1\beta_{00}{+}\gamma_0 & & \alpha_{02}{+}\rho_2\beta_{01} & \cdots & & \alpha_{0k}{+}\rho_k\beta_{0,k-1} \\
 & & 1 & & -1 & & & \\
 & & & 1 & \alpha_{12}{+}\rho_2\beta_{11}{+}\gamma_1 & \cdots & & \alpha_{1k}{+}\rho_k\beta_{1,k-1} \\
 & & & & 1 & \ddots & & \vdots \\
 & & & & & \ddots & & -1 \\
 & & & & & & 1 & \alpha_{k-1,k}{+}\rho_k\beta_{k-1,k-1}{+}\gamma_{k-1} \\
 & & & & & & & 1
\end{pmatrix},$$
and $\tilde{B}_{2k}$, $\tilde{C}_{2k}$, $\tilde{D}_{2k}$ are the leading principal submatrices of order $2k$ of $\tilde{B}_{2k+1}$, $\tilde{C}_{2k+1}$, $\tilde{D}_{2k+1}$, respectively.
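In a sketch like `en_algorithm2` above, Algorithm III changes only the start of step 2.1 (and, at $k = 0$, the definition of $x_1$); in step 3 the matrices $B$, $C$, $D$ are replaced by the tilde matrices just defined. The helper below shows the scaled step; the name `rho_step` is our own.

```python
# Scaled first part of step 2.1 in Algorithm III (helper name ours).
import numpy as np

def rho_step(A, H0, r):
    AH0r = A @ (H0 @ r)
    rho = (AH0r @ r) / (AH0r @ AH0r)   # rho_k = (AH0 r_k)^T r_k / ||AH0 r_k||_2^2
    xi = r - rho * AH0r                # then orthogonalize xi against c_0, ..., c_{k-1}
    return rho, xi                     # for k = 0, also set x1 = x0 + rho * (H0 @ r)
```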
5. Numerical experiments

We consider problems arising from the centered-difference discretization of the convection-diffusion equation
$$-\Delta u + \gamma(xu_x + yu_y) + \beta u = f \tag{8}$$
on the unit square with homogeneous Dirichlet boundary conditions. For this purpose, only the restarted versions of Algorithm III and GMRES were compared. We used the red-black ordering on a 31 x 31 grid with mesh size $h = 1/32$, yielding a matrix of order $N = 961$. For each test, the right-hand side was chosen so that the vector of all ones is the exact solution, and the starting vector $x_0$ was taken to be zero. The algorithms were compared on the basis of the number of iterations necessary to achieve a relative residual $\|r_i\|/\|r_0\| < 10^{-6}$. All tests were run on an HP9000/817s in double precision, with machine epsilon of order $O(10^{-16})$. In the first test, the parameters in Eq. (8) were chosen as $\gamma = 100.0$ and $\beta = -200.0$; this choice guarantees that the cell Reynolds number is smaller than one. For this choice with k = 20, we found that both algorithms fail to converge within 1000 iterations with the Jacobi preconditioner. In our second test, we took $\gamma = 1000.0$ and $\beta = 10.0$ to make the problem highly nonsymmetric. The results are shown in Tables 2 and 3, respectively.
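For concreteness, the following sketch assembles the centered-difference matrix for Eq. (8) and the right-hand side used in the tests. We use the natural (lexicographic) ordering and dense storage for brevity, which are our own simplifications; the paper's red-black ordering only permutes rows and columns. With `n = 31`, the mesh size is $h = 1/32$ and the order is $N = 961$.

```python
# Sketch of the test matrix for Eq. (8); ordering and dense storage are
# simplifications of ours (the paper uses red-black ordering).
import numpy as np

def convection_diffusion_matrix(n=31, gam=100.0, bet=-200.0):
    h = 1.0 / (n + 1)
    N = n * n
    A = np.zeros((N, N))
    for j in range(n):                   # y index of interior grid point
        for i in range(n):               # x index of interior grid point
            row = j * n + i
            x, y = (i + 1) * h, (j + 1) * h
            A[row, row] = 4.0 / h**2 + bet
            if i > 0:
                A[row, row - 1] = -1.0 / h**2 - gam * x / (2 * h)
            if i < n - 1:
                A[row, row + 1] = -1.0 / h**2 + gam * x / (2 * h)
            if j > 0:
                A[row, row - n] = -1.0 / h**2 - gam * y / (2 * h)
            if j < n - 1:
                A[row, row + n] = -1.0 / h**2 + gam * y / (2 * h)
    return A

A = convection_diffusion_matrix()
b = A @ np.ones(A.shape[0])              # exact solution is the vector of ones
H0 = np.diag(1.0 / np.diag(A))           # Jacobi preconditioner, used as H0
```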
Table 2
$\gamma = 100.0$, $\beta = -200.0$

Method   Preconditioner   k    Iterations   CPU time (s)
GMRES    ILU              20   10           0.4
GMRES    ILU              40   10           0.4
EN       ILU              20   7/13         0.4
GMRES    Jacobi           40   73           4.6
GMRES    Jacobi           80   45           3.4
EN       Jacobi           40   37/73        4.9

Table 3
$\gamma = 1000.0$, $\beta = 10.0$

Method   Preconditioner   k    Iterations   CPU time (s)
GMRES    ILU              20   39           1.8
GMRES    ILU              40   31           2.0
EN       ILU              20   39/78        3.0
GMRES    Jacobi           20   157          6.0
GMRES    Jacobi           40   114          7.4
EN       Jacobi           20   62/121       4.1
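The sketches above can be combined into a small driver in the spirit of these experiments; it is our own illustrative harness, not the authors' test code, and it simply prints the relative residual reached (the restarted variants actually timed in Tables 2 and 3 are not implemented here).

```python
# Illustrative driver (ours): combines convection_diffusion_matrix and
# en_algorithm2 from the earlier sketches; prints the relative residual.
import numpy as np

A = convection_diffusion_matrix(n=31, gam=1000.0, bet=10.0)
N = A.shape[0]
b = A @ np.ones(N)                       # exact solution is the vector of ones
H0 = np.diag(1.0 / np.diag(A))           # Jacobi preconditioner
x = en_algorithm2(A, b, np.zeros(N), H0,
                  tol=1e-6 * np.linalg.norm(b), maxit=150)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```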
6. Conclusion

A new implementation of the EN method has been presented in this paper. Compared with the original, this new implementation always needs fewer arithmetic operations and less memory. We have also carried out numerical experiments which show that it is competitive with the GMRES method.
References

[1] T. Eirola, O. Nevanlinna, Accelerating with rank-one updates, Linear Algebra Appl. 121 (1989) 511-520.
[2] C. Vuik, H.A. van der Vorst, A comparison of some GMRES-like methods, Linear Algebra Appl. 160 (1992) 131-162.
[3] Y. Saad, M.H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (1986) 856-869.
[4] P.K.W. Vinsome, An iterative method for solving sparse sets of simultaneous linear equations, in: Proc. Fourth Symposium on Reservoir Simulation, Society of Petroleum Engineers of AIME, 1976, pp. 149-159.
[5] D.M. Young, K.C. Jea, Generalized conjugate gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl. 34 (1980) 159-194.
[6] O. Axelsson, Conjugate gradient type methods for unsymmetric and inconsistent systems of linear equations, Linear Algebra Appl. 29 (1980) 1-16.
[7] H.C. Elman, Iterative methods for large sparse nonsymmetric systems of linear equations, Ph.D. thesis, Computer Science Department, Yale University, New Haven, CT, 1982.