An orthogonal accelerated deflation technique for large symmetric eigenproblems

An orthogonal accelerated deflation technique for large symmetric eigenproblems

Computer Methods in Applied Mechanics and Engineering 94 (1992) 13-23 North-Holland An orthogonal accelerated deflation technique for large symmetric...

706KB Sizes 4 Downloads 29 Views

Computer Methods in Applied Mechanics and Engineering 94 (1992) 13-23 North-Holland

An orthogonal accelerated deflation technique for large symmetric eigenproblems Giuseppe Gambolati, Flavio Sartoretto and Paolo Florian Universitd di Padova, Dipartimento di Metodi e Modelli Matematici per ie Scienze Applicate, Via Belzoni 7, 35131 Padova, Italy

Received 7 December 1989 An improvement in accelerated conjugate gradient iterations is presented for the evaluation of several of the leftmost eigenpairs of large sparse symmetric positive definite matrices. The approach relies on an orthogonal deflation procedure and is based on the subsequent preconditioned conjugate gradient optimization of Rayleigh quotients over the restricted space orthogonal to the set of eigenvectors previously computed. Comparison with the accelerated simultaneous iterations performed over large finite element problems (with size up to 4500) shows that storage requirement is significantly less and CPU times may be reduced by a factor of two or more.

1. Introduction

The evaluation of the p leftmost eigenpairs of the generalized eigenvalue problem Ax = A B x ,

where A and B are large symmetric positive definite matrices, is an important task in many engineering and physical problems. The value of p is customarily much smaller than the matrix size N, which may be several hundreds or even many thousands large. A detailed list of references and a review of the methods proposed to solve this problem can be found in [1]. The preconditioned conjugate gradients for the optimization of the Rayleigh quotient have proven a very attractive and promising technique for large sparse eigenproblems [1-3]. The deflation procedure presented in [2] is quite effective. However it requires the a priori estimate of a shifting parameter which is related to the spectrum of A and, although not difficult to determine in practice, is to some extent problem dependent. To avoid the need for providing empirical parameters, simultaneous orthogonal minimizations are proposed in [1] by which the p eigenpairs are all found together. The simultaneous iteration scheme SRQMCG appears to be highly efficient and accurate estimates of the leftmost eigenpairs are obtained after a number of steps which is far smaller than the problem size N. A distinct advantage of S R Q M C G is that no acceleration parameter is required. However the rate of convergence of S R Q M C G to each single eigenpair is not uniform since it is related to the separation between the desired eigenvalues and hence the overall convergence speed is essentially controlled by the rate towards the slowest eigenvalue. The purpose of the present note is to develop and discuss an improvement of SRQMCG relying on an orthogonal deflation procedure wherein the eigenpairs are obtained one at a time over the restricted space orthogonal to the eigenvectors already computed. The minimization of the Rayleigh quotient is accomplished with the aid of the preconditioned conjugate gradient scheme given and discussed in previous papers [1-3]. 0045-7825/ 92/ $05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

G. Gambolati et al., Eigensolution of large matrices

14

Representative numerical examples, arising from the finite element integration of flow and structural equations, are shown to describe the behavior of the non-simultaneous scheme thereafter labeled NSRQMCG. The new procedure is applied to compute the 15 leftmost eigenpairs of arbitrarily sparse matrices whose sizes range from 150 to 4500. It is shown that NSRQMCG is faster than SRQMCG, with a speedup gain equal to two or more.

2. Computation of the leftmost eigenpairs Let A and B be sparse symmetric positive definite N x N matrices. Consider the generalized eigenvalue problem Ax =

.

(1)

Denote by AN ~< A N - i <~ " " " ~< A1

and U N,

UN-

1~ • • • ~ U 1

the eigenvalues and the corresponding eigenvectors. We recall that the eigenvectors of (1) are the stationary points of the Rayleigh quotient xtAx R(x) = xtBx '

(2)

and the gradient of R(x) is given by 2 g(x) = x'Bx ( A x - n ( x ) B x ) .

(3)

To simplify the notation we set gck) = g(xCk)) = 2rtk)/(x,Bx), rCk)being the residual vector. To compute a number of the leftmost eigenpairs of (1), the RQMCG scheme was first proposed [2]. RQMCG evaluates one eigenpair at a time by a deflation procedure requiring the assessment of a shifting parameter, which is problem dependent. Latec, the SRQMCG scheme has been developed [1], allowing for the simultaneous evaluation of the leftmost p eigenpairs of (1) (in the sequel 1 ~
rain R(y),

(4)

y .svi

where V/==-span{vN_j+l[j= 1 , . . . ,

i - 1}, x~k) converges toward vu_t+ l as k increases.

G. Gambolati et al., Eigensolution of large matrices

15

The idea is now that the eigenpairs can also be computed one at a time with the aid of the characterization (4). After VN_j+I, ]= 1 , . . . , i - - 1 have been evaluated, VN-~+~ can be determined by minimizing R(y) over the vector space which is the B-orthogonal complement to V~. The minimization is performed by the preconditioned CG scheme and thus the new 'non-simultaneous' NSRQMCG procedure is obtained. To assess the p ~ search directions and the x~ ~ approximation vectors, SRQMCG needs 2 ( p - 1 ) ( p - 2)/2 expensive B-orthogonalizations per iteration. On the other hand to evaluate the eigenpair (An_i+ ~, VN_j+I) NSRQMCG requires only j - 1 B-orthogonalizations per iteration. Hence the computational cost of NSRQMCG is lower. NSRQMCG is based on the modified (or preconditioned) conjugate gradient (MCG) method [1-3]. The preconditioning matrix

K-' = (LL')-'

(5)

is used, L being the pointwise incomplete Cholesky factor of A. The preconditioning with this matrix proved extremely effective in the computation of the leftmost eigenpairs of sparse finite element matrices [I, 2]. To achieve convergence, usually a recurring 'restart' operation is in order after NREST iterations have been completed [5]. Alternatively an appropriate choice of the fl parameter in the MCG scheme may avoid the restart. Following Polak [6],

{3(k) (g{k~_g(k-1))tg-lg(k) g(k _ l ) t g _

=

lg(k_l )

(6)

is set. As the numerical results show in the next section, this choice for/3 ckJ implements an automatic 'restart' at some point in the procedure which ensures the convergence of the scheme. Assume that the eigenpairs (AN-~+~, VN-~+~), i = 1 , . . . , ] - - 1 have been computed and _(k) (AN_j+~, vs_~+~) is to be evaluated. Define the relative residual r,.j as

r(k)

Ilrj

"' = IIA

(7) 'II

'

where r(k) __Ax(k) _ R(x~))Bx~) J J

(8)

is the residual vector at the kth iteration. For simplicity of notation, when no confusion arises, we shall drop the sub-index, thus writing r~k) and r ~k) instead of ",4-~k)and r~k), respectively. t Without any restriction, we can assume ViBVj = ij_~. We shall also write V instead of V~ The NSRQMCG scheme consists of the following steps:

Step 1. Compute the preconditioning matrix K -~ - (LLt) -~. Step 2. Give an initial vector x c°) such that VtBx ~°) = O, i.e. x ~°) is taken to be B-orthogonal to V. Choose a tolerance value TOLL and the allowed maximum number of iterations NITMAX. Set k = 0 (iteration index) and take p¢-~) as an arbitrary vector. As long as k is smaller than NITMAX and r,-{k)is greater than TOLL, execute Steps 3-8; otherwise go to Step 9. Step 3. If k = 0 then set/3 ~k)= O, otherwise evaluate fl(k) by eq. (6).

G. Gambolati et al., Eigensolution of large matrices

16

Step 4. Compute

K-,g k +

(9)

Step 5. Evaluate the vector p(k) by B-orthogonalizing/~(k~ with respect to V. Step 6. Compute the coefficient a ~k) by minimizing the Rayleigh quotient (see [1, 2] for details)

R(x(k) + a(k)p'k)). Step 7. Evaluate ~(k+l) = X~k) + otkp(k).

Step 8. The new approximation vextor x (k+~) is evaluated by B-normalizing £(k+~). Increment the iteration counter k and go back to the loop start. Step 9. If r~-¢k~is smaller than TOLL, R(x (k)) and x (~) are the smallest jth eigenvalue of (1) and the corresponding eigenvector, respectively. Note that the first eigenpair can be computed as well by this general procedure, by taking V~ as the null space. As far as the storage requirement is concerned, SRQMCG needs the storage of two complete sets of p N-size arrays at each iteration, one for the approximating vectors and the other for the searching directions. On the other hand when computing the eigenpair ( N - j + 1) NSRQMCG only requires j - 1 N-size eigenvectors, together with one N-size array for the search direction and one more for the approximating vector.

3. Numerical results

Some representative numerical results are provided in this section. They were obtained with a FORTRAN code performing double precison operations on the IBM 4341/2 computer of the University of Padua. Five sparse finite element matrices A are considered. They arise from the finite element integration of subsurface 3-D flow equations and 2-D elasticity equations for layered structures subject to vertical and horizontal land subsidence. The size of A is 156, 812, 1802, 2304 and 4560. The related irregular sparsity patterns of the first four matrices are displayed in [2]. The fifth matrix has 22 154 non-zero elements in the upper triangular part and the sparsity pattern of the [618 x 618] leading submatrix is shown in Fig. 1. B = I N is assumed. The starting set ,f~0) = {i~0)[ j = 1 , . . . , p} needed for SRQMCG was obtained by orthonormalizing the set of vectors {fjlj = 1 , . . . , p}, where for each j and i = 1 , . . . , N we have taken

fii=

{

T/,

if ]= i or i > p ,

0,

otherwise,

vl being an arbitrary number (see [1] for a justification of this choice). To start the evaluation of the jth eigenpair by NSRQMCG, the initial vector x (0) j was obtained by normalizing the - -(o) projection ot x j onto the orthogonal complementary space to Vj.

G. Gambolati et al., EigensolutiG.n of large matrices

-~.

.~°%t i

N

./.

"N



'.

17

!i

" ":." ..

"~.~,.~

~. "'



2

~

"

J

~

.



• •

a..

I

.

•s



w •



.

.

~' •

Fig. 1. Sparsity pattern of the upper leading submatrix of the matrix with size N = 4560. The first 618 rows and 618 columns are shown.

N R E S T - 2 0 is assumed in the SRQMCG scheme (cf. [1]) and T O L L = 10 -6 for both schemes• Figures 2-5 compare the performance of NSRQMCG and SRQMCG for the computation of the leftmost p = 15 eigenpairs of the sparse matrices with size 156, 812, 1802 and 2304, respectively. The average relative residual of the SRQMCG scheme r a(k) =

EJPl p(',d ~

(10)

is shown, together with the value of a residual, r,. s-tk) of NSRQMCG. The latter is simply the prescribed tolerance T = 10 -i and T = 0.5 x 10-', i = 1 , . . . , 6 versus the number of iterations averaged over the p eigenvalues. Figures 2-5 emphasize the improved performance of NSRQMCG. The NSRQMCG iterations needed to compute the leftmost 15 eigenpairs are fewer than the SRQMCG iterations.

G. Gambolati et al., Eigensolution of large matrices

18 10~(

-'

]

'

I

'

I

I

'

;

'

i

'

]

]-

'

'

j

'

I 90

,

J

'

I

'

;

'

1 -1

\

\

"\,

"3

"5

I07~

'k

'

I io

,

t 2o

~

i 30

,

I 40

,

t 50

,_--d__L 60

I 70

,

l 80

,

I. : ~oo

I ~o

J__l a ~2o ]~o

Iterations Fig. 2. Average convergence profiles for NSRQMCG (solid line) and SRQMCG (dashed line) for the computation of the 15 leftmost eigenpairs of the matrix with N = 156.

iterations. This is so because the rate of convergence toward each single eigenvalue is not uniform (see [1]) and the overall performance of SRQMCG is controlled by the slowest computation. No restart procedure is necessary to achieve the convergence for NSRQMCG (cf. [1]). Table 1 compares the CPU times required by the two schemes. The number of SRQMCG iterations and the average number of NSRQMCG iterations performed to satisfy the exit criterion with TOLL = 10 -6 are also given. The last two columns provide the times and 101



'~\.~

$ " O iii

\ :\ \,

"3

\

%,.

\,

\.\

\

"5

\

10

0

1o

20

30

~0

50

eo

70

ao

90

Iterations

Fig. 3. The same as Fig. 1 for N = 812.

100

110

120

1~0

G. Gambolati et al., Eigensolution of large matrices 10 z

'

I

'

I

-'

I

'

I

'

I

,

I

,



,

1--

r

l

'

I

19

'

I

'

I

'

r~'~s-a,, ,c,~ ~ ......... ,

t

! I

\ ".\ l 10

(~

1

7

'

,

l 20

,

I 30

,

1 40

,

[ 50

I

I 60

I

l 70

1 , 1Q13

, _~_._L_L__~_ aO

go

t 110

,

L 120

I "~0

Iterations Fig. 4. The same as Fig. 1 for N = 1802.

(k)

iterations needed by S R O M C G to decrease under T O L L each single relative residual r,.j, j = 1 , . . . , p. Note that N S R Q C M G is approximately twice as fast as S R Q M C G . The gain in speed is slightly more if the comparison is made between the second and the sixth column, i.e. when the prescribed accuracy is achieved for all the eigenvalues in both schemes. It is also worth noting that the average number of N S R Q M C G iterations do not increase for N = 4560. This might suggest that the scheme is particularly robust for very large matrices.

10 z

ri, J,,i;;

107~

'

I

'

I

'

[

'

I

'

I

'

[

'

I

'

]

I

,

I 40

,

I 5D

,

l 60

,

I 70

,

I., aO

'

I

'

I BO

,

I

'

I

'

]

'

[ 110

,

I 120

,

~-...................

'

I

10 .!

,

1

20

,

3C"

Iterations Fig. 5. The same as Fig. 1 for N - 2304.

I , 1013

110

G. Gambolati et al., Eigensolution of large matrices

20

Table 1 CPU time, number of SRQMCG iterations and average number of NSRQMCG iterations needed to compute the leftmost p = 15 eigenpairs of five sparse finite element matrices NSRQMCG

SRQMCGa

SRQMCGb

Matrix size

time(s)

iterations

time(s)

iterations

time(s)

iterations

156 812 1802 2304 4560

144 1488 5587 7593 12929

35 69 75 112 90

410 3234 10624 16131

58 88 95 127

436 3379 11066 17633

60 92 99 139

--~

--~

--~

--~

Stopping criterion: ra-ok)< TOLL. b Stopping criterion: rr. -("> j < TOLL for ] = 1, • . . , p. c Not evaluated: storage requirements too high. F i g u r e s 6 ( a ) - ( d ) s h o w the c o n v e r g e n c e of the single relative residuals rr,s, -(k) ] = 1, 5, 10, 15, for the matrix with N = 156. T h e s e figures provide a b e t t e r insight into the c o n v e r g e n c e b e h a v i o u r of the t w o schemes t o w a r d e a c h eigenvalue. N o t e t h a t the rate of c o n v e r g e n c e of S R Q M C G for the e v a l u a t i o n of the ( N - j + 1)th e i g e n p a i r usually decreases w i t h j, while this is n o t necessarily so for the N S R Q M C G s c h e m e , since its actual p e r f o r m a n c e is r e l a t e d to the s e p a r a t i o n b e t w e e n the ( N - ] + 1)th a n d the next h i g h e r e i g e n v a l u e . This a s p e c t is clearly t

tO ~ . . . .

,

j

1

J

e

'

I

'

10"

(k) r

r (k) ' "~''~.~

r, 1

,o'

'

"'":

,'o

"

;o

'

r, b

Io

1o

~'o

o

. "....".......... ~"~............"\ ....

,o

w'o

,o

,o

(a)

(k) rr,lO

'

I

~'"'"'""-"-

!

t

~o

,o

(b} I

i

]lO t

(k) rr 15

..

•,

.o

i|era~innj

ilera~ion|

IO t

.=

..... ,,

I

1

~

~

!

-'v-----'~

~---'-

.

j-lo

",.,

!

,

j=15

"'""., \,,

• ,oo

.'o

,o

~o

'1

i

.o

~

"5i ,o



"........

"'\""

'

;o

'

;

'

'~

Iler~litme

lSerld~one

(c)

(d)

'

Io

. . . . ;o

'

.o

Fig. 6. Convergence profiles of NSRQMCG (solid line) and SRQMCG (dashed line) in the computation of 4 eigenpairs of the matrix with size N = 156,

G. Gambolati et al., Eigensolution of large matrices

21

Table 2 Number of NSRQMCG iterations needed to compute each of the leftmost p = 15 eigenpairs of five sparse finite element matrices (TOLL = 10 -6) Eigenpair no. Matrix size N N-1 156 812 1802 2304 4560

27 45 77 91 74

25 55 53 93 66

N-2

N-3

N-4

N-5

N-6

N-7

N-8

N-9

N-10

N-11

N-12

N-13

N-14

31 46 91 89 57

28 62 64 114 64

28 50 83 128 50

32 61 62 117 90

34 67 64 160 98

33 50 57 95 109

44 142 65 142 66

31 72 70 131 92

41 100 82 103 148

32 83 55 92 163

56 67 148 148 86

39 64 73 80 98

51 64 81 99 95

e m p h a s i z e d in Tables 2 - 4 which give the n u m b e r of iterations, the C P U time per iteration and the overall C P U time r e q u i r e d by N S R Q M C G in the computation of each of the 15 leftmost eigenpairs. It is worth observing in Table 2 that the n u m b e r of iterations does not increase m o n o t o n i c a l l y with the level of the eigenvalue to be evaluated. Table 3 shows that the C P U time p e r iteration grows very slowly as we m o v e upward in the spectral interval. T a b l e 4 gives the overall C P U required to c o m p u t e each eigenpair. Again it may be noted that the cost does not increase monotonically with the level of the eigenpair, although the cost per iteration increases (see Table 3) due to the growing n u m b e r of orthogonalizations. Table 4 suggests that N S R Q M C G might be used for the efficient c o m p u t a t i o n of m o r e than 15 eigenpairs.

T~ble 3 NSRQMCG CPU time(s) per iteration needed to compute each of the leftmost p element matrices (TOLL = 10-6)

=

15 eigenpairs of five sparse finite

Eigenpair no. Matrix size

N

156 812 1802 2304 4560

0.2 1.1 4.3 3.7 7.9

N-I

N-2

N-3

N-4

N-5

N-6

N-7

N-8

N-9

N-10

N-11

N-12

N-13

N-14

0.2 1.3 4.5 4.0 8.4

0.3 1.3 4.6 4.1 8.6

0.2 1.3 4.7 4.2 8.8

0.3 1.4 4.7 4.3 9.0

0.3 1.4 4.8 4.3 9.1

0.2 1.4 4.9 4.4 9.3

0.3 1.5 5.0 4.5 9.4

0.3 1.5 5.0 4.6 9.6

0.3 1.5 5.1 4.7 9.7

0.3 1.5 5.2 4.8 9.9

0.3 1.5 5.2 4.9 10.1

0.3 1.6 5.3 4.9 10.2

0.3 1.6 5.4 5.0 10.4

0.3 1.6 5.4 5.1 10.6

Table 4 NSRQMCG overall CPU time(s) needed to compute each of the leftmost p element matrices (TOLL = 10-6)

=

15 eigenpairs of five sparse finite

Eigenpair no. Matrix size

N

N-1

N-2

N-3

N-4

N-5

N-6

156 812 1802 2304 4560

5 51 331 338 582

6 69 241 372 557

8 60 419 364 491

6 81 300 475 561

7 68 394 546 448

9 84 299 508 820

8 94 312 709 912

N-7 9 73 283 429 1024

N-8

N-9

12 207 327 654 631

9 107 356 615 895

N-10 N-11 12 152 423 492 1463

9 128 287 448 1639

N-12 N-13 16 106 783 732 881

12 103 392 403 1020

N-14 16 105 440 508 1005

G. Gambolati et al., Eigensolution of large matrices

22

|01

~ --I

r--l-F--v---Y----r-~[

~"

~ 1

,,

-'T-

r

'.:',i v"

-~

v ~

½

"

'm,

:

f~

L~ _L__.-[~_ J.~ _J.~-_k

20

30

40

_L

so

~ t_

L

so

i \

'/""R~..,

j=5 1

]

i

':~.-.

",,

-5f ~0

r--r---[--~--

,u',A\_,

'%, -

L-~

[.... l

r



,

1~O---Tr~

r--Y

t

5=1 L__L

7o

L

ao

L

!

~'

1o lSi , ~ ~00 l 90

Iterations Fig. 7. C o n v e r g e n c e p r o f i l e s o f N S R Q M C G

in t h e c o m p u t a t i o n o f 4 e i g e n p a i r s o f t h e m a t r i x w i t h N = 4560.

. (k)

Finally, Fig. 7 shows the NSRQMCG relative residuals rr. j , j = 1, 5, 10, 15, for the matrix with N = 4560. Again Fig. 7 indicates that the convergence profiles of NSRQMCG do not necessarily get worse when j increases. It is worth noting that the total number of orthogonalizations is representative of the computational burden of both NSRQMCG and SRQMCG. It was already observed in the previous section that SRQMCG needs 2(p - 1)(p - 2 ) / 2 orthogonalizations per iteration, while for the computation of the ( N - j + 1)th eigenpair NSRQMCG needs only j - 1 orthogonalizations. As an example, consider the matrix with size N = 156. The total number of orthogonalizations performed by NSRQMCG turns out to be 4585, while SRQMCG requires 60 x 14 x 13 = 10920 to evaluate the same eigenpairs with the same accuracy. The ratio 10920/4585- 2.4 is not too far from the ratio 436/144= 3.0 between the corresponding CPU times (see Table 1).

4. Conclusions

The accelerated orthogonal deflation technique NSROMCG presented in this paper is twice or more as fast as the simultaneous minimization method. In the simultaneous computation the overall convergence rate is controlled by the process toward the slowest eigenpair, while individual convergence of NSRQMCG is essentially determined by the separation from the next higher eigenvalue. NSROMCG takes advantage of the non-uniform convergence toward the diffe~'ent eigenvectors and of the smaller number of orthogonalizations performed, requiring final CPU times which are reduced by a factor of 2 or more. The storage requirement is also less and the programming is simpler. NSRQMCG appears to be well suited to handle large matrices. The numerical experiments performed with finite element problems show that the scheme is quite robust and the number of iterations required in the evaluation of each eigenpair does not increase significantly when N becomes large. The increase in the CPU time per iteration has been shown to be very slow, while the total number

G. Gambolati et al., Eigensolution of large matrices

23

of iterations does not grow monotonically as we move upward in the spectrum. In the present paper the 15 leftmost eigenpairs of large matrices have been computed, but the scheme can be used quite efficiently for the assessment of a larger number of eigenvalues and eigenvectors.

References [1] F. Sartoretto, G. Gambolati and G. Pini, Accelerated simultaneous iterations for large finite element eigenproblems, J. Comput. Phys. 81 (1989) 53. [2] G. Gambolati, G. Pini and F. Sartoretto, An improved iterative optimization technique for the leftmost eigenpairs of large symmetric matrices, J. Comput. Phys. 74 (1988) 41. [3] A.M. Perdon and G. Gambolati, Extreme eigenvalues of large sparse matrices by Rayleigh quotient and modified conjugate gradients, Comput. Methods Appi. Mech. Engrg. 56 (1986) 251-264. [4] B.N. Parlett, The Symmetric Eigenvalue Problem (Prentice-Hall, Englewood Cliffs, NJ, 1980). [5] A. Ruhe, Iterative eigenvalue algorithms for large symmetric matrices, in: L. Collatz and K.P. Hadeler, eds., Numerische Behandlung von Eigenwertaufgaben, Internat. Ser. Numer. Math. 24 (Birkh~iuser, Basel, 1974) 97. [6] E. Polak, Computational Methods in Optimization: A Unified Approach (Academic Press, New York, 1971).