An optimized computer implementation of incomplete Cholesky factorization

Computing Systems in Engineering, Vol. 5, No. 3, pp. 265-274, 1994. Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0956-0521/94 $7.00 + 0.00. 0956-0521(94)E0001-2
AN OPTIMIZED COMPUTER IMPLEMENTATION OF INCOMPLETE CHOLESKY FACTORIZATION

N. BITOULAS and M. PAPADRAKAKIS

Institute of Structural Analysis and Aseismic Research, National Technical University of Athens, Athens 15773, Greece

(Received 10 March 1993; accepted in revised form 2 February 1994)

Abstract--Preconditioning techniques based on incomplete Cholesky factorization are very efficient in increasing the convergence rates of basic iterative methods. Complicated addressings and high demands for auxiliary storage, or increased factorization time, have reduced their appeal as general purpose preconditioners. In this study an elegant computational implementation is presented which succeeds in reducing both computing storage and factorization time. The proposed implementation is applied to two incomplete factorization schemes. The first is based on the rejection of certain terms according to their magnitude, while the second is based on a rejection criterion relative to the position of the zero terms of the coefficient matrix. Numerical results demonstrate the superiority of the proposed preconditioners over other types of preconditioning matrices, particularly for ill-conditioned problems. They also show their efficiency for large-scale problems, in terms of computer storage and CPU time, over a direct solution method using the skyline storage scheme.

INTRODUCTION

An important factor in the success of iterative methods in the solution of large-scale finite element equations is the preconditioning technique used to improve the ellipticity of the coefficient matrix. This typically consists of replacing the original system Ax = b by the equivalent system

B^{-1}Ax = B^{-1}b    (1)

where A is an N x N symmetric and positive definite matrix and the transformation or preconditioning matrix B is an approximation to A and is nonsingular. A number of strategies can be used to increase the efficiency of linear iterative solvers, based on the selection and implementation of the preconditioning matrix in conjunction with different ways of handling the matrix equations. The resulting hybrid direct/iterative methods are more effective than the exclusive use of either a direct or an iterative approach. In order to be effective, a preconditioner has to combine the following characteristics: (i) to improve the ellipticity of the coefficient matrix as much as possible; (ii) to be easily computed and handled by the computer; (iii) to be sparse, in order to reduce storage demands and to keep matrix-vector operations at low cost; and (iv) to be amenable to vectorization and/or parallelization. Since some of these requirements are conflicting, a dynamic balance has to be kept between the characteristics according to the type of problem and the available computing facilities.
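The transformation of Eq. (1) only ever enters an iterative solver through products with A and solves with B. As a minimal illustration of this, the following Python sketch implements a preconditioned conjugate gradient iteration in which the preconditioner appears solely as a solve routine; the diagonal (Jacobi) stand-in for B, the function names, and the small test matrix are illustrative and are not taken from the paper.

```python
# Sketch of the preconditioned conjugate gradient (PCG) iteration for
# B^{-1} A x = B^{-1} b.  The preconditioner enters only through a
# solve with B, passed in as the callable solve_B.  All names here are
# illustrative, not the paper's.

def pcg(A, b, solve_B, tol=1e-10, max_iter=200):
    n = len(b)
    x = [0.0] * n
    r = b[:]                       # r = b - A x, with x = 0 initially
    z = solve_B(r)                 # z = B^{-1} r
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * Api for pi, Api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = solve_B(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x

# Small SPD example with a Jacobi (diagonal) stand-in for B.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
jacobi = lambda r: [r[i] / A[i][i] for i in range(len(r))]
x = pcg(A, b, jacobi)
```

Replacing the Jacobi solve by a forward/backward substitution with an incomplete factor, i.e. solving LDL^T z = r, yields the incomplete-Cholesky-preconditioned variants compared later in the paper.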

Preconditioning and matrix handling strategies may be classified into three main categories: global, element-by-element, and block and domain decomposition.1 The most widely used global preconditioners are derived either from the incomplete Cholesky factorization of the coefficient matrix or from the SSOR characteristic matrix. Jennings and co-workers2-4 developed a robust and efficient incomplete factorization-based preconditioner. This incomplete factorization, however, requires complicated addressings and high demands for auxiliary storage during the formation of the preconditioning matrix. Papadrakakis and Dracopoulos5 presented an improved version which, at the expense of increasing the factorization time, alleviated the disadvantages of Jennings' incomplete factorization approach. In this work an elegant computational implementation for incomplete factorization is presented. It succeeds in reducing both computing storage and factorization time. A compact storage scheme is used to store both the coefficient and the preconditioning matrices row by row. Non-zero terms are stored in a real vector, while the corresponding column numbers and the start of each row inside the compact scheme are stored in integer vectors. The proposed implementation is applied to two incomplete factorization schemes. The first is based on the rejection of certain terms according to their magnitude, while the second is based on a rejection criterion according to the position of the zero terms of the coefficient matrix. The resulting preconditioning technique is compared with a number of global preconditioners and its efficiency is demonstrated in two characteristic test problems. The numerical results presented also


confirm the superiority of the preconditioned conjugate gradient method over direct equation solvers in large-scale finite element applications.

INCOMPLETE CHOLESKY FACTORIZATION

The reason for performing an incomplete factorization is to obtain a reasonably accurate factorization of A without generating too many fill-ins. Such an approach leads to the factorization LDL^T = A - E, where E is an error matrix which does not have to be formed. For this class of methods, E is defined either by the prescribed positions of the elements to be rejected,6-10 or by the computed positions of "small" elements in L which do not satisfy a specified magnitude criterion and are therefore discarded.2-4,6,10,11 The rejection of certain elements inside the preconditioning matrix often leads to an unstable factorization process, with small diagonal elements in B, or to an indefinite B. To avoid this phenomenon, several techniques have been proposed, either modifying A before factorization to make it more diagonally dominant,7,9 or correcting the diagonal elements during the factorization.2-4,6,9,11,12 Numerical tests performed in the past,5 as well as in this study, have shown that the modification to the diagonal elements proposed by Jennings and co-workers2-4 gives a more robust and efficient incomplete factorization preconditioner. The incomplete factorization procedure can be described using a row-by-row approach and referring to L^T rather than L, as follows:

for each row i = 1, 2, ..., N
    for each column j = i+1, ..., N
        a*_ij = a_ij - sum_{k=1}^{i-1} l_ki d_k l_kj
        a~_ii = a_ii + sum_{k=1}^{i-1} e_ik + sum_{k=i+1}^{j} e_ik - sum_{k=1}^{i-1} l_ki d_k l_ki
        a~_jj = a_jj + sum_{k=1}^{i-1} e_jk
        check for rejection of a*_ij
        if yes:
            s = (a~_ii / a~_jj)^{1/2}
            e_ij = s |a*_ij|
            e_ji = s^{-1} |a*_ij|
            a*_ij = 0
    next j
    d_i = a~_ii, l_ii = 1
    for each column j = i+1, ..., N
        l_ij = a*_ij / d_i
    next j
next i.    (2)
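As a concrete, dense-matrix rendering of scheme (2), the sketch below performs an LDL^T factorization in which an off-diagonal term is rejected when its square falls below ψ times the product of the current diagonals, and each rejection is compensated on both diagonals in the Jennings manner so that the error matrix E stays positive semi-definite. This is a simplified illustrative reading of the algorithm, not the paper's sparse implementation; all names are hypothetical.

```python
# Dense sketch of incomplete LDL^T factorization by magnitude with a
# Jennings-style diagonal correction on rejection (after scheme (2)).
# Rejecting a term a*_ij adds s|a*_ij| to the i-diagonal and
# |a*_ij|/s to the j-diagonal, with s = sqrt(a_ii/a_jj), so the 2x2
# error blocks are positive semi-definite.  Illustrative only.

def ildl_by_magnitude(A, psi):
    n = len(A)
    a = [row[:] for row in A]          # work on a copy of A
    L = [[0.0] * n for _ in range(n)]  # rows of L^T (unit diagonal)
    d = [0.0] * n
    for i in range(n):
        # eliminate previous rows from row i of the upper triangle
        for j in range(i + 1, n):
            for k in range(i):
                a[i][j] -= L[k][i] * d[k] * L[k][j]
        for k in range(i):
            a[i][i] -= L[k][i] * d[k] * L[k][i]
        # magnitude test with diagonal compensation on rejection
        for j in range(i + 1, n):
            a_star = a[i][j]
            if a_star != 0.0 and a_star * a_star < psi * a[i][i] * a[j][j]:
                s = (a[i][i] / a[j][j]) ** 0.5
                a[i][i] += abs(a_star) * s
                a[j][j] += abs(a_star) / s
                a[i][j] = 0.0
        d[i] = a[i][i]
        L[i][i] = 1.0
        for j in range(i + 1, n):
            L[i][j] = a[i][j] / d[i]
    return L, d

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
L, d = ildl_by_magnitude(A, 0.0)   # psi = 0: complete factorization
```

With ψ = 0 nothing is rejected and the complete LDL^T factors are recovered; as ψ grows toward 1, more terms are dropped and the factor approaches a diagonal scaling, while the compensated diagonals remain positive.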

Correcting the diagonal elements of the coefficient matrix in the way described results in a positive semi-definite error matrix, so that the lowest eigenvalue of LDL^T cannot be less than that of A and stability is assured. In the incomplete factorization by "position", a_ij is not computed if it does not fit into the sparsity pattern adopted, which is usually forced to coincide with that of A. In the incomplete factorization by "magnitude" the criterion to decide whether a*_ij is large or small is defined by (a*_ij)^2 < ψ a~_ii a~_jj, where ψ is a user-controlled parameter. The choice ψ = 0 corresponds to the complete factorization, while the case ψ = 1 corresponds to a form of diagonal scaling. The incomplete factorization by position is more suitable where the equations are of regular form, such as for a set of finite difference equations specified over a regular grid, or when they are derived by an adaptive p-version FE procedure.13 However, when the sparsity pattern of the equations is irregular, or some individual elements give much stronger coupling between the variables than others, then incomplete factorization by magnitude performs better. Furthermore, in highly ill-conditioned problems, and in those cases where some additional storage is available, incomplete factorization by magnitude is preferable.

COMPUTER IMPLEMENTATION

A compact storage scheme is used to store both the coefficient and the preconditioning matrices row-by-row. Non-zero terms are stored in a real vector, while the corresponding column numbers are stored in an integer vector of equal length. An additional integer vector with length equal to the number of equations is used to record the start of each row inside the compact scheme. Thus, the total storage requirements for an incomplete procedure by magnitude are NA + NL real stores (the sizes of A and L respectively) and NA + NL + 2*(N+1) integer stores (for the addressing). For the incomplete procedure by position, where the preconditioning and coefficient matrices share the same auxiliary integer vectors, the storage is 2*NA real and NA + N + 1 integer positions. The extra storage required by the conjugate gradient method is 4N real positions, assuming that the solution vector overwrites the right-hand side vector. The computational methodology described in Ref. 4 needs, for the formation of L^T, NL + 2*MAXK integer stores, which are credited as additional storage indispensable during the factorization phase. MAXK is the maximum number of off-diagonal elements in any column of L^T travelling upwards. This additional storage is one of the most serious defects of the original incomplete factorization by magnitude, since in order to gain maximum advantage of the available storage, 4*N real stores should overlap NL + 2*MAXK integer stores. This situation is not likely to occur as the size of the problem increases, even when integer variables of reduced wordlength are available in the computer system.

The standard implementation of the incomplete Cholesky factorization involves column-by-column multiplications in order to obtain each factorized element l(i,j). However, this is not computationally efficient in this case, due to the fact that the coefficient matrix and the incomplete factorization matrix are stored in a row-by-row compact form. The simple observation that all factorized elements of row i can be processed simultaneously leads to a row-wise implementation, which is described by the following pseudocode:

for each i = 1, 2, ..., N
    l(i, i) = a(i, i)
next i
for each row i = 1, 2, ..., N
    li(i) = l(i, i)
    for each column j = i+1, ..., N
        li(j) = a(i, j)
    next j
    for each row k = 1, 2, ..., i-1
        for each column j = i, ..., N
            li(j) = li(j) - l(k, i) l(k, k) l(k, j)
        next j
    next k
    for each column j = i+1, ..., N
        check for rejection
        if yes:
            s = (l(i, i)/l(j, j))^{1/2}
            l(i, i) = l(i, i) + |li(j)| s
            l(j, j) = l(j, j) + |li(j)| / s
            li(j) = 0
    next j
    for each column j = i+1, ..., N
        l(i, j) = li(j)/l(i, i)
    next j
next i.    (3)

Fig. 1. Schematic representation of the optimized incomplete factorization. (Legend: x, non-zero elements; the pointer isky(k) identifies the non-zero elements of column i; squares mark the last non-zero element processed for row k; filled circles mark elements of row i affected by the factorization and open circles those not affected; the matrix is split into a factorized and an unfactorized part.)

This implementation is suitable for the storage scheme employed because non-zero elements are processed successively, without the need for searching in order to perform the corresponding multiplications. The proposed algorithm is further improved with the use of the integer vector isky, which traces the skyline pattern of the incomplete factorization matrix and reduces the searching time for the first non-zero elements of each column. Furthermore, pointers are used to identify the zero elements of column i and avoid all multiplications in the corresponding row. A schematic representation of the proposed algorithm is depicted in Fig. 1, while a Fortran listing is given in the Appendix. A "C" version is also available. An integer vector (INTEGER*4) isky and a real vector li, both of length N, are the only additional vectors required during the factorization. These vectors use the storage allocated for the conjugate gradient vectors.
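The compact row-by-row storage described above is, in modern terms, a compressed sparse row (CSR) layout: a value vector, a column-index vector of equal length, and a row-start pointer vector of length N+1. The short sketch below (with illustrative names, not the paper's code) builds the three vectors and shows why a row-wise sweep needs no searching: each row's non-zeros are contiguous.

```python
# Minimal sketch of the compact row-by-row (CSR-like) storage used for
# A and L: non-zeros in a real vector, their column numbers in an
# integer vector of equal length, and a row-start pointer vector.
# Names are illustrative, not the paper's.

def to_compact(dense):
    vals, cols, starts = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                vals.append(v)
                cols.append(j)
        starts.append(len(vals))
    return vals, cols, starts

def matvec(vals, cols, starts, x):
    # row-wise product: each row's non-zeros are contiguous, so no
    # searching is needed to form (A x)_i
    y = []
    for i in range(len(starts) - 1):
        y.append(sum(vals[k] * x[cols[k]]
                     for k in range(starts[i], starts[i + 1])))
    return y

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
vals, cols, starts = to_compact(A)
y = matvec(vals, cols, starts, [1.0, 1.0, 1.0])   # row sums of A
```

The same layout serves both the matrix-vector products of the conjugate gradient iteration and the forward/backward substitutions with the incomplete factor L.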

NUMERICAL EXAMPLES

The performance of the proposed incomplete Cholesky factorization is presented and compared with different global-type preconditioners in two test examples. The first is a clamped square plate, discretized in its quarter with a mesh of 50 x 50 finite elements, having 7400 d.o.f. with half-bandwidth 159. The second example is the space frame dome of Fig. 2, with 3126 d.o.f. and half-bandwidth 318. The tests were performed on a SUN SPARCstation. Figures 3 and 4 show the computer storage and factorization time required for the formation of the proposed preconditioner, compared to the Incomplete Cholesky (IC) preconditioner as given in the Fortran listing of Ref. 4. OIC(ψ) corresponds to the proposed Optimized Incomplete Cholesky preconditioner by magnitude with different values of the rejection parameter ψ. Figure 5 depicts the total computer storage for handling a direct method of solution, with the


Fig. 2. Space frame dome (3126 d.o.f., half bandwidth = 318).

Fig. 3. Computer storage for the formation of OIC and IC preconditioners for different values of the rejection parameter ψ. ((a) Plate example; (b) space frame example; ψ ranging from 1E-9 to 1E-3.)

Fig. 4. Factorization time of OIC and IC for different values of the rejection parameter ψ. ((a) Plate example; (b) space frame example.)

Fig. 5. Total storage requirements (KBytes) of the skyline direct method and PCG with different preconditioners. ((a) Plate example; (b) space frame example; entries split into the A and CG vectors and the preconditioner.)

Fig. 6. Iteration performance of different preconditioners. ((a) Plate example; (b) space frame example.)

Fig. 7. Total CPU performance (sec) of the skyline direct method and PCG with different preconditioners, split into preconditioner formation time and CG iteration time. ((a) Plate example; (b) space frame example.)


Fig. 8. Stability behaviour of the incomplete factorization without corrections, for different values of the rejection parameter ψ. (Runs marked with an asterisk ended in pivot failure.)

skyline storage scheme of Ref. 14, and the preconditioned conjugate gradient method with different preconditioners. Numbers in parentheses correspond to the rejection parameter ψ. OICP stands for the Optimized Incomplete Cholesky by Position, in which the prescribed positions of the elements to be retained coincide with the sparsity pattern of the coefficient matrix. PPR(ψ) is a modified SSOR preconditioner in which a rejection by magnitude criterion is enforced.15,16 The elements to be retained in

Fig. 9. Performance of different incomplete factorization schemes, in total CPU time (sec). ((a) Plate example; (b) space frame example; runs marked with an asterisk ended in pivot failure.)

PPR(ψ) are controlled by a user-specified parameter ψ. DCG corresponds to the diagonally scaled conjugate gradient. In estimating the computer storage it is assumed that integers are stored as INTEGER*2 or INTEGER*4, according to their maximum values, and the floating point variables as REAL*8, using double precision arithmetic. The number of iterations and the total CPU time for the two examples are depicted in Figs 6 and 7, respectively. Iterations were continued until the error tolerance criterion ||r_i||/||b|| < 10^{-6} was satisfied, where r_i = b - Ax_i is the residual vector. These figures clearly demonstrate a superiority, in both computing time and storage, of PCG methods over direct methods. An overwhelming advantage of PCG is observed in the well-conditioned 3D example, where the diagonally scaled CG was found to be much faster than the direct method. The results presented give the opportunity to assess the efficiency of preconditioners in problems with different degrees of ill-conditioning. In the ill-conditioned plate example, the performance of OIC(ψ) is much better than that of OICP, SSOR, PPR and DCG, at the expense of storing more elements in B. Figure 8 presents some tests on the stability behaviour of the optimized Cholesky factorization without diagonal correction either before or during the factorization process. Further insight into the performance of the optimized incomplete Cholesky factorization may be gleaned from Fig. 9. Different techniques have been applied to preserve the stability of the factorization process. OIC(ψ) and OICP implement the diagonal correction proposed by Jennings and co-workers2-4 and included in algorithm (2). The symbol "DM(α)" corresponds to a Diagonal Modification before factorization, by adding αI to the diagonal elements of the original matrix A. The optimum scalar parameter α for each case is shown in parentheses.
The symbol "NDC" means that No Diagonal Correction is applied, either before or during the factorization. The results indicate the superiority of Jennings' correction, which becomes more pronounced for ill-conditioned problems.

CONCLUSIONS

The numerical results presented in this study confirm the superiority of iterative methods over direct equation solvers in large-scale finite element applications. Preconditioned conjugate gradient-like methods offer considerable flexibility in the selection of proper preconditioning techniques according to the type of problem and the available computing resources. For well-conditioned problems, preconditioners based on the SSOR characteristic method substantially improve the convergence rate of the methods with a minimum of additional storage for handling the preconditioning matrix. The incomplete Cholesky factorization based on rejection by position is a stronger preconditioner than the SSOR-based preconditioners, and has the advantage that the additional storage can be prescribed beforehand. The case in which the same sparsity pattern of the coefficient matrix is enforced in the preconditioning matrix is particularly advantageous, since no extra addressings are necessary for storing L. For ill-conditioned problems, the incomplete Cholesky factorization by magnitude is the only efficient global preconditioner which can compete with direct solvers both in terms of computing time and of reduced storage requirements. The proposed computer implementation avoids complicated addressings during the formation of the preconditioner and keeps the required storage to a minimum, while it succeeds in reducing the factorization time as well. The optimum values, in terms of computing time, for the rejection parameter ψ are problem-dependent. In a number of tests performed they were quite often found to be in the range 10^{-6}-10^{-5}. If the available computer storage is occupied before the end of the incomplete factorization, then it is better to restart the factorization with a new ψ value, rather than changing ψ to 1.0 during the factorization. Tests performed with the latter scheme showed a significant deterioration of the preconditioner's efficiency. The additional factorization time of a restarted procedure is eventually compensated by the improved convergence rate of the iterative method. Finally, diagonal corrections during the incomplete factorization produce stable and efficient preconditioning matrices, compared with the unmodified factorization or with the diagonal modification before the start of the factorization. The first approach is particularly advantageous for ill-conditioned problems.

Acknowledgement--The authors are grateful to Prof. A. Mamalis for providing the computing facilities of the Laboratory of Manufacturing Technology, National Technical University of Athens, and to its staff for their assistance.

REFERENCES

1. M. Papadrakakis, "Solving large-scale linear problems in solid and structural mechanics", in M. Papadrakakis (ed.), Solving Large-scale Problems in Mechanics--The Development and Application of Computational Solution Methods, John Wiley & Sons, 1993.
2. A. D. Tuff and A. Jennings, "An iterative method for large systems of linear structural equations", Int. J. Num. Meth. Engng 7, 175-183 (1973).
3. A. Jennings and G. M. Malik, "The solution of sparse linear equations by the conjugate gradient method", Int. J. Num. Meth. Engng 12, 141-158 (1978).
4. M. A. Ajiz and A. Jennings, "A robust incomplete Choleski conjugate gradient algorithm", Int. J. Num. Meth. Engng 20, 949-966 (1984).
5. M. Papadrakakis and M. C. Dracopoulos, "Improving the efficiency of incomplete Choleski preconditionings", Comm. Appl. Num. Methods 7, 603-612 (1991).
6. D. S. Kershaw, "The incomplete Cholesky-conjugate gradient method for the iterative solution of systems of linear equations", J. Comput. Phys. 26, 43-65 (1978).
7. T. A. Manteuffel, "An incomplete factorization technique for positive definite linear systems", Math. Comput. 34, 473-497 (1980).
8. J. A. Meijerink and H. A. van der Vorst, "An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix", Math. Comput. 31, 148-162 (1977).
9. I. Gustafsson, "A class of first order factorization methods", BIT 18, 142-156 (1978).
10. N. Munksgaard, "Solving sparse symmetric sets of linear equations by preconditioned conjugate gradients", ACM Trans. Math. Software 6, 206-219 (1980).
11. O. Axelsson and N. Munksgaard, "Analysis of incomplete factorizations with fixed storage allocation", in D. J. Evans (ed.), Preconditioning Methods, Analysis and Applications, Gordon and Breach, New York, 1983, pp. 219-241.
12. O. Axelsson and G. Lindskog, "On the eigenvalue distribution of a class of preconditioning methods", Num. Math. 48, 479-498 (1986).
13. M. Papadrakakis and G. Babilis, "Solution techniques for the p-version of the adaptive finite element method", Int. J. Num. Meth. Engng (to appear).
14. K. J. Bathe and E. L. Wilson, Numerical Methods in Finite Element Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1976.
15. M. Papadrakakis, "Accelerating vector iteration methods", J. Appl. Mech., ASME 53, 291-297 (1986).
16. M. Papadrakakis and M. C. Dracopoulos, "A global preconditioner for the element-by-element solution methods", Comp. Meth. Appl. Mech. Engng 88, 275-286 (1991).

APPENDIX. LISTING OF FORTRAN SUBROUTINE FOR THE PROPOSED INCOMPLETE CHOLESKY FACTORIZATION

c**********************************************************************
c*
c*  the following subroutine computes the incomplete cholesky
c*  factor of a positive definite matrix, using jennings'
c*  diagonal modification.
c*
c*  the main variables used are:
c*    n      = number of equations
c*    na     = number of nonzero elements of a
c*    a      = matrix to be factorized
c*    iac    = column positions of nonzero elements of a
c*    iad    = positions of diagonal elements of a
c*    nw     = maximum number of elements of l allowed
c*    nlte   = counter of elements in l
c*    rej    = counter of rejected elements
c*    l      = incomplete factor of a
c*    ilc    = column positions of nonzero elements of l
c*    ild    = positions of diagonal elements of l
c*    psi    = rejection parameter
c*    istart = - unused - (can be omitted)
c*    isky   = skyline of a and pointers
c*    li     = holds the row to be factorized
c*    pfail  = signals pivot failure
c*    sfail  = signals memory full error
c*
c**********************************************************************

      subroutine pelm3(n,na,a,iac,iad,nw,nlte,rej,l,ilc,ild,psi,
     +                 istart,isky,li,pfail,sfail)
      logical sfail, pfail
      integer n, i, j, k, nlte
      integer na, nw, rej, nl, maxk
      integer*2 iac(na), ilc(nw)
      integer iad(n+1), ild(n+1)
      integer isky(n), istart(n)
      real psi
      real*8 aii, ajj, dcoef, a(na), l(nw), x, li(n), lkik
      sfail = .false.
      pfail = .false.
      rej = 0
      maxk = 0
      nl = nw - n
      do 2000 i = 1, n
        l(nl+i) = a(iad(i))
        isky(i) = i
        li(i) = 0.
 2000 continue
      nlte = 0
      call second(t1)
      do 3000 i = 1, n
        nlte = nlte + 1
        ild(i) = nlte
        ilc(nlte) = i
        aii = l(nl+i)
        k0 = iad(i+1)
        j = iac(k0-1)
        if (j.gt.maxk) maxk = j
        do 1015 j = iad(i)+1, k0-1
          m = iac(j)
          li(m) = a(j)
          if (isky(m).eq.m) isky(m) = i
 1015   continue
        kstart = isky(i)
        do 1150 k = kstart, i-1
          lsky = isky(k)
          if (ilc(lsky).eq.i) then
            lkik = l(lsky)*l(ild(k))
            aii = aii - l(lsky)*lkik
            isky(k) = lsky + 1
            do 1010 j = isky(k), ild(k+1)-1
              m = ilc(j)
              li(m) = li(m) - lkik*l(j)
 1010       continue
          endif
 1150   continue
        do 1020 j = i+1, maxk
          if (isky(j).eq.j) goto 1020
          x = li(j)
          if (x.ne.0.) then
            li(j) = 0
            ajj = l(nl+j)
            if (x*x.lt.psi*aii*ajj) then
              x = abs(x)
              dcoef = sqrt(aii/ajj)
              aii = aii + x*dcoef
              l(nl+j) = ajj + x/dcoef
              rej = rej + 1
            else
              nlte = nlte + 1
              if (nlte.ge.nw) then
                sfail = .true.
                write(*,*) 'nlte>nw', nlte, '>', nw
                return
              endif
              l(nlte) = x
              ilc(nlte) = j
            endif
          endif
 1020   continue
        if (aii.lt.0.) then
          write(*,*) 'warning: non-positive diag term in prec ', i
          write(*,*) 'aii= ', aii
        endif
        l(ild(i)) = aii
        do 2800 j = ild(i)+1, nlte
          l(j) = l(j)/aii
 2800   continue
  270   format(1x,e24.16,i10)
        isky(i) = ild(i)+1
        if (isky(i).gt.nlte) isky(i) = ild(i)
 3000 continue
      ild(n+1) = nlte + 1
      call second(t2)
      write(*,*) '**** preconditioner calculated in ', t2-t1, ' secs'
      write(*,*) 'elements of preconditioner =', nlte
      write(*,*) 'elements rejected =', rej
      write(8,*) 'elements of preconditioner =', nlte
      write(8,*) 'elements rejected =', rej
      return
      end