Computers and Mathematics with Applications 42 (2001) 1143-1155
www.elsevier.nl/locate/camwa

Preconditioners for Domain Decomposition Methods

X. JUVIGNY AND J. RYAN
ONERA (Office National d'Études et de Recherches Aérospatiales)
29, avenue de la Division Leclerc, 92320 Châtillon, France
@onera.fr

Abstract. Within the FETI domain decomposition method applied to nonsymmetric linear systems, a generic set of algebraic parallel global preconditioners is presented. They can be applied to most problems with a saddle-point formulation. An industrial test on a fluid mechanics matrix shows the independence of these preconditioners with respect to the number of subdomains. © 2001 Elsevier Science Ltd. All rights reserved.

1. INTRODUCTION

During the last years, many algorithms solving nonsymmetric linear systems have been developed. To deal with rapidly growing problems, highly efficient parallel methods have been introduced. Amongst them, the Schur domain decomposition method (or FETI method), recently presented by Roux and Farhat [1], proposes a dual approach of the Schur formulation. Roux has proved that this method has a better rate of convergence than the primal approach, though its condition number increases with the discretization and the number of subdomains.

Local preconditioners can alleviate the first constraint, and global preconditioners the second one. Several techniques have emerged in the last few years, combining the two types of preconditioning: Neumann-Neumann solvers based on local inverses [2-7] and a coarse grid preconditioner [8,9], wire basket algorithms [10-13], multilevel preconditioners [14]. In this paper, we present another family of parallel global preconditioners that are quite generic and can be extended to a large set of preconditioners. An industrial test on a fluid mechanics matrix shows the independence of this preconditioner with respect to the splitting of the domain. These preconditioners are purely algebraic, to be used as black box preconditioners, and can be applied to most problems with a saddle-point formulation.

2. DUAL SCHUR ALGEBRAIC FORMULATION

In this section, we present an algebraic interpretation of the dual Schur domain decomposition method. For simplicity, let us first consider a domain $\Omega$ split into two subdomains $\Omega_1$ and $\Omega_2$ with the following properties:
$$\Omega_1 \cap \Omega_2 = \emptyset, \qquad \Gamma_3 = \bar{\Omega}_1 \cap \bar{\Omega}_2 \neq \emptyset.$$


For i = 1, 2, let
$$A_i = \begin{pmatrix} A_{ii} & A_{i3} \\ A_{3i} & A_{33}^{(i)} \end{pmatrix},$$
where $A_{ii}$ is the subdomain $\Omega_i$ matrix and $A_{33}^{(i)}$ is the subdomain $\Omega_i$ contribution of the local matrix to the interface $\Gamma_3$. We can write the global matrix
$$A = \begin{pmatrix} A_{11} & 0 & A_{13} \\ 0 & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}.$$
If we split the right-hand side vector and the solution as
$$f = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$
where $f_i$, $x_i$ are associated with the unknowns of $\Omega_i$ and $f_3$, $x_3$ with the interface $\Gamma_3$, then to solve $Ax = f$ one can solve
$$A_1 \begin{pmatrix} x_1 \\ x_3 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_3^{(1)} \end{pmatrix}, \qquad A_2 \begin{pmatrix} x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} f_2 \\ f_3^{(2)} \end{pmatrix}, \qquad (1)$$
where $A_{33}^{(1)} + A_{33}^{(2)} = A_{33}$. Taking into account that
$$A_{31}x_1 + A_{33}^{(1)}x_3 + A_{32}x_2 + A_{33}^{(2)}x_3 = f_3, \qquad f_3^{(1)} + f_3^{(2)} = f_3, \qquad (2)$$
for the solution of $Ax = f$ there is a unique $\lambda$ satisfying the relation
$$A_{31}x_1 + A_{33}^{(1)}x_3 = f_3^{(1)} + \lambda, \qquad A_{32}x_2 + A_{33}^{(2)}x_3 = f_3^{(2)} - \lambda. \qquad (3)$$
If we note $B_i = (0, I_{\Gamma_3})$ the boundary operator, with $I_{\Gamma_3}$ the identity operator restricted to $\Gamma_3$ and
$$B_i : X_i = \begin{pmatrix} x_i \\ x_3^{(i)} \end{pmatrix} \longmapsto x_3^{(i)},$$
equation (3) can be rewritten as
$$A_1 X_1 = F_1 + B_1^t \lambda, \qquad A_2 X_2 = F_2 - B_2^t \lambda, \qquad B_1 X_1 - B_2 X_2 = 0, \qquad (4)$$


and the matrix system to invert is
$$\begin{pmatrix} A_1 & 0 & -B_1^t \\ 0 & A_2 & B_2^t \\ B_1 & -B_2 & 0 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \lambda \end{pmatrix} = \begin{pmatrix} F_1 \\ F_2 \\ 0 \end{pmatrix}. \qquad (5)$$
An interpretation of this problem, in the case of a linear problem originating from a second-order discretized partial differential equation, consists in imposing a Neumann boundary condition on the interface and continuity of the solution through this interface.

One can generalize this formulation to n subdomains $\Omega_i$ and interfaces $\Gamma_{ij}$ which split the domain. The linear system $Ax = f$ is equivalent to
$$A_i X_i + \sum_{j=1}^{n} B_{ij}^t \lambda_{ij} = F_i, \qquad \sum_{i=1}^{n}\sum_{j=1}^{n} B_{ji} X_i = 0, \qquad (6)$$
where $B_{ji} = -B_{ij} = (0, I_{\Gamma_{ij}})$. This formulation has the general form of a saddle-point problem associated to the linear problem $AX = F$ on a domain $\Omega = \cup\,\Omega_j$,
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}, \qquad \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}\begin{pmatrix} X \\ \lambda \end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix}, \qquad (7)$$

Here $\lambda$ denotes the Lagrange multipliers at the interfaces imposing continuity of the solution, and $A = (A_{|\Omega_j})_j$ is a block diagonal matrix. Solving the global linear system (7) with a direct method is expensive. A classical solution is to solve the problem with a hybrid approach such as a substructuring method [15] involving a local direct solver. In the case of unstructured meshes, even this local solver can become penalizing in terms of cost. In this article, a fully iterative method is used to solve the linear problem. In the next sections, we present two algebraic preconditioners for the global linear system (7), which are used with a GMRES algorithm [16].
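To make the structure of system (5) concrete, the following minimal Python sketch (not part of the original paper; the block sizes and the random test data are illustrative assumptions) assembles the two-subdomain saddle-point matrix with $B_i = (0, I_{\Gamma_3})$ and checks that the direct solution satisfies the interface continuity constraint $B_1 X_1 = B_2 X_2$.

```python
import numpy as np

# Minimal sketch: assemble the two-subdomain saddle-point system (5) with small
# dense blocks and solve it directly. Sizes are arbitrary illustration values.
n1, n2, n3 = 4, 5, 3          # interior unknowns of Omega_1, Omega_2 and interface Gamma_3
rng = np.random.default_rng(0)

def local_matrix(ni, n3):
    # A_i = [[A_ii, A_i3], [A_3i, A_33^(i)]], made diagonally dominant so it is nonsingular
    A = rng.standard_normal((ni + n3, ni + n3))
    return A + (ni + n3) * np.eye(ni + n3)

A1, A2 = local_matrix(n1, n3), local_matrix(n2, n3)
B1 = np.hstack([np.zeros((n3, n1)), np.eye(n3)])   # B_1 = (0, I_Gamma3)
B2 = np.hstack([np.zeros((n3, n2)), np.eye(n3)])   # B_2 = (0, I_Gamma3)
F1, F2 = rng.standard_normal(n1 + n3), rng.standard_normal(n2 + n3)

# Saddle-point matrix of equation (5): unknowns (X_1, X_2, lambda)
K = np.block([
    [A1,                           np.zeros((n1 + n3, n2 + n3)), -B1.T],
    [np.zeros((n2 + n3, n1 + n3)), A2,                            B2.T],
    [B1,                          -B2,                            np.zeros((n3, n3))],
])
rhs = np.concatenate([F1, F2, np.zeros(n3)])
sol = np.linalg.solve(K, rhs)
X1, X2 = sol[:n1 + n3], sol[n1 + n3:n1 + n2 + 2 * n3]
print("interface continuity residual:", np.linalg.norm(B1 @ X1 - B2 @ X2))
```

The continuity residual is at the level of machine precision, since the Lagrange multiplier block enforces $B_1 X_1 - B_2 X_2 = 0$ exactly.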

3. BLOCK DIAGONAL PRECONDITIONER

This section introduces a block diagonal preconditioner. The main advantage of a block preconditioner is its high potential of parallelization, as the inversion of each block is an independent process. The next theorem, presented in [17], is the basis of the following study.

THEOREM 1. Let
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} \qquad \text{and} \qquad M = \begin{pmatrix} A & 0 \\ 0 & BA^{-1}B^t \end{pmatrix}^{-1},$$
with A a nonsingular matrix. Then the matrix
$$M\mathcal{A} = \begin{pmatrix} A & 0 \\ 0 & BA^{-1}B^t \end{pmatrix}^{-1}\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
has only three possible eigenvalues:
$$\lambda_1 = 1, \qquad \lambda_2 = \frac{1+\sqrt{5}}{2}, \qquad \lambda_3 = \frac{1-\sqrt{5}}{2}.$$

PROOF. Let $\lambda$ be an eigenvalue of $M\mathcal{A}$, and $U = \begin{pmatrix} X \\ Y \end{pmatrix}$ the associated eigenvector, defined by
$$\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \lambda \begin{pmatrix} A & 0 \\ 0 & BA^{-1}B^t \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}. \qquad (8)$$
If $Y = 0$, we have $AX = \lambda AX$. A being nonsingular and $X \neq 0$, the first eigenvalue is $\lambda_1 = 1$.


If $Y \neq 0$, we can write (8) as a two-block linear system
$$AX + B^tY = \lambda AX, \qquad BX = \lambda\left(BA^{-1}B^t\right)Y. \qquad (9)$$
A being nonsingular, equation (9) gives the following relation:
$$\frac{1}{\lambda - 1}\,BA^{-1}B^tY = \lambda\left(BA^{-1}B^t\right)Y. \qquad (10)$$
The Schur complement $BA^{-1}B^t$ is nonsingular, so $\lambda$ must verify $\lambda(\lambda - 1) = 1$. This second-order equation has two real solutions
$$\lambda_2 = \frac{1+\sqrt{5}}{2}, \qquad \lambda_3 = \frac{1-\sqrt{5}}{2}. \qquad (11)$$
$\lambda_1$, $\lambda_2$, $\lambda_3$ are the only possible eigenvalues. ∎

The exact computation of this preconditioner with a symmetric matrix A shows that an iterative method, such as the conjugate gradient method, will converge in three iterations at most. Kuznetsov shows that this preconditioner is an optimal block diagonal preconditioner for a symmetric matrix, as M must be a positive definite matrix to solve the preconditioned linear system with a conjugate gradient algorithm.

For a nonsymmetric matrix A, this preconditioner reduces the number of eigenspaces to three, and the iterative method will converge faster. But, if A is a nonsymmetric matrix, M can also be a nonpositive definite matrix. To improve the convergence of the iterative method, we introduce a parameter $\alpha$ in the matrix M:

$$M_\alpha = \begin{pmatrix} A & 0 \\ 0 & \alpha BA^{-1}B^t \end{pmatrix}^{-1}. \qquad (12)$$
With a similar approach to Theorem 1, we find that the matrix $M_\alpha\mathcal{A}$ has the three following eigenvalues:
$$\lambda_1 = 1, \qquad \lambda_{2,3} = \frac{1 \pm \sqrt{1 + 4/\alpha}}{2}.$$
If we choose $\alpha = -4$, we have the following proposition.

PROPOSITION 1. Let
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
be a nonsingular matrix. The matrix
$$M_{-4}\mathcal{A} = \begin{pmatrix} A & 0 \\ 0 & -4BA^{-1}B^t \end{pmatrix}^{-1}\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
has only two possible eigenvalues:
$$\lambda_1 = 1, \qquad \lambda_2 = \frac{1}{2}.$$
This preconditioner reduces the number of eigenspaces to two. The convergence behavior of the iterative method is theoretically better with this $\alpha$ parameter than with the initial Kuznetsov block preconditioner.
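Proposition 1 is easy to check numerically. The sketch below (not from the paper; the dense random test matrices and their sizes are assumptions made for the illustration) builds a small nonsymmetric saddle-point system, forms $M_{-4}\mathcal{A}$, and verifies that its spectrum lies in $\{1, 1/2\}$.

```python
import numpy as np

# Numerical illustration of Proposition 1: the preconditioned matrix
# M_{-4} * script_A has no eigenvalues outside {1, 1/2}.
rng = np.random.default_rng(1)
n, m = 12, 4
A = rng.standard_normal((n, n)) + n * np.eye(n)        # nonsymmetric, nonsingular
B = rng.standard_normal((m, n))

S = B @ np.linalg.solve(A, B.T)                        # Schur complement B A^{-1} B^t
script_A = np.block([[A, B.T], [B, np.zeros((m, m))]])
M_inv = np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), -4.0 * S]])
MA = np.linalg.solve(M_inv, script_A)                  # = M_{-4} @ script_A

eigs = np.linalg.eigvals(MA)
dist = np.min(np.abs(eigs[:, None] - np.array([1.0, 0.5])), axis=1)
print("max distance of the spectrum to {1, 1/2}:", dist.max())
# The eigenvalue 1/2 is defective, so the distance is small but not at machine
# precision (typically around 1e-6 at this size).
```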

[Figure 1. Comparison between preconditioners for A: convergence on the Dassault matrix, residual versus number of iterations.]

3.1. Approximations of $A^{-1}$ and $(BA^{-1}B^t)^{-1}$

To compute $A^{-1}$ exactly is too expensive, and thus an approximation of the inverse of the matrix A has to be found. We can use for this a SPAI method (Sparse Approximate Inverse) [18], as proposed in [19], or an ILUT approach [20]. The SPAI method is better at finding an optimal graph for the sparse inverse approximation, but with the same graph the ILU approach is more efficient, as can be seen in Figure 1.

NOTE. All the following convergence results were obtained solving a three-dimensional steady Navier-Stokes equation for a flow around a blunt body. The convergence criterion, based on the $L^2$ norm of the residual, is $\|F - AX\| \leq 10^{-10}$. The dimension of the matrix is 17830, and the number of nonzeros is 1011600.

The ILU approach has been chosen in this paper. We compute $\tilde{A} = \tilde{L}\tilde{U}$, the incomplete factorization of A, and $\tilde{A}^{-1} = \tilde{U}^{-1}\tilde{L}^{-1}$. Figure 1 shows a comparison between different approximations of $A^{-1}$ used as preconditioners in the GMRES.

The next problem is to compute an approximate inverse of $BA^{-1}B^t$. Roux in [21] suggested approximating $(BA^{-1}B^t)^{-1}$ with $BAB^t$. The performance of this approximation is not very good. We prefer here to approximate the solution of $BA^{-1}B^t\lambda = f_\lambda$ with a few iterations of the GMRES(l) algorithm. Figure 2 shows the efficiency of GMRES (l = 3 and 8 iterations) versus $BAB^t$ for the same matrix as above.

REMARK. The condition number of $M_\alpha\mathcal{A}$, approximated by the ratio of the modulus of the highest eigenvalue over the lowest, reaches its minimum for $\alpha = -1$. $M_{-1}$ can be seen as an approximation of the Gauss factorization where the nondiagonal blocks (interface connections) are left out.

Figure 3 shows the effect of the $\alpha$ parameter on convergence. As foreseen, $\alpha = -4$ gives the best result, closely followed by $\alpha = -1$ (see the remark above). The influence of the number of subdomains is seen in Figure 4: the number of iterations increases from 200 to 300 as the global matrix is split from two domains into eight subdomains, showing the need for a better treatment of the interface.
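The following SciPy sketch illustrates how such a preconditioner can be put together; it is an illustration, not the authors' code, and the test matrices, drop tolerance, and inner iteration counts are assumptions. The action of $A^{-1}$ is replaced by an ILU solve, the interface operator $B\tilde{A}^{-1}B^t$ is available only through matrix-vector products, and its inverse is approximated by a few GMRES iterations.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sketch of the M_alpha block preconditioner of Section 3:
#   z_X   ~ Atilde^{-1} r_X              (ILU solve)
#   z_lam ~ (alpha * B Atilde^{-1} B^t)^{-1} r_lam   (a few inner GMRES iterations)
n, m, alpha = 200, 20, -4.0
rng = np.random.default_rng(0)
A = (sp.random(n, n, density=0.02, random_state=0) + sp.identity(n) * 10.0).tocsc()
B = sp.random(m, n, density=0.1, random_state=1).tocsc()

ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=10)     # approximate A^{-1}

def schur_matvec(y):
    # y -> B * Atilde^{-1} * B^t * y, using the ILU solve in place of A^{-1}
    return B @ ilu.solve(B.T @ y)

S_op = spla.LinearOperator((m, m), matvec=schur_matvec)

def apply_preconditioner(r):
    rX, rlam = r[:n], r[n:]
    zX = ilu.solve(rX)
    # at most 8 inner GMRES iterations for (alpha * B Atilde^{-1} B^t) z = rlam
    zlam, _ = spla.gmres(S_op, rlam / alpha, restart=8, maxiter=1)
    return np.concatenate([zX, zlam])

M = spla.LinearOperator((n + m, n + m), matvec=apply_preconditioner)

# Outer solve of the saddle-point system (7) preconditioned by M.
K = sp.bmat([[A, B.T], [B, None]], format="csr")
rhs = np.concatenate([rng.standard_normal(n), np.zeros(m)])
x, info = spla.gmres(K, rhs, M=M, restart=30, maxiter=20)
print("gmres info:", info, "residual:", np.linalg.norm(K @ x - rhs))
```

Because the preconditioner is itself iterative, a flexible GMRES variant would be the rigorous choice for the outer solver; the sketch keeps plain GMRES for brevity.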

[Figure 2. Comparison between preconditioners for $BA^{-1}B^t$: $BAB^t$ versus 3 and 8 GMRES iterations.]

[Figure 3. Influence of the $\alpha$ parameter for the Kuznetsov block preconditioner: blunt body, 4 domains; residual versus number of matrix-vector products.]

4. GAUSS APPROXIMATION

The block-diagonal approach of an algebraic preconditioner leaves out the interaction between the subdomains. So, it is very difficult to obtain a preconditioner which gives a convergence rate independent of the number of subdomains. Using a Gauss elimination approach, we can obtain a preconditioner similar to the block-diagonal preconditioner, but which is independent of the number of subdomains.

[Figure 4. Subdomain number influence for the Kuznetsov block preconditioner: convergence histories on 2, 4, 6, and 8 CPUs; residual versus iterations.]

Let
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
be the Schur dual formulation of the linear system. A Gauss factorization will give
$$\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} = \begin{pmatrix} A & 0 \\ B & -BA^{-1}B^t \end{pmatrix}\begin{pmatrix} I & A^{-1}B^t \\ 0 & I \end{pmatrix}.$$
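Multiplying the two factors back together is a one-line consistency check of this factorization:
$$\begin{pmatrix} A & 0 \\ B & -BA^{-1}B^t \end{pmatrix}\begin{pmatrix} I & A^{-1}B^t \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & AA^{-1}B^t \\ B & BA^{-1}B^t - BA^{-1}B^t \end{pmatrix} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}.$$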

So, solving the following problem,
$$\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix},$$
is equivalent to solving
$$\left(BA^{-1}B^t\right)\lambda = BA^{-1}f. \qquad (13)$$
Once $\lambda$ is computed, x can then be obtained as
$$x = A^{-1}\left(f - B^t\lambda\right). \qquad (14)$$
As computing the exact inverse of A is too expensive, we compute, as in the previous section, an approximate inverse of A with ILU, and $(BA^{-1}B^t)^{-1}$ is approximated with a few iterations of the GMRES algorithm, as before.

The Gauss factorization approximation algorithm is presented below with two subdomains (see the sketch after this list).
(1) Solve $\left(B_1A_1^{-1}B_1^t + B_2A_2^{-1}B_2^t\right)\lambda = B_1A_1^{-1}f_1 + B_2A_2^{-1}f_2$.
(2) Compute $x_i = A_i^{-1}\left(f_i - B_i^t\lambda\right)$, $i = 1, 2$.
The generalization to more subdomains is easy to make, as the matrix with more subdomains has the same structure as the two-subdomain matrix.

4.1. Numerical Tests 4.1.1.

Numerical comparison between the two preconditioners

As can be seen in Figure domains,

the Gauss

5 which compares

preconditioner

two main values of Q (1, -4). Taking into account the interface number

of subdomains

the two preconditioners

is more efficient also improves

than

for two domains

the Kuznetsov

the independency

preconditioner

of the Gauss method

and four for the on the

as seen in Table 1. Blunt body - 2 domains I

,

I

100

I

,

150 200 250 300 350 Number of global matrix vector product

Comparison

Kuznetsov Gauss-Two

Figure 5. Comparison

Domains

Kuznetsov Gauss.

Klznetsov(!) Kljzne?sov(:4).-::--. _ : Gaugs --.--

Domain Decomposition

1151

Methods

Blunt body - 4 domains I 1 1

I

I

I

-3 -4 j__

-5

.

$ -I

:: ,*. i .I.

-6

.:x>, I ‘. *...

-7

‘\ ‘.

j

.j

.,...

,._

.. ?. j .. j

‘. I

-10 0

50

I 100

I

I

‘\

‘\-+__ i. ..‘\‘,

I

._, ‘.

‘I

...

I

250 300 350 150 200 Number of global matrix vector products

Comparison

Y\

. .. I... ..

-9

Kuznetsov Gauss-Four

..-

:. : : :.

.. .

.-.

‘\ . ..\

I. ,_

..,

“L !. .... i ..

-8 _

..I ‘.

~

.

;.

I

I

400

450

._ ... .._

500

Domains

[Figure 5 (cont.). Comparison Kuznetsov-Gauss, four domains: blunt body, 4 domains; residual versus number of global matrix-vector products.]

Table 1. Subdomain number influence for a 100 x 100 mesh.

The effect of the required precision when solving iteratively the approximate problem
$$BA^{-1}B^t\lambda = f_\lambda \qquad (15)$$
is shown for the two preconditioners in Figure 6. Two comments can be made.
• A fine precision is not necessary. Five to eight iterations, corresponding to a precision varying between 0.1 and 0.05, are quite sufficient if an overall fine precision is desired.
• If a low precision is asked for the global problem, then a high precision in the solving of (15) can be of interest in terms of cost.

4.1.2. Validation of the Gauss preconditioner on a large test case: the M6 wing

The Gauss preconditioner was tested on matrices provided by a steady Navier-Stokes code computing the flow around the ONERA M6 wing, with 400,000 degrees of freedom and 33 million nonzero coefficients. The precision required to solve (15) is 0.1.

[Figure 6. Precision of the interface solver: blunt body, 2 and 4 domains; residual versus number of global matrix-vector products.]

[Figure 7. M6 wing: subdomain number influence (Gauss, 8 and 16 domains).]

[Figure 8. M6 wing: time integration influence (Gauss, 16 domains, different time steps).]

In Figure 7, the convergence history for 8 and 16 domains is the same, enhancing the efficiency of the parallelism of the Gauss preconditioner. In Figure 8, in the case of 16 domains, convergence for different time steps shows the robustness of the method.

4.2. Reducing the Costs

The main cost of these preconditioners resides in the repeated iterative computation of (15), the cost of each iteration consisting of an upward-downward sweep of the ILU factorization (a cost equivalent to 1.5 global matrix-vector products). Search vectors in the GMRES algorithm are defined by the size of the interface problem, which is usually small when compared to the global problem. Thus, for a small increase in memory space, if these search vectors are stored, the cost of the next global iteration step, which consists in solving (15) again with another right-hand side, can be reduced by projecting the initial residual on the family of former search vectors. Figure 9 clearly shows the gain in cost.

[Figure 9. Interface preconditioner: blunt body, CFL = 0.02, 4 domains, eps = 0.05, with and without projection; residual versus number of matrix-vector products.]
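A sketch of the projection idea follows (an illustration, not the paper's code): the search vectors produced by earlier interface solves are stored, and each new right-hand side of (15) is first projected onto their span to produce a cheap initial guess, so the inner GMRES only has to correct what lies outside that subspace. The Galerkin projection used below is one assumed realization; the paper only states that the initial residual is projected onto the former search vectors.

```python
import numpy as np

class ProjectedInterfaceSolver:
    """Warm-start repeated solves of S x = b by reusing stored search directions.

    S_matvec is a callable returning S @ v, where S plays the role of B A^-1 B^t."""

    def __init__(self, S_matvec, m):
        self.S_matvec = S_matvec
        self.V = np.zeros((m, 0))      # stored (orthonormalized) search directions
        self.SV = np.zeros((m, 0))     # S @ V, kept to avoid re-applying S

    def initial_guess(self, b):
        # Galerkin projection x0 = V (V^t S V)^-1 V^t b onto the stored subspace.
        if self.V.shape[1] == 0:
            return np.zeros_like(b)
        G = self.V.T @ self.SV
        y = np.linalg.lstsq(G, self.V.T @ b, rcond=None)[0]
        return self.V @ y

    def store(self, directions):
        # Orthonormalize new directions against the stored basis before appending.
        cols = directions.T if directions.ndim == 2 else [directions]
        for d in cols:
            d = d - self.V @ (self.V.T @ d)
            nrm = np.linalg.norm(d)
            if nrm > 1e-12:
                d = d / nrm
                self.V = np.column_stack([self.V, d])
                self.SV = np.column_stack([self.SV, self.S_matvec(d)])

# Typical use inside the outer loop:
#   x0 = proj.initial_guess(b); run a few GMRES iterations on S x = b starting from x0;
#   proj.store(new_search_directions)   # keep the basis for the next right-hand side
```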

5. CONCLUSION

Two preconditioners for the saddle-point formulation of the dual Schur domain decomposition method have been presented. The first is an extension to nonsymmetric matrices of the Kuznetsov preconditioner; the second is an approximation of the Gauss factorization of the initial matrix. The numerical tests show that the Gauss preconditioner, which takes the interface connections into account, is, in accordance with the theory, better than the Kuznetsov block preconditioners. Validation on a large test case has shown independence of the convergence with respect to the number of subdomains for the Gauss technique. We have used here an iterative method to approximate the solution of the dual Schur problem, but one could use any appropriate method. The Gauss preconditioner, though better in terms of iterations, could become prohibitive in terms of cost. In this case, one would prefer the modified Kuznetsov preconditioner with $\alpha = -4$,

which is no more expensive than the original Kuznetsov preconditioner for nonsymmetric matrices, but we will lose the independence of convergence versus the number of subdomains.

These methods are generic. At each step of these methods, any algorithm (direct or iterative) can be used to approximate the solution of the local or of the dual Schur equation. When computing approximate inverses, numerical tests have shown that a relative accuracy of $10^{-1}$ (requiring around five to seven inner iterations) was quite sufficient; too weak an accuracy will deteriorate the overall convergence, and a greater accuracy did not improve the good rate of convergence. The cost of this inner solver can be reduced by storing former search vectors and projecting the initial residual onto this set of directions.

The next step to improve the dual Schur domain decomposition method will be to combine these preconditioners with "classical ones" such as coarse grid preconditioners.

REFERENCES
1. C. Farhat and F.X. Roux, Implicit parallel processing in structural mechanics, Computational Mechanics Advances 2, (1994).
2. P. LeTallec, Y.H. De Roeck and M. Vidrascu, Domain decomposition methods for large linearly elliptic three-dimensional problems, J. Comput. Appl. Math. 34, 93-117, (1991).
3. V. Aghoskov, Poincaré-Steklov's operators and domain decomposition methods in finite dimensional spaces, In Proc. First Int. Symp. on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, (1988).
4. J.F. Bourgat, R. Glowinski, P. Le Tallec and M. Vidrascu, In Proc. Second Int. Symp. on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, (1989).
5. P. Morice, Transonic computations by a multidomain technique with potential and Euler solvers, In Transsonicum III, IUTAM Symp. Göttingen, Springer-Verlag, Berlin, (1989).
6. Y.H. De Roeck, P. LeTallec and M. Vidrascu, A domain decomposition solver for nonlinear elasticity, Comput. Methods Mech. Engrg. 99, 187-207, (1992).
7. P. LeTallec and M. Vidrascu, Algorithme de décomposition de domaines pour résoudre les systèmes linéaires de la mécanique des fluides, Rapport de contrat STPA, INRIA, (1993).
8. J. Mandel, Two-level domain decomposition preconditioning for the p-version finite element method in three dimensions, Internat. J. Numer. Methods Engrg. 29, 1095-1108, (1990).
9. J. Mandel, Balancing domain decomposition, Commun. Numer. Methods 9, 233-241, (1993).
10. J. Bramble, J. Pasciak and A. Schatz, The construction of preconditioners for elliptic problems by substructuring, IV, Math. Comput. 53, 1-24, (1989).
11. B. Smith, A domain decomposition algorithm for elliptic problems in three dimensions, Numer. Math. 60, 219-234, (1991).
12. B. Smith, An optimal domain decomposition preconditioner for the finite element solution of linear elasticity problems, SIAM J. Sci. Stat. Comput. 13, 364-378, (1992).
13. B. Smith, A parallel implementation of an iterative substructuring algorithm for problems in three dimensions, SIAM J. Sci. Stat. Comput. 14, 406-423, (1992).
14. J. Ryan and F.X. Roux, Extension of the dual Schur method into a multi-domain, multi-level, hierarchical solver, In Proceedings of "Algebraic Multilevel Iteration Methods with Applications", University of Nijmegen, (1996).
15. B. Smith, P. Bjørstad and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, (1996).
16. Y. Saad and M.H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, Technical Report No. 254, Yale University, Department of Computer Science, (August 1983).
17. Y.A. Kuznetsov, Efficient iterative solvers for elliptic finite element problems on nonmatching grids, Russian Journal of Numerical Analysis and Mathematical Modelling 10, (1995).
18. T. Huckle and M. Grote, A new approach to parallel preconditioning with sparse approximate inverses, Technical Report, Stanford University, (1994).
19. X. Juvigny, Résolution de grands systèmes linéaires sur machines massivement parallèles, Ph.D. Thesis, Université de Paris VI, (1997).
20. Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing, (1996).
21. F.X. Roux, Méthode de décomposition de domaine à l'aide de multiplicateurs de Lagrange et application à la résolution en parallèle des équations de l'élasticité linéaire, Ph.D. Thesis, Université de Paris VI, (1989).