Computers and Mathematics with Applications 42 (2001) 1143-1155
www.elsevier.nl/locate/camwa
Preconditioners for Domain Decomposition Methods

X. JUVIGNY AND J. RYAN
ONERA (Office National d'Etudes et de Recherches Aérospatiales)
29, avenue de la Division Leclerc, 92320 Châtillon, France

Abstract—Within the FETI domain decomposition method applied to nonsymmetric linear systems, a generic set of algebraic parallel global preconditioners is presented. They can be applied to most problems with a saddle-point formulation. An industrial test on a fluid mechanics matrix shows the independence of these preconditioners with the number of subdomains. © 2001 Elsevier Science Ltd. All rights reserved.
1. INTRODUCTION

During the last years, many algorithms solving nonsymmetric linear systems have been developed. To deal with rapidly growing problems, highly efficient parallel methods have been introduced. Amongst them, the Schur domain decomposition method (or FETI method), recently presented by Roux and Farhat [1], proposes a dual approach of the Schur formulation. Roux has proved that this method has a better rate of convergence than the primal approach, though its condition number increases with discretization and the amount of subdomains.

Local preconditioners can alleviate the first constraint, and global preconditioners the second one. Several techniques have emerged in the last few years, combining the two types of preconditioning: Neumann-Neumann solvers based on local inverses [2-7] and a coarse grid preconditioner [8,9], wire basket algorithms [10-13], multilevel preconditioners [14]. In this paper, we present another family of parallel global preconditioners that are quite generic and can be extended to a large set of preconditioners. An industrial test on a fluid mechanics matrix shows the independence of this preconditioner with the splitting of the domain. These preconditioners are purely algebraic, to be used as black box preconditioners, and can be applied to most problems with a saddle-point formulation.

2. DUAL SCHUR ALGEBRAIC FORMULATION
In this section, we present an algebraic interpretation of the dual Schur domain decomposition method. For simplicity, let us first consider a domain $\Omega$ split into two subdomains $\Omega_1$ and $\Omega_2$ with the following properties:
$$\Omega_1 \cap \Omega_2 = \emptyset, \qquad \Gamma_3 = \bar{\Omega}_1 \cap \bar{\Omega}_2 \neq \emptyset.$$
For $i = 1, 2$, let
$$A_i = \begin{pmatrix} A_{ii} & A_{i3} \\ A_{3i} & A_{33}^{(i)} \end{pmatrix},$$
where $A_{ii}$ is the subdomain $\Omega_i$ matrix, $A_{i3}$ is the local coupling matrix to the interface $\Gamma_3$, and $A_{33}^{(i)}$ is the subdomain $\Omega_i$ contribution to $\Gamma_3$.

We can write the global matrix
$$A = \begin{pmatrix} A_{11} & 0 & A_{13} \\ 0 & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}.$$

If we split the right-hand side vector and the solution as
$$f = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$
where $f_3 = f_3^{(1)} + f_3^{(2)}$, then to solve $Ax = f$, one can solve
$$A_{11} x_1 + A_{13} x_3 = f_1, \qquad A_{22} x_2 + A_{23} x_3 = f_2, \tag{1}$$
where $A_{33}^{(1)} + A_{33}^{(2)} = A_{33}$. Taking into account that
$$A_{31} x_1 + A_{33}^{(1)} x_3 + A_{32} x_2 + A_{33}^{(2)} x_3 = f_3^{(1)} + f_3^{(2)} = f_3, \tag{2}$$
there is a unique $\lambda$ satisfying the relation
$$f_3^{(1)} - A_{31} x_1 - A_{33}^{(1)} x_3 = \lambda = -\left( f_3^{(2)} - A_{32} x_2 - A_{33}^{(2)} x_3 \right). \tag{3}$$
If we note $B_i = (0, I_{\Gamma_3})$ the boundary operator, $I_{\Gamma_3}$ being the identity operator restricted to $\Gamma_3$,
$$B_i : (\Omega_i, \Gamma_3) \to \Gamma_3, \qquad \left( X_i, x_3^{(i)} \right) \mapsto x_3^{(i)},$$
equation (3) can be rewritten as
$$A_1 X_1 = F_1 + B_1^t \lambda, \qquad A_2 X_2 = F_2 - B_2^t \lambda, \qquad B_1 X_1 - B_2 X_2 = 0, \tag{4}$$
and the matrix system to invert is
$$\begin{pmatrix} A_1 & 0 & B_1^t \\ 0 & A_2 & -B_2^t \\ B_1 & -B_2 & 0 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \lambda \end{pmatrix} = \begin{pmatrix} F_1 \\ F_2 \\ 0 \end{pmatrix}. \tag{5}$$

An interpretation of this problem, in the case of a linear problem originating from a second-order discretized partial differential equation, consists in imposing a Neumann boundary condition on the interface and continuity of the solution through this interface.

One can generalize this formulation to $n$ subdomains $\Omega_i$ and $\Gamma_{ij}$ interfaces which split the domain. The linear system $Ax = f$ is equivalent to
$$A_i X_i + \sum_{j=1}^{n} B_{ij}^t \lambda_{ij} = F_i, \qquad \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ji} X_i = 0, \tag{6}$$
where $B_{ji} = -B_{ij} = (0, I_{\Gamma_{ij}})$. This formulation has the general form of a saddle-point problem associated to the linear problem $AX = F$ on a domain $\Omega = \bigcup \Omega_j$,
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}, \qquad \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} \begin{pmatrix} X \\ \lambda \end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix}, \tag{7}$$
$\lambda$ denoting Lagrange multipliers at interfaces imposing continuity of the solution, and $A = (A|_{\Omega_j})_j$ a block diagonal matrix.

Solving the global linear system (7) with a direct method is expensive. A classical solution is to solve the problem with a hybrid approach such as a substructuring method [15] involving a local direct solver. In the case of unstructured
meshes, even this local solver can become penalizing in terms of cost. In this article, a fully iterative method is used to solve the linear problem. In the next sections, we present two algebraic preconditioners for the global linear system (7), which are used with a GMRES algorithm [16].
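As a concrete illustration of system (5), the following sketch (an assumption about one possible implementation, not the authors' code; the names A1, A2, B1, B2, F1, F2 are illustrative) assembles the two-subdomain saddle-point matrix and right-hand side with SciPy sparse blocks.

    import numpy as np
    import scipy.sparse as sp

    def assemble_dual_schur(A1, A2, B1, B2, F1, F2):
        """Assemble the saddle-point system (5) for two subdomains:
        unknowns are (X1, X2, lambda), lambda gluing the interface traces."""
        m = B1.shape[0]                       # number of interface unknowns
        K = sp.bmat([[A1,   None,  B1.T],
                     [None, A2,   -B2.T],
                     [B1,  -B2,    None]], format="csr")
        rhs = np.concatenate([F1, F2, np.zeros(m)])
        return K, rhs

When the interface unknowns of subdomain $\Omega_i$ are numbered last (an assumption about the ordering, consistent with the block structure above), the boundary operator $B_i = (0, I_{\Gamma_3})$ of equation (4) is simply sp.hstack([sp.csr_matrix((m, n_i - m)), sp.identity(m)]).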
3. BLOCK DIAGONAL PRECONDITIONER
This section introduces a block diagonal preconditioner. The main advantage of a block preconditioner is its high potential of parallelization, as the inversion of each block is an independent process.

The next theorem, presented in [17], is the basis of the following study.

THEOREM 1. Let
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} \qquad \text{and} \qquad M = \begin{pmatrix} A & 0 \\ 0 & BA^{-1}B^t \end{pmatrix}^{-1},$$
with $A$ a nonsingular matrix. Then the matrix
$$M\mathcal{A} = \begin{pmatrix} A & 0 \\ 0 & BA^{-1}B^t \end{pmatrix}^{-1} \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
has only three possible eigenvalues, $\lambda_1 = 1$, $\lambda_2 = (1+\sqrt{5})/2$, and $\lambda_3 = (1-\sqrt{5})/2$.

PROOF. Let $\lambda$ be an eigenvalue of $M\mathcal{A}$, and $U = \binom{X}{Y}$ the associated eigenvector, defined by
$$\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} = \lambda \begin{pmatrix} A & 0 \\ 0 & BA^{-1}B^t \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}. \tag{8}$$
If $Y = 0$, we have $AX = \lambda AX$. $A$ being nonsingular and $X \neq 0$, the first eigenvalue is $\lambda_1 = 1$.
If $Y \neq 0$, we can write (8) as a two-block linear system
$$AX + B^t Y = \lambda AX, \qquad BX = \lambda \left( BA^{-1}B^t \right) Y. \tag{9}$$
$A$ being nonsingular, the first equation of (9) gives $X = (\lambda - 1)^{-1} A^{-1} B^t Y$ (here $\lambda \neq 1$, since $\lambda = 1$ with $Y \neq 0$ would force $B^t Y = 0$), and substitution into the second equation gives the following relation:
$$\frac{1}{\lambda - 1}\, BA^{-1}B^t Y = \lambda \left( BA^{-1}B^t \right) Y. \tag{10}$$
The Schur complement $BA^{-1}B^t$ being nonsingular, $\lambda$ must verify $\lambda(\lambda - 1) = 1$. This second-order equation has two real solutions,
$$\lambda_2 = \frac{1 + \sqrt{5}}{2}, \qquad \lambda_3 = \frac{1 - \sqrt{5}}{2}. \tag{11}$$
$\lambda_1$, $\lambda_2$, $\lambda_3$ are the only possible eigenvalues.
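A quick numerical sanity check of Theorem 1 (an illustrative sketch with a random nonsingular $A$ and full-row-rank $B$, not part of the original paper; the sizes are arbitrary): the spectrum of $M\mathcal{A}$ indeed collapses onto the three predicted values.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 12, 4
    A = rng.standard_normal((n, n)) + n * np.eye(n)        # nonsingular, nonsymmetric
    B = rng.standard_normal((m, n))                        # full row rank (generically)

    A_hat = np.block([[A, B.T], [B, np.zeros((m, m))]])    # saddle-point matrix (7)
    S = B @ np.linalg.solve(A, B.T)                        # Schur complement B A^{-1} B^t
    M = np.linalg.inv(np.block([[A, np.zeros((n, m))],
                                [np.zeros((m, n)), S]]))   # block diagonal preconditioner

    eigs = np.linalg.eigvals(M @ A_hat)
    print(np.unique(np.round(eigs.real, 6)))               # ~ [-0.618034, 1.0, 1.618034]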
The exact computation of this preconditioner with a symmetric matrix $A$ shows that an iterative method, such as the conjugate gradient method, will converge in three iterations at most. Kuznetsov shows that this preconditioner is an optimal block diagonal preconditioner for a symmetric matrix, as $M$ must be a positive definite matrix to solve the preconditioned linear system with a conjugate gradient algorithm.

For a nonsymmetric matrix $A$, this preconditioner reduces the number of eigenspaces to three, and the iterative method will converge faster. But, if $A$ is a nonsymmetric matrix, $M$ can also be a nonpositive definite matrix. To improve the convergence of the iterative method, we introduce a parameter $\alpha$ in matrix $M$:
$$M_\alpha = \begin{pmatrix} A & 0 \\ 0 & \alpha BA^{-1}B^t \end{pmatrix}^{-1}.$$
With a similar approach to Theorem 1, we find that the matrix $M_\alpha \mathcal{A}$ has the three following eigenvalues:
$$\lambda_1 = 1, \qquad \lambda_{2,3} = \frac{1 \pm \sqrt{1 + 4/\alpha}}{2}. \tag{12}$$
If we choose $\alpha = -4$, we have the following proposition.

PROPOSITION 1. Let $A$ be a nonsingular matrix. The matrix
$$M_{-4}\mathcal{A} = \begin{pmatrix} A & 0 \\ 0 & -4\, BA^{-1}B^t \end{pmatrix}^{-1} \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
has only two possible eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = \frac{1}{2}$.
This preconditioner reduces the number of eigenspaces to two. The convergence behavior of the iterative method is theoretically better with this $\alpha$ parameter than with the initial Kuznetsov block preconditioner.
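Extending the sanity check above (again an illustrative sketch, reusing the random A, B, A_hat, S, n, and m defined there), sweeping $\alpha$ shows the non-unit eigenvalues moving as $(1 \pm \sqrt{1 + 4/\alpha})/2$ and collapsing onto $1/2$ at $\alpha = -4$, as in Proposition 1.

    for alpha in (1.0, -1.0, -4.0):
        M_alpha = np.linalg.inv(np.block([[A, np.zeros((n, m))],
                                          [np.zeros((m, n)), alpha * S]]))
        eigs = np.linalg.eigvals(M_alpha @ A_hat)
        print(alpha, np.unique(np.round(eigs, 4)))
    # alpha =  1.0 -> {1, (1 +/- sqrt(5))/2}
    # alpha = -1.0 -> {1, (1 +/- i sqrt(3))/2}   (a complex pair of modulus 1)
    # alpha = -4.0 -> {1, 0.5}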
Figure 1. Comparison between preconditioners for $A$ (convergence, Dassault matrix; residual norm versus number of iterations).

3.1. Approximations of $A^{-1}$ and $(BA^{-1}B^t)^{-1}$
To compute $A^{-1}$ exactly is too expensive, and thus an approximation of the inverse of the matrix $A$ has to be found. We can use for this a SPAI method (Sparse Approximate Inverse) [18] as proposed in [19], or an ILUT approach [20]. The SPAI method is better to find an optimal graph for the sparse inverse approximation, but with the same graph, the ILU approach is more efficient, as can be seen in Figure 1, where a comparison is shown between different approximations of $A^{-1}$ used as preconditioners in the GMRES.

NOTE. All the following convergence results were obtained solving a three-dimensional steady Navier-Stokes equation for a flow around a blunt body. The convergence criterion, based on the $L^2$ norm of the residue, is $\|F - \mathcal{A}X\| \leq 10^{-10}$. The dimension of the matrix is 17830, and the number of nonzeros is 1011600.

The ILU approach has been chosen in this paper. We compute $\tilde{L}\tilde{U}$, the incomplete factorization of $A$, and $\tilde{A}^{-1} = \tilde{U}^{-1}\tilde{L}^{-1}$.

The next problem is to compute an approximate inverse of $BA^{-1}B^t$. Roux in [21] suggested approximating $(BA^{-1}B^t)^{-1}$ with $BAB^t$. The performance of this approximation is not very good. We prefer here to approximate the solution of $BA^{-1}B^t \lambda = f_\lambda$ with a few iterations of the GMRES($l$) algorithm. Figure 2 shows the efficiency of GMRES ($l$ = three and eight iterations) versus $BAB^t$ for the same matrix as above.

REMARK. The condition number of $M_\alpha \mathcal{A}$, approximated by the ratio of the norm of the highest eigenvalue over the lowest, reaches its minimum for $\alpha = -1$. $M_{-1}$ can be seen as an approximation of the Gauss factorization where nondiagonal blocks (interface connections) are left out.

Figure 3 shows the effect of the $\alpha$ parameter on convergence. As foreseen, $\alpha = -4$ gives the best result, closely followed by $\alpha = -1$ (see remark above). The influence of the number of subdomains is seen in Figure 4: the number of iterations increases from 200 to 300 when the global matrix is split from two domains into eight subdomains, showing the need for a better treatment of the interface.
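The two approximations described above can be combined into a single preconditioning operator for the outer GMRES iteration on system (7). The sketch below is an illustrative assumption (not the authors' code): SciPy's spilu stands in for the incomplete factorization of $A$, a fixed number of inner GMRES iterations stands in for $(\alpha BA^{-1}B^t)^{-1}$, and the names, drop tolerance, and fill factor are placeholders.

    import numpy as np
    import scipy.sparse.linalg as spla

    def kuznetsov_preconditioner(A, B, alpha=-4.0, inner_iters=5):
        """Approximate application of M_alpha = diag(A, alpha*B A^{-1} B^t)^{-1}."""
        n, m = A.shape[0], B.shape[0]
        ilu = spla.spilu(A.tocsc(), drop_tol=1e-3, fill_factor=10)   # ~A^{-1}
        S = spla.LinearOperator((m, m), matvec=lambda y: B @ ilu.solve(B.T @ y))

        def apply(r):
            rx, rl = r[:n], r[n:]
            zx = ilu.solve(rx)                                       # block 1: ILU solve
            zl, _ = spla.gmres(S, rl / alpha,                        # block 2: GMRES(l)
                               restart=inner_iters, maxiter=1)
            return np.concatenate([zx, zl])
        return spla.LinearOperator((n + m, n + m), matvec=apply)

    # usage with a saddle-point system K, rhs assembled as in Section 2:
    # M = kuznetsov_preconditioner(A_block_diag, B_global)
    # x, info = spla.gmres(K, rhs, M=M, restart=30)

Because the inner solve makes the preconditioner slightly nonconstant from one application to the next, a fixed, small inner iteration count (as here) or a flexible GMRES variant is the usual precaution.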
Figure 2. Comparison between preconditioners for $BA^{-1}B^t$ ($BAB^t$ versus 3 and 8 GMRES iterations).

Figure 3. Influence of the $\alpha$ parameter for the Kuznetsov block preconditioner (blunt body, 4 domains; number of matrix-vector products).

4. GAUSS APPROXIMATION
The block-diagonal approach of an algebraic preconditioner leaves out the interaction between the subdomains. So, it is very difficult to obtain a preconditioner which gives a convergence rate independent of the amount of subdomains. Using a Gauss elimination approach, we can obtain a preconditioner similar to the block-diagonal preconditioner, but which is independent of the number of subdomains.
Figure 4. Subdomain number influence for the Kuznetsov block preconditioner (2, 4, 6, and 8 CPUs; residual norm versus iterations).

Let
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix}$$
be the Schur dual formulation of the linear system. A Gauss factorization will give
$$\mathcal{A} = \begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ BA^{-1} & I \end{pmatrix} \begin{pmatrix} A & B^t \\ 0 & -BA^{-1}B^t \end{pmatrix}.$$
So, solving the following problem,
$$\begin{pmatrix} A & B^t \\ B & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix},$$
is equivalent to solving
$$\left( BA^{-1}B^t \right) \lambda = BA^{-1} f. \tag{13}$$
Once $\lambda$ is computed, $x$ can then be obtained as
$$x = A^{-1}\left( f - B^t \lambda \right). \tag{14}$$
As computing the exact inverse of $A$ is too expensive, we compute, as in the previous section, an approximate inverse of $A$ with ILU, and $(BA^{-1}B^t)^{-1}$ is approximated with a few iterations of the GMRES algorithm as before.

The Gauss factorization approximation algorithm is presented below with two subdomains.
(1) Solve $\left( B_1 A_1^{-1} B_1^t + B_2 A_2^{-1} B_2^t \right) \lambda = B_1 A_1^{-1} f_1 + B_2 A_2^{-1} f_2$.
(2) Compute $x_i = A_i^{-1}\left( f_i - B_i^t \lambda \right)$, $i = 1, 2$.
The generalization to more subdomains is easy to make, as the matrix has the same structure as the two-subdomain matrix.
4.1. Numerical Tests

4.1.1. Numerical comparison between the two preconditioners

As can be seen in Figure 5, which compares the two preconditioners for two domains and for four domains, for the two main values of $\alpha$ (1, -4), the Gauss preconditioner is more efficient than the Kuznetsov preconditioner. Taking the interface into account also improves the independency of the Gauss method on the number of subdomains, as seen in Table 1.
Figure 5. Comparison Kuznetsov/Gauss, two domains and four domains (blunt body; Kuznetsov(1), Kuznetsov(-4), and Gauss; number of global matrix-vector products).

Table 1. Subdomain number influence for a 100 x 100 mesh.
The effect of the required precision when solving iteratively the approximate problem
$$BA^{-1}B^t \lambda = f_\lambda \tag{15}$$
is shown for the two preconditioners in Figure 6. Two comments can be made.
• A fine precision is not necessary. Five to eight iterations, corresponding to a precision varying between 0.1 and 0.05, are quite sufficient if an overall fine precision is desired.
• If a low precision is asked for the global problem, then a high precision in the solving of (15) can be of interest in terms of cost.

4.1.2. Validation of the Gauss preconditioner on a large test case: The M6 wing

The Gauss preconditioner was tested on matrices provided by a steady Navier-Stokes code computing the flow around the ONERA M6 wing, with 400,000 degrees of freedom and 33 million nonzero coefficients. The precision required to solve (15) is 0.1.
Figure 6. Precision of the interface solver (Kuznetsov and Gauss, blunt body, two and four domains; number of global matrix-vector products).

Figure 7. M6 wing: subdomain number influence (Gauss, 8 and 16 domains).

Figure 8. M6 wing: time integration influence (Gauss, 16 domains; different time steps).
In Figure 7, the convergence history for 8 and 16 domains is the same, enhancing the efficiency of the parallelism of the Gauss preconditioner. In Figure 8, in the case of 16 domains, convergence for different time steps shows the robustness of the method.

4.2. Reducing the Costs

The main cost of these preconditioners resides in the repeated iterative computation of (15), the cost of each iteration consisting of an upward-downward sweep of the ILU factorization (a cost equivalent to 1.5 global matrix-vector products). Search vectors in the GMRES algorithm are defined by the size of the interface problem, which is usually small when compared to the global problem. Thus, for a small increase in memory space, if these search vectors are stored, the cost of the next global iteration step, consisting in solving (15) again with another right-hand side, can be reduced by projecting the initial residue on the family of former search vectors. Figure 9 clearly shows the gain in cost.
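A minimal sketch of this reuse of search vectors (an assumption about one reasonable implementation, not the authors' code): before solving (15) with a new right-hand side, an initial guess is obtained by a Galerkin projection onto the span of the stored directions $V = [v_1, \dots, v_k]$, so the subsequent GMRES($l$) run starts from a residual already reduced in those directions.

    import numpy as np

    def project_initial_guess(S_matvec, V, rhs):
        """Initial guess for S x = rhs from stored search vectors V (n x k)."""
        SV = np.column_stack([S_matvec(v) for v in V.T])   # S applied to each stored vector
        y = np.linalg.solve(V.T @ SV, V.T @ rhs)           # small k x k Galerkin system
        x0 = V @ y
        return x0, rhs - SV @ y                            # projected guess and its residual

The inner solver is then started from x0 instead of zero, and the new search vectors it generates can be appended to V for the next right-hand side.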
Figure 9. Interface preconditioner: blunt body, CFL = 0.02, 4 domains, eps = 0.05, with and without projection (number of matrix-vector products).
5. CONCLUSION

Two preconditioners for the saddle-point formulation of the dual Schur domain decomposition method have been presented. The first is an extension to nonsymmetric matrices of the Kuznetsov preconditioner; the second is an approximation of the Gauss factorization of the initial matrix. The numerical tests show that the Gauss preconditioner, which takes interface connections into account, is, in accordance with the theory, better than the Kuznetsov block preconditioner. Validation on a large test case has shown independence of the convergence versus the number of subdomains for the Gauss technique.

We have used here an iterative method to approximate the solution of the dual Schur problem, but one could use any appropriate method. The Gauss preconditioner, though better in terms of iterations, could become prohibitive in terms of cost. In this case, one would prefer the modified Kuznetsov preconditioner with $\alpha = -4$,
which is no more expensive than the original Kuznetsov preconditioner for nonsymmetric matrices, but we will lose independence of convergence versus the number of subdomains.

These methods are generic. At each step of these methods, any algorithm (direct or iterative) can be used to approximate the solution of the local or the dual Schur equation. When computing approximate inverses, too weak an accuracy will deteriorate the overall convergence. But numerical tests have shown that a relative accuracy of $10^{-1}$ (requiring around five to seven inner iterations) was quite sufficient, and that greater accuracy did not improve the good rate of convergence. The cost of this inner solver can be reduced by storing former search vectors and projecting the initial residue onto this set of directions.

The next step to improve the dual Schur domain decomposition method will be to combine these preconditioners with "classical ones" such as coarse grid preconditioners.
REFERENCES
1. C. Farhat and F.X. Roux, Implicit parallel processing in structural mechanics, Computational Mechanics Advances 2, (1994).
2. P. Le Tallec, Y.H. De Roeck and M. Vidrascu, Domain decomposition methods for large linearly elliptic three dimensional problems, J. Comput. Appl. Math. 34, 93-117, (1991).
3. V. Agoshkov, Poincaré-Steklov's operators and domain decomposition methods in finite dimensional spaces, In Proc. First Int. Symp. on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, (1988).
4. J.F. Bourgat, R. Glowinski, P. Le Tallec and M. Vidrascu, Proc. Second Int. Symp. on Domain Decomposition Methods for Partial Differential Equations, SIAM, Philadelphia, PA, (1989).
5. P. Morice, Transonic computations by a multidomain technique with potential and Euler solvers, In Transsonicum III, IUTAM Symp. Göttingen, Springer-Verlag, Berlin, (1989).
6. Y.H. De Roeck, P. Le Tallec and M. Vidrascu, A domain decomposition solver for nonlinear elasticity, Comput. Methods Mech. Engrg. 99, 187-207, (1992).
7. P. Le Tallec and M. Vidrascu, Algorithme de décomposition de domaines pour résoudre les systèmes linéaires de la mécanique des fluides, Rapport de contrat STPA, INRIA, (1993).
8. J. Mandel, Two level domain decomposition preconditioning for the p-version finite element method in three dimensions, Internat. J. Numer. Methods Engrg. 29, 1095-1108, (1990).
9. J. Mandel, Balancing domain decomposition, Commun. Numer. Methods 9, 233-241, (1993).
10. J. Bramble, J. Pasciak and A. Schatz, The construction of preconditioners for elliptic problems by substructuring, IV, Math. Comput. 53, 1-24, (1989).
11. B. Smith, A domain decomposition algorithm for elliptic problems in three dimensions, Numer. Math. 60, 219-234, (1991).
12. B. Smith, An optimal domain decomposition preconditioner for the finite element solution of linear elasticity problems, SIAM J. Sci. Stat. Comput. 13, 364-378, (1992).
13. B. Smith, A parallel implementation of an iterative substructuring algorithm for problems in three dimensions, SIAM J. Sci. Stat. Comput. 14, 406-423, (1992).
14. J. Ryan and F.X. Roux, Extension of the dual Schur method into a multi-domain, multi-level, hierarchical solver, In Proceedings of "Algebraic Multilevel Iteration Methods with Applications", University of Nijmegen, (1996).
15. B. Smith, P. Bjørstad and W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, (1996).
16. Y. Saad and M.H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, Rapport technique no. 254, Yale University, Department of Computer Science, (August 1983).
17. Y.A. Kuznetsov, Efficient iterative solvers for elliptic finite element problems on nonmatching grids, Russian Journal of Numerical Analysis and Mathematical Modelling 10, (1995).
18. T. Huckle and M. Grote, A new approach to parallel preconditioning with sparse approximate inverses, Technical Report, Stanford University, (1994).
19. X. Juvigny, Résolution de grands systèmes linéaires sur machines massivement parallèles, Ph.D. Thesis, Université de Paris VI, (1997).
20. Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing, (1996).
21. F.X. Roux, Méthode de décomposition de domaine à l'aide de multiplicateurs de Lagrange et application à la résolution en parallèle des équations de l'élasticité linéaire, Ph.D. Thesis, Université de Paris VI, (1989).