Computers and Mathematics with Applications (
)
–
Contents lists available at ScienceDirect
Computers and Mathematics with Applications journal homepage: www.elsevier.com/locate/camwa
Comparison of different FETI preconditioners for elastoplasticity Martin Čermák a,∗ , Václav Hapla a , Jakub Kružík a , Alexandros Markopoulos a,b , Alena Vašatová a a
IT4Innovations National Supercomputing Center, VŠB—Technical University of Ostrava, 17. listopadu 15/2172, Ostrava, Czech Republic
b
Faculty of Mechanical Engineering, VŠB—Technical University of Ostrava, 17. listopadu 15/2172, Ostrava, Czech Republic
article
info
Article history: Available online xxxx Keywords: Domain decomposition TFETI Elastoplasticity Preconditioning PERMON
abstract This paper illustrates parallel solution of elastoplastic problems with hardening based on the TFETI domain decomposition method with several preconditioning strategies. We consider von Mises plasticity with isotropic hardening using the return mapping concept for one time step. To treat nonlinearity and nonsmoothness, we use the semismooth Newton method. In each Newton iteration, we solve a linear system of equations using the TFETI domain decomposition method with lumped, Dirichlet, or no preconditioner. Our PERMON software is used for the numerical experiments. The observed times and numbers of iterations are backed up by the regular condition number estimates. Both preconditioners accelerate the solution process in terms of the convergence rate. For the worst conditioned benchmark, the most expensive Dirichlet preconditioner gives the lowest computational times. This lets us suggest the Dirichlet preconditioner as the default choice for engineering problems. © 2017 Elsevier Ltd. All rights reserved.
1. Introduction The goal of this paper is to describe and compare various preconditioners for the TFETI (Total Finite Element Tearing and Interconnecting) domain decomposition method [1,2] in the context of elastoplastic problems. The paper loosely follows up [3] and uses the same methodology to state the nonlinear problem and its linearization using Newton method. Let us briefly remind the fundamental features of the approach. The employed elastoplasticity model assumes small strains, von Mises criterion, and isotropic hardening; the elastoplasticity problem is considered quasi-static, and hence a pseudotime stepping strategy is required; the implemented strategy is the implicit Euler method; the predictor–corrector concept is implemented using the so-called return mapping (Section 2). The continuous problem in each pseudo-time step is spatially discretized using the Finite Element Method (FEM) resulting in a discrete nonlinear equation system; this system is linearized using the semismooth Newton method (Section 2). The resulting linear systems are solved in a scalable manner using a domain decomposition method; more specifically, the Total FETI (or TFETI) method [2] is utilized (Section 3). See [3] for a more detailed overview. In this paper, we focus mainly on a comparison of FETI preconditioners, namely the lumped and Dirichlet preconditioners, in the context of solving elastoplastic problems with the strategy above. These two preconditioners were introduced by Farhat and Roux [1,4,5]. They have been originally designated for elastic problems solved by the FETI domain decomposition
∗
Corresponding author. E-mail address:
[email protected] (M. Čermák).
http://dx.doi.org/10.1016/j.camwa.2017.01.003 0898-1221/© 2017 Elsevier Ltd. All rights reserved.
2
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
method. However, it is possible to use these preconditioners also for elastoplastic problems despite their nonlinearity as shown in Section 3. Other preconditioners for FETI methods are not known, except of the so-called superlumped preconditioner [6] which is even cheaper and less effective approximation of the lumped preconditioner. There are also extensions for problems with coefficient jumps [7]. For numerical experiments, we have used PERMON [8], implementing FETI-type domain decomposition and quadratic programming solvers; it is an open source software developed by our team at IT4Innovations (Section 4). Within numerical experiments (Section 5), we have made use of three benchmarks: a unit cube decomposed regularly, a unit cube decomposed by METIS, and a cuboid decomposed regularly. The observed times and numbers of iterations are backed up by regular condition number estimates. 2. Elastoplastic problem The elastoplastic problem is a quasi-static, rate-independent, small strain problem in which we consider associated elastoplasticity with the von Mises plastic criterion and isotropic hardening law (see e.g. [9–11]). Let us consider a deformable body occupying domain Ω ⊂ R3 with Lipschitz continuous boundary Γ = ∂ Ω . The state of the body during the loading process could be described by Cauchy stress tensor σ ∈ S, displacement u ∈ R3 and small ×3 strain tensor ε ∈ S. Here S = R3sym is the space of all symmetric second order tensors. More details can be found in [11]. The mentioned variables depend on spatial variable x ∈ Ω and time variable t ∈ [t0 , t ∗ ]. The equilibrium equation in the quasi-static case could be described as
− div(σ (x, t )) = g (x, t ) ∀(x, t ) ∈ Ω × [t0 , t ∗ ],
(1)
where g (x, t ) ∈ R represents the volume force acting at point x ∈ Ω and time t ∈ [t0 , t ]. Let boundary Γ be fixed on its part ΓU that has a nonzero Lebesgue measure with respect to Γ ; i.e. we prescribe the homogeneous Dirichlet boundary condition on ΓU : ∗
3
u(x, t ) = 0 ∀(x, t ) ∈ ΓU × [t0 , t ∗ ].
(2)
On the rest of boundary Γ , denoted by ΓN = Γ \ ΓU , we prescribe the Neumann boundary conditions
σ (x, t )n(x) = f (x, t ) ∀(x, t ) ∈ ΓN × [t0 , t ∗ ],
(3)
where n(x) denotes the exterior unit normal and f (x, t ) denotes the prescribed surface forces. For a weak formulation of the investigated problems, it is sufficient to introduce the space of kinematically admissible displacements, V = v ∈ [H 1 (Ω )]3 : v = 0 on ΓU .
(4)
The elastoplastic initial-value constitutive model consists of the following components (see e.g. [9,11,10]): 1. Additive decomposition of the strain tensor into the elastic and plastic parts:
ε = εe + εp .
(5)
2. The linear elastic law between the stress and the elastic strain:
σ = Cε e ,
(6)
where C is the fourth order elasticity tensor. 3. The von Mises yield function coupled with an isotropic hardening variable κ :
Φ (σ , κ) =
3 2
∥dev(σ )∥F − (σy + Hm κ) ≤ 0,
(7)
where σy , Hm > 0 denotes the initial yield stress and the hardening modulus, respectively. And dev(η) represents the deviatoric part of tensor η ∈ S and ∥ · ∥F denotes the Frobenius norm on space S. 4. The associated plastic flow rule:
∂Φ ε˙ = γ˙ = γ˙ ∂σ p
3
dev(σ )
2 ∥dev(σ )∥F
,
γ˙ ≥ 0,
(8)
where ε˙ p and γ˙ denote the time derivative of the plastic strain and the plastic multiplier, respectively. 5. The hardening law based on the accumulated plastic strain rate:
κ˙ =
2 3
∥˙ε p ∥F = γ˙ .
Notice that the second equality in (9) follows from (8).
(9)
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
3
6. The loading/unloading conditions:
γ˙ ≥ 0,
Φ (σ , κ) ≤ 0,
γ˙ Φ (σ , κ) = 0.
(10)
7. The initial conditions:
ε(x, t0 ) = ε e (x, t0 ) = ε p (x, t0 ) = σ (x, t0 ) = 0,
κ(x, t0 ) = 0,
x ∈ Ω.
(11)
All notations and descriptions of the elastoplastic model for the von Mises yield criterion with isotropic hardening can be found in e.g. [12–15]. Let us consider the following discretization of the time interval t0 < t1 < · · · < tk < · · · < tN = t ∗ . Let us denote σk = σk (x) = σ (x, tk ) and κk = κk (x) = κ(x, tk ), x ∈ Ω . To approximate the time derivatives, we use the implicit Euler method, often used in mathematical and engineering literature; see e.g. [9,11]. Let us define stress operator Tσ (σ , κ; ·) : S → S and hardening operator Tκ (σ , κ; ·) : S → S, both with time-dependent parameters σ ∈ S, κ ∈ R+ . For every η ∈ S Tσ (σ , κ; η) = Cη − Tκ (σ , κ; η) =
3µ
3µ + Hm
1 3µ + Hm
2 3
Φ + (σ + Cη, κ)ˆn(σ + Cη),
Φ + (σ + Cη, κ),
(12) (13)
where Φ + denotes the positive part of function Φ , and nˆ (σ ) =
dev(σ )
∥dev(σ )∥F
,
σ ∈ S;
(14)
see [9–11]. We denote stress operator Tσ (σk , κk ; ·) with respect to current parameters σk and κk by Tk (·). Operator Tk : S → S is potential, Lipschitz continuous, strongly monotone, and strongly semismooth on S; see [10,16,17]. Furthermore, we focus on the finite element discretization denoted by Th . Space V is approximated by its subspace Vh of piecewise linear and continuous functions. Therefore, the strains, the stress, and the isotropic hardening are approximated by piecewise constant functions. The finite element implementation details for elastoplastic problems can be found in [10,17]. After the time and space discretization, we can formulate the one time step elastoplastic problem. Let σk,h , κk,h be piecewise constant stress and hardening variables with respect to triangulation Th at time tk obtained from the previous time step. Problem 1 (One Time Step Elastoplastic Problem). Find displacement uk+1,h = uk,h + △uk+1,h ∈ Vh , where increment
△uk+1,h ∈ Vh solves the variational equations ⟨Tk,h (ε(△uk+1,h )), ε(vh )⟩F dx = △gkT+1 vh dx + Ω
Ω
ΓN
△fkT+1 vh ds ∀vh ∈ Vh ,
(15)
where Tk,h (.) = Tσ (σk,h , κk,h ; .) and ⟨., .⟩F denotes the Frobenius scalar product. Set stress and isotropic hardening fields σk+1,h = σk,h + △σk+1,h , κk+1,h = κk,h + △κk+1,h in next time step tk+1 from the relations
△σk+1,h = Tσ (σk,h , κk,h ; ε(△uk+1,h )),
△κk+1,h = Tκ (σk,h , κk,h ; ε(△uk+1,h ))
(16)
for every element of Th . For the sake of simplicity, we do not consider the finite element approximation of fk and gk . We can define generalized derivative Tko,h of Tk,h and consequently define approximated bilinear form ak,h (uh ) for uh ∈ Vh by ak,h (uh )(wh , vh ) =
Ω
⟨Tko,h (ε(uh ))ε(wh ), ε(vh )⟩F dx,
vh , wh ∈ Vh .
(17)
Here, Tko,h (ε(uh ))ε(wh ) can be represented as
[Tko,h (ε(uh ))ε(wh )]ij = [Tko,h (ε(uh ))]ijlm [ε(wh )]lm , where i, j, l, m ∈ {1, 2, 3} are indices of the given tensors. Let us note that the previously introduced variables also have their algebraic representations: C , Tk,T , and Tko,T are the algebraic representation of Hooke tensor C, the restriction of function Tk,h on T ∈ Th , and the restriction of function Tko,h on T ∈ Th , respectively. We define nonlinear operator Fk : Rn → Rn which corresponds to the left hand side in (15) as Fk (v ) =
T ∈Th
Tk,T (GT RT v )
T
GT RT ,
v ∈ Rn ,
(18)
4
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
where GT represents the algebraic relation between the strain and displacement, and RT is the restriction operator extracting displacements of element T from vector of all displacements u ∈ Rn . Furthermore, we define tangential and elastic stiffness matrices Kk (v ) =
Tko,T (GT RT v ), GT RT
T
G T RT ,
Kk (v ) ∈ Rn×n , v ∈ Rn ,
(19)
T ∈Th
Ke =
(CGT RT )T GT RT ,
Ke ∈ Rn×n .
(20)
T ∈Th
Now, we can reformulate the continuous Problem 1 into Problem 2 (Discrete One Time Step Elastoplastic Problem). Find increment of displacement △uk+1 ∈ V so that v T (Fk (△uk+1 ) − △fk+1 ) = 0 ∀v ∈ V ,
(21)
where △fk+1 is the increment of the load vector, Fk is the nonlinear operator, and space V is the set of admissible displacements V = v ∈ Rn |BU v = o .
The relation BU v = o represents the Dirichlet boundary conditions. The nonlinear equations (21) can be linearized by the semismooth Newton method (Algorithm 1). Algorithm 1 (Semismooth Newton method). 1: 2: 3: 4: 5:
initialization: △ uk,0 = o
for i = 0, 1, 2, . . . do
△uk,i ) find δ ui ∈ V : Kk,iδ ui = △f k+1 − Fk (△ compute △ uk,i+1 = △ uk,i + δ ui
△uk,i+1 − △ uk,i ∥/(∥△ △uk,i+1 ∥ + ∥△ △uk,i ∥) ≤ ϵNewton then stop if ∥△
6:
end for
7:
set △ uk+1 = △ uk,i+1
In each Newton step, a linear system has to be solved (line 3 of Algorithm 1) corresponding to solving a discretized linear elasticity problem; in case of a contact problem, a quadratic programming problem is solved [18,19]. Matrix Kk,i is defined as Kk,i = Kk (△uk+1,i ) using (19). The prescribed Dirichlet boundary conditions are implemented in a standard way (e.g. by omitting the corresponding rows and columns of Kk,i ), or postponed to be handled by the TFETI method (next section). Let us focus further on one Newton step (fixed i) only and one time step (fixed k) and simplify our notation K˜ = Kk,i
˜ = δ ui u
f˜ = △fk+1 − Fk (△uk,i ).
Any linear system solver can be used to solve the system
˜ = f˜ . K˜ u
(22)
3. TFETI method To apply the Total FETI (TFETI) domain decomposition method (DDM), we tear the body from the part of the boundary with the Dirichlet boundary condition, decompose it into subdomains, assign each subdomain a unique number, and introduce new ‘‘gluing’’ conditions on the artificial subdomain interfaces and on the boundaries with the imposed Dirichlet condition. Let NS denote the total number of the subdomains. 3.1. Dual DDM and the constraint matrix In the case of any non-overlapping DDM, global stiffness matrix K˜ and right-hand side f˜ is assembled only subdomainwise but left unassembled across the subdomains so that they can be expressed as K˜ = LKLT ,
f˜ = Lf,
(23)
where L is a rectangular Boolean matrix with more columns than rows, expressing the assembly of the subdomains on the interface, and K and f take the form K = diag(K1 , . . . , KNS ), f = [(f1 )T , . . . , (fNS )T ]T
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
5
where Ks and fs , s = 1, . . . , NS , is the fully assembled stiffness matrix and load vector of the subdomain Ω s , respectively. ˜ can be expressed as Then also u
˜ = Lu, u u = [(u1 )T , . . . , (uNS )T ]T , where us , s = 1, . . . , NS , is the vector of displacements of subdomain Ω s . Global decomposed stiffness matrix K has each interface degree of freedom (DOF) duplicated into all its neighbouring subdomains and the duplicated DOFs are decoupled from each other. TFETI is a dual approach which means the interface compatibility is enforced by introducing Lagrange multipliers. In the case of TFETI, also the Dirichlet boundary conditions are enforced in this way whereas Ks remains unchanged, and hence it is singular. The resulting equality constraints can be written as Bu = o, where B is a signed Boolean matrix. It can be split vertically into two blocks corresponding to gluing part BG and Dirichlet part BU , and at the same time horizontally into blocks Bsb , s = 1, . . . , NS , where each block Bsb corresponds to subdomain Ω s and consists of gluing part BsG and Dirichlet part BsU . Overall, B possesses the following structure:
BG B= BU
=
N
B1G . . . BGS
...
B1U
= B1 . . . BNS .
N BUS
The matrix BG is constructed in such a way that for each pair of connected DOFs, a row with exactly two appropriately placed nonzero values -1 and 1 is added to BG . For the numerical and implementation reasons, it is recommended to use fully redundant connections, √ i.e. DOFs are connected all to all within each group of duplicated DOFs. Then the ±1 values of BG must be scaled by 1/ m where m is the multiplicity (number of DOFs in the group). Note that [4] introduces this scaling as a part of the FETI preconditioners. However, it makes sense even if preconditioning is not used since it slightly improves conditioning on its own. Similarly, for each Dirichlet boundary DOF, BU contains a separate row with the appropriately placed value of 1. Notice BsU is a zero matrix if the boundary of Ω s does not intersect with the Dirichlet boundary. For more details, see [20]. For the sake of simplicity, let us assume that the boundary (interface) DOFs are numbered last, and index sets i and b correspond to the internal and boundary DOFs, respectively. In the TFETI approach, index set b includes the DOFs on the Dirichlet boundary. Stiffness matrix K can then be divided into blocks Ks =
Ksii
Ksib
Ksbi
Ksbb
.
(24)
Similarly, subdomain constraint matrix Bs can be rewritten as s
B = O
Bsb
O
BsGb
O
BsUb
=
(25)
where Bsb is formed by the columns corresponding to the interface DOFs while the remaining columns are all zero and correspond to the internal DOFs as the constraint matrix acts on the interface only. 3.2. Algebraic formulation Introducing Lagrange multipliers λ to enforce the interface compatibility, the global problem (22) takes the equivalent primal–dual form
K B
BT O
u
λ
=
f . o
(26)
Let us establish notation F = BKĎ BT ,
G = −RT BT ,
d = BKĎ f,
e = −RT f.
KĎ denotes a generalized inverse of K, satisfying KKĎ K = K. Notice KĎ inherits the block-diagonal structure from K and each block (KĎ )s is the generalized inverse of Ks . We obtain a new linear system with unknowns λ and α
F G
GT O
λ d = . α e
(27)
The second equation Gλ = e can be homogenized by finding the particular solution λIm satisfying GλIm = e and expressing λ as
λ = λIm + λKer ,
(28)
6
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
Fig. 1. Geometry.
where λKer ∈ Ker G. A natural choice of λIm is the least-square solution
λIm = GT (GGT )−1 e. After carrying out the substitution (28), only part λKer is unknown while λIm is the given constant vector. Hence, term FλIm can be moved to the right-hand side, yielding the new formulation
F G
GT O
λKer d , = α o
(29)
where d = d − FλIm . Furthermore, the second equation of (29) can be eliminated using projector PG onto null space of G. It relates to the projector QG onto the image of GT : PG = I − QG . It holds that Im PG = Ker QG = Ker G and Ker PG = Im QG = Im GT . This implies PG GT = O. Additionally, since it trivially holds that GλKer = o, (29) simplifies into PG FλKer = PG d.
(30)
Once λKer is found, λ can be reconstructed easily using (28) again. The most common choice of QG is QG = GT HG, where H = (GGT )−1 . The action of H is often called the coarse problem of FETI. In this case, QG and PG are orthogonal projectors. The problem (30) can be solved by the conjugate gradient method (CG). CG is a good choice thanks to the classical estimate of the spectral condition number by Farhat, Mandel, and Roux [4]:
κ(PG F|ImPG ) ≤ C
H h
,
(31)
where H is the subdomain size and h is the mesh element size, both in one dimension; see Fig. 1. 3.3. Preconditioners We now briefly summarize the FETI preconditioners introduced in [4]. Let us remind that it is very important to scale matrix BG as noted in Section 3.1 in order to obtain the desired preconditioning effect. The Dirichlet preconditioner reads 1 F− D =
Ns
Bs
Ns T O s T B = Bsb Ssbb Bsb = BSBT , s Sbb
s=1
O O
(32)
s=1
where Ssbb is the Schur complement of block Ksii with respect to subdomain stiffness matrix Ks ,
−1
T
O O
O O ,..., S1bb O
Ssbb = Ksbb − Ksib
Ksii
Ksib ,
(33)
and
S = diag
O N
SbbS
.
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
7
The Dirichlet preconditioner is numerically scalable. The second ‘‘lumped’’ preconditioner reads 1 F− = L
Ns
Bs
s =1
O O
Ns T O s T B = Bsb Ksbb Bsb = BKBT . s Kbb
(34)
s=1
O O s O Ksbb may be replaced by K since the zero blocks meet the zero blocks of B anyway, see (25). The lumped preconditioner is not numerically scalable, but it is more economical than the Dirichlet preconditioner. Finally, the problem (30) can be solved by the standard preconditioned conjugate gradient (PCG) method equipped with 1 −1 1 one of the two preconditioners of F− from (34). Using F− D from (32) or FL D , the original condition number estimate (31) improves as [5] Note that matrix
1 2 H . κ(PG F− P F | ImP ) ≤ C 1 + log G G D h
(35)
1 −1 With F− L , no such theoretical improvement has been proven. On the other hand, it is much cheaper than FD . In this paper, we compare them on elastoplasticity benchmarks.
4. PERMON toolbox and its PermonFLLOP module PERMON (Parallel, Efficient, Robust, Modular, Object-oriented, Numerical simulations) [8] forms a collection of software libraries, uniquely combining quadratic programming (QP) algorithms, and domain decomposition methods (DDM), built on top of the well-known PETSc framework [21,22] for numerical computations. The main applications include contact problems of mechanics. Our PermonFLLOP package [23] is focused on non-overlapping DDM of the FETI type, allowing efficient and robust utilization of contemporary parallel computers for problems with billions of unknowns. Any FEM software can be used to generate a mesh and assemble the stiffness matrices and load vectors for each subdomain independently. Additionally, the mapping from the local to global numbering of DOFs is needed, and non-penetration and friction information in the case of contact problems. All these data are passed to PermonFLLOP, which prepares auxiliary data needed in the DDM. PermonQP [24] is then called internally to solve the resulting equality constrained problem with additional inequality constraints in the case of contact problems. PETSc provides a special matrix type of MATSCHURCOMPLEMENT which we can take advantage of to implement the Dirichlet preconditioner (32). Function MatGetSchurComplement allows us to get an exact or approximate Schur complement. The latter approximates (Ksii )−1 with diag(1./diag(Ksii )) or diag(1./sum(Ksii )) (in MATLAB syntax). However, our initial experiments have shown that with these approximations the resulting preconditioner leads to loss of robustness and we have omitted them for this paper. 5. Numerical experiments This section is divided into three parts demonstrating the effect of preconditioning on benchmarks, and the final part devoted to the basic eigenvalue analysis. The first one is the unit cube with regular decomposition into subdomains. The second one is the same cube but partitioned using METIS [25]. The last benchmark is a cuboid, i.e. with regular decomposition but different numbers of subdomains along x, y, and z axes. For all cases the same material parameters and stopping criteria are assumed. Elastoplastic body Ω is made of a homogeneous isotropic material with parameters E = 200 GPa, ν = 0.33, σy = 450 MPa, and Hm = 100 GPa where E and ν are the Young modulus and the Poisson ratio, respectively. The stopping tolerances of the Newton and the CG algorithms are ϵNewton = 10−4 and ϵCG = 10−6 , respectively. All numerical experiments were performed using the MareNostrum [26] supercomputer operated by Barcelona Supercomputing Center. MareNostrum consists of 3056 compute nodes. Each compute node contains two 2.6 GHz, 8-core Intel E5-2670 (Sandy Bridge) processors and at least 32 GB of memory. Compute nodes are interconnected by Infiniband FDR-10. MareNostrum has the peak performance of 1.1 petaflops. 5.1. A unit cube decomposed regularly Performance of the described preconditioners is demonstrated on an elastoplastic homogeneous cube with the dimensions 1×1×1 mm (see Fig. 1) with the prescribed zero Dirichlet conditions on the lower face (z = 0) and the Neumann boundary conditions g (t ) = pz = 1000 MPa on the upper face (z = 1). Fig. 1 also depicts one possible decomposition and the meaning of discretization parameter h and decomposition parameter H. To test the weak numerical and parallel scalability, domain Ω has been partitioned regularly into 43 = 64 to 163 = 4096 subdomains. Three levels of mesh granularity have been used: 14,739; 27,783; 46,875 DOFs per subdomain corresponding to 163 ; 203 ; 243 elements per subdomain, respectively. The number of DOFs per subdomain has been kept constant for each
8
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
Table 1 Overview of the numbers of DOFs, elements, and subdomains for the unit cube with regular decomposition. Nx # cores
6 216
8 512
10 1 000
12 1 728
14 2 744
16 4 096
4,096,000 12,519,843
7,077,888 21,567,171
11,239,424 34,171,875
16,777,216 50,923,779
8,000,000 24,361,803
13,824,000 41,992,563
21,952,000 66,564,123
32,768,000 99,228,483
13,824,000 41,992,563
23,887,872 72,412,707
37,933,056 114,818,259
56,623,104 171,199,875
163 elements and 14,739 DOFs per subdomain # elements # DOFs
884,736 2,738,019
2,097,152 6,440,067
203 elements and 27,783 DOFs per subdomain # elements # DOFS
1,728,000 5,314,683
4,096,000 12,519,843
243 elements and 46,875 DOFs per subdomain # elements # DOFs
2,985,984 9,145,875
7,077,888 21,567,171
Fig. 2. Unit cube with regular decomposition: 163 elements per subdomain (2(a), (b)), 203 elements per subdomain (2(c), (d)), and 243 elements per subdomain (2(e), (f)) solution time (left) and total number of the CG iterations (right).
level. The total number of DOFs ranges from 823,875 to 171,199,875. Parameters Nx , Ny , Nz say how many subdomains there are in the x, y, and z directions, respectively. In this first benchmark, Nx = Ny = Nz . For more details, see Table 1. In Fig. 2(a)–(f), the performance results for different preconditioners, decompositions, and numbers of elements per subdomain are listed. Not surprisingly, it can be seen that the Dirichlet preconditioner gives the smallest number of CG
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
9
Table 2 Unit cube with regular decomposition: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 163 elements per subdomain. Nx
4 6 8 10 12 14 16
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
60 67 69 67 68 68 69
94 100 102 100 101 102 103
47 50 50 48 48 48 48
71 71 71 69 69 69 69
33 36 36 35 35 35 35
45 45 44 43 43 43 44
Table 3 Unit cube with regular decomposition: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 203 elements per subdomain. Nx
4 6 8 10 12 14 16
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
67 73 74 75 74 74 74
102 108 111 112 111 112 111
53 57 56 56 55 55 54
80 81 81 80 79 79 78
37 38 37 39 38 38 37
48 50 49 47 47 46 46
Table 4 Unit cube with regular decomposition: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 243 elements per subdomain. Nx
4 6 8 10 12 14 16
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
72 77 81 82 80 81 81
112 118 122 124 121 121 123
61 62 63 62 61 61 61
91 92 91 91 89 89 90
39 40 41 41 41 40 41
53 54 54 53 50 50 50
iterations. Nevertheless, the lowest total time is obtained with the lumped preconditioner. This observation complies with the conclusion in [4]. The number of the Newton iterations was 5 in all cases. Fig. 2(b), (d) and (f) depict the total numbers of the CG iterations depending on the number of subdomains and number of elements per subdomain for three levels of preconditioning: none, lumped and Dirichlet. It shows the weak numerical scalability. The number of CG iterations is almost constant for the lumped and Dirichlet preconditioners. With no preconditioning, it increases moderately. Fig. 2(a), (c) and (e) show that with a higher number of elements per subdomain the total number of subdomains influences the computational time less significantly. For 163 elements per subdomain, for instance, the time increases approximately by a factor of 4 between 64 and 4096 subdomains, but for 243 elements per subdomain this factor is only about 2. This can be explained by the higher ratio of local problem complexity to coarse problem complexity. A further observation is that the Dirichlet preconditioning always pays off starting from a certain number of subdomains. But the overall winner is the lumped preconditioner. Tables 2–4 list the number of the CG iterations for the first (elastic behaviour) and last Newton steps for different preconditioner cases and different numbers of elements per subdomain. Fig. 3 presents the von Mises stress distribution. The whole body is in the plastic regime as shown in Fig. 3 because the von Mises stress distribution is greater than σy = 450 MPa. 5.2. A unit cube decomposed by METIS The second benchmark is the same cube as in the previous subsection but here we use nonregular decomposition into 43 = 64 to 103 = 1000 subdomains using the METIS [25] software. The numbers of DOFs per subdomain have been kept
10
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
Fig. 3. Distribution of von Mises stress [MPa].
Fig. 4. Unit cube with decomposition by METIS: 163 elements per subdomain (4(a), (b)), 203 elements per subdomain (4(c), (d)), and 243 elements per subdomain (4(e), (f)) solution time (left) and total number of the CG iterations (right).
almost constant (depending on the METIS output) and approximately equal to that in Table 1. The total number of DOFs ranges from 823,875 to 41,992,563. In Fig. 4(a)–(f), as in Section 5.1, the performance results for the different preconditioners, decompositions, and numbers of elements per subdomain are listed. Again, the Dirichlet preconditioner gives the smallest number of the CG iterations. Nevertheless, the lowest total time is again obtained with the lumped preconditioner. The number of the Newton iterations is again 5 in all cases. Tables 5–7 list the number of the CG iterations for the first (elastic behaviour) and last Newton steps for different preconditioner cases and different numbers of elements per subdomain.
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
11
Table 5 Unit cube with decomposition by METIS: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 163 elements per subdomain. Nx
4 6 8 10
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
126 167 188 199
198 250 278 298
113 148 167 172
171 211 238 248
78 102 109 118
107 128 134 145
Table 6 Unit cube with decomposition by METIS: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 203 elements per subdomain. Nx
4 6 8 10
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
124 168 177 175
189 249 266 262
105 149 159 157
159 212 230 225
70 96 106 103
91 127 134 125
Table 7 Unit cube with decomposition by METIS: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 243 elements per subdomain. Nx
4 6 8 10
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
109 175 195 196
170 269 294 297
97 159 177 174
145 231 256 256
63 99 113 116
86 134 149 150
Table 8 Overview of the numbers of DOFs, elements, and subdomains for the cuboid with regular decomposition. Ny # cores
6 216
8 512
10 1 000
12 1 728
14 2 744
16 4 096
2,097,152 6,464,835
4,096,000 12,558,483
7,077,888 21,622,755
11,239,424 34,247,475
16,777,216 51,022,467
8,000,000 24,422,103
13,824,000 42,079,323
21,952,000 66,682,143
32,768,000 99,382,563
13,824,000 42,079,323
23,887,872 72,537,555
37,933,056 114,988,107
56,623,104 171,421,635
163 elements and 14,739 DOFs per subdomain # elements # DOFs
884,736 2,751,987
203 elements and 27,783 DOFs per subdomain # elements # DOFS
1,728,000 5,336,463
4,096,000 12,558,483
243 elements and 46,875 DOFs per subdomain # elements # DOFs
2,985,984 9,177,195
7,077,888 21,622,755
5.3. A cuboid decomposed regularly Lastly, performance of the described preconditioners is demonstrated on an elastoplastic homogeneous cuboid of the dimensions 4 × 2 × 1 mm with the prescribed zero Dirichlet conditions on the lower face (z = 0) and the Neumann boundary conditions g (t ) = pz = 1000 MPa on the upper face (z = 1). Domain Ω has been partitioned regularly with the number of subdomains ranging from 8 × 4 × 2 = 64 to 32 × 16 × 8 = 4096. Each partition has consisted of 163 ; 203 ; 243 elements and 14,739; 27,783; 46,875 DOFs, respectively. The total number of DOFs ranges from 830,115 to 171,421,635. In this benchmark, Nx = 2Ny = 4Nz (see Table 8). In Fig. 5(a)–(f), the performance results for the different preconditioners, decompositions, and numbers of elements per subdomain are listed. In contrast to the results from Sections 5.1 and 5.2, the Dirichlet preconditioner delivers not only the lowest numbers of the CG iterations but also the shortest computational times, which can be explained by the stronger preconditioning effect in this worse conditioned case. The total numbers of the CG iterations are 3 times greater for no preconditioner than for the Dirichlet preconditioner. However, regarding the lumped preconditioner, the total numbers of the CG iterations as well as total times are almost the same as for no preconditioner. The number of the Newton iterations was again 5 in all cases. Tables 9–11 list the numbers of the CG iterations for the first (elastic behaviour) and last Newton steps for different preconditioner cases and different numbers of elements per subdomain.
12
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
Fig. 5. Cuboid with regular decomposition: 163 elements per subdomain (5(a), (b)), 203 elements per subdomain (5(c), (d)), and 243 elements per subdomain (5(e), (f)) solution time (left) and total number of the CG iterations (right). Table 9 Cuboid with regular decomposition: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 163 elements per subdomain. Ny
4 6 8 10 12 14 16
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
236 283 287 293 293 296 289
409 455 467 477 478 480 475
200 236 243 248 251 254 252
345 400 415 432 436 440 436
86 110 110 111 112 112 110
135 160 165 167 168 167 166
5.4. Condition numbers To support the results of the experiments above, we have computed the approximate condition numbers of the operators 1 −1 PG F, PG F− L PG F and PG FD PG F for all three benchmarks. Since the operator is singular in all cases, the condition number has been computed as the ratio of the estimates of the largest eigenvalue and the smallest nonzero eigenvalue using the Arnoldi process implemented in PETSc. The observed differences between the preconditioners within the number of the CG iterations reflect the varied condition numbers; cf. bold rows in Tables 2, 5 and 9 with Table 12. In the particular case of the
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
13
Table 10 Cuboid with regular decomposition: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 203 elements per subdomain. Ny
4 6 8 10 12 14 16
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
248 300 309 303 316 307 311
450 502 518 528 532 524 527
225 259 276 277 285 280 282
382 445 469 484 490 491 493
92 114 117 116 117 115 115
143 170 177 178 178 175 176
Table 11 Cuboid with regular decomposition: Comparison of number of the CG iterations for the first and last Newton steps for different preconditioners and 243 elements per subdomain. Ny
4 6 8 10 12 14 16
None
Lumped
Dirichlet
First iter.
Last iter.
First iter.
Last iter.
First iter.
Last iter.
295 312 330 337 330 336 335
490 534 579 572 561 572 573
265 279 297 304 301 306 307
427 486 522 533 533 544 547
103 118 122 122 119 120 119
151 177 187 187 185 187 187
Table 12 Comparison of the Jacobian condition number in the first and last Newton steps for different preconditioners and geometries, 163 elements per subdomain, 103 subdomains (cases corresponding to bold rows in Tables 2, 5 and 9). Preconditioner
None
Newton step
First
Last
Lumped First
Last
Dirichlet First
Last
Cube—regular decomp. Cube—METIS decomp. Cuboid
232.2 2790.2 4667.0
518.6 4571.4 10,047.0
106.3 1818.6 2440.4
200.7 2693.0 5578.4
64.8 737.8 517.2
74.7 864.1 826.7
cuboid benchmark, the regular condition number of the operator in the last Newton iteration equipped with the Dirichlet preconditioner is more than 12 times smaller than that of the unpreconditioned operator. This explains why the Dirichlet preconditioner is in this case cost-effective in spite of its expensiveness. 6. Conclusion In this paper, we have presented the TFETI domain decomposition method equipped with the classical preconditioners applied to solving an elastoplastic problem. The numerical results were carried out using the PERMON toolbox. It is obvious from these experiments that both preconditioners accelerate the solution process in terms of the convergence rate. More surprisingly, for the relatively simple but worse conditioned cuboid benchmark, the most expensive Dirichlet preconditioner gives the lowest computational times. Hence, it can be recommended for engineering problems. Acknowledgements We acknowledge PRACE for awarding us access to resource MareNostrum 3 based in Barcelona, Spain at BSC within the project ‘‘EXA2CT: EXascale Algorithms and Advanced Computational Techniques’’. The work was also supported by The Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project ‘‘IT4Innovations excellence in science—LQ1602’’ and from the Large Infrastructures for Research, Experimental Development and Innovations project ‘‘IT4Innovations National Supercomputing Center LM2015070’’, and by the internal student grant competition project SP2016/178 ‘‘PERMON toolbox development II’’. The authors acknowledge the Czech Science Foundation (GACR) project no. 15-18274S. References [1] C. Farhat, F.-X. Roux, A method of finite element tearing and interconnecting and its parallel solution algorithm, Internat. J. Numer. Methods Engrg. 32 (6) (1991) 1205–1227. http://dx.doi.org/10.1002/nme.1620320604.
14
M. Čermák et al. / Computers and Mathematics with Applications (
)
–
[2] Z. Dostál, D. Horák, R. Kučera, Total FETI – an easier implementable variant of the FETI method for numerical solution of elliptic PDE, Comm. Numer. Methods Engrg. 22 (12) (2006) 1155–1162. http://dx.doi.org/10.1002/cnm.881. [3] A. Markopoulos, V. Hapla, M. Cermak, M. Fusek, Massively parallel solution of elastoplasticity problems with tens of millions of unknowns using PermonCube and FLLOP packages, Appl. Math. Comput. 267 (2015) 698–710. http://dx.doi.org/10.1016/j.amc.2014.12.097. [4] C. Farhat, J. Mandel, F.-X. Roux, Optimal convergence properties of the FETI domain decomposition method, Comput. Methods Appl. Mech. Engrg. 115 (1994) 365–385. http://dx.doi.org/10.1016/0045-7825(94)90068-X. [5] D. Rixen, C. Farhat, R. Tezaur, J. Mandel, Theoretical comparison of the FETI and algebraically partitioned FETI methods, and performance comparisons with a direct sparse solver, Internat. J. Numer. Methods Engrg. 46 (4) (1999) 501–533. cited By 40. [6] P. Gosselet, C. Rey, Non-overlapping domain decomposition methods in structural mechanics, Arch. Comput. Methods Eng. 13 (4) (2006) 515–572. http://dx.doi.org/10.1007/BF02905857. [7] D. Rixen, C. Farhat, A simple and efficient extension of a class of substructure based preconditioners to heterogeneous structural mechanics problems, Internat. J. Numer. Methods Engrg. 44 (4) (1999) 489–516. [8] V. Hapla, et al. PERMON (Parallel, Efficient, Robust, Modular, Object-oriented, Numerical) web pages, 2015. URL http://permon.it4i.cz/. [9] W. Han, B.D. Reddy, Plasticity: Mathematical Theory and Numerical Analysis, Springer-Verlag, 1999, http://dx.doi.org/10.1007/b97851. [10] R. Blaheta, Numerical methods in elasto-plasticity, Documenta Geonica 1998, PERES Publishers, Prague, 1998. [11] E.A. de Souza Neto, D. Peri, D.R.J. Owen, Computational Methods for Plasticity, Wiley-Blackwell, 2008, http://dx.doi.org/10.1002/9780470694626. [12] M. Čermák, T. Kozubek, S. Sysala, J. Valdman, A TFETI domain decomposition solver for elastoplastic problems, Appl. Math. Comput. 231 (2014) 634–653. http://dx.doi.org/10.1016/j.amc.2013.12.186. [13] M. Cermak, T. Kozubek, An efficient tfeti based solver for elasto-plastic problems of mechanics, Adv. Electr. Electron. Eng. 10 (1) (2012) 57–62. http://dx.doi.org/10.15598/aeee.v10i1.558. [14] M. Čermák, V. Hapla, D. Horák, M. Merta, A. Markopoulos, Total-FETI domain decomposition method for solution of elasto-plastic problems, Adv. Eng. Softw. 84 (2015) 48–54. http://dx.doi.org/10.1016/j.advengsoft.2014.12.011. [15] J. Kruis, P. Štemberk, Imperfect bond modelled by the feti method, in: Computational Plasticity - Fundamentals and Applications, COMPLAS IX, 2007, pp. 939–942. [16] S. Sysala, Application of a modified semismooth Newton method to some elasto-plastic problems, Math. Comput. Simulation 82 (10) (2012) 2004–2021. http://dx.doi.org/10.1016/j.matcom.2012.03.012. [17] P.G. Gruber, J. Valdman, Solution of one-time-step problems in elastoplasticity by a slant Newton method, SIAM J. Sci. Comput. 31 (2) (2009) 1558–1580. http://dx.doi.org/10.1137/070690079. [18] M. Cermak, S. Sysala, Total-FETI method for solving contact elasto-plastic problems, Lect. Notes Comput. Sci. Eng. 98 (2014) 955–965. http://dx.doi.org/10.1007/978-3-319-05789-7__93. [19] M. Cermak, J. Haslinger, T. Kozubek, S. Sysala, Discretization and numerical realization of contact problems for elastic-perfectly plastic bodies. PART II - numerical realization, limit analysis, ZAMM J. Appl. Math. Mech. 95 (12) (2015) 1348–1371. http://dx.doi.org/10.1002/zamm.201400069, Zeitschrift für Angewandte Mathematik und Mechanik. [20] A. Vašatová, M. Čermák, V. Hapla, Parallel implementation of the FETI DDM constraint matrix on top of PETSc for the PermonFLLOP package, in: Parallel Processing and Applied Mathematics 2015, in: Lecture Notes in Computer Science, vol. 9573, 2016, pp. 150–159. http://dx.doi.org/10.1007/978-3-31932149-3_15. [21] B.F. Smith, et al., PETSc users manual, Tech. Rep. ANL-95/11 - Revision 3.5, Argonne National Laboratory, 2014, URL http://www.mcs.anl.gov/petsc. [22] S. Balay, S. Abhyankar, M.F. Adams, J. Brown, P. Brune, K. Buschelman, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Knepley, L.C. McInnes, K. Rupp, B.F. Smith, H. Zhang, PETSc web pages, 2015. URL http://www.mcs.anl.gov/petsc. [23] V. Hapla, et al. PermonFLLOP web pages, 2016. URL http://permon.it4i.cz/fllop/. [24] V. Hapla, et al. PermonQP web pages, 2015. URL http://permon.it4i.cz/qp/. [25] G. Karypis, V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput. 20 (1) (1998) 359–392. [26] MareNostrum web page. URL https://www.bsc.es/marenostrum-support-services/mn3.