Solution of high order compact discretized 3D elliptic partial differential equations by an accelerated multigrid method

Arcesio Castañeda Medina ([email protected]), Rochus Schmid ([email protected], corresponding author)

Computational Materials Chemistry group, Chair of Inorganic Chemistry 2, Ruhr-University Bochum, Bochum, Germany
Abstract

A robust and efficient numerical solver for three-dimensional linear elliptic partial differential equations with variable coefficients, but without cross derivative terms, is presented. The proposed algorithm is based on a general-meshsize fourth-order compact finite-difference scheme and an accelerated geometric multigrid solver. The former not only increases the accuracy of the discretization, but also preserves the bandwidth of the common second-order finite-difference scheme, which is relevant for distributed memory parallel implementations of the multigrid preconditioner. The acceleration technique, on the other hand, allows for a dramatic reduction of the number of cycles and smoothing steps required to achieve convergence. Furthermore, it is found that, depending on the coefficients of the elliptic partial differential equation, the multigrid preconditioned solver can be the only way to obtain a reliable numerical solution or attain convergence.

Keywords: Elliptic partial differential equations; Convection-diffusion equation; High order compact difference methods; Symbolic derivation methods; Multigrid solvers; Iterant recombination

1. Introduction
Elliptic partial differential equations (EPDEs) emerge in many different disciplines and levels of description of physical phenomena, ranging from coarse grained models in engineering to mean field approaches in physics and chemistry, through stochastic, financial and weather models. Thus, for example, they include the standard steady convection-diffusion equation, which describes transport processes in fluid dynamics (mass, heat and momentum transfer), and
the generalized Poisson equation used to describe the electrostatics of general dielectric media.

One of the basic techniques to discretize general EPDEs is based on a second order centered finite difference scheme for the diffusion terms and a first order upwind scheme for the convection terms [1]. This approach, although stable, can lead to spurious numerical effects. By employing higher order discretization methods, on the other hand, stability and accuracy can be achieved at the same time [2, 3]. The most direct method to generate higher order difference schemes, based on the extension of the stencils, has the disadvantages of larger bandwidths for the resulting linear systems, a more complicated treatment of the boundary conditions, and increased communication requirements in parallel implementations. The so-called high order compact difference schemes avoid such problems at the expense of more elaborate discretization stencils [4].

An elegant method to derive high-order compact stencils (HOCS) is based on the idea of repeatedly applying central difference schemes to the terms of the EPDE until the leading truncation error of the discretization reaches the requested accuracy [2]. In this way, fourth, and even sixth order stencils [5], for the 3D standard Poisson equation have been derived. For general EPDEs, rather extensive algebraic work is needed to compute the HOCS through this technique. Nevertheless, the availability of symbolic computation packages has made it possible to determine them more efficiently [6, 7]. On the other hand, most of these derivations, and their applications [8], have focused on the discretization of EPDEs on grids with equal meshsizes. In many situations, however, it is more effective, and even unavoidable, to discretize on meshes with different sizes in the Cartesian directions. This can be the case when the unknown (and known) functions in the EPDE change very differently along each direction, or when one (or more) of the physical dimensions of the domain cannot be fitted to a specific meshsize. Fourth order compact stencils have indeed already been derived and implemented for the 2D and 3D standard Poisson equations using unequal mesh sizes [9, 10]. There, a formally symbolic representation of the fourth order compact operator for the Laplacian allowed the HOCS to be obtained by rather straightforward algebraic manipulations. This was possible, however, due to the relatively simple structure of the Poisson equation. In the present work, arbitrary meshsize grids are considered and the HOCS for the more general EPDE (without cross derivatives) is derived. Mathematica [11] is used for the heavy algebraic manipulations required at the early stages of the symbolic derivation approach, whereas at the end, fine tuned manual simplifications are performed in order to reduce the number of arithmetic operations that are necessary to compute the stencil in an actual numerical implementation.

As in any finite difference approach, the high order compact discretization scheme leads to a system of linear equations Au = b for the (unknown) solution vector u, where A is the matrix representation of the discretized EPDE and b is a known vector, which can contain information about the boundary conditions of the problem. In general, for real 3D applications the matrix dimensions are very large and a direct solution for u (e.g., by Gaussian elimination) is hardly attainable. Instead, iterative techniques are the preferred method of solution. Coupled to high order compact discretization schemes, these methods can be fast and highly accurate, even in the case of convection dominated problems [3, and references therein].

The (geometric) multigrid technique (MG) is one of the most used iterative solvers due to its generality, as well as its efficiency, robustness and parallel features [1]. However, as the complexity of a problem increases, as can be the case for certain EPDEs, the issue of properly selecting the multigrid parameters is not trivial. Complications such as anisotropies, convection-dominance and nonlinearities can occur at the same time, affecting the central idea of multigrid; i.e., it is possible that some error components cannot be reduced by standard smoothing procedures and remain large during the full iterative scheme. However, it has been realized that the combination of MG with Krylov subspace approaches can lead to robust and efficient numerical solvers even in such complex cases [12, 13]. Formally, within such schemes, MG acts as a preconditioner of the linear system; nevertheless, the full procedure can also be considered as an acceleration of multigrid by the so-called iterant recombination (MGIR) [1]. In general, apart from the question of using MG as a solver or as a preconditioner, such techniques not only decrease the number of required smoothing steps and increase the convergence rates, but also allow dealing with more elaborate models of real situations (e.g., rotating flow problems, incompressible Navier-Stokes equations, etc.). Besides that, MGIR is easy to implement, not computationally expensive, and highly parallelizable. On the other hand, although the technique is well established and has been applied to solve 2D linear and nonlinear elliptic problems with upwind discretization schemes [12, 13], to our knowledge its potential to solve 3D linear EPDEs in conjunction with HOCS has not yet been explored; this is one of the main purposes of this work.

The paper is organized as follows: in section 2 the basic methodology used to derive the HOCS is briefly discussed. Detailed formulae for the general meshsize HOCS for general EPDEs, without cross derivative terms, are presented in Appendix A. Implementation details for the accelerated MGIR solver of the high order discretized EPDE are presented in section 3, and its computational efficiency on some test cases in section 4. Finally, some conclusions are drawn in section 5.

2. Fourth-order compact discretization
In this section, the HOCS symbolic derivation procedure proposed in reference [7] is generalized to grids with not-necessarily-equal mesh sizes. Namely, the EPDE:

D(r) · (∂²u(r)/∂x², ∂²u(r)/∂y², ∂²u(r)/∂z²) + C(r) · ∇u(r) + s(r) u(r) = g(r),  (1)

for the unknown function u, is discretized in a 3D rectangular domain Ω ⊆ R³ using a uniform grid with mesh sizes hx, hy and hz. It is assumed that the
function coefficients in eq. (1) are smooth enough in such a domain and that suitable conditions are imposed on the boundary ∂Ω. The aim is to construct an approximation formula for an arbitrary grid point (x, y, z) = (hx i, hy j, hz k), i, j, k ∈ N, such that at most 27 nearest neighboring grid points are used. The required sets of values for u, D, C, s and g are defined in the local coordinate system shown in figure 1.
Figure 1: The compact stencil defined in a cube around the generic point (i, j, k); the 27 points of the local coordinate system are numbered 0, . . . , 26, with 0 at the center. The fourth order compact scheme for equation (1) does not make use of the points 19, . . . , 26.
The first and second partial derivatives of u at the center point are given by:

∂u/∂ξn = δξn u − (hn²/6) ∂³u/∂ξn³ + O(hn⁴),  n = 1, 2, 3,  (2)

∂²u/∂ξn² = δ²ξn u − (hn²/12) ∂⁴u/∂ξn⁴ + O(hn⁴),  n = 1, 2, 3,  (3)

where δξn u and δ²ξn u are the first and second order central difference operators with respect to (ξ1, ξ2, ξ3) ≡ (x, y, z), i.e.:

δξn u = (1/(2hn)) [−1 0 1],  (4)

δ²ξn u = (1/hn²) [1 −2 1],  (5)
in stencil notation [1]. Thus, up to fourth order, equation (1) can be approximated by:

(D · (δx², δy², δz²)u + C · (δx, δy, δz)u + su − g)
 − (1/6) [ (D/2) · (hx² ∂⁴/∂x⁴, hy² ∂⁴/∂y⁴, hz² ∂⁴/∂z⁴) + C · (hx² ∂³/∂x³, hy² ∂³/∂y³, hz² ∂³/∂z³) ] u + O(hn⁴) = 0,  (6)

where the first line is nothing more than the standard seven-point second-order central difference discretization. The main idea to obtain a fourth order compact stencil is then to approximate to second order each of the partial derivative operators in the second line, as they are already multiplied by h²ξn [4]. To this end, the required partial derivatives are computed using the original EPDE. The third and fourth order partial derivatives uξnξnξn and uξnξnξnξn are obtained by solving the equations:

∂D/∂ξn · (uxx, uyy, uzz) + D · (uξnxx, uξnyy, uξnzz) + ∂C/∂ξn · ∇u + C · ∂(∇u)/∂ξn + sξn u + s uξn = gξn,  n = 1, 2, 3,  (7)

and:

∂²D/∂ξn² · (uxx, uyy, uzz) + 2 ∂D/∂ξn · (uξnxx, uξnyy, uξnzz) + D · (uξnξnxx, uξnξnyy, uξnξnzz) + ∂²C/∂ξn² · ∇u + 2 ∂C/∂ξn · ∂(∇u)/∂ξn + C · ∂²(∇u)/∂ξn² + sξnξn u + 2 sξn uξn + s uξnξn = gξnξn,  n = 1, 2, 3,  (8)
respectively. Such solutions involve, in turn, crossed and non-crossed partial derivatives of u, D, C, s and g, which are again approximated by the second order difference operators in eqs. (4) and (5). Thus, it is found that the non-crossed first and second partial derivatives of u, as well as those of the known functions (D, C, s and g), only require the points 0, . . . , 6, whereas the crossed partial derivatives require further points, except those at the corners. Finally, the (second-order approximated) third and fourth order partial derivatives of u are inserted into the second line of eq. (6) to obtain the HOCS ({αl}, F0) defined at the points 0, 1, . . . , 18:

Σ_{l=0}^{18} αl ul = F0.  (9)
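Once the coefficients {αl} and the right hand side F0 have been computed at every grid point, evaluating the left hand side of eq. (9) is just a weighted sum over the compact 19-point neighborhood. The following minimal numpy sketch illustrates this; the mapping of the indices 0, . . . , 18 to grid offsets is an assumption made here for illustration and does not necessarily reproduce the exact ordering of Fig. 1:

```python
import numpy as np

# Offsets of the 19-point compact stencil: center, 6 face neighbors and
# 12 edge neighbors in the coordinate planes (the 8 cube corners are not
# used by the fourth order scheme). The index ordering is an assumption.
OFFSETS = [(0, 0, 0),
           (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
           (1, 1, 0), (-1, 1, 0), (-1, -1, 0), (1, -1, 0),
           (1, 0, 1), (0, 1, 1), (-1, 0, 1), (0, -1, 1),
           (1, 0, -1), (0, 1, -1), (-1, 0, -1), (0, -1, -1)]

def apply_hocs(alpha, u):
    """Evaluate sum_l alpha_l u_l (eq. (9)) at all interior points.

    alpha: (19, nx, ny, nz) array of precomputed stencil coefficients;
    u: (nx, ny, nz) array. The one-point boundary layer is not touched."""
    out = np.zeros_like(u)
    nx, ny, nz = u.shape
    core = (slice(1, nx - 1), slice(1, ny - 1), slice(1, nz - 1))
    for l, (i, j, k) in enumerate(OFFSETS):
        shifted = u[1 + i:nx - 1 + i, 1 + j:ny - 1 + j, 1 + k:nz - 1 + k]
        out[core] += alpha[l][core] * shifted
    return out
```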
The above outlined process requires extensive algebraic manipulations; however, using symbolic computation packages it can be automated to a great extent. Here, this is performed by using the symbolic capabilities of Mathematica [11], which also allows us to obtain relatively well simplified formulae for αl and F0. These expressions are, however, still rather cumbersome to use directly in an actual numerical implementation, and some fine tuned “non-machine-made” simplifications can noticeably improve the computation of the HOCS. The HOCS thus obtained is presented in Appendix A. Although, in principle, the calculation of the HOCS needs to be done just once, having succinct expressions for αl and F0 allows easier practical implementations, with less propagation of numerical rounding errors and some gain in speed.
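The mechanical core of the procedure, Taylor-expanding the truncation error of a central difference operator and then eliminating it with the help of the EPDE itself, is exactly the kind of manipulation a symbolic package automates. As a one-dimensional illustration (a sketch using sympy instead of Mathematica, purely for reproducibility):

```python
import sympy as sp

x, h = sp.symbols('x h', positive=True)
u = sp.Function('u')

# Second order central difference approximation of u''(x), cf. eq. (5)
delta2 = (u(x - h) - 2*u(x) + u(x + h)) / h**2

# Taylor expansion of the truncation error around h = 0 recovers the
# leading term of eq. (3): (h**2/12) * u''''(x)
err = sp.series(delta2 - u(x).diff(x, 2), h, 0, 4).removeO().doit()
print(sp.simplify(err))  # -> h**2*Derivative(u(x), (x, 4))/12
```

For the full 3D EPDE the same idea applies, with eqs. (7) and (8) supplying the third and fourth derivatives; this is where the algebra becomes extensive.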
3. Accelerated parallel multigrid solver
The linear system obtained by the fourth order compact difference scheme is solved by a parallel multigrid solver [14], accelerated by the iterant recombination technique [12, 13]. The driving routines are written at the Python level, whereas the heavy grid operations (smoothing, interpolation, restriction and overlapping) are performed at the C level using the MPI standard. The parallel implementation follows a grid partitioning strategy, splitting the computational domain along all Cartesian directions and using one ghost layer to exchange values between neighboring processes. A non-lexicographically ordered (four-color) Gauss-Seidel procedure is employed as the smoother on the finest and the coarser grids. A second-order seven-point stencil representation of the EPDE is used as the coarse grid operator. Full-weighting restriction I_h^H and (tri)linear interpolation I_H^h are used as transfer operators in the V(m,n) cycle with ng grid levels, m presmoothing and n postsmoothing steps. The discretized version of the EPDE at the bottom of the V-cycle is solved by an adequate number of smoothing steps (Ic).

The iterant recombination procedure (Fig. 2) is, basically, an outer loop in which an optimal input solution (residual) for the next coarse grid correction is determined as a linear combination of the current solution (residual) and the most recent l previous solutions (residuals) [1]. The coefficients appearing in the linear superpositions are determined from the minimization of the optimal defect, which in turn reduces to the solution of a system of linear equations defined by the overlaps of the different residuals. In practice, small values of l are sufficient to obtain a well conditioned linear system and an optimal solution (residual). Furthermore, at the early stages, the recombination process is done with the iterants already available. The method also fits perfectly to the parallel MG: first, the required inner products are computed following the grid decomposition scheme; then, the solution of the small linear system is computed in each MPI process; and, finally, each process updates its part of the accelerated solution. It is also possible to perform the iterant recombination process at one or more of the coarse grid levels, at a given stage of the V-cycle, by applying a similar procedure to the most recent lc coarse solutions and residuals. Then, with an adequate selection of the couple (l, lc), it is possible to obtain the same convergence rates as an only-finest grid MGIR with high l, diminishing the storage requirements and the computational load to determine the residual overlaps [13].
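To make the grid partitioning concrete, the following mpi4py sketch shows the one-ghost-layer exchange along a single axis of the Cartesian process grid. It is only a Python-level illustration of what the production code does at the C level; the function name and array layout are assumptions:

```python
from mpi4py import MPI
import numpy as np

def exchange_ghosts(cart, u, axis):
    """Fill the one-point ghost layers of the local block u along one
    axis of the Cartesian communicator cart (a Python-level sketch)."""
    lo, hi = cart.Shift(axis, 1)          # lower / upper neighbor ranks
    v = np.moveaxis(u, axis, 0)           # view with the exchange axis first
    recv_lo = np.empty_like(v[0])
    recv_hi = np.empty_like(v[0])
    # send the last interior layer up, receive the lower ghost layer
    cart.Sendrecv(np.ascontiguousarray(v[-2]), dest=hi,
                  recvbuf=recv_lo, source=lo)
    # send the first interior layer down, receive the upper ghost layer
    cart.Sendrecv(np.ascontiguousarray(v[1]), dest=lo,
                  recvbuf=recv_hi, source=hi)
    if lo != MPI.PROC_NULL:
        v[0] = recv_lo
    if hi != MPI.PROC_NULL:
        v[-1] = recv_hi

# usage sketch: cart = MPI.COMM_WORLD.Create_cart([2, 2, 1], periods=[False]*3)
```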
Figure 2: Schematic representation of the multigrid iterant recombination process; coarsest grid solution, smoothing, and recombination steps are indicated by the different markers.
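A serial sketch may clarify the recombination step: among all linear combinations Σj βj uj of the stored iterants with Σj βj = 1, the residual is Σj βj rj by linearity, and its norm is minimized by solving a small constrained least-squares problem built from the residual overlaps. The function below is an illustration under these assumptions, not the parallel production routine:

```python
import numpy as np

def recombine(iterants, residuals):
    """Minimal-residual recombination of the stored multigrid iterants.

    iterants, residuals: lists of ndarrays (current iterate last).
    Returns the combination sum_j beta_j u_j with sum_j beta_j = 1 that
    minimizes ||sum_j beta_j r_j||_2, together with its residual."""
    R = np.stack([r.ravel() for r in residuals])   # shape (m, ndof)
    G = R @ R.T                                    # residual overlap (Gram) matrix
    m = G.shape[0]
    # Lagrange system for: min beta^T G beta subject to sum(beta) = 1
    K = np.zeros((m + 1, m + 1))
    K[:m, :m] = G
    K[:m, m] = K[m, :m] = 1.0
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    beta = np.linalg.solve(K, rhs)[:m]
    u_acc = sum(b * u for b, u in zip(beta, iterants))
    r_acc = sum(b * r for b, r in zip(beta, residuals))
    return u_acc, r_acc
```

In the parallel solver the only communication this step requires is the reduction of the inner products entering G; for small l the resulting (m+1) × (m+1) system stays tiny and well conditioned.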
4. Numerical tests

The numerical algorithm is tested on the equation defined by the coefficients:

D(r) = (2 + cos(kx), 2 + cos(ky), 2 + cos(kz)),
C(r) = (−k sin(kx), −k sin(ky), −k sin(kz)),  (10)

and s(r) = 0. Here k = 2π/L, where L = 5 is the edge length of the cubic computational domain Ω = (0, L)³. Homogeneous Dirichlet conditions are imposed on the boundaries of the domain and the forcing term g is chosen to satisfy the exact solution:

ue(x, y, z) = exp[−(x²/(2σx²) + y²/(2σy²) + z²/(2σz²))] / ((2π)^(3/2) σx σy σz),  (11)

where the parameters σi > 0 determine the widths of the Gaussian along each of the Cartesian directions. In all of the tests, three-level V-cycles (ng = 3) have been used, whereas some of the MG parameters (e.g., grid spacing, number of presmoothing and postsmoothing steps) are changed in order to study the robustness, stability and performance of the solver. A numerical solution, u, is assumed to be found when the ℓ2-norm of the residual is below 10⁻¹². The corresponding error is defined by δ = ||u − ue||∞.

To begin with, an isotropic exact solution (σx = σy = σz = σ) is considered and the EPDE (eq. (1)) is discretized on grids with equal mesh sizes hx = hy = hz = h = L/(N − 1), where N³ is the number of mesh points. From the errors at consecutive mesh sizes hi and hi+1, the accuracy order o of the HOCS is estimated through the relation (hi/hi+1)^o ≈ δ(hi)/δ(hi+1). The convergence factors are estimated from the ratio between the last (r_h^I) and initial (r_h^0) residuals as q̂ = (r_h^I / r_h^0)^(1/I), where I is the number of iterations required to achieve convergence [1]. For both the accelerated and the non-accelerated MG, the starting point is u(r) = 0 and one hundred smoothing steps are used to estimate the solution at the coarsest level of the V-cycle (Ic = 100). The iterant recombination process, on the other hand, has been performed only on the finest grid of the V-cycle, using at most three previous solutions (l = 3).

Tables 1 and 2 show the results obtained with the standard and the accelerated MG schemes. For a given number of mesh points, N, both algorithms
 N    δ          o     non-IT V(2,2)   IT V(1,1)     IT V(2,2)
                       I      q̂        I     q̂       I     q̂
 41   1.49e-04   –     84     0.76     20    0.31    15    0.21
 61   2.89e-05   4.05  121    0.83     20    0.32    15    0.22
 81   9.08e-06   4.03  135    0.85     19    0.32    15    0.23
 101  3.70e-06   4.02  156    0.87     20    0.33    15    0.24
 121  1.78e-06   4.01  148    0.87     20    0.33    15    0.24
 141  9.61e-07   4.01  155    0.87     20    0.35    15    0.25
 161  5.63e-07   4.01  153    0.87     20    0.34    16    0.27
 181  3.51e-07   4.01  188    0.90     21    0.37    19    0.32
 201  2.29e-07   4.05  179    0.89     21    0.37    20    0.35
Table 1: Number of grid points (N = Nx = Ny = Nz), error (δ), accuracy order (o) of the HOCS, number of required V(n,m)-cycles (I) and convergence factors (q̂) for the test case with a symmetric exact solution (eqs. (10) and (11) with σx = σy = σz = (2π)^(−1/2)) using the standard MG (non-IT) and the multigrid preconditioned Krylov method (IT). The rest of the MG parameters have been chosen as explained in the text (ng = 3, Ic = 100 and l = 3) and all executions have been performed using four MPI processes.
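The estimates o and q̂ reported in the tables follow directly from the definitions above; for instance (a small sketch, using the first two rows of table 1):

```python
import numpy as np

def observed_order(h, err):
    """Accuracy order o from errors at consecutive mesh sizes, via
    (h_i/h_{i+1})**o ~ delta(h_i)/delta(h_{i+1})."""
    h, err = np.asarray(h), np.asarray(err)
    return np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])

def convergence_factor(r0, rI, I):
    """Mean residual reduction per cycle, q = (||r_I||/||r_0||)**(1/I)."""
    return (rI / r0) ** (1.0 / I)

h = 5.0 / (np.array([41, 61]) - 1)            # h = L/(N-1) for N = 41, 61
print(observed_order(h, [1.49e-4, 2.89e-5]))  # -> [4.04...], cf. table 1
```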
lead to the same errors and to the expected accuracy orders for the HOCS. However, in the non-accelerated case, converged results are only obtained using a V(2,2)-cycle, with a considerable number of iterations (table 1). The accelerated solver with a V(2,2)-cycle not only diminishes the number of iterations (on average, by a factor of nine), but also leads to small variations of this quantity with respect to the mesh spacing. This almost h-independent convergence of the accelerated MG solver [1] is also visible in the convergence factors, which are up to four times smaller than those of the non-accelerated MG solver. An accelerated MG solver with a V(1,1)-cycle shows the same trends, although the number of iterations, as well as the convergence factors, are slightly larger than those of the V(2,2)-cycle. In the former, however, both quantities are almost independent of the mesh size, whereas in the V(2,2)-cycle they seem to increase for large N (h → 0). In this sense, the accelerated V(1,1)-MG solver operates at optimal complexity [1, 15]. Nevertheless, both of them are equally efficient (table 2), as the times required by the accelerated solver with V(1,1) and V(2,2) cycles are almost the same and considerably smaller (from three to seven times) than those of the non-accelerated V(2,2)-MG solver. Note that the time expended in the computation of the stencil is just a tiny percentage (up to 1.7% for small N) of the total time employed by the solvers. The computation of the right hand side (smoothing) of eq. (9), on the other hand, amounts to up to 7.0% for h → 0.

Next, by squeezing the 3D Gaussian function ue along the z-direction, a highly asymmetric exact solution is considered. Table 3 shows the results obtained using the accelerated MG solver with the same parameters as in the symmetric test. In general, due to the asymmetry, higher samplings are needed to obtain a sufficiently accurate solution. For equal mesh sizes, δ ∼ 10⁻⁵ requires at least Nx = Ny = Nz = N = 261. In contrast, for the symmetric ue, this is already achieved for N = 61 (table 1). Nevertheless, for the asymmetric case
 N    trhs (s)   tstencil (s)   tsolver (s)
                                non-IT V(2,2)   IT V(1,1)   IT V(2,2)
 41   1.0e-02    4.0e-03        0.80            0.23        0.21
 61   3.6e-02    3.0e-03        3.44            0.53        0.54
 81   7.7e-02    7.0e-03        7.47            1.10        1.30
 101  1.8e-01    1.5e-02        13.82           2.15        2.11
 121  2.9e-01    2.2e-02        23.17           3.59        3.71
 141  4.7e-01    3.7e-02        35.03           5.45        5.56
 161  6.5e-01    4.5e-02        49.69           8.41        8.56
 181  1.0        6.7e-02        87.56           12.08       12.40
 201  1.4        9.3e-02        119.53          17.84       18.06
Table 2: Times employed by the HOCS-MG algorithm, tsolver, for the data in table 1. The times expended in the calculation of the stencil, tstencil, and in the smoothing of the right-hand side of the EPDE, trhs, are independent of the IT procedure and only the averages are shown.
the errors still adjust to the HOCS order, and the number of iterations and the convergence factors are small. The complexity of the V(1,1) and V(2,2) accelerated multigrid solvers is practically the same, although the V(1,1)-cycle is more efficient for h → 0. On the other hand, the higher sampling only needs to be imposed in the z direction, and coarser grids in the x and y directions can lead to the errors obtained for the denser equal mesh size grids, but at much lower computational effort. Thus, for example, δ ∼ 10⁻⁵ for (Nx, Ny, Nz) = (101, 101, 261), a grid seven times smaller than the one with N = 261. With respect to the times required by the discretization with equal mesh sizes, the total times employed by the unequal mesh size V(1,1) and V(2,2) accelerated MG solvers are reduced by factors of five and seven, respectively.
        Nx = Ny = Nz                  Nx = Ny = 101                 Nx = Ny = 141
 Nz     δ         ttotal (s)          δ         ttotal (s)          δ         ttotal (s)
                  V(1,1)   V(2,2)               V(1,1)   V(2,2)               V(1,1)   V(2,2)
 101    3.05e-03  2.40     2.44       –         –        –          –         –        –
 141    7.71e-04  6.12     6.26       7.25e-04  3.34     3.26       –         –        –
 181    2.79e-04  13.38    14.41      2.42e-04  4.99     4.85       2.67e-04  8.11     8.17
 221    1.24e-04  28.20    34.72      9.79e-05  7.25     6.38       1.13e-04  11.19    12.27
 261    6.36e-05  54.39    65.85      4.53e-05  10.02    9.09       5.42e-05  15.26    16.34
 ⟨I⟩              23       21                   28       22                   25       22
 ⟨q̂⟩              0.35     0.33                 0.41     0.34                 0.39     0.34
Table 3: Error (δ) and total time (i.e., including tstencil and trhs) for the test case with an asymmetric exact solution (eqs. (10) and (11) with σx = σy = 4σz = (2π)^(−1/2)) using equal and unequal mesh sizes (hk = L/(Nk − 1) for k = x, y, z). The last two rows show the averages of the number of iterations, ⟨I⟩, and of the convergence factors, ⟨q̂⟩. In all cases, the accelerated HOCS-MG solver (ng = 3, Ic = 100 and l = 3) has been used in runs with four MPI tasks.
In the above test the rather high asymmetry in the sought solution leads to pronounced differences between the equal mesh and unequal mesh size cases.
Figure 3: Isosurface plot of the charge density ρ(r) (black) and slice plot of the permittivity ε(r) (from ε = 1 (red) to ε = 80 (blue)) used in the generalized Poisson equation (eq. (12)). Periodic boundary conditions are imposed in the x, y directions, whereas Dirichlet conditions are used in the z direction.
In the following test, a more realistic application is considered, in which the mesh sizes are kept close to each other, although not exactly equal. As is well known [16], in the electrostatics of non-homogeneous but isotropic dielectric media, the potential u is determined by the generalized Poisson equation:

∇ · (ε(r)∇u(r)) = −4πρ(r),  (12)

where ε is the dielectric permittivity and ρ is the free or non-bounded charge density. Expanding the divergence, eq. (12) takes the form of eq. (1) with D(r) = (ε, ε, ε), C(r) = ∇ε, s(r) = 0 and g(r) = −4πρ(r). Figure 3 shows a plot of both quantities for a model problem in which the permittivity is a functional of the charge density [17]. The dielectric constant mostly changes along the z direction, such that in the regions of high charge density (and for z → −∞) it is equal to one, whereas for z → ∞ it approaches a constant (different from one). A smooth permittivity in a small-width transition region connects these spatial domains. Once the scalar fields ρ and ε are defined in the computational domain, the boundary conditions determine uniquely the solution of the EPDE.

There is no a priori reason to choose a particular set of values for the mesh sizes, and the best option for the discretization is to work on grids equally spaced in all Cartesian directions. This is, however, not strictly possible due to the domain dimensions imposed by the physical problem: Ω = (0, 8.0014) × (0, 8.0014) × (0, 14.561). Table 4 shows the performance of the accelerated V(2,2)-MG solver, with the recombination of three residuals (l = 3), for two different unequal mesh size grids and different numbers of MPI tasks. The actual speedup is computed from the ratio between the time, T1, employed by one task and the time Tp required by np > 1 tasks: sp = T1/Tp. The number of smoothing steps used to solve approximately the EPDE at the coarsest level (Ic) can play an important role in the parallel performance of the solver, and as such two different values are considered. This is clearly visible for the coarsest grid, where sp(Ic = 50) > sp(Ic = 100) for np ≥ 8 (fig. 4). The same trend is found, although to a lesser extent, for the finest grid: sp(Ic = 50) ≥ sp(Ic = 100) for np ≥ 32. For both grids a nearly ideal speedup is found up to eight MPI tasks; then the performance increases slowly with the addition of more parallel processes. Figure 4 shows the least-square fits of the data to Amdahl's law [18, 19]:

sp = 1 / ((1 − p) + p/sideal),  (13)

assuming an ideal speedup, sideal, for the fraction, p, of the execution time amenable to parallelization. The solvers with the smaller number of smoothing steps at the coarsest grid of the V-cycle fit better to the above expression. On the other hand, for the finest grid, p is equal for both values of Ic, whereas for the coarsest grid, p(Ic = 50) is slightly larger than p(Ic = 100).
        (Nx, Ny, Nz) = (72, 72, 137)          (Nx, Ny, Nz) = (152, 152, 273)
        Ic = 100          Ic = 50             Ic = 100          Ic = 50
 np     ttotal (s)  sp    ttotal (s)  sp      ttotal (s)  sp    ttotal (s)  sp
 1      2.73        1     3.12        1       46.42       1     57.49       1
 4      0.98        2.79  1.12        2.78    10.72       4.33  16.28       3.53
 8      0.72        3.80  0.75        4.15    6.50        7.14  7.81        7.36
 16     0.58        4.68  0.59        5.32    4.82        9.64  6.08        9.45
 32     0.58        4.72  0.50        6.22    3.46        13.40 3.77        15.24
 48     0.58        4.67  0.48        6.55    2.76        16.82 2.93        19.60
 64     0.51        5.26  0.42        7.38    2.31        20.09 2.80        20.56
 128    –           –     –           –       1.83        25.32 2.16        26.55
Table 4: Total time, ttotal (s), and speedup, sp, for the accelerated MG solution of the generalized Poisson equation (eq. (12)) on two different mesh size grids (Nx, Ny, Nz). A V(2,2)-cycle with the recombination of three residuals (l = 3) has been used.
Figure 4: Fits of the data (dashed lines) in table 4 to Amdahl's law (continuous lines): (a) (hx, hy, hz) = (0.21, 0.21, 0.202), with p = 0.82 (Ic = 100) and p = 0.87 (Ic = 50); (b) (hx, hy, hz) = (0.099, 0.099, 0.101), with p = 0.97 for both values of Ic. The legends show the fraction p of the execution time that benefits from an increasing number of parallel processes.
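For reference, a fit of this kind can be reproduced with a few lines of Python; the snippet below (a sketch assuming sideal = np, i.e., one unit of ideal speedup per MPI task) recovers the parallel fraction quoted in Fig. 4(a) for Ic = 50:

```python
import numpy as np
from scipy.optimize import curve_fit

def amdahl(s_ideal, p):
    """Amdahl's law, eq. (13)."""
    return 1.0 / ((1.0 - p) + p / s_ideal)

# measured speedups for the (72, 72, 137) grid with Ic = 50 (table 4)
n_tasks = np.array([1, 4, 8, 16, 32, 48, 64])
speedup = np.array([1.0, 2.78, 4.15, 5.32, 6.22, 6.55, 7.38])

(p_fit,), _ = curve_fit(amdahl, n_tasks, speedup, p0=[0.9])
print(f"parallel fraction p = {p_fit:.2f}")  # p close to 0.87, cf. Fig. 4(a)
```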
5. Conclusions

Although high order compact discretization schemes for the numerical solution of EPDEs have been known for a long time [2, 5, 6, 7, 20, 21], to our knowledge the general meshsize fourth order compact stencil for the 3D convection-diffusion equation had not been reported. It has been found that this HOCS is useful in the numerical solution of EPDEs which are better discretized on unequal mesh size grids, or which simply cannot be discretized on uniform grids. On the other hand, the advantages of coupling the HOCS with geometric and specialized geometric multigrid solvers for solving the convection-diffusion equation have been well established [3, 21, 9, 10, 8]. Here it is shown that an accurate, robust and efficient solver can be obtained by combining the fourth order compact stencil with an accelerated geometric multigrid, in which the multigrid acts as a preconditioner of a Krylov subspace technique [1, 12, 13]. In this way, the number of iterations required to solve the linear system emerging from the discretization, as well as the MG convergence factors and the execution times, can be diminished considerably. The parallel accelerated high order compact multigrid solver is highly efficient and fits Amdahl's law, showing that a considerable fraction of the execution time benefits from parallelization. Further research directions of the present work include dealing with non-linear problems and using semicoarsening restriction and interpolation methods developed for highly asymmetric grids in the context of the Poisson equation [22]. Naturally, applications to fields other than electrodynamics and to more complicated problems are also interesting as future work, as they allow not only checking the robustness of the proposed solver, but also identifying possible improvements. Transport convection-diffusion equations in electrochemistry [23], or even EPDEs emerging from phase field crystal theory [24, 15], are some of the examples where the accelerated high order multigrid solver can be applied.

Acknowledgement

This work is supported by the Cluster of Excellence RESOLV (EXC 1069) funded by the Deutsche Forschungsgemeinschaft (DFG) within the framework of the German Excellence Initiative and by the DFG priority program 1928 (COORNETs).

Appendix A. Fourth order compact stencil for elliptic partial differential equations
Explicit simplified formulae for the general mesh size fourth-order compact stencil (eq. (9)) for equation (1) are presented in this appendix. In the following, the notation D = (a, b, c) and C = (d, e, f) is adopted. First of all, the right hand side of eq. (9) and the coefficient α0 at the center point of the local coordinate system (Fig. 1) are given, respectively, by:

F0 = (1/24) [ 12 g0 + 2 Σ_{i=1}^{6} gi − (a1 − a3 − d0 hx)(g1 − g3)/a0 − (b2 − b4 − e0 hy)(g2 − g4)/b0 − (c5 − c6 − f0 hz)(g5 − g6)/c0 ],  (A.1)
and:

α0 = −(1/3)(a0/hx² + b0/hy² + c0/hz²)
  − (1/6) [ Σ_{i=1}^{6} (ai/hx² + bi/hy² + ci/hz² − si/2) + (d1 − d3)/hx + (e2 − e4)/hy + (f5 − f6)/hz ]
  + (1/12) [ ((a1 − a3 − d0 hx)/a0)((a1 − a3)/hx² + (b1 − b3)/hy² + (c1 − c3)/hz² + 2d0/hx − (s1 − s3)/2)
    + ((b2 − b4 − e0 hy)/b0)((a2 − a4)/hx² + (b2 − b4)/hy² + (c2 − c4)/hz² + 2e0/hy − (s2 − s4)/2)
    + ((c5 − c6 − f0 hz)/c0)((a5 − a6)/hx² + (b5 − b6)/hy² + (c5 − c6)/hz² + 2f0/hz − (s5 − s6)/2) ].  (A.2)

The rest of the coefficients of the seven point stencil α0,...,6 can be related by pairs:

α1 = (1/6) [ (a0 + ½ Σ_{i=1}^{6} ai)/hx² − b0/hy² − c0/hz² ]
  + (1/12) [ (d0 + d1 − d3)/hx − (b1 − b3)/hy² − (c1 − c3)/hz² ]
  + (1/24) [ 2s0 + s1 − s3 − Σ_{i=1}^{6} di/hx + ((a1 − a3 − d0 hx)/a0)(b0/hy² + c0/hz² − d0/hx − s0/2) ]
  + (1/48) [ (2(a1 − a3)/hx² + (d1 − d3)/hx)((a1 − a3 − d0 hx)/a0)
    + (2(a2 − a4)/hx² + (d2 − d4)/hx)((b2 − b4 − e0 hy)/b0)
    + (2(a5 − a6)/hx² + (d5 − d6)/hx)((c5 − c6 − f0 hz)/c0) ],

α2 = (1/6) [ (b0 + ½ Σ_{i=1}^{6} bi)/hy² − a0/hx² − c0/hz² ]
  + (1/12) [ (e0 + e2 − e4)/hy − (a2 − a4)/hx² − (c2 − c4)/hz² ]
  + (1/24) [ 2s0 + s2 − s4 − Σ_{i=1}^{6} ei/hy + ((b2 − b4 − e0 hy)/b0)(a0/hx² + c0/hz² − e0/hy − s0/2) ]
  + (1/48) [ (2(b1 − b3)/hy² + (e1 − e3)/hy)((a1 − a3 − d0 hx)/a0)
    + (2(b2 − b4)/hy² + (e2 − e4)/hy)((b2 − b4 − e0 hy)/b0)
    + (2(b5 − b6)/hy² + (e5 − e6)/hy)((c5 − c6 − f0 hz)/c0) ],

α3 = β1 − α1,
α4 = β2 − α2,

α5 = (1/6) [ (c0 + ½ Σ_{i=1}^{6} ci)/hz² − a0/hx² − b0/hy² ]
  + (1/12) [ (f0 + f5 − f6)/hz − (a5 − a6)/hx² − (b5 − b6)/hy² ]
  + (1/24) [ 2s0 + s5 − s6 − Σ_{i=1}^{6} fi/hz + ((c5 − c6 − f0 hz)/c0)(a0/hx² + b0/hy² − f0/hz − s0/2) ]
  + (1/48) [ (2(c1 − c3)/hz² + (f1 − f3)/hz)((a1 − a3 − d0 hx)/a0)
    + (2(c2 − c4)/hz² + (f2 − f4)/hz)((b2 − b4 − e0 hy)/b0)
    + (2(c5 − c6)/hz² + (f5 − f6)/hz)((c5 − c6 − f0 hz)/c0) ],

α6 = β3 − α5,  (A.3)

where:

β1 = (1/3)(a0/hx² − b0/hy² − c0/hz²) + (1/6)(s0 − (d1 − d3)/hx + Σ_{i=1}^{6} ai/hx²)
  + (1/12) [ ((a1 − a3)/hx² + 2d0/hx)((a1 − a3 − d0 hx)/a0) + ((a2 − a4)/hx²)((b2 − b4 − e0 hy)/b0) + ((a5 − a6)/hx²)((c5 − c6 − f0 hz)/c0) ],

β2 = (1/3)(b0/hy² − a0/hx² − c0/hz²) + (1/6)(s0 − (e2 − e4)/hy + Σ_{i=1}^{6} bi/hy²)
  + (1/12) [ ((b1 − b3)/hy²)((a1 − a3 − d0 hx)/a0) + ((b2 − b4)/hy² + 2e0/hy)((b2 − b4 − e0 hy)/b0) + ((b5 − b6)/hy²)((c5 − c6 − f0 hz)/c0) ],

β3 = (1/3)(c0/hz² − a0/hx² − b0/hy²) + (1/6)(s0 − (f5 − f6)/hz + Σ_{i=1}^{6} ci/hz²)
  + (1/12) [ ((c1 − c3)/hz²)((a1 − a3 − d0 hx)/a0) + ((c2 − c4)/hz²)((b2 − b4 − e0 hy)/b0) + ((c5 − c6)/hz² + 2f0/hz)((c5 − c6 − f0 hz)/c0) ].  (A.4)

Next, the coefficients α7,...,10 in the corners of the square centered at (i, j, k) can be expressed in terms of just one of them:

α7 = β6/4 + (1/24) [ (a2 − a4)/hx² + (b1 − b3)/hy² ]
  + (1/48) [ (2d0 + d2 − d4)/hx + (2e0 + e1 − e3)/hy
    − (2a0/hx² + d0/hx)((b2 − b4 − e0 hy)/b0) − (2b0/hy² + e0/hy)((a1 − a3 − d0 hx)/a0) ],

α8 = β4 − α7,
α9 = β5 − α8 = β5 − β4 + α7,
α10 = β6 − β5 − α7,  (A.5)

where:

β4 = β6/2 + (1/12) [ (a2 − a4)/hx² − (a0/hx²)((b2 − b4 − e0 hy)/b0) + e0/hy ],  (A.6)
β5 = β6/2 − (1/12) [ (b1 − b3)/hy² − (b0/hy²)((a1 − a3 − d0 hx)/a0) + d0/hx ],  (A.7)
β6 = (1/3)(a0/hx² + b0/hy²).  (A.8)

Finally, analogous relations can be found for the coefficients α11, . . . , α18:

α11 = β11/4 + (1/24) [ (a5 − a6)/hx² + (c1 − c3)/hz² ]
  + (1/48) [ (2d0 + d5 − d6)/hx + (2f0 + f1 − f3)/hz
    − (2a0/hx² + d0/hx)((c5 − c6 − f0 hz)/c0) − (2c0/hz² + f0/hz)((a1 − a3 − d0 hx)/a0) ],

α12 = β12/4 + (1/24) [ (b5 − b6)/hy² + (c2 − c4)/hz² ]
  + (1/48) [ (2e0 + e5 − e6)/hy + (2f0 + f2 − f4)/hz
    − (2b0/hy² + e0/hy)((c5 − c6 − f0 hz)/c0) − (2c0/hz² + f0/hz)((b2 − b4 − e0 hy)/b0) ],

α13 = β7 − α11,
α14 = β8 − α12,
α15 = β9 − α11,
α16 = β10 − α12,
α17 = β11 − β7 − α15 = β11 − β7 − β9 + α11,
α18 = β12 − β8 − α16 = β12 − β8 − β10 + α12,  (A.9)

with:

β7 = β11/2 + (1/12) [ (a5 − a6)/hx² − (a0/hx²)((c5 − c6 − f0 hz)/c0) + f0/hz ],  (A.10)
β8 = β12/2 + (1/12) [ (b5 − b6)/hy² − (b0/hy²)((c5 − c6 − f0 hz)/c0) + f0/hz ],  (A.11)
β9 = β11/2 + (1/12) [ (c1 − c3)/hz² − (c0/hz²)((a1 − a3 − d0 hx)/a0) + d0/hx ],  (A.12)
β10 = β12/2 + (1/12) [ (c2 − c4)/hz² − (c0/hz²)((b2 − b4 − e0 hy)/b0) + e0/hy ],  (A.13)
β11 = (1/3)(a0/hx² + c0/hz²),  (A.14)
β12 = (1/3)(b0/hy² + c0/hz²).  (A.15)
In comparison to a full form of the HOCS, the above equations reduce the number of arithmetic operations approximately by half. In a practical implementation, this reduction factor can be increased further (by another factor of two) by identifying common terms which, of course, need to be computed just once.

References

[1] U. Trottenberg, C. W. Oosterlee, A. Schüller, Multigrid, Academic Press, 2001.
[2] W. F. Spotz, G. F. Carey, High-order compact scheme for the steady stream-function vorticity equations, International Journal for Numerical Methods in Engineering 38 (1995) 3497–3512.

[3] J. Zhang, An explicit fourth-order compact finite difference scheme for three-dimensional convection–diffusion equation, Communications in Numerical Methods in Engineering 14 (1998) 209–218.

[4] L. Collatz, The Numerical Treatment of Differential Equations, Springer Science & Business Media, 2013.

[5] W. F. Spotz, G. F. Carey, A high-order compact formulation for the 3D Poisson equation, Numerical Methods for Partial Differential Equations 12 (1996) 235–243.

[6] M. M. Gupta, J. Kouatchou, Symbolic derivation of finite difference approximations for the three-dimensional Poisson equation, Numerical Methods for Partial Differential Equations 14 (1998) 593–606.

[7] L. Ge, J. Zhang, Symbolic computation of high order compact difference schemes for three dimensional linear elliptic partial differential equations with variable coefficients, Journal of Computational and Applied Mathematics 143 (2002) 9–27.

[8] Y. Wang, J. Zhang, Fast and robust sixth order multigrid computation for 3D convection diffusion equation, Journal of Computational and Applied Mathematics 234 (2010) 3496–3506.

[9] J. Zhang, Multigrid method and fourth-order compact scheme for 2D Poisson equation with unequal mesh-size discretization, Journal of Computational Physics 179 (2002) 170–179.

[10] J. Wang, W. Zhong, J. Zhang, A general meshsize fourth-order compact difference discretization scheme for 3D Poisson equation, Applied Mathematics and Computation 183 (2006) 804–812.

[11] Wolfram Research, Inc., Mathematica, Version 11.1, 2017.

[12] T. Washio, C. W. Oosterlee, Krylov subspace acceleration for nonlinear multigrid schemes, Electronic Transactions on Numerical Analysis 6 (1997) 271–290.

[13] C. W. Oosterlee, T. Washio, Krylov subspace acceleration of nonlinear multigrid with application to recirculating flows, SIAM Journal on Scientific Computing 21 (2000) 1670–1690.

[14] H. Köstler, R. Schmid, U. Rüde, C. Scheit, A parallel multigrid accelerated Poisson solver for ab initio molecular dynamics applications, Computing and Visualization in Science 11 (2008) 115–122.

[15] L. Dong, W. Feng, C. Wang, S. M. Wise, Z. Zhang, Convergence analysis and numerical implementation of a second order numerical scheme for the three-dimensional phase field crystal equation, Computers & Mathematics with Applications 75 (2018) 1912–1928.

[16] J. D. Jackson, Classical Electrodynamics, Wiley, 1975.

[17] J.-L. Fattebert, F. Gygi, Density functional theory for efficient ab initio molecular dynamics simulations in solution, Journal of Computational Chemistry 23 (2002) 662–666.

[18] G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, in: AFIPS Spring Joint Computer Conference, ACM Press, New York, USA, 1967, pp. 483–485. doi:10.1145/1465482.1465560.

[19] M. D. Hill, M. R. Marty, Amdahl's Law in the multicore era, IEEE Computer 41 (2008) 33–38.

[20] W. F. Spotz, G. F. Carey, High-order compact finite difference methods, in: Preliminary Proceedings, International Conference on Spectral and High Order Methods, Houston, TX, 1995, pp. 397–408.

[21] M. M. Gupta, J. Zhang, High accuracy multigrid solution of the 3D convection–diffusion equation, Applied Mathematics and Computation 113 (2000) 249–274.

[22] Y. Ge, Multigrid method and fourth-order compact difference discretization scheme with unequal meshsizes for 3D Poisson equation, Journal of Computational Physics 229 (2010) 6381–6391.

[23] J. Newman, K. Thomas-Alyea, Electrochemical Systems, Wiley, 2012.

[24] K. Cheng, W. Feng, C. Wang, S. M. Wise, An energy stable fourth order finite difference scheme for the Cahn-Hilliard equation, arXiv:1712.06210 [math] (2017).