Computer Physics Communications 164 (2004) 118–121, doi:10.1016/j.cpc.2004.06.018
A parallel electrostatic solver for the VORPAL code

Peter Messmer*, David L. Bruhwiler

Tech-X Corporation, 5621 Arapahoe Avenue, Suite #A, Boulder, CO 80301, USA

* Corresponding author. E-mail address: [email protected] (P. Messmer).

Available online 21 July 2004
Abstract

Here we report on the addition of a parallel electrostatic (ES) field solver to the hybrid plasma simulation code VORPAL. Poisson's equation is solved by finite differences in real space. The resulting large sparse system of linear equations is solved by iterative methods, such as Krylov subspace algorithms and algebraic multigrid (AMG). Utilizing solver libraries optimized for parallel platforms (AZTEC and ML, both developed at Sandia National Laboratories) results in good overall performance of the solver. The accuracy of the solver is compared to analytical results for examples relevant to electron-cooling problems.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Poisson solver; Iterative methods; Algebraic multigrid; Electron cooling; Particle-in-cell (PIC)
1. Introduction

VORPAL [1] is a parallel hybrid PIC/fluid code under development by the University of Colorado at Boulder and by Tech-X Corp. It is implemented in C++, using template metaprogramming techniques to enable the expression of algorithms in arbitrary dimension. The code has been used extensively for electromagnetic (EM) modeling of laser–plasma interactions. An electrostatic (ES) field solver has recently been implemented for VORPAL, so the code can be applied to electron-cooling physics [2] and a wide range of other problems.

The EM treatment of plasmas is more fundamental than the ES treatment, but it is computationally much more demanding. In particular, the Courant condition requires that light cannot cross a grid cell in one timestep, a restriction that does not make sense for nonrelativistic plasmas interacting with low-frequency fields. In such cases, an ES treatment of the problem is justified and can be orders of magnitude faster. Also, an ES field solve is often required to establish correct initial conditions for an EM simulation.

The numerical techniques used in an ES code differ significantly from those used in an EM code, especially for parallel computing. In an EM code, the field advance requires only local quantities, making it relatively straightforward to parallelize the computation efficiently with low communication overhead. In contrast, an ES field solve requires global information: starting from the global charge distribution, the electric potential is obtained by solving Poisson's equation. The decision to use freely available solvers, which are already optimized for massively parallel architectures, yielded excellent performance while greatly simplifying the implementation within VORPAL.
Solving the discretized Poisson equation is a well-studied field in numerical analysis, and several algorithms are known, including methods based on fast Fourier transforms or Green's functions. The electrostatic solver presented here discretizes the Poisson equation in real space, which reduces the problem to the solution of a sparse linear system Ax = r, where A is an n × n matrix of finite-difference kernels, with n the number of grid cells in the system, x the electrostatic potential, and r the charge density on the grid. The matrix is very sparse, with usually 5–7 non-zero elements per row, depending on the dimensionality of the problem. The electric field is finally obtained by finite differencing the potential. Currently, arbitrarily shaped Dirichlet boundaries are supported, leading in general to nonsymmetric matrices.
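To make the discretization concrete, the following is a minimal sketch (not code from VORPAL) of how such a system can be assembled on a rectangular 2D grid with the standard 5-point stencil and homogeneous Dirichlet values assumed just outside the grid; the names, the triplet storage format, and this simple boundary treatment are illustrative assumptions only, since the actual solver supports arbitrarily shaped Dirichlet boundaries.

```cpp
#include <vector>
#include <cstddef>

// One non-zero entry of the sparse matrix A in coordinate (triplet) form.
struct Triplet { std::size_t row, col; double val; };

// Assemble A for -Laplacian(phi) = rhs on an nx-by-ny grid with spacings
// hx, hy. Values of phi outside the grid are taken to be zero (homogeneous
// Dirichlet), so the corresponding columns are simply dropped; interior rows
// then have 5 non-zeros, rows next to the wall have fewer.
std::vector<Triplet> assemblePoisson2D(std::size_t nx, std::size_t ny,
                                       double hx, double hy) {
  std::vector<Triplet> A;
  auto idx = [nx](std::size_t i, std::size_t j) { return j * nx + i; };
  const double cx = 1.0 / (hx * hx), cy = 1.0 / (hy * hy);
  for (std::size_t j = 0; j < ny; ++j)
    for (std::size_t i = 0; i < nx; ++i) {
      const std::size_t r = idx(i, j);
      A.push_back({r, r, 2.0 * (cx + cy)});            // diagonal
      if (i > 0)      A.push_back({r, idx(i - 1, j), -cx});
      if (i + 1 < nx) A.push_back({r, idx(i + 1, j), -cx});
      if (j > 0)      A.push_back({r, idx(i, j - 1), -cy});
      if (j + 1 < ny) A.push_back({r, idx(i, j + 1), -cy});
    }
  return A;
}

// Once phi is known, the x-component of the electric field at an interior
// point follows from a central difference, Ex = -dphi/dx (illustrative only).
double Ex(const std::vector<double>& phi, std::size_t nx,
          std::size_t i, std::size_t j, double hx) {
  return -(phi[j * nx + i + 1] - phi[j * nx + i - 1]) / (2.0 * hx);
}
```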
2. Implementation

Designing robust and fast linear solvers for parallel architectures is a challenging task. We therefore based our implementation on the AZTEC library from Sandia National Laboratories [3]. The library offers a large collection of solver and preconditioner algorithms. Additionally, AZTEC allows the user to easily specify the decomposition of the matrix and the vectors among the processors. This avoids redistribution of field data before and after the field-solve step.

In addition to the conventional Krylov subspace solvers for non-symmetric matrices available in AZTEC, such as Conjugate Gradient Squared (CGS) or Stabilized Bi-Conjugate Gradients (BiCGStab), multigrid-based solvers can be very efficient for Poisson-type (elliptic) problems. The new electrostatic solver for VORPAL therefore offers an algebraic multigrid (AMG) preconditioner and a full AMG solver. The implementation of the smoothed-aggregation AMG algorithm is based on the multigrid preconditioning package ML [4], developed at Sandia National Laboratories. While ML offers a variety of smoothers to choose from, only Gauss–Seidel without damping was used in the following numerical experiments.
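For reference, an undamped lexicographic Gauss–Seidel sweep for the 5-point stencil above can be written as follows. This is only an illustrative sketch of the type of smoother used; it is not the ML implementation, which applies such sweeps to its algebraically coarsened operators, and all names are assumptions.

```cpp
#include <vector>
#include <cstddef>

// One undamped Gauss-Seidel sweep for -Laplacian(phi) = rhs on an nx-by-ny
// grid with spacings hx, hy and phi = 0 assumed outside the grid. Updated
// values are reused immediately within the sweep, as in Gauss-Seidel.
void gaussSeidelSweep(std::vector<double>& phi, const std::vector<double>& rhs,
                      std::size_t nx, std::size_t ny, double hx, double hy) {
  const double cx = 1.0 / (hx * hx), cy = 1.0 / (hy * hy);
  const double diag = 2.0 * (cx + cy);
  // Out-of-range indices (including unsigned wrap-around) return zero.
  auto at = [&](std::size_t i, std::size_t j) -> double {
    return (i < nx && j < ny) ? phi[j * nx + i] : 0.0;
  };
  for (std::size_t j = 0; j < ny; ++j)
    for (std::size_t i = 0; i < nx; ++i) {
      const double nb = cx * (at(i - 1, j) + at(i + 1, j)) +
                        cy * (at(i, j - 1) + at(i, j + 1));
      phi[j * nx + i] = (rhs[j * nx + i] + nb) / diag;
    }
}
```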
3. Results

Here we consider the accuracy of the solvers, timing issues, and scaling with the number of processors. Therefore, solutions for ellipsoidal and Gaussian charge distributions in conducting boxes are computed. An analytical treatment for a Gaussian distribution is used to study the errors precisely.

3.1. Accuracy

A non-trivial problem of relevance to electron cooling is to determine the potential of a Gaussian electron distribution $\rho = \rho_0\, e^{-x^2/2a^2}\, e^{-y^2/2b^2}$ in a conducting box. For simplicity, we only show the 2D case, but the extension to 3D is straightforward. The charge distribution is confined in a box of length $L_d/2$ in all dimensions $d$, centered around the origin. The analytic expression for the potential can be found by expanding both the potential and the charge distribution in a Fourier series, e.g.

$$\Phi(x,y) = \Re \sum_{m,n=0}^{\infty} \Phi_{m,n}\, e^{i k_m x} e^{i k_n y}, \qquad k_m = \frac{2\pi}{L_x}\, m, \quad k_n = \frac{2\pi}{L_y}\, n, \tag{1}$$

where $\Re$ denotes the real part. Letting the Laplacian operate on the potential leads to the following relation between the Fourier coefficients of the potential and the charge density:

$$\Phi_{m,n} = -\frac{1}{k_m^2 + k_n^2}\, \rho_{m,n}. \tag{2}$$

The Fourier coefficients for a Gaussian charge distribution can be expressed as a product of error functions,

$$\rho_{m,n} = 2\pi a b\, \operatorname{erf}\!\left(\frac{L_x}{2\sqrt{2}\,a}\right) \operatorname{erf}\!\left(\frac{L_y}{2\sqrt{2}\,b}\right) e^{-\frac{1}{2}\left(k_m^2 a^2 + k_n^2 b^2\right)}. \tag{3}$$

Eqs. (3) and (2) determine $\Phi_{m,n}$, the coefficients of the potential. Imposing conducting boundary conditions and the symmetry of the charge distribution, the potential is finally determined by

$$\Phi(x,y) = 4 \sum_{m,n=0}^{\infty} \Phi_{\hat{m},\hat{n}} \cos(k_{\hat{m}} x) \cos(k_{\hat{n}} y), \tag{4}$$

with $\hat{m} = 2m + 1$ and $\hat{n} = 2n + 1$.
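For concreteness, the analytic reference potential can be evaluated by truncating the series in Eq. (4). The sketch below (an illustration, not code from the paper) combines Eqs. (2)–(4); the truncation parameter nTerms is a hypothetical name, and no explicit $\rho_0$ or $\epsilon_0$ prefactor is included beyond what appears in the reconstructed Eq. (3).

```cpp
#include <cmath>

// Truncated Fourier-series potential of Eq. (4): sum nTerms odd modes in
// each direction, with coefficients from Eqs. (2) and (3). a, b are the
// Gaussian widths; Lx, Ly are the box lengths.
double analyticPotential(double x, double y, double a, double b,
                         double Lx, double Ly, int nTerms) {
  const double pi = std::acos(-1.0);
  const double erfX = std::erf(Lx / (2.0 * std::sqrt(2.0) * a));
  const double erfY = std::erf(Ly / (2.0 * std::sqrt(2.0) * b));
  double phi = 0.0;
  for (int m = 0; m < nTerms; ++m) {
    for (int n = 0; n < nTerms; ++n) {
      const int mh = 2 * m + 1, nh = 2 * n + 1;           // odd modes only
      const double km = 2.0 * pi * mh / Lx;
      const double kn = 2.0 * pi * nh / Ly;
      const double rho_mn = 2.0 * pi * a * b * erfX * erfY *
          std::exp(-0.5 * (km * km * a * a + kn * kn * b * b));   // Eq. (3)
      const double phi_mn = -rho_mn / (km * km + kn * kn);        // Eq. (2)
      phi += phi_mn * std::cos(km * x) * std::cos(kn * y);
    }
  }
  return 4.0 * phi;                                               // Eq. (4)
}
```

With the truncation used in the next paragraph (the first 50 coefficients), nTerms would be 50 in this notation.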
Fig. 1. Left: Relative error of the potential for a 2D Gaussian charge distribution on a 400 × 400 cell grid with 50 particles per cell. Right: Transverse electric field of a uniformly charged ion distribution on a 65 × 65 cell grid with aspect ratio 1000. Solid: 100 particles/cell, $10^{-10}$ residual; coincides with the theoretical result. Dashed: 100 particles/cell, $10^{-1}$ residual. Dotted: 1 particle/cell, $10^{-10}$ residual.
Fig. 1, left, shows the relative error of the potential of a Gaussian electron distribution on a 400 × 400 cell grid with 50 particles per cell on average. The series in Eq. (4) was approximated by a sum of the first 50 coefficients. The convergence criterion for the iterative solver (preconditioned CGS) was a residual of $10^{-10}$. The largest contribution to the error originates from the center of the distribution. The random nature of the error indicates that it originates in the noise of the particle distribution. The effect of low sampling of the particle distribution will be seen again in the next example.

3.2. Aspect ratio

The electron-cooling problem is characterized by an approximately Gaussian ion distribution which is highly elongated along the dimension parallel to an external magnetic field. In order to avoid an excessive number of grid points in the longitudinal direction, the electrostatic solver has to be capable of obtaining solutions on grids with large aspect ratios. Therefore, the electric field of a uniformly charged ellipsoidal ion distribution with axes ax = 0.5 mm and ay = 0.5 m on a 65 × 65 cell grid is investigated. The right panel of Fig. 1 shows the electric field along the short dimension, obtained with the new ES solver using preconditioned CGS. No artifacts are visible at the interface between plasma and vacuum, and the solution is indistinguishable from the theoretical solution. The error due to a crude solution of Poisson's equation is marginal. However, undersampling the charge distribution leads to a relatively large error.

3.3. Performance

The availability of different iterative solvers and preconditioners in AZTEC makes it easy to compare their convergence. Fig. 2, left panel, shows the convergence of the electrostatic potential for a Gaussian particle distribution on a 1026 × 65 × 65 cell grid with a cell aspect ratio of 500. Relative residuals versus time for runs on 32 processors of NERSC's IBM SP are shown, using different solvers. A three-step symmetric Gauss–Seidel preconditioner was used for CGS and BiCGStab. For this particular problem, full AMG has the best performance and convergence behavior. Each iteration of the AMG-preconditioned CGS takes relatively long, but the overall solution time is comparable to that of the full AMG solver. Both are almost a factor of two faster than preconditioned CGS and much faster than CGS or BiCGStab.

The convergence criterion was chosen arbitrarily to be $10^{-6}$. The previous example showed that the overall accuracy of the potential in this particular case is dominated by the accuracy of the charge distribution and not by the accuracy of the solution. Relaxing the convergence criterion could therefore lead to much shorter solution times.
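As an illustration of the stopping test discussed here, a relative-residual check of the form $\|r - Ax\| / \|r\| \le \mathrm{tol}$ might look as follows. This is a generic sketch, not the AZTEC implementation; the matrix-vector product applyA is a hypothetical callback.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Returns true if the relative residual ||r - A x|| / ||r|| is below tol.
// applyA computes y = A x for the sparse finite-difference operator.
bool converged(const std::function<void(const std::vector<double>&,
                                        std::vector<double>&)>& applyA,
               const std::vector<double>& x, const std::vector<double>& r,
               double tol) {
  std::vector<double> ax(x.size(), 0.0);
  applyA(x, ax);
  double num = 0.0, den = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i) {
    const double d = r[i] - ax[i];
    num += d * d;
    den += r[i] * r[i];
  }
  return std::sqrt(num) <= tol * std::sqrt(den);
}
```

In a distributed-memory setting the two sums would of course require a global reduction across processors.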
Fig. 2. Left: Convergence behavior for different solvers for a 3D Gaussian charge distribution on a 1026 × 65 × 65 cell grid on 32 processors: Full AMG (solid, thick), AMG preconditioned CGS (solid, thin), Gauss–Seidel preconditioned CGS (dotted, thick), CGS (dotted, thin), preconditioned BiCGStab (dashed, thick), BiCGStab (dashed, thin). The execution time per timestep is shown as horizontal lines. Right: Execution times for the electrostatic simulation of a 3D Gaussian beam, on a grid of 1026 × 65 × 65 cells (solid) and 4104 × 65 × 65 cells (dotted), using AMG preconditioned CGS (diamond) or Gauss–Seidel preconditioned CGS (stars).
Another criterion is the scalability of the algorithm on parallel architectures. Even though the algorithm can handle large aspect ratios, it may be desirable to use a large computational grid, e.g. in order to resolve the longitudinal Debye length. Fig. 2 (right panel) shows the execution times of the solver on the IBM SP at NERSC, using the AMG-preconditioned CGS field solver and preconditioned CGS. The scaling is promising up to a large number of processors.
4. Conclusions

A parallel electrostatic field solver has been added to the hybrid simulation code VORPAL. Utilizing solver libraries optimized for parallel platforms (AZTEC and ML, both developed at Sandia National Laboratories), good performance is obtained for the solution of the large system of linear equations resulting from the discretization of Poisson's equation. The accuracy of the solver is compared to analytical results for examples relevant to electron-cooling problems.

Acknowledgement

The ML library was kindly supplied by Ray Tuminaro, Sandia National Laboratories. For more information on ML, see http://www.cs.sandia.gov/~tuminaro/ML_Description.html. This work was funded by DOE contract DE-FG03-01ER83313 and by Tech-X Corporation.

References
[1] C. Nieter, J.R. Cary, VORPAL: a versatile plasma simulation code, J. Comput. Phys. 196 (2004) 448.
[2] I. Ben-Zvi, et al., Electron cooling for RHIC, in: Proc. Part. Accel. Conf., 2001, p. 48.
[3] R.S. Tuminaro, M. Heroux, S.A. Hutchinson, J.N. Shadid, Official Aztec User's Guide: Version 2.1, Technical Report SAND99-8801J, Sandia National Laboratories, December 1999.
[4] C. Tong, R.S. Tuminaro, ML 2.0 smoothed aggregation user's guide, Technical Report SAND2001-8028, Sandia National Laboratories, December 2000.