Journal of Computational Physics 231 (2012) 364–372
Contents lists available at SciVerse ScienceDirect
Journal of Computational Physics journal homepage: www.elsevier.com/locate/jcp
Efficient low-storage Runge–Kutta schemes with optimized stability regions Jens Niegemann ⇑, Richard Diehl, Kurt Busch Institut für Theoretische Festkörperphysik and DFG-Center for Functional Nanostructures (CFN), Karlsruhe Institute of Technology (KIT), Wolfgang-Gaede-Straße 1, 76131 Karlsruhe, Germany
a r t i c l e
i n f o
Article history: Received 13 January 2011 Received in revised form 1 September 2011 Accepted 2 September 2011 Available online 12 September 2011 Keywords: Low-storage Runge–Kutta (LSRK) Stability region Discontinuous Galerkin time-domain (DGTD) Maxwell’s equations
a b s t r a c t A variety of numerical calculations, especially when considering wave propagation, are based on the method-of-lines, where time-dependent partial differential equations (PDEs) are first discretized in space. For the remaining time-integration, low-storage Runge–Kutta schemes are particularly popular due to their efficiency and their reduced memory requirements. In this work, we present a numerical approach to generate new low-storage Runge– Kutta (LSRK) schemes with optimized stability regions for advection-dominated problems. Adapted to the spectral shape of a given physical problem, those methods are found to yield significant performance improvements over previously known LSRK schemes. As a concrete example, we present time-domain calculations of Maxwell’s equations in fully three-dimensional systems, discretized by a discontinuous Galerkin approach. Ó 2011 Elsevier Inc. All rights reserved.
1. Introduction Despite the tremendous technological progress over the past decades, the simulation of wave propagation in complex systems remains a challenging topic in the field of computational physics. Particularly the solution of three-dimensional hyperbolic partial differential equations in fields such as electrodynamics, acoustics or hydrodynamics often require enormous amounts of computational power and memory. Possibly the most common approach to these kinds of problems is the method-of-lines, where the spatial derivatives are discretized to transform the partial differential equations (PDEs) to a large semi-discrete system of coupled ordinary differential equations (ODEs) in time. For given initial conditions, those ODEs can then be stepped forward in time by using a suitable integration method. Obviously, such time-integration procedures should be as accurate as possible and should allow for large timesteps in order to yield good computational performance. Furthermore, for complex physical systems, the number of coupled ODEs can easily reach N 108 or higher. Therefore, computer memory often becomes a limiting factor and it is important to employ methods which are not only efficient in terms of computational time, but also with respect to memory usage. A particularly popular class of such integration schemes are explicit low-storage Runge–Kutta (LSRK) schemes [1–3]. Those LSRK methods inherit the versatility and robustness from traditional explicit Runge–Kutta schemes but work with significantly reduced memory requirements of only 2N. Unfortunately, the explicit LSRK schemes, just as all other explicit ODE solvers, are only conditionally stable. This means that there exists an upper limit for the timestep Dt which one can take. This limit is determined by two contributions: (1) the spectrum of the system of ODEs under consideration and (2) the characteristic stability region of the particular integration scheme. Since the spectrum of the system is defined by the physical model and by the spatial discretization, we have only
⇑ Corresponding author. Tel.: +49 721 608 46949; fax: +49 721 608 47040. E-mail address:
[email protected] (J. Niegemann). 0021-9991/$ - see front matter Ó 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jcp.2011.09.003
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
365
limited means to modify it. The stability domain on the other hand is a property of the time-integration scheme and can, in principle, be tailored. In the past few years, a variety of different LSRK schemes of different order and with different stability regions where proposed [4–6]. In a previous work, we analyzed around 75 different LSRK schemes proposed in the literature for their suitability with respect to discontinuous Galerkin simulations of Maxwell’s equations [7]. As a result, we identified a number of rather efficient second- and third-order schemes, but all of the analyzed fourth-order schemes come at the price of a reduced effective timestep. For the case of parabolic or diffusion-dominated systems, significant analytical and numerical effort has been put into finding Runge–Kutta schemes with increased stability regions [8,9]. However, those methods are usually limited to narrow spectra along the negative real half-axis. Furthermore, most of the proposed schemes are of lower order and/or not in a lowstorage formulation. Only recently, Allampalli and coworkers proposed optimized fourth-order LSRK schemes for inviscid problems [6]. In this work, we will employ a general numerical optimization approach to obtain new fourth-order LSRK schemes with enlarged and tailored stability domains for the simulation of general hyperbolic PDEs. We illustrate our procedure by generating three novel fourth-order LSRK-schemes, adopted to typical spectra of common wave propagation problems. Furthermore, we demonstrate that those new schemes lead to performance enhancements of about 40% for realistic three-dimensional calculations. 2. The Runge–Kutta method For a given system of N coupled ODEs,
@ yðtÞ ¼ Fðt; yÞ @t
ð1Þ
the classical Runge–Kutta method [10] reads
ynþ1 ¼ yn þ Dt
s X
bi Ki ;
with Ki ¼ F t n þ ci Dt; yn þ Dt
i¼1
s X
! aij Kj :
ð2Þ
j¼1
Here, yn+1 denotes an approximation to the exact solution y(tn + Dt) and Dt = tn+1 tn. Furthermore, s denotes the number of stages, and {aij}, {bi} and {ci} are referred to as the Butcher coefficients of the Runge–Kutta scheme. Consistency mandates that s X
ci ¼
aij ;
ð3Þ
j¼1
so only {aij}, {bi} remain as independent coefficients. To simplify the notation, we combine {aij} into a matrix A and write {bi}, {ci} as vectors b and c, respectively. As alluded to in the introduction, we restrict our analysis to explicit schemes, so we further require that aij = 0 for j P i, i.e., only the strictly lower triangle of A is non-zero. For a given scheme to be of a certain order p, the coefficients have to fulfill a number of mp order conditions. These conditions are nonlinear equations of the coefficients and can be derived with the help of rooted trees (see [10] for details). For a fourth-order scheme, there are a total of m4 = 8 conditions which read s X
s X
bi ¼ 1;
i¼1 s X i¼1
bi ci ¼
i¼1
1 bi c2i ¼ ; 3
1 ; 2
s X
bi aij cj ¼
i;j¼1
s X
1 ; bi aij c2j ¼ 12 i;j¼1
s X
1 ; 6
s X
bi aij ajk ck ¼
i;j;k¼1
1 bi ci aij cj ¼ ; 8 i;j¼1
s X i¼1
bi c3i
1 ; 24 1 ¼ : 4
ð4Þ
Besides the order p, a second important characteristic of every numerical ODE solver is its stability. A necessary criterion for the stability is that all complex eigenvalues k 2 C of the Jacobi matrix J(t, y) = @F/@y fulfill the criterion
jGðkDtÞj 6 1:
ð5Þ
For explicit RK schemes, the function G(z) takes the form of a polynomial [10]
GðzÞ ¼ 1 þ c1 z þ c2 z2 þ þ cs zs ;
ð6Þ
where z = kDt represents the rescaled complex eigenvalues and the coefficients c are found to be
( Ps
ci ¼
T
j¼1 bj
for i ¼ 1;
i2
c for i > 1:
b A
ð7Þ
It is important to note, that the first p c-values are fully determined by the order conditions (4). Thus, for schemes with s = p, the stability contour is fixed. Only when the number of stages exceeds the order, there is a certain freedom to manipulate the region of stability. We will later exploit this fact to tailor adapted stability domains.
366
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
3. Low-storage Runge–Kutta schemes As can be seen from expression (2), a classical Runge–Kutta scheme can be readily implemented using s + 1 memory registers of size N. However, for large-scale simulations, N can be extremely large and the computer memory often becomes the limiting factor. Thus, it is interesting to consider specialized low-storage Runge–Kutta schemes, which only require two memory registers, independently of the number of stages s. While there are multiple ways to implement such low-storage variations of the Runge–Kutta method [1–3], we will focus on a formulation originally proposed by Williamson [2], which reads
K1 ¼ yn ; K2 ¼ Ai K2 þ DtFðt n þ ci Dt; K1 Þ
K1 ¼ K1 þ Bi K2 ;
8i ¼ 1 . . . s
ynþ1 ¼ K1 : In order for the scheme to be self-starting, we require A1 = 0, so we end up with 2s 1 independent coefficients Ai and Bi. A connection between Ai, Bi and the classical Butcher coefficients can be expressed recursively as
8 > < Ajþ1 ai;jþ1 þ Bj aij ¼ Bj > : 0
bi ¼
for j < i 1; for j ¼ i 1
ð8Þ
otherwise;
Aiþ1 biþ1 þ Bi
for i < s;
Bi
for i ¼ s:
ð9Þ
It should be noted that due to the significant reduction of free variables, it is no longer possible to obtain a general fourthorder scheme with only s = 4 stages. Instead, at least s = 5 stages are necessary to generate a Williamson LSRK scheme of fourth order [4]. For higher-order schemes, the increase in stages will be even more dramatic. 4. Finding LSRK schemes with optimized stability regions In order to generate new LSRK schemes, we apply a two-step process. Starting from the desired shape of the stability domain, we first determine a set of optimized c-values for a given number of stages. Then, in a second step, we try to find an LSRK scheme with the previously obtained c-values which also fulfills all required order conditions (4) and the constraints (7). 4.1. Optimization of the stability domain As discussed above, the largest stable timestep Dtmax depends on the stability domain and on the spectrum of the Jacobian J of the semi-discrete system. It, therefore, makes sense to adapt the stability contour to the spectrum as closely as possible. The challenge is, that the exact spectrum is problem- and discretization-dependent and usually not known. Nonetheless, one often has at least some information about the general shape of the spectrum. As typical examples for the case of hyperbolic PDEs, we consider three different cases. First, we consider the case of a purely imaginary spectrum. This type of spectrum occurs when a non-dissipative or inviscid system is discretized by an energy-conserving spatial discretization. Typical examples of this type are finite-difference time-domain (FDTD) methods [11] or the discontinuous Galerkin approach [12–14] when employing central fluxes. In practice however, one often has at least some dissipation in the system of ODEs. This dissipation can either come from the original system of PDEs (e.g. lossy materials or absorbing boundary conditions) or it can be introduced via the spatial discretization. In particular, the use of an upwind flux, e.g., in a finite-volume or discontinuous Galerkin approach, will introduce artificial dissipation. In those cases, the spectrum is no longer purely imaginary, but will be extended into the negative real half-plane. In a previous work [7], we empirically identified three prototypical spectral shapes which we will consider in more detail:
Kinviscid ðsÞ ¼ fskjk 2 i½1; 1g;
ð10aÞ
n o Kellipse ðsÞ ¼ skjRe½k 6 0 ^ Re½2k þ k0 2 þ Im½k2 < 1 ;
ð10bÞ
Kcircle ðsÞ ¼ fskjRe½k 6 0 ^ jk þ k0 j < 1g:
1
1
ð10cÞ
Here, we have introduced a scaling factor s 2 R and the shift k0 ¼ cos sin . In the following, we will try to tailor LSRK 2 schemes with stability regions closely resembling the convex hull of those prototypical spectra. Thus, the spectra defined in Eqs. (10) will act as target stability regions for the LSRK schemes to construct. To visualize those target regions, in Fig. 1 we
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
367
depict a number of points on their boundary. Please note that the target stability regions are mirror-symmetric with respect to the real axis. Therefore, we only depict the upper half-plane. Now, the objective is to find, for a given number of stages s, the coefficients ci (i = p + 1 . . . s) such that s is maximized while
KðsÞ fz 2 CjjGðz=sÞj < 1g: Please note the normalization by the number of stages s in the argument of G. This normalization corresponds to the introduction of an effective timestep Dteff = Dt/s. Since the right-hand side F(t, y) of (1), which dominates the computational effort, is evaluated once for each stage, the normalization by s allows us to directly compare the performance of schemes with different numbers of stages. Since the first p coefficients are fixed by the order conditions (4), we have s p free parameters cp+1 . . . cs for the optimization of the stability region. In practice, we employ the controlled random search with local mutation (CRS2) method [15] as implemented in the library NLopt [16]. For s < 10, this method typically converges to a set of c-values within a few minutes on a single core of a standard desktop PC. For each of the three target regions, we found optimized coefficients for up to s = 14 stages. By repeating the optimization from many different (random) starting values and by using various other global optimization techniques implemented in NLopt (e.g. DIRECT-L and MLSL), we convinced ourselves that the results are at least close to the optimal values. In Fig. 2, we plot the largest scaling factor s found for the three different prototypical spectra and for different orders p as a function of the number of stages s. The largest contours for which we finally found LSRK schemes are depicted in Fig. 3. Interestingly, we find that the maximal interval on the imaginary axis scales very similarly for orders p = 2, 3 and 4. Roughly, s seems to grows as s 1/s, which was proven to be the upper limit for all schemes in the inviscid case [17]. To be more specific, we find that all second-order schemes for odd s exactly show the optimal behavior. For an even number of stages all the qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi schemes of orders p = 2, 3 and 4 show growth as ðs 1Þ2 1=s. This growth corresponds to the behavior of the stability polynomials found by Kinnmark and Gray [18]. Our findings strongly support their conjecture that there exist no one-step qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi integration schemes with s even and an imaginary stability limit larger than ðs 1Þ2 1=s [18,19]. In contrast, for the other two spectral shapes considered, we observe a somewhat reduced growth of the stability regions. In particular, the increase of the scaling factor s seems to saturate. Nonetheless, for a sufficiently high number of stages, we can reach significant improvements over most available fourth-order LSRK schemes. As discussed in the introduction, the efficiency of available second- and third-order schemes is already quite satisfactory and from the results in Fig. 2 we do not expect significant gains from schemes with a higher number of stages. In contrast, for fourth-order schemes, we can expect up to 50% improvements over the commonly used scheme by Kennedey and Carpenter [4]. Thus, in the following we will only concentrate on generating fourth-order LSRK schemes. 4.2. Finding the LSRK-coefficients With a set of c-values at hand, we can now find the LSRK coefficients Ai and Bi by solving the set of nonlinear order conditions stated in (4). In addition, we need to fulfill the constraints (7) provided by the optimized stability domain. However, since for each additional stage, we gain two free variables but only one new constraint, we usually have more unknowns than equations. In total, the number of equations to solve is given by mp + s p. For our case of p = 4, this amounts to s + 4 equations with 2s 1 unknowns. To solve the system of nonlinear equations, we employ a derivative-free hybrid method, implemented as gsl_multiroot_fsolver_hybrids in the GNU scientific library (GSL) [20]. Since this method requires an initial guess, we assign all free variables Ai and Bi random values from [1, 0] and [0, 1], respectively. Furthermore, the method supports only as many free variables as there are equations. Therefore, we randomly pick s + 4 of the 2s 1 coefficients as free variables while the rest remains fixed at its random initial guess.
Fig. 1. Each panel shows 50 sample points on the surface of the target stability regions specified in (10). Here, the scaling factor is chosen as s = 1. Please note that the target stability regions are mirror-symmetric with respect to the real axis.
368
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
Fig. 2. Maximal scaling factors s found for different target spectra and for different orders p.
Fig. 3. The optimized stability domains (p = 4) for our three target spectra (thick red lines). The number s corresponds to the largest number of stages, for which we managed to construct a suitable LSRK scheme. In yellow we indicate the largest target region fitting into the stability region of the respective scheme. For comparison, we also depict (in light gray) the boundary of the stability region of the LSRK(5, 4) method proposed by Kennedy and Carpenter in [4]. Please note the scaling of the axes by s and that the regions are mirror-symmetric with respect to the real axis. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 1 LSRK(12, 4) coefficients (inviscid target stability region, cf. Fig. 3(a)). i
ci
Ai
Bi
ci
1 2 3 4 5 6 7 8 9 10 11 12
1 1/2 1/6 1/24 7.7793114345018587 103 1.2973631162180358 103 1.4820214027731423 104 1.8551101042762935 105 1.2351886928579280 106 1.2377768810554030 107 3.7434529900414887 109 3.1278890521988389 1010
0.0000000000000000 0.0923311242368072 0.9441056581158819 4.3271273247576394 2.1557771329026072 0.9770727190189062 0.7581835342571139 1.7977525470825499 2.6915667972700770 4.6466798960268143 0.1539613783825189 0.5943293901830616
0.0650008435125904 0.0161459902249842 0.5758627178358159 0.1649758848361671 0.3934619494248182 0.0443509641602719 0.2074504268408778 0.6914247433015102 0.3766646883450449 0.0757190350155483 0.2027862031054088 0.2167029365631842
0.0000000000000000 0.0650008435125904 0.0796560563081853 0.1620416710085376 0.2248877362907778 0.2952293985641261 0.3318332506149405 0.4094724050198658 0.6356954475753369 0.6806551557645497 0.7143773712418350 0.9032588871651854
Table 2 LSRK(13, 4) coefficients (elliptical target stability region, cf. Fig. 3(b)). i
ci
Ai
Bi
ci
1 2 3 4 5 6 7 8 9 10 11 12 13
1 1/2 1/6 1/24 8.1116406653683835 103 1.2566761910282494 103 1.5605379767258244 104 1.5517942735576833 105 1.2224029698949826 106 7.4494312546583213 108 3.3568607387350691 109 1.0176127485607402 1010 1.6382192183434098 1012
0.0000000000000000 0.6160178650170565 0.4449487060774118 1.0952033345276178 1.2256030785959187 0.2740182222332805 0.0411952089052647 0.1797084899153560 1.1771530652064288 0.4078831463120878 0.8295636426191777 4.7895970584252288 0.6606671432964504
0.0271990297818803 0.1772488819905108 0.0378528418949694 0.6086431830142991 0.2154313974316100 0.2066152563885843 0.0415864076069797 0.0219891884310925 0.9893081222650993 0.0063199019859826 0.3749640721105318 1.6080235151003195 0.0961209123818189
0.0000000000000000 0.0271990297818803 0.0952594339119365 0.1266450286591127 0.1825883045699772 0.3737511439063931 0.5301279418422206 0.5704177433952291 0.5885784947099155 0.6160769826246714 0.6223252334314046 0.6897593128753419 0.9126827615920843
369
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372 Table 3 LSRK(14, 4) coefficients (circular target stability region, cf. Fig. 3(c)). i
ci
Ai
Bi
ci
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1/2 1/6 1/24 8.0971474827892589 103 1.2380169165300218 103 1.4920544370587013 104 1.4105197862197588 105 1.0338060754675449 106 5.7551620074656494 108 2.3518316167532871 109 6.6527970264862166 1011 1.1639946786449694 1012 9.4910013085549050 1015
0.0000000000000000 0.7188012108672410 0.7785331173421570 0.0053282796654044 0.8552979934029281 3.9564138245774565 1.5780575380587385 2.0837094552574054 0.7483334182761610 0.7032861106563359 0.0013917096117681 0.0932075369637460 0.9514200470875948 7.1151571693922548
0.0367762454319673 0.3136296607553959 0.1531848691869027 0.0030097086818182 0.3326293790646110 0.2440251405350864 0.3718879239592277 0.6204126221582444 0.1524043173028741 0.0760894927419266 0.0077604214040978 0.0024647284755382 0.0780348340049386 5.5059777270269628
0.0000000000000000 0.0367762454319673 0.1249685262725025 0.2446177702277698 0.2476149531070420 0.2969311120382472 0.3978149645802642 0.5270854589440328 0.6981269994175695 0.8190890835352128 0.8527059887098624 0.8604711817462826 0.8627060376969976 0.8734213127600976
It is important to note that all our calculations are performed in double-precision and the resulting coefficients are only accurate to around 12 digits. In order to improve this, in a final step, we polish the coefficients by using the arbitrary precision capabilities of Mathematica [21]. Essentially, we employ the nonlinear equation solver of Mathematica with our previous solutions as an initial guess. In this polishing step, we reduce the residuum of each equation to below 1040. For each of our three target spectra we used this procedure to generate new LSRK schemes with increased regions of stability. For the inviscid target spectrum, we found a new 12-stage scheme, for the elliptical spectrum we found a 13-stage LSRK scheme and for the circularly shaped spectrum we found a 14-stage method. The resulting coefficients can be found in Tables 1–3. 5. Numerical verification In order to assess the accuracy and efficiency of our new LSRK schemes, we run a number of numerical tests. To simplify the notation, we introduce the notation LSRK(s, p) to identify the different schemes. In the following LSRK(5, 4) always refers to the method proposed by Kennedy and Carpenter in [4]. Furthermore, we denote the HALE-RK7 from [6] as LSRK(7, 4). Our new schemes are denoted by LSRK(12, 4), LSRK(13, 4) and LSRK(14, 4) for the inviscid, the elliptical and the circular target spectra, respectively. 5.1. Basic tests First, we integrate a nonlinear system of ODEs which was also used for a similar purpose in [22,6], i.e.,
@ 1 expðt 2 Þ uðtÞ ¼ v t; @t u t2 @ 1 v ðtÞ ¼ expðt2 Þ 2t expðt2 Þ: @t v
ð11aÞ ð11bÞ
For initial conditions u(1) = 1, v(1) = exp(1), the analytic solution of this system of equations is given by
uðtÞ ¼
1 ; t
v ðtÞ ¼ expðt2 Þ:
ð12Þ
We numerically integrate (11) from t0 = 1 to tend = 1.4 for a range of step widths Dt. To study the error of the time integration, we define the total error after the simulation as
~ ðtend Þ uðt end Þj þ jv~ ðt end Þ v ðt end Þj; E ¼ ju
ð13Þ
~ refer to the numerical results while u and v represent the analytic solutions (12). ˜ and v where u In Fig. 4, we plot the error as a function of the timestep Dt and find that all schemes show the expected fourth-order behavior. In addition, at least for this example, our new schemes all exhibit significantly better error constants than the established LSRK(5, 4) or LSRK(7, 4). 5.2. Discontinuous Galerkin time-domain calculations of Maxwell’s equations For a more realistic application, we now consider the solution of Maxwell’s equations via a nodal discontinuous Galerkin time-domain (DGTD) method [12,13,23]. In this approach, Maxwell’s curl equations
370
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
Fig. 4. Total error for the test system (11) at tend = 1.4.
@Eðr; tÞ ¼ r Hðr; tÞ; @t @Hðr; tÞ ¼ r Eðr; tÞ; lðrÞ @t
ðrÞ
are discretized in space to obtain the semi-discrete system of equations
i @Ek 1h ^ ðn ^ DEÞÞ þ Z þ n ^ DH =Z ; ¼ k Dk Hk þ ðMk Þ1 F k aðDE n @t
i @Hk 1 h ^ DE =Y : ^ ðn ^ DHÞÞ Y þ n ¼ k Dk Ek þ ðMk Þ1 F k aðDH n @t l In principle, the DG approach allows a spatial discretization of arbitrary order pDG. The vectors Ek and Hk together contain a total of Nk = (pDG + 1)(pDG + 2)(pDG + 3) unknown electric and magnetic field values that are associated with the element k. ^ is the normal vector of those interFurther, DE and DH signify the field discontinuities across the element’s boundaries and n faces. The definition of the matrices Dk, Mk and F k , as well as details on the material parameters Y and Z can be found in Ref. [14]. Finally, the free parameter a 2 [0, 1] allows to tune the shape of the spectrum without modifying the stability properties of the discretization [7]. This parameter is also known as the upwind factor because it linearly interpolates between an energy-conserving central flux (a = 0) and a dissipative upwind flux (a = 1). As a first example, we consider the time-evolution of an eigenmode of an empty cubic cavity (see Ref. [24, Chapter 8.3]) of edge length a = b = c = 2 with perfect electric conductor (PEC) boundary conditions. We use the mesh shown in Fig. 6(a) and employ a fourth-order spatial discretization, i.e., pDG = 4. The resulting semi-discrete system has a total of N = 80640 degrees of freedom. As initial conditions, we prepare a cavity mode with m = n = p = 2 and let the simulation run for 10 cycles. During the simulation, we record the z-component of the electric field and calculate the maximal deviation from the analytical reference solution at each timestep. The maximal deviation over all timesteps and elements is what we call the maximal error E max . In Fig. 5, we show how the maximal error E max for this system behaves as a function of the effective timestep Dteff for our three new LSRK schemes and for different values of the upwind factor a. As a first observation, we note that the error does not seem to depend on the timestep. More precisely we find that, as soon as the timestep is sufficiently small to yield a stable scheme, the error is entirely determined by the spatial discretization for this system. Furthermore, we observe that for a = 0.0, the error is significantly higher than for larger upwind factors. This effect is already well known [13,7] and simply indicates that a central flux is not suitable for accurate simulations of Maxwell’s equations with the discontinuous Galerkin method. In addition, we find that depending on the upwind factor a, a different integration scheme leads to the best performance. This is to be expected since a decreasing a squeezes the spectrum towards the imaginary axis [13,7]. In a second study, we repeat the previous analysis for a series of a-values and extract the largest stable effective timestep. In Fig. 6(b), we plot this timestep relative to the largest stable effective timestep that is allowed with the popular LSRK(5, 4) scheme with a full upwind flux (a = 1). This directly gives us the computational speedup obtained with our new schemes. We find, if we keep the full upwind flux, that our new LSRK(14, 4) scheme with a roughly circular stability region yields a speedup of almost 40%. For a partial upwind flux with a = 0.65, we find the LSRK(13, 4) scheme to perform even better. Here, the speedup is around 51%. Finally, for a central flux, our new LSRK(12, 4)-scheme reaches an improvement of around 53%. However, as stated above, the central flux comes at the price of a significant increase in the error due to the spatial discretization.
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
371
Fig. 5. Influence of the normalized timestep Dteff = Dt/s on the maximal error E max of an empty metallic cavity (pDG = 4). The three panels show results for different LSRK schemes, while the markers/colors represent different values of the upwind factor: a = 1 (crosses/red), a = 0.8 (circles/green), a = 0.6 (pluses/ blue), a = 0.4 (squares/cyan), a = 0.2 (stars/magenta), a = 0.0 (triangles/dark green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Setup and the computational speedup for DG calculations of an empty cubic PEC cavity. We depict the speedup of our new LSRK schemes for different upwind factors a, as compared to LSRK(5, 4) for a full upwind flux (a = 1).
Finally, it should be noted that the LSRK(14, 4) scheme was already successfully employed in a number of realistic largescale DGTD simulations. In particular, the numerical results published in Refs. [23,25–27] were all obtained by using this technique.
6. Conclusion and outlook In conclusion, we have shown that by generating LSRK schemes with tailored stability contours, one can significantly increase performance of many method-of-lines simulations. For the case of discontinuous Galerkin time-domain calculations of Maxwell’s equations, we found improvements of the computational speed by 40–50% without significant sacrifices in accuracy or memory consumption. Those performance gains are particularly attractive since they require almost no changes on existing simulation codes. Concerning further improvements, we conclude from extrapolating the data shown in Fig. 2 that the presented fourthorder schemes are already close to optimal for the considered stability regions. By increasing the number of stages to s 20, one might be able to increase the effective timestep by approximately 10%. However, one has to expect that the rounding errors during the evaluation of the stability polynomials will start to become more significant with an increasing number of stages. Thus, one probably needs to employ a multi-precision variation of the contour optimization and of the nonlinear equation solver. It is unclear whether the potential gain is worth the additional effort. Finally, we want to mention that we also made some initial attempts to construct efficient fifth-order schemes. Unfortunately, it seems that the stability contours are much more restricted and a significantly larger number of stages is necessary to obtain efficient LSRK schemes. This will be an aim of further research.
References [1] P.J. van der Houwen, Explicit Runge–Kutta formulas with increased stability boundaries, Numer. Math. 20 (2) (1972) 149–164. [2] J.H. Williamson, Low-storage Runge–Kutta scheme, J. Comput. Phys. 35 (1) (1980) 48–56.
372
J. Niegemann et al. / Journal of Computational Physics 231 (2012) 364–372
[3] D.I. Ketcheson, Highly efficient strong stability-preserving Runge–Kutta methods with low-storage implementations, SIAM J. Sci. Comput. 30 (4) (2008) 2113–2136. [4] M.H. Carpenter, C.A. Kennedy, Fourth-order 2N-storage Runge–Kutta schemes, Technical Report NASA-TM-109112, NASA Langley Research Center, VA, June 1994. [5] C.A. Kennedy, M.H. Carpenter, R.M. Lewis, Low-storage, explicit Runge–Kutta schemes for the compressible Navier–Stokes equations, Appl. Numer. Math. 35 (3) (2000) 177–219. [6] V. Allampalli, R. Hixon, M. Nallasamy, S.D. Sawyer, High-accuracy large-step explicit Runge–Kutta (HALE-RK) schemes for computational aeroacoustics, J. Comput. Phys. 228 (10) (2009) 3837–3850. [7] R. Diehl, K. Busch, J. Niegemann, Comparison of low-storage Runge–Kutta schemes for discontinuous Galerkin time-domain simulations of Maxwell’s equations, J. Comput. Theor. Nanosci. 7 (2010) 1572–1580. [8] P.J. van Der Houwen, B.P. Sommeijer, On the internal stability of explicit, m-stage Runge–Kutta methods for large m-values, Z. Angew. Math. Mech. 60 (1980) 479–485. [9] M. Torrilhon, R. Jeltsch, Essentially optimal explicit Runge–Kutta methods with application to hyperbolic-parabolic equations, Numer. Math. 106 (2007) 303–334. [10] J.C. Butcher, Numerical Methods for Ordinary Differential Equations, John Wiley & Sons, New York, 2003. [11] A. Taflove, S.C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method, third ed., Artech House Publishers, 2005. [12] J. Hesthaven, T. Warburton, Nodal high-order methods on unstructured grids – I. Time-domain solution of Maxwell’s equations, J. Comput. Phys. 181 (1) (2002) 186–221. [13] J. Hesthaven, T. Warburton, Nodal Discontinuous Galerkin Methods, Springer, 2007. [14] J. Niegemann, M. König, K. Stannigel, K. Busch, Higher-order time-domain methods for the analysis of nano-photonic systems, Photon. Nanostruct. Fundam. Appl. 7 (1) (2009) 2. [15] P. Kaelo, M.M. Ali, Some variants of the controlled random search algorithm for global optimization, J. Optim. Theory Appl. 130 (2) (2006) 253–264. [16] S.G. Johnson, The NLopt nonlinear-optimization package.
. [17] R. Vichnevetsky, New stability theorems concerning one-step numerical methods for ordinary differential equations, Math. Comput. Simulat. 25 (1983) 199–205. [18] I. Kinnmark, W. Gray, One step integration methods of third-fourth order accuracy with large hyperbolic stability limits, Math. Comput. Simulat. 26 (1984) 181–188. [19] I. Kinnmark, A principle for construction of one-step integration methods with maximum imaginary stability limits, Math. Comput. Simulat. 29 (1987) 87–106. [20] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, M. Booth, F. Rossi, GNU Scientific Library: Reference Manual, second revised ed., Network Theory Ltd., Bristol, UK, 2005. [21] Mathematica Version 7.0, Wolfram Research, Inc., Champaign, Illinois, 2008. [22] D. Stanescu, W.G. Habashi, 2N-storage low dissipation and dispersion Runge–Kutta schemes for computational acoustics, J. Comput. Phys. 143 (2) (1998) 674–681. [23] K. Busch, M. Knig, J. Niegemann, Discontinuous Galerkin methods in nanophotonics, Laser Photon. Rev., in press. . [24] C.A. Balanis, Advanced Engineering Electromagnetics, John Wiley & Sons, 1989. [25] K.R. Hiremath, J. Niegemann, K. Busch, Analysis of light propagation in slotted resonator based systems via coupled-mode theory, Opt. Express 19 (9) (2011) 8641–8655. [26] C. Matyssek, J. Niegemann, W. Hergert, K. Busch, Computing electron energy loss spectra with the Discontinuous Galerkin Time-Domain method, Photonics Nanostruct. Fundam. Appl., in press. . [27] F. von Cube, S. Irsen, J. Niegemann, C. Matyssek, W. Hergert, K. Busch, S. Linden, Spatio-spectral characterization of photonic meta-atoms with electron energy-loss spectroscopy, Opt. Mater. Express 1 (5) (2011) 1009–1018.