A scalable fully implicit method with adaptive time stepping for unsteady compressible inviscid flows

A scalable fully implicit method with adaptive time stepping for unsteady compressible inviscid flows

Computers and Structures 176 (2016) 1–12 Contents lists available at ScienceDirect Computers and Structures journal homepage: www.elsevier.com/locat...

3MB Sizes 0 Downloads 144 Views

Computers and Structures 176 (2016) 1–12

Contents lists available at ScienceDirect

Computers and Structures journal homepage: www.elsevier.com/locate/compstruc

A scalable fully implicit method with adaptive time stepping for unsteady compressible inviscid flows q Yafei Liu a, Haijian Yang b,⇑, Chao Jiang a, Chao Yang c,d a

State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, College of Mechanical and Vehicle Engineering, Hunan University, Changsha, Hunan 410082, China College of Mathematics and Econometrics, Hunan University, Changsha, Hunan 410082, China c Institute of Software, Chinese Academy of Sciences, Beijing 100190, China d State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100190, China b

a r t i c l e

i n f o

Article history: Received 4 November 2015 Accepted 2 August 2016 Available online 31 August 2016 Keywords: Compressible inviscid flows Finite volume scheme Fully implicit method Newton–Krylov method Parallel scalability

a b s t r a c t The class of fully implicit methods is drawing more attention in the simulation of fluid dynamics for engineering community, due to the allowance of large time steps in extreme-scale simulations. In this paper, we introduce and study a scalable fully implicit method for the numerical simulations of unsteady compressible inviscid flows governed by the compressible Euler equations. In the method, a cell-centered finite volume scheme together with the local Lax-Friedrichs (LLF) formula is used for the spatial discretization, and a backward differentiation formula is applied to integrate the Euler equations in time. The resultant nonlinear system at each time step is then solved by a parallel Newton-Krylov method with a domain decomposition type preconditioner. To improve the performance of the proposed method, we introduce an adaptive time stepping method which adjusts the time step size according to the initial residual of Newton iterations. Therefore, the proposed fully implicit solver overcomes the often severe limits on the time steps associated with existing methods. Numerical experiments validate that the approach is effective and robust for the simulations of several compressible inviscid flows. We also show that the newly developed algorithm scales well with more than one thousand processor cores for the problem with tens of millions of unknowns. Ó 2016 Elsevier Ltd. All rights reserved.

1. Introduction The problem of inviscid flows has been the subject of intensive theoretical, numerical, and experimental investigations in the recent years because of its significant applications in nature and in many scientific and engineering practices, such as shock-wave reflection and diffraction [35,51,52], general atmospheric circulation [49], and Rayleigh-Taylor instability [28,29,36]. The analysis of inviscid flows is very complex largely due to the effect of a variety of wave motions. The characterization of the limiting flow regime, where physically insignificant fast waves such as the acoustic wave and the inertia-gravity wave take shape and travel, is very essential for the complete understanding of the mechanism

q This research was supported by the NSFC grants 11571100, 91530103 and 11172096. Chao Jiang was also supported in part by the NSFC for excellent young scholars 51222502. Haijian Yang was also supported in part by the NSFC grant 11272352 and the Planned Science and Technology Project of Hunan Province under grant 2015JC3055. ⇑ Corresponding author. E-mail addresses: [email protected] (Y. Liu), [email protected] (H. Yang), [email protected] (C. Jiang), [email protected] (C. Yang).

http://dx.doi.org/10.1016/j.compstruc.2016.08.003 0045-7949/Ó 2016 Elsevier Ltd. All rights reserved.

of the fluid flow from the technical and engineering standpoints. Hence, technological implications have given rise to increased interest in solving the fully Euler equations. High resolution meshes are often required to resolve the details of the solution, which implies large condition numbers of the resulting algebraic systems, and also implies the need for parallel algorithms with good robustness and scalability. Modern computational fluid dynamics (CFD) usually uses the time marching method to solve the Euler equations, including the explicit method and the implicit method. For a single time step, the class of the explicit methods, such as the well-known Runge– Kutta method [40,45], Adams-Bashforth scheme [43], and Leapfrog-Euler algorithm [44] is much more advantageous in terms of computational complexity and memory consumption. For inviscid flows, no matter whether their initial conditions are continuous or not, some discontinuous solutions may arise, such as shock, rarefaction and contact-wave [23,52]. As a result, a finer mesh is needed for sufficient computation accuracy. But in the explicit method, the time step size is restricted by the CourantFriedrichs-Lewy (CFL) stability condition, which usually results in smaller time step size and much more total compute time when

2

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

the mesh is refined. On the other hand, the class of implicit methods can weaken or even completely relax the CFL restriction and offer much larger time steps, including the semi-implicit method [15,24] and the fully implicit method [18,32]. To some extent, the semi-implicit method can weaken the stability condition and decrease the time step size. Nevertheless, it is still confined to some stability restriction. The fully implicit methods can relax the stability requirement on the time step size and therefore, provide a consistent and robust way to solve the Euler equations in long-term and extreme-scale simulations. In particular, the fully implicit approach combined with a suitable time adaptivity method can further reduce the total compute time, which greatly enlarges the scope of application of the fully implicit method [38,50]. When a fully implicit method is applied, one needs to solve a large-scale nonlinear algebraic system at each time step. In the study, we introduce and study a scalable fully implicit NewtonKrylov-Schwarz (NKS) algorithm for solving the unsteady compressible inviscid flows. The proposed method is particularly well suited for solving large nonlinear systems in memory-distributed parallel environments, and has been applied in some fields [4,13,17,31,46–49]. The NKS method consists of three major steps: (a) an inexact Newton method is used to solve the nonlinear system at each time step; (b) the linear system at each Newton step is solved by an Krylov subspace iteration method; (c) the class of domain decomposition preconditioners is employed to accelerate the Krylov iterations. The success of the overall approach depends heavily on the preconditioner. Hence, in the study we use an overlapping restricted additive Schwarz method to build the preconditioner, since it does not require any splitting of the multicomponent system and is more efficient than the classical additive Schwarz method on machines with a large number of processors. On the other hand, although the implicit method completely relaxes the CFL stability restriction on time step size, the truncation errors will increase if a larger time step is used, while a smaller time step increases the total compute time. Therefore, to handle well the trade-off between the accuracy and the efficiency, the time step size is expected to change with the flow field rather than stay a fixed value. For this purpose, we present a fully implicit method with adaptive time stepping, where the time step size is chosen by comparing the change of the solution of two subsequent time steps. The rest of the paper is organized as follows. In Section 2, the dimensionless Euler equations in conservative form are introduced. The discretization of Euler equations is presented in Section 3, including spatial discretization with the local Lax-Friedrichs (LLF) method and temporal discretization with the fully implicit method. In Section 4, we focus on the details of the NKS algorithm. Some numerical results are reported in Section 5, including the parallel performance of the proposed method. Finally, we end this paper with some concluding remarks in Section 6. 2. The compressible inviscid flows In conservative form, the compressible Euler equations [27,42] in x–y plane can be expressed as

@U @F @G þ þ ¼ S; @t @x @y 3

p ¼ ðc  1Þqe;

2

3

2

3

sffiffiffiffiffiffiffiffiffiffi qc L2c ; tc ¼ pc

rffiffiffiffiffi pc uc ¼ ;

qc

ec ¼

pc

qc

;

gc ¼

pc : Lc qc

Then, the dimensionless variables are denoted by a prime

v0 ¼

v ; v ¼ L; p; q; u; v ; e; t; g: vc

After substitution in (1) and (2) and deleting the primes, we see (1) and (2) remain the same. So (1) and (2) can just be seen as the dimensionless form. 3. Discretization for Euler equations 3.1. Spatial discretization To spatially discretize the Euler equations, a cell-centered finite volume method [27,42,45] is employed. Suppose the computational domain X is a rectangular domain ½0; L1   ½0; L2  covered by a uniform rectangular N 1  N 2 mesh. For i ¼ 1; 2; 3; . . . ; N 1 and j ¼ 1; 2; 3; . . . ; N 2 , each mesh cell Xij is centered at ðxi ; yj Þ with the uniform mesh size Dx  Dy. The cell average of U and S are defined as

U ij ¼

S ij ¼

1 DxDy 1 DxDy

ZZ

U dx dy; Xij

ZZ S dx dy  SðU ij Þ: Xij

Then, integrating (1) over Xij leads to the following semi-discrete system

@U ij F iþ12;j  F i12;j Gi;jþ12  Gi;j12 þ þ ¼ S ij ; @t Dx Dy where F i1=2;j and F iþ1=2;j are the averages of FðUðx; yÞÞ over the left and right boundaries of the mesh cell Xij , respectively, i.e.,

F i1;j ¼ 2

q qu qv 6 qu 7 6 qu2 þ p 7 6 quv 7 6 7 6 7 6 7 U¼6 7; F ¼ 6 7; G ¼ 6 7; 4 qv 5 4 quv 5 4 qv 2 þ p 5 qE uðqE þ pÞ v ðqE þ pÞ

ð2Þ

with c ¼ 1:4 for air. It’s worth mentioning that the unknowns in this paper are the conservative variable U rather than the original variables e; q; p; u and v, so the flux vectors F and G need to been expressed as functions of U. In order to reduce the round-off error, (1) and (2) should be non-dimensionalized. Let Lc ; pc and qc be the typical length, the typical pressure and the typical density respectively, and take them as units of the length, the pressure and the density. Then the units of velocity, time, energy per unit mass and acceleration can be obtained as

ð1Þ

with

2

where q is the density, p is the pressure, u and v are the velocity components in x and y-directions, respectively, E ¼ e þ ðu2 þ v 2 Þ=2 is the total energy per unit mass with e being the internal energy per unit mass, and S is the source term. We assume an ideal gas, and recall the ideal gas equation of state p ¼ qRT, from which it follows that

F iþ1;j ¼ 2

1 Dy 1 Dy

Z

y

Z y

y

jþ1 2

   F U xi1 ; y dy;

ð3Þ

   F U xiþ1 ; y dy:

ð4Þ

2

j1 2

y

jþ1 2

2

j1 2

Similarly, Gi;j1=2 and Gi;jþ1=2 are the averages of GðUðx; yÞÞ over the bottom and top boundaries of the mesh cell Xij , respectively. In order to approximate the fluxes F i1=2;j ; F iþ1=2;j ; Gi;j1=2 and Gi;jþ1=2 , one may apply various Riemann solvers. From Refs.

3

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

[7,12,27], we know that for the compressible inviscid flows involving shock, rarefaction or contact-wave, the local Lax-Friedrichs (LLF) method can obtain great results

1 2

Uðq ; qþ Þ ¼ ½Fðq Þ þ Fðqþ Þ  kðqþ  q Þ;

ð5Þ

pffiffiffiffiffiffiffiffiffiffiffi where k ¼ maxðju j þ c ; juþ j þ cþ Þ with c ¼ cp=q being the speed of sound, and the superscripts ‘‘” and ‘‘þ” denote the left and right (bottom and top) states at the cell wall, respectively. q and qþ are also called the reconstructed states of q at the cell wall, as shown in Fig. 1. 3.2. State reconstruction The use of appropriate limiters is crucial to guarantee the stability of the finite volume scheme, which is usually achieved by reconstructing the states at the cell wall. For comparisons, in the paper we use the following limiters to build the reconstructed states q and qþ :  Minmod: limiterða; bÞ ¼ Sða; bÞ  minfjaj; jbjg; 2jajjbj ;  Van Leer: limiterða; bÞ ¼ Sða; bÞ  jajþjbj

 MC: limiterða; bÞ ¼ Sða; bÞ  minfja þ bj=2; 2jaj; 2jbjg;  Superbee:

limiterða; bÞ ¼ S  maxfminf2jaj; jbjg; minfjaj; 2jbjgg; 

j-scheme: ( limiterða; bÞ ¼

1j a 2 1þj a 2

þ 1þ2 j b for q ; þ 12 j b for qþ ;

where Sða; bÞ ¼ ðsignðaÞ þ signðbÞÞ=2, and signðaÞ is a sign function:

8 > < 1 if x < 0; signðxÞ ¼ 0 if x ¼ 0; > : 1 if x > 0: By using the above limiters, we obtain the reconstructed states:

8 < qi1 ¼ qi1;j þ 12  limiterðqi1;j  qi2;j ; qi;j  qi1;j Þ; 2

: qþ 1 ¼ qi;j  12  limiterðqi;j  qi1;j ; qiþ1;j  qi;j Þ: i 2

It is worth mentioning that, for different test cases, an appropriate limiter should be adopted to ensure the stability and eliminate non-physical oscillations, which will be studied in detail in Section 5. We also remark that, for the j-scheme, j ¼ 0; 1=2 and 1=3 lead to the Fromm scheme [9], the QUICK scheme [26], and the QUICKEST scheme [26], respectively. In this work, we choose the QUICK scheme because of its high-order accuracy and low numerical dissipation.

3.3. Fully implicit method with adaptive time stepping After spatially discretizing (1), the obtained semi-discrete system can be written as

@ ðUðtÞÞ þ LðUðtÞÞ ¼ SðUðtÞÞ; @t

ð6Þ

where L is a spatially discretizing operator and SðUðtÞÞ is the source term. Although for each time step, the explicit method is superior to the implicit method in terms of computational complexity, the time steps that the explicit method needs to compute may be far more than the implicit method, especially when the mesh becomes finer. As we discussed before, discontinuous solutions may exist no matter whether the initial conditions are continuous or not for the simulations of compressible inviscid flows. So the mesh is required to be finer for sufficient accuracy, which leads to smaller time steps and much more total compute time for the explicit method. On the other hand, the implicit method offers much larger time steps and costs less total compute time, especially when the adaptive time stepping method is employed [46]. Hence, in this study, we apply a fully implicit method to integrate the Euler equations in time, and adopt the second order backward difference formula (BDF-2) [18,32]

1 ð3U n  4U n1 þ U n2 Þ þ LðU n Þ ¼ SðU n Þ; 2Dt n

ð7Þ

where U n denotes the solution at the n-th time step. Particularly, for the first time step, namely n ¼ 1, the first order implicit Euler scheme is employed

1 ðU n  U n1 Þ þ LðU n Þ ¼ SðU n Þ: Dt n

ð8Þ

When using inexact Newton method to solve the nonlinear system (7) or (8), the initial residual of Newton iterations implies the variation of the flow field between two consecutive time steps, because the initial value for each Newton iteration loop is set to the solution of last time step. If the residual error is large, which indicates the flow field changes fast, then the time step size should be reduced for sufficient accuracy, otherwise a larger time step size is needed to reduce the total compute time. For this purpose, we present an adaptive time stepping method, which updates the time step size according to the initial residual of Newton iterations. More specifically, we start with a relatively small time step size Dt0 , and update its value via Dtn ¼ minfan Dtn1 ; Dtmax g where Dtmax is the maximum allowable time step size to ensure the accuracy, and an is an amending coefficient

8 > < amax an ¼ 1=amax > : n

if n > amax ; if n < 1=amax ;

ð9Þ

otherwise; g

where n ¼ ðres =resn Þ with resn being the L2-norm of the initial residual of Newton iteration loop at n-th time step, amax is the maximum allowable scaling to avoid excessive change of the time step size, and g is a tunable exponent. Both of these two parameters amax and g can be used to control the adjustment of the time step size. n1

4. Newton-Krylov-Schwarz algorithm After discretizing the Euler Eqs. (1) in space and time, we obtain a nonlinear system

F ðU n Þ ¼ 0 Fig. 1. The reconstructed states of q on the cell Xij .

ð10Þ

at each time step. In the paper, the nonlinear system is solved with the inexact Newton method [20,21] where the corresponding

4

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

Jacobian system is approximately solved via a Krylov subspace iteration method [20,34], which is called Newton-Krylov method [22]. When the Jacobian matrix of the nonlinear system is ill-conditioned, an unacceptably large amount of Krylov iterations are demanded to satisfy the condition of convergence, and sometimes, the condition of convergence can not even be satisfied. To deal with this problem, a Schwarz preconditioner [5] could be applied to precondition the Jacobian system, which can accelerate the Krylov iterations, and this method is called Krylov-Schwarz method. Newton-Krylov method and Krylov-Schwarz method can be composed as Newton-Krylov-Schwarz (NKS) method. 4.1. Newton-Krylov method An inexact Newton method is employed to solved the nonlinear system (10), which updates the new approximate solution U nk by n

U nk ¼ U nk1 þ knk dk ;

for k ¼ 1; 2; . . .

n dk

where is the search direction obtained by a Krylov iteration method, and knk is the steplength determined by a line search method. In this study, the value of U n0 is set to the solution at last time step, i.e. U n0 ¼ U n1 . The iterative procedure will be repeated before the following convergence criteria is satisfied

n    o   n  F U  6 max g F U n  ; g ; r a k 0 2 2

ð11Þ

where gr and ga are relative and absolute tolerance, respectively. n For each Newton iteration, the search direction dk is obtained by approximately solving the following Jacobian system

  n J nk dk ¼ F U nk ;

ð12Þ

where J nk is the Jacobian matrix. Note that, the Jacobian matrix J nk is large and sparse for large computational mesh. While direct solution of the linear system (12) is expensive in terms of the total compute time and the memory consumption, iterative methods become more preferable. Here, we choose the Generalized Minimal RESidual (GMRES) method as a Krylov subspace algorithm for solving the linear system (12). The basic idea of this method is to transform the linear system (12) into a least square problem

min

v

     n n n  F U k þ J k dk;m  ;

dnk;m 2dn0 þKm ðJ; 0 Þ

2

for m ¼ 1; 2; . . .

points repeatedly so that the Jacobian matrix can be approximately calculated, i.e.,

J i;j ¼

F i ðU þ ej Þ  F i ðUÞ



ð15Þ

where  is a positive real number satisfying   1, and ej 2 RN is the vector with value 1 at the j-th position and value 0 at all other positions. 4.2. Restricted additive Schwarz preconditioner When the Jacobian matrix is ill-conditioned, the Krylov method requires an unacceptably large number of iterations to satisfy the convergence criteria (14). Therefore, a preconditioner needs to be used to accelerate the Krylov iterations. Here we use the restricted additive Schwarz (RAS) preconditioner [2,4,5,47,49]. And the preconditioned linear system reads

  n n J nk M 1 RAS M RAS dk ¼ F U k ;

ð16Þ

where M 1 RAS is the RAS preconditioner. In order to build the RAS preconditioner, a decomposition of the computational domain X needs to be introduced. We firstly decompose X to a series of non-overlapping subdomains Xi ; i ¼ 1; 2; . . . ; np where np is the number of subdomains (or processor cores). Then, the subdomains Xi are extended to Xdi by d layers of mesh cells where d is a overlapping parameter, as shown in Fig. 2. In each overlapping subdomain, we define a local Jacobian matrix J i by restricting the global Jacobian matrix J to the overlapping subdomain Xdi , i.e.,

 T J i ¼ Rdi J Rdi : Here, Rdi is a restriction operator, which restricts a global vector  T defined in X to a local vector defined in Xdi . Rdi , the transpose of Rdi , is a prolongation operator, which prolongates a local vector to a global vector by zeroing the elements outside Xdi . The global RAS preconditioner is defined as:

ð13Þ

where m denotes the m-th Krylov iteration, and Km ðJ; v 0 Þ is the Krylov subspace

n    2  m1 o Km J nk ; v 0 ¼ span v 0 ; J nk v 0 ; J nk v 0 ; . . . ; J nk v0 with

  n F U nk þ J nk dk;0  :  n n n  F U k þ J k dk;0 

v 0 ¼ 

2

The GMRES method is restarted after a number, say s, of iterations, with the solution at s-th iteration step being the new initial n n guess, i.e. dk;0 ¼ dk;s . The GMRES iterations (13) are stopped when the following convergence criteria is satisfied:

    n    o  n n n  n F U k þ J k dk;m  6 max er F U k 2 ; ea ; 2

ð14Þ

where er and ea is the relative and absolute tolerance, respectively. At each Newton step, when solving the linear system (12) by GMRES, the estimate of the Jacobian matrix J is needed. In the study, we use the multi-colored finite difference (MCFD) method [8] to calculate the matrix explicitly. In the MCFD method, the numerical differentiation procedure needs to evaluate F at certain

Fig. 2. Domain decomposition method. The dash lines stand for mesh lines, solid lines for non-overlapping decomposition, dark-colored areas for a non-overlapping subdomain Xi0 , and light-colored areas for the out-extended areas from Xi0 with d ¼ 1. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

M 1 RAS ¼

np  T X d R0i J 1 i Ri i¼1

R0i

where is defined similar to Rdi , which restricts a global vector to a local vector defined in the non-overlapping subdomain Xi . Actually, but their action on what we really need is not the inverse matrix J 1 i a vector, which could be either exactly calculated by a LU factorization [16] or obtained approximately by an incomplete LU (ILU) factorization [33]. 5. Numerical tests

 For the adaptive time step method, the maximum allowable scaling is set to amax ¼ 1:5 while the tunable exponent is g ¼ 1:1.  For Newton iterations, the relative and absolute tolerance are respectively gr ¼ 1:0  108 and ga ¼ 1:0  106 .  For

GMRES

ea ¼ 1:0  10

iterations,

we

use

er ¼ 1:0  108

and

11

as the relative and absolute tolerance, respectively, and the restarting parameter is fixed at s ¼ 90.  For the domain decomposition method, an overlapping parameter of d ¼ 2 is utilized, and the subdomain solvers are set to the sparse LU factorization. 5.1. Validation by several test cases Several previously published test cases are run in parallel environment to validate the proposed algorithms, including a 1D generic test case (Sod shock tube problem) [10,39], a 2D generic test case (an oblique shock reflection at a wall) [3,10,39], three 2D Riemann problems [6,23,25], Sedov blast wave [35,51,52] and Rayleigh-Taylor instability [28,29,36]. For all the test cases, the initial conditions are discontinuous, and some discontinuous solutions may arise, such as shock, rarefaction and contact-wave. At the first time step, the initial guess of Newton iterations is set to zero for all test cases except the second one, the oblique shock reflection, in which the initial guess is set to the initial conditions. 5.1.1. Sod shock tube problem To validate the discretization scheme and the fully implicit solver, we first run the 1D generic test case, Sod shock tube problem, by using different spatial discretization schemes. The initial conditions are set to:

 ðq; u; pÞ ¼

local Lax-Friedrichs method with minmod or Van Leer limiter are free from visible oscillations, while the local Lax-Friedrichs method reconstructed by other limiters produces non-physical oscillations near shocks. Depending on appropriate spatial discretization methods and appropriate limiters, the implicit method obtains credible results. Due to the consistent interpretation of mass (or energy) fluxes between interfaces, the total mass is rigorously conserved for the finite volume method. Denote the total discrete summation of a physical variable / as:

rð/Þ ¼

In this section, we report some results of numerical experiments. The proposed algorithms are implemented on the Portable, Extensible Toolkit for Scientific computation (PETSc) [1] library of Argonne National Laboratory. The numerical tests are carried on a TH-1HN supercomputer located at Hunan University. The computing nodes are interconnected via a THNI proprietary high-performance network, with two six-core 2.93 GHz Intel Xeon Westmere EP High-performance processors and 48 GB local memory in each node. Unless otherwise specified, all parallel computations are set with the following parameters.

ð1; 0; 1Þ

x 6 5;

ð0:125; 0; 0:1Þ x > 5;

where x 2 ð0; 10Þ. We compare the local Lax-Friedrichs method reconstructed by different limiters with a Godunov-type scheme [14,41]. In the test, we set the time step size to Dt ¼ 0:005 and run the tests on a 512 mesh. The plots of the velocity u at the simulation time t ¼ 2:0 are shown in Fig. 3. As Fig. 3 depicts, different discretization schemes show different performance on eliminating the non-physical oscillations. The Godunov-type method and the

5

N1 X N2 X /ij  DxDy; i¼1 j¼1

where N1 and N 2 are the number of mesh cells in x and y directions, and Dx and Dy are the mesh size in x and y directions, respectively. Then the numerical conservation error of the total mass at time t ¼ t m is defined as:

Error ¼

rðqðtm ÞÞ  rðqðt0 ÞÞ : rðqðt0 ÞÞ

Fig. 4 displays the numerical conservation error of the total mass for different spatial discretization methods. As shown in Fig. 4, for all methods, the conservation error of the total mass is within the range 2:0  1013 2:0  1013 , which is small enough to guarantee the conservation of mass. 5.1.2. An oblique shock reflection at a wall We further run the 2D generic test case, i.e., an oblique shock reflection at a wall [3,10,39], to verify the robustness of the fully implicit method, by using the local Lax-Friedrichs method with different limiters. For this test case, the boundary conditions along the left side of the computational domain X ¼ ½0; 4  ½0; 1 are given by q ¼ 1:0; u ¼ 2:9; v ¼ 0; e ¼ 5:990714; and the top side are given by q ¼ 1:7; u ¼ 2:619334; v ¼ 0:506320; e ¼ 5:805957. The reflective condition is imposed on the bottom side, while the outflow condition on the right side. We set the time step size to Dt ¼ 0:005 and run the tests on a 128  32 mesh. Fig. 5 presents the contour plots of density at simulation time t ¼ 3:0, obtained by using different limiters. It can be easily found that the resemblance between the simulated results and the published results in [3,10,39] is remarkable. The proposed method successfully resolves the shock interaction and reflection problem. We would like to point out that for this case, the discretization scheme reconstructed by superbee can not guarantee the stability, while for the following test cases, j-scheme is the only limiter which can ensure the stability. Hence, in the following numerical tests, we always use j-scheme to build the limiter for the spatial discretization. 5.1.3. 2D Riemann problems The third one is the class of 2D Riemann problems, which is solved on domain X ¼ ½0; 1  ½0; 1 with a open boundary condition. The computation domain is divided into four subdomains with piecewise constant initial conditions for each subdomain, as shown in Fig. 6. According to [6,23,25], there are 19 cases for 2D Riemann problems based on different initial conditions, from which we take two test cases to validate the proposed algorithms: Riemann 6 and Riemann 12. Their initial conditions are presented in Table 1. Let S stand for shock, R for rarefaction, J for contactwave. Then, these two test cases are characteristic by: Riemann 6: J 12 ; J 23 ; J 34 ; J 41 and Riemann 12: S12 ; J 23 ; J 34 ; S41 , where J 12 denotes that a discontinuous solution of contact-wave arises between the 1st subdomain and the 2nd subdomain, and the others are defined similarly. Fig. 7 presents the numerical results of Riemann 6 and Riemann 12 obtained on a 512  512 mesh with a fixed time step

6

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

Fig. 3. The numerical results of Sod shock tube problem.

Fig. 4. Conservation errors of total mass for Sod shock tube problem.

Dt ¼ 0:002. The initial conditions given by Riemann 6 lead to four contact-wave surfaces between four subdomains. As the simulation of Riemann 6 moves forward, the four surfaces are deformed, and a vortex arises at the center of the computation domain. Similarly, for Riemann 12, two shock surfaces and two contact-wave surfaces are introduced because of the initial conditions. The two shock surfaces March towards the top right corner followed by a pair of Mach stems, while the two contact-wave surfaces rest where they were. All of the results match well with the serial computing results of the references [6,23,25,29].

Fig. 5. The numerical results of oblique shock reflection.

5.1.4. Sedov blast wave In the fourth test case, Sedov blast wave contains shock-shock and shock-contact discontinuity interactions. It is a typical explosion problem that neglects gravitational and thermodynamic source terms, i.e. S ¼ ð0; 0; 0; 0ÞT . The computation domain

X ¼ ½0; 1  ½0; 1 is divided into two subdomains, one of which is X0 ¼ fðx; yÞjðx  0:4Þ2 þ ðy  0:4Þ2 0:32 g. The subdomain X0 is

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

7

The results are similar to [35,51,52] with the only difference of the center of shock front, which is caused by the different setting of the subdomain X0 .

Fig. 6. The initial conditions for 2D Riemann problems.

Table 1 Initial conditions for 2D Riemann problems. ‘‘T” denotes the final time of the simulation. All of the variables at this table are dimensionless.

Riemann 6

Riemann 12

Subdomain

p

q

u

v

T

1st 2nd 3rd 4th

1.0 1.0 1.0 1.0

1.0 2.0 1.0 3.0

0.75 0.75 0.75 0.75

0.5 0.5 0.5 0.5

0.3

1st 2nd 3rd 4th

0.4 1.0 1.0 1.0

0.5313 1.0 0.8 1.0

0 0.7276 0 0

0 0 0 0.7276

0.25





ð5; 5; 0; 0Þ ðx; yÞ 2 X0 ; ð1; 1; 0; 0Þ ðx; yÞ 2 X n X0 :

Boundary conditions on all four border lines are reflective. Fig. 8 displays the numerical simulations of Sedov blast wave at different time. All simulations are performed on a 512  512 mesh with a fixed time step Dt ¼ 0:002. It can be seen from Fig. 8 that  when t ¼ 0:05, the blast wave propagates with a spherically symmetric shape;  when t ¼ 0:1, the blast wave is reflected at the left and bottom boundaries;  when t ¼ 0:2, the front becomes much more complex and a rarefaction wave is propagating in opposite directions.

ðq2 ; u2 ; v 2 Þ ¼ ð2; 0; 0Þ; ðq1 ; u1 ; v 1 Þ ¼ ð1; 0; 0Þ;

and



filled with high-density high-pressure fluid while the other is filled with low-density low-pressure fluid. The initial conditions are set to

ðq; p; u; v Þ ¼

5.1.5. Rayleigh-Taylor instability In the gravitational field, when a heavier fluid is placed on top of a lighter fluid, the small perturbations of their interface will develop to unstable flow. This is the fifth test case, the so-called Rayleigh-Taylor instability. This problem is solved on X ¼ ½0; 1=3  ½0; 1, which is divided into two subdomains: the upper subdomain and the lower subdomain, with the interface being y0 ¼ 0:5 þ 0:01 cosð6pxÞ. The upper is filled with highdensity fluid, denoted by subscript ‘‘2”, and the lower is filled with low-density fluid, denoted by subscript ‘‘1”. The initial conditions are

p2 ¼ 0:01 þ q2 gð1  yÞ; p1 ¼ 0:01 þ q2 gð1  y0 Þ þ q1 gðy0  yÞ;

where the initial pressure is hydrostatic and g ¼ 0:1 is the dimensionless gravitational acceleration. Boundary conditions on all four border lines are reflecting. The gravitational source term is S ¼ ð0; 0; qg; qg v ÞT . Fig. 9 presents the numerical results of Rayleigh-Taylor instability carried on a 512  1536 mesh with a fixed time step Dt ¼ 0:1. As shown in Fig. 9, the instability of the interface grows under the combined action of the gravitational attractive force of the earth and the pressure of the fluids themselves. The upper heavier fluid invades into the lower lighter fluid from the middle while the lighter fluid moves upward from both sides. At t ¼ 8:5, the harmonic interface has turned to mushroom-shape. The instability grows as [28,29,36] with the difference caused by the different spatial discretization schemes. This difference exists also between [28,29,36], but all of them flow in the same way. 5.2. Implicit time steps and CFL numbers For the fully implicit solver, we check the robustness of the proposed method with respect to the time step size. Table 2 is the

Fig. 7. The numerical results of 2D Riemann problems.

8

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

Fig. 8. The numerical results of Sedov blast wave.

computed results of Sedov blast wave by using different time step sizes. All simulations are carried on a 2048  2048 mesh and ended at t ¼ 0:2. It can be observed from the results that when the time step size is increased, the average number of Newton iterations increases slowly while the average number of GMRES iterations increases rapidly, but the total compute time decreases since fewer time steps are required. As discussed before, the time step size can not be too large, otherwise the truncation errors will be increased greatly, which is the reason why we use the adaptive time step size method. We present in Table 3 the results of Riemann 12 and Sedov blast wave by using both the fixed time step method and the adaptive time step method. For Riemann 12, the adaptive time step method is initialized by Dt 0 ¼ 0:005 and adaptively controlled by using (9), while the fixed time step method is run with Dt ¼ 0:005 that is the maximum allowable fixed time step size for sufficient accuracy. For Sedov blast wave, the adaptive method is initialized by Dt0 ¼ 0:005, while the fixed method is run with Dt ¼ 0:01 that is also the maximum allowable fixed time step size. CFLmax ¼ maxfCFLi;j g is the maximum CFL number of the flow field where CFLi;j [19] is the CFL number at mesh point ðxi ; yj Þ



CFLi;j ¼ max

Dtðjui;j j þ ci;j Þ Dtðjv i;j j þ ci;j Þ ; : Dx Dy

Table 2 The numerical results by using different time step sizes for Sedov blast wave. ‘‘Steps” stands for the number of time steps, ‘‘Newton” for total Newton iterations, ‘‘GMRES” for total GMRES iterations.

Dt ð104 Þ

Steps

CFLmax

Newton/steps

GMRES/Newton

Compute time (s)

5 10 20 50 100

400 200 100 40 20

2.11 4.08 8.11 22.85 43.28

3.47 4.01 4.01 4.15 4.95

2.86 3.71 5.18 7.42 9.45

19837.40 11051.80 5549.06 2370.49 1412.58

Table 3 The comparisons between the adaptive method and the fixed method for Riemann 12 and Sedov blast wave. Riemann 12

Mesh Steps Averaged Dt CFLmax Newton GMRES Compute time (s)

Sedov

Adaptive Dt

Fixed Dt

Adaptive Dt

Fixed Dt

2048  2048 18 0.0139 44.29 80 934 1567.27

2048  2048 50 0.005 14.55 203 1491 3712.51

2048  2048 11 0.0182 108.43 53 679 787.76

2048  2048 20 0.01 43.28 99 936 1412.58

As shown in Table 3, the adaptive method can use a larger time step and costs less total compute time. What’s more, the CFL number of the adaptive method easily exceeds the CFL restriction. Fig. 10 displays (a) the number of Newton iterations, (b) the number of GMRES iterations and (c) the compute time at each time step for Riemann 12. Fig. 11 gives similar results for Sedov blast wave. Below we list several observations from Figs. 10, 11 and Table 3.

Fig. 9. The numerical results of Rayleigh-Taylor instability.

 By using the adaptive method, the number of Newton iterations and GMRES iterations as well as the compute time at each time step are all increased for Riemann 12.  For Sedov blast wave, the adaptive method do not necessarily increase the number of iterations and the compute time at each time step. But at most time steps, the number of GMRES iterations are increased.  For both Riemann 12 and Sedov blast wave, the adaptive method reduces the total number of Newton iterations and GMRES iterations as well as the total compute time since the number of time steps the adaptive method needs to compute are fewer.

9

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

6 5 4

80

Fixed Δt Adaptive Δt

70 60 50 40 30

3

0

0.05

0.1

0.15

0.2

0.25

20

Fixed Δt Adaptive Δt

120

Compute time (s)

Fixed Δt Adaptive Δt

GMRES iterations

Newton iterations

7

100 80 60

0

0.05

0.1

t

0.15

t

0.2

0.25

0

0.05

0.1

0.15

0.2

0.25

t

Fig. 10. The computed results at each time step for Riemann 12. The dash lines stand for the adaptive method, and the solid lines stand for the fixed method.

5.3. Impact of control parameter of MCFD method In order to study the impact of  in (15) and stopping condition (14) on the number of iterations, we run a 1D test case and a 2D case by using different ea ; er and , as shown in Tables 4 and 5. The tests are carried out by using the j-scheme with j ¼ 1=2. The 1D case is solved on a N ¼ 512 mesh and the time step size is fixed to Dt ¼ 0:005. The 2D case is solved on a 128  128 mesh and the time step size is fixed to Dt ¼ 0:005. From the tables, we observe that when ea and er are decreased, the number of GMRES iterations will increase. For both test cases, the algorithm breaks down because of ‘‘Not a Number” if a large  is adopted, such as

 ¼ 105 .

The difference caused by different  is more notable when smaller ea and er are used, but it is still in a very narrow rang. Therefore, we believe  has little effect on the number of GMRES iterations and the choice of ea and er as long as it is less than a specific value, say 106 . We remark that, for both test cases, the number of Newton iterations stays constant no matter how ea ; er and  change. In the following tests, we set ea ; er and  to 1011 ; 108 and 107 , respectively. 5.4. Parallel performance

The performance of the implicit method depends heavily on the Schwarz preconditioner. Therefore, in this subsection, we analyze in detail how the Schwarz preconditioner affects parallel performance, and focus on the influence of subdomain solvers and overlapping parameter. Another emphasis of this subsection is the scalabilities, including the strong scalability and the weak scalability. All computations are based on the third test case, RayleighTaylor instability. 5.4.1. Impact of subdomain solvers and overlapping parameter For the Schwarz preconditioner, subdomain solvers and overlapping parameter are both important parameters that greatly influence the effect of the preconditioner. Subdomain solvers directly determine the computational efficiency of the subdomain problems. A larger overlap that means more information exchange between subdomains leads to more total compute time. Therefore, choosing an appropriate subdomain solver and overlap are significant to reduce the total compute time. In the test, two kinds of subdomain solvers are studied, including the sparse LU factorization and the sparse ILU factorization with different levels of fill-in denoted by ILU(k), where k is the level of fill-in. Table 6 presents the computed results by using both kinds of subdomain solvers and different overlaps. All simulations are implemented on a 512  1536 mesh with a fixed time step Dt ¼ 0:1 and ended at t ¼ 2:0. From Table 6, we see that by using

the sparse LU factorization instead of ILU, the number of GMRES iterations can be reduced, but LU costs more computing time. For the sparse ILU factorization, when k is decreased, the number of GMRES iterations increases but the total compute time decreases. Particularly, when k is decreased to 1, the GMRES iterations diverges. Hence, in terms of total compute time, the best choice is ILU(2) with d ¼ 1. We would like to point out that the total number of Newton iterations is about 86 with a slight fluctuation, which is caused by different convergence criteria. In other words, the iteration process is terminated when either absolute criteria or relative criteria is satisfied.

5.4.2. Strong scalability In this subsection, we focus on strong scalability, which is defined by speedup ¼ T 1 =T 2 where T 1 and T 2 are the total compute time by using np;1 and np;2 processor cores (np;1 < np;2 ), respectively. To analyze strong scalability, we run the same test case while the number of processor cores is increased from 64 to 1024, as shown in Table 7. All simulations are performed on a 512  1536 mesh with a fixed time step size Dt ¼ 0:1 and ended at t ¼ 4:0. LU and ILU(2) are choosed as subdomain solvers. We can observe from Table 7 that as the number of processor cores increases, the number of Newton iterations is almost constant while the number of GMRES iterations slowly increases, but the total compute time decreases almost proportionately. Fig. 12 displays the total compute time and the speedup with respect to different numbers of processor cores. The speedup reaches up to 18.11 as the number of processor cores is increased from 64 to 1024, which is a superlinear speedup.

5.4.3. Weak scalability When the mesh density and the number of processor cores are increased proportionately, how the total compute time varies with the number of processor cores is called weak scalability. Table 8 presents the results of weak scalability tests, which are run with a fixed time step Dt ¼ 0:1 and ended at t ¼ 4:0. The sparse LU factorization is used as the subdomain solver. The test starts with a 128  384 mesh for np ¼ 16 and ends up with a 1024  3072 mesh for np ¼ 1024. In the ideal situation, the total compute time is a constant when the mesh density and the number of processor cores are increased. As shown in Table 8, the proposed method does not reach the ideal performance. For the RAS preconditioner, the calculation amount is sensitive to the mesh density and the number of subdomains. When they are increased simultaneously, the number of GMRES iterations, as well as the total compute time, increases greatly, which suggests the need of a two-level or multilevel Schwarz method.

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

110

Fixed Δt Adaptive Δt

7

GMRES iterations

Newton iterations

8

6 5 4 3

110

Fixed Δt Adaptive Δt

100

Compute time (s)

10

90 80 70 60 50

90 80 70 60

40 0

0.05

0.1

0.15

0.2

Fixed Δt Adaptive Δt

100

0

0.05

0.1

t

0.15

50

0.2

0

0.05

t

0.1

0.15

0.2

t

Fig. 11. The computed results at each time step for Sedov blast wave. The dash lines stand for the adaptive method, and the solid lines stand for the fixed method.

Table 4 The number of GMRES iterations for Sod shock tube problem by using different 8

 ¼ 105  ¼ 106  ¼ 107  ¼ 108  ¼ 109

9

ea ¼ 10 er ¼ 105

ea ¼ 10 er ¼ 106

ea ¼ 10 er ¼ 107

ea ¼ 1011 er ¼ 108

NaN

NaN

NaN

NaN

NaN

NaN

NaN

1251

1339

1524

1772

1885

2029

2079

1251

1339

1524

1772

1880

2011

2075

1251

1339

1524

1769

1880

2011

2073

1251

1339

1524

1769

1879

2011

2073

ea ¼ 1013 er ¼ 1010

ea ¼ 1014 er ¼ 1011

Table 5 The number of GMRES iterations for Sedov blast wave by using different

 ¼ 105  ¼ 106  ¼ 107  ¼ 108  ¼ 109

ea ; er and . ‘‘NaN” means ‘‘Not a Number”.

10

ea ¼ 1012 er ¼ 109

ea ¼ 1013 er ¼ 1010

ea ¼ 1014 er ¼ 1011

ea ; er and . ‘‘NaN” means ‘‘Not a Number”.

ea ¼ 108 er ¼ 105

ea ¼ 109 er ¼ 106

ea ¼ 1010 er ¼ 107

ea ¼ 1011 er ¼ 108

ea ¼ 1012 er ¼ 109

NaN

NaN

NaN

NaN

NaN

NaN

NaN

242

242

243

312

336

348

387

242

242

243

313

335

347

389

242

242

243

313

335

347

389

242

242

243

313

335

347

389

Table 6 The impact of subdomain solvers and overlapping parameter for Rayleigh-Taylor instability. d denotes overlapping parameter. ‘‘ ⁄ ” means divergence of GMRES. GMRES/Newton

d

ILU(1) ILU(2) ILU(3) ILU(4) ILU(5) LU

Compute time/Newton (s)

1

2

3

4

5

1

2

3

4

5

⁄ 35.30 31.03 29.07 29.97 19.28

⁄ 33.23 26.21 23.55 24.98 14.13

⁄ 33.49 25.24 22.71 23.86 13.09

⁄ 33.38 24.35 22.00 23.00 12.96

⁄ 32.43 23.69 21.61 22.53 12.45

⁄ 0.76 0.86 0.98 1.16 1.56

⁄ 0.80 0.85 0.96 1.15 1.74

⁄ 0.82 0.88 1.00 1.19 2.42

⁄ 0.86 0.90 1.04 1.25 2.22

⁄ 0.88 0.94 1.09 1.32 3.19

Table 7 Strong scalability for Rayleigh-Taylor instability. ‘‘np ” denotes the number of processor cores. np

64 128 256 512 1024

Newton/steps

GMRES/Newton

Compute time (s)

ILU(2)

LU

ILU(2)

LU

ILU(2)

LU

4.73 4.73 4.73 4.75 4.75

4.73 4.73 4.75 4.75 4.75

62.68 63.32 64.59 65.58 68.25

19.86 22.08 24.11 30.75 34.02

3210.20 1638.01 899.12 480.49 310.45

12031.20 4483.56 1960.32 1003.47 664.42

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

11

Fig. 12. Strong scalability for Rayleigh-Taylor instability.

Table 8 Weak scalability for Rayleigh-Taylor instability. ‘‘np ” denotes the number of processor cores. np

16

64

256

1024

Mesh Newton GMRES/Newton Compute time (s)

128  384 131 8.78 153.37

256  768 147 12.78 195.91

512  1536 170 19.80 328.43

1024  3072 190 33.49 655.03

6. Concluding remarks In this paper, a parallel, fully implicit method has been developed for solving the compressible Euler equations. To spatially discretize the Euler equations, a cell-centered finite volume scheme based on the LLF method is employed. We then integrate the Euler equations in time with a fully implicit BDF-2 method. The nonlinear system arising at each time step is solved by a Newton-Krylov algorithm preconditioned by the restricted additive Schwarz preconditioner. In order to reduce the total compute time, an adaptive time stepping method is proposed to adjust the time step size according to the initial residual of Newton iterations. We numerically compared the adaptive time stepping method with the fixed time step size method. The numerical experiments indicate that the adaptive method outperform the fixed method by several times in terms of the total compute time. Meanwhile, the proposed method exhibits a good scalability with up to 1024 processor cores. We would like to point out that it is important to look at other higher order spatial discretization, especially the compact schemes [10,11,30,37,39]. Compared to the general second order methods such as the LLF scheme, the main advantage of the compact schemes is their capability to achieve arbitrarily high order formal accuracy in smooth regions while maintaining stable, sharp discontinuity transitions, and non-oscillatory. Ref. [11] combines the compact schemes with finite volume formulation, which presents a class of arbitrary high order finite volume scheme. In [10,30,39], the ideas of high order accurate weighted essentially nonoscillatory (WENO) and upwind schemes are introduced into the compact schemes, which can obtain more accurate results and lead to substantial reduction in the total computing time. These contributions greatly expand the scope of application of the finite volume method. Although compact schemes have certain advantages, the focus of the paper is on the fully-implicit NewtonKrylov-Schwarz solver, rather than the finite volume scheme. Hence, we choose the class of second order methods, like the LLF

scheme, to illustrate the parallel performance of the proposed method. And we will extend the research to the case of other high-order schemes, such as the compact schemes, in the future. Acknowledgements The authors would like to express their appreciations to the anonymous reviewers for the invaluable comments that have greatly improved the quality of the manuscript. References [1] Balay S, Brown J, Buschelman K, Eijkhout V, Gropp WD, et al. PETSc users manual. Argonne National Laboratory; 2015. [2] Bjorstad P, Gropp W. Domain decomposition: parallel multilevel methods for elliptic partial differential equations. Cambridge University Press; 2004. [3] Boonmarlert P, Phongthanapanich S, Dechaumphai P. Combined characteristicbased split algorithm and mesh adaptation technique for high-speed compressible flow analysis. Indian J Eng Mater Sci 2005;12:376–88. [4] Cai XC, Gropp WD, Keyes DE, Melvin RG, Young DP. Parallel Newton-KrylovSchwarz algorithms for the transonic full potential equation. SIAM J Sci Comput 1998;19:246–65. [5] Cai XC, Sarkis M. A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J Sci Comput 1999;21:792–7. [6] Chang T, Chen G, Yang S. On the 2-D Riemann problem for the compressible Euler equations. I. Interaction of shocks and rarefaction waves. Discr Contin Dynam Syst 1995;1:555–84. [7] Cockburn Bernardo, Shu C. The Runge–Kutta discontinuous Galerkin method for conservation laws V: Multidimensional systems. J Comput Phys 1998;141:199–224. [8] Coleman TF, Mor EJJ. Estimation of sparse Jacobian matrices and graph coloring blems. SIAM J Numer Anal 1983;20:187–209. [9] Fromm JE. A method for reducing dispersion in convective difference schemes. J Comput Phys 1968;3:176–89. [10] Fu H, Wang Z, Yan Y, Liu C. Modified weighted compact scheme with global weights for shock capturing. Comput Fluids 2014;96:165–76. [11] Gaitonde D, Shang JS. Optimized compact-difference-based finite-volume schemes for linear wave phenomena. J Comput Phys 1997;138:617–43. [12] Giraldo FX, Restelli M. A study of spectral element and discontinuous Galerkin methods for mesoscale atmospheric modeling: equation sets and test cases. J Comput Phys 2007;227:3849–77. [13] Gropp W, Keyes D, Mcinnes LC, Tidriri MD. Globalized Newton-KrylovSchwarz algorithms and software for parallel implicit CFD. Int J High Perform Comput Appl 2000;14:102–36. [14] Harten A, Lax PD, Leer BV. On upstream differencing and Godunov-type schemes for hyperbolic conservation laws. SIAM Rev 1983;25:35–61. [15] Herbin R, Kheriji W, Latch EJ. On some implicit and semi-implicit staggered schemes for the shallow water and Euler equations. ESAIM: Math Modell Numer Anal 2014;48:1807–57. [16] Hildebrand FB. Introduction to numerical analysis. Courier Corporation; 1987. [17] Huang C, Hwang F. Parallel pseudo-transient Newton-Krylov-Schwarz continuation algorithms for bifurcation analysis of incompressible sudden expansion flows. Appl Numer Math 2010;60:738–51. [18] Hundsdorfer WH. Partially implicit BDF2 blends for convection dominated flows. Stichting Mathematisch Centrum; 1998.

12

Y. Liu et al. / Computers and Structures 176 (2016) 1–12

[19] Kalita HM, Sarma AK. Efficiency and performances of finite difference schemes in the solution of saint Venant’s equation. Int J Civ Struct Eng 2012;3:950–8. [20] Kelley CT. Iterative methods for linear and nonlinear equations. SIAM; 1995. [21] Kelley CT. Iterative methods for optimization. SIAM; 1999. [22] Knoll DA, Keyes DE. Jacobian-free Newton–Krylov methods: a survey of approaches and applications. J Comput Phys 2004;193:357–97. [23] Kurganov A, Tadmor E. Solution of two-dimensional Riemann problems for gas dynamics without Riemann problem solvers. Numer Methods Part Differ Eqs 2002;18:584–608. [24] Kwizak M, Robert AEJ. A semi-implicit scheme for grid point atmospheric models of the primitive equations. Mon Weather Rev 1971;99:32–6. [25] Lax PD, Liu X. Solution of two-dimensional Riemann problems of gas dynamics by positive schemes. SIAM J Sci Comput 1998;19:319–40. [26] Leonard BP. A stable and accurate convective modelling procedure based on quadratic upstream interpolation. Comput Methods Appl Mech Eng 1979;19: 59–98. [27] LeVeque RJ. Finite volume methods for hyperbolic problems. Cambridge University Press; 2002. [28] Li X, Jin B, Glimm J. Numerical study for the three-dimensional Rayleigh– Taylor instability through the TVD/AC scheme and parallel computation. J Comput Phys 1996;126:343–55. [29] Liska R, Wendroff B. Comparison of several difference schemes on 1D and 2D test problems for the Euler equations. SIAM J Sci Comput 2003;25:995–1017. [30] Liu C, Lu P, Oliveira M, Xie P. Modified upwinding compact scheme for shock and shock boundary layer interaction. Commun Comput Phys 2012;11: 1022–42. [31] Munteanu M, Pavarino LF, Scacchi S. A scalable Newton-Krylov-Schwarz method for the Bidomain reaction-diffusion system. SIAM J Sci Comput 2009;31:3861–83. [32] Rosam J, Jimack PK, Mullis A. A fully implicit, fully adaptive time and space discretization method for phase-field simulation of binary alloy solidification. J Comput Phys 2007;225:1271–87. [33] Saad Y. ILUT: a dual threshold incomplete LU factorization. Numer Linear Algebra Appl 1994;1:387–402. [34] Saad Y. Iterative methods for sparse linear systems. SIAM: Society for Industrial and Applied Mathematics; 2003. [35] Sedov LI. Similarity and dimensional methods in mechanics. Chemical Rubber Company Press; 1993. [36] Shadloo M, Zainali A, Yildiz M. Simulation of single mode Rayleigh-Taylor instability by SPH method. Comput Mech 2013;51:699–715.

[37] Shang JS. High-order compact-difference schemes for time-dependent Maxwell equations. J Comput Phys 1999;153:312–33. [38] Steiner SM, Noelle S. Adaptive timestep control for weakly instationary solutions of the Euler equations. SIAM J Sci Comput 2009;32:1617–51. [39] Stipcich G, Fu H, Liu C. High-order mixed weighted compact and non-compact scheme for shock and small length scale interaction. Int J Comput Math 2013;90:376–407. [40] Tang H, Warnecke G. A Runge–Kutta discontinuous Galerkin method for the Euler equations. Comput Fluids 2005;34:375–98. [41] Toro EF. Riemann solvers and numerical methods for fluid dynamics: a practical introduction. Springer Science & Business Media; 2013. [42] Versteeg HK, Malalasekera W. An introduction to computational fluid dynamics: the finite volume method. Pearson Education; 2007. [43] Voke PR, Gao S, Leslie D. Large-eddy simulations of plane impinging jets. Int J Numer Methods Eng 1995;38:489–507. [44] Wagner C, Friedrich R. Direct numerical simulation of turbulent flow in a sudden pipe expansion. Appl Direct Large Eddy Simulat Transit Turbul 1994;551:1–11. [45] Wesseling P. Principles of computational fluid dynamics. Springer Science & Business Media; 2009. [46] Yang C, Cai XC. A scalable fully implicit compressible Euler solver for mesoscale nonhydrostatic simulation of atmospheric flows. SIAM J Sci Comput 2014;36:S23–47. [47] Yang H, Cai XC. Parallel fully implicit two-grid methods for distributed control of unsteady incompressible flows. Int J Numer Methods Fluids 2013;71:1–21. [48] Yang H, Hwang FN, Cai XC. Nonlinear preconditioning techniques for fullspace Lagrange-Newton solution of PDE-constrained optimization problems. SIAM J Sci Comput 2016 [in press]. [49] Yang H, Prudencio E, Cai XC. Fully implicit Lagrange-Newton-Krylov-Schwarz algorithms for boundary control of unsteady incompressible flows. Int J Numer Methods Eng 2012;91:644–65. [50] Yang H, Yang C, Sun S. Active-set reduced-space methods with nonlinear elimination for two-phase flow problems in porous media. SIAM J Sci Comput 2016;38:B593–618. [51] Zhang X, Shu C. Positivity-preserving high order discontinuous Galerkin schemes for compressible Euler equations with source terms. J Comput Phys 2011;230:1238–48. [52] Zhang X, Shu C. Positivity-preserving high order finite difference WENO schemes for compressible Euler equations. J Comput Phys 2012;231:2245–58.