Numerical aspects and implementation of a two-layer zonal wall model for LES of compressible turbulent flows on unstructured meshes


George Ilhwan Park *, Parviz Moin

Center for Turbulence Research, Stanford University, 488 Escondido Mall, Stanford, CA 94305-4035, USA

Article history: Received 19 June 2015; Received in revised form 4 November 2015; Accepted 5 November 2015; Available online 10 November 2015.

Keywords: LES; Compressible flows; Wall model; Two-layer zonal model; Non-equilibrium model; Unstructured mesh

Abstract

This paper focuses on numerical and practical aspects associated with a parallel implementation of a two-layer zonal wall model for large-eddy simulation (LES) of compressible wall-bounded turbulent flows on unstructured meshes. A zonal wall model based on the solution of unsteady three-dimensional Reynolds-averaged Navier–Stokes (RANS) equations on a separate near-wall grid is implemented in an unstructured, cell-centered finite-volume LES solver. The main challenge in its implementation is to couple two parallel, unstructured flow solvers for efficient boundary data communication and simultaneous time integration. A coupling strategy with good load balancing and low processor underutilization is identified. Face mapping and interpolation procedures at the coupling interface are explained in detail. The method of manufactured solutions is used to verify the correct implementation of the solver coupling, and the parallel performance of the combined wall-modeled LES (WMLES) solver is investigated. The method has been applied successfully to several attached and separated flows, including a transitional flow over a flat plate and a separated flow over an airfoil at an angle of attack.

1. Introduction

Large-eddy simulation (LES) is widely used for the study of turbulent flows, and the method has demonstrated its predictive capability over the last two decades. In LES, the low-pass filtered Navier–Stokes equations are solved for the resolved flow motions, while the smaller subgrid scales (SGS) and their interactions with the resolved scales are modeled. It is often argued that the geometry-dependent large-scale eddies, which contain most of the turbulent energy, should be treated directly, whereas the small scales, which tend to be more universal and isotropic, are more amenable to phenomenological modeling. One of the key elements of LES is the use of dynamic SGS closure models, whose coefficients are computed from information extracted from the solution itself, without having to be tuned a priori [1–3]. The ability to predict turbulent flows accurately without tuning renders LES a predictive methodology, as opposed to traditional Reynolds-averaged Navier–Stokes (RANS) techniques, which require flow-dependent tuning of model parameters. Despite these favorable features, however, LES has in fact been used sparingly for practical engineering applications owing to its prohibitive resolution requirements. The basic premise of LES, that the energy-containing and dynamically important eddies are resolved by the underlying grid, is hard to meet in the near-wall region.

* Corresponding author. E-mail addresses: [email protected] (G.I. Park), [email protected] (P. Moin).



Fig. 1. Sketch of the wall-modeling procedure. The wall shear stress (τ_w) and heat flux (q_w) required to advance the highly under-resolved LES are obtained from the solution of the full 3D RANS equations defined on a separate near-wall grid. The wall model is driven by the LES data imposed on its top boundary, and the no-slip condition is applied at the wall.

This is because the size of the near-wall "large" eddies becomes very small compared to the size of the large-scale eddies in the outer portion of the boundary layer. This scale separation becomes more pronounced at high Reynolds numbers, and the cost of wall-resolved LES (WRLES) approaches that of direct numerical simulation (DNS), requiring $O(10^6)$–$O(10^9)$ CPU core hours for a single LES calculation of a practical aerodynamic flow. This restricts the routine use of LES in industry, where a short turnaround time is needed for parametric studies in the design/optimization cycle. As a result, the vast majority of industrial computational fluid dynamics (CFD) analyses still rely on low-fidelity RANS calculations. NASA's recent CFD Vision report forecasts that this situation will persist until the notional year 2030 [4].

It is therefore clear that application of LES at realistic Reynolds numbers requires a less costly approach to account for the effect of the inner-layer dynamics. In such an approach, one computes the outer layer with a coarse LES, while modeling the momentum and heat transfer from the inner layer to the outer layer. This is the concept of wall-modeled LES (WMLES). A recent analysis by Choi and Moin [5] shows that $N \sim Re_L$ in WMLES and $N \sim Re_L^{1.9}$ in WRLES, where $N$ is the number of grid points and $Re_L$ is the chord Reynolds number of a wing. They also showed that $N$ for WMLES is one to three orders of magnitude smaller than that for WRLES when computing an attached flow over a finite-aspect-ratio wing at different Reynolds numbers.

Several wall-modeling approaches have been introduced in the LES literature. In one category, called wall-flux modeling, the usual no-slip and thermal wall boundary conditions (BC) are replaced with stress and heat-flux boundary conditions provided by the wall model. A wall model in this category can be thought of as a black box that takes the LES solution at some location in the inner layer as input and returns the wall fluxes needed by the base LES solver as output. The assumption underpinning wall-flux modeling is that the imposition of correct Neumann wall boundary conditions is necessary to obtain an accurate solution on a very coarse LES grid. These models have evolved with increasing model-form complexity to incorporate more physics. Algebraic wall models assume simple relations between the wall gradients and the LES data at the first off-wall cell [6–10]. Usually the law of the wall, which holds only in attached boundary layers, is enforced directly, and its use is limited to very simple geometries. Two-layer zonal wall models, originally proposed in [11], aim to extend the applicability of wall models to complex domains by incorporating more physics. Two-layer models solve simplified or full flow equations on a separate near-wall grid, which is refined only in the wall-normal direction. A RANS parameterization is often assumed in the wall-model equations, because only the ensemble effect of near-wall turbulence can be represented on a wall-model mesh with very coarse wall-parallel resolution. Depending on the level of approximation, one can account for various non-equilibrium effects (nonlinear advection and pressure gradient) by solving the unsteady three-dimensional RANS equations [12–15], or retain only the wall-normal diffusion to obtain a system of one-dimensional ordinary differential equations [16–18].
This paper details an implementation of a non-equilibrium two-layer zonal wall model in a parallel unstructured LES solver. The wall model solves the unsteady three-dimensional RANS equations on a separate near-wall mesh to provide stress and heat-flux boundary conditions to the LES momentum and total-energy equations, respectively (see Fig. 1). The complexity of both the wall model and the base flow solver imposes significant implementation overhead. We illustrate the challenges and strategies in the implementation in terms of processor management on parallel architectures, LES/wall-model coupling at the equation and code levels, verification of the coupling, and parallel performance. To the authors' knowledge, this is the first implementation of a PDE-based two-layer wall model in an unstructured, parallel LES solver. This is of practical importance, since zonal models designed and tested in structured mesh solvers find limited utility in engineering applications. The present wall model can be used for predicting high Reynolds number flows involving complex geometries where non-equilibrium effects are important. Validation of the combined WMLES solver is not provided in this paper; the reader is referred to our recent publications [15,19,20], in which the present work is validated in attached boundary layers at a wide range of Reynolds numbers and in separating flows over an airfoil and streamwise-periodic hills.

The paper is organized as follows. In Sec. 2, the governing equations used in the LES and the wall model are summarized. In Sec. 3, the spatial and temporal discretizations of the governing equations in the unstructured finite-volume solvers are given. In Sec. 4, the LES/wall-model coupling method is detailed in terms of parallel process management, face mapping, and interpolation at the interface region, and simple tests for verifying correct coupling are proposed. Some practical aspects of the present WMLES implementation and its use are discussed in Sec. 5, followed by conclusions in Sec. 6.


2. Governing equations

The working equation sets for the LES and the wall model are the compressible filtered and Reynolds-averaged Navier–Stokes equations, respectively. In conservative form, these equations are

$$\frac{\partial \rho}{\partial t} + \frac{\partial \rho u_j}{\partial x_j} = 0, \tag{1}$$

$$\frac{\partial \rho u_i}{\partial t} + \frac{\partial \rho u_i u_j}{\partial x_j} + \frac{\partial p}{\partial x_i} = \frac{\partial \tau_{ij}}{\partial x_j}, \tag{2}$$

$$\frac{\partial \rho E}{\partial t} + \frac{\partial (\rho E + p)u_j}{\partial x_j} = \frac{\partial \tau_{ij} u_i}{\partial x_j} - \frac{\partial q_j}{\partial x_j}, \tag{3}$$

where $\rho$, $u_i$, $p$, and $E = p/[\rho(\gamma-1)] + u_k u_k/2$ denote density, velocity, pressure, and total energy, respectively. The variables are either spatially filtered or ensemble-averaged quantities, depending on whether the LES or the RANS equations are used. A perfect gas relation $p = \rho R T$ is assumed for the equation of state, where $R$ is the gas constant and $T$ is the temperature. The stress tensor $\tau_{ij}$ and heat flux vector $q_j$ are

$$\tau_{ij} = 2(\mu + \mu_t)\, S^d_{ij}, \qquad q_j = -(\lambda + \lambda_t)\frac{\partial T}{\partial x_j}, \tag{4}$$

where $\mu$ is the molecular viscosity ($\mu/\mu_0 = (T/T_0)^{0.76}$), $\lambda$ is the molecular thermal conductivity, and $S^d_{ij} = S_{ij} - \delta_{ij} S_{kk}/3$ is the deviatoric part of the rate-of-strain tensor $S_{ij} = \frac{1}{2}(\partial_j u_i + \partial_i u_j)$. The molecular Prandtl number $Pr = \mu c_p/\lambda$ and the ratio of specific heats $\gamma = c_p/c_v$ are kept constant at 0.7 and 1.4, respectively. $\mu_t$ and $\lambda_t$ are the turbulent eddy viscosity and conductivity, respectively. In the LES, they are modeled with the dynamic Smagorinsky model [2,3]. In the wall model, a simple mixing-length parameterization is used, with a dynamic correction proposed in [15].
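For concreteness, the following is a minimal sketch of a mixing-length eddy viscosity of the kind used in the inner RANS layer, assuming the classic van Driest-damped form with a constant coefficient; the dynamic correction of the coefficient described in [15] is omitted, and all function and variable names are illustrative rather than taken from the actual solver.

```cpp
#include <cmath>

// Mixing-length eddy viscosity sketch for the wall-model RANS layer:
//   mu_t = kappa * rho * u_tau * y * [1 - exp(-y+/A+)]^2,
// with van Driest damping. The dynamic correction of kappa in [15] is
// omitted. lambda_t would follow from a turbulent Prandtl number,
// e.g. lambda_t = cp * mu_t / Pr_t.
double mixingLengthMut(double rho,      // local density
                       double mu,       // molecular viscosity
                       double y,        // wall distance of the cell center
                       double tauWall)  // local wall shear stress magnitude
{
  const double kappa = 0.41;   // von Karman constant (assumed value)
  const double Aplus = 17.0;   // van Driest damping coefficient (assumed)
  const double utau  = std::sqrt(std::fabs(tauWall) / rho);
  const double yplus = rho * utau * y / mu;
  const double damp  = 1.0 - std::exp(-yplus / Aplus);
  return kappa * rho * utau * y * damp * damp;
}
```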


3. Discretization

3.1. Solver generic information

The wall-model module is added to Charles, a cell-centered unstructured finite-volume compressible LES solver provided by Cascade Technologies, Inc. Charles has been used for high-fidelity LES in complex configurations; examples include the prediction of supersonic jet noise from a nozzle with chevrons [21], a three-element high-lift system [16], and supersonic combustion in the HIFiRE scramjet [22]. The code is written entirely in C++ and uses domain decomposition and the Message Passing Interface (MPI) for parallelism. The base code is modified to enable concurrent simulation of LES and RANS, as well as communication between the two solvers. To minimize the implementation effort, the main routines in the wall-model solver are largely taken from an existing cell-centered unstructured finite-volume compressible RANS solver (Joe) developed at the Center for Turbulence Research at Stanford University [23]. Joe and Charles are derived from the same C++ class, which builds the low-level infrastructure for unstructured finite-volume discretization, such as mesh-connectivity arrays and interprocessor communication. The features shared by the two solvers simplify the implementation of the coupling routines discussed in the following section.

3.2. Discretization of LES equations

For a collocated, cell-centered finite-volume discretization, consider the integral form of the conservation laws written for a typical control volume (CV), icv0:

$$\frac{d}{dt}\int_{icv0} \tilde{q}\, dV = -\oint_{\partial icv0} \tilde{F}\, dA + \oint_{\partial icv0} \tilde{D}\, dA, \tag{5}$$

where $\tilde{q} = [\rho,\ \rho u_i,\ \rho E]^T$ is the state vector of conserved flow variables, $\tilde{F} = [\rho u_n,\ \rho u_i u_n + p n_i,\ (\rho E + p) u_n]^T$ is the advective (Euler) flux vector, and $\tilde{D} = [0,\ 2(\mu+\mu_t) S^d_{ij} n_j,\ (\lambda+\lambda_t)\frac{\partial T}{\partial x_j} n_j + 2\mu u_i S^d_{ij} n_j]^T$ is the diffusive flux vector. The volume and surface integrals are approximated with the second-order midpoint quadrature rule. The semi-discretized equations for icv0 then become

$$\frac{dq_{icv0}}{dt} = \frac{-1}{V_{icv0}} \sum_{f \in \partial icv0} (F - D)\, A_f, \tag{6}$$


Fig. 2. Definitions of vectors and vertices for the left reconstruction ($\phi_l$) at the face $f$ shared by the control volumes icv0 and icv1. Here, fc denotes the face center, and sc denotes the midpoint of the line connecting icv0 and icv1. Vectors are defined as $dx = x_{fc} - x_{icv0}$ and $s_f = (x_{icv1} - x_{icv0})/\Delta s$, where $\Delta s = \|x_{icv1} - x_{icv0}\|_2$. For the right reconstruction ($\phi_r$), the vectors emanate from icv1.

where $V_{icv0}$ is the volume of icv0 and $A_f$ is the area of face $f$. In the above equation, the solution vector ($q_{icv0} \in \mathbb{R}^5$) and the numerical flux vectors ($F, D \in \mathbb{R}^5$) are defined at the cell centroid and at the face centers of the bounding faces of icv0, respectively. For each bounding face $f$ of icv0, we denote the associated neighboring CV by icv1. In the LES, an explicit third-order Runge–Kutta (RK3) scheme is used for the time advancement. The remaining task in each substage of the RK3 advancement is then the evaluation of the right-hand side of Eq. (6), which requires flux vectors built from flow variables (and their gradients) reconstructed at the face centers. Consider an arbitrary CV-based flow variable $\phi$. The gradient of $\phi$ at the cell center of icv0, $(\nabla\phi)_{icv0}$, is calculated using the Green–Gauss second-order reconstruction, which involves $\phi$ at all immediate CV neighbors of icv0. Once the CV gradient is obtained, $(\nabla\phi)_{fc}$ at the face center is reconstructed by solving the following linear system:

$$(\nabla\phi)_{fc} \cdot s_f = \frac{\phi_{icv1} - \phi_{icv0}}{\Delta s}, \tag{7}$$

$$(\nabla\phi)_{fc} \cdot t_1 = \frac{1}{2}\left((\nabla\phi)_{icv0} + (\nabla\phi)_{icv1}\right) \cdot t_1, \tag{8}$$

$$(\nabla\phi)_{fc} \cdot t_2 = \frac{1}{2}\left((\nabla\phi)_{icv0} + (\nabla\phi)_{icv1}\right) \cdot t_2, \tag{9}$$

where icv0 and icv1 are the CVs associated with the face, $\Delta s = \|x_{icv1} - x_{icv0}\|_2$, and $s_f = (x_{icv1} - x_{icv0})/\Delta s$. The unit vectors $\{s_f, t_1, t_2\}$ form a right-handed Cartesian coordinate system. This produces a second-order approximation of $(\nabla\phi)_{fc}$, provided that $\|x_{fc} - x_{sc}\|_2 \ll \Delta s$ (see Fig. 2). The above linear system can be inverted analytically to express $(\nabla\phi)_{fc}$ in terms of $s_f$, $\phi_{icv0}$, $\phi_{icv1}$, $(\nabla\phi)_{icv0}$, $(\nabla\phi)_{icv1}$, and $\Delta s$ (see Appendix A). On Dirichlet boundary faces where $\phi = \phi_{bc}$ is to be enforced, $\phi_{icv1}$ and $(\nabla\phi)_{icv1}$ in the above equations are replaced with $\phi_{bc}$ and $(\nabla\phi)_{icv0}$, respectively. This effectively reduces the boundary gradient reconstruction to a first-order one-sided scheme.

The default face-value reconstruction scheme for the primitive variables ($\rho$, $u_i$, and $p$) produces two states biased toward the associated left (icv0) and right (icv1) CVs [24]:

$$\phi_l = \phi_{icv0} + a_l(\phi_{icv1} - \phi_{icv0}) + (\nabla\phi)_{icv0} \cdot b_l, \tag{10}$$

$$\phi_r = \phi_{icv1} + a_r(\phi_{icv0} - \phi_{icv1}) + (\nabla\phi)_{icv1} \cdot b_r, \tag{11}$$

where $a_l = \frac{1}{3}\frac{dx \cdot s_f}{\Delta s}$, $b_l = \frac{2}{3}dx + \frac{1}{3}\left(dx - (dx \cdot s_f)s_f\right)$, and $dx = x_{fc} - x_{icv0}$ (see Fig. 2). $a_r$ and $b_r$ are defined similarly by switching the roles of icv0 and icv1. It can be shown formally that the above reconstruction scheme is second-order accurate on general unstructured grids, and that it reduces to the third-order quadratic upwind (QUICK) scheme on a uniform Cartesian grid. When shocks are present in the flow, the solver automatically detects the faces near the shock by examining the relative smoothness of the solution; $\phi_l$ and $\phi_r$ on those faces are then computed using a third-order ENO scheme for shock capturing.

Finally, the flux vectors are evaluated using the values ($\phi_l$ and $\phi_r$) and gradients ($(\nabla\phi)_{fc}$) reconstructed at the face center. The advective flux vector $F = [\rho u_n,\ \rho u_i u_n + p n_i,\ (\rho E + p)u_n]^T$ is calculated by blending the dissipative HLLC upwind flux with a non-dissipative central flux [24]:

$$F = (1-\alpha)\, F_{central} + \alpha\, F_{HLLC}, \qquad 0 \le \alpha \le 1, \tag{12}$$

where $F_{central}$ is obtained by combining $\phi_l$ and $\phi_r$ with equal weights, and $F_{HLLC}$ is obtained with the HLLC approximate Riemann solver [25], which solves a Riemann problem at the face with the left and right reconstructed states. The central flux is formally second-order accurate on general unstructured grids, but it recovers a fourth-order accurate central scheme on a uniform Cartesian grid. The blending parameter $\alpha$ introduces the upwind flux only in regions with poor grid quality (e.g., hanging nodes, grid transitions) for numerical stability. $\alpha$ is calculated during the pre-processing stage such that it is proportional to the row norm of $C + C^T$, where $C$ is the global advection matrix obtained by semi-discretizing the linear scalar advection equation on the given grid [24]. This is motivated by the fact that the scheme is energy-conserving in the inviscid limit on a periodic domain, provided that the matrix $C$ is skew-symmetric (i.e., $C + C^T = 0$) [26].


On a regular Cartesian grid, $\alpha$ is close to machine zero except at the boundary faces. On the shock faces, $\alpha$ is set to 1 in conjunction with the ENO reconstruction scheme to prevent the unstable oscillations typically present with central schemes. The diffusive flux vector $D = [0,\ 2(\mu+\mu_t) S^d_{ij} n_j,\ (\lambda+\lambda_t)\frac{\partial T}{\partial x_j} n_j + 2\mu u_i S^d_{ij} n_j]^T$ is calculated using the primitive variables and their gradients reconstructed at the face center. The viscosity and thermal conductivity at the face center are calculated by a simple average of the cell values.
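For concreteness, the following is a minimal sketch of the biased face reconstruction of Eq. (10) and the flux blending of Eq. (12), assuming cell centroids, face centers, and cell gradients are available; the free-function interface and all names are illustrative, not those of Charles.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b) {
  return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

// Left-biased face value of Eq. (10); the right state of Eq. (11) follows
// by exchanging the roles of icv0 and icv1.
double reconstructLeft(double phi0, double phi1,
                       const Vec3& grad0,                 // (grad phi) at icv0
                       const Vec3& xfc, const Vec3& x0, const Vec3& x1) {
  Vec3 dx, sf;
  for (int d = 0; d < 3; ++d) { dx[d] = xfc[d] - x0[d]; sf[d] = x1[d] - x0[d]; }
  const double ds = std::sqrt(dot(sf, sf));               // |x_icv1 - x_icv0|
  for (int d = 0; d < 3; ++d) sf[d] /= ds;                // unit vector icv0 -> icv1

  const double dxs = dot(dx, sf);
  const double al  = (1.0/3.0) * dxs / ds;                // a_l in Eq. (10)
  Vec3 bl;                                                // b_l in Eq. (10)
  for (int d = 0; d < 3; ++d)
    bl[d] = (2.0/3.0)*dx[d] + (1.0/3.0)*(dx[d] - dxs*sf[d]);

  return phi0 + al*(phi1 - phi0) + dot(grad0, bl);
}

// Blended advective flux of Eq. (12), one component at a time.
double blendFlux(double Fcentral, double Fhllc, double alpha) {
  return (1.0 - alpha)*Fcentral + alpha*Fhllc;
}
```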

3.3. Discretization of wall-model equations

The wall-model solver uses the same CV gradient and face-value reconstruction schemes as the LES solver, whereas the face gradient reconstruction is done differently. The advective flux $F$ is calculated exclusively with the HLLC scheme. Following [27], the gradient in the face-normal direction at the face center, required in the diffusive flux $D$, is approximated as

$$\left.\frac{\partial\phi}{\partial n}\right|_{fc} = \beta \left.\frac{\partial\phi}{\partial s}\right|_{fc} + (\nabla\phi)_{fc} \cdot f_{corr}, \tag{13}$$

where $\beta = n_f \cdot s_f$, $f_{corr} = n_f - \beta s_f$, $\left.\frac{\partial\phi}{\partial s}\right|_{fc} = \frac{\phi_{icv1} - \phi_{icv0}}{\Delta s}$, and $(\nabla\phi)_{fc} = \frac{1}{2}\left((\nabla\phi)_{icv0} + (\nabla\phi)_{icv1}\right)$. This yields $D = [D_\rho,\ D_{\rho u_i},\ D_{\rho E}]^T$ with

$$D_\rho = 0, \tag{14}$$

$$D_{\rho u_i} = (\mu+\mu_t)\left[\beta\,\frac{u_{i,icv1} - u_{i,icv0}}{\Delta s} + (\nabla u_i)_{fc} \cdot f_{corr} + \left.\frac{\partial u_j}{\partial x_i}\right|_{fc} n_j - \frac{2}{3}\left.\frac{\partial u_k}{\partial x_k}\right|_{fc} n_i\right], \tag{15}$$

$$D_{\rho E} = (\lambda+\lambda_t)\left[\beta\,\frac{T_{icv1} - T_{icv0}}{\Delta s} + (\nabla T)_{fc} \cdot f_{corr}\right] + u_{i,fc}\, D_{\rho u_i}, \tag{16}$$

where $u_{i,fc} = (u_{i,icv0} + u_{i,icv1})/2$. The viscosity and thermal conductivity are treated explicitly and calculated by a simple average of the cell values. The scheme is second-order accurate provided that $\|x_{fc} - x_{sc}\|_2 \ll \Delta s$. Since the wall-model mesh has small grid spacings near the wall to resolve the sharp velocity and temperature gradients, implicit time marching is used to avoid a severe acoustic Courant–Friedrichs–Lewy (CFL) time-step restriction. The time step for the combined WMLES simulation is therefore limited by the acoustic CFL condition in the LES.
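Before turning to the implicit time integration, the non-orthogonality-corrected face-normal gradient of Eq. (13) can be sketched as follows; the function and variable names are illustrative, not the solver's actual interface.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Face-normal gradient with non-orthogonality correction, Eq. (13):
//   dphi/dn|_fc = beta * (phi1 - phi0)/ds + (grad phi)_fc . (n_f - beta*s_f),
// with beta = n_f . s_f. A sketch under assumed data structures.
double faceNormalGradient(double phi0, double phi1, double ds,
                          const Vec3& nf,      // unit face normal
                          const Vec3& sf,      // unit vector icv0 -> icv1
                          const Vec3& grad0, const Vec3& grad1) {
  double beta = 0.0;
  for (int d = 0; d < 3; ++d) beta += nf[d]*sf[d];

  double dphidn = beta * (phi1 - phi0) / ds;   // orthogonal contribution
  for (int d = 0; d < 3; ++d) {
    const double gfc = 0.5*(grad0[d] + grad1[d]);  // (grad phi)_fc
    dphidn += gfc * (nf[d] - beta*sf[d]);          // f_corr correction
  }
  return dphidn;
}
```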

Consider the application of the linearized implicit Euler method to Eq. (6):

$$\frac{V_{icv0}}{\Delta t}\,\delta q^n_{icv0} = -\sum_{f\in\partial icv0}\left(F^{n+1} - D^{n+1}\right) A_f \approx -\sum_{f\in\partial icv0}\left[F^n + \frac{\partial F}{\partial q_l}\frac{\partial q_l}{\partial Q}\delta Q^n + \frac{\partial F}{\partial q_r}\frac{\partial q_r}{\partial Q}\delta Q^n - \left(D^n + \frac{\partial D}{\partial U_{nbr}}\frac{\partial U_{nbr}}{\partial Q}\delta Q^n\right)\right] A_f,$$

where $\delta q^n_{icv0} \equiv q^{n+1}_{icv0} - q^n_{icv0}$, $Q \equiv [q_1^T, q_2^T, \cdots, q_{ncv}^T]^T \in \mathbb{R}^{5ncv}$, and $ncv$ is the total number of CVs. $U_{nbr}$ is a vector of $[\rho,\ u_i,\ h = c_p T]$ associated with icv0, icv1, and their neighbors. $q_{l/r}$ (both in $\mathbb{R}^5$) are the left/right state vectors consistent with the corresponding biased reconstruction. All Jacobians are defined at the current time step $t^n$. A major simplification is then made,

$$\frac{\partial q_l}{\partial Q}\delta Q^n \approx \delta q^n_{icv0}, \qquad \frac{\partial q_r}{\partial Q}\delta Q^n \approx \delta q^n_{icv1}, \tag{17}$$

$$\frac{\partial D}{\partial U_{nbr}}\frac{\partial U_{nbr}}{\partial Q}\delta Q^n \approx \frac{\partial D}{\partial U_{icv0}}\frac{\partial U_{icv0}}{\partial q_{icv0}}\delta q^n_{icv0} + \frac{\partial D}{\partial U_{icv1}}\frac{\partial U_{icv1}}{\partial q_{icv1}}\delta q^n_{icv1}, \tag{18}$$

to give

$$\frac{V_{icv0}}{\Delta t}\,\delta q^n_{icv0} = -\sum_{f\in\partial icv0}\left[F^n + \frac{\partial F}{\partial q_l}\delta q^n_{icv0} + \frac{\partial F}{\partial q_r}\delta q^n_{icv1} - \left(D^n + \frac{\partial D}{\partial U_{icv0}}\frac{\partial U_{icv0}}{\partial q_{icv0}}\delta q^n_{icv0} + \frac{\partial D}{\partial U_{icv1}}\frac{\partial U_{icv1}}{\partial q_{icv1}}\delta q^n_{icv1}\right)\right] A_f. \tag{19}$$

The approximations in Eqs. (17) and (18) substantially increase the sparsity of the implicit matrix, at the expense of introducing a first-order spatial error, because $q_{l/r}$ and $U_{nbr}$ are, in general, functions of icv0, icv1, and their neighbors. The analytical expressions given in [25] and [28] are used for $F^n$ and $\partial F/\partial q$, respectively. These HLLC terms are evaluated with the primitive variables ($\rho$, $u_i$, $p$) formed by the biased second-order reconstructions. $D^n$, $\partial D/\partial U$, and $\partial U/\partial q$ are evaluated with Eqs. (14)–(16). Finally, after assembling Eq. (19) for all control volumes, a matrix equation $A\,\delta Q^n = b$ ($A \in \mathbb{R}^{5ncv \times 5ncv}$, $b \in \mathbb{R}^{5ncv}$) is solved for $\delta Q^n$. This linear system is solved iteratively with the flexible GMRES method and successive over-relaxation preconditioning, using the external linear algebra library PETSc [29]. The conserved variables are then updated with $Q^{n+1} = Q^n + \delta Q^n$.
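As an illustration of the linear solve described above, the following is a minimal sketch using the modern PETSc KSP interface; the assembly of A and b from Eq. (19) is elided, and the wrapper function is hypothetical rather than the solver's actual PETSc glue code.

```cpp
#include <petscksp.h>

// Solve A * dQ = b with flexible GMRES and SOR preconditioning, as
// described in the text. Assembly of the block matrix A and the
// right-hand side b from Eq. (19) is assumed to be done elsewhere.
PetscErrorCode solveDeltaQ(Mat A, Vec b, Vec dQ) {
  KSP ksp;
  PC  pc;
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPFGMRES));   // flexible GMRES
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCSOR));         // successive over-relaxation
  PetscCall(KSPSolve(ksp, b, dQ));         // dQ = delta Q^n
  PetscCall(KSPDestroy(&ksp));
  return 0;
}
```

The conserved variables would then be updated as $Q^{n+1} = Q^n + \delta Q^n$, e.g., with a VecAXPY call.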


3.4. Boundary conditions

In the LES solver, the usual no-slip/thermal wall boundary conditions are replaced with wall-stress/heat-flux boundary conditions, because the coarse LES mesh cannot support the sharp wall gradients. That is, the diffusive flux vector $D$ on an LES wall face is calculated with the wall-model solution, while the advective flux vector $F$ is set to 0 to satisfy the kinematic no-penetration condition. The LES solver maintains the same wall boundary conditions during each substage of the RK3 time integration. The application of these flux boundary conditions in the wall-adjacent cells is straightforward in the cell-centered finite-volume approach (see Eq. (5)), and precise values of velocity and temperature at the wall need not be specified.

In order to provide accurate wall fluxes to the LES, the wall-model mesh is designed to have sufficiently fine resolution in the wall-normal direction, and the no-slip condition is applied at the wall (see Fig. 1). The top boundary condition needed by the wall model is imposed through the instantaneous primitive variables ($\rho$, $u_i$, $p$) from the LES at the interface location. More specifically, the advective flux at the wall-model top boundary is calculated by solving a local Riemann problem between the imposed LES primitive variables and the primitive variables reconstructed from the interior wall-model solution. This characteristic-based boundary treatment is required because one must specify boundary conditions pertaining only to the incoming waves for well-posedness and stability [30]. Directly enforcing the LES state produced fluxes inconsistent with the outgoing waves (which are completely determined by the wall-model solution) and made the simulations unstable.

4. Wall-model implementation in a parallel, unstructured LES solver

4.1. LES/wall-model coupling strategy: split-communicator vs. single-communicator

The partial-differential-equation (PDE) nature of the present two-layer wall model imposes significant implementation overhead compared to simpler zonal wall models. The wall model has its own computational domain in the near-wall region, on which three-dimensional, time-dependent flow equations are solved. The primal LES solver and the wall-model solver therefore have the same degree of complexity, and they have to be integrated in time synchronously while exchanging boundary conditions.

One way to manage parallel flow solvers running concurrently is to divide the total number of processors into two disjoint groups, where each group advances exclusively either the LES or the wall model. This type of coupling strategy, which we refer to as the split-communicator approach, was in fact used to perform an end-to-end numerical simulation of a realistic gas-turbine engine (Pratt and Whitney 6000 engine) as part of the Advanced Simulation and Computing (ASC) program at Stanford University [31,32]. All turbomachinery (fan, compressors, and turbines) calculations were performed using RANS, and the combustor and its inlet diffuser were simulated using LES; no wall model was employed in that calculation. For the integrated calculations, the two codes were linked via a separate software module. All three components of the simulation were independent software packages: CDP the LES solver, SUmb the RANS solver, and CHIMPS the linker. A coupling method similar to this was adopted in the early period of the present study.
However, rather than writing a separate coupling software module, MPI was used to let the solvers communicate directly with each other. Also, the combined WMLES solver was a single software module (i.e., a single MPI executable), rather than multiple modules linked by a linker. Following the MPI-2 standard, the processor splitting was handled by the MPI_Comm_split() or MPI_Comm_create() functions, which split the global MPI communicator (typically MPI_COMM_WORLD), referring to all processors involved in an MPI computation, into two non-overlapping groups (intra-communicators). Each group was then responsible for solving either the LES or the wall-model equations, and communications between the two solvers belonging to different intra-communicators went through an inter-communicator created with the MPI_Intercomm_create() function. The advantage of having a dedicated communicator for each flow solver is that it provides well-defined scope and modularity, such that routines in one flow solver do not interfere with those in the other unless intended to do so.

A disadvantage of the split-communicator approach is the difficulty in achieving an optimal load balance between the LES and the wall model. The optimal processor-partition ratio has to be specified a priori by the user, and it depends on several factors, such as each solver's compute load per time step, the number of cells in the LES and wall-model meshes, the total number of allocated processors, and the network bandwidth of the underlying parallel architecture. In practice, this ratio has to be determined experimentally with multiple trial-and-error runs, which is impractical for large-scale calculations. The communication overhead associated with the interface data exchange is also often unbalanced in the split-communicator approach: the interface boundary zones (e.g., the wall zone or the matching plane) shared by the two solvers are partitioned over fewer processors in the wall-model solver than in the LES solver, because the wall-model mesh usually has a lower cell count than the LES mesh. Additionally, there is always the possibility of underutilizing the allocated computing resources, because the group of processors assigned to one flow solver (e.g., the LES solver) is, by construction, never used by the other (e.g., the wall-model solver). That is, the processors associated with the LES solver are idle while the wall-model solver is advancing in time, and vice versa. The parallel efficiency of the combined WMLES solver with the split-communicator approach is therefore likely to be suboptimal.

The problems with the split-communicator approach become more evident when it is compared to the single-communicator approach, in which all allocated processors advance both the LES and the wall-model solvers in turns, while the communication between the solvers takes place within the global MPI communicator. With the single-communicator approach, a 10–15% speedup was found relative to the split-communicator approach; this is expected, since there are no idle processors in the former. A disadvantage of the single-communicator approach is that parallel slowdown of the overall WMLES solver can occur at a smaller processor count than in the split-communicator approach. Since the wall-model mesh often has fewer cells than the LES mesh, the wall-model solver saturates first as the total number of processors is increased; at this point, the communication overhead within the wall-model solver dominates its computational cost, and further parallelization can slow down the calculation even if the LES solver alone would still exhibit speedup. However, this slowdown is of lesser concern than the processor underutilization of the split-communicator approach, since in practical simulations of wall-bounded flows at high Reynolds numbers one is more likely to lack computing resources than to have too many. The split-communicator approach might be the only viable way to handle multiple software modules in a single MPI task. However, inasmuch as C++ object-oriented programming (OOP) allows us to handle the LES and wall-model solvers within a single software module, we adopted the single-communicator approach to achieve better parallel performance.
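For reference, a minimal sketch of the split-communicator setup described above, assuming ranks below a user-chosen count nLES form the LES group; the function and variable names are illustrative.

```cpp
#include <mpi.h>

// Split MPI_COMM_WORLD into an LES group (ranks 0..nLES-1) and a wall-model
// group (the remaining ranks), and create the inter-communicator that
// carries the boundary data between them. The partition ratio implied by
// nLES is the a-priori tuning parameter discussed in the text.
void splitCommunicators(int nLES, MPI_Comm& intraComm, MPI_Comm& interComm) {
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int color = (rank < nLES) ? 0 : 1;   // 0: LES group, 1: WM group
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intraComm);

  // Local leader is rank 0 of each intra-communicator; the remote leader
  // is addressed by its rank in MPI_COMM_WORLD (0 for LES, nLES for WM).
  const int remoteLeader = (color == 0) ? nLES : 0;
  MPI_Intercomm_create(intraComm, 0, MPI_COMM_WORLD, remoteLeader,
                       /*tag=*/0, &interComm);
}
```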

4.2. Implementation of the coupled WMLES solver in the single-communicator approach

Algorithm 1 outlines the front-end driver routine in the present WMLES implementation. The actual coupling of the two flow solvers (LES and wall model) at the code level is handled by a C++ coupler object. The coupler builds the face mappings between the solvers for boundary condition communication, and performs the interpolation and the actual message passing while advancing each solver in turns (lines 4–15 in Algorithm 1). The coupler is initialized with the flow solver objects, passed by their pointer references.

Algorithm 1 WMLES driver routine.
1: Initialization
2:   Initialize LES and WM solver objects (load grids and solutions).
3:   Initialize Coupler object with pointers to the solvers.
4: Pre-processing
5:   Build wall-face mapping between LES and WM (§4.2.1).
6:   Build matching-face mapping between LES and WM (§4.2.1).
7:   For each WM cell, find its proper wall face and matching face to be used for the TKE sensor (§5.4).
8: Time integration
9: while step < N do
10:   WM-to-LES BC communication: interpolate τ_w and q_w, pack into a 1-D array, and send (§4.2.2 and §4.2.3).
11:   Advance LES one time step (§3.2).
12:   LES-to-WM BC communication: interpolate ρ, u_i, p, pack into a 1-D array, and send (§4.2.2 and §4.2.3).
13:   Advance WM one time step (§3.3).
14:   ++step
15: end while

The face mapping is typically established in the pre-processing stage of the calculation. At this stage, the solvers exchange the locations of the faces at which boundary data have to be specified by the other solver. Each solver then determines which processor owns the target faces requested by the other, and how to interpolate from the data defined on its own mesh to the requested interface faces. In the following sections, we describe in detail the implementation of these tasks performed by the coupler.

4.2.1. Face mapping

It is important to construct accurate face mappings between the two solvers so that the correct boundary condition is applied at the correct location. Each wall face in the LES solver has to find its pair in the wall-model solver, from which the wall stress and heat flux will be obtained. Likewise, each top-boundary matching face in the wall-model solver has to identify its pair in the LES solver, from which the Dirichlet boundary data are interpolated. In a structured solver, this identification can be done easily by accessing the boundary Cartesian indices directly, but this logically and geometrically ordered connectivity is absent in unstructured solvers. Additionally, the faces needing information exchange likely reside on different processors with different face indices. The situation becomes more complicated when some of the faces involved in the boundary communication are not shared between the two solvers.
For example, when the LES mesh is composed of tetrahedra and the wall-model mesh is generated by prismatically extruding the wall zone of the LES mesh, there is no guarantee that the top matching faces in the wall-model mesh will have exact counterparts in the LES mesh. When the wall-model mesh is built independently of the LES mesh, even the wall faces may not be shared between the solvers. With unstructured meshes for complex configurations, both situations are highly probable. For robustness in such scenarios, the face mapping is built by finding the face closest to the reference face, rather than by trying to find an exactly identical face. The procedure to build the wall-face mapping is given below; the top matching-face mapping is constructed in a similar manner.


• The LES solver identifies the wall boundary zones on each processor (if any) by examining the boundary condition types in the input file.
• The LES solver packs the wall-face information of all wall zones into a one-dimensional array. This array contains the Cartesian coordinates of the face center, the unit normal vector, the processor rank, and the local face index of each LES wall face. This array is then MPI-broadcast and made available to the wall-model solver.
• For each LES wall face in the broadcast array:
  – Each processor loops over its wall faces in the wall-model solver (if any) to find the one locally closest to the LES wall face, based on the distance between the two face centers.
  – The wall-model face globally closest to the given LES wall face is then determined by an MPI reduction (see the sketch below).
• After this procedure, each processor has a list of wall-face mappings characterized by the ordered map index, the destination processor rank, the destination face index (LES), the source face index (wall model), and the unit normal vector and face-center coordinates of the LES wall face. This list is used to send the wall boundary data from the wall-model solver to the LES solver. The unit normal vector of the LES wall face is not used for the mapping, but it is useful when the wall face is not shared by the solvers.

The brute-force search algorithm presented here is an expensive O(N²) operation; the time spent building the face mappings during the pre-processing stage was comparable to 200–1000 time-step integrations in the test cases presented in §4.3. A more sophisticated, fast search algorithm such as the Alternating Digital Tree [33] might be needed in very large-scale calculations, but this was not considered in this study.
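The global reduction in the search above can be sketched as follows, assuming each rank stores its local wall-model wall faces in a simple array; an MPI_MINLOC reduction picks the rank owning the globally closest face. All names are illustrative.

```cpp
#include <mpi.h>
#include <cfloat>
#include <vector>

struct Face { double xc[3]; int localIndex; };   // face center and local id

// Find the rank owning the wall-model face closest to one LES wall face.
// On return, wmFaceIndex is valid only on the owning rank.
int findClosestFaceOwner(const double lesFaceCenter[3],
                         const std::vector<Face>& myWmFaces,
                         int& wmFaceIndex) {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Local search over this rank's wall-model wall faces (possibly empty).
  double bestDist2 = DBL_MAX;
  int bestIndex = -1;
  for (const Face& f : myWmFaces) {
    double d2 = 0.0;
    for (int d = 0; d < 3; ++d) {
      const double dx = f.xc[d] - lesFaceCenter[d];
      d2 += dx * dx;
    }
    if (d2 < bestDist2) { bestDist2 = d2; bestIndex = f.localIndex; }
  }

  // Global reduction: minimum distance wins, ties broken by lowest rank.
  struct { double dist2; int rank; } in = {bestDist2, rank}, out;
  MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);

  wmFaceIndex = bestIndex;
  return out.rank;   // rank owning the globally closest wall-model face
}
```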

4.2.2. Interpolation

The following summarizes the interpolation of boundary data associated with the face mapping; $\phi$ represents a generic flow variable.

• Neumann boundary conditions on the LES wall boundary:
  – For each LES wall face wfa_les, find the wall-model wall face wfa_rans that lies closest to it, based on the distance between the face centers.
  – $\nabla\phi$ at the face center of wfa_les is reconstructed from the wall-model solution ($\nabla\phi_{icv0}$, $\phi_{icv0}$, and $\phi_{wall}$) using a first-order gradient reconstruction scheme (see Eqs. (A.1)–(A.3)), where icv0 denotes the boundary cell in the wall model associated with wfa_rans.
  – Then $\partial\phi/\partial n$ on the LES wall face wfa_les is computed from $\nabla\phi \cdot n$, where $n$ is the outward unit normal vector of wfa_les.
• Dirichlet boundary conditions on the wall-model top boundary:
  – For each top-boundary face tfa_rans of the wall model, find the internal LES face ifa_les that lies closest to it, based on the distance between the face centers. The face ifa_les has two associated neighboring LES cells, icv0 and icv1.
  – Then $\phi$ at the face center of tfa_rans is reconstructed from the LES solution ($\phi_{icv0}$, $\phi_{icv1}$, $\nabla\phi_{icv0}$, and $\nabla\phi_{icv1}$) using a second-order central reconstruction scheme (see Eqs. (10)–(12)).

4.2.3. Boundary condition (BC) communication

During the actual computation, after each time step, the solvers exchange boundary data through the coupler. The data exchanged by the two solvers are:

• Wall model → LES: $2\mu S_{ij} n_j$, $\lambda \frac{\partial T}{\partial x_j} n_j$, destination processor rank and face index, current time step.
• LES → wall model: $\rho$, $u_i$, $p$, resolved turbulent kinetic energy, destination processor rank and face index, current time step.

The resolved turbulent kinetic energy of the LES at the matching location is used to inform the wall model whether the flow is laminar or turbulent (see §5.4). The list of face mappings created in the pre-processing stage is used to pack the boundary data and to identify the proper sender/receiver processor pairs. On each processor, each flow solver first loops over the face mappings and packs the boundary data to be sent into a one-dimensional array. In general, each processor sends to and receives from all other processors. Instead of a blocking, collective MPI function (MPI_Alltoallv), non-blocking, point-to-point functions (MPI_Isend/MPI_Irecv) are used to exchange the data arrays, as sketched below. Upon completion of the MPI communications, the boundary data each flow solver has received are enumerated in increasing order of the sender processor rank. Since the BC routine loops over the boundary faces in increasing order of the face index, the received data on each processor are then sorted in the same way to avoid an O(N²) linear search.
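A minimal sketch of this non-blocking exchange, assuming one pre-sized buffer per peer rank; the names and the buffer layout are illustrative.

```cpp
#include <mpi.h>
#include <vector>

// Exchange packed per-face boundary data: post all receives first, then the
// sends, and wait for completion. sendBuf[r]/recvBuf[r] hold the data going
// to / coming from rank r; their sizes are assumed to have been agreed upon
// when the face mappings were built.
void exchangeBoundaryData(std::vector<std::vector<double>>& sendBuf,
                          std::vector<std::vector<double>>& recvBuf) {
  int size;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  std::vector<MPI_Request> reqs;
  reqs.reserve(2 * size);

  for (int r = 0; r < size; ++r) {           // post receives first
    if (recvBuf[r].empty()) continue;
    reqs.emplace_back();
    MPI_Irecv(recvBuf[r].data(), (int)recvBuf[r].size(), MPI_DOUBLE,
              r, /*tag=*/0, MPI_COMM_WORLD, &reqs.back());
  }
  for (int r = 0; r < size; ++r) {           // then the sends
    if (sendBuf[r].empty()) continue;
    reqs.emplace_back();
    MPI_Isend(sendBuf[r].data(), (int)sendBuf[r].size(), MPI_DOUBLE,
              r, /*tag=*/0, MPI_COMM_WORLD, &reqs.back());
  }
  MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);
}
```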

4.3. Verification of solver coupling

We verify the implementation of the key elements of the coupler class (face mapping, interpolation, and communication) with simple tests. The objective is to ensure that these complex operations are performed correctly in an unstructured parallel environment. The boundary condition communication routine is tested on hypothetical LES/wall-model meshes with simple test functions, using 128 processors. The size of the LES domain is 1 ≤ x ≤ 5, 0 ≤ y ≤ 1, and 1 ≤ z ≤ 5. The wall-model mesh has the same domain size in the x/z directions as the LES, but it is smaller in the wall-normal direction (0 ≤ y ≤ 0.2). The wall zone shared by the two solvers is located at y = 0, and the matching plane on which the LES data are prescribed is the top boundary of the wall-model mesh at y = 0.2. This is a typical domain configuration for boundary layer simulations using the present wall-modeling approach.

Table 1. LES/wall-model mesh information for BC communication test T1.

Case | (L_x, L_y, L_z)_LES | (L_x, L_y, L_z)_WM | (N_x, N_y, N_z)_LES | (N_x, N_y, N_z)_WM
---- | ------------------- | ------------------ | ------------------- | -------------------
1    | (4, 1, 4)           | (4, 0.2, 4)        | (10, 10, 10)        | (10, 10, 10)
2    | (4, 1, 4)           | (4, 0.2, 4)        | (20, 20, 20)        | (20, 20, 20)
3    | (4, 1, 4)           | (4, 0.2, 4)        | (40, 40, 40)        | (40, 40, 40)
4    | (4, 1, 4)           | (4, 0.2, 4)        | (80, 80, 80)        | (80, 80, 80)
5    | (4, 1, 4)           | (4, 0.2, 4)        | (160, 160, 160)     | (160, 160, 160)

Fig. 3. BC interpolation $L_\infty$-error in test T1 as a function of the characteristic grid spacing $\Delta_{WMLES}$. Solid lines (from top to bottom): $e_2$ (Dirichlet data interpolation from the LES to the wall model, nonlinear test function), $e_3$ (Neumann data interpolation from the wall model to the LES, nonlinear test function), and $e_1$ (Dirichlet data interpolation from the LES to the wall model, linear test function); dashed line, $O(\Delta^2_{WMLES})$; dash-dotted line, $O(\Delta_{WMLES})$.

For verification of the face mapping and the Dirichlet BC interpolation from the LES to the wall model, we consider two test functions,

$$\phi_1(x, y, z) = x + 2y + 3z, \tag{20}$$

$$\phi_2(x, y, z) = 10 + \sin\!\left(\frac{\pi x}{2}\right)\sin\!\left(\frac{\pi y}{2}\right)\sin\!\left(\frac{\pi z}{2}\right). \tag{21}$$

The test function $\phi_1$ is linear in space, and therefore the second-order interpolation from the LES to the wall model should reproduce it exactly. The nonlinear test function $\phi_2$ is used to verify the expected second-order convergence. Once the wall-model solver receives the LES data $\phi_1^{LES}$ and $\phi_2^{LES}$, the errors at each top-boundary face of the wall model are calculated as $e_i = \phi_i(x_f^{wm}, y_f^{wm}, z_f^{wm}) - \phi_i^{LES}$, $i = 1, 2$, where $(x_f^{wm}, y_f^{wm}, z_f^{wm})$ is the face-center coordinate of the wall-model top-boundary face. For verification of the face mapping and the Neumann BC interpolation from the wall model to the LES, the following test function is considered:

$$\phi_3(x, y, z) = x^{-\frac{1}{7}}\, y + y \sin\!\left(\frac{\pi z}{2}\right). \tag{22}$$

The choice of $\phi_3$ is motivated by the presence of the linear sublayer in turbulent boundary layers ($u \propto y$ as $y \to 0$) and by an empirical power law for turbulent skin friction ($\frac{\partial u}{\partial y} \propto x^{-\frac{1}{7}}$ as $y \to 0$ [34]). Once the LES solver receives the Neumann BC $\left.\frac{\partial \phi_3}{\partial n}\right|_{WM}$ from the wall model, the error at each wall face of the LES is calculated as $e_3 = -\left.\frac{\partial \phi_3}{\partial y}\right|_{(x_f^{les},\, y_f^{les},\, z_f^{les})} - \left.\frac{\partial \phi_3}{\partial n}\right|_{WM}$, where $(x_f^{les}, y_f^{les}, z_f^{les})$ is the face-center coordinate of the LES wall face.

In the first test, denoted T1, both the LES and wall-model meshes are refined successively in all directions (see Table 1). Note that the wall-model grid completely overlaps the LES grid on the matching plane at y = 0.2 and on the wall plane at y = 0. Fig. 3 shows the $L_\infty$ error from test T1 as a function of the characteristic grid spacing, $\Delta_{WMLES} \equiv (\Delta x\, \Delta y\, \Delta z)^{1/6}_{LES} (\Delta x\, \Delta y\, \Delta z)^{1/6}_{WM}$. As expected, the interpolation error of the linear test function is close to machine zero, and the error of the nonlinear test function shows second-order convergence. The interpolation from the wall model to the LES also shows the expected first-order error convergence. The expected convergence behavior and the small level of error indicate a correct implementation of the face mapping and interpolation routines.
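As an illustration of how such a manufactured-solution check can be coded, the following sketch evaluates the nonlinear test function of Eq. (21) at the received face centers and reduces the local maximum error across ranks; all names are illustrative.

```cpp
#include <mpi.h>
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Nonlinear test function phi2 of Eq. (21).
double phi2(double x, double y, double z) {
  const double p = std::acos(-1.0) / 2.0;   // pi/2
  return 10.0 + std::sin(p*x) * std::sin(p*y) * std::sin(p*z);
}

// Global L-infinity error of the Dirichlet transfer: phiRecv[f] holds the
// LES value delivered to wall-model top face f by the coupler.
double linfError(const std::vector<std::array<double,3>>& faceCenters,
                 const std::vector<double>& phiRecv) {
  double localMax = 0.0;
  for (std::size_t f = 0; f < faceCenters.size(); ++f) {
    const auto& x = faceCenters[f];
    localMax = std::max(localMax,
                        std::fabs(phi2(x[0], x[1], x[2]) - phiRecv[f]));
  }
  double globalMax = 0.0;
  MPI_Allreduce(&localMax, &globalMax, 1, MPI_DOUBLE, MPI_MAX,
                MPI_COMM_WORLD);
  return globalMax;
}
```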


Table 2. LES/wall-model mesh information for BC communication test T2.

Case | (L_x, L_y, L_z)_LES | (L_x, L_y, L_z)_WM | (N_x, N_y, N_z)_LES | (N_x, N_y, N_z)_WM
---- | ------------------- | ------------------ | ------------------- | -------------------
1    | (4, 1, 4)           | (4, 0.2, 4)        | (160, 160, 160)     | (10, 160, 10)
2    | (4, 1, 4)           | (4, 0.2, 4)        | (160, 160, 160)     | (20, 160, 20)
3    | (4, 1, 4)           | (4, 0.2, 4)        | (160, 160, 160)     | (40, 160, 40)
4    | (4, 1, 4)           | (4, 0.2, 4)        | (160, 160, 160)     | (80, 160, 80)
5    | (4, 1, 4)           | (4, 0.2, 4)        | (160, 160, 160)     | (160, 160, 160)

Fig. 4. BC interpolation $L_\infty$-error in test T2 as a function of the wall-parallel grid spacing ratio $\Delta_{WM}/\Delta^{fine}_{LES}$. Solid lines (from top to bottom): $e_2$ (Dirichlet data interpolation from the LES to the wall model, nonlinear test function), $e_3$ (Neumann data interpolation from the wall model to the LES, nonlinear test function), and $e_1$ (Dirichlet data interpolation from the LES to the wall model, linear test function); dash-dotted line, $O(\Delta_{WM})$.

In the second test, denoted T2, starting from the LES/wall-model meshes at the finest level, only the wall-model mesh is successively coarsened in the wall-parallel directions ($x$ and $z$) (see Table 2). This mimics a practical situation in which one uses a wall-model mesh coarser than the LES mesh to reduce the cost of wall modeling. In this case, neither the wall faces nor the matching faces are shared by the solvers. Fig. 4 shows the $L_\infty$ error from test T2 as a function of $\Delta_{WM}/\Delta^{fine}_{LES}$, the ratio of the wall-model grid spacing to the LES grid spacing in the wall-parallel directions. While the interpolation error of the linear test function is negligible, the error of the nonlinear test functions requires some explanation. In test T2, the LES-to-wall-model Dirichlet data transfer is a fine-to-coarse interpolation (restriction). Consequently, the effective grid spacing $\Delta_{eff}$ and the error are independent of the wall-model mesh resolution ($\Delta_{eff} = \frac{1}{2}(\Delta x^2_{LES} + \Delta y^2_{LES} + \Delta z^2_{LES})^{1/2}$ when the wall-model mesh is coarser than the LES mesh, and $\Delta_{eff} = \frac{1}{2}\Delta y_{LES}$ when they have the same resolution in the $x$/$z$ directions). The interpolation error in the LES-to-wall-model data transfer is therefore a function of the LES grid resolution only. On the contrary, the wall-model-to-LES Neumann data transfer is a coarse-to-fine interpolation (prolongation), and $\Delta_{eff}$ increases when the wall-model mesh is coarsened. These observations suggest that the wall-model-to-LES and LES-to-wall-model interpolations are first- and zeroth-order accurate, respectively, when the wall-model mesh is coarser than the LES mesh. Having overlapping LES and wall-model meshes minimizes the interpolation error.

5. Practical considerations

5.1. Cost and parallel performance

In the test cases considered in [15], the ratio of the number of cells in the wall model to the number of cells in the LES is about 0.2–0.5, and the extra cost of wall modeling was about 100–150% of that of a stand-alone LES without a wall model. Compared to the roughly 30% overhead of an equilibrium wall model implemented in a similar code [35], the cost of the present WMLES formulation is appreciably higher; the difference is mainly due to the inversion of a large matrix in the implicit time advancement. However, the present method remains computationally affordable owing to the use of very coarse LES grids, and we anticipate that the non-equilibrium effects important in complex flows can be represented better with the current formulation.

The parallel performance of the WMLES solver was analyzed with a strong scalability test using the NACA4412 case presented in [15], selected because it was the largest problem run during the present study in terms of grid points. The number of cells totaled 13.4 million (10.7 million in the LES mesh and 2.7 million in the wall-model mesh). The computations were carried out on the Pinto cluster, a modest-sized high-performance computing (HPC) platform operated by Los Alamos National Laboratory (LANL). The cluster has 2464 CPU cores (Intel Xeon E5-2670, 2.6 GHz) on 154 compute nodes (16 cores/node), with 32 GB of memory per node and a QLogic InfiniBand fat-tree interconnect.


Fig. 5. Speedup (left) and parallel efficiency (right) as functions of the number of processors, obtained from a strong scaling test with the NACA4412 airfoil case presented in [15]. "ncv per core" denotes the number of control volumes per core (LES + wall model). Dashed lines, ideal scaling; lines with symbols, actual scaling.

The number of processors was increased from 32 to 512, doubling between successive runs; the number of control volumes per core therefore varied from 430,000 to 25,000. The average wall-clock time required to advance 1000 time steps was recorded for each run to calculate the speedup factor and parallel efficiency. The speedup in this study is defined as

$$S(p) = \frac{T_{p_{ref}}}{T_p}, \tag{23}$$

where $T_{p_{ref}}$ is the reference execution time with $p_{ref}$ processors and $T_p$ is the execution time with $p$ processors. When $p_{ref} = 1$, $S(p)$ represents the parallel speedup with respect to the corresponding sequential algorithm, and ideal linear speedup is achieved when $S(p) = p$. However, taking $p_{ref} = 1$ was impossible due to the memory restriction of a single processor. Instead, we took $p_{ref} = 32$, the smallest number of processors with which the solver did not fail due to memory limits; in this case, ideal speedup is achieved when $S(p) = p/p_{ref}$. The parallel efficiency is defined as

$$\eta(p) = \frac{T_{p_{ref}}/(p/p_{ref})}{T_p}, \tag{24}$$

with $p_{ref} = 32$. $\eta(p)$ represents the ratio of the ideal execution time to the actual execution time. Typically, $\eta(p)$ lies between 0 and 1, and is deemed ideal when close to 1.

Fig. 5 shows the speedup $S(p)$ and parallel efficiency $\eta(p)$ as functions of the number of processors (equivalently, of the number of control volumes per processor). Up to 256 processors, the WMLES solver in fact exhibits superlinear speedup. This rare and rather counterintuitive behavior is attributed to the cache effect: in some circumstances, the memory hierarchy of the HPC platform and the problem partition combine such that the reduction in memory/cache access time becomes significant, in addition to the reduction in the actual computation time. For $p > 256$, the typical sublinear speedup is observed; this trend is also reflected in the plot of the parallel efficiency $\eta(p)$. At $p = 512$, the wall-model solver starts to exhibit parallel slowdown due to the increased communication overhead. At this point, the number of control volumes per processor is about 20,000 in the LES solver and 5000 in the wall-model solver; reducing the number of control volumes per processor in the wall-model solver further slows down the global communication for the matrix assembly/inversion and solution broadcast. We expect that the parallel slowdown at this limit is not a serious impediment in practical large-scale simulations: under limited computing resources, such calculations are likely to be performed with 30,000 to 50,000 control volumes per processor, at which the strong-scaling parallel efficiency is greater than 75%.

5.2. LES grid preparation

In WMLES, the major saving in computational cost comes from the use of very large grid spacings in the wall-parallel directions. The reduction of the cell count due to coarsening in the wall-normal direction is peripheral, because the boundary layer thickness ($\delta$) is often much smaller than the length scale of the geometry (e.g., the airfoil chord or span). However, having large wall-normal spacings near the wall is still preferred in explicit compressible flow solvers, in order to avoid a severe acoustic CFL time-step restriction in low Mach number flows; in this case, the time-to-solution can be reduced by a factor of ten or more through the use of large computational time steps.

Grid resolution in wall-bounded turbulent flows is often expressed in wall units, $\Delta^+ \equiv \Delta u_\tau/\nu$, where $\nu = \mu/\rho$ and $u_\tau = \sqrt{\tau_w/\rho}$ are the kinematic viscosity and the friction velocity, respectively. However, a more appropriate metric of grid resolution in WMLES is the number of boundary-layer-resolving cells, because the grid spacing in WMLES should scale with the local boundary layer thickness $\delta$. The cell density suggested in the literature is between 10 and 30 cells per $\delta$ [5].


Fig. 6. WMLES of flow over a NACA4412 airfoil at a 12-degree angle of attack [15]: contours of the instantaneous streamwise velocity normalized by the freestream velocity, $u_X/U_\infty$. The flow is from left to right. The negative-velocity region (blue online) near the trailing edge indicates flow separation.

Fig. 7. Wall-model mesh used for the NACA4412 airfoil calculation [15]. The trailing-edge region is enlarged on the right.

In our test cases reported in [15], good results were obtained with $n_x$ = 10–30, $n_y$ = 30–45, and $n_z$ = 16–30, where $n_x$, $n_y$, and $n_z$ are the numbers of boundary-layer-resolving cells in the streamwise ($x$), wall-normal ($y$), and spanwise ($z$) directions, respectively.

5.3. Wall-model grid generation

An overhead often overlooked in a PDE-based two-layer zonal wall model is the grid generation for the wall-model solver. The wall-model grid often occupies a tiny volume compared to that of the primal LES grid, because the wall-normal thickness of the wall-model grid ($h_{wm}$) is only a fraction of the local boundary layer thickness ($\delta$). In high Reynolds number flows, however, the total number of cells in the wall-model grid becomes comparable to that in the primal LES grid; as mentioned earlier, this ratio was about 0.2–0.5 in our validation studies [15]. In the wall-model grid, the wall-parallel grid content is usually kept the same as, or coarser than, that of the LES grid, but the wall-normal grid spacings need to be small enough to resolve the wall gradients. Consequently, one has to mesh a thin, shell-like volume with highly anisotropic elements. If the geometry is simple and the boundary layer develops slowly over the surface, as in channel and flat-plate boundary layer flows, the grid generation can be done easily with meshing software. However, if the wall boundaries have complex shapes and the boundary layer grows quickly, defining and meshing a thin domain that properly respects the boundary layer growth may be challenging with commercial meshing tools. An example is the stalled NACA4412 airfoil calculation in [15] (see Fig. 6). In this flow, the boundary layer over the suction surface grows rapidly under the adverse pressure gradient, and incipient separation is observed near the trailing edge. Using in-house grid manipulation tools, the wall-model mesh in this calculation was generated from the LES mesh by removing the cells whose wall distance exceeds 10% of the local boundary layer thickness reported in the experiment [36], and then refining the remaining cells in the wall-normal direction only (see Fig. 7). This method is useful when the LES mesh near the wall is composed of prismatic elements (typically hexahedra). This is not a strong limitation when the geometry is essentially two-dimensional, since prismatic elements are preferred in boundary layers for accuracy and stability.


Fig. 8. Response of the laminar/turbulent sensor in WMLES of a spatially developing transitional boundary layer on a flat plate [15]. (Top) Color contours of the second invariant of the velocity gradient tensor; the computational domain is enlarged by a factor of 2.5 in the wall-normal direction. (Bottom) Skin friction coefficient, wall-model eddy viscosity, and sensor variable as functions of the streamwise coordinate. Solid line, $10^3 C_f$; dashed line, $10^{-1}\mu_{t,wm}/\mu$; dash-dotted line, $s_{tl}$ in Eq. (25). $s^0_{tl} = 0.5$ was used, with the no-slip condition on the laminar wall faces.

prismatic elements are preferred in the boundary layers for accuracy and stability. For three-dimensional geometries where the use of tetrahedra elements is unavoidable in the LES mesh near the wall, the wall-model mesh may have to be meshed directly, or constructed with a surface mesh extrusion techniques. Aside from the technical difficulty in the wall-model mesh generation, the effect of the wall-model layer thickness (h wm ) on the WMLES solution is of some importance. hwm , strictly speaking, is an implicit modeling parameter that affects the interaction between the outer-layer LES and the inner-layer wall model. hwm is often specified by the user at the grid generation stage, and it is difficult to adjust hwm dynamically during the simulation. It was shown by [13,16] that WMLES solution in attached flows is not sensitive to hwm , provided that hwm is placed well in the middle of the logarithmic layer of the mean velocity profile. Although no study has reported systematically the sensitivity of the LES solution to h wm in separated flows, it is suggested to place the matching location well down the recirculation bubble and close to the wall, so that the flow seen by the wall model (either reversed or not) is attached and driven passively by the outer LES content [37]. Our experience is that the proper resolution of the separated shear layer in the LES grid is more important than the good placement of LES/wall-model interface layer. In the preliminary WMLES of the stalled NACA4412 with a very coarse LES grid, the flow on the suction surface remained fully attached everywhere, and adjusting hwm brought no change. Only after doubling the wall-parallel resolution could the trailing edge separation be captured. 5.4. Handling of transition and laminar boundary layer While the LES can handle laminar flows and its transition to turbulence naturally through the use of dynamic SGS models [38], the wall model based on RANS requires a flow sensor to differentiate between the laminar and the turbulent states (otherwise the wall model would assume the flow is turbulent everywhere). The sensor deactivates RANS turbulence modeling in the laminar region, and reverts the LES to the no-slip condition. The sensor variable based on the resolved turbulent kinetic energy (TKE) proposed by Bodart and Larsson [35] is used in this study.



$$ s_{tl} \equiv \sqrt{\frac{\overline{u_i' u_i'}\,/\,2}{\tau_w/\rho}}. \qquad (25) $$

When $s_{tl}$ falls below a given threshold value $s^0_{tl}$ ($s_{tl} < s^0_{tl}$), the wall model assumes that the flow is laminar and switches off the RANS eddy viscosity/diffusivity ($\mu_t = \lambda_t = 0$). The LES then reverts to the no-slip condition on the laminar wall faces, although $\tau_w$ from the wall model may still be used if the LES grid barely resolves the thin laminar boundary layer (LBL). The latter strategy is useful in external aerodynamics simulations, since the LES grid often cannot resolve the flow near the stagnation point, which quickly transitions to turbulence. Bodart and Larsson [35] suggest $s^0_{tl} = 0.5$ based on an analysis of the TKE from DNS of turbulent channel flows over a wide range of friction Reynolds numbers ($Re_\tau$ = 180–2000). In our calculations of transitional flows reported in [15], $s_{tl}$ remained close to 0 in the laminar region and rose steeply to values above 1 in the turbulent region (see Fig. 8). The steep variation of $s_{tl}$ near transition was confined to a few cells in the streamwise direction, and threshold values in the range $0.5 \le s^0_{tl} \le 1$ worked equally well, without affecting the transition point significantly.

In Eq. (25), the resolved TKE in the numerator is computed by the LES on the top boundary faces of the wall model, and the wall shear stress in the denominator is computed by the wall model on each wall face. Since $s_{tl}$ is defined on all CVs in the wall-model domain, each wall-model CV has to identify its wall and top faces, and retrieve the TKE and wall stress from those (possibly off-processor) remote faces. These tasks are handled with procedures similar to those used for the face mapping (§4.2.1) and the boundary data communication (§4.2.3). The additional computational and communication overheads associated with the sensor are negligible and do not affect the scaling of the solver. A minimal sketch of the sensor logic is given below.
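The following sketch makes the sensor logic concrete. It is our illustration, not the authors' implementation; the array layout and the names `tke_top`, `tau_w`, and `cv_to_face` are assumed data structures.

```python
import numpy as np

def apply_transition_sensor(tke_top, tau_w, rho_w, mu_t, lam_t,
                            cv_to_face, s0_tl=0.5):
    """Laminar/turbulent sensor of Bodart & Larsson [35], Eq. (25).

    tke_top    : resolved TKE from the LES, one value per wall/top face pair
    tau_w      : wall shear stress from the wall model, per wall face
    rho_w      : density at the wall faces
    mu_t, lam_t: RANS eddy viscosity/diffusivity per wall-model CV (modified)
    cv_to_face : index of the wall/top face pair associated with each
                 wall-model CV (illustrative mapping)
    Returns a per-face boolean mask of laminar wall faces, on which the LES
    reverts to the no-slip condition.
    """
    # s_tl = sqrt( TKE / (tau_w / rho) ), cf. Eq. (25)
    u_tau_sq = np.maximum(tau_w / rho_w, 1e-30)   # guard against tau_w -> 0
    s_tl = np.sqrt(tke_top / u_tau_sq)

    laminar_face = s_tl < s0_tl                   # below threshold: laminar
    laminar_cv = laminar_face[cv_to_face]

    # Deactivate RANS turbulence modeling in the laminar wall-model CVs
    mu_t[laminar_cv] = 0.0
    lam_t[laminar_cv] = 0.0
    return laminar_face
```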


6. Conclusions

We have implemented a PDE-based, two-layer zonal wall model in an unstructured LES solver. The primary task of the wall model is to provide approximate wall boundary conditions, in terms of the wall stress and the wall heat flux, to an LES with very coarse near-wall resolution. The wall model solves the unsteady RANS equations on a separate near-wall grid, given the LES data imposed on the wall-model top boundary.

The main contribution of this paper is that a two-layer zonal wall model in its most general form is implemented in an unstructured compressible LES solver for the first time. Previous two-layer wall models in the literature have been implemented in structured-mesh solvers with simplifying assumptions that reduce the implementation overhead; however, the resulting inability to handle arbitrary geometries and non-equilibrium effects has confined their use to academic test cases. To address these shortcomings, we have built the framework of a PDE-based two-layer zonal wall model in an unstructured-mesh flow solver. The unstructured-mesh capability significantly increases the usability of the wall-modeled LES technique for realistic engineering applications with complex three-dimensional geometries. Additionally, compressibility effects important in high-speed flows are naturally accounted for by adopting compressible flow formulations for both the LES and the wall model.

The major implementation issues are how to manage the processors globally and how to couple the two unstructured parallel solvers for boundary data communication. It was found that splitting the processors into two groups, each handling exclusively either the LES or the wall model, was less efficient than using all processors to advance the two solvers in turn: the former required finding the optimal splitting ratio with trial-and-error runs, and it still exhibited up to 15% slowdown compared to the latter. The lack of regularly ordered cell/face connectivities required building explicit face mappings between the two solvers across the coupling interfaces. The intrinsic reconstruction schemes of the base flow solver were used for interpolating boundary data from one solver to the other. We verified the correct implementation of the face mapping, interpolation, and data communication using the method of manufactured solution in hypothetical computational domains, and the expected orders of accuracy of the interpolation procedures were confirmed.

The cost of wall modeling in the present work was comparable to that of a stand-alone LES without a wall model. Parallel performance of the combined WMLES code was analyzed in terms of scalability and efficiency, and it was shown that the method scales well up to the number of processors beyond which the wall-model solver exhibits parallel slowdown. Practical concerns in preparing and running actual simulations, such as grid generation and the handling of laminar-to-turbulent transition and thin laminar boundary layers, were discussed.

Validation of the present WMLES implementation is reported in separate publications [15,19,20]. The method has been applied successfully to attached turbulent flows over a wide range of Reynolds numbers, and to separated flows over an airfoil and over streamwise-periodic hills. Application of the method to complex three-dimensional geometries is ongoing.
Acknowledgements

This work was supported by the Winston and Fu-Mei Stanford Graduate Fellowship, the NASA Aeronautics Scholarship Program, NASA under the Subsonic Fixed-Wing Program (Grant NNX11AI60A), and the Boeing Company. We thank Drs. Julien Bodart and Sanjeeb Bose for useful discussions during the initial code development.

Appendix A. Reconstruction of face gradient

Here, we provide a closed-form analytical expression for the reconstructed face gradient $(\nabla\phi)_{fc}$, which is the solution of the linear system in Eqs. (7)–(9). First, let us define the following variables for notational convenience:

$$ \frac{d\phi}{ds} \equiv \frac{\phi_{icv1} - \phi_{icv0}}{\Delta s}, \qquad (\nabla\phi)_{cv} \equiv \frac{1}{2}\left((\nabla\phi)_{icv0} + (\nabla\phi)_{icv1}\right). $$

With these definitions, Eqs. (7)–(9) are rewritten as

$$ (\nabla\phi)_{fc} \cdot \boldsymbol{s}_f = \frac{d\phi}{ds}, \qquad (A.1) $$

$$ (\nabla\phi)_{fc} \cdot \boldsymbol{t}_1 = (\nabla\phi)_{cv} \cdot \boldsymbol{t}_1, \qquad (A.2) $$

$$ (\nabla\phi)_{fc} \cdot \boldsymbol{t}_2 = (\nabla\phi)_{cv} \cdot \boldsymbol{t}_2. \qquad (A.3) $$

The above linear system can be solved for $(\nabla\phi)_{fc}$ once the unit vectors $\boldsymbol{t}_1$ and $\boldsymbol{t}_2$ that form a right-handed Cartesian coordinate system with $\boldsymbol{s}_f$ are prescribed. We choose

$$ \boldsymbol{t}_1 = \frac{(s_z - s_y,\; s_x - s_z,\; s_y - s_x)}{\sqrt{2 - 2(s_x s_y + s_y s_z + s_z s_x)}}, \qquad \boldsymbol{t}_2 = \boldsymbol{t}_1 \times \boldsymbol{s}_f, \qquad (A.4) $$

where $s_x$, $s_y$, and $s_z$ are the Cartesian components of the unit vector $\boldsymbol{s}_f$. The linear system can now be inverted analytically to obtain

$$ (\nabla\phi)_{fc,x} = (1 - s_x^2)(\nabla\phi)_{cv,x} + s_x\left(\frac{d\phi}{ds} - s_y(\nabla\phi)_{cv,y} - s_z(\nabla\phi)_{cv,z}\right), \qquad (A.5) $$

$$ (\nabla\phi)_{fc,y} = (1 - s_y^2)(\nabla\phi)_{cv,y} + s_y\left(\frac{d\phi}{ds} - s_z(\nabla\phi)_{cv,z} - s_x(\nabla\phi)_{cv,x}\right), \qquad (A.6) $$

$$ (\nabla\phi)_{fc,z} = (1 - s_z^2)(\nabla\phi)_{cv,z} + s_z\left(\frac{d\phi}{ds} - s_x(\nabla\phi)_{cv,x} - s_y(\nabla\phi)_{cv,y}\right), \qquad (A.7) $$

where the subscripts of the gradients denote their Cartesian components. In principle, any proper set of orthogonal unit vectors $\{\boldsymbol{t}_1, \boldsymbol{t}_2\}$ can be used to produce identical numerical results for $(\nabla\phi)_{fc}$: since $\boldsymbol{t}_1\boldsymbol{t}_1^{T} + \boldsymbol{t}_2\boldsymbol{t}_2^{T} = \boldsymbol{I} - \boldsymbol{s}_f\boldsymbol{s}_f^{T}$ for any such pair, the solution reduces to $(\nabla\phi)_{fc} = (\nabla\phi)_{cv} + \left(d\phi/ds - (\nabla\phi)_{cv}\cdot\boldsymbol{s}_f\right)\boldsymbol{s}_f$, whose components are exactly Eqs. (A.5)–(A.7). However, with the choice in Eq. (A.4), there is no need to identify the vectors $\{\boldsymbol{t}_1, \boldsymbol{t}_2\}$ explicitly for each computational face, and $(\nabla\phi)_{fc}$ can be expressed in terms of only $\boldsymbol{s}_f$, $\phi_{icv0}$, $\phi_{icv1}$, $(\nabla\phi)_{icv0}$, $(\nabla\phi)_{icv1}$, and $\Delta s$.
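For concreteness, the following is a minimal numerical sketch of this reconstruction, using the compact vector form above; it is our illustration, not the solver's actual code, and all names are ours.

```python
import numpy as np

def reconstruct_face_gradient(phi0, phi1, grad0, grad1, s_f, ds):
    """Face-gradient reconstruction of Eqs. (A.5)-(A.7).

    phi0, phi1   : phi at the two cells (icv0, icv1) straddling the face
    grad0, grad1 : cell-centered gradients of phi at icv0/icv1, shape (3,)
    s_f          : unit vector pointing from icv0 to icv1, shape (3,)
    ds           : distance between the two cell centers (Delta s)
    """
    dphi_ds = (phi1 - phi0) / ds           # compact directional derivative
    grad_cv = 0.5 * (grad0 + grad1)        # average of the cell gradients
    # Replace the component of the averaged gradient along s_f by the
    # compact-stencil derivative; the tangential components are retained.
    return grad_cv + (dphi_ds - grad_cv @ s_f) * s_f
```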

References

[1] M. Germano, U. Piomelli, P. Moin, W.H. Cabot, A dynamic subgrid-scale eddy-viscosity model, Phys. Fluids A 3 (1991) 1760–1765.
[2] D.K. Lilly, A proposed modification of the Germano subgrid-scale closure method, Phys. Fluids A 4 (1992) 633–635.
[3] P. Moin, K. Squires, W. Cabot, S. Lee, A dynamic subgrid-scale model for compressible turbulence and scalar transport, Phys. Fluids 3 (1991) 2746.
[4] J. Slotnick, A. Khodadoust, J. Alonso, D. Darmofal, W. Gropp, E. Lurie, D. Mavriplis, CFD Vision 2030 study: a path to revolutionary computational aerosciences, NASA/CR-2014-218178, 2014.
[5] H. Choi, P. Moin, Grid-point requirements for large eddy simulation: Chapman's estimates revisited, Phys. Fluids 24 (2012) 011702.
[6] J.W. Deardorff, The numerical study of three dimensional turbulent channel flow at large Reynolds numbers, J. Fluid Mech. 41 (1970) 453–480.
[7] U. Schumann, Subgrid scale model for finite difference simulations of turbulent flows in plane channels and annuli, J. Comput. Phys. 18 (1975) 376–404.
[8] G. Grötzbach, Direct Numerical and Large Eddy Simulation of Turbulent Channel Flows, Encyclopedia of Fluid Mechanics, vol. 6, 1987.
[9] U. Piomelli, J. Ferziger, P. Moin, J. Kim, New approximate boundary conditions for large eddy simulations of wall-bounded flows, Phys. Fluids A 1 (1989) 1061–1068.
[10] J. Lee, M. Cho, H. Choi, Large eddy simulations of turbulent channel and boundary layer flows at high Reynolds number with mean wall shear stress boundary condition, Phys. Fluids 25 (2013) 110808.
[11] E. Balaras, C. Benocci, U. Piomelli, Two-layer approximate boundary conditions for large-eddy simulations, AIAA J. 34 (1996) 1111–1119.
[12] W. Cabot, P. Moin, Approximate wall boundary conditions in large-eddy simulation of high Reynolds number flow, Flow Turbul. Combust. 63 (2000) 269–291.
[13] M. Wang, P. Moin, Dynamic wall modeling for large-eddy simulation of complex turbulent flows, Phys. Fluids 14 (2002) 2043–2051.
[14] S. Kawai, J. Larsson, Dynamic non-equilibrium wall-modeling for large eddy simulation at high Reynolds numbers, Phys. Fluids 25 (2013) 015105.
[15] G.I. Park, P. Moin, An improved dynamic non-equilibrium wall-model for large eddy simulation, Phys. Fluids 26 (2014) 015108.
[16] J. Bodart, J. Larsson, Wall-modeled large eddy simulation in complex geometries with application to high-lift devices, in: Center for Turbulence Research Annual Research Briefs, 2011, pp. 37–48.
[17] S. Kawai, J. Larsson, Wall-modeling in large eddy simulation: length scales, grid resolution, and accuracy, Phys. Fluids 24 (2012) 015015.
[18] I. Bermejo-Moreno, L. Campo, J. Larsson, J. Bodart, D. Helmer, J.K. Eaton, Confinement effects in shock wave/turbulent boundary layer interactions through wall-modelled large-eddy simulations, J. Fluid Mech. 758 (2014) 5–62.
[19] G.I. Park, Wall-modeled large eddy simulation in an unstructured mesh environment, PhD dissertation, Stanford University, Stanford, CA, USA, 2014.
[20] P. Balakumar, G.I. Park, B. Pierce, DNS, LES and wall-modeled LES of separating flow over periodic hills, in: Center for Turbulence Research Proceedings of the Summer Program 2014, 2014, pp. 407–415.
[21] J. Nichols, S. Lele, F.E. Ham, S. Martens, J.T. Spyropoulos, Crackle noise in heated supersonic jets, J. Eng. Gas Turbines Power 135 (2013) 051202.
[22] A. Saghafian, L. Shunn, D.A. Philips, F. Ham, Large eddy simulations of the HIFiRE scramjet using a compressible flamelet/progress variable approach, Proc. Combust. Inst. 35 (2015) 2163–2172.
[23] R. Pecnik, V.E. Terrapon, F. Ham, G. Iaccarino, Full-system RANS of the HyShot II scramjet. Part 1: numerics and non-reactive simulations, in: Center for Turbulence Research Annual Research Briefs, 2010, pp. 57–68.
[24] Y. Khalighi, J.W. Nichols, F. Ham, S.K. Lele, P. Moin, Unstructured large eddy simulation for prediction of noise issued from turbulent jets in various configurations, AIAA Paper No. 2011-2886, 2011.
[25] E. Toro, M. Spruce, W. Speares, Restoration of the contact surface in the HLL-Riemann solver, Shock Waves 4 (1994) 25–34.
[26] R. Verstappen, A. Veldman, Symmetry-preserving discretization of turbulent flow, J. Comput. Phys. 187 (2003) 343–368.
[27] F. Ham, G. Iaccarino, Energy conservation in collocated discretization schemes on unstructured meshes, in: Center for Turbulence Research Annual Research Briefs, 2004, pp. 3–14.
[28] P. Batten, M. Leschziner, U. Goldberg, Average-state Jacobians and implicit methods for compressible viscous and turbulent flows, J. Comput. Phys. 137 (1997) 38–78.
[29] S. Balay, S. Abhyankar, M.F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Knepley, L.C. McInnes, K. Rupp, B.F. Smith, S. Zampini, H. Zhang, PETSc users manual, Technical Report ANL-95/11 – Revision 3.6, Argonne National Laboratory, 2015, http://www.mcs.anl.gov/petsc.
[30] K.W. Thompson, Time dependent boundary conditions for hyperbolic systems, J. Comput. Phys. 68 (1987) 1–24.
[31] J.U. Schluter, X. Wu, S. Kim, S. Shankaran, J.J. Alonso, H. Pitsch, A framework for coupling Reynolds-averaged with large-eddy simulations for gas turbine applications, J. Fluids Eng. 127 (2005) 806–815.
[32] P. Moin, Center for Integrated Turbulence Simulations annual technical report, Stanford University, Stanford, CA, 2008.
[33] J. Bonet, J. Peraire, An alternating digital tree (ADT) algorithm for 3D geometric searching and intersection problems, Int. J. Numer. Methods Eng. 31 (1991) 1–17.
[34] F.M. White, Viscous Fluid Flow, 3rd ed., McGraw-Hill, 2006.
[35] J. Bodart, J. Larsson, Sensor-based computation of transitional flows using wall-modeled large eddy simulation, in: Center for Turbulence Research Annual Research Briefs, 2012, pp. 229–240.
[36] A.J. Wadcock, Investigation of low-speed turbulent separated flow around airfoils, NASA Contractor Report 177450, 1987.
[37] U. Piomelli, E. Balaras, Wall-layer models for large-eddy simulations, Annu. Rev. Fluid Mech. 34 (2002) 349–374.
[38] T. Sayadi, P. Moin, Large eddy simulation of controlled transition to turbulence, Phys. Fluids 24 (2012) 114103.