Adjoint Lattice Boltzmann for topology optimization on multi-GPU architecture

Adjoint Lattice Boltzmann for topology optimization on multi-GPU architecture

Computers and Mathematics with Applications ( ) – Contents lists available at ScienceDirect Computers and Mathematics with Applications journal ho...

1MB Sizes 0 Downloads 105 Views

Computers and Mathematics with Applications (

)



Contents lists available at ScienceDirect

Computers and Mathematics with Applications journal homepage: www.elsevier.com/locate/camwa

Adjoint Lattice Boltzmann for topology optimization on multi-GPU architecture Ł. Łaniewski-Wołłk ∗ , J. Rokicki Institute of Aeronautics and Applied Mechanics, Warsaw University of Technology, Nowowiejska 24, 00-665 Warszawa, Poland

article

info

Article history: Received 22 July 2014 Received in revised form 19 September 2015 Accepted 28 December 2015 Available online xxxx Keywords: Lattice Boltzmann Method Adjoint Optimization

abstract In this paper we present a topology optimization technique applicable to a broad range of flow design problems. We propose also a discrete adjoint formulation effective for a wide class of Lattice Boltzmann Methods (LBM). This adjoint formulation is used to calculate sensitivity of the LBM solution to several type of parameters, both global and local. The numerical scheme for solving the adjoint problem has many properties of the original system, including locality and explicit time-stepping. Thus it is possible to integrate it with the standard LBM solver, allowing for straightforward and efficient parallelization (overcoming limitations typical for the discrete adjoint solvers). This approach is successfully used for the channel flow to design a free-topology mixer and a heat exchanger. Both resulting geometries being very complex maximize their objective functions, while keeping viscous losses at acceptable level. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction The goal of the present work was to develop an optimization framework for a variety of multi-physics fluid flow problems. This framework is intended to provide consistent objective function improvement for any physical model implemented, while the numerical method is expected to allow for straightforward parallelization to achieve peak performance on clusters of modern nVidia CUDA capable GPUs. The optimal design for fluid flow problems has received considerable attention in recent years by both engineers and mathematicians. For a general survey of CFD optimization methods we refer the reader to the book by Mohammadi and Pironneau [1] and to a more recent work of Rozvany et al. [2]. In principle the flow domain optimization problems are solved with two main approaches called shape and topology optimization. The first, more traditional approach, consists in parametrization of the geometry and subsequent finding of the parameters corresponding to the best objective function [1,3]. This limits the range of available shapes as topology cannot change during the optimization process. Nevertheless the approach remains very efficient and accurate in assessing the influence of fine details of geometry. The second approach defines the unknown shape using a global scalar field indicating the presence of fluid or solid at each point of the domain [2]. Different scalar fields are used for this purpose, nevertheless the artificial porosity, introduced by Borrvall et al. [4], seems to be most popular [4–6]. This approach is very well established for solids and structures; for an extensive review we refer the reader to the book by Bendsøe and Sigmund [7]. Discretization of a fluid problem is typically obtained by Finite Volume, Finite Element, or more recently by Discontinuous Galerkin methods. These methods use fine body-fitted meshes, which are difficult to generate for complex 3D geometries.



Corresponding author. E-mail addresses: [email protected] (Ł. Łaniewski-Wołłk), [email protected] (J. Rokicki).

http://dx.doi.org/10.1016/j.camwa.2015.12.043 0898-1221/© 2016 Elsevier Ltd. All rights reserved.

2

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



Moreover these approaches are quite difficult to apply for multiphase flows with variable complex interfaces. In contrast Lattice Boltzmann Methods (LBM) are based on the concept of solving discrete Boltzmann equation on a very fine Cartesian grid. This in principle allows to tackle geometries of extreme complexity (see application of LBM to the filtration problems [8,9]). LBM is proven to converge to incompressible Navier–Stokes equations for the low Mach number limit and in recent years has gained broad recognition as a method for calculation of micro-fluids, multi-phase problems, and flows through porous media [10,8,9]. For a comprehensive introduction to LBM we refer the reader to the book by Succi [11]. The geometry in LBM is described by switching specific nodes of the mesh on and off and applying a so-called Bounce Back boundary condition. This inherent immersed boundary approach is an ideal candidate for topology optimization technique, where very complex shapes are expected to appear. To achieve acceptable efficiency of the optimization algorithm for high-dimensional design spaces, the gradient of the objective function is needed. For this purpose we use a well established adjoint approach [5,12,6,1]. Two main types of this approach exist. The first is a continuous adjoint, which is based on the equation dual to the original form of the problem (e.g., Navier–Stokes equations). The resulting dual equation is discretized, in most cases, by the same scheme as the original (primal) problem [5]. The second approach is a discrete adjoint, which formulates equations dual to the already discretized form of the original problem [13,14]. In the present work we use the latter approach, as it can be easily automated, at the same time providing the accurate value of the computed gradient. Despite advantages, the discrete adjoint exhibits drawbacks related to parallelization. This is because many implementations use Automatic Differentiation (AD) techniques to the whole residual calculation procedure. In most cases AD tools are incapable of handling MPI directives,1 which makes the resulting code inherently serial. This approach is also very memory consuming. In contrast, in the present work we apply AD only to small local procedures, which allows the adjoint code to be both efficient and parallel. The optimization in the context of Lattice Boltzmann Methods was addressed previously by several authors [12,16–25] also with the use of adjoint approach. Adjoint Lattice Boltzmann Method (ALBM) was used for topology optimization by Pingen, Evgrafov, Kirk, et al. [12,16,23], and more recently by Liu et al. [22] and Yaji et al. [24]. Lagrange multiplier approach, mathematically equivalent to the adjoint, was also used for topology optimization by Yonekura et al. [25]. In the same work, the authors use the one-shot approach to speed up the optimization process (interpreting one-shot as ‘‘using transient information’’). Adjoint approach was also applied for a simple optimal control case by Hekmat et al. [20,21] and for the 3D optimal control (aimed at inverse design application) by Krause et al. [18,19]. The latter work presented also an efficient parallelization of the Adjoint Lattice Boltzmann Method solver. The adjoint approach was also considered for parameter identification by Tekitek et al. [26]. In almost all of the above cases, the adjoint formulation was derived by hand for a specific model (with a notable exception of [19]). In contrast the present work provides a fully parallel adjoint formulation for a wide class of Lattice Boltzmann schemes. The approach allows for automatic generation of adjoint formulation for any commonly used boundary conditions, forcing schemes or collision operators of arbitrary complexity. The advantages of such an automated approach vary from simplicity of gradient calculation for complex multi-physics problems, ability to use complex boundary conditions, to maintainability of the code itself. Many of the hand differentiated codes, used widely in commercial solvers, suffer from problem with maintenance, as the adjoint code has to be kept up-to-date with the primal. Our approach greatly reduce these problems, as long as the implemented scheme adheres to a well defined class of numerical schemes with an a priori defined communication pattern (see Section 2.1). To illustrate the efficiency and the ease of implementation, we apply the Adjoint Lattice Boltzmann to the 3D topological optimization in the context of the coupled flow and heat transfer problem. The optimization is taking advantage of a one-shot approach, which lowers further the numerical cost. The obtained optimal heat exchanger has an almost fractal shape, proving that the approach is capable of tackling the complex 3D problems. To simulate complex flows (especially in 3D) with the Lattice Boltzmann Method, an efficient parallel implementation is needed. Recently many researchers implementing LBM are taking advantage of the computing power and the memory efficiency of the Graphics Processing Units (GPU) [27–30] (see Tölke, Krafczyk et al. [27], which is also an example of application of a very rare D3Q13 lattice, for the sake of memory efficiency alone). The detailed description of an example CPU and GPU implementation of the LBM is showed by Schönherr et al. [30]. The paper discusses also a common misconception about the real speed-up of the GPU codes. Misconception arises from comparison with the non-multi-threaded and the non-SIMD-aware CPU implementations. Multi-GPU implementation of LBM is presented in further detail in the works of Obrecht, and Kuznik et al. [28,29]. To obtain the highly optimized parallel code for the GPU architecture, the present work uses the concept of automatic code generation. This concept was used already by the present authors to generate the GPU-tuned code for a wide variety of Lattice Boltzmann schemes, ranging from simple incompressible Navier–Stokes simulations [9], complex 3D LES simulation, to multi-phase [31,32] and electrophoretic micro-flows. Such code exhibits high parallel performance and scalability. These properties are preserved also for the adjoint case, provided the general local formulation is used. The performance tests for the present case are analysed in more detail in Section 4.4. The problem of low-Reynolds number mixing was addressed earlier with both Finite Element Method (FEM) by Andreasen et al. [6], and with LBM by Makhija et al. [17], however the results of the latter work are limited to the 2D 1 More modern AD software can in principle cope with MPI directives [15], but the process is still not completely automated and may produce a very inefficient code.

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



3

channel flows and represent low topological complexity. The case of topology optimization of a heat exchanger or heat-sink geometry was the subject of an extensive Ph.D. thesis by Lee [33]. Simplified problems were also considered by Dede [34] using FEM and McConnell et al. [35] using Lattice Boltzmann (a pseudo-3D case). For all of the above, the computational cost and the complexity of the adjoint derivation were a strong limit on the applicability, thus the works focus on 2D cases and relatively simple geometries. The presented work overcomes these limits, by the use of a generalized local formulation of the ALBM and by the use of highly scalable solver. In Section 2 we describe a framework for a class of numerical LBM schemes and provide for them an adjoint formulation. In Section 3 we show details of the current LBM presenting: (i) the topology parametrization, (ii) the optimization algorithm and (iii) the implementation for the CUDA architecture. In Section 4 we formulate two distinct topology optimization problems and analyse the results obtained. We also investigate the parallel performance of the code, considering both the weak and the strong scaling. The two test cases deal with a low Reynolds number mixer and a free-topology heat exchanger. The final Section presents the conclusions. 2. Local Lattice Boltzmann scheme and adjoint formulation To address a wider class of Lattice Boltzmann Methods as well as to preserve important parallelization properties in the adjoint formulation, we define a subclass of Local Lattice Boltzmann Schemes (LLBS). The procedure to obtain an adjoint formulation can be applied to any Lattice Boltzmann Method that fits into the proposed subclass. The main features of LLBS are the locality of the collision operator and the reversibility of the streaming. These constraints still allow to have a different collision operators for each lattice node, provided that the operator does not use information from the surrounding nodes (apart from the streamed information). Most Lattice Boltzmann schemes, including single and multiple relaxation time schemes, fit into this framework. Many other schemes, including these for multi-phase flows, can be suitably reformulated for this framework by adding additional densities. 2.1. Local Lattice Boltzmann scheme For the purpose of generalization we propose the following mathematical framework. Let us define lattice L as a finite set L = {0, . . . , Lx − 1} × {0, . . . , Ly − 1} × {0, . . . , Lz − 1}. For a fixed M we define density space RM , with its elements describing the physical state at each node. The complete state of our system is expressed by the states at all the lattice nodes f : L → RM . The linear space of all possible states will be called a state space H = {f : L → RM }. Let us also define a parameter space P ⊂ Rp describing the allowable range for the optimization variables α ∈ P. The Lattice Boltzmann method is characterized by complete separation of communication between nodes (called streaming) and calculations at each node (called collision). This feature allows for very efficient massive parallelization and nearly linear scalability [36]. At each lattice node x ∈ L, the streaming is defined as:

(Sf )j (x + ej ) = fj (x) for j = 1, . . . , M where ej denotes the shift of information related to the jth density from the node x to the neighbouring node x + ej . The shift x + ej is assumed to be periodic in all directions (further on, the increment/decrement shift will be always mean addition/subtraction modulo Lx , Ly and Lz ). This periodicity of the streaming operator is vital for the simplicity of the adjoint formulation, yet by no means limits the generality of the approach. The collision operator W x (f (x), α) is defined for each node x of the Lattice, its first argument denotes the set of densities, while the second the possible dependence on all optimization parameters α ∈ P. In general we may have a separate collision operator W x : RM → RM for each node of the lattice x ∈ L. A single step n of the Local Lattice Boltzmann iterative scheme is finally defined as a composition of streaming and collision operators: fjn+1 (x + ej ) = Wjx (f n (x), α) for j = 1, . . . , M .

(1)

It has to be stressed again, that W x depends only on the densities at a given node, but not on the densities at the neighbouring nodes. The stationary solution fˆ forms a fixed point of the iterative scheme (1): fˆj (x + ej ) = Wjx (fˆ (x), α).

(2)

It should be noted that this stationary solution fˆ implicitly depends on all parameters α . It will be assumed now, that the objective function, we want to optimize has the following general form: F (α) =



F x (fˆ (x))

(3)

x

where the node dependent function F x can be quite arbitrary. Other forms of (3) are possible, yet this formula seems sufficient to represent typical optimization problems in hydrodynamics.

4

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



2.2. Adjoint Lattice Boltzmann scheme Before we go any further, we need to make a simple observation, based on the periodicity of the shift operator S.

(x + ei ). From which it follows that for any a, b ∈ H:   ai (x + ei )bi (x + ei ) and ai (x)bi (x + ei ) = ai (x − ei )bi (x).

Lemma 1. For any f ∈ H, we have



ai (x)bi (x) =



x ,i



x,i fi

(x) =



x,i fi

x ,i

x ,i

It can be noted that, for a scalar product ⟨a, b⟩ =



x,i

x ,i

ai (x)bi (x) defined on H, this lemma states that the streaming operator

S is unitary, and we have both ⟨a, b⟩ = ⟨Sa, Sb⟩ and ⟨a, Sb⟩ = ⟨S −1 a, b⟩. Now we can introduce the adjoint Lattice Boltzmann equation and prove the required properties. Theorem 1. If fˆ ∈ H is a fixed point solution of the Local Lattice Boltzmann equation (2), the objective function F is defined by (3) and v ∈ H is a solution of the equation:

vk (x − ek ) =



vj (x)

j

∂ Wjx ∂ fk

(fˆ (x), α) +

∂F x ˆ (f (x)) for k = 1, . . . , M ∂ fk

(4)

then the gradient of the objective function can be calculated as

 ∂ Wjx ∂F vj (x) = (fˆ (x), α). ∂α ∂α x ,j

(5)

By (4) we have defined the fixed-point adjoint Lattice Boltzmann equation. It is important to notice, that v is of the same size as fˆ , while the number of equations in (4) is independent of the number of parameters α . Solved once, it can be used to calculate gradient of the objective function with respect to any number (and any type) of parameters by using the formula (5). Proof. Let us differentiate equation (2) with respect to α :

 ∂ Wjx ∂ Wjx ∂ fˆj ∂ fˆk (x + ej ) = (fˆ (x), α) (x) + (fˆ (x), α). ∂α ∂ fk ∂α ∂α k

(6)

∂ fˆ

Now let us multiply Eq. (4) by ∂αk (x) and sum it over all indices k:

 k

   ∂ Wjx   ∂F x ∂ fˆk ∂ fˆk ∂ fˆk ˆ v j ( x) vk (x − ek ) (x) = (f (x), α) (x) + (fˆ (x)) (x). ∂α ∂ f ∂α ∂ f ∂α k k k j k

The bracket is now substituted by Eq. (6):

 k

    ∂F x ∂ Wjx ∂ fˆk ∂ fˆj ∂ fˆk ˆ v j ( x) ( x + ej ) − (f (x), α) + (fˆ (x)) (x) vk (x − ek ) (x) = ∂α ∂α ∂α ∂ f ∂α k j k

and both sides are summed over all lattice nodes:



vk (x − ek )

x,k

   ∂F x ∂ Wjx ∂ fˆk ∂ fˆj ∂ fˆk (x) = v j ( x) ( x + e j ) − vj (x) (fˆ (x), α) + (fˆ (x)) (x). ∂α ∂α ∂α ∂ fk ∂α x ,j x,j x ,k

Finally, using Lemma 1, one obtains:

 x ,j

vj (x)

∂ Wjx ∂α

(fˆ (x), α) =

 ∂F x x ,k

which concludes the proof.

∂ fk

(fˆ (x))

∂ fˆk ∂F (x) = ∂α ∂α



We propose a numerical scheme in which v(x) is obtained by explicit iterations based on Eq. (4):

vkn+1 (x − ek ) =

 j

vjn (x)

∂ Wjx ∂ fk

(fˆ (x), α) +

∂F x ˆ (f (x)). ∂ fk

(7)

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



5

We will call this scheme the Adjoint Lattice Boltzmann Scheme (ALBS). It has clearly the same form as the LLBS iteration formula (1). However the streaming of v acts in the opposite direction, while the corresponding adjoint collision operator is defined as:

kx (v, f , α) = W

 j

vj

∂ Wjx ∂ fk

(f , α) +

∂F x (f ). ∂ fk

(8)

3. Implementation of the Lattice Boltzmann method In this study a set of 19 vectors e is used for 3D flow simulations [37]: e1 = [0, 0, 0] e2 = [1, 0, 0]

e3 = [−1, 0, 0]

e8 = [1, 1, 0]

e9 = [−1, 1, 0]

e14 = [1, 0, −1]

e15 = [−1, 0, −1]

e4 = [0, 1, 0]

e5 = [0, −1, 0]

e6 = [0, 0, 1]

e7 = [0, 0, −1]

e10 = [1, −1, 0] e11 = [−1, −1, 0]

e12 = [1, 0, 1]

e13 = [−1, 0, 1]

e16 = [0, 1, 1]

e18 = [0, 1, −1]

e19 = [0, −1, −1].

e17 = [0, −1, 1]

For the heat transfer equation we use the first 7 of the above. For the 2D cases the number of vectors is reduced to 9 for both the fluid flow and the heat transfer. In LLMS the local collision operator plays a different role at the different nodes of the lattice. It is both responsible for fluid dynamics at the interior nodes, as well as for enforcing the boundary conditions. Therefore three types of nodes can be distinguished: (i) interior nodes, (ii) inlet/outlet nodes, (iii) wall nodes. For inlet/outlet nodes we use simple boundary conditions by Zou and He [38], while for the wall nodes we use bounce back condition [11]. For the interior nodes we use a Forced Multiple Relaxation Time (FMRT) collision. 3.1. Forced multiple relaxation time collision For a classic LB scheme, a simple (BGK) collision operator is used:

W bgk (f ) = f + ω feq − f





(9)

where feq are the equilibrium densities depending on the macroscopic quantities, while ω1 is called the over-relaxation time. This over-relaxation time is directly related to the modelled viscosity coefficient ω1 = 0.5 + 3ν (1 ≤ ω < 2). For stabilization of the scheme an additional transformation was introduced [37]. This approach applies a linear transformation of the density set f , projecting it to a moment space (corresponding to geometric moments of the velocity set). Different relaxation factors are used in each direction of the moment space. Let U be a matrix of this transformation (for 3D flow simulations we use the matrix proposed in [37]). We define the moment projection as m = Uf . The analogue for the BGK operator (9) in the moment space can be written as:

W m (m) = m + T meq − m





(10)

where T is a diagonal matrix of relaxation factors. Multiplying both sides of (10) by U −1 we get MRT collision

W mrt (f ) = f + U −1 TU feq − f





in most cases it is less computationally expensive to calculate meq than feq . This is why an intermediate form is further used:



W mrt (f ) = U −1 meq + (I − T ) Uf − meq





for our applications it is important to change the global quantities during the iteration process in order to:

• enforce certain values like velocity, temperature, etc. at specific points, • optionally add a forcing term e.g. mass force or heat source. eq

eq

We introduce these changes by differentiating between the pre-collision fpre and post-collision fpost equilibria. This defines the Forced Multi Relaxation Time (FMRT) collision:



pre W fmrt (f ) = U −1 mpost eq + (I − T ) Uf − meq



pre



(11)

where the pre-collision equilibrium feq is based on the global variables (like u and ρ ) of the density set f prior to collision post while feq is based on the desired post-collision values. The FMRT can be seen as a natural extension of the Exact Difference Method [39] of discretization of the body forces in LBM. To illustrate this, let us assume that f eq depends on the velocity u

6

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



Fig. 1. The influence of the cut-off level (threshold) on the objective in the mixing test case.

and the density ρ . We can apply a body force R and integrate the dynamics of u with a simple Euler step: upost = upre + Rdt. Applying this to the FMRT scheme we get:

W fmrt (f ) = feq (upre + Rdt, ρ) + U −1 TU f − feq (upre , ρ) .





Let us now discard the suffix of u to obtain:





W fmrt (f ) = feq (u + Rdt, ρ) − feq (u, ρ) + W mrt (f ) where the first bracket denotes the Exact Difference discretization of the force R. This approach can be used for any forcing terms, including: the heat flux, the solvent source as well as these which enforce the prescribed velocity or temperature at the selected points. Several forcing schemes can be found in the literature, including the method used in the Shan–Chen model proposed in the original work of Shan and Chen [40]. For a comprehensive comparison of different methods of discretization of the forcing term in the Lattice Boltzmann we refer the reader to the work by Huang et al. [41], which compares the Exact Difference Method with the other forcing schemes. 3.2. Topology parametrization In order to implement the topology optimization with the classic optimization algorithms, one has to parametrize the design space. For this purpose at each node of the lattice L we define a parameter w , which refers to interpolation between a fluid node (w = 1) and a solid node (w = 0). The material properties such as heat diffusivity are obtained by interpolation β = wβfluid + (1 − w)βsolid . In order to represent different dynamics in the solid and in the fluid regions, the most common method is to add a fictitious Darcy body force R = −K (w)u [5,6]. In practical implementation we use: upost = upre − dtK (w)upre = G(w)upre where switching function G assumes value G(1) = 1 (no additional Darcy force exists in the fluid part) and G(0) = 0 (corresponds to upost = 0 in the solid part). The particular selection of function G (or K ) can significantly affect the convergence of the optimization process, nevertheless this issue is not discussed here being a subject of a separate study. The common choice is a power function: K (w) = Kˆ w θ with θ = 3 [6,12]. The final, optimized geometry should consist of either purely solid (w = 0) or purely fluid (w = 1) nodes. In this way a particular form of G becomes unimportant for the evaluation of the final result. To reduce the number of nodes with the intermediate values of w (between 0 and 1) during the optimization, we introduce a quadratic penalty function P: P =



w(1 − w).

(12)

nodes

This penalty is gradually added to the main objective function to force the optimizer to avoid non-physical values of w . To ensure that the final geometry has a physical meaning at a final stage of optimization, every w that is below a threshold η is set to 0, while w above η is set to 1. Finally the threshold level η giving the best objective function value is selected from the interval [0, 1]. The example how the objective function depends on the threshold level is presented in Fig. 1. 3.3. Optimization For the optimization purposes two methods are used in the present paper. The first is the Method of Moving Asymptotes (MMA) proposed by Svanberg [42]. In this approach, at each iteration of the optimization algorithm:

• • • •

the primal solution is iterated until convergence the adjoint scheme is iterated by the same number of iterations the objective function and the gradient are evaluated the optimization parameters are updated by MMA.

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



7

Fig. 2. Scheme of code generation and compilation in the CLB code.

The second method is a simple descent algorithm. The optimization problem is solved simultaneously with the primal and adjoint problem using a one-shot framework [13,43]. A single iteration consists of:

• one iteration of the primal problem • one iteration of the adjoint problem • one update of the parameters: αin+1 = αin + ζ

∂ F ∂αi

(13)

∂ where ∂α F denotes the present estimate of the gradient from the actual adjoint solution. The sufficiently small parameter i ζ is chosen to suppress possible oscillations (caused by the delayed response).

3.4. Programming and parallel implementation Both the primal and the adjoint Lattice Boltzmann schemes are solved by the in-house solver CLB. This solver is highly parallel and tuned for nVidia CUDA architecture. It is also capable of running on multiple GPUs by using MPI directives. To achieve high performance and to allow for automatic adjoint formulation, we have used special code generation techniques. The code is generated for every model separately. The model is described by a set of tables defining:

• • • •

Densities—Names of the densities and corresponding streaming vectors (e.g., fi and ei in (1)) Settings—Names of global settings that influence the model (e.g., ν ) Globals—Names and properties of the global integrals (e.g., Volume Flux, Heat Flux) Quantities—Names of the macroscopic quantities that are exported by the model (e.g., u, ρ , T ).

Based on this description and on the generic text templates, all the source codes are generated with a in-house software called R-Template [44]. R-Template is a generic open source template tool build on top of R programming language [45]. The model collision operator W is defined as a function in a C source file, operating on the defined densities. As the code includes matrix multiplications and other repeatable operations, it is written partly using the symbolic algebra. These symbolic formulas are subsequently expanded by the R-Template tool. For the adjoint formulation, the model information is augmented with the adjoint densities. Then the model’s dynamics is differentiated using Automatic Differentiation (AD) tool TAPENADE [46] (for reverse differentiation of the operator W ). The obtained code is subsequently used in implementation of Eq. (8). The complete code generation and compilation procedure are presented in Fig. 2. 4. Results To verify the performance of the proposed optimization approach, two test cases were considered. The first test case maximizes the low Reynolds number mixing. The second test case concerns coupled heat and mass transfer, in which the

8

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



Table 1 Summary of the setting in both test cases. Case Mixing Heat exchanger

Lattice size

No. of nodes

DoF

450 × 50 × 50 352 × 50 × 50

1.12 × 10 0.88 × 106

29.2 × 10 22.8 × 106

6

6

Span of the design region

1p

ν

αfluid

αsolid

61–390 101–252

0.016666 0.03

0.02 0.01

0.003 0.003

0.003 1

Table 2 Summary of the results. SD—Standard deviation; Opt—Optimized with the one-shot method; MMA—Optimized with the MMA algorithm; T—threshold applied, resulting in a binary (w = 0 and w = 1) geometry; P—penalty P (see (12)) gradually added to the objective function; Fins—Optimal fin heat exchanger. Case

Temperature

Topology parameter w in design region

Mean

SD

Re

Mean

SD

0

49.266 114.508 111.863 113.381 110.383

0.051 0.050 0.050 0.050 0.050

0.031 0.030 0.030 0.031 0.030

127 125 125 125 125

0.000 0.041 0.041 0.000 0.000

0.763 0.068 0.175 0.094 0.200

2.376 25.202 33.362 27.883 31.032 28.226 30.955 29.447

0.051 0.030 0.034 0.032 0.025 0.026 0.027 0.029

0.029 0.017 0.020 0.017 0.009 0.013 0.010 0.014

225 150 170 160 125 130 135 145

0.020 0.367 0.424 0.378 0.533 0.467 0.502 0.446

0.049 0.092 0.101 0.062 0.050 0.065 0.051 0.060

Objective

Mixer

Empty MMA MMA + T Opt Opt + T

Exchanger

Empty Fins MMA MMA + T Opt Opt + T Opt + P Opt + P + T

Velocity

(0,0.9)

Iterations

[0.9,0.99)

[0.99,1)

1

2% – 3%

– 1% – – –

– 6% – 6% –

– 6% – 7% –

100% 87% 98% 87% 97%

0.04 × 106 6 × 106 – 1 × 106 –

– 16% 10% 25% 25% 40% 31% 32%

– – 14% – 15% – <1% –

– – 7% – 3% – 2% –

– – 4% – 9% – 9% –

100% 84% 65% 75% 48% 60% 57% 68%

0.03 × 106 0.03 × 106 3.5 × 106 – 0.4 × 106 – 0.4 × 106 –



<1%

convected heat is maximized. In both cases the new shapes are created in the channel with a square cross-section. The topology optimization algorithm generates these shapes in a selected volume of this channel (called a design region). The fluid flow is simulated using a standard 19-density 3D MRT [37], while for the heat transfer a 7-density single relaxation time BGK is used. The temperature field is always a passive scalar and does not influence the fluid flow. The walls of the channel are adiabatic with a no-slip boundary condition for the velocity. At the inlet and at the outlet the standard Pressure Inlet/Outlet boundary conditions by Zou and He [38] are imposed. The pressure at the outlet is set to the reference level of 0 (corresponding to the LB density 1), while the pressure at the inlet is set equal to the desired pressure drop. The additional constraint was imposed on the inlet, limiting (out of stability reasons) the velocity to 0.05. During the optimization process in both cases the Reynolds number remains in the range of 100–250, being strongly dependent on the shape of the generated geometry (see Table 2 for the values for specific geometries). Summary of the settings for both cases can be found in Table 1. All simulations were performed with an unsteady LBM solver and were checked for the potential unsteady behaviour of the solution. In the discussed optimization runs only geometries that allow for a steady-state flow solution were found. For higher Reynold number one can expect the optimal geometry to generate complex unsteady flows as they may intensify mixing or heat transfer. 4.1. Low Reynolds number mixing The problem of low-Reynolds number mixing was considered by Gersborg-Hansen, Andreasen, Sigmund et al. [47,6] with the use of FEM, and by Makhija et al. [17] with LBM. The present work uses a fairly similar set-up to the case presented by Andreasen [6], but instead of a constrained optimization problem, we use an objective function taking the mass flux into account. The mixing test case consists of a square channel of the height D and the length 9D. The design region of length ∼ 6.5D is located in the middle section of the channel. At the inlet a discontinuous temperature distribution is prescribed (T = −1 for y < 25 and T = 1 for y ≥ 25—see Fig. 3(a)). As the heat conduction mimics the diffusion, the evolution of temperature can be used to represent the mass mixing process. To evaluate the quality of mixing we use a flux of an award function:



(u · n)(1 − T )(1 + T )dS =

Mixing = outlet



(u · n)(1 − T 2 )dS .

(14)

outlet

Similar objective function was considered by Makhija et al. [17] for a two component flow (with ρ1 = 1+2 T and ρ2 = 1−2 T ) and Andreasen [6] (without u · n, which accounts in our case for the mass flux). This simple award function brings a balance between the low variance of temperature and the high volume of the flow. The pressure drop of 1p = 0.016666 was set between the inlet and the outlet. The viscosity was taken as ν = 0.02, while both the solid and the fluid heat diffusivity coefficients were set to αsolid = αfluid = 0.003 (see Table 1).

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



9

Fig. 3. Temperature on (a) inlet of the channel (b) outlet of the channel without any obstacles and (c) outlet of the channel with the optimal geometry.

If there is no obstacle in the channel, the temperature profile is evolving by simple heat conduction. The resulting distribution of temperature at the outlet is illustrated in Fig. 3(b). This particular distribution has a standard deviation of σ = 0.763 (see Table 2) while the value of the objective function (14) is around 50. The convergence of the one-shot optimization procedure can be observed in Fig. 4. The final value of the objective is around 113 (the standard deviation of the temperature distribution is σ = 0.094). The resulting geometry consists mostly of nodes with values of w > 0.9. As the diffusivity coefficients of the solid and the fluid are the same and the value of w = 0.9 seems sufficient to completely stop the flow, the sensitivity of the objective to the parameter w becomes negligible. Application of the threshold, marginally reduces the objective (from 113 to 110). Dependence of the objective on the threshold level is shown in Fig. 1. The standard deviation of the temperature has risen to σ = 0.200. The distribution of temperature at the outlet for the optimized geometry is illustrated in Fig. 3(c). A stronger influence of the threshold procedure will become visible for the case of the heat exchanger, for which the thermal conductivity of the solid is much higher than that of the fluid. The final geometry of the optimized shape together with the temperature contour plots in the channel are presented in Fig. 5. Though the obstacles seem to float freely, the generated geometry is separated from the boundary only by a thin gap. The flow in these gaps is virtually stopped. The sensitivity of the objective to the presence of the solid in the gap is negligible, thus the gaps are never filled by the optimizer. These gaps can be filled in the post-processing step, which will leave only one free-floating element of the geometry. More detailed discussion on the subject of post-processing can be found in Section 4.3. The MMA optimization gives just slightly better results then the one-shot simple descent algorithm (with the objective equal to 114.5 before, and 112 after applying the threshold, see Table 2). On the other hand the MMA approach needs around 6 million of the internal LBM solver iterations, whereas one-shot method needs around 1 million. This difference occurs, because in the MMA optimization we iterate both the solution and the adjoint until the convergence is obtained in each optimization iteration. In the one-shot approach we use the imprecise, unconverged adjoint solution for the update, saving a lot of iterations at the early stages of the optimization. The one-shot approach can be successfully applied together with many optimization algorithms, nevertheless, in all cases a judiciously selected convergence/stopping criterion is necessary for the method to work properly [13,43]. If such a parameter is properly selected, the computational cost of the one-shot optimization is comparable to the cost of primal calculation [43]. In principle the MMA can also be coupled with the one-shot approach, nevertheless this was not implemented in the present study. 4.2. Heat exchanger The goal of this test case is to intensify the heat exchange between a heating plate and the cooling fluid in a low Reynolds number channel flow. The problem of low-Reynold number heat exchanger or heat-sink geometry was considered previously in a Ph.D. thesis by Lee [33], and a work of Dede [34] using FEM. The latter used the COMSOL Multi-physics package. McConnell et al. [35] used a method based on LBM to solve a flat case (pseudo-3D) of heat exchange in a square compartment with an evenly distributed heat source. As these works do not establish any common benchmark case, we introduce a simple optimization problem set-up to test our approach. This test case consists of a square channel of the height D and the length 7D (see Fig. 6). In the middle of the channel, a square D × D heating plate is located at the bottom wall. The design region is the middle section of the pipe (3D). The heating plate has a fixed temperature of T = 1, while the fluid at inlet has T = 0. The goal is to increase the heat flux convected at the outlet.



(u · n)TdS .

Heat flux ∼ outlet

(15)

10

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



Fig. 4. Convergence of the optimization for the free-topology mixer. Descent speed denotes the speed ζ in the steepest descent algorithm (see 3.3).

Fig. 5. Optimal free-topology mixer, generated by the one-shot algorithm. Geometry of the mixer and the contour plots of the temperature in the consecutive crossections of the channel.

Fig. 6. Heat exchanger test case geometry.

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



11

Fig. 7. Objective function for a heat exchanger consisting of a set of parallel fins.

The pressure drop of 1p = 0.03 was set between the inlet and the outlet. The viscosity coefficient was assumed as ν = 0.01, while the solid and the fluid heat diffusivity coefficients were taken as αsolid = 1 and αfluid = 0.003 respectively (Prandtl number = 3.33). If there is no obstacle in the channel, the heat exchange is marginal giving value of the objective function (15) of around 2.4 and the mean temperature of 0.02. To make a fair comparison of the performance of the optimized geometry, a series of simple geometries were first investigated. Geometries consisting of one to seven evenly spaced parallel vertical fins of width of 3–10 lattice nodes were tested with the same flow settings. An additional constraint was adopted to preserve a gap of at least 3 lattice nodes between the fins. The results for all fin geometries can be seen in Fig. 7. As also reported by McConnell et al. [35], the function of heat flux has a clear optimum with respect to the number of fins. The best heat transfer was achieved for 6 fins of width 4 (see Table 2). For such configuration the value of the objective is 25.2. In contrast the one-shot topology optimization algorithm gives a result of around 31 (see Table 2). This behaviour is caused by a region of w > 0.9 acting as a porous obstacle on the top of the resulting geometry. Such values of w act as a fictitious porous medium, with high heat conductivity. This in turn results in a higher value of the objective. Nevertheless such objective value is non-physical and some selected threshold level has to be applied for the value of parameter w . When a low threshold value is applied, this region is blocked, reducing the flux. In the case of high threshold value, there is no obstacle in this region, causing the flow to pass by quickly, without being heated. Finally even for the optimal threshold level, the objective value is reduced to 28 (see Table 2). To further improve the quality of the result, with respect to the intermediate values of w , an additional penalization is used. This penalization consists in adding the function P (12) to the objective, and exponentially increasing the weight of this function in the overall objective. As can be observed in Fig. 8 the penalization does not reduce the true objective function significantly. On the other hand the function P is reduced efficiently, and the resulting geometry is more binary. Finally we loose less of the performance by applying the threshold than we did in the case of optimization without penalization (compare Opt and Opt + T to Opt + P and Opt + P + T in Table 2). The optimized geometry is shown in Fig. 9. The increase in the objective function is 17% and in the mean temperature 21%, compared to the best fin geometry. When compared to the empty channel, the objective is 12 times higher, and the mean temperature is 22 times higher. The resulting geometry is complex and fractal-like. Such tendency is to be expected in a heat exchange problem, where the surface area of the solid structure should be maximized (to increase the heat flux). Similar complex shapes are reported in other works on topology optimization of heat exchangers with, or without flow [34]. The MMA optimization results in even better result of the objective function of around 33, but this value is reduced to 27.8 after applying the threshold. As in the earlier case, the one-shot optimization outperforms the MMA approach. Convergence of the MMA optimization is shown in Fig. 10. The number of iterations needed for a convergence of the solution (without the optimization) is around 40 thousand, for the one-shot optimization (with penalization) it is 0.4 million, and for the MMA optimization it is around 3.5 million. Additionally there is no simple method of adding the penalty gradually in MMA, which can prevent the drop of the objective function caused by the application of the threshold.

12

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



Fig. 8. Convergence of the optimization of free-topology heat exchanger with the one-shot algorithm. Penalty denotes the weight of function P (see (12)) in the objective function.

Fig. 9. Geometry of the free-topology heat exchanger. (a) front and (b) back.

4.3. Practical implementation The geometries generated by the presented topology optimization methodology are still far from a final industrial design. Rugged edges, fractal-like structures, free-floating elements, make the generated shapes impractical, if used directly as they are. The next design step would therefore consist in post-processing of the geometry. This step would be determined by the method of manufacturing used for the final product. The limitations imposed by manufacturing can also be taken into account at the earlier stages, either by modification of the objective function or by introduction of specialized geometry parametrization. This latter method is quite difficult to implement in the general case, but the presented generic adjoint formulation allows for a wide class of such parameterizations. Nevertheless the recent advancement in rapid prototyping techniques, such us 3D printing, or Electron Beam Melting (EBM), suggests that the use of more complex geometries (less simplified) will become a viable option in a near future. The current paper presents therefore a generic framework for topology optimization using Lattice Boltzmann and demonstrates that without any additional smoothing or artificial constraints on the volume, the algorithm can create optimal geometries. These geometries are solution to a simplified and geometrically unconstrained problem (what we call a free-topology problem).

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



13

Fig. 10. Convergence of the optimization of the free-topology heat exchanger with the MMA algorithm. In each iteration of MMA the algorithm (separated by vertical lines) the solution of the primal and the adjoint problem is iterated until convergence (see 3.3).

4.4. Parallel performance Parallel performance of the final code was investigated in a series of numerical experiments. The performance of any parallel code is strongly dependent on the size of the problem and the ratio between the calculations and communication. The latter however is directly related to the ratio of volume to surface of the part of the problem that each processor/GPU is calculating. We distinguish two types of scaling: weak and strong. Weak scaling measures the speedup of the code, if the size of the subproblem assigned to each processor is the same and fixed. Strong scaling on the other hand measures the speedup with a fixed size of the overall problem—causing the problem to be divided to smaller parts for higher number of processors. In most cases, as the size of the subproblems decreases, the communication starts to dominate the calculations. Such system reaches a saturation level at which adding new processors/GPUs, does not improve the performance. We have selected three meshes:

• ‘‘3M’’—320 × 100 × 100—3.2 × 106 elements—83 × 106 DoF • ‘‘400K’’—160 × 50 × 50—400 × 103 elements—10 × 106 DoF • ‘‘12M’’—320 × 200 × 200—12.8 × 106 elements—332 × 106 DoF. The calculations were performed on 1–6 nodes with 1–8 GPUs per node (1–48 GPUs in total) of ZEUS supercomputer at the Cyfronet computing center. Each computational node of the cluster consists of 2 Intel Xeon E5645 processors and 8 Nvidia Tesla M2090 GPU units. Weak scaling was investigated for ‘‘3M’’ and ‘‘400K’’ meshes, by running the CLB code for a mesh size proportional to the number of GPUs (in this case 3.2 × 106 and 400 × 103 elements per GPU). The strong scaling was checked for ‘‘3M’’ and ‘‘12M’’ meshes, by running the CLB code, for a fixed size mesh. The results are presented in Fig. 11. All calculations were performed in a single precision. The memory requirements of the flow simulation with heat transfer and adjoint calculation on the ‘‘12M’’ mesh is slightly above the memory limit of a single Nvidia Tesla M2090. The results clearly demonstrate that the code has nearly linear weak scaling for 3.2 × 106 elements per node. Strong scaling saturates at 15 GPUs for the ‘‘12M’’ mesh, and at 10 GPUs for the ‘‘3M’’ mesh. The interesting feature of good parallelism of the proposed adjoint formulation is that the adjoint calculations are closer to the linear scaling, than the primal calculations. This is caused by a average higher number of operations per node, while keeping the communication at exactly the same level. The ratio between the speed of the adjoint and the primal calculations is consistently around 2.65, which is in good agreement with theoretical estimates of 2–3 (as derivative of one multiplication is two multiplications and one addition). 5. Conclusions In the present work we have shown an efficient technique for addressing topology optimization in complex mesoscopic flow problems. The approach is based on solving the primal problem using the Lattice Boltzmann Method, while the sensitivities are obtained by the adjoint approach. The presented generalized local adjoint formulation for LBM allows to construct and efficiently solve the adjoint problem for a range of different LBM models, boundary conditions and forcing schemes.

14

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



Fig. 11. Weak and strong scalings of Primal and Adjoint calculations in CLB code.

The optimization process can be carried out either with the one-shot steepest descent method, or with the Method of Moving Asymptotes. This approach was successfully used for the optimization of a free-topology mixer and a heat exchanger (see Table 2). The resulting optimal 3D geometry has a very complex, almost fractal shape. Such complex shapes have to be considered not as a final design, but as a starting point for further design process. Such process should include feature size control, shape parametrization, shape optimization and finally analysis of robustness. Parts of the generated geometries are not attached to the walls of the channel. Other elements are very small or thin. Such features are impossible to reproduce in real applications, but can serve as a starting point of a design process, pointing in the optimal direction. Additionally further research is needed to assess the possibility of using a case-specific topology parametrization, which would take into account the selected manufacturing process of such geometry. One can easily imagine a parametrization, which would allow only extruded or only milled geometries. The one-shot optimization algorithm was shown to outperform the MMA algorithm in both cases. The main drawback of the presented one-shot approach (and a common drawback of many one-shot approaches) is the selection of parameter ζ . This parameter was selected by the trial-and-error, method, which is by no means general. Additionally, apart from the Augmented Lagrangian method, it is not clear how to introduce constraints into the presented one-shot approach. On the other hand the MMA algorithm provides consistently good results without the need to select arbitrary parameters. Additional research is thus necessary for improving the performance of the MMA loop. With application of any gradient based algorithm, a usual question of global minimum arises. The algorithm cannot guarantee that the design is not just a local optimum. As most of the global optimization algorithms cannot cope with a high number of parameters, the topology optimization methods cannot make use of them. Starting the gradient optimization algorithm from a set of different initial designs is a method of choice for investigating the stability of the position of the minimum with respect to the initial data. We did not investigate the question of this stability, as the goal of the proposed method was to provide optimized, or improved solution in an automated and efficient way. The question of global optimality of such complex designs as shown in Fig. 9 is very hard to address and is far out of the scope of present research. We have demonstrated that the code has nearly linear weak scaling, while the strong scaling saturates at 10–15 GPUs. This high efficiency was enabled by application of various code generation and parallel programming techniques. The proposed adjoint problem itself has even better parallel properties, keeping linear scalability even for smaller meshes. Overall, we have found the Adjoint Lattice Boltzmann to be a powerful technique for topology optimization in mesoscopic multi-physics flow problems. Application of this technique for the inverse design, for the optimal control or for the time-dependent problems, requires further research. Acknowledgements The work has been financially supported by the Polish Ministry of Science and Higher Education, research grant no. N N507 273636, as well as by the PL-Grid Infrastructure. Most of the simulations were performed on the ZEUS cluster at Cyfronet computing center (https://kdm.cyfronet.pl/portal/Zeus_GPGPU).

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



15

References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23]

[24] [25] [26] [27] [28] [29] [30] [31] [32]

[33] [34] [35]

[36] [37] [38] [39] [40] [41] [42]

B. Mohammadi, O. Pironneau, B. Mohammadi, O. Pironneau, Applied Shape Optimization for Fluids, Vol. 28, Oxford University Press, Oxford, 2001. G.I.N. Rozvany, T. Lewinski, Topology Optimization in Structural and Continuum Mechanics, Springer Science & Business Media, 2013. A. Jameson, Aerodynamic shape optimization using the adjoint method, Lectures at the Von Karman Institute, Brussels. T. Borrvall, J. Petersson, Topology optimization of fluids in stokes flow, Internat. J. Numer. Methods Fluids 41 (1) (2003) 77–107. http://dx.doi.org/10.1002/fld.426. C. Othmer, A continuous adjoint formulation for the computation of topological and surface sensitivities of ducted flows, Internat. J. Numer. Methods Fluids 58 (8) (2008) 861–877. C.S. Andreasen, A.R. Gersborg, O. Sigmund, Topology optimization of microfluidic mixers, Internat. J. Numer. Methods Fluids 61 (5) (2009) 498–513. M.P. Bendsoe, Topology Optimization: Theory, Methods and Applications, Springer, 2003. C. Pan, L.-S. Luo, C.T. Miller, An evaluation of lattice Boltzmann schemes for porous medium flow simulation, Comput. Fluids 35 (8) (2006) 898–909. W. Regulski, J. Szumbarski, Ł. Łaniewski-Wołłk, K. Gumowski, J. Skibiński, M. Wichrowski, T. Wejrzanowski, Pressure drop in flow across ceramic foams—a numerical and experimental study, Chem. Eng. Sci. 137 (2015) 320–337. T. Lee, C.-L. Lin, A stable discretization of the lattice Boltzmann equation for simulation of incompressible two-phase flows at high density ratio, J. Comput. Phys. 206 (1) (2005) 16–47. http://dx.doi.org/10.1016/j.jcp.2004.12.001. S. Succi, The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond, Oxford university press, 2001. G. Pingen, A. Evgrafov, K. Maute, Topology optimization of flow domains using the lattice Boltzmann method, Struct. Multidiscip. Optim. 34 (6) (2007) 507–524. A. Jaworski, J.-D. Müller, Toward modular multigrid design optimisation, in: Advances in Automatic Differentiation, Springer, 2008, pp. 281–291. J.-D. Müller, P. Cusdin, On the performance of discrete adjoint CFD codes using automatic differentiation, Internat. J. Numer. Methods Fluids 47 (8–9) (2005) 939–945. http://dx.doi.org/10.1002/fld.885. J. Utke, L. Hascoët, P. Heimbach, C. Hill, P. Hovland, U. Naumann, Toward adjoinable MPI, in: Proceedings of the 10th IEEE International Workshop on Parallel and Distributed Scientific and Engineering, PDSEC’09, Rome, Italy, 2009. G. Pingen, A. Evgrafov, K. Maute, Adjoint parameter sensitivity analysis for the hydrodynamic lattice Boltzmann method with applications to design optimization, Comput. & Fluids 38 (4) (2009) 910–923. D. Makhija, G. Pingen, R. Yang, K. Maute, Topology optimization of multi-component flows using a multi-relaxation time lattice Boltzmann method, Comput. Fluids (2012). M.J. Krause, G. Thäter, V. Heuveline, Adjoint-based fluid flow control and optimisation with lattice Boltzmann methods, Comput. Math. Appl. 65 (6) (2013) 945–960. http://dx.doi.org/10.1016/j.camwa.2012.08.007. M.J. Krause, V. Heuveline, Parallel fluid flow control and optimisation with lattice Boltzmann methods and automatic differentiation, Comput. & Fluids 80 (2013) 28–36. http://dx.doi.org/10.1016/j.compfluid.2012.07.026. M.H. Hekmat, M. Mirzaei, Development of discrete adjoint approach based on the lattice Boltzmann method, Adv. Mech. Eng. (2014) URL: http://www.hindawi.com/journals/ame/2014/230854/abs/. M.H. Hekmat, M. Mirzaei, Extraction of macroscopic and microscopic adjoint concepts using a lattice Boltzmann method and discrete adjoint approach, Phys. Rev. E 91 (1) (2015) 013303. http://dx.doi.org/10.1103/PhysRevE.91.013303. G. Liu, M. Geier, Z. Liu, M. Krafczyk, T. Chen, Discrete adjoint sensitivity analysis for fluid flow topology optimization based on the generalized lattice Boltzmann method, Comput. Math. Appl. 68 (10) (2014) 1374–1392. http://dx.doi.org/10.1016/j.camwa.2014.09.002. A. Kirk, S. Kreissl, G. Pingen, K. Maute, Lattice Boltzmann topology optimization for transient flow, in: MAESC 2011 Conference, Christian Brothers University, Memphis, Tennessee, 2011. URL: http://www.researchgate.net/profile/Georg_Pingen/publication/262314027_Lattice_Boltzmann_Topology_ Optimization_for_Transient_Flow/links/0deec5374a31e013ab000000.pdf. K. Yaji, T. Yamada, M. Yoshino, T. Matsumoto, K. Izui, S. Nishiwaki, Topology optimization using the lattice Boltzmann method incorporating level set boundary expressions, J. Comput. Phys. 274 (2014) 158–181. http://dx.doi.org/10.1016/j.jcp.2014.06.004. K. Yonekura, Y. Kanno, A flow topology optimization method for steady state flow using transient information of flow field solved by lattice Boltzmann method, Struct. Multidiscip. Optim. 51 (1) (2014) 159–172. http://dx.doi.org/10.1007/s00158-014-1123-x. M. Tekitek, M. Bouzidi, F. Dubois, P. Lallemand, Adjoint lattice Boltzmann equation for parameter identification, Comput. Fluids 35 (8) (2006) 805–813. J. Tölke, M. Krafczyk, TeraFLOP computing on a desktop PC with GPUs for 3D CFD, Int. J. Comput. Fluid Dyn. 22 (7) (2008) 443–456. http://dx.doi.org/10.1080/10618560802238275. F. Kuznik, C. Obrecht, G. Rusaouen, J.-J. Roux, LBM based flow simulation using GPU computing processor, Comput. Math. Appl. 59 (7) (2010) 2380–2392. http://dx.doi.org/10.1016/j.camwa.2009.08.052. C. Obrecht, F. Kuznik, B. Tourancheau, J.-J. Roux, Multi-GPU implementation of the lattice Boltzmann method, Comput. Math. Appl. 65 (2) (2013) 252–261. http://dx.doi.org/10.1016/j.camwa.2011.02.020. M. Schönherr, K. Kucher, M. Geier, M. Stiebler, S. Freudiger, M. Krafczyk, Multi-thread implementations of the lattice Boltzmann method on nonuniform grids for CPUs and GPUs, Comput. Math. Appl. 61 (12) (2011) 3730–3743. http://dx.doi.org/10.1016/j.camwa.2011.04.012. M. l Dzikowski, J. Rokicki, Lattice Boltzmann method as a fine scale solver in multiscale method modeling surface texture. URL: http://www.wccmeccm-ecfd2014.org/admin/files/fileabstract/a1093.pdf. M. Dzikowski, J. Rokicki, Uszczelnienie czołowe: metody modelowania i analizy wpływu geometrycznej tekstury powierzchniowej, Zesz. Nauk. Politech. Świętokrzyskiej. Nauk. Tech. Z. 17 (2012) 43–49. URL: http://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-article-BSW10093-0004. K. Lee, Topology optimization of convective cooling system designs (Ph.D. thesis), The University of Michigan, 2012, URL: http://deepblue.lib.umich. edu/handle/2027.42/91462. E.M. Dede, Multiphysics topology optimization of heat transfer and fluid flow systems, in: Proceedings of the COMSOL Users Conference, 2009. URL: http://www.comsol.se/papers/6282/download/Dede.pdf. C. McConnell, G. Pingen, Multi-layer, pseudo 3D thermal topology optimization of heat sinks, in: ASME 2012 International Mechanical Engineering Congress and Exposition, American Society of Mechanical Engineers, 2012, pp. 2381–2392. URL: http://proceedings.asmedigitalcollection.asme.org/ proceeding.aspx?articleid=1751371. T. Pohl, F. Deserno, N. Thurey, U. Rude, P. Lammers, G. Wellein, T. Zeiser, Performance evaluation of parallel large-scale lattice Boltzmann applications on three supercomputing architectures, in: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, 2004, p. 21. D. d’Humières, Multiple–relaxation–time lattice Boltzmann models in three dimensions, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 360 (1792) (2002) 437–451. Q. Zou, X. He, On pressure and velocity boundary conditions for the lattice Boltzmann BGK model, Phys. Fluids 9 (6) (1997) 1591–1598. http://dx.doi.org/10.1063/1.869307. A.L. Kupershtokh, D.A. Medvedev, D.I. Karpov, On equations of state in a lattice Boltzmann method, Comput. Math. Appl. 58 (5) (2009) 965–974. http://dx.doi.org/10.1016/j.camwa.2009.02.024. Mesoscopic Methods in Engineering and Science. X. Shan, H. Chen, Lattice Boltzmann model for simulating flows with multiple phases and components, Phys. Rev. E 47 (3) (1993) 1815. URL: http://journals.aps.org/pre/abstract/10.1103/PhysRevE.47.1815. H. Huang, M. Krafczyk, X. Lu, Forcing term in single-phase and shan-chen-type multiphase lattice Boltzmann models, Phys. Rev. E 84 (4) (2011) 046710. http://dx.doi.org/10.1103/PhysRevE.84.046710. K. Svanberg, The method of moving asymptotes—a new method for structural optimization, Internat. J. Numer. Methods Engrg. 24 (2) (1987) 359–373.

16

Ł. Łaniewski-Wołłk, J. Rokicki / Computers and Mathematics with Applications (

)



[43] E. Özkaya, N. Gauger, Single-step one-shot aerodynamic shape optimization, in: K. Kunisch, J. Sprekels, G. Leugering, F. Tröltzsch (Eds.), Optimal Control of Coupled Systems of Partial Differential Equations, in: International Series of Numerical Mathematics, vol. 158, Birkhäuser, Basel, 2009, pp. 191–204. URL: http://dx.doi.org/10.1007/978-3-7643-8923-9_11. [44] Ł. Łaniewski-Wołłk, RTemplate tool. URL: https://github.com/llaniewski/rtemplate. [45] R Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2013. URL: http://www.R-project.org/. [46] L. Hascoët, V. Pascual, The tapenade automatic differentiation tool: Principles, model, and specification, ACM Trans. Math. Softw. 39 (3) (2013) URL: http://dx.doi.org/10.1145/2450153.2450158. [47] A. Gersborg-Hansen, O. Sigmund, R.B. Haber, Topology optimization of channel flow problems, Struct. Multidiscip. Optim. 30 (3) (2005) 181–192. http://dx.doi.org/10.1007/s00158-004-0508-7.