Shape optimisation method based on the surrogate models in the parallel asynchronous environment

Shape optimisation method based on the surrogate models in the parallel asynchronous environment

Accepted Manuscript Title: Shape optimisation method based on the surrogate models in the parallel asynchronous environment Author: Krzysztof Przysowa...

3MB Sizes 0 Downloads 9 Views

Accepted Manuscript Title: Shape optimisation method based on the surrogate models in the parallel asynchronous environment Author: Krzysztof Przysowa Łukasz Łaniewski-Wołłk Jacek Rokicki PII: DOI: Reference:

S1568-4946(18)30219-9 https://doi.org/doi:10.1016/j.asoc.2018.04.028 ASOC 4832

To appear in:

Applied Soft Computing

Received date: Revised date: Accepted date:

18-9-2017 26-2-2018 14-4-2018

Please cite this article as: Krzysztof Przysowa, Lukasz Laniewski-Wollk, Jacek Rokicki, Shape optimisation method based on the surrogate models in the parallel asynchronous environment, (2018), https://doi.org/10.1016/j.asoc.2018.04.028 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Shape optimisation method based on the surrogate models in the parallel asynchronous environment

Institute of Aeronautics and Applied Mechanics Warsaw University of Technology Nowowiejska 24, 00-665 Warsaw Tel.: +48 22 234 7444 Fax: +48 22 622 0901

us

cr

a

ip t

Krzysztof Przysowaa , Lukasz Laniewski - Wollka , Jacek Rokickia

Abstract

pt

ed

M

an

This paper proposes a new optimisation method, the Parallel Asynchronous Surrogate Model (PASM) method, which is based on the surrogate models approximation and takes advantage of the asynchronous, parallel processing threads. Additionally, it introduces the Cornering technique (PASM+C), which by using values from the corners of the design space provides a rapid drop of the value of the objective function and a significant reduction of the processing time. An overview and characteristics of the main reference optimisation methods, like Particle Swarm Optimisation PSO and Genetic Algorithm (GA), is presented, together with the results of the computer experiments involving optimisation of the generic car body shape. Significant attention is devoted to the time and computational effort needed for drag minimization by using different optimisation schemes. Finally, the benefits and limitations of the proposed methods are discussed together with their potential impact on the optimisation process workflow.

Ac ce

Keywords: parallel optimisation, mixed surrogate models, car aerodynamics, high performance computing, cloud computing, corners of the optimisation domain

1. Introduction

The researchers in many disciplines, especially related to mechanical Engineering Design Optimisation (EDO), attempt to deal with real life, complex optimisation problems of nonconvex and often discontinuous objective functions, which can be subjected to errors and are very computationally expensive. Due to the spread of Computer Aided Engineering simulation tools the large scale parallel and complex optimisation tasks are plausible but require considerable computing resources and execution times. The cloud-based HPC (High Email addresses: [email protected] (Krzysztof Przysowa), [email protected] (Lukasz Laniewski - Wollk), [email protected] (Jacek Rokicki) Preprint submitted to Applied Soft Computing

April 21, 2018

Page 1 of 31

Ac ce

pt

ed

M

an

us

cr

ip t

Performance Computing) provides potential solution for such problems but is constrained by the scalability and the cost per time used. During process of car development the interaction between industrial design and aerodynamics plays a critical role. Especially in the concept stage, there is a great need for delivering a car body shape that is not only stylish [1], but also provides as low as possible drag coefficient (cd ) to result in consequence in low fuel consumption and CO2 emissions [2, 3]. For the Electric Vehicles even further reduction of the drag coefficient is important due to its impact on the driving range and in consequence on the consumer adaptation rate of this type of the car [4]. Additionally, the fierce competition among automotive manufacturers and growing number of released models generate the need for shortening the time of the development process [5, 6]. These requirements create the natural need for assessing aerodynamic properties of more and more product variants [7]. Currently, the basic shape of every concept variant should be optimised. It is considered as the only way to achieve expected low aerodynamic drag later on, during the detail design phase. Due to the high number of models and their variants, the total aero-optimisation time should be measured in hours, not days or weeks. Only the automated process of optimising aerodynamical properties can fulfil this requirement [8, 9]. On the other hand, high power computational resources are available in the cloud, on demand. The relatively low cost, around $1/h for 16 CPUs computing instance [10, 11], offers the potential access to hundreds or even thousands of CPU cores simultaneously. At the same time, the majority of the commercial cloud HPC services charge directly per time of provisioned machines, irrespective of their utilisation level. Such circumstances generate a requirement of high levels of resource usage during optimisation runs. Also for these reasons the wallclock time, in other words time elapsed from start to finish, is in primary focus here. This needs to be distinguished from the processing time, which only considers active time spent on computing. In the cloud-based HPC services the computing nodes can be allocated to the different physical or even geographical locations. This feature adversely affect the communication between computing nodes and in consequence limits the scalability [12, 13, 14]. For this reasons the employment of the all available, cloud-based computing power for the classic, sequential, surrogate model assisted optimisation can be considered as inefficient. A possible solution to such a problem is the introduction of parallelisation also at the optimisation method level. In other words, multiple design variants are being computed simultaneously in the multiple parallel threads. The existing, popular optimisation methods [15, 8, 16, 17], are not entirely suitable for this task. In particular in the mainstream methods the multiple design variants are usually evaluated in batches, also known as populations. These populations are usually fixed in size and frequently require synchronisation at each optimisation step (as described below). The strictly parallel, surrogate model-based optimisation methods surveyed by Haftka [18] focus mostly on synchronous sampling. The notable exceptions are papers of Asouti [19], Janusevskis [20] and Ginsbourger [21], who investigated the asynchronous implementation. In the other methods, such as gradient-based methods, the size of the batch is dependent on the number of optimisation parameters. However, very often, neither population nor batch 2

Page 2 of 31

M

an

us

cr

ip t

size fit the corresponding allocation of available threads in the pool of the computational resources [8]. This approach also suffers from the limitation of the synchronous processing mode. The new batch or population is created only when all designs from the previous generation are completed. The CFD jobs in a batch differ in the execution time causing a situation in which a large part of the computing resources, in extreme cases almost all, are in the idle state for extended periods of time. In the industrial engineering practice, it is essential to achieve optimisation results as soon as possible, ideally overnight or over weekend runs. Frequently, especially in the concept phase, the optimisation processes are manually stopped without reaching the global minimum, when the satisfactory objective values (cd ) are considered as sufficient. Usually, in such cases, the operatorӀs time and computational resources are the limiting factors. However, if required, the same method should provide the possibility of reaching global minimum by extending the processing time. In order to achieve such functionality, the proposed method needs to be invulnerable to the traps of the local minima. Due to the lack of continuous human supervision, the technique used in the aero-optimisation process should also be insensitive to a limited number of failed designs. Purely from the algorithmic point of view, the aim of the study is to propose and test an optimisation method, that would be characterised by the following features (expressed further on for the particular case, when the drag coefficient of the car cd forms an objective of the optimisation): • Speed. Achieve low aerodynamic drag in relatively short time.

pt

ed

• Balance between domain search exploration and value of the objective function. Potential to achieve global domain minimum in a shorter time than other methods with similar capabilities. Insensitivity to the traps of the local minima of the non-convex functions.

Ac ce

• Ability to handle discontinuous objective functions. • Fault tolerance. The ability to continue processing even if a small number of designs fails to produce output (this is a common problem in the large CFD simulations). • High assets utilisation ratio. The capacity to make use of given computing resources without extended periods of idle time, measured as ratio of the actual computing time to the wallclock time of provisioned resources. • Relatively low computing effort measured in the total time of utilisation of computing resources needed to minimise the objective function. • Ability to respond to the hardware reconfiguration ’on the fly’. • Asynchronous operation mode.

3

Page 3 of 31

Formally, the discussed problem is the optimisation of a single objective with multiple continuous input parameters. x? = argmin J(x), x ∈ B (1)

B = {x ∈ Rm : xi,min ≤ xi ≤ xi,max }

ip t

where x ∈ Rm describes parameters of the investigated case, J stands for the objective function, while B denotes a hypercube describing the space of allowable parameters. (2)

Ac ce

pt

ed

M

an

us

cr

The objective function J is in our case evaluated in two steps. In the first, the flow solution is found for the geometry descibed by the given parameters x. In the second step the objective function, e.g., drag coefficient is calculated for this particular flow solution. The first step consists of a very lengthy / time consuming iterative procedure, which in extreme case may fail leaving the objective function unevaluated. Moreover, depending on the case, the number of iteration required for the convergence may vary by the order of magnitude, depending, e.g., on the starting point of the iterative procedure. The corresponding execution time in our case varies between approx. 100s and 10000s (see Figure 3). The second step is computationally inexpensive and never fails. This situation is typical for all aerodynamic shape optimisation problems. The fact that here the allowable space B is a hypecube is also quite typical, when the parameters x describe the geometric shape of the optimised object, and each is limited from below and above. This particular feature is necessary to use the Cornering technique (described further on), but not vital for the general parallel asynchronous approach. The novel approach, the Parallel Asynchronous Surrogate Model (PASM) method, which is proposed in this paper introduces asynchronous seeding of the new designs in the parallel environment. The design candidates are selected using the surrogate models created from the results of previous evaluations. The seeding is only performed at the time when the computing resources are available. Immediately after the seeding, the value derived from the surrogate model is stored and can be used later on in the process of building surrogate models for the new iterations. Once the expensive objective function is evaluated for this point, it overrides the estimated value. As a result, for the limited number of designs that are building the surrogate model, the values of the objective function are estimated from previous surrogate models generated in the earlier iterations of the process. Additionally, after initial sampling block, the seeding using the Cornering technique is introduced (PASM+C). This operation finds designs with the input parameters at the edge of the domain space and with a low objective function value. Such approach enhances the surrogate model building and speeds up the domain exploration process, which can result in achieving low objective function values earlier and with less effort. This novel approach is tested for the case of a flow around a bluff body resembling modern compact car. Although the aerodynamic CFD simulation is not the direct objective of this study, it is used to evaluate and analyse the performance of the proposed techniques in the typical engineering environment. 4

Page 4 of 31

2. Reference optimisation algorithms and testing system architecture

ip t

The performance and characteristics of the proposed methods were tested and compared with the reference algorithms. The reference methods were selected using criteria of variety (representation of major families of optimisation methods), suitability (previously published results for the similar EDO applications) and performance in the preliminary study (number of evaluations of the objective function needed for reaching minimum). After testing multiple algorithms in the preliminary study, the following methods were selected for the main phase of the experiment:

M

an

us

cr

• Genetic Algorithm (GA) - the optimisation procedure that imitates the natural biological evolution. It involves selection, crossover and mutation steps performed over given population [22, 23, 24, 25]. The fault tolerance is at the base of this method. Only the best surviving, (completed) designs are allowed in the crossover stage. The processing of the seeds is based on populations, which signifies synchronous operation. There are very high numbers of population and designs needed to converge to minimum. GA is capable of handling discontinuous and non-convex objective functions. Utilised by Long et al. [26], Ando and Takamura [27], Rokicki et al. [24], Lombardi et al. [15], Dumas [8] in aerospace and automotive tasks as well as by Burczy´ nski et al. [28] in the parallel shape optimisation problems.

pt

ed

• Particle Swarm Optimization (PSO) - the algorithm that mimics the social behaviour of a swarm of insects, fish or birds. It takes into account the best solution for both, every particle (individual), as well as for the entire swarm [29, 23, 30, 25]. The PSO features a population-based, synchronous approach capable of handling a failed design but not immune to traps of local minima. Investigated by Venter et al. [31] and Wickramasinghe et al. [32] in aerospace applications.

Ac ce

• Nelder-Mead Downhill Simplex Method (SIMPLEX) - gradient-free, heuristic search algorithm that utilises simplex geometrical structure [33] to find minimum value, with synchronous parallelisation [34]. It can handle solver noise and failed variants but at times struggles with non-convex functions and local minima. Used by Bekshi et al. [35] in the bluff body drag reduction study. • Adaptive Response Surface Method (ARSM) - a method that combines polynomial approximation and Design of Experiment (DoE). The size and position of the DoE built around the optimal design may vary with every optimisation iteration step [36, 37]. It is resistant to noise and failed variants, but the combination of global DoE and local response surface creates a possibility of missing the global minimum. The local approximation is performed in multi-seed iterations in synchronous requests. Used by Long et al. [26] in aerospace and coupled aero-structure applications. • Nonlinear Programming by Quadratic Lagrangian (NLPQL) - a very effective and efficient, local, gradient-based method [38], but sensitive to failed designs and local minima. The NLPQL method was used together with CFD simulation in multiple 5

Page 5 of 31

ip t

optimisation tasks: in aerospace [39], turbomachinery [40] and ground vehicle external aerodynamics [41]. The parallelisation is carried out in batches, in a synchronous mode. The size of the seeding batch for NLPQL technique is directly related to the number of the design parameters, which can be different than the number of the available parallel threads in the pool of the computing resources. This feature can significantly reduce the level of the resources utilisation.

cr

• Sequential - described in greater detail in Section 3, unsusceptible to local minima traps, but vulnerable to failed design variants, efficient but not effective due to the lack of parallelisation.

M

3. Elements of the optimisation process

an

us

These methods were given equal computational resources consisting of the eight computational threads, each with 16 Intel Xeon X5560 cores and 24GB RAM. This configuration was chosen due to multiple factors including: relatively good CFD model multiprocessor scalability as well as the best value for money (cost per CPU·h) in the commercial cloud computing services at the time of carrying out the experiments. The considered algorithms were using the flow field database of previous design variants in their runs. None of the synchronous techniques was capable of ’on the fly’ hardware reconfiguration.

Ac ce

pt

ed

The majority of the published surrogate model-based optimisations consists of the three major parts: initial sampling, surrogate model building and the infill sampling also known under the name of the sampling strategy [42, 43, 44, 45, 46]. The application of the surrogate models can bring insight into the nature of the design space already with the limited number of known designs. It is especially important for the expensive objective functions, like CFD simulations, where every design requires significant computing resources and evaluation times. In consequence, using surrogate models can significantly reduce both the time and effort spent on the optimisation task. The surrogate-based optimisation is also capable of handling noisy, discontinuous and non-convex objective functions. As a result, methods proposed in this paper rely heavily on that concept. The first of the suggested methods is a sequential optimisation scheme with a surrogate model, as discussed by Jones [47, 42], Lophaven [45], Queipo [48] and Wang and Shan [46]. This method is supplemented by mesh morphing technique, which allows adjustments of shape of the model mesh without changing mesh topology, as well as the introduction of a database for all previously computed variants. The workflow of this method is presented in Figure 1. After creating the CAD geometry, as well as the mesh (a and b in Figure 1), the model was ready for further processing. In the discussed method the initial sampling (c in Figure 1) was carried out using Latin Hypercube (LHC) algorithm introduced by McKay [49]. This technique was selected due to its right balance of random seeding and relatively homogeneous distribution across the domain for a small number of samples. It is commonly used for the initial sampling in the optimisation tasks [27, 45, 50, 48, 51, 52, 44, 53, 54]. The next step in the sequential optimisation algorithm was the mesh morphing (e in Figure 1) detailed 6

Page 6 of 31

Sequential Optimisation Loop

e)

f)

Mesh generation i)

a)

Data Base With Results For All Designs

Building Surrogate Model

Infill Sampling

an

CAD geometry

g)

us

b)

CFD Solver OpenFOAM

ip t

c)

Use Nearest Design as Start Point

Mesh Morphing

cr

Initial Sampling LHC

n)

m)

Objective Function Evaluation h)

M

Figure 1: Sequential optimisation process

Ac ce

pt

ed

in Section 6.2. During this step, the shape of the model was adjusted to the given values of Geometrical Optimisation input Parameters (GOP), taking particular care of the mesh quality near the solid boundary. Since the model with the morphed mesh has the same grid topology as the other variants, it is possible to use flow fields from the previously solved variant, as the CFD solution start point. This can be achieved without the need for additional steps like interpolation or secondary mapping. The CFD solution can be sped up by using the flow field of the already existing design as a start point for the computation of the current variant (f in Figure 1). Once the source (parent) design variant selected on the premise of the shortest Euclidean distance in the parametric space is found, the first guess of the flow field is transferred from the results database (i in Figure 1) to the working space of the currently processed (child) design variant. In the next step, the newly created design variant is solved using the simulation setup detailed in Section 6.1. The objective function is calculated on the basis of the converged solution (h in Figure 1). The optimisation input parameters and the objective function (e.g., cd coefficient) values are used to form a surrogate model (m in Figure 1). The composite surrogate modelling approach was adopted for this purpose. It is the combination of the cubic Radial Basis Function (RBF) interpolant [55] and the Multivariate Adaptive Regression Spline (MARS) approximation [56]. The merge of the surrogate models was carried out using DempsterShafer (DST) theory [57, 44], which enables to dynamically change the selected surrogate models’ weights in the mixture, depending on their fitness level. The surrogate model has been used as a base for the infill sampling operation (n in Figure 1). The infill sampling is the prescribed way of finding GOP values that will create 7

Page 7 of 31

us

cr

ip t

a new variant with a potentially lower objective function (cd coefficient). After carrying out initial tests the Candidate Points Algorithm (CAND) [44, 58, 55] has been chosen for that purpose. The CAND infill sampling technique creates a large set of pseudo-randomly chosen seeds from the entire design space. The second set of candidates are created by carrying out random perturbation on the best design so far. In the next step CAND uses the surrogate model to evaluate the value of the objective function for all of the candidate designs. Finally, the infill algorithm selects the best candidate point (derived from the surrogate model) on the basis of the combination of the objective function value and the distance to the previously computed variants. After defining a new design variant the sequential optimisation loop starts over and continues creating and solving new design variants until manually stopped by the operator (Figure 1). With every iteration the surrogate models represent with a greater detail the relationship between GOPs and the objective function. In consequence, the infill sampling has a bigger chance of exploring and exploiting the optimisation domain.

an

4. Parallel Asynchronous Surrogate Model (PASM)

M

In order to enable efficient parallelisation the modification of the sequential algorithm described in Section 3 is proposed. This modification consists of two additional components in the optimisation loop (see Figure 2). The first component is a monitor of HPC resources

Initial Sampling LHC

e)

Ac ce

Mesh generation b)

f)

Infill Sampling

n)

i)

Building Surrogate Model

CAD geometry a)

Use Nearest Design as Start Point

Mesh Morphing

pt

c)

ed

Parallel Optimisation Loop x 8 threads

g)

Data Base With Results For All Designs

Objective Function Estimation

m)

CFD Solver OpenFOAM

l)

Objective Function Evaluation

h)

Y Available HPC N Resources k)

Wait j)

Figure 2: Parallel optimisation process

(k in Figure 2), which starts a new computational loop, as soon as the necessary computing cores are available. The next processing thread is initiated in parallel, in an asynchronous mode, without waiting for the other computing jobs to finish. Such an arrangement creates the problem of the uniqueness of the source data, while building the surrogate model for every new iteration. In some cases, especially during the 8

Page 8 of 31

us

cr

ip t

first runs after the initial sampling phase, the new variants need to be defined without any new information on the values of the objective function. This problem has been resolved by the second component, which allows temporary use of response values, previously estimated by the surrogate model (l in Figure 2). The CAND (detailed in Section 3) infill sampling algorithm (n in Figure 2) takes into account the distance between the seeded points. This feature prevents seeding logic from focusing on a narrow region of the domain, to allow for a wider exploration. Other important feature of the CAND algorithm is that the weight assigned to the antagonistic goals (domain search and objective function minimisation) is variable during the optimisation process. This helps to better utilise the processing parallel threads by assigning them to both exploration and exploitation tasks at the same time. Furthermore, it can possibly soften the potentially adverse impact of the unavoidable error in the objective function estimation by spatially separating consecutive samples. Once the CFD solution is found, the temporary estimated value of objective function is replaced by the value of the objective function obtained from the flow field.

an

8.0% 7.0%

M

5.0% 4.0% 3.0%

ed

Frequency

6.0%

2.0%

0

2000

4000

6000

Computing time of single design varinat [s]

8000

10000

Ac ce

0.0%

pt

1.0%

Figure 3: Histogram of the duration of the CFD execution

The asynchronous infill sampling has been introduced in order to mitigate the impact of different duration of the objective function evaluation on the utilisation level of the resources. Due to use of the nearest design variant as a starting point of the CFD solution, the execution times can differ significantly. Two orders of magnitude differences are observed between computing times of various CFD runs (see Figure 3). An important characteristics of the presented parallel setup is the existence of the processing gap (Figure 4) between the block of initial sampling and the block of the surrogate model. Such a feature results in the suboptimal utilisation of the computing resources, when almost all threads are in the idle state. Simple means of closing the gap by increasing the number of initial samples does not bring a notable increase in effectiveness of the entire process. Neither the objective function value is decreased, nor the surrogate model is enriched. On the other hand, an attempt to 9

Page 9 of 31

#1 #2 #3 #4

#12 #15 #16

#21 #22 #23

#9 #14 #11 #10

#29 #30

#31 #24 #25 #32 #26 #27 #28

#13

#5 #6 #7 #8

#20

#18 #19 #17

#38 #39 #33

#41 #40

#37 #34 #36 #35

#44 #42 #45 #43

ip t

1 2 3 4 5 6 7 8

Surrogate model + Infill sampling

cr

Parallel threads

Initial sampling

Single design variant processing task

us

Processing gap

Figure 4: Parallel threads of jobs with the indicated processing gap

Ac ce

pt

ed

M

an

start the second phase, basing on the coarse surrogate model, provides similar results. In some cases, especially when the the number of available responses is similar to the number of the optimisation parameters the generation of a realistic surrogate model is not possible. Apparently adding a few designs to the surrogate model in the infill sampling phase brings no difference to the process that normally employs tens or hundreds of the design variants to be effective. The problem of the processing gap creates the need to define new variants in an alternative way to the initial sampling or the surrogate model (Figure 4). At this stage, any potential technique has a particularly small number of known objective function values at its disposal. Hence, the true nature of the relationship between input parameters and response values is unknown. On the other hand, there are significant, but idle computer resources available and ready for use. Such circumstances present an opportunity to introduce another process component, that fills the processing gap with the cases that will possibly enhance the search for minimum of the objective function. 5. Cornering technique

In order to address the problem of the processing gap (detailed in Section 4) the Cornering technique was proposed (d in Figure 5). This new component is introduced between the initial sampling (c in Figure 5) and the parallel optimisation loop (e to n in Figure 5). This enhanced setup is named as the Parallel Asynchronous Surrogate Model method with the Cornering technique (PASM+C). This method employs design variants with the values of the input parameters located at the border of the design space. Such approach is reasonable as, due to practical reasons, the optimisation domain is often very narrow (Figure 6) relative to the curvature of the optimised function. Very often the global unconstrained minimum is located outside of the domain. In such a case, it is possible that the minimum objective function for this parameter is at the domain boundary. Even if the global unconstrained minimum is located inside the optimisation 10

Page 10 of 31

Parallel Optimisation Loop x 8 threads

d)

Mesh generation b)

g)

Data Base With Results For All Designs

Infill Sampling i)

n)

Building Surrogate Model

CAD geometry a)

f)

e)

Objective Function Estimation

m)

CFD Solver OpenFOAM

Objective Function Evaluation

h)

cr

c)

Use Nearest Design as Start Point

Mesh Morphing

ip t

Cornering

l)

Y Available HPC N Resources k)

us

Initial Sampling LHC

Wait j)

an

Figure 5: Parallel optimisation process with Cornering section (PASM+C)

Ac ce

pt

ed

M

domain, selecting a nearby domain boundary value would be beneficial for the search of the minimum, by focusing infill sampling in that region. The occurrence of the minimum at the boundary of the optimisation domain for a number, or sometimes all, parameters is very popular both in the research and the engineering practice. It appeared in the work carried out by Gilkeson et al. [59], Jin et al. [60], Lewis et al. [61], Kuya et al. [62], Bingham et al. [63], van Beers et al. [64], Bursztyn and Steinberg [65], Craig et al. [66], Mason et al. [67] or Campana et al. [68]. This phenomenon seems to be independent from engineering area (general fluid dynamics, automotive external aerodynamics, ship building, solid mechanics in the aerospace etc.) and the type of the optimisation method (surrogate model based, PSO, SIMPLEX, GA). Considering the above, it is possible to state, that the occurrence of the minimum in region where parameters attain boundary values (edges or corners) is a rule not the exception. At this point, the method of selection of the upper or the lower boundary for each parameter, further known as ⥨corners⥹, needs to be explained. This task is carried out just after the initial sampling, for all already evaluated design variants. At this stage the number of samples is larger than m + 1, thus allowing to generate the linear approximation of the objective function in the m dimensional parametric space. The procedure to yield the most suitable corner design bases on the observation that the linear function always assumes its extremum at one of the corners of the hypercube space of allowable parameters (see Figure 7). This linear approximation is setup by computing linear regression for each parameter separately (least squares algorithm). The slope of each regression line gi (1, ..., m) is used to form the gradient g = (g1 , ..., gm ) and the linear score function is thus defined as: L(x) = g · x ≡

m X

gi xi

(3)

i=1

11

Page 11 of 31

ip t

domain

f(x)

cr

domain min

us

unconstrained min x

an

Figure 6: Optimisation domain and objective function values for single optimisation parameter

Ac ce

pt

ed

M

The slope of the regression line is used later to determine which border value (high or low) of the given parameter has a better chance to deliver a lower objective function value. In order to better visualise this procedure, the objective function (drag) values and the regression lines for the selected parameters have been presented in Figure 7. For the parameters #9 and #12, the lower and upper boundaries respectively are the obvious choices, as they deliver higher chances of minimising the objective function. The corresponding boundaries shall not be considered further on, as they provide modest odds of achieving the minimum. For the parameters #1 and #7, however, the pair of top and bottom bounds respectively offers similar chances to minimise the objective function. In such a case, it would be prudent to examine both of them. The priority belongs to the domain bound indicated by the lower tip of the regression line - the top bound for GOP #1 and the bottom bound for #7. Once the linear score function L(x) is established, its value is calculated at every corner of the solution domain B (2m corners have to be examined - 2m in our case was equal to 4096). The number of the corners may be large, but the evaluation of L(x) is extremely cheap. Small number of the best corner candidates (lowest value of the score function) are selected to supplement the existing pool of samples, for which the objective function J(x) is subsequently evaluated. The number of the corner candidates corresponds to the size of the processing gap and the number of the concurrent threads available. For the considered test case this value is equal to 8. The Cornering method (d in Figure 5) aims to find neither the global nor local minimum. It is rather meant to increase the effectiveness of the parallel surrogate model optimisation by filling up the processing gap with the meaningful cases. Having values at the selected corners of the optimisation domain has also a potentially positive impact on the fidelity of the surrogate model approximation. In other words, values in the corners of the domain act 12

Page 12 of 31

ip t cr us an M ed

Figure 7: Selection of parameter border values for domain corner variants

pt

as the anchors, stabilising and creating the more realistic surrogate model. The benefit of the cornering component is thus threefold:

Ac ce

(i) it provides rapid drop of the objective function value, (ii) it increases the hardware utilisation level by filling the processing gap, (iii) it improves the surrogate model by adding samples on the border of the optimisation domain. 6. Optimised shape and numerical model The presented optimisation approach is demonstrated for the test case to the optimise the car body shape. In order to allow for comparison with other methods a simplified car body is considered. The investigation of the simplified car bodies [16, 69, 61] or reference bluff bodies [8, 70, 71] is a common practice among aerodynamics researchers. This is motivated by the fact that the full fidelity CFD simulations are very expensive in terms of the computational time and effort. 13

Page 13 of 31

Ac ce

pt

ed

M

an

us

cr

ip t

In the present study the objective of the optimisation is a reduction of the aerodynamic drag, under a number of geometric constraints. The shape of the model used in this study was derived from the reference bluff body of 3/8 Ford Variable Geometry Model, proposed by Gilhaus and Renn [72]. This particular shape has been chosen due to its typical outline representing a modern compact car, having the well-known aerodynamic properties in the different geometrical configurations [73]. The main motivation, however, to select the simplified shape, was the fact that some model features (not critical from an optimisation point of view) significantly increase the computational effort for the solution of every design variant. Since the research involving different optimisation methods involves tens of thousands of CFD model evaluations, the model geometry has been simplified by removing the wheels and edge fillets. This operation ensured that the time spent on aerodynamic drag evaluation for every design variant is still kept on the acceptable level, without jeopardising the main, optimisation oriented, objective of the study. The negative aspect of such arrangement is the possibility of the flow separation at leading edges of the shape at lower speeds. Gilhaus and Renn [72] investigated in detail the impact on the sharp edges rounding off in original Ford 3/8 reference shape on the drag reduction. Their study quantified the aerodynamic impact of the fillets removal and provided insights into relationship between size of the fillets, shape of the rear of the car and the drag coefficient values. In the literature one can find multiple examples of removal of the fillets of the leading edges in similar automotive aerodynamics research as presented by Hu et al. [74], Ahmed and Chacko [75], Perzon [76], Gilkeson et al. [59], Kim et al. [77]. On the other hand such simplification enables much easier mesh morphing and in consequence, provides better mesh quality in the sensitive areas of the boundary layer for different design variations, even for these with the extreme values of the geometrical parameters. Taking all of the above into account it has been concluded that although the removal of wheels and fillets will have significant impact on the aerodynamic performance of the single design variant, it would not contribute greatly to the performance of the optimisation process and this aspect is the most important in the present study. The basic car shape has been modelled using ANSA software package [78]. The same software has been applied to generate both the surface and the volume mesh. Due to the symmetry of the problem in the X-Z plane, only half of the shape and its surroundings were included in the model. The grid model around the shape consisted of 4.8 x 106 cells (Figure 8) with the multiple mesh refinement areas. For all the solid walls, the boundary layer mesh was introduced. The floor was defined as stationary with the local slip area [2, 3]. The blockage ratio between car body and the walls was 2% [79]. 6.1. Simulation setup This study, with the exception of the geometrical simplification, tried to imitate the environment found typically in the automotive industry and used for the CFD simulation for the car external aerodynamics. All computations employed the SIMPLE (Semi-Implicit Method for Pressure-Linked Equations) procedure with the GAMG solver from the wellvalidated and versatile OpenFOAM package [80, 61, 16, 81], implementing the steady-state, 14

Page 14 of 31

ip t cr us

an

Figure 8: Mesh of the CFD flow model, general view, on the symmetry plane, with the mesh refinement in the regions of the expected wake and in the boundary layer

Ac ce

pt

ed

M

incompressible RANS (Reynolds Averaged Navier-Stokes) [15, 8] with Spalart-Allmaras turbulence model [69, 81, 82]. The Reynolds number was estimated to be Re ≈ 3.8 x 106 , which is considered to be in the standard range for the aerodynamic drag assessment (see Figure 9) for the wind tunnel tests [83, 84]. The computation of single design was terminated when the solver residuals fall below value of 10-6 . This is again quite typical value for CFD simulations in the optimisation framework, as it provides balance between the solution accuracy and a reasonable solution time. In rare cases where convergence was not reached the computations were automatically stopped after predefined time period. Neither the value of the objective function of such design was taken into account nor the flow field was used as a starting point for the other cases. 6.2. Optimisation parameters and mesh morphing For the purpose of this study m = 12 Geometrical Optimisation Parameters (GOP) were selected. The number of parameters between 10 and 20 is typical of similar research investigations [15, 69, 27, 8]. The geometry variation was focused on the 3 primary areas: back and front of the car as well as the windshield. The detailed map showing the geometric influence of all parameters is presented in Figure 10. The change of the parameter value was translated into the movement of the feature edge along its adjacent boundaries in the allowed range of -/+ 40mm. Taking into account 3/8 scale of the model and its total length of 1445mm (see Figure 11), the geometry modification can be considered as significant. The tolerance of the seeding proximity was assumed as 0.25% of the input parameter range. The model and the grid shape adjustments were performed using the morphing technique implemented by Beta CAE ANSA software package [78]. This technique is also a popular choice in the automotive industry as well as among the aerodynamics researchers for CFD 15

Page 15 of 31

ip t cr us

an

Figure 9: Streamlines around the car body

M

mesh deformation [7, 61, 16, 69]. The set of GOP values creates a unique shape of the studied model called a design variant or variant for short. 7. Performance of the optimisation process

Ac ce

pt

ed

In this study the primary measure of effectiveness was the value of minimum drag coefficient achieved after a fixed wallclock time (Figure 12).

16

Page 16 of 31

ip t cr us an M ed

Figure 10: Mesh morphing boxes with geometry optimisation parameters (GOP)

pt

0.40

Ac ce

Drag Coefficient Cd [-]

0.35

GA PSO SIMPLEX ARSM NLPQL sequential PASM PASM+C

0.30 0.25 0.20 0.15

0

50

100

150

Wallclock Time [h]

200

250

Figure 12: The comparison of selected optimisation methods. Only design variants with PBS (present best solution) [18] status are presented in here. For details17see Section 2: GA - 2, PSO - 2, SIMPLEX - 2, ARSM - 2, NLPQL - 2; sequential - Section 3, PASM - Section 4, PASM+C - Section 5

Page 17 of 31

ip t cr

us

Figure 11: Original (left) and optimised (right) shapes of the model

Ac ce

pt

ed

M

an

It is easily observed, that the performance of the selected optimisation methods can vary significantly. The PSO and SIMPLEX methods were not able to find the global minimum. It had taken between 100 and 250 hours to find the corresponding minima for GA, PSO, SIMPLEX and sequential methods. Other methods (Figure 13) were able to find the minimum in less than 40h, while the drag values in a reasonable proximity to a minimum were achieved in under 20h. The proposed methods PASM and PASM+C were characterised by attaining notably lower cd values already after less than 8h. The proposed asynchronous method which included the Cornering (PASM+C) algorithm was observed to be significantly faster than the other methods especially in the early hours of the optimisation process. Another aspect of the optimisation performance was assessed by comparing a cumulative computational effort as presented in Figure 14. This measure was equal to the processing time spent by all cores to achieve the given drag coefficient. The least efficient methods were PSO and SIMPLEX. The NLPQL scheme reached its minimum in just under 600 CPU·h. Unfortunately, NLPQL was also characterised by being sensitive to failed design variants and traps of the local minima. Therefore, it could not be classified as the best option for solving complex optimisation problems. Other techniques from the gradient family such as adjoint methods require dedicated or heavily customised computer code for every component of the optimisation process [69, 85, 86, 87, 88] and therefore may be less feasible for the application in the automotive aero-development environment. All the other techniques tend to reach a similar drag coefficient level after 1300 CPU·h. The proposed methods (PASM and PASM+C) were achieving a low level of drag very early, between 400 and 600 CPU·h. The вҐҶPASM+C⥪ method needed more resources in the first 300 CPU·h in comparison to the pure parallel method. This phenomenon can be explained by a greater average geometrical distance of the corner design variants from its peers. Since, for the corner designs, the CFD simulation starting point was usually significantly different, the computing took both more time as well as resources. The investment in computationally more expensive Corner design variants paid off however with a serious drop in the cd value. 18

Page 18 of 31

0.40

GA PSO SIMPLEX ARSM NLPQL sequential PASM PASM+C

ip t

0.30

cr

0.25 0.20 0

5

10

15

20

Wallclock Time [h]

25

30

35

40

an

0.15

us

Drag Coefficient Cd [-]

0.35

M

Figure 13: The comparison of selected optimisation methods. Focus on the first 40h. Only design variants with PBS (present best solution) [18] status are presented in here

ed

8000

4000 2000 00

pt

6000

Ac ce

Execution time for single CFD job [s]

10000

10

20

30

40

50

60

Normalised distance between parent and child design variants [-]

70

80

Figure 15: The anthill plot of the execution time of CFD job depending on the normalised distance between parent and child design variants. The normalisation was based on the distance between the centre and the corner of the hypercube domain

All compared (including reference) methods used the data base of the previous CFD flow solutions as a initial conditions guess for the drag evaluation (f in Figs. [1, 2,5]). Therefore 19

Page 19 of 31

0.40

GA PSO SIMPLEX ARSM NLPQL sequential PASM PASM+C

ip t

0.30

cr

0.25 0.20 0

500

1000

1500

Cumulative computational effort [CPU·h]

2000

2500

an

0.15

us

Drag Coefficient Cd [-]

0.35

M

Figure 14: The comparison of the optimisation methods in the view of cumulative computation effort. Only design variants with PBS (present best solution) [18] status are presented in here

Ac ce

pt

ed

it is also useful to assess how the execution time depends on the Euclidean distance between the starting point (parent - where the solution is known) and the current design (child - where the solution is sought). The parent geometry is always selected such, that this distance is smallest among designs existing in the data base. It is expected that selecting the starting solution, which corresponds to the geometry nearest to the present one, will result in smaller CPU simulation time. This effect can be observed in general in Figure 15, although substantial scatter is well visible. It is clear that the nearest (parent) design as a solution starting point does not guarantee faster convergence for every single design, however it provides the shortening of the time for the entire optimisation process. This is especially significant considering fact that together with increasing number of the designs in the optimisation process the average distance between the new and the parent design is reduced (Figure 16). The execution time of the single CFD job, at the end of the optimisation process, is on average 50% shorter that at the beginning. Without introduction of the parent solution as a guess of the initial conditions for the child CFD job, the average execution time curve in Figure 16 would be much more horizontal. Moreover, during exploitation phase of the infill sampling the distance between parent and child designs is usually very small reaching as low as 1% of the maximum distance, and therefore providing relatively very short convergence times of the CFD solutions. Such a phenomenon is particularly visible for the GA method, where new PBS (present best solution) values appears in the relatively short time intervals (Figure 12) and the relatively little computational effort (Figure 14). This shortening of the execution times can be explained by the small parametric distance between subsequent generations typical for the GA approach. 20

Page 20 of 31

4500

65

Normalised distance between parent and child designs [-] (rolling average)

execution time distance to parent 70 60

ip t

4000

55 50 45

cr

3500 3000 10

20

30 Wallclock Time [h]

40

40

50

an

0

us

Execution time of single CFD job [s] (rolling average)

5000

M

Figure 16: The history of the execution time of CFD job and the normalised distance between parent and child design variant for the single PASM optimisation run. There are significant differences between times for exploration (long distance) jobs and exploitation (short distance) jobs, hence the local spikes of the curves and need of rolling averaging

ed

100% 80%

pt

60% 40% 20% 0%

Ac ce

45.6% 20.9%

GA

PSO

77.2%

83.4%

42.7% 31.2%

17.4% SIMPLEX

13.8% ARSM

NLPQL sequential

PASM

PASM+C

Figure 17: The comparison of the optimisation methods in the view of resources utilisation level, measured as ratio of the actual computing time to the wallclock time of provisioned resources

Another comparison shows the utilisation level, measured as ratio of the actual computing time to the wallclock time of provisioned resources (see Figure 17). Most reference methods show levels in the range between 17% and 45%. This is caused by the combina21

Page 21 of 31

0.4 0.385

us

Drag reduction for 100th best corner design

an

0.3

start design

0.2

suspected global minimum

100

200

300

400

Ranking of the corner designs ordered by cd [-]

500

600

pt

0.172 0.15 1

Achievable reduction

M

0.25

ed

Drag Coefficient cd [-]

0.35

cr

ip t

tion of significant differences in the execution times of the CFD jobs (see Figure 3) and the population based synchronous execution mode, where a large part of the computing resources, in extreme cases almost all, are in the idle state for extended periods of time. As expected, the lowest utilisation level yields the sequential method, where the majority of the available threads are in the idle state during the time of the optimisation loop processing. The resources are best used there during the initial sampling, where designs are computed concurrently. The PASM method performs at a high level of 77.2%. The PASM+C method achieves an even greater value of 83.4%. The better utilisation of the computational resources arises, in this case, from the removal of the processing gap.

Ac ce

Figure 18: The value of the cd for corner designs. The corner designs are sorted from the best (left) to worst performing (right). The values of the drag coefficient for starting design and suspected global minimum were indicated with the horizontal lines.

For the purpose of verification true values of the drag coefficient were calculated for all (4096) corner design variants (in contrast to the Cornering algorithm for which only the cheap score function (3) is evaluated at all corners). It appears that the value of the objective function for the best performing corner designs is very close to the value of the suspected global minimum (presented in Figure 18). This observation leads to conclusion that by selecting a few corner designs with lowest value of the score function it is possible to significantly reduce the drag value. If the achievable reduction of the objective function (the difference between the value of the objective function at starting point and the value at the suspected global minimum) is defined as 100, the best corner design attains value of 97.1. In other words, it is feasible to exploit the value of the objective function to a point, that is almost reaching the level of the value of the suspected global minimum by carrying out just a couple evaluations of the expensive function in the corners of the domain. For comparison, the LHC algorithm (pseudo random selection) after 20 evaluations at22

Page 22 of 31

us

cr

ip t

tains 55.3% of the achievable reduction of the objective function. To illustrate how expensive is to reach the best corner design level of 97.1% achievable reduction, it is worth to mention that the GA method needs 658 objective function evaluations, 112h of parallel processing and 1254 CPU·h of computational effort to obtain similar value. The selection of the corner designs with the lowest objective function is not a trivial task, especially early in the optimisation process where there is just a limited number of responses available. The results of the Cornering algorithm has been presented in Table 1. The combination of High / Low parameter boundary value is indicated for every input, together with the value of the drag coefficient attained by this design. The value of the objective function for each of the selected design variants translates into the value of the Achievable reduction of the objective function and the Position of this design in the global ranking of all corner designs. In that ranking all the corner design variants are arranged by the increasing value of the drag coefficient. Table 1: Corner designs selected using the Cornering technique

Position in ranking [-]

89.6 88.9 96.6 96.0 83.2 81.6 89.6 81.5

13 16 2 4 20 21 14 22

an cd [-]

M

Input parameter No#

No

Achievable reduction [%]

1 2 3 4 5 6 7 8 9 10 11 12 H L H L H L H H

H H H H H H H H

H H H H H H H L

L L L L H H L L

L L L L L L L L

L L L L L L L L

L L L L L L L L

L L L L L L L L

L L L L L L H L

ed

H H L L H H H H

pt

L L L L L L L L

Ac ce

1 2 3 4 5 6 7 8

H H H H H H H H

0.188 0.189 0.172 0.174 0.202 0.205 0.188 0.206

H - High, L - Low boundary value of the input parameter

The proposed Cornering method is by no means perfect in the process of selection of the best performing corner designs. Although it provides significant (89.6%) achievable objective function reduction after the evaluation of just one corner design, the first selected corner design occupies crude 13th place on the ranking of all corner designs. The third selected design variant delivers excellent 96.6% of the achievable reduction of the objective function and holds 2nd place in the ranking of the corner designs. The Cornering method, in this particular test case, does not select the first design in the ranking with 97.1% of achievable reduction of the objective function. Most likely, the source of this shortcoming is the low number (13-20) of the available designs early in the process, which can produce different values of the regression slopes than a higher number of responses later on.

23

Page 23 of 31

8. Discussion

Ac ce

pt

ed

M

an

us

cr

ip t

The purpose of the study was to develop, present and test the novel optimisation methods based on the following criteria: Speed: The proposed optimisation methods: the Parallel Asynchronous Surrogate Model (PASM) optimisation method and the Parallel Asynchronous Surrogate Model optimisation method supplemented by the Cornering technique (PASM+C) achieved the main objectives of the study, as they provided relatively fast convergence to the minimum while delivering low drag values very early in the process. Balance between exploration and exploitation: Both, PASM and PASM+C were able to achieve the low values of the objective function relatively quickly and without falling into traps of the local minima. Fault tolerance and Ability to handle discontinuous objective functions: Due to the use of estimated drag values during the surrogate modelling, the proposed parallel methods proved to be insensitive against failed design variants. This feature also enables PASM and PASM+C to handle discontinuous objective functions. On the other hand, the initial sampling with the generic LHC algorithm was vulnerable to difficulties during CFD jobs execution. To avoid such a problem, this part of the process had to be modified. In the enhanced approach the failed designs were replaced with the new ones, sampled according to the LHC rules. Assets utilisation: Similarly, for both the pure parallel (PASM) and the parallel with Cornering technique (PASM+C) methods the utilisation level of the computational resources was significantly higher than the reference methods. This was possible since the computing of the expensive objective function, for a given thread in the asynchronous approach, is stopped for only the time needed for the generation of a new surrogate model, the infill sampling, the mesh morphing and the initialisation of the CFD solution. For the pure parallel approach, the processing gap (between components of the initial sampling and the surrogate modelling) caused unnecessary delay lowering the overall effectiveness of the parallel process. This problem was eliminated by replacing the processing gap with the Cornering block. This new component consisted of coarse approximation of the relation between every input parameter and the objective function by the linear regression and the selection of the designs in the corners of the hypercube optimisation domain. The introduction of the Cornering element of the algorithm, although proposed to fill the processing gap, brings also overall performance benefits for the whole minimisation task. The computational effort of the PASM and PASM+C methods was similar or lower than the reference schemes (with the exception of gradient-based NLPQL). Additionally, the proposed methods were achieving a very low level of the drag values with relatively modest computational effort. It is important to realise, that the cumulative computational effort of the parallel methods was not higher than the sequential surrogate-based optimisation approach. On that basis, it is possible to conclude that the estimation of the objective function during the parallel optimisation process, has no adverse impact on the scalability of the parallelisation (on the level of the optimisation method). In contrast to the reference methods, the features mentioned above provide a better flexibility to the operator, giving 24

Page 24 of 31

us

cr

ip t

a choice between short time/low reliability and longer computing time/high reliability of achieving the global minimum without the need to restart the optimisation process. This aspect potentially allows using the proposed methods in the broader range of optimisation tasks. The asynchronous approach provides the benefits of high resources utilisation and the capability of ’on the fly’ hardware reconfiguration. This study also provides insights into the nature of the objective function in the corners of the hypercube of the constrained domain and the potential opportunities of rapid and significant objective function minimisation using selected corner designs. Another unique aspect of this study was utilisation of the nearest (parent) design as a starting point for the CFD solution and the analysis of its impact on the execution time both of the single CFD simulation and on the entire optimisation process. Such a solution provides a considerable reduction of the average execution time of CFD job allowing for evaluation of more designs in shorter time.

an

9. Conclusions

Acknowledgements

pt

ed

M

It has been shown that, in contrast to the reference methods, the proposed PASM and PASM+C methods provide simultaneous benefits of high performance, immunity against the traps of the local minima, fault tolerance, ability to handle discontinuous functions, high level of assets utilisation and relatively low computational effort needed for objective function minimisation. The aforementioned features make the proposed methods to be perfectly suited for use in the cloud-based HPC environment, especially in the applications with highly variable objective function evaluation time.

Ac ce

The authors would like to express their gratitude to J. Korpela and M. Theman from the Nokia Salo R&D Centre for the opportunity of using the computational resources there, C. Othmer from the Volkswagen Group as well as other participants of European Commission FP7 FLOWHEAD Project (218626) [6] for the insight into the aerodynamics simulation and optimisation setup in the industrial environment. Additionally, authors would like to thank Alastair Smith for his contribution in editing and proofreading. References

[1] G. Le Good, C. Johnson, B. Clough, R. Lewis, The aesthetics of low drag vehicles, SAE International Journal of Engines 4 (2011-37-0016) (2011) 2638–2658. [2] M. Samples, A. Gaylard, S. Windsor, Aerodynamic development of the Range Rover Evoque, 8th MIRA Aerodynamics Conference. [3] A. Wagner, The aerodynamics of the new Audi Q5, 7th MIRA International Vehicle Aerodynamics Conference.

25

Page 25 of 31

Ac ce

pt

ed

M

an

us

cr

ip t

[4] R. Van Haaren, Assessment of electric cars range requirements and usage patterns based on driving behavior recorded in the national household travel survey of 2009, Earth and Environmental Engineering Department, Columbia University, Fu Foundation School of Engineering and Applied Science, New York 51 (2011) 53. [5] E. Stavropoulou, M. Hojjat, K.-U. Bletzinger, Fluid optimisation workflows for highly effective automotive development processes. [6] J.-D. M¨ uller, Fluid optimisation workflows for highly effective automotive development processes, http: //cordis.europa.eu/project/rcn/90102_en.html, [Online; accessed 23-June-2016] (2012). [7] J. Jagrik, S. Klepacek, V. Jakubec, J. Hulicka, Aerodynamic development of the Skoda Roomster, 7th MIRA International Vehicle Aerodynamics Conference, pp (2008) 62–74. [8] L. Dumas, CFD-based optimization for automotive aerodynamics, Optimization and Computational Fluid Dynamics, (2008) 191–215. [9] E. Ribaldone, G. Scantamburlo, Application of multi-objective optimization to the aerodynamic development of passenger cars at FIAT, in: 8th MIRA International Vehicle Aerodynamics Conference, 2010, pp. 110–119. [10] J. Barr, AWS Blog, Now Available вҐҮ New C4 Instances, https://aws.amazon.com/blogs/aws/ now-available-new-c4-instances/, [Online; accessed 27-August-2016] (2016). [11] C. Greenshields, Cost of CFD in the Cloud, https://cfd.direct/cloud/cost/, [Online; accessed 27-December-2016] (2016). [12] M. Behr, Comments on CFD code performance on scalable architectures, Computer Methods in Applied Mechanics and Engineering 190 (2000) 263–277. [13] G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, in: Proceedings of the April 18-20, 1967, spring joint computer conference, ACM, 1967, pp. 483–485. ˙ oltak, D. Drikakis, J. Majewski, Parallel performance of overlapping mesh technique for [14] J. Rokicki, J. Z´ compressible flows, Future Generation Computer Systems 18 (1) (2001) 3–15. [15] G. Lombardi, M. Maganzi, F. Cannizzo, E. Cardile, Use of the CFD for the aerodynamic optimization of the car shape: Problems and application, 4th European Automotive Simulation Conference. [16] S. Petropoulou, Industrial optimisation solutions based on OpenFOAM technology, in: V European Conference on Computational Fluid Dynamics ECCOMAS, 2010. [17] G. Venturelli, E. Benini, L. Laniewski-Wollk, A kriging-assisted multiobjective evolutionary algorithm, Applied Soft Computing 58 (2017) 155–175. [18] R. T. Haftka, D. Villanueva, A. Chaudhuri, Parallel surrogate-assisted global optimization with expensive functions – a survey, Structural and Multidisciplinary Optimization 54 (1) (2016) 3–13. doi:10.1007/s00158-016-1432-3. DOI http://dx.doi.org/10.1007/s00158-016-1432-3 [19] V. G. Asouti, I. C. Kampolis, K. C. Giannakoglou, A grid-enabled asynchronous metamodel-assisted evolutionary algorithm for aerodynamic optimization, Genetic Programming and Evolvable Machines 10 (4) (2009) 373–389. doi:doi:10.1007/s10710-009-9090-5. [20] J. Janusevskis, R. Le Riche, D. Ginsbourger, R. Girdziusas, Expected improvements for the asynchronous parallel global optimization of expensive functions: Potentials and challenges, in: Learning and Intelligent Optimization, Springer, 2012, pp. 413–418. [21] D. Ginsbourger, R. Le Riche, L. Carraro, Kriging is well-suited to parallelize optimization, in: Computational Intelligence in Expensive Optimization Problems, Springer, 2010, pp. 131–162. [22] H. Holland John, Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, USA: University of Michigan. [23] Y. Wang, M. Borland, V. Sajaev, Exploration of parallel optimization techniques for accelerator design, Proceedings of 2011 Particle Accelerator Conference (2011) 787–789. [24] J. Rokicki, W. Stalewski, J. Zoltak, Multi-disciplinary optimisation of forward-swept wing, Evolutionary Methods for Design, Optimization and Control⥹, eds.: T. Burczynski and J. Periaux, CIMNE, Barcelona, Spain. [25] S. Skinner, H. Zare-Behtash, State-of-the-art in aerodynamic shape optimisation methods, Applied Soft

26

Page 26 of 31

Ac ce

pt

ed

M

an

us

cr

ip t

Computing. [26] T. Long, L. Liu, L. Peng, Y. Li, Aero-structure coupled optimization of high aspect ratio wing using enhanced adaptive response surface method, in: 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Indianapolis, Indiana, 2012, pp. 17–19. [27] K. Ando, A. Takamura, I. Saito, Automotive aerodynamic design exploration employing new optimization methodology based on cfd, SAE International Journal of Passenger Cars-Mechanical Systems 3 (2010-01-0513) (2010) 398–406. [28] T. Burczy´ nski, A. Dlugosz, W. Ku´s, Parallel evolutionary algorithms in shape optimization of heat radiators, Journal of Theoretical and Applied Mechanics 44 (2) (2006) 351–366. [29] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, vol. Volume 4, 1995. [30] D. GmbH, Methods for multi-disciplinary optimization and robustness analysis, 2014. [31] G. Venter, J. Sobieszczanski-Sobieski, Multidisciplinary optimization of a transport aircraft wing using particle swarm optimization, in: 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, 2002, pp. 4–6. [32] U. K. Wickramasinghe, R. Carrese, X. Li, Designing airfoils using a reference point based evolutionary many-objective particle swarm optimization algorithm, in: Evolutionary Computation (CEC), 2010 IEEE Congress on, IEEE, 2010, pp. 1–8. [33] J. Nelder, R. Mead, The downhill simplex method, Computer Journal 7 (1965) 308–313. [34] J. A. Hall, Towards a practical parallelisation of the simplex method, Computational Management Science 7 (2) (2010) 139–170. [35] S. Bakshi, B. A. Jawad, S. Arslan, K. Yee, L. Liu, Multi-objective optimization of a simplified car body using computational fluid dynamics, in: ASME 2015 International Mechanical Engineering Congress and Exposition, American Society of Mechanical Engineers, 2015, pp. V07BT09A037–V07BT09A037. [36] N. Stander, K. Graig, On the robustness of a simple domain reduction scheme for simulation-based optimization, Engineering Computations 19 (2002) 431–450. [37] G. Wang, Z. Dong, P. Aitchison, Adaptive response surface method. a global optimization, Engineering Optimization 33 (2001) 707–733. [38] K. Schittkowski, NLPQL: A fortran subroutine for solving constrained nonlinear programming problems, Annals of Operations Research 5 (1986) 485–500. [39] N. Bartoli, T. Lefebvre, S. Dubreuil, R. Olivanti, N. Bons, J. Martins, M. Bouhlel, J. Morlier, An adaptive optimization strategy based on mixture of experts for wing aerodynamic design optimization, in: 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, 2017, p. 4433. [40] A. Huppertz, P. M. Flassig, R. J. Flassig, M. Swoboda, Knowledge-based 2d blade design using multiobjective aerodynamic optimization and a neural network, ASME Paper No. GT2007-28204. [41] X. Gong, Z. Gu, Z. Li, X. Song, Y. Wang, Aerodynamic shape optimization of a container-truck’s wind deflector using approximate model, Tech. rep., SAE Technical Paper (2010). [42] D. R. Jones, M. Schonlau, W. J. Welch, Efficient global optimization of expensive black-box functions, Journal of Global optimization 13 (4) (1998) 455–492. [43] M. Ahmed, N. Qin, Surrogate-based aerodynamic design optimization: use of surrogates in aerodynamic design optimization, in: 13th International Conference on Aerospace Science & Aviation Technology, Cairo, Egypt, 2009, pp. 26–28. [44] J. M¨ uller, MATSuMoTo: The MATLAB surrogate model toolbox for computationally expensive blackbox global optimization problems, preprint, arXiv (2014). arXiv:1404.4261. [45] S. N. Lophaven, H. B. Nielsen, J. SГҷndergaard, DACE - a MATLAB kriging toolbox, version 2.0. [46] G. G. Wang, S. Shan, Review of metamodeling techniques in support of engineering design optimization, Journal of Mechanical design 129 (4) (2007) 370–380. [47] D. Jones, Global optimization with response surfaces, in: Fifth SIAM conference on optimization, Victoria, Canada, 1996. [48] N. V. Queipo, R. T. Haftka, W. Shyy, T. Goel, R. Vaidyanathan, P. K. Tucker, Surrogate-based analysis and optimization, Progress in aerospace sciences 41 (1) (2005) 1–28.

27

Page 27 of 31

Ac ce

pt

ed

M

an

us

cr

ip t

[49] M. D. McKay, R. J. Beckman, W. J. Conover, Comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21 (2) (1979) 239–245. [50] S. N. Lophaven, H. B. Nielsen, J. Søndergaard, Aspects of the MATLAB toolbox DACE, Tech. rep., Informatics and Mathematical Modelling, Technical University of Denmark, DTU (2002). [51] J. Liu, Z. Han, W. Song, Comparison of infill sampling criteria in kriging-based aerodynamic optimization, in: 28th Congress of the International Council of the Aeronautical Sciences, 2012, pp. 23–28. [52] S. Krajnovic, Aerodynamic optimization of vehicles using computational fluid dynamics and response surface methodology, Science & Motor Vehicles Belgrad. [53] R. L. Lietz, Vehicle aerodynamic shape optimization, Tech. rep., SAE Technical Paper (2011). [54] F. Campos, P. Geremia, E. Skaperdas, Automatic optimization of automotive designs using mesh morphing and cfd, Proceedings of the 3rd EACC, Frankfurt, Germany. [55] R. G. Regis, C. A. Shoemaker, A stochastic radial basis function method for the global optimization of expensive functions, INFORMS Journal on Computing 19 (4) (2007) 497–509. [56] J. H. Friedman, Multivariate adaptive regression splines, The annals of statistics (1991) 1–67. [57] J. M¨ uller, R. Pich´e, Mixture surrogate models based on Dempster-Shafer theory for global optimization problems, Journal of Global Optimization 51 (1) (2011) 79–104. [58] J. M¨ uller, User guide for modularized surrogate model toolbox, Department of Mathematics, Tampere University of Technology, Tampere, Finland. [59] C. Gilkeson, V. Toropov, H. Thompson, M. Wilson, N. Foxley, P. Gaskell, Dealing with numerical noise in cfd-based design optimization, Computers & Fluids 94 (2014) 84–97. [60] R. Jin, X. Du, W. Chen, The use of metamodeling techniques for optimization under uncertainty, Structural and Multidisciplinary Optimization 25 (2) (2003) 99–116. [61] R. Lewis, A. Mosedale, I. Annetts, Using OpenFOAM and ANSA for road and race car CFD, 3rd ANSA & BETA International Conference. [62] Y. Kuya, K. Takeda, X. Zhang, A. I. Forrester, et al., Multifidelity surrogate modeling of experimental and computational aerodynamic data sets, AIAA journal 49 (2) (2011) 289. [63] D. Bingham, P. Ranjan, W. J. Welch, Design of computer experiments for optimization, estimation of function contours, and related objectives, Statistics in Action: A Canadian Outlook 109. [64] W. C. Van Beers, J. P. Kleijnen, Kriging interpolation in simulation: a survey, in: Simulation Conference, 2004. Proceedings of the 2004 Winter, Vol. 1, IEEE, 2004. [65] D. Bursztyn, D. M. Steinberg, Comparison of designs for computer experiments, Journal of Statistical Planning and Inference 136 (3) (2006) 1103–1119. [66] K. J. Craig, N. Stander, D. Dooge, S. Varadappa, Mdo of automotive vehicle for crashworthiness and nvh using response surface methods, AIAA paper 5607. [67] B. H. Mason, R. T. Haftka, E. R. Johnson, G. L. Farley, Variable complexity design of composite fuselage frames by response surface techniques, Thin-walled structures 32 (4) (1998) 235–261. [68] E. F. Campana, G. Liuzzi, S. Lucidi, D. Peri, V. Piccialli, A. Pinto, New global optimization methods for ship design problems, Optimization and Engineering 10 (4) (2009) 533. [69] C. Othmer, Adjoint methods for car aerodynamics, Journal of Mathematics in Industry 4 (1) (2014) 1–23. [70] R. den Braembussche, Optimization and computational fluid dynamics (2008). [71] G. Buresti, G. Iungo, G. Lombardi, Methods for the drag reduction of bluff bodies and their application to heavy road-vehicles, 1st Interim Report Contract between CRF and DIA, DDIA2007-6. [72] A. M. Gilhaus, V. E. Renn, Drag and driving-stability-related aerodynamic forces and their interdependence: Results of measurements on 3/8-scale basic car shapes, SAE Technical Paper 860211. [73] G. L. Good, K. Garry, On the use of reference models in automotive aerodynamics, SAE World Congres 1308. [74] X. Hu, B. Yang, Y. Lei, J. Wang, X. Li, L. Liao, T. Xu, Automotive shape optimization using the radial basis function model based on a parametric surface grid, Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 230 (13) (2016) 1808–1821. [75] H. Ahmed, S. Chacko, Computational optimization of vehicle aerodynamics, in: Proc. of the 23rd

28

Page 28 of 31

Ac ce

pt

ed

M

an

us

cr

ip t

International DAAM Symposium, Vol. 23, 2012, pp. 313–318. [76] S. Perzon, On blockage effects in wind tunnels–a cfd study, Tech. rep., SAE Technical Paper (2001). [77] J. Kim, M. V. Pattermann, Justin, Mokhtar, Cfd optimization via sensitivity-based shape morphing, in: Proceedings of the 2011 ASEE North Central & Illinois-Indiana Section Conference, 2011. [78] S. A. Beta Cae Systems, ANSA user guide, v15.1.x. [79] G. Carr, W. Stepleford, Blockage effects in automotive wind-tunnel testing, SAE Technical Paper 860093. [80] OpenCFD Ltd., Openfoam user guide (2014). DOI http://www.openfoam.com/documentation/ [81] S. Thomas, O. Carsten, Adjoint optimization for vehicle external aerodynamics, International Journal of Automotive Engineering 7.1 (2016) 1–7. [82] T. Janson, J. Piechna, Numerical analysis of aerodynamic characteristics of a of high-speed car with movable bodywork elements, Archive of Mechanical Engineering 62 (4) (2015) 451–476. [83] W.-H. Hucho (Ed.), Aerodynamics of road vehicles: from fluid mechanics to vehicle engineering, Elsevier, 2013. [84] J. Piechna, Podstawy aerodynamiki pojazd´ow, Wydawnictwa Komunikacji i La˛czno´sci, 2000. [85] A. Jaworski, J.-D. M¨ uller, Toward modular multigrid design optimisation, Advances in Automatic Differentiation (2008) 281–291. [86] L. Laniewski-Wollk, J. Rokicki, Adjoint lattice boltzmann for topology optimization on multi-gpu architecture, Computers & Mathematics with Applications 71 (3) (2016) 833–848. [87] E. Stavropoulou, M. Hojjat, K.-U. Bletzinger, Uncertainty management for robust industrial design in aeronautics (2013). DOI www.umrida.eu [88] D. Jones, J.-D. M¨ uller, F. Christakopoulos, Preparation and assembly of discrete adjoint CFD codes, Computers & Fluids 46 (1) (2011) 282–286.

29

Page 29 of 31

e

Graphical abstract (for review) Shape optimisation method based on the surrogate models in the parallel asynchronous environment

Mesh generation ANSA

CAD geometry

Mesh Morphing ANSA

Use Nearest Design as Start Point

CFD Solver OpenFOAM

Asynchronous Infill Sampling - CAND

Data Base With Results For All Designs

Objective Function (drag coef. - cd) Evaluation

Mixed Surrogate Models RBF + MARS

Objective Function Estimation

Y Available HPC N Page Resources

ce

Sampling in the corners of design space Cornering

Ac

Initial Sampling LHC

pt

Parallel Optimisation Loop, Multiple Asynchronous Threads

30 ofWait 31

ip t

*Highlights (for review)

cr

• This paper introduces Parallel Asynchronous Surrogate Model (PASM) method for Engineering Design Optimisation.

us

• The PASM method is enhanced by the use of the Cornering technique (PASM+C), which selects the best performing designs in the corners of the parametric design space.

an

• The performance and other features of PASM and hybrid PASM+C are compared with variety of popular optimisation methods in the test case of car aerodynamics shape optimisation.

Ac ce p

te

d

M

• This study provides insight into the wallclock time, fault tolerance and computing resources needed for the minimisation of the objective function, important factors when using cloud-based High Performance Computing (HPC) services.

Page 31 of 31