Simulation of eutectic growth using phase field method and LBM based on OpenCL

Simulation of eutectic growth using phase field method and LBM based on OpenCL

Computational Materials Science 176 (2020) 109523 Contents lists available at ScienceDirect Computational Materials Science journal homepage: www.el...

1MB Sizes 0 Downloads 19 Views

Computational Materials Science 176 (2020) 109523

Contents lists available at ScienceDirect

Computational Materials Science journal homepage: www.elsevier.com/locate/commatsci

Simulation of eutectic growth using phase field method and LBM based on OpenCL

T



Chang-sheng Zhua,b, , Zhong-yan Denga, Li Fengb, Yu-jie Lia a b

College of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China State Key Laboratory of Gansu Advanced Processing and Recycling of Non-Ferrous Metal, Lanzhou University of Technology, Lanzhou 730050, China

A R T I C LE I N FO

A B S T R A C T

Keywords: OpenCL PF-LBM Eutectic growth Numerical simulation GPU computing

During the solidification of alloys, numerical simulation of forced convection eutectic growth is essential for accurate prediction and control of solidified microstructure. The phase field lattice Boltzmann model (PF-LBM) is established by coupling the phase field model of KKSM model with the LBM model of computational fluid. In order to solve the problems of PF-LBM model, which has a large amount of computation, the simulation region is small and can not be solved in a reasonable time. The parallel computing architecture based on OpenCL can give full play to the powerful computing power of NVIDIA GPU and accelerate the implementation of the algorithm to improve the efficiency of numerical simulation. The effect of forced convection on the eutectic growth of CBr4C2Cl6 eutectic alloy is systematically studied. The results showed that liquid flow altered the eutectic evolution by affecting the distributions of the solute ahead of the eutectic solid/liquid interface. Compared with serial CPU code, a single GPU can achieve a maximum acceleration ratio of 136.X. Solved the problem of 3-D numerical simulation of forced convection eutectic growth due to computational limitations.

1. Introduction As a basic solidification method, eutectic growth is a self-organizing phenomenon for simulating multi-phase solidification of complex alloys in solid/liquid phases. The essence of eutectic growth is a two-phase morphology selection and symbiotic process, and its interface morphology evolution and the choice of rules has a major impact on the final tissue performance. The eutectic alloy microstructure has become a research hotspot [1,2]. The Phase-Field Method (PFM) is a method for numerically simulating the effective microstructure and distribution of metal during solidification. It plays an important role in the study of solidification microstructure evolution [3–7]. Comparing with other phase field models, KKSM multi-component alloy eutectic multi-phase field model based on the S.G. Kim, W.T. Kim and T. Suzuki et al. overcomes the extra double well potential energy in the multiphase field and breaks through the limitation of the interface thickness parameter [8]. KKSM model is suitable for solidification process of eutectic heterogeneous alloy. The flow of molten metal caused by natural convection or forced convection is an inevitable phenomenon in the solidification process. The flow affects the microstructure of the material after solidification, including the loosening of the constituents and the distribution of defects in the pores [9,10]. The convection of the molten metal will affect ⁎

the distribution of the solute field and the temperature field, and produce different morphologies of eutectic growth under pure diffusion conditions [1,11]. So the flow field is one of the important factors affecting the growth of eutectic. At present, the simulation study of the dendritic structure of the coupled flow field has made a lot of progress [12–15]. However, the simulation of eutectic growth in coupled flow field is relatively few. Wang et al. [16] simulated the eutectic growth mode of the binary system by Monte Carlo convection. Zhang et al. [11] studied the effect of forced convection on the growth of layered eutectics using modified Jackson-Hunt theory and phase field latticeBoltzmann method. The eutectic simulation studies are mostly limited to two-dimensions. Two-dimensional simulation studies cannot fully characterize solidified structures. Therefore, it is necessary to carry out simulation studies on the three-dimensional eutectic growth of the coupled flow field [17–19]. The phase field variables introduced by the phase field model bring about boundary layer changes. The thickness of the interface layer is thin, and the amount of mesh is huge. So that the phase field equations are complicated and difficult to solve. The performance bottleneck of traditional solution calculation has restricted the development of phase field model. At present, only small regions can be simulated, and the solution can not be completed at a reasonable time. GPU computing is playing an irreplaceable role in various fields, and has become an important means of researching and

Corresponding author at: College of Computer and Communication, Lanzhou Univ. of Tech., Lanzhou 730050, China E-mail address: [email protected] (C.-s. Zhu).

https://doi.org/10.1016/j.commatsci.2020.109523 Received 8 November 2019; Received in revised form 5 January 2020; Accepted 6 January 2020 0927-0256/ © 2020 Elsevier B.V. All rights reserved.

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

solving challenging problems in various fields [20]. C.S. Zhu et al. [21] based on GPU implemented the simulation of phase field. Yamanaka et al. [22] achieved 100 times acceleration on the computational scale by implementing GPU acceleration for phase field modeling using a phase field method on a single NVIDIA TESLA C1060 GPU. Cong et al. [23] used the phase field method to study the microstructure formation of nickel-based superalloys during three-dimensional solidification. A single NVIDIA GTX1080 GPU can achieve the maximum performance of 774.292 GFLOPS (giga floating point operations per second). In summary, the parallel implementation of the 3D phase field model in GPU has become a research hotspot. But the phase field method is applied to eutectic, especially the simulation of coupling with the flow field is relatively rare [21–25]. Although GPU computing has been developed at home and abroad, most of them use the CUDA programming framework. We also explored the use of CUDA to accelerate the phase field model [21,25]. However, CUDA currently only supports Nvidia GPU, the hardware devices produced by different manufacturers cannot complete the development of programs with a unified programming framework, which hinders the wide application of high performance computing. OpenCL (Open Computing Language) is an industry-standard framework for computer programming consisting of multi-core CPUs, GPUs, FPGAs, and other coprocessors [26]. OpenCL supports a wide range of hardware, expresses massively parallel computing logic in a simple and effective way and expresses massively parallel computing logic in a simple, efficient and reasonable way, and make full use of GPU hardware resources to improve program performance. In this study, the OpenCL program was used to solve the PF-LBM model without losing the accuracy of calculation, and explored the difficulty of 3-D numerical simulation of forced convection eutectic growth. Finally, the number of layers is appropriately enlarged to study the effect of forced convection on eutectic growth.

condition; and the D3Q15 and D3Q19 lattices are always used in 3D condition. In the study three-dimensional nineteen-velocity (D3Q19) model is employed to determine the incompressible fluid flow. The distribution function of the D3Q19 model is:

fα (r + eα Δt , t + Δt ) = fα (r , t ) + Ω(f (r , t ))

where Ω(f (r , t )) is the particle collision function in the α direction; fα (r , t ) is the velocity distribution function of the particle in the α direction at time t . By introducing the Bhatnagar-Gross-Krook (BGK) collision operator, we can obtain the following lattice BGK (LBGK) model:

1 fα (r + eα Δt , t + Δt ) − fα (r , t ) = − (f αeq (r , t ) − fα (r , t )) τ

(e ·u)2 e ·u u2 f αeq = ρξa ⎡ α 4 + α 2 − 2 + 1⎤ ⎥ ⎢ 2cs c 2 cs s ⎦ ⎣

⎛0 1 0 0 − 1 0 0 1 − 1 1 − 1 1 − 1 1 − 1 0 0 0 0⎞ e = cl· ⎜ 0 0 1 0 0 − 1 0 1 1 − 1 − 1 0 0 0 0 1 − 1 1 − 1 ⎟ ⎝0 0 0 1 0 0 − 1 0 0 0 0 1 1 − 1 − 1 1 1 − 1 − 1⎠

⎧1 3, α = 0 ξα 1 18, α = 1 − 6 cs = cl ⎨ ⎩1 36, α = 7 − 18

ρ=

In this paper, the phase field model uses the KKSM multi-phase field model, and its specific derivation process is detailed in the literature [27]. The following is the eutectic phase field governing equation for the KKSM multi-phase field model.

j≠i

δF





i

δF ⎤ δφj ⎥ ⎦

j≠i





(6)

u=

1 ρ

18

∑ fα eα α=0

(7)

2.3. PF-LBM model After the flow field is added, the eutectic alloy is redistributed by the solute during the solidification process. The expression of the eutectic alloy solute diffusion equation affected by convection is:

(1)

∂c = ∇ ·D ∑ ϕi ∇ci − ∇ (u·c ) ∂t i

εij2

∑ ⎡⎢ 2 ∇2 φj + ωij φj⎤⎥ + f i (ci) − ci fc

∑ fα , α=0

where

δF = δφi

3

where cl is the lattice speed. The fluid density and macroscopic velocity of the D3Q19 model are as follows: 18

∑ sij Mij ⎡⎢ δφ

(5)

where u is the flow velocity; ρ is the local fluid density; ξα is the weight coefficient; cs is the lattice sound speed. D3Q19 model discrete speed expression and the value of weight coefficient:

2.1. Phase field model

∂φi 2 =− ∂t n

(4)

where τ is the relaxation time; f αeq (r , t ) is the particle distribution function in equilibrium. In LBM model, particle distribution function on balance condition is:

2. The establishment of eutectic phase-field coupled flow field model

n

(3)

(2)

(8)

where c represents the solute concentration; u is the flow velocity vector of the melt. D is the solute diffusion coefficient in the liquid. In the calculation domain, the initial eutectic layer spacing is changed by setting the number of grids of the initial eutectic layer. The undercooling ΔT = 0.1. The pulling speed V = 2.0μm s . The step space Δx = Δy = Δz = 0.2μm , time step Δt = 10.04 × 10−5s , Δt meet the following conditions in 3D simulations.

Where n represents the number of coexisting phases at a given point; F is the volume free energy function; φi respectively respects volume fractions of each phase; sij is Interface field variable; εij is gradient energy coefficient; f is the free energy densities; c is the compositions of each phase. Mij is the phase field kinetic coefficient; ωij is the interface thickness parameter. 2.2. Flow model

Δt1 ⩽

The Lattice Boltzmann Method (LBM) is a numerical method for simulating fluid motion based on molecular motion theory and statistical mechanics. LBM regards fluid as a large number of particles discrete on the grid, and these particles collide and migrate regularly on the grid. Qian et al. with the proposed the famous DdQm model in 1992. Where d is the spatial dimension and m is the discrete velocity number. In commonly, the D2Q9 lattices are always used in 2D

(Δx )2 (Δx )2 ·Δt2 ⩽ 6D 2 6Mij εij2

Δt = 0.9Min {Δt1, Δt2}

(9)

In the flow field, the velocity of the fluid, the density of the fluid, and the relaxation velocity in the LBM also affect the final eutectic growth pattern. This article sets the correlation to the following Table 1. 2

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

scientific computing. According to the fixed geometry value of adjacent grid points, the neighborhood calculation of a large number of time step iterations depends on the evolution of the propulsion system, and each point in the structured grid is updated by the expression. Fig. 1(d) shows the spatial calculation mode of phase field, solute field and flow field, respectively. The three-dimensional phase field and the solute field are composed of a 19-point Stencil calculation and a 7-point Stencil calculation, respectively. The LBM simulation area is divided into a cube unit of the same size as the phase field and the solute field, and a regular mesh with the phase field and the solute field is constructed. The mesh model of the flow field in D3Q19 model, the particle distribution function needs to propagate in 19 directions, and it is necessary to create an array for the distribution function in each direction. To facilitate the processing of boundary conditions, add an extra point around the entire calculation grid. Without changing the phase field and solute field value of the current time step, an array of the same size is required as an intermediate variable to store the values of the newly generated phase field and the solute field, so an array with a plurality of storage spaces of (L + 2)(W + 2)(H + 2) is required.

Table 1 LBM flow field parameter settings. Flow field parameters

Numerical value

relaxation number omega accelerated speed accel/m·s2 streamline density density/m·s

1.85 0.051 0.35

Host

(a)

Kernel function

(b) Computational domain

Device

z

NDrange

Serial code

Work-group (0,0,0)

Work-group (0,0,1)

Work-group (0,0,2)

Work-group (0,1,1)

Work-group (0,0,2)

H

input Work-group (0,1,0)

x W

£

¢

L

¢ y z

Work-group CPU memories

output

Serial code

Work-item (0,0,0)

Work-item (0,0,1)

Work-item (0,0,2)

Work-item (0,1,0)

Work-item (0,1,1)

Work-item (0,1,2)

x

˄ķ˅

W

L

˄ĸ˅

z

(c) Subdomain

(d) 14

5

15

11

18 3

10 2 8

1

4 12

6

1 7

y

H

3.2. OpenCL-based optimization x

y

Fig. 2 shows the optimization of the PF-LBM model in OpenCL. Firstly, the algorithm of PF-LBM model is analyzed by hot spot analysis (parallelism analysis). The data dependence determines the degree to which the algorithm can achieve parallelization. In order to avoid data dependency, the parallel algorithm of PF-LBM model is a set of multiple kernel. Because of the large memory required for the PF-LBM model, the transfer of large buffers between kernel is time-consuming, so the data set of the PF-LBM model is put in and always stored in the GPU. Fig. 2(a) shows that the PF-LBM Kernel function has multiple workgroup executed on GPU. Each work-item in each work-group is transmitted by the GPU thread scheduling unit to the same computing unit (CU). Each work-item in the same work-group shares the same local memory. The GPU operation unit accesses local memory one to two orders of magnitude faster than Global memory. According to the difference of speed and capacity among different levels of storage structure, the local principle is used to improve the performance by data reuse. Fig. 2(b) shows a two-dimensional area decomposition diagram as an example. The work-group configuration method of Halo Threads is used to improve the calculation efficiency. If a work-group size is (x + 2)(y + 2) , it takes three steps for the required data to be fully read to the Local Memory, read (x + y ) size data, and read upper and lower boundaries and left and right boundary data. For the last two steps, the effective work-item is much smaller than (x + y ) , and most work-items are idle because of synchronization. Introducing a boundary work-item guarantees that only one set of read operations requires the entire workgroup to calculate the data and calculate the intermediate results. Set the work-group processing sub-block size (x + 2)(y + 2) , the workgroup size written to the local memory is (x + 2)(y + 2) , and the executable work-group size corresponding to the update area of the workgroup is (x + y ) , without three readings. Reducing the waiting and waste of computing resources is caused by data reading, and of course Halo Threads also pays the cost of low efficient utilization of threads. Fortunately, the cost of the work-group thread is not high. Similarly, the 3D model uses this mechanism to calculate the phase field px , y, z as an example. Each work-item displays loads px , y, z to OpenCL's local memory, and adjacent work-item will read px ± 1, y, z , px , y ± 1, z , px , y, z ± 1 get the local memory. Fig. 2(c) shows the situation on the NVIDIA GPU platform, the PFLBM model calculates the OpenCL program and the required time under a certain grid. The following is the calculation flow and related steps. Step 1: The PF-LBM model is initialized on the CPU to create a context for the GPU, and the calculation algorithm is complex and requires high precision. Step 2: Pass the value calculated by the CPU to the GPU. The time required by the PF-LBM model to complete all calculations is

7

9 13

16

Phase field

Solute field

Flow field D3Q19 model

Fig. 1. OpenCL accelerated PF-LBM schematic, including (a) the hybrid CPU/ GPU framework, (b) the computational domain, (c) the subdomain, (d) the D3Q19 model and a 19-point Stencil calculation and a 7-point Stencil calculation. Where ① and ② represent the relationship between the compute node and the work-item.

3. GPU architecture parallelism implementation 3.1. Solving PF-LBM based on OpenCL As shown in Fig. 1(a), in using the OpenCL programming, in which the CPU is regarded as a host, and the GPU is considered to be a device capable of executing multiple threads in parallel. The entire calculation process is controlled by the CPU, and is mainly used to solve serial code problems such as initialization, memory allocation, CPU-GPU data conversion, kernel function (kernel) call and so on, and efficiently manage complex concurrent programs through lower overhead. kernel runs on the GPU and creates a large number of threads at execution. Each thread can be paralleled independently. Each thread is called a work-item, and multiple units of work are called a work-group. Several work-groups organize an integer index space called NDRange. NDRange and work-group can be logically organized into N-dimensional (1 ⩽ DIM ⩽ 3), and each dimension of NDRange is divisible by the value of workgroup. Fig. 1(b) shows the computational domain of the PF-LBM model. The upper part is the schematic diagram of the growth of the directional solidified three-dimensional eutectic layer L, W and H are the width, thickness and height of the layer respectively, which are parallel to the X, and Z axes. The α and β solid phases are alternately set according to the equilibrium volume fraction of the two phases. Fig. 1(b) also shows that the PF-LBM model discretizes the computational domain into a uniform grid using a finite difference method. The PF-LBM parallel implementation is based on the correspondence between grid nodes and work-item, in which each grid node is calculated by a work-item. When using GPU acceleration, NDRange is called by the kernel function, where all work-item belonging to the NDRange cover all compute nodes. Fig. 1(c) shows that the entire computational domain is divided into a series of non-overlapping subdomains, each of which is calculated as a work-group. Stencil computing is the core of 3

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

_kernel PF-LBM(...) {

context

_kernel PF-LBM(...) {

(a)

GPU barrier(...)

barrier(...)

...

Freed

Dispatch

Queue }

Control

}

Local Memory

CPU

Local Memory

Host Memory

PCI-E

Global Memory (buffer) Nvidia Platform

Synchronize

CPU computing

CPU computing

CPU control queue y+2

y

CPU-GPU communiction

GPU-CPU communiction

x GPU computing

x+2

(1) (2)

(b)

(3)

(c)

(4)

(5) (6)

time(min)

Fig. 2. Optimization of PF-LBM model in OpenCL, including (a) OpenCL program execution diagram, (b) Halo Threads and (c) OpenCL program timing diagram.

mainly the iterative iteration of the kernel, that is, the repeated execution of step3. Step 3: Each time the execution is completed, the host side determines whether it needs to output data: if is YES, it will execute step 3 repeatedly, otherwise execute step 4. Step 4: The buffer in the GPU memory is transferred to the host memory through PCI-E. Step 5: The calculated result is generated in the CPU by using the graphic visualization software Teclpot.

Table 2 CBr4-C2Cl6 physical properties of the alloy.

4. Simulation results and discussion 4.1. Calculation conditions In this paper, study the interaction between lamellar spacing and eutectic growth when there is liquid flow, the initial eutectic lamellar is introduced in the bottom of the calculation area. The dimensionless initial lamellar spacing is defined as N = λ λ min . The value of N represents the ratio of the initial lamellar spacing λ and the minimumundercooling lamellar spacing λ min . Where λ min can be derived from the JH model. 1 2 λ min = 13.84 − 4.9Δ′ 2 + 57Δ′ 2 − 285Δ′ 2 + 874Δ′ 2

Physical parameters

Value

Eutectic composition c mol α − L equilibrium partition coefficient k α (1)

0.118 0.75

β − L equilibrium partition coefficient kβ (1)

1.5

α liquid slope mα (K ·mol−1)

− 82

β liquid slope mβ (K ·mol−1)

164

liquid diffusion coefficient D1 (m2·s−1)

0.5 × 10−9

Solid diffusion coefficient Ds (m2·s−1)

0.5 × 10−13

β − L surface energy σ23 (J ·m−2)

6.6 × 10−3

β − L surface energy σ23 (J ·m−2)

5.8 × 10−3

α − β surface energy σ23 (J ·m−2) 5.8 × 10−3 α − β volume fraction η (1)

5.8 × 10−3 0.29

be zero to simulate a solid boundary. LBM adopts the boundary condition of non-equilibrium extrapolation format processing. 4.2. Calculation efficiency and analysis

(10)

The code is implemented by standard C++ and can be easily compiled on different platforms. The experimental platform is located in Lanzhou Branch of the Chinese Academy of Sciences. The experimental platform is equipped with Tesla M2090 GPU and Tesla K20M GPU, respectively. The main parameters are shown in Table 3. In order to reduce the time overhead of serial calculation, the eutectic lamellae is fixed to two. CPU serial code implementation takes time tCPUS1, GPU parallel code takes time to tGPUS1 on Tesla M2090 GPU, and GPU

where Δ′ = |ϕ − 0.5|, Eq. (10) has a certain scope of application: 0.1 < ϕ < 0.6 . The eutectic composition (β-phase volume fraction ηβ = 0.29) as an example. Assuming that all simulations are carried out under the isotropic interface free energy, α and β are solid phase, γ is liquid phase, and the research object is CBr4-C2Cl6 rich in CBr4. The α-phase is the fcc structure, and the β-phase rich in C2Cl6 is the bcc structure. The composition of the two endpoints of the eutectic platform in the phase diagram is referenced to the physical properties of the alloy [28]. Other phase field parameters need Refs. [28,29]. The physical properties of the CBr4-C2Cl6 alloy are shown in Table 2. In the phase field calculation, both sides parallel to the direction of the temperature gradient adopt periodic boundary conditions. The upper and lower sides adopt adiabatic boundary conditions. The numerical calculation of boundary conditions of LBM plays an important role, which has a great influence on calculation accuracy and stability. The boundary conditions affected the convection behavior. For the flow field, two kinds of boundary conditions are generally used, namely the non-slip boundary condition and the specular reflection boundary condition. These two boundary conditions have different effects on eutectic growth [30]. This study uses non-slip boundary conditions, the velocity at the top was forced to

Table 3 Hardware device specific parameters.

4

Platform

NVIDIA Tesla M2090 GPU

NVIDIA Tesla K20 M GPU

Number of processor cores Memory bus type Device memory size Double precision peak Maximum operating frequency Memory bandwidth Support OpenCL version

448 GDDR5 6GB 515 Gflops 1.15 GHz

2486 DDR5 5 GB 1.17Tflops 825 MHz

177 GB/s OpenCL1.1

208 GB/s OpenCL1.1

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

4.625, and the double floating point number needs to reach 45 for each data. Above, the performance is severely limited by the speed of memory access, so the access optimization is the focus of optimization. Different organizational threading methods affect the kernel's access to memory, and the memory access continuity can exponentially improve program performance. PF-LBM model is implemented in OpenCL by using work-group size and model dataset size to calculate NDRange size. Fig. 3 (b) shows the time spent in 10,000 time steps on a Tesla K20M GPU for multiple sets of compute scale programs. In order to meet the relationship between work-item and space nodes, the number of work-item should be greater than or equal to NDRange. The optimized NDRange and work-group have a large performance improvement, greatly reducing the execution time of the program, and getting the best work-group and NDRange through continuous debugging.

Table 4 Processing time of each platform. Mesh size

Mode

N

tCPUS1

tGPUS1

tGPUS2

105 * 31 * 220 196 * 31 * 220 237 * 31 * 140

PFM PFM PF-LBM

1.2 1.8 1.8

3550 s 10152 s 39645 s

59 s 106.8 s 365.67 s

45.45 s 81.49 s 291.02 s

parallel code takes time tGPUS2 on Tesla K20M. Table 4 shows the execution time of the three sets of models on different platforms. The calculation time of the PFM model depends on the size of the area in which the phase field. The calculation time of the PFM model increases as the number of grids increases. Although we have chosen a calculation model with a different ratio of initial lamellar spacing to minimum subcooling lamellar spacing N, the resulting computational difference is an increase in the calculated number of molecules in the solute and phase fields. After the PF-LBM phase field model is established by the coupled flow field, the amount of calculation and the amount of data increase sharply. PFM and PF-LBM models usually take hundreds of thousands of steps to complete the full computation, and the time cost of completing the full computation on the CPU is unacceptable. Fig. 3(a) shows the computational efficiency can be upgraded to millions of grids per second for measurement, in the number of grids in the X direction x the number of grids in the Y direction x the number of grids in the Z direction x iteration step/the time used in the iteration step. Compared with CPU serial program, the parallel program based on GPU has higher performance benefit. When N = 1.8, the single Tesla K20M GPU achieved the benefit of the CPU serial code 124.X under the grid scale of 196 * 31 * 220. The computational efficiency has increased by two orders of magnitude. After coupling the LBM model, N = 1.8, the coupled flow field pure diffusion eutectic directional growth PFLBM model achieved an acceleration of 136.X on the single Tesla K20M GPU at the grid size of 237 * 31 * 140. The acceleration ratio increases with the increase of calculation scale. The study also found that the lengths of ② and ③ are basically the same, but longer than ①. At this time, the hardware gain of the program executed on the Tesla K20M GPU is larger than that of the Tesla M2090 GPU. The smaller calculation amount is not enough to exert the power of the Tesla K20M GPU. In order to give full play to the computational performance of GPU, it is necessary to have enough computation to give full play to the computational power of GPU. For the powerful computing resources of the GPU, taking the NVIDIA K20 GPU as an example, the double-precision floating-point computing power is 1.17 TFlops, the storage bandwidth is only 208 GB, the program computing access ratio needs to reach

4.3. Influence of different lamellar spacing without flow on eutectic growth morphology and solute distribution In the absence of flow field, the eutectic exhibits a purely diffuse growth. Relying on the lateral concentration gradient established by the two solid phases, and a method of mutual coupling and synergistic growth perpendicular to the solid–liquid interface is form. Fig. 4 shows that the solute distribution is consistent with the eutectic growth. In other words, the eutectic evolution is highly dependent on the variation of solute concentration. When N = 1.2, the layered eutectic is in a steady state growth mode perpendicular to the symmetry of the solid–liquid interface. During the eutectic two-phase growth process, the front edge of the α-phase interface diffuses along the lateral direction, and the solute elements discharged from the interface are gradually enriched in the front edge of the β-phase interface, promoting the growth of the β-phase, and the solute element discharged from the βphase promotes α-phase. Realize the common growth of two phases. When N = 1.8, the eutectic layer propelled into the liquid phase, the phenomenon of oscillating state growth of a specific wavelength occurs. Because the width of the α-phase layer is larger, the distance from the center interface of the α-phase layer to the front edge of the β-phase interface increases, and the solute enrichment at the front of the center interface hinders the vicinity of the α-phase center growth. The α-phase center has slower growth rate than both sides of α-phase, gradually formed a “solute-rich concave”. When the interlayer spacing λ is larger than the corresponding λmin , the increasing spacing of the eutectic layer leads to an increase in the distance between the solute lateral diffusion of the eutectic interface and the solute redistribution. The solute field is disturbed and shows the enrichment solute elements at the interface front. At this time, the constraint effect of the initial lamellar spacing is

Fig. 3. Performance analysis. Among them, (a) single GPU performance improvement speedup. Where ①, ② and ③ respectively for the acceleration ratio P increase on different GPUs of the program. (b) Impact of NDRange and work-group on performance. In the paper, The time it took to select 10,000 iterative steps, in seconds (s). 5

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

side to be carried to the left, and the concentration of the solute on the right side becomes smaller. The distribution of solute distribution in the front interface of the middle interface is unevenly distributed, and the eutectic grows along a region where the concentration of the solute is small. Fig. 5(b) show that the eutectic growth morphologies at time points t = 4 × 105Δt . The eutectic continues to grow obliquely. Due to the large volume fraction of α-phase, the solute elements at the interface front are too late to diffuse, and a large number of solute elements are accumulated. The β-phase nucleation occurs at the front edge of the α-phase interface, and the eutectic layer spacing decreases and the logarithm increases. Fig. 5(a) to (d) show the evolution of the layered tilt angle, with the tilt angle first increasing and then decreasing. At the beginning of the simulation, the liquid flow rate is large, and the eutectic sheet is inclined more severely. After the eutectic growth reaches half of the area, because the flow velocity in the remaining liquid region is insufficient to maintain the slope of the eutectic growth, the tilt angle begins to decrease, and the smaller liquid flow has less effect on the lateral solute diffusion in the eutectic front liquid phase. Fig. 5(d) show that the eutectic growth morphologies at time points t = 8 × 105Δt . A phenomenon in which a layer of β-phase slice grows in the direction of liquid flow. At a small flow rate, compared with the regions with higher solute concentration, the lamellae tends to grow to the regions with higher undercooling. Fig. 6(a) shows that the eutectic grows countercurrently along the direction of liquid flow, and the oscillating growth mode at specific wavelengths is broken. The eutectic layer shows a mixed growth of an inclined state and oscillatory state. With the growth of the eutectic structure, not only the liquid flow rate is changed, but also the direction of liquid flow is changed. When N = 1.2, the liquid flow at the front of the solid–liquid interface is almost horizontal [see the Fig. 5]. Compared with N = 1.2, when N = 1.8, the flow field ring structure is obvious under the action of “solute-rich concave” [see the Fig. 6(b)]. The solute elements precipitated on the right side of α-phase are washed away by the flow field, and the solute concentration on the right side of the α-phase decreases. The solute element on the right side of the αphase is brought to the bottom, increasing the degree of solute enrichment. Moreover, the solute elements discharged from the β-phase cannot be uniformly diffused laterally due to the influence of flow. A large amount of solute elements are brought to the right of the α-phase. It promotes the absorption on the right side of α-phase and accelerates the precipitation process. In addition, the α-phase growth rate on the left side is slower than on the right side. Because under the action of the

Fig. 4. Shows the results of 3D simulation of pure diffusion growth using a single GPU for 8 layers of 3D eutectic. Among them, (a) and (b) shows the solute distribution and the corresponding phase field morphology of the eutectic initial lamellar spacing N = 1.2. (c) and (d) shows the solute distribution and the corresponding phase field morphology of the eutectic alloy at the initial lamellar spacing N = 1.8. The ruler shows the value of the phase field order parameter and the solute percentage. Where c is the solute field and p is the phase field.

smaller than the disturbance of the solute field at the interface front. The volume fraction of eutectic two phases remains unchanged during the whole disturbance process, and the solute diffusion and interface can reach equilibrium. 4.4. Effect of convection on eutectic growth morphology and solute distribution Fig. 5(a) show that the eutectic growth morphologies at time points t = 2 × 105Δt . Convection makes the eutectic layer in the early stage of growth inclined to the growth in the countercurrent direction, and the flow of the solution destroys the eutectic steady-state growth form. The solute concentration is higher in the right section of the α-phase than in the left section. The flow of liquid causes the solute element on the right

Fig. 5. Shows the three-dimensional simulation results of the forced convection on the evolution of the eutectic alloy at different times, with an initial spacing of N = 1.2 and four plies as an example. Among them, the larger the value of the grid unit/magnitude, the smaller the value of the vector. The ruler shows the solute percentage. The arrows represent the solution flow rate and direction. The ruler shows the value of the solute percentage. Where c is the solute field.

6

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

Fig. 6. Shows that under forced convection, solute distribution and the corresponding phase field morphology of the eutectic alloy at the initial lamellar spacing N = 1.8 and four plies as an example. Where (b) and (c) are partial enlarged views of the α-phase and β-phase of (a), respectively. Among them, the figure (c) shows the eutectic morphology and solute distribution of the “solute-rich concave”, respectively. The arrows represent the solution flow rate and direction. The ruler shows the value of the phase field order parameter and the solute percentage. Where c is the solute field and p is the phase field.

flow field, the solute element discharged from the right side of the αphase accumulates at the bottom of the α-phase “solute-rich concave”, and the solute concentration on the right side of the α-phase is lowered, so the α-phase grows faster on the right side. The solute elements in the “solute-rich concave” are transferred by the liquid flow, along the left side of the phase. The solute concentration at the front of the solid–liquid interface on the left side of the α-phase is relatively large, which hinders the growth. In summary, the left side of the α-phase is affected by the solute element discharged from the “solute-rich concave”, and the concentration of the interface front is larger than the right side of the α-phase. Compared with N = 1.2, when N = 1.8, the β-phase nucleation has not yet occurred in the “solute-rich concave” at the current time step. The main reasons are as follows: on one hand, the driving force for eutectic growth is provided by the concentration gradient. The flow of liquid accelerates the diffusion rate of solutes and takes away a large amount of solute elements from the α-phase. The actual driving force for eutectic growth is increasing. At this time, the growth rate of eutectic structure is relatively balanced with the solute enrichment. On the other hand, the degree of solute enrichment and layer spacing have not exceeded the upper limit of stability. The conditions for β-phase nucleation in α-phase are: the interface curvature is less than 0. In other words, the liquid/a interface mean curvature is negative. When CL, α − CE > ΔCα , the β-phase nucleation occurs. Where CL, α is the liquid phase solute component at the α-phase interface, ΔCα is the critical nucleating solute component at the α-phase interface.

and the interlamellar spacing exceed the upper limit of stability, heterogeneous nucleation occurs at the interface front of the larger volume fraction, and the eutectic layer spacing decreases and the logarithm increases. Under forced convection, the growth rate of α-phase is faster on the countercurrent side than on the downstream side. CRediT authorship contribution statement Chang-sheng Zhu: Conceptualization, Methodology, Software, Supervision, Investigation. Zhong-yan Deng: Data curation, Writing original draft, Formal analysis, Writing - review & editing, Software, Validation, Data curation. Li Feng: Visualization, Investigation, Supervision. Yu-jie Li: Software, Validation, Supervision. Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements Fund project: National Natural Science Foundation of China (Grant Nos. 51661020, 11504149 and 11364024), the Postdoctoral Science Foundation of China (Grant No. 2014M560371), and the Funds for Distinguished Young Scientists of Lanzhou University of Technology, China (Grant No. J201304).

5. Conclusion

References

This paper implements the evolution process of three-dimensional eutectic structure under forced convection based on GPU parallel simulation. The computational efficiency is improved by two orders of magnitude. The experimental results show that the flow of liquid melt changes the growth morphology of the layered eutectic, and at the initial stage it changes to grow obliquely to the countercurrent direction. When the eutectic growth region is more than half, the liquid flow rate is insufficient to maintain the tilt angle of eutectic growth. When the liquid flow is small, the influence of liquid flow on eutectic growth is small. At this time, the eutectic is dominated by pure diffusion growth. When the layer spacing is large, in α-phase with a large volume fraction, solute elements in front of the interface are easy to accumulate to produce “solute-rich concave”. When the degree of solute enrichment

[1] A. Zhang, Z. Guo, S.-M. Xiong, Quantitative phase-field lattice-Boltzmann study of lamellar eutectic growth under natural convection, Phys. Rev. E 97 (5) (2018) 053302, , https://doi.org/10.1103/PhysRevE.97.053302. [2] Silvère Akamatsu, Mathis Plapp, Eutectic and peritectic solidification patterns, Curr. Opin. Solid State Mater. Sci. 20 (1) (2016) 46–54, https://doi.org/10.1016/j. cossms.2015.10.002. [3] Y. Lu, C. Beckermann, J.C. Ramirez, Three-dimensional phase-field simulations of the effect of convection on free dendritic growth, J. Cryst. Growth 280 (1-2) (2005) 320–334, https://doi.org/10.1016/j.jcrysgro.2005.03.063. [4] Adam A. Wheeler, Bruce T. Murray, Robert J. Schaefer, Computation of dendrites using a phase field model, Phys. D: Nonlinear Phenomena 66 (1-2) (1993) 243–262, https://doi.org/10.1016/0167-2789(93)90242-S. [5] Zh.u. ChangSheng, Feng Li, Wang ZhiPing, et al., Numerical simulation of threedimensional dendritic growth using phase-field method, Acta Phys. Sin. 58 (2009)

7

Computational Materials Science 176 (2020) 109523

C.-s. Zhu, et al.

[18] B. Nestler, A.A. Wheeler, A multi-phase-field model of eutectic and peritectic alloys: numerical simulation of growth structures, Phys. D: Nonlinear Phenomena. 138 (2000) 114–133, https://doi.org/10.1016/S0167-2789(99)00184-0. [19] Li Feng, et al., Phase field modeling of lamellar eutectic growth under the influence of fluid flow, Comput. Mater. Sci. 137 (2017) 171–178, https://doi.org/10.1016/j. commatsci.2017.05.035. [20] X. Liao, N. Xiao, Emerging High-Performance Computing Systems and Technology, Sci. Sin. Inf. 46 (2016) 1175–1210, https://doi.org/10.1360/N112016-00147. [21] C. Zhu, J. Jia, L. Feng, et al., Research of three-dimensional dendritic growth using phase-field method based on GPU, Comput. Mater. Sci. 91 (2014) 146–152, https:// doi.org/10.1016/j.commatsci.2014.04.050. [22] A. Yamanaka, T. Aoki, S. Ogawa, et al., GPU-accelerated phase-field simulation of dendritic solidification in a binary alloy, J. Cryst. Growth 318 (2011) 40–45, https://doi.org/10.1016/j.jcrysgro.2010.10.096. [23] Cong Yang, Xu Qingyan, Baicheng Liu, GPU-accelerated three-dimensional phasefield simulation of dendrite growth in a nickel-based superalloy, Comput. Mater. Sci. 136 (2017) 133–143, https://doi.org/10.1016/j.commatsci.2017.04.031. [24] S. Sakane, T. Takaki, R. Rojas, et al., Multi-GPUs parallel computation of dendrite growth in forced convection using the phase-field-lattice Boltzmann model, J. Cryst. Growth 474 (2017) 154–159, https://doi.org/10.1016/j.jcrysgro.2016.11.103. [25] C. Zhu, J. Liu, M. Zhu, et al., Multi-GPU hybrid programming accelerated threedimensional phase-field model in binary alloy, AIP Adv. 8 (2018) 035312, , https:// doi.org/10.1063/1.5021730. [26] Munshi, Aaftab, The OpenCL specification. 2009 IEEE Hot Chips 21 Symposium (HCS). IEEE, 2009. [27] S.G. Kim, W.T. Kim, T. Suzuki, Phase-field model for binary alloys, Phys. Rev. 60 (1999) 7186, https://doi.org/10.1103/PhysRevE.60.7186. [28] Yang Yujuan, Yan Biao, Three-dimensional multiphase field simulation of CBr 4–C2Cl6 eutectic alloy growth I. Effect of pitch on morphology evolution, Acta Metallur. Sin. 46 (2010) 781–786. [29] Yang Yujuan, Yan Biao, Three-dimensional multiphase field simulation of CBr 4–C2Cl6 eutectic alloy growth Ⅱ. Effect of pitch on morphology evolution, Acta Metallur. Sin. 46 (2010) 787–793. [30] Ang Zhang, et al., Dependence of lamellar eutectic growth with convection on boundary conditions and geometric confinement: A phase-field lattice-Boltzmann study, Metallur. Mater. Trans. B 50 (1) (2019) 517–530, https://doi.org/10.1007/ s11663-018-1479-1.

8055–8061. [6] R. Kobayashi, Modeling and numerical simulations of dendritic crystal growth, Physica D: Nonlinear Phenomena 63 (1993) 410–423, https://doi.org/10.1016/ 0167-2789(93)90120-P. [7] T. Takak, Phase-field modeling and simulations of dendrite growth, ISIJ Int. 54 (2014) 437–444, https://doi.org/10.2355/isijinternational.54.437. [8] S.G. Kim, W.T. Kim, T. Suzuki, et al., Phase-field modeling of eutectic solidification, J. Cryst. Growth 261 (2004) 135–158, https://doi.org/10.1016/j.jcrysgro.2003.08. 078. [9] X. Tong, C. Beckermann, A. Karma, Velocity and shape selection of dendritic crystals in a forced flow, Phys. Rev. 61 (2000) 49–61, https://doi.org/10.1007/ 978-94-015-9807-1_6. [10] X. Tong, C. Beckermann, A. Karma, et al., Phase-field simulations of dendritic crystal growth in a forced flow, Phys. Rev. 63 (2001) 061601, , https://doi.org/10. 1103/PhysRevE.63.061601. [11] Ang Zhang, Zhi-peng Guo, Shou-mei Xiong, Phase-field-lattice Boltzmann study for lamellar eutectic growth in a natural convection melt, China Foundry 14 (5) (2017) 373–378, https://doi.org/10.1007/s41230-017-7186-8. [12] J.H. Jeong, J.A. Dantzig, N. Goldenfeld, Dendritic growth with fluid flow in pure materials, Metallur. Mater. Trans. A 34 (2003) 459–466, https://doi.org/10.1007/ s11661-003-0082-4. [13] C.W. Lan, C.J. Shih, Phase field simulation of non-isothermal free dendritic growth of a binary alloy in a forced flow, J. Cryst. Growth 264 (2004) 472–482, https://doi. org/10.1016/j.jcrysgro.2004.01.016. [14] D. Medvedev, K. Kassner, Lattice Boltzmann scheme for crystal growth in external flows, Phys. Rev. E 72 (2005) 056703, , https://doi.org/10.1103/PhysRevE.72. 056703. [15] Z. Guo, J. Mi, S. Xiong, et al., Phase field simulation of binary alloy dendrite growth under thermal-and forced-flow fields: an implementation of the parallel–multigrid approach, Metallur. Mater. Trans. B 44 (2013) 924–937, https://doi.org/10.1007/ s11663-013-9861-5. [16] W.-M. Wang, et al., Eutectic patterns with weak convection in binary systems, J. Cryst. Growth 240 (1-2) (2002) 313–320, https://doi.org/10.1016/S00220248(02)00907-7. [17] S. Sakane, T. Takaki, M. Ohno, et al., Three-dimensional morphologies of inclined equiaxed dendrites growing under forced convection by phase-field-lattice Boltzmann method, J. Cryst. Growth 483 (2018) 147–155, https://doi.org/10. 1016/j.jcrysgro.2017.11.029.

8