Computers & Fluids 90 (2014) 164–171
Accelerated solution of problems of combustion gas dynamics on GPUs

B.P. Rybakin (corresponding author, [email protected]), L.I. Stamov, E.V. Egorova
SRISA RAS, Nahimovskiy Prospect, 36, 1, Moscow 117218, Russia
MSU, Leninskie Gory, GSP-1, Moscow 119991, Russia
Article history: Received 29 June 2013; Received in revised form 19 October 2013; Accepted 5 November 2013; Available online 16 November 2013

Keywords: Gas dynamics; Combustion; Parallel algorithms; MultiGPU
Abstract

The paper deals with the solution of gas dynamics problems with chemical kinetics on graphics accelerators. To solve these problems, schemes are analyzed that combine high resolving power in regions of small perturbations with monotonicity in regions of strong discontinuities. Problems that account for chemical kinetics are studied, addressing the transition from combustion to detonation. Parallel algorithms for the analyzed difference models are built for computation on graphics processors using the CUDA technology. The proposed algorithms and programs achieve an acceleration of 41.66 times on four M2090 graphics accelerators compared to the CPU.

© 2013 Elsevier Ltd. All rights reserved.
1. Introduction

The solution of 3D gas dynamics problems that take into account combustion and detonation processes as well as gravitational and magnetic fields is of major practical interest. Such solutions are used in designing cars and airplanes, increasing the output of oil-bearing strata, studying galaxy formation and supernova explosions, weather forecasting, climate modeling, etc. High-resolution difference schemes developed in recent years enable mathematical modeling of the above processes, but 3D calculations on detailed grids require large computational resources. The emergence of new program development technologies for the GPU (Graphics Processing Unit) therefore opens additional possibilities for intensive computing. As GPU performance has grown, the number of cores in such units has reached hundreds or even thousands. For example, the Tesla C1060 has 240 cores, and Fermi graphics processors have up to 512 cores. In the new Kepler architecture, the number of CUDA cores exceeds 3000.

At the moment, there are two main technologies for developing programs that run on GPUs: CUDA and OpenCL. CUDA (Compute Unified Device Architecture) is NVIDIA's technology for programming graphics accelerators [1]; it supports C, FORTRAN and other programming languages. OpenCL (Open Computing Language) was developed jointly by several companies, and it enables the development of programs for NVIDIA and ATI graphics accelerators as well as regular processors (CPUs). In this way, intensive mathematical calculations on graphics accelerators become possible.

2. High resolution difference schemes

Modeling of 3D gas dynamics flows is a complex nonlinear dynamic problem that places high demands on the difference schemes used. These schemes must reproduce the behavior of a substance in the vicinity of discontinuities and high-gradient areas as accurately as possible, and reliably describe small perturbations away from shock wave fronts. To address such conflicting demands, difference schemes are used that combine high resolving power in regions of small perturbations with monotonicity in regions of strong discontinuities. Such schemes include TVD, ENO, WENO, PPM and other types of schemes. These second-order-accurate non-linear schemes with limited total variation make it possible to compute shock waves with high resolution and to prevent non-physical oscillations at their fronts. Among these schemes, preference is given to those that admit good parallel algorithms, enabling calculations on heterogeneous computing systems with graphics accelerators.

Computation of gas dynamics flows in complex problems is not the only factor that requires the use of high-performance computers. Additional computation is needed to solve combustion and detonation problems, and to study gravitational and magnetic fields. Often, the time spent on these additional computations greatly exceeds the time spent on the gas dynamics phase. This work provides solutions of several problems of computational gas dynamics, both with the influence of chemical kinetics [2,3] and turbulence taken into account, and for pure gas dynamics flows.
The modeling of multidimensional processes of disturbance propagation in a gas requires large computational resources. The gas flow equations can be written in conservative form as

\frac{\partial U}{\partial t} + \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z} = 0,    (1)

where U is the vector of conservative variables and F_x, F_y, F_z are the numerical fluxes. In Eqs. (1) for an ideal gas, the vector U is

U = (\rho, \rho v_x, \rho v_y, \rho v_z, \rho E)^T,    (2)

F_x = (\rho v_x, \rho v_x^2 + p, \rho v_x v_y, \rho v_x v_z, (\rho E + p) v_x)^T.    (3)

Here v = (v_x, v_y, v_z) is the velocity vector, E = |v|^2/2 + p/((γ − 1)ρ) is the specific total energy, ρ is the density and p is the pressure. The fluxes F_y and F_z are defined analogously to (3). In the equation of state, γ was taken to be γ = 5/3.

3. Testing of schemes

Three high-resolution schemes were analyzed: (a) the TVD scheme [4], (b) the modification of the TVD scheme suggested by Jiang and Tadmor [5], and (c) the PPM scheme [6]. A program was written for each of these schemes, and a series of 1D, 2D [7] and 3D [8] tests was conducted. Testing of the programs on 1D problems was carried out with the Sod and Lax tests (Fig. 1); other tests from [4] were also used.
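For illustration, the following minimal sketch (not the authors' code; the struct and function names are hypothetical) shows how the x-flux (3) can be evaluated from the conservative vector (2) on the host or the device:

// Minimal sketch: x-flux (3) from the conservative variables (2), ideal gas, gamma = 5/3.
struct State { double rho, mx, my, mz, rhoE; };   // (rho, rho*vx, rho*vy, rho*vz, rho*E)

__host__ __device__ State fluxX(const State &u)
{
    const double gamma = 5.0 / 3.0;
    double vx = u.mx / u.rho, vy = u.my / u.rho, vz = u.mz / u.rho;
    double kinetic = 0.5 * u.rho * (vx * vx + vy * vy + vz * vz);
    double p = (gamma - 1.0) * (u.rhoE - kinetic);   // from E = |v|^2/2 + p/((gamma - 1) rho)
    State f;
    f.rho  = u.mx;                // rho * vx
    f.mx   = u.mx * vx + p;       // rho * vx^2 + p
    f.my   = u.mx * vy;           // rho * vx * vy
    f.mz   = u.mx * vz;           // rho * vx * vz
    f.rhoE = (u.rhoE + p) * vx;   // (rho*E + p) * vx
    return f;
}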
Fig. 1 shows the testing results for the Sod test problem, panels (a, b), and the Lax test problem, panels (c, d); panels (a) and (c) were computed with the TVD-type scheme, panels (b) and (d) with the PPM scheme. Density graphs are provided, with the analytical solution plotted in red and the computed solution as a blue line. It is noteworthy that the PPM scheme gives better agreement between the analytical solution and the numerical calculations. On the other hand, the PPM scheme requires considerably more computation time and a more complex algorithm for work on graphics accelerators.

For 2D testing, the test problems provided in [9] were used. Initial data for these tests are presented in Table 1. The rectangular computational domain was divided into four equal rectangles, and different initial conditions for pressure, density, energy and the velocity components v_x and v_y were specified in each.
Fig. 1. Sod test problem for TVD-type method (a) and PPM method (b); Lax test problem for TVD-type method (c) and PPM method (d).
Table 1
Initial data for 2D test problems used by R. Liska and B. Wendroff [9].

Case |  Left: p, rho, vx, vy             |  Right: p, rho, vx, vy            |  T
  3  |  0.3,    0.5323,  1.206,   0.0    |  1.5,   1.5,     0.0,     0.0     |  0.3
     |  0.029,  0.138,   1.206,   1.206  |  0.3,   0.5323,  0.0,     1.206   |
  4  |  0.35,   0.5065,  0.8939,  0.0    |  1.1,   1.1,     0.0,     0.0     |  0.25
     |  1.1,    1.1,     0.8939,  0.8939 |  0.35,  0.5065,  0.0,     0.8939  |
  6  |  1.0,    2.0,     0.75,    0.5    |  1.0,   1.0,     0.75,   -0.5     |  0.3
     |  1.0,    1.0,    -0.75,    0.5    |  1.0,   3.0,    -0.75,   -0.5     |
 12  |  1.0,    1.0,     0.7276,  0.0    |  0.4,   0.5313,  0.0,     0.0     |  0.25
     |  1.0,    0.8,     0.0,     0.0    |  1.0,   1.0,     0.0,     0.7276  |
 15  |  0.4,    0.5197, -0.6259, -0.3    |  1.0,   1.0,     0.1,    -0.3     |  0.2
     |  0.4,    0.8,     0.1,    -0.3    |  0.4,   0.5313,  0.1,     0.4276  |
 17  |  1.0,    2.0,     0.0,    -0.3    |  1.0,   1.0,     0.0,    -0.4     |  0.3
     |  0.4,    1.0625,  0.0,     0.2145 |  0.4,   0.5197,  0.0,    -1.1259  |
The first column of Table 1 contains the number of the test problem from [9]. The "Left" and "Right" columns give the gas dynamic values to the left and to the right of the central vertical line that divides the rectangle. The first row of each case corresponds to the upper half of the computational domain, the second row to the lower half. The last column gives the end time of the computation. Thus, variant (a) in Fig. 2 corresponds to case 3 in Table 1, whereas case 4 corresponds to variant (b) in that figure. All calculations were conducted with the TVD scheme [5]. The difference grid varied from 64 × 64 to 2048 × 2048 cells. At time t = 0 the gas-filled quadrants begin to interact with each other. The calculated results agree well with the results provided in [9].

Three-dimensional test problem. We consider the interaction of a shock wave with a spherical region filled with low-density gas [7,8]. Let the region R: {0 ≤ x ≤ 1.0, 0 ≤ y ≤ 1.0, 0 ≤ z ≤ 1.0} be a parallelepiped filled with quiescent gas of density ρ0 = 1.0 and pressure p0 = 1.0. Inside this parallelepiped there is a spherical domain of radius r = r0 centered at the point (x0, y0, z0). To the left, at a prescribed x coordinate, there is a planar shock wave moving toward the spherical region. The initial conditions ahead of the shock wave and outside the spherical bubble are specified as (p, ρ, u, v, w)^T = (p_h, ρ_h, u_h, v_h, w_h)^T. Inside the spherical region, the pressure and density are p = p_b = 1.0, ρ = ρ_b = 0.1. Behind the shock wave, the values are determined from the Rankine–Hugoniot relations. The computational parallelepiped was covered by a rectangular grid of between 64 × 64 × 64 and 1024 × 1024 × 1024 cells [6].

Computations of the 3D test problem were conducted on an Intel Core i7 920 processor and on NVIDIA Tesla C1060 and GeForce 570 GTX graphics accelerators. The architectures of these two accelerators are different. The Tesla C1060 has 30 streaming multiprocessors with 8 single-precision cores each, i.e. 240 cores in total, and 4 GB of available memory. The GeForce 570 GTX has 480 cores and 1.2 GB of memory. The graphics accelerator was used to compute the fluxes at cell boundaries, the flux corrections, and the limiters.

The Tesla and Fermi architectures use multithreaded streaming multiprocessors (SMs), which enable vectorization via the SIMT (Single Instruction, Multiple Threads) model. This makes it possible to create and manage thousands of concurrent threads. Threads can be organized into 1D, 2D and 3D thread blocks, each operating on its own core [10]. All threads execute independently, using their own registers and local memory. The built-in variables threadIdx and blockIdx [1] identify each thread and thread block.
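As an illustration of this thread organization, the following minimal sketch (hypothetical kernel, not the authors' code) maps a 3D grid of thread blocks onto grid cells through threadIdx and blockIdx:

// One thread per grid cell, indexed via the built-in variables threadIdx and blockIdx.
__global__ void scaleDensity(double *rho, int nx, int ny, int nz, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= nx || j >= ny || k >= nz) return;   // guard against padding threads

    int cell = (k * ny + j) * nx + i;            // linear index of the cell
    rho[cell] *= factor;                         // trivial per-cell work, for illustration
}

// Example launch for a 256^3 grid (block and grid sizes chosen for illustration):
//   dim3 block(8, 8, 8);
//   dim3 grid((256 + 7) / 8, (256 + 7) / 8, (256 + 7) / 8);
//   scaleDensity<<<grid, block>>>(d_rho, 256, 256, 256, 1.0);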
Calculations on the GeForce 570 GTX give an acceleration of 15.2 times compared to a single processor core on a 256 × 256 × 256 grid, while the main program runs on the central processor. The shock front is resolved within two or three computational cells [8]. Thus, the conducted testing of the proposed schemes and of the 3D hydrodynamic flow modeling programs shows that this approach can be applied to various problems of transient gas dynamics and combustion gas dynamics on heterogeneous computing systems.
4. Statement of the combustion gas dynamics problem

The set of gas dynamics equations that describes a multicomponent gas mixture with chemical kinetics comprises the classical conservation laws of Navier–Stokes type and can be written as follows:

\frac{\partial \rho_k}{\partial t} + \nabla\cdot\left(\rho_k v - D_e \rho \nabla\frac{\rho_k}{\rho}\right) = \dot{\omega}_k,    (4)

\frac{\partial(\rho v)}{\partial t} + \nabla\cdot(\rho v v - \tau) + \nabla p = 0,    (5)

\frac{\partial(\rho E)}{\partial t} + \nabla\cdot(\rho E v + p v - \kappa_e \nabla h + \mu_e \nabla K - \tau\cdot v) = \dot{q},    (6)

where ρ = \sum_k \rho_k is the sum of the partial densities of the mixture,

\tau = \mu_e\left[\nabla v + (\nabla v)^T - \frac{2}{3}(\nabla\cdot v) I\right] - \frac{2}{3}\rho K I

is the stress tensor,

h = \sum_k \frac{\rho_k}{\rho}\left(e_k + \frac{R T}{W_k}\right)

is the enthalpy per unit mass of the perfect-gas mixture, and ω̇_k is the rate of formation of component k in chemical reactions. The effective transport coefficients in (4)–(6) are μ_e = μ + μ_T, κ_e = μ/Pr + μ_T/Pr_T, D_e = μ/Sc + μ_T/Sc_T, where μ, μ_T are the molecular and turbulent viscosities of the mixture, and Pr, Pr_T, Sc, Sc_T are the Prandtl number, turbulent Prandtl number, Schmidt number and turbulent Schmidt number, respectively. The Prandtl and Schmidt numbers were assumed constant in the model. The value q̇ models the inflow of heat into the system from the outside, for example for mixture ignition. The perfect gas model was used as the equation of state (7):

p = R T \sum_k \frac{\rho_k}{W_k}.    (7)
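A minimal sketch (hypothetical helper, not the authors' code) of evaluating the mixture equation of state (7) from the partial densities:

// Mixture equation of state (7): p = R * T * sum_k rho_k / W_k.  Usable on host or device.
__host__ __device__
double mixturePressure(const double *rho_k,   // partial densities of the species
                       const double *W_k,     // molar masses of the species
                       int nSpecies,
                       double T)              // temperature
{
    const double R = 8.314462618;             // universal gas constant, J/(mol K)
    double sum = 0.0;
    for (int k = 0; k < nSpecies; ++k)
        sum += rho_k[k] / W_k[k];
    return R * T * sum;
}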
Fig. 2. Results for 2D Riemann problem case 3 (a), 4 (b), 6 (c), 12 (d), 15 (e), 17 (f) with initial data from Table 1. Density is displayed by color, pressure by contours and velocity by arrows.
To model the turbulent gas flow, a RANS (Reynolds-Averaged Navier–Stokes) turbulence model based on the k–ε model is used, described by the following system of equations (8), (9):

\frac{\partial(\rho\varepsilon)}{\partial t} + \nabla\cdot\left(\rho\varepsilon v - \sigma^{-1}\mu_T\nabla\varepsilon\right) = \frac{\varepsilon}{K}\left(C_1 P_T - C_2 \rho\varepsilon\right),    (8)

\frac{\partial(\rho K)}{\partial t} + \nabla\cdot(\rho K v - \mu_T\nabla K) = P_T - \rho\varepsilon,    (9)
where K is the turbulent kinetic energy, ε is the dissipation rate, μ_T = C_μ ρ K² ε^{-1} is the turbulent viscosity, and P_T is the turbulence production. A series of simplifications were made in this model. The force of gravity and energy transfer by radiation were not taken into consideration. The medium was considered single-speed and single-temperature, and all components of the gas mixture can mix with each other.

This system was solved using the method of splitting into physical processes. It was based on a modification of the 2D centered second-order-accurate TVD (Total Variation Diminishing) scheme proposed by Jiang and Tadmor [5], which is an extension of the 1D predictor–corrector scheme of Nessyahu and Tadmor [11]. The works of Harten [12] were pioneering for this type of scheme. This scheme type is much simpler than schemes that use the Riemann invariants, yet it has good accuracy, adequate for a large number of problems. To obtain a monotone solution, the values used for flux extrapolation were limited. This scheme and its parallel implementation were used to calculate the convective part in the splitting method.

A gas-phase kinetic mechanism was implemented for the hydrogen–oxygen mixture (H2, N2, O2), with nitrogen as the neutral component. The mechanism is based on the work of Maas and Warnatz [3] and contains nineteen reversible reactions among nine components.

To study the transition from combustion to detonation, we consider the problem of computing a transient combustion process in a chemically reacting gas phase. The gas phase consists of components k that can mix with each other [2]. The convective part was computed with the centered TVD-class scheme (see Fig. 3).

To check the accuracy of the applied chemical kinetics algorithm, consider the following example. Let a square 2D region R: {0 ≤ x ≤ 1.0, 0 ≤ y ≤ 1.0} be filled with a perfectly mixed quiescent mixture of perfect gases. In the center of the region there is a continuous supply of energy. At the initial time the gas concentrations of fuel, oxidizer and diluent are {3.0 : 2.0 : 3.5}, respectively. The total density of the mixture is assumed constant and equal to ρ = 0.3, and the initial temperature of the mixture is T = 300. The external thermal energy supply is q̇ = 7 × 10^9. The perfect gas equation of state is used. Open boundary conditions are imposed at the boundaries of the computational domain. Results of the computations are provided in Figs. 4 and 5.
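A minimal sketch of setting up the initial state of this 2D ignition test follows; the field layout, helper name and the source radius are hypothetical (the paper does not specify the size of the energy-supply region):

// 2D ignition test setup: constant-density quiescent mixture with fuel : oxidizer : diluent
// concentrations of 3.0 : 2.0 : 3.5, T = 300, and a continuous energy source q_dot = 7e9
// applied near the center of the unit square.
#include <vector>

struct Cell2D {
    double rho;                   // total mixture density
    double T;                     // temperature
    double c_fuel, c_ox, c_dil;   // species concentrations (relative)
    double q_dot;                 // external heat supply in this cell
};

std::vector<Cell2D> initIgnitionTest(int nx, int ny, double sourceRadius)
{
    std::vector<Cell2D> grid(nx * ny);
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i) {
            double x = (i + 0.5) / nx, y = (j + 0.5) / ny;   // cell center in [0,1]^2
            Cell2D &c = grid[j * nx + i];
            c.rho = 0.3;  c.T = 300.0;
            c.c_fuel = 3.0;  c.c_ox = 2.0;  c.c_dil = 3.5;
            double dx = x - 0.5, dy = y - 0.5;
            c.q_dot = (dx * dx + dy * dy < sourceRadius * sourceRadius) ? 7.0e9 : 0.0;
        }
    return grid;
}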
5. Parallel algorithm

Because of the computational complexity, modeling the systems of equations specified above takes a rather long time. Therefore, the CUDA technology, which enables the use of graphics accelerators, was used to solve these problems. The CUDA programming model supports four levels of abstraction: cooperative organization of thread groups, shared memory, barrier synchronization within thread groups, and coordination of independent threads in the grid [1,10]. Graphics processors allow tens of thousands of threads to be launched simultaneously; the threads are grouped into warps of 32 threads each. Furthermore, the graphics accelerator has its own memory, which is divided into global, local, register, constant and texture memory. All of this requires a different approach to the design of computational algorithms and programs.

To solve the problems analyzed above, mathematical models, computational algorithms and programs for graphics processors were developed [8,13,14]. The computation of fluxes at cell boundaries, flux corrections, and limiters was performed on the graphics accelerator. The Tesla and Fermi architectures use multithreaded streaming multiprocessors (SMs), which enable vectorization via the SIMT (Single Instruction, Multiple Threads) model. This makes it possible to create and manage thousands of concurrent threads. Threads can be organized into 1D, 2D and 3D thread blocks, each operating on its own core. All threads execute independently, using their own registers and local memory. Computations were performed on one core of the CPU and on one, two or four graphics cards [13,14].

The algorithm was split into two parts: one describes the gas-dynamic processes and the other the chemical processes. Parallelization follows the geometric principle. The parallel version of the gas-dynamic part is similar to the algorithms used in [13,14]. During the chemical step, the processes do not depend on spatial gradients, so the calculation in each node of the computational grid depends only on the values in that node and can be performed independently of the neighboring cells. Thus each grid cell has a corresponding thread on the graphics card. When calculations are carried out on multiple graphics cards, the computational domain is divided between them into similar overlapping vertical stripes. All other calculations are performed in a similar manner.
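The sketch below illustrates the per-cell independence of the chemical step and the stripe-wise multi-GPU decomposition described above; the kernel body, the integrator and the host helper are hypothetical placeholders (not the authors' implementation), and the overlap (halo) exchange needed for the gas-dynamic part is omitted:

#include <cuda_runtime.h>

__device__ void integrateKinetics(double *species, double *T, double dt)
{
    // placeholder for the stiff ODE integration of the reaction mechanism in one cell
}

// One CUDA thread advances the kinetics of one grid cell; no spatial coupling.
__global__ void chemistryStep(double *species, double *T, int nSpecies,
                              int nCells, double dt)
{
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= nCells) return;
    integrateKinetics(species + cell * nSpecies, T + cell, dt);
}

// Host side: divide the nx columns of an nx x ny grid into vertical stripes,
// one stripe per GPU, and launch the chemical step on each device.
void chemistryStepMultiGPU(double **d_species, double **d_T, int nSpecies,
                           int nx, int ny, int nGPU, double dt)
{
    int stripe = (nx + nGPU - 1) / nGPU;              // columns per device
    for (int dev = 0; dev < nGPU; ++dev) {
        cudaSetDevice(dev);
        int cols = (dev == nGPU - 1) ? nx - dev * stripe : stripe;
        int nCells = cols * ny;
        int block = 128;
        int grid = (nCells + block - 1) / block;
        chemistryStep<<<grid, block>>>(d_species[dev], d_T[dev],
                                       nSpecies, nCells, dt);
    }
    for (int dev = 0; dev < nGPU; ++dev) {            // wait for all devices
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }
}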
Fig. 3. Density (left) and energy (right) at the time moment t = 20.07.
Fig. 4. Concentrations of H2 (a) and H2O2 (b), temperature (c) and pressure (d) at time t = 0.0001.
Fig. 5. Temperature at time t = 0.00016 (left) and t = 0.00019 (right).
Table 2
Time of calculations and acceleration for different grid sizes.

Grid size    | Time of calculations, s                  | Acceleration vs. X5650, times
             | X5650     M2090     2×M2090   4×M2090    | M2090   2×M2090   4×M2090
64 × 64      | 8.69      2.81      1.83      1.67       | 3.09    4.75      5.20
128 × 128    | 41.19     7.61      4.68      3.46       | 5.41    8.80      11.90
256 × 256    | 317.21    35.29     18.44     11.45      | 8.99    17.20     27.70
512 × 512    | 2416.15   218.53    112.82    60.45      | 11.06   21.42     39.97
1024 × 1024  | 17981.66  1650.61   836.05    431.62     | 10.89   21.51     41.66
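The timings in Table 2 correspond to kernels launched over the full grid. A minimal sketch (hypothetical helper, not the authors' code) of a launch configuration in which the number of blocks is derived automatically from the grid size, using the 8 × 4 block reported in the conclusions as optimal for the chemical stage:

#include <cuda_runtime.h>

// Compute the 2D block count covering an nx x ny grid for a given thread block.
inline dim3 blocksFor(int nx, int ny, dim3 block = dim3(8, 4))
{
    return dim3((nx + block.x - 1) / block.x,
                (ny + block.y - 1) / block.y);
}

// Usage (illustrative): launch a 2D kernel over a 1024 x 1024 grid.
//   dim3 block(8, 4);
//   dim3 grid = blocksFor(1024, 1024, block);
//   someKernel<<<grid, block>>>(...);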
Fig. 6. Acceleration on the Tesla M2090 GPU in comparison with the Intel Xeon X5650 CPU (left); acceleration on multiple GPUs (right).
Figs. 4 and 5 show the results of test computations of the gas dynamics problem with chemical processes taken into account, in a 2D domain filled with a mixture of chemically reacting gases. The duration of the calculations performed on the central processor and on one, two and four graphics processors, together with the relative acceleration, is given in Table 2 for various sizes of the computational domain. The results show that computations on one M2090 graphics processor are faster than computations on the CPU even for a small computational domain (up to 128 × 128 elements). As the number of cells in the computational domain increases, the acceleration grows (Table 2, Fig. 6), levels off at a roughly steady value, and then drops slightly; this drop is most likely associated with insufficient program optimization. Computations were conducted on one core of the Intel Xeon X5650 processor and on one, two and four graphics accelerators (GPUs). A similar result is observed with a larger computational domain of 2048 × 2048 elements for two and four graphics accelerators. The maximum acceleration on one graphics card is approximately eleven times that of the calculation on the Intel Xeon X5650 processor. An even larger acceleration is reached with 2 and 4 GPUs (Fig. 6a, b) for large grids of 1024 × 1024 and 2048 × 2048 elements.

6. Conclusions

On a graphics processor, acceleration is achieved even with a non-optimized program and a computational domain with a small number of cells. Even a modest optimization of the algorithm, consisting of better use of local memory, increases computational performance. Increasing the size of the computational domain increases performance (Table 2, Fig. 6). A similar result is observed when several graphics cards are used. The maximum acceleration on one graphics card reaches 11.06 times that of the computation on the Intel Xeon X5650 processor. The use of two graphics processors results in an acceleration of 21.51 times, and with four graphics processors the acceleration increases to 41.66 times compared to the CPU.

The obtained results can be explained by the high computational complexity of the analyzed chemical kinetics algorithm and by its structure. The chemical kinetics computation requires 2300 bytes of graphics processor memory per thread, with most of the memory occupied by double-precision variables. The subprograms that compute the gas dynamics need 388 bytes per thread. Accordingly, for the 2D test computations the optimal block size for the chemical stage was 8 × 4; blocks of other sizes led to longer computation times. Since chemical kinetics computations demand high accuracy, all computations on the graphics accelerator were conducted in double precision, which also has a significant impact on performance. The number of blocks in the grid is calculated automatically from the size of the grid.

7. Future work

In the future, it is planned to build parallel algorithms that use the OpenMP technology to work on several cores of the Intel Xeon X5650 processor together with CUDA on M2090 graphics accelerators and the new NVIDIA Kepler K20 accelerators. It is planned to use the WENO and PPM schemes, in addition to TVD, to study the impact of the computational scheme on the overall performance. A comparison of the acceleration achieved by each of these algorithms will be conducted, along with an evaluation of the accuracy of each scheme. To solve the stiff equations of chemical kinetics, it is planned to use explicit schemes, such as fourth- and fifth-order accurate
Runge–Kutta methods, instead of the implicit schemes currently in use. The computational algorithms will be modified for asynchronous work on graphics accelerators and regular processors. It is planned to compare the speed of explicit and implicit schemes for solving stiff ordinary differential equations at equal computational accuracy.

Acknowledgements

This work was conducted with the support of the General Committee of the Russian Academy of Sciences (Program 18) and the RFBR (Research Project No. 11-07-00679a).

References

[1] CUDA Toolkit Documentation; 2013.
[2] Smirnov NN, Nikitin VF, Alyari Shurekhdeli S. Perehodnye rezhimy rasprostraneniya voln v metastabilnyh sistemah [Transient regimes of wave propagation in metastable systems]. Fizika goreniya i vzryva 2008;44(5):25–37 (in Russian).
[3] Maas U, Warnatz J. Ignition processes in hydrogen–oxygen mixtures. Combust Flame 1988;74(1):53–69.
[4] Toro EF. Riemann solvers and numerical methods for fluid dynamics. Springer; 2009.
[5] Jiang G-S, Tadmor E. Nonoscillatory central schemes for multidimensional hyperbolic conservation laws. SIAM J Sci Comput 1998;19(6):1892–1917.
[6] Colella P, Woodward PR. The piecewise parabolic method (PPM) for gas-dynamical simulation. J Comput Phys 1984;54(1):174–201. http://dx.doi.org/10.1016/0021-9991(84)90143-8.
[7] Rybakin BP, Shider NI. Development of parallel algorithms for solving the problems of gravitational gas dynamics. Numer Methods Programm 2010;11:388–94 (in Russian).
[8] Rybakin BP. Modeling of III-D problems of gas dynamics on multiprocessing computers and GPU. Comput Fluids. http://dx.doi.org/10.1016/j.compfluid.2012.01.016.
[9] Liska R, Wendroff B. Comparison of several difference schemes on 1D and 2D test problems for the Euler equations. SIAM J Sci Comput 2003;25(3):995–1017.
[10] Rybakin BP. Parallelnoe programmirovanie dlya graficheskih uskoriteley [Parallel programming for graphics accelerators]. SRISA RAS; 2011 (in Russian).
[11] Nessyahu H, Tadmor E. Non-oscillatory central differencing for hyperbolic conservation laws. J Comput Phys 1990;87(2):408–48.
[12] Harten A. High resolution schemes for hyperbolic conservation laws. J Comput Phys 1983;49(3):357–93. http://dx.doi.org/10.1016/0021-9991(83)90136-5.
[13] Rybakin BP, Egorova EE, Stamov LI. Primenenie graficheskih protsessorov dlya resheniya zadach gazovoy dinamiki s uchetom uravneniy himicheskoy kinetiki [Application of graphics processors to solving gas dynamics problems with chemical kinetics equations]. Numer Methods Programm 2013;14:31–7 (in Russian).
[14] Rybakin BP, Egorova EE, Stamov LI. O primenenii graficheskikh protsessorov dlya uskoreniya resheniya zadach gazovoy dinamiki [On the use of graphics processors to accelerate the solution of gas dynamics problems]. In: Tezisy sektsionnykh dokladov Sankt-Peterburgskogo nauchnogo foruma "Nauka i obshchestvo. Nauka i progress chelovechestva"; 2012. p. 96–99 (in Russian).