PARALLEL COMPUTING Parallel
ELSEVIER
Computing 23 (1997)
1249-1260
Parallel computing of diatomic molecular rarefied gas flows ’ Y. Matsumoto *, T. Tokumasu Department of Mechanical Engineering, The Unioersify of Tokyo, Hongo. Bunkyo-ku. Tokyo 113. Japan
Received 1 November
1996; revised 24 February 1997
Abstract A parallel algorithm for direct simulation Monte Carlo calculation of diatomic molecular rarefied gas flows is presented. For reliable simulation of such flow, an efficient molecular collision model is required. Using the molecular dynamics method, the collision of N, molecules is simulated. For this molecular dynamics simulation, the parameter decomposition method is applied for parallel computing. By using these results, the statistical collision model of diatomic molecule is constructed. For validation this model is applied to the direct simulation Monte Carlo method to simulate the energy distribution at equilibrium condition and the structure of normal shock wave. For this DSMC calculation, the domain decomposition is applied. It is shown that the collision process of diatomic molecules can be calculated precisely and the parallel algorithm can be efficiently implemented on the parallel computer. 0 1997 Elsevier Science B.V.
Keywo&:
Rarefied gas flows; Molecular collision model; Direct simulation Monte Carlo method; Parameter
decomposition method; Domain decomposition; Load balance; Distributed memory multiprocessors
1. Introduction The direct
simulation
Monte
Carlo (DSMC)
method
is the most powerful
and widely
used to simulate rarefied gas flows. In this method, the calculation domain is divided into cells, the gas molecules are individually tracked in the domain and. the collision process between molecules is calculated in the cell by the Monte Carlo method using
’
Corresponding
author.
Tel.:
+X1-3-381221
I I. fax: + 81-3-38180835;
e-mail:
[email protected]
tokyo.ac.jp.
’ The numerical calculations were performed on T3D Corporation, Japan. 0167~8191/97/%17.00
0
PI/ SOl67-8191(97)00051-3
al Power Reactor and Nuclear Fuel Development
1997 Elsevier Science B.V. All rights reserved.
1250
Y. Matsumoro, T. Tokumasu/Parallel Compuring23 (1997) 1249-1260
random numbers. Many calculation techniques [l-3] for molecular collision and molecular models in which inelastic effects - for instance internal degree of freedom, chemical reactions, electronic excitation and radiation - are taken into account have been developed [4,5]. Recently, this method has also been applied to simulate almost continuum fluid flows [61. However the DSMC method takes rather long calculation time and the many IF statements make it unsuitable for vector processing, which makes parallel computing a better choice for reasonable computation time. For high performance computing, efficient numerical methods and rational mathematical modeling of the phenomena are required. In this paper, the collision of a pair of N, molecules is simulated by the molecular dynamics (MD) method. The temperature is below 1000 K. In this temperature range, the vibrational degree of freedom of the colliding molecules is negligible and rotation is only considered as an internal degree of freedom. The energy transfer between translational and rotational degrees of freedom is calculated for several hundred thousand cases. In this MD calculation, each collision is assigned to one of the processors of the parallel computer. After the calculations in each processor, the results are collected to construct a data base. This parallelization method shows linear scalability. Using these results, the inelastic collision model of diatomic molecules for the DSMC method is constructed so that diatomic rarefied gas flows can be calculated precisely. To make sure of its validity, two calculations are carried out by this model in the DSMC method. The one is the calculation of translational and rotational energy distributions at equilibrium condition. These results are compared with theoretical ones. In this DSMC calculation, the same calculation was done in each processor using different sequence of random numbers. After the calculation, the results are collected to reduce the statistical fluctuation. This parallelization method also shows linear scalability. The second example is the calculation of the structure of a normal shock wave. In this DSMC calculation, the domain decomposition method is applied taking the calculational load in each cell into account. Due to the unbalance of calculational load among the cells and the communication overhead between CPUs, linear scalability for many CPUs is not obtained. These numerical results are compared with experimental results. Both simulated results agree well with experimental or theoretical data.
2. The MD simulation 2.1. Interaction
potential
of a collision of diatomic
of diatomic
molecules
molecule
Nitrogen gas is chosen as an example of a diatomic molecule for which considerable number of experiments of nonequilibrium flows exist. Both colliding molecules are assumed as rigid rotors because the vibrational degree of freedom is neglected in the present moderate temperature range, for example from 200 K to 1000 K. Forces and moments acting on the colliding molecules are calculated by adding the potential forces acting on the two atoms of a colliding molecule. The Lennard-Jones (12-6) potential is used to describe the interaction.
Y. Matmmoto, T. Tokumusu / Parallel Computing 23 (1997) 1249-1260
1251
The parameters uo, E, are determined so that the potential field created by the two atoms of the molecule closely approximates the molecular one deduced from experimental data for viscosity (a, = 3.798 X lo-” m, .s, = 9.858 X 10Wz2J). This is obtained o0 = 3.22 X lo-” m, E, = 4.06 X 1O-22 J [7]. 2.2. Calculational method Molecule 1 is located at ($a,, ib, 0) and molecule 2 at (- {q, ib, 0). A schematic diagram of the collision process is shown in Fig. 1. The initial x-distance between the two molecules is 3u,. At this distance, the initial potential energy is small enough to be neglected. The initial y-distance is the impact parameter b. The relative velocity of the two molecules is determined from u, = /e,/m where e,, is the relative translational energy. The velocity of molecule 1 is U, = ( - uo, 0, 0) and that of molecule 2, v2 = (u,, 0, 0). Calculation is continued until the distance between the two colliding molecules is large enough to give a negligible potential energy, and the relative translational and the two rotational energies are recorded. After the collision, the translational energy ei, and the two rotational ones e:,, eL2 change depending on the direction and phase of the rotational vector and the impact parameter even though the initial translational and rotational energies et,, e,,, er2 are the same. For this reason, this simulation is carried out 5000 times for each combination of initial energy by changing the direction and phase of the rotational vector and the impact parameter randomly. The simulations mentioned above are carried out for 126 combinations of translational energies, 100, 200, 400, 600, 800 and 1000 K, and rotational energies, 0,200, 400, 600, 800 and 1000 K. Here, the molecular energies, e, and e,, are given in term of the temperature, T, as e,, = $kT,, and e, = kT,, where k is the Boltzmann constant.
( Molecule 2 )
Fig. 1. The collision process of diatomic molecules.
1252
Y. Matsumoto, T. Tokumasu/Parallel
Computing 23 (1997) 1249-1260
2.3. Parameter decomposition computing of molecular collision The MD calculation is parallelized by giving different sets of parameter to each processor. 64 processors are used for 126 combinations of translational and rotational energies. One processor works as a master to distribute the initial translational and rotational energies to the slaves, the remaining 63 processors, and to collect the calculated results from the slaves. In this calculation, random numbers uniformly distributed between 0 and 1 are used many times. In the Cray-T3D, however, each processor generates the same array of random number. Therefore the random numbers are shuffled nM times at nth processor, where M is an integer. This parallelization method shows linear scalability.
3. Construction
of the collision
model
3.1. Definition of collision cross section For the DSMC method it is necessary to calculate the collision frequency in each cell. The collision cross section S should be defined precisely. The rate of energy transfer is shown in Fig. 2 as a function of the impact parameter. It is distributed widely from 0% to 200% depending on the direction and the phase of the rotational vector. However, it decreases suddenly at about b = 1.5~~ and remains nearly 0, independent of rotational vector. At the impact parameter where the rate of energy transfer suddenly decreases, the rate of energy transfer is distributed between 0% and 10%. This tendency is found in all combinations of initial energy. For this reason, the radius of the collision cross section d is defined as the largest impact parameter where the rate of energy transfer is over 10% and the collision cross section is defined as S = rrd*. The collision cross section is shown in Fig. 3 as a function of temperature. The collision cross section decreases as the translational temperature decreases. However, it is found that there is no dependence on the rotational temperature.
Impact Parameter [G a] Fig. 2. The rate of energy transfer versus impact parameter.
I’. Matsunwto. T. Toktunasu/Parallel
.%
a”
,I0
200
Computing 23 (1997) 1249-1260
400
600
600
1253
1000
[K]
Temperature
Fig. 3. The collision cross section versus temperature.
3.2. The de$nition of energy afer the collision
The distribution of energies after collision is shown in Fig. 4. Of all simulations, only those in which the two molecules collide with each other within the radius of the collision cross section are used to obtain the data. The collision model of diatomic molecules is constructed by approximating this distribution by a suitable functional form, namely F(e)
A exp( -B,( Er, - e)}
for e I EP
Aexp{-B,(e-E,))
fore>,!?,
= i
where Ep is the most probable energy and A, B, and B, are determined so that the integral of the function equals 1 and the left and right deviations equal those of the MD simulation. Thus A, B, and B, are determined as follows,
0.25 -
,
0.2 -
0.15
,
,
,
,
,
,
,
Etr,MD : Etr,DSMC : ---- Erl.MD : ----. Erl ,DSMC : ....._.. Er2,MD : -.-.. -
’
-
0 5
,
10 15 20 25 30 35 40 45 50 Energy
Fig. 4. The distribution
[ ~a]
of energies after the collision.
1254
Y. Matsumoto, T. Tokumasu / Parallel Computing 23 (1997) 1249-1260
where or and a; are the left and right side deviations method, these functions are used as follows,
of the MD results. In the DSMC
where R is a random number uniformly distributed in (0, 1). These functions are applied to both translational and rotational energy distributions because the two distributions are similar. In the DSMC method, the energy after collision is determined as follows so that the total energy is conserved. The total energy before collision is etota, = 2 e,, + e,, + er2. Next, two of the three energies e:,, e:, e:2, chosen randomly, are determined by the method mentioned above. The remaining one is determined by subtracting these two energies from etota,.
4. DSMC calculation of diatomic molecular rarefied gas flows 4.1. Parallel computing algorithm for DSMC calculation In DSMC calculation, the calculational domain is divided into cells which contains a considerable number of representative molecules. The calculation is performed as follows: (1) The variables used in the calculation are determined. (2) The representative molecules are distributed into cells and velocity and rotational energies are given to the molecules as initial conditions. (3) The molecules are then moved according to their velocities. If a molecule passes through the processor boundary, it is registered in the processor where it enters. (4) Collisions are calculated according to the molecules’ velocities. The Null collision method [3] is used to determine the collision. (5) The pressure, translational and rotational temperatures and density are calculated in each cell. (6) The data are stored. The parallelization of DSMC calculation is done as shown in Fig. 5. The cells are distributed among the processors. The DSMC calculation is performed according to the sequence mentioned above. If a molecule in a cell goes out and enters a cell belonging to another processor, the corresponding data must be transferred from the original processor to the other one which the molecule enters. A one-dimensional flow is calculated for the illustration. The molecules which constitute the flow enter the upper boundary at velocity u and exit from the lower boundary with the same velocity. The length of the calculation domain is 4Oh, where A is the mean free path of the molecules. Each molecule is registered in each cell. For example, the velocity vi(j) is denoted
Y. Matswwto,
12.55
T. Tokumasu / Parallel Computing 23 (1997) 1249-l 240
Communication .’
i
t-
. . . .’
-..
...
.’
i..w ..
__,,’
IIIIIIIiIII Calculation
Domain +
Fig. 5. The diagram of DSMC calculation.
by the jth molecule in the ith cell. Now consider the second molecule in the ith cell moves to (i + 1)st cell as shown in Fig. 6. If the molecule moves from the ith cell to the (i + 1)st cell, the data of the molecules are recorded at dummy array and deleted from the original array (step (1)). In this step the data of molecule in the ith cell are rearranged and the number of molecules n; are recalculated. Next the processor which handles the ith cell reads the number of molecules in the (i + l>st cell, ni+ 1 (step (2)). Last, the data registered at the dummy array is transferred into the (n,, , + 1)st element in the array ui+, (TZ,+~ + 1) and the value PZ,+~+ 1 is recorded as the number of molecules of the (i + 1)st cell (step (3)). The communication occurs when the molecule moves through the cell boundary which coincides with a processor boundary. Now consider that the molecules can move
“i *I. +
write
1
(3) read niti (2) 1
i th cell
“; +
P+l) st cell
“; - 1
(3)
“it21 -_ I
I
I
dummy
I
I
I
array Fig. 6. The diagram of data transfer.
"i+J
‘
1256
Y. Marsumoto, T. Tokumasu/Parallel 701
Compuring 23 (1997) 1249-1260
,
,
,
,
,
,
,
lo
20
30
40
50
60
70
60 50
0
The Number of Processor
Fig. 7. Scalability of a one-dimensional flow. Computational conditions: number of cells is 93 and number of molecules is about 7000.
more than two cells (for instance from the ith cell to the (i + 2)nd cell). In the case that one processor deals one cell, there is a possibility that the ith and the (i + 1)st processor can read nif2 and record the data at the dummy array to the (ni+* + 1)st element at the array in the (n + 2)nd cell at the same time and one of the transferred data is deleted. If the number of molecules which moves more than two cells is so large, this calculation is not carried out correctly. In this paper, the calculation time step At is set small so that the number of molecules which moves more than two cells is negligible and the molecules which move more than two cells are deleted. In these calculational condition only 3 molecules among 15 million molecules which pass the cell boundary move more than two cells and it can be neglected. First, the ith processor transfers position, velocity and rotational energy of the molecules which pass through the processor boundary between the ith and the (i + 1)st processors to the (i + 1)st processor at the same time. Next, the ith processor transfers the properties of the molecules which pass through the processor boundary between the ith and the (i - 1)st processors to the (i - 1)st processor at the same time. By this method, the communication time is greatly reduced. This flow is simulated by the calculation method mentioned above and the scalability is measured. This result is shown in Fig. 7. As shown in this figure, the acceleration ratio is 36.30 for 64 processors (0.57 per processor) though it is 1.88 for 2 processors (0.94 per processor). In this calculation, the computational load of the collision process and the molecular movement is the same because the number of molecules in each cell is almost equal to each other. The cause of the degeneration is the communication between processors. 4.2. The translational
and rotational energy distributions
at equilibrium
The translational and rotational energy distributions at equilibrium are calculated by the DSMC method. In this calculation, it is not necessary to divide the calculation domain into cells because the molecular movement is isotropic and the number density of the molecules is homogeneous. For this reason, this program is parallelized in such a
Y. Matsumoto, T. Tokumasu / Parallel Computing 23 (1997) 1249-1260
: -
1
1257
IJ : Translational, presen
LL 6 a:
0.5
Energy1E al Fig. 8. The energy distribution at equilibrium condition.
way that the equilibrium condition of 12= ( pV>/( NkT) molecules is simulated by one processor, where p is the pressure, V is the volume of the calculation domain and N is the number of processors, in this case 64. The results of all processors are collected to reduce the statistical fluctuations. In this case it is better to use this parallelization method than the one mentioned in Section 4.1. The calculated translational and rotational energy distributions at equilibrium are compared with theoretical values. AS initial conditions, all molecules are given e,, = 1SkT as the translational energy and e, = kT as the rotational one, with T = 400 K. The calculation is continued until equilibrium. The results are shown in Fig. 8. In this calculation, P = 1.013 X lo5 Pa, V = 0.3 X lo-*’ m3 and N = 7337. The abscissa is the energy and the ordinate is the probability density function. The solid lines show the Maxwell-Boltzmann distribution and the open symbols show the values calculated by using the present model. As shown in this figure, the calculated results are in good agreement with the Maxwell-Boltzmarm distribution. 4.3. Normal shock wave in N2 As a second example, we now calculate a normal shock wave in N,, in which the nonequilibrium condition between the translational and the rotational energies exists. The results are then compared with experimental data by Alsmeyer [S] and Robben and Talbot [9] for validation. In this simulation, the propagation direction of the shock wave is denoted by x, and y and z are orthogonal to this direction. The calculation domain is 4Oh in the x-direction, and its cross section 1.0 X lo-l6 m2. In this condition, the number of simulated molecules is several thousand. As the initial condition, the temperature T,,, the pressure P,, and the Mach number Mi, are given in the upstream state. Those in the downstream state are determined so that they satisfy the RankineHugoniot relation. The molecules are distributed uniformly on each side of the calculation domain. The initial velocity of the molecules in the upstream and downstream region is given to satisfy the Maxwell-Boltzmann distribution at each temperature. This calculation is carried out until the normal shock wave becomes stable. The program is parallelized as shown in Section 4.1 and executed by 1, 2, 4, 8, 16, 32 and 64 processors. The scalability of the calculation is shown in Fig. 9. The line for case B represents the result. As shown in this figure, the acceleration ratio is only 5.36 at 64 processors (0.084 per processor). In calculations at such high Mach number shock wave,
1258
Y. Matsumoto,
T. Tokumasu/Parallel
Computing 23 (1997) 1249-1260
70 60 50 40 30 20
10 0
0
10
20
30
40
50
60
70
The Number of Processor
Fig. 9. The scalability of the shock wave calculation. Case A: one-dimensional flow without shock wave. Case B: normal shock wave without load balance in processor. Case C: normal shock wave with load balance in processor. Computational conditions: number of cells is 111 and number of molecules is about 7000.
the number of molecules in cell which belongs to the upstream region and that which belongs to the downstream region has a large difference. This creates a difference in the computational load of the collision process and molecular movement between upstream and downstream processors. In this method, the upstream processors must wait until the downstream processors complete the calculation. It is known that the calculation time of collision process is much larger than that of molecular movement and the computation time of the null-collision method is proportional to the square of the number of molecules. For this reason, the cells are distributed in each processor so that the square of the number of molecules in each processor are almost equal. In this method, all the latter half of the processors (i > N/2, where i is the processor number and N is the number of processors) treat cells which belong to the downstream equilibrium flow field. The computational load in these processors, however, is not always the same because the number of cells which treat the upper processors (denoted by NC_,,) is not always divisible by N/2. This problem is solved if the number of cells is much larger than the number of processors. In general, however, the memory size of each processor is not
X-Distance
[ A in]
Fig. 10. The normal shock wave at M,, = 2.0.
Y. Matsunwto, T. Tokumasu/ Parallel Computing23 (1997) 1249-1260
1259
. :Experiment -0.5
' -10
(
' -5
m
’
X-Distarke[
”
n
' 10
A in;
Fig. 11. The normal shock wave at M, = 1.71.
enough for large number of cells. In this paper, Nce,, is adjusted so that NC_,,is divisible by N/2. The scalability of this algorithm is shown in Fig. 9. The line for case C represents the result. The number of cells is 111. As shown in this figure, the acceleration ratio is 38.63 for 64 processors (0.60 per processor). This shows that the acceleration ratio in this method is much larger than that of case B and is very close to the calculated result for case A. For this we deduce that the bad efficiency is mainly caused by the communication overhead. The collision process and molecular movement are parallelized well. These results are shown in Fig. 10 for the case ‘of Mi, = 2.0, P, = 6.666 Pa and Ti:,,= 300 K. The number of molecules is about 20000. The calculated number density of the molecules in the normal shock wave agrees well with the experimental results obtain by Alsmeyer [8]. The energy distributions upstream and downstream agree with the Maxwell-Boltzmann distribution. The energy profiles show that t,,,, overshoots and the other energies relax in the order of ttr,yr and T,. The normal shock wave for the case of min = 1.71, P, = 6.666 Pa and c:,, = 200 K has also been calculated and this temperature profile is compared with that obtained in the experiment by Robben and Talbot [9]. The result is shown in Fig. 11. The figure also shows that the relaxation of the rotational energies is in good agreement with experimental data.
5. Conclusion
A parallel algorithm for the DSMC method of diatomic molecular rarefied gas flows is presented. Using a parameter decomposition method in which the calculation is decomposed by the parameters for the initial conditions, the inelastic collision of diatomic molecules is simulated by the MD method and a collision model for diatomic molecules is constructed on the basis of these MD results. The validity of the model is verified by calculating the translational and rotational energy distributions at equilibrium and normal shock wave by the DSMC method using this model. The results show that the collision of diatomic molecules is calculated very well by using this model. In the DSMC calculation, the domain decomposition method, where the calculation domain is
1260
Y. Matsumoto, T. Tokumasu/Parallel
Computing 23 (1997) 1249-1260
divided and distributed among each processor, is used to parallelize the calculation. In the case where the ratio between the number of processors and that of calculation cells is small, it appears that this method exhibits linear scalability. When the ratio is larger, however, the scalability becomes worse. The cell adjustable method is developed to deal with this problem. The scalability result for this method shows that the collision process and molecular movement can be parallelized in this way.
Acknowledgements
We would like to acknowledge that a part of the computation was performed under the support by Power Reactor and Nuclear Fuel Development Corporation, Japan.
References [l] G.A. Bird, Molecular Gas Dynamics, Clarendon Press, Oxford, 1976. [2] K. Nanbu, Direct simulation scheme derived from the Boltzmann equation I. Monocomponent gases, J. Phys. Sot. Jpn. 49 (1980) 2042-2049. [3] K. Koura, Null collision technique in the direct simulation Monte Carlo method, Phys. Fluids 29 (1986) 3509-3511. [4] C. Borgnakke, P.S. Larsen, Statistical collision model for Monte Carlo simulation of polyatomic gas mixture, J. Comput. Phys. 18 (1975) 405-420. [5] G.A. Bird, Direct simulation of gas flows at the molecular level, Commun. Appl. Numer. Meth. 4 (1988) 165-172. [6] D. Riechelmann, K. Nanbu, Monte Carlo direct simulation of the Taylor instability in rarefied gas, Phys. Fluids 5-11 (1993) 2585-2587. [7] Y. Matsumoto, T. Tokumasu, Cluster formation of diatomic molecules, Thermal Sci. Eng. 2 (1994) 138-144. [S] H. Alsmeyer, Density profiles in argon and nitrogen shock waves measured by the absorption of an electron beam, J. Fluid Mech. 74 (1976) 497-513. [9] F. Robben, L. Talbot, Experimental study of the rotational distribution function of nitrogen a shock wave, Phys. Fluids 9 (1966) 653-662.