Computers and Geotechnics 25 (1999) 227–246 www.elsevier.com/locate/compgeo
Parallel simulations of pore-scale flow through porous media

J.P. Morris a,*, Y. Zhu b, P.J. Fox c

a Department of Physics, Purdue University, West Lafayette, IN 47907, USA
b Golder Associates, 3730 Chamblee Tucker Road, Atlanta, GA 30341, USA
c Department of Civil and Environmental Engineering, UCLA, Los Angeles, CA 90095-1593, USA
Received 16 September 1998; received in revised form 27 July 1999; accepted 31 August 1999
Abstract

Smoothed particle hydrodynamics (SPH) is a versatile technique which can be applied to single and multiphase flow through porous media. The versatility of SPH is offset by its computational expense, which limits the practicability of SPH for large problems involving low Reynolds number flow. A parallel pore-scale numerical model based on SPH is described for modeling flow phenomena in porous media. Aspects of SPH which complicate parallelization are emphasized. The speed of the method is demonstrated to be proportional to the number of processors for test cases where load balance was achieved. The parallel algorithm permits the application of SPH to more complicated porous media problems than previously considered. For such problems, best performance is achieved when several soil grains are simulated by each processor. Finally, future applications of the method and possible extensions are discussed. © 1999 Elsevier Science Ltd. All rights reserved.
* Corresponding author. Current address: Geophysics and Global Security Division, Lawrence Livermore National Laboratory, PO Box 808, L-219, Livermore, CA 94551-9900, USA. Tel.: +1-925-424-4581.

1. Introduction

Considerable theoretical, experimental, and numerical research has been devoted to the development of pore-scale models for flow through porous media. Numerical methods applied to porous media flow include finite difference methods [1], finite element methods [2–6], and boundary integral methods [7,8]. It can be difficult, however, to apply such methods to flows involving immiscible fluids or mobile solid boundaries. Lagrangian particle techniques, such as smoothed particle hydrodynamics
(SPH) and lattice Boltzmann [9], provide an alternative approach which is more easily applied to such problems. SPH has been used to investigate pore-scale flow phenomena in porous media [10–12] and multiphase flow with surface tension effects [13]. First developed for astrophysical applications [14,15], SPH is a fully Lagrangian technique in which the numerical solution is achieved without a grid. Using this approach, fluid velocity and pressure distributions, discharge velocity, and fluid particle paths can be computed, as well as other information that would be difficult or impossible to observe experimentally.

SPH has certain advantages over other fluid dynamical methods, which may encounter difficulty with deformable boundaries, multiphase flows, free surfaces, and the extension to three dimensions. For example, if the soil grains which form a porous structure are allowed to move in response to fluid flow, other techniques may require continual remeshing of the flow domain, leading to increased numerical diffusion and algorithmic complexity. In addition, SPH uses an interpolation method which simplifies the inclusion of chemical and thermal effects.

While SPH is versatile, errors can sometimes be larger than those obtained using grid-based methods tailored for specific problems. Moreover, SPH can be computationally expensive for certain applications. For example, individual time-steps for comparable numbers of nodes with SPH and finite element methods take similar computational effort. However, many more steps are typically required by SPH to obtain a solution to steady-state problems. For time-dependent problems at low Reynolds number, the computational effort is similar to a finite element method employing explicit time integration.

The computational expense of SPH is in part a consequence of its versatility. More specialized numerical techniques employ computational nodes which have relatively small regions over which they interact. For Eulerian methods, these regions remain fixed throughout the computation (provided nodes are not added or deleted). In addition, the weights associated with nodes are often time-independent. With SPH, the nodes (particles) are free to move, producing variations in the nearest neighbors of each particle. Thus, the weights associated with particle interactions are time-dependent and must be recalculated at each time step. These aspects of SPH contribute to the computational expense of the method. Consequently, parallelization of SPH is often advantageous for problems involving high resolution or large computational domains.

Due to the computational expense of SPH, previous studies applying SPH to flow through porous media have considered two-dimensional problems involving a repeating cell with periodic boundary conditions that contained only one or two soil grains. The objective of this paper is to present a suitable approach for parallelizing the pore-scale model presented by [10] and [12], thus extending the range of problems which may be considered. A brief overview of SPH is first presented, including the necessary modifications for low Reynolds number, quasi-incompressible flow. The method of parallelization is discussed and compared with previous approaches used to parallelize SPH. The enhanced performance of the parallel code is then demonstrated by considering three problems. Finally, conclusions are drawn and possible improvements to the parallel algorithm are discussed.
2. The SPH equations

Using SPH the fluid is represented by particles, typically of fixed mass, which follow the fluid motion, advect contact discontinuities, preserve Galilean invariance, and reduce computational diffusion of various properties, including momentum. The equations governing the evolution of the fluid become expressions for interparticle forces and fluxes when written in SPH form. Using the standard approach to SPH [16,17], the particles (which may also be regarded as interpolation points) move with the local fluid velocity (see Fig. 1).

Fig. 1. Sphere of influence for particle a.

Each particle carries mass m, velocity v, and other fluid quantities specific to a given problem. The equations governing the evolution of fluid quantities are expressed as summation interpolants using a kernel function W with smoothing length h. For example, the density at particle a, \rho_a, may be evaluated using direct summation by

\rho_a = \sum_b m_b W_{ab},    (1)

where W_{ab} denotes
W_{ab} = W(r_{ab}, h),    (2)

and

r_{ab} = r_a - r_b,    (3)

where r_a denotes the position of particle a. The kernel typically takes the form

W(r_{ab}, h) = \frac{1}{h^N} f\left( \frac{|r_{ab}|}{h} \right),    (4)
where N is the number of dimensions. The function f is typically a spline approximation to a Gaussian, such that f(s) = 0 for s \geq s_0, for some nondimensional radius s_0. The advantage of using a spline kernel is that the region of influence of each particle is finite (a sphere of radius h s_0), reducing the computational expense. For this work, the quintic spline [18]

f(s) = \frac{7}{478\pi}
\begin{cases}
(3 - s)^5 - 6(2 - s)^5 + 15(1 - s)^5, & \text{if } 0 \leq s < 1, \\
(3 - s)^5 - 6(2 - s)^5, & \text{if } 1 \leq s < 2, \\
(3 - s)^5, & \text{if } 2 \leq s < 3, \\
0, & \text{if } s \geq 3,
\end{cases}    (5)

was used, for which s_0 = 3. The quintic spline has compact support and good numerical stability properties [19,20].

Expressions for quantities at the particles are obtained by summations involving the kernel or its derivatives. For example, in this work the following SPH expression is used for the pressure gradient term of the momentum equation:

-\frac{1}{\rho_a} \nabla p \approx -\sum_b m_b \left( \frac{p_a + p_b}{\rho_a \rho_b} \right) \nabla_a W_{ab},    (6)

where p_a is the pressure at particle a and \nabla_a denotes the gradient with respect to the coordinates of particle a. For a kernel of the form in (4), (6) conserves momentum exactly since forces acting between individual particles are antisymmetric.

If (1) is used, densities must be obtained first by a sum over neighboring particles before gradients [such as (6)] may be calculated. To reduce the computational effort, we evolve density using an approximation to the continuity equation:

\frac{d\rho_a}{dt} = -(\rho \nabla \cdot v)_a \approx \sum_b m_b (v_a - v_b) \cdot \nabla_a W_{ab}.    (7)
This permits particle densities to be calculated concurrently with other particle quantities. Although (7) does not conserve mass exactly [(1) does, provided the total number and mass of particles are constant], direct summation can be used intermittently during a simulation to prevent significant "drift" in particle densities [12].
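For readers who prefer code to notation, the kernel and density sums above can be written compactly. The following is a minimal Python sketch of Eqs. (1) and (5), assuming two dimensions, NumPy, and the 7/(478π) normalization given above; the function names and the brute-force O(N²) loop are illustrative only (the actual model uses the cell-based neighbor search of Section 3).

```python
import numpy as np

def quintic_spline(r, h):
    """Quintic spline kernel of Eq. (5), normalized for two dimensions (s0 = 3)."""
    s = np.asarray(r, dtype=float) / h
    m3 = np.where(s < 3, (3.0 - s) ** 5, 0.0)
    m2 = np.where(s < 2, (2.0 - s) ** 5, 0.0)
    m1 = np.where(s < 1, (1.0 - s) ** 5, 0.0)
    return 7.0 / (478.0 * np.pi * h ** 2) * (m3 - 6.0 * m2 + 15.0 * m1)

def density_direct(pos, mass, h):
    """Direct summation of Eq. (1): rho_a = sum_b m_b W_ab (brute force)."""
    rho = np.empty(len(pos))
    for a in range(len(pos)):
        r_ab = np.linalg.norm(pos[a] - pos, axis=1)  # distances to all particles (self included)
        rho[a] = np.sum(mass * quintic_spline(r_ab, h))
    return rho

# Example: density of a uniform 20 x 20 block of unit-mass particles, spacing 0.05
x = np.linspace(0.0, 0.95, 20)
pos = np.array([(xi, yi) for xi in x for yi in x])
print(density_direct(pos, np.ones(len(pos)), h=0.05))
```

The same kernel routine would also supply, through its radial derivative, the gradient \nabla_a W_{ab} needed in Eqs. (6), (7) and (10).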
Most grid-based techniques treat the flow of water as incompressible since the speed of sound in water is usually large compared with bulk fluid motions (i.e. the Mach number is very low). Using SPH, pressure is an explicit function of local fluid density and particle motions are driven by local density gradients. Therefore, it is most natural to use a quasi-incompressible equation of state to model such flows with SPH [21]. Since the actual equation of state for water would require a prohibitively small time step for stability (by the CFL condition [22]), the following artificial state equation is used:

p = c_s^2 \rho,    (8)

where c_s is the speed of sound. The chosen value of c_s is low enough to be practical, yet high enough to restrict density fluctuations within the desired limits (typically about 1% [10,12,21]). For example, Morris et al. [10] suggested that c_s satisfy the largest of:

c_s^2 \sim \frac{V^2}{\delta}, \quad \frac{\mu V}{\rho \delta L}, \quad \frac{a L}{\delta},    (9)

where \delta is the desired relative variation in density, \mu is the dynamic viscosity, \rho is the fluid density, and V, L and a are typical velocity, length, and body acceleration scales.

Most implementations of SPH employ an artificial viscosity first introduced to permit the modeling of strong shocks [16,17,23]. This formulation conserves angular and linear momenta and has been used to model real viscosity [24,25]. For simulations involving low Reynolds numbers, however, it was found to produce inaccurate velocity profiles [10]. More realistic viscosities using nested summations [26,27] or directly employing second derivatives of the kernel [28] have also been used. This work uses an approximation to the Laplacian recently applied to low Reynolds number flows [10]. A Taylor expansion confirms this formulation is appropriate and it has been used to model heat conduction [29]. The SPH momentum equation is then written as

\frac{dv_a}{dt} = -\sum_b m_b \left( \frac{p_a + p_b}{\rho_a \rho_b} \right) \nabla_a W_{ab} + \sum_b \frac{m_b (\mu_a + \mu_b) v_{ab}}{\rho_a \rho_b} \left( \frac{1}{r_{ab}} \frac{\partial W_{ab}}{\partial r_{ab}} \right) + F,    (10)
where v_{ab} = v_a - v_b and F denotes the net body force driving the flow. The implementation of no-slip boundary conditions for low Reynolds number flow has been explored in detail [10,12]. In particular, due to the smoothing length (finite interaction distance), a single line of stationary SPH particles does not produce an accurate no-slip boundary. In this work, no-slip boundary conditions at soil grain surfaces are simulated using special SPH boundary particles [10,12]. These particles are placed in parallel layers on and within the grain boundary. Boundary particles move with the grain velocity and make normal contributions to the evolution of density. For the viscosity calculation, boundary particles are assigned an artificial velocity which approximately reproduces antisymmetry in the velocity field
across the boundary. Fig. 2 illustrates the concept for a curved boundary. For each fluid particle a, the normal distance d_a to the boundary is calculated. The normal defines a tangent plane (a line in two dimensions) and the normal distance d_B to each boundary particle B. The velocity of particle a is extrapolated across the tangent plane, assuming zero velocity on the plane itself, giving each boundary particle the velocity

u_B = -\frac{d_B}{d_a} u_a.    (11)

If the boundary is in motion, u_a is replaced by the fluid velocity relative to the boundary. The artificial velocity u_B is used to calculate viscous forces, whereas the actual boundary velocity is used to evolve boundary particle positions and densities. If a fluid particle approaches a boundary surface too closely, large artificial velocities may result. To prevent this problem, d_a is bounded according to

d_a = \max\left( d_a, \frac{\sqrt{3}}{4} \Delta x \right),    (12)

where \Delta x is the initial nearest neighbor distance between fluid particles.
Fig. 2. Boundary particles are temporarily assigned artificial velocities to simulate a no-slip boundary.
3. Locating neighboring particles

Eulerian methods and more conventional Lagrangian methods employ a mesh with a well-defined structure. Consequently, the identification of interacting nodes is straightforward. The motion of SPH particles complicates parallelization of the method, since the neighbors which interact with a given particle constantly change. Using a kernel with compact support, all particles within h s_0 (see Fig. 1) of particle a will contribute to the force calculation at a.

The approach employed to locate nearest neighbors is of critical importance when considering the parallelization of SPH. Data structures used to keep track of particles become instrumental in dividing labor between processors. For example, if a tree structure is used to locate nearest neighbors, parallelization may be accomplished by sending separate branches of the tree to different processors. Most previous studies employing a parallel SPH code have involved highly compressible gases [30,31] or other materials with large variations in particle number density [32]. Such problems require varying the smoothing length throughout the computational domain to capture fine details of a solution. Efficient location of nearest neighbors then mandates the use of trees or hash tables [30,31].

For the class of problems considered in this paper, the fluid is quasi-incompressible and the local particle number density remains nearly constant throughout the computational domain. Consequently, a constant smoothing length is specified and nearest neighbors are located efficiently using a grid-based approach. A Eulerian mesh with grid spacing equal to the maximum interaction distance of the particles (i.e. h s_0) is used. At the mid-point of each time step, particles are added to linked lists corresponding to the cells within the mesh. When particle interactions are calculated, only neighboring cells need be considered.

4. Load balancing

For best performance of a parallel code, the computational effort should be divided among processors such that each takes equal time to complete a time step and idle processor time is minimized. For the problems considered here, the geometry of the computational domain does not change. In addition, particle number density remains approximately constant since the fluid is quasi-incompressible. For these reasons, dynamic load balancing (i.e. during the course of a simulation) was considered unnecessary and processor loads were balanced according to the initial placement of particles. This approach does not account for variations in load resulting from other system users utilizing or freeing processors during the course of a simulation.

The computational domain was divided into a Cartesian grid of sub-cells which were assigned to corresponding processors (see Figs. 4, 5, 8, and 11). Some flexibility in the division of load between processors is permitted by moving the sub-cell boundaries. This approach was found to be adequate for cases where inhomogeneities of the computational domain were smaller than the cell size (i.e. more than one
soil grain per processor). Consequently, this method is most efficient for parallel machines employing a moderate number of powerful processors rather than a large number of relatively slow processors.

It is possible to use an irregular grid of sub-cells to cover the computational domain, permitting a more even distribution of load between processors and improved performance. This extension would require modifications to the straightforward communication algorithm presented in the next section, such that each sub-cell can communicate with multiple neighbors in each direction. The implementation of irregular grids of sub-cells was not considered in this work.

5. Communication issues

The application of SPH to groundwater flow typically involves specifying periodic boundaries for a unit cell to simulate an infinite lattice of repeating soil grains [11,12], as illustrated in Fig. 3. A predictor-corrector technique is used for time integration such that interparticle forces are calculated at the mid-point of each time step. Periodic boundary conditions are implemented by creating images ("ghosts") of particles from the opposite side of the computational domain. These images are used to calculate hydrodynamic forces acting on particles near the edge of the computational domain at the mid-point of the time step. The parallel algorithm leaves most of this method intact, but modifies the subroutines used to implement periodic boundaries. Each processor evolves particles belonging to its sub-cell within the unit cell. For example, the unit cell in Fig. 3 can be subdivided among nine processors as depicted in Fig. 4. Image particles at the boundaries of each sub-cell are copies of particles in neighboring sub-cells. The algorithm followed by each processor for a single time step is (a sketch of the exchange steps follows the list):

1. Evolve the particles within the processor's sub-cell to the mid-point of the time step.
2. Identify particles within a distance h s_0 from the left and right boundaries of the processor's sub-cell and send corresponding data to the left and right processors.
3. Receive data from the left and right processors concerning neighboring particles in the left and right sub-cells.
4. Identify particles within a distance h s_0 of the upper and lower boundaries of the processor's sub-cell and send corresponding data to the upper and lower processors.
5. Receive data from the upper and lower processors concerning neighboring particles in the upper and lower sub-cells.
6. Identify particles that currently lie within the processor's boundaries.
7. Evaluate interparticle forces.
8. Correct particle positions based on the evaluated forces.
9. Delete those particles previously identified as lying outside the processor's boundaries at the mid-point of the time step.
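The following is a minimal Python/mpi4py sketch of the exchange in steps 2-5, assuming nine MPI ranks arranged as the periodic 3x3 Cartesian grid of Fig. 4; particles are represented as simple dictionaries, and the shifting of ghost coordinates across the periodic boundary is omitted. None of the names below come from the original code; they are illustrative only.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Hypothetical 3x3 periodic decomposition of the unit cell (run with 9 ranks).
cart = comm.Create_cart(dims=[3, 3], periods=[True, True], reorder=False)
coords = cart.Get_coords(cart.Get_rank())
lo = [c / 3.0 for c in coords]          # lower corner of this processor's sub-cell
hi = [(c + 1) / 3.0 for c in coords]    # upper corner (unit cell is [0,1) x [0,1))

def near_face(p, axis, side, cutoff):
    """True if particle p lies within `cutoff` of the low (side=0) or high (side=1) face."""
    x = p['pos'][axis]
    return (x - lo[axis] < cutoff) if side == 0 else (hi[axis] - x < cutoff)

def halo_exchange(particles, cutoff):
    """Steps 2-5: exchange image ('ghost') particles in x, then in y, so that
    corner information is relayed through the side neighbors (see Fig. 4)."""
    ghosts = []
    for axis in range(2):
        minus, plus = cart.Shift(axis, 1)   # ranks of the -1 and +1 neighbors
        pool = particles + ghosts           # own particles plus ghosts from earlier axes
        to_minus = [p for p in pool if near_face(p, axis, 0, cutoff)]
        to_plus = [p for p in pool if near_face(p, axis, 1, cutoff)]
        # Combined send/receive avoids deadlock: low-face data goes to the minus
        # neighbor while the plus neighbor's low-face data arrives, and vice versa.
        ghosts += cart.sendrecv(to_minus, dest=minus, source=plus)
        ghosts += cart.sendrecv(to_plus, dest=plus, source=minus)
    return ghosts
```

In the full model the received images would be included in the cell-based neighbor search of Section 3 before the force evaluation of step 7, and images crossing the periodic boundary would have their coordinates shifted by one unit-cell length, which is omitted above.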
Fig. 3. An infinite cubic lattice is simulated using a periodic unit cell.
Fig. 4. Sub-cell partitioning for a unit cell.
For example, processor 5 in Fig. 4 will receive data initially from processors 4 and 6. Once this is complete, communication with processors 2 and 8 is initiated. Information regarding particles from the "corner" sub-cells (1, 3, 7, and 9) is received via processors 2 and 8. Similarly, processor 5 will send information regarding particles from sub-cells 4 and 6 to processors 2 and 8.
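Before this exchange can run, each processor must know which particles fall inside its sub-cell, and the sub-cell boundaries themselves are the quantities moved during the static load balancing of Section 4. A minimal Python sketch of that bookkeeping is shown below; the edge positions, array layout, and the idea of simply counting particles per sub-cell are illustrative assumptions rather than the original implementation.

```python
import numpy as np

def assign_to_subcells(pos, x_edges, y_edges):
    """Map particle positions (N, 2) in the unit cell to sub-cell indices (ix, iy).

    x_edges and y_edges hold the interior sub-cell boundaries; nudging these
    values is the static load-balancing adjustment illustrated in Figs. 8 and 11.
    """
    ix = np.searchsorted(x_edges, pos[:, 0], side='right')
    iy = np.searchsorted(y_edges, pos[:, 1], side='right')
    return ix, iy

# Example: a 3x3 decomposition of the unit cell, initially with equal-area sub-cells.
x_edges = y_edges = np.array([1.0 / 3.0, 2.0 / 3.0])
pos = np.random.rand(2000, 2)                # placeholder particle positions
ix, iy = assign_to_subcells(pos, x_edges, y_edges)
counts = np.zeros((3, 3), dtype=int)
np.add.at(counts, (ix, iy), 1)               # particles (load) per processor
print(counts)   # boundaries would be moved until these counts are roughly equal
```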
Fig. 5. Sub-cell boundaries for test problem I. Boundaries for 2×2, 3×3, and 4×4 sub-cell arrays are shown as solid gray, dashed black, and dashed gray lines, respectively.
Data can be communicated between processors via synchronous or asynchronous methods. When using synchronous send-receives, processors are paired up to communicate with alternate neighbors ("red-black" ordering) such that they initiate send and receive requests at the same time. Asynchronous communication permits processors to initiate send and receive requests at different times. Versions of our code using synchronous and asynchronous send-receives showed little variation in total execution time. Results using the asynchronous version are presented here because it permitted a wider range of processor arrangements, since processors were not required to communicate in pairs.

6. Initial conditions

There are two basic approaches to initializing problems for parallel execution. One processor can set up the problem and transmit the results to the other processors (serial approach), or each processor can generate its respective initial state concurrently (parallel approach). The serial approach can be simpler for problems involving complicated boundaries; however, it limits the problem size to those which can be accommodated by a single processor. For this work, a parallel approach was
employed. Each processor was given a list of the objects (soil grains) within the flow field and the boundaries of its respective sub-cell. The processors then generated initial particle positions within their respective boundaries, satisfying the prescribed boundary conditions.

7. Simulations

The SPH code was executed on an IBM SP2 parallel system. The SP2 provides for parallel programming using a distributed memory model. It has a High Performance Switch that connects the processors and provides for high-bandwidth communication between them. The simulations presented here employed a maximum of 16 processors. The parallel algorithm was implemented using the message passing interface (MPI) library.

7.1. Constant density

Test problem I is a fluid of constant density, with no soil grains (i.e. no obstacles) in the flow field. This problem is optimal for the parallel code as work can be trivially distributed among the processors (assuming identical processor speed). This flow was simulated using three levels of resolution corresponding to nearest neighbor distances of 0.01, 0.005 and 0.0025 times the size of the unit cell. These solutions required approximately 11 600, 46 000 and 184 800 particles, respectively. Simulations were performed for each resolution using 1, 4, 9, and 16 processors. The computational domain was divided into sub-cells of equal area as shown in Fig. 5.

Fig. 6 presents the CPU time (T) required to complete a single time step as a function of the number of processors (N_proc) and the number of particles (N_part). Single processor times were based on the performance of a serial code using one processor of the IBM SP2. Fig. 6 shows that, as expected, CPU time decreases rapidly with increasing number of processors. The data also indicate that, for each number of processors, T increases linearly with N_part. Overall, T decreases to approximately 7% (high resolution) and 12% (low resolution) of the single processor value as the number of processors is increased from 1 to 16.

Although indicating an increase in speed, the time taken per time step does not give a direct indication of the efficiency of the algorithm. Ideally, the required CPU time will be proportional to N_part and inversely proportional to N_proc. It is, therefore, informative to consider a normalized dimensionless time required per time step, T*, defined as

T^* = \frac{T N_{proc}}{T_0 N_{part}},    (13)

where T_0 is a constant equal to the time per particle for one step in the simulation of constant density using 184 800 particles on a single processor. For a perfectly parallelized algorithm, T* will remain constant as the problem size and number of processors is
Fig. 6. CPU time for one time step as a function of number of processors for three resolutions of test problem I.
varied. Fig. 7 shows T* versus N_proc for the three resolutions. Lesser values of T* indicate higher program efficiency. Program efficiency is optimal for the single processor because no time is spent communicating with other processors. For a given resolution, the width of image particles surrounding each sub-cell is constant. As the number of processors increases, image particles thus constitute a larger proportion of the total number of particles on a given processor. Consequently, for a given resolution, the efficiency of the method decreases with increasing number of processors. For lower resolution simulations the effect is more pronounced. Normalized time for the largest simulation is nearly constant, indicating that the algorithm becomes more efficient as N_part increases.

7.2. Single grain

Test problem II, shown in Fig. 8, involved flow past a single solid grain (i.e. solid cylinder) of radius r. Since there is no flow through the grain, SPH boundary particles need only fill an annular region defined by radial distances (r - h s_0) and r. Thus, a 3×3 array of square sub-cells can be accommodated using only the 8 outer processors (i.e. the processor for the center sub-cell is not needed). Likewise, a 4×4 array of sub-cells can be treated using only the 12 outer processors. For our parallel SPH code, computational speed is governed by the most heavily loaded processor. If the computational load is not balanced, one or more processors experience idle time
Fig. 7. Normalized CPU time for one time step as a function of number of processors for three resolutions of test problem I.
Fig. 8. Sub-cell boundaries for test problem II. Boundaries for 2×2, 3×3, and 4×4 sub-cells are shown as solid gray, dashed black, and dashed gray lines, respectively. Readjusted sub-cell boundaries for the 3×3 array are shown as narrow dashed black lines.
during each time step. Thus, although the total CPU time will decrease as more processors are brought to bear on a problem, the efficiency of the parallel code will decrease as well.

Two approaches were used to simulate the flow using a 3×3 array of sub-cells. For the first, the flow domain was divided into sub-cells of equal area. Although this approach is straightforward, substantial load imbalances result because the corner processors are assigned more particles than the side processors. A second set of solutions was obtained by moving the processor boundaries slightly outward (see Fig. 8), so as to more evenly distribute particles among the processors. For the 4×4 array, simulations were run using only equal-area square sub-cells. In this case, processor loads could not be further balanced because any such repositioning of sub-cell boundaries would call upon the inner 4 processors.

Figs. 9 and 10 show T vs N_proc and T* vs N_proc, respectively, for simulations at three levels of resolution. The total time was decreased substantially (to 25 and 18% of the single processor value for low and high resolution, respectively) as N_proc was increased from 1 to 8 using square sub-cells. An additional decrease of 17–25% was possible by adjusting sub-cell boundaries for the 8-processor simulations. This results in a nearly constant value of T* for the higher resolution simulations using up to 8 processors. Although T decreased slightly when N_proc was increased to 12, T* increased by 28 to 45%. The parallel code was considerably less efficient for
Fig. 9. CPU time for one time step as a function of number of processors for three resolutions of test problem II. Points lying off the main curves correspond to simulations without the readjusted sub-cell boundaries for the 3×3 array.
Fig. 10. Normalized CPU time for one time step as a function of number of processors for three resolutions of test problem II. Points lying off the main curves correspond to simulations without the readjusted sub-cell boundaries for the 3×3 array.
N_proc = 12 because the processor boundaries were almost coincident with those for the 8-processor simulations. However, the non-corner sub-cells were shared between two processors. Consequently, the total time taken was essentially the same (the time taken by the corner processors) but N_proc increased by 50%, resulting in a substantial decrease in overall efficiency. This problem could possibly be alleviated by specifying irregular sub-cell boundaries and a more even distribution of processor loads.

7.3. Arbitrary arrangement of grains

Test problem III, shown in Fig. 11, was flow through a unit cell enclosing an arbitrary arrangement of 6 soil grains, including several narrow pore throats. Accurate simulation of flow through these pore throats requires high resolution, and hence more computational effort per time step and smaller time steps to maintain numerical stability. The efficiency of the parallel code was tested by subdividing the flow domain into 1, 4, 9, and 16 square sub-cells of equal area. In addition, simulations were performed for the 3×3 array with sub-cell boundaries adjusted to eliminate the right center processor (i.e. only 8 processors utilized). However, the load on the remaining 8 processors was still unbalanced after this adjustment. Fig. 12 shows T vs N_proc for simulations at three levels of resolution. For each level, T
Fig. 11. Sub-cell boundaries for test problem III. Boundaries for 2×2, 3×3, and 4×4 sub-cells are shown as solid gray, dashed black, and dashed gray lines, respectively. Readjusted sub-cell boundaries for the 3×3 array are shown as narrow dashed black lines.
decreases with increasing N_proc. Similar to the first two test cases, gains in computational speed become progressively less as more processors are employed in the simulation (due to the necessary additional communications and poorer load balancing). The data also indicate that T increases linearly with N_part for any given number of processors.

The corresponding plot of T* vs N_proc is shown in Fig. 13. As with the other test cases, program efficiency is optimal for the single-processor simulations. When 4 processors are employed, little efficiency is lost for the higher resolution simulations because loads remained nearly balanced within the relatively large sub-cells. Similar to the previous test cases, the efficiency of the method decreased with increasing number of processors, for a given resolution, and this effect was more pronounced for lower resolution simulations. Readjustment of sub-cell boundaries for the 3×3 array improved efficiency by 4–9% since approximately the same amount of work was completed by 8 processors instead of 9. Considering 1 processor vs 16 processors, although efficiency dropped by approximately 50%, the total time required was reduced to 12% of the single processor value at high resolution.

For a constant pressure gradient in the x-direction, the high resolution simulations gave an intrinsic permeability of 4.09 × 10⁻¹⁰ m² for test problem III. For comparison,
Fig. 12. CPU time for one time step as a function of number of processors for three resolutions of test problem III.
Fig. 13. Normalized CPU time for one time step as a function of number of processors for three resolutions of test problem III.
a finite element program was also used to solve this problem and yielded a corresponding value of 3.91 × 10⁻¹⁰ m². The 5% discrepancy between these results is typical of SPH for this level of resolution [10].

8. Discussion and conclusions

An approach suitable for the parallelization of smoothed particle hydrodynamics (SPH) has been presented for simulations of quasi-incompressible, two-dimensional flow through porous media. The performance of the code was demonstrated using three idealized problems incorporating periodic boundary conditions. For problems in which computational load was approximately balanced among processors, test results confirm that computation speed scales appropriately with the number of processors. For cases in which processor loads were not balanced, although the total CPU time decreased as more processors were brought to bear on a problem, the efficiency of the parallel code decreased as well. For the most complicated arrangement considered (test problem III), the performance of the code was good provided more than one soil grain was simulated by each processor. The Cartesian sub-cell structure permits some improvement in program efficiency through adjustment of sub-cell boundaries (see test problems II and III).

Several improvements to the method are possible. At present, boundaries between sub-cells form a Cartesian grid in the solution space. Clearly, more precise distribution of processor loads could be achieved for certain problems by further subdividing individual sub-cells or using an irregular grid of sub-cells. For problems involving more than one grain per processor (such as test problem III with 4 processors), the sub-cell structure employed here is satisfactory. For problems with more sub-cells per grain, the Cartesian sub-cell structure was found to be relatively inefficient. This work has not addressed dynamic load balancing. Although the initial problem was subdivided to balance loads between processors, it was assumed that the speed of each processor remained the same throughout the computation. In addition, dynamic load balancing will be required for problems involving mobile solid boundaries.

This method allows SPH to be applied to incompressible fluid-flow problems involving complicated arrangements of soil grains. A similar approach could be used to apply SPH to three-dimensional low Reynolds number incompressible flows.

Acknowledgements

This work was sponsored by the Air Force Office of Scientific Research, USAF, under grant number F49620-96-1-0020. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Office of Scientific Research or the U.S. Government.
References

[1] Schwartz LM, Martys N, Bentz DP, Garboczi EJ, Torquato S. Cross-property relations and permeability estimation in model porous media. Phys Rev E 1993;48:4584–91.
[2] Snyder LJ, Stewart WE. Velocity and pressure profiles for Newtonian creeping flow in regular packed beds of spheres. AIChE Journal 1966;12:167–73.
[3] Sorensen JP, Stewart WE. Computation of forced convection in slow flow through ducts and packed beds – II velocity profile in a simple cubic array of spheres. Chem Eng Sci 1974;29:819–25.
[4] Edwards DA, Shapiro M, Bar-Yoseph P, Shapira M. The influence of Reynolds number upon the apparent permeability of spatially periodic arrays of cylinders. Phys Fluids A 1990;2:45–53.
[5] Ghaddar CK. On the permeability of unidirectional fibrous media: a parallel computational approach. Phys Fluids 1995;7:2563–86.
[6] Meegoda NJ, King IP, Arulanandan K. An expression for the permeability of anisotropic granular media. Int J Numer Anal Methods Geomech 1989;13:575–98.
[7] Larson RE, Higdon JJL. Microscopic flow near the surface of two-dimensional porous media. Part 1. Axial flow. J Fluid Mech 1986;166:449–72.
[8] Larson RE, Higdon JJL. Microscopic flow near the surface of two-dimensional porous media. Part 2. Transverse flow. J Fluid Mech 1987;178:119–36.
[9] Qian YH, d'Humières D, Lallemand P. Lattice BGK models for Navier-Stokes equation. Europhys Lett 1992;17:479–84.
[10] Morris JP, Fox PJ, Zhu Y. Modeling low Reynolds number incompressible flows using SPH. J Comput Phys 1997;136:214–26.
[11] Zhu Y, Fox PJ, Morris JP. Smoothed particle hydrodynamics model for flow through porous media. In: Proceedings of the 9th International Conference on Computer Methods and Advances in Geomechanics, Vol. 2, Wuhan, China, 1997. p. 1041–6.
[12] Zhu Y, Fox PJ, Morris JP. A pore-scale numerical model for flow through porous media. Int J Numer Anal Methods Geomech 1999;23:881–904.
[13] Morris JP. Simulating surface tension with smoothed particle hydrodynamics. Int J Numer Methods Fluids (accepted for publication).
[14] Lucy LB. A numerical approach to the testing of the fission hypothesis. Astron J 1977;83:1013–24.
[15] Gingold RA, Monaghan JJ. Smoothed particle hydrodynamics: theory and application to non-spherical stars. Mon Not R Astron Soc 1977;181:375–89.
[16] Benz W. Smooth particle hydrodynamics: a review. In: The numerical modelling of non-linear stellar pulsations, NATO Workshop, Les Arcs, France, 1989: 269.
[17] Monaghan JJ. Smoothed particle hydrodynamics. Annu Rev Astron Astrophys 1992;30:543–74.
[18] Schoenberg IJ. Contributions to the problem of approximation of equidistant data by analytic functions. Q Appl Math 1946;4:45.
[19] Morris JP. A study of the stability properties of SPH. Publ Astron Soc Aust 1996;13:97–102.
[20] Morris JP. Analysis of SPH with applications. Ph.D. thesis, Mathematics Department, Monash University, Melbourne, Australia, 1996.
[21] Monaghan JJ. Simulating free surface flows with SPH. J Comput Phys 1994;110:399–406.
[22] Courant R, Friedrichs K, Lewy H. Über die partiellen Differenzengleichungen der mathematischen Physik. Math Ann 1928;100:32–74.
[23] Morris JP, Monaghan JJ. A switch to reduce SPH viscosity. J Comput Phys 1997;136:41–50.
[24] Artymowicz P, Lubow SH. Dynamics of binary-disk interaction 1. Resonances and disk gap sizes. Astrophys J 1994;421:651–67.
[25] Maddison ST, Murray JR, Monaghan JJ. SPH simulations of accretion disks and narrow rings. Publ Astron Soc Aust 1996;13:66–70.
[26] Flebbe O, Münzel S, Herold H, Riffert H, Ruder H. Smoothed particle hydrodynamics: physical viscosity and the simulation of accretion disks. Astrophys J 1994;431:754–60.
[27] Watkins SJ, Bhattal AS, Francis N, Turner JA, Whitworth AP. A new prescription for viscosity in smoothed particle hydrodynamics. Astron Astrophys Suppl Ser 1996;119:177–87.
[28] Takeda H, Miyama SM, Sekiya M. Numerical simulation of viscous flow by smoothed particle hydrodynamics. Prog Theor Phys 1994;92(5):939–60.
[29] Monaghan JJ. Heat conduction with discontinuous conductivity. Applied Mathematics Reports and Preprints 95/18, Monash University, 1995.
[30] Theuns T, Rathsack ME. Calculating short range forces on a massively parallel computer: SPH on the connection machine. Comput Phys Comm 1993;76:141–58.
[31] Warren MS, Salmon JK. A portable parallel particle program. Comput Phys Comm 1995;87:266–90.
[32] Smith BJ, Baker L. A portable parallel smooth particle hydrocode. In: High Performance Computing in Computational Dynamics, American Society of Mechanical Engineers, Computer Engineering Division, CED Vol. 6, ASME, New York, USA, 1994. p. 53–60.