2 August 1996
CHEMICAL PHYSICS LETTERS ELSEVIER
Chemical Physics Letters 257 (1996) 647-650
Fractional tiers in fast multipole method calculations Christopher A. White, Martin Head-Gordon l Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA Received 21 February 1996; in final form 9 April 1996
Al~tract One def'ming characteristic of a fast multipole calculation is the number of tiers (depth of tree) used to group the particles. For three dimensions, the standard boxing scheme restricts the number of lowest level boxes to be a power of eight. We present a method which through a simple scaling of the particle coordinates allows an arbitrary number of lowest level boxes. Consequently, one can better balance the near-field and far-field work by minimizing the variation in the number of particles per lowest level box from its optimal value. Test calculations show systems where this method gives a speedup approaching two times.
1. Introduction Interest in using Greengard and Rohklin's fast multipole method (FMM) [1-4] for the computation of Coulomb interactions has greatly increased in recent years [5-10]. The FMM uses multipole and Taylor expansions to obtain the potential and forces of a distribution of point charges in work scaling linearly with the number of charges. The limited regions of validity for these expansions requires a spatially localized grouping of charges which is conventionally obtained by placing the point charges within boxes. Within the FMM, Coulomb interactions are grouped into near-field or far-field interactions. Near-field interactions are defmed as interactions between particles too close to interact via multipoles and are evaluated by explicit pair-wise computation of the potential. The remaining interactions are 'NSF Young Investigator 1993-8, Alfred P. Sloun Fellow 1995-7, and David and Lucile Packard Fellow, 1995-2000.
termed far-field and are evaluated by using multipole and Taylor expansions. A definition of "too close" (given by a well-separatedness index [10]) and a scheme for placing the point charges within boxes determines the nature of each interaction without an explicit quadratic examination of each particle-particle distance. In the standard FMM boxing scheme, one first encloses the entire distribution of charges within a computational parent box. This parent box is subdivided in half along each Cartesian axis to yield a set (eight in three dimensions or two in one dimension) of smaller child boxes. These child boxes are further sub-divided creating children of children. This process is continued until the number of particles enclosed within each of the lowest level boxes reaches some optimal value. By optimal, we mean the balance of near-field and far-field work is such that one minimizes the time necessary to compute the potential and forces. The work to determine the near-field portion of the potential scales as dr(Mn) where M is approxi-
0009-2614/96/$12.00 Copyright O 1996 Published by Elsevier Science B.V. All rights reserved PI! S0009-2614(96)00574-X
648
CA. White, M. Head-Gordon / Chemical Physics Letters 257 (1996) 647-650
mately the number of particles enclosed in each lowest level box and n is the total number of particles. For this step to scale linearly with the total number of particles, M must be independent of the total number of particles. Since the standard method increases the number of lowest level boxes only by factors of eight (in 3D), this is not generally possible unless we increase the number of particles by a factor of eight. For smaller increases in the number of particles, the optimal number of lowest level boxes will often be unchanged and hence M increases. Thus for a range of system sizes where the optimal tree depth is constant, we see locally quadratic behavior in the computational effort due to the near-field evaluation [10,11]. The locally quadratic behavior occurs for uniform distributions of particles. For non-uniform distributions, even larger variations in M can occur. Adaptive multipole methods [12,13] generalize the standard scheme to allow different regions of space to be described by independent levels of partitioning. This non-uniform boxing reduces the variations in M introduced by the non-uniform charge distribution. However, adaptive methods do not address the smaller variations in M induced by selecting boxes such that the number of boxes enclosed in a 1 × 1 × 1 box is a power of eight. In this Letter we describe a simple yet highly effective method that permits the number of lowest level boxes to change by very small amounts. The variations of M inherent in the standard method can be damped by the added control we gain by being able to add one box at a time to the lowest level. One still has locally quadratic behavior; however, the amount of time spent on any given quadratic curve is dramatically reduced allowing the method to more optimally balance the near-field and far-field work for any given number of particles.
2. Implementation and discussion In standard FMM implementations, one selects the depth of the FMM tree, and thus determines a number of lowest level boxes. Here our objective is to instead define a procedure for yielding lowest level boxes which, /f occupied, contain approximately the optimal number of particles, Mt,rg~. Mt~g~t
may be established by numerical experiments, such as those discussed below. To achieve this objective within the tree structure of the FMM, we must allow many of the lowest level boxes to be empty. Crucial to the value of this idea is the fact that the computational overhead associated with empty boxes can be made negligible. The optimal boxing problem can be satisfactorily solved for the case where the distribution of particles is approximately uniform within some bounding surface, So. We enclose the surface ,3" by the smallest possible cube, ~', and denote the volume ratio (or area ratios in 2-D and length ratio in 1-D problems) of 5 " to ~" as g. The target number of lowest level boxes is the total number of particles, n, divided by the optimal number of particles per box, scaled by g:
I n ]
ffi M,.
o,g
'
where [ x] denotes the smallest integer greater than or equal to x. By construction this fixes the average particle number within a particle-containing box at close to its optimal value. Depending on the geometric factor, g, there may be many empty boxes. The operating tree depth, t, for the FMM is now given by the smallest number of tiers which generates at least Nt,rget lowest level boxes. In d dimensions, this is: tffil+
dlog2
"
(2)
We can guarantee that only the required number of boxes are occupied by defining the dimensions of the parent box (which is subdivided t - 1 times to yield the lowest level boxes) to be larger than those of the system. The ratio of the volume of the parent box to the size of the smallest enclosing cube ~" should be equal to the ratio of Ntar-et to the total number of lowest level boxes, 2 d¢t- ~ of the t-level tree. This is a fraction equal to:
Ntarget f---- 2d(t-l)"
(3)
TO achieve this volume ratio, we define the ratio of side lengths of the minimal enclosing cube ~ to the parent box of the FMM tree to be f~/d. The additional empty boxes beyond the edges of the charge distribution can again be ignored at negligible corn-
CA. White, M. Head-Gordon / Chemical Physics Letters 257 (1996) 647-650 3 ,,,j~~//"
l I
o 0
I
.
2
"~': pagticles( xI0~)
3
4
Fig. I. Plot showing the time to form the potential and forces for a random, homogeneous system of point charges enclosed within the unit box, using the fast multipole method (FMM) with expansions of order L ffi 21 and L ffi 6. Timings were recorded on an IBM R S / 6 0 0 0 model 350 workstation. The dashed lines rcpresent use of the standard boxing scheme. These curves show locally quadratic behavior with cusps which arise from increasing the depth of the FMM trce. The solid lines represent our new boxing method which allows a fractional tree depth by selecting an arbitrary number of lowest level boxes. The maximal differences betwecu the curves shows that the new method can give spcedups approaching a factor of two.
putational cost, effectively selecting an arbitrarily number of lowest level boxes. This method requires no modifications to the standard FMM procedure except the pre-conditioning of the charge distribution described above, and yet we shall show it gives speedups approaching a factor of 2 even for very low order multipoles. We have applied these modifications to our rotation based FMM code [14] (a reformulation of the FMM with computational cost that depends cubically rather than quartically on the number of retained multipoles). Fig. 1 shows a plot of time against number of particles for 6 and 21 levels of multipoles, using the same model system we have described previously [10]. This is a cube containing randomly distributed particles, so that g -- 1. The standard FMM (dashed curves) show locally quadratic behavior for fixed depths of tree, as is well known [10,11]. The solid curves representing the scaled charge distributions touch the original curves at their lower tangents where the globally optimum value of M is achieved. (roughly 100 for 21 poles and 25 for 6
649
poles in this implementation). However, they show much more favorable timings in the vicinity of crossings between different numbers of tiers in the conventional method. That is the point where two values of M differing by a factor of eight yield equal performance. One is much larger than the globally optimum value and the other much smaller. The quadratic regions obtained by this method are barely perceptible in both cases as would be expected by the reduction in the variations of M. Clearly the new procedure achieves its goal of permitting the FMM to operate in a close to optimum fashion regardless of particle number. It is also important to comment on the fact that the FMM curves do not yield true linear scaling over the full ranges we have plotted on Fig. 1. This is not because of any hidden nonlinear contributions to the computational effort in the algorithm. As one increases the total number of boxes, the fraction of boxes located at the edges of the distribution decreases. This increases the average number of neighboring boxes (since boxes on the edge have fewer neighbors) and thus increases the amount of near-field work. This effect is most noticeable for shallow trees where a substantial number of boxes are located at the edge of the distribution, and the average number of neighboring boxes grows rapidly with tree depth. Linearity is established for deeper trees where one achieves an asymptotic average number of neighboring boxes determined solely by the well-separatedhess criterion. The nonlinearity is more noticeable for the 21 pole results where the transition to deeper trees occurs for larger numbers of particles (since far-field work is relatively more expensive), and accordingly the asymptotic linear regime is only approached toward the right of the graph.
3. Conclusion We have presented a new method for boxing charge distributions within the fast multipole method (FMM). We emphasize several points: (1) The method uses a pre-scaling of the charge distribution to select an arbitrary number of lowest level boxes thus allowing the number of particles per box to remain much closer to optimal than is possi-
650
CA. White, M. Head-Gordon~ Chemical Physics Letters 257 (1996) 647-650
ble within standard FMM calculations. This method requires only trivial modifications yet it dramatically reduces the locally quadratic behavior seen in FMM timings. (2) We focus on optimizing the number of particles per box for a system of uniform density enclosed by a potentially irregular boundary. This contrasts with other boxing methods such as the adaptive boxing strategies [12,13] which specifically target non-uniform charge distributions. For a uniform system the adaptive strategies revert to the standard method, indicating that even the adaptive methods might profit from coupling to this method. (3) Since the idea applies directly to the method of boxing the charges rather then to the FMM itself, one might apply this method to any variation of the FMM such as our continuous fast multipole method for density functional theory calculations of elecIronic structure [15,16] or more generally to any application requiring the partitioning of a system into boxes.
Acknowledgement This work was supported by Q-Chem Inc., through a Department of Energy SBIR agreement (DE-FG0593ER81643).
References [1] L. Greengard and V. Rokhlin, J. Comp. Phys. 60 (1985) 187. [2] L. Greengard, The rapid evaluation of potential fields in particle systems (MIT Press, Cambridge, 1987). [3] L. Greengard and W.D. Gropp, Comp. Math. Appl. 20 (1990) 63. [4] L. Greengard, Science 265 (1994) 909. [5] K.E. Schmidt and M.A. Lee, J. Stat. Phys. 63 (1991) 1223. [6] F. Zhao and S.L. Johnsson, Siam J. Sci. Statist. Comp. 12
(1991) 1420. [7] J.A. Board Jr., J.W. Causey, J.F. Leathrum Jr., A. Windemuth and K. Schulten, Chem. Phys. Letters 198 (1992) 89. [8] H.Q. Ding, N. Karasawa and W.A. Goddard llI, J. Chem. Phys. 97 (1992) 4309; Chem. Phys. Letters 196 (1992) 6. [9] J. Shimada, H. Kaneko and T. Takada, J. Comp. Chem. 15 (1994) 29. [10] C.A. White and M. Head-Gordon, J. Chem. Phys. 101 (1994) 6593. [11] G. Blelloch and G. Narlikar, Proceedings of Dimacs Implementation Challenge Workshop, (Oc~ber 1994), available at hUp://www.cs.cmu.edu/afs/cs.cmu.adu/project/ scandal/public/www/papers-html#nbody. [12] J. Carrier, L. Greengard and V. Rokhlin, SIAM J. Sci. Statist. Comp. 9 (1988) 669. [13] K. Nabors, F.T. Korsmeyer, F.T. Leighton and J. White, SIAM J. Sci. Statist. Comp. 15 (1994) 714. [14] C.A. White and M. Head-Gordon, J. Chem. Phys., submitted for publication. [15] C.A. White, B.G. Johnson, P.M.W. Gill and M. HeadGordon, Chem. Phys. Letters 230 (1994) 8. [16] C.A. White, B.G. Johnson, P.M.W. Gill and M. HeadGordon, Chem. Phys. Letters, in press.