Future Generation Computer Systems 21 (2005) 759–766
Partitioning finite element meshes using space-filling curves Stefan Schambergera,∗ , Jens-Michael Wierumb b
a University of Paderborn, F¨ urstenallee 11, 33102 Paderborn, Germany Paderborn Center for Parallel Computing, F¨urstenallee 11, 33102 Paderborn, Germany
Available online 2 July 2004
Abstract Using space-filling curves to partition unstructured finite element meshes is a widely applied strategy when it comes to distributing load among several computation nodes. Compared to more elaborated graph partitioning packages, this geometric approach is relatively easy to implement and very fast. However, results are not expected to be as good as those of the latter. In this paper we present results of our experiments comparing the quality of partitionings computed with different types of space-filling curves to those generated with the graph partitioning package Metis. © 2004 Elsevier B.V. All rights reserved. Keywords: Distributed Finite Element Method; Graph partitioning; Space-filling curves
1. Introduction Finite Elements (FE) are often used to numerically approximate solutions of a Partial Differential Equation (PDE) describing physical processes. The domain on which the PDE has to be solved is discretized into a mesh of finite elements, and the PDE itself is transformed into a set of linear equations defined on these elements [1], which can then be solved by iterative methods such as Conjugate Gradient (CG). Due to the very large amount of elements needed to obtain an accurate approximation of the original problem, this method became a classical application for parallel computers. The parallelization of numerical simulation al∗
Corresponding author. E-mail addresses:
[email protected] (S. Schamberger),
[email protected] (J-M. Wierum). 0167-739X/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2004.05.018
gorithms usually follows the Single-Program MultipleData (SPMD) paradigm: Each processor executes the same code on a different part of the data. This means that the mesh has to be partitioned into P subdomains (where P is the number of processors) and each subdomain is then assigned to one of the processors. Since iterative solution algorithms mainly perform local operations, i.e. data dependencies are defined by the mesh, the parallel algorithm only requires communication at the partition boundaries. Hence, the overall efficiency depends on two factors: an equal distribution of the data (work load) on the processors and a small communication overhead achieved by minimizing the number of edges between different partitions. In practice, mainly two distinct approaches are applied to take care of this problem: advanced partitioning tools based on sometimes quite complicated heuristics and more simplistic methods based on
760
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
geometric approaches. Comparisons between these approaches have been undertaken and results are presented for example in [2,3]. However, since these publications consider a large number of different partitioning approaches, the presentations of the results are somewhat comprehensive. The graph partitioning problem is known to be NP-complete, hence a number of heuristics finding good solutions have been developed and implemented in several libraries like Metis [4], Jostle [5], Chaco [6] or Party [7,8]. They usually follow the multilevel approach: In every level, vertices of the graph are matched and a new, smaller graph with a similar structure is generated, until only a small graph, sometimes only with P vertices, is left. The partitioning problem is then solved for this small graph and vertices in higher levels are partitioned according to their representatives in lower levels. Additionally, a local refinement phase is applied in every level which plays an important role improving the partition quality. Another widely applied approach to partition a mesh is based on geometric methods. The one considered in this paper applies space-filling curves (SFC). The vertices of the FE mesh are sorted by a certain recursive scheme covering the whole domain. Then, the now linear array of vertices is split into equal sized parts, each representing a partition. An advantage of the ordering of the mesh’s nodes is its flexible use. Once it has been determined, a partitioning into any number of domains can be generated without additional time consuming calculations. Furthermore, these kinds of algorithms are relatively easy to implement and provide information useful for realizing cache efficient calculations. Therefore, they are often closely coupled with the FE application. In contrast to the described multilevel partitioning heuristics, this method only works if vertex coordinates are present. It is also clear that the quality of the generated partitioning does not excel the quality of the former mentioned elaborated heuristics since except for the coordinates any other information provided by the graph is simply ignored. A definition of the space-filling curves included in this paper as well as a description of how they can be used to solve the FE graph partitioning problem is presented in the next section, together with some theoretical considerations. Section 3 gives details of our experimental results and we conclude in the last section.
Fig. 1. Refinement rule of the Hilbert curve.
2. Partitioning with space-filling curves Space-filling curves are geometric representations of bijective mappings M : {1, . . . , N m }→ {1, . . . , N}m . The curve M traverses all N m cells in the m-dimensional grid of size N. They have been introduced by Peano and Hilbert in the late 19th century [9]. An (historic) overview on space-filling curves is given in [10]. In Fig. 1–3, the recursive construction of spacefilling curves is illustrated exemplarily for the Hilbert curve. Fig. 1 shows the refinement rule splitting each quadrant into four subparts. The order within the quadrants has the same u-like basic pattern with some subparts being reflected and rotated. A possible algorithm calculating this refinement is sketched in Fig. 2. The Hilbert-function separates all given nodes in the inter-
Fig. 2. Sketch of the recursive Hilbert curve calculation.
Fig. 3. The four basic patterns of the Hilbert curve.
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
761
refinement steps. For Sierpi´nski and βΩ-indexing, no 3D versions are known. 2.1. An example: partitioning the 16 × 16-grid
Fig. 4. Three-dimensional Hilbert curve used for evaluations.
val [first, last] into four sections and processes them recursively. The separators for the four sections are firstcut, midcut, and thirdcut. The Split-operation sorts all nodes in the specified interval according to an orientation (x- or y-axis) and a direction (ascend or descend), returning the index which represents the geometric separator. The orientation and direction are determined from the basic pattern type of the Hilbert curve to be generated. The given algorithm fits to the enumeration of pattern types printed in Fig. 3. An overview on the indexing schemes evaluated in this paper is plotted in Fig. 5 (top row). In addition to the older indexing schemes of Hilbert, Lebesgue, and Sierpi´nski we have examined the βΩ-indexing. The refinement rules for all curves can be found in [10,11]. While the extension to 3D space is obvious and unique for the Lebesgue curve, there are 1536 structurally different possibilities for curves with the Hilbert property [12]. In this paper the evaluations are based on the 3D definition sketched in Fig. 4 showing two
The bottom row of Fig. 5 shows the partitioning of a 16 × 16-grid into five parts using space-filling curves. The edge cuts for the four indexing schemes are 65 (Hilbert), 64 (Lebesgue), 66 (Sierpi´nski), and 58 (βΩindexing). For comparison, the edge cut obtained with Metis is 46. Thus, the edge cut for the partitionings based on the indexing schemes is 26–44 % higher than the one of the solution computed by Metis. Although this example is not representative for the overall quality (especially for irregular graphs), it shows some of the specific disadvantages of indexing schemes. In case of the Hilbert curve, the endings of the partitions are sometimes spiral. The partitions induced by the Lebesgue curve are not connected, even when applied on regular grids (cf. highlighted partition). Note that none of the indexing schemes can guarantee connected partitions in irregular graphs, but the probability of disconnected partitions is much higher for the Lebesgue curve for all meshes. On regular grids, the Sierpi´nski curve shows weaknesses, since here the diagonal geometric separators lead to a high edge cut during its recursive construction, although the partitions look quite compact. Furthermore, the endings of the partitions are sometimes of a slightly spiral structure. The βΩ-indexing is based on the same u-like base pattern as the Hilbert curve but uses different refinement rules to reduce the spiral effects at the partition endings.
Fig. 5. The four evaluated curves. Top – structure after four refinement steps; bottom – the induced partitionings on a 16 × 16-grid.
762
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
Fig. 6. Partitioning example for biplane.9 using space-filling curves.
2.2. Partitioning irregular graphs For the indexing of the vertices of irregular graphs the space is split recursively until each sub-square (or sub-cube) contains of at most one vertex. The order of the vertices is given by the order of the subspaces. Fig. 6 shows the partitionings of the larger irregular graph biplane.9 (cf. Table 1 in Section 3) into five parts using the evaluated indexing schemes. The resulting edge cuts are 627 (Hilbert), 611 (Lebesgue), 863 (Sierpi´nski), and 615 (βΩ-indexing). For comparison, the edge cut obtained using Metis is 299. Due to the holes in the graph, space-filling curves and all other geometric partitioning heuristics may lead to disconnected partitions. Spiral effects can be observed again for the Hilbert and Sierpi´nski based partitioning and in a slightly reduced form for the βΩindexing. On the other hand, for the Lebesgue curve there is a larger number of disconnected partitions. The Sierpi´nski curve results in the worst partitioning because its recursive definition is based on triangles which fits badly to a graph dominated by axis aligned edges. When partitioning graphs with different expansions in each dimension, the following observation can improve the quality. A straightforward ordering by splitting the space in all dimensions leads to a distorted structure of the space-filling curve. To avoid this improper construction, the splitting starts orthogonal to the dimension of the largest expansion. The 1D splitting is performed unless the largest and second largest expansion of the subspaces differ by a factor less than √ 2. A phase of 2D splitting is attached to get the expansions of the subspaces in a 1:22/3 relation. An example for the Hilbert curve after a 1-, 2- and 3D split is given in
Fig. 7. Indexing of oblong objects.
Fig. 7. It is obvious, that the choice of the correct basic curve type is important when switching to a higher dimensional splitting phase to ensure a continuous curve. 2.3. Analytical results In contrast to most multilevel heuristics, some properties of space-filling curves and their implied partitions have been shown analytically. In [13], it is proven that subdomains based on connected spacefilling curves are “quasi optimal” for regular grids and special types of adaptively refined grids: edge cut ≤ C
|V | P
(d−1)/d ,
where |V | denotes the number of vertices, P the number of partitions, and d the dimension of the graph. The constant C depends on the type of the curve. Some constants have been determined for 2D regular grids in worst case [14]. The quality of a partition based on the Lebesgue curve is bounded by √ 384 Lebesgue 7.348 < 3 6 − ε ≤ Cmax ≤√ < 7.305. 2730
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
763
Table 1 Graphs used in our experiments Graph
|V |
|E|
degmin
degavg
degmax
dim
crack big biplane.9 shock.9 brack rotor wave hermes
10240 15606 21701 36476 62631 99617 156317 320194
30380 45878 42038 71290 366559 662431 1059331 3722641
3 3 2 2 3 5 3 4
5.93 5.88 3.87 3.91 11.71 13.30 13.55 23.25
8 10 4 4 32 125 44 56
2 2 2 2 3 3 3 3
A lower bound for the Hilbert curve is 5 Hilbert ≤ Cmax . 7.442 < 12 13
consider in this paper and whose results are described in detail.
It follows that the Lebesgue curve is better than the Hilbert curve in worst case analysis for the regular grid, despite its disconnected partitions. Combined with the fact that the Sierpi´nski curve and the βΩ-indexing have larger lower bounds, Lebesgue turns out to be the best of the four evaluated indexing schemes in this case. In average case, partitions based on the Lebesgue curve are bounded by
First we examine a large regular grid. As already indicated in the last section, space-filling curve based partitionings are usually 20–40% worse than those obtained applying Metis on this 2D graph. One exception is the Sierpi´nski curve with an up to 70% higher edge cut. Another exception are partitionings requesting powers of 2 many subdomains. Here, space-filling curves perform about 20% better than Metis does. The results on the grid also correspond well with the analytical bounds shown in the last section. Those indicate a quality which is at most 40% worse than an optimal solution in average case. Fig. 8 shows the results for another 2D graph. In contrast to the grid, the biplane.9 consists of an irregular structure and also contains two holes. Therefore, we do not expect as good results as for the grid. While this expectation turned out to be true, it mainly holds in case
Lebesgue
Cavg
10 ≤ √ < 5.774. 3
Upper bounds for the Hilbert curve (5.56) and the βΩ-indexing (5.48) can be extracted from experimental results [11]. Compared to√the optimal partition, a square with a boundary of 4 |V |/P, the decrease in quality is about 85 % in worst case and 40 % in average case.
3.1. Quality
3. Experimental results In this section we compare space-filling curve based partitionings to those generated by Metis. The metrics we consider in this paper is the classical edge cut, since others like, e.g. the number of the partitions’ boundary vertices lead almost to the same conclusions. The algorithms’ performance depends much on the input instances. Hence, we use a set of established FEM graph instances in our experiments that can be obtained via the Internet [15] and that has already been included in other comparisons [5,16,7]. Table 1 lists the graphs we
Fig. 8. Solution quality of SFC compared to Metis for biplane.9.
764
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
Fig. 9. Solution quality of SFC compared to Metis (2D graphs).
Fig. 10. Solution quality of SFC compared to Metis (3D graphs).
of small numbers of partitions. The advantage of applying Metis decreases to a factor of 1.5 in the interesting range of partition sizes. Unfortunately, since the partitioning problem is NP-complete, no optimal solutions are known for large graphs. Thus, it is an open question whether the decreasing difference between both methods is due to an improvement of the partitionings
induced by space-filling curves or a quality reduction of the solutions computed with Metis. Among the spacefilling curves the Sierpi´nski curve performs about 10– 20% worse than the other three curves. As mentioned, this is due to the graph’s grid-like structure with only axis oriented edges. Experiments with more unstructured triangle based meshes have shown that here the Sierpi´nski curve often performs better than the three others. The results obtained for the other 2D graphs considered in this paper are given in Fig. 9. In these experiments, the ordering of the grid-like graph shock.9 is induced by the Lebesgue curve while the Sierpi´nski curve is applied to the more unstructured instances crack and big. Again, for a small number of partitions Metis outperforms the space-filling curves by quite a large factor, this time ranging up to 3. But with an increasing number of partitions, the edge cut obtained using space-filling curves gets closer to the one calculated by Metis. For the graphs included here, a factor of less than 1.5 is reached. The 3D extensions of the Hilbert and Lebesgue curves show a similar behavior as their 2D counterparts. We combined the results from the experiments with the graphs brack, rotor, wave, and hermes all in Fig. 10, displaying the solution quality obtained by the Hilbert scheme. All space-filling curves perform quite well again, producing edge cuts less than twice as large as those from Metis from around 32 partitions on. An exception to this is the rotor graph, where only a factor of 3–4 can be achieved with the Hilbert scheme. Tables 2 and 3 summarize our observations for 16 and 64 partitions, respectively. In the 2D cases, either the Lebesgue or the Sierpi´nski curve produces the best result for 16 partitions. On grid-like meshes, the indexing schemes based on axis aligned separators are supe-
Table 2 Edge cut obtained for 16 partitions Graph
Metis
Hilbert
Lebesgue
Sierpi´nski
βΩ-ind.
crack big biplane.9 shock.9 brack rotor wave hermes all
1218 1099 800 1208 13225 24477 48183 119219
2060 2345 1253 1837 29932 97609 77166 256866
2108 2540 1235 1675 28086 93526 78592 266479
2013 2267 1593 2050 – – – –
2183 2526 1285 1736 – – – –
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
765
Table 3 Edge cut obtained for 64 partitions Graph
Metis
Hilbert
Lebesgue
Sierpi´nski
βΩ-ind.
crack big biplane.9 shock.9 brack rotor wave hermes all
2781 2843 1906 2902 29432 52190 94342 241771
4031 5025 2867 3908 59814 176574 130923 443450
4271 5198 2888 3915 61054 182657 138787 473081
3927 4854 3229 4355 – – – –
4084 4919 2844 3761 – – – –
rior to Sierpi´nski while the latter one performs better on more unstructured meshes. This observation also holds for 64 partitions. But now the βΩ-indexing is best for biplane.9 and shock.9. For the partitioning of grid-like meshes it follows, that the non continuous structure of the Lebesgue curve is superior to spiral ones of, e.g. the Hilbert scheme for partitioning in few parts. But if a finer partitioning is needed this advantage diminishes and the continuous curves Hilbert and βΩ perform better. Overall, the edge cut of the partitionings induced by space-filling curves is less than twice as large as the one obtained with Metis for 16 partitions. For 64 partitions this value decreases down to a factor of 1.5. A similar relationship between Hilbert and Lebesgue curve can be seen for 3D graphs. For finer partitionings the continuous Hilbert curve gets more superior to the Lebesgue curve. Compared to Metis, in this case the relative difference is 2.4 and 2.1 for partition numbers of 16 and 64, respectively. 3.2. Resources As mentioned before, the goal of balancing load is the reduction of the overall computation time. Therefore, the time spend on partitioning itself should also be minimized. All space-filling curves need much less time for their computation than Metis. This is even more the case if the computation of the ordering is interrupted as soon as all partitionings have been determined. On the other hand, if full ordering information is available, it is easy to decompose the graph into any other given number of partitions very quickly. Considering the memory consumption, Metis is outperformed even more by all space-filling curves. In case of the hermes all graph, Metis requires about 220 MByte, whereas space-filling curves only consume
about 5 MByte for the graph and additional 4 KByte for the recursive descending. 4. Conclusion and future work As expected Metis computes partitionings with lower edge-cuts than space-filling curves do. This is not surprising since space-filling curves only rely on vertex coordinates rather than on any connectivity information between the vertices. The information that edges do provide is simply ignored. However, the advantage between the solution quality of both approaches is not too large. In most cases, applying Metis does not decrease the edge cut by more than 40–80% for decent numbers of partitions. This factor decreases further with an increasing number of partitions. Moreover, space-filling curves save both, a lot of time and a lot of memory. One possibility to further improve the partitions induced by node orderings is to take the graphs’ edges into account. This additional information enables us to compute an ordering that models the graphs’ structure more closely and therefore improves the quality significantly. Acknowledgement This work was partly supported by the German Science Foundation (DFG) project SFB-376 and by the IST Program of the EU under contract number IST1999-14186 (ALCOM-FT). References [1] G. Fox, R. Williams, P. Messina, Parallel Computing Works!, Morgan Kaufmann, San Francisco, 1994.
766
S. Schamberger, J.-M. Wierum / Future Generation Computer Systems 21 (2005) 759–766
[2] B. Hendrickson, K. Devine, Dynamic load balancing in computational mechanics, Comput. Methods Appl. Mech. Eng. 184 (2000) 485–500. [3] K. Schloegel, G. Karypis, V. Kumar, Graph partitioning for high performance scientific simulations, in: J. Dongarra, et al. (Eds.), The Sourcebook of Parallel Computing, Morgan Kaufmann, 2002. [4] G. Karypis, V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput. 20 (1) (1998) 359–392. [5] C. Walshaw, M. Cross, Mesh partitioning: a multilevel balancing and refinement algorithm, SIAM J. Sci. Comput. 22 (1) (2000) 63–80. [6] B. Hendrickson, R. Leland, The Chaco User’s Guide, version 2.0, 1994. [7] R. Preis, R. Diekmann, PARTY – A software library for graph partitioning, Adv. Comput. Mech. Parallel Distribut. Process. (1997) 63–71. [8] S. Schamberger, Improvements to the helpful-set heuristic and a new evaluation scheme for graphs-partitioners, International Conference on Computational Science and its Applications, ICCSA’03, LNCS, vol. 2667, Springer, 2003, pp. 49–59. ¨ [9] D. Hilbert, Uber die stetige Abbildung einer Linie auf ein Fl¨achenst¨uck, Mathematische Annalen 38 (1891) 459–460. [10] H. Sagan, Space Filling Curves, Springer, 1994. [11] J.-M. Wierum, Definition of a new circular space-filling curve – β-indexing, Tech. Rep. TR-001–02, Paderborn Center for Parallel Computing, http://www. upb.de/pc2/, 2002. [12] J. Alber, R. Niedermeier, On multi-dimensional hilbert indexings, Computing and Combinatorics, LNCS, vol. 1449, Springer, 1998, pp. 329–338. [13] G. Zumbusch, Parallel Multilevel Methods: Adaptive Mesh Refinement and Loadbalancing, Teubner, Stuttgart, 2003.
[14] J. Hungersh¨ofer, J.-M. Wierum, On the quality of partitions based on space-filling curves, International Conference on Computational Science ICCS, LNCS, vol. 2331, Springer, 2002, pp. 36–45. [15] C. Walshaw, The graph partitioning archive, http://www. gre.ac.uk/∼c.walshaw/partition/, 1997–2003. [16] R. Battiti, A. Bertossi, Greedy, prohibition, and reactive heuristics for graph partitioning, IEEE Trans. Comput. 48 (4) (1999) 361–385. Stefan Schamberger received the diploma degree from the Universit¨at Paderborn, Germany, in 2000. Currently he is working towards his Dr. degree as a researcher in the field of efficient usage of parallel systems. His main interests are load balancing schemes and graph partitioning heuristics in distributed computing environments.
Jens-Michael Wierum received his diploma degree in computer science from the Universit¨at Paderborn in 1996. He is currently pursuing his Dr. degree while working as a researcher at the Paderborn Center for Parallel Computing. His interest is focused on applications of the locality preserving properties of discrete spacefilling curves including mesh partitioning and global contact search in finite element simulations.