Advances in Engineering Software 25 (1996) 253-266
Copyright © 1996 Civil-Comp Limited and Elsevier Science Limited
Printed in Great Britain. All rights reserved
0965-9978(95)00110-7    0965-9978/96/$15.00
Subdomain generation for non-convex parallel finite element domains

B. H. V. Topping & A. I. Khan
Department of Mechanical and Chemical Engineering, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, UK
(Received 31 March 1994; accepted 6 November 1995)

In this paper the Subdomain Generation Method (SGM), originally formulated in Khan & Topping (1993; Khan, A. I. & Topping, B. H. V., Subdomain generation for parallel finite element analysis. Comput. Syst. Engng, 1993, 4(4/6), 473-488) for convex finite element domains, is generalized for arbitrarily shaped domains. Modifications to the original SGM are described which allow partitioning of non-convex domains. These modifications have been made to the formulation of the optimization module and the predictive module. The examples presented in Khan & Topping (1993) have been re-worked and two more examples have been added which demonstrate the application of the method to arbitrarily shaped domains. It is shown with the aid of the examples that the method provides well-balanced subdomains very efficiently and allows parallel adaptive mesh generation. The method in its present form may be used to partition unstructured graphs in two or three dimensions. Since the computational cost of mesh partitioning with this method depends solely upon the initial coarse mesh, the computational cost does not increase with the mesh density of the final mesh. The method in its present form is unsuitable for relatively coarse-grained parallel computers; however, the modifications which would impart a greater degree of scalability to this method are discussed. Copyright © 1996 Civil-Comp Limited and Elsevier Science Limited.
1 INTRODUCTION

The key features of large-scale computationally intensive finite element analysis are:

• a large number of degrees of freedom
• a nonlinear or dynamic analysis
• an adaptive finite element analysis.

Adaptive finite element analysis with h-refinement is best accomplished with the aid of unstructured meshes. Unstructured finite element meshes, adaptive or uniform, generally comprise a single element type. Computational load balancing using unstructured meshes for explicit time-stepping parallel finite element analysis may be ensured by fulfilling the following requirements:

• an equal number of elements should be assigned to each subdomain
• the number of boundary interface nodes of each subdomain should be minimized.

Large-scale finite element computations lead to parallel finite element analysis. Parallel finite element analysis is based on the concept of dividing a large and computationally time-consuming finite element problem into smaller and more manageable sub-problems which may be solved efficiently. This process of breaking up a problem into smaller sub-problems is called domain decomposition. Under this approach a target finite element mesh (a discretized domain) is divided into a finite number of subdomains such that the computational load (or effort) per subdomain is approximately the same; these subdomains are then solved concurrently on different processors. Hence, large-scale analysis problems may be solved at much greater speeds by networking multiple processors. The task of partitioning an unstructured mesh for parallel finite element analysis is not straightforward. For explicit time-stepping finite element analysis, not only must the number of elements per subdomain be kept the same, but the number of interfacing boundary nodes between these subdomains must be minimized. In addition, the actual time of mesh partitioning becomes a critical factor when large-scale adaptive finite element discretizations are encountered. The time for re-meshing and partitioning grows exponentially as
the mesh density increases. In Ref. 1 a non-conventional approach to the domain decomposition problem was proposed which addressed the problem of re-meshing and mesh partitioning by working with the initial coarse uniform mesh in adaptive finite element analysis, in contrast with some of the contemporary methods (Refs 2, 3) which decompose the final refined mesh. This non-conventional approach was, however, restricted to convex finite element domains. An optimization- and artificial-intelligence-based approach was used to alleviate the excessive computational loads occurring due to re-meshing and mesh partitioning. The large memory requirement of the ROOT processor for storing the complete mesh was also eliminated owing to the generation of distributed meshes or subdomains over parallel processors.

2 AN OPTIMIZATION-BASED APPROACH TO THE MESH PARTITIONING PROBLEM

The problem of mesh partitioning, i.e. breaking up a mesh into subdomains such that the computational load is balanced per subdomain and the communication problem is minimized through the minimization of the interfacing boundary nodes, is also known as the mapping problem. Flower et al. (Ref. 4) used simulated annealing to solve the optimal mapping problem for finite element subdomains. A cost function comprising the total computational work load and the total communication cost of the mapping was defined as follows:
C = Σ_{i=1}^{N} (W_i)^r + (t_comm / t_comp) Σ_{p,q} D_pq     (1)

subject to:

Σ_{i=1}^{N} W_i = W_tot     (2)

where:
N: number of subdomains or processors
W_i: computational work load associated with the elements residing in subdomain i, assigned to processor i
W_tot: a constant
D_pq = D_ij: communication distance between two finite elements p and q residing in two different processors i and j
t_comm: typical communication time per element
t_comp: typical computation time per element
r: an integer greater than 1, which for simplicity's sake is taken as 2.

The term W_tot was obtained for equal values of W_i. Initial mappings were done by arbitrarily assigning the elements over the processors. The simulated annealing heuristic was used to improve the mapping by allowing the elements to migrate among the processors during the iterations performed using the parallel conjugate gradient equation solver. This study indicated that the minimum value of the cost function depended upon the initial distribution of the elements over the processor network. It also showed that the best results were obtained when the initial distribution was based upon the criterion of placing neighbouring subdomains on neighbouring processors (geometric parallelism). Thus, in order to obtain low cost function values, the problem of automatically partitioning the mesh still persisted. As with conventional optimization methods, the computational cost of genetic-algorithm-based optimization depends heavily on the number of design variables. If a technique similar to that of Ref. 4 had been adopted, then partitioning the finite element mesh optimally over the processor network would have required the number of design variables to equal the number of elements in the finite element mesh. For large-scale discretizations this would have been impractical. In Ref. 1, an objective function based upon computation and communication costs was proposed for partitioning planar convex finite element domains using recursive bisections which utilized only three variables. Thus, owing to the relatively small number of design variables and the restricted problem size (resulting from the use of the initial coarse background mesh for partitioning), it was shown that optimal or near-optimal mesh partitions could be obtained quite inexpensively for large-scale finite element discretizations.

3 THE SUBDOMAIN GENERATION METHOD FOR NON-CONVEX DOMAINS

The Subdomain Generation Method proposed in Ref. 1 was restricted to convex finite element domains. This limitation was a natural consequence of using a linear separator to bisect the coarse mesh. It is not possible to
Fig. 1. Non-optimal bisection of a non-convex domain by a linear separator.
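For illustration only, the cost function of eqn (1) can be evaluated for a given element-to-processor mapping along the following lines. This is a sketch under our own naming (`workload`, `element_proc`, `distance`), not code from Ref. 4:

```python
# Sketch of the Flower et al. cost function of eqn (1). All identifiers
# are illustrative choices of ours, not from the original work.

def mapping_cost(workload, element_proc, distance, t_comm, t_comp, r=2):
    """C = sum_i (W_i)^r + (t_comm/t_comp) * sum_{p,q} D_pq.

    workload[e]      -- computational load of element e
    workload is summed per processor to obtain W_i
    element_proc[e]  -- processor to which element e is assigned
    distance[(p, q)] -- communication distance D_pq for element pairs
    """
    # W_i: total load of the elements held by each processor.
    per_proc = {}
    for e, proc in element_proc.items():
        per_proc[proc] = per_proc.get(proc, 0.0) + workload[e]
    compute_term = sum(w ** r for w in per_proc.values())

    # Communication is only paid for pairs split across processors.
    comm_term = sum(d for (p, q), d in distance.items()
                    if element_proc[p] != element_proc[q])
    return compute_term + (t_comm / t_comp) * comm_term
```

With four unit-load elements split evenly over two processors, one cut pair of distance 1 and t_comm/t_comp = 0.5, this gives C = 2² + 2² + 0.5 = 8.5.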
Fig. 2. Bisection of a non-convex domain by a nonlinear separator (two subdomains of 21 elements each, with one interface).

bisect a non-convex domain, such as the one shown in Fig. 1, such that the number of elements in each subdomain is equal and the number of interfaces between these subdomains is minimized. In order to cater for the non-convexity of the finite element domain, the objective function has to either use a nonlinear separator or set the number of design variables equal to the number of elements in the coarse mesh. Both possibilities entail a substantial increase in the number of design variables. A second-order separator was tried on a non-convex domain similar to the one shown in Fig. 1. The nonlinear separator, as shown in Fig. 2, did succeed in determining the optimal partition, but the computational cost involved made this line of action not feasible to pursue further for the domain decomposition problem. The second option, i.e. setting the number of design variables in the objective function equal to the number of elements in the background mesh, initially did not appear to be promising, since other researchers, such as Flower et al. (Ref. 4), had used it with limited success in optimization-based mesh partitioning. However, working with a coarse background mesh and using recursive bisections indicated that good recursive partitions may be determined with the SGM, even when the number of design variables is set equal to the number of elements in the coarse starting mesh, owing to the following factors:

• As indicated in Ref. 1, a coarse starting mesh comprising up to 150 elements could deliver 8-16 balanced partitions. Hence, for coarse-grained high-performance systems the number of variables may generally be kept under this limiting value.
• With the recursive bisection of a subdomain, a design variable representing an element of the starting mesh may only have two discrete values, representing its assignment to either of the two subdomains formed within a bisection. Numerical studies done on the genetic-algorithm-based optimization module of the SGM indicated that the optimization module gained efficiency when the range of the values of the design variables was restricted.
Keeping in mind the above factors, the number of design variables was set equal to the number of elements in the coarse initial mesh. The results were, however, not encouraging. The optimization module was able to form optimal partitions as long as the number of design variables remained very low; however, when the number of design variables was increased to barely workable values, such as a coarse mesh comprising between forty and fifty elements, the optimization module failed to determine optimal bisections within a reasonable time limit. The genetic-algorithm-based optimization module generated a population of mesh partitions by assigning, at random, the value 0 or 1 to each design variable or chromosome. A set of these design variables represented an individual in the population. A numerical experiment was performed by artificially introducing a near-optimal design into the population of the genetic algorithm. It was noticed that the optimization module very rapidly reached the optimal design using the information presented by this artificially introduced individual. Hence, it was inferred that starting with individuals formed through random generation of chromosome values could not be used in the optimization module for mesh partitioning, since the cost of obtaining the optimal solution would usually be very high, rendering this technique not feasible. It was important that a rapid way of determining near-optimal bisections was introduced into the process of initial population generation, which would produce a population consisting of some individuals with near-optimal characteristics, enabling the genetic algorithm to determine an optimal solution efficiently.

3.1 Greedy algorithm for the population initialization
Farhat proposed a mesh-partitioning method based upon a greedy algorithm. This method, along with some other mesh-partitioning methods, has been reviewed in Ref. 1. The greedy-algorithm-based approach was found to be the quickest in delivering mesh partitions. It did poorly, however, with regard to limiting the number of interfaces between subdomains. This was primarily due to the intrinsic nature of the greedy algorithm, where the consequences of forming a mesh partition for future mesh partitions, which have yet to be constructed, are completely ignored. Hence the splitting of subdomains occurs frequently, increasing the inter-subdomain boundaries considerably. It was, however, felt that a greedy-algorithm-based approach might provide the near-optimal solutions for
priming the optimization module. The population initialization stage of the genetic algorithm was modified such that the individuals were created using a variation of the greedy algorithm instead of random sampling. A neural-network-based predictive module was used in this process, similar to the one described in Ref. 1, except for a modification to cater for large-scale element generation. This modification to the predictive module shall be described later in the neural network section. The greedy-algorithm-based population initialization was carried out as follows:

(1) Process the initial mesh on an element-by-element basis, using the neural-network-based predictive module to estimate the number of elements that would be generated within each element of the coarse initial mesh and the total number of generated elements resulting from this mesh.
(2) Set up an element adjacency array for the elements of the coarse background mesh.
(3) Assign a weight factor to each element in the coarse background mesh equal to the number of elements with which the element shares sides.
(4) Select an element from the coarse initial mesh at random. This element may be referred to as the pivot element.
(5) Do:
(6) Select the elements adjacent to the pivot element, provided they do not have zero weight.
(7) If an adjacent element is not to be found, then select an element at random which has not yet been selected. This is sometimes necessary to allow the algorithm to proceed, at the cost of splitting the subdomain.
(8) If an element is selected, then its associated weight is reduced by one.
(9) Randomly choose the next pivot element from the previously selected elements which have minimal non-zero weights.
(10) While the number of generated elements is within 5% tolerance of half of the total number of generated elements.
The above process results in the generation of an individual in the population space with an approximately equal number of generated elements in each subdomain and some probability in favour of having fewer interfacing boundary nodes. The process is repeated until the total number of individuals equals the specified population size in the genetic algorithm module.
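Steps (1)-(10) above can be sketched in Python as follows. This is a minimal illustration under our own naming: `predicted` stands in for the neural-network element-count estimates, and the treatment of the weight update in step (8) is one plausible reading of the text.

```python
import random

# Sketch of the greedy-algorithm-based population initialization
# (steps 1-10 above); all identifiers are our own illustrative choices.

def greedy_individual(adjacency, predicted, rng=random):
    """Return the set of coarse elements assigned to the first subdomain."""
    total = sum(predicted.values())          # step 1 (predicted totals)
    target = 0.5 * total
    weight = {e: len(adjacency[e]) for e in adjacency}   # step 3
    unselected = set(adjacency)              # step 2 (adjacency given)

    pivot = rng.choice(sorted(unselected))   # step 4: random pivot
    unselected.discard(pivot)
    selected = {pivot}
    generated = predicted[pivot]

    while generated < 0.95 * target and unselected:      # step 10: 5% tolerance
        candidates = [e for e in adjacency[pivot]
                      if e in unselected and weight[e] > 0]   # step 6
        if not candidates:                   # step 7: accept a split
            candidates = sorted(unselected)
        nxt = rng.choice(candidates)
        weight[nxt] -= 1                     # step 8
        unselected.discard(nxt)
        selected.add(nxt)
        generated += predicted[nxt]
        # step 9: random pivot among selected elements of minimal non-zero weight
        live = [e for e in selected if weight[e] > 0]
        if live:
            mn = min(weight[e] for e in live)
            pivot = rng.choice([e for e in live if weight[e] == mn])
    return selected
```

Repeating this routine with fresh random pivots yields the initial population; each call tends to produce a contiguous half of the predicted element load.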
3.2 The genetic-algorithm-based optimization module for the modified SGM

As already mentioned, the load-balancing problem for finite element discretizations involving a single type of element may be addressed by ensuring that the elements in the domain are equally shared among the processors and that the boundary interface nodes per subdomain are minimized. The mesh partitioning problem for adaptive unstructured meshes becomes increasingly computation-intensive as the number of elements in the mesh increases. The method proposed in Ref. 1 was not affected by the total number of elements in the final adaptive mesh. It was, however, necessary to have advance knowledge of the number of elements that would be generated in the final mesh. Hence, by applying recursive bisections to the initial mesh M_i = (E_i, C_i), consisting of E_i elements and C_i element edges, it was possible to partition the final (predicted) mesh M_f = (E_f, C_f), with E_f elements and C_f element edges, as follows. Divide M_i into M_1i and M_2i such that:

E_f = E_1f ∪ E_2f     (3)

E_1f ∩ E_2f = ∅     (4)

and the interfacing edges C_cf,

|C_cf| = |C_1f ∩ C_2f|     (5)

are minimized. The advance knowledge regarding the number of elements in the final mesh M_f was obtained by training a neural network to predict the number of elements that may be generated per element of the initial mesh M_i. The method of recursive bisections was applied to the initial mesh M_i, which was divided into two subdomains using a variation of the genetic algorithm regulated by an objective function which attained a maximum value when the numbers of generated elements were equal in both subdomains and the number of interfacing edges was at its minimum. This procedure was then applied recursively to each subdomain. The coarse initial mesh, thus partitioned into the desired number of subdomains, was ready to be mapped on to parallel processors, and adaptive mesh generation could be done concurrently, producing the final distributed mesh for use in a parallel finite element analysis. The modified SGM now proposed follows the above solution strategy except for the change in the population initialization process, described in Section 3.1, and some other changes in the formulation of the objective function and convergence criterion which are described in the following sections.

3.3 Objective function

The objective function was based upon eqns (3)-(5). Initially the elements of the domain are divided into two sets using the greedy-algorithm-based approach described in Section 3.1. Using a trained neural network
Fig. 3. Neural network for the predictive module.
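The recursive-bisection strategy of Section 3.2 reduces to a simple divide-and-conquer driver. A sketch, where the `bisect` callback stands in for the GA-optimized bisection of eqns (3)-(5) and all names are our own:

```python
# Sketch of the recursive-bisection driver: the coarse mesh is bisected,
# and each half is bisected again until the desired (power-of-two)
# number of subdomains is reached. Illustrative names throughout.

def recursive_bisection(elements, bisect, n_subdomains):
    """Split `elements` into `n_subdomains` parts (a power of two)."""
    if n_subdomains <= 1:
        return [elements]
    half_a, half_b = bisect(elements)   # GA-optimized bisection in the SGM
    return (recursive_bisection(half_a, bisect, n_subdomains // 2) +
            recursive_bisection(half_b, bisect, n_subdomains // 2))
```

Because each bisection works only on the coarse background mesh, the cost of this driver is independent of the density of the final adaptive mesh.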
module (which shall be described later), the number of elements which would result from each element of the mesh is estimated. The total numbers of predicted elements in the two halves of the divided mesh, i.e. |E_1f| and |E_2f|, are calculated. The value C_cf^p of the interfacing edges |C_cf| in the final mesh is determined as follows:

C_cf^p = Σ_k 2(L_ci)_k / ((S_1)_k + (S_2)_k)     (6)

where C_ci are the interfacing edges in the initial mesh, (S_1)_k and (S_2)_k are the nodal mesh parameters at the two ends of (C_ci)_k, and (L_ci)_k is the length of (C_ci)_k. The objective function is defined as:

Z = 1 − A ( | |E_1f| − |E_2f| | / |E_f| ) − B ( C_cf^p / |E_f| )     (7)

where |E_f| = |E_1f| + |E_2f|.

The multipliers A and B were chosen empirically for the best results. The values thus selected were A = 3 and B = 10. These values indicate a bias towards minimization of the interfaces, which seems reasonable: since the greedy-algorithm-based population initialization provides all the individuals with an equal opportunity, within a tolerance of 5%, to have an equal number of elements in the subdomains, greater optimization effort is probably needed to minimize the interface boundaries during the optimization process than to retain an equal number of elements within the bisected subdomains.

Fig. 4. A square-shaped domain with in-plane load.

3.4 Convergence

With the genetic algorithm a given population of possible designs is processed through a sufficient number of generations to determine the optimum design. One iteration is equivalent to one generation of the population. For problems of design optimization the shape of the design space is usually not known in advance, and the genetic algorithm performs directed random searches over the design variable space to find better solutions. Hence, it is not possible to state at any one stage that the solution obtained by the genetic algorithm is the best possible solution. Given a genetic algorithm with a very large sample space (population), and allowing it to process a very large number of generations, one may say with a certain degree of confidence that the solution reached by the genetic algorithm is either globally optimum or quite close to it. However, for problems such as mesh partitioning for parallel finite element analysis it is not possible to incur a large computational effort, since this would defeat the whole purpose of performing parallel finite element analysis. Keeping in mind that the genetic-algorithm-based optimization may not be performed over a very large number of generations, the following convergence criterion was employed. The genetic-algorithm-based optimization module in Ref. 1 used a population size of 50 individuals. However, the proposed method requires a greater population space owing to the increase in the number of design variables from three to the number of elements in the starting mesh. The modified SGM uses a population size of 500, a crossover probability of 0.6 and a mutation probability of 0.0333. The genetic algorithm is stopped if no improvement over the best design is noticed within 20 consecutive generations. In addition to this criterion a

Fig. 5. Example 1. The initial mesh (46 elements) divided into four subdomains using the SGM.

Fig. 6. Example 1. The remeshed subdomains for the mesh (412 elements) partitioned using the SGM.

Fig. 7. Example 1. The mesh partitions delivered by Farhat's method for the final mesh comprising 412 elements.

Fig. 8. Example 1. The mesh partitions delivered by Simon's method for the final mesh comprising 412 elements.
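As an illustrative aside, the objective function of Section 3.3 and the stopping rule of Section 3.4 might be sketched as follows. A and B follow the text; the function names and the best-fitness history representation are our own assumptions.

```python
# Sketch of the objective function Z (Section 3.3) and the stopping
# rule (Section 3.4). Illustrative names; values A, B, 20 and 500
# are taken from the text.

A, B = 3.0, 10.0

def fitness(e1f, e2f, c_cf_pred):
    """Z = 1 - A*||E1f|-|E2f||/|Ef| - B*C_cf^p/|Ef|, |Ef| = |E1f| + |E2f|."""
    ef = e1f + e2f
    return 1.0 - A * abs(e1f - e2f) / ef - B * c_cf_pred / ef

def converged(best_history, stall=20, max_generations=500):
    """True when the best design has not improved for `stall` consecutive
    generations, or when `max_generations` have been processed."""
    if len(best_history) >= max_generations:
        return True
    if len(best_history) <= stall:
        return False
    return max(best_history[-stall:]) <= max(best_history[:-stall])
```

A perfectly balanced bisection with no predicted interface edges scores Z = 1; both imbalance and predicted interfaces reduce Z, with interfaces weighted more heavily under A = 3, B = 10.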
Table 1. Example 1. Comparison of the actual number of generated elements per subdomain vs the ideal number of elements that should have been generated in Example 1

Subdomain no.   Generated elements (required)   Generated elements (actual)   Diff.   Percentage diff.
1                    103                             106                        3          2.91
2                    103                             106                        3          2.91
3                    103                             104                        1          0.97
4                    103                              96                       -7         -6.8
maximum limit of 500 on the number of generations is set, but this criterion was never invoked during the test runs of the SGM. The genetic algorithm generally provided good partitions within the first 70 generations under the above stopping criterion.

3.5 Neural networks

Initially the neural network module trained previously for the SGM was used to re-work the test meshes used in the examples of Ref. 1. However, this neural network failed to provide reasonable partitions when the number of generated elements increased beyond the maximum number of elements in the test meshes used in the training of the neural network. This was due to the fact that the neural network module could not predict values greater than the maximum value it had encountered during its training. Hence, it became necessary to train a new neural network which could handle higher predictive mesh densities. A square-shaped domain with a 46-element mesh, as used in Ref. 1, was re-analysed with a very high degree of refinement. The resultant mesh generated through this adaptive analysis comprised 13,265 elements, and the maximum number of elements generated within a single element of the coarse mesh was 1226. The results from this analysis were added to the original training file. A neural network with 12 neurons, as shown in Fig. 3, provided the best performance with two hidden layers. Eight neurons and a bias element were provided in the input layer, two neurons in the first hidden layer, one neuron in the second hidden layer and a neuron in the output layer. An epoch size of 10 was used with the dbd (delta-bar-delta) training strategy. Convergence was assumed when the RMS error value fell below 0.03%.

The neural network trained to predict large numbers of elements produced relatively larger errors in the number of predicted elements as compared to its predecessor (Ref. 1), which was trained for relatively smaller numbers of generated elements. These errors became very significant when relatively coarse meshes were partitioned. Hence, provision was made in the method to choose between the two neural networks when predicting the number of elements. The large-scale predictive network was initially used to estimate the number of elements that would be generated within an element and, if the number of elements was found to be less than the upper limit of the original neural network (Ref. 1), the original neural network module was invoked instead of the new one described above. This imparted considerable accuracy to the predictive module in handling coarse to very fine mesh densities. The lesson learned from this exercise was that it is better to train neural networks for specific ranges rather than to train a single neural network for a wide spectrum of outputs.

Table 2. Example 1. Comparison of interfaces between the subdomains and the run times on a SPARC 10 workstation for the partitioning of the coarse and the refined meshes

Method             Interfaces C_cf    Time (s)
SGM                     48              19.3
Farhat's method         82               0.2
Simon's method          56              58.3

Table 3. Example 1. The number of elements generated within each subdomain and the time for re-meshing each subdomain. The re-meshing of the complete domain took 2.9 s

Subdomain no.   Elements generated   Time (s)
1                    106               0.7
2                    106               0.7
3                    104               0.6
4                     96               0.7

4 EXAMPLES

The examples presented in Ref. 1 have been re-worked for the modified SGM. These examples were worked on a single T800 transputer in Ref. 1. A SUN SPARC 10 M20 workstation has been used for the CPU timings of the modified SGM presented here. In Ref. 1, Farhat's and Simon's methods were applied to the final meshes produced after the adaptive finite element analysis. In the examples presented in this paper the meshes used for Farhat's and Simon's methods were

Fig. 9. Example 2. An L-shaped domain with in-plane load.
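The two-network selection logic of Section 3.5 can be sketched as follows; the threshold value and all names here are our own assumptions, since the paper gives only the upper training limit of the original network qualitatively.

```python
# Sketch of the two-network predictive module of Section 3.5: try the
# wide-range network first, and fall back to the original network when
# the estimate lies within its training range. The threshold and all
# names are our own assumptions.

ORIGINAL_NET_LIMIT = 200  # assumed upper training limit of the original network

def predict_generated(features, original_net, large_net,
                      limit=ORIGINAL_NET_LIMIT):
    """Estimate the number of elements generated inside one coarse element."""
    estimate = large_net(features)
    if estimate < limit:
        # Within the original network's range, where it is the more accurate.
        estimate = original_net(features)
    return estimate
```

This mirrors the lesson drawn in the text: each network is trusted only within the output range it was trained for.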
Fig. 10. Example 2. The initial mesh (126 elements) divided into four subdomains using the SGM.
Fig. 12. Example 2. The mesh partitions delivered by Farhat’s method for the final mesh (681 elements).
the refined meshes that had been assembled at the end of the SGM, whereas in the previous paper (Ref. 1) Farhat's and Simon's meshes were generated on the basis of a single domain. In theory these meshes should be identical when generated on the same hardware, but very slight differences were noted, and it was eventually concluded that these were due to floating-point inaccuracy in the hardware. All the times quoted in the following examples have been measured on a SUN SPARC 10 M20 workstation with a single user logged on and the system not actively performing housekeeping tasks. The system time and the user time have been added together to work out the aggregate times. Owing to the considerably higher performance of the SPARC 10 processor as compared to the 20 MHz T800 transputer, the total execution times for the re-worked examples using all three mesh-partitioning methods were reduced significantly. In addition to the previously tested meshes, two new meshes were created with relatively higher mesh densities and non-convex domains to test the viability of these methods for large-scale finite element discretizations with arbitrarily shaped domains.
Example 1
A square-shaped domain, shown in Fig. 4, with an in-plane horizontal concentrated load at the top left corner node and the bottom corner nodes restrained, was uniformly meshed; the initial mesh comprised 46 elements. Adaptive finite element analysis on this mesh was carried out, which required only a fraction of a second on a SPARC 10 workstation. The results from the

Table 4. Example 2. Comparison of the actual number of generated elements per subdomain vs the ideal number of elements that should have been generated using the SGM

Subdomain no.   Generated elements (actual)   Generated elements (required)   Diff.    Percentage diff.
1                    165                          170.25                      -5.25      -3.08
2                    185                          170.25                      14.75       8.66
3                    163                          170.25                      -7.25      -4.26
4                    168                          170.25                      -2.25      -1.32

Fig. 11. Example 2. The remeshed subdomains for the final mesh (681 elements) partitioned using the SGM.
Fig. 13. Example 2. The mesh partitions delivered by Simon's method for the final mesh (681 elements).

Fig. 14. Example 3.
adaptive analysis were used to generate the final mesh comprising 412 elements. The final mesh was required for Simon's and Farhat's methods. The proposed method was applied to the initial mesh, and the mesh was divided into four subdomains, as shown in Fig. 5, by performing two recursive bisections. The subdomains obtained were independently re-meshed (as they would have been processed in a multiple-processor environment), and the remeshed subdomains are shown in Fig. 6. The number of elements generated within each subdomain vs the ideal number of elements which should have been generated per subdomain is shown in Table 1. The mesh partitions delivered by Farhat's and Simon's methods for the final mesh comprising 412 elements are shown in Figs 7 and 8, respectively. The number of interfaces C_cf for the above partitioning of the mesh obtained using the SGM, Simon's method (RSB) (Ref. 2) and Farhat's method (Ref. 3), and the corresponding times taken for mesh partitioning, are shown in Table 2. It may be seen from Table 1 that the maximum positive load imbalance occurs in subdomains 1 and 2 and is equal to 2.91%. Both Simon's and Farhat's methods provide the exact number of

Table 5. Example 2. Comparison of interfaces between the subdomains and the run times on a SPARC 10 workstation for the partitioning of the mesh

Method             Interfaces C_cf    Time (s)
Proposed method         58             880
Farhat's method        108             0.7
Simon's method          80            92.1
elements per subdomain to ensure equal element distribution. It may be seen from Table 2 that the proposed SGM provides better results than both Simon's method and Farhat's method with respect to the number of interfaces. The subdomains generated using the SGM were re-meshed independently, and the time required to re-mesh these subdomains is given in Table 3. The maximum time for re-meshing was for subdomains 1, 2 and 4, i.e. 0.7 s, whereas re-meshing the complete domain for use with Simon's and Farhat's methods required 2.9 s. Hence, the actual process of mesh generation was speeded up by a factor of 4.143 by the use of the SGM.

Example 2
The adaptive finite element analysis was performed for the L-shaped domain shown in Fig. 9. An initial mesh comprising 126 elements was generated. The SGM was applied to the initial mesh, and the mesh was divided into four subdomains as shown in Fig. 10. The subdomains obtained were independently re-meshed and are shown in Fig. 11. The final mesh resulting from this re-meshing comprised 681 elements. The number of elements generated within each subdomain vs the ideal number of elements which should have been generated per subdomain is shown in Table 4. The mesh partitions delivered by Farhat's and Simon's methods are shown in Figs 12 and 13, respectively. The number of interfaces C_cf for the above partitioning of the mesh obtained using the SGM, Simon's method (RSB) (Ref. 2) and Farhat's method (Ref. 3), and the corresponding time taken for each method, is shown in Table 5. It may be seen from Table 4 that the maximum

Table 6. Example 2. The number of elements generated within each subdomain and the time for re-meshing each subdomain

Subdomain no.   Elements generated   Time (s)
1                    165               1.2
2                    185               0.9
3                    163               0.9
4                    168               1.0
positive load imbalance occurs in subdomain 2 and is equal to 8.66%. Both Simon's and Farhat's methods provide the exact number of elements per subdomain to ensure equal element distribution. Table 5 shows that the SGM provides the best results with regard to the number of interfaces. The subdomains generated with the SGM were re-meshed independently, and the time required to re-mesh these subdomains is given in Table 6. The maximum time for re-meshing was for subdomain 1 (i.e. 1.2 s), whereas re-meshing the complete domain for use with Simon's and Farhat's methods required 5.0 s. Hence, the actual process of mesh generation was speeded up by a factor of 4.16 in this case.

Table 7. Example 3. Comparison of the actual number of generated elements per subdomain vs the ideal number of elements that should have been generated

Subdomain no.   Generated elements (actual)   Generated elements (required)   Diff.    Percentage diff.
1                    137                          145.75                      -8.75      -6.0
2                    162                          145.75                      16.25      11.15
3                    147                          145.75                       1.25       0.86
4                    150                          145.75                       4.25       2.92
5                    140                          145.75                      -5.75      -3.95
6                    155                          145.75                       9.25       6.35
7                    116                          145.75                     -29.75     -20.41
8                    159                          145.75                      13.25       9.1

Fig. 15. Example 3. The initial mesh (153 elements) divided into eight subdomains using the SGM.

Example 3

For the finite element domain shown in Fig. 14, adaptive analysis was performed on the initial mesh comprising 153 elements. This mesh was divided into eight subdomains, as shown in Fig. 15, using the SGM. The subdomains were remeshed and the resultant subdomains are shown in Fig. 16. The maximum positive load imbalance of 11.15% occurred in subdomain 2, as shown in Table 7. The mesh partitions delivered by Farhat's method and Simon's method are shown in Figs 17 and 18, respectively. The minimum number of interfaces is provided by the
SGM. The sequential mesh generation for the overall domain shown in Fig. 14 took 11.7s on a SPARClO workstation. The maximum time to re-mesh a partitioned subdomain with the SGM was 1.3s for subdomain 1. Thus the speed-up factor for mesh generation using the SGM is 9 for eight subdomains. Example 4
A highly non-convex domain, shown in Fig. 19, was partitioned using the SGM. The subdomains generated for this domain were derived from a coarse uniform mesh comprising 118 elements shown in Fig. 20. These subdomains were re-meshed and the resultant mesh is shown in Fig. 21. The maximum positive load imbalance using the SGM was 2.24%, for subdomain 4, as shown in Table 9. The SGM provided the best results with respect to the number of interfaces and the mesh partitioning time, as shown in Table 10. The coarse initial mesh shown in Fig. 20 was partitioned using the SGM into 8 and then 16 subdomains. The percentage maximum positive load imbalance, the number of interfaces, the maximum time to re-mesh a subdomain and the speed-up factor for re-meshing under the SGM are shown in Table 11 for 4, 8 and 16 subdomains. The re-meshing of the complete domain took 1024 s. It appears from Table 11 that the percentage imbalance and the number of interfaces tend to increase with the increase in the number of subdomains. The time to re-mesh each subdomain decreases, leading to an increase in the speed-up factor for the re-meshing of the whole domain.

Table 8. Example 3. Comparison of interfaces between the subdomains and the run times on a SPARC 10 workstation for the partitioning of the mesh
Fig. 16. Example 3. The re-meshed subdomains for the final mesh (1166 elements) partitioned using the SGM.
Method              Interfaces C_T    Time (s)
Proposed method     144               115.14
Farhat's method     304               1.7
Simon's method      170               275.64
Fig. 17. Example 3. The mesh partitions delivered by Farhat's method for the mesh (1166 elements).
Example 5

A non-convex domain, shown in Fig. 22, was partitioned using the SGM. The subdomains generated for this domain, based upon a coarse uniform mesh comprising 75 elements, are shown in Fig. 23. These subdomains were re-meshed and the resultant mesh, of 12 584 elements, is shown in Fig. 24. The maximum positive load imbalance with the SGM occurred in subdomain 8, as shown in Table 12, and was 11.06%. The SGM provided the best results for interfaces and mesh partitioning time, as shown in Table 13. The maximum time for re-meshing was required by subdomain 1, i.e. 55.5 s. The complete re-meshing of the domain required 818.1 s. The speed-up factor for re-meshing using the SGM for this example is therefore 14.74.
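The speed-up figures quoted in Examples 2-5 all follow from dividing the sequential re-meshing time for the whole domain by the longest per-subdomain re-meshing time, since the slowest subdomain determines the parallel completion time. A minimal sketch of this bookkeeping, using the Example 5 timings (the function name is ours, not the authors'):

```python
def remesh_speedup(sequential_time, subdomain_times):
    """Speed-up factor for parallel re-meshing: the whole-domain
    re-meshing time divided by the slowest subdomain's time."""
    return sequential_time / max(subdomain_times)

# Example 5: complete re-meshing took 818.1 s; per-subdomain
# times from Table 12 (subdomain 1 is the slowest at 55.5 s).
times = [55.5, 41.6, 32.6, 43.4, 39.9, 42.2, 41.0, 39.2]
print(round(remesh_speedup(818.1, times), 2))  # 14.74
```

The same calculation reproduces the factor of 4.16 in Example 2 (5.0 s / 1.2 s) and the factor of 9 in Example 3 (11.7 s / 1.3 s).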
Fig. 19. Example 4.
5 CONCLUSIONS AND REMARKS

The original Subdomain Generation Method presented in Ref. 1 has been modified to account for non-convexity of the finite element domains. It has been shown with the aid of examples that this method may be applied to any arbitrarily shaped domain. Since this method relies only on the mesh connectivities for performing the mesh partitioning, it may be generalized to more than two dimensions without any major modifications. The Subdomain Generation Method was originally developed for the partitioning of finite element meshes for explicit time-stepping nonlinear dynamic analysis. The method was targeted at unstructured adaptive meshes. This method provides a way forward to economically partition unstructured multi-dimensional
Fig. 18. Example 3. The mesh partitions delivered by Simon's method for the mesh (1166 elements).
Fig. 20. Example 4. The non-convex domain mesh (118 elements) partitioned into four subdomains.
Fig. 21. Example 4. The re-meshed non-convex domain final mesh (14 531 elements) partitioned using the SGM.
graphs approximately into an equal number of elements with minimum interface boundaries. The re-meshing of the finite element domain, which is usually very time consuming, may be done concurrently on parallel processors. The speed-up increases with the increase in the number of partitions. This is due to the decrease in the problem size for re-meshing and the post-processing of the mesh. The re-meshing of each subdomain is done using the theory presented in Refs 5 and 6. Using the SGM, high-density meshes may be economically partitioned on the basis of their initial coarse meshes. Hence, the SGM becomes increasingly efficient as the final mesh density increases for any given initial coarse mesh. This is reflected in the examples above. In Examples 1-3 the numbers of elements generated from the initial meshes are relatively lower than the numbers of elements generated from the coarse meshes of Examples 4 and 5. The time taken by the SGM is much higher than the time taken by Farhat's method in Examples 1-3. However, the mesh partitioning time for Farhat's method becomes significantly greater than the time taken by the SGM in Examples 4 and 5.

In this study, the SGM has been tested for coarse-grained partitions. The method presents considerable potential for scalability. This method derives its computational advantage from the use of a coarse initial mesh for partitioning a highly refined mesh and from parallel re-meshing. The possible bottlenecks preventing a higher degree of scalability are the increase in the computational effort in the formation of the partitions and the possibility that only a few elements may generate a large proportion of the total number of elements. The first bottleneck may be removed using a parallel implementation of the genetic-algorithm-based optimization module, and the second bottleneck may be alleviated using a graded coarse initial mesh instead of a uniform initial coarse mesh. The adaptive gradation of the mesh element sizes within the initial coarse mesh would generally ensure that there is some gradual distribution of element generation within the elements of the initial coarse mesh.

Table 9. Example 4. The number of elements generated within each subdomain and time taken to re-mesh each subdomain

Subdomain no.    Generated elements    Time (s)
1                3520                  96.2
2                3614                  106.46
3                3623                  132.5
4                3714                  133.54

Fig. 22. Example 5.

Table 10. Example 4. Comparison of interfaces between the subdomains and the run times on a SPARC 10 workstation for the partitioning of the mesh
Method              Interfaces C_T    Time (s)
Proposed method     144               115.14
Farhat's method     178               235.3
Simon's method      216               6472.1
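The interface counts C_T reported in Tables 5, 8, 10 and 13 can be computed from the mesh connectivity alone: a node lies on an interface when the elements sharing it are not all assigned to the same subdomain. A minimal sketch on a toy mesh (the data structures and function name here are illustrative, not the authors' implementation):

```python
def count_interface_nodes(elements, partition):
    """elements: list of node-id tuples, one tuple per element.
    partition: list of subdomain ids, one per element.
    A node is an interface node if the elements sharing it
    belong to more than one subdomain."""
    subdomains_at_node = {}
    for nodes, sub in zip(elements, partition):
        for n in nodes:
            subdomains_at_node.setdefault(n, set()).add(sub)
    return sum(1 for subs in subdomains_at_node.values() if len(subs) > 1)

# Toy mesh: four triangles in a strip, split into two subdomains.
elements = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]
partition = [0, 0, 1, 1]
print(count_interface_nodes(elements, partition))  # nodes 2 and 3 -> 2
```

Minimizing this count is the objective that the SGM's optimization module pursues alongside the equal-element constraint.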
Table 11. Example 4. Effect of increasing the number of subdomains on the maximum positive load imbalance, the number of interfaces, the maximum re-meshing time for the subdomains and the speed-up factor for re-meshing using the SGM

Number of subdomains    Max positive imbalance %    Number of interfaces    Max re-meshing time (s)    Re-meshing speed-up
4                       2.24                        144                     133.54                     7.67
8                       3.83                        396                     55.2                       18.55
16                      8.57                        782                     28.0                       36.57
Fig. 23. Example 5. The non-convex domain mesh (75 elements) partitioned into eight subdomains.

Fig. 24. Example 5. The re-meshed non-convex domain final mesh (12 584 elements) partitioned using the SGM.

The load imbalance problem does not pose a serious threat; in fact it may be used advantageously on coarse-grained networked computers where the system architecture and the computational load among the machines may differ. Further research in the field of neural networks may be advantageous in order to improve the performance of the predictive module.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the useful discussions with other members of the Structural Engineering Computational Technology Group (SECT) at Heriot-Watt University. The contributions of Ardeshir Bahreininejad, Joao Leite, Janos Sziveri and Jorgen Stang are gratefully acknowledged. The authors particularly wish to thank Janos Sziveri for his support with the transputer-based graphics used to display the mesh partitions. The authors would like to thank David S. Scott, Computer Science Department, University of Texas at Austin, for placing the LASO2 source code in the public domain; it was translated into C and used for computing the Fiedler vector in Simon's method. The research described in this paper was supported by Marine Technology Directorate Limited (MTD) Research Grant no. SERC/GR/J22191. The authors wish to thank MTD for their support of this and other research projects.
Table 12. Example 5. The number of elements generated within each subdomain and time taken to re-mesh each subdomain. The re-meshing of the domain took 818.1 s

Subdomain no.    Generated elements    Time (s)
1                1514                  55.5
2                1599                  41.6
3                1576                  32.6
4                1580                  43.4
5                1430                  39.9
6                1412                  42.2
7                1726                  41.0
8                1747                  39.2
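The load-imbalance percentages quoted throughout (e.g. 11.06% for subdomain 8 in Example 5) follow from comparing each subdomain's element count with the ideal count, i.e. the total number of elements divided by the number of subdomains. A short check against the Table 12 data (the function name is ours):

```python
def max_positive_imbalance(element_counts):
    """Largest positive deviation from the ideal (mean) element
    count, expressed as a percentage of the ideal."""
    ideal = sum(element_counts) / len(element_counts)
    return max((n - ideal) / ideal * 100 for n in element_counts)

# Elements generated per subdomain, from Table 12 (Example 5).
# Total is 12 584, so the ideal count is 1573 per subdomain.
counts = [1514, 1599, 1576, 1580, 1430, 1412, 1726, 1747]
print(round(max_positive_imbalance(counts), 2))  # 11.06 (subdomain 8)
```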
Table 13. Example 5. Comparison of interfaces between the subdomains and the run times on a SPARC 10 workstation for the partitioning of the mesh

Method              Interfaces C_T    Time (s)
Proposed method     560               45.2
Farhat's method     934               172.7
Simon's method      546               5618.8
REFERENCES

1. Khan, A. I. & Topping, B. H. V., Subdomain generation for parallel finite element analysis. Comput. Syst. Engng, 1993, 4(4/6), 473-488.
2. Simon, H. D., Partitioning of unstructured problems for parallel processing. Comput. Syst. Engng, 1991, 2(2/3), 135-148.
3. Farhat, C., A simple and efficient automatic FEM domain decomposer. Comput. Struct., 1988, 28(5), 579-602.
4. Flower, J., Otto, S. & Salama, M., Optimal mapping of irregular finite element domains to parallel processors. In Parallel Computations and Their Impact on Mechanics, ASME, New York, 1986, AMD-Vol. 86, pp. 239-250.
5. Khan, A. I. & Topping, B. H. V., Parallel adaptive mesh generation. Comput. Syst. Engng, 1991, 2(1), 75-102.
6. Topping, B. H. V. & Khan, A. I., Parallel Finite Element Computations, Saxe-Coburg, Edinburgh, 1996.