
Parallel Computing 21 (1995) 509-524

Practical aspects and experiences

PARMESH - A parallel mesh generator

Gerhard Globisch *

Department of Mathematics, Technical University of Chemnitz, 09009 Chemnitz, PSF 964, Germany

Received 2 July 1993; revised 9 November 1993

Abstract

We developed a program for the automatic parallel generation of triangular meshes in arbitrary bounded plane domains that are a priori divided up into several simply connected subdomains. Its output data structure, which is briefly described too, is very suitable both for further performing parallel hierarchical mesh generation in each subdomain, starting from the triangulation obtained there, and for parallel processing itself. Two numerical examples are presented. Consequently this program package, called PARMESH, can be an efficient ingredient of parallel numerical solvers for discrete problems arising from mathematical physics.

Keywords: Mesh generation; Parallel preprocessing; Finite elements; Domain decomposition

1. Introduction

In recent times the parallel numerical solution of problems given by mathematical physics has become very topical, see e.g. [2,5-7] and the references therein. One of its most essential parts is the parallel pre-processing performed on parallel computers equipped with p processors, such as the new PARSYTEC GC-system or the nCube-system. Consequently our main task consists in discretizing the problem effectively, i.e. especially in performing the needed mesh generation in parallel. As far as we know there is only little literature, see e.g. [2,4,11] and several references therein, describing methodologies and, above all, few programs that are capable of generating meshes in parallel.

* Email: [email protected]


We consider the problem of the parallel generation of a coarse finite element triangulation for the parallel solution of partial differential equations, e.g. boundary value problems in plane bounded domains Ω consisting of several simply connected subdomains Ω_i, i = 1,…,q. This decomposition must be given a priori by the user. Either the domain Ω is naturally divided up into subdomains (e.g. by different material properties), see also [3,8-10], or we have to define the splitting artificially, e.g. by hand or by other preprocessor programs, see [12]. In the second section we describe the input data structure necessary for our program and briefly give the program documentation, cf. [4]. In Section 3 the output data structure of the triangular mesh produced by each processor is explained. The data structure contains all of the information needed for the automatic construction of finite element triangular meshes hierarchically up to a certain refinement level connected with some fine discretization parameter h, cf. [1,6,9,10,12]. In Section 4 we give two numerical examples in order to demonstrate the effectiveness of our program package.

2. Program's documentation

The program PARMESH generates the triangulation of an arbitrary bounded plane domain Ω = ∪Ω_i, i = 1,…,q, whose boundary as well as the interior boundaries (e.g. the interface lines) separating the simply connected subdomains Ω_i from each other are piecewise composed of straight lines, circles or parabolae. The description of the geometry of the domain Ω as well as other data belonging to the boundary conditions, the material properties, etc. of the partial differential equation given on Ω are included in the input data file. This file can be quickly created by the graphical editor GRAFED, see [12], or must be edited by hand via a text editor. The location of the domain Ω in the plane coordinate system is arbitrary. The output data of the program can be either the standard finite element mesh data file as described e.g. in [12] or the data structure given in Section 3. The first structure can be used for further sequential finite element computations as performed e.g. in [3,9,12], whereas the second data structure is specially adapted for the massive parallelization of the FEM and BEM, see [5-7,11] and the references therein. It depends on the user of the program package whether he wants to look at the whole mesh generated in parallel in Ω or only at a certain part of it defined by the corresponding subdomain mesh.

Each subdomain Ω_i is bounded by so-called basic lines, which are in turn bounded by basic nodal points (geometric vertices). Consequently for each basic line we have its starting and ending basic nodal point. Basic nodal points that have at least the valence 3 are additionally denoted as cross points. In the case of curvilinear basic lines (circles, parabolae) a third point in the middle of the basic line is required. If we have a circle, its geometric equation is obviously determined by this midpoint, and if a parabola occurs its unique definition needs the connecting line between the corresponding starting and ending point as the abscissa and its midpoint as the zero point of some local coordinate system.


The sets of the basic lines and points are essentially determined by the shape of the boundary of the domain Ω as well as by its decomposition into the subdomains Ω_i, i = 1,…,q. The corners of these domains always coincide with basic nodal points and separate two basic lines from each other. Note that the midpoint of some curvilinear basic line is a basic nodal point too, but it can never be a crosspoint.

2.1 The structure of the input data file

The input data file consists of five data blocks concerning the global names of the domain's geometric data. The first data block contains the number of basic nodes, the number of basic lines, the number of subdomains, the number of the (exterior) basic lines (that define the boundary of the domain Ω) and three further items specified later. The second data block consists of the Cartesian (x, y)-coordinate pairs of each basic nodal point. The third data block contains, for each basic line, its name (a name is a uniquely defined integer), its type, the names of its starting point, possibly its midpoint, and its ending point, as well as the information about how and how often this line must be divided up. The possible settings and their explanation are given in Fig. 1. This is a code determining the number of nodes that are to be newly positioned onto this basic line and their distribution (e.g. equidistant distribution, or compressed refinement in the direction of the starting or the ending point of the basic line, respectively). The fourth data block defines the actual boundary of the domain Ω by the corresponding description of its basic lines, additionally equipped with some boundary condition codes and some material code name. The last data block includes the subdomain descriptions given in the natural sequence of their names, i.e. for each subdomain these data are written in terms of two rows. The first row contains the name of the subdomain, the number of the basic lines defining its boundary, the assigned material code name and three further data items specifying the mesh generation in this subdomain in more detail, cf. [12]. In the second row all of the names of its basic line descriptions stand sequentially, but not necessarily in mathematically positive orientation as is still required in [12].

The mesh generator is sensitive with respect to its input data. If non-regular triangulations are built, the user should manipulate the input parameters IT, NR, A1 and A2 in data block 3 or the criterion angles PI2A, PI24A and PI6A defining the triangular meshing in each subdomain; see Table 1. In most cases the standard value triple (0., 0., 0.) will produce a regular mesh, where its density depends on the data triple, too. Otherwise we recommend changing the triple in the range from (30., 30., 30.) up to (120., 120., 120.). The shape of each simply connected subdomain is allowed to be arbitrary, but our experience indicates that in the case of an extremely concave subdomain the algorithm sometimes fails. Therefore the shape of the subdomains should be defined convexly.


[Figure: for each partitioning type IT = 0,…,13 the sketch shows the resulting nodal distribution on a basic line between its starting point a and its ending point b (equidistant for IT = 0, compressed towards a or b for the other types, with arbitrary, even or odd numbers NR of nodes), together with the admissible settings of NR, A1 (b/a, a or b) and A2 (0, b or a); IT = 12 or 13 yields a distribution adapted to point singularities (see e.g. Section 4.2).]

Fig. 1. The context of the parameters IT, NR, A1 and A2 for partitioning the basic lines.

Now the input data file structure can be given in general terms by the following Table 1. The example shown in Fig. 2 gives insight into the specific data file structure. Fig. 3 presents both the whole triangulation built up by the program package in the domain Ω, whose contour outline was given in Fig. 2, and the refined triangulation of the third subdomain Ω_3 after the hierarchical mesh refinement was performed once by dividing each of its coarser triangles up into four congruent smaller triangles.

2.2 On the parallelization of the program

The parallelization of the sequential mesh generator PREMESH consists in the following steps. Let the domain decomposition (DD) of the domain Ω into q subdomains be given. The root processor reads the input data file; there the entire basic line partitioning is performed and stored according to its specification given in data block 3, cf. Table 1.


Table 1
The general structure of the PARMESH input ASCII data file

Block 1: The first data block (scalar data); the first row of the PARMESH input file:
  NG      the number of basic nodal points
  NGL     the number of basic lines
  NO      the number of subdomains
  NRSA    the number of basic lines on the domain's boundary
  NPMAX   the maximum number of nodal points that can be generated
  NEMAX   the maximum number of triangles that can be generated
  NPG     the admissible maximum number of points on all of the boundaries

Block 2: The Cartesian coordinate block, X(I), Y(I), I = 1,…,NG:
  the x- and y-coordinates of the basic nodal points

Block 3: The block of the description of the basic lines; consisting of NGL rows, one row per basic line:
  NUM     the name of the basic line
  LT      the type of the basic line (1 - straight line; 2 - circle; 3 - parabola)
  NN      the name of its starting point
  NS      the name of its midpoint (0 if straight line)
  NK      the name of its ending point
  IT      the type of its partitioning, ∈ {0,…,13}
  NR      for IT 0..4: the number of points to be generated on it; for IT > 4: 0 stands in this case
  A1      for IT 1..4: the relation between the longest and the shortest edge to be generated on it; for IT > 4: the length of the longest (shortest) edge
  A2      for IT 1..5: 0 stands in this case; for IT > 5: the length of the shortest (longest) edge

Block 4: The block of the boundary description; consisting of NRSA rows; for each basic line of the boundary of Ω the row:
  NUM     the name of the basic line (cf. above in Block 3)
  LT      the type of the basic line (cf. above in Block 3)
  NN      the name of its starting point (cf. above in Block 3)
  NS      the name of its midpoint (according to the above)
  NK      the name of its ending point (cf. above in Block 3)
  IRC1    the code name for the boundary condition for the first degree of freedom,
          and, if required, IRC2,…,IRCNDF for the further degrees of freedom up to the last one
  MB      the material code name given on this basic line

Block 5: The block of the description of the subdomains; consisting of 2 x NO rows; for each subdomain:
  the first row contains:
  NUM     the name of the subdomain
  NPA     the number of basic lines that are bounding it
  NMB     the corresponding material code
  PI2A    the first criterion angle defining the triangles (default 0.)
  PI24A   the second criterion angle defining the triangles (default 0.)
  PI6A    the third criterion angle defining the triangles (default 0.)
  the second row contains:
  LB(I), I = 1,…,NPA   the names of the corresponding basic lines
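To make the block layout above concrete, the following sketch reads Blocks 1-3 of such a file. It is illustrative only: the package itself is written in FORTRAN, the comma-separated one-record-per-line layout is inferred from the example file in Fig. 2, and the function name is ours.

```python
# A minimal, assumed reader for Blocks 1-3 of a PARMESH input file
# (illustrative Python, not the package's FORTRAN I/O; records are
# assumed comma-separated, one per line, as in the example of Fig. 2).
def read_parmesh_header(lines):
    it = iter(lines)
    # Block 1: NG, NGL, NO, NRSA, NPMAX, NEMAX, NPG
    ng, ngl, no, nrsa, npmax, nemax, npg = map(int, next(it).split(","))
    # Block 2: the (x, y)-coordinates of the NG basic nodal points
    coords = [tuple(map(float, next(it).split(","))) for _ in range(ng)]
    # Block 3: NUM, LT, NN, NS, NK, IT, NR, A1, A2 per basic line
    basic_lines = []
    for _ in range(ngl):
        f = next(it).split(",")
        basic_lines.append({
            "NUM": int(f[0]),          # name of the basic line
            "LT": int(f[1]),           # 1 straight line, 2 circle, 3 parabola
            "NN": int(f[2]), "NS": int(f[3]), "NK": int(f[4]),
            "IT": int(f[5]), "NR": int(f[6]),
            "A1": float(f[7]), "A2": float(f[8]),
        })
    return (ng, ngl, no, nrsa, npmax, nemax, npg), coords, basic_lines
```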


[Figure: the example domain with its basic nodes (including one midpoint and the crosspoints), the numbered basic lines and the three subdomains, together with the complete input data file: Block 1 reads "10,8,3,6,0,0,0" (10 basic nodal points, 8 basic lines, 3 subdomains, 6 exterior basic lines); Blocks 2-5 list the nodal coordinates, the basic line descriptions, the boundary description and the two-row subdomain descriptions.]

Fig. 2. The outline of the domain decomposed into 3 subdomains and the input file it belongs to.

The corresponding number of necessary arithmetical operations is of O(h_0), where the parameter h_0 denotes the average size of the distances in the basic line partitioning. On the root processor the renumbering of the cross points given by the DD is performed in order to prepare the efficient solution of the system of equations defined later on these crosspoints, according to the preconditioning theory given e.g. in [5-7] for the parallel solution of the DD-discretized partial differential equation.

Fig. 3. The whole coarsest mesh generated in parallel and the refined triangulation in Ω_3.


When the basic line partitioning is finished, well specified data sets D_i, i = 1,…,q, each containing the data of the ith subdomain (the coordinates of all of the points on its boundary and the subdomain's description in data block 5 are included), are sent from the root processor to well determined other processors, where the implemented statically load-balanced data division is defined by the following mapping of the data sets D_i onto the kth processor, k = 0,…,p-1 (a code sketch of this mapping is given at the end of this subsection). Let q be the number of subdomains and p be the number of available processors.
• Case 1 (p ≥ q): We define the mapping D_{k+1} → k, i.e. one and only one data set per processor, k = 0, 1,…,q-1; and p-q processors will do nothing.
• Case 2 (q > p): We have n := q div p and r := mod(q, p), and the data set D = {D_1,…,D_q} is sequentially subdivided into p subsets as follows: S_0 = {D_1, D_2,…,D_n}, S_1 = {D_{n+1}, D_{n+2},…,D_{2n}},…, S_{p-1} = {D_{n(p-1)+1},…,D_{np}}, where card(S_i) = n for all i = 0, 1,…,p-1. If r ≠ 0 then, defining the remainder set R = {D_{np+1}, D_{np+2},…,D_q}, the sets S_0, S_1,…,S_{r-1} must be sequentially extended by one and only one data set from R. Then we define the mapping S_k → k, k = 0, 1,…,p-1.

In [4] we proposed two opportunities for performing load balancing such that the amounts of numerical operations for the mesh generation performed in each processor agree reasonably well, see also Subsection 4.2.2. After the above one-to-one mapping has been performed by means of only rare communication, the program PARMESH runs totally in parallel such that the triangulations in the corresponding subdomains are generated very efficiently. The internally performed frontier mesh generation algorithm is based on a levelling and removing process of triangles. Its idea was founded in [13]. When the mesh in each subdomain has been generated, the renumbering of the names of its interior nodal points follows immediately if this was globally demanded by the user. This process is performed totally in parallel, too. The corresponding algorithm is based on minimal nodal degree ordering, see [3]. Its amount of arithmetical operations can be estimated by O((NUMINP)^1.5), where the variable NUMINP represents the number of interior nodal points of the subdomain. The impressive results given in [3] prove that the renumbering of the nodal points in the coarsest mesh is very efficient in the case of the exact solution of the systems of equations discretized there and included in the multilevel method. This method can be used as an efficient subdomain solver in the parallel solution of the partial differential equation given on the domain Ω discretized by the additive Schwarz DD-method, cf. also [5-7]. To improve the shape of the generated triangles, the grid smoothing process (see e.g. [1,12]) completes the program; its parallelization is based on the above mapping, too.

Our program package PARMESH can be used as a subroutine in other parallel programs such as those described e.g. in [5]. In this case, according to the geometric description of the DD, the subroutine gets only the data structure belonging to one and only one subdomain. Therefore no communication amount occurs because PARMESH runs locally for meshing one subdomain.
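The static mapping of Cases 1 and 2 above can be written down compactly. The following Python fragment is only a sketch of that mapping (the package itself is FORTRAN); the function name and the use of plain index lists standing in for the data sets D_i are our assumptions.

```python
# Sketch of the static load-balanced mapping of subdomain data sets
# D_1..D_q onto processors 0..p-1, as described in the text.
def map_subdomains_to_processors(q: int, p: int) -> dict:
    if p >= q:
        # Case 1: data set D_{k+1} goes to processor k, one each;
        # the remaining p - q processors will do nothing.
        mapping = {k: [k + 1] for k in range(q)}
        mapping.update({k: [] for k in range(q, p)})
        return mapping
    # Case 2: n = q div p contiguous sets per processor ...
    n, r = divmod(q, p)
    mapping = {k: list(range(k * n + 1, (k + 1) * n + 1)) for k in range(p)}
    # ... and the remainder set R = {D_{np+1},...,D_q} extends
    # S_0,...,S_{r-1} by one and only one data set each.
    for k in range(r):
        mapping[k].append(n * p + 1 + k)
    return mapping

print(map_subdomains_to_processors(q=3, p=16))   # Case 1: 13 processors idle
print(map_subdomains_to_processors(q=16, p=5))   # Case 2: n = 3, r = 1
```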

3. The output mesh data structure for parallel processes

In each subdomain the corresponding triangulation generated in parallel is described by the program PARMESH in terms of an edge-related data structure. That is, for the subdomain Ω_i the corresponding processor makes available distinct data blocks whose structure is claimed to be very suitable for further parallel computations. The names of all of the data and the sizes of the corresponding sets they belong to are defined locally and uniquely there, too. For simplicity the data are summarized and described in the following survey.

(1) Scalar data:
NUMNP: the total number of nodal points generated in the subdomain (the corresponding number of points on its boundary is included)
NUMINP: the number of nodal points in the interior of the subdomain
NUMMP: the number of midpoints given on the curvilinear edges of the boundary of the corresponding subdomain
NUMCP: the number of nodal points on the boundary of the subdomain (so-called coupling points that are no cross points)
NUMCRP: the number of crosspoints (i.e. the number of basic nodal points that have at least degree three); its size corresponds to the number of coupling boundary pieces given between two cross points
NUMED: the number of all of the edges generated in the subdomain
NUMEL: the number of all of the triangles generated in the subdomain
NUMBED: the number of the edges on the boundary of the subdomain by which some corresponding part of the actual boundary of the domain Ω is described
NDF: the number of degrees of freedom by which the boundary conditions of the whole domain Ω are defined
NTR: the number of refinement steps that are to be carried out by the hierarchical mesh generator.

Remarks

• Each of the above first eight data items belonging to one and only one subdomain mesh is uniquely assigned to some ith element of the pointer-vector IZEIG(i, NTR, IPROC), where the variable IPROC (≥ 1) denotes the maximum number of subdomains one and only one processor can have for the corresponding mesh generation in it. Hence the assignment of the q subdomains to the p available processors as described in Section 2.2 is restricted by the size of IPROC. Because PARMESH generates the initial triangulation (NTR = 1), the entries are made in IZEIG(i, 1, l), where the index l, (1 ≤ l ≤ IPROC), is the local name of one of the subdomains mapped to the kth processor (k = 1,…,p). Remembering the adopted convention n = q div p, and provided that the case q > p holds, at most l = n + 1 can be fulfilled.
• By further entries in the pointer-vector IZEIG the starting pointers for each of the following array data are marked, because all of the vectorial data stand on some large array B(.) in our FORTRAN program package. If we put more than one subdomain onto the kth processor, we consequently get the corresponding entries IZEIG(·, 1, l), l > 1, by which the starting pointer of the lth vectorial data set of the same type (a)-(g) is given.

(2) Vectorial data:

(a) IED(5, j): the vector of all of the edges of the subdomain, where j = 1,…,NUMED, and the following holds:
IED(1, j): the name of the starting point of the jth edge
IED(2, j): the name of the ending point of the jth edge
IED(3, j): the name of the midpoint of the jth edge if the edge is of curvilinear type, otherwise zero stands here
IED(4, j): is set to zero if the jth edge is an interior one, otherwise it is the name of the corresponding coupling boundary piece the jth edge belongs to
IED(5, j): = 1 if the type of the edge is a straight line, = 2 if the edge is a piece of some circle, = 3 if the edge is a piece of some parabola

(b) IECE(4, j): the element connectivity vector of the subdomain, where j = 1,…,NUMEL, and the following holds:
IECE(1, j), IECE(2, j), IECE(3, j): the names of the 3 edges belonging to the jth triangle
IECE(4, j): the material code name of the jth triangle

Remarks
• This vector can be used in order to generate the finite element stiffness matrix quickly.
• The element connectivity vector that contains the names of the nodal points belonging to the jth element is typical for finite element computations. Using the following context, the edge-related data set IECE can be rewritten into the vector IX(4, j) consisting of the three names of the nodal points that are the vertices of the triangle: the nodal point denoted by IX(i, j) is the name of the point opposite the edge called IECE(i, j), i = 1, 2, 3; j = 1,…,NUMEL; see the sketch below.
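The rewriting of IECE into the nodal connectivity IX only needs the edge endpoints from IED. The following is a hedged Python illustration (the names and the tuple layout are ours, not the FORTRAN arrays): for each triangle, the vertex opposite an edge is the one of its three vertices not lying on that edge.

```python
# Illustrative sketch: rebuild the nodal connectivity IX from the
# edge-related structure IED/IECE described above (1-based names).
def edges_to_nodal_connectivity(ied, iece):
    """ied[j] = (start, end, mid, bpiece, type) for edge name j+1;
    iece[j] = (e1, e2, e3, material) for triangle j+1.
    Returns ix[j] = (n1, n2, n3, material), n_i opposite edge e_i."""
    ix = []
    for e1, e2, e3, mat in iece:
        # endpoint sets of the three edges of this triangle
        ends = [set(ied[e - 1][:2]) for e in (e1, e2, e3)]
        verts = ends[0] | ends[1] | ends[2]        # the 3 vertices
        # the vertex opposite edge e_i is the one not lying on e_i
        opp = [(verts - ends[i]).pop() for i in range(3)]
        ix.append((opp[0], opp[1], opp[2], mat))
    return ix

# one triangle with vertices 1, 2, 3 and straight interior edges
ied = [(1, 2, 0, 0, 1), (2, 3, 0, 0, 1), (3, 1, 0, 0, 1)]
iece = [(1, 2, 3, 7)]                              # material code 7
print(edges_to_nodal_connectivity(ied, iece))      # [(3, 1, 2, 7)]
```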


(c) IBC(2 + NDF, j): the vector for coding the boundary properties, where j = 1,…,NUMBED, and the following holds:
IBC(1, j): the name of the edge defining the jth boundary edge
IBC(2, j): the material code name of the jth boundary edge
IBC(3, j),…,IBC(2 + NDF, j): set to 1, 2 or 3 if Dirichlet, Neumann or boundary conditions of third type are given on the jth edge, where the variable NDF (≥ 1) is the number of the degrees of freedom

(d) ICBN(4, j): the vector by which the coupling boundary pieces of the subdomain are defined, where j = 1,…,NUMCRP, and the following holds:
ICBN(1, j): the name of the first coupling point given on the jth coupling boundary piece of the subdomain
ICBN(2, j): the number of coupling points on the jth coupling boundary piece
ICBN(3, j), ICBN(4, j): the names of the two cross points that are the starting and the ending point of the jth coupling boundary piece of the subdomain

Remarks
• The coupling points that lie on the jth coupling boundary piece are uniquely numbered in the natural sequence, starting with the first name ICBN(1, j), which immediately follows the cross point ICBN(3, j).
• The finite sequence of the cross points ICBN(3, 1), ICBN(4, 1), ICBN(3, 2),…,ICBN(3, NUMCRP), ICBN(4, NUMCRP) is closed and given in mathematically positive orientation, where the condition ICBN(3, 1) = ICBN(4, NUMCRP) is fulfilled (a small consistency check is sketched below).
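As a small illustration of this closedness condition, the following sketch (Python, with an assumed tuple layout) checks that consecutive coupling boundary pieces share their cross points and that the loop closes.

```python
# Hedged sketch: verify that the coupling boundary pieces form one
# closed, consistently oriented loop of cross points, i.e.
# ICBN(4, j) = ICBN(3, j+1) and ICBN(3, 1) = ICBN(4, NUMCRP).
def coupling_loop_is_closed(icbn):
    """icbn[j] = (first_cp, n_cp, start_crosspoint, end_crosspoint)."""
    numcrp = len(icbn)
    for j in range(numcrp):
        nxt = (j + 1) % numcrp  # the piece after the last is the first
        if icbn[j][3] != icbn[nxt][2]:
            return False
    return True

# three pieces around a triangular subdomain with cross points 1, 2, 3
print(coupling_loop_is_closed([(4, 2, 1, 2), (8, 1, 2, 3), (9, 3, 3, 1)]))  # True
```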

(e) X(2, j): the (x, y)-coordinate vector of all of the nodal points of the subdomain, where j = 1,…,NUMNP, and the following holds:
X(1, j): the x-coordinate of the jth nodal point
X(2, j): the y-coordinate of the jth nodal point

Remark
• Increasing the index j monotonically, the nodal point coordinates are given as follows: first the coordinates of the crosspoints stand according to their locally sequential natural numbering, secondly the coordinates of the coupling points stand according to the sequential natural numbering of the coupling boundary pieces they belong to, and then the coordinates of all of the interior nodal points complete the vector (see the sketch below).
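This ordering implies fixed index ranges inside X, which the following fragment spells out under the assumption NUMNP = NUMCRP + NUMCP + NUMINP (the function name is ours).

```python
# Sketch of the index ranges implied by the ordering of X(2, j):
# crosspoints first, then coupling points piece by piece, then the
# interior points (midpoints are stored separately in XM).
def nodal_index_ranges(numcrp, numcp, numnp):
    crosspoints = range(1, numcrp + 1)
    coupling    = range(numcrp + 1, numcrp + numcp + 1)
    interior    = range(numcrp + numcp + 1, numnp + 1)
    return crosspoints, coupling, interior

for r in nodal_index_ranges(numcrp=3, numcp=12, numnp=40):
    print(list(r)[:4], "...")
```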

(f) XM(2, j): the (x, y)-coordinate vector of the midpoints belonging to curvilinear edges possibly participating in defining the boundary of the subdomain, where j = 1,…,NUMMP, and the following holds:
XM(1, j): the x-coordinate of the jth midpoint
XM(2, j): the y-coordinate of the jth midpoint


Remark
• The numbering of these coordinate pairs is sequentially natural too, and in addition mathematically positively orientated.

(g) IEDM(j): the vector that includes specific coded refinement information for each edge of the subdomain, where j = 1,…,NUMED.

4. Numerical examples

The computations of the following two examples are performed on the Multicluster II equipped with 32 T805 processors (30 MHz, 8 Mbyte). The computational times given in the following Tables 2, 3 and 4 contain neither the time necessary for the rare communication and the cross point renumbering at the beginning of the program nor the time for the possible output of the mesh at its end. Hence the measured time really indicates the effect of the performed parallelization, where in each case both the time needed for parallel grid smoothing and for parallel interior nodal renumbering in all of the subdomains is involved.

4.1 Example 1 - An academic test problem

The first example is an academic one. The domain Ω is the (0, 4) x (0, 4) square divided up into 16 congruent subsquares, see Fig. 4. The corresponding basic lines are the edges of the smaller squares. For all basic lines the division into pieces is of equidistant type, specified by the corresponding number NR given in the first column of Table 2.

[Figure: the square decomposed into the 16 numbered subsquares (left) and one initial subdomain mesh (right).]

Fig. 4. The square decomposed into the given 16 subsquares. One initial subdomain mesh with 35 divisions per each basic line.


Table 2
Computational results for the mesh generation in the square
Columns: number of partitions of the basic lines (NR); number of refinement steps; total number of generated triangles; time for parallel mesh generation by 16 processors (in sec); time for sequential mesh generation by only 1 processor (in sec).

Total number of nodes in the initial triangulation of the whole domain: 16
  0    0    32        0.01     0.14
       1    128       0.01     0.23
       2    512       0.03     0.41
       3    2048      0.06     0.92
       4    8192      0.18     2.83
       5    32768     0.65     10.32
       6    131072    2.68     memory exceeded
       7    524288    10.59

Total number of nodes in the initial triangulation of the whole domain: 145
  1    0    256       0.07     0.89
       1    1024      0.09     1.16
       2    4096      0.14     2.14
       3    16384     0.39     5.91
       4    65536     1.42     memory exceeded
       5    262144    5.37

Total number of nodes in the initial triangulation of the whole domain: 674
  5    0    1250      0.32     5.08
       1    5000      0.40     6.29
       2    20000     0.69     10.92
       3    80000     1.94     memory exceeded
       4    320000    6.82

Total number of nodes in the initial triangulation of the whole domain: 4987
  15   0    9716      4.71     74.05
       1    38864     5.27     82.98
       2    155456    7.72     memory exceeded

Total number of nodes in the initial triangulation of the whole domain: 12983
  25   0    25548     30.23    473.20
       1    102192    31.83    memory exceeded
       2    408768    38.18

Total number of nodes in the initial triangulation of the whole domain: 23376
  35   0    46174     118.06   1860.62
       1    184696    121.02   memory exceeded

Table 2 presents the computational results belonging to the above square. As expected because of the totally uniform load balance between the 16 subdomains given here, the speedup is nearly 16, where only slight inaccuracies in the time measuring occurred.

[Figure: the quarter cross-section with the material data: absolute permeability μ_0 = 1.257 · 10^-6 Vs/Am; relative permeabilities of the given materials: (a), (b), (c) iron rotor μ_r = 1694; (d) permanent magnet μ_r = 1.15; (e) sheet-metal shell μ_r = 2408; air gap μ_r = 1; marked singular points with regularity values: P_1: λ_1 = 0.46, P_2: λ_2 = 0.48, P_3: λ_3 = 0.57, P_4: λ_4 = 0.59, P_5: λ_5 = 0.51, P_6: λ_6 = 0.79.]

Fig. 5. The fourth of the cross-section of the electronic motor containing 4 materials.

4.2 Example 2 - Permanently excited direct current motor

This example is of important practical interest, cf. [3,8-10,12]. The domain Ω is the fourth of the cross-section of an electronic motor in which the magnetic field computation must be carried out. We incorporated two subexamples, both having the same motor geometry but differing in the DD made. In the first one the DD was heuristically performed without regard to fine load balance. Hence the corresponding speedup in Table 3 is low. Even this demonstrates how DD should not be done. Knowing the reason for this, we performed a well balanced DD by redefining the DD (and hence the data sets D_i, i = 1,…,16) such that the computational efforts for the mesh generation agree reasonably well between the 16 subdomains. Now the speedup in the second subexample was as high as expected for parallel computations, see Table 4.

[Figure: the annotated numbers of elements per subdomain are 50, 67, 10, 10, 16, 53, 43, 17, 16, 12, 18, 12, 1, 15, 15 and 15.]

Fig. 6. The unbalanced DD consisting of 16 subdomains and the initial mesh belonging to it.


Table 3
Computational results for the unbalanced mesh generation in the motor
Columns: number of refinement steps; total number of triangles; time for parallel mesh generation by 16 processors (in sec); time for sequential mesh generation by only 1 processor (in sec); speedup.

  0    370       0.30     1.61              5.37
  1    1480      0.37     2.03              5.48
  2    5920      0.57     3.49              6.12
  3    23680     1.37     9.06              6.61
  4    94720     5.02     memory exceeded
  5    378880    18.62

Total number of nodes in the initial triangulation of the whole domain: 205

Fig. 5 presents the motor's geometry with its distinct material properties, additionally connected with geometric peculiarities which cause singularities of the solution in several indicated points P_i, i = 1,…,6, to which six values λ_i are assigned. These are estimates of the local regularity of the solution given by [9].

By means of Maxwell's laws the magnetic field problem defined on the motor's cross-section can be rewritten in the following variational formulation, cf. also [3,8]: Find the function u ∈ V_0 = {v ∈ H^1(Ω): v|_{∂Ω} = 0}, such that for all v ∈ V_0 there holds:

  ∫_Ω 1/(μ_0 μ_r(x, y)) ∇^T v ∇u dx dy = ∫_Ω 1/(μ_0 μ_r(x, y)) (B_{0x} ∂v/∂y - B_{0y} ∂v/∂x) dx dy,

where B_{0x} and B_{0y} denote the remanent inductions of the permanent magnet in x and in y direction, respectively. The solution u of the problem is at least in some Sobolev space W_2^{1+λ}(Ω) with some regularity λ > 0, see [3] and the references cited therein.

Table 4
Computational results for the well performed mesh generation in the motor
Columns: number of refinement steps; total number of triangles; time for parallel mesh generation by 16 processors (in sec); time for sequential mesh generation by only 1 processor (in sec); speedup.

  0    728       0.22     3.16              14.36
  1    2912      0.26     3.95              15.19
  2    11648     0.45     6.82              15.15
  3    46592     1.16     18.16             15.65
  4    186368    4.22     memory exceeded

Total number of nodes in the initial triangulation of the whole domain: 686


[Figure: the annotated numbers of elements per subdomain are 1: 46, 2: 46, 3: 47, 4: 42, 5: 40, 6: 47, 7: 48, 8: 44, 9: 49, 10: 43, 11: 48, 12: 48, 13: 47, 14: 45, 15: 48, 16: 40.]

Fig. 7. The well balanced DD and the whole initial triangulation belonging to it.

4.2.1 Non-loadbalanced domain decomposition

The whole initial mesh for the roughly balanced DD is shown in Fig. 6. Table 3 contains the CPU times for the numbers of specified refinement steps, using the same partitioning code of the basic lines fixed in the input data file.

4.2.2 Loadbalanced domain decomposition

Fig. 7 gives the newly defined domain decomposition of the electronic motor and the initial triangulation belonging to it, where our knowledge about the location and the type of the filtered six singularities was used in order to perform a well balanced, adaptive initial mesh generation. As we explained in Subsection 2.1, PARMESH gives us the opportunity to specify the basic line division adaptively dense at the beginning and at the end of the corresponding basic lines, respectively (cf. the data items IT = 12 and IT = 13, respectively, in Fig. 1). As the levels of connected triangles in the immediate vicinity of the centre P_i of the singularity go away from this point, the sizes of the corresponding edges of these triangles must be defined monotonically increasing as follows, see [9]:

  h_j = l_{j+1} - l_j, where l_j := (j · H_i)^{1/λ_i}, j = 0, 1,…,NR, i = 1(1)6.

Here the parameter H_i, i = 1(1)6, is some suitable scale unit for the basic lines in the immediate neighbourhood of the centre P_i of the point singularity, defined such that (NR · H_i)^{1/λ_i} equals the length of the corresponding basic line.
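Under these definitions the graded partition is straightforward to compute; the following Python fragment is a small illustration, where the names are ours and the scaling of H_i so that l_NR equals the basic line length is our reading of the condition above.

```python
# Sketch of the graded edge lengths near a point singularity, following
# the formula above: l_j = (j * H_i)**(1/lambda_i), h_j = l_{j+1} - l_j.
def graded_partition(length, nr, lam):
    """Return the NR+1 node distances l_j from the singular point P_i,
    scaled so that l_NR equals the basic line length."""
    h = length**lam / nr   # scale unit H_i with (NR * H_i)**(1/lam) = length
    return [(j * h)**(1.0 / lam) for j in range(nr + 1)]

l = graded_partition(length=1.0, nr=5, lam=0.46)
edges = [b - a for a, b in zip(l, l[1:])]
print(edges)  # edge sizes increase monotonically away from the singularity
```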

It is shown therein that the discrete convergence shortcomings, arising e.g. from the application of the traditional finite element approach in the case of singularly perturbed solutions, can be compensated very well. We refrain from commenting on the numerical examples in detail because we claim that the comparison between the sequential and the parallel version of the program package PARMESH is impressive, where the superiority of the parallel computation depends essentially on the fine load balancing. The effectiveness of the mesh refinement by the hierarchical mesh generator is based on the edge-related data structure of the element connectivity such that the fourthing can be performed easily and very fast, cf. [10,12]; a sketch of this step follows below. Further numerical results, especially concerning solution-controlled domain decomposition, are given in [4].
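For illustration, one "fourthing" refinement step can be sketched on a plain vertex/triangle representation (straight edges assumed; the package itself works on the edge-related FORTRAN arrays described in Section 3):

```python
# Minimal sketch of one hierarchical refinement step ("fourthing"):
# each triangle is divided into four congruent triangles by connecting
# the midpoints of its edges; plain (x, y)/index lists are assumed.
def refine_fourthing(points, triangles):
    points = list(points)
    midpoint_of = {}            # edge (a, b), a < b -> midpoint index
    def mid(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_of:
            (xa, ya), (xb, yb) = points[a], points[b]
            points.append(((xa + xb) / 2.0, (ya + yb) / 2.0))
            midpoint_of[key] = len(points) - 1
        return midpoint_of[key]
    fine = []
    for a, b, c in triangles:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        fine += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return points, fine

# one coarse triangle -> four congruent fine triangles
pts, tris = refine_fourthing([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [(0, 1, 2)])
print(len(tris))  # 4
```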

References

[1] R.E. Bank, PLTMG User's Guide (June 1981 version), Tech. Report, Dep. of Mathematics, Univ. of California at San Diego, La Jolla, CA, 1982.
[2] G.F. Carey (ed.), Parallel Supercomputing: Methods, Algorithms and Applications (John Wiley, Chichester, New York, Brisbane, 1989).
[3] G. Globisch, Robuste Mehrgitterverfahren für einige elliptische Randwertaufgaben in zweidimensionalen Gebieten (Dissertation, Technische Universität Chemnitz, 1992).
[4] G. Globisch, PARMESH - a parallel mesh generator, Preprint-Reihe der Chemnitzer DFG-Forschergruppe 'Scientific Parallel Computing' (SPC 93.3, June 1993).
[5] G. Haase, U. Langer and A. Meyer, Parallelisierung und Vorkonditionierung des CG-Verfahrens durch Gebietszerlegung, in: Parallele Algorithmen auf Transputersystemen, Teubner-Skripten zur Numerik III (Teubner, Stuttgart, 1992).
[6] W. Hackbusch, Parallel Algorithms for Partial Differential Equations (Vieweg, Braunschweig, 1991); or in: Proc. Sixth GAMM-Seminar (Kiel, 1990).
[7] B. Heinrich, U. Langer, A. Meyer and M. Pester, Algorithmische Grundlagen der Simulation von angewandten Problemen der Kontinuumsmechanik auf massiv parallelen Rechnern (Gründungsantrag der DFG-Forschergruppe SPC, TU Chemnitz, 1992).
[8] B. Heise, Multigrid-Newton methods for the calculation of electromagnetic fields, in: Proc. Third Multigrid Seminar, R-Math-03/89 (AdW-Mathematik, Berlin, 1989) 11-52.
[9] M. Jung, Multilevel methods for problems with a non-smooth solution, Dep. of Mathematics, Technical University of Chemnitz, in preparation.
[10] M. Jung and R. Wohlgemuth, Generation of hierarchical finite element meshes for interface problems, Preprint Nr. 202, Fachbereich Mathematik, Technische Universität Chemnitz, 1991.
[11] R. Quatember, Ein paralleler Algorithmus zur Dreieckszerlegung beliebiger geschlossener Oberflächen im R³, in: U. Langer, ed., Proc. zum DFG-Workshop Implementierung paralleler Algorithmen auf Transputersystemen (DFG-Forschungsschwerpunkt Randelementmethoden) (Fachbereich Mathematik, Technische Universität Chemnitz, 1992).
[12] W. Queck, The finite element multigrid package FEMGP - a software tool for solving boundary value problems on personal computers, in: S. Hengst, ed., Proc. GAMM-Seminar on Multigrid Methods, Gosen, Germany, Sep. 21-25, 1992 (IAAS Berlin, Report No. 5, ISSN 0942-9077, Berlin, 1993).
[13] A.G. Tsybenko, N.G. Vashchenko, N.G. Krishchuk and Yu.O. Lavendel, Avtomatizirovannaya Sistema Obsluzhivaniya Konechno-elementnykh Raschyotov (Golovnoe izdatelstvo izdatelskogo obedineniya Vishcha shkola, Kiev, 1988).