Computer Physics Communications 53 (1989) 349—355 North-Holland, Amsterdam
MULTILEVEL SOLUTION METHOD FOR THE p-VERSION OF FINITE ELEMENTS

S. FORESTI, G. BRUSSINO 1), S. HASSANZADEH and V. SONNAD

IBM Corporation, Data Systems Division, Department 48B/MS 428, Neighborhood Road, Kingston, NY 12401, USA

Received 22 August 1988

1) Present affiliation: Alliant Computer Systems Corporation, Littleton, MA 01460, USA.
A multilevel iteration scheme is utilized in the solution of equations resulting from the p-version of the finite element method. The advantage of this approach is that it is possible to attain the speed of multigrid techniques in the solution of systems of equations without generating a sequence of nested grids on complex geometries. We compare the performance of the multilevel method with other iterative methods and a direct method for the solution of the Poisson equation on a square with Dirichlet boundary conditions, and demonstrate the effectiveness of this approach for both sequential and parallel computation.
1. Introduction

The multigrid method is increasingly being accepted as one of the most effective computational techniques for the solution of problems arising from the discretization of partial differential equations. It exploits the ability of Gauss-Seidel and Jacobi iterative methods to reduce high frequency errors very rapidly: a sequence of nested grids is constructed and the errors are reduced at each grid; the net result is a rapid decrease in all frequencies of the error. However, for complex 3-dimensional geometries, defining this sequence of nested grids is a difficult geometric and programming problem. An approach that has been proposed to handle this difficulty is to use the p-version of the finite element method. While this approach was originally proposed for adaptive computations [2], the structure of the method makes it unusually well suited to multilevel computations [3]. The basic idea of the p-version is to use higher order polynomials in regions where higher accuracy is desired, instead of reducing the mesh size (h-version). This approach has been very successfully used for adaptive computations in
structural applications and a large body of results is available establishing the effectiveness of this approach in discretizing partial differential operators [9].
2. Multilevel iteration scheme

The essential feature that makes the finite element method with hierarchic basis functions highly suitable for a multilevel iterative scheme is that the matrix formed in the p-version is hierarchic in nature; i.e., the matrix formed by a lower order polynomial is fully contained in the matrix formed by a higher order polynomial: the matrix is itself nested. When performing multilevel iterations, the nested matrices resulting from the p-version play a role similar to the nested grids employed in multigrid methods in the h-version. Multigrid techniques use nested geometric grids to control errors at different frequencies, while the multilevel iteration scheme with hierarchic basis functions works in a more direct fashion with the frequencies. The multilevel iteration scheme analogous to the multigrid V-cycle can be easily described [3]. We illustrate it below for a system of equations AU = F involving three levels, and a basis of hierarchic polynomials up to the second order: we
assume that the lowest order polynomial corresponds to level 0, and the highest to level 2.
A = [ A_00  A_01  A_02 ]        [ U_0 ]        [ F_0 ]
    [ A_10  A_11  A_12 ],   U = [ U_1 ],   F = [ F_1 ],
    [ A_20  A_21  A_22 ]        [ U_2 ]        [ F_2 ]

where A_ij, U_i and F_i are blocks of coefficients related to the degrees of polynomials i and j. Starting with an initial guess [U_0^(0), U_1^(0), U_2^(0)], a chosen number of relaxation sweeps with a selected smoother are performed to get [U_0^(1), U_1^(1), U_2^(1)]; the residual is calculated as

[ R_0^(1) ]   [ F_0 - A_00 U_0^(1) - A_01 U_1^(1) - A_02 U_2^(1) ]
[ R_1^(1) ] = [ F_1 - A_10 U_0^(1) - A_11 U_1^(1) - A_12 U_2^(1) ].
[ R_2^(1) ]   [ F_2 - A_20 U_0^(1) - A_21 U_1^(1) - A_22 U_2^(1) ]

The restriction to the coarser level consists simply of truncating the residual vector to the size of the matrix at this level; it becomes the new right-hand side for iteration at the coarser level:

[ A_00  A_01 ] [ C_0 ]   [ R_0^(1) ]
[ A_10  A_11 ] [ C_1 ] = [ R_1^(1) ].

Starting with an initial guess [C_0^(0), C_1^(0)], the smoothing steps lead to [C_0^(1), C_1^(1)]. If the initial guess is [C_0^(0), C_1^(0)] = 0, the problem is equivalent, by a simple change of variables, to

[ A_00  A_01 ] [ V_0 ]   [ F_0 - A_02 U_2^(1) ]
[ A_10  A_11 ] [ V_1 ] = [ F_1 - A_12 U_2^(1) ],

as long as the initial guess [V_0^(0), V_1^(0)] = [U_0^(1), U_1^(1)]: in this case there is no need to define any intermediate vectors. After smoothing, the updated vector is [V_0^(1), V_1^(1)] = [U_0^(2), U_1^(2)] and the residual at this level is

[ R_0^(2) ]   [ F_0 - A_00 U_0^(2) - A_01 U_1^(2) - A_02 U_2^(1) ]
[ R_1^(2) ] = [ F_1 - A_10 U_0^(2) - A_11 U_1^(2) - A_12 U_2^(1) ].

The process of smoothing and restriction is carried down to the lowest level (equivalent to the coarsest grid in a multigrid algorithm). Again we use a simple change of variables to get

A_00 [ W_0 ] = [ F_0 - A_01 U_1^(2) - A_02 U_2^(1) ],

where one can either solve exactly or perform relaxation steps with an initial guess [W_0^(0)] = [U_0^(2)]; the updated vector is then [W_0^(1)] = [U_0^(3)].

The process of prolongation is the inverse of restriction, where the residual on a coarse mesh is used to obtain a residual at a finer mesh. Multigrid methods use various forms of weighted interpolation to prolong the residual from a coarse mesh to a fine mesh. In the multilevel method, prolongation from the lowest level to the upper levels is implicitly achieved by performing smoothing steps at each level using the available vectors as starting vectors, and this is carried out until the highest level is reached, at which point the downward step is started. The prolongation from level 0 to level 1 consists of including the term U_1^(2) (which was simply dropped at the restriction to level 0) in the new guess [U_0^(3), U_1^(2)] at level 1; the linear system for iterations at level 1 is defined as before, so the new updated vector is [U_0^(4), U_1^(3)]. In the same way we can prolong to the highest level: [U_0^(4), U_1^(3), U_2^(1)] is the prolonged vector; after smoothing we get [U_0^(5), U_1^(4), U_2^(2)] at level 2.
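To make the structure of these operations concrete, the following is a minimal sketch in Python/NumPy of the scheme just described; it is our illustration rather than code from the paper, the function and variable names are our own, and a few point Gauss-Seidel sweeps stand in for the smoother. The blocks of A, U and F are stored contiguously, restriction is plain truncation of the leading block, and prolongation is implicit.

import numpy as np

def gauss_seidel(A, b, x, sweeps=1):
    # A few point Gauss-Seidel sweeps on A x = b, starting from x (updated in place).
    for _ in range(sweeps):
        for i in range(len(b)):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def multilevel_cycle(A, F, U, level_ends, sweeps=1):
    # One V-cycle over the nested leading blocks of the hierarchic matrix.
    # level_ends[k] is the size of the level-k system, e.g. [n0, n0+n1, n0+n1+n2],
    # so A[:level_ends[k], :level_ends[k]] is the matrix "up to level k".
    nlev = len(level_ends)
    # Downward pass: smooth on progressively smaller leading blocks.
    # Restriction is plain truncation; the change of variables folds the frozen
    # higher-level terms into the right-hand side, as in the text above.
    for k in range(nlev - 1, -1, -1):
        n = level_ends[k]
        rhs = F[:n] - A[:n, n:] @ U[n:]
        U[:n] = gauss_seidel(A[:n, :n], rhs, U[:n], sweeps)
    # Upward pass: prolongation is implicit, since the current values are simply
    # reused as starting guesses while smoothing back up to the finest level.
    for k in range(1, nlev):
        n = level_ends[k]
        rhs = F[:n] - A[:n, n:] @ U[n:]
        U[:n] = gauss_seidel(A[:n, :n], rhs, U[:n], sweeps)
    return U

At the coarsest level one may of course solve exactly instead of relaxing, as noted above.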
3. Smoothing procedures

Since the Gauss-Seidel method has been widely used in traditional multigrid methods, we first used it as a smoother in the multilevel iteration scheme described above. The Gauss-Seidel smoothing technique is inherently sequential in nature and is extremely difficult to implement in parallel, especially on a large grain machine. One approach to implementing the Gauss-Seidel scheme in parallel is to use red-black ordering with domain decomposition. This has been shown to be an effective scheme for parallel implementation of a traditional multigrid method [7]. A number of other approaches for parallel multigrid methods have been implemented on various architectures [7,8].
However, it is difficult to utilize these techniques on complex geometries, and it was decided to try an approach that would be consistent with the general applicability of the present method. We have adopted an approach that resembles the "generalized multigrid method" described by Prof. Zienkiewicz in a private communication. The basic idea is to use a block Gauss-Seidel iteration scheme with the block sizes chosen so that, at every hierarchical level, the diagonal block of the corresponding matrix represents the coupling among the terms at that level only. This requires that a system of equations be solved at every Gauss-Seidel smoothing step. An exact solution is not necessary: a few conjugate gradient iterations are enough to obtain an approximate solution, even at the coarsest level.
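A minimal sketch of such a smoothing step follows; this is again our illustration, and the function names and the fixed number of inner iterations are assumptions, not taken from the paper. Each diagonal block is treated with a few conjugate gradient iterations rather than an exact solve.

import numpy as np

def cg(A, b, x, iters=5):
    # A small, fixed number of conjugate gradient iterations on A x = b.
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(iters):
        Ap = A @ p
        pAp = p @ Ap
        if pAp == 0.0:           # already converged
            break
        alpha = rr / pAp
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def block_gauss_seidel_sweep(A, F, U, level_slices, inner_iters=5):
    # One block Gauss-Seidel sweep over the hierarchical levels.
    # level_slices holds the index range (lo, hi) of each level's unknowns.
    # The block residual uses the most recent values of all other blocks, and
    # the diagonal block system is solved only approximately by cg().
    for lo, hi in level_slices:
        rhs = F[lo:hi] - A[lo:hi, :lo] @ U[:lo] - A[lo:hi, hi:] @ U[hi:]
        U[lo:hi] = cg(A[lo:hi, lo:hi], rhs, U[lo:hi], inner_iters)
    return U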
4. Multilevel schedule

The efficiency of a multilevel algorithm for a chosen smoothing procedure is strongly dependent on the scheme used to determine the change of levels. Several multilevel schedules were tried out, and the remarks below are based on computational experiments rather than on theoretical considerations. In multilevel schemes, the work required to solve a problem to a given accuracy is kept to a minimum if the error at a certain level is kept in balance with the overall error; for instance, it is useless to smooth at a certain level if the components of the residual there are much smaller than those at other levels. For this reason it is efficient to perform only a few smoothing iterations at any level. The majority of operations in the multilevel iteration scheme are taken by the smoothing operation; no calculations are required for restriction and prolongation. This is in contrast to the multigrid method with the h-version, where several operations are required for restriction and prolongation. If the level is changed frequently, these operations can increase the execution time, in spite of a better rate of convergence. It is therefore possible to choose a level by level
schedule with much greater freedom in the above method than with multigrid methods. The schedule that we finally chose is the sawtooth V-cycle, where smoothing is performed on the way up from the coarsest grid to the finest. Once the finest level is reached, smoothing is resumed at the coarsest level. Only one smoothing iteration is performed at any level; this iteration is skipped if the residual at that level is lower than a certain preset fraction of the overall residual. Since the smoothing procedure is a block Gauss-Seidel procedure, we need to solve a system of equations at each iteration. This system of equations is solved approximately by conjugate gradients, using as many iterations as are required to drop the residual below the same fraction of the last overall multilevel residual. The overall residual is computed at each point in the schedule where we reach the finest level.
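The sawtooth schedule can be sketched as follows. This is our reading of the rules above, with an assumed skip fraction `frac` and an assumed helper `smooth` that performs one smoothing step on the leading block up to a given level; none of these names come from the paper.

import numpy as np

def sawtooth_cycle(A, F, U, level_ends, smooth, frac=0.1):
    # One sawtooth pass: smoothing runs upward from the coarsest level to the
    # finest, one smoothing iteration per level; a level is skipped when its
    # residual is already below a preset fraction of the last overall residual.
    overall = np.linalg.norm(F - A @ U)      # overall residual from the last visit to the finest level
    for n in level_ends:                     # coarsest leading block first
        rhs = F[:n] - A[:n, n:] @ U[n:]      # freeze the higher-level terms
        if np.linalg.norm(rhs - A[:n, :n] @ U[:n]) < frac * overall:
            continue                         # this level does not need smoothing
        U[:n] = smooth(A[:n, :n], rhs, U[:n])
    return U, np.linalg.norm(F - A @ U)      # recomputed at the finest level

A driver would repeat such passes until the overall residual has dropped by the prescribed factor (10^6 in section 5), and would pass the same fraction on to the inner conjugate gradient solves.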
5. Computational results of serial execution

The problem chosen to test the multilevel algorithm was the Poisson equation with homogeneous Dirichlet boundary conditions on the perimeter of a unit square:

∂²u/∂x² + ∂²u/∂y² = f(x, y)   on (0, 1) × (0, 1),
u(x, y) = 0                   on the boundary,

where

f(x, y) = 2N(2N + 1) [ x^(2N-1) (y^(2N+1) - y) + y^(2N-1) (x^(2N+1) - x) ]

with N = 10. The solution to the problem, with the chosen boundary conditions and forcing function, has a large gradient near one of the edges, making it a suitable test problem. The problem was solved using an increasing number of elements; in all cases each element used polynomials of level 8 (the upper 7 levels are hierarchic). The criterion used to determine convergence in the iterative procedure was that the norm of the residual was reduced by a factor of 10^6.
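As a cross-check of the forcing function above (our own verification, not part of the paper), note that f is the Laplacian of u(x, y) = (x^(2N+1) - x)(y^(2N+1) - y), which vanishes on the boundary of the unit square; a short symbolic computation confirms this for N = 10.

import sympy as sp

x, y = sp.symbols('x y')
N = 10
u = (x**(2*N + 1) - x) * (y**(2*N + 1) - y)          # inferred exact solution, not stated in the paper
f = 2*N*(2*N + 1) * (x**(2*N - 1) * (y**(2*N + 1) - y)
                     + y**(2*N - 1) * (x**(2*N + 1) - x))
# The difference between the Laplacian of u and f expands to zero.
assert sp.expand(sp.diff(u, x, 2) + sp.diff(u, y, 2) - f) == 0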
Fig. 1. Convergence of MLBGS as a function of cycles (residual versus number of V-cycles).

Fig. 2. Convergence of MLBGS as a function of work units (residual versus number of iterations).

In order to compare different methods, it is useful to use the equivalent smoothing rate [11], denoted by ρ, which is given by

ρ = ( ||final residual||_2 / ||initial residual||_2 )^(1/W),

where W is the overall number of work units, defined as follows. Let T be the total CPU time and τ_0 the average time required for a single finest-level relaxation (work unit); then

W = T / τ_0.

For a one-level iterative method, the number of work units is equal to the overall number of relaxation steps and therefore ρ becomes the smoothing rate. In table 1 we present the smoothing rates of the multilevel algorithm described in the previous paragraph for both Gauss-Seidel and block Gauss-Seidel smoothing, along with some single-grid iterative solvers. The abbreviations used are as follows: GS (Gauss-Seidel), MLGS (Multilevel Gauss-Seidel), BGS (Block Gauss-Seidel), MLBGS (Multilevel Block Gauss-Seidel), CG (Conjugate Gradient) and ICCG (Incomplete Cholesky Conjugate Gradient). Figs. 1 and 2 show the convergence behaviour of the multilevel iteration scheme as a function of the number of smoothing cycles and as a function of the number of work units. Since the cost per iteration varies with the smoother, and in the multilevel case with the number of levels, we present in table 2 timing results for the various iterative methods, and also for a direct method. The direct method uses Cholesky factorization with the data stored in skyline format. Results with the direct solver are shown only for the smallest problems because it would be prohibitively expensive to give results for the large problems.
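In code, the equivalent smoothing rate defined above amounts to no more than the following helper (the names are ours):

def equivalent_smoothing_rate(final_residual_norm, initial_residual_norm,
                              total_cpu_time, tau0):
    # W = T / tau0 work units; rho = (||r_final|| / ||r_0||)**(1/W).
    W = total_cpu_time / tau0
    return (final_residual_norm / initial_residual_norm) ** (1.0 / W)

For example, a residual reduction of 10^-6 achieved in W = 25 work units would correspond to ρ = (10^-6)^(1/25) ≈ 0.58, the order of magnitude of the MLBGS entries in table 1 (this is purely an arithmetic illustration, not a figure taken from the measurements).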
Table 1
Equivalent smoothing rates

Elements   Unknowns   GS      MLGS    BGS     MLBGS   CG      ICCG
25         1521       0.956   0.956   0.774   0.580   0.875   0.599
100        6241       0.965   0.968   0.755   0.468   0.897   0.640
225        14161      0.968   0.972   0.711   0.511   0.906   0.649
400        25281      0.971   0.974   0.730   0.535   0.911   0.666
625        39601      0.975   0.976   0.731   0.585   0.915   0.657
Fig. 3. Convergence behaviour of MLBGS and ICCG, with number of unknowns.

Fig. 4. Speedup with the PF Compiler on IBM 3090, Multilevel Block Gauss-Seidel (speedup versus number of processors).
We see from tables 1 and 2 that the MLBGS method performs best for this set of problems, the two closest methods being the BGS and ICCG methods. The use of block iteration significantly enhances the rate of convergence. One possible explanation is that the range of frequencies contained within a block is relatively small (being limited by the order of polynomials corresponding to that level); a block scheme would thus remove errors of these frequencies very rapidly. While a comparison among iterative methods is interesting, the performance of the iterative methods in comparison to the direct solver is striking. It is essential to point out here that these iterative methods may not perform as spectacularly on other problems or other geometries. However, there are good reasons why they perform well on this particular problem. The p-version of the finite element method produces matrices that are much better conditioned than those produced by the h-method [3]. Furthermore, the basis functions used in the p-version are particularly well suited to the Laplace operator (the derivatives are orthogonal), so that the matrix for this problem is well conditioned.
To support this statement we point out that the number of conjugate gradient iterations needed to achieve convergence for a problem with 625 elements (39601 unknowns) was about 150; for less well conditioned problems, the number of iterations would be close to the number of unknowns. While the MLBGS method performed best on this set of problems, the ICCG method is known to perform well on a wide class of problems [1]. However, we have chosen the MLBGS for parallel implementation for the following reasons:
1) The ICCG method is difficult to implement in parallel with high efficiency, because it requires solving triangular systems at every iteration. For arbitrary patterns of sparsity, this is essentially a sequential process. Another reason to prefer MLBGS is that the incomplete factorization portion of the ICCG computations is known to be unstable for certain problems [5].
Table 2
Execution times on IBM 3090 (in seconds)

Elem./unkn.   GS      MLGS    BGS     MLBGS   CG      ICCG    CHOLES
25/1521       2.6     2.5     1.2     0.6     0.7     0.4     33.2
100/6241      14.2    15.0    4.4     1.7     4.0     2.0     2160.3
225/14161     36.8    40.4    6.3     3.5     10.2    4.9     *
400/25281     72.6    77.9    11.3    5.9     18.9    9.3     *
625/39601     132.1   131.0   14.7    8.4     31.8    14.2    *

* Not computed (prohibitively expensive).
While there may be problems where the MLBGS would converge slowly, there are no obvious reasons why it should be unstable for well-posed problems.
2) The adaptive capability of the p-version finite element method is particularly well suited to the MLBGS method. If a solution has been obtained at a certain level, and is then required at a higher level, the solution at the lower level can be used as a starting guess for the solution at the higher level. With the MLBGS method, the number of iterations required to reach convergence at the higher level, using the solution at the lower level as a starting guess, is significantly less than the number of iterations to convergence using a zero initial guess. While this would appear to be true for the other methods as well, we have found in our numerical experiments that the number of iterations required with the ICCG method at a certain level is not significantly reduced if the solution at the lower level is used as a starting guess. There may be other methods of using the lower level solution in the higher level solution (e.g. as a preconditioner), but we have not experimented with these ideas.
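With hierarchic basis functions, one natural way to realize this restart (our illustration; the paper does not spell out the details) is simply to keep the converged lower-level coefficients and start the coefficients of the newly added hierarchic functions at zero:

import numpy as np

def initial_guess_for_next_level(U_lower, n_new_dofs):
    # Keep the converged lower-level coefficients; the coefficients of the newly
    # added hierarchic basis functions start at zero.
    return np.concatenate([U_lower, np.zeros(n_new_dofs)])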
6. Parallel implementation

The multilevel scheme with block Gauss-Seidel smoothing was implemented on the IBM 3090 system with four processors sharing a single memory. A detailed description of the hardware is given in ref. [10]. The software used for the parallel implementation of the Multilevel Block Gauss-Seidel iteration scheme is Parallel Fortran [12], which is an extension of IBM Fortran (FORTVS2) with some "parallel constructs". These can be basically divided into three groups: i) parallel loops, where the loop index is used to divide the work among the processors in prespecified chunks; ii) parallel tasks, where independent tasks are created and assigned to different processors; iii) parallel cases, where groups of simple statements are collected to form independent chunks. As an alternative to the declared parallel constructs, one may request an automatic parallelization of a FORTRAN code,
just giving directives to the compiler. Vectorization is included as a compiler option. The approach adopted for the parallel implementation of the Multilevel Block Gauss-Seidel algorithm consists simply of sharing the operations related to any block evenly among all processors. The major computational steps consist of sparse matrix-vector multiplications and scalar products, related to the block residual calculation and the diagonal block conjugate gradient approximation, which can be readily implemented in parallel [4]. The results of the parallel execution for problems using 100, 225 and 400 elements on the IBM 3090 with the PF system, using up to 4 processors, are shown in fig. 4. Speedup is defined as:
S(p) = (execution time on one processor) / (execution time with p processors).
The speedups in the above instances deviate significantly from the ideal, and it is necessary to examine the nature of the computations to explain this deviation. An examination of the serial process shows that the parallelizable content varies between 96 and 99%, depending on the discretization. Although this is high, the multilevel algorithm is such that the parallelizable part consists of many small chunks, especially at the lower levels; there are relatively few computations followed by synchronization and data transmission. While this is not a problem on fine grain machines, the 3090 system with Parallel Fortran is not specifically tuned to fine grain computations, and hence the deviations from the ideal.
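For orientation (our illustration using Amdahl's law, not a measurement from the paper), a parallelizable fraction of 96-99% by itself already caps the four-processor speedup:

def amdahl_speedup(parallel_fraction, p):
    # Ideal bound when a fraction of the work remains serial.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)

print(amdahl_speedup(0.96, 4))   # about 3.6
print(amdahl_speedup(0.99, 4))   # about 3.9

The synchronization and data transmission between the many small chunks push the measured speedups below even these bounds.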
7. Conclusions
The multilevel iteration scheme with block Gauss-Seidel smoothing is shown to be a highly effective scheme for the solution of the systems of equations resulting from discretizations of continuum problems using the p-version of the finite element method. The advantage of this method over multigrid methods is that only one grid is necessary; nested matrices are defined on this grid without regard to the geometry of the object. The multilevel scheme can
also be used very effectively with the adaptive capability of the p-version of the finite element method. While the results of the parallel implementation are not entirely satisfactory, it must be pointed out that these results pertain only to the part of the computation concerned with solving the system of equations; the overall finite element procedure involves the generation of the matrices (usually by numerical integration). This phase is a completely parallel process (no data exchange is required between processors), and hence the overall parallel efficiency of the finite element solution of problems using hierarchical basis functions can be expected to be higher than shown here.
References

[1] O. Axelsson and V.A. Barker, Finite Element Solution of Boundary Value Problems (Academic Press, New York, 1984).
[2] I. Babuska, B. Szabo and I.N. Katz, SIAM J. Numer. Anal. 18 (1981) 515.
[3] A.W. Craig and O.C. Zienkiewicz, in: Multigrid Methods for Integral and Differential Equations, eds. D.J. Paddon and H. Holstein (Clarendon Press, Oxford, 1985) p. 310.
[4] R. Donati and V. Sonnad, IBM Kingston Technical Report, KGN-98 (1987).
[5] G. Golub and C.F. Van Loan, Matrix Computations (Johns Hopkins Univ. Press, Baltimore, Maryland, 1983).
[6] W. Hackbusch, Multi-grid Methods and Applications (Springer-Verlag, New York, 1981).
[7] R. Herbin, S. Gerbi and V. Sonnad, IBM Kingston Technical Report, KGN-155 (1987).
[8] J.M. Ortega and R.G. Voigt, SIAM Rev. 27 (1985) 149.
[9] B. Szabo, Selected References on the p-version of the Finite Element Method, available on request from Washington University, St. Louis, Missouri.
[10] S.G. Tucker, IBM Systems J. 25 (1) (1986).
[11] T.A. Zang, Y.S. Wong and M.Y. Hussaini, J. Comput. Phys. 54 (1984) 489.
[12] Parallel FORTRAN Language and Library Reference, IBM Manual No. SC23-0431-0.