Computer methods in applied mechanics and engineering ELSEVIER
Comput. Methods Appl. Mech. Engrg. 116 (1994~ 59-67
A spectral element methodology tuned to parallel implementations F. B e n B e l g a c e m * , Y. M a d a y Laboratoire d'Analyse Numerique, Universite Pierre et Marie Curie, 75252 Paris Cedex 05, France
Abstract
We present in this paper a modification of the conforming spectral element method for the approximation of the Poisson equation. This new procedure allows the implementation on a parallel machine in an easier and faster way. The resulting method is nonconforming but the numerical analysis shows that the error between the exact solution and the discrete solution is of the same asymptotic order as the error of the original conforming method.
1. Introduction
Spectral methods for partial differential equations are well known for their high order accuracy. They are based on polynomial approximations of the solutions. The combination of the spectral method with domain decomposition techniques, resulting in the so-called spectral element method [1-3], allows for the treatment of problems set on domains with general geometries. The mortar element method [4, 5] has recently increased even further the range of applications and the flexibility of the original spectral element method for the approximation of second and fourth order elliptic or parabolic problems. This method is non-conforming; the discrete space is not imbedded in the continuous functional space suited to the analysis of the given partial differential equation. Nevertheless, the 'variational crime' committed in this strategy does not pollute the accuracy of the original method; it leads to more flexible discretizations and a better use of the discretization parameter. In order to take into account the architecture of the current computers on which the implementation of the spectral element method is done, progress has also been made to parallelize it [6-9]. The domain decomposition which is one of the basic ingredients of this method allows the design of powerful and efficient parallel versions. Actually, the parallelization efficiency is quite favorable to the spectral element method; the ratio of computation time to communication time is larger for this type of method than for others; this is mainly due to the high order accuracy provided by spectral type methods combined with the fact that the spectral matrices are full. One cause of difficulty in the implementation and increase in the cost of communication time in the case of a parallel framework is due to the fact that the interfacial values, responsible for the (obliged) communication between subdomains, have different behaviour due to different order of multiplicity: in 2D corners with respect to internal interface nodes, in 3D vertices and edges with respect to internal interface nodes. This fact, together with the ideas of non-conformity allowed by the mortar element method led us to write this paper. We present here a new idea for domain decomposition discretization (actually not restricted to the spectral context) that simplifies the parallelization of the method by grouping the unknowns into two categories:
* Corresponding author. 0045-7825/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved SSDI 0045-7825(93)E0183-9
60
F. Ben Belgacern. Y. Maday / Comput. Methods AppL Mech. Engrg. 116 (1994) 59-67
internal values, allowing for local computations, and interracial values that are all treated in a similar way.
2. Presentation of the method For the sake of simplicity, we shall focus this analysis on the case of the Poisson equation on a three-dimensional domain /2: find u E H~(/2) such that -Au =f,
over/2 ,
(2.1)
where f is some given forcing function and H0~(/2) stands for the standard Sobolev space. In addition, we shall also assume that the domain /2 is decomposed into a conforming union of non-overlapping rectangular hexahedra subdomains/2k, 1 ~< k ~< K. The term conforming relates here to the assumption that the intersection of two elements ,(2k and /2~ is either empty or reduced to a common vertex, a common edge or a common face. This simplification may certainly be a restriction as regards real applications. The extension of the new methodology presented in this paper to more complex geometries is straightforward, the key point of the numerical analysis of the method will be much simplified and the general case can be analysed using techniques that are all present in this paper and in [101. We focus the analysis on the interface between two adjacent elements. For any k and l, the open rectangle Fk.~ is defined as follows: Fk./ = 0 0 k Cl 0/21 • The starting point of the spectral discretization is the equivalent variational formulation of problem (2.1): find u ~ H I o ( / 2 ) such that Vv E H01(/2),
f v.Vvdx=ffvdx.
(2.2)
Using the fact that the decomposition is non-overlapping, this is readily rewritten as follows: find u E H ~ ( O ) such that
k=l
k
k=l
k
This formulation shows that solving problem (2.1) can be transformed into another problem where the solution u is searched as a K-tuple u = (Uk)~_k~¢ of elements in X defined by X = { v = (Vk)~_k~_~¢, Ok E H~(/2k), V k , 1, 1 <~ k < 1 <~ K , Vklrk., = Vtlrk J, Ukl0a = 0} ,
which is a Hilbert space provided with the broken norm
I1 II.
(2.4)
defined by
K
Ilvll , = E Ilvkll =H t l f / k )
(2.5)
•
k=l
We can consider now the continuous problem: find u E X such that
k =1
k
k=l
fo
(2.6,
k
This is equivalent to (2.2) provided that the identification between u and u = (u k = Ulak)~k, K is done. The discretization of problem (2.6) consists of a Galerkin method based on a proper space X 8 of approximation of X. It is stated as follows: find u 8 E X 8 such that
k=l
k
k=l
k
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67
61
All we have to do now is define the discrete space Xn. Let N be a K-tuple of integers N = (Nk)l,k, K, such that N k > 2. These denote the degree of the polynomial approximation that is used over each domain. The space X~ is a subspace of Y~ where =
=
ok
PNk(ak)}
•
One possible and simple choice for X~ is given by
This choice results in a conforming approximation of the problem and yields a unique solution tl8 satisfying an optimal approximation. Indeed it can be proven that K
Ilu- a ll,
Nk
lu k IIH =t(.Ok)
(2.8)
,
k=l
provided that the solution u to problem (2.1) satisfies the regularity property Wk,
l~
uk~H'k(y2k),
for some K-tuple (O'k)~k~ K of real numbers 1>1 (see [11]). As noticed in previous work this choice X~ leads to some slight difficulties in implementation due to different multiplicity of nodal values of the interface. The choice we propose here results in a non-conforming approximation still providing an optimal error bound similar to (2.8). This choice is given by
X, = {v~ = (V~.k)l~k~K E Y~, V~.klaa = O, Vk,
= 0},
(2.9)
.I where
~[k.I is defined as being either N k or Nt.
3. Numerical analysis of the method
In this section, we check that the discrete problem, with choice (2.9), leads to an error bound similar to (2.8). For this purpose, we first prove that it is well posed in the sense that there exists a unique solution to (2.7) that depends continuously on the data f. For this purpose, it is standard to introduce the bilinear form a s by
V(u~'vs)EX2~ ' a~(us'v~)= ~ fn Vu~'kV°~'kdX k=l
(3.1)
k
and prove that it is uniformly continuous and elliptic with respect to the broken norm (2.5). The uniform continuity is straightforward. The only point that is not trivial is the uniform ellipticity. It is a consequence of Proposition A.1 of [4] and we recall the arguments here for sake of convenience.
LEMMA 3.1. There exists a positive constant a such that V,,~ ~ X~,
a~(v~, ,~) t> a Ilv~II.~ .
(3.2)
PROOF. We can prove this result on a larger set of functions X that contains Xn. Let us define this space as
2 = { v = ( O k ) , , k , r , Vk EH~(Ok),V~l~n=OVk, l,a <~k
(3.3)
.I
Let us assume by contradiction that (3.2) is not true over X. It follows that we can find a sequence (v,), such that
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67
62
1
as(e,,,v,)~< n
IIv, l l , = l .
and
(3.4)
Recalling the definition of a~, it follows that for any k, Vvk.,,---, 0 as n goes to infinity, so that there exists a constant Ak that verifies Vk,,,--~Ak as n goes to infinity. The matching condition expressed in (3.3) states that all these constants must be equal. Using now the homogeneous boundary condition, it follows that Ak = 0 so that vk.,,--~ 0 as n goes to infinity. This is, of course, inconsistent with (3.4) and proves the lemma by contradiction. []
REMARK 3.2. The method used in the proof of Lemma 3.1 is not constructive. Due to this, the size of the ellipticity constant a is not precise. One could estimate it by a more constructive proof, based on the same arguments as the proof of the Poincar6 Friedrichs inequality. It is obvious that a depends on the sizes of the interfaces. For this point, a numerical simulation would certainly give a better insight into the possible loss in ellipticity (that would translate into a weakness in stability) with respect to the conforming method. This point will be developed in a future paper. We are now in a position to state that problem (2.7) is well posed and that the following stability relation holds:
IIv~ll, --<- g1 IlfllL-,,,, •
(3.5)
In addition, the second Strang lemma allows us to give an error bound between the solution u and u~,
Ilu- u ~ l l , -_ <~l
[
/
inf I I " - v ~ l l , + w~x~ sup l-~k
-7- (w~k - w~.,)
JFk t On
""
i~.~i 12
1 '
(3.6)
where the first term on the right hand side of (3.6) is the best fit error of u by elements of X~ and the second term is known as the consistency error and is the price to pay for the non-conformity. In order to obtain a convergence as in (2.8), we have to upperbound each one of these two sources of error. The first is easily handled by noticing that , ~ C X~, so that inf I I " - v ~ l l , ~
e~ E
Xz,
inf I l u - v ~ l l ,
o~ E ,ft'/i
and for instance with o~ = ti~, we deduce from (2.8) that K
inf Hu v,s EX,s
v~[[,<~c ~'~
(3.7)
~-"*
k= 1
The consistency error term is slightly more involved. Recalling (2.9), we state that
V~, E P,~,.,_4r,.,),
f,o
.,.,-fin (w~.k - w~.,) =
~.,
- ~
)
(%.k -
w~.,).
This leads to the following bound:
f, ,, ~Ou (w&k
-
w&,) <~
•
q~EP~/~,
O-~n-
inf t
:(Ik,t)
g,
11w&k - w~.,ll,,,..{,-,.,, "
..
'
and the standard trace theorem, used over each side of F,.; leads to
If, i ; ~0.
(w~.k -
w,,)
~ c
inf
,,,
Let us assume that ]Vk.; = Nk, by choosing ~b as the best fit of
au/an
in the L2(Fk.;)-norm yields
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67
inf
d~
,,2
63
<
-Orkll
(3.8)
where the last bound is obtained from a standard trace lemma. Going back to (3.7), we claim that
sup
2 Irk O-~n(w&k--w&t) r l~k
k=, N ,
1-,'k
Ilul.,ll.o..,
so that we have proven the following theorem. T H E O R E M 3.3. The problem (2.7) is well posed with the choice of discrete space X 8 given in (2.9). Moreover, its solution satisfies the error bound
K IIu - us II, -< c E ~ ' - ~
(3.9)
k=l
provided that the solution u to problem (2.1) satisfies the regularity property Vk, l~k<~K,
ukEH~k(Ok)
for some K-tuple (O'k)t~k<_K of real numbers >11.
4. Basic implementation
In addition to the previous statement of the discrete problem (2.7), it is standard to resort to numerical quadrature formulas to evaluate the integrals in the variational forms. The numerical analysis that has just been performed can easily be extended to handle this modification (see e.g. [12]). The quadratures formulas are of Gauss-Lobatto-Legendre type, constructed from the zeros £j, 0 ~
Vp, q , r , p ' , q ' , r ' ,
t
O<~p,q,r,p',q,r
t_.~
~N k ,
k
k
Hpqr(~p,q,r,)--~pp,~qq,~rr
, ,
which are readily extended to 0 so as to be considered elements in Y~ still denoted in the same way. Using this definition, any element of Y~ can be written as
K Nk Nk Nk [J= 2 Z Z Z Ok(~kqr)m~qr • k = l p = l q=l r=l k
-
k
The degrees of freedom of elements of Y8 are then the Vpq r corresponding to the values Vk(S~pqr). In order to be in Xa, we deduce from (2.9) that the values (V~qr)O<_p.q.r,,Nk and (VIpqr)O<~p.q.r~N, corresponding to points that are on the interface ~.t between "Ok and l/I have to be linked by the integral matching condition. Assuming that Vk, l , l < ~ k < l < - K ,
N k j = N k,
leads to the fact that the values of v k over Fk.I are derived from the values of vt if k < I. Assuming in addition that the points ~pkqr over FR.t are those such that r = 0, we can easily verify that the matching
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67
64
(VkpqO)l<~p,q,<
condition translates, e.g. into the fact that the internal v a l u e s depend linearly on the boundary values VkpqO, corresponding to p = 0 , p = N k, q = O , q = N k, and the facing values 1 (VpqN,)O~p.q,N ~ if we assume in addition that the points ~tpq, over F~.I are those such that r = N r We denote by (~k.~ this linear mapping. We realize now that the true unknowns of an element o in X 8 are the internal values over each k k element g2k' Wintk. . . . I(Vpqr)l<-p.q.r<~Nkk_ ~, the values at the vertices Vver,, and internal edges, Ve0ge of each ~k (resulting into a muitivalue definition at this place), together with the internal interracial values over Fkj, 1 ~< k < l ~< K, Vl,,erfac e. In Fig. 1, we have indicated, on a simple decomposition of a square into four squares, the various properties of the nodes: either boundary ones where the boundary conditions are imposed, or true degrees of freedom as explained previously or finally slaved nodes. Let us denote by V these true algebraic unknowns and by ,3. the associated stiffness matrix representing the bilinear form a 8 of (3.1). The algebraic system corresponding to problem (2.7) reads: find /] such that A/d= F,
(4.1)
with the proper definition of F that represents the contribution of the forcing function f. Note now that if we need to have the actual values of the polynomial solution u~, we can derive them from U either directly (for those points that correspond to true degrees of freedom) or from the local linear mappings Q~,~ over each interface. For a particular slave node, the true nodes that actually infere with its value are indicated in Fig. 2. Let us denote by U the global vector of values of u~ as being the set of all values ( pq~)(~*k*K.O*p.q.,*Nk)" It follows from the previous consideration that these values can be computed from /.d via a linear mapping, denoted by Q, the lines of which are those of the identity mapping for almost all except those that correspond to the internal interfacial values over G.t, 1 ~< k < 1 ~ K, where k the Vi,,~f,c ¢ are recovered via the associated matrix (~kJ' Due to this definition, the stiffness matrix has actually a simple expression: by introducing the block matrix A of all local stiffness matrices A k corresponding to a Laplace equation with Neumann boundary conditions (i.e. find t/k E H~(glk) such that
VUk EHI(J-2k)'
fo V(/'tk)V(tTk)=fo fln~(tTk)' k
k
then we have I
II
g
g
g
I
0
0
0
0
0
0 ~:zO
0
•
g
•
I
0
0
|
fll II
0
0
@
boundarynodes
0
true degrees of freedom
@
slaved nodes
g
U
U
U
I
0
0
0
I
0
0 ;`4
0
I
0
0
0
tl
a
I
I
i
Fig. 1. Example of a decomposition.
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67 U
U
U
~J
o
O
0
I
o
OD.2O
o
0
o
e
o
o
n
a
u
65
o
0
o
o
o
o
o
this slaved node • gets its value from the grey nodes @
e
U
U
U
U
D 0
0
0
0
D 0
0
0
0
DO
0
O
0
D 0
0
0
0
.)
U
U
U
3
0
0
0
3
0
~4 0
0
3
0
0
0
Fig. 2. Example of a true/slave node interaction.
,~=QT
2
0
0 ""
0
Q"
(4.2)
O" A K
This relation actually makes the point of the simplification in case of a parallel implementation. Recall first that in order to solve problem (4.1), one usually resorts to iterative methods, actually preconditioned conjugate gradient algorithms. Indeed, the matrix ,4 (even in the conforming case where the discrete space is Xa introduced in Section 2), is full and it would be too costly, in many applications, to try to invert it directly. Due to this fact, at each step of the iteration procedure, one typically has to evaluate matrix-vector and vector-scalar products. For the matrix vector product, /~, = A Z,,,
(4.3)
where 2 , is some known data, one can use the relation (4.2) to state that one first computes Z, = Q,?,,,,
(4.4)
which consists of computing the nodal values over each element g2k. Let us make the assumption now that every domain O k, 1 ~< k ~< K, is assigned to one processor among M and that the local load is equilibrated. The evaluation of (4.4) can be done in parallel independently over each processor provided that the values on O k and also the values of the interfaces Fk.t for all l with k < l are available on each processor on which is assigned the domain O k . Then the local evaluation of the action of the local matrices A k can be done in parallel; finally, from these local entities, /~, is recovered by applying Qv to the local results. This can also be done locally, but at the end of the computation you must pass from one processor on which lies O k to each processors on which lies g2t corresponding to interfaces Fk,~ that are not empty and for k ~< l. In order to symbolically represent the action of Qv, we have indicated in Fig. 3, for a particular position of a true unknown, those positions of local residual evaluations related to slave nodes that give a contribution that has to be added to the local residual at this particular true unknown node for the evaluation of the global residual. To be even more precise in the evaluation, one has to assume some properties on the communication network. It is supposed to satisfy the two constraints [6]: (1) A distinct, direct link must exist between two processors for each distinct pair of adjacent elements which is divided between the processors.
66
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67 .,Ira U
U
U
(.
:)
O
O
O
C
3
0
0 D.2 0
C
3
0
0
0
C
3
•
•
•
II
) O 30
O 0
U 0
U (. OC
)0
0
0
OC
0
0
0
DO ~
0 n
0 n
OC nt~
u
0
~.
It
0
0
C
II
0
0
C
)
O
O
f
U
U
U
D
O
O
O
C
O
0
O
O
C
D
0
O
O
C
The residual at this point • gets contribution from all these points •
D,4
f~3 )0
2
C
Fig. 3. Example of the a T operation.
(2) A summation of M values distributed over M processors can be performed in O ( l o g M ) communication steps. In the case where the N k are all equal to some given N, the time required to perform matrix-vector product scales is as follows: KN 4 Cl - - M tcomp + c2t . . . . (N2) N2-
The first term here corresponds to the local evaluations of the matrix-vector products AkZ ~, and the second term represents the communication at faces between adjacent elements; t . . . . ( n ) is, in this expression, an evaluation of the time per word required to send n words across a direct link between two processors. This time compares favorably to that associated with the conforming approximation that would scale as follows: KN 4 •l ~ tcomp + 72 t . . . . (N2) N2 + ~3 t . . . . (E)E log(M)
(see [6]) where the last term corresponds to the particular treatment of all special nodes (vertex and edge) that do not have an order 2 multiplicity. In 3-D geometries, • scales like O(K~J3N) and the resulting time is about 30% of the total time [13]. Actually, in simple decomposition, most of the special nodes can be treated on the fly as other nodes by defining a proper ordered sequence in communicating from one processor to the other [6]. The number of special nodes that have to be treated as a real special case can then be dramatically reduced, but this is at the price of much programming work and has to be done separately for each domain decomposition. Such an improvement may lead to a reduction in the contribution of the treatment of the special nodes to only 10% of the total time but may lead to human work that can be as high as 50% of the total work [13]! The previous matrix-vector evaluation is the most complicated and time consuming operation in the conjugate gradient iteration since we recall that the vector scalar product involves only c4(KNd/M) + cst . . . . (1) log(M) operations (see [6]). It is thus an easy matter to be convinced that there is room for large time improvement by using this non-conforming method. The actual gain must be evaluated by numerical simulations. Indeed, recall that the ellipticity constant in (3.2) has to be compared to the ellipticity constant of the conforming case. Note also that in (3.8) the approximation of the normal derivative involves approximation by polynomials of degree ~ < N - 2 and not N - 1. All this may lead to the use of a larger value of the
F. Ben Belgacem, Y. Maday / Comput. Methods Appl. Mech. Engrg. 116 (1994) 59-67
67
d e g r e e o f t h e p o l y n o m i a l a p p r o x i m a t i o n in the n o n - c o n f o r m i n g m e t h o d t h a n in the c o n f o r m i n g m e t h o d to a c h i e v e t h e s a m e k i n d o f accuracy. T h e c o r r e s p o n d i n g gain a n d loss m u s t b e c o m p a r e d a n d this will be the subject of a forthcoming report.
Acknowledgment T h e a u t h o r s a r e v e r y g r a t e f u l to P.F. F i s c h e r , A . T . P a t e r a a n d E . M . Rlanquist for m a n y i n t e r e s t i n g d i s c u s s i o n s d u r i n g t h e p r e p a r a t i o n of this m a n u s c r i p t . This w o r k was s u p p o r t e d by C N R S u n d e r c o n t r a c t C o o p 6 r a t i o n I n t e r n a t i o n a l e a n d by N . S . F . u n d e r g r a n t INT-8914984.
References [1] A.T. Patera, A spectral element method for fluid dynamics: laminar flow in a channel expansion, J. Comput. Phys. 54 (1984) 468-488. [2] Y. Maday and A.T. Patera, Spectral element methods for the incompressible Navier-Stokes equations, in A.K. Noor and J.T. Oden, eds., State of the Art Surveys in Computational Mechanics (ASME 1989) 71-144. [3] E.M. Ronquist, Optimal spectral element methods for the unsteady 3-dimensional incompressible Navier-Stokes equations, Thesis, Massachusetts Institute of Technology, 1988. [4] C. Bernardi, Y. Maday and A.T. Patera, A new nonconforming approach to domain decomposition: the mortar element method, in H. Brezis and J.L. Lions, eds., Nonlinear Partial Differential Equations and their Applications, (Pitman and Wiley, 1992). [5] C. Bernardi, Y. Maday and A.T. Patera, Domain decomposition by the mortar element method, publications du Laboratoire d'Analyse Num6rique R92013, Universit~ Pierre et Marie Curie, 1992. [6] P.F. Fischer, Spectral element solution of the Navier-Stokes equations on high performance distributed-memory parallel processors, Thesis, Massachusetts Institute of Technology, 1989, [7] P.F. Fischer and A.T. Patera, Parallel spectral element solution of the Stokes problem, J. Comput. Phys, 92 (1991) 380-421, [8] P.F. Fischer and E.M. Ranquist, Spectral element methods for large scale parallel Navier-Stokes calculations, Comput, Methods Appl. Mech. Engrg. 116 (1994) 69-76. [9] R. Henderson and G. Karniadakis, A hybrid spectral element finite difference method for parallel computing, in P. Mehrotra, J. Saltz and R. Voigt, eds., Unstructured Scientific Computation on Scalable Multiprocessors (MIT Press, Cambridge, MA, 1992). [10] Y. Maday and E.M. Ronquist, Optimal error analysis of spectral methods with emphasis on non-constant coefficients and deformed geometrices, Comput. Methods Appl. Mech. Engrg. 80 (1990) 91-115. [11] C. Bernardi and Y. Maday, Approximation results for spectral methods with domain decompositions, J. Appl. Numer. Methods 6 (1989-1990) 33-52. [12] C. Bernardi and Y. Maday, Approximations spectrales de probl~mes aux limites elliptiques, coll. Math~matiques et Applications (Springer, France, 1992). [13] P.F. Fischer, personal communication.