ELSEVIER
Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
Computer methods in applied mechanics and engineering
An iterative solver for p-version finite elements in three dimensions Jan Mandel 1 Center for Computational Mathematics, University of Colorado at Denver, Denver, CO 80217-3364, USA
Abstract
A new iterative method for systems arising from p-version finite elements is presented and computational experience reported for conforming, solid, serendipity elements of order up to p = 8 and real-world structures with 5000 elements and over 1 million degrees of freedom. The method uses a domain decomposition technique with each element taken to be a subdomain, and a coarse space consisting of p = 1 elements. To overcome the I/O bottleneck, only access to a fast matrix-vector multiplication instead of stiffness data is needed during iterations. In terms of wall-clock time, the method outperforms on IBM RS/6000 an earlier method running on CRAY-XMP.
1. Introduction
This paper is concerned with the preconditioned conjugate gradient method applied to large systems of linear algebraic equations arising from discretizations of linear elasticity by the conforming p-version finite element method. For background information on the p-version, see [28]; for conjugate gradients, see [14]. Other iterative methods for p-version can be found in Refs. [5,13,15,26]. The method presented in this paper is an outgrowth of our earlier investigations of efficient preconditioning for the p-version. In Refs. [ 4,18,20,21,24], general principles of such preconditioners were established: elimination of interiors, transformation of remaining degrees of freedom, and use of a coarse problem with few degrees of freedom per element to guarantee a condition number bounded independently of the number of elements. Such preconditioners draw heavily on related domain decomposition methods. The present method is related to the domain decomposition method of Ref. [ 17], which is a version of the method [ 12, Eq. (26) ] with a coarse space, also related to a special case of Ref. [27]. A practical method was introduced and studied in Refs. [ 19,22]. The method can be summed up as a transformation of the stiffness matrix to a new basis and a block diagonal preconditioning. There was one block for the subset of degrees of freedom corresponding to p = 2 tensor product elements (the coarse space) and one block for each face and edge with the remaining degrees of freedom. To overcome the effect of distorted finite element meshes, badly conditioned elements were automatically detected and the diagonal blocks in the preconditioner adaptively enlarged to control the resulting condition number. The method has outperformed state-of-the-art direct solvers for large real-world problems such as elastic deformation of automobile crankshaft or complete airplane fuselage. The computational results reported in Ref. [22] have revealed three major bottlenecks. Also at Solvers International, Inc., 1920 Kohler Dr., Boulder CO 80303-5708, USA. 0045-7825/94/$07.00 (~) 1994 Elsevier Science B.V. All rights reserved SSDI 0045-7825 (93)E0196-F
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
176
Solution o f coarse problem. Since the computational cost of all parts of the method except processing the p = 2 coarse problem grows linearly with the number of elements, costs associated with the decomposition of the coarse problem and back substitution in every iteration dominate the computational complexity if there are sufficiently many elements. I / 0 during iterations. Since the matrices used in iterations are very large, they must be stored on external storage. But matrix-vector operations use a matrix element only once or twice, so there are only one or two floating point operations per input of one word from external storage, resulting in very low CPU utilization rate during iterations. Disk storage size. Large external storage capacity is needed for the transformed stiffness matrix and for the decomposed matrix of the coarse problem. One obvious solution to alleviate those bottlenecks is to use p = 1 elements as the coarse space to reduce the size of the coarse problem (a p = 1 element has only 24 degrees of freedom, while a p = 2 element has 72 degrees of freedom). But straightforward use of p = 1 elements with the method of Refs. [ 19,22] gives condition numbers quickly growing with p for reasons deeply rooted in functional analysis, and this growth is actually observed in practice [ 19]. Since the iterative method of Refs. [ 19,22] operates on the stiffness matrix transformed into new degrees of freedom, the transformed stiffness data had to be kept in storage during iterations, thus limiting the size of problems that could be solved with given resources, and incurring I/O. For these reasons, the method was completely redesigned and the code rewritten to allow the use of p = 1 coarse space without the growth of the condition number and to require the use of only original stiffness matrices during the iterations. The latter feature makes it possible to replace the use of stiffness data by access to fast matrix-vector multiplication available in STRIPE [ 1,2]. The code can use the stiffness data and interface with, e.g., M S C / P R O B E just as the earlier code [22].
2. Description of the method 2.1. Mathematical background The present method belongs to the family of abstract additive Schwarz methods. Here we summarize the main points; for mode details, see Refs. [6,12]. Let V be a real finite dimensional linear space with the inner product (-, .), A a symmetric, positive definite linear operator on V, and Vi, i = 0 . . . . . m, subspaces of V such that V = Vo + . . . + V.,. Denote a ( u , v) = (Au, v). The bilinear form a(-, .) is called the energy tuner product and is the energy norm. For a given f C V, we wish to solve the problem Au* = f , or, in the variational form, uE V:
a(u*,v) =(f,v),
VvC
V
IlulIA
= ( a ( u , u) )~/2
(2.1)
In each iteration, the preconditioned conjugate gradient method requires an approximate solution ~ of the problem (2.1) with a residual r in place of f . In an additive Schwarz method, this is accomplished by computing u i E Vi :
a(ui, vi) = (r, vi),
V vi E Vi,
i = 0 . . . . . m, (2.2)
F~= ~ U i . i---O Clearly, ft = Eir~o Pi u, where Au = r and Pi is the energy orthogonal projection onto V/, i = 0 . . . . . m. The number of iterations of preconditioned conjugate gradients for a given reduction factor of the error in energy norm grows at most as v/if, where a: is the condition number given by
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
177
,Ami. and lm~x denote the minimal and maximal eigenvalue, respeciively. The maximal eigenvalue is easy to estimate as the "maximal number of subspaces V/ that interact",
where
" am~ ~< max ~ kij,
kij =
i=O,...,m j=O
~ 1, if V~M Vj +~ {0}, L0, otherwise,
(2.3)
cf. also [11, Lemma 2] or [6, Theorem 3.1]. The minimal eigenvalue is estimated from a lemma originally due to Lions [ 16] ; for a proof of the abstract result in the following form, see [6, Theorem 3.2] or [29, Lemma 4].
LEMMA 2.1. If there is a constant Co such that V v C V 3 t,i E V/,
i = 0..... m
:
~" Ilvill2a~< C011vll2a,
(2.4)
i--0
then ,Amin ~> l/Co. 2.2. Selection of subspaces We will use the method of the preceding section with V as the p-version finite element space. To specify the method, we only need to determine the subspaces V,.. First we select space V0 as the space spanned by all linear basis functions. Then, for each element, select a subspace V,- spanned by all interior basis functions (if any). For each face F in the finite element structure, select a subspace V~ of V of all functions that are zero except on F and the two adjacent interiors, and such that they are energy orthogonal to all interior basis functions. For each edge E in the finite element structure, select a subspace V/ of all functions in V that are zero except on E and on faces and interiors adjacent to E, and such that they are energy orthogonal to all interior and face basis functions. Finally, for each vertex N, select a subspace V~ of V of all functions that are zero except on N and adjacent edges, faces, and interiors, and such that they are energy orthogonal to all edge and face functions of tensor polynomial degree at most 2, and to all interior functions. This selection of subspaces is motivated by the desire to have a decomposition with the least energy possible; each of the subspaces Vi is selected so that every function in V/ has the least energy while keeping the same support and same values on the face, edge, or vertex V,. is associated with. Note that V = Vi @ . . . @ I,',,. In Ref. [21 ], the process of transforming the stiffness matrix into new basis functions that span the subspaces Vj . . . . . Vm was called partial orthogonalization. The difference is that here we orthogonalize vertex functions only to face and edge functions of tensor polynomial degree at most 2 and that we use the space V0 of linear function and additive Schwarz framework instead of the quite different (and much more complicated) construction of coarse space and the preconditioner in Ref. [21].
2.3. Analysis It is easy to see that the number of different subspaces Vi such that their supports intersect is bounded independently of the number of elements. Thus, from (2.3), A,,~, ~< const. We estimate A,,i, independently of the number of elements from approximation properties of linear functions and from element-by-element estimates. The domain J'2 is assumed to be decomposed into non-overlapping elements K that are assumed to form a quasi-uniform triangulation 7" with characteristic mesh size h, i.e., every element K is the image of one reference element ~': K = JK(~'),
llaJKII ~< consth,
[l(aJK)-Ill ~< consth - I ,
(2.5)
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994)175-183
178
where OJK is the Jacobian of JK. Assume that V C ( H I ( 1 2 ) ) 3 and that
V u E V: ClJJulltmtn))3 <. a ( u , u ) <. C2[lu[[(H,(n))~,
(2.6)
with some constants 0 < C] < C2. Further, inspired by Dryja [ 10], define 2 ,2 II"II(.,(K~=I~I(.,(K~,+ Ilull~,-'(,~>,.
(2.7)
Note that the factor 1/h 2 in (2.7) makes both components of the norm scale the same way under a dilation of K. Since V = VI q) • • • @ Vm, we can decompose uniquely each v E V as v = Hi u + • • • H,,v, where Hiv E Vi. LEMMA 2.2. Let CK be constants such that V v E V V K:
IIIIivll(n,(K))3
<~ CKII"II?.'(K~"
(2.8)
i=1
Then 1
C mink
CK'
where the constant C depends only on the quasi-uniformity o f the triangulation K and on the domain 12. PROOE We proceed similarly as in Ref. [25]. There exists a mapping H0 : ( H t ( 1 2 ) ) 3 ---, V0 such that for all V E (H1(12)) 3,
IIn0ull(H'(m)~ ~ Cllull(H'(m~3, Ilu -/-/0ull(L-~(m)~ ~ Ch211ullcH'tm~, It is well known that such /70 can be taken to be the L 2 projection on V0 [9] or defined by averaging and interpolation. The proposition then follows from Lemma 2.1 with v0 = Hov, v i = 1-1i(U -- U0), i = 1 . . . . . m, the equivalence of norms (2.6), assumption (2.8), and the above properties of Hi. [] THEOREM 2.3. There exists a constant Co, dependent only on the degree p o f the elements, domain 12, the form a(., .), and quasi-uniformity of the triangulation, such that Amin /> l/Co. hi particular, Co and thus the condition number K are independent o f the number o f elements. PROOE Each sum in (2.8) for a given K contains nonzero terms only for i such that supports of functions in Vi intersect K, and that the lliu is determined only by the values of u on the element K and adjacent elements. Consequently, the constants CK can be bounded from the constants in (2.5). [] Theoretical tools for the analysis of the behavior of K as a function of p are not known in three dimensions. For the case of two dimensions, it was proved in Ref. [4] for a closely related method that K ~< const ( 1 +log2 p) for quasiuniform triangulations, and some cases of more general triangulations were studied in Ref. [24]. Motivated by (2.8) and the proof of Theorem 2.3, we use the bound computed numerically for a singular model consisting of one element as a rough estimate of the condition number fbr a specific selection of the subspaces Vi. As in Ref. [20,21 ], the estimate is obtained as the ratio of the largest and the smallest nonzero eigenvalue of the local stiffness matrix transformed to new basis functions. Each of the spaces V/, i -- 1. . . . . m, (restricted to K) is spanned by one group of basis functions, and the matrix is preconditioned by a block diagonal matrix with blocks associated with these groups. We have used such estimates to determine that the vertex functions need to be orthogonalized only to p = 2 tensor product functions, cf., Table 1 as well as to make some other strategy selections. Note that the numbers in the first line of Table l are the same as in the first line of Table 4.2 in Ref. [21 ].
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
179
Table I Condition numbers for various orthogonalization strategies Partial orthogonalization strategy All to all functions available All to p = 2 (tensor product) only Vertex functions to p = 2, rest to all
p=3
p =4
p =5
p =6
p =7
p =8
26.0 27.2 27.2
21.6 26.2 26.2
32.4 158.0 35.1
31.7 210.8 31.7
28.2 238.4 40.3
27.9 337.3 48.6
Serendipity brick element, aspect ratio I : I : I, linear elasticity tr = 0.3. A simplified version of this estimate was also used as a run-time condition estimator. Based on the result of that estimator, suitable subspaces V/ were merged or enlarged adaptively to achieve acceptable condition numbers at a reasonable cost.
3. Implementation aspects 3.1. Algorithms Our implementation consists of preprocessing phase and iteration phase.
Preprocessing phase The stiffness matrix A is input via a (possibly user supplied) subroutine that returns a local stiffness matrix
Ax. Only one subroutine call is made for each Ax, and one for a small part o f AK for estimation and strategy selection purposes. First, the problems are scanned and condition numbers estimated and a decision on the preconditioning strategy (i.e., selection of Vi) is made. Then the s ubspaces V/are constructed as described in Section 2.2. The global stiffness matrix A is transformed into matrix A in a new basis such that each o f the subspaces V/ is spanned by a subset of the new basis,
= MtAM. The transformation matrix M is stored in a factored form, while only the part of A that is needed is assembled at any time. The size o f the part of A that needs to b e k e p t is controlled by a technique similar to bandwidth minimization. Consequently, only a small fraction of A needs to be present at any given time. The ordering o f the calculations also attempts to minimize I / O during the transformation. As soon as some part of A is no longer needed, it is discarded. The block diagonal m a t r i x / ) is constructed from the entries of ,4. The matrix D - I A is the matrix of ~i'--'1 Pi in the new basis. Contributions to D are created just before a part of ,4 is discarded. At the same time, the matrix A0 is constructed as the submatrix of A that corresponds to V0. Denoting by S the Boolean selection matrix that maps the basis functions in V0 into all basis functions o f V, we have
Ao = S' AS. Since the whole matrix A is never kept in storage, contributions to A0 are created as soon as the data of a part of A are first accessed. Finally, the Cholesky decomposition of the augmented matrix
0)
0 A0
is computed by an out-of-core skyline algorithm.
Iteration phase In each iteration, the preconditioned conjugate gradient method requires the evaluation of the matrix-vector products Au and Cr, where C is the preconditioning operator that implements the mapping r H fi, defined by (2.2).
J. Mandel/Comput. MethodsAppl. Mech. Engrg. 116 (1994) 175-183
180
The matrix-vector product Au is implemented as y'~¢ Arur, by calls to a (possibly user supplied) routine that implements the product Axur on the element level, and by assembly. The matrix-vector product Cr is computed as
Cr=(M',S')G-' (M)r.
(3.1)
Note that the only I / O incurred in (3.1) is a forward and a backward pass through the datasets containing the transformation matrix M and the factorization of G. The termination of the iterations is controlled by an estimate of the relative error in the energy norm [3], see also [22]. Various programming tricks are employed for better performance, such as buffering and virtualization schemes to take advantage of available core memory, and storing the matrices G and M in reduced precision. The vectors needed by conjugate gradients and the topology structure data are not virtualized.
3.2. Scope of the method and form of input data The current implementation accesses the global stiffness matrix only as a collection of local stiffness matrices, with boundary constraints incorporated. Additional constraints are not supported. The method is invariant to the specific selection of the basis functions that span the p-version space as long as there are separate linear functions associated with vertices and quadratic functions associated with edges and faces. Because of the complicated structure of the preconditioner, the code needs a good understanding of the topology and connectivity of the problem. Each element is assumed to be a mapping of one of the master elements. For each element, the code needs information about the location and polynomial degree of basis functions, which can be included in the definition of the master element. A topologically correct conforming finite element method is assumed. That means, for example, that the situation where one of two adjacent elements has a common face and one element has basis functions present on the face while the other has not, is not allowed even if the degree of freedom associated with that basis function is zero in the solution. This is a common trick to implement selective p-refinement. In such case, the extra basis functions should be on the own "extra face" that the elements do not share. Similarly, h refinement implemented as the case where, for example, two small elements share the same face with a large element to avoid slave degrees of freedom will be accepted, even if it is physically impossible. The descriptor tables of master elements are contained in input data. Consequently, the code has no built-in knowledge of element types, so it supports brick as well as wedge or simplex elements and h-p refinements as long as they are input as a topologically correct conforming finite element model. So far, the performance of the method has been thoroughly tested for uniform p serendipity brick and wedge elements only; results of tests for p and h-p refinements and tensor product elements will be reported later. The current version of the code will work as an iterative method for aspect ratios of about 1:25; for higher aspect ratios, poor conditioning is detected and the code reverts to a direct solver for the offending elements. Since the stiffness matrix is symmetric and positive definite and allows many simplifications and optimizations, only the symmetric packed form of the local stiffness matrices is supported.
4. Computational results The results of our first test-bed of problems are summarized in Table 2. The problem was the elastic deformation of airplane double fuselage, discretized by 5510 elements (Fig. 1). A test problem with 2755 elements was derived as one half of the fuselage, and a problem with 260 elements as a segment of half of the fuselage. We can see that the iterative solver outperforms the direct solver for all test problems in terms of CPU time, and for all but the smallest problems in terms of wall-clock time. Asynchronous I / O is planned in a future version of the code, which will allow to spread the I / O to several devices and to bring the wall-clock time close to CPU time. The estimates for the direct solver are believed to be accurate within 10%; in most cases, the direct solver could not be run because of insufficient disk space. Table 3 contains a comparison of timing of the iterative code from [22] and the present code on CRAY-XMP and IBM RS/6000. In terms of wall-clock time, the new code on RS/6000 outperforms the old code on CRAY.
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
181
Table 2 Comparison of direct and iterative solver for STRIPE Part of frame fuselage, 260 elements p
5 6 7
NDOF*
27264 41580 60624
Stiffness
Direct (skyline)
lterative solver
Disk
CPU
Disk
CPU
Iter
Disk
Mem
CPU
Prep
Wall
51 103 194
0.02 0.03 0.05
263 516 958
0.11 0.29 0.71
26 28 30
30 55 78
23 25 29
0.11 0.22 0.47
60% 64% 71%
0.15 0.29 0.90
Complete frame fuselage, 2755 elements p
5 6 7 8
NDOF
268758 412197 604116 852780
Stiffness
Direct (skyline)
Iterative solver
Disk
CPU
Disk
CPU
Iter
Disk
Mem
CPU
Prep
Wall
520 1046 1966 3493
(0.20) (0.30) (0.50) 1,28
(6012) (11315) (20125) (34082)
(9.0) (25.1) (65.1) (149.3)
105 97 94 93
342 513 757 1153
69 88 97 1"03
3.8 5.5 10.1 19.7
23% 27% 44% 55%
14.9 21.6 28.9 49.5
Double frame fuselage, 5510 elements (Fig. 1) p
7
NDOF
1194210
Stiffness
Direct (skyline)
Iterative solver
Disk
CPU
Disk
CPU
Iter
Disk
Mem
CPU
Prep
Wall
3931
(1.0)
(60868)
(157)
79
2524
138
34.7
60%
95.2
Uniform mesh of 6 x 6 x 6 brick elements p
5 6 7
NDOF,
16938 26460 39438
Stiffness
Direct (skyline)
lterative solver
Disk
CPU
Disk
CPU
lter
Disk
Mem
CPU
Prep
Wall
43 86 161
0.02 0.03 0.05
255 511 975
0.19 0.56 1.43
18 19 23
17 35 63
20 23 28
0.06 0.12 0.30
72% 77% 82%
0.08 0.17 0.43
IBM RS/6000 550. Time in hours, disk space and memory in MB. The stiffness matrix generation times and disk space are for all local stiffness matrices and they are for comparison only; local stiffness matrices are generated as required in the preprocessing phase of the iterative solver. Multiplication by the stiffness matrix is by the fast matrix-vector multiplication in STRIPE. The CPU time for the direct solver does not include stiffness generation; the CPU time for the itemtive solver does include two stiffness generations, and stiffness-vector multiplication in every iteration. Numbers in parentheses are estimates. The computations reported in this table were performed by Dr. B0rje Andersson at the Aeronautical Research Institute of Sweden. Iter is the number of iterations for relative precision 10 - 4 in the energy norm. NDOF is the number of degrees of freedom including the boundary. NDOF, is the number of degrees of freedom without the eliminated boundary conditions. Mere is the amount of dynamically used virtual memory space, including all dynamically allocated buffers. Prep is the percentage of CPU time the iterative solver spent in the preprocessing phase, including twice stiffness generation. Wall is the wall-clock run time on a dedicated machine.
182
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
Table 3 Comparison of iterative solvers Complete frame fuselage, 2755 elements, p = 6. NDOF = 412197 Method
Machine
lter
Disk
CPU
Wall
From [22] Present
CRAY-XMP IBM RS/6000
124 97
2240 513
2.0 5.5
30 21.55
CPU time on CRAY-XMP does not contain stiffness generation, CPU time on IBM RS/6000 does. Time in hours, disk space in MB. The computations reported in this table were performed by Dr. B6rje Andersson at the Aeronautical Research Institute of Sweden.
Fig. I. Frame fuselage, 5510 elements.Reproduced by permission of the Aeronautical Research Institute of Sweden. In other tests ( n o t reported here in detail), it was observed that in the case when the present c o d e uses local stiffness matrices instead o f fast m a t r i x - v e c t o r multiplication, its p e r f o r m a n c e was about the same as that o f the c o d e from Ref. [22] in terms o f C P U and wall-clock time. But the disk space requirements w e r e smaller for the present c o d e due to the reduced size o f the coarse problem, in particular for p r o b l e m s with little distortion since then the coarse p r o b l e m consists only o f linear degrees o f freedom, i.e., no adaptive enlargements o f the subspaces V/ are needed.
J. Mandel/Comput. Methods Appl. Mech. Engrg. 116 (1994) 175-183
183
Acknowledgements This work has been done by Solvers International, Inc., partially under a contract with the Aeronautic Research Institute of Sweden (FFA). Special thanks are due to Dr. B6rje Andersson of FFA for his assistance, identifying performance bottlenecks, running the benchmarks, and unrelenting encouragement, and to Dr. Urban Falk of FFA for his help with STRIPE interfaces.
References [ I ] B. Andersson, In preparation. [2] B. Andersson and U. Falk, Self-adaptiveanalysisof three-dimensionalstructuresusing the p-version of the finiteelement method, Tech. Report FFA T N 1987-31, The AeronauticalResearch Instituteof Sweden, Bromma, Sweden, 1987. [3 ] S. Ashby, T.A. Manteuffel and RE. Saylor,A taxonomy for conjugategradientmethods, S I A M J. Numer. Anal. 27 (1990) 1542-1568. [4] I. Babugka, A.W. Craig, J. Mandel and J. Pitkliranta,Efficientpreconditioning for the p-version finiteelement method in two dimensions, S I A M J. Numer. Anal. 28 (1991) 624-662. [5 ] I. Babugka, M. Griebel and J. Pitkfiranta,The problem of selectingthe shape functions for a p-type finiteelement, Int.J. Numer. Methods Engrg. 28 (1989) 1891-1908. [6 ] P.E. Bjerstad and J. Mandel, Spectra of sums of orthogonalprojectionsand applicationsto parallelcomputing, BIT 31 ( 1991 ) 76-88. [7] J.H. Bramble, J.E. Pasciak and A.H. Schatz, The constructionof preconditioncrsfor ellipticproblems by substructuring,l, Math. Comput. 47 (1986) I03-134. [8] J.H. Bramble, J.E. Pasciak and A.H. Schatz, The constructionof preconditionersfor ellipticproblems by subsU'ucturing,IV, Math. Comput. 53 (1989) 1-24. [9] J.H. Bramble and J. Xu, Some estimates for a weighted L 2 projection,Math. Comput. 56 (1991) 463-476. [ 10] M. Dryja, A method of domain decomposition for 3-D finiteelement problems, in: R. Glowinski, G.H. Golub, G.A. Meurant and J. Pdriaux,cds., First IntcroationalSymposium on Domain Decomposition Methods for PartialDifferentialEquations, Philadelphia, PA, 1988, SIAM. [ l I ] M. Dryja and O. Widlund, Domain decomposition algorithmswith small overlap,SIAM J. Sci. Stat.Comput., to appear. [12] M. Dryja and O.B. Widlund, Towards a unified theory of domain decomposition algorithms for ellipticproblems, in: T. Chan, R. Glowinski, J. Pdriaux, and O. Widlund, eds., Third InternationalSymposium on Domain Decomposition Methods for Partial DifferentialEquations,held in Houston, Texas, March 20-22, 1989, SIAM, Philadelphia,PA, 1990. [ 13] S. Foresti,G. Brussino, S. Hassanzadch and V. Sonnad, Multilevelsolution method for the p-version of finiteelements, Comput. Physics Comm. 53 (1989) 349-355. [ 14] G.H. Golub and C.F. Van Loan, Matrix Computations (John Hopkins University Press,2nd ed., 1989). [ 15] I.E. Kaporin, L.Y. Kolotilinaand A.Y. Eremin, Block S S O R preconditioningsfor high-order3D FE systems, Linear Algebra and its Applications 154/156 ( 1991 ) 647-674. [ 16] P.L. Lions, On the Schwarz alternatingmethod. I.,in: R. Glowinski, G.H. Golub, G.A. Meurnnt and J. Pdriaux,eds.,Fh'stInternational Symposium on Domain Decomposition Methods for PartialDifferentialEquations, Philadelphia,PA, 1988, SIAM. [17] J. Mandel, Domain decomposition on unstructured subdomains, Proceedings of the 6th InternationalSymposium on Domain Decomposition Methods, Como, Italy,1992 (American Mathematical Society,Providence,RI, 1994). [ 18] J. Mandel, A domain decomposition method for p-version finiteelements in three dimensions, in: Proc. 7th Int. Conf. on Finite Element Methods in Flow Problems, April 3-7, 1989, Huntsville,Alabama, Universityof Alabama at Huntsville,1989. [19] J. Mandcl, Hierarchical preconditioningand partial orthogonalizationfor the p-version finiteelement method, in: T.E Chan, R. GIowinski, J. Periaux and O.B. Widlund, cds., Third InternationalSymposium on Domain Decomposition Methods for Partial DifferentialEquations, Philadelphia,1990, SIAM, pp. 141-156. [20] J. Mandcl, Iterativesolvers by substructuringfor the p-version finiteelement method, Comput. Methods Appl. Mech. Engrg. 80 (1990) 117-128; InternationalConference on Spectraland High Order Methods for PartialDifferentialEquations,Como, Italy,June 1989. [21] J. Mandel, Two-level domain decomposition preconditioningfor the p-version finiteelement method in three dimensions, Int. J. Nurner. Methods Engrg. 29 (1990) I095-I I08. [22] J. Mandel, Adaptive iterative solvers in finite elements, in: M. Papadrakakis, ed., Solving Large Scale Problems in Mechanics. The Development and Application of Computational Solution Methods (Wiley, London, 1993) pp. 65-88. [23] J. Mandel and M. Brezina, Balancing domain decomposition: Theory and computations in two and three dimensions, submitted. [ 24] J. Mandel and G.S. Lett, Domain decomposition preconditioning for p-version finite elements with high aspect ratios, Applied Numer. Math. 8 ( 1991 ) 411-425. [25] L.E Pavarino, An additive Schwarz method for the p-version finite element method, Numer. Math. (1993), to appear. [26] E.M. RCnquist and A.T. Patern, Spectral element multigrid. I. Formulation and numerical results, J. Sci. Comput. 2 (1987) 389-406. [27] B.F. Smith, A domain decomposition algorithm for elliptic problems in three dimensions, Numer. Math. 60 (1991 ) 219-234. [28] B.A. Szab6 and I.M. Babu~ka, Finite Element Analysis (Wiley, New York, 1991 ). [291 O.B. Widlund, Optimal iterative refinement methods, in: T. Chan, R. Glowinski, J. Pdrianx and O. Widlund, eds., Domain Decomposition Methods, Philadelphia, PA, 1989, SIAM.