Localized tensor-based solvers for interactive finite element applications using C++ and Java


Computers and Structures 81 (2003) 423–437 www.elsevier.com/locate/compstruc

G.R. Miller *,1, P. Arduino, J. Jang, C. Choi

Department of Civil and Environmental Engineering, University of Washington, Box 352700, Seattle, WA 98195-2700, USA

Received 18 June 2002; accepted 21 November 2002

Abstract

This paper presents techniques for solving systems of equations arising in finite element applications using a localized, tensor-based approach. The approach is localized in that a major part of the solution responsibility is delegated to vector degree-of-freedom objects, rather than residing solely in a global solver working on a monolithic data structure. The approach is tensor-based in that the fundamental quantities used for computation are considered to be second-order tensors. The localized data structure strategy provides the benefits of an efficient sparse and symmetric storage scheme without requiring complex implementation code. The tensor-based aspect of the approach can bring substantial performance benefits by increasing the granularity at which solution algorithms deal with their data. Java and C++ implementations of interactive finite element programs are used to demonstrate performance that is competitive with other available solvers, especially in the case of problems for which interactive analysis is feasible on commonly available hardware. © 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Finite elements; Object-oriented programming; Interactive modeling; Numerical analysis; Visualization

* Corresponding author. Tel.: +1-206-543-0350; fax: +1-206-543-1543. E-mail addresses: [email protected] (G.R. Miller), [email protected] (P. Arduino), [email protected] (J. Jang), [email protected] (C. Choi).
1 J. Ray Bowen Professor of Civil and Environmental Engineering.

1. Introduction

The general idea of building finite element programs with explicit ties to the underlying concepts and abstractions has been around for many years, and various object-oriented finite element programs have been and are being developed based on this model [1-10]. There are many advantages to building complex programs around high-level abstractions, but there is also a general perception that one cannot carry such abstractions too deeply into numerical computations without paying a significant performance penalty. In this paper we present an approach that not only allows, but actually exploits, the use of high-level abstraction taken all the way down to the numerical solver code to obtain performance on par with traditional solvers. Among other things, our results show that with this approach, even Java-based implementations can be made competitive with contemporary Fortran and C++ solvers. Although Java is not typically considered a suitable language for numerical analysis, we include it here because there are many situations where using Java would be desirable, given the relative ease with which one can write useful cross-platform code with nontrivial user interfaces.

Numerically solving the systems of equations associated with finite element analysis is typically undertaken from the perspective of application-independent linear algebra, i.e., performing computations involving matrices populated with scalars without reference to the

0045-7949/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0045-7949(03)00014-2


meaning or context of the quantities involved. The approach presented in this paper is based on a perspective that maintains closer ties to the physical context from which the finite element model is derived. In particular, we exploit the fact that for finite element applications involving structural mechanics, stiffnesses are tensor quantities and unknowns are vector quantities, and finite element models themselves can be viewed as systems of interconnected degrees of freedom. From this perspective, implementing a solution process involves "training" degrees of freedom to work together to carry out the calculations associated with a particular algorithm, with the calculations themselves expressed in terms of fundamental, coordinate-free tensor operations. These ideas have been presented in different threads for different purposes in [1,4]; the point of the present paper is to pull these threads together in the context of equation solving and to characterize performance.

The broader context of this work is the development of highly interactive analysis environments that support live modeling. We define live modeling as a type of graphical interface in which one can add or modify loads, boundary conditions, imposed displacements, members, and member properties directly via controls or mouse gestures, and see the results of these actions immediately in an animated, quasi-real-time fashion. In effect, the system being modeled appears "live" in the sense that it responds immediately and realistically to input from its environment. The performance demands necessary for achieving live modeling, with multiple solution and image generations per second, are clearly high, but the kind of high-performance computing required is not necessarily that of handling extremely large problems on specialized hardware at batch-mode speeds. Rather, it is necessary to handle a large number of small to medium sized problems each second, keeping the displayed state of the system in sync with essentially unpredictable user actions, and to do so on typical desktop hardware. For this reason, our results and discussions here will focus on relatively small to medium problem sizes, going up to about 40,000 DOFs (this is beyond the current range of interactive analysis, but is on the horizon). This having been said, there is no reason that strategies and algorithms appropriate for larger problems cannot be implemented in the framework of a tensor-based, DOF-localized approach: for example, we ourselves have implemented a multigrid solver in Java that we have used successfully on problems with hundreds of thousands of degrees of freedom [11]. However, this lies outside the scope of this paper: here our focus will be on direct solvers and modest sized problems.

Our presentation begins with a brief illustration of how computing stiffnesses in terms of coordinate-free tensor quantities can be done. We then consider tensor-based computation, including C++ and Java performance results in the fundamental context of matrix-matrix multiplication. Next, we describe the implementation of degree-of-freedom oriented equation solution, with a particular emphasis on Gaussian decomposition. Finally, we provide results from reference implementations of tensor-based direct finite element solvers, and compare performance to other systems in various contexts.

2. Tensor-based stiffnesses

In an earlier paper, coordinate-free, tensor-based element formulations were presented for general parametric solid mechanics elements under general conditions [12]. Here we present a briefer derivation of the stiffness tensors for the case of linear kinematics and linear isotropic material behavior. We also provide tensor-based expressions for the stiffnesses characterizing a basic linear beam element. These two examples illustrate the nature of coordinate-free formulations, and also provide useful expressions for implementation.

2.1. Parametric elements

We begin with the assumption that there is some parameterized description of location, i.e., we can locate a point P via a relation of the form

P = \phi(\xi^1, \xi^2, \ldots)    (1)

in which \phi is a mapping function expressed in terms of the parametric coordinates \xi^a. This induces a basis at each point P of the form

g_a = \partial\phi / \partial\xi^a    (2)

The dual basis associated with this basis is defined such that

g^a \cdot g_b = \delta^a_b    (3)

The displacement gradient can be expressed in terms of parametric derivatives of the displacement field u(P) and the dual basis as

\nabla u = (\partial u / \partial\xi^a) \otimes g^a    (4)

For the case of isoparametric elements, we have both

P = N^q(\xi^1, \xi^2, \ldots) P_q    (5)

and

u = N^q(\xi^1, \xi^2, \ldots) u_q    (6)

in which the N^q are a suitable set of shape functions and the P_q and u_q are nodal locations and displacements, respectively. This enables us to express the induced basis as


g_a = (\partial N^q / \partial\xi^a) P_q    (7)

Similarly, the displacement gradient becomes

\nabla u = u_q \otimes (\partial N^q / \partial\xi^a) g^a = u_q \otimes B^q    (8)

in which

B^q = (\partial N^q / \partial\xi^a) g^a

Fig. 1. A simple constant-strain tetrahedron element (nodes P_1-P_4, parametric coordinates \xi^1, \xi^2, \xi^3, map \phi(\xi^1, \xi^2, \xi^3)).

We can determine the element stiffness by considering the element contribution to the strain energy variation for an arbitrary displacement as follows:

\delta U = \int_V \sigma : \nabla\delta u \, dV    (9)

  = \int_V [\mu(\nabla u + \nabla u^T) + \lambda \, \mathrm{tr}(\nabla u) \mathbf{1}] : \nabla\delta u \, dV    (10)

  = \int_V [\mu(u_j \otimes B^j + B^j \otimes u_j) + \lambda(u_j \cdot B^j)\mathbf{1}] : (\delta u_i \otimes B^i) \, dV    (11)

  = \int_V \{\mu[(B^i \cdot u_j)(B^j \cdot \delta u_i) + (u_j \cdot \delta u_i)(B^j \cdot B^i)] + \lambda(u_j \cdot B^j)(\delta u_i \cdot B^i)\} \, dV    (12)

  = \delta u_i \cdot \int_V \{\mu[B^j \otimes B^i + (B^i \cdot B^j)\mathbf{1}]    (13)

      + \lambda B^i \otimes B^j\} \, dV \cdot u_j    (14)

  = \delta u_i \cdot K_{ij} \cdot u_j    (15)

in which we have assumed an isotropic material with Lamé constants \mu and \lambda. The element stiffness tensors K_{ij} are thus given by:

K_{ij} = \int_V \{\mu[B^j \otimes B^i + (B^i \cdot B^j)\mathbf{1}] + \lambda B^i \otimes B^j\} \, dV    (16)

For cases in which the natural dimension of the element does not match the dimension of the embedding space (e.g., line and plane elements in 3-D), the identity tensor in the above expression should be calculated as \mathbf{1} = g^a \otimes g_a. It can be seen from Eq. (16) that when i = j the tensors are symmetric, whereas the off-diagonal stiffness tensors are related via K_{ji} = (K_{ij})^T.

As a simple example, consider the case of a constant-strain tetrahedron (see Fig. 1). The parametric map in this case can be expressed as

P = \phi(\xi^1, \xi^2, \xi^3) = P_1 + \xi^1 (P_2 - P_1) + \xi^2 (P_3 - P_1) + \xi^3 (P_4 - P_1)    (17)

The corresponding induced basis vectors are

g_1 = \partial\phi / \partial\xi^1 = P_2 - P_1    (18)

g_2 = \partial\phi / \partial\xi^2 = P_3 - P_1    (19)

g_3 = \partial\phi / \partial\xi^3 = P_4 - P_1    (20)

The dual basis can be calculated as follows:

g^1 = (g_2 \times g_3) / [(g_2 \times g_3) \cdot g_1],   g^2 = (g_3 \times g_1) / [(g_3 \times g_1) \cdot g_2],   g^3 = (g_1 \times g_2) / [(g_1 \times g_2) \cdot g_3]    (21)

Using an isoparametric approach, the displacements are expressed as

u(P) = u_1 + \xi^1 (u_2 - u_1) + \xi^2 (u_3 - u_1) + \xi^3 (u_4 - u_1)    (22)

from which we can identify the following shape function expressions:

N^1 = 1 - \xi^1 - \xi^2 - \xi^3    (23)

N^2 = \xi^1    (24)

N^3 = \xi^2    (25)

N^4 = \xi^3    (26)

The gradient vectors can now be determined as

B^1 = -g^1 - g^2 - g^3    (27)

B^2 = g^1    (28)

B^3 = g^2    (29)

B^4 = g^3    (30)

The stiffness tensors can then be written explicitly as

K_{11} = V[(\mu + \lambda)(B^1 \otimes B^1) + \mu(B^1 \cdot B^1)\mathbf{1}]    (31)


Fig. 2. Linear beam element configuration (nodes P_i, P_j; member axes n, s, t; end displacements u_i, u_j; rotations \theta_i, \theta_j; forces F_i, F_j; moments M_i, M_j).

K_{22} = V[(\mu + \lambda)(B^2 \otimes B^2) + \mu(B^2 \cdot B^2)\mathbf{1}]    (32)

K_{33} = V[(\mu + \lambda)(B^3 \otimes B^3) + \mu(B^3 \cdot B^3)\mathbf{1}]    (33)

K_{44} = V[(\mu + \lambda)(B^4 \otimes B^4) + \mu(B^4 \cdot B^4)\mathbf{1}]    (34)

K_{12} = V[\mu(B^2 \otimes B^1) + \lambda(B^1 \otimes B^2) + \mu(B^1 \cdot B^2)\mathbf{1}]    (35)

K_{13} = V[\mu(B^3 \otimes B^1) + \lambda(B^1 \otimes B^3) + \mu(B^1 \cdot B^3)\mathbf{1}]    (36)

K_{14} = V[\mu(B^4 \otimes B^1) + \lambda(B^1 \otimes B^4) + \mu(B^1 \cdot B^4)\mathbf{1}]    (37)

K_{23} = V[\mu(B^3 \otimes B^2) + \lambda(B^2 \otimes B^3) + \mu(B^2 \cdot B^3)\mathbf{1}]    (38)

K_{24} = V[\mu(B^4 \otimes B^2) + \lambda(B^2 \otimes B^4) + \mu(B^2 \cdot B^4)\mathbf{1}]    (39)

K_{34} = V[\mu(B^4 \otimes B^3) + \lambda(B^3 \otimes B^4) + \mu(B^3 \cdot B^4)\mathbf{1}]    (40)

in which V = (g_1 \times g_2) \cdot g_3. The primary thing to note about these stiffness tensors is that they are defined and can be computed without reference to an explicit underlying global coordinate system. By providing a set of computational classes that encapsulate vector and tensor operations, the expressions given above can be computed efficiently using the same level of abstraction as used in the formulation.

2.2. Beam elements

Space does not allow providing full derivational details here, but for the simple linear beam element shown in Fig. 2 the stiffness can be characterized in terms of the following four tensors:

K_{fu} = 12E\hat{I}/L^3 + (AE/L)(n \otimes n)    (41)

K_{mu} = K_{f\theta}^T = 6E(n \times \hat{I})/L^2    (42)

K_{m\theta} = 4E\hat{I}/L + (JG/L)(n \otimes n)    (43)

\hat{K}_{m\theta} = 2E\hat{I}/L - (JG/L)(n \otimes n)    (44)

in which E is the elastic modulus, G is the shear modulus, J is the torsional stiffness, L is the length of the element, A is the cross-sectional area, I is the moment of inertia tensor, and \hat{I} = n \times I \times n.² In terms of principal directions (s, t) and principal values (I_{ss}, I_{tt}), we have

\hat{I} = I_{ss} \, s \otimes s + I_{tt} \, t \otimes t    (45)

The relation between displacement and load quantities can be written as

{F_i; M_i; F_j; M_j} = [ K_{fu}, K_{f\theta}, -K_{fu}, K_{f\theta};
                        K_{mu}, K_{m\theta}, -K_{mu}, \hat{K}_{m\theta};
                       -K_{fu}, -K_{f\theta}, K_{fu}, -K_{f\theta};
                        K_{mu}, \hat{K}_{m\theta}, -K_{mu}, K_{m\theta} ] {u_i; \theta_i; u_j; \theta_j}    (46)

As in the case of the previous example, these stiffness expressions can be computed very efficiently using essentially direct translations of expressions (41)-(44), and there is no explicit reference to any arbitrary global coordinate system (the member's own n, s, t reference frame is not arbitrary, but rather is an inherent property of the member).

² The cross-product of a vector and a second-order tensor is defined using the basic results a \times (b \otimes c) = (a \times b) \otimes c and (b \otimes c) \times a = b \otimes (c \times a).

3. Tensor-based computations

In this section we consider the advantages associated with building computations around tensor quantities rather than scalars. In effect, computing with tensors


provides a built-in framework for loop unrolling and small-scale data blocking. It also puts one in a good position with regard to potentially piggy-backing on the technology gains focused on 3-D graphics, as tensor operations and granularities match those of common 3-D transformations quite closely (albeit not exactly, owing to the common use of 4-D data and transformations in 3-D graphics applications). Fundamental tensor/vector computations could be implemented using the computing infrastructure specialized for geometric transformations, thereby gaining increased performance on par with associated graphics sub-system improvements. It is difficult to predict when the typical engineer will have large-scale parallel computing available on his or her desktop, but small-scale parallelism is already the norm, and this trend is likely to continue (3-D gaming providing a much bigger market for hardware manufacturers than engineers).

Implementing basic tensor and vector types in languages such as Java or C++ is straightforward. One essentially encapsulates an internal representation (typically some kind of xyz system), and then provides an operational interface supporting the typical algebra required for manipulation, such as addition, scalar multiplication, inner products, and so on. The key is to keep virtually all explicit references to coordinates encapsulated within the vector and tensor classes, so that all outside users of these classes work in terms of coordinate-free expressions. In the case of element formulations of the kind presented in the previous section, the coordinate-free formulation then translates directly (and efficiently³) into code.

When working with aggregate data types, there are always issues associated with argument passing and the copying of intermediate results when operations are cascaded together (see [13] for a detailed discussion, and an alternative solution, of these issues in the case of C++).
Matrix libraries, for example, often require passing in a data structure to hold the result of an operation, in addition to the operands themselves. The implementation used to generate the results in this paper employs two strategies to deal with these issues. First, each vector and tensor class includes static (class) variables that hold circular lists, or rings, of the corresponding types to hold and pass back results. This avoids the calls to constructors and destructors that would otherwise be necessary, and in the case of C++ preserves the syntactic convenience of being able to write code of the form C = A + B. Caution is necessary to ensure that the size of the ring is sufficiently large to handle the most deeply nested calls to the functions in question, so this is not necessarily a suitable approach for a general-purpose library; but especially in the case of Java, performance improves significantly with this strategy.

³ Typical matrix expressions, such as those used to denote mapping of stiffnesses from a local to a global reference frame, are usually computationally expensive if implemented at the same level of abstraction as the formulation expression.

The second strategy for handling issues of intermediate values and copying is both simpler and less general. For particular operations that are called frequently and that involve several sub-operations, it is generally worthwhile simply to wrap the operation in a single function. For example, in the case of multiplication of matrices containing tensors, the principal operation is to compute in place the combined sum and product of tensors S = S + A \cdot B. It makes sense in this case to create a function that encapsulates this entire operation, and to optimize it internally to the degree possible. Again, this is not necessarily a suitable approach for a standalone library, but it works well for internally integrated classes, where there is typically need for very few such "super functions" for most applications, and these functions are called in very localized places.

In [14-16], Rucki showed how using tensors in both C++ and Fortran 90 generally improved performance of basic level-1 and level-2 basic linear algebra subroutine operations on various platforms and using various compilers. Fig. 3 shows more recent results comparing the use of tensors versus scalars for dense matrix multiplication on a 1 GHz Pentium III using both Java and C++. The code was virtually identical for each case, and used a standard implementation strategy. For example, the snippet below shows the core C++ code used to generate the scalar and tensor C++ results in Fig.
3:

for (int j = 0; j < N; j++) {
    for (int k = 0; k < N; k++) {
        Bk[k] = B[k][j];
    }
    for (int i = 0; i < N; i++) {
#ifdef SCALAR_FIELD              // -- use scalars
        Scalar* Ai = A[i];
        Scalar s = 0;
#elif TENSOR_FIELD               // -- use tensors
        GTensor* Ai = A[i];
        GTensor s;
#endif
        for (int k = 0; k < N; k++) {
#ifdef SCALAR_FIELD
            s = s + Ai[k] * Bk[k];
#else
            DotPlusEq(Ai[k], Bk[k], s);
#endif
        }
        C[i][j] = s;
    }
}

[Fig. 3 plots MFLOPS (0-600) against matrix dimension N (0-1200) for four implementations: Java scalar-based, Java tensor-based, C++ scalar-based, and C++ tensor-based.]

Fig. 3. Java and C++ N x N dense matrix multiplication performance for tensor and scalar data types.

The results in Fig. 3 show clearly how the use of tensors can have a significant impact on performance, in this case allowing tensor-based Java code to outperform scalar-based C++ code. It is important to note that speed-ups are not always this dramatic, nor do horse-race examples of this kind necessarily predict performance in more complex contexts, but one can see the potential benefits of tensor-based computations.

4. DOF-based computation

Given an infrastructure for computation that includes vectors and tensors as well as scalars, it is not difficult to build computational classes to support vector degrees of freedom. Following the approach described in [1,4], a vector degree of freedom can be encapsulated in a DOF class, which at the simplest level has the following attributes and capabilities: a generalized displacement; a symmetric stiffness tensor; a set of Interactions; and a member function for inter-connection with other DOF objects. The Interaction class provides a means to explicitly link one DOF to another, and consists of a pointer to a DOF identifying whom the interaction is with, and a tensor containing the interaction itself. The system stiffness thus can be stored in a distributed fashion, with each DOF being responsible for storing and managing its own set of equation coefficients.

Sparse matrix techniques typically rely on treating the system of equations as a graph, and the DOF-based approach makes this explicit. In effect, what one might think of as an indexed matrix entry K_ij is stored in DOF_i's interaction set in an Interaction with a pointer to DOF_j and a tensor value equal to K_ij (see Fig. 4). Assembly consists of elements making calls of the form dof_i.Connect(dof_j, K_ij). This data structure is well suited to storing and managing sparse systems of equations, albeit at the expense of carrying additional pointers for each entry. By working with tensors rather than scalars, though, this additional storage overhead is relatively minor, requiring a single pointer for every nine scalars in the case of 3-D analysis. This overhead is reduced further in the case of block solvers, with the 4 x 4 tensor block solver requiring a single pointer for each 144 scalars. As will be seen in the performance results section, this amortization of the indexing cost over a set of scalar degrees of freedom also pays dividends during run time.

Fig. 4. DOF-based storage of stiffness tensors: DOF_i holds a list of Interactions (..., (DOF_p, K_ip), (DOF_j, K_ij), (DOF_q, K_iq), ...).


There are various possibilities for the data structure used for the interaction storage within the DOFs. Two implementation approaches were used for generating the results in this paper: a simple singly linked list, and an array-based scheme. In the array-based scheme, the implementation separates the DOF pointers and the stiffness values into distinct arrays, but the overall logic presumes linkages matching those in Fig. 4. In both cases symmetry is exploited by storing only quantities above the diagonal. The diagonal stiffness terms themselves are symmetric tensors, and these are stored with each DOF separately from the general interactions. To facilitate symmetric storage, DOFs are also given ID numbers matching their ordering, and the DOF implementation keeps its interaction lists sorted by ID. Both the linked list and array-based schemes are set up to do numerical and symbolic factorization at the same time, and so fill-in is computed on the fly. This has benefits for smaller problems, in particular, albeit with associated penalties for larger problems.

4.1. Factorization

In the DOF-based approach, factorization is based on the ability of individual DOF objects to factorize themselves. GaussElimDOF is a subclass of DOF that provides a member function named EliminateSelf, and factorization is accomplished by having each GaussElimDOF in the system perform this function. The function itself does each of the following:

(1) Computes the inverse of the DOF's own diagonal stiffness tensor.

(2) Iterates through its interaction list updating the remote diagonals of all its connected DOFs, i.e., for each interaction j in DOF_i's interaction list, do the following:

K_{ij} \to K_{ii}^{-1} \cdot K_{ij}    (47)

K_{jj} \to K_{jj} - K_{ij}^T \cdot K_{ii} \cdot K_{ij}    (48)

(3) Iterates through its interaction list a second time doing factorization and fill-in of remote interactions, i.e., for each interaction j in DOF_i's interaction list, do the following update for each interaction p in DOF_j's interaction list:

K_{jp} \to K_{jp} - K_{ij}^T \cdot K_{ip}    (49)

This is essentially just standard Gaussian elimination, with the caveat that the quantities are tensors rather than scalars. The indices referred to in the above expressions are not actually used explicitly in the iterations indicated; rather, the looping is list based, and so iteration only occurs over nonzero tensor entries (zero tensor entries are not stored). Fill-in can be handled relatively efficiently for the list-based implementation, since list insertion is a fast operation. The array-based implementation does more work to handle fill-in, but ends up providing other benefits that generally outweigh the cost.

In the event that one has a fairly dense banded structure for a given system of equations, an approach of this kind would not be expected to compete well against a more traditional banded solver were the approach implemented at the level of scalar degrees of freedom. However, when implemented at the level of tensors (or tensor blocks, as described below), the implementation ends up performing quite well even in cases for which sparse data structures would be expected to lag. This appears to be due to the fact that the tensor operations can be coded and encapsulated in an efficient manner, and one gains a modicum of loop unrolling and data localization as an inherent part of using the tensor data structure. These are well-known benefits associated with blocking data [17]: the point here is that the blocking can be considered "natural" to the degree the tensors in question are the natural objects of computation.

Although the major part of the factorization occurs within the DOF objects, there is still a need for an external Solver class to manage the DOFs and to drive the solution process. This includes forward-backward substitution as well as factorization, and in the case of Gaussian elimination we have developed a GaussElimSolver class to handle these operations, again in collaboration with the GaussElimDOF objects.

4.2. DOF blocks

Given the benefits derived from the inherent small-scale blocking of data and procedures that comes with using tensor-based data, it is likely these benefits can be extended by explicitly blocking the data further.
This indeed has been our experience, as will be illustrated in the results presented later. It is straightforward to write code for relatively small blocks without using iteration in the critical inner loop operations, and without requiring particularly fancy mechanisms for identifying appropriate blocks. With the basic unit of blocking being a vector DOF, one can use relatively small, hand-coded blocks and end up with scalar blocks sized well for common CPU caches. The trick is to form the blocks in a manner that fits well with the overall analysis framework. Our approach has been to define an additional set of classes to support blocking while maintaining consistent programming interfaces to the modeling objects (i.e., elements, nodes, and so on). The additional classes come in pairs: a BlockDOF class which extends the DOF class; and a DOFBlock which


extends the GaussElimDOF. The BlockDOF is essentially a DOF that knows how to exist within a DOFBlock, and a DOFBlock is essentially a block of DOFs that knows how to carry out block-based factorization. BlockDOFs appear to client elements, nodes, and so on just like normal DOFs, and therefore allow such operations as assembly and results extraction to be handled using the same code as for standard DOFs. These operations are actually carried out by passing information to and from the associated DOFBlock. The DOFBlocks appear to the GaussElimSolver as simply a GaussElimDOF, which can carry out factorization and related operations in the usual fashion. Again, these operations are actually carried out differently internally, using two sets of displacement and right-hand side quantities, and using BlockInteractions rather than simple Interactions. BlockInteractions are similar to simple Interactions, but instead of a single tensor interaction slot, there are four, corresponding to the 2 x 2 nature of the interaction between DOFBlocks. These block solver classes turned out to enhance performance significantly, and so the process was repeated once more with the creation of 4 x 4 block solver classes. A BlockDOF4 class was used to fill the same role as the BlockDOF class described above, a DOFBlock4 class was developed to serve the function of the DOFBlock class, and so on. Results are presented below for this solver, as well. In principle, this blocking could be expanded further, either hierarchically or by using larger blocks. To be effective, though, the blocks need to be relatively densely filled and to fit reasonably well with a CPU's high-speed memory caches. As the block size grows, the likelihood of this diminishes, and in fact the results presented in the next section show diminishing returns in moving from 2 x 2 blocks to 4 x 4 blocks for the problems considered.
Nevertheless, larger blocks could be effective in some circumstances, and the approach used here could be extended to work in these cases.

5. Performance results

In this section we present performance results for Java and C++ implementations of solvers based on the DOF-based, tensor approach in the context of basic finite element problems, with comparisons to various other C++ and Fortran implementations. Given the pace at which hardware and software technology continues to change, any such performance testing can provide only a fleeting snapshot of relative performance trends. Thus, the point here is not to provide the final word concerning the superiority of any particular implementation, but rather to illustrate the relative effectiveness of the DOF-based, tensor approach in a general sense. The results show how the approach presented

here works well both for cases in which the system ordering leads to a highly banded system of equations, and for cases in which the ordering leads to a sparse set of equations.

The present results were generated using two basic problem configurations: one two-dimensional and one three-dimensional. The 2-D configuration is a simple square mesh of rectangular elements, while the 3-D configuration uses brick elements to form the generalization of rectangular meshes in 3-D. In both cases, the number of degrees of freedom was controlled by simply increasing the size of the corresponding meshes, and base constraints were used as the support conditions. In the 2-D case the mesh was increased uniformly in both directions, while in the 3-D case two expansion variations were used: (1) the mesh size was increased in all three directions simultaneously; and (2) the mesh size was held fixed in two dimensions and then increased in the third. For an N x N 2-D mesh, the number of scalar degrees of freedom can be calculated as 2N(N + 1), while for the 3-D mesh the number of degrees of freedom is given by 3N(M + 1)^2, in which M is the cross-section mesh size and N is the mesh length in the third dimension. In the 2-D case and the uniformly expanded 3-D case, the natural problem bandwidth increases as the problem size increases, while in the single-dimension 3-D expansion, the bandwidth remains relatively constant.

The performance tests presented here were run on a 1 GHz Pentium III CPU with 512 MB of RAM. We have run portions of these tests on a variety of hardware, including both older/slower and newer/faster Pentium CPUs and PowerPC G3 and G4 CPUs. In the case of the Java tests, the results presented here were run using Sun's JDK 1.3.1 HotSpot Client, although again a number of other runtime environments have been tested, as well.
We have also used a variety of compilers (primarily Metrowerks CodeWarrior and Microsoft Visual C++) on various platforms (Windows 98, 2000, and XP; Mac OS 9 and X) with the results in virtually all cases telling a consistent story. The comparison code for these tests is OpenSEES [18,19], an open source finite element analysis program providing access to a number of solvers. OpenSEES uses an object-oriented, C++ core combined with support for external solvers written in different languages. It uses a Tcl-based [20] command language to define models and configure analyses. The particular solvers available within the OpenSEES framework used for the comparisons here include the following: BandSPD: A Fortran-based solver built around LAPACK [21] routines, optimized for use with systems of equations with a relatively constant bandwidth. ProfileSPD: A C++ solver optimized for use with banded systems with varying bandwidth.

Fig. 5. Performance results for square 2-D mesh presented in Table 1. [Four log–log panels of solution time (seconds) versus number of DOFs: the OpenSEES solvers (BandSPD, ProfileSPD, SparseSPD); the Java solvers (tensor-based, 2 × 2 block, and 4 × 4 block, all LL); the C++ solvers (tensor-based LL/Array, 2 × 2 block LL/Array, 4 × 4 block LL); and a comparison of OpenSEES ProfileSPD with the Java and C++ 4 × 4 block (LL) solvers.]

SuperLU: A widely available solver optimized for use with sparse systems of equations [22]. OpenSEES uses a C-based implementation of the solver.

SparseSPD: A generalized C-based sparse solver for symmetric, positive-definite systems.

The source code for all these solvers is available as part of the OpenSEES package, and the latter two can also be obtained independently from the referenced web pages. Comparison performance results were also generated using UMFPack [23], a high-performance solver written in Fortran that uses a number of strategies to obtain high performance for general large-scale systems. It was difficult to obtain consistent solutions for the problems we considered, so we do not present timing results; for those cases where consistent results could be obtained, however, the performance was in a similar range to the


other OpenSEES solvers. It should be noted that OpenSEES is still in an active development phase, so once again we emphasize that the results presented here are intended to be indicative rather than definitive. Identical models were set up and run using OpenSEES and the solvers outlined above, and also using the Java and C++ implementations of the DOF and DOFBlock tensor-based solvers described earlier. For the most part, the sizes of the problems considered are such that the set-up time is relatively insignificant compared to the decomposition time, but the run-times reported here include the full stiffness set-up, factorization, and solution cycle (parsing of model descriptions, etc., is not included in the timings). In all cases multiple runs were used to ensure consistent timing, which was especially important for the Java tests, for which relatively wide variations are common due to Java's automatic memory management. The meshes were set up in such a way that the default orderings led to systems of equations that were naturally banded, with an essentially constant bandwidth. Running the meshes through a Reverse Cuthill-McKee [24] re-ordering provides little benefit in these cases, so the solvers were run using the default ordering. For the BandSPD and ProfileSPD solvers, this provides an essentially ideal match between the solver and the corresponding structure of the equations. In the case of SuperLU, SparseSPD, and the tensor-based solvers presented here, there is no inherent reliance on a banded equation structure, and one could expect efficient operation also in the case of sparse orderings, such as one would get from a minimum degree ordering algorithm [25]. Such alternative orderings were tested, with some improvement in performance for larger bandwidths.

For those cases in which blocks were used, the blocking was set up in a very simple manner: the default ordering was re-grouped directly into consecutive 2's or 4's, as appropriate. Thus, the resulting blocks were of the form {1,2}, {3,4}, {5,6}, ... for the 2-DOF blocks, and {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, ... for the 4-DOF blocks. In the case of the 2-DOF blocks in particular, this simple approach is likely to produce reasonably dense blocks, since the use of linked vector degrees of freedom fits the underlying problem characteristics closely. Given the breadth of results to be presented, it is worth taking a general overview before moving to detailed descriptions. Fig. 5 and Table 1 present timing results for the two-dimensional mesh for varying problem sizes. The four sub-plots in Fig. 5 depict the range of performance for the OpenSEES solvers, the Java solvers, and the C++ solvers, followed by a comparison of the best performers from each group. Fig. 6 and Table 2 summarize similar results for the 3-D case with the uniformly expanded mesh, while Fig. 7 and Table 3 contain results for the 3-D case with mesh expansion in one direction only. Fig. 8 shows the variation in behavior for this latter class of problems with increasing base dimensions, which translates directly into increasing equation bandwidth. SuperLU was not included in the 3-D results because it exceeded the machine's memory capacity relatively early on, due to its not using symmetric storage. The various implementations of the approach presented in this paper are identified in the tables and plots as follows: Java tensor-based and C++ tensor-based refer to Java and C++ implementations of straight tensor-based solvers with no additional blocking; Java 2 × 2/4 × 4 block and C++ 2 × 2/4 × 4 block correspond similarly to 2 × 2 and 4 × 4 block solvers; and the parenthetical notation (LL) and (Array) refers to linked

Table 1. Performance results for a square 2-D mesh.

Solver                      Problem size (number of scalar equations)
                            5202     13,122   20,402   29,282   39,762
BandSPD                     0.49     2.03     4.35     7.95     13.7
ProfileSPD                  0.34     1.39     2.85     5.30     9.05
SparseSPD                   0.48     1.91     3.85     7.05     11.8
SuperLU                     0.90     4.06     9.40     19.2     49.4
Java tensor-based (LL)      0.93     3.86     8.66     18.3     41.7
Java 2 × 2 block (LL)       0.60     3.02     6.99     13.5     25.3
Java 4 × 4 block (LL)       0.71     2.58     5.29     10.6     17.2
C++ tensor-based (LL)       0.29     1.41     4.74     15.9     40.1
C++ tensor-based (Array)    0.26     1.14     2.79     6.27     15.2
C++ 2 × 2 block (LL)        0.19     0.87     1.95     3.96     8.05
C++ 2 × 2 block (Array)     0.18     0.78     1.69     3.51     7.33
C++ 4 × 4 block (LL)        0.20     0.90     1.86     3.71     7.09

Solution times in seconds. Bandwidth varies with increasing problem size.

Fig. 6. Performance results for brick 3-D mesh presented in Table 2. [Four log–log panels of solution time (seconds) versus number of DOFs for the OpenSEES, Java, and C++ solvers, plus a comparison of OpenSEES BandSPD with the Java and C++ 4 × 4 block (LL) solvers.]

list and array-based implementations of the DOF interaction lists, respectively. Turning now to the results themselves, we note first that overall the Java and C++ block solvers performed well in general, with the Java 4 × 4 block solver even being among the fastest in some cases. The efficacy of basing computations of this nature on blocks is well documented, and the tensor data structures used here apparently provide a useful way of obtaining good blocking behavior. For Java in particular, the packaging of the fundamental inner-loop computations in a form requiring no sub-block indexing or looping works well, especially in the case of the 3-D problems. As can be seen by comparing the Java and C++ results for matching solvers in Tables 1–3, the performance differential between the two languages is much less in the 3-D


Table 2. Performance results for 3-D mesh.

Solver                      Problem size (number of scalar equations)
                            5184     10,125   14,739   20,577   24,000
BandSPD                     5.34     19.8     50.4     91.3     142.5
ProfileSPD                  6.72     29.2     66.3     138.9    196.5
SparseSPD                   7.28     29.9     66.9     139.7    198.7
Java tensor-based (LL)      9.16     49.1     120.3    267.2    385.5
Java 2 × 2 block (LL)       4.45     22.7     55.6     122.8    177.7
Java 4 × 4 block (LL)       3.04     14.6     33.6     72.9     105.9
C++ tensor-based (LL)       7.16     39.5     102.8    230.2    330.0
C++ tensor-based (Array)    4.87     29.8     72.3     160.2    235.4
C++ 2 × 2 block (LL)        2.92     17.4     42.0     96.8     136.5
C++ 2 × 2 block (Array)     2.64     15.5     38.3     85.0     120.9
C++ 4 × 4 block (LL)        2.24     11.3     27.0     60.0     85.4

Solution times in seconds. The same number of elements is used in the x, y, and z directions, so that bandwidth varies with increasing problem size.

case. In the 2-D case, there is less size difference between scalars and vectors, and so in general the effect of using tensors alone is not as marked. For C++, there generally was little performance difference overall between the 2 × 2 and 4 × 4 block cases, whereas for Java the further increase in block size continued to be more noticeably beneficial. An exception to this can be seen in the uniform-expansion 3-D results in Table 2, where the C++ 4 × 4 block solver performs significantly better than the 2 × 2 block solver for the larger problem sizes. As a point of reference for all these results, the C++ 2 × 2 block array solver achieved a maximum rate for the 2-D problems of about 49% of the theoretical floating-point rate of the processor. As mentioned in the introduction, the tensor-based solvers have been developed with a particular interest in live modeling contexts, and for the smaller problem sizes the data in the tables show that they perform well in these cases. Data for even smaller problems are not included in the tables, but in the case of problems with just a few thousand degrees of freedom, the localized, tensor-based approach presented here exhibits particularly good performance. OpenSEES is based on a strategy of relying on external solvers to service the C++ modeling code, and the overhead associated with this design becomes more apparent for smaller problems, in which set-up costs become more significant relative to solution costs. The integrated approach used here does not carry this same overhead. While all the various implementations of the localized, tensor-based approach performed well for smaller problems, in many cases the linked-list implementations did not scale particularly well. There are various explanations for this, one of the most significant being the relative overhead of using linked lists versus arrays [26,27]. Comparing the linked-list and array-based tensor solvers in Table 1, this difference can be seen quite clearly. The effect is exacerbated for larger problem sizes by an ultimately inefficient dynamic memory allocation scheme associated with using the default implementation of new for allocating the list data.⁴ For the 2 × 2 block solvers in the same table, however, the difference is much less pronounced, since the increased data granularity reduces the amount of container-management work that must be done. Similar behavior can be seen in Tables 2 and 3 as well. Another important factor is that the linked-list solvers are designed to work best with sparse data, and the overhead associated with relatively expensive indexing becomes problematic in the case of large, dense bands. Fig. 9 shows examples of improved performance when minimum degree ordering is used for both the plain tensor and 2 × 2 block linked-list solvers. The focus here is on relative performance, so the plots have been normalized with respect to the largest problem size. It should also be noted that the ordering time itself is not included in these results, primarily because we used a simple ordering implementation that would not be representative of typical ordering speeds. The point of including the results in Fig. 9 is simply to show that the schemes presented here can exploit the advantages of sparse ordering when these advantages are present. It should be noted, however, that sparse ordering does not always improve performance. With the crucial role played by high-speed

⁴ A pool-based allocation scheme would make more sense in this case, and we recently have been able to obtain linked-list performance on par with the array-based performance via this mechanism. The key is to provide each DOF with its own pool allocator.

Fig. 7. Performance results for brick 3-D mesh presented in Table 3. [Four log–log panels of solution time (seconds) versus number of DOFs for the OpenSEES, Java, and C++ solvers, plus a comparison of OpenSEES BandSPD with the Java and C++ 4 × 4 block (LL) solvers.]

caches in current architectures, the reduction in data locality and density associated with increased sparsity can easily cause overall performance to drop, even though substantially less computational work is being done. We observed this in our own testing: tracking the number of actual computations during factorization for cases with and without minimum degree ordering, we often saw a substantial lag in (or complete disappearance of) the performance cross-over point relative to the cross-over point in the number of computations. It was necessary to use different mesh aspect ratios for the regular and block cases to obtain the kind of results shown in Fig. 9. Thus, Fig. 9 shows best-case scenarios for minimum degree ordering, which are not necessarily representative across the board.


Table 3. Performance results for 3-D mesh.

Solver                      Problem size (number of scalar equations)
                            5082     10,890   20,328   30,129   40,293
BandSPD                     4.32     9.91     19.2     29.1     39.8
ProfileSPD                  4.91     11.5     22.4     34.2     46.7
SparseSPD                   5.46     12.9     24.6     37.4     50.0
Java tensor-based (LL)      5.69     13.9     27.1     40.9     54.9
Java 2 × 2 block (LL)       3.01     7.74     15.1     23.0     27.6
Java 4 × 4 block (LL)       2.32     5.81     10.6     17.1     20.0
C++ tensor-based (LL)       4.45     10.8     20.9     31.8     43.4
C++ tensor-based (Array)    2.99     7.09     13.7     20.6     27.8
C++ 2 × 2 block (LL)        1.83     4.39     8.60     13.2     18.1
C++ 2 × 2 block (Array)     1.78     4.26     8.21     12.3     16.6
C++ 4 × 4 block (LL)        1.66     3.91     7.68     11.7     16.1

Solution times in seconds. The number of elements in the x and y directions is fixed, and the number of elements in the z direction varies, so that a fixed bandwidth of 363 is used.

Finally, memory use is another important issue for solvers, since it can dramatically influence performance in cases where problem size causes spill-over into virtual memory. It should be noted that standard Java implementations have a default heap size of only 64 MB, so in order to run the larger problem sizes presented here it was necessary to override this default. Memory use was monitored for the tests presented above, and in general the various symmetric schemes were comparable, with the tensor-based approaches typically requiring a little less memory than the OpenSEES solvers. The fact that memory usage was similar across the board is consistent with what one would expect given the use of the natural ordering and conceptually similar data storage schemes.

The use of object-oriented techniques is often associated with unacceptable performance trade-offs, especially at the level of numerical computation. The outcome of the numerical tests presented here, however, shows that such performance trade-offs can be circumvented with suitable formulation and implementation strategies.

Fig. 8. Relationship between performance level and bandwidth. All the tested cases have the same number of DOFs (10,500). [Single panel of solution time (seconds) for bandwidths of 243, 270, 300, 330, 363, 396, 432, 468, and 507, comparing OpenSEES BandSPD and ProfileSPD with the C++ tensor-based (Array), 2 × 2 block (Array), and 4 × 4 block (LL) solvers.]

6. Conclusions

Tensors and vectors form the natural basis for much of the theoretical foundation of structural mechanics. This paper has shown that tensor and vector formulations can also be used productively and consistently in the context of numerical implementations. The benefits include those usually associated with object-oriented approaches: simplified coding due to the support for higher-level abstractions; more direct relationships between conceptual and implementation models; inherent modularity and extensibility; and so on. These benefits are particularly useful in the context of building systems with rich, interactive interfaces. The additional benefit demonstrated in this paper, however, is that one can also achieve competitive numerical performance. In many respects, this good performance can be obtained because of, rather than in spite of, the object-oriented approach.

We note in closing that the general possibilities and potential for tensor-based computation extend beyond the basic implementations of direct solvers presented here. In most cases, the strategies and techniques available for the analysis of general scalar-based systems of linear or nonlinear equations can be adapted to accommodate tensor and vector data types.

Fig. 9. Performance increase with minimum degree ordering. [Two panels, normalized solution time versus normalized number of DOFs with and without MDO, for the tensor-based solver (LL) and the 2 × 2 block solver (LL).]

References

[1] Miller GR. A LISP-based object-oriented approach to structural analysis. Eng Comput 1988;4:197–203.
[2] Baugh Jr JW, Rehak DR. Object-oriented design of finite element programs. In: Nelson JK, editor. Computer utilization in structural engineering. New York: American Society of Civil Engineers; 1989. p. 91–100.
[3] Forde BW, Foschi RO, Stiemer SF. Object-oriented finite element analysis. Comput Struct 1990;34:355–74.
[4] Miller GR. An object-oriented approach to structural analysis and design. Comput Struct 1991;40:75–82.
[5] Zimmermann T, Dubois-Pelerin Y, Bomme P. Object-oriented finite element programming: I. Governing principles. Comput Meth Appl Mech Eng 1992;98:291–303.
[6] Lucas D, Dressler B, Aubry D. Object-oriented finite element programming using the Ada language. In: Hirsch, et al., editors. Numerical methods in engineering '92. New York: Elsevier Science Publishers; 1992. p. 591–8.
[7] Ohtsubo H, Kawamura Y, Kubota A. Development of the object-oriented finite element system MODIFY. Eng Comput 1993;9:187–97.
[8] Archer GC, Fenves G, Thewalt C. A new object-oriented finite element analysis program architecture. Comput Struct 1999;70(1):63–75.
[9] Mackie RI. An object-oriented approach to calculation control in finite element programs. Comput Struct 2000;77(5):461–74.
[10] Yu LC, Kumar AV. An object-oriented modular framework for implementing the finite element method. Comput Struct 2001;79(9):919–28.
[11] Jang J. An assessment of iterative solution techniques for interactive finite element analysis. MS thesis. Department of Civil Engineering, University of Washington; 2001.
[12] Miller GR. Coordinate-free isoparametric elements. Comput Struct 1993;49(6):1027–35.
[13] Stroustrup B. The C++ programming language. 3rd ed. New York: Addison-Wesley; 2000.
[14] Rucki MD. An algorithmic framework for flexible finite element modeling. PhD dissertation. Department of Civil Engineering, University of Washington; 1997.
[15] Rucki MD, Miller GR. An algorithmic framework for flexible finite element-based structural modeling. Comput Meth Appl Mech Eng 1996;36(3–4):363–84.
[16] Rucki MD, Miller GR. An adaptable finite element modeling kernel. Comput Struct 1998;69(3):399–409.
[17] Demmel JW. Applied numerical linear algebra. SIAM; 1997.
[18] An open system for earthquake engineering simulation. Pacific Earthquake Engineering Research Center. Available from: http://opensees.berkeley.edu. 2000.
[19] McKenna F. Object oriented finite element programming: framework for analysis, algorithms, and parallel computing. PhD dissertation. Berkeley, CA: Department of Civil Engineering, University of California at Berkeley; 1997.
[20] Tcl Developer Site. Available from: http://www.tcl.tk. 2002.
[21] Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J, et al. LAPACK users' guide. 3rd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics; 1999.
[22] Demmel JW, Gilbert JR, Li XS. SuperLU users' guide. Technical report UCB//CSD-97-944. UC Berkeley: Computer Science Division; 1997.
[23] Davis TA, Duff IS. A combined unifrontal/multifrontal method for unsymmetric sparse matrices. Technical report TR-97-016. Computer and Information Science and Engineering Department, University of Florida; 1997.
[24] Meurant G. Computer solution of large linear systems. Studies in mathematics and its applications, vol. 28. North-Holland; 1999.
[25] George A, Liu JWH. The evolution of the minimum-degree ordering algorithm. SIAM Rev 1989;31(1):1–19.
[26] Bulka D, Mayhew D. Efficient C++: performance programming techniques. New York: Addison-Wesley; 2000.
[27] Black JR, Martel CU, Qi H. Graph and hashing algorithms for modern architectures: design and performance. In: Proceedings of the 2nd Workshop on Algorithm Engineering (WAE 98), Max-Planck-Institut für Informatik; 1998. TR MPI-I-98-1-019:37–48.