Future Generation Computer Systems 21 (2005) 969–979

Using Python for large scale linear algebra applications

Oliver Bröker a,*, Oscar Chinellato b, Roman Geus c

a Computational Laboratory (CoLab), Hirschengraben 84, ETH Zurich, 8092 Zurich, Switzerland
b Institute of Computational Science (ICoS), ETH Zurich, Switzerland
c Paul Scherrer Institute (PSI), 5232 Villigen, Switzerland

Received 11 October 2004; received in revised form 11 February 2005; accepted 16 February 2005. Available online 8 April 2005.

Abstract

Software used in scientific computing is traditionally developed using compiled languages for the sake of maximal performance. However, for most applications the time-critical portion of the code that requires the efficiency of a compiled language is confined to a small set of well-defined functions. Implementing the remaining part of the application in an interactive and interpreted high-level language offers many advantages without a large performance penalty. This paper describes the Pythonic approach, a mixed language approach combining the Python programming language with near operating-system level languages. We demonstrate the effectiveness of the Pythonic approach with a few small examples and with fragments of two large scale linear algebra applications. The essential advantage of the Pythonic mixed language approach is the combination of flexible, readable, shorter and, most importantly, less error-prone syntax with performance similar to pure Fortran or C/C++ implementations.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Python; C/C++; Fortran; Linear algebra; Multigrid; Eigenvalue solver

1. Introduction

Even today, in the age of object-oriented and component based software construction, the better part of numerical software is still written in Fortran or C for reasons of performance. Unfortunately, at the same time, these codes are known to be rather incomprehensible, and their documentation, if available at all, is often overwhelmed by inscrutable details.

* Corresponding author. Tel.: +41 1 632 7433.
E-mail addresses: [email protected] (O. Bröker), [email protected] (O. Chinellato), [email protected] (R. Geus).
doi:10.1016/j.future.2005.02.001

Moreover, many constructs in Fortran/C code involve complex references to arrays and structures, mostly due to the lack of powerful data-structures. Since it is sensible to use short variable names for indices that appear often, many such codes are full of cryptic references. Another observation is that typical library functions written in Fortran or C tend to have 10 parameters or more. Not only is this considered bad programming practice, it also makes code difficult to read, verify and debug. In addition, standard procedural programming style often leads to very long programs with repeated code, a circumstance that renders maintenance difficult.



Thus, the resulting development process suffers from unnecessary complexity, packages become non-transparent and adding features becomes utterly difficult. Also, the write, compile, link and test cycle becomes very time-consuming, especially if large libraries need to be rebuilt. Last but not least, established mathematical notation for numerical methods, which often provides compact and clear expression of the semantics, can seldom be translated into the above mentioned languages in an adequate way.

1.1. Requirements of applications from computational linear algebra

Any software design process is a balance of multiple factors. We list and discuss the factors that influenced our approach:

(R1) Correctness: the software must be correct.
(R2) Generality: the resulting software should, if possible, not be limited to a special problem, geometry, discretisation technique or resulting matrix class.
(R3) Efficiency: the resulting routines should use minimal (and predictable) amounts of memory and computing time.
(R4) Reusability: the written software should be reusable in other contexts.
(R5) Robustness: the methods should neither suddenly break down, nor behave in any other unstable fashion.
(R6) Simplicity: the created interfaces should be simple to understand, use, extend and refactor.
(R7) Parallelism: good parallel speedup on distributed memory machines with thousands of processors is very desirable.

While the emphasis of different users may vary, ideally all requirements should be fulfilled at the same time. The approach presented in the following strikes a good balance between requirements (R1)–(R6). It is important to note that these requirements are by no means mutually exclusive. Indeed, the goals of object-oriented programming in general overlap heavily with the items listed above. Using an environment that is based on the modular programming approach naturally yields improved software correctness (R1). Writing small modules and classes also makes the resulting software simple, general and reusable in other contexts (R2, R4, R6).

Maximal efficiency of the code is eventually guaranteed by implementing the computationally expensive routines in a near operating-system level language (R3). The efficient parallelisation of a given algorithm (R7) is desirable, but usually difficult. A successful parallelisation can only be obtained with exact knowledge of a given fixed method. Our approach, in contrast, aims at the efficient production of flexible prototype code. We therefore restrict our attention to sequential programming.

1.2. Overview of classical programming languages

Numerical linear algebra is a critical tool in technical computing. The range of programming languages suitable for applications in this area spans Fortran77 to today's scripting languages, each of which has advantages and drawbacks. Here is a short summary of the most common experiences with a few popular approaches:

Fortran77 and its descendants 90/95/etc. provide portable and fast compilers which generate efficient machine executables. The well known BLAS and LAPACK software [1] is just one example of the numerous libraries in the area of numerical linear algebra. Despite its age, Fortran is still widely used for exactly these reasons. While it offers features valuable for linear algebra computations, like intrinsic complex arithmetic and vector notational constructs, it lacks fundamental programming constructs like dynamic memory allocation, structures, objects, and pointers. Fortran90 has resolved only some of these issues, but is still limited when compared to C++, for example.

C is very popular in scientific computing, because it supports many features that Fortran77 lacks, while suffering practically no loss in performance. C supports dynamic memory allocation, structures and pointers. Furthermore, it is possible to use Fortran77 libraries like BLAS and LAPACK from C. On the other hand, C lacks Fortran's assumed-shape arrays, which make working with arbitrarily sized arrays very convenient. Many numerical C libraries (e.g. PLAPACK [2]) use BLAS and LAPACK concepts.


C++ has become popular in scientific computing because of its efficiency combined with its object-oriented features and its support for generic programming using the template mechanism. A number of recent large software projects like Trilinos [3], Diffpack [4], etc. use C++. However, developing large and well-structured software packages requires skilled developers, since the language has become very complex and new features are still being added. Another problem is the lack of garbage collection, which makes it hard to avoid memory leaks.

MATLAB is one of the most widely used languages for prototyping applications in technical computing. It is interpreted and can be used interactively. MATLAB's library contains hundreds of mathematical functions, many of them related to linear algebra. That renders it an attractive language for prototyping numerical algorithms. MATLAB's drawbacks lie in its inflexible language and its poor performance, particularly where sparse matrices are concerned. (MATLAB's inappropriate sparse matrix data structure is the main reason for its poor performance. One could possibly implement a completely different sparse matrix interface using MEX; unfortunately, such an interface would be very limited, e.g. subscripting operators could not be implemented.) Additionally, MATLAB is the only tool in this list that requires a commercial license. For prototyping we find MATLAB the most suitable tool in its field, especially due to its superior graphics capabilities.

Java byte-code has been shown to be slower by a factor of 2–20 for sample applications [5], depending on the compiler and on whether the code was executed on a Java virtual machine or just-in-time compiled. This fact explains the small role Java plays in the area of computational science.

Perl, through the Perl Data Language (PDL), gives standard Perl the ability to compactly store and speedily manipulate large multi-dimensional data arrays. Similar to MATLAB, one can write simple Perl expressions to manipulate entire numerical arrays at once. Unfortunately, Perl's object-oriented interface is far from elegant, which makes the language less attractive for large applications.


Python is a portable, interpreted, object-oriented programming language. The language has an elegant syntax and contains a number of powerful high-level built-in data types. It can be extended by adding modules for functions, variables, and new object types, implemented in a compiled language such as C or C++.

There is clearly no "best language" suitable for scientific computing. However, a recent article by Prechelt [6] compares seven current programming languages, including Python, concluding that "scripting languages [. . . ] offer reasonable alternatives to C and C++, even for tasks that must handle fair amounts of computation and data". Consequently, we propose a mix of languages for developing large scale linear algebra software. We opt for an approach that uses a clever combination of Python, C/C++ and Fortran code: the Pythonic approach. In the following section we detail more clearly what we mean by the Pythonic approach and discuss its implications.

2. The Pythonic approach

The Pythonic approach mainly combines Python, Numerical Python, PySparse (available at http://pysparse.sourceforge.net/) and C/C++ to create an easy to use interface to efficient libraries suitable for algorithms in large scale linear algebra. In this section we introduce the Python programming language, the Numerical Python module and our PySparse package for sparse matrix computations. We then summarise the framework that our applications are built on.

What we call the Pythonic approach is actually a mixed-language programming approach: the application logic and higher level routines are implemented in Python, while the time-critical parts, like sparse and dense linear algebra routines, iterative solvers, preconditioners, sparse matrix factorisations, and eigensolvers, are implemented in C/C++. Since the C/C++ code is tightly integrated into the Python framework, there is no visible difference to pure Python code for the user.




Fig. 1. Pythonic application architecture.

From the application user’s point of view, other improvements are also important: the computation can now be steered by scripts, which usually consist of only a few lines of Python code. An additional benefit is the fact that the computational structure is directly reflected in these scripts and thus becomes much more transparent (Fig. 1).
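For illustration, a complete steering script in this architecture might look as follows. This is a sketch only: it uses the PySparse interface introduced in Section 4.1, and the input file name is hypothetical.

    import spmatrix, itsolvers_util
    import Numeric

    # Load a matrix stored in Matrix-Market format (hypothetical file),
    # set up a CG solver and solve; all heavy numerical work runs in C.
    A = spmatrix.ll_mat_from_mtx('stiffness.mtx')
    n = A.shape[0]
    b = Numeric.ones(n, 'd')
    x = Numeric.zeros(n, 'd')
    solver = itsolvers_util.Pcg(A.to_sss(), 1e-8, 200, None, 1)
    solver.solve(b, x)

A few such lines are typically all the application user ever sees or edits.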

2.1. The Python language

Python is an interpreted, interactive, object-oriented programming language. It combines remarkable power with very clear syntax. It provides modules, classes, exception handling, very high level dynamic data types, and dynamic typing. There exist interfaces to many system calls and libraries. New extension modules can easily be written in a natively compiled language like C or C++. Python is also usable as an extension language for applications that need a programmable interface.

Python has a full set of string operations (including regular expression matching), and frees the user from most hassles of memory management through an automatic reference counting mechanism. These and other features make it an ideal language for prototype development and other ad hoc programming tasks. Python supports the development of large programs, even though it lacks most forms of compile-time checking: a program can be constructed out of a number of modules, each of which defines its own name space, and modules can define classes which provide further encapsulation. Exception handling makes it possible to catch errors where required without cluttering the code with error checking.

A large number of modules have been developed for Python. Some are part of the standard library of tools, others are specific to a particular platform or environment. The Python implementation is available for many brands of UNIX, on Windows, DOS, OS/2, Macintosh, Amiga, NeXT and BeOS, to name a few. Documentation of the Python language and many references thereto can be found on the main Python website (http://www.python.org). Most useful when working with Python are the introduction to Python and the complete Library Reference. The Python to C interface is described in a separate document.

As an interpreted general-purpose programming language, Python itself is not well suited for high-performance numerical applications off-the-shelf. The Numerical Python and PySparse packages address this shortcoming by extending the language with dense and sparse arrays and a set of operations on these. Both packages follow a mixed programming language approach: performance-critical routines are implemented in C, C++ or Fortran, while higher level functionality is written in Python.

2.2. Numerical Python (NumPy)

The standard Python array data type is called sequence. This data type is so general that it becomes slow when operating with millions of entries. Thus Numerical Python was created for manipulating large multi-dimensional arrays. The functionality of NumPy is very similar to that of MATLAB: NumPy adds a powerful array notation to the Python language by means of a rich set of operators and functions that support basic linear algebra, signal processing and random number generation tasks. References on Numerical Python can be found on the NumPy website (http://www.pfdubois.com/numpy/) and the references therein.

2.3. PySparse

PySparse extends the Python interpreter by a set of sparse matrix types [7]. The entries of such a sparse matrix can be accessed conveniently from Python using 2D array indexing. Submatrices can be accessed similarly using index ranges. One sparse matrix type (ll_mat) is based on a linked list data-structure and is designed for efficiently creating or modifying matrices.



Another sparse matrix type (csr_mat) is designed for memory efficiency and fast row-by-row access to its elements, which is desirable for matrix–vector multiplication. PySparse can store symmetric matrices efficiently (sss_mat type) and provides conversion routines between the different formats. Among the operations related to sparse matrices, the package implements

• matrix–vector and matrix–matrix multiplication;
• a set of iterative methods for solving linear systems of equations;
• a set of standard preconditioners;
• a direct solver for sparse linear systems of equations (an interface to SuperLU);
• an eigenvalue solver for the symmetric, generalised matrix eigenvalue problem (JDSYM).

PySparse has been designed with reuse and extensibility in mind. Sparse matrices, preconditioners and solvers are Python objects. For performance reasons, most objects in PySparse are implemented as extension types. Interoperability between these objects is ensured by imposing certain standards on their attributes and methods. Every Python object having a shape attribute, which describes the dimensions of a matrix, and a matvec method, for performing a matrix–vector multiplication, can be passed as a matrix to PySparse routines. Preconditioners have a shape attribute and a precon method, which applies the preconditioner to one vector and stores the result in another vector. Analogously, a solver has a shape attribute and a solve method. In this way, e.g. a new preconditioner type can be introduced without changing any of the existing library code in PySparse. Only the script creating the preconditioner object needs to be adjusted. The use of the object remains the same: it is just a matter of calling the precon method.
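Because only these attribute and method conventions matter, even a matrix-free operator can be passed to PySparse routines. The following minimal sketch (our own illustration, not taken from PySparse) implements a 1D Laplacian that is never assembled as a matrix:

    class lap1d:
        # Looks like a matrix to PySparse: provides 'shape' and 'matvec'.
        def __init__(self, n):
            self.shape = (n, n)
        def matvec(self, x, y):
            # y = A*x for the tridiagonal stencil (-1, 2, -1)
            n = self.shape[0]
            for i in xrange(n):
                y[i] = 2.0 * x[i]
                if i > 0:
                    y[i] -= x[i - 1]
                if i < n - 1:
                    y[i] -= x[i + 1]

An instance of lap1d can then be handed to the iterative solvers in place of an assembled sparse matrix; in production code the loop body would of course be implemented in C.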


3. Code snippet comparisons

In the introductory section we claimed that the Python language in most cases permits writing elegant, clear, and compact code. To underline this statement we provide a few examples, which attest to the competitiveness of Python's built-in data-structures and syntax. The code snippets address various areas of programming, including function interface design, operator overloading, abstract data types, sorting, and I/O facilities.

3.1. Function calls

Function definitions and calls are ubiquitous in any modular code. Thus, their simplicity is of major importance. Python supports a mix of positional, keyword and variable argument lists, allowing default values. This approach is superior to all languages under consideration in this paper, in that it unifies all established concepts in a simple way.

The first example illustrates the advantages of the Python approach. The SPAI (sparse approximate inverse) software [8] provides functions to compute an approximate inverse. The spai routine has six parameters with appropriate default values, where the first parameter is of primary interest. In C, changing only one parameter requires the repetition of all other values as well, see Fig. 2(a). This redundancy is error-prone and does not reflect the essential change. A corresponding Python routine can conveniently specify the single altered argument as a named argument, while still keeping the default values, see Fig. 2(b).

Fig. 2. Code comparison: argument lists: (a) C code; (b) Python code.
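Since the figure is reproduced here only as a caption, the following sketch illustrates the Python variant of Fig. 2(b). The parameter names and default values are invented for illustration; the actual SPAI interface may differ.

    def spai(A, eps=0.6, nsteps=5, max_new=5, cache_size=8, verbose=0):
        # ... compute a sparse approximate inverse of A ...
        pass

    # Changing a single parameter does not require repeating the others:
    M = spai(A, nsteps=10)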



Fig. 3. Code comparison: sparse matrices: (a) MATLAB code; (b) Python code.

3.2. Sparse matrix assignment and reference

Matrices are naturally handled using the traditional mathematical notation A(i, j) for element reference and assignment. Most programming languages support such multi-dimensional indexing for memory-contiguous data. Sparse matrices, conversely, require more sophisticated data-structures, rendering natural indexing impossible. Instead, element reference and assignment must be hidden in accessor functions. This distinction between interfacing a contiguous memory block and an abstract data type is unnecessary. MATLAB's sparse matrix type allows elegant indexing, see Fig. 3(a). This convenient syntax can be replicated by Python objects with overloaded __getitem__ and __setitem__ methods, see Fig. 3(b). The resulting Python code matches the practical MATLAB example almost literally.
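To make the mechanism concrete, the following sketch (our own illustration, far simpler than PySparse's C implementation) shows a dictionary-backed matrix class that supports the A[i, j] syntax through the special methods:

    class dict_mat:
        # Minimal sparse matrix: unset entries read as zero.
        def __init__(self, n):
            self.shape = (n, n)
            self.data = {}
        def __getitem__(self, key):
            return self.data.get(key, 0.0)
        def __setitem__(self, key, value):
            self.data[key] = value

    A = dict_mat(100)
    for i in xrange(100):
        A[i, i] = 2.0          # natural mathematical notation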

3.3. Sorting

Creating abstract data types (ADTs), such as lists of lists or mapping types, in C/C++ requires either implementing them by hand or using standard libraries such as the STL (standard template library). Fig. 4 shows an example of sorting a list of lists with respect to the second item of each sublist. The C code is technically similar to the Python code, as it uses a reference to a compare function as an argument to the sort routine. Still, the C code contains some details about object sizes and list lengths that become superfluous in the Python variant, thus making the Python code more compact, flexible, and most importantly less error-prone. For mapping types this advantage becomes yet more obvious, even when using C++ in combination with the STL.
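In Python the whole task reduces to a few lines; the following sketch is in the spirit of Fig. 4(b):

    # Sort a list of lists with respect to the second item of each sublist.
    data = [['b', 3], ['c', 1], ['a', 2]]
    data.sort(lambda u, v: cmp(u[1], v[1]))
    print data                 # [['c', 1], ['a', 2], ['b', 3]]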

3.4. String manipulation

Strictly speaking, there are no character strings in C or Fortran, just arrays of single characters. C++ string classes are much better, but their capabilities are not as elaborate and their use does not show the striking elegance of string manipulation of today's scripting languages.

Fig. 4. Code comparison: sorting: (a) C code; (b) Python code.



Fig. 5. Code comparison: string manipulation: (a) C code; (b) Python code.

Fig. 5 demonstrates the versatility and convenience of the Python built-in string data type. In the context of scientific computing we have used Python's string capabilities extensively for automatic code generation, program control, text processing and data import/export.
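As a small illustration of these capabilities (our own example, not taken from Fig. 5), formatting a result line and extracting parameters from a control string each take a single statement:

    import re

    # Format a report line and split it back into its fields.
    line = '%-10s %12.4e %5d' % ('poisson2d', 3.1415e-3, 42)
    name, resid, iters = line.split()

    # Extract key=value pairs from a control string.
    params = dict(re.findall(r'(\w+)=([\w.-]+)', 'tol=1e-9 maxit=100'))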

4. Project examples

The concise code snippets in the previous section demonstrate the effectiveness of the Python language for small examples. In this section we present two larger projects carried out using the Pythonic approach in the area of large scale linear algebra. We put special emphasis on the readability of the uppermost interface of the code and on the execution times.

4.1. Preconditioned iterative and direct solvers

This section illustrates the solution of linear systems of equations (LSE) with PySparse and NumPy by solving the Poisson equation discretised on the unit square in two ways: directly and iteratively. As demonstrated in Section 3.2, the sparse system matrix can be created by assigning individual entries using convenient index syntax. Alternatively, PySparse matrix objects can also be created by reading files in Matrix-Market coordinate format, which creates an ll_mat object:

    import spmatrix
    A = spmatrix.ll_mat_from_mtx('poi2d_200.mtx')

PySparse, as mentioned earlier, provides an interface to the sparse direct solver SuperLU. In a first example we demonstrate how the LSE can be solved directly by factorising the system matrix and subsequently performing forward and back substitution. The SuperLU package accepts matrices in CSR format, to which end A is converted using the to_csr() member function. The remaining parameters instruct SuperLU to perform MMD reordering and to skip pivoting, which is justifiable for SPD matrices.

    import superlu
    Alu = superlu.factorize(A.to_csr(), permc_spec=2,
                            diag_pivot_thresh=0.0)

The resulting object Alu holds the lower- and upper-triangular factors L and U, respectively. Its nnz attribute reflects the total number of non-zero values in the factors. The dense data-structures required during the solution process are provided by the NumPy extension. We create a right hand side b of appropriate size with all unit entries. The 'd' parameter specifies that a vector of double precision values is created.



    import Numeric
    n = A.shape[0]
    b = Numeric.ones(n, 'd')

We allocate an additional vector x (initialised with zero entries) for storing the solution. To compute it by means of forward and back substitution, the solve method of Alu is called.

    x = Numeric.zeros(n, 'd')
    Alu.solve(b, x)

Note that the Alu object could be reused to solve the system for other right hand sides. Next the residual r = Ax − b and its Euclidean norm are calculated.

    import math
    r = Numeric.zeros(n, 'd')
    A.matvec(x, r)                       # r = A*x
    r -= b
    nrm2 = math.sqrt(Numeric.dot(r, r))

As mentioned before, PySparse includes a number of iterative methods for solving LSEs. In the following we show how to solve our 2D-Poisson system with the preconditioned conjugate gradient method (PCG). We begin by constructing a Pcg object, specifying all necessary parameters for the CG iteration.

    import itsolvers_util
    Acg = itsolvers_util.Pcg(A.to_sss(), 1e-9, 100, None, 1)

Note that the system matrix is first converted to SSS format for efficient matrix–vector multiplication. The actual CG iteration is then started by calling the solve method of Acg.

    x = Numeric.zeros(n, 'd')
    Acg.solve(b, x)

To apply preconditioning, in this case symmetric successive overrelaxation (SSOR), a preconditioner object has to be created, which is then passed as an argument to the constructor of the solver object:

    import precon
    Assor = precon.ssor(A.to_sss())
    Acg = itsolvers_util.Pcg(A.to_sss(), 1e-9, 100, Assor, 1)
    x = Numeric.zeros(n, 'd')
    Acg.solve(b, x)

As can be seen from the above code sequences, the solve step is invoked by a mere call to the solve method of the corresponding solver object. Thus a solver object can easily be integrated into a larger application using the above mentioned interface. Only the creation of the solver objects needs to be specialised.

We conclude the section by demonstrating the simplicity of creating a new preconditioning operator. The class diag_prec implements a Jacobi preconditioner in the PySparse framework:

    class diag_prec:
        def __init__(self, A):
            self.shape = A.shape
            n = self.shape[0]
            self.dinv = Numeric.zeros(n, 'd')
            for i in xrange(n):
                self.dinv[i] = 1.0 / A[i,i]
        def precon(self, x, y):
            Numeric.multiply(x, self.dinv, y)
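The new preconditioner then plugs into the solver exactly like the built-in SSOR object above. A usage sketch, reusing the names A, b and n from the preceding snippets:

    # Pass the user-defined Jacobi preconditioner to PCG.
    Ajac = diag_prec(A)
    Acg = itsolvers_util.Pcg(A.to_sss(), 1e-9, 100, Ajac, 1)
    x = Numeric.zeros(n, 'd')
    Acg.solve(b, x)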

4.1.1. Performance comparison of Python, MATLAB and C++

In order to validate our approach in terms of efficiency, we benchmark two code snippets for constructing a sparse matrix and solving the corresponding LSE, using three approaches: our Pythonic mixed language approach, MATLAB's sparse matrix facilities and pure C++. The experiments were conducted on a standard Linux system with one 2.4 GHz Intel Pentium 4 processor, 1 GB of main memory, and 512 KB L2 cache.

The first code snippet constructs a 2D-Poisson matrix and consists of a double-nested loop with up to five matrix element assignments per inner iteration. (The actual code is very similar to the snippet constructing a 1D-Poisson matrix shown in Section 3.2.) The second code snippet solves the linear system with the 2D-Poisson matrix using the conjugate gradient method (CG) without preconditioning.

Table 1
Execution times (in seconds) for constructing and solving 2D-Poisson systems

             Native C++             PySparse               MATLAB 6.5
  Size n     tconstr    tsolve      tconstr    tsolve      tconstr     tsolve
  100^2      0.005        0.19      0.06         0.21         4.51       0.62
  300^2      0.063        6.21      0.58         6.65       608.53      19.59
  500^2      0.186       26.93      1.60        30.91      4648.10      83.45
  1000^2     0.936      218.56      6.63       248.73          N/A     687.27

The CG iteration is stopped when the initial residual norm is reduced by a factor of 10^-8.

The numbers in Table 1 show that the matrix construction in Python is roughly an order of magnitude slower than its C++ counterpart. This is due to the fact that the Python virtual machine spends most of the execution time interpreting the double-nested loop. Only little time is actually spent assigning the matrix elements, which is implemented in C.

The solution times of the Pythonic approach and the pure C++ implementation are comparable. Most of the work is carried out in the CG iteration and actually consists of executing matrix–vector products. Here, the ratio of interpretation overhead to core numerical work is far more favourable than in the construction phase. In conclusion, it is mandatory to write time-critical code sections in native C/C++ (or any other near operating-system level language) in order to achieve good performance.

MATLAB implements sparse matrices using the compressed sparse column (CSC) format. This format not only keeps memory consumption low, but also allows for an efficient matrix–vector multiplication due to its coherent storage layout. Unfortunately, inserting new matrix elements requires numerous time-consuming memory operations, a property which renders this storage scheme unsuitable for matrix assembly. Consequently, the resulting construction by element-wise insertion, as can be seen from the timings in Table 1, cannot compete with more appropriate data-structures like ll_mat. (Using the spdiags or kron commands is a way to achieve better performance for this example. However, these are restricted to the very special case of quite uniform sparse matrices, which usually do not occur in research applications.)

For the solution phase the performance gap is smaller, but MATLAB is still roughly three times slower than PySparse or native C++.


Detailed timings revealed that both the matrix–vector multiplication and the interpreted CG solver are considerably slower than their PySparse counterparts. An exact analysis of this behaviour is, however, not possible, since relevant parts of MATLAB's source code are undisclosed.

4.2. Algebraic multigrid

Multigrid (MG) methods are fast linear iterative solvers based on the multilevel or multi-scale paradigm, see e.g. [9]. The typical application for MG is the numerical solution of partial differential equations (PDEs) in two or more dimensions. The algorithm can be applied in combination with any of the common discretisation techniques, in which case it is among the fastest solution techniques known today. In contrast to other methods, MG is general in that it can treat problems on arbitrary regions, with various boundary conditions and possibly strongly varying coefficients. MG is also directly applicable to non-symmetric and nonlinear systems of equations. For suited problems, MG exhibits convergence rates that are independent of the number of unknowns in the discretised system. It is therefore an optimal method. In combination with nested iteration it can solve systems to machine precision in a number of operations proportional to the number of unknowns. See [9] for more details on the (A)MG method.

For applying MG the user needs to carefully construct the problem on a nested sequence of coarse grids, assemble transfer operators, and define smoothers on each level, a process that can unduly complicate matters. Lack of coarse geometries or mismatch of the components are just two of the recurring difficulties of geometric MG. In their seminal paper [10], Ruge and Stüben introduced an extension of multigrid where no PDE and no geometrical problem background is needed to construct the multilevel hierarchy. Such algebraic multigrid (AMG) methods construct their operator hierarchy directly from the system matrix and thus become true black box solvers for sparse matrices. As structured geometric grids for complex geometries are difficult to generate, application code designers often turn to very large unstructured grids. Yet the lack of a natural grid hierarchy prevents the use of standard multigrid. In this context, AMG is often seen as the most promising method for solving large-scale problems.



Fig. 6. Performance timings of three AMG codes: (a) setup times; (b) solution times.

We have implemented AMG using the Pythonic approach as described earlier. Our Pythonic implementation of AMG is called WolfAMG [11] and features the classical Ruge/Stüben algorithm with a set of smoothers, interpolation operators and coarsening strategies, as well as several newly developed sparse approximate inverse techniques. Python's functional interface with optional arguments allows for a very simple first AMG application:

    import fdiff, amg
    A = fdiff.poisson2d(n=128)  # 128x128 2D-Poisson problem
    K = amg.amg(A)              # setup of the multigrid cycle
    u = K.solve()               # solution phase

AMG is naturally modular. The MG hierarchy is defined by a set of strong connections, a coarsening scheme, interpolation and smoothing methods, etc. These are typically the same for all levels, but in special cases it can be useful to use different modules on different levels. However, one does not know in advance how many levels are to be created. Using Python we solved this problem by means of the intrinsic sequence type, which can contain any object, and the Prototype design pattern, see [12]. Here is an example of how to specify the smoother:

    K = amg.amg(A)
    Sjac = smoother.jac(n=3, omega=0.6)
    Sgs = smoother.gs(n=2)
    K.setS([Sjac, Sgs], [Sgs])
    u = K.solve(b)

In this example we use the setS method of an amg object to set the pre- and post-smoothers. It accepts a list of smoothing objects which define part of the MG cycle. For all levels beyond the second, the smoother corresponding to the last element in the list is used. Copies of additional smoother objects are obtained using the Prototype design pattern. Setup of the multigrid cycle for other modules works in the same manner.
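The Prototype pattern mentioned above boils down to cloning a configured object once per level. A minimal sketch (our own illustration; WolfAMG's actual classes may differ):

    import copy

    class jac_smoother:
        # A configured smoother instance acting as a prototype.
        def __init__(self, n=1, omega=1.0):
            self.n = n
            self.omega = omega
        def clone(self):
            return copy.deepcopy(self)

    proto = jac_smoother(n=3, omega=0.6)
    # One independent copy per level, however many levels get created:
    smoothers = [proto.clone() for level in range(5)]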

AMG codes are rare. We compare our implementation to the only two other open source codes that are general enough and freely available. The first fairly general AMG program, AMG1R5, is described in [10]. It is a 4600 line Fortran77 implementation of the original Ruge/Stüben approach. ML is a parallel multigrid equation solver/preconditioning package (see http://software.sandia.gov/Trilinos/packages/ml/index.html for more information) that provides several different multigrid methods, including three algebraic schemes: smoothed aggregation, edge element multigrid, and classical AMG.

We compare setup and solution times for increasing grid sizes. One step of symmetric Gauss–Seidel was chosen as the smoother. We tuned the tolerance of the iteration in such a way that all three codes reduce the absolute residual norm of the solution to the same order of magnitude, i.e. 10^-8. The results depicted in Fig. 6 show the setup time tsup and solution time tsol required for solving the Poisson problem on a regular grid against the grid size n. The graphs clearly show that there is no significant difference between the performance of these codes.


From practical experience with AMG on more complicated problems it becomes obvious that the solution time for a given problem depends less on the programming language or implementation used than on the proper use of AMG, i.e. the appropriate choice of components and parameters.

5. Conclusions

We have presented the Pythonic approach, a mixed language approach for developing reusable numerical application components for large scale linear algebra software. This approach provides powerful scripting capabilities for numerical codes, while retaining only a minimal overhead, provided all time-consuming calculations are carried out in C or C++. In particular, we have shown how our implementation of a sparse matrix library combines the natural syntax of MATLAB's sparse matrix interface with a performance similar to that of pure C/C++ implementations. We demonstrated this for two larger projects: an iterative solver package and an algebraic multigrid solver. Additionally, the approach is based on open-source software: development of the code is transparent and does not require a commercial software license.

References

[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, D. Sorensen, LAPACK Users' Guide, Release 2.0, SIAM, Philadelphia, PA, 1994. Software and guide are available from Netlib at http://www.netlib.org/lapack/.
[2] P. Alpatov, G. Baker, C. Edwards, J. Gunnels, G. Morrow, J. Overfelt, R. van de Geijn, Y.-J.J. Wu, PLAPACK: parallel linear algebra package, in: Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, Minneapolis, MN, SIAM, Philadelphia, PA, 1997, p. 8 (electronic).
[3] M. Heroux, R. Bartlett, V.H.R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring, A. Williams, An overview of Trilinos, Technical Report SAND2003-2927, Sandia National Laboratories, 2003.
[4] E. Arge, A.M. Bruaset, P.B. Calvin, J.F. Kanney, H.P. Langtangen, C.T. Miller, On the numerical efficiency of C++ in scientific computing, Numer. Meth. Softw. Tools Ind. Math. 119 (1997) 93–119.
[5] O. Bröker, Laufzeitvorhersagen für parallele Versionen des globalen Wettermodells GME, Master's Thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, März 1998.


[6] L. Prechelt, An empirical comparison of seven programming languages, Computer 33 (10) (2000) 23–29.
[7] R. Geus, The Jacobi–Davidson algorithm for solving large sparse symmetric eigenvalue problems, Ph.D. Thesis No. 14734, ETH Zurich, 2002.
[8] M.J. Grote, T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18 (3) (1997) 838–853.
[9] U. Trottenberg, C. Oosterlee, A. Schüller, Multigrid, Academic Press, 2001.
[10] J.W. Ruge, K. Stüben, Algebraic multigrid (AMG), in: S.F. McCormick (Ed.), Multigrid Methods, Frontiers in Applied Mathematics, vol. 3, SIAM, Philadelphia, PA, 1987, pp. 73–130.
[11] O. Bröker, Parallel multigrid methods using sparse approximate inverses, Ph.D. Thesis, ETH Zurich, May 2003.
[12] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns, Addison-Wesley, 1995.

Oliver Bröker received his Dipl. Inf. degree from the RFW University of Bonn in 1998 and his PhD degree from the Swiss Federal Institute of Technology (ETH), Zurich, in 2003. He currently holds a postdoctoral position at the Institute of Computational Science of the Department of Computer Science at ETH. His main research interests are computational linear algebra, (algebraic) multigrid methods, and software engineering.

Oscar Chinellato received his Dipl. Ing. degree in Computer Science from the Swiss Federal Institute of Technology (ETH), Zurich, in 1999. He is currently working in the Institute of Computational Science of the Department of Computer Science at ETH, where he is writing a PhD thesis. His research interests focus on numerical methods for partial differential equations, software engineering, and numerical analysis in general.

Roman Geus received his Dipl. Ing. degree and his PhD degree in Computer Science from the Swiss Federal Institute of Technology (ETH), Zurich, in 1996 and 2002, respectively. He currently holds a postdoctoral position at the Large Research Facilities Department (GFA) of the Paul Scherrer Institute (PSI). He is involved in developing finite element applications and eigenvalue solvers for computing electromagnetic fields in accelerator cavities, and is interested in all aspects of software engineering in high-performance and parallel computing.