Automated scientific software scripting with SWIG


Future Generation Computer Systems 19 (2003) 599–609

D.M. Beazley
Department of Computer Science, University of Chicago, Chicago, IL 60637, USA

Abstract

Scripting languages such as Python and Tcl are a powerful tool for the construction of flexible scientific software because they provide scientists with an interpreted problem solving environment and they provide a modular framework for controlling software components written in C, C++, and Fortran. However, a common problem faced by the developers of a scripted scientific application is that of integrating compiled code with an interpreter. To solve this problem, an extensible compiler, simplified wrapper and interface generator (SWIG), has been developed to automate the task of integrating compiled code with scripting language interpreters. SWIG requires no modifications to existing code and uses existing source to create bindings for nine different target languages including Python, Perl, Tcl, Ruby, Guile, and Java. By automating language integration, SWIG enables scientists to use scripting languages at all stages of software development and allows existing software to be more easily integrated into a scripting environment. Although SWIG has been in use for more than 6 years, little has been published on its design and the underlying mechanisms that make it work. Therefore, the primary goal of this paper is to cover these topics. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Scientific software; SWIG; Scripting languages; Python; Interface compiler

1. Introduction

One of the most difficult tasks faced by the developers of scientific software is figuring out how to make high-performance programs that are easy to use, flexible, and extensible. Clearly, these are desirable goals if scientists want to focus their efforts on solving scientific problems instead of problems related to software engineering. However, the task of writing such software is difficult; in fact, it is much more difficult than most software engineers are willing to admit. An increasingly popular approach to solving many of these problems is to use scripting languages such as Python or Tcl as a high-level steering language for software written in C, C++, or Fortran [10,12,13,17,19,23]. Scripting languages provide a programmable user interface that is interactive and which allows a scientist to more easily coordinate complicated simulation and analytical tasks. In addition, scripting languages provide a framework for building loosely-coupled software components and gluing those components together [16]. This often makes it easier to incorporate simulation, data analysis, visualization, and data management into an application. Scripting languages also have the benefit of being portable and straightforward to deploy on high-performance computing systems including supercomputers and clusters.

Initially, many scientists are drawn to scripting languages because they are viewed as the next logical step in the user interface of a program. For example, when programs are first developed, they are often simple batch jobs that rely upon command line options or files of input parameters. As the program grows, scientists usually want more flexibility to configure the problem, so they may modify the program to ask the user a series of questions. They may even write a simple command interpreter for setting up parameters. Scripting languages build upon this by providing an interface in the form of a fully-featured programming language that provides features similar to those found in commercial scientific software such as MATLAB, Mathematica, or IDL. In fact, many scientists add an interpreter to their application just so they can emulate the interactive debugging, prototyping, and exploratory environment provided by these systems.

At a deeper level, the use of scripting languages may be driven by the piecemeal software development process that characterizes a lot of scientific software projects. Rarely do scientists set out to create a generalized program for solving every possible problem. Instead, programs are created to solve a specific problem of interest. Later, if a program proves to be useful, it may be gradually adapted and extended with new features in order to solve closely related problems. Although there might be some notion of software design, a lot of scientific software is developed in a relatively ad hoc manner where features are added as they are needed as opposed to being part of a formal design process. Scripting languages are a good match for this style of development because they support a loosely-coupled approach to software components that can be used effectively even when the underlying software is messy, incomplete, or under continual development. This is a subtle, but important, difference from the approach often promoted by the software component community, which typically involves a more formal design methodology.

0167-739X/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-739X(02)00171-1

2. The problem with scripting

The primary feature of scripting languages that enables them to be used with scientific software is that they allow foreign code to be accessed through a special extension API. To access compiled code from an interpreter, a programmer simply has to write a collection of special wrapper functions. The role of these functions is to convert arguments and return values between the data representation in each language. For example, if a programmer wanted to access the cosine function from Python, they might write a wrapper like this:

    PyObject *wrap_cos(PyObject *self, PyObject *args) {
        double x, result;
        if (!PyArg_ParseTuple(args, "d", &x))
            return NULL;
        result = cos(x);
        return Py_BuildValue("d", result);
    }

Although it is not very difficult to write a few dozen wrappers, the task becomes tedious if an application contains several hundred functions. Moreover, the task is considerably more difficult if an application uses advanced programming features such as pointers, arrays, classes, inheritance, templates, and overloaded operators. Because of the need to write wrappers, it is difficult to integrate existing software into a scripting environment without a considerable coding effort. In addition, scientists may be reluctant to use scripting languages in the early stages of program development since it will be too difficult to keep the wrapper code synchronized with program modifications. However, it is precisely this stage of software development in which the use of a scripting interpreter may be the most useful as a prototyping, debugging, and exploration tool.

3. SWIG: a compiler for extensions

An obvious way to simplify the use of scripting languages is to automate the generation of wrapper code. SWIG (simplified wrapper and interface generator) is a special purpose C++ compiler that has been developed for this purpose [4]. Originally developed in 1995, SWIG was first used to build scripting language bindings for the SPaSM short-range molecular dynamics code at Los Alamos National Laboratory [5]. This was one of the first large-scale parallel applications to utilize Python as a framework for computational steering and integrated data analysis [6]. Since then, SWIG has been developed as a free software project and is used in a wide variety of software projects ranging from scientific simulations to video games. SWIG works by compiling C/C++ header files directly into scripting language wrappers. This process is fully automated and requires few, if any, code modifications.
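To make the idea of automated wrapper generation concrete, the following toy Python sketch turns a simple C prototype into wrapper text of the same shape as the hand-written cosine wrapper. It is only an illustration and not how SWIG is implemented: the name `generate_wrapper`, the regular-expression parsing, and the restriction to `int` and `double` arguments and return values are all assumptions made for this example.

```python
import re

# Format codes shared by PyArg_ParseTuple and Py_BuildValue for two C types.
FORMAT_CODES = {"int": "i", "double": "d"}

# Matches prototypes of the form "<type> <name>(<type> <arg>, ...);".
PROTOTYPE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;\s*$")


def generate_wrapper(prototype):
    """Return C source text for a Python wrapper around one C prototype.

    Toy model: only int/double return values and named int/double
    parameters are accepted; anything else raises ValueError.
    """
    match = PROTOTYPE.match(prototype)
    if match is None:
        raise ValueError("unsupported prototype: %r" % prototype)
    return_type, name, param_text = match.groups()

    # Split "double x, int n" into [("double", "x"), ("int", "n")].
    params = []
    for param in param_text.split(","):
        ctype, _, pname = param.strip().rpartition(" ")
        params.append((ctype.strip(), pname))

    for ctype in [return_type] + [c for c, _ in params]:
        if ctype not in FORMAT_CODES:
            raise ValueError("unsupported type: %r" % ctype)

    fmt = "".join(FORMAT_CODES[c] for c, _ in params)
    refs = "".join(", &%s" % p for _, p in params)
    call_args = ", ".join(p for _, p in params)

    lines = ["PyObject *wrap_%s(PyObject *self, PyObject *args) {" % name]
    for ctype, pname in params:
        lines.append("    %s %s;" % (ctype, pname))
    lines.append("    %s result;" % return_type)
    lines.append('    if (!PyArg_ParseTuple(args, "%s"%s))' % (fmt, refs))
    lines.append("        return NULL;")
    lines.append("    result = %s(%s);" % (name, call_args))
    lines.append('    return Py_BuildValue("%s", result);'
                 % FORMAT_CODES[return_type])
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_wrapper("double cos(double x);"))
```

Even this toy shows why hand-maintained wrappers drift out of date: every change to a prototype changes the declarations, the format string, and the call, all of which a generator can rederive mechanically.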
As a result, SWIG can be incorporated into a project in a minimally intrusive manner and it allows scientists to focus on the problem at hand instead of language integration issues. Furthermore, SWIG tries to maintain a clean separation between the underlying application and the interpreter. Because of this, it promotes modularity and allows software to be reused in different settings. This also allows scripting languages to be used for prototyping, testing, and other tasks that are important for development, but which may not be part of a package that is delivered to end-users.

It is impossible to describe every aspect of SWIG in this paper, and examples of its use can be found in other sources [4]. However, in order to provide some background, the remainder of this section briefly highlights some of the more interesting SWIG features. The next section generalizes these features and describes the overall design of the compiler.

3.1. A simple example

To illustrate the use of SWIG, suppose that a program defines a few C functions like this:

    int integrate(int nsteps, double dt);
    void set_boundary_periodic();
    void init_lj(double epsilon, double sigma, double cutoff);
    void set_output_path(char *path);

To build a scripting language interface, a user creates a special file containing SWIG directives (prefaced by a %) and the C++ declarations they would like to access. For example:

    // file: shock.i
    %module shock
    %{
    #include "shockwave.h"
    %}
    int integrate(int nsteps, double dt);
    void set_boundary_periodic();
    void init_lj(double epsilon, double sigma, double cutoff);
    void set_output_path(char *path);

To create a scripting language module from this input, SWIG is used as shown:¹

    $ swig -python shock.i
    $ cc -c shock_wrap.c
    $ cc -shared shock_wrap.o $(OBJS) -o shockmodule.so
    $ python
    Python 2.1 (#1, Jun 13 2001, 16:09:46)
    >>> import shock
    >>> shock.init_lj(1.0,1.0,2.5)
    >>> shock.set_output_path("./Datafiles")
    >>> shock.set_boundary_periodic()

¹ The precise details of compilation and linking are platform dependent and are only shown to illustrate the process.

In this example, a separate file is used to hold SWIG directives and the declarations to be wrapped. However, SWIG is also able to process raw header files or header files in which conditional compilation has been used to embed special SWIG directives. This makes it easier for scientists to integrate SWIG into their application since existing source code can be used as input. This also makes it easier to keep the scripting language interface synchronized with the application since changes to header files (and function/class prototypes) can be tracked and handled as a normal part of the build process.

3.2. Class and object wrapping

For procedural programs, SWIG maps procedures and variables to equivalent objects in the target language. For instance, a wrapped C procedure shows up as a new scripting language command and a global variable appears as a scripting language variable. For C++, SWIG creates wrappers that mirror the class interface in the target language. For example, suppose that a program includes a class definition such as this:

    class Complex {
        double rpart, ipart;
    public:
        Complex(double r = 0, double i = 0) : rpart(r), ipart(i) { };
        double real();
        double imag();
        ...
    };

When wrapped, the resulting scripting interface allows a user to write code similar to the following (in Python):

    w = Complex(3,4)     # Create a complex number
    a = w.real()         # Call a method
    del w                # Delete

To implement class wrapping, classes are first reduced to a collection of low-level procedure wrappers. For example:

    Complex *new_Complex(double r = 0, double i = 0) {
        return new Complex(r, i);
    }
    void delete_Complex(Complex *self) {
        delete self;
    }
    double Complex_real(Complex *self) {
        return self->real();
    }
    ...

Next, these procedures are used to construct a proxy class in the target language. For instance, in Python:

    class Complex:
        def __init__(self, *args):
            self.this = new_Complex(*args)
            self.thisown = 1
        def __del__(self):
            if self.thisown:
                delete_Complex(self.this)
        def real(self):
            return Complex_real(self.this)
        ...

Within the proxy class, a reference to the actual C++ object is held in a special attribute (.this). This reference is represented as a typed pointer object that contains both the value of the pointer and a type tag. For example, a Complex * in the above example might be encoded as a string containing the pointer value and type such as _100f8ea0_p_Complex. Type information is used in the wrapper code to perform run-time type checking. Type checking follows the same rules as the C++ type system including rules for inheritance, scoping, and typedef. In addition, SWIG fully supports issues related to multiple inheritance including type-safe casting of pointers from a derived class to any of its base classes (which may change the value of the pointer). Type violations result in a run-time scripting exception.
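The run-time check on type-tagged pointers can be pictured with a small Python model. This is a sketch under stated assumptions rather than SWIG's actual implementation: the string layout follows the `_100f8ea0_p_Complex` example above, while the `BASES` table and the helper names `encode_pointer`, `decode_pointer`, and `check_pointer` are hypothetical. The adjustment of pointer values under multiple inheritance is deliberately not modelled.

```python
# Hypothetical inheritance table: class name -> tuple of direct base classes.
BASES = {
    "Complex": (),
    "Shape": (),
    "Circle": ("Shape",),
}


def encode_pointer(address, type_name):
    """Encode an address and a type name as a type-tagged pointer string."""
    return "_%x_p_%s" % (address, type_name)


def decode_pointer(text):
    """Split a type-tagged pointer string into (address, type name)."""
    if not text.startswith("_"):
        raise TypeError("not a pointer string: %r" % text)
    address, sep, type_name = text[1:].partition("_p_")
    if not sep or not type_name:
        raise TypeError("not a pointer string: %r" % text)
    try:
        return int(address, 16), type_name
    except ValueError:
        raise TypeError("bad pointer value in %r" % text)


def is_subtype(type_name, expected):
    """True if type_name is expected or derives from it via BASES."""
    if type_name == expected:
        return True
    return any(is_subtype(base, expected) for base in BASES.get(type_name, ()))


def check_pointer(text, expected):
    """Return the address if the pointer type is acceptable, else raise."""
    address, type_name = decode_pointer(text)
    if not is_subtype(type_name, expected):
        raise TypeError("expected %s *, got %s *" % (expected, type_name))
    return address


if __name__ == "__main__":
    circle = encode_pointer(0x2000, "Circle")
    print(circle, "->", hex(check_pointer(circle, "Shape")))
```

A wrapper that expects a `Shape *` would accept the `Circle` pointer here but raise `TypeError` for a `Complex` pointer, which corresponds to the scripting exception described in the text.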

To handle memory management issues, proxy classes also have an attribute to indicate ownership. When newly created objects are returned to the interpreter, the interpreter is responsible for releasing the underlying C++ object. This would occur when the interpreter performed garbage collection on the proxy object. In certain cases, proxy objects refer to C++ objects that were not allocated from the interpreter. For instance, a proxy object might refer to an element of a C++ array. In this case, the proxy does not have ownership and its destruction has no effect on the underlying C++ object. If necessary, a programmer can manually adjust the ownership of an object by changing the ownership attribute of the proxy object. However, this is usually only needed in special cases where the management of an object is transferred from the interpreter to C++ or vice versa.

It is important to note that the above description of procedure wrappers and proxy classes is only an abstract description of how SWIG generates code. In practice, the compiler does not actually generate a collection of extra procedures as shown. Instead, that code is inlined directly into the generated wrappers. Furthermore, the generation of a proxy class may occur entirely in C++ or it may be a mix of C++ and code in the target language. Certain optimizations may also be applied to reduce the amount of wrapper code or for performance. From the user's perspective, these issues are merely implementation details and not critical to the way in which the extension module is used. The important point is that objects in C++ are exported as proxy object wrappers in the target language and that the user can use these objects in a natural manner from scripts.

3.3. Special directives

For more complicated applications, SWIG may need assistance in order to generate wrappers. These difficulties mostly arise due to subtle semantic differences between C++ and the target scripting language as well as incomplete information in header files. To handle these cases, SWIG provides a number of special directives that can be used to control various aspects of wrapper code generation. These directives may be placed anywhere in the input file and may be mixed with C++ declarations.

To illustrate, suppose that a programmer wanted to bind C++ overloaded operators to Python operators. One way to do this is to map the operators to special Python method names like this:

    %rename(__add__) Complex::operator+(const Complex &) const;
    %rename(__sub__) Complex::operator-(const Complex &) const;
    %rename(__neg__) Complex::operator-() const;
    ...
    class Complex {
        ...
        Complex operator+(const Complex &c) const;   // add
        Complex operator-(const Complex &c) const;   // sub
        Complex operator-() const;                   // neg
    };

Similarly, certain features of the C++ interface may not be supported in the target language. For instance, the semantics of assignment in Python are completely different from those of C++. Therefore, a programmer may want to ignore assignment operators like this:

    %ignore operator=;    // Ignore assignment everywhere

Other programming features are more difficult to automate. For instance, there is no way to generate wrapper code for a C++ template. However, wrappers can be generated for a particular template instantiation. To do this, a special %template directive is used like this:

    template<class T> T max(T a, T b) { return a>b ? a : b; }
    template<class T> class vector {
        ...
    };
    ...
    %template(maxint)    max<int>;         /* Functions */
    %template(maxdouble) max<double>;
    %template(vecint)    vector<int>;      /* Classes */
    %template(vecdouble) vector<double>;

The %template directive merely expands a template and gives it a suitable identifier name in the target language. Similarly, a variety of other minor directives are available to more precisely control mutability, memory management, and so forth.

3.4. Class extension and legacy software

One of the more interesting aspects of SWIG is that it is able to repackage older procedural libraries into an object-based scripting API. For example, if an application used a procedural interface like this:

    typedef struct {
        double re, im;
    } Complex;
    Complex add_complex(Complex a, Complex b);
    double real_part(Complex a);

the following SWIG interface creates a class-like scripting interface:

    typedef struct {
        double re, im;
    } Complex;

    /* extend the Complex structure with methods */
    %extend Complex {
        Complex(double r, double i) {
            Complex *c = (Complex *) malloc(sizeof(Complex));
            c->re = r;
            c->im = i;
            return c;
        }
        ~Complex() {
            free(self);
        }
        double real() {
            return real_part(*self);
        }
        Complex add(Complex b) {
            return add_complex(*self, b);
        }
    };

In this case, the resulting scripting interface is class-based even though no changes were made to the underlying C code. Moreover, if the C code was later migrated to C++, this could often be done without breaking any scripts (since scripts would already be using an object-oriented interface).
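The repackaging idea can be pictured with a pure-Python analogy that involves no SWIG at all. In this sketch, `ComplexRecord` and the two module-level functions are stand-ins for the C struct and the procedural functions `add_complex` and `real_part`, and the `Complex` class plays the role of the class-like scripting interface. The stand-in names and the use of `namedtuple` are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in for the C struct: a plain record with no behavior of its own.
ComplexRecord = namedtuple("ComplexRecord", ["re", "im"])


def add_complex(a, b):
    """Stand-in for the procedural C function add_complex()."""
    return ComplexRecord(a.re + b.re, a.im + b.im)


def real_part(a):
    """Stand-in for the procedural C function real_part()."""
    return a.re


class Complex:
    """Class-like facade layered on top of the procedural functions."""

    def __init__(self, r, i):
        self._rec = ComplexRecord(r, i)

    def real(self):
        return real_part(self._rec)

    def add(self, other):
        result = add_complex(self._rec, other._rec)
        return Complex(result.re, result.im)


if __name__ == "__main__":
    z = Complex(1.0, 2.0).add(Complex(3.0, 4.0))
    print(z.real())
```

Scripts written against the `Complex` facade never call `add_complex` or `real_part` directly, so the procedural layer underneath can be replaced by a real class later without touching those scripts.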

3.5. Customization features

For advanced users, it is sometimes desirable to modify SWIG's code generator in order to provide better integration between the scripting environment and the underlying application. For example, a user might want to interface their code with an array package such as Numeric Python [9]. By default, SWIG does not know how to perform this integration. However, a user can customize the code generator using a special directive known as a typemap. A typemap changes the way that SWIG converts data in wrapper functions. For instance, suppose an application had several functions like this:

    void settemp(double *grid, int nx, int ny, double temp);
    double avgtemp(double *grid, int nx, int ny);
    void plottemp(double *grid, int nx, int ny, double mn, double mx);

Now suppose that a programmer wanted to pass a Numeric Python array as the grid parameter and associated nx and ny parameters. To do this, a typemap rule such as the following can be inserted into the SWIG interface file:

    %typemap(in) (double *grid, int nx, int ny) {
        PyArrayObject *array;
        if (!PyArray_Check($input)) {
            PyErr_SetString(PyExc_TypeError, "Expected an array");
            return NULL;
        }
        array = (PyArrayObject *) PyArray_ContiguousFromObject($input, PyArray_DOUBLE, 2, 2);
        if (!array) {
            PyErr_SetString(PyExc_ValueError,
                            "array must be two-dimensional and of type float");
            return NULL;
        }
        $1 = (double *) array->data;     /* Assign grid */
        $2 = array->dimensions[0];       /* Assign nx */
        $3 = array->dimensions[1];       /* Assign ny */
    }

When defined, all subsequent occurrences of the argument sequence double *grid, int nx, int ny are processed as a single numeric array object. Even though the specification of a typemap requires detailed knowledge of the underlying scripting language API, these rules only need to be defined once in order to be applied to hundreds of different declarations.

Other customization features are associated with entire declarations. For example, if a programmer wanted to catch a C++ exception in a specific class method and turn it into a Python exception, they might write the following:

    %except Object::getitem {
        try {
            $action
        } catch (IndexError) {
            PyErr_SetString(PyExc_IndexError, "bad index");
            return NULL;
        }
    }
    class Object {
        virtual Item *getitem(int index);
    };

In this case, the extra error checking code is inserted into the wrappers generated for the getitem() method of Object. In addition, this customization propagates down the inheritance hierarchy. Therefore, if Object was a base class, the exception handler code would be attached to any occurrence of getitem found in derived classes. This approach is especially convenient when interfaces are defined by abstract classes since customization features can be defined once for the abstract class and automatically inherited by derived classes.

4. A closer look at the compiler

In the previous section, a number of simple examples were presented and many of SWIG's customization directives were described. In part, these directives are needed due to the complexity of using C++ as an interface definition language. Aside from the obvious problem of parsing C++, the real problem of using C++ is the complexity of its type system. Types are the focus of marshalling data between languages. Furthermore, the type system is the foundation of many advanced C++ issues including memory management, overloading, templates, and inheritance. To illustrate some of the complexity that can arise, consider the following declarations:

    typedef struct ParticleImpl Particle;
    typedef double Real;

    /* Pair-wise force function */
    typedef Real (*forcefunc_t)(Particle *, Particle *);

    /* Array of pair-wise interactions indexed by particle type */
    typedef forcefunc_t interactions_t[MAXPART][MAXPART];

    /* Compute forces */
    void compute_forces(Particle *pt, interactions_t force_funcs);

The first problem with this code is that typedef declarations allow types to be referenced under many different names. However, this is only a minor complication in the implementation. A more problematic issue is that there is often no obvious way to marshal certain data types into a scripting language representation. For instance, how is a pointer to a function supposed to be handled, or a two-dimensional array of function pointers? Similarly, when a parameter such as Particle * appears there is no way for SWIG to know exactly what that means. Clearly, it is a pointer to a particle, but is it a single particle? Is it an array of particles? Is it an output value? Moreover, how do you represent a particle in the target language? Clearly, these issues affect the way in which one would like to generate wrapper code. However, by looking at the C++ headers in isolation, there is no real way for SWIG to obtain a deeper understanding of the program.

In order to solve the problem of data representation, SWIG's default behavior is to export all nonprimitive types as opaque type-tagged pointers.
This avoids the problem of marshalling and allows objects of any type to be manipulated from the interpreter. A run-time type system is used to make sure that only pointers of an acceptable type are passed to the wrappers (type mismatches result in an exception as described in the last section).

To better guide the interface building process, special SWIG directives are used (as previously described). However, most of these directives are simply variations of a few low-level primitives that interact with the C++ type system. The first of these are typemaps, which were already presented as an advanced customization feature. Internally, all type conversion in SWIG is defined by typemap patterns. For instance, to convert floating point numbers, typemap patterns such as the following are defined:

    /* Convert from Python to C */
    %typemap(in) double {
        $1 = PyFloat_AsDouble($input);
    }
    /* Convert from C to Python */
    %typemap(out) double {
        $result = PyFloat_FromDouble($1);
    }

These patterns are then applied to all occurrences of double. If necessary, typemaps can be specialized with parameter names. For instance:

    %typemap(in) double nonnegative {
        $1 = PyFloat_AsDouble($input);
        if ($1 < 0) {
            PyErr_SetString(PyExc_ValueError, "domain error");
            return NULL;
        }
    }
    ...
    double sqrt(double nonnegative);    // Uses the above rule
    double sin(double x);               // Uses the default rule

This specialization of typemaps allows additional semantics to be specified merely by following certain naming conventions in the interface. For example:

    /* Return values from arguments */
    %typemap(argout) int *OUTPUT { ... }
    ...
    void getsize(Window *w, int *OUTPUT, int *OUTPUT);

In addition, typemaps can be defined for general classes of C++ data types. For example, the following typemap matches all pointer types and would be used if no other mapping was defined:

    /* General typemap for all pointer types */
    %typemap(in) SWIGTYPE * {
        ...
    }

It is important to emphasize that typemap patterns are fully integrated into the SWIG type system. For instance, to handle C++ string objects, code such as the following might be used:

    namespace std {
        class string;
        %typemap(in) string * (string temp) {
            temp = PyString_AsString($input);
            $1 = &temp;
        }
    }
    using std::string;
    void foo(string *s, std::string *t);

In the example, the typemap remains attached to std::string no matter how that type is referenced in the input (i.e., the typemap still applies even if std::string is hidden by typedef, using, or namespace aliases).

The second customization mechanism used by SWIG is a declaration annotator. A number of SWIG directives such as %rename, %ignore, and %exception all apply to specific declarations in the input file. In some sense, these directives are attaching "features" to certain declarations in the interface. To pinpoint declarations, a low-level declaration matching mechanism is used. Raw access to the declaration matcher is provided using a special %feature directive. For instance, the %exception directive is really the same as this low-level primitive:

    %feature("except") Object::getitem {
        try {
            $action
        } catch (IndexError) {
            PyErr_SetString(PyExc_IndexError, "bad index");
            return NULL;
        }
    }
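As a rough model of declaration annotation, including the propagation of a customization down the inheritance hierarchy described in Section 3.5, the following Python sketch stores features by qualified declaration name and searches base classes when a derived class has no entry of its own. It is a simplified illustration, not SWIG's matcher: the `BASES` hierarchy, the `FEATURES` table, and the functions `add_feature` and `lookup_feature` are hypothetical, and matching on parameter types is left out.

```python
# Hypothetical class hierarchy: class name -> list of direct base classes.
BASES = {
    "Object": [],
    "List": ["Object"],
    "SortedList": ["List"],
}

# Features attached to declarations, keyed by qualified name such as
# "Object::getitem". Each entry maps a feature name to its value.
FEATURES = {}


def add_feature(feature, declaration, value):
    """Attach a named feature to one declaration."""
    FEATURES.setdefault(declaration, {})[feature] = value


def lookup_feature(feature, class_name, method):
    """Find a feature for class_name::method, searching base classes too."""
    attached = FEATURES.get("%s::%s" % (class_name, method), {})
    if feature in attached:
        return attached[feature]
    for base in BASES.get(class_name, []):
        found = lookup_feature(feature, base, method)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    add_feature("except", "Object::getitem",
                "try { $action } catch (IndexError) { ... }")
    # The handler attached to the base class is also found for SortedList.
    print(lookup_feature("except", "SortedList", "getitem"))
```

Attaching the handler once to `Object::getitem` is enough for every derived class in the table, while unrelated methods are left without a feature.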

Features can be parameterized with types and can be used to precisely pinpoint any declaration in an interface. Type parameterization was shown when illustrating the use of the %rename directive and operator overloading in Section 3.3. Like typemaps, declaration annotation is also integrated with the type system. For instance, features applied to base classes are propagated to derived classes as already described. Features can even be attached to templates. For instance:

    %feature("except") vector::getitem {
        try {
            $action
        } catch (IndexError) {
            PyErr_SetString(PyExc_IndexError, "bad index");
            return NULL;
        }
    }
    /* Definition of a template. Gets the above feature added */
    template<class T> class vector {
        ...
        T &getitem(int index) { ... }
        ...
    };
    /* Instantiation. Above feature added to all instantiations */
    %template(vecint)    vector<int>;
    %template(vecdouble) vector<double>;

The final low-level customization mechanism is class extension and the %extend directive. This directive simply expands existing class and structure definitions with new attributes. For instance, a user can use this to annotate a class definition with additional methods or properties that are useful in building the scripting interface, but which would not be part of the original C++ class definition.

To the uninitiated, SWIG's special directives often appear unrelated and haphazard. However, almost all of SWIG's directives reduce to one of the above primitives. In fact, a large number of SWIG special directives are nothing more than preprocessor macros that expand to %typemap or %feature directives.
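The typemap matching behaviour described in this section (a rule specialized by parameter name wins over the rule for the bare type, typedef names resolve to the underlying type, and a general pointer rule acts as a fallback) can be sketched as a small lookup in Python. This is a simplified model under stated assumptions: the `TYPEDEFS` and `TYPEMAPS` tables, the rule descriptions, and the `find_typemap` function are hypothetical, and SWIG's real matching rules are more detailed.

```python
# Hypothetical typedef table: alias -> underlying type.
TYPEDEFS = {"Real": "double"}

# Typemap rules keyed by (type, parameter name). A name of None means the
# rule applies to the type regardless of the parameter name.
TYPEMAPS = {
    ("double", "nonnegative"): "convert and reject negative values",
    ("double", None): "convert with PyFloat_AsDouble",
    ("SWIGTYPE *", None): "convert a type-tagged pointer",
}


def resolve(ctype):
    """Follow typedef aliases until a non-alias type is reached.

    Only whole-type aliases are handled in this sketch.
    """
    while ctype in TYPEDEFS:
        ctype = TYPEDEFS[ctype]
    return ctype


def find_typemap(ctype, name):
    """Return the rule for a parameter, from most to least specific."""
    base = resolve(ctype)
    candidates = [(base, name), (base, None)]
    if base.endswith("*"):
        # Fallback that matches any pointer type.
        candidates.append(("SWIGTYPE *", None))
    for key in candidates:
        if key in TYPEMAPS:
            return TYPEMAPS[key]
    return None


if __name__ == "__main__":
    for ctype, name in [("double", "nonnegative"), ("Real", "x"),
                        ("Particle *", "pt")]:
        print(ctype, name, "->", find_typemap(ctype, name))
```

In this model `double sqrt(double nonnegative)` picks up the specialized rule, `double sin(double x)` and any `Real` parameter fall back to the plain `double` rule, and `Particle *pt` is handled by the general pointer rule.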

5. SWIG implementation

The SWIG front-end consists of an extended C++ preprocessor and parser. These components differ from a traditional implementation due to issues related to mixing special SWIG directives and C++ declarations in the same file. For instance, certain information from the preprocessor is used during code generation and certain syntactic features are parsed differently in order to make SWIG easier to use. Also, since SWIG is primarily concerned with interfaces, it does not parse function bodies (although that code is captured as a string and used for certain features such as class extension).

Internally, SWIG builds a complete parse tree and provides a traversal API similar to that described in the XML-DOM specification [14]. Nodes are built from hash tables that allow them to be annotated with arbitrary attributes at any stage of parsing and code generation [3]. This annotation of nodes is the primary mechanism used to implement the SWIG declaration annotator.

Code generation is guided by a multi-stage compilation process. During generation, parse tree nodes are handled by a hierarchical sequence of handler functions that may elect to generate wrapper code directly or to transform the node and forward it to another handler. The behavior of each target language is defined by implementing selected virtual methods in a C++ class. Minimally, a language module only needs to implement handlers for generating low-level function and variable wrappers. For example:

    class MinimalLanguage : public Language {
    public:
        void main(int argc, char *argv[]);
        int top(Node *n);
        int functionWrapper(Node *n);
        int constantWrapper(Node *n);
        int nativeWrapper(Node *n);
    };

However, in order to provide more advanced wrapping of classes and structures, a language module will generally implement more handler methods.

6. Case study

Most of the early work on SWIG was conducted as part of the SPaSM project at Los Alamos National Laboratory [5]. SPaSM is code that was originally developed in 1992 for performing large-scale parallel molecular dynamics simulations on the Connection Machine 5. In order to address problems with data analysis, SWIG was used to build an interactive scripting environment that integrated simulation, data analysis, and visualization together in a common framework controlled by Python [6,7]. In this implementation, the core of the system consisted of 24,000 lines of ANSI C spread across five Python extension modules. Each of these modules was built entirely by SWIG, corresponding to approximately 28,000 lines of automatically generated wrapper code. The construction of this code was entirely hidden from users, being primarily generated from header files during the make process. By hiding the generation of the wrapper code, users could make modifications to the system (e.g., adding new functions or changing calling conventions) without having to worry about the underlying Python wrappers. Instead, the wrappers were simply regenerated as necessary.

As for the performance impact of adding a scripting language to this application, it is important to realize that the scripting language is primarily a mechanism for controlling and debugging the application. It is not used to implement low-level data structures or performance critical algorithms. Therefore, even if a command is issued from an interpreter, the performance penalty is often negligible. Obviously, the precise performance impact depends on the application and the number of computational cycles performed by each scripting language operation. However, in the case of SPaSM, no significant performance degradation was observed, even when the entire outer simulation loop was implemented in Python [7].
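The handler-based code generation described in Section 5, where parse tree nodes are hash tables and a handler either emits wrapper code or transforms a node and forwards it to another handler, can be modelled in a few lines of Python. This is a sketch of the idea only: the node keys (`kind`, `name`, `methods`), the handler names, and the `ClassName_method` naming of lowered nodes are assumptions chosen to echo the low-level procedure wrappers of Section 3.2.

```python
def function_wrapper(node, out):
    """Lowest-level handler: record that a wrapper is emitted for a function."""
    out.append("wrap %s" % node["name"])


def class_handler(node, out):
    """Transform each method into a function node and forward it."""
    for method in node["methods"]:
        lowered = dict(method)  # nodes are plain hash tables, so copy and edit
        lowered["kind"] = "function"
        lowered["name"] = "%s_%s" % (node["name"], method["name"])
        dispatch(lowered, out)


# Handler table: node kind -> handler function.
HANDLERS = {
    "function": function_wrapper,
    "class": class_handler,
}


def dispatch(node, out):
    """Send a node to the handler registered for its kind."""
    HANDLERS[node["kind"]](node, out)


if __name__ == "__main__":
    emitted = []
    dispatch({"kind": "function", "name": "integrate"}, emitted)
    dispatch({"kind": "class", "name": "Complex",
              "methods": [{"kind": "method", "name": "real"},
                          {"kind": "method", "name": "imag"}]}, emitted)
    print(emitted)
```

Because the class handler only rewrites nodes and forwards them, a target language that implements just the function-level handler still gets a usable, if low-level, interface to classes.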


7. Limitations

SWIG is primarily designed to support software development in C and C++. The system can be used to wrap Fortran as long as the Fortran functions are described by C prototypes as might be generated by a tool such as f2c. If wrapping Fortran, there are other similar tools that can be used [11,18].

SWIG is also not a full C++ compiler. Certain advanced C++ features such as nested classes are not yet supported. Furthermore, features such as overloaded operators and templates may require the user to supply extra directives (as illustrated in earlier sections). More complicated wrapping problems arise due to C++ libraries that rely heavily on generic programming and templates such as the STL or Blitz++ [21,22]. Although SWIG can be used to wrap programs that use these libraries, providing wrappers to the libraries themselves is by no means automatic and requires user direction (it might not even make sense).

8. Related work

In the scripting language community, a variety of projects have attempted to tackle the problem of generating wrapper code. Most scripting languages have tools that can assist in the creation of simple extension modules. However, few of these tools are designed to target multiple target languages and most are more limited in their capabilities. Scripting language extension building tools can sometimes be found in application frameworks such as Vtk [15]. However, these tend to be tailored to the specific features of the framework and tend to ignore programming features required for them to be more general purpose.

A number of tools have been developed specifically for scientific applications. For example, pyfort and f2py provide Python wrappers for Fortran codes and the Boost Python Library provides an interesting alternative to SWIG for creating C++ class wrappers [2,11,18].

Of tools that attempt to work from native source code, the most serious limitation is their lack of proper type system support. For instance, several tools have focused on the problem of parsing C++, but completely ignore aspects of the type system that are needed to correctly handle typedef, namespaces, and other type-related declarations that appear in real programs. SWIG’s use of typemaps to define type conversion is largely borrowed from the xsubpp tool used to create Perl extension modules [20]. However, typemaps in xsubpp are not really typemaps at all; instead, they are just simple string-based regular expressions that are applied to the input text and not any underlying notion of a type. In SWIG, typemaps are fully integrated into the C++ type system which makes their behavior much more powerful and well defined.

Work similar to SWIG can be found in the metaprogramming community. For example, the Open C++ project aims to expose the internals of C++ programs so that tools can use that information for other tasks [8]. Using such information, it might be possible to generate scripting language wrappers in a manner similar to SWIG. Certain aspects of SWIG (especially declaration annotation) are also similar to recent research in aspect oriented programming (AOP) [1].

9. Conclusions and future work

SWIG is a compiler that simplifies the integration of scripting languages with scientific software. It is particularly well suited for use with existing software and supports a wide variety of C++ language features. SWIG also promotes modular design by maintaining a clean separation between the scripting language interface and the underlying application code. By using an automatic code generator such as SWIG, computational scientists will find that it is much easier to utilize scripting languages at all stages of program development. Furthermore, the use of a scripting language environment encourages modular design and allows scientists to more easily construct software that incorporates features such as integrated data analysis, visualization, database management, and networking.

Since its release in 1996, SWIG has been used in hundreds of software projects. Space prevents a detailed case study here. However, further information is available from the SWIG web page. Currently, the system supports nine different target languages including Guile, Java, Mzscheme, OCAML, Perl, PHP, Python, Ruby, and Tcl. Future work is focused on improving the quality of code generation, providing support for more languages, and adding reliability features such as contracts and assertions. More information about SWIG is available at: http://www.swig.org.

Acknowledgements

Many people have helped with SWIG development. Major contributors to the current implementation include William Fulton, Matthias Köppe, Lyle Johnson, Richard Palmer, Luigi Ballabio, Jason Stewart, Loic Dachary, Harco de Hilster, Thien-Thi Nguyen, Masaki Fukushima, Oleg Tolmatcev, Kevin Butler, John Buckman, Dominique Dumont, David Fletcher, Art Yerkes, Marcelo Matus, and Gary Holt. SWIG was originally developed in the Theoretical Physics Division at Los Alamos National Laboratory in collaboration with Peter Lomdahl, Tim Germann, and Brad Holian. Development is currently supported by the Department of Computer Science at the University of Chicago. This paper is an expanded version of the paper “An Extensible Compiler for Creating Scriptable Scientific Software” presented at ICCS’02 in Amsterdam and submitted to FGCS by invitation.

References

[1] Aspect-oriented software development. http://www.aosd.net.
[2] D. Abrahams, The Boost Python Library. http://www.boost.org/libs/python/doc/.
[3] A. Aho, R. Sethi, J. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, Reading, MA, 1986.
[4] D. Beazley, SWIG: an easy to use tool for integrating scripting languages with C and C++, in: Proceedings of the Fourth USENIX Tcl/Tk Workshop, 1996, pp. 129–139.
[5] D. Beazley, P. Lomdahl, Message-passing multi-cell molecular dynamics on the connection machine 5, Parallel Comput. 20 (1994) 173–195.
[6] D. Beazley, P. Lomdahl, Lightweight computational steering of very large scale molecular dynamics simulations, in: Proceedings of the Supercomputing’96, IEEE Computer Society Press, Silver Spring, MD, 1996.
[7] D. Beazley, P. Lomdahl, Controlling the data glut in large-scale molecular dynamics simulations, Comput. Phys. 11 (3) (1997) 230–238.
[8] S. Chiba, A metaobject protocol for C++, in: Proceedings of the ACM Conference on Object-oriented Programming Systems, Languages, and Applications (OOPSLA), 1995, pp. 285–299.
[9] P. Dubois, K. Hinsen, J. Hugunin, Numerical python, Comput. Phys. 10 (3) (1996) 262–267.
[10] P. Dubois, The future of scientific programming, Comput. Phys. 11 (2) (1997) 168–173.
[11] P. Dubois, Climate data analysis software, in: Proceedings of the Eighth International Python Conference, 2000.
[12] F. Gathmann, Python as a discrete event simulation environment, in: Proceedings of the Seventh International Python Conference, 1998.
[13] K. Hinsen, The molecular modeling toolkit: a case study of a large scientific application in Python, in: Proceedings of the Sixth International Python Conference, 1997, pp. 29–35.
[14] S. Holzer, Inside XML, New Riders Publishing, 2001.
[15] K. Martin, Automated wrapping of a C++ class library into Tcl, in: Proceedings of the Fourth USENIX Tcl/Tk Workshop, 1996, pp. 141–148.
[16] J. Ousterhout, Scripting: higher-level programming for the 21st century, IEEE Comput. 31 (3) (1998) 23–30.
[17] M. Owen, An open-source project for modeling hydrodynamics in astrophysical systems, IEEE Comput. Sci. Eng. 3 (6) (2001) 54–59.
[18] P. Peterson, J. Martins, J. Alonso, Fortran to Python interface generator with an application to aerospace engineering, in: Proceedings of the Ninth International Python Conference, 2000.
[19] D. Scherer, P. Dubois, B. Sherwood, VPython: 3D interactive scientific graphics for students, IEEE Comput. Sci. Eng. 2 (5) (2000) 56–62.
[20] S. Srinivasan, Advanced Perl Programming, O’Reilly and Associates, 1997.
[21] B. Stroustrup, The C++ Programming Language, 3rd ed., Addison-Wesley, Reading, MA, 1997.
[22] T. Veldhuizen, Arrays in Blitz++, in: Proceedings of the Second International Scientific Computing in Object-oriented Parallel Environments (ISCOPE’98), Springer, Berlin, 1998.
[23] R. White, P. Greenfield, Using Python to modernize astronomical software, in: Proceedings of the Eighth International Python Conference, 1999.

D.M. Beazley is an Assistant Professor in the Department of Computer Science at the University of Chicago. He received his PhD in Computer Science from the University of Utah. Prior to joining the University of Chicago, he worked in the Theoretical Physics Division at Los Alamos National Laboratory. His research interests include software architecture, programming tools, operating systems, and computational science.