Science of Computer Programming 98 (2015) 531–550
Contents lists available at ScienceDirect
Science of Computer Programming www.elsevier.com/locate/scico
EVL: A framework for multi-methods in C++
Yannick Le Goc a, Alexandre Donzé b,∗
a Institut Laue-Langevin, 6 rue Jules Horowitz, BP 156, 38042 Grenoble Cedex 9, France
b University of California, Berkeley, EECS Department, Cory Hall 545S, 94720 Berkeley, CA, USA
Article info
Article history:
Received 3 April 2012
Received in revised form 9 August 2014
Accepted 11 August 2014
Available online 4 September 2014
Keywords: Multi-methods; Multiple dispatch; Runtime type information; Object-oriented programming; C++
1. Introduction
In object-oriented programming (OOP), it is common to define some member functions as virtual to allow them to be overridden by differing concrete implementations in derived classes; a call to a virtual function is then resolved to one of its concrete implementations at runtime based on the dynamic type of the object on which the call is made. Multiple dispatch can be seen as an extension of this selection mechanism, where the concrete implementation to use at runtime depends on
Y. Le Goc, A. Donzé / Science of Computer Programming 98 (2015) 531–550
the dynamic type of more than one argument. It has been implemented in different languages with various terminologies. In CLOS [1] and Dylan [2], generic functions implement multiple dispatch. In Cecil [3] and MultiJava [4], such method families are called multi-methods. The most popular object-oriented languages, including C++ [5] and Java [6], provide a built-in single dispatch mechanism implemented by class member functions (virtual functions in C++, non-static methods in Java), but do not support the runtime resolution of functions that depend on the dynamic type of more than one argument.

Potential applications of multiple dispatch are (see [7] for more examples):
• The traversal of a graph data structure where nodes are instances of different polymorphic classes without defining a process function in each node class;
• The definition of an event handler for a widget toolkit by defining functions depending on two dynamic arguments (event, widget);
• The definition of binary comparison operators.

Each example has its own specifics. The traversal example can be found in 3D image synthesis, where scenes are modeled with an OOP data structure called a scene graph and for which a number of different rendering algorithms (defined in the process functions) can be used. The well-known visitor pattern [8], which makes use of common features of object-oriented languages, is functional and relatively convenient, but it introduces additional dependencies between the node class to process and the visitor class. The event handler and the binary operator examples are typical problems requiring the use of double dispatch. As an example of a triple dispatcher, one can consider the event handler problem with an additional polymorphic argument (its parameter type is polymorphic) representing a context state.

Multiple dispatch is an advanced topic in OOP. As a consequence, training programmers to use such a tool is a problem in itself; this was addressed, e.g., by Chambers in [9] but is still an ongoing concern. Despite the fact that the concept has existed for years, there are still many remaining issues with multiple dispatch and many ways of defining it; it is also not clear how popular it is among practitioners. Muschevici et al. [10] present a cross-language comparison of how multiple dispatch is used in programs. By defining code metrics, they are able to provide statistics over applications coded in different languages. The results tend to show that when multi-methods are available in a language, they are used. We can thus infer that once multi-methods are implemented in C++, they will be used. Moreover, Muschevici et al. studied the potential use of multi-methods in the Java language. To that end, they gathered statistics on the use of explicit multiple dispatch by implementations of the visitor pattern and implicit multiple dispatch by cascaded use of the instanceof operator.
The results showed that cascaded instanceof operators are more often used than the visitor pattern.

The first issue for multiple dispatch is the resolution of ambiguities when looking for the best matching function, i.e., when more than one function matches but it is not possible to decide which one to call. This situation is common in C++. For instance, if we define two non-member overloaded functions foo with a single argument of type float for the first function and double for the second function and we try to apply foo to an int, we get an ambiguous call error at compile-time. When multiple dispatch is defined as an extension of single dispatch, the natural idea is to have a compiler that reports compilation errors when it detects ambiguities. However, detecting ambiguities with multi-methods is not as easy as for single dispatch functions, which are defined in the scope of a class hierarchy. It may not even be possible for the compiler to check for ambiguities at compile-time, since it has to check all possible tuples of types, but their exhaustive list is only available at link-time. In the presence of dynamic class loading, this list can even vary at runtime. Furthermore, it can be very large, increasing the probability of ambiguities. Also, it is legitimate and plausible to define multi-methods for which ambiguous tuples of types exist but are intentionally never used together. An a priori systematic detection of possible ambiguities would forbid this use case.

A common approach to ambiguity resolution is the definition of additional functions that forward their calls to the already defined functions. The minimal set of additional functions to define can be computed [11], but the number of functions to define can remain large. The resolution can be either symmetric [9,7] or asymmetric [1].
Compared to asymmetric resolution (e.g., arguments are taken in lexicographic order), symmetric resolution (all arguments have the same “weight”) seems more natural in C++ because it conforms to its function overload resolution rules. However, symmetric resolution generates more ambiguities, and the programmer may find it easier to reason about the resolution argument by argument.

Various approaches have been used to incorporate multiple dispatch in OOP languages. Some languages natively support it, like CLOS [1], Dylan [2] and Cecil [3]. For others, it comes as an extension, such as MultiJava [4] for Java. In C++, Open Multi-Methods [7] were proposed as an extension of virtual functions to more general multi-methods, but have not yet been submitted for inclusion in the standard. An alternative approach is to implement a library based on existing language features. In Java, JMMF [12] is such a library, based on Java’s reflection mechanism. In the Python language [13], several modules can be imported, and in C++, the Cmm library [14] uses pre-processor definitions and a pre-compiler to integrate multiple dispatch. By contrast, the library that we propose is based on the C++ standard only. Several other C++ libraries have been developed prior to ours, such as Loki [15] or DoubleCpp [16]. In Section 5.2, we provide a more detailed comparison of our work with existing approaches and implementations, in terms of features and performance.

One point to be discussed is where to define functions of multi-methods. In [7], they are non-member functions automatically handled by a special linker which will construct the family of compatible callable functions, hence their qualification as open. This provides easy extensibility and less verbosity. The problem with such global functions is that they can be found by the linker in different translation units even if those have nothing in common, leading to unexpected dispatches.
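The difference between the symmetric and asymmetric resolutions discussed earlier in this introduction can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the names Distances, symmetric_better, asymmetric_better and select_best are hypothetical, and each candidate function is assumed to be summarized, for one given call, by the inheritance distance between each of its virtual parameter types and the dynamic type of the corresponding argument.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical representation: for one call with two virtual arguments,
// a candidate is described by the inheritance distance of each parameter
// type to the dynamic type of the argument (0 = exact type, 1 = direct
// base class, and so on).
using Distances = std::array<int, 2>;

// Symmetric comparison: f is better than g only if it is at least as
// close on every argument and strictly closer on at least one.
bool symmetric_better(const Distances& f, const Distances& g) {
    bool strictly_closer = false;
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (f[i] > g[i]) return false;
        if (f[i] < g[i]) strictly_closer = true;
    }
    return strictly_closer;
}

// Asymmetric comparison: arguments are examined from left to right,
// so the first argument has the most weight (lexicographic order).
bool asymmetric_better(const Distances& f, const Distances& g) {
    return f < g;  // std::array compares lexicographically
}

// Returns the index of the candidate that is better than every other
// candidate, or -1 if there is none (ambiguous call or empty set).
template <typename Better>
int select_best(const std::vector<Distances>& candidates, Better better) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        bool best = true;
        for (std::size_t j = 0; j < candidates.size() && best; ++j) {
            if (j != i && !better(candidates[i], candidates[j])) best = false;
        }
        if (best) return static_cast<int>(i);
    }
    return -1;
}
```

As a usage example, consider a class B derived from A, two functions f(A&, B&) and f(B&, A&), and a call with two B arguments. The candidates have distances (1, 0) and (0, 1): the symmetric comparison finds no best candidate and reports an ambiguity, whereas the lexicographic comparison selects f(B&, A&).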
Another issue concerns the definition of multi-methods for a contextual processing of objects. If a multi-method is defined outside of any class, the context must be passed as a parameter, which can be cumbersome. By contrast, if this multi-method is encapsulated in a class, i.e., the set of functions participating in the resolution belongs to a class [17], it becomes possible to encapsulate the context in this same class and have it accessible to the function definitions as a private member. Moreover, it provides more control over the visibility of the functions, although it is more constrained and does not allow easy extensibility. Finally, Chambers [9] asks whether multi-methods defined outside the scope of a class are object-oriented.

In this paper, we propose a C++ multi-methods framework called EVL for “Extended Virtual functions Library”. We take advantage of the library approach, as opposed to a language evolution approach, to provide highly configurable multi-method classes. We first present a generalization of the function selection algorithm by using a redefinable function comparison operator (FCO). The operator is based on the distance (as measured by a configurable metric) between tuples of types in the subclass tree and optional additional information. The goal of this generalization is to propose a solution that can easily implement either the asymmetric or the symmetric resolution of the polymorphic arguments, or even a mix of both for multiple dispatch of more than two dimensions, or arbitrarily refined and specific resolution strategies depending on the programmer’s needs and choices. Then we present our cache strategy, which involves no more than an associative container filled at runtime. The advantage is that the cache can be implemented as a standard dispatch table by a vector of functions as well as a bounded hash map to reduce the memory footprint.
More complex compressed dispatch tables are also possible to implement in principle [18], as are caching algorithms such as page replacement algorithms [19].

We provide a complete and functional implementation of multiple dispatch for C++ based on the C++ 2003 standard. Our multi-methods are runtime objects that record functions and dynamically resolve their application. They are defined by class templates that preserve type safety through strong typing: only functions compatible with the defined prototype of the multi-method can be recorded without compilation errors. They also provide policies for FCO, cache and error handling.

To realize the library implementation of the function selection algorithms based on tuples of types, we need inheritance relations at runtime. As the C++ standard does not specify runtime reflection and no existing library met our criteria, we defined a practical reflection mechanism using rules that make it as simple as possible for the programmer to declare types. Although the reflection part of the EVL framework is not our primary focus, we believe that it constitutes a contribution by itself, as it leads to the definition and implementation of interesting features such as fast dynamic casting [20] and simple rich pointers [21] (a variant of smart pointers that provide type information facilities rather than memory management).

To summarize, the contributions of this work include:
1. A highly flexible multi-method framework including
• the conceptualization of customizable function comparison operators to define any dispatch function selection algorithm, or refine existing ones (symmetric, asymmetric). FCO can be used to resolve ambiguities and implement predicate dispatch,
• a customizable cache strategy to improve the performance of a multi-method (speed, memory),
• a customizable error strategy to support environments where exceptions cannot be thrown.
2. A fully-functional C++ library implementation which
• is C++03 standard compliant, macro-free and performant,
• provides dynamic class loading support,
• implements and uses a partial runtime reflection library that is non-intrusive and provides fast dynamic casting.

The rest of the paper is organized as follows. In Section 2, we formalize the concepts of multi-methods and function comparison operators. In Section 3, we present our implementation of reflection in C++ to construct inheritance graphs, a prerequisite to multiple dispatch. In Section 4, we describe the implementation of multi-methods and how the notions introduced in the earlier sections are integrated into the EVL framework. Finally, we discuss performance issues and related work, and present experimental results in Section 5 before concluding in Section 6.

2. Multi-methods formalization
In this section, we formalize the multi-method notion and function comparison operators (FCOs), for which we will use the mathematical notation
• $bp$ is a triplet $(r, t, p)$, called the base prototype, where $r$ is the return type, $t$ a tuple of virtual parameter types and $p$ a tuple of any types. The tuple $t$ has size $N$, called the number of dimensions of the multi-method. The tuple $p$ has size $M$.
For a tuple of arguments $a = (a_1, \dots, a_N, a_{N+1}, \dots, a_{N+M})$, we let $\bar{a}$ denote the tuple of dynamic types of $(a_1, \dots, a_N)$, corresponding to the virtual parameter types;
• $D$ is a set of functions $\{f\}$ such that every $f \in D$ has a prototype compatible with $bp$:
– The first $N$ parameter types of $f$, called virtual parameter types and denoted $\bar{f} = (\bar{f}_1, \bar{f}_2, \dots, \bar{f}_N)$, are polymorphic and covariant to the corresponding $N$ types in $t$, i.e., $\bar{f}_1 < t_1$, $\bar{f}_2 < t_2$, etc.¹;
– The remaining parameter types of $f$, called nonvirtual parameter types, are equal to $p$;
– The return type of $f$ is covariant to $r$.
• $d$ is a map which associates an additional attribute of type data² to each function of $D$;
fco