EVL: A framework for multi-methods in C++

Science of Computer Programming 98 (2015) 531–550 Contents lists available at ScienceDirect Science of Computer Programming www.elsevier.com/locate/...



Science of Computer Programming www.elsevier.com/locate/scico

Yannick Le Goc a, Alexandre Donzé b,∗

a Institut Laue-Langevin, 6 rue Jules Horowitz, BP 156, 38042 Grenoble Cedex 9, France
b University of California, Berkeley, EECS Department, Cory Hall 545S, 94720 Berkeley, CA, USA

Article info

Article history: Received 3 April 2012; received in revised form 9 August 2014; accepted 11 August 2014; available online 4 September 2014.

Keywords: Multi-methods; Multiple dispatch; Runtime type information; Object-oriented programming; C++

Abstract

Multi-methods are functions whose calls at runtime are resolved depending on the dynamic types of more than one argument. They are useful for common programming problems. However, while many languages provide different mechanisms to implement them in one way or another, there is still, to the best of our knowledge, no library or language feature that handles them in a general and flexible way. In this paper, we present the EVL (Extended Virtual function Library) framework which provides a set of classes in C++ aiming at solving this problem. The EVL framework provides a generalization of virtual function dispatch through the number of dimensions and the selection of the function to invoke using a so-called Function Comparison Operator (FCO). Our library provides both symmetric and asymmetric dispatch algorithms that can be refined by the programmer to include criteria other than class inheritance. For instance, the EVL framework provides multi-methods with predicate dispatch by defining a dedicated FCO based not only on the dynamic types of the arguments but also on their values. This flexibility greatly helps to resolve ambiguities without having to define new functions. Our multi-methods also unify dispatch tables and caching by introducing cache strategies for which the implementation is a balance between memory and speed. To define multi-methods in C++, we implement a non-intrusive reflection library providing fast dynamic casting and supporting dynamic class loading. Our multi-methods are policy-based class templates that support virtual but not repeated inheritance. They check the type compatibility of functions at compile-time, preserve type-safety and resolve function calls at runtime by invoking the cache or updating it by computing the selected function for the requested tuple of types. By default, our multi-methods handle dispatch errors at runtime by throwing exceptions, but an error-code strategy can be set up by defining a dedicated policy class. Performance of our multi-methods is comparable with that of standard virtual functions when configured with fast cache.

© 2014 Elsevier B.V. All rights reserved.

∗ Corresponding author: A. Donzé.

http://dx.doi.org/10.1016/j.scico.2014.08.003
0167-6423/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In object-oriented programming (OOP), it is common to define some member functions as virtual to allow them to be overridden by differing concrete implementations in derived classes; a call to a virtual function is then resolved to one of its concrete implementations at runtime based on the dynamic type of the object on which the call is made. Multiple dispatch can be seen as an extension of this selection mechanism, where the concrete implementation to use at runtime depends on the dynamic type of more than one argument. It has been implemented in different languages with various terminologies. In CLOS [1] and Dylan [2], generic functions implement multiple dispatch. In Cecil [3] and MultiJava [4], such method families are called multi-methods. The most popular object-oriented languages, including C++ [5] and Java [6], propose a built-in single dispatch mechanism implemented by class member functions (virtual functions in C++, non-static methods in Java), but do not support the runtime resolution of functions that depend on the dynamic type of more than one argument. Potential applications of multiple dispatch are (see [7] for more examples):

• The traversal of a graph data structure where nodes are instances of different polymorphic classes without defining a process function in each node class;

• The definition of an event handler for a widget toolkit by defining functions depending on two dynamic arguments (event, widget);

• The definition of binary comparison operators.

Each example has its own specifics. The traversal example can be found in 3D image synthesis, where scenes are modeled with an OOP data structure called a scene graph and for which a number of different rendering algorithms (defined in the process functions) can be used. The well-known visitor pattern [8], which makes use of common features of object-oriented languages, is functional and relatively convenient, but it introduces additional dependencies between the node class to process and the visitor class. The event handler and the binary operator examples are typical problems requiring the use of double dispatch. As an example of a triple dispatcher, one can consider the event handler problem with an additional polymorphic argument (its parameter type is polymorphic) representing a context state.

Multiple dispatch is an advanced topic in OOP. As a consequence, training programmers to use such a tool is a problem by itself; this was addressed, e.g., by Chambers in [9] but is still an ongoing concern. Although the concept has existed for years, there are still many remaining issues with multiple dispatch and many ways of defining it; it is also not clear how popular it is among practitioners. Muschevici et al. [10] present a cross-language comparison of how multiple dispatch is used in programs. By defining code metrics, they are able to provide statistics over applications coded in different languages. The results tend to show that when multi-methods are available in a language, they are used. We can thus expect that once multi-methods are implemented in C++, they will be used. Moreover, Muschevici et al. studied the potential use of multi-methods in the Java language. To that end, they collected statistics on the use of explicit multiple dispatch by implementations of the visitor pattern and of implicit multiple dispatch by cascaded use of the instanceof operator.
The results showed that cascaded instanceof operators are more often used than the visitor pattern.

The first issue for multiple dispatch is the resolution of ambiguities when looking for the best matching function, i.e., when more than one function matches but it is not possible to decide which one to call. In C++, it is usual to encounter this situation. For instance, if we define two non-member overloaded functions foo with a single argument of type float for the first function and double for the second function and we try to apply foo to an int, we get an ambiguous call error at compile-time. When multiple dispatch is defined as an extension of single dispatch, the natural idea is to have a compiler that provides compilation errors when it detects ambiguities. However, detecting ambiguities with multi-methods is not as easy as for single dispatch functions, which are defined in the scope of a class hierarchy. It may not even be possible for the compiler to check for ambiguities at compile-time, since it has to check all possible tuples of types, but their exhaustive list is only available at link-time. In the presence of dynamic class loading, this list can even vary at runtime. Furthermore, it can be very large, increasing the probability of ambiguities. Also, it is legitimate and plausible to define multi-methods for which ambiguous tuples of types exist but are intentionally never used together. An a priori systematic detection of possible ambiguities would forbid this use case.

A common approach to ambiguity resolution is the definition of additional functions that forward their calls to the already defined functions. The minimal set of additional functions to define can be computed [11] but the number of functions to define can remain large. The resolution can be either symmetric [9,7] or asymmetric [1].
Compared to asymmetric resolution (e.g., arguments are taken in lexicographic order), symmetric resolution (all arguments have the same “weight”) seems more natural in C++ because it conforms to its function overload resolution rules. However, symmetric resolution generates more ambiguities, and the programmer may find it easier to consider the resolution argument by argument.

Various approaches have been used to incorporate multiple dispatch in OOP languages. Some languages natively support it, like CLOS [1], Dylan [2] and Cecil [3]. For others, it comes as an extension, such as MultiJava [4] for Java. In C++, Open Multi-Methods [7] were proposed as an extension of virtual functions to more general multi-methods, but have not yet been submitted for inclusion in the standard. An alternative approach is to implement a library based on existing language features. In Java, JMMF [12] is such a library, based on Java’s reflection mechanism. In the Python language [13], several modules can be imported, and in C++, the Cmm library [14] uses pre-processor definitions and a pre-compiler to integrate multiple dispatch. By contrast, the library that we propose is based on the C++ standard only. Several other C++ libraries have been developed prior to ours, such as Loki [15] or DoubleCpp [16]. In Section 5.2, we provide a more detailed comparison of our work with existing approaches and implementations, in terms of features and performance.

One point to be discussed is where to define functions of multi-methods. In [7], they are non-member functions automatically handled by a special linker which will construct the family of compatible callable functions, hence their qualification as open. This provides easy extensibility and less verbosity. The problem with such global functions is that they can be found by the linker in different translation units even if those have nothing in common, leading to unexpected dispatches.


Another issue concerns the definition of multi-methods for a contextual processing of objects. If a multi-method is defined outside of any class, the context must be passed as a parameter, which can be cumbersome. On the contrary, if this multi-method is encapsulated in a class, i.e., the set of functions participating in the resolution belongs to a class [17], it becomes possible to encapsulate the context in this same class and have it accessible for the function definitions as a private member. Moreover, it provides more control over the visibility of the functions, although it is more constrained and does not allow easy extensibility. Finally, Chambers [9] asks whether multi-methods defined outside the scope of a class are object-oriented.

In this paper, we propose a C++ multi-methods framework called EVL for “Extended Virtual functions Library”. We take advantage of the library approach as opposed to a language evolution approach to provide highly configurable multi-method classes. We first present a generalization of the function selection algorithm by using a redefinable function comparison operator (FCO). The operator is based on the distance (as measured by a configurable metric) between tuples of types in the subclass tree and optional additional information. The goal of this generalization is to propose a solution that can easily implement either the asymmetric or the symmetric resolution of the polymorphic arguments, or even a mix of both for multiple dispatch of more than two dimensions, or arbitrarily refined and specific resolution strategies depending on the programmer’s needs and choices. Then we present our cache strategy, which involves no more than an associative container filled at runtime. The advantage is that the cache can be implemented as a standard dispatch table by a vector of functions as well as a bounded hash map to reduce the memory footprint.
More complex compressed dispatch tables are also possible to implement in principle [18], as are caching algorithms such as page replacement algorithms [19].

We provide a complete and functional implementation of multiple dispatch for C++ based on the C++ 2003 standard. Our multi-methods are runtime objects that record functions and dynamically resolve their application. They are defined by class templates that preserve type safety through strong typing. Only functions compatible with the defined prototype of the multi-method can be recorded without compilation errors. They also provide policies for FCO, cache and error handling.

To realize the library implementation of the function selection algorithms based on tuples of types, we need inheritance relations at runtime. As the C++ standard does not specify runtime reflection and no existing library met our criteria, we defined a practical reflection mechanism using rules that make it as simple as possible for the programmer to declare types. Although the reflection part of the EVL framework is not our primary focus, we believe that it constitutes a contribution by itself, as it leads to the definition and implementation of interesting features such as fast dynamic casting [20] and simple rich pointers [21] (a variant of smart pointers that do not provide memory management but type information facilities).

To summarize, the contributions of this work include:

1. A highly flexible multi-method framework including
• the conceptualization of customizable function comparison operators to define any dispatch function selection algorithm, or refine existing ones (symmetric, asymmetric). FCOs can be used to resolve ambiguities and implement predicate dispatch,
• a customizable cache strategy to improve the performance of a multi-method (speed, memory),
• a customizable error strategy to support environments where exceptions cannot be thrown.
2. A fully-functional C++ library implementation which
• is C++03 standard compliant, macro-free and performant,
• provides dynamic class loading support,
• implements and uses a partial runtime reflection library that is non-intrusive and provides fast dynamic casting.

The rest of the paper is organized as follows. In Section 2, we formalize the concepts of multi-methods and function comparison operators. In Section 3, we present our implementation of reflection in C++ to construct inheritance graphs, a prerequisite to multiple dispatch. In Section 4, we describe the implementation of multi-methods and how the notions introduced in the earlier sections are integrated into the EVL framework. Finally we discuss performance issues and related work, and present experimental results in Section 5 before concluding in Section 6.

2. Multi-methods formalization

In this section, we formalize the multi-method notion and function comparison operators (FCOs), for which we will use the mathematical notation
• bp is a triplet $(r, t, p)$, called the base prototype, where $r$ is the return type, $t$ a tuple of virtual parameter types and $p$ a tuple of any types. The tuple $t$ has size $N$, called the number of dimensions of the multi-method. The tuple $p$ has size $M$. For a tuple of arguments $a = (a_1, \cdots, a_N, a_{N+1}, \cdots, a_{N+M})$, we let $\bar{a}$ denote the tuple of dynamic types of $(a_1, \cdots, a_N)$, corresponding to the virtual parameter types;

• $D$ is a set of functions $\{f\}$ such that every $f \in D$ has a prototype compatible with bp:
  – The first $N$ parameter types of $f$, called virtual parameter types and denoted $\bar{f} = (\bar{f}_1, \bar{f}_2, \ldots, \bar{f}_N)$, are polymorphic and covariant to the corresponding $N$ types in $t$, i.e., $\bar{f}_1 < t_1$, $\bar{f}_2 < t_2$, etc.¹;
  – The remaining parameter types of $f$, called nonvirtual parameter types, are equal to $p$;
  – The return type of $f$ is covariant to $r$.

• $d$ is a map which associates an additional attribute of type data² to each function of $D$;

•
m(a) = f ∗ (ca), fco where f ∗ is the minimal element of D a¯ for the order
• Functions of $D$ can have different names and be non-members or members of a class. In the latter case, they must be bound to a calling object, making them equivalent to a non-member function. We call them overriders of $m$ [7].

• Type-safety is preserved. The set $D_{\bar{a}}$ contains only functions $f$ for which $\mathrm{cast}_{\bar{f}}(a)$ succeeds.

• The invocation of $m$ fails if
  – there is no minimal element for
In the above deﬁnition of our multi-methods, the
$$
f <_{\mathrm{fco}} g =
\begin{cases}
\text{true} & \text{if } \bar{f} < \bar{g}, \\
\text{false} & \text{otherwise.}
\end{cases}
$$

Recall that
¹ The special type all can be used to relax the covariance constraint. E.g., $t = (A, B)$ requires $\bar{f}_1 \le A$ and $\bar{f}_2 \le B$; $t = (A, \text{all})$ only requires $\bar{f}_1 \le A$.
² There is no attribute when data is void.


2.3. Distance-based function comparison operators

We first introduce a distance between types. Let $u$ and $t$ be polymorphic types such that $u < t$; then the distance $d = \sigma(u, t)$ between $u$ and $t$ is defined as the length of the longest class derivation path

$$u < t_1 < \cdots < t_{d-1} < t.$$

Using $\sigma$, we can compare $f \in D_{\bar{a}}$ and $g \in D_{\bar{a}}$ by comparing

$$\sigma_{\bar{a}}(f) = \big(\sigma(\bar{a}_1, \bar{f}_1), \cdots, \sigma(\bar{a}_N, \bar{f}_N)\big) \in \mathbb{N}^N$$

with

$$\sigma_{\bar{a}}(g) = \big(\sigma(\bar{a}_1, \bar{g}_1), \cdots, \sigma(\bar{a}_N, \bar{g}_N)\big) \in \mathbb{N}^N.$$

Note that $\sigma_{\bar{a}}(f)$ and $\sigma_{\bar{a}}(g)$ are well defined by construction of $D_{\bar{a}}$. We propose three FCOs, based on lexicographic order, product order and 1-norm of tuples. The lexicographic FCO is defined as:

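$$
f <_{l} g
\;\Leftrightarrow\;
\sigma_{\bar{a}}(f) <_{l} \sigma_{\bar{a}}(g)
\;\Leftrightarrow\;
\sigma(\bar{a}_1, \bar{f}_1) < \sigma(\bar{a}_1, \bar{g}_1)
\ \text{ or }\
\exists k > 1,\ \forall i < k,\
\sigma(\bar{a}_i, \bar{f}_i) = \sigma(\bar{a}_i, \bar{g}_i)
\text{ and }
\sigma(\bar{a}_k, \bar{f}_k) < \sigma(\bar{a}_k, \bar{g}_k).
$$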
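As an illustration of the distance σ and of the lexicographic comparison, the following C++11 sketch computes the longest derivation path over a small hand-written class graph and compares two distance tuples. It is not taken from the EVL library: the names `ClassGraph`, `sigma`, `sigmaTuple` and `lexLess`, the string-based class graph, the convention σ(u, u) = 0 and the value -1 for unrelated types are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical class graph: each class name maps to its direct base classes.
using ClassGraph = std::map<std::string, std::vector<std::string>>;

// sigma(u, t): length of the longest derivation path from u up to t.
// Assumed conventions: 0 when u == t, -1 when u does not derive from t.
int sigma(const ClassGraph& graph, const std::string& u, const std::string& t) {
    if (u == t) return 0;
    int best = -1;
    const auto it = graph.find(u);
    if (it == graph.end()) return best;
    for (const std::string& base : it->second) {
        const int d = sigma(graph, base, t);
        if (d >= 0) best = std::max(best, d + 1);  // keep the longest path
    }
    return best;
}

using Distances = std::vector<int>;

// sigma_a(f): one distance per virtual parameter, from the dynamic type of
// each argument to the corresponding parameter type of the function.
Distances sigmaTuple(const ClassGraph& graph,
                     const std::vector<std::string>& dynamicTypes,
                     const std::vector<std::string>& paramTypes) {
    assert(dynamicTypes.size() == paramTypes.size());
    Distances result;
    for (std::size_t i = 0; i < dynamicTypes.size(); ++i)
        result.push_back(sigma(graph, dynamicTypes[i], paramTypes[i]));
    return result;
}

// Lexicographic comparison of two distance tuples of equal size:
// the first differing component decides.
bool lexLess(const Distances& f, const Distances& g) {
    return std::lexicographical_compare(f.begin(), f.end(), g.begin(), g.end());
}
```

For a hierarchy where B derives from A and C derives from both B and A, `sigma` returns 2 for (C, A) because the longest path C < B < A is retained. For arguments of dynamic types (C, B), an overrider taking (B, A) has distances (1, 1) and one taking (A, B) has distances (2, 0); `lexLess` ranks the first before the second because the first component decides.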

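The product FCO is defined as:

$$
f <_{p} g
\;\Leftrightarrow\;
\sigma_{\bar{a}}(f) <_{p} \sigma_{\bar{a}}(g)
\;\Leftrightarrow\;
\forall i,\ \sigma(\bar{a}_i, \bar{f}_i) \le \sigma(\bar{a}_i, \bar{g}_i)
\text{ and }
\exists j,\ \sigma(\bar{a}_j, \bar{f}_j) < \sigma(\bar{a}_j, \bar{g}_j).
$$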
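The product order, and the way it can leave a call unresolved, can be sketched as follows in C++11. This is an illustrative sketch rather than EVL code: `productLess` and `selectBest` are hypothetical names, the candidates are given directly as distance tuples, and "best" is taken here to mean a candidate that precedes every other candidate, with -1 returned when no such candidate exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Distances = std::vector<int>;

// Product comparison: every component of f is <= the matching component
// of g, and at least one component is strictly smaller.
bool productLess(const Distances& f, const Distances& g) {
    assert(f.size() == g.size());
    bool strict = false;
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (f[i] > g[i]) return false;
        if (f[i] < g[i]) strict = true;
    }
    return strict;
}

// Returns the index of the candidate that precedes all the others,
// or -1 when there is none (empty set or ambiguous call).
int selectBest(const std::vector<Distances>& candidates) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        bool best = true;
        for (std::size_t j = 0; j < candidates.size() && best; ++j) {
            if (j != i && !productLess(candidates[i], candidates[j]))
                best = false;
        }
        if (best) return static_cast<int>(i);
    }
    return -1;
}
```

With candidates (0, 1) and (1, 0), neither tuple precedes the other, so `selectBest` returns -1; this is the kind of ambiguity that symmetric resolution produces. Adding a candidate (0, 0) resolves it, which corresponds to defining an extra overrider.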

The lexicographic FCO does not produce ambiguities (except in case of multiple inheritance) but it gives more importance to the first elements than to the last elements of the tuple. On the other hand, the product FCO is more “democratic” since it will only consider a tuple to be “inferior” to another tuple if all its elements are “inferior”. To define the lexicographic and the product FCOs, we do not need the actual distance between types; the inheritance relation would be enough. However, this additional information can be used to define new relations between tuples of types. For example, we can define a 1-norm FCO as:

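$$
f <_{1} g
\;\Leftrightarrow\;
\sigma_{\bar{a}}(f) <_{1} \sigma_{\bar{a}}(g)
\;\Leftrightarrow\;
\sum_{i=1}^{N} \sigma(\bar{a}_i, \bar{f}_i) < \sum_{i=1}^{N} \sigma(\bar{a}_i, \bar{g}_i).
$$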
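A 1-norm comparison is short to write down; the C++11 sketch below is only an illustration under the same assumptions as before (distance tuples given as vectors of non-negative integers, hypothetical names `norm1` and `norm1Less`), not the library's implementation.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

using Distances = std::vector<int>;

// Sum of the per-argument distances (1-norm of the tuple).
int norm1(const Distances& d) {
    return std::accumulate(d.begin(), d.end(), 0);
}

// 1-norm comparison: f precedes g when its total distance is smaller.
bool norm1Less(const Distances& f, const Distances& g) {
    assert(f.size() == g.size());
    return norm1(f) < norm1(g);
}
```

The tuples (0, 1) and (2, 0) are not comparable under the product order, but the 1-norm ranks (0, 1) first since 1 < 2. The tuples (0, 1) and (1, 0) have the same sum, so neither precedes the other and the call stays ambiguous.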

Since we have the implications

$$\sigma_{\bar{a}}(f) <_{p} \sigma_{\bar{a}}(g) \;\Rightarrow\; \sigma_{\bar{a}}(f) <_{l} \sigma_{\bar{a}}(g)$$

$$\sigma_{\bar{a}}(f) <_{p} \sigma_{\bar{a}}(g) \;\Rightarrow\; \sigma_{\bar{a}}(f) <_{1} \sigma_{\bar{a}}(g),$$

the 1-norm and the lexicographic FCOs can be seen as a product FCO for which we provide an additional disambiguation rule (sum of distances for 1-norm and order of arguments for lexicographic). The 1-norm FCO reduces ambiguities (without eliminating them) but can be less intuitive in practice. Finally, note that the lexicographic FCO is an example of asymmetric resolution and the product FCO is an example of symmetric resolution.

2.4. Refining function comparison operators

A distance relation might not be enough to define a proper FCO, meaning that using only distance information can still yield ambiguities. They can be resolved either by adding the necessary overriders or by refining the FCO with additional rules. The refinement of an FCO
reﬁned

f
⎧ if f
otherwise.

The library makes it possible to attach data to each function or use extra type information that the implementation of the FCO can use (through the d component of the multi-method). Thus the natural way to define a proper FCO that resolves ambiguities is as follows:


Fig. 1. Illustration of the overriders multiplication problem. To resolve the ambiguity with $(B, A_1)$, one would have to define $k$ overriders $(A_1, B), \cdots, (A_k, B)$.

1. Deﬁne

f
f
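The idea of refining an FCO with data attached to each function (Section 2.4) can be sketched in C++11 as below. This is a hedged illustration, not the EVL definition: the `Candidate` structure, its integer `priority` attribute (standing in for the data attached through d, with lower values winning) and the function `refinedLess` are assumptions introduced for the example. The sketch keeps the verdict of the product comparison whenever it orders two candidates and only falls back on the attribute otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: its distance tuple plus an attached attribute.
struct Candidate {
    std::vector<int> distances;  // sigma_a(f)
    int priority;                // assumed attached data; lower wins
};

// Base comparison: product order on distance tuples.
bool productLess(const std::vector<int>& f, const std::vector<int>& g) {
    assert(f.size() == g.size());
    bool strict = false;
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (f[i] > g[i]) return false;
        if (f[i] < g[i]) strict = true;
    }
    return strict;
}

// Refined comparison: the base order decides when it can; the attached
// priority is consulted only for pairs the base order leaves unordered.
bool refinedLess(const Candidate& f, const Candidate& g) {
    if (productLess(f.distances, g.distances)) return true;
    if (productLess(g.distances, f.distances)) return false;
    return f.priority < g.priority;
}
```

With this rule, two overriders at distances (0, 1) and (1, 0) no longer require an additional forwarding overrider: the one with the lower priority value is selected, while pairs already ordered by the base comparison keep their order regardless of priority.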
