::type_t head_t; 10 typedef optim
TASK 3
TASK 3
TASK 2
TASK 2
TASK 2
TASK 2
TASK 1
TASK 1
TASK 1
...
2 1 0
TASK 1
TASK 3
...
Time Fig. 7. Task/Node mapping for a three stage pipeline.
...
614
J. Falcou et al. / Parallel Computing 32 (2006) 604–615
5. Related work As stated in Section 1, QUAFF is a library-based approach to skeleton-based parallel programming. The most significant projects related to this approach are BSMLlib, Lithium, eSkel and Muesli. BSMLlib [17] is a library for integrating Bulk Synchronous Parallel (BSP) programming in a functional language (Objective Caml). It extends the underlying lambda-calculus with parallel operations on parallel data structures. Being based on a formal operational semantics, it can be used to predict execution costs. Lithium [9,2] is a Java library in which parallel programs are written using a set of predefined skeleton classes to be run on a cluster or grid-like network of machines. The underlying execution model is a macro dataflow one. The choice of Java is motivated by the fact it provides an easy to use, platform-independent, object oriented languages. The ESkel Library [6] proposed by Murray Cole represents a concrete attempt to embed the skeleton based parallel programming method into the mainstream of parallel programming. It offers various skeletal parallel programming constructs to the C/MPI programmer, allowing a large variation in the kind of algorithms that can be developed [7]. ESkel proposes a limited, yet useful set of skeletons. Building applications is easy and integrating ad hoc parallelism is often an appreciable shortcut to circumvent skeleton limitations. However, eSkel exposes a rather low-level API, with a lot of MPI-specific implementation details. MUESLI, the Mu¨nster Skeleton Library [15] is a C++ skeleton library proposed by Herbert Kuchen. Based on a platform independent structure, it integrates features from several other systems like the two-tier model of P3L [4] and some data parallel skeleton. The main idea is to generate a process topology from the construction of various skeleton classes and to use a distributed container to handle data transmission. Kuchen’s approach of a polymorphic C++ skeleton library is interesting as it proposes a high level of abstraction but stays close to a language that is familiar to a large crowd of developers. Moreover, the C++ binding for higher order functions and polymorphic calls ensure that the library is type safe. The set of skeleton proposed is large enough to accommodate various parallel algorithms. The main problem with Muesli, however, is that the overhead paid for such a high-level API is rather high (in [15] it is reported to be between 20% and 110% for simple applications). 6. Conclusion In this paper, we have introduced QUAFF, a skeleton-based parallel programming library written in C++. Compared to similar works, QUAFF drastically reduces the runtime overhead by performing most of skeleton expansion and optimization at compile time. This is carried out by relying on template-based meta-programming techniques and without compromising expressivity nor readability. This has been demonstrated on several realistic, complex vision applications, including a real-time 3D reconstruction application and, more recently, a real-time pedestrian tracker based on particle filtering [10]. For these reasons, we claim that QUAFF meets the four guidelines proposed by Cole in [6] to assess skeletonbased parallel programming libraries. First, by relying on a plain C++ interface, QUAFF contributes to the propagation of the skeleton concept with minimal disruption. This is especially true in the computer vision domain – one of our target application domain –, where developers are traditionally reluctant to use another programming language and where reuse of existing code and libraries is mandatory. Second, QUAFF offers several ways to integrate ad hoc parallelism, either by using the pardo skeleton or by inserting direct calls to MPI inside user-defined tasks. Third, QUAFF shows the payback of using a skeleton-based approach. In our case, the gain in development effort time has been at least of one order of magnitude (an application such as the one described in Section 3 or in [10] can be implemented in less that 1 day, whereas a hand-crafted MPI implementation required more than a week of development and debugging). Last, and although this has been only suggested in this paper, QUAFF offers a way to accommodate diversity, by passing additional parameters to the existing skeletons (such as the dispatcher/collector option for the farm skeleton). Ongoing work focus on two directions. At the application level, we are experimenting with other applications from several domains in order to refine the skeleton set and options. At the implementation level, we are working to further improve the inter-skeleton optimization rules sketched in Section 4.4. A public release of the QUAFF software is planned in the next few months.
J. Falcou et al. / Parallel Computing 32 (2006) 604–615
615
References [1] D. Abrahams, A. Gurtovoy, C++ Template Metaprogramming: Concepts, Tools and Techniques from Boost and BeyondAW C++ in Depth Series, Addison Wesley, 2004. [2] M. Aldinucci, M. Danelutto, P. Teti, An advanced environment supporting structured parallel programming in Java, Future Generation Computer Systems (2003) 611–626. [3] A. Alexandrescu, Modern C++ Design: Generic Programming and Design Patterns AppliedAW C++ in Depth Series, Addison Wesley, 2001. [4] B. Bacci, M. Danelutto, S. Orlando, S. Pelagatti, M. Vanneschi, P3L: a structured high level programming language and its structured support, Concurrency: Practice and Experience (1995) 225–255. [5] M. Cole, in: K. Hammond, G. Michaelson (Eds.), Research Directions in Parallel Functional Programming, Springer, 1999 (Chapter 13, Algorithmic Skeletons). [6] M. Cole, Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming, Parallel Computing 3 (2004) 389–406. [7] M. Cole, A. Benoit, Using eSkel to implement the multiple baseline stereo application, in: ParCo 2005, Malaga, Spain, September 2005. [8] F. Dabrowski, F. Loulergue, Functional Bulk Synchronous Programming in C++, in: 21st IASTED International Multi-conference, Applied Informatics (AI 2003), Symposium on Parallel and Distributed Computing and Networks, ACTA Press, 2003, pp. 462–467. [9] M. Danelutto, P. Teti, Lithium: a structured parallel programming enviroment in Java, in: Proceedings of Computational Science ICCS 2002, 2002, pp. 844–853. [10] J. Falcou, J. Se´rot, T. Chateau, J.-T. Lapreste´, Real time parallel implementation of a particle filter based visual tracking, in CIMCV Workshop – ECCV 2006, 2006. [11] A. Fusiello, E. Trucco, A. Verri, A compact algorithm for rectification of stereo pairs, Machine Vision and Application 12 (1) (2000) 16–22. [12] M. Hamdan, G. Michaelson, P. King, A scheme for nesting algorithmic skeletons, in: Proceedings of the 10th International Workshop on Implementation of Functional Languages, IFL ’98, pp. 195–212, 1998. [13] C. Harris, M. Stephens, A combined corner and edge detector, in 4th Alvey Vision Conference, pp. 189–192, 1988. [14] R.I. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000. [15] H. Kuchen, A skeleton library, in: Proceedings of the 8th International Euro-Par Conference on Parallel Processing, 2002, pp. 620– 629. [16] H. Kuchen, J. Striegnitz, Higher-order functions and partial applications for a C++ skeleton library, in: JGI ’02: Proceedings of the 2002 joint ACM-ISCOPE Conference on Java Grande, 2002, pp. 122–130. [17] F. Loulergue, Implementation of a Functional Bulk Synchronous Parallel Programming Library, in 14th IASTED International Conference on Parallel and Distributed Computing Systems, ACTA Press, 2002, pp. 452–457. [18] M. Poldner, H. Kuchen, Scalable Farms, in: ParCo 2005, Malaga, Spain, September 2005, pp. 122–130. [19] J.V.W. Reynders, P.J. Hinker, J.C. Cummings, S.R. Atlas, S. Banerjee, W.F. Humphrey, S.R. Karmesin, K. Keahey, M. Srikant, M. Dell Tholburn, POOMA: A Framework for Scientific Simulations of Parallel Architectures, in: Parallel Programming in C++, MIT Press, 1996, pp. 547–588. [20] N. Scaife, G. Michaelson, S. Horiguchi, Parallel Standard ML with Skeletons, 2004. [21] E. Unruh, Prime number computation, ANSI Technical Report X3J16-94-0075/ISO WG21-462, 1994. [22] T. Veldhuizen, Expression templates, C++ Report 7 (5) (1995) 26–31. [23] T. Veldhuizen, Five compilation models for C++ templates, in First Workshop on C++ Template Programming, October 10, 2000. [24] T. Veldhuizen, D. Gannon, Active libraries: rethinking the roles of compilers and libraries, in: Proceedings of the SIAM Workshop on Object Oriented Methods for Inter-operable Scientific and Engineering Computing (OO’98), SIAM Press, 1998.