HPC the easy way: new technologies for high performance application development and deployment


Journal of Systems Architecture 49 (2003) 399–419 www.elsevier.com/locate/sysarc

M. Danelutto *

Department of Computer Science, University of Pisa, V. Buonarroti 2, I-56125 Pisa, Italy

Abstract

With the increase in both the computing power available and the size and complexity of computer applications, existing programming methodologies and technologies for parallel and distributed computing have demonstrated their inadequacy. New techniques have therefore been designed, and are currently being developed, that aim at providing the user/programmer with higher level programming methodologies, environments and run time supports. In this work, we take into account some of these new technologies and we discuss their features, both positive and negative. Eventually, exploiting our experience in structured parallel programming environment design, we try to summarize which features have to be included in the programming environments of the near future, those answering (or trying to answer) the current pressures and urgencies calling for new, efficient, easy to use high performance programming environments. © 2003 Elsevier B.V. All rights reserved.

Keywords: Parallel programming; Structured parallel programming models; Skeletons; Design patterns; Components; High performance computing

1. Introduction

The development of high performance applications (HPa) targeting parallel and/or distributed architectures can be greatly improved by adopting suitable programming techniques and tools. Recently, different technologies have been developed that can contribute to the development and deployment of efficient, high performance applications. On the one hand, different programming

* Tel.: +39-50-22-127-26; fax: +39-50-22-127-42. Address: Dipartimento di Informatica, Viale Buonarroti 2, 56127 Pisa, Italy. E-mail address: [email protected] (M. Danelutto). URL: http://www.di.unipi.it/~marcod.

methodologies and environments have been developed that relieve the programmer of a set of tasks he would otherwise have to deal with entirely on his own. On the other hand, new mechanisms have been developed, and old ones have been revisited, that can be usefully exploited in the implementation of efficient run time systems (RTS). Concerning programming methodologies, coordination languages, algorithmical skeletons, design patterns and component technologies all contribute to provide users with suitable programming tools and environments. Coordination languages, either in their data [1] or control based [2] style, make available suitable ways of coordinating the activity of existing code to implement larger and more complex applications. Algorithmical

1383-7621/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.sysarc.2003.06.001


skeletons provide users with predefined, efficient and reusable ways of modeling parallel and distributed computations, thus relieving the programmer of the need of programming all those error prone details common in parallel and distributed program code [3,4]. Design patterns [5] also make available high level parallel programming constructs, but add to the skeleton technology different possibilities of intervention for programmers not satisfied with the provided constructs/patterns [6,7]. Last but not least, component technology provides users with uniform frameworks that can be used to glue together independently developed procedures/modules/programs in a single, possibly large, application [8]. Concerning the RTS, templates (generic programming), layered RTS implementation, script-able components and data flow techniques have to be considered. Generic programming techniques supply effective ways to organize RTS code in such a way that the common parts can be efficiently programmed and optimized. Script-able components (i.e. component frameworks making available scripting facilities to combine and to create components) represent a viable medium to provide an adequate abstraction level for experienced users, interested in developing the RTS, as well as for common users, interested in plain application development. The usage of layering techniques in the RTS design allows careful optimizations to be performed in different parts of the RTS. Such optimizations make the RTS suitable to support the development and the execution of HPa. Older techniques, such as data flow ones, can also be exploited to let the RTS take care of usually difficult aspects such as parallel task distribution and load balancing [9–11]. Both the programming methodologies and the RTS mechanisms just mentioned have usually been developed within research frameworks that were completely independent of one another. However, positive aspects of all these technologies can (must, actually) be taken into account and exploited in the design and implementation process of HPa. This is especially important when the HPa targets consist of a wide range of parallel and distributed architectures, including GRIDs.

When focusing on HPa development, two further factors have to be taken into account: interoperability and software reuse. Reuse of existing software is a major concern, especially due to the massive amount of existing, optimized sequential and parallel code available nowadays. Actually, most of the programming methodologies mentioned above allow software reuse, at least in the sequential portions of the HPa. Design patterns and algorithmical skeletons allow patterns and skeletons to encapsulate existing sequential portions of code with minimal changes. Coordination languages usually provide mechanisms to develop applications by adding coordination code to existing software written in the more common programming languages. Component technology provides a sort of interface that can be used to encapsulate (and glue together) most of the existing sequential code. On the other side, interoperability, i.e. the ability to provide/use services to/from other applications or programming environments, is still a ‘‘desideratum’’ in many cases. Interoperability is a critical feature both during the development phases and in the production phase. In the development of an HPa, interoperability guarantees that external services can be seamlessly used within the current HPa. In the production phase, interoperability guarantees that HPa services can be made available to other, independent contexts. Several proposals are being developed in different research frameworks that aim at defining more and more advanced tools for HPa development and deployment. In this work we first discuss the experiences that exploit these new technologies to provide the final user (i.e. the programmer/developer of HPa) with suitable tools for HPa development and deployment. Some of the experiences we take into account come from the GRID research community, others from the component research community. Further experiences represent evolutions of the ‘‘independent’’ research tracks of the past years that are evolving towards more integrated and comprehensive solutions. Eventually, we come up with a list of features that should be supported by any environment for HPa development and deployment. We also suggest a set of hints on how such features can be combined in the design of tools supporting this process.


The paper is organized as follows: Section 2 outlines the urgencies and pressures motivating research in the field of parallel/distributed HPC programming tool design. Section 3 gives an overview of those programming models that currently represent promising HPC models. Section 4 summarizes which features of these models are worth preserving. Section 5 discusses two commonly agreed upon ‘‘success stories’’ in the perspective outlined in Sections 3 and 4. Eventually, Section 6 discusses how some of the features of those new programming models can be integrated into a single programming framework, possibly an ideal HPC programming framework, in particular taking into account our experience in structured parallel programming environment design.

2. Pressures and urgencies

Several distinct factors are pushing research on HPa development, in our opinion. First, processor, network and cluster technology has improved impressively in the last years. Have a quick glance at the Top500 list 1 [12]. From November 1997 to June 2003 the kind of machines appearing in the list radically changed. In November 1997 most of the machines were either MPP or SMP (97.8%). In June 2003 a large percentage of the machines in the list are either clusters or ‘‘constellations’’ (57.8%). Peak performance values changed as well, from a raw 1.33 TFLOPS in November 1997 to 35.86 TFLOPS in June 2003. In the meanwhile, network technology evolved both in the system network and in the geographical area network segment, reaching latency and bandwidth values that make possible both the migration from MPPs to clusters and the development of GRID technology [13]. The most evident result of such improvements in hardware

1 That is, the list of the 500 most powerful computing installations in the world, published twice a year in conjunction with major supercomputing events.


technology is the Japanese Earth Simulator project. Within that project a 40 TFLOPS (peak) machine is being operated (the one appearing in the top position of both the November 2002 and the June 2003 Top500 lists, actually), and other machines delivering comparable or larger performance are being designed in the US, following the ASCI machines that already deliver peak performance values around 10 TFLOPS (7–13, actually, from ASCI-White to ASCI-Q). The existence of more and more powerful hardware platforms obviously supports more and more complex applications, which, in turn, are more and more difficult to design, develop and deploy and therefore require more and more effective development tools. Second, and related to the previous point, while some interesting new programming models have been designed, experimented with and proven successful, the vast majority of high performance applications are still being developed using ‘‘traditional’’ programming models and tools. Traditional programming languages used in conjunction with communication libraries (such as PVM or MPI), OpenMP, HPF and vector compilers represent the de facto standard in the development of HPa on single parallel machines; as an example, the programming models used to write applicative software running on top of the Earth Simulator machine are still based on HPF, MPI, OpenMP and vectorizing compiler technology. On the other side, explicit message passing and RPC represent the de facto standard in the development of applications on large scale systems, such as GRID systems. All those ‘‘traditional’’ programming models/mechanisms still require huge efforts from HPa programmers. They must explicitly deal with all the features related to parallel/distributed programming: process setup, mapping and scheduling, communication handling, termination, fault tolerance, etc. Again, this implies that better tools for HPa design, development and deployment are needed. Third, an increasing number of standards are being developed, experimented with and assessed in different fields that eventually come closer and closer to the HPa development field. As an example, the WEB community is converging on the definition of


standard WEB services that are used to develop WEB applications. When developing an HPa it happens more and more frequently that some part of the HPa has already been designed/developed using these standards. The GRID community already recognized the importance of these standards with the definition of the Open Grid Services Architecture (OGSA) [14]. Therefore HPa tools must be able to deal with an increasing number of standards ranging from Object Oriented ones (e.g. CORBA, DCOM, JavaBeans) to WEB ones (e.g. WSDL, SOAP). Last but not least, new communities of users appeared on the scene, namely the GRID and peer-to-peer communities, that both asked for effective, easy to use tools for HPa development and deployment and provided new mechanisms to be used in the HPa development and deployment process. In particular, those communities introduced significant middle-ware infrastructures that should definitely be taken into account when developing HPa [15,16]. In fact, these middle-ware infrastructures may be considered as some kind of operating system extension in the direction of network (or grid) aware operating systems. The existence of these new middle-ware architectures poses new challenges to the development of HPa, as the features provided by those middle-wares can greatly enhance HPa features and final performance, if properly used and exploited.

3. New models

While in the previous section we outlined the main factors contributing to the need for new technologies for HPa development and deployment, here we want to outline the features of new program development technologies that have been designed with the aim of making the design and development of HPa easier and more effective. The technologies we want to focus on are algorithmical skeletons, design patterns, coordination languages, components and WEB services. For all these technologies we only outline their main features, i.e. those features that affect the

efficacy of the model and that therefore must be taken into due account in the design of HPa development and deployment technologies.

3.1. Algorithmical skeletons

Algorithmical skeletons have been around since the early '90s [17] and this research track has been followed by different research groups, including our own [3,4,18–21]. The basic idea is to provide the user/programmer with a set of either language constructs or library calls that completely take care of exploiting a given, recurring, parallel computation pattern. The programmer must only provide those parameters that specialize the pattern in order to get a working, effective parallel program [22]. As an example, a skeleton programmer must supply the sequential portions of code that specialize a generic pipeline construct/call as a particular pipe with a given number of stages, each computing a particular function. He is not required to provide all the code needed to implement the generic pipeline. Skeleton based programming environments have been designed both providing the programmer with a new programming language [20,23] and with skeleton libraries to be called from within a sequential programming language [4,21,24]. Also, skeleton based programming environments have often been designed in such a way that skeletons can be nested. Therefore complex parallel exploitation patterns can be modeled out of the composition of simpler skeletons. Overall, within skeleton technology different techniques have been developed/adopted that allow skeleton semantics [25] to be effectively described and rewriting techniques driven by performance cost models to be applied to improve skeleton code performance [26]. Fig. 1 shows a skeleton program written using Kuchen's skeleton library [4]. Parallelism is exploited using a pipeline with three stages: the first and the third are sequential stages, while the second one is a farm with sequential workers. All the features related to process and communication handling (Kuchen's library works on top of MPI) are actually handled by the library. All that is required of the user, in order to get a parallel run, is a simple mpirun -np command.


Fig. 1. Sample skeleton code written using Kuchen's C++ skeleton library. This is all the code needed to have the program running in parallel, apart from the sequential code for generateInputStream, f and consumeResultStream.
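Since Fig. 1 itself is not reproduced here, the sketch below gives an idea of what such a composition looks like from the programmer's point of view. The Skeleton, Sequential, Farm and Pipeline types are hypothetical and merely illustrative, not Kuchen's actual C++ API: the apply methods only document the functional meaning, while a real skeleton run time support would run stages and farm workers as parallel processes. The point is that the programmer supplies only the sequential functions and the composition.

import java.util.function.Function;

// Hypothetical skeleton library sketch: the programmer declares the parallelism
// structure; processes, mapping and communications are left to the run time support.
interface Skeleton<I, O> { O apply(I input); }

class Sequential<I, O> implements Skeleton<I, O> {
    private final Function<I, O> f;
    Sequential(Function<I, O> f) { this.f = f; }
    public O apply(I in) { return f.apply(in); }
}

class Farm<I, O> implements Skeleton<I, O> {
    private final Skeleton<I, O> worker;   // replicated by the RTS, executed inline here
    Farm(Skeleton<I, O> worker) { this.worker = worker; }
    public O apply(I in) { return worker.apply(in); }
}

class Pipeline<I, M, O> implements Skeleton<I, O> {
    private final Skeleton<I, M> first;
    private final Skeleton<M, O> second;
    Pipeline(Skeleton<I, M> first, Skeleton<M, O> second) { this.first = first; this.second = second; }
    public O apply(I in) { return second.apply(first.apply(in)); }
}

public class SkeletonDemo {
    public static void main(String[] args) {
        // Three stage pipeline: sequential / farm of sequential workers / sequential,
        // mirroring the structure of the Fig. 1 example.
        Skeleton<Integer, Integer> stage1 = new Sequential<Integer, Integer>(x -> x + 1);
        Skeleton<Integer, Integer> stage2 = new Farm<Integer, Integer>(new Sequential<Integer, Integer>(x -> x * x));
        Skeleton<Integer, Integer> stage3 = new Sequential<Integer, Integer>(x -> x - 1);
        Skeleton<Integer, Integer> program =
            new Pipeline<Integer, Integer, Integer>(stage1,
                new Pipeline<Integer, Integer, Integer>(stage2, stage3));
        System.out.println(program.apply(3));  // a real RTS would stream many tasks through this
    }
}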

3.2. Design patterns

Design pattern technology has its roots in object oriented software engineering [5]. Design patterns originally constituted a programming methodology aimed at improving the process of object oriented sequential software design through the adoption of well known (OO) programming patterns. After a while, different authors recognized that the methodology developed, which is based on a verbose but effective description of the patterns that can be used, can also be effectively applied to parallel software design [27]. Later on, different authors also recognized that this methodology could be easily implemented and provided to the user by means of a structured parallel programming environment [7,28–30]. At that point, design patterns look very close to the algorithmical skeleton idea, but for the different roots: the former originated in the OO area,

the latter in the parallel processing area. One of the most important contributions of parallel/distributed design pattern technology lies in the nice exemplification of how layered programming environments can be designed, that both allow plain users to fully exploit design patterns in software design and implementation and allow more experienced users to intervene, adding new patterns to the programming environment once those patterns have been understood to be useful and effective [7,31]. Fig. 2 presents a summary of a parallel design pattern. The actual pattern is described at much greater length. The description of this pattern, for instance, is taken from [32]. In the original work, the description of the pattern takes several pages (five out of a total of 29 pages). The level of detail is such that sample C code is shown for an implementation of the pattern in the paper, and problems such as load balancing, termination,


Fig. 2. Sample parallel design pattern (summary).

optimization of data transfers are all taken into account in this five page description.

3.3. Coordination languages

Coordination languages have been around for a longer time than both design patterns and algorithmical skeletons. The basic idea of coordination languages is to provide the user/programmer with a set of tools that can be used to coordinate the parallel, distributed or simply concurrent execution of existing portions of code [33]. Coordination languages have been designed both with coordination mechanisms based on the existence of a common, shared data space [1] and with explicit control mechanisms coordinating networks of independent processes [2,34]. In some cases, e.g. in

the Linda case, libraries affecting a shared data space (of tuples, in the case of Linda) have been provided that can be called from different sequential languages [1]. In other cases, completely new programming languages have been developed [34] requiring moderate to substantial programming effort from the programmer ‘‘gluing’’ together (that is, coordinating) different software portions into a single parallel/distributed application. In both cases coordination language design fully demonstrated that the parallel/distributed software design and implementation process can be significantly boosted by clearly separating coordination aspects from computation aspects, and, in particular, by allowing programmers to reuse existing portions of sequential code in proper coordination patterns.


Fig. 3. Sample coordination language code, written in Manifold. The program represents a three stage pipeline. The stages, defined elsewhere, may also be ‘‘structured’’ processes.

Fig. 3 shows the code needed to run a three stage pipeline in Manifold [34]. Provided that the three ‘‘processes’’ have the right types in terms of input and output streams, the line labeled begin completely denotes the pipeline. Fig. 4 shows how two processes may interact using JavaSpaces [35], the Sun implementation of the Linda tuple space on top of the Jini/Java technology. It is worth pointing out that this kind of usage of Linda-like cooperation mechanisms requires a programming effort close to the one needed when using plain message passing (but for naming issues, perhaps).

Fig. 4. Sample code using JavaSpaces. The code shows two Java processes (e.g. two stages in a pipeline). The first one sends data to the second one, which consumes (processes) that data. The setup of the Java processes is completely in charge of the user.
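Fig. 4 itself is not reproduced here, but the fragment below sketches the kind of interaction it describes, using the standard JavaSpaces operations (write to put an entry into the shared space, take to withdraw a matching one). The TaskEntry class, the tag values and the way the two methods are used are illustrative only; obtaining the JavaSpace reference requires the usual Jini lookup machinery, which is omitted.

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class SpaceDemo {

    // Entries are plain objects with public fields and a public no-arg constructor;
    // matching is associative: null fields in a template act as wildcards.
    public static class TaskEntry implements Entry {
        public String tag;
        public Integer value;
        public TaskEntry() { }
        public TaskEntry(String tag, Integer value) { this.tag = tag; this.value = value; }
    }

    // First stage: the Linda-like "out" operation, writing a task into the shared space.
    static void produce(JavaSpace space, int v) throws Exception {
        space.write(new TaskEntry("toCompute", v), null, Lease.FOREVER);
    }

    // Second stage: the Linda-like "in" operation, blocking until a matching entry arrives.
    static int consume(JavaSpace space) throws Exception {
        TaskEntry template = new TaskEntry("toCompute", null);
        TaskEntry task = (TaskEntry) space.take(template, null, Long.MAX_VALUE);
        return task.value * task.value;   // user supplied sequential computation
    }
}

Compared with the Manifold fragment, the coordination here is data driven: the two stages never name each other, they only agree on the shape of the entries written to the space.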

3.4. Components

Component technology derives from even stronger software engineering requirements: the huge need for software reuse, as much as possible, within a given applicative framework. Companies often get specialized in the development of software in a particular application area, and, often again, different software packages implementing applications in the same area require common code to solve single, distinct application subproblems. Therefore it became natural to develop frameworks (programming environments) where software components can be designed, developed, maintained and instantiated according to the specific application needs. The DCOM [36] and JavaBeans [37] component frameworks are popular in the Microsoft and Java software communities. CORBA distributed object technology has been used as a de facto component environment both in industrial applications (CORBA comes from a consortium including major software companies [38]) and in different, non-traditional fields (Linux window managers actually access graphical object facilities through a small, local ORB [39]). Recently, the CORBA Component Model (CCM) has been included in the CORBA standard [40], thus moving the entire CORBA framework towards component technology. As usual, technologies developed to enhance the quality of the sequential software development process have been migrated to the parallel and distributed programming world. The Common Component Architecture (CCA) forum [41] is developing a component model which is suitable for the development of distributed HPa [42,43]. The focus there is both on software reuse and on performance, as expected from an HPa development environment.

Fig. 5 shows some fragments of the code needed to implement a task farm worker component using Enterprise Java Bean technology, i.e. the Sun component framework. The code shows a sample of the ‘‘declarative’’ material needed to create and operate components. The sections of code labeled (a) and (b) are interfaces modeling the appearance of the component home, that is, the standard wrapper encapsulating each single component, and the component interface, i.e. the methods/services provided by the component. In other component systems (such as CCA or CCM) these parts are written using some interface description language (IDL) dialect (SIDL, CIDL, etc.). Once the interfaces have been defined, component code must be provided that implements those interfaces, i.e. that implements the components. This code is not shown in the figure: you can imagine a class that implements the Worker interface. Last but not least, in order to use Enterprise Java Bean components you must do two distinct things (see the code in section (c) of the figure): create a component (or alternatively obtain a handle for an already existing component) and interact with the component via its component interface. The former activity is mainly related to the invocation of methods of either standard classes or component home classes to obtain component handles. The latter just requires invoking the component methods on the handles. Overall, the setup of a parallel framework (or of a parallel application) requires a sensible amount of relatively simple code to be prepared. The advantage obviously lies in the interoperability of components within the component framework [44].


Fig. 5. Sample component code (Enterprise Java Beans). (a, b) The code needed to specify home interface (basically the component wrappers) and component interface. (c) The kind of code used to invoke component services. The example is relative to creation and use of a remote component computing tasks (e.g. a task farm worker).
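The code of Fig. 5 is not reproduced here, but the fragment below sketches the three ingredients just described in EJB 2.x style: (a) a home interface, (b) a remote component interface and (c) client code obtaining a handle and invoking the service. The Worker name comes from the text; the operation signature and the JNDI name are illustrative, and the bean implementation class and deployment descriptors (the rest of the ‘‘declarative’’ material) are omitted.

import java.rmi.RemoteException;
import javax.ejb.CreateException;
import javax.ejb.EJBHome;
import javax.ejb.EJBObject;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;

// (a) Home interface: the standard wrapper used to create (or find) component instances.
interface WorkerHome extends EJBHome {
    Worker create() throws CreateException, RemoteException;
}

// (b) Component interface: the services offered by the component (here a task farm worker).
interface Worker extends EJBObject {
    double compute(double task) throws RemoteException;   // illustrative operation
}

// (c) Client code: obtain a handle through the home, then invoke the service on it.
class FarmClient {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        Object ref = ctx.lookup("ejb/Worker");             // illustrative JNDI name
        WorkerHome home = (WorkerHome) PortableRemoteObject.narrow(ref, WorkerHome.class);
        Worker worker = home.create();                     // remote component instance
        System.out.println(worker.compute(3.14));          // remote method invocation
    }
}

In a real deployment the two interfaces would be public, each in its own source file; the developer writes only the interfaces, the bean class and the deployment descriptor, while the container generates the classes actually implementing home and component interfaces.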

3.5. Web services

The WEB community grew larger and larger as the WEB itself became more and more accessible and standard (standard protocols, standard software environments both on the client and on the server side). Standard protocols and mechanisms have been completely assessed to publish and access WEB contents. Recently, the WEB community has been developing (hopefully standard) protocols and mechanisms that access WEB services [45], rather than plain (dynamic or static) WEB documents.


According to the W3C consortium, ‘‘a Web service is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the Web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols’’ [45]. The process of moving from (possibly dynamic) pages to services led to the development of a set of XML based tools and protocols (UDDI, Universal Description, Discovery and Integration [46], SOAP, Simple Object Access Protocol [47] and WSDL, Web Service Description Language [48]) that overall allow users to provide (publish) services through WEB standard interfaces and to use these services remotely. GRID technology is recently converging to solutions very close to the ones adopted in the WEB services context [14], thus moving technologies that were originally meant only to enhance WEB potentialities to the field of HPa development and deployment. Most of the code needed to implement a WEB service looks like the one needed to run Enterprise Java Bean components, at least when the Java bindings to WEB services are used [49]. Fig. 6 shows the code used to implement the service (code portion (a), which is actually ‘‘normal’’ Java code) and the code needed to run a client accessing the service (code portion (b) in the figure). Hidden between the service and the client code there is the WSDL file (20–30 lines of XML code, in this case). This document basically describes the service in terms of methods exported, ports used, messages exchanged and the like. It can be generated by proper automatic tools directly from the Java source code of the service class. Then, proper tools can be used to generate the stub/skeleton code that actually makes the service calls work on top of the SOAP protocol, as an example.

Fig. 6. Sample WEB service code. Portion (a) of this code shows the (skeleton of the) code needed to implement a task farm worker service. Portion (b) of this code shows the actions needed to access the service.
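The code of Fig. 6 is not shown here; the sketch below illustrates the two sides in the same spirit, using the Java (JAX-RPC style) bindings mentioned above. Interface and class names, the WSDL URL and the QName values are illustrative, and the WSDL and stub/tie generation steps performed by the tooling are omitted.

import java.net.URL;
import java.rmi.Remote;
import java.rmi.RemoteException;
import javax.xml.namespace.QName;
import javax.xml.rpc.Service;
import javax.xml.rpc.ServiceFactory;

// (a) The service itself is ordinary Java code: an endpoint interface plus its implementation.
interface WorkerIF extends Remote {
    double compute(double task) throws RemoteException;   // illustrative operation
}

class WorkerImpl implements WorkerIF {
    public double compute(double task) { return task * task; }
}

// (b) A client obtaining a dynamic proxy for the service from its published WSDL description.
class WorkerServiceClient {
    public static void main(String[] args) throws Exception {
        URL wsdl = new URL("http://example.org/worker?wsdl");            // illustrative URL
        QName serviceName = new QName("urn:worker", "WorkerService");    // illustrative names
        QName portName = new QName("urn:worker", "WorkerIFPort");
        Service service = ServiceFactory.newInstance().createService(wsdl, serviceName);
        WorkerIF worker = (WorkerIF) service.getPort(portName, WorkerIF.class);
        System.out.println(worker.compute(3.14));   // the SOAP exchange is hidden behind the proxy
    }
}

Nothing in the client refers to Java on the server side: any implementation reachable through the same WSDL contract would do, which is precisely the interoperability property discussed in Section 4.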

4. What is to be preserved

All the programming models shortly described in the previous sections seem to promise substantial improvements in the process of designing and implementing HPa, although differences exist between them, both in the target architectures considered and in the programming methodologies provided to the user/programmer. In general, if we think of what the technologies for HPa design, development and deployment are and what they should actually be in order to be effective, we can conclude that several features must be inherited. The following subsections outline those features.


4.1. Programmability (expressive power)

Most of the models described in Section 3 provide users with suitable ways of expressing complex parallel applications by using predefined parallel program building blocks (either skeletons or design/coordination patterns), possibly allowing the nesting of these building blocks in order to model more and more complex parallelism forms. The building blocks hide, to different extents, the cumbersome, error prone mechanisms a programmer must deal with when developing HPa, such as process decomposition, scheduling and mapping, communication setup and handling, load balancing, and so on. The expressive power of both skeletons and design patterns is commonly understood to be orders of magnitude better than that of C + MPI style programming environments (as an example, in [7] an experience comparing design pattern based and message passing parallel programming is described, showing that structured parallel programming requires a smaller learning curve and a definitely smaller HPa development time).

4.2. Semantics

Some of these new programming environments/languages/tools come with a clear functional and parallel semantics [25,34] which in some cases is compositional with respect to pattern/skeleton composition, thus allowing the experienced user to extract all the features needed to implement efficient HPa using such environments. This is not the rule when using more traditional approaches based on the usage of sequential languages plus communication libraries (e.g. C/F77 + MPI). In those cases the semantics of the sequential language must be combined with the (often informal) semantics of the communication library in order to understand programs, thus requiring a significant effort when program debugging (both functional and concerning performance) is performed.

4.3. Software reuse

All these new programming environments allow existing (sequential) software to be reused in the development of HPa. Software reuse ranges from


plain reuse of sequential code wrapped in proper sequential constructs [23] to the ability to accommodate entire, possibly parallel, software packages wrapped in proper distributed object/component suites [40]. This is a fundamental feature, as there are so many lines of carefully optimized code to reuse in new applications. Nobody wants to rewrite from scratch some FFT or triangularization code if these are needed in a new application. Everybody just wants either to get already developed source code to be suitably included in the HPa code or to use, within the HPa code, properly chosen, existing, optimized libraries performing the required tasks.

4.4. Interoperability

Some of these new environments allow HPa software to interoperate with different software developed using even completely different environments and tools. A notable example is the WEB service world, where different protocols (e.g. SOAP) have been developed that can exploit different other protocols (e.g. TCP or HTTP) to achieve service connectivity. This is fundamental, especially taking into account that nowadays interesting HPa are becoming more and more complex and multidisciplinary as well. Therefore we can expect that those complex applications require different parts to interoperate, parts that are possibly written using different programming environments and possibly live in different frameworks. As a consequence, any technology used to develop HPa must support as many interoperability protocols as possible, to prevent situations where existing tools cannot be reused due to unsupported protocols.

4.5. Layered implementation

Most of these environments share a layered implementation design. Lower layers usually decouple the target hardware by implementing some kind of abstract machine. Middle layers implement middle-ware as well as different service layers providing an even more complex abstract machine to the upper layers, those close to the user/programmer. Last but not least, different upper layers provide


different levels of intervention to the users. Some of them enable normal users to design and develop new HPa, others provide methods allowing the results of experienced users to be shared among all the system users [31,50]. Although layered implementation of programming environments has been on the scene since the very beginning of the computer science era, HPa pose new requirements on layers. As HPa become more and more complex, the layer structure must become more and more specialized (i.e. optimized) and must provide clearly separated implementation layers. In particular, GRID and component middle-ware clearly set up scenarios where the upper layers are definitely and completely separated from the lower layers, those somehow abstracting the target architecture features.
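The kind of separation advocated here can be rendered, very schematically, as follows: the upper layers are programmed only against an abstract machine interface, and porting the environment amounts to supplying a new implementation of that interface for the new target (cluster library, grid middle-ware, etc.). The interfaces and names below are purely illustrative.

// Lowest layer: an abstract machine hiding the concrete target
// (e.g. an MPI cluster, a grid middle-ware, a shared memory MPP).
interface AbstractMachine {
    int spawn(Runnable activity);            // start a remote/parallel activity, return its id
    void send(int activity, Object message); // point to point communication
    Object receive(int activity);            // blocking receive from a given activity
}

// Middle layer: services written only in terms of the abstract machine,
// never in terms of the target architecture itself.
final class TaskDispatcher {
    private final AbstractMachine machine;
    TaskDispatcher(AbstractMachine machine) { this.machine = machine; }
    void dispatch(int worker, Object task) { machine.send(worker, task); }
    Object collect(int worker) { return machine.receive(worker); }
}

// Porting the whole environment to a new target amounts to providing a new
// AbstractMachine implementation; the upper layers (and user code) stay untouched.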

4.6. Performance

Skeletons, design patterns, coordination languages, components and WEB services all ultimately list performance among their goals, although none of them was born to address performance problems. Skeletons, design patterns and coordination languages were initially aimed at solving the programmability problem for parallel, object oriented and distributed applications, respectively. Components were aimed at providing suitable, engineered ways of reusing software over and over again. WEB services were aimed at allowing active services to be provided through the WEB in addition to classical content retrieval services. However, all these approaches rapidly became interested in delivering performance as well. In particular, skeletons and design patterns demonstrated the ability to deliver performance values comparable to those achieved with more classical programming tools [7,10,20,51]. Components and WEB services basically started considering performance goals when moving from the industrial framework to CCA and when being included, somehow, in the GRID framework, respectively.

4.7. Portability

Last but not least, portability is a feature of all these environments, again only to a given extent. Most of these environments run on different platforms. There are skeleton libraries that run on top of any existing C++/MPI architecture [4], design pattern frameworks running on Java/SPMD architectures [7], and WEB services that have been designed to run on top of almost any existing architecture. However, most of them do not guarantee that performance is ported as well. In other words, apart from some skeleton based environments that have been claimed and proved portable both in terms of functionality and of performance on different architectures [21,23], none of these models currently allows programs to be migrated (in source form) from a machine to a different one with the guarantee that comparable performance is achieved on the second machine.

5. The ‘‘success’’ stories (in a critical perspective)

Let us now take into account what currently available HPa programming models and tools offer to the programmer. We want to consider here just two cases, which we think are the ones most commonly occurring in the real world: HPF and C (or F77, C++) plus MPI. The former is an implicit parallel language, allowing programmers to easily write data parallel only programs. The latter represents the state of the art in the field of sequential languages plus communication libraries.

5.1. HPF

High Performance Fortran allows FORTRAN programmers to write data parallel programs. HPF explicitly leaves the programmer the responsibility of describing some features of the data parallel computation, e.g. the kind of data distribution and the classification of loops as embarrassingly parallel (DO INDEPENDENT) or not. On the other side, HPF arranges execution in such a way that it is performed in parallel: process scheduling and mapping is automatically performed by the HPF run time support and all the communications (or shared data accesses, depending on the target architecture features) are completely in charge of the compiler and the run time system. Overall, HPF programmers can develop


plain data parallel HPa, possibly reusing existing sequential FORTRAN code, without being exposed to some of the problems one usually has to deal with when developing parallel/distributed HPa. On the other side, however, some drawbacks must be considered:

• There is no primitive way in HPF of expressing task (control) parallel applications. Proposals exist [52] that allow HPF control parallel applications to be developed. Also, HPF programmers can use MPI within their programs to explicitly handle those parallelism exploitation patterns not covered by the language. However, no primitive support for task parallelism has been included in the standard, yet.

• Load balancing has to be explicitly handled by programmers, in particular by exploiting the possibility to spread shared data across the different processing elements according to different strategies (e.g. scatter, block). This, in conjunction with the HPF owner computes rule, provides the user/programmer with only a limited possibility of achieving load balancing by exploiting the knowledge relative to the data set at hand.

• HPF is a programming language. HPF tools allow HPF source code to be compiled into F77 + MPI code, as well as into other FORTRAN dialects calling more or less specialized communication libraries supported on the target architecture at hand. However, there is no possibility to link other (possibly parallel) processes to HPF code in case they have not been developed using the FORTRAN suite or HPF. Therefore, both in terms of code reuse and interoperability, HPF performs poorly.

• Last but not least, no guarantee is given that a certain HPF program running on target architecture A can be ported to target architecture B with corresponding performance.

The success of HPF is mainly due to the limited amount of things a programmer must learn in order to use HPF once he knows FORTRAN, and to the possibility of reusing the huge amount of existing FORTRAN code. Despite this, HPF never succeeded in becoming the language to be used to develop HPa and currently it has been over-


whelmed by other, lower level programming models (MPI), waiting for new, better models to come onto the HPa scene.

5.2. MPI

MPI is a message passing library providing programmers with hundreds of different calls that can be used to implement synchronous, asynchronous and one-way point-to-point as well as collective communications. Although not explicitly required by the MPI standard documents, most of the existing MPI programs have been designed and implemented according to the SPMD programming model. While in HPF most of the details concerned with (data) parallelism exploitation are dealt with by compiler tools and the run time system, when using MPI the programmer is forced to explicitly deal with process setup and communication handling. Load balancing is also completely in charge of the programmer. Furthermore, in most cases, when using MPI communication primitives the programmer must completely handle communication buffer setup, usage and release. On the other side, MPI standard implementations (run time systems) explicitly handle communication aspects such as data conversion in case of heterogeneous processing elements, synchronization and process (channel) naming. In general, when using MPI in conjunction with any sequential ‘‘host’’ language:

• Interoperability and code reuse are better handled than in the case of HPF. Different bindings have been provided for the MPI library and therefore different source code can be adapted to fit different parts of an MPI program. Furthermore, as C (C++) is fully supported, users can write MPI processes explicitly using external services or providing services to the foreign world by using existing standard interoperability mechanisms. As an example, calls to methods of external CORBA objects can be issued in the code of a process belonging to the process set of an MPI application. However, in some cases calls to foreign libraries may be impossible due to incompatibilities in the implementation. As an example, some well known (and used)


MPI libraries use SIGUSR POSIX signals for internal purposes, thus making it impossible to use, in conjunction, other libraries that also use these kinds of signals.

• All the cumbersome, error prone details related to parallelism exploitation are completely in charge of the programmer.

• Debugging of MPI applications represents quite an enormous task, at least in the case of non-trivial applications, as the whole parallel structure of the program is in charge of the programmer and therefore may be a source of errors.

• As in the case of HPF (even worse, perhaps), no guarantee concerning performance portability of MPI programs can be stated. Actually, when porting an MPI application from target architecture A to B, the code may need significant changes to keep performance figures in a suitable range.

Summarizing: neither HPF nor MPI can be considered programming models suitable for easy and effective HPa development and deployment. They achieve great results, indeed, but programmers must pay a high price in terms of programmability, interoperability and performance portability, at least. Actually, most current HP architectures support both these programming models. And in most cases, these are the programming models used to achieve performance figures close to the peak ones in applications running on top of these machines. However, the time spent in the development of such applications and, in particular, in debugging their performance, is huge, and HPa programmers use these models mainly because they are the only ones at hand. 2

6. Desiderata

How can we envision a set of tools that allows easy and efficient HPa design, implementation and development to be achieved? First of all, such a set of tools must rely on the existence of some high level, structured parallel

2 At least the only ones that, even spending a huge amount of time, can guarantee acceptable performance figures.

programming model with a flavor similar to that of skeletons, coordination languages and parallel design patterns. This is because all those models demonstrated that the existence of high level parallelism exploitation constructs in general makes it easier and faster to develop HPa code [53]. The programming model must guarantee the possibility to reuse as far as possible existing sequential and parallel code written in any kind of formalism. Furthermore, the programming model must provide the programmer with a set of tools suitable to interface HPa code with other frameworks through standard protocols. These two factors combined guarantee the software reuse and interoperability properties that have historically been the ‘‘missing features’’ of different advanced programming environments such as skeleton based ones [11,19,23] or design pattern based ones [7]. The programming model must also guarantee controlled expandability, i.e. the ability to expand the programming environment with new programming features/constructs as soon as they exist and they have been demonstrated efficient. Again, the lack of expandability has been considered a big mistake in the past. The University of Alberta people developing CO2P3S included ways to extend their pattern set by providing a structured pattern repository [31], to avoid situations where new features are needed but there is no way to integrate them in the existing programming environment. Possibly, the programming model must guarantee compositionality of programs (parts of/whole HPa) both in terms of syntax and in terms of semantics. A good, very promising way of achieving such compositionality looks to be the adoption of component based programming models. 3 The programming environment must definitely be implemented by exploiting a set of independent layers, each implementing an abstract machine that is provided to the upper layers and is implemented in the most efficient way, exploiting all the techniques available.

3 The kind of compositionality we have in mind is close to the one available in the CSP model. Peter Welch [54] has some recent papers clearly explaining all the advantages of the modular design allowed by compositional semantics, mainly related to the development of JCSP [55], a Java CSP embedding.


The upper layers, those close to the layer provided to the user, should make heavy usage of compilation features, as happened in the skeleton world [22,23]. Compilation makes it possible to eliminate most of the overheads needed to match the high level programming models used with the low level middle-ware or operating system features constituting the final target architecture at hand. Lower level layers may rely on intensive usage of interpretation techniques, rather than compilation ones. In particular these levels are the ones dealing with features implemented close to the target architecture and therefore must be able to deal with lots of different peculiarities when executing compiled HPa code. The usage of just in time compilation techniques may improve the performance of these lower level layers. Most of the implementation layers must support scriptability, in order to allow expert users to intervene to modify the run time system (or the compiler) in all those cases where very particular exigencies must be satisfied. The level of scriptability should be structured, in that different kinds of users must be allowed to perform different kinds of interventions on the tools. Furthermore, the existence of different, stacked layers at the lower level may contribute to achieving both functional and performance portability. In particular, functional portability may be guaranteed by substituting appropriate lower layers when moving from one target architecture to another. Performance portability can be achieved by exploiting both lower layer substitution and by setting up the upper layer compilation process in such a way that appropriate knowledge concerning the final target architecture is used in the compiled code to adapt the HPa code to the current architecture features. Performance is an issue that must be taken into account in the design of all the levels of this imaginary HPa programming environment: while designing the programming model, we must be careful not to introduce features that, unless used by very expert programmers, introduce performance penalties in HPa. When designing the lower level layers we must be careful to introduce layers that can be efficiently ported on a range of different


target architectures, and we must take care not to merge in the same layer features that are only partially supported by different architectures. Last but not least, experience in different fields showed that open source projects evolve much more rapidly than non-open source ones, and therefore the source code of the whole environment should be public, under some form of open source license (see [56]), in order to better exploit the community knowledge.

6.1. Our experience

The statements and the reasoning in the previous section are motivated by the considerations made in the first part of this paper. Our experience in the design of structured parallel programming environments also fully validates them. We started designing structured parallel programming languages in the early '90s with P3L, the Pisa Parallel Programming Language [57–59]. P3L was a skeleton based structured parallel programming language. Control parallel (i.e. task parallel) skeletons (pipeline, farm and loop, modeling pipelines, task farms and iterative parallelism exploitation patterns, respectively) as well as data parallel skeletons (map and reduce, modeling forall and associative accumulation parallelism exploitation patterns) were included in the language. C was the only programming language the programmer could use in P3L to express sequential computations. No mechanisms were provided to include new skeletons in the language, nor to call externally provided services. The P3L compiler tools figured out most of the decisions needed to produce the actual parallel code (C + MPI code, actually) [60]. As an example, the actual parallelism degree of task farms was decided by the compiling tools. It was computed using profile data and both analytical performance models and heuristics (heuristics were used to match the parallelism degree with the resources actually available) [61,62]. The whole compiler tool set of P3L was based on the idea of an implementation template library. Implementation templates, i.e. known, efficient, parametric process networks implementing a given skeleton, were instantiated and composed at compile time in order to produce the actual C + MPI object code [23,60].
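As a toy illustration of the interplay between analytical models and heuristics mentioned above (the actual P3L models were considerably more elaborate), the parallelism degree of a farm template can be derived from profiled service and inter-arrival times and then capped by the resources actually available:

// Toy version of the compile time choice of a farm parallelism degree.
final class FarmTemplateSizing {

    static int parallelismDegree(double workerServiceTime,  // profiled time to compute one task
                                 double interArrivalTime,   // profiled time between input tasks
                                 int availablePEs) {        // processing elements at hand
        // Analytical model: a farm keeps up with its input stream when
        // n workers satisfy n * interArrivalTime >= workerServiceTime.
        int ideal = (int) Math.ceil(workerServiceTime / interArrivalTime);
        // Heuristic: never use more workers than the processing elements actually available.
        return Math.max(1, Math.min(ideal, availablePEs));
    }

    public static void main(String[] args) {
        // e.g. 100 ms per task, a new task every 12 ms, 16 PEs available: 9 workers are enough
        System.out.println(parallelismDegree(100.0, 12.0, 16));
    }
}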


Fig. 7. Sample P3L code. Sequential portions of code are wrapped into seq skeleton templates; in the figure, just one seq skeleton is sketched. The code is relative to a three stage pipeline with sequential first and third stages. The second stage is a farm whose workers, in turn, are two stage pipelines.

The P3L project was initiated as a joint project of the Department of Computer Science of the University of Pisa and the Hewlett Packard Pisa Science Center. When the HP Pisa Science Center closed, in 1992, the project was continued by people from the Pisa Computer Science Department. Later on, the project was absorbed by the PQE2000 project, a national project funded by the National Research Council with the participation of QSW. This project led to the SkIE prototype [51], which represented a re-engineered, industrial version of P3L. P3L 4 and SkIE demonstrated the ability to produce parallel code as efficient as hand written C + MPI code, at least in the case of coarse grain parallel code. The main drawbacks of the P3L

4 Fig. 7 shows sample P3L code.

(partially resolved in SkIE) design can be summarized as follows:

• There was no way to exploit parallelism according to patterns even slightly different from the ones originally provided in the language, to accommodate particular user needs.

• A limited amount of code reusability facilities were included. 5

• No interoperability with other standards was supported. 6

5 This holds for P3L, mainly. The prototype P3L compiler only allowed the use of C code [60], while the industrial version SkIE also allowed the use of F77, C++ and Java [51].
6 Again, this is true for P3L. In SkIE, parts of the overall parallel program can be written according to the HPF standard.


• Even taking into account that the language was quite a simple one, programmers were forced to learn another programming language in order to write P3L/SkIE programs.

• Last but not least, the template based implementation of the compiling tools required a complete rewriting of the template library when porting the compiler from one target architecture to a second, different target architecture. Furthermore, template instantiation was performed statically at compile time. Therefore the object code did not perform efficiently on irregular computations. Also, in case of small errors or approximations in the analytical performance models used in the compiler, the object code happened to be slightly inefficient.

Later on, we tried to address two of these drawbacks, namely the necessity to learn a new language and the problems related to the template based implementation. First of all, we experimented with the embedding of P3L-like skeleton frameworks in different sequential languages, i.e. we tried to provide skeletons to the final user as library calls rather than as statements of some kind of programming language [24,63,64]. The experiment produced good results. The ideas in SKELIB [24] can be found (in a more mature form and relative to a different skeleton set) in modern skeleton library implementations [4]. Second, we tried to change the static, template based implementation of the skeleton compiling tools in order to address in a more efficient way both load balancing and irregular computation problems. Therefore we developed a macro-data flow implementation model [9,10] that has eventually been used to implement Lithium [21], a pure Java skeleton embedding (Fig. 8 shows an example of Lithium code). Lithium also used some rewriting/optimization rules [65] that greatly enhanced the object code performance without any programmer intervention. The macro-data flow implementation model has subsequently been adopted by different authors [20]. As a side effect, the macro-data flow implementation model turns out to be more efficiently portable across a range of different architectures.
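Without reproducing any Lithium internals, the sketch below conveys the essence of the macro-data flow approach: the compiled program is a graph of coarse grain instructions, each of which becomes fireable as soon as all of its input tokens have arrived, so that any idle interpreter in a pool can execute it. This is what gives load balancing, also on irregular computations, essentially for free. All names here are illustrative.

import java.util.Arrays;
import java.util.function.Function;

// A macro data flow instruction: a coarse grain sequential computation
// plus one slot per input token it is waiting for.
class MdfInstruction {
    private final Function<Object[], Object> code;  // the (possibly large) sequential code
    private final Object[] tokens;                   // input token slots
    private int missing;                             // tokens still to be delivered

    MdfInstruction(int arity, Function<Object[], Object> code) {
        this.code = code;
        this.tokens = new Object[arity];
        this.missing = arity;
    }

    // Called when a producer instruction delivers a token to one of the input slots.
    synchronized void deliver(int slot, Object token) {
        if (tokens[slot] == null) {
            tokens[slot] = token;
            missing--;
        }
    }

    // An instruction is fireable when all its input tokens are present;
    // fireable instructions can be scheduled on any idle interpreter of the pool.
    synchronized boolean fireable() {
        return missing == 0;
    }

    // Executing the instruction produces the result token(s) for downstream instructions.
    Object fire() {
        return code.apply(Arrays.copyOf(tokens, tokens.length));
    }
}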


After these steps, we had new, highly expressive parallel programming environments that were able to deliver good performance on parallel code as well as to provide significant rapid prototyping/debugging features. In all those environments the programmer was not required to write a single line of code setting up a process network, arranging communications or even implementing load balancing policies. We were not satisfied, however, as there were still two big problems to be addressed: interoperability and expandability. Here comes ASSIST (a software development system based on integrated skeleton technology), the coordination language we are currently implementing at the Pisa Computer Science Department [3,66,67]. ASSIST improves our previous structured parallel programming environments in at least three ways:

• It provides the user with a lower level but highly configurable parallel skeleton (the parmod one) that can be used to model a wide range of parallelism exploitation patterns, ranging from control parallel ones (farms and pipes) to data parallel ones, under complete programmer control. Indeed, the programmer is anyway relieved from writing any process scheduling and mapping or communication setup code, as happened in the P3L and SkIE cases.

• It provides the user both with ways of accessing external services and with ways to provide services to the external world via well known standard protocols, e.g. CORBA, thus providing good interoperability features. 7

• Its compiler and run time support tools are completely engineered using advanced OO programming techniques and the whole system is organized around distinct, well defined and optimized layers. Therefore, the overall programming environment is easily expandable with new features, in case those are necessary. Furthermore, the lower layers are available on a range of different architectures, thus enforcing a certain degree of portability.

7 At least with respect to the previous skeleton based parallel programming environments.


Fig. 8. Sample Lithium code. The code uses a three stage pipeline having sequential first and third stages. The second stage is a farm having two stage pipeline workers.

This is currently our working prototype, and we are involved in several national projects aiming at transforming the prototype in different ways. Within one of these projects (the 1999 National Research Council Strategic Project ‘‘E-science enabling technologies and applications’’), we are porting the whole environment onto GRIDs. In another project (the FIRB project GRID.it: Enabling platforms for high-performance

computational grids oriented to scalable virtual organizations) we are transforming the environment in order to be able to use and provide components as the building blocks of the programming environment and as the building blocks of the final HPa. The whole line going from P3L to ASSIST fully supports the features we outlined in Section 6 as ‘‘good’’ features for new environments addressing HPa design, development and deployment.


7. Conclusions

In this work, we tried to give a uniform view of some existing research lines in the field of parallel/distributed, high performance programming models and tools, with the aim of understanding which features an ideal programming environment supporting HPa design, development and deployment must support/provide. In so doing, we exploited the knowledge gained in more than 15 years of activity in the field of structured parallel programming environment design and implementation. We hope that this kind of ‘‘position’’ paper gives the reader a more precise idea of the problems related to HPa development as well as of the kind of solutions that can be adopted in HPa programming environments.

Acknowledgements

I wish to thank M. Aldinucci, S. Campa, M. Coppola, P. Pesciullesi, R. Ravazzolo, M. Torquati, M. Vanneschi, C. Zoccolo for the useful discussions that contributed to improve this paper, as well as all the people that participated in the design and implementation of P3L, SkIE, Lithium, SKELIB and ASSIST. This work has been partially supported by Italian MIUR under Strategic Project ‘‘legge 449/97’’ year 1999 No. 02-00470-ST97 and year 2000 No. 02-00640-ST97 and by FIRB Project No. RBNE01KNFP GRID.it: Enabling platforms for high-performance computational grids oriented to scalable virtual organizations.

References

[1] N. Carriero, D. Gelernter, Linda in context, Communications of the ACM 32 (4) (1989) 444–458.
[2] G.A. Papadopoulus, F. Arbab, Control-driven coordination programming in shared dataspace, in: Vol. 1277 of LNCS, Springer-Verlag, 1997.
[3] M. Vanneschi, The programming model of ASSIST, an environment for parallel and distributed portable applications, Parallel Computing 28 (12) (2002) 1709–1732.


[4] H. Kuchen, A skeleton library, in: Euro-Par 2002, Parallel Processing, LNCS, No. 2400, Springer-Verlag, 2002, pp. 620–629.
[5] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1994.
[6] S. Bromling, Generalising pattern-based parallel programming systems, in: Proceedings of PARCO 2001, Imperial College Press, 2002.
[7] S. MacDonald, J. Anvik, S. Bromling, J. Schaeffer, D. Szafron, K. Tan, From patterns to frameworks to parallel programs, Parallel Computing 28 (12) (2002) 1663–1684.
[8] D. Gannon, Using the GRID to support software component systems, in: Proceedings of the SIAM PP 1999, 1999.
[9] M. Danelutto, Dynamic run time support for skeletons, in: E.H. D'Hollander, G.R. Joubert, F.J. Peters, H.J. Sips (Eds.), Proceedings of the International Conference PARCO99, Parallel Computing Fundamentals and Applications, Imperial College Press, 1999, pp. 460–467.
[10] M. Danelutto, Efficient support for skeletons on workstation clusters, Parallel Processing Letters 11 (1) (2001) 41–56.
[11] J. Serot, Tagged-token data-flow for skeletons, Parallel Processing Letters 11 (4) (2001).
[12] Top500.org, Top500 supercomputer sites. Available from .
[13] The Global GRID Forum home page. Available from , 2003.
[14] I. Foster, C. Kesselman, J.M. Nick, S. Tuecke, The physiology of the GRID. An open grid services architecture for distributed system integration. Available from , 2002.
[15] The Globus Project home page. Available from , pointers to manuals, technical reports and papers inside, 2003.
[16] The JXTA home page. Available from , 2003.
[17] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computations, Research Monographs in Parallel and Distributed Computing, Pitman, 1989.
[18] M. Cole, Bringing skeletons out of the closet. Available at author's home page, December 2002.
[19] P. Au, J. Darlington, M. Ghanem, Y. Guo, H. To, J. Yang, Co-ordinating heterogeneous parallel computation, in: L. Bouge, P. Fraigniaud, A. Mignotte, Y. Robert (Eds.), EuroPar'96, Springer-Verlag, 1996, pp. 601–614.
[20] J. Serot, D. Ginhac, Skeletons for parallel image processing: an overview of the SKIPPER project, Parallel Computing 28 (12) (2002) 1685–1708.
[21] M. Aldinucci, M. Danelutto, P. Teti, An advanced environment supporting structured parallel programming in Java, Future Generation Computer Systems 19 (5) (2003) 611–626.
[22] S. Pelagatti, Structured Development of Parallel Programs, Taylor & Francis, 1998.
[23] B. Bacci, M. Danelutto, S. Orlando, S. Pelagatti, M. Vanneschi, P3L: a structured high level programming language and its structured support, Concurrency: Practice and Experience 7 (3) (1995) 225–255.
[24] M. Danelutto, M. Stigliani, SKELIB: parallel programming with skeletons in C, in: A. Bode, T. Ludwig, W. Karl, R. Wismüller (Eds.), Euro-Par 2000 Parallel Processing, LNCS, No. 1900, Springer-Verlag, 2000, pp. 1175–1184.
[25] M. Aldinucci, M. Danelutto, An operational semantics for skeletons, in: Proceedings of PARCO'2003, in press.
[26] H. Bischof, S. Gorlatch, E. Kitzelmann, Cost optimality and predictability of parallel programming with skeletons, in: H. Kosch, L. Boszormenyi, H. Hellwagner (Eds.), Proceedings of EuroPar'03, LNCS 2790, Springer-Verlag, pp. 682–693.
[27] D. Goswami, A. Singh, B.R. Preiss, Using object-oriented techniques for realizing parallel architectural skeletons, in: Proceedings of the ISCOPE'99 Conference, LNCS, No. 1732, Springer-Verlag, 1999, pp. 130–141.
[28] B.L. Massingill, T.G. Mattson, B.A. Sanders, A pattern language for parallel application languages, Technical Report TR 99-022, University of Florida, CISE, 1999.
[29] B.L. Massingill, T.G. Mattson, B.A. Sanders, A pattern language for parallel application programs, in: A. Bode, T. Ludwig, W. Karl, R. Wismüller (Eds.), Euro-Par 2000 Parallel Processing, LNCS, No. 1900, Springer-Verlag, 2000, pp. 678–681.
[30] S. MacDonald, D. Szafron, J. Schaeffer, S. Bromling, Generating parallel program frameworks from parallel design patterns, in: A. Bode, T. Ludwig, W. Karl, R. Wismüller (Eds.), Euro-Par 2000 Parallel Processing, LNCS, No. 1900, Springer-Verlag, 2000, pp. 95–105.
[31] S. Bromling, Meta-programming with parallel design patterns, Master's thesis, Department of Computer Science, University of Alberta, 2001.
[32] B. Massingill, T. Mattson, B. Sanders, Patterns for parallel application programs, in: Proceedings of the Sixth Pattern Languages of Programs Workshop, 1999. Available from .
[33] D. Gelernter, N. Carriero, Coordination languages and their significance, Communications of the ACM 35 (2) (1992) 97–107.
[34] The MANIFOLD home page. Available from , 2002.
[35] Sun, The JavaSpace home page. Available from , 2003.
[36] Microsoft. Available from , 2003.
[37] Sun, JavaBeans home page. Available from , 2003.
[38] Object Management Group home page. Available from , 2003.
[39] The GNOME home page. Available from , 2003.
[40] J. Siegel, CORBA 3, OMG Press, John Wiley and Sons, 2000.
[41] Common Component Architecture Forum home page. Available from , 2003.

[42] XCAT home page. Available from , 2003.
[43] Ccaffeine home page. Available from , 2003.
[44] J. Snell, Web services interoperability. Available from , January 2002.
[45] W3C, Web services home page. Available from , 2003.
[46] UDDI home page. Available from , 2003.
[47] SOAP home page. Available from , 2003.
[48] WSDL home page. Available from , 2003.
[49] Sun, Java Web Services home page. Available from , 2003.
[50] S. MacDonald, D. Szafron, J. Schaeffer, J. Anvik, S. Bromling, K. Tan, Generative design patterns, in: 17th IEEE International Conference on Automated Software Engineering (ASE), Edinburgh, UK, 2002. Also available from .
[51] B. Bacci, M. Danelutto, S. Pelagatti, M. Vanneschi, SkIE: a heterogeneous environment for HPC applications, Parallel Computing 25 (1999) 1827–1852.
[52] S. Orlando, R. Perego, COLTHPF, a run-time support for the high-level coordination of HPF tasks, Concurrency: Practice and Experience 11 (8) (1999) 407–434.
[53] D. Szafron, J. Schaeffer, An experiment to measure the usability of parallel programming systems, Concurrency: Practice and Experience 8 (2) (1996) 147–166.
[54] P. Welch, Home page. Available from , 2003.
[55] The JCSP home page. Available from , 2003.
[56] GNU license home page. Available from , 2003.
[57] F. Baiardi, M. Danelutto, M. Jazayeri, S. Pelagatti, M. Vanneschi, Architectural models and design methodologies for general-purpose highly-parallel computers, in: Proceedings of the IEEE CompEuro '91 (Advanced Computer Technology, Reliable Systems and Applications), Bologna, Italy, 1991, pp. 18–25.
[58] F. Baiardi, M. Danelutto, R.D. Meglio, M. Jazayeri, M. Mackey, S. Pelagatti, F. Petrini, T. Sullivan, M. Vanneschi, Pisa Parallel Processing Project on general-purpose highly-parallel computers, in: Proceedings of the COMPSAC '91, 1991, pp. 536–543.
[59] M. Danelutto, R.D. Meglio, S. Orlando, S. Pelagatti, M. Vanneschi, A methodology for the development and support of massively parallel programs, Future Generation Computer Systems 8 (1–3) (1992) 205–220.
[60] S. Ciarpaglini, M. Danelutto, L. Folchi, C. Manconi, S. Pelagatti, ANACLETO: a template-based P3L compiler, in: Proceedings of the PCW'97, Canberra, Australia, 1997.
[61] B. Bacci, M. Danelutto, S. Pelagatti, S. Orlando, M. Vanneschi, Unbalanced computations onto a transputer grid, in: Proceedings of the 1994 Transputer Research and Application Conference, IOS Press, Athens, Georgia, USA, 1994, pp. 268–282.
[62] B. Bacci, M. Danelutto, S. Pelagatti, Resource optimization via structured parallel programming, in: K.M. Decker, R.M. Rehmann (Eds.), Programming Environments for Massively Parallel Distributed Systems, Birkhauser, 1994, pp. 13–25.
[63] M. Danelutto, R.D. Cosmo, X. Leroy, S. Pelagatti, Parallel functional programming with skeletons: the OCAMLP3L experiment, in: ACM Sigplan Workshop on ML, 1998, pp. 31–39.
[64] M. Danelutto, Task farm computations in Java, in: M. Bubak, H. Afsarmanesh, R. Williams, B. Hertzberger (Eds.), High Performance Computing and Networking, LNCS, No. 1823, Springer-Verlag, 2000, pp. 385–394.
[65] M. Aldinucci, M. Danelutto, Stream parallel skeleton optimisations, in: Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems, IASTED/ACTA Press, Boston, USA, 1999, pp. 955–962.
[66] M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, M. Danelutto, P. Pesciullesi, R. Ravazzolo, M. Torquati, M. Vanneschi, C. Zoccolo, ASSIST demo: a high level, high performance, portable, structured parallel programming environment at work, in: H. Kosch, L. Boszormenyi, H. Hellwagner (Eds.), Proceedings of EuroPar'2003, LNCS 2790, Springer-Verlag, pp. 1295–1300.
[67] M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, M. Danelutto, P. Pesciullesi, R. Ravazzolo, M. Torquati, M. Vanneschi, C. Zoccolo, A framework for experimenting with structured parallel programming environment design, in: Proceedings of PARCO'2003, in press.

Marco Danelutto received his PhD in 1990 from University of Pisa. He is currently an Associate Professor at the Department of Computer Science of the same University. His main research interests are in parallel/distributed computing; in particular, in the field of structured parallel programming models and coordination languages. He was one of the designers of the skeleton based parallel programming language P3L and of the Lithium pure Java skeleton library. He is currently involved in several national research projects aimed at designing structured coordination languages for high performance parallel/distributed/GRID computing.