Future Generation Computer Systems 37 (2014) 46–59

Enabling scientific workflow sharing through coarse-grained interoperability

Gabor Terstyanszky a,∗, Tamas Kukla a, Tamas Kiss a, Peter Kacsuk a,b, Akos Balasko b, Zoltan Farkas b

a Centre for Parallel Computing, University of Westminster, London, United Kingdom
b MTA-SZTAKI, Budapest, Hungary

Highlights

• Coarse-Grained Interoperability concept developed to support workflow sharing.
• Data structure elaborated to enable workflow management in a repository.
• Created a simulation platform to allow the execution of meta-workflows.

Article info

Article history:
Received 21 December 2012
Received in revised form 18 December 2013
Accepted 17 February 2014
Available online 19 March 2014

Keywords:
Workflow
Workflow interoperability
Workflow repository
Simulation platform
Distributed Computing Infrastructure

Abstract

E-scientists want to run their scientific experiments on Distributed Computing Infrastructures (DCIs) to be able to access large pools of resources and services. Running experiments on these infrastructures requires specific expertise that e-scientists may not have. Workflows can hide resources and services behind a virtualization layer, providing a user interface that e-scientists can use. There are many workflow systems used by research communities, but they are not interoperable. Learning a workflow system and creating workflows in it may require significant effort from e-scientists. Considering this effort, it is not reasonable to expect that research communities will learn new workflow systems whenever they want to run workflows developed in other workflow systems. The solution is to create workflow interoperability solutions that allow workflow sharing. The FP7 Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs (SHIWA) project developed two interoperability solutions to support workflow sharing: Coarse-Grained Interoperability (CGI) and Fine-Grained Interoperability (FGI). The project created the SHIWA Simulation Platform (SSP) to implement the Coarse-Grained Interoperability approach as a production-level service for research communities. This paper describes the CGI approach and how it enables sharing and combining existing workflows into complex applications and running them on Distributed Computing Infrastructures. The paper also outlines the architecture, components and usage scenarios of the simulation platform. © 2014 Elsevier B.V. All rights reserved.

1. Introduction

E-scientists have put large efforts into the exploitation of Distributed Computing Infrastructures (DCIs) for their ability to run compute- and data-intensive applications. Many DCIs with large user communities have emerged during the last decade, such as the Distributed European Infrastructure for Supercomputing Applications (DEISA) [1], Enabling Grids for E-sciencE (EGEE) [2], the German D-Grid initiative (D-Grid) [3], the UK National Grid Service (NGS) [4] and the North American TeraGrid (TG) [5]. They are based on different middleware stacks that provide an abstraction layer between computer resources and applications. For example, NGS and TeraGrid are built on the Globus Toolkit [6], EGEE on gLite [7], DEISA relies on both the Globus Toolkit and Unicore [8], while D-Grid supports gLite, the Globus Toolkit and Unicore. In Europe the European Grid Infrastructure (EGI) federates all major European organizations through National Grid Infrastructures (NGIs). Production DCIs are commonly built on a large number of components, such as data resources, metadata catalogues, authentication and authorization methods, and repositories. As a result, managing the execution of applications on DCIs is a complex task.

∗ Correspondence to: University of Westminster, 115 New Cavendish Street, W1W 6UW London, United Kingdom. Tel.: +44 20 79115000; fax: +44 20 79115089. E-mail addresses: [email protected], [email protected] (G. Terstyanszky). http://dx.doi.org/10.1016/j.future.2014.02.016. 0167-739X/© 2014 Elsevier B.V. All rights reserved.


Fig. 1.1. Snapshot of heterogeneous technologies in current workflow systems.

Moreover, solutions developed for one DCI are difficult to port to other infrastructures. In order to hide this complexity from researchers, workflow systems are widely used as a virtualization layer on top of the underlying infrastructures. Workflows have become essential to integrate expertise about both applications (user domain) and DCIs (infrastructure domain) in order to support research communities. Workflows have emerged as a new paradigm for e-scientists to formalize and structure complex scientific experiments, enabling and accelerating many significant scientific discoveries. A workflow is a formal specification of a scientific process which represents, streamlines and automates the analytical and computational steps that e-scientists need to go through, from data selection and integration, computation and analysis to final data presentation and visualization. A workflow system supports the specification, modification, execution, failure recovery and monitoring of a workflow, using a workflow engine to control its execution.

Research communities have developed different workflow systems and created large numbers of workflows to run their experiments. Fig. 1.1 presents some of the best known workflow systems and the DCIs where their workflows are executed. Although this picture is far from complete, it demonstrates how heterogeneous workflow technology is. These systems differ in their workflow description languages, which are based on different formalisms, enactment strategies and middleware providing access to different infrastructures. This often has a profound impact on the resulting workflow performance, development effort, management and portability. It takes significant effort and time to learn how to use workflow systems, and specific expertise and skills are required to develop and maintain workflows. As a result, creating, running and maintaining workflows on DCIs requires substantial effort and expertise.

E-scientists would like to share workflows (automatic porting of workflows across workflow systems and DCIs) to optimize their efforts. Currently, the major obstacle to workflow sharing is that workflow systems are not compatible. E-scientists hesitate to learn new workflow systems or to port their experiments to other workflow systems, as this is a time-consuming and error-prone process. Sometimes they are even constrained to a particular workflow system by the DCIs they can use, or by services operated by developers or system administrators.

Without workflow interoperability e-scientists often depend on specific workflow systems and DCIs. As a result, they cannot develop and run their experiments on multiple workflow systems to exploit the available performance and share workflows. Remark: the SHIWA project addressed the middleware and workflow systems highlighted in Fig. 1.1.

Workflow sharing needs workflow system interoperability for two reasons: first, to be able to reuse workflows written in different languages; second, to be able to execute workflows using different DCIs. The first challenge concerns the execution of workflows by different workflow systems to enable workflow sharing. The second one relates to the execution of compute- and/or data-intensive workflows on different DCIs to harvest the available compute and data resources and improve the overall performance of workflows. It is difficult to address these two challenges not only because of the diversity and complexity of DCIs and workflow systems, but also because of the wide range of user requirements and the complex data/control flow dependencies of workflows.

Workflow interoperability enables the integrated execution of workflows of different workflow systems that may span multiple heterogeneous infrastructures. It can facilitate application migration as infrastructures, services and workflow systems evolve. Workflow interoperability allows workflow sharing to support and foster the adoption of common research methodologies, improve the efficiency and reliability of research by reusing these common methodologies, increase the lifetime of workflows and reduce the development time of new workflows. Interoperability among workflow systems not only permits the development and enactment of large-scale and comprehensive workflows, but also reduces the existing gap between different DCIs, and consequently promotes cooperation among research communities exploiting these DCIs.

The FP7 Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs (SHIWA) project [9,10] developed two workflow interoperability solutions: Coarse-Grained Interoperability (CGI) and Fine-Grained Interoperability (FGI). The project created a simulation platform, the SHIWA Simulation Platform (SSP), to support these two interoperability solutions.


In this paper, we describe in detail the CGI approach and how the simulation platform implements this approach, providing production services for research communities in order to share and combine existing workflows into complex applications that can run on several DCIs.

In Section 2 the paper overviews and analyses four major workflow interoperability approaches: workflow language standardization, workflow translation, workflow engine integration and workflow interoperability through sharing data resources. Section 3 outlines the coarse-grained workflow interoperability solution and how it was implemented by the SHIWA project as a production-level service. Section 4 describes the SHIWA Simulation Platform and its major components. Section 5 presents two basic types of usage scenarios that enable the description and execution of workflows on the simulation platform. In Section 6 we describe a sea vessel tracker application and how it was ported to and executed on the simulation platform.

2. Related work

Previously workflow systems were not able to share workflows, their components and data because of the lack of interoperability among them. Making workflow systems interoperable and sharing existing workflows of different workflow systems is a major requirement of e-scientists. This interoperability enables inter- and multidisciplinary collaboration between different research communities by speeding up research design and improving the quality of research outputs. The major driving forces behind these efforts are the Workflow Management Research Group [11] of the Open Grid Forum (OGF) and the Workflow Management Coalition (WfMC) [12]. Four major approaches have been identified to achieve workflow interoperability: workflow language standardization, workflow language translation, workflow engine integration and sharing data among workflows.

Workflow language standardization: workflow description language standardization would make workflow systems able to exchange workflows. It would realize interoperability via workflow sharing by defining a standard workflow description language which workflow system developers and e-scientists can use.

Workflow translation: having an intermediate workflow language, workflow system developers can either adopt it or define import/export processes for workflow translation. This solution would allow e-scientists to reuse workflows created in different workflow systems and execute them using the workflow system which they are familiar with. WfMC defined the XML Process Definition Language (XPDL) [13] to support workflow translation; however, it has not gained universal acceptance so far. YAWL [14] is based on an extensive analysis of more than 30 existing workflow systems using a set of workflow patterns. Because of its expressive power and formal semantics, YAWL might be a candidate intermediate language for workflow translations [15]. The WfMS workflow system [16] defines interfaces for workflow description translation to achieve workflow language interoperability. The SHIWA project developed the Fine-Grained Interoperability (FGI) solution, which is based on the Interoperable Workflow Intermediate Representation (IWIR) [17]. IWIR is a simple and interoperable workflow representation which enables translation from one workflow language to another through IWIR.

Workflow engine integration: the basic idea behind this approach is executing a workflow in its own native environment by its own native workflow engine.
This approach can significantly reduce the complexity of achieving workflow interoperability, even though it does not provide workflow manipulation capabilities. The SIMDAT project [18] developed two approaches to deal with workflow interoperability.

The first approach wraps each workflow as a service and creates a workflow client which can invoke the workflow from its own native workflow system by passing references to data inputs and outputs. The second approach wraps a workflow engine as a service and creates a workflow engine client for each workflow system. This client passes the workflow and its data inputs to its own native workflow engine to execute it. The SIMDAT approaches implement these services as grid or web services. WfMS [19] is based on workflow engine coordination and synchronization to exchange data and control information between different workflow systems. The architecture supports two scenarios: workflow systems are either pre-deployed and executed on dedicated servers, or submitted and deployed on the fly. In both scenarios clients are used to forward the workflow description to the workflow engine. YAWL [20] introduced the YAWL interoperability broker to connect the YAWL workflow engine to different workflow engines. The broker provides an interoperability-model based interface to access different workflow systems to execute workflows. Finally, the VLE-WFBus [21] wraps workflow systems as federated services and manages them as loosely coupled services. The workflow bus handles workflows of different workflow systems as sub-workflows of a meta-workflow and incorporates these sub-workflows into a meta-workflow system through the bus. The bus manages the control and data dependencies among sub-workflows via data objects. It uses scenario managers to integrate workflow systems and study managers to orchestrate the execution of the meta-workflow. The scenario and study managers are implemented in a multi-agent framework.

Shared data resources: interoperation between workflows can also be achieved via shared data resources. This approach enables different workflow systems to access the same data resources, allowing workflows to exchange data. This integration level can be realized by using a general heterogeneous data access technique that can be easily integrated into any workflow system. Several solutions exist, such as the Java Database Connectivity middleware (JDBC) [22] or the Open Database Connectivity middleware (ODBC) [23], to support sharing data resources. These solutions provide connections to several relational databases, such as MySQL, Oracle, PostgreSQL, IBM DB2, MS SQL Server, and many more, and allow the execution of simple SQL queries. The Open Grid Services Architecture Data Access and Integration (OGSA-DAI) [24] also integrates several SQL and XML databases. OGSA-DAI also supports file systems and allows the execution of complex workflows of data-related requests in a grid environment.

There were several attempts to achieve workflow language standardization, but it is rather unlikely to happen in the near future because workflow communities insist on using their own workflow languages and do not want to port their workflows to a new workflow language even if it is a standard one. The main advantage of workflow translation is that workflow developers can modify workflows of different workflow systems after translating them into their own workflow system. Its major limitation is that workflow languages have different expression capabilities and formalisms; as a result, when workflows are translated some features and properties may be lost. A further limitation is that workflow translation requires significant effort to create translators among different workflow languages.
The main advantage of workflow engine integration is that it requires minimal effort to enable sharing of workflows of different workflow systems. Its major limitation is that workflows are managed as black boxes and cannot be changed. The major limitation of achieving workflow interoperability through shared data resources is that workflow systems use different data formats and data transfer policies.

To address workflow interoperability SHIWA created two interoperability concepts: Coarse-Grained Interoperability (CGI) and Fine-Grained Interoperability (FGI).


Fig. 3.1. Data structure of workflow and workflow engine description.

The CGI concept is based on workflow engine integration (see Section 3). The FGI concept is based on workflow translation. SHIWA developed and deployed a production-level CGI service on the SHIWA Simulation Platform (SSP) (see Section 4), and offers a prototype-level FGI solution. The production-level CGI service is publicly available and used by several European research communities, such as Computational Chemistry, Life Sciences, etc. The rest of the paper presents the CGI concept and how the simulation platform supports it.

3. Coarse-grained workflow interoperability

Workflow language standardization is not likely to happen in the near future and workflow translation is not always possible. Considering these limitations we selected the workflow engine integration approach to address workflow interoperability. Analysing the existing workflow engine integration approaches we identified some major drawbacks. SIMDAT and YAWL wrap workflow engines as services. Publishing workflow engines as services and creating clients for those services make the whole process too complicated, and these approaches do not offer automated mechanisms to create and publish such services. To make it even more complicated, SIMDAT has a scenario in which even the workflows are wrapped as services. In the WfMS solution the synchronization mechanism of workflow systems is difficult to manage because of the Petri net based reference model. VLE-WFBus uses an API to access different workflow systems; any changes in the API may also require changes in the scenario managers of the workflow systems. Finally, none of these approaches except YAWL provides a publicly available repository where workflow developers can upload and publish workflows, and where researchers can browse and download workflows. To further improve workflow sharing we elaborated the Coarse-Grained Interoperability (CGI) concept with the following objectives:

• to define a workflow engine integration based interoperability concept which is independent from any workflow system and allows efficient management of workflows of different workflow systems;
• to create a formal description of workflows and workflow engines which captures their key elements and enables the execution of workflows of different workflow systems;
• to define an architecture that manages both workflows and workflow engines as legacy code applications based on the formal description and which can be adapted to different workflow systems;
• to enable combining workflows of different workflow systems into meta-workflows and running meta-workflows on different DCIs;

• to design and implement a platform based on the CGI concept and create a production service based on it.

The CGI concept is based on two major entities: the workflow and the workflow engine. It considers the host workflow engine and its workflows as native, while all other workflows and workflow engines are non-native. The native host workflow engine manages non-native workflows and non-native workflow engines as legacy code applications. To manage workflows and workflow engines as legacy code applications, they must have a publicly available formal description. The workflow is specified by a four-tuple data set: abstract workflow, concrete workflow, workflow configuration and workflow engine (see Fig. 3.1):

WF = {WF_abs, WF_cnr, WF_cnf, WF_eng}                                (3.1)

where WF_abs is the abstract workflow, WF_cnr the concrete workflow, WF_cnf the workflow configuration and WF_eng the workflow engine.

The abstract workflow entity WF_abs specifies the workflow graph by describing the workflow nodes and their inputs and outputs. It also contains a short summary of the workflow functionality. The abstract workflow does not contain any executables, default input files or parameters needed to run the workflow. Table 3.1 presents the abstract workflow data structure. The domain and tasktype attributes describe the application area of the abstract workflow and what it does; they allow straightforward categorization of workflows to support efficient browse and search operations. The inport and outport attributes with their sub-attributes define the input and output ports of the abstract workflow. The configuration attribute specifies the parameter requirements of the ports.

Each abstract workflow may have multiple concrete workflows which represent different implementations of the abstract workflow on the same or on different workflow engines. The concrete workflow entity WF_cnr strictly follows the input and output definitions of the abstract workflow and implements the functionality given in the workflow description. It contains or references (via URLs) all executables required to run the workflow on the associated workflow engine, plus metadata. Table 3.2 presents the key attributes of the concrete workflow: the definition, dependency and configuration attributes. The definition attribute is a reference to a file that describes the workflow graph of the concrete workflow. The concrete workflow also stores information about the workflow engine which executes it. The dependency attribute captures any requirement of the particular concrete workflow, for instance a DCI middleware, a security infrastructure, or files/executables required for execution. The configuration attributes resolve these dependencies.
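As an illustration, the following Python sketch models the four-tuple of Eq. (3.1) and the main attributes of Tables 3.1 and 3.2 as plain data classes. It is a minimal sketch only: the class and field names, the example values and the file names are illustrative assumptions, not the actual SHIWA Repository schema or API.

```python
# Minimal, illustrative model of Eq. (3.1); names and values are assumptions,
# not the SHIWA Repository schema or API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Port:                       # Inport/Outport entries of Table 3.1
    number: int                   # port index
    description: str              # what the port carries
    data_type: str                # declared data type


@dataclass
class AbstractWorkflow:           # WF_abs: graph summary, no executables
    domain: str                   # application domain
    tasktype: str                 # what the workflow does
    inports: List[Port] = field(default_factory=list)
    outports: List[Port] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)    # to help search operations


@dataclass
class ConcreteWorkflow:           # WF_cnr: one implementation of the abstract workflow
    title: str                    # name of the concrete workflow
    language: str                 # workflow language it is written in
    engine: str                   # workflow engine which executes it
    definition: str               # engine-specific workflow graph description (file reference)
    dependencies: List[str] = field(default_factory=list)        # middleware, security, files
    configuration: Dict[str, str] = field(default_factory=dict)  # resolves the dependencies


@dataclass
class WorkflowDescription:        # WF = {WF_abs, WF_cnr, WF_cnf, WF_eng}
    abstract: AbstractWorkflow
    concrete: List[ConcreteWorkflow]      # one abstract workflow, many implementations
    configuration: Dict[str, str]         # WF_cnf: defaults, parameters, metadata
    engine_id: Optional[str] = None       # WF_eng: selected concrete engine


# Hypothetical ligand-protein docking example with three concrete implementations.
docking = WorkflowDescription(
    abstract=AbstractWorkflow(domain="Computational Chemistry", tasktype="docking"),
    concrete=[
        ConcreteWorkflow("dock-moteur", "MOTEUR", "MOTEUR", "dock_moteur.xml"),
        ConcreteWorkflow("dock-wspgrade", "WS-PGRADE", "WS-PGRADE", "dock_wspgrade.xml"),
        ConcreteWorkflow("dock-unicore", "UNICORE", "UNICORE", "dock_unicore.xml"),
    ],
    configuration={"default_input": "ligand_set_01"},
)
```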


Table 3.1
Abstract workflow data structure.

Element        | Sub-element  | Description                                          | Data presentation
Domain         | Sub-domain   | Workflow application domain                          | Plain text
Tasktype       |              | Type of task the workflow executes                   | Plain text
Inport         | Number       | Workflow input port                                  | List
               | Description  |                                                      | Plain text
               | Data type    |                                                      | List
Outport        | Number       | Workflow output port                                 | List
               | Description  |                                                      | Plain text
               | Data type    |                                                      | List
Configuration  |              | Resolution of in- & output ports' data requirements  |
               | Portref      | Input port referenced                                | Plain text
               | Value        | Value passed to the port                             | Plain text
               | Keyword      | To help search operations                            | Plain text

Table 3.2
Concrete workflow data structure.

Element        | Sub-element   | Description                                         | Data presentation
Definition     |               | Engine specific workflow graph description          | File name
Title          |               | Name of the concrete workflow                       | Plain text
Language       |               | Workflow language in which the workflow is written  | List
Engine         |               | Workflow engine which executes the workflow         | List
Configuration  | ID            | Data of the concrete workflow                       | List
               | Dependencyref | Dependency reference                                | List
Dependencies   | ID            | Dependencies of the concrete workflow               | Plain text
               | Title         |                                                     | List
               | Description   |                                                     | Plain text
               | Rights        | Copyright info                                      | Plain text
               | Licence       | Licence info                                        | Plain text
               | Keywords      | To help search operations                           | Plain text

The workflow configuration WF_cnf contains parameters, references, default input files and metadata of the abstract and concrete workflow. The workflow engine WF_eng identifies the concrete workflow engine which executes the workflow; it points to the workflow engine table. For example, in the formal description of the ligand–protein docking workflow, WF_abs specifies the graph of the docking workflow as WF_abs(dock), while there are three concrete workflows, WF_cnr(MOTEUR), WF_cnr(WS-PGRADE) and WF_cnr(UNICORE), which represent the MOTEUR, WS-PGRADE and UNICORE implementations of WF_abs(dock).

The workflow engine is specified by a three-tuple data set: abstract workflow engine, concrete workflow engine and workflow engine configuration:

WE = {WE_abs, WE_cnr, WE_cnf}                                        (3.2)

where WE_abs is the abstract workflow engine, WE_cnr the concrete workflow engine and WE_cnf the workflow engine configuration.

The abstract workflow engine WE_abs identifies the workflow system which runs the workflow, while the concrete workflow engine WE_cnr defines the workflow engine release and/or version. For example, WE_abs identifies WS-PGRADE as the workflow system and WE_cnr its version 3.5.1, 3.5.2, . . . , 3.5.7. Similarly to the workflow configuration, the workflow engine configuration describes the binaries, files and parameters needed to run the workflow engine (see Tables 3.3 and 3.4). Fig. 3.1 presents the relation between workflows and workflow engines based on their formal description. The concrete workflow WF_cnr specifies the concrete workflow engine WE_cnr through the ''engine'' attribute.


Table 3.3
Abstract workflow engine data structure.

Element        | Description                   | Data presentation
ID             | Workflow engine ID            | Plain text
Name           | Workflow engine name          | Plain text
Description    | Workflow engine description   | Plain text
Configuration  | Workflow engine config data   | Plain text
Keyword        | To help search operations     | Plain text

CGI uses the meta-workflow concept to enable the combination and execution of workflows of different workflow systems. Meta-workflows are described in the workflow language of the host native workflow system. They have two types of workflow nodes: native and non-native nodes. Native nodes can be either native jobs or native workflows; for example, if the host workflow system is MOTEUR, there can be MOTEUR jobs and workflows. The concept introduced a new node type, the non-native workflow node, to be able to manage non-native workflows: non-native nodes embed non-native workflows. Workflow developers have to specify each workflow node as either native or non-native when they create workflows, so that meta-workflows can be managed as native workflows of the host workflow system. Meta-workflows can contain a combination of native jobs, native workflows and non-native workflows:

WF_meta = {J_1 . . . J_k, WF_nat1 . . . WF_natl, WF_nnt1 . . . WF_nntm}    (3.3)

where J_h is a native job (h = 1 . . . k), WF_nati a native workflow (i = 1 . . . l) and WF_nntj a non-native workflow (j = 1 . . . m).

The CGI concept is based on the formal description of workflows and workflow engines, which are managed in a repository and submitted through a submission service. Workflow and workflow engine developers publish the formal description of workflows and workflow engines in the workflow repository.


Table 3.4
Concrete workflow engine data structure.

Element          | Description                                    | Data presentation
ID               | Workflow engine ID                             | Plain text
Name             | Workflow engine name                           | Plain text
Version          | Workflow engine version                        | Plain text
Operating system | OS needed to run the workflow engine           | List
File/URL         | Binaries, default file and parameters          | Plain text
Middleware       | Middleware on which the workflow engine runs   | List
Execution site   | Site where the workflow engine runs            | Plain text
Configuration    | Workflow engine config data                    | Plain text
Keyword          | To help search operations                      | Plain text

Fig. 3.2. Coarse-grained workflow architecture.

In contrast to other solutions, the formal description also contains the execution data required to run workflows. Workflows and workflow engines are wrapped as services using this formal description. The submission service manages workflows and workflow engines as legacy code applications: it automatically deploys both workflows and workflow engines as services using their formal description. The host workflow system communicates only with the submission service through its clients; there is no direct communication with the wrapped workflows and workflow engines.

To support the CGI concept, the CGI architecture contains a submission service, a submission service client integrated with the host workflow engine, and a workflow repository. The host workflow engine contacts the submission service through its client when it identifies a non-native workflow and forwards the workflow ID to the submission service. This service retrieves the formal description of the non-native workflow from the repository, identifies the workflow engine which executes the workflow and associates them. Finally, the submission service submits the workflow to the associated workflow engine, which controls the workflow execution (see Fig. 3.2).

Fig. 3.3 presents the CGI usage scenario of the docking meta-workflow. The meta-workflow contains one native job (J1), two native workflows (WF2 and WF3) and one non-native workflow (WF4). Let us assume that job J1 is a WS-PGRADE job which retrieves ligands and proteins from public databases. WF2 and WF3 are WS-PGRADE workflows which pre-process ligands and proteins to prepare the docking operation. WF4 is a non-native workflow which runs the docking operation; there are both MOTEUR and UNICORE concrete workflows which can execute it. Fig. 3.3 illustrates how a non-native workflow WF4 of a non-native workflow system (workflow system B) can be executed through the native workflow system (workflow system A). Whenever workflow engine A reaches the workflow node WF4, it identifies that it is a non-native workflow node and contacts the submission service. Next, the submission service retrieves workflow WF4 and its input data from the workflow repository and parameterizes it. Then, the service either invokes the pre-deployed workflow engine on the DCI resource selected by the user or submits the workflow engine to a DCI resource to execute workflow WF4. Finally, when the workflow execution is finished, the submission service collects the results and transfers them to the native workflow system (workflow system A).

To support the CGI approach SHIWA created the SHIWA Simulation Platform (SSP). It contains the SHIWA Repository to handle workflow and workflow engine data. The repository is integrated with the SHIWA Submission Service to execute non-native workflows and with the SHIWA Portal, which is used as a development and execution environment to enable editing and running meta-workflows.
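The control flow described above can be summarized in the following Python sketch. It is a schematic illustration only: the repository, submission service and proxy server interfaces are hypothetical stand-ins, not the actual SHIWA APIs.

```python
# Hypothetical sketch of the CGI execution flow; class and method names are
# illustrative stand-ins, not the SHIWA Submission Service API.

def execute_node(node, repository, submission_service, proxy_server, host_engine):
    """Called by the host (native) workflow engine for every workflow node."""
    if node.is_native:
        # Native jobs and native workflows stay inside the host workflow system.
        return host_engine.run(node)

    # Non-native node: the host engine forwards only the workflow ID.
    description = repository.get_formal_description(node.workflow_id)
    concrete = description.select_concrete_workflow()       # pick one implementation
    engine = repository.get_engine_description(concrete.engine)
    proxy = proxy_server.get_proxy(engine.execution_site)   # certificate for the target DCI

    if engine.pre_deployed:
        # Invocation mode: call the workflow engine already running as a service.
        job = submission_service.invoke(engine, concrete, node.inputs, proxy)
    else:
        # Submission mode: ship engine and workflow to a DCI resource and deploy on the fly.
        job = submission_service.submit(engine, concrete, node.inputs, proxy)

    job.wait()                        # the non-native engine controls the execution
    return job.collect_outputs()      # results are transferred back to the host system
```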

Fig. 3.3. Coarse-grained workflow interoperability usage scenario.


Fig. 4.1. SHIWA simulation platform architecture. WF_1, . . . , WF_m: workflow data and metadata; WE_1, . . . , WE_n: workflow engine data and binary.

Further details of the SHIWA Simulation Platform and the relevant usage scenarios are given in Sections 4 and 5.

Workflow sharing raises further research challenges such as data interoperability among non-native workflows, and scheduling, monitoring and fault tolerance of non-native workflows. The SHIWA project provided basic data interoperability and fault tolerance solutions to address some of these challenges, but further work is required even on these solutions. Workflow sharing leads to data interoperability issues because different workflow systems use different data formats and different data transfer solutions. To address this issue the SHIWA project has created a library of data handler jobs which manage different data formats, different data transfer approaches, or both. These jobs should be included as extra jobs between workflow nodes where they are required. The FP7 Building an European Research Community through Interoperable Workflows and Data (ER-flow) project addresses the issue of data interoperability.

4. SHIWA simulation platform (SSP)

The simulation platform implements the CGI concept. It has three major components: the SHIWA Science Gateway, the workflow engines supported by the CGI concept and the DCI resources where workflows are executed.

4.1. Architecture of the SHIWA simulation platform

SHIWA Science Gateway (Fig. 4.1): it contains a portal (SHIWA Portal), a submission service (SHIWA Submission Service), a workflow repository (SHIWA Repository) and a proxy server (SHIWA Proxy Server) to support the whole workflow life-cycle using the CGI concept. The SHIWA Portal is built on the WS-PGRADE portal technology. It has a built-in workflow system, the WS-PGRADE workflow system, which is used as the host native workflow engine in the simulation platform. The portal allows workflow creation, configuration, execution and monitoring through a graphical user interface. The SHIWA Repository stores the formal description of workflows and workflow engines plus the binaries and data needed to execute them. Workflow and workflow engine developers can describe, modify and delete workflows and workflow engines through the repository GUI. The repository offers a wide range of browse and search operations. To support non-native workflow execution the SHIWA Submission Service imports the workflow and workflow engine from the SHIWA Repository.

This service either invokes locally or remotely pre-deployed workflow engines, or submits workflow engines together with the workflow to local or remote resources to execute workflows. The SHIWA Proxy Server manages the certificates needed to execute workflows on different DCIs.

Supported workflow engines: currently the simulation platform provides CGI-based support for the following workflow systems: ASKALON [27], Galaxy [28], GWES [29], Kepler [30], MOTEUR [31], WS-PGRADE [32], Pegasus [33], ProActive [34], Taverna [35] and Triana [36]. The ASKALON, Galaxy, Kepler, MOTEUR, P-GRADE, Taverna and Triana workflow engines have been deployed at the University of Westminster, while the others were installed remotely.

SHIWA VO: the project consortium created the SHIWA VO to enable user authorization across the DCIs providing resources for workflow execution. The VO incorporates the resources of the Austrian, British, Dutch, French, German and Hungarian National Grid Infrastructures. These resources belong to different DCIs of the European Grid Infrastructure (EGI) based on ARC [25], gLite [7], Globus [26] and Unicore [8] middleware.

4.2. Components of the SHIWA simulation platform

4.2.1. SHIWA portal [38]

The SHIWA Portal incorporates two major components: the back-end (grid User Support Environment, gUSE) and the front-end (WS-PGRADE). gUSE (Fig. 4.2) offers an infrastructure virtualization environment providing a set of high-level web services. It enables access to different DCIs to deliver on-demand services in cloud, desktop grid and service grid environments, plus web services. gUSE has a three-tier architecture. The architecture tier includes the DCI Bridge job submission service, which provides access to local and remote resources. The middle tier contains the gUSE services: Application Repository, File Storage, gUSE Information System, Workflow Interpreter and Workflow Storage. gUSE services are available through the presentation tier implemented by the Liferay-based WS-PGRADE portal [32]. The portal uses the gUSE client APIs to convert user requests into gUSE specific web service calls. Users can access the portal via web browsers. The portal is integrated with the WS-PGRADE workflow system which is implemented on top of the gUSE services.


Fig. 4.2. gUSE architecture.

Fig. 4.3. GEMLCA service architecture.

The concept of the WS-PGRADE workflow system is based on the abstract workflow, the concrete workflow and the workflow instance. The workflow system enables embedding workflows to create meta-workflows and supports parameter studies of workflows. The WS-PGRADE workflow system stores the abstract and concrete workflows using the local Workflow Storage service; the local File Storage service stores input and output files of workflows. The DCI Bridge service manages job submission to the different underlying DCIs. The decision of where to send the jobs is taken either by the user or by the Metabroker service at run time.

The portal provides job-submission-level fault tolerance through the DCI Bridge. The portal administrator can specify the number of job resubmissions used to tackle job errors through the DCI Bridge administrator interface. If a job fails due to an infrastructure problem, the DCI Bridge resubmits the job as many times as specified by the job resubmission number. If all resubmission attempts fail, the DCI Bridge reports the job as failed and the workflow execution is suspended.

The development and execution services have been separated in the WS-PGRADE portal, providing the power user view for developers and the easy user view for e-scientists. Users can log in to the portal either as developers or as e-scientists to get the power or the easy user view. The easy user view supports three basic operations: downloading workflows from the repository, their parameterization and their execution. In the power user view developers can create workflows, configure and execute them.
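The fault tolerance behaviour described above can be sketched as a simple retry loop. This is an illustrative sketch under stated assumptions: the function and exception names are hypothetical and do not correspond to the DCI Bridge implementation.

```python
# Illustrative sketch of the job-resubmission policy described above;
# names are hypothetical, not the DCI Bridge API.

class JobFailed(Exception):
    """Raised when a job exhausts its resubmission attempts."""


def run_with_resubmission(submit, job, max_resubmissions):
    """Submit a job, resubmitting on infrastructure errors up to the configured limit."""
    attempts = 1 + max_resubmissions          # first submission plus resubmissions
    last_error = None
    for _ in range(attempts):
        try:
            return submit(job)                # returns the job result on success
        except RuntimeError as error:         # stands in for an infrastructure failure
            last_error = error                # remember the cause and resubmit
    # All attempts failed: report the job as failed so the workflow can be suspended.
    raise JobFailed(f"{job} failed after {attempts} attempts: {last_error}")
```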

4.2.2. SHIWA submission service

The submission service is built on the GEMLCA Service. This service enables legacy code applications written in any source language (C, Fortran, Java, etc.) to be deployed as a Grid Service with minimal effort. The service supports the execution of legacy code applications through a service interface by describing the parameters and environment values of the application in the XML-based Legacy Code Interface Description (LCID) file [37]. In the context of the SHIWA Simulation Platform the legacy codes are either workflows or workflow engines.

The GEMLCA Service (Fig. 4.3) has a three-layer architecture. The front-end layer offers a WSRF service interface, built on the GT4 WSRF-Service-Core, through which any authorized user can retrieve the list of available legacy codes and manage and execute legacy codes. The front-end layer hides the core layer (or Legacy Code Manager), which deals with the legacy codes through two services: the Legacy Code Parameter Management and the Legacy Code Execution Management Service. The back-end layer connects the core layer to different grid middleware, for example gLite and GT2/GT4. The GEMLCA Service uses the LCID file to handle legacy applications as Grid services. The legacy codes can be directly published in the Legacy Code Repository using the LCID files. The GEMLCA Service can either submit legacy codes to DCI resources or invoke legacy codes pre-deployed on DCI resources. Users can access the legacy code services through either a command-line client or a portal. GEMLCA was integrated with the WS-PGRADE portal to provide a GUI for users to deploy and execute legacy code applications and retrieve results. It should be emphasized that the GEMLCA concept is very similar to the SaaS cloud model; in fact, with a back-end plug-in for an IaaS cloud, GEMLCA is able to work as a SaaS cloud platform.
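As a rough illustration of the legacy-code approach, the sketch below describes a command-line workflow engine with a small descriptor and launches it for a given workflow. The real GEMLCA service uses an XML-based LCID file and a WSRF interface; the Python names and fields here are hypothetical stand-ins for that mechanism.

```python
# Hypothetical sketch of treating a workflow engine as a described legacy code;
# the descriptor fields and function are illustrative, not the GEMLCA LCID schema.
import subprocess
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LegacyCodeDescription:
    name: str                                                    # e.g. "taverna-engine" (made up)
    executable: str                                              # path to the engine binary or wrapper script
    environment: Dict[str, str] = field(default_factory=dict)    # environment values it needs
    parameters: List[str] = field(default_factory=list)          # fixed command-line parameters


def invoke_legacy_engine(engine: LegacyCodeDescription,
                         workflow_file: str, inputs: List[str]) -> int:
    """Run a non-native workflow by launching its engine as a legacy command-line code."""
    command = [engine.executable, *engine.parameters, workflow_file, *inputs]
    # The submission service would do this on a DCI resource; here it is a local call.
    completed = subprocess.run(command, env=engine.environment or None, check=False)
    return completed.returncode
```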


Fig. 4.4. User (or browse) view.

The original GEMLCA Service had a major limitation: it managed workflow engines as legacy codes and workflows as parameters of workflow engines. As a result, users first had to choose a workflow engine and then browse the list of workflows which the selected workflow engine was able to run in order to find the workflow they needed. SHIWA therefore extended the GEMLCA Service with the Generic Interpreter Back-end (GIB), which manages both workflows and workflow engines as legacy codes. GIB combines the legacy code description of a workflow with the legacy code description of the corresponding workflow engine and then submits this combined legacy code.

4.2.3. SHIWA repository [39]

The University of Westminster developed the SHIWA Repository to manage workflows and workflow engines based on their formal description. The repository enables specifying, uploading, modifying, deleting, browsing and searching workflows and workflow engines. It provides role-based user management for four actors: e-scientists, workflow developers, workflow engine developers and repository administrators. Workflow and workflow engine developers can specify, upload, edit and delete workflows and workflow engines in the repository. E-scientists can browse/search the repository to find and download the workflows they need. Repository administrators manage the repository as system administrators.

The repository has a private and a public domain to handle workflows. In the private domain developers can manage workflows and workflow engines with both read and write access rights: they can specify, upload, modify and delete them. The content of the private domain is available only to registered users. In contrast, the public domain does not require registration; it allows browsing/searching validated workflows in the repository and downloading them. The repository offers two major operations to find workflows: browse and search. The browse operation enables users to check the list of workflows, select them and display their details. The search operation allows users to specify search criteria, for example the domain name or workflow name, to filter the results.

The repository offers two views: the user (or browse) view and the developer (or table) view. The user view presents workflow data assuming a basic user-level understanding of workflows and enables users to find workflows (Fig. 4.4).

In this view the repository presents the summary (domain, application, owner, description, graph, etc.), the number and types of inputs and outputs, data sets and details of the existing concrete workflows. Users can search workflows either by selecting a domain via the domain list or by specifying workflow names. The developer view (Fig. 4.5) allows workflow developers to specify, edit and delete workflow data. This view displays the name, owner, status and description of the workflow, and also the group to which the workflow belongs, in a table format. After selecting a particular abstract workflow, further details such as its attributes and its concrete workflows are also displayed.

4.2.4. SHIWA proxy server

The Proxy Server allows the management of certificates needed for the execution of workflows on multiple DCIs that require different certificates. Users upload their proxies to the MyProxy server through the Proxy Manager. The administrator handles the Proxy Manager by adding, deleting or editing information about DCIs.

5. Usage scenarios of the simulation platform

5.1. Execution modes

The simulation platform supports invocation-oriented and submission-oriented workflow execution modes. In the invocation mode the workflow engine is pre-deployed as a service on a dedicated node and the workflow is submitted to it at each invocation. If the workflow engine has complex dependencies and its executable is large, then the invocation mode is the recommended mode: it improves performance by avoiding multiple workflow engine transfer and deployment operations. However, scalability issues may occur if multiple workflows are submitted to a single pre-deployed workflow engine. In the submission mode both the workflow engine and the workflow are submitted to compute nodes upon each workflow invocation; the workflow engine is deployed on the fly and then the workflow is invoked. This mode is recommended if the workflow engine's executable is small and has no complex dependencies. The submission mode offers better scalability than the invocation mode, since instances of the workflow engine can run on different compute nodes. The invocation-oriented approach is the default execution mode in the simulation platform.
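The recommendation above can be captured as a simple heuristic. The sketch below only illustrates that rule of thumb; the size threshold and attribute names are assumptions, not SHIWA configuration options.

```python
# Illustrative heuristic for picking an execution mode; the threshold and attribute
# names are assumptions, not SHIWA settings.

def choose_execution_mode(engine, large_executable_bytes=100 * 1024 * 1024):
    """Return 'invocation' or 'submission' for a workflow engine description."""
    heavy = (engine.executable_size > large_executable_bytes
             or engine.has_complex_dependencies)
    # Heavy engines are pre-deployed once and invoked as a service; light engines
    # are shipped with the workflow and deployed on the fly for better scalability.
    return "invocation" if heavy else "submission"
```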


Fig. 4.5. Developer (or table) view.

5.2. Usage scenarios

The simulation platform has four actors: e-scientists, workflow developers, workflow engine developers and administrators. It supports two major types of scenarios:

• description scenarios: they enable workflow and workflow engine developers to describe workflows and workflow engines and upload their data into the SHIWA Repository. The simulation platform supports one scenario for workflow engine description and two (one through the SHIWA Portal and one through the SHIWA Desktop) for workflow description.
• execution scenarios: they allow users to browse and search the SHIWA Repository, select, download and execute workflows, or create meta-workflows and run them. The simulation platform supports three scenarios: two for workflow execution through workflow system GUIs and one through the SHIWA Desktop.

We outline only the portal and repository UI based approaches, since the SHIWA Desktop based scenarios have already been published [40].

5.2.1. Scenario 1: workflow description and upload (Fig. 5.1)

Workflow developers describe the workflow through the SHIWA Repository GUI. The workflow description is divided into two phases. In the first phase they specify the abstract and concrete workflow data. In the second phase the developers define the execution data of the workflow and initiate its deployment on the SHIWA Submission Service. In step 1 they specify the abstract and concrete workflow data, such as the workflow description, owner, access rights, attributes (input and output ports), configurations and dependencies, using the developer view of the repository GUI. In step 2 the developers generate the workflow execution data (or legacy code description) using the repository GUI. Finally, in step 3 they initiate the deployment of the workflow on the SHIWA Submission Service.

The last step is done automatically by the SHIWA Simulation Platform; no involvement of the workflow developer is required.

Scenario 2: workflow execution through the SHIWA portal (Fig. 5.2)

The users (e-scientists or developers) can browse and search the SHIWA Repository, select, download and execute native or non-native workflows, or create meta-workflows and run them through the SHIWA Portal, using the WS-PGRADE workflow system as the host native workflow system. In step 1 the user browses/searches the SHIWA Repository and selects the workflows to be executed on the simulation platform. In step 2 the user either embeds the non-native workflow in a WS-PGRADE workflow or creates a meta-workflow by combining native jobs, native workflows and the non-native workflows selected in step 1, using the WS-PGRADE Workflow Editor. In step 3 the Workflow Editor retrieves the workflow description from the SHIWA Repository and identifies it as a non-native workflow. In step 4 the WS-PGRADE workflow engine sends a request to the SHIWA Submission Service to execute the non-native workflow. In step 5 the submission service combines the non-native workflow with the non-native workflow engine. In step 6 the submission service retrieves the proxy from the SHIWA Proxy Server. Finally, in step 7 the submission service either invokes the pre-deployed workflow engine or submits the workflow engine with the workflow(s) to execute the workflow on the target Distributed Computing Infrastructure.

6. Sea vessel tracking on the SHIWA simulation platform

6.1. Sea vessel tracking application

Correlation Systems, a company from Israel, provides solutions to process and analyse large sets of geospatial data. The company focuses on the integration of open source and geospatial data analysis and the implementation of real-time resource allocation methods. It developed a technology and the required tools for routine and link analysis, demand prediction and real-time dispatching. This technology is being used in a wide range of applications in different domains such as cellular networks, location based services, inner perimeter security, access control analysis, fleet management, border control and more.


Fig. 5.1. Scenario 1: workflow description and upload.

Fig. 5.2. Scenario 2: workflow execution through the portal. Solid arrows: steps performed by developers; dashed arrows: steps performed by the simulation platform.

In the SHIWA project Correlation Systems targeted the marine security community by porting the WWAIS Vessel Tracker application to the simulation platform. WWAIS first processes partial event data received from sea vessels' Automatic Identification System (AIS) transmitters. Next, it extrapolates this data in order to determine the vessels' activities. Finally, this data is compared to trends of activities along world sea-trade routes to determine the vessel route and identify any unexpected activity. The extrapolation process is an essential part of analysing vessels' activities and routes at sea, since AIS receivers are placed in major cities only. As a result, sufficient data can be collected when a vessel sails close to the receivers, but little or no data when vessels are far out at sea. The extrapolation process is very compute- and time-intensive for three reasons. First, a large amount of vessel data must be processed; currently data is available for more than 200,000 vessels. Second, the size of the geographical data is huge: in order to accurately map a voyage, its route must be plotted based on coastline data, and a detailed world coastline map can contain many complex polygons consisting of several million coordinate points.

Dijkstra’s variation compares each ship event to world shipping route data. The matrix of world shipping route data contains more than 2 million coordinate points. Considering that each vessel’s voyages should be processed separately, this application is an ideal candidate for parallel processing. 6.2. Implementation The WWAIS Vessel Tracker application is implemented as a number of RESTful web services. There are web services that prepare the raw route-data, others analyse the route (different algorithms can be implemented and used), while further ones finalize and display the route. Correlation Systems created a meta-workflow which coordinates the route calculation using web services as black-box ‘‘applications’’. In the workflow two algorithms are executed in parallel to analyse the route using different models. Outputs of these algorithms are compared and the best results are selected. Analysing a complete voyage means splitting it to different routes, executing


Fig. 6.1. WWAIS workflow.

Analysing a complete voyage means splitting it into different routes, executing these two algorithms to calculate each route, and finally collecting and comparing the routes to merge them and create the voyage history. Basically each route analysis can be represented as a parameter study. Similarly, the analysis of vessels' voyages follows the same technique but on a larger scale: first, a set of vessels must be identified, next, the voyage analysis must be performed for each vessel, and finally the results of all vessels must be collected and evaluated. Thus, the WWAIS application is a parameter study application that contains another parameter study. Since it must be guaranteed that the routes are assigned to groups according to their vessel ID, the parameter study execution must be implemented at two levels. The WS-PGRADE meta-workflow in Fig. 6.1 shows the workflow structure that covers the two parameter study scenarios.

The meta-workflow coordinates the voyage calculation of all vessels as a high-level parameter study. This workflow contains a native generator job (VesGen) and a native collector job (VesCol) to manage the non-native workflow instances. The generator job's input is the set of vessel IDs. The non-native workflow (represented by the Emb job in Fig. 6.1) calculates and analyses the voyage of each vessel as a low-level parameter study. This job is executed in parallel by as many instances as there are vessel IDs in the ID set. In the figure the dashed line represents the mapping of the inputs among the native jobs and the non-native workflow. In the non-native workflow the first job (VoyGen) splits the vessel's voyage into routes and saves the vessel's ID for each route in the configuration file. Next, the interpreter creates as many instances of the Corr job as there are routes generated by the VoyGen job. The Corr instances run in parallel.

All jobs in the workflow except the Corr jobs are WS-PGRADE jobs; they use the functionalities and services of the WS-PGRADE workflow system. The Corr job exploits the Coarse-Grained Interoperability solution: it is a legacy Taverna workflow that combines several calls to different web services.
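The two-level parameter study can be sketched as two nested fan-out/fan-in steps. The sketch below only mirrors the structure described above and in Fig. 6.1; the callables standing in for the VesGen, Emb, VoyGen, Corr, VoyCol and VesCol jobs are hypothetical.

```python
# Illustrative sketch of the nested parameter study of the WWAIS meta-workflow;
# the callables standing in for the jobs of Fig. 6.1 are hypothetical.

def analyse_voyage(vessel_id, voy_gen, corr, voy_col):
    """Low-level parameter study: one instance per vessel (the Emb node)."""
    routes = voy_gen(vessel_id)                    # VoyGen splits the voyage into routes
    analysed = [corr(route) for route in routes]   # Corr instances run in parallel in practice
    return voy_col(vessel_id, analysed)            # VoyCol merges the routes into one voyage


def analyse_fleet(vessel_ids, ves_gen, analyse, ves_col):
    """High-level parameter study: the meta-workflow over the whole vessel ID set."""
    fan_out = ves_gen(vessel_ids)                  # VesGen creates one instance per vessel ID
    voyages = [analyse(vessel_id) for vessel_id in fan_out]   # Emb instances, in parallel
    return ves_col(voyages)                        # VesCol collects and organizes all voyages
```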

To enable the execution of the Taverna workflow on the SHIWA Simulation Platform, it has been formally described and uploaded to the SHIWA Repository. When WS-PGRADE reaches this job, it identifies it as a non-native node and calls the SHIWA Submission Service. The submission service retrieves the Corr workflow from the SHIWA Repository and submits it to a pre-deployed Taverna workflow engine. The VoyCol job collects the routes of the vessel voyage and analyses them. After all Corr jobs of one particular vessel voyage have completed, the native workflow's interpreter initiates the voyage calculation of the next vessel. Having calculated the voyages of all vessels, the VesCol job collects all the voyages and organizes them according to the configuration file generated by the VoyGen job. Fig. 6.2 presents two sea vessel routes generated by the WWAIS meta-workflow.

A further challenge of the WWAIS application was that the voyage analysis needs more and more resources as the number of vessels to be processed grows. The SHIWA Simulation Platform solves this problem by implementing the voyage analysis as a grid-based workflow which enables internal web services (different algorithm services) to be called remotely. As a result, local resources can be allocated in a massively parallel way, which makes the whole analysis much faster than sequential execution. Finally, from a maintenance point of view, the SHIWA based implementation enables easy extension: if an additional algorithm has to be added to the voyage analysis, it can be done without modifying either the meta-workflow or the VoyGen and VoyCol jobs.

7. Conclusion

In this paper, we presented the Coarse-Grained Interoperability concept which enables researchers to combine workflows of different workflow systems and run them on multiple Distributed Computing Infrastructures. The CGI approach wraps non-native workflows as legacy applications to enable their execution using a formal description. The key component of the formal description is the workflow; as a result, users can search and select workflows and not workflow engines.


Fig. 6.2. Sea vessel routes generated by the WWAIS workflow.

Abstract workflows may have multiple concrete workflows, which may represent workflow implementations on different workflow engines. The concrete workflow identifies the workflow engine which can execute it. The major advantage of the CGI concept is its flexibility: it does not require any coding to CGI-enable workflows and workflow engines. It takes up to one to two weeks to develop CGI support for a new workflow engine if it has a command-line interface, and only a short time to enable the execution of a new non-native workflow on the simulation platform. Currently the CGI concept is the only production-level solution which supports workflow interoperability.

The simulation platform covers the whole lifecycle of workflows, i.e. creating, publishing, selecting and executing workflows. It provides a graphical user interface to specify and upload workflows into the SHIWA Repository. Workflow developers can describe workflows in a reasonable time, upload and publish them. E-scientists can browse or search workflows in the repository, select and download them. Having found the workflows they need, they can execute them through the SHIWA Portal using WS-PGRADE as the host native workflow system. Workflow execution support is an extra service which none of the existing repositories can offer for multiple workflow systems. In contrast to the myExperiment repository, which only allows execution of Taverna workflows, the SHIWA Simulation Platform enables execution of workflows of ten workflow systems based on the CGI concept: ASKALON, Galaxy, GWES, Kepler, MOTEUR, Pegasus, WS-PGRADE, ProActive, Taverna and Triana.

The first version of the SHIWA Simulation Platform was deployed in October 2010; currently the fifth version is available. The simulation platform has more than a hundred users. They have uploaded more than a hundred abstract workflows and more than three hundred concrete workflows. The number of executed workflows currently stands at 2500 and increases on a regular basis.

The CGI concept enables the creation, execution and sharing of meta-workflows, but there are further challenges such as data interoperability, monitoring of non-native workflows, fault tolerance support, scheduling of non-native workflows, etc. which should be addressed. Currently, the SHIWA Simulation Platform manages these issues through the gUSE/WS-PGRADE based SHIWA Portal, but further research is needed to address them properly. For example, the ER-flow project is targeting data interoperability at both the semantic and the technical level. We are introducing an abstract data set to manage the workflow data to be exchanged between workflow nodes. We are also considering standard data transfer protocols to develop a data transfer solution which enables seamless data transfer among workflows of different workflow systems.

Acknowledgements

The development of the SHIWA Simulation Platform was supported by the EU-funded FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project (Grant No. 261585) and the "Building an European Research Community through Interoperable Workflow and Data" (ER-flow) project (Grant No. RI-312579).



Tamas Kukla is a former Researcher at the Centre for Parallel Computing at the University of Westminster. He was involved in research and development in the field of distributed computing, workflow and data interoperability, clouds and virtualization. He led the Cloud Research Team and had a key role in writing the proposal of the SHIWA project and in designing and developing the SHIWA Simulation Platform. Currently he is a Lead Software Engineer at a global company and his current research interests are in cloud-based services, virtualization and big data.

Tamas Kiss is a Reader at the Department of Business Information Systems, and a principal researcher at the Centre for Parallel Computing, University of Westminster. He holds a Ph.D. in Distributed Computing. His research interests include distributed and parallel computing, and cloud, cluster and grid computing. Tamas has been involved in several FP7 research projects such as CoreGrid, EDGeS, EDGI, SHIWA and SCI-BUS, and FP7 support action projects such as DEGISCO, ER-flow and IDGF-SP. He led and coordinated the application support activities and work packages in these projects. Tamas is currently the Project Coordinator of the CloudSME FP7 project, which develops a cloud-based simulation platform for manufacturing and engineering SMEs. He has co-authored one book and more than 80 scientific papers in journals, conference proceedings and book chapters.

Peter Kacsuk is the Director of the Laboratory of Parallel and Distributed Systems at the Computer and Automation Research Institute of the Hungarian Academy of Sciences. He received his M.Sc. and university doctorate degrees from the Technical University of Budapest in 1976 and 1984, respectively. He received the kandidat degree (equivalent to a Ph.D.) from the Hungarian Academy of Sciences in 1989. He habilitated at the University of Vienna in 1997. He received his professor title from the Hungarian President in 1999 and the Doctor of Academy degree (D.Sc.) from the Hungarian Academy of Sciences in 2001. He served as a full professor at the University of Miskolc and at the Eötvös Loránd University of Sciences, Budapest. He has been a part-time full professor at the Cavendish School of Computer Science of the University of Westminster. He has published two books, two lecture notes and more than 250 scientific papers on parallel computer architectures, parallel software engineering and Grid computing. He is a co-editor-in-chief of the Journal of Grid Computing published by Springer.

Akos Balasko has been working as a Research Fellow at the Laboratory of Parallel and Distributed Systems at the Computer and Automation Research Institute of the Hungarian Academy of Sciences since 2007, where he is a member of the WS-PGRADE Portal developer team. He received his M.Sc. degree from the Eötvös Loránd University of Sciences in 2008 in Computer Science, Systems and Software Engineering. Currently he is a Ph.D. student at the same institute. The main part of his work is designing and implementing web-based interfaces for distributed, grid and cloud systems, release management, and supporting scientific communities.

Gabor Terstyanszky is a professor at the School of Electronics and Computer Science, University of Westminster. He obtained M.Sc. and Ph.D. degrees in Computer Science. Currently, he leads the Centre for Parallel Computing at the University of Westminster. His research interests cover distributed and parallel computing. He has made significant contributions to solving Distributed Computing Infrastructure and workflow interoperability challenges. He was involved in several FP5 (WinPar), FP6 (ePerspace) and FP7 (EDGeS, DEGISCO, EDGI, SCI-BUS, SHIWA) research projects that address challenges of distributed and parallel computing, particularly infrastructure and workflow interoperability. He is the author and co-author of more than 100 conference and journal papers and several book chapters. He has been invited to the Programme Committees of several major European and world conferences.

Zoltan Farkas has been a research fellow at the Laboratory of Parallel and Distributed Systems at the Computer and Automation Research Institute of the Hungarian Academy of Sciences since 2002. He received his M.Sc. and Ph.D. from the Eötvös Loránd University, Budapest, in 2004 and 2013, respectively. His research interests include grid computing, interoperability solutions, the use of standards, and portal technologies. Within the Enabling Desktop Grids for e-Science (EDGeS) project he created the core component of the grid interoperability solution, the 3G Bridge. Currently he is heading the portal team at MTA SZTAKI LPDS. He is a co-author of more than 30 scientific papers in journals and conference proceedings on grid computing.