Future Generation Computer Systems 43–44 (2015) 24–37
Exposing HPC and sequential applications as services through the development and deployment of a SaaS cloud

Philip Church a,∗, Andrzej Goscinski a, Christophe Lefèvre b

a School of Information Technology, Faculty of Science and Technology, Deakin University, Geelong VIC 3127, Australia
b Institute for Technology Research and Innovation (ITRI), BioDeakin, Deakin University, Australia
Highlights

• A framework for the deployment of SaaS clouds aimed at supporting scientific research.
• A novel resource selection approach which can automate complex deployment methodologies such as cloud bursting.
• Simplified deployment of Software as a Service through the publication of attributes.
• The automated development of hybrid HPC clouds.
• Framework and cloud feasibility and performance demonstrated through a bio-informatics workflow.
Article info

Article history:
Received 7 November 2013
Received in revised form 11 September 2014
Accepted 1 October 2014
Available online 13 October 2014

Keywords:
HPC application service
SaaS clouds
Application deployment and exposure
High performance computing
Abstract

Cloud and service computing have started to change the way research in science, in particular biology and medicine, is carried out. Researchers who have taken advantage of this technology (making use of public and private cloud compute resources) can process large amounts of data (big data) and speed up discovery. However, this requires researchers to acquire solid knowledge and skills in the development of sequential and high performance computing (HPC) applications, together with a cloud development and deployment background. In response, a technology for exposing HPC applications as services through the development and deployment of a SaaS cloud has been developed, and its proof of concept, the Uncinus cloud environment, has been implemented to allow researchers easy access to cloud computing resources. The technology, realized by Uncinus, supports the development of applications as services and the sharing of compute resources to speed up applications' execution. Users access these cloud resources and services through web interfaces. Using the Uncinus platform, a bioinformatics workflow was executed on private (HPC) cloud, server and public cloud (Amazon EC2) resources; performance results show a 3-fold improvement compared to the performance of local resources. Biology and medicine specialists with no background in programming or application deployment on clouds could run the case study applications with ease.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction

Currently, biologists and medical researchers mainly use sequential applications (software tools) and have started to benefit from High Performance Computing (HPC) to analyze large amounts of data [1–3]. HPC studies often require the backing of supercomputing centers (to provide hardware) and HPC support staff (to install middleware, deploy applications and perform system administration tasks including
resource monitoring). HPC resources, due to their high initial purchase and maintenance costs, are affordable only for wealthy institutions (research centers, some universities, and industry). These resources are shared by many researchers, which leads to long waiting times for the execution of their applications. For this reason many researchers cannot access HPC infrastructures when needed; they have to scale down their experiments to get timely access to resources, thus slowing down research progress.

A solution to these problems exists in the use of clouds for HPC. Cloud computing alleviates the cost and scalability issues of procuring required IT resources [4,5]. However, the cost and time overheads of learning how to prepare an HPC cloud, and then the applications to run on it, remain a problem [6]. Currently,
most HPC clouds are based on IaaS clouds enhanced by additional hardware and middleware to support HPC. In general, HPC cloud users are presented with a set of virtual servers and are then required to put the servers together to form the HPC facilities they need to run their software applications on. This process requires system administration skills, for which discipline specialists are not prepared [7].

In response, this paper proposes a solution to make HPC and cloud resources available to users with minimal computing expertise. While this can be accomplished through the use of Software as a Service, this approach requires the efforts of cloud service providers, who build and expose software through interfaces. Commercial operators are attracted to high-profit, large-market areas such as business and accounting. The investment required to develop services for specialized research areas (with a limited market) is therefore not attractive for service providers. For this reason, we propose the construction of a research cloud that enables researchers to take on the role of cloud developer. By building a technology/framework and cloud platform around IaaS resources, and providing HPC applications exposed as services of a SaaS cloud, system administration tasks can be automated, putting the focus back onto research.

The proposed cloud solution should allow researchers without HPC and cloud computing expertise to access the large amounts of compute resources available through the cloud's discipline interface, for both data processing and storage. Software service deployment can be eased by supporting the routine analysis procedures that exist in scientific studies, for example data pre-processing. By building a repository of application deployment information, highly specialized applications can be shared as services while minimizing setup time.

These concepts were explored by proposing a technology supporting automated IaaS cloud and HPC application deployment. A technical proof of concept was provided through a framework and a generalized resource allocation algorithm. The framework's feasibility was demonstrated in the form of Uncinus, a cloud platform designed to support scientific research. Uncinus serves as a practical example of how to build a research cloud around the concepts of simplified service development, cloud scalability and resource sharing. As a proof of functionality, a bio-informatics workflow was published and deployed through Uncinus, taking advantage of public cloud, private cloud and cluster resources. In brief, the contributions of this paper are summarized below:
• We have proposed and presented a high-level technology/framework for the deployment of SaaS clouds aimed at supporting scientific research.
• We have devised a novel resource selection approach that can automate complex deployment methodologies such as cloud bursting.
• We have simplified the deployment of Software as a Service through the publication of attributes.
• We have implemented our framework and resource selection approach in the form of Uncinus, a research cloud to support scientific research.

The rest of this paper is organized as follows. Section 2 presents related work in the form of e-Research tools and (HPC) research clouds. Section 3 proposes an IaaS cloud framework that automates application deployment. Sections 3.1–3.3 present the requirements, structure and operation of our proposed IaaS cloud framework, respectively. Sections 3.4 and 3.5 focus on defining the methods used to publish and deploy applications and resources, respectively. Section 4 describes the steps taken to implement the framework as Uncinus. Section 5 focuses on proving the functionality of the proposed framework. In particular, Section 5.1 introduces
a bio-informatics case study that addresses the usability of Uncinus, whereas Section 5.2 presents results from the case study assessing the performance of Uncinus. Usability is measured in terms of ease of use and the minimization of computing activities such as HPC environment setup, while performance is measured in terms of execution time and cost. Finally, Section 6 presents the conclusion and future work.

2. Related work

An analysis of the current state of projects and of the development of computing-based frameworks and tools to support scientists leads to two major areas: (i) e-Research tools based on Web APIs and grids (clusters and/or clusters of clusters), and (ii) HPC research clouds supporting applications exposed as services (SaaS). While the proposed cloud framework takes advantage of e-Research tool methodology, the outcome of this paper mainly sits in the area of HPC research clouds supporting service computing.

HubZero [8] is an open source software platform for creating dynamic web sites that support scientific research and educational activities and promote scientific collaboration, using primarily the grid infrastructure. By using HubZero, a scientific gateway (web site) containing discipline-specific resources, including software applications as well as data repositories, can be formed, and users of the scientific gateway can contribute by putting their own applications and data into the gateway for sharing.

The P-GRADE Grid Portal [9] is a web based, service rich environment for the development, execution and monitoring of workflows and workflow based parameter studies on various grid platforms. The P-GRADE Portal hides low-level grid access mechanisms behind high-level graphical interfaces, making even non grid expert users capable of defining and executing distributed applications on multi-institutional computing infrastructures. Workflows and workflow based parameter studies defined in the P-GRADE Portal are portable between grid platforms without learning new systems or re-engineering program code.

AGAVE (A Grid and Virtualized Environment) is a software development tool, developed by the Texas Advanced Computing Center (TACC). AGAVE seeks to make the separation of science and computing easier by providing a set of REST APIs for performing distributed and grid computing. AGAVE excels in its ability to bring a holistic view of distributed heterogeneous systems that may span organizational domains into a single, cohesive platform upon which modern web applications can be built [10]. The next version of AGAVE promises to include new types of systems, such as public and private clouds, to give users faster turnaround times on their experiments. As of 2014, AGAVE supports Amazon EC2 resources, which are deployed using one of three selection policies: HPC (greedy), CONDOR or CLI.

Our proposed cloud solution integrates aspects from each of these e-Research tools in order to create a cloud platform targeted towards research. Like HubZero, application sharing is supported; however, our cloud solution goes a step further and allows sharing of computing resources. Our framework incorporates the low-level abstraction methodology of P-GRADE through automation of application deployment. Finally, like AGAVE, our cloud framework provides services which separate scientific logic from computing, and it takes advantage of cloud scalability to improve turnaround times on experiments.
While drawing from e-Research tools, our cloud solution is similar to other research clouds in providing services to schedule and execute applications. However, compared to well-known research clouds, our solution provides enhanced features relating to service composition and resource discovery. The following HPC research clouds served as a basis for comparison and contrast with our framework.
Aneka [11] is a framework for the development, deployment and management of cloud applications. Through a middleware approach, it provides modules to monitor cloud resources. Development in Aneka makes use of predefined programming methods (Task, Threads, MapReduce, and Parameter Sweep), each with different scheduling and execution methodologies. Unlike Aneka, which relies on a software engineer to develop services, our solution proposes automated deployment services. Opting to cut out the middleman allows researchers to become service providers able to access the cloud directly.

CloudMesh [12] is a framework which aims to simplify the execution of multiple concurrent experiments on aggregate resources. Currently CloudMesh allows access to a range of clouds (including Amazon and Microsoft Azure) through a single interface. When a job is submitted, HPC jobs use the Torque job scheduler (which implements a greedy approach) and cloud jobs can use Hadoop (which implements a MapReduce approach). Future updates will add support for bare-metal systems and an app store which will allow users to access predefined experiment templates, in the form of images and image templates. Our framework provides similar features, such as hybrid cloud support and resource selection, but allows for a greater range of resources to be published (virtual machines, bare-metal resources and software).

CLOUDRB [13] is a framework for scheduling and managing HPC scientific applications on clouds. In order to complete jobs within a user-specified deadline, CLOUDRB incorporates both deadline-based and particle swarm optimization resource allocation. In terms of resource allocation, instead of CLOUDRB's cloud-centric solution, our solution can utilize both non-cloud and cloud resources to improve the performance of scientific applications and workflows. We also focus our framework on usability, supporting users by deploying applications as services.

Galaxy [14] is a data integration, data analysis and data publishing platform as well as a scientific workflow engine for computational biology. Galaxy users can share and publish analysis software as services, exposed through web forms defined using an XML-like language. Galaxy supports the use of the Amazon EC2 cloud through the CloudMan service [15]. CloudMan automates the process of constructing a cluster using Amazon EC2 resources. Through a web interface, users must specify their AWS Secret Key ID, Secret Key, the cluster size and the location of a persistent data volume. CloudMan utilizes a modified Cloud BioLinux image that contains Sun Grid Engine (SGE) [16]. This image is automatically deployed on the Amazon EC2 cloud as both the head node and workers. Once deployed, users can access Cloud BioLinux tools through the command line and Galaxy through framework interfaces (via a web browser). Recently Galaxy has added support for MapReduce (Hadoop) and CONDOR, allowing for automated scaling and hybrid clouds [17]. Our framework shares similarities with Galaxy (both support GUI design through XML-like languages, workflows and hybrid clouds); however, our solution additionally provides support for publishing local resources and cloud bursting.

Recent work from the University of Chicago [18] deploys a bio-informatics workflow across local and Amazon EC2 resources. Combining the features of the Galaxy workflow engine [14] and Globus allows for a robust research cloud that supports automated GUI generation, software sharing and workflow deployment.
During workflow deployment, data was transferred through a web interface and resources were selected manually through the creation of a topology file. Unlike Globus, instead of manual resource selection through topology files, our solution supports resource selection through attribute-based comparisons. This method of resource selection ensures full utilization of resources, reducing the cost of service execution. Our cloud also supports workflow orchestration, and furthermore it makes an effort to deploy workflows in an intelligent manner, removing duplicate steps to improve performance.
HPC Hybrid Deakin (H2D) Cloud [19] is another solution which provides services to discover compute resources and deploy data and applications. This cloud platform is capable of utilizing both local and remote cloud compute services for single large embarrassingly parallel applications. In this solution, compute resources are published to a dynamic broker service that monitors the state of available compute resources. Compared to the H2D cloud, our framework makes use of a broker to support resource discovery, but can also store applications as services.

Finally, HPCaaS [20] is a software deployment tool for the Galaxy workflow engine that aims to simplify access to Amazon EC2 cloud resources. To provide a software service using the HPCaaS cloud, an Amazon Machine Image (AMI) and a software service interface must be provided and stored. Users can select from stored AMIs and specify the required number of cloud resources. A middleware library is used to allocate published AMIs to cloud resources. Users access software on each deployed AMI through the service interface stored by Galaxy. Our framework approaches this publication at a higher level, allowing for deployment at the application level and thus allowing users with limited cloud experience to develop software services.

In summary, while there is progress in the research and development of e-Research tools based on Web APIs and grids (clusters and/or clusters of clusters), there are few (HPC) research clouds which support Software as a Service (SaaS). The features and capabilities of the solutions described above are summarized in Table 1.

3. IaaS Cloud Deployment Framework

Utilizing IaaS clouds to run applications can be a time consuming, demanding and expensive task. To access IaaS clouds, users must manage cloud security credentials, and select and allocate resources. To use IaaS cloud resources, users must deploy applications, and compress and transfer data. Exposing the deployed cloud application as a service can simplify future access to the cloud; however, this requires that a higher-layer abstraction be created, that graphical interfaces be designed to provide access to the deployed application, and that applications be stored and hosted for future use.

To support the deployment of (HPC) applications a number of services are required. These services must (a) support the configuration and utilization of HPC cloud resources and (b) support the construction of software services on the cloud. These services were integrated into a cloud framework in order to support research using HPC clouds.

3.1. Requirements

In order to make clouds suitable for genomic analysis and research by non-computing specialists, a number of methods are proposed which allow users to publish virtual machines and applications to IaaS clouds. These methods are broken up into two categories: those that support utilization of cloud computing and those that support development of cloud services.

Methods supporting cloud computing aim to reduce the complexity of utilizing IaaS clouds for research. Focus is placed on automating the common steps undertaken when utilizing and deploying applications on IaaS clouds. Four methods are proposed: (1) management of cloud credentials; (2) automation of resource selection/allocation; (3) automation of secure data transfer; and (4) automation of application deployment.
Management of cloud credentials—To access cloud resources, users must provide cloud credentials to request resources from a cloud provider and a key pair to access each individual virtual machine instance.
Table 1
Comparison of e-Research tools and HPC research clouds supporting SaaS.

| Name       | Type       | Resource selection algorithms                  | Workflows | Hybrid cloud | Cloud bursting | Publication                  |
| HubZero    | e-Research | –                                              | N         | –            | –              | Software and data            |
| P-GRADE    | e-Research | –                                              | Y         | –            | –              | Software                     |
| AGAVE      | e-Research | –                                              | N         | –            | –              | Software                     |
| AGAVE API  | Cloud      | Manual (Greedy), CONDOR and CLI                | N         | N            | N              | Software                     |
| Aneka      | Cloud      | Task, Threads, MapReduce, and Parameter Sweep  | N         | Y            | Y              | –                            |
| CloudMesh  | Cloud      | Torque (greedy), Hadoop (MapReduce)            | N         | Y            | N              | VM                           |
| CLOUDRB    | Cloud      | Deadline-based and particle swarm optimization | N         | Y            | N              | –                            |
| Galaxy     | Cloud      | Manual (Greedy), Hadoop (MapReduce) and CONDOR | Y         | Y            | N              | Software                     |
| H2D        | Cloud      | Manual (Greedy)                                | N         | N            | N              | Software, resource and data  |
| HPCaaS     | Cloud      | Manual (Greedy)                                | N         | N            | N              | VM                           |
| Uncinus    | Cloud      | Attribute based                                | Y         | Y            | Y              | VM, software and resource    |
While the process to access cloud resources at an individual level can be handled by a user versed in system administration, it is not of interest to a discipline specialist; furthermore, difficulties occur when utilizing many clouds (hybrid clouds) or many virtual machine instances (common in HPC cloud computing). In these scenarios, a mechanism to generate and manage these cloud security resources is required.

Automated resource selection/allocation—To utilize cloud resources, a user must select and request resources from a cloud provider. The type of resources selected can have a large effect on the time and cost of running HPC applications in the cloud [21]. Once resources have been selected, cloud resources must be allocated and configured to enable HPC applications. When applied to HPC clouds, this process is made more complicated by the amount of resources that must be allocated and eventually terminated. The automation of this selection and allocation process can simplify access to HPC cloud resources.

Automated secure data transfer—To support application deployment and execution, data must be transferred between the user and the cloud. This can be a time consuming process depending on the size of the data and the potential distance between the user and the cloud provider. The time taken to transfer data can be reduced through specialized compression algorithms; genomic analysis, for example, can take advantage of specialized compression methods [22]. Additionally, there must be mechanisms in place to ensure confidential data is protected. Data transfer must incorporate encryption; the suggested approach is to utilize a combination of symmetric-key and asymmetric-key encryption, and hashing.

Automated application deployment—In order to enable Software as a Service development by non-computing researchers, there is a need to automate aspects of HPC application deployment. Supporting this application deployment process requires at least two levels of abstraction: (i) low-level deployment, which consists of methods to install and configure an HPC application in a virtual machine, and (ii) high-level deployment, which consists of methods to save an image of the pre-configured virtual machine and construct an API of the HPC application that together form a deployable unit.

Methods supporting cloud service development aim to reduce the complexity of exposing applications deployed on IaaS clouds. Focus is placed on defining the methods to support sharing of cloud resources and services, and generating the interfaces necessary to expose and utilize applications deployed on IaaS clouds. To this end, two methods are proposed: (1) automatic interface generation; and (2) deployment information publication.

Automatic interface generation—To turn deployed applications and virtual machines into services, they need to be exposed through a graphical interface. The development of interfaces allows for the abstraction of both the application deployment and the command line execution style of HPC cloud applications. In order to simplify this process, a mechanism is required to automatically transform any HPC application that is ready to execute on an IaaS cloud into an easy-to-use service to be executed in Software as a Service (SaaS) clouds.
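To illustrate the suggested combination of symmetric-key encryption, asymmetric-key encryption and hashing, here is a minimal sketch using Python's third-party cryptography package; the function and variable names are ours, not part of the framework, and a production implementation would also handle key distribution and streaming of large files.

```python
import hashlib
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

def protect(data: bytes, recipient_public_key):
    """Encrypt a payload with a fresh symmetric key, wrap that key with the
    recipient's RSA public key, and attach a digest for integrity checking."""
    sym_key = Fernet.generate_key()                 # symmetric key, one per transfer
    ciphertext = Fernet(sym_key).encrypt(data)      # bulk encryption of the payload
    wrapped_key = recipient_public_key.encrypt(     # asymmetric wrap of the key
        sym_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    digest = hashlib.sha256(data).hexdigest()       # hash for integrity verification
    return ciphertext, wrapped_key, digest

# Example round-trip with a throwaway key pair:
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ct, wk, dg = protect(b"genomic data", private_key.public_key())
```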
Fig. 1. IaaS Cloud Deployment Framework.
Deployment information publication—Combining the automation of application deployment and the automation of interface generation allows for the construction of HPC application services. To enable sharing of HPC application services between researchers, this framework requires the construction of an HPC application services registry. Each HPC application service is proposed to be published and stored in such a manner that its discovery and selection are easy. This implies that the invocation information and at least two attributes of an application service, a host location (cloud image) or a web form, must be published. Additionally, to support discovery of application services (stored in the form of a host location), it is necessary to document the services provided through each cloud image.

3.2. Framework overview

A framework that provides the services described in Section 3.1 has been proposed (see Fig. 1); it builds layers of abstraction upon IaaS clouds. In this framework, services are broken up into interacting layers: an Application Broker that implements publication services and supports resource/service discovery and selection; a Cloud Interface that implements resource deployment services; and a Graphical Interface that exposes the application broker services to users.

The structure of the proposed framework is as follows. Sitting directly above the IaaS cloud is the Cloud Interface Service layer. This layer facilitates communication between the user and the cloud, providing the ability to request computing resources from the cloud and securely transfer data to the cloud. An Application Broker sits above the Cloud Interface Service and stores and manages the security credentials and deployment information published by the service provider. This layer provides both virtual machine and application storage, allowing application service providers to publish information regarding the access specifications of virtual machines and application inputs and deployment procedures.
Fig. 2. Deployment of resources.
Finally, through a graphical interface layer, the application broker is exposed, allowing for publication and deployment of services by end-users.

3.3. Framework operation

Through the framework presented in Section 3.2, users can publish applications, and non-cloud and public/private cloud resources, to the Application Broker. Applications are published as source code, along with any installation scripts and application requirements. Unlike traditional web services (which expose a complete system), this separation of software and resource allows users to take advantage of cloud scalability. Non-cloud resources are published using a domain name as an identifier along with the resource specification information (the CPU, RAM and OS). Virtual machines are published in the same way, except using an external cloud identifier, the actual virtual machine being stored by the cloud provider. This allows for quick deployment of cloud resources as virtual machine transfer is not required. The publication process is described in detail in Section 4.2.

Application and resource discovery is also provided to users through the application broker. A number of searchable attributes are provided during publication, for example the operating system and service description. In addition, this framework is designed to support disciplinary separation of services through division of the application broker layer. In other words, a research department could host its own application broker, with specialized applications and department resources published. This separation can be used to limit services to specific users. Another benefit of this approach is the ability to scale this framework to many users without loss in performance.

Once an application or resource has been discovered, the application broker can pass the service request on to the Cloud Interface Service layer. Deployment of resources is handled separately; however, deployment of applications requires resources to be specified in conjunction with the application. A process demonstrating the deployment of an application through the IaaS framework is shown in Fig. 2. In this process, through the graphical interface, users query the broker for a set of applications they wish to run. Resource discovery is then performed to find, from published resources, a sub-set that satisfies application requirements. Resource selection determines how applications are to be deployed on the selected sub-set of compute resources. The selection process makes use of published application requirements and computer specifications. The feasibility of each potential application-to-resource mapping is given a score based on comparisons of common attributes (OS, CPU and RAM). Current usage of each resource is also taken into account when selecting resources. The goal of the applications' mapping is to achieve the best possible resource utilization; therefore, multiple applications can potentially be mapped to a single resource. The selection process is described in more detail in Section 4.2.
Following a successfully performed resource-to-application mapping, applications are installed and executed. The output of each application is directed to a decision module. At this stage, a decision needs to be made on the validity of the result to determine if additional processing1 is required. If the output data is deemed inadequate, then additional data can be passed to compute resources and applications re-run. If the research question has been answered successfully, the job is terminated and resources are released back to the pool. Depending on the complexity of the data, human intervention may be required to make this decision.

In a similar manner, the publication and deployment of workflows is supported. In a workflow, previously published applications are combined into modules. A workflow may consist of many modules. These modules are given a logical order in which to execute. The resource selection process is then performed dynamically based on the execution order of each module. By performing resource selection in this manner, parts of the workflow can be run in parallel. During workflow deployment, resources can be allocated statically or dynamically. Using static allocation, all resources are deployed before workflow execution. As resource setup is performed before workflow execution, there is no need to wait for cloud deployment and setup; however, resources may be left idle waiting for workflow modules to complete. Using dynamic allocation, resources are deployed as required. The dynamic method of allocation has the benefit of saving money when deploying cloud resources, at the cost of performance. Regardless of which method of resource allocation is utilized, after execution of each module each file system must be synchronized to distribute generated files.

3.4. Application broker

Use of a broker to support scientific applications can improve the productivity of a research community. Through a broker, each user becomes a service provider capable of publishing their data analysis method; in this way a library of application deployment information is built. Re-using deployment information can drastically reduce the required setup time of compute resources. This is particularly true in disciplinary focused systems, in which there is often a collection of commonly used tools and methods for data extraction.

Through the broker, service providers can publish not only applications, but also cloud and non-cloud resources as services. The publication process varies depending on the type of service that is deployed (see Table 2). The information required to publish applications is designed to allow sharing, while hardware is user specific and incorporates individual access information.

When publishing applications to the application broker, installation information, execution information and application requirements must be specified. The service provider begins by assigning a descriptive name (App_Name) to the application service; this name is used during application discovery. The service provider must also provide the installation procedures (Install) and running procedures (Running) to be undertaken on remote compute resource(s). Files required by the application service are also stored by the broker (Files); stored files could be source code, binaries or application data. Service providers must provide the broker with information on how to invoke the application service (AppLocation).
Information about how input parameters (Arguments) and expected program output (Results) are displayed is stored as XML. Optionally, additional usage information about the application (Manual) can be published.
1 The logic of the decision module is defined by the service provider in order to improve the quality of the service. If not defined, control is instead given back to the user after job completion.
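The decision stage described in Section 3.3 amounts to a simple control loop; a minimal sketch follows, where all three callables are assumed to be supplied by the service provider and the names are illustrative.

```python
def run_with_decision(run_job, is_adequate, add_data, max_rounds=3):
    """Sketch of the decision-module loop from Section 3.3: run the
    application, validate its output, and either re-run with more data
    or terminate and release resources."""
    for _ in range(max_rounds):
        output = run_job()
        if is_adequate(output):   # validity logic defined by the service provider
            return output         # question answered: job terminates, resources released
        add_data()                # pass additional data to the compute resources
    return None                   # fall back to human intervention
```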
Table 2
Attributes required for application and resource publication.

| Data collection     | Attributes                                                                                                  |
| Application         | App_Name, Install, Running, Files, AppLocation, Arguments, Results, Manual, reqOS, reqCPU, reqRAM, Publish  |
| Resource (Physical) | Description, DNS, loginName, SSHkey, workingDir, specOS, specCPU, specRAM, CPUusage                         |
| Resource (VM)       | Description, AMIid, loginName, workingDir, specOS, instanceTypes, AppIDs                                    |
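As an illustration, the broker could store a published application and a physical resource as records like the following; this is a minimal sketch in Python using the attribute names of Table 2, with all values hypothetical.

```python
# Hypothetical broker records built from the Table 2 attribute sets.
application = {
    "App_Name": "gene-set-enrichment",          # descriptive name used in discovery
    "Install": "tar xzf gse.tar.gz && make",    # installation procedure
    "Running": "./gse --input {data}",          # running procedure (execution script)
    "Files": ["gse.tar.gz"],                    # files stored by the broker
    "AppLocation": "/opt/gse",                  # how to invoke the service
    "Arguments": "<xml>...</xml>",              # input parameters, stored as XML
    "Results": "<xml>...</xml>",                # expected program output, stored as XML
    "Manual": "Computes gene set enrichment.",  # optional usage information
    "reqOS": "Linux",
    "reqCPU": {"nodes": 1, "cores": 1, "GHz": 1.24},
    "reqRAM": 2,                                # gigabytes
    "Publish": True,                            # flagged for public discovery
}

physical_resource = {
    "Description": "standalone web server",
    "DNS": "server.example.edu",                # unique identifier
    "loginName": "deploy",
    "SSHkey": "~/.ssh/server.pem",
    "workingDir": "/scratch/uncinus",
    "specOS": "Linux",
    "specCPU": {"cores": 4, "GHz": 2.4},
    "specRAM": 8,
    "CPUusage": 12.5,                           # historic usage, measured by the broker
}
```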
Finally, each application service must define hardware requirements and the amount of resources utilized during execution. The broker stores the required operating system (reqOS), CPU utilization consisting of the number of nodes, cores and clock speed (reqCPU), and memory utilization in gigabytes (reqRAM). Upon successful publication, the service can be flagged for public viewing (discovery).

The framework supports the publication of public and private clouds, and also non-cloud computers. When publishing compute resources as a service, access information and system specifications must be provided. During resource deployment, the main difference between cloud and non-cloud computer resources is the initial state. Cloud resources are allocated on-demand through a series of API calls made to the provider. Non-cloud computers are not provided on-demand; resources are in a ready state, having already been assigned an IP address; however, there is potential for a queue to that resource.

Publishing a non-cloud resource begins by naming the system (Description); a descriptive name is often chosen to improve resource discovery. Each resource makes use of a DNS or IP address (DNS) as a unique identifier. A collection of resources, for example a cluster, can be defined using a set of addresses. Access information must be provided in terms of a login (loginName), an SSH key (SSHkey) and a readable/writable directory (workingDir). The hardware and software making up each compute resource must also be provided. The broker stores the operating system (specOS), CPU cores and clock speed (specCPU) and memory utilization in gigabytes (specRAM). The broker also stores the historic CPU usage (CPUusage). Unlike the other pieces of information that need to be provided, the CPU usage value is calculated from observations taken when resources are utilized.

The process of publishing a cloud resource is simpler than that of physical hardware, as some information can be retrieved from the cloud provider. Instead of a DNS address, each virtual machine instance has an identifier given by the cloud provider (AMIid). SSH keys are not required, as they are requested from the cloud provider and stored for the duration of the cloud resource's lifetime. The instance types each virtual machine supports must be provided (instanceTypes); however, from the instance type, the cost and hardware specification of each virtual machine can be determined. Finally, each virtual machine resource stores a list of available applications (AppIDs); this information is provided to end-users during resource discovery.

The benefit of this publication approach lies in the reuse of deployment information and the interoperability of resources. By publishing applications through the application broker, stored procedures can be re-used to deploy previously run application services. It is also possible to create new services by building on top of pre-existing services; this can be utilized when publishing applications that rely on separate applications or libraries, allowing for a modular approach to service creation. When publishing resources through the application broker, a standard interface allows users to access a wide variety of compute resources, making resources easier to obtain. On the other hand, the resource publication process is not perfect. Resources for which the hostname is not known (where users are only given access to a head node and job submission is performed through a scheduler) cannot be published to the broker.
This means that resources provided through supercomputing centers are not publishable at the hardware level.
The application broker is instead forced to handle these resources on a per-application basis. Adapting an application service to one that relies on an external scheduler requires modifications to the application location (AppLocation), operating system requirements (reqOS) and application execution script (Running). The application location must incorporate job submission into application execution. The type of scheduler utilized must be incorporated into the OS requirements, allowing for accurate mapping of application to scheduler. Finally, a feedback loop must be built into the application execution instructions; this is often a loop that checks for a particular output file or queries the scheduler for job status. In this way, externally managed resources can be accessed in the same way as the natively supported clouds and non-cloud computer resources.

3.5. Cloud interface services

Published resources and applications can be deployed via the proposed framework. To support this process, the Cloud Interface Services layer provides a method for resource selection. Resource selection is used to ensure applications are deployed on compatible resources. During resource selection, attributes published to the broker are compared in order to generate a resource-to-application map. This map is created by measuring the distance from a service's requirements to each available resource's specifications. In this case the optimal resource is the one at the shortest calculated distance.

This method can be extended to the selection and deployment of workflows. We define workflows as separate services that run in a specified order, where multiple services could run at the same time. By applying resource selection dynamically, workflows can be deployed in an intelligent manner. Decisions can be made on the fly about how best to deploy each service, individually or in parallel. The goal is to reduce the cost of procuring cloud resources.

4. Uncinus Cloud Deployment

The IaaS Cloud Deployment Framework described in Section 3 aims to simplify resource deployment on IaaS clouds, leading to HPC applications being exposed as services and run within a SaaS cloud. The implementation of the framework is described in the following sections.

4.1. Uncinus research cloud

Implementation of the framework led to the creation of the Uncinus research cloud. Uncinus (see Fig. 3) consists of three major software components: Cloud Deployment Middleware, a mySQL DBMS, and Euca2ools. Uncinus is a web-based environment that supports a modular approach to deploying software on IaaS clouds. Designed to be hosted on a Linux server, it consists of a number of services that enable automated application deployment and execution. A key advantage of the framework-based Uncinus is that, through the basic features of clouds, it is able to take advantage of scalability to improve software performance. In addition, Uncinus provides the capability to share resources and applications through a centralized broker.
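As a rough illustration of how middleware of this kind can drive Euca2ools, the sketch below shells out to the euca-run-instances command to request resources; the image id, key name and instance type are placeholders, and the exact flags accepted by a given Euca2ools version should be checked against its documentation.

```python
import subprocess

def request_instances(image_id: str, key_name: str, instance_type: str, count: int) -> str:
    """Request `count` instances of `image_id` from an EC2-compatible cloud.

    A minimal sketch: real middleware would also create the key pair and
    security group first, poll until the instances are running, and parse
    the reservation output properly.
    """
    result = subprocess.run(
        ["euca-run-instances", image_id,
         "-k", key_name,           # key pair used to SSH into the instances
         "-t", instance_type,      # e.g. "m1.large"
         "-n", str(count)],        # number of instances to start
        capture_output=True, text=True, check=True,
    )
    return result.stdout  # reservation details, including instance ids

# Example (hypothetical image id):
# print(request_instances("ami-0123abcd", "uncinus-key", "m1.large", 5))
```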
Fig. 3. Cloud Deployment Middleware.
The Cloud Deployment Middleware provides services to facilitate publication, resource selection and resource deployment. Following the layers given in the IaaS Cloud Deployment Framework, the application broker layer contains publication services while the Cloud Interface layer contains resource deployment services. Publication services interact with a mySQL database to store Uncinus credentials (a username and password used to access the broker) and published applications and resources (see Section 3.4 for a list of published attributes). Resource deployment services communicate with the cloud through Euca2ools [23], a command-line implementation of the Amazon EC2 web management tool. Through the use of Euca2ools, Uncinus maintains compatibility with clouds that implement an Amazon-like API (Amazon EC2 [24], NEcTAR [25], OpenStack [26], Eucalyptus [23], etc.).

The application broker layer divides services into two modules: the AMI/App Deployment Recorder and the Application to Interface Parser (A2I Parser). The AMI/App Deployment Recorder implements services to query and publish resources and applications. To publish a cloud resource, a machine image identifier, username and working directories need to be provided. When publishing applications, installation instructions, any required installation files, a list of input arguments and output controls need to be provided. The AMI/App Deployment Recorder stores this data in a mySQL database on behalf of the application service provider. The A2I Parser module translates the input arguments recorded by the broker into equivalent web controls, allowing for dynamic interface generation.

The Cloud Interface Services layer consists of the Secure Data Transfer and Cloud Resource Allocation modules. Cloud Interface Services communicate with Amazon EC2 via Euca2ools to provide cloud resources to the user. When a user starts a cloud job, the Cloud Resource Allocator module creates the necessary private keys and security groups before requesting resources from Amazon EC2. Once the Cloud Resource Allocator can successfully access the virtual machine, the Secure Data Transfer module is used to deploy application modules to the virtual machine instance. The Secure Data Transfer module implements methods to expose a remote file system, including remote directory calls and the LZ77 data compression algorithm [27].

The operation of services implemented by Uncinus is shown in Fig. 4. Users access the Uncinus application broker through a graphical interface. This allows users to access the Resource Deployment Recorder in order to publish to and query the broker. If a user selects a service to deploy, the application requirements and/or resource specifications are passed to the Compute Resource Allocator. This module implements resource selection, determining how applications are best mapped to resources. If a successful mapping has been found, resources are then deployed by the Cloud Resource Allocator. If cloud resources are required, the details are passed to Euca2ools, which requests the required amount of resources from an available cloud provider. Once cloud resources are deployed, applications are set up. Application data and instructions are sent via the Secure Data Transfer services.
Fig. 4. Operation of service implemented by the Uncinus Cloud Deployment environment.
Secure Data Transfer ensures that application data is compressed, transferred to remote machines and decompressed on the other end. Upon successful deployment of all resources and applications, the Compute Resource Allocator sends the deployed application arguments to the Argument to Interface Parser (A2I Parser). In response, the A2I Parser generates web forms to transmit data and execute the deployed applications. From this point on, users are able to access the deployed application through the graphical interface generated by the A2I Parser.

4.2. Uncinus application and resource publication

The application broker implemented by Uncinus allows service providers to publish not only applications, but also cloud and cluster resources as services. As specified in the framework (see Section 3), the publication process varies depending on the type of service that is deployed. The application publishing process is designed to allow sharing, while hardware is user specific and incorporates individual access information such as usernames and SSH keys. Uncinus implements web interfaces to support the publication and storage of attributes. Users can publish an application as a service through a simple web interface (see Fig. 5). These interfaces support the attributes defined by the framework (see Section 3.4). Arguments provided through this interface must be assigned a type depending on the data that must be provided by the user. The Uncinus broker implements an XML-like language that defines a number of common input and output types. These types are:
• Secure transfer of data files; the service provider can specify the file name.
• A hidden web field that indicates that the deployed application can be run on multiple machines.
• A text-based argument that is substituted into the execution script when the service is invoked.
• A configuration file exposed through a text control, allowing for direct manipulation of services; this type of control is often used to expose and configure applications such as databases and web servers.
• An existing web interface exposed directly, used when deploying web server applications on the cloud.
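As an illustration of how such type definitions can be turned into web controls (the A2I-style translation described in Section 4.1), here is a minimal sketch; the type names and their HTML mapping are hypothetical, since the original tag vocabulary is not reproduced here.

```python
# Hypothetical argument-type-to-web-control mapping in the spirit of the
# A2I Parser: each published argument type becomes an HTML form control.
CONTROLS = {
    "file":      '<input type="file" name="{name}">',                 # secure file transfer
    "multinode": '<input type="hidden" name="{name}" value="true">',  # run on multiple machines
    "argument":  '<input type="text" name="{name}">',                 # substituted into the run script
    "config":    '<textarea name="{name}">{default}</textarea>',      # editable configuration file
    "weblink":   '<a href="{default}">{name}</a>',                    # exposes an existing web interface
}

def render_form(arguments):
    """Translate stored argument definitions into a web form."""
    rows = [CONTROLS[a["type"]].format(name=a["name"], default=a.get("default", ""))
            for a in arguments]
    return "<form method='post'>\n" + "\n".join(rows) + "\n</form>"

print(render_form([{"type": "file", "name": "input_data"},
                   {"type": "argument", "name": "threshold"}]))
```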
Fig. 5. Application publication interface.
Simple web interfaces are also made available to publish public and private clouds (see Fig. 7), and non-cloud resources (see Fig. 6). When publishing compute resources as a service, access information and system specifications must be provided. As a user requests resources on-demand from a cloud provider, Uncinus (through Euca2ools) is only compatible with clouds which implement an Amazon-like API, for example Amazon EC2, NEcTAR [25], OpenStack [26] and Eucalyptus [23]. On the other hand, the requirements for publishing non-cloud or non-Amazon-like clouds are merely that the user has direct access to storage and permission to install and execute applications. Having direct access to compute resources allows Uncinus to monitor and manage cloud and non-cloud resources. Information about compute resources is recorded, and used during application deployment to ensure that applications are run on reliable resources (see Section 4.4).
Fig. 6. Non-cloud publication interface.

Fig. 7. Cloud publication interface.
4.3. Automated resource documentation

Once a cloud resource has been published, Uncinus can carry out the following steps to help improve the overall knowledge of virtual machines, thus supporting the discovery of cloud resources.

(1) A user defines a virtual machine as a tuple of a cloud identifier, a login name and a list of available instance types.

virtualMachine = {vmID, loginName, instanceTypes}.

(2) A list of applications published through Uncinus (appBrokerList) is defined. Each application in this list is described by a name, location and manual.

app = {AppName, AppLocation, appManual}
appBrokerList = {x ∈ app}.
(3) The defined virtual machine is then deployed to the cloud using the cheapest2 available instance. A list of applications exposed through Uncinus is then devised by intersecting the applications found on the virtual machine (vmApps) with the applications stored in the Uncinus broker (appBrokerList).

vmAppList = vmApps ∩ appBrokerList.

2 ‘‘Cheapest’’ in the financial sense is one of the possible examples. Uncinus supports different criteria as a basis of selection: financial cost, execution time, trade-off between cost and performance etc.

(4) The virtual machine is then terminated and the list of available applications (vmAppList) is published to the broker as a set of application identifiers (appIDs).

Through this method, it is possible to refine a broker's cloud image store and improve the discovery and selection of computing resources.
4.4. Uncinus resource selection

Deployment of resources and applications via Uncinus is supported through the use of a method for resource selection. This method makes use of stored application and resource information to make informed decisions about the placement of applications on cluster, grid and cloud resources. Through this selection process, a range of criteria are taken into account (including resource cost, application specifications and the current/historical usage of each resource). This method maps selected applications to available resources in
such a way that resource usage is optimized and waiting times are reduced. As long as the user has money available, Uncinus opts to avoid queues and use public cloud resources to minimize job submission time. The resource mapping is only accurate at the time of calculation; therefore deployment of resources is required up front. The resource discovery and selection method is carried out in the following steps:

(1) The user selects one or more published application services from a list. Each service consists of source code, a running script and application requirements (which include the required operating system, CPU and memory).

appReq = {OS, Nodes, CPU, RAM}
appList = {x ∈ appReq}.

(2) Next the user selects published non-cloud resources and the amount of public cloud resources they wish to hire. Computer systems are indexed via a hardware specification (cmpSpec) that consists of the rental cost, running operating system, CPU, memory, number of nodes, network latency and bandwidth.

Compute = {OS, CPU, RAM, Nodes}
cmpSpec = {cost, Compute, Latency, Bandwidth}
rPool = {x ∈ cmpSpec}.

(3) From the pool of compute resources (rPool), a sub-set that is best suited to the selected service(s) is allocated. The resources making up this sub-set are determined as follows:

a. When utilizing compute resources, often there is no guarantee about the state of the system. The resource may be heavily utilized or, in the worst case, unavailable. Therefore, an availability score, or rScore, is calculated for each resource:

rScore = 1 − ((HstUsage × 0.6) + (CurUsage × 0.4)) / 100.

b. The rScore uses both historical usage (HstUsage) and current usage (CurUsage) data to provide a measure of resource reliability. Historic usage is measured and stored by the broker every time a job is run, while current usage is collected at run time. Due to the uncertainty of a single collection result, our calculation of the rScore is weighted by 20% in favor of the historical usage. The weight given to each usage value can be changed to affect the priority given to current and historical usage data. As long as both weights add up to 1, the calculated rScore will be between 0 and 1, where 1 represents an idle system and 0 a heavily used system.

c. Using this score, a sub-set of resources is defined, where resources must meet the rRank threshold and preference is given to the top K resources, where K is the total number of required compute resources.

rSubset = {x ∈ rPool : rRank(x) ≥ 1}.

By setting the rRank threshold to 1, resources that do not meet application requirements are eliminated. In a scenario where the only available resources do not meet application requirements, the service will not run. A user encountering this scenario must either make available resources that meet application requirements or choose a service with reduced requirements. Of the resources that meet the application requirements, idle resources are prioritized and busy resources are sent less work or, if possible, avoided.

(4) Applications are then mapped to the resources in the sub-set. This mapping uses set distance values to make optimal use of available resources.

a. The distance between each set of application requirement attributes and resource specification attributes is calculated.
This is performed by treating each set of attributes as a multi-dimensional vector and calculating the distance between each pair of application and resource:

dist(appReq, cmpSpec) = inf{∥x − y∥ : x ∈ appReq, y ∈ cmpSpec}.

b. The application and resource that have the closest distance are linked together. Utilized applications are removed from the published service list (appList) and utilized resources are removed from the resource sub-set (rSubset).

rMap = {x ∈ rSubset, y ∈ appList : min(dist(x, y))}.

This distance measurement is designed to reduce resource wastage (for example, an application with the requirement of a 1-core processor should not run on a system with an 8-core processor unless it is the only available resource). This is very important in cloud or HPC systems, which implement accounting systems where users pay for the resources they use. While it is possible to improve resource usage by imposing a strict limit on the distance score (i.e. applications can only be mapped to resources whose specifications match requirements exactly or within a certain distance), this limits the number of valid mapping solutions. If a strict limit is placed on the distance score, application requirements are small and only large resources exist, jobs will never run. For this reason, a strict limit on the distance score should only be imposed if a user has a guaranteed large range of resources with a variety of specifications.

c. A difference value (rDiff) is calculated for each linked resource and then checked against the smallest application requirements in the published service list:

rDiff = cmpSpec − appReq ≥ min(appList).

Resources with an rDiff score large enough to run the smallest application in the published service list are placed back into the resource sub-set (rSubset) for further consideration. This allows multiple applications to run on a single node.

d. Steps a, b and c are repeated until all applications are mapped to resources. (A sketch of this selection procedure appears at the end of Section 4.5.)

4.5. Workflow enabled selection

Uncinus also supports the selection and deployment of workflows. During this process, resource selection is applied dynamically to reduce the cost of procuring cloud resources. Workflows consist of services that are divided into blocks based on run order. Based on the number of services in a block, Uncinus is able to make intelligent decisions about how to deploy each application, individually or in parallel. While the service provider can specify how a workflow is to be run, Uncinus can also decide for itself how workflows are deployed. In a workflow, the output of a module is commonly directed to the next. Uncinus is able to recognize starting points through comparison of published running instructions and received output data. Modules are scheduled once all application inputs have been received. Applications with the same input are expected to have the same output; this is determined by calculating and comparing the hash of all input files. In this case, instead of execution of the module, the output data is transferred and substituted into the next workflow step, thereby reducing the workload. When deploying workflows, Uncinus also takes into account the cost of data transfer and communication, opting to run services in the same block on the same machine if possible to reduce the cost of communication. Depending on the size of data and the bandwidth, Uncinus can determine the cost to transfer output data to the next workflow step and react accordingly. In other words, the need to transfer large files can force the next services in a workflow to be allocated to the same machine/cloud provider.
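To make the selection method of Section 4.4 concrete, the following is a minimal sketch in Python of the rScore calculation and the greedy distance-based mapping; the record layout, the Euclidean distance over (nodes, cores, GHz, RAM) vectors, and the capacity bookkeeping are illustrative assumptions rather than the exact Uncinus implementation.

```python
import math

# Illustrative attribute vectors: (nodes, cores, clock GHz, RAM GB).
def as_vector(spec):
    return (spec["nodes"], spec["cores"], spec["ghz"], spec["ram"])

def r_score(hst_usage, cur_usage, w_hist=0.6, w_cur=0.4):
    """Availability score in [0, 1]; 1 = idle, 0 = heavily used.
    Weighted 20% in favor of historical usage, as in Section 4.4."""
    return 1 - (hst_usage * w_hist + cur_usage * w_cur) / 100

def dist(app_req, cmp_spec):
    """Euclidean distance between requirement and specification vectors."""
    return math.dist(as_vector(app_req), as_vector(cmp_spec))

def meets(app_req, cmp_spec):
    """A resource qualifies only if it satisfies every requirement."""
    return all(cmp_spec[k] >= app_req[k] for k in ("nodes", "cores", "ghz", "ram"))

def select(app_list, r_pool):
    """Greedily link each application to the closest qualifying resource."""
    r_map = {}
    # Prefer idle resources: sort the pool by availability score.
    pool = sorted(r_pool, key=lambda r: r_score(r["hst"], r["cur"]), reverse=True)
    for app in app_list:
        candidates = [r for r in pool if meets(app, r)]
        if not candidates:
            raise RuntimeError("no resource meets the application requirements")
        best = min(candidates, key=lambda r: dist(app, r))
        r_map[app["name"]] = best["name"]
        # Loosely mirrors the rDiff step: deduct the consumed capacity and
        # keep the resource in the pool only if it can still run some service.
        for k in ("nodes", "cores", "ram"):
            best[k] -= app[k]
        if not any(meets(a, best) for a in app_list):
            pool.remove(best)
    return r_map

# Example (hypothetical values):
# apps = [{"name": "gse", "nodes": 1, "cores": 1, "ghz": 1.24, "ram": 2}]
# pool = [{"name": "serverA", "nodes": 1, "cores": 4, "ghz": 2.4, "ram": 8,
#          "hst": 10, "cur": 5}]
# print(select(apps, pool))  # {'gse': 'serverA'}
```

As a worked instance of the rScore arithmetic, a resource with 10% historic and 5% current usage scores 1 − (10 × 0.6 + 5 × 0.4)/100 = 0.92, so it is tried before busier machines.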
Table 3
Published workflow module requirements.

| Order | Modules                       | Application requirements                                           |
| 1     | Annotate data                 | OS: Linux; 1 node; 1-core (1.24 GHz); 2 GB RAM                     |
| 2     | Gene set enrichment           | OS: Linux; 1 node; 1-core (1.24 GHz); 2 GB RAM                     |
| 2     | Gene set enrichment (P-value) | OS: Linux; 1 node; 1-core (1.24 GHz); 2 GB RAM                     |
| 3     | Find Pivot Point              | OS: Linux; 1 node; 1-core (1.24 GHz); 2 GB RAM                     |
| 4, 4  | Correlation                   | OS: Linux, requires OpenMPI; 8 nodes; 1-core (1.68 GHz); 2 GB RAM  |
| 5     | Generate network map          | OS: Linux; 1 node; 1-core (1.24 GHz); 2 GB RAM                     |
| N/A   | OpenMPI v1.6.3                | OS: Linux                                                          |
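The run orders in Table 3 drive the block-based scheduling of Section 4.5; below is a minimal sketch of grouping modules into blocks and of the input-hashing check used to skip a module whose output is already known. The module names come from Table 3, while the hashing helper and record shapes are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

modules = [  # (run order, module name) from Table 3
    (1, "Annotate data"),
    (2, "Gene set enrichment"), (2, "Gene set enrichment (P-value)"),
    (3, "Find Pivot Point"),
    (4, "Correlation"), (4, "Correlation"),
    (5, "Generate network map"),
]

# Group modules into blocks by run order; modules in one block may run in parallel.
blocks = defaultdict(list)
for order, name in modules:
    blocks[order].append(name)

def inputs_digest(paths):
    """Hash all input files; identical digests imply identical expected output,
    so a previously computed result can be substituted instead of re-running."""
    h = hashlib.sha256()
    for p in sorted(paths):
        with open(p, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

for order in sorted(blocks):
    print(f"block {order}: run in parallel -> {blocks[order]}")
```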
5. Uncinus assessment

The framework's feasibility was demonstrated through the design and development of the Uncinus research cloud. To completely achieve the goal of this research, the following questions are asked: (i) can Uncinus provide cloud resources in a manner accessible to discipline scientists? and (ii) does Uncinus offer good execution performance? The answers to these two questions are provided through a case study described in the following two sections.
Fig. 8. Correlation module publication.

5.1. Genomics case study

One area which can take advantage of cloud and service computing technologies (provided through the framework described in Section 3) is the study of mammalian genomics. Using the publication interfaces shown in Section 4.2, a workflow (which aimed to extract novel gene interactions) was published to Uncinus. The workflow was split into a number of modules (see Table 3), each providing a service to carry out genomic analysis. Of these modules, the correlation service (see Fig. 8) was designed to run on multiple nodes, and therefore had an additional requirement of OpenMPI. These modules can be executed as a workflow by following the specified run order, where multiple instances of a service may be required.

Compute resources were published through web interfaces (see Section 4.2). Resources include a 30 node private (HPC) cloud called West-lin, a standalone web server called Mamsap, and Amazon EC2 resources (see Table 4). The two local³ resources (West-lin and Mamsap) were published as attributes, while public cloud resources (Amazon) required the development of an Ubuntu 64-bit server AMI (see Fig. 9) before attributes could be published. During this process a pre-built Ubuntu AMI was selected from the public Amazon repository and updated to remove potential vulnerabilities.

³ We define "local" resources as computers in the same organizational boundary as the user and in a single organizational unit.

Using the resource and service selection interface (see Fig. 10), the published services were selected along with published cloud and cluster resources. Selected resources consisted of five Amazon cluster compute instances, five Amazon large instances, the entire 30 node West-lin private (HPC) cloud and the Mamsap server. It should be noted that local resources are heavily used and their availability changes regularly. The SAWF workflow was deployed on available resources using the automated resource selection algorithm described in Section 4.4. Normally, this mapping process is hidden from the user; however, as this mapping is a key operation of the case study, the process is walked through step by step below (a sketch of the underlying loop follows the list).

• The service with run order 1 (Annotate Data) is compared to each available compute resource in order of financial cost, cheapest to most expensive (Mamsap, West-lin, Amazon Large, Amazon Cluster). As the requirements of the Annotate Data service match those of the Mamsap server, this service and resource are mapped together and executed. Once the Annotate Data service has executed successfully, this mapping procedure is repeated for services with run order 2.
• The Gene Set Enrichment (GSE) and GSE (P-value) modules are the target of the 2nd mapping step. As these services have the same run order, Uncinus opts to allocate them to the same machine/cluster if possible to reduce communication costs. The Mamsap server does not have the required amount of resources and is thus passed over. Uncinus detects that all but two West-lin nodes are fully utilized, which is just enough to run the GSE and GSE (P-value) services.
• During the mapping of 3rd run order services, the Mamsap server and West-lin are fully allocated. With no local resources available, rather than waiting, the Find Pivot Point service is mapped to the Amazon EC2 cloud, where a large instance type (m1.large) can fulfil all service requirements.
• During the 4th mapping step, local resources are once again available; however, they are not powerful enough to run two correlation modules. Each correlation module requires 8 nodes, each with 4 GB RAM; Uncinus provides these resources through two cluster compute instances (cc1.4xlarge).
• The last mapping step is carried out for the Generate Network Map module. This service could be run using the Mamsap server; however, the data generated by the correlation module is very large (29 GB), so the service is instead mapped to an Amazon large instance to reduce the cost of data transfer.
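The walkthrough above is, in effect, the greedy mapping of Section 4.4 applied block by block over the run orders of Table 3, with candidate resources tried cheapest first. The sketch below illustrates only that outer loop; the module list comes from Table 3, while the hourly costs and the simplified candidate ordering are placeholder assumptions of ours.

```python
from itertools import groupby

# Run orders and modules from Table 3; the two correlation instances
# share run order 4. Costs are illustrative placeholders, not published
# prices; local resources are treated as free, as in the case study.
modules = [
    (1, "Annotate data"),
    (2, "Gene set enrichment"),
    (2, "Gene set enrichment (P-value)"),
    (3, "Find Pivot Point"),
    (4, "Correlation"),
    (4, "Correlation"),
    (5, "Generate network map"),
]
resources = [
    {"name": "Mamsap", "cost": 0.00},
    {"name": "West-lin", "cost": 0.00},
    {"name": "Amazon m1.large", "cost": 0.26},
    {"name": "Amazon cc1.4xlarge", "cost": 1.30},
]

for order, group in groupby(sorted(modules), key=lambda m: m[0]):
    block = [name for _, name in group]
    cheapest_first = [r["name"] for r in sorted(resources, key=lambda r: r["cost"])]
    # A real broker would test each candidate against the block's published
    # attributes (Section 4.4) and co-locate same-order services; here we
    # only show the cost-ordered candidate list considered per block.
    print(f"run order {order}: map {block} over candidates {cheapest_first}")
```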
This process results in the automatically generated resource map shown in Fig. 11. In summary, Uncinus makes cloud resources accessible to discipline researchers through the following features.
Table 4. Uncinus resource pool.

Name                 | Hypervisor                | Platform                   | Hard disk           | Comp. specs
Amazon (cc1.4xlarge) | Modified Xen: HVM         | Linux (64-bit Ubuntu 9.10) | Elastic block store | 2 × Intel 4-core Nehalem, 23 GB RAM
Amazon (m1.large)    | Modified Xen: Paravirtual | Linux (64-bit Ubuntu 9.10) | Elastic block store | 2 × Xeon (2007), 7.5 GB RAM
West-lin cluster     | –                         | Linux (64-bit CentOS)      | Shared drive        | 10 physical and 20 virtual nodes; Intel 4-core Duo, 8 GB RAM
Mamsap server        | –                         | Linux (64-bit CentOS)      | Shared drive        | 2 × Xeon (2007), 8 GB RAM
Fig. 9. AMI creation process.
Fig. 12. Resource allocation and module setup time of Uncinus.

Fig. 10. Resource and service selection interface.
Fig. 13. Uncinus workflow execution breakdown by module.
Fig. 11. Generated resource to service mapping.
First, by hiding the complexities of workflow and resource allocation, researchers can focus purely on data analysis, as demonstrated in the case study. Second, Uncinus provides an interface focused on software rather than hardware, similar to commonly used packages such as Galaxy [14]. Third, the resource selection algorithm uses attributes to turn a complex procedure into a simple question: how many resources are needed to run the application? Fourth, it makes minimal assumptions about the state of the hardware, a simplification that makes the system easier for non-computing researchers to work with and understand.

5.2. Results

While Uncinus allows discipline researchers to access cloud resources, the benefit of utilizing cloud resources (when compared to stand-alone server solutions such as Galaxy) has not yet been addressed. In response, the time taken and the cost to carry out the workflow described in Section 5.1 are investigated below.

During the analysis procedure, resources were deployed following the generated map. Setup (see Fig. 12) includes the time to initialize cloud resources, transfer application data and install applications. The Correlation, Map and Pivot modules had to request resources from Amazon and as a result took longer on average to set up than modules on local resources (Mamsap, West-lin). Results show that the correlation module took the longest time to set up; this was due to the use of cloud resources and the additional requirement of OpenMPI, which had to be deployed on each cluster instance.
Table 5. Size of generated data during analysis.

Modules                          | Data size (Human) | Data size (Mouse)
1. Normalization                 | 1.966 MB          | 2.247 MB
2. Analysis                      | 29 kB             | 14 kB
3. Annotate data                 | 1.792 MB          | 2.228 MB
4a. Gene set enrichment analysis | 1 kB              | 5 kB
4b. Gene set enrichment (P-val)  | 1 kB              | 7 kB
6. Find Pivot Point              |                   | 2.635 MB
7. Correlation                   | 29216 MB          | 988.643 MB
8. Generate network map          |                   | 1.434 MB
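One plausible reading of the human–mouse gap in Table 5, assuming (as the discussion below notes) that the correlation module ranks every pairwise gene interaction, is quadratic growth in the number of genes $n$:

$$\#\,\text{interactions} = \binom{n}{2} = \frac{n(n-1)}{2},$$

so a dataset with roughly $\sqrt{30} \approx 5.5$ times as many genes would generate about 30 times as many ranked pairs, consistent with the 29216 MB versus 988.643 MB correlation outputs. The gene counts themselves are not reported here, so this is a consistency check rather than a derivation.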
Once resources were set up, the workflow was executed with the goal of comparing human and mouse genetic data. The workflow was run twice; the first run focused on finding interactions between active genes (Up), the second on finding interactions between non-active genes (Down). The run time of each individual module is shown in Fig. 13. The GSEA and correlation modules took the longest to run, taking three and four hours respectively.

Execution of the workflow generated a large quantity of output data (see Table 5). The results show a 30 fold difference in the amount of data generated from the human dataset compared to the mouse dataset. The main generator of data is the correlation module, which ranks all possible gene interactions. During the correlation of the human dataset, an input file of 2.2 MB is turned into over 29 GB of data. This difference in generated data is due to the human dataset containing more genes than the mouse dataset. It was due to the large amount of data generated by the correlation module that Uncinus mapped the Generate Network Map service to an Amazon cloud instance instead of local resources.

In order to assess the performance of Uncinus, the workflow was run in three ways: sequentially on the Mamsap server, through Uncinus as individual modules, and through Uncinus as a workflow (see Fig. 14). Sequential execution on the Mamsap server took over 12 h.
Fig. 14. Overall performance comparison of local resources and Uncinus.
Fig. 15. Cost comparison of local resources and Uncinus.
For individual module deployment, applications were mapped to a combination of private cloud, server and public cloud resources as described above, resulting in a total run time of 8 h. When run in workflow mode, Uncinus was able to find the optimal module running order, realizing that the Correlation module could be performed after the annotate step. In addition, during the up-regulation run, Uncinus recognized the duplication of the annotation and correlation steps, resulting in a total run time of less than 4 h.

We also assessed the financial cost of achieving this reduced run time when running the workflow as a sequential service through Uncinus and as an Uncinus workflow (see Fig. 15). When run as a sequential service on local resources, no cost was incurred, as access to local resources was provided via a grant and considered free for the purposes of this case study. It should be noted that this is often not the case; it is common practice for users to be charged for HPC resources. Moving execution of this service to Uncinus enabled use of both local and Amazon EC2 cloud resources. Again we assumed that local resources are free, but cloud resources were charged per hour, resulting in a total cost of $7.28. When running the service as a workflow through Uncinus, the cost was reduced further, to a total of $2.10. This reduction was due to the elimination of the duplicated annotation and correlation modules.

In both the Uncinus and Uncinus (Workflow) scenarios, there was no charge for data transfer. As seen in Table 5, data transfer into Amazon was minimal, and while over 30 GB of data was generated, this was all internal to Amazon and cost nothing to move [28]. This was due to Uncinus ensuring that the last module of the workflow was mapped to Amazon EC2. Even if a user wished to move this data to local resources, at the current cost of $0.01 per GB this would only cost ∼30c. In conclusion, Uncinus is able to simplify access to cloud resources and achieve faster processing times through the use of cloud bursting, while minimizing the cost of using cloud resources.
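The final mapping decision above can be pictured as a simple cost comparison in the spirit of Section 4.5. The sketch below is our illustration only: the function name, the one-hour estimate and the $0.26 hourly rate are assumptions, while the $0.01 per GB egress price and the 29 GB output come from the text.

```python
def choose_placement(output_gb, egress_per_gb, remote_hourly, est_hours):
    # Compare the cost of moving the data out of the cloud against the
    # cost of running the next module beside the data it needs
    # (the co-location logic of Section 4.5).
    move_cost = output_gb * egress_per_gb
    stay_cost = remote_hourly * est_hours
    return "stay in cloud" if stay_cost <= move_cost else "move data out"

# 29 GB of correlation output at the quoted $0.01/GB egress rate, versus
# roughly one hour on a large instance (hourly price assumed).
print(choose_placement(29, 0.01, 0.26, 1))  # -> stay in cloud
```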
6. Conclusion and future work

Researchers carrying out analysis on big datasets face problems of cost and of obtaining required resources when utilizing HPC (provided in the form of clusters and grids). These issues can be solved through HPC clouds; these platforms allow users to access HPC resources on demand, without the need to invest in hardware and maintenance. However, discipline researchers do not have the administrative knowledge to set up virtual clusters in the cloud; additionally, these virtual clusters are difficult to develop and use. To address these issues we made contributions through (i) the design of a research cloud framework and attribute based resource selection algorithm, (ii) implementation of this framework and algorithm as a SaaS cloud called Uncinus, and (iii) a case study that validated the presented approach and implementation.

The framework addressed cloud usability by providing researchers with the tools to access the cloud and run distributed applications. Users with a background in programming, system administration and cloud computing can develop HPC software and publish VM resources. Users with knowledge of the software applications (but limited programming skills) can define required attributes through the service publication interfaces and become SaaS providers. Lastly, users with a background in biology and minimal computing knowledge can access the applications and resources (selected through the broker) that are required to perform analysis. For such users, clouds are made completely transparent and HPC applications are exposed as services. In this way users with different levels of computing expertise can take advantage of cloud resources.

To encourage the development of HPC cloud services, the framework proposed, and Uncinus implemented, attribute based publication. By providing deployment information, software is deployed as a service and exposed through an automatically generated web interface. These published applications are made available to other users through a broker. Furthermore, users can deploy and access these services through generated web interfaces; resource selection algorithms find the optimal resources that fit the requested services. Users accessing software developed via Uncinus benefit from the on-demand nature of the cloud to obtain required resources. By allowing users to publish cloud and non-cloud resources, more compute power is made available. The benefits of this approach were shown through the bio-informatics workflow performance results. The case study showed that when faced with delays, Uncinus successfully drew upon external (cloud) resources to speed up analysis (reducing the running time of the workflow by 70.2%).

In conclusion, the framework is feasible and its proof of concept, Uncinus, is a robust environment that fulfils the requirements of researchers, in particular those with no computing background, carrying out analysis on big data. Uncinus provides an easy-to-use interface to publish and access a range of software and computing resources. By giving users control over software and hardware, they can construct services that fulfil their individual requirements. Additionally, as computer hardware is abstracted, Uncinus supports interoperability between private clouds, servers and public clouds. Being able to utilize many resources was shown to improve the performance (turnaround time) of genomic analysis services.
Future work on the Uncinus platform will aim to further simplify the deployment and publication procedures through extensions to the middleware library.

References

[1] A. Darling, L. Carey, W. Feng, The design, implementation, and evaluation of mpiBLAST, in: ClusterWorld 2003, 2002.
[2] B. Langmead, C. Trapnell, M. Pop, S. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biol. 10 (2009) R25.
[3] H. Li, R. Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform, Bioinformatics 25 (2009) 1754–1760.
[4] L. Stein, The case for cloud computing in genome informatics, Genome Biol. 11 (2010) 207.
[5] M.C. Schatz, B. Langmead, S.L. Salzberg, Cloud computing and the DNA data race, Nature Biotechnol. 28 (2010) 691.
[6] H.-L. Truong, S. Dustdar, Cloud computing for small research groups in computational science and engineering: current status and outlook, Computing 91 (2011) 75–91.
[7] K. Yelick, S. Coghlan, B. Draney, R.S. Canon, The Magellan Report on Cloud Computing for Science, US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), December 2011.
[8] M. McLennan, R. Kennell, HUBzero: a platform for dissemination and collaboration in computational science and engineering, Comput. Sci. Eng. 12 (2010) 48–53.
[9] P. Kacsuk, P-GRADE portal family for grid infrastructures, Concurr. Comput.: Pract. Exper. 23 (2011) 235–245.
[10] AGAVE, December 2012. Available: http://sourceforge.net/projects/agaveapi.
[11] R.N. Calheiros, C. Vecchiola, D. Karunamoorthy, R. Buyya, The Aneka platform and QoS-driven resource provisioning for elastic applications on hybrid Clouds, Future Gener. Comput. Syst. 28 (2012) 861–870.
[12] Monash, 2014. Overview—myCloudMesh. Available: http://cloudmesh.github.io/cloudmesh.html.
[13] T.S. Somasundaram, K. Govindarajan, CLOUDRB: a framework for scheduling and managing High-Performance Computing (HPC) applications in science cloud, Future Gener. Comput. Syst. 34 (2014) 47–65.
[14] J. Goecks, A. Nekrutenko, J. Taylor, The Galaxy Team, Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences, Genome Biol. 11 (2010) R86.
[15] E. Afgan, D. Baker, N. Coraor, B. Chapman, A. Nekrutenko, J. Taylor, Galaxy CloudMan: delivering cloud compute clusters, BMC Bioinformatics 11 (2010) S4.
[16] W. Gentzsch, Sun grid engine: towards creating a compute power grid, in: Proceedings of the 1st International Symposium on Cluster Computing and the Grid, 2001.
[17] Y. Kowsar, E. Afgan, Support for data-intensive computing with CloudMan, in: 36th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO, Opatija, Croatia, May 2013.
[18] B. Liu, B. Sotomayor, R. Madduri, K. Chard, I. Foster, Deploying bioinformatics workflows on clouds with galaxy and globus provision, in: Third International Workshop on Data Intensive Computing in the Clouds, Salt Lake City, 2012.
[19] M. Brock, A. Goscinski, Execution of compute intensive applications on hybrid clouds, in: 1st International Workshop on Hybrid/Cloud Computing Infrastructures for e-Science Applications, HCCIEA 2012, Palermo, Italy, 2012.
[20] A.K.L. Wong, A.M. Goscinski, A unified framework for the deployment, exposure and access of HPC applications as services in clouds, Future Gener. Comput. Syst. 29 (2013) 1333–1344.
[21] P.C. Church, A. Goscinski, IaaS clouds vs. clusters for HPC: a performance study, in: The Second International Conference on Cloud Computing, GRIDs, and Virtualization, Rome, Italy, 2011.
[22] S. Christley, Y. Lu, C. Li, X. Xie, Human genomes as email attachments, Bioinformatics 25 (2009) 274–275.
[23] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, D. Zagorodnov, The Eucalyptus open-source cloud-computing system, in: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009.
[24] Amazon, Amazon Elastic Compute Cloud: Getting Started Guide, Amazon, 2010.
[25] J. Kirby, NeCTAR—Australian Research Cloud, 2012. Available: http://www.nectar.org.au/.
[26] OpenStack Project, 2012. OpenStack—Open source software for building private and public clouds. Available: http://www.openstack.org/.
[27] J. Ziv, A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. Inform. Theory 23 (1977) 337–343.
[28] Amazon, 2014. Amazon EC2 Pricing. Available: http://aws.amazon.com/ec2/pricing/.
Philip Church obtained his Bachelor of Information Technology with first class Honors from Deakin University, Australia in 2009. He is currently a Ph.D. student researching in the areas of cloud and service computing, bioinformatics, data management and distributed systems under the supervision of Dr. Andrzej Goscinski and Dr. Christophe Lefèvre.
Andrzej M. Goscinski is chair professor of computing at Deakin University. He is recognized as one of the leading researchers in distributed systems, distributed and cluster operating systems, and service and cloud computing. The results of his research have been published in international refereed journals and conference proceedings and presented at specialized conferences. Currently, he is carrying out research into high level abstractions of cloud computing; cloud service publication, selection and discovery; grids; and the development and deployment of service based clouds.
Christophe Lefèvre obtained a Doctorate degree in Biochemistry from the University of Montpellier in 1984. He went on to conduct post-doctoral research in Southern California (USC, UCSD, UCLA) and Japan (Tokai University). After leading bioinformatics software developments at Gentech in southern France, he joined Organon AkzoNobel, a pharmaceutical company in the Netherlands. In 2002 he moved to Australia, working at the Victorian Bioinformatics Consortium and the CRC for innovative dairy products. In 2008 he joined the Institute for Technology Research and Innovation (ITRI) at Deakin University as Associate Professor of Bioinformatics.