
Live Migration of Trans-Cloud Applications

Journal Pre-proof

José Carrasco, Francisco Durán, Ernesto Pimentel

PII: S0920-5489(19)30098-4
DOI: https://doi.org/10.1016/j.csi.2019.103392
Reference: CSI 103392

To appear in: Computer Standards & Interfaces

Received date: 22 March 2019
Revised date: 24 July 2019
Accepted date: 8 November 2019

Please cite this article as: José Carrasco, Francisco Durán, Ernesto Pimentel, Live Migration of Trans-Cloud Applications, Computer Standards & Interfaces (2019), doi: https://doi.org/10.1016/j.csi.2019.103392

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

Highlights

• This paper presents a solution for the component-wise migration of cloud applications.
• The migration is performed component-wise: each component to be migrated, which may be deployed on a specific service on a specific provider, may individually be moved to a different one.
• Our solution relies on the three key ingredients of the trans-cloud approach: a CAMP-based unified API, TOSCA-based agnostic topology descriptions, and mechanisms for the independent specification of target service locations.
• Downtimes are minimized.
• The effort needed from the user for a migration operation is almost zero.
• We present an implementation of our proposed solution and illustrate it with a case study and experimental results.


Live Migration of Trans-Cloud Applications

José Carrasco, Francisco Durán, and Ernesto Pimentel
Dpto. Lenguajes y Ciencias de la Computación
Universidad de Málaga, Spain
Email: {josec,duran,ernesto}@lcc.uma.es

Abstract

The development of applications independently of the cloud providers where they are going to be deployed is still an open issue. In fact, cloud-agnostic software development presents important challenges to be solved. One of these issues is the runtime migration of components. Even more difficult is dealing with the interoperability issues that arise when the migration also implies a change of provider or service level. This paper presents a solution for the component-wise migration of cloud applications. The migration is performed component-wise in the sense that each component of the application to be migrated, which may be deployed on a specific service on a specific provider, may individually be moved to a different one. Our solution relies on the three key ingredients of the trans-cloud approach, where the CAMP and TOSCA standards play a central role: a CAMP-based unified API, TOSCA-based agnostic topology descriptions, and mechanisms for the independent specification of target service locations. The effort and the time required for restoring the activity of applications are used as metrics to evaluate the performance of the proposed migration orchestrator. Although the time required for the migration operation is directly related to the topology of the application, the affected components, and their previous and target locations, the downtimes are significantly reduced. Moreover, thanks to the abstraction level at which it operates and the automation provided, the effort needed from the user for a migration operation is almost zero. We present an implementation of our proposed solution and illustrate it with a case study and experimental results.

Keywords: Cloud, application migration, TOSCA, CAMP, Apache Brooklyn, heterogeneous deployment

1. Introduction


In recent years, Cloud Computing has experienced a growth in goals and capabilities [1], as an answer to market demands. During this quick technological evolution, cloud vendors have developed their own solutions. Thus, today, cloud providers offer different functionalities through different APIs following diverse business models. As a consequence of this heterogeneity, cloud developers will often be locked into specific services, since many interoperability and portability issues arise when trying to combine solutions from different vendors.

The interoperability of cloud services has recently been receiving increasing attention [2]. Proposals like federated clouds [3], inter-cloud [4], multi-cloud [5], and others have contributed towards the automated management of applications over heterogeneous cloud infrastructures by taking advantage of the management of the topologies of applications. In fact, the capacity of distributing components of the same application over different providers is usually known as cross-cloud deployment. However, most of the solutions provided for these scenarios only deal with one service model, either IaaS or PaaS. A further step in this direction has recently been proposed in what has been called trans-cloud deployment [6, 7], which manages the deployment of applications combining services from the IaaS and PaaS levels, possibly offered by different vendors.

Although these solutions help with a complex task, the deployment problem is indeed aggravated since, given the multitude of cloud offerings, the number of possible alternatives becomes huge. If the decision of which vendor, service or service level to use to deploy an application is difficult, deciding how to distribute the components during an application deployment is much more complicated [8, 9]. The reasons for a trans-cloud deployment may indeed be very diverse: for instance, we may be using existing components from different sources, we may be trying to minimize costs, or we may want to use replication to improve reliability or to improve latency with end users. To help with the decision, thanks to the above-mentioned emerging solutions for cross-cloud deployment, we can nowadays find platforms capable of allowing developers to find the best deployment alternative for each individual component of their applications, considering different constraints related to Quality of Service (QoS) metrics, cost requirements or privacy issues (see, e.g., [10, 4]).

Unfortunately, the story does not end there. Even if we had a perfect solution for the automatic trans-cloud deployment of applications, where each component of our applications is deployed using the best possible service so that its quality of service is optimal according to our chosen criteria, we would still have to face one more issue: change. Changes of very different nature can affect our applications: there can be updates in the deployed applications [11], unpredicted changes in the applications' workloads or context, or changes in the used services, namely in the offered services, prices or security policies, or a provider may simply stop providing its services. Indeed, the migration of individual components or entire applications is unavoidable over time. Can we migrate our applications in case of need?

The possibility of changing decisions about which cloud providers to use to deploy an application, partially mitigating the vendor lock-in problem, has been studied by several researchers (see, e.g., [12, 13, 5, 14]). However, these solutions are mostly limited to moving full applications from one provider to a different one. In some of these proposals, when cross-cloud deployment is supported, the re-deployment may be performed component-wise.


Indeed, additional problems have to be considered (and solved) when different components of the same application have to be moved to different datacenters, or even different cloud providers [15, 16]. However, in all these cases, when migrating, applications must be completely stopped and re-deployed, and most of the solutions provided for these scenarios only deal with one service model, typically IaaS. As we discuss in Section 5, on related work, currently only container-based solutions provide some support for runtime application migration. However, these proposals move complete execution environments, and only containerized components. The only currently-available way to handle these situations is using scripting tools, like Ansible, to manually control the actions to be performed. Of course, this task is error-prone, and requires a significant amount of time and great expertise. The reliability and effectiveness of component-wise cloud-application migration can thus significantly benefit from automation.

If component deployment reconfiguration is a complex task, doing it at runtime, trying to minimize downtimes, is even more challenging. Runtime migration is still an unresolved topic which has been widely studied by both academia and industry (see, e.g., [17, 18]). To migrate some components of a running cloud application, it is necessary to orchestrate the entire cloud context, such as services and bound resources, to achieve the expected movement of the components' services while taking into account the possible interoperability and portability problems. In fact, runtime migration raises many new issues related to cloud resources, application components, and their performance and management. We can find several proposals on the live migration of cloud applications' components (see, e.g., [19, 20, 21]), focusing on the movement of running application components between different vendors. However, as we will see in Section 5, these solutions are quite limited.

The question is then the following: Is it possible to automate the component-wise live migration of trans-cloud applications? We respond positively in this paper, and we do so by presenting a framework for the automatic trans-cloud deployment and migration of applications. The migration infrastructure we present was developed with three main goals in mind: (i) minimizing the effort of the user to perform the migration, (ii) minimizing application downtimes during the migration, and (iii) ensuring the integrity of the application during the migration process. Indeed, with our proposal, the migration can be performed component-wise, which means that the decision of which provider and service, either IaaS or PaaS, to use as the target of each component may be specified independently of the other components. Once an application is under execution, the only thing the user has to do to perform a migration is to provide the new locations of the components to be moved.

The framework is built on the trans-cloud infrastructure presented in [6, 7], which has been extended with a migration orchestrator. All issues related to resource allocation, credential handling, component interconnection, etc., are handled by the already existing infrastructure. Although the framework may also be used for the update or re-deployment of entire applications, we focus in this paper on the migration of active applications, where dependencies between components are used to minimize the number of operations to perform and the time of inactivity.


In this sense, one of the most promising features of the approach is the use of the information provided by the detailed architectural description of the managed applications [12]. Since the migration of components with state poses additional difficulties, we only consider the migration of stateless components.

Our solution relies on the three key ingredients of the trans-cloud approach: a CAMP-based¹ unified API, TOSCA-based² agnostic topology descriptions, and mechanisms for the independent specification of providers. Our proposal uses the TOSCA standard to model the agnostic topologies of applications. Being agnostic means, in this context, that these topologies do not use any specifics of the target providers over which they will be deployed. The information related to the cloud service level, IaaS or PaaS, is added by means of a policy-based mechanism independent of the TOSCA topology description. The trans-cloud environment processes these specifications and uses a CAMP-based homogenization API, which unifies the IaaS and PaaS services of different vendors, to orchestrate the deployment over the required cloud services.

Given an application deployed using the trans-cloud framework, the migration orchestrator receives as input the set of target locations of the components that have to be migrated. That is, when the migration of individual components of an application is requested, a number of cloud-agnostic processes are triggered, moving just the necessary components to their respective target services, independently of the target providers and abstraction levels, either IaaS or PaaS. By relying on the trans-cloud infrastructure, operations such as stop, re-start, move and re-connect are applied to the necessary components independently of the service level, the cloud technology or any other dependencies.

In summary, we present in this paper a migration orchestrator that automates the reliable, efficient and component-wise migration of trans-cloud applications. The system, some examples of use, and documentation, both on its use and on its design, are available at https://github.com/scenic-uma/brooklyn-dist/tree/trans-cloud.

This work builds on previous work by the same authors and other researchers. Specifically, the trans-cloud approach is presented in [6, 7]. Basic technology for the migration of individual components in a trans-cloud environment was presented in [24]. A first attempt at the migration of multiple components was presented in [25]. We use the algorithm in [25] in Section 4, where we refer to it as the Reference Algorithm, to compare the current proposal against it. The main contributions of this work, in comparison to these previous ones, are:

• The orchestrator allows the concurrent migration of several components of an application, and it only needs the mapping of the components to be migrated to their new locations, independently of the service level used both in the source and the target providers.


• The current migration infrastructure was developed with the minimization of application downtimes as one of its main driving goals.

• The infrastructure is publicly available, it has been thoroughly tested, and its documentation and evaluation are now complete. Extensive experimentation with the migration technology is presented.

¹ CAMP (Cloud Application Management for Platforms) is an OASIS standard focused on the integrated management of multiple platforms [22].
² TOSCA (Topology and Orchestration Specification for Cloud Applications) is an OASIS standard for the description of cloud applications and their relationships [23].


The rest of this paper is structured as follows. Preliminaries about trans-cloud deployment and its implementation are presented in Section 2, which also introduces a case study that is used to motivate and illustrate our approach throughout the paper. The proposed trans-cloud migration algorithm for stateless components is described in Section 3. Details on the implementation of the algorithm are presented in Section 4, together with some experimental and evaluation results. In Section 5, we compare our proposal to other related work. Finally, Section 6 concludes the paper and discusses some lines of future work.

2. An Overview of Trans-Cloud


In this section we provide an overview of the trans-cloud management of cloud applications and its main capabilities. These concepts are illustrated with a running case study, which will later be revisited to show the use of our proposal for the autonomous migration of applications in Section 3 and to evaluate it in Section 4.

2.1. A Trans-Cloud Environment

Trans-cloud [7] significantly reduces the issues related to the portability and interoperability of cloud applications. It allows developers to deploy their applications using available vendors' services and resources, at IaaS or PaaS levels, releasing them from the usual infrastructure limitations when defining their applications. The trans-cloud approach relies on three main ideas: agnostic topology descriptions, target-service specifications, and a unified API. Trans-cloud was proposed as a step forward with respect to mechanisms for the management of applications' components on different providers, such as multi-, cross- and inter-cloud approaches. With the goal of unifying cloud services, trans-cloud introduces a second dimension in which deployment tools integrate all three levels under a single CAMP-based interface [22]. This allows developers to automatically deal with IaaS and PaaS differences when deploying their applications, combining services offered by providers at any of these levels.

Regarding the homogenization of cloud services, trans-cloud relies on an agnostic specification of applications' topologies based on the TOSCA standard, which allows us to provide fully detailed specifications of application components, how they are inter-related, and their respective configurations. Indeed, this specification is used to abstract from vendor specificities and to systematize their management.

Figure 1: The trans-cloud approach


applications’ components, an extended Apache Brooklyn3 is used. Brooklyn is CAMP-based, and defines a common interface for the homogeneous treatment of different cloud vendors and their services. To abstract from specific vendors and services, the description of target services requires a mechanism to agnostically specify the target cloud services which will be managed from the unified API. To do this, the trans-cloud manager uses locations as policies, by means of placement policies defined in TOSCA. Locations represent cloud services on which to perform the deployment, both for IaaS and PaaS services. As we will see in the coming sections, locations are defined independently of the topologies, what allows us to change target services just by changing the corresponding locations. The trans-cloud framework presented in [6, 7] builds on the Brooklyn-TOSCA4 open project for enabling an independent specification of the used services, and on the Apache Brooklyn project to provide a common API for the unified management of IaaS and PaaS services. Figure 1 shows an overview of the proposal. Apache Brooklyn is a multi-cloud application management platform for the management of the provisioning and deployment of cloud applications, providing a common API that enables cross-computing features through a unified API. As can be seen in Figure 1, the current official release of Brooklyn only handles IaaS services. It relies on jClouds,5 through which it provides support for a number of IaaS services providers including AWS, Microsoft Azure, Rackspace and SoftLayer.6 Brooklyn’s API was extended in [6, 7] with new mechanisms for the management of PaaS services. Our extension takes advantage of the genericity and flexibility of Brooklyn’s API, which has the independency between application descriptions and cloud services used in their operation as one of its goals. Initially, CloudFoundry-based platforms, like Pivotal Web Services, IBM 3 Apache

Brooklyn: https://brooklyn.apache.org/. https://github.com/cloudsoft/brooklyn-tosca. 5 jClouds: https://jclouds.apache.org/. 6 See https://jclouds.apache.org/reference/providers/ for a complete list of providers supported by jClouds. 4 Brooklyn-TOSCA:

7

205

210

Bluemix, etc., were integrated, to prototype the PaaS support, and to allow components to be deployed using both IaaS and PaaS. As we will see in the following sections, the trans-cloud manager provides a set of useful basic mechanisms to build an agnostic algorithm to reach the migration of application’s component. Application components are specified in an agnostic way, and their control, with operations for stopping, re-starting, starting in a new location, etc., can be delegated to the aforementioned unified API, which handles each application component independently of the kind of service where it will be running, either IaaS and PaaS. 2.2. The SeaClouds Case Study
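To make this delegation pattern concrete, the following minimal Python sketch illustrates the kind of unified lifecycle interface just described. All names here are hypothetical, chosen only to illustrate the idea; they are not the actual Brooklyn-based API.

from abc import ABC, abstractmethod

class ComponentEntity(ABC):
    """Hypothetical proxy for one application component. Callers never see
    whether the backing service is IaaS or PaaS, nor which provider offers it."""

    @abstractmethod
    def start(self, location: str) -> None:
        """Provision resources at a location such as 'aws-ec2:eu-west-1'
        (IaaS) or 'pivotal-ws' (PaaS) and start the component there."""

    @abstractmethod
    def stop(self, release: bool = True) -> None:
        """Stop the component, optionally keeping its resources allocated."""

    @abstractmethod
    def restart(self) -> None:
        """Bring a stopped component back up on its current resources."""

# One concrete entity per supported component type would then translate these
# calls into provider-specific operations behind the common interface.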

2.2. The SeaClouds Case Study


We use the SeaClouds application [10] as a case study to evaluate and illustrate our proposal. Although not a complex architecture, it allows us to illustrate trans-cloud deployment and migration, and to go into some details on the management of dependencies between components, with an emphasis on the simplicity and reliability of the procedure. SeaClouds is itself a multi-cloud application management system; indeed, this makes it in a way a meta case study: SeaClouds was one of the first solutions proposed to solve some of the vendor lock-in problems. The basic capabilities delivered by SeaClouds to the software developer and cloud application manager allow: (i) the discovery and matchmaking of cloud offerings based on the requirements of a given application, (ii) the deployment of applications across multiple clouds to address non-functional requirements, and (iii) multi-cloud governance, including monitoring, Service Level Agreement (SLA) enforcement, and some basic repairing (such as horizontal and vertical scaling to maximize the performance of each application module) upon the degradation of the execution requirements.

As depicted in Figure 2, the SeaClouds application is composed of seven main modules. The Dashboard component provides the main graphical user interface. The Planner is in charge of generating an orchestration plan considering the application topology and its requirements. To do this, it needs the Discoverer module, which provides information about capabilities and add-ons offered by available cloud providers. The deployment plans generated by the Planner are executed by the Deployer component. Once the application has been deployed, each module is monitored to guarantee that quality-of-service (QoS) properties of the application are not violated. This is the responsibility of the MonitorManager component, which uses information provided by the DataCollector module, and provides the corresponding analysis results through a MonitorDashboard component. Finally, the SLAService component is in charge of mapping the low-level information gathered from the Monitor into business-level information, quality of business (QoB), thus providing information about the fulfilment of the SLA defined by the cloud provider.

Figure 3 shows the TOSCA topology of the SeaClouds system. In it, each module is modeled as a NodeTemplate. Moreover, the diagram specifies the relations between components and the technologies used.


Figure 2: The SeaClouds architecture


We observe that Dashboard, DataCollector and Discoverer run on Jersey servers. Planner and SLAService require Tomcat servers. The MonitorDashboard is a Grafana instance, and MetricsDB, DiscovererDB and SLADB are InfluxDB, Mongo and MySQL databases, respectively. Finally, we can see that Monitor has been split into two different components, namely MonitorManager and DataAnalyzer, both of them hosted on Apache servers.

Notice that the relations between components, explicit in this topology, are key for the handling of dependencies between components. The remaining details of the TOSCA specification of the application are not relevant to the rest of the presentation and are therefore omitted. The entire description is available at https://github.com/scenic-uma/brooklyn-dist/tree/trans-cloud.

Given a TOSCA YAML description of an application topology, with the appropriate location specifications, the trans-cloud infrastructure is able to create suitable entities to manipulate each module. The listing in Figure 4 shows SeaClouds' TOSCA YAML topology description, where just some relevant elements are included to illustrate the agnostic application description (notice the ellipses). Without going into the details of the TOSCA specification, we can see in this figure how the location mechanism allows the separation between the topology description and the provider specification. In lines 29–48, we see that the components are deployed on AWS (Ireland and Oregon clusters) and SoftLayer (London cluster), all of them IaaS services.

We do not go here into the discussion of which services and providers should be used to deploy each of the components. Needless to say, having the components spread over geographically distributed datacenters may affect the application's reliability (if any of these locations has some issue) and performance (latency will be substantially higher than having all of them in the same place).


Figure 3: The SeaClouds TOSCA topology


We just assume that some recommendation system or person has chosen these locations for this application. As illustrated in [7], just by changing the target locations of the application's components, using deployment policies, we could deploy them using different services in different locations, possibly at different levels or on different providers, without any further changes to our application or its TOSCA specification.

In the following sections, we will use this case study to illustrate and evaluate our proposal. Assuming that the application has been deployed as explained above using the trans-cloud environment, we will perform several alternative migration operations on several of its components. We will use it to explain the implementation of the main algorithm and the proposed migration framework, and to show some experimental results of its execution.

3. The Migration Orchestrator

The migration infrastructure was developed with three main goals in mind: (i) minimizing the effort of the user, (ii) minimizing application downtimes, and (iii) ensuring the integrity of the application during the migration process. As explained in Section 2.1, every detail required for the deployment of applications is provided in TOSCA specifications, using mechanisms that allow such specifications to abstract from the particularities of levels and providers. Therefore, once the TOSCA specifications are available for the initial deployments, individual components of such running applications can be migrated just by specifying their new locations. As we will see below, for a migration operation, all a user has to provide is a map associating a new location with each component to be migrated. The underlying infrastructure will take care of the rest.

The trans-cloud management infrastructure internally represents each component and service by an entity object that serves as a proxy to the corresponding element. Thus, when we want to perform some operation on a component, we do it on the corresponding entity, which is then capable of operating on the corresponding component or service in the cloud.

 1  tosca_definitions_version: tosca_simple_yaml_1_0_0_wd03
 2  ...
 3  topology_template:
 4    node_templates:
 5      Planner:
 6        type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer
 7        ...
 8        requirements:
 9          - endpoint_configuration:
10              node: SLAService
11              ...
12          - endpoint_configuration:
13              node: Discoverer
14              ...
15          - endpoint_configuration:
16              node: MonitorManager
17              ...
18      SLAService:
19        type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer
20        ...
21        requirements:
22          - endpoint_configuration:
23              node: SLA-DB
24              ...
25      SLADB:
26        type: org.apache.brooklyn.entity.database.mysql.MySqlNode
27        ...
28    ...
29    groups:
30      ireland-services:
31        members: [DataAnalizer, DiscovererDB, MetricsDB, SLADB]
32        policies:
33          - brooklyn.location: aws-ec2:eu-west-1
34
35      bluemix-services:
36        members: [Planner]
37        policies:
38          - brooklyn.location: aws-ec2:eu-west-1
39
40      oregon-location:
41        members: [Discoverer, MonitorManager, Deployer, SLAService]
42        policies:
43          - brooklyn.location: aws-ec2:us-west-2
44
45      london-location:
46        members: [Dashboard, MonitorDashboard, DataCollector]
47        policies:
48          - brooklyn.location: softlayer:lon02

Figure 4: SeaClouds' TOSCA YAML description



Indeed, in the rest of the section, when we say that we perform some operation on a component, we mean that we perform that operation on its entity, which then finds the way to perform the corresponding operation or operations on the actual component. Similarly, when we say that several operations are executed concurrently, we mean that the operations on the corresponding entities are executed concurrently by the orchestrator. Obviously, the components themselves are running in the cloud, in a distributed environment.

To minimize downtimes, the algorithm follows a distributed approach, carefully organizing the tasks to perform so that components are inactive only for the strictly necessary time intervals. Decisions are made on a per-component basis, so that each component checks its own dependencies and moves forward in its specific task as soon as possible. The migration orchestrator simultaneously operates on all specified components—entities—by following a multi-threaded strategy, controlled by the application of a set of rules that detect dependencies between components and cloud resources, thus ensuring the integrity of the application during the migration process. With this approach, if we need, say, to start several components, we do not need to indicate the order in which they are to be started; we simultaneously initiate their start, and their entities are in charge of checking their dependencies and advancing in their lifecycle as they can.

The abstractions provided by the trans-cloud management infrastructure, and the way the applications and their components and inter-relations are modeled and handled, enable the definition of a completely agnostic migration algorithm. Indeed, the management of each concrete application's stateless components and of the cloud resources bound in the process can be performed through the trans-cloud generic API, which hides the complexity of managing different cloud levels and providers. The topology, the abstracted lifecycle and the unified API then allow us to define our migration algorithm simply as a process orchestrator. For example, the stop(component) operation is in charge of stopping a component regardless of where it is running, either on an IaaS or a PaaS service, and of the resources it is using. Similar operations for starting, re-starting, releasing, etc. are key for our proposal.

3.1. High-Level Description of the Algorithm

By relying on the trans-cloud management infrastructure, the operation of the orchestrator can be described at a very high level of abstraction. Our thread-based parallelized operational plan for the migration of each component to the specified target locations is given in the listing in Figure 5. The algorithm is specified using a Java/Python-like pseudo-language.

Given a running application, deployed using the trans-cloud infrastructure from a TOSCA specification, the operation MIGRATE(a,cls) (lines 4–26) takes as inputs an in-memory representation of the application's topology, a, and a migration blueprint, cls, and executes the migration blueprint over the running application. The migration blueprint is basically a map that associates a target location to each of the components to be migrated. The set of keys of this map (cls.keys()) represents the components to be migrated.

 1  Input a: application
 2  Input cls: component-location map
 3
 4  procedure MIGRATE(a, cls)
 5    ls = empty list of resources
 6    tasks = empty list of tasks
 7    for c in cls.keys()
 8      ls.add(current-location(c))
 9      tasks.add(START-TASK(c, cls))
10    executor(tasks).parallel-execution()
11
12    depCs = lowest-ancestors(a, cls)
13    tasks = empty list of tasks
14    for c in depCs
15      tasks.add(STOP-TASK(a, c, cls))
16    executor(tasks).parallel-execution()
17
18    tasks = empty list of tasks
19    for c in depCs
20      tasks.add(RE-START-TASK(a, c, cls))
21    executor(tasks).parallel-execution()
22
23    tasks = empty list of tasks
24    for l in ls
25      tasks.add(RELEASE-TASK(l))
26    executor(tasks).parallel-execution()
27
28  function START-TASK(c, cls): Task
29    return new Task()
30      start(c, cls.get(c))
31
32  function STOP-TASK(a, c, cls): Task
33    return new Task()
34      tasks = empty list of tasks
35      for c in parents(a, c)
36        tasks.add(STOP-TASK(a, c))
37      executor(tasks).parallel-execution()
38      stop(c)
39
40  function RE-START-TASK(a, c, cls): Task
41    return new Task()
42      if is-ready-to-re-start(c)
43        re-start(c)
44        re-establish-relations(c, a)
45      tasks = empty list of tasks
46      for c in parents(a, c)
47        tasks.add(RE-START-TASK(a, c))
48      executor(tasks).parallel-execution()
49
50  function RELEASE-TASK(l): Task
51    return new Task()
52      release-resources(l)

Figure 5: Sketch of the migration algorithm
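As a complement to the pseudocode, the following is a minimal executable Python sketch of the same plan. It assumes entity objects offering the hypothetical lifecycle interface sketched in Section 2.1 (start, stop, restart, plus current_location, is_ready_to_restart and re_establish_relations helpers), a topology object a exposing parents(c), a release_resources operation of the trans-cloud API, and a lowest_ancestors analysis (a possible version is sketched below, after the stopping-phase discussion). It illustrates the orchestration pattern; it is not the framework's actual code.

from concurrent.futures import ThreadPoolExecutor

def run_parallel(tasks):
    # executor(tasks).parallel-execution(): run all tasks concurrently and
    # return control once every one of them has completed.
    if not tasks:
        return
    with ThreadPoolExecutor(max_workers=len(tasks)) as ex:
        for future in [ex.submit(t) for t in tasks]:
            future.result()  # wait, propagating any exception

def migrate(a, cls):
    """a: application topology; cls: map from component to target location."""
    # Phase 1 (re-deployment): remember the old locations, then provision
    # and start the moved components at their new locations, in parallel.
    ls = [c.current_location() for c in cls]
    run_parallel([lambda c=c, loc=loc: c.start(loc) for c, loc in cls.items()])
    # Phase 2 (stopping): top-down stop of every affected component,
    # starting the recursion from the lowest affected ancestors.
    dep_cs = lowest_ancestors(a, cls)
    run_parallel([lambda c=c: stop_task(a, c) for c in dep_cs])
    # Phase 3 (re-starting): bottom-up re-start, re-establishing relations.
    run_parallel([lambda c=c: restart_task(a, c) for c in dep_cs])
    # Phase 4 (releasing): free the resources left behind at old locations.
    run_parallel([lambda l=l: release_resources(l) for l in ls])

def stop_task(a, c):
    # Stop all parents first (top-down), then the component itself,
    # keeping its resources so it can be re-started later.
    run_parallel([lambda p=p: stop_task(a, p) for p in a.parents(c)])
    c.stop(release=False)

def restart_task(a, c):
    # Re-start a component only once every dependency is running again.
    if c.is_ready_to_restart():
        c.restart()
        c.re_establish_relations(a)
    run_parallel([lambda p=p: restart_task(a, p) for p in a.parents(c)])

Note how the nested calls to run_parallel mirror the recursive task structure of the pseudocode: each task spawns and awaits the tasks of its parents before proceeding.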


We differentiate four phases in the algorithm: (i) the re-deployment of the components to be migrated in their new locations, (ii) the stopping of the migrated components and of those that depend on them, (iii) the re-starting of all stopped components, and (iv) the releasing of the services used by the migrated components in their old locations. To maximize concurrency during these phases, actions on components are bundled into tasks, namely START-TASK (lines 28–30), STOP-TASK (lines 32–38), RE-START-TASK (lines 40–48) and RELEASE-TASK (lines 50–52). For example, a task for re-starting a component will contain the necessary operations to re-start the component. Once a set of tasks has been generated, they are executed in parallel using the parallel-execution() operation of the executor object. This operation returns control once all the tasks it executes are completed.

Re-deployment phase. In the re-deployment phase (lines 6–10), the components to be migrated are deployed in the desired new locations, so that the required resources are provisioned in the new target cloud providers. The re-deployment is performed with the command start(c, cls.get(c)) (line 30), where start is an operation in the trans-cloud API that takes care of the specificities of the providers and services used for provisioning the new resources. Notice that, for their later release, the current locations of the resources used by the on-the-move components are stored in the ls variable (lines 5–8). This re-deployment is performed before stopping the original component instances, so at that point, the affected components' resources are still running in the old locations.


That is, these on-the-move components and the associated cloud resources are 'living' in two different providers simultaneously. Notice that the components related to the migrated ones are still connected to the old resources; that is, the initial re-deployment of the on-the-move components is performed using the old relations. The re-connection of the components will be performed in the re-starting phase. Since the orchestrator uses multiple threads and components are operated on in parallel to minimize downtimes, these relations must be updated simultaneously, after the new services have been provisioned. Of course, these duplicated elements cost money. However, they will be ready when the affected components are prepared to be re-started. Notice that the time during which these services are duplicated is very small, and since there is no interaction with the affected components, they do not bear any workload.

Stopping phase. In the stopping phase (lines 12–16), the on-the-move components and all the components that depend on them are stopped. Notice that this is key for ensuring the integrity of the application. For example, if we wanted to migrate a back-end server being used by a set of front-end components, we should ensure that all the components connected to the back-end have been stopped beforehand. The algorithm uses a dependency analyzer, based on the trans-cloud infrastructure, which inspects the topology and the migration blueprint to identify the dependent components. Indeed, the orchestrator finds the lowest components in the topology whose dependencies are still connected to old resources (the lowest-ancestors(a, cls) operation) and stores them in the depCs variable (see line 12). The operation parents(a, c) (line 35) returns the set of components dependent on the component c in the application model a. The algorithm proceeds by executing the stopping tasks in parallel. However, to maintain the topology integrity at all times, before stopping a component, the operation is invoked recursively on its ancestors (the parents(a, c) operation), resulting in a top-down stopping strategy. In other words, all on-the-move components and those depending on them are stopped, but in all cases a node is stopped after its parents. A component is stopped with the operation stop(c) (line 38), provided by the trans-cloud API, which stops the specified component taking into account the specificities of the providers and services used.

Re-starting phase. Once all affected components are stopped, they are re-started in the re-starting phase (lines 18–21). Again, multiple threads are used to perform the re-start operation on every stopped node at the same time. However, to avoid unexpected component states, the algorithm checks that all dependencies of a node are already running before it is re-started, with the is-ready-to-re-start(c) operation (line 42). The following stand-up operations are then applied to it: (1) start, (2) re-establishment of its connections, and, finally, (3) re-start of its ancestors. All these operations are performed by invoking the trans-cloud API (lines 43–44). The recursive invocation on the ancestors results in a bottom-up traversal of the dependency structure, which means that a component will only be re-started after all its input dependencies are satisfied. This procedure ensures the topology's integrity and avoids unexpected behavior during the migration.
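To fix ideas about the dependency analysis used in the stopping phase, a possible (much simplified) computation of the lowest affected components could look as follows. This is only a sketch of the idea, under the assumption that the topology exposes parents(c); it is not the framework's actual analyzer.

def lowest_ancestors(a, cls):
    """Sketch: components whose connections would become stale because they
    depend on a moved component; of those, keep only the lowest ones, since
    the recursion in STOP-TASK reaches every component above them anyway."""
    affected = set()
    for moved in cls:
        affected.update(a.parents(moved))  # components pointing at moved ones

    def ancestors(c, acc):
        # Collect everything strictly above c in the dependency hierarchy.
        for p in a.parents(c):
            if p not in acc:
                acc.add(p)
                ancestors(p, acc)
        return acc

    strictly_above = set()
    for c in affected:
        ancestors(c, strictly_above)
    return affected - strictly_above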

Figure 6: Migration Scenario 1


Releasing phase. Once the re-starting phase is completed, the application has been completely migrated: the application components are running in the indicated locations, and every relation has been updated to establish the appropriate connections with the new resources. However, unused old resources are still running. In its last phase (lines 23–26), the algorithm releases these unnecessary resources using the trans-cloud operation release-resources(resource) (line 52).

3.2. Migration Orchestration


Let us illustrate the algorithm presented in the previous section on a concrete migration scenario. Specifically, let us focus on how new providers are reached, how the actions on the affected components are orchestrated, and how the topology's integrity is maintained with multi-threading while application downtimes are minimized. Again, we do not go here into the discussion of which services and providers are to be used to deploy each of the components. We just assume that someone has chosen these locations for this application.

Consider the SeaClouds application presented in Section 2.2, and let us assume that it has been deployed with the trans-cloud infrastructure introduced in Section 2.1, using the YAML specification in Figure 4. That is, the components Planner, DataAnalizer, DiscovererDB, MetricsDB, and SLADB are running using EC2 services (IaaS) in AWS's Ireland datacenter; the Discoverer, MonitorManager and Deployer components are running using EC2 services in AWS's Oregon datacenter; and the components Dashboard, MonitorDashboard, DataCollector and SLAService are running in SoftLayer's London cluster (IaaS). These locations are shown in Figure 6 (see the icons on the left over each component).

Now, let us suppose that, for some reason, we eventually decide to migrate several of these components. Without going into details on the reasons to do so, let us assume that, as also illustrated in Figure 6 over each moved component, we decide to move the Deployer and Discoverer components from AWS Oregon to its datacenter in Ireland, and that the SLAService and MonitorDashboard components are moved from SoftLayer to Pivotal Web Services (PaaS) and to AWS Oregon, respectively.

Figure 7: Sequence diagram


Notice that the first movements are from IaaS services to IaaS services, between datacenters of the same provider, and that SLAService is moved from IaaS to PaaS services.

We can see in Figure 7 a sequence diagram with the interactions between the migration orchestrator and the components of the SeaClouds application during the migration operation. Notice the brackets on the left side indicating the phases of the process. In this diagram, the migration orchestrator represents the set of tasks, executed using multiple threads, and the messages are the invocations of the start, stop, re-start or release operations performed on the entities that represent the components in the trans-cloud infrastructure. Notice that this diagram does not show the dependencies between tasks; it shows the actual order in which the operations are performed on the components. Likewise, the durations of the executions of the operations are based on actual executions.⁷ We can observe, for example, how the stop and most re-start operations take a very short time. Notice, for example, that the times of the re-start operations of the DataCollector, SLAService and Dashboard components are comparatively much smaller than the time for the Planner one. This is because the Planner is deployed on a Tomcat server on IaaS, which requires a significant amount of time to restart. The start and release operations, however, require a greater time, very much depending on whether PaaS or IaaS services are being used. A more detailed discussion of experimental results can be found in Section 4.

In the start phase, start operations are executed on each of the components to be migrated, with their new locations.

⁷ The experimental results and the necessary specifications to reproduce the executions of the different scenarios presented in this paper are available at https://github.com/scenic-uma/brooklyn-dist/tree/trans-cloud.


We can see in the sequence diagram how the components SLAService, Discoverer, MonitorDashboard, MonitorManager and Deployer simultaneously get start messages. Once all the required resources have been provisioned in their new locations, the stop phase is initiated. The trans-cloud dependency analyzer identifies the components whose dependencies would be inconsistent. These components need to be stopped and re-connected. Specifically, the algorithm identifies the set of lowest parents in the hierarchy, those from which the bottom-up stopping strategy is to be initiated. Notice that although the Discoverer, MonitorDashboard, MonitorManager, and Deployer components are to be migrated, their out dependencies do not change with the migration. Although they will be stopped in the release phase, since they do not require re-connection, they are not stopped in this phase. The situation for the SLAService component is different: since the MonitorManager component is being moved, the SLAService component would be pointing to old resources. In this case, the Dashboard, Planner, and DataCollector components are connected to the on-the-move SLAService and MonitorManager components. The Dashboard, Planner, and DataCollector components must therefore be stopped to re-establish their connections.

Once the stop phase is completed, the stopped components are re-connected and re-started following a bottom-up strategy: a node can only be re-started if all its dependencies are running. First, the SLAService and DataCollector components are re-started, since their dependencies are satisfied. Once the SLAService component is running, it tries to re-start its parents. Since the Planner component is inactive at that time, the Dashboard component cannot be re-started yet. Once the Planner component is up, the Dashboard component can proceed to complete the re-start phase.

In the release phase, unnecessary cloud resources are released. This operation is performed in parallel on all the components, once the application is back to normal operation in its new locations.

4. The Tool in Practice


From a migration point of view, the main performance metrics are the effort and the time required for restoring our application to its operation on the new services. Of course, when dealing with the management of cloud applications, we also have to care about the performance of the target architecture, its resilience, security aspects, etc. However, we assume that the decision on the target architecture was taken with all these other concerns in mind. Needless to say, downtime may have a direct impact on the application's business, both in terms of economic cost and reliability. In Section 4.1, we focus on the analysis of the application's downtime. We discuss the effort required for a migration operation in Section 4.2. Threats to the validity of our conclusions are discussed in Section 4.3.

4.1. Downtime Analysis

In what follows, we present some experimental results and provide some insights on the performance of the proposed trans-cloud-based migration orchestrator.


We evaluate it using the SeaClouds application presented in Section 2.2. Starting from an already deployed and running instance of the application, we use three different migration scenarios, trying to cover different cases: components in different positions in the topology of the application, components moved to locations at the same or a different level of abstraction, and different numbers of affected components.

To be able to evaluate the numbers obtained, we compare the results of our algorithm with the results obtained using an alternative migration algorithm. This algorithm, to which we refer as the Reference Algorithm, performs an application migration operation by first stopping the affected components and then re-starting them in their new locations. To guarantee the integrity of the application during the migration process, the Reference Algorithm also follows a top-down strategy to stop the affected components and a bottom-up one to re-start them. Both algorithms were integrated in the trans-cloud framework and use the provided infrastructure to locate components, calculate the set of dependent components and log execution times. In both cases, we have used Brooklyn's monitoring mechanisms to gather information on each component. Each migration scenario has been successfully executed 100 times with each algorithm. That is, our test bed comprises 600 successful executions, meaning 1600 component movements.

4.1.1. Scenario 1

The first scenario was described in Section 3.2. In it, five components (Discoverer, MonitorDashboard, MonitorManager, SLAService, and Deployer) are migrated from IaaS to different IaaS and PaaS services. Specifically, the components Discoverer, MonitorManager and Deployer go from EC2 (IaaS) services in AWS's Oregon datacenter to EC2 services (also IaaS) in AWS Ireland. The MonitorDashboard component is moved from (IaaS) services in SoftLayer's London datacenter to AWS Ireland (IaaS) services. And finally, SLAService is moved from EC2 (IaaS) AWS Oregon to the Pivotal Web Services platform (PaaS).

The charts in Figure 8 are boxplot diagrams of the times at which the different trans-cloud operations are completed along the executions of the algorithms. For both algorithms, we can see how the times for most operations are rather concentrated around their median values. A few outliers appear, due to delays in the operations on the providers. In both charts, we can observe how the different operations are performed. For example, we can see in Figure 8a how, using the Reference Algorithm, the necessary components are first stopped and then re-started. However, we can see in Figure 8b how, with our algorithm, resources for the components to migrate are first provisioned with a start operation, then the necessary components are stopped, and then they are re-started after the necessary re-connection has happened. The process is concluded with the release of the unnecessary resources.

Although the total migration times are quite similar in both cases (about 530 and 590 secs., respectively), we can observe a significant reduction in the application's downtime.

(a) Reference algorithm

(b) Our algorithm

Figure 8: Boxplots for migration times of Scenario 1 (seconds)


We have marked the period in which the application is down with a shaded area, delimited on its left by the time of the stop of the Dashboard component—the component at the top of the hierarchy—and on its right by the time of its start or re-start. Although both algorithms operate on the same nodes, the downtime interval with the Reference Algorithm is more than five times longer than for our algorithm (587.7 vs. 105.6 secs.). Notice that whilst the downtime period covers the entire execution of the Reference Algorithm—the Dashboard component is stopped at the beginning of the migration and re-started at the end—in our algorithm the provisioning of new resources is performed in the background before the stopping phase is started, and the releasing of resources happens after the application is back alive.

In the trans-cloud platform, stop operations are available both with and without releasing resources. Whilst the Reference Algorithm uses one or the other depending on the case, our algorithm always uses the non-releasing stop operation so that the release can be performed later. Although both algorithms follow a top-down strategy to stop the necessary components, not all the components need to be stopped with our orchestrator. Both begin by stopping the Dashboard and DataCollector components, and proceed downward in the hierarchy. However, whilst the Reference Algorithm stops every component involved in the migration (Deployer, Discoverer, MonitorDashboard, SLAService and MonitorManager), our algorithm only needs to stop those components which require re-connection (see Section 3.2). Notice also that, in our algorithm, stopped components are activated using re-start operations, since resources have already been conveniently allocated in the target services. In this way, the downtime is reduced to a minimum.
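As a side note on how such downtime figures can be obtained, the following sketch shows the computation we have in mind over per-component event times gathered during the runs; the (component, operation, timestamp) log format and the numbers below are hypothetical illustrations, not the actual Brooklyn monitoring data.

def downtime(events, top="Dashboard"):
    # Downtime = interval between the stop of the topology's top component
    # and its subsequent (re-)start, as marked by the shaded areas in Figure 8.
    stops = [t for c, op, t in events if c == top and op == "stop"]
    ups = [t for c, op, t in events if c == top and op in ("start", "re-start")]
    if not stops:
        return 0.0  # the top component was never stopped (as in Scenario 3)
    first_stop = min(stops)
    return min(t for t in ups if t > first_stop) - first_stop

# Illustrative event log (timestamps in seconds since the migration began):
events = [("Dashboard", "stop", 102.3), ("Planner", "stop", 103.0),
          ("Planner", "re-start", 180.5), ("Dashboard", "re-start", 207.9)]
print(downtime(events))  # 105.6 with these made-up numbers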


Finally, notice that, with the Reference Algorithm, although the stop time is not negligible, most of the time is spent in the start/re-start operations, mainly allocating resources. It is worth mentioning the amount of time spent in stopping the Deployer component, which was running on IaaS.

4.1.2. Scenario 2

In Scenario 2, the application is initially deployed using a different initial deployment plan. In this alternative plan, the same locations of Scenario 1 are used for all the components except for the Planner component, which is located in Bluemix (PaaS). With the application running on these specified locations, one component is moved from IaaS to IaaS and another one from PaaS to PaaS. Specifically, the DataCollector component is moved from (IaaS) services in SoftLayer's London datacenter to AWS Ireland's services (IaaS), and the Planner component is moved from IBM Bluemix's services (PaaS) to the Pivotal Web Services platform (PaaS). Figure 9a shows a sequence diagram with the interactions in Scenario 2. The charts in Figure 10 show the times for the migration operations performed using both algorithms in Scenario 2.

(a) Sequence Diagram of Scenario 2

(b) Sequence Diagram of Scenario 3

Figure 9: Sequence Diagrams


We can observe in Figure 10a that the Reference Algorithm begins by trying to stop the Planner and DataCollector components. To stop the Planner component, however, the Dashboard component needs to be stopped first. Although the Dashboard and the Planner components are stopped in a very short time, the DataCollector component requires a longer time. The components are then started or re-started upwards in the hierarchy. The starts of the Planner and DataCollector components take a significant amount of time. Although on PaaS, the Planner component uses a Tomcat server that needs to be started. The DataCollector component is deployed on IaaS and requires the provisioning of the corresponding resources.

(a) Reference algorithm

(b) Our algorithm

Figure 10: Boxplots for migration times of Scenario 2 (seconds)


The re-start of the Dashboard component takes a very short time.

Figure 10b shows how the start of the Planner and DataCollector components is carried out before stopping the Dashboard component. Indeed, since the Planner and DataCollector components do not require re-connection, they are not stopped. Only the Dashboard component needs to be stopped, and it is immediately re-started after updating its connections. As in Scenario 1, although the entire migration process takes approximately the same amount of time, the difference in the downtimes is significant. Whilst the downtime of the application using the Reference Algorithm is around 390 secs., the downtime with our algorithm is reduced to a few seconds.

4.1.3. Scenario 3

In this scenario, we begin with the application deployed as in Scenario 2. In this case, only the top component is migrated. Specifically, the Dashboard component is moved from (IaaS) services in SoftLayer's London datacenter to AWS Ireland's services (IaaS). Figure 9b shows a sequence diagram with the interactions in Scenario 3. The charts in Figure 11 show once more how the operations run in the background by our algorithm reduce the impact of the migration.

In Figure 11a, we observe that the execution of the migration process with the Reference Algorithm starts by stopping the Dashboard component. Since the component was running using IaaS services, stopping it involves releasing its resources. Then, the component is re-deployed in a new provider and started. If we now move to the boxplot in Figure 11b, we observe that the application is running at its new location right after the start of the component is completed using the new resources in the target provider. Since this operation is completed in the background, there is no downtime. Once the application is active, the releasing of no-longer-necessary resources is carried out in the background. As in the other scenarios, the migration process takes approximately the same time with both algorithms. However, whilst with the Reference Algorithm the downtime is 401 secs. on average, our algorithm incurs no downtime.


(a) Reference algorithm

(b) Our algorithm

Figure 11: Boxplots for migration times of Scenario 3 (seconds)

seaclouds-discoverer: aws-ec2:eu-west-1,
seaclouds-grafana: aws-ec2:eu-west-1,
seaclouds-monitor-manager: aws-ec2:eu-west-1,
seaclouds-sla-service: pivotal-ws,
seaclouds-deployer: aws-ec2:eu-west-1

Figure 12: Location map for Scenario 1


4.2. Migration Effort

We have seen in previous sections how the migration algorithm, provided as one of the management operations of the proposed framework, automates the migration process, and we have evaluated its effect on the downtime of applications. In this section, we focus on the effort required from the user to carry out such operations. Whereas the migration of on-premise applications to the cloud has been systematically studied by several researchers (see, e.g., [17] for a systematic review), migration between cloud providers has not received the same attention so far. Given the difficulties and variability, the comparative analyses published on multi-deployment, understood as deploying using different providers, typically focus on feasibility and experience reports (see, e.g., [26] or [18]), and when measuring effort, the focus is usually on operational cost (see, e.g., [27, 28, 29]) or person-hours (see, e.g., [27, 29, 30, 31, 32, 33]).

With our trans-cloud infrastructure, once an application is deployed and running, the effort for a migration operation is almost zero. To perform the component-wise migration of an application, we must just provide the target locations of the components to be migrated. Specifically, the migration may operate on as many stateless components of an application as desired. Assuming that the SeaClouds application has been deployed as indicated in Section 2.2, using the extended Brooklyn, we explain here how the tool can be used to execute Scenario 1, presented in Section 3.2 and evaluated in the previous section. As pointed out there, the algorithm just requires a map associating the components to be migrated with the target locations. These locations are selected from a catalog of available services. Specifically, the information provided to the migration process for Scenario 1 is the one shown in the listing in Figure 12.
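In terms of user input, a complete migration request for Scenario 1 therefore boils down to the map below. The migrate driver call is hypothetical (it stands in for the orchestrator invocation described in Section 3.1), but the component-to-location map is exactly the one in Figure 12.

# The whole migration blueprint for Scenario 1: five components, each mapped
# to its target location, mixing IaaS (aws-ec2:*) and PaaS (pivotal-ws).
blueprint = {
    "seaclouds-discoverer":      "aws-ec2:eu-west-1",
    "seaclouds-grafana":         "aws-ec2:eu-west-1",
    "seaclouds-monitor-manager": "aws-ec2:eu-west-1",
    "seaclouds-sla-service":     "pivotal-ws",
    "seaclouds-deployer":        "aws-ec2:eu-west-1",
}
migrate(app, blueprint)  # 'app': handle to the running application's topology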



4.3. Threats to Validity

We do not provide specific results on the time required for a migration. As the experiments show, the migration time depends on the sources and targets of the specific services used, and on the number of components affected by a specific migration request. To be able to draw conclusions on the required amount of time, an exhaustive analysis would have to be carried out, with many different applications and many migration scenarios, perhaps randomly generated. However, notice that our goal was to show the feasibility of the approach and the significant reduction in migration downtimes and effort, which we believe is sufficiently demonstrated.

The previous discussion shows the results of 100 experiments for each scenario and for each of the algorithms under comparison. These were the successful experiments, though. Some other experiments failed because some of the requests for the provision of services failed or a timeout error was internally produced. In these cases our framework may re-attempt provisioning as many times as necessary without altering the application's availability, and the stop phase may only start when the re-deployment phase has been successfully completed, therefore guaranteeing the resilience of the framework. Notice, however, that the reference algorithm cannot take advantage of this, because it directly starts by stopping components, taking the application down, and therefore impacting its availability. Since we are not carrying out a study on the reliability of the providers, and these experiments would significantly worsen the results for the reference algorithm, we decided not to consider these failures in our analysis.

In our scenarios, we have tried to show different situations in which, independently of the number of components being migrated, a different number of components in the hierarchy are affected by the operation. Basically, these situations reflect the depth in the hierarchy at which the components to migrate are, and how intricate the dependencies in our applications are. From the experience with the SeaClouds application, and with other applications we have experimented with (see, e.g., the reports in [24, 25]), the migration time is mostly spent executing the stop and start/restart operations of the affected components. In our algorithm, only the necessary components are stopped (see Scenarios 2 and 3), and the provisioning and release operations are performed in the background, so that downtimes are minimized.
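The resilience behaviour described above, re-attempting provisioning in the background and entering the stop phase only once re-deployment has succeeded, can be summarized by the following sketch. The sketch and its helper names are ours and purely hypothetical; they do not reproduce the framework's actual code.

import time

class ProvisioningError(Exception):
    """Stand-in for a failed or timed-out provider provisioning request."""

def deploy(component, target):
    # Hypothetical provider call; here it simply pretends to succeed.
    return f"{component}@{target}"

def provision_with_retries(component, target, attempts=5, backoff=10):
    """Re-attempt provisioning in the background; the application stays up."""
    for attempt in range(1, attempts + 1):
        try:
            return deploy(component, target)
        except ProvisioningError:
            time.sleep(backoff * attempt)   # back off, then try again
    raise RuntimeError(f"could not provision {component} on {target}")

# The stop phase is entered only after this call returns successfully, so a
# provisioning failure never takes the application down.
new_instance = provision_with_retries("Dashboard", "aws-ec2:eu-west-1")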


5. Related Work


We can identify in the literature different ways of understanding the term migration, depending on what kind of cloud services are involved and how they are managed (see, e.g., [17, 18, 15]): migration of legacy applications, VM migration, container migration, and the migration of application components. Our work clearly falls in this last group. However, since all the others are related to portability and interoperability issues, we summarize some related works dealing with other types of migration.

Most of the available works in the literature on migration deal with the migration of legacy applications to the cloud [15], where entire applications or some parts of them are moved to a cloud environment in order to take advantage of cloud features such as elasticity and scalability, payment strategies, or on-demand provisioning of services [34]. Regarding portability issues, proposals like [16, 35] focus on adaptation mechanisms.

To deal with cloud heterogeneity, several authors have proposed unified APIs that integrate different cloud capabilities, providing common mechanisms to interact with providers when applications are deployed. This is the case of initiatives such as COAPS [36] or Nucleus [37], as well as open libraries like jClouds. These solutions do not provide support for the management of the application life-cycle, which makes it difficult to devise generic mechanisms for the runtime migration of components. Moreover, they focus on specific abstraction levels: while jClouds unifies IaaS services, Nucleus and COAPS focus on PaaS. Taking advantage of existing mechanisms based on these APIs and libraries, other authors propose solutions to manage application life-cycles by using portability-oriented descriptions for application deployments. For example, Brooklyn benefits from jClouds to control IaaS services, and Nucleus [38] defines an application description based on the homonymous API to integrate several PaaS platforms for multi-cloud deployment. There are also several commercial tools, like Flexiant (https://www.flexiant.com/), that support cloud orchestration over multiple IaaS providers, offering capabilities to scale, deploy and configure servers. The trans-cloud approach [7] builds on these APIs.

As a way to abstract from the particularities of the providers and automate the deployment process, some approaches, like ours, use detailed specifications of the topology of applications [15, 39, 40]. Using abstract specifications, portability issues can be solved by efficient and robust deployment processes. For instance, SCALES [41] uses a domain-specific language (DSL) to build fully-detailed application descriptions, in which information on components, technological requirements, resource constraints, and other elements is specified. Although it provides support for IaaS and PaaS services, all modules must be deployed using the same provider; that is, it has no support for cross-cloud deployment.

Systems like Cloud4SOA [42], SeaClouds [43], and CMotion [44] use exhaustive topology descriptions to provide multi-cloud deployments [12]. These systems use application topologies to propose solutions to semantic interoperability issues on IaaS and PaaS services, and facilitate the deployment, life-cycle management and monitoring of applications. CMotion [44] proposes a holistic migration process to move application components to different providers. Using a topology description and a set of adaptors, it can manage different kinds of services, both IaaS and PaaS, but it does not support runtime migration. Instead of using ad-hoc notations for the description of topologies, SeaClouds uses TOSCA to describe the topology of applications. CoMe4ACloud [45], a multi-cloud management system for resources at different abstraction levels, uses an enriched TOSCA to describe XaaS models [39]. Alien4Cloud is a platform that provides support for the management of applications based on their TOSCA topologies.


It allows agnostic topologies by taking advantage of the placement policies defined by the TOSCA standard specification, and it also supports multi-cloud management. A general discussion of the advantages of using TOSCA for multi-cloud orchestration can be found in [46].

Although TOSCA is the most popular standard for the description of the topology of cloud applications, there are other standardized application profiles to deal with heterogeneity and portability issues. For example, OCCI (Open Cloud Computing Interface, http://occi-wg.org/) is a standard developed by the OGF (Open Grid Forum) which, like TOSCA, provides a metamodel for the exhaustive modeling of applications and cloud resources, and a common interface to manage cloud providers, similar to CAMP. Some interesting proposals, like MoDMaCAO [47] and FCLOUDS [48], use OCCI to achieve multi-cloud management at the IaaS level. OCCIware [49] is a framework that proposes an extension of the OCCI standard to define and deploy applications using PaaS and SaaS services.

None of the above-mentioned works provide support for the migration of applications at runtime. In fact, only a few works have attempted the runtime reconfiguration of applications. Erbel et al. present in [50] an OCCI-based model and an engine to apply modifications to application topologies at runtime. Starting from a running application, whose topology is modeled using an extended-OCCI model, they propose to modify it according to a new target status specification (also modeled using OCCI). Although they support neither multi-cloud nor component-wise migration of applications, nor agnostic-level mechanisms, their solution shares some similarities with our proposal. SeaClouds provides some basic support for moving components between providers, although it is very limited and only works between services at the same level of abstraction. Several algorithms for the robust migration between clouds have been proposed (see, e.g., [19, 20, 21]). Although they provide solutions for component-wise migration, they are limited to one specific level and are not focused on the performance optimization of the migration process.

The community has found in container-based approaches a solution to deal with portability issues in the cloud. Container-based solutions such as Docker allow describing applications and their functional dependencies using a layered strategy over a container, which is then portable to any system supporting the container technology. This ensures that both the application and its dependencies will be installed and properly configured, which, in terms of the portability of cloud-native applications, makes containerization a real and effective breakthrough. Moreover, elastic container platforms like Apache Mesos, Docker Swarm, or Kubernetes have emerged to solve the problems related to the orchestration and clustering of applications, supporting much-demanded capabilities like multi-cloud management.

The container solution is not perfect, though.


As some authors have already pointed out [51, 52, 53], this kind of solution presents a number of limitations. Since software components are packaged as containers, the extraction of runtime models of the topology of applications becomes more difficult, thereby hindering tasks such as the management of multi-component applications or multi-cloud runtime migration. Some effort is being devoted to these challenges. For example, TOSCA has recently incorporated changes in its profiles for the integration of container-based technologies. Along the same lines, Brogi et al. envision in [51] a combination of TOSCA and Docker as a method to enhance the capability to extract the complete description of complex applications, so that orchestrators can better manage multi-cloud environments. In [53], Quint et al. propose the use of agnostic application topologies in the container scope, separating topology descriptions from the specification of the container-based platforms where they will be deployed. To this end, they define a complete DSL for application descriptions and propose an orchestrator to deploy applications on multi-cloud environments independently of the elastic container platform used. Using this container-based framework, they provide support for the migration of runtime environments, that is, application migration between different providers in the container scope. However, they only consider whole-application migration, and not the migration of individual components. Proposals like [54, 55] also present some novel research on live migration in the container context. Taking application profiling into account, they propose new methods to achieve transferable stateful containers.

Works on container migration are quite close to VM migration [56]. Indeed, as pointed out in, e.g., [52] or [51], containers can be seen as a replacement for VMs in the cloud context. Like containerized solutions, VMs are managed as black boxes, and they do not facilitate the extraction of the contained application components and their relations/dependencies with other application parts. Moreover, VM migration depends on IaaS capabilities, which makes the agnostic migration of application components a challenge. Some proposals have tried to mitigate these issues by providing algorithms [57, 58, 59] and new metrics [60] for the evaluation, optimization and scheduling of VM migration.

We conclude our discussion of related work with Terraform (https://www.terraform.io/), a commercial tool that provides a flexible abstraction of cloud resources, providers and applications, allowing the representation of physical hardware, virtual machines, containers, and IaaS and PaaS cloud resources by means of an enriched application description with its own DSL. Based on that description, Terraform can infer declarative plans that operate on the used resources to modify the application topology at runtime. Terraform can duplicate IaaS resources in new locations. However, it provides neither agnostic-level migration nor a standardized approach.


6. Conclusions


In this work we have proposed a migration orchestrator. The orchestrator has been developed in the context of our trans-cloud framework, which allows it to manage components deployed on different providers and at different service levels, either IaaS or PaaS. The orchestrator allows the concurrent migration of several components of an application, and it only needs the mapping of the components to be migrated to their new locations, independently of the service level used both in the source and in the target providers. Thus, the proposed component-wise migration orchestrator is vendor-, technology- and service-level-agnostic. The migration process is fully automated, and is available in an extended version of Apache Brooklyn.

The orchestrator relies on a TOSCA specification of the application being managed. This specification is, however, the same one used for the initial deployment of the application. The only external intervention required to carry out the migration is a migration request to initiate the process, together with the specification of the target locations of the components to be migrated.

As the analysis of the migration algorithm shows, it provides a migration mechanism that fulfils our goals:

• The effort required from the user to perform a migration operation is almost zero: it amounts to selecting the components to move and their target locations from the catalog of available services. Notice that, of course, there is an initial effort for the specification of the TOSCA description of the application. This effort was already necessary for the autonomous deployment of applications [23, 7].

• Section 4.1 presents evidence that application downtimes have been significantly reduced. Although we cannot claim that the downtime has been reduced to its minimum, we have argued that the algorithm was designed so that all operations that can be carried out in the background, either before or after the downtime, are. We leave for future work possible further reductions of the downtimes, for instance by estimating the stopping and starting times so that phases can be overlapped.

• The presented migration algorithm could be improved by reducing the migration costs in different ways. A first idea could be, for instance, to analyze the dependencies and durations of the start phases so that, instead of initiating all tasks simultaneously, they are postponed as much as possible, so that they all terminate as late as possible but before the one taking the longest. Our priority, however, was to minimize downtimes, and we always assumed that the extra cost of provisioning the migrated resources before stopping those being moved was negligible. An in-detail analysis of this extra cost, and the corresponding improvement of the algorithm, are left for future work.



• The way dependencies are handled ensures the integrity of the application during the migration process [19]. A formal proof of this claim is left for future work.

Our approach has, however, three main limitations, which we plan to address in future work:


• The topology of the application does not change. Migration operations are restricted to the re-allocation of application components to different services. It would be interesting to see how changes in the topology could be handled similarly.

• Only stateless components can currently be migrated. The initial loading of data and the transfer of data to and from the cloud are a hot research topic. Our algorithm assumes that no state needs to be transferred, which allows us to focus on the stop, re-connection and start/restart of components. Although current distributed databases largely solve this problem, we will study the migration of stateful components. The migration of nodes of distributed databases is also an interesting research topic.

• Failures may happen in any of the phases of a migration process. We plan to improve our orchestrator by monitoring the entire process and adding functionality for recovering from failures that may occur.

Acknowledgements


This work has been partially supported by projects PGC2018-094905-B-I00 (Spanish MINECO/FEDER) and UMA18-FEDERJA-180 (J. Andalucía/FEDER), and by U. Málaga, Andalucía Tech.

References


[1] L. Youseff, M. Butrico, D. D. Silva, Toward a unified ontology of cloud computing, in: Grid Computing Environments Workshop (GCE), 2008, pp. 1–10.

[2] M. A. Chauhan, M. A. Babar, B. Benatallah, Architecting cloud-enabled systems: a systematic survey of challenges and solutions, Software: Practice and Experience 47 (4) (2017) 599–644.

[3] F. Paraiso, N. Haderer, P. Merle, R. Rouvoy, L. Seinturier, A federated multi-cloud PaaS infrastructure, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2012, pp. 392–399.

[4] N. Grozev, R. Buyya, Inter-cloud architectures and application brokering: taxonomy and survey, Software: Practice and Experience 44 (3) (2014) 369–390.

[5] K. Kritikos, D. Plexousakis, Multi-cloud application design through cloud service composition, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2015, pp. 686–693.

[6] J. Carrasco, J. Cubo, F. Durán, E. Pimentel, Bidimensional cross-cloud management with TOSCA and Brooklyn, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2016, pp. 951–955.

[7] J. Carrasco, F. Durán, E. Pimentel, Trans-cloud: CAMP/TOSCA-based bidimensional cross-cloud, Computer Standards & Interfaces 58 (2018) 167–179.

[8] D. Androcec, N. Vrcek, P. Kungas, Service-level interoperability issues of platform as a service, in: IEEE World Congress on Services (SERVICES), 2015, pp. 349–356.

[9] A. Moustafa, M. Zhang, Q. Bai, Trustworthy stigmergic service composition and adaptation in decentralized environments, IEEE Transactions on Services Computing 9 (2) (2016) 317–329.

[10] A. Brogi, A. Ibrahim, J. Soldani, J. Carrasco, J. Cubo, E. Pimentel, F. D'Andria, SeaClouds: a European project on seamless management of multi-cloud applications, ACM SIGSOFT SEN 39 (1) (2014) 1–4.

[11] T. Mahmood, B. Balasubramanian, M. Thottethodi, S. G. Rao, K. Joshi, ACCORD: automated change coordination across independently administered cloud services, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2018, pp. 770–777.

[12] K. Saatkamp, U. Breitenbücher, O. Kopp, F. Leymann, Topology splitting and matching for multi-cloud deployments, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2017, pp. 247–258.

[13] K. Alexander, C. Lee, E. Kim, S. Helal, Enabling end-to-end orchestration of multi-cloud applications, IEEE Access 5 (2017) 18862–18875.

[14] A. Brogi, J. Carrasco, J. Cubo, E. D. Nitto, F. Durán, M. Fazzolari, A. Ibrahim, E. Pimentel, J. Soldani, P. Wang, F. D'Andria, Adaptive management of applications across multiple clouds: The SeaClouds approach, CLEI Electronic Journal 18 (1) (2015) 2–2.

[15] M. F. Gholami, F. Daneshgar, G. Low, G. Beydoun, Cloud migration process—a survey, evaluation framework, and open challenges, Journal of Systems and Software 120 (2016) 31–69.

[16] M. Borges, E. Barros, P. H. Maia, Cloud restriction solver: A refactoring-based approach to migrate applications to the cloud, Information and Software Technology 95 (2018) 346–365.

[17] P. Jamshidi, A. Ahmad, C. Pahl, Cloud migration research: a systematic review, IEEE Transactions on Cloud Computing 1 (2) (2013) 142–157.

[18] J.-F. Zhao, J.-T. Zhou, Strategies and methods for cloud migration, International Journal of Automation and Computing 11 (2) (2014) 143–152.

[19] F. Durán, G. Salaün, Robust reconfiguration of cloud applications, in: Intl. ACM Sigsoft Symp. on Component-based Software Engineering (CBSE'14), 2014, pp. 179–184.

[20] F. Durán, G. Salaün, Robust and reliable reconfiguration of cloud applications, Journal of Systems and Software 122 (2016) 524–537.

[21] F. Boyer, O. Gruber, D. Pous, Robust reconfigurations of component assemblies, in: Intl. Conf. on Software Engineering (ICSE), 2013, pp. 13–22.

[22] OASIS, CAMP: Cloud Application Management for Platforms (Version 1.1), http://docs.oasis-open.org/camp/camp-spec/v1.1/camp-spec-v1.1.html/ (2012).

[23] OASIS, TOSCA: Topology and Orchestration Specification for Cloud Applications (Version 1.0), http://docs.oasis-open.org/tosca/TOSCA/v1.0/ (2012).

[24] J. Carrasco, F. Durán, E. Pimentel, Component-wise application migration in bidimensional cross-cloud environments, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2017, pp. 259–269.

[25] J. Carrasco, F. Durán, E. Pimentel, Runtime migration of applications in a trans-cloud environment, in: Adaptive Services-Oriented and Cloud Applications (ASOCA), Vol. 10797 of Lecture Notes in Computer Science, 2018, pp. 55–66.

[26] M. A. Chauhan, M. A. Babar, Migrating service-oriented system to cloud computing: An experience report, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2011, pp. 404–411.

[27] A. Khajeh-Hosseini, I. Sommerville, J. Bogaerts, P. B. Teregowda, Decision support tools for cloud migration in the enterprise, in: Intl. Conf. on Cloud Computing (CLOUD), 2011, pp. 541–548.

[28] B. C. Tak, B. Urgaonkar, A. Sivasubramaniam, To move or not to move: The economics of cloud computing, in: Conf. on Hot Topics in Cloud Computing (HotCloud), 2011, pp. 1–6.

[29] M. Menzel, R. Ranjan, L. Wang, S. U. Khan, J. Chen, CloudGenius: A hybrid decision support method for automating the migration of web application clusters to public clouds, IEEE Transactions on Computers 64 (5) (2015) 1336–1348.

[30] W. Qiu, Z. Zheng, X. Wang, X. Yang, M. R. Lyu, Reliability-based design optimization for cloud migration, IEEE Transactions on Services Computing 7 (2) (2014) 223–236.

[31] K. Sun, Y. Li, Effort estimation in cloud migration process, in: Intl. Symp. on Service-Oriented System Engineering (SOSE), 2013, pp. 84–91.

[32] J. Miranda, J. Guillén, J. M. Murillo, C. Canal, Assisting cloud service migration using software adaptation techniques, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2013, pp. 84–91.

[33] L. M. Pham, A. Tchana, D. Donsez, N. De Palma, V. Zurczak, P.-Y. Gibello, Roboconf: a hybrid cloud orchestrator to deploy complex applications, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2015, pp. 365–372.

[34] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia, A view of cloud computing, Communications of the ACM 53 (4) (2010) 50–58.

[35] J. Spillner, Y. Bogado, W. Benítez, F. López-Pires, Co-transformation to cloud-native applications - development experiences and experimental evaluation, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2018, pp. 596–607.

[36] E. Hossny, S. Khattab, F. Omara, H. Hassan, A case study for deploying applications on heterogeneous PaaS platforms, in: IEEE Intl. Conf. on Cloud Computing and Big Data (CloudCom-Asia), 2013, pp. 246–253.

[37] S. Kolb, G. Wirtz, Towards application portability in platform as a service, in: Intl. Symp. on Service Oriented System Engineering (SOSE), 2014, pp. 218–229.

[38] S. Kolb, C. Röck, Unified cloud application management, in: IEEE World Congress on Services Computing (SERVICES), 2016, pp. 1–8.

[39] Z. Cai, X. Li, J. N. Gupta, Heuristics for provisioning services to workflows in XaaS clouds, IEEE Transactions on Services Computing 9 (2) (2016) 250–263.

[40] M. P. Papazoglou, P. Traverso, S. Dustdar, F. Leymann, Service-oriented computing: State of the art and research challenges, IEEE Computer 40 (11) (2007) 38–45.

[41] A. Ranabahu, E. M. Maximilien, A. Sheth, K. Thirunarayan, Application portability in Cloud Computing: An abstraction-driven perspective, IEEE Transactions on Services Computing 8 (6) (2015) 945–957.

[42] E. Kamateri, N. Loutas, D. Zeginis, J. Ahtes, F. D'Andria, S. Bocconi, P. Gouvas, G. Ledakis, F. Ravagli, O. Lobunets, K. Tarabanis, Cloud4SOA: A semantic-interoperability PaaS solution for multi-cloud platform management and portability, in: European Conf. on Service-Oriented and Cloud Computing (ESOCC), 2013, pp. 64–78.

[43] A. Brogi, J. Carrasco, J. Cubo, F. D'Andria, E. Di Nitto, M. Guerriero, D. Pérez, E. Pimentel, J. Soldani, SeaClouds: An open reference architecture for multi-cloud governance, in: European Conf. on Software Architecture (ECSA), 2016, pp. 334–338.

[44] T. Binz, F. Leymann, D. Schumm, CMotion: A framework for migration of applications into and between clouds, in: IEEE Intl. Conf. on Service-Oriented Computing and Applications (SOCA), 2011, pp. 1–4.

[45] H. Brunelière, Z. Alshara, F. Alvares, J. Lejeune, T. Ledoux, A model-based architecture for autonomic and heterogeneous cloud systems, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2018, pp. 201–212.

[46] P. Lipton, D. Palma, M. Rutkowski, D. A. Tamburri, TOSCA solves big problems in the cloud and beyond!, IEEE Cloud Computing (2018) 1–1.

[47] F. Korte, S. Challita, F. Zalila, P. Merle, J. Grabowski, Model-driven configuration management of cloud applications with OCCI, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2018.

[48] S. Challita, F. Zalila, P. Merle, Specifying semantic interoperability between heterogeneous cloud resources with the FCLOUDS formal language, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2018.

[49] F. Zalila, S. Challita, P. Merle, Model-driven cloud resource management with OCCIware, Future Generation Computer Systems 99 (2019) 260–277.

[50] J. Erbel, F. Korte, J. Grabowski, Comparison and runtime adaptation of cloud application topologies based on OCCI, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2018, pp. 517–525.

[51] A. Brogi, L. Rinaldi, J. Soldani, TosKer: A synergy between TOSCA and Docker for orchestrating multicomponent applications, Software: Practice and Experience (2018) 2061–2079.

[52] C. Pahl, Containerization and the PaaS cloud, IEEE Cloud Computing 2 (3) (2015) 24–31.

[53] P. Quint, N. Kratzke, Towards a lightweight multi-cloud DSL for elastic and transferable cloud-native applications, in: Intl. Conf. on Cloud Computing and Services Science (CLOSER), 2018, pp. 400–408.

[54] D. Elliott, C. Otero, M. Ridley, X. Merino, A cloud-agnostic container orchestrator for improving interoperability, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2018, pp. 958–961.

[55] X. M. Aguilera, C. Otero, M. Ridley, D. Elliott, Managed containers: A framework for resilient containerized mission critical systems, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2018, pp. 946–949.

[56] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield, Live migration of virtual machines, in: Conf. on Networked Systems Design & Implementation (NSDI), 2005, pp. 273–286.

[57] S. Kashyap, J. S. Dhillon, S. Purini, RLC: A reliable approach to fast and efficient live migration of virtual machines in the clouds, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2014, pp. 360–367.

[58] W. Zhang, K. T. Lam, C. L. Wang, Adaptive live VM migration over a WAN: Modeling and implementation, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2014, pp. 368–375.

[59] H. Lu, C. Xu, C. Cheng, R. Kompella, D. Xu, vHaul: Towards optimal scheduling of live multi-VM migration for multi-tier applications, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2015, pp. 453–460.

[60] U. Deshpande, Y. You, D. Chan, N. Bila, K. Gopalan, Fast server deprovisioning through scatter-gather live migration of virtual machines, in: IEEE Intl. Conf. on Cloud Computing (CLOUD), 2014, pp. 376–383.


Conflict of Interest


We are not aware of any conflicts of interest with the editorial board of the journal.
