A Bio-Inspired Algorithm for Virtual Machines Allocation in Public Clouds

Procedia Computer Science 151 (2019) 1072–1077

The International Workshop on Peer-to-Peer Architectures, Networks, and Systems (P2PANS), April 29 – May 2, 2019, Leuven, Belgium

Sarah Alhassan a,b,*, Majed Abdulghani c

a Computer Science Department, Al Imam Muhammad Ibn Saud Islamic University, Riyadh 11432, Saudi Arabia
b Computer Science Department, King Saud University, Riyadh 11362, Saudi Arabia
c Faculty of Science, Engineering and Computing, Kingston University, Surrey KT1 1LQ, United Kingdom


Abstract

The rapid spread of cloud computing during the past few years has increased the need to improve cloud services. The infrastructure-as-a-service (IaaS) model is among the most challenging cloud computing models and has been the subject of many optimization proposals. In particular, the scheduling algorithms followed by IaaS resource management systems to allocate virtual machine requests to physical cloud servers play an important role in the performance of the entire cloud. This paper proposes a distributed scheduling algorithm for IaaS cloud computing services that imitates the behavior of locusts in nature. The simulation results show that the proposed Locust algorithm reduces the average turnaround time of requests without compromising server utilization.

© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the Conference Program Chairs.

Keywords: Cloud Computing; Infrastructure as a Service; Locust in Scheduling; Resource Allocation; Bio-Inspired

1. Introduction

As the Internet has evolved, cheaper and more powerful storage and computing resources have become available through cloud computing [1]. When these services are offered to the public, they are called public clouds [2]. Some examples of public clouds are Google App Engine (GAE), Microsoft Azure, and Amazon Web Services (AWS).

* Corresponding author. Tel.: +96611 258-1616; fax: +96611 259-1616.
E-mail address: [email protected]

1877-0509 © 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the Conference Program Chairs.
10.1016/j.procs.2019.04.152


One cloud service model is infrastructure as a service (IaaS); it provides users with infrastructure (storage, processing, and a network) abstracted as a group of services [3]. The greatest challenge for this model is how to allocate user requests to available and appropriate resources in an efficient manner; this is known as the resource allocation (RA) process. Cloud physical resources are shared among user requests through virtualization [4]. RA is concerned with mapping user requests for cloud resources (CPU, memory, network, storage) to a set of virtualized servers with capacities for each cloud resource. RA is considered an NP-complete problem [5]; therefore, heuristics are needed to achieve good performance.

Many previous studies have explored various RA strategies. In [6], the authors proposed an algorithm that showed improvement in time and CPU utilization. It computed the difference between the gain function and the loss function for each task, prioritized the tasks, and assigned high priority to tasks with the potential for early completion. Moreover, the authors of [7] proposed a method of priority task scheduling for monitoring virtual machines. The problem in these studies is that tasks must be prioritized before allocation. A study by [8] surveyed multiple strategies, such as that of Siva Theja Maguluri et al. [9], who used join-the-shortest-queue (JSQ) routing and two-choice routing algorithms with the MaxWeight scheduling policy; the study showed that this policy was throughput optimal. The authors of [10] investigated an RA strategy that operated under limited electric power capacity by maximizing the number of requests processed while decreasing power consumption. The survey in [11] presented multiple RA methods, including the efficient RA framework of Zhiyuan Xu et al. [12], based on deep reinforcement learning (DRL), which minimizes total power consumption. However, these works did not consider our metrics: CPU utilization and turnaround time.

Our contribution in this paper is an effective RA method based on a bio-inspired algorithm, Locust. Locusts are insects that form swarms of millions of individuals in order to migrate. Usually, locusts live in isolation; these are solitary locusts. In this solitary phase, locusts tend not to group and act independently of each other. There is also a gregarious phase in which locusts act as a group and cooperate with each other [13]. Some stimuli act as a trigger to shift from the solitary to the gregarious phase and vice versa; this shift is called a phase change. The sensory stimuli can be divided into the cerebral pathway, with combined visual and olfactory stimuli, and the thoracic pathway, using tactile information [14]. We imitate this locust behavior in RA for clouds, and the results indicate a significant improvement in turnaround time without reducing CPU utilization.

The rest of this paper is organized as follows: section 2 introduces the proposed algorithm; the evaluation methodology is presented in section 3; experimental results are presented and discussed in section 4; section 5 concludes the paper.

2. Locust algorithm

Our proposed Locust algorithm is implemented over Haizea [15], an open-source virtual machine (VM) scheduler that manages IaaS clouds. Haizea takes a lease (request) and maps it to appropriate nodes (resources). To enhance the scheduler, we modify it to fragment each lease into sub-leases, each of which requests just one node; we call the enhanced scheduler a fragmented scheduler.
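As an illustration of the fragmentation step, the following Python sketch splits a multi-node lease into single-node sub-leases. The Lease class and its fields are simplifying assumptions made for this sketch, not Haizea's actual data structures.

    from dataclasses import dataclass, replace

    @dataclass
    class Lease:
        # Simplified lease: a request for num_nodes nodes for duration minutes.
        lease_id: str
        num_nodes: int
        duration: int

    def fragment_lease(lease):
        # Split one lease into sub-leases that each request exactly one node.
        return [
            replace(lease, lease_id=f"{lease.lease_id}.{i}", num_nodes=1)
            for i in range(lease.num_nodes)
        ]

    # Example: a 4-node request becomes four single-node sub-leases.
    subs = fragment_lease(Lease("L7", num_nodes=4, duration=30))
    assert len(subs) == 4 and all(s.num_nodes == 1 for s in subs)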
In this section, we first describe the behavior of the centralized scheduler of Haizea; we then present the partially distributed behavior of our approach, which is built on the fragmented scheduler.

In Haizea's scheduling phase, leases first arrive in a centralized queue. Each lease is queued until it has a chance to be scheduled. When its turn comes, the scheduling process starts, mapping the lease onto physical nodes (resource reservation). For the mapping step, a host selection policy decides which physical nodes serve the VMs. Here, quality of service is not considered; leases simply need to be served as soon as resources are available. Multiple concerns are involved in the process: minimizing lease turnaround time while maximizing, or at least preserving, CPU utilization.

For host selection in Haizea, the main built-in policy is the greedy policy. It scores hosts based on criteria such as preferring nodes with fewer leases to preempt, preferring nodes with the highest capacity, and preferring nodes whose current capacity does not change for the longest time. After this scoring, the list of nodes is sorted, and suitable nodes for the requested lease are found via a centralized decision.
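The following Python fragment is a rough sketch of this kind of centralized greedy scoring. The Host class, its attributes, and the exact ordering of the criteria are illustrative assumptions; Haizea's real weights and tie-breaking rules are not reproduced here.

    from dataclasses import dataclass, field

    @dataclass
    class Host:
        # Illustrative host state; not Haizea's real resource model.
        name: str
        free_capacity: int           # free capacity units
        num_preemptible_leases: int  # leases that would need preemption
        stable_until: int            # time until the current capacity changes
        buffer: list = field(default_factory=list)  # assigned leases

    def greedy_score(host):
        # Lower tuples sort first: fewer leases to preempt, more free
        # capacity, capacity that stays unchanged for longer.
        return (host.num_preemptible_leases, -host.free_capacity, -host.stable_until)

    def select_host(hosts, lease):
        # Centralized decision: sort every host by its score and take the
        # first one with enough free capacity for the requested lease.
        for host in sorted(hosts, key=greedy_score):
            if host.free_capacity >= lease.requested_capacity:
                return host
        return None  # no host fits; the lease stays queued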


In our approach, after the leases arrive, they are fragmented into sub-leases, each of which requires just one node. The system then takes the leases one by one and moves them into the mapping phase to find the best-fit node. In our approach, some reservation information that is not considered in the original system must be known by the physical nodes. Thus, the decision is no longer centralized at the scheduler; each physical node has the ability to decide, in some circumstances, whether it can serve a lease. Consequently, each node has a buffer of the leases assigned to it.

For host selection, we start with the greedy policy mentioned above, which imitates the solitary phase of locust behavior. For each lease mapping, the lease is appended to the node's buffer. The stimulus that triggers the nodes (locusts) to behave gregariously is the global threshold, i.e., the maximum number of leases that are not yet served. When the system reaches the global threshold, this information is broadcast to all the nodes, asking them to become ready nodes. A node will be ready if and only if it has not reached its local threshold (i.e., the maximum number of leases assigned to that node). The scheduler then selects one of the ready nodes after shuffling them. The system goes back to the solitary phase if the number of lease requests falls below the global threshold, or if the system reaches the global threshold but all nodes have reached their local threshold. The system thus moves back and forth between these phases according to the number of unserved leases.

The host selection decision here shows a kind of cooperation between the nodes and the system; the system does not have full control over decisions, and each node can decide whether to cooperate via its local threshold. After a lease begins execution on a specific node, the lease is removed from the node's buffer and the number of assigned leases is reduced by one. An illustration of locust behavior is shown in Fig. 1.

Fig. 1. Locust behavior: (a) host selection of Locust; (b) add reservation of Locust.
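The phase-change logic described above can be sketched as follows, reusing the hypothetical Host objects and the select_host helper from the earlier sketch. This is an illustration of the described behavior under those assumptions, not the authors' actual implementation.

    import random

    def locust_select(nodes, lease, unserved_count, global_threshold, local_threshold):
        # Map one sub-lease to a node, switching between locust phases.
        if unserved_count >= global_threshold:
            # Gregarious phase: the broadcast reaches every node, and a node
            # is "ready" only while its buffer is below the local threshold.
            ready = [n for n in nodes if len(n.buffer) < local_threshold]
            if ready:
                random.shuffle(ready)        # shuffle, then pick a ready node
                chosen = ready[0]
                chosen.buffer.append(lease)  # record the mapping on the node
                return chosen
            # All nodes reached their local threshold: fall back to solitary.
        # Solitary phase: the centralized greedy policy (earlier sketch).
        chosen = select_host(nodes, lease)
        if chosen is not None:
            chosen.buffer.append(lease)
        return chosen

    def on_lease_start(node, lease):
        # Once a lease begins execution it leaves the node's buffer,
        # reducing that node's count of assigned leases by one.
        node.buffer.remove(lease)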

3. Methodology

The testbed used is an HP ProLiant DL380p Gen8 E5-2620 1P 32GB-R 900GB 460W Server/GO (470065769) with a 64-bit CentOS operating system. It is powered by a 6-core, 2 GHz Intel Xeon E5-2620 processor with a 15 MB cache and 32 GB of memory. It is a 2U two-socket rack server that provides high processing efficiency, with virtualization supported by a Virtual Machine Manager and Intel Virtualization Technology for Directed I/O. The experiments were conducted over this testbed with a Cisco switch and a VLAN with Virtual Manager. The workload was constructed using Nordugrid.org [16]. The NorduGrid Collaboration coordinates the development of the Advanced Resource Connector (ARC), an open-source Grid


middleware. It contains 14 data sets, formatted as a Grid Workload. The job trace contains 781,370 leases over 1024 nodes. For the experiments, 500 consecutive leases were used to generate XML trace files including other data, such as duration (requested time) and number of requested nodes. The Locust scheduler was implemented in Python and tested in simulation mode. The experimental evaluation framework can be described as follows:

1. Determining the parameters that affect the behavior of the algorithm: the number of system nodes; the inter-arrival time, which is the time between two consecutive leases; the global threshold for the whole system, which indicates that the system has a large number of unserved leases; and the local threshold for each node, which is the number of reservations already assigned to the node beyond which the node cannot serve any further reservations at that time.
2. Varying the experimental variable levels, generating 96 experiments (as shown in Table 1 and enumerated in the sketch below); these levels are:
   • Number of system nodes: selected from {10, 20, 30, 40};
   • Inter-arrival time (IT): selected from {2, 4, 8, 16} minutes;
   • Global threshold (G): selected empirically from {15, 20, 25} leases;
   • Local threshold (L): also selected empirically from {5, 10} leases.
3. Identifying benchmarks: the new approach is evaluated by comparing it to Haizea, a well-established VM scheduler for cloud environments [15], which we call the pure benchmark, and to the fragmented scheduler benchmark described above. The pure and fragmented benchmarks map the leases using a greedy heuristic.
4. Identifying the performance measures, which are:
   • Average turnaround time (TT), computed for all leases from the request time until they are served;
   • CPU utilization, which is evaluated periodically for all nodes (higher is better).

Table 1. Experimental variable levels.

  #nodes    IT (min)    Global Threshold    Local Threshold
  10        2           15                  5
  20        4           20                  10
  30        8           25
  40        16
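For illustration, the full factorial design in step 2 can be enumerated as follows (4 node counts × 4 inter-arrival times × 3 global thresholds × 2 local thresholds = 96 experiments); the variable names are chosen for this sketch only.

    from itertools import product

    NODES = (10, 20, 30, 40)       # number of system nodes
    IT_MINUTES = (2, 4, 8, 16)     # inter-arrival time (minutes)
    GLOBAL_T = (15, 20, 25)        # global threshold G (leases)
    LOCAL_T = (5, 10)              # local threshold L (leases)

    experiments = [
        {"nodes": n, "it": it, "G": g, "L": l}
        for n, it, g, l in product(NODES, IT_MINUTES, GLOBAL_T, LOCAL_T)
    ]
    assert len(experiments) == 96  # 4 x 4 x 3 x 2 runs, as in Table 1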

4. Results and discussion

The results for TT and CPU utilization are presented in Fig. 2 and Fig. 3, respectively. All the TT results of our approach are superior to the pure benchmark. Next, we compare our approach with the fragmented benchmark. For 30 and 40 system nodes (Fig. 2 i–p), the TT results of the fragmented benchmark are better than those of our approach, because the system does not accumulate a large number of unserved leases. For 10 system nodes, the system clearly accumulates a large number of unserved leases frequently, since the number of serving nodes is small compared to the number of input leases. For all IT values, the results show lower TT than the fragmented benchmark when L is large; the results for L = 10 (Fig. 2 a–d) are better than those for L = 5. When L is small, the waiting time of leases in the queue increases, since the leases assigned to the nodes' queues quickly reach this small L. For 20 system nodes (Fig. 2 e–h), when L is larger, the TT results are worse. The effects of G can also be seen: when L = 5 and G decreases, TT improves, whereas when L = 10 and G increases, TT worsens. Overall, when the number of nodes is small, our approach performs better. Considering the effect of IT, when IT is large, our approach performs worse (Fig. 2 l, p), especially with a large number of nodes. We can conclude that the selection of the optimal or nearly optimal pair of G and L values depends on the number of nodes. For 10 nodes, G = 25 or G = 15 with L = 10 gives the best results, while for 20 nodes, G = 15 with L = 5 gives the best results.

In terms of CPU utilization, our approach preserves the high CPU utilization achieved, as shown in Fig. 3 a–d. CPU utilization is roughly equal across all of the experiments for the global and local thresholds used. This is due to the decreased waiting time of leases, so fewer nodes are idle, and to the fragmented leases that boost CPU utilization. Overall, the superior results of our approach are more obvious when the number of nodes is small (e.g., 10 or 20


nodes), because the number of unserved requests is large compared to the small number of serving nodes. Consequently, the system reaches the global threshold, and the nodes cooperate and act as ready nodes if they have not reached their local threshold.

Fig. 2. Average turnaround time (TT). Panels (a)–(p) show results for 10, 20, 30, and 40 nodes with inter-arrival times of 2, 4, 8, and 16 minutes.


Fig. 3. CPU utilization. Panels (a)–(d) show results for 10, 20, 30, and 40 nodes.

5. Conclusions

The Locust algorithm was proposed to optimize the scheduling of virtual machines in IaaS cloud data centres. The algorithm shows superior results in reducing the average turnaround time compared with the pure and fragmented benchmarks when the number of nodes is small, while preserving the CPU utilization of the servers.

Acknowledgements

This work was funded by the Long-Term Comprehensive National Plan for Science, Technology and Innovation of the Kingdom of Saudi Arabia, grant number 11-INF1895-08.

References

[1] Zhang, Qi, Lu Cheng, and Raouf Boutaba. (2010) "Cloud computing: state-of-the-art and research challenges." Journal of Internet Services and Applications 1 (1): 7–18. https://doi.org/10.1007/s13174-010-0007-6.
[2] Huth, Alexa, and James Cebula. (2011) "The Basics of Cloud Computing." US-CERT.
[3] Kavis, Michael J. (2014) Architecting the Cloud: Design Decisions for Cloud Computing Service Models (SaaS, PaaS, and IaaS). US: Wiley.
[4] Vinothina, V, R. Sridaran, and Padmavathi Ganapathi. (2012) "A Survey on Resource Allocation Strategies in Cloud Computing." International Journal of Advanced Computer Science and Applications 3 (6).
[5] Heger, Dominique A. "Optimized Resource Allocation & Task Scheduling Challenges in Cloud Computing Environments."
[6] Koneru, Sowmya, V N Rajesh Uddandi, and Satheesh Kavuri. (2012) "Resource Allocation Method using Scheduling methods for Parallel Data Processing in Cloud." International Journal of Computer Science and Information Technologies 3 (4): 4625–4628.
[7] Kim, Dongsung, Hwanju Kim, Myeongjae Jeon, Euiseong Seo, and Joonwon Lee. (2008) "Guest-Aware Priority-Based Virtual Machine Scheduling for Highly Consolidated Server." Lecture Notes in Computer Science, Euro-Par: 285–294.
[8] Sudeepa, R, and H S Guruprasad. (2014) "Resource Allocation in Cloud Computing." International Journal of Modern Communication Technologies & Research 2 (3).
[9] Maguluri, Siva Theja, R. Srikant, and Lei Ying. (2014) "Heavy Traffic Optimal Resource Allocation Algorithms for Cloud Computing Clusters." Performance Evaluation 81: 20–39.
[10] Mochizuki, Kazuki, and Shin-ichi Kuribayashi. (2011) "Evaluation of optimal resource allocation method for cloud computing environments with limited electric power capacity." International Conference on Network-Based Information Systems.
[11] Ealiyas, Aicy, and S. P. Jeno Lovesum. (2018) "Resource Allocation and Scheduling Methods in Cloud - A Survey." 2018 Second International Conference on Computing Methodologies and Communication (ICCMC): 601–604.
[12] Xu, Zhiyuan, Yanzhi Wang, Jian Tang, Jing Wang, and Mustafa Cenk Gursoy. (2017) "A Deep Reinforcement Learning based Framework for Power-Efficient Resource Allocation in Cloud RANs." IEEE ICC 2017 Next Generation Networking and Internet Symposium.
[13] Food and Agriculture Organization (FAO) of the United Nations. (2010) Locust Watch, Locusts in Caucasus and Central Asia. http://www.fao.org/ag/locusts-CCA/en/1010/1018/index.html [accessed 3 May 2017].
[14] Ernst, Ulrich R, Matthias B. Van Hiel, Geert Depuydt, Bart Boerjan, Arnold De Loof, and Liliane Schoofs. (2015) "Epigenetics and locust life phase transitions." Journal of Experimental Biology 218: 88–99. doi: 10.1242/jeb.107078.
[15] Sotomayor, Borja. (2009) "Haizea." http://haizea.cs.uchicago.edu/ [accessed 6 February 2017].
[16] The NorduGrid Collaboration. http://www.nordugrid.org [accessed 5 June 2017].