An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters




Journal Pre-proof

PII: S1389-1286(19)30684-X
DOI: https://doi.org/10.1016/j.comnet.2020.107096
Reference: COMPNW 107096

To appear in: Computer Networks

Received date: 29 May 2019
Revised date: 13 December 2019
Accepted date: 2 January 2020

Please cite this article as: Chunlin Li , Yihan Zhang , Hao Zhiqiang , Luo Youlong , An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters, Computer Networks (2020), doi: https://doi.org/10.1016/j.comnet.2020.107096

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier B.V.

An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters

Chunlin Li 1,2,*, Yihan Zhang 1, Hao Zhiqiang 3, Luo Youlong 1

1 Department of Computer Science, Wuhan University of Technology, Wuhan 430063, P.R. China
2 Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
3 Key Laboratory of Metallurgical Equipment and Control Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, P.R. China
* Corresponding author

Abstract: As the complexity of workflow applications increases, workflow scheduling and execution incur more and more wasted resources. In order to achieve load balancing and reduce task execution time in the cloud system, an effective scheduling strategy based on hypergraph partition for workflow applications in geographically distributed datacenters is proposed. First, a workflow job scheduling algorithm is proposed that aims to reduce the response time while taking the cloud state into account. Second, a task scheduling algorithm based on hypergraph partition is designed with the goal of reducing task completion time and energy consumption. In addition, the optimal task scheduling strategy is obtained with the Dijkstra shortest-path algorithm implemented on a Fibonacci heap. Finally, experiments show that the workflow task scheduling algorithm optimizes task execution performance and maintains the load balance of the computing nodes in each cloud, so that the average task execution time and the total energy consumption of the system are minimized.

Keywords: Geographically distributed datacenter; workflow applications; hypergraph partition; scheduling

1. Introduction

With the globalization of cloud computing, geo-distributed cloud environments have received more and more attention. Under common standards and strategies, different clouds are connected to each other to form a geo-distributed cloud [1]. The geo-distributed cloud has more storage capacity and faster processing speed than traditional cloud computing, so users can get better services. A geo-distributed cloud consists of many clouds located in different geographic locations; for example, Google has 13 cloud data centers distributed across 8 different countries [2]. The geo-distributed cloud can therefore serve a wide range of requirements across a larger geographic area.

More and more applications today rely on geo-distributed datacenters, such as media streaming applications, sensor networks, and online social networks [3], so increasingly complex workflow applications are emerging. Compared to traditional Internet video services, social media applications have highly dynamic content and require lower latency to respond to user requests. For example, many entertainment videos are short (usually a few minutes) and their users may be distributed around the world; viewers in different locations cannot tolerate delays of more than a few seconds. Video workflows are typically content-centric processes and tend to change frequently for various reasons, such as new user requests and changing human interaction patterns. Therefore, how to serve video workflow applications more quickly and efficiently is a challenging issue, and workflow scheduling in geographically distributed datacenters still faces many challenges.

For example, most computing resources of servers holding little data may remain idle, while low resource utilization may cause more servers to be activated, which leads to a significant waste of resources. Therefore, it is necessary to study the rational allocation and use of resources in a geo-distributed cloud. In this paper, an M/M/C queuing model is built for each cloud based on the impact of the cloud state. The main goal is to study how to properly allocate cloud resources while ensuring load balancing and reducing the energy consumption of geographically distributed cloud systems. Moreover, a workflow task scheduling algorithm based on hypergraph partition is designed with the goal of reducing task completion time and energy consumption. The main contributions are as follows.

(1) Aiming at the problem of resource waste in workflow job scheduling in geographically distributed datacenters, a workflow job scheduling method based on load balancing is designed by analyzing the impact of cloud state on job execution efficiency. The job scheduling problem is transformed into an M/M/C queue so as to minimize the total response time.

(2) Aiming at the problem of excessive energy consumption in existing workflow task scheduling methods for geographically distributed datacenters, a workflow task scheduling method based on the shortest-path algorithm is designed. The hypergraph partitioning algorithm is composed of multilevel coarsening initialization and a multilevel K-path hypergraph partitioning algorithm.

(3) A real geographically distributed datacenter environment is used to validate the performance of the proposed algorithm. In the experiments, our algorithm effectively optimizes job execution performance and maintains the load balance of the computing nodes.

The rest of this paper is structured as follows. Section 2 reviews related work and Section 3 presents the workflow job scheduling based on load balancing. Section 4 presents the workflow task scheduling based on the shortest path. Section 5 describes the experimental verification. Finally, conclusions and future work are given in Section 6.

2. Related Work

2.1 Job scheduling in geographically distributed datacenters

Nowadays, the queuing model is used in many job scheduling studies, as shown in Table 1. Balla et al. [4] used queuing theory and reinforcement learning to achieve reliability within the cloud environment; they considered the queue buffer size and an uncertainty value function. Guo et al. [5] used queuing theory to minimize the long-term average job completion time under heterogeneous and dynamic workloads. Karthick et al. [6] built a job scheduling algorithm using queuing theory to reduce cost, attain high resource utilization, and provide quality of system. He et al. [7] proposed a scalable dynamic task scheduling model, which used a queuing model to reduce the programming burden by considering the average wait time of tasks. Ibrahim et al. [8] proposed a cloud computing scheduling model based on multiple queuing models; by taking advantage of some useful properties of queuing theory, their scheduling algorithm improves the quality of service.

Table 1 Summary of various job scheduling algorithms using a queuing model

| Algorithm | Objective | Mathematical Methods | Consideration |
| --- | --- | --- | --- |
| An adaptive action-selection algorithm [4] | To achieve reliability within the cloud environment | Queuing theory, reinforcement learning | Considering the queue buffer size and uncertainty value function |
| A delay-optimal scheduling algorithm of VMs [5] | To minimize the long-term average job completion time | Queuing model | Considering the heterogeneous and dynamic workloads |
| An efficient multi-queue job scheduling algorithm [6] | To reduce the cost to attain high resource utilization and provide quality of system | Queuing model | Considering the cost and the load |
| A scalable dynamic task scheduling model via N-level queuing theory [7] | To reduce the programming burden | Queuing model | Considering the average wait time of tasks |
| A proposed model for cloud computing scheduling based on multiple queuing models [8] | To improve the quality of service | Queuing model | Considering execution time per job, waiting time and the cost of resources |
| Our work | To balance the load of the whole geographically distributed cloud system | Queuing model | Considering the waiting time, the data transmission time and the job execution time |

The proposed method is different from the existing work. Most related studies consider only some factors, such as network bandwidth utilization and computing resources, and most job scheduling work in geographically distributed datacenters does not consider the impact of cloud state on job execution time. In this paper, the load balance of the whole geo-distributed cloud system is improved by considering the impact of cloud state on job execution time and average resource utilization.

2.2 Task scheduling in geographically distributed datacenters

As shown in Table 2, Hu et al. [9] implemented Flutter, a new task scheduling algorithm that can reduce the completion time and network cost of big data processing jobs across geographically distributed data centers. Soualhia et al. [10] introduced a framework that can adjust scheduling decisions based on collected information; the framework can be used to reduce task execution time and increase resource utilization of the system. Zhang et al. [11] designed a load-aware resource allocation and task scheduling strategy that, according to the cloudlet's load profile, adaptively allocates resources in the MCC system for delay-tolerant and delay-sensitive mobile applications. Gawali et al. [12] proposed a heuristic algorithm to perform task scheduling and allocate resources efficiently in cloud computing environments. Bahman et al. [13] proposed an improved genetic algorithm called N-GA for static task scheduling; it outperformed three well-known heuristic algorithms in makespan and a recent meta-heuristic algorithm in execution time. Chen et al. [14] proposed a real-time scheduling algorithm called RTSATD using task duplication to minimize the completion time and monetary cost of processing big data workflows in clouds. Zhang et al. [15] proposed a cascaded two-level (inter-DC and intra-DC) approximate dynamic programming (ADP) task scheduling algorithm that considers multiple objectives, including long-term performance benefits, energy and communication costs. Lu et al. [16] proposed a heuristic scheduling method for inter-GD BDT to find the optimal solution to the Max-Green-Min-Cost problem; the method maximizes the use of renewable energy and minimizes the grid energy cost when the renewable energy is insufficient.

Table 2 Comparison of the related works about multi-level workflow scheduling

| Reference | Considerations | Design objective | Solution |
| --- | --- | --- | --- |
| Time- and Cost-Efficient Task Scheduling across Geo-Distributed Data Centers [9] | network bandwidth; initial data location and replicas | the optimization problem for data-intensive job scheduling on geo-distributed data centers | The low-complexity heuristic scheduling algorithm |
| A Dynamic and Failure-aware Task Scheduling Framework for Hadoop [10] | the execution time, resources utilization | Reducing task execution time and increasing resource utilization | A dynamic and failure-aware framework |
| A Load-aware Resource Allocation and Task Scheduling for the Emerging Cloudlet System [11] | network transmission latency | reduce the cloudlet's monetary cost and turnaround time for delay-tolerant applications | a load-aware resource allocation and task scheduling (LA-RATS) strategy |
| Task scheduling and resource allocation in cloud computing using a heuristic approach [12] | computing resources and network bandwidth | Increasing resource utilization | the modified analytic hierarchy process (MAHP) |
| Our work | job execution capabilities; the computing power of each cloud; data transmission time; task completion time | ensure the load balancing of geographically distributed datacenters and reduce energy consumption | Load balancing scheduling method for workflow in the geographically distributed datacenters |

In geographically distributed clouds, many applications require multi-step operations on the data, with data dependencies between the steps. Therefore, it is necessary to use the classical multi-level workflow scheduling framework. The literature above has done some research on workflow scheduling in cloud computing and adopts this classical framework, but most related studies consider only some influencing factors, such as network bandwidth utilization and computing resources, without a comprehensive treatment of workflow scheduling. Therefore, a load-balancing scheduling method for workflow in geographically distributed clouds is proposed, which considers many factors and makes workflow scheduling more efficient. Compared with previous research work, the advantages of the scheduling method proposed in this paper are as follows.

(1) The cloud job execution capabilities are considered by comparing the computing power of each cloud and assigning jobs to each cloud in an appropriate proportion. The workflow job scheduling method aims to reduce job response time and maintain the load balance of the system.

(2) At present, most workflow scheduling work in geographically distributed clouds aims at improving data locality without considering the impact of cloud state on job execution time. In this paper, by considering this impact, the load balance of the whole geographically distributed cloud system is improved.

In the studies shown in Table 3, hypergraphs are also used for task partitioning. The authors of [17] used hypergraph partitioning to minimize the completion time, considering both task-task relations and completion time. Oguz et al. [18] proposed models for the simultaneous scheduling of map and reduce tasks in order to improve data locality and reduce data transmission; their approach is based on graph and hypergraph models that correctly encode the interactions between map and reduce tasks. Bo et al. [19] proposed an extended hypergraph-based task-scheduling algorithm that considers platform heterogeneity to improve the execution time of tasks. Cheng et al. [20] proposed a hypergraph-based task scheduling strategy on a master-slave platform to handle massive spatial datasets, considering the number of tasks and data transmission time. Zhang et al. [21] proposed a strategy of partitioning groups based on hypergraph; they considered the task execution time and cost to make full use of system resources and improve the efficiency of task scheduling.

Table 3 Summary of various task scheduling algorithms using hypergraph partitioning

| Algorithm | Objective | Mathematical Methods | Consideration |
| --- | --- | --- | --- |
| Joint scheduling algorithm of data and computation [17] | To minimize the completion time | Hypergraph partitioning | Considering both task-task relations and completion time |
| Locality-aware and load-balanced static task scheduling for MapReduce [18] | To improve data locality and reduce data transmission | Hypergraph partitioning | Considering location and workload |
| An improved hypergraph-based task-scheduling algorithm [19] | To improve the execution time of tasks | Hypergraph partitioning | Considering platform heterogeneity |
| A hypergraph based task scheduling strategy [20] | To handle massive spatial datasets | Hypergraph partitioning | Considering the number of tasks and data transmission time |
| A strategy of partitioning group based on hypergraph (PGH) [21] | To make full use of system resources and improve the efficiency of task scheduling | Hypergraph partitioning | Considering the task execution time and cost |
| Our work | To reduce the total task execution time | Hypergraph partitioning | Considering the waiting time, the data transmission time and the job execution time |

The above is the related research on hypergraph partitioning. The innovations of the hypergraph partitioning proposed in this paper are as follows. (1) The complex task hypergraph is initialized by multi-level coarsening: a random heavy-connectivity matching heuristic groups the vertices of the hypergraph, and the initial K-path partition is then obtained with a multilevel partitioning scheme. (2) In the refinement process of hypergraph partitioning, a method for calculating the maximum moving gain of vertices is proposed, and vertices are moved according to their maximum gain. Through these two partitioning steps, the hypergraph partition is better and the subsequent task scheduling is more reasonable and efficient.

Nowadays, the shortest-path algorithm is used in many job scheduling studies, as shown in Table 4. Chakaravarthy et al. [22] introduced a novel parallel algorithm derived from the Bellman-Ford and Delta-stepping algorithms; they use Bellman-Ford to solve the shortest path problem in order to dramatically reduce inter-node communication traffic. Broumi et al. [23] redesigned Dijkstra's algorithm to handle the case in which most of the parameters of a network are uncertain. Broumi et al. [24] proposed an extended version of the Dijkstra algorithm for finding the shortest path on a network. Sunita et al. [25] dynamized the Dijkstra algorithm, which helps to efficiently solve the dynamic single-source shortest path problem. Sedeño-Noda et al. [26] proposed a Dijkstra-like method generalizing the Dijkstra algorithm to biobjective cases and used it to solve the biobjective shortest path problem.

Table 4 Summary of various task scheduling algorithms using the shortest path

| Reference | Objective | Shortest Path Methods |
| --- | --- | --- |
| Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems [22] | To dramatically reduce inter-node communication traffic | Bellman-Ford |
| Applying Dijkstra Algorithm for Solving Neutrosophic Shortest Path Problem [23] | To handle the case in which most of the parameters of a network are uncertain | Dijkstra |
| Application of Dijkstra Algorithm for Solving Interval Valued Neutrosophic Shortest Path Problem [24] | To find the shortest path on a network | Dijkstra |
| A Retroactive Approach for Dynamic Shortest Path Problem [25] | To solve the dynamic single-source shortest path problem | Dijkstra |
| A Dijkstra-like method computing all extreme supported non-dominated solutions of the biobjective shortest path problem [26] | To solve one-to-one and one-to-all biobjective shortest path problems | Dijkstra |
| Our work | To minimize all task completion time and system energy consumption | Dijkstra |

The shortest path problem has been studied in the above literature, and there are many solutions to it, such as Bellman-Ford and Dijkstra. There are two main differences in the shortest-path solution proposed in this paper. (1) For the task scheduling of hypergraph partitioning proposed in this paper, Dijkstra's algorithm on a Fibonacci heap is used. Compared with the binomial heap, the Fibonacci heap has better amortized performance and can be used to implement mergeable priority queues. (2) In addition, this paper optimizes Dijkstra's algorithm: the Fibonacci-heap-based Dijkstra algorithm is used to find task scheduling schemes that minimize the completion time and energy consumption of all tasks. The workflow task scheduling algorithm proposed in this paper balances the execution time and energy consumption of tasks.
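For reference, the single-source Dijkstra routine at the core of this approach can be sketched as follows. The paper runs Dijkstra on a Fibonacci heap; Python's standard library only provides a binary heap (`heapq`), so this sketch uses lazy deletion (skipping stale queue entries) instead of a decrease-key operation. The edge weights would encode the combined time/energy cost described in the text; the graph encoding here is an assumption for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths with a binary heap.

    `graph` maps each node to a list of (neighbor, weight) pairs.
    A Fibonacci heap would support decrease-key in O(1) amortized;
    with heapq we push duplicates and skip stale entries instead.
    """
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already settled more cheaply
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

For example, on the graph `{'s': [('a', 2), ('b', 5)], 'a': [('b', 1)], 'b': []}`, `dijkstra(graph, 's')` settles `b` at distance 3 via `a` rather than 5 directly.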

3. The job scheduling method based on queuing theory

3.1 Description of the workflow job and the workflow task

The workflow job denotes the whole workflow application; a job contains one or more tasks. A workflow job can be represented by a directed acyclic graph G = (T, E), where T = {T_1, T_2, ..., T_N} is the collection of tasks (N is the total number of tasks) and E is the collection of edges indicating the dependency and precedence constraints between tasks. We define a task without a parent task as a start task and a task without a child task as an end task of the DAG. There is only one start task and one end task in a DAG workflow job: if a DAG workflow job has multiple inputs or outputs, a zero-cost task can be added and the multiple start/end tasks connected to it with zero-cost edges. An example of a DAG for a workflow job is shown in Fig. 1: the job is represented by G = (T, E) with T = {T_1, T_2, ..., T_14}, containing simple tasks (T_2, T_3, ...) and parallel tasks (T_1, T_10, T_14). A workflow task is an individual workflow activity. In this paper, it is assumed that the jobs submitted by users follow a Poisson distribution. For the workflow tasks, we use hypergraph theory to transform the DAG form of the workflow job into a hypergraph, then partition the hypergraph, and finally schedule the tasks of each part after partitioning.
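The single-start/single-end normalization described above can be sketched as follows. The node names `'START'`/`'END'` and the dict/set encoding are illustrative assumptions, not from the paper; the rule implemented is exactly the text's: add a zero-cost virtual task whenever there is more than one start or end task.

```python
def add_virtual_endpoints(tasks, edges):
    """Ensure a workflow DAG has a single start and a single end task.

    `edges` is a set of (parent, child) pairs over `tasks`. Tasks that
    never appear as a child are start tasks; tasks that never appear as
    a parent are end tasks. If either set has more than one member, a
    zero-cost virtual task is wired to all of them.
    """
    has_parent = {c for _, c in edges}   # nodes appearing as a child
    has_child = {p for p, _ in edges}    # nodes appearing as a parent
    starts = [t for t in tasks if t not in has_parent]
    ends = [t for t in tasks if t not in has_child]
    tasks, edges = list(tasks), set(edges)
    if len(starts) > 1:
        tasks.append('START')
        edges |= {('START', s) for s in starts}  # zero-cost edges
    if len(ends) > 1:
        tasks.append('END')
        edges |= {(e, 'END') for e in ends}
    return tasks, edges
```

For instance, a DAG with two start tasks T1 and T2 feeding T3 gains a single virtual `'START'` parent, while its unique end task is left untouched.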

Fig. 1 An example of DAG for a workflow job (tasks T1-T14; the legend distinguishes simple tasks from parallel tasks)

3.2 Problem Description and Model

In this section, we study how to properly allocate cloud resources so as to ensure the load balancing of geographically distributed datacenters and reduce energy consumption. The main challenges are as follows. 1) How to schedule jobs to the appropriate cloud to ensure load balancing of geographically distributed datacenters: the submitted jobs need to be scheduled to the clouds for execution, and job scheduling is necessary to reduce the job response time and increase the throughput of the entire system. 2) How to assign tasks to the appropriate nodes for execution to maintain load balancing of the compute nodes in each cloud: the tasks in each cloud need to be assigned to specific nodes for execution, so task scheduling is necessary to minimize the average task execution time and the system energy consumption.

Fig. 2 The components and scheduling process in the geographically distributed datacenters architecture

Fig. 2 illustrates the components and the scheduling process in the geographically distributed datacenters architecture considered in this paper. We consider geographically distributed datacenters that include N data centers located in different geographic locations, managed by a central controller. The central controller is composed of local hosts and is mainly responsible for collecting the jobs submitted by clients to each cloud; the jobs are then assigned to the data centers according to the job arrival rate of each cloud, and the completion information of the jobs is fed back to the users. Each data center, located in a different geographical location, is responsible for performing the jobs assigned by the central controller, and different data centers communicate through the network. All data centers report the progress of their tasks to the central controller, and the central controller assigns newly arrived jobs to the data centers based on their state. After the jobs are assigned to each cloud, each job is fragmented into multiple tasks with mutual dependencies; in each cloud, the tasks are scheduled to the corresponding computing nodes and executed according to the method proposed in this paper.

3.3 The method of job scheduling based on queuing theory

A. Job scheduling based on M/M/C queuing theory

The M/M/C queuing model can be introduced into the geographically distributed cloud job scheduling system. The M1/M2/C cloud job scheduling model is proposed to maintain load balancing for the entire geographically distributed cloud system. M1 indicates that the job inter-arrival time follows a negative exponential distribution with parameter λ, where λ is the average job arrival rate. M2 indicates that the job execution time follows a negative exponential distribution with parameter μ, where μ is the average number of jobs completed per unit of time. C denotes the number of resource pools in each cloud. The simplest case of the M1/M2/C model is the M1/M2/1 model with a single resource pool. The state of the system is the number of jobs in the system. ρ = λ/μ denotes the service strength of the cloud, that is, the average number of jobs being serviced; a larger ρ indicates higher system resource utilization. We assume ρ < 1. The probability that the entire cloud is idle can be expressed as follows:

P_0 = \left( \sum_{n=0}^{\infty} \rho^n \right)^{-1} = \left( \frac{1}{1-\rho} \right)^{-1} = 1 - \rho    (1)

P_n = P\{N = n\} (n = 0, 1, 2, ...) denotes the probability distribution of the queue length N in the equilibrium state of the system. The queue-length distribution can be expressed as follows:

P_n = (1 - \rho)\,\rho^n, \quad n = 0, 1, 2, ...    (2)

As for the M1/M2/C model, it is assumed that users submit jobs individually in turn, with successive submission intervals following a negative exponential distribution with parameter λ. There are C resource pools in each cloud, independent of each other. If any of the C resource pools is free, a job submitted to the queue is scheduled for execution immediately; otherwise, the job continues to wait in the queue. ρ_j = λ_j/(C μ_j) < 1 represents the service strength of the jth cloud. The stationary distribution of each queue can be expressed as follows:

P_{jn} = \begin{cases} \dfrac{(C\rho_j)^n}{n!}\,P_{j0}, & 1 \le n \le C \\[2mm] \dfrac{(C\rho_j)^n}{C!\,C^{\,n-C}}\,P_{j0}, & n > C \end{cases}    (3)

The probability that the jth cloud is idle can be expressed as follows:

P_{j0} = \left[ \sum_{n=0}^{C-1} \frac{(C\rho_j)^n}{n!} + \frac{(C\rho_j)^C}{C!\,(1-\rho_j)} \right]^{-1}    (4)

Therefore, the probability that a job needs to wait when it reaches the jth cloud is:

P(C, \rho_j) = \sum_{n \ge C} P_{jn} = \frac{(C\rho_j)^C}{C!\,(1-\rho_j)}\,P_{j0}    (5)

The average queue length of the jth cloud is:

L_{jq} = \frac{P_{j0}\,(C\rho_j)^C\,\rho_j}{C!\,(1-\rho_j)^2}    (6)

Therefore, the average number of jobs in the jth cloud under steady state can be expressed as L_j = L_{jq} + λ_j/μ_j. According to Little's law, the average response time is T_j = L_j/λ_j. The workflow job scheduling problem can thus be transformed into the problem of minimizing the total user response time:

\min T_{total} = \sum_{j=1}^{N} P_j T_j = \sum_{j=1}^{N} \frac{\lambda_j}{\lambda} \cdot \frac{L_j}{\lambda_j}, \qquad s.t. \; \sum_{j=1}^{N} \lambda_j = \lambda    (7)
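Eqs. (4)-(6) together with Little's law can be checked numerically with a short sketch. The function below follows the reconstructed formulas directly; the function name and return layout are illustrative.

```python
from math import factorial

def mmc_metrics(lam, mu, C):
    """Steady-state M/M/C metrics following Eqs. (4)-(6).

    lam: job arrival rate, mu: per-pool service rate, C: number of
    resource pools. Requires rho = lam / (C * mu) < 1 for stability.
    Returns (idle probability, waiting probability, mean queue length,
    mean response time).
    """
    rho = lam / (C * mu)
    assert rho < 1, "queue is unstable"
    a = C * rho  # offered load lam / mu
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(C))
                + a**C / (factorial(C) * (1 - rho)))            # Eq. (4)
    p_wait = a**C / (factorial(C) * (1 - rho)) * p0             # Eq. (5)
    lq = p0 * a**C * rho / (factorial(C) * (1 - rho)**2)        # Eq. (6)
    l_sys = lq + a             # jobs in system: queued + in service
    t = l_sys / lam            # Little's law: T_j = L_j / lam_j
    return p0, p_wait, lq, t
```

A sanity check: with C = 1, lam = 1, mu = 2 (so rho = 0.5) the formulas collapse to the M/M/1 results P_0 = 1 - rho = 0.5, L_q = rho^2/(1 - rho) = 0.5, and T = 1/(mu - lam) = 1, matching Eqs. (1)-(2).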

B. Job queuing based on nonlinear programming with linear equality constraints

The job queuing problem based on M/M/C queuing theory can be transformed into a nonlinear programming problem under linear equality constraints, denoted (P):

\min f(X) \quad s.t. \; AX = b    (8)

where X = (λ_1, λ_2, ..., λ_N)^T and f(X) = T_{total}. We solve it by the gradient descent method. Since f: R^N → R^1 is a twice continuously differentiable function, we assume there is a non-singular sub-block B = (a_{pq})_{m×m} in the matrix A and M = (a_{ij})_{m×p}, where m + p = N. If X is a feasible solution to problem (P), then A can be decomposed as A = [B, M] and X^T = (X_B^T, X_M^T), where B is an m×m invertible matrix. If a non-zero vector d satisfies Ad = 0 and ∇f(X)^T d < 0, then d is a feasible descent direction of (P). Let d^T = (d_B^T, d_M^T); if Ad = 0, then B d_B + M d_M = 0 and hence d_B = -B^{-1} M d_M. The objective function can be expressed as follows:

f(X) = f(X_B, X_M) = f(B^{-1}b - B^{-1}M X_M,\; X_M)    (9)

The reduced gradient r_M of f(X) is:

r_M = \nabla_M f(X) - (B^{-1}M)^T \nabla_B f(X)    (10)

Then:

\nabla f(X)^T d = \nabla_B f(X)^T d_B + \nabla_M f(X)^T d_M = r_M^T d_M    (11)

In order to ensure that d is a feasible descent direction, we should choose d_M such that r_M^T d_M < 0 holds. d can be calculated as follows:

d_M:\; d_j = -r_j,\; j = j_1, j_2, ..., j_p; \qquad d_B = -B^{-1} M d_M    (12)

According to the above analysis, an invertible matrix B can be found from m linearly independent column vectors of A, and the job arrival rates of all clouds are then obtained.
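The reduced-gradient step of Eqs. (10)-(12) can be sketched for the concrete constraint of Eq. (7), where A is a single row of ones (sum of the λ_j equals λ). In that special case B is the first coordinate, M the remaining ones, and B^{-1}M is a row of ones, so feasibility reduces to d_B = -Σ d_M. The fixed step size is an illustrative stand-in for a proper line search.

```python
def reduced_gradient_step(grad, x, step=0.1):
    """One reduced-gradient descent step for min f(x) s.t. sum(x) = const.

    Specializes the decomposition A = [B, M]: B is the first coordinate
    and M the rest, so d_B = -sum(d_M) keeps A d = 0. `grad` returns the
    gradient of f at x as a list.
    """
    g = grad(x)
    # reduced gradient (Eq. (10)): r_M = grad_M f - (B^{-1}M)^T grad_B f
    r_m = [gj - g[0] for gj in g[1:]]
    d_m = [-r for r in r_m]          # d_j = -r_j            (Eq. (12))
    d_b = -sum(d_m)                  # d_B = -B^{-1} M d_M keeps A d = 0
    d = [d_b] + d_m
    return [xi + step * di for xi, di in zip(x, d)]
```

For example, minimizing f(x) = Σ x_j^2 subject to x_1 + x_2 = 4 from the feasible point (3, 1) moves the iterate toward the optimum (2, 2) while keeping the sum at 4.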

3.4 The algorithm of job scheduling based on queuing theory

The core of the workflow job scheduling algorithm based on queuing theory is shown in Algorithm 1. The total user response time is first calculated (Algorithm 1, lines 1-9). The nonlinear programming method under linear equality constraints is then used to calculate the probability of a job reaching each cloud, and jobs are scheduled to execute in the clouds according to these probabilities (Algorithm 1, lines 10-12).

Algorithm 1: Workflow job scheduling algorithm based on queuing theory
Input: the job arrival rates of all clouds, historical execution information Infors
Output: the optimized job arrival rates res
1   T_total ← 0
2   for j = 1, 2, ..., N do   // job scheduling based on queuing theory
3       compute P_0 by Eq. (1)
4       compute P_j0 by Eq. (4)       // the probability that cloud j is idle
5       compute P(C, ρ_j) by Eq. (5)  // the probability that a job must wait
6       compute L_jq by Eq. (6)       // the average queue length of cloud j
7       T_j ← L_j / λ_j               // average response time by Little's law
8       T_total ← T_total + T_j
9   end for
10  minimize T_total subject to Σ_j λ_j = λ   // Eq. (7)
11  add λ_1, λ_2, ..., λ_N to res
12  return res

The time cost of the job scheduling algorithm based on load balancing is mainly made up of two parts. First, the job execution time is estimated according to the cloud state; assuming the number of clouds is N, the time cost of the job execution time prediction is O(N). Second, the job scheduling algorithm based on queuing theory mainly establishes the queuing model and solves for the optimal job arrival rates by the nonlinear programming method under linear equality constraints; its time cost is O(N^2). In summary, the time complexity of the proposed job scheduling algorithm is O(N^2).

4. The task scheduling based on the hypergraph partition
4.1 The method of task scheduling based on the hypergraph partition
4.1.1 The hypergraph partitioning method for the workflow
The workflow hypergraph can be represented as H = (V, E), where E denotes the collection of hyperedges capturing data dependencies between nodes, and each node n_i ∈ V carries a weight. The K-endpoint hypergraph partition problem is to divide the set of nodes into K subsets, expressed as V = {V_1, V_2, ..., V_K}. In addition, the partition must satisfy the computing-load balance constraint w(V_i) ≤ (1 + ε)·w(V)/K, where w(V) = Σ_{n_k ∈ V} w(n_k).

The hypergraph partitioning for workflow tasks consists of two steps: (1) the multilevel coarsening initialization partitioning, and (2) the multilevel k-path refinement. The steps are detailed below.

(1) The multilevel coarsening initialization partitioning
The goal of multilevel coarsening initialization partitioning is to compress the workflow hypergraph H = (V, E) into a sufficiently small hypergraph H^L = (V^L, E^L) so that H^L can be processed quickly. The coarsening is performed progressively: the nodes in the l-layer hypergraph H^l are merged into super nodes in the (l+1)-layer hypergraph H^(l+1) by the random HCM (Heavy Connectivity Matching) method. The result of the multilevel coarsening is therefore a sequence of L hypergraphs {H^1, H^2, ..., H^L}. For l = 0, 1, ..., L−1, the inequalities |V^l| > |V^(l+1)| and |E^l| > |E^(l+1)| hold. The multilevel coarsening ends when the scale of the hypergraph is reduced to a predefined threshold.

The nodes in hypergraph H^l are visited in random order. For each visited node n_i ∈ V^l, if n_i has not been merged or matched before, one of the visited but unmatched nodes directly connected to n_i is selected to merge with it. The selection is based on the connectivity criterion: among all unmatched nodes directly connected to n_i, the node n_j with the largest weight on the connecting edge is selected to merge with n_i. After n_i and n_j are merged, they are called matched nodes, and the merged node n_u becomes a node of the hypergraph H^(l+1). If there is no unmatched node among all the nodes connected to n_i, the node n_i is marked as visited and remains unmatched, so that it can still be selected as a paired node in a subsequent merger.

(2) The multilevel k-path hypergraph partition
The number of nodes in the final coarsened hypergraph H^L does not exceed the predefined threshold. The hypergraph H^L is divided into K subgraphs such that the sum of the weights of the cut edges is minimized. P' = {n'_1, n'_2, ..., n'_k} ⊆ V denotes the endpoint collection, so the hypergraph partition problem can be transformed into a multi-endpoint graph cut problem. The number of endpoints in the endpoint set P' is first checked: if P' is empty or contains only one endpoint, hypergraph partitioning is unnecessary. Otherwise, for each endpoint n'_i in P', the other endpoints of P' are connected to an auxiliary node n'_s by edges of positive-infinite weight. Taking n'_i and n'_s as source and sink, a minimum-cost cut set C_i is computed.

After obtaining the independent minimum cut sets {C_1, C_2, ..., C_k} of all the endpoints, the cut sets are sorted in ascending order of cutting cost; we assume c(C_1) ≤ c(C_2) ≤ ... ≤ c(C_k). The first k−1 cut sets with the smallest costs are combined to form the final cut set C = C_1 ∪ C_2 ∪ ... ∪ C_{k−1} for the hypergraph H^L = (V^L, E^L) with respect to the endpoint set P'. To obtain all the subgraphs induced by the cut set C, the edges in C are removed from the edge set of H^L. The remaining graph H' = (V^L, E^L − C) is then breadth-first searched with each endpoint n'_i ∈ P' as the root. For any endpoint n'_i ∈ P', the minimum cut set C_i separates n'_i from all other endpoints, so after removing the minimum cut sets of k−1 endpoints, each endpoint in P' is separated from the others. After traversing the remaining graph H', the connected component of each endpoint n'_i constitutes a segmentation block (segmentation subgraph) H^L_i. The set Π = {H^L_1, H^L_2, ..., H^L_k} consisting of the k subgraphs is the result of the hypergraph division.

(3) The multilevel fine-grained partitioning
After the multilevel k-path hypergraph partition, the obtained K subgraphs need to be mapped back to the original workflow graph by multilevel fine-grained partitioning. In the fine-grained process, the coarse-grained hypergraph at layer l+1 is mapped back to the fine-grained hypergraph at layer l, iterating until the hypergraph is mapped to layer 0. Before each mapping, a refinement algorithm adjusts the nodes between the K subgraphs so that the hypergraph partition result is further optimized and the load balancing constraints are satisfied. During the refinement of the nodes of the K subgraphs, the load balancing constraints are enforced only when the last layer H^0 is mapped.

4.1.2 The task scheduling method based on the hypergraph partition
The DAG task scheduling problem in geographically distributed datacenters is represented as a sparse directed acyclic graph; according to Wikipedia, directed graphs satisfying |E| = O(|V|) can be defined as sparse graphs. load(u,v) is the workload of the partition on the path from u to v. The number of currently active physical machines in data center j is denoted by r_j, and σ_j denotes the average processing rate of each physical machine in data center j. T_com represents the communication time. Therefore, the completion time of the workflow tasks of this partition on the path can be expressed as follows.

T(u,v) = load(u,v) / (r_j · σ_j) + T_com    (13)

We assume that data center j includes m_j servers. E_ij represents the energy consumption of server i in data center j. The server is the main source of energy consumption in the data center. The load of a server is affected by the number of virtual machines and the load inside each virtual machine; hence, the more virtual machines there are, the larger the server load is. The linear relationship between the power consumption of a physical server and the server utilization u can be expressed as follows [27].

P(u) = c·P_max + (1 − c)·P_max·u    (14)

where P_max is the maximum power consumption when the server is fully loaded, and c denotes the ratio of the no-load power consumption to the full-load power consumption; the energy consumption of an idle server is roughly 70% of that of a fully loaded server. Over a time interval T, the energy consumption of server i in data center j can be expressed as follows.

E_ij(t) = ∫_T P(u) dt    (15)

Therefore, the energy consumption E_j of data center j can be expressed as follows.

E_j(t) = Σ_{i=1}^{m_j} p_i·E_ij(t)    (16)

The workflow task scheduling problem for any path can be transformed into the problem of minimizing the task completion time and the task execution energy consumption, which can be expressed as follows.

min Σ_{(u,v)} (T_{u,v} + ν·E_{u,v})    (17)
s.t. (u, v) ∈ p    (18)

where ν weights the energy term, p denotes the set of directed edges that connect any two cloud data centers, and E_{u,v} = Σ E_j(t) is the task execution energy consumption of the workflow. The workflow task scheduling problem is thus transformed into a shortest path problem. The Dijkstra algorithm is commonly used to compute shortest paths in a graph: it traverses from the starting point until the shortest paths to the other vertices are found. When using the Dijkstra

algorithm directly, the number of traversals is large due to the large number of nodes, resulting in a relatively high time complexity. The Fibonacci heap is a data structure consisting of several trees. It supports, among others, the following operations: Insert(H, x), which inserts a node x into the heap H; Decrease-Key(H, x, k), which sets the weight of node x in heap H to a new value k (k < x); and Extract-Min(H), which returns the node with the lowest weight in heap H and removes it from the heap. The amortized time complexity of Extract-Min is O(log n), and that of the other two operations is O(1). Since the number of Extract-Min operations in a shortest-path computation is relatively small, the Fibonacci heap can effectively reduce the time complexity of the algorithm. Therefore, in this paper, the Dijkstra algorithm based on the Fibonacci heap is used to optimize the workflow task scheduling problem in geographically distributed datacenters. The specific algorithm steps are shown in Section 4.2.2.

4.2 The algorithm of task scheduling based on the hypergraph partition
4.2.1 The hypergraph partitioning algorithm for the workflow
A. The multilevel coarsening initialization hypergraph partitioning algorithm
The pseudocode for the multilevel coarsening hypergraph partitioning is shown in Algorithm 2. First, the node collections V', V^(l+1) and V_S are initialized (Algorithm 2, Lines 1~2). If the node n_i has not been merged or matched before, one of the visited but unmatched nodes directly connected to node

n_i will be selected to merge with it (Algorithm 2, Lines 3~10). Among all unmatched nodes that are directly connected to the node, the one with the largest weight on the connecting edge is selected for the merge (Algorithm 2, Lines 11~17).

Algorithm 2: the multilevel coarsening initialization hypergraph partitioning algorithm
Input: H^l = (V^l, E^l)
Output: H^(l+1) = (V^(l+1), E^(l+1))
1   V' ← V^l, V^(l+1) ← ∅
2   initialize the set of nodes that have been visited but not matched: V_S ← ∅
3   while |V'| > 0 do
4       i ← random(|V'|); V' ← V' − {n_i}
5       V'' ← ∅   // the set of unmatched nodes directly connected to n_i
6       for each n_j ∈ n_i.neighbors do
7           if n_j ∈ V' or n_j ∈ V_S then
8               V'' ← V'' ∪ {n_j}
9           end if
10      end for
11      if V'' ≠ ∅ then
12          n_j ← argmax_{n_j ∈ V''} {c(e_ij) + c(e_ji)}
13          n_u ← merge(H^l, n_i, n_j); V' ← V' − {n_j}
14      else
15          V_S ← V_S ∪ {n_i}
16      end if
17  end while
18  return H^(l+1) = (V^(l+1), E^(l+1))
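A minimal runnable sketch of the matching loop of Algorithm 2, under simplifying assumptions: connectivity is given as a plain dict `edge_weight` of directed edge weights (the c(e_ij) + c(e_ji) criterion is assumed folded into these weights), and super-node construction with hyperedge contraction is omitted. All names are illustrative, not the authors' code:

```python
import random

def hcm_coarsen(nodes, edge_weight):
    """One level of Heavy Connectivity Matching (sketch).

    nodes: node ids; edge_weight: {(u, v): weight}. Returns the groups
    (pairs or singletons) that become the next level's super nodes."""
    unmatched = set(nodes)
    merged = []
    order = list(nodes)
    random.shuffle(order)                   # visit nodes in random order
    for ni in order:
        if ni not in unmatched:
            continue                        # already merged into a super node
        # among unmatched nodes directly connected to ni, pick the heaviest edge
        candidates = [(w, nj) for (a, nj), w in edge_weight.items()
                      if a == ni and nj != ni and nj in unmatched]
        if candidates:
            _, nj = max(candidates)
            unmatched.discard(ni)
            unmatched.discard(nj)
            merged.append((ni, nj))         # matched pair -> super node
        # otherwise ni stays unmatched and may be chosen by a later neighbor
    merged.extend((n,) for n in unmatched)  # leftovers remain singletons
    return merged
```

Each returned group would be contracted into one node of H^(l+1); every input node appears in exactly one group.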

B. The multilevel k-path hypergraph partition algorithm
The pseudocode for the multilevel k-path hypergraph partition is shown in Algorithm 3. First, if the number of endpoints in the endpoint set P' is no more than 1, partitioning is unnecessary and H^L is returned directly (Algorithm 3, Lines 1~3). Then, for each endpoint n'_i, the minimum-cost cut set C_i is calculated (Algorithm 3, Lines 4~6). The independent minimum cut sets {C_1, C_2, ..., C_k} of all endpoints are obtained, and the edges of the combined cut set C are removed from the hypergraph H^L (Algorithm 3, Lines 7~8). A breadth-first search of the remaining graph (V^L, E') yields the hypergraph partition sequence (Algorithm 3, Lines 9~11). Finally, the obtained subgraphs are mapped back to the original workflow graph (Algorithm 3, Lines 12~21).

Algorithm 3: the multilevel k-path hypergraph partition algorithm
Input: H^L = (V^L, E^L), P'
Output: k-path hypergraph partitioning sequence Π = {H^L_1, H^L_2, ..., H^L_k}
1   if k ≤ 1 then
2       return Π = {H^L}
3   end if
4   for n'_i ∈ P', i = 1, 2, ..., k do
5       C_i ← min-cut(H^L, P', n'_i)
6   end for
7   C ← C_1 ∪ C_2 ∪ ... ∪ C_{k−1}
8   E' ← E^L − C; Π ← ∅
9   for n'_i ∈ P', i = 1, 2, ..., k do
10      H^L_i ← BFS(n'_i, V^L, E'); Π ← Π ∪ {H^L_i}
11  end for
12  for r ← 0; r < L; r ← r + 1 do   // map back through the coarsening levels
13      V' ← V^l
14      while |V'| > 0 do
15          n_b ← random(|V'|); V' ← V' − {n_b}
16          if max_{1≤j≤k} {gain(n_b, H^l_j)} > gain(n_b, H^l_i) then
17              V^l_i ← V^l_i − {n_b}; V^l_j ← V^l_j ∪ {n_b}
18          end if
19      end while
20  end for
21  return Π = {H^L_1, H^L_2, ..., H^L_k}

4.2.2 The task scheduling algorithm based on the hypergraph partition
A. The Dijkstra algorithm based on the Fibonacci heap
Algorithm 4 describes the pseudocode of the Dijkstra algorithm based on the Fibonacci heap. First, the shortest-path estimate of each vertex is initialized (Algorithm 4, Lines 1~6). Then, the vertices in set Q are stored in the Fibonacci heap ordered by the value of d(v_i) (Algorithm 4, Lines 7~15). The shortest path (s, u) is found and vertex u is added to the vertex set S (Algorithm 4, Lines 8~11). Finally, the shortest workflow task scheduling strategy is obtained (Algorithm 4, Line 16).

Algorithm 4: the Dijkstra algorithm based on the Fibonacci heap
Input: hypergraph partition results of workflow tasks and the weights of each edge
Output: the shortest workflow task scheduling strategy
1   for v_i ∈ V do   // initialize the shortest path for each vertex
2       if v_i = s then
3           d(v_i) ← 0
4       else d(v_i) ← ∞
5       add vertex v_i to set Q
6   end for
7   for u ∈ Q do   // store the vertices of set Q in the Fibonacci heap ordered by d(v_i)
8       for v ∈ Q do   // find the shortest path (s, u) and add u to the vertex set S
9           if the shortest path to vertex v through vertex u becomes shorter then
10              d(v) ← δ(s, u, v)   // δ(s, u, v) is the shortest path to vertex v through vertex u
11      end for
12      Q ← V − S
13      if Q = ∅ then
14          break
15  end for
16  return the shortest workflow task scheduling strategy
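The core of Algorithm 4 can be sketched as follows. Python's standard library has no Fibonacci heap, so `heapq` (a binary heap with lazy re-insertion in place of decrease-key) stands in here, giving O(E log V) rather than the O(E + V log V) of a true Fibonacci heap; the adjacency-dict graph encoding is an assumption of this sketch:

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths over a weighted digraph.

    adj: {u: [(v, w), ...]}; s: source vertex. Returns {vertex: distance}."""
    dist = {u: float("inf") for u in adj}
    dist[s] = 0
    pq = [(0, s)]                          # priority queue plays the heap's role
    while pq:
        d, u = heapq.heappop(pq)           # extract-min
        if d > dist[u]:
            continue                       # stale entry left by lazy decrease-key
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w            # relaxation step (decrease-key)
                heapq.heappush(pq, (dist[v], v))
    return dist
```

The lazy re-insertion keeps the code short: outdated heap entries are simply skipped when popped, instead of being updated in place as a Fibonacci heap's Decrease-Key would do.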

B. The workflow task scheduling algorithm based on the hypergraph partition
The pseudocode for the workflow task scheduling algorithm is shown in Algorithm 5. First, the hypergraph is partitioned according to Algorithm 2 and Algorithm 3 (Algorithm 5, Line 1). The execution time and energy consumption of each task are calculated (Algorithm 5, Lines 2~7). Then, a task scheduling model is established for each partition, and a task scheduling scheme for each partition is found by the shortest path algorithm until all tasks are scheduled (Algorithm 5, Line 8). The shortest-path scheduling strategy for the tasks is obtained by the Dijkstra algorithm based on the Fibonacci heap [28], which is a common algorithm (Algorithm 5, Lines 9~12). Finally, the workflow task scheduling strategy with minimal execution time and energy consumption is returned (Algorithm 5, Line 13).

Algorithm 5: the workflow task scheduling algorithm based on the hypergraph partition
Input: workflow task set that has been converted to a hypergraph
Output: task scheduling strategy str*(s, u, v)
1   hypergraph partition   // according to Algorithm 2 and Algorithm 3
2   for task ∈ job do
3       for DC ∈ DCs do
4           T(u,v) ← load(u,v) / (r_j·σ_j) + T_com   // calculate the execution time according to formula (13)
5           E(u,v) ← E_j(t) = Σ_{i=1}^{m_j} p_i·E_ij(t)   // calculate the energy consumption according to formula (16)
6       end for
7   end for
8   calculate the task scheduling model for each partition   // according to Section 4.1.2
9   for part ∈ Parts do
10      (T_min, E_min) ← min Σ_{(u,v)} (T_{u,v} + ν·E_{u,v})   // minimize time and energy according to formula (17)
11      find the scheduling strategy str*(s, u, v) with the shortest path d(v)   // by the Dijkstra algorithm of Algorithm 4
12  end for
13  return the workflow task scheduling strategy str*(s, u, v)

The workflow task scheduling algorithm based on the shortest path algorithm proposed in this paper mainly consists of two parts: (1) the hypergraph partition of the workflow task graph, and (2) the Dijkstra algorithm based on the Fibonacci heap, which finds the shortest-path task scheduling strategy for each partition. For the k-path partition, we assume the number of tasks is T; the time complexity of the workflow task hypergraph partitioning algorithm is O(T log T). When the number of clouds is N and the largest partition of the k-way partition contains m tasks, the time complexity of the task scheduling algorithm for each partition is O(m + N log N). Therefore, the time complexity of the workflow task scheduling algorithm is O(m + N log N + T log T).

5. Performance evaluation
5.1 Evaluation Environment
5.1.1 Experimental Setup
In our experiment, public clouds, a local cloud and remote clouds are deployed in different geographic locations. The experiments in this paper are carried out on an open-source Hadoop distributed platform. The operating system used by all virtual machines is Ubuntu 14.04.5, the Java environment is JDK 1.8.0_91, the code development environment is Eclipse 4.5.0 on Linux, and the version of Hadoop is Hadoop-2.7.4. The architecture of the geo-distributed environment is shown in Fig. 3, which lists the hostname, CPU, memory, disk capacity and IP address of each virtual machine. The configuration of the primary node, which is the core, is better than that of the child nodes, which are virtual machines with identical configurations. The configuration of the virtual machines differs across cloud data centers, which means the cloud data centers differ in performance. Since the experimental environment is based on virtual machines, the scale of each cloud data center can be dynamically expanded as needed. Each reported data point is the average of 20 replicate experiments.

[Fig. 3 shows the geo-distributed deployment: a local cluster (Intel i5 4-core 3.3GHz/8GB/1TB worker nodes with Intel i7 masters), public cloud nodes (Core 8/12GB/2TB) and remote cloud nodes (Core 2/4GB/500GB), together with their hostnames and IP addresses.]

Fig. 3 The architecture of a geo-distributed environment

5.1.2 Experimental benchmark data Benchmark data determine the correctness and reliability of the performance evaluation for the proposed algorithm. There are currently some tools for generating workflow applications such as Pegasus and COMPSs [29,30]. COMPSs is a framework to provide programming models for task-based application development in distributed environments. Developers can code applications in order without knowing about the underlying infrastructure. At runtime, COMPSs detects the data dependencies between tasks of DAG. In this experiment, the COMPSs programming model is used to implement four benchmark applications. By executing these applications in the cluster, the data dependencies can be detected and a DAG task graph for each benchmark application can be obtained. These DAG task graphs are inputs data for the scheduler application evaluation. Four types of benchmark applications are selected to verify the feasibility of our proposed algorithm. EP tasks denote the parallel tasks consisting of multiple independent parallel tasks. GT tasks denote the centralized tasks consisting of multiple tasks that are continuously reduced by different parent tasks and child tasks. MT tasks denote the matrix tasks consisting of a series of specified tasks with parallel dependencies. SG tasks denote the decentralized-centralized tasks consisting of multiple tasks that are extended and reduced by different parent and child tasks. For each type of DAG benchmark application, we generate three different numbers of tasks. A large

number of tasks ranges over [1000, 10000], a medium number over [100, 900], and a small number over [10, 90]. Each type of task has different data dependencies. The tasks are divided into complex tasks and simple tasks according to the complexity of their data dependencies. In this experiment, tasks with centralized characteristics are defined as complex tasks, i.e., GT and SG tasks; the parallel and matrix tasks, i.e., EP and MT, are defined as simple tasks. The proportion CTP of complex tasks among all tasks is set to 20% and 80%, respectively. When CTP is 20%, the average proportions of EP and MT tasks among the simple tasks are 70% and 30%, respectively, and the average proportions of GT and SG tasks among the complex tasks are 70% and 30%, respectively. When CTP is 80%, the average proportions of EP and MT tasks among the simple tasks are 30% and 70%, respectively, and the average proportions of GT and SG tasks among the complex tasks are 30% and 70%, respectively.

5.1.3 Evaluation metrics
In order to verify the accuracy of the predicted job execution time, we compare the predicted time

preTime with the actual execution time exeTime. The prediction accuracy is calculated as accuracy = 1 − |preTime − exeTime| / exeTime. The mean absolute percentage error, used to evaluate the prediction accuracy, is calculated as MAPE = (1/num) · Σ_{i=1}^{num} (|preTime_i − exeTime_i| / exeTime_i) × 100%. When MAPE is below 0.1, the prediction results can be regarded as reliable.
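The two prediction metrics can be computed directly; this is a small illustrative helper whose function and variable names are ours, not the paper's:

```python
def accuracy(pre_time, exe_time):
    """Per-job prediction accuracy: 1 - |preTime - exeTime| / exeTime."""
    return 1 - abs(pre_time - exe_time) / exe_time

def mape(pre_times, exe_times):
    """Mean absolute percentage error over num jobs, in percent:
    (1/num) * sum(|preTime_i - exeTime_i| / exeTime_i) * 100."""
    num = len(exe_times)
    return sum(abs(p - e) / e
               for p, e in zip(pre_times, exe_times)) * 100 / num
```

For instance, a job predicted at 90 s that actually ran 100 s has an accuracy of 0.9, and predictions [90, 110] against actuals [100, 100] give a MAPE of 10%.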

In order to evaluate the feasibility of the workflow job scheduling algorithm, we select the average waiting time, the average response time and the system throughput as evaluation metrics. The average waiting time is the time interval between a job's arrival and the moment it starts to execute. The average response time is the time interval between a job's arrival and its completion. The throughput of the entire system is defined as the number of jobs processed per second, calculated as Tp = N / ATI, where N is the total number of jobs executed within the time interval ATI. In order to evaluate the feasibility of the workflow task scheduling algorithm based on the shortest path algorithm, we select the average execution time of the tasks, the energy consumption, the QoS success ratio and the total communication traffic as performance evaluation metrics. The average completion time of tasks is the average interval from resources being allocated to the tasks being completed. The QoS success ratio [31,32] denotes the number of tasks completed before their respective deadlines divided by the total number of tasks. The power consumption in the geographically distributed datacenters denotes the system power consumption over a period.

5.1.4 Experimental parameters
As for the workflow job scheduling algorithm based on load balancing, the list of experimental parameters is shown in Table 5. We set different job arrival rates λ, data sets J and maximum numbers CT of parallel tasks. The job arrival rate λ indicates the number of jobs that arrive at the

system per minute. The ranges of the low, medium and high arrival rates are (0, 30], (30, 120] and (120, 150], respectively. The data set J is divided into three categories according to the average size: the ranges of the small, medium and big data sets are (0, 16GB], (16, 64GB] and (64, 128GB], respectively. The maximum number CT of parallel tasks indicates the maximum number of tasks that the system can execute at the same time; the ranges of few, medium and multiple parallel tasks are (0, 64], (64, 256] and (256, 512], respectively.

Table 5 Experimental parameter settings
Parameter   Definition                                    Range
–           The step size of the gradient ascent method   0.01
λ           The average job arrival rate                  [30, 480]
–           Number of queues in the resource pool         10
J           The average size of the data set (GB)         (0, 128]
CT          Maximum number of parallel tasks              (0, 512]
N(Job)      Number of jobs                                1000
Type(J)     Data set type                                 {small, medium, large}
Type(λ)     System job arrival rate type                  {low, medium, high}
Type(CT)    Maximum number of parallel tasks type         {few, medium, multiple}

As for the workflow task scheduling algorithm, the parameter settings of this experiment are shown in Table 6. Tasks denotes the number of tasks of the workflow jobs, whose range is [32, 512]. VMs denotes the average number of virtual machines in each cloud, which is 10. The resource utilization percentage RUP denotes the percentage of the resources allocated to tasks out of the total resources, whose range is [30%, 80%].

Table 6 Experimental parameter settings
Parameter   Definition                        Range
K           Number of hypergraph partitions   64
ν           Energy consumption parameter      2
Deadline    Task deadline (s)                 8
Tasks       Number of tasks                   [32, 512]
VMs         Number of virtual machines        10
RUP         Resource utilization percentage   [30%, 80%]

5.1.5 Benchmark Algorithms

In order to verify the proposed job scheduling algorithm, we select the SRPT algorithm [33] and the SWAG algorithm [34] as comparison algorithms. The Shortest Remaining Processing Time (SRPT) algorithm is a classic scheduling algorithm that is often used for comparison; it achieves the optimal average job completion time for preemptive job scheduling. Different from the research scenario of this paper, the jobs in this paper consist of multiple parallel tasks. The Workload-Aware Greedy Scheduling (SWAG) algorithm considers the same geographically distributed datacenter environment as our paper; it is a workload-aware greedy scheduling algorithm that prioritizes job execution between data centers to reduce the average job completion time and achieve near-optimal performance. In order to verify the proposed task scheduling algorithm, we select the MCMKCut algorithm [35], the CAWT algorithm [36] and the MOSACO algorithm [37] as comparison algorithms. The Multi-Constraint Multilevel K-way Cut (MCMKCut) algorithm considers the workflow scheduling problem in geographically distributed datacenters and is designed to optimize overall data transmission cost; it transforms the workflow scheduling problem into a graph partitioning model with computational and storage constraints for each data center. The Cost-Aware Workflow Transformation (CAWT) algorithm considers workflow optimization to reduce the cost of computation and communication on geographically distributed datacenters; it proposes a cost-aware workflow transformation for cost minimization in real-time big data processing, considering the price versatility of both virtual machines and traffic. The task-oriented Multi-Objective Scheduling algorithm based on Ant Colony Optimization (MOSACO) is designed to minimize task completion time and cost using time-first and cost-first single-objective optimization strategies, respectively; it uses an entropy optimization model to maximize user service quality and resource provider profitability.

5.2 Results and Analysis
5.2.1 The job scheduling algorithm based on queuing theory
The average job size affects the performance of the proposed algorithm. In the experiment, the job arrival rate is set to a fixed value of 60, and the maximum number of parallel tasks per node is 32, 128 and 256, respectively. In order to evaluate the effect of the average job size on the average response time, the average waiting time and the system throughput, the average job size is varied from 8GB to 128GB. Each reported data point is the average of 20 replicate experiments.

[Fig. 4 plots average waiting time (s) versus average job size (8, 16, 32, 64, 128 GB) for the proposed, SRPT and SWAG algorithms: (a) maximum number of parallel tasks = 32; (b) maximum number of parallel tasks = 128; (c) maximum number of parallel tasks = 256.]
Fig.4 The effect of average job size on average waiting time

As shown in Fig. 4, when the job arrival rate and the maximum number of parallel tasks per node are fixed, the average job waiting time increases as the average job size increases. This is because the larger the amount of job data is, the more resources are used and the longer the execution time is, which lengthens the time jobs spend queued. From Fig. 4(a), when the maximum number of parallel tasks is 32, the average waiting time of the proposed job scheduling algorithm is reduced by 49.6% and 31.87% compared with the SRPT and SWAG algorithms, respectively. From Fig. 4(b), when the maximum number of parallel tasks is 128, the average waiting time of the proposed algorithm is 47.43% and 29.8% lower than that of the SRPT and SWAG algorithms. From Fig. 4(c), when the maximum number of parallel tasks is 256, the average waiting time of the proposed algorithm is reduced by 62.77% and 35% compared with the SRPT and SWAG algorithms. Thus, the proposed job scheduling algorithm effectively reduces the average waiting time as the average job size changes.

[Fig. 5 plots average response time (s) versus average job size (8, 16, 32, 64, 128 GB) for the proposed, SRPT and SWAG algorithms: (a) maximum number of parallel tasks = 32; (b) maximum number of parallel tasks = 128; (c) maximum number of parallel tasks = 256.]
Fig.5 The effect of average job size on average response time

In Fig. 5, the average job response time increases as the average job size increases. The reason is that the larger the amount of job data is, the longer the jobs execute; the increase in job waiting time eventually leads to an increase in the average job response time. From Fig. 5(a), when the maximum number of parallel tasks per node is 32, the average response time of the proposed job scheduling algorithm is reduced by 28.23% and 20.7% compared with the SRPT and SWAG algorithms. In Fig. 5(b), when the maximum number of parallel tasks per node is 128, the average response time of the proposed algorithm is reduced by 29.77% and 19.14% compared with the SRPT and SWAG algorithms. In Fig. 5(c), when the maximum number of parallel tasks per node is 256, the average response time of the proposed algorithm is reduced by 27.52% and 14.69% compared with the SRPT and SWAG algorithms. This indicates that the proposed job scheduling algorithm reduces the average job response time more effectively when the average job size changes.

[Fig. 6 plots system throughput versus average job size (8, 16, 32, 64, 128 GB) for the proposed, SRPT and SWAG algorithms: (a) maximum number of parallel tasks = 32; (b) maximum number of parallel tasks = 128; (c) maximum number of parallel tasks = 256.]
Fig.6 The effect of average job size on system throughput

In Fig. 6, the system throughput decreases as the average job size increases. This is because the larger the amount of job data is, the more resources are required to execute the jobs; the number of jobs completed per unit time falls, so the system throughput decreases. In Fig. 6(a), when the maximum number of parallel tasks is 32, the proposed job scheduling algorithm increases the system throughput by 40.2% and 22.16% compared with the SRPT and SWAG algorithms. In Fig. 6(b), when the maximum number of parallel tasks is 128, the system throughput of the proposed algorithm increases by 32% and 16.6% compared with the SRPT and SWAG algorithms. In Fig. 6(c), when the maximum number of parallel tasks is 256, the system throughput of the proposed algorithm increases by 33.9% and 11.6% compared with the SRPT and SWAG algorithms. This indicates that the proposed job scheduling algorithm improves the system throughput more effectively when the average job size changes.

5.2.2 The task scheduling algorithm based on the hypergraph partition
A. The effect of the number of tasks
The number of tasks directly affects the performance of the proposed task scheduling algorithm. In each geographically distributed data center, the number of virtual machines is set to 10, the resource utilization percentage is set to 60%, and the number of tasks is set to 32, 64, 128, 256 and 512, respectively. The Complex Task Percent (CTP) represents the proportion of complex tasks among the total tasks. As shown in Fig. 7, the average task completion time of all four algorithms increases with the number of tasks. In Fig. 7(a), when the number of tasks is 32, the proposed task scheduling algorithm can

achieve up to 25%, 15.4% and 9.6% average completion time reduction over the MCMKCut, CAWT and MOSACO algorithm respectively. However, when the task number is 512, the proposed task scheduling algorithm can achieve up to 26.5%, 17.4% and 13.2% average completion time reduction over the MCMKCut, CAWT and MOSACO algorithm respectively. In Fig.7 (b), when the task number is 32, the proposed task scheduling algorithm can achieve up to 37.1%, 22.6% and 6.5% average completion time reduction over the MCMKCut , CAWT and MOSACO algorithm respectively. However, when the task number is 512, the proposed task scheduling algorithm can achieve up to 34.5%, 25.7% and 3.6% average completion time reduction over the MCMKCut, CAWT and MOSACO algorithm respectively. The reason for the above is that the task transmission time is mainly considered in MCMKCut algorithm when assigning tasks. CAWT algorithm mainly considers the task transmission and calculation cost. Neither of them considers the task load and resource allocation. The MOSACO algorithm is a multi-objective optimization algorithm that uses time-first and cost-first single-objective optimization strategies to minimize task completion time and cost. The MOSACO algorithm and the proposed task scheduling algorithm have approximately the same impact on task completion time. The proposed task scheduling algorithm not only considers the task transmission time but also considers the task execution efficiency. The tasks are reasonably scheduled according to the minimum completion time of all tasks to ensure the task average completion time shortest. Therefore, the proposed task scheduling algorithm had the best performance in terms of the task average completion time, followed by MOSACO and CAWT in that order,

14

MCMKCut

CAWT

MOSACO

proposed

12 10 8 6 4

Average completion time (s)

Average completion time (s)

while the performance of MCMKCut is the worst. 20

(a) CTP=20% (b) CTP=80%
Fig. 7 The effect of the number of tasks on average completion time (x-axis: number of tasks N; y-axis: average completion time (s); curves: MCMKCut, CAWT, MOSACO, proposed)
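The assignment of tasks to datacenters underlying these results is based on hypergraph partition. The idea can be illustrated with a minimal greedy sketch; the data, the balance heuristic and all names below are illustrative assumptions, not the paper's exact partitioner. Tasks are vertices, sets of tasks sharing a data file are hyperedges, and each part corresponds to a datacenter; a good partition balances load while cutting few hyperedges.

```python
from collections import defaultdict

def greedy_hypergraph_partition(tasks, hyperedges, k):
    """Assign each task (vertex) to one of k parts (datacenters).

    tasks: dict task_id -> load (e.g. estimated execution time)
    hyperedges: list of task-id sets; tasks in one set share a data file,
                so splitting the set across parts adds communication.
    Greedy rule: place the heaviest unassigned task into the part that
    minimises the newly created cut, breaking ties by current part load.
    """
    part_of = {}
    load = [0.0] * k
    # index: hyperedges touching each task, for fast cut evaluation
    edges_of = defaultdict(list)
    for e in hyperedges:
        for t in e:
            edges_of[t].append(e)

    def extra_cut(task, p):
        # count hyperedges that would newly span part p plus another part
        cut = 0
        for e in edges_of[task]:
            parts = {part_of[t] for t in e if t in part_of}
            if parts and p not in parts:
                cut += 1
        return cut

    for task in sorted(tasks, key=tasks.get, reverse=True):
        best = min(range(k), key=lambda p: (extra_cut(task, p), load[p]))
        part_of[task] = best
        load[best] += tasks[task]
    return part_of

# toy instance: two file-sharing groups end up in separate datacenters
parts = greedy_hypergraph_partition(
    {"t1": 4, "t2": 3, "t3": 2, "t4": 1},
    [{"t1", "t2"}, {"t3", "t4"}], k=2)
```

Production partitioners (e.g. multilevel tools such as hMETIS or PaToH) refine such an assignment further, but the cut-versus-balance trade-off is the same one this section's experiments exercise.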

As shown in Fig. 8, the system energy consumption of all four algorithms increases with the number of tasks. In Fig. 8(a), when the number of tasks is 32, the proposed task scheduling algorithm achieves up to 41.7%, 25% and 16.7% energy consumption reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of tasks is 512, the reductions reach 54.6%, 23.7% and 28.9%, respectively. In Fig. 8(b), when the number of tasks is 32, the proposed task scheduling algorithm achieves up to 48.5%, 27.5% and 13.3% energy consumption reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of tasks is 512, the reductions reach 60.6%, 28.4% and 45.9%, respectively. The reason is that the MCMKCut algorithm mainly considers the task transmission time, while the CAWT algorithm considers the task transmission and execution cost; neither considers the system energy consumed when tasks are executed. The MOSACO algorithm primarily considers the monetary cost of completing the tasks rather than the energy consumption of the servers, so it tends to consume more energy, particularly when the number of tasks is very large. The proposed task scheduling algorithm uses the task execution time and the system energy consumption as the edge weights of the task scheduling graph, so the schedule with the shortest execution time and the lowest energy consumption can be found by the shortest path algorithm. Therefore, the proposed task scheduling algorithm outperforms the MCMKCut, CAWT and MOSACO algorithms in terms of energy consumption.

(a) CTP=20% (b) CTP=80%
Fig. 8 The effect of the number of tasks on energy consumption (x-axis: number of tasks N; y-axis: energy consumption (nJ); curves: MCMKCut, CAWT, MOSACO, proposed)
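The shortest-path selection over time-and-energy edge weights described above can be sketched as follows. The toy scheduling graph, the linear time/energy combination via `alpha`, and all names are illustrative assumptions, not the authors' exact formulation:

```python
import heapq

def cheapest_schedule(graph, source, target, alpha=0.5):
    """Dijkstra over a scheduling graph whose edges carry (time, energy).

    graph: node -> list of (next_node, exec_time, energy)
    Each edge cost combines both objectives: alpha*time + (1-alpha)*energy.
    Returns (total_cost, path) for the cheapest schedule found.
    """
    pq = [(0.0, source, [source])]   # (accumulated cost, node, path so far)
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, t, e in graph.get(node, []):
            if nxt not in seen:
                w = alpha * t + (1 - alpha) * e
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# toy graph: a task can run on vm1 (slower start, cheap energy) or vm2
g = {"start": [("vm1", 4.0, 2.0), ("vm2", 3.0, 5.0)],
     "vm1": [("end", 1.0, 1.0)],
     "vm2": [("end", 1.0, 1.0)]}
cost, path = cheapest_schedule(g, "start", "end")
```

With equal weighting, vm1's lower energy outweighs its longer execution time, so the search routes through vm1; changing `alpha` shifts the trade-off between the two objectives.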

As shown in Fig. 9, the QoS success ratio of all four algorithms decreases with the number of tasks. In Fig. 9(a), when the number of tasks is 32, the proposed task scheduling algorithm achieves up to 3.1%, 1.5% and 1.3% QoS success ratio improvement over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of tasks is 512, the improvements reach 10.2%, 6.1% and 2.2%, respectively. In Fig. 9(b), when the number of tasks is 32, the proposed task scheduling algorithm achieves up to 4.2%, 2.8% and 1.1% QoS success ratio improvement over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of tasks is 512, the improvements reach 6.4%, 4.1% and 2.1%, respectively. The reason is that the MCMKCut algorithm minimizes the task completion time by controlling the task transmission time, and the main goal of the CAWT algorithm is to minimize the transmission and execution costs; neither considers the QoS success ratio. The MOSACO algorithm takes deadline constraints into account, so its deadline violation rate is relatively low and bounded; its definition of the deadline violation rate is similar to the definition of the QoS success ratio in the proposed task scheduling algorithm. The proposed task scheduling algorithm uses the task execution time and the system energy consumption as the edge weights of the task scheduling graph, finds the schedule with the shortest execution time and the lowest energy consumption by the shortest path algorithm, and thus effectively keeps the task completion time within the deadline. Therefore, the proposed task scheduling algorithm has the best performance in terms of the QoS success ratio, followed by MOSACO and CAWT in that order, while MCMKCut performs the worst.
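The QoS success ratio plotted in Fig. 9 is, as described, the fraction of tasks finishing within their deadlines. A direct sketch of that metric (the tuple layout and sample values are illustrative):

```python
def qos_success_ratio(tasks):
    """tasks: list of (completion_time, deadline) pairs.

    Returns the percentage of tasks that met their deadline."""
    if not tasks:
        return 100.0
    met = sum(1 for finish, deadline in tasks if finish <= deadline)
    return 100.0 * met / len(tasks)

# 3 of 4 tasks finish on time -> 75%
ratio = qos_success_ratio([(3.2, 4.0), (5.1, 5.0), (2.0, 2.0), (7.9, 8.0)])
```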

(a) CTP=20% (b) CTP=80%
Fig. 9 The effect of the number of tasks on QoS success ratio (x-axis: number of tasks N; y-axis: QoS success ratio (%); curves: MCMKCut, CAWT, MOSACO, proposed)

B. The scalability of the proposed algorithm with an increasing number of VMs

The average number of virtual machines per cloud datacenter affects the performance of the proposed task scheduling algorithm. To evaluate the impact of the number of virtual machines on the average completion time, energy consumption, QoS success ratio and total communication traffic, we set the number of tasks to 64 in this experiment. The resource utilization percentage is set to 60%, and the number of virtual machines is set to 10, 15, 20, 25 and 30, respectively.

(a) CTP=20% (b) CTP=80%
Fig. 10 The effect of the number of virtual machines on average completion time (x-axis: number of virtual machines N; y-axis: average completion time (s); curves: MCMKCut, CAWT, MOSACO, proposed)

As shown in Fig. 10, the average task completion time of all four algorithms decreases as the number of virtual machines increases. In Fig. 10(a), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 25.5%, 16.3% and 9.1% average completion time reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the reductions reach 60%, 40% and 6.3%, respectively. Similarly, in Fig. 10(b), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 19.2%, 12.1% and 8.2% average completion time reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the reductions reach 57.5%, 52.7% and 22.3%, respectively. The reason is that the MCMKCut algorithm mainly considers the task transmission time when assigning tasks, while the CAWT algorithm mainly considers the task transmission and computation cost; neither considers the rational allocation of virtual machine resources. The MOSACO algorithm prioritizes cost under deadline constraints and task completion time under cost constraints. The proposed task scheduling algorithm considers the scheduling topology of all virtual machines, selects the optimal nodes to perform tasks, and makes full use of the computing resources, so the average task completion time is the shortest. Therefore, when the number of virtual machines increases, the proposed task scheduling algorithm reduces the average completion time more effectively than the MCMKCut, CAWT and MOSACO algorithms.

(a) CTP=20% (b) CTP=80%
Fig. 11 The effect of the number of virtual machines on energy consumption (x-axis: number of virtual machines N; y-axis: energy consumption (nJ); curves: MCMKCut, CAWT, MOSACO, proposed)

As shown in Fig. 11, the system energy consumption of all four algorithms increases with the number of virtual machines. In Fig. 11(a), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 42.1%, 26.3% and 21.1% energy consumption reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the reductions reach 63.3%, 50% and 16.7%, respectively. Similarly, in Fig. 11(b), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 42.3%, 26.9% and 23.1% energy consumption reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the reductions reach 38.1%, 28.6% and 19.5%, respectively. The reason is that when the number of virtual machines grows, the number of active virtual machines also grows. The MCMKCut and CAWT algorithms mainly consider the task transmission time and transmission cost, and do not make full use of virtual machine resources; the MOSACO algorithm tends to consume more energy because it does not consider the energy consumption of the servers. The proposed task scheduling algorithm allocates resources reasonably and minimizes energy consumption by optimizing the completion time of all tasks. Therefore, when the number of virtual machines increases, the proposed task scheduling algorithm reduces energy consumption more effectively than the MCMKCut, CAWT and MOSACO algorithms.

As shown in Fig. 12, the QoS success ratio of all four algorithms increases with the number of virtual machines. In Fig. 12(a), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 5.4%, 2.1% and 1.7% QoS success ratio improvement over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the improvements reach 1.8%, 0.8% and 0.6%, respectively. Similarly, in Fig. 12(b), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 6.1%, 2.5% and 1.1% QoS success ratio improvement over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the improvements reach 3.6%, 1.6% and 1.0%, respectively. The reason is that the MCMKCut algorithm considers the task transmission time to reduce the task completion time, and the CAWT algorithm considers the task transmission and execution costs; neither considers how to use the virtual machine resources more effectively. The MOSACO algorithm considers the impact of deadlines on the deadline violation rate, which is also a QoS indicator. The proposed task scheduling algorithm allocates resources reasonably and coordinates the execution progress of all tasks so that more tasks are completed within their deadlines. Therefore, when the number of virtual machines increases, the proposed task scheduling algorithm improves the QoS success ratio more effectively than the MCMKCut, CAWT and MOSACO algorithms.

(a) CTP=20% (b) CTP=80%
Fig. 12 The effect of the number of virtual machines on QoS success ratio (x-axis: number of virtual machines N; y-axis: QoS success ratio (%); curves: MCMKCut, CAWT, MOSACO, proposed)

(a) CTP=20% (b) CTP=80%
Fig. 13 The effect of the number of virtual machines on the total communication traffic (x-axis: number of virtual machines N; y-axis: communication traffic (MB); curves: MCMKCut, CAWT, MOSACO, proposed)

As shown in Fig. 13, the total communication traffic of all four algorithms increases with the number of virtual machines. In Fig. 13(a), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 49.9%, 31.1% and 13.0% communication traffic reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the reductions reach 32.9%, 27.5% and 7.6%, respectively. Similarly, in Fig. 13(b), when the number of virtual machines is 15, the proposed task scheduling algorithm achieves up to 36.5%, 20.3% and 6.3% communication traffic reduction over the MCMKCut, CAWT and MOSACO algorithms, respectively; when the number of virtual machines is 30, the reductions reach 18.1%, 7.5% and 2.9%, respectively. The reason is that, for the same optimization algorithm, the total network traffic differs under different network topologies. The MCMKCut, CAWT and MOSACO algorithms do not consider the network topology of the datacenter, whereas the multi-objective task scheduling algorithm proposed in this paper considers the scheduling topology of all virtual machines and selects the optimal nodes to perform tasks. Therefore, when the number of virtual machines increases, the proposed task scheduling algorithm reduces the total communication traffic more effectively than the MCMKCut, CAWT and MOSACO algorithms.

5.3 Experiment Summary

In these experiments, we first compare the proposed job scheduling algorithm with the traditional SRPT and SWAG algorithms in terms of the average job waiting time, the average job response time and the system throughput. The results show that the proposed job scheduling algorithm schedules jobs more efficiently by considering the state of each cloud, and the system throughput improves significantly because the resources of each cloud are fully used. We then compare the proposed task scheduling algorithm with the MCMKCut, CAWT and MOSACO algorithms in terms of the average completion time, system energy consumption and QoS success ratio. The results show that the proposed task scheduling algorithm schedules tasks more reasonably by considering both energy consumption and task execution efficiency: thanks to the reasonable allocation of cloud resources, the average task completion time and the system energy consumption are significantly reduced, and the QoS success ratio is also improved.

6. Conclusion

In this paper, an effective scheduling strategy based on hypergraph partition in geographically distributed datacenters is proposed to utilize cloud resources efficiently. First, a queuing model based on M/M/c queuing theory is established to minimize the response time in each cloud, and the job scheduling problem in geographically distributed datacenters is transformed into the problem of minimizing the total response time; workflow jobs are assigned to each cloud according to the job arrival rate. In addition, a workflow task scheduling algorithm based on the shortest path algorithm is proposed to minimize the overall task completion time and energy consumption. Finally, the performance of the proposed workflow scheduling method is evaluated through extensive experiments. The results indicate that the proposed method effectively utilizes cloud resources and reduces system energy consumption.
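The M/M/c response time minimized by the queuing model can be made concrete with the standard Erlang-C formulas: for arrival rate λ, per-server service rate μ and c servers, the mean response time is W = C(c, λ/μ)/(cμ − λ) + 1/μ, where C is the Erlang-C probability that an arriving job must wait. A sketch using textbook formulas (the numeric rates are illustrative, not the paper's parameters):

```python
from math import factorial

def mmc_response_time(lam, mu, c):
    """Mean response time W of a stable M/M/c queue (requires lam < c*mu)."""
    rho = lam / (c * mu)              # per-server utilization
    if rho >= 1:
        raise ValueError("unstable queue: lam must be < c*mu")
    a = lam / mu                      # offered load in Erlangs
    # Erlang-C probability that an arriving job has to wait for a server
    s = sum(a**k / factorial(k) for k in range(c))
    erlang_c = (a**c / factorial(c)) / ((1 - rho) * s + a**c / factorial(c))
    wq = erlang_c / (c * mu - lam)    # mean waiting time in queue
    return wq + 1 / mu                # waiting time + service time

# illustrative cloud: 10 VMs, 8 jobs/s arriving, 1 job/s per VM
w = mmc_response_time(lam=8.0, mu=1.0, c=10)
```

For c = 1 this reduces to the familiar M/M/1 result W = 1/(μ − λ), which is a convenient sanity check on the implementation.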

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grants No. 61672397 and No. 61871352, the Application Foundation Frontier Project of Wuhan (No. 2018010401011290), the Open Fund of the Science and Technology on Parallel and Distributed Processing Laboratory of the National University of Defense Technology, and the Open Fund of the Key Laboratory of Metallurgical Equipment and Control Technology of the Ministry of Education (No. MECOF2020B01). Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of the above agencies.

Author Statement Chunlin Li, Yihan Zhang, Hao Zhiqiang, Luo Youlong designed the study, developed the methodology, performed the analysis, and wrote the manuscript. Chunlin Li, Yihan Zhang collected the data. Chunlin Li, Yihan Zhang also revised the paper according to the comments.


Biographical notes: Chunlin Li is a Professor of Computer Science at Wuhan University of Technology. She received her ME in Computer Science from Wuhan Transportation University in 2000 and her PhD in Computer Software and Theory from Huazhong University of Science and Technology in 2003. Her research interests include cloud computing and distributed computing. Yihan Zhang received her BS degree in computer network engineering from Anhui Agricultural University in 2018. She is now an MS student at Wuhan University of Technology. Her research interests include cloud computing and big data. Hao Zhiqiang received his MS degree in mechanical engineering from Wuhan University of Science and Technology. He is now an engineer at the Key Laboratory of Metallurgical Equipment and Control Technology of the Ministry of Education (Wuhan University of Science and Technology). His research direction is metallurgical equipment failure and diagnostic analysis. Youlong Luo is an associate professor of Management at Wuhan University of Technology. He received his MS in Telecommunication and System from Wuhan University of Technology in 2003 and his PhD in Finance from Wuhan University of Technology in 2012. His research interests include cloud computing and electronic commerce.
