Load balance based workflow job scheduling algorithm in distributed cloud


Journal Pre-proof

PII: S1084-8045(19)30378-9
DOI: https://doi.org/10.1016/j.jnca.2019.102518
Reference: YJNCA 102518
To appear in: Journal of Network and Computer Applications
Received Date: 29 May 2019
Revised Date: 12 December 2019
Accepted Date: 16 December 2019

Please cite this article as: Li, C., Tang, J., Ma, T., Yang, X., Luo, Y., Load balance based workflow job scheduling algorithm in distributed cloud, Journal of Network and Computer Applications (2020), doi: https://doi.org/10.1016/j.jnca.2019.102518.

© 2019 Published by Elsevier Ltd.

Load balance based workflow job scheduling algorithm in distributed cloud

Chunlin Li 1,2,3,*, Jianhang Tang 1, Tao Ma 2, Xihao Yang 2, Youlong Luo 1

1 Department of Computer Science, Wuhan University of Technology, Wuhan 430063, P.R. China
2 State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System, Baotou, Inner Mongolia 014030, P.R. China
3 CAAC Key Laboratory of General Aviation Operation, Civil Aviation Management Institute of China, Beijing 100102, P.R. China
* Corresponding author

Abstract

As the scale of the geo-distributed cloud increases and workflow applications become more complex, system operation is increasingly likely to waste resources and consume excessive energy. In this paper, a workflow job scheduling algorithm based on load balancing is proposed to utilize cloud resources efficiently. Firstly, the execution time of the jobs on a cloud is estimated based on the state of that cloud. Then, a queuing model is established for each cloud to minimize the total response time of the system. Finally, the job scheduling problem in a geo-distributed cloud is transformed into the problem of minimizing the system response time. Moreover, a workflow task scheduling algorithm based on the shortest path algorithm is proposed to minimize the total task completion time and energy consumption. Firstly, the directed acyclic graph (DAG) of the tasks is converted into a hypergraph according to the execution order of the tasks. Then, a k-path hypergraph partition is performed under the balance constraint of the hypergraph. Finally, the Dijkstra shortest path algorithm is used to find the optimal task scheduling strategy, which is performed on each hypergraph partition. The experimental results indicate that our proposed workflow scheduling method can effectively utilize cloud resources and reduce system energy consumption. Moreover, the applicability of the proposed scheduling strategy is shown in the scenario of a new media live video application.

Keywords: distributed cloud; job scheduling; task scheduling; workflow jobs

1. Introduction

In a geo-distributed cloud, clouds are deployed in different geographical locations. The geo-distributed cloud is a promising solution to the data explosion: Google's 13 data centers are deployed across 8 countries on 4 continents [1]. Some applications, such as media streaming applications and virtual reality, can benefit from such a system [2]. To improve the utilization of the geo-distributed cloud, user requests are distributed among different data centers. Compared with traditional cloud computing, the geo-distributed cloud can provide greater processing capacity and more storage, which can guarantee the quality of service. Thus, it is significant to study the job scheduling problem in the geo-distributed cloud environment. Users increasingly demand cross-regional cloud services and higher cloud service quality, which greatly promotes the development of geo-distributed clouds. More and more complex workflow applications are emerging, such as face recognition, image processing, video processing, and online shop user behavior analysis. The defining feature of workflow applications is that there are certain dependencies between tasks. Workflow scheduling is to discover resources and dispatch the tasks of workflow applications onto suitable resources [3]. The data dependency between workflow tasks plays an important role. Thus, how to schedule these tasks on appropriate resources is an important problem. The higher the complexity of the data association, the higher the demands on computing performance and processing power. Geo-distributed clouds can effectively handle these complex workflow applications by providing virtually unlimited on-demand virtual resources.

Load balancing is one of the common objectives of workflow scheduling schemes [4]. To avoid over-utilized or under-utilized clouds, workflow jobs should be dispatched based on load balancing to improve the utilization of all cloud resources in the geo-distributed cloud environment. Load balancing can improve Quality of Service (QoS), including response time, cost, throughput and performance [5]. Geographical load balancing can exploit the server heterogeneity of geo-distributed clouds to dispatch jobs so that response time and energy cost can be reduced [2], [6]. Server consolidation can help the data center operator avoid server sprawl by combining the workloads of different servers onto a set of target servers. Resource reconfiguration can reconfigure the virtual machines (VMs) to improve the utilization of computation resources. These two methods, server consolidation and resource reconfiguration, are suitable for data centers without geographical transfer costs [7]. However, it is costly to dispatch jobs and deliver VMs to applications among different clouds. Thus, load balancing is applied in this work. By spanning geo-distributed data centers at different locations, a cloud platform can be provided with larger computing capacities. The burden of computation, storage and communication incurs considerable expenditure for users. For timely responses, tasks should be distributed among the cloud sites suitably to make full use of the cloud resources [8]. For resource management in geo-distributed clouds, how to achieve load balance of workflow applications among various clouds and schedule tasks with certain dependencies is an interesting research problem. The user requests are distributed geographically: some clouds with fewer tasks may have idle computation resources, while other clouds are overloaded. This imbalance results in a significant waste of resources. Thus, it is significant to study scheduling methods for workflow applications to improve the performance of geo-distributed clouds.

The main contributions are summarized as follows. (1) We formulate the job scheduling problem in geo-distributed clouds. The average waiting time of the jobs is obtained based on Little's law, and the system response time can then be obtained. The job scheduling problem is formulated as a system response time minimization problem. (2) To deal with the DAG task scheduling problem, a workflow task partitioning algorithm based on hypergraphs is proposed. The task scheduling problem is transformed into a shortest path problem, which is further solved by the Dijkstra algorithm based on a Fibonacci heap. (3) We take a real live video application scenario as an example to introduce our proposed method. The experimental results show that the proposed scheduling algorithms can improve the system performance and achieve load balancing in each cloud.

2. Related Work

In this section, the related work on job scheduling and task scheduling is reviewed. Job scheduling dispatches jobs, which can be split into many tasks, to the optimal resources; task scheduling dispatches the tasks divided from jobs to the optimal resources. At present, research on job scheduling in the geo-distributed cloud mainly concerns reducing the job execution time and maintaining the system load balance. Chen et al. [9] proposed a fair job scheduling strategy, which aims to realize the fair sharing of resources in geo-distributed data centers and considers data location and network bandwidth. Xu et al. [10] studied the cost minimization of big data analysis in geo-distributed data centers connected to renewable energy with unpredictable capacity and proposed a job scheduling algorithm based on reinforcement learning (RL). Convolbo et al. [11] proposed a low-complexity heuristic scheduling algorithm to solve data-intensive job scheduling problems in geo-distributed data centers; this algorithm allows data to be processed locally and transmitted in real time for higher performance. Li et al. [12] designed a two-dimensional online control algorithm, which aims to characterize and optimize the cost-performance tradeoff of geo-distributed data analysis, but it lacks full utilization of resources. Forestiero et al. [13] proposed two algorithms for workload distribution and migration in geo-distributed data centers to achieve load balancing and energy cost reduction. Li et al. [14] mainly studied how to map MapReduce jobs so that the least amount of data is transmitted between data centers for geo-distributed big data, while providing the expected job completion time. Zhou et al. [15] proposed an adaptive resource management algorithm based on reinforcement learning for differentiated services in geo-distributed data centers to achieve a balance between QoS revenue and power consumption. Cavallo et al. [16] proposed a job scheduling method in geo-distributed clouds based on the Late Acceptance Hill-Climbing algorithm; with the proposed job scheduling algorithm, sub-jobs can be finished within very limited scheduling time. Although the above studies all analyze job scheduling in geo-distributed environments, unlike this paper, they do not consider the different computing power of different clouds. We consider the job execution capabilities of the clouds by comparing the computing power of each cloud and assigning jobs to each cloud in an appropriate proportion. The goal is to minimize the job response time and keep the load of the geo-distributed cloud system balanced. The comparisons between our work and current job scheduling methods are listed in Table 1.

Table 1 Comparisons between our work and current job scheduling methods in geo-distributed cloud

[9]. Environment: geo-distributed data centers. Objective: the job completion time across all concurrent jobs while maintaining max-min fairness. Performance metrics: the worst and second-worst job completion time, the algorithm running time, the average improvement ratios. Applications: MapReduce applications.
[10]. Environment: geo-distributed data centers. Objective: the costs of the data center. Performance metrics: the impacts of network structures, the performance of RPS, the impacts of UBN. Applications: Internet services.
[11]. Environment: geo-distributed data centers. Objective: the makespan of processing jobs. Performance metrics: the normalized makespan, the percentage of scheduled jobs. Applications: Data-intensive applications.
[12]. Environment: geo-distributed data centers. Objective: the inter-DC traffic cost, the system throughput and a maximum delay for each query request. Performance metrics: the time-averaged cost, the time-averaged throughput, the maximum queuing delay. Applications: Data analytics applications.
[13]. Environment: geo-distributed data centers. Objective: energy costs, guaranteed load balancing and the carbon emission. Performance metrics: the energy cost, the utilization of DCs, the power consumption of DCs. Applications: Web applications.
[14]. Environment: geo-distributed data centers. Objective: the inter-DC traffic generated by a cross-DC MapReduce job. Performance metrics: the average inter-DC traffic, the average job completion time. Applications: Data analytics applications.
[15]. Environment: geo-distributed data centers. Objective: the balance between QoS revenue and power consumption. Performance metrics: the average QoS revenue, the average power consumption. Applications: Internet services.
[16]. Environment: geo-distributed cloud. Objective: the overall job execution time. Performance metrics: the job execution time. Applications: Big data applications.
Our work. Environment: geo-distributed cloud. Objective: utilize cloud resources and minimize the total response time of the system. Performance metrics: the average waiting time, the average response time and the system throughput. Applications: Live video applications.

At present, the goal of related research on task scheduling in the geo-distributed cloud is mainly to reduce the data transmission traffic between different data centers and to shorten the overall completion time [50,51,52]. Hu et al. [17] implemented Flutter, a new task scheduling algorithm that can reduce the completion time and network cost of big data processing jobs across geographically distributed data centers. Wang et al. [18] proposed a processor-level migration algorithm to reschedule the remaining tasks among processors on an individual server, dynamically balance the workloads and lower the total ECR on that server. Geng et al. [19] proposed a novel scheme called CODE to minimize the makespan of tasks with consideration of both the correlation and the dependency constraints among tasks. Yuan et al. [20] presented a workload-aware revenue maximization (WARM) approach to maximize the revenue of the data center providers; WARM yields schedules that reduce the round-trip time of tasks for all applications. Żotkiewicz et al. [21] proposed a scheduling strategy that dynamically assigns tasks to computing servers based on the current load of network links and servers within the data center. Lu et al. [22] formulated an optimization problem to maximize the time-average profit from serving data-oriented tasks in a cloud DC system and then leveraged Lyapunov optimization techniques to propose an efficient scheduling algorithm. Yuan et al. [23] presented a time-aware task scheduling (TATS) algorithm that investigates the temporal variation and schedules all admitted tasks to execute in GDC without violating their delay constraints. Zhang et al. [24] proposed a task scheduling method for scientific workflows in geo-distributed datacenters; this algorithm was based on the multilevel graph coarsening and un-coarsening frameworks, and the authors also applied a hybrid genetic algorithm with graph-partition-driven features of repair and local improvement. Alshaer [25] studied the networking aspect of cloud computing to provision the cloud network as a service and engineer networks of data centers, and also discussed network virtualization and DCN virtualization technologies. Although most of the current work considers data transmission time and task completion time, in this paper we aim to schedule workflow tasks more smoothly in order to reduce the system energy consumption and fully utilize the resources, while ensuring load balancing on each computing node. The comparisons between our work and current task scheduling methods are listed in Table 2.

Table 2 Comparisons between our work and current task scheduling methods in geo-distributed cloud

[17]. Environment: geo-distributed data centers. Objective: both completion time and network costs of big data processing jobs. Performance metrics: the job computation times, the stage completion times, the amount of data transferred in bytes. Applications: Big data processing applications.
[18]. Environment: geo-distributed cloud. Objective: to save energy, to dynamically balance the workloads and lower the total ECR on the server. Performance metrics: the energy consumption ratio, energy saving percent, active server number. Applications: N/A.
[19]. Environment: geo-distributed cloud. Objective: the completion time of tasks. Performance metrics: the task completion time. Applications: Data-parallel applications.
[20]. Environment: geo-distributed cloud. Objective: the round-trip time of tasks for all applications. Performance metrics: the revenue, the task arrival rates, round-trip time (RTT), variance of average VM utilization. Applications: Heterogeneous applications.
[21]. Environment: geo-distributed data center. Objective: the server energy consumption and dependencies between tasks. Performance metrics: the energy consumption. Applications: Mobile applications.
[22]. Environment: geo-distributed data centers. Objective: how network resources and IT resources can support the data-oriented tasks dynamically and cost-effectively. Performance metrics: the time-average profit, the task processing delay, the average throughput and computation time. Applications: E-Science and backup applications.
[23]. Environment: geo-distributed data center. Objective: to achieve profit maximization by cost-effectively scheduling tasks while meeting their delay constraints. Performance metrics: the task arrival rates, the profit and penalty, consumption of grid and green energy, number of tasks, execution time, throughput. Applications: Big data processing applications.
[24]. Environment: geo-distributed data centers. Objective: the cross-datacenter data transfer when allocating data. Performance metrics: the data transfer and run time. Applications: Scientific workflow applications.
Our work. Environment: geo-distributed cloud. Objective: the total task completion time and energy consumption. Performance metrics: the task average completion time, system energy consumption and QoS satisfaction rate. Applications: Live video applications.

3. Load balance based scheduling method for workflow in the distributed cloud

3.1 The architecture of distributed cloud

Fig. 1 The overview of the geo-distributed cloud architecture

Fig. 1 depicts the overview of the geo-distributed cloud. As shown in Fig. 1, a geo-distributed cloud system includes a central controller, a fault-tolerance manager, multiple data centers and users. The users in various geographic locations submit their workflow jobs to the data centers near them. When jobs arrive at the data centers, the central controller is responsible for dispatching the jobs to each data center according to our proposed job scheduling method. The function of the central controller is to collect the information of the jobs submitted by users and schedule the jobs to appropriate data centers. The central controller can be deployed at the place where all the data centers can communicate with it with the least transmission time [44] or the least energy cost [45]. The data centers, deployed in different geographical locations based on the user density, are responsible for executing the jobs assigned by the central controller; their function is to provide computation capacity for executing jobs. Since the central controller works in a centralized manner, a failure of this controller is costly. A fault-tolerance manager installed near the controller is responsible for monitoring and recovering the central controller [26], [46]. The fault-tolerance manager works independently from the central controller in the background [46]. Moreover, both the central controller and the fault-tolerance manager work in a centralized manner [26], [46], [47]. The functions and responsibilities of the fault-tolerance manager are discussed in Section 3.2. The flow of workflow job scheduling is as follows. Data centers send the progress of executing jobs to the central controller. Then, the central controller schedules newly arrived jobs to the clouds accordingly. When jobs arrive at the clouds, they are divided into multiple tasks with mutual dependencies. In each cloud, tasks are scheduled onto the computation resources according to the scheduling method proposed in this work.

3.2 Discussion of fault-tolerance for the central controller

The central controller is the global scheduler that makes job-level scheduling decisions by gathering load information in the system. Central controller faults may occur due to hardware and software failures, such as interruption of the power supply, software crashes and processor crashes [26], [27], [28]. As shown in Fig. 1, the fault-tolerance manager for the central controller integrates three modules: a replication manager module, a prediction manager module and a recovery manager module [27]. The fault-tolerance manager is installed near the central controller to cope with failures of the central controller [26], [46]. The responsibility of these three modules is to improve the reliability of the central controller. The function of the replication manager module is to produce replicas of the central controller; the algorithm in the replication manager module is the Byzantine fault tolerance technique [48]. The function of the prediction manager module is to predict and detect faults; the algorithms in the prediction manager module are the preemptive migration and software rejuvenation methods [49]. The function of the recovery manager module is to recover the system after fault detection; the algorithm in the recovery manager module is the software redundancy technique [28]. The fault-tolerance methods applied to the central controller in our work, the preemptive migration algorithm and the software redundancy method, are shown in Fig. 2. For the replication manager module, when a request arrives at the geo-distributed data centers, the request is processed on different nodes based on the Byzantine fault tolerance technique [48]. For the prediction manager module, the system data are monitored all the time to trigger an alarm; an architectural transformation tool is then used to act in response to the gauges, which can detect differences between the running and the actual system; finally, the preemptive migration algorithm is applied to send the task to a similar machine [49]. For the recovery manager module, the information from the central controller is first used for repair strategies; then, service and contract protocols are applied to dynamically substitute a healthy service for the central controller; finally, the central controller is reconfigured based on software redundancy methods [28].

Fig. 2 Fault-tolerance modules for central controller

3.3 Load balance based workflow job scheduling method

Table 3 Summary of Notations

RU: the average CPU and memory utilization of each server in the cloud
P_j: the probability that the jobs are scheduled to the jth cloud
L_init: the number of initial waiting jobs
L_jq: the average job number in the qth queue in the jth cloud
L_(i,j): the Euclidean distance between the areas i and j
D^c_(i,j): the communication time between the areas i and j
T_total: the response time of the entire system
x(Π): the cutting size metric
ec[e](τ_e − 1): the effect of each cutting edge on the cut size
T_(u,v): the workflow task completion time of the partition
E_(u,v): the workflow task execution energy consumption of the partition
E_j(t): the power consumption of the data center j at the time t
PUE_j: the ratio of the total power to the power delivered to the computing devices

The notations used in this work are summarized in Table 3. First, the states of the clouds should be obtained so that jobs can be dispatched appropriately. In this work, the cloud states are classified into two levels, idle and busy. Logistic Regression is a method to explain the relationship between one binary variable and continuous variables [30]. Thus, Logistic Regression is applied to classify the cloud states. The three main factors affecting the service rate of the cloud can be defined as f = (Load, RU, JC), where the cloud workload Load is the total size of all pending and executing jobs in the cloud, the average resource utilization RU is the average CPU and memory utilization of each server in the cloud, and the complexity JC of the jobs is mainly reflected in the job types. The sigmoid function can be defined as

σ(f) = 1 / (1 + e^(−a^T f))    (1)

where a = (a1, a2, a3) denotes the parameters for workload, resource utilization and complexity. As shown in equation (1), it holds that 0 ≤ σ(f) ≤ 1. When σ(f) ≥ 0.5, the service rate of the cloud is estimated as µ = µ1; when σ(f) < 0.5, µ = µ2, where µ denotes the service rate of the cloud with µ1 (idle) and µ2 (busy). The probability that the cloud state is busy can be expressed as P(y = 1 | f) = σ(f), and the probability that the cloud state is idle as P(y = 0 | f) = 1 − σ(f). The likelihood function of N independent samples y1, y2, ..., yN can be defined as

L(a) = ∏_{i=1}^{N} σ(f_i)^{y_i} (1 − σ(f_i))^{1−y_i}    (2)

In order to obtain the optimal parameters a, the gradient ascent method is applied to maximize the likelihood function proposed in equation (2). To train the Logistic Regression model, the training set comes from the historical system log, including the workload, average resource utilization, computation complexity and cloud types.
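A minimal NumPy sketch of this maximum-likelihood fit by gradient ascent follows; the toy feature matrix and labels are illustrative assumptions, while the step size 0.01 and the 500 iterations mirror the settings reported in Section 6.2.1.

```python
import numpy as np

# Gradient ascent on the log-likelihood of equation (2): for logistic
# regression, d log L / d a = X^T (y - sigma(X a)).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.01, iters=500):
    a = np.zeros(X.shape[1])                    # parameters (a1, a2, a3)
    for _ in range(iters):
        a += lr * X.T @ (y - sigmoid(X @ a))    # ascend the likelihood
    return a

# Toy training set: columns are (Load, RU, JC); label 1 = busy, 0 = idle.
X = np.array([[0.9, 0.8, 0.7], [0.2, 0.3, 0.1],
              [0.8, 0.9, 0.6], [0.1, 0.2, 0.2]])
y = np.array([1.0, 0.0, 1.0, 0.0])

a = fit_logistic(X, y)
busy = sigmoid(X @ a) >= 0.5          # sigma(f) >= 0.5 -> service rate mu_1
print("learned parameters:", a, "busy flags:", busy)
```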

The cloud users submit jobs to each cloud. The number of users is usually high, and each user submits a job at a time with low probability. Thus, the job arrival rate follows the Poisson distribution [31], [32]. We assume that the job arrival rates follow the Poisson distribution with a parameter λ, so that the arrival rate at the jth cloud can be expressed as λ_j = λP_j (j = 1, 2, ..., N), where P_j denotes the probability that the jobs are scheduled to the jth cloud. Since the resources in each cloud are limited, at most c jobs can be executed simultaneously in every cloud, and the remaining jobs are queued to wait for resources. We can simulate the number of jobs in the M/M/c queuing system through the birth-death process. The stable conditions of each queue can be described as follows:

P_jn = λ_j^n / (n! µ_j^n),             n < c
P_jn = λ_j^n / (c^(n−c) c! µ_j^n),     n ≥ c        (3)

where P_jn is the probability of n jobs executed in the jth cloud and µ_j indicates the service rate of the jth cloud. Denote ρ_j = λ_j / (cµ_j) as the service intensity of cloud j; it must hold that ρ_j < 1 for the queueing system to be stable. Therefore, L_jq = L_init + Σ_{n=c}^{∞} (n − c)P_jn = L_init + ρ_j P_j∞ / (1 − ρ_j) denotes the average job number in the queue, where L_init is the number of initial waiting jobs. The communication time within the same data center is a constant c (ms), and L_(i,j) is the Euclidean distance between the areas i and j. Thus, the communication time is D^c_(i,j) = δ(time)/δ(distance) × L_(i,j) (km) + c (ms).

The response time T_j of the jth cloud, consisting of the waiting time, the data transmission time and the job execution time, can be expressed as T_j = W_jq + 1/µ_j + D, where W_jq = L_jq / λ_j denotes the average queue waiting time according to Little's law and D denotes the data transmission time. T_total represents the response time of the entire system. We transform the workflow job scheduling problem in the geo-distributed cloud into the problem of minimizing the response time of the whole system:

min T_total = min Σ_{j=1}^{N} P_j T_j = min Σ_{j=1}^{N} (λ_j / λ)(L_jq / λ_j + 1/µ_j + D)    (4)

s.t. Σ_{j=1}^{N} λ_j = λ    (5)
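To make the queuing model concrete, the sketch below evaluates the M/M/c waiting time via the Erlang C formula and minimizes the total response time of equation (4) under the constraint of equation (5) with scipy; the arrival rate, service rates, server count and transmission delays are assumed toy values, not the paper's experimental settings.

```python
import math
from scipy.optimize import minimize

# Sketch of equations (4)-(5): split a Poisson arrival stream of rate
# lam_total across N clouds to minimize the weighted total response time.

def mmc_wait(lam, mu, c):
    """Average M/M/c queue waiting time W_q = L_q / lambda (Little's law)."""
    rho = lam / (c * mu)
    if rho >= 1.0:
        return 1e9                      # unstable queue: large penalty
    s = sum((c * rho) ** n / math.factorial(n) for n in range(c))
    pc = (c * rho) ** c / (math.factorial(c) * (1 - rho))
    p_wait = pc / (s + pc)              # Erlang C: P(arriving job waits)
    return p_wait * rho / (1 - rho) / lam

def total_response(lams, mu, c, D, lam_total):
    # T_total = sum_j (lambda_j / lambda) * (W_jq + 1/mu_j + D_j), eq. (4)
    return sum((l / lam_total) * (mmc_wait(l, m, c) + 1.0 / m + d)
               for l, m, d in zip(lams, mu, D))

lam_total = 100.0                       # overall job arrival rate (assumed)
mu = [12.0, 8.0, 10.0]                  # per-cloud service rates (assumed)
D = [0.05, 0.12, 0.08]                  # per-cloud transmission times (assumed)
c = 4                                   # jobs executed simultaneously per cloud

res = minimize(total_response, x0=[40.0, 25.0, 35.0],   # feasible initial split
               args=(mu, c, D, lam_total),
               bounds=[(1e-3, lam_total)] * 3,
               constraints=[{"type": "eq",              # eq. (5)
                             "fun": lambda x: sum(x) - lam_total}])
print("optimal per-cloud arrival rates:", res.x)
```

This mirrors lines 14 to 20 of Algorithm 1 in Section 4.1, which perform the same queueing computation before solving the constrained minimization.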

Fig. 3 An example of the workflow job scheduling method

Fig. 3 illustrates an example of the proposed load-balancing scheduling method for workflow jobs with 2 data centers and 2 submitted jobs. The optimal arrival rates for job 1 at the two data centers are 18 and 45 per minute respectively, and the optimal arrival rates for job 2 are 53 and 26 per minute respectively. The resulting schedule is that job 1 is assigned to data center 2 for execution and job 2 is assigned to data center 1.

3.4 Workflow task scheduling method based on the shortest path

Big data processing in a geo-distributed cloud may involve a directed acyclic graph that contains hundreds of tasks. However, a simple DAG can only reflect the sequential execution relationship between tasks and cannot consider the relationship between multiple tasks from a global perspective. Thus, the DAG form of the workflow is transformed into a hypergraph according to hypergraph theory. The corresponding hypergraph is partitioned in two stages: (1) the multilevel coarsening initialization hypergraph partitioning; (2) the multilevel k-path refinement hypergraph partitioning. The purpose of the workflow task hypergraph partition is to divide a large number of scattered tasks into several smaller task subsets according to specific constraints. Each task subset after partitioning has the same computing scale, and the number of dependencies among task subsets is minimal, thus minimizing the data transmission traffic between different data centers and shortening the overall completion time. The hypergraph partitioning problem has been shown to be NP-hard.

A hypergraph H = (V, E) is a further generalization of a graph in which an edge can join any number of vertices. Denote the vertex set V as the task set and a hyperedge e ∈ E as a subset of the vertex set V. A k-path partition of a hypergraph H is a partition of its vertex set, which can be expressed as Π = {V1, V2, ..., VK}; each vertex belongs to exactly one of the K output sets. Each part of the partition Π needs to meet the following three conditions: (1) the partitions are pairwise disjoint, with V_k ∩ V_l = ∅ for all 1 ≤ k < l ≤ K; (2) every partition is non-empty, with V_k ⊆ V and V_k ≠ ∅ for 1 ≤ k ≤ K; (3) the union of the K partitions is equal to V, with ∪_{k=1}^{K} V_k = V. W_k = Σ_{v∈V_k} w[v] denotes the sum of the vertex weights in partition V_k, and W_avg = Σ_{v∈V} w[v] / K is the weight of each partition when all vertex weights are uniformly distributed. If every partition V_k ∈ Π (k = 1, 2, ..., K) satisfies the balance criterion W_k ≤ W_avg(1 + ε), then Π satisfies the equilibrium of ε, where ε is called the maximum allowable imbalance rate. The hypergraph partition problem can be defined as searching for the k-path equilibrium partition Π that minimizes x(Π), which is an NP-hard problem [33]. The task hypergraph partition for workflow applications can be expressed as follows:

min x(Π) = min Σ_{e∈E} ec[e](τ_e − 1)    (6)

s.t. W_k ≤ W_avg(1 + ε)    (7)

where x(Π) is the cutting size metric and ec[e](τ_e − 1) is the effect of each cutting edge on the cut size. ec[·] is the cost of the edge. The number of partitions connected by a hyperedge e is called the connectivity of the hyperedge, expressed as τ_e.
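To make the cut metric concrete, the following sketch evaluates equations (6) and (7) on a toy hypergraph; the task names, edge costs and partition assignment are illustrative assumptions rather than data from the paper.

```python
from collections import defaultdict

def cut_size(hyperedges, edge_cost, part_of):
    """x(Pi) = sum over hyperedges e of ec[e] * (connectivity(e) - 1), eq. (6)."""
    total = 0.0
    for e, pins in hyperedges.items():
        parts = {part_of[v] for v in pins}         # partitions touched by e
        total += edge_cost[e] * (len(parts) - 1)   # the (tau_e - 1) term
    return total

def is_balanced(weights, part_of, k, eps):
    """Check the balance criterion W_k <= W_avg * (1 + eps) of equation (7)."""
    load = defaultdict(float)
    for v, w in weights.items():
        load[part_of[v]] += w
    w_avg = sum(weights.values()) / k
    return all(l <= w_avg * (1 + eps) for l in load.values())

# Toy instance: 5 tasks, 3 hyperedges (dependency groups), 2 partitions.
hyperedges = {"e1": ["t1", "t2"], "e2": ["t2", "t3", "t4"], "e3": ["t4", "t5"]}
edge_cost = {"e1": 1.0, "e2": 2.0, "e3": 1.0}
weights = {t: 1.0 for t in ["t1", "t2", "t3", "t4", "t5"]}
part_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 1, "t5": 1}

print("x(Pi) =", cut_size(hyperedges, edge_cost, part_of))  # 2.0: only e2 is cut
print("balanced:", is_balanced(weights, part_of, k=2, eps=0.1))
```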

The hypergraph partitioning process for workflow jobs consists of the following six steps. (1) The goal of the k-path initialization hypergraph partitioning is to obtain the coarsened hypergraph sequence H^1, H^2, ..., H^m. For each unmatched vertex v_i, the vertex v_j is found among all unmatched vertices to maximize Σ_{e_h∈E_ij} ec(e_h); the matched vertices v_i and v_j are regarded as hyper points in the vertex set V^(l+1) of the coarser hypergraph. The coarsening process may produce edges with the same vertices in the coarsened hypergraph, and such edges can be deleted. (2) Then, the m coarsened hypergraphs H^1, H^2, ..., H^m are obtained, and the multilevel recursive partition scheme is used to obtain an initial partition Π^m = {V_1^m, V_2^m, ..., V_K^m} of the coarsened hypergraph H^m. (3) If the current imbalance exceeds the allowable imbalance ε, the vertices of the K partitions are moved so that the imbalance becomes less than ε, at the cost of the partition dimension. (4) In the refinement process, the vertices of the K vertex partitions are moved to refine partition Π^l and minimize the partition dimension while maintaining the balance constraint. (5) In the projection process, the current coarsened hypergraph H^l and partition Π^l are projected onto the next finer level, and the refinement and projection steps are repeated iteratively until the flat hypergraph H^0 with partition Π^0 is obtained. A FIFO queue B is used to record boundary vertices; after the move gain is calculated, a vertex is moved to the part with the highest positive FM gain, and moves with negative gain are not performed. (6) The refinement process completes when queue B is empty; if the predetermined number of refinements is reached or the cut size falls below the predetermined threshold, the refinement process is no longer performed.

In this paper, the workflow task scheduling problem can be transformed into a directed graph scheduling problem. The directed graph in this paper is a sparse directed graph, which can effectively reduce the time complexity of the proposed algorithm. The graph literature defines sparse graphs as those that satisfy |E| < O(|V|²). The concrete proof is given in Theorem 1.

Theorem 1: The DAG task scheduling problem in the geo-distributed cloud can be represented by a sparse directed acyclic graph.

Fig. 4 Workflow task scheduling topology for the maximum partition path

Proof: We consider a geo-distributed cloud system where there are J data centers located in different geographical locations. k denotes the number of partitions corresponding to the maximum partition of paths. The workflow task scheduling topology with the path of the maximum partition is shown in Fig. 4. According to the relation between vertices and edges, the following formulas can be obtained:

|E| = J + (k − 1)J² + J = (k − 1)J² + 2J    (8)

|V| = kJ + 2    (9)

After transformation, we have J·|V| = O(kJ²) = O(|E|) and k = O(|V|/J). Let |E| = O(|V|^q); since |E| = O(J·|V|) and the number J of data centers is a small constant compared with |V|, it follows that 1 ≤ q < 2, i.e., |E| < O(|V|²). Therefore, the DAG task scheduling problem in the geo-distributed cloud can be represented by a sparse graph.

The workflow task completion time of a partition in the path can be calculated as T_(u,v) = workload / (m_j χ_j) + D, where workload denotes the workload of a partition in a path, m_j is the number of currently active physical machines in the data center j, χ_j is the average rate of a server in the data center j, and D is the communication time. T_(u,v) after Z-score standardization is given by

T*_(u,v) = (T_(u,v) − η_T) / δ_T    (10)

where η_T is the mean and δ_T is the standard deviation of the workflow task completion times of all the partitions in the path. The workflow task execution energy consumption of a partition in the path can be expressed as E_(u,v) = E_j(t), where the power consumption of the data center j at the current time can be expressed as E_j(t) = PUE_j · m_j(t) [κ_j χ_j + β_j]. κ_j χ_j + β_j denotes the power consumption of a server with service rate χ_j, κ_j is a positive factor, and β_j is the power consumption in the idle state. We assume that the number m_j of active servers and the parameters κ_j and β_j are known. PUE_j denotes the ratio of the total power to the power delivered to the computing devices [34]. E_(u,v) after standardization is given by

E*_(u,v) = (E_(u,v) − η_E) / δ_E    (11)

where η_E is the mean and δ_E is the standard deviation of the workflow task energy consumption of all the partitions in the path. The workflow task scheduling problem for any path can be expressed as follows:

min Σ_(u,v) (T*_(u,v) + E*_(u,v))    (12)

s.t. (u, v) ∈ p    (13)

where p represents the set of directed edges that connect any two cloud data centers.
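As an illustration of the per-path search, the sketch below combines the z-scored completion time and energy of equations (10) to (12) into edge weights and runs Dijkstra's algorithm; Python's heapq binary heap stands in for the Fibonacci heap named in the paper, and all node names, times and energies are assumed values. Because z-scores can be negative while Dijkstra requires non-negative weights, and every source-sink path in the layered topology of Fig. 4 has the same hop count, the sketch shifts all weights by a constant, which preserves the optimal path.

```python
import heapq
import statistics
from collections import defaultdict

# Edge weights T* + E* (eqs. 10-12) and a shortest-path search as in Fig. 4.

def edge_costs(edges):
    times = [t for _, _, t, _ in edges]
    energies = [e for _, _, _, e in edges]
    mt, st = statistics.mean(times), statistics.pstdev(times) or 1.0
    me, se = statistics.mean(energies), statistics.pstdev(energies) or 1.0
    cost = {(u, v): (t - mt) / st + (e - me) / se for u, v, t, e in edges}
    shift = min(cost.values())        # make weights non-negative for Dijkstra
    return {k: w - shift for k, w in cost.items()}

def dijkstra(cost, src):
    adj = defaultdict(list)
    for (u, v), w in cost.items():
        adj[u].append((v, w))
    dist, prev = defaultdict(lambda: float("inf")), {}
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

# Edges: (from, to, completion time T(u,v), energy E(u,v)) - assumed numbers.
edges = [("s", "dc1", 4.0, 2.5), ("s", "dc2", 6.0, 1.5),
         ("dc1", "t", 3.0, 2.0), ("dc2", "t", 2.0, 1.0)]
dist, prev = dijkstra(edge_costs(edges), "s")
print("minimum combined cost to t:", dist["t"], "via", prev["t"])
```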

4. Load balance based scheduling algorithm for workflow in the distributed cloud

4.1 Load balance based workflow job scheduling algorithm

The load balancing job scheduling algorithm proposed in this paper is composed of two parts. (1) The job execution time is predicted by Logistic Regression. (2) The job scheduling problem is modeled as an M/M/c queue to optimize the job arrival rates. The pseudocode of the workflow job scheduling algorithm based on load balancing is shown in Algorithm 1. First, the coefficients a1, a2 and a3 (Algorithm 1, line 1), which can be reused in future executions, are obtained by maximizing the likelihood function proposed in (2). Logistic Regression is then used to predict the job execution time, and the service rates of all the clouds are obtained (Algorithm 1, lines 2-13). These operations are repeated until the traversal is completed (Algorithm 1, lines 14-19), and the response time of the entire system is calculated based on queuing theory (Algorithm 1, line 18). Finally, the nonlinear programming method with linear equality constraints is used to obtain the job arrival probabilities, and the jobs are scheduled to the clouds according to these probabilities (Algorithm 1, lines 20-22). The time complexity of the proposed job scheduling algorithm is O(N²).

Algorithm 1: Load balance based workflow job scheduling algorithm
Input: historical execution information Infors and the number N of clouds
Output: the optimized job arrival rates Results
1  obtain the parameters a1, a2 and a3 by the gradient ascent method to maximize the likelihood function proposed in (2)
2  for DC_j ∈ DC do
3    load ← getWorkload(), ru ← getResourceUsage(), jc ← getJobComplexity() // the training set comes from the historical system log, including the workload, average resource utilization, computation complexity and cloud types
4    z_j ← Σ_j Σ_i Σ_p (a1·Load_jip + a2·RU_jip + a3·JC_jip)
5    σ(z_j) ← 1 / (1 + e^(−z_j))
6    if σ(z_j) > 0.5 then
7      serviceRate ← µ1, preExeTime ← 1/µ1
8    else
9      serviceRate ← µ2, preExeTime ← 1/µ2
10   end if
11   add serviceRate to set sr
12   add preExeTime to set times
13 end for
14 for µ_j ∈ sr do // job scheduling based on queuing theory
15   ρ_j ← λ_j/(c·µ_j); P_0 ← [Σ_{n=0}^{c−1} (cρ_j)^n/n! + (cρ_j)^c/(c!(1 − ρ_j))]^(−1); P_∞ ← (cρ_j)^c·P_0/(c!(1 − ρ_j))
16   L_jq ← (ρ_j·P_∞)/(1 − ρ_j); W_jq ← L_jq/λ_j // calculate the average waiting time by Little's law
17   T_j ← W_jq + 1/µ_j
18   T_total ← T_total + T_j
19 end for
20 calculate min(T_total) satisfying Σ P_j = 1
21 add λ1, λ2, ..., λN to Results
22 return Results

4.2 Workflow task scheduling algorithm based on the shortest path algorithm

Algorithm 2 describes the pseudocode of the move-gain calculation for a hypergraph vertex; its main steps are shown below.

Algorithm 2: the move gain of hypergraph vertex calculation algorithm
Input: vertex v_i, partition Part, two arrays Λ and δ
Output: the partition with the maximum move gain for vertex v_i, and that gain
1  leaveGain ← 0
2  for edge e_j ∈ Nets(v_i) do // calculate the leave gain by iterating over all edges connecting vertex v_i
3    if δ(e_j, Part(v_i)) = 1 then
4      leaveGain ← leaveGain + ec(e_j)
5  end for
6  if there is no positive move gain, return; otherwise, the following steps are performed
7  calculate the maximum arrival loss and store the results in memory temporarily
8  calculate the move gain for the partitions connected by at least one of the cut edges of vertex v_i and store the results in memory temporarily; targetParts ← ∅, maxArrivalLoss ← 0
9  for edge e_j ∈ Nets(v_i) do
10   maxArrivalLoss ← maxArrivalLoss + ec(e_j)
11   if λ_j > 1 then // find the partitions to which vertex v_i can move and compute the respective move gains
12     for part V_k ∈ Λ(n_j) − Part(v_i) do
13       if V_k ∉ targetParts then
14         insert V_k into the partition set to which vertex v_i can move
15         targetParts ← targetParts ∪ {V_k}, MoveGain(V_k) ← 0
16       MoveGain(V_k) ← MoveGain(V_k) + ec(e_j)
17     end for
18   end if
19 end for
20 for part V_k ∈ targetParts do // calculate the maximum move gain of vertex v_i
21   if W(V_k) + w(v_i) < (1 + ε)·W_avg then
22     MoveGain(V_k) ← MoveGain(V_k) + (leaveGain − maxArrivalLoss)
23     if MoveGain(V_k) > maxMoveGain then
24       maxMoveGain ← MoveGain(V_k)
25       maxPart ← V_k
26 end for
27 return maxPart, maxMoveGain
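The Python sketch below mirrors the spirit of Algorithm 2 with a simplified FM-style gain: a vertex gains ec(e) for every incident edge it alone pins in its current part and pays ec(e) for every incident edge that does not yet reach the candidate part; the hypergraph, costs and loads are assumed toy values, and the balance test follows equation (7). It is an illustration, not a line-by-line port of Algorithm 2.

```python
def move_gain(v, target, hyperedges, nets, edge_cost, part_of):
    """Connectivity gain of moving v from part_of[v] to target."""
    gain = 0.0
    for e in nets[v]:
        src = sum(1 for u in hyperedges[e] if part_of[u] == part_of[v])
        dst = sum(1 for u in hyperedges[e] if part_of[u] == target)
        if src == 1:                 # v is the only pin of e in its part:
            gain += edge_cost[e]     # connectivity of e drops by one
        if dst == 0:                 # e does not reach the target part yet:
            gain -= edge_cost[e]     # connectivity of e grows by one
    return gain

def best_move(v, parts, hyperedges, nets, edge_cost, part_of,
              weights, loads, w_avg, eps):
    """Pick the admissible target part with the highest positive gain."""
    best = (0.0, None)               # only positive gains qualify
    for k in parts - {part_of[v]}:
        if loads[k] + weights[v] >= (1 + eps) * w_avg:
            continue                 # would violate the eq. (7) balance
        g = move_gain(v, k, hyperedges, nets, edge_cost, part_of)
        if g > best[0]:
            best = (g, k)
    return best

# Toy data reusing the Section 3.4 hypergraph layout (illustrative only).
hyperedges = {"e1": ["t1", "t2"], "e2": ["t2", "t3", "t4"]}
nets = {"t1": ["e1"], "t2": ["e1", "e2"], "t3": ["e2"], "t4": ["e2"]}
edge_cost = {"e1": 1.0, "e2": 2.0}
part_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
weights = {t: 1.0 for t in part_of}
loads = {0: 2.0, 1: 2.0}
print(best_move("t2", {0, 1}, hyperedges, nets, edge_cost, part_of,
                weights, loads, w_avg=2.0, eps=0.6))
```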

Algorithm 3 describes the pseudocode of the workflow task scheduling algorithm based on the shortest path algorithm; its main steps are shown below. The total time complexity of the workflow task scheduling algorithm is O(m + N log N + T log T).

Algorithm 3: the task scheduling algorithm based on the shortest path algorithm
Input: the workflow task set, converted to hypergraph H^0
Output: the task scheduling strategy with minimum execution time and energy consumption
1  // hypergraph partition
2  for v_i ∈ V^l do
3    find the unmatched vertex v_j that maximizes Σ_{e_h∈E_ij} ec(e_h)
4    match the vertices v_i and v_j // regard the matching vertices as hyper points in the vertex set
5  end for
6  delete edges with the same vertices
7  iterate steps 1-5 for m times to obtain H^1, H^2, ..., H^m
8  obtain Π^m = {V_1^m, V_2^m, ..., V_K^m} // by the multilevel recursive partitioning method
9  if the imbalance exceeds the threshold ε, reduce the imbalance by moving the vertices in the K partitions
10 obtain the k-path initial partitioning of the coarsened hypergraph sequence H^1, H^2, ..., H^m
11 // task scheduling strategy
12 for task ∈ job do
13   for DC ∈ DCs do
14     calculate the execution time T*_(u,v) and energy consumption E*_(u,v) for the tasks
15   end for
16 end for
17 establish the task scheduling model for each partition and store the results in memory temporarily
18 for part ∈ Parts do
19   find the shortest path scheduling strategy for the tasks // by the Dijkstra algorithm based on a Fibonacci heap
20 end for
21 return the workflow task scheduling strategy
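As an illustration of the coarsening step (Algorithm 3, lines 2-5), this sketch greedily matches each unmatched vertex with the unmatched neighbour that maximizes the total cost of their shared hyperedges; the toy hypergraph is an assumption for demonstration.

```python
# Greedy heavy-edge matching for one coarsening level (Algorithm 3, lines 2-5):
# each unmatched vertex v_i is paired with the unmatched vertex v_j that
# maximizes the summed cost of the hyperedges they share.

def coarsen_once(vertices, hyperedges, edge_cost):
    matched, pairs = set(), []
    for vi in vertices:
        if vi in matched:
            continue
        best_vj, best_w = None, 0.0
        for e, pins in hyperedges.items():
            if vi not in pins:
                continue
            for vj in pins:
                if vj != vi and vj not in matched:
                    # weight of the pair = total cost of shared hyperedges
                    w = sum(edge_cost[f] for f, p in hyperedges.items()
                            if vi in p and vj in p)
                    if w > best_w:
                        best_vj, best_w = vj, w
        matched.add(vi)
        if best_vj is not None:
            matched.add(best_vj)
            pairs.append((vi, best_vj))   # contracted into one hyper point
        else:
            pairs.append((vi,))           # no partner: carried over unchanged
    return pairs

hyperedges = {"e1": ["t1", "t2"], "e2": ["t2", "t3", "t4"], "e3": ["t4", "t5"]}
edge_cost = {"e1": 1.0, "e2": 2.0, "e3": 1.0}
print(coarsen_once(["t1", "t2", "t3", "t4", "t5"], hyperedges, edge_cost))
```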

5. Application scenario: new media live video application

We consider a real-world social media streaming application scenario. Geo-distributed clouds are ideal for supporting large-scale social media streaming applications, for example, live streaming of new media and YouTube's social networking. In the era of the high-speed Internet, real-time live social media based on streaming media technology has become the most vivid, real-time and interactive form of Internet application after text, voice and picture socialization. In this section, we use the live video scenario as an example to introduce the specific process of the load balancing scheduling method for workflow applications.

Fig. 5 Our proposed method in the live video scenario

As shown in Fig. 5, the structure of the live video service is composed of two parts: the client and the server (video cloud service center). The video cloud service center is composed of a central controller and cloud data centers. In the video cloud service center, the central controller manages all workflow jobs submitted by users. The central controller can be a powerful machine or a server cluster that is responsible for request reception, task assignment, resource allocation, capacity expansion decisions and more. The cloud data centers are mainly responsible for initializing the execution environment, executing the workflow jobs, transmitting the video stream, reporting their own running status and feeding back the results of workflow job execution. In this section, we use target recognition as a task instance to illustrate how our method works in this situation. Consider a real-life scenario in which people use their mobile phones or Google Glass to capture images and videos for live video; the data here can be video taken by a mobile phone or images taken by Google Glass. Detection requests arrive at the per-second frame rate. Computationally intensive computer vision algorithms can be performed in the cloud data centers. There are many applications in this field, such as target recognition, face recognition, gesture recognition, and image stitching. Two well-known algorithms that can be used are Canny Edge Detection and the Scale-Invariant Feature Transform (SIFT) [35].

6. Experiments

6.1 Configurations

Fig. 6 The experimental environment of a geo-distributed environment

In this experiment, a cluster of a public cloud is leased, and the virtual machine cluster created on the remote cloud is combined with two local clusters to build a geo-distributed environment. The experimental environment is shown in Fig. 6. The corresponding DAGs of the four benchmark applications (EP, GT, MT and SG) are shown in Fig. 7. In order to evaluate the average response time, the average job waiting time and the system throughput of the workflow job scheduling algorithm, we compare the LBJS (Load Balancing Job Scheduling) algorithm with the Shortest Remaining Processing Time (SRPT) algorithm [37], the Workload-Aware Greedy Scheduling (SWAG) algorithm [38] and the Weighted Fair-Share Queue (WFS) algorithm [39]. In order to evaluate the task average completion time, energy consumption and QoS satisfaction rate of the workflow task scheduling algorithm based on the shortest path algorithm, we compare the SPTS (Shortest Path Task Scheduling) algorithm with the Multi-Constraint Multilevel K-way Cut (MCMKCut) algorithm [40] and the Cost-Aware Workflow Transformation (CAWT) algorithm [41].

Fig. 7 DAG task types: (a) EP, (b) GT, (c) MT, (d) SG

6.2 Experimental Comparison

6.2.1 Experimental verification and impact analysis of parameters

A. Verification of job execution time prediction based on Logistic Regression

Each type of job is repeated 100 times. The job complexity, resource utilization and workload are used as the training set. The job complexity is measured by the open source software program SourceMonitor [42]. The weights a1, a2, a3 of the data volume can then be calculated. The number of iterations is set to 500 and the step size to 0.01; 500 sets of weights are obtained, and the average value is taken as the value of each parameter. When performing the EP, GT, MT and SG jobs, the slice size is 128 MB. We set different execution times. The job execution time is predicted by Logistic Regression, and the real job execution time is obtained from the log files. Then the prediction accuracy (MAPE) is calculated. The specific experimental results are shown in Fig. 8. To evaluate the accuracy of a regression model, the coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute percent error (MAPE) are usually used. R² depicts the linear covariation between two datasets instead of the actual differences. RMSE is a dimensional measure, which is not independent of the data scale and unit. To evaluate the binary prediction results, it is meaningful to use the MAPE [43].
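A minimal sketch of the MAPE computation used for this evaluation follows; the predicted and measured execution times below are made-up values.

```python
import numpy as np

# Mean absolute percent error between measured and predicted execution times.
def mape(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs((actual - predicted) / actual)))

t_real = [120.0, 95.0, 210.0, 64.0]   # seconds, from log files (assumed)
t_pred = [115.0, 99.0, 203.0, 66.0]   # Logistic-Regression estimates (assumed)
print("MAPE =", mape(t_real, t_pred))
```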

Fig. 8 Prediction time accuracy

In Fig. 8, the accuracy of the job execution time prediction method based on Logistic Regression is shown to be reliable. As the number of experiments increases, the MAPE value becomes smaller. When the number of experiments is 200, the maximum MAPE, corresponding to the workflow MT, is 0.034, which means that the prediction accuracy is reliable. This is because the larger the number of experiments, the more data the training set contains; the parameters are continuously corrected and the accuracy of the predicted values improves. Thus, the average absolute percentage error of the job execution time prediction becomes smaller.

B. The impact analysis of parameter K in the workflow task hypergraph partition

In the process of hypergraph partitioning, the number K of partitions directly affects the minimum partition size, the average partition size and the average partition number. We set the maximum number of K-path channels to 3. The allowable imbalance threshold is 10%. For K ∈ {16, 32, 64, 128, 256}, the minimum partition size, the average partition size and the average partition number are listed in Table 4.

Table 4 Performance of k-path hypergraph partition

K   | minimum partition size | average partition size | average partition number | cutting size improvement percentage
16  | 3.04  | 3.86  | 0.51 | 5.12%
32  | 6.29  | 6.9   | 0.72 | 7.51%
64  | 10.14 | 10.64 | 0.93 | 10.62%
128 | 17.14 | 17.77 | 1.29 | 8.39%
256 | 28.03 | 28.40 | 1.76 | 6.44%

According to Table 4, the minimum partition size, the average partition size and the average partition number increase as K increases. The cutting size improvement percentage first increases and then decreases. When K = 64, the cutting size improvement percentage reaches its highest value of 10.62%, showing that when K is 64 the cut size of the hypergraph partition is improved most significantly and the cutting effect is the best. Therefore, the partition number is set to 64 in the K-path partitioning of the workflow tasks.

6.2.2 Comparisons and Analysis

A. The workflow job scheduling algorithm based on load balancing

The maximum number of parallel tasks per node affects the efficiency of job execution. In order to evaluate its influence on the average waiting time, the average response time and the system throughput, we set three different job arrival rates: 30, 60 and 120. The maximum number of parallel tasks per node ranges from 32 to 512. Since both the WFS algorithm [39] and our proposed LBJS algorithm are based on queueing theory, a theoretical comparison is provided first. Based on equations (4) and (5), the LBJS algorithm considers the different cloud running states and the different average service rates of the job queues, which can reduce the total job response time and improve the resource utilization of the clouds. The WFS algorithm cannot dynamically adjust the job dispatching process according to the various job service rates of the clouds in a geo-distributed cloud environment.

As shown in Fig. 9, the average job waiting time decreases as the maximum number of parallel tasks per node increases. This is because the larger the maximum number of parallel tasks per node, the higher the efficiency of job execution, which reduces the job queuing time. For instance, in Fig. 9(a), when the job arrival rate is 30, the LBJS algorithm achieves up to 40%, 12.5% and 45.46% average waiting time reduction over the SRPT, SWAG and WFS algorithms respectively. From Fig. 9(b), when the job arrival rate is 60, the LBJS algorithm achieves up to 33.33%, 28.57% and 44.44% average waiting time reduction over the SRPT, SWAG and WFS algorithms respectively. From Fig. 9(c), when the job arrival rate is 120, the LBJS algorithm achieves up to 18.75%, 7.14% and 23.53% average waiting time reduction over the SRPT, SWAG and WFS algorithms respectively. This indicates that the LBJS algorithm can reduce the average job waiting time when the maximum number of parallel tasks per node changes.

Fig. 9 The maximum number of parallel tasks vs. average waiting time: (a) job arrival rate = 30, (b) job arrival rate = 60, (c) job arrival rate = 120

In Fig. 10(a), when the job arrival rate is 30, the LBJS algorithm achieves up to 27.9%, 18.24% and 34.24% average response time reduction over the SRPT, SWAG and WFS algorithms respectively. From Fig. 10(b), when the job arrival rate is 60, the LBJS algorithm achieves up to 32.5%, 22.8% and 40.52% average response time reduction over the SRPT, SWAG and WFS algorithms respectively. From Fig. 10(c), when the job arrival rate is 120, the LBJS algorithm achieves up to 38.54%, 28.66% and 49.62% average response time reduction over the SRPT, SWAG and WFS algorithms respectively. This indicates that the LBJS algorithm can effectively reduce the average job response time when the maximum number of parallel tasks changes.

Fig. 10 The effect of the maximum number of parallel tasks on average response time: (a) job arrival rate = 30, (b) job arrival rate = 60, (c) job arrival rate = 120

As shown in Fig. 11, the system throughput increases as the maximum number of parallel tasks per node increases. This is because the larger the maximum number of parallel tasks per node, the higher the efficiency of job execution; the number of tasks completed per unit time increases, resulting in higher system throughput. In Fig. 11(a), when the job arrival rate is 30, the LBJS algorithm achieves up to 34.7%, 16.5% and 42.12% system throughput improvement over the SRPT, SWAG and WFS algorithms respectively. From Fig. 11(b), when the job arrival rate is 60, the LBJS algorithm achieves up to 33.33%, 18.33% and 39.45% system throughput improvement over the SRPT, SWAG and WFS algorithms respectively. From Fig. 11(c), when the job arrival rate is 120, the LBJS algorithm achieves up to 32.77%, 17.38% and 43.14% system throughput improvement over the SRPT, SWAG and WFS algorithms respectively. This indicates that the LBJS algorithm can effectively improve the throughput when the maximum number of parallel tasks per node changes.

Fig. 11 The effect of the maximum number of parallel tasks on throughput: (a) job arrival rate = 30, (b) job arrival rate = 60, (c) job arrival rate = 120

B. The workflow task scheduling algorithm based on the shortest path algorithm

Fig. 12 shows the impact of the number of virtual machines on the average completion time. When the task number and the resource utilization percentage remain unchanged, the average task completion time of the proposed SPTS, MCMKCut and CAWT algorithms shows a decreasing trend as the number of virtual machines increases. As the number of virtual machines increases, the system's ability to perform tasks increases and more system resources can be used to perform tasks; the average task waiting time is reduced, so the average task completion time decreases. From Fig. 12(a), when the number of virtual machines is 15, the average completion time of the proposed SPTS algorithm is 25.5% and 16.3% less than that of the MCMKCut and CAWT algorithms respectively. From Fig. 12(b), when the number of virtual machines is 15, the average completion time of the proposed SPTS algorithm is 32.2% and 19.4% less than that of the MCMKCut and CAWT algorithms respectively.

(a) CTP=20%

(b) CTP=80% Fig.12 The effect of number of virtual machines on average completion time The resource utilization percentage of each virtual machine has a direct impact on the performance of the proposed SPTS algorithm. In order to evaluate the impact of the resource utilization percentage on the average completion time, energy consumption and QoS satisfaction rate, we set virtual machines

number in each cloud data center to 10. The task number is set to 64 and the resource utilization percentage is set to 40%, 50%, 60%, 70% and 80% respectively. Each of the experimental data is the average value of 20 replicate experiments. As shown in Fig.13, when the virtual machines number and the task number remain unchanged, the average tasks completion time of the proposed SPTS algorithm, MCMKCut and CAWT algorithm show a decreasing trend with the increase of the resource utilization percentage. As the resource utilization percentage increases, the more system resources can be used to perform the tasks. The efficiency of task execution is higher so that the task average completion time is shorter. From Fig.13 (a), when the resource utilization percentage is 50%, the average completion time of the proposed SPTS algorithm is 21.5% and 10.8% less than the MCMKCut and CAWT algorithm respectively. From Fig.13 (b), when the resource utilization percentage is 50%, the average completion time of the proposed SPTS algorithm is 42.1% and 18.6% less than the MCMKCut and CAWT algorithm respectively. In general, when the resource utilization percentage changes, the average completion time of the proposed SPTS algorithm is shorter

Fig. 13 The effect of resource utilization percentage on average completion time: (a) CTP = 20%; (b) CTP = 80%

As shown in Fig. 14, when the number of virtual machines and the task number remain unchanged, the system energy consumption of the proposed SPTS algorithm, the MCMKCut algorithm and the CAWT algorithm shows a decreasing trend as the resource utilization percentage increases. With the number of virtual machines fixed, the number of active virtual machines also stays fixed, and a higher resource utilization percentage lets the system execute tasks more efficiently. The task completion time is therefore shorter, which reduces the system energy consumption. From Fig. 14(a), when the resource utilization percentage is 50%, the energy consumption of the proposed SPTS algorithm is 35% and 20% less than that of the MCMKCut and CAWT algorithms, respectively. From Fig. 14(b), when the resource utilization percentage is 50%, the energy consumption of the proposed SPTS algorithm is 43.5% and 33.3% less than that of the MCMKCut and CAWT algorithms, respectively.
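The direction of this effect follows from a simple linear power model: with the set of active virtual machines fixed, total energy is roughly active power times busy time plus idle power times idle time, so shortening the busy time lowers the total. The sketch below illustrates this; the power and time values are hypothetical and are not the parameters used in the experiments.

```python
def energy(p_active: float, p_idle: float, busy: float, idle: float) -> float:
    """Total energy under a linear power model: active power while
    executing tasks plus idle power otherwise. Units are arbitrary;
    the figures in the paper report nJ."""
    return p_active * busy + p_idle * idle

# Same active VMs, higher utilization -> shorter busy time -> less energy.
print(energy(0.5, 0.1, busy=5.0, idle=1.0))  # 2.6
print(energy(0.5, 0.1, busy=3.0, idle=1.0))  # 1.6
```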

Fig. 14 The effect of resource utilization percentage on energy consumption: (a) CTP = 20%; (b) CTP = 80%

As shown in Fig. 15, when the number of virtual machines and the task number remain unchanged, the QoS satisfaction rate of the proposed SPTS algorithm, the MCMKCut algorithm and the CAWT algorithm shows an increasing trend as the resource utilization percentage increases. A higher resource utilization percentage makes task execution more efficient, so the average task execution time is reduced; with fixed deadlines, the QoS satisfaction rate therefore increases. From Fig. 15(a), when the resource utilization percentage is 50%, the QoS satisfaction rate of the proposed SPTS algorithm is 6.3% and 3.2% higher than that of the MCMKCut and CAWT algorithms, respectively. When the resource utilization percentage is 80%, the QoS satisfaction rate of the proposed SPTS algorithm is 3.9% and 2.9% higher than that of the MCMKCut and CAWT algorithms, respectively. From Fig. 15(b), when the resource utilization percentage is 50%, the QoS satisfaction rate of the proposed SPTS algorithm is 12.7% and 7.2% higher than that of the MCMKCut and CAWT algorithms, respectively.
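For clarity, the QoS satisfaction rate can be read as the fraction of tasks that complete on or before their deadlines, which is why faster execution under fixed deadlines raises it. A minimal sketch with hypothetical completion times and deadlines:

```python
def qos_satisfaction_rate(completion_times, deadlines):
    """Fraction of tasks finishing on or before their deadline."""
    met = sum(c <= d for c, d in zip(completion_times, deadlines))
    return met / len(deadlines)

# Hypothetical data: faster execution at higher utilization moves
# more completions under the fixed deadlines.
completions = [3.1, 4.8, 2.0, 6.5]
deadlines = [4.0, 5.0, 3.0, 6.0]
print(qos_satisfaction_rate(completions, deadlines))  # 0.75
```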

Fig. 15 The effect of resource utilization percentage on QoS satisfaction rate: (a) CTP = 20%; (b) CTP = 80%

7. Conclusion and Future Work

In this paper, an effective scheduling strategy based on hypergraph partition is proposed for workflow applications in the geo-distributed cloud. We formulate job scheduling as M/M/C queues to model the job scheduling processes in geo-distributed clouds. The average waiting time of the jobs is obtained from Little's law, from which the system response time follows, and the job scheduling problem is formulated as a system response time minimization problem. To deal with the DAG task scheduling problem, a workflow task partitioning algorithm based on hypergraphs is proposed. The task scheduling problem is transformed into a shortest path problem, which is solved by Dijkstra's algorithm based on a Fibonacci heap. We take a real live video application scenario as an example to introduce the proposed method. The experimental results show that the proposed scheduling algorithms improve system performance and achieve load balancing in each cloud. In future work, unreliable or congested machines in the geo-distributed cloud will be considered, and speculative execution schemes will be investigated accordingly. Moreover, Software-Defined Networking (SDN) can bring many benefits to cloud computing, so how to utilize SDN in the geo-distributed cloud remains a significant open problem.
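As a companion to the queueing formulation summarized above, the sketch below computes the mean response time of an M/M/C queue using the Erlang C formula for the waiting probability and Little's law for the mean waiting time. It is an illustration of the standard textbook formulas under hypothetical arrival and service rates, not a reproduction of the paper's exact model or parameters.

```python
from math import factorial

def erlang_c(c: int, a: float) -> float:
    """Probability that an arriving job must wait in an M/M/c queue.

    c: number of servers; a = lam / mu is the offered load (requires a < c).
    """
    rho = a / c
    num = (a ** c) / (factorial(c) * (1 - rho))
    den = sum(a ** k / factorial(k) for k in range(c)) + num
    return num / den

def mean_response_time(lam: float, mu: float, c: int) -> float:
    """Mean response time T = Wq + 1/mu, where the mean queueing delay
    Wq follows from the Erlang C waiting probability and Little's law."""
    a = lam / mu
    wq = erlang_c(c, a) / (c * mu - lam)  # mean waiting time in queue
    return wq + 1.0 / mu

# Hypothetical cloud: 4 VMs, 30 jobs/s arriving, 10 jobs/s per VM.
print(mean_response_time(lam=30.0, mu=10.0, c=4))  # ~0.151 s
```

For these assumed rates the mean response time is about 0.15 s; per-cloud response times computed this way are the quantities that the system-level minimization trades off across clouds.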

Acknowledgment

The work was supported by the National Natural Science Foundation of China (NSFC) under grants No. 61672397, No. 61873341 and No. 61771354, the Application Foundation Frontier Project of Wuhan (No. 2018010401011290), and the Open Foundation of the State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System (GZ2019KF002). Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of the above agencies.

References

[1] Li H, Zhu G, Cui C, et al. Energy-efficient migration and consolidation algorithm of virtual machines in data centers for cloud computing. Computing, 2016, 98(3): 303-317.
[2] Zhu J, Zheng Z, Zhou Y, et al. Scaling service-oriented applications into geo-distributed clouds. Proceedings of the 2013 IEEE Seventh International Symposium on Service-Oriented System Engineering. IEEE, 2013: 335-340.
[3] Wu F, Wu Q, Tan Y. Workflow scheduling in cloud: a survey. The Journal of Supercomputing, 2015, 71(9): 3373-3418.
[4] Masdari M, Valikardan S, Shahi Z, et al. Towards workflow scheduling in cloud computing: a comprehensive analysis. Journal of Network and Computer Applications, 2016, 66: 64-82.
[5] Jafarnejad Ghomi E, Masoud Rahmani A, Nasih Qader N. Load-balancing algorithms in cloud computing: a survey. Journal of Network and Computer Applications, 2017, 88: 50-71.
[6] Zhou Z, Liu F, Zou R, et al. Carbon-aware online control of geo-distributed cloud services. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(9): 2506-2519.
[7] Ahmad R W, Gani A, Hamid S H A, et al. A survey on virtual machine migration and server consolidation frameworks for cloud data centers. Journal of Network and Computer Applications, 2015, 52: 11-25.
[8] Wu Y, Wu C, Li B, et al. Scaling social media applications into geo-distributed clouds. IEEE/ACM Transactions on Networking, 2015, 23(3): 689-702.
[9] Chen L, Liu S, Li B, Li B. Scheduling jobs across geo-distributed datacenters with max-min fairness. IEEE Transactions on Network Science and Engineering, 2018: 1-9.
[10] Xu C, Wang K, Li P, Xia R, Guo S, Guo M. Renewable energy-aware big data analytics in geo-distributed data centers with reinforcement learning. IEEE Transactions on Network Science and Engineering, 2018: 21-32.
[11] Convolbo M W, Chou J, Hsu C H, et al. GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers. Computing, 2018, 100(1): 21-46.
[12] Li W, Xu R, Qi H, et al. Optimizing the cost-performance tradeoff for geo-distributed data analytics with uncertain demand. Proceedings of the 2017 IEEE/ACM 25th International Symposium on Quality of Service. IEEE, 2017: 1-6.
[13] Forestiero A, Mastroianni C, Meo M, et al. Hierarchical approach for efficient workload management in geo-distributed data centers. IEEE Transactions on Green Communications and Networking, 2017, 1(1): 97-111.
[14] Li P, Guo S, Miyazaki T, et al. Traffic-aware geo-distributed big data analytics with predictable job completion time. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(6): 1785-1796.
[15] Zhou X, Wang K, Jia W, et al. Reinforcement learning-based adaptive resource management of differentiated services in geo-distributed data centers. Proceedings of the 2017 IEEE/ACM 25th International Symposium on Quality of Service. IEEE, 2017: 1-6.
[16] Cavallo M, Modica G D, Polito C, et al. A LAHC-based job scheduling strategy to improve big data processing in geo-distributed contexts. Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security. 2017: 92-101.
[17] Hu Z, Li B, Luo J. Time- and cost-efficient task scheduling across geo-distributed data centers. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(3): 705-718.
[18] Wang S, Qian Z, Yuan J, You I. A DVFS based energy-efficient tasks scheduling in a data center. IEEE Access, 2017: 13090-13102.
[19] Geng J. CODE: incorporating correlation and dependency for task scheduling in data center. Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), Guangzhou. 2017: 909-913.
[20] Yuan H, Bi J, Zhou M, Sedraoui K. WARM: workload-aware multi-application task scheduling for revenue maximization in SDN-based cloud data center. IEEE Access, 2018: 645-657.
[21] Żotkiewicz M, Guzek M, Kliazovich D, Bouvry P. Minimum dependencies energy-efficient scheduling in data centers. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(12): 3561-3574.
[22] Lu W, Lu P, Sun Q, Yu S, Zhu Z. Profit-aware distributed online scheduling for data-oriented tasks in cloud datacenters. IEEE Access, 2018: 15629-15642.
[23] Yuan H, Bi J, Zhou M, Ammari A C. Time-aware multi-application task scheduling with guaranteed delay constraints in green data center. IEEE Transactions on Automation Science and Engineering, 2018, 15(3): 1138-1151.
[24] Zhang J, Chen J, Zhan J, et al. Graph partition-based data and task co-scheduling of scientific workflow in geo-distributed datacenters. Concurrency and Computation: Practice and Experience, 2019.
[25] Alshaer H. An overview of network virtualization and cloud network as a service. International Journal of Network Management, 2015, 25(1): 1-30.
[26] Hasan M, Goraya M S. Fault tolerance in cloud computing environment: a systematic survey. Computers in Industry, 2018, 99: 156-172.
[27] Cheraghlou M N, Khadem-Zadeh A, Haghparast M. A survey of fault tolerance architecture in cloud computing. Journal of Network and Computer Applications, 2015, 61: 81-92.
[28] Alkasem A, Liu H. A survey of fault-tolerance in cloud computing: concepts and practice. Research Journal of Applied Sciences, Engineering and Technology, 2015, 11(12): 1365-1377.
[29] Psaier H, Dustdar S. A survey on self-healing systems: approaches and systems. Computing, 2011, 91(1): 43-73.
[30] Rao S J. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Publications of the American Statistical Association, 2005, 98(461): 257-258.
[31] Zeng D, Gu L, Guo S, et al. Joint optimization of task scheduling and image placement in fog computing supported software-defined embedded system. IEEE Transactions on Computers, 2016, 65(12): 3702-3712.
[32] Khazaei H, Misic J, Misic V B. Performance of cloud centers with high degree of virtualization under batch task arrivals. IEEE Transactions on Parallel and Distributed Systems, 2013, 24(12): 2429-2438.
[33] Lengauer T. Combinatorial algorithms for integrated circuit layout. Vieweg+Teubner Verlag, 1992.
[34] Han T, Ansari N. Powering mobile networks with green energy. IEEE Wireless Communications, 2014(21): 90-96.
[35] Pandey P, Viswanathan H, Pompili D. Robust orchestration of concurrent application workflows in mobile device clouds. Journal of Parallel and Distributed Computing, 2018(120): 101-114.
[36] Deelman E, Vahi K, Juve G, et al. Pegasus, a workflow management system for science automation. Future Generation Computer Systems, 2015, 46: 17-35.
[37] Zhang S, Pan L, Liu S, et al. Profit based two-step job scheduling in clouds. Proceedings of the 17th International Conference on Web-Age Information Management. Berlin: Springer, 2016(9659): 481-492.
[38] Hung C C, Golubchik L, Yu M. Scheduling jobs across geo-distributed datacenters. Proceedings of the Sixth ACM Symposium on Cloud Computing. ACM, 2015: 111-124.
[39] Kumar K N, Mitra R. Resource allocation for heterogeneous cloud computing using weighted fair-share queues. Proceedings of the 2018 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM). IEEE, 2018: 31-38.
[40] Chen J, Zhang J, Song A. Efficient data and task co-scheduling for scientific workflow in geo-distributed datacenters. Proceedings of the 2017 Fifth International Conference on Advanced Cloud and Big Data. IEEE, 2017: 63-68.
[41] Ishizuka Y, Chen W, Paik I. Workflow transformation for real-time big data processing. Proceedings of the 2016 IEEE International Congress on Big Data. IEEE, 2016: 315-318.
[42] SourceMonitor, http://www.campwoodsw.com/sourcemonitor.html.
[43] Ji L, Gallo K. An agreement coefficient for image comparison. Photogrammetric Engineering & Remote Sensing, 2006, 72(7): 823-833.
[44] Alhazmi K, Shami A, Refaey A. Optimized provisioning of SDN-enabled virtual networks in geo-distributed cloud computing datacenters. Journal of Communications and Networks, 2017, 19(4): 402-415.
[45] Toosi A N, Buyya R. A fuzzy logic-based controller for cost and energy efficient load balancing in geo-distributed data centers. Proceedings of the 8th International Conference on Utility and Cloud Computing. IEEE, 2015: 186-194.
[46] Feng Q, Han J, Gao Y, et al. Magicube: high reliability and low redundancy storage architecture for cloud computing. Proceedings of the 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage. IEEE, 2012: 89-93.
[47] Zhang B, Hwang J. Task assignment optimization in geographically distributed data centers. Proceedings of the 2017 IFIP/IEEE Symposium on Integrated Network and Service Management. IEEE, 2017: 497-502.
[48] Zhang Y, Zheng Z, Lyu M R. BFTCloud: a byzantine fault tolerance framework for voluntary-resource cloud computing. Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing. IEEE, 2011: 444-451.
[49] Ganesh A, Sandhya M, Shankar S. A study on fault tolerance methods in cloud computing. Proceedings of the 2014 IEEE International Advance Computing Conference. IEEE, 2014: 844-849.
[50] Li C, Wang Y, Chen Y, Luo Y. Energy-efficient fault-tolerant replica management policy with deadline and budget constraints in edge-cloud environment. Journal of Network and Computer Applications, 2019, 143: 152-166.
[51] Li C, Sun H, Chen Y, Luo Y. Edge cloud resource expansion and shrinkage based on workload for minimizing the cost. Future Generation Computer Systems, 2019, 101: 327-340.
[52] Li C, Tang J, Tang H, Luo Y. Collaborative cache allocation and task scheduling for data-intensive applications in edge computing. Future Generation Computer Systems, 2019, 95: 249-264.

Author Contributions Section

Chunlin Li, Jianhang Tang, Tao Ma, Xihao Yang and Youlong Luo designed the study, developed the methodology, performed the analysis, and wrote the manuscript. Chunlin Li and Jianhang Tang collected the data. Chunlin Li and Jianhang Tang also revised the paper according to the comments.

Biographical notes:

Chunlin Li is a Professor of Computer Science at Wuhan University of Technology. She received her ME in Computer Science from Wuhan Transportation University in 2000 and her PhD in Computer Software and Theory from Huazhong University of Science and Technology in 2003. Her research interests include cloud computing and distributed computing.

Jianhang Tang received his BS degree in Applied Mathematics from South-Central University for Nationalities in 2013 and his MS degree in Applied Statistics from Lanzhou University in 2015. He is a PhD student in the School of Computer Science and Technology at Wuhan University of Technology. His research interests include cloud computing and big data.

Tao Ma is a senior engineer at the State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System, dedicated to research on digital and intelligent manufacturing technology.

Xihao Yang is an engineer at the State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System, dedicated to research on digital and intelligent manufacturing technology.

Youlong Luo is an associate professor of Management at Wuhan University of Technology. He received his M.S. in Telecommunication and System from Wuhan University of Technology in 2003 and his Ph.D. in Finance from Wuhan University of Technology in 2012. His research interests include cloud computing and electronic commerce.