A Cooperative Scheduling Method based on the Device Load Feedback for Multiple Tasks Scheduling

Yu Xin a,∗, Ya-Di Wang a, Zhi-Qiang Xie a, Jing Yang b

a College of Computer Science and Technology, Harbin University of Science and Technology, Heilongjiang, 150001, China
b College of Computer Science and Technology, Harbin Engineering University, Heilongjiang, 150001, China

∗ Corresponding author. Email addresses: [email protected] (Yu Xin), [email protected] (Ya-Di Wang), [email protected] (Zhi-Qiang Xie), [email protected] (Jing Yang)
Abstract

With the development of cloud computing, the traditional star scheduling system with a solo scheduler cannot meet the requirements of distributed systems. We therefore designed a scheduling method that can be applied in multi-scheduler systems, denoted MTDR (Multi Task Dynamic-Rank scheduling method). To address device conflicts in multi-scheduler systems, we adopt the following scheduling principle: the devices feed their load states back to the schedulers to steer the subsequent scheduling process. For the device load modeling, we use a time-window method to predict each device's load state, denoted LSF (Load State Feedback model). When the schedulers assign task slices, the device load state is taken into account. We performed experiments on arrival time, device dependence, task structure, CCR (communication-to-computation ratio), and device sets. The experimental comparison verifies the effectiveness and rationality of the proposed method.

Keywords: load balancing, task scheduling, multi-scheduler, dynamic rank
1. Introduction

With the development of communication technology, the traditional single-host system has been converting into distributed systems, and many
distributed systems, such as MapReduce [1, 2] and IaaS [3], are emerging. All of these distributed systems use service devices to respond to individual service requests. To improve the efficiency of resource allocation and data processing, the scheduling problem has recently become a focus of distributed-system research. This research mainly covers task allocation scheduling and data transmission scheduling. On the task allocation scheduling framework side, Refs. [4, 5] suggested six compatible strategies oriented to the elastic computing cloud and proposed a two-level scheduling model comprising a cloud scheduler and sub-schedulers; the two-level model gives both a macroscopic and a microscopic view of the scheduling. On the data transmission scheduling side, the goal is to design scheduling policies that minimize the communication cost according to the distribution features. In this direction, Nan [6] proposed a cost model and a queuing model and designed a QoS-oriented distributed resource scheduling algorithm to minimize the distributed data transmission cost. Kllapi [7] suggested a splitting, computing, and merging method for data-stream processing and designed a data-stream transmission scheduling method. Lin [8] considered the processing and transmission capacity of each device and allocated devices with similar capacities to the same cluster, by which the scheduling problem in a distributed system can be converted into a DAG (Directed Acyclic Graph) scheduling problem.
Fig. 1 shows the structures of the single-scheduler system and the multi-scheduler system. The single-scheduler system is used in the traditional solo-scheduler environment, while the multi-scheduler system can be used in the distributed environment. In Fig. 1(a), the single scheduler controls the entire scheduling process and allocates the task slices to the separate devices. Therefore, the whole task scheduling load is concentrated on the single scheduler; if the number of tasks is very large, the single scheduler runs the risk of overload. Fig. 1(b) shows the topological structure of the multi-scheduler system, where the schedulers cooperatively receive and allocate the tasks, so the task scheduling load is balanced. By the comparison in Fig. 1, the multi-scheduler system is the future trend in the distributed environment.
(a) The single-scheduler system  (b) The multi-scheduler system
Figure 1: The topological structures of the single-scheduler and multi-scheduler scheduling systems
2. Related Work

Recent DAG scheduling methods can be summarized into the following five categories.
1) Weight Priority Methods. These methods, such as HEFT [9] and DCP (Dynamic Critical-Path) [10], determine the scheduling sequence of task slices by a rank or weight. The rank or weight of a task slice can be obtained directly from the structure of the task, so the weight priority methods can use the EFT (Earliest Finish Time) or EST (Earliest Start Time) policy to obtain the scheduling result. The advantage of these methods is their low computational complexity, O(n^2).
2) Level Priority Methods. These methods, such as HDS (Hierarchical DAG Scheduling) [11], PSDS (Parallel Sparse Direct Solver) [12], and CPDS (Critical Path based dynamical DAG Scheduling) [13], treat the level of a task slice in the DAG as its priority, which guarantees a parallel scheduling result. By maximizing the parallelization of the task slices, the objective of minimizing the total execution time is realized indirectly. The advantage of these methods is the high resource occupancy ratio, which is beneficial for load balancing; the disadvantage is poor performance when the task structure is complex and the load on each level is imbalanced.
3) Roll Back Methods. These methods, such as HEFT-lookahead [14], use a roll-back policy to reschedule previous results and adjust the global scheduling result. They yield better results, but with a higher computational complexity, O(n^3). Moreover, they are static scheduling methods and cannot be used for real-time or dynamic scheduling.
4) Path Clustering Methods. These methods, such as PCH (Path Clustering Heuristic) [15], HHDS (Hybrid Heuristic DAG Scheduling) [16], BCHCS (Budget Constrained Hybrid Cloud Scheduler) [17], and DCP (Dynamic Critical-Path) [10], give the task slices on the critical path a higher priority to reduce the total cost. They perform well when a task has many levels and few task slices, but perform worse when the number of task slices in each level is high.
5) Heuristic Methods. These methods, such as genetic methods [18] and particle swarm optimization methods [19, 20], optimize the global scheduling result according to a cost function. They iteratively adjust the scheduling result, adaptively keeping locally optimal results to reach a global optimum. Their advantage is good performance and compatibility with various optimization policies; the disadvantage is that they easily fall into local optima and produce unstable results.
In the distributed environment, the schedulers send their scheduling results to the devices in parallel, as shown in Fig. 1(b), causing conflicts on the devices [21]. The scheduling policies of traditional methods such as HEFT and DCP cannot adjust the scheduling result according to the device load and device conflicts when dealing with multiple tasks; therefore, they do not fit the multi-scheduler system. To address these problems, we treat the device load as a measurement of device conflict and design a feedback mechanism that adjusts the subsequent scheduling result in each scheduler. Based on this feedback mechanism, cooperative scheduling in a distributed system can be realized.
3. Problem Description and Notations

The distributed tasks are composed of task slices, and the task slices are the units of task scheduling. The scheduled tasks are heterogeneous computing tasks: task slices in the same level do not belong to the same task category, so task slices in the same level generally have different execution times. The task slices in a DAG task satisfy the following constraints:
(1) The DAG task is composed of task slices, which are the scheduling units. There is a single start slice (entrance slice) and a single end slice (exit slice) in each DAG task.
(2) The device that can execute a given task slice is not unique, and the execution times of the task slices on each device are known in advance.
(3) At any time, one device can execute only one task slice.
(4) If vj is a successor task slice of vi, vj needs to receive all the data transmitted from vi; this is the precedence constraint. If vi and vj are executed on the same device, the communication cost is 0.
(5) A task slice can start execution only after receiving all the data from its predecessor slices.
The DAG scheduling can be described with the following symbols:
(1) The DAG is denoted by G=(V, E), where G represents the task and V={v1, v2, . . . , vn} represents the task slices in G, with |V|=n the number of task slices.
(2) D={d1, d2, . . . , dm} represents the device set, where |D| is the number of devices.
(3) ci,j is the communication cost from device di to dj.
HEFT uses the rank of the task slices as the sorting criterion, and the task slice with the maximal rank has the highest priority to be scheduled. The rank is defined as follows:
$\mathrm{rank}_i = \overline{W}_i + \max_{v_j \in \mathrm{succ}(v_i)} \left( c_{i,j} + \mathrm{rank}_j \right) \qquad (1)$
where succ(vi) is the set of successors of vi and $\overline{W}_i$ is the average execution time of vi over the devices; when vi is the end slice, $\mathrm{rank}_i = \overline{W}_i$. Fig. 2 shows the structure of a sample DAG task, where the nodes represent the task slices and v1, v8 are the start and end slices. The directed edges represent the precedence relationships between task slices, and the edge weights represent the communication costs. The execution times of all the task slices are listed in Table 1, and the ranks calculated by Eq. (1) are listed in Table 2. By the rank list in Table 2, the priority sequence of the task slices is v1, v2, v4, v3, v5, v6, v7, v8. The HEFT scheduling result for the task in Fig. 2 is shown in Fig. 3(a), and the execution time is 111.
Figure 2: The structure of the DAG
Table 1: The execution times of the task slices on each device

device   v1   v2   v3   v4   v5   v6   v7   v8
d1       17   22   15   20   13   49   17   13
d2       19   27   25    9   27   49   16   10
d3       21   17    9   22   18   46   15    9
Table 2: The rank and rank′ values of the task slices

        v1       v2       v3      v4       v5      v6      v7      v8
rank    130.67   99.67    80      97.67    79      64.67   52.67   10.67
rank′   156.6    118.87   88.8    120      90.2    79.33   59.07   13.07
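To make the rank computation of Eq. (1) concrete, the following minimal Python sketch computes upward ranks for a small hypothetical DAG. The execution times of v1-v4 are taken from Table 1, but the edge set and communication costs are illustrative placeholders and are not the edge weights of Fig. 2.

from functools import lru_cache

# execution time of each task slice on each device (d1, d2, d3), from Table 1
exec_time = {
    'v1': [17, 19, 21], 'v2': [22, 27, 17], 'v3': [15, 25, 9], 'v4': [20, 9, 22],
}
# hypothetical successor sets and communication costs (illustrative only)
succ = {'v1': ['v2', 'v3'], 'v2': ['v4'], 'v3': ['v4'], 'v4': []}
comm = {('v1', 'v2'): 9, ('v1', 'v3'): 12, ('v2', 'v4'): 14, ('v3', 'v4'): 11}

def avg_exec(v):
    """Average execution time of slice v over all devices (the W-bar term)."""
    times = exec_time[v]
    return sum(times) / len(times)

@lru_cache(maxsize=None)
def rank(v):
    """Eq. (1): rank_i = W-bar_i + max over successors of (c_ij + rank_j)."""
    if not succ[v]:                       # exit slice: rank equals its average time
        return avg_exec(v)
    return avg_exec(v) + max(comm[(v, s)] + rank(s) for s in succ[v])

# task slices are then scheduled in decreasing order of rank (HEFT-style priority)
priority = sorted(exec_time, key=rank, reverse=True)
print({v: round(rank(v), 2) for v in exec_time}, priority)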
(a) The original HEFT scheduling result  (b) The HEFT scheduling result when d3 is delayed  (c) The HEFT scheduling result when the d3 delay is considered
Figure 3: The HEFT scheduling results under the 3 conditions
The traditional rank-based DAG methods, such as HEFT, PCH, Lookahead, and HHDS, are single-scheduler methods with the topological structure shown in Fig. 1(a). The advantage of the single scheduler is that it can centrally manage the multiple tasks and allocate the devices, so the beginning and ending times of each task slice can be determined without device conflicts. The disadvantage is the star topology: the centralized scheduler carries the entire task scheduling load and is at risk of overload. For the multi-scheduler system shown in Fig. 1(b), however, device conflicts between task slices arise from the parallel task scheduling. If a task slice is scheduled on an occupied device, the task slice is suspended because of the device conflict, which decreases the execution efficiency.
Consider the DAG task in Fig. 2. If d3 is the critical device and the delay rate is 1.8, meaning the task slices executed on d3 are delayed by a factor of 1.8, the execution times of the task slices on d3 become v1(37.8), v2(30.6), v3(16.2), v4(39.6), v5(32.4), v6(82.8), v7(27.0), v8(16.2). That is, the task slices allocated to d3 have their execution times prolonged by the conflict on d3. Fig. 3(b) shows the scheduling result under this conflict-induced delay on d3, where the finish time of v7 becomes 91, which delays the beginning time of v8; consequently, the total execution time is extended to 123. The reason is that the rank in Eq. (1) does not consider device conflicts in multiple-task scheduling. However, if the delay rate of d3 is added into the scheduling, namely d3 ← d3 × 1.8, a new rank′ can be obtained; Table 2 lists the rank′ values. The priority sequence of the task slices under rank′ is v1, v4, v2, v5, v3, v6, v7, v8, where v4, v2, v5, v3 have changed their order. The corresponding scheduling result is shown in Fig. 3(c), where the total execution time is 115, which is better than that of Fig. 3(b).
Essentially, the objective of the rank is to predict the execution time of each task slice at the beginning of the task. However, device conflicts exist among the tasks, and the number of competing task slices affects the overall efficiency: a delayed task slice delays its successors, iteratively extending the total execution time. To resolve this issue, we designed the MTDR (Multi Task Dynamic-Rank) scheduling method, whose contributions are the following:
(1) We establish a multiple-task parallel scheduling method for the multi-scheduler system, by which each task can be scheduled separately.
(2) The feedback mechanism controls the subsequent task allocation to reduce device conflicts.
(3) We provide a dynamic load measurement, denoted LSF, to evaluate the conflict state on the devices.
4. Multi-Scheduler Scheduling Method

4.1. LSF (Load State Feedback) model

The rank represents the ideal priority of a task slice. According to Eq. (1), rank_i of vi mainly depends on the devices, and Eq. (1) can be rewritten as:
$\mathrm{rank}_i = \sum_{j=1}^{|d|} \frac{r(d_j, v_i)}{|d|} + \max_{v_j \in \mathrm{succ}(v_i)} \left( c_{i,j} + \mathrm{rank}_j \right) \qquad (2)$
where |d| is the number of devices in the distributed environment and r(dj, vi) is the time cost of vi executed on dj. When device dj has a higher occupancy rate, the conflict between task slices on dj is more intensive; thus the execution of vi on dj will be delayed, and its actual execution time will exceed r(dj, vi). Therefore, the task slices to be executed on dj should be given a higher priority to reflect the delay of r(dj, vi). Accordingly, the ranks of all the successor task slices are adjusted by r(dj, vi) ← (1 + Bj) r(dj, vi), where Bj is the Busyness of dj and Bj ≥ 0. The Busyness represents the conflict state of dj, and Eq. (2) can then be expressed as:
$\mathrm{rank}_i = \sum_{j=1}^{|d|} (1 + B_j) \frac{r(d_j, v_i)}{|d|} + \max_{v_j \in \mathrm{succ}(v_i)} \left( c_{i,j} + \mathrm{rank}_j \right) \qquad (3)$
When dj is idle, Bj = 0 and rank_i in Eq. (3) equals that in Eq. (1). If the task slices executed on dj are numerous and have long execution times, dj contributes more to the rank. Thus, if a task slice vi has a long execution time on dj, both its rank and r(dj, vi) get a large increment; a larger priority means vi is scheduled earlier, so vi tends to be scheduled on other devices, the load of dj is reduced, and the conflict on dj is relaxed. According to Eq. (3), Bj controls the value of the rank; therefore, each device can feed its Busyness B back to the scheduler to control the subsequent scheduling result.
For the Busyness modeling, we use time windows. Fig. 4 shows three time windows {(T0, T0+T), (T0′, T0′+T), (T0″, T0″+T)} on device d, where the length of each window is T. The Busyness of d can be obtained at T0+T, T0′+T, and T0″+T according to the occupancy state in the windows (T0, T0+T), (T0′, T0′+T), and (T0″, T0″+T). In Fig. 4, at T0+T there are 3 occupancy segments τ1-τ3 in (T0, T0+T), with beginning times t1-t3; at T0′+T there are 5 occupancy segments τ1-τ5 in (T0′, T0′+T), where τ4 and τ5 are new segments in (T0+T, T0′+T); and at T0″+T there are 5 occupancy segments τ1-τ5 in (T0″, T0″+T), where τ5 is the new segment in (T0′+T, T0″+T).
Figure 4: The illustration of the time windows on device d
We use the frequency and the beginning times of the occupancy segments as the input of the Busyness calculation and design the following equation:

$B = \frac{\sum_{i=1}^{f} w_i \tau_i}{T} \qquad (4)$
where f is the number of task slices in the window (the frequency), T is the length of the window, τ = {τ1, τ2, . . . , τf} are the execution times of the task slices in the window, and wi is the weight of τi. The principle of Eq. (4) is: 1) wi is influenced by ti; letting T* be the ending time of the window, if T* − ti is small, then ti is close to the end of the window, the impact of τi on B is greater, and wi is larger. 2) The more task slices in the window, the busier the device and the larger B becomes. For the modeling of w and t, let t* be the normalized t, t* = [T − (T* − t)]/T with 0 ≤ t* ≤ 1, and we impose the following conditions to construct the function w = function(t*):
(1) The closer the beginning time t of a time slice is to the ending time of the window, the greater its w. Thus function(t*) is an increasing function on [0, 1]; for t* = 1, w attains its maximal value w = 1, so 0 ≤ w ≤ 1.
(2) If the weight of a time slice is w at normalized beginning time t*, the distance between t* and T* is 1 − w. Therefore the increment of w at t*, namely w(t* + Δt*) − w(t*), is proportional to w and inversely proportional to 1 − w. According to the two conditions above, the differential equation of w and t* can be constructed as follows:

$w(t^* + \Delta t^*) - w(t^*) = k \frac{w}{1-w} \Delta t^* \qquad (5)$
where k is a regulation parameter. Solving this differential equation yields

$w e^{-w} = k_0 e^{k t^*} \qquad (6)$
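For completeness, the following short derivation sketch (not spelled out in the original text) shows how Eq. (6) follows from Eq. (5) by separation of variables and how the Lambert W form of Eq. (7) then arises:

\[
\frac{dw}{dt^*} = k\,\frac{w}{1-w}
\;\Rightarrow\;
\int \frac{1-w}{w}\,dw = \int k\,dt^*
\;\Rightarrow\;
\ln w - w = k t^* + C
\;\Rightarrow\;
w e^{-w} = k_0\, e^{k t^*},
\]
\[
w e^{-w} = e^{k(t^*-1)-1}
\;\Rightarrow\;
(-w)\, e^{-w} = -e^{k(t^*-1)-1}
\;\Rightarrow\;
w = -W_0\!\left(-e^{k(t^*-1)-1}\right),
\]

where $W_0$ is the principal branch of the Lambert W function and the constant $k_0 = e^{-1-k}$ follows from the boundary condition $w(1) = 1$.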
With t* = 1 and w = 1, k0 = e^{−1−k}, and the expression of w = function(t*) can be obtained as

$w = -\mathrm{lambertw}\!\left(0,\, -e^{\,k(t^* - 1) - 1}\right) \qquad (7)$
where lambertw(0, ·) denotes the principal branch of the Lambert W function. For the windows in Fig. 4, T is 35; with k = 1.5, the Busyness of d at T0+T, T0′+T, and T0″+T is 0.13, 0.15, and 0.16, respectively, according to Eqs. (4) and (7). Combining Eqs. (1), (3), (4) and (7), rank_i of vi is obtained as:
$\mathrm{rank}_i = \sum_{j=1}^{|d|} \left( 1 + \frac{1}{T} \sum_{i=1}^{f(j)} -\mathrm{lambertw}\!\left(0,\, -e^{\,k\frac{t_i^{(j)} - T^*}{T} - 1}\right) \tau_i^{(j)} \right) \frac{r(d_j, v_i)}{|d|} + \max_{v_j \in \mathrm{succ}(v_i)} \left( c_{i,j} + \mathrm{rank}_j \right) \qquad (8)$
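The following minimal Python sketch evaluates the weight of Eq. (7) with SciPy's Lambert W function and the Busyness of Eq. (4); the occupancy segments below are illustrative placeholders, not the exact segments of Fig. 4.

import numpy as np
from scipy.special import lambertw

def weight(t_start, T_star, T, k):
    """Eq. (7): w = -W_0(-e^{k(t*-1)-1}), with t* = [T - (T_star - t)] / T."""
    t_norm = (T - (T_star - t_start)) / T      # normalized beginning time, 0 <= t* <= 1
    return float(-lambertw(-np.exp(k * (t_norm - 1.0) - 1.0), 0).real)

def busyness(segments, T_star, T, k):
    """Eq. (4): B = (sum_i w_i * tau_i) / T over the segments inside the window."""
    return sum(weight(t, T_star, T, k) * tau
               for t, tau in segments
               if T_star - T <= t <= T_star) / T

# illustrative (beginning time, duration) occupancy segments inside a window of length 35
segments = [(4.0, 6.0), (15.0, 5.0), (28.0, 4.0)]
print(busyness(segments, T_star=35.0, T=35.0, k=1.5))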
4.2. MTDR (Multi Task Dynamic-Rank scheduling method)

We designed the MTDR (Multi Task Dynamic-Rank scheduling method) according to the multi-scheduler topological structure in Fig. 1(b). The Busyness accounts for the conflicts caused by the simultaneous scheduling of the multiple schedulers by minimizing the conflict loss. So if tasks arrive at different schedulers at the same time, the schedulers can assign the tasks to the devices simultaneously, and the devices process the task slices in arrival-time order. Each scheduler follows the following policy to schedule a task G at T* (a sketch of the loop is given after the steps):
(1) Calculate the Busyness of each device over the window (T* − T, T*), where T is the length of the window.
(2) Calculate the ranks of all the task slices in G according to Eq. (8), and sort the task slices by rank; the sorted sequence is Q.
(3) Let the first element of Q be vi; select the device dj on which vi can finish earliest as the execution device, then remove the first element from Q.
(4) Repeat step (3) until Q is empty.
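A minimal, self-contained Python sketch of the per-task loop in steps (1)-(4) follows. The finish-time model is simplified (one queue per device, no communication costs), and the rank values are assumed to have been computed beforehand with Eq. (8) from the fed-back Busyness, so this illustrates the control flow rather than the authors' implementation.

def schedule_task(slices, exec_time, ranks):
    """slices: task-slice names; exec_time[v][d]: time of v on device d;
    ranks[v]: dynamic rank of v, already computed via Eq. (8)."""
    devices = list(next(iter(exec_time.values())))
    ready_at = {d: 0.0 for d in devices}                 # when each device becomes free
    Q = sorted(slices, key=lambda v: ranks[v], reverse=True)   # step (2): sort by rank
    plan = {}
    while Q:                                             # steps (3)-(4): until Q is empty
        v = Q.pop(0)
        d = min(devices, key=lambda d: ready_at[d] + exec_time[v][d])  # earliest finish
        plan[v] = d
        ready_at[d] += exec_time[v][d]
    return plan

# toy usage: execution times of A1 and A2 are those of Table 3; the ranks are illustrative
exec_time = {'A1': {'d1': 5, 'd2': 9, 'd3': 2}, 'A2': {'d1': 4, 'd2': 3, 'd3': 9}}
print(schedule_task(['A1', 'A2'], exec_time, {'A1': 34.48, 'A2': 23.21}))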
Fig. 5 shows the DAG structures of tasks A, B, and C, with the device set D = {d1, d2, d3}. The execution time of each task slice on each device is listed in Table 3, and the arrival times of the three tasks are 20, 40, and 60. Fig. 6 shows the scheduling process of the three tasks on the three schedulers. In Fig. 6(a)(b), when task A arrives at Scheduler 1, devices d1, d2, and d3 feed their Busyness values B1, B2, and B3 back to Scheduler 1, and Scheduler 1 then allocates the task slices to d1, d2, and d3 according to the Busyness. Correspondingly, Fig. 6(c)(d) and Fig. 6(e)(f) show the scheduling processes of B and C on Scheduler 2 and Scheduler 3.
Fig. 7 shows the scheduling result of the three tasks with T = 20 and k = 1.5. Fig. 7(a) shows the occupancy state when task A arrives at T* = 20. At T* = 20, the Busyness of the three devices is B1−3 = {0.09, 0.00, 0.11}; according to Eq. (8), the ranks of A1−6 are 34.48, 23.21, 24.93, 13.43, 16.28, 5.05, and Fig. 7(b) shows the scheduling result of task A. When task B arrives (T* = 40), the Busyness of the three devices is B1−3 = {0.15, 0.11, 0.10}, the ranks of B1−6 are 34.89, 21.57, 27.44, 14.81, 14.66, 3.02, and Fig. 7(c) shows the scheduling result of task B. When task C arrives (T* = 60), the Busyness of the three devices is B1−3 = {0.12, 0.13, 0.04}, the ranks of C1−6 are 39.21, 25.71, 26.23, 13.99, 13.21, 5.97, and Fig. 7(d) shows the scheduling result of task C.
Figure 5: The structures of the tasks A, B, and C
Table 3: The execution times of the task slices in the 3 tasks

     A1  A2  A3  A4  A5  A6   B1  B2  B3  B4  B5  B6   C1  C2  C3  C4  C5  C6
d1    5   4   1   8   3   9    2   6   8   5   3   3    9   5   4   4   1   9
d2    9   3   7   3   8   2    2   5   6   4   2   4    9   4   2   3   2   6
d3    2   9   8   4   4   3    8   2   4   1   2   1    5   9   8   4   9   1
5. The Analysis of the Parameters of MTDR

According to Eq. (7), the weight w of τ during (T* − T, T*) is influenced by the parameter k. Fig. 8 shows the functional relationship between w and t* for different k. As can be seen from Fig. 8, if the beginning time t of τ is close to T*, i.e., t* approaches 1, the weight w is larger. When k is large, w is close to 0 for t* = 0; in this case the ratio of wi to wj becomes small, where wi is the weight of a τi far from the end of the window and wj is the weight of a τj near the end of the window. This implies that, when k is large, the influence of τi on the Busyness is weaker than that of τj. k and T are the key parameters of MTDR, and their optimal values vary with the experimental data. To analyze the optimal ranges of the parameters, we designed the following two experimental analyses.

5.1. The analysis of k

We use an experimental study to find the effective values of k. The experimental scheme is as follows:
(a) At T* = 20 the devices feed the Busyness back to Scheduler 1  (b) At T* = 20 Scheduler 1 allocates the task slices of A
(c) At T* = 40 the devices feed the Busyness back to Scheduler 2  (d) At T* = 40 Scheduler 2 allocates the task slices of B
(e) At T* = 60 the devices feed the Busyness back to Scheduler 3  (f) At T* = 60 Scheduler 3 allocates the task slices of C
Figure 6: The scheduling process of tasks A, B, and C
(a) The occupancy state of the devices when task A arrives at T* = 20  (b) The scheduling result of task A at T* = 20
(c) The scheduling result of task B at T* = 40  (d) The scheduling result of task C at T* = 60
Figure 7: The Gantt charts of the 3 tasks
Figure 8: The functional relationship of w against t* under different values of k
Experimental Data. We use the Workflow Generator [22] to generate three groups of tasks, where each group comprises 100 tasks. The three groups are generated with the following parameters: Group1 (|V| = 100, average task-slice size 10 MB), Group2 (|V| = 500, average task-slice size 10 MB), Group3 (|V| = 1000, average task-slice size 10 MB). Devices: |D| = 9, processing speed 6 Mbps, data transmission speed 8 Mbps.
Contrast Methods. We select HEFT [9], PCH [15], and HHDS [16] as the contrast methods. These methods all use the rank of Eq. (1) as the scheduling criterion, and they are all single-scheduler methods.
Experimental Process. To simulate multiple tasks, each task in a group is given a different arrival time, with the arrival interval equal to one time unit; the smaller the time unit, the more device conflicts arise between successive tasks. For the three groups the time units are {10 s, 20 s, 30 s}; for instance, the 100 tasks in Group1 arrive at 1 s, 11 s, 21 s, and so on. We run MTDR with the parameters k = {0.5, 1, 1.5, . . . , 5} and T = 16 s to schedule the three groups of tasks, scheduling 50 times for each k.
Fig. 9 shows the distribution of the total execution time of MTDR against k, together with the scheduling results of HEFT, PCH, and HHDS, where the boxes represent the distributions. It can be seen from Fig. 9 that for 0.5 < k < 3 the distribution of MTDR performs best, while for k > 3 the results of HEFT, PCH, and HHDS are better than MTDR. Therefore, the optimal range of k in this experiment is 0.5 < k < 3. We have carried out many tests of this kind on various data; the conclusion is that the optimal value of k falls in the range 0.5 < k < 3 with a probability of 0.95, so 0.5 < k < 3 can be taken as the empirical range of k.

5.2. The analysis of T

Experimental Process. 1) Calculate the average execution time of all the task slices of each task in Group1-3, denoted T̄0, and set T = N × T̄0. 2) Run MTDR with k = 1.5 on each group 50 times and record the results.
Figure 9: The distribution of the total execution time against k
Fig. 10 shows the distribution of the total execution time of MTDR against N, together with the scheduling results of HEFT, PCH, and HHDS, where the boxes represent the distributions. It can be seen from Fig. 10 that for 4 < N < 16 MTDR performs best, while for N > 16 the results of HEFT, PCH, and HHDS are better than MTDR. Therefore, the effective range of N is 4 < N < 16, namely 4T̄0 < T < 16T̄0. We have carried out many tests of this kind on various data; the conclusion is that the optimal value of T falls in the range 4T̄0 < T < 16T̄0 with a probability of 0.87, so 4T̄0 < T < 16T̄0 can be taken as the empirical range of T.
By the analysis above, the empirical parameter ranges of MTDR are 0.5 < k < 3 and 4T̄0 < T < 16T̄0.
Figure 10: The distribution of the total execution time against N
6. Experiments

In this section we again use the Workflow Generator to generate the experimental data. The dataset D200 is generated with the following parameters: the number of tasks is 200, each task contains 300 task slices, and the size of each task slice is randomly set in the interval [10 MB, 15 MB]. The device set comprises 6 devices with a processing speed of 6 Mbps, and the data transmission speed is randomly set in the interval [5 Mbps, 10 Mbps]. HEFT [9], HHDS [16], and PCH [15] are selected as the contrast methods; they are representative of the Weight Priority, Level Priority, and Path Clustering methods, respectively, and all three schedule tasks based on the rank. The proposed MTDR is also rank-based, and its improvement is to control the influence of the device load on the scheduling priority through the rank, so the selected methods provide an effective contrast for MTDR.

6.1. Arrival time test
To analyze the influence of the arrival times of the 200 tasks in D200 on the scheduling result, we set the arrival time as 5×n (s), implying the arrival time of the n-th task is 5n (s). If n is smaller, the tasks arrive more densely and the device conflicts are fiercer. Fig. 11 compares the four methods MTDR (k = 1.5, T = 20), HEFT, PCH, and HHDS on the average execution time of the 200 tasks against n. Since conflicts exist among the multiple tasks, a method that sufficiently accounts for the device loads tends to yield a lower average execution time. In Fig. 11, MTDR has the lowest average execution time for n < 10, implying that MTDR performs better than the other methods under device conflict. For n > 15, the successive arrival times are far apart and the conflicts decrease, so the average execution times of the four methods approach each other, implying that MTDR performs similarly to the classic DAG methods under the condition of less device conflict.
of less device conflict. 0HWKRGV
07'5 +()7 ++'6
3&+
Q
Figure 11: The comparison on the average execution time of the 200 tasks
To analyze the the performance of the 4 methods on the random arrival time, 345
we set each task a random n for the 200 tasks in D200, and the random vaule of n is in the range of (1 20). Table 4 shows the times of each method obtaining the optimal scheduling result. For example, for k=1.0 and T =15, MTDR obtains 64 optimal scheduling results in the 200 task scheduling, while HEFT, HHDS and PCH obtain 34, 61 and 41 optimal scheduling results respectively. It can 19
350
be seen from table 4, MTDR performs bettter than the other 3 methods for various k and T . Table 4: The times of 4 methods obtaining the optimal scheduling result
Parameter
MTDR
HEFT
HHDS
PCH
k=1.0, T=15
64
34
61
41
k=1.0, T=20
72
24
61
43
k=1.0, T=25
59
50
43
48
k=1.0, T=30
79
42
53
26
k=1.5, T=15
62
42
49
47
k=1.5, T=20
63
46
33
58
k=1.5, T=25
74
38
48
40
k=1.5, T=30
75
40
42
43
k=2.0, T=15
79
47
51
23
k=2.0, T=20
69
43
39
49
k=2.0, T=25
66
50
42
42
k=2.0, T=30
74
49
60
17
k=2.5, T=15
77
33
54
36
k=2.5, T=20
81
34
32
53
k=2.5, T=25
62
52
43
43
k=2.5, T=30
75
40
47
38
6.2. Device dependence test

The device dependence is the device-selection tendency of a scheduling method. To analyze the device dependence of each method, we use the device occupancy rate as the measurement:

$\mathrm{occupancy\ rate}(d_i) = \frac{o(d_i)}{\sum_{j=1}^{|D|} o(d_j)} \qquad (9)$

where o(dj) is the occupied time on device dj and |D| is the number of devices. If a method has a higher occupancy rate on a device d, it has a higher selection tendency toward d; thus methods that consider load balance show a uniform distribution of occupancy rates across the devices.
Fig. 12 shows the occupancy rates of the four methods, MTDR (k = 1.5, T = 20), HEFT, PCH, and HHDS, on the data D200, with an arrival interval of 20 s between successive tasks. We designed 5 devices, Device1-5, with processing speeds of 10, 9, 8, 7, and 6 Mbps, respectively. The total execution times of the four methods are 158 s (MTDR), 202 s (HEFT), 178 s (PCH), and 183 s (HHDS). As the comparison in Fig. 12 shows, HEFT, PCH, and HHDS have an obviously higher occupancy rate on Device1 than on Device2-5, and their occupancy rate decreases from Device1 to Device5, implying a higher dependence on Device1. MTDR, in contrast, considers the device conflicts and also tends to schedule task slices onto the less efficient devices, so its utilization of Device2-5 is much higher, whereas the other methods mainly allocate task slices to the most efficient device. Because of the device conflicts, the task execution times of the other methods are longer, so their ratio of occupancy to task execution time is lower than that of MTDR. The occupancy-rate distribution of MTDR is relatively uniform, implying that MTDR can fully exploit the less efficient devices to reduce the total execution time and achieves better load balance.
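A minimal sketch of the occupancy-rate computation of Eq. (9); the occupied times below are illustrative placeholders, not the measured values behind Fig. 12.

# o(d_j): occupied time per device (illustrative values)
occupied = {'Device1': 120.0, 'Device2': 95.0, 'Device3': 88.0,
            'Device4': 76.0, 'Device5': 61.0}
total = sum(occupied.values())
occupancy_rate = {d: o / total for d, o in occupied.items()}   # Eq. (9)
print({d: round(r, 3) for d, r in occupancy_rate.items()})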
Figure 12: The occupancy rates of the 4 methods on Device1-5
6.3. Task structure test

To analyze the influence of the DAG structure on the scheduling result, we designed the dataset D150. Each task in D150 comprises 8 levels, where the first and the eighth levels contain only 1 task slice each, and the numbers of task slices in the other levels are randomly generated with the total fixed at 100. The size of each task slice is randomly set in the interval [10 MB, 15 MB]. The devices are the same 4 devices, with a processing speed of 5 Mbps and a data transmission speed of 8 Mbps. The distribution of task slices over the levels characterizes the DAG structure, so we use the information entropy to evaluate the structure of D150:

$\mathrm{entropy} = -\sum_{i=1}^{n} p_i \log(p_i) \qquad (10)$

where pi = Li/100 and Li is the number of task slices in the i-th level. The more uniform the distribution of task slices over the levels, the larger the entropy; for instance, for L1-8 = {1, 16, 16, 16, 16, 17, 17, 1} the entropy attains its maximal value of 1.87, while for L1-8 = {1, 93, 1, 1, 1, 1, 1, 1} it attains its minimal value of 0.39. We therefore separated D150 into 6 groups by entropy = {(0.39, 0.65], (0.65, 0.90], (0.90, 1.15], (1.15, 1.40], (1.40, 1.65], (1.65, 1.87]}; each group contains 150 tasks, and the arrival interval between successive tasks is 10 s.
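A small Python check of Eq. (10); with the natural logarithm it reproduces the two boundary values quoted above.

import math

def level_entropy(levels):
    """levels: number of task slices per level; p_i = L_i / total."""
    total = sum(levels)
    return -sum((L / total) * math.log(L / total) for L in levels)

print(round(level_entropy([1, 16, 16, 16, 16, 17, 17, 1]), 2))  # ~1.87 (most uniform)
print(round(level_entropy([1, 93, 1, 1, 1, 1, 1, 1]), 2))       # ~0.39 (most skewed)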
Fig. 13 shows the distribution of the total execution time for the 6 groups. MTDR performs best when entropy < 1.40, i.e., when the distribution of task slices is nonuniform. The reason is that when the distribution over the levels is nonuniform, some level contains a large number of task slices; many task slices in the same level then have similar ranks, which leads to intensive device conflicts. In this case MTDR is more efficient than the others.
Figure 13: The total execution times of the 4 methods in the 6 entropy intervals
6.4. CCR (communication-to-computation ratio) test

The objective of the CCR test is to verify the stability of the methods as the transmission speed varies. We use D200 as the experimental data and design the following process: 1) Scale the transmission speed of D200 by 1, 0.8, 0.6, 0.4, and 0.2 times (the enlarge ratio), forming 5 groups of DAG tasks; the random transmission-speed intervals of the 5 groups are thus [5 Mbps, 10 Mbps], [4 Mbps, 8 Mbps], [3 Mbps, 6 Mbps], [2 Mbps, 4 Mbps], and [1 Mbps, 2 Mbps]. 2) The arrival interval between successive tasks is 20 s. 3) Use the four methods, MTDR (k = 1.5, T = 20), HEFT, PCH, and HHDS, to schedule the 5 groups of tasks 100 times and record the number of times each method obtains the best result on each group. Fig. 14 shows the statistical results: MTDR performs best for enlarge ratio = 0.2, and as the enlarge ratio decreases the superiority of MTDR becomes more obvious. The reason is that, as the transmission speed decreases, the device conflicts become more intensive, and in this case MTDR is more efficient than the others.

6.5. Device set test

The objective of this test is to analyze the efficiency of MTDR (k = 1.5, T = 20). We designed the following experimental process: 1) Generate the dataset D500: it contains 500 tasks, and each task has 300 task slices
enlarge ratio=0.4
enlarge ratio=0.6
enlarge ratio=0.8
enlarge ratio=1 Methods
30
MTDR
20
HEFT HHDS
10 PCH
PCH
HEFT
HHDS
MTDR
PCH
HHDS
HEFT
MTDR
PCH
HEFT
HHDS
MTDR
PCH
HEFT
HHDS
MTDR
PCH
HEFT
HHDS
0 MTDR
The frequency of best performance
enlarge ratio=0.2 40
Figure 14: The histogram of obtaining the best result times for the 4 methods on each groups
with the sizes randomly set in the interval [10 MB, 15 MB]. 2) The arrival interval between successive tasks is 20 s. 3) There are 3 types of devices: the high-efficiency device with a processing speed of 10 Mbps, the middle-efficiency device with 6 Mbps, and the low-efficiency device with 3 Mbps. The
designed 6 device sets Set1-6 are list in table 5, where each set contains 9 devices. For instance, in Set1 there are 2 high efficient devices, 4 middle efficient devices, 3 low efficient devices. 4) Schedule the D500 on each device set utilizing the MTDR method with k=1.5, and record the execution time of each task on D500.
435
Fig.15 is the scatter diagram of the 500 tasks on 6 sets, where the efficiency of the sets is decreasing from Set1 to Set6, and the execution time of the 4 methods is increasing. In Fig.15, the MTDR has the superiority over the other methods on Set1 and Set2, while the performance of MTDR is close to the others on Set5 and Set6. The reason is that, the distribution of the 3 types
440
of devices is relatively uniform, thus the MTDR considering the balance load performs better. However, the type of devices tend to be the same on Set5 and Set6, thus the MTDR performs closely to the other methods.
7. Conclusion We designed a multi-scheduler scheduling method MTDR orient to the dis445
tributed environment. The innovative thinking of this paper is utilizing the
24
Table 5: The 6 device sets Set1-6
Device Types
Set1
Set2
Set3
Set4
Set5
Set6
high efficient device
2
2
1
1
0
0
middle efficient device
4
3
3
2
2
0
low efficient device
3
4
5
6
7
9
6HW
3&+
+()7
++'6
07'5
3&+
+()7
++'6
07'5
Figure 15: The scatter diagram of the 500 tasks on 6 sets
feedback mechanism to control the following scheduling result, denoted as LSF, to balance the device load. The proposed device Busyness modeling method, denoted by LSF, is utilizing the frequency and the beginning time of the occupation segments as the input, and treating the Busyness as the control parameters 450
of rank. By that, the task slices tend to be allocated to the relative idle devices, realizing the balance load. We utilize the experimental study to determine the parameters’ optimal interval, and verified the MTDR performs better than other methods, such as HEFT, PCH and HHDS. The experimental process can be used as the general
455
method to analysis the parameters. In the experiments, the efficiency of MTDR is tested on the arrival time, device dependence, task structure, CCR, device set, 5 aspects. The experiments shows that because of considering the device load the MTDR performs best in multiple tasks scheduling with device conflict. When the conflict is less, the performance of MTDR is close to the other classic
25
460
DAG scheduling methods. Thus the MTDR can also meet the requirement of single scheduler system. The MTDR method we designed can provide a basis for big data scheduling, cloud computing scheduling, having practical significance on the study of parallel scheduling. Furthermore, the drawback of MTDR is unable to determine
465
the optimal value of k and T . Thus the further work is to research the optimal value of k and T . Acknowledgement We acknowledge the support of the National Natural Science Foundation of China under Grant Nos.61602133, 61672179, 61370083, 61370086; the Hei-
470
longjiang Postdoctoral Science Foundation(LBH-Z15096); China Postdoctoral Science Foundation Funded Project (2016M591541). References [1] J. Dean, S. Ghemawat, Mapreduce: Simplified data processing on large clusters., In Proceedings of Operating Systems Design and Implementation
475
(OSDI 51 (1) (2004) 107–113. [2] J. Dean, Mapreduce: simplified data processing on large clusters, Communications of the Acm 51 (1) (2008) 147152. [3] S. Bhardwaj, L. Jain, S. Jain, Cloud computing: A study of infrastructure as a service (iaas), International Journal of engineering and information
480
Technology 2 (1) (2010) 60–63. [4] M. Malawski, G. Juve, E. Deelman, J. Nabrzyski, Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in iaas clouds, Future Generation Computer Systems 48 (1) (2015) 1–18. [5] H. Xu, B. Yang, W. Qi, E. Ahene, A multi-objective optimization approach
485
to workflow scheduling in clouds considering fault recovery, Ksii Transactions on Internet & Information Systems (2016) 1–18. 26
[6] X. Nan, Y. He, L. Guan, Optimal resource allocation for multimedia cloud based on queuing model., in: IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), 2011, 2011, pp. 1–6. 490
[7] H. Kllapi, E. Sitaridi, M. M. Tsangaris, Y. Ioannidis, Schedule optimization for data processing flows on the cloud, in: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011, pp. 289– 300. [8] C. Lin, S. Lu, Scheduling scientific workflows elastically for cloud comput-
495
ing., in: 2013 IEEE Sixth International Conference on Cloud Computing, 2011, pp. 746–747. [9] H. Topcuouglu, S. Hariri, M. Y. Wu, Performance-effective and lowcomplexity task scheduling for heterogeneous computing, Parallel & Distributed Systems IEEE Transactions on 13 (3) (2002) 260–274.
500
[10] Y. K. Kwok, I. Ahmad, Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors, IEEE Transactions on Parallel & Distributed Systems 7 (5) (1996) 506–521. [11] W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, J. Dongarra, Hierarchical dag scheduling for hybrid distributed systems, IEEE International Parallel
505
& Distributed Processing Symposium (2015) 1–11. [12] K. Kim, V. Eijkhout, A parallel sparse direct solver via hierarchical dag scheduling, Acm Transactions on Mathematical Software 41 (1). [13] Y. Ma, L. Wang, A. Y. Zomaya, D. Chen, R. Ranjan, Task-tree based largescale mosaicking for massive remote sensed imageries with dynamic dag
510
scheduling, Parallel & Distributed Systems IEEE Transactions on 25 (8) (2014) 2126–2137. [14] L. F. Bittencourt, R. Sakellariou, E. R. M. Madeira, Dag scheduling using a lookahead variant of the heterogeneous earliest finish time algorithm, in:
27
2010 18th Euromicro Conference on Parallel, Distributed and Network515
based Processing, 2010, pp. 27–34. [15] L. F. Bittencourt, E. R. M. Madeira, Towards the scheduling of multiple workflows on computational grids, Journal of Grid Computing 8 (3) (2010) 419–441. [16] R. Sakellariou, H. Zhao, A hybrid heuristic for dag scheduling on heteroge-
520
neous systems, in: Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, 2004, pp. 111b–111b. [17] A. Rezaeian, H. Abrishami, S. Abrishami, M. Naghibzadeh, A budget constrained scheduling algorithm for hybrid cloud computing systems under data privacy, in: IEEE International Conference on Cloud Engineering,
525
2016, pp. 230–231. [18] S. Tayal, Tasks scheduling optimization for the cloud computing systems, International Journal of Advanced Engineering Sciences And Technologies (IJAEST) 5 (2) (2011) 111–115. [19] S. Pandey, L. Wu, S. M. Guru, R. Buyya, A particle swarm optimization-
530
based heuristic for scheduling workflow applications in cloud computing environments, in: 2010 24th IEEE International Conference on Advanced Information Networking and Applications, 2010, pp. 400–407. [20] Z. Wu, Z. Ni, L. Gu, X. Liu, A revised discrete particle swarm optimization for cloud workflow scheduling, in: 2010 International Conference on
535
Computational Intelligence and Security, 2010, pp. 184–188. [21] Y. Xin, Z. Q. Xie, J. Yang, A load balance oriented cost efficient scheduling method for parallel tasks, Journal of Network and Computer Applications (2016) 1–15. [22] Workflow Generator, https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator
540
(2014).
28