Engineering Applications of Artificial Intelligence 90 (2020) 103501
A hybrid meta-heuristic algorithm for scientific workflow scheduling in heterogeneous distributed computing systems✩ Mirsaeid Hosseini Shirvani ∗ Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
ARTICLE INFO
Keywords: Cloud computing; Directed acyclic graph (DAG); Discrete particle swarm optimization (DPSO)
ABSTRACT Cloud computing has attracted great attention in the research community because of its ubiquity, virtually unlimited computing resources, low cost, and the flexibility afforded by virtualization technology. This paper presents a hybrid meta-heuristic algorithm for scheduling parallelizable scientific workflows on elastic cloud platforms, since a single approach cannot yield optimal solutions for such complicated problems. Scientific workflows are modeled as directed acyclic graphs (DAGs) in which data dependencies exist between sub-tasks. In the cloud marketplace, each provider delivers various virtual machine (VM) configurations, which lead to different performance. In general, scheduling parallelizable tasks on parallel computing machines to obtain the minimum total execution time, the makespan, is an NP-Hard problem. To deal with this combinatorial problem, a hybrid discrete particle swarm optimization (HDPSO) algorithm with three main phases is presented. In the first phase, a random algorithm guided by novel theorems produces the swarm members, which form the input of the new discrete particle swarm optimization (DPSO) algorithm in the second phase. To avoid getting stuck in sub-optimal traps and to balance exploration and exploitation, a local search improvement is randomly combined into DPSO by calling a Hill Climbing technique in the third phase, enhancing overall performance. The second and third phases are iterated until the termination criterion is met. The average results of different executions with intensive settings on 12 scientific datasets show that our hybrid meta-heuristic dominates other existing meta-heuristics by 10.67, 14.48, and 3 percent in terms of SLR, SpeedUp, and efficiency, respectively.
1. Introduction

Heterogeneous distributed computing systems consist of sets of miscellaneous processors, machines, and computers interconnected by high-speed networks (Hosseini Shirvani and Babazadeh Gorji, 2020; Tanenbaum and Van Steen, 2007). They can share their resource capabilities to solve complicated problems. In the past, scientists ran their scientific workloads on distributed grid computing, also known as e-Science. Since the advent of cloud computing, it has been utilized not only for e-Business applications but also for e-Science problems. With its geographical pervasiveness, low cost, and the flexibility afforded by virtualization technology, cloud computing has attracted a lot of attention in the scientific and research communities. Mathematical parallelizable workflows, known as scientific workflows, are applied in several real-world engineering applications such as LU-decomposition, GJ-elimination, and FFT (Jin et al., 2008; Xu et al., 2014). A common representation of a parallel application models it as a directed acyclic graph (DAG) in which the nodes are application tasks and the directed edges connect data-dependent tasks. Such parallelizable applications vary widely in shape and have variable resource demands. In addition, users cannot scale up their own infrastructure without limit; a distributed system such as cloud computing, however, is sufficiently scalable, especially with its elasticity attribute, which provisions requested services commensurate with users' variable resource demands. In the cloud environment, independent tasks in a DAG can be executed simultaneously by multiple virtual machines (VMs). On the other hand, one of the most important quality-of-service (QoS) parameters is turnaround time, the duration between the start of the first task and the completion of the last task of the requested application; this duration is the so-called makespan, which is what a user actually experiences. This is the reason makespan minimization is taken as the prominent objective function in this paper. In general, scheduling tasks on parallel distributed machines to obtain the minimum makespan, i.e., the total execution time, is an NP-Hard problem (Johnson and Garey, 1979; Hosseini Shirvani et al., 2017; Amin and Hosseini Shirvani, 2009). Therefore, heuristic approaches can be
✩ This paper has been prepared under Islamic Azad University (Sari Branch) support. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2020.103501.
∗ Corresponding author. E-mail address: [email protected].
https://doi.org/10.1016/j.engappai.2020.103501
Received 1 September 2019; Received in revised form 14 December 2019; Accepted 13 January 2020; Available online xxxx
0952-1976/© 2020 Published by Elsevier Ltd.
1- To present two theorems along with their proofs, which help to generate a smart initial swarm.
2- To present a novel hybrid discrete PSO with new operators to solve the discrete task scheduling problem.
3- To present a Hill Climbing algorithm with a local search trend to balance exploration and exploitation in the search space.
applied to obtain a sub-optimal solution instead of exhaustively traversing all possible scheduling solutions. In the cloud marketplace, different providers such as Amazon, Google, and IBM offer various VM configurations, which lead to different performance (Hosseini Shirvani, 2019; Hosseini Shirvani et al., 2018). To reach optimal task scheduling, several works have been published in the literature; they can be classified into list-scheduling, heuristic-based, and meta-heuristic-based approaches. One of the basic ideas in this ambit is the list scheduling approach, which has attracted attention because of its low time complexity and rather good results. Heterogeneous earliest finish time (HEFT) is one of the best-known list scheduling algorithms (Topcuoglu et al., 2002). It has two phases: in the first phase it builds a list of tasks in topological sort order, and in the second phase it picks the task at the front of the list and finds the processor/VM/computer that guarantees the earliest finish time (EFT) of that task. It also preserves the dependency constraints between sub-tasks. Heuristic-based approaches comprise clustering and duplication techniques. The task duplication technique rests on the fact that a parent task with many children is in a critical position; it can therefore be duplicated on one or more machines to open new parallel paths, leading to a reduction in task finishing time (Lin et al., 2013; Sinnen et al., 2009; Bansal et al., 2003). The clustering technique, on the other hand, tries to cluster dependent tasks with high communication costs so that they execute on the same processor/VM/computer; in this way the communication cost can be omitted and consequently the execution time is reduced (Tang et al., 2010; Mishra et al., 2012; Sih and Lee, 1993). Besides the aforementioned approaches, abundant meta-heuristic algorithms have been published in this domain to cover the weaknesses of list schedulers and heuristic-based algorithms.

For instance, a genetic-based task scheduling algorithm for heterogeneous computing systems has been presented by Xu et al. (2014). Its only smart feature is the use of multiple priority queues derived from list schedulers; however, it cannot smartly explore the search space in later rounds because it explores the search space uniformly. Al Badawi and Shatnawi designed a PSO-based task scheduling algorithm (Al Badawi and Shatnawi, 2013). Although it converges very fast, it fails to balance exploration and exploitation because it tends to explore the search space globally and neglects to enhance the obtained solutions via local search improvement. Reviews of the literature reveal that list schedulers and heuristic-based approaches produce only approximate solutions, which are not necessarily optimal; likewise, applying a single meta-heuristic approach does not yield an optimal solution, because for larger task graphs the search space is very large-scale and current approaches cannot explore it well, so an exact optimal solution is rarely found by a single meta-heuristic algorithm. This revolves around the fact that most of the literature tends to explore the search space globally while neglecting to enhance the local search; therefore, the overall performance is not what it should be. Since task scheduling problems have a discrete optimization search space, there exist few discrete optimization methodologies that solve this kind of problem in a logical time window. Among them, simulated annealing (SA) has attracted great attention because of its fast convergence, albeit with low optimality (Damodaran and Vélez-Gallego, 2012). The performance results of SA reported in Jin et al. (2008) and Akbari et al. (2017) prove that SA cannot control the huge search space of this type of problem. Therefore, a fast and rather optimal algorithm was needed to address this problem.

Hence, we customized PSO, a fast optimization algorithm, to our requirements, and we obviate its original weaknesses through two novel contributions. First, we present a new version of discrete PSO commensurate with the discrete search space. Second, an exploitation technique is randomly applied to improve local search as well as global solution quality. In this way, shortcomings such as premature convergence and getting stuck in local optimum traps are overcome. This is the reason for preparing the current paper. The main contributions of the current paper, as listed above, are two theorems with proofs that help generate a smart initial swarm, a novel hybrid discrete PSO with new operators, and a Hill Climbing local search that balances exploration and exploitation.
This novel hybrid meta-heuristic approach has several phases: first, the initial swarm is randomly generated based on the proved theorems. This generated population is the input of the PSO algorithm. Owing to the discrete nature of the problem, the canonical continuous PSO is no longer applicable; our new discrete PSO (DPSO), with new updating rules, therefore tackles the problem. Also, to balance exploration and exploitation in the search space, the algorithm randomly applies a local search treatment, the Hill Climbing technique, to improve overall performance. The results of different settings indicate promising outcomes against other state-of-the-art methods. The rest of the paper is structured as follows: Section 2 is dedicated to related works in the literature; Section 3 clarifies the task scheduling model and formulation; theorems and motivation are placed in Section 4. Our proposed novel hybrid meta-heuristic algorithm is elaborated in Section 5. Time complexity is discussed in Section 6. Experiments and analysis are presented in Section 7. Finally, Section 8 concludes the paper along with future directions.

2. Related works

Task scheduling approaches can be classified into the three classes below.

List schedulers: In a list scheduler, each task is assigned a weight as a priority that guarantees the dependency constraints. Task selection is based on the determined priority, and the available processor/VM that returns the earliest finish time of the task is selected. Heterogeneous earliest finish time (HEFT) and critical path on a processor (CPOP) are the most famous list schedulers (Topcuoglu et al., 2002). In the HEFT algorithm, two types of ranking, upward and downward, are used to weight the graph nodes. In upward ranking, the algorithm proceeds from the exit node, which has no child, to the entry node, which has no parent, assigning an appropriate weight to each node.
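The three-phase loop just outlined can be summarized in code. The following is a minimal structural sketch only, not the paper's implementation: the particle encoding, the `fitness` callback, the improvement-accepting swap update, and the `p_local` trigger are our own illustrative placeholders for the theorem-based initialization, DPSO update rules, and Hill Climbing operator that the paper defines in later sections.

```python
import random

def hdpso(tasks, fitness, iters=50, swarm_size=8, p_local=0.3, seed=1):
    """Structural sketch of the three-phase HDPSO loop (placeholder operators)."""
    rng = random.Random(seed)
    # Phase 1: initial swarm of task orderings. (The paper seeds this with
    # ranked lists and theorem-based permutations, not pure randomness.)
    swarm = [rng.sample(tasks, len(tasks)) for _ in range(swarm_size)]
    best = min(swarm, key=fitness)[:]
    for _ in range(iters):
        # Phase 2: discrete "velocity" stand-in -- nudge each particle toward
        # the global best by aligning one randomly chosen position with it.
        for i, p in enumerate(swarm):
            j = rng.randrange(len(p))
            q = p[:]
            k = q.index(best[j])
            q[j], q[k] = q[k], q[j]          # align position j with the best particle
            if fitness(q) <= fitness(p):
                swarm[i] = q
        # Phase 3: randomly triggered Hill Climbing over an adjacent-swap
        # neighborhood of the incumbent best solution.
        if rng.random() < p_local:
            for a in range(len(best) - 1):
                q = best[:]
                q[a], q[a + 1] = q[a + 1], q[a]
                if fitness(q) < fitness(best):
                    best = q
        cand = min(swarm, key=fitness)
        if fitness(cand) < fitness(best):
            best = cand[:]
    return best

# Toy fitness: count inversions, so the identity ordering is optimal.
tasks = list(range(6))
inversions = lambda p: sum(1 for a in range(6) for b in range(a + 1, 6) if p[a] > p[b])
print(hdpso(tasks, inversions))  # a low-inversion permutation of 0..5
```

The toy fitness stands in for the makespan evaluation; the point is only the control flow of the hybrid loop: swarm generation, discrete update, and randomly invoked local search repeated until termination.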
In this way, parent nodes have greater weights than their children; the final list is obtained by ordering the nodes by decreasing weight. The other technique, downward ranking, proceeds from the entry node to the exit node and tries to make children's weights exceed their parents'; in this case, the lowest weight indicates the highest priority. The CPOP approach, on the other hand, determines the critical path and strives to put the critical tasks on the fastest processors/VMs. Several works in this domain, such as CCP, CEFT, RHEFT, and DHEFT, are extensions of the HEFT and CPOP algorithms (Khan, 2012; Thaman and Singh, 2017). Although list schedulers benefit from promising techniques, they clearly lack good exploration of the search space.

Heuristic-based approaches: Two heuristic algorithms, namely the clustering and duplication techniques, have been presented to improve scheduling performance. The former is used to reduce graph communication costs: the clustering method tries to bundle dependent tasks with high communication delay so that they are mapped onto the same processor/VM, which precludes data transfer between two consecutive dependent tasks (Sahni and Vidyarthi, 2016; Gkoutioudi and Karatza, 2010). The latter, the duplication method, applies task duplication on different processors/VMs. It rests on the fact that a parent task with many children is crucial; such a task can be duplicated on one or more machines to open new parallel paths, leading to a reduction in processor/VM communication and in task finishing time. Bansal et al. (2003), Lin et al. (2013) and Mishra et al. (2012) have exploited the aforementioned techniques. However, these techniques are not suitable for limited computing platforms (Bansal et al., 2003; Hosseini Shirvani, 2015).
Meta-heuristic-based approaches: Besides canonical list schedulers and traditional heuristic approaches, meta-heuristic algorithms have been extended to harness such large-scale search spaces. Genetic-based, PSO-based, and ant colony optimization (ACO)-based algorithms are the most popular. For instance, a new shuffled genetic-based task scheduling algorithm for distributed heterogeneous systems has been proposed by Hosseini Shirvani (2018). A quantum genetic algorithm with rotation angle refinement for dependent task scheduling on distributed systems has been published in Gandhi and Alam (2017). Akbari et al. have presented an enhanced genetic algorithm with new operators to solve the task scheduling problem (Akbari et al., 2017). A multiple priority queue genetic algorithm (MPQGA) has been propounded for task scheduling in heterogeneous systems (Xu et al., 2014). Al Badawi and Shatnawi presented a PSO-based static task scheduling algorithm in 2013 (Al Badawi and Shatnawi, 2013); their algorithm explores the search space only globally, and thus saturates in a local optimum. Another PSO-based algorithm, entitled ''Self-Adaptive Learning PSO-Based Deadline Constrained Task Scheduling for Hybrid IaaS Cloud'', has been presented by Zuo et al. (2013). Kang and He presented a new DPSO algorithm for meta-task assignment in parallel distributed systems (Kang and He, 2011). Sarathambekai and Umamaheswari presented a discrete PSO task scheduling algorithm for distributed systems that works on a dynamic topology, a binary heap tree, for communication between the particles in the swarm (Sarathambekai and Umamaheswari, 2017); nevertheless, their particles traverse the search space uniformly and neglect local neighborhood search. Pendharkar proposed an ant colony optimization (ACO) approach for the constrained task allocation problem (Pendharkar, 2015); this single method could not deliver sustainable results.
In summary, the study of related works on static list schedulers and on heuristic and meta-heuristic approaches reveals that the aforementioned methods explore the search space uniformly, without any balance between exploration and exploitation that could promisingly improve the generated solutions. Applying only a single method does not lead to an optimal solution for this combinatorial problem; this is the reason for developing the current hybrid meta-heuristic algorithm, which takes the Hill Climbing technique with a local search trend to improve the solutions generated by the meta-heuristic algorithm.
Fig. 1. Proposed system framework (Hosseini Shirvani, 2018).
3. Task scheduling modeling and formulation

Here the task scheduling problem formulation and modeling are presented. To this end, the system, application, and scheduling models are introduced in turn.

3.1. System and heterogeneity model
Fig. 2. A fully connected parallel system with four heterogeneous computing nodes.
Our proposed system framework is depicted in Fig. 1. It has several components: a Front-end module, a Cloud Broker, a Scheduler, VMs, and a Datacenter. The Front-end component receives user applications that need to be executed on parallel processors/VMs. The Cloud Broker delivers to users the semantics of the cloud service capability and the related QoS. The datacenter contains a set of m different heterogeneous processors interconnected with high-speed networks; the system heterogeneity stems from their architectures and speeds, and each processor can run multiple VMs. Such a system model is illustrated in Fig. 2. The number one on the links indicates that the network speed is normalized and all processors are uniformly accessible to each other. Each VM has its own configuration, so every task has a different final execution time depending on the underlying VM's configuration. Moreover, execution time can be estimated in advance by datacenter profiling techniques. Also, the physical machines are equipped with high-speed networks; therefore, VMs can easily communicate with each other. If all processors of a distributed system were identical, the runtime of a given task would be the same on each processor, but in real distributed systems the speeds of the processors vary. The heterogeneity model captures the different speeds of the processors in executing a given task. Eq. (1) gives the degree of asymmetry of the processors:

degree of asymmetry = (1 + h) / (1 − h)    (1)
Here h ∈ [0, 1) is the heterogeneity parameter. If we take h = 0, the degree of asymmetry is 1; if we take h = 0.5, the degree of asymmetry is 3. In the latter case, the fastest processor of the distributed system is 3 times faster than the slowest in executing a given task. The computation cost of task T_i on processor p_j, W(T_i, p_j), is calculated by Eq. (2), where W_comp(T_i) is the amount of computation of task T_i and S(T_i, p_j) is the execution speed of task T_i on processor p_j:

W(T_i, p_j) = W_comp(T_i) / S(T_i, p_j)    (2)
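Eqs. (1) and (2) are straightforward to check numerically. The sketch below uses our own function names; the computation amount of 100 units is an invented example value.

```python
def degree_of_asymmetry(h):
    """Eq. (1): ratio of the fastest to the slowest processor speed, h in [0, 1)."""
    assert 0 <= h < 1
    return (1 + h) / (1 - h)

def computation_cost(w_comp, speed):
    """Eq. (2): W(Ti, pj) = Wcomp(Ti) / S(Ti, pj)."""
    return w_comp / speed

print(degree_of_asymmetry(0.0))  # 1.0 -> homogeneous system
print(degree_of_asymmetry(0.5))  # 3.0 -> fastest processor is 3x the slowest
# An invented task of 100 computation units on a slow (0.25) vs. fast (1.0) processor:
print(computation_cost(100, 0.25), computation_cost(100, 1.0))  # 400.0 100.0
```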
Fig. 3. Mathematical task graphs (Jin et al., 2008).
Moreover, w(T_i) is the average computation cost of task T_i, calculated by Eq. (3):

w(T_i) = Σ_{j=1}^{q} W(T_i, p_j) / q    (3)

where the parameter q is the number of processors in the distributed system. In heterogeneous systems, inequality (4) holds:

w(T_i)(1 − h/2) ≤ W(T_i, p_j) ≤ w(T_i)(1 + h/2)    (4)

In case h = 0, the degree of asymmetry according to Eq. (1) is 1 and the execution time of task T_i is the same on every processor p_j because the processors are homogeneous. In our simulation, we consider a heterogeneous system with h = 0.6 and 4 processors whose speeds are normalized to 0.25, 0.5, 0.75, and 1.0; in other words, the fastest processor is 4 times quicker than the slowest one in this system. We also consider a homogeneous scenario by taking h = 0.
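Eq. (3) and inequality (4) can be illustrated on row t1 of Table 1 (execution times 7 and 9 on the two processors), together with the paper's simulated heterogeneity h = 0.6; the helper names below are ours.

```python
def average_cost(costs):
    """Eq. (3): mean of W(Ti, pj) over the q processors."""
    return sum(costs) / len(costs)

def within_heterogeneity_bounds(costs, h):
    """Inequality (4): every per-processor cost lies in w*(1 - h/2) .. w*(1 + h/2)."""
    w = average_cost(costs)
    return all(w * (1 - h / 2) <= c <= w * (1 + h / 2) for c in costs)

# Row t1 of Table 1: execution times 7 and 9.
print(average_cost([7, 9]))                      # 8.0
print(within_heterogeneity_bounds([7, 9], 0.6))  # True: 5.6 <= 7, 9 <= 10.4
```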
3.2. Application model

Parallel applications are modeled in the form of DAGs. Fig. 3 depicts such applications, which have sub-task inter-dependencies: the LU-decomposition, GJ-elimination, and FFT task graphs are illustrated in Fig. 3a, Fig. 3b, and Fig. 3c, respectively. Note that, in linear algebra and numerical analysis, LU-decomposition factors a matrix as the product of a lower triangular matrix and an upper triangular matrix; computers usually solve square systems of linear equations using LU-decomposition. Similarly, the GJ-elimination algorithm, also known as row reduction, is used in linear algebra to solve systems of linear equations. The FFT algorithm computes the discrete Fourier transform (DFT) of a sequence, converting a signal from its original domain (time or space) to the frequency domain and vice versa; it reduces the computing complexity of the DFT from O(n^2) to O(n log n), where n is the input data size. Serial execution of such mathematical algorithms is time-consuming, especially in an interactive environment, so scheduling these problems on parallel platforms can improve performance and reliability. If a DAG has several exit/entry nodes, we add a dummy node with zero processing and communication costs. Fig. 3b demonstrates the GJ-elimination graph, which needs a dummy input node T_0 connected to tasks T_1 through T_i to form a standard DAG; likewise, Fig. 3c needs a dummy exit node T_16 with connections from nodes T_12 through T_15. In addition, the number of nodes of an LU-decomposition graph is (L^2 + 5L + 4)/2 for level L ≥ 1, and the number of nodes of a GJ-elimination graph is (L^2 + 3L + 2)/2, which likewise depends on the level L. Moreover, an FFT task graph needs 2m − 1 recursive-call tasks and m·log(m) butterfly-operation tasks, where the parameter m is the length of the input vector, which must be a power of 2 (Topcuoglu et al., 2002).

Fig. 4. An example of a DAG application with 11 subtasks.

Since the majority of mathematical workflows have homogeneous and symmetric task graph structures, as can be seen in Fig. 3, we also generate random, approximately asymmetric and heterogeneous task graphs to prove our proposed algorithm's robustness over a wide variety of inputs. For example, Fig. 4 is a random task graph. As the underlying architecture is heterogeneous, Table 1 shows the execution time of each task on each processor; moreover, the number on an arc indicates the average communication cost, incurred when the two dependent tasks are executed on different processors/VMs. Each task graph, a DAG, has nodes that represent computations and edges that show the average communication costs between nodes; an edge also indicates a precedence constraint between its endpoints. Every subtask should be executed on one computational node of the target parallel system. Also, each DAG has two special
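The graph-size formulas above translate directly into code; this is an illustrative sketch with our own function names.

```python
import math

def lu_nodes(L):
    """LU-decomposition DAG size at level L >= 1: (L^2 + 5L + 4) / 2."""
    return (L * L + 5 * L + 4) // 2   # (L+1)(L+4) is always even

def gj_nodes(L):
    """GJ-elimination DAG size at level L >= 1: (L^2 + 3L + 2) / 2."""
    return (L * L + 3 * L + 2) // 2   # (L+1)(L+2) is always even

def fft_tasks(m):
    """FFT on an input vector of length m (a power of 2):
    2m - 1 recursive-call tasks plus m*log2(m) butterfly tasks."""
    assert m >= 2 and m & (m - 1) == 0
    return 2 * m - 1, int(m * math.log2(m))

print(lu_nodes(4), gj_nodes(4), fft_tasks(8))  # 20 15 (15, 24)
```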
nodes, T_Entry and T_Exit, that have no predecessor and no successor nodes, respectively. As the defined system is heterogeneous in nature, processing each subtask of the DAG on different computational nodes incurs different costs. Although execution times could be real numbers, integer values are used for simplicity. Note that the execution time can be taken as deterministic, since the number of instructions is determined at compile time and the processing power of the underlying infrastructure is known in advance; the execution time is then obtained from Eq. (2). For instance, Table 1 shows the different task execution times on different processing nodes; the last row indicates the average execution time. Moreover, to determine whether an application is computation-intensive or communication-intensive, we apply the communication-to-computation ratio (CCR) concept, obtained via Eq. (5) (Topcuoglu et al., 2002):

CCR = [ (1/e) Σ_{edge(T_i, T_j) ∈ E} C(T_i, T_j) ] / [ (1/n) Σ_{T_i ∈ T} w(T_i) ]    (5)

For instance, the CCR of the graph depicted in Fig. 4 is 0.7189, which indicates that this graph is computation-intensive.

3.3. Scheduling model

List schedulers are well-known scheduling algorithms in distributed systems. The Heterogeneous Earliest Finish Time (HEFT) algorithm belongs to the list scheduler category; it was first introduced by Topcuoglu et al. for static task scheduling on limited heterogeneous parallel processing systems (Topcuoglu et al., 2002). To apply such a static-oriented algorithm in a dynamic environment such as cloud computing, a static time window can be determined so that the problem is treated in a static fashion (Burkimsher et al., 2013). HEFT utilizes two important functions: Earliest Start Time (EST) and Earliest Finish Time (EFT). The former indicates the earliest time at which processor p_j can start executing subtask T_i, whereas the latter indicates the earliest time at which that execution can finish. The earliest start time of the entry task of a DAG is zero, as given by Eq. (6); the EST and EFT of the other nodes are calculated by Eqs. (7) and (8), respectively (Topcuoglu et al., 2002).

EST(T_entry, p_j) = 0    (6)

EST(T_i, p_j) = max{ avail{j}, max_{T_m ∈ pred(T_i)} ( AFT(T_m) + C(T_m, T_i) ) }    (7)

EFT(T_i, p_j) = W(T_i, p_j) + EST(T_i, p_j)    (8)

The function pred(T_i) in Eq. (7) denotes the set of all immediate predecessors of T_i in the DAG. The term avail{j} is the time at which processor p_j has accomplished its last assigned task and is ready for the next one. The inner max in Eq. (7) determines the latest actual finish time (AFT), plus communication cost, over the tasks in pred(T_i). The outer max captures the situation where the output of the last predecessor of T_i becomes ready later than avail{j}: despite p_j being ready, execution is postponed until the last predecessor of T_i has finished, which precludes any violation of the dependency constraints of the given DAG. Conversely, if avail{j} is greater than the finish time of the last subtask in pred(T_i), then despite the completion of all subtasks in pred(T_i), the actual execution cannot start until processor p_j is ready. The parameter C(T_m, T_i) denotes the average data-transfer time between the processors executing T_m and T_i; if both are executed on the same processor, C(T_m, T_i) = 0. The actual finish time of subtask T_i is calculated by Eq. (9) (Topcuoglu et al., 2002), and the total execution time of the DAG, the so-called makespan, by Eq. (10) (Topcuoglu et al., 2002).

AFT(T_i) = min_{1≤l≤m} EFT(T_i, p_l)    (9)

makespan = max{ AFT(T_exit) }    (10)

Therefore, the scheduling objective of this paper can be stated as the optimization problem formulated in Eq. (11):

min{ makespan } = min{ max{ AFT(T_exit) } }    (11)

subject to the precedence constraints, which must be preserved. The aforesaid algorithm is executed in two phases. In the first phase, subtasks are sorted based on a predetermined priority; Upward ranking, Downward ranking, and Level ranking are three typical approaches in this ambit (Xu et al., 2014). For instance, in the Upward ranking approach each subtask is assigned a weight, proceeding from the exit subtask to the entry subtask. The value for the exit subtask is calculated by Eq. (12); for the other subtasks, the value is calculated recursively by Eq. (13) (Topcuoglu et al., 2002).

rank_u(T_exit) = w_exit    (12)

rank_u(T_i) = w_i + max_{T_j ∈ succ(T_i)} ( C(T_i, T_j) + rank_u(T_j) )    (13)

Note that the function succ(T_i) denotes the set of all immediate successors of task T_i in the DAG; the average execution time of each node is calculated by Eq. (3), stated in Section 3.1. On the other side, in the Downward approach, the priority value is calculated from the entry node to the exit node. This value is zero for the entry subtask, whereas for the other subtasks it is calculated recursively by Eq. (14) (Topcuoglu et al., 2002).

rank_d(T_i) = max_{T_j ∈ pred(T_i)} ( rank_d(T_j) + w_j + C(T_j, T_i) )    (14)

In the Level ranking approach, the level of each subtask is calculated by Eq. (15) (Topcuoglu et al., 2002); each subtask is then placed in the sorted list in increasing order of its level value. If two subtasks have the same level, the one with the greater sum of Upward and Downward rank values is selected first, from left to right in the ordered list.

Level(T_i) = 0, if T_i = T_entry; max_{T_j ∈ pred(T_i)} ( Level(T_j) ) + 1, otherwise    (15)

For instance, for the DAG depicted in Fig. 4, Table 2 shows the priority value of each node under the different approaches. Note that the sum of the upward and downward values yields a further ranking method; the nodes with the largest sum form the graph's critical path, which makes them high-priority nodes. On this basis we define the Level ranking method, in which we try to place such tasks on the fastest processor, as the CPOP algorithm does in Topcuoglu et al. (2002). We thereby place at least three promising lists in our initial population: the three subtask lists [t1 t3 t2 t7 t6 t4 t5 t8 t10 t9 t11], [t1 t2 t3 t4 t6 t7 t5 t9 t10 t8 t11] and [t1 t3 t2 t7 t5 t6 t4 t8 t10 t9 t11] are valid topological sorts based on the Upward, Downward, and Level ranking approaches, respectively (Xu et al., 2014). As stated before, in the second phase a subtask is picked from the front of the sorted list to be scheduled; the algorithm then searches for a processor that guarantees the earliest finish time of that subtask. The search space can therefore be explored in many different ways, since there are several possibilities for choosing a task for execution. In this discrete space there are very many feasible permutations of subtasks, not only for large DAGs but even for small ones; this is why deterministic algorithms cannot master this kind of optimization problem.
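As a sketch of how Eqs. (6)-(10) interact, the following toy schedule computes EST/EFT greedily on an invented 4-task DAG with two processors. It is not the Fig. 4 instance from the paper, and it uses a simple non-insertion policy (full HEFT also considers idle slots between already-scheduled tasks); all costs below are made up for illustration.

```python
# W[task][proc]: execution cost of each task on each processor (output of Eq. (2)).
W = {"T1": [3, 5], "T2": [4, 4], "T3": [6, 2], "T4": [2, 3]}
pred = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
C = {("T1", "T2"): 2, ("T1", "T3"): 1, ("T2", "T4"): 3, ("T3", "T4"): 2}

def schedule(order):
    """Greedy EST/EFT scheduling of a topologically sorted task list."""
    avail = [0.0, 0.0]          # avail{j}: ready time of each processor
    AFT, where = {}, {}
    for t in order:
        best = None
        for p in range(2):
            # Eq. (7): EST = max(avail{j}, max over predecessors of AFT + comm);
            # communication cost is zero when both tasks share a processor.
            ready = max((AFT[m] + (0 if where[m] == p else C[(m, t)])
                         for m in pred[t]), default=0.0)
            est = max(avail[p], ready)
            eft = est + W[t][p]                # Eq. (8)
            if best is None or eft < best[0]:
                best = (eft, p)
        AFT[t], where[t] = best                # Eq. (9): keep the minimal EFT
        avail[best[1]] = best[0]
    return max(AFT[t] for t in order)          # Eq. (10): makespan

print(schedule(["T1", "T2", "T3", "T4"]))  # → 10.0
```

With this ordering, T3 migrates to the second processor because paying its communication delay of 1 is cheaper than waiting for the first processor, and the makespan comes out as 10.0.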
3.4. An illustrative example

An illustrative example is needed to show the discrepancies in the performance of the approaches that solve task scheduling problems. Fig. 5 demonstrates the execution of the Upward (Topcuoglu et al., 2002), Downward (Topcuoglu et al., 2002), GA-based (Xu et al., 2014), PSO-based (Al Badawi and Shatnawi, 2013), and proposed HDPSO approaches for the DAG depicted in Fig. 4. Note that all of the algorithms in Xu et al. (2014), Topcuoglu et al. (2002) and Al Badawi and Shatnawi (2013) have been studied and executed on the same graph under the same circumstances, and the results are reported in Fig. 5. These figures demonstrate the success of HDPSO in mastering the search space.
Table 1
Different task execution times on different processing nodes.

Tasks          t1   t2    t3   t4   t5   t6   t7    t8    t9   t10  t11
P1             7    10    5    6    10   11   12    10    8    15   8
P2             9    9     7    8    8    13   15    13    9    11   9
Average (ω̄)    8    9.5   6    7    9    12   13.5  11.5  8.5  13   8.5
Table 2
Task priority based on three approaches.

Tasks                    t1     t2    t3     t4    t5    t6    t7     t8     t9    t10   t11
rank_u(t_i)              104.5  76.5  82.5   53.5  52    55.5  63.5   35     24    31.5  8.5
rank_d(t_i)              0      20    22     37.5  44.5  40.5  41     69     59.5  68.5  96
Level                    0      1     1      2     2     2     2      3      3     3     4
rank_u(t_i)+rank_d(t_i)  104.5  96.5  104.5  90.5  96.5  96    104.5  104.5  83.5  100   104.5
4. Theorems and motivation

In list schedulers, a topological order of the task list must be provided to preserve the dependency constraints. Existing list schedulers and heuristics cannot explore the search space well. Here we present some theorems that can be exploited for problem optimization, and we show that this type of problem is NP-Hard; we then state our motivation for tackling this combinatorial problem. The theorems are constructed so as to help find optimal solutions.

Theorem 1. A task scheduling solution S includes a given topological sort of the tasks, which determines their execution order. Each sub-list S′ obtained from S by deleting one or more tasks is still a topological sort list.

Proof. If a task T_i is deleted from a valid topological sort list, the dependency constraints are trivially not violated; only adding tasks to a list requires checking whether the constraints are violated. The theorem therefore also holds for any sub-list of the main list.

Theorem 2. A solution S, containing a list of ordered tasks, has the topological attribute if each task T_i in the ordered list is placed after the last element of pred(T_i) and before the first element of succ(T_i).

Proof. Let L be an ordered list of n tasks, so each task T_i ∈ L[1..n]. If task T_j = L[p] is the last task of pred(T_i) and T_k = L[q] is the first task of succ(T_i) in the topological sort L, then each task T_m ∈ L[p+1..q−1] has no dependency on T_i; therefore, the tasks can be placed in arbitrary order between those positions.

To apply the aforesaid theorems, we categorize nodes by detecting their levels. The parents of the nodes in level d are in level (d−1) and their children are in level (d+1); therefore, each arbitrary permutation of the tasks in level d, kept between levels (d−1) and (d+1), is valid. This can be observed in the lists below, from which several topological sort lists can be derived. We can even go further and permute tasks from distant levels, provided the dependency constraints are not violated; this is why we apply such moves in the exploitation step of the hybrid approach to improve local-search solutions.

List = [T_Entry, …, {tasks in level (d−1)}, {tasks in level d}, {tasks in level (d+1)}, …, T_Exit]
Ordered list based on node level = [T_1, {T_2, T_3}, {T_4, T_5, T_6, T_7}, {T_8, T_9, T_10}, T_11]
List_1 = [T_1, T_2, T_3, T_4, T_5, T_6, T_7, T_8, T_9, T_10, T_11]

On a system with 2 non-identical VMs/processors, executing List_1 with HEFT yields a makespan of 83, as Fig. 6a demonstrates. On the other hand, List_2 = [T_1, T_2, T_3, T_5, T_4, T_7, T_6, T_8, T_9, T_10, T_11] is derived from List_1 by placing {T_5, T_4, T_7, T_6}, one of the 4! = 24 valid permutations of {T_4, T_5, T_6, T_7} in level 2 of the graph, known as Segment_2. Fig. 6b proves the effectiveness of the proposed theorems, which reduce the makespan from 83 to 78.

Motivation: For simplicity, assume we are given a scientific DAG with n vertices and k levels on average, each level containing n/k tasks on average. There are many such scientific workloads, e.g., LIGO, CyberShake, Montage, SIPHT and Epigenomics (Bharathi et al., 2008; Verma and Kaushal, 2017; Haidri et al., 2017; Abrishami et al., 2013). We put the tasks of the same level into a segment; to avoid confusion, we use the segment concept instead of the cluster concept known from heuristic scheduling approaches. According to the stated theorems, each arbitrary permutation of the tasks within a segment is valid. Take List = [{Segment_1}, {Segment_2}, …, {Segment_k}], where each segment holds a set of sibling tasks. Fig. 7 illustrates the possible permutations. Each segment admits (n/k)! possible states, so the total number of states is lower-bounded by Eq. (16):

At least probable states = ∏_{i=1}^{k} (n/k)! = ((n/k)!)^k ∈ Ω(((n/k)!)^k)    (16)

We use the Ω(·) notation because there are even more possibilities from permuting tasks across distant levels; we take this count as a floor. For instance, in List_1 for the graph depicted in Fig. 4, we can exchange task T_3 with tasks T_4, T_5 and T_6 in the next level (but not with T_7, of which T_3 is a parent) to make a new list, although they do not belong to the same level. Moreover, each possible permutation leads to different results and performance. The problem is therefore NP-Hard: for large n there is no deterministic polynomial-time algorithm that reaches the optimal solution within a limited time window. For instance, consider a moderate graph with 50 nodes and approximately 5 levels of 10 nodes each; the minimum number of lists to investigate is (10!)^5 ≈ 6.29 × 10^32 based on Eq. (16). On a platform that investigates one list per nanosecond, the full investigation would take myriad centuries.
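The level-based segment construction and the state count of Eq. (16) can be sketched as follows. The DAG encoding (a predecessor map) and the small example graph are illustrative assumptions, not the paper's exact data structures.

```python
import math
import random
from collections import defaultdict

def levels(n_tasks, preds):
    """Level of each task: entry tasks are level 0, every other task
    sits one level below its deepest parent."""
    lvl = {}
    def depth(t):
        if t not in lvl:
            lvl[t] = 0 if not preds[t] else 1 + max(depth(p) for p in preds[t])
        return lvl[t]
    for t in range(n_tasks):
        depth(t)
    return lvl

def random_level_order(n_tasks, preds):
    """Theorem-based random topological list: shuffle tasks inside each
    level segment and concatenate the segments in level order."""
    lvl = levels(n_tasks, preds)
    segments = defaultdict(list)
    for t, d in lvl.items():
        segments[d].append(t)
    order = []
    for d in sorted(segments):
        random.shuffle(segments[d])   # any permutation inside a segment is valid
        order += segments[d]
    return order

def is_topological(order, preds):
    """Theorem 2 check: every task appears after all of its predecessors."""
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[p] < pos[t] for t in pos for p in preds[t])

def min_states(n, k):
    """Lower bound of Eq. (16): ((n/k)!)**k lists to enumerate."""
    return math.factorial(n // k) ** k
```

For the 50-node, 5-level example in the text, `min_states(50, 5)` equals 10!⁵ ≈ 6.29 × 10³².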
5. Proposed hybrid meta-heuristic algorithm definition

Before presenting the novel HDPSO, we briefly explain the canonical PSO, the new DPSO along with its new operators, and the Hill Climbing technique.

5.1. Canonical PSO

The PSO algorithm was introduced by Kennedy and Eberhart in 1995, inspired by collective animal movements such as the behavior of flocks of birds and schools of fish (Eberhart and Kennedy, 1995). PSO is a population-based meta-heuristic like other evolutionary
Fig. 5. An illustrative example.
algorithms. Assume X⃗_i = (x_i1, x_i2, …, x_id) is the position of the ith particle in a d-dimensional search space, V⃗_i = (v_i1, v_i2, …, v_id) is the particle's velocity attribute, and P⃗_i = (p_i1, p_i2, …, p_id) is the ith particle's best personal experience so far. The next-step velocity and trajectory of the particle, in each dimension j, are governed by Eqs. (17) and (18) respectively:

v_ij(t+1) = ω·v_ij(t) + c1·r1·(P_ij(t) − x_ij(t)) + c2·r2·(g − x_ij(t))    (17)

x_ij(t+1) = x_ij(t) + v_ij(t+1)    (18)

The movement of particles in a swarm is governed in three directions: the last position as inertia, the local optimum as cognition, and the global optimum as the social component. The values of ω, c1 and c2 tune the inertia, cognitive and social behaviors, where ω, c1, c2 ≥ 0 and r1, r2 ∈ [0, 1] inject randomness into the particles' flight. The variable g in Eq. (17) represents the best global performance over all of the particles, and ω controls the effect of the previous velocity on the current one. The canonical PSO was originally presented for continuous optimization problems, so it is not directly applicable to naturally discrete problems such as task scheduling on parallel processors. A new discrete version of PSO (DPSO) is therefore needed to tackle discrete optimization problems.
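A minimal sketch of the canonical update, Eqs. (17) and (18); the coefficient values used here are illustrative defaults, not ones prescribed by the paper.

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """One canonical PSO update, applied per dimension j."""
    new_v, new_x = [], []
    for j in range(len(x)):
        r1, r2 = random.random(), random.random()   # r1, r2 in [0, 1]
        vj = w * v[j] + c1 * r1 * (p_best[j] - x[j]) + c2 * r2 * (g_best[j] - x[j])
        new_v.append(vj)                            # Eq. (17)
        new_x.append(x[j] + vj)                     # Eq. (18)
    return new_x, new_v
```

When a particle already sits on both its personal and the global best with zero velocity, the update leaves it in place, as expected from Eq. (17).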
Fig. 6. Effectiveness of proposed theorems.
Fig. 7. Sample Graph.
5.2. DPSO and new operators

Kennedy and Eberhart also presented a discrete version of PSO which rounds continuous values to the nearest integer (Kennedy and Eberhart, 1997); it is not beneficial here. We therefore define several new parameters and operators, used in our novel DPSO, listed below.

5.2.1. Particle position
Let Particle⃗_i = (P_i1, P_i2, …, P_in) be a particle position vector and X⃗_i = (x_i1, x_i2, …, x_in) a binary n-bit vector indicating which elements of Particle_i may be modified; the bit x_ij is embedded for this reason: if x_ij = 0 then P_ij may be changed, otherwise P_ij remains unchanged. Moreover, each meme P_ij of Particle_i is one of the tasks T_k ∈ T = {set of tasks in the DAG}; the indices i and j indicate the particle's number in the swarm and the task's place in the given list. For instance, take particle P_1 = (1, 2, 3, 5, 4, 7, 6, 8, 9, 10, 11) and X_1 = (1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1); the tasks of P_1 whose bit in X_1 is 0 may be exchanged. As an example, the task pairs (T_2, T_7), (T_6, T_8) and (T_5, T_10) are candidates to be exchanged, but the exchange of the pair (T_5, T_10) would violate precedence, so it is ignored. Finally, the new
Fig. 8. Proposed HDPSO in Algorithm 1.
position for particle P_1 is (1, 7, 3, 5, 4, 2, 8, 6, 9, 10, 11). Note that the possible pairs can be chosen randomly.
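The masked exchange just described can be sketched as below; the random pairing of the free positions and the `violates` predicate (supplied by the caller to enforce precedence) are assumptions about details the text leaves open.

```python
import random

def random_pairs(x_mask):
    """Positions whose bit in X is 0 are free to move; pair them at
    random ("the possible pairs can be randomly opted")."""
    free = [i for i, bit in enumerate(x_mask) if bit == 0]
    random.shuffle(free)
    return list(zip(free[0::2], free[1::2]))

def apply_exchanges(particle, pairs, violates):
    """Swap each candidate pair unless the swap would break a
    precedence constraint, as decided by the `violates` predicate."""
    p = list(particle)
    for i, j in pairs:
        if not violates(p, i, j):
            p[i], p[j] = p[j], p[i]
    return p
```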
5.2.2. Particle velocity
The particle velocity V⃗_i = (v_i1, v_i2, …, v_in) is also an n-bit binary vector; it acts on the particle position binary vector X_i through the ⊗ operator in the updating rule.

5.2.3. Subtract operator (A ⊖ B)
This operator calculates the difference between two vectors: if the corresponding elements are the same, the result bit is set to 1, otherwise to 0. For instance, (1, 0, 1, 0) ⊖ (1, 1, 0, 0) = (1, 0, 0, 1).

5.2.4. Add operator (A ⊕ B)
This operator is used in the velocity update, combining the contributions of inertia, local best, and global best: V = P_1 V_1 ⊕ P_2 V_2 ⊕ … ⊕ P_n V_n where ∑_{i=1}^{n} P_i = 1. For instance, 0.3(1, 0, 1, 0, 0) ⊕ 0.7(1, 1, 0, 0, 1) = (1, #, #, 0, #), where the symbol # indicates uncertainty: the first # is 0 with probability 30% and 1 with probability 70%, whereas for the second # the chances are exactly reversed. To resolve this uncertainty, the coefficients P_i are arranged so that P_1 ≤ P_2 ≤ ⋯ ≤ P_n; a random number 0 < q_0 < 1 is then drawn, and if q_0 < P_1 the corresponding bit is taken from V_1, else if P_1 ≤ q_0 < P_2 it is taken from V_2, and so on.

5.2.5. Multiply operator (A ⊗ B)
This operator updates the binary particle position vector, X = X ⊗ V, which is used in the next step: only where the corresponding bit of the velocity vector is 0 is the position bit adjusted. For instance, (1, 0, 1, 0, 0) ⊗ (1, 0, 0, 1, 1) = (1, 1, 0, 0, 0).

5.2.6. Updating rule
With these operators defined, the two binary vectors V_i and X_i of Particle_i form the basis of the particles' movement in flight, conducted by Eqs. (19) and (20):

V_{i+1} = P_1 V_i ⊕ P_2 (X_{lb_i} ⊖ X_i) ⊕ P_3 (X_{gb} ⊖ X_i)    (19)

X_i = X_i ⊗ V_{i+1}    (20)

Note that, like X_i, the binary vectors X_{lb_i} and X_{gb} are maintained for particle P_i and for the whole swarm respectively. The vector X_i then governs the movement direction of Particle_i in flight.

5.3. Hill Climbing
Hill Climbing is a local-search optimization algorithm. It is combined with a global meta-heuristic to improve the current solution by searching around it locally; if the solution found by Hill Climbing is better than the current solution, it is adopted, otherwise it is rejected (Russell and Norvig, 2003; Taborda and Zdravkovic, 2012). This technique is a promising way to balance exploration and exploitation. For this reason, we call Hill Climbing at random alongside our proposed meta-heuristic, to retain random behavior and to avoid the premature convergence of PSO.
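The binary operators of Sections 5.2.3–5.2.6 can be sketched as follows. Where the add operator's tie-breaking is ambiguous in the text, we read it as cumulative (roulette-wheel) sampling over the ordered weights, which reproduces the 30%/70% example; the weights in `update` are illustrative.

```python
import random

def sub(a, b):
    """A ⊖ B: bit is 1 where the elements agree, else 0 (Section 5.2.3)."""
    return [1 if ai == bi else 0 for ai, bi in zip(a, b)]

def add(weights, vectors):
    """Weighted A ⊕ B (Section 5.2.4): where the source vectors disagree,
    draw q0 and take the bit from the vector whose cumulative weight
    first exceeds q0; where they agree there is no uncertainty."""
    out = []
    for j in range(len(vectors[0])):
        bits = [v[j] for v in vectors]
        if len(set(bits)) == 1:
            out.append(bits[0])
            continue
        q0, acc = random.random(), 0.0
        for w, bit in zip(weights, bits):
            acc += w
            if q0 < acc:
                out.append(bit)
                break
    return out

def mul(x, v):
    """X ⊗ V (Section 5.2.5): flip the position bit where the velocity
    bit is 0; keep it where the velocity bit is 1."""
    return [xi if vi == 1 else 1 - xi for xi, vi in zip(x, v)]

def update(v_i, x_i, x_lb, x_gb, p=(0.2, 0.3, 0.5)):
    """Eqs. (19)-(20): build V_{i+1} and then move X_i with it."""
    v_next = add(p, [v_i, sub(x_lb, x_i), sub(x_gb, x_i)])
    return v_next, mul(x_i, v_next)
```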
5.4. Proposed HDPSO

Here, we explain our novel hybrid algorithm. The fundamental new operations have been explained in Section 5.2; we also randomly call the Hill Climbing algorithm within the proposed meta-heuristic, which is the reason
Fig. 9. Block diagram of Proposed HDPSO algorithm.
for naming our algorithm hybrid discrete PSO (HDPSO). Algorithm 1 in Fig. 8 illustrates the proposed novel HDPSO, and Fig. 9 demonstrates its block diagram. The algorithm receives as input a given DAG specification in terms of the number of tasks and edges along with their computation and communication costs, the underlying system configuration, the number of swarm members, and the maximum number of iterations (MaxIteration). It then returns an optimal task scheduling solution. We also consider the constant q_0 = 0.5 as another input for use in the exploitation phase. During the main loop, a random number q is drawn in line 10; if this number is greater than q_0, then exploitation, performed by calling the Hill
Climbing algorithm as a complementary step, is combined with DPSO to avoid premature convergence and to improve the overall performance. Algorithm 1 starts by producing the initial swarm members. First, a swarm with a limited number of members is randomly generated by applying Algorithm 2. To improve this random population, it also takes benefit of other heuristic approaches, namely the Upward, Downward, and Level rankings, as can be seen in lines 1 through 3 of Algorithm 2. In other words, three particles are made by the aforesaid heuristics, and the remaining particles are produced randomly, based on the proved theorems, in line 7; Fig. 10 depicts this in detail. Then, the main algorithm calls Algorithm 3 to evaluate each particle's fitness in the swarm. It
Table 3
FFT-like random graph scenarios.

| Scenario | Computation cost | Communication cost | CCR  | Graph type |
|----------|------------------|--------------------|------|------------|
| 1        | [40..50]         | [2..5]             | 0.4  | computation-intensive |
| 2        | [10..20]         | [10..30]           | 1.0  | moderate graph |
| 3        | [10..20]         | [50..100]          | 5.0  | rather communication-intensive |
| 4        | [5..10]          | [50..150]          | 10.0 | communication-intensive |
Table 4
SLR comparison of approaches along with RPD improvement.

| Dataset | Platform | SLR Upward | SLR Downward | SLR GA | SLR PSO | SLR HDPSO | RPD Upward (%) | RPD Downward (%) | RPD GA (%) | RPD PSO (%) |
|---------|----------|------------|--------------|--------|---------|-----------|----------------|------------------|------------|-------------|
| LU14    | Homo.    | 1.80 | 1.80 | 1.80 | 1.27 | 1.27 | 29.63 | 29.63 | 29.63 | 0     |
| LU20    | Homo.    | 2.05 | 2.05 | 1.32 | 1.21 | 1.00 | 51.28 | 51.28 | 24.00 | 17.39 |
| LU27    | Homo.    | 2.22 | 2.22 | 1.26 | 1.39 | 1.00 | 54.90 | 54.90 | 20.69 | 28.13 |
| LU35    | Homo.    | 2.26 | 2.33 | 1.30 | 1.37 | 1.00 | 55.74 | 57.14 | 22.86 | 27.03 |
| Gj15    | Homo.    | 2.70 | 2.70 | 2.40 | 2.40 | 2.40 | 11.11 | 11.11 | 0     | 0     |
| Gj21    | Homo.    | 2.67 | 2.67 | 2.42 | 2.25 | 2.17 | 18.75 | 18.75 | 10.34 | 3.70  |
| Gj28    | Homo.    | 2.79 | 2.79 | 2.36 | 2.43 | 2.29 | 17.95 | 17.95 | 3.03  | 5.88  |
| Gj36    | Homo.    | 2.88 | 2.88 | 2.44 | 2.56 | 2.31 | 19.57 | 19.57 | 5.13  | 9.76  |
| LU14    | Hetero.  | 1.80 | 1.80 | 1.80 | 1.40 | 1.28 | 28.80 | 28.80 | 28.80 | 8.45  |
| LU20    | Hetero.  | 2.18 | 2.05 | 1.39 | 1.39 | 1.13 | 48.23 | 45.13 | 18.73 | 18.73 |
| LU27    | Hetero.  | 2.49 | 2.22 | 1.43 | 1.54 | 1.23 | 50.58 | 44.44 | 14.14 | 20.19 |
| LU35    | Hetero.  | 2.26 | 2.33 | 1.42 | 1.33 | 1.11 | 50.77 | 52.33 | 21.65 | 16.65 |
| Gj15    | Hetero.  | 2.67 | 2.67 | 2.40 | 2.47 | 2.40 | 10.00 | 10.00 | 0     | 2.70  |
| Gj21    | Hetero.  | 2.92 | 2.92 | 2.53 | 2.33 | 2.33 | 20.00 | 20.00 | 7.59  | 0     |
| Gj28    | Hetero.  | 2.93 | 2.93 | 2.52 | 2.78 | 2.45 | 16.26 | 16.26 | 2.74  | 11.66 |
| Gj36    | Hetero.  | 2.88 | 2.88 | 2.44 | 2.56 | 2.31 | 19.57 | 19.57 | 5.13  | 9.76  |
| FFT0.4  | Hetero.  | 1.73 | 1.71 | 1.68 | 1.70 | 1.66 | 3.81  | 2.63  | 0.89  | 1.94  |
| FFT1.0  | Hetero.  | 2.37 | 2.40 | 2.19 | 2.24 | 2.10 | 11.36 | 12.56 | 4.41  | 6.25  |
| FFT5.0  | Hetero.  | 5.05 | 5.27 | 4.16 | 4.30 | 3.83 | 24.10 | 27.37 | 7.92  | 10.85 |
| FFT10.0 | Hetero.  | 7.81 | 5.86 | 5.86 | 5.86 | 5.86 | 24.91 | 0     | 0     | 0     |
| Average |          |      |      |      |      |      | 28.37% | 26.97% | 11.38% | 9.95% |
Table 5
Speed Up comparison of approaches along with RPD improvement.

| Dataset | Platform | SpeedUp Upward | SpeedUp Downward | SpeedUp GA | SpeedUp PSO | SpeedUp HDPSO | RPD Upward (%) | RPD Downward (%) | RPD GA (%) | RPD PSO (%) |
|---------|----------|----------------|------------------|------------|-------------|---------------|----------------|------------------|------------|-------------|
| LU14    | Homo.    | 1.00 | 1.00 | 1.00 | 1.42 | 1.42 | 42.11  | 42.11  | 42.11 | 0     |
| LU20    | Homo.    | 1.00 | 1.00 | 1.56 | 1.70 | 2.05 | 105.26 | 105.26 | 31.58 | 21.05 |
| LU27    | Homo.    | 1.04 | 1.04 | 1.83 | 1.66 | 2.30 | 121.74 | 121.74 | 26.09 | 39.13 |
| LU35    | Homo.    | 1.13 | 1.10 | 1.97 | 1.86 | 2.56 | 125.93 | 133.33 | 29.63 | 37.04 |
| Gj15    | Homo.    | 1.11 | 1.11 | 1.25 | 1.25 | 1.25 | 12.5   | 12.5   | 0     | 0     |
| Gj21    | Homo.    | 1.31 | 1.31 | 1.45 | 1.56 | 1.62 | 23.08  | 23.08  | 11.54 | 3.85  |
| Gj28    | Homo.    | 1.44 | 1.44 | 1.70 | 1.65 | 1.75 | 21.88  | 21.88  | 3.13  | 6.25  |
| Gj36    | Homo.    | 1.57 | 1.57 | 1.85 | 1.76 | 1.95 | 24.32  | 24.32  | 5.41  | 10.81 |
| LU14    | Hetero.  | 1.00 | 1.00 | 1.00 | 1.29 | 1.40 | 40.44  | 40.44  | 40.44 | 9.23  |
| LU20    | Hetero.  | 0.94 | 1.00 | 1.48 | 1.48 | 1.82 | 93.15  | 82.24  | 23.05 | 23.05 |
| LU27    | Hetero.  | 0.92 | 1.04 | 1.61 | 1.49 | 1.87 | 102.35 | 80.00  | 16.47 | 25.29 |
| LU35    | Hetero.  | 1.13 | 1.10 | 1.80 | 1.64 | 1.92 | 69.76  | 75.32  | 6.68  | 16.98 |
| Gj15    | Hetero.  | 1.13 | 1.13 | 1.25 | 1.22 | 1.25 | 11.11  | 11.11  | 0     | 2.78  |
| Gj21    | Hetero.  | 1.20 | 1.20 | 1.39 | 1.50 | 1.50 | 25.00  | 25.00  | 8.21  | 0     |
| Gj28    | Hetero.  | 1.37 | 1.37 | 1.59 | 1.44 | 1.63 | 19.42  | 19.42  | 2.82  | 13.20 |
| Gj36    | Hetero.  | 1.53 | 1.53 | 1.61 | 1.54 | 1.64 | 6.82   | 6.82   | 1.48  | 6.06  |
| Average |          |      |      |      |      |      | 52.80% | 51.54% | 15.54% | 3.42% |
is done by the EFT heuristic algorithm, shown in Fig. 11. This algorithm receives a swarm containing several particles, each of which holds a topologically ordered list of tasks as a candidate solution. For each particle, while a non-visited task remains in the list, it takes a non-visited task based on priority and assigns it to the available processor/VM which guarantees the earliest finish time; it continues until all tasks are assigned to processors/VMs. The actual finish time of the last task is taken as the fitness value of the current particle.

After evaluating the swarm members by Algorithm 3, the DPSO parameters are initialized by Algorithm 4, depicted in Fig. 12. It takes benefit of binary vectors to govern the particles' trajectories in the search space in subsequent steps; its preliminaries have been explained in Section 5.2. In this initial state, each personal best is the particle itself, and the best among them is the global best particle of the initial swarm. In the following steps, all variables change based on the environment's circumstances.

The main loop of Algorithm 1 runs from line 4 to line 22, where the while loop iterates MaxIter times; the variable MaxIter is determined by the user. In each iteration, all particles in the swarm are moved, governed by the binary vectors X and V. For each particle in the swarm, Algorithm 5 modifies the particle's position as affected by the binary vectors X and V; if the changed particle is valid, the updating rules are applied to the particle's parameters, both the binary vectors and its main values, by Algorithm 6. Note that the first and last tasks of each particle always remain unchanged, which is why their corresponding bits in the velocity vector are set to 1; this can be observed in lines 8 and 9 of Algorithm 5. Moreover, the Valid function checks whether the changed particle
Table 6
Efficiency comparison of approaches along with RPD improvement on the FFT-like dataset.

| CCR  | No. of Proc. | Eff. Upward | Eff. Downward | Eff. GA | Eff. PSO | Eff. HDPSO | RPD Upward (%) | RPD Downward (%) | RPD GA (%) | RPD PSO (%) |
|------|--------------|-------------|---------------|---------|----------|------------|----------------|------------------|------------|-------------|
| 0.4  | 2  | 94.90 | 94.42 | 96.10 | 95.50 | 96.36 | 1.53  | 2.05  | 0.26 | 0.90  |
| 0.4  | 3  | 87.06 | 85.29 | 89.70 | 88.75 | 90.51 | 3.96  | 6.13  | 0.90 | 1.99  |
| 0.4  | 4  | 83.54 | 84.47 | 86.02 | 85.63 | 87.01 | 4.16  | 3.00  | 1.15 | 1.62  |
| 0.4  | 5  | 75.73 | 75.92 | 78.08 | 77.68 | 79.32 | 4.74  | 4.47  | 1.58 | 2.11  |
| 0.4  | 6  | 70.95 | 69.00 | 72.17 | 71.35 | 72.59 | 2.31  | 5.20  | 0.58 | 1.73  |
| 0.4  | 10 | 42.57 | 41.40 | 43.30 | 42.81 | 43.55 | 2.31  | 5.20  | 0.58 | 1.73  |
| 1.0  | 2  | 90.46 | 92.42 | 96.24 | 94.12 | 101.19 | 11.86 | 9.49  | 5.14 | 7.51  |
| 1.0  | 3  | 77.58 | 75.53 | 83.66 | 82.05 | 87.52 | 12.82 | 14.36 | 4.62 | 6.67  |
| 1.0  | 4  | 62.75 | 60.66 | 71.11 | 66.67 | 71.51 | 13.97 | 17.88 | 0.56 | 7.26  |
| 1.0  | 5  | 55.35 | 52.51 | 57.53 | 55.96 | 58.18 | 5.11  | 10.80 | 1.14 | 3.98  |
| 1.0  | 6  | 46.63 | 45.15 | 48.48 | 48.21 | 49.90 | 7.02  | 10.53 | 2.92 | 3.51  |
| 1.0  | 10 | 27.98 | 27.09 | 29.09 | 28.93 | 29.94 | 7.02  | 10.53 | 2.92 | 3.51  |
| 5.0  | 2  | 55.21 | 50.92 | 69.55 | 65.35 | 71.55 | 29.60 | 40.51 | 2.87 | 9.48  |
| 5.0  | 3  | 37.39 | 35.78 | 45.36 | 43.92 | 49.26 | 31.75 | 37.69 | 8.61 | 12.17 |
| 5.0  | 4  | 31.05 | 26.83 | 35.17 | 33.47 | 35.98 | 15.90 | 34.10 | 2.31 | 7.51  |
| 5.0  | 5  | 24.84 | 21.47 | 27.29 | 27.67 | 28.06 | 12.96 | 30.70 | 2.82 | 1.41  |
| 5.0  | 6  | 20.75 | 17.89 | 22.80 | 21.07 | 23.71 | 14.29 | 32.57 | 4.00 | 12.57 |
| 5.0  | 10 | 12.45 | 10.73 | 13.68 | 12.64 | 14.21 | 14.29 | 32.57 | 4.00 | 12.57 |
| 10.0 | 2  | 36.89 | 48.62 | 48.62 | 48.62 | 48.62 | 31.80 | 0     | 0    | 0     |
| 10.0 | 3  | 24.59 | 32.41 | 32.41 | 32.41 | 32.41 | 31.80 | 0     | 0    | 0     |
| 10.0 | 4  | 18.44 | 24.31 | 24.31 | 24.31 | 24.31 | 31.80 | 0     | 0    | 0     |
| 10.0 | 5  | 14.76 | 19.45 | 19.45 | 19.45 | 19.45 | 31.80 | 0     | 0    | 0     |
| 10.0 | 6  | 12.30 | 16.21 | 16.21 | 16.21 | 16.21 | 31.80 | 0     | 0    | 0     |
| 10.0 | 10 | 7.38  | 9.72  | 9.72  | 9.72  | 9.72  | 31.80 | 0     | 0    | 0     |
| Average |  |       |       |       |       |       | 16.55% | 12.97% | 1.97% | 4.02% |
Fig. 10. Initial swarm generation in Algorithm 2.
Fig. 11. Performing task-to-processor mapping in Algorithm 3.
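A minimal sketch of the EFT-style fitness evaluation described above (Algorithm 3); the input encoding — an execution-cost table, a communication-cost map charged only across processors, and a predecessor map — is an assumption, not the paper's exact interface.

```python
def eft_fitness(order, preds, cost, comm, n_proc):
    """Assign each task in list order to the processor/VM giving the
    earliest finish time; the finish time of the last task (makespan)
    is the particle's fitness. cost[t][p] is task t's execution time on
    processor p; comm[(u, t)] is the data-transfer time from u to t,
    charged only when they run on different processors."""
    proc_free = [0.0] * n_proc            # when each processor becomes idle
    finish, placed = {}, {}
    for t in order:
        best = None
        for p in range(n_proc):
            ready = max([proc_free[p]] + [
                finish[u] + (0 if placed[u] == p else comm.get((u, t), 0))
                for u in preds[t]])
            f = ready + cost[t][p]
            if best is None or f < best[0]:
                best = (f, p)
            # earliest finish time wins; ties keep the lower-indexed processor
        finish[t], placed[t] = best
        proc_free[best[1]] = best[0]
    return max(finish.values())           # makespan = fitness value
```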
represents a topological sort of tasks or not. Algorithms 5 and 6 are depicted in Figs. 13 and 14 respectively.

As mentioned earlier, Algorithm 5 changes the particle position; if the result is a valid particle, Algorithm 6 updates the personal best and global best particles. If the current particle's fitness is better than in the previous situation, Algorithm 6 returns true, the clue that the movement was effective. Afterwards, we draw a random number q ∈ [0, 1]; if q > q_0 then Algorithm 7 is called; it is the Hill Climbing
Fig. 12. Initialize DPSO parameters in Algorithm 4.
technique, complementary to DPSO, to balance exploration and exploitation. Note that drawing a random number precludes calling Hill Climbing permanently, which could lead to getting stuck in a local optimum; this procedure also avoids the intrinsic premature convergence of PSO. The technique searches locally around the current particle to improve the current solution. The details of Algorithm 7 are given in Fig. 15. Hill Climbing receives a particle as a solution and explores around it to improve the performance of the current solution; it may investigate all tasks in the particle except the entry and exit tasks. It starts from the second task in the list and finds that task's first successor appearing in the list; the current task can be exchanged with every task appearing before that first successor. If an exchange yields a promising solution, the new solution replaces the previous particle. Note that this auxiliary movement guarantees the precedence constraints. The local and global bests are also updated by Algorithm 6. Moreover, Fig. 16 presents a real example of Hill Climbing's effectiveness based on Algorithm 7. Take the DAG depicted in Fig. 4 and let List_1 = (1, 3, 2, 7, 6, 4, 5, 8, 10, 9, 11), with makespan = 86, be the value of Particle_i given to the Hill Climbing algorithm as input. The algorithm starts from the second task in the list, T_3, and finds the first successor of T_3 appearing in the list, namely T_7; T_3 can be exchanged with tasks in the list placed before T_7. The only possible exchange is between T_3 and T_2, which yields the valid particle (1, 2, 3, 7, 6, 4, 5, 8, 10, 9, 11), but its makespan = 88 is not promising. The algorithm continues searching locally and eventually improves the current solution to makespan = 81; Fig. 16 shows the Hill Climbing steps in detail.

Since meta-heuristic algorithms do not terminate on their own, termination criteria must be determined in advance. Reaching a predetermined fitness value or a limited number of iterations are typical termination criteria applied in such algorithms. Here, we experimentally determined MaxIter as the number of rounds after which no improvement happens. In this study we set MaxIter = 50, the number of particles is 10 times the number of nodes of the given DAG, and the average result of 20 different runs is reported.

6. Time complexity

The time complexity of the proposed HDPSO is analyzed as follows. It starts with Algorithm 2 to produce a random swarm; the algorithm applies three heuristics to generate three particles, all taking O(n). Its main loop then executes S − 3 times to generate the remaining particles, so in total it takes O(n + S), where n and S are the particle length and the swarm size respectively. Algorithm 3 takes O(e × P), where e is the number of DAG edges and P is the number of parallel processors in the system; this algorithm is also utilized as the fitness function. The time complexity of Algorithm 4 is O(S). The main loop of HDPSO, lines 5 through 22, takes the most time: it repeats MaxIter times, and in each iteration, for every particle in the swarm, Algorithm 5, the Valid function, Algorithm 6 and Algorithm 7 are executed. The time complexity of the Valid function is O(e + n), which is O(n²) in the worst case. Algorithms 5 and 6 both take O(n) to investigate the n-bit vectors. Finally, the Hill Climbing algorithm takes O(n²) in the worst case. The time complexity of HDPSO is therefore O(MaxIter × S × n² + e × P); since the last term is typically smaller than the first, we can take it as O(MaxIter × S × n²).

7. Experiments and analysis

To attain a fair evaluation, several scenarios are conducted. We assess the effectiveness of HDPSO using mathematical workflows of different sizes on homogeneous and heterogeneous parallel systems respectively. In this study we consider LU-decomposition and GJ-elimination task graphs along with an FFT-like random graph. To analyze scalability, the size of the task graphs increases from 14 to 35 tasks for LU decomposition and from 15 to 36 tasks for the GJ-elimination algorithm. Note that all LU-decomposition and GJ-elimination datasets have been derived from Jin et al. (2008), except for the FFT-like graph, whose datasets we produce based on the information in Table 3 to adjust them for suitable CCR parameters. Fig.
3a and b show the general structure of the n-task LU-decomposition and GJ-elimination task graphs respectively. Also, to show the algorithm's robustness, we define a random pseudo-mathematical
Fig. 13. Movement of particle position in Algorithm 5.
Fig. 14. Local and global updates in Algorithm 6.
Fig. 15. Hill Climbing in Algorithm 7.
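A hedged sketch of Algorithm 7's local search; the `fitness` and `valid` callbacks (e.g. an EFT evaluator and a topological-order check) are assumed to be supplied by the caller, a detail the paper leaves to its pseudocode.

```python
def hill_climb(particle, succs, fitness, valid):
    """Each task after the entry task may be swapped with any later
    task that appears before the task's first successor in the list;
    keep only valid, improving swaps."""
    best, best_fit = list(particle), fitness(particle)
    for i in range(1, len(best) - 1):            # skip entry and exit tasks
        t = best[i]
        # index of the first successor of t appearing later in the list
        limit = min((k for k in range(i + 1, len(best)) if best[k] in succs[t]),
                    default=len(best) - 1)
        for j in range(i + 1, limit):
            cand = list(best)
            cand[i], cand[j] = cand[j], cand[i]
            if valid(cand) and fitness(cand) < best_fit:
                best, best_fit = cand, fitness(cand)
    return best, best_fit
```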
workflow known as the FFT-like graph with 34 nodes. In this regard, the performance metrics schedule length ratio (SLR), Speed Up, and Efficiency have been derived from the literature to compare HDPSO against heuristic-based algorithms such as Upward (Topcuoglu et al., 2002) and Downward (Topcuoglu et al., 2002), and against meta-heuristic-based algorithms such as GA-based (Xu et al., 2014) and PSO-based (Al Badawi and Shatnawi, 2013), which are explained in the forthcoming subsection. Moreover, all scenarios have been executed 20 times and the average results are reported.

7.1. Performance metrics

Three metrics, SLR, Speed Up, and Efficiency, are applied to contrast the performance of HDPSO task scheduling with the other state-of-the-art approaches. As different graphs have different attributes, which are utilized in the
Fig. 16. Example to show Hill Climbing effectiveness.
algorithm, it is essential to normalize them. Good normalization needs a lower-bound value, and the critical path (CP) is the best option: if we consider the schedule length of the tasks belonging to the CP when executed on the fastest processor, the schedule of the whole task set certainly cannot be shorter than it. The SLR metric, computed via Eq. (21), is used for this reason; a low SLR value is favorable, and its value cannot be less than one (Xu et al., 2014; Topcuoglu et al., 2002; Akbari et al., 2017).

SLR = makespan / ( ∑_{T_i ∈ CP_min} min_{P_j ∈ P} W(T_i, P_j) )    (21)

The next metric is Speed Up, which indicates how many times faster the algorithm runs in comparison to a single processor, preferably the fastest processor. It is attained via Eq. (22) (Xu et al., 2014; Topcuoglu et al., 2002; Akbari et al., 2017).

Speed up = (serial execution time on the fastest processor) / makespan = min_{P_j ∈ P} ( ∑_{T_i ∈ T} W(T_i, P_j) ) / makespan    (22)

The complementary metric is Efficiency, because the Speed Up metric does not reveal how many processors were spent to gain that level of speed up. It is calculated via Eq. (23) (Xu et al., 2014; Topcuoglu et al., 2002; Akbari et al., 2017).

Efficiency = (Speed up / Number of processors) × 100%    (23)
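The three metrics can be computed directly from Eqs. (21)–(23); this sketch assumes a per-processor cost-table layout, which is an illustrative choice.

```python
def slr(makespan, cp_min_costs):
    """Eq. (21): makespan over the sum, across critical-path tasks, of
    each task's minimum execution cost over all processors."""
    return makespan / sum(cp_min_costs)

def speed_up(cost, makespan):
    """Eq. (22): best serial execution time over all processors divided
    by the makespan; cost[p] lists every task's time on processor p."""
    return min(sum(times) for times in cost) / makespan

def efficiency(speedup, n_proc):
    """Eq. (23): Speed Up per processor, as a percentage."""
    return speedup / n_proc * 100.0
```

For Speed up = 3 these give 75% on 4 processors and 60% on 5, matching the worked example in the text.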
Fig. 17. Average SLR and Speed Up of algorithms for LU-decomposition and GJ-elimination in homogeneous platform.
For instance, gaining Speed up = 3 by spending 4 parallel processors is more beneficial than spending 5 parallel processors: in the former case the efficiency is 75%, whereas in the latter it is 60%, based on Eq. (23).
7.2. Mathematical workloads on homogeneous system

Here we consider LU-decomposition problems with {14, 20, 27, 35} nodes and GJ-elimination problems with {15, 21, 28, 36} nodes. The computation and communication values have been derived from the paper published by Jin et al. (2008). For GJ-elimination, each computation node needs 40 units of time along with a communication cost of 100 units of time, whereas in the LU-decomposition problem each computation node in the bottom layer needs 10 units of time, plus 10 units for every layer above, with 80 units of time for communication cost. In this experiment we take the underlying infrastructure to be a homogeneous system with w_i = 1 for i = 1, …, 4. Fig. 17 demonstrates the superiority of HDPSO for the LU and GJ graphs in terms of the SLR and Speed Up metrics in comparison with the other existing approaches.

7.3. Mathematical workload on heterogeneous system

Here we consider the same scenarios as in Section 7.2 except that the underlying architecture is heterogeneous. The reason behind this relates to energy consumption: existing datacenters utilize Dynamic Voltage Frequency Scaling (DVFS)-enabled servers, which are capable of adjusting their voltage–frequency pair based on the current workload (Hosseini Shirvani et al., 2018; Mokaripoor and Hosseini Shirvani, 2016). We model this concept with the normalized weights w′_1 = 0.25, w′_2 = 0.5, w′_3 = 0.75 and w′_4 = 1.0 for the 4 parallel heterogeneous systems. Note that lowering the frequency saves energy but extends the execution time. In these scenarios too, HDPSO beats the other approaches in terms of the SLR and Speed Up metrics for the LU and GJ graphs; Fig. 18 confirms this.
7.4. Semi-mathematical workload on heterogeneous system

As mentioned earlier, we construct an FFT-like random graph with 34 nodes. To attain better results, we generate several FFT-like graphs with
Fig. 18. Average SLR and Speed Up of algorithms for LU-decomposition and GJ-elimination in heterogeneous platform.
different CCR values, so that we can show the effectiveness of HDPSO across the graph spectrum from communication-intensive to computation-intensive; for this we take a heterogeneous system. Fig. 19 depicts the FFT-like random graph with 34 nodes. Table 3 shows the ranges of the uniformly distributed computation and communication values used to construct FFT-like graphs with CCR = {0.4, 1.0, 5.0, 10.0}. A high CCR value indicates that the graph is communication-intensive; a low value, on the other hand, means that the graph is computation-intensive. In this scenario, we take three parallel heterogeneous computing systems. Fig. 20 shows the different algorithms' performance in terms of the SLR metric versus different CCR values. As the figure shows, HDPSO outperforms the other approaches in terms of SLR except in the last scenario: there, the high communication costs of a high-CCR graph compel most algorithms to place the tasks on just the one fastest processor and execute them serially, effectively forming one big cluster on the fastest processor to omit the communication costs; consequently, all algorithms obtain the same result. Also, Fig. 21 illustrates how the algorithms' efficiency decreases as the number of VMs/processors in the distributed system increases; Fig. 21 separates these comparisons into different CCR categories.
In addition, Tables 4 and 5 present analytical comparisons, in terms of SLR and SpeedUp respectively, of our proposed HDPSO against the other approaches for all used datasets. We also utilize the relative percentage deviation (RPD) parameter to indicate the amount of improvement in percent. Moreover, Table 6 illustrates the efficiency comparison of HDPSO against the other approaches on the FFT-like dataset. As can be seen, increasing the number of processors declines system efficiency. Nevertheless, HDPSO has superiority over the other approaches in terms of efficiency, except in the last scenario, in which the given DAG is communication-intensive. In that scenario the Upward heuristic along with the GA and PSO meta-heuristics behave identically: all of them tend to omit the high communication cost by executing the tasks on the fastest processor, which is why they share the same behavior and result. One important point about efficiency is that increasing the number of processors/VMs can increase the speedup to some extent, but beyond a certain point the increment not only yields no more speedup but also declines the whole system's efficiency; this is because the graph's depth and width and the task dependencies nullify spending more processors/VMs (Hosseini Shirvani, 2018). This point can be observed in Table 6, where the last two rows in each CCR category show no improvement.
Fig. 19. FFT-like random graph.

Table 7
Makespan comparison of approaches along with RPD improvement in the new FFT-like datasets.

| Dataset | CCR  | No. of VMs | Makespan (GA) | Makespan (PSO) | Makespan (HDPSO) | RPD (%) GA | RPD (%) PSO |
|---------|------|------------|---------------|----------------|------------------|------------|-------------|
| FFT63   | 0.4  | 6          | 640           | 643            | 608              | 0.05       | 0.05        |
| FFT63   | 1.0  | 6          | 365           | 361            | 334              | 0.09       | 0.07        |
| FFT63   | 5.0  | 6          | 648           | 644            | 624              | 0.04       | 0.03        |
| FFT63   | 10.0 | 6          | 830           | 830            | 830              | 0.00       | 0.00        |
| FFT127  | 0.4  | 8          | 711           | 707            | 681              | 0.04       | 0.04        |
| FFT127  | 1.0  | 8          | 634           | 632            | 605              | 0.05       | 0.04        |
| FFT127  | 5.0  | 8          | 746           | 745            | 715              | 0.04       | 0.04        |
| FFT127  | 10.0 | 8          | 1024          | 1024           | 1024             | 0.00       | 0.00        |
| Average |      |            |               |                |                  | 3.88%      | 3.38%       |
Fig. 20. Average SLR of FFT-like graph vs CCR.
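The average rows of Table 7 can be reproduced from its per-scenario RPD columns. The snippet below (my own, using only the numbers printed in the table) averages the GA and PSO deviations from HDPSO and converts them to percentages:

```python
# RPD column values from Table 7 (GA and PSO vs. HDPSO), FFT63 then FFT127 rows
rpd_ga  = [0.05, 0.09, 0.04, 0.00, 0.04, 0.05, 0.04, 0.00]
rpd_pso = [0.05, 0.07, 0.03, 0.00, 0.04, 0.04, 0.04, 0.00]

avg_ga  = 100 * sum(rpd_ga)  / len(rpd_ga)   # mean deviation, in percent
avg_pso = 100 * sum(rpd_pso) / len(rpd_pso)

# close to the 3.88% and 3.38% averages reported in Table 7
print(f"{avg_ga:.2f}% {avg_pso:.2f}%")
```

This confirms that the reported averages are means of the rounded per-row RPD entries, and that the zero-RPD rows (CCR = 10.0, where all algorithms serialize on the fastest VM) pull both averages down.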
8. Conclusion and future direction

Overall, even setting aside our proposed HDPSO algorithm's modest superiority over heuristic-based approaches, the average results of different executions with intensive settings on 12 scientific datasets show that our hybrid meta-heuristic dominates other existing GA-based and PSO-based meta-heuristics by 10.67, 14.48, and 3 percent in terms of SLR, SpeedUp, and efficiency respectively. One important point is that we applied our novel approach to typical scientific workflows of specific scales cited in the literature. Then, out of curiosity, we applied HDPSO to large-scale random task graphs with 63 and 127 nodes. Specifically, we generated 8 new datasets of FFT-like graphs with 63 and 127 nodes, each with a different CCR value. Afterward, we executed them on heterogeneous platforms with 6 and 8 VMs respectively; the average value over 20 executions is reported. Table 7 compares the makespan metric, derived from these averaged execution reports, among the HDPSO, PSO, and GA algorithms. The obtained results show that HDPSO is superior to some extent to the other single meta-heuristic approaches, which indicates that HDPSO is a scalable algorithm.
This paper presented a stochastic search algorithm to schedule tasks on heterogeneous distributed platforms such as cloud computing environments. The hybrid meta-heuristic is based on discrete PSO because task scheduling is intrinsically a discrete optimization problem, in which the goal is to minimize makespan, an important quality-of-service parameter experienced by the user. Therefore, a new DPSO algorithm with efficient operators has been designed. Also, to balance exploration and exploitation, it benefits from the Hill Climbing algorithm, which performs local search; this hybridization also precludes the premature convergence to which PSO is prone. The intensive experimental simulation results demonstrated that the proposed HDPSO is superior to other existing heuristic and meta-heuristic approaches. However, although utilizing more virtual machines in a cloud environment increases the degree of parallelism and yields more speedup, it definitely burdens subscribers with more monetary cost, which is constrained by their budget. Therefore, for future work, we envisage designing a bi-objective scheduling model to minimize two equally important parameters, makespan and total monetary service cost, simultaneously.
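As a minimal sketch of the hybridization idea (the neighborhood move, function names, and toy objective below are my own; the paper's DPSO update operators and their integration with Hill Climbing are considerably more elaborate), a hill-climbing pass can locally refine a candidate task-to-VM mapping by greedily accepting only makespan-improving single-task reassignments:

```python
import random

def hill_climb(mapping, n_vms, makespan, max_iters=200):
    """Greedy local search on a task-to-VM mapping (illustrative only).

    mapping:  list where mapping[i] is the VM index assigned to task i
    makespan: callable scoring a mapping (lower is better)
    """
    best = list(mapping)
    best_cost = makespan(best)
    for _ in range(max_iters):
        neighbor = list(best)
        task = random.randrange(len(neighbor))
        neighbor[task] = random.randrange(n_vms)   # reassign one random task
        cost = makespan(neighbor)
        if cost < best_cost:                       # accept improvements only
            best, best_cost = neighbor, cost
    return best, best_cost

# toy objective: makespan of unit-cost independent tasks on 2 identical VMs
# is just the most loaded VM's task count
loads = lambda m: max(m.count(v) for v in range(2))
sol, cost = hill_climb([0, 0, 0, 0], 2, loads)
print(sol, cost)  # greedy moves typically balance the load down to cost 2
```

In the paper's scheme, such a local pass is invoked randomly inside the DPSO iterations, so the swarm's global operators explore while the greedy pass exploits around promising particles.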
Fig. 21. Average efficiency of FFT-like graph vs number of processors for different CCR.
Acknowledgment
This paper has been prepared under Islamic Azad University (Sari Branch) support.

References

Abrishami, S., Naghibzadeh, M., Epema, D.H., 2013. Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds. Future Gener. Comput. Syst. 29 (1), 158–169. http://dx.doi.org/10.1016/j.future.2012.05.004.
Akbari, M., Rashidi, H., Alizadeh, S.H., 2017. An enhanced genetic algorithm with new operators for task scheduling in heterogeneous computing systems. Eng. Appl. Artif. Intell. 61, 35–46. http://dx.doi.org/10.1016/j.engappai.2017.02.013.
Al Badawi, A., Shatnawi, A., 2013. Static scheduling of directed acyclic data flow graphs onto multiprocessors using particle swarm optimization. Comput. Oper. Res. 40 (10), 2322–2328. http://dx.doi.org/10.1016/j.cor.2013.03.015.
Amin, G.R., Hosseini Shirvani, M.S., 2009. Evaluation of scheduling solutions in parallel processing using DEA FDH model. J. Ind. Eng. Int. 5 (9), 58–62.
Bansal, S., Kumar, P., Singh, K., 2003. An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Trans. Parallel Distrib. Syst. 14 (6), 533–544. http://dx.doi.org/10.1109/TPDS.2003.1206502.
Bharathi, S., Chervenak, A., Deelman, E., Mehta, G., Su, M.H., Vahi, K., 2008. Characterization of scientific workflows. In: 2008 Third Workshop on Workflows in Support of Large-Scale Science. IEEE, pp. 1–10. http://dx.doi.org/10.1109/WORKS.2008.4723958.
Burkimsher, A., Bate, I., Indrusiak, L.S., 2013. A survey of scheduling metrics and an improved ordering policy for list schedulers operating on workloads with dependencies and a wide variation in execution times. Future Gener. Comput. Syst. 29 (8), 2009–2025. http://dx.doi.org/10.1016/j.future.2012.12.005.
Damodaran, P., Vélez-Gallego, M.C., 2012. A simulated annealing algorithm to minimize makespan of parallel batch processing machines with unequal job ready times. Expert Syst. Appl. 39, 1451–1458.
Eberhart, R., Kennedy, J., 1995. Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Vol. 4. pp. 1942–1948, www.cs.cmu.edu/~Earielpro/15381f16/c_slides/781f16-26.pdf.
Gandhi, T., Alam, T., 2017. Quantum genetic algorithm with rotation angle refinement for dependent task scheduling on distributed systems. In: 2017 Tenth International Conference on Contemporary Computing. IC3, IEEE, pp. 1–5. http://dx.doi.org/10.1109/IC3.2017.8284295.
Gkoutioudi, K., Karatza, H.D., 2010. Task cluster scheduling in a grid system. Simul. Model. Pract. Theory 18 (9), 1242–1252. http://dx.doi.org/10.1016/j.simpat.2010.04.011.
Haidri, R.A., Katti, C.P., Saxena, P.C., 2017. Cost effective deadline aware scheduling strategy for workflow applications on virtual machines in cloud computing. J. King Saud Univ-Comput. Inf. Sci. http://dx.doi.org/10.1016/j.jksuci.2017.10.009.
Hosseini Shirvani, M., 2015. Evaluating of feasible solutions on parallel scheduling tasks with DEA decision maker. J. Adv. Comput. Res. 6 (2), 109–115.
Hosseini Shirvani, M.S., 2018. A new shuffled genetic-based task scheduling algorithm in heterogeneous distributed systems. J. Adv. Comput. Res. 9 (4), 19–36, http://jacr.iausari.ac.ir/article_660143.html.
Hosseini Shirvani, M.S., 2019. To move or not to move: An iterative four-phase cloud adoption decision model for IT outsourcing based on TCO. J. Soft Comput. Inf. Technol. 3 (in press).
Hosseini Shirvani, M.S., Amirsoleimani, N., Salimpour, S., Azab, A., 2017. Multi-criteria task scheduling in distributed systems based on fuzzy TOPSIS. In: 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering. CCECE, IEEE, pp. 1–4. http://dx.doi.org/10.1109/CCECE.2017.7946721.
Hosseini Shirvani, M.S., Babazadeh Gorji, A., 2020. Optimization of automatic web services composition using genetic algorithm. Int. J. Cloud Comput. (in press).
Hosseini Shirvani, M.S., Rahmani, A.M., Sahafi, A., 2018. An iterative mathematical decision model for cloud migration: A cost and security risk approach. Softw. Pract. Exp. 48 (3), 449–485. http://dx.doi.org/10.1002/spe.2528.
Hosseini Shirvani, M., Rahmani, A.M., Sahafi, A., 2018. A survey study on virtual machine migration and server consolidation techniques in DVFS-enabled cloud datacenter: Taxonomy and challenges. J. King Saud Univ-Comput. Inf. Sci. http://dx.doi.org/10.1016/j.jksuci.2018.07.001.
Jin, S., Schiavone, G., Turgut, D., 2008. A performance study of multiprocessor task scheduling algorithms. J. Supercomput. 43 (1), 77–97. http://dx.doi.org/10.1007/s11227-007-0139-z.
Johnson, D.S., Garey, M.R., 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. WH Freeman.
Kang, Q., He, H., 2011. A novel discrete particle swarm optimization algorithm for meta-task assignment in heterogeneous computing systems. Microprocess. Microsyst. 35 (1), 10–17. http://dx.doi.org/10.1016/j.micpro.2010.11.001.
Kennedy, J., Eberhart, R.C., 1997. A discrete binary version of the particle swarm algorithm. In: 1997 IEEE International Conference on Systems, Man, and Cybernetics. In: Computational Cybernetics and Simulation, vol. 5, IEEE, pp. 4104–4108. http://dx.doi.org/10.1109/ICSMC.1997.637339.
Khan, M.A., 2012. Scheduling for heterogeneous systems using constrained critical paths. Parallel Comput. 38 (4–5), 175–193. http://dx.doi.org/10.1016/j.parco.2012.01.001.
Lin, C.S., Lin, C.S., Lin, Y.S., Hsiung, P.A., Shih, C., 2013. Multi-objective exploitation of pipeline parallelism using clustering, replication and duplication in embedded multi-core systems. J. Syst. Archit. 59 (10), 1083–1094. http://dx.doi.org/10.1016/j.sysarc.2013.05.024.
Mishra, P.K., Mishra, A., Mishra, K.S., Tripathi, A.K., 2012. Benchmarking the clustering algorithms for multiprocessor environments using dynamic priority of modules. Appl. Math. Model. 36 (12), 6243–6263. http://dx.doi.org/10.1016/j.apm.2012.02.011.
Mokaripoor, P., Hosseini Shirvani, M., 2016. A state of the art survey on DVFS techniques in Cloud Computing Environment. J. Multidiscip. Eng. Sci. Technol. 3 (5).
Pendharkar, P.C., 2015. An ant colony optimization heuristic for constrained task allocation problem. J. Comput. Sci. 7, 37–47. http://dx.doi.org/10.1016/j.jocs.2015.01.001.
Russell, S., Norvig, P., 2003. Artificial Intelligence: A Modern Approach. In: Prentice Hall Series in Artificial Intelligence, vol. 1, ISBN: 0-13-790395-2, pp. 111–114.
Sahni, J., Vidyarthi, D.P., 2016. Workflow-and-platform aware task clustering for scientific workflow execution in Cloud environment. Future Gener. Comput. Syst. 64, 61–74. http://dx.doi.org/10.1016/j.future.2016.05.008.
Sarathambekai, S., Umamaheswari, K., 2017. Task scheduling in distributed systems using heap intelligent discrete particle swarm optimization. Comput. Intell. http://dx.doi.org/10.1111/coin.12113.
Sih, G.C., Lee, E.A., 1993. A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Trans. Parallel Distrib. Syst. 4 (2), 175–187. http://dx.doi.org/10.1109/71.207593.
Sinnen, O., To, A., Kaur, M., 2009. Contention-aware scheduling with task duplication. In: Workshop on Job Scheduling Strategies for Parallel Processing. Springer, Berlin, Heidelberg, pp. 157–168. http://dx.doi.org/10.1007/978-3-642-04633-9_9.
Taborda, D.M.G., Zdravkovic, L., 2012. Application of a Hill-Climbing technique to the formulation of a new cyclic nonlinear elastic constitutive model. Comput. Geotech. 43, 80–91. http://dx.doi.org/10.1016/j.compgeo.2012.02.001.
Tanenbaum, A.S., Van Steen, M., 2007. Distributed Systems: Principles and Paradigms. Prentice-Hall, cds.cern.ch/record/1056310/files/0132392275_TOC.pdf.
Tang, X., Li, K., Liao, G., Li, R., 2010. List scheduling with duplication for heterogeneous computing systems. J. Parallel Distrib. Comput. 70 (4), 323–329. http://dx.doi.org/10.1016/j.jpdc.2010.01.003.
Thaman, J., Singh, M., 2017. Green cloud environment by using robust planning algorithm. Egypt. Inf. J. 18 (3), 205–214. http://dx.doi.org/10.1016/j.eij.2017.02.001.
Topcuoglu, H., Hariri, S., Wu, M.Y., 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13 (3), 260–274. http://dx.doi.org/10.1109/71.993206.
Verma, A., Kaushal, S., 2017. A hybrid multi-objective particle swarm optimization for scientific workflow scheduling. Parallel Comput. 62, 1–19. http://dx.doi.org/10.1016/j.parco.2017.01.002.
Xu, Y., Li, K., Hu, J., Li, K., 2014. A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues. Inform. Sci. 270, 255–287. http://dx.doi.org/10.1016/j.ins.2014.02.122.
Zuo, X., Zhang, G., Tan, W., 2013. Self-adaptive learning PSO-based deadline constrained task scheduling for hybrid IaaS cloud. IEEE Trans. Autom. Sci. Eng. 11 (2), 564–573. http://dx.doi.org/10.1109/TASE.2013.2272758.