A multi-objectives scheduling algorithm based on cuckoo optimization for task allocation problem at compile time in heterogeneous systems


Accepted Manuscript

A Multi-Objectives Scheduling Algorithm Based on Cuckoo Optimization for Task Allocation Problem at Compile Time in Heterogeneous Systems

Mehdi Akbari, Hassan Rashidi

PII: S0957-4174(16)30232-9
DOI: 10.1016/j.eswa.2016.05.014
Reference: ESWA 10672
To appear in: Expert Systems With Applications
Received date: 20 January 2016
Revised date: 7 May 2016
Accepted date: 8 May 2016

Please cite this article as: Mehdi Akbari , Hassan Rashidi , A Multi-Objectives Scheduling Algorithm Based on Cuckoo Optimization for Task Allocation Problem at Compile Time in Heterogeneous Systems, Expert Systems With Applications (2016), doi: 10.1016/j.eswa.2016.05.014

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights:
 A novel algorithm for task scheduling in heterogeneous systems is proposed.
 We use an extended Cuckoo Optimization Algorithm (COA) to solve the problem.
 An efficient immigration function is defined to escape from local optima.
 The results show the superiority of the proposed algorithm over previous algorithms.


A Multi-Objectives Scheduling Algorithm Based on Cuckoo Optimization for Task Allocation Problem at Compile Time in Heterogeneous Systems

Mehdi Akbari a,*, Hassan Rashidi b

a Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
b Department of Mathematics and Computer Science, Allameh Tabataba'i University, Tehran, Iran

* Corresponding author. Tel.: +98 (31) 4229-2632; fax: +98 (31) 4229-1016.
E-mail addresses: [email protected], [email protected] (M. Akbari), [email protected] (H. Rashidi).


Abstract


To handle the scheduling of tasks on heterogeneous systems, an algorithm is proposed that reduces execution time while allowing maximum parallelization. The algorithm is a multi-objective scheduling cuckoo optimization algorithm (MOSCOA). In this algorithm, each cuckoo represents a scheduling solution that specifies both the ordering of tasks and the processors allocated to them. In addition, the operators of the cuckoo optimization algorithm, namely laying and immigration, are redefined so that they are applicable to the directed acyclic graph of the scheduling problem. The algorithm adapts these operators to create a proper schedule at each stage. This avoids local optima while allowing a global search of the problem space, accelerating the discovery of a global optimum and delivering a near-optimal schedule with the fewest repetitions. Movement toward global optima is driven by a targeted immigration operator, and the schedules in each repetition are pushed toward optimized schedules to secure the global optimum. The results of running MOSCOA on a large number of random graphs and real-world application graphs with a wide range of characteristics show the superiority of MOSCOA over previous task scheduling algorithms.

Keywords: Task scheduling; Heterogeneous systems; Cuckoo Optimization Algorithm; Meta-heuristic algorithm; Makespan.

1. Introduction


Heterogeneous systems face a critical problem in task scheduling: balancing the maximization of parallelization against the minimization of communication delay. This is a max-min problem with parallelism. Parallelization here means allocating tasks simultaneously to a larger number of processors, which increases communication cost, especially when communication delay is high. Meanwhile, executing tasks on certain specific processors reduces efficiency; the main objective of task scheduling algorithms is therefore to balance increased parallelization against decreased communication costs. The static task scheduling problem, in which the situation does not change at run time, is NP-hard (Ullman, 1975). To solve this problem, the proposed algorithms employ heuristic methods to achieve a relatively optimal solution. The main quandary of heuristic methods is getting stuck in local optima. To escape this quandary, meta-heuristic methods are used. These methods utilize detection functions in their search strategies (Kwok & Ahmad, 2005; Wu, Shu, & Gu, 1997), which reduces the chance of being trapped in local optima and also shrinks the search space. One difficulty of meta-heuristic methods such as genetic algorithms is determining the number of repetitions needed to reach a relatively optimal solution. In this regard, the cuckoo optimization algorithm reaches a solution in fewer repetitions than several other meta-heuristic algorithms (Rajabioun, 2011).


In addition, the number of repetitions needed to achieve an appropriate solution does not affect later execution times, since the program is compiled once and run many times. Faster convergence and a local search capability combined with global search are among the factors that justify applying the cuckoo optimization algorithm. The main objectives of MOSCOA are to minimize total execution time and to allocate processors to tasks so that maximum parallelization across simultaneously working processors is achieved, while the ratio of inter-processor communication costs to task execution costs is minimized. In MOSCOA, new schedules are first created in the laying stage with regard to the modification radius defined for each cuckoo. These schedules then immigrate toward the optimal schedules in the generated population. In this algorithm, new schedules are created in such a way that the precedence relations between tasks are maintained, guaranteeing correct schedules. The number of repeated schedules, or schedules close to each other in the generated population, is minimized at each stage to maintain the variety of samples. The three major contributions of this study are listed below:
 In most metaheuristic-based scheduling algorithms, there is a separate stage for selecting tasks and assigning them to processors (Ahmad, Liew, Munir, Ang, & Khan, 2016; Gogos, et al., 2016; N. Kumar & Vidyarthi, 2016; G. Wang, Wang, Liu, & Guo, 2016; Longxin Zhang, et al., 2015). In the proposed algorithm, however, a new method has been developed to represent schedules, in which each task is displayed together with the processor randomly assigned to it when the initial population is generated.
 Most algorithms that employ CS as the metaheuristic method for handling the scheduling problem use Le'vy flights to search the problem space (Mandal & Acharyya, 2015; Navimipour & Milani, 2015; H. Wang, et al., 2016), while in the proposed algorithm the immigration and laying operators are tasked with this.
 Some algorithms that employ laying and immigration operators include a repair stage (Shahdi-Pashaki, Teymourian, Kayvanfar, Komaki, & Sajadi, 2015), which verifies the integrity of the subsequences of schedules created by the operators and repairs them when needed. In MOSCOA, these operators are defined so as to skip the repair stage.
The rest of this paper is organized as follows. In Section 2, related work on scheduling algorithms for heterogeneous systems is reviewed. In Section 3, the task scheduling problem is described and its model is formulated. In Section 4, we present the MOSCOA framework for task scheduling, which maximizes parallelization and minimizes the makespan based on the cuckoo optimization algorithm. In Section 5, we evaluate the time and space complexities. In Section 6, the results and analyses of simulations that validate the algorithm are given. In Section 7, the downsides of MOSCOA are described. Finally, a conclusion and an overview of future work are presented in Section 8.


2. Related works

Most previous research on the task allocation problem at compile time in heterogeneous systems has focused on static task scheduling algorithms. These algorithms can be classified as heuristic and meta-heuristic (Bansal, Kumar, & Singh, 2003; Kwok & Ahmad, 1996; Manudhane & Wadhe, 2013; Topcuoglu, Hariri, & Wu, 2002). Heuristic algorithms are divided into list-based heuristics, clustering heuristics and duplication heuristics. Fig. 1 shows this classification together with sample algorithms for each category.


Fig. 1. Classification of static task scheduling algorithms at compile time.


In list-based heuristics, a priority is assigned to each task and the tasks are listed in order of priority. Task selection for processing is then done in that order, so a task with a higher priority is assigned to a processor earlier. The heterogeneous earliest finish time (HEFT) and critical path on a processor (CPOP) algorithms are the most important examples of list-based heuristics (Topcuoglu, et al., 2002). HEFT is designed for a given number of heterogeneous processors and has two stages: in the first, a task scheduling priority is calculated; in the second, the processor selection stage, tasks are considered in order of priority and each is assigned to the processor providing the earliest finish time. In this algorithm, priority is determined by two parameters known as tlevel and blevel. In the task graph, the tlevel of a node denotes the weight of the longest path from the start node to that node; equivalently, the tlevel of a node is its earliest possible start time. The blevel of a node is the weight of the longest path from that node to the exit node. If the algorithm uses the tlevel parameter to determine priorities it is called HEFT-T, and if the blevel parameter is applied it is called HEFT-B. In the CPOP algorithm, the sum of the blevel and tlevel of each node is taken as its priority; tasks are then selected in priority order, and a task on the critical path is allocated to the processor that minimizes the total computation cost of the critical path, while any other task is assigned to the processor giving it the earliest possible finish time. The critical path is the path from the input node to the output node with the highest total value of node computation costs and edge communication costs (Kwok & Ahmad, 1996).
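The tlevel/blevel parameters described above can be computed by a simple dynamic program over a topologically ordered DAG. The sketch below is for illustration only, not the cited algorithms' implementation; the graph encoding (`succ`, `w`, `c`) and the use of a single (average) computation cost per task are assumptions.

```python
# Sketch of tlevel/blevel computation on a DAG in topological order.
# Assumptions for illustration: tasks 0..n-1 are topologically sorted,
# w[i] is the (average) computation cost of task i, and c[(i, j)] is the
# communication cost of edge i -> j.

def tlevel_blevel(n, succ, w, c):
    pred = {i: [] for i in range(n)}
    for i in range(n):
        for j in succ[i]:
            pred[j].append(i)

    # tlevel: longest path weight from the entry node to node i
    # (the earliest possible start time of i).
    tlevel = [0.0] * n
    for i in range(n):
        for p in pred[i]:
            tlevel[i] = max(tlevel[i], tlevel[p] + w[p] + c[(p, i)])

    # blevel: longest path weight from node i to the exit node.
    blevel = [0.0] * n
    for i in reversed(range(n)):
        blevel[i] = w[i]
        for s in succ[i]:
            blevel[i] = max(blevel[i], w[i] + c[(i, s)] + blevel[s])
    return tlevel, blevel

# Tiny example: edges 0->1, 0->2, 1->3, 2->3.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
w = {0: 2.0, 1: 3.0, 2: 1.0, 3: 2.0}
c = {(0, 1): 1.0, (0, 2): 4.0, (1, 3): 1.0, (2, 3): 1.0}
tl, bl = tlevel_blevel(4, succ, w, c)
# A node lies on the critical path exactly when tlevel + blevel equals
# the critical path length (here blevel of the entry node).
```

Note that, as stated above, tasks on the critical path can be recognized by checking tlevel + blevel against the critical path length.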
Therefore, an effective list-based scheduling algorithm requires appropriate scheduling of the tasks on the critical path (Daoud & Kharma, 2011). The length of the critical path equals the sum of the blevel and tlevel of the input task; thus, every task whose tlevel plus blevel equals that sum lies on the critical path (Topcuoglu, et al., 2002). The duplication heuristic method reduces runtime by duplicating tasks on different processors (C.-S. Lin, Lin, Lin, Hsiung, & Shih, 2013). In this method, communication time between processors is reduced by executing tasks on more than one processor: transmitting the results from one task to the next is avoided (because both execute on a single processor), which reduces communication costs. In parallel and distributed systems, the clustering heuristic method is an appropriate way to reduce graph communication delay. Communication delay is reduced in this method since


tasks that communicate heavily with each other are put together in a cluster and assigned to a single processor (Mishra, Mishra, Mishra, & Tripathi, 2012). Meta-heuristic algorithms make up an important class of global optimization algorithms. "Heuristic" literally means to find or discover by trial and error. "Meta-heuristic" applies to higher-level strategies that modify and guide heuristic methods so that they can reach solutions beyond what is normally accessible through local optimum search. Optimization problems involve two kinds of optima, local and global. A local optimum is the best solution found in one region of the solution space, but not necessarily the best for the whole problem space. In contrast, the global optimum is the best solution over the whole problem space. Finding the global optimum in most real-life problems is extremely difficult, and therefore satisfactory, good-enough solutions are often accepted. Meta-heuristic algorithms are employed to achieve these targets. The reasons for using meta-heuristic algorithms fall into three categories, simplicity, flexibility and ergodicity, as follows:
 Simplicity: Most meta-heuristic algorithms are simple, easy to implement and relatively uncomplicated. The core of such an algorithm can often be written in about 100 lines of a programming language.
 Flexibility: Because they are simple, these algorithms flexibly cover a wide range of optimization problems that cannot be handled by classic algorithms.
 Ergodicity: Meta-heuristic algorithms have a high degree of ergodicity, meaning they can search multi-modal search spaces with sufficient variety while avoiding local optima. Ergodicity is often the result of randomized techniques derived from natural systems, such as crossover and mutation, or from statistical models such as random walks or Le'vy flights (X.-S. Yang, Cui, Xiao, Gandomi, & Karamanoglu, 2013).
There are various meta-heuristic methods for solving task scheduling problems. The most famous are:
 Genetic algorithms (Daoud & Kharma, 2011; Gupta, Agarwal, & Kumar, 2010; Hwang, Gen, & Katayama, 2006; Kołodziej & Khan, 2012; Lu, Niu, Liu, & Zhu, 2013; Omara & Arafa, 2010; Rahmani & Vahedi, 2008; Sathappan, Chitra, Venkatesh, & Prabhu, 2011; Singh & Singh, 2012; Zomaya, Ward, & Macey, 1999),
 Particle Swarm Optimization (PSO) (Guo, Zhao, Shen, & Jiang, 2012; Lei Zhang, Chen, Sun, Jing, & Yang, 2008),
 Ant Colony Optimization (ACO) (Babukartik & Dhavachelvan, 2012; H. Kim & Kang, 2011; Lo, Chen, Huang, & Wu, 2008; Pendharkar, 2015; Y. Yang, Wu, Chen, & Dai, 2010),
 Cuckoo Search Algorithm (CS) (Navimipour & Milani, 2015),
 Artificial Bee Colony (ABC) (Babukartik & Dhavachelvan, 2012; Ferrandi, Lanzi, Pilato, Sciuto, & Tumeo, 2010; J. Lin, Zhong, Lin, Lin, & Zeng, 2014),
 Simulated Annealing (Damodaran & Vélez-Gallego, 2012; J. Wang, Duan, Jiang, & Zhu, 2010),
 Memetic Algorithm (Pendharkar, 2011).
In the proposed algorithm, we use the Cuckoo Optimization Algorithm (COA). COA has been used in scheduling since its introduction in 2011 (Abu-Srhahn & Al-Hasan, 2015; Babukartik & Dhavachelvan, 2012; Branch, 2015; Dasgupta & Das, 2015; Durgadevi & Srinivasan, 2015; Navimipour & Milani, 2015; Shahdi-Pashaki, et al., 2015; H. Wang, et al., 2016). Laying and immigration operators are used in the cuckoo optimization algorithm to search the problem space, as explained in Sections 4.2 and 4.3. The algorithm proposed in (Shahdi-Pashaki, et al., 2015) is one of those that use these operators to search the problem space. The approach included in that algorithm covers a repair

stage that, once laying and immigration are completed, examines the created schedules for integrity and repairs them if necessary. In addition to the immigration and laying operators, there is a special random search method named Le'vy flights (or Le'vy walk), in which step lengths follow the Le'vy distribution, a continuous distribution over non-negative random variables. This flight is in fact a random walk whose step lengths are drawn from the Le'vy distribution. The NCS algorithm (H. Wang, et al., 2016) is an example of using Le'vy flights in a cuckoo algorithm to tackle scheduling problems.

3. Task scheduling problem and its model


A scheduling program is expressed by a DAG in which each node represents a single task and each edge shows a task execution dependency. A node with no parent is called Tentry, and a node without any child is called Texit. Since the speed of communication inside a processor is far greater than the network communication speed, the communication cost between tasks assigned to the same processor is considered zero. Accordingly, the value assumed as each edge weight is defined as the inter-network communication overhead. A task can be executed only when the information of all its father nodes is accessible to the desired processor, that is, when all its father nodes have been executed. Such a task is called a ready task and can be assigned to one of the processors. Table 1 shows the notations used in this paper and their definitions. Table 1. Definitions of notations.

Notation  Definition
DAG       A directed acyclic graph
T         A set of tasks in an application
E         A set of edges for precedence constraints among the tasks
P         A set of heterogeneous processors
I         Number of iterations
n         Number of tasks
p         Number of processors
e         Number of edges
m         The matrix size of the Gaussian graph
Ti        The ith task in the application
Pk        The kth processor in the system
Texit     The exit task, with no successor
Tentry    The entry task, with no predecessor
α         The parallelism factor
h         The range percentage of computation costs on processors
K         The number of new schedules
β         The maximum range of task modification
F         The immigration probability
Nmax      The number of surviving schedules that have better fitness values
NSmax     The maximum number of new schedules
NSmin     The minimum number of new schedules
MRp       The modifications radius of processors
MRt       The modifications radius of tasks

A sample task graph with task execution time (runtime) on three processors working as a heterogeneous distributed system is illustrated in Fig. 2 (Xu, Li, Hu, & Li, 2014).

3.1. Task graph generation metrics
The following parameters are employed to generate graphs with different characteristics for testing the algorithms.
 DAG size. This parameter determines the number of program tasks. In the task graph, the DAG size is identified by a special parameter that is explained in Section 6.2.

 Computation cost of tasks. The amount of computation required by task Ti is denoted by Di, and Sj is the execution speed of processor Pj. The cost of a task on a given processor is computed by Eq. (1) (Topcuoglu, et al., 2002; Xu, et al., 2014):

w(i, j) = Di / Sj    (1)

 Communication to computation ratio (CCR). This parameter is the average communication cost divided by the average task execution cost. If this value is very low, the graph describes a computation-intensive program. The CCR value is computed by Eq. (2) (Xu, et al., 2014):

CCR = c̄ / w̄    (2)

where c̄ is the average communication cost of the edges and w̄ is the average computation cost of the tasks.
 Parallelism factor (α). For a random graph, the graph depth is randomly generated from a uniform distribution whose mean equals √n / α; the depth is set to the smallest integer greater than or equal to the obtained value. The number of nodes at each level is randomly created from a uniform distribution whose mean equals α × √n. If α ≫ 1 a graph with maximum parallelism is obtained, and if α ≪ 1 a graph with minimum parallelism is generated (Topcuoglu, et al., 2002).
 Computation cost heterogeneity factor (h). A high value of h represents a strong difference in the computation cost of executing a task on different processors. If h is zero, the execution cost of each task is equal on all processors. The average execution cost of each task, w̄i, is drawn randomly from a uniform distribution over [0, 2 × w̄DAG], where w̄DAG is the average graph runtime, set randomly so as not to affect the comparison of results. In this case, the runtime w(i, j) of task Ti on processor Pj lies randomly in the range given by relation (3) (Topcuoglu, et al., 2002):

w̄i × (1 − h/2) ≤ w(i, j) ≤ w̄i × (1 + h/2)    (3)

In some sources (Xu, et al., 2014), the h parameter together with Eq. (4) determines the symmetry level of the processors; consistent with the examples below, Eq. (4) can be written as

asymmetry = (2 − h) / h    (4)

For example, if h is one half (0.5), the asymmetry level equals 3, and if h is 1, the system is symmetric.
3.2. Scheduling view
A schedule shows the sequence of tasks determined for each processor. In this paper, a schedule is expressed by a two-dimensional array whose length equals the total number of tasks in the task graph, with a processor selected randomly for each task. As a result, the separate processor-selection phase found in most scheduling algorithms is eliminated. The task sequence in a schedule must be ordered so that the dependences in the program task graph are maintained. Fig. 3 shows an example of task graph scheduling.
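The representation described above can be sketched as a list of (task, processor) pairs produced by a randomized topological sort; the function and variable names below are illustrative assumptions, not the paper's code.

```python
import random

# Sketch of the schedule representation: a sequence of (task, processor)
# pairs whose task order respects DAG precedence. A random ready task is
# chosen at each step, so any returned order is a valid schedule.

def random_schedule(n_tasks, n_procs, pred, rng=random):
    remaining_preds = {t: len(pred[t]) for t in range(n_tasks)}
    ready = [t for t in range(n_tasks) if remaining_preds[t] == 0]
    schedule = []
    while ready:
        task = rng.choice(ready)          # random ready task -> valid order
        ready.remove(task)
        proc = rng.randrange(n_procs)     # random processor assignment
        schedule.append((task, proc))
        for child in range(n_tasks):
            if task in pred[child]:
                remaining_preds[child] -= 1
                if remaining_preds[child] == 0:
                    ready.append(child)
    return schedule

# Example with an illustrative 4-task diamond DAG: 0 -> {1, 2} -> 3.
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
sched = random_schedule(4, 3, pred)
# Task 0 always appears first and task 3 last; 1 and 2 may swap.
```

Because only ready tasks are ever appended, no separate repair of the generated order is needed, matching the representation's goal of always producing correct schedules.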


Fig. 2. (a) A sample DAG and (b) Computation cost matrix.


Fig. 3. A sample of graph scheduling for the simple DAG task graph in Fig. 2.

3.3. Task execution by processors

To assign tasks to processors and to compute execution time after creating schedules in MOSCOA, the actual start time (AST) of task Ti on processor Pk is identified by Eq. (5) (Xu, et al., 2014):

AST(Ti, Pk) = max{ Ready(Pk), EST(Ti, Pk) }    (5)

In this equation, Ready(Pk) is the earliest time at which processor Pk is ready to schedule, that is, the time at which the previous task executed on this processor has been completed. EST(Ti, Pk) is the earliest start time of task Ti on processor Pk, computed by Eq. (6) (Xu, et al., 2014):

EST(Tentry, Pk) = 0
EST(Ti, Pk) = max over Tj ∈ pred(Ti) of { AFT(Tj) + c(Tj, Ti) }    (6)

In this equation, pred(Ti) is the set of immediate predecessors of Ti and c(Tj, Ti) is the communication cost between the two tasks. AFT(Ti) is the actual finish time of executing a task on a processor, expressed by Eq. (7):

AFT(Ti) = AST(Ti, Pk) + w(i, k)    (7)

In Fig. 4, AFT and AST are illustrated for the scheduled tasks in Fig. 3.


3.4. Fitness calculation function


Fig. 4. Task execution by processors and makespan calculation.

The length of the schedule produced by the algorithm, called the makespan or scheduling fitness, is the actual finish time of the exit task, defined by Eq. (8) (Xu, et al., 2014):

makespan = AFT(Texit)    (8)

Fitness can also be computed by Eq. (9):

Fitness = max{ FT(Texit, Pk) }    (9)


where FT(Texit, Pk) is the finish time of the exit task on processor Pk. In Fig. 4, the fitness of the given schedule is 136. Fig. 5 shows samples of the schedules generated by HEFT-B, CPOP, HEFT-T, BGA and MOSCOA for the graph of Fig. 2. The fitness obtained by MOSCOA equals 66, a better schedule than those of the other algorithms. This result is achieved in 25 repetitions, while the BGA algorithm obtains 69 after 50 repetitions.
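The fitness evaluation of Eqs. (5)-(9) can be sketched as a single pass over a schedule in task order. The data layout (`w`, `c`, `pred`) and all names are illustrative assumptions, not the paper's implementation.

```python
# Sketch of makespan (fitness) evaluation per Eqs. (5)-(9): walk the
# schedule in order, computing EST/AST/AFT per task. w[t][p] is the cost
# of task t on processor p; c[(u, v)] the edge communication cost, charged
# only when u and v run on different processors.

def makespan(schedule, pred, w, c):
    proc_ready = {}   # earliest time each processor is free
    aft = {}          # actual finish time per task
    placed = {}       # task -> processor
    for task, proc in schedule:
        est = 0.0
        for u in pred[task]:
            comm = c[(u, task)] if placed[u] != proc else 0.0
            est = max(est, aft[u] + comm)            # Eq. (6)
        ast = max(proc_ready.get(proc, 0.0), est)    # Eq. (5)
        aft[task] = ast + w[task][proc]              # Eq. (7)
        placed[task] = proc
        proc_ready[proc] = aft[task]
    return max(aft.values())                         # Eqs. (8)-(9)

# Illustrative diamond DAG on two processors.
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
w = {0: [2, 3], 1: [3, 2], 2: [1, 2], 3: [2, 2]}
c = {(0, 1): 1, (0, 2): 1, (1, 3): 2, (2, 3): 2}
sched = [(0, 0), (1, 1), (2, 0), (3, 1)]
print(makespan(sched, pred, w, c))  # prints 7.0
```

This evaluator is the fitness function a population-based scheduler would call once per candidate schedule in every repetition.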

Fig. 5. The schedules generated by the HEFT-B, CPOP, HEFT-T, BGA and MOSCOA.


4. The Proposed algorithm framework


In MOSCOA, the cuckoo optimization algorithm is applied to achieve a relatively optimal solution with fewer repetitions than genetic-based algorithms. The framework is illustrated in Algorithm 1; in the following, the stages of the proposed algorithm are described in detail. To start solving a problem, the cuckoo optimization algorithm randomly generates an initial population as a habitat matrix, in which each member represents the current habitat of a cuckoo. In our problem, each cuckoo of the initial population represents a complete solution, or schedule, shown as a sequence of tasks and processors. In the repetition loop, the laying and immigration stages are performed for the desired number of repetitions, usually far smaller than the number of repetitions in other meta-heuristic algorithms. After laying is completed, the population size is adjusted, and immigration is carried out toward the best schedules found, with regard to the maximum number of surviving cuckoos. Algorithm 1. Pseudocode of MOSCOA.

Input: A DAG application, Size of Run, Size of Population, Number of Processors, Nmax, NSmin, NSmax, β and F;
Output: A task schedule;
1: Call InitialPopulation;
2: repeat
3:   Call Laying;
4:   Call SetToNmax;
5:   Call Immigration;
6: until Size of Run;

4.1. Initial population


In order to produce the initial population, randomly generated schedules are used in the proposed algorithm. In the initial-population procedure (whose stages are shown in Algorithm 2), the initial population is first generated at the size given to the algorithm and, based on Eq. (10), a random number K is assigned to each created schedule:

K = rand[NSmin, NSmax]    (10)


This number gives the number of new schedules to be generated from that parent schedule at the laying stage. In this equation, NSmax is the maximum number of new schedules and NSmin the minimum number of new schedules, received as algorithm input parameters. Each parent schedule is allowed to change the order of only a limited number of tasks or elements when creating new schedules. This limit is known as the modifications radius, denoted MR. For each parent schedule, both MRp (the modifications radius of processors) and MRt (the modifications radius of tasks) are computed, by Eqs. (11) and (12):

MRp = β × (K / ΣK) × (p_hi − p_lo)    (11)

MRt = β × (K / ΣK) × (t_hi − t_lo)    (12)

In these equations, β indicates the maximum number of possible modifications in the order of tasks or processors that still yields a correct schedule, and is received as an algorithm input. The pairs p_hi/p_lo and t_hi/t_lo are respectively the upper and lower limits of each variable in the optimization problem: for processors, the limits range from zero to the index of the last processor, while for tasks, the limits bound a legal set within which a task can be displaced without violating dependences in the task graph, according to Theorem 1.

Theorem 1. The legal set for displacing a task is a set of tasks that have no father-son relationship with each other in the task graph and that contains the desired task. The task can be exchanged with any task of its legal set without any dependence in the task graph being violated.

Proof 1. The lack of a father-son relationship between tasks means that no precedence or execution-delay dependence holds among them; that is, they can be executed in any order.

Algorithm 2. Pseudocode of InitialPopulation.

Input: A DAG application;
Output: The initial population;
1: repeat
2:   Generate random Schedule as a member of population;
3:   Calculate K, MRp and MRt for this Schedule;
4: until Size of Population;
5: Calculate the fitness for all Schedules;
6: Set minimum fitness as a GoalPoint;
7: GlobalPoint := GoalPoint;
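The computation of K and of the modification radii can be sketched as below. The exact form of Eqs. (10)-(12) is assumed here to follow the standard COA egg-laying-radius formula (each radius proportional to the cuckoo's share of all new schedules, scaled by β and the variable's range); all names are illustrative.

```python
import random

# Sketch of assigning each parent schedule its new-schedule count K
# (Eq. (10)) and modification radii MRp/MRt (Eqs. (11)-(12)), under the
# standard COA laying-radius form. Illustrative, not the paper's code.

def laying_parameters(pop_size, ns_min, ns_max, beta, n_tasks, n_procs):
    # Eq. (10): K drawn uniformly from [NSmin, NSmax] per schedule.
    ks = [random.randint(ns_min, ns_max) for _ in range(pop_size)]
    total = sum(ks)
    params = []
    for k in ks:
        share = k / total                            # this cuckoo's share
        mr_p = round(beta * share * (n_procs - 1))   # Eq. (11)
        mr_t = round(beta * share * (n_tasks - 1))   # Eq. (12)
        params.append((k, mr_p, mr_t))
    return params

params = laying_parameters(pop_size=10, ns_min=2, ns_max=5,
                           beta=4, n_tasks=20, n_procs=3)
```

A cuckoo that will lay more eggs thus also searches a wider radius, which is the usual COA trade-off between exploitation and exploration.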


In Fig. 6, these legal sets are shown in different colors for three sample schedules of the task graph in Fig. 2. The legal set for task T3 is highlighted in yellow; this task can be exchanged with tasks T1 and T4 in schedule (a), with T4 in schedule (b), and finally with T1, T4 and T2 in schedule (c).

Fig. 6. Determining legal sets for modifications radius in three sample schedules of Fig. 2 graph.

After generating the initial population, the fitness is computed by Eq. (9) for each of the schedules (cuckoos). Having computed the schedules' fitness, it is necessary to compare

fitness values to select the best schedule. All schedules are arranged in ascending order of fitness; the first member of the sorted list is selected as the best schedule, or GoalPoint. Since this goal point is recomputed at later stages, it is continuously compared with the global optimum point (GlobalPoint), which finally gives the answer to the problem.

4.2. Generating new schedules (laying stage)


In this stage, K new schedules are generated for each schedule of the initial population, with regard to the K and MR values created for it. In every new schedule, each processor is randomly replaced by a processor within the processor modifications radius, and each task is randomly exchanged with one of the tasks within its task modifications radius. After generating the new population, the fitness is calculated for all schedules, and the goal point and the global optimum point are updated. The stages of this procedure are shown in Algorithm 3. After the laying stage, schedules whose fitness is worse than the others' must be deleted: the generated population is sorted by fitness, the maximum number of surviving cuckoos (Nmax) is kept from the beginning of the list, and the rest are deleted. Algorithm 3. Pseudocode of Laying.


Input: The initial population;
Output: The laying population;
1: foreach Schedule in population do
2:   Generate new K Schedules by considering values of MRp and MRt;
3: endfor
4: Calculate the fitness for all Schedules;
5: Set minimum fitness as a GoalPoint;
6: if GoalPoint < GlobalPoint then
7:   GlobalPoint := GoalPoint;
8: endif
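The laying stage can be sketched as follows. Instead of the paper's legal-set construction, this sketch simply validates each mutated task order against the precedence constraints and retries on failure; that simplification, and all names, are assumptions for illustration.

```python
import random

# Sketch of laying: each parent produces K new schedules by randomly
# changing up to MRp processor assignments and swapping up to MRt task
# pairs, keeping only precedence-valid orders (validate-and-retry stands
# in for the paper's legal-set mechanism).

def is_valid(order, pred):
    seen = set()
    for t in order:
        if any(p not in seen for p in pred[t]):
            return False
        seen.add(t)
    return True

def lay(schedule, k, mr_p, mr_t, n_procs, pred, rng=random):
    children = []
    while len(children) < k:
        child = list(schedule)
        for _ in range(mr_p):                  # processor modifications
            i = rng.randrange(len(child))
            t, _ = child[i]
            child[i] = (t, rng.randrange(n_procs))
        for _ in range(mr_t):                  # task-order modifications
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]
        if is_valid([t for t, _ in child], pred):
            children.append(child)
    return children

# Illustrative diamond DAG: 0 -> {1, 2} -> 3.
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
parent = [(0, 0), (1, 1), (2, 0), (3, 2)]
brood = lay(parent, k=3, mr_p=1, mr_t=1, n_procs=3, pred=pred)
```

The validate-and-retry loop plays the role of the repair stage that MOSCOA's legal-set operators are designed to make unnecessary.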

4.3. Immigrating toward optimal schedule


In this stage, the global optimum point is first set as the goal for the other schedules in the population; then all schedules immigrate toward it. The aim of immigration is not to make each schedule maximally similar to the global optimum point and reach it, but rather to move close to the global optimum point. This causes a wider area to be searched and helps avoid getting trapped in local optima.

Theorem 2. In two schedules, two legal sets that begin at the same index are called corresponding legal sets. The tasks common to corresponding legal sets can be displaced in either schedule without any dependence in the task graph being violated.

Proof 2. The tasks in a legal set can be executed in parallel; that is, their order can be changed. If two corresponding sets have common tasks, these tasks can be copied from one schedule to the other in the same order without violating dependences in the task graph. Legal sets with the same beginning index are selected because they have more tasks in common, and before this index all tasks have already been executed in both schedules and none remains pending. Therefore, there is no task that other tasks must wait for its

execution, and no dependence in the task graph is violated by changing the order of task execution.


The following steps are taken for immigration:
1. First, legal sets like those shown in Fig. 6 are determined for every schedule.
2. In every schedule, each legal set whose beginning index equals the beginning index of a legal set in the global optimum point is selected.
3. If two corresponding legal sets include the same tasks, the corresponding legal set of the global optimum point, including its task sequence and processors, is copied exactly to the same location in the desired schedule. Otherwise, only the tasks and processors that also exist in the legal set of the desired schedule are copied from the global optimum point. By Theorem 2, dependences in the task graph are thereby maintained.
4. Entry and exit tasks are not considered in immigration.
The immigration algorithm is shown in Algorithm 4. In this algorithm, the immigration factor F gives the probability that a given schedule immigrates and is received as an algorithm input parameter. Algorithm 4. Pseudocode of Immigration.


Input: The laying population;
Output: The migrated population;
1:  foreach Schedule in the laying population do
2:    Generate legal sets;
3:    foreach legal set of GlobalPoint do
4:      if the current Schedule has a legal set with the same beginning index then
5:        copy all tasks (that also exist in the legal set of the current Schedule) and their processors from the legal set of GlobalPoint to the legal set of the current Schedule, with probability F;
6:      endif
7:    endfor
8:    Calculate the fitness for all Schedules;
9:    Set the minimum fitness as GoalPoint;
10:   if GoalPoint < GlobalPoint then
11:     GlobalPoint := GoalPoint;
12:   endif
13:   Calculate K, MRp and MRt for this Schedule;
14: endfor

Immigration of schedules toward the global optimum point is illustrated in Fig. 7 (a) for two sample schedules and one global optimum point. In Fig. 7 (b), the first schedule has only one legal set corresponding to a legal set of the global optimum point, i.e. the one sharing the first beginning index; it is highlighted in yellow. In this legal set, tasks T1, T2 and T3 of the first schedule are replaced, in order, by the corresponding tasks of the global optimum point. Fig. 7 (c) shows the second schedule, which has two legal sets corresponding to legal sets of the global optimum point with the same beginning indices; these are highlighted in yellow and light blue. The corresponding sets contain the same tasks in both schedules (in different orders); therefore, all of them would be replaced by the tasks and processors of the global optimum point. If the number of legal sets in a schedule equals their number in the global optimum point, all tasks and processors of the global optimum point would be copied exactly into the given schedule

and the schedule would arrive exactly at that point instead of immigrating to its neighborhood (i.e., it would be trapped in a local optimum). To avoid this, it suffices to check that the number of legal sets in a schedule differs from their number in the global optimum point; if the numbers are equal, immigration is not performed. Thus, in Fig. 7 (c), the immigration operation is not performed for the second schedule.
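As a concrete illustration, the immigration step can be sketched as follows. The schedule representation (a list of (task, processor) pairs in execution order), the legal-set encoding (a map from beginning index to the set of task IDs in that legal set) and the function name `immigrate` are assumptions made for this sketch, not the paper's actual C# implementation:

```python
import random

def immigrate(schedule, legal_sets, global_opt, global_sets, F=0.7):
    """Move one schedule toward the global optimum point (sketch of Algorithm 4).

    schedule, global_opt: lists of (task, processor) pairs in execution order.
    legal_sets, global_sets: {beginning_index: set_of_task_ids} per schedule.
    F: immigration probability for this schedule.
    """
    # Guard against arriving exactly at the global optimum (the Fig. 7 (c) case):
    if len(legal_sets) == len(global_sets):
        return list(schedule)
    result = list(schedule)
    for start, tasks in legal_sets.items():
        # Only legal sets with a corresponding set (same beginning index) immigrate.
        if start not in global_sets or random.random() > F:
            continue
        # Tasks common to both corresponding legal sets, in the optimum's order,
        # together with the processors assigned by the global optimum point.
        common = [tp for tp in global_opt
                  if tp[0] in tasks and tp[0] in global_sets[start]]
        common_ids = {tp[0] for tp in common}
        # Positions inside this legal set currently holding one of the common tasks.
        slots = [i for i in range(start, start + len(tasks))
                 if result[i][0] in common_ids]
        for slot, task_proc in zip(slots, common):
            result[slot] = task_proc  # copy task and its processor
    return result
```

Because only tasks already inside the legal set are reordered, Theorem 2 guarantees the result still respects the task-graph dependences.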

4.4. Algorithm termination criterion


Evolutionary algorithms could iterate indefinitely, so a termination criterion must be defined for them. The usual criterion is that the result of the algorithm remains stable for a certain number of iterations (Xu, et al., 2014). In this paper, the algorithm terminates when its result remains stable for 10 consecutive iterations.
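This stability criterion can be sketched as a small driver loop; the function name `run_until_stable` and the `max_iterations` safety cap are assumptions for illustration:

```python
def run_until_stable(step, stable_limit=10, max_iterations=10_000):
    """Iterate step() -- one generation, returning the current best fitness --
    until the best result stays unchanged for stable_limit consecutive
    iterations, the termination criterion described above. A hard cap on the
    total number of iterations is added as a safety net."""
    best, stable = None, 0
    for _ in range(max_iterations):
        fitness = step()
        if best is not None and fitness >= best:
            # No improvement (with elitism the best never worsens): count it.
            stable += 1
            if stable >= stable_limit:
                break
        else:
            best, stable = fitness, 0  # improvement: reset the counter
    return best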

Fig. 7. How schedules immigrate toward global optimum point.


5. Time and space complexity evaluation

The time complexity of MOSCOA can be evaluated as follows. According to Algorithm 1, most of the time is spent running the main loop (Steps 2 through 6). In each iteration of the loop, the algorithm executes the Laying function (Step 3), the SetToNmax function (Step 4), the Immigration function (Step 5) and the fitness evaluation function. The fitness evaluation function takes O(e × p) time, where e is the number of edges in the DAG and p is the number of processors. The overall time complexity of MOSCOA is the cost of one iteration multiplied by I, the number of iterations performed by MOSCOA. For a dense graph, in which the number of edges is O(v²) for v tasks, the fitness evaluation takes O(v² × p) time per iteration.

The space complexity of MOSCOA is analyzed as follows. To store each schedule, an array of size v is needed. There are N (the population size) schedules in the initial population; hence, the space complexity of MOSCOA is O(N × v).

6. Simulation and evaluations


To show the efficiency of MOSCOA for task scheduling in heterogeneous systems, the results of running this algorithm are compared with the HEFT-T, HEFT-B, CPOP (Topcuoglu, et al., 2002) and BGA (Gupta, et al., 2010) algorithms. MOSCOA is implemented in C# and consists of classes that create the standard FFT, Molecular and Gaussian graphs as well as random graphs with various parameters. To store a population of schedules, a matrix is used whose number of rows equals the population size and whose number of columns equals the number of tasks in a schedule. All simulations are performed on the same PC with an Intel Core i7 processor @ 2.2 GHz and 6 GB of RAM.


6.1. Performance comparison metrics


The following metrics are usually employed to compare the performance of scheduling algorithms on standard graphs.
 Scheduling Length Ratio (SLR). The most important performance metric for graph scheduling algorithms is the length of the output schedule. Since a large number of graphs with different characteristics are used, it is necessary to normalize the schedule length of every graph to a lower bound, so that a single criterion for overall comparison is obtained. This criterion, the scheduling length ratio, is expressed by Eq. (13) (Burkimsher, Bate, & Indrusiak, 2013; Dai & Zhang, 2014; Ijaz, Munir, Anwar, & Nasir, 2013; V. Kumar & Katti, 2014; Topcuoglu, et al., 2002; Xu, et al., 2014; J. J. Zhang, Hu, & Yang, 2014):

SLR = makespan / Σ_{Ti ∈ CP_MIN} min_{Pj ∈ P} w(Ti, Pj)    (13)

In this equation, w(Ti, Pj) indicates the execution time of task Ti on processor Pj. In a non-scheduled graph, if the computation cost of each task is taken as its lowest execution time over all processors, the critical path computed from these lowest costs is denoted CP_MIN. In effect, this criterion divides the obtained schedule length by that of the best possible schedule (the sum of the minimum task execution costs along the critical path) to obtain a normalized criterion; the SLR therefore cannot be less than 1, and the average SLR over several graphs is used for comparison. In some sources (Bansal, et al., 2003; Daoud & Kharma, 2011), this metric is known as the normalized schedule length (NSL) and is computed by Eq. (14):

NSL = makespan / Σ_{Ti ∈ CP} w(Ti)    (14)

where the sum is taken over the computation costs of the tasks on the critical path.


 Speedup. This criterion is obtained by dividing the sequential execution time of the tasks on the fastest processor by the parallel execution time, and is expressed by Eq. (15) (Burkimsher, et al., 2013; Dai & Zhang, 2014; Ijaz, et al., 2013; V. Kumar & Katti, 2014; Topcuoglu, et al., 2002; Xu, et al., 2014; J. J. Zhang, et al., 2014):

Speedup = ( min_{Pj ∈ P} Σ_{Ti ∈ V} w(Ti, Pj) ) / makespan    (15)
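Under these definitions, the SLR and speedup computations reduce to a few lines. The cost-table layout `costs[t][p]` and the function names below are assumptions made for this illustrative sketch:

```python
def slr(makespan, cp_min_tasks, costs):
    """Scheduling Length Ratio (Eq. (13)): makespan divided by the sum of the
    minimum execution costs of the tasks on the minimum critical path CP_MIN.
    costs[t][p] is the execution time of task t on processor p (assumed layout)."""
    lower_bound = sum(min(costs[t].values()) for t in cp_min_tasks)
    return makespan / lower_bound

def speedup(makespan, costs):
    """Speedup (Eq. (15)): best sequential time (all tasks on the single
    fastest processor) divided by the parallel makespan."""
    processors = next(iter(costs.values())).keys()
    sequential = min(sum(costs[t][p] for t in costs) for p in processors)
    return sequential / makespan
```

Averaging `slr` over a set of generated graphs gives the comparison criterion used in the figures below; dividing `speedup` by the processor count gives the efficiency metric.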

6.2. Real-world application graphs


 Efficiency metric. Efficiency is another criterion, defined as the ratio of the speedup to the number of processors used.

6.2.1. Molecular graph


In this paper, three standard task graphs are utilized as real-world application graphs to evaluate the proposed algorithm: the Molecular Dynamics Code, Gaussian elimination, and Fast Fourier Transform (FFT) graphs (Daoud & Kharma, 2011; Mohamed & Awadalla, 2011; Topcuoglu, et al., 2002; Xu, et al., 2014). All of these graphs, along with their generation algorithms, are explained in this section.


A sample Molecular graph with 40 nodes is given in Fig. 8 (S. Kim & Browne, 1988). This graph has a fixed structure and is used as such to compare the scheduling algorithms.

Fig. 8. Sample of Molecular graph.

The metrics used to create the 100 molecular graphs, together with the BGA and MOSCOA input parameters, are given in Table 2. Fig. 9 (a) presents the average SLR of the scheduling algorithms versus different CCR values, showing the superiority of MOSCOA over the other algorithms. The advantage of MOSCOA grows as CCR increases, which indicates that MOSCOA still performs better in spite of increased communication costs. Fig. 9 (b) evaluates the efficiency of the algorithms versus the number of processors, again showing the superiority of MOSCOA. These results show that MOSCOA continues to perform better than the other algorithms as more processors are added, while BGA, as a metaheuristic algorithm, performs poorly compared with the others. This is because BGA requires more iterations to produce a proper answer, whereas MOSCOA achieves higher efficiency with the same number of iterations.

(a) Average SLR of the algorithms vs. CCR
(b) Efficiency of the algorithms vs. the number of processors
Fig. 9. Average SLR and the efficiency of algorithms for molecular graph.

6.2.2. Gaussian graph

The Gaussian graph results from applying the algorithm of Fig. 10 (a) for a given matrix size m. In this algorithm, the Tk,k are the main nodes, expressing the pivot operations, and the Tk,j are their dependent update nodes in the graph. The total number of nodes in this graph is (m² + m − 2)/2 (Topcuoglu, et al., 2002). The communication cost of the edges and the computation cost of the nodes at each level are drawn from a normal distribution parameterized by the CCR and h metrics. Fig. 10 (b) shows a sample Gaussian graph for a matrix of size five (Topcuoglu, et al., 2002).

Fig. 10. (a) Gaussian graph generation algorithm. (b) Gaussian graph sample with matrix size of five.
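The node structure of this graph can be sketched as follows. The exact inter-level edge pattern is an assumption for this sketch; the node count matches the (m² + m − 2)/2 formula in the text:

```python
def gaussian_elimination_dag(m):
    """Task graph of Gaussian elimination on an m x m matrix, following the
    structure of Fig. 10: T(k,k) are pivot tasks and T(k,j), j > k, their
    dependent update tasks. Total node count is (m*m + m - 2) / 2."""
    nodes, edges = [], []
    for k in range(1, m):                      # elimination levels 1 .. m-1
        nodes.append((k, k))                   # pivot task T(k,k)
        for j in range(k + 1, m + 1):
            nodes.append((k, j))               # update task T(k,j)
            edges.append(((k, k), (k, j)))     # pivot precedes its updates
    for k in range(1, m - 1):                  # level k feeds level k+1 (assumed pattern)
        edges.append(((k, k + 1), (k + 1, k + 1)))   # next level's pivot
        for j in range(k + 2, m + 1):
            edges.append(((k, j), (k + 1, j)))       # next level's updates
    return nodes, edges
```

For m = 5 this yields the 14-node graph of Fig. 10 (b).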

Results obtained from executing the algorithms on 100 Gaussian graphs, with the parameters given in Table 3 and matrix sizes from 5 to 20, are presented in Fig. 11 and show the superiority of MOSCOA over the others. The findings of this section show that a bigger DAG size denotes greater complexity; moreover, MOSCOA still produces shorter schedules in spite of a lower number of iterations (50), i.e. it distributes the program tasks among the processors so as to minimize the total execution time of the program.

Table 2. Molecular graph generation metrics vs. parameters of graph scheduling algorithm inputs.

DAG:    Type: Molecular; Number of DAGs: 100; Node number: 40; α: 0.5; h: 0.1; CCR: 1 (Fig. 9 (b))
BGA:    Pm (mutation probability): random from [0.05, 1]; Pc (crossover probability): random from [0.05, 1]; Population number: 30; Generation number: 100
MOSCOA: Initial population: random; Nmax: 12; NSmin: 20; NSmax: 40; F: 0.7; β: 20; Population number: 30; Iteration number: 100

Table 3. Gaussian graph generation metrics vs. graph scheduling algorithm input parameters.

DAG:    Type: Gaussian; Number of DAGs: 100; α: 0.5; h: 0.1; CCR: 1; Processor number: 8
BGA:    Pm (mutation probability): random from [0.05, 1]; Pc (crossover probability): random from [0.05, 1]; Population number: 40; Generation number: 50
MOSCOA: Initial population: random; Nmax: 48; NSmin: 6; NSmax: 12; F: 0.7; β: 9; Population number: 40; Iteration number: 50

Fig. 11. Average SLR of the algorithms vs. the matrix sizes for Gaussian graph.

6.2.3. FFT graph

The FFT graph is generated by the algorithm given in Fig. 12 (a). In this algorithm, A is an array of size n holding the polynomial coefficients and Y is an array holding the algorithm output (Topcuoglu, et al., 2002). The algorithm consists of two parts: the recursive calls in lines three and four create the part of the graph above the dotted line in Fig. 12 (b) (the recursive-call nodes), and the butterfly operations in lines six and seven create the nodes below the dotted line. For an input vector of size n (with n a power of 2), the number of recursive-call nodes is 2×n−1 and the number of butterfly-operation nodes is n×log2 n. In the graphs generated by this algorithm, every path from the start node to the exit node is a critical path (Topcuoglu, et al., 2002). To generate this graph, the CCR and h parameters, elaborated in the graph generation parameters section, are used together with the number of input points, which indicates the number of graph leaves. Fig. 12 (b) shows a sample FFT graph with 4 input points.

Fig. 12. (a) FFT algorithm. (b) A sample generated graph by this algorithm.
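The two node counts can be checked with a short helper; the function name is an assumption for this sketch:

```python
import math

def fft_node_counts(n):
    """Node counts of the FFT task graph for an input vector of size n
    (n a power of two): 2*n - 1 recursive-call nodes above the dotted line
    of Fig. 12 (b) and n * log2(n) butterfly-operation nodes below it."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    recursive = 2 * n - 1
    butterfly = n * int(math.log2(n))
    return recursive, butterfly
```

For the 4-point sample of Fig. 12 (b), this gives 7 recursive-call nodes and 8 butterfly-operation nodes.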

Fig. 13 (a) and (b) compare the average SLR and speedup over 100 graphs with input points ranging from 2 to 32, showing the superiority of MOSCOA over the other algorithms. Although small graphs do not display large scheduling differences, the schedules created by MOSCOA improve more than those of the other algorithms as the DAG grows in size. The graph generation parameters and scheduling algorithm inputs are presented in Table 4.

Table 4. FFT graph generation metrics vs. parameters of graph scheduling algorithm input.

DAG:    Type: FFT; Number of DAGs: 100; α: 0.5; h: 0.1; CCR: 1; Processor number: 4
BGA:    Pm (mutation probability): random from [0.05, 1]; Pc (crossover probability): random from [0.05, 1]; Population number: 40; Generation number: 50
MOSCOA: Initial population: random; Nmax: 48; NSmin: 6; NSmax: 12; F: 0.7; β: 9; Population number: 40; Iteration number: 50

(a) Average SLR of the algorithms vs. the input points
(b) Average Speedup of the algorithms vs. the input points
Fig. 13. Average SLR and the Speedup of algorithms for FFT graph.

6.3. Random graph

The random graphs are generated by random graph generation software. For a random graph, the number of nodes (the DAG size) and the out-degree of each node are taken from the sets SETv and SETout_degree, respectively. The graph depth is drawn randomly from a uniform distribution with a mean value of √v/α, and the number of nodes in each level is also drawn randomly from a uniform distribution with a mean value of α×√v. In the random graphs, the communication costs are assumed equal for all edges (Topcuoglu, et al., 2002). Comparisons of the average SLR and speedup over 100 random graphs with sizes from 10 to 200 tasks are given in Fig. 14 (a) and (b), showing the better performance of MOSCOA compared to the others. The findings of this evaluation show that the SLR and speedup advantages improve in the presence of larger DAGs; in other words, MOSCOA performs better than the other algorithms on bigger, more complex random graphs. The graph generation parameters and scheduling algorithm inputs are presented in Table 5.
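The shape of such a random DAG can be sketched as follows. The exact uniform ranges are an assumption for this sketch (chosen so the means match the √v/α and α×√v values above), as is the function name:

```python
import math
import random

def random_dag_shape(v, alpha=0.5, rng=random):
    """Depth and per-level widths of a random DAG with v tasks: depth drawn
    uniformly with mean sqrt(v)/alpha, level widths drawn uniformly with
    mean alpha*sqrt(v). The ranges [1, 2*mean - 1] are assumptions that
    realize those means."""
    mean_depth = math.sqrt(v) / alpha
    depth = max(1, round(rng.uniform(1, 2 * mean_depth - 1)))
    mean_width = alpha * math.sqrt(v)
    widths = [max(1, round(rng.uniform(1, 2 * mean_width - 1)))
              for _ in range(depth)]
    return depth, widths
```

Edges would then be added between consecutive levels according to the chosen out-degrees.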

6.4. Convergence trace

Convergence traces are used to compare the number of iterations required by MOSCOA and by the BGA algorithm; they show which algorithm reaches a better schedule in fewer iterations. The results of this comparison, shown in Fig. 15 (a) and (b) for graphs of sizes 10 and 20 respectively, indicate that MOSCOA reaches a better schedule in fewer iterations than the BGA algorithm. The graph generation metrics and algorithm inputs are shown in Table 6.

(a) Average SLR of the algorithms vs. the number of tasks
(b) Average Speedup of the algorithms vs. the number of tasks
Fig. 14. Average SLR and the Speedup of algorithms for random graph.

Table 5. Random graph generation metrics vs. scheduling algorithm input.

DAG:    Type: Random; Number of DAGs: 100; α: 0.5; h: 0.1; CCR: 1; Processor number: 3
BGA:    Pm (mutation probability): random from [0.05, 1]; Pc (crossover probability): random from [0.05, 1]; Population number: 30; Generation number: 25
MOSCOA: Initial population: random; Nmax: 48; NSmin: 6; NSmax: 12; F: 0.7; β: 9; Population number: 30; Iteration number: 25

Table 6. Random graph generation metrics vs. graph scheduling inputs for convergence trace.

DAG:    Number of tasks: 10, 20; Out degree: 5; α: 0.5; h: 0.1; CCR: 1; Processor number: 6
BGA:    Pm (mutation probability): random from [0.05, 1]; Pc (crossover probability): random from [0.05, 1]; Population number: 40; Generation number: 100
MOSCOA: Initial population: random; Nmax: 48; NSmin: 6; NSmax: 12; F: 0.7; β: 9; Population number: 40; Iteration number: 100

7. Downsides of MOSCOA algorithm

Although MOSCOA needs fewer iterations than its genetic counterparts, it still needs many iterations to produce the desired answer; this is a drawback compared with heuristic algorithms such as CPOP and HEFT. Creating the initial population randomly is another weak point of the proposed algorithm, because the quality of the schedules in the initial population directly affects the outcome. In other evolutionary algorithms this is tackled by generating the initial population with heuristic methods (Carretero, Xhafa, & Abraham, 2007; Daoud & Kharma, 2011; K. Kaur, Chhabra, & Singh, 2010; S. Kaur & Verma, 2012; H. Wang, et al., 2016; Xu, et al., 2014), which is an interesting direction for future research. Furthermore, the values of the input parameters of the algorithm, such as Nmax, NSmin, NSmax, β and F, which affect the outcome, are set manually without any involvement of the algorithm. This could be tackled by applying a self-adaptive cuckoo optimization algorithm (Li & Yin, 2015), which will be discussed in future work. The random assignment of tasks to processors is another drawback of MOSCOA. Although, in contrast to CPOP and HEFT, the random selection of processors during the generation of the initial population allows the processor selection and task prioritization phases to be skipped, this random selection produces an initial population that is not a desirable starting point for optimization.

(a) The convergence of makespan for randomly generated DAGs with 10 tasks, 50 independent runs
(b) The convergence of makespan for randomly generated DAGs with 20 tasks, 50 independent runs
Fig. 15. Convergence trace for random graph.


8. Conclusions


In this paper, a novel algorithm for task scheduling in heterogeneous systems was proposed: a multi-objective scheduling algorithm based on the cuckoo optimization algorithm (MOSCOA). The algorithm considers the parallelization of task execution while reducing the total execution time, and the operators of the cuckoo optimization algorithm, i.e. laying and immigration, are adapted for task scheduling. The experimental results obtained from running this algorithm were compared with the HEFT-T, HEFT-B, CPOP and BGA algorithms using standard comparison metrics. The results suggest that MOSCOA performs better than the HEFT-T, HEFT-B and CPOP algorithms and, compared with the BGA algorithm, produces a better schedule with fewer iterations. In future work, generation of the initial population by other heuristic methods will be studied, and a self-adaptive mechanism will be added to MOSCOA; in this mechanism, the algorithm input parameters will be changed in each iteration and moved toward optimal values. Comparisons with a wider range of application graphs and parameter settings will also be carried out.


References


Abu-Srhahn, A., & Al-Hasan, M. (2015). Hybrid Algorithm using Genetic Algorithm and Cuckoo Search Algorithm for Job Shop Scheduling Problem. International Journal of Computer Science Issues (IJCSI), 12, 288.
Ahmad, S. G., Liew, C. S., Munir, E. U., Ang, T. F., & Khan, S. U. (2016). A hybrid genetic algorithm for optimization of scheduling workflow applications in heterogeneous computing systems. Journal of Parallel and Distributed Computing, 87, 80-90.
Babukartik, R., & Dhavachelvan, P. (2012). Hybrid Algorithm using the advantage of ACO and Cuckoo Search for Job Scheduling. International Journal of Information Technology Convergence and Services, 2.
Bansal, S., Kumar, P., & Singh, K. (2003). An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. Parallel and Distributed Systems, IEEE Transactions on, 14, 533-544.
Branch, K. (2015). A Novel Task Scheduling Method in Cloud Environment using Cuckoo Optimization Algorithm.
Burkimsher, A., Bate, I., & Indrusiak, L. S. (2013). A survey of scheduling metrics and an improved ordering policy for list schedulers operating on workloads with dependencies and a wide variation in execution times. Future Generation Computer Systems, 29, 2009-2025.
Carretero, J., Xhafa, F., & Abraham, A. (2007). Genetic algorithm based schedulers for grid computing systems. International Journal of Innovative Computing, Information and Control, 3, 247-272.
Dai, Y., & Zhang, X. (2014). A Synthesized Heuristic Task Scheduling Algorithm. The Scientific World Journal, 2014.
Damodaran, P., & Vélez-Gallego, M. C. (2012). A simulated annealing algorithm to minimize makespan of parallel batch processing machines with unequal job ready times. Expert Systems with Applications, 39, 1451-1458.
Daoud, M. I., & Kharma, N. (2011). A hybrid heuristic–genetic algorithm for task scheduling in heterogeneous processor networks. Journal of Parallel and Distributed Computing, 71, 1518-1531.
Dasgupta, P., & Das, S. (2015). A Discrete Inter-Species Cuckoo Search for flowshop scheduling problems. Computers & Operations Research, 60, 111-120.
Durgadevi, P., & Srinivasan, S. (2015). Task Scheduling using Amalgamation of Metaheuristics Swarm Optimization Algorithm and Cuckoo Search in Cloud Computing Environment. Journal for Research, 1.
Ferrandi, F., Lanzi, P. L., Pilato, C., Sciuto, D., & Tumeo, A. (2010). Ant colony heuristic for mapping and scheduling tasks and communications on heterogeneous embedded systems. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 29, 911-924.
Gogos, C., Valouxis, C., Alefragis, P., Goulas, G., Voros, N., & Housos, E. (2016). Scheduling independent tasks on heterogeneous processors using heuristics and Column Pricing. Future Generation Computer Systems, 60, 48-66.
Guo, L., Zhao, S., Shen, S., & Jiang, C. (2012). Task scheduling optimization in cloud computing based on heuristic algorithm. Journal of Networks, 7, 547-553.
Gupta, S., Agarwal, G., & Kumar, V. (2010). Task scheduling in multiprocessor system using genetic algorithm. In Machine Learning and Computing (ICMLC), 2010 Second International Conference on (pp. 267-271): IEEE.
Hwang, R., Gen, M., & Katayama, H. (2006). A performance evaluation of multiprocessor scheduling with genetic algorithm. Asia Pacific Management Review, 11, 67.
Ijaz, S., Munir, E. U., Anwar, W., & Nasir, W. (2013). Efficient scheduling strategy for task graphs in heterogeneous computing environment. Int. Arab J. Inf. Technol., 10, 486-492.
Kaur, K., Chhabra, A., & Singh, G. (2010). Heuristics based genetic algorithm for scheduling static tasks in homogeneous parallel system. International Journal of Computer Science and Security (IJCSS), 4, 183.
Kaur, S., & Verma, A. (2012). An efficient approach to genetic algorithm for task scheduling in cloud computing environment. International Journal of Information Technology and Computer Science (IJITCS), 4, 74.
Kim, H., & Kang, S. (2011). Communication-aware task scheduling and voltage selection for total energy minimization in a multiprocessor system using ant colony optimization. Information Sciences, 181, 3995-4008.
Kim, S., & Browne, J. (1988). A general approach to mapping of parallel computation upon multiprocessor architectures. In International Conference on Parallel Processing (Vol. 3, pp. 8).
Kołodziej, J., & Khan, S. U. (2012). Multi-level hierarchic genetic-based scheduling of independent jobs in dynamic heterogeneous grid environment. Information Sciences, 214, 1-19.
Kumar, N., & Vidyarthi, D. P. (2016). A novel hybrid PSO–GA meta-heuristic for scheduling of DAG with communication on multiprocessor systems. Engineering with Computers, 32, 35-47.



Kumar, V., & Katti, C. (2014). A Scheduling Approach with Processor and Network Heterogeneity for Grid Environment. International Journal.
Kwok, Y.-K., & Ahmad, I. (1996). Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors. Parallel and Distributed Systems, IEEE Transactions on, 7, 506-521.
Kwok, Y.-K., & Ahmad, I. (2005). On multiprocessor task scheduling using efficient state space search approaches. Journal of Parallel and Distributed Computing, 65, 1515-1532.
Li, X., & Yin, M. (2015). Modified cuckoo search algorithm with self adaptive parameter method. Information Sciences, 298, 80-97.
Lin, C.-S., Lin, C.-S., Lin, Y.-S., Hsiung, P.-A., & Shih, C. (2013). Multi-objective exploitation of pipeline parallelism using clustering, replication and duplication in embedded multi-core systems. Journal of Systems Architecture, 59, 1083-1094.
Lin, J., Zhong, Y., Lin, X., Lin, H., & Zeng, Q. (2014). Hybrid Ant Colony Algorithm Clonal Selection in the Application of the Cloud's Resource Scheduling. arXiv preprint arXiv:1411.2528.
Lo, S.-T., Chen, R.-M., Huang, Y.-M., & Wu, C.-L. (2008). Multiprocessor system scheduling with precedence and resource constraints using an enhanced ant colony system. Expert Systems with Applications, 34, 2071-2081.
Lu, H., Niu, R., Liu, J., & Zhu, Z. (2013). A chaotic non-dominated sorting genetic algorithm for the multiobjective automatic test task scheduling problem. Applied Soft Computing, 13, 2790-2802.
Mandal, T., & Acharyya, S. (2015). Optimal task scheduling in cloud computing environment: Meta heuristic approaches. In Electrical Information and Communication Technology (EICT), 2015 2nd International Conference on (pp. 24-28): IEEE.
Manudhane, K. A., & Wadhe, A. (2013). Comparative Study of Static Task Scheduling Algorithms for Heterogeneous Systems. International Journal on Computer Science and Engineering, 5.
Mishra, P. K., Mishra, A., Mishra, K. S., & Tripathi, A. K. (2012). Benchmarking the clustering algorithms for multiprocessor environments using dynamic priority of modules. Applied Mathematical Modelling, 36, 6243-6263.
Mohamed, M. R., & Awadalla, M. H. (2011). Hybrid Algorithm for Multiprocessor Task Scheduling. International Journal of Computer Science Issues, 8, 79-89.
Navimipour, N. J., & Milani, F. S. (2015). Task scheduling in the cloud computing based on the cuckoo search algorithm. International Journal of Modeling and Optimization, 5, 44.
Omara, F. A., & Arafa, M. M. (2010). Genetic algorithms for task scheduling problem. Journal of Parallel and Distributed Computing, 70, 13-22.
Pendharkar, P. C. (2011). A multi-agent memetic algorithm approach for distributed object allocation. Journal of Computational Science, 2, 353-364.
Pendharkar, P. C. (2015). An ant colony optimization heuristic for constrained task allocation problem. Journal of Computational Science, 7, 37-47.
Rahmani, A. M., & Vahedi, M. A. (2008). A novel Task Scheduling in Multiprocessor Systems with Genetic Algorithm by using Elitism stepping method. Science and Research Branch, Tehran, Iran.
Rajabioun, R. (2011). Cuckoo optimization algorithm. Applied Soft Computing, 11, 5508-5518.
Sathappan, O., Chitra, P., Venkatesh, P., & Prabhu, M. (2011). Modified genetic algorithm for multiobjective task scheduling on heterogeneous computing system. International Journal of Information Technology, Communications and Convergence, 1, 146-158.
Shahdi-Pashaki, S., Teymourian, E., Kayvanfar, V., Komaki, G. M., & Sajadi, A. (2015). Group technology-based model and cuckoo optimization algorithm for resource allocation in cloud computing. IFAC-PapersOnLine, 48, 1140-1145.
Singh, J., & Singh, G. (2012). Improved Task Scheduling on Parallel System using Genetic Algorithm. International Journal of Computer Applications, 39, 17-22.
Topcuoglu, H., Hariri, S., & Wu, M.-Y. (2002). Performance-effective and low-complexity task scheduling for heterogeneous computing. Parallel and Distributed Systems, IEEE Transactions on, 13, 260-274.
Ullman, J. D. (1975). NP-complete scheduling problems. Journal of Computer and System Sciences, 10, 384-393.
Wang, G., Wang, Y., Liu, H., & Guo, H. (2016). HSIP: A Novel Task Scheduling Algorithm for Heterogeneous Computing. Scientific Programming, 2016.
Wang, H., Wang, W., Sun, H., Cui, Z., Rahnamayan, S., & Zeng, S. (2016). A new cuckoo search algorithm with hybrid strategies for flow shop scheduling problems. Soft Computing, 1-11.
Wang, J., Duan, Q., Jiang, Y., & Zhu, X. (2010). A new algorithm for grid independent task schedule: genetic simulated annealing. In World Automation Congress (WAC), 2010 (pp. 165-171): IEEE.
Wu, M.-Y., Shu, W., & Gu, J. (1997). Local search for DAG scheduling and task assignment. In International Conference on Parallel Processing (pp. 174-174): IEEE Computer Society.



Xu, Y., Li, K., Hu, J., & Li, K. (2014). A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues. Information Sciences, 270, 255-287.
Yang, X.-S., Cui, Z., Xiao, R., Gandomi, A. H., & Karamanoglu, M. (2013). Swarm intelligence and bio-inspired computation: theory and applications: Newnes.
Yang, Y., Wu, G., Chen, J., & Dai, W. (2010). Multi-objective optimization based on ant colony optimization in grid over optical burst switching networks. Expert Systems with Applications, 37, 1769-1775.
Zhang, J. J., Hu, W. W., & Yang, M. N. (2014). A Heuristic Greedy Algorithm for Scheduling Out-Tree Task Graphs. TELKOMNIKA Indonesian Journal of Electrical Engineering, 12.
Zhang, L., Chen, Y., Sun, R., Jing, S., & Yang, B. (2008). A task scheduling algorithm based on PSO for grid computing. International Journal of Computational Intelligence Research, 4, 37-43.
Zhang, L., Li, K., Xu, Y., Mei, J., Zhang, F., & Li, K. (2015). Maximizing reliability with energy conservation for parallel task scheduling in a heterogeneous cluster. Information Sciences, 319, 113-131.
Zomaya, A. Y., Ward, C., & Macey, B. (1999). Genetic scheduling for parallel processor systems: comparative studies and performance issues. Parallel and Distributed Systems, IEEE Transactions on, 10, 795-812.