Future Generation Computer Systems 25 (2009) 819–828
A hybrid load balancing strategy of sequential tasks for grid computing environments

Yajun Li a,∗, Yuhang Yang a, Maode Ma b, Liang Zhou a

a Department of Electronics and Engineering, Shanghai Jiaotong University, 800 DongChuan Road, Minhang District, Shanghai 200240, PR China
b School of Electronics and Engineering, Nanyang Technological University, Singapore
Article info

Article history:
Received 10 November 2008
Received in revised form 12 February 2009
Accepted 16 February 2009
Available online 26 February 2009

Keywords: Load balancing; Scheduling; Grid computing; Genetic algorithm
Abstract

Load balancing is of paramount importance in grid computing. Generally, load balancing activities can be divided into two classes according to the type of information on which the corresponding decisions are made: averages-based and instantaneous measures-based. Each class has flaws that limit the performance improvement it can deliver when employed alone. It is therefore advantageous to combine the two into a hybrid scheme in order to make the most of the strong points of each. In this paper, we address the load balancing problem by presenting a hybrid approach to the load balancing of sequential tasks in grid computing environments. Our main objective is to arrive at task assignments that achieve minimum execution time, maximum node utilisation and a well-balanced load across all the nodes involved in a grid. A first-come-first-served algorithm and a carefully designed genetic algorithm are selected as representatives of the two classes and work together to accomplish this goal. The simulation results show that our algorithm achieves better load balancing performance than its 'pure' counterparts.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Having originated as a new computing infrastructure for scientific research and cooperation, grid computing has become a mainstream technology for large-scale resource sharing and distributed system integration over the past decades [1,2]. Basically, grid resources are geographically distributed computers or clusters, which are logically aggregated to serve as a unified computing resource. Due to uneven task arrival patterns and unequal computing capabilities, a computing node at one grid site may be overloaded while others at a different grid site are under-utilised. As a result, to take full advantage of such grid systems, task scheduling and resource management are essential functions provided at the service level of the grid software infrastructure, where the issues of task allocation and load balancing represent a common problem for most grid systems. A load balancing mechanism aims to spread the load equally over the computing nodes, maximising their utilisation and minimising the total task execution time [3]. In general, load-balancing algorithms can be roughly categorised as centralised or decentralised in terms of the location where the load balancing
∗ Corresponding author. E-mail address: [email protected] (Y. Li).
doi:10.1016/j.future.2009.02.001
decisions are made. A centralised load balancing approach can function based on either averages or instantaneous measures, according to the type of information on which the load balancing decisions are made [4,5]. The first type takes a holistic view of the tasks waiting to be processed and makes an optimal decision over the entire group of tasks systematically, while the second takes an instantaneous load balancing action for one task at a time to satisfy certain performance criteria [6,7]. Both of these load balancing schemes have merits and flaws. Averages-based schemes can achieve a thorough and holistic load balance when handling sequential tasks in computing grids. They assume, however, that all information, including the characteristics of the tasks, the computing nodes and the communication network, is known in advance. In addition, they may spend considerable processing time making appropriate decisions and thus lengthen the total task execution time, which degrades the overall system performance. In contrast, instantaneous schemes can yield an immediate and direct decision for a newly arrived task without a priori knowledge and hence waste hardly any processing time. However, they may perform erratically for sequential-task load balancing in computational grids due to the lack of overall knowledge of the tasks to be scheduled. Therefore, it may be advantageous to combine both schemes to work in tandem, forming a hybrid approach that makes the best of the strong points of each. In this paper, we address ourselves to a centralised load balancing scheme for sequential tasks in one kind of grid
computing environment, with middleware (e.g., the well-known Globus Toolkit [8]) installed inside every component therein. Unlike the traditional 'pure' strategies mentioned above, the hybrid load balancing algorithm presented in our work incorporates both the averages-based and the instantaneous scheme. A first-come-first-served algorithm and a carefully designed Genetic Algorithm (GA), selected as representative approaches of the two classes for our work, function together to fulfil our goal. Our main objective is to arrive at task assignments that achieve minimum execution time, maximum node utilisation and a well-balanced load across all the computing nodes involved. Extensive simulation studies were conducted to analyse the performance of our proposed load balancing algorithm. The proposed algorithm is compared with two other algorithms with respect to three performance metrics, and it is observed in simulation that our algorithm achieves better load balancing performance than its counterparts. To the best of our knowledge, this is the first hybrid proposal to address the load balancing problem in grid environments. Cao and Spooner first juxtaposed the two algorithms, similarly to us, in [6]; however, they considered them separately with respect to load balancing. Zomaya and Teh proposed a dynamic load balancing strategy based on a genetic algorithm in [3]; they suppose that there are always enough tasks waiting to be processed and that the characteristics of the tasks can be known in advance. Kalantari [9] and Chang [10] addressed load balancing by employing neural models and an ant algorithm respectively, both of which can be subsumed under the averages-based class. On the other hand, Yan and Wang presented a hybrid load balancing policy in [11]; they concerned themselves with how to select efficient nodes to carry out the computation work in a grid environment.
Similarly, Abawajy [12] presented a two-level adaptive space-sharing scheduling policy for non-dedicated heterogeneous clusters, which uses an adaptive method to allocate processors to every job. Iqbal and Carey also studied the performance of different load balancing algorithms when the computing nodes vary during the lifetime of a multistage parallel computation in [13]. In addition, Akhtar [14] put forward a GA-based analyser and load balancer to schedule tasks according to the current situation as well as historical resource data; a two-phase load balancing approach was advanced in an attempt to achieve dynamic load balance in a running system. These works are complementary to ours. Taken altogether, the main contributions and novelties of our paper are: (1) we employ the concept of hybrid load balancing to deal with the load balancing problem in grid computing for the first time; (2) a well-designed genetic algorithm is adopted to solve our NP-hard problem; (3) a new set of performance metrics is introduced to evaluate the system performance. The rest of the paper is organised as follows. We begin with an overview of the system model, including the grid and application models we use, in Section 2. Section 3 presents our proposed algorithm with a thorough explanation of how the two schemes work and cooperate to fulfil the ultimate objectives. We evaluate the performance of the proposed scheme in Section 4. Finally, Section 5 concludes the paper.

2. System model

2.1. Grid model

In this paper, a quite simple yet sufficiently realistic abstraction of grid computing environments is applied to carry out our work, as depicted in Fig. 1. In order to focus our attention on computing grids, we assume that the network connectivity involved is perfect, so that propagation latencies and transfer delays can be considered
to be negligible. We further assume that in the grid under study there is a central resource management unit (RMU), to which every Computing Node (CN) connects, and that the grid clients send their tasks to the RMU. Without loss of generality, we use only one client, but several kinds of workloads have been simulated. The RMU is responsible for scheduling tasks among CNs. In our work, we use calculating speed to represent the capability of a CN (hereafter we use speed and capability interchangeably throughout the paper). Each CN has a relative speed Ci, defined as the ratio of its calculating speed to that of a reference CN. That means that if we have two CNs with relative speeds C1 and C2, a job which takes l units of time to be processed on the first CN would take l · C1/C2 units of time on the second one. We do not restrict CNs to be locally homogeneous; that is, CNs in the grid under study can be either homogeneous or heterogeneous. Upon arrival, tasks must either be assigned to exactly one CN for immediate processing by the instantaneous scheduling or wait to be scheduled by the averages-based scheme. The agent therein monitors the waiting tasks and signals the RMU to switch between the two components of the hybrid algorithm when the predefined criteria are met. The agent also collects real-time system information (e.g. current task execution status or computing capabilities of all the CNs) for the RMU to process further. Once scheduled, tasks will be dispatched to the assigned CNs accordingly. When a task is completed, the executing CN will return the result to the RMU.

2.2. Application model

We adopt the application model introduced by Berten and Goossens [15] in our work. In particular, tasks arrive randomly following a Poisson process, with computation lengths following an exponential distribution.
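The arrival model just described can be sketched in a few lines; the arrival rate, mean length and dictionary layout below are illustrative assumptions, not values or structures taken from the paper.

```python
import random

def generate_tasks(n, arrival_rate=1.0, mean_length=10.0, seed=42):
    """Generate n tasks with Poisson arrivals (exponential inter-arrival
    times) and exponentially distributed computation lengths."""
    rng = random.Random(seed)
    t = 0.0
    tasks = []
    for j in range(n):
        t += rng.expovariate(arrival_rate)           # inter-arrival ~ Exp(rate)
        length = rng.expovariate(1.0 / mean_length)  # length ~ Exp(mean)
        tasks.append({"id": j, "arrival": t, "length": length})
    return tasks

tasks = generate_tasks(5)
```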
These tasks are assumed to be computationally intensive, so the communication overheads between the RMU and the CNs are considered negligible compared with the computation cost. In addition, we assume that tasks are mutually independent (i.e., there is no precedence constraint between tasks). We also assume that a task can be executed on any computing node, that each processor can execute only one task at a time, and that the execution of a task cannot be interrupted or migrated to another processor during execution; that is, tasks are not preemptable. The agent is responsible for probing the current states of all the CNs and then estimating accordingly the execution time of the task (under the instantaneous scheduling) or tasks (under the averages-based scheme).

3. Hybrid load balancing

Arriving tasks are placed in a task queue in the RMU, from which they are assigned to CNs as shown in Fig. 1. Initially, since there are not very many tasks in the queue, the instantaneous scheme of the hybrid scheduler, First-Come-First-Served (FCFS), functions to find the earliest completion time of each task individually, based on the information provided by the Agent component. As the system workload grows heavy, more and more tasks wait in the queue, which gives the scheduler a chance to perform load balancing over this group of tasks from a holistic perspective. The load balancing objective now is to minimise the total execution time of the waiting tasks as well as to achieve a well-balanced load across all nodes. This is a combinatorial optimisation problem, usually computationally intractable even under simplified assumptions [16]. In order to obtain near-optimal solutions, we resort to an iterative heuristic method, in this work a genetic algorithm (GA) [17,18]. To trigger the GA into operation, a sliding window technique similar to the one first introduced by Zomaya [3] is used. The
Fig. 1. System Model and its Components.
functionality of the sliding window in our algorithm is twofold. First, it is used by the Agent to signal the start point of the GA: as the number of tasks in the queue reaches the sliding window size, tasks enter the window and get scheduled by the GA. Second, as there may be too many tasks waiting to be assigned at a time, the execution of the GA over all of them would take too long to arrive at a decision and thus render the load balancing results useless. Therefore, a sliding window is employed so that only the tasks within the window are considered for scheduling each time. This helps the GA make rapid yet significant task assignments [19]. The window size is fixed and the number of tasks in a GA string is equal to the size of the window. The permutations of these tasks then form a series of lists that represent the different orders in which the tasks can be scheduled for execution. When the GA arrives at a task schedule, the tasks are assigned to the CNs accordingly. Once these tasks have been assigned, the sliding window is updated with new tasks by sliding along to the next set of tasks in the task queue and repeating the assignment process. It is worth noting that our algorithm runs on a dedicated scheduler (the RMU), so that the scheduling process and the real task execution can be carried out in parallel. Thus, the overhead that the GA incurs can dwindle to zero.
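The queue-length trigger described above can be sketched as follows; the window size of 20 and the `fcfs`/`ga` callables are hypothetical stand-ins for the paper's components, not its actual interfaces.

```python
from collections import deque

WINDOW_SIZE = 20  # assumed fixed sliding-window size (illustrative value)

def dispatch(queue, fcfs, ga):
    """Hybrid dispatcher sketch: make an immediate FCFS decision for one
    task while the queue is light, and switch to the sliding-window GA
    once enough tasks have accumulated."""
    if len(queue) >= WINDOW_SIZE:
        window = [queue.popleft() for _ in range(WINDOW_SIZE)]
        return ga(window)             # holistic schedule for the whole window
    elif queue:
        return fcfs(queue.popleft())  # instantaneous decision for one task
    return None
```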
3.1. First-come-first-served algorithm

Consider a grid system with m CNs where each CN Pi has its own capability Ci. We use P and C to denote such a system and its corresponding capacity respectively:

P = {Pi | i = 1, 2, . . . , m}    (1)

C = {Ci | i = 1, 2, . . . , m}.    (2)

A set T of tasks T1, T2, . . . , Tn is considered to be run on P. Each task of T has an arrival time t and a virtual computation duration td (the accurate time it takes depends on which computing node it is assigned to and can be estimated by the Agent). In addition, each task has two scheduled attributes: a start time and an end time, denoted by ts and te respectively. Hence, task Tj can be expressed as a tuple of four elements (tj, tdj, tsj, tej):

T = {Tj | j = 1, 2, . . . , n}    (3)

(t, td, ts, te) = {(tj, tdj, tsj, tej) | j = 1, 2, . . . , n}.    (4)

Upon arrival, tasks are allocated to a certain CN Pi ∈ P by the central resource management unit using the first-come-first-served algorithm. Given an appropriate CN Pi with capability Ci, the execution time texej of each task Tj can be calculated as follows:

texej = tdj / Ci,    ∀Pi ∈ P, Ci ∈ C.    (5)

A task may be allocated to any CN, so the FCFS should consider all these possibilities and identify which node will complete the task earliest. The function of the FCFS is therefore to find the earliest possible time for each task to complete, according to the sequence of the task arrivals:

tej = min_{∀Pi∈P} tej.    (6)

Whatever the selection of CNs, the completion time of a task is always equal to its earliest possible start time plus its execution time, which is described as follows:

tej = tsj + texej.    (7)

The earliest possible start time for task Tj on a CN is the latest free time of the chosen CN if there are still tasks running on it. If there is no task running on the chosen CN ahead of task Tj's arrival, Tj can be executed on this CN immediately. These are expressed as follows:

tsj = max( tj, max_{∀p: P(p)=P(j), p≺j} (tep) )    (8)

where task p represents any task executed prior to task j on the same CN, P(·) denotes the CN assigned to a task and tep is the end time of task p. In summary, we formulate our problem as follows:

min_{∀j} tej    (9)

Subject to:
tej = tsj + texej    for all j
tsj = max( tj, max_{∀p: P(p)=P(j), p≺j} (tep) )    for all j, p
texej = tdj / Ci    for all j.
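Under the formulation of Eqs. (5)–(9), the FCFS step admits a short sketch; the node and task dictionary layouts are our own illustrative assumptions, not the paper's data structures.

```python
def fcfs_assign(task, nodes):
    """Assign one task to the CN giving the earliest completion time.
    Each node dict holds its relative capability C and its latest free time."""
    best = None
    for node in nodes:
        texe = task["length"] / node["C"]           # Eq. (5): texe_j = td_j / C_i
        ts = max(task["arrival"], node["free_at"])  # Eq. (8): earliest start time
        te = ts + texe                              # Eq. (7): completion time
        if best is None or te < best[0]:            # Eq. (6)/(9): keep the minimum
            best = (te, ts, node)
    te, ts, node = best
    node["free_at"] = te                            # chosen node is busy until te
    return node, ts, te
```

For example, with two idle nodes of relative speeds 1.0 and 2.0, a task of length 10 arriving at time 0 goes to the faster node and completes at time 5.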
The final goal of the FCFS is to find the minimum tej in Eq. (9) for each task. Although the FCFS algorithm has been proved efficient under some conditions in [20], it is clear that as the number of CNs in a grid increases, the overhead incurred by the FCFS algorithm increases considerably, since the agent must probe every node to collect the required information. In addition, the fact that the sequence of task arrivals determines the order of task executions prevents the FCFS algorithm from attaining a more desirable performance. Reordering the tasks with the GA may further optimise the task execution and load balancing when the aforementioned criteria are met. This is addressed in the following section.

3.2. Genetic algorithm

As stated above, when the number of tasks in the queue reaches the sliding window's size, the GA component comes into effect. The GA here focuses on the overall performance of the tasks within the sliding window. Rather than looking for the earliest completion time of each task individually, the main objectives of the GA are to achieve the minimum makespan, which represents the latest completion time among all the tasks involved, maximum node utilisation and a well-balanced load across all the CNs. GAs have gained increasing application in load balancing for grid environments in recent years [21,22,6,3]. The usual form of genetic
algorithm was described by Goldberg [18]. GAs are stochastic search techniques based on the mechanisms of natural selection and natural genetics. Differing from conventional search techniques, GAs start with an initial set of random solutions called the population. Each individual in a population is called a chromosome (or string), representing a solution to the problem at hand. The chromosomes evolve through successive iterations, called generations. During each generation, the chromosomes are evaluated using some measure of fitness. To create the next generation, new chromosomes, called offspring, are formed by either (a) merging two chromosomes from the current generation using a crossover operator or (b) modifying a chromosome using a mutation operator. A new generation is formed by (a) selecting, according to fitness values, some of the parents and offspring and (b) rejecting others so as to keep the population size constant. After several generations, the algorithm converges to the best chromosome, which hopefully represents an optimal or near-optimal solution to the problem. Our GA is based on the standard GA. The technique requires a coding scheme that can represent all legal solutions to the optimisation problem; any possible solution is uniquely represented by a particular chromosome (or string). Chromosomes are manipulated in various ways by applying three genetic operators until the termination condition is met. In order for this manipulation to proceed in the correct direction, a well-designed fitness function is required. In this section, we present an in-depth discussion of our proposed algorithm by enumerating its major points.

3.2.1. Encoding mechanism

To a large extent, a well-constructed coding interpretation of the specific problem space is the source of efficiency and efficacy and is thus of great significance to genetic algorithms: the better the coding method, the more efficient and effective the genetic algorithm will be.
An important consideration in selecting the string representation for the search nodes is that every search node in the search space is represented and that the representation is unique. It is also desirable, though not necessary, that the strings be in one-to-one correspondence with the search nodes, that is, that each string correspond to a legal search node. Furthermore, under our aforementioned assumptions, every task must be present exactly once in an individual, and the coding scheme is responsible for satisfying this condition. As a result, instead of the traditional simple binary vector, a different string encoding is implemented as follows. A gene in our scheme consists of a decimal 2-tuple of two attributes ⟨Tj, Pi⟩, where Tj denotes task j, j = 1 . . . n, and Pi denotes CN i, i = 1 . . . m. Each 2-tuple gene indicates that task j is assigned to be processed on CN i. Each string is composed of such tuples and has a fixed length equal to the maximum number of tasks that may be considered for scheduling. Fig. 2 shows an example individual: its first tuple assigns task 1 to node 1, the next tuple assigns task 5 to node 4, and so on. This representation requires that the number of nodes and the number of tasks to be performed be known in advance. Note that Fig. 2(a) is used to explicitly illustrate our scheme throughout this paper, while Fig. 2(b) is the actual form used in the practical algorithm; despite the different forms, they are virtually equivalent. While the tuples on an individual determine which task is assigned to which node, the positions in which the tuples appear determine the order in which the tasks will be performed. Individuals are read from left to right to determine the ordering of tasks on each node. For example, task 1 will be executed first, followed by task 5, task 7, and so on.
It is worth pointing out that two tasks assigned to two different CNs may be executed in parallel. For example, tasks 1, 5 and 7 can be executed simultaneously on CNs 1, 4 and 2 respectively, since each is scheduled on a different CN and is the first task on its own allocated CN, as depicted in Fig. 2(a).
Fig. 2. Coding Scheme.
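Assuming the tuple encoding above, a random individual for the initial population might be constructed as follows; the helper name and list-of-tuples layout are our own illustrative choices, not the paper's implementation.

```python
import random

def random_chromosome(task_ids, num_nodes, rng=random):
    """A chromosome is an ordered list of (task, node) 2-tuples: the position
    of a tuple encodes execution order, and every task appears exactly once
    (the duplicate-free property the coding scheme must guarantee)."""
    order = list(task_ids)
    rng.shuffle(order)                 # random task permutation within the window
    return [(t, rng.randrange(1, num_nodes + 1)) for t in order]

chrom = random_chromosome(range(1, 8), 5)  # 7 tasks over 5 CNs
```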
3.2.2. Objective and fitness function The fitness function in genetic algorithms is typically the goal that we want to optimise in the problem. It is used to evaluate the search nodes and also controls the genetic operators. The main objective here is to arrive at task assignments that will achieve minimum completion time as well as a well-balanced load across all nodes. Therefore, a combined fitness function is used which incorporates such dual optimisation criteria. This fitness function will then be used to measure the performance of the strings.
• Makespan. The first objective function for our proposed algorithm is the latest completion time of the task schedule, generally referred to as the Makespan, ω. Note that the Makespan is the longest completion time among all the CNs in a grid. For a schedule k based on our proposed coding scheme, it is straightforward to calculate the Makespan ωk. Using the notation given in Section 3.1, we express this in the following formulation:

min_{∀k} ωk    (10)

Subject to:

tskj = max_{∀p: P(p)k=P(j)k, p≺j} (tekp)    for all j, p    (10a)

tekj = tskj + texekj    for all j    (10b)

ωk = max_{1≤j≤n} tekj    (10c)
where task p represents any task executed prior to task j on the same node, P(·)k denotes the CN assigned to a task under schedule k and tekp is the end time of task p. Note that since all of the tasks within the window are considered together, the order is defined according to schedule k in Eq. (10a) instead of by the task arrival time tj as in Eq. (8). The goal is to minimise ωk in Eq. (10), which means that the assigned tasks will be completed in the shortest time. Therefore, minimising the Makespan optimises this objective function.
• Maximum load balance. To achieve maximum load balance, we first introduce the average node utilisation, defined as the sum of all the nodes' utilisations divided by the total number of nodes. Thus, the expected utilisation of each node under the given task assignment must be calculated, which can be achieved by dividing the task completion time of each node by the Makespan value. We therefore formulate the calculation as follows:

Pi(utilisation) = Pi(completion time) / Makespan    (11)

P̄(utilisation) = ( Σ_{i=1}^{m} Pi(utilisation) ) / m.    (12)
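Eqs. (11) and (12) can be checked in a few lines; the helper below also computes the mean square deviation of utilisation that the paper adopts shortly afterwards as its load-balance metric. The function name and list-based input are our own illustrative assumptions.

```python
import math

def utilisation_stats(completion_times):
    """Per-node utilisation (Eq. (11)), average utilisation (Eq. (12)) and
    the mean square deviation of utilisation across the m nodes."""
    makespan = max(completion_times)
    util = [c / makespan for c in completion_times]          # Eq. (11)
    avg = sum(util) / len(util)                              # Eq. (12)
    sigma = math.sqrt(sum((u - avg) ** 2 for u in util) / len(util))
    return util, avg, sigma

# Schedule (a) of Fig. 3: node completion times 10, 20, 18, 19, 20 (Makespan 20)
util, avg, sigma = utilisation_stats([10, 20, 18, 19, 20])
```

Running this reproduces the schedule-(a) figures derived in the text: an average utilisation of 0.87 and a deviation of about 0.1887.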
Nevertheless, a high value of the average node utilisation does not always mean a desirable load balance across all the nodes. Sometimes even two schedules with the same average node utilisation may be entirely distinct in terms of load balance. To support our arguments, consider the schedules depicted in Fig. 3. Eqs. (11) and (12) will lead to:
P1a = 10/20 = 0.50    P2a = 20/20 = 1.00    P3a = 18/20 = 0.90
P4a = 19/20 = 0.95    P5a = 20/20 = 1.00
P̄a = Σ_{i=1}^{5} Pia / 5 = 0.87

P1b = 17/20 = 0.85    P2b = 20/20 = 1.00    P3b = 18/20 = 0.90
P4b = 16/20 = 0.80    P5b = 16/20 = 0.80
P̄b = Σ_{i=1}^{5} Pib / 5 = 0.87.
Note that we use superscripts a and b to denote the two schedules, respectively. Apparently, although both schedules have the same average node utilisation, they differ in load balance. Of the two schedules in Fig. 3(a) and (b), we prefer the latter, which gives a much better load balance. As a result, we should seek another performance measure to capture this goal. In this paper, we use the mean square deviation of Pi(utilisation) as the objective function, which is defined as:

σk = sqrt( Σ_{i=1}^{m} (Pik − P̄k)² / m )    (13)

where σk, Pik and P̄k denote the mean square deviation, the individual node utilisation and the average node utilisation of schedule k, respectively. Once again, the task schedules in Fig. 3 are used here to calculate the new metric σ:

σa = 0.1887,    σb = 0.0748.

Fig. 3. Maximum load balance: the mean square deviation.

Under this criterion, it is straightforward to see which of the two schedules achieves the more effective load balance. Thus, the smaller the mean square deviation, the more effective the load balance we can obtain.
• Current work loads. An important factor that should be taken into consideration in this scheduling is the current system load, since there are always tasks being executed on certain nodes when the GA begins its tour. Assigning more tasks to a currently overloaded node may incur a long wait and thus prolong the Makespan. Hence, it is essential to adjust the fitness function to allow for this load expense. In our plan, we treat the current work loads as dedicated tasks for their own nodes, so that the calculations of the Makespan and of the mean square deviation of the nodes' utilisation are carried out in full knowledge of these remaining tasks. Note that the current work loads are calculated from the time point at which the sliding-window GA begins. Fig. 4 demonstrates this process, where the quantities in bold italic are the existing loads.

Fig. 4. Makespan and Max. load balance: when considering the current system load.

• Combined fitness function. To achieve the multiple goals discussed above, we incorporate both objective functions into a single fitness function, expressed by the following equations:

ωvk = (ωmax − ωk) / (ωmax − ωmin)    (14)

σvk = (σmax − σk) / (σmax − σmin)    (15)

fvk = km · ωvk + kd · σvk    (16)

where ωmax and ωmin represent the largest and the smallest values of the Makespan ω in the scheduling set respectively, and σmax and σmin are analogous to ωmax and ωmin. The coefficients km and kd are positive real constants less than 1, subject to km + kd = 1.

3.2.3. Genetic operators
• Reproduction. The reproduction process forms a new population of strings by selecting strings from the old population based on their fitness values. The selection criterion is that strings with higher fitness values should have a higher chance of surviving into the next generation. In our work, instead of the well-known biased roulette wheel, we use another famous selection policy, stochastic remainder sampling without replacement, which has been demonstrated to be superior to the others. As the selection starts, the probabilities of selection are calculated as usual:

Pselectv = fv / Σv fv
where fv denotes the fitness value of a given individual v. The expected number of copies of each string, ev, is then calculated as ev = Pselectv × Popsize. The value of ev is divided into two parts: the integer part determines the number of copies of this individual that are passed to the succeeding generation, while the fractional part is treated as the probability that one further copy survives. For example, a string with an expected number of copies equal to 1.5 would surely receive a single copy and another copy with probability 0.5. This process continues until the population reaches its full size.
• Crossover. The crossover operator is of prime importance for genetic algorithms to exploit the current population. Considering the fact that our coding scheme uses two-dimensional strings, some care must be taken in the design of the crossover operator. First of all, although each string consists of a sequence of 2-tuples ⟨Tj, Pi⟩, the computing node Pi is, in our view, a variable dependent on task Tj and thus moves accordingly when task Tj shifts its locus during the crossover process. Under this observation, we have transformed the two-dimensional string vector into a simple one-dimensional sequence. Second, since a standard crossover operator may produce illegal strings (e.g. a duplicate task in a string, which is not permitted in our coding scheme), we need to seek other well-defined genetic operators. In this paper, an elaborately designed crossover operator is described below.
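Before turning to crossover, the reproduction step above can be sketched as follows. This is a common rendering of stochastic remainder sampling in which the fractional parts are handled by independent coin flips; the paper's exact without-replacement bookkeeping may differ, and all names here are our own.

```python
import random

def stochastic_remainder_selection(population, fitnesses, rng=random):
    """Each individual receives int(e_v) guaranteed copies, where
    e_v = (f_v / sum f) * popsize, plus one extra copy with probability
    frac(e_v), until the mating pool reaches the population size."""
    popsize = len(population)
    total = sum(fitnesses)
    expected = [f / total * popsize for f in fitnesses]  # e_v = Pselect_v * Popsize
    mating = []
    for ind, e in zip(population, expected):
        mating.extend([ind] * int(e))                    # integer part: sure copies
    for ind, e in zip(population, expected):
        if len(mating) >= popsize:
            break
        if rng.random() < e - int(e):                    # fractional part: probabilistic copy
            mating.append(ind)
    while len(mating) < popsize:                         # top up if rounding left gaps
        mating.append(rng.choice(population))
    return mating[:popsize]
```

With fitnesses 3 and 1 over a population of two, the expected copy counts are 1.5 and 0.5, so the fitter individual is always carried over at least once.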
(3) While termination condition is not met do (4) Apply reproduction rules to P to select the better individuals and therefore create Pmating , consisting of individuals of the same population size as P; (5) Crossover Pmating according to a crossover rate; (6) Mutate Pmating according to a mutation rate; (7) Copy the whole individuals of Pmating to P; (8) Evaluate P; (9) Endwhile (10) Report the best chromosome as the final solution.
Fig. 5. Crossover operator.
Random permutations of the tasks within the window and also random node assignments to tasks in that task sequence jointly construct the initial population. Each string is evaluated and is determined whether to survive or not by the reproduction rules. Then, the survived strings cross over and mutate according to a fixed crossover rate and mutation rate respectively to generate the succeeding generation. Finally, the termination condition is checked. Here, we simply set the number of generations as the termination condition. When the iteration times reach the generation size we prescribed, the GA stops and reports the best solution to the RMU. 4. Experimental results
Fig. 6. Mutation operator.
First a random crossover point is chosen. Then we compare the two right segments of both parents so as to seek out the different genes (Note that only the first element of the 2tuple gene is taken into consideration). In so doing, we get two disjoint gene sets, namely set A and set B (from the right segment of parent 1 and 2, respectively). We subsequently locate these genes in set A (B) along the right segment of parent 1 (2) and replace them with the genes in set B (A) using a random mapping. We then exchange the two altered right segment and finally create two children. An example of this crossover operator is given in Fig. 5. In this illustration, the different sets for the first and second different right segments are the set {T2 , T3 } and the set {T5 , T7 }, respectively. As depicted by a red arrow, the substitutions take place at randomly mapped pairs, i.e., {T7 → T2 } and {T5 → T3 } in parent 1 and similarly {T2 → T5 } and {T3 → T7 } in parent 2. Then these two right segments are swapped to complete the crossover. • Mutation. Instead of the standard mutation operator, we use a specific mutation operator to deal with the particular problems that arise in our work. Unlike the crossover operator, which considers the task Tj a dominant element of the 2-tuple hTj , Pi i, our mutation operator will ‘mutate’ both elements to fulfill its function. Our mutation method is a two-part process: First two randomly tuples are selected to exchange their position in the string and then a random change is applied to each of the corresponding processors inside both tuples. The Fig. 6 depicts the above-mentioned procedure, where tuple hT5 , P2 i and hT4 , P3 i are randomly selected and swapped their positions with the random changes {P2 → P4 } and {P3 → P1 } at the same time for both the tuples, respectively.
3.2.4. Genetic algorithm framework

The framework of the GA designed for our problem, considering the points discussed above, is shown below.

Genetic algorithm framework.
(1) Generate the initial population P, consisting of a set of individuals represented by chromosomes;
(2) Evaluate P;
(3) Apply the reproduction rules to select the surviving strings;
(4) Apply crossover and mutation to the survivors at the fixed crossover and mutation rates to form the next generation;
(5) If the prescribed number of generations has not been reached, go to step (2); otherwise stop and report the best solution to the RMU.
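The evaluate-reproduce-crossover-mutate loop described above can be sketched in a few lines of Python. Fitness-proportional (roulette-wheel) reproduction is an assumption here, since the text refers only to "reproduction rules"; the operator arguments are placeholders for the problem-specific operators of Section 3.2.3.

```python
import random

def genetic_algorithm(init_population, fitness, crossover, mutate,
                      generations=40, p_cross=0.9, p_mut=0.14):
    """Sketch of the GA framework: evaluate, reproduce, cross over
    and mutate for a fixed number of generations (the termination
    condition), then report the best individual found."""
    population = list(init_population)
    for _ in range(generations):
        # reproduction: fitness-proportional survival (assumed);
        # fitness values must be positive for the weighted draw
        weights = [fitness(c) for c in population]
        population = random.choices(population, weights=weights,
                                    k=len(population))
        # crossover of paired survivors at the fixed crossover rate
        next_gen = []
        for a, b in zip(population[::2], population[1::2]):
            if random.random() < p_cross:
                a, b = crossover(a, b)
            next_gen += [a, b]
        # mutation at the fixed mutation rate
        population = [mutate(c) if random.random() < p_mut else c
                      for c in next_gen]
    return max(population, key=fitness)  # best solution for the RMU
```

With an even population size the loop preserves the population size across generations, matching the fixed population-size parameter in Table 1.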
4. Experimental results

In this section, we present the experiments carried out to evaluate the performance of the proposed algorithm. We use OMNeT++ 3.3 as the underlying simulation platform and have developed a simple yet pragmatic model to simulate a real grid computing system. Our proposed algorithm is compared with the following algorithms in the simulation.
• First-come-first-served (FCFS) algorithm [6].
• Dynamic Genetic load balancing Algorithm (hereafter referred to as DGA). This is a modified algorithm based on the main idea introduced by Zomaya [3]. A sliding-window technique is employed, and current system loads are taken into consideration when load balancing decisions are made. A genetic algorithm similar to the GA part of our proposed algorithm was developed in accordance with the characteristics of the problem. Tasks enter the sliding window and are scheduled by the GA to achieve a shorter Makespan as well as a higher overall computing node utilisation.

In fact, these two algorithms are the two extremes of our hybrid algorithm: one focuses only on instantaneous load balancing, while the other concerns itself with a relatively holistic load balance within the current sliding window. All three algorithms were involved in our comparative studies in most cases; in some cases, however, only two of them were used for meaningful analysis.

4.1. Simulation model and related parameters

4.1.1. Model design

Our simulation model comprises four components based on the simple module concept of OMNeT++, namely the task generator, the grid middleware agent, the task dispatcher and the computing nodes. As depicted in Fig. 1, tasks created by the task generator first enter the task queue and are then scheduled by the task dispatcher before finally arriving at the appropriate CN. The agent is responsible for collecting the relevant information the task dispatcher needs to make better load balancing decisions. The task dispatcher is equipped with three algorithms: DGA, FCFS and our proposed HGA (Hybrid Genetic Algorithm). During each experiment, diverse requests from task lists are generated for load
Table 1
Experimental parameters

Parameter                                           Value
Number of computing nodes                           5
Capacity of nodes Ci                                1
Number of tasks                                     250
Mean inter-arrival time                             1 s
Mean computation length                             5 s
Window size (DGA, HGA)                              10
Population size (DGA, HGA)                          20
Generation size (DGA, HGA)                          40
Crossover rate (DGA, HGA)                           0.9
Mutation rate (DGA, HGA)                            0.14
Weighted coefficient of Makespan km (DGA, HGA)      0.8
Weighted coefficient of deviation kd (DGA, HGA)     0.2
balancing. For each task list run, we apply all three abovementioned strategies to the simulated grid and observe the load balancing performance of each in turn, leading to a comprehensive comparative study. Several assumptions are made for the simulation. These are:
Fig. 7. Makespan: under the homogeneous scenario.
• Tasks arrive and enter the RMU according to a Poisson process with rate λ.
• The expected computation lengths of tasks are assumed to follow an exponential distribution with mean χ.
• Ci denotes the average capability of node i, i.e., the relative ratio of its capability compared to a reference node (see Section 2).
• ρ is the average system utilisation factor of the simulation, i.e., the average task arrival rate divided by the average task processing rate [23]. Using this definition, we adjust the mean task computation length χ to obtain the desired ρ.

4.1.2. Experimental parameters

Unless otherwise specified, experimental runs were based on the set of default values listed in Table 1 throughout the whole simulation process. Note that the applicability of some parameters is marked in parentheses, since they do not apply to all three algorithms. The performance of the algorithms is measured by three metrics: Makespan, average node utilisation and the square deviation of the nodes' utilisation, which implicitly indicates how much load balancing is achieved. We observed how well the algorithms performed in terms of these metrics under different experimental parameters. In particular, we varied the average system utilisation factor as well as the number of tasks and collected the results from experimental runs to study the performance of the three algorithms under different conditions. Although the selection of nodes and the processing order differ among the three approaches, the seed produced by the generator module in each experimental run was kept the same, so that the experimental conditions for the algorithms examined are identical. For each simulation run, five sets of randomised seeds are applied and the final results are averaged.

4.2. Results: Performance under homogeneous nodes

In this section, we first assume that all the nodes have the same processing capacity. Under this assumption, we present a series of simulation results for the algorithms described above under different system utilisation factors ρ. In particular, by varying the mean computation length χ from 4 s to 7.5 s, the performance under average system utilisation factors from 0.8 to 1.5, which constitutes an observation cycle, can be observed. Default values were used for all the other parameters. The results are shown as follows.
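Under the definition of ρ above (average arrival rate divided by average processing rate), the quoted χ range can be reproduced with a few lines of Python; `mean_length_for_rho` is an illustrative helper, not part of the simulator.

```python
def mean_length_for_rho(rho, capacities, arrival_rate=1.0):
    """Mean task computation length chi that yields a target average
    system utilisation rho = arrival_rate * chi / sum(capacities)."""
    return rho * sum(capacities) / arrival_rate

# five homogeneous nodes of capacity 1 (Table 1) at lambda = 1 task/s
nodes = [1] * 5
print(mean_length_for_rho(0.8, nodes))  # 4.0
print(mean_length_for_rho(1.5, nodes))  # 7.5
```

This matches the 4 s to 7.5 s sweep used for the 0.8 to 1.5 observation cycle.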
Fig. 8. Average Node Utilisation: under the homogeneous scenario.
Figs. 7 and 8 show the Makespan and the average node utilisation, respectively, for the three algorithms under different system utilisation factors. As expected, our proposed algorithm outperforms the others in all cases. Comparing the results of the HGA and the FCFS algorithm, one can observe that the gap between the two curves widens as the average system utilisation factor increases; the two curves are almost identical, however, when the system utilisation is below 1.0. The reason is that the FCFS dominates the load balancing process when the system workload is light. As the system utilisation factor increases, the workload becomes heavier and the GA has more opportunities to make load balancing decisions, thereby bringing better performance. This shows that the HGA works well, especially in higher-workload scenarios. On the other hand, although the DGA employs the same GA, it requires the sliding window to be full before the GA starts and therefore wastes some processing time. As mentioned above, the square deviation of the nodes' utilisation indicates how well the load is balanced across all the nodes in a grid system: the lower this metric, the better the load balancing. As depicted in Fig. 9, the results show once again that our proposed algorithm performs better than the FCFS. It is worth pointing out that at ρ = 1.5, the HGA gains a considerable performance improvement and also tends toward a perfect load balance.
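As a rough illustration, the two utilisation-based metrics discussed here can be computed as below. The exact normalisation used in the paper is not spelled out, so a plain population variance of the node utilisations is assumed.

```python
def load_balance_metrics(utilisations):
    """Return the average node utilisation and the square deviation
    of the utilisations; a lower deviation means the load is spread
    more evenly across the nodes."""
    n = len(utilisations)
    mean = sum(utilisations) / n
    sq_dev = sum((u - mean) ** 2 for u in utilisations) / n
    return mean, sq_dev

# a perfectly balanced grid has zero square deviation
print(load_balance_metrics([1.0, 1.0, 1.0, 1.0, 1.0]))  # (1.0, 0.0)
# an uneven load raises the deviation while the mean stays moderate
print(load_balance_metrics([0.5, 1.0, 0.75, 0.75]))
```
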
Fig. 10. Average Node Utilisation: varied task numbers.
Fig. 9. Square Deviation: under the homogeneous scenario.
Table 2
Makespan (s) of the three algorithms for varied numbers of tasks

Task numbers    DGA      FCFS     HGA
100             100.9     94.1     90.6
150             137.6    132.8    129.2
200             196.6    184.9    181.0
250             230.1    230.4    222.5
300             275.7    269.1    265.9
350             326.7    315.4    312.0
400             361.9    356.4    352.4
450             422.1    413.1    407.3
500             466.0    459.2    456.4
We then focused our attention on the performance of the three algorithms as the number of tasks changed. The number of tasks was varied from 100 to 500 in steps of 50, and the mean computation length χ was varied to keep the average system utilisation factor ρ at 1.4. The other parameters took their default values. Table 2 shows that the Makespan of all three algorithms grows as the number of tasks increases. This is expected, since more tasks need to be processed. Note also that our proposed algorithm outperforms the FCFS and DGA algorithms in all cases. Figs. 10 and 11 illustrate the average node utilisation and the square deviation of the nodes' utilisation, respectively, under varied numbers of tasks. As the number of tasks increases, the average node utilisations of both the FCFS and the HGA rise and approach 100% in Fig. 10; however, the HGA always achieves a higher value than the FCFS, which indicates better performance. Similarly, in Fig. 11 the HGA obtains a more balanced distribution of load over the whole system as the number of tasks increases. Careful observation of the curves in both figures shows that our proposed algorithm is much steadier and smoother than the FCFS, which means it can accommodate more contingencies and thus performs more robustly.
Fig. 11. Mean Square Deviation: varied task numbers.
Fig. 12. Makespan: under the heterogeneous scenario.
4.3. Results: Performance under heterogeneous nodes

In the experiments of this section, we observe the performance of our algorithm in a heterogeneous grid system. We randomly selected two nodes and fixed their relative capabilities at double the value of the others. We varied the mean computation length χ from 5.6 s to 10.5 s and thus obtained the performance expressed by the three metrics, as depicted in Figs. 12, 13 and 14 respectively, under utilisation factors from 0.8 to 1.5.
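As a quick sanity check, the same utilisation relation as in Section 4.2 reproduces the quoted χ range once the two doubled nodes are accounted for in the aggregate capacity.

```python
# two nodes doubled: aggregate capacity is 2 + 2 + 1 + 1 + 1 = 7
capacities = [2, 2, 1, 1, 1]
total = sum(capacities)

# chi = rho * total / lambda (lambda = 1 task/s) gives the
# 5.6 s to 10.5 s sweep for rho from 0.8 to 1.5
for rho in (0.8, 1.5):
    print(rho, rho * total)
```
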
merits. The FCFS can help make instantaneous decisions and thus reduce the system response time, thereby resulting in a shorter Makespan. The GA concentrates on an overall performance over a list of tasks and aims at a more desirable load balance across all the nodes in a computational grid. A sliding-window technique is presented to trigger the switch between the two policies and provide a mechanism to help make rapid task assignments as well. It is observed that our algorithm can give better performance than its counterparts over a wide range of system parameters.
References

[1] I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, first ed., Morgan Kaufmann, San Francisco, 1999.
[2] I. Foster, C. Kesselman, S. Tuecke, The anatomy of the grid: Enabling scalable virtual organizations, The International Journal of High Performance Computing Applications 15 (3) (2001) 200–222.

Fig. 13. Average Node Utilisation: under the heterogeneous scenario.
[3] A.Y. Zomaya, Y.-H. Teh, Observations on using genetic algorithms for dynamic load-balancing, IEEE Transactions on Parallel and Distributed Systems 12 (9) (2001) 899–912.
[4] L. He, S. Jarvis, D. Spooner, H. Jiang, D. Dillenberger, G. Nudd, Allocating non-real-time and soft real-time jobs in multiclusters, IEEE Transactions on Parallel and Distributed Systems 17 (2) (2006) 99–112.
[5] X. Tang, S. Chanson, Optimizing static job scheduling in a network of heterogeneous computers, in: Proceedings of International Conference on Parallel Processing, Toronto, Canada, 2000.
[6] J. Cao, D.P. Spooner, S.A. Jarvis, G.R. Nudd, Grid load balancing using intelligent agents, Future Generation Computer Systems 21 (2005) 135–149.
[7] J. Cao, D. Spooner, S. Jarvis, S. Saini, G. Nudd, Agent-based grid load balancing using performance-driven task scheduling, in: Proceedings of the IPDPS'03, Nice, France, 2003.
[8] Globus, http://www.globus.org, 2009.
[9] M. Kalantari, M.K. Akbari, A parallel solution for scheduling of real time applications on grid environments, Future Generation Computer Systems, in press (doi:10.1016/j.future.2008.01.003).
[10] R.-S. Chang, J.-S. Chang, P.-S. Lin, An ant algorithm for balanced job scheduling in grids, Future Generation Computer Systems 25 (1) (2009) 20–27.
[11] K. Yan, S. Wang, C. Chang, J. Lin, A hybrid load balancing policy underlying grid computing environment, Computer Standards and Interfaces 29 (2007) 161–173.
Fig. 14. Mean Square Deviation: under the heterogeneous scenario.
Once again, our proposed algorithm achieves the best performance in terms of all three metrics compared with its counterparts. Furthermore, since the GA itself is a robust process and can therefore accommodate more uneven computing environments, the results in this section are rather encouraging: greater performance improvements can be observed in all three figures compared with the homogeneous scenario.

5. Conclusion

Load balancing is a crucial issue for the efficient distribution of sequential tasks in grid computing environments. Numerous works in the literature address this problem and present solutions that can be categorised into two classes based on the type of information on which the load balancing decisions are made, namely static and dynamic approaches. Instead of devoting ourselves to an in-depth study of either static or dynamic approaches individually, we propose a novel load balancing strategy that combines the two. In particular, we combine a first-come-first-served algorithm with a specially designed GA to form a hybrid, so as to take full advantage of their respective
[12] J.H. Abawajy, An efficient adaptive scheduling policy for high-performance computing, Future Generation Computer Systems 25 (3) (2009) 364–370.
[13] S. Iqbal, G.F. Carey, Performance analysis of dynamic load balancing algorithms with variable number of processors, Journal of Parallel and Distributed Computing 65 (2005) 934–948.
[14] Z. Akhtar, Genetic load and time prediction technique for dynamic load balancing in grid computing, Information Technology Journal 6 (7) (2007) 978–986.
[15] V. Berten, J. Goossens, E. Jeannot, On the distribution of sequential jobs in random brokering for heterogeneous computational grids, IEEE Transactions on Parallel and Distributed Systems 17 (2) (2006) 113–124.
[16] H. El-Rewini, H. Ali, T. Lewis, Task scheduling in multiprocessing systems, Computer 28 (12) (1995) 27–37.
[17] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.
[18] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Mass, 1989.
[19] S. Salleh, A. Zomaya, Scheduling in Parallel Computing Systems: Fuzzy and Annealing Techniques, Kluwer Academic, 1999.
[20] M. Harchol-Balter, M. Crovella, C. Murta, Task assignment in a distributed server, in: 10th International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, in: Lecture Notes in Computer Science, 1998.
[21] A. Abraham, R. Buyya, B. Nath, Nature's heuristics for scheduling jobs on computational grids, in: Proc. 8th IEEE International Conference on Advanced Computing and Communications, Cochin, India, 2000.
[22] R. Subrata, A.Y. Zomaya, B. Landfeldt, Artificial life techniques for load balancing in computational grids, Journal of Computer and System Sciences 73 (8) (2007) 1176–1190.
[23] L. Kleinrock, Queueing Systems, in: Theory, vol. 1, John Wiley and Sons, New York, 1976.
Yajun Li received the B.S. degree in Automation Control Engineering from Central South University in 1994 and the M.S. degree in System Engineering from Xi'an Jiaotong University in 1998. He worked as a telecommunication network engineer at China Telecom for 7 years. He is now working toward his Ph.D. degree at Shanghai Jiaotong University, Shanghai, China. His current research interests include grid computing, computer communications, and artificial intelligence algorithm design.
Yuhang Yang graduated from the Electronic Engineering Department of the Chengdu Institute of Meteorology in 1982. From 1984 to 1987, he studied telecommunications and computer networking and received an MSEE from Aston University, Great Britain. He is now a professor in the Department of Electronic Engineering, Shanghai Jiao Tong University. Mr. Yang received top honors in the Technology Improvement Award from the Electronics Ministry of the Chinese government in 1995 and was honored as a most outstanding person 'Cross the Century' by the National Education Committee in 1997. His current research interests lie mainly in the fields of broadband wireless, grid networking, information security and online video distribution.
Maode Ma received his BE degree from Tsinghua University in 1982, ME degree from Tianjin University in 1991 and Ph.D. degree in computer science from Hong Kong University of Science and Technology in 1999. Dr. Ma joined the School of Electrical and Electronic Engineering of Nanyang Technological University as an assistant professor in November 2000. Dr. Ma has published around 60 academic papers in the areas of optical networks, wireless networks, etc. His current research interests are wireless networks, optical networks, grid computing, bioinformatics, etc. He currently serves as an associate editor for IEEE Communications Letters and an editor for IEEE Communications Surveys and Tutorials.
Liang Zhou received the B.S. and M.S. degrees in signal and information processing from Nanjing University of Posts and Telecommunications, Nanjing, China, in 2003 and 2006, respectively, and is now pursuing his Ph.D. degree at Shanghai Jiao Tong University, Shanghai, China. His research interests include image processing, wireless multimedia communication, wireless mesh networks and artificial algorithm design.