Future Generation Computer Systems 26 (2010) 857–867
An adaptive multisite mapping for computationally intensive grid applications

Ivanoe De Falco, Umberto Scafuri, Ernesto Tarantino ∗

Institute of High Performance Computing and Networking, National Research Council of Italy, Via P. Castellino, 111, 80131 Naples, Italy
Article info

Article history: Received 20 July 2009; received in revised form 1 February 2010; accepted 10 February 2010; available online 17 February 2010.

Keywords: Processor architectures: other architecture styles - heterogeneous (hybrid) systems; Artificial intelligence: problem solving, control methods, and search - heuristic methods.
Abstract

The unemployed computational resources usually available on the multi-owner, time-shared computing nodes of a multisite grid can be fruitfully exploited to execute the parallel tasks of computationally challenging applications. An efficient use of such resources requires an optimal task/node mapping which, already known to be NP-complete on classical parallel computers, becomes much harder on grid systems, where additional degrees of complexity are introduced. Since classical mapping algorithms are inadequate in such an environment, heuristic techniques must be adopted to find near-optimal solutions. In this paper a software tool, based on a multiobjective differential evolution algorithm, is tested on some artificial mapping problems differing in applications and grid working conditions. The aim is to fulfill several optimization criteria, such as minimizing the time of use of the grid resources and the application execution time while, at the same time, complying with Quality of Service requirements. The findings show the ability of the proposed evolutionary approach to cope with such a multisite grid mapping, i.e. a deployment not constrained to select nodes from one single site. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Grids [1,2] are decentralized collections of networked heterogeneous processing elements which, belonging to several organizations, are generally subject to multiple and divergent management policies. Usually the grid is organized in geographically distributed sites. Any site can contain different computing systems, each consisting of one or more processing nodes whose power is seldom fully exploited. When a large number of these nodes can act in concert, as in a grid, the aggregated use of their locally unemployed resources makes available a computational power which is inconceivable in any system which can be physically assembled. This considerable resource could be profitably used to execute computationally intensive parallel applications. These applications are generally broken down into communicating tasks, and their execution on a grid proceeds in three successive phases: resource discovery, task-to-node mapping and task scheduling [3]. The resource discovery phase has to determine the amount, type and status of the available resources, considering the dynamic availability and the relevant workload of the grid resources. The mapping phase must select, in accordance with possible user requests, the nodes which best match the application needs with the available grid resources. Finally, the last phase establishes the schedule timing of the tasks on the nodes.
∗ Corresponding author. Tel.: +39 081 6139525; fax: +39 081 6139531. E-mail address: [email protected] (E. Tarantino).
doi:10.1016/j.future.2010.02.009
In our hypothesis the first and the third phase are implicitly dealt with. In fact, supposing that both the characteristics and the average loads of the nodes and the task requirements are known, the resource discovery phase is avoided. As concerns the third phase, it can be avoided by ensuring either advance reservation or co-allocation for the different application tasks [4–7]. In line with application requests, resource conditions and knowledge of the different local scheduling policies, we select from one or more sites only the nodes which, according to their estimated workloads, have at their disposal the computing power needed to execute the tasks assigned to them. This last assumption implies that all the tasks of the same application will be co-allocated, i.e. the tasks will be simultaneously loaded into the queues of the runnable processes of the nodes they are assigned to. Obviously this co-allocation is necessary because, in the absence of information about the communication timing, the execution of communicating tasks proceeds only if their simultaneous allocation is guaranteed [8]. This paper is focused only on the mapping phase, assuming that the grid middleware layer supplies suitable services for co-allocation [9]. Differently from the canonical approach [10,11], which takes the grid user's perspective and aims at minimizing only the completion time of the application, here the problem is faced from the grid manager's point of view. Hence, our goal is to supply a software tool able to find the solution which minimizes the application execution time and the communication delays and, at the same time, best exploits the overall grid resources. The problem becomes even more complex when, as often happens, one desires to determine mapping solutions which,
besides guaranteeing low execution times and an efficient use of resources, must also satisfy user-dependent requirements [12], such as Quality of Service (QoS) [13]. In such a circumstance, the resources in a single site could be insufficient to meet all these needs. Thus, an efficient multisite mapping tool, able to choose from among resources spread over multiple sites and to obtain high throughput while matching the user's QoS requests, must be designed. The classical mapping algorithms, already known to be NP-complete on traditional parallel and distributed systems, usually run on homogeneous and dedicated resources and cannot work adequately in these situations [14]. In fact, the mapping problem introduces further degrees of complexity in a grid, where the resources have features which vary, even substantially, over time as the local loads change, owing to the high network dynamicity [15], and where the application consists of communicating tasks. Due to the practical relevance of grid task allocation, several approaches have been investigated; unfortunately they deal with independent task workloads [16–18], loosely dependent task mapping [19,20], homogeneous tasks on heterogeneous nodes [21], single site mapping [22,23] or multisite mapping which does not consider the heterogeneity of sites from different domains [4,24–26]. Furthermore, evolutionary algorithms have been explored [17,22,27] to find a near-optimal solution in a reasonable time. However, these approaches are limited to the mono-objective case and to small size instances. Singh et al. [28] presented a multiobjective genetic algorithm (GA) formulation for provisioning resources for an application using a slot-based resource model to optimize cost and performance but, due to the time-consuming nature of GAs, this approach is not the most suitable for online mapping.
Here a multiobjective version of differential evolution (DE) [29,30], based on the Pareto method [31], is proposed to automatically provide a set of possible mapping solutions, each with a different balance between resource use and QoS constraints, here investigated only in terms of the degree of reliability of the grid nodes and of the internet links of the sites they belong to. This approach, successfully presented in a previous paper [32], is enriched with the task scheduling model and with a deeper evaluation of its efficiency in handling a wider grid structure, more significant applications and different grid operating conditions. Unlike all the existing evolutionary approaches, which simply search for one site onto which to map the whole application, we deal with a multisite approach spreading the application tasks among nodes which can belong to different heterogeneous sites. Hence, as a further distinctive issue compared to other methods [33], we consider the nodes as the lowest computational units, taking their actual loads and their reliabilities into account. The paper structure is as follows: Section 2 describes the employed technique; Section 3 presents the working environment, while Section 4 illustrates our multiobjective evolutionary mapper; Section 5 reports on the test problems experienced and shows the results achieved; finally, Section 6 contains conclusions.

2. The technique

2.1. Differential evolution for single-objective problems

The stochastic DE algorithm, proposed by Storn and Price [29], exhibits remarkable performance in optimizing a wide variety of multidimensional and multimodal objective functions in terms of final accuracy and robustness, and outperforms many of the existing stochastic and direct search global optimization techniques [34,35]. Given a minimization problem with q real parameters, DE faces it by starting with a randomly initialized population consisting of
M individuals, each made up of q real values. Then the population is updated from one generation to the next by means of some transformations. Many different transformation schemes have been defined. The authors [29] tried to come up with a sensible naming convention and decided to name any DE strategy with a string DE/x/y/z, where DE stands for differential evolution, x denotes the vector to be perturbed (best = the best individual in the current population, rand = a randomly chosen one, rand-to-best = a random one, but the current best participates in the perturbation too), y is the number of difference vectors used for the perturbation of x (either 1 or 2), and z is the crossover method (exp = exponential, bin = binomial). For instance, if a random individual is perturbed by using one difference vector and by applying binomial crossover, the strategy is referenced as DE/rand/1/bin. This means that for the given target individual x_i in the current population, three vectors x_r1, x_r2 and x_r3 are randomly selected, such that the indices i, r1, r2 and r3 in [1, M] are distinct. A new individual x'_i, called the donor vector, is generated whose generic j-th component is obtained by adding the weighted difference of two of the vectors to the third one: x'_{i,j} = x_{r3,j} + F_m · (x_{r1,j} − x_{r2,j}). The mutation factor F_m is a real and constant factor which controls the magnitude of the differential variation (x_{r1,j} − x_{r2,j}) and is a parameter of the algorithm. In recombination a trial vector is built from elements of the target vector x_i and of the donor vector x'_i: components of the donor vector enter the trial vector with a probability CR (a parameter of the algorithm). The resulting trial individual x*_i is compared with the i-th individual in the current population and, if fitter, replaces it in the next population; otherwise the old one survives and is copied into the new population.
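As an illustration, a minimal Python sketch of the DE/rand/1/bin transformation described above (the function name, parameter names and the clamping of donor components to the search bounds are our own assumptions, not part of the original formulation):

```python
import random

def de_rand_1_bin(pop, i, F_m, CR, lower, upper):
    """One DE/rand/1/bin step: build a trial vector for target individual i.

    pop is a list of real-valued vectors; F_m is the mutation factor and
    CR the crossover probability, both parameters of the algorithm.
    """
    M, q = len(pop), len(pop[i])
    # pick three distinct random indices, all different from i
    r1, r2, r3 = random.sample([k for k in range(M) if k != i], 3)
    s = random.randrange(q)  # guarantees at least one donor component
    trial = []
    for j in range(q):
        if random.random() < CR or j == s:
            # donor component: perturb x_r3 with the weighted difference
            v = pop[r3][j] + F_m * (pop[r1][j] - pop[r2][j])
            trial.append(min(max(v, lower), upper))  # clamp to bounds
        else:
            trial.append(pop[i][j])  # keep the target component
    return trial
```

In the single-objective scheme, the trial vector returned here would replace `pop[i]` only if its fitness is better.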
This basic scheme is repeated for a maximum number of generations g or until some stopping criterion is met.

2.2. Differential evolution for multiobjective problems

While the goal of single-criterion optimization is to find the global optimal solution, in multi-criterion optimization more than one objective function exists. In such cases, each single objective function could have an optimal solution which is in conflict with those corresponding to the other single objective functions. In such a case a multiobjective DE algorithm based on the Pareto-front approach can be designed and implemented. It is very similar to the DE scheme described in the previous paragraph, apart from the way the new trial individual i' is compared to the current individual i. In this case i' is chosen if and only if it is not worse than i in terms of both the fitness functions and is better than i for at least one of them. This relies on the notion of dominance: for a problem with multiple objectives to optimize, each represented by a fitness function Φ_i, a solution s* is said to dominate in the Pareto sense (P-dominate) another solution s if and only if Φ_i^{s*} ⪯ Φ_i^s ∀i and ∃j : Φ_j^{s*} ≺ Φ_j^s, where ⪯ and ≺ mean 'not worse than' and 'better than' respectively. A solution s* is said to be Pareto-optimal if there is no other solution s which dominates s* in the current population. The Pareto-optimal set and the Pareto-optimal front are the sets of Pareto-optimal solutions in the design variable and the objective function domains, respectively. By doing so, at each generation a set of ''optimal'' solutions emerges, in that none of them can be considered better than any other in the same set with respect to all the single objective functions.
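The P-dominance test and the extraction of the non-dominated set can be sketched as follows (a generic illustration rather than the paper's implementation; the names and the `senses` parameter, which tells whether each objective is minimized or maximized, are our assumptions):

```python
def dominates(phi_a, phi_b, senses):
    """True if solution a P-dominates solution b.

    phi_a, phi_b: tuples of objective values; senses: 'min' or 'max'
    per objective. a dominates b iff a is not worse than b on every
    objective and strictly better on at least one.
    """
    better = lambda x, y, s: x < y if s == 'min' else x > y
    not_worse = all(not better(b, a, s) for a, b, s in zip(phi_a, phi_b, senses))
    strictly = any(better(a, b, s) for a, b, s in zip(phi_a, phi_b, senses))
    return not_worse and strictly

def pareto_front(solutions, senses):
    """Keep only the solutions not dominated by any other one."""
    return [s for s in solutions
            if not any(dominates(t, s, senses) for t in solutions if t is not s)]
```

With two objectives, `senses = ('min', 'max')` matches the mapping problem of Section 4, where Φ_1 is minimized and Φ_2 maximized.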
As the number of generations increases the current Pareto front will shift, and hopefully will approach the Pareto-optimal front. At the end of
Fig. 1. The multisite task scheduling model.
DE execution the final Pareto front will be supplied to the user, who will choose, from among the solutions contained therein, the one which best suits his needs. If, for example, the first fitness function should be minimized while the second should be maximized, the pseudocode of our multiobjective DE for mapping is reported below:

Multiobjective DE procedure
begin
  randomly initialise a population X = (x_1, ..., x_M);
  evaluate fitness Φ_1 and Φ_2 for any individual x_i;
  while (maximal number of generations g is not reached) do
    for i = 1 to M do
      choose a random real number p_sm ∈ [0.0, 1.0];
      if (p_sm < p_m)
        apply site mutation;
      else
        choose three integers r1, r2 and r3 in [1, M], with r1 ≠ r2 ≠ r3 ≠ i;
        choose an integer number s in [1, q];
        for j = 1 to q do
          choose a random real number ρ ∈ [0.0, 1.0];
          if ((ρ < CR) OR (j = s))
            x'_{i,j} = x_{r3,j} + F · (x_{r1,j} − x_{r2,j});
          else
            x'_{i,j} = x_{i,j};
          endif
        endfor
        if (((Φ_1(x'_i) < Φ_1(x_i)) AND (Φ_2(x'_i) ≥ Φ_2(x_i))) OR
            ((Φ_1(x'_i) ≤ Φ_1(x_i)) AND (Φ_2(x'_i) > Φ_2(x_i))))
          insert x'_i in the new population;
        else
          insert x_i in the new population;
        endif
      endif
    endfor
  endwhile
end
3. The task scheduling model

It is common knowledge that there exist various grid typologies, differing in terms of the computing systems included, the purposes and the administrative policies adopted.
In this work we refer to a computational grid consisting of numerous computing nodes, distributed in separate sites, which contribute with their locally unexploited resources to the execution of parallel communicating tasks of computationally challenging applications. The grid mapper is connected to external information services which retrieve data about (current and future) availability and the performance of node resources. Each node is supposed to operate in a time-shared modality and has two different queues, one for the locally submitted processes and another for the remote tasks presented via grid to be executed during idle times (see Fig. 1). While the tasks in the local queue will be scheduled on the basis of the locally established policy, a first-come-first-served (FCFS) strategy with priority must be adopted for those in the grid queue. This means that the strategy assigns a priority to each non-local task to be scheduled and then sorts the list of tasks in decreasing priority order. In particular, the same priority is assigned to the tasks of the same application. Instead, for the processes of different applications the submission time is considered and the tasks of an application submitted at a given instant are assigned a priority greater than that of the tasks of the applications presented later. The scheduling of the remote queue does not assume a fair CPU time distribution, but rather operates as follows: the time (or available or unemployed local power) destined for the execution of the processes in the remote queue will be fully dedicated to the highest priority tasks. In other words the processor will activate a ready task with a lower priority if, and only if, tasks in the runnable state with higher priorities do not exist. 
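The FCFS-with-priority discipline of the grid queue described above can be sketched as follows (an illustrative model only; the class and method names are our assumptions, and the priority is taken to be the application submission time, so that earlier applications rank higher):

```python
import heapq
import itertools

class GridQueue:
    """Sketch of the FCFS-with-priority discipline for the remote (grid) queue.

    All the tasks of one application share a single priority, derived from
    the application's submission time; the idle CPU share always goes to
    the highest-priority task that is in the runnable state.
    """
    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # FCFS tie-break within one priority

    def submit_application(self, submit_time, tasks):
        # the same priority is assigned to all tasks of the same application
        for task in tasks:
            heapq.heappush(self._heap, (submit_time, next(self._tie), task))

    def next_ready(self, is_ready):
        """Return the highest-priority task whose communications are ready."""
        pending, chosen = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if is_ready(entry[2]):
                chosen = entry[2]
                break
            pending.append(entry)  # waiting on a not-ready communication
        for entry in pending:      # re-queue the skipped tasks
            heapq.heappush(self._heap, entry)
        return chosen
```

A lower-priority task is activated only when every higher-priority task is skipped by the `is_ready` predicate, mirroring the rule that the processor serves a lower-priority ready task only when no higher-priority runnable task exists.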
Obviously, assuming round-robin scheduling for the tasks of the same grid application, the execution of a process will be interrupted only if, once its time slice has elapsed, another task belonging to the same application exists in the queue, or because the current task is moved to a waiting state owing to a not-ready communication, or else because a suspended process of a higher-priority application has become ready again (the scheduler must be pre-emptive). With respect to the previous assumptions, our grid is organized in terms of clients, i.e. parallel application submitters, and resource providers (owned by institutions or individuals) which donate their computing resources during idle time. We hypothesize a unique mapping server which, given a submitted application and having knowledge of the local node workloads (hence, the average idle times are known), performs the co-allocation of the tasks on the grid resources it presumes to be the most suitable for that execution. In other words, the user submits his/her application, subdivided into tasks, to the centralized mapper/scheduler on
the whole grid; the latter then establishes the mapping and co-allocates the tasks on the selected processing elements. The implication is that the generic task of a grid application shares the computing resources of the assigned node with local and remote processes on the basis of the priorities. If the time instant in which the task becomes 'running' and the amount of computation it has to perform are known, our mapper server calculates the execution time operating as if all the remaining node resources were exclusively dedicated to that task. Clearly, the idle time of a processing node consequent to a not-ready communication could be conveniently exploited to carry out the execution of other tasks allocated there. When a task of a different application is assigned to a node, if the tasks previously assigned to that node are not pending and the load remains constant, the time needed to execute this new task is obtained as the sum of its execution time, calculated as if the first tasks had never been allocated there, plus the time required for the running of the first tasks.

4. Our multiobjective evolutionary mapper

4.1. Mapping problem definition

Starting from the hypothesis that grid computing environments are inherently dynamic and unpredictable environments sharing services amongst many different applications, some authors propose an approach to grid scheduling which abstracts over the details of individual applications [36]. Unfortunately this method does not allow an efficient resource allocation. An alternative method to approach the mapping problem is to retrieve information on the number and on the status of both the accessible and the demanded resources. We assume that we have an application subdivided into P tasks (demanded resources) to be mapped on n nodes (accessible resources) with n ∈ [1, N], where P is fixed, N is the number of grid nodes and each node is identified by an integer value in the range [1, N].
Such an application may require access to several kinds of resources for its execution, mainly computing, data and network facilities, and information about their availability and status has to be retrieved from appropriate entities. We need to know a priori the number of instructions α_i computed per time unit on each node i and the communication bandwidth β_ij between any pair of nodes i and j. Note that β_ij is the generic element of an N × N symmetric matrix β with very high values on the main diagonal, i.e. β_ii is the bandwidth between two tasks on the same node. However, while it is easy to get this information in a static environment, application run-time resources, such as CPU load, available memory and network capacity, are variable because a grid is a dynamic resource-sharing environment. At the time of making decisions about mapping, information on the run-time available resources is critical. The predictive analysis is based on historical data on previous resource availability and application performance. Various statistical techniques can be applied, ranging from stochastic process analysis, used to predict future resource measurements at a fixed point [37] or during a certain interval of time [38], to regression techniques used for performance prediction in the presence of a performance model [39]. Information on the status of the available resources is very important for a grid scheduler to make a proper schedule, especially when the heterogeneous and dynamic nature of the grid is taken into account. The role of the Grid Information Service (GIS) is to provide such information to grid schedulers. For example, in the Globus Toolkit [13], which is a standard grid middleware, most schedulers fetch predicted resource parameters from the GIS, which is responsible for collecting and predicting resource state information, such as CPU capacities, memory sizes,
network bandwidth and load of a site in a particular period. The Globus Monitoring and Discovery System (MDS) [40] and the Network Weather Service (NWS) [41] are examples of GIS. Thus it is clear that different software components exist to retrieve runtime resource information. Here, this information is supposed to be acquired either through statistical estimations over a particular time span or gathered by periodically tracking and dynamically forecasting the resource conditions [40,42]. In general grids address non-dedicated resources, since these have their own local workloads. This affects the local performance, and we must consider these load conditions to evaluate the expected computation time. Several prediction models exist to face the challenge of non-dedicated resources [41,43]. For example, concerning the computational power, we suppose that we know the average computational load ℓc_i(Δt) of the node i over any given time span Δt, with ℓc_i(Δt) ∈ [0.0, 1.0], where 0.0 means a node fully discharged and 1.0 a node locally loaded at 100%. Hence (1 − ℓc_i(Δt)) · α_i represents the power fraction of the node i available for the execution of grid tasks. Analogously, we assume that we have information about the average bandwidth load ℓβ_ij(Δt) ∈ [0.0, 1.0] relative to the communication channel between the nodes i and j where, as before, 0.0 implies a bandwidth completely unused and 1.0 a communication channel totally busy due to the communications requested by the other tasks on these nodes. Similarly, (1 − ℓβ_ij(Δt)) · β_ij denotes the bandwidth available for the communication of the grid tasks. Besides raw resource information from NWS or via MDS, application task properties are also necessary to make a feasible deployment. We assume to know, for each task k, the number of instructions γ_k to be executed and the amount of communications ψ_km between the k-th and the m-th task, ∀ m ≠ k.
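In code, the residual resources defined above reduce to two one-line formulas (a trivial sketch; the dependence on Δt is left implicit by passing the average loads over that span directly):

```python
def available_power(alpha_i, load_c_i):
    """Power fraction of node i left for grid tasks: (1 - lc_i(dt)) * alpha_i."""
    return (1.0 - load_c_i) * alpha_i

def available_bandwidth(beta_ij, load_b_ij):
    """Bandwidth left on channel i-j: (1 - lb_ij(dt)) * beta_ij."""
    return (1.0 - load_b_ij) * beta_ij
```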
Obviously, ψ_km is the generic element of a P × P symmetric matrix ψ with all null elements on the main diagonal. A comprehensive set of performance modeling strategies to predict this information for parallel applications, on both dedicated and non-dedicated environments, can be found in [44]. Finally, information must be provided about the degree of reliability of any component of the grid. This is expressed in terms of the fraction of actual operativity π_z for the processor z and λ_w for the link connecting to the internet the site w to which z belongs. These values can be gathered by means of a historical and statistical analysis and range in [0.0, 1.0], where a higher value means a better reliability. It is worth noting that, although this information retrieval is time consuming, such a time becomes negligible compared to the time saved when an optimal or good suboptimal mapping is performed for the execution of intensive workloads.

4.2. DE for mapping

On the basis of the assumptions previously made about the application and the available resources, it is possible to define the encoding and the fitness functions for our multiobjective mapping tool based on a DE technique.

4.2.1. Encoding

Any mapping solution can be represented by a vector µ of P integers ranging in the interval [1, N]. To obtain µ, the real values provided by DE in the interval [1, N + 1[ are truncated before evaluation. The truncated value ⌊µ_i⌋ denotes the node onto which the task i is mapped by the proposed solution. Given the nature of the problem, some considerations have to be made. As long as the mapping is considered by characterizing the tasks by means of their computational needs γ_k only, this is a classical NP-complete optimization problem, in which the allocation of one task does not affect that of the other ones, unless,
of course, one attempts to load more tasks onto the same node. Instead, as soon as the communications ψ_km are also taken into account, the mapping problem becomes far more complicated. In this event, in fact, the deployment of a task on a given node in a given site can cause the optimal mapping to require that other tasks also be allocated on the same node or in the same site, so as to decrease their communication times and thus their execution times, taking advantage of the higher communication bandwidths existing within any site compared to those between sites. In fact, although co-allocation can decrease the response time and the utilization, the makespan of some communication-intensive multisite applications increases sharply because of the low bandwidth between sites. Such a problem is a typical example of epistasis, i.e. a situation in which the value taken on by a variable influences those of other variables. This situation is also deceptive, since one solution can be transformed into another with better fitness only by passing through intermediate solutions worse than both the current and the best ones. Let us consider, as an example, the circumstance in which two or more tasks are allocated onto nodes belonging to a site with a given communication bandwidth, while a suitable number of nodes at least equally fast, yet with a higher bandwidth, are available in another site. It is extremely improbable to migrate all those tasks at once from the slow to the fast site by using the canonical DE mechanisms. In fact, what very likely happens is that a newly generated solution proposes keeping some of those tasks in the former site and moving some others to the latter. This allocation leads to a worse fitness value, so such a solution will be discarded, while it should be saved since it might help to reach the optimal solution through further applications of the evolutionary operators.
To overcome this problem we have introduced into the classical DE scheme a new operator of site migration, applied with a probability p_m any time a new individual must be produced. When this migration is carried out on the current solution, a gene is randomly chosen and the node value contained in it, related to a site S_i, is equiprobabilistically modified into another one related to another site, say S_j. Then, any other task assigned to S_i in the current solution is allowed to randomly migrate to another node belonging to S_j by inserting into the related gene a random value within the bounds for S_j. If site migration does not take place, the classical transformations typical of DE are applied. In such a way the mapping algorithm tries to adapt to the network performance and to determine whether the application should be mapped to a single site or to multiple sites.

4.2.2. Fitness

Grid users and resource providers can have different demands to satisfy. For example, users could be interested in the total cost of running their application, while providers could pay more attention to the throughput of their resources in a particular time interval. Thus the objective functions can meet different goals. To evaluate fitness we make use of the information on the number and status of both the available and the requested resources contained within the data structures defined in Section 4.1. Furthermore, we have two fitness functions, one accounting for the time of use of the resources and the other for their reliability.

Use of resources. Denoting with τ^comp_ij and τ^comm_ij respectively the computation and the communication times requested to execute the task i on the node j it is assigned to, the total time needed to execute i on j is:
τ_ij = τ^comp_ij + τ^comm_ij.

This is evaluated on the basis of the computation power and of the bandwidth which remain available given the local and grid workloads. Let τ^s_j be the summation of the times τ_ij of all the tasks assigned to the j-th node by the current mapping.
This value is the time spent by node j in executing computations and communications of all the tasks assigned to it by the proposed solution. Clearly, τ^s_j is equal to zero for all the nodes not included in the vector µ. The fitness function is
Φ_1(µ) = max_{j∈[1,N]} {τ^s_j}
and the goal of the evolutionary algorithm is to search for the smallest fitness value among these maxima, optimizing, in terms of time, the use of the grid resources. The allocation of P tasks on N nodes determines as many τ^s_j as the number of nodes picked in the current mapping. The execution time τ_exec of the application is comprised between Φ_1(µ), if all the co-scheduled tasks run in perfect overlapping conditions, and, in the worst conditions, Γ = Σ_{j∈[1,N]} τ^s_j if the tasks are totally sequential. It is evident that, in the absence of information related to the communication timing, it is impossible to carry out an effective evaluation of the degree of parallelism of the challenging application, and even to establish how much each task contributes to Γ. Naturally, several solutions having the same value Φ_1(µ) but different values for Γ can exist. The most powerful nodes in terms of computation and bandwidth must be used to diminish τ^s_j, and hence also Γ, but this does not necessarily imply a decrease in τ_exec. Conscious of this eventuality, and relying on a good degree of parallelism able to minimize the actual contribution of each τ^s_j to τ_exec, we find the mapping which minimizes Φ_1(µ) without being interested in the minimization of the execution times of each single concurring component. Thus, differently from other approaches which simply use the most powerful nodes available, merely pursuing the user's desires, here the grid manager's perspective, aimed at the best exploitation of the grid resources, is also taken into account. This is achieved by searching for the best possible balance, in terms of time, in the use of the grid nodes involved in the solution. Such a view leads to the discovery of mappings which do not use for each task the most powerful nodes if, due to overlapping, their employment does not reduce the execution time of the whole application.
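The decoding of Section 4.2.1 and the fitness Φ_1 can be sketched as follows, under the simplifying assumption (ours, for illustration) that a per-task estimator `tau(i, node)` of the total time τ_ij, combining computation and communication, is available:

```python
import math

def decode(mu_real):
    """Truncate the real DE values in [1, N + 1) to integer node indices."""
    return [math.floor(v) for v in mu_real]

def phi1(mu, tau):
    """Phi_1(mu) = max over the used nodes j of tau_j^s, where tau_j^s sums
    the computation + communication times of the tasks mapped on node j.

    tau(i, node) is an assumed estimator of tau_ij for task i; nodes not
    appearing in mu implicitly contribute tau_j^s = 0.
    """
    tau_s = {}
    for i, node in enumerate(mu):
        tau_s[node] = tau_s.get(node, 0.0) + tau(i, node)
    return max(tau_s.values())
```

Minimizing this maximum balances the per-node busy times rather than greedily assigning every task to the fastest node, which is exactly the grid manager's perspective described above.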
This optimization procedure avoids keeping the most powerful nodes busy without any benefit, since they could be more profitably exploited for further applications, and thus permits the best possible exploitation of the grid resources.

Reliability. In this case the fitness function is given by the reliability of the proposed solution. It is evaluated as
Φ_2(µ) = ∏_{i=1}^{P} π_{⌊µ_i⌋} · λ_w
where ⌊µ_i⌋ is the node onto which the i-th task is mapped and w is the site this node belongs to. It should be pointed out that the first fitness function should be minimized, while the second should be maximized. Such a two-objective problem can be effectively faced by exploiting the multiobjective DE approach, based on the concept of the so-called Pareto optimal set, reported in Section 2.2.

5. Experimentation and results

To the best of our knowledge, a generally accepted set of benchmarks relative to the mapping of parallel applications on heterogeneous computing systems does not exist. A comparison with other methods is not reported due to the lack of approaches which deal with this problem under the same operating conditions as ours. In fact some of these methods, based on the Min–min [33], Max–min [33] and XSufferage [16] algorithms, are related to independent tasks and their performance degrades in heterogeneous environments. In the case of dependent tasks, the classical methods apply the model of the directed acyclic graph (DAG) [45], unlike our approach in which
Fig. 2. The grid architecture.
Fig. 3. The application structure.
no assumptions are made about the communications between the processes, since we have hypothesized the co-scheduling of tasks. Moreover, although some papers have addressed multisite application scheduling with co-allocation [26], differently from us, all the sites of the grid framework are assumed there to be homogeneous. Since for these reasons we were prevented from making comparative evaluations, in our experimental environment the grid arrangement was conceived so as to simplify the checking of the mapping solutions proposed. In particular, by carefully choosing the computing capabilities and the communication bandwidths of the grid nodes, together with their load conditions and reliabilities, it is possible in some test cases, by planning simulations with suitable applications, to promptly detect the optimal solutions. In these situations, the goodness of the solutions achieved can be rapidly verified by comparison with the expected optimal ones. Our simulations refer to a grid which aggregates 184 nodes subdivided into six sites denoted by A, B, C, D, E and F with 32, 24, 48, 20, 24 and 36 nodes respectively (Fig. 2). The sites B, C, E and F are each composed of two sub-sites. Without any loss of generality, we suppose that each sub-site is made up of a cluster of nodes with the same nominal power α expressed in terms of millions of instructions per second (MIPS). To give an example, the site F is composed of the sub-site F1, i.e., a cluster made up of 16 nodes with α = 200, and of the sub-site F2, consisting of 20 nodes with α = 100. For the sake of simplicity we have hypothesized that the power of the clusters decreases from 3000 to 100 MIPS when going from A to F2. Hereinafter we shall denote the nodes by means of the external numbers shown in the figure, so that, for instance, 60 is the fourth node in cluster C1. For each node four communication typologies are assumed.
The first is the bandwidth βii available when tasks are mapped on the same node (intranode communication); the others are the bandwidths βij between the nodes i and j belonging to the same cluster (intracluster communication), to different clusters of the same site (intrasite communication) or to different sites (intersite communication). The intranode bandwidths βii, usually higher than βij, have all been fixed to 10 Gbit/s. For each link, the input and output bandwidths are supposed to be equal. In our case the intersite, the intrasite and the intracluster bandwidths are reported in Table 1. It is worth noting that the intracluster bandwidth increases from 100 to 10,000 Mbit/s when going from A to F2, and that the bandwidth between clusters belonging to the same site is supposed to be lower than that within a cluster. For example, each node of E1 communicates with a node of E2 with a bandwidth of 75 Mbit/s. Moreover, we suppose that we know the average load of the available grid resources for the time span of interest as well as the reliability of the nodes and of the internet links. For the sake of simplicity, in the experiments performed this average local load is assumed to be constant during the whole execution time of the task placed on the node. Obviously a variable load would merely require a different calculation; it would not invalidate the approach proposed.
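These four typologies amount to a simple hierarchical lookup. A sketch under the assumption that each node carries a cluster label and a site label (the dictionaries below are illustrative stand-ins populated with a few values from Table 1):

```python
# Hypothetical bandwidth lookup for the four communication typologies.
INTRANODE = 10_000                                     # Mbit/s, all nodes
INTRACLUSTER = {"A": 100, "E1": 1000, "F2": 10_000}    # diagonal of Table 1
INTRASITE = {("E1", "E2"): 75, ("F1", "F2"): 100}      # clusters of one site
INTERSITE = {("A", "F"): 32}                           # sample intersite value

def beta(i, j, cluster_of, site_of):
    """Bandwidth between nodes i and j given their cluster/site labels."""
    if i == j:
        return INTRANODE                    # intranode communication
    ci, cj = cluster_of[i], cluster_of[j]
    if ci == cj:
        return INTRACLUSTER[ci]             # intracluster communication
    si, sj = site_of[i], site_of[j]
    if si == sj:
        return INTRASITE[tuple(sorted((ci, cj)))]   # intrasite
    return INTERSITE[tuple(sorted((si, sj)))]       # intersite
```

For instance, under these sample labels a node of E1 talking to a node of E2 falls in the intrasite case and obtains 75 Mbit/s, as in the example above.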
To identify the most appropriate strategy among the different DE schemes and to tune the DE parameters, a preliminary comparison was carried out on mapping test problems with different sets of parameter values. In addition, these trials were also used to evaluate the effectiveness of the introduced site mutation operator and its most suitable use probability. At the end of this phase, we chose the DE/best/2/exp scheme and the following values: M = 100, g = 20,000, pm = 0.2, CR = 0.7 and Fm = 0.5. Different simulations were carried out on two applications composed of P = 24 and P = 30 tasks respectively. For each test problem, 20 DE executions were performed, so as to reduce the influence of the randomness of the algorithm. Each of these executions lasted about 1 min per application on a MacBook Pro equipped with a 2.4 GHz Intel Core 2 Duo processor. At the end of the experiments the most meaningful findings were extracted and are presented here. The average results confirmed the robustness of our approach. The outcomes achieved are shown only for some of the numerous simulations carried out. Henceforth we denote by µΦ1 and µΦ2 the best solutions found in terms of lowest maximal resource utilization time and of highest reliability, respectively. 5.1. Structure 1 The first set of experiments was carried out on an application made up of P = 24 tasks numbered from 1 to 24 and structured as depicted in Fig. 3. Each of the first four tasks, organized in a totally connected topology, had γk = 100 Mega Instructions (MI) and exchanged 310 Gbit with each of the other three. The remaining 20 tasks, according to the computations and communications to perform, were subdivided into three groups G1, G2 and G3, each with a different cardinality. The application was conceived so that the first group resulted compute-bound, the third I/O-bound and the second a balance between them.
In particular G1 consisted of three pairs of tasks (5, 15), (6, 16) and (7, 17): each of these six tasks had to perform 400 Giga Instructions (GI) and exchanged 100 Mbit with the respective partner and 100 Mbit with its root task 2. The group G2 was made up of four pairs of tasks (8, 18), (9, 19), (10, 20) and (11, 21): each of these eight tasks had γk = 100 GI to execute, and exchanged 10 Gbit with its corresponding partner and 100 Mbit with its root task 3. The last group was composed of three pairs of tasks (12, 22), (13, 23) and (14, 24): each of these six tasks had to carry out 100 MI, and exchanged 30 Gbit with its related partner and 100 Mbit with its root task 4. Simulation 1. During the first simulation it was supposed that all the nodes were completely discharged (ℓci(Δt) = 0.0 ∀i) and all the communication channels were unused (ℓβij(Δt) = 0.0 ∀(i, j)). Finally, λw = 1.0 ∀w and πz = 1 for all the nodes. In this case the problem was actually a single-objective one and, of course, the Pareto front degenerated into one single point. The allocation found was µΦ1 = µΦ2, which placed the first four tasks on nodes 178, 180, 183 and 183 of F2, the six tasks of G1 on nodes 7, 8, 16, 24, 25 and 26 of A, and the remaining tasks on nodes 62-69, 84-86 and 94-96 of C, with Φ1 = 137.4583 s and Φ2 = 1.0.
Table 1
Intersite, intrasite and intracluster bandwidths expressed in Mbit/s.

      A     B1    B2    C1    C2    D     E1    E2    F1    F2
A     100   2     2     4     4     8     16    16    32    32
B1          200   75    4     4     8     16    16    32    32
B2                200   4     4     8     16    16    32    32
C1                      400   45    8     16    16    32    32
C2                            400   8     16    16    32    32
D                                   800   16    16    32    32
E1                                        1000  75    32    32
E2                                              1000  32    32
F1                                                    2000  100
F2                                                          10,000
The solution here achieved was the optimal one. In fact, by taking into account the amount of computations and the supposed number of communications, the tasks decisive in determining the resource utilization time were the computationally heaviest ones belonging to G1 and the first four tasks. The former had been correctly placed on nodes of A. The latter, characterized by the highest number of communications, had been appropriately allocated to nodes of F2, due to their highest intracluster bandwidth. Moreover, the solution deploys two of the first four tasks on the same node (tasks 3 and 4 were both placed on node 183) to exploit the high intranode communication bandwidth. Naturally, all four of these tasks could have been assigned to the same node of F2, but such a choice would not have contributed to reducing the total execution time of the whole application. Hence, if there is a task whose execution slows down the performance of the whole application, as in our case a task of G1, it is senseless to further speed up the execution of the other co-allocated tasks belonging to the same application. This is in line with the restriction imposed by the fitness function to yield a better employment of the grid resources, leaving free, when possible, the most powerful resources, which could be more profitably used for other applications. Remembering that Φ1 is the maximal resource utilization time, which also represents the maximal execution time among all the co-allocated tasks on the basis of the mapping supplied, all the considerations explaining the value of Φ1 are made with reference to the task/node allocation which implies the highest execution time.
Considering that in this case the maximal execution time relates to the tasks belonging to G1, the execution time Φ1 = 137.4583 s is obtained by adding the computation time needed to execute 400 GI on the most powerful nodes of A (400 GI/3000 MIPS = 133.3333 s), the communication time needed to exchange 100 Mbit with the partner task still on A (100 Mbit/(100 Mbit/s) = 1.0 s) and the communication time needed to exchange 100 Mbit with the root task deployed on the node 178 of F2 (100 Mbit/(32 Mbit/s) = 3.125 s). It is simple to ascertain that any different allocation of the tasks of G1 on nodes not belonging to the site A would entail a resource utilization time much greater than that proposed. For instance, if the nodes of B1 were used, the execution time due to the computation alone would be equal to 200 s. Simulation 2. This simulation was carried out leaving unchanged all the operating conditions apart from the node and the bandwidth loads. In particular all the nodes were discharged except those in the interval [1, 16], which had ℓci(Δt) = 0.7, while all the node channels were unused apart from those in [175, 184], which presented ℓβij(Δt) = 0.9. Obviously the two mappings still coincided:
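The three contributions quoted above can be checked with a few lines of arithmetic (all values are taken from the text):

```python
# Worked check of Phi_1 for Simulation 1 of Structure 1.
compute = 400_000 / 3000   # 400 GI = 400,000 MI on a 3000 MIPS node of A
partner = 100 / 100        # 100 Mbit at 100 Mbit/s, partner task also on A
root    = 100 / 32         # 100 Mbit at 32 Mbit/s, root task on F2
print(round(compute + partner + root, 4))   # 137.4583
```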
µΦ1 = µΦ2, which placed the first four tasks on nodes 165-168 of F2, the six tasks of G1 on nodes 19-21 and 29-31 of A, and the remaining tasks on nodes 59-72 of C1, with Φ1 = 137.4583 s and Φ2 = 1.0.
The value of Φ1 for the proposed solution was the same as in the previous simulation. In such a case the tool provided a mapping which excluded the loaded nodes and, from among the discharged ones, as occurred in the former simulation, selected those with the most suitable characteristics in terms of both computation and bandwidth. In fact the first four tasks were deployed on four unused nodes in F2, while the six tasks of G1, involving the heaviest computations, were mapped on six of the discharged nodes of A. This experiment was useful to demonstrate the general ability of the evolutionary tool to adapt to the state of the grid resources. Simulation 3. This simulation was performed leaving unchanged the working conditions on node and bandwidth loads. Also the reliability of all the internet links remained the same, while πz = 0.995 for all the nodes in the interval [1, 60], πz = 0.993 for those in [61, 120] and finally πz = 0.990 for the nodes in the domain [121, 184]. The allocations at the extremes of the achieved Pareto front were as follows:
µΦ1, which used nodes 171-174 of F2, nodes 17, 18, 21, 26, 28 and 31 of A and fourteen nodes of C1,
with Φ1 = 137.4583 s and Φ2 = 0.8517, and
µΦ2, which placed the first four tasks on nodes 57-60 of C1 and the remaining tasks on nodes of A in the range [15, 32],
with Φ1 = 2525.1 s and Φ2 = 0.8867. The first solution was characterized by the optimal resource utilization time with a reasonably good reliability value. As regards the second objective, the placement on the nodes with the highest reliabilities, i.e., 0.995, achieved the maximal reliability value of 0.8867. It should be emphasized that, from among the nodes with the maximal reliabilities, those of C1 were selected for the deployment of the first four tasks. This choice, for the same maximal reliability, guaranteed the best possible value in terms of resource utilization time. Furthermore, the system proposed some other non-dominated solutions better balanced in terms of the two goals, such as
µΦ, which placed the first four tasks on nodes 167, 169, 169 and 174 of F2 and the remaining tasks on nodes of A and B,
with Φ1 = 253.125 s and Φ2 = 0.8689. In this solution, to exploit their high intracluster bandwidth, the nodes of F2 were selected for the mapping of the first four tasks, which involved intensive communications. Although the best value of reliability was not achieved, this choice contributed to a reduction of the resource utilization time while ensuring, at the same time, a very good reliability value.
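The maximal reliability value of 0.8867 quoted above can be verified directly: the 24 tasks all sit on nodes with πz = 0.995, and the internet links have λw = 1.0, so Φ2 is simply 0.995 raised to the number of tasks:

```python
# Check of the maximal reliability in Simulation 3 of Structure 1.
phi2 = 0.995 ** 24      # 24 tasks, each on a node with pi_z = 0.995
print(round(phi2, 4))   # 0.8867
```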
Table 2
Statistical findings for the experiments.

           Simulation 1        Simulation 2        Simulation 3
           Ob_1      Ob_2      Ob_1      Ob_2      Ob_1      Ob_2
Best       137.4583  1.0       137.4583  1.0       137.4583  0.8867
nb         11        20        8         20        5         20
Avg.       163.2151  1.0       169.5384  1.0       170.9046  0.8867
Std. dev.  29.5124   0.0       32.0641   0.0       24.5624   0.0

Fig. 4. The application skeleton.
Table 3
Statistical results without site mutation operator.

           Simulation 1        Simulation 2        Simulation 3
           Ob_1      Ob_2      Ob_1      Ob_2      Ob_1      Ob_2
Best       262.5532  1.0       339.6333  1.0       339.6333  0.7802
nb         3         20        2         20        2         2
Avg.       304.1923  1.0       389.8342  1.0       389.8744  0.7674
Std. dev.  14.6723   0.0       46.8325   0.0       49.0810   0.0089
Another example is µΦ, which used nodes 167, 174, 180 and 182 of F2 together with nodes of A and C,
with Φ1 = 170.9821 s and Φ2 = 0.8602. It is up to the user to choose, from among the solutions supplied, the mapping which best suits his/her needs. In Table 2, for all the experiments performed and for both the objectives, i.e. the resource utilization time Ob_1 and the reliability Ob_2, the statistical results are presented in terms of the best value found Best, the number of times nb this value was achieved, and the average Avg. and the standard deviation Std. dev. of the best values over the twenty tests effected. These findings indicate a high degree of efficiency of the proposed model. In fact, the optimal solutions were provided for both resource use and reliability, independently of the working conditions. The robustness is also quite good, as can be observed from the number of times the best values were attained. Concerning the average and the standard deviation values, it must be remarked that these apparently high values do not imply that the related solutions are far from the optimal ones. Rather, they mean that some of the tasks were placed on inappropriate nodes. In fact, since the fitness can take on only a finite set of values, even ones quite different from each other, it is sufficient that one task be deployed on a wrong node to yield a fitness value much higher than that of the related optimal solution. This explains the values of the averages and the standard deviations. To further prove that the effectiveness of the site mutation operator added to the classical DE transformations is not linked to a particular arrangement of the simulations, experiments analogous to those reported in Table 2 were carried out in its absence. The results attained are reported in Table 3. From a comparison of the findings shown in these two tables, it is simple to verify the actual benefits due to the introduction of the site mutation operator. 5.2.
Structure 2 This experimental phase was carried out considering an application made up of P = 30 tasks: a master denoted by 1 plus 29 slaves numbered from 2 to 30 and divided into two groups, G4 arranged in a ternary tree topology of 13 tasks and G5 composed of 16 tasks arranged in a mesh topology (see Fig. 4). The master task 1 had γk = 100 MI to execute and 150 Gbit to exchange with task 2 (the root of the tree) and with task 15 (the first task of the mesh). Based on the amount of computations and
communications, the thirteen tasks of the tree were subdivided, according to tree levels, into three subgroups composed of one, three and nine tasks respectively: G41 = [2], G42 = [3, 5] and G43 = [6, 14]. Task 2, in addition to the previous communications, exchanged 50 Gbit with each of its three leaves and carried out 100 MI. Each of the tasks of G42, besides communicating with its root task 2, exchanged 100 Mbit with its leaves and had γk = 1 GI to perform. Finally, each of the nine tasks of G43 had to execute 450 GI in addition to ψij = 100 Mbit to exchange with the related root. According to the computations and communications to execute, the 16 tasks of the mesh topology were also subdivided into four subgroups, each composed of the tasks of the same row: G51 = [15, 18], G52 = [19, 22], G53 = [23, 26] and G54 = [27, 30]. The tasks belonging to the same subgroup carried out the same number of computations and communications. As concerns the computations, each of the tasks of G51 and G54 executed 1 GI, while those of G52 and G53 had γk = 150 GI. Regarding the communications, the tasks of the first and fourth subgroups, with the exception of task 15 which also exchanged 150 Gbit with task 1, exchanged 100 Mbit with the tasks to which they were linked along the column and 15 Gbit with their neighbours along the row, while the tasks of the second and the third subgroups exchanged 100 Mbit with all their respective neighbouring tasks along both the row and the column. Thus the tasks of G51 and G54 could be considered I/O-bound while those belonging to the other two subgroups compute-bound. Simulation 1. In such an experiment it was supposed that all the nodes were completely discharged (ℓci(Δt) = 0.0 ∀i) and all the communication channels were unused (ℓβij(Δt) = 0.0 ∀(i, j)). Finally, λw = 1.0 ∀w.
Hypothesizing that πz = 0.994 for all the nodes in the interval [41, 60], πz = 0.995 for those in [121, 160] and πz = 0.996 for the remaining nodes, the allocations at the extremes of the achieved front, µΦ1 and µΦ2, are reported in Box I. Both solutions were optimal with respect to their own objective. In fact, as regards the first objective, it is simple to ascertain that the tasks involving substantial communications, i.e., the tasks in the range [1, 5] and task 15, were allocated on nodes of F2, which present the highest communication bandwidth. Moreover, the determining factor in the evaluation of the best resource utilization time was the allocation of the tasks in the range [6, 14], which had to execute 450 GI. As can be noticed, these tasks were deployed on the nodes of A, which are the most powerful ones. Any other placement would have entailed an execution time higher than that provided. For example, choosing the nodes with the immediately inferior power, i.e. those of B1, this computation time would have been equal to 225 s (450 GI/2000 MIPS = 225 s). It is noteworthy that, since it is impossible to execute 450 GI in a shorter time, a better mapping could have been achieved only by leaving the tasks [6, 14] on A and by suitably allocating the related root tasks [3, 5] so as to reduce the communication time. To communicate with a bandwidth higher than 32 Mbit/s, as in the solution recommended, the above-mentioned root tasks would have had to be placed on A as well, which has an intracluster bandwidth of 100 Mbit/s. Unfortunately, since each of these tasks exchanged
Box I. The allocations µΦ1 (with Φ1 = 153.125 s and Φ2 = 0.8849) and µΦ2 (with Φ1 = 177.7222 s and Φ2 = 0.8867).
50 Gbit with its root task 2, this choice would have yielded 500 s for this communication alone. As previously remarked, some of the tasks could have been assigned to more appropriate nodes. For example the tasks in the range [16, 18] could have been placed on nodes of F2, but such a choice, although it would have reduced the communication time of the tasks of G51, would not have contributed to reducing the total execution time of the whole application. Thus, this solution was not preferred, in line with the constraint established by the fitness function, namely to leave free, when possible, the most powerful resources, which could be more fruitfully exploited for other applications. The execution time Φ1 = 153.125 s was achieved by adding, to the computation time needed to execute the 450 GI required by the compute-bound tasks on the most powerful nodes of A (450 GI/3000 MIPS = 150.0 s), the communication time needed by each of these tasks to exchange 100 Mbit with its root task placed on F2 (100 Mbit/(32 Mbit/s) = 3.125 s). Regarding the second objective, it is simple to verify that the mapping provided had the best reliability. In fact all the chosen nodes had πz = 0.996. However, it should be emphasized that this value could have been attained simply by choosing the nodes with the highest reliability in one of the ranges [1, 40], [61, 120] and [161, 184]. Instead, on the basis of its multiobjective functionality, the tool picked the nodes opportunely so as to reduce the resource utilization time at the same time. Simulation 2. This simulation was carried out varying only the loads of some nodes of the grid. In particular all the nodes were discharged except those in the interval [1, 16], which had ℓci(Δt) = 0.7. The best mappings relative to the two objectives, µΦ1 and µΦ2, are shown in Box II. As can be seen, each mapping was optimal for its own objective.
However, it is noteworthy that, concerning the resource utilization time, the tasks in the range [6, 14] had been placed on the discharged nodes, and this had still allowed the achievement of the best value for Φ1. Instead, for the second objective, even though the best value of reliability had been preserved, the value of the resource utilization time had increased. Simulation 3. Once we had established that the system worked properly in predetermined conditions, we were able to test it in scenarios closer to real situations. So this simulation was carried out changing the load and the reliability of the nodes. In particular each ℓci(Δt) was randomly set in the range [0.1, 0.7], while each ℓβij(Δt) was randomly set in the interval [0.2, 0.9]. The reliabilities of all the internet links remained unchanged while πz was randomly distributed in the range [0.992, 0.995]. Unfortunately, in this situation it is extremely difficult, if not prohibitive, for a user, even a particularly skilled one, to identify the criteria on the basis of which to perform the placement so as to attain even a suboptimal mapping. The best solution provided by our evolutionary tool for the resource utilization time, µΦ1, is outlined in Box III. The best deployment for the reliability, µΦ2, is reported in Box IV.
Fig. 5. The Pareto fronts (reliability versus use of resources, in seconds) after 1,000, 2,000, 5,000 and 15,000 generations.
The first solution presented a resource utilization time of about 279.1657 s with a reasonably good reliability value, while that relative to the second objective had a very good value for Φ2 but a much higher resource utilization time. However, these allocations turned out to be very close to the optimal ones. In fact, in the operating conditions hypothesized here, it is simple to notice that even if the solution nominated for the first objective selected the least loaded nodes, these, due to a random load between 10% and 70%, would have their power engaged on average for about 40%. In this situation the most powerful nodes of A, with a load inferior to 40%, would have a residual power of about 1800 MIPS. Consequently the execution of the tasks requiring 450 GI on these nodes would take about 250 s. By analogous reasoning for the bandwidth loads, it is evident that the value supplied for Φ1 is a suboptimal one. The efficiency of the mapping results is even more evident for the second objective. Given the supposed reliability values, the mapping provided was attained by deploying the tasks only on the nodes with πz > 0.9948. Obviously, with a uniform distribution of the reliability in the range [0.992, 0.995], it was particularly improbable to have 30 nodes with πz ∈ [0.9948, 0.995] available. This is verifiable in the deployment supplied, in which several tasks were allocated on the same node. This demonstrates that our automatic tool can be profitably used to identify a good solution under any working conditions. In Fig. 5 we report the Pareto fronts of the alternative solution sets, at various stages of evolution, discovered in one of the runs for this last test. The solutions range from those with a good reliability Φ2 and a high resource use time Φ1 to those with a lower reliability but a shorter use time.
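The residual-power estimate above can be checked numerically; 0.4 is the mean of a uniform load drawn from [0.1, 0.7]:

```python
# Expected residual power of an A node under a uniform load in [0.1, 0.7].
avg_load = 0.4                      # mean of a uniform draw from [0.1, 0.7]
residual = 3000 * (1 - avg_load)    # MIPS left free: 1800.0 (1.8 GIPS)
t = 450_000 / residual              # 450 GI = 450,000 MI
print(residual, t)                  # 1800.0 250.0
```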
As can be easily perceived from the figure, the solutions which yield a better balance in satisfying both goals are those in the intermediate region of the front. In fact, such solutions are geometrically closer to the theoretically optimal solution for which Φ1 → 0 and Φ2 → 1.
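The geometric criterion mentioned above (closeness to the ideal point Φ1 → 0, Φ2 → 1) can be sketched as a post-processing selection over the front. This is an assumed helper, not part of the paper's tool; it normalizes Φ1 by the largest value on the front before measuring distance:

```python
import math

def closest_to_ideal(front):
    """front: list of (phi1, phi2) pairs on a Pareto front.
    Returns the pair closest to the ideal point (0, 1), with phi1
    normalized by its largest value on the front."""
    max_t = max(p1 for p1, _ in front)
    return min(front, key=lambda p: math.hypot(p[0] / max_t, 1.0 - p[1]))
```

For a front like that of Fig. 5, such a rule tends to pick a solution in the intermediate region, trading a little reliability for a much shorter use time.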
Box II. The allocations µΦ1 (with Φ1 = 153.125 s and Φ2 = 0.8760) and µΦ2 (with Φ1 = 228.125 s and Φ2 = 0.8867).
Box III. The allocation µΦ1, with Φ1 = 279.1657 s and Φ2 = 0.8487.
Box IV. The allocation µΦ2, with Φ1 = 1287.1502 s and Φ2 = 0.8553.

Table 4
Statistical results for the experiments.

           Simulation 1        Simulation 2        Simulation 3
           Ob_1      Ob_2      Ob_1      Ob_2      Ob_1      Ob_2
Best       153.125   0.8867    153.125   0.8867    279.1657  0.8553
nb         5         20        4         20        4         4
Avg.       201.135   0.8867    214.421   0.8867    482.5321  0.8551
Std. dev.  96.5042   0.0       92.8762   0.0       116.3226  0.0001
In Table 4 the findings for all the simulations carried out on the second application structure are shown. Also for this set of simulations, considerations analogous to those made for Table 2 can be adduced to sustain that efficient solutions have been supplied by the proposed evolutionary mapper. 6. Conclusions This paper illustrates how to formulate the static mapping problem of computationally intensive applications on a grid using a differential evolution algorithm, with the goals of minimizing the degree of use of the grid resources and, at the same time, respecting as far as possible the user's QoS requests and the environment specifications. To provide the user with a set of possible mapping solutions, each with a different balance between use of resources and QoS values, a multiobjective version of DE, based on the Pareto optimality criterion, is proposed. It has been shown that this evolutionary approach can be used in a variety of heterogeneous frameworks because it does not rely on any specific communication subsystem model. Moreover, when the grid operating parameters, as for example the load of the nodes and the network bandwidths, are randomly distributed, it becomes extremely difficult, if not prohibitive, even for a particularly skilled user, to identify the criteria on the basis of which to perform the placement so as to attain a suboptimal mapping. Experimental results have been outlined and discussed on several allocation scenarios differing in terms of application requests and grid operating conditions. These findings have demonstrated that our algorithm, enriched with a new mutation operator, represents a valid alternative to a manual and ineffective attempt to achieve the mapping on a heterogeneous set of multisite grid resources, provided that information about the status of both the accessible and the demanded resources is available.
References [1] I. Foster, C. Kesselmann (Eds.), The Grid: Blueprint for a New Computing Architecture, Morgan Kaufmann, 1998. [2] R. Buyya, D. Abramson, J. Giddy, H. Stockinger, Economic models for resource management and scheduling in grid computing, Journal of Concurrency and Computation: Practice and Experience 14 (13–15) (2002) 1507–1542. [3] J.M. Schopf, Ten actions when grid scheduling: The user as a grid scheduler, in: Grid Resource Management: State of the Art and Future Trends, Kluwer Academic, Norwell, MA, USA, 2004, pp. 15–23. [4] H.H. Mohamed, D.H.J. Epema, The design and implementation of the KOALA co-allocating grid scheduler, in: Proc. of the European Grid Conference, in: LNCS, vol. 3470, Amsterdam, 2005, pp. 640–650. [5] C. Qu, A grid advance reservation framework for co-allocation and coreservation across heterogeneous local resource management systems, in: R. Wyrzykowski, et al. (Eds.), Proc. of 7th Int. Conf. on Parallel Processing and Applied Mathematics, in: LNCS, vol. 4967, Springer, 2007, pp. 770–779. [6] A. Takefusa, H. Nakada, T. Kudoh, Y. Tanaka, S. Sekiguchi, GridARS: An advance reservation-based grid coallocation framework for distributed computing and network resources, in: Proc. of the 13th Workshop on Job Scheduling Strategies for Parallel Processing, in: LNCS, vol. 4942, Springer, 2008, pp. 152–168. [7] E. Elmroth, J. Tordsson, Grid resource brokering algorithms enabling advance reservations and resource selection based on performance predictions, Future Generation Computer Systems 24 (6) (2008) 585–593. [8] G. Mateescu, Quality of service on the grid via metascheduling with resource co-scheduling and co-reservation, International Journal of High Performance Computing Applications 17 (3) (2003) 209–218. [9] O. Wäldrich, P. Wieder, W. Ziegler, A meta-scheduling service for coallocating arbitrary types of resources, in: Proc. of the Sixth Int. Conf. on Parallel Processing and Applied Mathematics, in: LNCS, vol. 3911, 2005, pp. 782–791. 
[10] L. Wang, H.J. Siegel, V.P. Roychowdhury, A.A. Maciejewski, Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach, Journal of Parallel and Distributed Computing 47 (1997) 8–22. [11] J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A. Mandal, K. Kennedy, Task scheduling strategies for workflow-based applications in grids, in: Proc. of the IEEE Int. Symp. on Cluster Computing and Grid, 2005, pp. 759–767. [12] A. Doğan, F. Özgüner, Scheduling of a meta-task with QoS requirements in heterogeneous computing systems, Journal of Parallel and Distributed Computing 66 (2) (2006) 181–186.
[13] I. Foster, Globus toolkit version 4: Software for service-oriented systems, in: Proc. of IFIP Int. Conf. on Network and Parallel Computing, in: LNCS, vol. 3779, Beijing, China, 2005, pp. 2–13. [14] F. Berman, High-performance schedulers, in: I. Foster, C. Kesselman (Eds.), The Grid: Blueprint for a Future Computing Infrastructure, Morgan Kaufmann, 1998, pp. 279–307. [15] D. Fernandez-Baca, Allocating modules to processors in a distributed system, IEEE Transactions on Software Engineering 15 (11) (1989) 1427–1436. [16] H. Casanova, A. Legrand, D. Zagorodnov, F. Berman, Heuristics for scheduling parameter sweep applications in grid environments, in: Proc. of the 9th Heterogeneous Computing Workshop, Cancun, Mexico, 2000, pp. 349–363. [17] T.D. Braun, H.J. Siegel, N. Beck, L.L. Bölöni, M. Maheswaran, A.I. Reuther, J.P. Robertson, M.D. Theys, B. Yao, A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing 61 (2001) 810–837. [18] N. Fujimoto, K. Hagihara, A comparison among grid scheduling algorithms for independent coarse-grained tasks, in: Proc. of Symp. on Applications and the Internet-Workshops, Tokyo, Japan, 2004, p. 674. [19] E. Heymann, M.A. Senar, E. Luque, M. Livny, Adaptive scheduling for master-worker applications on the computational grids, in: Proc. of the First IEEE/ACM Int. Workshop on Grid Computing, Bangalore, India, 2000, pp. 214–227. [20] O. Beaumont, A. Legrand, Y. Robert, Scheduling divisible workloads on heterogeneous platforms, Parallel Computing 29 (9) (2003) 1121–1152. [21] C.-H. Hsu, T.-L. Chen, K.-C. Li, Performance effective pre-scheduling strategy for heterogeneous grid systems in the master slave paradigm, Future Generation Computer Systems 23 (2007) 569–579. [22] V. Di Martino, M.
Mililotti, Suboptimal scheduling in a grid using genetic algorithms, Parallel Computing 30 (2004) 553–565. [23] Y. Gao, J.Z. Huang, H. Rong, Adaptive grid job scheduling with genetic algorithm, Future Generation Computer Systems 21 (2005) 151–161. [24] C. Ernemann, V. Hamscher, U. Schwiegelshohn, A. Streit, R. Yahyapour, On advantages of grid computing for parallel job scheduling, in: Proc. of the 2nd IEEE/ACM Int. Symp. on Cluster Computing and the Grid, CCGrid 2002, 2002, pp. 39–47. [25] A.I.D. Bucur, D.H.J. Epema, The maximal utilization of processor co-allocation in multicluster systems, in: Proc. of the Int. Symp. on Parallel and Distributed Processing, Nice, France, April 2003, pp. 60–69. [26] J.M.P. Sinaga, H.H. Mohamed, D.H.J. Epema, A dynamic co-allocation service in multicluster systems, in: Proc. of the Tenth Workshop on Job Scheduling Strategies for Parallel Processing, D.G. Feitelson, L. Rudolph, U. Schwiegelshohn (Eds.), LNCS, vol. 3277, pp. 194–209, New York, USA, 2005. [27] G. Onwubolu, G. Davendra, Scheduling flowshops using differential evolution algorithm, European Journal of Operational Research 171 (2006) 674–692. [28] G. Singh, C. Kesselman, E. Deelmann, A provisioning model and its comparison with best-effort performance-cost optimization in grids, in: Proc. of the 16th Int. Symp. on High Performance Distributed Computing, California, USA, 2007. [29] K. Price, R. Storn, Differential evolution, Dr. Dobb’s Journal 22 (4) (1997) 18–24. [30] R. Storn, K. Price, Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11 (4) (1997) 341–359. [31] C.M. Fonseca, P.J. Fleming, An overview of evolutionary algorithms in multiobjective optimization, Evolutionary Computation 3 (1) (1995) 1–16. [32] I. De Falco, A. Della Cioppa, U. Scafuri, E. Tarantino, Multiobjective differential evolution for mapping in a grid environment, in: R. Perrott, et al. (Eds.), Proc. 
High Performance Computing Conf., in: Lecture Notes in Computer Science, vol. 4782, Springer, 2007, pp. 322–333. [33] F. Dong, S.G. Akl, Scheduling algorithms for grid computing: State of the art and open problems. Technical Report no. 2006–504, School of Computing, Queen’s University Kingston, Ontario, 2006. [34] A. Nobakhti, H. Wang, A simple self-adaptive differential evolution algorithm with application on the ALSTOM gasifier, Applied Soft Computing 8 (2008) 350–370. [35] S. Das, A. Konar, U.K. Chakraborty, A. Abraham, Differential evolution with a neighborhood-based mutation operator: A comparative study, IEEE Transactions on Evolutionary Computation 13 (3) (2009) 526–553. [36] A. Afzal, A.S. McGough, J. Darlington, Capacity planning and scheduling in grid computing environments, Future Generation Computer Systems 24 (2007) 404–414. [37] J.M. Schopf, F. Berman, Using stochastic information to predict application behavior on contended resources, International Journal of Foundations of Computer Science 12 (3) (2001) 341–364.
867
[38] L. Yang, J.M. Schopf, I. Foster, Conservative scheduling: Using predicted variance to improve scheduling decisions in dynamic environments, in: Proc. of the ACM/IEEE Conf. on High Performance Networking and Computing, 15–21 November 2003, Phoenix, AZ, USA, 2003, p. 31. [39] S. Vazhkudai, J.M. Schopf, Using regression techniques to predict large data transfers, The International Journal of High Performance Computing Applications 17 (3) (2003) 249–268. Special Issue on Grid Computing: Infrastructure and Applications. [40] K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, Grid information services for distributed resource sharing, in: Proc. of the Tenth IEEE Symp. on High Performance Distributed Computing, San Francisco, CA, USA, 2001, pp. 181–194. [41] R. Wolski, N. Spring, J. Hayes, The network weather service: A distributed resource performance forecasting service for metacomputing, Future Generation Computer Systems 15 (5–6) (1999) 757–768. [42] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke, A directory service for configuring high-performance distributed computations, in: Proc. of the Sixth IEEE Symp. on High Performance Distributed Computing, Portland, OR, USA, 1997, pp. 365–375. [43] L. Gong, X.H. Sun, E. Waston, Performance modeling and prediction of nondedicated network computing, IEEE Transactions on Computers 51 (9) (2002) 1041–1055. [44] H.A. Sanjay, S. Vadhiyar, Performance modeling of parallel applications for grid scheduling, Journal of Parallel and Distributed Computing 68 (2008) 1135–1145. [45] A.K.M. Khaled Ahsan Talukder, M. Kirley, R. Buyya, Multiobjective differential evolution for workflow execution on grids. in: Proc. of the 5th Int. Workshop on Middleware for Grid Computing, Newport Beach, California, USA, 2007.
Ivanoe De Falco was born in Naples, Italy, in 1961, and received his Laurea degree cum laude in Electrical Engineering from the University of Naples ‘‘Federico II’’ in 1987. He is currently a senior researcher at the Institute of High Performance Computing and Networking (ICAR) of the National Research Council of Italy (CNR). He is the author or a coauthor of about 100 publications in international journals and in the proceedings of international conferences, and his papers have received about 220 citations in the international literature. His main fields of interest include evolutionary algorithms and parallel computing. He is on the editorial boards of Applied Soft Computing and of the Journal of Artificial Evolution and Applications.
Umberto Scafuri was born in Baiano (AV), Italy, on May 21, 1957. He received his Laurea degree in Electrical Engineering from the University of Naples ‘‘Federico II’’ in 1985. He currently works as a technologist at the Institute of High Performance Computing and Networking (ICAR) of the National Research Council of Italy (CNR). His research activity is mainly devoted to parallel and distributed architectures and evolutionary models.
Ernesto Tarantino was born in S. Angelo a Cupolo, Italy, in 1961. He received the Laurea degree in Electrical Engineering in 1988 from the University of Naples, Italy. He is currently a researcher at the National Research Council of Italy. After completing his studies, he conducted research in parallel and distributed computing. During the past decade his research interests have been in the theory and application of evolutionary techniques and related areas of computational intelligence. He is the author of numerous scientific papers in international conference proceedings and journals. He has served on several program committees of conferences in the area of evolutionary computation.