Dynamic workload balancing of parallel applications with user-level scheduling on the Grid




Future Generation Computer Systems 25 (2009) 28–34



Vladimir V. Korkhov a,∗, Jakub T. Moscicki b, Valeria V. Krzhizhanovskaya a,c

a Section Computational Science, University of Amsterdam, The Netherlands
b CERN, Switzerland
c St. Petersburg State Polytechnic University, Russia

Article history: Received 18 March 2008; Received in revised form 3 July 2008; Accepted 9 July 2008; Available online 16 July 2008

Keywords: Dynamic load balancing; Resource management; User-level scheduling; High-performance computing; Grid; Heterogeneous resources; Parallel distributed application

Abstract

This paper suggests a hybrid resource management approach for efficient parallel distributed computing on the Grid. It operates on both the application and system levels, combining user-level job scheduling with a dynamic workload balancing algorithm that automatically adapts a parallel application to the heterogeneous resources, based on the actual resource parameters and the estimated requirements of the application. The hybrid environment and the algorithm for automated load balancing are described, the influence of the resource heterogeneity level is measured, and the speedup achieved with this technique is demonstrated for different types of applications and resources.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction and motivation Among the most prominent and as yet unsolved problems of Grid performance and usability are efficient resource management at the application and system levels, and optimization of the workload allocation for parallel and distributed applications on highly diverse and dynamically changing resources. One of the priority considerations in distributed parallel computing research is the adaptation of parallel programs previously developed for homogeneous computer clusters to the dynamic and heterogeneous Grid resources with minimal intrusion into the code. Another issue is enabling ‘‘smart’’ selection of the resources most appropriate to the given parallel program. Earlier we proposed an approach to tackle these problems by introducing a mechanism for dynamic load balancing of black-box applications with resource selection [1]. In the present paper this approach is developed further by introducing user-level scheduling. A great number of algorithms, approaches and tools have been developed to bring Grid resource management and job

∗ Corresponding address: Section Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands. Tel.: +31 20 5257599; fax: +31 20 5257490. E-mail address: [email protected] (V.V. Korkhov). 0167-739X/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2008.07.001

scheduling issues to a more advanced level of efficiency and usability (see for instance [6–15]). Yet in this seemingly successful research field, teeming with available solutions, it turns out in practice to be impossible to find a tool or library for automatic load balancing of a parallel distributed application according to its individual requirements on heterogeneous resources. The stumbling block is that these application requirements are not known beforehand in most real-life complex applications, as they may differ in each new computational experiment depending on the initial conditions, the numerical schemes chosen, the computational parameters, etc. The benchmarking and performance assessment of the Virtual Reactor application showed that even within one solver, different trends can exist in the application requirements and parallel efficiency, depending on the problem type and computational parameters [4,5].

A countless number of parallel applications have been developed for traditional parallel systems. Porting such applications from homogeneous computing environments to dynamic heterogeneous resources poses the challenge of maintaining a high level of application efficiency. An adequate workload optimization method should take into account two aspects:

(1) The application characteristics, such as the amount of data transferred between the processes, the application communication structure, the amount of floating-point operations, memory requirements, hard disk or other I/O activity, etc.
(2) The resource characteristics, such as the computational power and memory of the worker nodes, the network link bandwidth, the disk I/O speed, and the level of heterogeneity of the resources randomly assigned to the application by the Grid resource broker.

The method should be (a) self-adapting and flexible with respect to the type of application, (b) computationally inexpensive, so as not to induce a large overhead on the application performance, and (c) free of any need for significant modifications in the code. On top of that, the load balancing shall be (d) dynamic and fully automated, since the complexity of the Grid environment should be hidden from end-users.

Generally, studies on load balancing consider the distribution of processes to computational resources on the system/library level, with no modifications in the application code [16,17]. Less often, load balancing code is included in the application source code to improve performance in specific cases [18,19]. Some research projects concern load balancing techniques that use source code transformations to speed up the execution of the application [20]. Traditionally there are two approaches to workload balancing of parallel applications: (1) carefully calculate the distribution of the workload, taking into account all the properties of the environment and the application, which is a time- and resource-consuming task requiring expert knowledge of the application structure and the algorithms used; or (2) distribute the workload in a straightforward way, at best considering only the processing power of the worker nodes, which is fast but not very efficient in terms of parallel performance. We propose an intermediate approach that iteratively refines the workload distribution [5].
In the integrated system proposed here a hybrid approach is employed, where the balancing decision is taken through the interaction of the application with the execution environment. The paper is organized as follows: Section 2 introduces the basic ideas and steps of a hybrid resource management approach with an automated load balancing technique and user-level scheduling for parallel applications on the Grid; Section 3 summarizes the algorithm for adaptive workload balancing (AWLB) introduced in [5]; Section 4 presents the results of the AWLB implementation, describes a synthetic model application developed for experiments, and shows the trends of the load balancing speedup and the influence of the resource heterogeneity level. Section 5 concludes the paper with a discussion and summary of results.

2. Automated load balancing with user-level scheduling

Based on our previous experience [4,5], we developed a load balancing technique that takes into account the heterogeneity and the dynamics of Grid resources, estimates the initially unknown application requirements, and provides resource selection and optimal mapping of parallel processes to the available resources. This section describes a hybrid resource management environment operating on both the application and system levels: scheduling decisions are taken through the interaction of the application with the execution environment. On the application level the Adaptive Workload Balancing algorithm (AWLB) [5] is employed, which ensures optimal workload distribution based on the discovered application requirements and the measured resource parameters. The execution on the system level is controlled by a User-Level Scheduling (ULS) environment that maintains a user-level resource pool and enables resource selection. One of the most widely used types of parallel applications consists of a set of parallel tasks that process the workload scheduled by the Master task. In the proposed system the latter is
bound to the ULS environment to establish the interaction between the system and application levels. The ULS operates on information about the available resources and monitors the application response. In a dynamic environment the ULS responds to changing application or resource conditions: a more suitable resource set can be selected to execute the application at a given time. This is a distinctive feature, in contrast to traditional parallel programs where resources are allocated once and fixed during the execution (unless special migration libraries are used, like Dynamite [17]). To support the replacement of resources during run-time, the concept of a user-level resource pool is employed. This resource pool is maintained and supplied by the ULS environment, which dynamically selects the resources most suitable for the application. The suitability of resources is determined by the application requirements; for the traditional parallel computing applications considered here as a test case, it depends on the processing power and network connectivity correlated with the application communication to computation ratio. After resources have been selected and assigned to the tasks, the workload balancing on the application level is performed using the AWLB algorithm summarized in Section 3. The computation is performed as an iterative process; in each iteration the distribution of the workload is re-evaluated on a new set of resources, and the AWLB parameters are re-estimated. The outline of adaptive workload balancing in the ULS environment is presented in Fig. 1 as a meta-algorithm. All the processing is performed iteratively within the framework of the ULS environment, and two levels are distinguished: the resource pool level and the application level. The ULS environment is responsible for managing the resource pool to supply the application with appropriate resources. In turn, the application updates the environment with its changing requirements. An explanation of the steps in Fig.
1 is provided below.

I. Resource pool level. In parallel to the application execution, the resource pool is monitored and updated by the ULS environment.

Step R1. Update the pool: Discover available resources using Grid information services and add them to the pool if they meet the application requirements (provided the requirements have already been determined, i.e. this is not the initial occurrence of Step R1 and Step A7 has already been performed at least once). Check the already acquired resources for availability and remove those no longer available.

Step R2. Benchmark resources: Measure the computational power and available memory on the worker nodes, the bandwidth of the network links, hard disk capacity, I/O speed, etc.

Step R3. Rank resources: Update and re-order a table of resource parameters, which will be used by the resource selection procedure in Steps A2, A3. The ranking criterion is based on the application characteristics: for communication-bound applications the network link bandwidth is the top-priority ranking parameter; for computation-intensive applications it is the processing power; and for the intermediate cases the resources are chosen to best fit the application communication to computation ratio discovered by the AWLB algorithm (Step A7).

II. Application level. The load balancing, resource matching and task mapping are performed on this level, in coordination with the resource pool level.

Step A1. Estimate application requirements; set initial AWLB parameters: Application requirements are used to set the initial values of the AWLB parameters (see Section 3); additional constraints are set (e.g. minimal memory required). Later the AWLB parameters are automatically tuned during runtime (Step A4).

Step A2. Matching resources I. Constraining factors: Filter out the resources which do not meet the minimal application requirements determined in Steps A1, A7. For instance, memory

can be the constraining factor: in case of insufficient memory, the processor is disregarded from the computation.

Step A3. Matching resources II. Selecting resources: (1) Find an optimal number of processors: it is provided by the AWLB mechanism, depending on the execution time of a single iteration on different numbers of processors. (2) Select the best suited resources: acquire the needed number of worker nodes from the top of the ranked resource list (Step R3). Steps A2, A3 use information from the resource pool and send back the application requirements and the requests on the chosen resources. The requests are then processed by the ULS environment to book or release the resources.

Step A4. Tune AWLB parameters: The AWLB parameters (see Section 3) are tuned to provide a better workload distribution, based on the execution results analyzed in Step A7.

Step A5. Resource mapping and load balancing: The actual optimization of the workload distribution within the parallel tasks is performed, i.e. mapping the processes and workload onto the allocated resources. This step is based on the AWLB algorithm described in Section 3. It includes a method to calculate the weighting factors for each processor depending on the resource characteristics measured in Step R2 and the application requirements estimated in Steps A1, A7.

Step A6. Iteration execution: Perform an iteration (including calculations and communications) with the workload distribution defined in Step A4.

Step A7. Analyze execution and AWLB parameters: Measure the execution time of one iteration (Step A6) with the current AWLB parameters and quantitatively estimate the requirements of the application based on the results of the resource benchmarking (Step R2) and the measurements of the application response.

Being an adaptive heuristic, AWLB requires several steps of computation to estimate the optimal values of the application parameters on a given resource set. If the resources change between the iterations, re-estimation of the AWLB parameters is required. For dynamic resources whose performance is influenced by other factors, a periodic re-estimation of the resource parameters and load re-distribution is performed. If the application is dynamically changing (for instance, due to adaptive meshes or different combinations of physical processes modeled at different simulation stages), then the application requirements must be periodically re-estimated even on the same set of resources. This is shown by the dashed arrows in Fig. 1.

Fig. 1. Iterative execution of a parallel application in the integrated AWLB + ULS environment with a dynamic resource pool.

3. Adaptive load balancing on heterogeneous resources: Theory
In [4] we proposed a methodology for adaptive load balancing of parallel applications on heterogeneous resources, extending it to dynamic load balancing and introducing the heterogeneity metrics in [5]. This section gives a theoretical description of the basic concepts and parameters mentioned in the meta-algorithm and concentrates on the two most important issues: (1) estimating the application requirements (Step A1) and (2) the actual load balancing of parallel applications on heterogeneous Grid resources (Step A5). The load balancing Step aims at optimizing the load distribution among the resources already selected in previous Steps (after performing the check against the restricting factors such as the memory deficiency). Therefore the theory is given under the assumption that the resources are ‘‘fixed’’ for a single load-balancing loop, and that using all these resources provides a reasonably good performance result (e.g. parallel speedup for traditional parallel computing applications). Another prerequisite is that the application is already implemented as a parallel (or distributed) program, and is able to distribute the workload by chunks of controllable size. This implies that the Master–Worker model is kept in mind, but the technique is applicable to other communication logical topologies, given that the measurements are carried out along the links used within the application. The algorithm is designed in such a way that the knowledge of these resource and application characteristics would give an instant solution to the workload distribution, thus making the procedure very lightweight and suitable for dynamic load balancing at runtime. The main generic parameters that define a parallel application performance are:

• The application parameter fc = Ncomm /Ncalc , where Ncomm is the total amount of application communications, i.e. data to be exchanged (measured in bit) and Ncalc is the total amount of computations to be performed (measured in Flop); • The resource parameters µi = pi /ni , where pi is the available performance of the ith processor (measured in Flop/s) and ni is the network bandwidth to this node (measured in bit/s). The AWLB algorithm is based on benchmarking the available resources capacity, defined as a set of individual resource parameters µ = {µi }, and experimental estimation of the application parameter fc . The value of the application parameter
fc is determined by searching through the space of possible fc and finding the value fc∗ which provides minimal runtime of the application on this set of resources. This idea is implemented in Step A4 and will be illustrated in Section 4. The detailed algorithm is described in [5]. The combination of µ and fc∗ determines the distribution of the workload between the processors. To calculate the amount of workload per processor, a weighting factor wi is assigned to each processor. It determines the final workload for a processor given by Wi = wi W , where W is the total application workload. In [4] the expression for the weighting factors wi was derived:

wi = qi / Σj=1..N qj,    qi = pi (1 + fc ϕ/µi)    (1)

where ϕ is the network heterogeneity metric that can be expressed as the standard deviation of the set of normalized dimensionless resource parameters:

ϕ = sqrt( 1/(N−1) · Σi=1..N (1 − ni/navg)² ),    navg = (1/N) Σi=1..N ni.    (2)
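For concreteness, Eqs. (1) and (2) can be sketched in a few lines of Python (a minimal illustration; the function and variable names are ours, not part of the AWLB implementation):

```python
import math

def heterogeneity(bandwidths):
    """Network heterogeneity metric phi of Eq. (2): standard deviation
    of the normalized resource parameters n_i / n_avg."""
    n = len(bandwidths)
    n_avg = sum(bandwidths) / n
    return math.sqrt(sum((1 - b / n_avg) ** 2 for b in bandwidths) / (n - 1))

def awlb_weights(powers, bandwidths, fc):
    """Weighting factors w_i of Eq. (1): q_i = p_i * (1 + fc * phi / mu_i),
    with mu_i = p_i / n_i, normalized so that the weights sum to one."""
    phi = heterogeneity(bandwidths)
    q = [p * (1 + fc * phi / (p / b)) for p, b in zip(powers, bandwidths)]
    total = sum(q)
    return [qi / total for qi in q]
```

With fc = 0 the weights reduce to a distribution proportional to processor performance alone, which is the special case discussed in Section 4.1.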

To evaluate the efficiency of the workload distribution, the load balancing speedup Θ = Tnon-balanced/Tbalanced is introduced, where Tnon-balanced is the execution time without load balancing (an even distribution of the load among the processes), and Tbalanced is the execution time after load balancing on the same set of resources (the time taken to execute the algorithm itself is included). This metric is used to estimate the application parameter fc∗ that provides the best performance on the given resources, that is, the largest value of the speedup Θ. In a non-trivial case a maximum of Θ can be found, and thus an optimal fc∗ can be determined for some workload distribution (see Section 4.1). While deriving Eq. (1), a simple case of memory requirements was considered: they only impose a Boolean constraint on the allocation of processes to the resources; either there is enough memory to run the application or there is not. But memory can be one of the determining factors of the application performance and play a role in the load balancing process. This is the case for applications that are able to control their memory requirements according to the available resources. In this case there will be additional parameters analogous to fc and µi, but the idea and the load balancing mechanism remain the same. Similar considerations apply to other types of applications. For instance, in a widely used class of applications performing sequential computing with hard-disk-intensive operations, the network link bandwidth parameter ni shall be replaced with the disk I/O speed for finding an optimal load distribution in ''farming'' computations on the Grid.

4. Experimental evaluation

This section shows the results illustrating the performance of the proposed workload balancing approach for different types of applications and resources. The AWLB algorithm was first applied while deploying the Virtual Reactor parallel components on heterogeneous Grid resources [3].
The load balancing procedure is implemented as an external library; after linking with the application, it provides a recommendation on how much work shall be done by each of the assigned processors to ensure the fastest possible execution time, taking into account the specific parameters of the resources and the estimated application requirements [4,5]. However, one application can obviously provide only a limited freedom for experiments. To be able to examine the behavior of an arbitrary parallel application (characterized by various values of the application parameter fc and various interprocess communication topologies), we developed a model parallel application with a synthesized workload, tunable communication to computation ratio fc and configurable logical communication topology (i.e. the patterns of interconnections between the processes). The latter gives the possibility to model different connectivity schemes, e.g. Master–Worker, Mesh, Ring, Hypercube, etc. The experiments were carried out on different scales: on the DAS-2 computer cluster [23] and RIDgrid [4], using an MPI implementation of the model application (Sections 4.1–4.3), and on a larger scale on the EGEE grid testbed [24], in the Geant4 VO [25] (Section 4.4), with the model application implemented as a plug-in for the DIANE User-Level Scheduling environment [21,22,26].

4.1. Balancing speedup for different applications

This section illustrates searching through the space of possible values of the application parameter fc in order to find the actual application requirement fc = Fc (see Step A4 of the meta-algorithm and Section 3). Fig. 2 presents the results of load balancing of the model application with the Master–Worker non-lockstep asynchronous communication logical topology (a Worker node can immediately start calculation while the Master continues sending data to the other Workers): the load balancing speedup for 5 applications with different pre-defined values of Fc (0.1–0.5) on the same set of heterogeneous resources is shown.

Fig. 2. Dependency of the load balancing speedup Θ on the ''guessed'' application parameter fc for 5 synthetic applications with different values of Fc.

The value of fc = fc∗ corresponding to the maximal speedup assures the best application performance. The best speedup in all cases is achieved with fc∗ close to the real application Fc, thus proving the validity of our approach. Another observation is that applications characterized by a higher communication to computation ratio Fc achieve a higher balancing speedup, which means that communication-intensive applications benefit more from the proposed load balancing technique.
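This scan for fc∗ amounts to a one-dimensional search, sketched below (a hypothetical illustration: run_iteration is a stand-in that executes one rebalanced iteration with the workload distributed according to the trial fc, with None meaning the even non-balanced distribution, and returns the measured iteration time):

```python
def find_fc_star(candidates, run_iteration):
    """Scan trial values of fc and keep the one maximizing the load
    balancing speedup Theta = T_non_balanced / T_balanced."""
    t_even = run_iteration(None)  # reference run: even workload distribution
    fc_star = max(candidates, key=lambda fc: t_even / run_iteration(fc))
    return fc_star, t_even / run_iteration(fc_star)
```

In the real setting each evaluation is one iteration of the running application, so the scan is amortized over the iterative computation rather than paid up front.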
It is also worth noting that a distribution of the workload proportional only to the processor performance (fc = 0) also gives a significant increase in performance (180% in the case of Fc = 0.5), but introducing the dependency on the application and resource parameters adds another 35% to the balancing speedup in this case. In experiments with a higher level of resource heterogeneity, this additional speedup contributed up to 150%.

4.2. Load balancing for the Master–Worker model: Heuristic vs analytically derived load distribution

To assess the performance of the proposed load balancing algorithm, we analytically derived the best workload distribution parameters for some specific communication logical topologies of parallel applications, and compared the speedup achieved with our



Fig. 3. Comparison of the execution times for different applications: the analytical distribution versus generic heuristic load balancing.

Fig. 4. Dependency of the load balancing speedup Θ on resource heterogeneity metrics ϕ .

heuristic algorithm with that provided by the analytical method. Here the results for a widely used Master–Worker non-lockstep asynchronous communication model are presented. The values of the weighting factors defining the optimal load distribution have been derived from the principle of equalizing the time spent by each processor working on the application. Omitting the mathematical details, the final recurrence relation for calculating the weights is:

qN = ( 1 + Σi=2..N Πk=i..N (τk + Tk)/Tk−1 )⁻¹,
qi−1 = qi (τi + Ti)/Ti−1,  for i = N…2;
wi = qi / Σj=1..N qj    (3)
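Eq. (3) can be sketched in Python as follows (our own transcription of the recurrence; indices are shifted to 0-based):

```python
def analytic_weights(powers, bandwidths, n_comm, n_calc):
    """Analytically derived Master-Worker weights of Eq. (3).
    tau_i = Ncomm / n_i is the send time to Worker i,
    T_i = Ncalc / p_i is its compute time."""
    N = len(powers)
    tau = [n_comm / b for b in bandwidths]
    T = [n_calc / p for p in powers]
    # q_N = (1 + sum_{i=2..N} prod_{k=i..N} (tau_k + T_k) / T_{k-1})^-1
    denom = 1.0
    for i in range(1, N):                 # i = 2..N in the paper's 1-based notation
        prod = 1.0
        for k in range(i, N):             # k = i..N
            prod *= (tau[k] + T[k]) / T[k - 1]
        denom += prod
    q = [0.0] * N
    q[N - 1] = 1.0 / denom
    for i in range(N - 1, 0, -1):         # q_{i-1} = q_i (tau_i + T_i) / T_{i-1}
        q[i - 1] = q[i] * (tau[i] + T[i]) / T[i - 1]
    total = sum(q)                        # w_i = q_i / sum_j q_j
    return [qi / total for qi in q]
```

As a sanity check, with negligible communication (Ncomm close to zero) the recurrence reduces to weights proportional to processor performance.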

where τi = Ncomm/ni is the time for sending the total amount of application communications Ncomm from the Master to the ith Worker node over the network link with the measured bandwidth ni; and Ti = Ncalc/pi is the time for performing the total amount of application calculations Ncalc by the ith processor with the processing power pi. The synthetic applications were tested with different communication to computation ratios Fc on different sets of resources, with the two different load distributions: analytical and heuristic. Fig. 3 presents an example comparison of the execution times achieved with these load balancing strategies on a set of highly heterogeneous resources. In the worst case the heuristic time is only 5%–15% higher than the best possible for these applications (the larger difference is attributed to the very communication-intensive test). Considering that our approach is generic and suits any type of communication topology, this overhead is a relatively small impediment.

4.3. Influence of resource heterogeneity on the load balancing efficiency

Thorough testing of different applications on various sets of resources showed a strong influence of the level of resource heterogeneity on the results achieved. We performed a series of targeted experiments varying the resource heterogeneity in the processor power and the network link bandwidth. As a sample of these tests, Fig. 4 shows the dependency of the load balancing speedup on the processing power heterogeneity metric, analogous to the network links heterogeneity metric introduced by Eq. (2). As can be seen, the speedup grows superlinearly with the heterogeneity level, thus indicating that our approach is especially beneficial for strongly heterogeneous resources, such as Grid resources.
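Such experiments can be mimicked by generating synthetic resource sets with a controlled spread of processor powers and evaluating the heterogeneity metric over them (computed analogously to Eq. (2), but over the powers pi; this generator and its parameters are our own illustration, not the setup used in the paper):

```python
import math
import random

def power_heterogeneity(powers):
    """Heterogeneity metric over processor powers, analogous to Eq. (2)."""
    n = len(powers)
    avg = sum(powers) / n
    return math.sqrt(sum((1 - p / avg) ** 2 for p in powers) / (n - 1))

def synthetic_resources(n, mean_power, spread, seed=0):
    """Draw n processor powers uniformly from mean*(1-spread) to
    mean*(1+spread); a larger spread yields a larger heterogeneity metric."""
    rng = random.Random(seed)
    return [mean_power * (1 + rng.uniform(-spread, spread)) for _ in range(n)]
```

Sweeping the spread parameter and plotting the balancing speedup against the resulting metric reproduces the kind of curve shown in Fig. 4.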

Fig. 5. Execution time with self-scheduling and AWLB load balancing for six applications characterized by different communication/computation ratio Fc .

4.4. AWLB and self-scheduling

To validate the methodology of the integrated AWLB + ULS approach, we compared the performance achieved by the AWLB algorithm with a dynamic resource pool against the results shown by the standard DIANE [26] task dispatching technique, self-scheduling (also called the FIFO scheduling algorithm), for a Master–Worker application. The Master–Worker communication model is the most natural choice, since existing large-scale Grid infrastructures are federations of clusters which often do not support inbound connectivity to individual worker nodes. In the EGEE Grid, cross-cluster communication is not accessible to individual users; therefore user-level scheduling frameworks such as DIANE assume a Tree communication topology, typically reduced to one Master host in the user environment linked to a pool of Worker hosts. In the self-scheduling algorithm, all the workload is divided into tasks of equal size; typically the number of tasks largely exceeds the number of available worker nodes. As soon as a worker becomes available, it is assigned the next task from the list. In AWLB, the workload is divided into as many chunks as there are available workers, and the size of the workload assigned to each task is calculated by the heuristic algorithm, using the resource and application characteristics. The execution time with the AWLB algorithm was up to 2 times shorter than with self-scheduling for all types of simulations. Fig. 5 demonstrates the results for six applications characterized by different communication/computation ratios Fc. In these experiments the computational load (Ncalc) was kept constant on a fixed set of 16 processors, while the amount of data transferred between the Master and the Workers (Ncomm) was varied.
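The difference between the two dispatching strategies can be illustrated with a toy makespan simulation (our own sketch with an idealized timing model that ignores communication costs; it is not the DIANE API):

```python
import heapq

def self_scheduling_makespan(speeds, workload, n_tasks):
    """FIFO self-scheduling: the workload is cut into n_tasks equal chunks,
    and the next chunk always goes to the first worker to become free."""
    chunk = workload / n_tasks
    free_at = [(0.0, i) for i in range(len(speeds))]  # (time worker frees up, id)
    heapq.heapify(free_at)
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + chunk / speeds[i], i))
    return max(t for t, _ in free_at)

def awlb_makespan(speeds, workload, weights):
    """AWLB-style dispatch: one weighted chunk per worker;
    the makespan is set by the slowest-finishing worker."""
    return max(w * workload / s for w, s in zip(weights, speeds))
```

With coarse equal-size tasks a fast worker can sit idle while a slow one finishes its last chunk; weighting the chunk sizes to the resources, as AWLB does, avoids this tail effect.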



Acknowledgments The research was conducted with financial support from the Dutch National Science Foundation and the Russian Foundation for Basic Research under projects # 047.016.007 and 047.016.018, and with partial support from the Virtual Laboratory for e-Science Bsik project. References

Fig. 6. Dynamics of the resource pool population.

Fig. 6 presents the statistics of an actual run using a dynamically populated resource pool. The resources are gradually added to the resource pool, depending on their availability, and some workers are removed from the pool: the number of workers used is not growing steadily, but experiences dips every few iterations. 5. Conclusions This paper introduced a new hierarchical approach that combines user-level job scheduling with a dynamic load balancing technique that automatically adapts a parallel application to the heterogeneous resources. The proposed algorithm dynamically selects the resources best suited for a parallel application and optimizes the load balance based on the dynamically measured resource parameters and estimated requirements of the application. The performance of this load balancing approach is studied on a real-life Virtual Reactor application [2,3] and on a synthetic application with flexible user-defined application parameters and logical network topologies, on heterogeneous resources with a controlled level of heterogeneity. Some of the conclusions from our methodological experiments are as follows:

• The proposed algorithm adequately finds the application requirements;

• The resulting balancing speedup reached up to 450% in some test-cases;

• The novelty of our load balancing approach, the dependency of the load distribution on the application and resource parameters, adds up to 150% to the balancing speedup compared to balancing that takes into account only the processor performance;
• Analysis of the balancing speedup achieved for different types of applications and resources indicates that communication-intensive applications benefit most from the proposed load balancing technique;
• The balancing speedup from applying our approach grows superlinearly with the increase of the resource heterogeneity level, thus showing that it is especially useful for severely heterogeneous Grid resources;
• Comparison of the performance of our load balancing heuristic with that achieved using the analytically derived weights showed a relatively small discrepancy of 5%–15%, with the larger difference attributed to the very communication-intensive applications.

The results presented here were obtained for parallel applications with the most widespread communication model in the existing large-scale Grids: a Master–Worker scheme in a non-lockstep asynchronous mode. We are currently testing other connectivity schemes, such as different Master–Worker modes, as well as Mesh, Ring and Hypercube topologies.

References

[1] V.V. Krzhizhanovskaya, V.V. Korkhov, Dynamic Load Balancing of Black-Box Applications with a Resource Selection Mechanism on Heterogeneous Resources of the Grid, in: LNCS, vol. 4671, 2007, pp. 245–260 (PaCT'07).
[2] V.V. Krzhizhanovskaya, P.M.A. Sloot, Y.E. Gorbachev, Grid-based simulation of industrial thin-film production, Simulation 81 (1) (2005) 77–85.
[3] V.V. Krzhizhanovskaya, et al., Computational Engineering on the Grid: Crafting a Distributed Virtual Reactor, in: 2nd IEEE Intl. Conference on e-Science and Grid Computing, e-Science'06, 2006, p. 101.
[4] V.V. Korkhov, V.V. Krzhizhanovskaya, Benchmarking and Adaptive Load Balancing of the Virtual Reactor Application on the Russian-Dutch Grid, in: LNCS, vol. 3991, 2006, pp. 530–538.
[5] V.V. Korkhov, V.V. Krzhizhanovskaya, P.M.A. Sloot, A grid based virtual reactor: Parallel performance and adaptive load balancing, Journal of Parallel and Distributed Computing 68 (2008).
[6] J. Nabrzyski, J.M. Schopf, J. Weglarz (Eds.), Grid Resource Management: State of the Art and Future Trends, Kluwer Academic Publishers, 2004.
[7] I. Foster, C. Kesselman (Eds.), The Grid 2: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 2003.
[8] R. Buyya, T. Cortes, H. Jin, Single system image, International Journal of High Performance Computing Applications 15 (2) (2001) 124–135.
[9] K.E. Maghraoui, et al., The internet operating system: Middleware for adaptive distributed computing, International Journal of High Performance Computing Applications 20 (4) (2006) 467–480.
[10] O.O. Sonmez, A. Gursoy, A novel economic-based scheduling heuristic for computational grids, International Journal of High Performance Computing Applications 21 (1) (2007) 21–29.
[11] W.F. Boyera, G.S. Hura, Non-evolutionary algorithm for scheduling dependent tasks in distributed heterogeneous computing environments, Journal of Parallel and Distributed Computing 65 (2005) 1035–1046.
[12] D.E. Collins, A.D. George, Parallel and sequential job scheduling in heterogeneous clusters: A simulation study using software in the loop, Simulation 77 (2001) 169–184.
[13] A. Schoneveld, J.F. de Ronde, P.M.A. Sloot, On the complexity of task allocation, Complexity 3 (1997) 52–60.
[14] J.F. de Ronde, A. Schoneveld, P.M.A. Sloot, Load balancing by redundant decomposition and mapping, Future Generation Computer Systems 12 (5) (1997) 391–407.
[15] H.D. Karatza, R.C. Hilzer, Parallel job scheduling in homogeneous distributed systems, Simulation 79 (5–6) (2003) 287–298.
[16] A. Barak, et al., The MOSIX Distributed Operating System, Load Balancing for UNIX, in: LNCS, vol. 672, Springer-Verlag, 1993.
[17] B.J. Overeinder, et al., A dynamic load balancing system for parallel cluster computing, Future Generation Computer Systems 12 (1) (1996) 101–115.
[18] G. Shao, et al., Master/Slave Computing on the Grid, in: Proceedings of the Heterogeneous Computing Workshop, IEEE Computer Society, 2000, pp. 3–16.
[19] S. Sinha, M. Parashar, Adaptive runtime partitioning of AMR applications on heterogeneous clusters, in: Proceedings of the 3rd IEEE Intl. Conference on Cluster Computing, 2001, pp. 435–442.
[20] R. David, et al., Source Code Transformations Strategies to Load-Balance Grid Applications, in: LNCS, vol. 2536, Springer-Verlag, 2002, pp. 82–87.
[21] C. Germain-Renaud, et al., Scheduling for responsive grids, Grid Computing Journal 6 (1) (2008) 15–27 (Special Issue on EGEE User Forum).
[22] J.T. Moscicki, et al., Quality of Service on the Grid with User Level Scheduling, in: Cracow Grid Workshop Proceedings, 2006.
[23] DAS2 environment. http://www.cs.vu.nl/das2/.
[24] F. Gagliardi, et al., Building an infrastructure for scientific grid computing: Status and goals of the EGEE project, Philosophical Transactions of the Royal Society A 363 (1833) (2005) 1729–1742.
[25] Geant4 VO. http://lcg-voms.cern.ch:8443/vo/geant4.
[26] J.T. Moscicki, Distributed analysis environment for HEP and interdisciplinary applications, Nuclear Instruments and Methods in Physics Research A 502 (2003) 426–429.

Vladimir V. Korkhov is a researcher at the Faculty of Science of the University of Amsterdam (UvA). He received his Master's degree in mathematics and computer science from the St. Petersburg Institute of Fine Mechanics and Optics, Russia, and is currently finalizing his Ph.D. thesis at UvA. His research interests include grid computing, distributed software systems, resource management and workload balancing in heterogeneous environments, and workflows on the grid; he is the author of more than 20 conference and journal papers.


Jakub T. Moscicki is a software engineer and researcher at CERN in Geneva. He obtained the M.Sc. in Computing from the AGH University of Science and Technology in Krakow. His research interests focus on distributed and parallel applications deployed on large-scale computing infrastructures such as the Grids. The multidisciplinary application portfolio includes high-energy physics, theoretical physics, medical and radiation studies, bio-informatics and drug design, telecommunications and simulation.

Valeria V. Krzhizhanovskaya is a researcher at the University of Amsterdam (UvA), The Netherlands, and a senior lecturer at St. Petersburg State Polytechnic University (StPSPU), Russia. She received the M.Sc. degree in Applied Mathematics and Physics from StPSPU and a Ph.D. degree in Computational Science from UvA. Valeria has published over 40 papers, worked as a guest editor of 4 special issues of the International Journal of Multiscale Computational Engineering, organized 5 international symposia on Simulation of Multiphysics Multiscale Systems http://www.science.uva.nl/~valeria/SMMS/, served as a program committee member and a reviewer for over 20 conferences and 6 international journals, participated in more than 40 conferences, and worked in about 20 international projects. Her research interests include parallel distributed computing in heterogeneous systems, Grid computing, problem-solving environments, and modeling, simulation and numerical methods in physics.