Future Generation Computer Systems 28 (2012) 1058–1069
A GridWay-based autonomic network-aware metascheduler
Luis Tomás a,∗, Agustín C. Caminero b, Omer Rana c, Carmen Carrión a, Blanca Caminero a
a Department of Computing Systems, The University of Castilla-La Mancha, Albacete, Spain
b Department of Communication and Control Systems, The National University of Distance Education, Madrid, Spain
c Cardiff School of Computer Science, 5 The Parade, Cardiff, CF24 3AA, United Kingdom
Article history: Received 5 November 2010; received in revised form 6 June 2011; accepted 21 August 2011; available online 31 October 2011.
Keywords: Grid computing; Autonomic systems; GridWay; Quality of service
Abstract
One of the key motivations of computational and data grids is the ability to make coordinated use of heterogeneous computing resources which are geographically dispersed. Consequently, the performance of the network linking all the resources present in a grid has a significant impact on the performance of an application. It is therefore essential to consider network characteristics when carrying out tasks such as scheduling, migration or monitoring of jobs. This work focuses on an implementation of an autonomic network-aware meta-scheduling architecture that is capable of adapting its behavior to the current status of the environment, so that jobs can be efficiently mapped to computing resources. The implementation extends the widely used GridWay meta-scheduler and relies on exponential smoothing to predict the execution and transfer times of jobs. An autonomic control loop (which takes account of CPU use and network capability) is used to alter job admission and resource selection criteria to improve overall job completion times and throughput. The implementation has been tested using a real testbed involving heterogeneous computing resources distributed across different national organizations.
1. Introduction
Computational and data grids allow the coordinated use of heterogeneous computing resources within large-scale parallel applications in science, engineering and commerce [1]. Since organizations sharing their resources in such a context still keep their independence and autonomy [2], grids are highly variable systems in which resources may join/leave the system at any time. This variability makes Quality of Service (QoS) highly desirable, though often very difficult to achieve in practice. One reason for this limitation is the lack of a central entity that orchestrates the entire system. This is especially true in the case of the network that connects the various components of a grid system. Achieving end-to-end QoS is often difficult, as QoS guarantees are hard to provide without resource reservation. Furthermore, in a real grid system, reservations may not always be feasible, since not all Local Resource Management Systems (LRMS) permit them. There are also other types of resource properties, such as bandwidth, which lack a global management entity, thereby making their reservation impossible.
∗ Correspondence to: Instituto de Investigación en Informática de Albacete (I3A), Universidad de Castilla-La Mancha, Campus Universitario s/n, 02071, Spain. Tel.: +34 967 599 200x2696; fax: +34 967 599 343.
E-mail addresses: [email protected] (L. Tomás), [email protected] (A.C. Caminero), [email protected] (O. Rana), [email protected] (C. Carrión), [email protected] (B. Caminero).
However, for applications that need a timely response (e.g., distributed engine diagnostics [3] or collaborative visualization [4]), the grid must provide users with some assurance about the use of resources—a non-trivial subject when viewed in the context of network QoS. In a grid, entities communicate with each other using an interconnection network—resulting in the network playing an essential role in grid systems [5]. In a previous contribution [6], the authors proposed an autonomic network-aware grid meta-scheduling architecture as a possible solution. This architecture takes into account the status of the system in order to make meta-scheduling decisions—paying special attention to network capability. This is a modular architecture in which each module works independently of the others, so that the architecture can be adapted to new requirements easily. In this paper, the aforementioned architecture has been implemented as an extension to the GridWay meta-scheduler [7], with case studies and performance results provided to demonstrate how it can be used. A scheduling technique that makes use of Exponential Smoothing (ES) [8] to calculate predictions of the completion times of jobs is also provided. Thus, the main contributions of this paper are: (1) an implementation of an architecture to perform autonomic network-aware metascheduling based on the widely used GridWay system; (2) a scheduling technique that relies on ES to predict the completion times of jobs; (3) a performance evaluation carried out using a testbed involving workloads and heterogeneous resources from several organizations.
The structure of the paper is as follows: Section 2 reviews existing approaches for supporting QoS in grids, along with grid meta-schedulers. Section 3 discusses a scenario in which an autonomic scheduler can be used. Section 4 contains details about the implementation based on an extension to the GridWay meta-scheduler. Section 5 presents a performance evaluation of our approach, with conclusions and suggestions for further work identified in Section 6.
2. Related work
The provision of QoS in a grid system has been explored by a number of research projects, such as GARA [5], G-QoSM [9] and GNRB [10–12], among others. The proposals which provide scheduling of users' jobs to computing resources are GARA and G-QoSM; GARA uses the DSRT [13] and PBS [14] schedulers, whilst G-QoSM uses DSRT. These schedulers (DSRT and PBS) only pay attention to the load of the computing resource, so a powerful unloaded computing resource with an overloaded network could be chosen to run jobs, which decreases the performance received by users, especially when the job requires a high network I/O.
A network is a key component within a grid system due to the coordinated use of distributed resources, so attention should be paid to it when carrying out tasks such as scheduling, migration, or monitoring [15]. Surprisingly, many of the above efforts do not take network capability into account when scheduling tasks, so other meta-schedulers available today, such as GRMS [16], CSF [17], VIOLA [18], Grid Service Broker [19], Grid Network Broker (GNB) [6] or GridWay [7], need to be analyzed. The main drawback of GRMS is that it is obsolete, since the last Globus version it deals with is 2.4 [20]. CSF provides reservations based on a number of features, including the network, but it is a centralized engine and is not intended for bulk data transfer; rather, it primarily tackles scheduling heterogeneities [21]. VIOLA provides co-allocation support for both computational and network resources, but the focus in VIOLA is on co-allocation and reservation—which is not always possible if the network is under the ownership of a different administrator. Grid Service Broker already includes network information provided by the Network Weather Service (NWS) [22] to perform the meta-scheduling, but it requires information on the effective bandwidth between all the data hosts and all the compute hosts in the system to perform network-aware scheduling, which makes the proposal difficult to scale. GNB [6] is an autonomic network-aware metascheduling framework, but it has only been tested by means of simulations. GridWay is straightforward to install and use, and one of its main strengths is its modular architecture, which allows extensions to be implemented easily. However, the network is not among the parameters it uses to perform meta-scheduling. GridWay has been chosen as the basis for the implementation described in this paper for the reasons provided above. However, it has also been necessary to make GridWay network-aware in order to fully support the devised techniques.
On the other hand, conceptually, an autonomic system requires: (a) sensor channels to sense the changes in the internal state of a system and the external environment in which the system is situated, and (b) motor channels to react to and counter the effects of the changes in the environment by changing the system and maintaining equilibrium.
The changes sensed by the sensor channels have to be analyzed to determine if any of the essential variables have gone out of their viability limits. If so, it has to trigger some kind of planning to determine what changes to inject into the current behavior of the system such that it returns to the equilibrium state within the new environment. This planning requires knowledge to select the right behavior from a
large set of possible behaviors to counter the change. Finally, the motor neurons execute the selected change. Sensing, Analyzing, Planning, Knowledge and Execution are thus the keywords used to identify an autonomic computing system. A common model based on these ideas was identified by IBM Research and defined as MAPE (Monitor-Analyze-Plan-Execute) [23]. There are, however, a number of other models for autonomic computing [24,25]—in addition to work in the agent-based systems community that shares commonalities with the ideas presented above.
There has been significant work already undertaken toward autonomic grid computing [26–29]. In [26], an architecture to achieve automated control and management of networked applications and their infrastructure, based on an XML format specification, is presented. Liu and Parashar [27] present an environment that supports the development of self-managed autonomic components, with the dynamic and opportunistic composition of these components using high-level policies to realize autonomic applications, and provide runtime services for policy definition, deployment and execution. In [28], an autonomic job scheduling policy for grid systems is presented. This policy can deal with the failure of computing resources and network links, but it does not take the network into account in order to decide which computing resource will run each user application; only idle/busy periods of computing resources are used to support scheduling. Finally, in [29], a simple but effective policy was formulated, which prioritized the finishing and acceptance of jobs over their response time and throughput. It was determined that, due to the dynamic nature of the problem, it could be best resolved by adding self-managing capabilities to the middleware. Using the new policy, a prototype of an autonomous system was built and succeeded in allowing more jobs to be accepted and finished correctly.
3. Autonomic Network-aware Meta-scheduling (ANM)
The availability of resources within a grid environment may vary over time—some resources may fail whereas others may join or leave the system at any time. Additionally, each grid resource must execute a workload that combines locally generated tasks with those that have been submitted from external (remote) user applications. Hence, each new task influences the execution of existing applications, requiring a resource selection strategy that can account for this dynamism within the system. Our autonomic approach uses a variety of parameters to make a resource selection, such as network bandwidth, CPU usage or resource goodness, amongst others. A motivating scenario for an autonomic network-aware meta-scheduler architecture is depicted in Fig. 1 and includes the following entities [6]:
• Users, each with a number of jobs/tasks to run.
• Computing resources, which may include clusters running a Local Resource Management System (LRMS), such as PBS [14].
• GNB (Grid Network Broker), an autonomic network-aware meta-scheduler.
• GIS (Grid Information Service), such as [30], which keeps a list of available resources.
• Resource monitor(s), such as Ganglia [31] or Iperf [32], which provide detailed information on the status of the resources.
• BB (Bandwidth Broker), such as [33], which is in charge of the administrative domain and has direct access to routers, mainly for configuration and topology discovery purposes.
• Interconnection network, such as a Local Area Network (LAN) or the Internet.
The interaction between components within the architecture is as follows:
1. Users ask the GNB for a resource to run their jobs. Users provide the features of their jobs (the ''job template''), which include the input/output files, the executable file and a deadline, amongst other parameters.
Fig. 1. Example scenario.
2. The GNB performs two operations for each job:
(a) It performs Connection Admission Control (CAC) by (i) filtering out the resources that do not have enough capacity to accept the job and (ii) verifying whether the required QoS can be fulfilled by executing the job on the selected resource, i.e. whether execution can be finished before the deadline set by the user. An estimation of job execution time is undertaken to support this [34,35] (explained in the next section).
(b) It chooses the most appropriate resource for the job's execution, i.e. the one that exhibits the best tolerance (explained in the next section).
Choosing a tolerance parameter and supporting CAC constitute the main autonomic capability of the system. The tolerance parameter is dynamically adjusted based on the estimated and real execution times of jobs on a particular resource, and admission control is subsequently used to limit the allocation of jobs to particular resources. As outlined in Section 2, the autonomic control loop involves a dynamic adjustment of the tolerance parameter to improve job completion times and resource utilization. Hence, if the selected resource is either not available or has an excessive workload, the resource with the next best tolerance is checked. This process is repeated until a suitable resource is found or until a certain number of resources have been checked. Finally, if it is not possible to allocate the job, it will be dropped, since its QoS cannot be fulfilled given the current resource availability. To achieve this, the GNB first obtains a list of resources from the GIS and subsequently gets the current load on each of these from the resource monitor.
3. When found, the GNB submits the job to the selected computing resource.
4. Finally, after job completion, the GNB will get the output sent back from the resource, and will forward it to the user.
Once the GNB has selected a resource to execute the job (step 2), the value of tolerance is updated. This value indicates how accurately completion times of jobs can be predicted. To achieve this, the GNB first estimates the job completion time (prior to actual execution), taking into account data transfer time and CPU time. Subsequently, on job completion, the estimated value is compared with the real value of the job completion time. The difference between these represents the accuracy with which such an estimation can be achieved—in practice often limited due to some sites and administrative domains not sharing information on the load of their resources. Resource contention is another obstacle: it causes host load and availability to vary over time, making completion time estimation difficult [36].
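The resource selection just described can be summarized in a short sketch. This is illustrative Python, not the actual GridWay/GNB code; the resource attributes and the CPU threshold are assumptions used only for illustration.

CPU_THRESHOLD = 0.25  # hypothetical minimum fraction of free CPU required for admission

def select_resource(job, resources, deadline, max_checked=10):
    """Pick a resource expected to finish `job` before `deadline`, or None.

    `resources` is a list of dicts with hypothetical keys:
      'tolerance' -- accuracy of past predictions (lower = more predictable)
      'cpu_free'  -- current fraction of free CPU on the resource
      'estimate'  -- callable(job) -> estimated completion time in seconds
    """
    # Try resources from the most to the least predictable one.
    candidates = sorted(resources, key=lambda r: r['tolerance'])[:max_checked]
    for r in candidates:
        if r['cpu_free'] < CPU_THRESHOLD:    # admission control on CPU load
            continue
        if r['estimate'](job) <= deadline:   # QoS check: can the deadline be met?
            return r
    return None  # no suitable resource: the job is dropped (QoS cannot be met)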
Consequently, it is necessary to estimate how trustworthy a specific resource is likely to be, or even whether it will be available to execute the job. The prediction information can be derived in two ways [36]: application-oriented and resource-oriented. In the application-oriented approaches, the running time of grid tasks is directly extrapolated by using information about the application, such as the running time of previous similar tasks. In the resource-oriented approaches, the future performance of a resource, such as CPU load and availability, is predicted by using historical information; these data are then used for forecasting the running time of a task, given information on the resource requirements of the task. In this work, a combination of these two approaches is used: we use application-oriented approaches to estimate the execution time of the application and resource-oriented approaches to calculate the time needed to perform the network transfers.
The resource that is selected to execute a job is the one that has provided the most predictable behavior up to the point the schedule is generated. Information on the status of already scheduled jobs is used to obtain network and CPU tolerances. The CPU execution time is calculated by using exponential smoothing functions [35] to tune resource status estimations, which are in turn calculated by using information about past executions of similar tasks. The information on network latency is obtained by means of Iperf [32], and information on jobs already scheduled is obtained by means of the GIS—in our case, Globus GRAM [30]. The monitoring of the network and computing resources is carried out with a given frequency, referred to as the monitoring interval. As the GNB performs scheduling of jobs to computing resources in between two consecutive monitoring intervals, it must take into account the jobs already scheduled on those resources—i.e. calculate the effective bandwidth taking account of the existing workload.
4. Implementation of ANM
The GNB has been implemented as an extension to the GridWay meta-scheduler—to achieve this, it was first necessary to make GridWay network-aware, as well as to perform the adaptations needed to develop a scalable and suitable solution for real grid environments. Details about how this was undertaken, along with a description of the predictions of network and CPU performance, are provided in this section.
4.1. Extending GridWay to be network-aware
GridWay has been modified to take into account the status of the network when ordering resources in the meta-scheduling
process [37], a value calculated using the Iperf tool [32]. Similarly, monitoring data for computing resources are obtained through Ganglia (already present in GridWay) and the GIS provided by the Globus Toolkit [38].
GridWay performs meta-scheduling by requiring the user to provide a job template which specifies the features of the job, including the executable file and the input and output files, amongst others. The job template has two tags that specify the criteria used by GridWay for selecting resources to run the job, namely REQUIREMENTS and RANK. With the REQUIREMENTS tag, the user can set the minimal requirements needed to run the job, thus applying a filter on all the resources known to GridWay. Once the REQUIREMENTS tag is processed, the set of resources that fulfill the REQUIREMENTS are sorted according to the criteria posed by the RANK tag. For both tags, several characteristics such as CPU type and speed, operating system, memory available, etc. can be specified. Many of these values are gathered through the Globus GIS module, while others (dynamic ones, such as the amount of free memory) are monitored through Ganglia [31].
For this implementation, the BANDWIDTH attribute has been introduced into GridWay. It refers to the effective network bandwidth on the path between the GridWay node and each computing resource, i.e. the path traversed by the data (I/O files) needed by the job to run. If an application has a large amount of input data, it must correspondingly choose an appropriate network path based on the value of this attribute. Both the REQUIREMENTS and the RANK expressions can use the BANDWIDTH attribute. Thus, the user can filter and/or sort resources by also taking into account the effective bandwidth from the GridWay node to each resource. Fig. 2 illustrates the extensions introduced into GridWay to make it network-aware; the shaded elements have been added by the authors. Details about these extensions can be found in [37].
Fig. 2. Conceptual view of the extensions introduced to GridWay.
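As an illustration, a job template could combine GridWay's standard tags with the new BANDWIDTH attribute roughly as follows. This is a hypothetical sketch: the file names, thresholds and units are invented, only REQUIREMENTS, RANK and BANDWIDTH are taken from the text above, and the remaining attribute names follow common GridWay template usage and may differ in a given installation.

# Hypothetical GridWay job template using the BANDWIDTH attribute (values are illustrative).
EXECUTABLE   = 3node.sh
INPUT_FILES  = input.dat
OUTPUT_FILES = result.dat
# Filter: enough free memory and a reasonable path from the GridWay node.
REQUIREMENTS = FREE_MEM_MB > 512 & BANDWIDTH > 10
# Rank: prefer the resources reachable through the highest effective bandwidth.
RANK         = BANDWIDTH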
4.2. Autonomic scheduler
Autonomic behavior in GridWay has been implemented by means of (1) performing connection admission control (CAC); (2) adding a new attribute named TOLERANCE that GridWay uses to perform the filtering and sorting of resources, reacting to changes in the state of the system; and (3) using Exponential Smoothing (ES) [8] to tune the predictions on the duration of jobs. This section presents details on how these have been implemented.
4.2.1. Connection admission control (CAC)
The Connection Admission Control algorithm (see Algorithm 1) checks resources (with R being the set of computing resources of the same VO) to identify those with enough CPU capacity (threshold_CPU) on which the job can be executed within its deadline (line 11). If the predicted completion time for the job is lower than the deadline, the resource is chosen (line 12). Otherwise, the next resource is checked. This process is repeated until all the resources are checked (line 10). Not all the known resources have to be checked; for efficiency and scalability, an upper limit (MaxRes) may be defined. If R_CAC is empty, then the job is rejected, or alternatively a negotiation process is started.
Algorithm 1 CAC algorithm.
1: R: set of resources known to GridWay {r_i / i in [1..n]}
2: R_CAC: set of resources that fulfill the CAC algorithm
3: CPU_free(r_i): the percentage of free CPU of resource r_i
4: j: a job
5: deadline(j): deadline of job j
6: completion(j, r_i): estimated completion time of job j on resource r_i
7: MaxRes: maximum number of resources to check
8: R_CAC = ∅
9: i = 1
10: while (CPU_free(r_i) ≥ threshold_CPU) AND (i ≤ MaxRes) do
11:     if (completion(j, r_i) < deadline(j)) then
12:         R_CAC = R_CAC + r_i
13:     end if
14:     increment i
15: end while
Algorithm 2 Scheduling algorithm.
1: j: a job
2: R_CAC: set of resources that fulfill the CAC criteria
3: r_exe: resource where j will be submitted
4: for all r_i in R_CAC do
5:     if Tolerance_{r_i} < Tolerance_{r_exe} then
6:         r_exe = r_i
7:     end if
8:     increment(i)
9: end for
4.2.2. Scheduling algorithm
Once the target set of resources (R_CAC) has been calculated by the CAC algorithm, the scheduling algorithm sorts them by taking into account their TOLERANCE (the way of estimating this value is explained in Section 4.2.3), from the lowest to the highest, as outlined in Algorithm 2. As discussed next, resources with a high TOLERANCE value are less predictable; hence, even though a job's estimated execution time on such a resource may be within the deadline, this does not mean that the deadline will actually be met. It must be noted that predictions about the durations of jobs have to be used, such as in [34,35].
Moreover, for performance and scalability reasons, the CAC and scheduling algorithms are tightly coupled. This means that they are executed together in such a way that, when a resource which could execute the job within its deadline is found, the job is submitted to that resource and the process stops. Hence, when the CAC is filtering resources, the list of resources to check has already been ordered by the scheduling algorithm taking into account their TOLERANCE values.
4.2.3. Including the TOLERANCE attribute in GridWay
In order to filter and order the list of resources known to GridWay considering the accuracy of previous scheduling decisions, a new attribute has been added to the job template. TOLERANCE has been implemented as a new attribute that can be used both in the RANK and REQUIREMENTS tags, in the same way as BANDWIDTH. The TOLERANCE attribute reflects the accuracy of predicting job completion times for each resource. In order to perform scheduling of jobs to computing resources and provide QoS to users (e.g. finish jobs before a deadline), predictions on the completion time of jobs must be calculated. This includes predictions on the transfer times (transfer of input and output files, along with the executable file) and on the execution time.
The calculation of the TOLERANCE attribute is motivated by [6], where each time a job has to be scheduled, both the transfer and execution latency of that job are calculated for each resource known
to the meta-scheduler. However, this is a time-consuming process and therefore not scalable, so instead it should only be carried out after choosing a resource (and not for all resources). Additionally, the calculation of the TOLERANCE attribute in [6] relies on the millions of instructions that a job has, which is a measure of the size of the job provided by the simulator but very hard to obtain in practice. This term has been substituted by the average execution time of jobs of the same type, which is a more realistic metric to measure in practice (Section 4.3.2 explains this). For these reasons, the scheduling process explained in [6] has been modified. In our approach, the GNB performs scheduling for each job request and uses the value of the TOLERANCE attribute to filter and sort the set of resources known to GridWay. TOLERANCE values are calculated (as outlined in Eq. (1)) after every job completion and associated with the resource on which the job has been executed. Subsequently, the GNB orders resources based on their TOLERANCE, from the lowest to the highest, i.e. from the most predictable to the least predictable one.

TOLERANCE(r_k) = TOLERANCE_cpu^{r_k} + TOLERANCE_net^{r_k}    (1)

TOLERANCE_net^{r_k} = (t_net_real^{r_k} − t_net_estimated^{r_k}) / MB    (2)

TOLERANCE_cpu^{r_k} = (t_cpu_real^{r_k} − t_cpu_estimated^{r_k}) / t_cpu_real^{r_k}(app)    (3)
The terms TOLERANCE_x^{r_k}, x = {net, cpu}, represent the accuracy of the previous predictions carried out by the GNB for the resource r_k, with k ∈ [1, n]. For TOLERANCE_net^{r_k}, the last measurement of network bandwidth between the GridWay node and r_k is considered, collected from the last update of this measure before the execution of the job. With this information, along with the total number of bytes to be transferred, an estimation of the transfer time of the job (t_net_estimated^{r_k}) is calculated. After job execution, the actual time needed to complete the transfers (t_net_real^{r_k}) can be obtained. Finally, with these two times, the updated network tolerance for the resource where the execution took place (r_k) is calculated. The value of TOLERANCE_net^{r_k} reflects how accurate the prediction of the transfer time has been for the given job. Similarly, TOLERANCE_cpu^{r_k} is calculated for each job after its completion.
Eqs. (2) and (3) show the actual formulas used for the last completed job, where MB represents the size of the job in megabytes, and t_cpu_real^{r_k}(app) represents the average execution time of a certain job (app) on a specific resource (r_k). To estimate future values of TOLERANCE, an approach similar to the one used by the Transmission Control Protocol (TCP) for computing retransmission time-outs [39] may be used. Hence, we can consider:
D = TOLERANCE(r_k) − Tolerance_{r_k}(i)    (4)

Tolerance_{r_k}(i + 1) = Tolerance_{r_k}(i) + D ∗ δ    (5)
where δ reflects the importance of the last sample in the calculation of the next TOLERANCE value (Tolerance_{r_k}(i + 1)). TOLERANCE is only considered for those resources known to GridWay to have enough available capacity to accept more jobs. The GNB keeps a TOLERANCE value for the network and the CPU capacity of each computing resource and modifies these in response to changes in the system. Fig. 3 illustrates the autonomic control loop for modifying the TOLERANCE parameter, as outlined above. The ∆ denotes the difference between the predicted and the real times; hence, it is related to Eqs. (2) and (3).
Fig. 3. Autonomic control loop for adapting the TOLERANCE parameter. The ''X'' in t_X_real and t_X_estimate refers to the set {net, cpu}.
4.3. Predicting and tuning resource performance
Two types of predictions are necessary, namely (1) predictions of the transfer times, and (2) predictions of the execution times. Furthermore, predictions must be tuned using exponential smoothing. These are explained next.
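Before the individual estimators are described, the following sketch (illustrative Python, not the GridWay code) shows how the per-resource TOLERANCE bookkeeping of Eqs. (1)-(5) can be maintained after each job completes; all names and the value of δ (delta) are assumptions.

def update_tolerance(tolerances, rk, t_net_real, t_net_est, t_cpu_real, t_cpu_est,
                     size_mb, avg_cpu_time, delta=0.2):
    """Update the smoothed TOLERANCE of resource `rk` stored in `tolerances` (a dict)."""
    tol_net = (t_net_real - t_net_est) / size_mb       # Eq. (2): transfer-time error per MB
    tol_cpu = (t_cpu_real - t_cpu_est) / avg_cpu_time  # Eq. (3): relative CPU-time error
    sample = tol_cpu + tol_net                         # Eq. (1): tolerance of this job
    previous = tolerances.get(rk, 0.0)
    d = sample - previous                              # Eq. (4)
    tolerances[rk] = previous + d * delta              # Eq. (5): smoothed update
    return tolerances[rk]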
4.3.1. Calculating the network performance
Once the scheduler has sorted the available resources by using their tolerance values, it is necessary to estimate the effective bandwidth between two end points in the network—these being the GNB and the computing resource where the job will be executed. This prediction is used for the CAC algorithm. Estimation of link bandwidth was implemented in [37], in which the Iperf tool monitors the available bandwidth from the GNB to all the computing resources it knows with a given frequency. However, this only provides the effective bandwidth at the moment when monitoring is performed, which may not be the bandwidth when a schedule needs to be defined (as other jobs may have been scheduled and are being transferred at that moment). It is therefore necessary to infer the effective bandwidth between two monitoring intervals—similar to [6]. We achieve this by considering the number of jobs that are being submitted to the selected resource at the point at which a schedule needs to be defined. Hence the effective bandwidth of the path between the GNB and a computing resource r_k at time t + x can be calculated as follows:

eff_bw(r_k)_{t+x} = Bw(r_k)_t / (#PrologJobs + 1)    (6)
where Bw(r_k)_t is the last measured value (at time t) of the available bandwidth between the GNB and the resource r_k selected to execute the job, and #PrologJobs is the number of jobs that are submitting data from the GNB to that resource. In order to take into account all the data being transferred, this number is updated with the new incoming connection (the ''+1'' in Eq. (6)). Note that x must lie in the interval (0, 1), which means that the estimation is made between two real measurements.
Once we have the effective bandwidth of a network path, the latency of the data transfers for the job over that path can be calculated by dividing the I/O file size of the job in MB (megabytes) by the effective bandwidth. These values (I/O file sizes) are known since the I/O files are specified in the job template. In this way, the estimated time to complete the transfers is obtained by using Eq. (7).

T_net_estimated^{r_k} = sizeFilesIn / eff_bw(r_k)_{t+x} + sizeFilesOut / Bw(r_k)_t    (7)
It must be noted that Bw(r_k)_t is used for calculating the time needed to complete the output transfers (epilog step), since we know the number of jobs that are sending input files but we cannot know how many jobs will be sending back their output files when the job being submitted is completed. Additionally, the destination of these output file transfers does not have to be the same for all of them. Thus, a grid meta-scheduler cannot have complete knowledge of the network structure, making it necessary to make assumptions about the effective bandwidth available in the future.
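A small sketch of this estimation (illustrative Python; units are assumed to be MB and MB/s, and the argument names are not those of the implementation):

def estimate_transfer_time(bw_last, prolog_jobs, size_in_mb, size_out_mb):
    """Estimate prolog + epilog transfer time for a job sent to a resource."""
    eff_bw = bw_last / (prolog_jobs + 1)  # Eq. (6): share the path with ongoing prologs
    t_prolog = size_in_mb / eff_bw        # input files over the inferred bandwidth
    t_epilog = size_out_mb / bw_last      # Eq. (7): epilog uses the last measured bandwidth
    return t_prolog + t_epilog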
Algorithm 3 Estimation of Execution Time (t_cpu_estimated^{r_k}(app)).
1: R: set of resources known to GridWay {r_1, r_2, ..., r_n}
2: app: the application from which the job to be executed is generated
3: t_cpu_real^{r_k}(app)_j: the j-th execution time of the application app on the resource r_k
4: DB_Resources_{r_k}: the filtered database with the information about the status of the resource r_k
5: CPU_free(r_k): the mean percentage of free CPU in the resource r_k between now and the deadline of the app, calculated by using the Exponential Smoothing (ES) function
6: Overload: the extra time needed due to the CPU usage at the chosen resource r_k
7: t_cpu_estimated^{r_k}(app) = (Σ_{j=1}^{n} t_cpu_real^{r_k}(app)_j) / n
8: Overload = t_cpu_estimated^{r_k}(app) ∗ (1 − CPU_free(r_k))
9: t_cpu_estimated^{r_k}(app) = t_cpu_estimated^{r_k}(app) + Overload
10: return t_cpu_estimated^{r_k}(app)
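For readers who prefer code, the estimation of Algorithm 3 (explained in Section 4.3.2 below) corresponds to the following short Python sketch; the data structures and names are hypothetical.

def estimate_cpu_time(past_runs, cpu_free_predicted):
    """past_runs: previous execution times of this application on resource rk;
    cpu_free_predicted: mean free-CPU fraction predicted by ES until the deadline."""
    base = sum(past_runs) / len(past_runs)        # line 7: average of past executions
    overload = base * (1.0 - cpu_free_predicted)  # line 8: penalty for predicted CPU usage
    return base + overload                        # lines 9-10: corrected estimate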
4.3.2. Calculating CPU latency
Predictions of job execution time are quite difficult to obtain, since there are performance differences between grid resources and their performance characteristics may vary for different applications (e.g. resource A may execute an application P faster than resource B, but resource B may execute application Q faster than A). With the aim of estimating as accurately as possible the time needed to execute the jobs on the selected resources, we apply the techniques developed in [34,35]. These techniques use application-oriented prediction techniques to estimate the execution time of the application and resource-oriented approaches to recalculate the execution time of the job depending on the predicted CPU status of the resource.
Predictions of execution times are performed as explained in Algorithm 3 and are based on the average of previous executions of an application on a particular resource (line 7)—this estimation takes into account the different input parameters. This average is calculated for each job type, and information related to previous executions of a specific job is used to determine an average execution time. After that, the prediction of the future status of the CPU of each resource is calculated by means of an exponential smoothing function. Finally, the mean execution time is determined by predicting the future CPU status of each resource (line 9).
4.4. Tuning predictions using exponential smoothing
As grids are highly distributed and heterogeneous systems where resources are shared among many users, the future status of resources is very difficult to predict. On the other hand, predictions of the future status of resources are needed in order to be able to estimate the time needed to complete a job on those resources. To this end, the exponential smoothing prediction method is used to calculate predictions of the status of resources.
Exponential Smoothing (ES) [8] is a statistical technique for detecting significant changes in data by ignoring the fluctuations irrelevant to the purpose at hand. It provides a simple prediction method based on both historical and current data [40]. In ES (as opposed to moving average smoothing), older data is given progressively less relative weight (importance) whereas newer data is given progressively greater weight. In this way, ES assigns exponentially decreasing weights as the observations get older. Hence, recent observations are given relatively more weight in forecasting than the older ones. ES is a procedure for continually revising a forecast in the light of more recent experience, and it is employed in making
short-term forecasts. There are several types of ES. In this work, triple exponential smoothing is used, which is also named Holt–Winters [8]. With this kind of ES, the trend and seasonality of the data are taken into account in the predictions. Trend refers to the long-term pattern of the data, whilst seasonality is defined as the tendency of time-series data to exhibit behavior that repeats itself every L periods. Such trends are apparent when a user wants to execute a large, complex simulation on a grid at periodic intervals (such as for analyzing sales data or running a scientific experiment with new observations). Conversely, during vacation periods, probably not all the staff take their vacation at the same time, so the load on the resources would decrease progressively. We have chosen ES because our data is likely to exhibit both behaviors. ES also provides a simple and efficient method which can be implemented without slowing down the performance of the system.
Regarding seasonality, it is likely that CPU availability increases at night, even more so in our scenario in which resources are shared. Similarly, resource usage may also depend on the day of the week, with greater workload during the week and greater availability at weekends. Thus, data collection and analysis need to be run at different times of the day to make an accurate prediction about resource status. Hence, a weekly log is used as input to the ES function for predicting the future status of the network and resources for the next whole day, as we are then able to account for both seasonal behaviors. In our approach, predicted information is updated every 30 min to improve the results from time to time, depending on the new knowledge observed from recent resource and network behavior.
The forecasting method used is presented in Eq. (8). At the end of a time period t, with x_t being the observed value of the time series at time t (in our case, the CPU usage), f_{t+m} is the forecasted value for m periods ahead, T_t is the trend of the time series, L_t is the deseasonalized level and S_t is the seasonal component.

ES = Σ_{m=initT}^{deadline} f_{t+m} = Σ_{m=initT}^{deadline} (L_t + T_t ∗ (m + 1)) ∗ S_{t+m−L}    (8)

L_t = α ∗ (x_t / S_{t−L}) + (1 − α) ∗ (L_{t−1} + T_{t−1})    (9)

T_t = β ∗ (L_t − L_{t−1}) + (1 − β) ∗ T_{t−1}    (10)

S_t = γ ∗ (x_t / L_t) + (1 − γ) ∗ S_{t−L}    (11)
The deseasonalized level (L_t) is calculated as shown in Eq. (9), taking into account the previous values obtained for trend and seasonality and the actual value observed. The new trend of the time series (T_t) is the smoothed difference between two successive estimations of the deseasonalized level, as described in Eq. (10). Finally, the seasonal component (S_t) is calculated using Eq. (11). This expression combines the most recently observed seasonal factor, given by the demand x_t divided by the deseasonalized series level estimate (L_t), with the previous best seasonal factor estimate for this time period. Thus, seasonality indicates how much this period typically deviates from the period (in our case weekly) average. At least one full season of data is required for the computation of seasonality.
In the equations, α, β and γ are constants that must be estimated in such a way that the mean square error is minimized. These weights are called the smoothing constants. For each component (level, trend, seasonal) there is a smoothing constant that falls between zero and one. It is important to set correct values for them in order to predict the behavior of resources and network as accurately as possible.
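As a complement to Eqs. (8)-(11), the sketch below shows one Holt–Winters update step and the corresponding forecast term in illustrative Python (the authors fit α, β and γ with R; the list-based handling of the seasonal cycle is an assumption).

def holt_winters_step(x_t, level, trend, season, alpha, beta, gamma):
    """Update level, trend and seasonal factors with the new observation x_t.
    `season` holds the last L seasonal factors, ordered oldest to newest."""
    s_old = season[0]                                                  # S_{t-L}
    new_level = alpha * (x_t / s_old) + (1 - alpha) * (level + trend)  # Eq. (9)
    new_trend = beta * (new_level - level) + (1 - beta) * trend        # Eq. (10)
    new_season = gamma * (x_t / new_level) + (1 - gamma) * s_old       # Eq. (11)
    return new_level, new_trend, season[1:] + [new_season]

def forecast(level, trend, season, m):
    """Forecast m periods ahead (1 <= m <= len(season)): the summand of Eq. (8)."""
    return (level + trend * (m + 1)) * season[m - 1]                   # S_{t+m-L}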
Fig. 5. Visualization Pipeline (VP) test.
Fig. 4. Grid testbed topology.
In our work, the R program [41] is used for calculating these parameters. We need at least a two-week log, divided into weekly data sets. Using the first of these data sets (a one-week log), the R program estimates these values for the last week. These results are then compared with the real status registered for the following week, and the α, β and γ values are adjusted to minimize the mean square error.
5. Experiments and results
This section describes the experiments conducted to test the usefulness of this work, along with the results obtained.
5.1. Experiment testbed
The evaluation of the autonomic implementation has been carried out in a real grid environment. The testbed consists of resources located at two different universities, as illustrated in Fig. 4. At the University of Castilla-La Mancha (UCLM, Albacete, Spain) there are resources located in two different buildings. In one building, the Instituto de Investigación en Informática de Albacete (I3A), there is one machine which performs the scheduling of tasks and several computational resources (10 desktop computers belonging to other users). In a second building, the Escuela Superior de Ingeniería Informática (ESII), there is a cluster with 88 cores managed by the PBS [42] scheduler, which is also shared with other users. All these machines belong to the same administrative domain (UCLM) but are located within different subnets. In addition, there is another computational resource at the National University of Distance Education (Universidad Nacional de Educación a Distancia, UNED, Madrid, Spain), which is also a desktop computer; the network which links this resource with the UCLM resources is the Internet. Table 1 outlines the main characteristics of these computing resources. Notice that these machines belong to other users, so they have their own local background workload (including network load). Each machine is the desktop computer of a member of the staff (not a dedicated machine) at UCLM or UNED, so they may have different CPU and network background workloads, which are not controlled by the testbed.
5.2. Workload
To evaluate our implementation we use one of the GRASP [43] benchmarks, named 3node, since it is possible to model a workload
similar to that used in [6]. The 3node test consists of sending a file from a ''source node'' to a ''computation node'', which performs a pattern search, generating an output file with the number of successes. The output file is then sent to the ''result node''. This test is meant to mimic a pipelined application that obtains data at one site, computes a result on that data at another, and analyzes the result at a third site. Furthermore, this test has parameterizable options to make it more compute intensive (the compute_scale parameter), which means that the run time is increased, and/or more network demanding (the output_scale parameter), which means that the files to be transferred are bigger. This versatility is the reason why we have chosen this test to measure the performance of our approach. With these two parameters, it is possible to generate different types of jobs. Therefore, in order to emulate the workload used in [6], the compute_scale parameter takes the value 10 and output_scale takes the value 1. The input file size is 48 MB, and this value of output_scale creates output files whose size is the same as the input file size of 48 MB.
To better validate and evaluate this implementation, one of the NAS Grid Benchmarks (NGB) [44], named Visualization Pipeline (VP), has also been used. This test has different workflow dependencies: some jobs are more computationally intensive while others are network demanding. Therefore, the VP test allows us to explore a wide spectrum of running conditions. Fig. 5 shows the workflow of this test. VP represents chains of compound processes, like those encountered when visualizing flow solutions as the simulation progresses. It comprises three NPB [45] problems, namely BT, MG, and FT, which fulfill the roles of flow solver, post-processor, and visualization module, respectively. This triplet is linked together into a logically pipelined process, where subsequent flow solutions can be computed while post-processing and visualization of previous solutions are still in progress [44]. The circular nodes depict BT jobs, square nodes are MG jobs and trapezium nodes are FT jobs. All of them (BT, MG and FT) are defined in the NGB benchmarks.
5.3. Performance evaluation
GridWay has been used in several research articles to compare meta-scheduling techniques. Among others, Vazquez-Poletti et al. [46] present a comparison between EGEE and GridWay over EGEE resources. This comparison is both theoretical and practical, through the execution of a fusion physics plasma application on the EGEE infrastructure, and shows the better performance of
Table 1 Characteristics of the resources.
Domain | Machine | CPU | RAM | Globus (version)
UCLM (I3A) | GridWayI3A.uclm.es | 2 Intel Pentium 4 CPU 3.00 GHz | 2 GB | v. 4.0.5
UCLM (I3A) | R1 | 2 AMD Opteron 244, 1.80 GHz | 1 GB | v. 4.0.3
UCLM (I3A) | R2 | 2 AMD Opteron 244, 1.80 GHz | 1 GB | v. 4.0.3
UCLM (I3A) | R3 | 2 AMD Opteron 244, 1.80 GHz | 1 GB | v. 4.0.3
UCLM (I3A) | R4 | 2 AMD Opteron 244, 1.80 GHz | 1 GB | v. 4.0.4
UCLM (I3A) | R5 | 2 AMD Opteron 244, 1.80 GHz | 1 GB | v. 4.0.8
UCLM (I3A) | R6 | 2 Intel Pentium 4 CPU 3.20 GHz | 3 GB | v. 4.0.7
UCLM (I3A) | R7 | Intel Core 2 Duo CPU 2.66 GHz | 2 GB | v. 4.0.8
UCLM (I3A) | R8 | 2 AMD Opteron 244, 1.80 GHz | 1 GB | v. 4.0.7
UCLM (I3A) | R9 | 2 Intel Pentium 4 CPU 3.00 GHz | 2 GB | v. 4.0.4
UCLM (I3A) | R10 | 2 Intel Pentium 4 CPU 3.00 GHz | 1 GB | v. 4.0.8
UCLM (ESII) | Cluster1 | 22 AMD bipro dual core Opteron CPU 2.4 GHz | 4 GB | v. 4.0.8
UNED | Uned R1 | Intel Core 2 Duo CPU 2.80 GHz | 4 GB | v. 4.0.8
Fig. 6. 3node average time: (a) average completion time; (b) average completion time boxplot.
GridWay over LCG-2 Resource Broker. Several theoretical comparisons, among others [47–49], compare GridWay with other meta-scheduling techniques, highlighting the fact that it is a valuable and versatile tool for managing dynamic and heterogeneous grids.
This section compares the following meta-scheduling schemes: (1) the original GridWay meta-scheduler [7], which chooses the first discovered resource with enough free CPU to execute a job (labeled as GW in the figures); (2) the GridWay meta-scheduler using the CPU power to select resources (labeled as GW-MHZ); (3) the network-aware GridWay extension presented in [37] (labeled as GW-Net); (4) the autonomic network-aware meta-scheduler with ES disabled (labeled as ANM); and (5) the autonomic network-aware meta-scheduler with ES enabled (labeled as ANM-ES). For the last two schemes, the CAC functionality has been disabled in order to make a fair comparison with the other scheduling techniques, which do not have this feature; this means that jobs are accepted regardless of whether their QoS can be fulfilled.
To evaluate the performance of the aforementioned scheduling techniques in our environment, we emulate a workload similar to [6] by using the 3node test. To do this, we simulate 5 different users, each of whom submits their jobs with one type of scheduling technique. User requests consist of 1000 jobs of the 3node type with the parameters set as explained in Section 5.2. Results of these submissions are presented in Fig. 6. Fig. 6(a) represents the average completion time of the 3node test with each scheduling technique, and Fig. 6(b) represents a boxplot of the execution times of the 1000 3node executions. The latter is a convenient way
of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (maximum). As we can see in both figures, the average completion time is lower when using ANM and ANM-ES, in spite of having the CAC functionality disabled. The best results are obtained for ANM-ES. This scheduling technique achieves a time reduction of 26.11% over GW, of 17.47% over GW-MHZ and of 15.75% over GW-Net. Moreover, using the Exponential Smoothing predictions also results in a gain of 9.35% over the completion times obtained by ANM. Furthermore, as Fig. 6(b) depicts, there are other important results that highlight the better behavior of the ANM-ES technique. These are:
• Median time reduction: 33.92% compared with GW, 24.16% compared with GW-MHZ and 26.62% compared with GW-Net.
• Maximum time reduction: there is a clear reduction in this metric, since ANM-ES selects a resource whose behavior is more predictable. Because of that, the probability of choosing a bad resource which delays the execution is quite low. Thus, the maximum time reduction obtained is 53.94% over GW, 66.75% over GW-MHZ and 55.53% over GW-Net.
Moreover, comparing ANM and ANM-ES, the latter performs better since the completion time predictions are more accurate when using exponential smoothing, and this makes the TOLERANCE value more reliable. Thus, ANM-ES obtains a reduction of 9.96% over ANM for the median time, and of 53.61% for the maximum time. The worst case is therefore clearly improved by using ANM-ES compared to the other techniques.
Table 2 Percentage of resource usage by 3node tests.
 | GW (%) | GW-MHZ (%) | GW-Net (%) | ANM (%) | ANM-ES (%)
% of used resources | 25 | 38.4 | 50 | 75 | 62.5
Maximum % of resource usage | 79 | 42.5 | 32 | 42 | 46
Minimum % of resource usage | 21 | 1 | 12 | 1 | 8
Fig. 7. QoS not fulfilled.
It must be noted that for the GW scheduling technique, the box is narrower since the selected resource is the first discovered resource (whenever possible). Consequently, most of the jobs are executed on the same resource, so the time needed to complete the executions is more uniform.
On the other hand, resource usage is also improved by using the autonomic behavior. Information about resource usage is presented in Table 2. The first line represents the percentage of resources used for executing the 3node test with each type of scheduling technique. The second line shows the usage of the most saturated resource, i.e. the one to which each scheduling technique sends the most jobs. Finally, the last line represents the load submitted to the least used resource. A higher number of hosts is used when ANM and ANM-ES are running (as the first row depicts). Also, the load is spread over the resources in such a way that there are no overloaded resources (as the second row depicts). Finally, those resources whose behavior is not predictable are used less; here, the use of exponential smoothing improves the predictability of the resources.
Despite the fact that ANM uses more resources than ANM-ES (which may make us think that ANM balances the load more efficiently), this does not necessarily mean that ANM uses resources more efficiently. On the one hand, it could be better to submit more jobs to a resource with better behavior, even if the load is not balanced. It may also not be advantageous to keep balancing the load, since this may mean that worse resources (i.e. those that do not have exactly the desired capability) are used, rather than focusing the load on the best resources. Furthermore, in some cases ANM may not be accurate enough in its predictions and an unsuitable resource may be selected. This fact can be deduced from the minimum percentage of resource usage for ANM, as Table 2 shows. In that case, the resource was selected because of its TOLERANCE; however, the resource did not present such a predictable behavior since the exponential smoothing function was not used.
Regarding the QoS perceived by the users, measured as the number of jobs that are executed fulfilling the deadline set by the users, another experiment has been conducted taking into account the previous results. In this case, we enable the CAC system for ANM-ES (labeled ANM-ES CAC) and compare its results against the previous ones by setting different deadlines for the submitted jobs. Fig. 7 depicts the number of jobs that would not
have fulfilled the QoS if the deadline had been set to 300, 180 and 120 s, respectively. As the other techniques do not have CAC capability, we use the information obtained in the previous test to know how many jobs would have finished on time; hence, we count the number of jobs for which the execution time was lower than the deadline. It must be noted that for ANM-ES CAC, all the jobs rejected by the CAC algorithm are counted as jobs which do not fulfill the QoS requirements. The same information for both ANM techniques is also presented to highlight the improvement obtained by using the CAC algorithm.
As this figure depicts, for a 300 s deadline the differences are negligible, since almost all the jobs can finish their execution before the deadline. In this case, the worst behavior is presented by GW-MHZ due to the way in which resources are selected: sometimes resources with low network connectivity are selected, hence the time needed to complete the transfers is high, which leads to missing the established deadline. For a deadline of 180 s, it is again not useful to focus purely on CPU speed. Moreover, the autonomic behavior again performs better, considering the three cases that use it (ANM, ANM-ES and ANM-ES CAC). However, ANM-ES CAC seems to work worse than when the CAC is disabled. This is due to the fact that there may be jobs that are not accepted because it is estimated that their deadline cannot be met, which does not happen when CAC is disabled. Additionally, there may be jobs whose estimated completion time is slightly larger than the deadline (e.g. the estimation says 181 s and the deadline is 180 s). These jobs are rejected when CAC is enabled; however, when CAC is disabled, such a job is executed, and its execution time may turn out to be within the deadline. Nonetheless, these scenarios do not involve deadlines that are hard to fulfill. However, as shown in Fig. 7, for a 120 s deadline the behavior is different. In this case, it is really important to reject jobs when it is clear that their QoS cannot be fulfilled. This way, there will be fewer executions, and it is more likely that the remaining jobs finish on time. For these reasons, ANM-ES CAC shows an improvement of 34.7% over ANM-ES.
Next, an evaluation of the performance received by users emulating a more realistic situation is presented. In this test, several jobs were submitted to the grid testbed during a long time interval. Moreover, jobs are submitted at the same time for all the meta-scheduler configurations; thus, there is competition among all the jobs submitted in all the tests. Furthermore, the duration of this interval is not fixed and depends on the way tests are submitted. Different cases have been analyzed, placing more or less demand on the grid environment. This test illustrates how different grid workloads affect the performance received by users. Three different user behaviors, which imply different levels of stress on the grid system, have been used: case 1, in which VP tests (from the NGB suite [44]) are submitted every 35 min; case 2, which consists of submitting VP tests every 20 min; and case 3, where one VP test is submitted as soon as the previous one has finished (labeled as T_VP). Hence, case 1 is the least stressing, and case 2 is the most stressing. For all the cases, 5 VP tests were submitted. In this experiment, the metric used for evaluating the performance obtained by the user is the average completion time of all VPs.
Fig. 8 depicts the results for each submission frequency and each scheduling technique, and Table 3 presents a summary of this information. As can be seen in Fig. 8, the best performances
Table 3 Percentage of improvement by using the autonomic implementation with ES (ANM-ES).
 | 35 min (%) | 20 min (%) | T_VP (%) | Average (%)
ANM-ES vs. GW | 31.53 | 25.19 | 18.37 | 25.03
ANM-ES vs. GW-MHZ | 17.09 | 21.72 | 5.21 | 14.67
ANM-ES vs. GW-Net | 10.92 | 9.03 | 6.97 | 8.98
ANM-ES vs. ANM | 8.6 | 4.8 | 3.27 | 5.56
Fig. 8. VP average completion time.
for all the submission frequencies are again obtained by ANM and ANM-ES, even taking into account that the CAC is disabled for a fairer comparison. Hence, the stability of the behavior of a resource is the best criterion for choosing the resource to run a job. On the other hand, these results also highlight the usefulness of the exponential smoothing function for estimating the time needed to complete the execution of a job. This leads to more accurate predictions and makes it possible to obtain an improved value of the resource TOLERANCE. Hence, the resource selection process is better and the time needed to complete a job is decreased. For these reasons, the ANM-ES technique obtains the best results. The main differences arise when ANM-ES is used, although the largest difference is between the network-aware (GW-Net, ANM, and ANM-ES) and non-network-aware techniques (GW and GW-MHZ), due to the fact that VP is very network demanding. Moreover, the largest differences are obtained for the 35 min submission frequency, since at this rate there is less load on the grid; there are more free resources, and it is possible to select a better resource because the system has more idle resources to choose from.
To sum up, these results highlight the usefulness of the TOLERANCE parameter for performing a better selection of resources, and consequently for improving the QoS delivered to users. Moreover, the benefits of using exponential smoothing to predict the future status of resources are also illustrated. In this way, we obtain, on average, around 25% completion time reduction compared with GW (as presented in Table 3), around 15% compared with GW-MHZ and almost 10% compared with the network-aware GridWay implementation. Furthermore, the improvement provided by the use of the exponential smoothing function means a completion time reduction of more than 5% over ANM.
Finally, from a system point of view, the autonomic techniques (ANM-ES and ANM) also present better behavior since the workload is better balanced over the resources. This can be seen in Fig. 9, which depicts the percentage of jobs submitted to each resource when using each technique. It must be noted that when using GW
Fig. 9. Resource usage.
Finally, from a system point of view, the autonomic techniques (ANM-ES and ANM) also present better behavior, since the workload is better balanced over the resources. This can be seen in Fig. 9, which depicts the percentage of jobs submitted to each resource with each technique. When using GW or GW-MHZ, the resource usage is not balanced, since resources are selected based on the order in which they are discovered or on their CPU speed (both static parameters), rather than on the available capacity (such as the percentage of free CPU) to execute the incoming job. If GW-Net is used, the resource usage is also unbalanced, since only the resources with the best bandwidth are selected. However, when the autonomic techniques (ANM-ES and ANM) are used, the scheduling of jobs is more balanced, since almost all the resources are used. Some resources are still used more than others because their performance differs: from the TOLERANCE point of view, they behave better because their performance is more predictable. This is especially true when ES is used, since the time needed to complete the jobs is better estimated, as the predicted status of the resource is taken into account; this makes the resource usage slightly more balanced. Hence, choosing the more predictable resources is a better way of reducing the time needed to complete the jobs.

6. Conclusions and future work

This paper presents a working implementation of an architecture which combines concepts from grid scheduling with autonomic computing, in order to provide users with a more adaptive job management system. The architecture takes the status of the network into account when reacting to changes in the system, considering the load on both computing resources and network links when making a scheduling decision. This architecture was originally presented and tested by means of simulations in [6]; this work presents an implementation based on GridWay [7], an open source grid meta-scheduler. The architecture schedules jobs onto computing resources in such a way that the network does not become overloaded.
In order to implement the autonomic network-aware meta-scheduler, a first step was to extend GridWay to be network-aware. Subsequently, the autonomic behavior was implemented by means of (1) adding a new attribute, TOLERANCE, that GridWay uses to filter and sort resources; (2) performing connection admission control (CAC); and (3) using Exponential Smoothing (ES) [8] to improve the predictions of job execution times. The TOLERANCE attribute and the CAC were originally introduced and tested by means of simulations in [6], but both had to be adapted for the GridWay implementation. Moreover, the use of ES alongside the CAC algorithm is a novelty of this work. The main contributions of this paper are: (1) an implementation of the architecture to perform autonomic network-aware meta-scheduling based on GridWay; (2) a scheduling technique that relies on ES to predict the completion time of jobs; and (3) a performance evaluation carried out on a real testbed involving several workloads and heterogeneous resources from several organizations. Several ways of scheduling jobs onto computing resources are evaluated, namely GW, GW-MHZ, GW-Net (presented in [37]), ANM (presented in [6]) and ANM-ES (a novelty of this work). This evaluation uses different workloads and heterogeneous resources belonging to different organizations, and shows that the autonomic behavior based on exponential smoothing improves the performance received by users and yields a better load balance among resources.

Finally, several lines of future work are envisaged. Among others, more sophisticated prediction methods (apart from ES) for execution times will be studied. Also, it must be noted that this work focuses on the scheduling process; in the future, the authors plan to address issues related to network reservations, i.e., to develop a Bandwidth Broker which collaborates with the meta-scheduler in order to perform network reservations. With this approach, the effective bandwidth between measurements could be better predicted and the estimations would be more accurate. Hence, the TOLERANCE value would improve and there would be fewer chances of choosing a wrong resource due to a misprediction. An additional extension involves deploying the autonomic implementation on EGEE resources [50], providing a larger grid testbed in which resources are more widely distributed and more users submit jobs to the system.

Acknowledgments

This work was supported by the Spanish MEC and MICINN, as well as the European Commission FEDER funds, under Grants "CSD2006-00046" and "TIN2009-14475-C04". It was also partly supported by the JCCM under Grants "PBI08-0055-2800" and "PII1C09-0101-9476".
References

[1] U. Schwiegelshohn, et al., Perspectives on Grid computing, Future Generation Computer Systems 26 (8) (2010) 1104–1115.
[2] I. Foster, C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure, 2nd ed., Morgan Kaufmann, 2003.
[3] J. Austin, T. Jackson, M. Fletcher, M. Jessop, P. Cowley, P. Lobner, Predictive maintenance: distributed aircraft engine diagnostics, in: The Grid 2: Blueprint for a New Computing Infrastructure, Elsevier Science, 2004 (Chapter).
[4] F.T. Marchese, N. Brajkovska, Fostering asynchronous collaborative visualization, in: Proc. of the 11th Intl. Conference on Information Visualization, Zürich, Switzerland, 2007. ISBN: 0-7695-2900-3.
[5] I. Foster, M. Fidler, A. Roy, V. Sander, L. Winkler, End-to-end quality of service for high-end applications, Computer Communications 27 (14) (2004) 1375–1388.
[6] A. Caminero, O. Rana, B. Caminero, C. Carrión, Performance evaluation of an autonomic network-aware metascheduler for Grids, Concurrency and Computation: Practice and Experience 21 (13) (2009) 1692–1708.
[7] E. Huedo, R.S. Montero, I.M. Llorente, A modular meta-scheduling architecture for interfacing with pre-WS and WS Grid resource management services, Future Generation Computer Systems 23 (2) (2007) 252–261.
[8] P.S. Kalekar, Time series forecasting using Holt–Winters exponential smoothing, Tech. Rep., Kanwal Rekhi School of Information Technology, 2004.
[9] R.A. Ali, O. Rana, G. von Laszewski, A. Hafid, K. Amin, D. Walker, A model for quality-of-service provision in service oriented architectures, International Journal of Grid and Utility Computing (2005).
[10] D. Adami, et al., Design and implementation of a Grid network-aware resource broker, in: Proc. of the Intl. Conference on Parallel and Distributed Computing and Networks, Innsbruck, Austria, 2006.
[11] F. Xhafa, A. Abraham, Computational models and heuristic methods for Grid scheduling problems, Future Generation Computer Systems 26 (4) (2010) 608–621.
[12] D. Guan, Z. Cai, Z. Kong, Provision and analysis of QoS for distributed Grid applications, in: Proc. of the 5th Intl. Conference on Wireless Communications, Networking and Mobile Computing, WiCOM, 2009, pp. 4191–4194.
[13] H.-H. Chu, K. Nahrstedt, CPU service classes for multimedia applications, in: Proc. of the Intl. Conference on Multimedia Computing and Systems, ICMCS, Florence, Italy, 1999.
[14] G. Mateescu, Extending the portable batch system with preemptive job scheduling, in: SC2000: High Performance Networking and Computing, Dallas, USA, 2000.
[15] C. Cárdenas, M. Gagnaire, Evaluation of flow-aware networking (FAN) architectures under GridFTP traffic, Future Generation Computer Systems 25 (8) (2009) 895–903.
[16] K. Kurowski, B. Ludwiczak, J. Nabrzyski, A. Oleksiak, J. Pukacki, Dynamic Grid scheduling with job migration and rescheduling in the GridLab resource management system, Scientific Programming 12 (4) (2004) 263–273.
[17] X. Wei, Z. Ding, S. Yuan, C. Hou, H. Li, CSF4: a WSRF compliant meta-scheduler, in: Proc. of the Intl. Conference on Grid Computing & Applications, GCA, Las Vegas, USA, 2006.
[18] O. Waldrich, P. Wieder, W. Ziegler, A meta-scheduling service for co-allocating arbitrary types of resources, in: Proc. of the 6th Intl. Conference on Parallel Processing and Applied Mathematics, PPAM, Poznan, Poland, 2005.
[19] S. Venugopal, R. Buyya, L.J. Winton, A Grid service broker for scheduling e-science applications on global data Grids, Concurrency and Computation: Practice and Experience 18 (6) (2006) 685–699.
[20] GridLab Resource Management. Web page at: http://www.gridlab.org/WorkPackages/wp-9/ (date of last access: 30.08.10).
[21] F. Palmieri, Network-aware scheduling for real-time execution support in data-intensive optical Grids, Future Generation Computer Systems 25 (7) (2009) 794–803.
[22] R. Wolski, N.T. Spring, J. Hayes, The network weather service: a distributed resource performance forecasting service for metacomputing, Future Generation Computer Systems 15 (5–6) (1999) 757–768.
[23] J.O. Kephart, D.M. Chess, The vision of autonomic computing, Computer 36 (1) (2003) 41–50. http://dx.doi.org/10.1109/MC.2003.1160055.
[24] M. Parashar, Autonomic Grid computing, in: M. Parashar, S. Hariri (Eds.), Autonomic Computing: Concepts, Requirements, Infrastructures, CRC Press, 2006.
[25] S. Dobson, S.G. Denazis, A. Fernández, D. Gaïti, E. Gelenbe, F. Massacci, P. Nixon, F. Saffre, N. Schmidt, F. Zambonelli, A survey of autonomic communications, ACM Transactions on Autonomous and Adaptive Systems (TAAS) 1 (2) (2006) 223–259.
[26] X. Dong, S. Hariri, L. Xue, H. Chen, M. Zhang, S. Pavuluri, S. Rao, Autonomia: an autonomic computing environment, in: Proc. of the IEEE Intl. Conference on Performance, Computing and Communications, 2003.
[27] H. Liu, M. Parashar, S. Hariri, A component-based programming model for autonomic applications, in: Proc. of the Intl. Conference on Autonomic Computing, ICAC, New York, USA, 2004.
[28] J.H. Abawajy, Autonomic job scheduling policy for Grid computing, in: Proc. of the 5th Intl. Conference on Computational Science, ICCS, Atlanta, USA, 2005.
[29] R. Nou, F. Julià, K. Hogan, J. Torres, A path to achieving a self-managed Grid middleware, Future Generation Computer Systems 27 (1) (2011) 10–19.
[30] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke, A directory service for configuring high-performance distributed computations, in: Proc. of the 6th Symposium on High Performance Distributed Computing, HPDC, Portland, USA, 1997.
[31] M.L. Massie, B.N. Chun, D.E. Culler, The Ganglia distributed monitoring system: design, implementation, and experience, Parallel Computing 30 (5–6) (2004) 817–840.
[32] NLANR/DAST, Iperf: the TCP/UDP bandwidth measurement tool. Web page at: http://sourceforge.net/projects/iperf/ (date of last access: 30.08.10).
[33] S. Sohail, K.B. Pham, R. Nguyen, S. Jha, Bandwidth broker implementation: circa-complete and integrable, Tech. Rep., School of Computer Science and Engineering, The University of New South Wales, 2003.
[34] L. Tomás, A. Caminero, B. Caminero, C. Carrión, Using network information to perform meta-scheduling in advance in Grids, in: Proc. of the 16th International Conference on Parallel Computing, Euro-Par, Ischia, Italy, 2010.
[35] L. Tomás, A. Caminero, C. Carrión, B. Caminero, Exponential Smoothing for network-aware meta-scheduler in advance in Grids, in: Proc. of the 6th Intl. Workshop on Scheduling and Resource Management on Parallel and Distributed Systems, SRMPDS, in conjunction with the 39th Intl. Conference on Parallel Processing, ICPP, San Diego, USA, 2010.
[36] Y. Zhang, W. Sun, Y. Inoguchi, Predict task running time in Grid environments based on CPU load predictions, Future Generation Computer Systems 24 (6) (2008) 489–497.
[37] L. Tomás, A. Caminero, B. Caminero, C. Carrión, Studying the influence of network-aware Grid scheduling on the performance received by users, in: Proc. of Grid Computing, High-Performance and Distributed Applications, GADA, Monterrey, Mexico, 2008.
[38] I.T. Foster, Globus Toolkit version 4: software for service-oriented systems, in: Proc. of the Intl. Conference on Network and Parallel Computing, NPC, Beijing, China, 2005.
[39] W.R. Stevens, TCP/IP Illustrated: The Protocols, Addison-Wesley, ISBN: 0-201-63346-9, 1994.
[40] M. Dobber, R. van der Mei, G. Koole, A prediction method for job runtimes on shared processors: survey, statistical analysis and new avenues, Performance Evaluation 64 (7–8) (2007) 755–781.
[41] The R Foundation. Web page at: http://www.r-project.org/ (date of last access: 30.08.10).
[42] Portable Batch System. Web page at: http://www.openpbs.org (date of last access: 30.08.10).
[43] G. Chun, H. Dail, H. Casanova, A. Snavely, Benchmark probes for Grid assessment, in: Proc. of the 18th Intl. Parallel and Distributed Processing Symposium, IPDPS, Santa Fe, New Mexico, 2004.
[44] M. Frumkin, R. Van der Wijngaart, NAS Grid Benchmarks: a tool for Grid space exploration, in: Proc. of the 10th Intl. Symposium on High Performance Distributed Computing, 2001.
[45] The NAS Parallel Benchmarks. Web page at: http://www.nas.nasa.gov/Resources/Software/npb.html (date of last access: 30.08.10).
[46] J. Vazquez-Poletti, E. Huedo, R. Montero, I. Llorente, A comparison between two Grid scheduling philosophies: EGEE WMS and GridWay, Multiagent and Grid Systems 3 (4) (2007) 429–439.
[47] J. Seidel, O. Waldrich, W. Ziegler, P. Wieder, R. Yahyapour, Using SLA for resource management and scheduling - a survey, Tech. Rep. CoreGRID TR-0096, Institute on Resource Management and Scheduling, 2007.
[48] GSA-RG, Grid scheduling use cases, Tech. Rep. GFD-I.064, Global Grid Forum, 2006.
[49] C. Grimme, Grid metaschedulers: an overview and up-to-date solutions, Tech. Rep., University of Dortmund, 2006.
[50] C. Vázquez, E. Huedo, R.S. Montero, I.M. Llorente, Federation of TeraGrid, EGEE and OSG infrastructures through a metascheduler, Future Generation Computer Systems 26 (7) (2010) 979–985.
Luis Tomás is a Ph.D. student in Computer Science at the University of Castilla-La Mancha, Spain. He received his B.E. and M.E. degrees in Computer Science from the University of Castilla-La Mancha (Albacete, Spain) in 2007 and 2009, respectively. He has been working on resource management and job scheduling for Grid environments since 2007. His current research focuses on efficient meta-scheduling in advance in Grids to provide QoS to users.
Agustín Caminero is an Assistant Professor of Computer Science in the Department of Communication and Control Systems at The National University of Distance Education (Madrid, Spain). He obtained a Ph.D. degree in Computer Science with European Mention from the University of Castilla-La Mancha (Albacete, Spain) in 2009. His interests include meta-scheduling in Grids and Clouds, Quality of Service (QoS), simulation, and e-learning.
Omer Rana is a Professor of Computer Science at Cardiff University and the Deputy Director of the Welsh eScience Center. He holds a Ph.D. degree in "Parallel Architectures and Neural Computing" from Imperial College (University of London).
Carmen Carrión is an Associate Professor of Computer Architecture and Technology in the Computing Systems Department at the University of Castilla-La Mancha. She holds a Ph.D. degree in Physics from the University of Cantabria, and her interests include the architecture of interconnecting devices, meta-scheduling and QoS in Grids and Clouds.
Blanca Caminero is an Associate Professor of Computer Architecture and Technology at the Computing Systems Department (University of Castilla-La Mancha). She holds a Ph.D. Degree in Computer Science, and her current research interests are QoS support and metascheduling in Grids and Clouds.