Active rule learning using decision tree for resource management in Grid computing


Future Generation Computer Systems 27 (2011) 703–710 Contents lists available at ScienceDirect Future Generation Computer Systems journal homepage: ...



Future Generation Computer Systems journal homepage: www.elsevier.com/locate/fgcs

Leyli Mohammad Khanli, Farnaz Mahan ∗, Ayaz Isazadeh
Computer Science Department, Faculty of Mathematical Sciences, University of Tabriz, Tabriz, Iran

Article info

Article history: Received 11 August 2010; received in revised form 21 December 2010; accepted 28 December 2010; available online 8 January 2011.

Keywords: Grid computing; Rule learning; Decision tree; Resource management; Grid-JQA

Abstract

Grid computing is becoming a mainstream technology for large-scale resource sharing and distributed system integration. One underlying challenge in Grid computing is resource management. In this paper, active rule learning is considered for resource management in Grid computing. Rule learning is very important for updating rules in an active database system. However, it is also very difficult because of a lack of methodology and support. A decision tree can be used in rule learning to cope with the problems arising in active semantic extraction, termination analysis of the rule set, and rule updates. Our aim in rule learning is also to learn new attributes in rules, such as time and load balancing, from instances of a real Grid environment, which a decision tree can provide. In our work, a set of decision trees is built in parallel on training data sets based on the original rule set. Each learned decision tree can be reduced to a set of rules, and conflicting rules can then be resolved. Results from cross validation experiments on a data set suggest that this approach may be effectively applied for rule learning. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

There are many definitions of Grid computing; the most commonly quoted is the one given by Ian Foster: “Resource sharing and coordinated problem solving in dynamic multi-institutional virtual organizations” [1]. A Grid is a very large-scale network computing system that can scale up to an Internet-size environment, in which all kinds of computing, storage and data resources, as well as scientific devices or instruments, are distributed across multiple organizations and administrative domains [2,3]. Grid computing is an emerging technology that enables users to share a large number of computing resources distributed over a network. In Grid environments, virtual organizations (VOs) associate heterogeneous users and resource providers. It is not known how large individual VOs will be, but it is reasonable to imagine resource sharing among populations with tens of thousands of users and thousands of resources [3,4]. Supercomputers, SMPs, clusters, desktop PCs, or even mobile computing devices such as PDAs can be computing resources in the Grid architecture.

The Grid resource management system (GRMS) is an important component because it is responsible for storing resource information across the Grid, accepting requests for resources, discovering and scheduling suitable resources that match the requests from the global Grid resources, and executing the requests on scheduled resources. The design and implementation of a GRMS is challenging because the Grid is geographically distributed, heterogeneous and autonomous in nature [1,2,4]. Any time that a set of resources needs to be allocated over a set of users, we face an NP-complete optimization problem [5]. By managing resources with active ECA rules, we aim to reduce the cost of this NP-complete allocation problem in both the On-Line and Off-Line systems.

∗ Corresponding author. Tel.: +98 9144111803. E-mail addresses: [email protected], [email protected] (F. Mahan).

0167-739X/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2010.12.016

1.1. The problem

The aim is to find an adaptive resource management method suited to the real-time Grid environment. The overall problem is to respond quickly and to increase the performance of resource management. To this end, we need a learning system to update the resource management rules. We need to recognize which rules should be consolidated, which rules should be modified, and which rules should be invalidated. Rule updating, in relation to instances, must provide:

1. The ability to change the value of some parameters in the rules.
2. The ability to add new attributes to the original rules and to create new ones.
3. The ability to create new rules that do not conflict with the original rules.

After learning, our resource management must have the following characteristics:

1. Guarantee to respond to most requests in a real Grid.
2. Respond more quickly with a learning system than without a learning system.
3. Have a higher performance and be more adaptive with a learning system than without a learning system.


L.M. Khanli et al. / Future Generation Computer Systems 27 (2011) 703–710

1.2. Motivation

In reality, resource management can be very complex and may require co-allocation of different resources, such as specific amounts of CPU hours, system memory, and network bandwidth for data transfer [3,6]. In this paper we need an active learning database to manage the information and rules. We use Grid-JQA, an architecture supporting such rules in Grid environments, which has been described in previous research [7–9]. Grid Java based Quality of service management by Active database (Grid-JQA) is a framework that provides workflow management for quality of service on different types of resources, including networks, CPUs, and disks. It also encourages Grid customers to specify their quality of service needs based on their actual requirements [7,9].

An active database system (ADBS) is a database system that monitors situations of interest and triggers an appropriate response in a timely manner when the events of interest occur [10,7]. The active behavior of a system is generally specified by means of rules that describe the situations to be monitored and the reactions of the system when these situations are encountered. In its general form, a rule consists of an event, a condition and an action; such rules are known as Event–Condition–Action rules or ECA rules. In Grid-JQA there is an Active Grid Information Server that automatically selects optimal resources using active database ECA rules and requests resource allocation [10,7,11,12].

We focus on active rule learning to improve resource management in the Grid. Active rule learning is very important for an active database system implementation, and resource management in Grid computing also needs to be kept up to date. However, it is difficult, because many problems arising in rule learning cannot be eliminated directly by using traditional database techniques [13,14]. There is a lack of methodology and support for rule learning.
Different representations of concepts, such as neural networks and decision trees, may be learned from a set of labeled data [14,15]. A decision tree is useful for representing rules, and its learning is reasonably fast and accurate.

1.3. Objective and the claim

Our goal is to obtain an updated active resource management after learning, where the learning depends on new instances in a real event-based environment. Hence, the architecture of our active database has two parts: an Off-Line part and an On-Line part. The On-Line part has static original rules and manages the resources in real time, while the Off-Line part performs learning, simulates new examples of the environment, and creates new and updated rules based on the original ones. We focus on 19 ECA rules, as the original rules, that were introduced in [7,11] for resource management in Grid-JQA. Some rules have static parameters that are determined by an expert, but when these rules are used in a real-time environment, these parameters must be changed and updated, based on the instances received, in order to improve the resource management performance. Because we receive instances at different times and in different states, we must also add new attributes, such as time and load balancing, to the rules. This helps make rule learning efficient.

Our approach to learning different rules is to parallelize the process of learning by using decision trees. In this research, the final representation in the active Grid information server must be ECA rules, so we must create rules from decision trees. It is straightforward to reduce a decision tree to rules. The strategy pursued here is to break the instances into n partitions based on events, and then learn a decision tree on each of the n partitions in parallel [14,16]. A new decision tree will be grown independently when it is needed. At each learning period in each partition, we may have several decision trees related to each rule that must be combined in some way. In [17], the decision trees are combined using metalearning. Also, in some partitions there may be new decision trees growing independently when there are new attributes in the instances. So the independent decision tree of each rule can be viewed as an agent learning a little about a domain. In our approach, after combining, each decision tree at each partition is converted to a rule, and the validity of the rules is then checked against the original rules. The performance of a new rule must also be evaluated. This new rule set is added to the On-Line system and is then used to respond to new incoming instances.

1.4. Paper outline

This paper is organized as follows. Section 2 reviews related work, including the architecture of the Grid-JQA and AGIS ECA rule engine and decision trees. In Section 3, we describe the active rule learning process for resource management in Grid-JQA and show the use of decision trees for rule learning. In Section 4, we evaluate our approach. Finally, we conclude the paper in Section 5.

2. Related works

Resource management on Grids is a complex procedure involving coordinated resource sharing and responding to the requirements of users. Previous research uses the matchmaking framework implemented in Condor-G. The matchmaking procedure evaluates job and resource rank and requirements expressions to find ideal matches. Additional notable research in this area uses a variety of makespan-minimizing heuristics and the GrADS metascheduler [18]. The GridLab Resource Management System (GRMS) is a job metascheduling and resource management framework. It is based on dynamic resource discovery and selection, mapping and scheduling methodologies. It is also able to deal with resource management challenges [11]. The Globus resource management architecture [7] includes an information service.
It plays an important role because it provides the information about the current availability and capability of resources. The architecture also has a co-allocator that is responsible for coordinating the allocation and management of resources at multiple sites, a manager that is responsible for taking RSL (Resource Specification Language) specifications, and GRAM (Grid Resource Allocation Management), which is responsible for managing local resources [7].

The Grid-JQA system architecture consists of a Grid portal, an Active Grid Information Server (AGIS) (which includes ECA resource manager rules and ECA fault manager rules), a fault detector, and GRAM. To execute a job with Grid-JQA, a user uses RSL to describe a resource type, a resource condition, and the number of resources. RSL is the specification language used by the Globus Toolkit to describe task configuration and service requirements. The user then sends the RSL to Grid-JQA, and the RSL parser extracts the resource type and resource condition and sends them to an AGIS. The main goal of using Grid-JQA is to provide users with seamless access for submitting jobs to a pool of heterogeneous resources and, at the same time, to monitor dynamically the resource requirements for the execution of applications [8,9].

In this paper, we used Grid-JQA, which was introduced in [7] and is represented in Fig. 1 (taken from [7]). The resource manager automatically selects the set of optimal resources among the set of candidate resources using ECA rules and requests resource allocation, thus making it convenient for a user to execute a job. It also guarantees efficient and reliable job execution through a fault tolerance service [7,9]. Initially, the base ECA rules are specified according to the experience of experts; we call these the original ECA rules.

Fig. 1. Architecture for AGIS ECA rule from [7].

Active database systems are characterized by the ability to automatically monitor predefined situations and react to them according to predefined actions. An active database system must provide a knowledge model and an execution model for supporting the reactive behavior [13,19,20]. ECA rules are one way of implementing this kind of functionality [12,13]. The ECA rule execution model specifies how a set of rules is treated at runtime. A brief description of rule execution is given in Fig. 2 [8,13]. An ECA rule has the following general syntax [12,13]:

    DEFINE RULE rule_name
    ON event
    IF condition
    DO action

A rule is defined through a name and a description. The rule_name is the identifier of the rule. Every rule has a unique rule_name. The description of a rule commonly includes three components.

Events: describe an event to which the rule may be able to respond. Events can be roughly subdivided into two categories: primitive events, which correspond to elementary occurrences, and composite events, which are composed of other composite or primitive events.

Conditions: specify what has to be checked once the rule is triggered and before it is executed.

Actions: describe the task to be performed by the rule if the relevant event has taken place and the condition has evaluated to true.

One such active rule in a Grid is as follows [7]:

    AGIS_Advertise_Rule
    ON Event AGIS_Advertise
    IF Condition select agis from AGIS_UPPER where agis.threshold = min(*.threshold)
    DO Action home.Advertise(agis);

The Grid is a real-time environment, so we need to learn ECA rules for high performance. The most important questions of rule learning are how to meaningfully represent the semantics of the real world and how to guarantee a safe, error-free rule behavior. Indeed, experience in active applications indicates that it is not straightforward to achieve this objective [7,13,15,21].
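The ON/IF/DO structure above can be illustrated with a small in-memory dispatcher. This is only a minimal sketch and not the AGIS implementation: the names `EcaRule`, `RuleEngine`, `Demo_Inc_Rule` and the dictionary-based context are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class EcaRule:
    """One Event-Condition-Action rule (hypothetical structure)."""
    name: str
    event: str
    condition: Callable[[Context], bool]
    action: Callable[[Context], None]


class RuleEngine:
    """Minimal dispatcher: signal an event, evaluate conditions, run actions."""

    def __init__(self) -> None:
        self.rules: List[EcaRule] = []

    def define(self, rule: EcaRule) -> None:
        # Every rule must have a unique name, as the text requires.
        if any(r.name == rule.name for r in self.rules):
            raise ValueError(f"duplicate rule name: {rule.name}")
        self.rules.append(rule)

    def signal(self, event: str, ctx: Context) -> List[str]:
        """Trigger the rules defined ON this event; return the names of those that fired."""
        fired = []
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                rule.action(ctx)
                fired.append(rule.name)
        return fired


# Usage with a made-up rule: increase a counter while it is below a bound.
engine = RuleEngine()
engine.define(EcaRule(
    name="Demo_Inc_Rule",
    event="Demo_Inc",
    condition=lambda ctx: ctx["threshold"] < ctx["max_threshold"],
    action=lambda ctx: ctx.update(threshold=ctx["threshold"] + 1),
))
state = {"threshold": 1, "max_threshold": 3}
fired = engine.signal("Demo_Inc", state)
```

Keeping the condition and the action as separate callables mirrors the IF and DO parts of the syntax, so a learned rule could replace only the condition while the action stays unchanged.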

It is important to update the existing rules in a dynamic environment, which is our aim. But which rules should be consolidated, which rules should be modified, and which rules should be invalidated? In order to assist the rule developer, appropriate rule learning methodologies and techniques are needed. Most existing approaches for active experiment selection and active learning are based on neural networks or statistical models [21]. Both kinds of methods have been shown to perform well on function approximation and classification tasks, but they lack a way of interpreting the resulting model. One approach for active learning of ‘easy to interpret’ rules is presented in [22]. The method is based on different heuristics that aim to maximize the volume that is covered by the rules. Its main weakness is that the derived rules are reviewed insufficiently, so that errors in the rule base may persist. The approach described in [23] only delivers regions of interest where experiments could be performed. Concrete experiments have to be provided by other means, e.g. a human operator or a database [21,19].

We used decision trees for ECA rule learning in resource management. We make no reference to intelligent ECA rule learning. For description, classification and generalization of data, a decision tree is a valuable tool [24]. This comes from the compact and intelligible representation of the learned classification function, and from the availability of a large number of efficient algorithms for automated construction [16,25]. Decision trees give a hierarchical way to represent the rules underlying data [26]. For the automatic construction of decision trees, ID3 or CART [27] can be used. In a decision tree, at each node an attribute must be chosen to split the node’s examples into subsets [26,28]. In this paper, we only consider the case of continuous attributes.
There are different measures [29–31] for determining how good a particular split is for an attribute. Continuous attribute splits have the form Attribute ≤ X or Attribute > X [13,30]. Initially, we do not need such a measure, because we determine the splits of a tree from the base ECA rules.

Fig. 2. The steps of rule processing. (Stages shown: event source, signaling of the event occurrence, triggering, selection and evaluation of the triggered rules, execution of the selected rule.)

3. Our proposed active rule learning in resource management

We used a Grid-JQA architecture that includes ECA rules. In Grid-JQA, the ECA resource manager rules eliminate user participation from resource management and select the set of optimal resources; they provide a user with the convenience of job execution and ensure that the job is executed efficiently. In particular, ECA resource manager rules reduce the performance degradation that is caused by the user’s arbitrary selection of resources and ensure efficient job execution after resource allocation. A user’s arbitrary resource selection does not always guarantee that the set of optimal resources is selected, and it reduces the performance of job execution. ECA resource manager rules provide the set of optimal resources for job execution [7]. Because the Grid environment is generally unpredictable, we need to optimize our ECA rules dynamically and learn new rules from history to achieve high performance. For this purpose, we use decision trees for rule learning. In Section 3.1, we describe the extended Grid-JQA architecture, which adds rule learning to the rule engine; in Section 3.2, we present our rule learning.

3.1. Extended Grid-JQA architecture

We define an active learning Grid resource server as an active database that includes ECA resource management rules for resource selection and job scheduling [7–9]. We extend the Grid-JQA architecture to achieve rule learning. We suppose the system has two parts. Fig. 1 shows the On-Line system, which communicates with the environment via the active rule base in order to respond to an event. The other part, for rule learning, is an Off-Line system that simulates new conditions and examples. We describe it in Section 3.2. Fig. 3 shows the architecture of the Grid-JQA rule engine with rule learning added to it.

Fig. 3. Architecture of the Grid-JQA and AGIS ECA rule engine.

The resource parameters are processors, network, and memory. Following previous work [10], the user can assign a weight to each parameter to show the importance of that parameter to the user. The user does not assign a weight to the memory parameter, because too little memory causes the job to fail to execute properly, while more memory does not have any effect. Let us assume that N tasks are submitted to the Grid infrastructure. The request of task j is described by the vector of QoS parameters $q^{Task_j}$, $j = 1, 2, \ldots, N$, given in Eq. (1), and the weights of the parameters are given in Eq. (2). In these equations, k is the number of parameters.



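\[ q^{Task_j} = \left( q_1^{Task_j}, q_2^{Task_j}, \ldots, q_k^{Task_j} \right) \tag{1} \]

\[ W^{Task_j} = \left( W_1^{Task_j}, W_2^{Task_j}, \ldots, W_{k-1}^{Task_j} \right), \qquad 0 \le W_i^{Task_j} \le 1, \qquad \sum_{i=1}^{k-1} W_i^{Task_j} = 1. \tag{2} \]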

For example, if CPU is important for one task, the client will set the CPU weight to 1 and the other weights to zero. The last QoS parameter is used for memory, $q_k^{Task_j} = q_{mem}^{Task_j}$, and, as mentioned above, no weight is assigned to the memory parameter. GRAM advertises resource-level capabilities to the AGIS. When the resource capabilities change, the fault detector informs the AGIS by a fault event. Let us assume that the Grid infrastructure consists of M resources. The capabilities of a Grid resource are expressed by the resource parameter vector $q^{Res_j}$, $j = 1, 2, \ldots, M$, as it appears in Eq. (3).



\[ q^{Res_j} = \left( q_1^{Res_j}, q_2^{Res_j}, \ldots, q_k^{Res_j} \right) \tag{3} \]

The elements $q_i^{Res_j}$, $i = 1, 2, \ldots, k$, indicate the independent capabilities of the jth resource that affect its performance. The solution proposed in previous work is to use a threshold, as in Eq. (4), instead of a maximum. The satisfy operator was introduced in [10,7]: a resource R satisfies a task T if R can meet the requirements of T and guarantee its QoS parameters. If the summation is at least the threshold, the AGIS will choose the resource as the best matched resource.

Fig. 4. ECA Rule Learning Off-Line system.

\[ Res_i \ \text{satisfies} \ Task_j \iff \left( q_{mem}^{Res_i} \ge q_{mem}^{Task_j} \right) \ \text{and} \ \left( \sum_{l=1}^{k-1} \frac{q_l^{Res_i}}{q_l^{Task_j}} \times W_l^{Task_j} \ge Threshold \right). \tag{4} \]
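The satisfy test of Eq. (4) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Grid-JQA code: the function name `satisfies`, the convention that memory is the last entry of each vector, and the example values are assumptions made here, and the task parameters are assumed to be positive.

```python
from typing import Sequence


def satisfies(res_q: Sequence[float], task_q: Sequence[float],
              task_w: Sequence[float], threshold: float) -> bool:
    """Return True if the resource satisfies the task in the sense of Eq. (4).

    res_q and task_q hold k parameters each, with memory as the last entry
    (an assumption of this sketch); task_w holds the k-1 weights of Eq. (2).
    """
    k = len(task_q)
    if len(res_q) != k or len(task_w) != k - 1:
        raise ValueError("expected k resource values, k task values, k-1 weights")
    if any(w < 0 or w > 1 for w in task_w) or abs(sum(task_w) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    # Memory is a hard constraint: too little memory makes the job fail.
    if res_q[-1] < task_q[-1]:
        return False
    # Weighted sum of capability ratios over the k-1 non-memory parameters
    # (zip stops at the shortest input, i.e. at the k-1 weights).
    score = sum(r / t * w for r, t, w in zip(res_q, task_q, task_w))
    return score >= threshold


# Example: (CPU, network, memory) with all weight on the CPU parameter.
resource = [3000.0, 100.0, 4.0]
task = [2000.0, 100.0, 2.0]
weights = [1.0, 0.0]
ok_low = satisfies(resource, task, weights, threshold=1.0)
ok_high = satisfies(resource, task, weights, threshold=2.0)
```

With these values the weighted sum is 1.5, so the same resource is accepted at a threshold of 1 and rejected at a threshold of 2, which is why the choice of threshold matters.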

Now the problem is: what should the threshold value be? To solve this problem, and taking into consideration that an active database is used, we introduce a decision tree for ECA rule learning in an active database to calculate the threshold in relation to environmental changes.

3.2. Active rule learning process

Rule learning is an important part of an active database system. Its aim is the precise specification of a reliable, usable rule set, which has a safe rule behavior and does not exhibit unintended side-effects. As we have said, our proposed system consists of two parts. The first part, the On-Line system, was described in Section 3.1. The other part is the Off-Line system, which simulates new conditions and examples and constructs a new rule base from the original rules using a decision tree. If a new rule has a good effect, it is added to the active rule base. In our architecture, the Off-Line part of the system performs rule learning. The rule learning architecture is shown in Fig. 4. According to the events and points raised in the Grid-JQA matching algorithm [7–9], we apply the 19 ECA rules given in [10,7] as the original ECA rules for implementing rule learning in resource management. Active rule learning can be regarded as a whole process divided into four steps (see Fig. 4):

Step 1. Rule Analysis: checks whether the rule responds adequately during the execution of an active application or on event occurrences, and hence whether it needs updating or modifying.

Step 2. Rule Update: after rule analysis, the rule set may require updating. When active requests change, ECA rules have to be updated in accordance with the new values and conditions. Rule update is indispensable for the rule set to remain useful. To achieve this update and learning, we used a decision tree for each rule, based on the instances of the event. In this paper, we focus on this step.

Step 3. Rule Validation: we used the original ECA rule set to verify the new rule, because the new or changed rule must be based on the original rules according to the expert requirements.

Step 4. Rule Evaluation: we simulate the results to check whether the performance improves, so that the change can be transferred to the active rule set in the On-Line part.

It is also possible to update or invalidate existing rules by mining the history of triggered rules. If a rule has not been triggered recently, it may not be appropriate for inclusion in the rule set and should be invalidated [16]. In this paper, we focus on step 2, which uses a decision tree for Grid ECA rule update in resource management, as described in the following.

Fig. 5. Our approach for an active rule.

A. Decision tree for rule learning in step 2

We want to update the ECA rules for high performance in resource management. Here we use the following terms:

Original rules: the base rules that an expert has first determined.

Learning period: the period of time during which the Off-Line system undergoes learning. It is determined by the expert.

We create an initial decision tree for each of the original ECA rules and take the root node of each tree to be the event of the corresponding rule. At the end of each tree, we have a maximum of two leaves, which describe the actions of the rule. Fig. 5 presents our approach. In this paper we focus on the rule update step (step 2), whose algorithm is shown in Fig. 6.
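The initial tree described above (event at the root, one test, at most two action leaves) maps directly back to an ECA rule. The following is a minimal sketch of that mapping, assuming a hypothetical `RuleTree` structure and a textual rule format with an optional ELSE branch; the example values are taken from the Inc_Threshold rule discussed in Section 4.

```python
from dataclasses import dataclass


@dataclass
class RuleTree:
    """Initial tree for one rule: event at the root, one test, up to two leaves."""
    event: str
    attribute: str
    operator: str          # e.g. "<", "<=", ">"
    threshold: str         # symbolic or numeric bound, kept as text
    then_action: str       # leaf reached when the test holds
    else_action: str = ""  # optional second leaf


def tree_to_rule(name: str, tree: RuleTree) -> str:
    """Reduce the tree to the textual ECA form used in this paper."""
    lines = [
        f"DEFINE RULE {name}",
        f"ON {tree.event}",
        f"IF {tree.attribute} {tree.operator} {tree.threshold}",
        f"DO {tree.then_action}",
    ]
    if tree.else_action:
        lines.append(f"ELSE {tree.else_action}")
    return "\n".join(lines)


tree = RuleTree(
    event="Inc_Threshold",
    attribute="home.Threshold",
    operator="<",
    threshold="_MAX_THRESHOLD",
    then_action="home.IncThreshold()",
    else_action="home.IssueEvent(AGIS_Advertise)",
)
rule_text = tree_to_rule("Inc_Threshold_Rule", tree)
```

Because the event and actions are fixed by the original rule, learning in this representation only has to change the test (its bound or its attribute), which keeps the updated rule close to what the expert specified.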


Fig. 6. Rule update algorithm.

When a rule event occurs, it refers to the related ECA rule in the On-Line part. Our system does not need to update the rule if the conditions of the event fit the conditions of the related ECA rule. But if they do not fit, or the rule cannot respond better and quickly, we need to update the rule in the Off-Line rule learning system. In step 1 of the rule learning system, the instance is analyzed with regard to the related rule for that event. Here two cases may occur: (1) the value of a parameter in the condition of the rule does not match the instance, or (2) the instance contains an attribute that is new with regard to the related rule. For example, consider the following ECA rule:

    Insert_Ready_Task_Rule
    ON Event insert into READY_TASKS (task)
    IF Condition (select res from IDLE_RESOURCES
        where sum_{i=1..k} (res.q(i) / task.q(i)) * task.w(i) ≥ Threshold)
    DO Action {
        home.satisfy++;
        if (home.satisfy / (home.satisfy + home.unsatisfy) > _MAX_SATISFY) then
            home.IssueEvent(Inc_Threshold);
        home.GRAMAllocate(res, task);
        home.Monitor(res);
        Delete res from IDLE_RESOURCES;
        Insert res into RUNNING_RESOURCES;
        Delete task from READY_TASKS;
        Insert task into RUNNING_TASKS;
    }
    ELSE Action {
        home.unsatisfy--;
        if (home.unsatisfy / (home.satisfy + home.unsatisfy) < _MIN_SATISFY) then
            home.IssueEvent(Dec_Threshold);
    }

In this rule, for case 1, home.threshold in the instance may take a value that the value of MAX_THRESHOLD cannot match, so we need to learn a new value if we must respond to it. For case 2, a real request may contain an attribute that does not exist in the relevant rule, such as time or load balancing. Such attributes can be learnt over time. One benefit of our learning is that it can learn attributes obtained from heuristics while the system is working. The decision tree has the ability to insert them into our rules, so we use it in step 2.

In step 2, for rule updating, we consider these two cases. The first case occurs because our environment is dynamic: a parameter may have been given an initial value that is not exactly right. So a new decision tree is generated for such requests with new values, and these trees are collected in the related rule class up to the end of each learning period. In this case we have several trees in each rule class with regard to new instances. We then combine the trees in each class into a new tree. This new tree must be able to cover all instances related to it. The rule can now be created from the tree and considered against the base rule of this class. In the second case, a new tree is generated for such instances. If the number of instances of one event that cannot be handled better and quickly is high, we create a tree for the event and then create its new rule. Notice that the rule is new, but in reality it is an extension of the related rule.

Next, we compare and validate the new rules against the base rules of the system. If there is no conflict between the rules, the new rules can be evaluated for addition to the ECA rules in the On-Line system. The final rules are ordered by their accuracy. Then we merge the two rule sets together and eliminate any redundant rules that have been created during the process of updating and learning. A decision tree gives better performance for updating ECA rules.

In the first case, when combining trees, we must scope continuous attributes by finding one or more attributes that are the same but for which the continuous value chosen for the test is different (e.g. threshold ≤ 5 and threshold ≤ 5.7) [32]. Here, for example, the threshold parameter in our example rule is a dynamic attribute. We need to support the domains of such parameters that occur in this environment. If the attribute value must be greater than the existing values, the minimum value is used (e.g. threshold > 5 and threshold > 8 results in threshold > 5 as the condition of the combined tree). If the attribute value must be less than or equal to existing values, the maximum value is used in the combined tree.

4. Case study

Under the model explored in this paper, a training data set is a new condition that occurs in a real environment and to which the original rules cannot respond quickly. A training data set is broken into n subsets with regard to events and rules. Decision trees learn from each of these n subsets in parallel.
For example, one of the original rules is:

    Inc_Threshold_Rule
    ON Event Inc_Threshold
    IF Condition (home.Threshold < _MAX_THRESHOLD)
    DO Action home.IncThreshold( )
    ELSE Action home.IssueEvent(AGIS_Advertise); // under loaded

The decision tree of this rule is shown in Fig. 7. _MAX_THRESHOLD is a continuous attribute, initially determined by the expert. However, it may subsequently require changing on the basis of real examples and new data received from the environment. So the value of _MAX_THRESHOLD is not static and must be learned over time.
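When several trees learned during one learning period test the same continuous attribute with different bounds, Section 3.2 states how the bounds are merged: the minimum is kept for "greater than" tests and the maximum for "less than or equal" tests. A minimal sketch of that merging step follows; the function name `combine_tests` and the tuple representation of a test are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# A test on one continuous attribute: (operator, value), operator ">" or "<=".
Test = Tuple[str, float]


def combine_tests(tests: Iterable[Test]) -> Test:
    """Merge tests on the same attribute so the result covers every instance."""
    tests = list(tests)
    if not tests:
        raise ValueError("no tests to combine")
    operators = {op for op, _ in tests}
    if len(operators) != 1:
        raise ValueError("all tests must use the same operator")
    op = operators.pop()
    values = [value for _, value in tests]
    if op == ">":
        # "greater than" tests: the smallest bound covers all the others
        return (op, min(values))
    if op == "<=":
        # "less than or equal" tests: the largest bound covers all the others
        return (op, max(values))
    raise ValueError(f"unsupported operator: {op}")


# The two examples given in Section 3.2.
merged_gt = combine_tests([(">", 5), (">", 8)])
merged_le = combine_tests([("<=", 5), ("<=", 5.7)])
```

Choosing the loosest bound in each direction is what lets the combined tree cover all instances that any of the individual trees covered.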

Fig. 7. Decision tree of an example ECA rule. (Root: the Inc_Threshold event; test: home.Threshold against _MAX_THRESHOLD; leaves: home.IncThreshold( ) and, in the else branch, home.IssueEvent(AGIS_Advertise).)

Fig. 8. Histograms of the task computation lengths found in the NPACI JOBLOG. (Panel title: NPACI-set Task Length; x-axis: task length in seconds, 0 to 6 × 10^6; y-axis: number of tasks, logarithmic scale from 1 to 1e+05.)

Fig. 9. Performance of systems. (x-axis: learning periods, 1st to 8th; y-axis: performance, 0 to 100; series: with learning and without learning.)

For example, when there are many free strong resources, it is better that the AGIS assigns the strong resources to the weak tasks. In that case the threshold should be greater than one, and the AGIS should select a resource precisely. But when there are only a few strong resources, it is better to set the threshold equal to one, so that the AGIS selects a resource as soon as it finds one free, does not wait for strong resources, and acts using a first come first served strategy. The value of the threshold is changed by the ECA rules of the AGIS in relation to the number of times that the related rule returns true or false for carrying out an action. Let R be the number of times that the relevant rule responds to the related request and NR be the number of times that it does not. If R is more than NR, the rule should have an increased threshold for assigning the strong resource; otherwise it should have a decreased threshold, so that the resource is assigned to the task as soon as possible. To achieve this, we change the range of values of the attributes (such as the threshold) in the related ECA rules. When an event is received and no existing active rule matches it completely, the conditions of the request may still be close to those of one or more original event rules. After the request has been analyzed and its new conditions identified, it is sent to the rule update step, which uses the decision trees described in Section 3.2.
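The R versus NR policy described above can be sketched as a single update function. This is only an illustration under assumptions: the step size, the upper bound and the lower bound of one are hypothetical parameters chosen here, not values reported in the paper.

```python
def adjust_threshold(threshold: float, responded: int, not_responded: int,
                     step: float = 0.5, minimum: float = 1.0,
                     maximum: float = 5.0) -> float:
    """Move the threshold up when R > NR and down otherwise.

    responded is R, not_responded is NR; step, minimum and maximum are
    illustrative assumptions of this sketch.
    """
    if responded > not_responded:
        # The rule responds often: be more selective about resources.
        return min(threshold + step, maximum)
    # The rule fails to respond too often: accept resources sooner.
    return max(threshold - step, minimum)


raised_value = adjust_threshold(1.0, responded=10, not_responded=2)
lowered_value = adjust_threshold(2.0, responded=1, not_responded=5)
floor_value = adjust_threshold(1.0, responded=1, not_responded=5)
```

Clamping the result keeps the threshold from dropping below one, the value the text associates with a first come first served selection.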

4.1. Experimentation via simulation

Using a simulated computational Grid, we evaluate the performance of the resource management ECA rule strategies that use the decision tree. We use task populations based on actual loads recorded by NPACI [33]. The histogram of task lengths is depicted in Fig. 8. The significance of using the NPACI population is that the real workload is characterized by long-tailed distributions, which inevitably affect the performance. We present results based on the NPACI-derived task population. The selected workload for this study is derived from traces obtained from the NPACI JOBLOG Job Trace Repository [33]. The JOBLOG provides job traces of a 128-node IBM SP system at the San Diego Supercomputer Center and includes two years of data, from May 1, 1998 until April 30, 2000. Each task is described by metadata defining, among other characteristics, its submission, start and end times, and the number of requested and allocated processors. In this simulation, we assume that the CPU cycle ranges from 1000 to 6000 and that all resources have the same amount of memory and bandwidth. In this study, we consider tasks that were not answered early and quickly, or that could have been matched better. We demonstrated that, under the NPACI workload, the ECA rules based on a decision tree can improve the job throughput and can allow users to intervene with urgent tasks. When ECA rules with a static threshold were used for resource management and events arrived with different attribute values, about 40% of the events could be answered only after a long time. When the threshold changes through the learned decision tree that manages events, we could respond to more than 86% of events successfully and quickly. This result is shown in Fig. 9, which compares the performance of systems with and without rule learning.

5. Conclusion

The implementation of an active application requires many complex rules to specify the system’s active behavior. This paper used the Grid-JQA system architecture, which consists of an Active Grid Information Server (AGIS), including ECA resource manager rules and ECA fault manager rules, and a proposed Off-Line rule learning system for resource management in Grid computing. The system is based on a decision tree approach to learning from new examples and data and to providing good dependability and efficiency, because some parameters must be learned. It is also possible to update or invalidate existing rules by mining the history of triggered rules.

5.1. Demonstration of the claim

We now return to Section 1.1, where we defined the overall problem addressed by this paper. Recall that our proposed solution is ideally characterized as a method satisfying the specific characteristics that we presented in Sections 3 and 4.

1. We presented, in Section 3.2, the Off-Line learning system architecture that, with rule updating, satisfies the first characteristic of the solution.
2. For the second and third characteristics, we evaluated our work in Section 4 and showed that our approach responds quickly with high performance. Our learning approach is also more adaptive. Fig. 9 gives the results of our approach.



5.2. Future work

A number of directions can be explored to extend and improve this work. We can use fuzzy decision trees to better adapt the rules to dynamic Grid environments. Another direction is to model Grid users as a multi-agent system and to perform learning on these agents using recommendations.

References

[1] T. Abdullah, K.L.M. Bertels, S. Vassiliadis, Adaptive agent-based resource management for GRID, in: 12th ASCI Conference, Lommel, Belgium, June 2006, pp. 420–428.
[2] H. Jin, Y. Pan, N. Xiao, J. Sun, An Active Resource Management System for Computational Grid, GCC 2004, in: LNCS, vol. 3251, 2004, pp. 225–232.
[3] Aram Galstyan, Karl Czajkowski, Kristina Lerman, Resource allocation in the grid using reinforcement learning, AAMAS’04, New York, New York, USA, July, 2004, pp. 19–23.
[4] U. Schwiegelshohn, R. Yahyapour, Resource Management for Future Generation Grids, CoreGRID Technical Report Number TR-0005, May 19, 2005.
[5] K. Bubendorfer, P. Komisarczuk, K. Chard, A. Desai, Fine grained resource reservation and management in grid economies, in: Proceedings of the 2005 International Conference on Grid Computing and Applications, Las Vegas, Nevada, USA, 2005, pp. 31–38.
[6] Adrian Li Mow Ching, Dr Lionel Sacks, Paul McKee, Super ResourceManagment for Grid Computing, 2002.
[7] L. Mohammad Khanli, M. Analoui, An approach to grid resource selection and fault management based on ECA rules, Future Generation Computer Systems Journal (2007).
[8] L.M. Khanli, M. Analoui, Grid-JQA a New Architecture for QoS-guaranteed Grid Computing System, in: 14th Euromicro Conference on Parallel, Distributed and Network-based processing, Feb 15–17, PDP 2006, France.
[9] L.M. Khanli, M. Analoui, Grid-JQA: grid Java based quality of service management by active database, in: 4th Australian Symposium on Grid Computing and e-Research, AusGrid, 2006.
[10] L. Mohammad Khanli, M. Analoui, Active grid information server for grid computing, in: Computer Science, Journal of Supercomputing (2008).
[11] Valentin Kravtsov, Thomas Niessen, Assaf Schuster, Werner Dubitzky, Vlado Stankovski, Grid Resource Management for Data Mining Applications, 2006.
[12] M. Zoumboulakis, G. Roussos, A. Poulovassilis, Active rules for sensor databases, in: Proceedings of the First Workshop on Data Management for Sensor Networks, DMSN 2004, Toronto, Canada, August 30, 2004.
[13] Min Dai, Ya-Lou Huang, Data Mining Used in Rule Design for Active Database Systems, in: Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, vol. 04, IEEE Computer Society, Washington, DC, USA, 2007, pp. 588–592.
[14] Lawrence O. Hall, Nitesh Chawla, Kevin W. Bowyer, Decision tree learning on very large data sets, in: 1998 IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, San Diego, CA, USA, 11–14 Oct, 1998, pp. 2579–2584.
[15] J.J. Grefenstette, C.L. Ramsey, A.C. Schultz, Learning sequential decision rules using simulation models and competition, Machine Learning 5 (4) (1990) 355–381.
[16] S.K. Murthy, Automatic construction of decision trees from data: a multidisciplinary survey, Data Mining and Knowledge Discovery 2 (1998) 345–389.
[17] P. Chan, S. Stolfo, On the accuracy of meta-learning for scalable data mining, Journal of Intelligent Information Systems 8 (1997) 5–28.
[18] Daniel C. Vanderster, Nikitas J. Dimopoulos, Rafael Parra-Hernandez, Randall J. Sobie, Resource allocation on computational grids using a utility model and the knapsack problem, Future Generation Computer Systems Journal (2008).
[19] A.A. Freitas, Data Mining and Knowledge Discovery with Evolutionary Algorithms, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2002.
[20] J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006) 1–30.
[21] E. Frank, K. Huber, Active Learning of Soft Rules for System Modelling, EUFIT’96, Aachen, September 1996, pp. 2–5.
[22] K.P. Gross, Incremental multiple concept learning using experiments, in: Proceedings of the Fifth International Workshop on Machine Learning, Ann Arbor, Michigan, 1988, pp. 22–28.
[23] Y. Niquil, Guiding example acquisition by generating scenarios, in: Proceedings of the Ninth International Workshop on Machine Learning, 1992, pp. 348–354.
[24] D. McSherry, Strategic induction of decision trees, Knowledge-Based Systems 12 (1999) 269–275.
[25] Dae-Ki Kang, Kiwook Sohn, Learning decision trees with taxonomy of propositionalized attributes, Pattern Recognition 42 (2009) 84–92.
[26] J.R. Cano, F. Herrera, M. Lozano, On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining, Applied Soft Computing 6 (3) (2006) 323–332.
[27] Marina Guetova, Steffen Hölldobler, Hans-Peter Störr, Incremental Fuzzy Decision Trees, Advances in Artificial Intelligence, in: Computer Science, vol. 2479, 2002.
[28] K. Kirchner, K.-H. Tölle, J. Krieter, Decision tree technique applied to pig farming datasets, Livestock Production Science 90 (2004) 191–200.
[29] J. Mingers, An empirical comparison of selection methods for decision tree induction, Machine Learning 3 (4) (1989) 319–342.
[30] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth International Group, Belmont, CA, 1984.
[31] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA.
[32] J.R. Quinlan, Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research 4 (1996) 77–90.
[33] NPACI JOBLOG Job Trace Repository [Online], Available: http://joblog.npaci.edu/ (2000).

Leyli Mohammad Khanli received her B.S. (1995) from Shahid Beheshti University, Tehran, Iran, and her M.S. (2000) and Ph.D. (2007) degrees from IUST (Iran University of Science and Technology). All are in computer engineering. She is currently an assistant professor in the Department of Computer Science at Tabriz University. Her research interests include Grid computing and Quality of Service management.

Farnaz Mahan received her B.Sc. and M.Sc. degrees in Computer Engineering in 2002 (Iran Azad University) and 2005 (Iran University of Science and Technology, Tehran, Iran), respectively. She is now a Ph.D. student in the CS department, Tabriz University. Her research interests are power system restructuring, artificial intelligence, robotics, Grid computing and multi-agent systems.

Ayaz Isazadeh received a B.Sc. degree in Mathematics from the University of Tabriz in 1971, an M.S.Eng. degree in Electrical Engineering and Computer Science from Princeton University in 1978, and a Ph.D. degree in Computing and Information Science from Queen’s University, Ontario, Canada. He is an associate professor in the Department of Computer Science at the University of Tabriz.