Expert Systems with Applications 38 (2011) 6644–6656
Agent-based platform to support the execution of parallel tasks

David Sánchez a,*, David Isern a, Ángel Rodríguez-Rozas b, Antonio Moreno a

a Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA), Department of Computer Science and Mathematics, University Rovira i Virgili, Av. Països Catalans, 26, 43007 Tarragona, Catalonia, Spain
b Center for Mathematics and its Applications, Department of Mathematics, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal
Abstract

Parallel computing has evolved radically in recent years, from the centralised multi-processor supercomputer model to modern distributed approaches such as grid computing. The availability of relatively obsolete or underused hardware and the increasing LAN and WAN interconnection speeds have motivated the success of these new paradigms. In this paper, we propose the use of agent technology to improve the management, flexibility and reusability of grid-like parallel computing architectures. We present a general purpose agent-based architecture which is able to manage and execute independent parallel tasks through one or several heterogeneous computer networks – or even Internet nodes – exploiting state-of-the-art agent mobility capabilities. A particular application of the proposed architecture to support the execution of a complex knowledge acquisition task is also introduced, showing high scalability and very low overhead.

Keywords: Multi-agent system; Mobile agents; Grid computing; Parallel computing

© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Artificial Intelligence techniques applied to real world processes usually involve the analysis of large amounts of data or huge information sources, and the execution of complex algorithms. In those cases, the computational capacity of a single computer may not be enough, requiring the use of multiple-CPU configurations. Classical multi-processor approaches in the form of supercomputers solve this problem by offering massive CPU power in an ad hoc and rigid architecture. Due to the high cost and reduced availability of this kind of system, the use of multi-processor solutions is very restricted (Ernemann, Hamscher, Schwiegelshohn, Yahyapour, & Streit, 2002). However, in the last years, due to the increasing availability of computer hardware (both from relatively obsolete and from underused resources) and the increasing interconnection speeds of local area networks and the Internet, new distributed parallel computing paradigms such as grid computing have emerged (Kesselman & Foster, 2004). Grid computing is intended as network-based computing, enabling the seamless integration of computing systems and clusters, data storage, specialized networks and sophisticated analysis (Shi, Huang, Luo, Lin, & Zhang, 2006). Those approaches, such as the BOINC project (1), rely on computers connected to a private or public network that offer their hardware to execute the task requests of a master node. In a grid, middleware is used to hide the heterogeneous nature of the computer nodes and to provide users and applications with a homogeneous and smooth environment. Several advantages emerge from this approach, such as the reduction of the cost that parallel execution typically implies, and the reuse of heterogeneous hardware which can be purchased independently and whose aggregation may provide computational resources similar to those of a multi-processor supercomputer. Another advantage is scalability, considering the amount of hardware available in geographically dispersed grids connected through the Internet. Finally, the development of programs to be executed on a grid architecture (which is based on commonly used hardware and conventional operating systems) is typically simpler than development for a supercomputer with a particular CPU architecture and OS. The main disadvantage, however, is that the interconnection speed cannot compete with the high speed buses of a supercomputer. This arrangement is thus well suited to applications in which tasks can be executed independently, without the need to communicate intermediate results.

Even though classical grid architectures offer a high degree of scalability and transparency, they present several difficulties, such as resource discovery across virtual participants, the dynamic construction of a virtual organization, the scale factor involved in potential resource-sharing environments, code, task and system management through the distributed network, and the complex reusability of developed components due to the lack of interoperability among frameworks.

(1) The Berkeley Open Infrastructure for Network Computing (BOINC) is a non-commercial middleware for volunteer and grid computing. BOINC is considered a "quasi-supercomputer" because it has about 570,000 active computers worldwide processing on average 2 petaflops as of July 2009 (http://ct.boincstats.com/stats/project_graph.php?pr=bo).

* Corresponding author. Tel.: +34 977256563; fax: +34 977559710. E-mail address: [email protected] (D. Sánchez).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.11.073
In this paper, we propose the use of agents to model and implement distributed parallel frameworks, trying to overcome some of the shortcomings of grid-like architectures. In the last twenty years, agent technology (Wooldridge, 2009) has emerged as a promising computer engineering paradigm. Agents and, more generally, multi-agent systems (MAS) make it possible to model heterogeneous and distributed systems and environments at a very high level (Jennings, 2000). A natural design solution is to model tasks by means of agents. Agents act autonomously, but they can cooperate in order to solve a complex problem. Agent-based systems offer added value thanks to agent features not found in classical software engineering paradigms, such as elaborate communicative skills, flexibility and mobility capabilities, meeting the requirements of modern grid architectures such as dynamicity, distributed goal achievement, and resource and environment awareness (Shi et al., 2006). Mobility, as will be argued later, is an interesting feature, as it allows the code, execution state and data of an agent to be transparently transported from one environment to another in which the execution may continue. Mobile agents are being used in areas such as information retrieval (Lee, 2007a), network management (Manzoor & Nefti, 2009), workflow management (Jie, Xian-xing, & Zheng-wei, 2004) and telecommunications (Thanh, 2001).

In this paper, we analyse and exploit the multi-agent system paradigm in order to design and implement a new, high level, general purpose, flexible and robust platform for the parallel execution of tasks over one or several computer networks. The platform provides an efficient framework in which tasks are executed by individual agents that are transparently managed and deployed over network nodes using an adequate load balancing policy. The main goals of the platform are to improve the generality, flexibility and reusability of grid-like approaches and to ease their management. The platform has been evaluated in order to measure the overhead introduced by platform and agent management, showing near optimum throughput for medium-sized tasks. In addition, it has been exploited to support the execution of a complex Web-based knowledge acquisition system, which aims to learn domain ontologies automatically and unsupervisedly through a series of iterative learning steps (Sánchez & Moreno, 2008a). Due to its complexity, this system requires large computing resources and presents a dynamic and non-deterministic execution. Those characteristics fit very well with the presented MAS platform, showing its potential usefulness for tackling real world problems.

The rest of the paper is organised as follows. Section 2 describes the architecture of the proposal and the workflow defined by the interoperation between its components. Section 3 analyses the performance of the system through several tests which evaluate the overhead introduced by the main agent management activities. Section 4 presents the characteristics of the knowledge acquisition problem used as a real test and how it has been adapted to a parallel execution, evaluating the resulting speed-up. Section 5 presents related work applying agent technology to the design of parallel computing middleware. Section 6 presents the conclusions and some lines of future work.
2. Distributed agent platform

The architecture proposed in this paper is based on a MAS platform and exploits agent characteristics to achieve the following goals:

- To manage and coordinate the parallel execution of tasks in a transparent way. The inherently distributed nature of MAS fits very naturally with this goal.
- To allow a great degree of flexibility, dynamic management, and efficient advertisement and discovery of available computer nodes, which may join or leave the network at any time. The dynamic capabilities of agent creation and management provide a suitable background for that purpose.
- To manage efficiently the resources provided by the available computer networks. Scheduling policies can be designed according to the task characteristics and executed depending on the available computational resources.
- To ensure robustness and recovery from errors that may appear during the execution of tasks. MAS platform event monitoring provides a mechanism to detect problems occurring in the network, allowing the implementation of fail recovery policies.
- To develop generic components that can be easily adapted to several problems, allowing the optimization of resource usage and task execution.
- To achieve scalability, by minimising the overhead of platform management. Agent management overhead will be taken into consideration and evaluated in relation to the total execution time.
- To ease system management, centralising software updates on the server side and supporting their propagation through the computer network in a transparent manner. Agent (code, data and execution state) mobility capabilities will be extensively exploited for that purpose.

The only limitation is that concurrent tasks should be independent (i.e., the execution of a task cannot depend on other tasks executed at the same time; however, it can use results from already finished ones), as inter-agent communication between concurrent tasks is not considered. As in the grid computing paradigm, inter-task dependencies would introduce an enormous overhead due to inter-node communication, as network speed is the typical bottleneck in these kinds of architectures (Joseph & Fellenstein, 2004).

The full system has been implemented using the JADE agent development framework (Bellifemine, Caire, & Greenwood, 2007) and Java, providing the OS and hardware independence which characterises grid approaches. In this section we describe in detail the platform topology, its main components, the generic modules that allow a fine customization to concrete tasks, the platform management and task execution workflows, and the strategies implemented to detect events and provide fail recovery capabilities.
2.1. Platform topology

The platform is organised in three main components (see Fig. 1): a server, a collection of nodes, and an interface with the user.

The server resides in a computer with a known and publicly available IP address. It provides an execution environment in which the main components of the MAS can be executed, from which the tasks to be executed are managed, events from the network can be monitored, and the interaction with the final user is supported. It monitors the available computer nodes and initiates, distributes and finalizes the agents that will execute tasks.

The nodes are computers located on one or several interconnected computer networks (via LAN or WAN), which share their hardware resources (CPU, memory, storage, Internet bandwidth) to execute tasks. Client nodes can be incorporated dynamically at any moment by registering themselves with the server, providing local information about their hardware characteristics. They host the agents that execute the tasks requested by the server at each moment.
Fig. 1. Agent-based architecture.
The user side represents the final human client, who interacts with the system to request the execution of new tasks and to visualise partial or complete results.

2.2. Platform components

Being inherently distributed, the platform components are deployed in different locations. The locations in which agents can be executed are called containers. Two types of containers are defined:

- Main container: it is located in the server and hosts the main agents, which are in charge of the system and task execution management. Its location (IP and port) should be known and publicly available to any computer node which joins the platform.
- Node containers: each node which offers its hardware resources to support the execution of tasks also hosts a container in which agents can be deployed and executed. Container initialization is performed during a registration process in which a special agent informs the server about the node location and the hardware resources available.

2.2.1. Main container components

As shown in Fig. 1, the main container hosts two types of agents.

The grid manager agent (GMA) is the core of the platform, and intervenes in most of its decisions and actions. On the one hand, it offers a registration service by means of which computer nodes can offer their hardware.
On the other hand, it receives requests from the user that are translated into tasks to be executed by means of special agents in one or several computer nodes. It also stores the partial results of the different tasks and presents them to the user. Finally, it implements a fail recovery mechanism to ensure the proper execution of tasks running on potentially failing nodes. Each of these features is implemented in a different agent behaviour, which can be executed concurrently.

The event manager agent (EMA) continuously monitors agent events across the whole platform by subscription to the JADE event manager. In this manner, the JADE environment informs the agent when a concrete event (expressed by means of an ad hoc agent ontology) has occurred. It is thus able to detect agent and container creations and destructions, which may implicitly indicate software or hardware failures in a particular node. As the GMA is informed about unexpected destructions, it is able to take the appropriate actions to ensure the finalization of tasks.
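As an illustration, the following is a minimal sketch of how an EMA-like agent could subscribe to these platform events using JADE's AMSSubscriber helper; the handler bodies and the GMA notification are simplified placeholders, and "GMA" is an assumed local name (the paper does not publish its source code).

```java
import java.util.Map;

import jade.core.AID;
import jade.core.Agent;
import jade.domain.AMSSubscriber;
import jade.domain.introspection.DeadAgent;
import jade.domain.introspection.Event;
import jade.domain.introspection.IntrospectionVocabulary;
import jade.domain.introspection.RemovedContainer;
import jade.lang.acl.ACLMessage;

// Sketch of an event manager agent (EMA) that listens to platform-wide events.
public class EventManagerAgent extends Agent {

    protected void setup() {
        addBehaviour(new AMSSubscriber() {
            protected void installHandlers(Map handlers) {
                // Fired when any agent in the platform dies (expectedly or not).
                handlers.put(IntrospectionVocabulary.DEADAGENT, new EventHandler() {
                    public void handle(Event ev) {
                        AID dead = ((DeadAgent) ev).getAgent();
                        notifyGMA("dead-agent " + dead.getName());
                    }
                });
                // Fired when a node container disappears (crash, shutdown, network loss).
                handlers.put(IntrospectionVocabulary.REMOVEDCONTAINER, new EventHandler() {
                    public void handle(Event ev) {
                        RemovedContainer rc = (RemovedContainer) ev;
                        notifyGMA("removed-container " + rc.getContainer().getName());
                    }
                });
            }
        });
    }

    // Simplified notification; the paper uses the FIPA-Subscribe protocol instead.
    private void notifyGMA(String content) {
        ACLMessage msg = new ACLMessage(ACLMessage.INFORM);
        msg.addReceiver(new AID("GMA", AID.ISLOCALNAME)); // assumed local name
        msg.setContent(content);
        send(msg);
    }
}
```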
2.2.2. Server services

The main container agents rely on several services which also run on the server side.

The Node Manager is a service which manages the computational resources available in the nodes at a certain moment. It stores the hardware configurations of the different nodes registered (and alive) at a certain moment, and the task assignments currently executing in each particular location. It also implements a load balancing policy in order to select, at each moment, the most adequate free node in which to execute a task request, depending on the node hardware and the task requirements. Depending on a node's hardware and the load balancing policy, it also decides the number of tasks which a node may host at a particular moment (e.g., for non-multi-threaded, CPU intensive tasks, a node with two CPUs may host up to two tasks at each moment). This per-task hosting capacity is what we call an execution slot. The number of free execution slots is the available task execution capacity of a node at a certain moment.

The ID Manager is a name server for new agents. All agents deployed through a distributed platform should have a unique name; this component ensures that unique identifiers are assigned to dynamically created agents.

The Request Server is the bridge between the user side and the agent platform. This component provides a gateway to translate all requests and results between the user and the platform. It is also able to receive information from the agents (e.g., the GMA may indicate that there are not enough resources available to fulfil a task execution request), which may be presented to the user via a Web interface.

2.2.3. Node container components

When a local node container is set up, a registry agent (RA) is automatically created. The RA is in charge of registering the node with the server by providing its physical location (IP) and a detailed configuration of the computer node, obtained after a hardware inspection. When the registration process is confirmed by the GMA, the RA, being no longer needed, destroys itself in order to free resources.

2.2.4. Mobile agents

The last components of the platform are the dynamically created agents called working agents (WA), which are in charge of executing tasks in a network node, depending on the available nodes and the scheduling policy. A WA is a generic component that supports the execution of a particular task, embedding the source code and all the required resources. It is dynamically created by the GMA when a new task request is received from the user. Then, according to the scheduling policy, it is sent from the main container to a node container in which it will be executed. All data and code travel through the network in order to ensure a transparent execution of the task at the destination without introducing any software requirements (i.e., it is not necessary to install or copy required libraries or source code). Agent mobility capabilities are used to implement this functionality. When the task execution finishes, WAs return the results to the GMA and destroy themselves in order to free resources.

2.3. Generic components

In order to adapt the system's behaviour to a particular problem, the system incorporates several generic components that allow a fine customization of the task execution and load balancing policy:

- Job: an abstract class which embeds the task to be executed (see the sketch after this list). Its most important goal is to host the source code and the resources needed for the execution of a task (i.e., a list of input or output files, object instantiations and libraries). By changing these data, one is able to change the task to be executed, making the system generic and adaptable to any kind of problem. It also stores the hardware requirements of the task, which allow the Node Manager to decide where to execute it according to the load balancing policy. Finally, it stores dynamic information about the location in which the task has to be executed (according to the scheduling) and its state. This information is consulted by the WA in order to decide where to move and when to return the results and destroy itself. The WA embeds and transports (via agent code mobility capabilities) Job instances, avoiding the need to create special WAs for each task to be executed or to pre-install concrete software on the client side, easing management and bringing transparency to the execution.
- Node information: a data structure which provides information about the hardware and software characteristics of a node. By default, it supports the identification of the architecture, the number and frequency of the processors, the total and free amounts of physical, virtual and Java heap memory, and the OS type and version. These data are automatically retrieved by means of a hardware and software inspection of the node, performed by the RA at start-up, and sent to the server to be stored and managed by the Node Manager. This can be extended by supporting other attributes and implementing the corresponding measurement procedures (e.g., Internet bandwidth, available storage, etc.). Node information also provides information about the physical location, including the IP and port of the node container.
- Scheduler: it implements the load balancing policy which decides, at each moment, which is the most suitable node to execute a task according to the available hardware and the task requirements. Several standard policies based on the Round Robin algorithm are supported, relying on the hardware information provided by the Node Information data structure. CPU intensive tasks can be assigned according to the number of free slots of a computer node, prioritized by individual CPU speed. Tasks consuming a large quantity of RAM can be assigned according to the amount of free memory and prioritized according to the number and speed of CPUs. More complex scheduling policies, or a combination of several of them, can be designed by rewriting some parts of the Scheduler in order to implement the heuristics and strategies necessary to adapt the system's behaviour to the tasks' hardware requirements. This allows a fine tuning of the scheduling process and an optimum use of the available hardware. As stated above, the Scheduler is embedded in the Node Manager service.
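The paper does not include source code, so the following is an illustrative reconstruction of what the Job abstraction could look like in Java; all field and method names are assumptions, and Serializable is assumed because Job instances travel inside mobile agents.

```java
import java.io.Serializable;
import java.util.List;

// Illustrative sketch of the abstract Job class described above.
// Concrete tasks extend it and implement execute() with their own source code.
public abstract class Job implements Serializable {

    // Resources needed to run the task at the destination node.
    private List<String> inputFiles;
    private List<String> outputFiles;

    // Hardware requirements, consulted by the Node Manager for scheduling.
    private int requiredMB;   // minimum free RAM
    private int requiredCpus; // minimum number of CPUs/cores

    // Dynamic information filled in by the scheduler at assignment time.
    private String assignedContainer; // destination node container
    private String state;             // e.g., QUEUED, ASSIGNED, RUNNING, FINISHED
    private int failures;             // used by the fail recovery mechanism

    // The task itself: implemented by each concrete subclass.
    public abstract void execute();

    // Accessors used by the WA to decide where to move and when to return.
    public String getAssignedContainer() { return assignedContainer; }
    public void setAssignedContainer(String c) { assignedContainer = c; }
    public String getState() { return state; }
    public void setState(String s) { state = s; }
    public int getFailures() { return failures; }
    public void incrementFailures() { failures++; }
}
```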
2.4. Platform management and task execution

In this section, we detail the life cycle of the platform, from the start-up of the server and the execution nodes to the request and execution of tasks.

The process starts when the server agent module is initialised, in conjunction with the Web server, in a particular computer. The main container and the basic agents are created, and the services (detailed in Sections 2.2.1 and 2.2.2) are initialized. From that moment, the server is able to receive task execution requests from the user through the Request Server and to register offers from computer nodes.

On the one hand, task execution requests are stored as Job class instantiations in a queue, in which they wait until free node slots are available. The server informs the user, via the Request Server, about the task assignments and the amount of resources available in the whole system at each moment.

On the other hand, nodes can start up dynamically. Each node is incorporated into the system via a registration process in which the node container and an associated RA are created. The RA inspects all the hardware and software information and sends the details to the GMA via an agent message. After the registration process is confirmed by the GMA, the RA destroys itself in order to free resources. This communication workflow follows the standard FIPA-Request communication protocol (FIPA, 2002a). The received data are stored by the Node Manager which, according to the hardware configuration and the implemented scheduling policy, assigns a set of free slots to the concrete node.
The GMA processes task requests in the following manner. Whenever a task request is received, the task queue is processed in a FIFO manner, extracting Job instantiations which encapsulate the task (source code) and, if necessary, the associated resources (provided input files or serialized objects) (Fig. 2, Step 1). The Job instantiation, with the task hardware requirements (if specified), is analysed by the Node Manager which decides, if there are available slots, in which free slot the task should be executed according to the implemented scheduling policy (a default or an ad hoc one). The Node Manager marks the slot as busy and assigned to that task, and stores the information about its location (concrete node container details) in the Job object, as stated in Section 2.3. The Job instantiation is also stored until the task is completely finished (i.e., the results have been received) in order to support fail recovery, as will be detailed in Section 2.5.

Then, the GMA dynamically creates a new WA which receives the Job instantiation as an argument (Fig. 2, Step 2). The WA identifier is requested from the ID Manager in order to ensure its uniqueness. The GMA creates a new behaviour to process the results of the execution of the concrete WA. The WA checks its destination by consulting the Job instantiation and moves itself through the network (agent code and state, together with the Job instantiation carrying the task code, parameters and resources) to the assigned destination (Fig. 2, Step 3). The mobility of the code is supported by JADE, which makes extensive use of the Java object serialization capabilities. Upon arrival at the destination node container, the WA starts to execute the task, exploiting the hardware resources of the node (Fig. 2, Step 4). When the execution finishes, it sends the result, in the same Job instance, to the GMA via an agent message in the context of a FIPA-Request communication protocol (Fig. 2, Step 5). It was decided to return the results via an explicit message (instead of moving back to the server) for efficiency (more details in Section 3). When an acknowledgement of the message is received from the GMA, the WA destroys itself in order to free the resources (Fig. 2, Step 6). Meanwhile, the GMA informs the Node Manager about the task finalization (the associated slot is marked as free). Next, the user is informed about the result via the Web interface. Finally, the GMA finishes the behaviour associated to the concrete task execution and the Job instance is removed from the Node Manager.
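To make the workflow concrete, here is a minimal, illustrative sketch of a working agent implementing this move-execute-report cycle with JADE's mobility API. The Job class and the "GMA" local name follow the hypothetical sketches above, and the acknowledgement step (Fig. 2, Step 6) is simplified to an immediate self-destruction.

```java
import jade.core.AID;
import jade.core.Agent;
import jade.core.ContainerID;
import jade.core.behaviours.OneShotBehaviour;
import jade.lang.acl.ACLMessage;

// Sketch of a working agent (WA) that carries a Job to its assigned node.
public class WorkingAgent extends Agent {

    private Job job; // travels with the agent thanks to Java serialization

    protected void setup() {
        // The GMA passes the Job instance as the first start-up argument.
        job = (Job) getArguments()[0];
        addBehaviour(new OneShotBehaviour() {
            public void action() {
                // Move to the node container chosen by the scheduler (Step 3).
                doMove(new ContainerID(job.getAssignedContainer(), null));
            }
        });
    }

    // Invoked by JADE once the migration has completed.
    protected void afterMove() {
        job.execute(); // run the task on the node's hardware (Step 4)
        try {
            // Return the result to the GMA via a message (Step 5).
            ACLMessage result = new ACLMessage(ACLMessage.INFORM);
            result.addReceiver(new AID("GMA", AID.ISLOCALNAME)); // assumed name
            result.setContentObject(job); // Job is Serializable
            send(result);
        } catch (java.io.IOException e) {
            e.printStackTrace();
        }
        // The paper's WA waits for the GMA's acknowledgement first; simplified here.
        doDelete(); // self-destroy to free resources (Step 6)
    }
}
```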
It is important to note that no special software (apart from the JADE framework and the Java VM) is needed at the node side to support the task execution. This is a very flexible mechanism for platform management, as only one copy of the task source code is needed on the server side. In this manner, the task can be changed and updated easily and transparently to the rest of the platform. This is an added value with respect to grid-based execution environments, in which a special software package has to be installed and updated at the client side.

2.5. Event management and fail recovery

Any distributed system based on decentralized, distributed and potentially unreliable computer resources needs to implement a mechanism to detect and recover from software or hardware errors in order to ensure the proper finalisation of tasks. One way of achieving fault tolerance is to detect the crash of an agent and, upon such an event, launch a new one (Summiya, Ijaz, Manzoor, & Shahid, 2006). Faults and failures can be masked through redundancy, replication or process resilience (Greaves, Stavridou-Coleman, & Laddaga, 2004; Lyu, Chen, & Wong, 2004). In our approach, we have designed a fail recovery mechanism based on agent failure detection, data (Job) redundancy and agent replication, as in Manzoor and Nefti (2009).

As introduced in Section 2.2.1, a special agent (EMA) monitors all the events occurring in the platform. The EMA uses JADE's internal event mechanism, which informs about a wide range of events, such as the creation of agents, the addition of new containers, and the death of any agent. Whenever an unexpected agent or container failure is detected, the EMA informs the GMA about the situation, passing the node identifier. This communication is sent asynchronously using the FIPA-Subscribe communication protocol (FIPA, 2002b). Agent or container failures may indicate software crashes (i.e., the task has implementation errors), OS and hardware errors (e.g., a computer reset or shutdown) or an unreachable network (i.e., the JADE environment is unable to communicate with the container). In those cases, the node and its associated slots are removed from the list of available ones. A re-registration process is needed to incorporate the node into the platform again.

In order to implement a recovery mechanism for unfinished tasks, the GMA asks the Node Manager about the task or tasks (in the case of several slots associated to the same node) which were executing in the node associated to the crashed container. The GMA finishes the behaviours associated to the reception of results for those task executions. Next, the concrete Job instances are recovered from the Node Manager (remember that a copy of all assigned jobs remains in the server, as depicted in Step 2 of Fig. 2), reinitialized (i.e., the assigned location is erased) and reassigned if free slots are available, following the same procedure explained in Section 2.4. If there are no free slots, they are stored in the task queue for future assignment. In order to avoid infinite reassignments of potentially erroneous tasks (i.e., the task code or the input data may cause an execution error which will reappear at each execution), each recovered Job is tagged with a mark indicating the number of times it has failed. If a certain number of failures have been detected for the same task, it is discarded, as it is assumed that the problem is a task crash rather than an arbitrary hardware or OS malfunction.
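The recovery routine just described can be summarised with a short sketch built on the hypothetical Job class of Section 2.3; the NodeManager interface and the MAX_FAILURES threshold are illustrative assumptions (the paper does not state the actual discard threshold).

```java
import java.util.List;
import java.util.Queue;

// Sketch of the GMA-side recovery routine run when the EMA reports a dead container.
public class FailRecovery {

    // Illustrative threshold; the paper does not state the actual value.
    private static final int MAX_FAILURES = 3;

    // Hypothetical view of the server-side Node Manager service.
    public interface NodeManager {
        List<Job> getJobsAssignedTo(String containerName);
        void removeNode(String containerName);
    }

    private final Queue<Job> taskQueue;
    private final NodeManager nodeManager;

    public FailRecovery(Queue<Job> taskQueue, NodeManager nodeManager) {
        this.taskQueue = taskQueue;
        this.nodeManager = nodeManager;
    }

    public void onContainerFailure(String containerName) {
        // Recover the redundant Job copies kept in the server (Fig. 2, Step 2).
        List<Job> unfinished = nodeManager.getJobsAssignedTo(containerName);
        nodeManager.removeNode(containerName); // the node must re-register later
        for (Job job : unfinished) {
            job.incrementFailures();
            if (job.getFailures() >= MAX_FAILURES) {
                continue; // assume the task itself is faulty and discard it
            }
            job.setAssignedContainer(null); // reinitialize: erase the old location
            taskQueue.add(job);             // requeue for a future free slot
        }
    }
}
```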
Summarising, thanks to the dynamic node registration service and the agent monitoring capabilities, a very flexible and transparent fail recovery mechanism has been implemented to ensure the proper finalization of the requested tasks.

3. Evaluation of overheads and scalability
Fig. 2. Life cycle of the execution of a task by mobile agents.
In order to evaluate the scalability of the approach as a general purpose parallel execution environment, in this section we analyse
the main overheads that agent, container and platform management introduce, both in the server and in the execution nodes. All the tests reported in this section have been performed on a 100 Mb LAN of identical computers consisting of AMD Athlon 64 2.2 GHz CPUs with 2 GB of RAM. The OS for all of them is Windows XP SP3, running the Java VM 1.5.0_19 and the JADE 3.3 agent framework. Due to the variability observed in some tests (mainly caused by OS management activities), all reported values are the average of three consecutive executions of the same process.

3.1. Memory overhead

First, we measured the memory consumption of the different elements that compose the MAS platform. Concrete values are shown in Table 1. One can see that, once the Java and JADE environments are running both in the server and in the nodes, the memory overhead introduced by new agents, Job instantiations and behaviours is low. As a result, a server with 2 GB of RAM can easily manage more than a thousand nodes and hundreds of active tasks. A node can also potentially hold hundreds of execution slots in which WAs may execute, but this number will be limited by the hardware configuration and the scheduling policy. It is important to note, however, that all these tests have been performed with dummy tasks (i.e., without input or output data). Consequently, for tasks involving the processing of a certain amount of data, the memory consumption will grow linearly with that amount.

As the server, and more concretely the main container, is the central point of the platform, we have also evaluated the growth of the memory consumption as a function of the number of registered nodes. Results are shown in Fig. 3. One can see that the increase in memory consumption is approximately constant as new nodes are added to the network. We can therefore conclude that the memory consumption on the server side grows linearly with the number of registered nodes, showing the scalability of the platform with a growing network of computers.
Fig. 3. Memory consumption in terms of used nodes.
Table 1
Memory consumption for the main platform components.

  Component                                                                    Memory consumption
  Main container after initialization, including Java VM + JADE +
    container + agents + services                                             28.3 MB
  Node container after registering, including Java VM + JADE + container      16.5 MB
  New node registered in the server                                           250 KB
  New (empty) task request processed and stored in the server, including
    Job instantiation, WA and behaviour creation                              370 KB
  New (empty) WA received in a node                                           170 KB

3.2. CPU overhead

Next, we measured the CPU runtime of the activities involved in the platform management, considering the hardware configuration described above. Concrete values are shown in Table 2. The results follow a tendency similar to that observed for the memory consumption, indicating that, after the main initialization, the CPU runtime of common task-related activities and data transmission through the network is low. However, the degree to which the reported overheads influence the throughput of the system will depend on their ratio to the size (i.e., runtime) of the task to execute. This aspect will be evaluated in the next section. Again, all these tests involve empty tasks and results, showing only the CPU overhead of the agent management activities.

Table 2
CPU usage for the main platform management activities.

  Activity                                                                    Runtime (ms)
  Main container initialization, involving Java VM, JADE, container,
    agent and services start-up                                               1930
  Node container initialization, involving Java VM, JADE, container and
    RA start-up, hardware inspection and registration on the server           1100
  New (empty) task processed by the server, involving the instantiation
    of a Job, scheduling execution and WA creation                            15
  WA (empty) movement, involving serialization, encapsulation and
    transmission in the server and reception, de-encapsulation and
    de-serialization in the node                                              78
  Agent (empty) message sending and confirmation reception (e.g.,
    registering service or task result transmission)                          10

3.3. Parallel execution throughput evaluation

The next test is aimed at evaluating the parallel throughput of the system (including the overheads described in the previous section) for CPU intensive tasks. Sets of tasks with different sizes (with respect to CPU runtime) were executed on several node configurations with different degrees of parallelism. The objective is to show the kind of tasks for which the system is able to offer a near optimum throughput.

In this test, we designed a CPU intensive task which runs at 100% CPU usage for a fixed amount of time. The execution request for a certain number of those tasks is passed to the server, which has the same number of available node computers to execute them (i.e., we consider the maximum degree of parallelism). The scheduling policy simply prioritizes free nodes (i.e., nodes in which no task is executing in any of the potential slots of their container). The runtime from the task request to the reception of the last result from the WAs is computed and compared against the time required to execute the same number of tasks in a sequential implementation, computing the speed-up factor. Results for a task with a fixed runtime of 60 s, considering scenarios ranging from 1 to 64 tasks and nodes, are shown in Table 3.
Table 3
Runtimes, overheads and speed-up factors for several scenarios with tasks with a fixed runtime of 60 s.

  # Tasks      Sequential     Parallel       Total CPU management   CPU overhead     Parallel
  and nodes    runtime (ms)   runtime (ms)   overhead (ms)          per task (ms)    speed-up factor
  1            60,000         60,108         108                    108              0.998
  2            120,000        60,206         206                    103              1.993
  4            240,000        60,394         394                    98.5             3.9739
  8            480,000        60,844         844                    105.5            7.889
  16           960,000        61,716         1716                   107.2            15.555
  32           1,920,000      63,666         3666                   114.5            30.157
  64           3,840,000      67,912         7912                   123.6            56.54
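As a quick check of how the columns of Table 3 relate, take the 64-task row: the ideal parallel runtime equals that of a single task (60,000 ms), so

  total overhead = 67,912 - 60,000 = 7912 ms
  overhead per task = 7912 / 64 ≈ 123.6 ms
  speed-up = 3,840,000 / 67,912 ≈ 56.5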
Considering that, in all cases, we have the maximum degree of parallelism available (i.e., one node per task), the ideal parallel runtime will be the runtime of an individual task (60 s). It is thus easy to compute the CPU overhead introduced by the platform management, which lowers the ideal speed-up factor of the parallel execution (which would otherwise equal the total number of computer nodes). We can see that the overhead grows linearly with the number of tasks, as agent management activities are executed sequentially on the server side. Averaging the CPU overhead over the number of tasks involved in each test, we can observe that the runtime corresponds approximately to the sum of the activities involved in task management (Job instantiation, WA creation, WA movement, result notification), whose runtimes were reported in the previous section.

The averaged value varies slightly from one test to another. First, we can observe a decreasing tendency with respect to the sequential case because, as task parallelism increases, some of the agent management activities are also parallelized. In fact, even though server activities are executed sequentially, the activities performed at the node side (i.e., WA reception, de-encapsulation and de-serialization, and result message sending) are executed in parallel. This tendency is observed up to 4 parallel tasks. Then the overhead starts to grow proportionally to the number of tasks, due to the concurrency overhead of the server. It is important to note that, due to the scenario characteristics (i.e., all tasks last the same time), all task requests and all results from the nodes are received at the same time, forcing a great degree of concurrency at the server side (i.e., as many behaviours as tasks are executed at the same time). This is the worst situation from the execution point of view as, in a real setting, it is very unlikely that all tasks last exactly the same amount of time.

Due to the ratio of the total CPU overhead (ranging from hundreds of milliseconds to several seconds) to the task runtime (60 s), the speed-up factor of the parallel execution with respect to the sequential setting is high. However, is this factor affected if the task runtime varies? In order to test this, we executed several tests involving tasks with fixed runtimes ranging from 100 ms (i.e., around the same value as the minimum overhead) to 60 min. The evolution of the speed-up factors in relation to the degree of parallelism for all those scenarios is shown in Fig. 4. Considering that the management overhead is independent of the CPU runtime of the concrete tasks, it remains constant through the different tests. In this manner, the higher the task runtime, the higher the speed-up factor, as the ratio between the CPU usage of the overhead activities and the task execution decreases. Observing the figure, for 60 min tasks the speed-up is almost optimum, lowering significantly as we approach the range of the overhead runtime (i.e., seconds). In the case in which the task runtime is below the overhead (i.e., 100 ms), the performance of the parallel setting is, as expected, worse than its sequential counterpart. This shows the limitations of a grid-like parallel execution paradigm, in which tasks should be heavy enough to justify the overhead of the distributed execution management. It is also important to note that the tasks are independent and no inter-task messages are needed for their execution. If that were the case, network communication (even at an Internet scale) might also severely hamper the performance. This is why grid approaches are meant to support independent tasks (Joseph & Fellenstein, 2004). In any case, we can conclude that, for medium-sized tasks (with runtimes measured in minutes), which are those typically involved in grid architectures, the proposed architecture offers near optimum throughput.
Fig. 4. Speed-ups obtained for tasks of different sizes.
4. Case study: knowledge acquisition from the Web

In this section we describe how the proposed platform has been adapted to support the implementation and execution of a real algorithm: a Web-based domain ontology learning system. Due to its complexity, the amount of data to process and the characteristics of the tasks involved in the knowledge acquisition process, this problem is very appropriate as an example of application of the proposed platform. In the following sections, we present the different aspects that have been taken into consideration to model and adapt the problem to the distributed platform (i.e., task analysis, computational complexity, task modelling, scheduling policy, etc.) and the benefits achieved in terms of runtime performance. The hardware configuration on which the throughput measures have been obtained is the same as described in Section 3.

4.1. Domain ontology learning from the Web

Ontologies (Fensel, 2001) allow the definition of standard machine readable representations of knowledge. They are crucial components in many knowledge intensive areas like the Semantic Web (Berners-Lee, Hendler, & Lassila, 2001), knowledge management, and electronic commerce. The construction of domain ontologies relies on knowledge engineers, who are typically overwhelmed by the size and complexity of domain knowledge. In this sense, automated ontology learning methods allow a reduction in the time and effort needed to develop ontologies.

In the past, we developed an automatic, unsupervised and incremental method for domain ontology learning (Sánchez & Moreno, 2008a, 2008b) which uses the Web as the source of knowledge. In a nutshell, it uses several knowledge acquisition and text processing techniques to extract domain concepts and relations from the analysis of thousands of textual Web resources. The Web corpus is retrieved by means of a standard Web search engine, which is also used to compute Web-scale statistics about information distribution.
Fig. 5. Overview of the domain ontology learning.
As depicted in Fig. 5, the learning process starts from scratch and is divided into several steps that are iteratively executed as new knowledge is acquired. The process begins with a keyword that defines the domain to explore (e.g., cancer). By querying it in a Web search engine, a set of resources is retrieved and syntactically analysed. Pattern-based regular expressions are exploited to detect semantic relationships implicitly stated in the text. As a result, a set of related candidate concepts is extracted. A statistical analysis aiming to estimate their degree of relevance for the input keyword is performed by exploiting Web-scale statistics (Sánchez & Moreno, 2008a, 2008b). The most relevant ones are selected as new taxonomically (e.g., melanoma is a cancer) and non-taxonomically (e.g., cancer is treated with radiotherapy) related concepts for the input domain. Those are recursively used as seeds for further analyses, composing a multi-level semantic structure (taxonomically and non-taxonomically interrelated concepts). The system implements a self-controlling mechanism that uses feedback information about the learning process to decide which concepts to explore and when to stop the analysis. Considering the learning workflow, the system presents a multi-level, tree-like expansion, as shown in Fig. 6.

As stated above, the system exploits the Web as a corpus and Web search engines to perform the learning process. Considering the potentially enormous number of Web resources to evaluate, the runtime required to iteratively query Web search engines and to access, download and analyse thousands of Web sites can be quite overwhelming for a single-computer environment. Not only is CPU power needed, but also Internet bandwidth to perform online queries and RAM to store partial data and the pattern recognition training files needed to analyse natural language text. In consequence, as shown in Table 4, using a sequential implementation on one computer, the runtime needed to learn a domain ontology may easily reach several hours.

After an empirical study, we observed that the main bottleneck is the number of Web accesses (mainly Web search engine queries).
Fig. 6. Learning expansion of the concept C with x taxonomic relations and y non-taxonomic relations.
Table 4
Number of results and runtime for several domains running on one computer.

  Domain ontology   # Taxonomic concepts   # Non-taxonomic concepts   Runtime (h)
  Insect            668                    236                        10
  CPU               134                    121                        6
  Tea               236                    1430                       17
This time is, in general, several orders of magnitude higher than the time needed for the text analysis. In addition, Web servers and search engines introduce overheads when consecutive accesses from the same machine (i.e., IP) are performed. In fact, as shown in Fig. 7, the runtime depends linearly on the number of queries performed against the search engines. Considering the tree-like learning expansion presented in Fig. 6, and being T the runtime needed to analyse the resources related to a particular concept, the computational cost for one computer, where the ontological terms are sequentially evaluated, is the polynomial function given in Eq. (1):
  O = T(taxo_concepts + notaxo_concepts)^max(taxo_depth, notaxo_depth)    (1)

Fig. 7. Runtime dependency on the Web search queries.

The cost depends on the number of taxonomically and non-taxonomically related concepts retrieved at each iteration. The exponent is the maximum depth of the relationships (usually no greater than 4). The runtime (T) required to perform the analysis of a concept depends linearly on the number of Web queries for statistics which, in turn, depends linearly on the number of retrieved concepts. This time is multiplied as many times as taxonomically and non-taxonomically related concepts have been retrieved. Considering the orders of magnitude involved (T is typically minutes, and the numbers of taxonomically and non-taxonomically related concepts are in the hundreds), one can easily see that a sequential analysis of a general domain on one computer is not computationally feasible.
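To give a feel for these magnitudes, consider a purely illustrative instantiation of Eq. (1) (hypothetical figures, not measurements): with T = 2 min, 20 related concepts retrieved per analysed concept and a maximum depth of 2,

  O = 2 min · 20^2 = 800 min ≈ 13 h,

which is in line with the runtimes of several hours reported in Table 4.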
4.2. Task modelling

Given the presented tree-like expansion, a certain number of concurrent and independent tasks can be executing at each moment (different analyses for each new concept). This workflow, in conjunction with the heavy-weight nature of the tasks, is adequate for the grid-like computational model supported by the proposed platform. In our case, the parallelization benefits are related not only to the CPU, but also to other resources such as the Internet bandwidth or the system memory. This is especially important, as the parallel execution of various learning tasks through several computers can reduce the overhead of Web accesses, minimizing the execution waits and the Web search engine restrictions thanks to the distributed and concurrent accesses from, ideally, different IP addresses.
Fig. 8. Web applet for requesting concept analysis and visualizing results.
In order to adapt the ontology learning execution to the proposed agent-based platform, and thanks to its generic and high level design, we only had to define the following components:

- The analysis of each concept has been modelled as a task by extending the abstract class Job mentioned above, defining the appropriate input parameters (search parameters, already acquired concepts, pattern files) and outputs (files representing partial ontologies). The class constructor and the main action method are associated with the source code of the initialization and execution of the concept analysis.
- The scheduling policy has been overwritten according to the concrete task requirements. Considering that our computers have the same Internet bandwidth, the differential factors are the CPU and the amount of available RAM. The latter defines the maximum number of task instances running on the same computer, as each task requires at least 100 MB due to the size of the pattern files loaded to perform the linguistic analyses. CPU use is not constant (due to the waits between Web accesses), allowing the concurrent execution of several tasks on the same CPU. In consequence, we have designed a Scheduler that defines the free slots of the available nodes according to the amount of free RAM (i.e., a node with 1 GB of free RAM may host up to 10 tasks) and prioritizes the assignment according to the number of CPU cores/processors and their speed (i.e., slots corresponding to more powerful nodes are assigned first), as sketched after this list. As the information about free RAM and CPU characteristics is supported by default by the hardware inspector executed during the node registration process, no modifications were needed.
- In order to allow user interaction to control and supervise the learning process, a Web-based interface has been designed (shown in Fig. 8). It allows the user to program domains to explore, to request the analysis of domain concepts, and to visualize partial results. The user also has information about the number of free slots and the running tasks.
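A compact sketch of this RAM-based slot policy follows; the NodeInfo fields are assumptions modelled on the Node Information structure of Section 2.3, and the 100 MB-per-task figure is taken from the text above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of the RAM-driven scheduling policy described above.
public class OntologyLearningScheduler {

    private static final int MB_PER_TASK = 100; // each analysis needs >= 100 MB

    // Hypothetical view of the Node Information structure (Section 2.3).
    public static class NodeInfo {
        long freeRamMB;
        int cpuCores;
        int cpuMhz;
        int busySlots;

        int totalSlots() { return (int) (freeRamMB / MB_PER_TASK); }
        int freeSlots() { return totalSlots() - busySlots; }
    }

    // Pick a node with a free slot, preferring more and faster CPUs.
    public Optional<NodeInfo> selectNode(List<NodeInfo> nodes) {
        return nodes.stream()
                .filter(n -> n.freeSlots() > 0)
                .max(Comparator.comparingInt((NodeInfo n) -> n.cpuCores)
                        .thenComparingInt(n -> n.cpuMhz));
    }
}
```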
4.3. Parallel execution performance

In this section we offer an analysis of the performance obtained for the modelled problem under different execution scenarios and levels of parallelism. The first test consists of picking four tasks of similar complexity (4 immediate subclasses of the Cancer domain) and executing them, with the same parameters, in the following scenarios (see Table 5):

- Scenario 1: one node with a unique free slot. Each task is modelled by a WA and executed sequentially. The final runtime is computed by adding the individual runtimes.
- Scenario 2: two computers with a unique free slot on each one. Two tasks (modelled over WAs) are sequentially executed on one computer, in parallel with the pair of tasks executed on the other node. The time is the maximum of both sequential executions.
- Scenario 3: four computers with one slot on each one. Each task is modelled over a WA, and the four of them are executed in parallel. The final time is the maximum of the four executions.

One can see that the improvement is, as expected, very significant and proportional to the degree of parallelism (see Fig. 9). The net benefit, however, is lower than for the previous CPU-intensive, identical and deterministic tasks since, in this case, other variables are involved (non-deterministic Web accesses, large data files, etc.) and the task execution times differ, introducing a sequential delay while waiting for the last task to finish (in Scenarios 2 and 3).
Table 5
Performance for four similar learning tasks with different levels of parallelism.

  Domain           Scenario 1 (s)   Scenario 2 (s)   Scenario 3 (s)
  Breast cancer    1083             1093             1095
  Lung cancer      980              992              1029
  Colon cancer     627              667              705
  Ovarian cancer   715              812              841
  Total            3405             2085             1095
Fig. 9. Performance increase in relation to the degree of parallelism (number of nodes).
Again, considering the size of the tasks, the overhead introduced by the agent and platform management is negligible.

The following test shows the workflow of a full parallel execution of a complete example. In this case, we selected one domain and executed 2 taxonomic levels sequentially (using one computer) and in parallel (using 4 computers), with automatic distribution of the work load according to the implemented scheduling policy. Using a wide domain (Cancer) and executing the taxonomic learning for the first two iterations, the result is 49 immediate subclasses that have to be analysed. This process takes, on one computer, a total of 16,505 s. Performing the same execution in parallel on 4 computers (with a unique slot on each one), the total runtime is reduced to 5634 s. This represents a performance improvement of 292% with a hardware increase of 400%. Examining the execution trace and representing the task-node assignment at each moment, we can compose the Gantt diagram shown in Fig. 10.

As stated above, the main factor that compromises the parallel performance is the non-parallel intervals at the beginning (necessary to obtain a first set of concepts to analyse) and at the end (needed to finish the last task). In consequence, the potential improvement of this parallel approach is higher as more tasks (concepts) are available. In a complex learning process (involving hundreds of multi-level taxonomic and non-taxonomic analyses) the percentage of fully parallel execution intervals will be much higher in relation to the sequential parts, and the throughput improvement will tend to be similar to the hardware resources provided.

Regarding the runtime required for each task, as stated in Section 4.1, in a sequential approach it depends linearly on the number of queries for statistics performed against the Web search engine (see Fig. 11). In this case, however, there is more variability due to the higher degree of parallelism.
Fig. 10. Execution of taxonomic learning tasks on four computers for the cancer domain.
Fig. 11. Number of queries vs. runtime for each subclass of the cancer domain.
Considering this execution environment, we can compare it to the sequential approach from a temporal point of view. As presented in Section 4.1, the cost of the sequential approach is a polynomial function of the number of concepts retrieved at each iteration, with an exponent defined by the maximum depth of the discovered relationships. Without considering the limitations introduced by the available hardware, in the distributed approach we can potentially parallelise the full set of retrieved concepts, reducing the runtime of one iteration to T (the time required to evaluate one concept). In the end, we are able to obtain a runtime of T^max_depth, where the exponent is the maximum depth of the taxonomic and non-taxonomic relationships (between 2 and 4). In consequence, we can reduce the time from T(taxo_concepts + notaxo_concepts)^max_depth to T^max_depth using a (taxo_concepts + notaxo_concepts) degree of parallelism.

In the real world, however, it is very unlikely to have such an amount of hardware available and, in consequence, the real runtime will depend on the maximum degree of parallelism that we are able to achieve. Although one computer may host several tasks (depending on the available slots defined by the Node Manager), in our tests we determined that one node with enough hardware resources (i.e., 2 GB of RAM, Pentium 4 CPU or later) is able to execute around 10 tasks (and WAs) before the performance degrades due to the concurrency overhead. In the end, with a moderate amount of hardware, for this particular case, the parallel execution can provide an improvement of one order of magnitude (from hours to minutes).
5. Related work

Several previous works have proposed the use of agents to distribute a set of tasks among the nodes of a computer network. A representative selection of them is summarised in the following paragraphs.
In the work presented in Khan, Vaithianathan, Sivoncik, and Bölöni (2003), the set of tasks to be executed in a particular application is represented as a directed graph (called a Grid). The nodes of the graph are the tasks, whereas the directed edges represent data staging operations. There is a unique grid control center agent (GCCA) and a set of Grid Agents, one for each host. The scheduler of the GCCA decides which node has to execute each task, and also schedules the transfers of data from one node to another. The actual scheduling strategies can be static (computed before the application is started) or dynamic (decisions are taken at runtime). An important drawback of this proposal is that, unlike in the system described in this paper, the code to be executed has to be installed in the network nodes before the application is executed. The proposed framework does not address recovery from node failures.

In Tang and Zhang (2006), a task is defined as a set of modules, which are the basic execution units. The definition of each module includes a module description, the executables, the serialization, and the module-owned files. Within the module description there is a task section which contains, among other data, the minimum hardware requirements that a computer must satisfy to run the task. Each node of the network has, among other agents, a profiling agent, in which the node's hardware characteristics are registered, and a computing agent, which uses a resource matrix to track the status of a number of computing resources. In the scheduling process, which is totally dynamic and distributed, the computing agents of different nodes share their information on resource usage and decide the node in which a module has to be executed. As in the system mentioned above, executables must be uploaded to the nodes in advance, and methods to handle node failures are not given.

ARMS is an agent-based resource management system for grid computing (Cao, 2004). The system is built as a hierarchy of homogeneous agents, which represent local grid resources and hold their basic information (IP address, port, processor type, number of processors). Each agent uses a Capacity Table to record service information about other agents, which can be used to estimate their performance. An agent can transmit this information to its parent and its children in the hierarchy. Each of the agents contains a scheduler component that, taking into account the estimated performance of the known agents, calculates the node in which the task to be assigned will have the earliest end time. The system does not provide any way to automatically recover from node failures.

MobiGrid proposes a framework for task execution based on mobile agents (Barbosa & Goldman, 2004). The basic idea is to use the idle computers of a network. A task, assumed to be a long running application, is encapsulated in a mobile agent. A client node can submit tasks to a network server. Each server has a manager, which registers the task, searches for an idle machine, and dispatches the task there. If a user requests his machine, the task is migrated to another machine, preserving the processing already done. In order to recover from failures, the manager creates a clone of each task and dispatches it to another machine. If one of the clones dies, the manager makes a copy of the one that is still alive and dispatches it to another machine.
Lee also suggests the use of mobile agents to take advantage of idle time in the computers of a network (Lee, 2007b). His proposal is centred on the parallelisation of evolutionary algorithms. There is a main host and a set of additional hosts. Each host has a status agent (SA) that keeps information about the state of the individual machine (e.g., CPU use). The main host has a mobile agent (MA), which clones itself with the evolutionary algorithm code for each node and dispatches a duplicate to each available machine. Each of these agents is responsible for executing a given number of iterations on a sub-population. If a user needs his machine at a certain point, the status agent of that node detects this situation and informs the MA
of that machine, which suspends its execution, inquires about the availability of other machines through the SA, and moves to a new one to resume its execution. When all the MAs have completed a certain number of iterations, they enter a communication phase in which they exchange some individuals of their sub-populations; after that, they start another round of iterations of the evolutionary algorithm.

Another system in which mobile agents are used to dynamically distribute the execution of different tasks in a network is mobile-agent-based load balancing (MALD) (Cao, Sun, Wang, & Das, 2003). Each server in the network has three agents. The server management agent (SMA) monitors the workload of the server, which basically depends on the job queue length, the number of active connections of the server, and the percentage of free memory space. The local information agent (LIA) of each server is a mobile agent that travels around the servers, collecting and propagating the load information. When the SMA detects that the workload exceeds a given threshold, it activates the LIA to collect the network load information and it also initiates a job delivery agent (JDA), which selects another server to receive the reallocated job and carries the job to that server. If the destination server refuses to accept the job (e.g., because it is also overloaded), the JDA can find an alternative server to receive the job using the up-to-date load information it has just collected on the way.

Another interesting proposal for distributing work to the nodes of a network using mobile agents is given in Fukuda, Kashiwagi, and Kobayashi (2006). There is a Commander Agent in each node that receives job execution requests from users. When a request arrives, this agent starts a Resource Agent that uses local data repositories to discover the computing nodes best fitted to the job’s resource requirements. The Commander Agent then spawns as many Sentinel Agents and Bookkeeper Agents as the number of nodes needed for the execution of the job. Each Sentinel launches a user program wrapper that starts a user process and records its execution snapshots, which are sent to and maintained by the corresponding Bookkeeper Agent (so that a user process can be retrieved if it crashes). Each agent may also autonomously migrate to a more lightly loaded machine. At the end, all results are forwarded to the Commander Agent, which presents them to the user.
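The snapshot-based recovery of AgentTeamwork can be illustrated with a brief sketch as well. Again, this is only our reading of the mechanism, written in plain Java rather than their actual API: the class names Sentinel and Bookkeeper follow the paper’s terminology, but every method signature here is hypothetical, and the user process is reduced to a state-transforming function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustrative sketch (hypothetical signatures): a Sentinel wraps a user
 *  process, periodically sends execution snapshots to a Bookkeeper, and can
 *  resume from the last snapshot after a crash. */
class Bookkeeper {
    private final Map<String, byte[]> snapshots = new HashMap<>();
    void store(String jobId, byte[] state) { snapshots.put(jobId, state); }
    byte[] lastSnapshot(String jobId)      { return snapshots.get(jobId); }
}

public class Sentinel {
    private final Bookkeeper bookkeeper;
    private final String jobId;

    Sentinel(Bookkeeper bookkeeper, String jobId) {
        this.bookkeeper = bookkeeper;
        this.jobId = jobId;
    }

    /** Run the wrapped job, checkpointing every 'interval' steps; 'job'
     *  stands in for the user process and maps a state to the next one. */
    void run(byte[] initialState, int steps, int interval, UnaryOperator<byte[]> job) {
        byte[] state = initialState;
        for (int step = 1; step <= steps; step++) {
            state = job.apply(state);
            if (step % interval == 0) {
                bookkeeper.store(jobId, state.clone());   // snapshot sent to the bookkeeper
            }
        }
    }

    /** After a detected crash, retrieve the last snapshot and continue from it. */
    byte[] recover() {
        return bookkeeper.lastSnapshot(jobId);
    }

    public static void main(String[] args) {
        Bookkeeper bk = new Bookkeeper();
        Sentinel sentinel = new Sentinel(bk, "job-42");
        sentinel.run(new byte[]{0}, 10, 2, s -> new byte[]{(byte) (s[0] + 1)});
        System.out.println("resume from state: " + sentinel.recover()[0]);
    }
}
```

Keeping the snapshots on an agent different from the one executing the process is what lets the scheme survive node crashes.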
6. Conclusions

Agent technologies have proved to be an appropriate solution for modelling complex systems (Wooldridge, 2009). Nowadays, they represent a mature technology that can be considered the latest software engineering paradigm. One of the fundamental characteristics of multi-agent systems is their distributed nature. We have used this feature to design and implement a novel environment for parallel computing based on mobile agents.

The designed system has been evaluated in order to show its scalability as a function of the available level of parallelism and its behaviour when executing tasks of different degrees of granularity. It has also been extensively tested on a complex real-world problem of large-scale Web knowledge acquisition. Performance results indicate that the system scales well and introduces minimal overheads for the medium-sized tasks which are commonly found in grid-like computing architectures (Joseph & Fellenstein, 2004).

In comparison to the approaches presented in the previous section, the benefits offered by the designed platform are the following:

- Flexibility: nodes can be added to or removed from the platform at runtime. The system continuously adapts its behaviour (load balancing) to the available, potentially heterogeneous, hardware resources. In contrast to many of the approaches presented in the previous section, nodes are not considered
homogeneous: each node may host a different number of tasks, depending on the number of free slots, the hardware configuration, the task requirements and the scheduling policy. Task source code and resources are centralised on the server side and automatically transmitted to the execution nodes in a transparent way, using advanced agent mobility capabilities. This greatly eases the management of the system, as no software needs to be installed and maintained on the node side (in contrast to many related works, which require the a priori installation of executables).

- Scalability: platform management overheads (memory and CPU) are kept at a reasonable level for medium-sized tasks and grow linearly with the number of tasks. As a result, the execution speed-up scales linearly with respect to the number of available nodes, which can be added easily regardless of their architecture or OS. Node hardware characteristics are also automatically inspected and transmitted in a transparent manner, allowing the scheduling policy to adapt to the available resources. None of the related works studied the overheads introduced by agent management, and most of them were only tested with simulated tasks.

- Robustness: the platform implements failure recovery measures by constantly monitoring the platform state and detecting node hardware or software crashes. Unsuccessful tasks are automatically detected and reassigned to available nodes in a transparent way, ensuring, whenever possible, their correct finalisation. Most of the frameworks presented in the previous section do not monitor agent events and lack a proper failure recovery mechanism.

- Generality: in contrast to some related approaches (Fukuda et al., 2006), which propose an ad hoc design to solve concrete problems, our platform components have been designed in a generic way, allowing different problems to be modelled easily. Components can also be fine-tuned to adapt the system’s behaviour (e.g., the scheduling policy) to the concrete characteristics of the tasks to execute, potentially improving the performance.

Up to this moment, the system design is based on a single agent platform in which several containers are distributed through several computers and agents are able to move between them. In order to further improve the scalability and robustness of the system against, for example, server and platform failures, we plan to study JADE platform federation and agent inter-platform mobility in order to support the interoperation of several agent platforms, each one controlling a certain number of nodes. This will result in a highly decentralised approach that may support server replication and a minimisation of bottlenecks.

Acknowledgements

This work has been partially supported by the Universitat Rovira i Virgili (a post-doctoral grant to D. Isern and grant 2009AIRE04), the Spanish Ministry of Science and Innovation (DAMASK project, Data mining algorithms with semantic knowledge, TIN2009-11005) and the Spanish Government (PlanE, Spanish Economy and Employment Stimulation Plan).

References

Barbosa, R. M., & Goldman, A. (2004). MobiGrid: Framework for mobile agents on computer grid environments. In A. Karmouch, L. Korba, & E. R. M. Madeira (Eds.), Proceedings of the first international workshop on mobility aware technologies and applications, MATA 2004, Florianópolis, Brazil (pp. 147–157).

Bellifemine, F., Caire, G., & Greenwood, D. (2007). Developing multi-agent systems with JADE. Chichester, England: John Wiley and Sons.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web – A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, 284(5), 34–43.

Cao, J. (2004). ARMSim: A modeling and simulation environment for agent-based grid computing. Simulation, 80(4–5), 221–229.
Cao, J., Sun, Y., Wang, X., & Das, S. (2003). Scalable load balancing on distributed web servers using mobile agents. Journal of Parallel and Distributed Computing, 63(10), 996–1005.

Ernemann, C., Hamscher, V., Schwiegelshohn, U., Yahyapour, R., & Streit, A. (2002). On advantages of grid computing for parallel job scheduling. In H. E. Bal, K.-P. Lohr, & A. Reinefeld (Eds.), Proceedings of the 2nd IEEE international symposium on cluster computing and the grid, CCGrid 2002, Berlin, Germany (pp. 39–47).

Fensel, D. (2001). Ontologies: Silver bullet for knowledge management and electronic commerce (1st ed.). Springer-Verlag.

FIPA (2002a). FIPA request interaction protocol specification. Geneva, Switzerland: Foundation for Intelligent and Physical Agents (FIPA) [Standard No. SC00026H].

FIPA (2002b). FIPA subscribe interaction protocol specification. Geneva, Switzerland: Foundation for Intelligent and Physical Agents (FIPA) [Specification No. DC00035H].

Fukuda, M., Kashiwagi, K., & Kobayashi, S. (2006). AgentTeamwork: Coordinating grid-computing jobs with mobile agents. Applied Intelligence, 25(2), 181–198.

Greaves, M., Stavridou-Coleman, V., & Laddaga, R. (2004). Guest editors’ introduction: Dependable agent systems. IEEE Intelligent Systems, 19(5), 20–23.

Jennings, N. (2000). On agent-based software engineering. Artificial Intelligence, 117(2), 277–296.

Jie, L., Xian-xing, L., & Zheng-wei, G. (2004). Using mobile agents to implement a workflow system. Wuhan University Journal of Natural Science, 9(5), 775–780.

Joseph, J., & Fellenstein, C. (2004). Grid computing. Prentice Hall.

Kesselman, C., & Foster, I. (2004). The grid: Blueprint for a new computing infrastructure (2nd ed.). Morgan Kaufmann.

Khan, M. A., Vaithianathan, S. K., Sivoncik, K., & Bölöni, L. (2003). Towards an agent framework for grid computing. In D. Grigoras & A. Nicolau (Eds.), Proceedings of the second international advanced research workshop on concurrent information processing and computing, CIPC 03, Sinaia, Romania.
Lee, W.-P. (2007a). Deploying personalized mobile services in an agent-based environment. Expert Systems with Applications, 32(4), 1194–1207.

Lee, W.-P. (2007b). Parallelizing evolutionary computation: A mobile agent-based approach. Expert Systems with Applications, 32(2), 318–328.

Lyu, M. R., Chen, X., & Wong, T. Y. (2004). Design and evaluation of a fault-tolerant mobile-agent system. IEEE Intelligent Systems, 19(5), 32–38.

Manzoor, U., & Nefti, S. (2009). An agent based system for activity monitoring on network – ABSAMN. Expert Systems with Applications, 36(8), 10987–10994.

Sánchez, D., & Moreno, A. (2008a). Learning non-taxonomic relationships from web documents for domain ontology construction. Data & Knowledge Engineering, 64(3), 600–623.

Sánchez, D., & Moreno, A. (2008b). Pattern-based automatic taxonomy learning from the Web. AI Communications, 21(1), 27–48.

Shi, Z., Huang, H., Luo, J., Lin, F., & Zhang, H. (2006). Agent-based grid computing. Applied Mathematical Modelling, 30(7), 629–640.

Summiya, S., Ijaz, K., Manzoor, U., & Shahid, A. A. (2006). A fault tolerance infrastructure for mobile agents. In M. Mohammadian (Ed.), Proceedings of the international conference on computational intelligence for modelling control and automation and international conference on intelligent agents, web technologies and international commerce, CIMCA 06, Sydney, Australia (p. 235).

Tang, J., & Zhang, M. (2006). An agent-based peer-to-peer grid computing architecture: Convergence of grid and peer-to-peer computing. In R. Buyya, T. Ma, R. Safavi-Naini, C. Steketee, & W. Susilo (Eds.), Proceedings of the 2006 Australasian workshop on grid computing and e-research, Hobart, Australia (pp. 33–39).

Thanh, D. V. (2001). Using mobile agents in telecommunications. In A. M. Tjoa & R. Wagner (Eds.), Proceedings of the 12th international workshop on database and expert systems applications, DEXA 2001, Munich, Germany (pp. 685–688).

Wooldridge, M. (2009). An introduction to multiagent systems (2nd ed.). John Wiley and Sons.