AGENT-BASED MAINTENANCE MANAGEMENT SYSTEM FOR THE DISTRIBUTED FAULT TOLERANCE Mariela Cerrada
∗∗,1
Juan Cardillo ∗ Jose Aguilar ∗∗ Ra´ ul Faneite ∗∗
∗
Universidad de Los Andes. Facultad de Ingenieria. Dpto. de Sistemas de Control, M´erida-VENEZUELA ∗∗ Universidad de Los Andes. Facultad de Ingenieria. CEMISID, M´erida-VENEZUELA
Abstract: Nowadays, industrial necessities claims global management procedures integrating information systems in order to manage and to use the controlled-processes information and thus, to assure a good process behavior. These aspects aim to the development of fault detection and diagnosis systems and making-decision systems. In this work, a reference model for fault management in industrial processes is proposed. This model is based on a generic framework using multi-agent systems for distributed control systems; in this sense, the fault management problem is viewed like a feedback control process and the actions are related to the making-decision in the preventive maintenance task scheduling and the running of preventive and corrective specific maintenance tasks. A system prototype is developed on Java Development Framework c 2006 IFAC. and a case of study is presented for the prototype validation.Copyright Keywords: Maintenance Engineering, Management Systems, Fault Tolerance, Reliability, Automation.
1. INTRODUCTION Automation is an important aspect that permits to improve the industrial processes performance (Williams et al., 1994), and fault management systems are vital part of the automation process. Fault management and maintenance are vital aspects in industrial process, they not only include the knowledge about the failure modes and their causes, but an awareness of the extent to which equipment failure affects safety, product quality, costs and plant availability (Moubray, 1992). Thus, data and information should be available in order to drive the detection-diagnosis-prediction task. A maintenance program must to give atten1 Corresponding author. Telf.: +58 274 2402983; Fax: +58 274 2402979. This work has been supported by CDCHTULA Grant Project I-621-98-02-A
tion on the incorporation of new techniques and concepts in the maintenance area, and information management that aims to the development of systems permitting to store the corrective maintenance record, failure analysis record (historic detection, diagnosis and prediction), and maintenance planning management. In this sense, fault management systems supporting the decision-making based on the available information in order to propose new maintenance program according to the current enterprise context, have become an important area on the fault tolerance research (Aguilar et al., 2000). Nowadays, more of systems supporting the maintenance management are information systems where the relationships between historical records about fault occurrence, maintenance scheduling, maintenance planning, maintenance performance, plan938
ning reconfiguration have not been viewed in an integrated way. On the other hand, Multi-Agent Systems (MAS) theory can be viewed as an evolution of artificial intelligence, in order to attain autonomous computational systems. It is accorded the autonomy is the main characteristic describing an agent, the autonomy being the ability to accomplish a task and to reach its objectives without human or any other assistance (Weiss, 1999). Agents technology are being widely used as modelling approach for control systems and industrial developments because of the decentralized nature of the problem in these areas (Jennings and Bussmann, 2003) and the complexity of the manufacturing and business environments (Marik and McFarlane, 2005). In order to integrate the multiples perspectives in the related areas in industrial environments (informatics, control, maintenance, management) and competing interests to attain global objectives, MAS based architectures have been proposed (Nahm and Ishikawa, 2005),(Bravo et al., 2005) and general methodologies as MASCommonKADS to specify MAS have been developed (Iglesias et al., 1999). Particularly, MASINA (Multi-Agent Systems-based INtegrated Automation) is an approach for MAS modelling in industrial automation processes (Bravo et al., 2005) and this one has been enhanced from MASCommonKADS by incorporating new aspects for agent modelling (like UML tools, intelligence model and communication model). MAS-based scheduling has been treated in (Ouelhadj et al., 2004) with more attention in the communication problem than the agent modelling; in (Wagner, 2003) the use of agents in automation systems in their conceptual analysis is presented, remarking the role of the monitoring and maintenance as a part of the engineering process and this role is performed by an agent. In this work, it is proposed both of the reference integrated model for fault analysis and maintenance management and a reference model for maintenance management system (MMS) based on such integrated model by using MAS. The MMS objectives are achieved by the coordinated interaction of the agents, this way the agent-based MMS provides the assistance in the detectiondiagnosis-decision process, as well as in the planning and running of maintenance tasks.
SERVICES ADMINISTRATION AGENTS
Human Agent
Applications Administration Agent
O t h e r s
Communication Control Agent
HIGH LEVEL AGENTS
Agents Administration Agent
Business Resources administration agent
M A S
Supervision
Planning
Database agent
Control
Process
Fig. 1. MAS-oriented Architecture for Industrial Automation Prediction Models
Detection
Isolation
Diagnosis
Prediction
Scheduling
Maintenance Diagnosis Models
Detection Models
Fault Tolerant Supervision Control
Outputs
Processes
Actions
Control signal
SUPERVISION LEVEL
Fig. 2. Functional Reference Model for Fault Management System automation system, see figure 1, as an supervision agent (Aguilar et al., 2005), (Bravo et al., 2005). The MMS proposed in this work is based on the functional reference model permitting the adequate interaction between the activities related to the Monitoring and Fault Analysis Tasks (MFAT) and the Maintenance Management Tasks (MMT), see figure 2. The MFAT block is concerned to the detection, isolation, diagnosis, prediction and planning. The MMT block is concerned to the set up and running of the maintenance tasks according to the maintenance plan. Based on this functional reference model and following the MASINA methodology, actors and cases of use are defined and listed in Table 1.
2.1 Agent Model 2. MAS-BASED REFERENCE MODEL FOR MAINTENANCE MANAGEMENT SYSTEMS The MMS proposed in this work is a part of the integrated MAS oriented architecture for industrial
The roles of the mentioned actors can be embedded into the agents defined in the generic conceptual framework for the Agents-based Intelligent Distributed Control Systems (AIDCS) (Aguilar et al., 2005), (Bravo et al., 2005). Thus, 939
Table 1. Actors and cases of use Actor Detector Finder Diagnostician Predictor Scheduler Executor
Case of Use System Monitoring State Identification Finding-Failure Failure Analysis Failure Occurrence Estimation Tasks Scheduling Redefining Plans Running Tasks Reporting Tasks
Finder Agent Detector Agent
DCS
Middleware
Agents-based
Table 3. Agent Coordinator Objective
Coordinator Agent
Control
Type Input Parameters Output Parameters Activation Condition
End Condition Diagnostician Agent
Success Condition
Predictor Agent
Failure Condition Representation Language Description
Controller Agent
Observer Agent
Name
Actuator Agent
Maintenance
To schedule the maintenance tasks according to the item’s reliability, the failure effects and urgent tasks On-condition objective Data from Specialized, Observer and Controller agents Maintenance plan and/or corrective maintenance order (urgent tasks) Information is received from Specialized agents, Observer agent and Controller agent The maintenance plan or corrective actions are stated A maintenance plan or corrective actions are proposed, permitting to achieve adequate reliability levels on the item’s process ¬ Success Condition Natural Language This agent assesses the different setting in order to create the best general maintenance plan on a long time horizon (Long Term Plan (LTP)) and it produces the best set of corrective actions (urgent tasks) in case of emergency
Process
• • • • •
Fig. 3. MAS-based Maintenance Management System the following agents are defined, (figure 3): Detector Agent, Finder Agent, Diagnostician Agent, Predictor Agent, Coordinator Agent, Controller Agent, Actuator Agent, Observer Agent. Under the AIDCS framework the Detector, Finder, Diagnostician and Predictor agents are called specialized agents. In the following, only the characteristics and requirements of the Coordinator Agent are presented in tables 2 and 3. Coordinator Agent provides the following services: Maintenance Plan Proposition, Maintenance Plan Redefinition and Specialized Agent Calling.
LT maintenance plan proposition Resources Evaluation Application order of DIDP tasks Application order of corrective action Maintenance plans redefinition (MPR) Table 4. Task MPR
Name Objective
Precondition Frequency Description
Maintenance plans redefinition To redefine the maintenance plan based on the maintenance tasks that has not been carried out Information from Observer agent must be clear and complete. Related to the information from Observer agent Coordinator agent redefines the maintenance plan from the knowledge of the tasks that has not bee carried out.
Table 2. Agent Coordinator Name Type Roles Description
Coordinator Software agent Making decision for the maintenance tasks planning It gathers the information, from the specialized agents, about the process’ items and it schedules the maintenance tasks. The timeline should be defined according to the item’s reliability, the failure effects and urgent tasks
2.2 Tasks Model The agent’s tasks are defined as the activities that the agent makes in order to accomplish its service and objectives. In case of Coordinator Agent, the following tasks are defined, and table 4 details the tasks Maintenance Plan Redefinition:
2.3 Intelligence Model Excepting the Actuator Agent, all agents in the MMS may be sensitive to be intelligent. A general structure of the intelligence model is presented bellow: (1) Experience • Representation: Rules. • Type: Based on cases. • Reliability: It is related to the data completeness. • Processing scheme: Knowledge parameters tuning and new models incorporation. (2) Learning Mechanism • Type: Adaptive 940
• Representation: Rules, neural techniques, genetic techniques. • Learning Sources: Success or failures during DIDP tasks. • Update Mechanism: Experiences feedback. (3) Reasoning Mechanism • Information Source: Previous results from de FMS’s agents. • Activation Source: Scheduling tasks, DIDP tasks. • Type of Inference: Based on rules. • Task-Objective Relationship: It decides if the used algorithm is adequate for DIDP tasks or if an adequate maintenance plan is proposed. • Reasoning Strategy: It can be deductive or inductive: evaluate the causes of success or failure in DIDP tasks or confront unknown situations.
Fig. 4. UML Interaction diagram the previous conversation Maintenance Plan Redefinition, the following speaking interactions are performed: Expecting Maintenance Tasks Search (Table 6), Alarm and Maintenance Plan Sending. Table 6. Speaking Interaction
2.4 Coordination Model This model describes the communications scheme of the MAS: conversations, protocols and associated languages. The following conversations have been defined: On-Condition Maintenance, Maintenance Plan, Urgent Maintenance Tasks, Maintenance Plan Redefinition, Maintenance State, Functional Failure Identification. The conversation Maintenance Plan Redefinition (MPR) is only presented in table 5.
Speaking Interaction Type Objective
Agents Beginner Precondition End Condition
Table 5. Conversation MPR Conversations Objective
Agents Beginner Speaking interactions Precondition
End Condition
Description
To redefine the execution timeline of the maintenance tasks that have not been put up and running on the process Coordinator, Data-base (MAS-based Middleware) and Human Coordinator Agent Expecting Maintenance Tasks Search, Alarm, Maintenance Plan Shipment A particular maintenance task has not been put up and running on the process. A new timeline has been proposed and a new plan is sent to Data-Base, else, an Alarm is sent to the Human Agent (User) This conversation permits the information search on the Data-Base about the maintenance tasks that have not put up and running (Expecting Tasks) (see figure 4)
2.5 Communication Model A set of 21 speaking interaction has been defined and they are suitably arranged into the conversations in the Coordination Model. In the case of
Description
Expecting Maintenance Tasks Search Query. To search in the Data-Base the expecting maintenance tasks that have not executed on the process Coordinator, Data-Base (MAS-based Middleware) Coordinator Agent An active flag about expecting tasks The Coordinator Agent receives, from the Data-Base Agent, the whole information about the expecting tasks Maintenance Plan Redefinition, Urgent Maintenance Tasks This agent requests the whole information about the expecting tasks reported by the Observer Agent
3. MAS-BASED MMS PROTOTYPE DESIGN The prototype for the proposed MAS-Based MMS has been developed on JADE (Java Development Framework), a tool according to the Foundation for Intelligent Physical Agents (FIPA) request for the development of MAS applications. In order to validate the prototype performance, the maintenance activities on a pool pumping system has been studied (Aguilar and Cerrada, 2000). The system function is to maintain the water flow, from the pool to the conditioning and heating systems, on top of 7 GPM. In case of the main pump, the failure mode is the damage on the motor rolling, the cause should be the wearout stage and the critical variable associated to the failure mode is the vibration signal. A typical maintenance tasks associated to this failure mode 941
Fig. 5. MMS graphic interface: Controller Agent
Fig. 7. FMS graphic interface: Diagnostician Agent
Fig. 6. FMS graphic interface: Detector Agent Fig. 8. FMS graphic interface: Coordinator Agent are the vibrations monitoring (on-condition tasks) and the change of rolling (on-time tasks). An initial maintenance plan has been defined and a set of events has been stated on the normal and abnormal environment context. A particular case is presented: a failure occurrence on the main pump has been noticed during the application of an on-time maintenance task. The maintenance plan is known by the Controller Agent and the maintenance tasks are ordered at a particular time. Figure 5 shows the graphic interface monitoring the Controller Agent ordering the application of different maintenance tasks on the system’s equipment. Numbers in parenthesis are related to the current time unit (weeks, months, etc.). The order from Controller Agent is received by the Coordinator Agent who coordinates the application of detection/ diagnosis tasks. In this case, the specialized Detector Agent runs the detection mechanisms and a fault on the pump is detected at 150 time units (see figure 6). Once a fault is detected, Coordinator Agent requests information about the failure mode and causes to Diagnostician Agent. In this case, the Diagnostician’s result is deteriorated rolling and
the cause is the aging and wearout stage. Figure 7 shows the Diagnostician monitoring. Coordinator Agent need more information in order to take a decision and the services of Predictor Agent are requested. In this case, the dear time for total failure is predicted. Given the whole information, Coordinator Agent orders the change of the pump rolling at this time (see figure 7) and this order is send to the Actuator Agent. Figure 8 shows the monitoring of Coordinator Agent. If a maintenance task is rescheduled by the Coordinator Agent (carried out tasks), then, this is shown on the alarm screen, and a new maintenance plan is stored.
4. CONCLUSIONS In this work, both of the reference integrated model for fault analysis and maintenance management and a reference model based on MAS for maintenance management system (MMS), taking into account such integrated model, has been proposed. This model has been developed into a 942
MAS-based generic framework proposed for Intelligent Distributed Control Systems. In this sense, the system performs a set of tasks (actions) permitting the maintenance tasks planning and the application of specific maintenance tasks as fault detection, isolation, diagnosis and prediction. The enhanced methodology MASINA has been used for the conception and analysis of the MAS, then a set of models permitting to describe the main characteristics of the MAS have been provided. The resulting models have a generic structure that permits to incorporate it into the automation process of a distributed control systems. A prototype of the proposed MMS has been developed on JADE. A case of study related to the pool pumping system has been presented and the working of the FMS is showed, by using an event generator.
Nahm, Y-E. and H. Ishikawa (2005). A hybrid multi-agent system architecture for enterprise integration using computer networks. Robotics and Computer-Integrated Manufacturing 21, 217–234. Ouelhadj, D., S. Petrovic, P.I. Cowling and A. Meisels (2004). Inter-agent cooperation and communication for agent-based robust dynamic scheduling in steel production. Advanced Engineering Informatics 18, 161–172. Wagner, T. (2003). Applying agents for engineering of industrial automation systems. In: Lecture Notes in Artificial Intelligence. MATES 2003 (M. Schillo et al., Ed.). Vol. 2831. pp. 62–73. Springer Verlag. Berlin. Weiss, G. (1999). Multiagent Systems. The MIT Press. Massachusets. Williams, T.J., P. Bernous, J. Brosvic, D. Chen and G. Doumeingts (1994). Architectures for integrating manufacturing activities and enterprises. Computers in Industry 24, 11–39.
5. REFERENCES Aguilar, J. and M. Cerrada (2000). A fuzzy classifier system for fault tolerance. Revista T´ecnica de la Facultad de Ingenier´ıa. LUZ 23(2), 98–108. Aguilar, J., M. Cerrada and K. Morillo (2000). A reliability-based failure management application using intelligent hybrid systems. Proceedings of IFAC SAFEPROCESS 2000 1, 197– 202. Aguilar, J., M. Cerrada, F. Hidrobo and F. Rivas (2005). A multiagent based framework for intelligent distributed control systems. In: Lecture Notes in Computer Science. KnowledgeBased Intelligent Information and Engineering Systems (R. Khosla et al., Ed.). Vol. 3681. pp. 191–197. Springer Verlag. Berlin. Bravo, C., J. Aguilar, F. Rivas and M. Cerrada (2005). Design of an architecture for industrial automation based on multi-agents systems. Proceedings of 16th IFAC World Congress. Iglesias, C.A., M. Garijo, J.C. Gonz´ alez and J.R. Velazco (1999). Analysis and design of multi-agent systems using mascommonkads. In: Lecture Notes in Artificial Intelligence.Intelligent Agents IV. Agents Theories, Architectures and Language (M.P. Singh, A.S. Rao and M. Wooldridge, Eds.). Vol. 1365. pp. 1–16. Springer Verlag. Berlin. Jennings, N. and S. Bussmann (2003). Agentbased control systems. IEEE Control Systems Magazine pp. 61–73. Marik, V. and D. McFarlane (2005). Industrial adoption of agent-based technologies. IEEE Intelligent Systms pp. 27–35. Moubray, J. (1992). Reliablity-Centered Maintenance. Industrial Press. New York. 943