1st IFAC Workshop on Dependable Control of Discrete Systems (DCDS'07) ENS Cachan, France - June 13-15, 2007
INTERMITTENT FAULT DIAGNOSIS: A DIAGNOSER DERIVED FROM THE NORMAL BEHAVIOR 1
Siegfried Soldani ∗,∗∗ Michel Combacau ∗,∗∗∗ Audine Subias ∗,∗∗∗∗ Jérôme Thomas ∗∗
∗ LAAS-CNRS, 7 Avenue du Colonel Roche, FRANCE-31077 Toulouse Cedex 4
∗∗ ACTIA, 25 Chemin de Pouvourville, FRANCE-31432 Toulouse Cedex 4
∗∗∗ Université Paul Sabatier, 118 route de Narbonne, FRANCE-31062 Toulouse Cedex 4
∗∗∗∗ INSA de Toulouse, 135 avenue de Rangueil, FRANCE-31077 Toulouse Cedex 4
Abstract: This paper deals with an approach for the localization of intermittent faults in discrete event systems with partial observability. The proposed methods are based on a discrete event model representing the normal functioning of the observable behavior of the monitored system. This model, based on the automata formalism, is built from the design data. The detection step consists of a comparison between the flow of observable events emitted by the monitored system and the flow foreseen by the model. A localization mechanism, based on the diagnoser approach, points out the set of events potentially responsible for the faults. These two mechanisms are designed to operate on-board, in real time. An example from the automotive domain is presented. Copyright © 2007 IFAC
Keywords: Fault Detection, Fault Localization, Discrete-Event Systems, Automata, Automotive, Intermittent Faults.
1. INTRODUCTION
Nowadays, communication has become a main element of electronic and computing systems. Data exchanges play a fundamental part in the correct operation of the devices of a control system. For this reason, fault detection and diagnosis at the discrete event level is an important challenge in dealing with system failures. Moreover, in many systems, faulty behavior often occurs intermittently. To detect this kind of fault, the approach has to be usable on-line. Therefore, our proposition is an approach for the design of an on-board detection and diagnosis system suitable for any networked architecture and aimed mostly at intermittent faults. This approach can be applied to different fields, such as transportation systems and particularly the automotive field, where a typical on-board control system is made up of different devices (sensors, actuators, processors) which exchange data through a communication bus to fulfill the functions required by the optimal operation of the vehicle.
Section two recalls the main works in the domain of discrete event diagnosis and positions our proposal. The main results about localization mechanisms are presented in section three. An application example is given in section four and the conclusion gives the most promising perspectives of this work.
1 The authors want to thank the French Ministry of Industry, the French Région Midi-Pyrénées and the FEDER program of the EEC for their support.
2. DIAGNOSIS AND DISCRETE EVENTS SYSTEMS
Fault detection and diagnosis of discrete event systems have been the subject of many studies. Petri net approaches to fault diagnosis have been proposed (Genc and Lafortune, 2003; Jiroveanu and Boel, 2005; Lefebvre and Delherm, 2005), as well as diagnoser approaches (Contant et al., 2002; Lamperti and Zanella, 2003; Pencolé, 2004; Sampath et al., 1998). In these works the model of the system to diagnose is a behavioral model including both normal operation and faults. These approaches give very good results for predictable faults. Indeed, to be detected and diagnosed, a fault must be taken into account by the model of the system. This implies an accurate knowledge of the system and of the faults. However, it is not always realistic to consider that all the faults can be known. The ongoing increase of electronic and computing devices in the systems makes it difficult to know which faults can occur. It is therefore impossible to foresee all the faults exhaustively.
For this reason, we consider in our approach only the normal behavior of the system. Nevertheless, we assume that a device failure may imply the occurrence of a faulty event or the lack of a correct event. Thus, we do not have any information about the failure itself, but we know some classes of faults. These classes of faults represent the insertion of spurious events or the lack of events. These faults may result from device failures such as bad electrical contacts (e.g. faulty relays), sticky components (e.g. stuck valves) or unknown bugs. These failures are typical situations of intermittent faults. The problem of intermittent faults has started to be studied in several works (Contant et al., 2004; Correcher et al., 2003; Jiang and Kumar, 2006) but still remains to be explored in more detail. In these works, the faulty events are assumed to be followed by the corresponding reset event, which differs from our approach.
Currently, we consider the loss of only one event or the occurrence of only one spurious event, these two situations being the typical symptoms of an intermittent fault.
3. DESCRIPTION OF LOCALIZATION REASONING
Our proposal consists of three steps: the modeling of the system behavior (off-line) and the detection of intermittent and fugitive faults (on-line), both described in previous work (Soldani et al., 2006), and a localization method (on-line), presented here.
3.1 Previous Works
Our previous work deals with the modeling of the system behavior and the detection of intermittent and fugitive faults. The model is built from design data, and only the observable events (communication events) are represented. The detection mechanism consists of a comparison between the flow of observable events emitted by the monitored system and the flow foreseen by the model. A first localization reasoning was then developed. It consists in modifying the sequence of the last observable events leading to an inconsistency, either by deleting one of these events or by inserting an event of the model within this sequence. The goal is to restore the consistency between the observations and the model state trajectories. This reasoning, like the calculation of the sequence size, is executed on-line. In order to decrease the complexity of the on-line computation, we decided to develop an approach based on the diagnoser approach (Sampath et al., 1998). Let us recall that the main difference between the classic diagnoser and our approach is that we do not know the faults but only some classes of faults. The difficulty is to determine all insertions and lacks and their consequences on the model behavior, and to build such a model. Our diagnoser takes all insertions and lacks in the normal behavior model into account.

3.2 Definitions and recalls
Let us now recall some definitions about automata and the synchronised product. We define an automaton as follows.

Definition 1. Labeled transition system
A labeled transition system is a four-tuple $\Gamma = \langle S, S_0, E, R \rangle$ where:
- $S$ is the finite set of states;
- $S_0$ is the finite set of initial states, $S_0 \subseteq S$;
- $E$ is the finite set of labels, i.e. transition labels;
- $R$ is the set of labeled transitions, $R \subseteq S \times E \times S$, noted $(s, a, s')$ or $s \xrightarrow{a} s'$.
Remark: we define the empty label $\epsilon$ by $(s, \epsilon, s)$.

Definition 2. Cartesian product of two automata
The Cartesian product of two automata $\Gamma_i = \langle S_i, S_{0i}, E_i, R_i \rangle$, $i \in \{1, 2\}$, is the automaton $\Gamma = \langle S, S_0, E, R \rangle$ such that:
- $S = S_1 \times S_2$;
- $S_0 = S_{01} \times S_{02}$;
- $E = E_1 \times E_2$;
- $R$ is the set of transitions defined by $((s_1, s_2), a, (s'_1, s'_2))$ iff:
  · either $a = (a_1, \epsilon)$ and $(s_1, a_1, s'_1) \in R_1$ and $(s_2, \epsilon, s'_2) \in R_2$;
  · or $a = (\epsilon, a_2)$ and $(s_1, \epsilon, s'_1) \in R_1$ and $(s_2, a_2, s'_2) \in R_2$;
  · or $a = (a_1, a_2)$ and $(s_1, a_1, s'_1) \in R_1$ and $(s_2, a_2, s'_2) \in R_2$.

Definition 3. Synchronised product of two automata
The synchronised product of two automata $\Gamma_1$ and $\Gamma_2$, noted $\Gamma_1 \| \Gamma_2 \backslash Sync$, is the Cartesian product restricted to the labeled transitions in $Sync \subseteq (E_1 \times E_2)$. This automaton is noted $\Gamma = \langle S, S_0, Sync, R \rangle$.
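Definitions 1 to 3 can be illustrated by a minimal Python sketch. The names `LTS`, `EPS`, `moves` and `sync_product` are hypothetical and do not come from the paper; labels are plain strings, and the empty label is treated as a self-loop available in every state, following the remark of Definition 1.

```python
from dataclasses import dataclass
from itertools import product

EPS = "eps"  # hypothetical encoding of the empty label


@dataclass(frozen=True)
class LTS:
    """Labeled transition system <S, S0, E, R> (Definition 1)."""
    states: frozenset
    initial: frozenset
    labels: frozenset
    transitions: frozenset  # set of triples (s, a, s')


def moves(lts, s, a):
    """Target states reached from s with label a; EPS leaves s unchanged."""
    if a == EPS:
        return {s}
    return {t for (p, l, t) in lts.transitions if p == s and l == a}


def sync_product(g1, g2, sync):
    """Cartesian product restricted to the label pairs in sync (Definition 3)."""
    transitions = set()
    for (s1, s2) in product(g1.states, g2.states):
        for (a1, a2) in sync:
            for t1 in moves(g1, s1, a1):
                for t2 in moves(g2, s2, a2):
                    transitions.add(((s1, s2), (a1, a2), (t1, t2)))
    return LTS(frozenset(product(g1.states, g2.states)),
               frozenset(product(g1.initial, g2.initial)),
               frozenset(sync),
               frozenset(transitions))


# Toy automata, for illustration only (not the models of the paper).
g1 = LTS(frozenset({"p", "q"}), frozenset({"p"}),
         frozenset({"a"}), frozenset({("p", "a", "q")}))
g2 = LTS(frozenset({"OK", "F"}), frozenset({"OK"}),
         frozenset({"a+"}), frozenset({("OK", "a+", "F")}))
prod = sync_product(g1, g2, {("a", EPS), ("a", "a+")})
```

The product keeps every pair of states, reachable or not, which is consistent with the state count discussed for the synchronised product later in the paper.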
3.3 Fault Class Model
As stated previously, the construction of the normal behavior model of the function is not described here. This model is represented by an automaton noted $\Gamma_{BF}$. It does not represent faulty events; instead, we consider the set of all possible insertions and all possible lacks of events. We build two fault class models, $\Gamma^{+}$ and $\Gamma^{-}$, which represent respectively the assumption that an event may be inserted or may be lacking (cf. Figure 1). The automaton $\Gamma^{+} = \langle S^{+}, S_0^{+}, E^{+}, R^{+} \rangle$ is defined by:
- $S^{+} = \{OK, F^{+}_{t_1}, \ldots, F^{+}_{t_n}\}$, $\forall t_i \in E(\Gamma_{BF})$;
- $S_0^{+} = \{OK\}$;
- $E^{+} = \{t_1^{+}, \ldots, t_n^{+}\}$, $\forall t_i \in E(\Gamma_{BF})$;
- $R^{+} = \{(OK, t_i^{+}, F^{+}_{t_i})\}$.
To get the lack fault class model, it is sufficient to change $t_i^{+}$ into $t_i^{-}$ and $F^{+}_{t_i}$ into $F^{-}_{t_i}$.
The insertion of an event $e_i$ associated to the transition $t_i$ is represented in $\Gamma^{+}$ by a transition noted $t_i^{+}$. In the same way, the lack of an event $e_i$ associated to the transition $t_i$ is represented in $\Gamma^{-}$ by a transition noted $t_i^{-}$. Thus, $F^{+}_{t_i}$ is the assumption that the event $e_i$ associated to the transition $t_i$ is inserted, and $F^{-}_{t_i}$ is the assumption that the event $e_i$ associated to the transition $t_i$ is lacking.

[Figure 1: state OK with outgoing transitions $t_1^{+}, \ldots, t_i^{+}, \ldots, t_n^{+}$ leading to the states $F^{+}_{t_1}, \ldots, F^{+}_{t_i}, \ldots, F^{+}_{t_n}$]
Fig. 1. Insertion fault class model

Remark 1: n corresponds to the number of transitions in the normal behavior model.
Remark 2: we consider that only one event can be inserted or be lacking, which is represented in the corresponding models by the fact that no transition follows the states $F^{+}_{t_i}$ or $F^{-}_{t_i}$. Nevertheless, to consider multiple faults, the models can be modified by adding transitions after these states.
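The construction of the two fault class models can be sketched as follows. This is only an illustration: the function name `fault_class_model` and the string encoding of states and labels (`"F_t3+"`, `"t3+"`) are assumptions made for the example, not notation from the paper.

```python
def fault_class_model(transition_names, sign):
    """Return (S, S0, E, R) for the insertion model (sign '+')
    or the lack model (sign '-') built over the given transition names."""
    if sign not in ("+", "-"):
        raise ValueError("sign must be '+' or '-'")
    states = {"OK"} | {f"F_{t}{sign}" for t in transition_names}
    initial = {"OK"}
    labels = {f"{t}{sign}" for t in transition_names}
    # One transition per fault assumption; nothing leaves an F state,
    # which encodes the single-fault hypothesis of Remark 2.
    transitions = {("OK", f"{t}{sign}", f"F_{t}{sign}")
                   for t in transition_names}
    return states, initial, labels, transitions


# Toy usage with three hypothetical transition names.
S_plus, S0_plus, E_plus, R_plus = fault_class_model(["t1", "t2", "t3"], "+")
```

With n transition names the sketch yields n + 1 states and n transitions, matching the sizes stated for the fault class models.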
3.4 Model including generic faults
To get a model representing the evolution of these assumptions (lack and insertion), we modify the normal behavior model by considering these insertions or lacks.

3.4.1. Lacking event
The lack of an event is considered as the occurrence of an unobservable event: the system may evolve whereas the model remains in the same state. Thus, the modification for the lacking events consists in creating, for each transition $(s, t_i, s')$ in $R_{BF}$, a new transition $(s, t_i^{-}, s')$. The unobservable event is associated to the transition $t_i^{-}$. The new automaton, noted $\Gamma^{-}_{BF} = \langle S^{-}_{BF}, S^{-}_{0BF}, E^{-}_{BF}, R^{-}_{BF} \rangle$, is defined by:
- $S^{-}_{BF} = S_{BF}$;
- $S^{-}_{0BF} = S_{0BF}$;
- $E^{-}_{BF} = E_{BF} \cup \{t_i^{-}\}$, $\forall t_i \in E(\Gamma_{BF})$;
- $R^{-}_{BF} = R_{BF} \cup \{(s, t_i^{-}, s')\}$ such that $\forall (s, t_i, s') \in R_{BF}$, $(s, t_i^{-}, s') \in R^{-}_{BF}$.

3.4.2. Insertion of event
A received event may be faulty. If this event corresponds to an expected event, then the model may evolve whereas the system remains in the same state. Therefore, the modification for the inserted events consists in adding to the automaton a transition whose input state $s$ is the same as its output state and whose label is $t_i^{+}$. It is noted $(s, t_i^{+}, s)$. The faulty event must not change the state in the model (like the empty label). Note that we add to a state $s$ only the insertions of its output transitions, because the events associated to these transitions are the only events that may change the trajectory without being immediately detected. That is why, for each event occurrence, an insertion assumption about this event is always made and is not represented. The new automaton, noted $\Gamma^{+}_{BF} = \langle S^{+}_{BF}, S^{+}_{0BF}, E^{+}_{BF}, R^{+}_{BF} \rangle$, is defined by:
- $S^{+}_{BF} = S_{BF}$;
- $S^{+}_{0BF} = S_{0BF}$;
- $E^{+}_{BF} = E_{BF} \cup \{t_i^{+}\}$, $\forall t_i \in E(\Gamma_{BF})$;
- $R^{+}_{BF} = R_{BF} \cup \{(s, t_i^{+}, s)\}$ such that $\forall (s, t_i, s') \in R_{BF}$, $(s, t_i^{+}, s) \in R^{+}_{BF}$.
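The two modifications of the normal behavior model can be sketched in a few lines. The helper names and the suffix encoding of the labels (`"t3-"`, `"t3+"`) are hypothetical; the toy transition relation below is not the model of the paper.

```python
def add_lack_transitions(r_bf):
    """R_BF plus one unobservable copy (s, t-, s') of each transition (s, t, s')."""
    return set(r_bf) | {(s, f"{t}-", s2) for (s, t, s2) in r_bf}


def add_insertion_transitions(r_bf):
    """R_BF plus one self-loop (s, t+, s) for each output transition (s, t, s')."""
    return set(r_bf) | {(s, f"{t}+", s) for (s, t, _s2) in r_bf}


# Toy normal-behavior relation with two transitions.
r_bf = {("s1", "t3", "s2"), ("s2", "t4", "s1")}
r_lack = add_lack_transitions(r_bf)
r_ins = add_insertion_transitions(r_bf)
```

In both cases the state set is unchanged and each original transition contributes exactly one extra transition, so the modified relation has twice as many transitions as the original one when transition names are unique.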
3.5 Synchronised Product
After building these different models, a synchronised product is made between the fault class models and the modified normal behavior model. This synchronisation associates the function states ($S_1, \ldots, S_n$) to the modes ($OK$, $F^{+}_{t_i}$) or ($OK$, $F^{-}_{t_i}$) in which the function is or could be, depending on the transitions allowed in the product. For the lacking and inserted events, the synchronised products are given by:
$\Gamma^{-}_{Sync} = \Gamma^{-}_{BF} \| \Gamma^{-} \backslash Sync$ with $Sync = \{(t_i^{-}, t_i^{-}), (t_i, \epsilon)\}$
$\Gamma^{+}_{Sync} = \Gamma^{+}_{BF} \| \Gamma^{+} \backslash Sync$ with $Sync = \{(t_i^{+}, t_i^{+}), (t_i, \epsilon)\}$
We get all the parameters defined by the synchronised product, noted $S^{(+/-)}_{Sync}$, $S^{(+/-)}_{0Sync}$, $E^{(+/-)}_{Sync}$, $R^{(+/-)}_{Sync}$. This is the weak point of the approach, in the sense that this product generates a lot of states. The fault class models have $(n+1)$ states and $n$ transitions. The synchronised product generates $n^{+}_s \times (n+1)$ states and at most $(n^{+}_t \times n)$ transitions (and likewise for the other product), where $n^{+}_s$ and $n^{+}_t$ are the numbers of states and transitions in $\Gamma^{+}_{BF}$. Thus, the complexity of the computation is polynomial. Moreover, since this computation is made off-line, it is possible to use this approach to localize the faulty event.
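The size figures above can be evaluated with a short helper. The function name and the numerical values are purely illustrative assumptions: a normal behavior model with 5 states and n = 12 transitions, whose modified version therefore has 24 transitions (each original transition plus its added copy).

```python
def product_size_bound(n_states_mod, n_transitions_mod, n):
    """Number of states and upper bound on the number of transitions of the
    synchronised product, following the expressions n_s * (n + 1) and n_t * n."""
    return n_states_mod * (n + 1), n_transitions_mod * n


# Illustrative values only: 5 states, n = 12, modified model with 24 transitions.
n_states, max_transitions = product_size_bound(5, 24, 12)
```

Even for such a small model the product has several dozen states, which illustrates why the computation is kept off-line.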
3.6 Diagnoser
We still need to build two new models, $\Gamma^{+}_{Diag}$ and $\Gamma^{-}_{Diag}$, from $\Gamma^{+}_{Sync}$ and $\Gamma^{-}_{Sync}$, which take the meaning of the transitions $t_i^{+}$ and $t_i^{-}$ into account. Moreover, these models also take into account the fact that some states will never be reached, because they do not correspond to the normal behavior and a detection will be made before. We obtain two diagnosers, one for the lacking events and another for the inserted events.

3.6.1. Insertion of events
A transition $t_i^{+}$ is associated to the same event as the transition $t_i$. Thus, from a state in the synchronised model, we have to gather the output states of the transitions $t_i$ and $t_i^{+}$, because they correspond to the same event (which is seen on the communication bus). We can therefore define rules to build the new model from the previous synchronised model so as to take this condition into account. The new model $\Gamma^{+}_{Diag} = \langle S^{+}_{Diag}, S^{+}_{0Diag}, E^{+}_{Diag}, R^{+}_{Diag} \rangle$ is characterized by:
- $S^{+}_{Diag} \subseteq \mathcal{P}(S^{+}_{Sync})$, set of subsets of $S^{+}_{Sync}$;
- $S^{+}_{0Diag} \in \mathcal{P}(S^{+}_{0Sync})$ such that $S^{+}_{0Sync} \subseteq S^{+}_{0Diag}$;
- $E^{+}_{Diag} \subseteq E^{+}_{Sync}$.
First of all, we initialize the sets $S^{+}_{Diag}$ and $R^{+}_{Diag}$ by:
- $S^{+}_{Diag} = \{\{s_i\}\}$, $\forall s_i \in S^{+}_{Sync} = S^{+}_{BF} \times S^{+}$;
- $R^{+}_{Diag} = \{(\{s_i\}, t, \{s_j\})\}$, $\forall (s_i, t, s_j) \in R^{+}_{Sync}$.
Now, we can define the rules used to build our diagnoser:
(1) Let $x, x' \in R^{+}_{Diag}$ with $x = (\{s_i\}, t_i, \{s_j\})$ and $x' = (\{s_i\}, t_k, \{s'_j\})$, $t_k = t_i^{+}$ ⇒
  • $S^{+}_{Diag} = S^{+}_{Diag} \cup \{\{s_j, s'_j\}\}$
  • $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(\{s_i\}, t_i, \{s_j, s'_j\})\}$
  • $R^{+}_{Diag} = R^{+}_{Diag} \setminus \{x, x'\}$
(2) Let $s_m = (., OK) \in S^{+}_{Sync}$ and $S_1, S_2 \in S^{+}_{Diag}$, $s_m \in (S_1 \cap S_2)$ ⇒
  • $S^{+}_{Diag} = S^{+}_{Diag} \cup \{S_1 \cup S_2\} \setminus \{S_1, S_2\}$
  • And $\forall S_x \in \{S_1, S_2\}$,
    - $(S_x, t, S_y) \in R^{+}_{Diag}$ ⇒ $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(\{S_1 \cup S_2\}, t, S_y)\} \setminus \{(S_x, t, S_y)\}$
    - $(S_y, t, S_x) \in R^{+}_{Diag}$ ⇒ $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(S_y, t, \{S_1 \cup S_2\})\} \setminus \{(S_y, t, S_x)\}$
(3) Let $S_i, S_j, S_m, S_l \in S^{+}_{Diag}$ and $x, x' \in R^{+}_{Diag}$ with $x = (S_i, t_k, S_l)$, $x' = (S_j, t_l, S_m)$,
  • $S_j \subseteq S_i$ ⇒ $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(S_i, t_l, S_m)\}$
  • $S_j \subseteq S_i$ and $t_k = t_l$ ⇒
    - $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(S_i, t_k, \{S_l \cup S_m\})\}$
    - $S^{+}_{Diag} = S^{+}_{Diag} \cup \{S_l \cup S_m\}$
    - And $\forall S_x \in \{S_l, S_m\}$,
      $(S_x, t, S_y) \in R^{+}_{Diag}$ ⇒ $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(\{S_l \cup S_m\}, t, S_y)\} \setminus \{(S_x, t, S_y)\}$
      $(S_y, t, S_x) \in R^{+}_{Diag}$ ⇒ $R^{+}_{Diag} = R^{+}_{Diag} \cup \{(S_y, t, \{S_l \cup S_m\})\} \setminus \{(S_y, t, S_x)\}$
(4) Let $x = (S_i, t, S_j) \in R^{+}_{Diag}$: $(., OK) \notin S_i$ ⇒ $R^{+}_{Diag} = R^{+}_{Diag} \setminus \{x\}$

3.6.2. Lacking events
An event/transition $t_i^{-}$ corresponds to a lacking event and is therefore an unobservable event/transition. Thus, from an observation point of view, if $\exists (s_j, t_i^{-}, s'_j)$, then $s_j$ and $s'_j$ cannot be distinguished and are grouped together in $\Gamma^{-}_{Diag}$. The model $\Gamma^{-}_{Diag}$ is characterized and initialized in the same way as $\Gamma^{+}_{Diag}$, but from the elements of $\Gamma^{-}_{Sync}$. The set of rules used to build $\Gamma^{-}_{Diag}$ is constituted by the rule 1 defined below and rules 2,...,6 defined in 3.6.1:
(1) Let $S_y, S_x, S_i, S_j \in S^{-}_{Diag}$ with $S_x \in \{S_i, S_j\}$, $S_y \notin \{S_i, S_j\}$:
  - $x = (S_i, t_n^{-}, S_j) \in R^{-}_{Diag}$ ⇒ $S^{-}_{Diag} = S^{-}_{Diag} \cup \{S_i \cup S_j\}$ and $R^{-}_{Diag} = R^{-}_{Diag} \setminus \{x\}$
  - $x = (S_x, t, S_y) \in R^{-}_{Diag}$ ⇒ $R^{-}_{Diag} = R^{-}_{Diag} \cup \{(\{S_i \cup S_j\}, t, S_y)\} \setminus \{x\}$
  - $x = (S_y, t, S_x) \in R^{-}_{Diag}$ ⇒ $R^{-}_{Diag} = R^{-}_{Diag} \cup \{(S_y, t, \{S_i \cup S_j\})\} \setminus \{x\}$
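The grouping performed by rule 1 for lacking events can be sketched as follows. This is a simplified illustration, not the authors' full rule set: it only merges the states connected by unobservable transitions (using a union-find structure, which is a choice made for this sketch) and redirects the remaining transitions to the merged groups. The other rules, in particular the pruning of transitions leaving groups without an OK state, are not covered. The function name and the label encoding are assumptions.

```python
from collections import defaultdict


def merge_unobservable(states, transitions, is_unobservable):
    """Group states linked by unobservable transitions and lift the
    observable transitions to the resulting groups."""
    parent = {s: s for s in states}

    def find(s):
        # Find the representative of s, with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Merge the two ends of every unobservable transition.
    for src, label, dst in transitions:
        if is_unobservable(label):
            parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for s in states:
        groups[find(s)].add(s)
    group_of = {s: frozenset(groups[find(s)]) for s in states}

    # Unobservable transitions disappear; the others connect groups.
    lifted = {(group_of[src], label, group_of[dst])
              for src, label, dst in transitions
              if not is_unobservable(label)}
    return set(group_of.values()), lifted


# Toy fragment of a synchronised product (hypothetical encoding).
a, b, c = ("s1", "OK"), ("s2", "F_t3-"), ("s1", "F_t3-")
groups, lifted = merge_unobservable(
    {a, b, c},
    {(a, "t3-", b), (b, "t4", c)},
    lambda label: label.endswith("-"),
)
```

In this toy fragment, the states `(s1, OK)` and `(s2, F_t3-)` end up in the same group because only the unobservable transition `t3-` separates them.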
4. APPLICATION IN AUTOMOTIVE INDUSTRY
4.1 Specificities of the automotive context
In most automotive architectures, the electronic devices, named Electronic Control Units (ECUs), are connected by a specific local network: the Controller Area Network (CAN). CAN is based on a serial communication protocol which supports distributed real-time control. The functions provided to the passengers are implemented by a collaboration of the different ECUs. Our example concerns the “front wiping” function. The model of this function is not highly complex, but it is sufficient to show how our proposal works. This function is distributed over three different ECUs: the “on/off and speed selection” ECU, the actuators ECU and the main ECU. The main ECU manages the behavior of the function. It receives messages from the two other ECUs and sends control messages to the actuators. These ECUs are designed to deal with some foreseen faults, not with intermittent faults. Our work is intended to give a solution to this kind of fault. In this architecture, the different ECUs are “black boxes” and the real-time monitoring can only be done through the observation of the messages exchanged on the network. Let us see on this example how the localization mechanisms previously described can be used.

4.2 Modeling of a distributed function
The $\Gamma_{BF}$ model (figure 2) represents the exchange of messages between the three ECUs during normal operation of the front-wiping function and is built from design data. This function has five states and the transitions are labeled by sets of events. They are described below:
s0: initial state, reached each time the power is off,
s1: stand-by (wiping suspended or stopped),
s2: wiping low-speed,
s3: maintenance,
s4: wiping high speed.
e1: maintenance request off
e2: maintenance request on
e3: low speed request off
e4: low speed request on
e5: high speed request off
e6: high speed request on

[Figure 2: automaton over the states s0 to s4; transition labels: t0: e3.e5, t1: e1, t2: e2, t3: e4, t4: e3, t5: e5, t6: e6, t7: e3.e6, t8: e5.e4, t9: e1.e4, t10: e3.e2, t11: e1.e6]
Fig. 2. “front wiping” abstracted model

4.3 Detection of a fault
Let us consider the initial state s1 and the sequence of events [e2, e3]. The event e2 is consistent with the events expected in the model, but the following event e3 does not belong to the set of expected events. So, the detection is made. Instead of reasoning from this sequence in order to determine which event might be the faulty one, we use a diagnoser which provides the different assumptions that are consistent with the received sequence of events. Otherwise, only an insertion assumption could explain the inconsistency.
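The detection step on this example can be sketched in Python. The source and target states of the transitions below are an assumption inferred from the meaning of the events (for instance, "maintenance request on" leading from stand-by to maintenance); they are not confirmed by the text, and only a fragment of the model is encoded. The function name `detect` is hypothetical.

```python
def detect(transitions, start, observed):
    """Follow the observed events in the model and return the index of the
    first event that is not expected, or None if the sequence is consistent.
    Assumes a deterministic model (at most one target per state and event)."""
    state = start
    for index, event in enumerate(observed):
        targets = [dst for (src, ev, dst) in transitions
                   if src == state and ev == event]
        if not targets:
            return index
        state = targets[0]
    return None


# Assumed fragment of the front-wiping model: (source, event, target).
# Compound labels such as "e1.e4" are treated as single atomic labels.
fragment = {
    ("s1", "e2", "s3"),     # t2, assumed stand-by -> maintenance
    ("s3", "e1", "s1"),     # t1, assumed maintenance -> stand-by
    ("s1", "e4", "s2"),     # t3, assumed stand-by -> low speed
    ("s2", "e3", "s1"),     # t4, assumed low speed -> stand-by
    ("s3", "e1.e4", "s2"),  # t9, assumed maintenance -> low speed
}
first_inconsistency = detect(fragment, "s1", ["e2", "e3"])
```

Under these assumptions, the second event of the sequence (index 1, the event e3) is the one that triggers the detection, as described in the text.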
4.4 Fault localization model
Building $\Gamma^{+}_{BF}$, $\Gamma^{-}_{BF}$, $\Gamma^{+}_{Sync}$ and $\Gamma^{-}_{Sync}$ leads to $\Gamma^{+}_{Diag}$ and $\Gamma^{-}_{Diag}$. In this paper, only $\Gamma^{-}_{Diag}$ is represented (figure 3).
We can see that, from the state $\{(s_1, OK), (s_2, F^{-}_{t_3}), (s_4, F^{-}_{t_6}), (s_3, F^{-}_{t_1})\}$, the sequence of events [e2, e3] is consistent with a trajectory pointing out the assumption $(s_1, F^{-}_{t_9})$. This means that the event associated to the transition t9, i.e. the event e1.e4, is lacking. This is a valid assumption to explain the observed sequence. With this method, we reach the same results (fault localization) as those presented in (Soldani et al., 2006). However, the on-line computation complexity has been significantly reduced, since it only requires following the evolution of two DES models.
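The lack assumption obtained on this example can be reproduced by a small search that allows at most one unobserved transition along the trajectory. This is only an illustrative on-line search, not the precompiled diagnoser of the paper. The transition endpoints are assumptions inferred from the meaning of the events, only a fragment of the model is encoded, and the names (`lack_hypotheses`, `"F_t9-"`) are hypothetical.

```python
def lack_hypotheses(transitions, start, observed):
    """Return the pairs (state, mode) consistent with the observed sequence
    when at most one transition may have fired without being observed.
    transitions maps a transition name to (source, event, target)."""

    def add_lacks(configs):
        # From mode OK, any output transition may have fired unobserved.
        lacks = {(dst, f"F_{name}-")
                 for (state, mode) in configs if mode == "OK"
                 for name, (src, _event, dst) in transitions.items()
                 if src == state}
        return configs | lacks

    configs = add_lacks({(start, "OK")})
    for event in observed:
        # Observable step: the mode is kept, the state follows the event.
        stepped = {(dst, mode)
                   for (state, mode) in configs
                   for (src, ev, dst) in transitions.values()
                   if src == state and ev == event}
        configs = add_lacks(stepped)
    return configs


# Assumed fragment of the front-wiping model.
fragment = {
    "t1": ("s3", "e1", "s1"),
    "t2": ("s1", "e2", "s3"),
    "t3": ("s1", "e4", "s2"),
    "t4": ("s2", "e3", "s1"),
    "t9": ("s3", "e1.e4", "s2"),
}
hypotheses = lack_hypotheses(fragment, "s1", ["e2", "e3"])
```

Under these assumptions the only remaining hypothesis is the state s1 with the lack of the event of t9, which agrees with the assumption read on the lack diagnoser. The diagnoser approach performs the equivalent computation off-line, so that only the current diagnoser state has to be followed on-board.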
[Figure 3: lack diagnoser $\Gamma^{-}_{Diag}$; each diagnoser state groups pairs (state, mode), for example {(s1,Ok), (s2,Ft3-), (s4,Ft6-), (s3,Ft1-)}; the transitions carry the labels t0 to t11 of figure 2]
Fig. 3. Lack diagnoser model

5. CONCLUSION
The aim of this paper is to describe a method for the localization of intermittent faults by an on-board monitoring system based on discrete event models. The monitoring system operates with a model of the normal behavior only. The considered approach is of particular interest for automotive applications, a domain in which intermittent faults lead to very awkward situations for the mechanics. The localization of the fault, i.e. the lacking event or the spurious event, gives some potentially useful information to the mechanics (the components at the origin of the fault). Technically, this on-board monitoring system is designed to be connected to the off-board diagnosis tool used by the mechanics in the garage. Future works will focus on the use of Petri nets to reduce the state explosion in the synchronised product. Moreover, we will address the problem of model initialisation and of the reset when a detection is made (Caines et al., 1988; Giua, 1997). Besides, with this approach, diagnosability and detectability analyses will be carried out, in line with existing works (Sampath et al., 1998; Pencolé, 2004). A real application on a vehicle is being developed to prove the efficiency of the proposition.
REFERENCES
Caines, P.E., R. Greiner and S. Wang (1988). Dynamical logic observers for finite automata. In: Proceedings of the 27th IEEE Conference on Decision and Control. Austin, Texas (USA). pp. 226–233.
Contant, O., S. Lafortune and D. Teneketzis (2002). Failure diagnosis of discrete event systems: The case of intermittent faults. In: Proceedings of the 41st IEEE Conference on Decision and Control. Las Vegas (USA). pp. 4006–4011.
Contant, O., S. Lafortune and D. Teneketzis (2004). Diagnosis of intermittent faults. Discrete Event Dynamic Systems 14, 171–202.
Correcher, A., E. Garcia, F. Morant, E. Quiles and R. Blasco-Gimenez (2003). Intermittent failure diagnosis in industrial processes. In: IEEE International Symposium on Industrial Electronics, ISIE’03. Vol. 2. pp. 4006–4011.
Genc, S. and S. Lafortune (2003). Distributed diagnosis of discrete-event systems using Petri nets. In: ICATPN’03. Eindhoven (The Netherlands). pp. 316–336.
Giua, A. (1997). Petri net state estimators based on event observation. In: Proceedings of the 36th IEEE Conference on Decision and Control. San Diego, California (USA). pp. 4086–4091.
Jiang, S. and R. Kumar (2006). Diagnosis of repeated failures for discrete event systems with linear-time temporal-logic specifications. 3, 47–59.
Jiroveanu, G. and R.K. Boel (2005). Petri net model-based distributed diagnosis for large interacting systems. In: Proceedings of the 16th International Workshop on Principles of Diagnosis, DX’05. Monterey, California (USA).
Lamperti, G. and M. Zanella (2003). Continuous diagnosis of discrete-event systems. In: Proceedings of the 14th International Workshop on Principles of Diagnosis, DX’03. Washington D.C. (USA). pp. 105–112.
Lefebvre, D. and C. Delherm (2005). Diagnosis with causality relationships and directed paths in Petri net models. In: IFAC World Congress’05. Prague (Czech Republic).
Pencolé, Y. (2004). Diagnosability analysis of distributed discrete event systems. In: Proceedings of the 16th European Conference on Artificial Intelligence, ECAI’2004. Valencia (Spain). pp. 43–47.
Sampath, M., S. Lafortune and D. Teneketzis (1998). Active diagnosis of discrete-event systems. IEEE Transactions on Automatic Control 43(7), 908–929.
Soldani, S., M. Combacau, J. Thomas and A. Subias (2006). Intermittent fault detection through message exchanges: a coherence based approach. In: 6th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes (SAFEPROCESS’2006). Beijing (China). pp. 1549–1554.