Reliability Engineering and System Safety 60 (1998) 29-40
ELSEVIER
PII:S0951-8320(97)00125-7
© 1998 Elsevier Science Limited All rights reserved. Printed in Northern Ireland 0951-8320/98/$19.00
A methodology for fault diagnosis in large chemical processes and an application to a multistage flash desalination process: Part I
Enrique E. Tarifa a & Nicolás J. Scenna b,*
a Universidad Nacional de Jujuy, Gorriti 237-4600 San Salvador de Jujuy, Argentina
b U.T.N.-F.R.R., Zeballos 1641-2000, Rosario, Argentina
(Received 16 July 1996; accepted 31 August 1997)
This work presents a new strategy for fault diagnosis in large chemical processes (E.E. Tarifa, Fault diagnosis in complex chemistries plants: plants of large dimensions and batch processes. Ph.D. thesis, Universidad Nacional del Litoral, Santa Fe, 1995). A special decomposition of the plant is made in sectors. Afterwards each sector is studied independently. These steps are carried out in the off-line mode. They produce vital information for the diagnosis system. This system works in the on-line mode and is based on a two-tier strategy. When a fault is produced, the upper level identifies the faulty sector. Then, the lower level carries out an in-depth study that focuses only on the critical sectors to identify the fault. The loss of information produced by the process partition may cause spurious diagnosis. This problem is overcome at the second level using qualitative simulation and fuzzy logic. In the second part of this work, the new methodology is tested to evaluate its performance in practical cases. A multiple stage flash desalination system (MSF) is chosen because it is a complex system, with many recycles and variables to be supervised. The steps for the knowledge base generation and all the blocks included in the diagnosis system are analyzed. Evaluation of the diagnosis performance is carried out using a rigorous dynamic simulator. © 1998 Elsevier Science Limited.
*Author to whom correspondence should be addressed at: INGAR, Avellaneda 3657, 3000 Santa Fe, Argentina. Tel.: 54 42 534451; Fax: 54 42 553439.
1 INTRODUCTION
A fault (e.g. a partial valve blockage) must be detected as soon as it occurs. This can be done by detecting the symptoms (e.g. a low flow), which are the process-variable deviations caused by a fault. The set of all these symptoms is the pattern of the fault. Symptom analysis can be done by checking whether the measured variables are close to their normal values. If this check fails, a fault message is displayed. The functions up to this point are usually called monitoring or fault detection. This step yields vital information for the diagnosis task 1. If a fault is detected, a diagnostic test must be made before taking the required actions. Fault diagnosis consists of analysing the detected symptoms in order to identify the fault causing them 2. As the analysis of the system is done by monitoring a set of predetermined process variables, the resolution of the diagnosis is limited. Often, a set of faults corresponds to the same pattern. This set of faults is defined as the cluster of faults corresponding to this pattern. To identify the real fault in the cluster, more resolution must be achieved using process knowledge, operator expertise, etc. The composition and dimension of fault clusters strongly depend on the proportion of the measured process variables. Both fault detection and diagnosis are important to prevent damage to production facilities and preserve operator safety. However, owing to the complexity of most chemical processes (i.e. time delay, complex interconnections, etc.), it is very difficult to detect and find faults in time to take
immediate corrective actions. Moreover, even computer-aided diagnosis and supervision are very difficult tasks. Many strategies have been proposed to solve the diagnosis problem, e.g. quantitative algorithms ~, fault tree methods 3, a qualitative model approach such as Signed Directed Graphs (SDG) 4, the expert system-based approach 5, statistical pattern recognition strategies 6, the neural network approach 7 and others. These methods are suitable for diagnosis at the unit level. However, using them to supervise an entire large plant is difficult and inefficient: a large plant has too many variables to be supervised and its model is too complex to be handled in real time. In this case, a hierarchical approach is more appropriate 8-11.
Finch and Kramer 12,13 introduce a model formalism to describe large and complex processes at a suitably detailed level for early-stage diagnosis. They use a two-step diagnosis procedure to detect faulty systems and units. In the first step, the potentially faulty subsystems are found. The second step involves the evaluation of rules for a further reduction of the fault candidate space. The process decomposition into subunits can be carried out by a structural or a functional approach. Finch and Kramer 12,13 use a functional approach. However, in highly integrated plants, it is of primary importance to study a set of sectors simultaneously, because they share a common information loop. The worst case arises when the loop is as large as the whole plant, as in multiple-step evaporative separation processes, which normally have recirculations (see Part II). So for many diagnosis methodologies, decomposition becomes inadequate or inadmissible. This problem was reported by Mohindra and Clark 14 in their work on a distributed fault diagnosis method. Unfortunately, this problem is closer to the rule than to the exception.
This paper presents an efficient diagnosis method for large chemical plants.
It involves two stages, carried out in off-line and on-line mode. The off-line stage is carried out only once for each new plant to be studied. Its output consists of a set of rules, one for each sector into which the plant is divided, which make up the knowledge base of an expert system. This expert system executes the on-line stage. It is based on a two-tier strategy. When a fault is detected, the upper level identifies the sector that may contain the fault. Afterwards, the lower level evaluates only the rules related to the critical sector to identify the fault. If necessary, the sectors nearest to the critical sector are also involved in the search. This last step is done by the upper level. For this task, it uses a model that describes the connections among sectors, which is similar to that used by Finch and Kramer 12,13. Hence, the problems caused by the size of the plant (too many variables to be supervised and too complex a model) are solved in the lower level by processing only one sector at a time. Interactions among the sectors of the plant are neglected. However, this approach produces a loss of information that may cause spurious diagnoses. Thus, the 'local' reasoning, that is, the on-line stage in the lower level, must
be able to work even with partial information. This implies that the lower diagnosis level must be solid, efficient and high-performing. In our method this purpose is achieved by exploiting several advantages of fuzzy sets and fuzzy logic theory 15. As shown below, algorithm robustness can be achieved using fuzzy logic in both the detector module and the diagnosis module. Boolean logic is unsuitable in this case because, with partial information, the rules will never be completely true. Therefore, all of them will be considered false, causing a null diagnosis. Conversely, fuzzy logic avoids the elimination of the real fault because it does not classify the rules as true or false but assigns a degree of truth (between 0 and 1) to each. At worst, the actual fault may drop slightly in the fault ranking (loss of resolution). On the other hand, to achieve higher resolution even with partial information, we must compile rules with the maximum 'local' information. This is achieved by performing qualitative fault simulation and retaining (in contrast to other authors) all the information produced by this operation, that is, fault pattern, symptom sequence, affected and non-affected nodes, etc. These features also produce additional advantages: stability, less sensitivity to noise, the possibility of reporting a ranking of fault candidates, and the ability to explain the diagnosis in a 'natural' form.
In the following sections all steps of the proposed method will be analyzed. In Part II, a particular application will be developed to show the system performance. As an example, a multiple stage flash desalination process (MSF) was chosen because it is a complex system with many variables to be supervised and a high information recirculation. Both off-line and on-line stages will be analyzed for the system generation. Evaluation of the diagnosis performance will be carried out using a rigorous dynamic simulator especially developed for this purpose.
Some aspects of this simulator will also be described.
2 THE OFF-LINE STAGE
2.1 Introduction
In this stage the entire plant is decomposed into sectors. This decomposition may be structural (focusing attention on the physical connections among sectors of the plant) or functional (considering the purpose of each sector of the plant). In the lower level each sector must be analyzed individually, regardless of the kind of decomposition used. As interactions among the plant sectors are neglected, a loss of information is produced. Thus, the size of each sector should allow the application of any diagnosis method, but this method must work efficiently with partial information. In general, many proposed methods use Boolean logic to process plant data provided by the acquisition system 16-18. These are very sensitive to data noise and model imperfections. Several authors claim that this problem might be overcome by using fuzzy logic 5,19,20. Kramer 5 states that the
use of different non-Boolean logic methods results in only slight differences in the results achieved. Nevertheless, fuzzy logic 15 is the most commonly used non-Boolean logic, owing to its power and relative simplicity. Another aspect that must be considered is that the number of sectors after process partition may be too large. Thus, a fast diagnosis method is necessary. Considering this, qualitative methods are better than quantitative ones. In fact, the qualitative methods do not need complex calculations and the modelling cost is lower. The Signed Directed Graph (SDG) is a qualitative model which is widely accepted because it contains causal information as an additional feature. This kind of information is very useful in the diagnosis task. Methods based on SDGs can be classified into two classes. One class uses the SDG to perform on-line searching 4,14,16-18,20, whereas the other transforms the SDG into rules used by an expert system 19,21-24. The compilation of rules from the SDG enables a higher speed, because the information has already been 'processed' in the off-line mode. Considering the above-mentioned facts, here we present a new method that uses SDGs, compilation and fuzzy logic simultaneously 19. This method involves a twofold operation. The first part, qualitative simulation and rule compilation, is executed in an off-line mode, producing a set of rules derived from the process digraph. The second part is executed in an on-line mode; it evaluates the rules using on-line data and fuzzy logic. Qualitative simulation 21,25 and the way in which fuzzy logic is used are the main differences compared with other published compilation methods. These features enable us to build a fast and robust diagnosis system, as is required when using any decomposition strategy. In the following sections the main features of the proposed methodology are presented.
2.2 Process modelling using the SDG
In an SDG, the value of each node represents the qualitative state of a given process variable: normal (0), high (+1) or low (−1). Each arc represents the influence of a variable (initial node) on another variable (final node). A direct relation (both nodes deviate in the same direction) is shown by a (+1) gain, an inverse relation by (−1), and (0) shows no relation. In other words, in a digraph the gain $G_{ij}$ corresponding to the arc from node i to node j is a qualitative indicator of the direct effect on j due to a perturbation on i. Considering the process mathematical model, generally rewritten in the following form:

$$\frac{dX_j}{dt} = f_j(X_1, X_2, \ldots, X_n) \qquad (1)$$

a general expression for $G_{ij}$ is given by 4:

$$G_{ij} = \mathrm{Sign}\left[\frac{\partial f_j}{\partial X_i}\right] \qquad (2)$$

where the Sign function is used to obtain qualitative values. Oyeleye and Kramer's works 25,26 show different methodologies for the gain calculation. When the systems are strongly nonlinear, some gains become a function of the state variables and of both the perturbation size and its duration. Thus, changes in gain signs may be possible during fault evolution. To overcome this problem, two arcs with opposite signs can be used, replacing the arcs with variable gains a. Tarifa and Scenna 19,24 describe another way to handle variable gains.
To clarify the above discussion, consider the system of Fig. 1. The mathematical model is:

$$F_2 = K\sqrt{L} \qquad (3)$$

$$A\,\frac{dL}{dt} = F_1 - F_2 \qquad (4)$$

where F1 = 6 m3/h, K = 4.9 and A = π/4. The stationary value of L is 1.5 m. Fig. 2 shows the corresponding SDG. The solid arcs have positive gains, and the dashed arcs have negative gains. The mark on K indicates that K is a non-measured variable.

Fig. 1. Gravity-flow tank.
Fig. 2. SDG of the gravity-flow tank.

2.3 The diagnosis based on SDGs
Digraph-based methods are attractive because little information is needed to build the digraph and carry out the diagnosis. Nodes of the SDG correspond to state variables, alarm conditions, or failure origins. Generally, it is assumed that a single fault, affecting a single node in the SDG (the root node), is the source of all disturbances. The fundamental premise of digraph techniques is that cause and effect linkages must connect the fault origin to the observed symptoms. So, the diagnosis involves the location of all possible disturbance sources (root nodes) given the on-line sensor data. This is the basis of Iri's method 4,17 and similar ones. However, these methods present some inconsistencies when they are applied to real cases.
For instance, consider the fault #A1: a partial valve blockage (Fig. 1). This fault was simulated by changing the value of K from 4.9 to 4.5 at t = 1.4 h. Fig. 3 shows the evolution of L and F2. The fault causes a decay in F2 which, in turn, causes an increase in L. Afterwards, F2 returns to its normal value in spite of the fault. This compensation is caused by the increment of L. The behaviour of F2 is called the Compensatory Response (CR). Another common response is the Inverse Response (IR), where the final change is opposite to the first one.

Fig. 3. Effects of #A1 on F2 and L for a step function.

The SDG shown in Fig. 4 includes a new node representing the studied fault. Due to this fault, a decay in K is produced, so the gain for the arc from #A1 to K is −1. This stage is called fault modelling; it determines how the fault will affect the SDG. Fig. 5 shows the qualitative values adopted by the nodes after the fault is produced. Also, there is a sub-SDG that causally joins all the symptoms with the fault node.

Fig. 4. SDG after modelling the fault #A1.
Fig. 5. Cause-effect graph for the tank and #A1.

Within this context, it is important to remark that the first proposed SDG-based diagnosis algorithms 4 used the following strategy: find all the sub-SDGs formed by active arcs (those whose gains are compatible with the node values) and then look for the root nodes of each sub-SDG (they are the possible faults). In Fig. 5, K = −1. However, K is a non-measured variable. Thus, its value must be assumed so that a continuous and consistent path is obtained. This becomes a problem when the number of non-measured variables increases (time consumption). On the other hand, due to measurement noise, it is necessary to define normality bands for all variables. Qualitative values will be null while the associated variable remains inside its normality band. Thus, when the first change of a CR or IR variable is small, it is difficult to detect. This is another serious problem for the diagnosis 26. For instance, F2 has CR, so its corresponding node may remain with a null qualitative value during the fault evolution. Therefore, a continuous path along the SDG from all the symptoms to the fault node may not exist. A feasible solution to this problem is to add new arcs to the SDG 21,25-28. These fictitious arcs must act as a bridge over the CR variables to form a continuous path. IR variables can be handled in a similar way. However, this solution has some disadvantages, as pointed out by Rose 29. The main problem is that the number of possible fault propagation paths to be analyzed increases exponentially with the number of nodes and arcs. This is a serious problem for fault diagnosis in large processes.
Kramer and Palowitch 21 have pointed out that the SDG, derived as suggested in Section 2.2, has certain limitations. In fact, these SDG-based algorithms are adequate only if each variable undergoes no more than one transition between qualitative states during fault propagation. This is because the SDG represents only the initial system response to disturbances. Thus, when CR or IR variables evolve with more than one qualitative change during the propagation of the fault, a continuous causal pathway from the source node to each disturbed node may not exist. In order to solve this, the above authors have stated the need to work only with the first change of each node. In other words, a kind of snapshot capturing the first changes during fault evolution must be used.
Finally, note that the use of Iri's methods 4,17 for practical problems (value assignment to the non-measured variables, sub-SDG searching for root node determination, and snapshot capturing) demands too much processing time. This is a critical point because these tasks must be done in real time (on-line mode). Thus, faster methods must be analyzed.
2.4 Compilation of the SDG
Compilation of the SDG was used by Kramer and Palowitch 21 to increase the speed of the diagnosis system, because only compiled knowledge is used in the on-line mode. They presented a method to obtain a set of rules from the SDG. This set comprises one rule for each possible fault. All rules involve only measured variables, so it is not necessary to propose values for non-measured variables. Rule evaluation is also easy, so sub-SDG searching and root node determination are avoided. In other words, processing time is greatly reduced. In fact, a test problem formulated by Shiozaki et al. 17, containing 99 nodes and 207 branches, was afterwards solved by Kramer and Palowitch 21, reducing processing time from 5 min to a few seconds. However, Kramer and Palowitch 21 use Boolean logic, and the compiled information is incomplete because only the satisfaction of the arc gains is checked, whereas node values (the pattern) are ignored. Both factors increase the sensitivity of the system to detection errors in both symptoms and sequences (the order of symptom detection). Unfortunately, such errors are common in real processes, due to noise and CR or IR variables. So, the problems caused by CR and IR variables still remain. Oyeleye et al. 26,27 present a new model based on the idea of events (transitions of qualitative states) instead of state
variables. These authors introduced the concept of the Extended Signed Directed Graph (ESDG), defining new arcs that act as a bridge over the variables that could undergo CR or IR (as in Kramer and Palowitch 21). They implemented a computational algorithm to find these variables. The idea was implemented by Finch et al. 28 in MIDAS (Model Integrated Diagnostic Analysis System). However, according to Rose 29, in industrial problems the steps required before using MIDAS (building the ESDG and the event model) demand too much computation time.
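The compensatory response that motivates these extensions can be reproduced numerically for the gravity-flow tank of Fig. 1. The following is a minimal sketch, not the simulator used in the paper: it integrates the tank model (F2 = K√L, A dL/dt = F1 − F2) with explicit Euler steps and introduces fault #A1 (K from 4.9 to 4.5 at t = 1.4 h). The step size `DT` and the normality band `BAND_F2` are assumptions chosen for illustration only.

```python
import math

# Tank parameters quoted in the text: F1 in m3/h, area A in m2.
F1 = 6.0
A = math.pi / 4.0
K_NORMAL, K_FAULT = 4.9, 4.5
T_FAULT = 1.4      # h, time at which fault #A1 is introduced
T_END = 6.0        # h, end of the simulated horizon
DT = 0.001         # h, Euler step (assumption)
BAND_F2 = 0.2      # m3/h, hypothetical normality band for F2


def simulate():
    """Integrate A*dL/dt = F1 - K*sqrt(L) with explicit Euler steps."""
    level = (F1 / K_NORMAL) ** 2      # steady-state level before the fault
    history = []                      # tuples (time, L, F2)
    steps = int(round(T_END / DT))
    for n in range(steps + 1):
        t = n * DT
        k = K_FAULT if t >= T_FAULT else K_NORMAL
        f2 = k * math.sqrt(level)
        history.append((t, level, f2))
        level += DT * (F1 - f2) / A
    return history


def qualitative(value, normal, band):
    """Crisp qualitative state: -1 (low), 0 (normal) or +1 (high)."""
    if value > normal + band:
        return 1
    if value < normal - band:
        return -1
    return 0


history = simulate()
f2_min = min(f2 for _, _, f2 in history)
_, level_end, f2_end = history[-1]
first_f2_state = qualitative(f2_min, F1, BAND_F2)
final_f2_state = qualitative(f2_end, F1, BAND_F2)

print(f"lowest F2 after the fault: {f2_min:.2f} m3/h (state {first_f2_state})")
print(f"final F2: {f2_end:.2f} m3/h (state {final_f2_state}), final L: {level_end:.2f} m")
```

Under these assumptions F2 first drops to about 5.5 m3/h and then drifts back to about 6 m3/h while L settles near 1.78 m, so a detector that only sees the final state of F2 would report it as normal. With a wider band than the one assumed here, the first change of F2 would not be detected at all.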
2.5 Fault diagnosis using SDG, compilation and fuzzy logic
Fig. 6 shows the off-line and on-line stages needed to build the diagnosis system. The off-line stage must be carried out only once for each new process. Therefore, the computation time is not a critical factor, but of course it must be practical. A very important point is the use of qualitative simulation, the results of which are compiled into rules. These rules are often very large, and must be simplified. In our work, the final rules resemble those of Kramer and Palowitch 21, but they contain more information. As mentioned above, this additional information and the strategy for rule evaluation (fuzzy logic) together enable the diagnosis system to work with data noise, model limitations, and CR and IR variables without adding extra arcs to the SDG 21,25-28. The other steps are carried out in the on-line mode, so processing time is very important. However, our real-time expert system must evaluate the set of rules using fuzzy logic, which requires more effort than Boolean logic. Besides, the new rules are larger than those of Kramer and Palowitch 21 (because they carry more information). Therefore, a dedicated expert system for the diagnosis task must be developed. The on-line stage will be discussed in Section 3.
As pointed out by Kramer and Palowitch 21, the diagnosis problem is the inverse or the dual of a much simpler problem, namely, fault modelling (or qualitative simulation). Diagnosis uses a set of observed symptoms to generate a hypothesis, whereas the dual problem, modelling, predicts the plant response when the faulty operating state is given. The latter problem is easier to solve. Fault simulation using the SDG implies the deduction of all the interpretations (directed trees branching from a given root node; only measured nodes are considered). An interpretation represents a prediction of the dominant pathways of fault propagation; that is, a set of routes from the root node to each causally
connected node. It yields information about the event order (the sequence) and the direction of deviation of each node connected to the fault origin (the fault pattern). For a given digraph and a given fault origin, there may be many interpretations of the fault propagation, but only one or a small set of them reflects the real behaviour of the plant. Kramer and Palowitch 21 remark that qualitative simulation is a combinatorially explosive problem. Indeed, the number of potential propagation paths to be analyzed increases exponentially with SDG complexity. This is a serious problem for the simulation of any process and it gets worse for large ones. To avoid the explosion, these authors propose an implicit representation of the combined set of interpretations instead of enumerating each of them. However, in this way an important part of the interpretation information is lost: the fault pattern. Conversely, in our proposed algorithm all the interpretations are obtained. The risk of an explosion is reduced by keeping the SDG as simple as possible. This is achieved by avoiding the use of fictitious arcs (for controllers, and CR and IR variables) and neglecting the sector connections (i.e. working with a part of the whole process SDG). In this way, all the information contained in each interpretation is preserved.
Fig. 7 depicts the off-line steps. The first step is fault modelling. For each fault a new node is added to the SDG (which is built according to the observations in Section 2.2). Afterwards, all the interpretations are generated by qualitative simulation. Then, they are transformed into rules (compilation). There is one rule for each fault. Finally, these rules are simplified using the laws of the logic calculus 19,30. For instance, in the system described in Fig. 1, the fault modelling output is the SDG of Fig. 4. The qualitative simulation output is only one interpretation (see Fig. 8).
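The qualitative simulation step for the tank can be sketched as a sign propagation over the SDG. This is an illustrative simplification, not the authors' algorithm: it follows a single breadth-first tree and keeps only the first change of each node, whereas the method described above enumerates all interpretations. The arc signs in `ARCS` are an assumption derived from the tank model (F2 = K√L, A dL/dt = F1 − F2) and the fault-modelling arc from #A1 to K.

```python
from collections import deque

# Signed arcs of the tank SDG after fault modelling: (source, target) -> gain.
# Signs are assumed from the tank equations, since the figure is not reproduced.
ARCS = {
    ("F1", "L"): +1,
    ("L", "F2"): +1,
    ("K", "F2"): +1,
    ("F2", "L"): -1,
    ("#A1", "K"): -1,
}
MEASURED = {"F1", "F2", "L"}   # K is a non-measured variable


def propagate(root, root_value=+1):
    """Breadth-first sign propagation keeping only the first change of each node."""
    values = {root: root_value}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for (src, dst), gain in ARCS.items():
            if src == node and dst not in values:
                values[dst] = values[node] * gain   # sign of the deviation
                order.append(dst)
                queue.append(dst)
    return values, order


values, order = propagate("#A1")
pattern = {n: v for n, v in values.items() if n in MEASURED}   # fault pattern
sequence = [n for n in order if n in MEASURED]                 # symptom order
unaffected = sorted(MEASURED - set(values))                    # non-affected nodes

print(pattern, sequence, unaffected)
```

For #A1 the sketch yields F2 low, L high, F2 abnormal before L, and F1 unaffected, which are exactly the ingredients used to compile the rule for this fault.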
Note that the node F1 is not included in the interpretation because it belongs to the sub-SDG that is not affected by the studied fault. The corresponding rule is:

IF {(F2 = −1) AND (L = +1) AND (F2 is abnormal before L is)} AND {(F1 = 0) AND (NOT(F1 is abnormal before L is))} THEN #A1 may be the fault.

The first two propositions belong to the fault pattern. They verify that F2 is low and L is high. The third proposition considers the order or sequence of the symptoms. The following proposition checks that F1 is not being affected by the fault, so F1 must be normal. Finally, the last proposition verifies that the arc from F1 to L is inactive. This proposition may be redundant, but it is included to maintain the symmetry, which is important when using fuzzy logic.

Fig. 6. Off-line and on-line stages.
Fig. 7. Off-line stage.
Fig. 8. Interpretation of the fault #A1 for the tank.

Comparison of this rule with that deduced from the method of Kramer and Palowitch 21 shows that it includes the following additional information: the complete fault pattern and the symptom sequence. This additional information, together with fuzzy logic, is used to overcome inconsistencies caused by CR and IR variables. Indeed, the fault pattern enables the recognition of fault propagation paths even if they are interrupted by CR and/or IR variables, a challenge for other methods 4,17,21. This is possible because the diagnosis system no longer needs to perform any path searching; all it has to do is find the pattern that matches most of the observed symptoms. The symptom sequence also improves the diagnosis resolution 13,19,31 because faults with the same pattern may have different sequences. Finally, fuzzy logic dampens the undesirable effects of data noise and of spurious CR and IR caused by the wrong setting of normality bands, and provides the necessary condition to build a fault ranking, which is the best form in which to report the diagnosis 5.
Formally, our Rule Set (RS) rendered by the compilation step is expressed as follows:

$$RS = \{R_1, R_2, \ldots, R_{NR}\} \qquad (5)$$

$$R_i = \{A_i \Rightarrow C_i\} \qquad (6)$$

So, RS is a set of rules $R_i$ (one for each potential fault cluster of the system). Each rule has an antecedent $A_i$ and a consequent $C_i$. This consequent represents the cluster (a set of faults characterized by the same pattern and sequence) assigned to this rule. The antecedent $A_i$ is formed by two propositions:

$$A_i = EZ_i \wedge NAZ_i \qquad (7)$$

where ∧ is the AND operator. The proposition $EZ_i$ (Explored Zone) verifies that the observed symptoms agree with the estimations carried out by qualitative simulation. The second, $NAZ_i$ (Non Affected Zone), verifies that the observed symptoms do not belong to the sub-SDG which should not be affected by the faults of cluster $C_i$. On the one hand, $EZ_i$ is true if at least one possible propagation route $I_{i,j}$ (interpretation) is satisfied by the observed symptoms. That is:

$$EZ_i = \bigvee_{j=1}^{NI_i} I_{i,j} \qquad (8)$$

where ∨ is the OR operator. On the other hand, $NAZ_i$ must verify only a particular interpretation (the sub-SDG not affected by the fault):

$$NAZ_i = I_{0,i} \qquad (9)$$
The verification of one interpretation implies that all its elements must be observed. In the structure of $I_{i,j}$, two elements are considered:

$$I_{i,j} = P_{i,j} \wedge S_{i,j} \qquad (10)$$

where $P_{i,j}$ are patterns and $S_{i,j}$ are sequences. Patterns and sequences are defined as:

$$P_{i,j} = \bigwedge_{k=1}^{NX_{i,j}} X_{i,j,k} \qquad (11)$$

$$S_{i,j} = \bigwedge_{k=1}^{Nb_{i,j}} b_{i,j,k} \qquad (12)$$

Thus, pattern satisfaction implies that each variable evolves according to its expected value (shown by the proposition $X_{i,j,k}$) and sequence satisfaction implies that the sequence is verified in each arc (expressed by the proposition $b_{i,j,k}$). These propositions are calculated in this way:

$$X_{i,j,k} \equiv (\delta X_{i,j,k} = \delta X^{0}_{i,j,k}) \qquad (13)$$

$$b_{i,j,k} \equiv (\delta b_{i,j,k} = \delta b^{0}_{i,j,k}) \qquad (14)$$

where $\delta X_{i,j,k}$ are qualitative values supplied by the detector module and $\delta b_{i,j,k}$ indicate the observed qualitative sequence of the arcs $b_{i,j,k}$ (+1 if the sequence is right, 0 if there are no symptoms affecting the arc, and −1 if it is the inverse). $\delta X^{0}$ and $\delta b^{0}$ are the predictions of qualitative simulation for node values and arc sequences, respectively. In other words, the evaluation is carried out by comparing the actual process state with the state predicted by qualitative simulation. For instance, in the rule for the fault #A1 previously shown, one proposition of type X is (L = +1) and one proposition of type b is (F2 is abnormal before L is). Finally, the special interpretation $I_0$ is defined in the same way as $I_{i,j}$, but for $I_0$ both qualitative predictions ($\delta X^{0}$ and $\delta b^{0}$) are zero because all the involved nodes and arcs belong to NAZ.
Moreover, it is convenient to define a special rule called #Normal for the normal state of the process. This rule is true if the process is operating normally. To build this rule the whole SDG must be considered as NAZ.
The rules yielded in this stage are too large. Thus, they must be simplified to save computer memory and processing time. To achieve this goal, the following law of the logic calculus is used:

$$(p \wedge q) \vee (p \wedge r) \Leftrightarrow p \wedge (q \vee r) \qquad (15)$$

Both expressions are equivalent, but the one on the right-hand side is simpler. This law is applied to each rule
antecedent $A_i$. This step does not cause any loss of information. Tarifa 19 presents four alternative methods based upon this idea that reduced the rule size to as little as 2% of the original for the studied case (see Part II).
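The structure of a compiled rule and its simplification can be illustrated with a short sketch for the rule of fault #A1. It is only an illustration under stated assumptions: the fuzzy AND, OR and NOT are taken here as minimum, maximum and complement, which is a common choice but is not confirmed by this section, and the truth degrees in `observed` are hypothetical values of the kind a detection module might supply.

```python
from itertools import product


def f_and(*degrees):
    """Fuzzy AND taken as the minimum (assumed operator choice)."""
    return min(degrees)


def f_or(*degrees):
    """Fuzzy OR taken as the maximum (assumed operator choice)."""
    return max(degrees)


def f_not(degree):
    """Fuzzy NOT taken as the complement to one (assumed operator choice)."""
    return 1.0 - degree


def rule_a1(t):
    """Antecedent A = EZ AND NAZ of the rule for fault #A1."""
    ez = f_and(t["F2 = -1"], t["L = +1"], t["F2 before L"])   # pattern and sequence
    naz = f_and(t["F1 = 0"], f_not(t["F1 before L"]))        # non-affected zone
    return f_and(ez, naz)


# Hypothetical truth degrees of each proposition.
observed = {
    "F2 = -1": 0.6,       # F2 only partly outside its band (compensatory response)
    "L = +1": 0.9,
    "F2 before L": 0.8,
    "F1 = 0": 1.0,
    "F1 before L": 0.0,
}

fuzzy_degree = rule_a1(observed)
# Boolean reading: a proposition counts as true only if it holds completely.
boolean_degree = rule_a1({k: float(v == 1.0) for k, v in observed.items()})

# The simplification law (p AND q) OR (p AND r) <=> p AND (q OR r)
# also holds for min/max on a grid of truth degrees.
grid = [i / 4 for i in range(5)]
law_holds = all(
    f_or(f_and(p, q), f_and(p, r)) == f_and(p, f_or(q, r))
    for p, q, r in product(grid, repeat=3)
)

print(fuzzy_degree, boolean_degree, law_holds)
```

With these values the Boolean reading rejects the rule outright, while the fuzzy reading keeps #A1 as a candidate with a reduced degree of truth, which is what allows a fault ranking to be built. The last check suggests that factoring a common proposition out of the interpretations does not change the value of the antecedent under the assumed operators.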
3 THE ON-LINE STAGE
3.1 Introduction
A good diagnosis system must have the following characteristics.
Exactness: The real fault must always be included in the reported candidate set.
Resolution: The fault candidate set should always be as small as possible.
Speed: The system should report the true fault as soon as possible.
Stability: It is important that only smooth changes are presented in the fault candidate set during the diagnosis task.
Feasibility: The cost (and the effort) of the system generation must be reasonable.
To reach these goals, the on-line stage is designed as shown in Fig. 9. The data acquisition system provides the quantitative values of each measured variable. These values are updated according to the parameter Δt (sample period). The detection module transforms this quantitative information into fuzzy qualitative values 32. If an abnormality is detected, the diagnosis module is started. This module selects the set of rules related to the plant state and evaluates them using fuzzy logic 5,15,23,32-34. Once the fault is located, the evaluation module is started. It finds all the interpretations belonging to a given fault and predicts all the possible consequences. Afterwards, the explanation module explains the reasoning chain followed to reach the diagnosis. Finally, the assistant module suggests a possible action plan. Several works report the use of non-Boolean logic (Boolean logic uses only two values of truth: TRUE and FALSE) in fault diagnosis. For instance, Dohnal 32 uses fuzzy logic in CONFUCIUS (CONcentrated FUzzy Cicerone of USer)
implemented in Pascal. Kramer 5 presents a method that uses both the violation and the satisfaction of the qualitative pattern generated by the equation residues 34. It is proven that non-Boolean logics increase system stability and sensitivity. Moreover, it is shown that there are only slight differences among the performances of the non-Boolean logics tested. Petti and Dhurjati 33 analyse the sensitivity of the evidence on the diagnosis process. Yu and Lee 23 combine quantitative and qualitative approaches using fuzzy logic and the Chang and Yu 22 method. Finally, quantitative manipulation of fuzzy sets during diagnosis was introduced by Han et al. 20. In this work, the use of fuzzy logic 15 is quite different from that of other reported approaches. In fact, the main reason for using fuzzy logic here is to avoid the rigid evaluation of the causal order of events. That is, fuzzy logic is used to deal with data noise, CR and IR variables and the local reasoning with partial information.
3.2 The detection module
The detection module transforms quantitative values into qualitative ones by using fuzzy sets 15. A fuzzy set $Z$ is a set of ordered pairs:

$$Z = \{(z, \mu_Z(z)) \,/\, z \in U_Z \wedge \mu_Z : U_Z \rightarrow [0, 1]\} \tag{16}$$

where $\mu_Z(z)$, which is a number in the interval [0, 1], represents the membership grade of element $z$ in set $Z$. The ends of this interval represent no membership and full membership, respectively. Fig. 10 shows the fuzzy sets used by our detection module: High (+1), Normal (0) and Low (-1). $\Delta X$ (quantitative deviation of $X$) is the difference between the real value of $X$ and its normal value. $\mu_{Normal}$ (membership grade of $\Delta X$ in the fuzzy set Normal) expresses the degree to which $\Delta X$ belongs to the fuzzy set Normal. $\mu_{Normal} = 1$ means full membership in the set Normal, whereas $\mu_{Normal} = 0$ means the contrary. $\mu_{Low}$ and $\mu_{High}$ are defined in the same way.

To simplify the calculations, it is convenient to define the qualitative fuzzy deviation $\delta X$ (Fig. 11). If $X$ increases too much and goes beyond the normal limit (i.e. $\mu_{Normal} = 0$ and $\mu_{High} = 1$), then $\delta X$ adopts the value +1. This value is considered as the first change (see Section 2.3) and is retained during the diagnostic process. That is, the first change is assumed only when the 'complete deviation' is produced; while the variable evolution remains inside the normal band, $\delta X$ evolves between -1 and +1. The limits $X_I$ (inferior) and $X_S$ (superior) of the normal band are chosen in accordance with standard statistical theory. It is assumed that each variable has noise with a normal distribution. $X_S$ is three times the standard deviation (the certainty of avoiding false alarms is 99.73%), and $X_I$ is 1.5 times the standard deviation (the certainty is 86.64%).

Fig. 9. On-line stage. (Blocks: Detection, Diagnosis, Evaluation, Analysis, Assistance; signals: process state, $\delta X$, rules, fault cluster, interpretations, predictions, explanations, suggestions.)

Fig. 10. Fuzzy sets used by the detection module. (Low, Normal and High membership functions over $\Delta X$, with breakpoints at $\pm X_I$ and $\pm X_S$.)

Fig. 11. Qualitative fuzzy deviation definition. ($\delta X$, between -1 and +1, versus $\Delta X$.)

Formally, the qualitative deviation of $X$ is:
$$\delta X_n = \begin{cases} \delta X_{n-1} & \text{if } |\delta X_{n-1}| = 1 \\ \mathrm{Sign}(\Delta X_n)\,[1 - \mu_{Normal}(\Delta X_n)] & \text{if } |\delta X_{n-1}| < 1 \end{cases} \qquad \delta X_0 = 0 \tag{17}$$

where $n$ is the sample at time $t_n = n\Delta t$ ($\Delta t$ is the sample interval).

3.3 The use of fuzzy logic to evaluate the rules

Our diagnosis module is an expert system that is able to process rules using fuzzy logic. When a rule is evaluated using fuzzy logic, the result is a real number between 0 and 1 called the certainty grade. If its value is 0 the rule is false, and if it is 1 the rule is true. Values between 0 and 1 mean intermediate values of certainty. The certainty grade of a rule shows the degree to which the expected symptoms and sequences of qualitative deviations during the fault propagation are satisfied. It is calculated by comparing the first observed change with the expected value for each variable. Once all the rule certainty grades are calculated, they can be reported as a ranked list, showing the 'probability' associated with each fault candidate. Note that two different types of uncertainty are considered: one in the observation (acquisition system), and the other in the symptom expectation (qualitative simulation). Fuzzy sets enable us to deal with the first, while the second is overcome using fuzzy logic for rule evaluation.

Rule evaluation begins with the reception of the vector $\delta X$ from the detection module. Afterwards, vector $\delta b$ is calculated (Section 2.5) using fuzzy set theory. Both vectors are used to calculate the certainty grades of the propositions $X$ and $b$ (eqns (13) and (14)): $\mu_X$ and $\mu_b$, respectively. They depend on the degree of agreement between the expected values predicted by simulation ($\delta X^{\circ}$ and $\delta b^{\circ}$) and the observations ($\delta X$ and $\delta b$). Evaluation functions are used to describe these relationships. These functions determine $\mu_X$ and $\mu_b$ from the following vectors: $\delta X^{\circ}$, $\delta b^{\circ}$, $\delta X$, and $\delta b$. For instance, Fig. 12 shows the evaluation function used when $\delta X^{\circ} = +1$. It shows that $\mu_X \in [1 - \mu^{\circ}, 1]$. The parameter $\mu^{\circ}$ is a prefixed number that attenuates the 'severity' of the evaluation. Indeed, if $\delta X = -1$ (the worst case for the prediction $\delta X^{\circ} = +1$), the certainty of $X$ is minimal and equal to $(1 - \mu^{\circ})$, but not zero. The other evaluation functions, and the analogous parameter used for the propositions $b$, are defined in a similar way.

Fig. 12. Evaluation functions. An example. ($\mu_X$ versus $\delta X$ for $\delta X^{\circ} = +1$, ranging from $1 - \mu^{\circ}$ to 1.)

Once $\mu_X$ and $\mu_b$ have been evaluated, the expert system can calculate the certainty grades of rule antecedents ($\mu_A$). During this calculation the following definitions for the AND and OR operators are used:

$$\mu_{p \wedge q} = \mu_p\,\mu_q \tag{18}$$

$$\mu_{p \vee q} = \mathrm{Max}(\mu_p, \mu_q) \tag{19}$$

where $\mu_p$ and $\mu_q$ are the certainty grades of the propositions $p$ and $q$, respectively, whereas $\mu_{p \wedge q}$ and $\mu_{p \vee q}$ are the certainties of ($p$ AND $q$) and ($p$ OR $q$), respectively. Several definitions of these operators have been reported. However, only some of them are appropriate for a particular problem. In this work, the product was adopted for the AND operation because each symptom and each arc contributes proportionally to the full pattern satisfaction. Conversely, for the OR operation, it must be considered that only one interpretation will be the dominant one. In fact, interpretations do not contribute simultaneously to the fault certainty grade, but they do so individually. Thus, only the interpretation with the highest certainty grade must be considered. Therefore, the use of the Max function for the OR is appropriate.

Once all these values are calculated, the certainty grades of rule conclusions ($\mu_C$) can be evaluated. In this work, the vector $\mu_C$ is equal to $\mu_A$. The diagnosis module reports a set of plots of $\mu_{Ci}$ versus time. While the process is operating normally, the line belonging to the rule #Normal is at the top. However, if a fault is activated, the curve representing #Normal decays, and the fault whose line lies above the rest is the most important candidate.
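The detection step of Eq. (17) and the operators of Eqs. (18) and (19) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Turbo Pascal implementation: the piecewise-linear shape of `mu_normal` between $X_I$ and $X_S$ and the linear evaluation function `evaluate` are assumed here (the paper defines them graphically in Figs. 10 and 12), and all function names are hypothetical.

```python
import math
from typing import Iterable, List


def mu_normal(dx: float, x_i: float, x_s: float) -> float:
    """Membership grade of the deviation dx in the fuzzy set Normal.

    Assumed shape: 1 inside [-x_i, x_i], 0 beyond +/-x_s, linear in between.
    """
    a = abs(dx)
    if a <= x_i:
        return 1.0
    if a >= x_s:
        return 0.0
    return (x_s - a) / (x_s - x_i)


def next_delta(prev: float, dx: float, x_i: float, x_s: float) -> float:
    """One step of Eq. (17): once +/-1 is reached, the first change is retained."""
    if abs(prev) == 1.0:
        return prev
    sign = (dx > 0) - (dx < 0)
    return sign * (1.0 - mu_normal(dx, x_i, x_s))


def track(deviations: Iterable[float], x_i: float, x_s: float) -> List[float]:
    """Qualitative deviations for a series of samples, starting from delta_0 = 0."""
    delta, trace = 0.0, []
    for dx in deviations:
        delta = next_delta(delta, dx, x_i, x_s)
        trace.append(delta)
    return trace


def evaluate(expected: int, observed: float, mu0: float) -> float:
    """Certainty of a proposition (assumed linear evaluation function).

    Returns 1 for a perfect match and 1 - mu0 in the worst case.
    """
    worst = 2.0 if expected != 0 else 1.0  # largest possible |observed - expected|
    return 1.0 - mu0 * abs(observed - expected) / worst


def fuzzy_and(grades: Iterable[float]) -> float:
    """Eq. (18): AND as a product of certainty grades."""
    return math.prod(grades)


def fuzzy_or(grades: Iterable[float]) -> float:
    """Eq. (19): OR as the maximum certainty grade."""
    return max(grades)


if __name__ == "__main__":
    sigma = 1.0  # assumed noise standard deviation of the variable
    print(track([0.5, 2.25, 3.5, 1.0], x_i=1.5 * sigma, x_s=3.0 * sigma))
    print(evaluate(+1, -1.0, mu0=0.4))
```

In this sketch the deviation stays at +1 after the third sample even though the fourth sample returns to the normal band, which mirrors the retention of the first change described in Section 3.2.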
3.4 The computational implementation
In this work, the diagnosis module is an ad hoc expert system. An expert system is a computer package designed to simulate or model human experts. It is a form of artificial intelligence and provides expert knowledge to a user through an interactive environment. Expert systems consist basically of a knowledge base, an inference engine and case-specific information, with some kind of user interface attached to the inference engine. The knowledge base is an ordered list of detailed facts, rules, and heuristics about the theme of interest. The inference engine can infer new rules or facts by considering the knowledge base information and the case-specific data. That is, it consists of a type of 'higher level knowledge' regarding how the knowledge base is ordered and used to generate a problem solution. The purpose of the user interface is to question the user and to report results. Expert systems should also provide intelligent explanations of their reasoning on request and/or after finding the problem solution 4.

In this work, the expert system is written in Turbo Pascal 7.0 (Borland®) for a PC ASTkIBM 486AX. The problem-solving model used is the blackboard model. In it, the solution space is organized into one or more application-dependent hierarchies. The information at each level in the hierarchy represents partial solutions. The domain knowledge, in turn, is partitioned into independent modules of knowledge that transform information at one level of the hierarchy into information at the same level or other levels. The choice of a knowledge module is based on the solution state and on the existence of a knowledge module capable of improving the current state of the solution 35. Fig. 13 shows the blackboard model used in this work. The goal is to determine vector $\mu_C$, but this implies the calculation of several intermediate variables from $\delta X$. In other words, it is necessary to determine many partial solutions. Each knowledge source has been assigned to the calculation of one particular vector.
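The control idea of the blackboard model can be illustrated with a short Python sketch. It is not the authors' implementation: the entry names (`delta_X`, `delta_b`, `mu_X`, `mu_b`, `mu_C`), the tuple layout of a knowledge source and the stand-in computations are all assumptions made for illustration. Only the selection principle comes from the text: a knowledge source fires when the current state of the solution provides its inputs and it can still improve that state.

```python
from typing import Callable, Dict, List, Tuple

Blackboard = Dict[str, float]
# (name, entries it needs, entries it produces, function computing them)
KnowledgeSource = Tuple[str, List[str], List[str], Callable[[Blackboard], Blackboard]]


def run_blackboard(board: Blackboard, sources: List[KnowledgeSource], goal: str) -> List[str]:
    """Fire knowledge sources until the goal entry is on the blackboard.

    Returns the names of the sources in the order in which they fired.
    """
    fired: List[str] = []
    while goal not in board:
        for name, needs, makes, compute in sources:
            ready = all(key in board for key in needs)
            useful = any(key not in board for key in makes)
            if ready and useful:
                board.update(compute(board))
                fired.append(name)
                break
        else:
            raise RuntimeError("no knowledge source can improve the current state")
    return fired


# Hypothetical knowledge sources with stand-in computations, deliberately
# listed out of order to show that the solution state drives the selection.
SOURCES: List[KnowledgeSource] = [
    ("rule evaluation", ["mu_X", "mu_b"], ["mu_C"],
     lambda b: {"mu_C": b["mu_X"] * b["mu_b"]}),
    ("proposition evaluation", ["delta_X", "delta_b"], ["mu_X", "mu_b"],
     lambda b: {"mu_X": 0.75, "mu_b": 1.0}),
    ("sequence estimation", ["delta_X"], ["delta_b"],
     lambda b: {"delta_b": 0.0}),
]

if __name__ == "__main__":
    board: Blackboard = {"delta_X": 1.0}
    print(run_blackboard(board, SOURCES, goal="mu_C"))
    print(board["mu_C"])
```

Scalars stand in for the vectors of the paper to keep the sketch short; a real system would store one vector per blackboard level.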
Fig. 13. Structure of the blackboard used by the diagnosis module. (Knowledge sources: sequence estimation, proposition evaluation, rule evaluation.)

3.5 A small example

To enhance the understanding of the proposed algorithm, a small example is analyzed in this section. In Section 2, the off-line stage for the system shown in Fig. 1 was described. Here, the on-line stage for the same system is examined. In the off-line stage, all potential faults must be modelled; for this example, however, only two faults were modelled: a partial valve blockage (#A1) and a high F1 (#A2). Therefore, the knowledge base contains three rules: #A1, #A2, and #Normal. Faults #A1 and #A2 were chosen because they create similar symptoms, which makes a high-resolution diagnosis very difficult. This situation gets worse when the symptom magnitude is small. For instance, Fig. 14 shows the evolution of the measured variables when the fault #A1 affects the system. In Section 2 a step function for K was used to simulate this fault (see Fig. 3); here, a ramp function is used to simulate a gradual valve blockage. Note that F2 remains normal, and equal to F1, as it is a CR variable. This implies that the only visible symptom is a rise in L. Both #A1 and #A2 can explain this, but both rules expect a decrease in F2 that is not present. At this point, our diagnosis system cannot decide which of them is the actual fault, whereas a system based on Boolean logic would erroneously reject both of them. Finally, the rule #A2 expects an additional symptom: a rise in F1. This symptom is also not present. Therefore, the diagnosis system must prefer #A1 but not reject #A2.

Fig. 14. Effects of #A1 on F2 and L for a ramp function. (F1 [m3/h], F2 [m3/h] and L [m] versus t [h].)

Fig. 15 shows the diagnosis system output for this case. When the fault occurs, the rule #Normal decays and both #A1 and #A2 increase. #A1 is always at the top, which is correct, but its degree of certainty is less than 1. This is because some expected symptoms are not present. Nevertheless, the diagnosis system can find the actual fault.

Fig. 15. System diagnosis output for #A1. (Certainty grades of the rules, including #A2 and #Normal, versus t [h].)

Once the fault #A1 is identified, the diagnosis system can find the dominant interpretation among all the interpretations used to generate the rule #A1. The dominant interpretation is the one that explains most of the observed symptoms. Afterwards, a natural explanation is generated using this interpretation. In this case, using the interpretation shown in Fig. 8, the explanation is: the fault may be #A1 because it may cause a decrease in F2 (not observed), which may cause a rise in L (observed). Predictions of symptoms that have not yet been observed can be made in a similar way.
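As a rough numerical illustration of why #A1 ranks above #A2 without either rule being rejected, consider the following worked arithmetic. The numbers are assumptions chosen for illustration and are not taken from the paper: only the symptom propositions are evaluated (sequence propositions are ignored), the evaluation functions are assumed linear between a perfect match (certainty 1) and the worst case (certainty $1 - \mu^{\circ}$), and $\mu^{\circ} = 0.5$. The observed deviations are $\delta L = +1$, $\delta F_2 = 0$ and $\delta F_1 = 0$.

```latex
% Certainty of each expected symptom (assumed linear evaluation, \mu^\circ = 0.5)
\mu_{L}   = 1 \quad (\text{expected } {+1}, \text{ observed } {+1}) \\
\mu_{F_2} = 1 - \mu^{\circ}\,\frac{|0 - (-1)|}{2} = 0.75 \quad (\text{expected } {-1}, \text{ observed } 0) \\
\mu_{F_1} = 1 - \mu^{\circ}\,\frac{|0 - (+1)|}{2} = 0.75 \quad (\text{expected } {+1}, \text{ observed } 0)

% Rule certainties with the product as AND, Eq. (18)
\mu_{\#A1} = \mu_{L}\,\mu_{F_2} = 0.75 \\
\mu_{\#A2} = \mu_{L}\,\mu_{F_2}\,\mu_{F_1} = 0.5625
```

Under these assumptions both candidates keep a non-zero certainty and #A1 stays on top with a grade below 1, which agrees qualitatively with the behaviour described for Fig. 15. A Boolean evaluation of the same patterns would assign 0 to both rules.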
4 DIAGNOSIS IN LARGE SCALE PROCESSES

Kramer and Palowitch 21 have pointed out that the major problem of qualitative fault simulation is its explosive nature (Section 2.5). Nevertheless, explicit evaluation of all the interpretations provides additional information (the fault propagation order and the fault pattern) for the diagnosis task. Therefore, a strategy to avoid the explosion during the qualitative simulation is required. Generally, at the level of units (e.g. each stage of a process), it is possible to obtain all the interpretations without explosion when the process is appropriately divided. However, if after the decomposition (structural or functional) some units produce too large a graph, two solutions are possible. The first is to divide those units (with some structural or functional meaning) to reduce the graphs until qualitative simulation becomes practical. The second alternative is to obtain all possible interpretations by a restricted simulation, which is the causal decomposition. In this kind of decomposition only a set of nodes, the nearest to the root node (i.e. the simulated fault), is considered in the exploration procedure. In other words, the exploration depth through a directed tree (i.e. an interpretation) is limited by the number of measured nodes that can be contained in each interpretation branch (the highest number is restricted by the possibility of explosion). The rest of the nodes are rejected. It is important to note that within this alternative the number of nodes is chosen so as to avoid the explosion. Thus, there is a zone formed by all the measured variables outside the considered set (but affected by the fault), the information of which is lost 19. This loss of information may affect the accuracy of the diagnosis. Fuzzy logic is used to overcome this problem. Indeed, if some symptoms or sequences do not match the expected values, fuzzy logic avoids the elimination of the real fault during the diagnosis procedure.
In the worst case, a gradual degradation of the position of the fault candidate in the fault ranking may occur (loss of resolution). However, it is always feasible to incorporate other sources of knowledge into the rules, e.g. material, energy or momentum balances, expert knowledge or some special distinctive information. The combination of qualitative and quantitative information for the diagnosis task is explained and implemented in MIDAS 13,26-29.

The causal decomposition supposes that the symptom importance decreases with distance from the root node, so the farthest nodes can be neglected without affecting diagnosis quality. Evidently, both structural and functional decomposition also imply truncation of the full plant SDG. But in these cases, the articulation between both reasoning levels is more natural 12,13 and easier, due to the underlying 'physical insight', as opposed to an arbitrary truncation. Thus, if possible, structural or functional decomposition should be preferred to determine the process sectors. In this work, the causal decomposition was not used.

The main aspects of this method for large scale plants can be summarized as follows. First, the full SDG is divided into smaller ones using functional or structural decomposition. In general, according to several analyzed examples 19, the size of the SDGs resulting from this stage is suitable for qualitative fault simulation; thus an explosion is avoided. Then, after the process has been divided into small functional or structural sectors, the diagnosis reasoning task in each SDG (lower level) is done independently from the others. In other words:

1. The first level supervises the whole system. While the process is normal, the detection module checks all the measured variables of the whole system.
2. If any symptom is detected, the second level is activated. Only the sets of rules of the nearest sectors are evaluated. These sectors are those in which the symptom is observed and, if necessary, the others that act directly upon them.

For point 2, a model that describes the connections among sectors, such as Finch and Kramer's 12,13, may be used.
However, after the evaluation of the algorithm performance in several examples 19, it was found that in most cases it is enough to implement the knowledge base considering only the rules associated with each SDG and neglecting interactions among adjacent SDGs. This fact strongly simplifies the first level strategy, as will be shown in Part II. It is important to note that there is a strong connection between the SDG partition task and both the expert system generation and performance. Our strategy requires a deep analysis of the SDG partition to minimize both the number of SDGs (avoiding explosion) and the interactions among them (i.e. using structural or functional decomposition). These objectives can be achieved. Nevertheless, it must also be mentioned that we cannot present a general procedure to perform an optimal SDG partition for all cases. More research must be done to identify a strategy for optimal SDG partition. To clarify all these points, in Part II of this work, an example with a large SDG will be studied from this point of view. System construction and performance evaluation will be analyzed using a rigorous dynamic simulator especially developed to simulate all types of fault appropriately.
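The two-level strategy summarized in this section can be sketched as follows. This is a minimal illustration, not the authors' system: the sector names, the representation of a rule as a callable that returns a certainty grade, and the `acts_on` map of inter-sector connections are hypothetical. Passing an empty `acts_on` map corresponds to neglecting the interactions among adjacent SDGs.

```python
from typing import Callable, Dict, List, Set, Tuple

Rule = Callable[[Dict[str, float]], float]  # qualitative deviations -> certainty grade


def select_sectors(symptoms: Set[str],
                   sector_vars: Dict[str, Set[str]],
                   acts_on: Dict[str, Set[str]]) -> Set[str]:
    """Sectors where a symptom is observed, plus sectors acting directly on them."""
    observed_in = {s for s, variables in sector_vars.items() if variables & symptoms}
    neighbours = {s for s, targets in acts_on.items() if targets & observed_in}
    return observed_in | neighbours


def second_level(delta: Dict[str, float],
                 symptoms: Set[str],
                 sector_vars: Dict[str, Set[str]],
                 acts_on: Dict[str, Set[str]],
                 rules: Dict[str, Dict[str, Rule]]) -> List[Tuple[str, float]]:
    """Evaluate only the rules of the selected sectors and rank the candidates."""
    if not symptoms:  # process normal: only the first level is running
        return []
    active = select_sectors(symptoms, sector_vars, acts_on)
    scores = [(fault, rule(delta))
              for sector in sorted(active)
              for fault, rule in rules[sector].items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical three-sector plant: S1 acts on S2, and S2 acts on S3.
SECTOR_VARS = {"S1": {"T1", "F1"}, "S2": {"T2", "L2"}, "S3": {"T3"}}
ACTS_ON = {"S1": {"S2"}, "S2": {"S3"}}
RULES = {
    "S1": {"#S1-fault": lambda d: 0.4},
    "S2": {"#S2-fault": lambda d: 0.9},
    "S3": {"#S3-fault": lambda d: 1.0},
}

if __name__ == "__main__":
    print(second_level({"L2": 1.0}, {"L2"}, SECTOR_VARS, ACTS_ON, RULES))
```

In this example a symptom in L2 activates S2 and its upstream neighbour S1, while the rules of S3 are never evaluated.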
5 CONCLUSIONS

A practical method for on-line diagnosis of large chemical plants was proposed. Kramer and Palowitch's 21 ideas (i.e. SDG compilation and the use of the first change) were adopted to increase the diagnosis speed. However, the new method uses qualitative simulation and fuzzy logic to improve the diagnosis system performance. The plant is split into sectors. Each sector is modelled using SDGs, which must be of the proper size for qualitative simulation. This decomposition can be structural, functional or causal. Nevertheless, the first two strategies should be used whenever possible. Fuzzy logic is used to consider the implicit uncertainty both in the observations ($\delta X$, $\delta b$) and in the predictions ($\delta X^{\circ}$, $\delta b^{\circ}$). The detection module calculates the qualitative deviation ($\delta X$) corresponding to each node. The diagnosis module evaluates the rule set RS to determine the certainty $\mu_C$ of each cluster, which is used to produce a ranked list of fault candidates. The rules and the vectors $\delta X^{\circ}$ and $\delta b^{\circ}$ describe the faulty process behaviour. They are outputs of the qualitative simulation. By adjusting the two attenuation parameters of the evaluation functions ($\mu^{\circ}$ and its analogue for the sequence propositions), the diagnosis system can ignore either the sequence (similar to fault dictionaries 24,36) or the pattern (similar to the method of Kokawa et al. 31). Moreover, Kramer and Palowitch's 21 method can be emulated by changing the evaluation functions.

Application to an MSF process will be analyzed in the second part of this work. This process is a serious challenge for any diagnosis system because the SDG associated with the entire plant is very large, and so are the quantity of recycled information and the number of variables to be supervised. Thus, there is a high probability of failure in conventional algorithms. Moreover, this is an interesting challenge for the process division strategy.
REFERENCES

1. Isermann, R. Process fault detection based on modelling and estimation methods - A survey. Automatica, 1984, 20(4), 387-404.
2. Himmelblau, D. M., Fault Detection and Diagnosis in Chemical and Petrochemical Processes. Elsevier, Amsterdam, 1978.
3. Lapp, S. A. & Powers, G. J. Computer-aided synthesis of fault trees. IEEE Trans. Reliab., 1977, R-26, 2-13.
4. Iri, M., Aoki, K., O'Shima, E. & Matsuyama, H. An algorithm for diagnosis of system failures in the chemical process. Comput. Chem. Eng., 1979, 3, 489-493.
5. Kramer, M. A. Malfunction diagnosis using quantitative models with non-Boolean reasoning in expert systems. AIChE J., 1987, 33, 130-140.
6. Kvalheim, O. M. A partial-least-squares approach to interpretative analysis of multivariate data. Chem. Intell. Lab. Syst., 1988, 3, 189-197.
7. Venkatasubramanian, V., Vaidyanathan, R. & Yamamoto, Y. Process fault detection and diagnosis using neural networks - I. Steady-state processes. Comput. Chem. Eng., 1990, 14(7), 699-712.
8. Pew, R. W. & Baron, S. Perspectives on human performance modelling. Automatica, 1983, 19(6), 663-676.
9. Rouse, W. B. Models of human problem solving: Detection, diagnosis, and compensation for system failures. Automatica, 1983, 19(6), 613-625.
10. Chen, L. W. & Modarres, M. Hierarchical decision process for fault administration. Comput. Chem. Eng., 1992, 16(5), 425-448.
11. Krishnamurthi, M. & Phillips, D. T. An expert system framework for machine fault diagnosis. Comput. Ind. Eng., 1992, 22(1), 67-84.
12. Finch, F. E. & Kramer, M. A. Narrowing diagnostic focus using functional decomposition. AIChE J., 1988, 34(1), 25-36.
13. Finch, F. E., Automated fault diagnosis of chemical process plants using model-based reasoning. Ph.D. thesis, Massachusetts Institute of Technology, Boston, 1989.
14. Mohindra, S. & Clark, P. A. A distributed fault diagnosis method based on digraph models: steady-state analysis. Comput. Chem. Eng., 1993, 17(2), 193-209.
15. Zadeh, L. A. Fuzzy sets. Info. Control, 1965, 8, 338-352.
16. Tsuge, Y., Shiozaki, J., Matsuyama, H., O'Shima, E., Iguchi, Y., Fuchigami, M. & Matsushita, M. Feasibility study of fault diagnosis system for chemical plants. Int. Chem. Eng., 1985, 25, 660-667.
17. Shiozaki, J., Matsuyama, H., O'Shima, E. & Iri, M. An improved algorithm for diagnosis of system failures in the chemical process. Comput. Chem. Eng., 1985, 9(3), 285-293.
18. Umeda, T., Kuriyama, T., O'Shima, E. & Matsuyama, H. A graphical approach to cause and effect analysis of chemical processing systems. Chem. Eng. Sci., 1980, 35, 2379-2388.
19. Tarifa, E. E., Fault diagnosis in complex chemistries plants: plants of large dimensions and batch processes. Ph.D. thesis, Universidad Nacional del Litoral (UNL), Santa Fe, 1995.
20. Han, C. C., Shih, R. F. & Lee, L. S. Qualitative signed directed graph with the fuzzy set for fault diagnosis resolution improvement. Ind. Eng. Chem. Res., 1994, 33, 1943-1954.
21. Kramer, M. A. & Palowitch, B. L. A rule-based approach to fault diagnosis using the signed directed graph. AIChE J., 1987, 33(7), 1067-1078.
22. Chang, C. C. & Yu, C. C. On-line fault diagnosis using the signed directed graph. Ind. Eng. Chem. Res., 1990, 29, 1290-1299.
23. Yu, C. C. & Lee, C. Fault diagnosis based on qualitative/quantitative process knowledge. AIChE J., 1991, 37(4), 617-628.
24. Tarifa, E. E. & Scenna, N. J. A fault diagnosis prototype for a bioreactor for bioinsecticide production. Reliability Engineering and System Safety, 1995, 48, 27-45.
25. Oyeleye, O. O. & Kramer, M. A. Qualitative simulation of chemical process systems: steady-state analysis. AIChE J., 1988, 34(9), 1441-1454.
26. Oyeleye, O. O., Qualitative modelling of continuous chemical processes and applications to fault diagnosis. Doctoral thesis, Massachusetts Institute of Technology, Boston, 1990.
27. Oyeleye, O. O., Finch, F. E. & Kramer, M. A. Qualitative modelling and fault diagnosis of dynamic processes by MIDAS. Chem. Eng. Commun., 1990, 96, 205-228.
28. Finch, F. E., Oyeleye, O. O. & Kramer, M. A. A robust event-oriented methodology for diagnosis of dynamic process systems. Comput. Chem. Eng., 1990, 14(12), 1379-1396.
29. Rose, P., A model based system for fault diagnosis of chemical process plant. M.S. thesis, Massachusetts Institute of Technology, Boston, 1990.
30. Huang, Y. L. & Fan, L. T. A fuzzy-logic-based approach to building efficient fuzzy rule-based expert systems. Comput. Chem. Eng., 1993, 17(12), 181-192.
31. Kokawa, M., Miyazaki, S. & Shingai, S. Fault location using digraph and inverse direction search with application. Automatica, 1983, 19(6), 729-735.
32. Dohnal, M. Linguistics and fuzzy models. Comput. Ind., 1983, 4, 341-345.
33. Petti, T. F. & Dhurjati, P. S. Object-based automated fault diagnosis. Chem. Eng. Commun., 1991, 102, 107-126.
34. Mah, R. S. H., Stanley, G. M. & Downing, D. M. Reconciliation and rectification of process flow and inventory data. Ind. Eng. Chem. Process Des. Dev., 1976, 15, 175-183.
35. Barr, A., Cohen, P. R. & Feigenbaum, E. A., The Handbook of Artificial Intelligence. Addison Wesley, New York, USA, 1989.
36. Berenblut, B. J. & Whitehouse, H. B. A method for monitoring process plant based on a decision table analysis. Chem. Eng. (London), 1977, 318, 175-181.