Fuzzy Sets and Systems 158 (2007) 2715 – 2733 www.elsevier.com/locate/fss
Adaptive Mamdani fuzzy model for condition-based maintenance Ranganath Kothamasu, Samuel H. Huang∗ Intelligent Systems Laboratory, Department of Mechanical, Industrial, and Nuclear Engineering, University of Cincinnati, Cincinnati, OH 45221, USA Received 20 November 2006; received in revised form 24 June 2007; accepted 3 July 2007 Available online 25 July 2007
Abstract Proper maintenance of equipment to prevent failures has become increasingly important. For manufacturing companies, it enables uninterrupted production to support lean manufacturing. For commercial carriers, it ensures the safety of passengers and crew members. Maintenance technology has progressed from time-based to condition-based. The idea of condition-based maintenance (CBM) is to monitor equipment using various sensors to enable real-time diagnosis of impending failures and prognosis of equipment health. The success of CBM hinges on the ability to develop accurate diagnosis/prognosis models. These models must be cognitive friendly for them to gain user acceptance, especially in safety critical applications. This paper presents a neuro-fuzzy modeling approach for CBM. The emphasis is on model comprehensibility so it can effectively serve as a decision-aid for domain experts. The comprehensibility of a neuro-fuzzy system usually deteriorates once rules are tuned. To solve this problem, Kullback–Leibler mean information is used to evaluate and refine tuned rules so they remain easily interpretable. The effectiveness of this modeling approach is demonstrated via a couple of real-world applications. © 2007 Elsevier B.V. All rights reserved. Keywords: Fuzzy systems; Neural network; Condition-based maintenance; Diagnosis
1. Introduction Maintenance is the set of activities performed on a system to sustain it in operable condition. The oldest and most common maintenance strategy is “fix it when it breaks.” The appeal of this approach is that no analysis or planning is required. The problems, however, are the reduction in availability and high unscheduled downtime because of unanticipated breakdowns. Condition-based maintenance (CBM) refers to the practice of triggering maintenance activities as necessitated by the condition of the target system. CBM thus entails the process of diagnosis of the target system and timely identification of incipient or existing failures, popularly known as failure detection and identification (FDI). FDI has been given due research focus; however, there is a dearth of autonomous yet interactive decision making tools that would perform diagnosis and prognosis under the precepts of CBM. CBM offers many advantages over a traditional time-based strategy that typically is modeled around the popular bath-tub curve [23]. Time-based maintenance tends to be too conservative resulting in very high maintenance costs.
∗ Corresponding author. Tel.: +1 513 556 1154; fax: +1 513 556 3390.
E-mail address:
[email protected] (S.H. Huang). 0165-0114/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2007.07.004
The bath-tub curve fails to acknowledge the complex interactions between the different components of a system and is particularly ill-suited to discrete manufacturing systems with frequent changes in work content and schedule. CBM, on the other hand, is highly generic and can be used to generate efficient maintenance strategies. CBM, being a proactive process, requires the development of a predictive model that can trigger the alarm for maintenance. In many instances, this model could be loosely based on analytical criteria developed on the signals collected from the system. In a much more sophisticated form, it would necessitate the development of prognostic and diagnostic models that can predict the future state of a system besides diagnosing the current state. Such models can be developed using the process data, its history and several other factors such as future schedule. These models have to be precise and robust, in addition to possessing some form of autonomous modeling capabilities. In this paper, we present an approach based on a Mamdani fuzzy model and adaptive learning for system diagnosis and prognosis. The approach is used to develop a robust and lucid modeling system that can assume the role of a decision making aid in the CBM arena. The system can be subjected to continuous improvement (or plain modification) by interacting with the users. A couple of applications are used to demonstrate the effectiveness of the system, including a real-world aircraft engine condition monitoring case study.
2. Background Quality is increasingly seen as a motivation for improved maintenance management [3]. Another compelling but less addressed justification of maintenance is safety and environmental preservation, which assumes a highly significant role as safety and environmental laws become more stringent. Since operational hazards and accidents lead to enormous legal expenses, inattention to these issues is no longer affordable [19].
Although the above motivational factors have direct economic impacts, efficient maintenance on its own has economic objectives [21]. Though the return on investment is highly dependent on the specific industry and the equipment involved, a survey [19] states that an investment in monitoring-based maintenance of between $10,000 and $20,000 results in savings of $500,000 a year. Across many industries, 15–40% of manufacturing costs are typically attributable to maintenance activities. In the current competitive marketplace, maintenance management plays an increasingly important role in combating competition by reducing equipment downtime and associated costs and unscheduled disruptions [1]. These insights instigated the development of various paradigms like total productive maintenance [18] which aims at maximizing equipment efficiency and Terotechnology [10] which offers a much broader perspective including the supply (to the system), engineering, and market modules of a system. These paradigms prescribe predictive maintenance over reactive or a simple time-based maintenance. Predictive maintenance can be classified into CBM and reliability centered maintenance (RCM). CBM is a decision making strategy where the decision to perform maintenance is reached by observing the “condition” of the system and/or its components. The condition of a system is quantified by parameters that are continuously monitored and are system or application specific. For instance, in the case of rotary systems a vibration characteristic or index is an appropriate choice. The advantage of this approach is immediately apparent as the decision is made on depictive and corroborative data that actually reflects the state of the system. It is highly presumptive to assume that the state of a system would always follow the same operational curve, which is the underlying assumption in preventive maintenance. 
In an industrial or production environment, the system is exposed to random disturbances, which cause deviations in the operational characteristics. Hence, it is highly justified to monitor the condition of system and base the maintenance decision on the state of the system. Some of the advantages of CBM are prior warning of impending failure and increased precision in failure prediction. It aids in diagnostic procedures as it is relatively easy to associate the failure to specific components through the monitored parameters. It also can be linked to adaptive control thus facilitating process optimization. The disadvantage, of course, is the necessity to install and use monitoring equipment and to develop some level of modeling or decision-making strategy. RCM utilizes reliability estimates of the system to formulate a cost-effective schedule for maintenance [14]. It was originally developed in the aircraft industry. For aircraft and other safety-related applications, cost-effectiveness is balanced with safety and availability with the goal of minimizing costs and downtime but eliminating the chance of a failure [17]. RCM is a union of two tasks, one of which is to analyze and categorize failure modes based on the effects of the failure on the system, and the other is to assess the impact of maintenance schedules on reliability. Failure analysis starts with the identification of all the failure modes and proceeds with categorization of these failure modes based on the consequences of each failure. The results comprise a failure modes and effects analysis (FMEA). Usually the
consequences of failure are operational, environmental/safety, or economic [19]. Once the effects have been identified, the decision logic algorithms prioritize the effects. These algorithms tend to be industry specific as the constraints and requirements of each industry vary considerably. Though conventional RCM-based maintenance intervals were determined similarly to planned or scheduled maintenance, condition monitoring techniques are increasingly being used to determine the optimum interval [13,20]. In this sense, RCM and CBM are converging to the same platform of system diagnosis and prognosis.
3. Intelligent CBM It is evident from the current state of the art in system maintenance applications that CBM is a generalized and efficient maintenance paradigm. It is possible to create an architecture that can be used to generate maintenance solutions using this paradigm. It is also possible to achieve a seamless integration of model-based FDI algorithms with this architecture in order to create decision tools that aid users in their maintenance applications. We have developed such an intelligent CBM system architecture as shown in Fig. 1. The system has four modules; namely, data acquisition, feature extraction, model generation, and model deployment. Data acquisition is the process of acquiring data from the target system and its environment. The data include information extracted from the process, sensors monitoring the process, and the environment. Feature extraction is an essential element as raw data are seldom useful in its own form, especially if it comes from a sensor used in vibration or acoustic emission application. It is the extraction of useful information from the raw data which reflect the condition of the target system. Several features can be extracted from the domains of signal processing, time series analysis and diagnostic analysis.
Fig. 1. Intelligent CBM system architecture.
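As a hedged illustration of the feature extraction module, the sketch below computes two time-domain indicators, RMS and kurtosis, from a raw signal. The choice of these two indicators, the function names `rms` and `kurtosis`, and the synthetic sine signal are assumptions made for illustration only; they are not the feature set or the data used by the authors.

```python
import math

def rms(signal):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def kurtosis(signal):
    """Population kurtosis: fourth central moment over squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((s - mean) ** 2 for s in signal) / n
    m4 = sum((s - mean) ** 4 for s in signal) / n
    return m4 / (m2 * m2)

# Hypothetical raw signal: a unit sine wave sampled over one full period.
samples = [math.sin(2 * math.pi * k / 64) for k in range(64)]

# The extracted features would be the inputs of the predictive model.
features = {"rms": rms(samples), "kurtosis": kurtosis(samples)}
print(features)
```

For a pure sine wave the RMS is the amplitude divided by the square root of two and the kurtosis is 1.5; impulsive content in a real signal would raise the kurtosis.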
Model generation refers to the actual process of building a predictive model from extracted data. This model would take the extracted features as inputs and provide relevant outputs based on whether it is a prognostic or diagnostic application. A diagnostic model would give information on the existence of any failure along with its type. A prognostic model would give information on the expected state of system as reflected by the condition parameters (features) or some other specified output typical of the application. For instance it is typical to use indicators such as RMS (root mean square) and Kurtosis in bearing wear applications. In such an application, the diagnostic model would detect bearing failures while prognostic model could estimate the future RMS or Kurtosis values. As depicted in the architecture, the domain expert plays a significant role in the process of feature extraction and some parts of the model building process. The expert’s knowledge can be efficiently used to identify the right set of features as well as generate the necessary knowledge to create the models. This approach is consistent with the objective to create an easily interpretable decision aid with certain amount of autonomy. Model deployment refers to the process of integrating the created model into the monitoring system as well as establishing the proper channels of communication with the various business functions such as maintenance, production planning, and quality control. The two outputs of model generation process, namely, the model itself and the process knowledge, are to be systematically integrated with these business functions. This paper concentrates on the model generation process. As can be seen from the architecture, the predictive model needs to handle diagnosis and prognosis, which are two classes of learning problems; namely, classification and function approximation. 
These problems have been extensively studied in both the parametric and non-parametric arena and there is a multitude of paradigms that address the issues involved. To determine which paradigm is the most appropriate to CBM, we start by listing the desirable characteristics of a CBM predictive model:
• Adaptive: The algorithm should be adaptable as it functions in a highly dynamic and non-linear environment.
• Flexible: The algorithm should be as generic as the architecture and should be a universal approximator. It should also be flexible enough to incorporate various forms of knowledge (data, heuristic and analytical) as provided by the domain expert.
• Lucid: The algorithm should be able to create models that are highly transparent. This is essential as the model has to act as a decision aid and should also generate useful and intelligible knowledge about the process. This is also essential as the domain expert is tightly integrated with the model building process.
• Robust: The algorithm should be able to create models that are robust enough to handle the demands of a real-time algorithm, such as noise handling capabilities.
Parametric estimation methods are, in general, highly robust and theoretically can estimate any system to the required accuracy. However, they require assumptions on the distribution of some modeling elements, for instance, the data or the error. Several methods such as transformation techniques do exist but they are not amenable to automation. As stated in statistical learning theory, it is also not advisable to approach a solution by solving a harder problem such as density estimation [5]. Non-parametric estimation methods, such as neural networks, do not require any such assumptions and are capable of approximating any domain of problems. They are also equipped with learning algorithms that can automatically retrain or regenerate the maintenance solution if necessitated.
However, they tend to be quite opaque (black box) and it is often not possible to generate any qualitative knowledge about the approximated system. This is a major hindrance to establishing any form of knowledge transfer between the domain expert and the approximating system. Fuzzy inference systems (FIS), especially the Mamdani type, can be efficiently used as a bridge between the domain expert and a CBM system. FIS works on knowledge bases that are in an easily comprehensible “IF…THEN” format. However, this particular class of algorithms does not possess any form of automated learning and hence requires a considerable amount of manual tuning in the generation of the solution. Neuro-fuzzy algorithms are an assimilation of neural networks and FIS and are able to annul the disadvantages of the respective parts. These algorithms are particularly adaptive, lucid and highly flexible. As they are essentially fuzzy inference systems embedded into a neural network, they are also robust. It is also easy for a domain expert to interact with these algorithms. Since the knowledge is both in a functional form (network) and generalized form (rule base), it is possible to integrate with the other business functions previously mentioned.
4. Adaptive Mamdani fuzzy model As observed in the previous section, neuro-fuzzy systems are ideal candidates to fulfill CBM objectives. Adaptive neuro-fuzzy inference system (ANFIS) [11] and hybrid fuzzy inference system (HyFIS) [12] are the two most popular neuro-fuzzy connectionist systems that simulate a Sugeno and a Mamdani type FIS, respectively. Both algorithms have been validated on various data sets and were shown to possess good accuracy. However, they are not without their drawbacks in the CBM context as elucidated below. Consider a domain described by a function $y = f(x_1, x_2)$. A Mamdani type FIS in this domain would consist of rules of the form “IF $x_1$ is low AND $x_2$ is medium THEN $y$ is high,” where low, medium and high are linguistic terms with functional forms like Gaussian, Sigmoid, etc., also known as membership functions. A Sugeno type FIS in this domain would consist of rules of the form “IF $x_1$ is low AND $x_2$ is medium THEN $y = f_1(x_1, x_2)$,” where low and medium are linguistic terms with functional context. The difference between the two FIS is the form of the consequents. In a Mamdani type FIS the output membership function can be defined independently of the premise parameters, whereas in a Sugeno type FIS each output membership function is a function of the inputs. ANFIS mimics a Sugeno type FIS. It is efficient for function approximation problems and is not particularly useful in classification applications. Hence, it is not appropriate for diagnosis applications, and the knowledge (rules) it extracts would be abstract for a domain expert as they are not entirely in a linguistic format. HyFIS, on the other hand, simulates a Mamdani type FIS which is universally applicable and hence can be used for prognosis as well as diagnosis applications.
However, it uses a defuzzification (process of generating crisp outputs from fuzzy outputs) strategy that restricts the output membership functions to assume a Gaussian functional form (with center and variance parameters). Although this does not hamper its ability to generate maintenance solutions, it is not possible for a domain expert to interact with the model in all situations (for instance, when output membership functions are non-Gaussian). The aforementioned reasons have provided the motivation to formulate an easily comprehensible neuro-fuzzy system; namely, adaptive Mamdani fuzzy model (AMFM) as shown in Fig. 2. The first layer represents the input parameters. Let $N$ be the number of input parameters, then the first layer will have $N$ nodes. Let $M_n$ denote the number of linguistic terms of input parameter $n$, $n = 1, 2, \ldots, N$. Then, the total number of nodes in the second layer will be $I = \sum_{n=1}^{N} M_n$. A node $n$ in the first layer is connected to only the $M_n$ nodes in the second layer that represent its corresponding linguistic terms. It merely passes the input value $x_n$ to the connected second layer nodes. A node $i$ in the second layer has an activation function which is a fuzzy membership function that can be Gaussian, open left Sigmoid, or open right Sigmoid to represent the concepts “medium is the best,” “smaller is better,” and “bigger is better,” respectively. The number of third layer nodes, $J$, equals the number of rules. Each node, with its connections from the preceding nodes, represents a rule. Note that different nodes in this layer might represent the same concept (linguistic term of an output parameter). For example, in Fig. 2, say the input parameter $x_1$ has three linguistic terms, small, medium, and
Fig. 2. Adaptive Mamdani fuzzy model (AMFM).
large, which correspond to the first three nodes in the second layer. The input parameter $x_n$ has two linguistic terms, small and large, which correspond to the last two nodes in the second layer. The first two nodes in the third layer both represent the concept “output $y_1$ is small.” Then, the first and second nodes in the third layer and their connections from the second layer nodes represent the rules “IF $x_1$ is small and $x_n$ is small THEN $y_1$ is small” and “IF $x_1$ is medium and $x_n$ is large THEN $y_1$ is small.” The activation function of third layer nodes is the minimum operation. For instance, in Fig. 2, $y_1^{(3)} = \min\{y_1^{(2)}, y_{I-1}^{(2)}\}$. The fourth layer nodes represent the output linguistic terms. Unlike the third layer, each node represents a distinct concept. Therefore, the number of nodes, $K$, equals the total number of output parameter linguistic terms. Each node is connected to the preceding layer nodes that represent the same concept. Its activation function is the maximum operation. For example, in Fig. 2, the first node in the fourth layer represents the concept “output $y_1$ is small.” It is connected to the first two nodes in the third layer, which represent the same concept. We have $y_1^{(4)} = \max\{y_1^{(3)}, y_2^{(3)}\}$. Each fourth layer node also maintains a fuzzy membership function as in the second layer. Note that this membership function is not an activation function. It is transmitted to a fifth layer node for defuzzification. The fifth layer represents the output parameters. Let $L$ be the number of output parameters, then the fifth layer will have $L$ nodes. Let $O_l$ denote the number of linguistic terms of output parameter $l$, $l = 1, 2, \ldots, L$. A node $l$ in the fifth layer will have $O_l$ incoming connections from the fourth layer nodes that correspond to its linguistic terms. For example, in Fig. 2, the output parameter $y_1$ has two linguistic terms, namely, small and large, which are represented by the first and the second nodes in the fourth layer, respectively. Therefore, these two fourth layer nodes are connected to the first node in the fifth layer, which represents the output parameter $y_1$. A node in the fifth layer performs defuzzification using a weighted average method. The tunable parameters in AMFM are the parameters of the membership functions. For Gaussian membership functions, these include those for the input parameters, namely, $c_i^{(2)}$ and $\sigma_i^{(2)}$ ($i = 1, 2, \ldots, I$), in the second layer; and those for the output parameters, namely, $c_k^{(4)}$ and $\sigma_k^{(4)}$ ($k = 1, 2, \ldots, K$), in the fourth layer. The tuning process is based on error backpropagation and gradient descent search. For a particular input vector $[x_1\ x_2\ \ldots\ x_N]^T$, let the desired output vector be $[d_1\ d_2\ \ldots\ d_L]^T$, and the AMFM output vector be $[y_1\ y_2\ \ldots\ y_L]^T$. Then, the error can be calculated as
\[ E = \frac{1}{2} \sum_{l=1}^{L} (d_l - y_l)^2. \tag{1} \]
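The five-layer forward pass and the error of Eq. (1) can be sketched as follows for a single output parameter with Gaussian membership functions. This is a minimal sketch under stated assumptions: the data layout (`in_terms`, `rules`, `out_terms`), the function names, and the toy rule base are hypothetical, and the defuzzification weight $y\,\sigma\sqrt{-2\ln y}$ is inferred from the form of the gradient expressions in Eqs. (3) and (4) rather than stated explicitly in the text.

```python
import math

def gaussian(x, c, s):
    """Gaussian membership function with center c and width s."""
    return math.exp(-((x - c) ** 2) / (2.0 * s ** 2))

def amfm_forward(x, in_terms, rules, out_terms):
    """Forward pass for one output parameter.

    in_terms : list of (input_index, center, width), one per second layer node
    rules    : list of (antecedent_node_indices, output_term_index)
    out_terms: list of (center, width), one per fourth layer node
    """
    # Layer 2: membership degree of each input linguistic term.
    y2 = [gaussian(x[n], c, s) for n, c, s in in_terms]
    # Layer 3: rule strength is the minimum over the rule's antecedents.
    y3 = [min(y2[i] for i in ante) for ante, _ in rules]
    # Layer 4: term activation is the maximum over rules sharing that term.
    y4 = [max([y3[j] for j, (_, t) in enumerate(rules) if t == k], default=0.0)
          for k in range(len(out_terms))]
    # Layer 5: weighted average; the weight y*s*sqrt(-2 ln y) is an ASSUMPTION.
    num = den = 0.0
    for yk, (c, s) in zip(y4, out_terms):
        yk = min(max(yk, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite and nonzero
        w = yk * s * math.sqrt(-2.0 * math.log(yk))
        num += w * c
        den += w
    return num / den

def squared_error(desired, actual):
    """Eq. (1): E = 1/2 * sum over l of (d_l - y_l)^2."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, actual))

# Hypothetical toy system: two inputs, each with terms small (c=0) and large (c=4).
in_terms = [(0, 0.0, 1.0), (0, 4.0, 1.0), (1, 0.0, 1.0), (1, 4.0, 1.0)]
# IF x1 small AND x2 small THEN y small; IF x1 large AND x2 large THEN y large.
rules = [([0, 2], 0), ([1, 3], 1)]
out_terms = [(0.0, 1.0), (10.0, 1.0)]  # output terms small (c=0) and large (c=10)

y_low = amfm_forward([0.5, 0.5], in_terms, rules, out_terms)
y_high = amfm_forward([3.5, 3.5], in_terms, rules, out_terms)
print(y_low, y_high, squared_error([0.0], [y_low]))
```

Inputs near the "small" terms drive the output toward the center of the "small" output term, and symmetrically for "large".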
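The error signal at the fifth layer can be calculated as
\[ \delta_l^{(5)} = -\frac{\partial E}{\partial y_l} = (d_l - y_l). \tag{2} \]
Using the chain rule, we can calculate the error signal for $c_k^{(4)}$ and $\sigma_k^{(4)}$, when the $k$th node in the fourth layer is connected to the $l$th node in the fifth layer, as follows:
\[ \frac{\partial E}{\partial c_k^{(4)}} = \frac{\partial E}{\partial y_l} \times \frac{\partial y_l}{\partial c_k^{(4)}} = -\delta_l^{(5)} \times \frac{y_k^{(4)} \sigma_k^{(4)} \sqrt{-2 \ln y_k^{(4)}}}{\sum_{k' \in \Omega(l)} \left| y_{k'}^{(4)} \sigma_{k'}^{(4)} \sqrt{-2 \ln y_{k'}^{(4)}} \right|}, \tag{3} \]
\[ \frac{\partial E}{\partial \sigma_k^{(4)}} = \frac{\partial E}{\partial y_l} \times \frac{\partial y_l}{\partial \sigma_k^{(4)}} = -\delta_l^{(5)} \times \frac{\alpha \sum_{k' \in \Omega(l)} y_{k'}^{(4)} \sigma_{k'}^{(4)} \sqrt{-2 \ln y_{k'}^{(4)}} - \beta \sum_{k' \in \Omega(l)} y_{k'}^{(4)} c_{k'}^{(4)} \sigma_{k'}^{(4)} \sqrt{-2 \ln y_{k'}^{(4)}}}{\left( \sum_{k' \in \Omega(l)} y_{k'}^{(4)} \sigma_{k'}^{(4)} \sqrt{-2 \ln y_{k'}^{(4)}} \right)^2}, \tag{4} \]
in which $\alpha = y_k^{(4)} c_k^{(4)} \sqrt{-2 \ln y_k^{(4)}}$, $\beta = y_k^{(4)} \sqrt{-2 \ln y_k^{(4)}}$, and $\Omega(l)$ denotes the set of fourth layer nodes that are connected to the $l$th node in the fifth layer.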
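The output-layer gradients of Eqs. (3) and (4) can be checked numerically. The sketch below assumes that the fifth layer output is the weighted average $y_l = \sum_k y_k c_k \sigma_k \rho_k / \sum_k y_k \sigma_k \rho_k$ with $\rho_k = \sqrt{-2\ln y_k}$, which is the form implied by those equations; the function names and the sample values are hypothetical. All quantities in the example are positive, so the absolute values in Eq. (3) have no effect.

```python
import math

def rho(y):
    """sqrt(-2 ln y): half-width factor of a Gaussian cut at height y, 0 < y < 1."""
    return math.sqrt(-2.0 * math.log(y))

def sums(y4, c, s):
    """Numerator and denominator of the assumed weighted-average defuzzification."""
    a = sum(y * ck * sk * rho(y) for y, ck, sk in zip(y4, c, s))
    b = sum(y * sk * rho(y) for y, sk in zip(y4, s))
    return a, b

def error(y4, c, s, d):
    """Eq. (1) for a single output."""
    a, b = sums(y4, c, s)
    return 0.5 * (d - a / b) ** 2

def output_gradients(y4, c, s, d):
    """Analytic dE/dc_k and dE/dsigma_k following Eqs. (2)-(4)."""
    a, b = sums(y4, c, s)
    delta5 = d - a / b                                   # Eq. (2)
    grad_c, grad_s = [], []
    for y, ck, sk in zip(y4, c, s):
        alpha = y * ck * rho(y)
        beta = y * rho(y)
        grad_c.append(-delta5 * (y * sk * rho(y)) / b)            # Eq. (3)
        grad_s.append(-delta5 * (alpha * b - beta * a) / b ** 2)  # Eq. (4)
    return grad_c, grad_s

def numeric_gradient(f, params, h=1e-6):
    """Central finite differences, one parameter at a time."""
    grads = []
    for i in range(len(params)):
        up = list(params); up[i] += h
        dn = list(params); dn[i] -= h
        grads.append((f(up) - f(dn)) / (2 * h))
    return grads

# Hypothetical fourth layer activations, centers, widths and desired output.
y4, c, s, d = [0.8, 0.3], [0.0, 1.0], [0.2, 0.3], 0.7
grad_c, grad_s = output_gradients(y4, c, s, d)
num_c = numeric_gradient(lambda p: error(y4, p, s, d), c)
num_s = numeric_gradient(lambda p: error(y4, c, p, d), s)
max_gap = max(abs(a - n) for a, n in zip(grad_c + grad_s, num_c + num_s))
print(max_gap)
```

A small `max_gap` indicates that the reconstructed expressions agree with the derivative of the assumed defuzzification formula.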
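Now, $c_k^{(4)}$ and $\sigma_k^{(4)}$ can be updated as follows:
\[ c_k^{(4)} = c_k^{(4)} + \eta \times \left( -\frac{\partial E}{\partial c_k^{(4)}} \right) \times y_k^{(4)}, \tag{5} \]
\[ \sigma_k^{(4)} = \sigma_k^{(4)} + \eta \times \left( -\frac{\partial E}{\partial \sigma_k^{(4)}} \right) \times y_k^{(4)}, \tag{6} \]
in which $\eta$ is a positive constant (the learning rate).
To adjust $c_i^{(2)}$ and $\sigma_i^{(2)}$, the error signals need to be propagated backward to the second layer. The error signal at the fourth layer is calculated as
\[ \delta_k^{(4)} = -\frac{\partial E}{\partial y_k^{(4)}} = -\frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l}{\partial y_k^{(4)}} = \delta_l^{(5)} \cdot \frac{\partial y_l}{\partial y_k^{(4)}} = \delta_l^{(5)} \times \frac{\left( \sum_{k' \in \Omega(l)} \left| y_{k'}^{(4)} \sigma_{k'}^{(4)} \rho_{k'} \right| \right) \left( c_k^{(4)} \sigma_k^{(4)} \rho_k - \frac{1}{\rho_k} c_k^{(4)} \sigma_k^{(4)} \right) - \left( \sum_{k' \in \Omega(l)} \left| y_{k'}^{(4)} c_{k'}^{(4)} \sigma_{k'}^{(4)} \rho_{k'} \right| \right) \left( \sigma_k^{(4)} \rho_k - \frac{1}{\rho_k} \sigma_k^{(4)} \right)}{\left( \sum_{k' \in \Omega(l)} \left| y_{k'}^{(4)} \sigma_{k'}^{(4)} \rho_{k'} \right| \right)^2}, \tag{7} \]
in which $\rho_k = \sqrt{-2 \ln y_k^{(4)}}$ and $\Omega(l)$ is the set of fourth layer nodes that are connected to the $l$th node in the fifth layer. The error signal at the third layer is calculated as
\[ \forall j \in \Omega(k), \quad \delta_j^{(3)} = \begin{cases} \delta_k^{(4)} & \text{if } y_j^{(3)} = y_k^{(4)}, \\ 0 & \text{otherwise,} \end{cases} \tag{8} \]
in which $\Omega(k)$ denotes the set of third layer nodes that are connected to the $k$th node in the fourth layer. The error signal at the second layer is calculated as
\[ \forall i \in \Omega(j), \quad \delta_i^{(2)} = \begin{cases} \delta_j^{(3)} & \text{if } y_i^{(2)} = y_j^{(3)}, \\ 0 & \text{otherwise,} \end{cases} \tag{9} \]
in which $\Omega(j)$ denotes the set of second layer nodes that are connected to the $j$th node in the third layer.
Now we can calculate the error signal for $c_i^{(2)}$ and $\sigma_i^{(2)}$ as follows:
\[ \frac{\partial E}{\partial c_i^{(2)}} = \frac{\partial E}{\partial y_i^{(2)}} \times \frac{\partial y_i^{(2)}}{\partial c_i^{(2)}} = -\delta_i^{(2)} \times y_i^{(2)} \times \frac{(x_i - c_i^{(2)})}{(\sigma_i^{(2)})^2}, \tag{10} \]
\[ \frac{\partial E}{\partial \sigma_i^{(2)}} = \frac{\partial E}{\partial y_i^{(2)}} \times \frac{\partial y_i^{(2)}}{\partial \sigma_i^{(2)}} = -\delta_i^{(2)} \times y_i^{(2)} \times \frac{(x_i - c_i^{(2)})^2}{(\sigma_i^{(2)})^3}. \tag{11} \]
Hence, $c_i^{(2)}$ and $\sigma_i^{(2)}$ can be updated as follows:
\[ c_i^{(2)} = c_i^{(2)} + \eta \times \left( -\frac{\partial E}{\partial c_i^{(2)}} \right) \times x_i, \tag{12} \]
\[ \sigma_i^{(2)} = \sigma_i^{(2)} + \eta \times \left( -\frac{\partial E}{\partial \sigma_i^{(2)}} \right) \times x_i. \tag{13} \]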
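The error routing of Eqs. (8) and (9) and the input-layer gradients of Eqs. (10) and (11) can be illustrated with a short sketch. The helper names `route_error` and `input_gradients` and all numeric values are hypothetical; the Gaussian form $y = \exp(-(x-c)^2 / (2\sigma^2))$ is assumed for the second layer membership function, which is the form consistent with Eqs. (10) and (11).

```python
import math

def gaussian(x, c, s):
    """Assumed Gaussian membership function of a second layer node."""
    return math.exp(-((x - c) ** 2) / (2.0 * s ** 2))

def route_error(delta_next, y_prev, y_next):
    """Eqs. (8)-(9): only the node whose output equals the min/max gets the error."""
    return [delta_next if yp == y_next else 0.0 for yp in y_prev]

def input_gradients(delta2, x, c, s):
    """Eqs. (10)-(11) for one Gaussian second layer node."""
    y = gaussian(x, c, s)
    dE_dc = -delta2 * y * (x - c) / s ** 2
    dE_ds = -delta2 * y * (x - c) ** 2 / s ** 3
    return dE_dc, dE_ds

# A third layer (min) node fed by two second layer nodes.
y2 = [0.9, 0.4]
y3 = min(y2)
delta2 = route_error(0.25, y2, y3)
print(delta2)  # -> [0.0, 0.25]: the error reaches only the node that set the minimum

# Finite-difference check of the membership derivatives used in Eqs. (10)-(11).
# With delta2 = -1 the returned values equal dy/dc and dy/dsigma directly.
x, c, s, h = 1.5, 1.0, 0.5, 1e-6
dy_dc, dy_ds = input_gradients(-1.0, x, c, s)
num_dc = (gaussian(x, c + h, s) - gaussian(x, c - h, s)) / (2 * h)
num_ds = (gaussian(x, c, s + h) - gaussian(x, c, s - h)) / (2 * h)
gap = max(abs(dy_dc - num_dc), abs(dy_ds - num_ds))
print(gap)
```

Because the minimum and maximum operations pass the error to a single winning node, only the membership functions that actually determined a rule's strength are adjusted for a given training pattern.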
Fig. 3. Initial membership functions: (a) input X1 and (b) input X2 .
For sigmoid membership functions, the same error backpropagation and gradient descent search method is used. After rule tuning, the model precision increases with respect to the training data set; however, the legibility of the rules often deteriorates. It is not uncommon for the rules to become indistinguishable and to make sense only from the approximation point of view. The tuned rules will not be able to clearly explain the created model, thus causing a deterioration of the transparency of the system. This has grave repercussions in situations where the model needs to change over time because of the dynamic nature of the domain. Since the models are not transparent enough, it is not possible to direct this necessary change. Although it is possible to continue to update the model using error backpropagation and gradient descent, this makes the system equivalent to a neural network, which defeats the original intention to utilize such models as easily interpretable decision aids. To overcome this problem, we use Kullback–Leibler (KL) mean information, which measures the distance between two distributions, to evaluate and refine the structure of tuned rules. The KL distance is computed as given in the following:
\[ \kappa^{d}_{i,j} = \sum_{\forall x} \mu^{d}_{i}(x) \log \frac{\mu^{d}_{i}(x)}{\mu^{d}_{j}(x)}, \tag{14} \]
where $\kappa^{d}_{i,j}$ represents the KL distance between membership functions $\mu^{d}_{i}$ and $\mu^{d}_{j}$ in dimension $d$. A distance matrix $\Delta^{d}$, hence, can be formulated for each dimension $d$ which represents the qualitative distance of each membership function
Fig. 4. Final membership functions: (a) input X1 and (b) input X2.

Table 1
KL distance matrix for X1

        MF1       MF2       MF3       MF4       MF5       MF6
MF1     0         0.18253   0.23386   0.23386   0.32494   0.32495
MF2     0.18253   0         0.23104   0.23103   0.40117   0.40118
MF3     0.23386   0.23104   0         0.70052   0.86876   0.10774
MF4     0.23386   0.23103   0.70052   0         0.10774   0.86876
MF5     0.32494   0.40117   0.86876   0.10774   0         1
MF6     0.32495   0.40118   0.10774   0.86876   1         0
Table 2
KL distance matrix for X2

        MF1       MF2         MF3       MF4       MF5         MF6
MF1     0         0.48375     0.44889   0.44885   0.46912     0.46913
MF2     0.48375   0           1         0.99996   0.0026265   0.0026214
MF3     0.44889   1           0         1e−005    0.9768      0.97681
MF4     0.44885   0.99996     1e−005    0         0.97675     0.97676
MF5     0.46912   0.0026265   0.9768    0.97675   0           5e−006
MF6     0.46913   0.0026214   0.97681   0.97676   5e−006      0
Fig. 5. Final membership functions after merging: (a) input X1 and (b) input X2 .
from the rest as
\[ \Delta^{d} = \begin{bmatrix} \Delta^{d}_{1,1} & \cdots & \Delta^{d}_{1,N^{d}} \\ \Delta^{d}_{i,1} & \Delta^{d}_{i,i} & \Delta^{d}_{i,N^{d}} \\ \Delta^{d}_{N^{d},1} & \cdots & \Delta^{d}_{N^{d},N^{d}} \end{bmatrix}, \tag{15} \]
where $N^{d}$ is the number of membership functions in dimension $d$ and $\Delta^{d}_{i,j} = \sqrt{(\kappa^{d}_{i,j})^2 + (\kappa^{d}_{j,i})^2}$, with $\kappa^{d}_{i,j}$ the KL distance of Eq. (14). This matrix is scaled to facilitate merging of the significantly similar membership functions based on a threshold $\Delta_{\mathrm{threshold}}$. The matrix is scaled as
\[ \Delta^{d}_{i,j} \Big|_{\forall i,j} = \frac{\Delta^{d}_{i,j}}{\Delta^{d}_{\max}}, \tag{16} \]
where $\Delta^{d}_{\max}$ is the largest element in $\Delta^{d}$. The primary advantage of using the KL distance is that it is not restricted by the parametric form of the membership functions. The similarity between any two membership functions is inversely related to the corresponding value in the distance matrix. The advantage of the KL distance matrix can be seen in the following case study
which involves the approximation of a function popularly known as Rosenbrock’s banana function as defined in the following:
\[ y = 100 (x_1 - x_2^2)^2 + (1 - x_2)^2. \tag{17} \]
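The distance computation of Eqs. (14) to (16) and the threshold-based selection of merge candidates can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the sampling grid, the three Gaussian membership functions, the small constant `eps` that guards the logarithm, and the function names are all assumptions. Membership values are used directly, as in Eq. (14), without normalizing them to probability distributions.

```python
import math

def gaussian(c, s):
    """Return a Gaussian membership function with center c and width s."""
    return lambda x: math.exp(-((x - c) ** 2) / (2.0 * s ** 2))

def kl_distance(mu_i, mu_j, xs, eps=1e-300):
    """Eq. (14): sum over the sampled x of mu_i(x) * log(mu_i(x) / mu_j(x))."""
    return sum(mu_i(x) * math.log((mu_i(x) + eps) / (mu_j(x) + eps)) for x in xs)

def scaled_distance_matrix(mfs, xs):
    """Eqs. (15)-(16): symmetric distances divided by the largest element."""
    n = len(mfs)
    kl = [[kl_distance(mfs[i], mfs[j], xs) for j in range(n)] for i in range(n)]
    sym = [[math.sqrt(kl[i][j] ** 2 + kl[j][i] ** 2) for j in range(n)]
           for i in range(n)]
    largest = max(max(row) for row in sym)
    return [[v / largest for v in row] for row in sym]

def merge_candidates(scaled, threshold):
    """Pairs of membership functions whose scaled distance is below the threshold."""
    n = len(scaled)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if scaled[i][j] < threshold]

# Hypothetical dimension with two nearly identical terms and one distinct term.
xs = [k * 0.05 for k in range(201)]          # sampling grid on [0, 10]
mfs = [gaussian(2.0, 1.0), gaussian(2.1, 1.0), gaussian(8.0, 1.0)]
scaled = scaled_distance_matrix(mfs, xs)
pairs = merge_candidates(scaled, threshold=0.2)
print(pairs)
```

Because the computation only evaluates the membership functions on a grid, the same code applies to sigmoid or any other functional form, which is the property of the KL distance emphasized in the text.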
The initial and final membership functions as identified by ANFIS are depicted in Figs. 3 and 4. It can be seen that there is a clear deterioration of the rule and linguistic structure within each input dimension (membership functions move closer to each other). However, from an approximation point of view the network is very precise, as indicated by the MSE value, which is 0.000651 after 1000 iterations. The normalized KL distance matrices for the input dimensions (X1 and X2) are computed using the above formulae and are given in Tables 1 and 2, where MF stands for membership function. It can be seen that the distance measures of MF1 (from the rest) are quite similar, indicating a very wide span and hence higher overlap with all the membership functions. This is indeed the case as can be seen from Fig. 4. It can also be seen that there is a gradual degradation of the structure because of membership functions with very wide spans and closely spaced centers. This is indicated in Tables 1 and 2, where some of the distance measures are low in magnitude. The inference system is refined by eliminating (merging or deleting) the MFs that result in structural deterioration as indicated by the distance measures. A threshold value of 0.2 was chosen and MFs with lower distance measures are merged accordingly. The resultant network was trained for 1000 epochs and the MSE value was found to be 0.000317, which is roughly 51% lower (about 49% of the original value). The resultant MFs are given in Fig. 5; their interpretability is clearly higher than those shown in Fig. 4.
5. Applications
5.1. Bearing fault identification AMFM modeling accuracy in diagnosis and prognosis applications was demonstrated in Kothamasu et al. [13]. Here, we use an application with linearly separable data to illustrate other advantages of AMFM modeling. The application concerns the identification of failures in spindle bearings.
The data set (shown in Table 3) was provided by the National Institute of Standards and Technology (NIST) and it comprises several features provided by the domain expert. These features were extracted from acoustic signals emanating from 18 bearings, of which nine were defective. Two of the features were selected for modeling the separability of the two output classes (normal and defective). Three classes of modeling paradigms were analyzed and compared; namely, regression, neural networks, and AMFM. In the following analysis and model development, the normal (fault free) data are represented as belonging to class ‘0’ and the fault data are represented as belonging to class ‘1.’ Therefore, the developed models would be trained on a discrete set of outputs {0, 1}. Since the models produce continuous outputs, a threshold of 0.5 is chosen and the class is assigned based on the scheme given in the following equation:
\[ y_i = \begin{cases} 0 & \text{if } f(x_{i1}, x_{i2}) \le 0.5, \\ 1 & \text{if } f(x_{i1}, x_{i2}) > 0.5, \end{cases} \]
where yi is the predicted class, xi1 and xi2 are the two features (inputs) and f is the developed model. Based on expert heuristics, the following rules are used in the AMFM model: 1. IF (Max/Avg+) is low and (Avg(> Max/2)) is low THEN bearing is normal. 2. IF (Max/Avg+) is high and (Avg(> Max/2)) is low THEN bearing is faulty. 3. IF (Max/Avg+) is low and (Avg(> Max/2)) is high THEN bearing is faulty. The AMFM model is tuned and compared to both regression and neural network models. The actual range data are given in Table 4. In order to develop the response surfaces, an enhanced range of features is adopted as specified in the simulated ranges column in Table 4. A grid resolution of 0.5 was used in developing these response surfaces. Fig. 6 depicts the response surfaces of all the three models. Table 5 provides the range of responses from these three models. It is interesting to see that AMFM model generates values much closer to the class representation. Since the
Table 3
NIST data on spindle bearings

| Index | Speed | Transducer | avg+ | avg− | Max | Min | RMS | avg > max/2 | avg < min/2 | Output |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 750 | ACC | 0.027 | −0.023 | 0.184 | −0.254 | 0.031 | 0.124 | −0.254 | 0 |
| 2 | 1500 | ACC | 0.134 | −0.163 | 0.973 | −1.246 | 0.201 | 0.583 | −0.784 | 0 |
| 3 | 3000 | ACC | 0.692 | −0.662 | 4.895 | −7.980 | 0.946 | 3.233 | −5.118 | 0 |
| 4 | 750 | ACC | 0.057 | −0.062 | 0.796 | −0.798 | 0.107 | 0.612 | −0.583 | 1 |
| 5 | 1500 | ACC | 0.343 | −0.338 | 5.001 | −3.938 | 0.595 | 3.751 | −2.762 | 1 |
| 6 | 3000 | ACC | 1.774 | −1.771 | 13.772 | −12.55 | 2.928 | 10.446 | −9.008 | 1 |
| 7 | 750 | AE | 0.027 | −0.023 | 0.184 | −0.254 | 0.031 | 0.124 | −0.254 | 0 |
| 8 | 1500 | AE | 0.134 | −0.163 | 0.973 | −1.246 | 0.201 | 0.583 | −0.784 | 0 |
| 9 | 3000 | AE | 0.692 | −0.662 | 4.895 | −7.980 | 0.946 | 3.233 | −5.118 | 0 |
| 10 | 750 | AE | 0.057 | −0.062 | 0.796 | −0.798 | 0.107 | 0.612 | −0.583 | 1 |
| 11 | 1500 | AE | 0.343 | −0.338 | 5.001 | −3.938 | 0.595 | 3.751 | −2.762 | 1 |
| 12 | 3000 | AE | 1.774 | −1.771 | 13.772 | −12.55 | 2.928 | 10.446 | −9.008 | 1 |
| 13 | 750 | PIN | 0.001 | −0.001 | 0.005 | −0.005 | 0.001 | 0.003 | −0.003 | 0 |
| 14 | 1500 | PIN | 0.005 | −0.005 | 0.042 | −0.039 | 0.007 | 0.026 | −0.026 | 0 |
| 15 | 3000 | PIN | 0.017 | −0.017 | 0.141 | −0.140 | 0.023 | 0.097 | −0.092 | 0 |
| 16 | 750 | PIN | 0.002 | −0.003 | 0.053 | −0.052 | 0.007 | 0.037 | −0.041 | 1 |
| 17 | 1500 | PIN | 0.010 | −0.008 | 0.180 | −0.124 | 0.021 | 0.148 | −0.087 | 1 |
| 18 | 3000 | PIN | 0.025 | −0.022 | 0.416 | −0.291 | 0.051 | 0.331 | −0.214 | 1 |
Table 4
Range of features in the training dataset

| Feature | Actual range, minimum | Actual range, maximum | Simulated range, minimum | Simulated range, maximum |
|---|---|---|---|---|
| Max/Avg+ | 5.000 | 26.500 | 0 | 30 |
| Avg > Max/2 | 0.003 | 10.446 | 0 | 15 |
neural network model converges to the regression model, these two models are referred to as traditional models in the rest of the analysis. Under the class identification scheme (using the threshold), both the traditional and the AMFM models achieve 100% accuracy. Fig. 7 depicts the response surfaces obtained with the classification scheme explained above. Both surfaces divide the entire domain into two regions; however, they differ significantly in their classification boundaries. These discontinuous surfaces create a boundary such that the operating region of a normal (fault-free) spindle bearing is enclosed within a pocket, and the domain outside this pocket corresponds to the operating region of a faulty bearing. Note that these pockets are situated in the region with low values of both features, which coincides with the a priori domain knowledge that high feature values are associated with faulty bearing performance. The traditional models create a triangular pocket and the AMFM model creates a rectangular pocket. Although it is not possible to determine the quality of a model from the shape of this pocket, it is possible to assess its generalization from the position of the decision boundary in relation to the actual observations within the domain. Fig. 8 gives the decision boundaries of the traditional and AMFM models. The AMFM decision boundary is identified by simulating the model on a fine grid of 0.05 over the entire domain. Table 6 gives the actual (perpendicular) distances of the patterns from the decision boundaries. In line with the theory of support vector machines, it is interesting to study the separability achieved by the decision boundaries of the two models. As can be seen from Table 6, the average distance of the observations from the AMFM decision boundary is approximately nine times their average distance from the decision boundary of the traditional models. Fig. 9 gives the box plot of these distances. Even though there appear to be three outliers in the case of the AMFM model, the median of its distances is considerably higher. This can be attributed to better generalization and better quality of
Fig. 6. Response surfaces: (a) AMFM, (b) regression, and (c) neural network.
Table 5
Range of response of various models

| Range of response | AMFM | NN | Regression |
|---|---|---|---|
| Min | −0.0105 | −2.1176 | −0.5588 |
| Max | 1.0114 | 5.071 | 3.1221 |
model according to support vector machine theory. It has to be noted, however, that the advantage of the AMFM model lies in its ability to generate a non-linear decision boundary from the simple rules provided by the domain expert. This application showed that the AMFM model results in a better response surface owing to the non-linear nature of its decision boundary. Consequently, as was seen, it yields the widest separation between the observations and the decision boundary among the regression, neural network, and AMFM models. Its primary advantage lies in its ability to generate good models based on very high-level heuristics that can either be extracted from the data or provided by the domain expert. The result is not only a precise model but also a legible and transparent one that can be easily understood and further maintained or tuned.
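The class assignment rule and the grid-based search for a decision boundary used in this section can be illustrated with a short sketch. The code below is only a sketch under stated assumptions: `toy_model` is a hypothetical linear stand-in for a trained model (it is not the AMFM, regression, or neural network model of this paper), and the function names and the coarse grid step are chosen here for illustration. It thresholds the continuous output at 0.5 and scans the simulated ranges of Table 4 for grid points whose neighbor receives a different class.

```python
def classify(f, x1, x2, threshold=0.5):
    """Return class 0 (normal) or 1 (faulty) from the continuous output of model f."""
    return 1 if f(x1, x2) > threshold else 0


def boundary_cells(f, x1_max=30.0, x2_max=15.0, step=0.5):
    """Scan a regular grid over [0, x1_max] x [0, x2_max] and return the grid
    points whose right or upper neighbor is assigned a different class."""
    n1 = int(round(x1_max / step))
    n2 = int(round(x2_max / step))
    labels = [[classify(f, i * step, j * step) for j in range(n2 + 1)]
              for i in range(n1 + 1)]
    cells = []
    for i in range(n1):
        for j in range(n2):
            if labels[i][j] != labels[i + 1][j] or labels[i][j] != labels[i][j + 1]:
                cells.append((i * step, j * step))
    return cells


def toy_model(x1, x2):
    """Hypothetical stand-in model (an assumption for illustration only).
    Its 0.5 level set is the straight line x1 + 2 * x2 = 14."""
    return 0.05 * x1 + 0.1 * x2 - 0.2


cells = boundary_cells(toy_model)
print(classify(toy_model, 0.0, 0.0))    # low feature values fall in the normal pocket
print(classify(toy_model, 30.0, 15.0))  # high feature values are classed as faulty
print(len(cells), "grid points lie next to the decision boundary")
```

With a finer step (for example 0.05, the resolution reported for the AMFM boundary) the same scan gives a closer approximation of the boundary at the cost of more model evaluations.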
Fig. 7. Response surfaces using classification scheme: (a) traditional model and (b) AMFM model.
Fig. 8. Decision boundaries: (a) traditional models and (b) AMFM model.

Table 6
Pattern distances from each model

| Feature1 | Feature2 | Class | Distance, traditional model | Distance, AMFM model |
|---|---|---|---|---|
| 6.9245 | 0.124 | 0 | 4.2415 | 24.075 |
| 7.2405 | 0.583 | 0 | 3.6896 | 22.343 |
| 7.0729 | 3.233 | 0 | 1.8492 | 23.211 |
| 6.9245 | 0.124 | 0 | 4.2415 | 24.075 |
| 7.2405 | 0.583 | 0 | 3.6896 | 22.343 |
| 7.0729 | 3.233 | 0 | 1.8492 | 23.211 |
| 5.8367 | 0.003 | 0 | 5.0657 | 32.36 |
| 8.2546 | 0.026 | 0 | 3.4151 | 20.245 |
| 8.4439 | 0.097 | 0 | 3.2349 | 20.534 |
| 13.914 | 0.612 | 1 | 0.84073 | 33.807 |
| 14.567 | 3.751 | 1 | 3.596 | 29.654 |
| 7.7653 | 10.446 | 1 | 3.9362 | 20.515 |
| 13.914 | 0.612 | 1 | 0.84073 | 33.807 |
| 14.567 | 3.751 | 1 | 3.596 | 29.654 |
| 7.7653 | 10.446 | 1 | 3.9362 | 20.515 |
| 25.683 | 0.037 | 1 | 8.3683 | 58.292 |
| 18.821 | 0.148 | 1 | 3.8141 | 57.082 |
| 16.452 | 0.331 | 1 | 2.348 | 55.106 |
| Average distance | | | 3.47514222 | 30.60161 |
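The summary statistics discussed in the text can be recomputed directly from the distance columns of Table 6. The short Python check below is an illustrative sketch, with variable names chosen here; it recomputes the two averages, their ratio, and the medians that underlie the box plot.

```python
from statistics import mean, median

# Perpendicular distances of the 18 patterns from each decision boundary (Table 6).
traditional = [4.2415, 3.6896, 1.8492, 4.2415, 3.6896, 1.8492,
               5.0657, 3.4151, 3.2349, 0.84073, 3.596, 3.9362,
               0.84073, 3.596, 3.9362, 8.3683, 3.8141, 2.348]
amfm = [24.075, 22.343, 23.211, 24.075, 22.343, 23.211,
        32.36, 20.245, 20.534, 33.807, 29.654, 20.515,
        33.807, 29.654, 20.515, 58.292, 57.082, 55.106]

mean_trad = mean(traditional)
mean_amfm = mean(amfm)
ratio = mean_amfm / mean_trad

print(f"average distance, traditional models: {mean_trad:.4f}")
print(f"average distance, AMFM model:         {mean_amfm:.4f}")
print(f"ratio of averages:                    {ratio:.2f}")
print(f"median distance, traditional models:  {median(traditional):.4f}")
print(f"median distance, AMFM model:          {median(amfm):.4f}")
```

The ratio of the two averages comes out at about 8.8, consistent with the factor of approximately nine stated in the text.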
Fig. 9. Box plot of pattern distances from linear and AMFM models.
5.2. Aircraft engine condition monitoring

The AMFM modeling technique is further applied to the identification of aircraft engine failures based on the engine's response parameters. Engine performance data, including inter-turbine temperature (dimension 1), fuel flow (dimension 2), shaft speed (dimension 3), and vibration, are used to determine the status of the engine. The operating state of the engine can be normal or faulty due to turbine deterioration or compressor bleed leak. The modeling was done in three phases. In the first phase, relevant features were extracted from the data. In the second phase, the data were transformed into a probabilistic domain so that they are conducive to modeling. The final phase is the actual model construction using AMFM. These three phases are elucidated in the following.

Feature extraction: The following features were extracted from the specified dimensions based on inputs from the domain expert:

• Spike 1 (sudden jump in dimension 1).
• Spike 2 (sudden jump in dimension 2).
• Spike 3 (sudden jump in dimension 3).
• Trend 1 (trend component in dimension 1).
• Trend 2 (trend component in dimension 2).
• Trend 3 (trend component in dimension 3).
A spike in the data represents a sudden jump in the value of that dimension. The following procedure was used in the computation of a spike. Let $y_1, y_2, \ldots, y_N$ be the time series at times $t_1, t_2, \ldots, t_N$.

Step 1: A moving average of window 11 is used initially to smooth the data; let the smoothed series be $s_1, s_2, \ldots, s_N$.

Step 2: Find the maximum deviation of the actual time series value from the smoothed value:

$$m_1 = \max_i \,(y_i - s_i), \quad i = 1, 2, \ldots, N.$$

Step 3: Define a neighborhood window around “m1”, where the current spike is situated, and compute the maximum deviation from the smoothed value excluding the window. For the current model a window of “10” is used:

$$m_2 = \max_i \,(y_i - s_i), \quad i = 1, 2, \ldots, m_1 - 10,\; m_1 + 10, \ldots, N,$$

where, in the index range, $m_1$ denotes the position in the series at which the maximum deviation occurs.
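Steps 1 to 3, together with the ratio m1/m2 that defines the spike value, can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the paper does not say how the moving average treats the ends of the series, so a centered window truncated at the edges is assumed, and the function names `moving_average` and `spike_value` as well as the synthetic series are hypothetical.

```python
def moving_average(y, window=11):
    """Step 1: centered moving average. The window is truncated at the ends of
    the series (an assumption; the edge handling is not specified in the text)."""
    half = window // 2
    smoothed = []
    for i in range(len(y)):
        lo = max(0, i - half)
        hi = min(len(y), i + half + 1)
        smoothed.append(sum(y[lo:hi]) / (hi - lo))
    return smoothed


def spike_value(y, window=11, neighborhood=10):
    """Return m1 / m2 for the series y."""
    s = moving_average(y, window)
    dev = [yi - si for yi, si in zip(y, s)]
    # Step 2: largest deviation m1 and the position p at which it occurs.
    p = max(range(len(dev)), key=dev.__getitem__)
    m1 = dev[p]
    # Step 3: largest deviation outside the neighborhood of the spike,
    # keeping indices up to p - neighborhood and from p + neighborhood onward.
    rest = [d for i, d in enumerate(dev)
            if i <= p - neighborhood or i >= p + neighborhood]
    if not rest:
        raise ValueError("series too short for the chosen neighborhood")
    m2 = max(rest)
    if m2 <= 0:
        raise ValueError("no positive deviation outside the spike neighborhood")
    return m1 / m2


# Synthetic example (hypothetical data): a small periodic ripple with one jump.
pattern = [0.0, 0.1, 0.0, -0.1]
series = [pattern[i % 4] for i in range(60)]
series[30] += 5.0
value = spike_value(series)
print(f"spike value: {value:.2f}")
```

A series without a pronounced jump gives a value close to 1, because the largest deviation inside and outside the neighborhood are then of similar size; a single large jump, as in the synthetic series, gives a value well above 1.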
Table 7
Results from the enhanced model

| Engine | kurtosis1 | kurtosis2 | spike1 | spike2 | trend1 | trend2 | trend3 | Actual failure | Possibility of failure X | Possibility of failure Y | Possibility of failure Z |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 50.12 | 38.22 | 9.68 | 4.24 | 0.05 | 0.04 | 0.07 | X | 0.97 | 0.08 | 0.44 |
| B | 1.93 | 7.95 | 1.58 | 1.80 | 0.03 | 0.16 | −0.10 | Z | −0.01 | 0.19 | 0.43 |
| C | 2.24 | 200.12 | 1.22 | 2.27 | 0.07 | 0.03 | −0.04 | Y | 0.00 | 1.05 | 0.48 |
| D | 2.50 | 4.35 | 2.96 | 1.77 | 0.71 | 0.07 | −0.07 | Y | −0.17 | 0.91 | 0.00 |
| E | 2.37 | 6.05 | 1.39 | 1.18 | 0.31 | 0.18 | −0.15 | Y | −0.26 | 0.88 | 0.07 |
| F | 2.55 | 3.77 | 1.22 | 1.51 | 0.45 | 0.28 | −0.39 | Y | −0.15 | 0.93 | 0.02 |
| G | 2.55 | 3.54 | 1.19 | 1.47 | 0.44 | 0.11 | −0.09 | Y | −0.16 | 0.81 | −0.01 |
| H | 3.04 | 7.79 | 1.18 | 1.57 | 0.06 | −0.05 | −0.02 | X | 0.24 | 0.17 | 0.43 |
| I | 1.90 | 1.51 | 1.15 | 1.33 | 0.36 | 0.64 | −0.16 | Y | −0.28 | 0.82 | 0.04 |
| J | 2.09 | 2.82 | 1.57 | 2.04 | 0.85 | 0.25 | −0.19 | Y | −0.23 | 0.25 | 0.01 |
| K | 2.33 | 2.56 | 1.13 | 1.28 | 0.67 | 0.56 | −0.07 | Y | −0.22 | 0.67 | 0.01 |
| L | 2.82 | 4.43 | 1.21 | 1.14 | 0.05 | 0.01 | 0.01 | Z | 0.03 | 0.25 | 0.43 |
| M | 62.13 | 55.68 | 20.85 | 7.80 | 0.20 | 0.08 | 0.01 | X | 0.81 | 0.13 | 0.44 |
| A1 | 3.29 | 7.66 | 1.03 | 1.34 | 0.05 | 0.03 | −0.08 | Y | −0.04 | 0.43 | 0.33 |
| B1 | 1.86 | 4.91 | 1.15 | 1.18 | 0.27 | 0.07 | 0.02 | Y | 0.00 | 0.85 | 0.12 |
| C1 | 93.91 | 124.98 | 21.24 | 18.85 | 0.11 | −0.04 | 0.14 | Y | 0.90 | 0.32 | 0.44 |
| D1 | 2.31 | 3.34 | 1.24 | 1.38 | 0.26 | 0.07 | 0.00 | Y | −0.01 | 0.85 | 0.14 |
| E1 | 64.76 | 39.90 | 4.08 | 2.19 | −0.01 | 0.00 | 0.01 | X | 0.87 | 0.00 | 0.41 |
| F1 | 6.73 | 7.36 | 1.27 | 1.16 | −0.01 | −0.05 | −0.02 | Z | 0.22 | 0.02 | 0.44 |
| G1 | 2.49 | 6.98 | 1.16 | 1.10 | 0.09 | 0.07 | −0.02 | Y | −0.08 | 0.44 | 0.42 |
| H1 | 2.32 | 5.03 | 1.01 | 1.12 | −0.01 | 0.04 | 0.01 | Z | 0.03 | 0.13 | 0.44 |
| I1 | 3.11 | 3.41 | 1.02 | 1.33 | −0.11 | 0.09 | 0.01 | Z | 0.14 | −0.04 | 0.44 |
| J1 | 2.92 | 4.06 | 1.09 | 1.01 | 0.12 | 0.07 | −0.03 | Y | −0.03 | 0.55 | 0.40 |
| K1 | 1.54 | 3.00 | 2.35 | 11.81 | 1.37 | 0.60 | −0.23 | Y | −0.37 | 0.51 | 0.05 |
| L1 | 1.80 | 1.73 | 1.34 | 1.12 | 1.45 | 0.76 | −0.50 | Y | −0.10 | 0.88 | 0.27 |
| M1 | 4.68 | 4.61 | 1.26 | 1.02 | −0.09 | −0.04 | −0.03 | X | 0.26 | −0.09 | 0.44 |
| N1 | 4.68 | 6.81 | 1.22 | 1.02 | −0.14 | −0.14 | −0.08 | Z | 0.33 | −0.37 | 0.44 |
| O1 | 76.39 | 57.47 | 8.79 | 6.46 | 0.02 | 0.00 | 0.00 | X | 0.92 | 0.02 | 0.44 |
| P1 | 2.30 | 4.68 | 1.48 | 1.47 | 0.09 | 0.01 | 0.00 | Y | 0.04 | 0.37 | 0.42 |
| Q1 | 94.67 | 48.42 | 8.49 | 3.23 | 0.12 | 0.13 | 0.08 | Z | 0.93 | 0.06 | 0.44 |
| R1 | 3.22 | 9.36 | 1.30 | 1.25 | 0.03 | −0.06 | 0.01 | Z | 0.26 | 0.07 | 0.43 |
| S1 | 240.04 | 114.85 | 18.78 | 5.39 | 0.00 | −0.02 | 0.02 | X | 0.96 | 0.25 | 0.44 |
| T1 | 4.53 | 5.68 | 1.12 | 1.75 | 0.01 | 0.00 | 0.01 | Z | 0.08 | 0.12 | 0.44 |
| U1 | 4.62 | 5.05 | 1.33 | 1.01 | 0.04 | 0.00 | 0.00 | Z | 0.02 | 0.23 | 0.43 |
| V1 | 8.49 | 11.18 | 1.15 | 1.04 | −0.01 | −0.01 | 0.00 | Z | 0.08 | 0.08 | 0.44 |
Step 4: The value of the spike is defined as $m_1/m_2$.

A moving average with term 11 was used to smooth the time series, and a linear least-squares fit on the smoothed curve was used to represent the trend. Let $s_1, s_2, \ldots, s_N$ be the values of the smoothed curve at instants $t_1, t_2, \ldots, t_N$. The trend is computed as follows:

$$\text{trend} = \frac{\sum_{i=1}^{N} (s_i - s_a)(t_i - t_a)}{\sum_{i=1}^{N} (t_i - t_a)^2},$$

where $s_a$ is the average of $s$ and $t_a$ is the average of $t$.

Data assimilation: The next challenge is to identify the structure of the outputs. As is typical when modeling nominal outputs, there are myriad ways in which the outputs can be coded to represent the failure modes. In this case study, the output set was designed to represent the possibility of the various failure modes. Since the model’s response is guided by well-defined rules, it is possible to gauge the progression toward a particular failure mode by observing the nature of these possibilities. The methodology to convert the output data into the possibility domain is given below.

Step 1: The training data are divided into clusters by using the subtractive clustering algorithm [8].
Step 2: Let C be the number of identified clusters and J be the number of operating states of the engine. Each cluster is associated with some probability of operating in each of these states. If $P_{j,c}$ is the probability of the engine being in state j in cluster c, then it can be computed as follows:

$$P_{j,c} = \frac{n_{j,c}}{\sum_{j=1}^{J} n_{j,c}},$$

where $n_{j,c}$ is the number of data points in cluster c that belong to operating mode j.

Model development: The AMFM model takes the extracted features as inputs and outputs the possibility of the three operating modes: normal (status Z), turbine deterioration (status Y), and compressor bleed leak (status X). The model was created in 500 epochs and is 89.5% accurate with four rules. After observing the results, we found that the model was unable to diagnose compressor bleed leak in four instances, because the spike values in these instances were too low compared with those of other compressor bleed leak patterns (spike is the primary indicator of compressor bleed leak). A close observation of monitored parameters 1 and 2 led to the conclusion that there are two dominant mechanisms by which compressor bleed leak manifests: one is a singular spike and the other is a characteristically spiky behavior. Since the spike feature as previously computed is not sufficient to capture a characteristically spiky behavior, two other features, namely the kurtosis values of input parameters 1 and 2, were used to capture this effect. Further analysis also indicated that the spike value from input dimension 3 does not add any information beyond the spike values in dimensions 1 and 2, and hence it was dropped. AMFM was used to create a new model based on the new set of features: kurtosis1, kurtosis2, spike1, spike2, trend1, trend2, and trend3. The new model improved the accuracy to 94.5%. The data set and the responses from the model are given in Table 7.

6. Discussion

Fuzzy and neuro-fuzzy systems have been used for various CBM applications, including:

• hydraulic pump and motor diagnosis [2];
• control flow valve fault diagnosis [4];
• high-pressure boiler feed pump vibration prediction [6];
• high voltage transformer relative failure rate determination [7];
• motor fault diagnosis [9];
• bearing fault diagnosis [16];
• robot fault detection [22];
• power transformer incipient fault recognition [24].
Fuzzy systems can effectively deal with both quantitative and qualitative information. They are conducive to CBM applications because qualitative expert heuristic knowledge can be incorporated in the model building process, and the model can then be fine-tuned using quantitative historical/experimental data. Intuitively, this would result in a more robust and accurate model, which is also easily interpretable. Model interpretability is very important in CBM applications. CBM systems are designed to provide advance notices and alarms, or to shut down a process. Human operators are required to evaluate the situation and make appropriate decisions. In other words, CBM systems and human operators need to collaborate to solve problems. Human factors engineering research has shown that under such circumstances, human operators need to understand the system’s reasoning process in order to effectively solve complex problems [15].

As previously mentioned, there are two prevalent types of fuzzy systems, the Sugeno type and the Mamdani type. They differ in the format of rule consequents. The rule consequent of a Sugeno fuzzy system is in the form of a function, whereas that of a Mamdani fuzzy system is in the form of a linguistic term. Mamdani fuzzy systems are more compatible with the reasoning process of human operators. Therefore, we advocate the use of such systems for CBM applications.

There are two common approaches to making Mamdani fuzzy systems adaptive (i.e., able to adjust their membership functions based on available data). One is the neuro-fuzzy approach: implement the fuzzy system using a neural network structure and use error backpropagation and gradient descent search to adjust the membership function parameters. The other is to use genetic algorithms (GA) for parameter adjustment. The GA approach, being stochastic, often results in linguistic terms that are counter-intuitive. For example, after GA optimization the membership function of a linguistic term “small” may shift to the right of the membership function
of a linguistic term “large.” This has no influence on the model accuracy but degrades model interpretability (unless the two linguistic terms are relabeled). This situation rarely happens when using the neuro-fuzzy approach, which is the approach we adopted.

In neuro-fuzzy modeling, there is a tendency to sacrifice model interpretability in pursuit of model accuracy. It is not uncommon to see researchers use seven or more linguistic terms to describe a parameter. They seem to overlook that a major reason for choosing a fuzzy system is its interpretability. This situation is exacerbated when rules are extracted automatically from data using popular methods such as subtractive clustering, in which case the number of linguistic terms per parameter equals the number of rules/clusters. For example, if a problem involves four parameters and 15 clusters are formed, the final fuzzy model would include 4 × 15 = 60 membership functions. This may be acceptable in applications where model interpretability is not critical. However, it is undesirable in CBM applications because human operators would most likely get confused when trying to understand why the model produces an alarm. Our main contribution is to overcome this problem by using KL mean information to evaluate the similarity among fuzzy membership functions. Similar membership functions are then combined to minimize the number of linguistic terms. Our results show that it is possible to keep the fuzzy model concise and interpretable while maintaining a high level of model accuracy.

7. Conclusion

System maintenance is an important activity for both manufacturing and service companies. Although time-based maintenance is simple to implement, CBM is gaining popularity because of its proactive nature. Traditional models for CBM applications are mainly based on first principles. This paper presents a neuro-fuzzy modeling approach utilizing IF-THEN rules and demonstrates its usefulness in CBM applications.
The benefits that can be realized using this non-traditional modeling approach are as follows:

• Rule-based knowledge representation, coupled with rule extraction, provides a means to integrate data-driven modeling with physics-based modeling.
• A rule-based model is compatible with human heuristic reasoning, thus allowing domain experts to contribute directly to model building.
• A rule-based model is transparent to the user. How a decision is made can be clearly explained, so the system can quickly gain user trust. This is especially important in safety-critical applications where human lives are at stake.
• Approximate reasoning is a parametric computational approach and thus opens up the possibility of automatic parameter tuning so that the system can adapt to a changing environment.

The neuro-fuzzy modeling approach for CBM should be viewed as complementary to, rather than competing with, the first-principle-based modeling approach. We believe these two types of models should be integrated in CBM applications as follows. A first-principle-based model is first developed to describe the normal behavior of the system of interest. The actual system behavior is then monitored and compared to what is expected (predicted by the first-principles model). The difference is then analyzed using a neuro-fuzzy model to identify abnormal behavior caused by system faults and degradation. The neuro-fuzzy model, utilizing easy-to-understand IF-THEN rules, will be a more useful tool than a first-principles model (built with complex equations) for assisting domain experts in decision-making.

Acknowledgements

The research is partially supported by the National Science Foundation under Grant no. 0555962. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

[1] G. Abdulnour, R.A. Dudek, M.L. Smith, Effect of maintenance policies on the just-in-time production system, Internat. J. Production Res. 33 (2) (1995) 565–583.
[2] S. Amin, C. Byington, M. Watson, Fuzzy inference and fusion for health state diagnosis of hydraulic pumps and motors, in: Annual Meeting of the North American Fuzzy Information Processing Society, 2005, pp. 13–18.
[3] M. Ben-Daya, S.O. Duffua, Maintenance and quality: the missing link, J. Quality in Maintenance 1 (1) (1995) 20–26.
[4] C.D. Bocaniala, J. Sa da Costa, V. Palade, A novel fuzzy classification solution for fault diagnosis, J. Intelligent and Fuzzy Systems 15 (3/4) (2004) 195–205.
[5] V. Cherkassky, F. Mulier, Learning From Data: Concepts, Theory and Methods, Wiley, New York, NY, 1998.
[6] F.E. Ciarapica, G. Giacchetta, Managing the condition-based maintenance of a combined-cycle power plant: an approach using soft computing techniques, J. Loss Prevention in the Process Industries 19 (4) (2006) 316–325.
[7] R.J. Cizelj, B. Mavko, I. Kljenak, Component reliability assessment using quantitative and qualitative data, Reliability Engineering and System Safety 71 (1) (2001) 81–95.
[8] S. Chiu, Fuzzy model identification based on cluster estimation, J. Intelligent and Fuzzy Systems 2 (3) (1994) 267–278.
[9] X.Z. Gao, S.J. Ovaska, Soft computing methods in motor fault diagnosis, Appl. Soft Comput. 1 (1) (2001) 73–81.
[10] T.M. Husband, Maintenance Management and Terotechnology, Gower Press, Aldershot, UK, 1978.
[11] J.S.R. Jang, ANFIS: adaptive network based fuzzy inference system, IEEE Trans. System Man Cybernet. 23 (3) (1993) 665–685.
[12] J.S. Kim, N. Kasabov, HyFIS: adaptive neuro-fuzzy systems and their application to non-linear dynamical systems, Neural Networks 12 (9) (1999) 1301–1312.
[13] R. Kothamasu, S.H. Huang, W.H. VerDuin, A comparison of computational intelligence and statistical methods in condition monitoring for hard turning, Internat. J. Production Res. 43 (3) (2005) 597–610.
[14] U. Kumar, S. Granholm, Reliability centered maintenance: a tool for higher profitability, Maintenance 5 (3) (1990) 23–26.
[15] P.E. Lehner, D.A. Zirk, Cognitive factors in user/expert system interaction, Human Factors 29 (1) (1987) 97–109.
[16] C.K. Mechefske, Objective machinery fault diagnosis using fuzzy logic, Mech. Systems and Signal Process. 12 (6) (1998) 855–862.
[17] M.A. Moss, Designing for Minimal Maintenance Expense, Marcel Dekker, New York, NY, 1985.
[18] S. Nakajima, Total Productive Maintenance, Productivity Press, Cambridge, MA, 1988.
[19] B.K.N. Rao, The need for condition monitoring and maintenance management in industries, Handbook of Condition Monitoring, Elsevier Science, Amsterdam, 1996, pp. 1–36.
[20] H. Sandtorv, RCM: closing the loop between design, reliability and operational reliability, Maintenance 6 (1) (1991) 13–21.
[21] H. Saranga, J. Knezevic, Reliability prediction for condition based maintained systems, Reliability Engineering and System Safety 71 (2) (2000) 219–224.
[22] H. Schneider, P.M. Frank, Observer-based supervision and fault detection in robots using non-linear and fuzzy logic residual evaluation, IEEE Trans. Control Systems Technology 4 (3) (1996) 274–282.
[23] D.H. Stamatis, Failure Mode and Effect Analysis: FMEA from Theory to Execution, ASQC Quality Press, Milwaukee, WI, 1995.
[24] H.T. Yang, C.C. Liao, Adaptive fuzzy diagnosis system for dissolved gas analysis of power transformers, IEEE Trans. Power Delivery 14 (4) (1999) 1342–1350.