Engineering Applications of Artificial Intelligence 28 (2014) 22–35
Engineering Applications of Artificial Intelligence journal homepage: www.elsevier.com/locate/engappai
Incipient fault diagnosis using support vector machines based on monitoring continuous decision functions
Mehdi Namdari *, Hooshang Jazayeri-Rad
Department of Instrumentation and Automation, Petroleum University of Technology, Ahwaz, Iran
ARTICLE INFO
Article history: Received 4 April 2013; Received in revised form 13 September 2013; Accepted 21 November 2013; Available online 27 December 2013
ABSTRACT
Support Vector Machine (SVM), an innovative machine learning tool based on statistical learning theory, has recently been used in process fault diagnosis tasks. In the application of SVM to a fault diagnosis problem, a discrete decision function with discrete output values is typically used solely to define the label of the fault. However, for incipient faults, in which the fault progresses steadily over time and there is a gradual changeover from normal operation to faulty operation, a discrete decision function reveals nothing about the progress and depth of the fault. Numerous process faults, such as reactor fouling and catalyst degradation, progress slowly and can be categorized as incipient faults. In this work a continuous decision function is proposed. The decision function values not only define the fault label, but also give qualitative evidence about the depth of the fault. The suggested method is applied to incipient fault diagnosis of a continuous binary mixture distillation column and the results demonstrate the practicability of the proposed approach. In incipient fault diagnosis tasks, the proposed approach outperformed some of the conventional techniques. Moreover, the proposed approach performs better than typical discrete based classification techniques in terms of monitoring indices such as the false alarm rate, detection time and diagnosis time. © 2013 Elsevier Ltd. All rights reserved.
Keywords: Support vector machines; Incipient fault diagnosis; Pattern recognition; Continuous decision function; Binary mixture distillation column
1. Introduction
The chemical process industries have long been concerned with methods and techniques for reducing product variability and guaranteeing safe production, so as to avoid public harm and large economic losses. Towards this objective, early and accurate responses to process abnormalities play an essential role. This task is known as Abnormal Event Management (AEM), which is the main component of supervisory control. AEM is composed of three major steps: first, timely detection of abnormal process behavior (fault detection); second, diagnosis of its causal origins or root causes (fault diagnosis); and finally, taking appropriate decisions to return the process to the normal operating regime (process recovery). Different methods have been established to detect and diagnose faults in complex chemical plants using computer-aided decision-support tools. These techniques can be broadly categorized into model based approaches and process history based approaches. Process model based methods depend
* Corresponding author. Department of Instrumentation and Automation, Petroleum University of Technology, Behbahani Expressway, POB: 6198144471, Ahwaz, Khuzestan, Iran. Tel.: +989391785876; fax: +986115551021. E-mail addresses: [email protected], [email protected] (M. Namdari), [email protected] (H. Jazayeri-Rad).
0952-1976/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.engappai.2013.11.013
on a basic understanding of the process using first-principles knowledge, and include qualitative methods and quantitative methods (Dash and Venkatasubramanian, 2000). In qualitative methods such as fault trees and signed diagrams, the process model states the relationships between the inputs and outputs in terms of qualitative functions (Venkatasubramanian et al., 2003b). In quantitative methods, which are also known as Analytical Redundancy (AR), mathematical expressions are used to describe the process. The crucial task in these methods is generating residual signals, which give the difference between the measured outputs and the predictions of a mathematical model of the process (Isermann, 2005; Venkatasubramanian et al., 2003a). In contrast to the model-based methods, where a priori knowledge about the model of the process is assumed, process history based methods assume only the availability of a large amount of historical process data (Venkatasubramanian et al., 2003c). Meanwhile, the emergence of a range of new sensors and data collection tools has enabled the measurement of hundreds of variables every few seconds during real operation of chemical processes. These instrumentation developments, which provide large volumes of data, have inspired the development of various process history based methods for designing more reliable and cost-effective fault detection and diagnosis systems. These data based techniques are more appropriate for large-scale industrial processes where precise and broad quantitative or qualitative cause-effect models may
M. Namdari, H. Jazayeri-Rad / Engineering Applications of Artificial Intelligence 28 (2014) 22–35
Nomenclature
b      separating hyperplane parameter
BP     bottom product flowrate (kmol min⁻¹)
BW     bound width of classifier normal output
C      regularization parameter
d      dimension of sample vectors
D      decision function
F      feed rate (kmol min⁻¹)
G      training set
k      hyperbolic decision function parameter
L      reflux flow (kmol min⁻¹)
LC     level controller
m      number of classes
M      number of training samples
MBP    reboiler liquid holdup (kmol)
be hard to derive from first principles. Amongst the data based approaches, Multivariate Statistical Process Monitoring (MSPM) approaches are the most prevalent for fault detection, with numerous applications over the past years (Joe Qin, 2003; MacGregor and Kourti, 1995). Here, we can mention Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Variate Analysis (CVA), Fisher Discriminant Analysis (FDA) (Chiang et al., 2000, 2001; Russell et al., 2000), Dynamic Principal Component Analysis (Ku et al., 1995), Multiway Principal Component Analysis (MPCA) (Nomikos and MacGregor, 1994), Independent Component Analysis (ICA) (Lee et al., 2006, 2004), Correspondence Analysis (CA) (Detroja et al., 2007, 2006) and nonlinear kernel-based methods including Kernel Principal Component Analysis (KPCA) (Cho et al., 2005; Choi et al., 2005), Kernel Partial Least Squares (KPLS) (Zhang et al., 2010), Kernel Independent Component Analysis (KICA) (Lee et al., 2007) and Kernel Dissimilarity Analysis (KDA) (Zhao et al., 2009). Data-based fault diagnosis is basically a pattern recognition problem in which a functional mapping from the measurement space to a fault space is computed. Pattern recognition based techniques assume that complete historical process data are available for normal operation and for the different abnormal conditions of the process. In these methods, the various operating situations, normal and abnormal, are treated as patterns. A particular classifier is then employed to analyze the online measurement data and map it to a class label identifying a normal or abnormal condition. When the process model is not known, pattern recognition methods provide a convenient approach to solving the fault diagnosis problem.
Several pattern recognition algorithms have been implemented and applied to various fault diagnosis applications, including the k-Nearest Neighbor (kNN) method (He and Wang, 2007), Fisher Discriminant Analysis (FDA) (Chiang et al., 2004), and Bayesian Networks (BNs) (Verron et al., 2006, 2010). In particular, amongst the pattern recognition techniques, Artificial Neural Networks (ANNs) have received enormous attention in past years because of attractive features such as handling of nonlinearity, noise tolerance and generalization capability (Behbahani et al., 2009; Eslamloueyan, 2011; Namdari et al., 2013; Power and Bahri, 2004; Zhang, 2006; Zhang and Morris, 1994). ANNs have proved to be excellent classifiers, but they need a large number of training samples, which is not always obtainable in practice, and they sometimes exhibit high generalization error because, during training, they only attempt to maximize classification performance on the training data. To resolve this problem, a new machine learning method relying on
MTP    condenser liquid holdup (kmol)
qF     fraction of liquid in feed
T      temperature (°C)
t      simulation time (min)
TP     top product flowrate (kmol min⁻¹)
V      boilup flow (kmol min⁻¹)
W      d-dimensional weight vector
xi     sample vector i
xBP    liquid composition of light component for bottom product
yi     class label i
yTP    vapor composition of light component for top product
Xi     online measurement i
zF     feed composition
ζi     SVM slack variable
statistical learning theory, called SVM, was proposed in the pattern recognition field to achieve higher generalization ability, specifically for problems with few samples and many input features (Vapnik, 1999). It has been shown that SVM is more effective than the pattern classifiers described above, and in recent years it has been found to be highly effective in numerous practical applications (Moguerza and Muñoz, 2006). The use of SVMs in solving process engineering problems is relatively new (Mahadevan and Shah, 2009; Yélamos et al., 2009). In Chiang et al. (2004) the performance of FDA, a recognized classical linear method, is compared with that of SVM for three faulty conditions of the Tennessee Eastman Process (TEP) which generate overlapping data sets. In Kulkarni et al. (2005), an SVM with knowledge incorporation is applied to diagnose the faults in the TEP. In Mao et al. (2007) a Fuzzy Support Vector Machine (FSVM) classifier with parameter tuning, which performs feature selection using recursive feature elimination, is used for fault diagnosis of the TEP. In Yélamos et al. (2007) a multi-label method based on SVM for dealing with simultaneous faults is applied to the TEP. In Monroy et al. (2010) a semi-supervised approach combining Independent Component Analysis (ICA), Gaussian Mixture Models (GMM), Bayesian Information Criterion (BIC) and SVM for fault diagnosis of chemical processes is presented and effectively applied to the entire set of TEP faults. In some applications, hybrid approaches combining statistical process monitoring techniques with an SVM classifier are employed. In these methods the statistical techniques are in charge of fault detection and feature extraction. The extracted features are then used by the SVM to perform fault diagnosis (Guo et al., 2003; Jie and Shouson, 2010; Xu et al., 2013; Zhang, 2008, 2009).
In spite of the acceptance and practicality of pattern recognition based fault diagnosis in many engineering branches, one of its key disadvantages in process industry applications is that it gives a static prediction for the dynamic behavior of chemical processes. Normally, the output of an ordinary pattern recognition technique is an integer value which merely signifies the current process operational state (Normal, Fault 1, Fault 2, …) without offering any dynamic information about the process changing from one state to another. Consequently, the typical application of classification techniques to fault diagnosis problems does not reveal the nature and depth of the faults over the actual operating time. For example, in a two-class SVM, the decision stage generally employs a discrete decision function which takes only the two values +1 or −1 to indicate whether or not an input vector belongs to a specific class. However, for incipient faults in particular, where the fault progressively
Fig. 1. Separating hyperplanes with small and large margins in binary classification.
advances over time and there is a gradual changeover from normal operation to abnormal operation, a discrete decision function does not provide any information about the growth and severity of the fault. Several process faults, such as reactor fouling and catalyst deactivation, develop gradually and can be considered incipient faults. In this work, to deal with this matter, a continuous decision function is employed. The proposed continuous decision function is bounded between −1 and +1. An output value between −1 and +1 can be considered a fuzzy indication of the severity of the corresponding fault. In addition, based on the two common multiclass methods for SVM, i.e., One Versus One (OVO) and One Versus All (OVA), the proposed approach is extended to deal with multiclass classification problems. The proposed method is applied to online incipient fault diagnosis of a continuous binary mixture distillation column. In this work, we compare the performance of the OVO-SVM against the OVA-SVM using both the continuous and the discrete decision based methods. The fault detection and diagnosis performance measures employed include the false alarm rate, detection time and diagnosis time. The remainder of the paper is organized as follows. Section 2 describes the two-class SVM, OVO-SVM and OVA-SVM for the discrete decision function and for the proposed continuous decision function based methods. A description of the case study, i.e., the continuous binary mixture distillation column, and details of the considered fault cases are presented in Section 3. In Section 4 the training issues of the classifiers are discussed. Section 5 presents the graphical results of applying the diagnosis systems to the case study. In Section 6 the diagnosis performances of the applied methods are discussed further. In addition, some comparison results based on the quantitative performance indices are provided.
Finally, conclusions are given in the last section of the paper.
2. SVM
SVM is a fairly new machine learning tool which has been applied effectively in many machine learning tasks such as classification, regression, and outlier detection. SVM was originally designed for pattern classification tasks. Pattern classification techniques assign objects to one of a set of given categories called classes. Classical classifiers such as neural networks attempt to minimize the error on the training data set using the Empirical Risk Minimization (ERM) principle. The SVM, on the other hand, is based on the Structural Risk Minimization (SRM) principle rooted in statistical learning theory. SRM offers better generalization ability by minimizing an upper bound on the generalization error (Abe, 2005).
2.1. Two-class SVM
SVM in its basic form is a binary classifier which learns a linear hyperplane that separates a set of positive examples from a set of negative examples with maximum margin. The margin is defined by the distance from the hyperplane to the nearest of the positive and negative examples, as shown in Fig. 1. This maximum-margin hyperplane is called the optimal separating hyperplane because statistical learning theory shows that maximizing the margin during training results in better generalization ability. The nearest data points define the margin and are known as support vectors. The number of support vectors grows with the complexity of the problem. Suppose there is a known training sample set $G = \{(x_i, y_i),\ i = 1, \ldots, M\}$, where $M$ is the number of samples and each sample $x_i \in \mathbb{R}^d$ is a member of a class defined by $y_i \in \{+1, -1\}$. In the case of linearly separable data, it is possible to define the following hyperplane that separates the given data:
$$W^T x + b = 0 \qquad (1)$$
where $W$ is a $d$-dimensional vector and $b$ is a scalar bias term. The vector $W$ and the scalar $b$ describe the position of the separating hyperplane. It can be shown that maximizing the margin is equivalent to minimizing $\|W\|$. To obtain a hyperplane with a larger margin and better generalization ability, a positive slack variable $\zeta_i$ is defined for each training sample. This permits some of the samples to be misclassified. The optimal hyperplane separating the data can then be determined as the solution to the following constrained quadratic optimization problem:
$$\text{Minimise} \quad \frac{1}{2}\|W\|^2 + C \sum_{i=1}^{M} \zeta_i \qquad (2)$$
$$\text{Subject to} \quad y_i\,[W^T x_i + b] \ge 1 - \zeta_i, \quad i = 1, \ldots, M \qquad (3)$$
where $C$ is the regularization parameter that determines the trade-off between maximizing the margin and minimizing the classification error, and is used to prevent overfitting. The vector $W$ and the scalar $b$ are determined by solving the optimization problem in the training procedure. Then, for a given input $x_i$, the following discrete decision function is used to assign the input to either the positive class or the negative class:
$$D(x_i) = \operatorname{sign}(W^T x_i + b) = \begin{cases} +1 & \text{for } y_i = +1 \\ -1 & \text{for } y_i = -1 \end{cases} \qquad (4)$$
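As a minimal sketch of Eqs. (2)–(4), the following Python fragment evaluates the slack of each sample, the soft-margin objective and the discrete decision for a given hyperplane. It does not solve the optimization problem; the toy samples, the hyperplane `(w, b)` and the value of `C` are hypothetical, and treating a score of exactly zero as the positive class is an assumption.

```python
def dot(w, x):
    """Inner product W^T x for two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def slack(w, b, x, y):
    """Smallest non-negative slack that satisfies constraint (3) for one sample."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


def primal_objective(w, b, samples, labels, C):
    """Objective (2): 0.5 * ||W||^2 + C * sum of slacks."""
    margin_term = 0.5 * dot(w, w)
    slack_term = sum(slack(w, b, x, y) for x, y in zip(samples, labels))
    return margin_term + C * slack_term


def discrete_decision(w, b, x):
    """Discrete decision function of Eq. (4); a zero score is mapped to +1 (assumption)."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy data, not taken from the case study.
samples = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
labels = [1, 1, -1]
w, b = (1.0, 0.0), 0.0

slacks = [slack(w, b, x, y) for x, y in zip(samples, labels)]
objective = primal_objective(w, b, samples, labels, C=10.0)
decisions = [discrete_decision(w, b, x) for x in samples]
```

In this toy case only the second sample lies inside the margin, so it is the only one with a non-zero slack, while all three samples are still classified correctly.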
In spite of the linearity of the basic SVM learning technique, nonlinear classification can be obtained by using the kernel trick, in which kernel functions map the data from the original input space to a high-dimensional feature space. A linear hyperplane is then trained to separate the data in the feature space, which is equivalent to a nonlinear classification in the input space. Polynomial, Radial Basis Function (RBF) and sigmoid kernels are the most popular kernels used for SVM (Sanchez and David, 2003).
2.2. Multiclass SVM
The discussion above deals with binary classification for two classes of data. In real-world problems, however, we often encounter more than two classes; in fault diagnosis, for example, various types of faults may occur. Unfortunately, there is no unique way to use SVM for multiclass problems. Currently, two general approaches for adapting SVM to multiclass problems have been developed. One constructs and combines several binary classifiers, while the other considers all data in a single optimization problem (Hsu and Lin, 2002). The two common methods considered in this paper, OVA and OVO, follow the first approach. OVA constructs m binary classifiers, where m is the number of classes. Each binary classifier separates the training samples of one class from those of all other classes, hence the name one versus all. Thus, the training samples of all classes are used in training each binary SVM. When classifying a new example, each binary classifier produces a prediction and the class with the highest confidence is selected (i.e., the "winner takes all" strategy). In the second strategy, OVO constructs m(m−1)/2 binary classifiers, where each binary classifier separates the training samples of one class from those of another. Therefore, in the training cycle of each binary SVM, only the training samples belonging to two of the classes are used.
The majority voting strategy is then used to classify a new example: each of the m(m−1)/2 binary SVM classifiers votes for a class, and the winning class is the one with the most votes (Kim et al., 2003).
2.3. Classification based on continuous decision functions
Once the hyperplane parameters $W$ and $b$ are determined by solving the optimization problem given by Eqs. (2) and (3), the following hyperbolic decision function is proposed for a two-class classification problem instead of the one presented in Eq. (4):
$$D(x_i) = \tanh\big(k\,(W^T x_i + b)\big) \ \begin{cases} > 0 & \text{for } y_i = +1 \\ < 0 & \text{for } y_i = -1 \end{cases} \qquad (5)$$
where the parameter $k$ can be chosen so that the decision function values for the training samples belonging to class +1 are closer to +1 and those for class −1 are closer to −1. As the value of k grows, the sensitivity of the classifier to the changes in the process variables is decreased. For low values of k, small variations in the faulty variables can be spotted by the classifier; however, the system outputs will become more sensitive to process noise. The value of k should be chosen based on operator experience and the sensitivity required in monitoring the dynamic behavior of the particular fault. The proposed continuous discrimination approach can also be employed in the feature space by using nonlinear kernel functions.
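The difference between Eqs. (4) and (5) can be illustrated with a small Python sketch. The hyperplane `(w, b)` and the sequence of points drifting from the negative side to the positive side are hypothetical values chosen for illustration only; `k = 1.0` is used as the default.

```python
import math


def decision_value(w, b, x):
    """Signed score W^T x + b of a trained linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def discrete_decision(w, b, x):
    """Eq. (4): only the sign of the score is reported."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def continuous_decision(w, b, x, k=1.0):
    """Eq. (5): tanh(k * (W^T x + b)), bounded between -1 and +1."""
    return math.tanh(k * decision_value(w, b, x))


# Hypothetical hyperplane and a path drifting from class -1 towards class +1.
w, b = (1.0, 1.0), 0.0
path = [(-1.0, -1.0), (-0.5, -0.5), (-0.1, -0.1), (0.1, 0.1), (1.0, 1.0)]

discrete = [discrete_decision(w, b, x) for x in path]
continuous = [round(continuous_decision(w, b, x, k=1.0), 3) for x in path]
```

Along this path the discrete output changes only once, when the hyperplane is crossed, whereas the continuous output rises steadily and therefore reflects how far the drift has progressed.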
Fig. 2. Two-class SVM with discrete and continuous decision functions.
The advantages of the continuous decision function over the discrete decision function in online fault diagnosis applications can be explained as follows. Consider the optimal separating hyperplane with the discrete and continuous decision functions depicted in Fig. 2. Suppose that the class labeled −1 contains the normal operation samples and the class labeled +1 contains the faulty operation samples. The increase of the value of the decision function in Eq. (5) from −1 toward +1 can be interpreted as the severity of the process departure from normal operation toward faulty operation. Therefore, by using the continuous decision function, useful information about the development of the fault can be obtained, whereas the discrete decision function does not characterize the gradual growth of the fault. As will be discussed later, we use this information to detect and diagnose incipient faults in their initial stages of manifestation. In discrete based classification, by contrast, a fault cannot be sensed until the process has passed completely from the normal operation state into a faulty state in the classification space. For multi-fault diagnosis, we employ the binary SVM based on the hyperbolic decision function together with the OVO or OVA multiclass algorithms. The schematic diagram of the OVA Support Vector Machine (OVA-SVM) with the proposed continuous decision function is shown in Fig. 3. The continuous decision function, $D_i(x)$, which separates class $i$ with the positive label from all other classes with the negative label, is defined as
$$D_i(x) = \tanh\big(k_i\,(W_i^T x + b_i)\big) \qquad (6)$$
where $W_i$ and $b_i$ (as defined in Eq. (1)) are the parameters of the $i$th separating hyperplane. The parameter $k_i$ plays the same role as the parameter $k$ in Eq. (5). As shown in Fig. 3, for each input vector $x$ we have $m$ continuous decision values corresponding to the $m$ classes of faults. Fig. 4 shows the schematic diagram of the OVO Support Vector Machine (OVO-SVM) with the proposed continuous decision function. For each two-class classifier, $W_{ij}$ and $b_{ij}$ are the parameters of the hyperplane, $F_{ij}(x) = 0$, that separates class $i$ with the positive label from class $j$ with the negative label:
$$F_{ij}(x) = W_{ij}^T x + b_{ij} \qquad (7)$$
In Eq. (7) we have $F_{ij}(x) = -F_{ji}(x)$. The continuous decision function, $D_i(x)$, for class $i$ can be defined by a minimization operation on $F_{ij}$ as
$$D_i(x) = \tanh\Big(k_i \min_{\substack{j \ne i \\ j = 1, \ldots, m}} F_{ij}(x)\Big) \qquad (8)$$
Fig. 3. The schematic diagram of OVA-SVM with continuous decision functions.
Fig. 4. The schematic diagram of OVO-SVM with continuous decision functions.
The ki parameter can be chosen as in the OVA approach. Therefore, similar to the OVA approach, we have m continuous decision functions for the m corresponding classes.
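The OVA and OVO continuous decision functions of Eqs. (6)–(8) can be sketched in Python as follows. This is an illustrative sketch only: the three-class hyperplanes, the test point `x` and the use of one common `k` for all classes are assumptions, and the OVO hyperplanes are stored only for i < j with the remaining ones obtained from the antisymmetry of Eq. (7).

```python
import math


def linear_score(w, b, x):
    """Signed score W^T x + b of one linear hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def ova_continuous(hyperplanes, x, k=1.0):
    """Eq. (6): hyperplanes[i] = (W_i, b_i) separates class i from all others."""
    return [math.tanh(k * linear_score(w, b, x)) for w, b in hyperplanes]


def ovo_continuous(pairwise, m, x, k=1.0):
    """Eqs. (7)-(8): pairwise[(i, j)] = (W_ij, b_ij), stored only for i < j."""
    def f(i, j):
        if i < j:
            w, b = pairwise[(i, j)]
            return linear_score(w, b, x)
        return -f(j, i)  # F_ij(x) = -F_ji(x)

    return [
        math.tanh(k * min(f(i, j) for j in range(m) if j != i))
        for i in range(m)
    ]


# Hypothetical 3-class example in two dimensions (not the column data).
ova_planes = [((-1.0, -1.0), 1.0), ((1.0, 0.0), -1.0), ((0.0, 1.0), -1.0)]
ovo_planes = {
    (0, 1): ((-2.0, 0.0), 2.0),
    (0, 2): ((0.0, -2.0), 2.0),
    (1, 2): ((2.0, -2.0), 0.0),
}
x = (1.8, 0.1)
d_ova = ova_continuous(ova_planes, x)
d_ovo = ovo_continuous(ovo_planes, 3, x)
```

In both schemes the result is a list of m values between −1 and +1, one per class, so the class with the largest value can be read off while the remaining values still indicate how close the sample is to the other classes.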
3. Case study
In this work a continuous binary mixture distillation column has been considered as the case study. The plant is the “column A” studied by Skogestad and Morari (1988). It has 41 stages including reboiler and total condenser, and separates a binary mixture with relative volatility of 1.5 into products of 99% purity. In developing the model the following assumptions were used: binary mixture; constant pressure; constant relative volatility; equilibrium on all stages; total condenser; constant molar flows; no vapor holdup; linearized liquid dynamics; and the effect of vapor flow ("K2" effect) is included. The model consists of four manipulated inputs and four controlled outputs. The model is open-loop unstable. Top and bottom product flows are used to achieve the corresponding level controls. The reflux (L) and boilup (V) flows can be used to obtain composition control. This structure is known as the L–V configuration. The schematic diagram of the column with the L–V configuration at the normal operating condition is shown in Fig. 5. Further details about the model and all the MATLAB simulation files are available over the internet.
3.1. Simulated faults
Five malfunctions caused by changes in the feed rate (F), feed composition (zF) and fraction of liquid in the feed (qF) are considered in this study and are listed in Table 1. These process faults only result in variations in the product quality and do not lead to failures or operational hazards that might bring about equipment damage and/or plant shutdown. In implementing each fault condition listed in Table 1, a step change is introduced in the faulty variable while the other variables are kept constant at their normal values. Forty-three variables are measured from the process. These variables include the top product flowrate (TP), bottom product flowrate (BP), condenser liquid holdup (MTP), reboiler liquid holdup (MBP) and thirty-nine tray temperatures. Gaussian noise was added to all measured variables. An observation sample vector at
phases: X i;p ¼
Fig. 5. The schematic diagram of the column with L–V configuration.
Table 1
Selected malfunctions.
Faults     Description
Fault 1    High feed rate
Fault 2    Low feed rate
Fault 3    High feed composition
Fault 4    Low feed composition
Fault 5    Low liquid rate
a particular time instant is given by:
$$X = \{TP,\ BP,\ M_{TP},\ M_{BP},\ T(1),\ T(2),\ \ldots,\ T(39)\} \qquad (9)$$
Six simulation runs are considered, corresponding to one normal operation and five faulty conditions. Each simulation is executed for 200 min, after which all measured variables have settled to their new steady state values. Fig. 6 shows the distillation column variables: product flowrates (TP, BP), liquid holdups (MBP, MTP), bottom tray temperature (T(1)) and top tray temperature (T(39)) in the normal and the different fault situations. No simultaneous occurrence of multiple faults is assumed in this work, and the fault diagnosis system is trained and validated on single faults.
4. Training SVM models
The Libsvm toolbox developed by Chang and Lin (2011) was used in this work for all the SVM training tasks. Libsvm employs a Sequential Minimal Optimization (SMO)-type decomposition method as the training algorithm for SVM (Platt, 1999). A linear kernel function is used, which means that no mapping into a high-dimensional space is performed. It should be noted that no significant improvement was achieved using nonlinear kernels in our case study. Hence, in all training runs, the regularization parameter C is the only parameter to be properly selected. Since the measured variables have different magnitudes, observation samples are scaled using the following expression before being fed to the classification algorithm in the training or testing phases:
$$X_{i,p} = \frac{X_i - \bar{X}_{i,\mathrm{normal}}}{|\Delta X|_{i,\max}} \qquad (10)$$
where $X_i$ and $X_{i,p}$ are, respectively, the real and pre-processed values of the $i$th online measurement, $\bar{X}_{i,\mathrm{normal}}$ is the mean value of the $i$th measurement under the normal operating condition, and $|\Delta X|_{i,\max}$ is the maximum absolute value of the difference $X_i - \bar{X}_{i,\mathrm{normal}}$ (Zhang, 2006). Data scaling is required to prevent variables in greater numeric ranges from dominating those in smaller numeric ranges during the learning and testing calculations (Hsu et al., 2003). A proper sampling time should be selected for drawing training data from the simulated plant. The sampling time should be chosen according to the kind of chemical process, so as to capture the dynamic sequence of the process variables well and to make it possible to detect and diagnose any malfunction. Since the simulation time is fixed at 200 min for all the column operational experiments, choosing a shorter sampling time results in a larger training set and vice versa. Here, we employed an experimental search to select the optimal training set size. Different sampling times corresponding to different data set sizes are selected and used to check the SVM classification performance. Six experiments corresponding to six different training sets are arranged. The same sampling time is used in gathering all six sets of data. Here, the OVO-SVM algorithm based on majority voting is employed as the classifier and its performance is checked using 5-fold cross-validation on the training data. In v-fold cross-validation, the training set is divided into v subsets of equal size. Sequentially, each subset is tested using the classifier trained on the remaining v−1 subsets. The C parameter is set to 1000 in these experiments. The results of the experiments are depicted in Fig. 7. Note that the performance follows an asymptotic curve as a function of the training set size and, as expected, the accuracy normally grows as the training set size increases.
Larger training sets provide higher performance but require longer training times. This reveals the trade-off involved in choosing the optimal training set size. Finally, a training size of 200 data points per class, collected with a sampling time of 1 min, is selected to balance the computational cost of training against the classification accuracy. After the selection of the training set, additional experiments are carried out in order to tune the training parameter C. An appropriate adjustment of this parameter results in a significant improvement of the diagnosis results. Starting with an initial value of C, the classifiers are trained and tested using 5-fold cross-validation on the training set obtained from the previous experiment. The value of C is then increased exponentially and the procedure is repeated. The value of C which resulted in the best performance (i.e., $C = 10^4$) was then selected. This procedure is performed for both the OVO-SVM and OVA-SVM classifiers and the results are presented in Fig. 8.
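The pre-processing and tuning steps of this section can be outlined in plain Python. The sketch below is not the Libsvm workflow itself: `train_and_score` and `evaluate` are hypothetical callables standing in for training an SVM and returning its accuracy, the interleaved assignment of samples to folds is an assumption, and base 10 is assumed for the exponential grid of C values.

```python
def scale(sample, normal_mean, max_abs_dev):
    """Eq. (10): (X_i - mean of X_i in normal operation) / |dX|_i,max, per variable."""
    return [(x - mu) / d for x, mu, d in zip(sample, normal_mean, max_abs_dev)]


def v_fold_indices(n_samples, v):
    """Split indices 0..n_samples-1 into v folds of (nearly) equal size."""
    return [list(range(f, n_samples, v)) for f in range(v)]


def cross_validate(n_samples, v, train_and_score):
    """Average of train_and_score(train_idx, test_idx) over the v folds."""
    folds = v_fold_indices(n_samples, v)
    scores = []
    for f, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out for testing.
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / v


def sweep_C(exponents, evaluate):
    """Try C = 10**e for each exponent; return (best_C, best_score)."""
    results = [(10.0 ** e, evaluate(10.0 ** e)) for e in exponents]
    return max(results, key=lambda r: r[1])


# Hypothetical numbers for illustration only.
scaled = scale([12.0, 5.0], normal_mean=[10.0, 5.0], max_abs_dev=[4.0, 1.0])
folds = v_fold_indices(10, 5)
```

In practice `evaluate(C)` would wrap a call such as `cross_validate(n, 5, ...)` around the chosen SVM trainer, so that each candidate C is scored by its 5-fold cross-validation accuracy.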
5. Diagnosis results
As mentioned in Section 2, the outputs of the trained binary classifiers in the OVO or OVA scheme can be combined in different ways to perform the prediction task for new input samples. We discussed the classical winner-takes-all and majority voting strategies for OVA-SVM and OVO-SVM, respectively. These techniques are based on discrete decision functions; in this work, however, new continuous decision functions are proposed. In this section, four strategies: OVO based on discrete decision functions (OVOD), OVA based on discrete decision functions (OVAD), OVO
Fig. 6. Column operation in normal and different fault situations.
Fig. 7. SVM optimal training set size selection procedure.
based on continuous decision functions (OVOC) and OVA based on continuous decision functions (OVAC) are investigated in the online incipient fault diagnosis of the distillation column. All these fault diagnosis approaches can be evaluated on abrupt or incipient faults. Abrupt faults are modeled as step functions and incipient faults are modeled using ramp
Fig. 8. SVM tuning of the training parameter procedure.
functions (see Fig. 9). Since abrupt faults present larger disturbances to the process and consequently result in sudden deviations of the process variables from their nominal values, they can easily be detected and diagnosed by both the discrete and the continuous decision function approaches. Therefore, in the
M. Namdari, H. Jazayeri-Rad / Engineering Applications of Artificial Intelligence 28 (2014) 22–35
29
Fig. 9. Abrupt and incipient fault behaviors.
Fig. 10. OVO-SVM outputs based on the discrete decision functions approach.
testing procedure only the incipient form of the five malfunctions listed in Table 1, which represents a more challenging diagnosis task, will be considered.
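For reference, the two classical discrete combination rules named in this section, winner-takes-all for OVA and majority voting for OVO, can be sketched as below. The data layout (dictionaries keyed by label or by label pair) and the tie-breaking rule are assumptions made for the illustration; the continuous decision functions proposed in the paper are defined in its earlier sections and are not reproduced here.

```python
def ova_winner_takes_all(decision_values):
    """decision_values maps each class label to the real-valued output of
    that label's one-vs-all classifier; the largest output wins."""
    return max(decision_values, key=decision_values.get)

def ovo_majority_vote(pairwise_values, labels):
    """pairwise_values[(i, j)] is the output of the classifier trained on
    class i versus class j: a positive value votes for i, otherwise for j."""
    votes = {label: 0 for label in labels}
    for (i, j), value in pairwise_values.items():
        votes[i if value > 0 else j] += 1
    # Ties are broken in favour of the smallest label (an assumption).
    return min(labels, key=lambda label: (-votes[label], label))

# Three-class toy example: 0 = normal, 1 and 2 = fault labels.
ova_label = ova_winner_takes_all({0: -0.2, 1: 0.7, 2: 0.1})
ovo_label = ovo_majority_vote({(0, 1): -1.0, (0, 2): 0.5, (1, 2): 0.3}, [0, 1, 2])
```

Both rules return a single label per sample, which is why the discrete outputs carry no information about how far a fault has progressed.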
The column simulator is run for 100 min while each fault state listed in Table 1 is introduced in an incipient manner. As can be observed in Fig. 9, two parameters have to be defined to simulate the incipient faults: the fault starting time and the fault settling time. The fault starting time is the time at which the faulty behavior starts. The fault settling time is the time required, after the fault is introduced, for the fault to reach its steady value. For our study, the fault starting time is set to zero and the fault settling time is set to 100 min for all fault conditions. Each fault is simulated separately, without combination with other faults (single fault diagnosis). Online measurements are collected with a sampling time of 1 s and are first processed using Eq. (10) before being submitted to each of the diagnosis systems. The parameter k in Eqs. (6) and (8) is set to unity for all fault cases. The outputs of the OVOD and OVAD diagnosis systems for each fault state are shown in Figs. 10 and 11, respectively. A value of zero in these figures indicates the normal state, while other values correspond to the fault numbers listed in Table 1. The outputs of the OVOC and OVAC diagnosis systems for each fault state are shown in Figs. 12 and 13, respectively. For an input observation, the OVAD and OVOD outputs are single discrete values, each indicating an operational state number, while the OVAC and OVOC outputs consist of six continuous values, each of which is associated with an operational state.

Fig. 11. OVA-SVM outputs based on the discrete decision functions approach.
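The difference between the abrupt (step) and incipient (ramp) fault models can be made concrete with a short sketch. The normalized magnitude, the linear ramp shape and the reading of the settling time as a duration measured from the starting time are assumptions for illustration; the function name and parameters are hypothetical.

```python
def fault_magnitude(t_min, start_min=0.0, settling_min=100.0,
                    final_value=1.0, incipient=True):
    """Normalized fault size at time t_min (minutes).

    Abrupt fault: jumps to final_value at start_min (step function).
    Incipient fault: grows linearly from start_min and reaches final_value
    after settling_min minutes (ramp function), then stays there.
    """
    if t_min < start_min:
        return 0.0
    if not incipient:
        return final_value
    fraction = min((t_min - start_min) / settling_min, 1.0)
    return final_value * fraction

# With the settings used in this study (start at 0, settling time 100 min),
# an incipient fault has reached a quarter of its final size after 25 min,
# whereas an abrupt fault is already at full size.
quarter = fault_magnitude(25.0)
step = fault_magnitude(25.0, incipient=False)
```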
Fig. 12. OVO-SVM outputs based on the continuous decision functions approach.

6. Discussion and performance evaluation

The results depicted in Figs. 12 and 13 indicate that the OVOC and OVAC classification methods successfully detect and diagnose all the considered fault categories. Because the fault has low severity in its early development stage, it has no noticeable impact on the process. Hence, as seen in Figs. 12 and 13, in each case the decision value for the normal output is initially positive and high, while all other outputs are negative. As the fault grows with time, the value of the normal output falls and the output corresponding to the fault rises, giving a fuzzy indication of the severity of the fault. The only exception in diagnosis performance is fault 5 in OVO-SVM where, as seen in Fig. 12, the fault is initially diagnosed as fault 2. However, as time passes, the fault 5 output rises while the fault 2 output drops, and the correct diagnosis is finally obtained. For the discrete decision based approaches, as seen in Figs. 10 and 11, the observations are classified as the normal state (numbered 0) in the initial period after fault occurrence; after further propagation of the fault into the process, the correct fault label is eventually indicated by the systems in all fault cases. Despite this final correct diagnosis, Figs. 10 and 11 show that the discrete decision based approaches cannot represent how the fault grows. The results presented in this section clearly demonstrate the superiority of the continuous decision function classification over the discrete decision function classification for the
online incipient fault diagnosis tasks when a qualitative assessment of the fault trajectory in the process is required. However, in order to evaluate the employed diagnosis methodologies quantitatively, detection and diagnosis time indices are considered. For OVOD and OVAD, the detection and diagnosis times for a specific fault are essentially the same quantity, defined as the first time at which the system output deviates from zero. The diagnosis decision is correct if the decision value equals the label of the occurring fault and incorrect otherwise. For OVAC and OVOC, however, where there are six outputs at each time instant, further analysis is required to decide upon the detection and diagnosis steps. The fault detection alarm is based on deviations of the normal output since, as seen in Figs. 12 and 13, the normal outputs of both the OVA and OVO methods respond promptly to the growth of all kinds of faults in the process. A boundary condition can be developed for the normal output and then used to define the fault and no-fault regions. Consequently, a fault is detected by the system at a particular time if the normal output violates the boundary at that time. We obtained the boundary conditions experimentally by examining the system responses under normal and different faulty operations. For this purpose, a large amount of normal operating data is first generated using the column simulator and divided into training and test data. The normal training data are fed to the OVOC and OVAC algorithms and their normal outputs are shown in Fig. 14. As seen in this figure, the normal outputs of both OVOC and OVAC are close to 1, which indicates that the process data belong to the normal operating condition. The fault detection thresholds are represented by the horizontal lines in Fig. 14, which separate the normal space from the faulty space, so that a sample which does not lie within the boundary conditions is considered a faulty sample. The Bound Width (BW), which determines the position of the normal thresholds, is specified as a percentage deviation from the average value of the normal outputs for the training data. For example, BW = 0.9% in Fig. 14 for OVO-SVM indicates that the distance between the upper threshold line and the mean line is 0.9% of the average value of the normal outputs for the training data. An identical plot for OVAC with BW = 6% is also presented in Fig. 14.

Fig. 13. OVA-SVM outputs based on the continuous decision functions approach.
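The bound-width rule for the normal output can be sketched as below. The sketch assumes a band that is symmetric about the mean of the normal training outputs (the text describes the upper threshold explicitly and refers to thresholds in the plural); the function names and the toy numbers are hypothetical.

```python
from statistics import mean

def normal_bounds(normal_training_outputs, bw_percent):
    """Lower and upper detection thresholds placed bw_percent of the mean
    normal output below and above that mean (symmetry is an assumption)."""
    centre = mean(normal_training_outputs)
    half_width = abs(centre) * bw_percent / 100.0
    return centre - half_width, centre + half_width

def detection_time(times, normal_outputs, bounds):
    """First time at which the normal output leaves the band, else None."""
    lower, upper = bounds
    for t, y in zip(times, normal_outputs):
        if y < lower or y > upper:
            return t
    return None

# Toy numbers only: normal training outputs centred on 1.0 and a 10% band.
bounds = normal_bounds([0.5, 1.0, 1.5], bw_percent=10.0)
t_detect = detection_time([0, 1, 2, 3], [1.0, 0.95, 0.85, 0.5], bounds)
```

A narrower band flags the fault earlier but also flags more normal samples.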
In determining the BWs, a trade-off arises between the detection latency (the time interval between the instant a fault is initiated and the instant it is detected) and the false alarm rate (the percentage of normal data incorrectly recognized as faulty), since fast fault detection (low detection latency) requires lower BWs, which result in higher false alarm rates. The false alarm rate determines the robustness of a fault detection system, while the detection latency is associated with its sensitivity. In order to select optimal boundary conditions, different values of BW are tried and, for each, the false alarm rate on the training data and the mean detection time over all fault states are determined. The corresponding results are listed in Table 2. Note that since the simulated faults are introduced at time zero, the detection time and the detection latency are equal. Finally, the BW values are set to 1% and 7% for OVOC and OVAC, respectively, which result in zero false alarm rates for both techniques. The fault diagnosis time is taken as the first time at which a faulty output reaches a positive value. This choice is based on the fact that, in all cases and for all operational states, at no time does more than one faulty output have a positive value, as seen in Figs. 12 and 13.
Fig. 14. Defining boundary conditions for normal outputs of OVO-SVM and OVA-SVM.

Table 2 False alarm rate for the training data versus mean detection time for different values of bound widths (BW).

| OVO-SVM: BW (%) | OVO-SVM: False alarm rate (%) | OVO-SVM: Mean detection time (min) | OVA-SVM: BW (%) | OVA-SVM: False alarm rate (%) | OVA-SVM: Mean detection time (min) |
|---|---|---|---|---|---|
| 0.4 | 13.78 | 0.63 | 1 | 58.04 | 0.51 |
| 0.5 | 5.97 | 1.24 | 2 | 25.93 | 0.84 |
| 0.6 | 2.63 | 2.18 | 3 | 9.18 | 1.21 |
| 0.7 | 1.02 | 3.44 | 4 | 2.46 | 2.32 |
| 0.8 | 0.29 | 3.80 | 5 | 0.54 | 4.79 |
| 0.9 | 0.12 | 4.42 | 6 | 0.14 | 10.39 |
| 1 | 0 | 4.60 | 7 | 0 | 12.09 |
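The false alarm side of the trade-off in Table 2 can be sketched as a sweep over candidate bound widths. The selection rule used here (take the smallest BW that gives no false alarms on the normal data) is an assumption that is consistent with the values finally chosen, not a rule stated by the authors; the symmetric band, the function names and the toy data are likewise hypothetical.

```python
from statistics import mean

def false_alarm_rate(normal_outputs, bw_percent):
    """Percentage of normal samples falling outside a band of
    +/- bw_percent of the mean normal output (symmetric band assumed)."""
    centre = mean(normal_outputs)
    half_width = abs(centre) * bw_percent / 100.0
    alarms = sum(1 for y in normal_outputs
                 if y < centre - half_width or y > centre + half_width)
    return 100.0 * alarms / len(normal_outputs)

def smallest_bw_without_false_alarms(normal_outputs, candidate_bws):
    """Smallest candidate BW with a zero false alarm rate, else None."""
    for bw in sorted(candidate_bws):
        if false_alarm_rate(normal_outputs, bw) == 0.0:
            return bw
    return None

# Toy normal outputs (not the paper's data) and three candidate widths.
normal = [0.75, 1.0, 1.25, 1.0]
rates = {bw: false_alarm_rate(normal, bw) for bw in (10, 20, 30)}
chosen_bw = smallest_bw_without_false_alarms(normal, (10, 20, 30))
```

Each candidate BW would also be scored by its mean detection time over the fault runs, which gives the second half of each row of the table.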
Fig. 15. Detection and diagnosis of the fault 3 in different methodologies.
Consequently, there is always a single predicted fault label at each operational time. The diagnosis decision is correct if the positive faulty output indicates the fault that is actually occurring and incorrect otherwise. As an example, the detection and diagnosis steps of fault 1 using all the methodologies are shown in Fig. 15.
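The two diagnosis-time definitions used in this section can be sketched side by side. The data layout (one list of decision values per fault label for the continuous case, one label per sample for the discrete case) is an assumption, and the tie-breaking step is only a guard, since the text reports that no two faulty outputs are positive at the same time.

```python
def continuous_diagnosis(times, fault_outputs):
    """fault_outputs maps each fault label to its decision values over time.
    Returns (time, label) for the first sample with a positive faulty output."""
    for k, t in enumerate(times):
        positive = [label for label, values in fault_outputs.items()
                    if values[k] > 0]
        if positive:
            # Guard for several positive outputs: take the largest (assumption).
            label = max(positive, key=lambda lab: fault_outputs[lab][k])
            return t, label
    return None

def discrete_diagnosis(times, predicted_labels):
    """Returns (time, label) for the first sample whose label is not 0 (normal).
    For the discrete schemes, detection and diagnosis coincide at this time."""
    for t, label in zip(times, predicted_labels):
        if label != 0:
            return t, label
    return None

# Toy traces: fault 1 grows over time, fault 2 stays negative throughout.
times = [0, 1, 2, 3]
cont = continuous_diagnosis(times, {1: [-1.0, -0.5, 0.2, 0.8],
                                    2: [-1.0, -1.0, -1.0, -1.0]})
disc = discrete_diagnosis(times, [0, 0, 3, 3])
```

The returned label is then compared with the fault that was actually introduced to decide whether the diagnosis is correct.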
Table 3 Monitoring results of different methodologies.

| Diagnosis case | Monitoring index | OVOC | OVAC | OVOD | OVAD |
|---|---|---|---|---|---|
| Normal condition | False alarm rate (%) | 0 | 0 | 0 | 0 |
| Fault 1 | Detection time (min) | 4.53 | 4.66 | 7.20 | 16.92 |
| Fault 1 | Diagnosis time (min) | 7.20 | 10.00 | 7.20 | 16.92 |
| Fault 2 | Detection time (min) | 3.47 | 29.87 | 5.91 | 50.14 |
| Fault 2 | Diagnosis time (min) | 5.91 | 29.87 | 5.91 | 50.14 |
| Fault 3 | Detection time (min) | 8.09 | 17.16 | 23.81 | 41.10 |
| Fault 3 | Diagnosis time (min) | 23.81 | 29.38 | 23.81 | 41.10 |
| Fault 4 | Detection time (min) | 5.12 | 6.72 | 22.98 | 47.81 |
| Fault 4 | Diagnosis time (min) | 22.98 | 41.57 | 22.98 | 47.81 |
| Fault 5 | Detection time (min) | 1.79 | 2.07 | 5.51 | 12.68 |
| Fault 5 | Diagnosis time (min) | 5.51 (incorrect), 33.95 (correct) | 31.03 | 5.51 (incorrect), 32.08 (correct) | 12.68 |
| Average | Detection time (min) | 4.60 | 12.09 | 13.22 | 33.73 |
| Average | Diagnosis time (min) | 18.77 | 28.37 | 18.40 | 33.73 |

Table 3 shows the false alarm rates and the detection and diagnosis times for all the classification strategies. These are determined, in all fault cases, using the procedure discussed for each method. As stated before, for the discrete classification methods, i.e., OVAD and OVOD, if the diagnosis decision is correct, the detection and diagnosis times are the same (see Fig. 15). For the continuous approaches, however, the policy employed in the fault monitoring task introduces a latency between the detection and diagnosis alarms in each fault case. Generally, large diagnosis times may lead to negative judgments about the diagnosis performance of the employed methodologies. However, it is worth mentioning that for incipient faults, where the fault evolves slowly over time, the abnormality need not be diagnosed until the process variables deviate significantly from their nominal values and show clear faulty patterns from the classification point of view. To judge a classification system in the diagnosis of incipient faults, the diagnosis time of a fault should be compared with its settling time, at which the fault reaches its steady value (Mendonça et al., 2009). In our study, as stated in Section 5, the settling time of all the simulated faults is 100 min. Hence it can be claimed that the general diagnosis performance of the employed methodologies reported in Table 3 is fairly acceptable. As shown in this table, all faults are detected and correctly diagnosed before their settling time.

The different detection and diagnosis results obtained for different faults originate from the dissimilarity of their effects on the process variables. As a general fact, in classification based fault diagnosis, success in detecting and diagnosing a specific fault is related to its severity and to how distinctive its impact on the observed process variables is. To illustrate this, consider faults 3 and 4 listed in Table 1, which are associated with changes in the feed composition. Engineering knowledge of the column dynamics reveals that, among all the observed variables introduced in relation 9, a variation in feed composition changes the tray temperatures only and does not affect the flowrate and liquid holdup variables. This can also be checked in Fig. 6, where faults 3 and 4 produce sluggish changes in the tray temperatures while the flowrate and liquid holdup variables remain close to their normal values. The column operation under faults 3 and 4 can be compared with that under other faults such as 1 or 2, where sudden and significant variations in the liquid holdup variables occur, as depicted in Fig. 6. It can therefore be concluded that faults 1 and 2 can be detected and diagnosed faster than faults 3 and 4 by any classification system. Indeed, the diagnosis results in Table 3 confirm this point. Moreover, for the OVO method especially, the diagnosis times of the continuous and discrete decision function approaches are comparable in most cases. However, the excellent results obtained by OVOC and OVAC, with short detection times, emphasize the superiority of the continuous over the discrete approaches when the crucial task of detecting incipient faults in their initial stages is required. Also, the OVO based approaches give better results than the OVA based methods in both detection and diagnosis tasks. The advantage of the OVO strategy over the OVA method in terms of classification accuracy has also been reported in other studies (Hsu and Lin, 2002; Weston and Watkins, 1999). However, as discussed in Section 2.2, the ratio between the number of binary classifiers required by the OVO and OVA methods is (m − 1)/2, where m is the number of classes, which is significant when m is large. Hence, in the training phase the computational load of the OVO method may become unaffordable for large values of m. In the testing procedure of classification algorithms based on the OVO method, all the trained binary classifiers must be evaluated and their outputs combined, which may result in sluggish responses of the diagnosis system. Hence, some investigators do not recommend the use of SVM for problems with more than 10 classes (Chiang et al., 2004).
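The classifier-count argument can be checked with a few lines of arithmetic. The sketch uses the standard counts of m binary classifiers for OVA and m(m − 1)/2 for OVO; the function name is hypothetical, and the six-class case corresponds to the normal state plus the five faults of this study.

```python
def binary_classifier_counts(m):
    """Number of binary SVMs needed for an m-class problem."""
    return {"OVA": m, "OVO": m * (m - 1) // 2}

# Six operational states here: OVA needs 6 classifiers and OVO needs 15,
# a ratio of (6 - 1) / 2 = 2.5.
six = binary_classifier_counts(6)
ratio_six = six["OVO"] / six["OVA"]

# The gap widens quickly: with 20 classes OVO needs 190 classifiers.
twenty = binary_classifier_counts(20)
```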
7. Conclusions

The consistent and proper management of incipient faults is a challenging subject in the process industries because of the problems encountered in online monitoring of fault occurrences. These problems originate from the timing behavior of incipient faults, where the depth of the fault grows steadily over time and the fault may exist in the process for a long time without noticeable influence on the process variables. In this work, an SVM classifier based on the continuous decision function technique is proposed, mainly for the detection and diagnosis of incipient faults. In a two-class classification problem, this technique determines both the label and the severity of the fault, in contrast to the traditional discrete decision function approach, which offers no information about the fault progress. For multiclass classification, two methodologies based on the OVO-SVM and OVA-SVM approaches are proposed. These methodologies provide a continuous output for each process operational state at each time instant of the process operation. The techniques are applied to the incipient fault diagnosis of a continuous binary mixture distillation column. As expected, for all fault situations of the case study, the outputs of the proposed classification systems successfully diagnosed the faults and also reported the development of each fault over time. Moreover, further analysis of the classifier outputs provided an opportunity to detect and diagnose faults in their initial stages, which is a critical task in incipient fault management. For this purpose, in the proposed methodologies, fault detection and
diagnosis are accomplished by defining thresholds for the normal and faulty outputs, so that the occurrence and the type of the fault are reported at the moments when these thresholds are violated by the classifier outputs. In this study, a performance comparison is made between four diagnosis schemes, namely OVAC, OVOC, OVAD and OVOD. The monitoring results in terms of detection time, diagnosis time and false alarm rate confirm the advantage of the proposed continuous approaches (OVOC and OVAC) over the traditional discrete methods (OVOD and OVAD). In addition, OVOD gave better results than the OVAD method. However, because of its heavy training calculations and its sluggish behavior in the testing phase, OVOD may not be suitable for large fault diagnosis problems in online applications.

Acknowledgments

The authors are grateful to the anonymous referees for their useful remarks which led to an improved manuscript.

References

Abe, S., 2005. Support Vector Machines for Pattern Classification. Springer, New York.
Behbahani, R.M., Jazayeri-Rad, H., Hajmirzaee, S., 2009. Fault detection and diagnosis in a sour gas absorption column using neural networks. Chem. Eng. Technol. 32 (5), 840–845.
Chang, C.C., Lin, C.J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2 (3), 1–27. (Article 27).
Chiang, L.H., Kotanchek, M.E., Kordon, A.K., 2004. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 28 (8), 1389–1401.
Chiang, L.H., Russell, E.L., Braatz, R.D., 2000. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemom. Intell. Lab. Syst. 50 (2), 243–252.
Chiang, L.H., Braatz, R.D., Russell, E.L., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer.
Cho, J.H., Lee, J.M., Wook Choi, S., Lee, D., Lee, I.B., 2005.
Fault identification for process monitoring using kernel principal component analysis. Chem. Eng. Sci. 60 (1), 279–288.
Choi, S.W., Lee, C., Lee, J.M., Park, J.H., Lee, I.B., 2005. Fault detection and identification of nonlinear processes based on kernel PCA. Chemom. Intell. Lab. Syst. 75 (1), 55–67.
Dash, S., Venkatasubramanian, V., 2000. Challenges in the industrial applications of fault diagnostic systems. Comput. Chem. Eng. 24 (2), 785–791.
Detroja, K.P., Gudi, R.D., Patwardhan, S.C., 2007. Plant-wide detection and diagnosis using correspondence analysis. Control Eng. Pract. 15 (12), 1468–1483.
Detroja, K.P., Gudi, R.D., Patwardhan, S.C., Roy, K., 2006. Fault detection and isolation using correspondence analysis. Ind. Eng. Chem. Res. 45 (1), 223–235.
Eslamloueyan, R., 2011. Designing a hierarchical neural network based on fuzzy clustering for fault diagnosis of the Tennessee Eastman process. Appl. Soft Comput. 11 (1), 1407–1415.
Guo, M., Xie, L., Wang, S.Q., Zhang, J.M., 2003. Research on an integrated ICA-SVM based framework for fault diagnosis. In: IEEE International Conference on Systems, Man and Cybernetics, pp. 2710–2715.
He, Q.P., Wang, J., 2007. Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Trans. Semicond. Manuf. 20 (4), 345–354.
Hsu, C.W., Lin, C.J., 2002. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13 (2), 415–425.
Hsu, C.W., Chang, C.C., Lin, C.J., 2003. A Practical Guide to Support Vector Classification. Technical Report. Department of Computer Science and Information Engineering. University of National Taiwan, Taipei, pp. 1–12.
Isermann, R., 2005. Model-based fault-detection and diagnosis–status and applications. Annu. Rev. Control 29 (1), 71–85.
Jie, X.U., Shouson, H.U., 2010. Nonlinear process monitoring and fault diagnosis based on KPCA and MKL-SVM. Chin. J. Sci. Instrum. 31 (11), 2428–2433.
Joe Qin, S., 2003. Statistical process monitoring: basics and beyond. J. Chemom. 17 (8–9), 480–502.
Kim, H.C., Pang, S., Je, H.M., Kim, D., Yang Bang, S., 2003. Constructing support vector machine ensemble. Pattern Recognit. 36 (12), 2757–2767.
Ku, W., Storer, R.H., Georgakis, C., 1995. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 30 (1), 179–196.
Kulkarni, A., Jayaraman, V.K., Kulkarni, B.D., 2005. Knowledge incorporated support vector machines to detect faults in Tennessee Eastman process. Comput. Chem. Eng. 29 (10), 2128–2133.
Lee, J., Qin, S.J., Lee, I., 2006. Fault detection and diagnosis based on modified independent component analysis. AIChE J. 52 (10), 3501–3514.
Lee, J., Qin, S.J., Lee, I., 2007. Fault detection of non-linear processes using kernel independent component analysis. Can. J. Chem. Eng. 85 (4), 526–536.
Lee, J.M., Yoo, C., Lee, I.B., 2004. Statistical process monitoring with independent component analysis. J. Process Control 14 (5), 467–485.
MacGregor, J.F., Kourti, T., 1995. Statistical process control of multivariate processes. Control Eng. Pract. 3 (3), 403–414.
Mahadevan, S., Shah, S.L., 2009. Fault detection and diagnosis in process data using one-class support vector machines. J. Process Control 19 (10), 1627–1639.
Mao, Y., Xia, Z., Yin, Z., Sun, Y., Wan, Z., 2007. Fault diagnosis based on fuzzy support vector machine with parameter tuning and feature selection. Chin. J. Chem. Eng. 15 (2), 233–239.
Mendonça, L.F., Sousa, J.M.C., Sá da Costa, J.M.G., 2009. An architecture for fault detection and isolation based on fuzzy methods. Expert Syst. Appl. 36 (2), 1092–1104.
Moguerza, J.M., Muñoz, A., 2006. Support vector machines with applications. Stat. Sci. 21 (2), 322–336.
Monroy, I., Benitez, R., Escudero, G., Graells, M., 2010. A semi-supervised approach to fault diagnosis for chemical processes. Comput. Chem. Eng. 34 (5), 631–642.
Namdari, M., Jazayeri-Rad, H., Nabhani, N., 2013. Comparing the performance of two neural network methods in a process fault diagnosis system. J. Basic Appl. Sci. Res. 3 (2), 942–947.
Nomikos, P., MacGregor, J.F., 1994. Monitoring batch processes using multiway principal component analysis. AIChE J. 40 (8), 1361–1375.
Platt, J.C., 1999. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (Eds.), Advances in Kernel Methods - Support Vector Learning. MIT Press, pp. 185–208.
Power, Y., Bahri, P.A., 2004. A two-step supervisory fault diagnosis framework. Comput. Chem. Eng. 28 (11), 2131–2140.
Russell, E.L., Chiang, L.H., Braatz, R.D., 2000. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. 51 (1), 81–93.
Sanchez, A., David, V., 2003. Advanced support vector machines and kernel methods. Neurocomputing 55 (1), 5–20.
Skogestad, S., Morari, M., 1988. Understanding the dynamic behavior of distillation columns. Ind. Eng. Chem. Res. 27 (10), 1848–1862.
Vapnik, V., 1999. The Nature of Statistical Learning Theory. Springer, New York.
Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.N., 2003a. A review of process fault detection and diagnosis: part I: quantitative model-based methods. Comput. Chem. Eng. 27 (3), 293–311.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., 2003b. A review of process fault detection and diagnosis: part II: qualitative models and search strategies. Comput. Chem. Eng. 27 (3), 313–326.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., Yin, K., 2003c. A review of process fault detection and diagnosis: part III: process history based methods. Comput. Chem. Eng. 27 (3), 327–346.
Verron, S., Tiplica, T., Kobi, A., 2006. Bayesian networks and mutual information for fault diagnosis of industrial systems. Workshop on Advanced Control and Diagnosis (ACD'06).
Verron, S., Tiplica, T., Kobi, A., 2010. Fault diagnosis of industrial systems by conditional Gaussian network including a distance rejection criterion. Eng. Appl. Artif. Intell. 23 (7), 1229–1235.
Weston, J., Watkins, C., 1999. Support vector machines for multi-class pattern recognition. In: ESANN, pp. 61–72.
Xu, J., Zhao, J., Ma, B., Hu, S., 2013. Fault diagnosis of complex industrial process using KICA and sparse SVM. Math. Prob. Eng., 1–6.
Yélamos, I., Graells, M., Puigjaner, L., Escudero, G., 2007. Simultaneous fault diagnosis in chemical plants using a multilabel approach. AIChE J. 53 (11), 2871–2884.
Yélamos, I., Escudero, G., Graells, M., Puigjaner, L., 2009. Performance assessment of a novel fault diagnosis system based on support vector machines. Comput. Chem. Eng. 33 (1), 244–255.
Zhang, J., 2006. Improved online process fault diagnosis through information fusion in multiple neural networks. Comput. Chem. Eng. 30 (3), 558–571.
Zhang, J., Morris, A.J., 1994. Online process fault diagnosis using fuzzy neural networks. Intell. Syst. Eng. 3 (1), 37–47.
Zhang, Y., 2008. Fault detection and diagnosis of nonlinear processes using improved kernel independent component analysis (KICA) and support vector machine (SVM). Ind. Eng. Chem. Res. 47 (18), 6961–6971.
Zhang, Y., 2009. Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM. Chem. Eng. Sci. 64 (5), 801–811.
Zhang, Y., Zhou, H., Qin, S.J., Chai, T., 2010. Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares. IEEE Trans. Ind. Inf. 6 (1), 3–10.
Zhao, C., Wang, F., Zhang, Y., 2009. Nonlinear process monitoring based on kernel dissimilarity analysis. Control Eng. Pract. 17 (1), 221–230.