Journal of Process Control 31 (2015) 45–54
Decision fusion systems for fault detection and identification in industrial processes

Fuyuan Zhang a, Zhiqiang Ge a,b,∗

a State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, PR China
b Key Laboratory of Advanced Control and Optimization for Chemical Processes, Shanghai 200237, PR China
Article history: Received 3 August 2014; Received in revised form 6 March 2015; Accepted 9 April 2015

Keywords: Fault detection and identification; Decision fusion system; Data-driven model; Diversity; Dempster–Shafer evidence theory

Abstract

Numerous fault detection and identification methods have been developed in recent years. However, each method works under its own assumptions, so a method that performs well under one condition may not provide satisfactory performance under another. In this paper, we design a fusion system that combines the results of various methods. To increase the diversity among the methods, a resampling strategy is introduced as a data preprocessing step. A total of six commonly used methods are selected for building the fusion system. Decisions generated by the different models are combined through the Dempster–Shafer evidence theory. Furthermore, to improve the computational efficiency and reliability of the fusion system, a new diversity measurement index, named the correlation coefficient, is defined for model pruning in the fusion system. The fault detection and identification performance of the decision fusion system is evaluated on the Tennessee Eastman process.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

It is well known that proper monitoring of an industrial process is of great practical significance, and that fast and precise identification of faults is essential for reducing off-specification products and improving the productivity of the process. Searching for effective and well-suited monitoring methods has therefore become increasingly important. In the chemical process industries in particular, fault detection and identification has been a hot research topic in the past years. Generally, process monitoring methods can be divided into three categories [1–5]: model-based methods, knowledge-based methods, and data-based methods. Because they require little knowledge of the process model or associated expert knowledge, data-based methods have recently become the most popular for process monitoring. Among all data-based process monitoring methods, typically used ones include principal component analysis (PCA), independent component analysis (ICA), partial
least squares (PLS), artificial neural networks (ANN), etc. Although satisfactory results have been obtained in many industrial processes with these mature methods, the equipment used in industrial plants has become more and more complicated and multifunctional, and the process state is often a combination of many operating conditions, which may degrade the performance of those methods. In fact, a method built on a single assumption sometimes fails to achieve the results we expect, as shown in the work of Venkatasubramanian et al. [4], because of the mismatch between the real process and the model assumption. This raises a question: is there a perfect method that can deal with any complex condition in a process? The answer is no. According to the No Free Lunch theorem [6], no algorithm is universally superior to all others; that is, we cannot design a single strategy that adapts to every situation, e.g. non-Gaussian data distributions, nonlinear relationships among process variables, frequent changes of operating conditions, etc. To address this problem, some researchers have put forward the idea of ensemble systems [7,8]. The main purpose is to combine, through efficient fusion algorithms, methods that place completely different emphases on modeling the data when dealing with the same problem. One key factor of an ensemble system is diversity, which means
each single model needs to express a different view of the system and thus make different errors, so that the total error can be reduced by the ensemble. Although there is no strict definition or explicit measurement of diversity, it has been illustrated that the greater the diversity, the better the fusion results can be [9,10]. For example, Polikar [11] showed experimentally that an ensemble of multiple classifiers performs better than a single one when the diversity is significant. The other key factor of an ensemble system is the decision making, or combination, over the various models. In general, there are two categories: utility-based methods and evidence-based methods. A representative of the former is the voting-based method [12–14], while the latter includes the Bayesian method [15], the Dempster–Shafer (D–S) method [16], decision templates [17], the Borda count [18], etc. Compared with other decision making approaches, the D–S framework provides a more flexible mathematical tool for dealing with imperfect information, together with a simpler computing procedure and a more concise expression of the final decision. Moreover, the D–S method places no restriction on the data distribution, which is convenient during data preprocessing. Owing to these advantages, D–S based methods have been widely used for decision making in the past years [15,19–22], and have been shown to be an appropriate approach for improving the performance of an ensemble model that deals with unreliable information [20].

In this paper, the Dempster–Shafer evidence theory is employed to develop decision fusion systems for fault detection and identification. To enhance the diversity of the fusion system, a resampling strategy is introduced as a data preprocessing procedure, in addition to the use of different types of data models. Furthermore, by defining a new correlation measurement index, classifiers with similar characteristics are pruned from the fusion system. As a result, both the computational efficiency and the classification reliability can be improved. Here, the fusion system that incorporates all classifiers is called the ALL fusion system, and the one with the pruning strategy is called the SELECTIVE fusion system.

The rest of the paper is organized as follows. Section 2 reviews preliminary knowledge about the Dempster–Shafer evidence theory. For reasons of length, we omit detailed preliminaries on the selected unsupervised and supervised modeling methods, since they can easily be found in many published books and papers. Section 3 describes the complete framework of the ALL and SELECTIVE fusion systems, including the definition of a new index that measures the correlations among different methods. Online fault detection and identification results based on the proposed framework are illustrated on the Tennessee Eastman (TE) process in Section 4. Finally, conclusions are drawn.

2. Dempster–Shafer evidence theory

The evidence theory was initially proposed by Dempster [23] concerning lower and upper probability distributions, and Shafer [16] proved the ability of belief functions to model uncertain knowledge. The complete Dempster–Shafer theory was then formulated; it enables us to combine evidence from different sources and arrive at a degree of belief, and it has been widely used in the field of information fusion.
In this section, some basic concepts and combination rules of the Dempster–Shafer theory are introduced; one can refer to Shafer [16], Smets and Kennes [24], or Yager [25] for more detailed instructions on this subject.

2.1. Basic definitions

Definition 1. Let $\Omega$ be a finite non-empty set of $N$ mutually exhaustive and exclusive hypotheses about some fault class domain,

$$\Omega = \{F_1, F_2, \ldots, F_N\} \quad (1)$$

and let $2^{\Omega}$ denote the power set of $\Omega$, composed of all propositions over $\Omega$:

$$2^{\Omega} = \{\emptyset, \{F_1\}, \{F_2\}, \ldots, \{F_N\}, \{F_1 \cup F_2\}, \{F_1 \cup F_3\}, \ldots, \Omega\}. \quad (2)$$
Definition 2. A basic probability assignment (BPA), also called a mass function or basic belief assignment, is a function mapping from $2^{\Omega}$ to $[0, 1]$ that assigns a belief value to each element of the power set. It satisfies the following two properties:

$$m : 2^{\Omega} \to [0, 1], \quad m(\emptyset) = 0, \quad \sum_{A \subseteq \Omega} m(A) = 1 \quad (3)$$

where $\emptyset$ is the empty set; a BPA with $m(\emptyset) = 0$ is called normalized. Each subset $A$ with $m(A) > 0$ is called a focal element of $m$.
Definition 3. The belief function $\mathrm{Bel} : 2^{\Omega} \to [0, 1]$ is defined as

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B). \quad (4)$$
Definition 4. The plausibility function $\mathrm{Pl} : 2^{\Omega} \to [0, 1]$ is defined as

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) = \sum_{A \cap B \neq \emptyset} m(B) \quad (5)$$
where $\bar{A}$ is the negation of a hypothesis $A$.

Definition 5. $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$ is the confidence interval, which describes the uncertainty about $A$. If the difference between $\mathrm{Bel}$ and $\mathrm{Pl}$ increases, the information available for fusion decreases. The difference therefore provides a measurement of the uncertainty about the level of evidence.

2.2. Rule of combination

When multiple independent sources of evidence are available, such as $m_1$ and $m_2$, the combined evidence can be obtained by Dempster's rule as follows:

$$m(\emptyset) = 0, \quad m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C) \quad (6)$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ represents the mass that the combination assigns to the empty set; it is often interpreted as a measurement of the conflict between the two pieces of evidence and must satisfy $K \neq 1$. Clearly, the larger $K$ is, the more conflicting the pieces of evidence are, and the less information is available. Dempster's rule can easily be extended to more than two sources, as shown in Eq. (7): the BPAs of the first two classifiers ($m_1$ and $m_2$) are combined using Eq. (6) to obtain $m_{1,2}$, which is then combined with the BPA of the third classifier ($m_3$), and so forth, up to the $T$th classifier:

$$m_{1,2,\ldots,T} = m_1 \oplus m_2 \oplus \cdots \oplus m_T = ((m_1 \oplus m_2) \oplus m_3) \oplus \cdots \oplus m_T. \quad (7)$$
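To make the preliminaries concrete, the following minimal Python sketch (our own illustration, not code from the paper) implements Eqs. (4)–(7); representing focal elements as frozensets of fault labels is an implementation choice of ours:

```python
from functools import reduce

def combine(m1, m2):
    """Dempster's rule, Eq. (6): fuse two normalized BPAs given as
    dicts mapping frozenset (focal element) -> mass."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc      # accumulates K of Eq. (6)
    if conflict >= 1.0:
        raise ValueError("total conflict: K = 1, rule undefined")
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

def combine_all(masses):
    """Sequential fusion of T BPAs, Eq. (7)."""
    return reduce(combine, masses)

def bel(m, a):
    """Belief, Eq. (4): total mass of all subsets of A."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility, Eq. (5): total mass of sets intersecting A."""
    return sum(v for b, v in m.items() if b & a)

# Two pieces of evidence over Omega = {F1, F2}:
m1 = {frozenset({"F1"}): 0.6, frozenset({"F1", "F2"}): 0.4}
m2 = {frozenset({"F1"}): 0.7, frozenset({"F2"}): 0.3}
m12 = combine_all([m1, m2])
print(m12, bel(m12, frozenset({"F1"})), pl(m12, frozenset({"F1"})))
```

Here K = 0.18, so the combined mass on {F1} is 0.70/0.82 ≈ 0.854, illustrating how agreement between sources is reinforced after normalization.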
In recent years, Dempster–Shafer based fusion has been widely used in various fields, such as pattern recognition, process fault diagnosis, geographic information systems, and medical diagnosis. For example, Parikh et al. [26,27] used the Dempster–Shafer evidence theory to combine the outputs of multiple primary classifiers to improve overall classification performance. The effectiveness of this approach was demonstrated for detecting failures in a diesel
engine cooling system. Ghosh et al. [28] proposed a framework for distributed fault detection and identification and adapted the Dempster–Shafer evidence theory to combine diagnostic results at different levels of abstraction.

Fig. 1. Fault identification framework based on Dempster–Shafer evidence theory.

3. Decision fusion systems for fault detection and identification

If we could find or design a classifier with perfect generalization performance under all sorts of circumstances, there would be no need to resort to ensemble fusion techniques. In reality, however, noise, outliers, and missing data make such a perfect classifier unattainable; at the very least, it cannot be well designed for all conditions. Therefore, we exploit a system that includes many classifiers, with the objective of approaching the best classifier. When individual classifiers make errors on different instances, the intuition is that different types of classifiers can complement each other; specifically, we need classifiers whose decision boundaries are adequately different from one another, so that the classification performance can be improved by combination. Such a set of classifiers is said to be diverse. Clearly, when building a multiple classifier system, we must focus on the important element of diversity, as well as on resampling of the dataset. Fig. 1 shows the main architecture of the proposed decision fusion system. There are generally two stages: off-line modeling on the training data and online classification of unlabeled data. In detail, four main procedures are involved in the implementation of this system:

• Resampling of the training data.
• Selection of multiple classifiers.
• Testing the performance of the classifiers, summarized in confusion matrices.
• Combining decisions using the D–S evidence theory.

3.1. Resampling of training data

Diversity among classifiers can be achieved in many ways. The most popular is probably the resampling technique used in bootstrapping or bagging, where training datasets are obtained by randomly drawing samples with replacement from the whole training set. The resampling technique was first introduced
by Efron [29]. It recombines samples by randomly drawing with replacement and is grounded in probability and statistics: when n instances are randomly drawn from a training set of size n, each instance has probability 1 − (1 − 1/n)^n of being selected at least once. For large n, this is about 1 − 1/e ≈ 63.2%, which means each bootstrap subset contains only about 63.2% unique instances from the training set. In a randomly drawn subset, a sample that comes up more often contributes more of its information; resampling thus weakens the relevance among different training sets and improves the diversity of samples by injecting randomness. Let C be the number of fault classes in the process, represented as Ω = {F_1, F_2, ..., F_C}, with F_i denoting the ith fault, i = 1, 2, ..., C. For process monitoring purposes, we also need data collected when the process is under the normal operating condition, denoted by F_0. Each data matrix F_i has n rows (samples) and m columns (variables). Thus, there are in total C + 1 classes to identify when fault identification is required. The specific resampling steps are as follows (see the sketch after the classifier list below):

Step 1: Randomly draw n numbers from 1 to n, recording the index vector S.
Step 2: Rearrange each of the C + 1 class datasets according to the index S.
Step 3: Generate the new data matrices X.

3.2. Selection of multiple classifiers

Another way to achieve diversity in the decision fusion system is to select different classifiers, similar to the random subspace method proposed by Ho, where diversity is obtained by training each classifier on different features chosen from the primitive feature space [30]. For the selection of multiple classifiers or fault identification methods, two factors should be considered: the category of the classifiers and their number. Without loss of generality, we pick widely used classifiers from both unsupervised and supervised modeling methods, listed as follows:

(1) Unsupervised methods: principal component analysis (PCA), kernel principal component analysis (KPCA), and independent component analysis (ICA);
(2) Supervised methods: k-nearest neighbors (KNN), Fisher discriminant analysis (FDA), and artificial neural networks (ANN).
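As a minimal sketch of Steps 1–3 of Section 3.1 (our own illustration; the function and variable names are hypothetical), the same bootstrap index vector S is applied to every class dataset:

```python
import numpy as np

def resample(class_datasets, seed=0):
    """Steps 1-3 of Section 3.1: draw n row indices with replacement
    (Step 1), rearrange every one of the C+1 class datasets with the
    same index vector S (Step 2), and return the new matrices X (Step 3)."""
    rng = np.random.default_rng(seed)
    n = class_datasets[0].shape[0]
    S = rng.integers(0, n, size=n)      # Step 1: n draws with replacement
    return [F[S, :] for F in class_datasets]

# Sanity check of the 63.2% argument: a bootstrap sample covers about
# 1 - (1 - 1/n)^n of the unique original rows.
n = 960
print(1 - (1 - 1 / n) ** n)             # ~0.6321
```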
It is worth noting that the selected methods have their own modeling emphases in addressing different kinds of process data characteristics, such as non-Gaussian data distributions, variable nonlinearity, time-varying behavior, and multiple operating conditions. Selecting more classifiers can be expected to give more satisfactory results; however, for convenience of modeling and online implementation, we set the number of classifiers to six. One can easily extend this to more general cases within the same modeling framework.

3.3. Testing the performance of classifiers using confusion matrices
In this section, we test the performance of each classifier and store the information in a confusion matrix, which is usually constructed by testing on separate validation datasets [31]. Suppose Ω = {F_1, F_2, ..., F_C}, with F_i denoting the ith fault, i = 1, 2, ..., C; there are C classes in total, and T stands for the number of classifiers. For an instance x, let the output of the kth classifier be class F_j, i.e., E_k(x) = F_j. The confusion matrix CM^k for classifier k is typically represented as

$$CM^k = \begin{bmatrix} N_{11}^k & N_{12}^k & \cdots & N_{1C}^k & N_{1(C+1)}^k \\ N_{21}^k & N_{22}^k & \cdots & N_{2C}^k & N_{2(C+1)}^k \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ N_{C1}^k & N_{C2}^k & \cdots & N_{CC}^k & N_{C(C+1)}^k \end{bmatrix}, \quad k = 1, 2, \ldots, T \quad (8)$$

where the rows of the confusion matrix represent the actual classes F_1, F_2, ..., F_C, while the columns stand for the classes assigned by the kth classifier; note that the last column, C + 1, stands for the normal class, which also captures type I errors. The element N_{ij}^k of the confusion matrix represents the number of validation samples from class F_i that are assigned to class F_j by classifier k. Thus, for each classifier we obtain a confusion matrix containing the corresponding performance information.

3.4. Ensemble decisions using D–S evidence theory

In this section, two decision fusion systems are developed, named the ALL fusion system and the SELECTIVE fusion system. The main difference between them lies in a newly defined index, named the correlation coefficient, which measures the correlations among different classifiers. While the ALL fusion system incorporates all individual classifiers, the SELECTIVE fusion system first uses the correlation coefficient index to select individual classifiers and then adopts the selected classifiers for decision fusion.

3.4.1. ALL fusion system

To build the fusion algorithm, the confusion matrix obtained from the performance evaluation is used to estimate the belief functions of each classifier. The main steps of the fusion process are summarized as follows:

Step 1: Calculate the individual basic probability assignment (BPA) of each classifier:

$$m_k(F_i) = \frac{N_{ij}^k}{\sum_{i=1}^{M} N_{ij}^k}, \quad k = 1, 2, \ldots, T \quad (9)$$

where the BPAs satisfy the normalization

$$\sum_{A \in \Omega} m_k(A) = 1. \quad (10)$$

In Eq. (9), the element N_{ij}^k represents the number of validation samples from class F_i that are assigned to class F_j by classifier k, and M indicates the total number of classes over which column j is normalized.

Step 2: Compute the combined BPAs using Dempster's rule. Having obtained the individual BPAs of all classifiers, we combine them by Dempster's rule, Eqs. (6) and (7), to obtain the combined result over all T classifiers, i.e. m_{1,2,...,T}(F_i).

Step 3: Final decision making. Choose the class F_i with the maximum combined BPA as the final decision, that is,

$$\mathrm{Final}_{DS} = \arg\max_{i \in [1, C]} \left[ m_{1,2,\ldots,T}(F_i) \right]. \quad (11)$$
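The following sketch (ours, with hypothetical helper names) implements Steps 1–3 for the common case where every BPA is concentrated on singleton classes; in that case Dempster's rule reduces to a normalized elementwise product:

```python
import numpy as np

def bpa_from_confusion(cm, j):
    """Eq. (9): normalize column j of classifier k's confusion matrix
    into a BPA over the singleton classes F_1, ..., F_C."""
    col = cm[:, j].astype(float)
    return col / col.sum()

def fuse_decision(confusion_matrices, outputs):
    """Eqs. (6), (7) and (11) for singleton focal elements: combine the
    T classifiers' BPAs and pick the class with the largest mass.
    outputs[k] = j is the column assigned by classifier k; assumes K < 1."""
    combined = bpa_from_confusion(confusion_matrices[0], outputs[0])
    for cm, j in zip(confusion_matrices[1:], outputs[1:]):
        prod = combined * bpa_from_confusion(cm, j)  # only equal singletons intersect
        combined = prod / prod.sum()                 # the 1/(1-K) normalization of Eq. (6)
    return int(np.argmax(combined))                  # Eq. (11)
```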
3.4.2. SELECTIVE fusion system

To further improve the performance of the decision fusion system, the ALL fusion system is pruned to exclude some similar individual classifiers; we call the result the SELECTIVE fusion system. To date, several diversity assessments exist for measuring the correlations among classifiers, such as pairwise measures [32], which are calculated between two classifiers, and non-pairwise measures, e.g. the entropy measure and the Kohavi–Wolpert variance. In this paper, we propose a new measure, called the correlation coefficient (corr_ij), defined as

$$\mathrm{corr}_{ij} = \frac{\mathrm{cov}(cm_i, cm_j)}{\sqrt{D(cm_i)}\,\sqrt{D(cm_j)}} \quad (12)$$

where cm_i and cm_j are the confusion matrices of the ith and jth classifiers, and D(cm_i), D(cm_j) are the variances of the ith and jth confusion matrices. From a statistical viewpoint, corr_ij measures the linear relationship between two classifiers. Based on this index, our aim is to obtain small correlation coefficients, so that the corresponding classifiers have high diversity with respect to each other. Meanwhile, this index also provides a strong and reliable basis for implementing the combination with proper fusion rules. Therefore, after selecting different classifiers in the previous step, we use Eq. (12) to compute the correlation coefficients between classifiers. To prune similar classifiers from the fusion system, we set a threshold on the similarity: values greater than the threshold indicate high similarity among the classifiers involved, and most of them should be abandoned. Following the procedures of Section 3.4.1, the individual and combined BPA values can then be computed, based on which the final decision is made.
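A short sketch of Eq. (12) follows (our reading of the definition: cov(·,·) and D(·) are taken over the flattened confusion matrices):

```python
import numpy as np

def corr(cm_i, cm_j):
    """Eq. (12): linear correlation between two classifiers, computed
    from their flattened confusion matrices."""
    x = cm_i.ravel().astype(float)
    y = cm_j.ravel().astype(float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(x.var() * y.var())
```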
4. Case study: Tennessee Eastman challenge problem

In this section, the proposed method is tested for online fault identification on the Tennessee Eastman (TE) industrial challenge problem [33,34]. This process produces two products (G and H) and a byproduct (F) from reactants A, C, D, and E, as shown in Fig. 2. The process has five major units: a two-phase reactor, a product condenser, a flash separator, a recycle compressor, and a product stripper. It has 41 measured variables, comprising 22 continuous process measurements (Table 1) and 19 composition measurements, plus 12 manipulated variables. Twenty-one programmed faults can be introduced to the process, as tabulated in Table 2. More details of the process are explained in the book of Chiang et al. [1]. Here, we use the 22 continuous measurements for fault detection and identification.
Fig. 2. Tennessee Eastman process.

With reference to the fault information, faults 1 to 7 are step changes of process variables; faults 8 to 12 are random changes of variables; fault 13 is a slow shift of the reaction kinetics; faults 14, 15, and 21 are related to valve sticking; and faults 16 to 20 are unknown fault types. Some faults are easy to detect because they strongly affect the process and change the relationships among the process variables. Others (e.g., faults 3, 9, and 15) are difficult to detect because they are very small and have little influence on the process. In this process, the existing controller provides good recovery for faults 3, 4, 9, 14, 15, 16, and 19; therefore, these faults are excluded from the analysis in the present paper. We choose faults 1, 2, 5, 6, 8, and 12 for simulation to verify the effectiveness of the proposed method. Each normal and fault dataset contains 960 samples with a sampling interval of 3 min. All faults were introduced at sample 161, so there are in total 800 fault samples in each fault class. The parameters of each classifier are as follows. For fault detection with PCA, data collected in normal operation are used for modeling, and a fault is flagged when the 99% confidence limit of the T² statistic or the SPE value is violated; for fault diagnosis, a fault reconstruction scheme is used in which a combined discriminant of the T² and SPE statistics is developed for each fault class. The number of principal components is determined so that the cumulative variance contribution exceeds 80%. The confidence limit and the number of principal components are selected similarly for KPCA; the difference lies in the choice of kernel function parameters. Here, the kernel type is chosen as the RBF kernel, and the kernel width is set to 15. For the KNN classifier, the number of nearest neighbors is set to five by trial and error, and the Euclidean distance is used as the default distance metric. The number of independent components in the ICA model is set to 4, and the confidence limit of its statistics is also set to 99%. In the FDA model, the dimension of the embedded feature space is set to the number of classes minus one, which retains the maximum amount of discriminant information. For training the neural network, data collected under normal operation are used.
Table 1
Measurement variables in the TE process.

No.  Measured variable                   No.  Measured variable
1    A feed                              12   Product separator level
2    D feed                              13   Product separator pressure
3    E feed                              14   Product separator underflow
4    Total feed                          15   Stripper level
5    Recycle flow                        16   Stripper pressure
6    Reactor feed rate                   17   Stripper underflow
7    Reactor pressure                    18   Stripper temperature
8    Reactor level                       19   Stripper steam flow
9    Reactor temperature                 20   Compressor work
10   Purge rate                          21   Reactor cooling water outlet temperature
11   Product separator temperature       22   Separator cooling water outlet temperature
Table 2
Disturbances in the TE process.

Fault number  Process variable                                         Type
1             A/C feed ratio, B composition constant (stream 4)        Step
2             B composition, A/C ratio constant (stream 4)             Step
3             D feed temperature (stream 2)                            Step
4             Reactor cooling water inlet temperature                  Step
5             Condenser cooling water inlet temperature                Step
6             A feed loss (stream 1)                                   Step
7             C header pressure loss-reduced availability (stream 4)   Step
8             A, B, C feed composition (stream 4)                      Random variation
9             D feed temperature (stream 2)                            Random variation
10            C feed temperature (stream 4)                            Random variation
11            Reactor cooling water inlet temperature                  Random variation
12            Condenser cooling water inlet temperature                Random variation
13            Reaction kinetics                                        Slow drift
14            Reactor cooling water valve                              Sticking
15            Condenser cooling water valve                            Sticking
16            Unknown                                                  Unknown
17            Unknown                                                  Unknown
18            Unknown                                                  Unknown
19            Unknown                                                  Unknown
20            Unknown                                                  Unknown
21            Valve position constant (stream 4)                       Constant position

Fig. 3. Unsupervised methods for online fault detection tested on fault one data (top panel: PCA-T², KPCA-T², and ICA-I² statistics; bottom panel: PCA-SPE, KPCA-SPE, and ICA-SPE statistics; x axis: sampling time).
A sample is flagged as abnormal when the network output violates an interval of lower and upper bounds specified by the user, usually set to [0.8, 1.2]. For fault identification, a two-layer feed-forward back-propagation neural network of size [10 7] is used, which maps the online sample to the seven process states (normal and six fault classes), with a hyperbolic tangent sigmoid transfer function for the hidden layer and a linear transfer function for the output layer. Among the seven output nodes, the one with the largest value is taken as the process state or fault class if its value is close to 1 (1 ± 0.2). We have carried out two different scenarios for this case study. In scenario 1, we use the ALL fusion system, i.e., all classifiers are used to combine the final decision. In scenario 2, on the basis of the correlation coefficient, the SELECTIVE fusion system is developed to make the final decision. Detailed results of these two fusion systems are illustrated in the following two subsections.
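For reference, a hedged sketch of the PCA detection settings described above (not the authors' code; the empirical-percentile limits below stand in for the chi-squared/F approximations of the 99% confidence limits that are often used in practice):

```python
import numpy as np

def fit_pca_monitor(X_normal, cum_var=0.80, alpha=0.99):
    """Fit PCA on normal data; keep enough PCs for >= 80% cumulative
    variance; return a detector flagging T^2 or SPE limit violations."""
    mean = X_normal.mean(axis=0)
    Xc = X_normal - mean
    _, s, Vt = np.linalg.svd(Xc / np.sqrt(len(Xc) - 1), full_matrices=False)
    var_ratio = np.cumsum(s**2) / np.sum(s**2)
    a = int(np.searchsorted(var_ratio, cum_var)) + 1   # number of PCs
    P, lam = Vt[:a].T, s[:a] ** 2                      # loadings, eigenvalues

    def stats(X):
        Xc = X - mean
        T = Xc @ P
        t2 = np.sum(T**2 / lam, axis=1)                # Hotelling T^2
        spe = np.sum((Xc - T @ P.T) ** 2, axis=1)      # squared residual (SPE)
        return t2, spe

    t2n, spen = stats(X_normal)
    lim_t2, lim_spe = np.quantile(t2n, alpha), np.quantile(spen, alpha)

    def detect(X):                                     # True where a fault is flagged
        t2, spe = stats(X)
        return (t2 > lim_t2) | (spe > lim_spe)

    return detect
```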
4.1. Scenario 1
Corresponding to the 7 different modes (1 normal condition and 6 fault conditions) of the TE process, 7 datasets have been generated. Table 3 shows the fault detection and identification performance of the different classifiers for the six fault cases, where Det. stands for the number of delay samples for fault detection and Id. represents the number of delay samples for fault identification. It can be seen that the classifiers have different detection delays, and some methods cannot identify particular faults at all; e.g., k-nearest neighbors is unable to detect and identify fault 2. Taking fault 1 as an example, the performances of the various
classifiers are presented in Fig. 3 and Fig. 4, respectively. The dotted lines in Fig. 3 stand for the confidence limits of the different classifiers, and each color indicates the corresponding method; e.g., the blue dotted line represents the confidence limit computed by the PCA classifier. In Fig. 4, "0" stands for the normal condition and "1" stands for the fault condition.

Fig. 4. Supervised methods for online fault detection tested on fault one data (KNN, FDA, and ANN-BP panels; 0 = normal, 1 = fault).
Table 3
Performance results of the various classifiers on the six selected faults (Det. = detection delay in samples; Id. = identification delay in samples).

Fault index   PCA          ICA          KPCA         KNN          ANN          FDA
              Det.   Id.   Det.   Id.   Det.   Id.   Det.   Id.   Det.   Id.   Det.   Id.
1             8      6     8      6     7      6     7      4     12     8     1      2
2             16     0     15     23    17     0     –      –     30     26    3      9
5             11     –     0      12    0      –     16     –     2      –     0      0
6             0      7     3      10    0      0     24     0     17     20    0      0
8             26     26    19     21    14     22    24     26    1      12    5      15
12            22     22    25     22    2      0     16     –     30     –     18     35
Table 4
Comparison of different methods with and without the resampling framework (Det. = detection delay in samples with resampling; No-bag. = detection delay without resampling).

Fault index   PCA            ICA            KPCA           KNN            ANN            FDA            D–S
              Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.
1             7     7        4     6        4     4        7     0        7     0        1     1        0     0
2             16    26       16    23       15    15       –     –        25    26       9     9        9     9
5             11    12       0     12       0     0        16    12       2     30       0     0        0     0
6             0     3        3     10       0     0        24    20       –     –        0     0        0     0
8             26    26       19    21       14    20       24    25       1     –        5     15       1     14
12            22    22       25    22       2     6        16    23       30    12       0     35       0     6
While the unsupervised methods provide continuous monitoring results through the T² and SPE statistics, the results of the supervised methods are discrete. The results shown in Fig. 3 and Fig. 4 indicate that the ICA and FDA methods perform better than the others. Table 4 compares each single model and the D–S evidence theory based method in terms of detection delay. At the same time, we also provide the results of the D–S evidence theory based method tested without the resampling technique, denoted as No-bag. For example, for fault 2 the resampling framework performs substantially better for the three classifiers PCA, ICA, and ANN: with PCA, fault 2 is detected after a delay of 16 samples when resampling is adopted, but after 26 samples without it. It can be inferred that the resampling technique improves the fault detection performance of the classifiers. For identification of the selected fault cases, the confusion matrices of the six classifiers are shown together in Fig. 5. The symbol PCA CM, for example, represents the confusion matrix of the PCA classifier and indicates its classification performance; similarly, the confusion matrices of the other classifiers are denoted KNN CM, ICA CM, KPCA CM, etc. On the basis of the confusion matrices, we can calculate the BPA values of each class (normal and fault) and then compute the combined BPA value to make the final decision. Taking fault 8 for examination, the whole fusion process is displayed in Table 5. The ANN model first detects the fault at the 12th sample. According to the calculated BPA values, the combined BPA of class 8 is 0.8421, which is the largest; therefore, the fault is identified as fault 8. As the process continues to the 22nd sample, the KPCA method also detects the fault and identifies it as fault 8. While in this case the ANN method detected the fault first, in other situations other methods may detect the fault first. However, no matter which method detects the fault first, the decision fusion system can always identify the fault immediately. Therefore, compared with single classifiers, both the fault detection and the identification performance are improved by the decision fusion system.
4.2. Scenario 2

In this scenario, we test the SELECTIVE fusion system on the TE process. The main difference from scenario 1 is that we use the correlation coefficient (Corr) to measure the relationship between each pair of classifiers on the basis of their confusion matrices. The correlation coefficients of the different classifier pairs are presented in Fig. 6, with detailed values provided in Table 6. As can be seen, the Corr values between PCA and KPCA (0.9531), PCA and KNN (0.8003), and KPCA and KNN (0.7898) all exceed 0.7. This means that each of these pairs has a high mutual similarity, so the performance of one classifier in the pair is quite close to that of the other. When both are involved in the decision fusion system, the diversity of the system decreases, which may degrade the fusion performance. In this case, the SELECTIVE fusion system selects classifiers so that two highly similar classifiers are not used at the same time.
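The pruning rule itself can be sketched as below (our illustration; the 0.7 threshold is inferred from the three high-correlation pairs above, and the scan order is a hypothetical choice):

```python
def prune(classifiers, corr_fn, threshold=0.7):
    """Greedily keep a classifier only if its Corr with every
    already-kept classifier stays at or below the threshold."""
    kept = []
    for c in classifiers:
        if all(corr_fn(c, k) <= threshold for k in kept):
            kept.append(c)
    return kept

# With the Table 6 values, scanning in the order
# [ICA, KPCA, KNN, PCA, FDA, ANN] drops KNN (Corr 0.7898 with KPCA)
# and PCA (Corr 0.9531 with KPCA), matching the exclusion of PCA and
# KNN described in the text.
```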
Fig. 5. Confusion matrices of six fault detection methods.
Table 5
Implementation process of the ALL fusion system.ᵃ

The individual BPAs of the six classifiers are the same at every sample:
PCA: m(5) = 0.2595, m(8) = 0.7099, m(12) = 0.0306
KPCA: m(8) = 0.9765, m(12) = 0.0235
KNN: m(1) = 0.0115, m(5) = 0.0230, m(8) = 0.6207, m(12) = 0.3448
FDA: m(5) = 0.3523, m(8) = 0.0341, m(12) = 0.6136
ICA: m(5) = 0.1667, m(12) = 0.8333
ANN: m(5) = 0.0263, m(8) = 0.8421, m(12) = 0.1326

Sample (no.)  Class detected by PCA/KPCA/KNN/FDA/ICA/ANN   Combined BPA                                                            Decision
12            –/–/–/–/–/8                                  mtotal(5) = 0.0263, mtotal(8) = 0.8421, mtotal(12) = 0.1326             8
22            –/8/–/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027                  8
26            8/8/8/–/–/8                                  mtotal(1) = 0, mtotal(5) = 0, mtotal(8) = 0.9998, mtotal(12) = 0.0002   8
30            8/8/8/–/–/8                                  mtotal(1) = 0, mtotal(5) = 0, mtotal(8) = 0.9998, mtotal(12) = 0.0002   8

ᵃ No. = number of sample; m(5) = BPA value of class 5, m(8) = BPA value of class 8, and so on; mtotal = combined BPA over the classes of the classifiers that detect and identify the fault.
Table 7 shows the whole process of the SELECTIVE fusion system. The classifiers excluded during the fusion process (PCA and KNN, shown in light gray in the original table) are those highly correlated with KPCA: once one classifier among PCA, KPCA, and KNN has identified the fault, it is appropriate to discard the others. In this scenario, the SELECTIVE fusion system generates the same decision as the ALL fusion system but reduces the computational time. At the same time, it improves the combined BPA values by 5–9%, which means the SELECTIVE fusion system is more reliable than the ALL fusion system.
Fig. 6. Correlation coefficients of different classifiers (x axis: index of the classifier pair as listed in Table 6; y axis: correlation coefficient).

Table 6
Numerical values of Corr for different methods.

Index  C1     C2     Corr      Index  C1     C2     Corr
1      PCA    ICA    0.3105    9      PCA    ANN    0.4873
2      ICA    KPCA   0.3124    10     KPCA   KNN    0.7898
3      ICA    KNN    0.4002    11     KPCA   FDA    0.3879
4      ICA    FDA    0.2369    12     KPCA   ANN    0.4850
5      ICA    ANN    0.1254    13     KNN    FDA    0.5121
6      PCA    KPCA   0.9531    14     KNN    ANN    0.5119
7      PCA    KNN    0.8003    15     FDA    ANN    0.2630
8      PCA    FDA    0.4562

4.3. Further performance assessment
For further performance assessment, the ROC curve is used; it is a comprehensive index depicting the trade-off between true positive and false positive rates. In the ROC curve, the x axis indicates the false positive rate (FPR), which measures how many incorrect positive results occur among all negative samples during the test, while the y axis indicates the true positive rate (TPR), which measures how many correct positive results occur among all positive samples. Intuitively, the larger the area under the curve, the more sensitive the classifier, and thus the better its performance. All individual classifiers as well as the D–S evidence fusion method are shown together in Fig. 7. The true positive rate and false positive rate in these ROC curves are calculated as follows. True positive rate:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \quad (13)$$
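A small helper for these quantities (our own illustration, with hypothetical names) computes Eq. (13) and Eq. (14) below from 0/1 (normal/abnormal) label arrays:

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """TPR = TP/(TP+FN), Eq. (13); FPR = FP/(FP+TN), Eq. (14)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)
```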
Table 7
Implementation process of the SELECTIVE fusion system.ᵃ

The individual BPAs of the six classifiers are the same as in Table 5; the PCA and KNN classifiers (shown in light gray in the original table) are excluded from the fusion.

Sample (no.)  Class detected by PCA/KPCA/KNN/FDA/ICA/ANN   Combined BPA                                                   Decision
12            –/–/–/–/–/8                                  mtotal(5) = 0.0263, mtotal(8) = 0.8421, mtotal(12) = 0.1326    8
22            –/8/–/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027         8
26            8/8/8/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027         8
30            8/8/8/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027         8

ᵃ No. = number of sample; m(5) = BPA value of class 5, m(8) = BPA value of class 8, and so on; mtotal = combined BPA over the classes of the classifiers that detect and identify the fault.
Fig. 7. ROC curves for individual monitoring methods and D–S evidence fusion in the TE process (legend: FDA, KNN, KPCA, ICA, PCA, ANN, D–S, random classifier).

False positive rate:

$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad (14)$$
where TP indicates the number of true positive samples and FN the number of false negatives. Precisely, TP stands for the number of samples classified as abnormal that indeed belong to the abnormal class; an abnormal sample classified as normal is counted as a false negative (FN). Likewise, TN indicates the number of samples classified as normal that indeed belong to the normal class; a normal sample classified as abnormal is counted as a false positive (FP). We can observe from Fig. 7 that the D–S evidence fusion method covers the largest area, compared with any of the individual classifiers. Therefore, based on the ROC results, it can be concluded that the fault detection and identification performances are significantly improved by combining and fusing the results of multiple classifiers.

5. Conclusions
In this paper, two decision fusion systems, namely the ALL fusion system and the SELECTIVE fusion system, have been developed for fault detection and identification through the Dempster–Shafer evidence theory. To guarantee the diversity among the individual classifiers, a resampling method has been introduced as a data preprocessing step in both fusion systems. While the ALL fusion system uses all classifiers for decision making, similar classifiers are pruned in the SELECTIVE fusion system. The fault detection and identification performances of the two decision fusion systems have been evaluated on the TE process. Results show that the delay times of both fault detection and identification decrease in comparison with any single classification method. With the introduction of the classifier pruning strategy in the SELECTIVE fusion system, the computational efficiency is improved, as well as the reliability of the decision making system.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) (61273167), Project National 973 (2012CB720500), and the Open Research Project of the Key Laboratory of Advanced Control and Optimization for Chemical Processes, Shanghai (2014ACOCP01).

References

[1] L.H. Chiang, R.D. Braatz, E.L. Russell, Fault Detection and Diagnosis in Industrial Systems, Springer Science & Business Media, 2001.
[2] V. Venkatasubramanian, R. Rengaswamy, K. Yin, S.N. Kavuri, A review of process fault detection and diagnosis: part I: quantitative model-based methods, Comput. Chem. Eng. 27 (2003) 293–311.
[3] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, A review of process fault detection and diagnosis: part II: qualitative models and search strategies, Comput. Chem. Eng. 27 (2003) 313–326.
[4] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, K. Yin, A review of process fault detection and diagnosis, part III: process history based methods, Comput. Chem. Eng. 27 (2003) 327–346.
[5] Z. Ge, Z. Song, F. Gao, Review of recent research on data-based process monitoring, Ind. Eng. Chem. Res. 52 (2013) 3543–3562.
[6] D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization, IEEE Trans. Evol. Comput. 1 (1997) 67–82.
[7] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 993–1001.
[8] B.V. Dasarathy, B.V. Sheela, A composite classifier system design: concepts and methodology, Proc. IEEE 67 (1979) 708–713.
[9] P. Cunningham, J. Carney, Diversity versus quality in classification ensembles based on feature selection, in: R. López de Mántaras, E. Plaza (Eds.), Machine Learning: ECML 2000, Springer Berlin Heidelberg, 2000, pp. 109–116.
[10] L. Lam, Classifier combinations: implementations and theoretical issues, in: Multiple Classifier Systems, Springer Berlin Heidelberg, 2000, pp. 77–86.
[11] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. Mag. 6 (2006) 21–45.
[12] A.F.R. Rahman, H. Alam, M.C. Fairhurst, Multiple classifier combination for character recognition: revisiting the majority voting system and its variations, in: D. Lopresti, J. Hu, R. Kashi (Eds.), Document Analysis Systems V, Springer Berlin Heidelberg, 2002, pp. 167–178.
[13] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, 2004.
[14] J.A. Benediktsson, I. Kanellopoulos, Classification of multisource and hyperspectral data based on decision fusion, IEEE Trans. Geosci. Remote Sens. 37 (1999) 1367–1377.
[15] G. Niu, S.-S. Lee, B.-S. Yang, S.-J. Lee, Decision fusion system for fault diagnosis of elevator traction machine, J. Mech. Sci. Technol. 22 (2008) 85–95.
[16] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
[17] L.I. Kuncheva, J.C. Bezdek, R.P. Duin, Decision templates for multiple classifier fusion: an experimental comparison, Pattern Recognit. 34 (2001) 299–314.
[18] Y. Huang, C. Suen, The behavior-knowledge space method for combination of multiple classifiers, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Institute of Electrical Engineers Inc. (IEEE), 1993, p. 347.
[19] K. Ghosh, Y.S. Ng, R. Srinivasan, Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault diagnostic methods, Comput. Chem. Eng. 35 (2011) 342–355.
[20] M. Tabassian, R. Ghaderi, R. Ebrahimpour, Combination of multiple diverse classifiers using belief functions for handling data with imperfect labels, Expert Syst. Appl. 39 (2012) 1698–1707.
[21] O. Basir, X. Yuan, Engine fault diagnosis based on multi-sensor information fusion using Dempster–Shafer evidence theory, Inform. Fusion 8 (2007) 379–386.
[22] Y. Bi, J. Guan, D. Bell, The combination of multiple classifiers using an evidential reasoning approach, Artif. Intell. 172 (2008) 1731–1751.
[23] A.P. Dempster, A generalization of Bayesian inference, J. R. Stat. Soc. Ser. B: Methodol. 30 (1968) 205–247.
[24] P. Smets, R. Kennes, The transferable belief model, Artif. Intell. 66 (1994) 191–234.
[25] R.R. Yager, Dempster–Shafer belief structures with interval valued focal weights, Int. J. Intell. Syst. 16 (2001) 497–512.
[26] C.R. Parikh, M.J. Pont, N. Barrie Jones, Application of Dempster–Shafer theory in condition monitoring applications: a case study, Pattern Recognit. Lett. 22 (2001) 777–785.
[27] C.R. Parikh, M.J. Pont, N.B. Jones, F.S. Schlindwein, Improving the performance of CMFD applications using multiple classifiers and a fusion framework, Trans. Inst. Meas. Control 25 (2003) 123–144.
[28] K. Ghosh, S. Natarajan, R. Srinivasan, Hierarchically distributed fault detection and identification through Dempster–Shafer evidence fusion, Ind. Eng. Chem. Res. 50 (2011) 9249–9269.
[29] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, 1993.
[30] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 832–844.
[31] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern. 22 (1992) 418–435.
[32] L.I. Kuncheva, C.J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn. 51 (2003) 181–207.
[33] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (1993) 245–255.
[34] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Comput. Chem. Eng. 19 (1995) 321–331.