Journal of Process Control 31 (2015) 45–54
Decision fusion systems for fault detection and identification in industrial processes

Fuyuan Zhang a, Zhiqiang Ge a,b,∗

a State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, PR China
b Key Laboratory of Advanced Control and Optimization for Chemical Processes, Shanghai 200237, PR China
Article history: Received 3 August 2014; Received in revised form 6 March 2015; Accepted 9 April 2015

Keywords: Fault detection and identification; Decision fusion system; Data-driven model; Diversity; Dempster–Shafer evidence theory

Abstract

Numerous fault detection and identification methods have been developed in recent years. However, each method works under its own assumptions, so a method that performs well under one condition may not provide satisfactory performance under another. In this paper, we design a fusion system that combines the results of various methods. To increase the diversity among the methods, a resampling strategy is introduced as a data preprocessing step. A total of six commonly used methods are selected for building the fusion system. Decisions generated by the different models are combined through the Dempster–Shafer evidence theory. Furthermore, to improve the computational efficiency and reliability of the fusion system, a new diversity measurement index, named the correlation coefficient, is defined for model pruning in the fusion system. The fault detection and identification performance of the decision fusion system is evaluated on the Tennessee Eastman process.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

It is well known that proper monitoring of an industrial process is of great practical significance, and that fast and precise identification of faults is essential for reducing off-specification products and improving the productivity of the process. Searching for effective and well-suited monitoring methods has therefore become increasingly important. In the chemical process industries in particular, fault detection and identification has been a hot research topic in the past years. Generally, process monitoring methods can be divided into three categories [1–5]: model-based methods, knowledge-based methods, and data-based methods. Because they require little knowledge of the process model or associated expert knowledge, data-based methods have recently become the most popular for process monitoring. Among all data-based process monitoring methods, typically used ones include principal component analysis (PCA), independent component analysis (ICA), partial
least squares (PLS), artificial neural networks (ANN), etc. Although satisfactory results have been obtained in many industrial processes with these mature methods, the equipment used in industrial plants has become more and more complicated and multifunctional, and the process state is often a combination of many operating conditions, which may degrade the performance of those methods. In fact, a method built on a single assumption sometimes fails to achieve the results we expect, as shown in the work of Venkatasubramanian et al. [4], because of the mismatch between the real process and the model assumption. This raises a question: is there a perfect method that can deal with any complex condition in a process? The answer is no. According to the No Free Lunch theorem [6], no algorithm is universally superior to all others; that is, we cannot design a single strategy that adapts to every situation, e.g. non-Gaussian data distributions, nonlinear relationships among process variables, frequent changes of operating conditions, etc. To address this problem, some researchers have put forward the idea of ensemble systems [7,8]. The main purpose is to combine, through efficient fusion algorithms, methods that place completely different emphases on modeling the data when dealing with the same problem. One key factor of an ensemble system is diversity, which means
each single model needs to express a different view of the system and thus make different errors, so that the total error can be reduced by the ensemble. Although there is no strict definition or explicit measurement of diversity, it has been illustrated that the greater the diversity, the better the fusion results can be [9,10]. For example, Polikar [11] showed experimentally that an ensemble of multiple classifiers performs better than a single one when the diversity is significant. The other key factor of an ensemble system is the decision making, or combination, over the various models. In general, there are two categories: utility-based methods and evidence-based methods. A representative of the former is the voting-based method [12–14], while the latter includes the Bayesian method [15], the Dempster–Shafer (D–S) method [16], decision templates [17], the Borda count [18], etc. Compared with other decision making approaches, the D–S framework provides a more flexible mathematical tool for dealing with imperfect information, together with a simpler computing procedure and a more concise expression of the final decision. Moreover, the D–S method places no restriction on the data distribution, which is convenient during data preprocessing. Owing to these advantages, D–S based methods have been widely used for decision making in the past years [15,19–22], and have been shown to be an appropriate approach for improving the performance of an ensemble model that deals with unreliable information [20].

In this paper, the Dempster–Shafer evidence theory is employed to develop decision fusion systems for fault detection and identification. To enhance the diversity of the fusion system, a resampling strategy is introduced as a data preprocessing procedure, in addition to the use of different types of data models. Furthermore, by defining a new correlation measurement index, classifiers with similar characteristics are pruned from the fusion system. As a result, both the computational efficiency and the classification reliability can be improved. Here, the fusion system that incorporates all classifiers is called the ALL fusion system, and the one with the pruning strategy is called the SELECTIVE fusion system.

The rest of the paper is organized as follows. Section 2 reviews preliminary knowledge about the Dempster–Shafer evidence theory. For reasons of length, we omit detailed preliminaries on the selected unsupervised and supervised modeling methods, since they can easily be found in many published books and papers. Section 3 describes the complete framework of the ALL and SELECTIVE fusion systems, including the definition of a new index that measures the correlations among different methods. Online fault detection and identification results based on the proposed framework are illustrated on the Tennessee Eastman (TE) process in Section 4. Finally, conclusions are drawn.

2. Dempster–Shafer evidence theory

The evidence theory was initially proposed by Dempster [23] concerning lower and upper probability distributions, and Shafer [16] proved the ability of belief functions to model uncertain knowledge. The complete Dempster–Shafer theory was then formulated; it enables us to combine evidence from different sources and arrive at a degree of belief, and it has been widely used in the field of information fusion.
In this section, some basic concepts and combination rules of the Dempster–Shafer theory are introduced; one can refer to Shafer [16], Smets and Kennes [24], or Yager [25] for more detailed instructions on this subject.

2.1. Basic definitions

Definition 1. Let $\Omega$ be a finite non-empty set of $N$ mutually exhaustive and exclusive hypotheses about some fault class domain,

$$\Omega = \{F_1, F_2, \ldots, F_N\} \quad (1)$$

and let $2^{\Omega}$ denote the power set of $\Omega$, composed of all propositions over $\Omega$:

$$2^{\Omega} = \{\emptyset, \{F_1\}, \{F_2\}, \ldots, \{F_N\}, \{F_1 \cup F_2\}, \{F_1 \cup F_3\}, \ldots, \Omega\}. \quad (2)$$
Definition 2. A basic probability assignment (BPA), also called a mass function or basic belief assignment, is a function mapping from $2^{\Omega}$ to $[0, 1]$ that assigns a belief value to each element of the power set. It satisfies the following two properties:

$$m : 2^{\Omega} \to [0, 1], \quad m(\emptyset) = 0, \quad \sum_{A \subseteq \Omega} m(A) = 1 \quad (3)$$

where $\emptyset$ is the empty set; a BPA with $m(\emptyset) = 0$ is called normalized. Each subset $A$ with $m(A) > 0$ is called a focal element of $m$.
Definition 3. The belief function $\mathrm{Bel} : 2^{\Omega} \to [0, 1]$ is defined as

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B). \quad (4)$$
Definition 4. The plausibility function $\mathrm{Pl} : 2^{\Omega} \to [0, 1]$ is defined as

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) = \sum_{A \cap B \neq \emptyset} m(B) \quad (5)$$
where $\bar{A}$ is the negation of a hypothesis $A$.

Definition 5. $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$ is the confidence interval, which describes the uncertainty about $A$. If the difference between $\mathrm{Bel}$ and $\mathrm{Pl}$ increases, the information available for fusion decreases. The difference therefore provides a measurement of the uncertainty about the level of evidence.

2.2. Rule of combination

When multiple independent sources of evidence are available, such as $m_1$ and $m_2$, the combined evidence can be obtained by Dempster's rule as follows:

$$m(\emptyset) = 0, \quad m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C) \quad (6)$$

where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ represents the mass that the combination assigns to the empty set; it is often interpreted as a measurement of the conflict between the two pieces of evidence and must satisfy $K \neq 1$. Clearly, the larger $K$ is, the more conflicting the pieces of evidence are, and the less information is available. Dempster's rule can easily be extended to more than two sources, as shown in Eq. (7): the BPAs of the first two classifiers ($m_1$ and $m_2$) are combined using Eq. (6) to obtain $m_{1,2}$, which is then combined with the BPA of the third classifier ($m_3$), and so forth, up to the $T$th classifier:

$$m_{1,2,\ldots,T} = m_1 \oplus m_2 \oplus \cdots \oplus m_T = ((m_1 \oplus m_2) \oplus m_3) \oplus \cdots \oplus m_T. \quad (7)$$
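To make the preliminaries concrete, the following minimal Python sketch (our own illustration, not code from the paper) implements Eqs. (4)–(7); representing focal elements as frozensets of fault labels is an implementation choice of ours:

```python
from functools import reduce

def combine(m1, m2):
    """Dempster's rule, Eq. (6): fuse two normalized BPAs given as
    dicts mapping frozenset (focal element) -> mass."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc      # accumulates K of Eq. (6)
    if conflict >= 1.0:
        raise ValueError("total conflict: K = 1, rule undefined")
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

def combine_all(masses):
    """Sequential fusion of T BPAs, Eq. (7)."""
    return reduce(combine, masses)

def bel(m, a):
    """Belief, Eq. (4): total mass of all subsets of A."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility, Eq. (5): total mass of sets intersecting A."""
    return sum(v for b, v in m.items() if b & a)

# Two pieces of evidence over Omega = {F1, F2}:
m1 = {frozenset({"F1"}): 0.6, frozenset({"F1", "F2"}): 0.4}
m2 = {frozenset({"F1"}): 0.7, frozenset({"F2"}): 0.3}
m12 = combine_all([m1, m2])
print(m12, bel(m12, frozenset({"F1"})), pl(m12, frozenset({"F1"})))
```

Here K = 0.18, so the combined mass on {F1} is 0.70/0.82 ≈ 0.854, illustrating how agreement between sources is reinforced after normalization.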
In recent years, Dempster–Shafer based fusion has been widely used in various fields, such as pattern recognition, process fault diagnosis, geographic information systems, and medical diagnosis. For example, Parikh et al. [26,27] used the Dempster–Shafer evidence theory to combine the outputs of multiple primary classifiers to improve overall classification performance. The effectiveness of this approach was demonstrated for detecting failures in a diesel
engine cooling system. Ghosh et al. [28] proposed a framework for distributed fault detection and identification and adapted the Dempster–Shafer evidence theory to combine diagnostic results at different levels of abstraction.

Fig. 1. Fault identification framework based on Dempster–Shafer evidence theory.

3. Decision fusion systems for fault detection and identification

If we could find or design a classifier with perfect generalization performance under all sorts of circumstances, there would be no need to resort to ensemble fusion techniques. In reality, however, noise, outliers, and missing data make such a perfect classifier unattainable; at the very least, it cannot be well designed for all conditions. Therefore, we exploit a system that includes many classifiers, with the objective of approaching the best classifier. When individual classifiers make errors on different instances, the intuition is that different types of classifiers can complement each other; specifically, we need classifiers whose decision boundaries are adequately different from one another, so that the classification performance can be improved by combination. Such a set of classifiers is said to be diverse. Clearly, when building a multiple classifier system, we must focus on the important element of diversity, as well as on resampling of the dataset. Fig. 1 shows the main architecture of the proposed decision fusion system. There are generally two stages: off-line modeling on the training data and online classification of unlabeled data. In detail, four main procedures are involved in the implementation of this system:

• Resampling of the training data.
• Selection of multiple classifiers.
• Testing the performance of the classifiers, summarized in confusion matrices.
• Combining decisions using the D–S evidence theory.

3.1. Resampling of training data

Diversity among classifiers can be achieved in many ways. The most popular is probably the resampling technique used in bootstrapping or bagging, where training datasets are obtained by randomly drawing samples with replacement from the whole training set. The resampling technique was first introduced
by Efron [29]. It recombines samples by randomly drawing with replacement and is grounded in probability and statistics: when n instances are randomly drawn from a training set of size n, each instance has probability 1 − (1 − 1/n)^n of being selected at least once. For large n, this is about 1 − 1/e ≈ 63.2%, which means each bootstrap subset contains only about 63.2% unique instances from the training set. In a randomly drawn subset, a sample that comes up more often contributes more of its information; resampling thus weakens the relevance among different training sets and improves the diversity of samples by injecting randomness. Let C be the number of fault classes in the process, represented as Ω = {F_1, F_2, ..., F_C}, with F_i denoting the ith fault, i = 1, 2, ..., C. For process monitoring purposes, we also need data collected when the process is under the normal operating condition, denoted by F_0. Each data matrix F_i has n rows (samples) and m columns (variables). Thus, there are in total C + 1 classes to identify when fault identification is required. The specific resampling steps are as follows (see the sketch after the classifier list below):

Step 1: Randomly draw n numbers from 1 to n, recording the index vector S.
Step 2: Rearrange each of the C + 1 class datasets according to the index S.
Step 3: Generate the new data matrices X.

3.2. Selection of multiple classifiers

Another way to achieve diversity in the decision fusion system is to select different classifiers, similar to the random subspace method proposed by Ho, where diversity is obtained by training each classifier on different features chosen from the primitive feature space [30]. For the selection of multiple classifiers or fault identification methods, two factors should be considered: the category of the classifiers and their number. Without loss of generality, we pick widely used classifiers from both unsupervised and supervised modeling methods, listed as follows:

(1) Unsupervised methods: principal component analysis (PCA), kernel principal component analysis (KPCA), and independent component analysis (ICA);
(2) Supervised methods: k-nearest neighbors (KNN), Fisher discriminant analysis (FDA), and artificial neural networks (ANN).
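As a minimal sketch of Steps 1–3 of Section 3.1 (our own illustration; the function and variable names are hypothetical), the same bootstrap index vector S is applied to every class dataset:

```python
import numpy as np

def resample(class_datasets, seed=0):
    """Steps 1-3 of Section 3.1: draw n row indices with replacement
    (Step 1), rearrange every one of the C+1 class datasets with the
    same index vector S (Step 2), and return the new matrices X (Step 3)."""
    rng = np.random.default_rng(seed)
    n = class_datasets[0].shape[0]
    S = rng.integers(0, n, size=n)      # Step 1: n draws with replacement
    return [F[S, :] for F in class_datasets]

# Sanity check of the 63.2% argument: a bootstrap sample covers about
# 1 - (1 - 1/n)^n of the unique original rows.
n = 960
print(1 - (1 - 1 / n) ** n)             # ~0.6321
```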
It is worth noting that the selected methods have their own modeling emphases in addressing different kinds of process data characteristics, such as non-Gaussian data distributions, variable nonlinearity, time-varying behavior, and multiple operating conditions. Selecting more classifiers can be expected to give more satisfactory results; however, for convenience of modeling and online implementation, we set the number of classifiers to six. One can easily extend this to more general cases within the same modeling framework.

3.3. Testing the performance of classifiers using confusion matrices
In this section, we test the performance of each classifier and store the information in a confusion matrix, which is usually constructed by testing on separate validation datasets [31]. Suppose Ω = {F_1, F_2, ..., F_C}, with F_i denoting the ith fault, i = 1, 2, ..., C; there are C classes in total, and T stands for the number of classifiers. For an instance x, let the output of the kth classifier be class F_j, i.e., E_k(x) = F_j. The confusion matrix CM^k for classifier k is typically represented as

$$CM^k = \begin{bmatrix} N_{11}^k & N_{12}^k & \cdots & N_{1C}^k & N_{1(C+1)}^k \\ N_{21}^k & N_{22}^k & \cdots & N_{2C}^k & N_{2(C+1)}^k \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ N_{C1}^k & N_{C2}^k & \cdots & N_{CC}^k & N_{C(C+1)}^k \end{bmatrix}, \quad k = 1, 2, \ldots, T \quad (8)$$

where the rows of the confusion matrix represent the actual classes F_1, F_2, ..., F_C, while the columns stand for the classes assigned by the kth classifier; note that the last column, C + 1, stands for the normal class, which also captures type I errors. The element N_{ij}^k of the confusion matrix represents the number of validation samples from class F_i that are assigned to class F_j by classifier k. Thus, for each classifier we obtain a confusion matrix containing the corresponding performance information.

3.4. Ensemble decisions using D–S evidence theory

In this section, two decision fusion systems are developed, named the ALL fusion system and the SELECTIVE fusion system. The main difference between them lies in a newly defined index, named the correlation coefficient, which measures the correlations among different classifiers. While the ALL fusion system incorporates all individual classifiers, the SELECTIVE fusion system first uses the correlation coefficient index to select individual classifiers and then adopts the selected classifiers for decision fusion.

3.4.1. ALL fusion system

To build the fusion algorithm, the confusion matrix obtained from the performance evaluation is used to estimate the belief functions of each classifier. The main steps of the fusion process are summarized as follows:

Step 1: Calculate the individual basic probability assignment (BPA) of each classifier:

$$m_k(F_i) = \frac{N_{ij}^k}{\sum_{i=1}^{M} N_{ij}^k}, \quad k = 1, 2, \ldots, T \quad (9)$$

where the BPAs satisfy the normalization

$$\sum_{A \in \Omega} m_k(A) = 1. \quad (10)$$

In Eq. (9), the element N_{ij}^k represents the number of validation samples from class F_i that are assigned to class F_j by classifier k, and M indicates the total number of classes over which column j is normalized.

Step 2: Compute the combined BPAs using Dempster's rule. Having obtained the individual BPAs of all classifiers, we combine them by Dempster's rule, Eqs. (6) and (7), to obtain the combined result over all T classifiers, i.e. m_{1,2,...,T}(F_i).

Step 3: Final decision making. Choose the class F_i with the maximum combined BPA as the final decision, that is,

$$\mathrm{Final}_{DS} = \arg\max_{i \in [1, C]} \left[ m_{1,2,\ldots,T}(F_i) \right]. \quad (11)$$
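The following sketch (ours, with hypothetical helper names) implements Steps 1–3 for the common case where every BPA is concentrated on singleton classes; in that case Dempster's rule reduces to a normalized elementwise product:

```python
import numpy as np

def bpa_from_confusion(cm, j):
    """Eq. (9): normalize column j of classifier k's confusion matrix
    into a BPA over the singleton classes F_1, ..., F_C."""
    col = cm[:, j].astype(float)
    return col / col.sum()

def fuse_decision(confusion_matrices, outputs):
    """Eqs. (6), (7) and (11) for singleton focal elements: combine the
    T classifiers' BPAs and pick the class with the largest mass.
    outputs[k] = j is the column assigned by classifier k; assumes K < 1."""
    combined = bpa_from_confusion(confusion_matrices[0], outputs[0])
    for cm, j in zip(confusion_matrices[1:], outputs[1:]):
        prod = combined * bpa_from_confusion(cm, j)  # only equal singletons intersect
        combined = prod / prod.sum()                 # the 1/(1-K) normalization of Eq. (6)
    return int(np.argmax(combined))                  # Eq. (11)
```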
3.4.2. SELECTIVE fusion system

To further improve the performance of the decision fusion system, the ALL fusion system is pruned to exclude some similar individual classifiers; we call the result the SELECTIVE fusion system. To date, several diversity assessments exist for measuring the correlations among classifiers, such as pairwise measures [32], which are calculated between two classifiers, and non-pairwise measures, e.g. the entropy measure and the Kohavi–Wolpert variance. In this paper, we propose a new measure, called the correlation coefficient (corr_ij), defined as

$$\mathrm{corr}_{ij} = \frac{\mathrm{cov}(cm_i, cm_j)}{\sqrt{D(cm_i)}\,\sqrt{D(cm_j)}} \quad (12)$$

where cm_i and cm_j are the confusion matrices of the ith and jth classifiers, and D(cm_i), D(cm_j) are the variances of the ith and jth confusion matrices. From a statistical viewpoint, corr_ij measures the linear relationship between two classifiers. Based on this index, our aim is to obtain small correlation coefficients, so that the corresponding classifiers have high diversity with respect to each other. Meanwhile, this index also provides a strong and reliable basis for implementing the combination with proper fusion rules. Therefore, after selecting different classifiers in the previous step, we use Eq. (12) to compute the correlation coefficients between classifiers. To prune similar classifiers from the fusion system, we set a threshold on the similarity: values greater than the threshold indicate high similarity among the classifiers involved, and most of them should be abandoned. Following the procedures of Section 3.4.1, the individual and combined BPA values can then be computed, based on which the final decision is made.
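A short sketch of Eq. (12) follows (our reading of the definition: cov(·,·) and D(·) are taken over the flattened confusion matrices):

```python
import numpy as np

def corr(cm_i, cm_j):
    """Eq. (12): linear correlation between two classifiers, computed
    from their flattened confusion matrices."""
    x = cm_i.ravel().astype(float)
    y = cm_j.ravel().astype(float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(x.var() * y.var())
```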
4. Case study: Tennessee Eastman challenge problem

In this section, the proposed method is tested for online fault identification on the Tennessee Eastman (TE) industrial challenge problem [33,34]. This process produces two products (G and H) and a byproduct (F) from reactants A, C, D, and E, as shown in Fig. 2. The process has five major units: a two-phase reactor, a product condenser, a flash separator, a recycle compressor, and a product stripper. It has 41 measured variables, comprising 22 continuous process measurements (Table 1) and 19 composition measurements, plus 12 manipulated variables. Twenty-one programmed faults can be introduced to the process, as tabulated in Table 2. More details of the process are explained in the book of Chiang et al. [1]. Here, we use the 22 continuous measurements for fault detection and identification.
Fig. 2. Tennessee Eastman process.

With reference to the fault information, faults 1 to 7 are step changes of process variables; faults 8 to 12 are random changes of variables; fault 13 is a slow shift of the reaction kinetics; faults 14, 15, and 21 are related to valve sticking; and faults 16 to 20 are unknown fault types. Some faults are easy to detect because they strongly affect the process and change the relationships among the process variables. Others (e.g., faults 3, 9, and 15) are difficult to detect because they are very small and have little influence on the process. In this process, the existing controller provides good recovery for faults 3, 4, 9, 14, 15, 16, and 19; therefore, these faults are excluded from the analysis in the present paper. We choose faults 1, 2, 5, 6, 8, and 12 for simulation to verify the effectiveness of the proposed method. Each normal and fault dataset contains 960 samples with a sampling interval of 3 min. All faults were introduced at sample 161, so there are in total 800 fault samples in each fault class. The parameters of each classifier are as follows. For fault detection with PCA, data collected in normal operation are used for modeling, and a fault is flagged when the 99% confidence limit of the T² statistic or the SPE value is violated; for fault diagnosis, a fault reconstruction scheme is used in which a combined discriminant of the T² and SPE statistics is developed for each fault class. The number of principal components is determined so that the cumulative variance contribution exceeds 80%. The confidence limit and the number of principal components are selected similarly for KPCA; the difference lies in the choice of kernel function parameters. Here, the kernel type is chosen as the RBF kernel, and the kernel width is set to 15. For the KNN classifier, the number of nearest neighbors is set to five by trial and error, and the Euclidean distance is used as the default distance metric. The number of independent components in the ICA model is set to 4, and the confidence limit of its statistics is also set to 99%. In the FDA model, the dimension of the embedded feature space is set to the number of classes minus one, which retains the maximum amount of discriminant information. For training the neural network, data collected under normal operation are used.
Table 1
Measurement variables in the TE process.

No.  Measured variable                   No.  Measured variable
1    A feed                              12   Product separator level
2    D feed                              13   Product separator pressure
3    E feed                              14   Product separator underflow
4    Total feed                          15   Stripper level
5    Recycle flow                        16   Stripper pressure
6    Reactor feed rate                   17   Stripper underflow
7    Reactor pressure                    18   Stripper temperature
8    Reactor level                       19   Stripper steam flow
9    Reactor temperature                 20   Compressor work
10   Purge rate                          21   Reactor cooling water outlet temperature
11   Product separator temperature       22   Separator cooling water outlet temperature
Table 2
Disturbances in the TE process.

Fault number  Process variable                                         Type
1             A/C feed ratio, B composition constant (stream 4)        Step
2             B composition, A/C ratio constant (stream 4)             Step
3             D feed temperature (stream 2)                            Step
4             Reactor cooling water inlet temperature                  Step
5             Condenser cooling water inlet temperature                Step
6             A feed loss (stream 1)                                   Step
7             C header pressure loss-reduced availability (stream 4)   Step
8             A, B, C feed composition (stream 4)                      Random variation
9             D feed temperature (stream 2)                            Random variation
10            C feed temperature (stream 4)                            Random variation
11            Reactor cooling water inlet temperature                  Random variation
12            Condenser cooling water inlet temperature                Random variation
13            Reaction kinetics                                        Slow drift
14            Reactor cooling water valve                              Sticking
15            Condenser cooling water valve                            Sticking
16            Unknown                                                  Unknown
17            Unknown                                                  Unknown
18            Unknown                                                  Unknown
19            Unknown                                                  Unknown
20            Unknown                                                  Unknown
21            Valve position constant (stream 4)                       Constant position

Fig. 3. Unsupervised methods for online fault detection tested on fault one data (top panel: PCA-T², KPCA-T², and ICA-I² statistics; bottom panel: PCA-SPE, KPCA-SPE, and ICA-SPE statistics; x axis: sampling time).
A sample is flagged as abnormal when the network output violates an interval of lower and upper bounds specified by the user, usually set to [0.8, 1.2]. For fault identification, a two-layer feed-forward back-propagation neural network of size [10 7] is used, which maps the online sample to the seven process states (normal and six fault classes), with a hyperbolic tangent sigmoid transfer function for the hidden layer and a linear transfer function for the output layer. Among the seven output nodes, the one with the largest value is taken as the process state or fault class if its value is close to 1 (1 ± 0.2). We have carried out two different scenarios for this case study. In scenario 1, we use the ALL fusion system, i.e., all classifiers are used to combine the final decision. In scenario 2, on the basis of the correlation coefficient, the SELECTIVE fusion system is developed to make the final decision. Detailed results of these two fusion systems are illustrated in the following two subsections.
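For reference, a hedged sketch of the PCA detection settings described above (not the authors' code; the empirical-percentile limits below stand in for the chi-squared/F approximations of the 99% confidence limits that are often used in practice):

```python
import numpy as np

def fit_pca_monitor(X_normal, cum_var=0.80, alpha=0.99):
    """Fit PCA on normal data; keep enough PCs for >= 80% cumulative
    variance; return a detector flagging T^2 or SPE limit violations."""
    mean = X_normal.mean(axis=0)
    Xc = X_normal - mean
    _, s, Vt = np.linalg.svd(Xc / np.sqrt(len(Xc) - 1), full_matrices=False)
    var_ratio = np.cumsum(s**2) / np.sum(s**2)
    a = int(np.searchsorted(var_ratio, cum_var)) + 1   # number of PCs
    P, lam = Vt[:a].T, s[:a] ** 2                      # loadings, eigenvalues

    def stats(X):
        Xc = X - mean
        T = Xc @ P
        t2 = np.sum(T**2 / lam, axis=1)                # Hotelling T^2
        spe = np.sum((Xc - T @ P.T) ** 2, axis=1)      # squared residual (SPE)
        return t2, spe

    t2n, spen = stats(X_normal)
    lim_t2, lim_spe = np.quantile(t2n, alpha), np.quantile(spen, alpha)

    def detect(X):                                     # True where a fault is flagged
        t2, spe = stats(X)
        return (t2 > lim_t2) | (spe > lim_spe)

    return detect
```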
4.1. Scenario 1
Corresponding to the 7 different modes (1 normal condition and 6 fault conditions) of the TE process, 7 datasets have been generated. Table 3 shows the fault detection and identification performance of the different classifiers for the six fault cases, where Det. stands for the number of delay samples for fault detection and Id. represents the number of delay samples for fault identification. It can be seen that the classifiers have different detection delays, and some methods cannot identify particular faults at all; e.g., k-nearest neighbors is unable to detect and identify fault 2. Taking fault 1 as an example, the performances of the various
classifiers are presented in Fig. 3 and Fig. 4, respectively. The dotted lines in Fig. 3 stand for the confidence limits of the different classifiers, and each color indicates the corresponding method; e.g., the blue dotted line represents the confidence limit computed by the PCA classifier. In Fig. 4, "0" stands for the normal condition and "1" stands for the fault condition.

Fig. 4. Supervised methods for online fault detection tested on fault one data (KNN, FDA, and ANN-BP panels; 0 = normal, 1 = fault).
Table 3
Performance results of the various classifiers on the six selected faults (Det. = detection delay in samples; Id. = identification delay in samples).

Fault index   PCA          ICA          KPCA         KNN          ANN          FDA
              Det.   Id.   Det.   Id.   Det.   Id.   Det.   Id.   Det.   Id.   Det.   Id.
1             8      6     8      6     7      6     7      4     12     8     1      2
2             16     0     15     23    17     0     –      –     30     26    3      9
5             11     –     0      12    0      –     16     –     2      –     0      0
6             0      7     3      10    0      0     24     0     17     20    0      0
8             26     26    19     21    14     22    24     26    1      12    5      15
12            22     22    25     22    2      0     16     –     30     –     18     35
Table 4
Comparison of different methods with and without the resampling framework (Det. = detection delay in samples with resampling; No-bag. = detection delay without resampling).

Fault index   PCA            ICA            KPCA           KNN            ANN            FDA            D–S
              Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.  Det.  No-bag.
1             7     7        4     6        4     4        7     0        7     0        1     1        0     0
2             16    26       16    23       15    15       –     –        25    26       9     9        9     9
5             11    12       0     12       0     0        16    12       2     30       0     0        0     0
6             0     3        3     10       0     0        24    20       –     –        0     0        0     0
8             26    26       19    21       14    20       24    25       1     –        5     15       1     14
12            22    22       25    22       2     6        16    23       30    12       0     35       0     6
While the unsupervised methods provide continuous monitoring results through the T² and SPE statistics, the results of the supervised methods are discrete. The results shown in Fig. 3 and Fig. 4 indicate that the ICA and FDA methods perform better than the others. Table 4 compares each single model and the D–S evidence theory based method in terms of detection delay. At the same time, we also provide the results of the D–S evidence theory based method tested without the resampling technique, denoted as No-bag. For example, for fault 2 the resampling framework performs substantially better for the three classifiers PCA, ICA, and ANN: with PCA, fault 2 is detected after a delay of 16 samples when resampling is adopted, but after 26 samples without it. It can be inferred that the resampling technique improves the fault detection performance of the classifiers. For identification of the selected fault cases, the confusion matrices of the six classifiers are shown together in Fig. 5. The symbol PCA CM, for example, represents the confusion matrix of the PCA classifier and indicates its classification performance; similarly, the confusion matrices of the other classifiers are denoted KNN CM, ICA CM, KPCA CM, etc. On the basis of the confusion matrices, we can calculate the BPA values of each class (normal and fault) and then compute the combined BPA value to make the final decision. Taking fault 8 for examination, the whole fusion process is displayed in Table 5. The ANN model first detects the fault at the 12th sample. According to the calculated BPA values, the combined BPA of class 8 is 0.8421, which is the largest; therefore, the fault is identified as fault 8. As the process continues to the 22nd sample, the KPCA method also detects the fault and identifies it as fault 8. While in this case the ANN method detected the fault first, in other situations other methods may detect the fault first. However, no matter which method detects the fault first, the decision fusion system can always identify the fault immediately. Therefore, compared with single classifiers, both the fault detection and the identification performance are improved by the decision fusion system.
4.2. Scenario 2

In this scenario, we test the SELECTIVE fusion system on the TE process. The main difference from scenario 1 is that we use the correlation coefficient (Corr) to measure the relationship between each pair of classifiers on the basis of their confusion matrices. The correlation coefficients of the different classifier pairs are presented in Fig. 6, with detailed values provided in Table 6. As can be seen, the Corr values between PCA and KPCA (0.9531), PCA and KNN (0.8003), and KPCA and KNN (0.7898) all exceed 0.7. This means that each of these pairs has a high mutual similarity, so the performance of one classifier in the pair is quite close to that of the other. When both are involved in the decision fusion system, the diversity of the system decreases, which may degrade the fusion performance. In this case, the SELECTIVE fusion system selects classifiers so that two highly similar classifiers are not used at the same time.
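The pruning rule itself can be sketched as below (our illustration; the 0.7 threshold is inferred from the three high-correlation pairs above, and the scan order is a hypothetical choice):

```python
def prune(classifiers, corr_fn, threshold=0.7):
    """Greedily keep a classifier only if its Corr with every
    already-kept classifier stays at or below the threshold."""
    kept = []
    for c in classifiers:
        if all(corr_fn(c, k) <= threshold for k in kept):
            kept.append(c)
    return kept

# With the Table 6 values, scanning in the order
# [ICA, KPCA, KNN, PCA, FDA, ANN] drops KNN (Corr 0.7898 with KPCA)
# and PCA (Corr 0.9531 with KPCA), matching the exclusion of PCA and
# KNN described in the text.
```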
Fig. 5. Confusion matrices of six fault detection methods.
Table 5
Implementation process of the ALL fusion system.ᵃ

The individual BPAs of the six classifiers are the same at every sample:
PCA: m(5) = 0.2595, m(8) = 0.7099, m(12) = 0.0306
KPCA: m(8) = 0.9765, m(12) = 0.0235
KNN: m(1) = 0.0115, m(5) = 0.0230, m(8) = 0.6207, m(12) = 0.3448
FDA: m(5) = 0.3523, m(8) = 0.0341, m(12) = 0.6136
ICA: m(5) = 0.1667, m(12) = 0.8333
ANN: m(5) = 0.0263, m(8) = 0.8421, m(12) = 0.1326

Sample (no.)  Class detected by PCA/KPCA/KNN/FDA/ICA/ANN   Combined BPA                                                            Decision
12            –/–/–/–/–/8                                  mtotal(5) = 0.0263, mtotal(8) = 0.8421, mtotal(12) = 0.1326             8
22            –/8/–/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027                  8
26            8/8/8/–/–/8                                  mtotal(1) = 0, mtotal(5) = 0, mtotal(8) = 0.9998, mtotal(12) = 0.0002   8
30            8/8/8/–/–/8                                  mtotal(1) = 0, mtotal(5) = 0, mtotal(8) = 0.9998, mtotal(12) = 0.0002   8

ᵃ No. = number of sample; m(5) = BPA value of class 5, m(8) = BPA value of class 8, and so on; mtotal = combined BPA over the classes of the classifiers that detect and identify the fault.
Table 7 shows the whole process of the SELECTIVE fusion system. The classifiers excluded during the fusion process (PCA and KNN, shown in light gray in the original table) are those highly correlated with KPCA: once one classifier among PCA, KPCA, and KNN has identified the fault, it is appropriate to discard the others. In this scenario, the SELECTIVE fusion system generates the same decision as the ALL fusion system but reduces the computational time. At the same time, it improves the combined BPA values by 5–9%, which means the SELECTIVE fusion system is more reliable than the ALL fusion system.
Fig. 6. Correlation coefficients of different classifiers (x axis: index of the classifier pair as listed in Table 6; y axis: correlation coefficient).

Table 6
Numerical values of Corr for different methods.

Index  C1     C2     Corr      Index  C1     C2     Corr
1      PCA    ICA    0.3105    9      PCA    ANN    0.4873
2      ICA    KPCA   0.3124    10     KPCA   KNN    0.7898
3      ICA    KNN    0.4002    11     KPCA   FDA    0.3879
4      ICA    FDA    0.2369    12     KPCA   ANN    0.4850
5      ICA    ANN    0.1254    13     KNN    FDA    0.5121
6      PCA    KPCA   0.9531    14     KNN    ANN    0.5119
7      PCA    KNN    0.8003    15     FDA    ANN    0.2630
8      PCA    FDA    0.4562

4.3. Further performance assessment
For further performance assessment, the ROC curve is used; it is a comprehensive index depicting the trade-off between true positive and false positive rates. In the ROC curve, the x axis indicates the false positive rate (FPR), which measures how many incorrect positive results occur among all negative samples during the test, while the y axis indicates the true positive rate (TPR), which measures how many correct positive results occur among all positive samples. Intuitively, the larger the area under the curve, the more sensitive the classifier, and thus the better its performance. All individual classifiers as well as the D–S evidence fusion method are shown together in Fig. 7. The true positive rate and false positive rate in these ROC curves are calculated as follows. True positive rate:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \quad (13)$$
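A small helper for these quantities (our own illustration, with hypothetical names) computes Eq. (13) and Eq. (14) below from 0/1 (normal/abnormal) label arrays:

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """TPR = TP/(TP+FN), Eq. (13); FPR = FP/(FP+TN), Eq. (14)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)
```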
Table 7
Implementation process of the SELECTIVE fusion system.ᵃ

The individual BPAs of the six classifiers are the same as in Table 5; the PCA and KNN classifiers (shown in light gray in the original table) are excluded from the fusion.

Sample (no.)  Class detected by PCA/KPCA/KNN/FDA/ICA/ANN   Combined BPA                                                   Decision
12            –/–/–/–/–/8                                  mtotal(5) = 0.0263, mtotal(8) = 0.8421, mtotal(12) = 0.1326    8
22            –/8/–/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027         8
26            8/8/8/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027         8
30            8/8/8/–/–/8                                  mtotal(5) = 0, mtotal(8) = 0.9973, mtotal(12) = 0.0027         8

ᵃ No. = number of sample; m(5) = BPA value of class 5, m(8) = BPA value of class 8, and so on; mtotal = combined BPA over the classes of the classifiers that detect and identify the fault.
Fig. 7. ROC curves for individual monitoring methods and D–S evidence fusion in the TE process (legend: FDA, KNN, KPCA, ICA, PCA, ANN, D–S, random classifier).

False positive rate:

$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad (14)$$
where TP indicates the number of true positive samples and FN the number of false negatives. Precisely, TP stands for the number of samples classified as abnormal that indeed belong to the abnormal class; an abnormal sample classified as normal is counted as a false negative (FN). Likewise, TN indicates the number of samples classified as normal that indeed belong to the normal class; a normal sample classified as abnormal is counted as a false positive (FP). We can observe from Fig. 7 that the D–S evidence fusion method covers the largest area, compared with any of the individual classifiers. Therefore, based on the ROC results, it can be concluded that the fault detection and identification performances are significantly improved by combining and fusing the results of multiple classifiers.

5. Conclusions
In this paper, two decision fusion systems, namely the ALL fusion system and the SELECTIVE fusion system, have been developed for fault detection and identification through the Dempster–Shafer evidence theory. To guarantee the diversity among the individual classifiers, a resampling method has been introduced as a data preprocessing step in both fusion systems. While the ALL fusion system uses all classifiers for decision making, similar classifiers are pruned in the SELECTIVE fusion system. The fault detection and identification performances of the two decision fusion systems have been evaluated on the TE process. Results show that the delay times of both fault detection and identification decrease in comparison with any single classification method. With the introduction of the classifier pruning strategy in the SELECTIVE fusion system, the computational efficiency is improved, as well as the reliability of the decision making system.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) (61273167), Project National 973 (2012CB720500), and the Open Research Project of the Key Laboratory of Advanced Control and Optimization for Chemical Processes, Shanghai (2014ACOCP01).

References

[1] L.H. Chiang, R.D. Braatz, E.L. Russell, Fault Detection and Diagnosis in Industrial Systems, Springer Science & Business Media, 2001.
[2] V. Venkatasubramanian, R. Rengaswamy, K. Yin, S.N. Kavuri, A review of process fault detection and diagnosis: part I: quantitative model-based methods, Comput. Chem. Eng. 27 (2003) 293–311.
[3] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, A review of process fault detection and diagnosis: part II: qualitative models and search strategies, Comput. Chem. Eng. 27 (2003) 313–326.
[4] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, K. Yin, A review of process fault detection and diagnosis, part III: process history based methods, Comput. Chem. Eng. 27 (2003) 327–346.
[5] Z. Ge, Z. Song, F. Gao, Review of recent research on data-based process monitoring, Ind. Eng. Chem. Res. 52 (2013) 3543–3562.
[6] D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization, IEEE Trans. Evol. Comput. 1 (1997) 67–82.
[7] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 993–1001.
[8] B.V. Dasarathy, B.V. Sheela, A composite classifier system design: concepts and methodology, Proc. IEEE 67 (1979) 708–713.
[9] P. Cunningham, J. Carney, Diversity versus quality in classification ensembles based on feature selection, in: R. López de Mántaras, E. Plaza (Eds.), Machine Learning: ECML 2000, Springer Berlin Heidelberg, 2000, pp. 109–116.
[10] L. Lam, Classifier combinations: implementations and theoretical issues, in: Multiple Classifier Systems, Springer Berlin Heidelberg, 2000, pp. 77–86.
[11] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. Mag. 6 (2006) 21–45.
[12] A.F.R. Rahman, H. Alam, M.C. Fairhurst, Multiple classifier combination for character recognition: revisiting the majority voting system and its variations, in: D. Lopresti, J. Hu, R. Kashi (Eds.), Document Analysis Systems V, Springer Berlin Heidelberg, 2002, pp. 167–178.
[13] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, 2004.
[14] J.A. Benediktsson, I. Kanellopoulos, Classification of multisource and hyperspectral data based on decision fusion, IEEE Trans. Geosci. Remote Sens. 37 (1999) 1367–1377.
[15] G. Niu, S.-S. Lee, B.-S. Yang, S.-J. Lee, Decision fusion system for fault diagnosis of elevator traction machine, J. Mech. Sci. Technol. 22 (2008) 85–95.
[16] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
[17] L.I. Kuncheva, J.C. Bezdek, R.P. Duin, Decision templates for multiple classifier fusion: an experimental comparison, Pattern Recognit. 34 (2001) 299–314.
[18] Y. Huang, C. Suen, The behavior-knowledge space method for combination of multiple classifiers, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Institute of Electrical Engineers Inc. (IEEE), 1993, p. 347.
[19] K. Ghosh, Y.S. Ng, R. Srinivasan, Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault diagnostic methods, Comput. Chem. Eng. 35 (2011) 342–355.
[20] M. Tabassian, R. Ghaderi, R. Ebrahimpour, Combination of multiple diverse classifiers using belief functions for handling data with imperfect labels, Expert Syst. Appl. 39 (2012) 1698–1707.
[21] O. Basir, X. Yuan, Engine fault diagnosis based on multi-sensor information fusion using Dempster–Shafer evidence theory, Inform. Fusion 8 (2007) 379–386.
[22] Y. Bi, J. Guan, D. Bell, The combination of multiple classifiers using an evidential reasoning approach, Artif. Intell. 172 (2008) 1731–1751.
[23] A.P. Dempster, A generalization of Bayesian inference, J. R. Stat. Soc. Ser. B: Methodol. 30 (1968) 205–247.
[24] P. Smets, R. Kennes, The transferable belief model, Artif. Intell. 66 (1994) 191–234.
[25] R.R. Yager, Dempster–Shafer belief structures with interval valued focal weights, Int. J. Intell. Syst. 16 (2001) 497–512.
[26] C.R. Parikh, M.J. Pont, N. Barrie Jones, Application of Dempster–Shafer theory in condition monitoring applications: a case study, Pattern Recognit. Lett. 22 (2001) 777–785.
[27] C.R. Parikh, M.J. Pont, N.B. Jones, F.S. Schlindwein, Improving the performance of CMFD applications using multiple classifiers and a fusion framework, Trans. Inst. Meas. Control 25 (2003) 123–144.
[28] K. Ghosh, S. Natarajan, R. Srinivasan, Hierarchically distributed fault detection and identification through Dempster–Shafer evidence fusion, Ind. Eng. Chem. Res. 50 (2011) 9249–9269.
[29] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, 1993.
[30] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 832–844.
[31] L. Xu, A. Krzyzak, C.Y. Suen, Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern. 22 (1992) 418–435.
[32] L.I. Kuncheva, C.J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn. 51 (2003) 181–207.
[33] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (1993) 245–255.
[34] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Comput. Chem. Eng. 19 (1995) 321–331.