Mutual information-based sparse multiblock dissimilarity method for incipient fault detection and diagnosis in plant-wide process

Mutual information-based sparse multiblock dissimilarity method for incipient fault detection and diagnosis in plant-wide process

Journal of Process Control 83 (2019) 63–76 Contents lists available at ScienceDirect Journal of Process Control journal homepage: www.elsevier.com/l...

2MB Sizes 1 Downloads 14 Views

Journal of Process Control 83 (2019) 63–76

Contents lists available at ScienceDirect

Journal of Process Control journal homepage: www.elsevier.com/locate/jprocont

Mutual information-based sparse multiblock dissimilarity method for incipient fault detection and diagnosis in plant-wide process Jing Zeng, Weiping Huang, Ziyang Wang, Jun Liang ∗ State Key Lab of Industrial Control Technology, Institute of Industrial Control Technology, Zhejiang University, Hangzhou, 310027, PR China

a r t i c l e

i n f o

Article history: Received 23 January 2018 Received in revised form 25 June 2019 Accepted 12 September 2019 Keywords: Mutual information Sparsification Multiblock dissimilarity strategy Plant-wide process detection and diagnosis

a b s t r a c t Multiblock methods have attracted much attention in monitoring of plant-wide processes. However, most recent works only provide rough division results and do not take the data distribution changes among the process into consideration. To solve these defects, a new multiblock monitoring scheme that integrates a mutual information (MI)-based sparsification method, dissimilarity analysis (DISSIM) method and support vector data description (SVDD) is proposed in this paper. This method makes use of the complex relations between variables and the connections between sub-blocks in process decomposition to produce easy-to-interpret sub-blocks related to the process mechanism. Then DISSIM method is applied in each sub-block for distribution monitoring. The multiblock DISSIM strategy can deal with the local behaviors and distribution changes, improving its sensitivity to incipient changes in plant-wide process. Finally, results in all blocks are combined with SVDD to provide a final decision, and a diagnosis method is proposed for fault diagnosis. Case studies upon a numerical case and Tennessee Eastman (TE) benchmark process demonstrate the effectiveness of our method. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction With the development of distributed control systems and new sensor technologies, processes data can be easily collected, and multivariate statistical process monitoring (MSPM) methods for keeping process safety and improving product quality become a research hotspot. MSPM methods make full use of process data and do not depend on the prior knowledge of the process [1–4]. Principal component analysis (PCA) and its extensions are the most commonly used MSPM methods for fault detection and diagnosis [5–10]. However, for plant-wide processes with complex structure, these global modeling methods may not function well. The plant-wide processes, containing multiple different parts and operation units, are featured with more abundant variables and more complex structure. A fault might affect only a few variables or units and shows local characteristics. Traditional global methods may submerge such local information, leading to poorer monitoring performance. Applying multiblock strategy is an efficient strategy to reduce the complexity and extract the local information, and faulty sections can be initially isolated by detecting faulty event in each block [11–17]. Westerhuis et al [14] compared the perfor-

∗ Corresponding author at: Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, PR China. E-mail address: [email protected] (J. Liang). https://doi.org/10.1016/j.jprocont.2019.09.004 0959-1524/© 2019 Elsevier Ltd. All rights reserved.

mance of several multiblock and hierarchical PCA and PLS methods. Zhang el al [15]. gave a decentralized fault diagnosis method based on the multiblock kernel PLS. Zhu et al. [16] promoted a distributed PCA method for the big data problem. The effectiveness of multiblock methods is proved in the above works, but they are based on prior knowledge for process decomposition, which is not always available. Besides, the division results may be subjected to the cognition or experience of operators. Thus, totally data-driven process decomposition methods have attracted considerable attention in the past few years. Ge and Song [18] proposed distributed PCA method based on the loading vectors of PCA. Grbovic et al [19] utilized a sparse PCA (SPCA) to decompose the process, and Xie et al [20] proposed a shrinking PCA (ShPCA) method for division. These data-driven methods can automatically divide the process into several sub-blocks. But they are implemented based on covariance and restricted to linear limitations. As an essential step of multiblock method, a reasonable and interpretable process decomposition is important for both fault detection and diagnosis. Considering the realistic nonlinear issue, mutual information (MI) based methods have been developed [21–25] recently. Jiang and Yan [21] proposed a process decomposition method which utilized MI to measure both the linear and nonlinear relations between variables. Then MI-spectral clustering strategy [22,23] are developed. Huang and Yan [24] focused on the independent variables and performed them separately. Xu et al [25] used minimal

64

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

nections between sub-blocks into account. This method gives a convincing division result with more interpretable and less redundant sub-blocks. (ii) A new multiblock method based on DISSIM method is proposed, which can efficiently extract the local incipient fault information, to improve the sensitivity in incipient fault. (iii) Incorporating the MI-based sparsification method, DISSIM, and SVDD, a new multiblock monitoring scheme is developed for more effective incipient fault monitoring in plant-wide process. It also provides a new way for monitoring the distribution and structure changes in plant-wide process. The rest of the paper is organized as follows: Section 2 gives brief reviews of the DISSIM and MI techniques. In Section 3, the proposed MI-SMDISSIM method is explained in detail, including MI-based sparsification method, fault detection and diagnosis strategy. Then two case studies upon a numerical case and the TE process are analyzed to demonstrate the effectiveness of the proposed method in Section 4. Finally, conclusions and discussions are provided in Section 5. Fig. 1. The flowchart of the MI-SDISSIM method.

redundancy maximal relevance (mRMR) method to reduce variable redundancy but gathered the remaining variables into one sub-block. Although these methods take the nonlinear relationship into account, the determination of the division limit or the cluster number depends on experience and the division results are rough. Moreover, these methods do not consider the relationship between sub-blocks during the decomposition process. This paper proposes a new MI-based sparsification method for a more reasonable block division result. After process decomposition, local models are constructed in the sub-blocks. PCA and its extensions are the commonly used methods. These methods use the distance indices for fault evaluation and may not be sensitive to the changes in data distributions or geometric structures. Kano et al [26] proposed a dissimilarity analysis method, termed as DISSIM, to evaluate the distribution difference between normal and faulty conditions. It is based on the idea that a change of operating condition can be detected by monitoring the distribution of process data, and uses Karhunen-Loeve (KL) expansion [27] to evaluate the dissimilarity. Zhao et al. applied DISSIM method to batch process [28] and nonlinear process [29]. Although the DISSIM method has been successfully used in detecting incipient fault [27,30], it is a global modeling method. As mentioned before, in plant-wide processes the incipient fault may affect a few variables and its information may be suppressed when modeling with all the variables. The performance of global DISSIM method to incipient fault detection suffers from deterioration, which also results in less information for fault diagnosis and may even lead to wrong result. Thus a multiblock strategy is required. To address the above issues, this paper proposes a new MI-based sparse multiblock DISSIM (MI-SMDISSIM) method for incipient fault detection and diagnosis in plant-wide processes. Referring to the framework for multiblock monitoring scheme [17], the flowchart of the proposed MI-SMDISSIM method is displayed in Fig. 1. Firstly, a new MI-based sparsification method is proposed for more reasonable process decomposition. After block division, DISSIM method is applied in each sub-block, exploring the local behaviors. Their monitoring results are then combined together with SVDD to make the final decision. Once the fault is detected, a fault diagnosis strategy is activated based on the monitoring result of each sub-block. The contribution of the paper can be summarized as follow: (i) A new MI-based sparsification method is proposed. It takes both the complex relations between variables and the con-

2. Preliminaries 2.1. DISSIM Kano et al. [20] proposed a dissimilarity factor to evaluate the difference between the distributions of two datasets. Denote X1 and X2 are two datasets with Ni (i = 1, 2) samples and m process variables. Dataset X1 is chosen as the reference and normalized to zero-mean and unit-variance, the other dataset is also scaled with the same normalization information. The covariance matrix of the mixture of both datasets is given by: 1 S= N1 + N2 − 1 =



X1T

X2T

   X1 X2

N1 − 1 N2 − 1 S1 + S2 N1 + N2 − 1 N1 + N2 − 1

(1)

where Si = XiT Xi /(Ni − 1). The covariance matrix S can be decomposed using the singular value decomposition (SVD): S = VV T

(2) Rm×m

where  ∈ is a diagonal matrix with the eigenvalues of S as diagonal elements; V contains the mutual orthonormal vectors. Then the original dataset Xi can be transformed as:



Yi =

Ni − 1 X V−1/2 N1 + N2 − 1 i

(3)

With the transformed matrix, the covariance matrices of Y1 and Y2 , which share same eigenvectors, can be calculated and the corresponding eigenvalues j (j = 1, 2, · · ·, m) are obtained. By transforming the two datasets resulting in the same eigenvectors, the difference of their eigenvalues can be used to evaluate the distribution difference between two sets. A dissimilarity index D is defined as:

4  D= j − 0.5 m m

2

(4)

j=1

The dissimilarity index D changes between zero and one. Index D is close to zero if two datasets are similar to each other and it is near one if they are quite different. A set of normal indices D can be obtained by exploring the normal training data. For D values cannot satisfy a defined distribution, its control limit CtrD can be calculated

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

65

using kernel density estimation (KDE) [31,32] with Gaussian kernel function and a confidence level of .

a real symmetric and (semi-)positive define matrix, which can be decomposed using SVD algorithm:

2.2. Mutual information

SMI = P0 MI P0T

In information theory [33,34], the MI is a non-parametric and non-linear measurement index, which can quantify the mutual independence of two variables from the aspect of entropy. It considers the higher-order statistic which can perform well for both linear and nonlinear relationship. Considering two random variables x1 and x2 , the Shannon entropy estimation of x1 is defined as



H(x1 ) = −

p (x1 ) log p(x1 )dx1

(5)

where p (x1 ) is the probability density function of x1 . The MI between these two variables can be calculated as:



I(x1 , x1 ) =

p(x1 , x2 ) log

p(x1 , x2 ) dx1 dx2 p (x1 ) p (x2 )

(6)

where p (x1 , x2 ) is the joint probability density function. Using the entropy of x1 and x2 , Eq. (6) can be rewritten as: I(x1 , x1 ) = H(x1 ) + H(x2 ) − H(x1 , x2 )

(7)

where H (x1 ) and H (x2 ) are the marginal entropies of x1 and x2 , and H (x1 , x2 ) is the joint entropy which is given as follows:



H(x1 , x1 ) = −

p(x1 , x2 ) log p(x1 , x2 )dx1 dx2

(8)

The calculation of MI based on Eq. (6) is intensive and inefficient. A toolbox named MIToolbox has been developed and widely used for calculating MI [35]. Variables with stronger relevance will have larger MI, whereas, the MI value will be equal to 0. 3. MI-based sparse multiblock DISSIM monitoring scheme Followed the flowchart in Fig. 1, the detailed description of the proposed method is given in this section. 3.1. MI-based sparsification method

(10)

where MI ∈ Rm×m is a diagonal matrix with the eigenvalues 0 ≤ 01 ≤ 02 ≤ · · · ≤ 0m of SMI as diagonal elements, and the columns





of loading matrix P0 = p01 , p02 , · · ·, p0m ∈ Rm×m are corresponding eigenvectors. The elements in eigenvector p0i show the strength of relationship between variables in both linear and nonlinear aspects. To extract the strongly related variables and give a clear interpretation of the sub-blocks, the following MI-based optimization problem is proposed to sparsify the loading matrix:

 

p∗i = argmin pTi SMI pi + ˇ|pi  |

 pi 2 s.t. |pi  | = 1 2

1

(11)

pTi p∗j = 0, j = 1, 2, · · ·, i − 1 where pi (i = 1, 2, · · ·, m) ∈ Rm×1 is the sparse loading vector,

m      |pi  | = pi,k  is the l1 norm to introduce sparsity into pi and 1 k=1

pi,k is the kth element of pi , ˇ > 0 controls the sparsity of pi . If the variables are strongly related, the corresponding elements in pi will be increased the rest elements will be shrunk to zeroes. Thus the interpretability of pi is increased. It should be noted that the MI-based optimization problem is a non-convex problem with the l1 norm and the constraints. An algorithm with interior point algorithm is developed for solving the optimization problem as follows: Algorithm 1. (Iterative interior point algorithm for solving MIbased optimization problem) (i) To get the sparse loading matrix, the following step is continued until m sparse loading vectors are calculated, and we initially set i = 0; (ii) Set i = i + 1, a = 1. Start the algorithm by setting the ith loading vector in P0 as the initial solution of the ith sparse loading (0) vector, that is pi = p0i ; (a−1)

Many traditional methods, including SPCA and ShPCA, are based on the covariance of the dataset, which only focus on the linear relationship between variables and perform poor for the nonlinear relationship. To solve this problem, we make use of MI to extract the complex relationships between variables and propose a MI-based sparsification method for block division. This method also takes the relations between sub-blocks into account with the orthogonal constraints to reduce the redundancy between sub-blocks. Given a normal training dataset X ∈ RN×m with N samples and m process variables, the MI values between every two variables are calculated and a MI matrix SMI ∈ Rm×m is obtained as:



I1,1

I1,2

···

I1,m

⎢I ⎢ 2,1

I2,2

···

I2,m

SMI = ⎢ ⎢

⎣ ... Im,1

.. .

..

Im,2

···

.



⎥ ⎥ ⎥ .. ⎥ . ⎦

(iii) Based on the last solution pi , the optimization probin Eq. (11) can be reformulated as followed, where yi = lem   yi,1 , yi,2 , · · ·, yi,m denotes the absolute value terms in the l1 norm of pi :

p∗i = arg min pTi SMI pi + ˇ pi

yi,k

k=1

(a−1)

s.t. pTi pi

m 

=1

(12)

− yi,k ≤ pi,k ≤ yi,k , k = 1, 2, · · ·, m pTi p∗j = 0, j = 1, 2, · · ·, i − 1 note that the orthogonal constraints are vanished for i = 1;

(9)

Im,m

where Ii,j (i, j = 1, 2, · · ·, m) is the MI value between xi and xj . Except for the simple linear relationship, Ii,j can extract the nonlinear relevance between xi and xj by accounting more relation than the covariance value. Similar to the covariance matrix, MI matrix is

(i) Solve the optimization problem using the interior point  algo(a) (a) rithm. Then, obtain pi by normalizing p∗i : pi = p∗i / |p∗i  |;

 

(a)

(ii) If|pi

(a−1)

− pi

   | ≤ ε, go to Step vi, else go to Step iii. Note ε is 1

the convergence threshold and we take ε = 0.001 in our work; (iii) Repeat Step ii-v until all the loading vectors are calculated, that is i = m.

66

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

Fig. 2. Monitoring scheme of the proposed MI-SMDISSIM.

Fig. 3. (a)MI-based sparse loading matrix, (b)covariance-based sparse loading matrix.

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

67

Fig. 4. Monitoring result of the Fault 1: (a) GDISSIM; (b) MI-SMDISSIM; (c) DISSIM sub-block 3; (d) diagnosis result of GDISSIM; (e) diagnosis result of MI-SMDISSIM.

Since ˇ > 0 tunes the sparsity, the value of ˇ is selected as the ones minimizing the following Bayesian information criterion (BIC) [20]:

BICˇ,i = log

 T ∗ pi

SMI p∗i



 ∗ log N

+ f pi

N

such sub-block. The process is finally divided into B sub-blocks, and the original training data can be written as: X = [X1 , X2 , · · ·, XB ]

(13)



where f p∗i collects the number of non-zero elements in vector p∗i . Since f (·) is not continuous and p∗i cannot be explicitly express by ˇ, we practically choose ˇ from a finite set ˇ ∈  l 2 : l ∈ Z is an interger and lmin ≤ l ≤ lmax , where lmin and lmax can be selected with specific process.

3.2. Multiblock DISSIM fault detection With the MI-based sparsification method proposed in section 3.1, we can obtain a sparse loading matrix P with m sparse vectors, that is P = [p1 , p2 , · · ·, pm ]. Each sparse loading vector contains a few non-zero elements and can be used for process decomposition. Then m sub-blocks can be collected. If the variable set of one subblock is the subset of another sub-block’s variable set, eliminate

(14)

where Xb ∈ Rmb ×N (b = 1, 2, · · ·, B) denotes the sub-dataset in the bth sub-block with mb process variables and N samples. Then the multiblock DISSIM method is implemented. In each sub-block, the moving window length is set as L and the first moving window is chosen as reference distribution. Then the corresponding dissimilarity indices Db ∈ RN−L and the control limit CtrDb are calculated. Then SVDD [38,39] is used to merge the results of each subblock. The input data Y of SVDD are constructed by the dissimilarity indices of each block as: Y = [D1 , D2 , · · ·, DB ]

(15)

A hypersphere with center a and radius R can be calculated. For a new online moving window Z ∈ Rm×L , it is first divided into B sub-moving window according to the sub-blocks, that is: Z = [Z1 , Z2 , · · ·, ZB ]

(16)

68

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

Fig. 5. Monitoring result of the Fault 2: (a) GDISSIM; (b) MI-SMDISSIM; (c) DISSIM sub-block 3; (d) PCA sub-block 3; (e) diagnosis result of GDISSIM; (f) diagnosis result of MI-SMDISSIM.

The dissimilarity index in each block is calculated:



Dz = Dz,1 , Dz,2 , · · ·, Dz,B



(17)

where Dz,b is the dissimilarity between the sub-reference data and the sub-moving window. Based on the corresponding monitored vector ytest = DZ , the distance d between the monitored data and the center a can also be calculated with SVDD. A monitoring index can be defined as:

DR =

d R

(18)

And the control limit is 1. That is the online moving window is referred as normal if DR is less than 1; otherwise the window is a faulty window.

3.3. Fault diagnosis Once the fault has been detected, the diagnosis method is activated to find the root cause. Based on the work of Verron et al. [36], a two-step diagnosis method is proposed in this section. Firstly, considering the non-faulty variables may be affected by the faulty variables and mislead the diagnosis result, the variables in the non-faulty blocks are eliminated to reduce the ‘noisy’ effect. Then

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

69

Fig. 6. Monitoring result of the Fault 3: (a) GDISSIM; (b) MI-SMDISSIM; (c) DISSIM sub-block 1; (d) DPCA; (e) diagnosis result of GDISSIM; (f) diagnosis result of MI-SMDISSIM.

the contribution of corresponding variables can be calculated with MI. Define ˝ as a variable set containing all the variables in the plant-wide process. If one online moving window is referred as a faulty window, the dissimilarity indices of sub-blocks are examined. The sub-block whose dissimilarity index Db ≤ CtrDb is referred as non-faulty sub-block and its variables are allocated into the non-faulty-variables set ˝n . Then the faulty-variables set ˝f can be obtained as follow: ˝f = ˝ − ˝n

(19)

The numbers of variables in ˝n and ˝f are mn and mf respectively, and mn + mf = m. Then the data in reference window and online moving window can be expressed as:



Xr = Xrf , Xrn



Z = Zf , Zn





∈ Rm×L

∈ Rm×L

(20)

where Xrf and Z f represent the reference data and online data corresponding to mf faulty variables. They can be treated as two different classes with label C = 0 and C = 1 respectively. A super dataset corresponding to faulty variables is constructed by putting Xrf and Zf





together as Tf = Xrf , Zf ∈ Rmf ×2L . The MI between a faulty variable and the multinomial random variable C can be computed as:





I Tf,k , C =





1 log T2 f,k 2



−p (C = 1) log Z2





− p (C = 0) log X2



rf,k



(21)

f,k





where Tf,k k = 1, 2, · · ·, mf is corresponding to the kth faulty variable; p (C = 0) = p (C = 1) = 0.5 are the probability distribution of the two classes; T2 is the variance of Tf,k ; X2 and Z2 are the f,k

rf,k

f,k

70

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

variances of Xrf,k and Zf,k respectively. Then the contribution of the kth faulty variable in ˝f is calculated as:



Contk =

I Tf,k , C mf  



I Tf,k , C



(22)

k=1

Variable with larger Contk has more information to discriminate Xrf and Zf , and it is considered to be more responsible for the fault. For the sub-blocks are well defined, the diagnosis result can be easily interpreted. And the elimination of the non-faulty variables provides a subset of possibly responsible variables, which can reduce the misjudgement of the root cause.

dynamic correlation. This numerical process is simulated as follows: x1 (t) = s1 (t) + s2 (t) + 0.18x1 (t − 1) + 0.23x2 (t − 1) + e1 (t) x2 (t) = 1.56s1 (t) + 1.14s2 (t) + 0.87x1 (t − 1) − 0.25x2 (t − 1) + e2 (t) x3 (t) = x12 (t) − 2.14x2 (t) + e3 (t) x4 (t) = −x13 (t) + 3x22 (t) + e4 (t) x5 (t) = s2 (t) + e5 (t) x6 (t) = x52 (t) − 2x5 (t) + e6 (t) x7 (t) = −x53 (t) + 3x5 (t) + e7 (t)

3.4. Fault monitoring procedure

x8 (t) = 1.74s2 (t) + 1.89s3 (t) + e8 (t)

The whole procedure of the MI-SMDISSIM monitoring scheme is illustrated in Fig. 2, and its detailed steps for the fault detection and diagnosis are shown as follows: Offline modeling:

x9 (t) = 1.42s2 (t) + 1.92s3 (t) + e9 (t)

(i) Obtain the training data X under normal operating condition and calculate the MI matrix SMI ; (ii) Solve the MI-based sparse optimization problem with Algorithm 1 and obtain the sparse loading matrix P; (iii) Divide the process variables into B blocks with sparse loading matrix P; (iv) Implement DISSIM method in each block and calculated the dissimilarity indices Db and the corresponding limits CtrDb . The length of the moving window is set to be L; (v) Take the indices in each block as the input of SVDD and calculate the center a and the radius R of the hypersphere; Online monitoring: (i) Collect an online moving window Z and divide it into B submoving windows; (ii) Calculate the dissimilarity index Dz,b between sub-moving window and corresponding reference data; (iii) Take the indices Dz,b as the input of SVDD, calculate the distance d and index DR; (iv) If DR > 1, check the dissimilarity index Dz,b of each sub-moving window, obtain the faulty variables set ˝f and turn to step (v); else turn to online step (i) and wait for another online moving window; (v) Divide the data in reference distribution and online moving window into faulty and non-faulty sets respectively, and construct the super dataset Tf ; (vi) Calculate the contribution of the kth faulty variable in ˝f and analyze the root cause of the fault. 4. Case studies In this section, the proposed method is first applied to a numerical case with complex correlations to demonstrate the effectiveness of the proposed MI-based sparsification method, and three faults are constructed to verify the method’s monitoring performance. Then, our method is implemented on the TE benchmark process to show its advantages. 4.1. Case study on a numerical system The proposed MI-SMDISSIM method is applied for fault detection and diagnosis on a simple plant-wide process with both linear and nonlinear correlations. The process contains 12 variables and could be divided into four parts. The first part contains a simple

x10 (t) = x82 (t) − 2x9 (t) + e10 (t) x11 (t) = 3x82 (t) − x93 (t) + e11 (t) x12 (t) = e12 (t) (23) where factors s1 , s2 , s3 are uncorrelated Gaussian distributed data sources with zero-mean and unit variance; ei (i = 1, . . ., 12) are independent Gaussian noise N (0, 0.1). Four hundred samples are generated as the training data, and xi (i = 1, . . ., 12) are measurements used for process monitoring. Besides, three test datasets with four hundred samples are generated as follows: Fault 1: a step change of 2.5 is added on s3 from the 201st sample; Fault 2: a ramp change of 0.05 (i − 200) is added on x8 from the 201st sample, where i is the sample number; Fault 3: the parameter 0.23 from x2 (t − 1) to x1 (t) drifts to 0.8 from the 201st sample; Based on the training data, the MI matrix and covariance matrix are calculated respectively, and then the sparse loading matrices are constructed. The sparse loading matrix based on is illustrated in Fig. 3(a). The range of paramthe MI matrix  eter ˇ is 2−1 , 26 . By choosing the minimum BIC values, we can get the proper ˇ for each vector and their   values are 20 , 2−1 , 2−1 , 2−1 , 20 , 2−1 , 2−1 , 2−1 , 2−1 , 20 , 2−1 , 24 . The nonzero elements in each loading vector denote that the corresponding variables are strongly related, and finally the numerical process is divided into four sub-blocks, {x1 , x2 , x3 , x4 }, {x5 , x6 , x7 }, {x8 , x9 , x10 , x11 } and {x12 }, which is totally consistent with the description in Eq. (23). In contrast, as shown in Fig. 3(b), the covariance-based sparse loading matrix fails to extract the complex relationship and puts nonlinear-correlated variables into different block, such as x1 , x2 and x3 , x4 . After the division, DISSIM method is carried out in each subblock. A global DISSIM (GDISSIM) model is also constructed for comparison. During the construction, the window length is set to be 30 by trial and error. 371 moving windows are generated in each test dataset and the fault is introduced from the 172th the moving window. The confidence level is defined as 0.99. Monitoring results of Fault 1 is plotted in Fig. 4. Fault 1 is a step change in factor s3 which only affects the third sub-block {x8 , x9 , x10 , x11 }. As shown in Fig. 4(a), GDISSIM method has a poor performance. In contrast, Fig. 4(c) show can successfully detects the local abnormal variation and has a better performance. Fig. 4(c) shows that the fault is successfully detected in sub-block 3. The fault diagnosis results of GDISSIM and MI-SMDISSIM are presented in Fig. 4(d) and (e). Fig. 4(e) successfully identified x8 , x9 , x10 , x11 as the responsible variables, while Fig. 4(d) also takes x4 and x7 as the faulty variables even though x4 and x7 are independent from s3 .

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

The monitoring results of Fault 2 are presented in Fig. 5. Fault 2 is a ramp local change and has little effect on the process at the early stage. Our MI-SMDISSIM method can detect the fault at 189th moving window, while GDISSIM method does not detect the fault until the 241th moving window. Besides, Fig. 5(c) and (d) show that in sub-block 3 DISSIM method is more sensitive to incipient fault than PCA method. These prove the conclusion that the proposed method is boosted with its sensitivity to the incipient local fault even if magnitude of the change is small. The diagnosis result MI-SMDISSIM is plotted in Fig. 5(f). The proposed method discovers that variables x8 , x10 , x11 are firmly related to the fault while variable x9 has little to do with the fault. And this is totally right according to Eq. (23). The monitoring results of Fault 3 are plotted in Fig. 6. In this case, a parameter in calculating variable x1 has changed, which influences all the variables in sub-block1. The proposed MI-SMDISSIM method is sensitive to this local structure change and outperforms the GDISSIM method with a detection rate of 0.895, as shown in Fig. 6(a) and (b). The dynamic PCA (DPCA) method is also applied and fails to detect the local structure change, as shown in Fig. 6(d), even it extracts the relationship between past data and current data. This verifies the superiority of the proposed method. Fig. 6(f) shows that the faulty variables are x1 , x2 , x3 , x4 , which matches the reality.

Fig. 7. MI-based sparse loading matrix.

Table 1 Block Division Result.

4.2. Case study on the TE process The proposed fault detection and diagnosis method is illustrated on the TE process which is a widely used simulated industrial benchmark proposed by Downs and Vogel [37] in this section. It involves five major operation units: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor and a product stripper. A set of 53 variables can be measured in the process, including 22 continuous process measurements, 11 process manipulated variables and 19 composition measurements, and 33 continuous variables listed in Appendix Table A1 are used for process monitoring. Note that RCW is the abbreviation of reactor cooling water, CCW is abbreviated of condenser cooling water and SCW stands for separator cooling water. One normal training dataset and 21 faulty test datasets are simulated. Each dataset contain 960 samples and all the faults are introduced at the 161th sample. For faults 3, 9, 15 are very subtle faults which are hard to be detected, they are not considered in the evaluation. The other 18 faults are listed in Appendix Table A2.

71

block no.

variables no

block no.

variables no

1 2 3 4 5 6 7

7, 13, 16, 18, 19, 20, 27, 31 2, 9, 21, 23, 32 15, 30 12, 29 17, 33 1, 25 10, 28

8 9 10 11 12 13

11, 22 3, 24 4, 26 6, 8 5 14

The training dataset is used to calculated MI-based sparse  load ing matrix, which is plotted in Fig. 7. The range of ˇ is 2−4 , 28 , and the kernel parameter of SVDD is set as the input dimension m. Then with the proposed division strategy, the whole process can be divided into 13 sub-blocks and the variables distribution for each sub-block is listed in Table 1. The proposed method gives more interpretable sub-blocks, which are closely related to the process mechanism or the control effect. For example, sub-block 1 denotes the relation of pressure and temperature between different units and compressor work. Sub-block 4 shows that the product separator level is affected by the separator pot liquid flow valve as a result of feedback control. For all 21 faults, the detection rates of MI-SMDISSIM and GDISSIM are presented in Table 2. The window length is set as L = 100

Table 2 Fault Detection Rates of the TE Process. Fault No

T 1 2 4 5 6 7 8 10 11 12 13 14 16 17 18 19 20 21

DPCA

PCA 2

0.991 0.984 0.209 0.241 0.991 1 0.969 0.296 0.406 0.984 0.936 0.993 0.135 0.763 0.893 0.110 0.318 0.393

SPE

T2

SPE

0.999 0.958 1 0.209 1 1 0.836 0.258 0.749 0.895 0.953 1 0.274 0.954 0.901 0.125 0.498 0.473

0.991 0.985 0.054 0.245 0.989 1 0.971 0.276 0.236 0.991 0.936 0.961 0.110 0.773 0.891 0.005 0.336 0.356

0.998 0.974 1 0.264 1 1 0.964 0.361 0.899 0.943 0.953 1 0.381 0.971 0.900 0.809 0.575 0.560

GDISSIM

MI-MBPCA

MISMDISSIM

0.650 0.297 0.393 0.448 0.934 0.584 0.928 0.827 0.941 0.988 0.900 0.577 0.963 0.933 0.863 0.956 0.844 0.423

0.997 0.986 1 1 1 1 0.991 0.850 0.776 0.996 0.957 1 0.906 0.961 0.912 0.737 0.772 0.522

0.980 0.976 0.829 0.990 1 0.999 0.970 0.876 0.984 0.994 0.930 0.995 0.863 0.966 0.900 0.968 0.878 0.900

72

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

Fig. 8. Monitoring results of Fault 5: (a) PCA; (b) DPCA; (c) GDISSIM; (d) MI-SMDISSIM; (e) DISSIM sub-block5.

and the confidence level is 0.99. Also the detection rates of GPCA, DPCA and MI-MBPCA [21] are calculated for comparison. The bold value for one fault indicates the highest detection rate. It can be seen that the performance of the MI-SMDISSIM method is more robust. It is the optimal one in many cases and shows a similar performance in the rest cases. Besides, our method focuses on the data distribution changes and is sensitive to incipient faults, especially in fault 21. The performance of our method can be adjusted by the window length, and Table 3 gives the detection rates with different window length. The detection rates of the MI-SMDISSIM in fault 4 and 16 are significantly improved with 150-sample window length. But our method with 100-sample window length has better performance in most cases, and online fault detection with smaller window length can be activated earlier, which is more practical in

engineering practice. So we set L = 100 in the further analysis of faults 5 and 21. Fault 5 involves a step change in the condenser cooling water inlet temperature. It causes a change in the flow rate of the CCW and soon affects the temperature of the separator as well as that of the outlet stream from the separator to the stripper. Then, the whole process is seriously affected by the fault in a short time. Fig. 8 shows that fault 5 can be detected by all the method, but PCA, DPCA and GDISSIM notify the operator that the fault has been corrected after several hours. The reason is that most faulty variables are compensated by control system, and only variable 33 is stabilized around a point higher than its normal values for the CCW inlet temperature is kept above its normal values. Thus fault 5 becomes a typical local fault which is difficult to be detected by global model. Judging

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

73

Fig. 9. Fault diagnosis results of Fault 5: (a) contribution plot to the SPE statistic in PCA; (b) contribution in GDISSIM (c) contribution in MI-SMDISSIM.

Table 3 Fault Detection Rates of GDISSIM and MI-SMDISSIM with different window length. Fault No 1 2 4 5 6 7 8 10 11 12 13 14 16 17 18 19 20 21

MI-SMDISSIM

GDISSIM L = 100

L = 150

L = 100

L = 150

0.650 0.297 0.393 0.448 0.934 0.584 0.928 0.827 0.941 0.988 0.900 0.577 0.963 0.933 0.863 0.956 0.844 0.423

0.928 0.622 0.679 0.612 0.950 0.780 0.953 0.935 0.969 0.988 0.894 0.994 0.984 0.934 0.890 0.946 0.871 0.604

0.980 0.976 0.829 0.990 1 0.999 0.970 0.876 0.984 0.994 0.930 0.995 0.863 0.966 0.900 0.968 0.878 0.900

0.980 0.975 0.993 0.989 1 0.999 0.961 0.858 0.983 0.990 0.923 0.995 0.981 0.965 0.891 0.954 0.871 0.846

from the global models, operator may be misled to believe that the fault has been corrected. In contrast, Fig. 8(d) shows that our MISMDISSIM method keeps detecting the fault. Furthermore, Fig. 8(e) gives the monitoring result of sub-block 5 which keeps informing the existence of the fault in the CCW, while the DISSIM indices in the other sub-blocks have returned beyond the limits, thus the root cause can be initially located in sub-block 5. Then diagnosis strategy is implemented. The diagnosis results of traditional PCA method and GDISSIM, as shown in Fig. 9(a) and (b) respectively, find lots of responsible variables. Superior to these two methods, Fig. 9(c) shows that our proposed strategy correctly isolates variable 33 and takes it as the root cause of the fault.

Fault 21 is a valve position constant fault in stream 4. The total feed flow valve (variable 26) changes its distribution and keeps constant since 161th sample. It has inconspicuous effect on the other variables at the beginning and the fault is hard to detect. As time goes, the fault becomes obvious. This is a common case in industrial process that serious fault evolves gradually from incipient distribution changes and shows a local characteristic. For global models like PCA, DPCA and GDISSIM, as shown in Fig. 10, they take a long time to discover the fault. In contrast, Fig. 10(d) shows that the MI-SMDISSIM method keeps informing the operator of the fault since the 142th moving window. Besides, see Table 2, the detection rate of our method is 0.900, which is much higher than that of state-of-art MI-MBPCA method. These verify that MI-SMDISSIM method is more sensitive to incipient fault with local characteristic. The diagnosis results are shown in Fig. 11. In the contribution plot of PCA, reactor level (variable 8) is mistaken as the root cause. In contrast, the contribution plot of GDISSIM and MI-SMDISSIM correctly isolate variable 26. But the diagnosis result in MI-SMDISSIM is more accurate and points out that the other variables have little contribution to the fault. 5. Conclusion In this paper, a new MI-SMDISSIM method is proposed for incipient fault detection and diagnosis in plant-wide processes. MI-based sparsification method takes advantage of MI and automatically divides the process into blocks according to the linear and nonlinear relations among variables. And it considers the relationship between sub-blocks with several constraints and gives a more reasonable block division related to the process mechanism or controller effect. Motivated by the sensitivity of DISSIM to the distribution changes, a multiblock DISSIM method is proposed to extract the local incipient information. It is expected to be sensitive

74

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

Fig. 10. Monitoring results of Fault 21: (a) PCA; (b) DPCA; (c) GDISSIM; (e) DISSIM sub-block10; (f) MI-SMDISSIM.

to the incipient fault in plant-wide processes. At last, SVDD is used to generate the final intuitionistic result combining all the results in sub-blocks. Besides, with the proposed diagnosis strategy, the process fault roots can be successfully diagnosed. A numerical case and the TE benchmark process are used for experiment. The results verify the feasibility and efficiency of the proposed MI-SMDISSIM method. However, it needs noticing that our division method is implemented based on the historical samples of normal data. In practice, the relationship between variables can be time-varying for some reasons, such as mode change, which

may cause malfunction of the proposed method. Thus a more reliable strategy in block division should be developed. Nevertheless, sub-blocks related methods are promising in fault diagnosis field, worthing further investigation in the future work. Acknowledgment This study was supported by the National Natural Science Foundation of China (U1664264,U1509203).

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

75

Fig. 11. Variables identification results of Fault 21: (a) contribution plot to the SPE statistic in PCA; (b) contribution in GDISSIM (c) contribution in MI-SMDISSIM.

Appendix A.

Table A1 Monitoring Variables in the TE process. Variables no.

Measurement Variables

Variables no.

Measurement Variables

Variables no.

Manipulated Variables

1 2 3 4 5 6 7 8 9 10 11

A Feed D Feed E Feed Total Feed Recycle Flow Reactor Feed Rate Reactor Pressure Reactor Level Reactor Temp. Purge Rate Product Separator Temp.

12 13 14 15 16 17 18 19 20 21 22

Product Separator Level Product Separator Pressure Product Separator Underflow Stripper Level Stripper Pressure Stripper Underflow Stripper Temp. Stripper Steam Flow Compressor Work RCW Outlet Temp. SCW Outlet Temp.

23 24 25 26 27 28 29 30 31 32 33

Feed D Flow Valve Feed E Flow Valve Feed A Flow Valve Total Feed Flow Valve Compressor Recycle Valve Purge Valve Separator Pot Liquid Flow Valve Stripper Liquid Product Flow Valve Stripper Steam Valve RCW Flow CCW Flow

Table A2 TE Process Fault Description. Fault No.

Description

Type

1 2 4 5 6 7 8 10 11 12 13 14

A/C feed ratio, B composition constant (stream4) B composition, A/C ratio constant(stream4) RCW inlet temperature CCW inlet temperature A feed loss (stream1) C header pressure loss-reduced availability (stream4) A,B,C feed composition (stream4) C feed temperature (stream2) RCW inlet temperature CCW inlet temperature reaction kinetics RCW valve

step change step change step change step change step change step change random variation random variation random variation random variation slow drift sticking

76

J. Zeng et al. / Journal of Process Control 83 (2019) 63–76

Table A2 (Continued) Fault No.

Description

Type

16 17 18 19 20 21

unknown unknown unknown unknown unknown valve position constant (stream 4)

unknown unknown unknown unknown unknown constant position

References [1] S. Joe Qin, Statistical process monitoring: basics and beyond, J. Chemom. 17 (2003) 480–502. [2] Z. Ge, Z. Song, F. Gao, Review of recent research on data-based process monitoring, Ind. Eng. Chem. Res. 52 (2013) 3543–3562. [3] S. Joe Qin, Survey on data-driven industrial process monitoring and diagnosis, Annu. Rev. Control 36 (2012) 220–234. [4] S. Yin, S.X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Trans. Ind. Electron. 61 (2014) 6414–6428. [5] L.H. Chiang, E. Russell, R.D. Braatz, Fault Detection and Diagnosis in Industrial Systems, Springer Verlag, London, 2001. [6] S. Narasimhan, S.L. Shah, Model identification and error covariance matrix estimation from noisy data using PCA, Control Eng. Pract. 16 (2008) 146–155. [7] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometr. Intell. Lab. Syst. 30 (1995) 179–196. [8] K. Liu, Z. Fei, B. Yue, J. Liang, H. Lin, Adaptive sparse principal component analysis for enhanced process monitoring and fault isolation, Chemometr. Intell. Lab. Syst. 146 (2015) 426–436. [9] I. Portnoy, K. Melendez, H. Pinzon, M. Sanjuan, An improved weighted recursive PCA algorithm for adaptive fault detection, Control Eng. Pract. 50 (2016) 69–83. [10] Q. Jiang, X. Yan, Weighted kernel principal component analysis based on probability density estimation and moving window and its application in nonlinear chemical process monitoring, Chemometr. Intell. Lab. Syst. 127 (2013) 121–131. [11] Q. Jiang, B. Huang, Distributed monitoring for large-scale processes based on multivariate statistical ananlysis and Bayesian method, J. Proc. Control 46 (2016) 75–83. [12] Z. Ge, Review on data-driven modeling and monitoring for plant-wide industrial processes, Chemometr. Intell. Lab. Syst. 171 (2017) 16–25. [13] J.F. MacGregor, C. Jaeckle, C. Kiparissides, M. Koutoudi, Process monitoring and diagnosis by multiblock PLS methods, AIChE J. 40 (1994) 826–838. [14] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, J. Chemom. 12 (1998) 301–321. [15] Y. Zhang, H. Zhou, S.J. Qin, T. Chai, Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares, IEEE Trans. Ind. Informatics 6 (2010) 3–10. [16] J. Zhu, Z. Ge, Z. Song, Distributed parallel PCA for modeling and monitoring of large-scale plant-wide processes with big data, IEEE Trans. Ind. Informatics 13 (2017) 1877–1885. [17] Z. Ge, J. Chen, Plant-wide industrial process monitoring: a distributed modeling framework, IEEE Trans. Ind. Informatics 12 (2016) 310–321. [18] Z. Ge, Z. Song, Distributed PCA model for plant-wide process monitoring, Ind. Eng. Chem. Res. 52 (2013) 1947–1957. [19] M. Grbovic, W. Li, P. Xu, A.K. Usadi, L. Song, S. Vucetic, Decentralized fault detection and diagnosis via sparse PCA based decomposition and Maximum Entropy decision fusion, J. Process Control 22 (2012) 738–750. [20] L. Xie, X. Lin, J. Zeng, Shrinking principal component analysis for enhanced process monitoring and fault isolation, Ind. Eng. Chem. Res. 52 (2013) 17475–17486.

[21] Q. Jiang, X. Yan, Plant-wide process monitoring based on mutual information-multiblock principal component analysis, ISA Trans. 53 (2014) 1516–1527. [22] Q. Jiang, X. Yan, Nonlinear plant-wide process monitoring using MI-spectral clustering and Bayesian inference-based multiblock KPCA, J. Process Control 32 (2015) 38–50. [23] J. Huang, X. Yan, Double-step block division plant-wide fault detection and diagnosis based on variable distributions and relevant features, J. Chemom. 29 (2015) 587–605. [24] J. Huang, X. Yan, Related and independent variable fault detection based on KPCA and SVDD, J. Process Control 32 (2016) 88–99. [25] C. Xu, S. Zhao, F. Liu, Distributed plant-wide process monitoring based on PCA with minimal redundancy maximal relevance, Chemometr. Intell. Lab. Syst. 169 (2017) 53–63. [26] M. Kano, S. Hasebe, I. Hashimoto, H. Ohno, Statistical process monitoring based on dissimilarity of process data, AIChE J. 48 (2002) 1231–1240. [27] K. Fukunaga, W.L.G. Koontz, Application of the Karhunen-Loeve expansion to feature selection and ordering, IEEE Trans, Comput. C-19 (1970) 311–318. [28] C. Zhao, F. Wang, M. Jia, Dissimilarity analysis based batch process monitoring using moving windows, AIChE J. 53 (2007) 1267–1277. [29] C. Zhao, F. Wnag, Y. Zhang, Nonlinear process monitoring based on kernel dissimilarity analysis, Control Eng. Pract. 17 (2009) 221–230. [30] C. Zhao, F. Gao, A sparse dissimilarity analysis algorithm for incipient fualt isolation with no priori fault information, Control Eng, Pract. 65 (2017) 70–82. [31] Q. Chen, R.J. Wynne, P. Goulding, D. Sandoz, The application of principal component analysis and kernel density estimation to enhance process monitoring, Control Eng. Pract. 8 (2000) 531–543, 17(2009)221-230. [32] Q. Chen, U. Kruger, A.T.Y. Leung, Regularised kernel density estimation for clustered process data, Control Eng. Pract. 12 (2004) 267–274. [33] A. Kraskov, H. Stögbauer, R.G. Andrzejak, P. Grassberger, Hierarchical clustering using mutual information, Europhys. Lett. 70 (2005) 278–284. [34] M.M. Rashid, J. Yu, A new dissimilarity method integrating multidimensional mutual information and independent component analysis for non-Gaussian dynamic process monitoring, Chemometr. Intell. Lab. Syst. 115 (2012) 44–58. [35] G. Brown, A. Pocock, M.-J. Zhao, M. Luján, Conditional likelihood maximisation: a unifying framework for information theoretic feature selection, J. Mach. Learn. Res. 13 (2012) 27–66. [36] S. Verron, T. Tiplica, A. Kobi, Fault detection and identification with a new feature selection based on mutual information, J. Process Control 18 (2008) 479–490. [37] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (1993) 245–255. [38] D.M.J. Tax, R.P.W. Duin, Support vector data description, Mach. Learn. 54 (2004) 45–66. [39] Z. Ge, F. Gao, Z. Song, Batch process monitoring based on support vector data description method, J. Process Control 21 (2011) 949–959.