Fault detection and diagnosis via standardized k nearest neighbor for multimode process


ARTICLE IN PRESS

JID: JTICE

[m5G;November 18, 2019;10:25]

Journal of the Taiwan Institute of Chemical Engineers xxx (xxxx) xxx

Contents lists available at ScienceDirect

Journal of the Taiwan Institute of Chemical Engineers journal homepage: www.elsevier.com/locate/jtice

Bing Song, Shuai Tan, Hongbo Shi∗, Bo Zhao

Key Laboratory of Advanced Control and Optimization for Chemical Processes of the Ministry of Education, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China

Article info

Article history: Received 30 June 2019; Revised 28 August 2019; Accepted 19 September 2019; Available online xxx

Keywords: Multimode process; Fault detection; Fault diagnosis; Standardized distance; k nearest neighbor

Abstract

For multimode processes, the k nearest neighbor (kNN) method does not consider the scale information of each individual mode when it calculates the distance between a sample and its neighbors. This work proposes a standardized kNN (SkNN) based fault detection method, in which a standardized distance characterizes the distance between a sample and its neighbors while taking the within-mode and mode-to-mode scale information into account. In addition, unlike the kNN based fault diagnosis method, the SkNN based fault diagnosis method accounts for the different importance of the neighbors by assigning a weight to each of them. Moreover, when there is more than one fault variable, a concurrent reconstruction strategy and a greedy algorithm are used in the SkNN based fault diagnosis method to eliminate the influence of the other fault variables on the variable currently being reconstructed and to reduce the computational complexity. Finally, an industrial case study demonstrates the effectiveness and advantages of the proposed SkNN based fault detection and diagnosis method. © 2019 Published by Elsevier B.V. on behalf of Taiwan Institute of Chemical Engineers.

1. Introduction

With the widespread application of distributed control systems and data acquisition systems, large amounts of data and characteristic parameters can be collected and stored, and data-driven fault detection and diagnosis methods have developed rapidly [1–3]. Among the numerous data-driven methods, multivariate statistical methods account for the largest number of research papers and application cases [4–8]. However, traditional multivariate statistical methods make assumptions about the data, such as a Gaussian distribution, linear relationships, and a single stable operation mode. New methods have been proposed and applied to relax these assumptions. Practical process data often follow different types of non-Gaussian distributions. To improve the monitoring performance for non-Gaussian processes, feature extraction is performed using independent component analysis (ICA) [9]. To cope with mixed Gaussian and non-Gaussian distributions, Ge et al. proposed the independent component analysis-principal component analysis (ICA-PCA) method [10]. Nonlinear relationships between process variables are very common in actual industrial processes. Many nonlinear fault detection methods have been proposed, of which the most widely used ones are



Corresponding author. E-mail address: [email protected] (H. Shi).

kernel based techniques [11,12]. Other nonlinear methods are based on local linearization, such as semi-supervised weighted probabilistic principal component regression (SWPPCR) [13] and the weighted linear dynamic system (WLDS) [14]. Fault detection for processes with dynamic characteristics requires consideration of the serial correlation [15]. Taking the batch-to-batch and within-batch dynamics into account simultaneously, Yao and Gao proposed multiphase two-dimensional dynamic principal component analysis (2-D-DPCA) to model and monitor batch processes [16]. Multimode process fault detection methods can be divided into multiple-model methods and global-model methods. A multiple-model method needs a mode division step and a rule for determining the final result [17,18]. The key to a global model is how to make one model describe the properties of all modes [19] or contain the useful information of all modes [20]. Moreover, considering that between-mode analysis in a multimode process can reveal much useful information, Zhao et al. developed a between-mode relative analysis algorithm for multimode analysis and monitoring [21]. Compared with the multiple-model and global-model methods, between-mode analysis can not only detect faults efficiently but also offer enhanced process understanding. Once a fault has been detected, the next step is to diagnose the fault variables. The contribution plot is the most common tool for this purpose: the contribution of each variable to the monitoring statistic is computed [22], and the variable with a large contribution is deemed the root

https://doi.org/10.1016/j.jtice.2019.09.017 1876-1070/© 2019 Published by Elsevier B.V. on behalf of Taiwan Institute of Chemical Engineers.

Please cite this article as: B. Song, S. Tan and H. Shi et al., Fault detection and diagnosis via standardized k nearest neighbor for multimode process, Journal of the Taiwan Institute of Chemical Engineers, https://doi.org/10.1016/j.jtice.2019.09.017


cause of the fault. Because the contribution is computed from the process variables by matrix multiplication, normal variables can be smeared by fault variables [23]. Normal variables may therefore have relatively large contributions, and the fault diagnosis result would be inaccurate. An alternative to the contribution plot is the reconstruction based method [24]. Its core idea is that, when the true fault is assumed, the reconstructed data are close to the normal confidence region and their monitoring statistic does not exceed the control limit.

Multivariate statistical fault detection and diagnosis methods usually make assumptions about the data distribution that are difficult to satisfy in practice. The k nearest neighbor (kNN) method is a widely used learning algorithm that places no requirements on the data distribution. For a given sample, the kNN method searches for its k nearest neighbors, according to a distance measure, in a training dataset that contains multiple classes. A voting strategy over the labels of the neighbors then determines the category of the sample. The core idea of kNN is that if the great majority of the neighbors belong to one class, the sample belongs to the same class. The kNN method has been used successfully for fault detection and diagnosis [25,26]. From the fault detection viewpoint, the kNN method needs no matrix transformation, so information loss can be avoided. However, the kNN method usually applies the Euclidean distance to measure the distance between a sample and its k neighbors. When it is used for multimode process fault detection, the scales of different variables within a mode and those of the same variable from mode to mode are ignored. From the fault diagnosis viewpoint, a kNN based reconstruction method was proposed [27], which diagnoses fault variables through a reconstruction-based maximize reduce index. However, all neighbors are treated equally in the kNN based reconstruction, and their different importance is not considered.

For a multimode process, even after global standardization, the scales of different variables within a mode and those of the same variable from mode to mode are different. To calculate the distance between a sample and its neighbors more accurately for multimode processes, the proposed SkNN based method introduces a novel standardized distance that considers both kinds of scale information. In addition, the SkNN based reconstruction method takes the different importance of the neighbors into account. When there is more than one fault variable, the reconstructed variable in the kNN based reconstruction method is affected by the other fault variables, and the reconstruction becomes inaccurate. Moreover, to reduce the computational complexity, a relative exceeding degree is designed and the idea of the greedy algorithm is used in the SkNN based reconstruction method. The contributions of this work are as follows:

(1) Compared with the original kNN based fault detection method, a standardized distance containing the within-mode and mode-to-mode scale information is proposed to make the distance calculation between a sample and its neighbors more accurate for multimode processes.
(2) The different importance of the neighbors is considered on the basis of the standardized distance, which makes the reconstruction more reasonable.
(3) During reconstruction, combinations of variables are reconstructed concurrently to eliminate the influence of the remaining fault variables on the variable currently being reconstructed.
(4) To reduce the computational complexity, a relative exceeding degree is designed and the idea of the greedy algorithm is used for reconstruction.

The remainder of this paper is organized as follows. The proposed SkNN based fault detection and diagnosis methods are elaborated in

Fig. 1. Scatter plot of normal and fault data.

Section 2. The results and analysis of the case study are given in Section 3. Section 4 draws some conclusions.

2. SkNN based fault detection and diagnosis

2.1. SkNN based fault detection

Essentially, the kNN method can be used for both classification and regression. Since fault detection can be seen as a two-class classification problem between normal data and fault data, the kNN method offers a way to detect faults. Let $X \in \mathbb{R}^{n \times m}$ be the training dataset, which contains only normal data, and let $x_t$ be the testing data. Whether $x_t$ is normal or faulty, its neighbors are searched for in $X$. On the one hand, if $x_t$ is a fault sample, $x_t$ and its neighbors belong to different categories, so $x_t$ is located far from its neighbors. On the other hand, if $x_t$ is normal, $x_t$ and its neighbors belong to the same category, so $x_t$ is close to its neighbors. Therefore, the accumulated distance between a fault sample and its neighbors is much larger than that between a normal sample and its neighbors, and this difference allows the kNN method to be used for fault detection. To show this intuitively, Fig. 1 gives the scatter plot of normal data and fault data. The blue circles denote the normal data in the training dataset, the red square denotes a normal sample, and the red circle denotes a fault sample. With the number of neighbors set to 2, the neighbors of the red square and the red circle can be found and the corresponding distances $d_1$, $d_2$, $d_3$, $d_4$ calculated. As shown in Fig. 1, $d_1 + d_2 < d_3 + d_4$. Therefore, the accumulated distance between a sample and its neighbors can be used to judge whether the sample is faulty. When the kNN method is used for regression, the average of the outputs of the neighbors is generally taken as the regression prediction.
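The accumulated-distance idea illustrated by Fig. 1 can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' implementation: the helper name `accumulated_distance`, the random training set, and the two test points are assumptions chosen for the example, and the plain Euclidean distance is used.

```python
import numpy as np

def accumulated_distance(x, X_train, k):
    """Sum of Euclidean distances from x to its k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)
    return np.sort(d)[:k].sum()

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))   # hypothetical normal training data

x_normal = np.array([0.1, -0.2])                # lies inside the normal cloud
x_fault = np.array([6.0, 6.0])                  # lies far from the normal cloud

ad_normal = accumulated_distance(x_normal, X_train, k=2)   # plays the role of d1 + d2
ad_fault = accumulated_distance(x_fault, X_train, k=2)     # plays the role of d3 + d4
print(ad_normal, ad_fault)
```

The fault sample yields the larger accumulated distance, which is the quantity that is later compared with a control limit.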
In this work, the normal value of a fault variable needs to be reconstructed for fault diagnosis. Since the neighbors are all normal samples, every variable value of the neighbors is normal. Therefore, the average of the corresponding variable over the neighbors can be regarded as the normal value of the reconstructed fault variable. The kNN based fault detection method usually uses the Euclidean distance to compute the distance between a sample and its neighbors. In practice, the scales of the variables differ. For a single-mode dataset, the scale can be eliminated through the global z-score standardization. For the


multimode dataset, the z-score uses the mean and standard deviation of the whole multimode dataset, so the scales of different variables within a mode and those of the same variable from mode to mode still differ after the z-score has been applied. Fig. 2 gives the scatter plot of multimode data before and after the global standardization. The blue and red circles denote the mode 1 and mode 2 data before the global standardization, and the green and black circles denote the mode 1 and mode 2 data after it. Before standardization, the scales of $x_1$ and $x_2$ in mode 2 (red circles) differ, as do the scales of $x_2$ in mode 1 (blue circles) and $x_2$ in mode 2 (red circles). After the z-score global standardization, the scales of $x_1$ and $x_2$ in mode 2 (black circles), and those of $x_2$ in mode 1 (green circles) and $x_2$ in mode 2 (black circles), are still different. The z-score standardization is therefore not sufficient for a multimode dataset. To obtain a satisfactory model, the local scale information of the variables should be taken into consideration.

Fig. 2. Scatter plot of multimode data before and after the global standardization.

For the multimode dataset $X \in \mathbb{R}^{n \times m}$ ($n$ is the number of samples, $m$ is the number of variables), the z-score is first used from a global perspective to give each variable zero mean and unit variance, which eliminates the scale to a certain extent:

$$\bar{X} = \frac{X - \mathrm{me}(X)}{\mathrm{std}(X)} \tag{1}$$

where $\mathrm{me}(X)$ and $\mathrm{std}(X)$ denote the mean and the standard deviation of $X$, respectively. Considering the scales of different variables within a mode and those of the same variable from mode to mode in $\bar{X}$, a standardized distance is proposed as

$$SD(x_i, x_i^f) = \sqrt{\frac{\{x_i - \mathrm{me}[N(x_i^f)]\}^T \{x_i - \mathrm{me}[N(x_i^f)]\}}{\mathrm{std}[N(x_i)]\,\mathrm{std}[N(x_i^f)]}} \tag{2}$$

where $x_i$ is a sample in $\bar{X}$, $x_i^f$ is the $f$th neighbor of $x_i$ according to the Euclidean distance, $N(x_i)$ is the neighborhood consisting of the $k$ neighbors of $x_i$, $\mathrm{me}[N(x_i)]$ is the mean of $N(x_i)$, and $\mathrm{std}[N(x_i)]$ is the standard deviation of $N(x_i)$.

Comparison 1. There are two differences between the proposed standardized distance and the Euclidean distance in Eq. (3). First, the standard deviations $\mathrm{std}[N(x_i)]$ and $\mathrm{std}[N(x_i^f)]$ are included to take the scale information into account. Second, the $f$th neighbor $x_i^f$ is replaced by its neighborhood mean $\mathrm{me}[N(x_i^f)]$ to enhance the robustness.

$$D(x_i, x_i^f) = \sqrt{(x_i - x_i^f)^T (x_i - x_i^f)} \tag{3}$$

After each standardized distance $SD(x_i, x_i^f)$, $f = 1, 2, \cdots, k$, has been computed, the accumulated distance can be calculated as

$$AD(x_i) = \sum_{f=1}^{k} SD(x_i, x_i^f) \tag{4}$$

In the SkNN based fault detection method, $AD(x_i)$, $i = 1, 2, \cdots, n$, is taken as the monitoring statistic. The control limit $Lim_{AD}$ of $AD$ can be determined by kernel density estimation (KDE). The univariate kernel estimator is

$$\hat{f}_u(AD) = \frac{1}{nu} \sum_{i=1}^{n} K\left(\frac{AD - AD_i}{u}\right) \tag{5}$$

where $\hat{f}_u(AD)$ denotes the estimated probability density of $AD$, $u$ denotes the smoothing parameter, $n$ denotes the number of samples, and $K$ denotes the kernel function; the Gaussian kernel is used in this work. In the online phase, a testing sample $x_t \in \mathbb{R}^{m \times 1}$ is first preprocessed using the mean and the standard deviation of the multimode training dataset $X$. Its $k$ neighbors are then found in $\bar{X}$, so that $SD(x_t, x_t^f)$ and $AD(x_t)$ can be obtained. Because the neighborhood of each normal sample has already been obtained, $SD(x_t, x_t^f)$ is easy to calculate and the computational complexity is greatly reduced. If $AD(x_t) > Lim_{AD}$, $x_t$ is concluded to be a fault sample; otherwise, $x_t$ is a normal sample.

2.2. SkNN based fault diagnosis

Once fault data have been detected, fault diagnosis is needed to find the fault variables that are the root cause of the fault condition. A kNN based fault diagnosis method using reconstruction was proposed in [27]. However, in that method each neighbor is treated equally during reconstruction, and the different importance of the neighbors is ignored. For the case of a single fault variable, a SkNN based reconstruction method that considers the different importance of the neighbors is proposed for fault diagnosis as follows:

Step 1: For the fault data $x_t = [x_{t1}, x_{t2}, \cdots, x_{tm}]^T$, every variable is reconstructed as

$$\ddot{x}_{ti} = \frac{1}{\sum_{f=1}^{k} \frac{1}{SD(\tilde{x}_t, \tilde{x}_t^f)}} \sum_{f=1}^{k} \frac{1}{SD(\tilde{x}_t, \tilde{x}_t^f)} \left[x_{L_f(\tilde{x}_t)}\right]_i \tag{6}$$

where $\tilde{x}_t$ is $x_t$ with the reconstructed variable removed; $\tilde{x}_t^f$ is the $f$th neighbor of $\tilde{x}_t$ in $\tilde{X}$, which is $\bar{X}$ with the reconstructed variable removed; $L_f(\tilde{x}_t)$ is the label of the $f$th neighbor; $[x_{L_f(\tilde{x}_t)}]_i$ is the $i$th variable of the sample $x_{L_f(\tilde{x}_t)}$ in $\bar{X}$; $SD(\tilde{x}_t, \tilde{x}_t^f)$ is the standardized distance between $\tilde{x}_t$ and $\tilde{x}_t^f$; and $\ddot{x}_{ti}$ is the reconstructed $i$th variable.

Step 2: After each variable $\ddot{x}_{t1}, \ddot{x}_{t2}, \cdots, \ddot{x}_{tm}$ has been reconstructed, replace the corresponding variable in $x_t$ with its reconstructed value, one variable at a time. Then $m$ new reconstructed samples $\dddot{x}_{t1} = [\ddot{x}_{t1}, x_{t2}, \cdots, x_{tm}]$, $\dddot{x}_{t2} = [x_{t1}, \ddot{x}_{t2}, \cdots, x_{tm}]$, $\cdots$, $\dddot{x}_{tm} = [x_{t1}, x_{t2}, \cdots, \ddot{x}_{tm}]$ are obtained.

Step 3: For $\dddot{x}_{ti}$ ($i = 1, 2, \cdots, m$), compute the accumulated distance $AD(\dddot{x}_{ti})$ according to Eq. (4).


Fig. 4. Plot of neighbors searching for [3, 4].

Fig. 3. Plot of neighbors searching for [12, 3, 4].

Step 4: Judge whether $AD(\dddot{x}_{ti})$, $i = 1, 2, \cdots, m$, exceeds the control limit $Lim_{AD}$.

Step 5: If some $AD(\dddot{x}_{ti})$, $i = 1, 2, \cdots, m$, does not exceed the control limit, the corresponding variable can be deemed the fault variable.

When there is only one fault variable, the reconstruction obtained by the above procedure is accurate. When there is more than one fault variable, the kNN based fault diagnosis method performs the reconstruction by combining variables that were each reconstructed singly. The reconstruction is then affected by the other fault variables and becomes inaccurate. For a visual explanation, a dataset is constructed as

$$X_e = [x_1, x_2, x_3, x_4] \in \mathbb{R}^{50 \times 4}, \quad x_1 \sim N(1, 1),\; x_2 \sim N(2, 1),\; x_3 \sim N(3, 1),\; x_4 \sim N(4, 1) \tag{7}$$

where $N(\cdot)$ represents the normal distribution. The center of the dataset is therefore $[1, 2, 3, 4]$. A fault sample is designed as $[11, 12, 3, 4]$, where a step of 10 occurs in both $x_1$ and $x_2$. To reconstruct $x_1$ in this fault sample, the data $[12, 3, 4]$ searches for its neighbors in the dataset $[x_2, x_3, x_4]$. With the number of neighbors set to 5, the neighbor search for $[12, 3, 4]$ is plotted in Fig. 3, where the blue circles represent the normal data in $[x_2, x_3, x_4]$ and the red asterisk represents $[12, 3, 4]$. As this figure shows, the selected neighbors are mainly influenced by $x_2$, which is a fault variable in the fault sample $[11, 12, 3, 4]$. The selected neighbors are therefore inaccurate, and so is the reconstructed $\ddot{x}_1$. When $x_1$ and $x_2$ are reconstructed concurrently, the data $[3, 4]$ searches for its neighbors in the dataset $[x_3, x_4]$. Fig. 4 gives the neighbor search for $[3, 4]$. As plotted in this figure, the location of $[3, 4]$ relative to $[x_3, x_4]$ is similar to that of $[1, 2, 3, 4]$ relative to $[x_1, x_2, x_3, x_4]$, and the selected neighbors are accurate. This analysis shows that the reconstruction of a single variable is affected by the other fault variables when there is more than one fault variable in the process. To obtain an accurate result, it is necessary to reconstruct the combination of variables concurrently.

Because the AD statistic is a sum of distances, when there is more than one fault variable in the process, reconstructing only one fault variable cannot return the statistic to its normal value, but it does reduce the exceeding degree of the reconstructed data relative to the original fault data. Therefore, a variable can be judged as normal when the exceeding degree of the reconstructed data increases. It should be emphasized that a reduction of the exceeding degree does not by itself mean that the variable is a fault variable. In this work, the relative exceeding degree (RED) is defined as

$$RED = \sum_{t=1}^{n_t} \left[AD(\dddot{x}_t) - AD(x_t)\right] \tag{8}$$

where $n_t$ is the number of fault samples in the testing dataset. The relative exceeding degree is used to identify potential fault variables. Specifically, if the relative exceeding degree is negative, the reconstructed variable is judged to be a potential fault variable; otherwise, it is judged to be normal. As a result, in the proposed SkNN based reconstruction method, multiple variables are reconstructed concurrently over the potential fault variables only, rather than over all variables. Because the number of potential fault variables is smaller than the total number of variables, the computational complexity is reduced. However, the computational complexity would still be large when there are many potential fault variables. To reduce it further, the idea of the greedy algorithm is used for the reconstruction of multiple potential fault variables. Suppose the reconstructed $\dddot{x}_{ts}$ attains the minimum relative exceeding degree; then the variable $x_{ts}$ is determined to be a fault variable. Assuming the number of fault variables is 2, steps 1–5 are repeated and each combination of two variables $[x_{ts}, x_{ti}]$, $i \neq s$, is reconstructed concurrently. If an accumulated distance does not exceed the control limit, the corresponding two variables are the fault variables. Otherwise, the number of reconstructed variables is increased until a combination of fault variables is found.

Comparison 2. There are three main differences between the proposed SkNN based reconstruction method and the kNN based reconstruction method.
(1) The different importance of the neighbors is taken into account in the SkNN based reconstruction method, whereas it is neglected in the kNN based reconstruction method.
(2) When there is more than one fault variable, the kNN based reconstruction method combines variables that were each reconstructed singly, whereas the SkNN based reconstruction method repeats steps 1–5 for a combination of potential fault variables reconstructed concurrently.
(3) To reduce the computational complexity further, the idea of the greedy algorithm is used in the SkNN based reconstruction method.

Remark. There is a key parameter $k$ in the proposed SkNN based fault detection and diagnosis method. When the value of $k$ is too small, this is equivalent to detecting and diagnosing faults based on the


training data in a small neighborhood, and the result is very sensitive to the individual neighbors. To ensure robust detection and diagnosis, $k$ should not be too small; otherwise, statistical fluctuations would affect the results significantly. Conversely, if $k$ is too large, the training data in a large neighborhood are used for fault detection and diagnosis, so training data far from the testing data also influence the result, which then becomes inaccurate. In particular, if the number of neighbors equals the number of samples in the training dataset, the neighbors provide no useful information. The cross-validation method is usually used to select the optimal $k$. In this work, $k$ is determined by 10-fold cross-validation.

3. Case study

The Tennessee Eastman (TE) process is a simulation of an actual industrial process and has been widely used in fault detection and diagnosis [28,29]. According to the mass ratio of the two products and the production rate, the TE process has six modes. In this work, mode 1, mode 3, mode 4 and mode 6 of the TE process are simulated. Among the 12 manipulated variables, the compressor recycle valve always equals 1 in mode 1 and mode 4, the stripper steam valve always equals 1 in mode 1, mode 3, mode 4 and mode 6, and the agitator speed always equals 100 in mode 1, mode 3, mode 4 and mode 6. Therefore, these 3 manipulated variables are not selected as monitoring variables. Hence, 31 variables, comprising 22 continuous process variables and 9 manipulated variables, are selected as monitoring variables. The normal condition and two fault conditions are simulated and tested to show the superiority of the proposed SkNN based fault detection and diagnosis method.

The normal training dataset: the TE process runs under the normal condition, and 250 mode 1 data, 250 mode 3 data, 250 mode 4 data and 250 mode 6 data are collected.
Fig. 5. Scatter plot of the training dataset.

The first testing dataset: the TE process runs under the normal condition of mode 3 from data 1 to data 200; two sensor faults then occur, one in the reactor pressure with an amplitude of 5 kPa from data 200 to data 1000 and one in the separator temperature with an amplitude of 0.5 °C from data 500 to data 1000.

The second testing dataset: the TE process runs under the normal condition of mode 1 from data 1 to data 200, and an actuator fault occurs in the reactor cooling water valve from data 200 to data 1000.

To demonstrate the superiority and effectiveness of the proposed SkNN based method, the widely used PCA and kNN methods are used for comparison. In the kNN based method, the number of neighbors is set to 8 according to 10-fold cross-validation. For a fair comparison, the number of neighbors in the proposed SkNN based method is also set to 8. As in the SkNN based method, the AD statistic is used as the monitoring statistic in the kNN based fault detection method. The confidence level of the PCA, kNN and SkNN methods is set to 99%. Fig. 5 gives the scatter plot

Fig. 6. Fault detection results of PCA, kNN and SkNN in the first testing dataset.


of two variables (recycle flow and reactor feed rate) for the training dataset, where different colors represent data from different modes. As this figure shows, the training dataset is multimodal.

In the first testing dataset, two sensor faults occur in mode 3. Specifically, the displayed value of the reactor pressure is 5 kPa larger than the actual value from data 200 to data 1000, and the displayed value of the separator temperature is 0.5 °C larger than the actual value from data 500 to data 1000. Fig. 6 presents the fault detection results of PCA, kNN and SkNN. The monitoring statistics of all normal testing data are below the control limit for PCA ($T^2$), kNN and SkNN. More than 30% of the fault data are wrongly classified as normal by the kNN method and by the SPE statistic of PCA, whose fault detection rates are 67.125% and 62.5%, respectively. In contrast, the detection result of SkNN is greatly improved, with a fault detection rate of 96.25%. Each single variable is then reconstructed with the proposed SkNN based fault diagnosis method. However, the monitoring statistic still exceeds the control limit when only one variable is reconstructed. The proposed SkNN based fault diagnosis method is therefore applied to reconstruct two variables concurrently. Fig. 7 is the stem plot of the monitoring statistic based on the original and the reconstructed reactor pressure and separator temperature variables. In this figure, the AD values of the fault data and the normal data are at the same level once the reactor pressure and the separator temperature have been reconstructed. Compared with Fig. 6(d), there are no continuous alarms, so the reactor pressure and the separator temperature can be regarded as the fault variables.

In the second testing dataset, an actuator fault occurs in mode 1, where sticking happens in the reactor cooling water valve.
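The concurrent reconstruction of Section 2.2 can be sketched on the toy dataset of Eq. (7) rather than on TE data. The code below is an illustrative sketch under assumptions, not the authors' implementation: the function names are hypothetical, the standard deviations in Eq. (2) are applied per variable, a small `EPS` avoids division by zero, and the global z-score is skipped because all four toy variables already have unit variance.

```python
import numpy as np

EPS = 1e-8  # assumption: avoids division by zero

def neighbors_and_sd(Xr, xr, k):
    """k nearest rows of Xr to xr and the standardized distance (Eq. (2)) to each."""
    idx = np.argsort(np.linalg.norm(Xr - xr, axis=1))[:k]
    own_std = Xr[idx].std(axis=0, ddof=1) + EPS
    sd = np.empty(k)
    for j, f in enumerate(idx):
        d = np.linalg.norm(Xr - Xr[f], axis=1)
        d[f] = np.inf                              # exclude the neighbor itself
        nf = np.argsort(d)[:k]                     # neighborhood of the f-th neighbor
        me_f = Xr[nf].mean(axis=0)
        std_f = Xr[nf].std(axis=0, ddof=1) + EPS
        sd[j] = np.sqrt(((xr - me_f) ** 2 / (own_std * std_f)).sum())
    return idx, sd

def reconstruct(X, x, fault_vars, k):
    """Reconstruct the variables in fault_vars concurrently, in the spirit of Eq. (6)."""
    keep = [j for j in range(X.shape[1]) if j not in fault_vars]
    idx, sd = neighbors_and_sd(X[:, keep], x[keep], k)   # search without the fault variables
    w = 1.0 / (sd + EPS)
    w /= w.sum()                                   # normalized inverse-SD weights
    x_rec = x.copy()
    x_rec[fault_vars] = w @ X[idx][:, fault_vars]  # weighted mean of the neighbor values
    return x_rec

rng = np.random.default_rng(0)
Xe = rng.normal([1.0, 2.0, 3.0, 4.0], 1.0, size=(50, 4))   # dataset of Eq. (7)
x_fault = np.array([11.0, 12.0, 3.0, 4.0])                 # step of 10 in x1 and x2

x1_only = reconstruct(Xe, x_fault, [0], k=5)     # neighbors searched with the faulty x2
x1_x2 = reconstruct(Xe, x_fault, [0, 1], k=5)    # neighbors searched in [x3, x4] only
print(x1_only, x1_x2)
```

Because the weights are positive and sum to one, each reconstructed value is a convex combination of neighbor values and therefore stays within the range of the normal training data, while the variables that were not reconstructed are left untouched.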
Fig. 7. Stem plot of monitoring statistic based on original and reconstructed reactor pressure and separator temperature in the first testing dataset.

Once this fault occurs, because of the closed feedback loop, three variables, namely two process variables (reactor temperature and reactor cooling water outlet temperature) and one manipulated variable (reactor cooling water flow), fluctuate intensely from data 200 to data 1000. Fig. 8 plots the fault detection results of PCA, kNN and SkNN. The false detection rates of PCA, kNN and SkNN are lower than 5%, which is acceptable. For PCA ($T^2$), PCA (SPE) and kNN, the monitoring statistics of almost all of the 800 fault testing data are below the control limit, and the fault detection rates are 0, 0.125% and 0.5%, respectively. In contrast, the SkNN based method detects all fault data successfully, with a fault detection rate of 100%. Fig. 9 shows the trajectories of the reactor temperature, the reactor cooling water outlet temperature and the reactor cooling water flow in the second testing dataset. As presented in Fig. 9(a)–(c), the variance from data 200 to data 1000 is larger than that from data 1 to data 200. Moreover, every single variable and the combination of

Fig. 8. Fault detection results of PCA, kNN and SkNN in the second testing dataset.


Fig. 9. Fault variable trajectory in the second testing dataset.

testing dataset, which corresponds to the trajectory of variables in Fig. 9. 4. Conclusion

Fig. 10. Stem plot of monitoring statistic based on original and reconstructed reactor temperature, reactor cooling water outlet temperature and reactor cooling water flow in the second testing dataset.

any two variables are reconstructed using the SkNN based fault diagnosis method. Through reconstructing every single variable and the combination of any two variables cannot make the monitoring statistics below than the control limit. Thus, implement the proposed SkNN based fault diagnosis method to reconstruct three variables concurrently. Fig. 10 is the stem plot of monitoring statistic based on original and reconstructed reactor temperature variable, reactor cooling water outlet temperature variable and reactor cooling water flow variable. In this figure, there is no obvious difference between the AD value of fault data and that of normal data using reconstructed reactor temperature variable, reactor cooling water outlet temperature variable and reactor cooling water flow variable. Therefore, these three variables are diagnosed as fault variables in the second

This work proposes a novel SkNN based fault detection and diagnosis method. Compared with the kNN based method, the proposed SkNN based fault detection method considers the scales of different variables within a mode and those of the same variable from mode to mode for multimode processes. Hence, the fault detection model is more accurate. In addition, the SkNN based fault diagnosis method assigns different importance to different neighbors during reconstruction, which is more reasonable than treating all neighbors as equally important. To eliminate the impact of the remaining fault variables on the currently reconstructed variables when there is more than one fault variable, combinations of variables are reconstructed concurrently. Moreover, the idea of a greedy algorithm is used to reduce the computational complexity. Finally, the proposed SkNN based method is tested on an industrial process, where its results show its superiority over the kNN and PCA methods. However, the proposed SkNN method is unsupervised and has to search the entire historical database for every testing datum, which may be relatively resource demanding. Future work is to extend the SkNN based method to a semi-supervised or supervised form.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (nos. 61673173, 61703161), the Fundamental Research Funds for the Central Universities (nos. 222201717006, 222201714031), and the National Natural Science Foundation of Shanghai (no. 19ZR1473200).

References

[1] Zhu JL, Yao Y, Li DW, Gao FR. Monitoring big process data of industrial plants with multiple operating modes based on Hadoop. J Taiwan Inst Chem Eng 2018;91:10–21.

[2] Ge ZQ. Process data analytics via probabilistic latent variable models: a tutorial review. Ind Eng Chem Res 2018;57:12646–61.
[3] Zhao CH, Gao FR. Fault subspace selection approach combined with analysis of relative changes for reconstruction modeling and multifault diagnosis. IEEE Trans Control Syst Technol 2016;24:928–39.
[4] Song B, Zhou XG, Shi HB, Tao Y. Performance-indicator-oriented concurrent subspace process monitoring method. IEEE Trans Ind Electron 2019;66:5535–45.
[5] Zhou L, Zheng JQ, Ge ZQ. Multimode process monitoring based on switching autoregressive dynamic latent variable model. IEEE Trans Ind Electron 2018;65:8184–94.
[6] Lv ZM, Yan XF. Hierarchical support vector data description for batch process monitoring. Ind Eng Chem Res 2016;55:9205–14.
[7] Zhou L, Chen JH, Jie J, Song ZH. Multiple probability principal component analysis for process monitoring with multi-rate measurements. J Taiwan Inst Chem Eng 2019;96:18–28.
[8] Yu WK, Zhao CH. Recursive exponential slow feature analysis for fine-scale adaptive processes monitoring with comprehensive operation status identification. IEEE Trans Ind Inform 2019;15:3311–23.
[9] Zhu JL, Ge ZQ, Song ZH. Non-Gaussian industrial process monitoring with probabilistic independent component analysis. IEEE Trans Autom Sci Eng 2017;14:1309–19.
[10] Ge ZQ, Song ZH. Process monitoring based on independent component analysis-principal component analysis (ICA-PCA) and similarity factors. Ind Eng Chem Res 2007;46:2054–63.
[11] Zhang YW, Zhou H, Qin SJ, Chai TY. Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares. IEEE Trans Ind Inform 2010;6:3–10.
[12] Wang G, Jiao JF, Yin S. A kernel direct decomposition-based monitoring approach for nonlinear quality-related fault detection. IEEE Trans Ind Electron 2017;13:1565–74.
[13] Yuan XF, Ge ZQ, Huang B, Song ZH. A probabilistic just-in-time learning framework for soft sensor development with missing data. IEEE Trans Control Syst Technol 2017;25:1124–32.
[14] Yuan XF, Yang YL, Yang CH, Ge ZQ, Song ZH, Gui WH. Weighted linear dynamic system for feature representation and soft sensor application in nonlinear dynamic industrial processes. IEEE Trans Ind Electron 2018;65:1508–17.
[15] Ge ZQ, Chen X. Dynamic probabilistic latent variable model for process data modeling and regression application. IEEE Trans Control Syst Technol 2019;27:323–31.
[16] Yao Y, Gao FR. Multivariate statistical monitoring of multiphase two-dimensional dynamic batch processes. J Process Control 2009;19:1716–24.
[17] Wang FL, Tan S, Peng J, Chang YQ. Process monitoring based on mode identification for multi-mode process with transitions. Chemom Intell Lab Syst 2012;110:144–55.
[18] Zhang SM, Zhao CH, Gao FR. Two-directional concurrently strategy of mode identification and sequential phase division for multimode and multiphase batch process monitoring with uneven lengths. Chem Eng Sci 2018;178:104–17.
[19] Song B, Ma YX, Shi HB. Multimode process monitoring using improved dynamic neighborhood preserving embedding. Chemom Intell Lab Syst 2014;135:17–30.
[20] Choi SW, Martin EB, Morris AJ, Lee IB. Fault detection based on a maximum-likelihood principal component analysis (PCA) mixture. Ind Eng Chem Res 2005;44:2316–27.
[21] Zhao CH, Wang W, Qin Y, Gao FR. Comprehensive subspace decomposition with analysis of between-mode relative changes for multimode process monitoring. Ind Eng Chem Res 2015;54:3154–66.
[22] Peng KX, Zhang K, Li G. Quality-related process monitoring based on total kernel PLS model and its industrial application. Math Probl Eng 2013;4:1–14.
[23] Qin SJ, Vall S, Piovoso MJ. On unifying multiblock analysis with application to decentralized process monitoring. J Chemometr 2001;15:715–42.
[24] Dunia R, Qin SJ. Subspace approach to multidimensional fault identification and reconstruction. AIChE J 1998;44:1813–31.
[25] Martin EB, Morris AJ. Non-parametric confidence bounds for process performance monitoring charts. J Process Control 1996;6:349–58.
[26] Zhang SM, Zhao CH, Wang S, Wang FL. Pseudo time-slice construction using variable moving window-k nearest neighbor (VMW-kNN) rule for sequential uneven phase division and batch process monitoring. Ind Eng Chem Res 2017;56:728–40.
[27] Wang GZ, Liu JC, Li Y. Fault diagnosis using kNN reconstruction on MRI variables. J Chemometr 2015;29:399–410.
[28] Song B, Shi HB. Fault detection and classification using quality supervised double-layer method. IEEE Trans Ind Electron 2018;65:8163–72.
[29] Wang Y, Yao Y, Zheng Y, Wong DSH. Multi-objective monitoring of closed-loop controlled systems using adaptive Lasso. J Taiwan Inst Chem Eng 2015;56:84–95.
