
ISA Transactions 79 (2018) 108–126


Research article

Batch process fault detection and identification based on discriminant global preserving kernel slow feature analysis


Hanyuan Zhang a, Xuemin Tian b, Xiaogang Deng b,∗, Yuping Cao b

a School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, Shandong, China
b College of Information and Control Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong, China

ARTICLE INFO

Keywords: Batch process; Process monitoring; Nonlinear fault variable identification; Slow feature analysis; Discriminant analysis; Nonlinear biplot

ABSTRACT

As an attractive nonlinear dynamic data analysis tool, global preserving kernel slow feature analysis (GKSFA) has achieved great success in extracting the high nonlinearity and inherently time-varying dynamics of batch processes. However, GKSFA is an unsupervised feature extraction method and lacks the ability to utilize batch process class label information, which may not offer the most effective means for batch process monitoring. To overcome this problem, we propose a novel batch process monitoring method based on a modified GKSFA, referred to as discriminant global preserving kernel slow feature analysis (DGKSFA), which closely integrates discriminant analysis and GKSFA. The proposed DGKSFA method can extract discriminant features of the batch process as well as preserve the global and local geometrical structure information of the observed data. For the purpose of fault detection, a monitoring statistic is constructed based on the distance between the optimal kernel feature vectors of the test data and the normal data. To tackle the challenging issue of nonlinear fault variable identification, a new nonlinear contribution plot method, derived from the idea of variable pseudo-sample trajectory projection in the DGKSFA nonlinear biplot, is also developed to help identify the faulty variables after a fault is detected. Simulation results on a numerical nonlinear dynamic system and the benchmark fed-batch penicillin fermentation process demonstrate that the proposed process monitoring and fault diagnosis approach can effectively detect faults and distinguish faulty variables from normal variables.

1. Introduction

Batch and fed-batch processes have become increasingly important for producing high value-added products, such as specialty chemicals, pharmaceuticals, semiconductors and biological products. To ensure batch process safety and product quality, efficient and reliable process monitoring and fault diagnosis technologies have attracted a lot of attention from researchers and engineers. With the wide application of computer control systems [1], rich process data are collected and stored in batch process databases, which facilitates the rapid development of data-driven multivariate statistical process monitoring methods for batch processes [2,3]. Multiway principal component analysis (MPCA), multiway partial least squares (MPLS) and multiway independent component analysis (MICA) are classical multivariate statistical methods for monitoring batch processes. In recent years, many extensions [4–8] of MPCA, MPLS and MICA have been developed to improve process monitoring performance by taking different process characteristics into consideration. Because batch processes are often characterized by high nonlinearity [9], multiway kernel principal

component analysis (MKPCA) [10], multiway kernel partial least squares (MKPLS) [11] and multiway kernel independent component analysis (MKICA) [12,13] have been proposed for process monitoring. Moreover, batch processes are inherently dynamic and time-varying, moving from one status to another due to changes in the process operating conditions, such as temperature drift in the experimental setup, decreasing reservoirs in (bio)chemical reactors and process disturbances varying during the batch run. Considering the dynamic behaviors of batch processes, multiway dynamic PCA (MDPCA) [14], multiway dynamic PLS (MDPLS) [15] and other related techniques [16,17] have been introduced to derive latent variables (LVs) associated with not only the current observations but also the preceding observations, using lagged measurements over a specific time period. Alternatively, state-space models based on subspace identification methods [18,19] have been applied to monitor batch processes with dynamic behaviors because the latent states in the state-space formulation evolve over time in a Markovian fashion, thereby being temporally correlated. The inherent time-varying dynamics of batch processes can be

Corresponding author. E-mail addresses: [email protected], [email protected] (H. Zhang), [email protected], [email protected] (X. Tian), [email protected], [email protected] (X. Deng), [email protected] (Y. Cao). https://doi.org/10.1016/j.isatra.2018.05.005 Received 29 October 2017; Received in revised form 1 May 2018; Accepted 8 May 2018 0019-0578/ © 2018 ISA. Published by Elsevier Ltd. All rights reserved.


nonlinear characteristic of process data, Zhang et al. [36] proposed kernel slow feature discriminant analysis (KSFDA) to identify the pattern of fault data, which combines discriminative information with KSFA learning. Furthermore, Zhang et al. [37] presented a nonlinear process fault detection method based on KSFDA and support vector data description. However, the KSFDA based methods used for continuous process monitoring are local discriminant approaches and fail to properly consider the global structure information of process data. In the ideal case, neighboring points in the same class will be projected to a single point in low-dimensional space. Thus, KSFDA only captures the local structure of data and ignores its global geometrical properties. After a fault is detected, another critical issue worth noting is fault identification [38], which is vital for guiding the repair of the detected fault. The existing KSFA or KSFDA based monitoring methods generally take fault detection as the primary target, but they seldom investigate the challenging problem of nonlinear fault identification because the information about the weights or the contributions of the original variables is lost when the original input space is transformed into a higher-dimensional feature space. In this sense, the classical fault variable isolation methods, which isolate fault variables by comparing the contributions of individual process variables to the monitoring statistic, like contribution plots [39] and reconstruction-based contribution (RBC) plots [40], are infeasible for kernel based SFA methods.
Although fault pattern recognition methods [41–43], which adopt pattern matching techniques to determine which kind of fault pattern in a historical database is similar to the occurred fault, have the ability to overcome this limitation, they encounter two disadvantages: 1) the implementation and interpretation of the final results are not straightforward; 2) most of them do not allow the relation between the original variables and the final models to be visualized graphically. Recently, the principle of the nonlinear biplot [44,45] has been applied to recover and visualize the importance of the original variables to the optimal class separation in kernel based methods [46–49] that solve discriminant analysis problems. To deal with the batch process monitoring problem, the idea of the nonlinear biplot has been exploited to visualize the original variables with the highest discriminant power in the kernel partial least squares discriminant analysis model [50]. When the batch process monitoring scheme has a clear unsupervised nature, Vitale et al. [51] further extended the nonlinear biplot to an unsupervised framework and proposed KPCA based pseudo-sample contribution plots to isolate fault variables. Although the current extensions of the nonlinear biplot have achieved great success in recovering the discriminant power of the original variables in kernel based methods for classification and discriminant purposes, some open problems and controversial issues remain to motivate further study. One important open problem is that the discriminant power of the original variables is defined qualitatively rather than calculated quantitatively, which may result in degraded fault variable identification performance. Therefore, an objective criterion is urgently needed to evaluate the discriminant power of the original variables in kernel based discriminant analysis methods.
Against this background, this paper proposes a new batch process monitoring method, called discriminant global preserving kernel slow feature analysis (DGKSFA), which closely integrates discriminant analysis with the GKSFA model. To the best of our knowledge, we are the first to combine discriminant analysis with the GKSFA model for batch process monitoring. The proposed DGKSFA can make use of the class discriminatory information of the batch process as well as preserve the global and local structure information of process data. A monitoring statistic is built based on the extracted optimal kernel feature vectors of test data and reference data for fault detection. After a fault is detected, a novel nonlinear contribution plot naturally associated with the DGKSFA model is developed to address the important issue of fault variable isolation in nonlinear process monitoring methods. Inspired by the idea of the nonlinear biplot, an objective criterion is defined to calculate the discriminant power of the original variables for the optimal separation of

modeled by varying the underlying parameters, which are referred to as the underlying driving forces and vary slowly [20,21]. However, the conventional dynamic monitoring approaches [14–19] for batch processes fail to extract the underlying driving forces because the extracted LVs or the latent states in different time intervals are still assumed to be temporally independent [22–24]. From this point of view, the existing batch process dynamic monitoring approaches are inadequate in capturing the inherently time-varying dynamics of batch processes, which may lead to unsatisfactory monitoring performance. More recently, a new dynamic data analysis technique called slow feature analysis (SFA) has been proposed to extract mutually independent slowly varying features known as slow features (SFs) from process data [25]. Wiskott [26] demonstrated that SFA can be used to estimate the underlying driving forces of a nonstationary time series. In recent years, SFA and its extensions have been discussed extensively by researchers and successfully applied to monitoring continuous industrial processes [22,24,27–30]. The SFA based methods have also exhibited tremendous potential for batch process monitoring because of their capability of extracting slowly varying SFs that represent the underlying driving forces of batch processes. Considering that batch processes exhibit time-varying dynamic variations due to the underlying driving forces, Zhang et al. [20] proposed global preserving kernel slow feature analysis (GKSFA) to handle the time-varying dynamic characteristic of process data as well as preserve the global and local data structure during dimensionality reduction. Moreover, Zhang et al. [31] also discussed another kernel SFA-based monitoring method to characterize the underlying driving forces within a batch and suppress the stochastic variations and deviations among different batches for batch process monitoring.
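As a concrete illustration of the SFA principle described above, namely finding the projection whose temporal variation is slowest as an estimate of the underlying driving force, the following minimal sketch applies standard linear SFA (not the paper's GKSFA) to a synthetic two-variable signal. All data and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

# Toy signals: a slowly varying driving force hidden inside a fast mixture
t = np.linspace(0, 2 * np.pi, 500)
slow = np.sin(t)         # the underlying slow driving force
fast = np.sin(29 * t)    # fast nuisance variation
X = np.column_stack([slow + 0.5 * fast, 0.5 * slow - fast])
X -= X.mean(axis=0)      # mean-center the observed variables

# Linear SFA: minimize the variance of the projection's time difference
# subject to unit variance, which leads to a generalized eigenproblem.
Xdot = np.diff(X, axis=0)
A = Xdot.T @ Xdot / (len(Xdot) - 1)  # covariance of time differences
B = X.T @ X / (len(X) - 1)           # covariance of the signals
eigvals, W = eigh(A, B)              # ascending order: first column = slowest
y = X @ W[:, 0]                      # the slowest feature

# The recovered slow feature should correlate strongly with the true
# driving force, up to sign and scale.
corr = abs(np.corrcoef(y, slow)[0, 1])
```

In this toy mixture the slowest projection recovers the hidden sinusoid almost exactly, which is the behavior the monitoring methods above exploit when they interpret SFs as underlying driving forces.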
To reveal and utilize the information of multiplicity in multiphase batch process monitoring, Zhang et al. [21] proposed a new index called the phase recognition factor (PRF) to automatically achieve phase division, and a novel process monitoring strategy based on the global preserving statistics slow feature analysis (GSSFA) model was developed to monitor batch processes with transitions after phase division. In comparison with the current batch process dynamic monitoring methods, the process monitoring methods developed in [20,21,31] are more effective and adequate for mining time-varying dynamic information from batch process data because these methods offer a superior means for exploiting the underlying driving forces of process data. Unfortunately, the abovementioned SFA based batch process monitoring methods are unsupervised algorithms and they fail to extract discriminant features from different classes of observed data. Consequently, these existing methods may inevitably cause the undesirable effect that the class label information is submerged during fault detection and diagnosis. The main motivation of this paper is the fact that batch process fault detection and identification performance can be improved by incorporating discriminant information into SFA based batch process monitoring methods, which offer a more effective alternative for mining batch process data features. In the fields of classification and pattern recognition, there are some successful cases of applying SFA based supervised methods to reveal the discriminant structure of data. By utilizing the temporal coherence principle to find a low-dimensional invariant representation of image data, Huang et al. [32] introduced supervised slow feature analysis (SSFA) to perform dimensionality reduction. Slow feature discriminant analysis (SFDA) was proposed to extract discriminant slow features for handwritten digit recognition [33].
To effectively make full use of discriminant power for face recognition, Gu et al. [34] modified SFDA by designing a new adaptive criterion to generate within-class time series and constructed the optimization based on the maximum margin criterion. Gu et al. [35] further developed another extension of SFDA to extract discriminant features for face classification, which integrates global information into SFDA and extracts statistically uncorrelated discriminant features. In the continuous industrial process monitoring field, improved versions of SFA based supervised methods have also been designed. To account for the


normal dataset and fault snapshot dataset in the developed contribution plot. Simulations on a numerical nonlinear process and the benchmark fed-batch penicillin fermentation process are used to demonstrate the superiority of the proposed fault detection and diagnosis scheme. In summary, the contributions of this work address two aspects. First, a DGKSFA model is built to utilize discriminant information and extract the time-varying dynamics of batch processes for process monitoring, where different class discriminatory information is used and the global and local structure information of nonlinear batch process data is preserved. Second, a novel nonlinear contribution plot is developed to tackle the challenging issue of nonlinear fault variable identification based on the pseudo-sample variable trajectory projection in the DGKSFA nonlinear biplot, which quantitatively estimates the discriminant power of the original variables in the fault snapshot dataset with respect to the optimal class separation direction in the high-dimensional feature space.

The remaining sections are organized as follows. In Section 2, a brief review of GKSFA is provided. Section 3 presents the proposed DGKSFA based fault detection method in detail, while the nonlinear contribution plot based on DGKSFA is detailed in Section 4. Section 5 gives the overall procedure of batch process fault detection and identification based on DGKSFA. Two case studies are used to validate the proposed DGKSFA method for process fault detection and identification in Section 6, and our conclusions are offered in Section 7.

2. A brief review of GKSFA

The model training data of a batch process form a three-dimensional data matrix, denoted by $X(I \times J \times K)$, where I is the number of batch runs, J denotes the number of process variables and K is the number of samples in a batch run. A two-step multiway data unfolding (TS-MDU) technique illustrated in Fig. 1 is first applied to convert the three-way training data matrix $X(I \times J \times K)$ into the two-way variable-wise unfolded matrix $X(IK \times J)$ [20]. Then, GKSFA is implemented on the matrix $X(IK \times J)$: the original input data with nonlinear relationships are first mapped onto a new linear space by a nonlinear mapping function $\phi(\cdot)$, and then linear SFA is combined with global structure analysis to explore the local dynamic data relationships as well as to mine the global data structure information in the new space. As shown in Fig. 1, the unfolded data matrix $X(IK \times J)$ is composed of a series of time-slice matrices $X_k(I \times J)$, for $k = 1, 2, \cdots, K$, which destroys the original temporal structure. For the process modeling, a pseudo-time series is constructed using neighboring information to supply the temporal structure information. In particular, a pseudo-time series for the i-th sample $x_k(i)$ in the time-slice matrix $X_k(I \times J)$ is constructed by means of the K-nearest neighborhood (K-NN) criterion as

$t_k(i) = [x_k(i),\ x_k^1(i),\ x_k(i),\ x_k^2(i),\ \cdots,\ x_k(i),\ x_k^l(i)]^T \in \mathbb{R}^{2l \times J}$   (1)

where $x_k^j(i)$, $j = 1, 2, \cdots, l$, denotes the $l$ nearest neighbors of $x_k(i)$ in its corresponding time-slice matrix. With the application of $t_k(i)$, $i = 1, 2, \cdots, I$, the pseudo-time series matrix of $X_k(I \times J)$ is constructed as $T_k(2Il \times J) = [t_k^T(1), t_k^T(2), \cdots, t_k^T(I)]^T$. Then, the data matrix $X(IK \times J)$ is extended to a pseudo-time series matrix $T(2IKl \times J)$ shown as

$T(2IKl \times J) = [T_1^T, T_2^T, \cdots, T_k^T, \cdots, T_K^T]^T$   (2)

For $1 \le i \le IKl$, denote the $(2i-1)$-th and $2i$-th samples of the pseudo-time series $T$ as $\tau(2i-1) \in \mathbb{R}^J$ and $\tau(2i) \in \mathbb{R}^J$. A nonlinear mapping function $\phi(\cdot)$ transforms the vectors $\tau(2i-1)$ and $\tau(2i)$ into new vectors $\tau_\phi(2i-1) = \phi(\tau(2i-1))$ and $\tau_\phi(2i) = \phi(\tau(2i))$, respectively. The input matrix $X$ is also nonlinearly transformed into $\phi(X)$, denoted as $X_\phi$. As the nonlinear mapping $\phi(\cdot)$ is unknown, the kernel trick is used to complete the nonlinear transformation without having to perform the mapping $\phi(\cdot)$ explicitly. By using the kernel function $ker(x, y)$, the inner product of two nonlinearly mapped vectors $\phi(x)$ and $\phi(y)$ is computed as $\phi^T(x)\phi(y) = ker(x, y)$. Thus, the KSFA optimization task is formulated in the kernel space as

$J_{KSFA} = \min \frac{1}{IKl-1} \alpha_j^T \sum_{i=1}^{IKl} (k_{2i} - k_{2i-1})(k_{2i} - k_{2i-1})^T \alpha_j = \min \alpha_j^T L_K \alpha_j \quad \text{s.t.} \quad \frac{1}{IK-1} \alpha_j^T K K^T \alpha_j = 1$   (3)

where $\alpha_j$ is the j-th loading vector, $k_{2i} = X_\phi \tau_\phi(2i)$ and $k_{2i-1} = X_\phi \tau_\phi(2i-1)$ are the kernel vectors, the kernel matrix $K = X_\phi X_\phi^T$, and $L_K = \frac{1}{IKl-1} \sum_{i=1}^{IKl} (k_{2i} - k_{2i-1})(k_{2i} - k_{2i-1})^T$. When KSFA is applied to the three-dimensional data matrix of a batch process, it only captures the local neighborhood data relationship and omits global data structure exploration. In order to preserve the global and local data structure simultaneously, the GKSFA model is built by combining the KSFA optimization $J_{KSFA}$ with the global optimization $J_{Global}$ [20]:

Fig. 1. Diagram of the two-step multiway data unfolding process.

$J_{GKSFA} = \beta J_{KSFA} - (1 - \beta) J_{Global} = \min \alpha_j^T (\beta L_K - (1 - \beta) C_K) \alpha_j$   (4)

where $0 < \beta < 1$ is a tuning factor and $J_{Global}$ is the objective function of global structure analysis, the goal of which is to find a projection vector such that the mean square of the Euclidean distance between all pairs of projected sample points is maximized. Assuming $\frac{1}{IK} \sum_{i=1}^{IK} x_\phi(i) = 0$, $J_{Global}$ can be formulated as

$J_{Global} = \max \frac{1}{IK} \sum_{i=1}^{IK} \alpha_j^T X_\phi x_\phi(i) x_\phi^T(i) X_\phi^T \alpha_j = \max \alpha_j^T C_K \alpha_j$   (5)

where the vector $x_\phi(i)$ denotes the input vector $x(i)$ after nonlinear transformation and $C_K = \frac{1}{IK} \sum_{i=1}^{IK} X_\phi x_\phi(i) x_\phi^T(i) X_\phi^T = \frac{1}{IK} K K^T$. Furthermore, the constraint $\alpha_j^T K \alpha_j = 1$ is imposed to ensure a reasonable solution, and the KSFA optimization and the global preserving optimization are normalized so that they are considered on the same scale:

$\min \alpha_j^T (\beta \tilde{L}_K - (1 - \beta) \tilde{C}_K) \alpha_j \quad \text{s.t.} \quad \alpha_j^T \tilde{K} \alpha_j = 1$   (6)

where $\tilde{L}_K = L_K / \rho(L_K)$, $\tilde{C}_K = C_K / \rho(C_K)$, and $\tilde{K} = K / \rho(K)$. Here, $\rho(\cdot)$ represents the spectral radius of the corresponding matrix.

3. Discriminant GKSFA

It is clear from Eq. (6) that the aforementioned GKSFA is an unsupervised monitoring technique, which requires a fault-free training dataset to build the normal operation model. The real batch process data, however, are very likely to be contaminated with a certain number of faulty samples because different types of process faults may occur during plant operation. Consequently, the GKSFA based process monitoring method cannot give sufficient attention to the class label information of fault data associated with useful fault information, so an occurring fault may not be detected in a timely manner. Although some supervised SFA based methods in the pattern recognition field have been proposed to utilize the discriminatory information of different classes, these methods are local discriminant approaches because they only preserve the local geometrical structure of process data by minimizing the distance among neighboring points from the same class. As a result, neighboring data points in the same class will be projected to a single point in some cases, and faraway points in the same class will be projected to a small region, without a constraint for global data structure exploration. Motivated by the above analysis, by integrating discriminant analysis into the GKSFA model, we propose DGKSFA to make use of the class discriminatory information and preserve both global and local structure information of nonlinear batch process data for fault detection. Suppose that C classes of historical fault data are collected from the batch process, and the i-th class of historical fault data is denoted as $F_i = [x_{F_i}(1), x_{F_i}(2), \cdots, x_{F_i}(n_{F_i})]^T \in \mathbb{R}^{n_{F_i} \times J}$, $i = 1, 2, \cdots, C$, where $n_{F_i}$ is the number of samples in matrix $F_i$. The fault pattern data matrix $F$ is first constructed based on the C classes of historical fault data $F_1, F_2, \cdots, F_C$ as follows:

$F = [F_1^T, F_2^T, \cdots, F_C^T]^T = [x_F(1), x_F(2), \cdots, x_F(n_F)]^T \in \mathbb{R}^{n_F \times J}$   (7)

where $n_F = \sum_{i=1}^{C} n_{F_i}$ denotes the number of samples in the fault pattern data matrix $F$.

In this work, DGKSFA is pairwise applied to the variable-wise unfolded normal data and the constructed fault pattern data, and its objective function is to minimize the GKSFA optimization on the unfolded normal data and simultaneously maximize the separability between normal data and fault pattern data. Given the unfolded normal operating dataset $X = [x(1), x(2), \cdots, x(IK)]^T$ and the fault pattern dataset $F = [x_F(1), x_F(2), \cdots, x_F(n_F)]^T$, the between-class separability of the normal operating dataset and the fault pattern dataset is characterized by the distances between the data points in the normal operating dataset $X$ and their neighboring points in the fault pattern dataset $F$. Similar to Section 2, the between-class pseudo-time series between the normal dataset and the fault pattern dataset is constructed using the K-NN selection criterion. For the i-th sample $x(i)$ in the normal dataset $X$, its p nearest neighbors in the fault pattern dataset $F$ are denoted by $x_F^f(i)$, $f = 1, 2, \cdots, p$. Then, a between-class pseudo-time series between $x(i)$ and the fault pattern dataset is constructed as

$t_b(i) = [x(i),\ x_F^1(i),\ x(i),\ x_F^2(i),\ \cdots,\ x(i),\ x_F^p(i)]^T \in \mathbb{R}^{2p \times J}$   (8)

By this analogy, the between-class pseudo-time series between the normal matrix $X(IK \times J)$ and the fault pattern dataset $F$ is shown as

$T_b(2IKp \times J) = [t_b^T(1), t_b^T(2), \cdots, t_b^T(IK)]^T$   (9)

Denote the $(2j-1)$-th and $2j$-th samples of the between-class pseudo-time series $T_b$ as $\mu(2j-1) \in \mathbb{R}^J$ and $\mu(2j) \in \mathbb{R}^J$ for $1 \le j \le IKp$; the temporal variation between these two samples can be considered as $\mu(2j) - \mu(2j-1)$. After the matrix $T_b$ is transformed into $T_{b\phi} = \phi(T_b)$ by the nonlinear mapping function $\phi(\cdot)$, maximizing the between-class separability of normal data and fault pattern data is equivalent to maximizing the temporal variation of the matrix $T_{b\phi}$, resulting in the optimization $J_{Separa}$ formulated as

$J_{Separa} = \max \frac{1}{IKp-1} w_j^T \sum_{i=1}^{IKp} (\mu_\phi(2i) - \mu_\phi(2i-1))(\mu_\phi(2i) - \mu_\phi(2i-1))^T w_j$   (10)

where $w_j$ is the projection vector and $\mu_\phi(2i) = \phi(\mu(2i))$, $\mu_\phi(2i-1) = \phi(\mu(2i-1))$. Considering that the projection vector $w_j$ can be expressed as a linear combination of the training samples, $w_j = X_\phi^T \alpha_j$, the objective function in Eq. (10) is rewritten as

$J_{Separa} = \max \frac{1}{IKp-1} \alpha_j^T X_\phi \sum_{i=1}^{IKp} (\mu_\phi(2i) - \mu_\phi(2i-1))(\mu_\phi(2i) - \mu_\phi(2i-1))^T X_\phi^T \alpha_j$   (11)

The kernel function employed by DGKSFA is not arbitrary: it is required to satisfy Mercer's theorem [10]. According to Mercer's theorem of functional analysis, there exists a nonlinear mapping into a space where a kernel function acts as a dot product if the kernel function is a continuous kernel of a positive integral operator. A specific choice of kernel function implicitly determines the nonlinear mapping and the feature space F. In DGKSFA, the selection of the kernel function is most important, since the degree to which the nonlinear characteristics of an industrial process are captured depends on it. However, the general question of how to select the ideal kernel for a given monitored process is still an open problem [16]. Representative kernel functions are as follows:

Sigmoid kernel: $ker(x, y) = \tanh(\beta_0 \langle x, y \rangle + \beta_1)$
Polynomial kernel: $ker(x, y) = \langle x, y \rangle^d$
Gaussian kernel: $ker(x, y) = \exp(-\|x - y\|^2 / \sigma)$

where $\beta_0$, $\beta_1$, $d$ and $\sigma$ are specified a priori by the user. The polynomial kernel and the Gaussian kernel always satisfy Mercer's theorem, whereas the sigmoid kernel satisfies it only for certain values of $\beta_0$ and $\beta_1$ [10]. Among them, the Gaussian kernel is the most widely used kernel function [41]. In our study, the Gaussian kernel function is also

Fig. 2. DGKSFA scores plot for the Setosa and Virginica classes in the Iris benchmark data set (axes: y1, 98.43% explained variance; y2, 1.56% explained variance).

Fig. 3. The variable pseudo-sample trajectories projected in the DGKSFA nonlinear biplot.

used for the mapping into a high-dimensional feature space, following references [20], [52] and [53], since it is found to be more appropriate for capturing the nonlinearity of the considered system than other representative kernel functions. By using the Gaussian kernel function $ker(x, y) = \exp(-\|x - y\|^2 / \sigma)$, the inner product of two nonlinearly mapped vectors can be computed by their kernel function. Therefore, there are the kernel vectors $\gamma_{2i} = X_\phi \mu_\phi(2i)$ and $\gamma_{2i-1} = X_\phi \mu_\phi(2i-1)$, where the j-th elements of $\gamma_{2i}$ and $\gamma_{2i-1}$ are defined by $\gamma_{2i,j} = ker(x(j), \mu(2i))$ and $\gamma_{2i-1,j} = ker(x(j), \mu(2i-1))$, respectively. With the use of these kernel vectors, the optimization in Eq. (11) is rewritten in kernel form as

$J_{Separa} = \max \frac{1}{IKp-1} \alpha_j^T \sum_{i=1}^{IKp} (\gamma_{2i} - \gamma_{2i-1})(\gamma_{2i} - \gamma_{2i-1})^T \alpha_j = \max \alpha_j^T Q_K \alpha_j$   (12)

where $Q_K = \frac{1}{IKp-1} \sum_{i=1}^{IKp} (\gamma_{2i} - \gamma_{2i-1})(\gamma_{2i} - \gamma_{2i-1})^T$ is the kernel between-class pseudo-temporal variation matrix between the normal operation dataset $X$ and the fault pattern dataset $F$. In order to extract the class discriminatory information and preserve both the global and local data structure of nonlinear batch process data, we integrate the GKSFA optimization with the optimization $J_{Separa}$ and construct the following objective function:

$J_{DGKSFA} = \frac{J_{GKSFA}}{J_{Separa}} = \frac{\min \alpha_j^T (\beta \tilde{L}_K - (1 - \beta) \tilde{C}_K) \alpha_j}{\max \alpha_j^T Q_K \alpha_j} = \min \frac{\alpha_j^T (\beta \tilde{L}_K - (1 - \beta) \tilde{C}_K) \alpha_j}{\alpha_j^T Q_K \alpha_j}$   (13)

Fig. 4. A diagram of calculating the variable contribution based on nonlinear biplot.

Solving the above minimization problem leads to the generalized eigenvalue decomposition

$(\beta \tilde{L}_K - (1 - \beta) \tilde{C}_K) \alpha_j = \lambda_j^\phi Q_K \alpha_j$   (14)

The solution of Eq. (14) brings a series of eigenvectors $\alpha_1, \alpha_2, \cdots, \alpha_{IK}$ corresponding to the eigenvalues $\lambda_1^\phi, \lambda_2^\phi, \cdots, \lambda_{IK}^\phi$. In pairwise DGKSFA analysis, the eigenvector corresponding to the smallest eigenvalue is regarded as the optimal kernel discriminant vector $\alpha_{opt}$ that optimally separates test data from normal data. To perform fault detection, a monitoring statistic is built by comparing the Euclidean distance between the optimal kernel feature vectors of the present test data and the normal data. In the high-dimensional kernel space, the kernel matrix $K \in \mathbb{R}^{IK \times IK}$ of the normal dataset is composed of a series of time-slice matrices $K_k(I \times IK) = [k_k(1), k_k(2), \cdots, k_k(I)]^T$, for $k = 1, 2, \cdots, K$. Therefore, the average kernel matrix of the normal dataset is first calculated to guarantee that the points used to compute the distance in the normal data and the test data have the same sampling time. The mean kernel vector of the time-slice kernel matrix $K_k(I \times IK)$ is calculated as

$\bar{k}_k = \frac{1}{I} \sum_{i=1}^{I} k_k(i), \quad k = 1, 2, \cdots, K$   (15)

Using Eq. (15), the mean value of each element of the kernel vector at each sampling point over the different batches is obtained. Then, the average kernel matrix $\bar{K}$ is constructed as

$\bar{K}(K \times IK) = [\bar{k}_1, \bar{k}_2, \cdots, \bar{k}_K]^T$   (16)

For a new test data vector $x_t \in \mathbb{R}^{J \times 1}$ collected at time interval k, this vector is scaled using the mean and standard deviation of the normal operating data at time k; the scaled test data $x_t$ is then mapped into the high-dimensional feature space through the kernel trick, and the test kernel vector $k_t(IK \times 1) = X_\phi \phi(x_t) = X_\phi x_{t\phi}$, where $x_{t\phi} = \phi(x_t)$, is obtained. Based on the test kernel vector $k_t$ and the kernel vector $\bar{K}(k)$ corresponding to time interval k in the average kernel matrix $\bar{K}$, the monitoring statistic $D_{dist}$ is built as follows:

$D_{dist} = \left\| \alpha_{opt}^T k_t - \alpha_{opt}^T \bar{K}^T(k) \right\|^2$   (17)
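Assuming the kernel matrices have already been computed, the generalized eigenvalue decomposition of Eq. (14) and the Ddist statistic of Eq. (17) can be sketched as follows. The matrices here are random symmetric stand-ins for the quantities in Eq. (14), not values from the paper's simulations; only the algebraic structure is illustrated.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 8  # stands in for IK, the number of unfolded training samples

# Random symmetric stand-ins: A plays the role of beta*L_K~ - (1-beta)*C_K~,
# and B plays the role of Q_K (made positive definite so eigh(A, B) is valid).
A = rng.standard_normal((n, n))
A = A @ A.T                          # symmetric positive semidefinite
B = rng.standard_normal((n, n))
B = B @ B.T + n * np.eye(n)          # symmetric positive definite

# Generalized eigenvalue problem A*alpha = lambda*B*alpha, as in Eq. (14).
# eigh returns eigenvalues in ascending order, so the first eigenvector is
# the optimal kernel discriminant vector alpha_opt.
eigvals, eigvecs = eigh(A, B)
alpha_opt = eigvecs[:, 0]

# Monitoring statistic of Eq. (17) for one test sample: alpha_opt^T k is a
# scalar, so the squared norm reduces to a squared difference.
k_t = rng.standard_normal(n)    # test kernel vector k_t
k_bar = rng.standard_normal(n)  # kernel vector K_bar(k) at the matching time slice
D_dist = (alpha_opt @ k_t - alpha_opt @ k_bar) ** 2
```

In practice the two stand-in matrices would be replaced by the normalized kernel matrices built in Eqs. (3)-(13), and D_dist would be evaluated against the KDE-based confidence limit described in the text.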

To inspect whether a fault occurs, the confidence limit of the monitoring


Fig. 5. The overall procedure of the proposed batch process fault detection and identification strategy.

statistic Ddist is required. In order to determine the confidence limit of a monitoring statistic, many process monitoring methods require that the monitored process variables follow a specified distribution. Typically, existing PCA and KPCA methods assume a Gaussian distribution for the monitored process variables. Specifically, assuming that the monitored variables follow a multivariate normal distribution, PCA based methods compute the confidence limits of the T2 and SPE statistics from the F distribution and a weighted χ2 distribution, respectively. Because a batch process is highly complex and no prior knowledge about the distribution of its process variables is available, it is difficult to guarantee that the variables conform to a specific distributional assumption such as the multivariate Gaussian. Hence, the confidence limit of the Ddist monitoring statistic in the proposed DGKSFA method cannot be determined directly from a particular approximate distribution. The well-known kernel density estimation (KDE) technique is a nonparametric empirical density estimation method that needs no assumption about the distribution of the process variables, and data-driven KDE based methods have become popular for confidence limit determination [8,41,54,55]. Therefore, we apply the KDE method to determine the confidence limit of the monitoring statistic Ddist in the proposed DGKSFA based monitoring method. Specifically, a univariate kernel estimator with kernel ker is defined by

p̂(y) = (1/(nh)) ∑_{i=1}^{n} ker((y − y_i)/h)    (18)

where y is the data point under consideration, y_i is an observation from the data set, h is the smoothing parameter, and n is the number of observations. ker denotes the kernel function, which is selected as the Gaussian kernel in this work. The confidence limit of the monitoring statistic Ddist is obtained using the KDE method as follows. First, the values of Ddist are computed from normal operating data. Then the univariate kernel density estimator is used to estimate the density function of these normal Ddist values. Lastly, the confidence limit of Ddist is obtained as the point enclosing 99% of the area under the density function estimated from the normal operating data.

Table 1
The values and selection criteria of the parameters in DGKSFA for the numerical nonlinear system.

Parameter | Value | Selection criterion
Gaussian kernel parameter σ | 400 | Grid search
Trade-off parameter β | 0.6 | Grid search
Neighbor number l | 3 | Based on experience
Neighbor number p | 5 | Based on experience

In order to ensure (1/IK) ∑_{i=1}^{IK} x_ϕ(i) = 0, a mean centering operation in the
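The KDE procedure above can be sketched in a few lines of NumPy. This is a minimal illustration of Eq. (18) with a Gaussian kernel; the function name and the rule-of-thumb bandwidth are our own choices, not specified in the paper:

```python
import numpy as np

def kde_confidence_limit(d_normal, alpha=0.99, n_grid=1000):
    """KDE-based confidence limit: the point enclosing `alpha` of the
    area under the density estimated from normal-operation statistics."""
    d = np.asarray(d_normal, dtype=float)
    n = d.size
    # Silverman's rule-of-thumb bandwidth for a Gaussian kernel (an assumption)
    h = 1.06 * d.std(ddof=1) * n ** (-0.2)
    grid = np.linspace(d.min() - 3 * h, d.max() + 3 * h, n_grid)
    # Eq. (18): p_hat(y) = (1/(n h)) * sum_i ker((y - y_i) / h)
    z = (grid[:, None] - d[None, :]) / h
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    cdf = np.cumsum(density)
    cdf /= cdf[-1]                      # numerical CDF on the uniform grid
    return grid[np.searchsorted(cdf, alpha)]
```

For roughly standard-normal statistics, the returned 99% limit lies close to the parametric quantile 2.33, which is a quick sanity check on the estimator.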


Fig. 6. Online monitoring charts of the numerical nonlinear system fault (Ddist versus sample sequence): (a) MKFDA, (b) DKSFA, (c) DGKSFA.

Table 2
Fault detection times (FDTs) and fault detection rates (FDRs) of the numerical system fault obtained by the three methods.

Method | Statistic | FDT (Sample No.) | FDR (%)
MKFDA | Ddist | 120 | 81.00
DKSFA | Ddist | 114 | 87.00
DGKSFA | Ddist | 103 | 98.00

high-dimensional feature space should be performed before solving Eq. (14). It can be carried out by substituting K with

K̂ = K − K I_IK − I_IK K + I_IK K I_IK    (19)

where I_IK is the IK × IK matrix whose elements are all equal to 1/IK. Before calculating the monitoring statistic Ddist using Eq. (17), the kernel vector k_t should also be mean centered as follows:

k̂_t = k_t − K I_t − I_IK k_t + I_IK K I_t    (20)

where I_t is the IK × 1 vector with its elements equal to 1/IK.

4. Nonlinear contribution plots based on nonlinear biplots

After a fault is detected, it is vital to identify the fault variables in order to find the root cause of the fault. The fault identification task is very challenging, and only a few existing works have discussed fault identification methods. In particular, there is hardly any fault identification research for kernel based process monitoring methods, because the correlation between the monitoring model and the original input variables is lost through the implicit nonlinear transformation.

Based on the idea of the nonlinear biplot, an innovative nonlinear contribution plot is developed to identify the fault variable by finding the original variables with the largest discriminant power for the optimal separation of normal data and fault data in the DGKSFA model.

4.1. Linear biplot and nonlinear biplot

The classical linear biplot is a visualization in which the samples and the variables of a data set are represented together. Based on the singular value decomposition of the column mean-centered data matrix X̃ ∈ ℝ^{n×m} with n samples and m variables [46], the matrix X̃ is decomposed into a score matrix Y and a loading matrix P according to

X̃ = UΛV^T = YP^T    (21)

where Y = UΛ and P = V. To construct a linear biplot, the elements in the first two columns of Y and of P are used. The positions of the samples in a biplot are given by the elements in the first two columns of the score matrix Y. The variable trajectories in a biplot are represented as vectors pointing from the origin to the coordinates given by the elements of the first two columns of the loading matrix P. More details about linear biplots can be found in the related references [44–46].

The key idea leading to the nonlinear biplot is to interpret the representation of variable trajectories in the classical linear biplot as projections of variable pseudo-samples in the nonlinear feature space. A variable pseudo-sample corresponds to a particular observation that carries all the weight of one specific variable. For example, a vector v_j = [0, 0, ⋯, 1, 0, ⋯, 0] represents one of the possible pseudo-
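The two centering operations in Eqs. (19) and (20) can be sketched directly in NumPy. The function names are ours, and the sanity check against an explicitly mean-centered linear kernel is only illustrative:

```python
import numpy as np

def center_kernel(K):
    """Eq. (19): K_hat = K - K*1 - 1*K + 1*K*1, where 1 is the matrix
    with every element equal to 1/IK."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - K @ one - one @ K + one @ K @ one

def center_test_kernel(k_t, K):
    """Eq. (20): k_hat_t = k_t - K*1_t - 1*k_t + 1*K*1_t, where 1_t is
    the vector with every element equal to 1/IK."""
    n = K.shape[0]
    one_t = np.full(n, 1.0 / n)
    one = np.full((n, n), 1.0 / n)
    return k_t - K @ one_t - one @ k_t + one @ K @ one_t

# Sanity check with a linear kernel: centering the kernel matrix equals
# building the kernel from mean-centered data.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
K = X @ X.T
```

For the linear kernel, `center_kernel(K)` coincides with `(X - X.mean(0)) @ (X - X.mean(0)).T`, which confirms that the formulas implement feature-space mean removal.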

Fig. 7. Original pseudo-sample variable trajectories of the numerical nonlinear system fault (scores y1 and y2 with their explained variance; points A and B mark the class centers; legends list variables 1–5): (a) MKFDA, (b) DKSFA, (c) DGKSFA.

samples associated with the variable x̃_j. By projecting the vector v_j onto the latent structure of a one-latent-variable discriminant model, the score for v_j is calculated as follows:

y_new = [0, 0, ⋯, 1, 0, ⋯, 0] α = α_j    (22)

This score y_new is equal to the j-th element of the loading vector α and gives information about the discriminant power of variable x_j in the latent variable model. A pseudo-sample matrix V_j is then created for variable x_j, whose j-th column contains values ranging from the minimum to the maximum of variable x_j, with zeros in all other entries:

V_j = [ 0, 0, ⋯, min(x_j), 0, ⋯, 0
                    ⋯
        0, 0, ⋯, max(x_j), 0, ⋯, 0 ]    (23)

By projecting the matrix V_j onto the latent space of the discriminant model, a trajectory of variable x_j is obtained according to the following equation:

V_j α = [ min(x_j) α_j, ⋯, max(x_j) α_j ]^T    (24)

It is straightforward to generalize this result to the situation where two latent variables are considered; the matrix resulting from the previous operation then defines a variable trajectory along the direction of the variable vector in the nonlinear biplot. Note that for the classical linear biplot only one pseudo-sample per variable is sufficient to represent the variables, whereas for a nonlinear biplot the variable trajectory is curved and multiple pseudo-samples per variable are required. In references [46–50], the analysis of the pseudo-sample variable trajectories in the nonlinear biplot reveals the discriminant power of the original variables in kernel based methods for classification and discrimination purposes. In particular, these variable trajectories are compared with the direction of class separation.

4.2. DGKSFA based nonlinear contribution plot

However, these previous works on the nonlinear biplot have a common shortcoming: they lack a criterion for quantitatively estimating the contribution of an original variable to the nonlinear discriminant model, and for some complex cases they may not function well. To overcome this problem, a new nonlinear contribution plot is developed to determine the contribution of each original variable of the fault snapshot dataset to the DGKSFA model for fault variable identification. In the proposed nonlinear contribution plot, an objective guideline is designed that takes the amount of a variable trajectory projected along the optimal class separation direction as the contribution of the corresponding original variable of the fault snapshot dataset to the DGKSFA model. After a fault is detected by the DGKSFA based batch process
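The pseudo-sample construction of Eqs. (23) and (24) amounts to a few lines of NumPy. This is a sketch; the function name and the number of pseudo-samples per variable are our own choices:

```python
import numpy as np

def pseudo_sample_trajectory(x_min, x_max, j, m, alpha, n_points=25):
    """Project pseudo-samples of variable j onto a one-latent-variable model.

    Builds V_j as in Eq. (23): zeros everywhere except column j, which
    sweeps the variable's observed range; Eq. (24) is then V_j @ alpha.
    """
    V = np.zeros((n_points, m))
    V[:, j] = np.linspace(x_min, x_max, n_points)
    return V @ alpha                    # equals linspace(...) * alpha[j]

# For a linear latent model the trajectory is a straight segment scaled
# by the j-th loading, illustrating the score of Eq. (22).
alpha = np.array([0.2, -0.5, 0.7, 0.1])
traj = pseudo_sample_trajectory(-1.0, 1.0, j=2, m=4, alpha=alpha)
```

With a kernel (nonlinear) model the same pseudo-samples would instead be passed through the kernel transformation, producing the curved trajectories discussed above.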

Fig. 8. Fault identification using the proposed contribution plot for the numerical nonlinear system fault (contribution versus variables 1–5): (a) MKFDA, (b) DKSFA, (c) DGKSFA.

Table 3
The IVI and RCD indices of the three methods for the numerical nonlinear system fault.

Variable number | MKFDA IVI | MKFDA RCD | DKSFA IVI | DKSFA RCD | DGKSFA IVI | DGKSFA RCD
1 | 1 | 0.1025 | 1 | 0.3618 | 1 | 0.9156

Table 4
The monitored variables of the penicillin fermentation process.

Variable number | Process variable
1 | Aeration rate
2 | Agitator power
3 | Substrate feed rate
4 | Substrate feed temperature
5 | Dissolved oxygen concentration
6 | Culture volume
7 | Carbon dioxide concentration
8 | pH
9 | Temperature
10 | Generated heat

Fig. 9. The flow diagram of the fed-batch penicillin fermentation process.

monitoring method, the fault samples are collected to construct the fault snapshot dataset S = [x_S(1), x_S(2), ⋯, x_S(nS)]^T ∈ ℝ^{nS×J}, where nS is the number of fault samples in dataset S. Then the proposed DGKSFA model is rebuilt as in Eq. (25), based only on the fault snapshot dataset S and the unfolded normal operating dataset X(IK × J):

J_DGKSFA = min_{α_j} [ α_j^T (β L̃_K − (1 − β) C̃_K) α_j ] / [ α_j^T Q_S α_j ]    (25)

where the matrix Q_S denotes the kernel between-class pseudo-temporal variation matrix between the normal operating dataset X(IK × J) and the fault snapshot dataset S, and the matrices L̃_K and C̃_K are calculated as in Section 2. By solving the minimization problem in Eq. (25), the loading matrix A = [α_1, α_2] is obtained, where the vector α_1 corresponds to the smallest eigenvalue and the vector α_2 corresponds to the second smallest eigenvalue. For j = 1, 2, ⋯, J, an nS × J pseudo-sample matrix V_j for the j-th original variable of the fault snapshot dataset is created as in Eq. (23), using the variable's minimum and maximum values in the fault data. Then, the

pseudo-sample matrix V_j is transformed into a pseudo-sample kernel matrix K_Vj through the same kernel transformation used for constructing the DGKSFA model:

K_Vj = ϕ(V_j) ϕ^T(X) = V_j^ϕ (X^ϕ)^T    (26)

where V_j^ϕ is the abbreviated form of ϕ(V_j). The obtained matrix K_Vj ∈ ℝ^{nS×IK} contains information about the dissimilarity between the nS pseudo-samples of the fault data and the IK observations of the normal operating data. The matrix K_Vj is mean centered as follows:

K̂_Vj = K_Vj − I_Vj K − K_Vj I_IK + I_Vj K I_IK    (27)

where I_Vj is the nS × IK matrix with its elements equal to 1/IK. Then, the j-th pseudo-sample kernel matrix K̂_Vj is projected onto the two-dimensional latent structure of the DGKSFA model:

Y_Vj = K̂_Vj A    (28)

Finally, the score matrix Y_Vj is visualized in the two-dimensional biplot to obtain the pseudo-sample trajectory of the j-th variable. The above procedure is repeated to generate a nonlinear biplot containing pseudo-sample trajectories for the J original variables of the fault snapshot dataset. Note that when different classes of data are not linearly separable, the curvature of a trajectory illustrates the effect and importance of the nonlinear kernel transformation.

Table 5
Normal operating conditions of the penicillin fermentation process.

Initial conditions
Substrate concentration (g/L): 14–18
Dissolved oxygen concentration (mmol/L): 1–1.2
Biomass concentration (g/L): 0.1
Penicillin concentration (g/L): 0
Culture volume (L): 100–105
Carbon dioxide concentration (mmol/L): 0.5–1
pH: 4.5–5.5
Bioreactor temperature (K): 298–299
Generated heat (kcal): 0

Set points
Aeration rate (L/h): 8–9
Agitator power (W): 29–31
Substrate feed flow rate (L/h): 0.039–0.045
Substrate feed temperature (K): 296–297
pH: 5–5.05
Bioreactor temperature (K): 298–299

In our study, we find that the projection of these variable pseudo-sample trajectories along the optimal class separation direction in the nonlinear biplot reflects the contribution of the original variables of the fault snapshot dataset to class separation in the DGKSFA model. Next, the Iris dataset, a widely used benchmark for discriminant and cluster analysis [56], is used to help explain the proposed nonlinear contribution plot. The DGKSFA scores of the Setosa class and the Virginica class of the Iris benchmark data set are plotted in Fig. 2, where the Setosa class is completely separated from the Virginica class. y1 and y2 are the first and second score vectors yielding the optimal separation between the Setosa class and the Virginica class in the biplot. Points A and B are the centers of gravity of the Setosa class and the Virginica class, respectively. The red dotted line connecting points A and B represents the optimal separation direction of the two classes. To recover the discriminant power of the original variables for the separation of the Setosa class and the Virginica class, the pseudo-sample trajectory of each original variable in the Virginica class is constructed and projected into the space spanned by the corresponding scores y1 and y2. These projections are visualized in Fig. 3; as illustrated, the variables Sepal Length (variable 1), Petal Length (variable 3) and Petal Width (variable 4) have large discriminant power for the class separation, which is in accordance with our hypothesis. However, the most discriminative variable for the class separation needs to be identified further.

A novel objective criterion based on the nonlinear biplot is proposed to calculate the discriminant power of the original variables in the DGKSFA model. In Fig. 4, points C1, C2, C3 represent the centers of gravity of the trajectories of variables 1, 3, 4, respectively. The vector AB represents the optimal class separation direction between the Setosa class and the Virginica class, and the vectors AC_i, i = 1, 2, 3, represent the pseudo-sample trajectory vectors of variables 1, 3, 4, respectively. We find that the discriminant power of each original variable in the DGKSFA model is related not only to the length of the variable pseudo-sample trajectory but also to the angle formed by the optimal class separation direction AB and the trajectory. Owing to the nonlinear kernel transformation, a variable pseudo-sample trajectory may be curved, so the norm of the vector AC_i is regarded as the trajectory length, and the angle θ_i between the vectors AB and AC_i is regarded as the angle formed by the vector AB and the trajectory. The contribution (or discriminant power) of each original variable to the DGKSFA model is defined as

Cont_i = ‖AC_i‖ · cos θ_i = ‖AC_i‖ · (AC_i · AB)/(‖AC_i‖ ‖AB‖) = (AC_i · AB)/‖AB‖    (29)

where cos θ_i = (AC_i · AB)/(‖AC_i‖ ‖AB‖). A large value of Cont_i indicates that the i-th original variable has large discriminant power in the DGKSFA model. The variable with the largest discriminant power is regarded as the fault variable because it makes the largest contribution to the separation of normal data and fault data.

To evaluate the fault variable identification performance, two performance indices are used in this work. One index, called the identified variable index (IVI) [57], is defined as

IVI = argmax_i (Cont_i), i = 1, 2, ⋯, m    (30)

which represents the index of the variable with the largest contribution to class separation. The other performance index is the recognition contrast degree (RCD), constructed to judge the recognition contrast by comparing the largest and the second largest variable contributions:

RCD = r · (Cont_max − Cont_submax)/Cont_max    (31)

where Cont_max is the largest variable contribution, Cont_submax is the second largest variable contribution, and r is set to 1 for correct fault variable identification and −1 for wrong identification. A high contrast degree yields a convincing, clear and correct recognition.

5. Batch process fault detection and identification strategy based on DGKSFA

The DGKSFA based batch process fault detection and identification procedure includes two stages, a fault detection stage and a fault identification stage, as illustrated in Fig. 5; the fault detection stage can be further divided into an offline modeling stage and an online detection stage. In the offline modeling stage, the fault pattern dataset F is first constructed using C kinds of historical fault data; then the three-way normal dataset X(I × J × K) and the fault pattern dataset F are utilized to build the DGKSFA model. Lastly, the confidence limit of the monitoring statistic is determined based on the normal dataset. During the online detection stage, the collected test data is first normalized; then the mean centered test kernel vector k̂_t is calculated. Lastly, the monitoring statistic Ddist is built using the test kernel vector k̂_t and the average kernel matrix K(K × IK) of the normal dataset to determine whether the process is
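Under the definitions above, Eqs. (29)–(31) reduce to simple vector operations. The following sketch uses our own function names and assumes 2-D biplot coordinates for the class centers and trajectory centers:

```python
import numpy as np

def contribution(A, B, C_i):
    """Eq. (29): project the trajectory vector AC_i onto the optimal
    class-separation direction AB; Cont_i = (AC_i . AB) / ||AB||."""
    AB = np.asarray(B, dtype=float) - np.asarray(A, dtype=float)
    AC = np.asarray(C_i, dtype=float) - np.asarray(A, dtype=float)
    return float(AC @ AB / np.linalg.norm(AB))

def ivi_and_rcd(conts, true_variable):
    """Eq. (30): IVI = argmax_i Cont_i.  Eq. (31): RCD from the two
    largest contributions, signed by whether the identification is correct."""
    conts = np.asarray(conts, dtype=float)
    ivi = int(np.argmax(conts))
    c_sorted = np.sort(conts)[::-1]
    r = 1 if ivi == true_variable else -1
    return ivi, r * (c_sorted[0] - c_sorted[1]) / c_sorted[0]
```

An RCD close to 1 means the top contributor dominates the runner-up, i.e., a clear identification; an RCD near 0 means the verdict is weak even if correct.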

Fig. 10. Data characteristics of process variables in the penicillin fermentation process.

Table 6
Basic information about the six fault batches.

No. | Variable | Fault type | Fault magnitude | Starting time (h) | Terminal time (h)
F1 | Aeration rate | Step | +10% | 100 | 400
F2 | Agitator power | Step | +10% | 100 | 400
F3 | Substrate feed rate | Step | +10% | 100 | 400
F4 | Aeration rate | Ramp | +1 | 100 | 400
F5 | Agitator power | Ramp | +3 | 100 | 400
F6 | Substrate feed rate | Ramp | +0.004 | 100 | 400

Table 7
The values and selection criteria of the parameters in DGKSFA for the penicillin fermentation process.

Parameter | Value | Selection criterion
Gaussian kernel parameter σ | 600 | Grid search
Trade-off parameter β | 0.7 | Grid search
Neighbor number l | 4 | Based on experience
Neighbor number p | 6 | Based on experience

normal. In the fault identification stage, a new DGKSFA model is first rebuilt using only the normal dataset X and the fault snapshot dataset S. Then, the contribution of each original variable in the fault snapshot

Fig. 11. Online monitoring charts of the penicillin fermentation process fault F3 (Ddist versus time in hours): (a) MKFDA, (b) DKSFA, (c) DGKSFA.

dataset is computed based on the DGKSFA nonlinear biplot to determine which variable is responsible for the fault. The detailed fault detection stage and fault identification stage can be summarized as follows.

The fault detection stage

The offline modeling stage

(1) Collect the C kinds of historical fault data to construct the fault pattern dataset F according to Eq. (7), and unfold the three-way normal operating matrix into the two-way data matrix X(IK × J) using the two-step MDU technique. Then normalize the normal dataset X(IK × J) and the fault pattern dataset F.
(2) Calculate the mean centered kernel matrix K̂ and the kernel intra-class pseudo-temporal variation matrix L_k using the normal dataset X(IK × J), and construct the kernel between-class pseudo-temporal variation matrix Q_k using the normal dataset X(IK × J) and the fault pattern dataset F.
(3) Build the DGKSFA statistical model based on the kernel matrix K̂ and the kernel pseudo-temporal variation matrices L_k and Q_k, and solve the eigenvalue problem in Eq. (14) to obtain the optimal kernel discriminant vector α_opt.
(4) Calculate the average kernel matrix K(K × IK) of the normal dataset using Eq. (16) and compute the confidence limit Dlimit of the monitoring statistic based on the matrix K(K × IK).

The online detection stage

(1) Normalize the collected test data x_t and calculate the test kernel vector k_t. Then mean center the test kernel vector k_t in the kernel space using Eq. (20).
(2) Calculate the monitoring statistic Ddist of the test data x_t according to Eq. (17) and compare it with its corresponding confidence limit to determine whether a fault occurs.

The fault identification stage

(1) If a fault is detected, collect the fault samples to construct the fault snapshot dataset S. Then rebuild a new DGKSFA model based only on the fault snapshot dataset S and the normal dataset X(IK × J) according to Eq. (25).
(2) Plot the optimal class separation vector and the pseudo-sample trajectory of each original process variable of the fault snapshot dataset in a nonlinear biplot based on the DGKSFA model.
(3) Compute the contribution of each original variable of the fault snapshot dataset to the separation of the normal dataset and the fault snapshot dataset in the DGKSFA model using Eq. (29). The variable with the largest contribution is regarded as the fault variable.

6. Case studies

To evaluate the monitoring and fault identification performance of the proposed method, two case studies are presented in this section. One is a numerical example that simulates a nonlinear dynamic batch process. The other is the well-known benchmark fed-batch penicillin fermentation process. In both case studies, the fault diagnosis performance of the proposed DGKSFA method is compared with two monitoring schemes, namely the DKSFA method and the MKFDA


Fig. 12. Original pseudo-sample variable trajectories of the penicillin fermentation process fault F3.

method.

6.1. A numerical nonlinear system

A simple numerical nonlinear system is simulated with the following model [16,20,58]:

z(k) = [  0.118  −0.191   0.287
          0.847   0.264   0.943
         −0.333   0.514  −0.217 ] z(k − 1) + [ 0.13   2
                                               0     −4
                                               0      1 ] u²(k − 1)    (32)

y(k) = z(k) + v(k)    (33)

where u is the correlated input:

u(k) = [ 0.811  −0.226
         0       0.415 ] u(k − 1) + [  0.193   0.689
                                      −0.320  −0.749 ] w(k − 1)    (34)

where w(k − 1) is Gaussian noise with zero mean and unit variance, v(k) is measurement noise following a Gaussian distribution with zero mean and 0.01 variance, u(k) = [u1(k), u2(k)]^T are the input variables, z(k) = [z1(k), z2(k), z3(k)]^T are the system state variables, and y(k) = [y1(k), y2(k), y3(k)]^T are the system outputs. The five variables [u1(k), u2(k), y1(k), y2(k), y3(k)] are used as the monitored variables. To verify the monitoring algorithms, one normal operating dataset of 30 batches is generated, where each batch is composed of 200 samples. To simulate stochastic variations among batches, small changes are applied to the variable correlation coefficients. Furthermore, a fault case is simulated: variable u1 (variable 1) is linearly increased from sample 101 by adding 0.03 × (k − 100) to the previous value until the end of the batch.

The three methods MKFDA, DKSFA and DGKSFA are applied to develop statistical monitoring models for fault detection. For the DGKSFA method, the Gaussian kernel is selected as the kernel function, following the literature [16,20]. The normal operating data is divided into a training dataset and a validating dataset. A grid search [59–62] over the kernel parameter σ and the trade-off parameter β is conducted in search of the optimal fault detection result on the normal operating data. Specifically, the kernel parameter σ is searched over the wide range {1, 100, 200, …, 900, 1000}, while the trade-off parameter β is searched over {0.1, 0.2, 0.3, …, 0.8, 0.9}. The neighbor numbers l and p in DGKSFA are empirically chosen to be 3 and 5, respectively. By solving the generalized eigenvalue decomposition in Eq. (14), the optimal kernel discriminant vector α_opt is selected as the eigenvector corresponding to the smallest eigenvalue. By solving the minimization problem in Eq. (25), the vector α_1 corresponding to the smallest eigenvalue and the vector α_2 corresponding to the second smallest eigenvalue construct the loading matrix A = [α_1, α_2]. The values and the selection criteria of the parameters in DGKSFA for the numerical nonlinear system are listed in Table 1. For a fair comparison, the Gaussian kernel function is also used in the MKFDA and DKSFA methods, with the kernel parameter σ likewise set to 400; the neighbor numbers l and p in DKSFA are also chosen as 3 and 5, respectively. For the MKFDA and DKSFA methods, to detect the process fault, the optimal kernel discriminant vectors are selected as the eigenvectors corresponding to the biggest and smallest eigenvalues, respectively. To identify the fault variable, the loading matrices in the MKFDA and DKSFA methods are constructed with the eigenvectors corresponding to the
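Eqs. (32)–(34) can be simulated directly. The sketch below uses the matrix entries as printed, reads u2(k − 1) as the elementwise square of the input vector (our assumption, chosen for dimensional consistency with the 3 × 2 input matrix), and takes the noise levels stated above:

```python
import numpy as np

# System matrices as printed in Eqs. (32) and (34).
A = np.array([[0.118, -0.191, 0.287],
              [0.847,  0.264, 0.943],
              [-0.333, 0.514, -0.217]])
B = np.array([[0.13, 2.0],
              [0.0, -4.0],
              [0.0,  1.0]])
Cu = np.array([[0.811, -0.226],
               [0.0,    0.415]])
Dw = np.array([[0.193,  0.689],
               [-0.320, -0.749]])

def simulate_batch(n=200, seed=0):
    """One batch of the numerical system; monitored variables are
    [u1, u2, y1, y2, y3] as in the case study."""
    rng = np.random.default_rng(seed)
    z, u = np.zeros(3), np.zeros(2)
    data = np.zeros((n, 5))
    for k in range(n):
        u = Cu @ u + Dw @ rng.standard_normal(2)   # correlated input, Eq. (34)
        z = A @ z + B @ (u ** 2)                   # state update, Eq. (32)
        y = z + 0.1 * rng.standard_normal(3)       # measurement noise, Eq. (33)
        data[k] = np.concatenate([u, y])
    return data
```

The ramp fault of the case study would be injected by adding 0.03 × (k − 100) to u1 from sample 101 onward before the state update.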

Fig. 13. Fault identification using the proposed contribution plot for the penicillin fermentation process fault F3 (contribution versus variables 1–10): (a) MKFDA, (b) DKSFA, (c) DGKSFA.

biggest and smallest eigenvalues and the eigenvectors corresponding to the second biggest and second smallest eigenvalues, respectively. For the three monitoring methods, the fault detection threshold of each monitoring statistic is computed as the 99% confidence limit of normal operating data. The fault detection rate is defined as the percentage of fault samples exceeding the detection threshold over all fault samples. In order to decrease false alarms, a fault is declared only when five consecutive samples exceed the 99% confidence limit, and the first of these samples defines the fault detection time.

The simulated fault is a ramp change in variable 1 (variable u1). The fault monitoring charts of the three methods are shown in Fig. 6. From Fig. 6(a), it can be seen that the MKFDA Ddist statistic detects the fault at the 120th sample. When DKSFA is applied, it gives a better monitoring result, shown in Fig. 6(b), where the Ddist statistic alarms the fault at the 114th sample. The monitoring result of DGKSFA is plotted in Fig. 6(c), which indicates that the Ddist statistic detects the fault at the 103rd sample. Therefore, DGKSFA is the most sensitive to the numerical system fault. The fault detection times and fault detection rates obtained by the three methods are listed in Table 2. The fault detection rate of DGKSFA Ddist is 98%, the highest among the three methods. To sum up, the case study on the numerical nonlinear system shows that DGKSFA outperforms the DKSFA and MKFDA methods in terms of both fault detection time and fault detection rate.

After the fault is detected, the next step is to identify it. In this paper, the proposed nonlinear contribution plot method based on the nonlinear biplot is combined with MKFDA and DKSFA in the same way as in DGKSFA, so nonlinear contribution plots of MKFDA, DKSFA and DGKSFA are all used to recognize the fault. The optimal class separation vector AB (the red dotted line) and the pseudo-sample trajectories of the original variables of the fault snapshot dataset for MKFDA, DKSFA and DGKSFA are visualized in the biplots shown in Fig. 7(a)-(c). The graphical comparison of the pseudo-sample variable trajectories with the corresponding optimal separation vector AB permits evaluation of the contribution of each original variable to class separation. According to Fig. 7, variable 1 (variable u1) is recognized as the most discriminative variable for the class separation by all three methods, confirming our hypothesis. However, as illustrated in Fig. 7(a)-(b), the fault-free variables 3 and 4 are also deemed to have much larger discriminant power by MKFDA and DKSFA. To determine the contributions of the original variables quantitatively, the proposed contribution plots of the three methods are shown in Fig. 8(a)-(c). In Fig. 8(a)-(b), the MKFDA based and DKSFA based contribution plots both indicate that variable 1 has the largest contribution to this fault; however, the contributions of other variables are also very large, which leads to an unconvincing conclusion. In contrast, the DGKSFA based contribution plot shown in Fig. 8(c) not only identifies the root cause of the fault correctly but also yields a large difference between the contribution value of the fault variable and those of the normal variables, which ensures a clear identification with a high contrast degree. A summarized comparison of the MKFDA based, DKSFA based and DGKSFA based contribution plots is given by the performance indices in Table 3. Although all three contribution plots determine the root cause of the fault correctly, their recognition contrast degrees are quite different. The RCD values of the MKFDA based and DKSFA based contribution plots are both below 0.4, indicating that the performance of these two methods is not satisfactory because of their small recognition contrast degrees. On the contrary, the DGKSFA based contribution plot gives ideal results, with an RCD value of 0.9156.
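The alarm rule described above (a fault is declared only when five consecutive samples exceed the 99% confidence limit, with the first of these samples taken as the fault detection time) can be sketched as follows; the function name is ours:

```python
def fault_detection_time(d_stats, limit, run=5):
    """Return the index of the first sample of the first run of `run`
    consecutive statistics above `limit`, or None if no fault is declared."""
    count = 0
    for i, d in enumerate(d_stats):
        count = count + 1 if d > limit else 0
        if count == run:
            return i - run + 1
    return None
```

Isolated excursions over the limit shorter than the run length are ignored, which is exactly the false-alarm suppression the rule is meant to provide.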

Fig. 14. Online monitoring charts of the penicillin fermentation process fault F6 (Ddist versus time in hours): (a) MKFDA, (b) DKSFA, (c) DGKSFA.

6.2. The benchmark fed-batch penicillin fermentation process

The fed-batch penicillin fermentation process shown in Fig. 9 is a well-known benchmark simulation widely used to evaluate batch process monitoring and fault diagnosis methods [12,52,53,63]. There are four physiological phases in penicillin fermentation, i.e., lag, exponential cell growth, stationary, and cell death, which delineate the progress of the process. To ensure a high final penicillin concentration, only the first three physiological phases are considered [63]. Furthermore, the fed-batch penicillin fermentation process can be divided into two operational stages: a pre-culture stage followed by a fed-batch stage. Most of the necessary cell mass is generated during the initial pre-culture stage, and penicillin begins to be produced in the exponential cell growth phase. During the fed-batch stage, the biomass growth rate should be kept constant to maintain high penicillin production; thus, glucose is continuously supplied to the system instead of being added entirely at the beginning. The fermenter temperature and pH are controlled to provide the best conditions for penicillin production. Therefore, the fed-batch penicillin fermentation process, as a typical multiphase batch process, exhibits inherent time-varying dynamics.

In this study, the modular simulator Pensim V.2.0 [64] is used to generate the simulation dataset of the penicillin fermentation process. For the purpose of algorithm evaluation, we collect the 10 variables shown in Table 4 as the monitored variables, and Gaussian noise is added to each process variable during the sampling procedure. The selected sampling rate is 1 h, and all batch runs have the same duration of 400 h. The pre-culture stage lasts approximately from 1 h to 45 h, while the fed-batch stage lasts approximately from 46 h to 400 h. Considering small variations in the initial conditions and set points, a normal operating dataset of 40 batches is randomly generated according to the normal operating conditions listed in Table 5. The data characteristics of the monitored process variables listed in Table 4 are presented in Fig. 10 using five randomly selected normal batches. In Fig. 10, most of the process variables show inherent time-varying dynamics, i.e., multiphase behavior; much stronger time-varying dynamics and much higher uncertainty can be clearly observed in the transition period, which lasts approximately from 40 h to 50 h. The inherent time-varying dynamics of the fed-batch penicillin fermentation can thus be observed visually in Fig. 10. Besides, six faulty batches are simulated, which also last 400 h. The tested faults involve step and ramp changes of the aeration rate, the agitator power and the substrate feed rate; detailed fault information is listed in Table 6.

For DGKSFA, the kernel function is chosen as the Gaussian kernel according to references [52,53]. The kernel parameter σ and the trade-off parameter β in DGKSFA are determined by the same grid search method used in the first case study. The neighbor numbers l and p are empirically selected as 4 and 6, respectively. The optimal kernel discriminant vector α_opt and the loading matrix A in DGKSFA are constructed following the criteria introduced in Section 3 and Section 4. The values and the selection criteria of the parameters in DGKSFA for the penicillin fermentation process are listed in Table 7. To be fair, the Gaussian kernel function is also used in the MKFDA and DKSFA methods with the kernel parameter σ likewise set to 600, and the neighbor numbers l and p in DKSFA are chosen as 4 and 6, respectively. For the MKFDA and DKSFA methods, the optimal kernel discriminant vectors for fault detection and the loading matrices for fault variable identification are constructed according to the same principles used in the first case study. For the three monitoring methods, the fault detection threshold is set as the 99% confidence limit of normal

ISA Transactions 79 (2018) 108–126

H. Zhang et al.

Fig. 15. Original pseudo-sample variable trajectories of the penicillin fermentation process fault F6.

fault snapshot dataset qualitatively. In Fig. 13 (a) and (b), the MKFDA based contribution plot identifies variable 5 as the root cause of the fault while the DKSFA based contribution plot suggests that variable 10 is affected most. Therefore, the two methods cannot give the accurate fault variable. For the DGKSFA based contribution plot in Fig. 13 (c), the variable 3 is recognized as the maximum contributor to the fault F3, which is more accurate than MKFDA based and DKSFA based contribution plots. Fault F6 involves the ramp variation of substrate feed rate (variable 3). When fault F6 occurs, the monitoring charts of three methods are shown in Fig. 14. According to Fig. 14 (c), DGKSFA obtains the quickest fault detection, which alerts the fault at the 112 h for the Ddist statistic. By contrast, MKFDA performs the worst in Fig. 14 (a), whose Ddist statistic detects the fault at the 154 h. Compared to MKFDA, DKSFA gives a better result in Fig. 14 (b), where the DKSFA Ddist statistic alarms the fault at the 132 h. The monitoring charts of fault F6 demonstrate that the DGKSFA-based monitoring method can detect the fault F6 much faster than then other two monitoring methods. ⎯⎯⎯⇀ The optimal class separation vector AB (the red dotted line) and pseudo-sample variable trajectories of the fault snapshot dataset of MKFDA, DKSFA and DGKSFA are visualized in Fig. 15 (a)-(c). In Fig. 15 (a) and (b), it is rather clear that the variables 3, 5, 7 and 10 have large discriminant power by comparing their pseudo-sample trajectories with ⎯⎯⎯⇀ the optimal class separation vector AB . From Fig. 15 (c), variables 3, 5 and 10 are recognized as the most discriminative variables to the class

operation data. In all monitoring charts, we consider a fault is detected if continuous five samples exceed confidence limit. Fault F3 is firstly illustrated, which is a step increase of substrate feed rate (variable 3) occurring from the 100-th hour. The monitoring charts obtained by MKFDA, DKSFA and DGKSFA are listed in Fig. 11. The monitoring result of MKFDA is shown in Fig. 11 (a), where the MKFDA Ddist statistic indicates the fault at the 128 h. However, the Ddist statistic values of many fault samples fluctuate around the control limit after 128 h, which leads to a relatively large missing alarm rate. In Fig. 11 (b), DKSFA obtains a better performance than MKFDA and its Ddist statistic detects the fault at the 120 h. In contrast to the monitoring results of Fig. 11 (a) and (b), DGKSFA does best in Fig. 11 (c), where it is observed that Ddist statistic exceeds its confidence limits clearly from the 100 h. DGKSFA gives the earliest fault detection results. Generally, DGKSFA demonstrates its great advantage in the monitoring of the fault F3. To identify the fault variable, the optimal class separation vector ⎯⎯⎯⇀ AB (the red dotted line) and pseudo-sample variable trajectories of the fault snapshot dataset of MKFDA, DKSFA and DGKSFA are visualized in the biplots shown in Fig. 12 (a)-(c). Fig. 12 (a) indicates that the variables 3, 5, 6 and 10 have large discriminant power. According to Fig. 12 (b) and (c), variables 3, 5 and 10 are recognized as the most discriminative variables to the class separation. Next, the proposed contribution plots of the three methods illustrated in Fig. 13 (a)-(c) are calculated to determine the contribution of the original variable in the
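The pseudo-sample variable trajectories plotted in Figs. 12 and 15 follow the kernel biplot idea of [46-51]: one variable is swept over its observed range while the others are held at their means, and the resulting pseudo-samples are projected through the kernel model. The following is a minimal sketch of that idea only, not the authors' DGKSFA implementation; the data matrix, the discriminant vector `alpha` and the kernel width `sigma` are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # placeholder training data: 50 samples, 10 variables
alpha = rng.normal(size=50)     # placeholder kernel discriminant vector
sigma = 5.0                     # placeholder Gaussian kernel width

def kernel(A, B, sigma):
    # Gaussian kernel matrix between the rows of A and the rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def pseudo_sample_trajectory(j, n_grid=20):
    # sweep variable j over its observed range, all other variables fixed at the mean
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_grid)
    P = np.tile(X.mean(axis=0), (n_grid, 1))
    P[:, j] = grid
    # project the pseudo-samples onto the kernel discriminant direction
    return kernel(P, X, sigma) @ alpha

traj = pseudo_sample_trajectory(2)  # trajectory of variable 3 (0-based index 2)
print(traj.shape)                   # one projected value per grid point
```

A variable whose trajectory aligns closely with the optimal class separation vector would then be judged to have large discriminant power, which is the qualitative comparison made in the biplots.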

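The fault detection times (FDTs) discussed above and tabulated below all follow from the same alarm rule: a fault is declared at the first sample of a run of five consecutive statistic values above the 99% confidence limit. A minimal sketch of that rule, using a synthetic statistic series rather than the authors' Ddist values:

```python
import numpy as np

def fault_detection_time(stat, limit, n_consecutive=5):
    """Return the first sample index at which `n_consecutive` successive
    values of the monitoring statistic exceed the confidence limit,
    or None if no alarm is ever raised."""
    run = 0
    for t, value in enumerate(stat):
        run = run + 1 if value > limit else 0
        if run == n_consecutive:
            return t - n_consecutive + 1  # first sample of the alarm run
    return None

# synthetic Ddist-like series: in control until sample 100, faulty afterwards
stat = np.concatenate([np.full(100, 0.5), np.full(50, 2.0)])
print(fault_detection_time(stat, limit=1.0))  # -> 100
```

Requiring a run of consecutive violations rather than a single exceedance is what keeps isolated noise spikes from triggering false alarms in the monitoring charts.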

Fig. 16. Fault identification using the proposed contribution plot for the penicillin fermentation process fault F6: (a) MKFDA, (b) DKSFA, (c) DGKSFA.

Table 8
FDTs (Sample No.) and FDRs (%) of the Ddist statistic obtained by the three methods.

Fault   MKFDA           DKSFA           DGKSFA
        FDT    FDR      FDT    FDR      FDT    FDR
F1      100    100      100    100      100    100
F2      100    100      100    100      100    100
F3      128    83.11    120    93.92    100    100
F4      142    84.46    130    90.54    110    97.30
F5      136    85.14    128    91.23    106    98.65
F6      154    82.24    132    89.87    112    96.62
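The average fault detection rates compared in Fig. 17 follow directly from the FDR columns of Table 8. A quick check, with the values transcribed from the table:

```python
# FDRs (%) of the Ddist statistic for faults F3-F6, taken from Table 8
fdr = {
    "MKFDA":  [83.11, 84.46, 85.14, 82.24],
    "DKSFA":  [93.92, 90.54, 91.23, 89.87],
    "DGKSFA": [100.0, 97.30, 98.65, 96.62],
}
avg = {method: sum(rates) / len(rates) for method, rates in fdr.items()}
for method, value in avg.items():
    # MKFDA ~ 83.74, DKSFA ~ 91.39, DGKSFA ~ 98.14
    print(f"{method}: {value:.2f}")
```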

In order to give an objective criterion for evaluating the contributions of the original variables in the fault snapshot dataset, the proposed contribution plots of the three methods are calculated in Fig. 16(a)-(c). With the MKFDA based contribution plot in Fig. 16(a), the fault variable is falsely recognized as variable 5 because it has the largest contribution. Although the DKSFA based contribution plot in Fig. 16(b) indicates that variables 3 and 10 are more responsible for fault F6, variable 3 is still not identified as the root cause of the fault. On the contrary, according to the DGKSFA based contribution plot illustrated in Fig. 16(c), the largest contribution to the class separation comes from variable 3. Therefore, the DGKSFA based contribution plot identifies the abnormal variable more clearly and more accurately than the MKFDA based and DKSFA based contribution plots.

The monitoring performance of the three methods on all six fault scenarios of the fed-batch penicillin fermentation process is investigated next. The fault detection times and fault detection rates obtained by the three methods are tabulated in Table 8. From Table 8, all three monitoring methods detect the step-change faults F1 and F2 immediately. However, the MKFDA and DKSFA methods cannot detect the step-change fault F3 right after it is introduced to the process, while the DGKSFA-based method alerts fault F3 with no fault detection delay. In contrast to the step-change faults, the ramp faults F4 to F6 are more difficult to detect because of their slowly varying process parameters. For this challenging ramp fault detection problem, the proposed DGKSFA-based monitoring method achieves much faster fault detection than the other two monitoring methods. The results of Table 8 clearly demonstrate the superior ability of the DGKSFA-based method to shorten the fault detection delay for the challenging fault patterns F3 to F6. From Table 8, all three methods achieve 100% fault detection rates for the step-change faults F1 and F2, while for the challenging faults F3 to F6 the proposed DGKSFA-based method obtains higher fault detection rates than the other two methods. The average fault detection rates of each monitoring statistic over faults F3-F6 are illustrated in Fig. 17 for a more intuitive comparison; the results of Fig. 17 further confirm the superior monitoring performance of the DGKSFA-based method over the MKFDA-based and DKSFA-based methods.

Fig. 17. Comparison of the average fault detection rates over the faults F3, F4, F5 and F6.

After the six faults (F1-F6) are detected, the next step is to identify the fault variables. The MKFDA based, DKSFA based and DGKSFA based contribution plots for the six faults are calculated, and a quantitative performance comparison using the IVI and RCD indices is listed in Table 9. For faults F1, F2, F4 and F5, all three kinds of contribution plots identify the fault variable correctly, but their RCD values differ, especially for faults F4 and F5. The DGKSFA based contribution plot identifies the fault variables of F4 and F5 more clearly because its RCD values for these faults are much larger than those of the MKFDA based and DKSFA based contribution plots. The fault variable (substrate feed rate) of faults F3 and F6 is difficult to diagnose because the effect of the glucose substrate feed rate propagates to other correlated variables. The results in Table 9 show that the MKFDA based and DKSFA based contribution plots are unable to identify the fault variable correctly for faults F3 and F6. In contrast, the DGKSFA based contribution plot finds the true fault variable for both faults, with RCD values above 0.4 in both cases, which indicates clear identification. According to the above results and analysis, the proposed DGKSFA based contribution plot is superior to the MKFDA based and DKSFA based contribution plots in recognizing fault variables.

Table 9
The IVI and RCD indices of the three methods for the six faults.

Fault    Variable    MKFDA              DKSFA              DGKSFA
number   number      IVI    RCD         IVI    RCD         IVI    RCD
F1       1           1      0.9605      1      0.9989      1      0.9998
F2       2           2      0.9593      2      0.9995      2      0.9999
F3       3           5      -0.8057     10     -0.7199     3      0.4199
F4       1           1      0.0376      1      0.8786      1      0.9999
F5       2           2      0.8321      2      0.9197      2      0.9999
F6       3           5      -0.6095     10     -0.0221     3      0.5861

7. Conclusions

A novel statistical modeling approach, referred to as DGKSFA, has been proposed for batch process monitoring and fault diagnosis. Our contribution is twofold. First, the DGKSFA model is derived by integrating discriminant analysis into the GKSFA method to mine discriminant information and time-varying dynamics for effective fault detection of batch processes. Specifically, discriminant analysis is used to extract the discriminatory information of the different classes, while GKSFA preserves the global and local structure information of nonlinear process data. Second, the challenging problem of fault variable identification for kernel based methods is investigated once a fault has been detected by the DGKSFA based monitoring statistic: based on the idea of the nonlinear biplot, a novel nonlinear contribution plot is developed to quantitatively calculate the discriminant power of each original variable in the fault snapshot dataset with respect to the optimal class separation. Simulation results on a numerical nonlinear dynamic system and the benchmark fed-batch penicillin fermentation process demonstrate that the proposed DGKSFA based monitoring method outperforms the MKFDA based and DKSFA based monitoring methods in terms of fault detection time and fault detection rate. Furthermore, the superior performance of the developed DGKSFA based nonlinear contribution plot in identifying the underlying fault variable is also demonstrated in the two case studies.

In the proposed DGKSFA method, the C classes of historical fault data are treated equally when constructing the fault pattern dataset. However, different classes of historical fault data may contribute differently to the class label information of the monitored process. How to apply different weightings to different classes of the historical fault dataset, so as to utilize the discriminatory information of the different classes more rationally, therefore needs further investigation. Besides, during the construction of the between-class pseudo-time series, how the number of neighbor points affects the performance of the DGKSFA method is still not fully understood. For the proposed nonlinear contribution plot, owing to the redundancy and correlation among process variables, the contribution of the real fault variable may be propagated to other correlated but normal process variables. This "smearing" effect can reduce the difference between the real fault variable and the normal process variables, a common shortcoming of contribution plot based methods [65]. Under these circumstances, the proposed nonlinear contribution plot may not explicitly identify the real fault variable, but only determine a small group of process variables that are highly correlated with the fault. To identify the real fault variable in the presence of the "smearing" effect, work on combining the proposed nonlinear contribution plot with causality analysis [66] based methods, which can accurately determine cause-and-effect relationships between process variables, for example via the transfer entropy [67,68], is currently underway.

Acknowledgement

This work is supported by the National Natural Science Foundation of China [grant numbers 61273160, 61403418, 61473176]; the Natural Science Foundation of Shandong Province, China [grant numbers ZR2014FL016, ZR2016FQ21]; the Shandong Provincial Key Program of Research and Development [grant number 2018GGX101025]; and the Fundamental Research Funds for the Central Universities [grant number 17CX02054].

References

[1] Zhang D, Shi P, Wang QG, Yu L. Analysis and synthesis of networked control systems: a survey of recent advances and challenges. ISA (Instrum Soc Am) Trans 2017;66:376–92.
[2] Lv Z, Yan X, Jiang Q. Batch process monitoring based on just-in-time learning and multiple-subspace principal component analysis. Chemometr Intell Lab Syst 2014;137(20):128–39.
[3] Yu J, Qin S. Multiway Gaussian mixture model based multiphase batch process monitoring. Ind Eng Chem Res 2009;48(18):8585–94.
[4] Hong J, Zhang J, Morris J. Progressive multi-block modelling for enhanced fault isolation in batch processes. J Process Contr 2014;24(1):13–26.
[5] Lv Z, Yan X, Jiang Q. Batch process monitoring based on multiple-phase online sorting principal component analysis. ISA (Instrum Soc Am) Trans 2016;64:342–52.
[6] Deng X, Wang L. Modified kernel principal component analysis using double-weighted local outlier factor and its application to nonlinear process monitoring. ISA (Instrum Soc Am) Trans 2018;72:218–28.
[7] Qin S. Survey on data-driven industrial process monitoring and diagnosis. Annu Rev Contr 2012;36(2):220–34.
[8] Cai L, Tian X, Chen S. Monitoring nonlinear and non-Gaussian processes using Gaussian mixture model-based weighted kernel independent component analysis. IEEE Transactions on Neural Networks & Learning Systems 2017;28(1):122–35.


[9] Huang J, Wang QG. Decentralized adaptive control of interconnected nonlinear systems with unknown control directions. ISA (Instrum Soc Am) Trans 2018;74:60–6. https://doi.org/10.1016/j.isatra.2018.01.008.
[10] Lee JM, Yoo CK, Lee IB. Fault detection of batch processes using multiway kernel principal component analysis. Comput Chem Eng 2004;28(9):1837–47.
[11] Di L, Xiong Z, Yang X. Nonlinear process modeling and optimization based on multiway kernel partial least squares model. In: 2008 Winter Simulation Conference; 2008. p. 1645–51.
[12] Tian X, Zhang X, Deng X, Chen S. Multiway kernel independent component analysis based on feature samples for batch process monitoring. Neurocomputing 2009;72:1584–96.
[13] Chen T, Zhang J. On-line multivariate statistical monitoring of batch processes using Gaussian mixture model. Comput Chem Eng 2010;34(4):500–7.
[14] Chen J, Liu K. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem Eng Sci 2002;57(1):63–75.
[15] Sang W, Morris J, Lee IB. Dynamic model-based batch process monitoring. Chem Eng Sci 2008;63(3):622–36.
[16] Jia M, Chu F, Wang F, Wang W. On-line batch process monitoring using batch dynamic kernel principal component analysis. Chemometr Intell Lab Syst 2010;101(2):110–22.
[17] Wang Y, Jia M, Mao Z. Weak fault monitoring method for batch process based on multi-model SDKPCA. Chemometr Intell Lab Syst 2012;118(118):1–12.
[18] Zou T, Wu S, Zhang R. Improved state space model predictive fault-tolerant control for injection molding batch processes with partial actuator faults using GA optimization. ISA (Instrum Soc Am) Trans 2018;73:147–53.
[19] Wu S, Jin Q, Zhang R, Zhang J, Gao F. Improved design of constrained model predictive tracking control for batch processes against unknown uncertainties. ISA (Instrum Soc Am) Trans 2017;69:273–80.
[20] Zhang H, Tian X, Deng X. Batch process monitoring based on multiway global preserving kernel slow feature analysis. IEEE Access 2017;5:2696–710.
[21] Zhang H, Tian X, Deng X, Cao Y. Multiphase batch process with transitions monitoring based on global preserving statistics slow feature analysis. Neurocomputing 2018;293:64–86. http://dx.doi.org/10.1016/j.neucom.2018.02.091.
[22] Shang C, Yang F, Gao X, et al. Concurrent monitoring of operating condition deviations and process dynamics anomalies with slow feature analysis. AIChE J 2015;61(11):3666–82.
[23] Shang C, Huang B, Yang F, et al. Probabilistic slow feature analysis-based representation learning from massive process data for soft sensor modeling. AIChE J 2015;61(12):4126–39.
[24] Shang C, Yang F, Huang B, Huang D. Recursive slow feature analysis for adaptive monitoring of industrial processes. IEEE Trans Ind Electron 2018. http://dx.doi.org/10.1109/TIE.2018.2811358.
[25] Wiskott L, Sejnowski TJ. Slow feature analysis: unsupervised learning of invariances. Neural Comput 2002;14(4):715–70.
[26] Wiskott L. Estimating driving forces of nonstationary time series with slow feature analysis. arXiv.org e-Print archive 2003. http://arxiv.org/abs/cond-mat/0312317/.
[27] Deng X, Tian X, Hu X. Nonlinear process fault diagnosis based on slow feature analysis. In: 2012 10th World Congress on Intelligent Control and Automation (WCICA); 2012. p. 3152–6.
[28] Zhang N, Tian X, Cai L, Deng X. Process fault detection based on dynamic kernel slow feature analysis. Comput Electr Eng 2015;41:9–17.
[29] Guo F, Shang C, Huang B, et al. Monitoring of operating point and process dynamics via probabilistic slow feature analysis. Chemometr Intell Lab Syst 2016;151:115–25.
[30] Huang J, Ersoy OK, Yan X. Slow feature analysis based on online feature reordering and feature selection for dynamic chemical process monitoring. Chemometr Intell Lab Syst 2017;169:1–11.
[31] Zhang H, Tian X. Batch process monitoring based on batch dynamic kernel slow feature analysis. In: The 29th Chinese Control and Decision Conference, Chongqing, China; May 28-30, 2017. p. 4772–7.
[32] Huang Y, Zhao J, Liu Y, et al. Nonlinear dimensionality reduction using a temporal coherence principle. Inf Sci 2011;181(16):3284–307.
[33] Huang Y, Zhao J, Tian M, et al. Slow feature discriminant analysis and its application on handwritten digit recognition. In: International Joint Conference on Neural Networks. IEEE; 2009. p. 1294–7.
[34] Gu X, Liu C, Wang S, et al. Feature extraction using adaptive slow feature discriminant analysis. Neurocomputing 2015;154:139–48.
[35] Gu X, Liu C, Wang S, et al. Uncorrelated slow feature discriminant analysis using global preserving projections for feature extraction. Neurocomputing 2015;168(C):488–99.
[36] Zhang H, Tian X, Cai L. Nonlinear process fault diagnosis using kernel slow feature discriminant analysis. IFAC-PapersOnLine 2015;48(21):607–12.
[37] Zhang H, Tian X. Nonlinear process fault detection based on KSFDA and SVDD. CIESC Journal 2016;3(67):827–32.
[38] Wang QG, Yu C, Zhang Y. Improved system identification with renormalization group. ISA (Instrum Soc Am) Trans 2014;53:1481–8.
[39] Qin S. Statistical process monitoring: basics and beyond. J Chemometr 2003;17(8–9):480–502.
[40] Alcala C, Qin S. Reconstruction-based contribution for process monitoring. Automatica 2009;45:1593–600.
[41] Deng X, Tian X, Chen S, et al. Nonlinear process fault diagnosis based on serial principal component analysis. IEEE Transactions on Neural Networks and Learning Systems 2018;29(3):560–72. http://dx.doi.org/10.1109/TNNLS.2016.2635111.
[42] Honório LM, Barbosa DA, Oliveira EJ, Garcia PAN, Santos MF. A multiple kernel classification approach based on a Quadratic Successive Geometric Segmentation methodology with a fault diagnosis case. ISA (Instrum Soc Am) Trans 2018;74:209–16.
[43] Johannesmeyer M, Singhal A, Seborg D. Pattern matching in historical data. AIChE J 2002;48(9):2022–38.
[44] Gower J, Hardings S. Nonlinear biplots. Biometrika 1988;75(3):445–55.
[45] Vines SK. Predictive nonlinear biplots: maps and trajectories. J Multivariate Anal 2015;140:47–59.
[46] Krooshof P, Ustün B, Postma G, et al. Visualization and recovery of the (bio)chemical interesting variables in data analysis with support vector machine classification. Anal Chem 2010;82(16):7000–7.
[47] Postma G, Krooshof P, Buydens L. Opening the kernel of kernel partial least squares and support vector machines. Anal Chim Acta 2011;705(1–2):123–34.
[48] Smolinska A, Blanchet L, Coulier L, et al. Interpretation and visualization of nonlinear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis. PLoS One 2012;7(6):e38163.
[49] Engel JG, Postma J, Peufflik IV, et al. Pseudo-sample trajectories for variable interaction detection in dissimilarity partial least squares. Chemometr Intell Lab Syst 2015;146:89–101.
[50] Vitale R, Noord O, Ferrer A. A kernel-based approach for fault diagnosis in batch processes. J Chemometr 2014;28:697–707.
[51] Vitale R, Noord O, Ferrer A. Pseudo-sample based contribution plots: innovative tools for fault diagnosis in kernel-based batch process monitoring. Chemometr Intell Lab Syst 2015;149:40–52.
[52] Yao M, Wang H. On-line monitoring of batch processes using generalized additive kernel principal component analysis. J Process Contr 2015;28:56–72.
[53] Tang X, Li Y, Xie Z. Phase division and process monitoring for multiphase batch processes with transitions. Chemometr Intell Lab Syst 2015;145:72–83.
[54] Hong X, Chen S, Harris CJ. A forward-constrained regression algorithm for sparse kernel density estimation. IEEE Transactions on Neural Networks 2008;19(1):193–8.
[55] Zhang Y, An J, Zhang H. Monitoring of time-varying processes using kernel independent component analysis. Chem Eng Sci 2013;88(2):23–32.
[56] Marsico M, Nappi M, Riccio D, et al. Mobile Iris Challenge Evaluation (MICHE)-I, biometric iris dataset and protocols. Pattern Recogn Lett 2015;57(C):17–23.
[57] Zhang H, Tian X, Deng X, Cai L. A local and global statistics pattern analysis method and its application to process fault identification. Chin J Chem Eng 2015;23(11):1782–92.
[58] Chang K, Lee IB. Nonlinear multivariate filtering and bioprocess monitoring for supervising nonlinear biological processes. Process Biochem 2006;41(8):1854–63.
[59] Pontes FJ, Amorim GF, Balestrassi PP, et al. Design of experiments and focused grid search for neural network parameter optimization. Neurocomputing 2016;186(C):22–34.
[60] Zong W, Huang GB, Chen Y. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013;101(3):229–42.
[61] Li C, Ding Z, Zhao D, et al. Building energy consumption prediction: an extreme deep learning approach. Energies 2017;10(10):1525.
[62] Li C, Gao J, Yi J, et al. Analysis and design of functionally weighted single-input-rule-modules connected fuzzy inference systems. IEEE Trans Fuzzy Syst 2018;26(1):56–71.
[63] Sun W, Meng Y, Palazoglu A, Zhao J, et al. A method for multiphase batch process monitoring based on auto phase identification. J Process Contr 2011;21(4):627–38.
[64] Birol G, Undey C, Cinar A. A modular simulation package for fed-batch fermentation: penicillin production. Comput Chem Eng 2002;26(11):1553–65.
[65] Yan Z, Kuang TH, Yao Y. Multivariate fault isolation of batch processes via variable selection in partial least squares discriminant analysis. ISA (Instrum Soc Am) Trans 2017;70:389–99.
[66] Gharahbagheri H, Imtiaz S, Khan F, et al. Causality analysis for root cause diagnosis in fluid catalytic cracking unit. IFAC-PapersOnLine 2015;48(21):838–43.
[67] Duan P, Yang F, Chen T, et al. Direct causality detection via the transfer entropy approach. IEEE Trans Contr Syst Technol 2013;21(6):2052–66.
[68] Duan P, Yang F, Shah SL, et al. Transfer zero-entropy and its application for capturing cause and effect relationship between variables. IEEE Trans Contr Syst Technol 2015;23(3):855–67.
