Online reduced Gaussian process regression based generalized likelihood ratio test for fault detection


Journal of Process Control 85 (2020) 30–40


Fezai R. a, Mansouri M. a,*, Abodayeh K. b, Nounou H. a, Nounou M. c

a Electrical and Computer Engineering Program, Texas A&M University at Qatar, Qatar
b Department of Mathematical Sciences, Prince Sultan University, Riyadh, Saudi Arabia
c Chemical Engineering Program, Texas A&M University at Qatar, Qatar

Article history: Received 3 May 2019; Revised 16 August 2019; Accepted 1 November 2019

Keywords: Machine learning (ML); Fault detection (FD); Gaussian process regression (GPR); Generalized likelihood ratio test (GLRT); Online reduced GPR; Tennessee Eastman (TE) process

Abstract

In this paper we consider a new fault detection approach that merges the benefits of Gaussian process regression (GPR) with a generalized likelihood ratio test (GLRT). GPR is one of the most well-known machine learning techniques; it is simpler and generally more robust than other methods. To deal with both the high computational cost for large data sets and the time-varying dynamics of industrial processes, we consider a reduced and online version of the GPR method. The online reduced GPR (ORGPR) aims to select a reduced set of kernel functions to build the GPR model and to apply it to online fault detection based on the GLRT chart. Compared with the conventional GPR technique, the proposed ORGPR method has the advantage of improving the computational efficiency by decreasing the dimension of the kernel matrix. The developed ORGPR-based GLRT could improve the fault detection efficiency since it is able to track the time-varying characteristics of the processes. The fault detection performance of the developed ORGPR-based GLRT method is evaluated using the Tennessee Eastman process. The simulation results show that the proposed method outperforms the conventional GPR-based GLRT technique. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Fault detection (FD) has received considerable attention in modern industrial processes, due to its central role in avoiding emergency shutdowns, equipment damage and casualties. FD is necessary to monitor the continuity of operating a system under nominal conditions in order to improve productivity, increase production process utilization and reduce maintenance costs. Various FD methodologies have been developed and proposed in the literature [1–4]. These methods are generally classified into two main categories: model-based and data-driven methods. Model-based methods, such as observer-based methods, parity space and parameter estimation methods, mainly use mathematical models to detect the faults [5–7]. Data-driven methods, including machine learning techniques (MLTs), offer an effective alternative solution for FD, especially for large-scale chemical systems. Various MLTs are used to estimate the models of chemical systems. They include linear regression [8,9], the Least Absolute Shrinkage and Selection Operator (LASSO) [10], the elastic net [11], decision tree regression bagging [12], decision tree regression boosting [13], artificial neural networks (ANN) [14–17], support vector regression (SVR) [18], extreme learning machines (ELM) [19–22], relevance vector machines (RVM) [23], kernel ridge regression (KRR) [24], regularized linear regression (RLR) [25] and Gaussian process regression (GPR) [26–29]. It has been shown that the GPR technique outperforms the LASSO, elastic net, bagging, boosting, KRR, SVR, ELM and RVM techniques in terms of modeling performance. The advantages of GPR are due to the ability of Gaussian processes to approximate the system uncertainty effectively, which translates into a residual distribution that reflects changes in the process well. Recently, modeling with GPR has received increased attention in the machine learning community due to its simplicity and its generalization performance compared to other techniques [30,31]. In addition to its good numerical performance and stability, GPR requires a relatively small training data set and can adopt very flexible kernel functions. GPR can identify the relevant bands and observations when establishing relationships with a variable, and it provides confidence intervals for the predictions. GPR has been shown to be particularly successful in different fields [30,31], such as image identification and face recognition, where it performs much better than other nonlinear ML methods. In the current work, therefore, the benefits of this approach will be merged with statistical hypothesis testing in order to


enhance the monitoring abilities of chemical systems through the detection of different kinds of faults. However, the GPR technique presents two major drawbacks. The first is that the computational complexity scales cubically with the number of training samples: applying conventional GPR requires O(N³) run time and O(N²) memory, where N is the number of samples. This prevents GPR from being applied in its original form to larger data sets. The second drawback, which prevents GPR from being applied to fault detection of dynamic processes, is its static model. The GPR technique is completely defined by a mean and a covariance function. Thus, the use of a static GPR model to monitor dynamic processes leads to false alarms that significantly reduce the reliability of this technique. In order to overcome these limitations, we propose to use an online reduced GPR (ORGPR) for fault detection. This method aims to extract a reduced number of samples from the original data set to construct the new GPR model and then use them for online fault detection. The new online method can update the parameters of the GPR and considerably reduce its computational complexity, making it possible to perform modeling and fault detection with very large datasets. Additionally, this paper proposes the extension of the ORGPR algorithm to a composite hypothesis testing method based on the generalized likelihood ratio test (GLRT). In previous research works [1,32–36], it has been shown that the GLRT detection chart provides good detection efficiency; it is known to have better fault detection performance than the classical univariate and multivariate monitoring statistics. Thus, combining the advantages of the online reduced GPR model with hypothesis testing should provide even further improvements in fault detection. Therefore, the contribution of this paper is to enhance the quality of fault detection of chemical processes using the ORGPR-based GLRT approach, in which the modeling phase is addressed using the ORGPR model and the detection phase is achieved using the GLRT chart. The developed ORGPR-based GLRT fault detection algorithm provides optimal properties by maximizing the detection probability for a particular false alarm rate (FAR). The validation of the developed approach is performed using simulated Tennessee Eastman (TE) process data, through monitoring the key TE process variables.

This paper is organized as follows: a description of the GPR-based GLRT is given in Section 2. The developed ORGPR-based GLRT fault detection method is detailed in Section 3. Then, in Section 4, the fault detection performance is studied using the Tennessee Eastman process. Finally, the conclusions are given in Section 5.


2. Description of GPR-based GLRT approach

A Gaussian process regression (GPR) is a type of continuous stochastic process. GPR based on the Bayesian framework provides a complete posterior distribution over possible functions. It is a probability distribution over functions, such that every finite sample of function values (or outputs) f(X) is jointly Gaussian distributed and is defined by its mean μ(X_i) and covariance function K(X_i, X_j):

\mu(X_i) = E(f(X_i)), \qquad (1)

K(X_i, X_j) = E\big[(f(X_i) - \mu(X_i))(f(X_j) - \mu(X_j))\big]. \qquad (2)

In GPR, the function value f(X_i) is given by

f(X_i) \sim \mathcal{N}\big(\mu(X_i), K(X_i, X_j)\big). \qquad (3)

Given N observations X_1, X_2, \ldots, X_N, the joint distribution of the random variables is also Gaussian:

\big[f(X_1)\ f(X_2)\ \cdots\ f(X_N)\big]^T = \mathcal{N}(\mu, K), \qquad (4)

where K \in \mathbb{R}^{N \times N} is the covariance matrix whose entries are given by the covariance function, K_{ij} = K(X_i, X_j). The relationship between the input X and the output Y of the GPR model is given by

Y = f(X) + \varepsilon, \qquad (5)

where \varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2). A GPR model characterizes the response by drawing samples from the Gaussian process and defining explicit basis functions \phi(X_1), \phi(X_2), \ldots, \phi(X_M) such that [37–39]

f(X_i) = \sum_{j=1}^{M} W_j\, \phi_j(X_i), \quad i = 1, \ldots, N, \qquad (6)

where M is the number of basis functions used to approximate f(X), or simply, in matrix form,

f(X) = \phi(X)\, W, \qquad (7)

where W and \phi are the weight vector and the kernel function, respectively. Moreover, the probability density of the Gaussian samples follows a Gaussian distribution. Thus, the likelihood function can be written as

p(Y \mid X, W) = \prod_{i=1}^{N} p(Y_i \mid X_i, W_i) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\!\left(-\frac{\big(Y_i - \phi(X_i)^T W_i\big)^2}{2\sigma_\varepsilon^2}\right) = \frac{1}{\big(\sqrt{2\pi}\,\sigma_\varepsilon\big)^N} \exp\!\left(-\frac{\big(Y - \phi(X)^T W\big)^T \big(Y - \phi(X)^T W\big)}{2\sigma_\varepsilon^2}\right) = \mathcal{N}\big(f(X), \sigma_\varepsilon^2 I_N\big), \qquad (8)

where I_N is the identity matrix. To get the posterior distribution in Bayesian inference, we combine Eq. (8) with a Gaussian prior over W, such that

W \sim \mathcal{N}(0, \Sigma), \qquad (9)

where \Sigma is the covariance matrix. Then, the posterior distribution is determined using Bayes' rule as follows:

p(W \mid Y, X) = \frac{p(Y \mid X, W)\, p(W)}{p(Y \mid X)}. \qquad (10)

Therefore, the posterior distribution will be

p(W \mid Y, X) \propto p(Y \mid X, W)\, p(W) = \mathcal{N}\!\left(\frac{1}{\sigma_\varepsilon^2}\, \Upsilon^{-1} \phi(X)\, Y,\ \Upsilon^{-1}\right), \qquad (11)

where \Upsilon = \Sigma^{-1} + \sigma_\varepsilon^{-2}\, \phi(X)\, \phi(X)^T (see also [40]). For a new observation X^* with corresponding output Y^*, the GPR defines a joint prior distribution

p(f(X^*) \mid X^*, X, Y^*) = \int p(f(X^*) \mid X^*, W)\, p(W \mid X, Y^*)\, dW = \mathcal{N}\!\left(\sigma_\varepsilon^{-2}\, \phi^T(X^*)\, \Upsilon^{-1} \phi(X)\, Y^*,\ \phi^T(X^*)\, \Upsilon^{-1} \phi(X^*)\right) = \mathcal{N}(M, C). \qquad (12)

Therefore, the predicted distribution is also Gaussian, with mean M = \sigma_\varepsilon^{-2}\, \phi^T(X^*)\, \Upsilon^{-1} \phi(X)\, Y^* and covariance matrix C = \phi^T(X^*)\, \Upsilon^{-1} \phi(X^*).
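As a rough illustration of the weight-space view of Eqs. (8)–(12), the following minimal NumPy sketch computes the posterior over W and the resulting predictive mean and variance for a toy one-dimensional problem. It is not taken from the paper: the polynomial basis, noise level and prior covariance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N scalar inputs and noisy outputs.
N, sigma_eps = 50, 0.1
X = np.linspace(-1.0, 1.0, N)
Y = np.sin(2.0 * X) + sigma_eps * rng.standard_normal(N)

# Explicit basis functions phi_j(x) (a small polynomial basis, M = 4).
def phi(x):
    return np.array([np.ones_like(x), x, x**2, x**3])   # shape (M, n)

Phi = phi(X)                      # phi(X), shape (M, N)
M_basis = Phi.shape[0]
Sigma = np.eye(M_basis)           # prior covariance of W, Eq. (9)

# Upsilon = Sigma^{-1} + sigma_eps^{-2} phi(X) phi(X)^T  (definition below Eq. (11))
Upsilon = np.linalg.inv(Sigma) + Phi @ Phi.T / sigma_eps**2
Upsilon_inv = np.linalg.inv(Upsilon)

# Posterior over W, Eq. (11): N( sigma_eps^{-2} Upsilon^{-1} phi(X) Y, Upsilon^{-1} )
W_mean = Upsilon_inv @ Phi @ Y / sigma_eps**2
W_cov = Upsilon_inv

# Predictive mean and variance at a new input X*, Eq. (12).
x_star = np.array([0.3])
phi_star = phi(x_star)            # shape (M, 1)
mean_star = (phi_star.T @ W_mean)[0]
var_star = (phi_star.T @ W_cov @ phi_star)[0, 0]
print(mean_star, var_star)
```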


By defining K = \phi^T(X)\, \Sigma\, \phi(X), the mean and the covariance matrix of the prediction distribution become

M = \phi^T(X^*)\, \Sigma\, \phi(X) \big(K + \sigma_\varepsilon^2 I_N\big)^{-1} Y^* \qquad (13)

and

C = \phi^T(X^*)\, \Sigma\, \phi(X^*) - \phi^T(X^*)\, \Sigma\, \phi(X) \big(K + \sigma_\varepsilon^2 I_N\big)^{-1} \phi^T(X)\, \Sigma\, \phi(X^*). \qquad (14)

Due to the fact that \phi^T(X_i)\, \Sigma\, \phi(X_j) is an inner product and \Sigma is a positive definite matrix, we can define \Sigma as (\Sigma^{1/2})^2. Let us define \psi(X) = \Sigma^{1/2} \phi(X). Then, a dot product of the form K(X_i, X_j) = \psi^T(X_i)\, \psi(X_j) can be obtained by the evaluation of the kernel function. Using these properties, the posterior mean M can be rewritten as

M = K_*^T \big(K + \sigma_\varepsilon^2 I_N\big)^{-1} Y^*. \qquad (15)

Thus, the posterior variance is given by

C = K(X^*, X^*) - K_*^T \big(K + \sigma_\varepsilon^2 I_N\big)^{-1} K_*, \qquad (16)

where K_* = \big[K(X_1, X^*)\ K(X_2, X^*)\ \cdots\ K(X_N, X^*)\big]^T. Eqs. (15) and (16) constitute the main results of the GPR. Given the Gaussian noise assumption, the prediction output \hat{Y}^* can be computed as follows:

\hat{Y}^* = \mathcal{N}\big(M, C + \sigma_\varepsilon^2\big). \qquad (17)

The residual vector (or the modeling error) E can be expressed as

E = Y - \hat{Y}. \qquad (18)

Hence, the GLRT statistic G is computed as [34,41]:

G = \frac{1}{\xi^2}\, \|E\|^2 = \frac{1}{\xi^2}\, \|Y - \hat{Y}\|^2, \qquad (19)

where \xi^2 is the variance of the residual. Let a and b be, respectively, the mean and variance of G, which follows a chi-square distribution [41]. Then its control limit G_\alpha can be computed from its approximate distribution as [34]:

G_\alpha = g\, \chi^2_{h,\alpha}, \qquad (20)

where g = \frac{b}{2a} and h = \frac{2a^2}{b}. A fault is detected if G exceeds its threshold:

G > G_\alpha. \qquad (21)

The procedure of the GPR is given in Algorithm 1.

Algorithm 1. GPR algorithm.
Training data: input data matrix X and output data matrix Y, and a chosen kernel function k(·,·).
Testing data: for a new observation X* with corresponding output Y*:
- Form the covariance matrix K_y as: K_y = K + σ_ε² I_N;
- Determine the mean of the prediction distribution M as: M = φ^T(X*) Σ φ(X) K_y^{-1} Y*;
- Calculate the covariance matrix of the prediction distribution C as: C = φ^T(X*) Σ φ(X*) − φ^T(X*) Σ φ(X) K_y^{-1} φ^T(X) Σ φ(X*);
- Determine the kernel function K as: K(X*, X*) = φ^T(X*) Σ φ(X*) and the kernel vector K_* as: K_* = [K(X_1, X*) K(X_2, X*) ··· K(X_N, X*)]^T;
- Compute the posterior mean M as: M = K_*^T K_y^{-1} Y*;
- Calculate the posterior variance C as: C = K(X*, X*) − K_*^T K_y^{-1} K_*;
- Determine the prediction output Ŷ* as: Ŷ* = N(K_*^T K_y^{-1} Y*, K(X*, X*) − K_*^T K_y^{-1} K_* + σ_ε²).

The GPR-based GLRT method is achieved in two steps. In the first step, fault-free data are scaled to zero mean and unit variance. After that, the GPR is applied to the scaled data to compute the model, and the GLRT is applied to the modeling error (called the residual) to compute the detection chart and its threshold. In the next step, the testing data are scaled using the same mean and variance computed in the training phase, and the new detection chart is computed using the new residuals. Finally, the threshold computed from the training data is used for fault detection and decision making. Algorithm 2 illustrates the main steps of the GPR-based GLRT fault detection chart.

Algorithm 2. GPR-based GLRT algorithm.
Input: input data matrix X and output data matrix Y.
Training phase:
1. Compute the model using GPR;
2. Evaluate the residuals;
3. Compute the GLRT chart;
4. Compute the GLRT control limit G_α.
Testing phase:
5. Compute the model using GPR;
6. Determine the posterior mean M and the posterior variance C;
7. Compute the prediction output Ŷ*;
8. Compute the new residuals for the new sample time;
9. Compute the new GLRT chart;
10. If the new GLRT chart violates its control limit G_α, the process is considered out of control and a fault is declared; else, there is no fault and the process is operating under normal operating conditions.

The GPR-based GLRT fault detection strategy is presented in Fig. 1.

Fig. 1. Diagram of GPR-based GLRT scheme.
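To make the two-phase procedure concrete, the sketch below implements a simplified GPR-based GLRT in NumPy/SciPy. It is only an illustration under assumptions that are not in the paper — an RBF kernel with fixed hyperparameters, synthetic univariate data and a per-sample GLRT chart — and the function names are ours, not the authors' implementation.

```python
import numpy as np
from scipy.stats import chi2

def rbf_kernel(A, B, length_scale=1.0):
    # K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 l^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gpr_predict(X_train, Y_train, X_new, sigma_eps=0.1, length_scale=1.0):
    # Posterior mean (Eq. 15) and variance (Eq. 16) in kernel form.
    K = rbf_kernel(X_train, X_train, length_scale)
    Ky_inv = np.linalg.inv(K + sigma_eps**2 * np.eye(len(X_train)))
    K_star = rbf_kernel(X_train, X_new, length_scale)          # N x N_new
    mean = K_star.T @ Ky_inv @ Y_train
    var = rbf_kernel(X_new, X_new, length_scale).diagonal() \
          - np.einsum('ij,ik,kj->j', K_star, Ky_inv, K_star)
    return mean, var

def glrt_limit(G_train, alpha=0.01):
    # Chi-square approximation of Eq. (20): G ~ g * chi2_h,
    # with g = b/(2a), h = 2a^2/b, where a, b are the mean and variance of G.
    a, b = G_train.mean(), G_train.var()
    g, h = b / (2.0 * a), 2.0 * a**2 / b
    return g * chi2.ppf(1.0 - alpha, h)

# --- training phase on (scaled) fault-free data ---
rng = np.random.default_rng(1)
X_tr = rng.uniform(-1, 1, size=(200, 2))
Y_tr = np.sin(X_tr[:, 0]) + 0.5 * X_tr[:, 1] + 0.05 * rng.standard_normal(200)
Y_hat, _ = gpr_predict(X_tr, Y_tr, X_tr)
resid = Y_tr - Y_hat                                   # Eq. (18)
xi2 = resid.var()
G_alpha = glrt_limit(resid**2 / xi2)                   # Eq. (19), sample-wise

# --- testing phase: declare a fault when G exceeds G_alpha (Eq. 21) ---
X_te = rng.uniform(-1, 1, size=(50, 2))
Y_te = np.sin(X_te[:, 0]) + 0.5 * X_te[:, 1] + 0.05 * rng.standard_normal(50)
Y_te[25:] += 1.0                                       # simulated step fault
Y_te_hat, _ = gpr_predict(X_tr, Y_tr, X_te)
G_test = (Y_te - Y_te_hat)**2 / xi2
print("samples flagged as faulty:", np.where(G_test > G_alpha)[0])
```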

3. Description of online reduced GPR-based GLRT

In chemical processes, the measurement variables often show strong dynamic and stochastic features due to the inherent time-varying characteristics. Hence, GPR may become ill-suited for characterizing the complex relationships between the process and quality variables, which can lead to false alarms and missed detections. In addition, the GPR technique is not practical for applications with large datasets, since it requires storing the training data set and using it for fault detection purposes [42–44]. This may cause a high computational cost. Firstly, the GPR model requires the computation and storage of the kernel matrix K of size N × N, where N is the number of training samples. Secondly, the computational cost of training the GPR model is about O(N³). Furthermore, predicting a test case requires O(N) operations for evaluating the mean (Eq. (15)) and O(N²) operations for calculating the variance (Eq. (16)) [42–44].

In order to overcome these limitations, we propose to use the online reduced GPR-based GLRT (ORGPR-based GLRT). The suggested method consists of extracting a reduced dataset, the so-called dictionary, to build the new GPR model and then using it for fault detection. The online reduced GPR-based GLRT raises the question of how to process an increasing amount of observations and update the GPR model as a new input-output pair (X_{t+1}, Y_{t+1}) is collected. The online update of the GPR model relies on a two-stage process at each iteration: a model order control step that inserts or removes the kernel function φ(X_{t+1}) corresponding to the input X_{t+1} from the dictionary, and a parameter update step. Discarding a kernel function from the model obtained using Eq. (5) may degrade its performance. To avoid this problem, we use a linear approximation criterion which aims at identifying kernel functions whose removal is expected to have a negligible effect on the quality of the model. Thus, the kernel function φ(X_{t+1}) is added at time step t+1 into the dictionary if the following condition is satisfied:

\varepsilon_{t+1} = \min_{\beta} \Big\| \phi(X_{t+1}) - \sum_{j=1}^{d} \beta_j\, \phi(X_{w_j}) \Big\|^2 \geq \nu, \qquad (22)

where \varepsilon_{t+1} is the squared Euclidean norm of the difference between \phi(X_{t+1}) and \sum_{j=1}^{d} \beta_j \phi(X_{w_j}); the elements \phi(X_{w_1}), \ldots, \phi(X_{w_d}) form a d-element subset called the dictionary D_{t+1}, with d ≪ N; \nu is the threshold parameter that determines the level of sparsity of the model; and d is the size of the new dictionary. Note that Eq. (22) ensures the linear independence of the elements of the dictionary by projecting \phi(X_{t+1}) onto the space spanned by the other d kernel functions. The optimal value of each coefficient \beta_j is computed by the minimization of Eq. (22), which leads to

\varepsilon_{t+1} = \min_{\beta} \Big\{ \sum_{j,i=1}^{d} \beta_j \beta_i\, K(X_{w_j}, X_{w_i}) - 2 \sum_{j=1}^{d} \beta_j\, K(X_{w_j}, X_{t+1}) + K(X_{t+1}, X_{t+1}) \Big\} = \min_{\beta} \Big\{ \beta^T K_t\, \beta - 2\, \beta^T K(X_{t+1}) + K(X_{t+1}, X_{t+1}) \Big\}, \qquad (23)

where K_t \in \mathbb{R}^{(d-1)\times(d-1)} is the Gram matrix of the dictionary D_t with entries K(X_{w_i}, X_{w_j}), given by

K_t = \big[\phi(X_{w_1})\ \phi(X_{w_2}) \cdots \phi(X_{w_{d-1}})\big]^T \big[\phi(X_{w_1})\ \phi(X_{w_2}) \cdots \phi(X_{w_{d-1}})\big] = \begin{bmatrix} K(X_{w_1}, X_{w_1}) & \cdots & K(X_{w_1}, X_{w_{d-1}}) \\ \vdots & \ddots & \vdots \\ K(X_{w_{d-1}}, X_{w_1}) & \cdots & K(X_{w_{d-1}}, X_{w_{d-1}}) \end{bmatrix}. \qquad (24)

By solving Eq. (23), the vector \beta = [\beta_1 \cdots \beta_d]^T can be computed as

\beta = (K_t)^{-1} K(X_{t+1}), \qquad (25)

where K(X_{t+1}) = \big[K(X_{w_1}, X_{t+1})\ K(X_{w_2}, X_{t+1}) \cdots K(X_{w_d}, X_{t+1})\big]^T. By inserting Eq. (25) into Eq. (23), we get the following expression of \varepsilon_{t+1}:

\varepsilon_{t+1} = (K(X_{t+1}))^T (K_t)^{-1} K_t (K_t)^{-1} K(X_{t+1}) - 2\, (K(X_{t+1}))^T (K_t)^{-1} K(X_{t+1}) + K(X_{t+1}, X_{t+1}) = K(X_{t+1}, X_{t+1}) - (K(X_{t+1}))^T (K_t)^{-1} K(X_{t+1}) = K(X_{t+1}, X_{t+1}) - (K(X_{t+1}))^T \beta. \qquad (26)

The resulting dictionary D_t, called \nu-approximate, satisfies the following relation:

\min_{i=1,\ldots,d}\ \min_{\beta_1,\ldots,\beta_d} \Big\| \phi(X_{w_i}) - \sum_{j=1,\, j \neq i}^{d} \beta_j\, \phi(X_{w_j}) \Big\|^2 \geq \nu. \qquad (27)
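The sparsification test of Eqs. (22)–(26) can be sketched as follows. This is illustrative NumPy only; the RBF kernel, the threshold ν and the helper name `ald_test` are assumptions, not taken from the paper.

```python
import numpy as np

def rbf(x, y, length_scale=1.0):
    return np.exp(-0.5 * np.sum((x - y) ** 2) / length_scale**2)

def ald_test(dictionary, Kt_inv, x_new, nu=0.1):
    """Approximate-linear-dependence test of Eqs. (22)-(26).

    dictionary : list of stored samples X_w1, ..., X_wd
    Kt_inv     : inverse of the dictionary Gram matrix (K_t)^{-1}
    Returns (accept, beta, k_new, k_nn); accept=True means phi(x_new)
    is (almost) linearly independent and should be added.
    """
    k_new = np.array([rbf(x_w, x_new) for x_w in dictionary])   # K(X_{t+1})
    k_nn = rbf(x_new, x_new)                                     # K(X_{t+1}, X_{t+1})
    beta = Kt_inv @ k_new                                        # Eq. (25)
    eps = k_nn - k_new @ beta                                    # Eq. (26)
    return eps >= nu, beta, k_new, k_nn

# Tiny usage example with a 2-element dictionary.
D = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
Kt = np.array([[rbf(a, b) for b in D] for a in D])
accept, beta, k_new, k_nn = ald_test(D, np.linalg.inv(Kt), np.array([0.2, 0.9]))
print("add to dictionary:", accept)
```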

For online fault detection, the sample is added to the dictionary if G_{t+1} ≤ G_{α,t}. Thus, at the t-th iteration of the online reduced GPR, the model contains the dictionary D_t, the posterior mean M_t, the posterior covariance matrix C_t and the inverse of the kernel matrix (K_t)^{-1}. When a new data pair (X_{t+1}, Y_{t+1}) is available, the kernel function is added to the dictionary, D_{t+1} = \{D_t, \phi(X_{t+1})\}, and the posterior distribution is updated as

p\big(f(X_{t+1}) \mid X_{t+1}, D_{t+1}, Y_{t+1}\big) = \mathcal{N}(M_{t+1}, \Upsilon_{t+1}), \qquad (28)

where \Upsilon_{t+1} = C_{t+1} + \sigma_\varepsilon^2 I_N. The updated posterior mean M_{t+1} is given by

M_{t+1} = (K(X_{t+1}))^T \big(K_t + \sigma_\varepsilon^2 I_N\big)^{-1} Y_t = (K(X_{t+1}))^T w_t, \qquad (29)

where w_t = \big(K_t + \sigma_\varepsilon^2 I_N\big)^{-1} Y_t. The posterior variance C_{t+1} is updated as

C_{t+1} = K(X_{t+1}, X_{t+1}) - (K(X_{t+1}))^T \big(K_t + \sigma_\varepsilon^2 I_N\big)^{-1} K(X_{t+1}) = K(X_{t+1}, X_{t+1}) - (K(X_{t+1}))^T S_t\, K(X_{t+1}), \qquad (30)

where S_t = \big(K_t + \sigma_\varepsilon^2 I_N\big)^{-1}. The vector w_t and the matrix S_t are updated whenever a new kernel function is processed, building upon their previous values at step t and depending on the dictionary D_t. The predicted output \hat{Y}_{t+1} is updated as

\hat{Y}_{t+1} = \mathcal{N}\big((K(X_{t+1}))^T w_t,\ K(X_{t+1}, X_{t+1}) - (K(X_{t+1}))^T S_t\, K(X_{t+1}) + \sigma_\varepsilon^2\big). \qquad (31)

With noisy observations Y_t, the kernel matrix includes a regularization term \sigma_\varepsilon^2. In this study, no noise term is added to Y_t; thus, the expressions of w_t and S_t reduce to K_t^{-1} Y_t and K_t^{-1}, respectively.
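Under this noise-free convention (w_t = K_t^{-1} Y_t and S_t = K_t^{-1}), the prediction step of Eqs. (29)–(31) reduces to a few matrix–vector products. A minimal illustrative sketch (function and variable names are ours):

```python
import numpy as np

def orgpr_predict(k_new, k_nn, w_t, S_t, sigma_eps2=0.0):
    """Prediction for a new sample from the current dictionary model.

    k_new : kernel vector K(X_{t+1}) between the new input and the dictionary
    k_nn  : scalar K(X_{t+1}, X_{t+1})
    w_t   : K_t^{-1} Y_t (weights), S_t : K_t^{-1}
    Returns the predictive mean (Eq. 29), variance (Eq. 30) and the
    variance of the predicted output (Eq. 31).
    """
    mean = k_new @ w_t                          # M_{t+1}
    var = k_nn - k_new @ S_t @ k_new            # C_{t+1}
    return mean, var, var + sigma_eps2          # Y_hat ~ N(mean, var + sigma_eps^2)

# Example with a 3-element dictionary Gram matrix K_t and outputs Y_t.
Kt = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.4],
               [0.2, 0.4, 1.0]])
Yt = np.array([0.3, -0.1, 0.8])
St = np.linalg.inv(Kt)
wt = St @ Yt
m, c, c_out = orgpr_predict(np.array([0.6, 0.3, 0.1]), 1.0, wt, St)
print(m, c, c_out)
```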


When updating w_t and S_t, the matrix inverse (K_t)^{-1} needs to be updated at each step. The costly matrix inversion can be avoided by using the Woodbury matrix identity to iteratively update (K_t)^{-1} [45]:

(K_{t+1})^{-1} = \begin{bmatrix} (K_t)^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \frac{1}{\varepsilon_{t+1}} \begin{bmatrix} -(K_t)^{-1} K(X_{t+1}) \\ 1 \end{bmatrix} \begin{bmatrix} -(K(X_{t+1}))^T (K_t)^{-1} & 1 \end{bmatrix} = \begin{bmatrix} (K_t)^{-1} & 0 \\ 0 & 0 \end{bmatrix} + \frac{1}{\varepsilon_{t+1}} \begin{bmatrix} -\beta_{t+1} \\ 1 \end{bmatrix} \begin{bmatrix} -\beta_{t+1}^T & 1 \end{bmatrix}. \qquad (32)

According to Eq. (31), the estimation error is given by

e_{t+1} = Y_{t+1} - \hat{Y}_{t+1}. \qquad (33)

Once the residuals are updated using the GPR model, the GLRT statistic can be defined as

G_{t+1} = \frac{1}{\xi^2}\, \|e_{t+1}\|_2^2, \qquad (34)

where \xi^2 is the variance of the residual e. For time-varying processes, the control limit of the GLRT statistic changes with time, making adjustment of this limit necessary for online fault detection. The parameters a_{G_{t+1}} and b_{G_{t+1}} of the GLRT statistic are updated after each addition of new data to the dictionary D_{t+1}, which makes the control limit G_{α,t+1} time-varying. Thus, to obtain an adaptive threshold that is not affected by the integration of out-of-control variables, the control limit of the GLRT statistic is updated as

G_{\alpha, t+1} = g_{G_{t+1}}\, \chi^2_{h_{G_{t+1}}, \alpha}, \qquad (35)

where g_{G_{t+1}} = \frac{b_{G_{t+1}}}{2 a_{G_{t+1}}} and h_{G_{t+1}} = \frac{2 a_{G_{t+1}}^2}{b_{G_{t+1}}}, with a_{G_{t+1}} and b_{G_{t+1}} the mean and variance of the GLRT statistic, respectively.

The kernel function φ(X_{t+1}) is discarded from the dictionary when the condition of Eq. (22) is not satisfied: in that case the kernel function does not contribute significantly to the diversity of the dictionary, and thus it can be discarded. The dictionary is then left unchanged, D_{t+1} = D_t, the vector w_t and the matrix S_t are not updated, and the control limit is updated according to Eq. (35). However, in the case of faulty data, the dictionary D_{t+1}, the parameters of the GPR model and the control limit of the GLRT statistic are all left unchanged. The main steps of the ORGPR-based GLRT algorithm are presented in Algorithm 3, and the online reduced GPR-based GLRT fault detection procedure is summarized in Fig. 2.

Fig. 2. A block diagram of online reduced GPR-based GLRT for fault detection.


Algorithm 3. ORGPR-based GLRT algorithm.
Input: N × m data matrix X, output matrix Y.
Initialization:
- Parameters of the GPR model M_1, C_1;
- The GLRT statistic G_1 and its control limit G_{1,α}.
Testing data: for time instant t = 1, 2, ... do:
- Obtain a new input-output pair {X_{t+1}, Y_{t+1}} and scale it;
- Compute K(X_{t+1}), the kernel vector between φ(X_{t+1}) and every basis in the dictionary;
- Determine K(X_{t+1}, X_{t+1});
- Compute β_{t+1} = (K_t)^{-1} K(X_{t+1});
- Calculate ε_{t+1} = K(X_{t+1}, X_{t+1}) − (K(X_{t+1}))^T β_{t+1};
- Determine the predictive mean M_{t+1} = (K(X_{t+1}))^T (K_t + σ_ε² I_N)^{-1} Y_t;
- Compute the predictive variance C_{t+1} = K(X_{t+1}, X_{t+1}) − (K(X_{t+1}))^T (K_t + σ_ε² I_N)^{-1} K(X_{t+1});
- Determine the estimated output Ŷ_{t+1} from Eq. (31);
- Compute the GLRT statistic and compare it to the threshold value;
- If G_{t+1} ≤ G_{t,α}:
  - If ε_{t+1} < ν: discard the kernel function from the dictionary, D_{t+1} = D_t, and update the GLRT statistic and its control limit;
  - Else: add the kernel function to the dictionary, D_{t+1} = {D_t, φ(X_{t+1})}; update the inverse of the kernel matrix according to Eq. (32); update the predictive mean M_{t+1}, the predictive variance C_{t+1}, the estimated output Ŷ_{t+1}, and the GLRT statistic and its control limit G_{t+1,α};
- If G_{t+1} > G_{t,α}: the process is operating under faulty conditions and a fault is declared.
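The two bookkeeping steps that make Algorithm 3 cheap can be sketched as follows: the rank-one style update of (K_t)^{-1} in Eq. (32), which avoids a full re-inversion, and the refreshed control limit of Eq. (35). This is an illustration only (NumPy, chi-square quantile and helper names assumed), not the authors' code.

```python
import numpy as np
from scipy.stats import chi2

def grow_inverse(Kt_inv, beta, eps):
    """Eq. (32): inverse of the enlarged Gram matrix K_{t+1} from the
    previous inverse, beta = K_t^{-1} K(X_{t+1}) and
    eps = K(X_{t+1}, X_{t+1}) - K(X_{t+1})^T beta."""
    d = Kt_inv.shape[0]
    top = np.block([[Kt_inv, np.zeros((d, 1))],
                    [np.zeros((1, d)), np.zeros((1, 1))]])
    u = np.concatenate([-beta, [1.0]])[:, None]
    return top + (u @ u.T) / eps

def adaptive_limit(G_values, alpha=0.01):
    """Eq. (35): chi-square approximation of the GLRT control limit,
    recomputed from the current mean a and variance b of the statistic."""
    a, b = np.mean(G_values), np.var(G_values)
    g, h = b / (2.0 * a), 2.0 * a**2 / b
    return g * chi2.ppf(1.0 - alpha, h)

# Consistency check of the growing inverse on a toy Gram matrix.
K3 = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.4],
               [0.2, 0.4, 1.0]])
Kt_inv = np.linalg.inv(K3[:2, :2])
k_new, k_nn = K3[:2, 2], K3[2, 2]
beta = Kt_inv @ k_new
eps = k_nn - k_new @ beta
print(np.allclose(grow_inverse(Kt_inv, beta, eps), np.linalg.inv(K3)))  # True
```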

4. Simulation results

In order to demonstrate the advantages of the proposed technique, the monitoring performance is assessed through the Tennessee Eastman process.

4.1. Case study

The Tennessee Eastman (TE) process is a benchmark simulation model of a complex industrial chemical process proposed by Downs and Vogel [46]. The system generates two products from four reactants. It consists of five unit operations: a reactor, a recycle compressor, a condenser, a stripper, and a separator. The flowchart of the TE process is shown in Fig. 3. A total of 53 variables can be measured in this process, comprising 12 manipulated variables and 41 measured variables. The measured variables contain 22 continuous process variables and 19 composition variables. Detailed information on these 53 process variables is provided in Tables 1–3 [47,48]. The TE process is simulated under normal operation and under various types of faulty operation, and the resulting data sets are used to assess the performance of the proposed method for online fault detection against the GPR-based GLRT method. The faults are introduced after observation 224 and continue until the end of the data set. These faults are listed in Table 4.
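As a purely illustrative example of this evaluation protocol, a step fault can be injected into a simulated record as sketched below; only the fault start at observation 224 comes from the text, while the variable count, the affected column and the fault magnitude are assumptions.

```python
import numpy as np

def inject_step_fault(data, start=224, column=0, magnitude=1.0):
    """Add a step change to one variable from observation `start`
    until the end of the record."""
    faulty = data.copy()
    faulty[start:, column] += magnitude
    return faulty

# Example: 1024 samples of a hypothetical 53-variable measurement matrix.
rng = np.random.default_rng(0)
normal_data = rng.standard_normal((1024, 53))
test_data = inject_step_fault(normal_data, start=224, column=3, magnitude=2.0)
```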


Table 1. Continuous process variables in the Tennessee Eastman process.

Variable | Measured variable | Unit
XMEAS(1) | A feed (stream 1) | km³/h
XMEAS(2) | D feed (stream 2) | kg/h
XMEAS(3) | E feed (stream 3) | kg/h
XMEAS(4) | Total feed (stream 4) | km³/h
XMEAS(5) | Recycle flow (stream 8) | km³/h
XMEAS(6) | Reactor feed rate (stream 6) | km³/h
XMEAS(7) | Reactor pressure | kPa
XMEAS(8) | Reactor level | %
XMEAS(9) | Reactor temperature | °C
XMEAS(10) | Purge rate (stream 9) | km³/h
XMEAS(11) | Product separator temperature | °C
XMEAS(12) | Product separator level | %
XMEAS(13) | Product separator pressure | kPa
XMEAS(14) | Product separator underflow (stream 10) | m³/h
XMEAS(15) | Stripper level | %
XMEAS(16) | Stripper pressure | kPa
XMEAS(17) | Stripper underflow (stream 11) | m³/h
XMEAS(18) | Stripper temperature | °C
XMEAS(19) | Stripper stream flow | kg/h
XMEAS(20) | Compressor work | kW
XMEAS(21) | Reactor cooling water outlet temp | °C
XMEAS(22) | Separator cooling water outlet temp | °C

Table 2. Composition variables in the Tennessee Eastman process.

Variable | State | Stream no. | Sample time/min
XMEAS(23) | Composition A | 6 | 6
XMEAS(24) | Composition B | 6 | 6
XMEAS(25) | Composition C | 6 | 6
XMEAS(26) | Composition D | 6 | 6
XMEAS(27) | Composition E | 6 | 6
XMEAS(28) | Composition F | 6 | 6
XMEAS(29) | Composition A | 9 | 6
XMEAS(30) | Composition B | 9 | 6
XMEAS(31) | Composition C | 9 | 6
XMEAS(32) | Composition D | 9 | 6
XMEAS(33) | Composition E | 9 | 6
XMEAS(34) | Composition F | 9 | 6
XMEAS(35) | Composition G | 9 | 6
XMEAS(36) | Composition H | 9 | 6
XMEAS(37) | Composition D | 11 | 15
XMEAS(38) | Composition E | 11 | 15
XMEAS(39) | Composition F | 11 | 15
XMEAS(40) | Composition G | 11 | 15
XMEAS(41) | Composition H | 11 | 15

Table 3. Manipulated variables in the TE process.

Variable | Description
XMV(1) | D feed flow (stream 2)
XMV(2) | E feed flow (stream 3)
XMV(3) | A feed flow (stream 1)
XMV(4) | Total feed flow (stream 4)
XMV(5) | Compressor recycle valve
XMV(6) | Purge valve (stream 9)
XMV(7) | Separator pot liquid flow (stream 10)
XMV(8) | Stripper liquid product flow (stream 11)
XMV(9) | Stripper stream valve
XMV(10) | Reactor cooling water flow
XMV(11) | Condenser cooling water flow
XMV(12) | Agitator speed

4.2. Results and discussion using the TE process

A dataset containing 1024 data samples is used as the training dataset for modeling, and another dataset consisting of 1024 samples is used as the testing dataset. A GPR model is developed for each case, where each model corresponds to a fault in the TE process.


Fig. 3. Tennessee Eastman process.

Table 4. Summary of process faults for the Tennessee Eastman process.

Fault | Process variable | Type
F1 | B composition, A/C feed ratio constant (stream 4) | Step
F2 | A feed loss (stream 1) | Step
F3 | A, B and C feed compositions (stream 4) | Random variation
F4 | Condenser cooling water inlet temperature | Random variation
F5 | Reaction kinetics | Slow shift
F6 | Unknown | Unknown

The results of fault detection and diagnosis for faults F1 to F6 are presented in this paper to validate the feasibility and effectiveness of the proposed ORGPR-based GLRT method. The fault F1 is introduced by a step change in composition B and in the A/C feed ratio constant in stream 4. Figs. 4 and 5 show the fault detection results using the GPR-based GLRT and ORGPR-based GLRT techniques, respectively. As presented in these figures, the GLRT statistic changes dramatically once the fault is introduced at sample 224; the detection statistic exceeds its threshold, which means that the fault is clearly detected. Fig. 4 shows that the GLRT statistic based on the conventional GPR model produces a high number of false alarms, whereas applying the ORGPR-based GLRT to the same data set provides better adaptation to the nonlinear and nonstationary behaviour of the TE process. It should be noted that the control limit G_{α,t} of the GLRT statistic changes with time

because it is updated depending on the dictionary, as described in Section 3. The fault F3 is associated with a random variation in the A, B and C feed compositions in stream 4. Figs. 6 and 7 illustrate the detection results using the GPR-based GLRT and ORGPR-based GLRT techniques. One may observe that the GLRT statistic obtained using both the GPR and ORGPR models remains almost consistently above its threshold in the faulty region. At the same time, one can see that, unlike the GPR-based GLRT, which results in a higher number of false alarms (see Fig. 6), the ORGPR-based GLRT technique is able to reduce the false alarms (see Fig. 7). The ability of the ORGPR-based GLRT to adapt to the nonlinear and dynamic behaviour of the process makes it suitable for efficient monitoring when the operating conditions of the process change.


Fig. 4. GLRT statistic obtained using GPR technique in case of fault F1.

Fig. 5. GLRT statistic obtained using ORGPR technique in case of fault F1.

Fig. 6. GLRT statistic obtained using GPR technique in case of fault F3.

Fig. 7. GLRT statistic obtained using ORGPR technique in case of fault F3.

The fault F5 produces a slow shift in the reaction kinetics, and the fault F6 is an unknown fault. Figs. 8 and 10 show that the GPR-based GLRT approach can successfully detect these faults in the TE process, but with a high false alarm rate. This can be explained by the fact that the GPR model and the control limit of the GLRT statistic are fixed over time. Thus, an adaptive control chart is needed to reduce the rate of false alarms.

The results of the ORGPR-based GLRT in the case of faults F5 (Fig. 9) and F6 (Fig. 11) show that the faults are detected with a lower rate of false alarms. These results demonstrate the ability of the ORGPR-based GLRT to adapt to dynamic process changes, thanks to the adaptive GPR model and its control chart. The fault detection results are summarized in Table 5 in terms of false alarm rate (FAR), good detection rate (GDR) and computation time (CT).


Fig. 8. GLRT statistic obtained using GPR technique in case of fault F5.

Fig. 9. GLRT statistic obtained using ORGPR technique in case of fault F5.

Fig. 10. GLRT statistic obtained using GPR technique in case of fault F6.

Fig. 11. GLRT statistic obtained using ORGPR technique in case of fault F6.

Table 5. Summary of FAR, GDR and CT for the GPR-based GLRT and ORGPR-based GLRT methods.

Fault | GPR-based GLRT FAR (%) | GDR (%) | CT (s) | ORGPR-based GLRT FAR (%) | GDR (%) | CT (s)
F1 | 6.25 | 98.5000 | 8.4531 | 1.3393 | 98.50 | 0.0021
F2 | 24.5536 | 99.7500 | 8.6094 | 2.6786 | 99.3750 | 0.0018
F3 | 26.7857 | 97.7500 | 9.82 | 4.01 | 97.37 | 0.0020
F4 | 28.12 | 98.750 | 10.1563 | 10.2679 | 96.25 | 0.0031
F5 | 28.1250 | 97.8750 | 8.2969 | 4.01 | 96.12 | 0.0028
F6 | 30.8036 | 94.2500 | 10.1875 | 9.3750 | 90.25 | 0.0027
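FAR and GDR summarize the detection decisions against the known fault window; a minimal sketch of how such rates can be computed is given below (illustrative only, with the fault assumed to start at sample 224 and a synthetic statistic and threshold).

```python
import numpy as np

def detection_rates(G, G_alpha, fault_start=224):
    """False alarm rate over the fault-free part and good detection
    rate over the faulty part of a monitoring statistic G."""
    alarms = G > G_alpha
    far = 100.0 * alarms[:fault_start].mean()   # % of fault-free samples flagged
    gdr = 100.0 * alarms[fault_start:].mean()   # % of faulty samples detected
    return far, gdr

# Example with a synthetic statistic and a fixed threshold.
rng = np.random.default_rng(0)
G = np.concatenate([rng.chisquare(1, 224), rng.chisquare(1, 800) + 5.0])
print(detection_rates(G, G_alpha=4.0))
```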

Table 5 shows that the ORGPR-based GLRT technique provides better results than the GPR-based GLRT method. In particular, there is a clear improvement of the developed technique in terms of false alarms: for example, the FAR decreases from 6.25% (GPR) to 1.33% (ORGPR) in the case of fault F1, and the same conclusion holds for F2 to F6. Moreover, when the proposed method is used for fault


detection, the required computation time decreases significantly in comparison to the GPR-based GLRT technique.

5. Conclusions

In this paper, an online reduced Gaussian process regression (ORGPR)-based generalized likelihood ratio test (GLRT) is applied for fault detection (FD) of chemical systems. The ORGPR method is used as the modeling algorithm in the fault detection task, and the faults are then detected using the GLRT chart. The main advantages of the developed ORGPR-based GLRT over the GPR-based GLRT are the reduction of the computational cost when using large datasets and the update of the GPR parameters whenever a new observation is available. These benefits make the ORGPR-based GLRT strategy useful for many complex fault detection problems and fast real-time processes. The developed ORGPR-based GLRT technique is validated using Tennessee Eastman (TE) process data through monitoring some of the key variables involved in the TE system. The developed algorithm showed efficient results by reducing the false alarm rate, the missed detection rate and the computation time.

Declaration of Competing Interest

The authors certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Acknowledgement

This work was made possible by NPRP grant NPRP9-330-2-140 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jprocont.2019.11.002.

References

[1] M. Mansouri, M.N. Nounou, H.N. Nounou, Multiscale kernel PLS-based exponentially weighted GLRT and its application to fault detection, IEEE Trans. Emerg. Top. Comput. Intell. (99) (2017) 1–11.
[2] I. Baklouti, M. Mansouri, A.B. Hamida, H. Nounou, M. Nounou, Monitoring of wastewater treatment plants using improved univariate statistical technique, Process Saf. Environ. Prot. 116 (2018) 287–300.
[3] M. Mansouri, A. Al-Khazraji, M. Hajji, M.F. Harkat, H. Nounou, M. Nounou, Wavelet optimized EWMA for fault detection and application to photovoltaic systems, Sol. Energy 167 (2018) 125–136.
[4] M.-F. Harkat, M. Mansouri, M. Nounou, H. Nounou, Enhanced data validation strategy of air quality monitoring network, Environ. Res. 160 (2018) 183–194.
[5] P.M. Frank, Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy: a survey and some new results, Automatica 26 (3) (1990) 459–474.
[6] R. Isermann, Model-based fault-detection and diagnosis – status and applications, Annu. Rev. Control 29 (1) (2005) 71–85.
[7] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, K. Yin, A review of process fault detection and diagnosis: part III: process history based methods, Comput. Chem. Eng. 27 (3) (2003) 327–346.
[8] A. Ullah, V. Zinde-Walsh, On the robustness of LM, LR, and W tests in regression models, Econometrica (1984) 1055–1066.
[9] J.M. Hahne, F. Biessmann, N. Jiang, H. Rehbaum, D. Farina, F. Meinecke, K.-R. Müller, L. Parra, Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control, IEEE Trans. Neural Syst. Rehabil. Eng. 22 (2) (2014) 269–279.
[10] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. 58 (1) (1996) 267–288.
[11] H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. 67 (2) (2005) 301–320.
[12] J.R. Quinlan, Simplifying decision trees, Int. J. Man Mach. Stud. 27 (3) (1987) 221–234.
[13] D. Steinberg, P. Colla, CART: classification and regression trees, The Top Ten Algorithms in Data Mining, 9, 2009, p. 179.
[14] M.M. Hamed, M.G. Khalafallah, E.A. Hassanien, Prediction of wastewater treatment plant performance using artificial neural networks, Environ. Model. Softw. 19 (10) (2004) 919–928.
[15] M.S. Nasr, M.A. Moustafa, H.A. Seif, G. El Kobrosy, Application of artificial neural network (ANN) for the prediction of El-Agamy wastewater treatment plant performance – Egypt, Alex. Eng. J. 51 (1) (2012) 37–43.
[16] Y. Djebbar, R. Narbaitz, Neural network prediction of air stripping kLa, J. Environ. Eng. 128 (5) (2002) 451–460.
[17] N. Moreno-Alfonso, C. Redondo, Intelligent waste-water treatment with neural-networks, Water Policy 3 (3) (2001) 267–271.
[18] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[19] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1–3) (2006) 489–501.
[20] G. Huang, G.-B. Huang, S. Song, K. You, Trends in extreme learning machines: a review, Neural Netw. 61 (2015) 32–48.
[21] Q.-Y. Zhu, A.K. Qin, P.N. Suganthan, G.-B. Huang, Evolutionary extreme learning machine, Pattern Recognit. 38 (10) (2005) 1759–1763.
[22] G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B 42 (2) (2012) 513–529.
[23] C.M. Bishop, M.E. Tipping, Variational relevance vector machines, in: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2000, pp. 46–53.
[24] M.I. Jordan, T.M. Mitchell, Machine learning: trends, perspectives, and prospects, Science 349 (6245) (2015) 255–260.
[25] J.O. Ogutu, T. Schulz-Streeck, H.-P. Piepho, Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions, in: BMC Proceedings, 6, BioMed Central, 2012, p. S10.
[26] D. Nguyen-Tuong, M. Seeger, J. Peters, Model learning with local Gaussian process regression, Adv. Robot. 23 (1) (2009) 2015–2034.
[27] A.J. Smola, P.L. Bartlett, Sparse greedy Gaussian process regression, in: Advances in Neural Information Processing Systems, 2001, pp. 619–625.
[28] H. Sheng, J. Xiao, Y. Cheng, Q. Ni, S. Wang, Short-term solar power forecasting based on weighted Gaussian process regression, IEEE Trans. Ind. Electron. 65 (1) (2017) 300–308.
[29] A. Rohani, M. Taki, M. Abdollahpour, A novel soft computing model (Gaussian process regression with k-fold cross validation) for daily and monthly solar radiation forecasting (part: I), Renew. Energy 115 (2018) 411–422.
[30] J. Verrelst, L. Alonso, G. Camps-Valls, J. Delegido, J. Moreno, Retrieval of vegetation biophysical parameters using Gaussian process techniques, IEEE Trans. Geosci. Remote Sens. 50 (5) (2012) 1832–1843.
[31] M. Lázaro-Gredilla, M.K. Titsias, J. Verrelst, G. Camps-Valls, Retrieval of biophysical parameters with heteroscedastic Gaussian processes, IEEE Geosci. Remote Sens. Lett. 11 (4) (2014) 838–842.
[32] M. Mansouri, M. Sheriff, R. Baklouti, M. Nounou, H. Nounou, A.B. Hamida, N. Karim, Statistical fault detection of chemical process – comparative studies, J. Chem. Eng. Process Technol. 7 (1) (2016) 282–291.
[33] M. Mansouri, M. Nounou, H. Nounou, N. Karim, Kernel PCA-based GLRT for nonlinear fault detection of chemical processes, J. Loss Prev. Process Ind. 40 (2016) 334–347.
[34] C. Botre, M. Mansouri, M. Nounou, H. Nounou, M.N. Karim, Kernel PLS-based GLRT method for fault detection of chemical processes, J. Loss Prev. Process Ind. 43 (2016) 212–224.
[35] C. Botre, M. Mansouri, M.N. Karim, H. Nounou, M. Nounou, Multiscale PLS-based GLRT for fault detection of chemical processes, J. Loss Prev. Process Ind. 46 (2017) 143–153.
[36] M.Z. Sheriff, M. Mansouri, M.N. Karim, H. Nounou, M. Nounou, Fault detection using multiscale PCA-based moving window GLRT, J. Process Control 54 (2017) 47–64.
[37] A.Y. Sun, D. Wang, X. Xu, Monthly streamflow forecasting using Gaussian process regression, J. Hydrol. 511 (2014) 72–81.
[38] A. Kapoor, K. Grauman, R. Urtasun, T. Darrell, Gaussian processes for object categorization, Int. J. Comput. Vis. 88 (2) (2010) 169–188.
[39] P. Ferreiro Alonso, Training Data Analysis for Gaussian Process State Space Models, Universitat Politècnica de Catalunya, 2017, B.S. thesis.
[40] C.K. Williams, C.E. Rasmussen, Gaussian Processes for Machine Learning, 2, MIT Press, Cambridge, MA, 2006.
[41] M. Mansouri, M.N. Nounou, H.N. Nounou, Improved statistical fault detection technique and application to biological phenomena modeled by S-systems, IEEE Trans. Nanobiosci. 16 (6) (2017) 504–512.
[42] L. Csató, M. Opper, Sparse on-line Gaussian processes, Neural Comput. 14 (3) (2002) 641–668.
[43] H. Bijl, T.B. Schön, J.-W. van Wingerden, M. Verhaegen, Online sparse Gaussian process training with input noise, Stat 1050 (2016) 29.
[44] L. Yang, K. Wang, L.S. Mihaylova, Online sparse multi-output Gaussian process regression and learning, IEEE Trans. Signal Inf. Process. Netw. (2018).
[45] K.S. Riedel, A Sherman–Morrison–Woodbury identity for rank augmenting matrices with application to centering, SIAM J. Matrix Anal. Appl. 13 (2) (1992) 659–662.
[46] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (3) (1993) 245–255.
[47] R. Fezai, M. Mansouri, O. Taouali, M.F. Harkat, N. Bouguila, Online reduced kernel principal component analysis for process monitoring, J. Process Control 61 (2018) 1–11.
[48] R. Fazai, O. Taouali, M.F. Harkat, N. Bouguila, A new fault detection method for nonlinear process monitoring, Int. J. Adv. Manuf. Technol. 87 (9–12) (2016) 3425–3436.
[47] R. Fezai, M. Mansouri, O. Taouali, M.F. Harkat, N. Bouguila, Online reduced kernel principal component analysis for process monitoring, J. Process Control 61 (2018) 1–11. [48] R. Fazai, O. Taouali, M.F. Harkat, N. Bouguila, A new fault detection method for nonlinear process monitoring, Int. J. Adv. Manuf. Technol. 87 (9–12) (2016) 3425–3436.