Fault detection in dynamic systems using the Kullback–Leibler divergence

Fault detection in dynamic systems using the Kullback–Leibler divergence

Control Engineering Practice 43 (2015) 39–48 Contents lists available at ScienceDirect Control Engineering Practice journal homepage: www.elsevier.c...

1MB Sizes 4 Downloads 95 Views

Control Engineering Practice 43 (2015) 39–48

Contents lists available at ScienceDirect

Control Engineering Practice journal homepage: www.elsevier.com/locate/conengprac

Fault detection in dynamic systems using the Kullback– Leibler divergence Lei Xie a, Jiusun Zeng b,n, Uwe Kruger c,n, Xun Wang d, Jaap Geluk d a

State Key Laboratory of Industrial Control Technology, Zhejiang University, 310027 Hangzhou, PR China College of Metrology and Measurement Engineering, China Jiliang University, 310018 Hangzhou, PR China c Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Jonsson Engineering Center, Troy, NY 12180-3590, USA d Department of Mathematics, The Petroleum Institute, P.O. Box 2533, Abu Dhabi, UAE b

art ic l e i nf o

a b s t r a c t

Article history: Received 7 January 2015 Accepted 24 May 2015

This paper proposes detecting incipient fault conditions in complex dynamic systems using the Kullback–Leibler or KL divergence. Subspace identification is used to identify dynamic models and the KL divergence examines changes in probability density functions between a reference set and online data. Gaussian distributed process variables produce a simple form of the KL divergence. Non-Gaussian distributed process variables require the use of a density-ratio estimation to compute the KL divergence. Applications to recorded data from a gearbox and two distillation processes confirm the increased sensitivity of the proposed approach to detect incipient faults compared to the dynamic monitoring approach based on principal component analysis and the statistical local approach. & 2015 Elsevier Ltd. All rights reserved.

Keywords: Kullback–Leibler divergence Auto-and cross correlation Subspace identification Non-Gaussian distribution Incipient fault conditions

1. Introduction To maintain a safe and reliable production, multivariate statistical techniques have received considerable attention for monitoring large scale industrial systems over the past few decades (Ge & Song, 2013; Kruger & Xie, 2012). Alternative approaches include parity space and finite memory observers (Bezzaoucha, Marx, Maquin, & Ragot, 2014; Graton, Kratz, & Fantini, 2014; Odendaal & Jones, 2014; Wang, Gao, & Chen, 2015). The most widely used statistical tools include principal component analysis (PCA) and partial least squares (PLS) (Ge, Xie, Kruger, Lamont, & Wang, 2009; Kourti & MacGregor, 1995, 1996; Kruger, Chen, Sandoz, & McFarlane, 2001; Pranatyasto & Qin, 2001; Venkatasubramanian, Rengaswamy, Kavuri, & Yin, 2003). However, their applications are compromised by the assumptions that the recorded variables (i) do not show any auto- and cross-correlation, (ii) are stationary and (iii) follow a Gaussian distribution. For most industrial processes, however, some assumptions are not met, for example recorded variables possess time-based correlation, resulting from controller feedback and the presence of process noise and disturbances. To address time-based correlation, Ku, Storer, and Georgakis (1995) proposed the application of PCA to a time-series arrangement of the recorded variables. For Gaussian distributed process variables, Qin and Li (2001) and Komulainen, Sourander, and

n

Corresponding authors. E-mail addresses: [email protected] (J. Zeng), [email protected] (U. Kruger).

http://dx.doi.org/10.1016/j.conengprac.2015.05.010 0967-0661/& 2015 Elsevier Ltd. All rights reserved.

Jamsa-Jounela (2004) applied dynamic PCA and PLS models to monitor processes that have correlated variables, respectively. To monitor correlated non-Gaussian distributed variables, Lee, Yoo, and Lee (2004) suggested using dynamic independent component analysis. Although such time-lagged arrangements have been successfully utilized for detecting abnormal situations, they produce sizable lagged variable sets that are difficult to interpret in practice (Odiowei & Cao, 2010; Treasure, Kruger, & Cooper, 2004). Subspace model identification (SMI) (Larimore, 1990; Lee & Edgar, 2002; van Overschee & de Moor, 1996; Wang & Qin, 2002) has been widely used to model time-based variable relationships, based on their simplicity and numerical stability (Qin, 2006). Different from time-lagged arrangements, state space models utilize a set of state variables to model dynamic process behavior. SMI has also gained attention as a process monitoring tool (Odiowei & Cao, 2010; Russel, Chiang, & Braatz, 2000; Schubert, Kruger, Arellano-Garcia, Feital, & Wozny, 2011, 2012; Treasure et al., 2004). This paper uses a non-causal state space model structure in innovation form: sðk þ 1Þ ¼ AsðkÞ þ KeðkÞ

ð1aÞ

yðkÞ ¼ CsðkÞ þ eðkÞ;

ð1bÞ

where y A Rdy is the measured data vector, s A Rds is the state vector having the multivariate density function ps ðsÞ, a mean vector of zero and the covariance matrix Σs , K A Rds dy and C A Rdy ds are parameter matrices, e A Rdy is an error vector, k is a sample index and A A Rds ds is a parameter matrix that has eigenvalues inside the unit circle.

40

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

The application of Eq. (1), instead of a dynamic causal input– output model structure in state space form, does not require dividing y into input/cause and output/effect variables. For complex systems, particularly those in the petrochemical industry, such a division is often difficult because of (i) controller feedback, (ii) complex interactions between various process units and (iii) the presence of recycle streams for example. Treasure et al. (2004) and Odiowei and Cao (2010) highlighted that a time-lagged arrangement of y results in a considerable dimension of the combined variable set to be analyzed. In contrast, Eq. (1) utilizes the state vector s, which is typically of a significantly smaller dimension, to describe the time-based correlation encapsulated in y. The application of SMI for fault detection consists, generally, of two steps: 1. estimating model parameters and generating state sequences and model residuals using SMI; and 2. defining monitoring statistics from state sequences and model residuals. Using model residuals from a state space model, Qin and Li (2004) showed how to diagnose faulty sensors based on Hotelling's T2 statistics. Basseville, Abdelghani, and Benveniste (2000) combined SMI and the statistical local approach to detect abnormal situations in vibration systems producing Gaussian distributed variable sets. Ge, Kruger, Lamont, Xie, and Song (2010) developed this approach further to monitor non-Gaussian distributed vibration variables. Odiowei and Cao (2010) used canonical variate analysis to estimate state sequence and model residuals and construct monitoring statistics using a kernel density estimation. Despite the number of successful applications of SMI-based process monitoring approaches, Kruger, Kumar, and Littler (2007) pointed out that they may not be sensitive in detecting incipiently developing faults. For static PCA, recent work confirmed that such fault conditions may not be detectable using conventional scorebased statistics and residuals (Harmouche, Delpha, & Diallo, 2014; Kruger & Xie, 2012; Zeng, Kruger, Geluk, Wang, & Xie, 2014). More precisely, Harmouche et al. (2014) and Zeng et al. (2014) showed that the KL divergence is more sensitive in detecting incipient faults than conventional score- and residual-based statistics and the statistical local approach. Resulting from the higher sensitivity in detecting incipient fault conditions, the application of the KL divergence to fault detection and diagnosis has also been extended to dynamic systems. Bittencourt, Saarinen, Sander-Tavallaey, Gunnarsson, and Norrlof (2014) developed an off-line fault diagnosis scheme that compares different data records using the KL divergence. Liu, Yamada, Collier, and Sugiyama (2013) developed a moving window approach for change point detection based on the f-divergence and Yu and Mehta (2009) designed a prediction (filtering) approach using the KL divergence. However, the approaches in Bittencourt et al. (2014), Liu et al. (2013), and Yu and Mehta (2009) relate to off-line comparisons between dynamic systems, suffer from increased problem size (Bittencourt et al., 2014; Liu et al., 2013) or are not applicable to variable sets that have a joint non-Gaussian distribution. This article proposes to combine SMI with the KL divergence to (i) develop a monitoring approach for modeling time-based correlation in recorded variable sets and (ii) detect incipient fault conditions. For systems that produce Gaussian distributed process variables, the KL divergence reduces to an analytical expression allowing a straightforward estimation of the corresponding rejection regions to test whether the process is in-statistical-control. The identified state space model yields two monitoring statistics that (i) relate to the state sequences and (ii) rely on the model residuals.

To propose a framework that is also applicable to systems yielding non-Gaussian distributed process variables, the paper utilizes the recently developed density ratio estimation (Kanamori, Hido, & Sugiyama, 2009) to compute the KL divergence. There are also two monitoring statistics for the nonGaussian case, one that monitors the variation of the state sequences and the other the accuracy of the model prediction. For both statistics, rejection regions for hypothesis testing can be estimated using a kernel density estimation (Chen, Kruger, & Leung, 2004; Silverman, 1986). The next section briefly summarizes the KL divergence and shows how to obtain its value for Gaussian and non-Gaussian distributed variable sets. Section 3 provides a detailed description of the monitoring approach for dynamic Gaussian systems. This is followed by extending the proposed approach to handle dynamic non-Gaussian systems in Section 4. Sections 5–7 summarize three application studies involving recorded variables from (i) a gearbox system and (ii) two industrial distillation processes. Each process showed time-based correlation in the recorded variable set. Whilst the gearbox system yielded Gaussian distributed process variables, the application to distillation processes produced non-Gaussian distributed process variables. Finally, Section 8 provides a concluding summary of this paper.

2. Preliminaries This section gives a brief definition of the KL divergence and 1 discusses how to compute it based on a reference set Y ¼ fyðiÞgni ¼ 1, k ~ containing n1 samples and online records Y k ¼ fyðjÞgj ¼ k  n2 þ 1 , containing n2 ⪡n1 samples. It is assumed here that the samples are independent and the variable set y does not have any timebased correlation. The density functions for generating the samples in the sets Y and Y~ k are py ðÞ and p~ y ðÞ, respectively. The KL divergence (Kullback & Leibler, 1951) is a widely used non-metric divergence to measure the difference between the probability density functions (PDFs) py ðyÞ and p~ y ðyÞ. The KL divergence of p~ y ðyÞ with respect to py ðyÞ is defined by ! Z p~ y ðyÞ p~ y ðyÞln Kðp~ y ; py Þ ¼ dy: ð2Þ py ðyÞ Rdy Here, lnðÞ is the natural logarithm. The KL divergence is not a distance metric, as the triangular inequality and the symmetric property are generally not met. The KL divergence is non-negative and zero if and only if py ðyÞ is equal to p~ y ðyÞ. According to Eq. (2), the KL divergence is conceptually simple and its applications can be found in a wide range of problems, including image and speech processing (Ramirez, Segura, Benitez, & de la Torre, 2004), image retrieval (Goldberger et al., 2003) and others (Basseville, 2013).

2.1. Estimation of the KL divergence for gaussian distributed data In many applications, py ðÞ and p~ y ðÞ are generally not available and must be estimated from the data sets Y and Y~ k , respectively. If the random vector y has a Gaussian distribution, py ðÞ and p~ y ðÞ are defined by estimating both mean vectors, μy and μ~ y , and covar~ , respectively. For the PDFs p ðyÞ, y A Y, iance matrices, Σy and Σ y y ~ Þ, the KL divergence ~ y~ A Y~ k , y~  N ðμ~ y ; Σ y  N ðμy ; Σy Þ, and p~ y ðyÞ, y becomes (Penny, 2001) 

K p~ y ; py



8 9 0 1   > > = T       Σ 1< y C 1 ~ μ~ y  μy Σy 1 μ~ y  μy þ lnB ¼ @ A þ tr Σy Σ y  dy : > ~ 2> : ; Σ  y

ð3Þ

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

2.2. Estimation of the KL divergence for non-Gaussian distributed data If y has a non-Gaussian distribution, however, the estimation py ðyÞ and p~ y ðyÞ and subsequently the KL divergence is not straightforward. Available methods for estimating the densities, such as the kernel density estimation (Chen et al., 2004; Silverman, 1986), are computationally expensive and the computation of the KL divergence requires the application of numerical integration techniques. Kanamori et al. (2009) showed how to efficiently estimate the density ratio ϕðyÞ ¼ py ðyÞ=p~ y ðyÞ as follows: ! n1 p ðyÞ X ðy yi ÞT ðy  yi Þ T ^ ðyÞ: ϕðyÞ ¼ ~ y  βi exp ð4Þ ¼ β KðyÞ ¼ ϕ γ p y ðyÞ i ¼ 1 Here, βi is an unknown parameter and γ is the kernel parameter. R  By minimizing the mean squared error term 12 Rdy ϕðyÞ  T ϕ^ ðyÞÞ2 p~ y ðyÞ dy the parameter vector β ¼R ðβ1 ⋯ βn1 Þ can be deter2 mined by removing the constant term 12 Rdy ϕ ðyÞp~ y ðyÞ dy as Z Z 2 1 β ¼ argmin ϕ^ ðyÞp~ y ðyÞ dy  ϕ^ ðyÞpy ðyÞ dy β 2 Rdy Rdy   n o 1  ^ 2 ðyÞ  ~ ^ ¼ argmin E ϕ ð5Þ p y ðyÞ  E ϕ ðyÞ  py ðyÞ β 2 The expectations in Eq. (5) are based on the PDFs py ðyÞ and p~ y ðyÞ. Using the samples stored in the data sets Y~ k and Y these   ^ 2 ðyÞ j ~ 1 T expectations can be approximated as 12E ϕ p y ðyÞ  2β Hβ n o Pk T T ^ ðyÞ j 1 and E ϕ py ðyÞ  h β, where H ¼ n2 j ¼ k  n2 þ 1 Kðy j ÞK ðyj Þ and P n1 n1 1 is based on the n2 h ¼ n11 ni ¼ 1 Kðyi Þ. This shows that H A R samples in Y~ and h A Rn1 is constructed from the n1 samples in Y. k

Including a regularization term, the optimal solution for the parameter vector β is  β ¼ H  δI  1 h: ð6Þ Here, δ is the regularization parameter. The reason for including the regularization term is that the rank of H is n2 ⪡n1 . The kernel

parameter γ and the regularization parameter δ can be determined by cross validation (Kanamori et al., 2009). ^ ðyÞ, the approximation of the KL divergence now Using ϕ becomes Z     ^ ðyÞ dy K p~ y ; py   p~ y ðyÞln ϕ Rdy

n   o n  o T  ^ ðyÞ  ~   E ln ϕ  p y ðyÞ ¼  E ln β KðyÞ

p~ y ðyÞ

ð7Þ

Finally, the samples stored in data set Y~ k now allow computing the following equation:   n  o T  K p~ y ; py   E ln β KðyÞ 

p~ y ðyÞ



k X 1 n2 j ¼ k  n

2

  T ln β Kðyj Þ þ1

ð8Þ It is important to note that the above approximation of the KL divergence is based on the assumption that the samples are independent and the random vector y does not have any timebased correlation. However, time-based correlation is a common occurrence in recorded data from industrial system, for example caused by controller feedback, and the presence of process noise and disturbances. Sections 3 and 4 develop a monitoring framework for recorded variable sets that possess time-based correlation and have a Gaussian and non-Gaussian distribution, respectively.

41

3. Monitoring of dynamic Gaussian process This section discusses how to construct a monitoring scheme for recorded samples that are generated from Eq. (1) by assuming



 e  N 0; Σe ; and  EfeðiÞeT ðkÞg ¼ δij Σ e , where i; j are sample indices and δij is the Kronecker delta function, i.e. if i ¼j, δij ¼ 1, and if ia j, δij ¼ 0. Under these assumptions, the samples of the error vector e are uncorrelated and Eq. (1) generates state variables that are Gaussian distributed and possess a time-based correlation. As shown by Chan (2008) and briefly revised in Section 3.1, there is an analytical expression to compute the KL divergence for the correlated state vector sðkÞ. Section 3.2 describes how to construct monitoring statistics based on sðkÞ and eðkÞ. Section 3.3 then discusses how to diagnose an abnormal situation and Section 3.4 summarizes the monitoring scheme. 3.1. Preliminaries For a dynamic process defined in Eq. (1), the discussion below highlights that it can also be seen as a Gauss–Markov process (Chan & Vasconcelos, 2004). It follows from Eq. (1a) that recursively substituting consecutive samples of s yields sðk þ tÞ ¼ At sðkÞ þ

t 1 X

At  ℓ  1 Keðk þℓÞ:

ð9Þ

ℓ¼0

Following

from the above assumptions, each sample of eðk þ ℓÞ  N 0; Σe , 0 r ℓ ot  1 and each sample of the state vector is Gaussian distributed:

sðk þ tÞ  N μst ; Σst ; ð10Þ h iT where μst ¼ At sðkÞ and Σst ¼ AΣst  1 AT þ KΣe KT ¼ At Σs At þ h iT Pt  1 ℓ ℓ T . Given that the eigenvalues of A are ℓ ¼ 0 A K Σe K A assumed to be inside the unit circle, it follows that Eq. (10) asymptotically converges to ( ) 1 h iT X d A ℓ K Σe A ℓ K : ð11Þ lim sðk þ tÞ-N 0;

t-1

ℓ¼0

Moreover, Eq. (9) becomes, asymptotically, an impulse response or moving average data model to describe the state variables. Eq. (15)

shows that the cross-covariance matrices, Cov sðk þ tÞ; sðkÞ , are required:

ð12Þ Cov sðt þ kÞ; sðkÞ ¼ At Σs ; which follows from Eq. (9). To capture the time-based correlation of the state variables, a total of n samples are stacked in a contiguous order, which yields  sTn ðk þ nÞ ¼ sT ðk þ 1Þ ⋯ sT ðk þ nÞ : ð13Þ Following from the preceding discussion sn has a Gaussian distribution: n o ð14Þ sn ðk þnÞ  N μsn ; Σsn ; with a following mean vector and covariance matrix: 0 1

μs1

Bμ B s2 μsn ¼ B B ⋮ @

μsn

C C C C A

ð15aÞ

42

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

2

Σs1

6 6 6 6 A Σs 1 6 Σsn ¼ 6 6 2 6 A Σs 6 1 6 6 ⋮ 4 An  1 Σs1

Σs1 AT Σs2

h

Σs 1 A 2

iT

Σs 2 A T



A Σs 2

Σs 3







An  2 Σs2

A n  3 Σs 3

⋱ ⋯

Σs 1 ¼ A Σs A T þ K Σe K T ,

Here,

iT 3 7 h iT 7 7 n2 7 Σs 2 A 7 h iT 7 7: Σs 3 A n  3 7 7 7 7 ⋮ 5 h

~ sample of y, denoted here by yðkÞ, the data window Y~ k stores the   T T T ~ ~ ~ samples y sn ðkÞ ¼ y s ðk  n þ1Þ⋯y s ðkÞ . Next, applying subspace

Σs 1 A n  1



ð15bÞ

Σs n

Σs ¼

P1

ℓ¼1

h iT Aℓ KΣe Aℓ K ,

μsℓ ¼ Aℓ sðkÞ for 1 rℓ r n and sðkÞ is assumed to be available. It is important to note that n must be large enough to capture the time-based correlation encapsulated in the recorded variables. 3.2. Constructing monitoring statistics With respect to the data structure in Eq. (1), there are two distinct monitoring statistics that can be established on the basis of the KL divergence: yðkÞ ¼

CsðkÞ þ eðkÞ : |{z}  |fflffl{zfflffl}  K ðp~ e ;pe Þ K p~ ;p ysn

ð16Þ

ysn

Given that the state and error vectors are unknown, they need to be estimated using subspace identification (van Overschee & de Moor, 1996). For the reference set Y, the application of subspace identification for stochastic systems gives rise to the following estimation of the data structure in Eq. (1): ^ s ðkÞ þ εðkÞ yðkÞ ¼ Ct |fflfflffl{zfflffl ffl}

ð17aÞ

^ s ðkÞ þ C^ K^ εðkÞ ^ s ðk þ 1Þ ¼ C^ At Ct

ð17bÞ

~ ~ identification algorithm using Y~ k yields the estimates A^ ðkÞ, C^ ðkÞ, ~^ ~ ðkÞ and t~ ðk n þ 1Þ. Now, estimating the mean vector and C ðkÞ, Σ ε s ~ ðkÞ, the covariance matrix using the data set Y~ k , μ~ ys ðkÞ and Σ ysn n n o T ~ ~ ~ allows constructing the PDF of y sn ðkÞ  N μ ys ðkÞ; Σ ysn ðkÞ . n

With the two sequences Y and Y~ k , each containing n samples, it follows from Eq. (3) that the KL divergence can be computed as follows:   1 T   μysn  μ~ ysn ðkÞ Σysn1 μysn  μ~ ysn ðkÞ K p~ ys ðkÞ; pys ¼ n n 2 1 0   1    Σ  C B  ysn  C 1 ~ þ ln@ ð18Þ A þ tr Σysn Σ ysn ðkÞ  ndy A: ~ ðkÞ Σ ysn A second monitoring statistic can be constructed by computing the KL divergence for the estimated error vector ε, given that ε  N fμε ; Σ ε g and ε~ ðkÞ  N fμ~ ε ðkÞ; Σ~ ε ðkÞg:   1 K p~ ε ðkÞ; pε ¼ με  μ~ ε ðkÞ T Σε 1 με  μ~ ε ðkÞ 2 1 0 1     C 1 ~ B Σε  C þ ln@ ð19Þ A þ tr Σε Σ ε ðkÞ  dy A: ~ ðkÞ Σ ε

3.3. Fault diagnosis

ys ðkÞ





εðkÞ  N 0; Σε :

ð17cÞ

^ C^ and K^ are estimates of A, C and K, In Eq. (17), the matrices A, respectively, ts ðkÞ and εðkÞ are estimates of sðkÞ and eðkÞ, respec^ C, ^ K^ tively. Asymptotically, it should be noted that the estimates A, and ts ðkÞ are equal to A, C, K and sðkÞ up to a similarity transformation (van Overschee & de Moor, 1996), respectively. It is important to note that ys ðkÞ in Eq. (17a) is not affected by the unknown similarity transformation. According to Eq. (17), the subspace identification algorithm provides estimates of the error covariance matrix Σe , Σ ε , the dimension of the vector of state ^ s ð1Þ. vector, s, and the initial vector ys ð1Þ ¼ Ct The estimates computed from the subspace identification algorithm now allow constructing the mean vector and the covariance   T T matrix for the augmented vector yTsn ¼ tTs ð1ÞC^ ⋯tTs ðnÞC^ by replacing

 Σs with Σy ¼ C^ A^ Σt A^ T C^ T þ C^ K^ Σε K^ T C^ T ;  μs with μy ¼ C^ A^ ℓ ts ð1Þ 1



s1

sℓ þ 1

i

yðk  n þ iÞ ¼ CA^ sðk  nÞ þ C

Ai  ℓ  1 Keðk  n þ ℓÞ þ εðk  n þ iÞ

ð20Þ   T T Stacking the n samples to form y~ n ðkÞ ¼ y~ ðk  n þ 1Þ ⋯ y~ ðkÞ and applying Eq. (20) yield T

y~ n ðkÞ ¼ Γts ðk  n þ 1Þ þ GdðkÞ:

ð21Þ

Here 2

3 C^ 6 ^ ^ ^ ^ n  17 Γ ¼ 4 C A ⋮ C A 5; 2 C^ A^ I 6 G ¼ 4 C^ A^ K^

sℓ

i1 X ℓ¼0

2

s

0

0



I

0



0 ^ 0 K^ C^ A^ KI⋯0⋮⋮⋮⋱⋮ C^ A^

2 C^ A^

and defining  h ℓ iT  T ^ℓ ^ ^ ^  Σys ¼ C^ P1 C^ ; ℓ ¼ 1 A K Σε A K

 Σy

To examine the effect of a fault condition upon the state space model in Eq. (1a), this subsection considers faults that can be described by a specific fault direction and a constant fault magnitude. Moreover, the considered fault condition superimposes a fault signature upon the normal data vector. To describe this scenario commences by consecutively substituting Eq. (16) for 1 r i rn, n o k, which gives rise to

3 n1

K^ C^ A^

n2

^ C^ A^ K

n3

7 ^ K⋯I 5

ð22aÞ 0

T T T T ¼ C^ A^ Σtsℓ A^ C^ þ C^ K^ Σε K^ C^

n

o

as shown in Eq. (15). Hence, ysn  N μys ; Σysn . Given that the

1 εðk  n þ 1Þ B C B εðk  n þ 2Þ C C dðkÞ ¼ B B C ⋮ @ A εðkÞ

ð22bÞ

n

error vector ε is not assumed to have any time-based correlation, all

statistically relevant information is encapsulated in ε  N με ; Σε . Using the KL divergence for online monitoring requires the use of a moving window approach. Let k be the currently recorded

ðf Þ ~ þ ΞfðkÞ is the kth sample, where Next, assuming that y~ ðkÞ ¼ yðkÞ the fault condition, ΞfðkÞ, is superimposed upon the uncorrupted ~ sample, yðkÞ, where Ξ and fðkÞ are the fault direction and the fault magnitude, respectively. This allows Eq. (21) to become

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48 ðf Þ y~ n ðkÞ  Ξn f n ðkÞ ¼ Γts ðk  nÞ þGdðkÞ;

where:

Ξn ¼ In  Ξ;

0 B f n ðkÞ ¼ @

fðk n þ 1Þ ⋮

ð23Þ 1 C A

ð24Þ

fðkÞ n

and y~ n ðkÞ is the stacked vector of the observed data and  is the Kronecker product. Pre-multiplying Eq. (23) by the orthogonal ? complement of Γ, Γ , removes the unknown sample of the state vector ts ðk  nÞ (Chow & Willsky, 1984), yielding the noise vector h i† h i† ? ? ðf Þ ? ? dðkÞ ¼ Γ G Γ y~ n ðkÞ  Γ G Γ Ξn f n ðkÞ ð25Þ ?

where † is the pseudo-inverse. Determination of Γ has been discussed extensively in the literature (Chow & Willsky, 1984; Li & Shah, 2002; Qin & Li, 2001). A constant fault condition, for example a sensor fault, affects the mean value of the residual vector only, i.e. fðkÞ ¼ f 0 , which Eq. (25) highlights. A least squares estimate of the fault signature is then h i1 T f^ 0 ¼ Ξ M T M Ξ ΞT M T M y~ ðfn Þ ; ð26Þ ?

?

where M ¼ ½Γ G † Γ . If the fault magnitude changes over time, however, the mean vector and the covariance matrix of dðkÞ change. This renders the diagnosis of such an event more difficult. It is possible, however, to approximate the fault signature stepwise, as discussed in Kruger and Xie (2012). 3.4. Construction of on-line monitoring model Utilizing the monitoring statistics proposed in Eqs. (18) and (19) to evaluate whether the process is in-statistical-control requires the determination of rejection regions, or control limits. To guarantee a statistically independent estimation of the rejection regions, the following approach is considered here. Recording a total of 2n samples from the process under normal operating condition and using the first set of n samples, Y, allows identifying a reference state space model of the form in Eq. (1). Using a moving window approach for the second set of n samples then produces a total of n values for Kðp~ yn ðkÞ; pyn Þ and Kðp~ ε ðkÞ; pε Þ by applying Eqs. (18) and (19), respectively. Here, n þ 1 r k r 2n. Using the kernel density estimation (Chen et al., 2004; Silverman, 1986) based on these n values enables estimating the PDF of Kðp~ ys ; pys Þ n n and Kðp~ ε ; pε Þ and with it the control limits K ysn ;α and K ε;α for a significance α ¼ 0:01 for example. Fig. 1 shows the division of the recorded data set in more detail. The list below summarizes the steps required to establish an on-line model:

43

(5) identify the model parameters using subspace identification from Y ¼ fyi gni¼ 1 based on Y; (6) determine the values for Kðp~ ys ðkÞ; pys Þ and Kðp~ ε ðkÞ; pε Þ using n n Eqs. (18) and (19) from the n data windows Y~ k ; (7) estimate the PDFs and the control limits K ysn ;α and K ε;α for a significance α; (8) when the ð2n þ 1Þth sample becomes available, mean center and scale the data to construct the data window Y~ 2n þ 1 ; (9) identify a state space model using subspace identification from Y~ 2n þ 1 ; (10) compute the KL divergence Kðp~ ys ð2n þ 1Þ; pys Þ and n

n

Kðp~ ε ð2n þ 1Þ; pε Þ and test the hypothesis whether the process   or is in-statistical-control, i.e. if K p~ ys ð2n þ 1Þ; pys n n  K p~ ε ð2n þ 1Þ; pε exceed the control limit K ysn ;α or K ε;α the process is out-of-statistical-control; otherwise, it is instatistical-control; and (11) select the next sample, set 2n þ j to 2n þ j þ 1 and continue with Step 8.

4. Monitoring of dynamic non-Gaussian system With respect to the data structure in Eq. (1), for the recorded data vector y to be non-Gaussian, the state vector s must be nonGaussian. This implies that the error vector e is non-Gaussian, which follows from Eq. (1a). However, based on the assumption that e does not have any time-based correlation, the assumption Efsðk þ tÞeðkÞg ¼ 0, t A N a1, still holds true. Further assuming that the process is stationary, the application of subspace identification allows estimating A, B and K, as well as ys ð1Þ. Similar to the construction of a monitoring model for dynamic non-Gaussian systems (3), establishing monitoring statistics based on the KL divergence requires a division of the reference data set into three disjunct sections. From the first section, including the ^ C, ^ samples 1 to n1, identify a state space model, i.e. the matrices A, K^ and the initial state vector ts ð1Þ. Next, using the samples in the middle section, i.e. samples n1 þ1 to n2, the functions ϕys and ϕε can be estimated using Eqs. (4)–(6). To capture the time-based correlation requires again the definition T T of the augmented vector yTsn ðkÞ ¼ ðtTs ðk  n þ1ÞC^ ⋯ tTs ðkÞC^ Þ, where n1 þ1 r k r n2 , n o n1 . Based on these n2 n1 augmented vectors, applying Eqs. (5) and (6) yields an estimate of βys and βε together n

with γ ys and δysn , and γ ε and δε , respectively. n Finally, using the third portion, including the samples n2 þ 1 to n3 allows applying Eq. (7) to approximate the values of the KL divergence for Kðp~ ys ; pys Þ and Kðp~ ε ; pε Þ for a window size n. By n

n

(1) Collect a total of 2n reference samples from the system describing normal process behavior; (2) divide the data set equally into two parts, as Fig. 1 suggests; (3) normalize the data to zero mean and unit variance to obtain the reference data set for identifying the state space model according to Eq. (1), Y ¼ fyi gni¼ 1 ; (4) define the samples inside the moving windows, by invoking samples from the latter half of the reference set, k Y~ k ¼ fy~ j gj ¼ k  n þ 1 ; n þ 1 rk r2n;

using these n3  n2 values for a window size n, the PDFs of Kðp~ ys ; pys Þ and Kðp~ ε ; pε Þ can be estimated and subsequently the

Fig. 1. Estimation of parameters for dynamic Gaussian model by dividing normal data into two disjunct regions.

Fig. 2. Estimation of parameters for dynamic non-Gaussian model by dividing Y into three disjunct regions.

n

n

control limits K ysn ;α and K ε;α . Fig. 2 shows the division of the reference set together with the estimation of the required parameters. To establish an on-line monitoring scheme requires the steps discussed in Section 3.4.

44

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

Table 1 Specification of the gearbox system. The values concerning the speed of stage are in rev/s and those for the Mashing frequency are in Hz Detail

Number of teeth

Speed of stage

Mashing frequency

Contact ratio

Overlap ratio

First stage Second stage

34/70

24.33

827.73

1.359

2.890

29/52

6.59

342.73

1.479

1.479

5. Application to a gearbox system This subsection presents an application study to recorded data from a two-stage helical gearbox system. Baydar, Chen, Ball, and Kruger (2001) and Baydar and Ball (2001) showed that the recorded vibration variables, using accelerometers, are Gaussian distributed and show significant time-based correlation. The gearbox operated under full load conditions of 260 N m. Table 1 presents details of the experimental setup. A more detailed description of this gearbox system is available in Kruger and Xie (2012) and Ge et al. (2010) for example. Fig. 3 in Ge et al. (2010) shows that the recorded vibration signals show a strong time-based correlation, implying that the monitoring approach developed in Section 3 is therefore required to describe this dynamic process using the data structure in Eq. (1). The proposed monitoring scheme in Section 3 (denoted as DM1) is compared with (i) the dynamic monitoring approach in Kruger et al. (2007) which is based on the statistical local approach (denoted here as DM2) and (ii) the conventional nonnegative quadratic statistics based on a dynamic PCA model (Ku et al., 1995) (denoted as DM3). The four vibration variables were simultaneously sampled at a frequency of 6.4 kHz, i.e. a sample time of 0.15625 ms. Among these variables, the analysis here relied on the reading of one accelerometer. The injected fault condition simulates a tooth breakage by removing a certain percentage of the widths of one tooth in the pinion gear, which is a serious localized fault condition. From this system, a reference data set and a set describing the effect of removing ten percent of the tooth were recorded for this study. The reference set contained a total of 4000 samples and the second set contained around 2000 samples. To identify the DM1 model, the reference set was divided into two equal parts: the first set to identify the dynamic data model and the second one to obtain the PDF for the Kðp~ ys ; pys Þ and Kðp~ e ; pe Þ n n monitoring statistics. The monitoring statistics for the DM2 and 2 DM3 models have a limiting central χ distribution with various degrees of freedom (Ku et al., 1995; Kruger et al., 2007). According to the discussions in Xie et al. (2006), the identification of A, C and K using the AIC criterion suggested the selection of 6 state variables for the DM1 model. The window length for the DM1 model was selected to be n ¼150 to capture the time-based correlation of the recorded vibration signal. Augmenting the six state variables by the recorded vibration variable, as discussed in Kruger et al. (2007), formed the basis for the DM2 model. To guarantee a fair comparison, the window length for the DM2 model was also set to be 150. Finally, based on the recommendations in Ku et al. (1995), the number of time-lagged arrangements for the DM3 model was determined to be five using parallel analysis. Applying the variance of the reconstruction error method Qin and Dunia (2000) suggested retaining six principal components. Fig. 3 shows the results of this comparison by defining the monitoring statistics of the 

 DM1 model by K p~ y ; py sn

 sn

 and K p~ ε ; pε ;

 DM2 model by T 2θ and T 2θε ; and  DM3 model by T2 and Q. s

Each row in Fig. 3 refers to the corresponding method. Moreover, the plots on the left- and right-hand side of Fig. 3 correspond to the statistics for the state variables/principal components and the residuals variables, respectively. The largest number of rejections of the null hypothesis, that is the process is in-statistical-control,  for the ten percent removal of one tooth emerged for the K p~ e ; pe 2 statistic of the DM1 model, followed by the T θe statistic of the DM2 model. It is also noticeable that for each of the dynamic monitoring models, the residuals-based statistics are more sensitive than those related to the state variables/principal components, with the Kðp~ ε ; pε Þ statistic being the most sensitive one. More precisely, following the discussion in Zeng et al. (2014), the statistical local approach dilutes an incipient constant  fault magnitude by the power of four, whilst the error based K p~ ε ; pε statistic impacts the fault condition by the power of two only, irrespective of the window lengths. The advantage in sensitivity of the window-based statistics over the conventional nonnegative quadratic forms increases with the window lengths. This explains why the latter statistics produced smaller number of violations of their control limits. In summary, by virtue of its construction, the monitoring approach based on the KL divergence (DM1) is the most sensitive one in this comparison, which follows from

 the dilution of the impact of the fault magnitude for small 

incipient fault conditions by the statistical local approach (DM2); and the use of a moving window approach that magnifies the presence of the fault condition in the recorded data (DM1 and DM2).

6. Application to distillation process I This subsection summarizes an application study to recorded data from a distillation unit. Resulting from the presence of controller feedback, the recorded variables show significant serial correlation (Xie et al., 2006). Moreover, applying the well-known Jarque–Bera test (Jarque & Bera, 1987) yields that most of the recorded variables do not have a Gaussian distribution. This requires the utilization of the developed scheme in Section 4 to monitor this process. From this process, designed to purify butane, data records for the 9 variables listed in Table 2 are available. A sample for each process variable is taken every 30 s. To identify dynamic models and to establish monitoring statistics, a set of 7500 reference samples describing normal process behavior was recorded. From this process, a second data set of 2500 samples was also recorded describing an abnormal event that was later identified as a sudden and unexpected variation in the setpoint of the reboiler pressure. As a result, the reboiler pressure varied extensively and in an abnormal pattern. As this abnormal event was not severe, it did not propagate and affected the control system of the distillation unit. Hence, it did not manifest itself in the recorded values of the remaining 8 variables. The exact root cause for this behavior, however, could not be determined. As before, the proposed monitoring scheme in Section 4 is again denoted as DM1 and compared with the dynamic approach based on the statistical local approach (Kruger et al., 2007) (DM2) and the standard dynamic PCA approach (Ku et al., 1995) (DM3). The statistics for the DM2 method are asymptotically Gaussian

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

45

Fig. 3. Monitoring statistics based on the KL divergence, the statistical local approach and conventional nonnegative quadratic formulations describing ten percent removal in one tooth.

Table 2 Variable set used in this study. Number

Variable

1 2 3 4 5 6 7 8 9

Impurities of benzene Temperature (top) Temperature (middle) Feed flow Reflux flow Distillate flow Overhead pressure Reboiler pressure Reboiler outlet temperature

distributed. This, however, is not the case for the nonnegative quadratic statistics related to the DM3 model. For this, the support vector data description, proposed in Liu, Xie, Kruger, Littler, and Wang (2008), is considered for the DM3 model. ^ C^ and K^ of the DM1 model, the To identify the matrices A, application of the AIC criterion (Xie et al., 2006) suggested ds ¼ 7 state variables. Again, augmenting the seven state variables by the recorded process variables formed the new data matrix for the DM2 model. For identifying the state space model, the first 3750 samples of the reference set were used according to Fig. 2. Dividing the remaining 3750 samples again into two parts, using the first 3250 samples for estimating the kernel-based density ratio and the last 500 samples for estimating the PDF of Kðp~ yn ; pya st Þ and Kðp~ ε ; pε Þ. The window length for the DM1 and DM2 model was set 100. It is important to note that selecting the window length too small compromises the sensitivity for detecting incipient faults and choosing the window size too larger may increase the average run length for detecting a fault condition (Kruger & Xie, 2012; Zeng et al., 2014). Finally, based on the recommendations by Ku et al. (1995), the number of time-lagged arrangements for the DM3 model was determined to be three using parallel analysis. Applying the variance of the reconstruction error method Qin and Dunia (2000) suggested retaining 4 principal components. Applying the Jarque–Bera test to the residuals of the state space model, ε, confirmed the earlier observations that the process variables have a non-Gaussian distribution. In fact, all of the residual variables have a non-Gaussian distribution. Fig. 4 summarizes the results of applying each dynamic monitoring model. As before, the rows from top to bottom show the results of the DM1, DM2 and DM3 models. Moreover, the plots in the left and

right columns correspond to the state variables/principal components and the model residuals, respectively. The abnormal event emerged around 1000 samples (8 h and 20 min) into the data set. The period during which this event arose lasted for around 4 h or after around 1500 samples into the data set. Each of the three monitoring approaches successfully detected this event. For the state variables/principal components, the Kðp~ ys ; pys Þ statistic showing the strongest response. In contrast, n n the T 2θs and the T2 statistics of the DM2 and DM3 approaches showed a less pronounced response. For the residual-based statistics, Kðp~ ε ; pε Þ is most sensitive one, whilst the T 2θε and Q statistics showed a sporadic response. Similar to the application to the gearbox system, this study confirms the increased sensitivity of statistics constructed from the KL divergence to detect incipient fault conditions too.

7. Application to distillation process II This section applies the DM1, DM2 and DM3 approaches to detect an abnormal event of a second distillation process. Different from the other two application examples, the abnormal situation studied here has a considerable effect upon the recorded variables and can, therefore, not be considered to have a minor effect. An initial analysis of the recorded data showed significant time-based correlation, resulting from controller feedback, which was also observed in other studies based on this process (Kruger & Xie, 2012; Kruger et al., 2007; Rajaraman et al., 2006; Schubert et al., 2011). Moreover, applying the Jarque–Bera test (Jarque & Bera, 1987) on the recorded variable set also showed that the recorded variables are non-Gaussian distributed. A detailed introduction of this process is available in Chapter 5 of Kruger and Xie (2012). Table 3 lists the 13 variables used in this study. The process separates butane (C4) from a fresh feed consisting of a mixture of hydrocarbons including butane, pentane (C5) and impurities of propane (C3). The separation is carried out in a unit that consists of 33 trays. An upstream depropanizer unit provides the feed to this debutanizer unit, entering around the middle of the column height. The lighter vaporized components C3 and C4 are liquidized as overhead product. The overhead stream is collected in a reflux drum vessel and the remaining C3 and C5 in the C4 product stream is measured through online analyzers. A second stream is taken from the bottom of the column, which is divided between a feed entering the re-boiler and the bottom product stream. The temperature of the vaporized steam is also

46

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

Fig. 4. Monitoring statistics based on the KL divergence, the statistical local approach and conventional nonnegative quadratic formulations describing impact of abnormal increase in feed.

Table 3 Variable set used in this study. Number

Variable

1 2 3 4 5 6 7 8 9 10 11 12 13

Tray 14 temperature Column overhead pressure Tray 2 temperature Fresh feed temperature Reboiler steam flow Tray 31 temperature Fresh feed flow Reboiler temperature Bottom draw Percentage C3 in C4 Percentage C5 in C4 Top draw Percentage C4 in C5

measured. A detailed description can be found in Schubert et al. (2011). To identify dynamic models, a reference set containing 20086 samples was used, recorded at a sampling interval of 30 s. A fault condition describing a severe drop in feed flow and feed temperature was also recorded from this process. To identify the matrices ^ C^ and K^ of the DM1 model, the application of the AIC criterion A, (Xie et al., 2006) suggested ds ¼15 state variables. The DPCA2 model (Kruger et al., 2007), therefore, included the dy ¼ 13 process and ds ¼ 15 state variables. Finally, the application of parallel analysis yielded an arrangement of three lagged samples of the dy ¼13 process variables for the DM3 model. Applying the Jarque–Bera test to the residuals of the state space model in Eq. (1) confirmed the earlier observations that the process variables have a non-Gaussian distribution. For a window length of 100, the results of applying each dynamic monitoring model are summarized in Fig. 5. As before, the rows from top to bottom show the results of the DM1, DM2 and DM3 models. Moreover, the plots in the left and right columns correspond to the source variables and the model residuals, respectively. The drop in fresh feed/temperature has a considerable and immediate impact upon the recorded variables. Hence, this fault cannot be seen as small or incipiently developing, which was the case for the 10% removal of one tooth in the pinion gear (Section 5) or for the abnormal variation in the setpoint of the reboiler pressure for the distillation unit (Section 6). This time, each method correctly detected this event around 400 samples (3 h

and 20 min) into the data set. In summary, the proposed monitoring approach based on the KL divergence is equally as sensitive in detecting fault conditions if they are significant. The increased sensitivity of this proposed approach is predominantly noticeable for detecting fault condition that has a minor impact upon the recorded process variables and/or develop incipiently.

8. Concluding summary This paper has proposed a monitoring approach for dynamic systems yielding process variables that possess time-based correlation and can be Gaussian or non-Gaussian distributed. The proposed approach relies on the KL divergence, which has been shown to be more sensitive when incorporated into the conventional multivariate statistical control framework. The dynamic model structure utilized here to describe time-based correlation is of a non-causal innovation form. Compared to a dynamic PCA model, proposed in the literature, the state variables describe the correlation instead of a time-lagged arrangement of the process variables and typically have a significantly smaller dimension (Odiowei & Cao, 2010; Treasure et al., 2004). For Gaussian distributed systems, the state space model describes a Gauss–Markov process for which the paper has developed two monitoring statistics, one that corresponds to the correlation encapsulated in the state variables and a second one related to the error variables. If the recorded process variables have a non-Gaussian distribution, the paper proposed to compute the KL divergence by estimating the ratio of two probability density functions using a kernel-based approach. This also yields two monitoring statistics, one related to the non-Gaussian distributed state variables and a second for the non-Gaussian distributed error variables. Computing the KL divergence for process variables that are Gaussian and non-Gaussian distributed, the paper utilizes a moving window approach. The article has shown that incorporating the KL-divergence into a dynamic process monitoring approach yields more sensitive monitoring statistics to detect incipient fault conditions. This has been exemplified using three application studies, involving recorded data from a gearbox system and two distillation units, where the proposed monitoring approach has been contrasted with existing work involving data-driven monitoring models. The fault condition for the gearbox system and one of the distillation processes were incipient and, therefore, showed a minor effect upon the recorded process variables. The monitoring statistics

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

47

Fig. 5. Monitoring statistics based on the KL divergence, the statistical local approach and conventional nonnegative quadratic formulations describing impact of severe drop in fresh feed.

constructed from the KL divergence showed the most significant response to both fault conditions. In contrast, the recorded fault condition for the second distillation unit has a major effect upon the process variables and each of the competitive methods was able to detect the abnormal event.

Acknowledgments The authors wish to acknowledge the financial support from National Natural Science Foundation of China, Grant nos. 61203088, 61134007 and 61374121, the 111 Project from the Ministry of Education, China, Grant no. B07031 and are grateful for funding from the Petroleum Institute, Grant nos. RIFP 14301 and RIFP 15332. References Basseville, M. (2013). Divergence measures for statistical data processing—An annotated bibliography. Signal Processing, 93(4), 621–633. Basseville, M., Abdelghani, M., & Benveniste, A. (2000). Subspace-based fault detection algorithms for vibration monitoring. Automatica, 36(1), 101–109. Baydar, N., & Ball, A. D. (2001). A comparative study of acoustics and vibration signals in detection of gear failures using Wigner–Ville distribution. Mechanical Systems & Signal Processing, 15(6), 1091–1107. Baydar, N., Chen, Q., Ball, A. D., & Kruger, U. (2001). Detection of incipient tooth defect in helical gears using multivariate statistics. Mechanical Systems & Signal Processing, 15(2), 303–321. Bezzaoucha, S., Marx, B., Maquin, D., & Ragot, J. (2014). Finite memory state observer design for polytopic systems—application to actuator fault diagnosis. In Proceedings of IEEE conference on control applications (pp. 97–103). Bittencourt, A. C., Saarinen, K., Sander-Tavallaey, S., Gunnarsson, S., & Norrlof, M. (2014). A data-driven approach to diagnostics of repetitive processes in the distribution domain c applications to gearbox diagnostics in industrial robots and rotating machines. Mechatronics, 24(8), 1032–1041. Chan, A. B. (2008). Beyond dynamic textures: A family of stochastic dynamical models for video with applications to computer vision (Master's thesis). University of California San Diego. Chan, A. B., & Vasconcelos, N. (2004). Efficient computation of the KL divergence between dynamic textures. Technical report. San Diego, USA: University of California. Chen, Q., Kruger, U., & Leung, A. Y. T. (2004). Regularised kernel density estimation for clustered process data. Control Engineering Practice, 12(3), 267–274. Chow, E., & Willsky, A. (1984). Analytical redundancy and the design of robust failure detection systems. IEEE Transactions on Automatic Control, 29, 603–614. Ge, Z., & Song, Z., 2013. Multivariate statistical process control. London, UK: SpringerVerlag. Ge, Z., Xie, L., Kruger, U., Lamont, L., & Wang, S. Q. (2009). Sensor fault identification and isolation for multivariate non-Gaussian processes. Journal of Process Control, 19(10), 1707–1715. Ge, Z., Kruger, U., Lamont, L., Xie, L., & Song, Z. (2010). Fault detection in nonGaussian vibration systems using dynamic statistical-based approaches. Mechanical Systems & Signal Processing, 24(8), 2972–2984.

Goldberger, J., Gordon S., & Greenspan H. (2003). An efficient image similarity measure based on approximations of kl-divergence between two gaussian mixtures. In Proceedings of the ninth IEEE international conference on computer vision (pp. 487–493), 13–16 October 2003, Nice, France. Graton, G., Kratz, F., & Fantini, J. (2014). Finite memory observers for linear timevarying systems: Theory and diagnosis applications. Journal of the Franklin Institute, 351(2), 785–810. Harmouche, J., Delpha, C., & Diallo, D. (2014). Incipient fault detection and diagnosis based on Kullback–Leibler divergence using principal component analysis: Part I. Signal Processing, 94(1), 278–287. Jarque, C. M., & Bera, A. K. (1987). A test for normality of observations and regression residuals. International Statistical Review, 55(2), 1–10. Kanamori, S., Hido, S., & Sugiyama, M. (2009). A least squares approach to direct importance estimation. Journal of Machine Learning Research, 10(7), 1391–1445. Komulainen, T., Sourander, M., & Jamsa-Jounela, S. (2004). An online application of dynamic PLS to a dearomatization process. Computers and Chemical Engineering, 28(12), 2611–2619. Kourti, T., & MacGregor, J. F. (1995). Process analysis, monitoring and diagnosis using multivariate projection methods. Chemometrics & Intelligent Laboratory Systems, 28, 3–21. Kourti, T., & MacGregor, J. F. (1996). Multivariate SPC methods for process and product management. Journal of Quality Technology, 28, 409–428. Kruger, U., & Xie, L. (2012). Statistical monitoring of complex multivariate processes. Chichester, UK: John Wiley & Sons. Kruger, U., Chen, Q., Sandoz, D. J., & McFarlane, R. C. (2001). Extended PLS approach for enhanced condition monitoring of industrial processes. AIChE Journal, 47(9), 2076–2091. Kruger, U., Kumar, S., & Littler, T. (2007). Improved principal component modelling using the local approach. Automatica, 43(9), 1532–1542. Ku, W., Storer, R. H., & Georgakis, C. (1995). Disturbance rejection and isolation by dynamic principal component analysis. Chemometrics & Intelligent Laboratory Systems, 30(1), 179–196. Kullback, S., & Leibler, R. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86. Larimore, W. E. (1990). Canonical variate analysis for system identification, filtering, and adaptive control. In Proceedings of the IEEE conference on decision and control (pp. 635–639). Piscataway, NJ: IEEE Press. Lee, J., & Edgar, T. F. (2002). Subspace identification method for simulation of closed-loop systems with time delay. AIChE Journal, 48(2), 417–420. Lee, J. M., Yoo, C., & Lee, I. B. (2004). Statistical process monitoring of dynamic processes based on independent component analysis. Chemical Engineering Science, 14(7), 2995–3006. Li, W. H., & Shah, S. (2002). Structured residual vector-based approach to sensor fault detection and isolation. Journal of Process Control, 12(3), 429–443. Liu, X., Xie, L., Kruger, U., Littler, T., & Wang, S.-Q. (2008). Statistical-based monitoring of multivariate non-Gaussian systems. AIChE Journal, 54(9), 2379–2391. Liu, S., Yamada, M., Collier, N., & Sugiyama, M. (2013). Change-point detection in time-series data by relative density-ratio estimation. Neural Networks, 43(1), 72–83. Odendaal, H. M., & Jones, T. (2014). Actuator fault detection and isolation: An optimized parity space approach. Control Engineering Practice, 26(5), 222–232. Odiowei, P. P., & Cao, Y. (2010). Nonlinear dynamic process monitoring using canonical variate analysis and kernel density estimators. IEEE Transactions on Industrial Informatics, 6(1), 36–45. Penny, W. D. (2001). KL divergences of normal, gamma, Dirichlet and Wishart distributions. Technical report. UK: University College London.

48

L. Xie et al. / Control Engineering Practice 43 (2015) 39–48

Pranatyasto, T. N., & Qin, S. J. (2001). Sensor validation and process fault diagnosis for FCC units under MPC feedback. Control Engineering Practice, 9(8), 877–888. Qin, S. J. (2006). An overview of subspace identification. Computers & Chemical Engineering, 30(10–12), 1502–1513. Qin, S. J., & Dunia, R. (2000). Determining the number of principal components for best reconstruction. Journal of Process Control, 10(2–3), 245–250. Qin, S. J., & Li, W. (2001). Detection, identification of faulty sensors in dynamic processes. AIChE Journal, 47, 1581–1593. Qin, S. J., & Li, W. H. (2004). Detection and identification of faulty sensors in dynamic processes. AIChE Journal, 47(7), 1581–1593. Rajaraman, S., Kruger, U., Mannan, M. S., & Hahn, J. (2006) A new sensor fault diagnosis technique based upon subspace identification and residual filtering. In Proceedings of 2006 International Conference on Intelligent Computing (pp. 990–998), 16–19 August 2006, Kunming, PR China. Ramirez, J., Segura, J. C., Benitez, C., & de la Torre, A. (2004). A new Kullback–Leibler VAD for speech recognition in noise. IEEE Signal Processing Letters, 11(2), 266–269. Russel, E. L., Chiang, L. H., & Braatz, R. D. (2000). Data-driven techniques for fault detection and diagnosis in chemical processes. Advances in industrial control. London: Springer. Schubert, U., Kruger, U., Arellano-Garcia, H., Feital, T., & Wozny, G. (2011). Unified model-based fault diagnosis for three industrial application studies. Control Engineering Practice, 19(5), 479–490. Schubert, U., Kruger, U., Wozny, G., & Arellano-Garcia, H. (2012). Input reconstruction for statistical-based fault detection and isolation. AIChE Journal, 58(5), 1513–1523.

Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall. Treasure, R. J., Kruger, U., & Cooper, J. E. (2004). Dynamic multivariate statistical process control using subspace identification. Journal of Process Control, 14, 279–292. van Overschee, P., & de Moor, B. (1996). Subspace identification for linear systems. Norwell, USA: Kluwer Academic Publishers. Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., & Yin, K. (2003). A review of process fault detection and diagnosis Part III: Process history based methods. Computers & Chemical Engineering, 27(3), 327–346. Wang, J., & Qin, S. J. (2002). A new subspace identification approach based on principal component analysis. Journal of Process Control, 12(8), 841–855. Wang, Y., Gao, B., & Chen, H. (2015). Data-driven design of parity space-based FDI system for AMT vehicles. IEEE/ASME Transactions on Mechatronics, 20(1), 405–415. Xie, L., Kruger, U., Lieftucht, D., Littler, T., Chen, Q., & Wang, S. Q. (2006). Statistical monitoring of dynamic multivariate processes Part I: Modeling auto-correlation and cross-correlation. Industrial and Engineering Chemistry Research, 45(5), 1659–1676. Yu, S., & Mehta, P. G. (2009). The Kullback–Leibler rate metric for comparing dynamical systems. In Proceedings of the IEEE conference on decision and control, Shanghai, China. Zeng, J. S., Kruger, U., Geluk, J., Wang, X., & Xie, L. (2014). Detecting abnormal situations using the Kullback-Leibler divergence. Automatica, 50(11), 2777–2786.