Fault detection with Conditional Gaussian Network




Engineering Applications of Artificial Intelligence 45 (2015) 473–481

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence journal homepage: www.elsevier.com/locate/engappai

Fault detection with Conditional Gaussian Network
Mohamed Amine Atoui*, Sylvain Verron, Abdessamad Kobi
ISTIA, LARIS EA 7315, L'UNAM, 62, Avenue Notre Dame du Lac, 49000 Angers, France

Article info
Article history: Received 15 May 2014; Received in revised form 11 June 2015; Accepted 27 July 2015; Available online 28 August 2015
Keywords: Fault detection; Statistical inference; PCA; Conditional Gaussian Networks; Tennessee Eastman Process; Hot Forming Process

Abstract
The main aim of this paper is to present a new representation of Principal Component Analysis (PCA) for fault detection as a Conditional Gaussian Network (CGN), a special case of Bayesian networks. PCA and its associated quadratic statistics, such as T2 and SPE, are integrated into a single CGN. The proposed framework projects a new observation into an orthogonal space and gives probabilities on the state of the system. It can do so even when some values in the test sample are missing. This paper also gives the probability thresholds to use in order to match the decisions of the quadratic statistics. The proposed network is validated and compared to the standard PCA scheme for fault detection on the Tennessee Eastman Process and the Hot Forming Process.
© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Nowadays, system failures can lead to serious consequences for people, the environment or equipment, and fixing them can be expensive and even dangerous. To avoid these undesirable situations, it has become essential for modern complex systems to detect any change in nominal operation early, before it becomes critical. To this end, several detection methods have been developed and improved in recent years. These methods can be broadly divided into two principal approaches: model-based methods and data-driven methods. Model-based methods are powerful, efficient and widely used. They rely on an analytical representation of the system (a detailed physical model). However, obtaining this representation for complex, large-scale systems is often impossible or very tricky, and requires a lot of time and money. To deal with this, data-driven methods have received significant attention. Unlike model-based methods, these methods only use measurements taken directly from the system (or their transformations) at different times (historical data). Several data-driven methods for fault detection have been proposed (Yin et al., 2012; Ding, 2012; Qin, 2012; Venkatasubramanian et al., 2003; Chiang et al., 2001). Many of them are based on a rigorous statistical treatment of system data; one can mention the Subspace Aided APproach (SAP), a powerful data-driven tool developed to address the problem of building an accurate physical model for complex systems. Partial Least Squares (PLS), Principal Component Analysis (PCA) and their variants (dynamic, non-linear, kernel and probabilistic) are statistical methods widely used for data reduction and fault detection. PCA is a well-known and powerful data-driven technique, used extensively for fault detection and in many other fields, because models are simple to build and it handles large amounts of data efficiently. In order to determine at any moment whether the system is In Control (IC) or Out of Control (OC), PCA is, according to Ding et al. (2010) and Qin (2003), associated with statistics of quadratic form. These statistics are associated not only with PCA but also with many other data-driven and model-based methods. Among them, two well-known and widely used statistics are the T2 and SPE (Squared Prediction Error) statistics. These two are generally combined to complement each other and thus enhance fault sensitivity. Meanwhile, over the last decades, Bayesian networks (BNs) have also been proposed for fault detection (Yu and Rashid, 2013; Verron et al., 2010a; Huang, 2008; Roychoudhury et al., 2006; Schwall and Gerdes, 2002; Lerner et al., 2000). BNs are powerful tools designed by experts and/or learned from data. They offer a probabilistic/statistical framework able to integrate information from different sources, which may be of interest for fault detection. Indeed, using and fusing all the information available on the system (such as causal influences (e.g. graphical representations of variable dependencies), probabilistic fault detection decisions, maintainability information, component reliability and so on) could provide better decisions. From this perspective, we propose to use a BN to model PCA fault detection techniques.

* Corresponding author. E-mail address: [email protected] (M.A. Atoui).

http://dx.doi.org/10.1016/j.engappai.2015.07.020 0952-1976/© 2015 Elsevier Ltd. All rights reserved.



Nomenclature
0: zero or a vector of zeros, depending on the context
$B_i, \hat{B}_i, \tilde{B}_i$: rows of the eigenvector matrices $P, \hat{P}, \tilde{P}$, respectively
$c_\Delta$: coefficient of variability
e: natural exponential function
E: set of the arcs of a BN
F: Fisher distribution
$g_\alpha$: the normal deviate corresponding to the upper $1-\alpha$ percentile
I: identity matrix
$\mathcal{N}(\mu, \Sigma)$: Gaussian (normal) probability density function (pdf) with mean $\mu$ and covariance matrix $\Sigma$
N: number of samples
$P, A, \hat{P}, \hat{A}, \tilde{P}, \tilde{A}$: eigenvector matrices
p: a probability measure, i.e. a probability distribution or a probability density function; its meaning will be clear from the context
V: set of the nodes of a BN
$x, x_+, x_-, y, t$: multivariate variables
X: normalized set of samples of x
$\hat{X}, \tilde{X}$: the systematic and noise parts of X, respectively
$Z, \hat{Z}, \tilde{Z}$: spaces generated by PCA
$\alpha$: value of the error of the first kind
$\epsilon$: error model
$\Lambda$: set of non-negative real eigenvalues
$\omega$: ratio of p(OC) to p(IC)
$CL_\Delta$: control limit of the quadratic statistic $\Delta$
$\theta_1, \theta_2, \theta_3, h_0$: parameters used for the calculation of $CL_{SPE}$
$\zeta_\Delta^{IC}, \zeta_\Delta^{OC}$: probabilistic control limits given the quadratic statistic $\Delta$ and the states IC, OC
Pa(x): set of parent nodes of x
$A \setminus \{B\}$: A except B
$E[x], \mathrm{cov}[x]$: expected value and covariance of the variable x, respectively

Another important challenge is handling missing observations on-line. The most widely used approaches are based on imputation methods, which try to fill in the missing values. However, these methods are time-consuming and depend strongly on the missing rate of the original sample. Unlike most of the Bayesian networks proposed for fault detection, the proposed network is able to respect a false alarm rate, model the PCA fault detection scheme and automatically handle missing observations without delay or imputation. The main contributions of this paper can be summarized in a few points: (1) a generalized form of the quadratic statistics (e.g. T2, SPE) under a probabilistic tool; (2) a probabilistic framework for fault detection, managing both PCA (systematic and residual subspaces) and the statistics in a single BN using discrete and Gaussian nodes; and (3) probabilities about the system state, which can be provided even when on-line data are missing (a non-imputation method to handle unobserved variables). The remainder of this paper is structured as follows. Section 2 gives a brief description of some definitions and tools needed to develop our proposals. Section 3 describes the development of PCA under CGNs for fault detection. This is followed by a comparison between our proposal and standard PCA on two case studies. Finally, conclusions and outlooks are outlined in the last section.

2. Tools
2.1. Bayesian Networks
2.1.1. Definition
A Bayesian Network (BN) (Jensen and Nielsen, 2007) is a probabilistic graphical model. It consists of the following:
• a directed acyclic graph G = (V, E), where V is the set of vertices of G (nodes) and E is the set of edges of G (arcs),
• a finite probability space $(\Omega, Z, p)$, with $\Omega$ a non-empty space, Z a collection of subsets of $\Omega$, and p a probability measure on Z with $p(\Omega) = 1$,
• a set of random variables $x = x_1, \dots, x_m$ associated with the vertices of the graph G and defined on $(\Omega, Z, p)$, such that:
$$p(x_1, x_2, \dots, x_m) = \prod_{i=1}^{m} p(x_i \mid Pa(x_i)) \tag{1}$$
where $Pa(x_i)$ is the set of the parent nodes of $x_i$ in G,
• a conditional probability table (CPT) associated with each node given its parents, describing the probabilistic dependencies between variables,
• calculations (e.g. based on Bayes' rule), named inference, used when new information (evidence) about one or several nodes of G becomes available, to update the network (e.g. to give the posterior probabilities).

2.1.2. Conditional Gaussian Networks
A particular form of Bayesian network is the Conditional Gaussian Network (CGN). Each node in the network represents a random variable that may be discrete or Gaussian (univariate or multivariate). However, following Lauritzen and Jensen (2001) and Lauritzen (1992), to allow exact computation (inference), discrete nodes are not allowed to have continuous parents; they may only have discrete parents. Thus each Gaussian node, given its Gaussian parents, follows a Gaussian linear regression model (a linear combination of the values of its continuous parents), with parameters depending on its discrete parents. In this paper, we restrict our attention to two kinds of Gaussian nodes. The first is the linear Gaussian node: a Gaussian node y with only Gaussian parents $\Phi_1, \dots, \Phi_d$. Its conditional distribution is given by
$$p(y \mid \Phi_1 = \phi_1, \dots, \Phi_d = \phi_d) = \mathcal{N}(\mu_y + W_1\phi_1 + \dots + W_d\phi_d,\ \Sigma_y) \tag{2}$$
where $\mu_y$ is a parameter governing the mean of y, $\Sigma_y$ is the covariance matrix of y, and $W_1, \dots, W_d$ are the regression coefficients. Note that the joint distribution $p(y, Pa(y))$ is also Gaussian (when the parents are themselves jointly Gaussian). If $\Sigma_y$ is null, then (2) represents a deterministic linear relationship between y and its parents. The second is the conditional linear Gaussian node without Gaussian parents: a Gaussian node y with only discrete parents $Pa(y) = (\Theta_1, \dots, \Theta_d)$. It is Gaussian for each value $k_{Pa(y)}$ of its parents Pa(y), and its conditional distribution can be written as:
$$p(y \mid Pa(y) = k_{Pa(y)}) = \mathcal{N}(\mu_{k_{Pa(y)}}, \Sigma_{k_{Pa(y)}}),\qquad k_{Pa(y)} \in K_{Pa(y)} \tag{3}$$
where $\mu_{k_{Pa(y)}}$ and $\Sigma_{k_{Pa(y)}}$ are respectively the mean and the covariance matrix of y given the values $k_{Pa(y)}$ of its parents, and $K_{Pa(y)}$ is the set of values that the parents of y can take.

2.1.3. Discriminant analysis and CGN
Many Conditional Gaussian Networks can be used to solve discrimination problems between different data classes. Their node set V always includes a discrete node indexing the different


classes. Some of the structures taken by these networks give rise to Discriminant Analysis (DA) (Friedman et al., 1997). DA is a supervised statistical technique used to solve classification problems (see Duda et al., 2001), commonly under the assumption that the classes are normally distributed. Consider a new observation x of the random vector $\mathbf{x} \in \mathbb{R}^m$ and K different classes $C_k$, $k \in \{1, \dots, K\}$. Using Bayes' formula:
$$p(C_k \mid x) = \frac{p(C_k)\,p(x \mid C_k)}{p(x)},\qquad p(x) > 0 \tag{4}$$
DA assigns x to the class $C_{k^*}$ having the maximal a posteriori probability $p(C_k \mid x)$:
$$\delta:\ x \in C_{k^*}\quad \text{if}\quad k^* = \arg\max_{k = 1,\dots,K} \frac{p(C_k)\,p(x \mid C_k)}{p(x)} \tag{5}$$
where $p(C_k)$ represents the prior probability of class $C_k$, $p(x)$ is the normalization factor, which does not affect the decision, and $p(x \mid C_k)$ is the multivariate Gaussian probability density function of x given class $C_k$:
$$p(x \mid C_k) = \frac{1}{(2\pi)^{m/2}\,|\Sigma_{C_k}|^{1/2}}\; e^{-(x-\mu_{C_k})^T \Sigma_{C_k}^{-1} (x-\mu_{C_k})/2} \tag{6}$$


with $\mu_{C_k}$ the mean vector of class $C_k$ and $\Sigma_{C_k}$ the covariance matrix of class $C_k$. These parameters can be estimated by Maximum Likelihood Estimation (MLE) from the available data. The rule given in (5) is called the Maximum A Posteriori (MAP) rule; with the Gaussian densities (6) it corresponds to quadratic discriminant analysis. Many other discrimination rules (e.g. Linear Discriminant Analysis) can be derived from it by making other assumptions on the class covariance matrices. Quadratic DA can be performed with the CGN classifier shown in Fig. 1. This CGN consists of a discrete root node D representing the K classes and a multivariate Gaussian node $\mathbf{x} \in \mathbb{R}^m$, which takes into account the correlations that may exist between the m variables within each class.

Fig. 1. CGN for DA (multivariate form).

2.2. Principal Component Analysis
2.2.1. Definition
Principal Component Analysis (PCA) is a well-known multivariate statistical technique (Ding et al., 2011; Jackson, 2001). Let us consider $X \in \mathbb{R}^{N\times m}$, a normalized set (scaled to zero mean and unit variance) of N collected samples of the system input and output variables $x = [x_1, \dots, x_m]^T$. PCA linearly projects X, the original space, onto an orthogonal space Z called the score space, such that $Z = XP$, where the orthonormal columns of $P \in \mathbb{R}^{m\times m}$ are its axes. This projection separates the original observation space into two parts (subspaces), $X = \hat{X} + \tilde{X}$: its systematic (principal) part $\hat{X} = \hat{Z}\hat{P}^T$, where $\hat{P} \in \mathbb{R}^{m\times a}$, and its noise (residual) part $\tilde{X} = \tilde{Z}\tilde{P}^T$, where $\tilde{P} \in \mathbb{R}^{m\times(m-a)}$. PCA can be carried out as follows:
Step I: Form the covariance matrix of X:
$$\Sigma = \frac{1}{N-1}X^TX$$
Step II: Find the eigenvectors $P_j$ and the eigenvalues $\lambda_j$ of $\Sigma$ using a Singular Value Decomposition (SVD):
$$\Sigma = P\Lambda P^T,\quad P = [P_1 \dots P_m],\ P_j \in \mathbb{R}^m,\quad P^T = [B_1^T \dots B_m^T],\ B_i^T \in \mathbb{R}^m,\quad PP^T = I,$$
$$\Lambda = \mathrm{diag}(\sigma_1^2, \dots, \sigma_m^2),\qquad \sigma_1^2 \ge \dots \ge \sigma_m^2 > 0$$
where $\sigma_j^2 = \lambda_j$ are the non-negative real eigenvalues, corresponding to the variance of X mapped through $P_j$, P is the matrix of the eigenvectors $P_j$, and $B_i$ are the rows of P.
Step III: Determine a (Valle et al., 1999), the number of dominant eigenvectors of $\Sigma$ for which the variance retained under projection is maximal (the principal axes $\hat{P} \subset P$ with the a largest associated eigenvalues $\hat{\Lambda}$), and divide P and $\Lambda$ into two parts:
$$\Lambda = \begin{bmatrix}\hat{\Lambda} & 0\\ 0 & \tilde{\Lambda}\end{bmatrix},\quad \tilde{\Lambda} = \mathrm{diag}(\sigma_{a+1}^2, \dots, \sigma_m^2),\quad P = [\hat{P}\ \tilde{P}] \in \mathbb{R}^{m\times m},\ \hat{P} \in \mathbb{R}^{m\times a},$$
$$P^T = [B_1^T \dots B_m^T],\quad B_i^T = [\hat{B}_i\ \tilde{B}_i]^T,\quad \hat{B}_i^T \in \mathbb{R}^a$$
Step IV: Deduce $\hat{Z}$ and $\tilde{Z}$, such that X can be written as follows:
$$X = \hat{X} + \tilde{X} = \hat{Z}\hat{P}^T + \tilde{Z}\tilde{P}^T,\qquad Z = [\hat{Z}\ \tilde{Z}],$$
$$\hat{Z} = X\hat{P} = [Z_1 \dots Z_a] = [XP_1 \dots XP_a],\ \hat{Z} \in \mathbb{R}^{N\times a};\qquad \tilde{Z} = X\tilde{P} = [Z_{a+1} \dots Z_m] = [XP_{a+1} \dots XP_m],\ \tilde{Z} \in \mathbb{R}^{N\times(m-a)}$$

2.2.2. Fault detection with PCA
Once the principal and residual subspaces are identified, PCA can be used for fault detection. Both subspaces are monitored, since faults may affect one or both of them. In this paper, among several test statistics, we focus on those with a quadratic form, specifically the SPE (Squared Prediction Error) and T2 statistics (the measures most used in the literature). These two statistics, along with their control limits, detect different types of faults, and their advantages can be exploited by employing them together (Chiang et al., 2001). Fault detection using these measures to monitor both PCA subspaces can be done as follows:
• Step I: Set the Control Limits (CLs) for the SPE (Jackson and Mudholkar, 1979) and T2 (Tracy et al., 1992) statistics for a given significance level $\alpha$:
SPE:
$$CL_{SPE} = \theta_1\left(\frac{g_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0(h_0-1)}{\theta_1^2}\right)^{1/h_0},\qquad \theta_i = \sum_{j=a+1}^{m}\sigma_j^{2i},\ i = 1, 2, 3 \tag{7}$$
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2} \tag{8}$$
where $g_\alpha$ is the normal deviate corresponding to the upper $1-\alpha$ percentile;
T2:
$$CL_{T^2} = \frac{a(N^2-1)}{N(N-a)}\,F_\alpha(a, N-a) \tag{9}$$
where $F_\alpha(a, N-a)$ is the upper $\alpha$ critical value of the Fisher distribution with a and N − a degrees of freedom.
• Step II: For each new sample $x = [x_1, \dots, x_m]^T$, normalized with the means and variances of X, compute the two statistics T2 and SPE as follows:
$$T^2 = x^T\hat{P}\hat{\Lambda}^{-1}\hat{P}^Tx = \hat{t}^T\hat{\Lambda}^{-1}\hat{t},\qquad \hat{t} = \hat{P}^Tx \tag{10}$$
$$SPE = x^T\tilde{P}\tilde{P}^Tx = \tilde{t}^T\tilde{t},\qquad \tilde{t} = \tilde{P}^Tx \tag{11}$$
Note that T2 is associated with the principal components $\hat{t} \in \mathbb{R}^a$ and SPE with the last ones, $\tilde{t} \in \mathbb{R}^{m-a}$.
• Step III: Finally, compare the calculated measures to their corresponding control limits and make a decision using the logical rule: if $SPE \le CL_{SPE}$ and $T^2 \le CL_{T^2}$, the system is declared fault-free (IC); otherwise the system is faulty (OC).
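The PCA scheme of Sections 2.2.1 and 2.2.2 can be sketched in a few lines of Python with NumPy and SciPy. This is a minimal illustration under our own assumptions, not the implementation used in this paper: the function names `fit_pca_monitor` and `monitor` are hypothetical, the number of components a is passed in rather than selected by the criterion of Valle et al. (1999), `np.linalg.eigh` replaces the SVD of Step II (for the symmetric matrix $\Sigma$ both give the same eigenvectors up to sign), and $0 < a < m$ is assumed so that $\theta_1, \theta_2, \theta_3$ are non-zero.

```python
import numpy as np
from scipy import stats


def fit_pca_monitor(X_train, a, alpha=0.01):
    """Fit the PCA model (Steps I-IV) and the control limits (7)-(9).

    X_train: (N, m) fault-free data, normalized here with its own column
    means and standard deviations. a: retained components (0 < a < m).
    """
    mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    X = (X_train - mean) / std
    N, m = X.shape
    sigma = X.T @ X / (N - 1)                  # Step I: covariance matrix
    eigval, P = np.linalg.eigh(sigma)          # Step II: eigendecomposition
    order = np.argsort(eigval)[::-1]           # decreasing eigenvalues
    eigval, P = eigval[order], P[:, order]
    P_hat, P_tilde = P[:, :a], P[:, a:]        # Step III: split the axes
    lam_hat, lam_tilde = eigval[:a], eigval[a:]

    # SPE limit, eqs (7)-(8)
    th1, th2, th3 = (np.sum(lam_tilde ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    g = stats.norm.ppf(1.0 - alpha)            # upper normal deviate
    cl_spe = th1 * (g * np.sqrt(2.0 * th2 * h0 ** 2) / th1
                    + 1.0 + th2 * h0 * (h0 - 1.0) / th1 ** 2) ** (1.0 / h0)

    # T2 limit, eq (9): upper alpha critical value of F(a, N - a)
    cl_t2 = a * (N ** 2 - 1) / (N * (N - a)) * stats.f.ppf(1.0 - alpha, a, N - a)
    return dict(mean=mean, std=std, P_hat=P_hat, P_tilde=P_tilde,
                lam_hat=lam_hat, cl_spe=cl_spe, cl_t2=cl_t2)


def monitor(model, x_raw):
    """Compute T2 (10) and SPE (11) for one raw sample and apply Step III."""
    x = (x_raw - model["mean"]) / model["std"]
    t_hat = model["P_hat"].T @ x               # principal scores
    t_tilde = model["P_tilde"].T @ x           # residual scores
    t2 = float(t_hat @ (t_hat / model["lam_hat"]))
    spe = float(t_tilde @ t_tilde)
    in_control = t2 <= model["cl_t2"] and spe <= model["cl_spe"]
    return t2, spe, "IC" if in_control else "OC"
```

For example, after `model = fit_pca_monitor(X_train, a=2)` on fault-free data, `monitor(model, x_new)` returns the two statistics together with the IC/OC label of Step III. Keeping the normalization parameters inside the model ensures that new samples are scaled with the means and variances of X, as Step II requires.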

2.2.3. Gaussian latent variable model and PCA
Based on a linear Gaussian model (a probabilistic, generative model), Tipping and Bishop (1999) proposed Probabilistic Principal Component Analysis (PPCA). It is a special case of statistical factor analysis and a generalization of traditional (projective) PCA (Kim and Lee, 2003). Under such a model, the m system variables (measurements) are considered as a linear combination of $a < m$ mutually uncorrelated latent variables plus additive noise. The following steps yield the same systematic part $\hat{X}$ of X as traditional PCA.
• Step I: Consider the following probabilistic input-output model, in which all the marginal and conditional distributions are Gaussian:
$$x = \hat{A}\hat{t} + \epsilon,\quad x \in \mathbb{R}^m,\ \hat{A} \in \mathbb{R}^{m\times a},\ \hat{t} \in \mathbb{R}^a;\qquad \hat{t} \sim \mathcal{N}(0, I),\quad \epsilon \sim \mathcal{N}(0, vI),\quad x \mid \hat{t} \sim \mathcal{N}(\hat{A}\hat{t}, vI),\quad v \to 0 \tag{12}$$
where I is the identity matrix, v is a scalar, and $\hat{t}$ and $\epsilon$ are independent random variables; the latent variable $\hat{t}$ is assumed to explain all the systematic variability of x. Note that, according to Bishop et al. (2006), there is no loss of generality in assuming a zero-mean, unit-covariance Gaussian for the latent variable $\hat{t}$, because a more general Gaussian distribution gives rise to an equivalent probabilistic model.
• Step II: Given X with row vectors $(x^n)^T = [x^n_1, \dots, x^n_m]$, find a and the matrix $\hat{A}$ equivalent to the matrix $\hat{P}$ obtained by PCA, by means of, e.g., Steps II and III of PCA above.
• Step III: Based on (12), deduce $\hat{t}^n$ from the posterior distribution $p(\hat{t} \mid x = x^n)$, given by:
$$p(\hat{t} \mid x = x^n) = p(\hat{t} \mid x_1 = x^n_1, \dots, x_m = x^n_m) = \mathcal{N}\big(M\hat{A}^T[x^n_1, \dots, x^n_m]^T,\ vM\big),\qquad M = (vI + \hat{A}^T\hat{A})^{-1} \tag{13}$$
Unlike the posterior covariance, the posterior mean of $p(\hat{t} \mid x = x^n)$ depends on x; thus:
$$\hat{t}^n = E[\hat{t} \mid x = x^n] = M\hat{A}^T[x^n_1, \dots, x^n_m]^T \tag{14}$$
• Step IV: Given $\hat{t}^n$, calculate $\hat{x}^n$ (the nth column of $\hat{X}^T$), with $n \in \{1, \dots, N\}$, as follows:
$$\hat{x}^n = E[x \mid \hat{t} = \hat{t}^n] = \big[E[x_1 \mid \hat{t} = \hat{t}^n] \dots E[x_m \mid \hat{t} = \hat{t}^n]\big]^T = [\hat{B}_1\hat{t}^n, \dots, \hat{B}_m\hat{t}^n]^T = \hat{A}\hat{t}^n \tag{15}$$
Representing PCA by a probabilistic model in this way offers many advantages (Tipping and Bishop, 1999), such as assigning a probability density to each variable. It can be used for fault detection: it suffices to associate it with one or more quadratic statistics. However, this is not optimal, in the sense that it chains steps of different natures. In the next section, we show that the PCA fault detection scheme can be carried out in a unified probabilistic way, within a single tool (a CGN).

3. The proposed probabilistic framework
In this section, we propose original CGNs for fault detection. Under these networks, we simultaneously handle PCA and the quadratic statistics that come with it. For clarity, we first introduce PCA under a CGN, then propose a probabilistic framework for statistics such as T2 and SPE, and finally give the proposed CGNs for fault detection. These CGNs can be used as an alternative to the PCA scheme for fault detection. Note, however, that like PCA they may be suitable for some applications and not for others (e.g. dynamic data, nonlinear data and so on).

3.1. PCA and CGN
PCA, as discussed previously, can be thought of as a generative model in which all the variables are Gaussian. This model can be represented by a simple, illustrative graphical model: a BN. Indeed, it is clear from Section 2.1.2 that (12) can easily be implemented as a CGN. Fig. 2 represents a linear Gaussian node x and its Gaussian parent $\hat{t}$, which naturally automate and generalize the probabilistic calculations employed in Section 2.2.3. In this graph, x, as in (12), corresponds to an m-dimensional observed variable that follows a conditional Gaussian distribution $\mathcal{N}(\hat{A}\hat{t}, vI)$, where $v \to 0$, $\hat{A}$ is the regression matrix corresponding to the eigenvector matrix $\hat{P}$, and $\hat{t}$ corresponds to an a-dimensional score variable.

Fig. 2. PCA under a CGN.

However, although this network can be used for data reduction like standard PCA, it does not reconstruct x entirely. Indeed, it only handles the systematic part of PCA (see Section 2.2.1) and does not handle the last m − a components $\tilde{t}$ of x. To make the separation of the original space into two subspaces possible, another Gaussian node $\tilde{t}$, with the same prior distribution as $\hat{t}$ and representing an (m − a)-dimensional variable, is added. The new structure of the network, shown in Fig. 3, represents a probabilistic model that can be written as follows:
$$x = \hat{A}\hat{t} + \tilde{A}\tilde{t} + \epsilon \tag{16}$$
with $x \in \mathbb{R}^m$, $\hat{t} \in \mathbb{R}^a$, $\tilde{t} \in \mathbb{R}^{m-a}$, $\hat{A} = \hat{P}$, $\tilde{A} = \tilde{P}$, $\epsilon \in \mathbb{R}^m$, $\epsilon \sim \mathcal{N}(0, vI)$, $v \to 0$, $\hat{t} \sim \mathcal{N}(0, I)$, $\tilde{t} \sim \mathcal{N}(0, I)$ and $x \mid \hat{t}, \tilde{t} \sim \mathcal{N}(\hat{A}\hat{t} + \tilde{A}\tilde{t}, vI)$, where $\tilde{t}$ and $\hat{t}$ are independent.

Fig. 3. PCA under a CGN (multivariate representation).

The marginal distribution p(x) of the observed variable x can be expressed, from the sum and product rules of probability, as:
$$p(x) = \iint p(x, \hat{t}, \tilde{t})\,d\hat{t}\,d\tilde{t} = \int p(\hat{t}) \int p(x \mid \hat{t}, \tilde{t})\,p(\tilde{t})\,d\tilde{t}\,d\hat{t} \tag{17}$$
Because (17) corresponds to a linear Gaussian model, p(x) follows a Gaussian distribution, given by:
$$p(x) = \mathcal{N}(\mu_x, \Sigma_x) \tag{18}$$
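To make Steps III and IV of Section 2.2.3 and the conditioning formulas used for $p(\hat{t} \mid x)$ concrete, the following sketch computes the posterior of $\hat{t}$ in two ways: with the PPCA expressions (13) and (14) for model (12), and by Gaussian conditioning on the joint variable $[\hat{t}\ x]^T$ under model (16). It assumes, as in (16), that $\hat{A} = \hat{P}$ and $\tilde{A} = \tilde{P}$ are the two blocks of one orthonormal matrix and that the latent priors have unit covariance; in that case $\mathrm{cov}[x] = (1+v)I$, the two computations coincide, and as $v \to 0$ the posterior mean tends to the PCA scores $\hat{P}^Tx$. The function names are illustrative only.

```python
import numpy as np


def ppca_posterior(A_hat, x, v):
    """Posterior of t_hat under model (12): mean as in (14), covariance v*M as in (13)."""
    a = A_hat.shape[1]
    M = np.linalg.inv(v * np.eye(a) + A_hat.T @ A_hat)
    return M @ A_hat.T @ x, v * M


def posterior_by_conditioning(A_hat, A_tilde, x, v):
    """Posterior of t_hat under model (16) via Gaussian conditioning.

    With zero-mean, unit-covariance latents: E[t_hat] = 0, E[x] = 0,
    cov[x] = A_hat A_hat^T + A_tilde A_tilde^T + v I and cov[t_hat, x] = A_hat^T.
    """
    m, a = A_hat.shape
    S_xx = A_hat @ A_hat.T + A_tilde @ A_tilde.T + v * np.eye(m)
    S_tx = A_hat.T
    mean = S_tx @ np.linalg.solve(S_xx, x)
    cov = np.eye(a) - S_tx @ np.linalg.solve(S_xx, S_tx.T)
    return mean, cov


def reconstruct(A_hat, t_hat):
    """Systematic part of x, as in (15)."""
    return A_hat @ t_hat
```

The closed form (13) is cheaper, whereas the conditioning route is the one a CGN inference engine follows generically, which is why it extends to the richer networks of the following subsections.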


The parameters $\mu_x$ and $\Sigma_x$ can be obtained by completing the square, a well-known technique for manipulating Gaussians, or more directly from (16):
$$\mu_x = E[x] = \hat{A}E[\hat{t}] + \tilde{A}E[\tilde{t}] + E[\epsilon],\qquad \Sigma_x = \mathrm{cov}[x] = \hat{A}E[\hat{t}\hat{t}^T]\hat{A}^T + \tilde{A}E[\tilde{t}\tilde{t}^T]\tilde{A}^T + E[\epsilon\epsilon^T]$$
Also, the conditional probability distribution $p(\hat{t} \mid x)$ of the hidden node $\hat{t}$ can be obtained as follows:
$$p(\hat{t} \mid x) = \frac{p(\hat{t})\int p(x \mid \hat{t}, \tilde{t})\,p(\tilde{t})\,d\tilde{t}}{p(x)} = \frac{p(\hat{t})\,p(x \mid \hat{t})}{p(x)} = \mathcal{N}(\mu_{\hat{t}\mid x}, \Sigma_{\hat{t}\mid x}) \tag{19}$$
with
$$\mu_{\hat{t}\mid x} = E[\hat{t} \mid x] = E[\hat{t}] + \Sigma_{\hat{t}x}\Sigma_{xx}^{-1}(x - \mu_x),\qquad \Sigma_{\hat{t}\mid x} = \mathrm{cov}[\hat{t} \mid x] = \mathrm{cov}[\hat{t}] - \Sigma_{\hat{t}x}\Sigma_{xx}^{-1}\Sigma_{x\hat{t}}$$
where the matrices $\Sigma_{xx}$, $\Sigma_{x\hat{t}}$ and $\Sigma_{\hat{t}x}$ are blocks of the covariance matrix of the joint variable $[\hat{t}\ x]^T$. Thus, for a given observation $x^n$, $\hat{t}^n = E[\hat{t} \mid x = x^n]$. Furthermore, given the assumption made by the probabilistic model (16), where the elements $x_i$, $i \in \{1, \dots, m\}$, of x are conditionally independent given $\hat{t}$ and $\tilde{t}$ (the elements $\epsilon_i$ of $\epsilon$ are mutually independent), the network given in Fig. 4 and the one presented previously in Fig. 3 are strictly equivalent. They differ only in the representation of x (Fig. 4: univariate representation; Fig. 3: multivariate representation). Each Gaussian node $x_i$ in Fig. 4 represents the following probabilistic model:
$$x_i = \hat{B}_i\hat{t} + \tilde{B}_i\tilde{t} + \epsilon_i \tag{20}$$
where $x_i \in \mathbb{R}$, $\hat{t} \in \mathbb{R}^a$, $\tilde{t} \in \mathbb{R}^{m-a}$, $\epsilon_i \in \mathbb{R}$, $\epsilon_i \sim \mathcal{N}(0, v)$, $v \to 0$, $\hat{B}_i \in \mathbb{R}^{1\times a}$, $i \in \{1, \dots, m\}$, $\hat{t} \sim \mathcal{N}(0, I)$, $\tilde{t} \sim \mathcal{N}(0, I)$ and $x_i \mid \hat{t}, \tilde{t} \sim \mathcal{N}(\hat{B}_i\hat{t} + \tilde{B}_i\tilde{t}, v)$.

Fig. 4. PCA under a CGN (univariate representation).

From Fig. 3, we can deduce the marginal distribution $p(x_i)$ of each observed variable $x_i$ as follows:
$$p(x_i) = \int p(\hat{t})\int p(\tilde{t})\int p(x_i \mid \hat{t}, \tilde{t})\prod_{j\neq i} p(x_j \mid \hat{t}, \tilde{t})\,d(x\setminus\{x_i\})\,d\tilde{t}\,d\hat{t} = \int p(\hat{t})\int p(\tilde{t})\,p(x_i \mid \hat{t}, \tilde{t})\,d\tilde{t}\,d\hat{t} \tag{21}$$
where $\int \prod_{j\neq i} p(x_j \mid \hat{t}, \tilde{t})\,d(x\setminus\{x_i\}) = 1$ and $x\setminus\{x_i\}$ denotes all the variables $x_1, \dots, x_m$ except $x_i$. The marginal probability distribution $p(x_i)$ of each $x_i$ is Gaussian, of the form:
$$p(x_i) = \mathcal{N}(\mu_{x_i}, \Sigma_{x_i}) \tag{22}$$
where its parameters can be deduced from (20) as follows:
$$\mu_{x_i} = E[x_i] = \hat{B}_iE[\hat{t}] + \tilde{B}_iE[\tilde{t}] + E[\epsilon_i],\qquad \Sigma_{x_i} = \mathrm{cov}(x_i) = \hat{B}_iE[\hat{t}\hat{t}^T]\hat{B}_i^T + \tilde{B}_iE[\tilde{t}\tilde{t}^T]\tilde{B}_i^T + E[\epsilon_i^2]$$
The univariate representation can be a better choice when some observations are not available. Indeed, it handles unobserved variables in a very natural way, by considering their corresponding nodes as hidden. In other words, inference (calculation) is performed by marginalizing over the unobserved variables. Based on this, we shall see later that, unlike the PCA fault detection scheme, the proposed tool for fault detection can answer a query when some observations are missing.

3.2. Statistics such as T2 and SPE under a CGN
In this subsection, we propose a probabilistic framework to handle multivariate/univariate statistics (control charts), e.g. SPE, T2 and so on. These quadratic statistics (grouped here under the common notation $\Delta$) are widely used for fault detection, and they are also associated with PCA (Ding et al., 2010). For a new observation x of a multivariate random variable $\mathbf{x}$, they are first calculated and then compared to their predefined threshold (the control limit $CL_\Delta$ for a given false alarm rate $\alpha$) to discriminate between the two states IC and OC. If the statistic $\Delta$ is greater than $CL_\Delta$, the system is declared Out of Control (OC); otherwise the system is In Control (IC). Based on Conditional Gaussian Network classifiers (which discriminate between two classes, IC and OC) having the same structure as the network shown in Fig. 1, we want a probabilistic representation of these statistics. To do so, given a quadratic statistic $\Delta$, we need to define its corresponding network parameters and a probabilistic control limit, e.g. $\zeta_\Delta^{OC}$, such that if $p(OC \mid \mathbf{x} = x) \ge \zeta_\Delta^{OC}$ the system is declared OC, where $p(OC \mid \mathbf{x} = x) = 1 - p(IC \mid \mathbf{x} = x)$ is the posterior probability of class OC given x. The parameters of class IC (the multivariate Gaussian distribution $p(x \mid IC)$) are estimated, if unknown, from the available fault-free data by means of, e.g., MLE. As in Verron et al. (2010b), the class OC (the multivariate Gaussian distribution $p(x \mid OC)$) is considered as a virtual class representing the set of observations that cannot be attributed to the fault-free class IC. Its parameters are defined such that $\mu_{OC} = \mu_{IC}$ and $\Sigma_{OC}$ expresses more variability than $\Sigma_{IC}$: $\Sigma_{OC} = c_\Delta\,\Sigma_{IC}$, where $c_\Delta > 1$ (if $c_\Delta = 1$, the two classes would be identical, which does not make sense). For simplicity, we write $\mu_\Delta = \mu_{IC}$ and $\Sigma_\Delta = \Sigma_{IC}$. The networks corresponding to the statistics T2 and SPE are illustrated in Figs. 5 and 6, respectively.

Fig. 5. T2 under CGN.

Fig. 6. SPE under CGN.
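The two-class classifier just described (structure of Fig. 1, with a virtual OC class whose covariance is $c_\Delta\Sigma_{IC}$) can be sketched as below. This is an illustrative sketch with a hypothetical function name; it evaluates Bayes' rule in the log domain to avoid numerical underflow far from $\mu_\Delta$. Because both class densities share the mean $\mu_\Delta$, the posterior depends on x only through its squared Mahalanobis distance to $\mu_\Delta$, which is what makes a probabilistic control limit possible.

```python
import numpy as np
from scipy.stats import multivariate_normal


def posterior_oc(x, mu, sigma, c, p_oc=0.5):
    """Posterior p(OC | x) of the two-class CGN classifier of Section 3.2.

    IC ~ N(mu, sigma); the virtual class OC ~ N(mu, c * sigma) with c > 1;
    p_oc is the prior p(OC) and p(IC) = 1 - p_oc.
    """
    log_ic = np.log1p(-p_oc) + multivariate_normal(mean=mu, cov=sigma).logpdf(x)
    log_oc = np.log(p_oc) + multivariate_normal(mean=mu, cov=c * sigma).logpdf(x)
    # Bayes' rule, normalized in the log domain to avoid underflow
    return float(np.exp(log_oc - np.logaddexp(log_ic, log_oc)))
```

For instance, with $\Sigma_{IC} = \mathrm{diag}(1, 4)$ the points (2, 0) and (0, 4) lie at the same squared Mahalanobis distance (4) and therefore receive the same posterior, while points further out receive a larger $p(OC \mid x)$.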

Note that this difference between the two classes ($c_\Delta$ making the variance of OC larger than that of IC) is commonly used in the literature for fault detection with Bayesian networks (as in Kawahara et al., 2005; Schwall and Gerdes, 2002; Lerner et al., 2000), where the class (IC or OC) with the maximum posterior probability is chosen. However, the value of $c_\Delta$ is then determined using fault data, which are generally unavailable or insufficient to estimate it accurately while avoiding increased false alarm and/or missed detection rates. In what follows, instead of focusing on the exact value of $c_\Delta$, we seek probabilistic control limits $\zeta_\Delta^{IC}$ or $\zeta_\Delta^{OC}$, given a $c_\Delta > 1$, such that we keep the following decision rule:
$$x \in IC:\ \text{if } \Delta \le CL_\Delta \tag{23}$$
under one of these decision rules:
$$x \in IC:\ \text{if } p(IC \mid x) \ge \zeta_\Delta^{IC} \tag{24}$$
$$x \in IC:\ \text{if } p(OC \mid x) < \zeta_\Delta^{OC} \tag{25}$$
Consider (24) for a given x such that:
$$p(IC \mid x) = \zeta_\Delta^{IC} \tag{26}$$
$$p(IC \mid x) = \zeta_\Delta^{IC}\,[p(IC \mid x) + p(OC \mid x)] \tag{27}$$
where $\zeta_\Delta^{OC} = 1 - \zeta_\Delta^{IC}$ and $p(IC \mid x) + p(OC \mid x) = 1$. Using Bayes' formula, the a posteriori probability of each class (IC or OC) can be written as:
$$p(D \mid \mathbf{x} = x) = \frac{p(D)\,p(x \mid D)}{p(x)},\qquad D \in \{IC, OC\} \tag{28}$$
From (26) and (28) we obtain
$$\frac{p(IC)\,p(x \mid IC)}{p(x)} = \zeta_\Delta^{IC}\,\frac{p(IC)\,p(x \mid IC) + p(OC)\,p(x \mid OC)}{p(x)} \tag{29}$$
$$\zeta_\Delta^{IC}\,p(OC)\,p(x \mid OC) = p(IC)\,[p(x \mid IC) - \zeta_\Delta^{IC}\,p(x \mid IC)] \tag{30}$$
Let $\omega = p(OC)/p(IC)$; then
$$\zeta_\Delta^{IC}\,\omega\,p(x \mid OC) = p(x \mid IC) - \zeta_\Delta^{IC}\,p(x \mid IC)$$
$$p(x \mid IC) = \zeta_\Delta^{IC}\,[p(x \mid IC) + \omega\,p(x \mid OC)]$$
$$\zeta_\Delta^{IC} = \frac{p(x \mid IC)}{p(x \mid IC) + \omega\,p(x \mid OC)} \tag{31}$$
As in quadratic discriminant analysis, each class, IC and OC, follows a Gaussian distribution. The conditional densities of these two classes can be written as in (32) and (33), where m is the dimension of the multivariate Gaussian distributions:
$$p(x \mid IC) = \frac{1}{(2\pi)^{m/2}\,|\Sigma_\Delta|^{1/2}}\; e^{-(x-\mu_\Delta)^T\Sigma_\Delta^{-1}(x-\mu_\Delta)/2} \tag{32}$$
$$p(x \mid OC) = \frac{1}{(2\pi)^{m/2}\,|\Sigma_\Delta|^{1/2}\,c_\Delta^{m/2}}\; e^{-(x-\mu_\Delta)^T\Sigma_\Delta^{-1}(x-\mu_\Delta)/(2c_\Delta)} \tag{33}$$

Let $CL_\Delta$ be the squared Mahalanobis form of x when $p(IC \mid \mathbf{x} = x) = \zeta_\Delta^{IC}$, i.e. $CL_\Delta = (x-\mu_\Delta)^T\Sigma_\Delta^{-1}(x-\mu_\Delta)$, which corresponds to the control limit of the statistic $\Delta$ concerned (e.g. for T2 and SPE see (9) and (7), respectively). Substituting (32) and (33) into (31), we then have
$$\zeta_\Delta^{IC} = \frac{\dfrac{1}{(2\pi)^{m/2}|\Sigma_\Delta|^{1/2}}e^{-CL_\Delta/2}}{\dfrac{1}{(2\pi)^{m/2}|\Sigma_\Delta|^{1/2}}e^{-CL_\Delta/2} + \omega\,\dfrac{1}{(2\pi)^{m/2}|\Sigma_\Delta|^{1/2}c_\Delta^{m/2}}e^{-CL_\Delta/(2c_\Delta)}} = \frac{c_\Delta^{m/2}e^{-CL_\Delta/2}}{c_\Delta^{m/2}e^{-CL_\Delta/2} + \omega\,e^{-CL_\Delta/(2c_\Delta)}} = \frac{1}{1 + \omega\,c_\Delta^{-m/2}\,e^{((c_\Delta-1)/(2c_\Delta))CL_\Delta}}$$
Let $\gamma = \omega\,c_\Delta^{-m/2}\,e^{((c_\Delta-1)/(2c_\Delta))CL_\Delta}$; finally we obtain
$$\zeta_\Delta^{IC} = \frac{1}{1+\gamma}\quad\text{and}\quad \zeta_\Delta^{OC} = 1 - \zeta_\Delta^{IC} \tag{34}$$
Since, for $c_\Delta > 1$, the ratio $p(x \mid OC)/p(x \mid IC)$ increases with the squared Mahalanobis distance of x, $p(IC \mid x)$ decreases with it, so rule (24) with this $\zeta_\Delta^{IC}$ reproduces rule (23). Based on (34), with $c_\Delta > 1$, we are able to model a quadratic statistic $\Delta$ in a CGN.

Fig. 7. PCA for fault detection under CGN: Multivariate form.

3.3. CGN for fault detection
Previously, we demonstrated that PCA and quadratic statistics such as T2 and SPE can each be implemented under a CGN. For fault detection, we propose to join them in a single CGN (probabilistic framework), as shown in Figs. 7 and 8, where the nodes T2 and SPE are discrete nodes, each with two states (IC and OC). In the proposed networks, the nodes $\hat{t}$ and $\tilde{t}$ are henceforth children of the discrete nodes T2 and SPE, respectively, and are thus Gaussian for each of their parent's values, IC and OC. In order to reproduce the decisions made by the two quadratic statistics T2 and SPE, which monitor the latent variables in the PCA fault detection scheme, we define the Gaussians associated with the nodes $\hat{t}$ and $\tilde{t}$ so that they have zero means and different covariance matrices (the Gaussian distribution associated with the state OC expresses more variability, through the coefficient $c_\Delta$, than the one associated with IC). The covariance matrices of the Gaussians corresponding to the state IC of the nodes T2 and SPE are set equal to $\hat{\Lambda}$ (see (10)) and I (see (11)), respectively. We recall that the matrix $\hat{\Lambda}$ is obtained using only fault-free data. Concerning the other nodes of the proposed networks, the discrete nodes T2 and SPE (decision nodes with two states, IC and OC) are each defined by the prior probabilities of their states. Finally, the remaining continuous nodes follow a Gaussian linear regression model given their Gaussian latent parents $\hat{t}$ and $\tilde{t}$. In Figs. 7 and 8, we give the conditional probability table defined for each node. At each moment, the proposed probabilistic graphical models can thus make a decision about the state of a given system (in control or out of control) equivalently to the PCA fault detection scheme. Indeed, once the parameters of their nodes are defined, it is enough to calculate the posterior probability of one state of each discrete node and compare it to its corresponding probabilistic control limit given by (34). The posterior probabilities of the states of node T2 can be deduced from the network in Fig. 7 (in the same way for node SPE), given a new normalized observation x of the multivariate variable (node) $\mathbf{x}$, as follows:
$$p(T^2 = IC \mid \mathbf{x} = x) = \eta\,p(T^2 = IC)\,p(x \mid T^2 = IC) = \eta\,p(T^2 = IC)\sum_{SPE} p(SPE)\,p(x \mid T^2 = IC, SPE) \tag{35}$$
where $\eta = 1/\sum_{T^2} p(x \mid T^2)\,p(T^2)$ and $p(T^2 = OC \mid \mathbf{x} = x) = 1 - p(T^2 = IC \mid \mathbf{x} = x)$.
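The decision step just described compares a posterior probability with the limit (34). The following sketch, with hypothetical function names, computes $\zeta_\Delta^{IC}$ and $\zeta_\Delta^{OC}$ from (34) and, for comparison, the posterior $p(IC \mid x)$ of (28), (32) and (33) written as a function of the squared Mahalanobis distance $\delta$. It illustrates that $p(IC \mid x) \ge \zeta_\Delta^{IC}$ holds exactly when $\delta \le CL_\Delta$. The argument `dim` is the dimension m of (32) and (33); for the T2 and SPE nodes of Figs. 7 and 8 we assume it is a and m − a, respectively, since those nodes govern $\hat{t}$ and $\tilde{t}$.

```python
import numpy as np


def probabilistic_limits(cl, dim, c, p_ic=0.5):
    """Probabilistic control limits of (34).

    cl: classical control limit CL_Delta (e.g. from (7)-(9));
    dim: dimension m of the Gaussians in (32)-(33); c: c_Delta > 1;
    p_ic: prior probability p(IC), so that omega = p(OC) / p(IC).
    """
    omega = (1.0 - p_ic) / p_ic
    gamma = omega * c ** (-dim / 2.0) * np.exp((c - 1.0) / (2.0 * c) * cl)
    zeta_ic = 1.0 / (1.0 + gamma)
    return zeta_ic, 1.0 - zeta_ic


def posterior_ic(delta, dim, c, p_ic=0.5):
    """p(IC | x) from (28), (32) and (33), as a function of the squared
    Mahalanobis distance delta = (x - mu)^T Sigma^{-1} (x - mu)."""
    omega = (1.0 - p_ic) / p_ic
    # Likelihood ratio p(x | OC) / p(x | IC); the common factors cancel
    ratio = c ** (-dim / 2.0) * np.exp(delta / 2.0 - delta / (2.0 * c))
    return 1.0 / (1.0 + omega * ratio)
```

Note that $\zeta_\Delta^{IC}$ can become very small for large $CL_\Delta$ or $c_\Delta$, so in practice comparing log-probabilities may be numerically safer; this is a design remark on the sketch, not part of the method.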

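The parameters of $p(x\mid T^2=\mathrm{IC},\mathrm{SPE})$ for each value (IC, OC) of the node SPE can be obtained as follows:

$$p(x\mid T^2=\mathrm{IC},\mathrm{SPE}=\mathrm{IC}) = \int p(\hat t\mid T^2=\mathrm{IC}) \int p(x\mid\hat t,\tilde t)\, p(\tilde t\mid\mathrm{SPE}=\mathrm{IC})\, d\tilde t\, d\hat t = \mathcal{N}\big(0,\; vI + \tilde A\tilde A^T + \hat A\hat\Lambda\hat A^T\big) \tag{36}$$

and, similarly,

$$p(x\mid T^2=\mathrm{IC},\mathrm{SPE}=\mathrm{OC}) = \mathcal{N}\big(0,\; vI + \tilde A(c_{\mathrm{SPE}}I)\tilde A^T + \hat A\hat\Lambda\hat A^T\big).$$

The networks presented in Figs. 7 and 8 give the same results even though their structures differ. Consider the new normalized observations $x_1,\dots,x_m$ of the univariate variables (nodes) $\mathbf{x}_1,\dots,\mathbf{x}_m$. In the network of Fig. 8, the posterior probabilities of the states of the node $T^2$ are given below; those of the node SPE can be deduced in a similar way.

$$\begin{aligned}
p(T^2=\mathrm{IC}\mid\mathbf{x}_1=x_1,\dots,\mathbf{x}_m=x_m) &= \eta\, p(T^2=\mathrm{IC}) \prod_{i=1}^{m} p(x_i\mid T^2=\mathrm{IC}) \\
&= \eta\, p(T^2=\mathrm{IC}) \sum_{\mathrm{SPE}} p(\mathrm{SPE}) \prod_{i=1}^{m} p(x_i\mid T^2=\mathrm{IC},\mathrm{SPE}),
\end{aligned}\tag{37}$$

where $\eta = 1/\sum_{T^2} p(x_1,\dots,x_m\mid T^2)\,p(T^2)$ and $p(T^2=\mathrm{OC}\mid\mathbf{x}_1=x_1,\dots,\mathbf{x}_m=x_m) = 1 - p(T^2=\mathrm{IC}\mid\mathbf{x}_1=x_1,\dots,\mathbf{x}_m=x_m)$. For each state of SPE, the parameters of $p(x_i\mid T^2=\mathrm{IC},\mathrm{SPE})$ are given by

$$p(x_i\mid T^2=\mathrm{IC},\mathrm{SPE}=\mathrm{IC}) = \int p(\hat t\mid T^2=\mathrm{IC}) \int p(x_i\mid\hat t,\tilde t)\, p(\tilde t\mid\mathrm{SPE}=\mathrm{IC})\, d\tilde t\, d\hat t = \mathcal{N}\big(0,\; v + \tilde B_i\tilde B_i^T + \hat B_i\hat\Lambda\hat B_i^T\big) \tag{38}$$

and

$$p(x_i\mid T^2=\mathrm{IC},\mathrm{SPE}=\mathrm{OC}) = \mathcal{N}\big(0,\; v + \tilde B_i(c_{\mathrm{SPE}}I)\tilde B_i^T + \hat B_i\hat\Lambda\hat B_i^T\big).$$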
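As a rough illustration of how the multivariate posterior (35) follows from the covariances of (36), here is a hand-rolled NumPy sketch. The paper performs inference with BNT; this code does not use any toolbox and instead evaluates the Gaussian densities in closed form. The following are all assumptions made only for this illustration:

- The data are synthetic standardized data, not TEP data.
- $\hat A$ is taken as the retained loadings $\hat P$.
- $\tilde A$ is taken as the residual loadings scaled by the square roots of their eigenvalues, so that $\tilde t$ has identity covariance.
- The parameters are $v = 10^{-3}$, $c_{T^2} = c_{\mathrm{SPE}} = 3$, with equal priors.
- The $T^2 = \mathrm{OC}$ covariance scales $\hat A\hat\Lambda\hat A^T$ by $c_{T^2}$, by analogy with the $\mathrm{SPE}=\mathrm{OC}$ case of (36).
- The function names are hypothetical.

```python
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov), evaluated through a Cholesky factor for stability."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * x.size * np.log(2 * np.pi)

def cov_given_states(A_hat, lam_hat, A_tilde, v, c_t2, c_spe, t2, spe):
    """Covariance of x given (T2, SPE) in the spirit of (36): an OC state scales its block by c."""
    k_t2 = c_t2 if t2 == "OC" else 1.0
    k_spe = c_spe if spe == "OC" else 1.0
    m = A_hat.shape[0]
    return (v * np.eye(m)
            + k_spe * (A_tilde @ A_tilde.T)
            + k_t2 * (A_hat @ np.diag(lam_hat) @ A_hat.T))

def posterior_t2(x, prior_t2, prior_spe, **model):
    """p(T2 | x) as in (35): sum over the states of SPE, then normalise over T2."""
    log_score = {}
    for t2 in ("IC", "OC"):
        terms = [np.log(prior_spe[s])
                 + log_gauss(x, cov_given_states(t2=t2, spe=s, **model))
                 for s in ("IC", "OC")]
        log_score[t2] = np.log(prior_t2[t2]) + np.logaddexp(*terms)
    log_norm = np.logaddexp(log_score["IC"], log_score["OC"])  # equals -log(eta)
    return {t2: float(np.exp(log_score[t2] - log_norm)) for t2 in log_score}

# Hypothetical PCA model fitted on synthetic fault-free data.
rng = np.random.default_rng(1)
m, a = 5, 3
X = rng.normal(size=(200, m)) @ rng.normal(size=(m, m))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # normalized observations
lam, P = np.linalg.eigh(np.cov(X, rowvar=False))
lam, P = lam[::-1], P[:, ::-1]                    # descending eigenvalues
model = dict(A_hat=P[:, :a], lam_hat=lam[:a],
             A_tilde=P[:, a:] * np.sqrt(lam[a:]), # assumption: unit-variance residual scores
             v=1e-3, c_t2=3.0, c_spe=3.0)
priors = {"IC": 0.5, "OC": 0.5}

p_nominal = posterior_t2(np.zeros(m), priors, priors, **model)
p_far = posterior_t2(10 * np.sqrt(lam[0]) * P[:, 0], priors, priors, **model)
print(p_nominal, p_far)
```

A point at the origin favours IC, while a point far along the first principal direction is assigned to OC with near certainty. The same functions give the SPE posterior by swapping the roles of the two discrete nodes.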

Unlike PCA and most detection methods, the proposed CGN (univariate form, Fig. 8) can give a response even if the values of some variables are missing; the corresponding nodes are then treated as latent. Let $\mathbf{x}$ be a set of $m$ variables (nodes), among which a subset $\mathbf{x}^+$ is observed and the remaining subset $\mathbf{x}^-$ is unobserved. Given the observed variables, the posterior probabilities of the states of the discrete nodes are computed by marginalizing over $\mathbf{x}^-$. For example, when some observations are missing, the posterior probability of the state IC of $T^2$ is inferred as follows:

$$\begin{aligned}
p(T^2=\mathrm{IC}\mid\mathbf{x}^+=x^+) &= \eta\, p(T^2=\mathrm{IC}) \sum_{\mathrm{SPE}} \int p(\mathrm{SPE})\, p(x^+, x^-\mid T^2=\mathrm{IC},\mathrm{SPE})\, dx^- \\
&= \eta\, p(T^2=\mathrm{IC}) \sum_{\mathrm{SPE}} p(\mathrm{SPE})\, p(x^+\mid T^2=\mathrm{IC},\mathrm{SPE}) \\
&= \eta\, p(T^2=\mathrm{IC}) \sum_{\mathrm{SPE}} p(\mathrm{SPE}) \prod_{x_i\in x^+} p(x_i\mid T^2=\mathrm{IC},\mathrm{SPE}),
\end{aligned}\tag{39}$$

where $\int p(x^-\mid T^2=\mathrm{IC},\mathrm{SPE})\,dx^- = 1$, $\eta = 1/\sum_{T^2} p(x^+\mid T^2)\,p(T^2)$ and $p(T^2=\mathrm{OC}\mid x^+) = 1 - p(T^2=\mathrm{IC}\mid x^+)$. Note that these calculations are handled automatically by the proposed CGNs. In the next section, we first validate the feasibility of the proposed framework on the Tennessee Eastman Process data sets, then give an example illustrating this fact.

Fig. 8. PCA for fault detection under CGN: Univariate form.

4. Application

4.1. Tennessee Eastman Process

To compare our proposal with the conventional PCA fault detection scheme (see Section 2.2), we test both on the Tennessee Eastman Process (TEP), an industrial chemical process (see Fig. 9). Its simulation, provided by the Eastman Chemical Company, is widely used as a benchmark problem for control techniques and for comparing fault detection and/or diagnosis methods. As described in Downs and Vogel (1993), the TEP consists of five major units: a reactor, a condenser, a compressor, a separator and a stripper. The process has 52 variables: 41 observed variables and 11 manipulated variables. In this paper, as in Yin et al. (2012), we consider only 22 observed and 11 manipulated variables ($m = 33$) and retain $a = 9$ principal components, given the fault-free training sample. Note that this is just an example; more or fewer process variables could be considered.

The two methods were compared using two indexes, the FAR (False Alarm Rate) and the MDR (Miss Detection Rate). The comparison used 21 test data sets of 800 samples each: one for normal operating conditions and the others for 20 different faults. Both methods gave identical FARs and MDRs. Fig. 10 presents, for 200 observations of the fault-free and Fault 4 data sets, respectively, the results of two approaches: the conventional PCA fault detection scheme using the statistics $T^2$ and SPE, and its CGN equivalent. In this figure, a statistic $T^2$ (resp. SPE) exceeding its control limit $CL_{T^2}$ (resp. $CL_{\mathrm{SPE}}$) means that a fault has occurred in the process. The same holds for the CGN when the posterior probability of the state OC of the node $T^2$ (resp. SPE) is greater than or equal to its probabilistic control limit $\zeta^{\mathrm{OC}}_{T^2}$ (resp. $\zeta^{\mathrm{OC}}_{\mathrm{SPE}}$). The two approaches provide the same decisions at every instant, and this also holds for the observations not shown.

4.2. Hot Forming Process

The ability of the proposed framework to handle incomplete data is shown on the Hot Forming Process (HFP) illustrated in Fig. 11. It is a simple case study (Li et al., 2008) with 5 variables:

- one quality variable: $x_5$, final dimension of the workpiece;
- four process variables: $x_1$, temperature; $x_2$, material flow stress; $x_3$, tension in the workpiece; and $x_4$, blank holding force.

We simulate 100 fault-free observations of each variable, from which the orthogonal axes $P$ and the $a$ principal components are defined. Given the process variables, many fault scenarios could be generated, e.g. by introducing a mean shift of amplitude $ms$. Let us introduce a fault with $ms = 3$ in $x_3$. Fig. 12 presents the proposed method (with $c_{\mathrm{SPE}} = c_{T^2} = 3$, $m = 5$, $a = 3$ and $\alpha = 1\%$) in two situations:

- Fig. 12(a): the test sample is complete (all the variables are observed).
- Fig. 12(b): the value of the variable $x_2$ is missing (white node in the graph, $x^- = x_2$ and $x^+ = [x_1, x_3, x_4, x_5]^T$).

The proposed method does not fail: it is still able, in a natural way, to make a decision, giving the probabilities of IC and OC of each discrete node by marginalizing over $x_2$. The networks and inferences presented in this paper were implemented using BNT (Murphy, 2001).

Fig. 9. Tennessee Eastman Process.

Fig. 10. Comparison between PCA and the proposed method (with $c_{\mathrm{SPE}} = c_{T^2} = 1.005$). Panels: PCA + $T^2$ statistic (with $CL_{T^2}$); PCA + SPE statistic (with $CL_{\mathrm{SPE}}$); CGN: probability of the node $T^2$, $p(\mathrm{OC}\mid x)$ (with $\zeta^{\mathrm{OC}}_{T^2}$); CGN: probability of the node SPE, $p(\mathrm{OC}\mid x)$ (with $\zeta^{\mathrm{OC}}_{\mathrm{SPE}}$). Horizontal axes: instants (0–200).

Fig. 11. Hot Forming Process.
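The HFP experiment with $x_2$ missing can be imitated with the following sketch of (39), written in the multivariate form of Fig. 7. Given the states of $T^2$ and SPE, $x$ is zero-mean Gaussian. Integrating out $x^-$ therefore amounts to deleting the corresponding rows and columns of each covariance matrix, and the unobserved value is never read.

The following are assumptions made for this illustration only:

- The 5-variable covariance structure is random, not the HFP model of Li et al. (2008).
- $c = 3$, $a = 3$, the shift of 3 in $x_3$ and the equal priors only loosely mirror the text's settings.
- The residual block $\tilde A\tilde A^T$ is an assumed form.
- `posterior_t2_missing` is a hypothetical helper name.

```python
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov), evaluated through a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * x.size * np.log(2 * np.pi)

def posterior_t2_missing(x, observed, covs, prior_t2, prior_spe):
    """Hypothetical helper: p(T2 | x+) in the spirit of (39).

    covs[(t2, spe)] is the covariance of x given the states of T2 and SPE.
    Integrating x- out of a zero-mean Gaussian keeps only the rows and
    columns of the observed variables, so unobserved entries of x are never read.
    """
    idx = np.flatnonzero(observed)
    x_obs = x[idx]
    log_score = {}
    for t2 in ("IC", "OC"):
        terms = [np.log(prior_spe[s]) + log_gauss(x_obs, covs[(t2, s)][np.ix_(idx, idx)])
                 for s in ("IC", "OC")]
        log_score[t2] = np.log(prior_t2[t2]) + np.logaddexp(*terms)
    log_norm = np.logaddexp(log_score["IC"], log_score["OC"])
    return {t2: float(np.exp(log_score[t2] - log_norm)) for t2 in log_score}

# Hypothetical 5-variable model (random correlation structure, not the HFP equations).
rng = np.random.default_rng(2)
m, a, v, c = 5, 3, 1e-3, 3.0
R = np.corrcoef(rng.normal(size=(100, m)) @ rng.normal(size=(m, m)), rowvar=False)
lam, P = np.linalg.eigh(R)
lam, P = lam[::-1], P[:, ::-1]                        # descending eigenvalues
S_hat = P[:, :a] @ np.diag(lam[:a]) @ P[:, :a].T      # A_hat Lambda_hat A_hat^T
S_tilde = P[:, a:] @ np.diag(lam[a:]) @ P[:, a:].T    # assumed A_tilde A_tilde^T
covs = {(t, s): v * np.eye(m) + (c if t == "OC" else 1.0) * S_hat
        + (c if s == "OC" else 1.0) * S_tilde
        for t in ("IC", "OC") for s in ("IC", "OC")}
priors = {"IC": 0.5, "OC": 0.5}

x = np.zeros(m)
x[2] = 3.0                                  # mean shift of 3 in x3 (index 2)
all_obs = np.ones(m, dtype=bool)
full = posterior_t2_missing(x, all_obs, covs, priors, priors)

x_missing = x.copy()
x_missing[1] = np.nan                       # x2 is unobserved
mask = np.array([True, False, True, True, True])
partial = posterior_t2_missing(x_missing, mask, covs, priors, priors)
nominal = posterior_t2_missing(np.zeros(m), all_obs, covs, priors, priors)
print(full, partial, nominal)
```

Placing `np.nan` at the missing position makes the marginalization explicit: if the unobserved value were used anywhere, the result would no longer be a finite probability.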


Fig. 12. An illustrative example when a fault has occurred in $x_3$ (with $ms = 3$), with $\zeta^{\mathrm{OC}}_{T^2} = 92.17\%$ and $\zeta^{\mathrm{OC}}_{\mathrm{SPE}} = 46.97\%$. (a) PCA fault detection scheme under a CGN when all the variables are available. (b) PCA fault detection scheme under a CGN when the value of the variable $x_2$ is missing.

5. Conclusions and outlooks

The main interest of this paper is the presentation of a new tool for fault detection. First, we transposed standard PCA (systematic and residual subspaces) into a BN, more precisely a CGN. Second, we proposed a probabilistic framework for statistics such as $T^2$ and SPE. To this end, it was necessary to define probabilistic control limits that match the decisions made by comparing the quadratic statistics with their thresholds. Finally, we introduced a CGN that integrates PCA and its corresponding statistics. The proposed method was tested on the TEP and compared to the PCA fault detection scheme, and the results show that the two methods produce the same decisions. We have also shown that, unlike the PCA fault detection scheme, the proposed tool can be used even when test data are missing.

Besides matching the standard PCA fault detection scheme, the proposed CGN naturally opens several outlooks:

1. integrating other information about the system, e.g. data reliability;
2. extending it to fault isolation (diagnosis), e.g. by building separate CGNs for each process unit (or fault);
3. using a mixture of this representation as a solution for non-linear processes and non-Gaussian data.

Finally, statistical tests such as those used with PCA are widely employed by other data-driven and model-based methods, e.g. to test the difference between the measurements and their estimates from an analytical model. It could therefore be interesting to integrate each of them under a single framework: a Bayesian network.

Acknowledgments

Mohamed Amine Atoui is supported by a Ph.D. grant from "la Région Pays de la Loire". The authors gratefully acknowledge the reviewers' comments.

References

Bishop, C.M., et al., 2006. Pattern Recognition and Machine Learning, vol. 1. Springer, New York.
Chiang, L., Russel, E., Braatz, R., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer, London.
Ding, S.X., Zhang, P., Jeinsch, T., Ding, E., Engel, P., Gui, W., 2011. A survey of the application of basic data-driven and model-based methods in process monitoring and fault diagnosis. In: World Congress, pp. 12380–12388.
Ding, S., Zhang, P., Ding, E., Naik, A., Deng, P., Gui, W., 2010. On the application of PCA technique to fault diagnosis. Tsinghua Sci. Technol. 15 (2), 138–144.
Ding, S.X., 2012. Data-driven design of model-based fault diagnosis systems. In: Proceedings of IFAC ADCHEM, pp. 10–13.
Downs, J.J., Vogel, E.F., 1993. A plant-wide industrial process control problem. Comput. Chem. Eng. 17 (3), 245–255.
Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification, 2nd edition. Wiley, New York.
Friedman, N., Geiger, D., Goldszmidt, M., 1997. Bayesian network classifiers. Mach. Learn. 29 (2), 131–163.
Huang, B., 2008. Bayesian methods for control loop monitoring and diagnosis. J. Process Control 18 (9), 829–838.
Jackson, J.E., Mudholkar, G.S., 1979. Control procedures for residuals associated with Principal Component Analysis. Technometrics 21 (3), 341–349.
Jackson, J.E., 2005. A User's Guide to Principal Components, vol. 587. Wiley.
Jensen, F.V., Nielsen, T.D., 2007. Bayesian Networks and Decision Graphs. Springer, New York.
Kawahara, Y., Fujimaki, R., Yairi, T., Machida, K., 2005. Diagnosis method for spacecraft using dynamic Bayesian networks. In: i-SAIRAS 2005, The Eighth International Symposium on Artificial Intelligence, Robotics and Automation in Space, vol. 603, p. 85.
Kim, D., Lee, I.B., 2003. Process monitoring based on probabilistic PCA. Chemom. Intell. Lab. Syst. 67 (2), 109–123.
Lauritzen, S.L., Jensen, F., 2001. Stable local computation with conditional Gaussian distributions. Stat. Comput. 11 (2), 191–203.
Lauritzen, S.L., 1992. Propagation of probabilities, means, and variances in mixed graphical association models. J. Am. Stat. Assoc. 87 (420), 1098–1108.
Lerner, U., Parr, R., Koller, D., Biswas, G., 2000. Bayesian fault detection and diagnosis in dynamic systems. In: AAAI/IAAI, pp. 531–537.
Li, J., Jin, J., Shi, J., 2008. Causation-based T2 decomposition for multivariate process monitoring and diagnosis. J. Qual. Technol. 40 (1), 46.
Murphy, K.P., 2001. The Bayes net toolbox for Matlab. Comput. Sci. Stat. 33.
Qin, S.J., 2003. Statistical process monitoring: basics and beyond. J. Chemom. 17 (8–9), 480–502.
Qin, S.J., 2012. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 36 (2), 220–234.
Roychoudhury, I., Biswas, G., Koutsoukos, X., 2006. A Bayesian approach to efficient diagnosis of incipient faults. In: 17th International Workshop on Principles of Diagnosis DX, pp. 243–264.
Schwall, M.L., Gerdes, J.C., 2002. A probabilistic approach to residual processing for vehicle fault detection. In: Proceedings of the 2002 American Control Conference, vol. 3, pp. 2552–2557.
Tipping, M.E., Bishop, C.M., 1999. Probabilistic principal component analysis. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 61 (3), 611–622.
Tracy, N., Young, J., Mason, R., 1992. Multivariate control charts for individual observations. J. Qual. Technol. 24 (2).
Valle, S., Li, W., Qin, S.J., 1999. Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods. Ind. Eng. Chem. Res. 38 (11), 4389–4401.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., Yin, K., 2003. A review of process fault detection and diagnosis: part III: process history based methods. Comput. Chem. Eng. 27 (3), 327–346.
Verron, S., Li, J., Tiplica, T., 2010a. Fault detection and isolation of faults in a multivariate process with Bayesian network. J. Process Control 20 (8), 902–911.
Verron, S., Tiplica, T., Kobi, A., 2010b. Fault diagnosis of industrial systems by Conditional Gaussian Network including a distance rejection criterion. Eng. Appl. Artif. Intell. 23 (7), 1229–1235.
Yin, S., Ding, S.X., Haghani, A., Hao, H., Zhang, P., 2012. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman Process. J. Process Control 22 (9), 1567–1581.
Yu, J., Rashid, M.M., 2013. A novel dynamic Bayesian network-based networked process monitoring approach for fault detection, propagation identification, and root cause diagnosis. AIChE J. 59 (7), 2348–2365.