Journal of Loss Prevention in the Process Industries 44 (2016) 73e87
Kullback-Leibler distance-based enhanced detection of incipient anomalies

Fouzi Harrou a,*, Ying Sun a, Muddu Madakyaru b
a King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
b Manipal Institute of Technology, Department of Chemical Engineering, Manipal University, Manipal, India
Article history: Received 3 February 2016; Accepted 28 August 2016; Available online 31 August 2016

Keywords: Anomaly detection; Statistical process control; Kullback-Leibler distance; Partial least square

Abstract

Accurate and effective anomaly detection and diagnosis through process monitoring ensure the reliability and safety of modern engineering systems while maintaining the desired product quality. In this paper, an innovative method based on the Kullback-Leibler divergence for detecting incipient anomalies in highly correlated multivariate data is presented. We use a partial least square (PLS) method as a modeling framework and a symmetrized Kullback-Leibler distance (KLD) as an anomaly indicator, which quantifies the dissimilarity between the current PLS-based residual distribution and a reference probability distribution obtained using fault-free data. Furthermore, this paper reports the development of two monitoring charts based on the KLD. The first approach is a KLD-Shewhart chart, where the Shewhart monitoring chart with a three-sigma rule is used to monitor the KLD of the response-variable residuals from the PLS model. The second approach integrates the KLD statistic into the exponentially weighted moving average (EWMA) monitoring chart. The performance of the PLS-based KLD anomaly-detection methods is illustrated and compared to that of conventional PLS-based anomaly detection methods. Using synthetic data and simulated distillation column data, we demonstrate the greater sensitivity and effectiveness of the developed method over conventional PLS-based methods, especially when data are highly correlated and small anomalies are of interest. The results indicate that the proposed charts are very promising, because KLD-based charts are, in practice, designed to detect small shifts in process parameters. © 2016 Elsevier Ltd. All rights reserved.
1. Introduction

1.1. The state of the art

Process safety and product quality are threatened by anomalous behavior, which causes systems or processes to deviate unacceptably from their normal operating conditions or states (Isermann (2005)). Anomaly detection and diagnosis are therefore central to building safe and reliable modern industrial processes. If anomalies in industrial systems are not detected promptly, they can affect plant productivity, profitability, and safety. For example, a survey carried out by Nimmo (1995) showed that the petrochemical industry in the USA could save up to $10 billion per year if the anomalies that occur in monitored processes could be detected and
* Corresponding author. E-mail address: [email protected] (F. Harrou).
http://dx.doi.org/10.1016/j.jlp.2016.08.020
0950-4230/© 2016 Elsevier Ltd. All rights reserved.
identified. As a result of proper process monitoring, downtime is minimized, the safety of process operations is improved, and manufacturing costs are reduced (Isermann (2005); Gertler (1998)). Many serious accidents have occurred in the past few decades in various chemical and petrochemical plants worldwide, including the Union Carbide accident (Dhara and Dhara (2002)), the Piper Alpha accident (Paté-Cornell (1993)), and the Mina Al-Ahmedi accident (Venkatasubramanian et al. (2003a)). The Union Carbide accident occurred in Bhopal, India, in 1984, when a major toxic gas leak caused over 3000 deaths and seriously injured 400,000 others in the surrounding residential areas (Dhara and Dhara (2002)). The 1988 Piper Alpha accident, on a North Sea oil production platform operated by Occidental Chemical, involved an explosion that killed 167 men (Paté-Cornell (1993)). The Mina Al-Ahmedi accident in Kuwait in 2000 was due to a failure in a condensate line in a refinery plant, causing the death of 5 people and injuring 50 others (Venkatasubramanian et al. (2003a)). These accidents show that tight monitoring of chemical and
petrochemical processes is essential for the safe and profitable operation of these plants. Naturally, once an anomaly is detected by the monitoring process, effective system operation also requires deciding on the risk of system shutdown, followed by anomaly attribution and isolation or correction before it contaminates process performance (Isermann (2005); Gertler (1998)). The purpose of anomaly detection is to identify any anomalous event, i.e., a discrepancy between process behavior and that of its nominal state. Anomaly detection is a binary decision-making process: the process is either under normal or abnormal operating conditions. Here we focus on anomaly detection rather than anomaly isolation, which determines the location of the detected anomaly. Process monitoring and fault diagnosis are currently receiving increasing attention in both the application and research domains. Data-based process-monitoring methods, also known as process history-based or model-free methods (Yin et al. (2014); Chiang et al. (2001); Venkatasubramanian et al. (2003b); Neumann et al. (1999); Serpas et al. (2013)), are widely applied in the process industry for process monitoring and diagnosis purposes. These methods can extract useful information from historical data, capturing the relationships between the variables in the absence of an analytical model. Toward this end, data-based monitoring methods rely on the availability of historical data obtained from the monitored process under nominal operating conditions (Montgomery (2005); Bissell (1994)). Techniques in this category principally consist of two phases: (i) off-line training or learning, in which historical fault-free (healthy) data sets are first used to build an empirical model that describes the behavior of the nominal process; the model is then used to detect anomalies in future data.
(ii) On-line monitoring, in which the online measurement data are processed, the empirical model is used to estimate true values of new measurements, and anomalies are detected and diagnosed (see Fig. 1). Because costly, time-consuming explicit models are not required, data-based methods have become very popular in industrial processes. However, because the information used for anomaly detection is derived directly from input data, the performance of data-based methods depends mainly on the availability of quality input data. Numerous data-based anomaly detection techniques are referenced in the literature, and they can be broadly categorized into two main classes: univariate and multivariate techniques (Montgomery (2005); Yin et al. (2012, 2015)). Univariate statistical monitoring methods, such as EWMA (exponentially weighted moving average) and CUSUM (cumulative sum) charts, are essentially used to monitor only one process variable (Montgomery (2005)). However, modern industrial processes often present multiple highly correlated process variables. Univariate anomaly-detection methods are unable to capture the different aspects of such processes and, therefore, are not appropriate
for modern-day processes. To monitor several process variables simultaneously, multivariate statistical monitoring techniques were developed. Multivariate anomaly-detection methods take into account the correlation between process variables, while univariate anomaly-detection methods do not. In recent decades, latent variable regression (LVR)-based methods have been recognized as a powerful tool for monitoring multivariate processes and for diagnosis problems (Yin et al. (2014)). The main idea of the LVR-based monitoring approach (e.g., partial least square (PLS) regression, principal component analysis (PCA)) is to extract useful information from the original data set and to construct statistics for monitoring. More specifically, LVR models deal with collinearity by transforming the variables so that most of the data variability is captured in a smaller number of variables, which are used to construct the model for process monitoring. In other words, LVR models perform regression on a small number of latent variables that are linear combinations of the original variables. PLS regression, also known as projection to latent structures, is among the most widely used multivariate statistical process monitoring methods for monitoring multivariate processes (Wold et al. (1984); Yin et al. (2015)) due to its simplicity and efficiency in processing huge amounts of process data. Roughly speaking, PLS regression attempts to decompose the data in such a way that the correlation between the predictor and predicted variables is maximized (Geladi and Kowalski (1986)). In other words, PLS regression transforms the variables by taking the input-output relationship into account and maximizing the covariance between the transformed input and output variables.
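To make the latent-variable idea concrete, the following is a minimal NIPALS-style PLS sketch in Python (an illustrative toy implementation on synthetic data, not the authors' code; all names are hypothetical):

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """Minimal NIPALS PLS sketch: iteratively extracts latent variables
    (scores T, loadings P) that capture the covariance between X and Y,
    deflating both matrices after each extracted component."""
    X, Y = X.copy(), Y.copy()
    T, P = [], []
    for _ in range(n_components):
        u = Y[:, [0]]                         # initial response score
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)            # predictor weight vector
            t = X @ w                         # predictor score (latent variable)
            q = Y.T @ t
            q /= np.linalg.norm(q)            # response loading
            u_new = Y @ q                     # updated response score
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)               # predictor loading
        X = X - t @ p.T                       # deflate predictors
        Y = Y - t @ (t.T @ Y) / (t.T @ t)     # deflate responses
        T.append(t)
        P.append(p)
    return np.hstack(T), np.hstack(P), X, Y   # scores, loadings, residuals E, F

# Toy data driven by two latent factors plus noise
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
X0 = factors @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(200, 8))
Y0 = factors @ rng.normal(size=(2, 2)) + 0.05 * rng.normal(size=(200, 2))
T, P, E, F = nipals_pls(X0, Y0, n_components=2)
print(np.linalg.norm(E) / np.linalg.norm(X0))  # small: 2 LVs capture most of X
```

Two latent variables suffice here only because the toy data are generated from two underlying factors; real process data would call for cross-validated selection of the number of components.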
By extracting useful information from the original dataset and then using monitoring indices such as the T2 and Q statistics, PLS regression has been used successfully for fault detection in multivariate processes with highly correlated variables. Hence, several variants of linear PLS regression have been developed for process monitoring, such as multiway PLS (Nomikos and MacGregor (1995)), multi-block PLS (MacGregor et al. (1994)), multi-scale PLS (Teppola and Minkkinen (2000); Lee et al. (2009); Roodbali et al. (2011)), dynamic PLS (Lee et al. (2004)), adaptive total PLS (Dong et al. (2015)), recursive PLS (Qin (1998)), and multi-phase PLS (Lu et al. (2004)). PLS-based process monitoring methods have therefore been exploited extensively in a variety of engineering applications (Khedher et al. (2015); Peng et al. (2015); Kembhavi et al. (2011); Muradore and Fiorini (2012); Li et al. (2007)).

1.2. Motivation and contribution

Effective operation of engineering systems requires responsive monitoring of key variables. Detecting incipient or small anomalies in highly correlated input-output multivariate data is one of the most crucial and challenging tasks in the area of anomaly detection and diagnosis. Early detection of incipient anomalies can provide an early warning and help to avoid catastrophic damage and subsequent financial loss. Unfortunately, conventional PLS-based snap-
Fig. 1. Schematic of data-driven anomaly-detection methods.
shot Shewhart-type monitoring indices, such as the T2 and Q statistics, are relatively inefficient at detecting incipient anomalies because they make decisions based only on information about the process at the last observation. The ability to detect smaller parameter shifts can be improved by using a chart based on information about the entire process history. Using the EWMA monitoring chart, the authors in (Harrou et al. (2015)) showed that the performance of conventional PLS-based monitoring charts can be drastically improved by taking historical data into account in the decision-making procedure, especially for detecting anomalies of small magnitude. Here we take a conceptually simple and generic approach that is applicable whether the residuals follow a Gaussian or non-Gaussian distribution. Instead of monitoring changes in the underlying PLS model, our proposed approach evaluates changes in the density functions of the response residuals. The problem of anomaly detection is thereby addressed as a distance measure between probability distributions. Specifically, this paper focuses on developing an enhanced PLS anomaly detection method using the Kullback-Leibler distance (KLD) (Kullback and Leibler (1951)). This choice is mainly motivated by the ability of the KLD metric to evaluate the similarity between two distributions, which makes it very attractive as an anomaly indicator. The KLD is used as a monitoring index to identify the dissimilarity between two probability distributions, and it allows automatic anomaly detection. In particular, the KLD is used as a measure to quantify the similarity between the actual PLS-based residual distribution and the reference probability distribution. Anomaly detection results confirm that the proposed technique has potential for detecting small or incipient anomalies in highly correlated data. The remaining sections of this paper are organized as follows.
In Section 2, a brief introduction to PLS and how it can be used in anomaly detection is given. In Section 3, the KLD metric is briefly described, and Section 4 describes the proposed PLS-based KLD anomaly-detection approach. In Section 5, we assess the proposed scheme and present some simulation results. Finally, some concluding remarks and future research directions are given in Section 6.

2. PLS-based process monitoring

The developed method uses PLS regression as a modeling framework and the KLD measure for anomaly detection. The performance of the developed anomaly detection method will be illustrated and compared to that of the conventional PLS method. Because the developed method is based on PLS regression, we provide a brief introduction to PLS and describe how it can be used in anomaly detection. Roughly speaking, PLS regression is a multivariate analysis method that constructs a linear regression between two data sets expressed in the form of data matrices (Wold et al. (1984)). The suitability of linear PLS regression stems from its capability to deal with collinear data and with several variables in both the input (predictor) matrix X and the output (response) matrix Y (Geladi and Kowalski (1986)). The basic idea of PLS regression consists of finding the latent variables of the process data by capturing the largest variability in the data and achieving the maximum cross-correlation between the predictor and response variables (Wold et al. (1984)). Given an input data matrix $X \in \mathbb{R}^{n \times m}$ having n observations and m variables, and an output data matrix $Y \in \mathbb{R}^{n \times p}$ consisting of p response variables, a PLS model is formally determined by two sets of linear equations: the inner model and the outer model (Geladi and Kowalski (1986)) (see Fig. 2). The outer model, which links the latent variables (LVs) to the response and predictor matrices, can be expressed as (Kourti and MacGregor (1995)):
Fig. 2. PLS algorithm showing the inner and outer relationships between X and Y.
$$X = \hat{X} + E = \sum_{i=1}^{l} t_i p_i^T + E = TP^T + E$$
$$Y = \hat{Y} + F = \sum_{i=1}^{l} u_i q_i^T + F = UQ^T + F \qquad (1)$$
where $\hat{X}$ and $\hat{Y}$ represent the modeling matrices of X and Y, respectively; the matrices $T \in \mathbb{R}^{n \times l}$ and $U \in \mathbb{R}^{n \times q}$ consist of the l kept LVs of the predictor and response data, respectively; the matrices $E \in \mathbb{R}^{n \times m}$ and $F \in \mathbb{R}^{n \times p}$ represent the approximation residuals of the predictor and response data, respectively; and the matrices $P \in \mathbb{R}^{m \times l}$ and $Q \in \mathbb{R}^{p \times q}$ represent the predictor and response loading matrices, respectively. Note that PLS regression uses an iterative algorithm (Geladi and Kowalski (1986)) to estimate the LVs used in the model, where one LV or principal component is added to the model at each iteration. After the inclusion of a latent variable, the input and output residuals are computed and the process is repeated using the residual data until the cross-validation-error criterion is minimized (Wold et al. (1984); Geladi and Kowalski (1986); Frank and Friedman (1993)). The number of LVs, l, can be estimated by cross-validation, for example (Li et al. (2002)). The inner relationship between the predictor and response LVs can then be described by:
$$U = TB + H, \qquad (2)$$
where B represents a regression matrix consisting of the model parameters linking the predictor and response LVs, and H represents a residual matrix. The response Y can now be expressed as $Y = TBQ^T + F^*$. Note that the PLS method projects the data onto a number of LVs that explain most of the variation in both predictors and responses, and then models the LVs by linear regressions. The overall concept of a linear PLS model algorithm is illustrated in Fig. 3. For PLS-based monitoring, two statistics, Hotelling's T2 and the Q or squared prediction error (SPE), are generally used (Montgomery (2005)). Hotelling's T2 is a statistical metric that captures the behavior of the retained LVs (Hotelling (1933)), and is defined as (Hotelling (1933)):
$$T^2 = \sum_{i=1}^{l} \frac{t_i^2}{\sigma_i^2}, \qquad (3)$$
where $\sigma_i^2$ is the estimated variance of the corresponding latent variable $t_i$. For new testing data, when the T2 value exceeds the control limit, it can be concluded that the process is out of control (Hotelling (1933)). The Q statistic, on the other hand, shows deviations from normal operating conditions based on variations in the predictor variables that are not described by the PLS model (Geladi and Kowalski (1986)). Indeed, the Q(X) statistic quantifies
Fig. 3. PLS algorithm flow chart.
the loss of fit with the developed PLS model and is defined as (Jackson and Mudholkar (1979)):

$$Q(X) = \| x - \hat{x} \|^2, \qquad (4)$$
where $\hat{x}$ represents the prediction of x by the PLS model. When a vector of new data is available, the Q(X) statistic is calculated and compared with the threshold value $Q_\alpha$ given in (Qin (2003)). If the confidence limit is violated, then a fault is declared. In conventional PLS-based monitoring, the two statistics T2 and Q are generally used for anomaly detection (Montgomery (2005)). Although each has its advantages and disadvantages, both tend to fail to detect small faults (Montgomery (2005)) because they treat each observation individually and do not take into account information from past data. This makes them insensitive to small faults in process variables and causes many missed detections. To overcome these limitations of conventional PLS-based monitoring methods, we have developed an alternative fault detection approach in which PLS is used as a modeling framework for fault detection using the KLD metric. More details about the KLD metric and how it can be used in fault detection are presented next.

3. Kullback-Leibler distance

Measures of distance between probability distributions are central to the problems of inference and discrimination (Ullah (1996)). One of the most common such measures is based on the Kullback-Leibler information (KLI). The KLI is also well known in information theory as the relative entropy; it is an important statistical measure that can be used to quantify the dissimilarity, separability, distinguishability, or closeness between two probability density functions (PDFs) (Pardo (2005); Belov and Armstrong (2011); Kullback and Leibler (1951); Borovkov (1998); Basseville and Nikiforov (1993)). It is integral to various problems of statistical inference and data processing, such as detection, estimation, compression, and classification (Borovkov (1998); Liese and Miescke (2008); Basseville (2013); Bittencourt et al. (2014); Zeng et al. (2014)).
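The KLI of two discrete distributions can be computed directly; the following pure-NumPy sketch uses arbitrary toy distributions (illustrative only, not an example from the paper):

```python
import numpy as np

def kli(p1, p2):
    """Kullback-Leibler information I(p1 || p2) for discrete distributions
    given as probability vectors (assumed strictly positive)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return float(np.sum(p1 * np.log(p1 / p2)))

p = [0.1, 0.4, 0.5]
q = [0.8, 0.1, 0.1]
print(kli(p, q), kli(q, p))  # two different positive values: the KLI is asymmetric
print(kli(p, p))             # zero when the two distributions coincide
```

The two directions generally give different values, which is why a symmetrized version is attractive for threshold setting.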
The KLI has been used extensively by scientists and engineers in various disciplines, including pattern recognition (Coetzee (2005); Zhang et al. (2007)), image processing (Chen et al. (2003)), classification (Gupta et al. (2009)), and anomaly detection and detectability (Anderson and Haas (2011); Harrou et al. (2014); Kinnaert et al. (2001)). This is perhaps because of its simplicity in both theory and application. Another reason for its wide use is that, up to sign, it is the expectation of the log-likelihood (Pardo (2005); Basseville (2013); Borovkov (1998);
Liese and Miescke (2008)).

3.1. Definition 1 (Kullback-Leibler information)

Let P1(x) and P2(x) be two probability distributions with probability density functions p1(x) and p2(x), respectively. The Kullback-Leibler information of p1(x) relative to p2(x), which measures the information lost when p2(x) is used to approximate p1(x), is defined as the expected value of the log-likelihood ratio with respect to p1(x) (Kullback and Leibler (1951)):
$$I(p_1 \| p_2) = \int_{\mathbb{R}} p_1(x) \log\frac{p_1(x)}{p_2(x)}\, dx = E_{p_1}\!\left[\log\frac{p_1(x)}{p_2(x)}\right], \qquad (5)$$

and between p2(x) and p1(x) by

$$I(p_2 \| p_1) = \int_{\mathbb{R}} p_2(x) \log\frac{p_2(x)}{p_1(x)}\, dx = E_{p_2}\!\left[\log\frac{p_2(x)}{p_1(x)}\right], \qquad (6)$$
where $E_{p_i}$ denotes the expectation operator over the distribution $p_i$, and log(·) is the natural logarithm. Note that the definition of the KLI is valid for both discrete and continuous distributions. As a matter of fact, the KLI measure is not a distance or metric in the Euclidean sense, basically because it is generally not a symmetric function of p1(x) and p2(x) (i.e., $I(p_1 \| p_2) \neq I(p_2 \| p_1)$; see Fig. 4). Also, the triangle inequality is not satisfied (namely, $I(A \| B) + I(B \| C) \geq I(A \| C)$ is not guaranteed). The KLI is non-negative (i.e., $I(p_1 \| p_2) \geq 0$ and $I(p_2 \| p_1) \geq 0$) and null only when the two densities are equal, p1 = p2. Hence, it must be interpreted as a pseudo-distance measure only. A closed-form expression of the KLI can easily be computed in the case of normal distributions. For univariate normal distributions p1(x) and p2(x) of a random variable x, where $p_1 \sim \mathcal{N}(\mu_0, \sigma_0^2)$ and $p_2 \sim \mathcal{N}(\mu_1, \sigma_1^2)$, with $\mu_0$ and $\mu_1$ the means and $\sigma_0^2$, $\sigma_1^2$ the variances of p1 and p2, the KLI between p1 and p2 is given by (Pardo (2005); Belov and Armstrong (2011)):
$$I(p_1 \| p_2) = \frac{1}{\sigma_0\sqrt{2\pi}} \int \exp\!\left(-\frac{(x-\mu_0)^2}{2\sigma_0^2}\right) \left[\frac{(x-\mu_1)^2}{2\sigma_1^2} - \frac{(x-\mu_0)^2}{2\sigma_0^2} + \log\frac{\sigma_1}{\sigma_0}\right] dx$$
$$= \log\frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2}{2\sigma_1^2} + \frac{(\mu_1-\mu_0)^2}{2\sigma_1^2} - \frac{1}{2}. \qquad (7)$$
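The closed form in Equation (7) can be checked against direct numerical integration. A small Python sketch (illustrative parameter values; the helper name is hypothetical):

```python
import numpy as np

def kli_gauss(mu0, s0, mu1, s1):
    """Closed-form I(p1 || p2) for p1 = N(mu0, s0^2) and p2 = N(mu1, s1^2),
    per Eq. (7): log(s1/s0) + (s0^2 + (mu1-mu0)^2)/(2 s1^2) - 1/2."""
    return np.log(s1 / s0) + (s0**2 + (mu1 - mu0)**2) / (2 * s1**2) - 0.5

mu0, s0, mu1, s1 = 0.0, 1.0, 0.5, 1.5
x = np.linspace(-12.0, 12.0, 200001)
p1 = np.exp(-(x - mu0)**2 / (2 * s0**2)) / (s0 * np.sqrt(2 * np.pi))
p2 = np.exp(-(x - mu1)**2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
g = p1 * np.log(p1 / p2)
numeric = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(x)))  # trapezoid rule
print(kli_gauss(mu0, s0, mu1, s1), numeric)  # the two values agree closely
print(kli_gauss(mu1, s1, mu0, s0))           # reversed order: a different value
```

The last line illustrates the asymmetry discussed above: swapping the two densities changes the value of the information.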
Fig. 4. Illustration of the KLI for two Gaussian distributions. Note that the typical asymmetry for the KLI is clearly visible.
Remark. Note that the variance terms in Equation (7) contribute a strictly positive amount for any $\sigma_0 \neq \sigma_1$. This means that the KLI in the case of a simultaneous change in the mean and variance of a Gaussian variable is greater than the Kullback information corresponding to a change in the mean only. Thus, with the KLI measure, anomalies that simultaneously affect the mean and variance of the process measurements are easier to detect than anomalies that affect only the process mean. Note also that the KLI measure is asymmetric. This non-symmetry complicates the procedure of setting the threshold for anomaly detection. To overcome this issue, a symmetrized version is used. A familiar symmetrized version of the information is the KLD, which is used in this paper as the anomaly-detection indicator. This measure is also known as the J-divergence (Pardo (2005)). The KLD or J-divergence between p1(x) and p2(x) is given by:
$$J(p_1, p_2) = I(p_1 \| p_2) + I(p_2 \| p_1) \qquad (8)$$
$$= \int_{\mathbb{R}} (p_1(x) - p_2(x)) \log\frac{p_1(x)}{p_2(x)}\, dx. \qquad (9)$$

Lemma 1. Assume that $p_1(x) \sim \mathcal{N}(\mu_0, \sigma_0^2)$ and $p_2(x) \sim \mathcal{N}(\mu_1, \sigma_1^2)$ are two univariate normal densities of a random variable X; then the KLD between p1(x) and p2(x) is given by (Belov and Armstrong (2011)):

$$J(p_1, p_2) = \frac{1}{2}\left[\frac{\sigma_1^2}{\sigma_0^2} + \frac{\sigma_0^2}{\sigma_1^2} + (\mu_1 - \mu_0)^2\left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma_1^2}\right) - 2\right]. \qquad (10)$$

When changes affect only the mean (i.e., $\sigma_1^2 = \sigma_0^2$), Equation (10) reduces to:

$$J(p_1, p_2) = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2}. \qquad (11)$$

In this case, the KLD between two normal distributions with different means and the same variance is proportional to the square of the difference between the two means, i.e., to the signal-to-noise ratio, and the KLI measure is symmetric ($I(p_2 \| p_1) = I(p_1 \| p_2)$); this is not true in general (Basseville and Nikiforov (1993)). Note that for non-Gaussian variables the KLD has no closed form, and the integral in Equation (8) must be approximated numerically (Romano and Kinnaert (2006)). Two similar distributions will have a small KLD, close to zero up to measurement noise and errors, while very different distributions will have a larger KLD. In other words, the smaller the KLD measure, the more similar the distributions, and vice versa. It is this comparison operation that makes the KLD a useful anomaly-detection indicator. Since the KLD measures the dissimilarity between two distributions, it is a suitable statistic for anomaly detection: it can compare the statistical similarity between the residual distributions before and after an anomaly. It therefore seems meaningful to adopt a PLS-based KLD scheme for statistical process monitoring. The main objective of this work is to exploit the advantages of the KLD and those of PLS modeling to achieve enhanced detection performance compared with conventional PLS anomaly detection methods, especially for detecting incipient or small shifts in the process mean. Subsequently, the KLD metric is integrated with PLS to extend its ability to detect incipient anomalies.

4. The proposed PLS-based KLD anomaly detection scheme

In this section, the PLS modeling framework is integrated with the KLD to develop new anomaly detection schemes with a higher sensitivity to small or incipient anomalies in multivariate input-output data. The KLD between the distributions of the output PLS residuals before and after an anomaly can be used as an indicator of the presence or absence of anomalies. Under healthy conditions, it is expected to be approximately zero, up to modeling uncertainties and measurement noise. When an anomaly occurs, however, the KLD between the residual distributions before and after the change deviates significantly from zero, indicating the presence of a new condition that is clearly distinguishable from the normal fault-free working mode. A low KLD value indicates a high similarity between the two distributions, while a high value indicates the opposite. Thus, this work exploits the high sensitivity of the KLD measure to small changes to improve anomaly detection over conventional PLS-based methods. We introduce two monitoring charts based on the KLD. The first approach is the mixed KLD-Shewhart chart, in which the Shewhart monitoring chart with the three-sigma rule is used to monitor the KLD measure. The second approach is the KLD-EWMA monitoring chart, which integrates the KLD statistic into the EWMA monitoring chart. The overall concept of the proposed method is illustrated in Fig. 5. Our approach combines the conventional KLD, Shewhart, and EWMA schemes to exploit the beneficial features of each design structure, as described in the following section. The developed KLD-based anomaly-detection schemes consist of two stages: training and testing. In the first stage, a reference PLS model is constructed using healthy fault-free
Fig. 5. A block diagram of KLD-based anomaly-detection methods. KLD is used to compare the statistical distribution of a PLS-based residual against that of a reference. If the KLD measure, J, is greater than some predefined threshold, h, the presence of an anomaly is declared.
data. The following is a summary of the off-line training steps.

Step 1: Build the PLS reference model. This step comprises three main sub-steps:
1) Collect a fault-free training data set (X and y) that represents normal process operation, and a testing data set (possibly containing faulty data). The training data set corresponds to the normal operating conditions of the monitored process and is needed to construct the PLS reference model.
2) Scale the training data used to construct the process model (mean-center all variables and scale them to unit variance).
3) Build the reference PLS model using the training data sets:
- select the number of latent variables,
- express the data matrix as a sum of approximate and residual matrices, as shown in Equation (1),
- compute the residuals of the response variables, F.

Step 2: Calculate the control chart limits.
1) Compute the symmetric KLD measure between the PDFs of the response-variable residuals obtained from the PLS model for two different training data sets. The KLD can be used as an
indicator of the existence or absence of anomalies, and control limits can be placed on the KLD of the PLS response-variable residuals.
2) Compute the upper control limit (UCL) of the control chart(s) based on the KLD of fault-free data. The control charts used in this study are the Shewhart and EWMA charts.

During the testing or on-line monitoring stage, the PLS reference model obtained previously is compared with the new data and an anomaly statistic is calculated. The testing steps are summarized below.

Step 3: Detect anomalies. This step comprises the following five sub-steps:
1) Scale the new (testing) data, which may contain anomalies, using the mean and standard deviation obtained from the training data.
2) Compute the residuals of the response variables, F.
3) Compute the KLD between the probability density of the current residuals and the reference distribution obtained under normal operating conditions.
4) Compute the control chart decision function. The computation of the decision function for the EWMA and Shewhart control charts is summarized in the next subsection.
5) Check for anomalies. Declare an anomaly when the control chart decision function exceeds the control limits.
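The training and testing steps above can be sketched end-to-end in Python. This is an illustrative toy (all data and names hypothetical): ordinary least squares stands in for the PLS model so the sketch stays dependency-free, the KLD is the mean-shift form of Equation (11) evaluated on sliding residual windows, the threshold follows the three-sigma rule of Equation (13), and an EWMA-smoothed variant of the KLD sequence is included at the end:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: reference model from fault-free training data (OLS stands in for PLS)
n, m = 500, 4
X_train = rng.normal(size=(n, m))
beta = rng.normal(size=(m, 1))
y_train = X_train @ beta + 0.1 * rng.normal(size=(n, 1))
mu_x, sd_x = X_train.mean(axis=0), X_train.std(axis=0)
mu_y, sd_y = y_train.mean(), y_train.std()
b_hat = np.linalg.lstsq((X_train - mu_x) / sd_x, (y_train - mu_y) / sd_y,
                        rcond=None)[0]
f_train = ((y_train - mu_y) / sd_y - (X_train - mu_x) / sd_x @ b_hat).ravel()
m0, s0 = f_train.mean(), f_train.std()         # fault-free residual statistics

# Step 2: KLD on sliding windows (Eq. (11)) and a three-sigma threshold (Eq. (13))
w = 25
def kld(window):
    return (window.mean() - m0) ** 2 / s0 ** 2
J_train = np.array([kld(f_train[i:i + w]) for i in range(n - w)])
h = J_train.mean() + 3 * J_train.std()

# Step 3: monitor test data with a small mean shift injected halfway
X_test = rng.normal(size=(200, m))
y_test = X_test @ beta + 0.1 * rng.normal(size=(200, 1))
y_test[100:] += 0.3                             # incipient anomaly in the output
f_test = ((y_test - mu_y) / sd_y - (X_test - mu_x) / sd_x @ b_hat).ravel()
J_test = np.array([kld(f_test[i:i + w]) for i in range(200 - w)])
alarms = J_test > h
print(int(alarms.sum()))                        # windows flagged by KLD-Shewhart

# Optional: EWMA smoothing of the KLD sequence (the KLD-EWMA idea)
gamma, L = 0.2, 3.0
z, zprev = np.empty(J_test.size), J_train.mean()
for i, d in enumerate(J_test):
    zprev = gamma * d + (1 - gamma) * zprev
    z[i] = zprev
ucl = J_train.mean() + L * J_train.std() * np.sqrt(gamma / (2 - gamma))
print(bool((z > ucl).any()))
```

In the paper's actual scheme the residuals come from the fitted PLS model rather than OLS, and the reference distribution and chart limits are estimated as described in the following subsections.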
4.1. First approach - mixed KLD-Shewhart control chart

In this subsection, a brief description of the mixed KLD-Shewhart monitoring chart is given. The difference between the observed value of the output variable, $y$, and the predicted value, $\hat{y}$, obtained from the PLS model represents the residual of the output variable, $F = [f_1, \ldots, f_t, \ldots, f_n]$, which can be used as an indicator of a possible anomaly. The residual $F$ obtained from the PLS model is assumed to be Gaussian. It is assumed that an anomaly affects the mean of the residual distribution, while the variance remains unchanged after the anomaly occurs. In this approach, the KLD is used to measure the distance between the probability distribution of the current residuals, $p_0(f) \sim \mathcal{N}(\mu_0, \sigma_0^2)$, and a reference distribution, $p_1(f) \sim \mathcal{N}(\mu_1, \sigma_0^2)$, where $\mu_0$ and $\mu_1$ are the means and $\sigma_0^2 > 0$ is the common variance of $p_0(f)$ and $p_1(f)$. The symmetrized KLD based on the residual distributions of the response variables from the PLS model can then be computed as follows:

$$J(p_0(f), p_1(f)) = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2}, \qquad (12)$$

where $\mu_0$ and $\sigma_0$ are, respectively, the mean and the standard deviation of the PLS-based residuals obtained with fault-free data. Under normal operating conditions the KLD is zero because the two PDFs coincide; in real situations, deviations from zero should only be due to measurement errors or modeling uncertainties. The KLD-based test decides between the null hypothesis $\mathcal{H}_0$ (absence of anomalies) and the alternative hypothesis $\mathcal{H}_1$ (presence of anomalies) by comparing the decision statistic $J(p_0(f), p_1(f))$ with a given threshold $h$:

$$J(p_0(f), p_1(f)) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} h.$$
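Under the stated Gaussian assumptions, the statistics of Equations (12)-(14) reduce to a few lines each. The sketch below is our own illustration, not the authors' code; the function names and the initialization of the EWMA recursion at a supplied value `z0` are assumptions:

```python
import statistics

def symmetrized_kld(mu0, mu1, sigma0):
    """Equation (12): symmetrized KLD of two Gaussians sharing variance sigma0^2."""
    return (mu1 - mu0) ** 2 / sigma0 ** 2

def three_sigma_threshold(kld_nominal, L=3.0):
    """Equation (13): h = mean + L*std of the KLD under nominal behavior."""
    return statistics.mean(kld_nominal) + L * statistics.stdev(kld_nominal)

def kld_ewma(klds, gamma=0.2, z0=0.0):
    """Equation (14): z_t = gamma*d_t + (1 - gamma)*z_{t-1}."""
    z, out = z0, []
    for d in klds:
        z = gamma * d + (1.0 - gamma) * z
        out.append(z)
    return out
```

A Shewhart-type test declares H1 whenever `symmetrized_kld(...)` exceeds `h`, while the mixed KLD-EWMA chart instead compares each `z_t` against the limits mu0 ± L*sigma_z.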
For setting the detection threshold, $h$, a simple approach based on the three-sigma rule was used:

$$h = \mu_{J_0} + L\sigma_{J_0}, \qquad (13)$$

where $\mu_{J_0}$ and $\sigma_{J_0}$ are, respectively, the mean and the standard deviation of the anomaly indicator KLD under nominal behavior, and $L$ is the width of the control limits that determines the confidence limits, usually specified in practice as 3 for a false alarm rate of 0.27%. If the decision function $J(p_0(f), p_1(f))$ is larger than the threshold $h$, the KLD-based test decides for $\mathcal{H}_1$; otherwise, $\mathcal{H}_0$ is assumed to be true.

4.2. Second approach - mixed KLD-EWMA control chart

Under nominal conditions, no abnormalities occur in the monitored process; thus, the value of the KLD fluctuates around zero due to measurement noise. A significant departure of the KLD from zero reveals an important deviation from normal behavior, indicating that the process is out of control. In this approach, the EWMA chart (Harrou and Nounou (2014)) is used to enhance process monitoring through its integration with the KLD statistic; the resulting chart is called the mixed KLD-EWMA chart. The KLD is used as the input to an EWMA control chart. Note that one fundamental difference between a Shewhart chart and an EWMA chart is that the former uses only the data observed at the current time point to detect variation caused by special causes, ignoring all data observed at earlier time points, while the latter uses all available data observed at both current and earlier time points. More specifically, in this approach the EWMA control scheme is used to monitor the mean of the KLD. Towards this end, the KLD measures the probability distance between the PDF of the current residuals and a reference PDF, as given in Equation (12). Defining the vector of KLD measurements as $KLD = [d_1, \ldots, d_j, \ldots, d_n]$, the EWMA decision function can be computed from the KLD measurements as follows:

$$z_t^{KLD} = \gamma d_t + (1 - \gamma) z_{t-1}^{KLD}, \quad t \in [1, n], \qquad (14)$$

where $\gamma$ is a smoothing parameter, with $0 < \gamma \leq 1$, and $\mu_0^{KLD}$ and $\sigma_{z_t^{KLD}}$ are the mean and the standard deviation of the KLD of the residuals obtained with fault-free data. The EWMA control scheme declares an anomaly when the value of $z_t^{KLD}$ falls outside the interval between the control limits. The upper and lower control limits, UCL and LCL, are set as (Montgomery (2005)):

$$UCL/LCL = \mu_0^{KLD} \pm L\sigma_{z_t^{KLD}},$$

where $L$ is a multiplier of the EWMA standard deviation $\sigma_{z_t^{KLD}}$. $L$ and $\gamma$ are two parameters that need to be set carefully (Montgomery (2005)). Since the EWMA control scheme is applied to the vector of KLD measurements, only one EWMA decision function is computed to monitor all variables. However, this approach can only detect the presence of anomalies; it cannot determine their locations. In the next section, the performance of the PLS-based KLD anomaly-detection schemes is evaluated and compared with that of the conventional PLS-based anomaly-detection scheme through simulated examples using synthetic data.

5. Simulated examples
5.1. Fault detection in synthetic data

In this section, we demonstrate the process-monitoring performance of the proposed methods through their ability to detect anomalies in synthetic data sets that contain several different types of anomaly scenarios. The proposed methods are compared with the conventional PLS-based methods (Q and T2) and with the PLS-based EWMA approach (Harrou et al. (2015)). The synthetic example is simulated to better interpret the proposed anomaly detection schemes.

5.1.1. Data generation

A simulation example involving six input variables and one output variable is considered. The input variables are generated as follows:
$$\begin{cases}
x_1 = u_1 + \varepsilon_1, \\
x_2 = u_1 + \varepsilon_2, \\
x_3 = u_1 + \varepsilon_3, \\
x_4 = u_2 + \varepsilon_4, \\
x_5 = 2u_2 + \varepsilon_5, \\
x_6 = 2x_1 + 2x_4 + \varepsilon_6,
\end{cases} \qquad (15)$$
where $\varepsilon_i$ represents measurement errors, which follow a zero-mean Gaussian distribution with a standard deviation of 0.095. The first two input variables, $u_1$ and $u_2$, represent a quad-chirp signal (a sinusoidal waveform with quadratically increasing frequency) and a mishmash signal (a signal that starts with low-frequency oscillations that increase with time), respectively. The other input variables are computed as linear combinations of the first two inputs, which means that the input matrix $X = [x_1, \ldots, x_6]$ is of rank 2. The output variable is then obtained as a linear combination of the input variables as follows:
$$y = \sum_{i=1}^{6} a_i x_i, \qquad (16)$$
where $a_i \in \{1, 2, 1, 1.5, 0.5\}$ with $i \in [1, 6]$. A total of one thousand samples were generated as the training data. These data are scaled to zero mean and unit variance for use in developing a PLS model. Using the cross-validation technique, two LVs were determined to be the optimal model dimension for the PLS reference model. After constructing the PLS reference model, a test data set of five hundred samples was also generated, with three artificial anomalies introduced to evaluate performance. Three different anomaly cases were simulated: an abrupt anomaly, intermittent anomalies, and a gradual anomaly (see Fig. 6). In the first case study, it is assumed that the testing data set contains additive bias anomalies (case A). In the second case study, it is assumed that the testing data set contains intermittent anomalies (case B). In the third case study, drift anomalies are assumed to occur in $x_1$ (case C). Before discussing the monitoring results, however, let us first distinguish between the considered anomalies and briefly give the performance measures used in this study to compare the various anomaly detection methods.
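The data-generation model of Equations (15) and (16) can be reproduced in a few lines. In this sketch (ours, not the authors'), simple chirp-like stand-ins replace the exact quad-chirp and mishmash signals, and the sixth coefficient a6 = 1 is an assumption, since the text lists only five values:

```python
import math
import random

def generate_data(n=1000, noise_sd=0.095, seed=1):
    """Synthetic inputs per Equation (15) and output per Equation (16)."""
    random.seed(seed)
    a = [1, 2, 1, 1.5, 0.5, 1]  # a6 = 1 assumed; the paper lists five values
    X, y = [], []
    for t in range(n):
        u1 = math.sin(2 * math.pi * 20 * (t / n) ** 3)      # quad-chirp stand-in
        u2 = math.sin(2 * math.pi * (t / n) * (1 + t / n))  # mishmash stand-in
        e = [random.gauss(0, noise_sd) for _ in range(6)]
        x = [u1 + e[0], u1 + e[1], u1 + e[2], u2 + e[3], 2 * u2 + e[4], 0.0]
        x[5] = 2 * x[0] + 2 * x[3] + e[5]  # x6 depends on the noisy x1 and x4
        X.append(x)
        y.append(sum(ai * xi for ai, xi in zip(a, x)))  # Equation (16)
    return X, y
```

A faulty test set is then obtained by adding a bias or drift profile to x1 before recomputing y.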
5.1.2. Considered anomalies

An anomaly is defined as an unpermitted deviation of at least one characteristic property of a variable from its acceptable behavior (Isermann (2005)). Anomalies can also be classified according to their time-variant behavior. Three types of anomalies can be distinguished: abrupt, incipient, and intermittent (see Fig. 6). An anomaly may develop abruptly, like a step function; incipiently, like a drift; or intermittently, with interruptions, as shown in Fig. 6. An abrupt anomaly is characterized by a sudden jump of a process variable or system from nominal to abnormal behavior (see Fig. 6). More specifically, when an abrupt anomaly occurs, the value of the signal jumps from its normal value to a new constant value x(t) + b. The time profile of a process variable with an abrupt anomaly, $x^f(t)$, is given by:

$$x^f(t) = \begin{cases} x(t) & \text{if } t < t_f \\ x(t) + b & \text{if } t \geq t_f, \end{cases} \qquad (17)$$
where $x(t)$ denotes the nominal variable value, $b$ is the bias term or the rate of increase (%), and $t_f$ is the time point of fault occurrence. An incipient anomaly slowly develops into a larger and larger deviation (see Fig. 6(b)). Slow degradation of a component can be viewed as an incipient anomaly. The time profile of a process variable with a gradual or incipient anomaly, $x^f(t)$, is given by:

$$x^f(t) = \begin{cases} x(t) & \text{if } t < t_f \\ x(t) + s\,(t - t_f) & \text{if } t \geq t_f, \end{cases} \qquad (18)$$
where $x(t)$ denotes the nominal variable value, $s$ is the slope of the drift and $t_f$ is the time point of fault occurrence. Intermittent anomalies are anomalies that occur and disappear repeatedly (see Fig. 6(c)). Here, we address the problem of detecting the abrupt, intermittent, and gradual anomalies encountered by various anomaly detection techniques that have been developed for the safe operation of systems or processes.

5.1.3. Performance measures

Anomaly detection performance is evaluated using the common criteria of missed detection rate and false alarm rate. The performance of the developed methods is compared to that of the PLS-based T2, Q, and EWMA charts. Rates of false alarm and missed detection were used to evaluate the performance of the monitoring charts; from a practical point of view, these are important detection performance criteria. A false alarm is an indication of an anomaly when no anomaly has occurred, while a missed detection denotes a failure to detect an anomaly that has occurred. The missed detection rate (MDR) is the number of anomalies that do not exceed the control limits (missed detections) over the total number of faulty samples.
$$MDR = \frac{\text{missed detections}}{\text{faulty data}} \times 100\%. \qquad (19)$$
The false alarm rate (FAR) is the number of false alarms over the total number of fault-free samples and is defined as:

$$FAR = \frac{\text{false alarms}}{\text{faultless data}} \times 100\%. \qquad (20)$$
The smaller the values of MDR and FAR, the better the performance of the corresponding monitoring chart.

5.1.4. Detection results

5.1.4.1. Case A: abrupt anomaly. In this case study, we investigated the problem of detecting an abrupt anomaly. The testing data used to compare the various anomaly-detection methods, which consist of 500 samples, are generated using the same model described earlier in Equations (15) and (16). To simulate an abrupt anomaly in the variable $x_1$, an additive anomaly with a magnitude of 5% of the total variation in $x_1$ is introduced between sample 300 and the end of the testing data. The performances of the Q(x) and T2 charts are shown in Fig. 7(a)-(b), respectively. The horizontal dashed lines represent the 95% confidence limit used to identify possible faults. These results show that the conventional PLS-based methods (Q(x) and T2) are completely unable to detect this small anomaly. This is because these conventional PLS-based anomaly-detection metrics only take into account the information provided by the present data sample in the decision-making process, which makes
Fig. 6. Types of anomalies: (a) abrupt; (b) incipient; (c) intermittent.
Fig. 7. The time evolution of the T2 (a), Q (b), EWMA (c), KLD-Shewhart (d) and KLD-EWMA (with L = 3 and γ = 0.2) (e) statistics in the presence of a bias anomaly in x1 from sample 300 to the end (Case A). The horizontal dashed line denotes the control limit and the blue-shaded region represents the zone where the anomaly is introduced to the test data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
these metrics insufficiently powerful for detecting small changes. Fig. 7(c) shows that the EWMA chart is capable of detecting this simulated anomaly, but with several regions of missed detection (i.e., MDR = 32%). Fig. 7(d) shows that the mixed KLD-Shewhart index detected this bias anomaly with fewer false alarms (i.e., FAR = 2%) and missed detections (MDR = 8%) than the other methods. The results of the mixed KLD-EWMA chart, illustrated in Fig. 7(e), clearly show the capability of this proposed method to detect a small anomaly without false alarms. This case study clearly shows the superiority of the KLD-EWMA over all other indices.

5.1.4.2. Case B: intermittent anomalies. Intermittent anomalies are repeated occurrences of transient anomalies (they appear, disappear, and then reappear). The aim of this case study was to assess the potential of the proposed PLS-based KLD method to detect intermittent anomalies. Towards this end, a small bias, 2% of the total variation in $x_1$, is injected during the intervals [200, 250] and [300, 500]. The results of the Q(x) and T2 statistics show that the anomalies remain undetectable with the conventional PLS statistics (see Fig. 8(a)-(b)). In other words, the conventional PLS approach fails to detect this small anomaly. The EWMA chart, in which $\lambda$ is chosen to be 0.2, is shown in Fig. 8(c). It can be seen that the EWMA chart can recognize these anomalies, but at the expense of missed detections. Results of the PLS-based KLD-Shewhart and KLD-EWMA anomaly detection algorithms for the considered intermittent anomaly are shown in Fig. 8(d) and 8(e), respectively. From Fig. 8(d), it can be seen that the KLD-Shewhart chart can detect the intermittent anomalies, but with some missed detections. In contrast to the conventional PLS, the results of the KLD-EWMA scheme, shown in Fig. 8(e), clearly indicate that the proposed strategy can successfully detect this anomaly.
The KLD-EWMA scheme was constructed using the EWMA parameters $\lambda = 0.2$ and $L = 3$. This case study further testifies to the superior performance of the KLD-EWMA over the other indices. The results of the monitoring charts were evaluated by calculating the percentages of false alarms and missed detections; the performances are summarized in Table 1. Fewer missed detections and false alarms confirm the greater efficiency of the KLD-EWMA chart over the other charts.

5.1.4.3. Case C: gradual anomaly. Gradual or incipient anomalies, for example a slow drift in a sensor, are more subtle, and their impact is less obvious. However, if left unattended for a long period of time, incipient anomalies might degrade the required performance of the inspected system, which could lead to catastrophic damage. Herein, the performance of the PLS-based KLD-EWMA and KLD-Shewhart anomaly detection methods is illustrated and compared with that of conventional PLS. The testing data set, which is simulated using the same model given in Equation (15), consists of 500 data samples. The testing data set is first scaled with the mean and standard deviation of the training fault-free data. The aim of this case study is to assess the potential of the proposed PLS-based KLD anomaly detection methods to detect incipient or gradual anomalies. To do so, a slow increase in the input variable $x_1$ with a slope of 0.01 was added to the test data starting at sample number 300 of the simulated testing data. The Q(x) and T2 statistics for this case are plotted in Fig. 9(a) and 9(b), respectively. The Q statistic exceeds the control limit at the 400th sample, while the T2 statistic indicates that no anomaly has been detected. Thus, the T2 chart is completely unable to detect this simulated anomaly. The EWMA chart, shown in Fig. 9(c), crosses the decision threshold with its first signal at sample number 340, while the KLD-Shewhart chart, shown in Fig. 9(d), gives its first signal at sample number 309. The results of the KLD-EWMA scheme shown in Fig. 9(e) clearly indicate that the proposed strategy successfully
Fig. 8. The time evolution of the T2 (a), Q (b), EWMA (c), KLD-Shewhart (d) and KLD-EWMA (with L = 3 and γ = 0.2) (e) statistics in the presence of an intermittent anomaly in x1 in the intervals [200, 250] ∪ [300, 500] (Case B). The horizontal dashed line denotes the control limit and the blue-shaded region represents the zone where the anomaly is introduced to the test data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 1
Percentage of false alarms and missed detections for all statistical charts.

Statistical chart    Case A              Case B
                     FAR      MDR        FAR      MDR
T2                   0.66     100        0.4      0.8
Q                    0        100        0        100
EWMA                 0        32         0        58.8
KLD-Shewhart         2        8          2        22
KLD-EWMA             0        0          1.6      0
detects this anomaly. The KLD-EWMA statistic gradually increased as the anomaly slowly developed, and began to violate the threshold once the size of the anomaly became sufficiently large to be detected. From the plot, the KLD-EWMA chart gives its first anomaly signal at sample number 309. In the KLD-EWMA chart, $\lambda$ and $L$ are chosen to be 0.2 and 3, respectively. These results testify to the superior performance of the PLS-based KLD charts over all other indices.
5.2. Anomaly-detection in simulated distillation column data

In this section, the effectiveness of the proposed methods is tested using a distillation column process simulated by Aspen Tech 7.2 (see Madakyaru et al. (2013) for details) with added zero-mean Gaussian noise. The predictor variables consist of ten temperatures (Tc) in different trays of the column, the feed flow rates, and the reflux stream, and the response variable is the composition of the light component in the distillate stream (XD). To construct a PLS model, the process data are split into two parts (i.e., training and testing data sets, each containing 512 observations). The PLS model is constructed from the fault-free training data set. Plots of these fault-free measurement data are shown in Fig. 10(a)-(d) for the case where the signal-to-noise ratio is 10. The optimal number of LVs is selected by the cross-validation method, based on prediction ability on the testing data set. After mean-centering and scaling all variables to unit standard deviation, a PLS model was constructed based on the training data. Based on the cross-validation technique, three LVs were required for the PLS model. We then introduce three types of anomalies into the testing data sets and compare the performance of the different anomaly detection techniques. In the first case study, it is assumed that the testing data set contains additive bias anomalies, case (i). In the second case study, intermittent anomalies are assumed to occur in the sensor measuring the temperature Tc3, case (ii). In the third case study, it is assumed that the testing data set contains drift sensor anomalies, case (iii).
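Both examples rest on fitting a PLS model to fault-free training data and predicting the response on new data. As a self-contained sketch, a bare-bones PLS1 routine (single response, written from the standard NIPALS-style deflation) is shown below; this is our illustration of the general technique, and a real application would use a vetted chemometrics library and cross-validate the number of LVs:

```python
def pls1_fit(X, y, n_comp):
    """Minimal PLS1 (single response) via NIPALS-style deflation.
    X is a list of rows, y a list; data are mean-centered internally.
    Returns a function that predicts y for a new row."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    E = [[row[j] - xbar[j] for j in range(p)] for row in X]
    f = [yi - ybar for yi in y]
    W, P, C = [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:          # residuals exhausted
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        pvec = [sum(t[i] * E[i][j] for i in range(n)) / tt for j in range(p)]
        c = sum(t[i] * f[i] for i in range(n)) / tt
        for i in range(n):        # deflate X and y
            for j in range(p):
                E[i][j] -= t[i] * pvec[j]
            f[i] -= c * t[i]
        W.append(w); P.append(pvec); C.append(c)

    def predict(row):
        e = [row[j] - xbar[j] for j in range(p)]
        yhat = ybar
        for w, pvec, c in zip(W, P, C):
            score = sum(e[j] * w[j] for j in range(p))
            yhat += c * score
            e = [e[j] - score * pvec[j] for j in range(p)]
        return yhat

    return predict
```

Residuals y - predict(x) computed on new data then feed the KLD statistics of Section 4.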
5.2.1. Case (i): abrupt anomaly - bias sensor anomaly

In this case study, a bias anomaly of 2% of the total variation in temperature Tc3 was incorporated into the temperature sensor measurements Tc3 between samples 100 and 150. The performances of the Q and T2 statistics are shown in Fig. 11(a) and 11(b), respectively. The dashed lines represent the 95% confidence limit used to identify possible anomalies. These results show that the conventional PLS-based methods (Q and T2) are completely unable to detect this small bias anomaly. The EWMA chart for the considered testing data is shown in Fig. 11(c). This chart is capable of detecting this small anomaly, but with a few false alarms. The smoothing parameter $\lambda$ and the control limit width $L$ used are 0.2 and 3, respectively. The application of the two developed charts to the testing data with the bias sensor anomaly is shown in Fig. 11(d)-(e). The plots show that all of the introduced abrupt anomalies are detected by the KLD-based charts. The plot in Fig. 11(e) clearly shows the capability of the proposed method to detect this small anomaly without false alarms.
Fig. 9. The time evolution of the T2 (a), Q (b), EWMA (c), KLD-Shewhart (d), and KLD-EWMA (with L = 3 and γ = 0.2) (e) statistics in the presence of a drift fault with slope 0.01 in x1 (Case C). The horizontal dashed line denotes the control limit and the blue-shaded region represents the zone where the anomaly is introduced to the test data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. Distillation Column Example: Dynamic input-output data used for training and testing the model with noise SNR = 10 (solid red line: noise-free data; blue dots: noisy data). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
5.2.2. Case (ii): intermittent anomalies - intermittent bias sensor anomaly

The objective of this second case study is to assess the performance of the proposed methods in detecting intermittent anomalies, which can occur and disappear repeatedly. To this end, a small bias of 1% of the total variation in temperature Tc3 is introduced in the sample interval [50, 100], and a bias of 2% is introduced in the sample interval [150, 250]. The monitoring results of this case study are given in Fig. 12(a)-(e). Traditional PLS (the T2 and Q statistics) cannot detect this change (see Fig. 12(a) and (b)). The monitoring results of the PLS-based EWMA chart are given in Fig. 12(c); in the EWMA chart, $\lambda$ is set at 0.25 and $L$ is set at 3. The intermittent anomaly is successfully detected after it is introduced into the process; however, many EWMA statistic values stay below the corresponding decision threshold, which means the anomaly cannot be detected continuously. The monitoring results of both new charts,
Fig. 11. The time evolution of the T2 (a), Q (b), EWMA (with L = 3 and γ = 0.2) (c), KLD-Shewhart (d) and KLD-EWMA (with L = 3 and γ = 0.2) (e) statistics in the presence of a bias anomaly in the temperature sensor measurements 'Tc3' (Case (i)). The horizontal dashed line denotes the control limit and the blue-shaded region represents the zone where the anomaly is introduced to the test data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 12. The time evolution of the T2 (a), Q (b), EWMA (c), KLD-Shewhart (d) and KLD-EWMA (with L = 3 and γ = 0.2) (e) statistics in the presence of intermittent anomalies in x5 (Case (ii)). The horizontal dashed line denotes the control limit and the blue-shaded region represents the zone where the anomaly is introduced to the test data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 2
Percentage of false alarms and missed detections of monitoring statistics.

Statistical chart    Case A - example 1    Case A - example 2
                     FAR      MDR          FAR      MDR
T2                   7.75     100          7.75     100
Q                    4.75     100          4.75     100
EWMA                 1.5      0            1.42     31.33
KLD-Shewhart         1.25     0            1.42     4.66
KLD-EWMA             0.75     0            0.57     0.66
KLD-Shewhart and KLD-EWMA, are given in Fig. 12(d)-(e). Comparing the results shown in Fig. 12(c), (d), and (e) shows that the monitoring performance improves greatly, especially with the KLD-EWMA statistic. The parameters $\lambda$ and $L$ for the KLD-EWMA monitoring chart are the same as in the first case study. This case clearly highlights the advantage of the PLS-based KLD methods over the conventional PLS method, especially in the case of small anomalies. The reason the PLS-based KLD anomaly detection methods outperform the conventional PLS method lies in the fact that the former use the KLD metric, which is well known for its high sensitivity in measuring the dissimilarity between the two distributions before and after an anomaly. The FAR and MDR are key indices for evaluating anomaly detection. Table 2 shows the FAR and MDR of all monitoring statistics; the smaller the MDR and FAR values, the better the performance of the corresponding statistic. The superiority of the KLD-EWMA method over all of the other methods is confirmed by the results in Table 2, which show that the KLD-EWMA method has the lowest FAR and MDR among all of the monitoring indices.
5.2.3. Case (iii): gradual anomaly - slow drift sensor anomaly

A slow drifting or ramping sensor anomaly generally reflects a slow or gradual degradation of sensor characteristics over a long period of time. Owing to the small magnitude of the anomaly, its impact is not immediately obvious. This case aims to assess the potential of the proposed PLS-based KLD anomaly detection schemes to detect a slow drift anomaly. A slow drifting sensor anomaly with a slope of 0.01 was added to the temperature sensor Tc3 starting at sample 250 and lasting until the end of the testing data. The monitoring results of the T2 and Q statistics are shown in Fig. 13(a)-(b). Fig. 13(a) shows that the T2 statistic cannot detect this incipient change. We can therefore conclude that the T2 statistic cannot detect anomalies of low amplitude, because the variation of the projection of these anomalies in the principal subspace can be masked by variations in the measurements of normal operation. Fig. 13(b) shows that the Q chart detected the anomaly at sample 356. The EWMA statistic gradually increased as the anomaly slowly developed, and began to violate the decision threshold at the 300th sample, when the magnitude of the anomaly became sufficiently large to be detected by the given model (see Fig. 13(c)). The two new charts, KLD-Shewhart and KLD-EWMA, increased linearly from sample 250 onward, exceeding the control limits at samples 290 and 288, respectively (see Fig. 13(d)-(e)). The KLD-EWMA chart detected the slow drift of the mean much faster than all of the other charts. The superiority of the new KLD-EWMA chart over all other indices is thus verified again, both in sensitivity and in detection speed. In summary, the results of this study show that the KLD-EWMA chart outperforms the other charts in the cases we considered, detecting all anomalies with a smaller number of false alarms. These results show that the KLD measure is an effective anomaly indicator, with high sensitivity to small changes.
We have also shown that KLD-based charts are suitable for detecting different
Fig. 13. The time evolution of the T2 (a), Q (b), EWMA (c), KLD-Shewhart (d), and KLD-EWMA (with L = 3 and γ = 0.2) (e) statistics in the presence of a drift sensor anomaly in x5 (Case (iii)). The horizontal dashed line denotes the control limit and the blue-shaded region represents the zone where the anomaly is introduced to the test data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
types of anomalies: abrupt, intermittent, and gradual. Therefore, based upon the two case studies, the new method offers a more sensitive process-monitoring capability than existing methods.

6. Conclusion

Early detection of small faults is necessary to prevent the serious economic consequences that can result from an accumulation of small anomalies. In this paper, an incipient anomaly detection technique based on the KLD measure was developed. PLS was used as the modeling framework of the proposed PLS-based KLD anomaly detection methodology. The KLD measure is proposed as an anomaly indicator; specifically, it is used to measure the dissimilarity between the probability density functions of fault-free and faulty data. The KLD should be null when the process is under nominal conditions, apart from small fluctuations caused by measurement noise, and should deviate significantly from zero when an anomaly occurs. This paper reports the development of two new monitoring charts based on the KLD measure: the KLD-Shewhart and KLD-EWMA charts. Compared to the T2, Q, and PLS-based EWMA charts given in Harrou et al. (2015), the KLD monitoring strategy using PLS is conceptually more straightforward and more sensitive for the detection of incipient anomalies. The effectiveness and superiority of the KLD-Shewhart and KLD-EWMA charts were demonstrated with synthetic data and data from a simulated distillation column. We showed that satisfactory detection results were obtained using the proposed method, especially for detecting small anomalies. The anomaly detection methods developed here rely on linear PLS. However, most environmental and chemical processes are nonlinear and may exhibit dynamic characteristics. Therefore, in future work, we plan to extend the advantages of the KLD-EWMA and KLD-Shewhart anomaly detection charts to handle nonlinearities (e.g., using kernel PLS).
Acknowledgement

This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No: OSR-2015-CRG4-2582.

References

Anderson, A., Haas, H., 2011. Kullback-Leibler divergence (KLD) based anomaly detection and monotonic sequence analysis. In: IEEE Vehicular Technology Conference (VTC Fall). IEEE, pp. 1-5.
Basseville, M., 2013. Divergence measures for statistical data processing - an annotated bibliography. Signal Process. 93 (4), 621-633.
Basseville, M., Nikiforov, I., 1993. Detection of Abrupt Change: Theory and Application. Prentice Hall (Information and System Sciences Series).
Belov, D., Armstrong, R., 2011. Distributions of the Kullback-Leibler divergence with applications. Br. J. Math. Stat. Psychol. 64 (2), 291-309.
Bissell, D., 1994. Statistical Methods for SPC and TQM, vol. 26. CRC Press.
Bittencourt, A., Saarinen, K., Sander-Tavallaey, S., Gunnarsson, S., Norrlöf, M., 2014. A data-driven approach to diagnostics of repetitive processes in the distribution domain - applications to gearbox diagnostics in industrial robots and rotating machines. Mechatronics 24 (8), 1032-1041.
Borovkov, A., 1998. Mathematical Statistics. Gordon and Breach Sciences Publishers, Amsterdam.
Chaing, L., Russel, E., Braatz, R., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer, London.
Chen, H., Chung, A., Yu, S.S., Norbash, A., Wells, W., 2003. Multi-modal image registration by minimizing Kullback-Leibler distance between expected and observed joint class histograms. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2. IEEE, pp. II-570.
Coetzee, F., 2005. Correcting the Kullback-Leibler distance for feature selection. Pattern Recognit. Lett. 26 (11), 1675-1683.
Dhara, V., Dhara, R., 2002. The Union Carbide disaster in Bhopal: a review of health effects. Arch. Environ. Health Int. J. 57 (5), 391-404.
Dong, J., Zhang, K., Huang, Y., Li, G., Peng, K., 2015. Adaptive total PLS based quality-relevant process monitoring with application to the Tennessee Eastman process. Neurocomputing 154, 77-85.
Frank, L., Friedman, J., 1993. A statistical view of some chemometrics regression tools. Technometrics 35 (2), 109-135.
Geladi, P., Kowalski, B., 1986. Partial least-squares regression: a tutorial. Anal. Chim. Acta 185, 1-17.
Gertler, J., 1998. Fault Detection and Diagnosis in Engineering Systems. CRC Press.
Gupta, A., Parameswaran, S., Lee, C.-H., 2009. Classification of electroencephalography (EEG) signals for different mental activities using Kullback-Leibler (KL) divergence. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP. IEEE, pp. 1697-1700.
Harrou, F., Fillatre, L., Nikiforov, I., 2014. Anomaly detection/detectability for a linear model with a bounded nuisance parameter. Annu. Rev. Control 38 (1), 32-44.
Harrou, F., Nounou, M., 2014. Monitoring linear antenna arrays using an exponentially weighted moving average-based fault detection scheme. Syst. Sci. Control Eng. Open Access J. 2 (1), 433-443.
Harrou, F., Nounou, M.H.N.N., Madakyaru, M., 2015. PLS-based EWMA fault detection strategy for process monitoring. J. Loss Prev. Process Ind. 38 (1), 108-119.
Hotelling, H., 1933. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417-441.
Isermann, R., 2005. Model-based fault-detection and diagnosis: status and applications. Annu. Rev. Control 29, 71-85.
Jackson, J., Mudholkar, G., 1979. Control procedures for residuals associated with principal component analysis. Technometrics 21, 341-349.
Kembhavi, A., Harwood, D., Davis, L., 2011. Vehicle detection using partial least squares. IEEE Trans. Pattern Anal. Mach. Intell. 33 (6), 1250-1265.
Khedher, L., Ramírez, J., Górriz, J., Brahim, A., Segovia, F., 2015. Early diagnosis of Alzheimer's disease based on partial least squares, principal component analysis and support vector machine using segmented MRI images. Neurocomputing 151, 139-150.
Kinnaert, M., Nyberg, M., Basseville, M., 2001. Discussion on: 'On fault detectability and isolability' by M. Basseville. Eur. J. Control 7 (6), 638-641.
Kourti, T., MacGregor, J., 1995. Process analysis, monitoring and diagnosis using multivariate projection methods: a tutorial. Chemom. Intell. Lab. Syst. 28 (3), 3-21.
Kullback, S., Leibler, R., 1951. On information and sufficiency. Ann. Math. Stat., 79-86.
Lee, G., Han, C., Yoon, E., 2004. Multiple-fault diagnosis of the Tennessee Eastman process based on system decomposition and dynamic PLS. Ind. Eng. Chem. Res. 43 (25), 8037-8048.
Lee, H., Lee, M., Park, J., 2009. Multi-scale extension of PLS algorithm for advanced on-line process monitoring. Chemom. Intell. Lab. Syst. 98 (2), 201-212.
Li, B., Morris, J., Martin, E., 2002. Model selection for partial least squares regression. Chemom. Intell. Lab. Syst. 64 (1), 79-89.
Li, G., Zeng, X., Yang, J., Yang, M., 2007. Partial least squares based dimension reduction with gene selection for tumor classification. In: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007). IEEE, pp. 1439-1444.
Liese, F., Miescke, K.-J., 2008. Statistical Decision Theory: Estimation, Testing, and Selection. Springer Science & Business Media.
Lu, N., Gao, F., Wang, F., 2004. Sub-PCA modeling and on-line monitoring strategy for batch processes. AIChE J. 50 (1), 255-259.
Lucas, J., Saccucci, M., 1990. Exponentially weighted moving average control schemes: properties and enhancements. Technometrics 32 (1), 1-12.
MacGregor, J., Jaeckle, C., Kiparissides, C., Koutoudi, M., 1994. Process monitoring and diagnosis by multiblock PLS methods. AIChE J. 40 (5), 826-838.
Madakyaru, M., Nounou, M., Nounou, H., 2013. Integrated multiscale latent variable regression and application to distillation columns. Model. Simul. Eng. 2013, 3.
Montgomery, D.C., 2005. Introduction to Statistical Quality Control. John Wiley & Sons, New York.
Muradore, R., Fiorini, P., 2012. A PLS-based statistical approach for fault detection and isolation of robotic manipulators. IEEE Trans. Ind. Electron. 59 (8), 3167-3175.
Neumann, J., Deerberg, G., Schlüter, S., 1999. Early detection and identification of dangerous states in chemical plants using neural networks. J. Loss Prev. Process Ind. 12 (6), 451-453.
Nimmo, I., 1995. Adequately address abnormal operations. Chem. Eng. Prog. 91 (9).
Nomikos, P., MacGregor, J., 1995. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 30 (1), 97-108.
Pardo, L., 2005. Statistical Inference Based on Divergence Measures. CRC Press.
Paté-Cornell, M.E., 1993. Learning from the Piper Alpha accident: a postmortem analysis of technical and organizational factors. Risk Anal. 13 (2), 215-232.
Peng, K., Zhang, K., You, B., Dong, J., 2015. Quality-related prediction and monitoring of multi-mode processes using multiple PLS with application to an industrial hot strip mill. Neurocomputing 168, 1094-1103.
Qin, S., 1998. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 22 (4), 503-514.
Qin, S., 2003. Statistical process monitoring: basics and beyond. J. Chemom. 17 (8/9), 480-502.
Romano, D., Kinnaert, M., 2006. Robust fault detection and isolation based on the Kullback divergence. In: Fault Detection, Supervision and Safety of Technical Processes, vol. 6, pp. 426-431.
Roodbali, E., Sadegh, M., Shahbazian, M., 2011. Multi-scale PLS modeling for industrial process monitoring. Int. J. Comput. Appl. 26 (6).
Serpas, M., Chu, Y., Hahn, J., 2013. Fault detection approach for systems involving soft sensors. J. Loss Prev. Process Ind. 26 (3), 443-452.
Teppola, P., Minkkinen, P., 2000. Wavelet-PLS regression models for both exploratory data analysis and process monitoring. J. Chemom. 14 (5-6), 383-399.
Ullah, A., 1996. Entropy, divergence and distance measures with econometric applications. J. Stat. Plan. Inference 49 (1), 137-162.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S., Yin, K., 2003a. A review of process fault detection and diagnosis part I: quantitative model-based methods. Comput. Chem. Eng. 27, 293-311.
Venkatasubramanian, V., Rengaswamy, R., Kavuri, S., Yin, K., 2003b. A review of process fault detection and diagnosis part III: process history based methods. Comput. Chem. Eng. 27, 327-346.
Wold, S., Ruhe, A., Wold, H., Dunn III, W.J., 1984. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J. Sci. Stat. Comput. 5 (3), 735-743.
Yin, S., Ding, S., Haghani, A., Hao, H., Zhang, P., 2012. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Control 22 (9), 1567-1581.
Yin, S., Ding, S.X., Xie, X., Luo, H., 2014. A review on basic data-driven approaches for industrial process monitoring. IEEE Trans. Ind. Electron. 61 (11), 6418-6428.
Yin, S., Zhu, X., Kaynak, O., 2015. Improved PLS focused on key-performance-indicator-related fault diagnosis. IEEE Trans. Ind. Electron. 62 (3), 1651-1658.
Zeng, J., Kruger, U., Geluk, J., Wang, X., Xie, L., 2014. Detecting abnormal situations using the Kullback-Leibler divergence. Automatica 50 (11), 2777-2786.
Zhang, W., Shan, S., Chen, X., Gao, W., 2007. Local Gabor binary patterns based on Kullback-Leibler divergence for partially occluded face recognition. IEEE Signal Process. Lett. 14 (11), 875-878.