9th International Symposium on Advanced Control of Chemical Processes
June 7-10, 2015. Whistler, British Columbia, Canada
Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-8 (2015) 617–622
Robust Process Monitoring via Stable Principal Component Pursuit
Chun-Yu Chen, Yuan Yao*
Department of Chemical Engineering, National Tsing Hua University, Hsinchu, 30013, Taiwan, ROC
* (Corresponding author: Tel: 886-3-5713690, Fax: 886-3-5715408, Email: [email protected])
Abstract: To enhance product quality and operational safety, statistical process monitoring has become an important technique in the process industries, where principal component analysis (PCA) is a commonly used method. However, PCA assumes that the training data matrix contains only an underlying low-rank structure corrupted by dense noise. When gross sparse errors, i.e. outliers, exist, PCA often fails. In this paper, a robust matrix recovery method called stable principal component pursuit (SPCP) is utilized to solve this problem. A process modeling and monitoring procedure is developed based on SPCP, and its effectiveness is illustrated using the benchmark Tennessee Eastman process.
Keywords: robust process monitoring, stable principal component pursuit, singular value thresholding, matrix recovery, principal component analysis
© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
1. INTRODUCTION
In the area of process monitoring, multivariate statistical methods have become popular for two reasons. Firstly, modern production processes are becoming increasingly complex, so modeling them with traditional first-principles or knowledge-based methods is usually time-consuming. Secondly, the widely used distributed control systems (DCS) collect large amounts of process data that contain useful process information. In this situation, data-based statistical models, which require little process knowledge and are easy to build, have attracted increasing attention over the past decades (Qin, 2012). Among them, principal component analysis (PCA) (Jolliffe, 2002) is probably the most commonly used method.
In the process modeling step, PCA linearly projects a training data set recorded under normal operation from the observed high-dimensional variable space into a low-dimensional latent space, while maximizing the captured variance. From another point of view, if the normalized training data are stored in a data matrix X, PCA decomposes X into a low-rank matrix L containing systematic variation information and a matrix N comprising small dense noise, i.e. X = L + N, by conducting singular value decomposition (SVD). It is well established that PCA gives a statistically optimal estimate of the low-rank subspace if the process disturbance is independently and identically Gaussian distributed. Owing to this property, many extensions of PCA have been proposed for monitoring different types of processes, e.g. (Nomikos and MacGregor, 1995, Ku et al., 1995, Lee et al., 2004, Lu et al., 2004, Yao et al., 2010).
However, PCA performs poorly when the training data set contains gross sparse errors. Even with a small number of grossly corrupted entries in X, PCA fails to estimate the low-rank subspace correctly. In other words, PCA is not robust to process outliers.
Many methods have been proposed to deal with this problem, including influence function techniques (Huber, 1981, De la Torre and Black, 2003), multivariate trimming (Gnanadesikan and Kettenring, 1972), alternating minimization (Ke and Kanade, 2005), and random sampling techniques (Fischler and Bolles, 1981). Nevertheless, none of these approaches yields a polynomial-time algorithm with strong performance guarantees (Candès et al., 2011). To address this, Candès et al. (2011) proposed another robust version of PCA, named principal component pursuit (PCP), which aims to recover a low-rank matrix L from observations X = L + S, where the unknown sparse matrix S consists of gross sparse errors. Unlike the entries of N in conventional PCA, which must be small, the entries of S can have arbitrarily large magnitudes.
However, the PCP method ignores the dense noise term, which makes it unsuitable for process monitoring, since measurement noise and other routine disturbances always exist in process measurements.
In this paper, a recently proposed robust matrix recovery method called stable principal component pursuit (SPCP) is extended to the field of process monitoring. The rest of the paper is organized as follows. Section 2 introduces the methodology, including singular value thresholding (SVT), SPCP, and SPCP-based process modeling and online monitoring. Section 3 illustrates the effectiveness of the proposed method using the benchmark Tennessee Eastman process. Finally, conclusions are drawn in Section 4.
2. METHODOLOGY
2.1 Singular Value Thresholding and Stable Principal Component Pursuit
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.09.036
IFAC ADCHEM 2015 618 June 7-10, 2015. Whistler, BC, Canada
Chun-Yu Chen et al. / IFAC-PapersOnLine 48-8 (2015) 617–622
As is well known, a PCA model can be calculated using SVD. Any $n \times m$ matrix $\mathbf{X}$ can be factorized as
$$\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T, \tag{1}$$
where $\mathbf{U}$ contains the left singular vectors as orthonormal columns, $\boldsymbol{\Sigma}$ is a diagonal matrix with the singular values sorted in descending order, and $\mathbf{V}$ is an orthogonal matrix containing the right singular vectors. The singular values are unique, while the singular vectors are unique only up to sign when the singular values are distinct. If $\mathbf{X}$ is a normalized process data matrix, $n$ is usually the number of observations and $m$ is the number of variables. The right singular vectors $\mathbf{V}$ are the same as the loading vectors in PCA. Thus, the PCA scores $\mathbf{T}$ can be derived as
$$\mathbf{T} = \mathbf{X}\mathbf{V}. \tag{2}$$
Recently, a singular value thresholding (SVT) algorithm (Cai et al., 2010) was proposed to recover a low-rank matrix from a data matrix corrupted by small dense noise on each entry. It can be formulated as
$$D_\tau(\mathbf{X}) := \mathbf{U} D_\tau(\boldsymbol{\Sigma}) \mathbf{V}^T, \tag{3}$$
where $\mathbf{U}$ and $\mathbf{V}$ are the left and right singular vectors of $\mathbf{X}$, respectively, and $D_\tau$ is the soft-thresholding operator. For each $\tau \ge 0$,
$$D_\tau(\boldsymbol{\Sigma}) = \operatorname{diag}\left(\{\sigma_i - \tau\}_+\right), \tag{4}$$
where $\sigma_i$ is the $i$-th largest singular value contained in $\boldsymbol{\Sigma}$, and the subscript "+" denotes the positive part, i.e. $t_+ = \max(0, t)$. This operator applies a soft-thresholding rule to the singular values of $\mathbf{X}$, shrinking them toward zero. As a result, if many of the singular values are below the threshold $\tau$, the rank of $D_\tau(\mathbf{X})$ is considerably lower than that of $\mathbf{X}$. Thus, $D_\tau(\mathbf{X})$ describes the low-rank matrix recovered from $\mathbf{X}$. It has been proved that
$$D_\tau(\mathbf{X}) = \arg\min_{\mathbf{L}} \; \frac{1}{2}\|\mathbf{X} - \mathbf{L}\|_F^2 + \tau\|\mathbf{L}\|_*, \tag{5}$$
where $\|\mathbf{A}\|_*$ represents the nuclear norm of a matrix $\mathbf{A}$, i.e. the sum of its singular values, and $\|\mathbf{A}\|_F$ denotes the Frobenius norm of $\mathbf{A}$. Consequently, after conducting SVT, the original data matrix $\mathbf{X}$ is decomposed as
$$\mathbf{X} = \mathbf{L} + \mathbf{N}, \tag{6}$$
where $\mathbf{L} = D_\tau(\mathbf{X})$ represents the low-rank structure and $\mathbf{N} = \mathbf{X} - \mathbf{L}$ is the matrix of dense noise. An efficient algorithm has also been developed to calculate $\mathbf{L}$ directly without computing an SVD (Cai and Osher, 2013).
From the above discussion, SVT can be regarded as an alternative algorithm for computing PCA, since it keeps the singular vectors of $\mathbf{X}$ and only shrinks the singular values by soft thresholding. However, like SVD, SVT cannot handle sparse errors either.
To cope with this problem, stable principal component pursuit (SPCP) (Zhou et al., 2010) was developed on the basis of SVT. SPCP assumes that the data matrix $\mathbf{X}$ consists of three parts:
$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{N}, \tag{7}$$
where $\mathbf{S}$ is a sparse matrix with most of its entries equal to zero. To estimate the unknown matrices $\mathbf{L}$, $\mathbf{S}$ and $\mathbf{N}$ in (7), SPCP solves the following optimization problem:
$$\min_{\mathbf{L},\,\mathbf{S}} \; \|\mathbf{L}\|_* + \lambda\|\mathbf{S}\|_1 + \frac{1}{2\mu}\|\mathbf{X} - \mathbf{L} - \mathbf{S}\|_F^2, \tag{8}$$
where $\|\mathbf{A}\|_1$ is the $\ell_1$ norm of $\mathbf{A}$ viewed as a vector, i.e. the sum of the absolute values of its entries. The two parameters $\lambda$ and $\mu$ in (8) relate to the sparse error term and the dense noise term, respectively. This problem can be solved efficiently by a fast proximal gradient method named accelerated proximal gradient (APG) (Lin et al., 2009), while the selection of $\lambda$ and $\mu$ is discussed in (Zhou et al., 2010).
As stated in (Zhou et al., 2010), if $\mathbf{S}$ is fixed to be $\mathbf{0}$, the solution $\mathbf{L}$ of (8) equals the singular value thresholding of $\mathbf{X}$ with threshold $\mu$, i.e. $D_\mu(\mathbf{X})$; this follows from (5) after multiplying the objective of (8) by $\mu$. This means that when there are no sparse outliers in $\mathbf{X}$, SPCP performs the same as SVT. In more general cases where $\mathbf{S}$ cannot be neglected, the solution of (8) provides stable estimates of both $\mathbf{L}$ and $\mathbf{S}$. Therefore, the PCA loadings can be obtained by calculating the right singular vectors of $\mathbf{L}$, even if the original data set $\mathbf{X}$ is grossly corrupted by sparse outliers. In this sense, SPCP is a robust version of PCA.
2.2 Process Modeling and Online Monitoring Using SPCP
Before SPCP is conducted on a training set of process data, the data should be normalized to eliminate the effects of engineering units and measurement ranges. Conventionally, the most widely adopted normalization approach is autoscaling, i.e. mean-centering followed by division by the standard deviation. However, the mean is not a robust measure of central tendency, and the standard deviation is not a robust measure of scale: both are easily affected by outliers in the data. Therefore, robust statistics should be used instead. In this paper, the sample median and the median absolute deviation (MAD) of each variable are adopted for illustration, where the MAD is the median of the absolute differences between the data values and their median. Other robust statistics, such as the interquartile range (IQR), Sn and Qn (Rousseeuw and Croux, 1993), may also be chosen.
In the next step, SPCP is utilized to decompose the normalized data set $\mathbf{X}$ into three parts $\mathbf{L}$, $\mathbf{S}$ and $\mathbf{N}$, where $\mathbf{L}$ contains the low-rank structure reflecting systematic variation, the sparse matrix $\mathbf{S}$ indicates the outliers corrupting the data set, and $\mathbf{N}$ contains the measurement noise.
Then, by conducting SVD on $\mathbf{L}$ instead of the original data matrix $\mathbf{X}$, the PCA loading vectors $\mathbf{V}$ are obtained as the right singular vectors of $\mathbf{L}$.
After that, the monitoring statistics $T^2$ and squared prediction error (SPE), together with their control limits, are calculated in a similar way to classical PCA.
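The modeling steps just described (robust scaling, SPCP decomposition by (8), and SVD of $\mathbf{L}$), together with the statistics (9)-(12) and (15) detailed next, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The solver is a plain FISTA-type accelerated proximal gradient loop on (8), which may differ from the APG variant of Lin et al. (2009) (for example, it uses no continuation on $\mu$). All function names, iteration limits and the handling of constant columns are hypothetical choices, and the default $\lambda$ and $\mu$ follow the rule of Zhou et al. (2010) given later as (16)-(17). Control limits (13)-(14) are not computed here.

```python
# Illustrative sketch with hypothetical helper names; not the authors' implementation.
import numpy as np


def robust_scale(X):
    """Column-wise median/MAD scaling (Section 2.2) instead of mean/std autoscaling."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # assumption: constant columns are left unscaled
    return (X - med) / mad, med, mad


def svt(X, tau):
    """Singular value thresholding D_tau(X) = U diag({sigma_i - tau}_+) V^T, eqs. (3)-(4)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(X, tau):
    """Entrywise soft thresholding, i.e. the proximal map of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def spcp(X, lam=None, mu=None, sigma=None, n_iter=500, tol=1e-7):
    """Approximately solve eq. (8): min ||L||_* + lam*||S||_1 + ||X - L - S||_F^2 / (2*mu).

    FISTA-type accelerated proximal gradient on the pair (L, S). The smooth
    term has a (2/mu)-Lipschitz gradient, so the step size is mu/2.
    Defaults follow eqs. (16)-(17); sigma is required if mu is not given.
    """
    X = np.asarray(X, dtype=float)
    p = max(X.shape)
    if lam is None:
        lam = 1.0 / np.sqrt(p)
    if mu is None:
        if sigma is None:
            raise ValueError("give mu, or sigma so that mu = sqrt(2p) * sigma")
        mu = np.sqrt(2.0 * p) * sigma
    step = mu / 2.0
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    YL, YS, t = L, S, 1.0  # extrapolated points and momentum parameter
    scale = max(1.0, np.linalg.norm(X))
    for _ in range(n_iter):
        G = (YL + YS - X) / mu                   # gradient of the smooth term (same for L and S)
        L_new = svt(YL - step * G, step)         # prox of step * ||.||_*
        S_new = soft(YS - step * G, step * lam)  # prox of step * lam * ||.||_1
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        w = (t - 1.0) / t_new                    # momentum weight
        YL = L_new + w * (L_new - L)
        YS = S_new + w * (S_new - S)
        change = np.linalg.norm(L_new - L) + np.linalg.norm(S_new - S)
        L, S, t = L_new, S_new, t_new
        if change < tol * scale:
            break
    return L, S


def fit_spcp_monitor(X_raw, A, **spcp_kwargs):
    """Robust scaling -> SPCP -> SVD of L for the loadings; assumes n >= m."""
    X, med, mad = robust_scale(np.asarray(X_raw, dtype=float))
    L, S = spcp(X, **spcp_kwargs)
    _, _, Vt = np.linalg.svd(L, full_matrices=False)
    V = Vt.T                                     # right singular vectors of L
    TA = (X - S) @ V[:, :A]                      # eq. (9)
    cov_A = np.atleast_2d(np.cov(TA, rowvar=False))
    return {"med": med, "mad": mad, "V": V, "A": A, "cov_A": cov_A}


def monitor(model, x_raw):
    """T^2 and SPE of one new observation, eqs. (11), (12) and (15)."""
    x = (np.asarray(x_raw, dtype=float) - model["med"]) / model["mad"]
    tA = x @ model["V"][:, : model["A"]]
    tR = x @ model["V"][:, model["A"]:]
    t2 = float(tA @ np.linalg.solve(model["cov_A"], tA))
    spe = float(tR @ tR)
    return t2, spe
```

As a quick check of the remark after (8), calling `spcp(X, lam=1e6, mu=0.5)` makes the $\ell_1$ threshold so large that $\mathbf{S}$ stays zero, and the returned $\mathbf{L}$ then numerically coincides with `svt(X, 0.5)`.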
By selecting the number of retained latent variables, the original data space (without outliers) can be divided into two
subspaces, i.e. the principal component (PC) space and the residual space. The PC space is constructed by the retained latent variables $\mathbf{T}_A$, where
$$\mathbf{T}_A = (\mathbf{X} - \mathbf{S})\mathbf{V}_A, \tag{9}$$
and $\mathbf{V}_A$ contains the first $A$ columns of $\mathbf{V}$, corresponding to the $A$ largest singular values of $\mathbf{L}$. Similarly, the latent variables $\mathbf{T}_R$ in the residual space can be calculated as
$$\mathbf{T}_R = (\mathbf{X} - \mathbf{S})\mathbf{V}_R, \tag{10}$$
where $\mathbf{V}_R$ consists of the remaining $R$ columns of $\mathbf{V}$, $A + R = m$, and $m$ is the total number of variables, i.e. the number of columns in $\mathbf{X}$. Then, for each observation, the monitoring statistics $T^2$ and SPE can be derived as
$$T^2 = \mathbf{t}_A^T \boldsymbol{\Sigma}_A^{-1} \mathbf{t}_A, \tag{11}$$
and
$$SPE = \mathbf{t}_R^T \mathbf{t}_R, \tag{12}$$
where $\mathbf{t}_A^T$ is a row vector of $\mathbf{T}_A$, $\mathbf{t}_R^T$ is the corresponding row vector of $\mathbf{T}_R$, and $\boldsymbol{\Sigma}_A$ is the covariance matrix of $\mathbf{T}_A$.
As can be seen, $T^2$ summarizes the information in the PC space, while SPE is obtained in the residual space. Therefore, the two monitoring statistics are complementary in nature.
For monitoring purposes, the control limits of $T^2$ and SPE (Montgomery et al., 2009) are derived as
$$T_\alpha^2 = \frac{A(n-1)}{n-A}\, F_{A,\,n-A,\,\alpha}, \tag{13}$$
where $\alpha$ is the significance level and $n$ is the sample size, i.e. the total number of rows in $\mathbf{X}$, and
$$SPE_\alpha = \theta_1\left[\frac{C_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2}\right]^{1/h_0}, \tag{14}$$
where $\theta_i = \sum_{j=1}^{m-A} \lambda_j^i$ $(i = 1, 2, 3)$, $h_0 = 1 - \dfrac{2\theta_1\theta_3}{3\theta_2^2}$, $\lambda_j$ is the $j$-th largest eigenvalue of the covariance matrix of $\mathbf{T}_R$ (not to be confused with the SPCP parameter $\lambda$), and $C_\alpha$ is the critical value of the standard normal distribution at significance level $\alpha$.
In online applications, when a new observation is obtained, it is normalized using the median and MAD values calculated from the training data. Then, the normalized new observation $\mathbf{x}^T$ is projected onto the two subspaces by multiplying it by the loadings:
$$\mathbf{t}_A^T = \mathbf{x}^T\mathbf{V}_A, \qquad \mathbf{t}_R^T = \mathbf{x}^T\mathbf{V}_R. \tag{15}$$
Afterwards, the $T^2$ and SPE statistics are calculated using (11) and (12). If either statistic exceeds its control limit, a fault is detected.
3. APPLICATION RESULTS
3.1 Tennessee Eastman Process
In this section, the Tennessee Eastman (TE) process (Downs and Vogel, 1993), which has been widely used for testing process monitoring and control technologies, is adopted to illustrate the proposed method.
The TE process simulation was developed based on an actual industrial process, which involves four reactants and two products. There are 41 measured variables and 12 manipulated variables in total, and 20 different types of disturbances can be introduced to simulate process faults. Each test data set contains 960 observations, and the fault is introduced into the process at the 161st sampling interval.
Here, the test data sets IDV(5) and IDV(6) are chosen to demonstrate the feasibility of the proposed SPCP-based monitoring method. IDV(5) shows the process behavior under a step change in the condenser cooling water inlet temperature, and IDV(6) is collected under a feed loss in stream 1.
In the following, PCA and SPCP models are built from TE training data without and with added outliers, so that the results show how each method handles grossly corrupted entries.
3.2 Monitoring Results Based on Training Data Without Outliers
In the first case study, a set of normal operation data without outliers is used for model training. In this case, the SPCP model reduces to an SVT model. In both PCA and SPCP modeling, the number of latent variables is set to 5.
The monitoring results for the IDV(5) data are shown in Fig. 1 and Fig. 2. In both figures, the results for the first 250 sampling intervals are displayed, and the logarithms of the monitoring statistics are plotted to ease visualization. The straight red lines in the figures are the control limits.
Fig. 1. PCA monitoring results of IDV(5) based on training data without outliers
In Fig. 1, the $T^2$ statistic of the PCA model detects the fault at the 173rd sampling interval, while the SPE statistic achieves slightly more efficient detection.
Fig. 2 shows that in this situation the monitoring efficiency of SPCP is similar to that of PCA. Both $T^2$ and SPE detect the fault soon after the step change in the condenser cooling water inlet temperature occurs. Therefore, it is natural to infer that
PCA and SPCP have similar performance when the training data are not corrupted by gross sparse errors.
Fig. 2. SPCP monitoring results of IDV(5) based on training data without outliers
That PCA and SPCP perform similarly on clean training data is further confirmed by the monitoring results of IDV(6). In Fig. 3, the T² and SPE statistics of the PCA model detect the fault at the 167th and 173rd sampling intervals, respectively. Again, SPCP provides similar detection efficiency, as shown in Fig. 4.
Fig. 3. PCA monitoring results of IDV(6) based on training data without outliers
Fig. 4. SPCP monitoring results of IDV(6) based on training data without outliers
3.3 Monitoring Results Based on Training Data with Outliers
In this section, training data with outliers are used in the model training step. To simulate the effects of outliers, gross sparse errors are added to 5% of the training data through a random mechanism: 5% of the training samples are chosen at random, and in each of these samples the magnitudes of randomly selected variables are doubled.
In such a situation, PCA fails to capture the abnormal behavior of the process, because it is not a robust method and neither the classical T² nor the SPE statistic is robust. In contrast, SPCP performs much better and is far less affected by the outliers in the training data set.
Fig. 5 shows the failure of PCA to detect the process abnormality contained in IDV(5). When the training data are grossly corrupted, neither T² nor SPE detects the step change occurring at the 161st sampling interval; all sample points remain below the control limits.
Fig. 5. PCA monitoring results of IDV(5) based on training data with outliers
In comparison, SPCP can still build an effective model to monitor the process even though the training data are corrupted. The fault detection results shown in Fig. 6 demonstrate the robustness of the SPCP-based monitoring method.
In monitoring IDV(6), the SPE statistic of the PCA model indicates the occurrence of the fault after sampling point 168. However, the T² control chart cannot detect the fault at all, although the value of T² increases after the feed loss occurs.
The SPCP model, on the other hand, performs satisfactorily despite the presence of outliers. In the T² control chart, the process abnormality is detected at sampling point 170, and the SPE control chart detects it even earlier.
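The contamination scheme used in this section can be reproduced in a few lines of NumPy, and the sketch below also separates the contaminated matrix into low-rank and sparse parts with SPCP. It is a minimal sketch under several assumptions. First, `var_frac`, the share of variables doubled in each corrupted sample, is a hypothetical parameter, because the text does not state how many variables are selected. Second, SPCP is written in its penalized form, min ||L||_* + λ||S||_1 + ||X - L - S||²_F / (2μ), which is our reading of how μ enters. Third, the problem is solved by plain block coordinate descent, alternating exact singular value thresholding (SVT) for L with entrywise soft thresholding for S. This solver is a simple stand-in and not necessarily the algorithm used by the authors. The default λ and μ follow the choice attributed to Zhou et al. (2010) in the conclusions.

```python
import numpy as np


def inject_outliers(X, sample_frac=0.05, var_frac=0.2, seed=None):
    """Double randomly chosen entries in a random sample_frac of the rows of X.

    var_frac (share of variables doubled per corrupted row) is a
    hypothetical parameter; the paper does not state it.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # work on a float copy
    n, m = X.shape
    rows = rng.choice(n, size=max(1, round(sample_frac * n)), replace=False)
    mask = np.zeros(X.shape, dtype=bool)
    for i in rows:
        cols = rng.choice(m, size=max(1, round(var_frac * m)), replace=False)
        mask[i, cols] = True
    X[mask] *= 2.0  # gross sparse errors: doubled magnitudes
    return X, mask


def svt(A, tau):
    """Singular value thresholding: shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(A, tau):
    """Entrywise soft thresholding (proximal operator of tau * L1 norm)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def spcp(X, sigma, n_iter=500, tol=1e-7):
    """Penalized SPCP by block coordinate descent (illustrative solver):
    min_{L,S} ||L||_* + lam*||S||_1 + ||X - L - S||_F^2 / (2*mu).
    """
    p = max(X.shape)
    lam = 1.0 / np.sqrt(p)            # lambda = 1/sqrt(p), Eq. (16)
    mu = np.sqrt(2.0 * p) * sigma     # mu = sqrt(2p)*sigma, Eq. (17)
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        L_new = svt(X - S, mu)                # exact minimization over L
        S_new = soft(X - L_new, lam * mu)     # exact minimization over S
        change = np.linalg.norm(L_new - L) + np.linalg.norm(S_new - S)
        L, S = L_new, S_new
        if change <= tol * max(1.0, np.linalg.norm(X)):
            break
    return L, S
```

Because each update solves its subproblem exactly, the objective never increases from one step to the next. The price is that convergence can be slow, so faster solvers, such as those described by Lin et al. (2009), are better suited to large training sets. When S stays at zero, the L update is exactly the SVT of X, which is where the link to the SVD used in PCA comes from.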
Fig. 6. SPCP monitoring results of IDV(5) based on training data with outliers
Fig. 7. PCA monitoring results of IDV(6) based on training data with outliers
Fig. 8. SPCP monitoring results of IDV(6) based on training data with outliers
The above case studies show that PCA and SPCP perform similarly in process monitoring when the training data are free of outliers. This is to be expected, since in this case SPCP essentially reduces to singular value thresholding (SVT), which is closely related to the singular value decomposition (SVD) used in PCA. However, when the training data are contaminated, PCA cannot maintain its performance; in the case of IDV(5), it does not detect the fault at all. In comparison, SPCP performs much more robustly.
4. CONCLUSIONS
In this paper, an SPCP-based robust process monitoring method is developed, which outperforms PCA when the training data are contaminated by sparse outliers. Applications to the TE process show the effectiveness of the proposed method. Future research will address the parameter selection problem. Zhou et al. (2010) pointed out that the parameters can be specified as
$$\lambda = \frac{1}{\sqrt{p}}, \qquad (16)$$
and
$$\mu = \sqrt{2p}\,\sigma, \qquad (17)$$
where p = max(n, m) is the larger of the number of rows n and the number of columns m of X, and σ is the standard deviation of the noise. However, our experiments showed that parameters chosen in this way do not perform well in process monitoring, so a more suitable method for determining λ and μ is needed.
ACKNOWLEDGMENT
This work was supported in part by the Ministry of Science and Technology of R.O.C. under Grant No. MOST 103-2221-E-007-123.
REFERENCES
CAI, J.-F. & OSHER, S. 2013. Fast singular value thresholding without singular value decomposition. Methods and Applications of Analysis, 20, 335-352.
CAI, J., CANDÈS, E. & SHEN, Z. 2010. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM Journal on Optimization, 20, 1956-1982.
CANDÈS, E. J., LI, X., MA, Y. & WRIGHT, J. 2011. Robust principal component analysis? Journal of the ACM, 58, 1-37.
DE LA TORRE, F. & BLACK, M. 2003. A Framework for Robust Subspace Learning. International Journal of Computer Vision, 54, 117-142.
DOWNS, J. & VOGEL, E. 1993. A plant-wide industrial process control problem. Computers & Chemical Engineering, 17, 245-255.
FISCHLER, M. A. & BOLLES, R. C. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24, 381-395.
GNANADESIKAN, R. & KETTENRING, J. R. 1972. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 81-124.
HUBER, P. J. 1981. Robust statistics, New York, Wiley.
JOLLIFFE, I. 2002. Principal component analysis, New York, Springer.
KE, Q. & KANADE, T. 2005. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-25 June 2005, San Diego, CA, USA. 739-746.
KU, W., STORER, R. & GEORGAKIS, C. 1995. Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30, 179-196.
LEE, J., YOO, C., CHOI, S., VANROLLEGHEM, P. & LEE, I. 2004. Nonlinear process monitoring using kernel principal component analysis. Chemical Engineering Science, 59, 223-234.
LIN, Z., GANESH, A., WRIGHT, J., WU, L., CHEN, M. & MA, Y. 2009. Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. 2009 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2009), Aruba, Dutch Antilles.
LU, N., GAO, F. & WANG, F. 2004. Sub-PCA modeling and on-line monitoring strategy for batch processes. AIChE Journal, 50, 255-259.
MONTGOMERY, D. C., RUNGER, G. C. & HUBELE, N. F. 2009. Engineering statistics, John Wiley & Sons.
NOMIKOS, P. & MACGREGOR, J. 1995. Multivariate SPC charts for monitoring batch processes. Technometrics, 37, 41-59.
QIN, S. J. 2012. Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control, 36, 220-234.
ROUSSEEUW, P. J. & CROUX, C. 1993. Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, 88, 1273-1283.
YAO, Y., CHEN, T. & GAO, F. 2010. Multivariate statistical monitoring of two-dimensional dynamic batch processes utilizing non-Gaussian information. Journal of Process Control, 20, 1187-1197.
ZHOU, Z., LI, X., WRIGHT, J., CANDÈS, E. & MA, Y. 2010. Stable Principal Component Pursuit. 2010 IEEE International Symposium on Information Theory (ISIT 2010), 13-18 June 2010, Austin, Texas, USA. 1518-1522.