MULTIVARIATE STATISTICAL PROCESS MONITORING USING MULTI-SCALE KERNEL PRINCIPAL COMPONENT ANALYSIS Xiaogang Deng, Xuemin Tian College of Information and Control Engineering China University of Petroleum DongYing, Shandong 257061, P. R. China
Abstract: A monitoring method based on multi-scale kernel principal component analysis (MSKPCA) is proposed for nonlinear dynamic processes, by combining wavelet analysis with nonlinear transformation using kernel principal component analysis. Wavelet analysis is used to analyze dynamic characteristic of process data, while the kernel principal component analysis is to capture nonlinear principal components by kernel functions. This method can simultaneously extract cross correlation, auto correlation, and nonlinearities from the data. Furthermore, a multi-scale principal component analysis similarity factor is proposed for fault identification. Simulation of CSTR process shows that the proposed method outperforms the traditional PCA method in fault detection and diagnosis. Copyright © 2006 IFAC Key words: fault diagnosis, kernel function, nonlinear principal component analysis, wavelet analysis,similarity factor
1. INTRODUCTION
nature, while most of industrial processes have nonlinear characteristics and process data have dynamic and multi-scale property.
The demands for process safety, production quality and maintenance economics have stimulated the development of process monitoring and fault diagnosis methods. Various methods have been developed to detect abnormal events in the past three decades. They can be classified into three major categories: analytical methods, knowledge-based methods, data-driven methods (Chiang et al., 2001). The first two types of methods require detailed process models and they can be applied to information-rich processes. They are not often used in large-scale industrial process. With the development of modern computer control system, large amounts of data can be collected, and data can indeed reflect the physical/chemical principles of the process. As a result, data-driven multivariate statistical methods such as principal component analysis (PCA) and Partial Least Square (PLS) have attracted great attention from both academic researchers and process engineers.
To overcome the drawbacks of PCA, many nonlinear PCA methods have been developed. Krammer (1991) first proposed a nonlinear PCA based on an autoassociative neural network that is difficult to be trained for its five-layer structure. Dong and MacAvoy (1995) presented an alternative nonlinear PCA approach which uses principal curve algorithm and neural network. Principal curve algorithm is used to generate nonlinear principal scores and neural network to build nonlinear PCA model. Schölkopf et al. (1998) proposed a kernel PCA which used kernel functions to complete nonlinear transformation. Lee et al. (2004) used this method for nonlinear process monitoring. Hiden et al. (1999) suggested non-linear principal components analysis using genetic programming. Shao et al. (1999) proposed a nonlinear PCA based upon an input-training neural network. Other nonlinear PCA methods were also developed by Lin et al. (2000) and Fourie et al. (2000).
As a popular multivariate statistical method, PCA has been successfully applied in many industry processes which projects high-dimensional, noisy and correlated data onto a lower-dimensional subspace. PCA, however, is static linear transformation in
For extracting dynamic information from multi-scale process data, many approaches based on dynamic analysis and multi-scale analysis have appeared. Ku
108
et al (1995) applied PCA to a time-lagged data matrix. Neigz and Cinar (1997) constructed a canonical variate state space model to monitor process. This is called canonical variate analysis (CVA). Wavelet analysis has good time-frequency location and multi-resolution property, Bakshi (1998) proposed multi-scale principal component analysis (MSPCA) combined with PCA to decorrelate the cross correlation among different variables, and wavelet analysis is used to capture auto correlation within individual variable. Extension of this was performed by Misra et al. (2002) by introducing multi-scale fault identification. Lu et al. (2003) utilized wavelet analysis to extract quantitative timefrequency feature for on-line process monitoring.
It is not appropriate to perform PCA decomposition in original input space with strong nonlinear data structure since PCA is a linear transformation. So KPCA is put forward to carry out nonlinear transformation. The nonlinear data in original input space is mapped into a linear high-dimensional space referred to as feature space, in which PCA is used to analyze the linear correlation. This procedure is called KPCA. It utilizes a computationally tractable solution to complete nonlinear mapping by a simple kernel function while other nonlinear PCA methods resort to neural network which is difficult to implement. It is assumed that Φ (.) is a nonlinear mapping function that projects the data in input space to feature space and the covariance matrix in feature space can be expressed as: 1 n cov(Φ ( X)) = (4) ∑ Φ (x j )T Φ (x j ) n − 1 j =1 where xj (j=1,…,n) is one row of X. In the feature space the eigenvalue problem must be solved for KPCA decomposition. cov(Φ ( X))p i = λi p i (5) It should be pointed out that it is difficult to solve the problem of Eq. (5) as Φ (.) can not be obtained in most cases. However, there exist coefficients αij (j=1,2,…n) such that
This paper proposes a multi-scale kernel PCA (MSKPCA) method that uses kernel PCA (KPCA) to handle the nonlinearities and wavelet analysis to extract multi-scale dynamics. The monitoring charts of T2 and Q at different scales are constructed to detect faults. Once fault is detected via MSKPCA, a fault identification based on multi-scale principal component similarity factor is used for identifying the type of fault and discovering the fault source. The paper is organized as follows. The concept of KPCA and wavelet analysis is introduced in Section 2. Section 3 gives the formulation of MSKPCA and monitoring strategy. Section 4 explains the multiscale principal component similarity factor. In Section 5 one example of continuous stirred tank reactor system is used to validate the proposed monitoring scheme. Finally, conclusions are presented in Section 6.
n
pi = ∑ α ij Φ (x j )T
(6)
j =1
Combining Eqs.(4), (5) and (6), the following equation can be obtained (n − 1)λi α i = Kα i (7) where K is defined as [K]ij=k(xi,,xj) = < Φ ( xi ), Φ ( x j ) > .
2.KPCA AND WAVELET ANALYSIS 2.1 PCA and KPCA
One can avoid nonlinear mapping by defining kernel function k(x, y) = < Φ ( x ),Φ ( y ) > .There are many kernel functions available. In this paper kernel
PCA is a powerful dimension-reducing technique. It produces new variables that are uncorrelated with each other and are linear combinations of original variables. PCA can be implemented by solving an eigenvalue problem. For a given data matrix X which represents m columns of measured variables at n rows of sample points during normal operation condition, the covariance matrix of X is: XT X cov( X) = (1) n −1 If X has been autoscaled by subtracting the mean of each column and dividing each column by its standard deviation, the covariance matrix is the correlation matrix. The loading vector of PCA decomposition pi, is the ith eigenvector of the covariance matrix and λi is the eigenvalue
function is selected as k(x, y) = exp(- x-y
2
c) ,
where c is one parameter specified by user. With the use of kernel function, the scores vectors are expressed as: n
t i =< p i , Φ (x) >= ∑ α ij < Φ (x j ), Φ (x) >
(8)
j =1
For further details, see the paper of Schölkopf et al. (1998). 2.2 Wavelet analysis Wavelet analysis is a powerful signal-processing tool that has been widely used in many fields such as data compression, image processing and process monitoring and diagnosis. In recent years, multi-scale methods based on wavelet analysis have demonstrated superior performance in fault detection. (Bakshi ,1998, Lu et al, 2003).
corresponding to the eigenvector pi. cov( X)pi = λi pi (2) PCA decomposes the data matrix X as: (3) X = t1p1T + t 2 p T2 + ... + t k p Tk + E where ti is scores vector, pi is loading vector, k is the number of principal components(PCs) retained in PC model, and E is residual matrix.
109
The theory of wavelet analysis implies that any finite energy square integral function f (t) can be integrated by projecting it down onto scaling functions φ j , k (t ) and wavelet functions ψ j ,k (t ) , that is, f (t) can be decomposed at multiple resolutions. According to Mallat’s theory of multi-resolution analysis, a signal f (t) can be decomposed as (Lu et al, 2003): J
f (t ) = ∑ aJ , k φ J , k (t ) + ∑∑ d j , kψ j , k (t )
(9)
φJ , k (t ) = 2− J φ (2− J t − k ), k ∈ Z
(10)
k ∈Z
Fault detection based on KPCA
AJ
measured data matrix X
wavelet analysis
Fault detection based on KPCA
DJ • • •
j =1 k ∈Z
Fault detection based on KPCA
D1
ψ j , k (t ) = 2− jψ (2− j t − k ), j = 1,..., J , k ∈ Z (11) where j is scale factor, k is the translation factor, and J is the coarsest scale. aJ ,k is called as approximation coefficients or scaling coefficients, which are f (t)’s projects on scaling functions (see Eq.12). bj,k is referred to as detail coefficients or wavelet coefficients, which are f(t)’s projects on wavelet functions (see Eq.13). (12) aJ , k =< f (t ), φ J , k (t ) > d j , k =< f (t ),ψ j , k (t ) >
Fig.1. Flow diagram of MSKPCA strategy
3.1 Fault detection based on MSKPCA The MSKPCA-based fault detection is similar to that of PCA based methods in which Hotelling’s T2 statistic and Q statistic are developed to determine if an abnormality has occurred (Lee et al., 2004).
(13)
MSKPCA’s T2 statistic is the sum of the normalized squared scores, which monitors the variation within MSKPCA principal component subspace for data matrix D1, D2, …, DJ, AJ.. It is expressed as Eq. (17). (17) T 2 = [t1 , t 2 ,...t k ]Λ −1[t1 , t 2 ,...t k ]T where tk is obtained from Eq.(8) and Λ is the diagonal matrix of the eigenvalues associated with the retained PCs. It’s 100(1- α )% confidence limit is computed by means of an F-distribution as k ( n − 1) 2 (18) Tk,n, Fk , n − k ,α α ~ n−k where n is the number of samples and k is the number of PCs retained in PC model,F (k, n-k, α )is an F-distribution with degrees of freedom k and n-k and level of significance α .
Once approximation coefficients and detail coefficients are known, the approximation signal AJ (t) and the detail signal Dj(t) can be reconstructed as A J (t ) = ∑ aJ , k φ J , k (t ) (14) k ∈Z
D j (t ) = ∑ d j , kψ j , k (t ), j = 1,..., J
(15)
k ∈Z
Then the original signal f(t) can be represented as J
f (t ) = A J (t ) + ∑ D j (t )
Fault identification based on multi-scale principal component analysis similarity factor method
(16)
j =1
According to above description, wavelet analysis can be used to decompose the signal into approximation part and detail part at multiple scales, so signal can be analyzed at the time-frequency domain rather than only at time domain. The MSKPCA approach presented in Section 3 uses this property to develop the multi-scale analysis of process data for process monitoring and fault diagnosis.
On the other hand, MSKPCA’s Q statistic monitors the variation in the residual subspace computed as n
k
i =1
i =1
Q = ∑ t i2 −∑ t i2
3. PROCESS MONITORING STRATEGY BASED ON MSKPCA
(19)
The confidence limit for Q statistic can be calculated by different means. In this paper, it is computed by fitting a weighted χ 2 distribution as
The MSKPCA method of combing kernel PCA and wavelet analysis for process monitoring is proposed in this section. This method is motived from the idea that KPCA can decorrelate the cross correlation among variables while wavelet analysis can capture the information at multiple scales and decorrelate the auto correlation among the individual variable. The flow diagram of the proposed strategy is shown in Fig.1. For the measured data matrix X, wavelet analysis is first used to analyze X at multiple scales, resulting in new data matrix D1, D2, …, DJ, AJ. It is necessary to point out the proper selection of coarsest scale J is important, it is chosen as 2 in this paper. Then, KPCA is performed for each new data matrix to extract information in linear feature space for fault detection. Finally, multi-scale PCA similarity factor is applied to identify fault pattern.
(20) Qα ~ g χ h2 where g, h are the parameters that can be approximated by statistical information of Q. The steps of fault detection based on MSKPCA methodology are formulated as: (I).Modelling procedure 1. Acquire and autoscale measurement data matrix X at normal operating condition. 2. For each column in X, calculate the approximate signal and detail signals by wavelet analysis. Obtain the new data matrix D1, D2… DJ, AJ. 3. For each data matrix, KPCA is performed and then MSKPCA model under normal operating condition is obtained.
110
4. Calculate the monitoring statistics (T2 and Q) of the normal operating data, and determine the confidence limits of T2 and Q statistics.
M H ( A J ) = L H ( A J ) Λ H ( A J ),
(24) M X (A J ) = L X (A J )Λ X (A J ) and Λ is the diagonal matrix of the eigenvalues associated with the retained PCs.
(II). On-line process monitoring and fault detection 5. Obtain new data for each variable and scale it. 6. Calculate the approximate signal and detail signals for new data. 7. KPCA is performed on approximation and detail signals to compute the nonlinear scores. 8. Calculate the monitoring statistics and determine whether T2 or Q exceeds its confidence limit.
A unified weighted similarity factor SMSPCA is used when considering both approximations and details. Its expression is shown as
SMSPCA = η1SMSPCA (AJ ) + η2SMSPCA (D1 ) +... + ηJ +1SMSPCA (DJ )
where η j is weighting factors between 0 and 1 and the sum of all factors equals one. When the fault is revelled in the approximation, the similarity factor for AJ is good enough for fault pattern recognition and the similarity factor for Dj does not contribute a lot to fault identification for it reflects the noise. But the similarity factor for Dj can not be omitted because sometimes it can provide some important information at detail signal. In this paper weight factor η1 is given a larger value which is chosen as 0.8 in this paper and η2 ,...,η J +1 are given smaller value.
3.2 Fault pattern identification based multi-scale PCA similarity factor In this section, a fault pattern identification strategy is presented to identify fault and locate fault source. The method is based on pattern recognition method that search for the similar faulty situation in a historical database. Fault patterns of the database are defined by the corresponding fault data sets that reflect the characteristics of the process. Once a fault is detected, the fault type can be detected by matching the patterns in the fault database via a pattern recognition method.
4. CASE STUDY The proposed process monitoring and fault diagnosis strategy is tested with a simulated process, a nonisothermal continuous stirred tank reactor (CSTR) system. The practicability of MSKPCA for process monitoring and fault diagnosis can be demonstrated as follows.
Krzanowski(1976) developed an effective pattern recognition method to measure the similarity of two data sets using a PCA similarity factor, SPCA. Considering the variance explained by each PC direction, Singhal and Seborg(2002) provided a modified PCA similarity factor SλPCA , that weights each principal component by square root of its corresponding eigenvalue. The modified method made significant improvement for fault identification. Based on this, a new multi-scale PCA similarity factor is proposed in this paper to identify fault patterns.
CSTR description The CSTR with cooling jacket dynamics and variable liquid level is simulated for process monitoring. It is assumed that the classic first order irreversible reaction happens in CSTR. The flow of solvent and reactant A into a reactor produces a single component B as an outlet stream. Heat from the exothermic reaction is removed through cooling flow of jacket. The temperature of reactor is controlled to set-point by manipulating the coolant flow. The level is controlled by manipulating the outlet flow. A schematic diagram of the CSTR with feedback control system is shown in Fig. 2.
Consider a historical data set H and a snapshot data set X that contains the same number of variables but not necessarily the same number of samples. If multiscale PCA is performed on H and X respectively, principal component subspace of D1, D2, …, DJ, AJ can be constructed for them. The PC subspace for H are noted as LH(D1), LH(D2), …, LH(DJ), LH(AJ), and the PC subspace for X are noted as LX(D1), LX(D2), …, LX(DJ), LX(AJ). The multi-scale PCA similarity factor is defined to be SMSPCA (D j ) = trace(M H (D j )T M X (D j )M X (D j )T M H (D j )) trace( Λ H (D j ) Λ X (D j ))
The data of normal operating condition and faulty conditions are generated by simulating the CSTR
, (21)
LC QC TC
QC TCF h
trace(M H ( A J ) T M X ( A J )M X ( A J )T M H ( A J )) (22) trace( Λ H ( A J ) Λ X ( A J ))
FC TC
FT
TT T
where M X (D j ) = L X (D j ) Λ X (D j ), j = 1, 2,...J
LT
CAF QF TF
j = 1, 2,..., J SMSPCA ( A J ) =
M H (D j ) = L H (D j ) Λ H (D j ),
(25)
TC
(23)
TT
Fig.2. A diagram of the CSTR system
111
CA Q
process and Gaussian noise is added to all measurements in simulation procedure. The simulation brings normal operating data and ten kinds of fault pattern data. The applied fault pattern can be seen in Table 1. These faults contain process change, sensor malfunction and valve faults.
(b)
Table 1. Process faults for CSTR system Fault F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Description Step change in feed flow rate. The feed temperature ramps up or down. The feed concentration ramps up or down. The hear transfer coefficient ramps down. Catalyst deactivation. The coolant feed temperature ramps up or down. Set point change for the reactor temperature. The reactor temperature measurement has a bias. The coolant temperature measurement has a bias The coolant valve was fixed at the steady value.
(c) sample number
sample number
Fig.4. Fault detection results of MSKPCA with 95% confidence limits for fault F2 . (a) monitoring charts of approximation signal A2, (b) monitoring charts of detail signal D1,(c) monitoring charts of detail signal D2
In next case, Fault F5 is introduced as the catalyst deactivation introduced at sample 200. The monitoring charts of PCA and MSKPCA are compared in Figs. 5 and 6. PCA Q chart can capture this abnormality at about sample 305 and the T2 statistic at about sample 430. In contrast, the fault detection of MSKPCA at approximations A2 displays great improvement. The Q chart captures the fault at about sample 274 and the T2 detects the fault from sample 273. These two statistics can give an alarm 31 samples and 157 samples earlier respectively..
Results and discussion In this section, both PCA and MSKPCA are applied to detect faults where retained PCs explain 90% information. The detection of fault F2 and F5 are illustrated to show the effectiveness of MSKPCA. As shown in Fig.3, the T2 and Q monitoring charts of PCA method are plotted for fault F2, where the feed temperature ramps slowly. This can not be detected by T2 statistic before sample 308, while the Q statistic can not detect it at all. In contrast, the Q chart of MSKPCA with approximation signal A2 detect successfully fault F2 at sample 275, much earlier than the PCA method, while T2 chart of MSKPCA with A2 can also indicate the presence of abnormalities at about sample 278. Fig.4 (b)(c), the monitoring charts of details signal D1 D2 , can not detect the fault although some data points in monitoring charts are above the 95% confidence limit. This can be explained by the property that the details signals describe noises mainly. In the following examples, the monitoring results from details signals D1 D2 will be omitted when they can not feature the fault clearly.
sample number
sample number
Fig.5. Fault detection results to PCA with 95% confidence limits for fault F5
sample number
sample number
Fig.6. Fault detection results of MSKPCA with 95% confidence limits for fault F5
sample number
After a fault is detected, it should be identified quickly for early remedy. The proposed MSPCA similarity factor method is used to identify fault pattern. The results can be seen in Table 2. In this table, the first row represents the similarity factors between tested fault data F1 with ten fault patterns, the second row represents the similarity factors of F2, and the other rows can be analogized. It is noticeable that the factor matrix in Table 2 is not diagonal because the test data is not the same as fault pattern data. From Table 2, it is clear that the diagonal value is the largest in corresponding row and other values are relative smaller. The result guarantees the right indications of fault pattern. For example, the similarity factor between fault F5 and fault pattern 5 is largest, 0.9940, while other factors stay at about
sample number
Fig.3. Fault detection results to PCA with 95% confidence limits for fault F2
(a)
112
Table 2 The MSPCA similarity factors between test data and ten fault patterns Test data F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
1 0.9984 0.2130 0.2087 0.2818 0.2126 0.2110 0.2518 0.2074 0.2777 0.5497
2 0.2079 0.9940 0.2060 0.2708 0.2145 0.2071 0.2384 0.9318 0.2555 0.5359
3 0.2043 0.2049 0.9978 0.2355 0.2115 0.2037 0.2244 0.2046 0.2350 0.5637
4 0.2670 0.2999 0.2522 0.9760 0.2801 0.2582 0.5541 0.2130 0.7583 0.4661
Fault pattern 5 6 0.1946 0.2087 0.2012 0.2102 0.1954 0.2083 0.2546 0.2425 0.9940 0.2156 0.1988 0.9869 0.2386 0.2306 0.1953 0.2093 0.2731 0.2355 0.3586 0.5548
7 0.2473 0.2483 0.2335 0.5248 0.2473 0.2268 0.9923 0.2197 0.6485 0.9498
8 0.2146 0.8993 0.2152 0.2248 0.2166 0.2146 0.2363 0.9999 0.2344 0.5916
9 0.2729 0.2808 0.2544 0.7478 0.2893 0.2530 0.7019 0.2257 0.9991 0.6645
10 0.3144 0.3199 0.3210 0.4895 0.2899 0.3118 0.9773 0.3126 0.6671 0.9900
programming. Computers and Chemical Engineering, 23(3), 413-425. Krammer, M.A.(1991). Nonlinear principal component analysis using autoassociative neural networks. AICHE Journal, 37(2), 233243. Krazanowski, W.J.(1979). Between groups comparison of principal component analysis. Journal of . American. Statistical. Association., 74, 703-707. Ku, W.F., R.H. Storer , and C. Gerogakis(1995). Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30(1), 179186. Lee, J.-M., C.K. Yoo, S.W. Choi, P.A. Vanrolleghem, and I.-B. Lee(2004). Nonlinear process monitoring using kernel principal component analysis. Chemical Engineering Science., 59(1), 223-234. Lin, W.L., Y. Qian, and X.X. Li(2000). Nonlinear dynamic principal component analysis for online process monitoring and diagnosis. Computers and Chemical Engineering., 24(2-7), 423-429. Lu, N.Y., F.L. Wang., and F.R. Gao(2003). Combination method of principal component and wavelet analysis for multivariate process monitoring and fault diagnosis. Ind. Eng. Chem. Res., 42, 4198-4207 Misra, M., H. Yue, S.J. Qin, and C. Ling(2002). Multivariate process monitoring and fault diagnosis by multi-scale PCA. Computers and Chemical Engineering. 26(9), 1281 – 1293. Negiz, A., and A. Cinar .(1997). Statistical monitoring of multivariate dynamic processes with state-space models. AIChE Journal, 43(8), 2002-2020. Schölkopf, B., A.J. Smola , and K. Muller(1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural computation, 10(5), 1299-1319. Shao, R., F. Jia, E.B. Martin, and A. J. Morris(1999). Wavelets and non-linear principal components analysis for process monitoring, Control Engineering Practice,7(7), 865-879 Singhal, A., and D.E. Seborg (2002). Pattern matching in historical batch data using PCA. IEEE control systems magazine, 22(5), 53-63
0.25 or smaller. However, this kind of identification method has its shortcoming in needing large mounts of data because the similarity factor is accurate only when the data is enough. 5. CONCLUSIONS In this paper, a new fault detection and identification strategy based on wavelet analysis and kernel PCA has been formulated for supervising multi-scale and nonlinear process. The proposed strategy uses MSKPCA to monitor the process which effectively captures cross correlation, auto correlation, and nonlinearities among the industrial data sets. Once an abnormality is detected, a multi-scale fault identification approach based multi-scale PCA similarity factor is applied to identify the fault pattern. The application on the CSTR system shows that the proposed strategy can work well.
ACKNOWLEDGEMENT This research was supported by 863 programme of China under Grant No. 2004AA412050. REFERENCES Bakshi, B.R. (1998). Multiscale PCA with applications to multivariate statistical process monitoring. AICHE Journal. 44(7), 1596-1610. Chiang, L. H., E. L. Russell, and R. D. Braatz (2001). Fault detection and diagnosis in industrial systems. Springer: London. Dong, D., and T.J. McAvoy(1996). Nonlinear principal component Analysis based on principal component curves and neural networks. Computers and Chemical Engineering, 20(1), 65-78 Fourie, S.H., and P. Vaal(2000). Advanced process monitoring using an nonlinear multiscale principal component analysis methodology. Computers and Chemical Engineering, 24(2-7), 755-760. Hiden, H.G., M.J. Willis, M.T. Tham, and G.A. Montague(1999). Non-linear principal components analysis using genetic
113