MULTISCALE PROCESS MONITORING WITH SINGULAR SPECTRUM ANALYSIS C. Aldrich*, G.T. Jemwa and S. Krishnannair Department of Process Engineering, University of Stellenbosch, Stellenbosch, South Africa Fax +27(21)808 2059, *E-mail:
[email protected]
Abstract: Multivariate statistical process control (MSPC) approaches are now widely used for performance monitoring, fault detection and diagnosis in chemical processes. Conventional MSPC approaches are based on latent variable projection methods such as principal component analysis and partial least squares. These methods are suitable for handling linearly correlated data sets, with minimal autocorrelation in the variables. Industrial plant data invariably violate these conditions, and several extensions to conventional MSPC methodologies have been proposed to account for these limitations. In this paper a multiscale methodology based on the use of singular spectrum analysis is proposed and its advantages are demonstrated via illustrative case studies. Copyright © 2007 IFAC Keywords: Fault Diagnosis; Multivariate Quality Control; Autocorrelation Functions; Singular Value Decomposition; Signal Processing; Signal Reconstruction.
1. INTRODUCTION

Statistical process control (SPC) and its multivariate extensions based on principal component analysis (PCA) and partial least squares (PLS) are now well-established methods used in process monitoring, fault detection and diagnosis. PCA and PLS can extract linear relationships in highly collinear data as well as explain the dominant modes of variation in a rotated subspace of the measured variables (Kresta et al., 1991; Kourti and MacGregor, 1995). Conventional multivariate statistical methods assume the process variables to be uncorrelated in time, with the same time-frequency localization. However, process data are invariably autocorrelated and have multiscale characteristics induced by stochastic variations, as well as the different localization and sampling frequency of variables. As a result, several extensions of MSPC methods have been proposed to address these limitations. These include dynamic PCA (Ku et al., 1995), moving PCA (Wold, 1994), and time-lagged PCA (Lee et al., 2004) which
explicitly account for serial correlation in process data. An alternative approach that extends MSPC methods is the integration of PCA and wavelet analysis in monitoring multiscale systems (Bakshi, 1998). With multiscale PCA, wavelets are used to decompose the process variables under scrutiny into multiple scale representations before application of PCA to detect and identify faulty conditions in process operations. In this way autocorrelation in the variables is implicitly accounted for, resulting in a more sensitive method for detecting process anomalies. In this paper, an alternative multiscale representation of the data prior to PCA application is proposed. Instead of wavelets, singular spectrum analysis (SSA) (Vautard et al., 1992) is used to decompose the process variables. This approach treats the process variables as time series and explicitly accounts for the autocorrelation in the process variables. Unlike wavelet analysis, where fixed basis functions are used, SSA is based on data-adaptive
basis functions, which makes the method more flexible than other spectral techniques. Moreover, SSA provides a qualitative decomposition of a signal into deterministic and stochastic parts. In the next section a brief review of the SSA methodology is given. This is followed by a description of the proposed SSA-based multiscale process monitoring scheme. The technique is subsequently demonstrated by application to simulated and industrial data.

2. SINGULAR SPECTRUM ANALYSIS

Singular spectrum time series analysis involves singular value decomposition (SVD) of a trajectory or lagged covariance matrix obtained from the original time series, followed by reconstruction of the series using subsets of eigenfunctions and corresponding principal components. The basic version of SSA consists of four steps (Golyandina et al., 2001), viz. (i) augmentation or embedding of the time series, (ii) SVD of the trajectory matrix, (iii) grouping of the components, and (iv) reconstruction of the time series, which are described in more detail as follows.
2.1 Embedding. Given a time series x(t), for t = 1, 2, ..., n, a trajectory matrix is constructed by sliding a window of length M along the time series to give lagged vectors x_i ∈ ℜ^M:

x_i = (x(i), x(i+1), ..., x(i+M−1))′,   (1)

for i = 1, 2, ..., n−M+1. The vectors x_i thus formed are collected in an augmented multidimensional time series known as a trajectory matrix of the form

        | x(1)       x(2)       ...  x(M)   |
    X = | x(2)       x(3)       ...  x(M+1) |        (2)
        | ...        ...             ...    |
        | x(n−M+1)   x(n−M+2)   ...  x(n)   |

2.2 Singular Value Decomposition. An M × M covariance matrix C_X is constructed from the trajectory matrix, i.e.

C_X = (1/(n−M+1)) X′X   (3)

and the decomposition of the trajectory matrix is obtained as

C_X a_k = λ_k a_k,   k = 1, 2, ..., M   (4)

where a_k and λ_k are the k'th eigenvector and eigenvalue, respectively. The square roots of the nonnegative eigenvalues (√λ_k) are called the singular values, and the set of ordered singular values λ_1 ≥ λ_2 ≥ ... ≥ λ_M is called the singular spectrum. The ordering implies that the k'th eigenvalue explains more of the variance in the data than the (k+1)'th eigenvalue.

2.3 Grouping of Components. Each eigenvector extracted in Step 2 can be used to construct a time series of length n−M+1 by projecting the lagged vectors along each principal direction k:

z_k(t) = Σ_{j=1}^{M} x(t + j − 1) a_k(j),   (5)

for t = 1, 2, ..., n−M+1. The z_k's are called the k'th principal component scores and represent the projection of the original time series onto the k'th eigenvector in the rotated coordinate space. Therefore, with p < M leading components selected to represent the time series, the p-dimensional score vectors of the decomposed matrix Z are given by

z(t) = [z_1(t) z_2(t) ... z_p(t)]′,   (6)

for t = 1, 2, ..., n−M+1.

2.4 Reconstruction. Convolution of a set of principal components Z with the corresponding eigenvectors or principal directions recovers the phase information lost in the preceding decomposition:

x(t + j − 1) = Σ_{k=1}^{p≤M} z_k(t) a_k(j),   (7)

for t = 1, 2, ..., n−M+1 and j = 1, 2, ..., M. Reconstruction of the data can be performed via a diagonal averaging procedure

x(i) = (1/A) Σ_{j=1}^{i} Σ_{k=1}^{p} z_k(i − j) a_k(j),   (8)

where A is a scaling factor that depends on the index i (Vautard et al., 1992). SSA is closely related to principal component analysis performed on the trajectory or lagged covariance matrix (eq. 3). Consequently, all mathematical and statistical properties associated with PCA apply to SSA. Fig. 1 illustrates a typical multiscale representation based on SSA decomposition. Note the resulting signal-to-noise ratio enhancement that provides for separation of deterministic and stochastic components. The use of these reconstructed signals allows for automatic localization of intermittent oscillations, which is not possible with the principal components alone.
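As a minimal illustration (not part of the original paper), the four SSA steps can be sketched in Python with NumPy; the test series `x` and window `M = 5` below are arbitrary examples:

```python
import numpy as np

def ssa_decompose(x, M):
    """Basic SSA: embed, eigendecompose, score and reconstruct (eqs. 1-8)."""
    n = len(x)
    K = n - M + 1
    # Step 1: embedding -- trajectory matrix of lagged vectors (eqs. 1-2)
    X = np.column_stack([x[j:j + K] for j in range(M)])      # K x M
    # Step 2: eigendecomposition of the lagged covariance matrix (eqs. 3-4)
    C = X.T @ X / K
    lam, A = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]                            # descending order
    lam, A = lam[order], A[:, order]
    # Step 3: principal component scores z_k(t) (eq. 5)
    Z = X @ A                                                # K x M
    # Step 4: diagonal averaging of each rank-one trajectory matrix (eqs. 7-8)
    comps = np.zeros((M, n))
    for k in range(M):
        R = np.outer(Z[:, k], A[:, k])                       # rank-one term
        for i in range(n):
            # average R over the anti-diagonal corresponding to x(i)
            vals = [R[i - j, j] for j in range(M) if 0 <= i - j < K]
            comps[k, i] = np.mean(vals)
    return lam, comps

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.3 * rng.standard_normal(200)
lam, comps = ssa_decompose(x, M=5)
```

Because all M eigentriples are kept here, the reconstructed components sum back to the original series exactly; a denoised signal is obtained by summing only the leading p components.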
Fig. 1. Decomposition of (a) an observed signal into multiple scale representations using singular spectrum analysis. Here, a sliding window M = 5 was used. The significant or deterministic component (b) explaining the most variation in the signal is associated with the leading eigenelement, with (c)–(e) progressively explaining less variability. In particular, the last signal is associated with high-frequency components.

2.5 Process Monitoring Methodology. Analysis of the process variables at different resolution levels can be accomplished by applying SSA to each variable separately using a common window of size M. More formally, given n observations on m variables X ∈ ℜ^{n×m}, each variable x_j, for j = 1, 2, ..., m, is decomposed by expressing its corresponding trajectory matrix in terms of an ordered series of score and loading vector products, i.e. x_j = t_{j,1}p_{j,1}^T + t_{j,2}p_{j,2}^T + ... + t_{j,M}p_{j,M}^T. Scaled versions of the original data matrix X are then reconstructed by
X_i = [r(t_{1,i}p_{1,i}^T)  r(t_{2,i}p_{2,i}^T)  ...  r(t_{m,i}p_{m,i}^T)]   (9)

where r(⋅) is the diagonal averaging function (eq. 8). Hence, at the coarsest level (scale i = 1), the data matrix X will be represented by all m process variables reconstructed from their respective first principal component loading and score vector pairs only:

X_1 = [r(t_{1,1}p_{1,1}^T)  r(t_{2,1}p_{2,1}^T)  ...  r(t_{m,1}p_{m,1}^T)]   (10)
This is repeated for all M scales, resulting in M representations of the data, as shown in Fig. 2 below. The M approximations of the original data are subsequently monitored separately. That is, each reconstructed matrix X_i is decomposed using PCA:

cov(X_i) p_{i,j} = λ_{i,j} p_{i,j}   (11)

for i = 1, 2, ..., M and j = 1, 2, ..., m, where λ_{i,j} is the eigenvalue associated with the j'th eigenvector p_{i,j} of the i'th representation of the original data set.
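A sketch of the scale-matrix construction of eqs. (9)–(11), again in illustrative Python rather than the authors' code; the helper names `ssa_components` and `scale_representations` are arbitrary, and `ssa_components` simply repeats the four SSA steps of Section 2 for one variable:

```python
import numpy as np

def ssa_components(x, M):
    """Rank-one SSA reconstructions of one variable (the Section 2 steps)."""
    n, K = len(x), len(x) - M + 1
    T = np.column_stack([x[j:j + K] for j in range(M)])   # trajectory matrix
    lam, A = np.linalg.eigh(T.T @ T / K)
    A = A[:, np.argsort(lam)[::-1]]                       # descending order
    Z = T @ A                                             # PC scores
    comps = np.zeros((M, n))
    for k in range(M):
        R = np.outer(Z[:, k], A[:, k])
        for t in range(n):
            vals = [R[t - j, j] for j in range(M) if 0 <= t - j < K]
            comps[k, t] = np.mean(vals)                   # diagonal averaging r(.)
    return comps

def scale_representations(X, M):
    """M scale matrices X_1..X_M (eqs. 9-10): column j of X_i holds variable j
    reconstructed from its i-th score/loading pair only."""
    n, m = X.shape
    scales = np.zeros((M, n, m))
    for j in range(m):
        scales[:, :, j] = ssa_components(X[:, j], M)
    return scales

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))          # toy data: n = 100, m = 3
scales = scale_representations(X, M=4)
# PCA at each scale (eq. 11): eigendecomposition of cov(X_i)
for Xi in scales:
    lam_i, P_i = np.linalg.eigh(np.cov(Xi, rowvar=False))
```

Since each variable's M components sum to the original series, the M scale matrices sum back to the original data matrix.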
Fig. 2. A schematic summary of multivariate data decomposition using SSA-based multiscale resolution analysis.

Note that these loadings p_{i,j} and corresponding scores t_{i,j} are not the same as those used to approximate the original data at the different scales (since the matrices from which they are derived are different). In a similar manner as with conventional PCA, the appropriate number of principal components to be retained (A) at each scale is selected, and the control limits on the monitored indexes (Hotelling's T² and Q statistics) are calculated using the parameters of the data set obtained under normal operating conditions (e.g. Kresta et al., 1991). Hotelling's T² statistic for sample k is given by

T_k² = t_k λ^{−1} t_k′ = x_k P_A λ^{−1} P_A′ x_k′   (12)

where t_k refers to the k'th row of T_A, the matrix of A score vectors from the PCA model, and λ^{−1} is the diagonal matrix containing the inverses of the eigenvalues associated with the A principal components retained in the model. The squared residuals or Q statistic resulting from approximating the data by the PCA model can be shown to be

Q_k = e_k e_k′ = x_k (I − P_A P_A′) x_k′   (13)

where e_k is the k'th row of the residual matrix E, and I is the identity matrix of appropriate size. The values of these two statistics are also calculated for new data sets, i.e. the scores of the new data are calculated by projecting these data onto the A principal component loadings calculated from Equation 11:

t_{new,i,j} = X_{new,i} p_{i,j}   (14)
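The monitoring statistics of eqs. (12)–(14) reduce to a few lines of linear algebra. The sketch below is illustrative, not the authors' implementation: the normal operating condition (NOC) data are arbitrary synthetic values, A = 2 components are retained, and samples are assumed to be rows of an already mean-centred and scaled matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
noc = rng.standard_normal((280, 5)) @ rng.standard_normal((5, 5))
Xn = (noc - noc.mean(axis=0)) / noc.std(axis=0)   # scaled NOC data

# PCA model from the NOC data; retain the A leading components
lam, P = np.linalg.eigh(np.cov(Xn, rowvar=False))
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]
A = 2
PA, lamA = P[:, :A], lam[:A]

def t2_q(xk):
    """Hotelling's T^2 (eq. 12) and Q (eq. 13) for one scaled sample xk."""
    tk = xk @ PA                                  # scores (eq. 14)
    T2 = tk @ np.diag(1.0 / lamA) @ tk
    e = xk - tk @ PA.T                            # residual of the PCA model
    return T2, e @ e

xnew = rng.standard_normal(5)                     # a new, already-scaled sample
T2, Q = t2_q(xnew)
```

Control limits would then be set from the NOC distribution of T² and Q, tightened per scale with eq. (15).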
If, at a specific scale, T² or Q for the reconstructed new data set is outside the calculated control limits, the process is judged to be out of control. The use of multiple tests increases the false positive error, i.e. the likelihood of incorrectly flagging an event as abnormal. For a set of independent tests, the significance level of each test must be adjusted such that the overall significance for all tests taken together equals the nominal value. Bonferroni's method can be used to adjust the significance values (α) at each level (Bakshi, 1998; Yoon and MacGregor, 2004). Thus, for an M-level decomposition, the required adjustment is given by

α_adj = 1 − (1 − α_nominal)^{1/M}   (15)

Below, the proposed methodology is demonstrated and assessed by means of two case studies: a simulated system, as well as data from a base metal flotation plant.

3. CASE STUDY I: A 2×2 DYNAMIC PROCESS

The multivariate system considered by Ku et al. (1995) and more recently Kano et al. (2002) is considered first. The following model represents the system:

x(t) = A x(t−1) + B u(t−1)
u(t) = C u(t−1) + D w(t−1)   (16)
y(t) = x(t) + v(t)

where

A = | 0.118  −0.191 |    B = |  1.0    2.0  |
    | 0.847   0.264 |,       |  3.0   −4.0  |,

C = | 0.811  −0.226 |    D = |  0.193   0.689 |
    | 0.477   0.415 |,       | −0.320  −0.749 |,

u(t) is the correlated input at time t, and v(t) and w(t) are zero-mean Gaussian noise, with variances of 0.1 and 1, respectively. The inputs u(t) and outputs y(t) are measured and used to monitor the system. Eight fault conditions were simulated, as shown in Table 1. Fault conditions 1–5 were generated by progressively larger shifts in the mean of the first component of the w vector, while the last three fault conditions were generated by changing one of the model parameters in equation 16.

Table 1. Settings of fault conditions of the 2×2 system

Case  Type                                 Magnitude
0     Normal conditions                    –
1     Mean shift of w1                     0 → 0.5
2     Mean shift of w1                     0 → 1.0
3     Mean shift of w1                     0 → 1.5
4     Mean shift of w1                     0 → 2.0
5     Mean shift of w1                     0 → 3.0
6     Change of parameter from u1 to x2    3 → 2.5
7     Change of parameter from u1 to x2    3 → 2.0
8     Change of parameter from u1 to x2    3 → 1.0
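As a hedged sketch (details beyond Table 1, such as zero initial states, are assumptions, and the function names are arbitrary), the system of eq. (16) and its fault conditions can be simulated as follows:

```python
import numpy as np

A = np.array([[0.118, -0.191], [0.847, 0.264]])
B = np.array([[1.0, 2.0], [3.0, -4.0]])
C = np.array([[0.811, -0.226], [0.477, 0.415]])
D = np.array([[0.193, 0.689], [-0.320, -0.749]])

def simulate(n, w1_shift=0.0, b21=3.0, seed=0):
    """Simulate eq. 16: w1_shift gives faults 1-5, b21 gives faults 6-8."""
    rng = np.random.default_rng(seed)
    Bf = B.copy()
    Bf[1, 0] = b21                                # parameter from u1 to x2
    x = np.zeros(2)
    u = np.zeros(2)
    data = np.zeros((n, 4))                       # monitored: [u1, u2, y1, y2]
    for t in range(n):
        x = A @ x + Bf @ u                        # state update
        w = rng.normal([w1_shift, 0.0], 1.0)      # var(w) = 1, shifted w1 mean
        u = C @ u + D @ w                         # correlated input update
        v = rng.normal(0.0, np.sqrt(0.1), 2)      # var(v) = 0.1
        data[t] = np.hstack([u, x + v])           # y(t) = x(t) + v(t)
    return data

normal = simulate(100)                            # case 0
fault3 = simulate(100, w1_shift=1.5)              # case 3: mean shift 0 -> 1.5
fault6 = simulate(100, b21=2.5)                   # case 6: parameter 3 -> 2.5
```

Each call yields a 100 × 4 block of monitored inputs and outputs, matching the sample size of 100 used per simulation in the comparison below.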
Similar to Kano et al. (2002), a 99% confidence limit was empirically determined using Monte Carlo simulation based on 200 realizations of normal operating conditions. Subsequently, the SSA-based multiscale MSPC was evaluated for each of the fault conditions and the percentage of samples violating the control limit, or reliability, was computed. The results reported in Table 2 are averages over 1000 simulations. For comparison, the results of alternative MSPC methods considered in Kano et al. (2002) are also indicated in Table 2, viz. conventional multivariate statistical process control (cMSPC) (e.g. Kourti and MacGregor, 1995), moving principal components (MPCA), a method based on the dissimilarity of data sets (DISSIM), as well as multiscale principal component analysis based on the use of wavelets (wavelet MSPCA) (Bakshi, 1998). A sample size of 100 was used in each simulation. The first maximal decorrelation point of variables involved was used to determine M, as indicated in Fig. 3.
Table 2. Comparison of reliability (%) of various MSPC methods (Kano et al., 2002) with SSA-based MSPCA in monitoring of the 2×2 simulated process

                 cMSPC        MPCA   DISSIM        wavelet MSPCA   SSA MSPCA
Fault    Index:  T3²    Q3    A4     D50    D200   T3²     Q3      T6,3²   Q6,3
0                1.1    1.0   0.6    1.6    1.2    0.7     0.8     1.79    3.04
1                1.3    1.0   0.7    1.4    1.2    0.5     0.9     2.14    3.64
2                2.4    1.1   5.2    1.0    2.5    0.7     1.5     2.67    3.86
3                4.3    1.2   26.7   1.5    7.8    1.3     3.0     4.1     5.84
4                8.5    1.6   71.0   5.6    22.4   3.0     7.5     7.23    10.7
5                23.0   2.3   99.9   40.7   57.6   15.6    28.8    24.91   32.39
6                1.2    1.6   0.9    2.1    1.7    0.6     3.9     1.82    17.06
7                1.3    3.2   5.3    3.5    1.3    1.1     19.8    2.04    43.45
8                2.0    9.5   75.2   50.4   8.4    4.5     57.0    2.58    68.04
Fig. 3. Determination of the size of the sliding window in the SSA of the 2×2 dynamic system by use of the autocorrelation functions of the variables. In this case a size corresponding to the first point of maximal decorrelation is chosen, i.e. M = 6.

On the basis of this window size, a six-level multiscale representation was constructed, the results of which are summarized in the last two rows of Table 2. As indicated in Table 2, SSA MSPCA performs at least as well as the alternative wavelet MSPCA. In fact, for all fault conditions 1–8, SSA MSPCA gave better performance, especially in detecting parameter changes. While it did not fare as well in some instances when compared with other methods (DISSIM and MPCA), it must be noted that these methods focus on detecting changes in the correlation among variables. Hence, the comparison with the multiresolution approaches is less direct, as multiscale extensions of these methods can also be implemented (Kano et al., 2002).
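The window-size heuristic of Fig. 3 is easy to automate. The sketch below is illustrative (the sinusoidal test signal and the function name are assumptions, not from the paper); it returns the first lag at which the sample autocorrelation drops to zero:

```python
import numpy as np

def first_decorrelation_lag(x, max_lag=25):
    """First lag at which the sample autocorrelation of x reaches zero or
    below, used as the SSA window size M (cf. Fig. 3, where M = 6)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # unnormalised sample autocovariance at lags 0..max_lag
    acov = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])
    acf = acov / acov[0]
    crossings = np.flatnonzero(acf <= 0.0)
    return int(crossings[0]) if crossings.size else max_lag

# A sinusoid of period 40 decorrelates at roughly a quarter period (lag ~10).
# For a multivariate data set one would inspect every variable's ACF and
# choose a common window, as was done for the 2x2 system.
t = np.arange(2000)
M = first_decorrelation_lag(np.sin(2 * np.pi * t / 40))
```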
4. BASE METAL FLOTATION PLANT

In the second case study, five features extracted from digital images of surface froths in the zinc roughers of a complex Australian base metal flotation plant were considered: SNE, LNE, NNU, SM and INSTAB, as discussed by Bezuidenhout et al. (1997). The textural features of the froth were characterised using the neighbouring grey level dependence matrix method. It has been shown that such image data can be used to distinguish different operating conditions with respect to the quality of the froth, in particular the amount of gangue content and the stability of the froth.

Image data collected under steady-state normal plant conditions were used to build a reference model. These data were associated with high values of valuable metal content. Visually, they had characteristic spherical bubbles, with a much darker appearance compared to abnormal plant conditions. The SSA-based process monitoring scheme was applied using a sliding window of size M = 16, determined on the basis of the autocorrelation functions of the variables. Reference models for both the conventional and SSA MSPCA were based on a set of 280 samples collected under normal operating conditions. A test set of 300 samples collected under faulty conditions was projected onto each of the models. Figs 4 and 5 show the T² monitoring charts for cMSPC and SSA MSPCA, respectively. While cMSPC is able to detect the occurrence of the fault condition after the 280th sample (Fig. 4), detection is not as pronounced and unambiguous as with the SSA MSPCA approach (Fig. 5). Moreover, since the change is detected at the coarse scale (scale = 1), which is associated with the slowly changing and deterministic components, SSA MSPCA directs troubleshooting efforts to, say, areas or tasks that are related to process dynamics, instead of sensor disturbances.

Fig. 4. T²-based process monitoring chart for the flotation froth variables, with the 95% (broken lines) and 99% (solid lines) confidence limits superimposed.

Fig. 5. T²-based process monitoring chart for the flotation froth variables at the first scale using SSA MSPCA, with the 95% (broken lines) and 99% (solid lines) confidence limits superimposed (adjusted upwards using Bonferroni's method).

5. DISCUSSION AND CONCLUSIONS

In this study, multiscale process monitoring based on the use of singular spectrum analysis is proposed, instead of making use of wavelets. When the signals are decomposed on the basis of their singular spectra,
this can be seen as equivalent to the use of wavelets that are constructed from the data themselves, with shapes (basis functions) adapted to fit these data accurately. In this way, in principle at least, more reliable fault detection can be achieved, as was indicated by the results from the first case study. Finally, another advantage of decomposing the data based on the singular spectra of the signals is that it lends itself readily to extension by means of nonlinear methods. For example, in theory at least, the effectiveness with which the proposed method detects process deviations can be improved by evaluating the data at the respective scales with nonlinear singular spectrum analysis (e.g. Hsieh and Wu, 2002; Jemwa and Aldrich, 2006) rather than linear PCA, as was the case in this project. The use of generalized principal components could also be considered to obtain orthonormal bases of functions with multiscale compact supports (Saucier, 2005).

REFERENCES

Bakshi, B.R. (1998). Multiscale PCA with application to multivariate statistical process monitoring, AIChE Journal, 44, 1596-1610.
Bezuidenhout, M., J.S.J. van Deventer and D.W. Moolman (1997). The identification of perturbations in a base metal flotation plant using computer vision of the froth surface, Minerals Engineering, 10, 1057-1073.
Golyandina, N., V. Nekrutkin and A. Zhigljavsky (2001). Analysis of Time Series Structure: SSA and Related Techniques. Chapman & Hall/CRC, FL, USA.
Hsieh, W.W. and A. Wu (2002). Nonlinear singular spectrum analysis. In: Proceedings of the 2002 International Joint Conference on Neural Networks, 3, 2819-2824.
Jemwa, G.T. and C. Aldrich (2006). Classification of process systems with nonlinear Monte Carlo singular spectrum analysis, Computers and Chemical Engineering, 30, 816-831.
Kano, M., K. Nagao, S. Hasebe, I. Hashimoto, H. Ohno, R. Strauss and B.R. Bakshi (2002). Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem, Computers and Chemical Engineering, 26, 161-174.
Kourti, T. and J.F. MacGregor (1995). Process analysis, monitoring and diagnosis using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems, 28, 3-21.
Kresta, J.V., J.F. MacGregor and T.E. Marlin (1991). Multivariate statistical monitoring of process operating performance, Canadian Journal of Chemical Engineering, 69, 35-47.
Ku, W., R.H. Storer and C. Georgakis (1995). Disturbance detection and isolation by dynamic principal component analysis, Chemometrics and Intelligent Laboratory Systems, 30, 179-196.
Lee, C., S.W. Choi and I. Lee (2004). Sensor fault identification based on time-lagged PCA in dynamic processes, Chemometrics and Intelligent Laboratory Systems, 70, 165-178.
Saucier, A. (2005). Construction of data-adaptive wavelet bases with an extension of principal component analysis, Applied and Computational Harmonic Analysis, 18, 300-328.
Vautard, R., P. Yiou and M. Ghil (1992). Singular-spectrum analysis: A toolkit for short, noisy chaotic signals, Physica D, 58, 95-126.
Wold, S. (1994). Exponentially weighted moving principal component analysis and projection to latent structures, Chemometrics and Intelligent Laboratory Systems, 23, 149-161.
Yoon, S. and J.F. MacGregor (2004). Principal component analysis of multiscale data for process monitoring and fault diagnosis, AIChE Journal, 50, 2891-2903.