Chemical Engineering Research and Design 90 (2012) 2262–2277
Using improved self-organizing map for fault diagnosis in chemical industry process

Xinyi Chen, Xuefeng Yan∗

Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai 200237, PR China
Abstract

There are numerous fault diagnosis methods for complex chemical processes, among which methods offering effective visualization of fault diagnosis are the more challenging. In order to visualize the occurrence of a fault clearly, a novel fault diagnosis method combining the self-organizing map (SOM) with correlative component analysis (CCA) is proposed. Based on the sample data, CCA extracts as much fault classification information as possible, and then, based on the identified correlative components, the SOM distinguishes the various types of states on the output map. Further, the output map can be employed to monitor abnormal states through the visualization capability of the SOM. A case study of the Tennessee Eastman (TE) process illustrates the fault diagnosis and monitoring performance of the proposed method. The results show that the SOM integrated with CCA is efficient and capable of real-time monitoring and fault diagnosis in complex chemical processes. © 2012 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Keywords: Self-organizing maps; Correlative component analysis; Fault diagnosis; Monitoring; TE process
1. Introduction
Nowadays, with the development of science and technology, modern chemical processes have become increasingly automatic and complex, and the connections among all components of a chemical process are much closer than before. On the one hand, this progress increases production capacity and reduces production cost; on the other hand, because of these tight connections, a slight error in the process may eventually cause a huge loss. It is therefore very important to find an effective method to monitor the whole process and detect faults in time. Over the past decades, different approaches have been pursued to achieve this goal, and an abundant literature on process fault diagnosis, ranging from analytical methods to artificial intelligence and statistical approaches, has been discussed (Venkatasubramanian et al., 2003a,b,c). Generally speaking, fault diagnosis and monitoring methods can be broadly categorized into two classes, namely "model-based methods" and "data-based methods." Model-based methods, as the name implies, need an accurate model of the process in order to detect and diagnose deviations; examples include expert systems (Muthuswamy and Srinivasan, 2003), Kalman filters (Bhagwat et al., 2003), multilinear models (Azimzadeh et al., 2001) and so on. However, an accurate model is seldom available and is difficult to construct for some highly complex processes (Venkatasubramanian et al., 2003a,b). This shortcoming restricts the practical applicability of model-based methods, especially for complex industrial-scale processes. To overcome the difficulty of developing a precise model, data-based approaches have been employed. These methods do not assume any form of model and rely on historical operating data to characterize the process, so they have attracted more attention and been well developed. Statistical models and neural networks are widely used as data-based methods in fault diagnosis. Statistical models include Principal Component Analysis (PCA) (Chiang et al., 2000), Partial Least Squares (PLS) (Lee et al., 2003), Canonical Variate Analysis (CVA) (Russell et al., 2000) and so on. Chiang et al. (2001) reviewed the above-mentioned statistical methods and compared them on an industrial process; the results show that every method has both advantages and disadvantages. Kernel independent component analysis
∗ Corresponding author at: East China University of Science and Technology, P.O. Box 293, MeiLong Road No. 130, Shanghai 200237, PR China. E-mail address: [email protected] (X. Yan).
Received 20 February 2012; Received in revised form 17 May 2012; Accepted 8 June 2012
0263-8762/$ – see front matter © 2012 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.cherd.2012.06.004
(KICA), as a new nonlinear process monitoring method, was applied to fault detection in a fermentation process by Zhang and Qin (2007); the application shows its superior fault-detection ability compared with ICA. Neural networks are popular methods for fault diagnosis because of their powerful classification and function approximation capabilities. Samanta and Al-Balushi (2003) employed a neural network for fault diagnosis of rolling element bearings with effective results, and a neural-network-based analog fault diagnostic system developed by Aminian et al. (2002) for actual circuits obtained 95% accuracy in classifying faulty components. As an unsupervised neural network, the self-organizing map (SOM) effectively preserves the original topological structure of the sample vectors with intuitive visualization, and it has become an important tool for fault diagnosis and monitoring. Effective visualization of fault diagnosis has always been a problem for scholars in this field; SOM-based fault diagnosis methods have therefore attracted great interest, and relevant research has been conducted ever since. The SOM was adopted for multivariate temporal data analysis by Ng and Srinivasan (2008a,b): the training data are resampled to yield equal representation of the different process states; after training, the output neurons of the SOM are clustered and labeled, and the signatures of normal and abnormal transitions are compared in order to monitor and detect failures. This method offers good visualization and intuitiveness, especially for transitions in continuous processes. Fuertes et al. (2010) proved that the SOM is very useful for displaying dynamic features of a process and for identifying the probabilities of transition among conditions. In Corona et al. (2010), the SOM was applied to visualizing process measurements, with the relevant information extracted by exploiting the topological structure of the observations. It was confirmed that the SOM-based approach can provide valuable information and offers possibilities for direct application to process monitoring tasks. There is a large literature on SOM-based approaches to fault diagnosis, and every method has its own merits and suitable applications. However, with the improvement of modern information technology, huge amounts of data are collected from the monitoring systems of a process, and the collected data are nonlinear and mutually correlated. Although the SOM technique can reduce the dimensionality of the data and preserve their topology, it is still sometimes difficult to make a clear distinction between clusters, especially in complex chemical industrial processes. Consequently, before SOM training, we should find a suitable method to preprocess the original data in order to extract the most valuable information from the huge dataset. The most popular feature extraction method is Principal Component Analysis (PCA). Nevertheless, it is not effective in some fields because only the variance of the data is considered. Therefore, a better feature extraction method, correlative component analysis (CCA) (Chen et al., 1996), is adopted here. The characteristics extracted by CCA reflect not only the relatively high variance of the input data but also the information most pertinent to classification. Based on the extracted features, the SOM training result is much more distinct for the different clusters, and the accuracy of fault diagnosis is greatly improved. In this paper, we use correlative component analysis (CCA) for feature extraction and integrate it with the SOM for fault diagnosis and monitoring in chemical industrial processes. The Tennessee Eastman (TE) process is a
well-known simulation model of a chemical process and is employed to illustrate the performance of the proposed method. The experimental results show that the SOM integrated with CCA performs better in fault diagnosis and monitoring than the simple SOM, the SOM integrated with PCA, and other statistical methods. The remainder of this article is organized as follows. Section 2 provides a detailed review of self-organizing maps. Section 3 introduces correlative component analysis (CCA) for feature extraction. Section 4 explains fault diagnosis based on the SOM integrated with CCA in detail. In Section 5, we apply the proposed method to the TE process and discuss the results. Finally, a conclusion is presented in Section 6.
2. Self-organizing maps
The self-organizing map (SOM), an unsupervised neural network, was first proposed by Kohonen (Kohonen, 1990). It is widely used for data exploration and clustering in finance, linguistics, the sciences, and so on. Through the SOM, high-dimensional input data are projected onto a lower-dimensional output map, usually a one- or two-dimensional grid. Moreover, a distinguishing feature of the SOM is that it preserves the topology of the input data from the high-dimensional input space onto the output map, in such a way that the relative distances between input data are more or less preserved (Vesanto and Alhoniemi, 2000): input data points located close to each other in the input space are mapped to nearby neurons on the output map. Hence, the SOM can serve as a clustering tool for high-dimensional data and has a valuable ability in data visualization.
2.1. The SOM algorithm
The SOM algorithm is very straightforward. The network has only two layers: the input and output layers. The input layer represents the input dataset X of size I × N, with x_i = [x_i1, x_i2, ..., x_in, ..., x_iN] for i = 1, 2, 3, ..., I, where I is the total number of input samples and N is the dimension of the input samples. The output layer is an ordered collection of neurons, normally arranged on either a hexagonal or a rectangular lattice; the hexagonal lattice is usually preferred because it offers better visualization. Each neuron is connected to the input layer via a weight vector (reference vector) whose dimensionality equals that of the input vector, i.e., w_j = [w_j1, w_j2, ..., w_jn, ..., w_jN] for j = 1, 2, 3, ..., J, where J is the total number of neurons on the map. The weight vectors of the output neurons are compared with the input vectors according to some distance measure, in order to determine the degree of activation of each neuron; the Euclidean distance is usually used as the criterion. After the comparison, the neuron whose reference vector has the smallest difference from x_i is identified as the winner, called the best matching unit (BMU) for that input:
b_i = argmin_j ||x_i − w_j||,   i = 1, 2, 3, ..., I,  j = 1, 2, 3, ..., J   (1)
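For a concrete picture of Eq. (1), the BMU search is a simple nearest-neighbor computation over the weight matrix. The following is a minimal Python sketch (numpy is assumed and the function name is hypothetical):

```python
import numpy as np

def find_bmu(x, W):
    """Return the index of the best matching unit (BMU) for input x.

    x : (N,) input vector
    W : (J, N) weight matrix, one row per output neuron
    """
    # Euclidean distance from x to every neuron's weight vector
    d = np.linalg.norm(W - x, axis=1)
    # b_i = argmin_j ||x_i - w_j||  (Eq. (1))
    return int(np.argmin(d))
```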
After the selection of the winning neuron b, the weight vector of the winning neuron is updated by a certain amount to
make it closer to the input vector. In addition, the weight vectors of the neighboring neurons are also updated, but to a lesser degree; as a consequence of this adjustment, the updated weight vectors of the neurons in the SOM move a small amount toward the input vector. The weight update rule is

w_j(t + 1) = w_j(t) + α(t) h_bj(t) [x_i(t) − w_j(t)]   (2)

where α(t) is the learning rate parameter and h_bj(t) is the neighborhood function; both are usually decreased gradually during training. Let the location of the jth neuron on the output map be r_j, where r_j ∈ R², and let r_b be the position of the winning neuron on the output map. A popular choice of neighborhood function is the Gaussian function

h_bj(t) = exp(−||r_b − r_j||² / (2σ²(t)))   (3)

where σ(t) is the neighborhood width, which decreases with time in order to control the size of the neighborhood at time t. To guarantee convergence, σ(t) and α(t) are given large initial values and then decreased monotonically with t: as t → ∞, α(t) → 0 and σ(t) approaches a small value (typically 1). The training phase ends when the number of iterations exceeds the predetermined total number of iterations, and the neurons on the output map are then labeled with their corresponding input names.

In summary, the SOM algorithm is given below (Corona et al., 2010; Han and Song, 2003):

Step 1: Initialize the weight vectors of all neurons on the output map, and set the topological neighborhood parameter h_bj(t) and the learning rate parameter α(t).
Step 2: Choose an input vector from the input space at random.
Step 3: Calculate the distance between the input vector and the weight vectors of all output neurons, find the winning neuron as in Eq. (1), and update the weight vectors of all neurons within the neighborhood of the winning neuron as in Eq. (2).
Step 4: Update the learning rate α(t) and the neighborhood function h_bj(t).
Step 5: Test the stopping condition. If it is satisfied, go to Step 6; otherwise return to Step 2.
Step 6: Label the input data names on the final output map: each input vector searches for its winning neuron b and labels its name on b.

Two parameters must be specified: the number of neurons in the SOM (J) and the aspect ratio of the two-dimensional grid. The number of neurons determines the accuracy and capability of the SOM: a bigger map gives a lower quantization error, but a higher topographic error and a higher computational cost. Therefore, a reasonable optimum for the number of neurons is given by the heuristic formula

J = 5√I   (4)

and the aspect ratio is specified by the square root of the ratio between the two largest eigenvalues of the covariance matrix of X (Lamrini et al., 2011).
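The training loop of Steps 1–6, together with the map-size heuristic of Eq. (4), can be sketched in Python. This is a minimal sketch only: numpy is assumed, the function name and the initial values of α and σ are illustrative choices, and a rectangular grid is used for simplicity (the maps in this paper are hexagonal):

```python
import numpy as np

def train_som(X, n_iter=2000, seed=0):
    """Minimal SOM training loop following Eqs. (1)-(4)."""
    rng = np.random.default_rng(seed)
    I, N = X.shape
    J = int(round(5 * np.sqrt(I)))            # Eq. (4): heuristic map size
    # aspect ratio from the two largest eigenvalues of the covariance of X
    eigvals = np.sort(np.clip(np.linalg.eigvalsh(np.cov(X.T)), 1e-12, None))[::-1]
    ratio = np.sqrt(eigvals[0] / eigvals[1])
    rows = max(1, int(round(np.sqrt(J * ratio))))
    cols = max(1, int(round(J / rows)))
    # neuron grid positions r_j and random initial weights
    r = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    W = rng.standard_normal((rows * cols, N))
    a0, s0 = 0.5, max(rows, cols) / 2.0       # initial alpha(t) and sigma(t)
    for t in range(n_iter):
        frac = t / n_iter
        alpha = a0 * (1.0 - frac)                  # alpha(t) -> 0
        sigma = s0 * (1.0 - frac) + 1.0 * frac     # sigma(t) decays toward 1
        x = X[rng.integers(I)]                     # Step 2: random sample
        b = np.argmin(np.linalg.norm(W - x, axis=1))                     # Eq. (1)
        h = np.exp(-np.sum((r - r[b]) ** 2, axis=1) / (2 * sigma ** 2))  # Eq. (3)
        W += alpha * h[:, None] * (x - W)          # Eq. (2): weight update
    return W, r, (rows, cols)
```

After training, each input vector is assigned to the neuron holding its BMU, which is what the labeling in Step 6 relies on.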
2.2. The visualization of SOM
Because the SOM can project high-dimensional data into two dimensions, it is commonly used for visualization in data analysis. There are many different SOM-based visualization methods; among the most widely used are the unified distance matrix (U-matrix), the component map, and the hit histogram map (Nikkilä et al., 2002).
2.2.1. The unified distance matrix (U-matrix)
The U-matrix shows the average distance from each neuron's weight vector to the weight vectors of its neighboring neurons, and visualizes the SOM through colors: usually, dark colors represent small distances and lighter colors represent larger distances. Areas of low U-matrix values form the clusters, while high U-matrix values indicate cluster borders. Clusters therefore appear on the U-matrix as dark areas with light borders.
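The U-matrix values can be computed as follows (a minimal Python sketch; numpy is assumed and the function name is hypothetical; a rectangular 4-neighborhood is used here for simplicity, although the maps in this paper are hexagonal):

```python
import numpy as np

def u_matrix(W, grid_shape):
    """Average distance from each neuron's weight vector to the
    weight vectors of its lattice neighbours (rectangular lattice)."""
    rows, cols = grid_shape
    W = W.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(W[i, j] - W[ni, nj]))
            U[i, j] = np.mean(dists)
    return U   # low values: cluster interiors; high values: borders
```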
2.2.2. The component map
Each component plane is associated with one original variable and shows the contribution of that variable to the cluster structure: the values of a single component of all the weight vectors are presented separately. The component planes are helpful for identifying possible dependencies, indicated by colors of different values; dependencies can be recognized qualitatively as similar patterns in identical locations on the component planes, so it is very easy to visualize and search for similar patterns.
2.2.3. The hit histogram map
The hit histogram map, as a SOM visualization method, shows the number of hits on each neuron of the map and thus displays the distribution of the data over the map; the height of the hit histogram on a neuron corresponds to its number of hits. When the input data are projected onto the map, close input vectors are mapped to the same area, and such areas are exactly the clusters to be visualized.
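The hit counts follow directly from the BMU rule of Eq. (1). A minimal Python sketch (numpy assumed, name hypothetical):

```python
import numpy as np

def hit_histogram(X, W):
    """Count how many input samples choose each neuron as their BMU."""
    hits = np.zeros(len(W), dtype=int)
    for x in X:
        b = np.argmin(np.linalg.norm(W - x, axis=1))  # BMU of this sample
        hits[b] += 1
    return hits
```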
3. The correlative components analysis
Nowadays there are new challenges in fault diagnosis and monitoring. An abundance of high-dimensional original data is acquired from both normal and faulty operation through modern instruments. However, not all variables are equally important for the task: some have little or even no effect on classification, and some are correlated with each other. If all variables were taken equally into account as input variables of the SOM, it would be extremely complicated, and even impossible, to achieve reasonable fault diagnosis results. To overcome this problem, a feature extraction method is required that extracts the important information from the primary measurements while keeping as much useful information as possible. There are many feature extraction methods in the literature,
and the typical method is Principal Component Analysis (PCA) (Kämpjärvi et al., 2008). However, PCA considers only the input data and extracts the components mainly according to their variance values; the components extracted by PCA are therefore not significant enough in terms of classification power. To overcome this shortcoming, the correlative components analysis (CCA) was designed particularly for classification problems and can extract class information as much as possible (Chen et al., 1996; Yan et al., 2001; Wang et al., 2004).

Assume that the original input patterns are vectors with m elements, so that each pattern belongs to an m-dimensional space. The whole original sample dataset consists of n patterns belonging to k classes. The elements of the patterns of the original sample construct an n × m sample data matrix, denoted by X. Each class can be denoted by a unit vector in a k-dimensional class space, in which the ith class is denoted by

e_i = (0, ..., 0, 1, 0, ..., 0)^T   (5)

where the ith element is 1 and the others are 0 (i − 1 zeros before it and k − i zeros after it). The classification matrix Y is thus formed as an n × k class matrix in the class space. In fact, Y can be reduced to an n × (k − 1) matrix, and the class of every row of X can still be expressed clearly. Suppose that all columns of X are standardized and all columns of Y are centralized, and define Z = Y^T · X; each column of Z is then centralized. The matrix S = Z^T Z = X^T Y Y^T X is an m × m nonnegative definite real symmetric matrix; let λ1 be its largest eigenvalue and u1 the corresponding unit eigenvector. Under the condition u1^T · u1 = 1, the quantity u1^T · S · u1 reaches its largest value, and since

λ1 = u1^T · S · u1 = u1^T · X^T · Y · Y^T · X · u1   (6)

λ1 also reaches the highest value. The quantity λ = u^T · S · u is named the correlation index (CI); a maximum CI corresponds to the most significant classification capability. The component t1 = Xu1 thus has the most significant classification power and is named the first classification characteristic (CC1). Denote X1 = X, so that t1 = Xu1 = X1u1, and reject t1 from the matrix X1 by calculating p1:

p1 = X1^T t1 / (t1^T t1)   (7)

where p1 is a loading vector of the data matrix X1; then

X2 = X1 − t1 · p1^T   (8)

Similarly, for the matrix S2 = X2^T Y Y^T X2, the largest eigenvalue λ2 of S2 and the corresponding unit eigenvector u2 are calculated, which determine the second classification characteristic (CC2) t2 = X2u2. In this way, up to the mth classification characteristic (CCm) can be determined; a classification characteristic with a larger CI value has higher classification power. The total CCs are recorded as T = [t1, t2, ..., tm], the loading matrix as P = [p1, p2, ..., pm], and U = [u1, u2, ..., um]. The matrix U is used to calculate T; however, T is not related to the data X directly. Therefore, an original weight matrix W = [w1, w2, ..., wm] is introduced, which can be used to calculate the matrix T directly from X (Li et al., 2009):

T = XW   (9)

where W is obtained from P and U:

W = U(P^T U)^(−1)   (10)

Usually, only a few classification characteristics are selected among all the CCs as the feature vectors. On the one hand, this reduces the dimension and the computational work; on the other hand, it removes noise from the data and gives a better mapping result (Yan et al., 2005). The correlation index (CI) obviously represents the classification power. Let λj be the correlation index of the jth CC; then the correlative quantity portion ηj of CCj is defined as

ηj = λj / Σ(i=1 to m) λi   (11)

The first l CCs are selected according to the value of the correlative quantity portion: if every correlative quantity portion of the first l CCs is over the set value, the l CCs are selected as the feature vectors, and the accumulated correlative quantity portion of the first l CCs then reaches a high value. Because the original data, especially in chemical industry processes, are characterized by multicollinearity, this decrease in dimensionality is remarkably effective: the reduced dimension l is much less than m, yet it retains the major part of the information in the original data. Consequently, CCA is a useful method to preprocess the data, and the extracted components are used as the input data for the SOM.

4. Fault diagnosis based on SOM integrated with CCA

In this study, the SOM integrated with CCA is employed for fault diagnosis and monitoring. CCA extracts characteristic variables from the original data, which contain both normal and fault states; the CCs not only carry more variance but also preserve the classification information of the faults. When the CCs are fed into the SOM as input data, the output plane can be clearly divided into clusters. The integration of CCA and SOM thus greatly improves the ability of fault diagnosis and makes monitoring easy. The detailed application procedure of fault diagnosis is as follows:
1) Construct an n × m matrix X of original training data containing both normal and fault data, together with its corresponding n × (k − 1) class matrix Y. All data are standardized and centralized.
2) Extract the classification characteristics (CCs) t1, t2, ..., tm from the matrix X according to the CCA method introduced in Section 3.
3) Calculate the correlative quantity portion of each classification characteristic according to Eq. (11), and set a critical value of η. If every correlative quantity portion of the first l CCs is over the set value, the l CCs, t1, t2, ..., tl, are selected as the feature vectors; the vectors u1, u2, ..., ul and p1, p2, ..., pl are determined as well, and the weight matrix W_l = [w1, w2, ..., wl] is calculated as in Eq. (10).
Fig. 1 – Flow diagram of the TE process.
4) Define a new training data matrix F of reduced dimension n × l, consisting of the first l CCs: F = [t1, t2, ..., tl].
5) Set the initial parameters of the SOM network and, based on the training data matrix F, train the SOM network according to the algorithm introduced in Section 2.1.
6) Construct the test data. After training, the BMU of each input vector is found, and the names of the fault categories are labeled on the output map. Once the output neurons are labeled and clustered, the map can be used for fault diagnosis. A test dataset is constructed to estimate the accuracy for each state of the process. Let the standardized original test data X_test be an I × N matrix with ith sample x_i = [x_i1, x_i2, ..., x_in, ..., x_iN]; T_test is calculated as in Eq. (9):

T_test = X_test W_l   (12)

7) Send T_test into the trained map as test data, whose dimension is thus reduced from m to l. Each test vector automatically seeks its best matching unit by the criterion of the shortest distance between the test vector and the prototype vectors.
8) Calculate the accuracy of fault diagnosis by comparing the label of the mapped neuron with the actual fault state.
9) Monitor the process. A special data-mapping technique of the SOM is the trajectory: if the samples follow a time series, their responses on the map can be tracked. This unique ability is very helpful for monitoring the whole process; for example, when a fault happens, the response neuron leaves the normal area for a fault area, so this visualization helps us detect the fault at once.
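Steps 2), 3), and 6) above, i.e., the CCA extraction and the projections of Eqs. (9), (10), and (12), can be sketched in Python. This is a minimal sketch of the deflation scheme of Section 3; numpy is assumed and all names are hypothetical:

```python
import numpy as np

def cca_features(X, Y, l):
    """Extract the first l classification characteristics (CCs).

    X : (n, m) standardized sample matrix
    Y : (n, k-1) centralized class matrix
    Returns the scores T (n, l) and the weight matrix W_l (m, l), so that
    new data can be projected directly: T_new = X_new @ W_l  (Eq. (9)).
    """
    Xd = X.copy()
    T, P, U = [], [], []
    for _ in range(l):
        S = Xd.T @ Y @ Y.T @ Xd            # S = X^T Y Y^T X
        vals, vecs = np.linalg.eigh(S)
        u = vecs[:, -1]                    # eigenvector of the largest eigenvalue
        t = Xd @ u                         # classification characteristic (CC)
        p = Xd.T @ t / (t @ t)             # Eq. (7): loading vector
        Xd = Xd - np.outer(t, p)           # Eq. (8): deflation
        T.append(t); P.append(p); U.append(u)
    T = np.column_stack(T)
    P = np.column_stack(P)
    U = np.column_stack(U)
    W_l = U @ np.linalg.inv(P.T @ U)       # Eq. (10)
    return T, W_l
```

For this deflation scheme, the scores satisfy T = X @ W_l, which is what lets test data be projected in a single step as in Eq. (12).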
5. Experimental results and analysis
In order to illustrate the performance of fault diagnosis and monitoring with the self-organizing maps, the proposed
method is applied in the Tennessee Eastman process. In this section, three methods are employed for fault diagnosis and monitoring separately: simple SOM, SOM integrated with PCA, and SOM integrated with CCA methods. The purpose is to show the performance of different methods, and to demonstrate the advantages of SOM integrated with CCA.
5.1. The Tennessee Eastman process
Created by the Eastman Chemical Company, the Tennessee Eastman process provides a realistic industrial process that has been widely used in fault diagnosis and monitoring studies (Lee et al., 2004; Singhal and Seborg, 2006). The process consists of five major units: a product condenser, a reactor, a recycle compressor, a separator, and a stripper. The eight components are A, B, C, D, E, F, G, and H. The flow diagram of the process is shown in Fig. 1. The gaseous reactants A, C, D, and E and the inert B are fed to the reactor, where the liquid products G and H are formed. The reactions in the reactor are:
A(g) + C(g) + D(g) → G(liq)
A(g) + C(g) + E(g) → H(liq)
A(g) + E(g) → F(liq)
3D(g) → 2F(liq)
There are 22 continuous process measurements, 19 composition measurements, and 12 manipulated variables in the TE process, which are listed in Tables 1–3, respectively. Excluding the agitation speed of the reactor's stirrer (XMV(12)), there are 52 variables in all for each observation. In some studies, a certain number of representative variables were selected in order to get a better performance; in this paper, all 52 variables are used to show the performance of the SOM-based methods in fault diagnosis and monitoring.
Table 1 – Process measurements.

Variable no.   Process measurement                     Unit
XMEAS(1)       A feed (stream 1)                       kscmh
XMEAS(2)       D feed (stream 2)                       kg/h
XMEAS(3)       E feed (stream 3)                       kg/h
XMEAS(4)       A and C feed (stream 4)                 kscmh
XMEAS(5)       Recycle flow (stream 8)                 kscmh
XMEAS(6)       Reactor feed rate (stream 6)            kscmh
XMEAS(7)       Reactor pressure                        kPa gauge
XMEAS(8)       Reactor level                           %
XMEAS(9)       Reactor temperature                     °C
XMEAS(10)      Purge rate (stream 9)                   kscmh
XMEAS(11)      Product separator temperature           °C
XMEAS(12)      Product separator level                 %
XMEAS(13)      Product separator pressure              kPa gauge
XMEAS(14)      Product separator underflow             m3/h
XMEAS(15)      Stripper level                          %
XMEAS(16)      Stripper pressure                       kPa gauge
XMEAS(17)      Stripper underflow (stream 11)          m3/h
XMEAS(18)      Stripper temperature                    °C
XMEAS(19)      Stripper steam flow                     kg/h
XMEAS(20)      Compressor work                         kW
XMEAS(21)      Reactor cooling water outlet temp       °C
XMEAS(22)      Separator cooling water outlet temp     °C

Table 2 – Composition measurements.

Variable       Composition     Stream no.     Sample time/min
XMEAS(23)      A               6              6
XMEAS(24)      B               6              6
XMEAS(25)      C               6              6
XMEAS(26)      D               6              6
XMEAS(27)      E               6              6
XMEAS(28)      F               6              6
XMEAS(29)      A               9              6
XMEAS(30)      B               9              6
XMEAS(31)      C               9              6
XMEAS(32)      D               9              6
XMEAS(33)      E               9              6
XMEAS(34)      F               9              6
XMEAS(35)      G               9              6
XMEAS(36)      H               9              6
XMEAS(37)      D               11             15
XMEAS(38)      E               11             15
XMEAS(39)      F               11             15
XMEAS(40)      G               11             15
XMEAS(41)      H               11             15

Table 3 – Manipulated variables.

Variable    Description
XMV(1)      D feed flow (stream 2)
XMV(2)      E feed flow (stream 3)
XMV(3)      A feed flow (stream 1)
XMV(4)      A and C feed flow (stream 4)
XMV(5)      Compressor recycle valve
XMV(6)      Purge valve (stream 9)
XMV(7)      Separator pot liquid flow (stream 10)
XMV(8)      Stripper liquid product flow (stream 11)
XMV(9)      Stripper steam valve
XMV(10)     Reactor cooling water valve
XMV(11)     Condenser cooling water flow
XMV(12)     Stirring rate

Table 4 – Process faults.

Variable    Description                                                 Type
IDV(1)      A/C feed ratio, B composition constant (stream 4)           Step
IDV(2)      B composition, A/C ratio constant (stream 4)                Step
IDV(3)      D feed temperature (stream 2)                               Step
IDV(4)      Reactor cooling water inlet temperature                     Step
IDV(5)      Condenser cooling water inlet temperature                   Step
IDV(6)      A feed loss (stream 1)                                      Step
IDV(7)      C header pressure loss-reduced availability (stream 4)      Step
IDV(8)      A, B, C feed composition (stream 4)                         Random variation
IDV(9)      D feed temperature (stream 2)                               Random variation
IDV(10)     C feed temperature (stream 4)                               Random variation
IDV(11)     Reactor cooling water inlet temperature                     Random variation
IDV(12)     Condenser cooling water inlet temperature                   Random variation
IDV(13)     Reaction kinetics                                           Slow drift
IDV(14)     Reactor cooling water valve                                 Sticking
IDV(15)     Condenser cooling water valve                               Sticking
IDV(16)     Unknown                                                     Unknown
IDV(17)     Unknown                                                     Unknown
IDV(18)     Unknown                                                     Unknown
IDV(19)     Unknown                                                     Unknown
IDV(20)     Unknown                                                     Unknown
IDV(21)     The valve for stream 4 fixed at the steady-state position   Constant position
The TE process contains 21 preprogrammed faults (faults 1–21), which are described in Table 4. Among them, sixteen faults are known and five are unknown. Some faults lead to large changes in most variables and are easy to diagnose; others cause little deviation of the variables from normal behavior, so some methods are not effective for this kind of fault. For each pattern, the simulation time is 48 h, and the fault is introduced into the process after 8 h. All simulation data are standardized before training and testing.
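As a practical note, the standardization step can be sketched as follows (a hypothetical helper; the important detail is that the test data reuse the training mean and standard deviation so that both sets are scaled consistently):

```python
import numpy as np

def standardize(train, test):
    """Z-score both sets using the TRAINING statistics only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant variables
    return (train - mu) / sigma, (test - mu) / sigma
```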
5.2. Fault diagnosis based on the simple SOM
5.2.1. Case study on fault 1 and fault 2
Faults 1 and 2 are easy to diagnose because more than half of the monitored variables deviate significantly from their normal operating behavior, so the simple SOM method can distinguish these faults easily. We select 80 fault observations from the whole process of fault 1 and of fault 2 separately (30-min sampling interval) and 80 normal observations as training data (I = 240). Forty observations are selected from each of the three processes (normal, fault 1, and fault 2) as test data, for a total of 120 test samples. In this study, the original training data (240 × 52) are used as the input data of the SOM, and a two-dimensional SOM is constructed. The number of map units is determined by Eq. (4), and the map size (grid ratio) is determined by the square root of the ratio between the two largest eigenvalues of the covariance matrix of the training data. Considering the complexity of the data, the map size is enlarged to four times this value, which gives a 23 × 14 hexagonal array with 52-dimensional prototype vectors. The training result is illustrated in Fig. 2, where the distribution of the training data is clear. On the map, 'N' represents
Fig. 2 – Map result of training data for normal process, fault 1 and fault 2 based on SOM.

Fig. 3 – U-matrix of training data for normal process, fault 1 and fault 2 based on SOM. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
the normal status; ‘F1’ and ‘F2’ refer to fault 1 and fault 2 respectively. The digital in the brackets is the number of the data that are mapped on this neuron. For instance, ‘F2(3)’ implies that there are 3 observations of fault 2 projected on the neuron. It is obvious that the data on the output space are divided into three clusters; see also the U-matrix (shown in Fig. 3). The U-matrix explains clusters by colors: areas with homogeneous blue are clusters that within small distances, while the red corresponding to the borders. The Umatrix clearly displays the presence of three well-separated status. Once the self-organizing map is calibrated by training data, it can be directly used as a reference model for test data. Test data will find the best matching unit on the map by the criterion of the shortest distance between the test data vectors and the prototype vectors. Sending the test data into the map, the mapping result is shown in Fig. 4. The results of different status of test data are separately revealed on Fig. 4(a)–(c). It is clear that different data are projected into different areas. There are only 2 samples that are wrongly classified among all 120 samples, which mean that the correct rate is 98%. Therefore, it is proved that the SOM have a great performance in fault diagnosis on the condition that variables significantly deviate from normal value.
5.2.2. Case study of fault 4 and fault 5
In the previous study, the SOM performed well in fault diagnosis; however, it is not so effective for faults 4 and 5. In the process of fault 4, 50 variables remain steady after the fault occurs: the mean and standard deviation of each of them differ by less than 2% between the fault and the normal operating condition. For fault 5, the control loops are able to compensate for the change, and the variables return to their set-points after a few hours. Faults of this kind are therefore difficult to detect and monitor. Here, 80 observations are selected as training data and 40 as test data from the normal, fault 4, and fault 5 processes respectively, giving 240 training and 120 test observations in total. The SOM is a 23 × 14 hexagonal array with 52-dimensional prototype vectors, and the training data are sent into this map. The result after training is shown in Fig. 5, and the U-matrix in Fig. 6. On the map, 'N' represents the normal state, 'F4' fault 4, and 'F5' fault 5. The data of the different states are still mixed with each other, so they are difficult to divide into clusters; most neurons have at least two kinds of data mapped onto them. Clearly, this kind of fault diagnosis is a huge challenge for the simple SOM.
5.3. Fault diagnosis based on SOM integrated with PCA and CCA

Considering that the simple SOM is not so effective, we hope to improve its performance by pre-processing the input data. Here, we use PCA and CCA separately to process the original data beforehand and then train the SOM network.
5.3.1. Fault diagnosis based on SOM integrated with PCA
Fig. 4 – Map result of test data for normal process, fault 1 and fault 2 based on SOM.

PCA is the most popular way to reduce the dimensionality of the data; its principle is to extract most of the variance from the data. The number of principal components is determined by the ratio of variance contribution. In this study, the training data are the same as in Section 5.2.2. With the variance-contribution ratio set to 85%, 21 principal components are selected, reducing the dimension of the training data from 52 to 21. The SOM is a 23 × 14 hexagonal array with 21-dimensional prototype vectors. The training result is shown in Fig. 7, and the U-matrix in Fig. 8. The training data are still mixed with each other, and the U-matrix shows hardly any clusters. Clearly, the performance of PCA-SOM is not much better than that of the simple SOM: because PCA only extracts the largest-variance directions from the training data, it takes little account of the fault classification information. In this case, the pre-processing is useless for fault diagnosis.
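The PCA pretreatment above can be sketched via the singular value decomposition; the function name is illustrative, and the rank-3 toy data merely stand in for the TE measurements:

```python
import numpy as np

def pca_reduce(X, contribution=0.85):
    """Keep the fewest principal components whose cumulative variance
    contribution reaches `contribution` (85% in the text), and return
    the scores of the centred data on those components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), contribution)) + 1
    return Xc @ Vt[:k].T, k

# rank-3 toy data: three latent factors observed through 52 variables,
# so at most three components are needed for any contribution level
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 3)) @ rng.normal(size=(3, 52))
scores, k = pca_reduce(X)
```

The component count is entirely variance-driven, which is exactly why, as noted above, PCA can discard directions that carry the fault classification information.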
Fig. 5 – Map result of training data for normal process, fault 4 and fault 5 based on SOM.
Fig. 6 – U-matrix of training data for normal process, fault 4 and fault 5 based on SOM.
Fig. 7 – Map result of training data for normal process, fault 4 and fault 5 based on PCA-SOM.
Fig. 9 – Map result of training data for normal process, fault 4 and fault 5 based on CCA-SOM.
Fig. 8 – U-matrix of training data for normal process, fault 4 and fault 5 based on PCA-SOM.

5.3.2. Fault diagnosis based on SOM integrated with CCA
First, CCA is applied to process the original data: the original training data X and the fault classification matrix Y are constructed. With the critical value of the correlative-quantity portion set to 1%, the first seven CCs are selected as the new independent variables; their accumulated correlative-quantity portion is over 98%. In this way, the original 52-dimensional patterns are reduced to a 7-dimensional space. After this pretreatment, the new 7-dimensional input data are sent into the map for training. The map is a 25 × 13 hexagonal array with 7-dimensional prototype vectors. The training result is shown in Fig. 9, and the U-matrix in Fig. 10. All training data are now divided into clusters; compared with Fig. 5, this map performs much better and shows that CCA plays an important role in classification. The improvement is also visible in the U-matrix. Fig. 11 is the component map, from which the contribution of each component to cluster construction can be seen, especially for the first several correlative components; it confirms that the correlative components are closely associated with the classification information. To assess the accuracy of fault diagnosis with the proposed method, 120 observations (40 for each state) are selected as test data, as in Section 5.2.1. The test correlative components are calculated by Eq. (12), and the processed test data are sent into the trained map; the mapping results are shown in Fig. 12(a)–(c).
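The flavor of this pretreatment can be illustrated with classical canonical correlation analysis between X and a one-hot class matrix Y. This is only a stand-in: the paper uses the correlative component analysis of Chen et al. (1996), whose algorithm and Eq. (12) differ in detail — classical CCA, for instance, yields at most (number of classes − 1) components, whereas the text extracts seven CCs. All names below are illustrative:

```python
import numpy as np

def class_correlated_components(X, labels):
    """Directions of X maximally correlated with class membership, via
    classical canonical correlation against a one-hot class matrix Y
    (a stand-in for the paper's correlative component analysis)."""
    Y = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # whiten both blocks; the SVD of the whitened cross-product then
    # yields the canonical directions
    Ux, sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)
    ry = int(np.sum(sy > 1e-10 * sy[0]))   # centred one-hot Y has rank c - 1
    U, s, _ = np.linalg.svd(Ux.T @ Uy[:, :ry], full_matrices=False)
    W = Vxt.T @ (U / sx[:, None])          # X-space loading matrix
    return Xc @ W, s                       # scores and canonical correlations

# toy data: three classes whose means differ along the first variable
rng = np.random.default_rng(0)
labels = np.repeat(np.array([0, 1, 2]), 40)
X = rng.normal(size=(120, 10))
X[:, 0] += 5.0 * labels
T, corr = class_correlated_components(X, labels)
```

Because Y enters the decomposition, the leading components are chosen for class separation rather than raw variance, which is the key difference from the PCA pretreatment of Section 5.3.1.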
Table 5 – Comparison of misclassification rates between SOM, CCA-SOM, and other methods.

            SOM     CCA-SOM   Chiang et al. (best/worst)
Normal      0.400   0         –
IDV (1)     0.075   0.025     0.013/0.880
IDV (2)     0.025   0.025     0.010/0.441
IDV (4)     0.350   0.025     0.119/1
IDV (5)     0.400   0.025     0.006/1
IDV (6)     0.025   0         0/0.834
IDV (7)     0.575   0.050     0/0.978
Average     0.264   0.021     0.021/0.733
In this experiment, a relatively long sampling interval was chosen to keep the result readable; for some complex industrial processes, the sampling interval can be shortened to obtain more accurate monitoring, so that faults are found in time and losses are prevented.
5.4. Multi-fault diagnosis based on SOM integrated with CCA
Fig. 10 – U-matrix of training data for normal process, fault 4 and fault 5 based on CCA-SOM.
Fig. 12(a)–(c) show the three states of the test data separately; different data are projected into different areas. Only 4 of the 120 samples are misclassified, a correct rate of 96.7%. This shows that SOM integrated with CCA performs much better in fault diagnosis than the simple SOM and the SOM integrated with PCA when most variables stay close to their normal values. From the improved result map, the class of a fault can be read off directly. The visualization can also be used to monitor a fault process in real time. This time, the test data are selected from the whole 48-h run of fault 4, covering both the normal and the fault state: during the first 8 h (normal state), 4 samples are taken at a 2-h interval; from the introduction of the fault until 18 h, 20 samples are taken at a 30-min interval; and during the last 30 h, 10 samples are taken at a 3-h interval. After dimensionality reduction, the new test data are sent into the map; the result, shown in Fig. 13, has all data classified into the correct regions. Fig. 14 shows the trajectory of this process and reveals the temporal evolution from the normal state to fault 4: initially the observations lie in the normal area (Fig. 14(a)); as time goes on, new BMUs are hit and the trajectory grows until the process deviates from the normal region and extends into the fault 4 area (Fig. 14(b) and (c)). By following this trajectory, real-time monitoring of the process is achieved.
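The trajectory monitoring described above reduces to mapping a time-ordered test sequence to best-matching units. A minimal sketch with a hypothetical two-unit map (all values made up for illustration):

```python
import numpy as np

def bmu_trajectory(prototypes, grid_coords, sequence):
    """Assign each observation of a time-ordered (pretreated) sequence to
    its best matching unit; the resulting list of grid coordinates is the
    kind of monitoring trajectory drawn in Fig. 14."""
    # squared Euclidean distances from every sample to every prototype
    d2 = ((sequence[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return grid_coords[np.argmin(d2, axis=1)]

# hypothetical two-unit "map": one prototype near normal operation and one
# near the faulty regime; the trajectory should move from one to the other
prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])
grid_coords = np.array([[0, 0], [3, 3]])
sequence = np.array([[0.1, -0.2], [0.3, 0.0], [4.8, 5.1], [5.2, 4.9]])
trajectory = bmu_trajectory(prototypes, grid_coords, sequence)
```

Plotting consecutive BMU coordinates on the map grid gives the drifting path from the normal region into the fault region.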
In the literature, numerous methods have been applied to fault diagnosis of the TE process. A SOM-based method tested by Ng and Srinivasan (2008a,b) proved efficient for faults 1–5. A modified kernel FDA was applied to faults 4, 8, and 14 by Zhu and Song (2011), with positive results. The experiments in Section 5.3.2 show that the CCA-SOM method is effective for a small number of faults. Among all 21 fault states, CCA-SOM is especially effective for faults associated with a step change in some process variables, such as faults 1, 2, 4, 5, 6, and 7. CCA-SOM is therefore applied here to multi-fault diagnosis: all six of these faults are used together with the normal state. From each process, 48 observations are selected as training data and another 40 as test data, giving 336 training and 280 test observations in total. All original data are standardized. First, the original training data are sent into the map directly. The map is a 28 × 13 hexagonal array with 52-dimensional prototype vectors; the training result is shown in Fig. 15(a). The data of faults 1, 2, and 6 are divided into clusters, while the other faults remain mixed with each other. Secondly, the original training data are processed by CCA with the critical value of the correlative-quantity portion set to 1%. The first six CCs are selected as the new independent variables; their accumulated correlative-quantity portion is over 98%. In this way, the original 52-dimensional patterns are reduced to a 6-dimensional space, and the new 6-dimensional input data are sent into the map for training. The training result is shown in Fig. 15(b). Compared with Fig. 15(a), fault 7 is now also separated, so four faults are distinguished into classes; however, fault 4, fault 5, and the normal state remain mixed together.
This is because the data of these two faults lie much closer to the normal data than those of the other faults. In this situation, fault 4, fault 5, and the normal state can be regarded as one class on the first map, and these three states are then selected on their own to train a second map.
Fig. 11 – Component map based on CCA-SOM.
Fig. 12 – Map result of test data for normal process, fault 4 and fault 5 based on CCA-SOM.
Fig. 13 – Map result of test data for whole process based on CCA-SOM.
Fig. 14 – Trajectory of whole process based on CCA-SOM.
Fig. 15 – Map result of training data for normal process, fault 1, 2, 4, 5, 6, and 7.
Here, the training data of these three states amount to 144. After standardization, the original data are sent into the map, a 20 × 13 hexagonal array. The training result is shown in Fig. 16(a): just as in Section 5.2.2, the three states are still mixed with each other. For this reason, all data are treated by CCA with the critical value of the correlative-quantity portion set to 1%; seven CCs, with an accumulated correlative-quantity portion over 98%, are extracted from the original data as the training data. The training result, displayed in Fig. 16(b), shows all data divided into three classes. With these two trained maps, all seven states are separated successfully. To assess the accuracy of fault diagnosis with the proposed method, 280 observations are selected as test data. When the test data are sent into the first CCA-SOM map, the data of faults 1, 2, 6, and 7 are projected onto the corresponding regions (Fig. 17), while the data of fault 4, fault 5, and the normal state fall in the mixed region (Fig. 18), which means they belong to fault 4, fault 5, or the normal state. These test data are then sent into the second CCA-SOM map, trained on the three states mentioned above; the encouraging result is displayed in Fig. 19. The misclassification rates of the test data are listed in Table 5, which compares the diagnosis results of CCA-SOM with those of the simple SOM and of Chiang et al. (2001), who employed various statistical methods including PCA, DPCA, CVA, PLS, and FDA; their best and worst rates are listed in the fourth column. Table 5 shows that the misclassification rate of CCA-SOM is much lower than that of the simple SOM and approaches, or is even lower than, the best rate of Chiang et al. For the other 15 faults, the misclassification rates of CCA-SOM are less encouraging, most of them above 50%, but they still fall within the best/worst range of Chiang et al. In consequence, the CCA-SOM method is an effective way to perform fault diagnosis.
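The two-map decision logic of this section can be sketched as follows. The prototype values and unit labels are made up for illustration, and the CCA pretreatment applied before each map in the paper is omitted here:

```python
import numpy as np

MIXED = {"N", "F4", "F5"}   # states the first map cannot separate

def two_stage_diagnosis(x, protos1, labels1, protos2, labels2):
    """Two-map scheme: the first CCA-SOM separates faults 1, 2, 6 and 7;
    a sample landing in the mixed normal/fault-4/fault-5 region is
    re-classified on a second map trained on those three states only."""
    bmu = int(np.argmin(((protos1 - x) ** 2).sum(axis=1)))
    if labels1[bmu] not in MIXED:
        return labels1[bmu]
    bmu = int(np.argmin(((protos2 - x) ** 2).sum(axis=1)))
    return labels2[bmu]

# made-up one-dimensional prototypes and per-unit labels
protos1 = np.array([[0.0], [5.0], [10.0]])
labels1 = ["F1", "N", "F2"]            # middle unit covers the mixed region
protos2 = np.array([[4.0], [5.0], [6.0]])
labels2 = ["N", "F4", "F5"]

d1 = two_stage_diagnosis(np.array([0.2]), protos1, labels1, protos2, labels2)
d2 = two_stage_diagnosis(np.array([5.9]), protos1, labels1, protos2, labels2)
```

The second map only ever sees samples the first map routed to the mixed region, which is why it can afford a finer resolution over the three hard-to-separate states.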
Fig. 16 – Map result of training data for normal process, fault 4 and 5.
Fig. 17 – Map result of test data for fault 1, 2, 6, and 7 based on CCA-SOM.
Fig. 18 – Map result of test data for normal process, fault 4 and 5 based on CCA-SOM (first map).
Fig. 19 – Map result of test data for normal process, fault 4 and 5 based on CCA-SOM (second map).
6. Conclusions
In this work, an approach combining SOM with CCA for fault diagnosis and monitoring is discussed and implemented. The CCA method is very helpful in reducing the dimensionality of complex data because it takes the classification information into account during dimension reduction: most of the category information of the original data is extracted by the correlative components. This is a great help to the SOM in data clustering. The SOM-based method is illustrated with an application to the TE process. Compared with the simple SOM, PCA-SOM, and other statistical methods, the training result of CCA-SOM shows a great improvement in diagnosis accuracy. The visualization of the SOM also provides a
good way to monitor the whole course of a fault, making it possible to detect abnormal states in time and avoid further losses. In a word, the proposed SOM-based method is efficient and suitable for real-time application, even in complex industrial processes, offering more dependable fault diagnosis with a lower misclassification rate.
Acknowledgments

The authors gratefully acknowledge the support of the following foundations: National Natural Science Foundation of China (21176073), Doctoral Fund of Ministry of Education of China (20090074110005), Program for New Century Excellent Talents in University (NCET-09-0346), the "Shu Guang" Project (09SG29), and the Fundamental Research Funds for the Central Universities.
References

Azimzadeh, F., Galán, O., Romagnoli, J.A., 2001. On-line optimal trajectory control for a fermentation process using multi-linear models. Comput. Chem. Eng. 25 (1), 15–26.
Aminian, F., Aminian, M., Collins, H.W., 2002. Analog fault diagnosis of actual circuits using neural networks. IEEE Trans. Instrum. Meas. 51 (3), 544–550.
Bhagwat, A., Srinivasan, R., Krishnaswamy, P.R., 2003. Fault detection during process transitions: a model-based approach. Chem. Eng. Sci. 58 (2), 309–325.
Chen, D., Chen, Y., Hu, S., 1996. Correlative components analysis for pattern classification. Chemometr. Intell. Lab. Syst. 35, 221–229.
Chiang, L.H., Russell, E.L., Braatz, R.D., 2000. Fault diagnosis in chemical process using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemometr. Intell. Lab. Syst. 50 (2), 243–252.
Chiang, L.H., Russell, E.L., Braatz, R.D., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer.
Corona, F., Mulas, M., Baratti, R., Romagnoli, J.A., 2010. On the topological modeling and analysis of industrial process data. Comput. Chem. Eng. 34 (12), 2022–2032.
Fuertes, J.J., Dominguez, M., Reguera, P., Prada, M.A., Díaz, I., Cuadrado, A.A., 2010. Visual dynamic model based on self-organizing maps for supervision and fault detection in industrial processes. Eng. Appl. Artif. Intell. 23 (1), 8–17.
Han, Y., Song, Y.H., 2003. Using improved self-organizing map for partial discharge diagnosis of large turbogenerators. IEEE Trans. Energy Conver. 18 (3), 392–399.
Kohonen, T., 1990. The self-organizing map. Proc. IEEE 78 (9), 1464–1480.
Kämpjärvi, P., Sourander, M., Komulainen, T., Vatanski, N., Nikus, M., Jämsä-Jounela, S.-L., 2008. Fault detection and isolation of an on-line analyzer for an ethylene cracking process. Control Eng. Pract. 16 (1), 1–13.
Lee, G., Song, S.-O., Yoon, E.S., 2003. Multiple-fault diagnosis based on system decomposition and dynamic PLS. Ind. Eng. Chem. Res. 42 (24), 6145–6154.
Lee, G., Han, C., Yoon, E.S., 2004. Multiple-fault diagnosis of the Tennessee Eastman Process based on system decomposition and dynamic PLS. Ind. Eng. Chem. Res. 43 (25), 8037–8048.
Li, G., Qin, S.Z., Ji, Y.D., Zhou, D.H., 2009. Total PLS based contribution plots for fault diagnosis. Acta Automat. Sin. 35 (6), 759–765.
Lamrini, B., Lakhal, El.-K., Lann, M.-V.L., 2011. Data validation and missing data reconstruction using self-organizing map for water treatment. Neural Comput. Appl. 20 (4), 575–588.
Muthuswamy, K., Srinivasan, R., 2003. Phase-based supervisory control for fermentation process development. J. Process Control 13, 367–382.
Nikkilä, J., Törönen, P., Kaski, S., Venna, J., Castrén, E., Wong, G., 2002. Analysis and visualization of gene expression data using self-organizing maps. Neural Netw. 15 (8–9), 953–966.
Ng, Y.S., Srinivasan, R., 2008a. Multivariate temporal data analysis using self-organizing maps. 1: Training methodology for effective visualization of multistate operations. Ind. Eng. Chem. Res. 47 (20), 7744–7757.
Ng, Y.S., Srinivasan, R., 2008b. Multivariate temporal data analysis using self-organizing maps. 2: Monitoring and diagnosis of multistate operations. Ind. Eng. Chem. Res. 47 (20), 7758–7771.
Russell, E.L., Chiang, L.H., Braatz, R.D., 2000. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometr. Intell. Lab. Syst. 51 (1), 81–93.
Samanta, B., Al-Balushi, K.R., 2003. Artificial neural network based fault diagnostics of rolling element bearings using time-domain features. Mech. Syst. Signal Process. 17 (2), 317–328.
Singhal, A., Seborg, D.E., 2006. Evaluation of a pattern matching method for the Tennessee Eastman challenge process. J. Process Control 16 (6), 601–613.
Vesanto, J., Alhoniemi, E., 2000. Clustering of the self-organizing map. IEEE Trans. Neural Netw. 11 (3), 586–600.
Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.N., 2003a. A review of process fault detection and diagnosis. Part I: Quantitative model-based methods. Comput. Chem. Eng. 27 (3), 293–311.
Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.N., 2003b. A review of process fault detection and diagnosis. Part II: Qualitative models and search strategies. Comput. Chem. Eng. 27 (3), 313–326.
Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.N., 2003c. A review of process fault detection and diagnosis. Part III: Process history based methods. Comput. Chem. Eng. 27 (3), 327–346.
Wang, H., Chen, D., Chen, Y., 2004. The integrated strategy of pattern classification and its application in chemistry. Chemometr. Intell. Lab. Syst. 70 (1), 23–31.
Yan, X., Chen, D., Chen, Y., Hu, S., 2001. SOM integrated with CCA for the feature map and classification of complex chemical patterns. Comput. Chem. 25 (6), 597–605.
Yan, X., Yu, J., Qian, F., 2005. The feature-preserving map of high-dimensional complex chemical objects using non-linear map integrated with correlative component analysis. Chemometr. Intell. Lab. Syst. 75 (1), 13–22.
Zhang, Y., Qin, S.J., 2007. Fault detection of nonlinear processes using multiway kernel independent component analysis. Ind. Eng. Chem. Res. 46 (23), 7780–7787.
Zhu, Z.B., Song, Z.H., 2011. A novel fault diagnosis system using pattern classification on kernel FDA subspace. Expert Syst. Appl. 38 (6), 6895–6905.