Accepted Manuscript
Multi-subspace Factor Analysis Integrated with Support Vector Data Description for Multimode Process Monitoring
Bei Wang, Zhichao Li, Xuefeng Yan
PII: S0016-0032(18)30524-6
DOI: https://doi.org/10.1016/j.jfranklin.2018.07.044
Reference: FI 3593
To appear in: Journal of the Franklin Institute
Received date: 22 March 2017
Revised date: 1 June 2018
Accepted date: 30 July 2018
Please cite this article as: Bei Wang, Zhichao Li, Xuefeng Yan, Multi-subspace Factor Analysis Integrated with Support Vector Data Description for Multimode Process Monitoring, Journal of the Franklin Institute (2018), doi: https://doi.org/10.1016/j.jfranklin.2018.07.044
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Multi-subspace Factor Analysis Integrated with Support Vector Data Description for Multimode Process Monitoring
Bei Wang1,2, Zhichao Li1, Xuefeng Yan1,*
(1. Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai, P. R. China
2. Shanghai Electric Windpower Group, Shanghai, P. R. China)
*: Corresponding author: Xuefeng Yan
Email address: [email protected]
Tel\Fax Number: +86-21-64251036
Address: P.O. BOX 293, MeiLong Road NO. 130, Shanghai 200237, P. R. China
Abstract
In modern plant-wide systems, chemical industry processes are usually equipped with multiple operating modes to meet the requirements of diversified products. Accurately identifying the running-on mode therefore becomes a focal point. Meanwhile, such systems produce numerous process variables with complex relationships, which may degrade the effectiveness of statistical process monitoring. To solve this problem, this study proposes a multimode factor analysis (FA) method that integrates Pearson’s correlation coefficient, joint probability, and support vector data description (SVDD). First, subspaces are generated automatically from Pearson’s correlation coefficients among variables rather than from prior knowledge, which is not always available. Second, statistical indices are derived from the FA models constructed in each subspace and each mode. Third, the running-on mode is identified according to the joint probabilities among the statistical indices. Finally, SVDD is adopted to provide an intuitive indication for fault detection. The effectiveness and feasibility of the proposed method are demonstrated by three case studies: a numerical simulation, the continuous stirred-tank reactor (CSTR) model, and the Tennessee Eastman (TE) benchmark process.
Keywords: factor analysis; multi-subspace strategy; support vector data description; multimode process; correlation coefficient
1. Introduction
In modern industrial society, manufacturers have paid considerable attention to process monitoring to guarantee plant safety and satisfy the high requirements for product quality. Owing to technical progress in data collection and computation, multivariate statistical process monitoring (MSPM) methods have been applied to diverse processes and have advanced significantly in recent decades [1-4]. Among the various MSPM methods, principal component analysis (PCA) [5, 6] can be considered the most fundamental one. It aims to reduce dimensionality while retaining the main relation structures. Owing to its effectiveness, PCA has many extensions designed to handle different industrial conditions, including kernel PCA [7, 8], dynamic PCA [9, 10], nonlinear PCA [11, 12], and so on [13-15]. However, a notable disadvantage of PCA is the lack of an associated probabilistic model for observed data. PCA-based methods assume that process variables are noiseless; this condition is difficult to achieve in practical industrial processes. To solve this problem, researchers proposed probabilistic PCA (PPCA) [16] and factor analysis (FA) [17, 18], which describe data in a density estimation framework. FA is more commonly used than PPCA because it allows the noise of each observed variable to have a different variance. By contrast, PPCA assumes that all noises have the same variance. In other words, FA provides a general description of the Gaussian latent model structure, whereas PPCA is only a special case of FA. Similar to PCA, FA requires that the data follow a Gaussian distribution and that the variables be linearly related.
A trend in modern industrial processes is that systems are becoming increasingly complicated and extensive [19]. Numerous data are generated with various complex relationships, such as linear, nonlinear, partially correlated, and independent. In this situation, constructing only one model for global monitoring seems inappropriate and insufficient because local behaviors will be disregarded. Therefore, the multi-subspace strategy has emerged to reduce data complexity by providing an accurate feature description [20, 21]. Westerhuis et al. [22] recommended separating variables into meaningful blocks and then building models to explain each block. Following this recommendation, Qin et al. [23] proposed calculating loadings and scores directly from regular PCA and PLS models to develop decentralized monitoring and diagnosis. These methods successfully apply the multi-subspace strategy to basic models and achieve effective monitoring results. However, they generate the subspaces based on prior process knowledge, which is not always available in real industrial processes. Thus, data-driven multi-subspace methods have been proposed to divide the space automatically. Tong et al. [24] produced four subspaces based on the relevance and irrelevance between the principal component and the residual space. Ge and Song [25] developed a distributed PCA method to separate the original space based on different directions of principal components. Later, Wang et al. [26] constructed subspaces by measuring the generalized Dice’s coefficients among the loadings. All these data-driven division methods achieve the expected monitoring performance. Therefore, the multi-subspace method based on data characteristics can be considered a viable and effective strategy to address complex plant-wide processes.
In addition, multimode operation is another common state of industrial processes [27, 28] because processes must shift their operating modes to cater to changes in product standards, set points, and other aspects. Therefore, traditional MSPM methods, which assume that processes operate under a single condition, cannot be applied to such multimode processes. False alarms may be triggered when the process switches to another operating mode with a different confidence limit. Monitoring a process with multiple operating modes thus becomes a difficult problem. To solve this issue, a series of multimode strategies have been developed. Qin et al. proposed recursive PCA and PLS [29, 30], which update the correlation matrix recursively to follow the status of the monitored process. However, the recursive update is carried out blindly, without considering the identified mode and the transition between modes. Hwang and Han [31] developed a monitoring methodology based on hierarchical clustering and constructed a super PCA model to monitor multimode processes, but this methodology works only when process behaviors share common characteristics. Zhao et al. [32] adopted principal angles to match operating modes with predefined models. Mauricio et al. [33] suggested monitoring the multimode process based on kernel PCA. Later, Gaussian mixture models (GMM) were widely studied and introduced to multimode process monitoring. Choi [34] expanded the concept of traditional PCA and used GMM to approximate the data pattern in the subspace. Yu [35] constructed a principal components (PCs)-based GMM model and proposed assessing process states with two quantification indices: negative log likelihood probability and Mahalanobis distance. Ge and Song proposed a multimode process monitoring method based on the Bayesian theorem [36] and later extended it to monitor batch processes [37]. These multimode approaches are effective, but a more general approach is needed to monitor plant-wide multimode processes.
To address high-dimensional and complex relational data from plant-wide multimode processes, a new monitoring method, called CJS-FA, is proposed in this study. This method integrates the FA model with the correlation coefficient, joint probability, and support vector data description (SVDD). First, instead of prior knowledge, a classic statistical measurement called Pearson’s correlation coefficient is employed to estimate the correlations among variables so that related variables can be grouped in the same subspace automatically. Grouping variables that share similar data characteristics into the same subspace enhances monitoring capability. Moreover, with this approach, data complexity and dimensionality are reduced together. Second, FA is selected to establish models in each block under each operating condition because FA has a more general model structure than PCA. Thereafter, the joint probabilities of all modes are estimated with monitoring statistics from each subspace to identify the running-on mode. Finally, SVDD is adopted to monitor the variation of all subspaces in the identified mode intuitively.
The novelty and advantages can be summarized as follows: (1) a completely data-driven unsupervised monitoring method for complex plant-wide processes with multiple operation states is developed in this paper; (2) the process data are divided into several subspaces according to Pearson’s correlation coefficients rather than prior knowledge, which eases the monitoring difficulties; (3) an FA model of each block under each mode is established to better deal with multimode process noise; (4) the running-on mode can be identified by joint probability; (5) the monitoring results are combined through SVDD.
The remainder of this paper is organized as follows. A detailed description of the proposed monitoring process is presented in Section 2, along with brief instructions for the related methods. Section 3 discusses the corresponding implementation procedures. Thereafter, applications to three cases and comparisons with other methods are analyzed in Section 4. Finally, some conclusions are drawn in Section 5.
2. Methodology
This section illustrates the monitoring scheme for the multimode plant-wide process, briefly reviewing related basic methods.
2.1 Correlation Coefficient for Space Division
In plant-wide industrial processes, handling numerous high-dimensional and complex relational data in process monitoring has been a difficult problem. To accomplish dimensionality reduction and feature extraction, the multi-subspace strategy is usually adopted to divide the original process variables into several subspaces. Traditionally, division is implemented based on prior process knowledge. However, such knowledge is not always available, especially in processes with multiple operating states [38]. In this study, a classic statistical measurement, the Pearson product-moment correlation coefficient (referred to as Pearson’s correlation coefficient), is employed to divide the variables based on data characteristics.
In statistics, Pearson’s correlation coefficient, developed by Karl Pearson, is widely used as a measure of the degree of correlation between two variables $x_p$ and $x_q$, with the formula given as follows [39]:
$$\rho_{x_p x_q} = \frac{\mathrm{Cov}(x_p, x_q)}{\sqrt{D(x_p)}\,\sqrt{D(x_q)}} \tag{1}$$
where $\mathrm{Cov}$ is the covariance, and $\sqrt{D(x_p)}$ and $\sqrt{D(x_q)}$ are the standard deviations. It is easy to derive that $\rho_{x_p x_q}$ takes a value between -1 and 1. The stronger the linear relationship between the two variables, the closer the correlation coefficient is to 1 or -1. When one variable increases as the other increases, they are positively correlated and the correlation coefficient is greater than 0; when one variable decreases as the other increases, they are negatively correlated and the correlation coefficient is less than 0; if the correlation coefficient is equal to 0, there is no linear correlation between them. The absolute value of the correlation coefficient therefore reflects the degree of linearity between the two variables. Variable relations can be quantified by computing the correlation coefficients among variables.
For space division, correlation coefficients among variables are calculated under each mode with data collected from normal conditions. For two variables $x_p^k$ and $x_q^k$ under mode $k$, the correlation coefficient is written as $\rho^k_{x_p x_q}$. In a practical system, a threshold value $\theta_k$ must be set based on experience for fair division, and then the subspace number $B$ is determined. When $|\rho^k_{x_p x_q}| \ge \theta_k$, these two variables are considered correlated and grouped into the same subspace. For a new variable $x_r^k$, if $|\rho^k_{x_p x_r}| \ge \theta_k$ or $|\rho^k_{x_r x_q}| \ge \theta_k$, then this variable is put together with $x_p^k$ and $x_q^k$. In this way, all variables are grouped in all modes. The setting of $\theta_k$ deserves attention. An extremely small value causes an excessive number of subspaces, whereas an extremely large value causes difficulty in division. Thus, the threshold $\theta_k$ is important in the division of variables. As observed, the space division depends completely on the correlation coefficient $\rho^k_{x_p x_q}$ and the threshold value $\theta_k$, realizing data-driven division without prior knowledge.
Variables whose correlation coefficients with all other variables stay below the threshold may exist; such variables are unrelated to the rest. In this circumstance, these variables can establish a new subspace or can be distributed to other existing subspaces. The basic principle of space division is that variables from different subspaces should be independent of one another, and the division should not impair the original monitoring performance.
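The grouping rule of this subsection can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the data matrix is arranged with samples in rows (the transpose of the paper's convention), interprets the rule as linking any two variables whose absolute correlation reaches the threshold and then chaining such links, and places every unlinked variable in its own subspace. The function name `divide_subspaces` and the demo data are hypothetical.

```python
import numpy as np

def divide_subspaces(X, threshold):
    """Group variables whose absolute Pearson correlation reaches `threshold`.

    X has shape (n_samples, n_variables). Returns a list of sorted index lists.
    Variables are chained: if p links to q and q links to r, all three share a group.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))   # |rho| for every variable pair
    linked = corr >= threshold
    unvisited = set(range(corr.shape[0]))
    groups = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        group, stack = [seed], [seed]
        while stack:                               # depth-first search over links
            p = stack.pop()
            for q in [q for q in sorted(unvisited) if linked[p, q]]:
                unvisited.remove(q)
                group.append(q)
                stack.append(q)
        groups.append(sorted(group))
    return groups

# Hypothetical demo: variables 0-1 share one source, variables 2-3 another.
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 2))
X_demo = np.column_stack([
    s[:, 0] + 0.1 * rng.normal(size=500),
    2 * s[:, 0] + 0.1 * rng.normal(size=500),
    s[:, 1] + 0.1 * rng.normal(size=500),
    -s[:, 1] + 0.1 * rng.normal(size=500),
])
groups_demo = divide_subspaces(X_demo, threshold=0.25)
```

Because the absolute value is used, the negatively correlated pair (variables 2 and 3) lands in the same group, which matches the remark that the sign of the coefficient is ignored.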
2.2 Process Monitoring Based on FA
Following the division principle above, the process data $X \in R^{m \times n}$ (including $m$ variables and $n$ samples) are divided into $B$ subspaces, as expressed in the following:
$$X = [X_1, X_2, \ldots, X_B] \tag{2}$$
For the $b$th ($b = 1, 2, \ldots, B$) subspace $X_b \in R^{m_b \times n}$, where $m_b$ denotes the corresponding variable number, the variables are scaled to zero mean and unit variance for FA model construction. The aim of the FA model is to transform the original $m_b$-dimensional variables $X_b$ into an $l$-dimensional vector of latent variables $T_b$ added with small white noise $E_b$, as shown in the following [16, 40]:
$$X_b = P_b T_b + E_b \tag{3}$$
where the loading matrix $P_b \in R^{m_b \times l}$ ($l < m_b$) relates the two sets of variables. The variance matrix of the noise $E_b$ is expressed as $\Sigma_{E_b} = \mathrm{diag}\{\lambda_i\}$, $i = 1, 2, \ldots, m_b$, in which the variances are assumed to be at different levels. When all the noise variances $\lambda_i$ are supposed to be zero, the FA model is equivalent to the PCA model. Moreover, if all $\lambda_i$ are assumed to have the same value, then FA becomes PPCA. Therefore, FA is the general form of the Gaussian latent model, including two special cases: PCA and PPCA.
In the FA model, the latent variable satisfies $T_b \sim N(0, I)$, where data are conditionally independent and Gaussian-distributed with unit variance. Specifically, the corresponding distribution of the latent variable $T_b$ is expressed as follows:
$$p(T_b) = (2\pi)^{-l/2} \exp\left\{-\tfrac{1}{2} T_b^T T_b\right\} \tag{4}$$
With the additional noise $E_b \sim N(0, \Sigma_{E_b})$, the corresponding Gaussian distribution of the original variables $X_b$ can be derived as
$$p(X_b) = \int p(X_b \mid T_b)\, p(T_b)\, dT_b = (2\pi)^{-m_b/2}\, |C|^{-1/2} \exp\left\{-\tfrac{1}{2} X_b^T C^{-1} X_b\right\} \tag{5}$$
where $C = \Sigma_{E_b} + P_b P_b^T$ represents the covariance of the observed data, so that the Gaussian distribution for $X_b$ can be described as $X_b \sim N(0, \Sigma_{E_b} + P_b P_b^T)$. Therefore, the main work of FA is to estimate the parameter set $\{P_b, \Sigma_{E_b}\}$. The expectation-maximization (EM) algorithm is adopted as an iterative likelihood maximization algorithm to estimate the most probable value of the parameters $\{P_b, \Sigma_{E_b}\}$, which results in the calculations given by
$$\Sigma_{E_b} = \frac{1}{n}\, \mathrm{diag}\left\{ \sum_{z=1}^{n} \left( X_{b,z} X_{b,z}^T - P_b \hat{T}_{b,z} X_{b,z}^T \right) \right\} \tag{6}$$
$$P_b = \left( \sum_{z=1}^{n} X_{b,z} \hat{T}_{b,z}^T \right) \left( \sum_{z=1}^{n} E\{\hat{T}_{b,z} \hat{T}_{b,z}^T \mid X_{b,z}\} \right)^{-1} \tag{7}$$
where $z$ is the sample index. The detailed calculations are not displayed in this paper because of the limitation of length. Please refer to [17, 40] for more details. For monitoring purposes, the $T^2$ statistics based on the FA model are constructed as follows:
$$T^2_{b,z} = \|\hat{T}_{b,z}\|^2 = X_{b,z}^T Q^T Q X_{b,z} \le \chi^2_{\alpha, l} \tag{8}$$
where $Q = P_b^T (P_b P_b^T + \Sigma_{E_b})^{-1}$, and $\chi^2_{\alpha, l}$ is the limit of a $\chi^2$-distribution with significance level $\alpha$ and $l$ degrees of freedom. Usually, a 99% confidence limit is set for process monitoring. When the monitoring statistic $T_b^2$ exceeds the confidence limit, a fault is deemed to have occurred in the process. Thus, the FA model is constructed to monitor the trend of observation data in each subspace.
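As a rough illustration of the EM updates and the $T^2$ statistic described in this subsection, the sketch below fits a factor model to one subspace with NumPy and evaluates the statistic for every sample. It follows the standard EM recursion for factor analysis, which is assumed here to correspond to Eqs. (6)-(7); the function names, the random initialization, the iteration count, the variance floor, and the demo data are all hypothetical choices rather than details taken from the paper.

```python
import numpy as np

def fit_fa(X, l, n_iter=200):
    """EM for factor analysis. X: (m_b, n) scaled data, one column per sample."""
    m, n = X.shape
    S = X @ X.T / n                                  # sample covariance
    P = np.random.default_rng(0).normal(scale=0.1, size=(m, l))
    psi = np.diag(S).copy()                          # diagonal noise variances
    for _ in range(n_iter):
        C = P @ P.T + np.diag(psi)                   # model covariance
        Q = P.T @ np.linalg.inv(C)                   # maps x to E[t | x]
        T_hat = Q @ X                                # posterior means, (l, n)
        # sum over samples of E[t t^T | x] = n * posterior covariance + T_hat T_hat^T
        Ett = n * (np.eye(l) - Q @ P) + T_hat @ T_hat.T
        P = (X @ T_hat.T) @ np.linalg.inv(Ett)       # loading update
        psi = np.diag(S - P @ T_hat @ X.T / n).copy()  # noise variance update
        psi = np.maximum(psi, 1e-6)                  # numerical floor (assumption)
    return P, psi

def t2_statistic(X, P, psi):
    """T^2 = ||Q x||^2 with Q = P^T (P P^T + diag(psi))^{-1}, per column of X."""
    Q = P.T @ np.linalg.inv(P @ P.T + np.diag(psi))
    return np.sum((Q @ X) ** 2, axis=0)

# Hypothetical demo: five variables driven by two latent factors.
rng = np.random.default_rng(0)
n, m, l = 1000, 5, 2
A_true = rng.normal(size=(m, l))
noise_std = np.array([0.1, 0.2, 0.3, 0.4, 0.5])     # different noise levels
X_demo = A_true @ rng.normal(size=(l, n)) + noise_std[:, None] * rng.normal(size=(m, n))
X_demo -= X_demo.mean(axis=1, keepdims=True)
X_demo /= X_demo.std(axis=1, keepdims=True)
P_hat, psi_hat = fit_fa(X_demo, l)
t2_demo = t2_statistic(X_demo, P_hat, psi_hat)
limit_99 = -2.0 * np.log(0.01)   # chi-square 99% quantile; closed form holds for l = 2 only
```

For other values of $l$ the limit would come from a chi-square quantile function, for example one provided by a statistics library.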
2.3 Joint Probability for Mode Identification
For multimode processes, monitoring involves not only the construction of monitoring statistics but also the mode identification of the current point. To identify the current mode type, the FA model is built in each mode with data from normal conditions. The currently observed sample $x_{new} \in R^{m \times 1}$ is divided into $B$ subspaces, that is, $x_{new} = [x_1, x_2, \ldots, x_B]$, and then normalized with the normal training data from different modes. Thus, the $T_b^2$ statistics under each mode can be calculated. This study identifies the mode from a probabilistic perspective so that each $T_b^2$ statistic under the $k$th mode, expressed as $T^2_{k,b}$, is converted into a mode probability as
$$p(x_b \in M_k) = e^{-T^2_{k,b}} \tag{9}$$
where $M_k$ represents the $k$th system mode. Following the principle of joint probability, the event $x_{new} \in M_k$ approximately satisfies $\{x_{new} \in M_k\} = \{x_1 \in M_k\} \cap \{x_2 \in M_k\} \cap \cdots \cap \{x_B \in M_k\}$; that is, the probability of the event $x_{new} \in M_k$ can be written as
$$p(x_{new} \in M_k) = p(x_1 \in M_k)\, p(x_2 \in M_k) \cdots p(x_B \in M_k) = e^{-T^2_{k,1}}\, e^{-T^2_{k,2}} \cdots e^{-T^2_{k,B}} = e^{-(T^2_{k,1} + T^2_{k,2} + \cdots + T^2_{k,B})} \tag{10}$$
As stated in Section 2.1, variables from different subspaces should be independent of one another. However, rigorous independence among subspaces cannot be satisfied easily in practical industrial processes because sequential processes produce interaction effects among the subspaces. Therefore, the joint probability calculated by Eq. (10) is only an approximate value and not an exact value, but it is sufficient for mode identification. The function of the joint probability is to judge the mode to which the current point belongs. A high joint probability indicates that the current point matches the mode under consideration. By contrast, a low joint probability (close to zero) shows that the operating system is not running under this mode. Therefore, the mode of the current point can be determined by comparing the joint probabilities from different modes, wherein the mode with the highest joint probability is regarded as the current mode.
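The identification rule amounts to an arg-max over modes of the exponential of the negative summed statistics. A minimal sketch follows, with hypothetical $T^2$ values and a hypothetical function name `identify_mode`; mode indices are zero-based here.

```python
import math

def identify_mode(t2_by_mode):
    """t2_by_mode[k] holds the subspace statistics [T2_{k,1}, ..., T2_{k,B}] of mode k.

    Returns the index of the mode with the highest joint probability and
    the list of joint probabilities exp(-(T2_{k,1} + ... + T2_{k,B})).
    """
    joint = [math.exp(-sum(t2_k)) for t2_k in t2_by_mode]
    best = max(range(len(joint)), key=joint.__getitem__)
    return best, joint

# Hypothetical statistics for three candidate modes and two subspaces.
mode_demo, joint_demo = identify_mode([[1.2, 0.8], [35.0, 50.0], [60.0, 42.0]])
```

Since the exponential is monotone, the same decision is obtained by picking the mode with the smallest sum of statistics; the probabilities mainly serve presentation.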
2.4 SVDD for Statistics Combination
After the running-on mode is identified, the current data should be checked to see whether a fault exists in the operating process. For the current sample $x_{new}$, the data have been divided and projected into subspaces for statistical calculation. To monitor the variation of process statistics intuitively, SVDD [41, 42] is introduced. The aim of SVDD is to construct a hypersphere with minimum volume that contains all the training data objects so that deviating data can be distinguished when projected into this space. First, the training data $y_u \in R^m$, $u = 1, 2, \ldots, N$, where $N$ is the sample number, are transformed into a higher-dimensional feature space through a nonlinear function $\Phi: y \to F$, which is usually performed implicitly by a given kernel $K(y_u, y_v)$. To obtain the hypersphere with the minimum volume, SVDD must solve the optimization problem shown as follows [43]:
$$\min_{R, a, \xi}\; R^2 + C \sum_{u=1}^{N} \xi_u \quad \text{s.t.} \quad \|\Phi(y_u) - a\|^2 \le R^2 + \xi_u,\; \xi_u \ge 0,\; u = 1, 2, \ldots, N \tag{11}$$
where $a$ denotes the center of the constructed hypersphere, $R$ represents the distance from the center to the boundary, $C$ is the trade-off between the hypersphere’s volume and the error number, and $\xi_u$ is the slack variable. The dual form is given as
$$\max_{\alpha_u}\; \sum_{u=1}^{N} \alpha_u K(y_u, y_u) - \sum_{u=1}^{N} \sum_{v=1}^{N} \alpha_u \alpha_v K(y_u, y_v) \quad \text{s.t.} \quad 0 \le \alpha_u \le C,\; \sum_{u=1}^{N} \alpha_u = 1 \tag{12}$$
where $\alpha_u$ is a Lagrange multiplier, and $K(y_u, y_v) = \langle \Phi(y_u), \Phi(y_v) \rangle$ is the kernel function. Solving Eq. (12) yields a set $\alpha_u$, and the samples $y_u$ with $\alpha_u > 0$ are called support vectors (SVs). The squared distance $R^2$ can be calculated as
$$R^2 = K(y, y) - 2 \sum_{u=1}^{N} \alpha_u K(y_u, y) + \sum_{u=1}^{N} \sum_{v=1}^{N} \alpha_u \alpha_v K(y_u, y_v) \tag{13}$$
where $y$ is a support vector on the boundary; the sums effectively run over the support vectors $y_u \in SV$ because $\alpha_u = 0$ for all other samples. Therefore, for a new sample $z$, the squared distance can be estimated as
$$\|\Phi(z) - a\|^2 = K(z, z) - 2 \sum_{u=1}^{N} \alpha_u K(z, y_u) + \sum_{u=1}^{N} \sum_{v=1}^{N} \alpha_u \alpha_v K(y_u, y_v) \tag{14}$$
For fault detection, an index is constructed for monitoring, as shown in the following:
$$DR = \|\Phi(z) - a\|^2 / R^2 \tag{15}$$
and the confidence limit is 1. This result indicates that the sample $z$ is deemed normal when the estimated distance is no larger than $R^2$. Otherwise, the sample $z$ is considered faulty. The data $y$ and $z$ are replaced by monitoring statistics generated from each subspace so that variation in each subspace can be detected in a timely manner.
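To make the $DR$ index concrete, the sketch below builds a small SVDD model with a Gaussian kernel and evaluates the index for training and test points. Several choices are assumptions for illustration only: the kernel and its width `sigma`, the setting $C \ge 1$ (so that the upper bound on the multipliers is inactive), and the use of Frank-Wolfe iterations on the simplex as a simple stand-in for a quadratic programming solver. The radius is taken as the largest training distance to the center, which coincides with the support-vector distance at the exact hard-margin solution. All function names and demo data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_svdd(Y, sigma, n_iter=2000):
    """Approximate the SVDD dual by Frank-Wolfe steps on the simplex (assumes C >= 1)."""
    K = rbf_kernel(Y, Y, sigma)
    alpha = np.full(len(Y), 1.0 / len(Y))
    for t in range(n_iter):
        grad = np.diag(K) - 2.0 * (K @ alpha)   # gradient of the dual objective
        j = int(np.argmax(grad))                # simplex vertex with steepest ascent
        step = 2.0 / (t + 2.0)
        alpha *= 1.0 - step                     # move toward vertex j
        alpha[j] += step
    const = float(alpha @ K @ alpha)            # double-sum term shared by all distances
    dist2 = np.diag(K) - 2.0 * (K @ alpha) + const
    return alpha, float(dist2.max()), const     # R^2: largest training distance

def dr_index(Z, Y, alpha, R2, const, sigma):
    """DR = squared feature-space distance to the center divided by R^2."""
    Kz = rbf_kernel(Z, Y, sigma)
    dist2 = 1.0 - 2.0 * (Kz @ alpha) + const    # K(z, z) = 1 for the Gaussian kernel
    return dist2 / R2

# Hypothetical training statistics for two subspaces (chi-square distributed).
rng = np.random.default_rng(1)
Y_train = rng.chisquare(df=2, size=(200, 2))
alpha, R2, const = fit_svdd(Y_train, sigma=3.0)
dr_train = dr_index(Y_train, Y_train, alpha, R2, const, sigma=3.0)
dr_far = dr_index(np.array([[100.0, 100.0]]), Y_train, alpha, R2, const, sigma=3.0)
```

In this sketch every training point has $DR \le 1$ by construction, whereas a point far from the training statistics exceeds the limit of 1.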
3. Process Monitoring
The proposed CJS-FA method employs correlation coefficients for space division, joint probability for mode identification, and SVDD for the result combination. The corresponding schematic diagram is shown in Figure 1, and the detailed implementation procedures are summarized as follows:
Off-line modeling
Step 1: Training samples are collected from M operating modes and are standardized to zero mean and unit variance.
Step 2: Subspaces are generated following the correlation coefficients among variables.
Step 3: The FA models are established in each subspace under each mode condition.
Step 4: The monitoring statistics $y = [t^2_{k,1}, t^2_{k,2}, \ldots, t^2_{k,B}]$ are calculated with testing data under all modes.
Online monitoring
Step 1: The current sample from the operating process is divided into subspaces according to the division obtained previously.
Step 2: The joint probability of each mode is estimated, and the mode of the current sample is determined.
Step 3: The monitoring statistics $z = [T^2_{k,1}, T^2_{k,2}, \ldots, T^2_{k,B}]$ of all subspaces are calculated under the identified mode.
Step 4: The SVDD model is built with the selected statistics $y = [t^2_{k,1}, t^2_{k,2}, \ldots, t^2_{k,B}]$ under the identified mode, using Eqs. (11)-(13).
Step 5: The monitoring statistics $z$ are projected onto the feature space, and the statistic $DR$ is calculated through Eqs. (14)-(15).
Step 6: The statistic $DR$, which measures the distance to the hypersphere’s center, is monitored. A higher value $DR > 1$ indicates abnormal operation, whereas a lower value $DR \le 1$ means that the system is running under normal conditions.
4. Applications
To demonstrate the effectiveness of the proposed CJS-FA method, applications to three cases are
discussed in this section. The monitoring performance and comparison with other methods are analyzed as well.
4.1 Numerical Example
A simple numerical system with eight variables found in two subspaces is constructed to verify the function of the proposed CJS-FA method. The multivariate system is simulated as follows:
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{bmatrix} = \begin{bmatrix} 2.4527 & 0.5725 & 0 & 0 & 0 & 0 \\ 4.2696 & 0.1566 & 0 & 0 & 0 & 0 \\ 3.3541 & 0.4218 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1.8532 & 0.7421 & 0 & 0 \\ 0 & 0 & 4.5747 & 0.4781 & 0 & 0 \\ 0 & 0 & 2.1598 & 0.5471 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0.1 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_2^3 \\ s_4^3 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \end{bmatrix} \tag{16}$$
where $[s_1\ s_2\ s_3\ s_4]^T$ represents the signal sources, and $[e_1\ e_2\ \ldots\ e_8]^T$ denotes white Gaussian noises with 2dBW. Three different sets of signal sources are developed to generate different running-on modes, as shown in the following:
Mode 1: $s_1 \sim N(8.5, 1.2)$, $s_2 \sim N(7, 0.8)$, $s_3 \sim N(4.5, 1)$, and $s_4 \sim N(4, 0.8)$.
Mode 2: $s_1 \sim N(1.2, 3)$, $s_2 \sim N(2.5, 1)$, $s_3 \sim N(11, 1.2)$, and $s_4 \sim N(12, 0.8)$.
Mode 3: $s_1 \sim N(12, 1.2)$, $s_2 \sim N(13, 1)$, $s_3 \sim N(10, 1.2)$, and $s_4 \sim N(10, 0.8)$.
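The simulated system can be reproduced approximately with the sketch below. It rests on several stated assumptions: the seventh and eighth variables are read as $0.1 s_2^3$ and $0.1 s_4^3$ plus noise, the second argument of each $N(\cdot, \cdot)$ is treated as a variance, and the 2 dBW noise level is converted to a variance of $10^{0.2}$. The names `A`, `MODES`, and `simulate` are hypothetical.

```python
import numpy as np

# Coefficient matrix acting on [s1, s2, s3, s4, s2^3, s4^3] (reading of Eq. (16)).
A = np.array([
    [2.4527, 0.5725, 0, 0, 0, 0],
    [4.2696, 0.1566, 0, 0, 0, 0],
    [3.3541, 0.4218, 0, 0, 0, 0],
    [0, 0, 1.8532, 0.7421, 0, 0],
    [0, 0, 4.5747, 0.4781, 0, 0],
    [0, 0, 2.1598, 0.5471, 0, 0],
    [0, 0, 0, 0, 0.1, 0],
    [0, 0, 0, 0, 0, 0.1],
])

# (mean, variance) of s1..s4 in each mode; "variance" is an assumption.
MODES = {
    1: [(8.5, 1.2), (7.0, 0.8), (4.5, 1.0), (4.0, 0.8)],
    2: [(1.2, 3.0), (2.5, 1.0), (11.0, 1.2), (12.0, 0.8)],
    3: [(12.0, 1.2), (13.0, 1.0), (10.0, 1.2), (10.0, 0.8)],
}

def simulate(mode, n, rng, noise_var=10 ** (2 / 10)):
    """Return an (n, 8) array of samples from the given mode."""
    s = np.column_stack([rng.normal(mu, np.sqrt(var), n) for mu, var in MODES[mode]])
    feats = np.column_stack([s, s[:, 1] ** 3, s[:, 3] ** 3])
    e = rng.normal(0.0, np.sqrt(noise_var), size=(n, 8))
    return feats @ A.T + e

X_mode1 = simulate(1, 400, np.random.default_rng(0))
```

Under these assumptions the mean of $x_1$ in mode 1 is $2.4527 \times 8.5 + 0.5725 \times 7 \approx 24.86$, which gives a quick sanity check on the generated data.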
For each of the three modes, 400 samples are collected as training data and another 400 samples as testing data. The training data are used for off-line modeling, whereas the testing data are employed to examine the normal condition of the process. Thereafter, the space is divided based on the absolute correlation coefficients between each pair of variables. Since the numerical system is stable and the relations among the variables do not change, the division results under each mode are the same. The division result under mode 1 is shown in Fig. 2. After the threshold value for mode 1 is set to 0.25, the correlation coefficients among the first three and the seventh variables are found to be above the threshold. Meanwhile, the remaining four variables are correlated with one another. Thus, two subspaces under mode 1 are generated. The same division results are obtained in the other two modes so that the two subspaces are denoted as $[x_1\ x_2\ x_3\ x_7]^T$ and $[x_4\ x_5\ x_6\ x_8]^T$. After subspace division, the testing data are applied to each subspace in each mode to generate statistics under normal conditions.
Three cases are programmed for monitoring, as shown in the following:
Case 1: The system initially runs at mode 1 from the 1st point to the 100th point and then successively shifts to mode 3, mode 2, and mode 1 every 100 points.
Case 2: The system initially runs at mode 1, where a ramp change of $0.1(z - 100)$ is introduced to $s_2$ from the 101st point. Thereafter, the system switches to mode 2 from the 201st point to the 400th point, with a step change of 1 added to $s_3$ from the 301st point.
Case 3: The system initially runs at mode 1 and turns to mode 3 from the 101st point. A step change of 0.8 is added to $s_3$ from the 201st point to the end.
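The fault schedule of Case 2 can be written as a small helper. This is only a sketch under two assumptions: the ramp is read as $0.1(z - 100)$ with $z$ the 1-based sample index, and the ramp is taken to apply only while the system is in mode 1 (points 101 to 200). The function name `case2_offsets` is hypothetical.

```python
def case2_offsets(z):
    """Return (mode, offset added to s2, offset added to s3) for sample index z (1-based)."""
    if z <= 200:
        # Mode 1: ramp on s2 starting at the 101st point.
        ramp = 0.1 * (z - 100) if z >= 101 else 0.0
        return 1, ramp, 0.0
    # Mode 2: step on s3 starting at the 301st point.
    step = 1.0 if z >= 301 else 0.0
    return 2, 0.0, step

schedule_demo = [case2_offsets(z) for z in (100, 150, 300, 301)]
```

Applying the returned offsets to the source signals before they pass through the mixing matrix yields the faulty observations for this case.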
CE
The monitoring results of the above three cases are displayed in Figs. 3-5, where the joint probability, the identified process mode, and the statistic $DR$ are given for each case. Given that the $T^2$ statistic from the matched mode is small and remains below the confidence limit, the mode generating a comparatively large joint probability is considered the mode on which the process is currently running. By contrast, a mode with a small joint probability is not the currently operating mode. However, when the $T^2$ statistics deviate excessively from their original region and generate an extremely small joint probability, expressing the value of the joint probability becomes a problem. Under such a condition, a tuning parameter $\gamma$ is introduced to the probability formula as $p(x_{new} \in M_k) = e^{-(T^2_{k,1} + T^2_{k,2} + \cdots + T^2_{k,B})/\gamma}$, where the parameter $\gamma$ does not affect the ranking results but only provides a visual representation of the identification results. In the current simulation, $\gamma$ is set as 30. Fig. 3 (a) shows that the process successively operates on mode 1, mode 3, mode 2, and mode 1, which is consistent with the real situation of case 1. The corresponding mode number for process samples is shown in Fig. 3 (b) for a better explanation. The final monitoring result of the proposed CJS-FA method is shown in Fig. 3 (c). For comparison, PCA, PPCA, and FA are applied to the data whose mode types have been identified, with the results shown in Fig. 3 (c) as well. The four monitoring charts in Fig. 3 (c) reveal that the process is operating under normal conditions regardless of the model used.
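The effect of the tuning parameter can be seen with a short numerical check. In the sketch below, the statistics are hypothetical, and the parameter is named `gamma` for illustration. With large summed statistics the unscaled exponential underflows to exactly zero in double precision, so the modes can no longer be told apart, whereas dividing the exponent by 30 keeps the values representable without changing their order.

```python
import math

def joint_probability(t2_values, gamma=1.0):
    """exp(-(sum of subspace T^2 statistics) / gamma)."""
    return math.exp(-sum(t2_values) / gamma)

# Hypothetical statistics of one sample evaluated against two mismatched modes.
t2_mode_a = [400.0, 400.0]
t2_mode_b = [500.0, 450.0]

raw = [joint_probability(t) for t in (t2_mode_a, t2_mode_b)]
scaled = [joint_probability(t, gamma=30.0) for t in (t2_mode_a, t2_mode_b)]
```

Here both entries of `raw` are 0.0, while `scaled` still ranks the first mode above the second.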
The monitoring results of case 2 are presented in Fig. 4. Fig. 4 (a) indicates that the process operates on mode 1 for the first 200 points and then shifts to mode 2 from the 201st point, running on this mode until the end. The detailed mode number of each point is displayed in Fig. 4 (b). After mode identification, CJS-FA is used to detect the fault in the process, with results shown in Fig. 4 (c). CJS-FA detects the ramp change and the step change accurately and quickly. For comparison, PCA, PPCA, and FA are used to monitor the mode-identified data, with the monitoring results also shown in Fig. 4 (c). All three methods can detect fault samples, but a large part of the statistics of the fault samples remain below the control limit. CJS-FA achieves better monitoring performance than these methods. Fig. 4 (d) shows the monitoring charts of each subspace for case 2. The first subspace detects the first fault successfully, while the second subspace detects the second fault successfully. This is because the first fault is caused by $s_2$, which is related to $[x_1\ x_2\ x_3\ x_7]^T$ (block 1). Therefore, the first subspace can detect the occurrence of the fault well. However, the variables in the second subspace are not affected, so the fault cannot be detected by subspace 2. Similarly, for the second fault, the second subspace containing the affected variables ($[x_4\ x_5\ x_6\ x_8]^T$) contributes to the detection of the fault.
In case 3, the corresponding joint probability is displayed in Fig. 5 (a), which indicates the two modes on which the process runs. The specific mode number of the process can be found in Fig. 5 (b), and the corresponding monitoring results of PCA, PPCA, FA, and the proposed CJS-FA method are shown in Fig. 5 (c). Notably, CJS-FA has superior monitoring performance, with monitoring statistics exceeding the confidence limit after sample 200. None of PCA, PPCA, and FA detects the fault reliably; their monitoring statistics fluctuate around the confidence limit after the fault occurs. This may be because the presence of a large number of unaffected variables ($[x_1\ x_2\ x_3\ x_7]^T$) inhibits detection of this fault. Fig. 5 (d) shows that the second subspace containing the affected variables ($[x_4\ x_5\ x_6\ x_8]^T$) promotes the detection of the fault.
4.2 Continuous Stirred-Tank Reactor (CSTR)
PT
The second simulation system adopts a non-isothermal continuously stirred tank reactor used by Yoon and MacGregor [44]. The corresponding schematic graph is provided in Figs. 6 . Nine
CE
monitored variables and three operating modes are listed in Table 1 and 2 [44, 45] for the case study.
AC
This process is sampled every minute, and then 500 samples are collected under normal conditions for each mode to be treated as training data. For space division, the correlation coefficients among the variables are calculated. After the threshold 0.15 is set, night variables are divided into two subspaces based on segmentation rules. The first subspace contains the first four variables and variables 7 and 8, whereas the second subspace contains variables 5, 6, and 9. For testing purposes,
ACCEPTED MANUSCRIPT
another 500 samples are obtained as well to examine the normal conditions and to prepare for further model construction. The three study cases are set as the following:
Case 1: The system initially operates in mode 1, switches to mode 2 at the 401st point, and finally stays in mode 3 from the 701st point to the end.
Case 2: The system runs in mode 2, with a step of 1 K added to the cooling-water temperature Tc from the 501st point to the 1000th point.
Case 3: The system runs in mode 2, with a step of 2 kmol/(m^3·min) introduced into the inlet solute concentration CAA from the 501st point to the end.
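The correlation-based division described above for the CSTR variables can be sketched as follows. This is a minimal illustration under an assumed rule, not the authors' segmentation procedure: two variables are linked when the absolute value of their Pearson correlation exceeds the threshold, and each connected group of linked variables forms one subspace, so that variables in different subspaces stay weakly correlated. The function names `pearson` and `divide_subspaces` and the synthetic sine/cosine data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def divide_subspaces(columns, threshold=0.15):
    """Group variable indices into subspaces (assumed rule: connected
    components of the graph linking variables with |correlation| > threshold)."""
    n_vars = len(columns)
    parent = list(range(n_vars))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(columns[i], columns[j])) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n_vars):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Synthetic data (not from the manuscript): two orthogonal sources over
# ten full periods; variables 0-1 follow the first, variables 2-3 the second.
s1 = [math.sin(2 * math.pi * k / 50) for k in range(500)]
s2 = [math.cos(2 * math.pi * k / 50) for k in range(500)]
columns = [
    s1,
    [2 * a + 0.05 * b for a, b in zip(s1, s2)],
    s2,
    [-b + 0.05 * a for a, b in zip(s1, s2)],
]
subspaces = divide_subspaces(columns, threshold=0.15)
print(subspaces)
```

A connected-components rule is only one way to honour the stated principle of keeping variables from different subspaces nearly independent; the threshold value 0.15 is the one quoted in the text.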
Each case generates 1000 observations for simulation. The monitoring results are shown in Figs. 7-9. For comparison, an FA model is also constructed with the data from the identified mode. In Case 1, the joint probability of the monitoring statistics from each mode and each subspace is displayed in Fig. 7 (a); the results indicate that the process successively operates in mode 1, mode 2, and mode 3. This mode identification result agrees with the actual operating sequence. The monitoring statistics of CJS-FA and traditional FA are presented in Figs. 7 (b) and (c), respectively, and show that the system is under normal conditions.
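The mode identification step used in Fig. 7 (a) can be illustrated with a short sketch. It assumes that a probability is already available for every mode and subspace (how these are obtained from the monitoring statistics is defined earlier in the manuscript and is not reproduced here) and that independent subspaces allow the joint probability to be taken as their product. Because the joint probabilities plotted in the figures reach values as small as 10^-300, the sketch sums logarithms instead of multiplying. The function name `identify_mode`, the `floor` parameter, and all numbers are hypothetical.

```python
import math


def identify_mode(subspace_probs, floor=1e-300):
    """Return (index of the most likely mode, log10 joint probability per mode).

    subspace_probs[m][b] is a hypothetical probability that the current
    sample belongs to mode m, judged from subspace b alone.
    """
    log_joint = []
    for probs in subspace_probs:
        # Independent subspaces: the joint probability is the product of the
        # per-subspace values, computed here as a sum of logarithms.
        log_joint.append(sum(math.log10(max(p, floor)) for p in probs))
    best = max(range(len(log_joint)), key=log_joint.__getitem__)
    return best, log_joint


# Illustrative values only (not taken from the manuscript).
probs = [
    [0.8, 0.6],        # mode 1
    [1e-40, 1e-70],    # mode 2
    [1e-150, 1e-200],  # mode 3
]
best_mode, log_joint = identify_mode(probs)
print("identified mode:", best_mode + 1)
print("log10 joint probabilities:", [round(v, 2) for v in log_joint])
```

Working in the logarithmic domain matters for the third row: the direct product 1e-150 * 1e-200 underflows to zero in double precision, whereas the sum of logarithms keeps the modes comparable.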
The second study case is a faulty condition, with a step added to the cooling-water temperature Tc. Because of the control loop of the system, a deviation would be generated in the outlet temperature T, leading to an increase in the cooling-water flow rate FC. Fig. 8 (a) indicates that the process is in mode 2, and Fig. 8 (b) shows the monitoring result of CJS-FA. The monitoring statistics start to deviate at sample 500, where the fault occurs. For comparison, an FA model is constructed using data from the identified mode. The inferior monitoring result in Fig. 8 (c) demonstrates the advantage of the multi-subspace CJS-FA method.
Case 3 is another faulty process, caused by a bias in the inlet solute concentration CAA. As a result, the outlet concentration C and the outlet temperature T fluctuate in this closed-loop system. Based on the joint probability in Fig. 9 (a), the process is found in mode 1 during the 1000 points. Both the proposed CJS-FA and the traditional FA detect the fault after a certain delay, as presented in Figs. 9 (b)-(c). The monitoring results appear similar, but CJS-FA detects the fault with larger amplitude.
4.3 Tennessee Eastman (TE) Benchmark Process
As a classic simulation system, the TE process developed by Downs and Vogel [46] has been widely used to evaluate and compare monitoring methods. The detailed schematic is shown in Fig. 10. By changing the G/H mass ratio, the TE process can simulate six different modes, as presented in Table 3; the first three modes are selected for simulation. Additional details about this system can be found in [47], and the simulation code can be downloaded from http://depts.washington.edu/control/LARRY/TE/download.html.
In this study, the 9 manipulated variables and 22 continuous measurements listed in Table 4 are selected for simulation. A total of 500 samples from each mode are generated with a sampling interval of 0.05 h and are used as training data for model construction. First, the correlation coefficients among the 31 variables are estimated, and subspaces are produced based on the division rule. Table 5 shows the subspace division result, which can also be explained by practical experience. For example, variable 1 (A feed), variable 4 (total feed), variable 25 (A feed flow valve), and variable 26 (total feed flow valve), which are related to one another, are grouped into the same subspace (subspace 2 in Table 5). More relations can be found by comparing the division results with the TE process. Thereafter, the 20 programmed faults listed in Table 6 are introduced into the TE process under each mode. In addition, Fault 0 (normal operation without any fault) is simulated to generate the monitoring statistics under normal conditions for SVDD construction. To verify the advantage of the multi-block strategy, the monitoring results of each subspace in mode 1 are tabulated in Table 7. The monitoring results of both the traditional FA and the proposed CJS-FA are also presented in this table for comparison. For further explanation, three cases are designed to test the proposed method:
Case 1: The system operates normally in mode 1 from the 1st to the 500th point, in mode 2 from the 501st to the 1000th point, in mode 1 from the 1001st to the 1500th point, and in mode 3 from the 1501st to the 2000th point.
Case 2: The system operates normally in mode 1, with fault 4 introduced at the 401st point. Thereafter, the system switches to mode 2 under normal conditions at the 701st point, with fault 4 introduced at the 1101st point. At the 1401st point, the system shifts to mode 3 under normal conditions, with fault 4 introduced from the 1801st to the 2100th point.
Case 3: The schedule is the same as in Case 2, but fault 4 is replaced by fault 10.
The monitoring results for Case 1 are shown in Fig. 11. The joint probability in Fig. 11 (a) indicates that the system passes through four operating stages (mode 1, mode 2, mode 1, and mode 3), as depicted in Fig. 11 (b). This mode identification result is consistent with the actual situation. Fig. 11 (c) presents the final monitoring result of the proposed CJS-FA, whereas Fig. 11 (d) shows the monitoring statistics in each subspace. As shown in the figures, all subspaces and the final result indicate normal operation of the system.
For fault 4, the corresponding joint probability is given in Fig. 12 (a), which shows the correct operating mode for each point. The proposed detection method is then applied, with the monitoring results shown in Fig. 12 (b). The monitoring statistics initially stay below the confidence limit until sample 400, where the fault occurs. The fault lasts for 300 points and disappears at the 700th point. During the next 700 points, the monitoring statistics stay below the confidence limit of 1 for the first 400 points and then exceed the threshold from the 1100th sample. For the last 700 points, the trend of the monitoring statistics is nearly the same as that from the 700th point to the 1400th point. For comparison, the traditional FA model is constructed with data from the identified mode, as presented in Fig. 12 (c). Without a multi-subspace strategy, its monitoring statistics do not perform as well as those from CJS-FA, because significant information may be buried under a large amount of irrelevant data. For further explanation, the monitoring statistics in each subspace are shown in Fig. 12 (d), where the first subspace shows effective monitoring performance for fault detection. Therefore, the benefit of the multi-subspace strategy is confirmed.
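The timeline of Case 2 and the way a monitoring chart with a confidence limit of 1 can be scored against it are sketched below. The fault intervals follow the case definition (fault 4 active on points 401-700, 1101-1400, and 1801-2100); the statistic itself is synthetic, with an assumed 30-sample detection delay, and the helper names `fault_labels`, `synthetic_statistic`, and `alarm_rates` are hypothetical. The sketch is meant to make the bookkeeping explicit and does not reproduce the results in Fig. 12.

```python
def fault_labels(n_samples, fault_intervals):
    """Mark samples inside 1-based, inclusive fault intervals as True."""
    labels = [False] * n_samples
    for start, end in fault_intervals:
        for k in range(start - 1, end):
            labels[k] = True
    return labels


def synthetic_statistic(labels, fault_intervals, delay=30):
    """Toy statistic: 0.5 when normal, 2.0 under fault, and 0.9 during an
    assumed detection delay at the start of every fault interval."""
    stat = [2.0 if faulty else 0.5 for faulty in labels]
    for start, _ in fault_intervals:
        for k in range(start - 1, start - 1 + delay):
            stat[k] = 0.9
    return stat


def alarm_rates(statistic, labels, limit=1.0):
    """Return (detection rate on faulty samples, false alarm rate on normal samples)."""
    alarms = [value > limit for value in statistic]
    n_fault = sum(labels)
    n_normal = len(labels) - n_fault
    detected = sum(1 for a, f in zip(alarms, labels) if a and f)
    false_alarms = sum(1 for a, f in zip(alarms, labels) if a and not f)
    return detected / n_fault, false_alarms / n_normal


case2_faults = [(401, 700), (1101, 1400), (1801, 2100)]
labels = fault_labels(2100, case2_faults)
statistic = synthetic_statistic(labels, case2_faults)
detection_rate, false_alarm_rate = alarm_rates(statistic, labels)
print(detection_rate, false_alarm_rate)
```

With three fault windows of 300 points each, 900 of the 2100 samples are faulty, so any delay in raising the alarm lowers the detection rate in proportion to the number of missed points.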
The third case considers fault 10, which is a random variation in the C feed temperature. Fig. 13 (a) shows that the system runs in mode 1 from samples 1 to 700, in mode 2 from samples 701 to 1400, and in mode 3 from samples 1401 to 2100. Under each mode, fault 10 is introduced, and the monitoring statistics of CJS-FA detect this fault, as shown in Fig. 13 (b). The monitoring result of FA is shown in Fig. 13 (c), in which the monitoring statistics exceed the confidence limit after the fault occurs, but the detection performance is no better than that of the proposed method. In addition, a basic PCA model is constructed to demonstrate the advantage of the general FA model. The monitoring result of PCA is shown in Fig. 13 (d), which indicates that the monitoring statistics only fluctuate around the confidence limit when the fault occurs. The monitoring statistics generated in each subspace are shown in Fig. 13 (e), wherein only subspace 3 detects the fault in the process. The dimensionality of the process is reduced, and the process data are simplified. Through the construction of SVDD, the variance in each subspace can be distinguished and presented more intuitively than before, as shown in Fig. 13 (b).
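For reference, the SVDD step mentioned above can be written in the standard form of Tax and Duin [41]. The notation is an assumption chosen for this sketch and may differ from the symbols used earlier in the manuscript: $s_i$ is the vector of subspace monitoring statistics of training sample $i$, $\Phi$ is the kernel feature map, $a$ and $R$ are the centre and radius of the hypersphere, $\xi_i$ are slack variables, and $C$ is the trade-off parameter.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\lVert \Phi(s_i) - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n,

d(s) = \lVert \Phi(s) - a \rVert, \qquad \frac{d(s)}{R} \le 1 \;\Rightarrow\; \text{normal operation}.
```

If the reported statistic is such a distance-to-radius ratio, a single limit of 1 applies to every mode, which would be consistent with the confidence limit of 1 quoted for Fig. 12 (b); whether the manuscript uses the ratio of distances or of squared distances is not stated in this section.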
The three designed cases above have been tested, and the corresponding monitoring results show that the proposed method can correctly identify the running mode of the system and provide effective monitoring results. To illustrate the performance of CJS-FA comprehensively, it is compared with other methods. The first two are the traditional PCA and FA. Jiang and Yan later proposed sensitive FA (SFA), which concentrates the most useful information of the FA model into one subspace [48]. They subsequently addressed the problem of useful information being lost and developed weighted FA (WFA) [49]. For a fair comparison, PCA, FA, SFA, WFA, and CJS-FA are tested, with the monitoring results listed in Table 8. As observed, the proposed CJS-FA performs well and exhibits superior performance over the other methods. In brief, the proposed CJS-FA is shown to be a realistic and effective method for multimode processes with large amounts of complicated data.
Conclusion
This study has proposed the CJS-FA method to monitor plant-wide processes with multiple operating modes. Given that prior knowledge is not always available, the original space is segmented based on the correlation coefficients among variables. The principle of space division emphasizes maintaining independence between variables from different subspaces. Thus, the joint probability can be used to identify the mode in which the system is operating. For fault detection, the SVDD model is constructed with the monitoring statistics from all subspaces in the identified mode, and it identifies the fault intuitively and clearly. The performance of the proposed CJS-FA is demonstrated by a numerical simulation, the CSTR system, and the TE benchmark process. The simulation results and the comparison with other methods indicate the superiority and effectiveness of the proposed method. Extensions of the current study are still needed. Although the proposed method shows effective monitoring performance on non-linear and non-Gaussian processes to some extent, further studies that consider such processes should be conducted to improve monitoring performance.
Acknowledgments
The authors gratefully acknowledge the support of the following foundations: 973 Project of China (2013CB733605), National Natural Science Foundation of China (21176073), and the Fundamental Research Funds from the China Scholarship Council.
References
[1] J. Wang, Q.P. He, Multivariate statistical process monitoring based on statistics pattern analysis, Industrial & Engineering Chemistry Research, 49 (2010) 7858-7869.
[2] S. Yin, S.X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Transactions on Industrial Electronics, 61 (2014) 6418-6428.
[3] R.F. Sales, R. Vitale, S.M. de Lima, M.F. Pimentel, L. Stragevitch, A. Ferrer, Multivariate statistical process control charts for batch monitoring of transesterification reactions for biodiesel production based on near-infrared spectroscopy, Computers & Chemical Engineering, 94 (2016) 343-353.
[4] Z. Yan, C.-Y. Chen, Y. Yao, C.-C. Huang, Robust multivariate statistical process monitoring via stable principal component pursuit, Industrial & Engineering Chemistry Research, 55 (2016) 4011-4021.
[5] R. Wehrens, Principal component analysis, in: Chemometrics with R, Springer, 2011, pp. 43-66.
[6] H. Abdi, L.J. Williams, Principal component analysis, Wiley Interdisciplinary Reviews: Computational Statistics, 2 (2010) 433-459.
[7] C.-Y. Cheng, C.-C. Hsu, M.-C. Chen, Adaptive kernel principal component analysis (KPCA) for monitoring small disturbances of nonlinear processes, Industrial & Engineering Chemistry Research, 49 (2010) 2254-2262.
[8] O. Taouali, I. Jaffel, H. Lahdhiri, M.F. Harkat, H. Messaoud, New fault detection method based on reduced kernel principal component analysis (RKPCA), The International Journal of Advanced Manufacturing Technology, 85 (2016) 1547-1552.
[9] N. Lu, Y. Yao, F. Gao, F. Wang, Two-dimensional dynamic PCA for batch process monitoring, AIChE Journal, 51 (2005) 3300-3304.
[10] L. Dobos, J. Abonyi, On-line detection of homogeneous operation ranges by dynamic principal component analysis based time-series segmentation, Chemical Engineering Science, 75 (2012) 96-105.
[11] X. Liu, K. Li, M. McAfee, G.W. Irwin, Improved nonlinear PCA for process monitoring using support vector data description, Journal of Process Control, 21 (2011) 1306-1317.
[12] Y. Mori, M. Kuroda, N. Makino, Nonlinear Principal Component Analysis, in: Nonlinear Principal Component Analysis and Its Applications, Springer, 2016, pp. 7-20.
[13] G. Zwanenburg, H.C. Hoefsloot, J.A. Westerhuis, J.J. Jansen, A.K. Smilde, ANOVA–principal component analysis and ANOVA–simultaneous component analysis: A comparison, Journal of Chemometrics, 25 (2011) 561-567.
[14] T.J. Rato, J. Blue, J. Pinaton, M.S. Reis, Translation-Invariant Multiscale Energy-Based PCA for Monitoring Batch Processes in Semiconductor Manufacturing, IEEE Transactions on Automation Science and Engineering, (2016).
[15] Q. Jiang, X. Yan, B. Huang, Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference, IEEE Transactions on Industrial Electronics, 63 (2016) 377-386.
[16] M.E. Tipping, C.M. Bishop, Probabilistic principal component analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61 (1999) 611-622.
[17] A.T. Basilevsky, Statistical factor analysis and related methods: theory and applications, John Wiley & Sons, 2009.
[18] R.P. McDonald, Factor analysis and related methods, Psychology Press, 2014.
[19] S. Yin, S.X. Ding, D. Zhou, Diagnosis and prognosis for complicated industrial systems - Part I, IEEE Transactions on Industrial Electronics, 63 (2016) 2501-2505.
[20] Y. Zhang, C. Ma, Decentralized fault diagnosis using multiblock kernel independent component analysis, Chemical Engineering Research and Design, 90 (2012) 667-676.
[21] B. Song, H. Shi, Y. Ma, J. Wang, Multisubspace Principal Component Analysis with Local Outlier Factor for Multimode Process Monitoring, Industrial & Engineering Chemistry Research, 53 (2014) 16453-16464.
[22] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, Journal of Chemometrics, 12 (1998) 301-321.
[23] S.J. Qin, S. Valle, M.J. Piovoso, On unifying multiblock analysis with application to decentralized process monitoring, Journal of Chemometrics, 15 (2001) 715-742.
[24] C. Tong, Y. Song, X. Yan, Distributed statistical process monitoring based on four-subspace construction and Bayesian inference, Industrial & Engineering Chemistry Research, 52 (2013) 9897-9907.
[25] Z. Ge, Z. Song, Distributed PCA Model for Plant-Wide Process Monitoring, Industrial & Engineering Chemistry Research, 52 (2013) 1947-1957.
[26] B. Wang, X. Yan, Q. Jiang, Z. Lv, Generalized Dice's coefficient-based multi-block principal component analysis with Bayesian inference for plant-wide process monitoring, Journal of Chemometrics, 29 (2015) 165-178.
[27] Z. Ge, Z. Song, P. Wang, Probabilistic combination of local independent component regression model for multimode quality prediction in chemical processes, Chemical Engineering Research and Design, 92 (2014) 509-521.
[28] X. Peng, Y. Tang, W. Du, F. Qian, Multimode Process Monitoring and Fault Detection: A Sparse Modeling and Dictionary Learning Method, IEEE Transactions on Industrial Electronics, (2017).
[29] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Computers & Chemical Engineering, 22 (1998) 503-514.
[30] W. Li, H.H. Yue, S. Valle-Cervantes, S.J. Qin, Recursive PCA for adaptive process monitoring, Journal of Process Control, 10 (2000) 471-486.
[31] D.-H. Hwang, C. Han, Real-time monitoring for a process with multiple operating modes, Control Engineering Practice, 7 (1999) 891-902.
[32] S.J. Zhao, J. Zhang, Y.M. Xu, Monitoring of processes with multiple operating modes through multiple principle component analysis models, Industrial & Engineering Chemistry Research, 43 (2004) 7025-7035.
[33] M.L. Maestri, M.C. Cassanello, G.I. Horowitz, Kernel PCA performance in processes with multiple operation modes, Chemical Product and Process Modeling, 4 (2009).
[34] S.W. Choi, J.H. Park, I.-B. Lee, Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis, Computers & Chemical Engineering, 28 (2004) 1377-1387.
[35] J. Yu, Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing, 24 (2011) 432-444.
[36] Z. Ge, Z. Song, Multimode process monitoring based on Bayesian method, Journal of Chemometrics, 23 (2009) 636-650.
[37] Z. Ge, Z. Song, Bayesian inference and joint probability analysis for batch process monitoring, AIChE Journal, 59 (2013) 3702-3713.
[38] Q.C. Jiang, X.F. Yan, Monitoring multi-mode plant-wide processes by using mutual information-based multi-block PCA, joint probability, and Bayesian inference, Chemometrics and Intelligent Laboratory Systems, 136 (2014) 121-137.
[39] K. Pearson, Note on regression and inheritance in the case of two parents, Proceedings of the Royal Society of London, (1895) 240-242.
[40] Z. Ge, Z. Song, Multivariate statistical process control: process monitoring methods and applications, Springer Science & Business Media, 2012.
[41] D.M. Tax, R.P. Duin, Support vector data description, Machine Learning, 54 (2004) 45-66.
[42] Q. Jiang, X. Yan, Just-in-time reorganized PCA integrated with SVDD for chemical process monitoring, AIChE Journal, 60 (2014) 949-965.
[43] D.M. Tax, R.P. Duin, Support vector domain description, Pattern Recognition Letters, 20 (1999) 1191-1199.
[44] S. Yoon, J.F. MacGregor, Fault diagnosis with multivariate statistical models part I: using steady state fault signatures, Journal of Process Control, 11 (2001) 387-400.
[45] Y. Ma, H. Shi, M. Wang, Adaptive local outlier probability for dynamic process monitoring, Chinese Journal of Chemical Engineering, 22 (2014) 820-827.
[46] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers & Chemical Engineering, 17 (1993) 245-255.
[47] L.H. Chiang, R.D. Braatz, E.L. Russell, Fault detection and diagnosis in industrial systems, Springer Science & Business Media, 2001.
[48] Q. Jiang, X. Yan, Multivariate statistical process monitoring using modified factor analysis and its application, Journal of Chemical Engineering of Japan, 45 (2012) 829-839.
[49] Q. Jiang, X. Yan, Probabilistic monitoring of chemical processes using adaptively weighted factor analysis and its application, Chemical Engineering Research and Design, 92 (2014) 127-138.
Lists of figures and tables
Fig. 1. Schematic Diagram of CJS-FA for Process Monitoring
Fig. 2. Correlation Coefficients among Variables in Mode 1
Fig. 3. Monitoring results of Case 1: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics
Fig. 4. Monitoring results of Case 2: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics; (d) Monitoring Statistics in each subspace
Fig. 5. Monitoring results of Case 3: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics; (d) Monitoring Statistics in each subspace
Fig. 6. Schematic graph of the CSTR process
Fig. 7. Monitoring Results of Case 1 in CSTR
Fig. 8. Monitoring Results of Case 2 in CSTR
Fig. 9. Monitoring Results of Case 3 in CSTR
Fig. 10. Control system of the Tennessee Eastman process
Fig. 11. Monitoring Results of Case 1 in the TE process: (a) Joint Probability; (b) Mode Identification Results; (c) Monitoring Statistic DR; (d) Monitoring Results in each Subspace
Fig. 12. Monitoring results of Case 2 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results in each Subspace
Fig. 13. Monitoring results of Case 3 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results of PCA under Identified Mode; (e) Monitoring Results in each Subspace
Table 1. Monitored process variables in the CSTR system
Table 2. Three operating modes of the CSTR system
Table 3. Six different operating modes of the TE process
Table 4. Process monitoring variables in the TE process
Table 5. Division result of the TE process
Table 6. Process faults programmed in the TE process
Table 7. Monitoring results of FA and CJS-FA in Mode 1
Table 8. The monitoring results of PCA, FA, SFA, WFA and CJS-FA
Fig. 1. Schematic Diagram of CJS-FA for Process Monitoring
Fig. 2. Correlation Coefficients among Variables in Mode 1
[Each panel plots correlation coefficients (y-axis) against variable number (x-axis).]
Fig. 3. Monitoring results of Case 1: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) mode number; (c) panels PCA-T2, PPCA-T2, FA, and CJS-FA.]
Fig. 4. Monitoring results of Case 2: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics; (d) Monitoring Statistics in each subspace
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) mode number; (c) panels PCA-T2, PPCA-T2, FA, and CJS-FA; (d) panels Subspace 1 and Subspace 2.]
Fig. 5. Monitoring results of Case 3: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics; (d) Monitoring Statistics in each subspace
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) mode number; (c) panels PCA-T2, PPCA-T2, FA, and CJS-FA; (d) panels Subspace 1 and Subspace 2.]
Fig. 6. Schematic graph of the CSTR process
Fig. 7. Monitoring Results of Case 1 in CSTR
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) CJS-FA statistic, with the transients marked; (c) FA statistic.]
Fig. 8. Monitoring Results of Case 2 in CSTR
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) CJS-FA statistic; (c) FA statistic.]
Fig. 9. Monitoring Results of Case 3 in CSTR
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) CJS-FA statistic, with an inset for samples 600-800; (c) FA statistic, with an inset for samples 600-800.]
Fig. 10. Control system of the Tennessee Eastman process
Fig. 11. Monitoring Results of Case 1 in the TE process: (a) Joint Probability; (b) Mode Identification Results; (c) Monitoring Statistic DR; (d) Monitoring Results in each Subspace
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) mode number; (c) CJS-FA statistic; (d) panels Subspace 1 to Subspace 4.]
Fig. 12. Monitoring results of Case 2 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results in each Subspace
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) CJS-FA statistic; (c) FA statistic; (d) panels Subspace 1 to Subspace 4.]
Fig. 13. Monitoring results of Case 3 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results of PCA under Identified Mode; (e) Monitoring Results in each Subspace
[x-axis: samples; (a) joint probability on a logarithmic scale for Mode 1, Mode 2, and Mode 3; (b) CJS-FA statistic; (c) FA statistic; (d) PCA statistic; (e) panels Subspace 1 to Subspace 4.]
Table 1. Monitored process variables in the CSTR system
Variable No. | Process measurements
1 | Outlet temperature T
2 | Outlet concentration C
3 | Cooling-water flow rate FC
4 | Inlet solute flow FA
5 | Inlet solvent flow FS
6 | Cooling-water temperature Tc
7 | Inlet temperature T0
8 | Inlet solute concentration CAA
9 | Inlet solvent concentration CAS
Table 2. Three operating modes of the CSTR system
Variable | Mode 1 | Mode 2 | Mode 3
Cooling-water temperature Tc | 15.00 | 6.61 | 3.43
Inlet temperature T0 | 0.80 | 0.75 | 0.69
Inlet solute concentration CAA | 356.25 | 370.77 | 372.00
Table 3. Six different operating modes of the TE process
Operating mode | G/H mass ratio | Production rate (kg/h)
1 | 50/50 | 14.076
2 | 10/90 | 14.076
3 | 90/10 | 11.111
4 | 50/50 | Maximum
5 | 10/90 | Maximum
6 | 90/10 | Maximum
Table 4. Process monitoring variables in the TE process
No. | Process measurements
1 | A feed (stream 1)
2 | D feed (stream 2)
3 | E feed (stream 3)
4 | Total feed
5 | Recycle flow (stream 8)
6 | Reactor feed rate (stream 6)
7 | Reactor pressure
8 | Reactor level
9 | Reactor temperature
10 | Purge rate (stream 9)
11 | Product separator temperature
12 | Product separator level
13 | Product separator pressure
14 | Product separator underflow (stream 10)
15 | Stripper level
16 | Stripper pressure
17 | Stripper underflow (stream 11)
18 | Stripper temperature
19 | Stripper steam flow
20 | Compressor work
21 | Reactor cooling water outlet temperature
22 | Separator cooling water outlet temperature
23 | D feed flow valve (stream 2)
24 | E feed flow valve (stream 3)
25 | A feed flow valve (stream 1)
26 | Total feed flow valve (stream 4)
27 | Purge valve (stream 9)
28 | Separator pot liquid flow valve (stream 10)
29 | Stripper liquid product flow valve (stream 11)
30 | Reactor cooling water flow
31 | Condenser cooling water flow
Table 5. Division result of the TE process
Subspace No. | Variable No.
1 | 2, 10, 21, 23, 24, 27, 30
2 | 1, 4, 7, 25, 26
3 | 11, 13, 16, 18, 20, 22, 28, 29
4 | 3, 5, 6, 8, 9, 12, 14, 15, 17, 19, 31
Table 6. Process faults programmed in the TE process
Fault No. | Process variable | Type
1 | A/C feed ratio, B composition constant (stream 4) | step
2 | B composition, A/C ratio constant (stream 4) | step
3 | D feed temperature (stream 2) | step
4 | reactor cooling water inlet temperature | step
5 | condenser cooling water inlet temperature | step
6 | A feed loss (stream 1) | step
7 | C header pressure loss-reduced availability (stream 4) | step
8 | A, B, C feed composition (stream 4) | random variation
9 | D feed temperature (stream 2) | random variation
10 | C feed temperature (stream 4) | random variation
11 | reactor cooling water inlet temperature | random variation
12 | condenser cooling water inlet temperature | random variation
13 | reaction kinetics | slow drift
14 | reactor cooling water valve | sticking
15 | condenser cooling water valve | sticking
16 | unknown | unknown
17 | unknown | unknown
18 | unknown | unknown
19 | unknown | unknown
20 | unknown | unknown
Table 7. Monitoring results of FA and CJS-FA in Mode 1
Fault No. | FA | Sub-FA 1 | Sub-FA 2 | Sub-FA 3 | Sub-FA 4 | CJS-FA
1 | 0.003 | 0.453 | 0.015 | 0.008 | 0.733 | 0.003
2 | 0.008 | 0.017 | 0.077 | 0.008 | 0.933 | 0.007
3 | 1 | 0.988 | 1 | 0.978 | 1 | 0.935
4 | 0.105 | 0.001 | 1 | 0.921 | 0.997 | 0
5 | 0.997 | 0.992 | 1 | 0.970 | 1 | 0.943
6 | 0 | 0.014 | 0 | 0 | 0.205 | 0
7 | 0 | 0.930 | 0 | 0.905 | 1 | 0
8 | 0.053 | 0.130 | 0.098 | 0.077 | 0.592 | 0.048
9 | 0.985 | 0.987 | 1 | 0.957 | 1 | 0.890
10 | 0.125 | 0.992 | 1 | 0.107 | 1 | 0.108
11 | 0.112 | 0.120 | 0.997 | 0.360 | 0.712 | 0.030
12 | 0.617 | 0.825 | 0.998 | 0.548 | 1 | 0.495
13 | 0.088 | 0.112 | 0.235 | 0.090 | 0.532 | 0.075
14 | 0.018 | 0.117 | 1 | 0.963 | 0.565 | 0.008
15 | 0.998 | 0.992 | 1 | 0.965 | 1 | 0.938
16 | 1 | 0.992 | 1 | 0.973 | 1 | 0.948
17 | 0.010 | 0.012 | 0.990 | 0.602 | 0.907 | 0.005
18 | 0.205 | 0.942 | 0.999 | 0.540 | 1 | 0.522
19 | 0.007 | 0.818 | 0.862 | 0.055 | 0.953 | 0.030
20 | 0.042 | 0.122 | 0.967 | 0.030 | 0.948 | 0.030
Table 8. The monitoring results of PCA, FA, SFA, WFA and CJS-FA
Fault No. | PCA | FA | SFA | WFA | CJS-FA
1 | 0.01 | 0.01 | 0.01 | 0.01 | 0
2 | 0.02 | 0.01 | 0.04 | 0.03 | 0
3 | 0.93 | 0.98 | 0.94 | 0.94 | 0.88
4 | 0.75 | 0.96 | 0.30 | 0.01 | 0.90
5 | 0.72 | 0 | 0.76 | 0.67 | 0
6 | 0.01 | 0.01 | 0.01 | 0.01 | 0
7 | 0 | 0.60 | 0.58 | 0.43 | 0.76
8 | 0.03 | 0.02 | 0.03 | 0.01 | 0
9 | 0.97 | 0.98 | 0.97 | 0.97 | 0.97
10 | 0.66 | 0.18 | 0.37 | 0.29 | 0.56
11 | 0.56 | 0.57 | 0.38 | 0.33 | 0.52
12 | 0.01 | 0.01 | 0.02 | 0.01 | 0
13 | 0.06 | 0.05 | 0.06 | 0.04 | 0
14 | 0.01 | 0 | 0.01 | 0 | 0.19
15 | 0.98 | 0.98 | 0.83 | 0.76 | 0.80
16 | 0.78 | 0.60 | 0.37 | 0.59 | 0.70
17 | 0.22 | 0.08 | 0.17 | 0.16 | 0.157
18 | 0.10 | 0.10 | 0.23 | 0.09 | 0
19 | 0.84 | 0.97 | 0.96 | 0.88 | 0.92
20 | 0.73 | 0.49 | 0.84 | 0.18 | 0.36