Multi-subspace factor analysis integrated with support vector data description for multimode process monitoring

Accepted Manuscript

Bei Wang, Zhichao Li, Xuefeng Yan

PII: S0016-0032(18)30524-6
DOI: https://doi.org/10.1016/j.jfranklin.2018.07.044
Reference: FI 3593

To appear in: Journal of the Franklin Institute

Received date: 22 March 2017
Revised date: 1 June 2018
Accepted date: 30 July 2018

Please cite this article as: Bei Wang , Zhichao Li , Xuefeng Yan , Multi-subspace Factor Analysis Integrated with Support Vector Data Description for Multimode Process Monitoring, Journal of the Franklin Institute (2018), doi: https://doi.org/10.1016/j.jfranklin.2018.07.044

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Multi-subspace Factor Analysis Integrated with Support Vector Data Description for Multimode Process Monitoring

Bei Wang1,2, Zhichao Li1, Xuefeng Yan1,*

(1. Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai, P. R. China
2. Shanghai Electric Windpower Group, Shanghai, P. R. China)

*: Corresponding author: Xuefeng Yan
Email address: [email protected]
Tel\Fax Number: +86-21-64251036
Address: P.O. BOX 293, MeiLong Road NO. 130, Shanghai 200237, P. R. China

Abstract

In modern plant-wide systems, chemical processes usually run in multiple operating modes to meet the demand for diversified products, so accurately identifying the current operating mode becomes a focal point. Meanwhile, such systems produce numerous process variables with complex relationships, which may degrade the effectiveness of statistical process monitoring. To solve this problem, this study proposes a multimode factor analysis (FA) method that integrates Pearson’s correlation coefficient, joint probability, and support vector data description (SVDD). First, subspaces are generated automatically from Pearson’s correlation coefficients among variables rather than from prior knowledge, which is not always available. Second, statistical indices are derived from the FA models constructed in each subspace and each mode. Third, the current operating mode is identified according to the joint probabilities of the statistical indices. Finally, SVDD is adopted to provide an intuitive indication for fault detection. The effectiveness and feasibility of the proposed method are demonstrated by three case studies: a numerical simulation, the continuous stirred-tank reactor (CSTR) model, and the Tennessee Eastman (TE) benchmark process.

Keywords: factor analysis; multi-subspace strategy; support vector data description; multimode process; correlation coefficient

1. Introduction

In modern industry, manufacturers pay considerable attention to process monitoring to guarantee plant safety and satisfy high requirements for product quality. Owing to technical progress in data collection and computation, multivariate statistical process monitoring (MSPM) methods have been applied to diverse processes and have advanced significantly in recent decades [1-4]. Among the various MSPM methods, principal component analysis (PCA) [5, 6] can be considered the most fundamental one. It aims to reduce dimensionality while retaining the main correlation structure. Given its effectiveness, PCA has many extensions designed to handle different industrial conditions, including kernel PCA [7, 8], dynamic PCA [9, 10], nonlinear PCA [11, 12], and others [13-15]. However, a notable disadvantage of PCA is the lack of an associated probabilistic model for the observed data. PCA-based methods assume that process variables are noiseless, a condition that is difficult to achieve in practical industrial processes. To solve this problem, researchers have proposed probabilistic PCA (PPCA) [16] and factor analysis (FA) [17, 18], which describe the data in a density estimation framework. FA is more commonly used than PPCA because it allows the noise of each observed variable to have a different variance, whereas PPCA assumes that all noise variances are equal. In other words, FA provides a general description of the Gaussian latent model structure, and PPCA is only a special case of FA. Like PCA, FA assumes that the data are Gaussian distributed and linearly related.

Modern industrial processes are becoming increasingly complicated and large in scale [19]. Large amounts of data are generated with various complex relationships among variables, such as linear correlation, nonlinear correlation, partial correlation, and independence. In this situation, constructing only one model for global monitoring is inappropriate and insufficient because local behaviors are disregarded. Therefore, the multi-subspace strategy has emerged to reduce data complexity by providing an accurate feature description [20, 21]. Westerhuis et al. [22] recommended separating variables into meaningful blocks and then building models to explain each block. Following this recommendation, Qin et al. [23] proposed calculating loadings and scores directly from regular PCA and partial least squares (PLS) models to develop decentralized monitoring and diagnosis. These methods successfully apply the multi-subspace strategy to basic models and achieve effective monitoring results. However, they generate the subspaces based on prior process knowledge, which is not always available in real industrial processes. Thus, data-driven multi-subspace methods have been proposed to divide the space automatically. Tong et al. [24] produced four subspaces based on the relevance and irrelevance between the principal component and the residual space. Ge and Song [25] developed a distributed PCA method that separates the original space based on the different directions of the principal components. Later, Wang et al. [26] constructed subspaces by measuring the generalized Dice’s coefficients among the loadings. All these data-driven division methods achieve the expected monitoring performance. Therefore, the multi-subspace method based on data characteristics can be considered a viable and effective strategy for complex plant-wide processes.

In addition, multimode operation is common in industrial processes [27, 28] because processes must shift their operating modes to cater to changes in product standards, set points, and other aspects. Traditional MSPM methods, which assume that a process operates under a single condition, cannot be applied directly to such multimode processes: false alarms may be triggered when the process switches to another operating mode with a different confidence limit. Monitoring a process with multiple operating modes is therefore a difficult problem, and a series of multimode strategies have been developed to solve it. Qin et al. proposed recursive PCA and PLS [29, 30], which update the correlation matrix recursively to follow changes in the monitored process. However, the recursive update is carried out blindly, without considering the identified mode or the transition between modes. Hwang and Han [31] developed a monitoring methodology based on hierarchical clustering and constructed a super PCA model to monitor multimode processes, but this methodology works only when process behaviors share common characteristics. Zhao et al. [32] adopted principal angles to match operating modes with predefined models. Mauricio et al. [33] suggested monitoring the multimode process based on kernel PCA. Later, Gaussian mixture models (GMMs) were widely studied and introduced to multimode process monitoring. Choi [34] expanded the concept of traditional PCA and used a GMM to approximate the data pattern in the subspace. Yu [35] constructed a principal component (PC)-based GMM and proposed assessing process states with two quantification indices: negative log-likelihood probability and Mahalanobis distance. Ge and Song proposed a multimode process monitoring method based on Bayes’ theorem [36] and later extended it to batch processes [37]. These multimode approaches are effective, but a more general approach is needed to monitor plant-wide multimode processes.

To address the high-dimensional and complexly related data from plant-wide multimode processes, a new monitoring method called CJS-FA is proposed in this study. This method integrates the FA model with the correlation coefficient, joint probability, and support vector data description (SVDD). First, instead of prior knowledge, a classic statistical measure, Pearson’s correlation coefficient, is employed to estimate the correlations among variables so that related variables are grouped into the same subspace automatically. After this division, the variables in each subspace share similar data characteristics, which enhances monitoring capability and jointly reduces data complexity and dimension. Second, FA is selected to establish models in each block under each operating condition because FA has a more general model structure than PCA. Thereafter, the joint probabilities of all modes are estimated from the monitoring statistics of each subspace to identify the current operating mode. Finally, SVDD is adopted to monitor the variation of all subspaces in the identified mode intuitively.

The novelty and advantages can be summarized as follows: (1) a completely data-driven, unsupervised monitoring method is developed for complex plant-wide processes with multiple operating states; (2) the process data are divided into several subspaces according to Pearson’s correlation coefficients rather than prior knowledge, which eases the monitoring difficulty; (3) an FA model is established for each block under each mode to better handle multimode process noise; (4) the current operating mode is identified by joint probability; (5) the monitoring results are combined through SVDD.

The remainder of this paper is organized as follows. Section 2 gives a detailed description of the proposed monitoring scheme, along with brief introductions to the related methods. Section 3 discusses the corresponding implementation procedures. Applications to three cases and comparisons with other methods are analyzed in Section 4. Finally, conclusions are drawn in Section 5.

2. Methodology

This section illustrates the monitoring scheme for the multimode plant-wide process, briefly reviewing related basic methods.

2.1 Correlation Coefficient for Space Division

In plant-wide industrial processes, handling large amounts of high-dimensional data with complex relationships has been a difficult problem for process monitoring. To accomplish dimensionality reduction and feature extraction, the multi-subspace strategy is usually adopted to divide the original process variables into several subspaces. Traditionally, division is implemented based on prior process knowledge. However, such knowledge is not always available, especially in processes with multiple operating states [38]. In this study, a classic statistical measure, the Pearson product-moment correlation coefficient (referred to as Pearson’s correlation coefficient), is employed to divide the variables based on data characteristics.

In statistics, Pearson’s correlation coefficient, developed by Karl Pearson, is widely used as a measure of the degree of linear correlation between two variables $x_p$ and $x_q$, with the formula given as follows [39]:

$$\rho_{x_p x_q} = \frac{\mathrm{Cov}(x_p, x_q)}{\sqrt{D(x_p)}\,\sqrt{D(x_q)}} \tag{1}$$

where $\mathrm{Cov}$ is the covariance, and $\sqrt{D(x_p)}$ and $\sqrt{D(x_q)}$ are the standard deviations. It is easy to derive that $\rho_{x_p x_q}$ lies between -1 and 1. As the linear relationship between the two variables strengthens, the correlation coefficient tends to 1 or -1. If one variable increases as the other increases, the variables are positively correlated and the coefficient is greater than 0; if one variable decreases as the other increases, they are negatively correlated and the coefficient is less than 0; if the coefficient equals 0, there is no linear correlation between them. The absolute value of the coefficient, which ignores the sign, therefore reflects the strength of the linear relationship between the two variables. Variable relations can thus be quantified by computing the correlation coefficients among variables.
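
As a minimal sketch of Eq. (1), the function below computes the coefficient directly from the covariance and the two standard deviations, using only the Python standard library. The three short data series are hypothetical and chosen only to show the positive, negative, and zero cases described above.

```python
import math

def pearson(x, y):
    """Pearson correlation: Cov(x, y) / (std(x) * std(y)), as in Eq. (1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustrative series (not process data from the paper)
x_p = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_pos = pearson(x_p, [2.0, 4.0, 6.0, 8.0, 10.0])   # both increase together
rho_neg = pearson(x_p, [5.0, 4.0, 3.0, 2.0, 1.0])    # one rises as the other falls
rho_zero = pearson(x_p, [1.0, 0.0, -2.0, 0.0, 1.0])  # no linear trend
```

Only the absolute value of the result matters for the space division, since a strong negative correlation links two variables as firmly as a strong positive one.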

For space division, the correlation coefficients among variables are calculated under each mode with data collected from normal conditions. For two variables $x_p^k$ and $x_q^k$ under mode $k$, the correlation coefficient is written as $\rho^k_{x_p x_q}$. In a practical system, a threshold value $\theta_k$ must be set based on experience for a fair division, which in turn determines the subspace number $B$. When $|\rho^k_{x_p x_q}| \ge \theta_k$, the two variables are considered correlated and are grouped into the same subspace. A further variable $x_r^k$ is put together with $x_p^k$ and $x_q^k$ if $|\rho^k_{x_p x_r}| \ge \theta_k$ or $|\rho^k_{x_r x_q}| \ge \theta_k$. In this way, all variables are grouped in all modes. The setting of $\theta_k$ deserves attention. An extremely large value leaves most variables ungrouped and causes an excessive number of subspaces, whereas an extremely small value links almost all variables and makes division difficult. Thus, the threshold $\theta_k$ is important in the division of variables. As observed, the space division depends entirely on the correlation coefficient $\rho^k_{x_p x_q}$ and the threshold value $\theta_k$, realizing data-driven division without prior knowledge.

Some variables may have correlation coefficients with all other variables below the threshold, that is, they are unrelated to the rest. Such variables can form a new subspace or can be distributed to existing subspaces. The basic principle of space division is that variables from different subspaces should be independent of one another and that the division should not impair the original monitoring performance.
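
The grouping rule of this section can be sketched as a connected-components search over the thresholded absolute correlation matrix. The union-find implementation, the function name `divide_subspaces`, and the 5-variable correlation matrix below are illustrative assumptions; in particular, an unrelated variable is simply left as its own subspace here, which is only one of the two options mentioned above.

```python
def divide_subspaces(corr, threshold):
    """Group variable indices whose absolute correlation reaches the threshold.

    A variable joins a group as soon as it is correlated with ANY member,
    so the groups are the connected components of the thresholded graph.
    """
    m = len(corr)
    parent = list(range(m))

    def find(i):
        # Follow parent links to the group representative (with path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for p in range(m):
        for q in range(p + 1, m):
            if abs(corr[p][q]) >= threshold:
                parent[find(p)] = find(q)

    groups = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical correlation matrix for five variables (indices 0..4)
corr = [
    [1.0, 0.8, 0.1, 0.1, 0.0],
    [0.8, 1.0, 0.0, -0.4, 0.1],
    [0.1, 0.0, 1.0, 0.1, 0.6],
    [0.1, -0.4, 0.1, 1.0, 0.0],
    [0.0, 0.1, 0.6, 0.0, 1.0],
]
subspaces = divide_subspaces(corr, threshold=0.25)  # [[0, 1, 3], [2, 4]]
```

With this matrix, a threshold of 0.9 leaves every variable in its own subspace, whereas a threshold of 0.05 merges all five variables into one, which illustrates why the threshold has to be tuned.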

2.2 Process Monitoring Based on FA

Following the division principle above, the process data $X \in \mathbb{R}^{m \times n}$ (including $m$ variables and $n$ samples) are decomposed into $B$ subspaces, as expressed in the following:

$$X = [X_1, X_2, \ldots, X_B] \tag{2}$$

For the $b$th ($b = 1, 2, \ldots, B$) subspace $X_b \in \mathbb{R}^{m_b \times n}$, where $m_b$ denotes the corresponding number of variables, the variables are scaled to zero mean and unit variance before the FA model is constructed. The aim of the FA model is to represent the original $m_b$-dimensional variables $X_b$ by an $l$-dimensional vector of latent variables $T_b$ plus small white noise $E_b$, as follows [16, 40]:

$$X_b = P_b T_b + E_b \tag{3}$$

where the loading matrix $P_b \in \mathbb{R}^{m_b \times l}$ ($l < m_b$) relates the two sets of variables. The variance matrix of the noise $E_b$ is expressed as $\Sigma_{E_b} = \mathrm{diag}\{\lambda_i\}$, $i = 1, 2, \ldots, m_b$, in which the variances $\lambda_i$ are allowed to differ. When all the noise variances $\lambda_i$ are assumed to be zero, the FA model is equivalent to the PCA model. Moreover, if all $\lambda_i$ are assumed to have the same value, FA becomes PPCA. Therefore, FA is the general form of the Gaussian latent model and includes PCA and PPCA as two special cases.

In the FA model, the latent variable satisfies $T_b \sim N(0, I)$; that is, its components are independent and Gaussian distributed with unit variance. Specifically, the distribution of the latent variable $T_b$ is expressed as follows:

$$p(T_b) = (2\pi)^{-l/2} \exp\left\{-\tfrac{1}{2} T_b^T T_b\right\} \tag{4}$$

With the additional noise $E_b \sim N(0, \Sigma_{E_b})$, the corresponding Gaussian distribution of the original variables $X_b$ can be derived as

$$p(X_b) = \int p(X_b \mid T_b)\, p(T_b)\, dT_b = (2\pi)^{-m_b/2} |C|^{-1/2} \exp\left\{-\tfrac{1}{2} X_b^T C^{-1} X_b\right\} \tag{5}$$

where $C = \Sigma_{E_b} + P_b P_b^T$ is the covariance matrix of the observed data, so that $X_b \sim N(0, \Sigma_{E_b} + P_b P_b^T)$. Therefore, the main task of FA is to estimate the parameter set $\Theta = \{P_b, \Sigma_{E_b}\}$. The expectation-maximization (EM) algorithm is adopted as an iterative likelihood maximization algorithm to estimate the most probable value of $\Theta$, which results in the updates

$$\Sigma_{E_b} = \frac{1}{n} \sum_{z=1}^{n} \mathrm{diag}\left\{ X_{b,z} X_{b,z}^T - P_b \hat{T}_{b,z} X_{b,z}^T \right\} \tag{6}$$

$$P_b = \left( \sum_{z=1}^{n} X_{b,z} \hat{T}_{b,z}^T \right) \left( \sum_{z=1}^{n} E\{\hat{T}_{b,z} \hat{T}_{b,z}^T \mid X_{b,z}\} \right)^{-1} \tag{7}$$

where $z$ is the sample index. The detailed calculations are omitted here for brevity; please refer to [17, 40] for more details. For monitoring purposes, the $T^2$ statistic based on the FA model is constructed as follows:

$$T_{b,z}^2 = \|\hat{T}_{b,z}\|^2 = X_{b,z}^T Q^T Q X_{b,z} \le \chi^2_{\alpha, l} \tag{8}$$

where $Q = P_b^T (P_b P_b^T + \Sigma_{E_b})^{-1}$, and $\chi^2_{\alpha,l}$ is the critical value of the $\chi^2$ distribution with significance level $\alpha$ and $l$ degrees of freedom. Usually, a 99% confidence limit is set for process monitoring. When the monitoring statistic $T_b^2$ exceeds the confidence limit, a fault is deemed to have occurred in the process. Thus, an FA model is constructed in each subspace to monitor the trend of the observed data.
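
To make Eqs. (6)-(8) concrete, the sketch below runs the EM updates for the special case of a single latent factor ($l = 1$). This case is assumed here only so that the matrix inverse in $Q$ reduces to a scalar expression (via the Sherman-Morrison identity) and the example can stay in plain Python. The synthetic data, the starting values, the iteration count, and all function names are illustrative assumptions rather than the settings used in the paper.

```python
import math
import random

def posterior_weights(p, lam):
    """Q = P^T (P P^T + Sigma)^-1 for one factor: (p_i / lam_i) / (1 + sum_j p_j^2 / lam_j)."""
    s = sum(pi * pi / li for pi, li in zip(p, lam))
    return [pi / li / (1.0 + s) for pi, li in zip(p, lam)]

def fit_single_factor_fa(X, n_iter=200):
    """EM for a one-factor FA model; X is a list of standardized samples."""
    n, m = len(X), len(X[0])
    p = [0.5] * m      # loading vector (P_b with a single column), arbitrary start
    lam = [1.0] * m    # diagonal of the noise covariance
    for _ in range(n_iter):
        # E-step: posterior mean and variance of the latent variable
        q = posterior_weights(p, lam)
        t_hat = [sum(qi * xi for qi, xi in zip(q, x)) for x in X]
        post_var = 1.0 - sum(qi * pi for qi, pi in zip(q, p))
        sum_tt = sum(t * t for t in t_hat) + n * post_var   # sum of E{t t^T | x}
        # M-step: loading update, Eq. (7), then noise update, Eq. (6)
        p = [sum(x[i] * t for x, t in zip(X, t_hat)) / sum_tt for i in range(m)]
        lam = [sum(x[i] * x[i] - p[i] * t * x[i] for x, t in zip(X, t_hat)) / n
               for i in range(m)]
    return p, lam

def t2_statistic(x, p, lam):
    """T^2 of Eq. (8) for one sample: squared posterior mean of the latent variable."""
    q = posterior_weights(p, lam)
    t_hat = sum(qi * xi for qi, xi in zip(q, x))
    return t_hat * t_hat

# Synthetic one-factor data (hypothetical loadings and noise level)
random.seed(0)
true_p = [0.9, 0.8, 0.7, 0.6]
raw = [[pi * t + random.gauss(0.0, 0.5) for pi in true_p]
       for t in (random.gauss(0.0, 1.0) for _ in range(500))]
means = [sum(col) / len(raw) for col in zip(*raw)]
stds = [math.sqrt(sum((v - mu) ** 2 for v in col) / len(raw))
        for col, mu in zip(zip(*raw), means)]
X = [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in raw]

p, lam = fit_single_factor_fa(X)
CHI2_99_DF1 = 6.635   # 99% chi-square critical value for 1 degree of freedom
t2_normal = t2_statistic(X[0], p, lam)
t2_shifted = t2_statistic([v + 10.0 for v in X[0]], p, lam)   # grossly shifted sample
```

For more than one latent factor, the scalar expressions would be replaced by the corresponding $l \times l$ matrix operations, and the critical value would be taken for $l$ degrees of freedom.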

2.3 Joint Probability for Mode Identification

For multimode processes, monitoring involves not only the construction of monitoring statistics but also the identification of the mode of the current point. To identify the current mode, an FA model is built in each mode with data from normal conditions. The currently observed sample $x_{new} \in \mathbb{R}^{m \times 1}$ is divided into $B$ subspaces, that is, $x_{new} = [x_1, x_2, \ldots, x_B]$, and then normalized with the normal training data from each mode. Thus, the $T_b^2$ statistics under each mode can be calculated. This study identifies the mode from a probabilistic perspective, so each statistic under the $k$th mode, expressed as $T_{k,b}^2$, is converted into a mode probability as

$$p(x_b \in M_k) = e^{-T_{k,b}^2} \tag{9}$$

where $M_k$ represents the $k$th system mode. Following the principle of joint probability, the event $\{x_{new} \in M_k\}$ approximately satisfies $\{x_{new} \in M_k\} = \{x_1 \in M_k\} \cap \{x_2 \in M_k\} \cap \ldots \cap \{x_B \in M_k\}$; that is, the probability of the event can be written as

$$p(x_{new} \in M_k) = p(x_1 \in M_k)\, p(x_2 \in M_k) \cdots p(x_B \in M_k) = e^{-T_{k,1}^2} \times e^{-T_{k,2}^2} \times \cdots \times e^{-T_{k,B}^2} = e^{-(T_{k,1}^2 + T_{k,2}^2 + \cdots + T_{k,B}^2)} \tag{10}$$

As described in Section 2.1, the division only ensures that variables from different subspaces are not linearly correlated; nonlinear correlation may remain. Moreover, rigorous independence among subspaces cannot be satisfied easily in practical industrial processes because sequential process units produce interactions among the subspaces. Therefore, the joint probability calculated by Eq. (10) is only an approximate value, but it is sufficient for mode identification. The function of the joint probability is to judge the mode to which the current point belongs. A high joint probability indicates that the current point matches the corresponding mode. By contrast, a low joint probability (close to zero) shows that the system is not running under that mode. Therefore, the mode of the current point can be determined by comparing the joint probabilities from different modes, and the mode with the highest joint probability is regarded as the current mode.
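
A minimal sketch of this identification rule is given below. The function names and the $T^2$ values for the three modes are hypothetical; in practice the values would come from the FA models of each mode and subspace.

```python
import math

def joint_probability(t2_by_subspace):
    """Eq. (10): product of exp(-T^2) over subspaces, i.e. exp(-sum of T^2)."""
    return math.exp(-sum(t2_by_subspace))

def identify_mode(t2_by_mode):
    """Pick the mode with the highest joint probability.

    t2_by_mode maps a mode label k to the list [T^2_{k,1}, ..., T^2_{k,B}].
    """
    probs = {k: joint_probability(t2) for k, t2 in t2_by_mode.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical statistics of one sample under three candidate modes, B = 2
t2_by_mode = {1: [0.4, 0.7], 2: [35.0, 52.0], 3: [18.0, 9.5]}
mode, probs = identify_mode(t2_by_mode)   # mode 1 has the smallest T^2 sum
```

Because the exponential is monotonic, choosing the highest joint probability is the same as choosing the mode with the smallest sum of $T^2$ statistics.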

2.4 SVDD for Statistics Combination

After the current operating mode is identified, the current data should be checked to see whether a fault exists in the process. For the current sample $x_{new}$, the data have been divided and projected into the subspaces for statistic calculation. To monitor the variation of the process statistics intuitively, SVDD [41, 42] is introduced. The aim of SVDD is to construct a hypersphere with minimum volume that contains all the training data objects so that deviating data can be distinguished when projected into this space. First, the training data $y_u \in \mathbb{R}^m$, $u = 1, 2, \ldots, N$, where $N$ is the sample number, are mapped into a higher-dimensional feature space through a nonlinear function $\Phi: y \to F$, which is usually performed implicitly by a given kernel $K(y_u, y_v)$. To obtain the hypersphere with the minimum volume, SVDD must solve the following optimization problem [43]:

$$\min_{R, a, \xi} \; R^2 + C \sum_{u=1}^{N} \xi_u \quad \text{s.t.} \quad \|\Phi(y_u) - a\|^2 \le R^2 + \xi_u, \;\; \xi_u \ge 0, \;\; u = 1, 2, \ldots, N \tag{11}$$

where $a$ denotes the center of the constructed hypersphere, $R$ represents the distance from the center to the boundary, $C$ is the trade-off between the hypersphere’s volume and the number of errors, and $\xi_u$ is the slack variable. The dual form is given as

$$\max_{\alpha_u} \; \sum_{u=1}^{N} \alpha_u K(y_u, y_u) - \sum_{u=1}^{N} \sum_{v=1}^{N} \alpha_u \alpha_v K(y_u, y_v) \quad \text{s.t.} \quad 0 \le \alpha_u \le C, \;\; \sum_{u=1}^{N} \alpha_u = 1 \tag{12}$$

where $\alpha_u$ is a Lagrange multiplier, and $K(y_u, y_v) = \langle \Phi(y_u), \Phi(y_v) \rangle$ is the kernel function. Solving Eq. (12) yields a set of $\alpha_u$, and the samples $y_u$ with $\alpha_u > 0$ are called support vectors (SVs). The squared radius $R^2$ can be calculated as

$$R^2 = K(y, y) - 2 \sum_{u=1}^{N} \alpha_u K(y_u, y) + \sum_{u=1}^{N} \sum_{v=1}^{N} \alpha_u \alpha_v K(y_u, y_v) \tag{13}$$

where $y_u \in SV$ and $y$ is a support vector on the boundary of the hypersphere. Therefore, for a new sample $z$, the squared distance to the center can be estimated as

$$\|\Phi(z) - a\|^2 = K(z, z) - 2 \sum_{u=1}^{N} \alpha_u K(z, y_u) + \sum_{u=1}^{N} \sum_{v=1}^{N} \alpha_u \alpha_v K(y_u, y_v) \tag{14}$$

For fault detection, a monitoring index is constructed as follows:

$$DR = \|\Phi(z) - a\|^2 / R^2 \tag{15}$$

and the confidence limit is 1. That is, the sample $z$ is deemed normal when the estimated squared distance is $\le R^2$; otherwise, the sample $z$ is considered faulty. In the proposed method, the data $y$ and $z$ are the monitoring statistics generated from each subspace, so that variation in each subspace can be detected in a timely manner.
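
The following sketch illustrates Eqs. (12)-(15) under simplifying assumptions that are not taken from the paper: a Gaussian kernel with a hand-picked width, the hard-margin case $C = 1$ (so the upper bound on $\alpha_u$ is never active), and a plain Frank-Wolfe iteration instead of a quadratic-programming solver. Because the multipliers are only approximate, $R^2$ is taken as the largest squared distance among the training points. The training vectors are hypothetical two-subspace statistics.

```python
import math

def gaussian_kernel(u, v, width=2.0):
    """K(u, v) = exp(-||u - v||^2 / width^2); the width is an assumed setting."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (width ** 2))

def fit_svdd(Y, kernel=gaussian_kernel, n_iter=500):
    """Approximate hard-margin SVDD by Frank-Wolfe on the dual of Eq. (12)."""
    N = len(Y)
    K = [[kernel(Y[u], Y[v]) for v in range(N)] for u in range(N)]
    alpha = [1.0 / N] * N
    for it in range(n_iter):
        # Minimize f(alpha) = alpha^T K alpha - sum_u alpha_u K_uu (negative of Eq. (12))
        K_alpha = [sum(K[u][v] * alpha[v] for v in range(N)) for u in range(N)]
        grad = [2.0 * K_alpha[u] - K[u][u] for u in range(N)]
        s = min(range(N), key=grad.__getitem__)   # simplex vertex with steepest descent
        step = 2.0 / (it + 2.0)
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[s] += step
    const = sum(alpha[u] * alpha[v] * K[u][v] for u in range(N) for v in range(N))

    def sq_dist(z):
        """Squared feature-space distance of z to the center, Eq. (14)."""
        return kernel(z, z) - 2.0 * sum(a * kernel(z, y) for a, y in zip(alpha, Y)) + const

    r2 = max(sq_dist(y) for y in Y)   # radius that encloses all training points
    return alpha, r2, sq_dist

# Hypothetical normal-condition statistic vectors scattered around (2, 2)
Y = [[2.0 + r * math.cos(2 * math.pi * k / 12), 2.0 + r * math.sin(2 * math.pi * k / 12)]
     for r in (0.5, 1.0) for k in range(12)]
alpha, r2, sq_dist = fit_svdd(Y)
dr_normal = sq_dist([2.2, 1.9]) / r2   # DR of Eq. (15) for a sample inside the cloud
dr_fault = sq_dist([6.0, 6.0]) / r2    # DR for a sample far from the cloud
```

A sample is flagged when its DR exceeds 1. A soft-margin version with $C < 1$ would need a solver that also enforces the upper bound on each multiplier.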

3. Process Monitoring

The proposed CJS-FA method employs correlation coefficients for space division, joint probability for mode identification, and SVDD for combining the results. The corresponding schematic diagram is shown in Figure 1, and the detailed implementation procedures are summarized as follows:

Off-line modeling

Step 1: Training samples are collected from M operating modes and are standardized to zero mean and unit variance.

Step 2: Subspaces are generated according to the correlation coefficients among variables.

Step 3: FA models are established in each subspace under each mode.

Step 4: The monitoring statistics $y = [t_{k,1}^2, t_{k,2}^2, \ldots, t_{k,B}^2]$ are calculated with testing data under all modes.

Online monitoring

Step 1: The current sample from the operating process is divided into subspaces as previously obtained.

Step 2: The joint probability of each mode is estimated, and the mode of the current sample is determined.

Step 3: The monitoring statistics $z = [T_{k,1}^2, T_{k,2}^2, \ldots, T_{k,B}^2]$ of all subspaces are calculated under the identified mode.

Step 4: The SVDD model is built with the selected statistics $y = [t_{k,1}^2, t_{k,2}^2, \ldots, t_{k,B}^2]$ under the identified mode, using Eqs. (11-13).

Step 5: The monitoring statistics $z$ are projected onto the feature space, and the statistic DR is calculated through Eqs. (14-15).

Step 6: The statistic DR, which measures the distance to the hypersphere’s center, is monitored. A higher value ($DR > 1$) indicates abnormal operation, whereas a lower value ($DR \le 1$) means that the system is running under normal conditions.
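
The online steps above can be sketched as a single function. Everything concrete in this example is a hypothetical stand-in: each mode model is assumed to store, per subspace, the training mean, the training standard deviation, and the row vector $Q$ of a single-factor FA model, and the DR function is a simple placeholder rather than a fitted SVDD model.

```python
def t2_vector(x_new, mode_model, subspaces):
    """Steps 1 and 3: split the sample, normalize per mode, one T^2 per subspace."""
    z = []
    for b, idx in enumerate(subspaces):
        mean, std, q = mode_model[b]
        xb = [(x_new[i] - mu) / sd for i, mu, sd in zip(idx, mean, std)]
        t_hat = sum(qi * xi for qi, xi in zip(q, xb))
        z.append(t_hat * t_hat)
    return z

def monitor_sample(x_new, models, subspaces, dr_functions):
    """Steps 2, 5 and 6: identify the mode, compute DR, compare with the limit of 1."""
    stats = {k: t2_vector(x_new, model, subspaces) for k, model in models.items()}
    # Highest joint probability exp(-sum T^2) is the smallest T^2 sum
    mode = min(stats, key=lambda k: sum(stats[k]))
    dr = dr_functions[mode](stats[mode])
    return mode, dr, dr > 1.0

# Hypothetical two-mode, two-subspace setup with three variables
subspaces = [[0, 1], [2]]
models = {
    1: [([0.0, 0.0], [1.0, 1.0], [0.5, 0.5]), ([0.0], [1.0], [0.8])],
    2: [([10.0, 10.0], [1.0, 1.0], [0.5, 0.5]), ([10.0], [1.0], [0.8])],
}
# Placeholder for the SVDD index of Eqs. (14)-(15); a real model would be fitted off-line
dr_functions = {k: (lambda z: sum(z) / 9.0) for k in models}

mode_ok, dr_ok, fault_ok = monitor_sample([10.2, 9.9, 10.1], models, subspaces, dr_functions)
mode_bad, dr_bad, fault_bad = monitor_sample([10.2, 9.9, 16.0], models, subspaces, dr_functions)
```

Both samples are assigned to mode 2, and only the second one, whose third variable is shifted, exceeds the limit.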

4. Applications

To demonstrate the effectiveness of the proposed CJS-FA method, applications to three cases are discussed in this section. The monitoring performance and comparisons with other methods are analyzed as well.

4.1 Numerical Example

A simple numerical system with eight variables in two subspaces is constructed to verify the function of the proposed CJS-FA method. The multivariate system is simulated as follows:

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{bmatrix} = \begin{bmatrix} 2.4527 & 0.5725 & 0 & 0 & 0 & 0 \\ 4.2696 & 0.1566 & 0 & 0 & 0 & 0 \\ 3.3541 & 0.4218 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1.8532 & 0.7421 & 0 & 0 \\ 0 & 0 & 4.5747 & 0.4781 & 0 & 0 \\ 0 & 0 & 2.1598 & 0.5471 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0.1 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_2^3 \\ s_4^3 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \\ e_8 \end{bmatrix} \tag{16}$$

where $[s_1\ s_2\ s_3\ s_4]^T$ represents the signal sources and $[e_1\ e_2\ \ldots\ e_8]^T$ denotes white Gaussian noise with 2 dBW. Three different sets of signal sources are used to generate different operating modes:

Mode 1: $s_1 \sim N(8.5, 1.2)$, $s_2 \sim N(7, 0.8)$, $s_3 \sim N(4.5, 1)$, and $s_4 \sim N(4, 0.8)$.

Mode 2: $s_1 \sim N(1.2, 3)$, $s_2 \sim N(2.5, 1)$, $s_3 \sim N(11, 1.2)$, and $s_4 \sim N(12, 0.8)$.

Mode 3: $s_1 \sim N(12, 1.2)$, $s_2 \sim N(13, 1)$, $s_3 \sim N(10, 1.2)$, and $s_4 \sim N(10, 0.8)$.
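
A data generator for this system might look like the sketch below. Several points are assumptions because the text does not settle them: the last two source terms are read as $s_2^3$ and $s_4^3$, the second argument of $N(\cdot, \cdot)$ is treated as a variance, and "2 dBW" is interpreted as a noise power of $10^{0.2}$ used as the noise variance. The seed and function name are illustrative.

```python
import math
import random

# Mixing matrix of Eq. (16); columns correspond to s1, s2, s3, s4, s2^3, s4^3
MIXING = [
    [2.4527, 0.5725, 0.0, 0.0, 0.0, 0.0],
    [4.2696, 0.1566, 0.0, 0.0, 0.0, 0.0],
    [3.3541, 0.4218, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.8532, 0.7421, 0.0, 0.0],
    [0.0, 0.0, 4.5747, 0.4781, 0.0, 0.0],
    [0.0, 0.0, 2.1598, 0.5471, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
]
# (mean, variance) of s1..s4 per mode; reading the second value as a variance is an assumption
MODES = {
    1: [(8.5, 1.2), (7.0, 0.8), (4.5, 1.0), (4.0, 0.8)],
    2: [(1.2, 3.0), (2.5, 1.0), (11.0, 1.2), (12.0, 0.8)],
    3: [(12.0, 1.2), (13.0, 1.0), (10.0, 1.2), (10.0, 0.8)],
}
NOISE_POWER = 10 ** (2 / 10)   # assumed conversion of 2 dBW to a noise variance

def simulate(mode, n_samples, seed=0):
    """Generate n_samples observations [x1..x8] of the given mode."""
    rng = random.Random(seed)
    noise_std = math.sqrt(NOISE_POWER)
    data = []
    for _ in range(n_samples):
        s = [rng.gauss(mu, math.sqrt(var)) for mu, var in MODES[mode]]
        sources = s + [s[1] ** 3, s[3] ** 3]
        data.append([sum(a * v for a, v in zip(row, sources)) + rng.gauss(0.0, noise_std)
                     for row in MIXING])
    return data

train = simulate(mode=1, n_samples=400)
mean_x1 = sum(row[0] for row in train) / len(train)
```

Under these assumptions, the sample mean of $x_1$ in mode 1 should lie close to $2.4527 \times 8.5 + 0.5725 \times 7 \approx 24.86$.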

Accordingly, 400 training samples and another 400 testing samples are collected under each of the three modes. The training data are used for off-line modeling, whereas the testing data are employed to characterize the normal condition of the process. The space is then divided based on the absolute correlation coefficients between each pair of variables. Since the numerical system is stable and the relations among the variables do not change, the division results are the same under each mode. The division result under mode 1 is shown in Fig. 2. With the threshold value $\theta_1 = 0.25$, the correlation coefficients among the first three variables and the seventh variable are found to be above the threshold. Meanwhile, the remaining four variables are correlated with one another. Thus, two subspaces are generated under mode 1. The same division results are obtained in the other two modes, so the two subspaces are denoted as $[x_1\ x_2\ x_3\ x_7]^T$ and $[x_4\ x_5\ x_6\ x_8]^T$. After subspace division, the testing data are applied to each subspace in each mode to generate statistics under normal conditions.

Three cases are programmed for monitoring, as shown in the following:

Case 1: The system initially runs in mode 1 from the 1st point to the 100th point and then successively shifts to mode 3, mode 2, and mode 1 every 100 points.

Case 2: The system initially runs in mode 1, where a ramp change of $0.1 \times (z - 100)$ is introduced to $s_2$ from the 101st point. Thereafter, the system switches to mode 2 from the 201st point to the 400th point, with a step change of 1 added to $s_3$ from the 301st point.

Case 3: The system initially runs in mode 1 and switches to mode 3 from the 101st point. A step change of 0.8 is added to $s_3$ from the 201st point to the end.
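
As an illustration of how such a schedule can be encoded, the sketch below returns the mode and the fault offsets for each sample index of case 2. It assumes that the ramp on $s_2$ applies only while the system is in mode 1 (points 101 to 200), which the case description does not state explicitly; the function name is hypothetical.

```python
def case2_schedule(z):
    """Return (mode, offset_s2, offset_s3) for the 1-based sample index z of case 2."""
    if z <= 200:
        # Mode 1: ramp 0.1 * (z - 100) on s2 from the 101st point (assumed to end at point 200)
        offset_s2 = 0.1 * (z - 100) if z >= 101 else 0.0
        return 1, offset_s2, 0.0
    # Mode 2: step of 1 on s3 from the 301st point
    offset_s3 = 1.0 if z >= 301 else 0.0
    return 2, 0.0, offset_s3

schedule = [case2_schedule(z) for z in range(1, 401)]
```

The offsets would be added to the corresponding source signals before they are mixed into the observed variables.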

The monitoring results of the above three cases are displayed in Figs. 3-5, where the joint probability, the identified process mode, and the statistic DR are given for each case. Given that the $T^2$ statistic from the matched mode is small and remains below the confidence limit, the mode generating a comparatively large joint probability is considered the mode in which the process is currently running. By contrast, a mode with a small joint probability is not the currently operating mode. However, when the $T^2$ statistics deviate excessively from their original region, the joint probability becomes extremely small and is difficult to represent. For this situation, a tuning parameter $\eta$ is introduced into the probability formula as $p(x_{new} \in M_k) = e^{-(T_{k,1}^2 + T_{k,2}^2 + \cdots + T_{k,B}^2)/\eta}$, where $\eta$ does not affect the ranking of the modes but only provides a clearer visual representation of the identification results. In the current simulation, $\eta$ is set to 30. Fig. 3 (a) shows that the process successively operates in mode 1, mode 3, mode 2, and mode 1, which is consistent with the real situation of case 1. The corresponding mode number of each sample is shown in Fig. 3 (b) for a better explanation. The final monitoring result of the proposed CJS-FA method is shown in Fig. 3 (c). For comparison, PCA, PPCA, and FA are applied to the data whose modes have been identified, with the results also shown in Fig. 3 (c). The four monitoring charts in Fig. 3 (c) indicate that the process is operating under normal conditions, regardless of the model used.
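
The effect of the tuning parameter can be seen in a short numerical sketch. The $T^2$ values are hypothetical, and the parameter is called `eta` here only for illustration. With very large statistics the unscaled probability underflows to zero in double precision, so mismatched modes can no longer be told apart, whereas the scaled version keeps the same ordering with representable values.

```python
import math

def scaled_joint_probability(t2_by_subspace, eta=1.0):
    """Joint probability with a tuning parameter: exp(-(sum of T^2) / eta)."""
    return math.exp(-sum(t2_by_subspace) / eta)

# Hypothetical statistics: modes 1 and 2 are badly mismatched, mode 3 matches
t2_by_mode = {1: [420.0, 390.0], 2: [510.0, 480.0], 3: [3.0, 2.5]}
raw = {k: scaled_joint_probability(v) for k, v in t2_by_mode.items()}
scaled = {k: scaled_joint_probability(v, eta=30.0) for k, v in t2_by_mode.items()}
# raw[1] and raw[2] both underflow to 0.0; scaled keeps mode 3 > mode 1 > mode 2
```

Dividing the exponent by a positive constant is a monotonic transformation, which is why the ranking of the modes is unchanged.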

The monitoring results of case 2 are presented in Fig. 4. Fig. 4 (a) indicates that the process operates in mode 1 for the first 200 points and then shifts to mode 2 from the 201st point, remaining in this mode until the end. The mode number of each point is displayed in Fig. 4 (b). After mode identification, CJS-FA is used to detect the faults in the process, with the results shown in Fig. 4 (c). CJS-FA detects the ramp change and the step change accurately and quickly. For comparison, PCA, PPCA, and FA are used to monitor the mode-identified data, with the monitoring results also shown in Fig. 4 (c). All three methods can detect some fault samples, but a large part of the statistics of the fault samples remain below the control limit, so CJS-FA achieves better monitoring performance. Fig. 4 (d) shows the monitoring charts of each subspace for case 2. The first subspace detects the first fault successfully, while the second subspace detects the second fault successfully. This is because the first fault is caused by $s_2$, which is related to $[x_1\ x_2\ x_3\ x_7]^T$ (block 1), so the first subspace detects the occurrence of the fault well. The variables in the second subspace are not affected, so the fault cannot be detected by subspace 2. Similarly, for the second fault, the second subspace containing the affected variables ($[x_4\ x_5\ x_6\ x_8]^T$) contributes to the detection of the fault.

In case 3, the corresponding joint probability is displayed in Fig. 5 (a), which indicates the two modes in which the process runs. The mode number of each point can be found in Fig. 5 (b), and the corresponding monitoring results of PCA, PPCA, FA, and the proposed CJS-FA method are shown in Fig. 5 (c). Notably, CJS-FA has superior monitoring performance, with its monitoring statistic exceeding the confidence limit after sample 200. By contrast, PCA, PPCA, and FA cannot detect the fault reliably: their monitoring statistics fluctuate around the confidence limit after the fault occurs. This may be because the large number of unaffected variables ($[x_1\ x_2\ x_3\ x_7]^T$) inhibits detection of this fault. Fig. 5 (d) shows that the second subspace, which contains the affected variables ($[x_4\ x_5\ x_6\ x_8]^T$), promotes detection of the fault.

ED

4.2 Continuous Stirred-Tank Reactor (CSTR)

The second simulation system adopts the non-isothermal continuous stirred-tank reactor used by Yoon and MacGregor [44]. The corresponding schematic graph is provided in Fig. 6. The nine monitored variables and the three operating modes used in the case study are listed in Tables 1 and 2 [44, 45]. The process is sampled every minute, and 500 samples are collected under normal conditions for each mode as training data. For space division, the correlation coefficients among the variables are calculated. After the threshold is set to 0.15, the nine variables are divided into two subspaces based on the segmentation rules. The first subspace contains the first four variables and variables 7 and 8, whereas the second subspace contains variables 5, 6, and 9. For testing purposes, another 500 samples are obtained to examine the normal conditions and to prepare for further model construction. The three study cases are set as follows:

Case 1: The system initially operates in mode 1, switches to mode 2 at the 401st point, and finally stays in mode 3 from the 701st point to the end.

Case 2: The system runs in mode 2, with a step of 1 K added to the cooling water temperature Tc from the 501st point to the 1000th point.

Case 3: The system runs in mode 2, with a step of 2 kmol/(m^3 min) introduced into the inlet solute concentration CAA from the 501st point to the end.
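The correlation-based space division mentioned above for the CSTR can be illustrated with a small sketch. The rule used here is an assumption, not necessarily the segmentation rule of the proposed method: two variables are linked when the absolute value of their Pearson correlation coefficient exceeds the threshold (0.15 in the CSTR study), and each connected group of linked variables forms one subspace. The function names `pearson` and `divide_subspaces` are hypothetical, and every column is assumed to be non-constant.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def divide_subspaces(columns, threshold=0.15):
    """Group variable indices into subspaces (assumed rule, see lead-in).

    columns[i] holds the training samples of variable i. Variables i and j
    are linked when |r_ij| > threshold; connected groups become subspaces.
    """
    parent = list(range(len(columns)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(columns)), 2):
        if abs(pearson(columns[i], columns[j])) > threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Under this assumed rule, variables in different subspaces have pairwise correlations no larger than the threshold, which is in the spirit of keeping the subspaces approximately independent of one another.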

Each case generates 1000 observations for simulation. The monitoring results are shown in Figs. 7-9. For comparison, the FA model is also constructed with the data from the identified mode. In case 1, the joint probability of the monitoring statistics from each mode and each subspace is displayed in Fig. 7 (a); the calculated results indicate that the process successively operates in mode 1, mode 2, and mode 3. This mode identification result agrees with the actual operation. The monitoring statistics of CJS-FA and traditional FA are presented in Figs. 7 (b) and (c), respectively, and show the normal condition of the system.
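The mode identification step used in these cases can be sketched as follows. The sketch assumes that, for one sample, a probability is already available for every mode and every subspace (how these probabilities are obtained from the monitoring statistics is defined by the method itself and is not reproduced here), and that subspaces are treated as independent, so the joint probability of a mode is the product over subspaces. Because the joint probabilities can become extremely small, the product is evaluated as a sum of logarithms. The function name `identify_mode` and the floor value are illustrative assumptions.

```python
from math import log


def identify_mode(probs, floor=1e-300):
    """Pick the mode with the largest joint probability for one sample.

    probs[m][b] is the (assumed given) probability that the sample belongs
    to mode m + 1 according to subspace b. Returns the 1-based mode number
    and the list of log joint probabilities, one entry per mode.
    """
    # Product over subspaces, computed in log space to avoid underflow;
    # the floor keeps log() defined when a probability is exactly zero.
    log_joint = [sum(log(max(p, floor)) for p in row) for row in probs]
    best = max(range(len(log_joint)), key=log_joint.__getitem__)
    return best + 1, log_joint
```

For example, `identify_mode([[0.9, 0.8], [0.1, 0.2], [1e-5, 0.3]])` selects mode 1, because both subspaces assign that mode the largest probability.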

The second case is a faulty condition, with a step added to the cooling water temperature Tc. Because of the control loop of the system, a deviation is generated in the outlet temperature T, leading to an increase in the cooling water flow rate FC. Fig. 8 (a) indicates that the process is in mode 2, and Fig. 8 (b) shows the monitoring result of CJS-FA. The monitoring statistics start to deviate at sample 500, where the fault occurs. The FA model is constructed for comparison, using data from the identified mode. Its inferior monitoring result in Fig. 8 (c) demonstrates the advantage of the multi-subspace CJS-FA method.

Case 3 is another faulty condition, caused by a bias in the inlet solute concentration CAA. As a result, the outlet concentration C and the outlet temperature T fluctuate in this control loop. Based on the joint probability in Fig. 9 (a), the process is found in mode 1 during the 1000 points. Both the proposed CJS-FA and the traditional FA detect the fault after a certain period, as presented in Figs. 9 (b)-(c). The monitoring results appear similar, but the CJS-FA statistics exceed the limit by a larger amplitude.

4.3 Tennessee Eastman (TE) Benchmark Process

As a classic simulation system, the TE process developed by Downs and Vogel [46] has been widely used to evaluate and compare monitoring methods. The detailed schematic design is shown in Fig. 10. By changing the G/H mass ratio, the TE process can simulate six different modes, as presented in Table 3; the first three modes are selected for simulation. Additional details about this system can be found in [47], and the simulation code can be downloaded from http://depts.washington.edu/control/LARRY/TE/download.html.

In this study, 9 manipulated variables and 22 continuous measurements, listed in Table 4, are selected for simulation. A total of 500 samples from each mode are generated at an interval of 0.05 h and are used as training data for model construction. First, the correlation coefficients among the 31 variables are estimated, and subspaces are produced based on the division rule. Table 5 shows the subspace division result, which is also consistent with practical experience. For example, variable 1 (A feed), variable 4 (Total feed), variable 25 (A feed flow valve), and variable 26 (Total feed flow valve), which are related to one another, are grouped into the same subspace (subspace 2 in Table 5). More such relations can be found by comparing the division results with the TE process. Thereafter, the 20 programmed faults listed in Table 6 are introduced to the TE process under each mode. In addition, Fault 0 (no fault) is simulated to generate the monitoring statistics under normal conditions for SVDD construction. To verify the advantage of the multi-block strategy, the monitoring results of each subspace in mode 1 are tabulated in Table 7. The monitoring results of both the traditional FA and the proposed CJS-FA are also presented in this table for comparison. For further explanation, three cases are designed to test the proposed method:

Case 1: The system operates normally in mode 1 from the 1st to the 500th point, in mode 2 from the 501st to the 1000th point, in mode 1 from the 1001st to the 1500th point, and in mode 3 from the 1501st to the 2000th point.

Case 2: The system operates normally in mode 1, with fault 4 introduced at the 401st point. Thereafter, the system switches to mode 2 under normal conditions at the 701st point, with fault 4 introduced at the 1101st point. At the 1401st point, the system shifts to mode 3 under normal conditions, with fault 4 introduced from the 1801st to the 2100th point.

Case 3: The system is similar to Case 2, but fault 4 is changed to fault 10.
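The timeline of Case 2 (and of Case 3, which uses the same schedule with a different fault) can be written down explicitly. The following sketch only encodes the sample ranges stated above; the function name `te_case2_schedule` is hypothetical, and the sketch does not run the TE simulator itself.

```python
def te_case2_schedule():
    """Return (mode, faulty) lists for samples 1..2100; index 0 is sample 1.

    Each mode lasts 700 samples: the first 400 are normal and the last 300
    are faulty (fault introduced at samples 401, 1101 and 1801).
    """
    mode, faulty = [], []
    for k in range(1, 2101):
        if k <= 700:
            m, fault_start = 1, 401
        elif k <= 1400:
            m, fault_start = 2, 1101
        else:
            m, fault_start = 3, 1801
        mode.append(m)
        # Within a mode, the fault stays active until the mode ends.
        faulty.append(k >= fault_start)
    return mode, faulty
```

Labels of this kind are convenient when checking a monitoring chart against the known fault windows, for example to verify that the statistics exceed the limit only inside the three 300-sample fault periods.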

The monitoring results for Case 1 are shown in Fig. 11. The joint probability in Fig. 11 (a) indicates that the system passes through four operating periods (mode 1, mode 2, mode 1, and mode 3), as shown in Fig. 11 (b). This mode identification result is consistent with the actual situation. Fig. 11 (c) presents the final monitoring result of the proposed CJS-FA, whereas Fig. 11 (d) shows the monitoring statistics in each subspace. All subspaces and the final result indicate normal operation of the system.

For Case 2 (fault 4), the corresponding joint probability is given in Fig. 12 (a), which shows the correct operating mode for each point. The proposed detection method is then applied, with monitoring results shown in Fig. 12 (b). The monitoring statistics initially stay below the confidence limit until sample 400, where the fault occurs. The fault lasts for 300 points and disappears at the 700th point. During the next 700 points, the monitoring statistics stay below the confidence limit of 1 for the first 400 points and then exceed it from the 1100th sample. For the last 700 points, the trend of the monitoring statistics is nearly the same as that from the 700th point to the 1400th point. For comparison, the traditional FA model is constructed with data from the identified mode, as presented in Fig. 12 (c). Without the multi-subspace strategy, its monitoring statistics do not perform as well as those of CJS-FA, because significant information may be buried under a large amount of irrelevant data. For further explanation, the monitoring statistics in each subspace are shown in Fig. 12 (d), where the first subspace shows effective fault detection performance. This confirms the benefit of the multi-subspace strategy.
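A chart such as Fig. 12 (b) can be summarized numerically by comparing the monitoring statistic with the confidence limit sample by sample. The sketch below is a generic illustration rather than the evaluation code of this study: it assumes a statistic that signals a fault when it exceeds a limit of 1, as described above for CJS-FA, together with known fault labels. The function name `detection_rates` is hypothetical.

```python
def detection_rates(stat, faulty, limit=1.0):
    """Return (fault detection rate, false alarm rate) for one chart.

    stat[k] is the monitoring statistic of sample k, faulty[k] is True when
    sample k is known to be faulty, and an alarm is raised when stat > limit.
    """
    alarms = [s > limit for s in stat]
    n_fault = sum(faulty)
    n_normal = len(faulty) - n_fault
    detected = sum(a and f for a, f in zip(alarms, faulty))
    false_alarms = sum(a and not f for a, f in zip(alarms, faulty))
    # Rates are undefined (NaN) when there are no samples of that kind.
    fdr = detected / n_fault if n_fault else float("nan")
    far = false_alarms / n_normal if n_normal else float("nan")
    return fdr, far
```

The missed detection rate is one minus the fault detection rate, so either quantity can be used to compare methods on the same fault.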

The third case considers fault 10, which is a random change in C feed temperature. Fig. 13 (a) shows that the system runs in mode 1 from sample 1 to 700, in mode 2 from sample 701 to 1400, and in mode 3 from sample 1401 to 2100. Fault 10 is introduced under each mode, and the monitoring statistics of CJS-FA detect it, as shown in Fig. 13 (b). The monitoring result of FA is shown in Fig. 13 (c): its monitoring statistics exceed the confidence limit after the fault occurs, but the detection performance is no better than that of the proposed method. In addition, a basic PCA model is constructed to demonstrate the advantage of the general FA model. The monitoring result of PCA, shown in Fig. 13 (d), indicates that its monitoring statistics only fluctuate around the confidence limit when the fault occurs. The monitoring statistics generated in each subspace are shown in Fig. 13 (e), wherein only subspace 3 detects the fault. Dividing the variables into subspaces reduces the dimensionality handled by each model and simplifies the process data. Through the construction of SVDD, the variance in each subspace can be distinguished and presented more intuitively, as shown in Fig. 13 (b).

The three designed cases have been tested, and the corresponding monitoring results show that the proposed method can correctly identify the running mode of the system and provide effective monitoring results. To illustrate the performance of CJS-FA comprehensively, it is compared with other methods. The first two are the traditional PCA and FA. Jiang and Yan proposed sensitive FA (SFA), which concentrates the most useful information of the FA model into one subspace [48]. Later, they addressed the problem of useful information being lost and developed weighted FA (WFA) [49]. For a fair comparison, PCA, FA, SFA, WFA, and CJS-FA are tested, with monitoring results listed in Table 8. The proposed CJS-FA performs well and outperforms the other methods. In brief, the proposed CJS-FA is shown to be a realistic and effective method for monitoring multimode processes with large amounts of complicated data.

Conclusion

This study has proposed the CJS-FA method to monitor plant-wide processes with multiple operation modes. Given that prior knowledge is not always valid, the original space is segmented based on the correlation coefficients among variables. The principle of space division emphasizes maintaining independence between variables from different subspaces. Thus, the joint probability can be used to identify the mode in which the system is operating. For fault detection, the SVDD model is constructed with monitoring statistics from all subspaces in the identified mode, and it indicates the fault intuitively and clearly. The performance of the proposed CJS-FA is demonstrated by a numerical simulation, the CSTR system, and the TE benchmark process. The simulation results and the comparison with other methods indicate the superiority and effectiveness of the proposed method. Extensions of the current study are still needed. Although the proposed method shows some monitoring capability on non-linear and non-Gaussian processes, further studies that explicitly consider such processes should be conducted to improve monitoring performance.

Acknowledgments

The authors gratefully acknowledge the support of the following foundations: 973 project of China (2013CB733605), National Natural Science Foundation of China (21176073) and the Fundamental Research Funds from the China Scholarship Council.

References

[1] J. Wang, Q.P. He, Multivariate statistical process monitoring based on statistics pattern analysis, Industrial & Engineering Chemistry Research, 49 (2010) 7858-7869.
[2] S. Yin, S.X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Transactions on Industrial Electronics, 61 (2014) 6418-6428.
[3] R.F. Sales, R. Vitale, S.M. de Lima, M.F. Pimentel, L. Stragevitch, A. Ferrer, Multivariate statistical process control charts for batch monitoring of transesterification reactions for biodiesel production based on near-infrared spectroscopy, Computers & Chemical Engineering, 94 (2016) 343-353.
[4] Z. Yan, C.-Y. Chen, Y. Yao, C.-C. Huang, Robust multivariate statistical process monitoring via stable principal component pursuit, Industrial & Engineering Chemistry Research, 55 (2016) 4011-4021.
[5] R. Wehrens, Principal component analysis, in: Chemometrics with R, Springer, 2011, pp. 43-66.
[6] H. Abdi, L.J. Williams, Principal component analysis, Wiley Interdisciplinary Reviews: Computational Statistics, 2 (2010) 433-459.
[7] C.-Y. Cheng, C.-C. Hsu, M.-C. Chen, Adaptive kernel principal component analysis (KPCA) for monitoring small disturbances of nonlinear processes, Industrial & Engineering Chemistry Research, 49 (2010) 2254-2262.
[8] O. Taouali, I. Jaffel, H. Lahdhiri, M.F. Harkat, H. Messaoud, New fault detection method based on reduced kernel principal component analysis (RKPCA), The International Journal of Advanced Manufacturing Technology, 85 (2016) 1547-1552.
[9] N. Lu, Y. Yao, F. Gao, F. Wang, Two-dimensional dynamic PCA for batch process monitoring, AIChE Journal, 51 (2005) 3300-3304.
[10] L. Dobos, J. Abonyi, On-line detection of homogeneous operation ranges by dynamic principal component analysis based time-series segmentation, Chemical Engineering Science, 75 (2012) 96-105.
[11] X. Liu, K. Li, M. McAfee, G.W. Irwin, Improved nonlinear PCA for process monitoring using support vector data description, Journal of Process Control, 21 (2011) 1306-1317.
[12] Y. Mori, M. Kuroda, N. Makino, Nonlinear Principal Component Analysis, in: Nonlinear Principal Component Analysis and Its Applications, Springer, 2016, pp. 7-20.
[13] G. Zwanenburg, H.C. Hoefsloot, J.A. Westerhuis, J.J. Jansen, A.K. Smilde, ANOVA–principal component analysis and ANOVA–simultaneous component analysis: A comparison, Journal of Chemometrics, 25 (2011) 561-567.
[14] T.J. Rato, J. Blue, J. Pinaton, M.S. Reis, Translation-Invariant Multiscale Energy-Based PCA for Monitoring Batch Processes in Semiconductor Manufacturing, IEEE Transactions on Automation Science and Engineering, (2016).
[15] Q. Jiang, X. Yan, B. Huang, Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference, IEEE Transactions on Industrial Electronics, 63 (2016) 377-386.
[16] M.E. Tipping, C.M. Bishop, Probabilistic principal component analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61 (1999) 611-622.
[17] A.T. Basilevsky, Statistical factor analysis and related methods: theory and applications, John Wiley & Sons, 2009.
[18] R.P. McDonald, Factor analysis and related methods, Psychology Press, 2014.
[19] S. Yin, S.X. Ding, D. Zhou, Diagnosis and prognosis for complicated industrial systems - Part I, IEEE Transactions on Industrial Electronics, 63 (2016) 2501-2505.
[20] Y. Zhang, C. Ma, Decentralized fault diagnosis using multiblock kernel independent component analysis, Chemical Engineering Research and Design, 90 (2012) 667-676.
[21] B. Song, H. Shi, Y. Ma, J. Wang, Multisubspace Principal Component Analysis with Local Outlier Factor for Multimode Process Monitoring, Industrial & Engineering Chemistry Research, 53 (2014) 16453-16464.
[22] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, Journal of Chemometrics, 12 (1998) 301-321.
[23] S.J. Qin, S. Valle, M.J. Piovoso, On unifying multiblock analysis with application to decentralized process monitoring, Journal of Chemometrics, 15 (2001) 715-742.
[24] C. Tong, Y. Song, X. Yan, Distributed statistical process monitoring based on four-subspace construction and Bayesian inference, Industrial & Engineering Chemistry Research, 52 (2013) 9897-9907.
[25] Z. Ge, Z. Song, Distributed PCA Model for Plant-Wide Process Monitoring, Industrial & Engineering Chemistry Research, 52 (2013) 1947-1957.
[26] B. Wang, X. Yan, Q. Jiang, Z. Lv, Generalized Dice's coefficient-based multi-block principal component analysis with Bayesian inference for plant-wide process monitoring, Journal of Chemometrics, 29 (2015) 165-178.
[27] Z. Ge, Z. Song, P. Wang, Probabilistic combination of local independent component regression model for multimode quality prediction in chemical processes, Chemical Engineering Research and Design, 92 (2014) 509-521.
[28] X. Peng, Y. Tang, W. Du, F. Qian, Multimode Process Monitoring and Fault Detection: A Sparse Modeling and Dictionary Learning Method, IEEE Transactions on Industrial Electronics, (2017).
[29] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Computers & Chemical Engineering, 22 (1998) 503-514.
[30] W. Li, H.H. Yue, S. Valle-Cervantes, S.J. Qin, Recursive PCA for adaptive process monitoring, Journal of Process Control, 10 (2000) 471-486.
[31] D.-H. Hwang, C. Han, Real-time monitoring for a process with multiple operating modes, Control Engineering Practice, 7 (1999) 891-902.
[32] S.J. Zhao, J. Zhang, Y.M. Xu, Monitoring of processes with multiple operating modes through multiple principle component analysis models, Industrial & Engineering Chemistry Research, 43 (2004) 7025-7035.
[33] M.L. Maestri, M.C. Cassanello, G.I. Horowitz, Kernel PCA performance in processes with multiple operation modes, Chemical Product and Process Modeling, 4 (2009).
[34] S.W. Choi, J.H. Park, I.-B. Lee, Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis, Computers & Chemical Engineering, 28 (2004) 1377-1387.
[35] J. Yu, Fault detection using principal components-based Gaussian mixture model for semiconductor manufacturing processes, IEEE Transactions on Semiconductor Manufacturing, 24 (2011) 432-444.
[36] Z. Ge, Z. Song, Multimode process monitoring based on Bayesian method, Journal of Chemometrics, 23 (2009) 636-650.
[37] Z. Ge, Z. Song, Bayesian inference and joint probability analysis for batch process monitoring, AIChE Journal, 59 (2013) 3702-3713.
[38] Q.C. Jiang, X.F. Yan, Monitoring multi-mode plant-wide processes by using mutual information-based multi-block PCA, joint probability, and Bayesian inference, Chemometrics and Intelligent Laboratory Systems, 136 (2014) 121-137.
[39] K. Pearson, Note on regression and inheritance in the case of two parents, Proceedings of the Royal Society of London, (1895) 240-242.
[40] Z. Ge, Z. Song, Multivariate statistical process control: process monitoring methods and applications, Springer Science & Business Media, 2012.
[41] D.M. Tax, R.P. Duin, Support vector data description, Machine Learning, 54 (2004) 45-66.
[42] Q. Jiang, X. Yan, Just-in-time reorganized PCA integrated with SVDD for chemical process monitoring, AIChE Journal, 60 (2014) 949-965.
[43] D.M. Tax, R.P. Duin, Support vector domain description, Pattern Recognition Letters, 20 (1999) 1191-1199.
[44] S. Yoon, J.F. MacGregor, Fault diagnosis with multivariate statistical models part I: using steady state fault signatures, Journal of Process Control, 11 (2001) 387-400.
[45] Y. Ma, H. Shi, M. Wang, Adaptive local outlier probability for dynamic process monitoring, Chinese Journal of Chemical Engineering, 22 (2014) 820-827.
[46] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers & Chemical Engineering, 17 (1993) 245-255.
[47] L.H. Chiang, R.D. Braatz, E.L. Russell, Fault detection and diagnosis in industrial systems, Springer Science & Business Media, 2001.
[48] Q. Jiang, X. Yan, Multivariate statistical process monitoring using modified factor analysis and its application, Journal of Chemical Engineering of Japan, 45 (2012) 829-839.
[49] Q. Jiang, X. Yan, Probabilistic monitoring of chemical processes using adaptively weighted factor analysis and its application, Chemical Engineering Research and Design, 92 (2014) 127-138.

Lists of figures and tables

Fig. 1. Schematic Diagram of CJS-FA for Process Monitoring
Fig. 2. Correlation Coefficients among Variables in Mode 1
Fig. 3. Monitoring results of Case 1: (a) Joint Probabilities; (b) Identified Mode; (c) CJS-FA Monitoring Statistics; (d) FA Monitoring Statistics; (e) PCA Monitoring Statistics
Fig. 4. Monitoring results of Case 2: (a) Joint Probabilities; (b) Identified Mode; (c) CJS-FA Monitoring Statistics; (d) FA Monitoring Statistics; (e) PCA Monitoring Statistics
Fig. 5. Monitoring results of Case 3: (a) Joint Probabilities; (b) Identified Mode; (c) CJS-FA Monitoring Statistics; (d) FA Monitoring Statistics; (e) PCA Monitoring Statistics
Fig. 6. Schematic graph of the CSTR process
Fig. 7. Monitoring Results of Case 1 in CSTR
Fig. 8. Monitoring Results of Case 2 in CSTR
Fig. 9. Monitoring Results of Case 3 in CSTR
Fig. 10. Control system of the Tennessee Eastman process
Fig. 11. Monitoring Results of Case 1 in the TE process: (a) Joint Probability; (b) Mode Identification Results; (c) Monitoring Statistic DR; (d) Monitoring Results in each Subspace
Fig. 12. Monitoring results of Case 2 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results in each Subspace
Fig. 13. Monitoring results of Case 3 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results of PCA under Identified Mode; (e) Monitoring Results in each Subspace

Table 1 Monitored process Variables in the CSTR system
Table 2 Three operating modes of the CSTR system
Table 3 Six different operating modes of the TE process
Table 4 Process monitoring variables in the TE process
Table 5 Division Result of the TE process
Table 6 Process faults programmed in the TE process
Table 7 Monitoring Results of FA and CJS-FA in Mode 1
Table 8 The monitoring results of PCA, FA, SFA, WFA and CJS-FA

Fig. 1. Schematic Diagram of CJS-FA for Process Monitoring

Fig. 2. Correlation Coefficients among Variables in Mode 1
[Panels plot the correlation coefficients against the variable number.]

Fig. 3. Monitoring results of Case 1: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) mode number against samples; (c) PCA-T2, PPCA-T2, FA and CJS-FA statistics against samples.]

Fig. 4. Monitoring results of Case 2: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics; (d) Monitoring Statistics in each subspace
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) mode number against samples; (c) PCA-T2, PPCA-T2, FA and CJS-FA statistics against samples; (d) statistics of Subspace 1 and Subspace 2 against samples.]

Fig. 5. Monitoring results of Case 3: (a) Joint Probabilities; (b) Identified Mode; (c) PCA, PPCA, FA and CJS-FA Monitoring Statistics; (d) Monitoring Statistics in each subspace
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) mode number against samples; (c) PCA-T2, PPCA-T2, FA and CJS-FA statistics against samples; (d) statistics of Subspace 1 and Subspace 2 against samples.]

Fig. 6. Schematic graph of the CSTR process

Fig. 7. Monitoring Results of Case 1 in CSTR
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) CJS-FA statistic against samples, with the transients marked; (c) FA statistic against samples.]

Fig. 8. Monitoring Results of Case 2 in CSTR
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) CJS-FA statistic against samples; (c) FA statistic against samples.]

Fig. 9. Monitoring Results of Case 3 in CSTR
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) CJS-FA statistic against samples, with an enlarged view of samples 600-800; (c) FA statistic against samples, with an enlarged view of samples 600-800.]

Fig. 10. Control system of the Tennessee Eastman process

Fig. 11. Monitoring Results of Case 1 in the TE process: (a) Joint Probability; (b) Mode Identification Results; (c) Monitoring Statistic DR; (d) Monitoring Results in each Subspace
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) mode number against samples; (c) CJS-FA statistic against samples; (d) statistics of Subspaces 1-4 against samples.]

Fig. 12. Monitoring results of Case 2 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results in each Subspace
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) CJS-FA statistic against samples; (c) FA statistic against samples; (d) statistics of Subspaces 1-4 against samples.]

Fig. 13. Monitoring results of Case 3 in the TE process: (a) Joint Probability; (b) Monitoring Results of CJS-FA; (c) Monitoring Results of FA under Identified Mode; (d) Monitoring Results of PCA under Identified Mode; (e) Monitoring Results in each Subspace
[Panels: (a) joint probability of Mode1, Mode2 and Mode3 (logarithmic scale) against samples; (b) CJS-FA statistic against samples; (c) FA statistic against samples; (d) PCA statistic against samples; (e) statistics of Subspaces 1-4 against samples.]

Table 1 Monitored process Variables in the CSTR system

Variable No.   Process measurements
1              Outlet temperature T
2              Outlet concentration C
3              Cooling-water flow rate FC
4              Inlet solute flow FA
5              Inlet solvent flow FS
6              Cooling-water temperature Tc
7              Inlet temperature T0
8              Inlet solute concentration CAA
9              Inlet solvent concentration CAS

Table 2 Three operating modes of the CSTR system

Variable                         Mode 1   Mode 2   Mode 3
Cooling-water temperature Tc     15.00    6.61     3.43
Inlet temperature T0             0.80     0.75     0.69
Inlet solute concentration CAA   356.25   370.77   372.00

Table 3 Six different operating modes of the TE process

Operating mode   G/H mass ratio   Production rate (kg/h)
1                50/50            14.076
2                10/90            14.076
3                90/10            11.111
4                50/50            Maximum
5                10/90            Maximum
6                90/10            Maximum

Table 4. Process monitoring variables in the TE process

No.   Process measurements
1     A feed (stream 1)
2     D feed (stream 2)
3     E feed (stream 3)
4     Total feed
5     Recycle flow (stream 8)
6     Reactor feed rate (stream 6)
7     Reactor pressure
8     Reactor level
9     Reactor temperature
10    Purge rate (stream 9)
11    Product separator temperature
12    Product separator level
13    Product separator pressure
14    Product separator underflow (stream 10)
15    Stripper level
16    Stripper pressure
17    Stripper underflow (stream 11)
18    Stripper temperature
19    Stripper steam flow
20    Compressor work
21    Reactor cooling water outlet temperature
22    Separator cooling water outlet temperature
23    D feed flow valve (stream 2)
24    E feed flow valve (stream 3)
25    A feed flow valve (stream 1)
26    Total feed flow valve (stream 4)
27    Purge valve (stream 9)
28    Separator pot liquid flow valve (stream 10)
29    Stripper liquid product flow valve (stream 11)
30    Reactor cooling water flow
31    Condenser cooling water flow

Table 5. Division result of the TE process

Subspace No.    Variable No.
1               2, 10, 21, 23, 24, 27, 30
2               1, 4, 7, 25, 26
3               11, 13, 16, 18, 20, 22, 28, 29
4               3, 5, 6, 8, 9, 12, 14, 15, 17, 19, 31
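The division in Table 5 can be read as a mapping from subspace number to variable numbers. The following is a minimal Python sketch, not the authors' implementation, that stores this mapping, checks that the four subspaces cover each of the 31 variables of Table 4 exactly once, and slices one sample into its per-subspace parts. The names `SUBSPACES`, `check_partition` and `split_by_subspace` are hypothetical, and the sketch assumes that variable k sits at index k-1 of a sample.

```python
# Subspace division copied from Table 5 (subspace number -> variable numbers).
SUBSPACES = {
    1: [2, 10, 21, 23, 24, 27, 30],
    2: [1, 4, 7, 25, 26],
    3: [11, 13, 16, 18, 20, 22, 28, 29],
    4: [3, 5, 6, 8, 9, 12, 14, 15, 17, 19, 31],
}
N_VARIABLES = 31  # Table 4 lists 31 monitored variables


def check_partition(subspaces, n_variables):
    """Return (missing, duplicated) variable numbers for a candidate division."""
    owners = {}
    for sub, variables in subspaces.items():
        for v in variables:
            owners.setdefault(v, []).append(sub)
    missing = sorted(set(range(1, n_variables + 1)) - set(owners))
    duplicated = sorted(v for v, subs in owners.items() if len(subs) > 1)
    return missing, duplicated


def split_by_subspace(sample, subspaces):
    """Slice one sample into per-subspace lists.

    Assumption: variable k is stored at index k-1 of the sample.
    """
    return {
        sub: [sample[v - 1] for v in variables]
        for sub, variables in subspaces.items()
    }


missing, duplicated = check_partition(SUBSPACES, N_VARIABLES)
print(missing, duplicated)  # → [] []
```

Each per-subspace slice would then be passed to its own sub-model (the Sub-FA columns of Table 7); the slicing itself does not depend on how those sub-models are built.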

Table 6. Process faults programmed in the TE process

Fault No.   Process variable                                          Type
1           A/C feed ratio, B composition constant (stream 4)         step
2           B composition, A/C ratio constant (stream 4)              step
3           D feed temperature (stream 2)                             step
4           Reactor cooling water inlet temperature                   step
5           Condenser cooling water inlet temperature                 step
6           A feed loss (stream 1)                                    step
7           C header pressure loss - reduced availability (stream 4)  step
8           A, B, C feed composition (stream 4)                       random variation
9           D feed temperature (stream 2)                             random variation
10          C feed temperature (stream 4)                             random variation
11          Reactor cooling water inlet temperature                   random variation
12          Condenser cooling water inlet temperature                 random variation
13          Reaction kinetics                                         slow drift
14          Reactor cooling water valve                               sticking
15          Condenser cooling water valve                             sticking
16          Unknown                                                   unknown
17          Unknown                                                   unknown
18          Unknown                                                   unknown
19          Unknown                                                   unknown
20          Unknown                                                   unknown

Table 7. Monitoring results of FA and CJS-FA in Mode 1

Fault No.   FA      Sub-FA 1   Sub-FA 2   Sub-FA 3   Sub-FA 4   CJS-FA
1           0.003   0.453      0.015      0.008      0.733      0.003
2           0.008   0.017      0.077      0.008      0.933      0.007
3           1       0.988      1          0.978      1          0.935
4           0.105   0.001      1          0.921      0.997      0
5           0.997   0.992      1          0.970      1          0.943
6           0       0.014      0          0          0.205      0
7           0       0.930      0          0.905      1          0
8           0.053   0.130      0.098      0.077      0.592      0.048

9

0.985

0.987

10

0.125

0.992

11

0.112

0.120

12

0.617

0.825

13

0.088

0.112

14

0.018

0.117

15

0.998

16

1

17

0.010

18

0.205

1

0.890

1

0.107

1

0.108

0.997

0.360

0.712

0.030

0.998

0.548

1

0.495

0.235

0.090

0.532

0.075

1

0.963

0.565

0.008

0.992

1

0.965

1

0.938

0.992

1

0.973

1

0.948

0.012

0.990

0.602

0.907

0.005

0.942

0.999

0.540

1

0.522

0.007

0.818

0.862

0.055

0.953

0.030

0.042

0.122

0.967

0.030

0.948

0.030

0.957

20

1

19

Fault No.

Table 8. The monitoring results of PCA, FA, SFA, WFA and CJS-FA

Fault No.   PCA     FA      SFA     WFA     CJS-FA
1           0.01    0.01    0.01    0.01    0
2           0.02    0.01    0.04    0.03    0
3           0.93    0.98    0.94    0.94    0.88
4           0.75    0.96    0.30    0.01    0.90
5           0.72    0       0.76    0.67    0
6           0.01    0.01    0.01    0.01    0
7           0       0.60    0.58    0.43    0.76
8           0.03    0.02    0.03    0.01    0
9           0.97    0.98    0.97    0.97    0.97
10          0.66    0.18    0.37    0.29    0.56
11          0.56    0.57    0.38    0.33    0.52
12          0.01    0.01    0.02    0.01    0
13          0.06    0.05    0.06    0.04    0
14          0.01    0       0.01    0       0.19
15          0.98    0.98    0.83    0.76    0.80
16          0.78    0.60    0.37    0.59    0.70
17          0.22    0.08    0.17    0.16    0.157
18          0.10    0.10    0.23    0.09    0
19          0.84    0.97    0.96    0.88    0.92
20          0.73    0.49    0.84    0.18    0.36