Online process monitoring for complex systems with dynamic weighted principal component analysis


Zhengshun Fei, Kangling Liu

PII: S1004-9541(16)30512-2
DOI: 10.1016/j.cjche.2016.05.038
Reference: CJCHE 591

To appear in:

Received date: 31 October 2015
Revised date: 18 February 2016
Accepted date: 3 May 2016

Please cite this article as: Zhengshun Fei, Kangling Liu, Online process monitoring for complex systems with dynamic weighted principal component analysis, (2016), doi: 10.1016/j.cjche.2016.05.038

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

2015-0545

Graphic Abstract

The following figure illustrates that PCA monitoring neglects dynamic information hidden in the data and may be insensitive to changes in the component auto-correlation structure. In the PCS, $T^2$ is computed based on axes $t_1/\sqrt{\lambda_1}$ and $t_2/\sqrt{\lambda_2}$ that represent the directions $p_1$ and $p_2$ with maximum variances of $\lambda_1$ and $\lambda_2$, and in the RS, $Q$ is determined according to axes $t_3$ and $t_4$ along the directions $p_3$ and $p_4$ with minimum variances of $\lambda_3$ and $\lambda_4$. Normal sample space lies within the circle and the ellipse. The auto-correlation structures of components $t_2$ and $t_3$ change from the samples between $x_0$ and $x_k$ to the samples between $x_k$ and $x_{k'}$, and this change is undetectable by PCA since their statistics are still within the circle and ellipse. To address this problem, we propose a novel process monitoring method that combines time series technique and PCA.

[Figure: schematic of the PCS with axes $t_1$, $t_2$ (directions $p_1$, $p_2$) and the RS with axes $t_3$, $t_4$ (directions $p_3$, $p_4$), showing the samples $x_0$, $x_k$ and $x_{k'}$]

Process Systems Engineering and Process Safety

Online process monitoring for complex systems with dynamic weighted principal component analysis*

Zhengshun Fei(费正顺)1,**, Kangling Liu(刘康玲)2

1 School of Automation and Electrical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
2 State Key Lab of Industrial Control Technology, Institute of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China

Article history:
Received 31 October 2015
Received in revised form 18 February 2016
Accepted 3 May 2016

* Supported by the National Natural Science Foundation of China (61174114), the Research Fund for the Doctoral Program of Higher Education in China (20120101130016), the Natural Science Foundation of Zhejiang Province (LQ15F030006), and the Science and Technology Program Project of Zhejiang Province (2015C33033).
** To whom correspondence should be addressed. Email: [email protected]

Abstract Conventional multivariate statistical methods for process monitoring may not be suitable for dynamic processes since they usually rely on assumptions such as time invariance or serially uncorrelated variables. We are therefore motivated to propose a new monitoring method that complements principal component analysis with a weight approach. The proposed monitor consists of two tiers. The first tier uses the principal component analysis method to extract the cross-correlation structure among process data, expressed by independent components. The second tier estimates the auto-correlation structure among the extracted components as auto-regressive models. It is therefore named a dynamic weighted principal component analysis with hybrid correlation structure. The essence of the proposed method is to incorporate a weight approach into principal component analysis to construct two new subspaces, namely the important component subspace and the residual subspace, and two new statistics are defined to monitor them respectively. By computing the weight values upon a new observation, the proposed method increases the weights along directions of components that have large estimation errors while reducing the influence of other directions. The rationale comes from the observation that fault information is associated with online estimation errors of auto-regressive models. The proposed monitoring method is exemplified by the Tennessee Eastman process. The monitoring results show that the proposed method outperforms conventional principal component analysis, dynamic principal component analysis and the dynamic latent variable model.

Keywords principal component analysis, weight, online process monitoring, dynamic

1 INTRODUCTION

Advanced manufacturing systems rely on efficient process monitoring to increase the quality, efficiency and reliability of existing technologies [1, 2]. Manufacturing processes are usually highly complicated and lack accurate models, which makes model-based methods [3-5] unsuitable. However, floods of data can be obtained on-line through sensors embedded in the process. This situation facilitates the development of multivariate statistical process monitoring methods based on principal component analysis (PCA) [6], which utilize process data and require no explicit process knowledge. PCA is widely used in many applications because of its advantage in handling high dimensional and correlated process variables [7-10]. For process monitoring, PCA partitions the process data space into a principal component subspace and a residual subspace, and uses the $T^2$ and $Q$ statistics to monitor the two subspaces respectively.

On the other hand, manufacturing applications are generally dynamic processes and process variables exhibit auto-correlations because of controller feedback and disturbances. Here, auto-correlation means that the current observation is correlated with previous ones. As a result, conventional multivariate statistical methods, which rely on assumptions that (i) the process is time invariant and (ii) variables are serially uncorrelated, tend to generate false alarms or missed detections [11]. This mismatch suggests that a dynamic method analyzing serial correlations is needed [11-14]. Some speech recognition approaches, such as hidden Markov model [15] and dynamic time warping [16], were developed for off-line diagnosis. These approaches rely heavily on known fault information, which is often incomplete since we cannot ensure that all possible faults are pre-defined in complex systems. Ku et al. [11] proposed a dynamic PCA (DPCA) that performs singular value decomposition on an augmented data matrix containing time-lagged process variables, which increases the size of the variable set and makes the model difficult to interpret [17-19]. With a similar idea, some subspace methods based on canonical variate analysis [20] and consistent DPCA [21] were proposed. Bakshi [12] introduced a multi-scale PCA (MSPCA) that integrates PCA with wavelet analysis, which is an effective tool to monitor auto-correlated observations without matrix augmentation. MSPCA first decomposes process data into several time-scales using wavelet analysis and then establishes PCA on the wavelet coefficients for different scales, and a moving window technique is used for online monitoring. A further analysis of MSPCA was provided by Misra et al. [14]. Yoon [22] pointed out that MSPCA puts equal weights on different scales regardless of the scale contribution to overall process variation and thus unreasonably increases the small contribution of high-frequency scales. Recently, Li and Qin [23] proposed a dynamic latent variable (DLV) model to extract auto-correlations and cross-correlations. In particular, some probability methods were developed for dynamic process monitoring [24-26]. Choi [24] constructed a Gaussian mixture model based on PCA and discriminant analysis for representing the distribution underlying dynamic data. Li and Fang [25] proposed an increasing mapping based on hidden Markov model for large-scale dynamic processes. Zhu and Ge [26] extended hidden Markov model to characterize the time-domain dynamics.

NU

Inspired by these approaches, we propose a new monitoring method called dynamic weighted PCA (DWPCA), with the advantages that it is dynamic and data driven and can detect faults in an automatic manner. The proposed method designs a hybrid correlation structure that simultaneously contains auto- and cross-correlation information of processes. The design includes two tiers. The first tier is to use the PCA method to extract the cross-correlation structure among process data, expressed by independent components, and the second tier is to estimate the auto-correlation structure among the extracted components as auto-regressive (AR) models. For online monitoring, we incorporate a weight approach into PCA. The weight approach itself is not new and has many applications such as correspondence search [27], face recognition [28] and process monitoring [29, 30]. To the best of our knowledge, however, a weight method developed on a two-tier hybrid correlation structure is new for process monitoring. In this work, we use the weight approach to give different weights to different directions of components based on their contributions to a fault. Assuming that fault information is associated with online estimation errors of AR models, a weight function is defined based on the estimation error of each component to emphasize certain component directions; in essence, directions are given high weight values if their components have large estimation errors. The weight values are automatically computed when a new observation becomes available. Then, the computed weights are used to dynamically partition the process data space into two new subspaces, namely an important component subspace and a remaining component subspace, and two new statistics are calculated to monitor them, with motivations similar to those of conventional PCA monitoring. The differences are that (i) the proposed method makes use of online process operating information to actively perform the subspace partition, (ii) the two new statistics take both auto- and cross-correlations into account while the $T^2$ and $Q$ statistics only consider cross-correlations, and (iii) the contributions of component directions in the proposed method are not of the same degree while those of PCA all have the same value of 1.

The rest of this work is organized as follows. Conventional PCA is introduced briefly in Section 2, where a simple process simulation illustrates the problems of PCA monitoring based on the $T^2$ and $Q$ statistics. This gives rise to the motivations of DWPCA. In Section 3, DWPCA for process monitoring is detailed, including two new monitoring statistics. The Tennessee Eastman process is employed to demonstrate the process monitoring performance of the proposed method in Section 4. The results show that the proposed method outperforms conventional PCA. Finally, Section 5 concludes the work.

2 PRINCIPAL COMPONENT ANALYSIS MONITORING

2.1 Principal component analysis

Suppose that a normal data set $X \in \mathbb{R}^{N \times J}$ collecting $N$ samples of $J$ variables is scaled to have zero means and unit variances. The principal component analysis (PCA) decomposition is developed as

$$X = \sum_{i=1}^{l} t_i p_i^T + \sum_{i=l+1}^{J} t_i p_i^T = \hat{T}\hat{P}^T + \tilde{T}\tilde{P}^T .$$

Here, $t_i \in \mathbb{R}^{N \times 1}$ represents the $i$th component, and its direction $p_i \in \mathbb{R}^{J \times 1}$ and variance $\lambda_i = t_i^T t_i/(N-1)$ are an eigenvector and the corresponding eigenvalue of the covariance matrix $S_X = X^T X/(N-1)$. The components are in the order of variance decrease, i.e. $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_J$.

The first $l$ components retained span a principal component subspace (PCS) and the remaining $J - l$ components represent a residual subspace (RS). The $\hat{T} \in \mathbb{R}^{N \times l}$ and $\hat{P} \in \mathbb{R}^{J \times l}$ are the component and direction matrices in the PCS, respectively, and $\tilde{T} \in \mathbb{R}^{N \times (J-l)}$ and $\tilde{P} \in \mathbb{R}^{J \times (J-l)}$ correspond to the RS. To determine $l$, the cumulative percent variance (CPV) is widely used for its simplicity. For a particular observation $x \in \mathbb{R}^{J \times 1}$, the $T^2$ and $Q$ statistics are established for monitoring the two subspaces. In the PCS,

$$T^2 = x^T \hat{P} \Lambda^{-1} \hat{P}^T x = \sum_{i=1}^{l} t_i^2/\lambda_i \le T^2_{\lim},$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l)$ is a diagonal matrix, and in the RS,

$$Q = x^T \tilde{P}\tilde{P}^T x = \sum_{i=l+1}^{J} t_i^2 \le Q_{\lim}.$$

A fault is detected when the monitoring statistics violate their control limits $T^2_{\lim}$ and $Q_{\lim}$.

2.2 Problems of PCA monitoring

Control limits for both statistics can be calculated from an $F$ or weighted $\chi^2$ distribution [31] with a confidence $\alpha$, typically set to $\alpha = 95\%$ or $99\%$. In other words, a fault is considered detectable by PCA when its statistics violate their corresponding control limits in more than $(1-\alpha) \times 100\%$ of the samples. The essence of PCA monitoring lies in detecting changes in the cross-correlation structure among components. PCA monitoring neglects dynamic information hidden in the data and may be insensitive to changes in the component auto-correlation structure under the condition illustrated in Figure 1. In the PCS, $T^2$ is computed based on axes $t_1/\sqrt{\lambda_1}$ and $t_2/\sqrt{\lambda_2}$ that represent the directions $p_1$ and $p_2$ with maximum variances of $\lambda_1$ and $\lambda_2$, and in the RS, $Q$ is determined according to axes $t_3$ and $t_4$ along the directions $p_3$ and $p_4$ with minimum variances of $\lambda_3$ and $\lambda_4$. Normal sample space lies within the circle and the ellipse. The auto-correlation structures of components $t_2$ and $t_3$ change from the samples between $x_0$ and $x_k$ to the samples between $x_k$ and $x_{k'}$, and this change is undetectable by PCA since their statistics are still within the circle and ellipse.
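The $T^2$ and $Q$ statistics of Section 2.1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `fit_pca` and `t2_q`, the dictionary layout and the default CPV target are assumptions made here, and the control limits are left out.

```python
import numpy as np

def fit_pca(X, cpv=0.85):
    """Fit a PCA monitoring model on normal data X (N samples x J variables)."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                       # zero mean, unit variance
    S = Xs.T @ Xs / (Xs.shape[0] - 1)           # covariance matrix S_X
    lam, P = np.linalg.eigh(S)                  # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]               # reorder by decreasing variance
    lam, P = lam[order], P[:, order]
    # smallest l whose cumulative percent variance reaches the target
    l = int(np.searchsorted(np.cumsum(lam) / lam.sum(), cpv) + 1)
    l = min(l, len(lam))
    return {"mean": mean, "std": std, "lam": lam, "P": P, "l": l}

def t2_q(model, x):
    """Return (T^2, Q) for one raw observation x of length J."""
    xs = (x - model["mean"]) / model["std"]
    t = model["P"].T @ xs                       # scores t_i = p_i^T x
    l = model["l"]
    t2 = float(np.sum(t[:l] ** 2 / model["lam"][:l]))   # PCS statistic
    q = float(np.sum(t[l:] ** 2))                       # RS statistic
    return t2, q
```

A fault would be flagged when either value exceeds its control limit; how that limit is obtained (F distribution, weighted chi-square or KDE) is a separate choice.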

The problems of PCA monitoring are illustrated by a simulated simple process involving four variables $z^T = [z_1, z_2, z_3, z_4]$ as

$$z(k) = A u(k) + \zeta(k), \qquad u(k) = 0.4\,u(k-1) + \xi(k) \tag{1}$$

and

$$A = \begin{bmatrix} 0.6062 & 0.2205 & 0.0493 & 0.7625 \\ 0 & 0.8921 & 0.3935 & 0 \\ 0.5463 & 0.3012 & 0.6767 & 0.3910 \\ 0.5640 & 0.2547 & 0.6203 & 0.4819 \end{bmatrix}$$

Here, $\zeta^T = (\zeta_1, \zeta_2, \zeta_3, \zeta_4)$, of which each element $\zeta_i \sim N(0, 0.01)$, and the elements of $\xi^T = (\xi_1, \xi_2, \xi_3, \xi_4)$ have zero mean and variances of 4, 2, 0.9 and 0.1, respectively. We produce 600 observations for modeling (normal case) and generate another 600 samples (fault case) in which $z_2$ is set to 2.5 after sample 200. In the PCA modeling, components 1, 2, 3 and 4 have variances of $\lambda_1 = 2.2817$, $\lambda_2 = 1.1981$, $\lambda_3 = 0.4549$ and $\lambda_4 = 0.0652$, respectively. Components 1 and 2, with a total variance contribution of 87%, are retained to compute the $T^2$ statistic and the remaining components are used to determine the $Q$ statistic. The monitoring results using PCA are shown in Figure 2, and the fault is significantly under-reported by PCA. Figure 3 reveals that the four components are not influenced to the same degree by the occurrence of the fault. We can find huge changes in the auto-correlation structures of components 2 and 4, which contain the most important information of the fault in the time region. However, components 1 and 3 are rarely affected and provide little fault information for monitoring. The reason for the high under-report rate in the PCA monitoring is probably that the important information of components 2 and 4 is submerged by the computation of the $T^2$ and $Q$ statistics, respectively. The motivation of DWPCA is to emphasize the directions of components that carry most fault information in the component auto- and cross-correlation structures. Figure 4 shows that the fault can be successfully detected when we use components 2 and 4 to compute the $T^2$ statistic. The missing detection rates are reduced significantly compared with those in Figure 2.

[Figure 1: schematic of the PCS with axes $t_1$, $t_2$ (directions $p_1$, $p_2$) and the RS with axes $t_3$, $t_4$ (directions $p_3$, $p_4$), showing the samples $x_0$, $x_k$ and $x_{k'}$]

Fig.1. Schematic illustration of problems of PCA monitoring
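The simulated four-variable process can be reproduced roughly as follows. This is a sketch under stated assumptions: the entries of `A` are copied as they appear in the text (any signs lost in typesetting are not restored), the function name `simulate`, the zero initial state and the choice of random generator are illustrative, and "after sample 200" is read as zero-based index 200 onward.

```python
import numpy as np

# Entries of A as printed in the text (assumption: no sign information recovered).
A = np.array([[0.6062, 0.2205, 0.0493, 0.7625],
              [0.0,    0.8921, 0.3935, 0.0],
              [0.5463, 0.3012, 0.6767, 0.3910],
              [0.5640, 0.2547, 0.6203, 0.4819]])

def simulate(n, rng, fault_start=None, fault_value=2.5):
    """Generate n samples of z(k) = A u(k) + zeta(k), u(k) = 0.4 u(k-1) + xi(k)."""
    xi_std = np.sqrt([4.0, 2.0, 0.9, 0.1])      # xi variances 4, 2, 0.9, 0.1
    u = np.zeros(4)                             # assumed initial state
    Z = np.empty((n, 4))
    for k in range(n):
        u = 0.4 * u + rng.normal(0.0, xi_std)
        Z[k] = A @ u + rng.normal(0.0, 0.1, size=4)   # zeta_i ~ N(0, 0.01)
        if fault_start is not None and k >= fault_start:
            Z[k, 1] = fault_value               # fault: z_2 held at 2.5
    return Z

rng = np.random.default_rng(0)
Z_normal = simulate(600, rng)                   # modeling data
Z_fault = simulate(600, rng, fault_start=200)   # fault case
```

Fitting a PCA model on `Z_normal` and scoring `Z_fault` gives data of the kind plotted in Figures 2 to 4, although the exact numbers depend on the random seed.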

[Figure 2: two panels plotting the $T^2$ and $Q$ monitoring statistics against their 99% confidence limits over samples 0 to 600]

Fig.2. PCA monitoring results for the fault


Fig.3. Influence of each component in the fault case

[Figure 4: two panels plotting the $T^2$ and $Q$ monitoring statistics against their 99% confidence limits over samples 0 to 600]

Fig.4. $T^2$ using components 2 and 4, and $Q$ using components 1 and 3

3 DYNAMIC WEIGHTED PCA

The proposed method combines time series technique and PCA, with the purpose of designing a hybrid of auto- and cross-correlation structures in processes. This hybrid design of correlation structure includes two tiers. The first tier is to use the PCA method to extract the cross-correlation structure among process data, expressed by independent components, and the second tier is to estimate the auto-correlation structure among the extracted components as auto-regressive (AR) models. Based on the estimated AR models, different weights are determined on different component directions automatically and dynamically, and a component direction is given a high weight value if its component has a large estimation error. In this way, the DWPCA method takes the dynamic information in the processes into account and can be effectively applied to dynamic systems for process monitoring. The new method produces two new statistics, $T_w^2$ and $Q_w$, with an interpretation similar to that of the $T^2$ and $Q$ statistics described in the PCA monitoring.

The rest of this section is organized as follows. Section 3.1 determines weights on component directions based on estimated AR models, and the weights are automatically updated when a new observation becomes available. The $T_w^2$ and $Q_w$ statistics are developed for online process monitoring in Section 3.2.

3.1 Determination of weights on component directions

Assume that the cross-correlation structure is expressed by independent components using the PCA decomposition. To evaluate the importance of each component $i$ in the auto-correlation structure, for $i = 1, 2, \ldots, J$, we set a weight value $w_i$ on its component direction $p_i$ and initially $w_i = 1$. Then, the weighted direction is $p_{w,i} = p_i w_i$. The next step is the design of the learning algorithm for updating the weights. Let $e_i(k) = t_i(k) - \hat{t}_i(k)$ be the estimation error, where $k$ is an observation index and $\hat{t}_i(k)$ is an estimate based on an AR model, i.e. $\hat{t}_i(k) = \sum_{s=1}^{d} \phi_{s,i}\, t_i(k-s)$, in which $\phi_{s,i}$ $(s = 1, 2, \ldots, d)$ is the $s$th AR coefficient and $d$ is the model order. The multi-variable least squares (MLS) algorithm is applied to estimate $\phi_{s,i}$ and the Akaike information criterion (AIC) is used to determine $d$. The learning algorithm based on $e_i(k)$ for the online-updating weights is developed as an extended exponential function:

$$w_i(k) = \beta + (1 - \beta)\exp\left(-D[e_i(k)]/\delta_i\right) \tag{2}$$

with constants $\beta > 1$ and $\delta_i > 0$, where $\beta$ denotes the maximal bound of the weights, $w_i(k) \in [1, \beta)$. The dead-zone operator $D[\cdot]$ prevents the adaption of the weights when the modulus of the estimation error $e_i(k)$ does not exceed its bound $\delta_i$, thereby reducing false alarms caused by noise. The dead-zone operator $D[\cdot]$ is defined as

$$D[e_i(k)] = \begin{cases} 0, & |e_i(k)| \le \delta_i \\ |e_i(k)|^2, & \text{otherwise} \end{cases} \tag{3}$$

The dead-zone bound is determined based on the $e_i(k)$ $(i = 1, 2, \ldots, J)$ under normal operating conditions by the kernel density estimation (KDE) method [32, 33]. KDE is an effective tool to estimate the distribution of data, and a univariate kernel function is defined as

$$\hat{f}(z) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{z - z(i)}{h}\right) \tag{4}$$

where $z$ is the data point under consideration; $z(i)$ is an observation value from the data set; $h$ is the window width or the smoothing parameter; $n$ is the number of observations. The kernel function $K$ determines the shape of the smooth curve under the conditions $\int_{-\infty}^{\infty} K(z)\,\mathrm{d}z = 1$ and $K(z) \ge 0$. Usually, a Gaussian function is chosen for $K$. The bound $\delta_i$ is obtained by $\int_{-\infty}^{\delta_i} \hat{f}(e_i)\,\mathrm{d}e_i = \alpha$ with a given confidence $\alpha = 95\%$ or $99\%$.
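The weight update of this section can be sketched as below. The code is a hedged illustration, not the authors' implementation: the function names, the symbol names `phi`, `beta` and `delta`, the plain least-squares AR fit (used here in place of the MLS algorithm, with the order `d` given rather than chosen by AIC), Silverman's rule for the bandwidth and the one-sided reading of the KDE bound are all assumptions made for the example.

```python
import math
import numpy as np

def fit_ar(t, d):
    """Least-squares AR(d) coefficients for a 1-D score series t (order d assumed given)."""
    t = np.asarray(t, dtype=float)
    # the row for time k holds the lagged values t(k-1), ..., t(k-d)
    lags = np.column_stack([t[d - s:len(t) - s] for s in range(1, d + 1)])
    phi, *_ = np.linalg.lstsq(lags, t[d:], rcond=None)
    return phi

def ar_error(t_hist, phi):
    """e(k) = t(k) - sum_s phi_s t(k-s), with t_hist = [..., t(k-1), t(k)]."""
    t_hist, phi = np.asarray(t_hist, dtype=float), np.asarray(phi, dtype=float)
    d = len(phi)
    past = t_hist[-2:-d - 2:-1]                 # t(k-1), ..., t(k-d)
    return float(t_hist[-1] - phi @ past)

def dead_zone(e, delta):
    """Dead-zone operator: zero inside the bound, squared modulus outside."""
    return 0.0 if abs(e) <= delta else e * e

def weight(e, delta, beta=5.0):
    """Weight in [1, beta): 1 inside the dead zone, approaching beta for large errors."""
    return beta + (1.0 - beta) * math.exp(-dead_zone(e, delta) / delta)

def kde_bound(samples, alpha=0.99, h=None):
    """Value at which the Gaussian-KDE cumulative distribution of the samples reaches alpha."""
    samples = np.asarray(samples, dtype=float)
    if h is None:                               # Silverman's rule (assumed bandwidth choice)
        h = 1.06 * samples.std(ddof=1) * len(samples) ** (-0.2)
    def cdf(z):
        return np.mean([0.5 * (1.0 + math.erf((z - s) / (h * math.sqrt(2.0)))) for s in samples])
    lo, hi = samples.min() - 10 * h, samples.max() + 10 * h
    for _ in range(60):                         # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)
```

In use, `fit_ar` and `kde_bound` would run once per component on normal-operation scores, while `ar_error` and `weight` would run for every new observation.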

3.2 Online process monitoring scheme

We partition the observations into an important component subspace (ICS) and a remaining component subspace (RCS). The ICS is constructed by components that carry the most important fault information in the hybrid correlation structure, and the remaining components comprise the RCS. The importance of the information that component $i$ carries about a fault is given by $\lambda_{w,i} = \lambda_i w_i^2$. The value of $\lambda_{w,i}$ may change with different observations, so it can be written as a function of the observation index $k$, i.e. $\lambda_{w,i}(k)$. For a particular observation $x(k) \in \mathbb{R}^{J \times 1}$, components are rearranged in the decreasing order of $\lambda_{w,i}(k)$ and the set $\{\lambda_{w,i}(k), i = 1, 2, \ldots, J\}$ is sorted as $\lambda'_{w,1}(k) \ge \lambda'_{w,2}(k) \ge \cdots \ge \lambda'_{w,J}(k)$. The first $l_w(k)$ components are retained to construct the ICS and the remaining $J - l_w(k)$ components comprise the RCS, and $l_w(k)$ is determined by the CPV method,

$$CPV(l_w(k)) = \sum_{i=1}^{l_w(k)} \lambda'_{w,i}(k) \Big/ \sum_{i=1}^{J} \lambda'_{w,i}(k).$$

Similarly, the corresponding weighted component directions after being rearranged are $P'_{w,k} = (p'_{w,1}(k), p'_{w,2}(k), \ldots, p'_{w,J}(k)) \in \mathbb{R}^{J \times J}$. Next, two direction matrices comprised of the first $l_w(k)$ directions and the last $J - l_w(k)$ directions, $P_{\mathrm{ICS},k}$ and $P_{\mathrm{RCS},k}$, are given by

$$P_{\mathrm{ICS},k} = \left[p'_{w,1}(k), p'_{w,2}(k), \ldots, p'_{w,l_w(k)}(k)\right] \in \mathbb{R}^{J \times l_w(k)}$$
$$P_{\mathrm{RCS},k} = \left[p'_{w,l_w(k)+1}(k), p'_{w,l_w(k)+2}(k), \ldots, p'_{w,J}(k)\right] \in \mathbb{R}^{J \times (J - l_w(k))} \tag{5}$$

Furthermore, the following $T_w^2$ and $Q_w$ statistics can be defined in the ICS and RCS as

$$T_w^2(k) = x^T(k)\, P_{\mathrm{ICS},k}\, \Lambda_{w,k}^{-1}\, P_{\mathrm{ICS},k}^T\, x(k) = \sum_{i=1}^{l_w(k)} \frac{t'^{\,2}_{w,i}(k)}{\lambda'_{w,i}(k)} \le T^2_{w,\lim}(k)$$
$$Q_w(k) = x^T(k)\, P_{\mathrm{RCS},k}\, P_{\mathrm{RCS},k}^T\, x(k) = \sum_{i=l_w(k)+1}^{J} t'^{\,2}_{w,i}(k) \le Q_{w,\lim}(k) \tag{6}$$

where $\Lambda_{w,k} = \mathrm{diag}(\lambda'_{w,1}(k), \lambda'_{w,2}(k), \ldots, \lambda'_{w,l_w(k)}(k)) \in \mathbb{R}^{l_w(k) \times l_w(k)}$ is a diagonal matrix and $t'_{w,i}(k) = p'^{\,T}_{w,i}(k)\, x(k)$. The control limits $T^2_{w,\lim}(k)$ and $Q_{w,\lim}(k)$ are determined based on normal process data via KDE, since the distributions of the statistics are complicated and KDE handles this situation well. DWPCA-based process monitoring includes off-line modeling and on-line monitoring as summarized in Table 1.

Remark 1. With the proposed method, the components are in the decreasing order of $\lambda_{w,i} = \lambda_i w_i^2$, which takes the information in both the auto- and cross-correlation structures into account, where $w_i^2$ and $\lambda_i$ measure the contributions of auto- and cross-correlation information, respectively. In contrast, PCA only considers the information in the cross-correlation structure and its components are in the order of decreasing $\lambda_i$.
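One online step of the partition into ICS and RCS, followed by the two statistics, might look as follows. This is a sketch under assumptions: the function name `dwpca_statistics` and its argument layout are hypothetical, the observation is taken to be already scaled, the weights are supplied by the caller, and the control limits are omitted.

```python
import numpy as np

def dwpca_statistics(x, P, lam, w, cpv=0.85):
    """Return (T_w^2, Q_w, l_w) for a scaled observation x.

    P: J x J matrix of PCA directions (columns), lam: eigenvalues, w: current weights.
    """
    lam_w = lam * w ** 2                        # importance lambda_{w,i} = lambda_i w_i^2
    order = np.argsort(lam_w)[::-1]             # rearrange by decreasing importance
    lam_s = lam_w[order]
    t_s = (P[:, order] * w[order]).T @ x        # weighted scores t'_{w,i} = w_i p_i^T x
    # smallest l_w whose cumulative percent variance reaches the target
    l_w = int(np.searchsorted(np.cumsum(lam_s) / lam_s.sum(), cpv) + 1)
    l_w = min(l_w, len(lam))
    t2_w = float(np.sum(t_s[:l_w] ** 2 / lam_s[:l_w]))   # ICS statistic
    q_w = float(np.sum(t_s[l_w:] ** 2))                  # RCS statistic
    return t2_w, q_w, l_w
```

For instance, with `P` the identity, `lam = [2, 1, 0.5]` and weights `[1, 1, 5]`, the third component moves to the front of the ordering and enters the ICS, which is the behavior the weighting is meant to produce.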

Remark 2. Generally, the time complexity of conventional PCA is $O(NJ^2)$. We can see from Table 1 that the DWPCA method introduces a few additional steps for online monitoring as compared to conventional PCA. The added steps are Steps 2, 3 and 4, whose running times are $O(dJ)$, $O(J \lg J)$ and $O(J)$, respectively. The total added time complexity is therefore $O(\max(dJ, J \lg J))$. Since $\max(dJ, J \lg J) < NJ^2$, the time complexity of DWPCA is the same as that of PCA, $O(NJ^2)$.

Theorem 1. Projections onto all components $t_{w,i} = X p_{w,i}$ $(i = 1, 2, \ldots, J)$ are orthogonal to each other and $\lambda_{w,i}$ is the variance of the projection onto $t_{w,i}$.

Proof. From the above introduction to the PCA method, we know that, $\forall i, j = 1, 2, \ldots, J$, $p_i^T p_j = 0$ $(i \ne j)$ and $\|p_i\|^2 = 1$. We have $t_{w,i}^T t_{w,j} = p_{w,i}^T X^T X p_{w,j} = (N-1)\lambda_j\, p_{w,i}^T p_{w,j}$. Incorporating $p_{w,i}^T p_{w,j} = w_i w_j p_i^T p_j = 0$ $(i \ne j)$ gives rise to $t_{w,i}^T t_{w,j} = 0$. This shows that the projections onto the $t_{w,i}$ are orthogonal to each other. On the other hand, we have $p_{w,i}^T p_{w,i} = w_i^2 p_i^T p_i = w_i^2$. Moreover, $\lambda_i$ is the variance of component $t_i$, i.e. $E(t_i^2) = \lambda_i$, where $E(\cdot)$ denotes the expectation function. Hence, $E(t_{w,i}^2) = w_i^2 E(t_i^2) = \lambda_{w,i}$. The proof is complete.

Theorem 2. DWPCA reduces to PCA when the weights on all component directions have the same value of 1; in other words, PCA is a special case of DWPCA.

Proof. If the weight values equal 1, then $\forall i = 1, 2, \ldots, J$, $w_i = 1$. Since $\lambda_{w,i} = \lambda_i w_i^2$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_J$, we have $\lambda_{w,1} \ge \lambda_{w,2} \ge \cdots \ge \lambda_{w,J}$. Then, $\forall i = 1, 2, \ldots, J$, $\lambda'_{w,i} = \lambda_{w,i}$, which means that the order of components remains unchanged, so $t'_{w,i} = t_{w,i} = t_i$. We have $CPV(l_w) = \sum_{i=1}^{l_w} \lambda'_{w,i} / \sum_{i=1}^{J} \lambda'_{w,i} = \sum_{i=1}^{l_w} \lambda_i / \sum_{i=1}^{J} \lambda_i$, so choosing $CPV(l_w) = CPV(l)$ gives rise to $l_w = l$. In this case, the important components that construct the ICS of DWPCA are exactly the principal components that comprise the PCS of PCA; similarly, the RCS and the RS are identical. Moreover, $T_w^2 = \sum_{i=1}^{l_w} t'^{\,2}_{w,i}/\lambda'_{w,i} = \sum_{i=1}^{l} t_i^2/\lambda_i = T^2$; similarly, $Q_w = Q$. The proof is complete.
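Theorem 1 can be checked numerically on synthetic data. The snippet below is only an illustration of the statement under assumptions chosen here (random data, arbitrary fixed weights, the helper name `weighted_score_covariance`); it is not part of the original derivation.

```python
import numpy as np

def weighted_score_covariance(X, w):
    """Sample covariance of the weighted scores t_{w,i} = X p_i w_i, plus the eigenvalues."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # scale to zero mean, unit variance
    lam, P = np.linalg.eigh(Xs.T @ Xs / (len(Xs) - 1))   # eigen-pairs of S_X
    T_w = Xs @ (P * w)                                   # column i is X p_i w_i
    return T_w.T @ T_w / (len(Xs) - 1), lam

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
w = np.array([1.0, 3.0, 2.0, 5.0])                       # arbitrary illustrative weights
G, lam = weighted_score_covariance(X, w)
# Off-diagonal entries vanish (orthogonality); the diagonal equals lambda_i * w_i^2.
print(np.allclose(G, np.diag(lam * w ** 2)))
```

Setting all weights to 1 in the same check recovers the ordinary PCA score covariance, which is the situation covered by Theorem 2.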

4 CASE STUDY ON TENNESSEE EASTMAN PROCESS

The Tennessee Eastman (TE) process [34] is widely used for process monitoring [35]. It consists of five major operations: reactor, product condenser, vapor-liquid separator, recycle compressor and a product stripper, as shown in Figure 5. The process has 41 measured variables (22 continuous and 19 composition) and 12 manipulated variables. The 22 continuous measurements and 11 manipulated variables are used for monitoring as listed in Table 2. The plant-wide control structure recommended by Lyman and Georgakis [36] is used in this case study. A total of 22 data sets are collected in different modes (one normal and 21 fault modes), and each data set contains 960 samples of the 33 variables. In each fault mode, the fault is introduced after sample 160. The detailed description of the 21 faults is provided in Table 3.

Conventional PCA, DPCA [11], DLV [23] and the proposed DWPCA method are compared on the collected data sets. The fault missing detection rate is used to evaluate the monitoring performance; it denotes the percentage of samples that stay under the control limits after a fault is introduced. In this study, the number of principal components of PCA, DPCA, DLV and DWPCA is determined by the CPV with 85% variation, and their control limits are calculated by KDE with 99% confidence. The KDE method is detailed in Section 3.1. In the DWPCA method, we set the maximal weight bound $\beta = 5$ in Equation 2. This application of the proposed method follows the procedure of Table 1, and more analytical details are provided in Section 3. The specific monitoring results of the proposed method are listed in Table 4 and those of conventional PCA, DPCA [11] and DLV [23] are given for comparison. The lowest fault missing detection rate for each fault is highlighted in bold. Note that all the methods have high missing detection rates for faults 3, 9 and 15; these three faults are difficult to detect since they have almost no effect on the variation and the mean. Table 4 shows that the proposed DWPCA method can efficiently reduce the missing detection rate for faults 5, 10, 16, 19 and 20, as compared to conventional PCA, DPCA and DLV. The results for the other faults are almost at the same level.
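The fault missing detection rate used as the performance measure can be computed as in the following sketch. The function name and arguments are assumptions for illustration; the default `fault_start=160` reflects the statement that each fault is introduced after sample 160, read here as zero-based index 160 onward.

```python
import numpy as np

def missing_detection_rate(stat, limit, fault_start=160):
    """Fraction of post-fault samples whose statistic stays at or below the control limit."""
    faulty = np.asarray(stat, dtype=float)[fault_start:]
    return float(np.mean(faulty <= limit))

# Toy series: 160 normal samples, 600 detected samples, 200 missed samples.
stat = np.concatenate([np.zeros(160), np.full(600, 2.0), np.zeros(200)])
rate = missing_detection_rate(stat, limit=1.0)
```

Here `rate` is 200/800 = 0.25, since 200 of the 800 samples after the fault onset remain under the limit.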

Table 1 Process monitoring with DWPCA

DWPCA($x(k)$, $X(k-1)$, $X$)

Description: $x(k) \in \mathbb{R}^{J \times 1}$ current observation, $X(k-1)$ observations of the past, $X$ normal data

Off-line modeling:
1. Normalize $X = (X - u)/\sigma$, where $u$, $\sigma$ are means and standard deviations
2. PCA decompose $X = \sum_{i=1}^{J} t_i p_i^T$ and calculate $\lambda_i = t_i^T t_i/(N-1)$
3. Obtain $\phi_{1,i}, \ldots, \phi_{d,i}$ for $i = 1, 2, \ldots, J$ using MLS and AIC, and determine $\delta_i$ by 99% KDE

On-line monitoring:
1. Normalize $x(k) = (x(k) - u)/\sigma$, and calculate $t_i(j) = p_i^T x(j)$ for $i = 1, \ldots, J$, $j = k-d, \ldots, k$
2. Calculate $e_i(k) = t_i(k) - \sum_{s=1}^{d} \phi_{s,i}\, t_i(k-s)$, obtain $w_i$ using Equation 2 and $\lambda_{w,i} = \lambda_i w_i^2$
3. Sort $\{\lambda_{w,i}\}$ in decreasing order using the quicksort algorithm [37], and arrange $\{t_{w,i} = t_i w_i\}$ accordingly
4. Determine $l_w$ by 85% CPV, then calculate $T^2_{w,\lim}$ and $Q_{w,\lim}$
5. Calculate $T_w^2$ and $Q_w$, and report a fault if $T_w^2 > T^2_{w,\lim}$ or $Q_w > Q_{w,\lim}$

Table 2 Variables for monitoring in the TE process

| no. | description | no. | description | no. | description |
|---|---|---|---|---|---|
| 1 | A Feed | 12 | Product Sep level | 23 | D Feed Flow |
| 2 | D Feed | 13 | Prod Sep Pressure | 24 | E Feed Flow |
| 3 | E Feed | 14 | Prod Sep Underflow | 25 | A Feed Flow |
| 4 | A and C Feed | 15 | Stripper Level | 26 | A and C Feed Flow |
| 5 | Recycle Flow | 16 | Stripper Pressure | 27 | Compressor Recycle Valve |
| 6 | Reactor Feed Rate | 17 | Stripper Underflow | 28 | Purge Valve |
| 7 | Reactor Pressure | 18 | Stripper Temperature | 29 | Separator Pot Liquid Flow |
| 8 | Reactor Level | 19 | Stripper Steam Flow | 30 | Stripper Liquid Product Flow |
| 9 | Reactor Temperature | 20 | Compressor Work | 31 | Stripper Steam Valve |
| 10 | Purge Rate | 21 | RCW Outlet Temp | 32 | RCW Flow |
| 11 | Product Sep Temp | 22 | SCW Outlet Temp | 33 | CCW Flow |

Table 3 Process disturbances in the TE process

| Fault | Process variable | Type |
|---|---|---|
| 1 | A/C feed ratio, B composition constant (stream 4) | step |
| 2 | B composition, A/C ratio constant (stream 4) | step |
| 3 | D feed temperature (stream 2) | step |
| 4 | reactor cooling water inlet temperature | step |
| 5 | condenser cooling water inlet temperature | step |
| 6 | A feed loss (stream 1) | step |
| 7 | C header pressure loss-reduced availability (stream 4) | step |
| 8 | A, B, C feed composition (stream 4) | random variation |
| 9 | D feed temperature (stream 2) | random variation |
| 10 | C feed temperature (stream 2) | random variation |
| 11 | reactor cooling water inlet temperature | random variation |
| 12 | condenser cooling water inlet temperature | random variation |
| 13 | reaction kinetics | slow drift |
| 14 | reactor cooling water valve | sticking |
| 15 | condenser cooling water valve | sticking |
| 16 | unknown | unknown |
| 17 | unknown | unknown |
| 18 | unknown | unknown |
| 19 | unknown | unknown |
| 20 | unknown | unknown |
| 21 | valve position constant (stream 4) | constant position |

Fault

PCA

MA

Table 4 Fault missing detection rates in the TE process DPCA

Q

T2

Q

5

0.7562

0.7612

10

0.7063

0.7200

DLV

Tv2

Ts2

0.5988

0.7725

0.7588

0.51

0.7625

0.805

Qw

Tw2

0.7438

0.0038

0.7363

0.7125

0.6163

0.3512

0.6550

Qr

0.7563

0.6725

0.8762

0.505

0.9063

0.8275

0.8713

0.6088

0.3063

0.8225

0.8225

0.9013

0.3775

0.77

0.9913

0.8775

0.8125

0.4075

0.4213

0.4800

0.6900

0.3875

0.6113

0.6675

0.6825

0.455

0.3912

0.4450

0.0012

0.0088

0.0012

0.0088

0.6925

0.0088

0.0012

0.0088

0

2 | 0.0400 | 0.0162 | 0.0288 | 0.015 | 0.4225 | 0.0163 | 0.0488 | 0.0413 | 0.0175
4 | 0 | 0.8125 | 0 | 0.94 | 0.9925 | 0.47 | 0.0012 | 0.6813 | 0.0075
6 | 0 | 0.0088 | 0 | 0.0113 | 0.0363 | 0.0088 | 0 | 0 | 0.0062
7 | 0 | 0 | 0 | 0 | 0.6563 | 0 | 0 | 0.6013 | 0
8 | 0.1362 | 0.0325 | 0.0325 | 0.0263 | 0.1525 | 0.03 | 0.1213 | 0.1325 | 0.0188
11 | 0.2350 | 0.6088 | 0.0688 | 0.7225 | 0.6125 | 0.4713 | 0.3688 | 0.7600 | 0.2325
12 | 0.0913 | 0.0162 | 0.0375 | 0.0088 | 0.075 | 0.0163 | 0.0913 | 0.1050 | 0.0100
13 | 0.0475 | 0.0637 | 0.0463 | 0.0575 | 0.1088 | 0.0625 | 0.0488 | 0.0650 | 0.0475

16 19 20 1


T2

DWPCA

DPCA

DLV

Q

T2

Q

T2

Tv2

Ts2

14

0

0.0088

0

0.0012

0.0012

0.0012

17

0.0400

0.2425

0.0238

0.2263

0.2213

0.2025

18

0.0975

0.1075

0.095

0.11

0.12

0.1075

21

0.5100

0.6125

0.5213

0.5425

0.9675

3

0.9625

0.9912

0.9463

0.995

0.995

9

0.9738

0.9875

0.94

0.9963

15

0.9613

0.9912

0.9263

0.9938

DWPCA

Qw

Tw2

0.0113

0.0038

0

0.04

0.1887

0.0550

0.095

0.1125

0.0988

0.5838

0.485

0.6013

0.5537

0.9875

0.9663

0.9900

0.9600

Qr


PCA


Fault

0.9825

0.9725

0.9925

0.9600

0.9813

0.9838

0.975

0.9938

0.9688


0.9875


4.1 Case study on fault 5

Fault 5 is a step change in the condenser cooling water inlet temperature. Once this fault is introduced, a step change occurs in the condenser cooling water flow rate (variable 33), and this change propagates to other variables. As time goes on, the control system compensates for the fault, so most variables return to their steady states. The monitoring results using DWPCA, PCA, DPCA and DLV are shown in Figures 6-9, respectively. Figures 7-9 show that PCA, DPCA and DLV can detect this fault in its early stage but fail to detect it after sample 340. In contrast, the DWPCA method detects the fault throughout the whole process, as shown in Figure 6, and is therefore much more sensitive to this fault than PCA, DPCA and DLV. The DWPCA method places emphasis on components with large estimation errors by assigning them high weight values, as shown in Figure 10. Figure 10 shows that component 31 has high weights, indicating that it is still affected after sample 340, and this helps DWPCA detect the fault. In fact, variables 17 and 33 have the largest contributions, 0.7039 and 0.7027, respectively, to the direction of component 31. Figure 11 shows the influence of variables 17 and 33; variable 33 exhibits a significant step change, so we can identify it as the root cause of this fault. This isolation result agrees with the above analysis.
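The comparison in this case study rests on whether a monitoring statistic exceeds its control limit while the fault is present. The following is a minimal sketch of how a fault missing detection rate could be computed from such a statistic. The function name, the toy series, the fault onset index and the limit value are all hypothetical and are not taken from the paper.

```python
def missing_detection_rate(statistic, limit, fault_start, stop=None):
    """Fraction of faulty samples whose statistic does not exceed the limit.

    statistic   : list of monitoring-statistic values, one per sample
    limit       : control limit (e.g. a 99% confidence limit)
    fault_start : index of the first faulty sample (assumed known)
    stop        : optional end index, to evaluate only part of the faulty period
    """
    faulty = statistic[fault_start:stop]
    if not faulty:
        raise ValueError("no faulty samples in the selected window")
    # A faulty sample is "missed" when it stays at or below the limit.
    missed = sum(1 for value in faulty if value <= limit)
    return missed / len(faulty)


# Hypothetical toy series: 4 normal samples followed by 6 faulty samples.
stat = [1.0, 2.0, 1.5, 0.8, 9.0, 7.5, 2.5, 2.0, 8.0, 1.0]
print(missing_detection_rate(stat, limit=3.0, fault_start=4))  # 0.5
```

Restricting the window with `stop` (or a later `fault_start`) gives the rate over a late stage only, which is one way to quantify a statistic that alarms early but falls back below its limit afterwards.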


4.2 Case study on fault 10

Fault 10 involves a random variation in the C feed temperature (stream 4), which provides the inlet feed for the stripper. This fault first affects the stripper temperature (variable 18) and then propagates to other variables. Most variables remain around their steady-state points and behave much as they do under normal operation, which makes fault detection rather challenging. The monitoring performance for fault 10 based on DWPCA, PCA, DPCA and DLV is shown in Figures 12-15, respectively. The missing detection rate of $Q_w$ using DWPCA is significantly lower than the missing detection rates of $Q$ and $T^2$ using PCA and DPCA and of $T_v^2$, $T_s^2$ and $Q_r$ using DLV. Figure 16 shows that the weight values on components 26, 27 and 28 are high. DWPCA can therefore facilitate fault isolation by narrowing the candidate faulty variables down to those with large contributions to these components.

5. CONCLUSIONS

We have shown that conventional PCA has difficulty in monitoring dynamic processes since it neglects dynamic information underlying process data. To solve this problem, we have proposed a DWPCA method with hybrid correlation structure design for online process monitoring in this work. The main contributions can be summarized as follows.

(i) We have evaluated the monitoring performance of conventional PCA on dynamic processes, based on the idea that online operating information contained in process auto-correlation structures should be used to detect incipient faults and thereby reduce the fault missing detection rate. To this end, we have designed a two-tier hybrid correlation structure that considers both auto- and cross-correlations.

(ii) We have introduced a new monitoring scheme that uses online operating information to dynamically partition the process data space into the important and remaining component subspaces. The partition step is based on a contribution index $\lambda_i w_i^2$ defined from the variance $\lambda_i$ and direction weight $w_i$ of each component $i$. To dynamically monitor the two new subspaces, we have constructed two new statistics.

(iii) We have demonstrated the monitoring performance of the proposed DWPCA method on the TE process. The monitoring results show that DWPCA achieves higher accuracy than conventional PCA, DPCA and DLV. Moreover, the DWPCA results could help process operators narrow down the root cause of faults.
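The partition step summarized in contribution (ii) can be illustrated with a small sketch. It computes a contribution index as the product of each component's variance and its squared direction weight, then splits the components into an important and a remaining group. The selection rule used here (keep the top-ranked components until a cumulative fraction of the total index is reached) is an assumption made only for illustration; this section does not state the rule the authors use, and the function name, the `fraction` parameter and the numbers are hypothetical.

```python
def partition_components(variances, weights, fraction=0.9):
    """Split component indices (0-based) into important and remaining groups.

    variances : per-component variances
    weights   : per-component direction weights
    fraction  : ASSUMED cumulative share of the total index kept as "important"
    """
    # Contribution index of component i: variance_i * weight_i ** 2.
    index = [lam * w * w for lam, w in zip(variances, weights)]
    total = sum(index)
    # Rank components from largest to smallest contribution index.
    order = sorted(range(len(index)), key=lambda i: index[i], reverse=True)

    important, running = [], 0.0
    for i in order:
        if total > 0 and running / total >= fraction:
            break
        important.append(i)
        running += index[i]

    remaining = sorted(set(range(len(index))) - set(important))
    return sorted(important), remaining, index


# Hypothetical example: a low-variance component with a large weight can
# outrank a high-variance component with a small weight.
imp, rem, idx = partition_components([4.0, 2.0, 1.0, 0.5], [0.1, 1.0, 2.0, 0.2])
print(imp, rem)  # [1, 2] [0, 3]
```

Because the weights change online, rerunning the partition at each sample would let the important subspace change over time, which is the sense in which the partition is dynamic.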

Extensions of the proposed method are recommended for further research. Future work could incorporate nonlinear behaviors and uncertainties in processes, so that improved monitoring schemes based on the proposed method can handle process problems that are more practical and closer to the real world. The proposed method could also be extended to fault detection in discrete event systems or hybrid systems.

Fig.5. Tennessee Eastman process (process flow diagram: reactor, condenser, compressor, separator and stripper, with feed, purge and product streams)

Fig.6. Monitoring results of fault 5 using DWPCA in the TE process (panels: $T_w^2$ and $Q_w$ versus samples; monitoring statistics with 99% confidence limit)

Fig.7. Monitoring results of fault 5 using PCA in the TE process (panels: $T^2$ and $Q$ versus samples; monitoring statistics with 99% confidence limit)

Fig.8. Monitoring results of fault 5 using DPCA in the TE process (panels: $T^2$ and $Q$ versus samples; monitoring statistics with 99% confidence limit)

Fig.9. Monitoring results of fault 5 using DLV in the TE process (panels: $T_v^2$, $T_s^2$ and $Q_r$ versus samples; monitoring statistics with 99% confidence limit)

Fig.10. Weights on component directions for fault 5 in the TE process

Fig.11. Influence of variables 17 and 33 for fault 5 in the TE process (panels: variable 17 and variable 33 versus samples)

Fig.12. Monitoring results of fault 10 using DWPCA in the TE process (panels: $T_w^2$ and $Q_w$ versus samples; monitoring statistics with 99% confidence limit)

Fig.13. Monitoring results of fault 10 using PCA in the TE process (panels: $T^2$ and $Q$ versus samples; monitoring statistics with 99% confidence limit)

Fig.14. Monitoring results of fault 10 using DPCA in the TE process (panels: $T^2$ and $Q$ versus samples; monitoring statistics with 99% confidence limit)

Fig.15. Monitoring results of fault 10 using DLV in the TE process (panels: $T_v^2$, $T_s^2$ and $Q_r$ versus samples; monitoring statistics with 99% confidence limit)


Fig.16. Weights on component directions for fault 10 in the TE process

REFERENCES

[1] J. Davis, T. Edgar, J. Porter, J. Bernaden, M. Sarli, Smart manufacturing, manufacturing intelligence and demand-dynamic performance, Computers & Chemical Engineering 47 (2012) 145-156.

[2] EFFRA, Factories of the Future: MULTI-ANNUAL ROADMAP FOR THE CONTRACTUAL PPP UNDER HORIZON 2020, (2013).

[3] P.M. Frank, Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy: A survey and some new results, Automatica 26 (1990) 459-474.

[4] R. Isermann, Model-based fault-detection and diagnosis–status and applications, Annual Reviews in Control 29 (2005) 71-85.

[5] V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, K. Yin, A review of process fault detection and diagnosis: Part III: Process history based methods, Computers & Chemical Engineering 27 (2003) 327-346.

[6] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometrics and Intelligent Laboratory Systems 2 (1987) 37-52.

[7] I. Jolliffe, Principal component analysis. 2002: Wiley Online Library.

[8] U. Kruger, S. Kumar, T. Littler, Improved principal component monitoring using the local approach, Automatica 43 (2007) 1532-1542.

[9] Z. Li, U. Kruger, X. Wang, L. Xie, An error-in-variable projection to latent structure framework for monitoring technical systems with orthogonal signal components, Chemometrics and Intelligent Laboratory Systems 133 (2014) 70-83.

[10] K. Liu, X. Jin, Z. Fei, J. Liang, Adaptive partitioning PCA model for improving fault detection and isolation, Chinese Journal of Chemical Engineering (2015).

[11] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometrics and Intelligent Laboratory Systems 30 (1995) 179-196.

[12] B.R. Bakshi, Multiscale PCA with application to multivariate statistical process monitoring, AIChE Journal (1998).

[13] J. Gertler, J. Cao, PCA-based fault diagnosis in the presence of control and dynamics, AIChE Journal 50 (2004) 388-402.

[14] M. Misra, H.H. Yue, S.J. Qin, C. Ling, Multivariate process monitoring and fault diagnosis by multi-scale PCA, Computers & Chemical Engineering 26 (2002) 1281-1293.

[15] W. Sun, A. Palazoğlu, J.A. Romagnoli, Detecting abnormal process trends by wavelet-domain hidden Markov models, AIChE Journal 49 (2003) 140-150.

[16] A. Kassidas, P.A. Taylor, J.F. MacGregor, Off-line diagnosis of deterministic faults in continuous dynamic multivariable processes using speech recognition methods, Journal of Process Control 8 (1998) 381-393.

[17] W. Li, S.J. Qin, Consistent dynamic PCA based on errors-in-variables subspace identification, Journal of Process Control 11 (2001) 661-678.

[18] R.J. Treasure, U. Kruger, J.E. Cooper, Dynamic multivariate statistical process control using subspace identification, Journal of Process Control 14 (2004) 279-292.

[19] C. Cheng, M.-S. Chiu, Nonlinear process monitoring using JITL-PCA, Chemometrics and Intelligent Laboratory Systems 76 (2005) 1-13.

[20] A. Negiz, A. Çinar, Statistical monitoring of multivariable dynamic processes with state-space models, AIChE Journal 43 (1997) 2002-2020.

[21] W. Li, S.J. Qin, Consistent dynamic PCA based on errors-in-variables subspace identification, Journal of Process Control 11 (2001) 661-678.

[22] S. Yoon, J.F. MacGregor, Principal-component analysis of multiscale data for process monitoring and fault diagnosis, AIChE Journal 50 (2004) 2891-2903.

[23] G. Li, S.J. Qin, D. Zhou, A New Method of Dynamic Latent-Variable Modeling for Process Monitoring, IEEE Transactions on Industrial Electronics 61 (2014) 6438-6445.

[24] W.C. Sang, H.P. Jin, I.B. Lee, Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis, Computers & Chemical Engineering 28 (2004) 1377-1387.

[25] Z. Li, H. Fang, L. Xia, Increasing mapping based hidden Markov model for dynamic process monitoring and diagnosis, Expert Systems with Applications 41 (2014) 744-751.

[26] J. Zhu, Z. Ge, Z. Song, HMM-Driven Robust Probabilistic Principal Component Analyzer for Dynamic Process Fault Classification, IEEE Transactions on Industrial Electronics 62 (2015) 1-1.

[27] K.-J. Yoon, I.S. Kweon, Adaptive support-weight approach for correspondence search, (2006).

[28] X. Niyogi. Locality preserving projections. in Neural information processing systems. 2004: MIT.

[29] S. Wold, Exponentially weighted moving principal components analysis and projections to latent structures, Chemometrics and Intelligent Laboratory Systems 23 (1994) 149-161.

[30] Q. Jiang, X. Yan, Chemical processes monitoring based on weighted principal component analysis and its application, Chemometrics and Intelligent Laboratory Systems 119 (2012) 11-20.

[31] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41-59.

[32] Q. Chen, U. Kruger, A.T. Leung, Regularised kernel density estimation for clustered process data, Control Engineering Practice 12 (2004) 267-274.

[33] Q. Chen, R. Wynne, P. Goulding, D. Sandoz, The application of principal component analysis and kernel density estimation to enhance process monitoring, Control Engineering Practice 8 (2000) 531-543.

[34] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers & Chemical Engineering 17 (1993) 245-255.

[35] S. Yin, S.X. Ding, A. Haghani, H. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, Journal of Process Control 22 (2012) 1567-1581.

[36] P.R. Lyman, C. Georgakis, Plant-wide control of the Tennessee Eastman problem, Computers & Chemical Engineering 19 (1995) 321-331.

[37] T.H. Cormen, Introduction to algorithms. 2009: MIT press.