Ensemble modified independent component analysis for enhanced non-Gaussian process monitoring


Control Engineering Practice 58 (2017) 34–41

Contents lists available at ScienceDirect

Control Engineering Practice journal homepage: www.elsevier.com/locate/conengprac


Chudong Tong*, Ting Lan, Xuhua Shi

Faculty of Electrical Engineering and Computer Science, Ningbo University, Zhejiang 315211, PR China

ARTICLE INFO

Keywords: Modified independent component analysis; Fault detection; Ensemble learning

ABSTRACT

As a multivariate statistical tool, modified independent component analysis (MICA) has drawn considerable attention in the non-Gaussian process monitoring community because it solves two main problems of the original ICA method. Despite the diversity of its applications, the choice of the non-quadratic function used in the iterative procedure of the MICA algorithm has always been empirical. Because MICA is an unsupervised modeling method, a direct study that conclusively demonstrates which non-quadratic function is optimal for the general purpose of fault detection is not possible, and the selection of non-quadratic functions remains a challenge that has rarely been addressed. Recognizing this issue, and motivated by the strengths of the ensemble learning strategy, a novel ensemble MICA (EMICA) modeling approach is presented for enhancing non-Gaussian process monitoring performance. Instead of focusing on a single non-quadratic function, the proposed method combines multiple base MICA models derived from different non-quadratic functions into an ensemble, and Bayesian inference is employed as a decision fusion method to form a unique monitoring index for fault detection. The enhanced fault detectability of the EMICA method is illustrated on two industrial processes.

* Corresponding author. E-mail address: [email protected] (C. Tong).

http://dx.doi.org/10.1016/j.conengprac.2016.09.014
Received 7 April 2016; Received in revised form 16 August 2016; Accepted 24 September 2016
0967-0661/ © 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Modern industrial plants have witnessed a rapid development of distributed computer-aided systems and sensor technologies, as well as operator support systems built on data-driven process monitoring, in particular multivariate statistical process monitoring (MSPM) techniques, in recent years (Ruiz-Cárcel, Cao, Mba, Lao, & Samuel, 2015; Yin, Li, Gao, & Kaynak, 2015). Not surprisingly, MSPM based on two fundamental algorithms, principal component analysis (PCA) and partial least squares (PLS), has received considerable attention because first-principle models of modern complex process systems are often costly to develop or practically inaccessible (Portnoy, Melendez, Pinzon, & Sanjuan, 2016; Yin, Ding, Xie, & Luo, 2014). However, the ability of PCA/PLS-based methods to identify faults from data can deteriorate because they assume that the sampled data approximately follow a multivariate Gaussian distribution (Lee, Qin, & Lee, 2006; Lee, Yoo, & Lee, 2004; Zhang & Zhang, 2010). To handle the monitoring problem of non-Gaussian processes, independent component analysis (ICA), which can extract more useful information from non-Gaussian process data by using higher-order statistics, has been intensively investigated in the last few years (Fan, Qin, & Wang, 2014; Hsu, Chen, & Chen, 2010; Jiang, Wang, & Yan, 2015; Lee et al., 2006; Zhang & Zhang, 2010). In comparison to PCA, ICA not only de-correlates the data but also makes the projected data as independent as possible, and thus gleans more essential features from the observed measurements.

Among the diverse applications of ICA-based non-Gaussian process monitoring, the FastICA iterative algorithm proposed by Hyvärinen (1999) is typically employed as a "default" method for model construction because it can greatly reduce the computation time. However, the FastICA algorithm has some drawbacks in practical applications. First, different solutions may be obtained because of its random initialization step, which can result in unstable monitoring models. Second, unlike a PCA model, which orders the principal components (PCs) according to their variance, a proper ordering of the independent components (ICs) is still a well-known issue. To tackle these challenges, Lee et al. (2006) developed a modified ICA (MICA) algorithm that extracts a small number of ordered ICs and also produces a consistent result. The basic idea is to first use the normalized PCs from the PCA model as an initial estimate of the ICs and then to perform the FastICA algorithm to update the dominant ICs while maintaining the variance. Furthermore, Zhang and Zhang (2010) adopted the particle swarm optimization (PSO) method for estimating ICs, and the importance of the ICs is then sorted by their role in reconstructing the original data. Although the PSO-based ICA algorithm reduces the risk of obtaining a local minimum solution, it increases the computation time compared with the FastICA algorithm. Recently, Ge and Song (2013) developed a performance-driven ensemble learning ICA model for non-Gaussian process monitoring. The ensemble learning approach is used to improve the stability of the FastICA algorithm, and the dominant ICs are determined by a performance-driven method that involves reference abnormal datasets. The essence of ensemble learning is to combine multiple solutions from different models into a unique one, which is expected to give a significantly better result than any individual solution. Benefiting from this advantage, the ensemble learning technique has become quite popular in the field of MSPM over the last several years (Ge & Song, 2014; Li & Yang, 2014; Tong, Palazoglu, & Yan, 2014).

Nevertheless, it should be stressed that all the ICA modeling methods mentioned above involve a measure of non-Gaussianity so as to reflect the statistical independence of the ICs. Negentropy, which is based on the information-theoretic quantity of differential entropy, usually serves as a good estimate of the non-Gaussianity of a random variable. However, the calculation of negentropy requires an estimate of the probability density function, which is sometimes unobtainable. Fortunately, Hyvärinen (1999) formulated a feasible and reliable approximation of negentropy using a proper non-quadratic function. Given that three suggestions for the non-quadratic function are available in the literature, the stability of the ICA model cannot be ensured when different non-quadratic functions are employed, and thus the resulting monitoring performance is also affected.

Generally, the modeling procedures in the ICA-based process monitoring method, as in other MSPM approaches, are unsupervised, which means that only a dataset sampled under normal operating conditions is needed. Without abnormal data, a proper selection of the non-quadratic function is inaccessible. Meanwhile, because the available samples from all possible faulty conditions are highly limited, a single empirical choice of the non-quadratic function can leave some specific faults undetected. From this viewpoint, a single non-quadratic function cannot be effective for all kinds of faults. Therefore, the selection of non-quadratic functions is a severe problem that remains unsolved. Recognition of this issue motivates the current study, which integrates the ensemble learning strategy into the MICA algorithm. As mentioned previously, the MICA modeling method can address the two challenges that exist in the original ICA iterative procedures. Additionally, MICA extracts only the few dominant ICs needed for process monitoring instead of all ICs, so the high computational load can be reduced (Lee et al., 2006). With the ensemble learning strategy, the proposed ensemble MICA (EMICA) method combines multiple base MICA models resulting from different non-quadratic functions into an ensemble through Bayesian-inference-based decision fusion (Ghosh, Ng, & Srinivasan, 2011). Since any of these non-quadratic functions could be useful for fault detection, a feasible solution is to take advantage of all of them and then produce an ensemble result for enhanced non-Gaussian process monitoring. Unlike the traditional MICA-based monitoring method, multiple base MICA monitoring models using different non-quadratic functions are first developed; the Bayesian inference strategy is then employed for online fault alarm decision fusion, which generates an ensemble probabilistic index from multiple monitoring statistics.

2. MICA based process monitoring

2.1. MICA algorithm

The first step of the MICA method is to use PCA to extract all available PCs from the data $X \in R^{m \times n}$:

$$T = P^{T}X \tag{1}$$

where $X$ contains $n$ samples of $m$ measured variables, $T \in R^{m \times n}$ consists of the extracted PCs, $P \in R^{m \times m}$ is composed of the eigenvectors of the covariance matrix $XX^{T}/(n-1) = P\Lambda P^{T}$, and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_m\}$. The last few elements in $\Lambda$ are sometimes close to zero because of collinearity in the measurements, and they can be excluded. However, it is highly suggested to include as many eigenvalues as possible. The extracted PCs are whitened as follows:

$$Z = \Lambda^{-1/2}T = \Lambda^{-1/2}P^{T}X = QX \tag{2}$$

where $Q = \Lambda^{-1/2}P^{T}$. The whitened components $Z$ then serve as an initial estimate of the ICs. The objective of the MICA algorithm is to update a matrix $C \in R^{m \times d}$ satisfying $C^{T}C = D$ such that the extracted components

$$S = C^{T}Z \tag{3}$$

become as independent of each other as possible, where $D = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_d\}$. The requirement $SS^{T}/(n-1) = D$ makes the variance of each IC in $S$ the same as that of the corresponding PC in PCA. Therefore, a proper ordering of the ICs in MICA can be realized in accordance with their variance. $S$ can be normalized by

$$S_n = D^{-1/2}S = D^{-1/2}C^{T}Z = C_n^{T}Z \tag{4}$$

with $C_n^{T} = D^{-1/2}C^{T}$ and $C_n^{T}C_n = I$; the main task of MICA is thus reduced to finding the matrix $C_n$. The demixing matrix $W \in R^{d \times m}$ and mixing matrix $A \in R^{m \times d}$ are given as

$$W = D^{1/2}C_n^{T}Q = D^{1/2}C_n^{T}\Lambda^{-1/2}P^{T} \tag{5}$$

$$A = P\Lambda^{1/2}C_n D^{-1/2} \tag{6}$$

where $WA = I_d \in R^{d \times d}$. Given that the variance of each IC in $S$ is the same as that of the corresponding PC in PCA, the number of retained ICs, $d$, can be determined by criteria used in PCA (Valle, Li, & Qin, 1999; Wold, 1978), for example, cumulative percent variance (CPV). If the process data strictly follow a Gaussian distribution, $C_n^{T}$ reduces to $[I_d \,\vdots\, 0]$, which means that $S$ coincides with the first $d$ PCs in $T$. Therefore, PCA can be considered a special case of MICA, and the updating of $C_n$ can be started from a fixed initialization, i.e., $C_n^{T} = [I_d \,\vdots\, 0]$.

In the MICA iterative procedures provided in the Appendix, the statistical independence requirement of the ICs needs a measure of non-Gaussianity. Generally, the measure of non-Gaussianity is approximated by

$$J(y) = [E\{G(y)\} - E\{G(v)\}]^{2} \tag{7}$$

where $y$ is scaled to zero mean and unit variance, $v$ is a Gaussian variable of zero mean and unit variance, and $G$ is known as the non-quadratic function. Hyvärinen and Oja (2000) introduced three non-quadratic functions:

$$G_1(u) = \frac{1}{a_1}\log\cosh(a_1 u) \tag{8}$$

$$G_2(u) = \exp(-a_2 u^{2}/2) \tag{9}$$

$$G_3(u) = u^{4} \tag{10}$$

where $1 \le a_1 \le 2$ and $a_2 \approx 1$. It has been empirically shown that $G_1$ is a good contrast function, and it is usually adopted for ICA model construction. However, as will be illustrated later, $G_1$ is not always good for improving fault detectability, since MICA is an unsupervised modeling method. The selection of $G$ strongly influences the subsequent monitoring results. Without enough prior knowledge, the optimal choice of the function $G$ is still an open problem.

2.2. Process monitoring based on MICA algorithm

Similar to the PCA-based fault detection method, the implementation of MICA for fault detection also depends on two statistics (i.e., $T^2$ and $Q$). The $T^2$ and $Q$ statistics for monitoring a new sample $x \in R^{m \times 1}$ are formulated as follows:

$$T^{2} = s^{T}D^{-1}s \tag{11}$$

$$Q = e^{T}e \tag{12}$$

where $s = Wx$ and $e = x - As$. Unlike PCA-based monitoring, the control limits for the $T^2$ and $Q$ statistics cannot be determined from a known probability density function. The details are well discussed in Lee et al. (2006).

3. Ensemble modified ICA (EMICA) for process monitoring

To address the issue raised by the selection of the function $G$, a novel EMICA-based monitoring method is proposed and depicted schematically in Fig. 1; a detailed introduction is given below.

In the offline modeling phase, the normalized PCs derived from the PCA model are used as an initial point for updating the ICs. The MICA algorithm given in the Appendix goes one step further to transform the whitened components into a number of ICs with three different non-quadratic functions (i.e., $G_1$, $G_2$, and $G_3$) involved. As shown in Fig. 1, the $i$-th MICA model associated with $G_i$ is given as

$$S_i = W_i X \tag{13}$$

$$X = A_i S_i + E_i \tag{14}$$

where $i \in \{1, 2, 3\}$. The control limits (i.e., $T^{2}_{i,\lim}$ and $Q_{i,\lim}$) are also calculated for the $i$-th MICA model.

In the online monitoring phase, when a new sample $x$ becomes available, it is first scaled using the estimated mean and variance derived from $X$. The two monitoring statistics in each MICA model are then computed using Eqs. (11) and (12), respectively. Obviously, three different sets of monitoring results are obtained in this step. The question is how to turn the different monitoring statistics into an ensemble index. Among the several decision fusion techniques reviewed in Ghosh et al. (2011), Bayesian inference can produce an intuitive and practical fusion (Bishop, 2006). Because of this, Bayesian inference has been widely used elsewhere in recent years (Ge & Song, 2013, 2014; Ge, Gao, & Song, 2011; Tong et al., 2014). Therefore, Bayesian inference is utilized to combine the multiple statistics into a unique index in a probabilistic manner. The fault probability of $x$ monitored by the $T_i^2$ statistic in the $i$-th MICA model is defined as

$$P_{T_i^2}(F|x) = \frac{P_{T_i^2}(x|F)\,P_{T_i^2}(F)}{P_{T_i^2}(x)} \tag{15}$$

with the probability $P_{T_i^2}(x)$ defined as

$$P_{T_i^2}(x) = P_{T_i^2}(x|N)\,P_{T_i^2}(N) + P_{T_i^2}(x|F)\,P_{T_i^2}(F) \tag{16}$$

where $N$ and $F$ represent normal and abnormal operating conditions, respectively. $P_{T_i^2}(N)$ and $P_{T_i^2}(F)$ can be simply assigned as $\alpha$ and $1 - \alpha$, respectively, where $\alpha$ is the confidence level for calculating the control limits. The conditional probabilities $P_{T_i^2}(x|N)$ and $P_{T_i^2}(x|F)$ can be calculated as follows (Ge et al., 2011; Tong et al., 2014):

$$P_{T_i^2}(x|N) = \exp\left(-\frac{T_i^{2}}{T^{2}_{i,\lim}}\right), \qquad P_{T_i^2}(x|F) = \exp\left(-\frac{T^{2}_{i,\lim}}{T_i^{2}}\right) \tag{17}$$

The ensemble probabilistic index is then defined in a weighted form, given as

$$BIC_{T^2} = \sum_{i=1}^{3}\left\{\frac{P_{T_i^2}(x|F)\,P_{T_i^2}(F|x)}{\sum_{j=1}^{3}P_{T_j^2}(x|F)}\right\} \tag{18}$$

to combine multiple monitoring results into an ensemble. Similarly, the final probabilistic index, $BIC_Q$, for the $Q$ statistic can be obtained in the same way as $BIC_{T^2}$. Generally, the EMICA model triggers a fault alarm when either the value of $BIC_{T^2}$ or $BIC_Q$ exceeds the control limit $1 - \alpha$. Otherwise, the monitored process is considered to be fault-free.

Given that three different MICA models focusing on different non-quadratic functions are built in parallel, the proper selection of the function $G$ is no longer an issue in the proposed EMICA monitoring method. Instead of determining a specific $G$ for MICA modeling through prior knowledge or other analytic strategies, the modeling phase of the EMICA approach accounts for all of the candidate non-quadratic functions. EMICA is therefore expected to be more feasible and applicable in monitoring non-Gaussian processes. Moreover, it needs to be stressed that the proposed method is not limited to these three non-quadratic functions. If more formulations of $G$ become available, the proposed method can combine them into an ensemble in a similar way.
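The online monitoring step of Eqs. (11), (12) and (15) to (18) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `t2_q` and `bayesian_index`, the argument `d_eigs` (the diagonal of $D$), and the small guard value `tiny` are assumptions introduced here.

```python
import numpy as np

def t2_q(x, W, A, d_eigs):
    """T^2 and Q of one scaled sample x for a single MICA model, Eqs. (11)-(12)."""
    s = W @ x                      # extracted ICs, s = W x
    e = x - A @ s                  # residual, e = x - A s
    return float(s @ (s / d_eigs)), float(e @ e)   # s^T D^{-1} s and e^T e

def bayesian_index(stats, limits, alpha=0.99):
    """Fuse one statistic from each base model into a single index, Eqs. (15)-(18).

    stats  : the same statistic (T^2 or Q) from each base MICA model
    limits : the matching control limits
    alpha  : confidence level, so P(N) = alpha and P(F) = 1 - alpha
    """
    stats = np.asarray(stats, dtype=float)
    limits = np.asarray(limits, dtype=float)
    tiny = 1e-12                                        # assumed guard against division by zero
    p_x_n = np.exp(-stats / limits)                     # Eq. (17), normal condition
    p_x_f = np.exp(-limits / np.maximum(stats, tiny))   # Eq. (17), faulty condition
    p_x = p_x_n * alpha + p_x_f * (1.0 - alpha)         # Eq. (16)
    p_f_x = p_x_f * (1.0 - alpha) / p_x                 # Eq. (15)
    total = p_x_f.sum()
    if total == 0.0:                                    # every P(x|F) underflowed: no fault evidence
        return 0.0
    return float(np.sum(p_x_f / total * p_f_x))         # Eq. (18)
```

One property follows directly from the formulas: when every statistic sits exactly on its control limit, $P(x|N) = P(x|F) = e^{-1}$, so each $P(F|x)$ equals $1 - \alpha$ and the fused index equals $1 - \alpha$ as well, which is consistent with using $1 - \alpha$ as the alarm threshold.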

Fig. 1. Flowchart of the proposed ICA-based monitoring scheme.

4. Case studies

The proficiency of the EMICA-based monitoring model is demonstrated on two industrial processes: a numerical non-Gaussian system and the Tennessee Eastman (TE) benchmark process. Illustrations and discussions are provided by comparing the fault detectability of $G_1$-MICA, $G_2$-MICA, $G_3$-MICA, and the proposed method.

4.1. Numerical non-Gaussian example

The following eight-variable system designed by Tong et al. (2014) is considered:

$$x = As + v = \begin{bmatrix} 0.95 & 0.82 & 0.94 \\ 0.23 & 0.45 & 0.92 \\ 0.61 & 0.62 & 0.41 \\ 0.49 & 0.79 & 0.89 \\ 0.89 & 0.92 & 0.06 \\ 0.76 & 0.74 & 0.35 \\ 0.46 & 0.18 & 0.81 \\ 0.02 & 0.41 & 0.01 \end{bmatrix} s + v \tag{19}$$

To simulate a non-Gaussian process, the source variables $s$ are generated as follows:

$$\begin{aligned} s_1(k) &= 2\cos(0.08k)\sin(0.06k) \\ s_2(k) &= \sin(0.3k) + 2\cos(0.1k) \\ s_3(k) &= (\mathrm{rem}(k, 27) - 13)/9 \end{aligned} \tag{20}$$

where the noise $v$ follows a Gaussian distribution with zero mean and variance 0.01, $k$ represents the sampling step, and the rem function returns the remainder after division. First, 1000 samples are generated under the normal operating condition. Given that the studied system is driven by non-Gaussian source variables, as can be clearly seen from Fig. 2, it is more appropriate to monitor it with non-Gaussian methods. Therefore, the proposed EMICA model and the original MICA models involving different functions $G$ are considered for this purpose. In these non-Gaussian monitoring models, 3 dominant ICs are retained and the confidence level is $\alpha = 99\%$. For testing, the three abnormal operating conditions listed in Table 1 are considered, each of which generates 500 samples.

The monitoring details of the first case are depicted in Fig. 3, in which $G_1$-MICA and $G_3$-MICA are useless in detecting the fault, although a few values captured by the $Q$ statistic do exceed the control limit. In contrast, the fault in case 1 can be well detected by the $G_2$-MICA model as well as by the proposed EMICA method. The MICA monitoring model developed with the widely used non-quadratic function $G_1$ is not sensitive to this kind of fault. This points to the fact that the choice of the function $G$ can heavily affect the fault detectability of the resulting monitoring model, and that the ensemble learning strategy makes the proposed monitoring model insensitive to the selection of $G$. This conclusion is also validated by the monitoring results of the second testing case, illustrated in Fig. 4. In this case, only $G_2$-MICA cannot detect the process abnormality. Interestingly, the monitoring details displayed in Fig. 5 show that the fault defined in case 3 can only be detected by the $Q$ statistic in both the $G_1$-MICA and $G_3$-MICA models, or only by the $T^2$ statistic in the $G_2$-MICA model. In contrast, both $BIC_{T^2}$ and $BIC_Q$ in the proposed EMICA model can detect this process abnormality. From these comparisons, it can be concluded that the proposed method provides robust monitoring performance regardless of the choice of $G$.

Given that the adopted MICA for non-Gaussian process monitoring can be viewed as a one-class classification method, the underutilization of reference faults would deteriorate the fault detectability of the resulting model. In addition, as can be clearly seen from Figs. 3 to 5, different MICA models behave differently in monitoring different faults, so the choice of $G$ could be an intricate problem even when some fault information is accessible. Instead of focusing on a single function $G$, the proposed EMICA approach takes full advantage of all three functions and combines multiple base models into an ensemble through Bayesian inference. Therefore, the determination of $G$ is no longer a troublesome issue in the EMICA model, which benefits from the ensemble learning strategy.

Fig. 2. Source variables in the numerical process.

Table 1 Abnormal scenarios in the numerical example.

| Case no. | Description |
| --- | --- |
| 1 | The (5, 2) element of matrix $A$ changes to 0 after the 100-th sample. |
| 2 | A step change of magnitude 5 in the source variable $s_2$ is introduced from sample 101 to the end. |
| 3 | A step change of magnitude 1 is added to $x_1$ starting from the 101-th sample. |

4.2. TE benchmark process

The physical model of the TE process was provided by Downs and Vogel (1993); the process consists of five major unit operations: a reactor, a condenser, a vapor-liquid separator, a recycle compressor, and a stripper. The TE process is a widely used benchmark platform for evaluating process monitoring approaches because of its relatively realistic and complex dynamics. A set of 52 process variables can be measured, which includes 22 continuous variables, 12 manipulated variables, and 19 composition measurements sampled less frequently. Table 2 tabulates a total of 33 continuous measured variables that are used for monitoring. The simulated datasets were downloaded from http://web.mit.edu/braatzgroup; the downloaded datasets have been separated into two parts, the training datasets and the testing datasets, where 960 samples were collected in each testing dataset with the fault condition introduced after the 160-th sampling interval. The TE benchmark process can simulate 21 different abnormal conditions, as described in Table 3. In particular, Faults 3, 9, and 15 have been empirically shown to be difficult to detect (Ge & Song, 2013; Lee et al., 2006; Li & Yang, 2014). Almost all previously mentioned MSPM approaches failed to detect them.

The normal dataset with 960 samples is normalized before the application of MICA, in which 31 PCs from Eq. (1) are retained to find ICs because the eigenvalues corresponding to the last 2 PCs are close to zero. The number of dominant ICs is empirically chosen to be 9 for all monitoring models in order to conduct a fair comparison. The missing alarm rates of the four non-Gaussian process monitoring methods for all 21 faults are given in Table 3, where the bold numbers stand for the minimum missing alarm rates achieved for each fault. As tabulated in Table 3, the difficulties in detecting Faults 3, 9, and 15 are also confirmed. Therefore, they are not pursued in the current study.

It can be easily seen from Table 3 that the proposed EMICA method achieves the smallest missing alarm rates for most of the remaining 18 faults. Although the monitoring performance of EMICA is not the best for Faults 2, 13, and 21, the values are only slightly larger than the minimum ones. The comparisons on the TE process again demonstrate the superiority of the EMICA method for non-Gaussian process monitoring. It needs to be stressed that the fault detectability can be directly influenced by the number of dominant ICs. Therefore, the effect of the number of dominant ICs on fault detectability is studied as well. The average missing alarm rates of these four monitoring models over the 18 considered faults against the number of dominant ICs are plotted in Fig. 6. As can be seen from Fig. 6, the average missing alarm rate of the proposed EMICA method is always the smallest, irrespective of the number of dominant ICs.

5. Conclusions

A novel EMICA approach has been proposed for enhanced non-Gaussian process monitoring by incorporating the ensemble learning strategy and Bayesian inference. The ensemble learning strategy is employed to address the issue caused by the determination of non-quadratic functions in the MICA iterative procedures. Instead of determining a single function $G$, the proposed approach takes all three formulations of $G$ into account and combines multiple base MICA models into an ensemble through Bayesian inference. Benefiting from the ensemble learning strategy, the different monitoring results obtained by different MICA base models can be combined into an ensemble, and an "optimal" decision can thus be achieved in the proposed EMICA model. The enhanced fault detectability of the proposed EMICA method is validated on a numerical non-Gaussian system and the well-known TE benchmark process. It is important to emphasize that the presented work holds promise for wider applications: the presented monitoring scheme is not limited to linear non-Gaussian processes, and the same strategy can be adopted for other MICA-based methods in monitoring nonlinear or dynamic systems. Furthermore, non-Gaussian regression algorithms based on MICA can also utilize the same strategy to obtain latent components in a more robust manner.

Acknowledgement

This work was sponsored by K.C. Wong Magna Fund in Ningbo University, the Natural Science Foundation of China (61503204), the Natural Science Foundation of Zhejiang Province (Y16F030001), and the Science & Technology Planning Project of Zhejiang (2015C31017).

Fig. 3. Monitoring details of case 1 based on (a) $G_1$-MICA, (b) $G_2$-MICA, (c) $G_3$-MICA, and (d) EMICA.

Fig. 4. Monitoring details of case 2 based on (a) $G_1$-MICA, (b) $G_2$-MICA, (c) $G_3$-MICA, and (d) EMICA.

Fig. 5. Monitoring details of case 3 based on (a) $G_1$-MICA, (b) $G_2$-MICA, (c) $G_3$-MICA, and (d) EMICA.

Table 2 Monitored variables in the TE process.

| No. | Description |
| --- | --- |
| 1 | A feed |
| 2 | D feed |
| 3 | E feed |
| 4 | Total feed |
| 5 | Recycle flow |
| 6 | Reactor feed rate |
| 7 | Reactor pressure |
| 8 | Reactor level |
| 9 | Reactor temperature |
| 10 | Purge rate |
| 11 | Product separator temperature |
| 12 | Product separator level |
| 13 | Product separator pressure |
| 14 | Product separator under flow |
| 15 | Stripper level |
| 16 | Stripper pressure |
| 17 | Stripper under flow |
| 18 | Stripper temperature |
| 19 | Stripper steam flow |
| 20 | Compressor work |
| 21 | Reactor cooling water outlet temperature |
| 22 | Separator cooling water outlet temperature |
| 23 | D feed flow valve |
| 24 | E feed flow valve |
| 25 | A feed flow valve |
| 26 | Total feed flow valve |
| 27 | Compressor recycle valve |
| 28 | Purge valve |
| 29 | Separator pot liquid flow valve |
| 30 | Stripper liquid product flow valve |
| 31 | Stripper steam valve |
| 32 | Reactor cooling water flow |
| 33 | Condenser cooling water flow |

Fig. 6. Average missing alarm rates against the number of dominant ICs for (a) $T^2$ (or $BIC_{T^2}$) statistic and (b) $Q$ (or $BIC_Q$) statistic.

Appendix A

The iterative procedure of the MICA algorithm is presented here. First, initialize the matrix $C_n$ as

$$C_n^{T} = [I_d \,\vdots\, 0] \tag{A1}$$

where $I_d$ is the $d$-dimensional identity matrix and $0$ is the $d \times (m - d)$ zero matrix. Unlike the random initialization in the original ICA iterative procedures, this initialization leads to a consistent result. Denoting $c_i$ as the $i$-th column of $C_n$, the MICA algorithm obtains the $i$-th IC (i.e., $s_i$) by the following iterative procedure:

(1) Determine the number of dominant ICs, $d$, that need to be extracted, and initialize the counter to $i = 1$.
(2) Update the vector $c_i$ by $c_i \leftarrow E\{Z g(c_i^{T}Z)\} - E\{g'(c_i^{T}Z)\}\,c_i$, where $g$ and $g'$ are the first-order and second-order derivatives of $G$.
(3) Orthogonalize $c_i$: $c_i \leftarrow c_i - \sum_{j=1}^{i-1}(c_i^{T}c_j)\,c_j$, and then normalize $c_i \leftarrow c_i/\|c_i\|$.
(4) Check whether $c_i$ has converged; if not, go back to step 2; otherwise, go to step 5.
(5) Replace the $i$-th column of $C_n$ with the vector $c_i$, then set $i = i + 1$ and repeat the above procedure until $d$ ICs are obtained.
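The five steps above, together with Eqs. (1), (2), (5), (6) and (A1), can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' code: the function name `mica`, the parameters `max_iter` and `tol`, and the convergence test (the updated vector is parallel to the previous one) are assumptions, and the sketch also assumes that `X` is already zero-mean with a full-rank covariance matrix, so that no near-zero eigenvalues need to be discarded.

```python
import numpy as np

def mica(X, d, g, g_prime, max_iter=200, tol=1e-6):
    """Sketch of MICA. X: (m, n) zero-mean data, rows are variables.

    g, g_prime: first and second derivatives of the non-quadratic function G.
    Returns the demixing matrix W (d, m) and the mixing matrix A (m, d).
    """
    m, n = X.shape
    lam, P = np.linalg.eigh(X @ X.T / (n - 1))
    order = np.argsort(lam)[::-1]              # eigenvalues in descending order
    lam, P = lam[order], P[:, order]
    Q = np.diag(lam ** -0.5) @ P.T             # whitening matrix, Eq. (2)
    Z = Q @ X
    Cn = np.eye(m)[:, :d]                      # C_n^T = [I_d : 0], Eq. (A1)
    for i in range(d):                         # steps 1 and 5: loop over the d ICs
        c = Cn[:, i].copy()
        for _ in range(max_iter):
            y = c @ Z
            c_new = (Z * g(y)).mean(axis=1) - g_prime(y).mean() * c   # step 2
            c_new -= Cn[:, :i] @ (Cn[:, :i].T @ c_new)                # step 3: orthogonalize
            c_new /= np.linalg.norm(c_new)                            # step 3: normalize
            converged = abs(abs(c_new @ c) - 1.0) < tol               # step 4 (assumed test)
            c = c_new
            if converged:
                break
        Cn[:, i] = c
    W = np.diag(lam[:d] ** 0.5) @ Cn.T @ Q                            # Eq. (5)
    A = P @ np.diag(lam ** 0.5) @ Cn @ np.diag(lam[:d] ** -0.5)       # Eq. (6)
    return W, A
```

For $G_1$ with $a_1 = 1$, the derivatives are `g = np.tanh` and `g_prime = lambda y: 1 - np.tanh(y) ** 2`; for $G_3$ they are $g(u) = 4u^3$ and $g'(u) = 12u^2$. Whatever the choice of $G$, the returned matrices satisfy $WA = I_d$ because the columns of $C_n$ are kept orthonormal.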

Table 3 Missing alarm rates (%) in the TE process.

| No. | Description | $G_1$-MICA $T^2$ | $G_1$-MICA $Q$ | $G_2$-MICA $T^2$ | $G_2$-MICA $Q$ | $G_3$-MICA $T^2$ | $G_3$-MICA $Q$ | EMICA $BIC_{T^2}$ | EMICA $BIC_Q$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | A/C feed ratio, B composition constant | 0.25 | 0.13 | 0.25 | 0.13 | 0.13 | 0.50 | 0.25 | 0.13 |
| 2 | B composition, A/C ratio constant | 1.88 | 2.00 | 1.63 | 2.38 | 2.25 | 1.88 | 1.75 | 2.00 |
| 3 | D feed temperature | 98.25 | 93.00 | 97.50 | 95.63 | 98.63 | 97.25 | 98.00 | 95.00 |
| 4 | Reactor cooling water inlet temperature | 20.88 | 7.00 | 11.38 | 4.25 | 8.75 | 10.38 | 3.25 | 3.50 |
| 5 | Condenser cooling water inlet temperature | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 6 | A feed loss | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 7 | C header pressure loss | 0.00 | 0.00 | 1.38 | 0.00 | 0.13 | 0.00 | 0.00 | 0.00 |
| 8 | A, B, C feed composition | 2.38 | 2.38 | 2.25 | 3.75 | 2.25 | 2.88 | 1.88 | 2.38 |
| 9 | D feed temperature | 99.00 | 95.75 | 98.00 | 97.25 | 97.63 | 96.38 | 98.13 | 96.75 |
| 10 | C feed temperature | 19.75 | 22.75 | 15.25 | 19.25 | 15.88 | 22.63 | 13.75 | 19.38 |
| 11 | Reactor cooling water inlet temperature | 50.50 | 34.63 | 48.50 | 32.63 | 38.38 | 36.75 | 37.38 | 32.00 |
| 12 | Condenser cooling water inlet temperature | 0.38 | 0.25 | 0.38 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| 13 | Reaction kinetics | 4.88 | 5.00 | 5.25 | 5.13 | 4.75 | 5.50 | 4.88 | 5.13 |
| 14 | Reactor cooling water valve | 0.13 | 0.00 | 1.88 | 0.00 | 0.13 | 0.00 | 0.13 | 0.00 |
| 15 | Condenser cooling water valve | 96.50 | 94.25 | 95.38 | 94.00 | 96.13 | 97.25 | 95.88 | 94.88 |
| 16 | Unknown | 13.75 | 20.00 | 13.63 | 22.25 | 13.75 | 19.13 | 11.25 | 17.75 |
| 17 | Unknown | 5.38 | 10.38 | 9.13 | 7.13 | 5.00 | 10.38 | 4.38 | 7.75 |
| 18 | Unknown | 10.00 | 10.13 | 9.88 | 10.38 | 10.13 | 10.25 | 10.00 | 10.38 |
| 19 | Unknown | 26.25 | 49.25 | 37.38 | 45.50 | 25.75 | 37.75 | 21.00 | 35.38 |
| 20 | Unknown | 22.25 | 27.25 | 21.13 | 25.50 | 16.00 | 25.63 | 14.25 | 21.50 |
| 21 | Fixed position of valve in stream 4 | 54.50 | 42.00 | 49.50 | 46.88 | 53.50 | 49.00 | 50.50 | 43.38 |
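The text does not spell out how a missing alarm rate is computed. Under the usual definition (the percentage of samples recorded after fault onset whose monitoring index stays within its control limit), the entries of such a table could be obtained as in the sketch below. The function name `missing_alarm_rate` and its arguments are illustrative assumptions; `fault_start=160` in the usage note reflects the TE test sets described in Section 4.2.

```python
import numpy as np

def missing_alarm_rate(index, limit, fault_start):
    """Percentage of post-fault samples whose index does not exceed the limit."""
    faulty = np.asarray(index, dtype=float)[fault_start:]   # samples after fault onset
    return 100.0 * float(np.mean(faulty <= limit))
```

For a TE test set with 960 samples and `fault_start=160`, one undetected sample among the 800 faulty ones gives 0.125, which is in line with the smallest nonzero entries (0.13) reported in the table.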


References

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer-Verlag, New York.
Downs, J. J., & Vogel, E. F. (1993). A plant-wide industrial process control problem. Computers & Chemical Engineering, 17(3), 245–255.
Fan, J., Qin, S. J., & Wang, Y. (2014). Online monitoring of nonlinear multivariate industrial processes using filtering KICA-PCA. Control Engineering Practice, 22(22), 205–216.
Ge, Z., & Song, Z. (2013). Performance-driven ensemble learning ICA model for improved non-Gaussian process monitoring. Chemometrics & Intelligent Laboratory Systems, 123(2), 1–8.
Ge, Z., & Song, Z. (2014). Ensemble independent component regression models and soft sensing application. Chemometrics & Intelligent Laboratory Systems, 130(130), 115–122.
Ge, Z., Gao, F., & Song, Z. (2011). Two-dimensional Bayesian monitoring method for nonlinear multimode processes. Chemical Engineering Science, 66(21), 5173–5183.
Ghosh, K., Ng, Y. S., & Srinivasan, R. (2011). Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault diagnostic methods. Computers & Chemical Engineering, 35(2), 342–355.
Hsu, C. C., Chen, M. C., & Chen, L. S. (2010). A novel process monitoring approach with dynamic independent component analysis. Control Engineering Practice, 18(3), 242–253.
Hyvärinen, A. (1999). Survey on independent component analysis. Neural Computing Surveys, 2(7), 94–128.
Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithm and applications. Neural Networks, 13(4–5), 411–430.
Jiang, Q., Wang, B., & Yan, X. (2015). Multiblock independent component analysis integrated with Hellinger distance and Bayesian inference for non-Gaussian plantwide process monitoring. Industrial & Engineering Chemistry Research, 54(9), 2497–2508.
Lee, J. M., Yoo, C. K., & Lee, I. B. (2004). Statistical process monitoring with independent component analysis. Journal of Process Control, 14(5), 467–485.
Lee, J. M., Qin, S. J., & Lee, I. B. (2006). Fault detection and diagnosis based on modified independent component analysis. AIChE Journal, 52(10), 3501–3514.
Li, N., & Yang, Y. (2014). Ensemble kernel principal component analysis for improved nonlinear process monitoring. Industrial & Engineering Chemistry Research, 54(1), 318–329.
Portnoy, I., Melendez, K., Pinzon, H., & Sanjuan, M. (2016). An improved weighted recursive PCA algorithm for adaptive fault detection. Control Engineering Practice, 50, 69–83.
Ruiz-Cárcel, C., Cao, Y., Mba, D., Lao, L., & Samuel, R. T. (2015). Statistical process monitoring of a multiphase flow facility. Control Engineering Practice, 42, 74–88.
Tong, C., Palazoglu, A., & Yan, X. (2014). Improved ICA for process monitoring based on ensemble learning and Bayesian inference. Chemometrics & Intelligent Laboratory Systems, 135(14), 141–149.
Valle, S., Li, W., & Qin, S. J. (1999). Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods. Industrial & Engineering Chemistry Research, 38(11), 4389–4401.
Wold, S. (1978). Cross-validatory estimation of components in factor and principal components models. Technometrics, 20(4), 397–405.
Yin, S., Ding, S. X., Xie, X., & Luo, H. (2014). A review on basic data-driven approaches for industrial process monitoring. IEEE Transactions on Industrial Electronics, 61(11), 6418–6428.
Yin, S., Li, X., Gao, H., & Kaynak, O. (2015). Data-based techniques focused on modern industry: An overview. IEEE Transactions on Industrial Electronics, 62(1), 657–667.
Zhang, Y., & Zhang, Y. (2010). Fault detection of non-Gaussian processes based on modified independent component analysis. Chemical Engineering Science, 65(16), 4630–4639.
