Ensemble modified independent component analysis for enhanced non-Gaussian process monitoring


Control Engineering Practice 58 (2017) 34–41




Chudong Tong, Ting Lan, Xuhua Shi

Faculty of Electrical Engineering and Computer Science, Ningbo University, Zhejiang 315211, PR China

ARTICLE INFO

Keywords: Modified independent component analysis; Fault detection; Ensemble learning

ABSTRACT

As a multivariate statistical tool, modified independent component analysis (MICA) has drawn considerable attention in non-Gaussian process monitoring because it solves two main problems of the original ICA method. Despite its diverse applications, the choice of the non-quadratic function used in the iterative procedure of the MICA algorithm has always been empirical. Because MICA is an unsupervised modeling method, no direct study can conclusively demonstrate which non-quadratic function is optimal for the general purpose of fault detection, and the selection of non-quadratic functions remains a challenge that has rarely been addressed. Recognizing this issue, and motivated by the strengths of the ensemble learning strategy, a novel ensemble MICA (EMICA) modeling approach is presented for enhancing non-Gaussian process monitoring performance. Instead of focusing on a single non-quadratic function, the proposed method combines multiple base MICA models derived from different non-quadratic functions into an ensemble, and Bayesian inference is employed as a decision fusion method to form a unique monitoring index for fault detection. The enhanced fault detectability of the EMICA method is illustrated on two industrial processes.

1. Introduction

Modern industrial plants have witnessed the rapid development of distributed computer-aided systems and sensor technologies, as well as of operator support through data-driven process monitoring systems, in particular multivariate statistical process monitoring (MSPM) techniques, in recent years (Ruiz-Cárcel, Cao, Mba, Lao, & Samuel, 2015; Yin, Li, Gao, & Kaynak, 2015). Not surprisingly, MSPM based on two fundamental algorithms, principal component analysis (PCA) and partial least squares (PLS), has received considerable attention because first-principle models of modern complex process systems are often costly to develop or practically inaccessible (Portnoy, Melendez, Pinzon, & Sanjuan, 2016; Yin, Ding, Xie, & Luo, 2014). However, the ability of PCA/PLS-based methods to identify faults from data can deteriorate because they assume that the sampled data approximately follow a multivariate Gaussian distribution (Lee, Qin, & Lee, 2006; Lee, Yoo, & Lee, 2004; Zhang & Zhang, 2010). To handle the monitoring of non-Gaussian processes, independent component analysis (ICA), which can extract more useful information from non-Gaussian process data by using higher-order statistics, has been intensively investigated in the last few years (Fan, Qin, & Wang, 2014; Hsu, Chen, & Chen, 2010; Jiang, Wang, & Yan, 2015; Lee et al., 2006; Zhang & Zhang, 2010). In comparison to PCA, ICA not only de-correlates the data but also makes the projected data as independent as possible, and thus gleans more essential features from the observed measurements.

Among the diverse applications of ICA-based non-Gaussian process monitoring, the FastICA iterative algorithm proposed by Hyvärinen (1999) is usually employed as a "default" method for model construction because it can greatly reduce the computation time. However, the FastICA algorithm has some drawbacks in practical applications. First, different solutions may be obtained because of its random initialization step, which might result in unstable monitoring models. Second, unlike the PCA model, which sorts the principal components (PCs) by their variance, a proper ordering of the independent components (ICs) is still a well-known issue. To tackle these challenges, Lee et al. (2006) developed a modified ICA (MICA) algorithm that extracts a small number of ordered ICs and also produces a consistent result. The basic idea is to first use the normalized PCs from the PCA model as an initial estimate for the ICs and then to apply the FastICA algorithm to update the dominant ICs while maintaining the variance. Furthermore, Zhang and Zhang (2010) adopted the particle swarm optimization (PSO) method for estimating ICs, and the importance of ICs is then sorted by

Corresponding author. E-mail address: [email protected] (C. Tong).

http://dx.doi.org/10.1016/j.conengprac.2016.09.014 Received 7 April 2016; Received in revised form 16 August 2016; Accepted 24 September 2016 0967-0661/ © 2016 Elsevier Ltd. All rights reserved.


$$T = P^{T}X \tag{1}$$

their role in reconstructing the original data. Although the PSO-based ICA algorithm reduces the risk of converging to a local minimum, it increases the computation time compared with the FastICA algorithm. Recently, Ge and Song (2013) developed a performance-driven ensemble learning ICA model for non-Gaussian process monitoring. The ensemble learning approach is used to improve the stability of the FastICA algorithm, and the dominant ICs are determined by a performance-driven method that involves reference abnormal datasets. The essence of ensemble learning is to combine multiple solutions from different models into a single one, which is expected to give a better result than any individual solution. Benefiting from this, the ensemble learning technique has become quite popular in the field of MSPM over the last several years (Ge & Song, 2014; Li & Yang, 2014; Tong, Palazoglu, & Yan, 2014). Nevertheless, it should be stressed that all the ICA modeling methods mentioned above involve a measure of non-Gaussianity to reflect the statistical independence of the ICs. The negentropy, which is based on the information-theoretic quantity of differential entropy, usually serves as a good estimate of the non-Gaussianity of a random variable. However, calculating negentropy requires an estimate of the probability density function, which is sometimes unobtainable. Fortunately, Hyvärinen (1999) formulated a feasible and reliable approximation of negentropy using a proper non-quadratic function. Given that three suggestions for the non-quadratic function are available in the literature, the stability of the ICA model cannot be ensured when different non-quadratic functions are employed, and the resulting monitoring performance is affected as well.
Generally, the modeling procedures in the ICA-based process monitoring method, as in other MSPM approaches, are unsupervised, which means that only a dataset sampled under normal operating conditions is needed. Without reference to abnormal data, a proper selection of the non-quadratic function is not possible. Moreover, because the available samples from all possible faulty conditions are highly limited, a single empirical choice of the non-quadratic function can leave some specific faults undetected. From this viewpoint, a single non-quadratic function cannot be effective for all kinds of faults. Therefore, the selection of non-quadratic functions is a serious problem that remains unsolved. Recognition of this issue motivates the current study, which integrates the ensemble learning strategy into the MICA algorithm. As mentioned previously, the MICA modeling method addresses the two challenges of the original ICA iterative procedure. Additionally, MICA extracts only the few dominant ICs needed for process monitoring instead of all ICs, so the computational load is reduced (Lee et al., 2006). With the ensemble learning strategy, the proposed ensemble MICA (EMICA) method combines multiple base MICA models resulting from different non-quadratic functions into an ensemble through Bayesian inference based decision fusion (Ghosh, Ng, & Srinivasan, 2011). Since any of these non-quadratic functions could be useful for fault detection, a feasible solution is to take advantage of all of them and produce an ensemble result for enhanced non-Gaussian process monitoring. Unlike the traditional MICA-based monitoring method, multiple base MICA monitoring models using different non-quadratic functions are first developed; the Bayesian inference strategy is then employed for online fault alarm decision fusion, which generates an ensemble probabilistic index from the multiple monitoring statistics.

where $X$ contains $n$ samples of $m$ measured variables, $T \in R^{m \times n}$ consists of the extracted PCs, $P \in R^{m \times m}$ is composed of the eigenvectors of the covariance matrix $XX^{T}/(n-1) = P\Lambda P^{T}$, and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_m\}$. The last few elements of $\Lambda$ are sometimes close to zero because of collinearity in the measurements; these can be excluded, although it is highly suggested to include as many eigenvalues as possible. The extracted PCs are whitened as follows:

$$Z = \Lambda^{-1/2}T = \Lambda^{-1/2}P^{T}X = QX \tag{2}$$

where $Q = \Lambda^{-1/2}P^{T}$. The whitened components $Z$ then serve as an initial estimate for the ICs. The objective of the MICA algorithm is to update a matrix $C \in R^{m \times d}$ satisfying $C^{T}C = D$ such that the extracted components

$$S = C^{T}Z \tag{3}$$

become as independent of each other as possible, where $D = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_d\}$. The requirement $SS^{T}/(n-1) = D$ makes the variance of each IC in $S$ the same as that of the corresponding PC in PCA. Therefore, a proper ordering of the ICs in MICA can be realized according to their variance. $S$ can be normalized by

$$S_n = D^{-1/2}S = D^{-1/2}C^{T}Z = C_n^{T}Z \tag{4}$$

with $C_n^{T} = D^{-1/2}C^{T}$ and $C_n^{T}C_n = I$; the main task of MICA is thus reduced to finding the matrix $C_n$. The demixing matrix $W \in R^{d \times m}$ and mixing matrix $A \in R^{m \times d}$ are given as

$$W = D^{1/2}C_n^{T}Q = D^{1/2}C_n^{T}\Lambda^{-1/2}P^{T} \tag{5}$$

$$A = P\Lambda^{1/2}C_n D^{-1/2} \tag{6}$$

where $WA = I_d \in R^{d \times d}$. Given that the variance of each IC in $S$ is the same as that of the corresponding PC in PCA, the number of retained ICs, $d$, can be determined by criteria used in PCA (Valle, Li, & Qin, 1999; Wold, 1978), for example, the cumulative percent variance (CPV). If the process data strictly follow a Gaussian distribution, $C_n^{T}$ reduces to $[I_d \,\vdots\, 0]$, which means that $S$ equals the first $d$ PCs in $T$. Therefore, PCA can be considered a special case of MICA, and the updating of $C_n$ can be started from a fixed initialization, i.e., $[I_d \,\vdots\, 0]$. In the MICA iterative procedure provided in the Appendix, the requirement of statistical independence of the ICs needs a measure of non-Gaussianity. Generally, the measure of non-Gaussianity is approximated by

$$J(y) = [E\{G(y)\} - E\{G(v)\}]^{2} \tag{7}$$

where $y$ is scaled to be of zero mean and unit variance, $v$ is a Gaussian variable of zero mean and unit variance, and $G$ is known as the non-quadratic function. Hyvärinen and Oja (2000) introduced three non-quadratic functions:

$$G_1(u) = \frac{1}{a_1}\log\cosh(a_1 u) \tag{8}$$

$$G_2(u) = \exp(-a_2 u^{2}/2) \tag{9}$$

$$G_3(u) = u^{4} \tag{10}$$

2. MICA based process monitoring

where $1 \le a_1 \le 2$ and $a_2 \approx 1$. It has been empirically shown that $G_1$ is a good contrast function, and it is usually adopted for ICA model construction. However, as will be illustrated later, $G_1$ is not always the best choice for improving fault detectability, because MICA is an unsupervised modeling method. The selection of $G$ strongly influences the subsequent monitoring results. Without enough prior knowledge, the optimal choice of function $G$ is still an open problem.
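As a rough illustration of Eq. (7) with the three contrast functions of Eqs. (8)-(10), the following Python sketch estimates the non-Gaussianity measure from samples. It is not the authors' implementation: the values `A1 = A2 = 1`, the Monte Carlo estimate of E{G(v)}, and the function name `negentropy_approx` are all assumptions made for this example.

```python
import numpy as np

A1, A2 = 1.0, 1.0  # assumed values satisfying 1 <= a1 <= 2 and a2 ≈ 1


def G1(u):
    return np.log(np.cosh(A1 * u)) / A1      # Eq. (8)


def G2(u):
    return np.exp(-A2 * u ** 2 / 2)          # Eq. (9)


def G3(u):
    return u ** 4                            # Eq. (10)


def negentropy_approx(y, G, n_ref=200_000, seed=0):
    """Sample estimate of J(y) = [E{G(y)} - E{G(v)}]^2 from Eq. (7).

    E{G(v)} is estimated by Monte Carlo with a standard normal
    reference sample (an assumption made to keep the sketch generic).
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()             # zero mean, unit variance
    v = np.random.default_rng(seed).standard_normal(n_ref)
    return float((G(y).mean() - G(v).mean()) ** 2)


rng = np.random.default_rng(1)
uniform_sample = rng.uniform(-1.0, 1.0, 50_000)   # clearly non-Gaussian
gauss_sample = rng.standard_normal(50_000)        # Gaussian
for name, G in (("G1", G1), ("G2", G2), ("G3", G3)):
    print(name,
          negentropy_approx(uniform_sample, G),
          negentropy_approx(gauss_sample, G))
```

For each contrast function the uniform sample should score well above the Gaussian one, but the three functions weight the departure from Gaussianity differently, which is the reason the choice of $G$ matters.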

2.1. MICA algorithm

2.2. Process monitoring based on MICA algorithm

The first step of the MICA method is to use PCA to extract all available PCs from the data $X \in R^{m \times n}$ according to Eq. (1).

Similar to the PCA-based fault detection method, the implementation of MICA for fault detection also depends on two statistics (i.e., $T^{2}$ and $Q$). The $T^{2}$ and $Q$ statistics for monitoring a newly sampled data point $x \in R^{m \times 1}$ are formulated as follows:

$$T^{2} = s^{T}D^{-1}s \tag{11}$$

$$Q = e^{T}e \tag{12}$$

where $s = Wx$ and $e = x - As$. Unlike PCA-based monitoring, the control limits for the $T^{2}$ and $Q$ statistics cannot be determined from a known probability density function. The details are well discussed in Lee et al. (2006).

3. Ensemble modified ICA (EMICA) for process monitoring

To address the issue raised by the selection of function $G$, a novel EMICA-based monitoring method is proposed and depicted schematically in Fig. 1; a detailed introduction is given below. In the offline modeling phase, the normalized PCs derived from the PCA model are used as an initial point for updating the ICs. The MICA algorithm given in the Appendix goes one step further and transforms the whitened components into a number of ICs using three different non-quadratic functions (i.e., $G_1$, $G_2$, and $G_3$). As shown in Fig. 1, the $i$-th MICA model associated with $G_i$ is given as

$$S_i = W_i X \tag{13}$$

$$X = A_i S_i + E_i \tag{14}$$

where $i \in \{1, 2, 3\}$. The control limits (i.e., $T^{2}_{i,\lim}$ and $Q_{i,\lim}$) are also calculated for the $i$-th MICA model.

In the online monitoring phase, when a new data point $x$ becomes available, it is first scaled using the estimated mean and variance derived from $X$. The two monitoring statistics in each MICA model are then computed using Eqs. (11) and (12), respectively. Three different sets of monitoring results are obtained in this step, and the question is how to turn the different monitoring statistics into an ensemble index. Among the decision fusion techniques reviewed in Ghosh et al. (2011), Bayesian inference can produce an intuitive and practical fusion (Bishop, 2006). Because of this, Bayesian inference has been widely used elsewhere in recent years (Ge & Song, 2013, 2014; Ge, Gao, & Song, 2011; Tong et al., 2014). Therefore, Bayesian inference is utilized to combine the multiple statistics into a unique index in a probabilistic manner. The fault probability of $x$ monitored by the $T_i^{2}$ statistic in the $i$-th MICA model is defined as

$$P_{T_i^{2}}(F|x) = \frac{P_{T_i^{2}}(x|F)\,P_{T_i^{2}}(F)}{P_{T_i^{2}}(x)} \tag{15}$$

with the probability $P_{T_i^{2}}(x)$ defined as

$$P_{T_i^{2}}(x) = P_{T_i^{2}}(x|N)\,P_{T_i^{2}}(N) + P_{T_i^{2}}(x|F)\,P_{T_i^{2}}(F) \tag{16}$$

where $N$ and $F$ represent normal and abnormal operating conditions, respectively. $P_{T_i^{2}}(N)$ and $P_{T_i^{2}}(F)$ can be simply assigned as $\alpha$ and $1 - \alpha$, respectively, where $\alpha$ is the confidence level used for calculating the control limits. The conditional probabilities $P_{T_i^{2}}(x|N)$ and $P_{T_i^{2}}(x|F)$ can be calculated as follows (Ge et al., 2011; Tong et al., 2014):

$$P_{T_i^{2}}(x|N) = \exp\left(-\frac{T_i^{2}}{T_{i,\lim}^{2}}\right), \qquad P_{T_i^{2}}(x|F) = \exp\left(-\frac{T_{i,\lim}^{2}}{T_i^{2}}\right) \tag{17}$$

The ensemble probabilistic index is then defined using a weighted form, given as

$$BIC_{T^{2}} = \sum_{i=1}^{3}\left\{\frac{P_{T_i^{2}}(x|F)\,P_{T_i^{2}}(F|x)}{\sum_{i=1}^{3}P_{T_i^{2}}(x|F)}\right\} \tag{18}$$

to combine the multiple monitoring results into an ensemble. Similarly, the final probabilistic index for the $Q$ statistic, $BIC_Q$, can be obtained in the same way as $BIC_{T^{2}}$. Generally, the EMICA model triggers a fault alarm when the value of either $BIC_{T^{2}}$ or $BIC_Q$ exceeds the control limit $1 - \alpha$. Otherwise, the monitored process is considered to be fault-free.

Given that three MICA models based on different non-quadratic functions are built in parallel, the proper selection of function $G$ is no longer an issue in the proposed EMICA monitoring method. Instead of determining a specific $G$ for MICA modeling through prior knowledge or other analytic strategies, the modeling phase of the EMICA approach accounts for all available non-quadratic functions. The EMICA is therefore expected to be more feasible and applicable in monitoring non-Gaussian processes. Moreover, it needs to be stressed that the proposed method is not limited to these three non-quadratic functions: if more formulations of function $G$ become available, the proposed method can combine them into the ensemble in a similar way.
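The monitoring statistics of Eqs. (11)-(12) and the Bayesian fusion of Eqs. (15)-(18) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, the small floor on the statistic, and the zero-weight guard are assumptions added for numerical safety, and the control limits are taken as given inputs.

```python
import numpy as np


def t2_and_q(x, W, A, D):
    """T^2 and Q of Eqs. (11)-(12) for one sample x (length m).

    W is the (d, m) demixing matrix, A the (m, d) mixing matrix and
    D a length-d vector holding the diagonal of the matrix D.
    """
    s = W @ x
    e = x - A @ s
    return float(s @ (s / D)), float(e @ e)


def fault_probability(stat, limit, alpha=0.99):
    """Return P(x|F) and P(F|x) from Eqs. (15)-(17) for one base model."""
    stat = max(float(stat), 1e-12)           # assumed floor, avoids 0-division
    p_x_n = np.exp(-stat / limit)            # P(x|N)
    p_x_f = np.exp(-limit / stat)            # P(x|F)
    p_x = p_x_n * alpha + p_x_f * (1.0 - alpha)
    return p_x_f, p_x_f * (1.0 - alpha) / p_x


def bic(stats, limits, alpha=0.99):
    """Ensemble index of Eq. (18) over the base models."""
    pairs = [fault_probability(s, l, alpha) for s, l in zip(stats, limits)]
    weights = np.array([p[0] for p in pairs])
    posteriors = np.array([p[1] for p in pairs])
    if weights.sum() == 0.0:                 # all statistics essentially zero
        return 0.0
    return float(np.sum(weights * posteriors) / np.sum(weights))


# Hypothetical statistics from three base models, all with limit 5.0
alpha = 0.99
print(bic([5.0, 5.0, 5.0], [5.0, 5.0, 5.0], alpha))    # on the limit
print(bic([50.0, 60.0, 70.0], [5.0, 5.0, 5.0], alpha))  # well above the limit
```

When every base statistic sits exactly on its control limit, the two conditional probabilities coincide and the index equals $1 - \alpha$, which is why $1 - \alpha$ acts as the alarm threshold for the fused index.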

4. Case studies

The proficiency of the EMICA-based monitoring model is demonstrated on two industrial processes: a numerical non-Gaussian system and the Tennessee Eastman (TE) benchmark process. Illustrations and discussions are provided by comparing the fault detectability of G1-MICA, G2-MICA, G3-MICA, and the proposed method.

Fig. 1. Flowchart of the proposed ICA-based monitoring scheme.

of all three functions and combines multiple base models into an ensemble one through Bayesian inference. Therefore, the determination of G is no longer a troublesome issue in the EMICA model, benefiting from the ensemble learning strategy.

4.1. Numerical non-Gaussian example

4.2. TE benchmark process

The following eight-variable system designed by Tong et al. (2014) is considered:

The physical model of the TE process was provided by Downs and Vogel (1993). It consists of five major unit operations: a reactor, a condenser, a vapor-liquid separator, a recycle compressor, and a stripper. The TE process is a widely used benchmark for evaluating process monitoring approaches because of its relatively realistic and complex dynamics. A set of 52 process variables can be measured, comprising 22 continuous variables, 11 manipulated variables, and 19 composition measurements sampled less frequently. Table 2 lists the 33 continuous measured variables that are used for monitoring. The simulated datasets were downloaded from http://web.mit.edu/braatzgroup and have been separated into two parts, the training datasets and the testing datasets, where 960 samples were collected in each testing dataset with the fault condition introduced after the 160-th sampling interval. The TE benchmark process can simulate 21 different abnormal conditions, as described in Table 3. In particular, Faults 3, 9, and 15 have been empirically shown to be difficult to detect (Ge & Song, 2013; Lee et al., 2006; Li & Yang, 2014); almost all previously mentioned MSPM approaches failed to detect them. The normal dataset with 960 samples is normalized before the application of MICA, in which 31 PCs from Eq. (1) are retained to find the ICs because the eigenvalues corresponding to the last 2 PCs are close to zero. The number of dominant ICs is empirically chosen to be 9 for all monitoring models in order to conduct a fair comparison. The missing alarm rates of the four non-Gaussian process monitoring methods for all 21 faults are given in Table 3, where the bold numbers stand for the minimum missing alarm rates achieved for each fault. As Table 3 shows, the difficulty in detecting Faults 3, 9, and 15 is confirmed. Therefore, they are not pursued in the current study.
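The missing alarm rate used for this comparison can be computed as in the short Python sketch below. It assumes that the rate is the percentage of faulty samples whose monitoring index stays at or below its control limit, and that the fault is present from the 161st sample onward (zero-based index 160); the function name and the synthetic index values are hypothetical.

```python
import numpy as np


def missing_alarm_rate(index_values, limit, fault_start=160):
    """Percentage of faulty samples that do not exceed the control limit.

    index_values holds one monitoring index (e.g. T^2, Q or a BIC) per
    sample of a testing dataset; samples before fault_start are normal.
    """
    faulty = np.asarray(index_values, dtype=float)[fault_start:]
    return 100.0 * float(np.mean(faulty <= limit))


# Hypothetical index trace: the fault is only picked up from sample 200 on
trace = np.zeros(960)
trace[200:] = 1.0
print(missing_alarm_rate(trace, limit=0.5))
```

With 800 faulty samples per testing dataset, one missed sample corresponds to 0.125%, which is consistent with the smallest non-zero entries reported for this process.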
It can be seen from Table 3 that the proposed EMICA method achieves the smallest missing alarm rates for most of the remaining 18 faults. Although the monitoring performance of the EMICA is not the best for Faults 2, 13, and 21, the values are only slightly larger than the minimum ones. The comparisons on the TE process again demonstrate the superiority of the EMICA method for non-Gaussian process monitoring. It needs to be stressed that the fault detectability can be directly influenced by the number of dominant ICs. Therefore, the effect of the number of dominant ICs on fault detectability is studied as

$$x = As + v = \begin{bmatrix} 0.95 & 0.82 & 0.94 \\ 0.23 & 0.45 & 0.92 \\ 0.61 & 0.62 & 0.41 \\ 0.49 & 0.79 & 0.89 \\ 0.89 & 0.92 & 0.06 \\ 0.76 & 0.74 & 0.35 \\ 0.46 & 0.18 & 0.81 \\ 0.02 & 0.41 & 0.01 \end{bmatrix} s + v \tag{19}$$

To simulate a non-Gaussian process, the source variables $s$ are generated as follows:

$$\begin{aligned} s_1(k) &= 2\cos(0.08k)\sin(0.06k) \\ s_2(k) &= \sin(0.3k) + 2\cos(0.1k) \\ s_3(k) &= (\mathrm{rem}(k, 27) - 13)/9 \end{aligned} \tag{20}$$

where the noise $v$ follows a Gaussian distribution with zero mean and variance 0.01, $k$ represents the sampling step, and the rem function returns the remainder after division. First, 1000 samples are generated under the normal operating condition. Given that the studied system is driven by non-Gaussian source variables, as can be clearly seen from Fig. 2, it is more appropriately monitored by non-Gaussian methods. Therefore, the proposed EMICA model and the original MICA models involving different functions $G$ are considered for this purpose. In these non-Gaussian monitoring models, 3 dominant ICs are retained and the confidence level is $\alpha = 99\%$. For testing, the three abnormal operating conditions listed in Table 1 are considered, each of which generates 500 samples. The monitoring details of the first case are depicted in Fig. 3, in which G1-MICA and G3-MICA are ineffective in detecting the fault, although a few values of the $Q$ statistic do exceed the control limit. On the contrary, the fault in case 1 is well detected by the G2-MICA model as well as by the proposed EMICA method. The MICA monitoring model built on the widely used non-quadratic function $G_1$ is not sensitive to this kind of fault. This points to the fact that the choice of function $G$ can heavily affect the fault detectability of the resulting monitoring model, and that the ensemble learning strategy makes the proposed monitoring model insensitive to the selection of function $G$. This conclusion is also supported by the monitoring results of the second testing case, illustrated in Fig. 4. In this case, only G2-MICA cannot detect the process abnormality. Interestingly, the monitoring details displayed in Fig. 5 show that the fault defined in case 3 is detected only by the $Q$ statistic in both the G1-MICA and G3-MICA models, and only by the $T^{2}$ statistic in the G2-MICA model.
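For readers who wish to reproduce the normal operating data of Eqs. (19) and (20), a minimal Python sketch is given below. Several details are assumptions rather than statements from the paper: the sampling index is taken to start at k = 1, the noise is drawn independently for each of the eight variables, and the random seed and the function name `simulate_normal` are arbitrary.

```python
import numpy as np

# Mixing matrix A of Eq. (19), 8 measured variables by 3 sources
A_MIX = np.array([
    [0.95, 0.82, 0.94],
    [0.23, 0.45, 0.92],
    [0.61, 0.62, 0.41],
    [0.49, 0.79, 0.89],
    [0.89, 0.92, 0.06],
    [0.76, 0.74, 0.35],
    [0.46, 0.18, 0.81],
    [0.02, 0.41, 0.01],
])


def simulate_normal(n_samples=1000, seed=0):
    """Generate x = A s + v with the sources of Eq. (20).

    Returns X of shape (8, n_samples) and the sources s of shape
    (3, n_samples). The noise variance is 0.01 (standard deviation 0.1).
    """
    k = np.arange(1, n_samples + 1)          # assumed to start at k = 1
    s = np.vstack([
        2.0 * np.cos(0.08 * k) * np.sin(0.06 * k),
        np.sin(0.3 * k) + 2.0 * np.cos(0.1 * k),
        (np.remainder(k, 27) - 13) / 9.0,
    ])
    v = np.random.default_rng(seed).normal(0.0, 0.1, size=(8, n_samples))
    return A_MIX @ s + v, s


X_train, s_train = simulate_normal()
print(X_train.shape, s_train.shape)
```

The third source is a sawtooth bounded by ±13/9, so its distribution is close to uniform, while the first two are products and sums of sinusoids; none of them is Gaussian.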
In contrast, both $BIC_{T^{2}}$ and $BIC_Q$ in the proposed EMICA model can detect this process abnormality. These comparisons show that the proposed method provides robust monitoring performance regardless of the choice of function $G$. Given that the MICA adopted for non-Gaussian process monitoring can be viewed as a one-class classification method, the absence of reference faults in modeling can deteriorate the fault detectability of the resulting model. In addition, as can be clearly seen from Figs. 3 to 5, different MICA models behave differently in monitoring different faults, so choosing $G$ can remain an intricate problem even when some information about the faults is accessible. Instead of focusing on a single function G, the proposed EMICA approach takes full advantage

Fig. 2. Source variables in the numerical process.

Table 1 Abnormal scenarios in the numerical example.

| Case no. | Description |
|---|---|
| 1 | The (5, 2) element of matrix A changes to 0 after the 100-th sample. |
| 2 | A step change of magnitude 5 in the source variable s2 is introduced from sample 101 to the end. |
| 3 | A step change of magnitude 1 is added to x1 starting from the 101-st sample. |

well. The average missing alarm rates of these four monitoring models for the 18 considered faults against the number of dominant ICs are plotted in Fig. 6. As can be seen from Fig. 6, the average missing alarm rate of the proposed EMICA method is always the smallest, irrespective of the number of dominant ICs.

5. Conclusions

A novel EMICA approach has been proposed for enhanced non-Gaussian process monitoring by incorporating the ensemble learning strategy and Bayesian inference. The ensemble learning strategy is employed to address the issue caused by the choice of non-quadratic function in the MICA iterative procedure. Instead of determining a single function G, the proposed approach takes all three formulations of G into account and combines multiple base MICA models into an ensemble through Bayesian inference. Benefiting from the ensemble learning strategy, the different monitoring results obtained by the different MICA base models are combined into an ensemble, so that an "optimal" decision can be achieved in the proposed EMICA model. The enhanced fault detectability of the proposed EMICA method is validated on a numerical non-Gaussian system and the well-known TE benchmark process. It is important to emphasize that the presented work holds promise for wider applications. The monitoring scheme is not limited to linear non-Gaussian processes; the same strategy can be adopted for other MICA-based methods in monitoring nonlinear or dynamic systems. Furthermore, non-Gaussian regression algorithms based on MICA can also use the same strategy to obtain latent components in a more robust manner.

Acknowledgement

This work was sponsored by K.C. Wong Magna Fund in Ningbo University, the Natural Science Foundation of China (61503204), the Natural Science Foundation of Zhejiang Province (Y16F030001), and the Science & Technology Planning Project of Zhejiang (2015C31017).

Appendix A

The iterative procedure of the MICA algorithm is presented here. First, initialize the matrix $C_n$ as

$$C_n^{T} = [I_d \,\vdots\, 0] \tag{A1}$$

where I d is the d-dimensional identity matrix and 0 is the d × (m − d ) zero matrix. Unlike random initialization in the original ICA iterative

Fig. 3. Monitoring details of case 1 based on (a) G1-MICA, (b) G2-MICA, (c) G3-MICA, and (d) EMICA.

Fig. 4. Monitoring details of case 2 based on (a) G1-MICA, (b) G2-MICA, (c) G3-MICA, and (d) EMICA.

Fig. 5. Monitoring details of case 3 based on (a) G1-MICA, (b) G2-MICA, (c) G3-MICA, and (d) EMICA.

Table 2 Monitored variables in the TE process.

| No. | Description |
|---|---|
| 1 | A feed |
| 2 | D feed |
| 3 | E feed |
| 4 | Total feed |
| 5 | Recycle flow |
| 6 | Reactor feed rate |
| 7 | Reactor pressure |
| 8 | Reactor level |
| 9 | Reactor temperature |
| 10 | Purge rate |
| 11 | Product separator temperature |
| 12 | Product separator level |
| 13 | Product separator pressure |
| 14 | Product separator under flow |
| 15 | Stripper level |
| 16 | Stripper pressure |
| 17 | Stripper under flow |
| 18 | Stripper temperature |
| 19 | Stripper steam flow |
| 20 | Compressor work |
| 21 | Reactor cooling water outlet temperature |
| 22 | Separator cooling water outlet temperature |
| 23 | D feed flow valve |
| 24 | E feed flow valve |
| 25 | A feed flow valve |
| 26 | Total feed flow valve |
| 27 | Compressor recycle valve |
| 28 | Purge valve |
| 29 | Separator pot liquid flow valve |
| 30 | Stripper liquid product flow valve |
| 31 | Stripper steam valve |
| 32 | Reactor cooling water flow |
| 33 | Condenser cooling water flow |

Fig. 6. Average missing alarm rates against the number of dominant ICs for (a) T 2 (or BIC T 2 ) statistic and (b) Q (or BICQ ) statistic.

procedures, the initialization used here leads to a consistent result. Denoting $c_i$ as the $i$-th column of $C_n$, the MICA algorithm obtains the $i$-th IC (i.e., $s_i$) by the following iterative procedure:

(1) Determine the number of dominant ICs, $d$, that needs to be extracted, and initialize the counter to $i = 1$.
(2) Update the vector $c_i$ by $c_i \leftarrow E\{Z g(c_i^{T}Z)\} - E\{g'(c_i^{T}Z)\}\,c_i$, where $g$ and $g'$ are the first-order and second-order derivatives of $G$.
(3) Orthogonalize $c_i$ by $c_i \leftarrow c_i - \sum_{j=1}^{i-1}(c_i^{T}c_j)\,c_j$, and then normalize it by $c_i \leftarrow c_i/\|c_i\|$.
(4) Check whether $c_i$ has converged; if not, go back to step 2; otherwise, go to step 5.
(5) Replace the $i$-th column of $C_n$ with the vector $c_i$, then set $i = i + 1$ and repeat the above steps until $d$ ICs are obtained.
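The five steps above, together with Eqs. (1)-(6), can be put together in the following Python sketch. It is a hedged illustration rather than the authors' implementation: the convergence test, the tolerance values, the relative threshold for discarding near-zero eigenvalues, the choice $a_1 = a_2 = 1$, and the names `mica` and `CONTRASTS` are all assumptions.

```python
import numpy as np

# (g, g') pairs: first and second derivatives of G1, G2, G3 with a1 = a2 = 1
CONTRASTS = {
    "G1": (lambda u: np.tanh(u), lambda u: 1.0 - np.tanh(u) ** 2),
    "G2": (lambda u: -u * np.exp(-u ** 2 / 2),
           lambda u: (u ** 2 - 1.0) * np.exp(-u ** 2 / 2)),
    "G3": (lambda u: 4.0 * u ** 3, lambda u: 12.0 * u ** 2),
}


def mica(X, d, contrast="G1", max_iter=200, tol=1e-8, eig_tol=1e-10):
    """MICA sketch for zero-mean data X of shape (m, n), variables in rows.

    Returns the demixing matrix W (d, m), the mixing matrix A (m, d)
    and the diagonal of D (the d largest PCA eigenvalues).
    """
    g, g_prime = CONTRASTS[contrast]
    n = X.shape[1]
    lam, P = np.linalg.eigh(X @ X.T / (n - 1))
    order = np.argsort(lam)[::-1]                 # descending eigenvalues
    lam, P = lam[order], P[:, order]
    keep = lam > eig_tol * lam[0]                 # drop near-zero eigenvalues
    lam, P = lam[keep], P[:, keep]
    Q = (P / np.sqrt(lam)).T                      # Q = Lambda^{-1/2} P^T
    Z = Q @ X                                     # whitened PCs, Eq. (2)
    Cn = np.eye(len(lam))[:, :d]                  # C_n^T = [I_d : 0], Eq. (A1)
    for i in range(d):
        c = Cn[:, i].copy()
        for _ in range(max_iter):
            y = c @ Z
            c_new = (Z * g(y)).mean(axis=1) - g_prime(y).mean() * c   # step 2
            c_new -= Cn[:, :i] @ (Cn[:, :i].T @ c_new)                # step 3
            c_new /= np.linalg.norm(c_new)
            converged = abs(abs(c_new @ c) - 1.0) < tol               # step 4
            c = c_new
            if converged:
                break
        Cn[:, i] = c                                                  # step 5
    D = lam[:d]
    W = np.sqrt(D)[:, None] * (Cn.T @ Q)          # Eq. (5)
    A = (P * np.sqrt(lam)) @ Cn / np.sqrt(D)      # Eq. (6)
    return W, A, D


# Small synthetic check with uniform (non-Gaussian) sources
rng = np.random.default_rng(0)
S0 = rng.uniform(-1.0, 1.0, size=(3, 500))
X_demo = rng.normal(size=(6, 3)) @ S0 + 0.1 * rng.normal(size=(6, 500))
X_demo -= X_demo.mean(axis=1, keepdims=True)
W_g1, A_g1, D_g1 = mica(X_demo, d=3, contrast="G1")
print(np.round(W_g1 @ A_g1, 6))
```

Because each $c_i$ is orthogonalized against the previously extracted columns and then normalized, $C_n^{T}C_n = I$ holds whether or not the inner loop has converged, so $WA = I_d$ and the extracted ICs keep the variances in $D$. Building one such model per entry of `CONTRASTS` gives the three base models that EMICA fuses.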

Table 3 Missing alarm rates in the TE process.

| No. | Description | G1-MICA T2 | G1-MICA Q | G2-MICA T2 | G2-MICA Q | G3-MICA T2 | G3-MICA Q | EMICA BIC T2 | EMICA BIC Q |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A/C feed ratio, B composition constant | 0.25 | 0.13 | 0.25 | 0.13 | 0.13 | 0.50 | 0.25 | 0.13 |
| 2 | B composition, A/C ratio constant | 1.88 | 2.00 | 1.63 | 2.38 | 2.25 | 1.88 | 1.75 | 2.00 |
| 3 | D feed temperature | 98.25 | 93.00 | 97.50 | 95.63 | 98.63 | 97.25 | 98.00 | 95.00 |
| 4 | Reactor cooling water inlet temperature | 20.88 | 7.00 | 11.38 | 4.25 | 8.75 | 10.38 | 3.25 | 3.50 |
| 5 | Condenser cooling water inlet temperature | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 6 | A feed loss | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 7 | C header pressure loss | 0.00 | 0.00 | 1.38 | 0.00 | 0.13 | 0.00 | 0.00 | 0.00 |
| 8 | A, B, C feed composition | 2.38 | 2.38 | 2.25 | 3.75 | 2.25 | 2.88 | 1.88 | 2.38 |
| 9 | D feed temperature | 99.00 | 95.75 | 98.00 | 97.25 | 97.63 | 96.38 | 98.13 | 96.75 |
| 10 | C feed temperature | 19.75 | 22.75 | 15.25 | 19.25 | 15.88 | 22.63 | 13.75 | 19.38 |
| 11 | Reactor cooling water inlet temperature | 50.50 | 34.63 | 48.50 | 32.63 | 38.38 | 36.75 | 37.38 | 32.00 |
| 12 | Condenser cooling water inlet temperature | 0.38 | 0.25 | 0.38 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| 13 | Reaction kinetics | 4.88 | 5.00 | 5.25 | 5.13 | 4.75 | 5.50 | 4.88 | 5.13 |
| 14 | Reactor cooling water valve | 0.13 | 0.00 | 1.88 | 0.00 | 0.13 | 0.00 | 0.13 | 0.00 |
| 15 | Condenser cooling water valve | 96.50 | 94.25 | 95.38 | 94.00 | 96.13 | 97.25 | 95.88 | 94.88 |
| 16 | Unknown | 13.75 | 20.00 | 13.63 | 22.25 | 13.75 | 19.13 | 11.25 | 17.75 |
| 17 | Unknown | 5.38 | 10.38 | 9.13 | 7.13 | 5.00 | 10.38 | 4.38 | 7.75 |
| 18 | Unknown | 10.00 | 10.13 | 9.88 | 10.38 | 10.13 | 10.25 | 10.00 | 10.38 |
| 19 | Unknown | 26.25 | 49.25 | 37.38 | 45.50 | 25.75 | 37.75 | 21.00 | 35.38 |
| 20 | Unknown | 22.25 | 27.25 | 21.13 | 25.50 | 16.00 | 25.63 | 14.25 | 21.50 |
| 21 | Fixed position of valve in stream 4 | 54.50 | 42.00 | 49.50 | 46.88 | 53.50 | 49.00 | 50.50 | 43.38 |

References

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer-Verlag, New York.
Downs, J. J., & Vogel, E. F. (1993). A plant-wide industrial process control problem. Computers & Chemical Engineering, 17(3), 245–255.
Fan, J., Qin, S. J., & Wang, Y. (2014). Online monitoring of nonlinear multivariate industrial processes using filtering KICA-PCA. Control Engineering Practice, 22(22), 205–216.
Ge, Z., & Song, Z. (2013). Performance-driven ensemble learning ICA model for improved non-Gaussian process monitoring. Chemometrics & Intelligent Laboratory Systems, 123(2), 1–8.
Ge, Z., & Song, Z. (2014). Ensemble independent component regression models and soft sensing application. Chemometrics & Intelligent Laboratory Systems, 130(130), 115–122.
Ge, Z., Gao, F., & Song, Z. (2011). Two-dimensional Bayesian monitoring method for nonlinear multimode processes. Chemical Engineering Science, 66(21), 5173–5183.
Ghosh, K., Ng, Y. S., & Srinivasan, R. (2011). Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault diagnostic methods. Computers & Chemical Engineering, 35(2), 342–355.
Hsu, C. C., Chen, M. C., & Chen, L. S. (2010). A novel process monitoring approach with dynamic independent component analysis. Control Engineering Practice, 18(3), 242–253.
Hyvärinen, A. (1999). Survey on independent component analysis. Neural Computing Surveys, 2(7), 94–128.
Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithm and applications. Neural Networks, 13(4–5), 411–430.
Jiang, Q., Wang, B., & Yan, X. (2015). Multiblock independent component analysis integrated with Hellinger distance and Bayesian inference for non-Gaussian plantwide process monitoring. Industrial & Engineering Chemistry Research, 54(9), 2497–2508.
Lee, J. M., Yoo, C. K., & Lee, I. B. (2004). Statistical process monitoring with independent component analysis. Journal of Process Control, 14(5), 467–485.
Lee, J. M., Qin, S. J., & Lee, I. B. (2006). Fault detection and diagnosis based on modified independent component analysis. AIChE Journal, 52(10), 3501–3514.
Li, N., & Yang, Y. (2014). Ensemble kernel principal component analysis for improved nonlinear process monitoring. Industrial & Engineering Chemistry Research, 54(1), 318–329.
Portnoy, I., Melendez, K., Pinzon, H., & Sanjuan, M. (2016). An improved weighted recursive PCA algorithm for adaptive fault detection. Control Engineering Practice, 50, 69–83.
Ruiz-Cárcel, C., Cao, Y., Mba, D., Lao, L., & Samuel, R. T. (2015). Statistical process monitoring of a multiphase flow facility. Control Engineering Practice, 42, 74–88.
Tong, C., Palazoglu, A., & Yan, X. (2014). Improved ICA for process monitoring based on ensemble learning and Bayesian inference. Chemometrics & Intelligent Laboratory Systems, 135(14), 141–149.
Valle, S., Li, W., & Qin, S. J. (1999). Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods. Industrial & Engineering Chemistry Research, 38(11), 4389–4401.
Wold, S. (1978). Cross-validatory estimation of components in factor and principal components models. Technometrics, 20(4), 397–405.
Yin, S., Ding, S. X., Xie, X., & Luo, H. (2014). A review on basic data-driven approaches for industrial process monitoring. IEEE Transactions on Industrial Electronics, 61(11), 6418–6428.
Yin, S., Li, X., Gao, H., & Kaynak, O. (2015). Data-based techniques focused on modern industry: An overview. IEEE Transactions on Industrial Electronics, 62(1), 657–667.
Zhang, Y., & Zhang, Y. (2010). Fault detection of non-Gaussian processes based on modified independent component analysis. Chemical Engineering Science, 65(16), 4630–4639.