Generalized grouped contributions for hierarchical fault diagnosis with group Lasso


Control Engineering Practice 93 (2019) 104193

Contents lists available at ScienceDirect

Control Engineering Practice
journal homepage: www.elsevier.com/locate/conengprac

Chao Shang a,*, Hongquan Ji b, Xiaolin Huang c, Fan Yang a, Dexian Huang a

a Department of Automation, Tsinghua University, and Beijing National Research Center for Information Science and Technology, Beijing 100084, China
b College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China
c Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200400, China

ARTICLE INFO

Keywords: Fault diagnosis; Multivariate statistical process monitoring; Grouped variable selection; Group Lasso; Industrial alarm systems

ABSTRACT

In process industries, it is necessary to conduct fault diagnosis after an abnormality is detected, with the aim of identifying root-cause variables and providing instructive information for maintenance. Contribution plots along with multivariate statistical process monitoring are standard tools towards this goal, which, however, suffer from the smearing effect and high diagnostic complexity on large-scale processes. In fact, process variables tend to be naturally grouped, and in this work, a novel fault identification strategy based on the group Lasso penalty, along with a hierarchical fault diagnosis scheme, is proposed by leveraging group information among variables. By introducing the group Lasso as a regularization approach, groups of irrelevant variables tend to yield exactly zero contributions collectively, which helps find the exact root cause, alleviates the smearing effect, and furnishes clear diagnostic information for process practitioners. For online computational convenience, an efficient numerical solution strategy is also presented. Besides, the proposed approach also applies to dynamic monitoring models with lagged measurements augmented, thereby enjoying wide generality. Its effectiveness is evaluated on both the Tennessee Eastman benchmark process and a pilot-scale experimental apparatus.

1. Introduction

With the ever-increasing availability of massive data, traditional disciplines in many industries have been consistently re-examined by leveraging the power of data analytics and machine learning (Qin & Chiang, 2019; Yin & Kaynak, 2015). In the process industries, the widespread use of manufacturing execution systems (MES) makes it possible to conveniently collect and archive high volumes of data, which embody profound information for achieving intelligent decision-making (Qin, 2014; Shang, 2018). As an important ingredient of process data analytics, data-based process monitoring and fault diagnosis strategies have been intensively investigated in the past two decades (Chiang, Russell, & Braatz, 2000; Ge, Song, Ding, & Huang, 2017; Qin, 2012), and have shown significant promise in securing efficient and safe process operations without requiring much first-principle knowledge (MacGregor & Cinar, 2012; Tulsyan, Garvin, & Ündey, 2018). In order to disentangle significant correlations between process variables, latent variable models (LVMs) such as principal component analysis (PCA), partial least squares (PLS) and independent component analysis (ICA) have been widely used as baseline methods to analyze

process data and establish monitoring models (Qin, 2003). Although they are relatively well developed, effective fault diagnosis technologies are still of great necessity. Upon receiving an alarm based on monitoring models, the root cause of abnormality shall be located timely and accurately, such that instructive information is available for further maintenance and repair actions. Contribution plots are standard fault diagnosis tools tailored to analyzing statistical process monitoring indices and finding out primary contributors (Miller, Swanson, & Heckler, 1998; Westerhuis, Gurden, & Smilde, 2000); however, due to the intimate connections between variables, local abnormality can quickly spread across the entire plant. This phenomenon is typically termed the smearing effect (Liu, 2012), which gives rise to non-negligible contributions of nonessential process variables and finally poses formidable challenges for accurate fault identification and diagnosis (Ji, He, Shang, & Zhou, 2018; Liu, Wong, & Chen, 2014). To address the smearing effect, Alcala and Qin (2009) proposed the reconstruction-based contribution (RBC) plot, which is based on the reconstruction magnitude along a certain variable direction. It has been theoretically proven that RBC can ensure basic diagnosability for single sensor faults with large magnitudes whilst other generic contribution

โˆ— Corresponding author. E-mail addresses: [email protected] (C. Shang), [email protected] (H. Ji), [email protected] (X. Huang), [email protected] (F. Yang), [email protected] (D. Huang).

https://doi.org/10.1016/j.conengprac.2019.104193 Received 22 May 2019; Received in revised form 9 October 2019; Accepted 11 October 2019 Available online 17 October 2019 0967-0661/ยฉ 2019 Elsevier Ltd. All rights reserved.


plots cannot (Alcala & Qin, 2011). Although the smearing effect can be alleviated to some extent by using RBC, it is still present and may give rise to misleading diagnostic results (Shang, Huang, Yang, & Huang, 2016). It was even pointed out by Ji, He, and Zhou (2016) that RBC completely loses its effect when the dimension of latent variables reduces to one. In recent years, sparse learning emerging from statistics has motivated a new roadmap for accurate fault diagnosis. The rationale lies in that only a subset of faulty variables are of significance, and sparsity commonly exists in fault directions. From the viewpoint of sparse learning, the reconstruction task can be basically formulated as a mixed-integer nonlinear program where the monitoring statistic is minimized subject to a cardinality constraint on independent faulty factors, and branch and bound (B&B) algorithms have been employed for efficient computations (He, Yang, Chen, & Zhang, 2012; Kariwala, Odiowei, Cao, & Chen, 2010). Nevertheless, due to the combinatorial nature of the optimization problem, the required computation is still too costly to afford online. Yan and Yao (2015) proposed an efficient fault identification approach by promoting sparsity via the least absolute shrinkage and selection operator (Lasso) (Tibshirani, 1996), which originated in the field of statistics and has found widespread applications in variable selection and parsimonious model development (Verhaegen & Hansson, 2016; Zou & Hastie, 2005; Zou, Hastie, & Tibshirani, 2006). Other related works leveraging the sparsity of faulty variables include Liu, Zeng, Xie, Luo, and Su (2019), Sun, Zhang, Zhao, and Gao (2017) and Zeng, Huang, and Xie (2015). In the existing approaches, the contribution of each single variable is evaluated individually, which brings considerable challenges in plant-wide interpretation and analysis when there are a plethora of process variables.
In engineering practice, a large-scale plant-wide system can be split into multiple local sub-blocks that are connected via a few streams (MacGregor, Jaeckle, Kiparissides, & Koutoudi, 1994). Consequently, adjacent variables may naturally fall into the same group and exhibit similar contributions. It is thus more meaningful and realistic to diagnose the faulty block as the root-cause factor rather than focusing on each single variable, especially for large-scale industrial processes. Another case, which is of similar interest but has often been overlooked in the literature, is that one may want to include lagged measurements in the monitoring model to appropriately describe the process dynamics and improve the monitoring performance, as typically done in dynamic PCA (DPCA) (Ku, Storer, & Georgakis, 1995) and dynamic ICA (DICA) (Lee, Yoo, & Lee, 2004a). In this case, the diagnostic complexity increases with the number of lagged measurements, and it is reasonable to regard lagged measurements of a certain process variable as a whole and quantify their contributions collectively. Qin, Valle, and Piovoso (2001) and Zhang, Zhou, Qin, and Chai (2009) developed multi-block fault diagnosis strategies to reduce the intricacy of fault diagnosis and give interpretable diagnostic results arranged in a hierarchical manner. Motivated by RBC, Liu, Chai, and Qin (2012) proposed a reconstruction-based block contribution (RBBC) to localize the faulty zone in continuous annealing processes. Nevertheless, RBBC is still vulnerable to the smearing effect and remains ineffective for monitoring statistics defined on a low-dimensional subspace. The present work aims to address the aforesaid issues by leveraging group sparsity information in fault occurrence. As a major contribution, a generalized grouped contribution based on the group Lasso is proposed for fault identification and hierarchical fault diagnosis.
Having emerged in the field of statistics, the group Lasso evaluates the sum of ℓ₂-norms of grouped variables, and has been extensively used for structural variable selection in regression (Yuan & Lin, 2006). In recent years, the group Lasso and related sparse learning algorithms have been adopted in various fields, with typical applications to anomaly detection in large populations (Ohlsson, Chen, Pakazad, Ljung, & Sastry, 2014), bioinformatics data analysis (Yang, Wan, Yang, Xue, & Yu, 2010), bearing fault diagnosis (Zhang, Zhao, Liu, & Kong, 2019; Zhao, Wu, Qiao, Wang, & Chen, 2019), fault location in manufacturing processes and power systems (Feng & Abur, 2015; Kim, Lee, Kim, Lee, & Lee, 2018), etc. Nevertheless, these existing formulations are based on domain-specific models. To the best of the authors' knowledge, the use of the group Lasso has not been considered in a systematic setting of multivariate statistical process monitoring. It will be shown that the use of the group Lasso in fault identification helps penalize the collective contributions of a group of variables, with connectivity information among variables utilized. Effects of irrelevant variables within the same group shrink to zero as a whole. In this way, the smearing effect can be remarkably mitigated and concise diagnostic information can be provided. The fault identification problem is cast as an unconstrained optimization problem where the monitoring statistic together with the group Lasso penalty is minimized. Because the resulting optimization problem is non-smooth albeit convex, an efficient numerical solution strategy is developed based on the groupwise-majorization-descent (GMD) algorithm proposed in Yang and Zou (2015). Group-wise and variable-wise contributions are defined respectively based on the optimal solution, which eventually provide a systematic hierarchical fault diagnosis scheme. Then, its flexible usage in fault diagnosis is illustrated by two representative circumstances in industrial practice. A fundamental deficiency of generic RBBC is also revealed, and by comparison the advantages of the group Lasso-based contributions are highlighted.

The layout of this article is organized as follows. Section 2 reviews classic methodologies of LVM-based multivariate statistical process monitoring, as well as fault diagnosis techniques based on contribution plots and sparsity promotion. In Sections 3 and 4, the fault diagnosis approach based on the group Lasso is put forward, with an effective solution algorithm suggested and comprehensive comparisons with existing strategies carried out.
Sections 5 and 6 include two case studies on the Tennessee Eastman (TE) benchmark process and a pilot-scale experimental system, followed by concluding remarks in the final section.

Notations and definitions. ‖⋅‖_p represents the ℓ_p-norm of a vector, and ‖⋅‖ denotes the Euclidean norm by convention. For a matrix S, S† denotes its Moore–Penrose inverse, and S ⪰ 0 denotes positive semi-definiteness for a symmetric S. I denotes the identity matrix, and ξ_i denotes its i-th column, whose dimensions can be deemed from the context. (⋅)₊ = max{⋅, 0} stands for the max-out function.

2. Preliminaries

2.1. Multivariate statistical process monitoring

Assume that process data, collected under nominal operating conditions, are stacked into a matrix X ∈ ℝ^(N×m) with N rows of observations and m columns of process variables. Due to the inherent correlations between variables, it is desirable to perform latent variable modeling to attain a feature space with reduced dimension that explains essential variations of process data. In classical LVMs, a general decomposition of X can be described as:

X = TPᵀ + E,  (1)

where T ∈ ℝ^(N×K) and P ∈ ℝ^(m×K) are the score and loading matrices for a low-dimensional feature subspace, respectively. Matrix E ∈ ℝ^(N×m) represents residuals that are not captured by the feature subspace. In a nutshell, the score matrix T can be understood as consisting of ''latent variables'' or ''features'' underlying the input data, whereas P represents the basis of the features in the input space. Likewise, a new data sample x can also be decomposed as x = Pt + e, where t stands for the latent variables of a single sample and e is the residual. This provides a unified view of various LVMs for process monitoring, based on which individual monitoring statistics can be defined on the feature subspace and the residual subspace. Next, several of the most-used examples are revisited.


PCA-based process monitoring
It is assumed in PCA that the low-dimensional features t capture most variations within inputs, and P is orthonormal. Alternatively, PCA can also be derived by minimizing the variance of the residual e. This is a numerically handy task since only a singular value decomposition (SVD) of the covariance estimate XᵀX/(N − 1) is needed. Two monitoring statistics can be constructed on the basis of PCA. Assuming that the K largest eigenvalues of XᵀX/(N − 1) are arranged in a diagonal matrix Λ, the well-known Hotelling's T² statistic can be defined as:

T² = tᵀΛ⁻¹t = xᵀPΛ⁻¹Pᵀx,  (2)

which quantifies variations occurring in the feature space. As for the residual space, one builds the squared prediction error (SPE) statistic to monitor the variations therein:

SPE = ‖e‖² = ‖(I − PPᵀ)x‖².  (3)

Variations within the data space can be completely summarized by the two PCA-based monitoring statistics, and their usage can be interpreted in a complementary manner. The T² statistic quantifies the ''magnitude'' of nominal variations within the process, whilst SPE evaluates the level of violations of correlations between process variables. To decide their control limits, quantiles of the Chi-square distribution or other approximation approaches can be used (Qin, 2003).

ICA-based process monitoring
PCA is essentially based on the Gaussian assumption of process data such that uncorrelatedness implies independence. To extract statistically independent driving factors from non-Gaussian data, ICA has been created (Hyvärinen & Oja, 2000) and applied to process monitoring (Lee, Yoo, & Lee, 2004b). The latent variables t are termed independent components and are assumed to have unit variances. The Hotelling's T² and SPE statistics are defined as:

T² = tᵀt = xᵀWᵀWx,  (4)
SPE = eᵀe = ‖(I − W†W)x‖².  (5)

In the literature, T² defined with ICA is typically termed the I² statistic. The use of ICA for process monitoring is more involved than PCA. On one hand, since statistical independence is evaluated based on higher-order moments, iterative algorithms are typically required for deriving the model. On the other hand, due to the non-Gaussianity of data, one cannot use quantiles of the Chi-square or the Student's t distribution to compute control limits (Lee et al., 2004b).

Slow feature analysis (SFA)-based process monitoring
In both PCA and ICA, the independence assumption is made upon latent variables t. Due to process dynamics, however, t characterizing driving factors of process variations shall have temporal correlations. This issue can be appropriately addressed by SFA (Wiskott & Sejnowski, 2002), which has recently become a promising and popular approach for process monitoring (Qin & Zhao, 2019; Shang et al., 2015b; Shang, Yang, Huang, & Huang, 2018), fault diagnosis (Shang et al., 2016; Zheng & Yan, 2019), soft sensing (Fan, Kodamana, & Huang, 2018; Shang, Huang, Yang, & Huang, 2015a), and oscillation isolation (Gao, Shang, Yang, & Huang, 2015). Different from PCA and ICA, SFA assumes that the latent variables t, referred to as slow features, have variations as slow as possible. This can be formally stated as:

min E{ṫᵢ²},  i = 1, …, K,  (6)

where ṫ := t(k) − t(k − 1) stands for the time difference of a signal. The associated optimization problem turns out to be a generalized eigenvalue problem:

E{ẋẋᵀ}wᵢ = ωᵢ · E{xxᵀ}wᵢ,  i = 1, …, K,  (7)

which can be numerically solved by conducting SVD twice.

In the SFA-based monitoring framework, T² and SPE statistics can be constructed in a similar manner to those in ICA. Meanwhile, an exclusive advantage of SFA is that the distribution of ṫ can also be described. This motivates the novel S² statistic for monitoring anomalies in process dynamics (Shang et al., 2015b):

S² = ṫᵀΩ⁻¹ṫ = ẋᵀWᵀΩ⁻¹Wẋ.  (8)

As pinpointed in Shang et al. (2016), abnormality shown by S² essentially indicates control performance changes, and S² can be used under different steady working points. By synthesizing information from the T², SPE and S² statistics, one can effectively discriminate real faults from nominal operating condition switches, and remove false alarms with reduced laborious effort (Shang et al., 2015b). Meanwhile, by disentangling variable contributions made to S², one is able to identify potential faulty loops as root causes of control performance changes.

2.2. Contribution plot for fault diagnosis

Upon receiving an alarm from a monitoring statistic, it is necessary to quantify the contribution of each variable to the abnormality for effective identification of faulty variables. As discussed, monitoring statistics/indices generally admit quadratic expressions, which can be written uniformly as Index(x) = xᵀMx or Index(ẋ) = ẋᵀMẋ, where the matrix M is positive semi-definite (PSD). As a result, contribution plots can be conveniently utilized for fault diagnosis based upon general quadratic control charts, see e.g. Alcala and Qin (2011). The simplest form is the complete decomposition contribution (CDC):

xᵀMx = ‖M^(1/2)x‖² = Σᵢ (ξᵢᵀM^(1/2)x)² ≜ Σᵢ CDCᵢ(x).  (9)

The variable having the highest value of CDC is deemed the main faulty variable. However, it is known that CDC lacks a solid theoretical foundation, and can yield ambiguous diagnostic information for some faults. To address these issues, Alcala and Qin (2009) proposed RBC based on the idea of reconstructing fault-free variables and minimizing the quadratic monitoring statistic. When the i-th variable is faulty, the observed sample can be described as:

x = x* + ξᵢ · f,  (10)

where x* is the fault-free sample and f denotes the fault magnitude. The identification of f can be done by minimizing Index(x*), that is,

min_f (x − ξᵢ · f)ᵀM(x − ξᵢ · f).  (11)

The philosophy behind (11) is that an appropriate estimate of f leads to ''nominal'' reconstructed variables as well as a low value of the monitoring index. Setting the derivative with respect to f to zero yields:

f̂ = ξᵢᵀMx / (ξᵢᵀMξᵢ).  (12)

Then RBC for the i-th variable is defined as:

RBCᵢ(x) = Index(ξᵢ · f̂) = xᵀMξᵢξᵢᵀMx / (ξᵢᵀMξᵢ).  (13)

In contrast to CDC, RBC can ensure basic diagnosability for single sensor faults, and empirical results show that RBC helps mitigate the smearing effect to some extent (Alcala & Qin, 2009). In a follow-up work, RBC was extended to RBBC to tackle general multi-dimensional faults (Liu et al., 2012). In this case, the faulty sample can be described as:

x = x* + Ξf,  (14)

where Ξ ∈ ℝ^(m×n_f) is a matrix that spans the n_f-dimensional fault subspace, and the vector f ∈ ℝ^(n_f) includes fault magnitudes along all fault directions.
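Since the contribution measures above are defined for a generic quadratic index Index(x) = xᵀMx with PSD M, CDC (9) and RBC (13) reduce to a few matrix products. A minimal illustrative sketch (not the authors' implementation; note that RBC requires M_ii > 0):

```python
import numpy as np

def cdc(x, M):
    """Complete decomposition contribution, Eq. (9): CDC_i = ((M^{1/2} x)_i)^2."""
    w, V = np.linalg.eigh(M)                                    # M is PSD
    M_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # matrix square root
    return (M_half @ x) ** 2

def rbc(x, M):
    """Reconstruction-based contribution, Eq. (13):
    RBC_i = (xi_i^T M x)^2 / (xi_i^T M xi_i) = (M x)_i^2 / M_ii."""
    return (M @ x) ** 2 / np.diag(M)

# Any PSD index matrix works, e.g. M = P Λ^{-1} P^T for T² or M = I - P P^T for SPE.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
M = A.T @ A                      # a generic full-rank PSD matrix (illustrative)
x = rng.normal(size=4)

# CDC decomposes the index exactly: sum_i CDC_i(x) = x^T M x.
print(cdc(x, M).sum(), x @ M @ x)
```

Both functions apply unchanged to any quadratic chart, which is why the smearing behavior discussed above is a property of M rather than of a particular LVM.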


โˆ‘ where the dimension of x๐‘˜ is ๐‘š๐‘˜ , ๐‘˜ = 1, โ€ฆ , ๐ต, and ๐ต ๐‘˜=1 ๐‘š๐‘˜ = ๐‘š. This can be done either based on expert knowledge about process mechanism, or by performing connectivity analysis among process variables with historical data (Jiang, Wang, & Yan, 2015; Jiang & Yan, 2015; Yang, Duan, Shah, & Chen, 2014). Accordingly, f can be decomposed in a similar way. Then the general fault identification problem with group sparsity can be cast as the following unconstrained optimization problem:

directions. Hence, one can cast fault reconstruction as the following unconstrained optimization problem: min (x โˆ’ ฮžf)T M(x โˆ’ ฮžf),

(15)

f

whose optimal solution is given by: fฬ‚ = (ฮžT Mฮž)โ€  ฮžT Mx.

(16)

Then RBBC is defined as: ฬ‚ = xT Mฮž(ฮžT Mฮž)โ€  ฮžT Mx RBBC(x) = Index(ฮžf)

(17)

min (x โˆ’ f)T M(x โˆ’ f) + ๐œ† f

To evaluate the contribution of a group of variables, one only needs to form ฮž by stacking {๐ƒ T๐‘– , ๐‘– โˆˆ ๎ˆณ}, where ๎ˆณ is the index set of variables of particular interest. Note that CDC, RBC and RBBC can be expressed as quadratic functions, and hence they are easily to compute online.

The spirit of fault isolation indicates that, in principle, only a subset of variables are candidate contributors to the fault (Dunia & Qin, 1998), especially during the initial stage of fault occurrence. Therefore, in the absence of a priori information about fault directions, a natural choice is to limit the number of nonzero elements in the fault vector f. This can be formally achieved by solving the following optimization problem:

where the ๐“0 -norm โ€– โ‹… โ€–0 counts the number of nonzero elements in a vector, and ๐œ… is a user-specified upper-bound of the number of variables essentially responsible for the abnormality. To address the nonconvexity of the ๐“0 -norm in the optimization, He et al. (2012) employed the B&B algorithm for online computations; however, the computational demand is still heavy since the number of candidate combinations grows exponentially with the number of process variables. To tackle this issue, Yan and Yao (2015) proposed to replace the ๐“0 -norm by the ๐“1 -norm as a convex surrogate:

3.2. Solution algorithm Next, the computational issue in solving (22) is addressed. Although one may not have to execute fault diagnosis all the time, a low computational cost is still attractive because the earlier the diagnostic information arrives, the earlier the maintenance action can be taken. For the Lasso-based problem (20), efficient algorithms have been well established to find out the entire solution path (Efron, Hastie, Johnstone, Tibshirani, et al., 2004), i.e. the trajectory of optimal solutions with varying ๐œ†. This owes to the so-called piecewise linearity of the solution path of Lasso-based regression. Unfortunately, the solution path of group Lasso is not piecewise linear in general, which prohibits the design of efficient path-finding algorithms. Hence, the GMD algorithm developed in Yang and Zou (2015) is employed, which is essentially a coordinate descent approach. The underlying idea is to iteratively optimize over f๐‘˜ while fixing f๐‘˜โ€ฒ , ๐‘˜โ€ฒ โ‰  ๐‘˜. Assuming that there exists an old solution fold , and only the ๐‘˜th group f๐‘˜ is to be optimized, one obtains:

min (x โˆ’ f)T M(x โˆ’ f) f

(19)

which is identical to the following unconstrained optimization problem: min(x โˆ’ f)T M(x โˆ’ f) + ๐œ†โ€–fโ€–1 ,

(20)

f

(23)

Note that if ๐‘š๐‘˜ โ‰ก 1 for all ๐‘˜ = 1, โ€ฆ , ๐ต, then (22) reduces to the fault identification problem (20) based on generic Lasso (Yan & Yao, 2015), where the shrinkage penalty is equally made on each individual variable ๐‘“๐‘– , ๐‘– = 1, โ€ฆ , ๐‘š. By contrast, by encouraging structural sparsity with group Lasso, the fault identification problem can be posed in a more general setting, and process knowledge can be effectively utilized to yield meaningful diagnostic information, as to be clarified in the sequel.

(18)

s.t. โ€–fโ€–1 โ‰ค ๐œ…

(22)

๎ˆญ = {๐‘˜ โˆถ โ€–f๐‘˜ โ€– > 0} โŠ† {1, โ€ฆ , ๐ต}.

min (x โˆ’ f)T M(x โˆ’ f) f

๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 ,

๐‘˜=1

where ๐œ† > 0 is used to balance between two conflicting objectives. The โˆš โˆ‘ ๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 can be conceived as an intermedigroup Lasso penalty ๐ต ๐‘˜=1 ate between the ๐“1 -norm penalty and the ๐“2 -norm penalty. It shrinks group-wise contributions {f๐‘˜ } in a collective manner, thereby resulting in exactly zero contributions from irrelevant groups. The optimal solution f is characterized by the active set including decisive nonzero groups:

2.3. Sparsity-promoting fault identification and diagnosis

s.t. โ€–fโ€–0 โ‰ค ๐œ…

๐ต โˆ‘ โˆš

where ๐œ† โ‰ฅ 0 is the regularization parameter. Similar to the ๐“0 -norm, the ๐“1 -norm penalty is also known to be effective in promoting sparsity, and has been extensively studied in sparse learning and compressive sensing (Eldar & Kutyniok, 2012). Most importantly, the ๐“1 -norm penalty is advantageous over the ๐“0 -norm because convex optimization techniques with superior computational efficiency can be used to solve the ๐“1 -norm-based problems. The identified fault, namely the optimal solution f to (20), essentially defines an optimization-based contribution with structural sparsity, such that influences of indecisive variables tend to be precisely zero, which is in sharp contrast to generic contribution plots like CDC and RBC. As a consequence, the ambiguity of diagnostic information can be remarkably alleviated, which is particularly beneficial for subsequent maintenance actions (Liu et al., 2019).

๐ฟ(f) โ‰œ (x โˆ’ f)T M(x โˆ’ f) + ๐œ†

๐ต โˆ‘ โˆš ๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 , ๐‘˜=1

= (x โˆ’ fold + fold โˆ’ f)T M(x โˆ’ fold + fold โˆ’ f) + ๐œ†

๐ต โˆ‘ โˆš ๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 , ๐‘˜=1

= (f โˆ’ fold )T M(f โˆ’ fold ) + 2(f โˆ’ fold )T M(fold โˆ’ x) + ๐œ†

๐ต โˆ‘ โˆš

๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 + ๐ถ1

๐‘˜=1

โˆš old old T T = (f๐‘˜ โˆ’ fold ๐‘˜ ) M๐‘˜ (f๐‘˜ โˆ’ f๐‘˜ ) + (f๐‘˜ โˆ’ f๐‘˜ ) v๐‘˜ + ๐œ† ๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 + ๐ถ2

3. Fault identification based on group Lasso

(24) 3.1. Problem setup Suppose that all process non-overlapping groups, that is, โŽก x1 โŽค โŽข โŽฅ x x = โŽข 2โŽฅ, โŽขโ‹ฎโŽฅ โŽขx โŽฅ โŽฃ ๐ตโŽฆ

variables

can

be

split

into

where f๐‘˜โ€ฒ = fold , ๐‘˜โ€ฒ M๐‘˜ โˆˆ R๐‘š๐‘˜ ร—๐‘š๐‘˜ is

โˆ€๐‘˜โ€ฒ

old

โ‰  ๐‘˜. v โˆถ= โˆ‡๐ฟ(f ) denotes the gradient vector, and the sub-matrix of M associated with f๐‘˜ . ๐ถ1 and ๐ถ2 are constants independent from f๐‘˜ . Because M is PSD, the sub-matrix M๐‘˜ is also PSD, with its largest eigenvalue denoted by ๐›พ๐‘˜ > 0. Consequently, it holds that M๐‘˜ โชฏ ๐›พ๐‘˜ I, and the following majorization can be established:

๐ต

(21) โˆš old old T T ๐ฟ(f) โ‰ค ๐›พ๐‘˜ (f๐‘˜ โˆ’ fold ๐‘˜ ) (f๐‘˜ โˆ’ f๐‘˜ ) + (f๐‘˜ โˆ’ f๐‘˜ ) v๐‘˜ + ๐œ† ๐‘š๐‘˜ โ€–f๐‘˜ โ€–2 + ๐ถ2 . 4

(25)
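The majorizer on the right-hand side of (25) is a separable quadratic plus a group-ℓ₂ term, so its minimizer over f_k is a group-wise soft-thresholding step. The following is an illustrative sketch of one such update (argument names like `f_k_old` and `v_k` are assumptions, with `v_k` the gradient block and `gamma_k` the largest eigenvalue of M_k):

```python
import numpy as np

def gmd_group_update(f_k_old, v_k, gamma_k, lam, m_k):
    """Minimize the right-hand side of Eq. (25) over f_k:
        gamma_k*||f_k - f_k_old||^2 + (f_k - f_k_old)^T v_k + lam*sqrt(m_k)*||f_k||_2.
    Setting the (sub)gradient to zero gives group-wise soft-thresholding of
    z = 2*gamma_k*f_k_old - v_k."""
    z = 2.0 * gamma_k * f_k_old - v_k
    norm_z = np.linalg.norm(z)
    shrink = max(0.0, 1.0 - lam * np.sqrt(m_k) / norm_z) if norm_z > 0 else 0.0
    return shrink * z / (2.0 * gamma_k)
```

When λ√m_k ≥ ‖2γ_k f_k^old − v_k‖₂, the whole group is set exactly to zero, which is the mechanism behind the group-wise sparsity of (22).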


cause any alarm (Yan & Yao, 2015). Therefore, a criterion to determine ๐œ† is suggested as follows:

Instead of minimizing ๐ฟ(f), the right hand side of (25) is selected as the optimization objective, which is a special case of regularized least squares regression with group Lasso penalty. In this case, minimization over f๐‘˜ admits a simple closed-form solution: ) โˆš )( ( ๐œ† ๐‘š๐‘˜ v๐‘˜ old new + f๐‘˜ 1โˆ’ . (26) f๐‘˜ = 2๐›พ๐‘˜ โ€–v๐‘˜ + 2๐›พ๐‘˜ fold ๐‘˜ โ€–2 +

โŽง โˆ— โŽช max ๐œ†๐‘— ๐œ† = โŽจ ๐‘—=1,โ€ฆ,๐ฝ โŽช s.t. (x โˆ’ f(๐œ†โˆ—๐‘— ))T M(x โˆ’ f(๐œ†โˆ—๐‘— ))T < ๐›พM โŽฉ

where f(๐œ†โˆ—๐‘— ) denotes the optimal solution f derived with the regularization parameter ๐œ†โˆ—๐‘— . The formula (29) can be understood as seeking the most sparse solution that lies exactly at the transition point and induces normal reconstructed statistic (28). Unfortunately, for group Lasso, an efficient algorithm for computing all transition points is generally unavailable (Yang & Zou, 2015), and thus an approximation strategy is considered. The algorithmic details are provided in Algorithm 2. A series of densely distributed regularization parameters ๐œ†1 > ๐œ†2 > โ‹ฏ > ๐œ†๐‘„ are first generated and calculation starts from the largest one, i.e. ๐œ† = ๐œ†1 . In fact, ๐œ†1 can be set as ๐œ†โˆ—1 , i.e. the smallest regularization parameter inducing exactly zero solution, which can be calculated based on the Karushโ€“Kuhnโ€“Tucker condition (Yang & Zou, 2015): โˆš โ€– ๐œ†โˆ—1 = max โ€– (30) โ€–h๐‘˜ โ€–2 โˆ• ๐‘š๐‘˜ , h = 2Mx. ๐‘˜=1,โ€ฆ,๐ต

For brevity, details of the derivation are omitted and readers are referred to Yuan and Lin (2006). Hence, iterative minimization based on groupwise majorization can be carried out in a cost-efficient way. Algorithm 1 summarizes the entire procedure of GMD with a fixed regularization parameter λ. It has been proved that GMD possesses the strict descent property and converges to a global optimum (Yang & Zou, 2015). Note, however, that group Lasso penalized regression problems typically suffer from potential non-uniqueness of solutions; see Roth and Fischer (2008) and references therein. Hence, only the convergence of Algorithm 1 to one of multiple optimal solutions can be ensured. To inspect convergence, various stopping criteria can be adopted, for example,

max_{k=1,…,B} ‖f_k^new − f_k^old‖ / (1 + ‖f_k^new‖) ≤ ε,  (27)

where ε is a pre-specified accuracy tolerance. It is worth mentioning that the online computational efficiency can be further improved if multiple processors are coordinated to operate in parallel, since majorizations and minimizations within different groups can be carried out simultaneously.

Algorithm 1 GMD Algorithm for Group Lasso Problems
Inputs: Initial solution f_0, faulty data x, PSD matrix M, and regularization parameter λ.
1: Initialization: f = f_0. Compute γ_k, the largest eigenvalue of M_k, k = 1, …, B.
2: Do until convergence
3:   for k = 1, …, B
4:     Compute the gradient v(f) = 2M(f − x).
5:     Do update f_k ← (f_k − v_k/(2γ_k)) · (1 − λ√m_k / ‖2γ_k f_k − v_k‖)_+.
6:   end for
7: end
8: Return f and 𝒜 = {k : ‖f_k‖ > 0}.

Obviously, the selection of λ heavily influences the performance of fault reconstruction. If λ is too large, all variables tend to provide zero contributions, which prohibits correct identification of the real faulty variables. If λ is excessively small, f → x and no meaningful information is embodied by f. In this sense, a reasonable guideline for choosing λ is indispensable. Essentially, the dependence of the group sparsity of f on λ is an intrinsic property of problem (22) with x and M given. More precisely, there is a finite sequence of transition points λ*_1 > λ*_2 > ⋯ > λ*_J such that the induced active set 𝒜 varies at each transition point (Yuan & Lin, 2006). When λ > λ*_1, the contributions of all groups shrink to zero, i.e. f = 0 and 𝒜 = ∅. When λ < λ*_J, all groups are selected and 𝒜 = {1, …, B}. Within each interval (λ*_{k+1}, λ*_k), the active set 𝒜 does not change. Hence, λ can be chosen from the transition points {λ*_j}, since they induce solutions with the best performance under the same level of sparsity.

Algorithm 2 Fast Online Algorithm for Group Lasso-Based Fault Diagnosis
Inputs: Faulty data x, PSD matrix M, control limit γ_M, and a sequence of regularization parameters λ_1 > λ_2 > ⋯ > λ_Q.
1: Initialization: f^(0) = 0.
2: for j = 1, …, Q
3:   Solve for f^(j) and 𝒜^(j) with λ_j by performing Algorithm 1 initialized with f^(j−1).
4:   If Index(x − f^(j−1)) ≤ γ_M and 𝒜^(j−1) ≠ 𝒜^(j), then abort the loop.
5: end for
6: Return f = f^(j−1).

In each iteration, Algorithm 1 is initialized with the optimal solution derived in the preceding iteration as a warm start. The procedure then decreases the value of λ until the active set 𝒜 changes and the reconstructed index falls below the control limit. Despite the mass of candidate regularization parameters {λ_1, …, λ_Q}, the algorithm is found to be quite efficient, converging within a few iterations thanks to the warm start strategy. Moreover, the algorithm commonly terminates with j < Q. These aspects eventually bring significant computational convenience.

Meanwhile, a crucial fact is that, if f is appropriately estimated, the reconstructed fault-free variables x* = x − f shall be recognized as normal by the monitoring statistic, formally stated as:

Index(x*) = (x − f)^T M (x − f) < γ_M,  (28)
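The two algorithms above can be sketched in a few dozen lines of NumPy. This is a minimal illustration, not the authors' MATLAB implementation; the function names and the representation of groups as index lists are assumptions made for the sketch.

```python
import numpy as np

def gmd_group_lasso(x, M, groups, lam, f0=None, tol=1e-6, max_iter=500):
    """Algorithm 1 (GMD): min_f (x-f)^T M (x-f) + lam * sum_k sqrt(m_k)*||f_k||."""
    f = np.zeros_like(x) if f0 is None else f0.copy()
    # gamma_k: largest eigenvalue of the k-th diagonal block of M
    gammas = [np.linalg.eigvalsh(M[np.ix_(g, g)]).max() for g in groups]
    for _ in range(max_iter):
        f_old = f.copy()
        for g, gamma in zip(groups, gammas):
            v = 2.0 * M @ (f - x)              # gradient of the quadratic term
            u = f[g] - v[g] / (2.0 * gamma)    # minimizer of the majorized loss
            # block soft-thresholding induced by the group Lasso penalty
            norm_u = np.linalg.norm(u)
            thr = lam * np.sqrt(len(g)) / (2.0 * gamma)
            f[g] = 0.0 if norm_u <= thr else (1.0 - thr / norm_u) * u
        # stopping criterion (27)
        if max(np.linalg.norm(f[g] - f_old[g]) / (1.0 + np.linalg.norm(f[g]))
               for g in groups) <= tol:
            break
    return f, [k for k, g in enumerate(groups) if np.linalg.norm(f[g]) > 0]

def fault_reconstruction_path(x, M, groups, gamma_M, lam_seq, tol=1e-6):
    """Algorithm 2: warm-started path over decreasing lambdas; abort once the
    reconstructed index falls below the control limit and the active set changes."""
    f_prev, act_prev = np.zeros_like(x), []
    for lam in lam_seq:
        f, act = gmd_group_lasso(x, M, groups, lam, f0=f_prev, tol=tol)
        r = x - f_prev
        if r @ M @ r <= gamma_M and act_prev != act:
            break
        f_prev, act_prev = f, act
    return f_prev, act_prev
```

When M is the identity the problem separates across groups and the GMD update reduces to one-shot block soft-thresholding, which makes the behavior easy to verify by hand; a geometric λ sequence in the spirit of (37) can be generated as `lam1 * 0.9 ** (0.1 * np.arange(Q))`.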

where γ_M is the corresponding control limit. In other words, after successful reconstruction, the fault-free variables are supposed not to raise any alarm.

4. Fault diagnosis with grouped contributions

4.1. Contribution definitions and practical usage

Based on the optimal solution f derived by Algorithm 2, the following group-wise contribution (GWC) is defined:

GWC_k = ‖f_k‖,  (31)

which is the ℓ2-norm of the fault in the kth group. The fault magnitude |f_i| along each direction can naturally be interpreted as the variable-wise contribution (VWC). A combined use of GWC and VWC leads to a hierarchical fault diagnosis scheme. For a large-scale industrial process, the massive set of process variables can first be decomposed into various groups. Then, after a certain monitoring index exceeds its threshold, a process operator could inspect the GWCs of the various groups to determine the faulty group and obtain a "rough" location of the possible abnormality (Jiang, Yan, & Huang, 2019). Afterwards, the VWCs within the faulty group can be further utilized for accurate fault identification and diagnosis. This

essentially resembles the procedure of multi-block fault diagnosis (Liu et al., 2012).

Moreover, the proposed strategy turns out to be particularly useful for handling monitoring statistics established with lagged measurements, as has been done in PCA, ICA and SFA to enhance monitoring performance in the presence of evident process dynamics. This can be precisely described as follows. For each variable x_i(t), d lagged measurements are included to form a group:

x_i(t) = [x_i(t), x_i(t−1), …, x_i(t−d)]^T ∈ R^{d+1},  (33)

which gives rise to an augmented m(d+1)-dimensional vector of monitored variables, i.e. x ≜ [x_1^T, x_2^T, …, x_m^T]^T. In this case, using variable-wise contribution plots typically leads to messy diagnostic information that is difficult to organize and visualize, since the input complexity has increased by d+1 times. Indeed, for identifying decisive variables, evaluating the total contribution of x_i is of major interest, and the proposed strategy can be applied to attain concise diagnostic results by simply regarding x_i as a group of d+1 variables and setting B = m. Meanwhile, it is worth noting that the diagnostic performance of the proposed method depends on the monitoring model and the control limit used; hence, they shall be appropriately developed by an experienced practitioner beforehand.

4.2. Comparison with RBBC

Recall that, after using RBBC to determine the faulty group of variables, one needs to leverage RBC again to further evaluate variable-wise contributions within the group. To this end, the reconstruction-based variable contribution (RBVC) is defined (Liu et al., 2012):

RBVC_i(x_k) = (x_k^T M_k ξ_i ξ_i^T M_k x_k) / (ξ_i^T M_k ξ_i).  (32)

By contrast, the proposed hierarchical fault diagnosis scheme provides both group-wise and variable-wise contributions straightforwardly, without resorting to a two-step procedure. Meanwhile, it has been pointed out by Ji et al. (2016) that classical RBC-based contribution plots suffer from severe invalidity in some special cases. Specifically, when the dimension of the latent subspace on which monitoring statistics are defined reduces to one, the RBC plots of all variables take the same value, thereby rendering the diagnostic information meaningless. Next, it is shown at a theoretical level that, under certain circumstances, RBBC is subject to similar deficiencies. Assume that Index(x) = x^T M x is defined on a K-dimensional subspace (K < m). Hence, M is rank-deficient and admits the following decomposition:

M = L^T L, where L ∈ R^{K×m}.  (34)

Then the following theorem is established.

Theorem 1. Assume that K ≤ n_f and Q = LΞ ∈ R^{K×n_f} has full row rank, i.e. rank{Q} = K. It then holds that RBBC(x) = Index(x).

Proof. First, conduct the SVD of Q:

Q = UΣV^T,  (35)

where U ∈ R^{K×K}, Σ = diag{σ_1, …, σ_K} with σ_1, …, σ_K > 0, and V ∈ R^{n_f×K}, according to rank{Q} = K. Hence the orthonormality relations UU^T = U^T U = I and V^T V = I hold. According to the definition of the Moore–Penrose inverse, RBBC can be computed as:

RBBC(x) = x^T MΞ (Ξ^T MΞ)^† Ξ^T M x
        = x^T L^T Q (Q^T Q)^† Q^T L x
        = x^T L^T UΣV^T (VΣU^T UΣV^T)^† VΣU^T L x
        = x^T L^T UΣV^T (VΣ^2 V^T)^† VΣU^T L x
        = x^T L^T UΣV^T VΣ^{−2} V^T VΣU^T L x
        = x^T L^T UU^T L x = x^T M x = Index(x).  (36)

This completes the proof. □

Theorem 1 reveals that, whenever the dimension of the fault space, i.e. the number of grouped variables, is no lower than that of the subspace defining the monitoring statistic, different groups of variables eventually give the same RBBC values, which are no longer informative for effective troubleshooting. By contrast, such degeneracy can be effectively avoided by the proposed scheme, since the reconstruction is cast as an optimization problem with a group Lasso penalty. This issue will be investigated in Section 6.

5. Case studies on the Tennessee Eastman process

In this section, the TE benchmark process (Downs & Vogel, 1993) is used as a well-established platform to investigate the performance of the proposed fault identification and diagnosis approach. To stabilize the system, the plant-wide control strategy developed by Lyman and Georgakis (1995) is adopted. The simulated datasets are available from the website of Prof. Richard Braatz. There are 12 manipulated variables XMV(1–12) and 41 measured variables XMEAS(1–41) in the TE process, and the variables to be monitored are chosen as XMEAS(1–22) and XMV(1–11), whose sampling interval is 3 min, thereby giving rise to 33 process variables in total.1 A total of 500 data samples are collected under the nominal operating condition, which are used to build monitoring models. Moreover, twenty-one faulty cases are deliberately designed, which have been extensively used to verify the performance of process monitoring and fault diagnosis techniques (Chiang et al., 2000). For further details about the TE process, readers are referred to Lyman and Georgakis (1995). Next, two general situations to which the proposed strategy applies are investigated.

1 Throughout this section, the first 22 process variables are XMEAS(1–22), while the last 11 process variables are XMV(1–11).

5.1. Multi-block fault diagnosis

First, a trivial PCA-based monitoring model is built using the 500 normal data samples, based on which the T² and SPE statistics are defined. The latent subspace has K = 18 dimensions, which explains 95% of the variation within the input data. Then, to establish a hierarchical process monitoring scheme, the 33 process variables are split into 5 blocks according to the process flowchart, as summarized in Table 1, where variables related to a certain unit are put into the same group.

Table 1
Divided groups of process variables in the TE process.

Group No.  Description              Indices of process variables
1          Inputs                   1, 2, 3, 5, 6, 23, 24, 25
2          Reactor                  7, 8, 9, 21, 32
3          Separator & Condenser    11, 12, 13, 14, 22, 29, 33
4          Compressor               10, 20, 27, 28
5          Stripper                 4, 15, 16, 17, 18, 19, 26, 30, 31

In the proposed group Lasso-based strategy, ε is set to 10^−6 in the stopping criterion (27). The sequence of vanishing regularization parameters is determined as:

λ_i = λ*_1 × 0.9^{0.1(i−1)}, i = 1, …, 1000,  (37)

where λ*_1 is derived based on (30). The algorithm is implemented in MATLAB R2018b running on a desktop computer with an Intel(R) Core(TM) i7-8086K CPU @ 4.80 GHz and 32 GB RAM. Two particular faulty cases are then investigated in detail, and the diagnostic results of the various approaches are comparatively analyzed. In each case, there are 960 samples in total, and the fault starts from the 160th sample.

IDV(4): Step disturbance in reactor cooling water inlet temperature

In this case, the inlet temperature of the reactor cooling water experiences a sudden change at the 160th sample. Fig. 1 displays the T² and SPE statistics of the PCA model, where T² effectively detects the abnormality at the 161st sample. Based on T², the group Lasso problem is then solved with Algorithm 2, which starts from λ*_1 = 12.5223. GMD in Algorithm 1 is repeatedly executed with the vanishing regularization parameters in (37), and the reconstructed indices along with the GWCs are shown in Fig. 2. It can be observed that, as λ decreases, the contribution of Group 2 emerges and the reconstructed monitoring index declines. Moreover, the contributions of the other groups remain zero, thereby highlighting the effect of group Lasso in securing group sparsity. As λ continues to decrease, the reconstructed index finally falls below the limit, and Group 3 then appears as an active group at λ_j = 1.3275, which terminates the entire procedure. Finally, the regularization parameter is determined as λ_{j−1} = 1.3416, and the overall CPU time is only 0.57 s. If the value of λ were decreased further, sparsity would gradually be sacrificed, and eventually the trivial solution f = x would be attained at λ = 0, which gives an exactly zero value of the reconstructed index. This showcases the fundamental tradeoff between minimizing the monitoring index and pursuing group sparsity.

Then fault identification is conducted based on the T² statistic with various strategies, as shown in Fig. 3. It can be observed that RBC successfully signifies XMEAS(9) (the reactor temperature) and XMV(10) (the reactor cooling water flow) as the primary contributors. However, the number of contributions equals the number of process variables, leading to high diagnostic complexity. RBBC and the group Lasso-based contributions provide much more concise diagnostic information organized in a group-wise manner, and both manage to determine the faulty group correctly, namely Group 2 related to the reactor. Nevertheless, the results of RBBC are somewhat prone to the smearing effect, since Group 5 also shows obvious contributions. From the above comparisons, it can be clearly seen that the group Lasso approach excels in giving accurate and succinct diagnostic information, which fundamentally owes to the structural sparsity enhanced in a group-wise manner.
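Once the reconstruction f is available, the group-wise and variable-wise contributions of Section 4.1 are cheap to evaluate. The following is a minimal sketch (the function name and the list-of-index-lists group representation are assumptions of the sketch, not the paper's notation):

```python
import numpy as np

def contributions(f, groups):
    """Hierarchical reading of the reconstruction f: GWC_k = ||f_k|| (eq. (31))
    locates the faulty group; VWC_i = |f_i| locates variables inside it."""
    gwc = np.array([np.linalg.norm(f[g]) for g in groups])
    vwc = np.abs(f)
    return gwc, vwc, int(np.argmax(gwc))
```

With the group Lasso reconstruction, the GWCs of irrelevant groups are exactly zero, so reading off the nonzero entries of `gwc` already yields the "rough" fault location before drilling down to the VWCs.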

Fig. 1. Monitoring statistics for IDV(4) based on PCA.

Fig. 2. Reconstructed index and grouped sparsity with varying ๐œ† on the 161st sample of IDV(4).

Fig. 3. Diagnostic results of ๐‘‡ 2 statistic for IDV(4) at the 161st sample. (a) RBC plot. (b) RBBC plot. (c) Group Lasso-based contribution plot.


Fig. 4. Variable contributions within the group for IDV(4) at the 161st sample. (a) RBVC. (b) VWC.

After recognizing Group 2 as the primary faulty group, the faulty variable can be further located within the group based on VWC and RBVC; the latter has been utilized in conjunction with RBBC to evaluate variable-wise contributions within a group. The results are visualized in Fig. 4, where both RBVC and VWC single out the reactor temperature (XMEAS(9)) and the reactor cooling water flow (XMV(10)). A subtle advantage of VWC is that the contribution of XMV(10) is more pronounced than that of XMEAS(9), which alludes to the occurrence of abnormality in the cooling water system.

IDV(9): Random variation in D feed temperature

In IDV(9), random variation occurs in the D feed temperature. This is a special faulty case of the TE process, mainly because it exerts only minor influence on the process and is hence notoriously difficult to identify using classic monitoring approaches (Yin, Ding, Haghani, Hao, & Zhang, 2012). Interestingly, some useful diagnostic information can be uncovered by the proposed strategy. Monitoring statistics based on PCA are plotted in Fig. 5. It can be observed that, after the random variations are introduced, the T² statistic temporarily exceeds its threshold, indicating a moderate short-term deviation of the operating condition. Fault diagnosis is then carried out at the 161st sample based on T², and the results are reported in Fig. 6. In this case, due to disturbance propagation, variables in different groups are all more or less affected, which showcases the so-called smearing effect. Obviously, RBC yields ambiguous information, since a large number of process variables manifest non-negligible contributions after the fault has propagated around the entire system, and the main faulty variables are not highlighted. RBBC is also subject to smearing effects, because all groups are shown to contribute noticeably, and Group 5 is erroneously identified as the faulty group. Such deficiencies fundamentally owe to the fact that RBBC and RBC are based on the assumption of a single fault, which is rather restrictive. Hence, their performance can degrade heavily in the presence of multi-directional faults, mainly because projections of different faults become co-linear in the low-dimensional subspace (Xu et al., 2013). For this reason, it may happen that a group of variables with only minor abnormality is erroneously identified as the primary contributor, while the real root-cause group shows moderate contributions.

By comparison, the group Lasso-based contribution builds on the more realistic assumption of multi-directional faults and takes a holistic optimization approach to finding the critical groups. Consequently, it clearly identifies Group 1, the block of variables pertaining to process inputs, as the faulty group, with the contributions of all other groups being exactly zero.

Fig. 5. Monitoring statistics for IDV(9) based on PCA.

Fig. 6. Diagnostic results of T² statistic for IDV(9) at the 161st sample. (a) RBC plot. (b) RBBC plot. (c) Group Lasso-based contribution plot.
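For the dynamic monitoring models used next, the lag-stacked grouping of (33) amounts to simple index bookkeeping over the augmented vector. A minimal sketch (the function name is illustrative; it assumes the d+1 lags of each of the m variables are stacked contiguously):

```python
def lagged_groups(m, d):
    """Index groups for the augmented vector x = [x_1^T, ..., x_m^T]^T, where
    x_i stacks x_i(t), x_i(t-1), ..., x_i(t-d) as in (33); yields B = m groups
    of size d + 1, one per original process variable."""
    return [list(range(i * (d + 1), (i + 1) * (d + 1))) for i in range(m)]
```

Feeding these groups to the group Lasso solver makes all lags of a variable enter or leave the active set together, which is exactly what keeps the diagnostic plot at m bars instead of m(d+1).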


5.2. Fault diagnosis with dynamic LVM

In this subsection, a different setting is considered, where lagged measurements are incorporated to build a dynamic LVM, based on which monitoring statistics are then constructed. It has been found that such a strategy can significantly enhance the performance of SFA in process monitoring (Shang et al., 2015b). Here, a dynamic SFA model with d = 2 lagged measurements added is built. Under such circumstances, the input size triples, and it is necessary to group all lags of a given process variable and evaluate their contributions as a whole, as discussed in the preceding section. Two representative scenarios with clear fault mechanisms are chosen to illustrate the effectiveness of the group Lasso-based contribution plot.

IDV(11): Random variations in reactor cooling water inlet temperature

In this case, random variations take place in the reactor cooling water inlet temperature, which consistently alter both the steady operating condition and the process dynamics. As indicated in Fig. 7, the T² and SPE statistics of dynamic SFA show operating condition deviations, and the S² statistic implies dynamics anomalies. Contribution analysis is then carried out based on S², which is also a quadratic index, and the results are given in Fig. 8. The first observation is that, although XMV(10) (the reactor cooling water flow) is correctly identified by RBC with significance, the entire diagnostic scheme appears fairly messy. This implies the necessity of using grouped contributions in dynamic monitoring models, and hereby the two grouped contributions are investigated, as shown in Fig. 8(b) and (c). Obviously, the number of contributions becomes much smaller, and XMV(10) can be accurately recognized by both RBBC and the group Lasso-based approach; however, RBBC is more sensitive to the smearing effect, and most groups do not exhibit zero contributions. In contrast, the proposed group Lasso-based contribution furnishes much clearer diagnostic information with group sparsity imposed. This case well highlights the particular advantages of group Lasso in using dynamic LVMs for fault diagnosis.

Fig. 7. Monitoring statistics for IDV(11) based on SFA.

Fig. 8. Diagnostic results of S² statistic for IDV(11) at the 167th sample. (a) RBC plot. (b) RBBC plot. (c) Group Lasso-based contribution plot.

IDV(15): Stiction in the condenser cooling water valve

IDV(15) is known as a special faulty case that is rather challenging to identify using traditional data-driven methods (Yin et al., 2012), while dynamic SFA with contribution plots has shown superior sensitivity in dealing with this fault (Shang et al., 2016). Next, it is shown that the diagnostic performance can be further enhanced by incorporating group sparsity in fault identification.

In Fig. 9, the monitoring results based on dynamic SFA are reported, where mild dynamics anomalies are indicated by S². Contribution analysis based on S² is then conducted, and the results are shown in Fig. 10. Even though XMEAS(17) (the stripper underflow) and XMV(11) (the condenser cooling water flow) are highlighted by RBC as possible faulty variables, the entire diagnostic result looks quite complicated, and many nonzero contributions are made by irrelevant variables. Despite the simplicity of RBBC plots, smearing effects still tend to contaminate the contributions and pose challenges for deriving a correct answer. In sharp contrast to RBBC, the group Lasso-based strategy successfully shrinks the contributions of irrelevant variables to zero and distinctly signifies XMV(11) as the main root cause, which is in perfect accordance with the fault mechanism.

Fig. 9. Monitoring statistics for IDV(15) based on SFA.

Fig. 10. Diagnostic results of S² statistic for IDV(15) at the 163rd sample. (a) RBC plot. (b) RBBC plot. (c) Group Lasso-based contribution plot.

Fig. 11. Schematic diagram of a pilot-scale three-tank system.

Fig. 12. The S² statistic for the three-tank system.
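The degeneracy predicted by Theorem 1 is easy to check numerically: whenever Q = LΞ has full row rank, RBBC collapses to Index(x) regardless of which group is reconstructed. The sketch below uses random data and a hypothetical `rbbc` helper built directly from the definition (fault directions Ξ taken as columns of the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
m, K = 6, 2                      # 6 variables, monitoring index of rank K = 2
L = rng.standard_normal((K, m))
M = L.T @ L                      # rank-deficient PSD matrix, M = L^T L as in (34)
x = rng.standard_normal(m)

def rbbc(x, M, idx):
    """Reconstruction-based block contribution along the group directions idx."""
    Xi = np.eye(len(x))[:, idx]  # fault direction matrix for the group
    return x @ M @ Xi @ np.linalg.pinv(Xi.T @ M @ Xi) @ Xi.T @ M @ x

index = x @ M @ x
# Any group with at least K variables makes Q = L @ Xi full row rank
# (almost surely for a random L), so RBBC degenerates to Index(x):
for group in ([0, 1, 2], [3, 4, 5], [1, 3, 5]):
    assert np.isclose(rbbc(x, M, group), index)
```

All three groups return the same value, mirroring the K = 3 three-tank experiment below, where the two groups of three variables yield identical RBBC values.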

6. Case studies on a pilot-scale three-tank system

In this section, case studies are performed on a laboratory three-tank system, which comprises three tanks connected in series by pipelines, as shown in Fig. 11. Two pumps are used to feed water into Tanks 1 and 3, which are connected via Tank 2; because of this, correlations exist between the liquid levels of Tanks 1 and 3. Three outlet valves are kept open, and two cascade controllers are used to maintain the liquid levels in Tanks 1 and 3, which are the main controlled variables in this case. In the primary loop, the manipulated variable is the set point of the inlet flow rate, while in the secondary loop, the manipulated variable is the pump motor speed. A total of six process variables are measured with sensors, which can be classified into two groups related to Tank 1 and Tank 3, respectively, as listed in Table 2.

Table 2
Process variables to be monitored in the pilot-scale three-tank system.

Group No.  Variable  Description
1          CV1       Liquid level of Tank 1
           CV3       Inlet flow rate of Tank 1
           MV1       Inlet flow rate of Tank 1
2          CV2       Liquid level of Tank 3
           CV4       Inlet flow rate of Tank 3
           MV2       Inlet flow rate of Tank 3

A colored Gaussian noise is introduced as a disturbance variable in the secondary loop of the cascade controller of Tank 1, which induces nominal plant-wide variations around the set point, and 1000 normal data samples are generated. In the test phase, the disturbance magnitude increases and leads to abnormal dynamic behaviors. Accordingly, 1000 faulty data samples are collected. The goal is to detect the abnormality and achieve localization with grouped contributions, which are calculated based on the group information in Table 2.

A monitoring model tailored to detecting process dynamics anomalies is established based on SFA without lagged measurements, where K = 4 slow features are used to construct the S² statistic. The monitoring result in the faulty phase is shown in Fig. 12. It can be observed that the process dynamics has slightly changed, since about 20% of the samples exceed the control limit. Troubleshooting is then conducted on this minor faulty case, with the 226th sample used for analysis and the results shown in Fig. 13. In the RBBC plot, the irrelevant group still displays a nonzero contribution, which is completely excluded by the proposed approach.

Fig. 13. Diagnostic results of S² statistic with K = 4 for the three-tank system at the 226th sample. (a) RBBC plot. (b) Group Lasso-based contribution plot.

Next, the dimension of the feature space is chosen as K = 3, which is identical to the number of process variables in both groups. Obviously, the standing assumption of Theorem 1 is satisfied for both groups of variables, and hence RBBC becomes theoretically invalid in this case. The contributions of the two groups are presented in Fig. 14. As expected, the two different groups yield the same value of RBBC, and the inefficacy of RBBC in this case prohibits appropriate evaluation of the group contributions. In contrast, the proposed approach still takes effect by correctly highlighting Group 1 and reporting a zero contribution for Group 2. Finally, the cases of K = 1 and K = 2 are also investigated, and similar observations are obtained. In some sense, the assumption of Theorem 1 is not so restrictive, especially when the variable size within a group is large. This indicates that the invalidity of RBBC can occasionally occur, which further kills the opportunity of performing within-group diagnosis. Such degeneracy can be avoided by the proposed group Lasso-based contribution. Hence, the proposed approach shows widespread applicability and practical advantages.

Fig. 14. Diagnostic results of S² statistic with K = 3 for the three-tank system at the 226th sample. (a) RBBC plot. (b) Group Lasso-based contribution plot.

7. Concluding remarks

In this paper, the fault identification problem was approached from the novel viewpoint of grouped variable selection, and a hierarchical fault diagnosis scheme was proposed. The introduction of the group Lasso penalty helps shrink the contributions of irrelevant groups of variables to zero, thereby alleviating the smearing effect and giving rise to clear fault diagnostic information. An efficient solution algorithm was proposed to tackle the induced optimization problem. It turns out that the proposed approach also applies to another important situation, where massive lagged measurements are included in a dynamic monitoring model. Meanwhile, a fundamental deficiency of the classic reconstruction-based group contribution was pointed out, and the applicability of the proposed approach in this situation was shown. Case studies demonstrated that the group Lasso-based approach yields correct diagnostic information in two challenging faulty cases of the TE process, with superior performance over traditional contribution plot techniques. Even for some mild faults, root-cause blocks and variables can still be effectively identified by the proposed approach. In future work, a promising direction is to further enhance sparsity in fault magnitudes by means of nonconvex penalties (Huang & Yan, 2018), and to extend the proposed method to handle nonlinear monitoring models such as kernel PCA.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (Nos. 61673236, 61433001, 61873142). Hongquan Ji is supported by the National Natural Science Foundation of China (No. 61803232), the Natural Science Foundation of Shandong Province (No. ZR2019BF021), and the China Postdoctoral Science Foundation (No. 2018M642679). Xiaolin Huang acknowledges financial support from the National Natural Science Foundation of China (No. 61603248). Chao Shang also thanks Prof. Yuan Yao for helpful discussions.

References

Alcala, C. F., & Qin, S. J. (2009). Reconstruction-based contribution for process monitoring. Automatica, 45, 1593–1600.
Alcala, C. F., & Qin, S. J. (2011). Analysis and generalization of fault diagnosis methods for process monitoring. Journal of Process Control, 21, 322–330.
Chiang, L. H., Russell, E. L., & Braatz, R. D. (2000). Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemometrics and Intelligent Laboratory Systems, 50, 243–252.
Downs, J. J., & Vogel, E. F. (1993). A plant-wide industrial process control problem. Computers & Chemical Engineering, 17, 245–255.
Dunia, R., & Qin, S. J. (1998). Subspace approach to multidimensional fault identification and reconstruction. AIChE Journal, 44, 1813–1831.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32, 407–499.
Eldar, Y. C., & Kutyniok, G. (2012). Compressed sensing: Theory and applications. Cambridge University Press.
Fan, L., Kodamana, H., & Huang, B. (2018). Identification of robust probabilistic slow feature regression model for process data contaminated with outliers. Chemometrics and Intelligent Laboratory Systems, 173, 1–13.
Feng, G., & Abur, A. (2015). Fault location using wide-area measurements and sparse estimation. IEEE Transactions on Power Systems, 31, 2938–2945.
Gao, X., Shang, C., Yang, F., & Huang, D. (2015). Detecting and isolating plant-wide oscillations via slow feature analysis. In 2015 American Control Conference (ACC) (pp. 906–911). IEEE.
Ge, Z., Song, Z., Ding, S. X., & Huang, B. (2017). Data mining and analytics in the process industry: The role of machine learning. IEEE Access, 5, 20590–20616.
He, B., Yang, X., Chen, T., & Zhang, J. (2012). Reconstruction-based multivariate contribution analysis for fault isolation: A branch and bound approach. Journal of Process Control, 22, 1228–1236.
Huang, X., & Yan, M. (2018). Nonconvex penalties with analytical solutions for one-bit compressive sensing. Signal Processing, 144, 341–351.
Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13, 411–430.
Ji, H., He, X., Shang, J., & Zhou, D. (2018). Exponential smoothing reconstruction approach for incipient fault isolation. Industrial and Engineering Chemistry Research, 57, 6353–6363.
Ji, H., He, X., & Zhou, D. (2016). On the use of reconstruction-based contribution for fault diagnosis. Journal of Process Control, 40, 24–34.
Jiang, Q., Wang, B., & Yan, X. (2015). Multiblock independent component analysis integrated with Hellinger distance and Bayesian inference for non-Gaussian plant-wide process monitoring. Industrial and Engineering Chemistry Research, 54, 2497–2508.
Jiang, Q., & Yan, X. (2015). Nonlinear plant-wide process monitoring using MI-spectral clustering and Bayesian inference-based multiblock KPCA. Journal of Process Control, 32, 38–50.
Jiang, Q., Yan, X., & Huang, B. (2019). Review and perspectives of data-driven distributed monitoring for industrial plant-wide processes. Industrial and Engineering Chemistry Research, 58, 12899–12912.
Kariwala, V., Odiowei, P. E., Cao, Y., & Chen, T. (2010). A branch and bound method for isolation of faulty variables through missing variable analysis. Journal of Process Control, 20, 1198–1206.
Kim, C., Lee, H., Kim, K., Lee, Y., & Lee, W. B. (2018). Efficient process monitoring via the integrated use of Markov random fields learning and the graphical Lasso. Industrial and Engineering Chemistry Research, 57, 13144–13155.
Ku, W., Storer, R. H., & Georgakis, C. (1995). Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30, 179–196.
Lee, J. M., Yoo, C. K., & Lee, I. B. (2004a). Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chemical Engineering Science, 59, 2995–3006.
Lee, J. M., Yoo, C. K., & Lee, I. B. (2004b). Statistical process monitoring with independent component analysis. Journal of Process Control, 14, 467–485.
Liu, J. (2012). Fault diagnosis using contribution plots without smearing effect on non-faulty variables. Journal of Process Control, 22, 1609–1623.
Liu, Q., Chai, T., & Qin, S. J. (2012). Fault diagnosis of continuous annealing processes using a reconstruction-based method. Control Engineering Practice, 20, 511–518.
Liu, J., Wong, D. S. H., & Chen, D. S. (2014). Bayesian filtering of the smearing effect: Fault isolation in chemical process monitoring. Journal of Process Control, 24, 1–21.
Liu, Y., Zeng, J., Xie, L., Luo, S., & Su, H. (2019). Structured joint sparse principal component analysis for fault detection and isolation. IEEE Transactions on Industrial Informatics, 15, 2721–2731.
Lyman, P. R., & Georgakis, C. (1995). Plant-wide control of the Tennessee Eastman problem. Computers & Chemical Engineering, 19, 321–331.
MacGregor, J., & Cinar, A. (2012). Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Computers & Chemical Engineering, 47, 111–120.
MacGregor, J. F., Jaeckle, C., Kiparissides, C., & Koutoudi, M. (1994). Process monitoring and diagnosis by multiblock PLS methods. AIChE Journal, 40, 826–838.
Miller, P., Swanson, R. E., & Heckler, C. E. (1998).
Shang, C., Yang, F., Huang, B., & Huang, D. (2018). Recursive slow feature analysis for adaptive monitoring of industrial processes. IEEE Transactions on Industrial Electronics, 65, 8895–8905.
Sun, H., Zhang, S., Zhao, C., & Gao, F. (2017). A sparse reconstruction strategy for online fault diagnosis in nonstationary processes with no a priori fault information. Industrial and Engineering Chemistry Research, 56, 6993–7008.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 58, 267–288.
Tulsyan, A., Garvin, C., & Ündey, C. (2018). Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems. Biotechnology and Bioengineering, 115, 1915–1924.
Verhaegen, M., & Hansson, A. (2016). N2SID: Nuclear norm subspace identification of innovation models. Automatica, 72, 57–63.
Westerhuis, J. A., Gurden, S. P., & Smilde, A. K. (2000). Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems, 51, 95–114.
Wiskott, L., & Sejnowski, T. (2002). Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14, 715–770.
Xu, H., Yang, F., Ye, H., Li, W., Xu, P., & Usadi, A. K. (2013). Weighted reconstruction-based contribution for improved fault diagnosis. Industrial and Engineering Chemistry Research, 52, 9858–9870.
Yan, Z., & Yao, Y. (2015). Variable selection method for fault isolation using least absolute shrinkage and selection operator (Lasso). Chemometrics and Intelligent Laboratory Systems, 146, 136–146.
Yang, F., Duan, P., Shah, S. L., & Chen, T. (2014). Capturing connectivity and causality in complex industrial processes. Springer.
Yang, C., Wan, X., Yang, Q., Xue, H., & Yu, W. (2010). Identifying main effects and epistatic interactions from large-scale SNP data via adaptive group Lasso. BMC Bioinformatics, 11(S18).
Yang, Y., & Zou, H. (2015). A fast unified algorithm for solving group-Lasso penalize learning problems. Statistics and Computing, 25, 1129–1141.
Yin, S., Ding, S. X., Haghani, A., Hao, H., & Zhang, P. (2012). A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. Journal of Process Control, 22, 1567–1581.
Yin, S., & Kaynak, O. (2015). Big data for modern industry: Challenges and trends [point of view]. Proceedings of the IEEE, 103, 143–146.
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 68, 49–67.
Zeng, J., Huang, B., & Xie, L. (2015). A Bayesian sparse reconstruction method for fault detection and isolation. Journal of Chemometrics, 29, 349–360.
Zhang, J., Zhao, Y., Liu, M., & Kong, L. (2019). Bearings fault diagnosis based on adaptive local iterative filtering–multiscale permutation entropy and multinomial logistic model with group-Lasso. Advances in Mechanical Engineering, 11, 1–13.
Zhang, Y., Zhou, H., Qin, S. J., & Chai, T. (2009). Decentralized fault diagnosis of large-scale processes using multiblock kernel partial least squares. IEEE Transactions on Industrial Informatics, 6, 3–10.
Zhao, Z., Wu, S., Qiao, B., Wang, S., & Chen, X. (2019). Enhanced sparse period-group lasso for bearing fault diagnosis. IEEE Transactions on Industrial Electronics, 66, 2143–2153.
Zheng, H., & Yan, X. (2019). Extracting dissimilarity of slow feature analysis between normal and different faults for monitoring process status and fault diagnosis. Journal of Chemical Engineering of Japan, 52, 283–292.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 67, 301–320.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265–286.
Contribution plots: A missing link in multivariate quality control. Applied Mathematics and Computer Science, 8, 775โ€“792. Ohlsson, H., Chen, T., Pakazad, S. K., Ljung, L., & Sastry, S. S. (2014). Scalable anomaly detection in large homogeneous populations. Automatica, 50, 1459โ€“1465. Qin, S. J. (2003). Statistical process monitoring: Basics and beyond. Journal of Chemometrics, 17, 480โ€“502. Qin, S. J. (2012). Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control, 36, 220โ€“234. Qin, S. J. (2014). Process data analytics in the era of big data. AIChE Journal, 60, 3092โ€“3100. Qin, S. J., & Chiang, L. H. (2019). Advances and opportunities in machine learning for process data analytics. Computers & Chemical Engineering, 126, 465โ€“473. Qin, S. J., Valle, S., & Piovoso, M. J. (2001). On unifying multiblock analysis with application to decentralized process monitoring. Journal of Chemometrics: A Journal of the Chemometrics Society, 15, 715โ€“742. Qin, Y., & Zhao, C. (2019). Comprehensive process decomposition for closed-loop process monitoring with quality-relevant slow feature analysis. Journal of Process Control, 77, 141โ€“154. Roth, V., & Fischer, B. (2008). The group-lasso for generalized linear models: Uniqueness of solutions and efficient algorithms. In Proceedings of the 25th International Conference on Machine Learning (pp. 848โ€“855). ACM. Shang, C. (2018). Dynamic modeling of complex industrial processes: Data-driven methods and application research. Springer. Shang, C., Huang, B., Yang, F., & Huang, D. (2015a). Probabilistic slow feature analysisbased representation learning from massive process data for soft sensor modeling. AIChE Journal, 61, 4126โ€“4139. Shang, C., Huang, B., Yang, F., & Huang, D. (2016). Slow feature analysis for monitoring and diagnosis of control performance. Journal of Process Control, 39, 21โ€“34. Shang, C., Yang, F., Gao, X., Huang, X., Suykens, J. A. K., & Huang, D. (2015b). 
Concurrent monitoring of operating condition deviations and process dynamics anomalies with slow feature analysis. AIChE Journal, 61, 3666โ€“3682.
