Proceedings, 10th IFAC International Symposium on Advanced Control of Chemical Processes
Shenyang, Liaoning, China, July 25-27, 2018
Available online at www.sciencedirect.com
ScienceDirect
IFAC PapersOnLine 51-18 (2018) 614–619
Soft Sensor Development for Multimode Processes Based on Semisupervised Gaussian Mixture Models ⋆
Weiming Shao, Zhihuan Song, Le Yao

State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]; [email protected]; [email protected]).
⋆ This work was supported by the National Key Research and Development Program of China (2017YFB0304203) and by the Natural Science Foundation of China (61703367).

Abstract: The Gaussian mixture model (GMM) is an effective tool for modeling processes with multiple operating modes, which widely exist in industrial process systems. The traditional supervised version of the GMM, namely the Gaussian mixture regression (GMR), relies merely on labeled samples for developing soft sensors. However, labeled samples in soft sensor applications are usually very infrequent due to economical or technical limitations, which may lead the GMR to unreliable parameter estimation and finally poor performance in predicting the primary variable. To tackle this problem, a semisupervised GMM for regression purposes is proposed, where both labeled and unlabeled samples take effect, and the Gaussian parameters and regression coefficients are learned simultaneously based on the expectation-maximization algorithm. Two case studies are carried out using a simulated dataset and a real-life dataset collected from a primary reformer in an ammonia synthesis process, which demonstrate the effectiveness of the proposed method.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Soft sensor, multimode process, semisupervised Gaussian mixture models, Gaussian mixture regression, expectation-maximization.

1. INTRODUCTION

In industrial processes, many quality-related variables, called the primary variables, are measured with large delay in the laboratory or at high price using analyzers. Soft sensors are popular alternatives to the lab analysis or analyzer, since they are delay-free and very low-cost (Ge et al. (2017)). As process data that reflect the true process conditions become available, during the past decades a variety of data-driven algorithms have been applied to soft sensor development, such as principal component regression, partial least squares, artificial neural networks, support vector machines, and so forth. Extensive reviews of the development algorithms and application surveys of soft sensors in industrial processes can be found in (Kadlec et al. (2009); Kano and Ogawa (2009); Ge (2017)).

Processes with multiple operating modes widely exist in industrial process systems, which may result from multiple product grade demands, changes in feedstocks, load variations or seasonal operations (Souza and Araújo (2014)). Developing soft sensors for these processes needs to take into account the characteristics of the multiple operating modes, which make the processes exhibit non-Gaussian behaviors. As a result, a single model may fail to perform satisfactorily and multiple models accounting for each mode are required. In addition, due to measurement variations and transmission disturbances, industrial processes are inherently random processes (Yuan et al. (2017)), thus probabilistic models are more proper to handle the uncertainties compared to their deterministic counterparts.

The Gaussian mixture model (GMM) is well known to be effective in modeling the multimode characteristics and accounting for the process uncertainties simultaneously, and has recently established itself as a popular tool for developing soft sensors for processes with multiple operating conditions. The GMM-based soft sensors can be generally categorized into two groups. In the first one, the secondary and primary variables are manipulated separately, and the unsupervised GMM is used for mode identification. Then localized models are trained for each mode using regression algorithms such as the kernel partial least squares (Yu (2012)) and Gaussian process regression (Grbić et al. (2013)). In the other group, the supervised GMM, namely the Gaussian mixture regression (GMR), is developed, which treats the secondary and primary variables together and learns their joint probability distribution function (PDF) in each mode. The functional relationship for the soft sensor can be derived directly from the joint PDF. For instance, Yuan et al. (2014) trained soft sensors using the GMR for multimode/multiphase processes, which shows that the GMR outperforms the unsupervised GMM, because the GMR learns model parameters together instead of separately, and thus is not constrained by the number of samples in each mode. Zhu et al. (2017) developed a variational GMR to model non-Gaussian processes. Their studies indicate that the Bayesian treatment can not only
automatically determine the number of Gaussians, but also improve the performance compared to the standard GMR.

Although the GMR outperforms the unsupervised GMM, it requires sufficient labeled samples (for which both the secondary and primary variables are known). However, in soft sensor applications, labelling samples could be expensive and infrequent due to certain economical or technical limitations. As a result, labeled samples are usually rare and the success of the GMR cannot be guaranteed, because insufficient samples often lead to unreliable estimation of the PDF, especially when the dimensionality of the process variables is high. On the contrary, there are large amounts of unlabeled samples (for which only the secondary variables are known), which also contain useful information yet have not been utilized in the GMR. Semisupervised learning, which exploits both labeled and unlabeled samples, has been confirmed effective for soft sensor development in remedying the insufficiency of labeled samples (Yao and Ge (2018)). Moreover, the validity of the semisupervised GMM for classification applications has also been demonstrated (Xing et al. (2013); Yan et al. (2017)). Unfortunately, to our best knowledge, no work on the semisupervised GMM for the regression task has been reported, in particular for soft sensor modeling. Therefore, in order to deal with the above-analyzed deficiency of the GMR in the soft sensor application, this paper proposes a semisupervised GMM (S2GMM) for regression purposes, where both labeled and unlabeled samples take effect, and the PDF parameters and regression coefficients in each mode are learned simultaneously by the expectation-maximization (EM) algorithm.

2. SEMISUPERVISED GAUSSIAN MIXTURE MODELS

2.1 Formulation of the S2GMM
Let x ∈ R^d and y ∈ R be the d-dimensional input variable and the scalar output variable, respectively, and let (X_l, Y_l) = \{(x_i, y_i)\}_{i=1}^{n_l} and X_u = \{x_j\}_{j=1}^{n_u} denote the labeled and unlabeled datasets, respectively. Here n_l and n_u are the numbers of labeled and unlabeled samples. Assume there are a total of K Gaussian components, and for the k-th component the PDF of x and the functional dependence of y on x are defined as

    P_k(x) = \mathcal{N}(x \mid \mu_k^x, \Sigma_k^x), \qquad y = w_k^T x + w_{k0} + \varepsilon_k    (1)

where P_k(x) is the PDF of x, \mathcal{N}(x \mid \mu_k^x, \Sigma_k^x) stands for the Gaussian distribution over x with mean vector \mu_k^x and covariance matrix \Sigma_k^x, w_k and w_{k0} are the regression coefficients between x and y, and \varepsilon_k \sim \mathcal{N}(0, \sigma_k^2) denotes the measurement noise of y, for the k-th component. By linear Gaussian operations, the joint PDF of x and y for the k-th component can be obtained as

    P_k(x, y) = \mathcal{N}(x, y \mid \mu_k^{xy}, \Sigma_k^{xy})    (2)

where

    \mu_k^{xy} = \begin{bmatrix} \mu_k^x \\ w_k^T \mu_k^x + w_{k0} \end{bmatrix}, \qquad \Sigma_k^{xy} = \begin{bmatrix} \Sigma_k^x & \Sigma_k^x w_k \\ w_k^T \Sigma_k^x & w_k^T \Sigma_k^x w_k + \sigma_k^2 \end{bmatrix}

Also, the conditional PDF of y given x is calculated as

    P_k(y \mid x) = \mathcal{N}(y \mid w_k^T x + w_{k0}, \sigma_k^2)    (3)

Therefore, in the S2GMM the PDFs of labeled and unlabeled samples can be expressed as

    P(x_i, y_i) = \sum_{k=1}^{K} \alpha_k P_k(x_i, y_i)    (4)

    P(x_j) = \sum_{k=1}^{K} \alpha_k P_k(x_j)    (5)

where \alpha_k = P(z_i = k) = P(z_j = k) is the prior of the k-th component for i = 1, ..., n_l and j = 1, ..., n_u. Here z_i and z_j are the latent variables associated with the i-th labeled sample and the j-th unlabeled sample, respectively. Again, using the linear Gaussian operation, the posterior distributions over the latent variables given the corresponding labeled and unlabeled samples are calculated as

    P(z_i = k \mid x_i, y_i) = \frac{\alpha_k \mathcal{N}(x_i, y_i \mid \mu_k^{xy}, \Sigma_k^{xy})}{\sum_{k=1}^{K} \alpha_k \mathcal{N}(x_i, y_i \mid \mu_k^{xy}, \Sigma_k^{xy})} \equiv R_{ik}^{l}    (6)

    P(z_j = k \mid x_j) = \frac{\alpha_k \mathcal{N}(x_j \mid \mu_k^x, \Sigma_k^x)}{\sum_{k=1}^{K} \alpha_k \mathcal{N}(x_j \mid \mu_k^x, \Sigma_k^x)} \equiv R_{jk}^{u}    (7)

where R_{ik}^{l} and R_{jk}^{u} represent the posterior responsibilities of the k-th component for generating the i-th labeled sample and the j-th unlabeled sample, respectively.
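The responsibilities in Eqs. (6) and (7) are the E-step quantities used by the learning procedure of Section 2.2. The following is a minimal sketch of how they could be computed with NumPy/SciPy; the function and argument names are assumptions for illustration, not part of the original paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(Xl, Yl, Xu, alpha, mu_xy, Sigma_xy, mu_x, Sigma_x):
    """E-step of the S2GMM: posterior responsibilities of Eqs. (6)-(7).

    Xl: (nl, d) labeled inputs, Yl: (nl,) outputs, Xu: (nu, d) unlabeled inputs.
    alpha: (K,) priors; mu_xy/Sigma_xy are the joint parameters of Eq. (2),
    mu_x/Sigma_x the input-only parameters of Eq. (1), one entry per component.
    """
    K = len(alpha)
    XY = np.hstack([Xl, Yl[:, None]])   # joint (x, y) samples for the labeled set
    Rl = np.column_stack([alpha[k] * multivariate_normal.pdf(XY, mu_xy[k], Sigma_xy[k])
                          for k in range(K)])
    Ru = np.column_stack([alpha[k] * multivariate_normal.pdf(Xu, mu_x[k], Sigma_x[k])
                          for k in range(K)])
    Rl /= Rl.sum(axis=1, keepdims=True)   # normalize over components, Eq. (6)
    Ru /= Ru.sum(axis=1, keepdims=True)   # Eq. (7)
    return Rl, Ru
```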
2.2 Parameter learning for the S2GMM

For the S2GMM, the model parameters that need to be learned are denoted as \Theta = \{\alpha_k, \mu_k^x, \Sigma_k^x, w_k, w_{k0}, \sigma_k^2\}_{k=1}^{K}. In this paper, we develop an efficient way of fulfilling this task based on the EM algorithm. The complete log-likelihood function is formulated as

    L(\Theta) = \sum_{Z} P(Z \mid X_l, Y_l, X_u) \ln P(X_l, Y_l, X_u, Z)
              = \sum_{i=1}^{n_l} \sum_{k=1}^{K} R_{ik}^{l} \ln P_k(y_i \mid x_i) + \sum_{i=1}^{n_l} \sum_{k=1}^{K} R_{ik}^{l} \ln P_k(x_i)
              + \sum_{j=1}^{n_u} \sum_{k=1}^{K} R_{jk}^{u} \ln P_k(x_j) + \sum_{i=1}^{n_l} \sum_{k=1}^{K} R_{ik}^{l} \ln \alpha_k
              + \sum_{j=1}^{n_u} \sum_{k=1}^{K} R_{jk}^{u} \ln \alpha_k + \text{const}    (8)

where Z = (Z_l, Z_u), and Z_l = \{z_i\}_{i=1}^{n_l} and Z_u = \{z_j\}_{j=1}^{n_u} are the latent variable sets corresponding to (X_l, Y_l) and X_u, respectively. Setting the derivatives of L(\Theta) with respect to the model parameters other than \{\alpha_k\}_{k=1}^{K} to zero gives their iteration equations. Specifically,
    \frac{\partial L(\Theta)}{\partial \tilde{w}_k} = 0 \implies \tilde{w}_k = (\tilde{X}_l^T \Omega_k^l \tilde{X}_l)^{-1} \tilde{X}_l^T \Omega_k^l Y_l    (9)

where \tilde{w}_k = [w_k; w_{k0}], \Omega_k^l = \mathrm{diag}(R_{1k}^{l}, ..., R_{n_l k}^{l}), \tilde{X}_l = (X_l, \mathbf{1}), and \mathbf{1} = (1, ..., 1)^T \in R^{n_l}. Similarly, we can get

    \sigma_k^2 = \sum_{i=1}^{n_l} R_{ik}^{l} (y_i - w_k^T x_i - w_{k0})^2 \bigg/ \sum_{i=1}^{n_l} R_{ik}^{l}    (10)

    \mu_k^x = \left( \sum_{i=1}^{n_l} R_{ik}^{l} x_i + \sum_{j=1}^{n_u} R_{jk}^{u} x_j \right) \bigg/ \left( \sum_{i=1}^{n_l} R_{ik}^{l} + \sum_{j=1}^{n_u} R_{jk}^{u} \right)    (11)

    \Sigma_k^x = \left( \sum_{i=1}^{n_l} R_{ik}^{l} \bar{x}_i \bar{x}_i^T + \sum_{j=1}^{n_u} R_{jk}^{u} \bar{x}_j \bar{x}_j^T \right) \bigg/ \left( \sum_{i=1}^{n_l} R_{ik}^{l} + \sum_{j=1}^{n_u} R_{jk}^{u} \right)    (12)

where \bar{x}_i = x_i - \mu_k^x and \bar{x}_j = x_j - \mu_k^x.

Using the Lagrange multiplier that combines L(\Theta) and the constraint \sum_{k=1}^{K} \alpha_k = 1 leads to

    \alpha_k = \left( \sum_{i=1}^{n_l} R_{ik}^{l} + \sum_{j=1}^{n_u} R_{jk}^{u} \right) \bigg/ (n_l + n_u)    (13)
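The updates in Eqs. (9)-(13) are all in closed form. A minimal M-step sketch is given below (an illustration only; function and variable names are assumed, not from the paper), reusing the responsibilities Rl and Ru from the E-step sketch above.

```python
import numpy as np

def m_step(Xl, Yl, Xu, Rl, Ru):
    """M-step of the S2GMM: closed-form parameter updates of Eqs. (9)-(13)."""
    nl, d = Xl.shape
    K = Rl.shape[1]
    Xl_aug = np.hstack([Xl, np.ones((nl, 1))])        # augmented inputs (X_l, 1)
    w, w0, sigma2, mu_x, Sigma_x = [], [], [], [], []
    for k in range(K):
        Omega = np.diag(Rl[:, k])                     # Omega_k^l in Eq. (9)
        w_tilde = np.linalg.solve(Xl_aug.T @ Omega @ Xl_aug,
                                  Xl_aug.T @ Omega @ Yl)   # weighted least squares, Eq. (9)
        w.append(w_tilde[:d]); w0.append(w_tilde[d])
        resid = Yl - Xl_aug @ w_tilde
        sigma2.append((Rl[:, k] * resid ** 2).sum() / Rl[:, k].sum())   # Eq. (10)
        nk = Rl[:, k].sum() + Ru[:, k].sum()
        mu_k = (Rl[:, k] @ Xl + Ru[:, k] @ Xu) / nk                     # Eq. (11)
        Xc_l, Xc_u = Xl - mu_k, Xu - mu_k
        Sigma_k = (Xc_l.T @ (Rl[:, k, None] * Xc_l)
                   + Xc_u.T @ (Ru[:, k, None] * Xc_u)) / nk             # Eq. (12)
        mu_x.append(mu_k); Sigma_x.append(Sigma_k)
    alpha = (Rl.sum(axis=0) + Ru.sum(axis=0)) / (nl + Xu.shape[0])      # Eq. (13)
    return alpha, np.array(mu_x), Sigma_x, np.array(w), np.array(w0), np.array(sigma2)
```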
Detailed derivations of Eqs. (8)-(13) are omitted here; one can refer to Bishop (2006) for the fundamentals. The procedure of parameter learning for the S2GMM based on the EM algorithm is summarized as Algorithm 1.

Algorithm 1 Pseudocode for parameter learning of the S2GMM

Step 1): Initialize \Theta = \{\alpha_k, \mu_k^x, \Sigma_k^x, w_k, w_{k0}, \sigma_k^2\}_{k=1}^{K}.
Step 2): Repeat Steps 3)-4) until the convergence criterion is satisfied.
Step 3): E-step
  for k = 1, ..., K; i = 1, ..., n_l; j = 1, ..., n_u
    Calculate R_{ik}^{l} with Eq. (6).
    Calculate R_{jk}^{u} with Eq. (7).
  end for
Step 4): M-step
  for k = 1, ..., K
    Update \tilde{w}_k with Eq. (9).
    Update \sigma_k^2 with Eq. (10).
    Update \mu_k^x with Eq. (11).
    Update \Sigma_k^x with Eq. (12).
    Update \alpha_k with Eq. (13).
  end for

In Algorithm 1, the convergence criterion can be either the increment of the log-likelihood function or the maximum iteration number. Also, from Algorithm 1 it can be seen that in the S2GMM, learning \{\alpha_k, \mu_k^x, \Sigma_k^x\}_{k=1}^{K} explicitly involves both labeled and unlabeled samples, while unlabeled samples take effect implicitly in the learning of \{\tilde{w}_k, \sigma_k^2\}_{k=1}^{K} through R_{ik}^{l} (k = 1, ..., K; i = 1, ..., n_l).
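Algorithm 1 then amounts to alternating the two sketches above. The driver below is a hedged illustration (a fixed iteration count instead of a log-likelihood test, and assumed helper names); it also assembles the joint parameters of Eq. (2) between iterations so that Eq. (6) can be evaluated.

```python
import numpy as np

def fit_s2gmm(Xl, Yl, Xu, alpha, mu_x, Sigma_x, w, w0, sigma2, n_iter=100):
    """EM loop of Algorithm 1 (sketch); initial parameters may come from k-means."""
    K = len(alpha)
    for _ in range(n_iter):
        # Assemble the joint Gaussian parameters of Eq. (2) for the labeled E-step.
        mu_xy = [np.append(mu_x[k], w[k] @ mu_x[k] + w0[k]) for k in range(K)]
        Sigma_xy = [np.block([[Sigma_x[k], (Sigma_x[k] @ w[k])[:, None]],
                              [(w[k] @ Sigma_x[k])[None, :],
                               np.array([[w[k] @ Sigma_x[k] @ w[k] + sigma2[k]]])]])
                    for k in range(K)]
        Rl, Ru = responsibilities(Xl, Yl, Xu, alpha, mu_xy, Sigma_xy, mu_x, Sigma_x)  # E-step
        alpha, mu_x, Sigma_x, w, w0, sigma2 = m_step(Xl, Yl, Xu, Rl, Ru)              # M-step
    return alpha, mu_x, Sigma_x, w, w0, sigma2
```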
3. SOFT SENSOR DEVELOPMENT BASED ON S2GMM

Based on the S2GMM, a soft sensor model can be developed for estimating the true value y_q of the primary variable when a sample x_q of the secondary variables is available, according to the following procedure. The posterior distribution over the latent variable z_q given x_q is calculated according to

    P(z_q = k \mid x_q) = \frac{P(z_q = k) P(x_q \mid z_q = k)}{\sum_{k=1}^{K} P(z_q = k) P(x_q \mid z_q = k)} = \frac{\alpha_k \mathcal{N}(x_q \mid \mu_k^x, \Sigma_k^x)}{\sum_{k=1}^{K} \alpha_k \mathcal{N}(x_q \mid \mu_k^x, \Sigma_k^x)}    (14)

For simplicity, P(z_q = k \mid x_q) is denoted as R_{qk}^{u}. Then the conditional PDF of y_q given x_q is computed as

    P(y_q \mid x_q) = \sum_{k=1}^{K} P(z_q = k \mid x_q) P(y_q \mid x_q, z_q = k) = \sum_{k=1}^{K} R_{qk}^{u} \mathcal{N}(y_q \mid w_k^T x_q + w_{k0}, \sigma_k^2)    (15)

Therefore, the estimate of y_q is determined as

    \hat{y}_q = E[y_q \mid x_q] = \sum_{k=1}^{K} R_{qk}^{u} (w_k^T x_q + w_{k0})    (16)
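Eqs. (14)-(16) reduce prediction to a responsibility-weighted combination of the local linear models. A minimal sketch follows (names and array layout are assumptions, consistent with the sketches above):

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_s2gmm(Xq, alpha, mu_x, Sigma_x, w, w0):
    """Point predictions per Eqs. (14)-(16) for query inputs Xq of shape (nq, d)."""
    K = len(alpha)
    Rq = np.column_stack([alpha[k] * multivariate_normal.pdf(Xq, mu_x[k], Sigma_x[k])
                          for k in range(K)])
    Rq /= Rq.sum(axis=1, keepdims=True)              # posterior over modes, Eq. (14)
    local = Xq @ np.asarray(w).T + np.asarray(w0)    # per-mode linear predictions
    return (Rq * local).sum(axis=1)                  # weighted conditional mean, Eq. (16)
```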
4. CASE STUDIES
In this section, the S2GMM is first investigated using a numerical example and then applied to developing a soft sensor for a real-life industrial primary reformer in an ammonia synthesis process. For comparison purposes, the performance of the GMR (Yuan et al. (2014)) and the popular partial least squares (PLS) (Lindgren et al. (1993)) is also presented. Parameter initialization in the learning procedure of the GMR and S2GMM is aided by the k-means clustering method following the suggestion of Bishop (2006). In addition, a certain proportion of the training samples is chosen as labeled samples, and the rest of the training samples are treated as unlabeled. To deal with the randomness in the initialization procedure of the GMR and S2GMM, 100 independent simulations are run for each labeling rate. The prediction accuracy is measured using the averaged root mean square error (RMSE), which is defined as

    \mathrm{RMSE} = \sqrt{ \sum_{t=1}^{n_t} (y_t - \hat{y}_t)^2 \bigg/ n_t }    (17)

where y_t and \hat{y}_t represent the true and predicted labels of the t-th testing sample, respectively, and n_t stands for the number of testing samples.
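For reference, Eq. (17) corresponds to a short NumPy routine such as the following (a trivial sketch):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over the n_t testing samples, Eq. (17)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```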
4.1 Numerical Example

Assume a 2-dimensional input vector x = (x_1, x_2)^T and a scalar output y follow the relationship described in Eq. (1) with 3 Gaussian components, where the configurations of each Gaussian component are listed in Table 1, and the data distributions in the input space are visualized in Fig. 1, which presents clear multimode characteristics. In the simulation, a total of 200 samples are generated for model parameter learning and 1000 samples are generated for performance evaluation.
Table 1. Configurations of the three Gaussian components

                 k = 1           k = 2            k = 3
alpha_k          20%             30%              50%
mu_k^x           (0, 2)^T        (4, 0)^T         (4, 6)^T
Sigma_k^x        [2 1; 1 1]      [3 -1; -1 1.5]   [1 0.5; 0.5 2]
w_k              (1, 1)^T        (-1, 1)^T        (1, -1)^T
w_k0             0               0                0
sigma_k^2        0.5             0.5              0.5

Fig. 1. Visualization of the numerical example in the input space using the testing dataset.
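As a hedged illustration of the simulation setup (not the authors' original script), a dataset of this kind could be generated from the values of Table 1 and Eq. (1) roughly as follows; all variable names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.2, 0.3, 0.5])                       # priors alpha_k from Table 1
mu = np.array([[0, 2], [4, 0], [4, 6]], dtype=float)    # component means mu_k^x
Sigma = [np.array([[2.0, 1.0], [1.0, 1.0]]),
         np.array([[3.0, -1.0], [-1.0, 1.5]]),
         np.array([[1.0, 0.5], [0.5, 2.0]])]            # covariances Sigma_k^x
w = np.array([[1, 1], [-1, 1], [1, -1]], dtype=float)   # regression coefficients w_k
w0 = np.zeros(3)                                        # intercepts w_k0
sigma2 = np.full(3, 0.5)                                # output noise variances

def sample(n):
    z = rng.choice(3, size=n, p=alpha)                                  # mode labels
    X = np.vstack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])
    y = np.einsum('ij,ij->i', X, w[z]) + w0[z] + rng.normal(0.0, np.sqrt(sigma2[z]))
    return X, y

X_train, y_train = sample(200)    # 200 samples for parameter learning
X_test, y_test = sample(1000)     # 1000 samples for performance evaluation
```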
Scatter plot comparisons among the three investigated soft sensing methods are illustrated in Fig. 2, where the labeling rate is 10%. It is found that in the prediction results obtained by the PLS, significant bias occurs in all three modes, implying that the PLS does not model any mode well. The reason is that the PLS is a linear algorithm and can only deal with Gaussian or approximately Gaussian distributions, whereas this multimode example is strongly nonlinear and non-Gaussian. In contrast, the GMR and S2GMM show more powerful abilities in capturing the multimode characteristics. However, the GMR does not perform well in the area where the output is greater than 0, because this area corresponds to the first component, whose data distribution is longer and narrower and is easily disturbed by the other two components, which can be inferred from Fig. 1.

Fig. 2. Scatter plot comparisons among PLS, GMR and S2GMM (predicted value versus true value).
The average predictive RMSE comparisons are presented in Table 2, which indicates that the addition of labeled samples does not help the PLS much, due to its limitation in dealing with non-Gaussianity. Compared with the PLS, both the GMR and S2GMM demonstrate evident advantages, and when the labeling rates are 40% and 50%, the predictive accuracies of the GMR and S2GMM are almost identical. However, when the labeling rate is reduced below 40%, the performance of the GMR starts to deteriorate, especially when the labeling rate decreases to 10%. By contrast, as the labeling rate decreases, the performance of the S2GMM deteriorates relatively slowly and the deterioration is much smaller. For example, when the labeling rate decreases from 50% to 10%, the deteriorations for the GMR and S2GMM are 55.7% and 10.6%, respectively. Thus, in this numerical example it can be concluded that the S2GMM outperforms the GMR with a small amount of labeled samples, which verifies the benefit of incorporating the unlabeled samples.

Table 2. Average predictive RMSE for the numerical example by PLS, GMR and S2GMM

labeling rate    PLS       GMR       S2GMM
10%              2.1943    1.6776    1.1956
20%              2.0632    1.3890    1.1839
30%              2.1567    1.3184    1.0876
40%              2.0939    1.0101    1.0372
50%              2.0617    1.0776    1.0815
4.2 Primary Reformer
The primary reformer shown in Fig. 3 comes from the hydrogen manufacturing units in the ammonia synthesis process. It transforms the desulphurized natural gas into crude synthesis gas for ammonia production in a follow-up unit through the following reactions with nickel catalyst:

    C_nH_{2n+2} + nH_2O \overset{\Delta}{\rightleftharpoons} nCO + (2n+1)H_2
    CH_4 + H_2O \overset{\Delta}{\rightleftharpoons} CO + 3H_2              (18)
    CO + H_2O \overset{\Delta}{\rightleftharpoons} CO_2 + H_2

Reaction temperature is a pivotal factor to keep the chemical reactions described in Eq. (18) stable. In an ammonia production plant of the China National Offshore Oil Company, the reaction temperature is controlled at a certain level by manipulating the burning condition in the furnace, which is realized by monitoring the concentration of the oxygen at the top of the primary reformer, measured by an expensive mass spectrometer (AR03001). Thus, a soft sensor is desired for online estimation of the oxygen concentration at the top of the primary reformer. Secondary variables of the soft sensor are selected using expert knowledge from the field engineers (Yao and Ge (2017)), which are presented in Table 3.

Fig. 3. Flow chart of the primary reformer.

Table 3. Descriptions of selected secondary variables for soft sensing the O2 concentration

Tags            Descriptions
FR03001.PV      Flow of fuel natural gas
FR03002.PV      Flow of fuel off gas
PC03002.PV      Pressure of fuel off gas
PC03007.PV      Pressure of furnace flue gas
TI03001.PV      Temperature of fuel off gas
TI03009.PV      Temperature of fuel natural gas
TR03012.PV      Temperature of process gas
TI03013.PV      Temperature of furnace flue gas
TI03014.PV      Temperature of furnace flue gas
TR03015.PV      Temperature of mixed furnace flue gas
TR03016.PV      Temperature of transformed gas
TR03017.PV      Temperature of transformed gas
TR03020.PV      Temperature of transformed gas
Around 1500 samples for soft sensor development have been selected from the database of the distributed control systems in a real-life ammonia production plant. These samples are evenly partitioned into training and testing sets. The primary reformer is a multimode process due to the operating conditions, which is shown in Fig. 4 using the testing samples. The number of Gaussian components for both the GMR and S2GMM is roughly pre-defined as 3 according to the operating conditions. Note that some criteria, such as the Akaike information criterion (Yan et al. (2017)) and the absolute increment of the log-likelihood (Yuan et al. (2014)), can also be used for automatically determining the number of Gaussian components. Predictions achieved by the PLS, GMR and S2GMM-based soft sensors for the oxygen concentration at their average performance levels with labeling rate = 20% are presented in Fig. 5(a), Fig. 5(b) and Fig. 5(c), respectively.

Fig. 4. Illustration of the multimode characteristics of the primary reformer in terms of O2 content.

Fig. 5. Predictions of the oxygen concentration by: (a) PLS; (b) GMR; (c) S2GMM.

It can be seen from Fig. 5(a) that, as in the previous numerical example, the PLS could not deal with any mode pertinently, but tends to find a balance among the three modes, because of its failure in dealing with the multimode characteristics. In contrast, both the GMR and S2GMM perform better than the PLS in capturing the multimode characteristics. However, in some areas such as those around the 50th and 550th samples, the S2GMM shows apparent predictive advantages over the GMR. For further investigation, quantitative prediction accuracies achieved by the three soft sensors for the oxygen content are tabulated in Table 4. Once more, there seems to be no substantial performance improvement in the PLS through
increasing the labeling rate. But the performance of both the GMR and S2GMM improves as the labeling rate is increased from 20% to 50%, and the S2GMM accomplishes higher predictive accuracy than the GMR. These results confirm the effectiveness of the semisupervised learning strategy for the GMM. However, when the amount of labeled samples is small to a certain extent, for example when the labeling rate is set to 10% in the primary reformer, both the GMR and S2GMM suffer from a drawback of the standard GMM model, i.e., the numerical issue caused by ill-conditioned covariance matrices. As a result, the inverses of the covariance matrices of the PDFs of the Gaussian components become unavailable, which disables the use of the GMR and S2GMM, as marked with the symbol '–' in Table 4.

Table 4. Average predictive RMSE for the primary reformer by PLS, GMR and S2GMM

labeling rate    PLS       GMR       S2GMM
10%              1.7271    –         –
20%              1.6708    1.5064    1.4374
30%              1.6525    1.4341    1.4147
40%              1.6772    1.4271    1.4070
50%              1.6570    1.4011    1.3897
5. CONCLUSION

In this paper, we have proposed a semisupervised version of the GMM (S2GMM) for regression applications, which is able to mine the information contained in both labeled and unlabeled samples. The S2GMM has been applied to developing soft sensors for multimode processes, and two case studies including a numerical example and a real-life chemical process have confirmed the superiority of the S2GMM over the popular PLS and the supervised GMM (GMR). However, in the second case study we have encountered a drawback of the S2GMM and GMR when the amount of labeled samples is very small and the process variables are high-dimensional, namely the numerical issue. A potential way of solving or alleviating this problem is to form a variational Bayesian S2GMM or a simpler S2GMM with Bayesian regularization, both of which treat the regression coefficients of each Gaussian component as random variables. This will be our future work.

REFERENCES

Bishop C.M., Pattern recognition and machine learning. Springer, New York, 2006.
Ge Z., Song Z., Ding S.X. and Huang B., Data mining and analytics in the process industry: the role of machine learning. IEEE Access, 5(99):20590-20616, 2017.
Ge Z., Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemometrics and Intelligent Laboratory Systems, 171:16-25, 2017.
Grbić R., Slišković D. and Kadlec P., Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models. Computers and Chemical Engineering, 58(22):84-97, 2013.
Kadlec B., Gabrys B. and Strandt S., Data-driven soft sensors in the process industry. Computers and Chemical Engineering, 33(4):795-814, 2009.
Kano M. and Ogawa M., The state of the art in chemical process control in Japan: good practice and questionnaire survey. Journal of Process Control, 20(9):969-982, 2010.
Lindgren R., Geladi P. and Wold S., The kernel algorithm for PLS. Journal of Chemometrics, 7(1):45-59, 1993.
Souza A.A.F. and Araújo R., Mixture of partial least squares experts and application in prediction settings with multiple operating modes. Chemometrics and Intelligent Laboratory Systems, 130(2):192-202, 2014.
Xing X., Yu Y., Jian H. and Du S., A multi-manifold semisupervised Gaussian mixture model for pattern classification. Pattern Recognition Letters, 34(16):2118-2125, 2013.
Yan H., Zhou J. and Pang C., Gaussian mixture model using semisupervised learning for probabilistic fault diagnosis under new data categories. IEEE Transactions on Instrumentation and Measurement, 64(4):723-733, 2017.
Yao L. and Ge Z., Moving window adaptive soft sensor for state shifting process based on weighted supervised latent factor analysis. Control Engineering Practice, 61:72-80, 2017.
Yao L. and Ge Z., Deep learning of semi-supervised process data with hierarchical extreme learning machine and soft sensor application. IEEE Transactions on Industrial Electronics, 65:1490-1498, 2018.
Yu J., Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes. Industrial and Engineering Chemistry Research, 51(40):13227-13237, 2012.
Yuan X., Ge Z. and Song Z., Soft sensor model development in multiphase/multimode processes based on Gaussian mixture models. Chemometrics and Intelligent Laboratory Systems, 138(6):97-109, 2014.
Yuan X., Ge Z., Huang B. and Song Z., A probabilistic just-in-time learning framework for soft sensor development with missing data. IEEE Transactions on Control Systems and Technology, 25(3):1124-1132, 2017.
Zhu J., Ge Z. and Song Z., Variational Bayesian Gaussian mixture regression for soft sensing key variables in non-Gaussian industrial processes. IEEE Transactions on Control Systems and Technology, 25(3):1092-1099, 2017.