Chemometrics and Intelligent Laboratory Systems 138 (2014) 97–109
Soft sensor model development in multiphase/multimode processes based on Gaussian mixture regression

Xiaofeng Yuan, Zhiqiang Ge ⁎, Zhihuan Song

State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, P. R. China
Article history: Received 7 May 2014; received in revised form 7 July 2014; accepted 17 July 2014; available online 25 July 2014.

Keywords: Soft sensor; Gaussian mixture regression; Multiphase process; Multimode process; Quality prediction
Abstract

For complex industrial plants with multiphase/multimode data characteristics, the Gaussian mixture model (GMM) has been used for soft sensor modeling. However, almost all GMM-based soft sensor modeling methods employ GMM only for the identification of the different operating modes, which means that additional regression algorithms such as PLS must be incorporated for quality prediction in the individual localized modes. In this paper, the Gaussian mixture regression (GMR) model is introduced for multiphase/multimode soft sensor modeling. In GMR, operating mode identification and variable regression are integrated into one model; thus, there is no need to switch prediction models when the operating mode changes from one to another. To improve the GMR model fitting performance, a heuristic algorithm is adopted for parameter initialization and component number optimization. The feasibility and efficiency of the GMR based soft sensor are validated through a numerical example and two benchmark processes. © 2014 Elsevier B.V. All rights reserved.
1. Introduction

In modern industrial plants, the process operating condition is frequently monitored and controlled in order to improve process efficiency and produce high-quality products. However, these techniques depend heavily on accurate model identification and reliable measurements. In particular, online analysis of key process variables is critical for process monitoring and control. In some cases, due to high investment cost, technical difficulty, measurement delay, and so on, the process faces the great challenge of lacking accurate real-time measurements of key process variables. Over the past decades, soft sensors have been widely used to tackle these problems; they provide frequent estimates of key process variables from those variables that are easy to measure online [1–5]. So far, the majority of soft sensors are based on data-driven modeling methods, which construct inferential models from abundant process measurement data. As one category of data-driven soft sensors, multivariate statistical techniques such as principal component analysis (PCA) [6], partial least squares (PLS) [7], and independent component analysis (ICA) [8] have been extensively researched and employed in diverse processes. In addition, machine learning based nonlinear methods like the artificial neural network (ANN) [9] and the support vector machine (SVM) [10] have also been applied for soft sensor modeling. Apart from these, many soft sensors have been developed using fuzzy systems such as the Takagi–Sugeno fuzzy model [11–14].

⁎ Corresponding author. Tel.: +86 87951442. E-mail address: [email protected] (Z. Ge).
Although different types of soft sensor modeling techniques have been applied for quality prediction, most of them are based on the assumption that the process data are generated from a single operating region and follow a unimodal Gaussian distribution. For complex multiphase/multimode processes that run at multiple operating conditions, the basic assumption of a multivariate Gaussian distribution is not met because of mean shifts or covariance changes. Consequently, the data distribution may be complicated, with arbitrary non-Gaussian patterns. Meanwhile, another problem in multiphase/multimode processes is that a single global soft sensor model is no longer well suited to predicting the output of key process variables. As mixture models can represent arbitrarily complex probability density functions, they are ideal tools for modeling complex multi-class data distributions. By taking sufficient linear combinations of single multivariate Gaussian distributions, the Gaussian mixture model (GMM) can smoothly approximate almost any continuous density to arbitrary accuracy [15]. Thus the GMM technique has shown strong ability in dealing with non-Gaussian data and can be used for classification or clustering problems in various fields; speech recognition [16], image segmentation [17], and robotic learning [18] are some typical applications. With respect to the process industry, GMM has been extensively utilized for multiphase/multimode process monitoring and soft sensor applications [19–22]. In these studies, the main purpose of GMM is to identify and localize the operating modes of the data. For example, a novel multimode process monitoring approach based on finite Gaussian mixture models and a Bayesian inference strategy was proposed in reference [20].
As mentioned above, GMM can classify data that are similar to each other into the same class, so that the data modes of a multiphase/multimode process can be obtained. To carry out soft sensor estimation of process variables, a localized soft sensor then has to be constructed to model the relationship between the output and input variables in each mode. In other words, two steps are carried out in sequence when building soft sensors for multiphase/multimode processes: a mode localization step and a localized regression step. This is cumbersome and time consuming. First, both classification and regression approaches have to be incorporated. Second, multiple regression models have to be trained, their number being equal to the number of operating modes. An improved method is to produce a weighted output by assigning membership degrees to specific regions. For example, in Lughofer's research work [13,14], a certain number of piecewise linear predictors are built in different regions, and the output for a query sample is obtained as a weighted sum of all the piecewise linear predictors. Such a technique can achieve good prediction accuracy and address the nonlinearity of processes. In a simpler and more direct way, we introduce a new soft sensing method for multiphase/multimode processes, called Gaussian mixture regression (GMR) [23,24]. GMR, first proposed in [23], is an extension of the Gaussian mixture model that can be exploited for regression problems. The regression procedure is based on the Gaussian conditioning and linear combination properties of Gaussian distributions. By separating each data point into input and output parts, the joint probability distribution of input and output is modeled by a GMM. Then the conditional probability distribution of the output given the input is estimated with the parameters obtained in the GMM. After the training step, the output can be predicted whenever a new input sample arrives. Compared to deterministic regression methods like the fuzzy systems [13,14], GMR finds the relationship between input and output under a probabilistic maximum likelihood framework. Besides, as mentioned in reference [24], GMR is easy to implement and satisfies robustness and smoothness criteria that are common to a variety of fields. Although the theoretical foundations of GMR were presented two decades ago, only a few applications have appeared [25–28]. So far, GMR has mainly been utilized in the area of robot programming by demonstration (PbD) for imitation learning of multiple tasks [28,29]. To the best of our knowledge, this method has not been applied in other fields yet; in particular, no literature on GMR for soft sensor application in chemical processes has been found up to now. Therefore, the advantage of GMR in soft sensor modeling has not yet been explored. Two practical but fundamental issues to be addressed in employing GMR are as follows: (1) how to determine the number of Gaussian components, and (2) how to calculate the parameters of the mixture model. Too few components result in under-fitting, while too many suffer from computational burden and over-fitting; thus, a proper number of Gaussian components is critical to adequately describe the data in GMR. Several techniques can be used for this purpose, such as Schwarz's Bayesian inference criterion (BIC) [30], the minimum message length (MML) [31], and the minimum description length (MDL) [32].
For the second issue, the expectation maximization (EM) algorithm [33] is the classical method for learning the parameters of mixture models. By iteratively running the E-step and the M-step, the parameters converge to (locally) optimal values. Nevertheless, this algorithm has some drawbacks, such as the need for a predetermined component number and a critical dependence on initialization. In general, by executing a number of runs of the EM algorithm with different initializations and different component numbers, and assessing each estimate with a criterion like BIC, the optimal number of components can be obtained by maximizing or minimizing the criterion. However, this is computationally burdensome and time consuming. In this paper, a simple absolute increment of log-likelihood (AIL) criterion is used to determine the optimal number of components, which shows great effectiveness in the later case studies. To overcome the
disadvantage of the EM algorithm, we adopt a heuristic procedure. We first start with a sufficiently large number of components so as to have a fine-scale representation of the data. Then, by sequentially pruning the least probable component, merging it with the closest component and thereby obtaining a good initialization for the next run of EM, new mixture models with fewer components are obtained successively. A detailed description of this procedure is given in Section 3.

The remainder of this paper is organized as follows. In Section 2, the definition of the GMM and the EM algorithm for parameter estimation are briefly revisited. The GMR based soft sensor with optimal component selection is then introduced in Section 3. A numerical example and two application examples are provided in Section 4. Finally, conclusions are drawn.

2. Preliminaries

2.1. Gaussian mixture model

Let z ∈ R^d be a data point of a d-dimensional variable. If z comes from a unimodal multivariate Gaussian distribution, then its probability density function is given by

$$f(z \mid \theta) = \frac{1}{\sqrt{(2\pi)^d \lvert \Sigma \rvert}} \exp\left\{ -\frac{1}{2} (z-\mu)^T \Sigma^{-1} (z-\mu) \right\} \qquad (1)$$
where μ is the mean vector, Σ is the covariance matrix, and θ = {μ, Σ} are the parameters that determine a Gaussian distribution. When the data point z comes from a mixture Gaussian model, its probability density function can be formulated as follows [15]:

$$p(z \mid \Omega) = \sum_{k=1}^{K} \omega_k f(z \mid \theta_k) \qquad (2)$$
where K is the number of Gaussian components in the GMM, ω_k is the probabilistic weight of the kth Gaussian component, subject to the conditions $\omega_k \ge 0$ and $\sum_{k=1}^{K} \omega_k = 1$, θ_k = {μ_k, Σ_k} are the parameters of the kth Gaussian component (mean vector μ_k and covariance matrix Σ_k), and Θ = {θ_1, θ_2, ⋯, θ_K} = {μ_1, Σ_1, μ_2, Σ_2, ⋯, μ_K, Σ_K} denotes the parameters of all Gaussian components. The total parameter set of the complete GMM with K components can then be defined as Ω = {{ω_1, μ_1, Σ_1}, {ω_2, μ_2, Σ_2}, ⋯, {ω_K, μ_K, Σ_K}}, which involves both the Gaussian model parameters θ_k and the mixing probabilities ω_k (1 ≤ k ≤ K). Given a set of N independent and identically distributed training samples Z = [z_1, z_2, …, z_N], the likelihood and log-likelihood functions of Z can be written as

$$L(Z; \Omega) = \prod_{n=1}^{N} \sum_{k=1}^{K} \omega_k f(z_n \mid \theta_k) \qquad (3)$$

and

$$\log L(Z; \Omega) = \sum_{n=1}^{N} \log\left( \sum_{k=1}^{K} \omega_k f(z_n \mid \theta_k) \right) \qquad (4)$$
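As an illustration, Eqs. (2)–(4) translate directly into code. The short sketch below (Python with NumPy/SciPy; the helper names are ours, not from the paper) evaluates the mixture density and the log-likelihood of a training set.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(z, weights, means, covs):
    """Mixture density p(z|Omega) of Eq. (2): sum_k w_k * f(z | mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(z, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

def gmm_loglik(Z, weights, means, covs):
    """Log-likelihood of Eq. (4) for N i.i.d. samples Z, arranged as an (N, d) array."""
    dens = np.column_stack([w * multivariate_normal.pdf(Z, mean=m, cov=c)
                            for w, m, c in zip(weights, means, covs)])
    return np.sum(np.log(dens.sum(axis=1)))
```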
2.2. EM algorithm for GMM

To build a GMM, the unknown parameter set Ω needs to be estimated first. This is equivalent to finding the parameters that maximize the log-likelihood function:

$$\hat{\Omega} = \arg\max_{\Omega} \, \log L(Z; \Omega) \qquad (5)$$
An analytical solution of the above problem can be sought by setting the first-order derivatives of the log-likelihood function to zero, but this may cause issues like singular or ill-conditioned estimates of the Gaussian model parameters. In practice, the numerical strategy of the expectation–maximization (EM) algorithm is more practical and is extensively used for estimating the Gaussian parameters. Given an initial parameter set Ω^(1), the EM algorithm successively applies the E-step and the M-step to produce a sequence of parameter estimates {Ω^(1), Ω^(2), …, Ω^(m), …}. The two steps are performed as follows [20]:

E-step:

$$\gamma^{(m)}\left(c_k^j \mid z_j, \Omega^{(m)}\right) = \frac{\omega_k^{(m)} f\left(z_j \mid \mu_k^{(m)}, \Sigma_k^{(m)}\right)}{\sum_{i=1}^{K} \omega_i^{(m)} f\left(z_j \mid \mu_i^{(m)}, \Sigma_i^{(m)}\right)} \qquad (6)$$

where γ^(m)(c_k^j | z_j, Ω^(m)) denotes the posterior probability of the jth training sample with respect to the kth Gaussian component at the mth iteration.

M-step:

$$\omega_k^{(m+1)} = \frac{1}{N} \sum_{j=1}^{N} \gamma^{(m)}\left(c_k^j \mid z_j, \Omega^{(m)}\right)$$

$$\mu_k^{(m+1)} = \frac{\sum_{j=1}^{N} \gamma^{(m)}\left(c_k^j \mid z_j, \Omega^{(m)}\right) z_j}{\sum_{j=1}^{N} \gamma^{(m)}\left(c_k^j \mid z_j, \Omega^{(m)}\right)}$$

$$\Sigma_k^{(m+1)} = \frac{\sum_{j=1}^{N} \gamma^{(m)}\left(c_k^j \mid z_j, \Omega^{(m)}\right) \left(z_j - \mu_k^{(m+1)}\right) \left(z_j - \mu_k^{(m+1)}\right)^T}{\sum_{j=1}^{N} \gamma^{(m)}\left(c_k^j \mid z_j, \Omega^{(m)}\right)} \qquad (7)$$

where ω_k^(m+1), μ_k^(m+1), and Σ_k^(m+1) are the probabilistic weight, mean, and covariance of the kth Gaussian component at the (m+1)th iteration, respectively. Details of this algorithm can be found in the reference book [24].
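For readers who prefer code, the following is a minimal NumPy sketch of one EM iteration implementing Eqs. (6) and (7). The function name and structure are illustrative only, and numerical safeguards (e.g., regularization of near-singular covariances) are omitted for brevity.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(Z, weights, means, covs):
    """One EM iteration for a GMM; Z is an (N, d) array of training samples."""
    N, K = Z.shape[0], len(weights)
    # E-step, Eq. (6): gamma[j, k] is the posterior of component k for sample j
    dens = np.column_stack([weights[k] * multivariate_normal.pdf(Z, means[k], covs[k])
                            for k in range(K)])
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step, Eq. (7): update mixing weights, means and covariances
    Nk = gamma.sum(axis=0)
    new_weights = Nk / N
    new_means = [gamma[:, k] @ Z / Nk[k] for k in range(K)]
    new_covs = []
    for k in range(K):
        D = Z - new_means[k]
        new_covs.append((gamma[:, k][:, None] * D).T @ D / Nk[k])
    return new_weights, new_means, new_covs
```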
3. Gaussian mixture regression based soft sensor

In this section, the GMR algorithm is first introduced. Then the technique for fitting the mixture model is presented, and finally the model is used for the construction of the soft sensor.

3.1. Gaussian mixture regression

This part describes Gaussian mixture regression (GMR) [23,24]. The derivation of GMR is mainly based on the Gaussian conditioning and linear combination properties of Gaussian distributions. We assume the data point z is composed of two parts: the input z^I and the output z^O. If z obeys a Gaussian mixture distribution with K components, its probability density function is given by Eq. (2). Meanwhile, the mean vector and covariance matrix of each Gaussian component are partitioned into input and output parts as

$$\mu_k = \begin{bmatrix} \mu_k^I \\ \mu_k^O \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^I & \Sigma_k^{IO} \\ \Sigma_k^{OI} & \Sigma_k^O \end{bmatrix} \qquad (8)$$

For a given input z^I from the kth Gaussian component, the corresponding output z^O also obeys a Gaussian distribution. The conditional probability of z^O given z^I is defined by

$$p\left(z^O \mid z^I, k\right) \sim f\left(z^O \mid \hat{z}_k, \hat{\Sigma}_k\right) \qquad (9)$$

where the mean $\hat{z}_k$ and covariance $\hat{\Sigma}_k$ can be calculated as

$$\hat{z}_k = \mu_k^O + \Sigma_k^{OI} \left(\Sigma_k^I\right)^{-1} \left(z^I - \mu_k^I\right), \qquad \hat{\Sigma}_k = \Sigma_k^O - \Sigma_k^{OI} \left(\Sigma_k^I\right)^{-1} \Sigma_k^{IO} \qquad (10)$$

The details of the derivation of these results can be found in chapter 2 of Bishop's book [15]. As z^I is generated from a mixture model, the distribution of the output z^O consists of K parts. Considering the complete mixture model, the conditional distribution of the output z^O given the input z^I is also a Gaussian mixture model:

$$p\left(z^O \mid z^I\right) \sim \sum_{k=1}^{K} h_k f\left(z^O \mid \hat{z}_k, \hat{\Sigma}_k\right) \qquad (11)$$

where h_k is the posterior probability that component k is responsible for z^I. By the Bayes formula, h_k can be estimated as

$$h_k = p\left(k \mid z^I\right) = \frac{p(k)\, p\left(z^I \mid k\right)}{\sum_{i=1}^{K} p(i)\, p\left(z^I \mid i\right)} = \frac{\omega_k f\left(z^I \mid \mu_k^I, \Sigma_k^I\right)}{\sum_{i=1}^{K} \omega_i f\left(z^I \mid \mu_i^I, \Sigma_i^I\right)} \qquad (12)$$

Finally, given z^I, the conditional distribution of z^O can be approximated by a single Gaussian distribution $f\left(z^O \mid \hat{z}^O, \hat{\Sigma}\right)$, based on the linear transformation property of Gaussian distributions. The mean $\hat{z}^O$ and covariance $\hat{\Sigma}$ of the output are then calculated as

$$\hat{z}^O = \sum_{k=1}^{K} h_k \hat{z}_k, \qquad \hat{\Sigma} = \sum_{k=1}^{K} h_k^2 \hat{\Sigma}_k \qquad (13)$$
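The conditioning of Eqs. (8)–(13) amounts to a few lines of linear algebra. The sketch below is an illustrative implementation, under the assumption that the input dimensions come first in each mean vector and covariance matrix; it returns the single-Gaussian approximation of Eq. (13) for one query input.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr_predict(zI, weights, means, covs, dI):
    """GMR prediction of z^O given z^I, following Eqs. (8)-(13).
    dI: number of input dimensions (inputs assumed first in each mean/cov)."""
    K = len(weights)
    h = np.array([weights[k] * multivariate_normal.pdf(zI, means[k][:dI],
                                                       covs[k][:dI, :dI])
                  for k in range(K)])
    h /= h.sum()                                              # Eq. (12)
    z_hat, S_hat = 0.0, 0.0
    for k in range(K):
        muI, muO = means[k][:dI], means[k][dI:]
        SII, SOI = covs[k][:dI, :dI], covs[k][dI:, :dI]
        SIO, SOO = covs[k][:dI, dI:], covs[k][dI:, dI:]
        zk = muO + SOI @ np.linalg.solve(SII, zI - muI)       # Eq. (10), mean
        Sk = SOO - SOI @ np.linalg.solve(SII, SIO)            # Eq. (10), covariance
        z_hat = z_hat + h[k] * zk                             # Eq. (13)
        S_hat = S_hat + h[k]**2 * Sk
    return z_hat, S_hat
```

Using np.linalg.solve instead of forming explicit inverses keeps the conditioning numerically stable when an input covariance block is nearly singular.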
3.2. Fitting the mixture model

As a precondition, the number of Gaussian components must be given whenever GMM or GMR is used, and the EM parameter estimation procedure likewise needs a predefined number K of mixture components. It is well known that the maximum likelihood (ML) criterion cannot be used to determine the number of Gaussian components, because the maximized likelihood is a non-decreasing function of the component number. Hence other criteria, like the Bayesian inference criterion (BIC) and the minimum description length (MDL), are employed to obtain the optimal K by minimizing or maximizing some criterion function. The basic procedure of these methods is as follows. First, for each K in a range from K_min to K_max, the EM algorithm is executed to obtain the parameter estimate Ω̂_(K). Second, the criterion function for each K is calculated. Finally, the optimal number of components, K_opt, is obtained by maximizing or minimizing the criterion function [34]. Generally, an additional Gaussian component is necessary only if it brings a significant improvement in data description; otherwise it is unnecessary. Specifically, if the log-likelihood is improved significantly, then adding a component is effective and necessary. So in this study we use a simple absolute increment of log-likelihood (AIL) criterion to choose the optimal number of Gaussian components, different from the criterion in [34]. The AIL criterion is defined as

$$\mathrm{AIL}\left(\hat{\Omega}_{(K)}, K\right) = \log L\left(Z; \hat{\Omega}_{(K)}\right) - \log L\left(Z; \hat{\Omega}_{(K-1)}\right) \qquad (14)$$

where $\log L(Z; \hat{\Omega}_{(K)})$ and $\log L(Z; \hat{\Omega}_{(K-1)})$ represent the maximum log-likelihood of the GMM with K and K − 1 components, respectively. This function tests the improvement in data description when one additional component is added to the mixture model. Alternatively, the relative increment of log-likelihood (RIL) criterion can be used, defined as

$$\mathrm{RIL}\left(\hat{\Omega}_{(K)}, K\right) = \frac{\log L\left(Z; \hat{\Omega}_{(K)}\right) - \log L\left(Z; \hat{\Omega}_{(K-1)}\right)}{\log L\left(Z; \hat{\Omega}_{(K-1)}\right)} \qquad (15)$$

Thus the optimal K can be obtained by maximizing a criterion function such as AIL:

$$K_{opt} = \arg\max_{K} \left\{ \mathrm{AIL}\left(\hat{\Omega}_{(K)}, K\right); \; K = K_{min}, K_{min}+1, \cdots, K_{max} \right\} \qquad (16)$$

In the above procedure, the EM algorithm has to be carried out for parameter estimation whenever K varies. A practical technical difficulty of EM is its critical dependence on initialization: without careful initialization, the algorithm may converge to a local optimum. If each K is handled independently, initial parameters have to be determined separately using clustering methods like K-means, which is tedious and time consuming. Moreover, the estimation procedures for the different K would then be independent; the parameters of the mixture model with K components would not be reused in the adjacent estimation of the mixture model with K − 1 components. In this study, we use a hierarchical strategy to address these problems. The procedure begins with a considerably large number of components K_max, for which the model parameters are estimated. By sequentially pruning the least probable component and merging it with a similar component, we obtain relatively good initialization parameters for the next mixture model. The procedure of this algorithm is described in Table 1.

Table 1
Procedure of the mixture model fitting algorithm.

Preparation: Collect training data; predetermine K_min and K_max; select a criterion function such as AIL(Ω̂_(K), K).
Initialization: Set K equal to K_max. Initialize the parameters of the K-component mixture model using K-means; denote them by Ω̂_K^(1).
Main loop: While K ≥ K_min, repeat:
  1. Run the EM algorithm to get the estimated model parameters Ω̂_K.
  2. Calculate and store the corresponding criterion function AIL(Ω̂_K, K).
  3. Prune the least probable component and merge it with its closest one into a single component.
  4. Obtain a (K − 1)-component mixture with initial parameters Ω̂_(K−1)^(1). Set K to K − 1.
Output: The optimal number of Gaussian components K_opt = arg max_K {AIL(Ω̂_K, K); K = K_min, ⋯, K_max} and the final mixture model parameter estimate Ω̂_{K_opt}.

The key step in the procedure is pruning the least probable component and merging it with its closest component into a single one. First, the least probable component is selected by finding the smallest mixing probability:

$$p = \arg\min_i \left\{ \omega_i; \; i = 1, 2, \ldots, K \right\} \qquad (17)$$

Then the component that is closest to the pth component is found. The symmetric Kullback–Leibler (KL) divergence [35] is a common criterion for measuring the dissimilarity between probability densities. For Gaussian densities, the symmetric KL divergences between the pth and the other components are

$$D_s\left[ f\left(z \mid \mu_p, \Sigma_p\right), f\left(z \mid \mu_j, \Sigma_j\right) \right] = \frac{1}{2} \mathrm{tr}\left[ \left(\Sigma_p - \Sigma_j\right) \left(\Sigma_j^{-1} - \Sigma_p^{-1}\right) \right] + \frac{1}{2} \left(\mu_p - \mu_j\right)^T \left(\Sigma_p^{-1} + \Sigma_j^{-1}\right) \left(\mu_p - \mu_j\right), \quad j = 1, 2, \ldots, K, \; j \neq p \qquad (18)$$

The most similar component is then chosen as

$$q = \arg\min_j \left\{ D_s\left[ f\left(z \mid \mu_p, \Sigma_p\right), f\left(z \mid \mu_j, \Sigma_j\right) \right]; \; j = 1, 2, \ldots, K, \; j \neq p \right\} \qquad (19)$$

Thus the pth and qth components are merged into a single component, whose mixing probability, mean, and covariance are updated as

$$\omega' = \omega_p + \omega_q, \qquad \mu' = \frac{\omega_p \mu_p + \omega_q \mu_q}{\omega_p + \omega_q}, \qquad \Sigma' = \frac{\omega_p \left(\Sigma_p + \mu_p \mu_p^T\right) + \omega_q \left(\Sigma_q + \mu_q \mu_q^T\right)}{\omega_p + \omega_q} - \mu' \mu'^T \qquad (20)$$

In this way, the parameters of the newly merged component, together with those of the other (K − 2) components, are combined as the initialization parameters Ω̂_(K−1)^(1) of the (K − 1)-component mixture. It is notable that the merging operation for the covariance in Eq. (20) rests on the assumption of sufficient coverage; when this assumption is not met, an improved technique can be found in reference [36].
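The pruning-and-merging step of Table 1 and Eqs. (17)–(20) can be sketched as follows. This is our reading of the procedure with illustrative helper names; it would be combined with an EM routine such as the one sketched in Section 2.2 and the AIL bookkeeping of Eq. (14).

```python
import numpy as np

def sym_kl(mu_p, S_p, mu_q, S_q):
    """Symmetric KL divergence between two Gaussians, Eq. (18)."""
    iSp, iSq = np.linalg.inv(S_p), np.linalg.inv(S_q)
    d_mu = mu_p - mu_q
    return (0.5 * np.trace((S_p - S_q) @ (iSq - iSp))
            + 0.5 * d_mu @ (iSp + iSq) @ d_mu)

def prune_and_merge(weights, means, covs):
    """Merge the least probable component with its closest one, Eqs. (17)-(20)."""
    p = int(np.argmin(weights))                                # Eq. (17)
    others = [j for j in range(len(weights)) if j != p]
    q = min(others,                                            # Eq. (19)
            key=lambda j: sym_kl(means[p], covs[p], means[j], covs[j]))
    wp, wq = weights[p], weights[q]
    w_new = wp + wq                                            # Eq. (20)
    mu_new = (wp * means[p] + wq * means[q]) / w_new
    S_new = ((wp * (covs[p] + np.outer(means[p], means[p]))
              + wq * (covs[q] + np.outer(means[q], means[q]))) / w_new
             - np.outer(mu_new, mu_new))
    keep = [j for j in range(len(weights)) if j not in (p, q)]
    weights = np.append(np.asarray(weights)[keep], w_new)
    means = [means[j] for j in keep] + [mu_new]
    covs = [covs[j] for j in keep] + [S_new]
    return weights, means, covs
```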
3.3. GMR based soft sensor

3.3.1. Soft sensor for online quality prediction

In multiphase/multimode processes, the data distribution can be arbitrarily complicated, with non-Gaussian patterns. Therefore, the mixture model is very useful for approximating the data distribution in such processes. Explicitly, GMR can not only describe the data distribution well, but also model the input–output relations. A fundamental issue in using a GMR based soft sensor for multiphase/multimode processes is that the number of modes is not known beforehand. Moreover, the data structures during phase transitions in a multiphase/multimode process may be more complicated than in steady phases, so more Gaussian components may be required to describe these data. Altogether, determining the number of mixture components plays a key role in applying GMR for quality prediction in multiphase/multimode processes.

Assume that the process input variables are x and the output variables are y. First, we combine the input variables x and the output variables y into a new variable vector v = [x^T y^T]^T.
Table 2
Configuration of the GMR model in the numerical example.

Gaussian component        1st    2nd    3rd
Input x:  ω_i             0.3    0.4    0.3
Input x:  μ_i             0      0      6
Input x:  Σ_i             1      √6     1
Output y: y = 5x + 3

Table 3
Parameters obtained using the fitted GMR.

Gaussian component   1st                  2nd                     3rd
ω                    0.3285               0.3624                  0.3091
μ                    [0.2272; 4.1360]     [−0.1898; 2.0508]       [5.8683; 32.3416]
Σ                    [1.3594, 6.7969;     [5.2115, 26.0574;       [1.1819, 5.9094;
                      6.7969, 33.9843]     26.0574, 130.2871]      5.9094, 29.5474]

[Fig. 1. Prediction results of the numerical example.]
The joint probability distribution of x and y, i.e., the probability density of v, can be described by the GMM distribution

$$p(v) = \sum_{k=1}^{K} p(k)\, p(v \mid k) \qquad (21)$$

where K is the number of Gaussian components, which is unknown. The strategy introduced in Section 3.2 is therefore exploited to determine the optimal K_opt, after which the corresponding mixing weights and Gaussian component parameters (means and covariances) are stored.

Suppose the new input data is x_new and the actual output measurement is y_new. To predict the output, the GMR model based soft sensor is constructed. First, the means and covariances are divided into input and output parts as described in Eq. (8). Then the posterior probability and the conditional probability distribution of y_new,k given x_new in the kth component can be estimated by

$$p\left(k \mid x_{new}\right) = \frac{p\left(x_{new} \mid \theta_k\right) p(k)}{\sum_{i=1}^{K_{opt}} p\left(x_{new} \mid \theta_i\right) p(i)} \qquad (22)$$

$$p\left(y_{new,k} \mid x_{new}\right) \sim f\left(y_{new,k} \mid \hat{y}_{new,k}, \hat{\Sigma}_{new,k}\right) \qquad (23)$$

where $\hat{y}_{new,k}$ and $\hat{\Sigma}_{new,k}$ are the mean and covariance parameters of the kth conditional Gaussian distribution, which can be calculated by Eq. (10). At last, the final prediction of the output is the weighted combination of the individual component predictions:

$$\hat{y}_{new} = \sum_{k=1}^{K_{opt}} p\left(k \mid x_{new}\right) \hat{y}_{new,k} \qquad (24)$$

[Fig. 2. Prediction errors of the numerical example.]

[Fig. 3. Relationship between measured and predicted output.]
The step-by-step procedure of the GMR based soft sensing approach is given below.
(1) Collect training data of inputs and outputs for soft sensor model training.
(2) Combine the input and output into a data vector and build a GMM model of the joint probability density of input and output.
(3) Fit the mixture model using the method presented in Section 3.2; choose the optimal number of Gaussian components and obtain the final mixture model parameters.
(4) Decompose the mean and covariance of each Gaussian component.
(5) Estimate the posterior probabilities of the input of the testing data with respect to each Gaussian component.
(6) Calculate the conditional distribution of the output given the input of the testing data with respect to each Gaussian component.
(7) Combine the conditional distributions of the output given the input of the testing data into a single Gaussian distribution.
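To make the seven steps concrete, the sketch below chains a joint GMM fit with the GMR conditioning of Section 3.1. It uses scikit-learn's GaussianMixture with a fixed K as a stand-in for the AIL-based fitting procedure of Section 3.2, and it assumes the gmr_predict helper sketched earlier; neither is the authors' code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmr(X, Y, K):
    """Steps (1)-(3): stack inputs X (N x dI) and outputs Y (N x dO) into
    v = [x^T y^T]^T and fit the joint GMM (fixed K used here for brevity)."""
    V = np.hstack([X, Y])
    gmm = GaussianMixture(n_components=K, covariance_type='full').fit(V)
    return gmm.weights_, list(gmm.means_), list(gmm.covariances_)

def predict(X_new, weights, means, covs, dI):
    """Steps (4)-(7): decompose each component and predict by weighted conditioning."""
    return np.array([gmr_predict(x, weights, means, covs, dI)[0]
                     for x in X_new])
```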
3.3.2. Performance evaluation and method comparisons

To evaluate the performance of the GMR soft sensor model, several indices are used in this study. The first is the root mean square error (RMSE), which reflects the prediction accuracy of the soft sensor model. Another widely used index is the coefficient of determination R², which represents the squared correlation between the actual and estimated outputs. R² indicates how much of the total variance in the output data can be explained by the model, and thus reflects the reliability of the model. For performance comparison, traditional partial least squares regression (PLSR) based algorithms are also employed to predict the output. As our purpose is to deal with data from multiphase/multimode processes, several strategies are designed: a global PLSR (G-PLSR) approach, a local PLSR (LPLSR) approach, and a combination of local PLSR (C-LPLSR) approach. The G-PLSR approach builds a single global PLSR model on the whole training data and uses it to predict the output. The LPLSR approach builds a PLSR model for one identified mode using the relevant training data, and this model is then used for online prediction.
[Fig. 4. Posteriors of input to modes 1–3 in the testing set.]
[Fig. 5. The flowchart of the TE process [37].]
Table 4
Input variables for soft sensor models in the TE process.

No.  Input variable         No.  Input variable
1    A feed                 9    Separator temperature
2    D feed                 10   Separator pressure
3    E feed                 11   Separator underflow
4    A and C feed           12   Stripper pressure
5    Recycle flow           13   Stripper temperature
6    Reactor feed rate      14   Stripper steam flow
7    Reactor temperature    15   Reactor cooling water outlet temperature
8    Purge rate             16   Separator cooling water outlet temperature
The C-LPLSR approach builds several local PLSR models, one for each identified mode; when the output of a new sample needs to be estimated, the sample is first localized to the corresponding mode, and the output is then predicted using that localized PLSR model. In all approaches that require mode identification, GMM is adopted for mode classification.

Before carrying out the case studies, we first give a simple analysis of the advantages and disadvantages of GMR compared to the other algorithms. Since GMR utilizes a soft classification and combination strategy, the prediction is generated automatically, which means that there is no need to localize the exact mode and switch prediction models for a new input sample. This is superior to the LPLSR and C-LPLSR methods. Meanwhile, GMR can track multiphase/multimode behaviors well, a property that global PLSR and local PLSR do not possess. Furthermore, as a probabilistic model, GMR can provide the probability distribution of the output, which means both the mean and the variance of the output can be obtained.
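The two evaluation indices above are straightforward to compute; a minimal sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted outputs."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: fraction of output variance explained."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```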
[Fig. 6. The relationship between the output variable and the first input variable.]
Table 5
Prediction results (RMSE and R²) on the TE process testing set.

Method  GMR     G-PLSR  C-LPLSR  LPLSR1     LPLSR2    LPLSR3  LPLSR4     LPLSR5    LPLSR6
RMSE    0.2606  1.2630  0.2903   79.6385    38.0885   5.5305  90.5305    24.0351   19.7385
R²      0.9988  0.9716  0.9985   −111.7355  −24.7872  0.4563  −144.6817  −9.2685   −5.9254
[Fig. 7. Prediction results of the TE process.]
[Fig. 8. Prediction errors on the TE process.]
[Fig. 9. Relationship between measured and predicted output in the TE process.]
[Fig. 10. Posteriors of input to modes 1–6 in the testing set on the TE process.]
4. Case studies

4.1. Numerical example

In this part, a simple numerical example is given to examine whether the algorithm can identify the correct number of components. The input data are generated from a univariate GMM with 3 components, and the output is a linear function of the input. The configuration of the model is displayed in Table 2. There are 250 samples in total for training. To each x and y, random noise with zero mean and variance σ² = 0.01 is added independently. For model construction, each pair of x and y is combined into a new vector v. Then the algorithm is evolved from K_max = 10 down to K_min = 2. The optimal K_opt is estimated as 3 by the AIL criterion, and the final parameters are obtained as given in Table 3. From the results, it can be seen that the mixing probabilities of the components are nearly the same as the set values. For the mean vector μ, the first entry is the mean of the input x and the second is the mean of the output y; the mean vectors of the components are thus also close to their actual values, and the same conclusion holds for the covariances Σ. From this numerical example, we can be confident that the algorithm described in Section 3.2 can be used to find the optimal number of components in GMR. To test the soft sensing performance of GMR, a total of 250 data points generated from the 3 modes are used as the testing dataset. The RMSE and R² values on the testing data are 3.0849 × 10⁻⁵ and 1, respectively, which show a perfect prediction ability of GMR on the testing data.
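For reference, the training data of this example could be reproduced along the following lines. This is our reconstruction of the stated setup; the exact noise injection used by the authors is not specified beyond zero mean and variance 0.01, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 250
weights = [0.3, 0.4, 0.3]
mus     = [0.0, 0.0, 6.0]
sigmas  = [1.0, np.sqrt(6.0), 1.0]      # component standard deviations (Table 2)

comp = rng.choice(3, size=N, p=weights)             # pick a component per sample
x0 = rng.normal([mus[c] for c in comp],
                [sigmas[c] for c in comp])          # noiseless mixture draw
x = x0 + rng.normal(0.0, 0.1, N)                    # input noise, variance 0.01
y = 5.0 * x0 + 3.0 + rng.normal(0.0, 0.1, N)        # y = 5x + 3 plus output noise
v = np.column_stack([x, y])                         # joint training vectors
```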
Table 6
Comparison of CPU running time of each algorithm on the TE process.

Method   CPU running time
GMR      Training time: 99.9173 s; testing time: 0.0258 s
G-PLSR   Whole time: 0.0315 s
C-LPLSR  Whole time: 2.1819 s
LPLSR1   Whole time: 2.1622 s
LPLSR2   Whole time: 2.1646 s
LPLSR3   Whole time: 2.1616 s
LPLSR4   Whole time: 2.1609 s
LPLSR5   Whole time: 2.1644 s
LPLSR6   Whole time: 2.1682 s
Additionally, the detailed prediction results are shown in Fig. 1, where the red and blue lines represent the actual and estimated outputs, respectively. The red line is overlapped by the blue line because the actual and predicted outputs are almost identical. This can also be seen from the prediction error plot and the plot of actual versus predicted output, shown in Figs. 2 and 3, respectively. Finally, the posterior probability distribution of the inputs in the testing set is given in Fig. 4.
4.2. Tennessee Eastman process

The Tennessee Eastman (TE) process is a well-known benchmark that is extensively used for testing process monitoring, control, and soft sensor methods [37]. It consists of five operating units: a reactor, a condenser, a compressor, a separator, and a stripper. Fig. 5 shows the flowchart of this process. There are 41 measured variables and 12 manipulated variables in total. Among the 41 measured variables, 22 are easy to measure, while the other 19 component variables are difficult to measure; it is therefore necessary to predict these component variables by soft sensor methods. For soft sensor modeling, component C in the purge stream is selected as the output, and 16 easy-to-measure variables are chosen as inputs, as tabulated in Table 4. Six different operating modes can be simulated in the TE process. To test the GMR based soft sensor algorithm, all six operating modes are simulated and 300 samples per operating mode are collected as training data for model construction. To examine the multimode non-Gaussian behavior of the process data, the relationship between the output variable and the first input variable in the training set is depicted in Fig. 6. Additionally, a total of 1800 testing samples are generated from the six operating modes, each mode contributing 300 samples. To construct the soft sensor model, several parameters are determined first. The search range for the number of Gaussian components is set from K_max = 20 to K_min = 3, and the optimal K is obtained as K_opt = 6 using the fitting approach given in Section 3.2. Also, the number of local modes for LPLSR and C-LPLSR is optimized as 6, and the number of latent components in PLSR is determined as 7. Hence there are six local regression models in the LPLSR approach.
[Fig. 11. The flow sheet of the penicillin fermentation process [38].]
The prediction results of the algorithms are listed in Table 5. The best prediction on the testing dataset is obtained by GMR, followed by the C-LPLSR approach. As the six LPLSR models are each constructed using only the data of a single mode, they perform worst among all algorithms. Compared to the local PLSR methods, the global PLSR performs much better because it uses information from all modes, whereas each local PLSR uses information from only one operating mode. Specifically, the detailed prediction results of the GMR based soft sensor on the testing set are given in Fig. 7, where the red line is the actual output and the blue line represents the prediction of GMR. It is clear that the predicted output tracks the actual output well in all operating modes. Moreover, Fig. 8 shows the prediction error of each sample using GMR, and a plot of the measured versus predicted output values is shown in Fig. 9; the point pairs of actual and predicted output lie tightly along the diagonal line. At last, to examine the mode identification of the testing samples, the posterior probabilities of each sample are provided in Fig. 10.

Finally, in order to examine the offline and online efficiency of the GMR model, comparisons of CPU running time among the different algorithms are listed in Table 6. The configuration of the computer is as follows: OS: Windows 7 (32 bit); CPU: AMD Athlon 64 X2 Dual Core Processor 4600+ (2.4 GHz); RAM: 3.5 GB; MATLAB version 2013a. Based on the obtained results, although the training time of GMR is much longer than that of the other methods, it takes only 0.0258 s to evaluate the whole testing dataset, which means the GMR based soft sensor is well suited for online quality prediction.
Table 7
Input variables in the penicillin fermentation process.

No.  Variable description             Unit
1    Aeration rate                    L/h
2    Agitator power                   W
3    Substrate feed rate              L/h
4    Substrate feed temperature       K
5    Substrate concentration          g/L
6    Dissolved oxygen concentration   g/L
7    Biomass concentration            g/L
8    Culture volume                   L
9    Carbon dioxide concentration     g/L
10   pH                               –
11   Fermentor temperature            K
4.3. Fermentation process for penicillin production

The fermentation process for penicillin production is a biochemical batch benchmark for evaluating soft sensor and fault diagnosis algorithms, and it is a typical multiphase batch process. Fig. 11 shows a flow sheet of this process; a detailed description can be found in reference [38]. A simulation tool (PenSim) for this process is available at http://simulator.iit.edu/web/pensim/simul.html. The PenSim simulator contains a fermenter where the biological reaction takes place, and for different demands it provides several settings including the controller, process duration, sampling rates, etc. The fermentation process has 16 measured variables in total. To construct the soft sensor model, the penicillin concentration is chosen as the output variable, and 11 highly related variables are selected as inputs; their descriptions are listed in Table 7. In this study, a total of five batches are simulated for training of the soft sensor model, and the batch data are batch-wise unfolded into a two-dimensional data matrix. To illustrate the characteristics of the inputs and the output, their trajectories in one batch are shown in Fig. 12, where subfigures (a)–(k) show the trend plots of the input variables and subfigure (l) shows the trend plot of the output variable. As shown in the figure, the process is divided into three phases: a lag phase, an exponential phase, and a stationary phase; a mixture model is thus suitable for soft sensor modeling. In addition, one batch of data is generated for testing the soft sensor algorithm. Again, the search range of the component number K is set as K_max = 10 and K_min = 2, and the optimal number of mixture components is obtained as K_opt = 3. Meanwhile, the global PLSR, local PLSR, and C-LPLSR models are also constructed for comparison of the quality prediction performance. The number of local modes for LPLSR and C-LPLSR is determined as 3 using GMM, so there are three local PLSR models in the LPLSR method. In the PLSR regression models, the number of principal components is optimized as 5.

The prediction results of the different soft sensor approaches are shown together in Table 8. From the RMSE and R² values, the best predictions on the testing dataset are obtained by C-LPLSR and GMR, which are comparable; C-LPLSR is slightly better than GMR in this case. However, compared to the C-LPLSR approach, GMR has several advantages. First, the mode identification and variable regression steps are incorporated into a single model in GMR, while the C-LPLSR approach needs two separate algorithms for mode localization and regression, respectively.
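The batch-wise unfolding mentioned above can be read as stacking the sampling instants of all batches into one two-dimensional matrix; a sketch under the assumption that the simulated data are stored as a (batches × time × variables) array (the array name and sizes are illustrative, not from the paper):

```python
import numpy as np

# Suppose `data` holds 5 simulated batches, each with T sampling instants and
# 12 retained variables (11 inputs plus penicillin concentration as output).
batches, T, n_vars = 5, 400, 12
data = np.zeros((batches, T, n_vars))     # to be filled from the PenSim simulator

# Unfold: stack the batches on top of each other into a 2-D training matrix
unfolded = data.reshape(batches * T, n_vars)
X, y = unfolded[:, :11], unfolded[:, 11]  # input columns and output column
```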
[Fig. 12. Trend plots of the input variables (a)–(k) and the output variable (l) of the training data on the fermentation process.]
Table 8
Prediction results (RMSE and R²) on the penicillin fermentation process testing set.

Method  GMR     G-PLSR  C-LPLSR  LPLSR1   LPLSR2   LPLSR3
RMSE    0.0296  0.0466  0.0271   0.8258   1.2344   0.3585
R²      0.9948  0.9870  0.9956   −3.0619  −8.0768  0.2346
Second, GMR does not need to switch models when the operating condition of the process changes; thus, its automation level is higher than that of C-LPLSR. Furthermore, as the three LPLSR models are each constructed using only the data of a single mode, they perform worst among all algorithms. Compared to the local PLSR methods, the global PLSR performs much better because it uses information from all modes, whereas each local PLSR uses information from only one operating mode.
In detail, the prediction results of the GMR based soft sensor on the testing set are given in Fig. 13, where the red and blue lines represent the actual and estimated outputs, respectively. From this figure, it is clear that the predicted output tracks the actual output well. Furthermore, the prediction error of each sample using GMR is depicted in Fig. 14; the absolute error of each sample is well below 0.1. Again, the relationship between measured and predicted output values is shown in Fig. 15; the point pairs of actual and predicted output also lie tightly along the diagonal line, which demonstrates good prediction results. Finally, to examine the mode identification of the testing samples, the posterior probabilities of each sample are provided in Fig. 16. Again, comparison results of the CPU running time of each algorithm are provided in Table 9. Similarly, the training time of GMR is much
[Fig. 13. Prediction results of the penicillin fermentation process.]
[Fig. 14. Prediction errors on the penicillin fermentation process.]
[Fig. 15. Relationship between measured and predicted output on the penicillin process.]
[Fig. 16. Posteriors of input to modes 1–3 in the testing set of the penicillin process.]
Table 9
Comparison of CPU running time of each algorithm on the penicillin process.

Method   CPU running time
GMR      Training time: 159.9114 s; testing time: 0.0231 s
G-PLSR   Whole time: 0.05182 s
C-LPLSR  Whole time: 7.9877 s
LPLSR1   Whole time: 7.7325 s
LPLSR2   Whole time: 7.7356 s
LPLSR3   Whole time: 7.7332 s
longer than that of the other methods, while the testing time on the testing dataset is very small. Hence GMR can be considered an effective method for online quality prediction.

5. Conclusions

In this paper, a Gaussian mixture regression model based soft sensor has been constructed for quality prediction in multiphase/multimode processes. To improve the GMR model fitting, a heuristic algorithm is adopted for parameter initialization and component number optimization. Compared to other conventionally used soft sensor approaches, GMR integrates mode identification and input–output regression into a single Gaussian mixture model. Moreover, due to its probabilistic mixture framework, the GMR model based soft sensor can automatically carry out quality prediction under different operating modes/phases without switching between different local models. Both the feasibility and the effectiveness of the GMR based soft sensor have been confirmed through a numerical example and two simulation benchmarks, where the new soft sensor provided more satisfactory performance than the other methods. During online operation with new input/output data samples, GMR can be adapted to the latest operating conditions. This can be discussed in two respects. For multimode/multiphase processes, a new operating mode would first be identified, for example by a process monitoring technique; if no new mode is detected, the original model does not need to be updated, and otherwise the model can be updated after enough samples of the new operating mode have been accumulated. For other continuous processes, the GMR model can be adapted using moving window or recursive techniques.

Conflict of interest

There is no conflict of interest.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (NSFC) (61370029), the National 973 Project (2012CB720500), and the Fundamental Research Funds for the Central Universities (2013QNA5016).

References

[1] Z. Ge, B. Huang, Z. Song, Mixture semisupervised principal component regression model and soft sensor application, AIChE J. 60 (2014) 533–545.
[2] H. Kaneko, K. Funatsu, Adaptive soft sensor model using online support vector regression with time variable and discussion of appropriate parameter settings, Procedia Comput. Sci. 22 (2013) 580–589.
[3] S. Kim, R. Okajima, M. Kano, S. Hasebe, Development of soft-sensor using locally weighted PLS with adaptive similarity measure, Chemom. Intell. Lab. Syst. 124 (2013) 43–49.
[4] S. Khatibisepehr, B. Huang, S. Khare, Design of inferential sensors in the process industry: a review of Bayesian methods, J. Process Control 23 (2013) 1575–1596.
[5] P. Kadlec, B. Gabrys, S. Strandt, Data-driven soft sensors in the process industry, Comput. Chem. Eng. 33 (2009) 795–814.
[6] I. Jolliffe, Principal Component Analysis, Wiley Online Library, 2005.
[7] H.O.A. Wold, Nonlinear estimation by iterative least square procedures, 1968.
[8] H. Kaneko, M. Arakawa, K. Funatsu, Development of a new soft sensor method using independent component analysis and partial least squares, AIChE J. 55 (2009) 87–98.
[9] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[10] Z. Ge, Z. Song, A comparative study of just-in-time-learning based methods for online soft sensor modeling, Chemom. Intell. Lab. Syst. 104 (2010) 306–317.
[11] P. Angelov, D. Filev, N. Kasabov, Evolving Intelligent Systems: Methodology and Applications, John Wiley & Sons, New York, 2010, pp. 313–336.
[12] E. Lughofer, V. Macián, C. Guardiola, E.P. Klement, Identifying static and dynamic prediction models for NOx emissions with evolving fuzzy systems, Appl. Soft Comput. 11 (2011) 2487–2500.
[13] C. Cernuda, E. Lughofer, W. Märzinger, J. Kasberger, NIR-based quantification of process parameters in polyetheracrylat (PEA) production using flexible non-linear fuzzy systems, Chemom. Intell. Lab. Syst. 109 (2011) 22–33.
[14] C. Cernuda, E. Lughofer, P. Hintenaus, W. Märzinger, T. Reischer, M. Pawliczek, J. Kasberger, Hybrid adaptive calibration methods and ensemble strategy for prediction of cloud point in melamine resin production, Chemom. Intell. Lab. Syst. 126 (2013) 60–75.
[15] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
[16] L. Lu, A. Ghoshal, S. Renals, Cross-lingual subspace Gaussian mixture models for low-resource speech recognition, 2013.
[17] N. Greggio, A. Bernardino, C. Laschi, P. Dario, J. Santos-Victor, Fast estimation of Gaussian mixture models for image segmentation, Mach. Vis. Appl. 23 (2012) 773–789.
[18] P. Núñez, R.P. Rocha, M. Campos, J. Dias, Novelty detection and segmentation based on Gaussian mixture models: a case study in 3D robotic laser mapping, Robot. Auton. Syst. 61 (2013) 1696–1709.
[19] S.W. Choi, J.H. Park, I.-B. Lee, Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis, Comput. Chem. Eng. 28 (2004) 1377–1387.
[20] J. Yu, S.J. Qin, Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models, AIChE J. 54 (2008) 1811–1829.
[21] J. Yu, Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes, Ind. Eng. Chem. Res. 51 (2012) 13227–13237.
[22] R. Grbić, D. Slišković, P. Kadlec, Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models, Comput. Chem. Eng. 58 (2013) 84–97.
[23] Z. Ghahramani, M.I. Jordan, Supervised learning from incomplete data via an EM approach, Advances in Neural Information Processing Systems 6, 1994.
[24] S. Calinon, Robot Programming by Demonstration: A Probabilistic Approach, EPFL Press, 2009.
[25] T.H. Falk, H. Shatkay, W.-Y. Chan, Breast cancer prognosis via Gaussian mixture regression, Canadian Conference on Electrical and Computer Engineering (CCECE'06), IEEE, 2006, pp. 987–990.
[26] Y. Tian, L. Sigal, H. Badino, F. De la Torre, Y. Liu, Latent Gaussian mixture regression for human pose estimation, Computer Vision – ACCV 2010, Springer, 2011, pp. 679–690.
[27] G. Wang, L. Qian, Z. Guo, Continuous tool wear prediction based on Gaussian mixture regression model, Int. J. Adv. Manuf. Technol. 66 (2013) 1921–1929.
[28] T. Cederborg, M. Li, A. Baranes, P.-Y. Oudeyer, Incremental local online Gaussian mixture regression for imitation learning of multiple tasks, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2010, pp. 267–274.
[29] S. Calinon, F. Guenter, A. Billard, On learning, representing, and generalizing a task in a humanoid robot, IEEE Trans. Syst. Man Cybern. B 37 (2007) 286–298.
[30] C. Fraley, A.E. Raftery, How many clusters? Which clustering method? Answers via model-based cluster analysis, Comput. J. 41 (1998) 578–588.
[31] J.J. Oliver, R.A. Baxter, C.S. Wallace, Unsupervised learning using MML, ICML, 1996, pp. 364–372.
[32] J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1989.
[33] G. McLachlan, T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, 2007.
[34] M.A. Figueiredo, J.M. Leitão, A.K. Jain, On fitting mixture models, Energy Minimization Methods in Computer Vision and Pattern Recognition, Springer, 1999, pp. 54–69.
[35] S. Kullback, Information Theory and Statistics, Courier Dover Publications, 2012.
[36] E. Lughofer, A dynamic split-and-merge approach for evolving cluster models, Evol. Syst. 3 (2012) 135–151.
[37] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (1993) 245–255.
[38] G. Birol, C. Ündey, A. Cinar, A modular simulation package for fed-batch fermentation: penicillin production, Comput. Chem. Eng. 26 (2002) 1553–1565.