Control Engineering Practice 20 (2012) 165–172
Contents lists available at SciVerse ScienceDirect
Control Engineering Practice journal homepage: www.elsevier.com/locate/conengprac
Multiple model based LPV soft sensor development with irregular/missing process output measurement$ Xing Jin, Siyun Wang, Biao Huang n, Fraser Forbes The Department of Chemical and Materials Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2G6
a r t i c l e i n f o
abstract
Article history: Received 8 February 2011 Accepted 15 October 2011 Available online 9 November 2011
Data-driven soft sensors have been applied extensively in process industry for process monitoring and control. Linear soft sensors, which are only valid within a relatively small operating envelope, are considered to be insufficient in practice when the processes transit among several operating modes. Moreover, owing to a variety of causes such as malfunction of sensors, multiple rate sampling scheme for different process variables, etc., missing data problem is commonly experienced in process industry. In this paper, soft sensor development with irregular/missing output data is considered and a multiple model based linear parameter varying (LPV) modeling scheme is proposed for handling nonlinearity. The efficiency of the proposed algorithm is demonstrated through several numerical simulation examples as well as a pilot-scale experiment. It is shown through the comparison with the traditional missing data treatment methods in terms of the parameter estimation accuracy that the developed soft sensors enjoy improved performance by employing the expectation–maximization (EM) algorithm in handling the missing process data and model switching problem. & 2011 Elsevier Ltd. All rights reserved.
Keywords: Bayesian methods Soft sensors Expectation-maximization method
1. Introduction Data from the process play a key role in process monitoring and control as it reflects the status of process operations. Although it is true that in the past few decades the technologies for developing more stable, reliable and accurate sensors have experienced dramatic improvement, large process measurement time delays or unreliable measurement is still frequently observed for certain type of process variables such as product composition, liquid concentration, etc. As a result, industrial practitioners as well as academic researchers have been working on developing inferential models through which the actual value of the process variables can be indirectly inferred from other easily measured variables. In the process industry, such predictive models are normally referred to as soft sensors, owing to the similarity with their hardware counterpart in measuring the process variables. Although it would be desirable to apply engineering principles to build model-driven soft sensors with more meaningful model structures and a wider model validity region, some limitations associated with the model-driven soft sensors (e.g. lack of process knowledge, numerous amount of required effort, etc.) prevent them from wide applications. As an alternative, data-driven soft sensors use statistical models identified
$ n
A short version of the paper will be presented at the 2011 Adconip conference. Corresponding author. Tel.: þ1 780 492 9016; fax: þ1 780 492 2881. E-mail address:
[email protected] (B. Huang).
0967-0661/$ - see front matter & 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.conengprac.2011.10.007
from archived process data which normally require less or even no process knowledge in comparison with their model-driven counterpart. In the past few decades, a number of successful stories on the applications of soft sensors in process inferential control, monitoring and fault detection have already been reported. For instance, Kano, Miyazaki, Hasebe, and Hashimoto (2000, 2003) have reported successful applications of linear data-driven soft sensors in the inferential control of distillation compositions; Yan, Shao, and Wang (2004) and Wang, Srinivasan, Liu, Guru, and Leong (2006) proposed various methods in building soft sensors for on-line prediction of certain key process quality variables which would otherwise be difficult to measure with traditional hardware sensors. As pointed out by Park and Han (2000), since most of chemical processes are nonlinear in nature, linear soft sensors may fail to provide satisfactory prediction performance when the process is operated under a wide range. A number of methods for nonlinear soft sensor development have been proposed in which modeling tools such as artificial neural networks (ANNs), support vector machines (SVM) and nonlinear partial least squares (NPLS), etc. are utilized in extracting nonlinear predictive models from process data collected under different operating conditions. These methods have been widely adopted by researchers. While it is undeniable that a wide spectrum of applications have already been well tackled by methods like SVM or ANNS, problems such as interpretability of the model structure, difficulties in tuning certain model parameters have imposed great inconvenience as well as challenges when applying these black box nonlinear
166
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
modeling methods in real industrial process modeling practice. It is desirable to have a modeling method which not only delivers qualified prediction performance, but also presents simpler and more interpretable model structure. A modeling scheme featured by smooth combination of locally identified linear regression models was put forward by Park and Han (2000) in their effort to estimate key quality variables of two industrial distillation columns. It was shown that the proposed method can, for both nonlinear and relatively linear process, deliver comparable performance with other well-known methods such nonlinear PLS or ANNS while enjoying conceptually simpler model structure. Following similar principles, Kadlec and Gabrys (2008) also put forward another locally weighted regression (LWR) based nonlinear modeling scheme to achieve automatic adaption to the process operating condition changes which normally cause the invalidity of locally identified linear models. The method shows superior performance when compared with non-adaptive soft sensor in predicting exhaust gas concentration of an industrial process. When facing process records with missing data, which is considered to be commonly experienced with process industry, these methods may suffer from performance deterioration, especially when missing data account for large portion of process data. This incapability of dealing with missing data is shared by most of datadriven soft sensor development methods (Kadlec, Gabrys, & Strandt, 2009). In process industry, hardware sensor failure is the main cause for missing data; however, if process variables are sampled under different frequency, in the context of soft sensor development, the ones with lower sampling frequency can also be perceived as the variables with missing data. For instance, lab analysis is widely applied in measuring product composition in process industry owing to the difficulties associated with online composition measurement. Therefore, the irregular/infrequent sampled composition from the lab analysis may well be perceived as measurement with a good portion of missing data compared with the other process variables such as flow rate, pressure, temperature, etc. In fact, the issue of slow sampled lab analysis data commonly encountered in our applications of soft sensor technology to process industries is the main motivation for this work. As a result, it is desirable to develop a data replacement/imputation strategy for the missing data in the course of soft sensor development. Several mechanisms have already been identified for data missing patterns including missing completely at random (MCAR), missing at random (MAR) and non-ignorable (NI) (Imtiaz & Shah, 2008; Khatibisepehr & Huang, 2008). In this paper, it is assumed that the missing pattern of the process data follows MCAR or MAR, which is considered to be a reasonable assumption since in many realistic applications, the departures from MAR are not large enough to significantly compromise the performance of the MAR-based methods (Schafer & Graham, 2002). The primitive but commonly applied approaches in dealing with missing industrial process data are case deletion and simple mean substitution (Imtiaz & Shah, 2008; Khatibisepehr & Huang, 2008). Effective as it may be, these primitive methods are not rigorous. For example, a model estimated from the data having the missing data points casewise deleted has been shown to be biased if the data are not missing at completely random (Khatibisepehr & Huang, 2008; Schafer & Graham, 2002). Moreover, loss of valuable process information may be incurred as a result of casewise deletion if a significant amount of data is deemed to be missing. For some other existing methods such as mean substitution, single regression imputation, last observation carried forward (LOCF), etc., their statistical properties have been investigated and flaws associated with each of the methods are explained in Schafer and Graham (2002). In contrast with all the older and primitive procedures that are listed above, modern methods based on maximum likelihood and Bayesian multiple imputation for the missing data have drawn
great amount attention and relevant comparative studies on the performance of different missing treatment strategies have been performed (Khatibisepehr & Huang, 2008; Schafer & Graham, 2002). Khatibisepehr and Huang (2008) applied a variety of missing data treatment methods in process modeling and it is found that, under the assumption of MAR for missing data mechanism, the Bayesian method (essentially the EM algorithm) outperforms the other procedures when developing soft sensors for the process. Also, Imtiaz and Shah (2008) provide an in-depth study on the causes of missing data in process industry and conduct analytical study on the principles of the EM algorithm (maximum likelihood based method), data augmentation and multiple imputation methods which, to a large extent, are considered to be the mainstream advanced modern missing data handling methods. Moreover, applications of the EM algorithm along with other filtering techniques (such as particle filter) in parameter estimation have been reported (Gopaluni, 2008; Kim & Stoffer, 2008). As for the more general overview of the development and status quo of the missing data handling methods, we recommend the research paper written by Schafer and Graham (2002). From the brief discussion above, it can be found that owing to the attractive statistical properties as well as the simplicity of implementation, the EM algorithm has already been widely applied and the results appear to be promising. The main contribution that distinguishes this paper from the previous study on soft sensor modeling with irregular sampled process data is that this paper renders a complete solution for building a soft sensor under missing process data by employing the EM algorithm as the main tool. By employing the concept of linear parameter varying (LPV) and calculating the weights assigned to each local model within the EM algorithm, the developed soft sensor enjoys a broader validity range than its linear counterpart with simpler and more meaningful model structure compared with other nonlinear black-models obtained through a variety of other methods (Kadlec et al., 2009). Moreover, the proposed method maintains the performance of the developed soft sensor in the context of missing process data. The paper is organized as follows: in Section 2, different missing mechanisms are briefly discussed and examples of the missing mechanisms are illustrated. Section 3 begins with a revisit of the EM algorithm and lays out the mathematical formulation for the proposed soft sensor modeling in which an LPV model composed of different local models is identified with missing output data. Numerical simulated processes as well as an experimental example are illustrated in Section 4 which aim at demonstrating the effectiveness of the proposed method in developing soft sensors with missing output data. Comparative studies are performed among the proposed method and different missing data treatment methods. Section 5 draws the conclusion based on the results obtained from this paper.
2. Missing mechanisms Missing data occur for a variety of reasons. According to the conditions under which the data are missing, three different types of missing mechanisms have been identified, namely, missing completely at random (MCAR), missing at random (MAR), not missing at random (NMAR) (Khatibisepehr & Huang, 2008). To indicate whether the output measurement at each sampling instant is missing or not, an indication variable I is defined through which the availability of the output measurement is denoted. Ik ,k ¼ 1; 2 . . . N is binary with 0 meaning that the corresponding output at kth time instant Yk is missing while 1 indicating that the output Yk is available. As a result, the output Y k ,k ¼ 1; 2 . . . N composed of observed part and missing part, in
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
other words, Y ¼ fY miss ,Y obs g. Denote the inputs as X k ,k ¼ 1; 2 . . . N, the different missing mechanisms can be explained as: 1. Missing completely at random (MCAR): Under this missing pattern, the causes for the missing output are unknown and it is believed that the occurrence of missing output does not depend on the selected process inputs and outputs, which can be expressed by probability dependence equation: PðI9Y,XÞ ¼ PðIÞ
ð1Þ
where X denotes the model inputs. Examples for missing at completely random include the missing measurement due to the sensor hardware failure or irregular sampling scheme for some process quality variables (such as product composition of a distillation column). 2. Missing at random (MAR): When the process output data are missing under MAR pattern, it is believed that the observed data Z obs ¼ fY obs ,Xg could be used to explain the causes of the missing: PðI9Y,XÞ ¼ PðI9Y obs ,XÞ
ð2Þ
In reality, we have some process variables that are difficult and expensive to measure. They are measured only when there is a necessity (such as the input measurement X indicates certain abnormality of the process output variable Y) of measuring the output; normally, no process output values are recorded. 3. Not missing at random (NMAR): This missing mechanism indicates that the process output is unavailable due to the combinational effect of the observed data Z obs ¼ fY obs ,Xg and the missing output data: PðI9Y,XÞ ¼ PðI9Y miss ,Y obs ,XÞ
ð3Þ
For instance, due to the settings/capability of sensors, process output values are not recorded when it exceeds the specified range. In this paper, it is assumed that the process output data are missing in MCAR or MAR pattern, which means that the values of the random binary variable I do not depend on the unobserved missing output Ymiss.
3.1. EM algorithm revisited A nonlinear iterative optimization method named expectation–maximization (EM) algorithm which has been widely used in different research disciplines is employed in this paper. In the EM algorithm, a complete data set C consisting of two parts: fC obs ,C mis g, is constructed and its expectation, which is normally referred to as the Q function in the literature (Jin & Huang, 2009), is formulated and calculated: Q ðY9Y
Þ ¼ EC
mis 9ðY
old
,C obs Þ
flog LðC, YÞg
ð4Þ
where Yold refers to the parameters estimated from the previous iteration, the expectation EC 9ðYold ,C Þ flog LðC, YÞg is taken with mis
obs
respect to the missing data Cmiss conditioned on ðYold ,C obs Þ, LðC, YÞ is the likelihood function of the complete data C and is dependent also on the parameters Y. The step carrying out the computation of Eq. (4) is called the E-step. Afterwards, to determine the optimal parameters that maximize the expectation of the complete data C over the missing information Cmiss, a maximization step is implemented, as expressed by
Y ¼ arg maxQ ðY9Yold Þ Y
The parameters Y determined from this maximization will be used as Yold for the next iteration. This iteration repeats until convergence. 3.2. Mathematical formulation of the LPV models identification under the EM algorithm Owing to the nonlinearity of the processes, the applicability of the linear models to the process, which is operated within a wide region, can be reduced rapidly. Therefore, it is necessary that a nonlinear model is obtained or approximated by a group of linear models so as to describe the process dynamics in a wider region. A LPV model, as indicated by its name, is featured by its linear model structure and varying model parameters. After being first introduced by Shamma and Athans (1991) in their study of gain scheduling control of an LPV process, a considerable number of publications on LPV model identification have already been seen thanks to its capability in approximating complex nonlinear systems (Bamieh & Giarr, 2002; Xu, Zhao, & ZhuQian, 2009). An LPV model is composed of several local linear models and by smoothly combining these different local models, it is expected that the prediction from the LPV model could capture the dynamics of the process under various operating conditions. As discussed in the Introduction section, process measurement devices installed in the industrial fields may suffer from all sorts of malfunctions, which may result in missing of the reading in the historian database. Moreover, certain types of process output variables (product composition, etc.), due to a variety of reasons, cannot be sampled as frequently as the other process variables such as flow, pressure or temperature. This discrepancy in the sampling rate of the input and output variables can also be treated as an output missing from the modeling perspective. All these possible causes for missing process variable readings pose challenges for data-driven soft sensor development. To circumvent the difficulties introduced by the missing output in the soft sensor development, the EM algorithm is employed in this section to estimate the parameters, which maximize the likelihood of observing the available input–output data. In this paper, we consider the following steady-state nonlinear model: Y k ¼ f ðX k , YÞ
3. Multi-model LPV modeling with missing output data
old
167
where X k ¼ ½1,x1k ,x2k . . . xnk is the model regressor while f determines the nonlinear relationship between the input Xk and the output Yk. This nonlinear model will be approximated by a LPV model. Prediction from a LPV model is calculated by the weighted average of each local model according to the measurement of the scheduling variable(s); therefore, the prediction for Yk would be Y^ k ¼
M X
aki Y^ ki , k ¼ 1; 2 . . . N, i ¼ 1; 2 . . . M
ð7Þ
i¼1
where Y^ k and Y^ ki ¼ X k yi in Eq. (7) represent the overall LPV model prediction and the prediction from the ith local model respectively. yi denotes the ith local model parameters. aki denotes the weight assigned to the ith local model for the kth data point. Assuming that the scheduling variable(s) through which the operating condition of process is reflected can be measured or inferred from the other process measurements, using the exponential function to smoothly combine the local linear models, the weight aki ,k ¼ 1; 2 . . . N,i ¼ 1; 2 . . . M for each local model can be calculated by the following equations: wki ¼ exp
ðT k T i Þ2 2ðsi Þ2
wki
ð5Þ
ð6Þ
aki ¼ PM
i¼1
wki
,
,
k ¼ 1; 2 . . . N, i ¼ 1; 2 . . . M
k ¼ 1; 2 . . . N, i ¼ 1; 2 . . . M
ð8Þ
ð9Þ
168
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
where Tk and Ti in Eq. (8) represent the measurement of the scheduling variable(s) at kth time instant and the center of ith local model respectively. Due to the difficulty associated with brute force optimization method in maximizing the following likelihood function (Jin & Huang, 2009)
owing to Eq. (6) where Yk does not depend on Y k1 , . . . , Y 1 ,X k1 , . . . ,X 1 and does not depend on Tk if Hk is given. In the derivation of Eq. (12), the product of PðIk 9Y obs ,X k . . . X 1 ,Ik1 . . . I1 , T k . . . T 1 ,Hk . . . H1 , YÞC xk C T k is lumped to a term C where
maxPðY N . . . Y 1 ,X N . . . X 1 ,IN . . . I1 ,T N . . . T 1 9YÞ
and
Y
ð10Þ
C xk ¼ PðX k 9Y k1 . . . Y 1 ,X k1 . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk1 . . . H1 , YÞ
C T k ¼ PðT k 9Y k1 . . . Y 1 ,X k1 . . . X 1 ,Ik1 . . . I1 ,T k1 . . . T 1 ,Hk1 . . . H1 , YÞ
where binary variable I in Eq. (10) denotes the availability of the process output, and I¼1 indicates that data is available while I ¼0 indicates that data is missing, EM algorithm is employed here in an attempt to solve the optimization problem as given in Eq. (10). Introducing a hidden variable Hk ,k ¼ 1; 2 . . . N, to denote the local model identity of each data point, the conditional expectation of the complete data C ¼ fY N . . . Y 1 ,X N . . . X 1 ,IN . . . I1 ,T N . . . T 1 , HN . . . H1 g over the missing output Ymiss as well as the data point identity H can be determined as old
Q ðY, Y
Þ ¼ EYmiss,H9Yold ,C
since all these terms are not dependent on Y and they will not play a role in the following derivation process. PN The term EYmiss,H9Yold ,C k ¼ 1 log PðY k 9X k ,T k ,H k , YÞ in Eq. (12) obs can be further expanded as EYmiss,H9Yold ,C
N X obs
¼ EH9Yold ,C
log PðY k 9X k ,Ik ,Hk , YÞ
k¼1 N X
obs
Ik log PðY k 9X k ,Hk , YÞ
k¼1
obs
flog PðY N . . . Y 1 ,X N . . . X 1 ,IN . . . I1 ,T N . . . T 1 ,HN . . . H1 9YÞg
þEYmiss,H9Yold ,C
¼ EYmiss,H9Yold ,C obs ( N Y log fPðY k 9Y k1 . . . Y 1 ,X k . . . X 1 ,Ik . . . I1 ,T k . . . T 1 ,Hk . . . H1 , YÞ k¼1
PðIk 9Y k1 . . . Y 1 ,X k . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk . . . H1 , YÞ PðHk 9Y k1 . . . Y 1 ,X k . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk1 . . . H1 , YÞ PðX k 9Y k1 . . . Y 1 ,X k1 . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk1 . . . H1 , YÞ
)
PðT k 9Y k1 . . . Y 1 ,X k1 . . . X 1 ,Ik1 . . . I1 ,T k1 . . . T 1 ,Hk1 . . . H1 , YÞg ð11Þ where the notation EA9B refers to expectation with respect to conditional probability distribution pðA9BÞ, the probability expression PðIk 9Y k1 . . . Y 1 ,X k . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk . . . H1 , YÞ in Eq. (11) can be further simplified as PðIk 9Y obs ,X k . . . X 1 ,Ik1 . . . I1 , T k . . . T 1 ,Hk . . . H1 Þ owing to the assumption that the missing mechanism that is under consideration in this paper is MAR or MCAR. In other words, for the process output data Y ¼ fY miss ,Y obs g, the values of missing index variable I are only dependent on the observed process data. Define O ¼ foi ,i ¼ 1; 2 . . . Mg as the validity width of each local model. Eq. (11) can be further simplified as
N X obs
ð1Ik Þlog PðY k 9X k ,Hk , YÞ
ð13Þ
k¼1
where Y k ,k ¼ 1; 2 . . . N, in Eq. (13) consist of observed and missing output. Owing to the multiplication of Ik ,k ¼ 1; 2 . . . N, within the PN term EYmiss,H9Yold ,C k ¼ 1 Ik log PðY k 9X k ,H k , YÞ, the expectation obs over the missing observation Ymiss can be dropped without causing any computational difference because for all the missing parts in Y k ,k ¼ 1; 2 . . . N, the missing index Ik ,k ¼ 1; 2 . . . N is equal P to zero; thus N k , YÞ will be zero. k ¼ 1 Ik log PðY k 9X k ,HP N Similarly, the term EYmiss,H9Yold ,C k ¼ 1 ð1Ik Þ log PðY k 9X k ,H k , YÞ obs P N can also be written as EY,H9Yold ,C k ¼ 1 ð1Ik Þlog PðY k 9X k ,H k , YÞ obs since for non-missing part, this term is automatically zero. As a result, Eq. (13) can be further expanded as EYmiss,H9Yold ,C ¼
N X obs
N X M X
Ik PðHk ¼ i9T k ,Oold Þlog PðY k 9X k , yi Þ
k¼1i¼1
(
þEY9Yold ,C ¼
N X M X
log PðY k 9X k ,Ik ,Hk , YÞ
k¼1
obs
N X M X
) ð1Ik ÞPðHk ¼ i9T k ,Oold Þlog PðY k 9X k , yi Þ
k¼1i¼1
Ik PðHk ¼ i9T k ,Oold Þlog PðY k 9X k , yi Þ
k¼1i¼1 N Y
Q ðY, Yold Þ ¼ EYmiss,H9Yold ,C log obs
fPðY k 9X k ,Ik ,Hk , YÞPðHk 9T k ,OÞ
þ
k¼1
Z
PðIk 9Y obs ,X k . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk . . . H1 , YÞ C xk C T k g
¼ EYmiss,H9Yold ,C ¼ EYmiss,H9Yold ,C þ EH9Yold ,C
N X obs
N X obs
log PðHk 9T k ,OÞ þ C
ð1Ik ÞPðHk ¼ i9T k ,Oold Þ
k¼1i¼1
PðY k 9X k ,ðyi Þold Þlog PðY k 9X k , yi Þ dY k
PðHk ¼ i9T k ,Oold Þ ¼ ð12Þ
k¼1
Here, in the derivation of Eq. (12) from Eq. (11), the following equation holds: PðHk 9Y k1 . . . Y 1 ,X k . . . X 1 ,Ik1 . . . I1 ,T k . . . T 1 ,Hk1 . . . H1 , YÞ ¼ PðHk 9T k ,OÞ due to the fact that the local model identity associated with each of the data points depends only on the scheduling variable(s). Furthermore,
ð14Þ
where exp
log PðY k 9X k ,Ik ,Hk , YÞ
k¼1
N X obs
flog PðY k 9X k ,Ik ,Hk , YÞPðHk 9T k ,OÞCg
k¼1
N X M X
ðT k T i Þ2
2ðsold Þ2 9gððoi Þold Þ ðT k T i Þ2 i ¼ 1 exp 2ðsold Þ2
PM
therefore, Eq. (14) can be rewritten as EYmiss,H9Yold ,C ¼
N X obs
N X M X
log PðY k 9X k ,T k ,Hk , YÞ
k¼1
Ik gððoi Þold Þlog PðY k 9X k , yi Þ
k¼1i¼1
þ
N X M X
ð1Ik Þgððoi Þold Þ
Z
PðY k 9X k ,ðyi Þold Þlog PðY k 9X k , yi Þ dY k
k¼1i¼1
PðY k 9Y k1 . . . Y 1 ,X k . . . X 1 ,Ik . . . I1 ,T k . . . T 1 ,Hk . . . H1 , YÞ ¼ PðY k 9X k ,Ik ,Hk , YÞ
ð15Þ
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
R The integral term PðY k 9X k ,ðyi Þold Þlog PðY k 9X k , yi Þ dY k in Eq. (15) can be calculated as Z PðY k 9X k ,ðyi Þold Þlog PðY k 9X k , yi Þ dY k Z 2 1 2 ¼ PðY k 9X k ,ðyi Þold Þlog pffiffiffiffiffiffi expð1=2s ÞðY k X k YÞ dY k 2ps Z 1 ¼ PðY k 9X k ,ðyi Þold Þlog pffiffiffiffiffiffi 2ps 2
2
þ PðY k 9X k ,ðyi Þold Þexpð1=2s ÞðY k X k YÞ dY k Z 1 1 ðY k X k YÞ2 dY k ¼ log pffiffiffiffiffiffi þ PðY k 9X k ,ðyi Þold Þ 2s2 2ps 1 1 old ¼ log pffiffiffiffiffiffi 2 ððsold Þ2 þ ðX k yi Þ2 Þ 2ps 2s 1 1 old þ 2 ðX k yi ÞðX k yi Þ 2 ðX k yi Þ2 2s s
ð16Þ
Substituting Eqs. (15) and (16) in Eq. (12), the Q function can be derived as Q ðY, Yold Þ ¼
N X M X
Ik gððoi Þold Þlog PðY k 9X k , yi Þ
1 1 þ ð1Ik Þgððoi Þold Þ log pffiffiffiffiffiffi 2 ððsold Þ2 2 s p s 2 k¼1i¼1 1 1 þ ðX k ðyi Þold ÞÞ þ 2 ðX k yi ÞðX k ðyi Þold Þ 2 ðX k yi Þ2 2s s N X M X gððoi Þold Þlog PðHk ¼ i9T k ,OÞ þC ð17Þ þ M X
k¼1i¼1
To calculate the system parameters through which the conditional expectation function Q given in Eq. (17) can be maximized, the derivative is taken over the parameters fyi ,i ¼ 1; 2 . . . M, sg such that @Q =@s ¼ 0,@Q =@yi ¼ 0,i ¼ 1; 2 . . . M , can be achieved. After several steps of manipulation, the process noise variance s2 as well as the local model parameters yi ,i ¼ 1; 2 . . . M, is obtained as PN PM gððo Þold ÞIk ðY k X k yi Þ2 s2 ¼ k ¼ 1 PNi ¼ 1 PM i old k¼1 i ¼ 1 gððoi Þ Þ PN PM old old 2 1 gððoi Þ Þð1Ik ÞðX k yi X k ðyi Þ Þ þ k ¼ 1 i ¼P PM N old k¼1 i ¼ 1 gððoi Þ Þ PN PM old 2 gððoi Þ Þð1Ik Þðsold i Þ ð18Þ þ k ¼ 1PNi ¼ 1 PM old k¼1 i ¼ 1 gððoi Þ Þ PN
yi ¼
k¼1
PN
old T old k ¼ 1 gððoi Þ Þð1Ik ÞðX k yi ÞX k T old k ¼ 1 gððoi Þ ÞX k X k
gððoi Þold ÞIk Y k X Tk þ PN
ð19Þ
To calculate the validity width of each local linear model oi ,i ¼ 1; 2 . . . M, the ones that achieve the minimization of the sum of the square error between the predictions and the real process data will be searched and calculated as min Onew
s:t:
N X
3.2.1. Influence of the missing data imputation on the LPV model identification Binary variable I is introduced in this paper to account for the availability of the measurement at certain time instant. For the data points that are considered as unavailable, the predictions from the identified LPV model given all the other process information are calculated to replace/impute the missing parts of the data set. In nonlinear processes modeling using the LPV models, owing to the necessity of estimating local models validity width oi ,i ¼ 1; 2 . . . M, the data points that locate in the transition period from one local model to the other are critical in local model width estimations. Therefore, when the data points are missing in the transition period, imputation of the missing information using the predictions from the identified LPV model would be beneficial in increasing the accuracy of the parameter estimations. The improvement in terms of parameter estimation will be shown in one of the following simulated examples in which a nonlinear fermentation reactor is modeled using the proposed LPV modeling method. 3.3. Some practical implementation issues
k¼1i¼1 N X
169
3.3.1. Searching for initial starting points One of the commonly shared issues with nonlinear numerical optimization algorithms is their sensitivity to the initial starting points from which the algorithm starts searching for the optimal solution. For the EM algorithm, as being pointed by Jin and Huang (2009) in their study of the influence of different starting points to the estimation results obtained from the EM algorithm, it is desirable to have a set of appropriate initial guesses in order to secure the optimality of the algorithm. In this paper, to effectively calculate the proper initial guess without imposing any requirement on the amount of a priori knowledge for the process, genetic algorithm (GA) is employed in an attempt to search for the appropriate initial guess for system parameters y, s and local model validity region oi ,i ¼ 1; 2 . . . M for soft sensor development using the LPV modeling method. Following the procedure of Rao, Tun, and Lakshminarayanan (2009), a pool of candidates (chromosomes) are randomly selected first and the fitness of each individual in the population is evaluated through certain fitness calculation function. Afterwards, different genetic operators including selection, reproduction and mutation are applied to maintain the genetic diversity. The algorithm is implemented iteratively until certain termination criterion is met. Out of all the procedures adopted in GA, fitness measurement of the candidates is of particular importance as it directly influences the solution of the algorithm. For system parameters estimation, the mean square errors (MSE) of the predictions of the identified model against the real process data is calculated based on the following equation: PN ðy y^ Þ2 MSEYi ¼ k ¼ 1 k k , i ¼ 1; 2 . . . h ð21Þ N where N and h in Eq. (21) represent the number of data points and the GA population size respectively.
ðIk ðY k Y^ k ðOnew ÞÞ2 þ ð1Ik ÞðY^ k ðOold ÞY^ k ðOnew ÞÞ2 Þ
k¼1
Omin r Onew r Omax
ð20Þ
where Y^ Onew and Y^ Oold denote the prediction calculated from the combination of each of the local models. Onew and Oold denote the local model validity region obtained in the current and previous step respectively. Since the objective function is nonlinear and no specific closed form expression for calculating the optimal oi ,i ¼ 1; 2 . . . M can be derived, a numerical optimization method is employed to optimize Eq. (20). In this paper, we adopt a constrained nonlinear optimization function named ‘fmincon’ provided by MATLAB as to calculate the best width of each Gaussian weighting function associated with relevant local models.
3.3.2. Optimality versus Robustness It is noticed that, in Eq. (20), the constrained maximization over the local model validity width oi ,i ¼ 1; 2 . . . M is nonlinear and a nonlinear optimization procedure is required. By specifying the lower and upper boundaries omin and omax for each of oi ,i ¼ 1; 2 . . . M, the optimization procedure is capable of balancing the optimization versus robustness in its searching for the solutions. For instance, when the user is not confident with a priori knowledge regarding some of the process local model validity width or, when the process data quality is perceived to be good, the bounds of the constraints can be tuned with a relatively large value so that the
170
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
optimizer has more freedom to search for optimum. On the other hand, given process data with significant magnitude of noise, to increase the robustness of the algorithm, small values for the
boundaries on oi ,i ¼ 1; 2 . . . M are recommended so as to make sure that a meaningful (not necessarily optimal) estimation result is rendered by the algorithm. Therefore, through proper tuning of the upper and lower bounds of the validity width of the local models, it is expected that a balance between the optimality and robustness of the algorithm can be achieved.
9 8
4. Simulation examples
6
4.1. A numerical simulation example
5
A numerical simulation example was used by Liu (2007) in the study of online soft sensor development for chemical process with multiple grades. The nonlinear system can be expressed by
y
7
4
yk ¼ expð2x1k sinðpðx1k x2 ÞÞÞ þ sinðx2 ðx1k þ x2 ÞÞ þek
3
where x1 in Eq. (22) is chosen to be the scheduling variable and the system input, variable x2, is fixed to be 0.5. Process noise is given by e Nð0,0:5Þ. Within the operating range of the scheduling variable x1, four different operating points are selected, namely, x11 ¼ 0:5, x21 ¼ 0, x31 ¼ 0:5, x41 ¼ 1. Around each operating point, 50 input–output data are collected with the input x1 being generated according to the normal distribution x1 Nðxi1 ,0:05Þ,i A 1; 2,3; 4. The transitions between different operating points are filled with 50 data points in which the input/scheduling variable x1 is stair-wise increased gradually with a fixed stair size. Therefore, 350 input–output data points are collected at the end. To test the algorithm’s capability in handling the missing data, following the MAR missing pattern, around 50% of the output data are removed from the training data set. The verification results of the identified model are shown in Figs. 1 and 2
2 1 0 −0.5
0
0.5
1
x1 Fig. 1. Comparison of the nonlinear steady state relationship between the identified LPV model and the real nonlinear model. Blue dashed line denotes the prediction of the output against different input values from the identified LPV model, read solid line represents the real model output against different input values. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
x1 = −0.5
y
1.5 1
100
x1 = −0.1
2 y
50
100
50
100
x1 = 0.3
1.5
50
100
x1 = 0.4
50
100
x1 = 0.7
4
2 50
100
100
50
100
0
50
100
x1 = 0.6
2 1.5 0
50
0
100
x1 = 0.9
50
100
x1 = 1.0
10 8
4 0
x1 = 0.2
2.5
6
2 0
50 x1 = 0.5
8
4
100
1 0
100
x1 = 0.8
6
3
50
50
1.5
1 0
0 2
1.5
1 0
x1 = 0.1
2
1.5
0.5
100
1 0
2
1
50
1.5
1 0
1 0
2
1.5
1
y
50 x1 = 0.0
2
1.5
1.5
1 0
x1 = −0.2
2
1.5
1 0
x1 = −0.3
2
1.5
0.5
y
x1 = −0.4
2
ð22Þ
6 0
50
100
0
50
100
Fig. 2. Cross verification of the identified LPV model under different x1 values throughout the operating range. Black solid line denotes the predicted response from the identified LPV model, red dashed line represents the real system response. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
171
The verification results shown in Fig. 1 demonstrates that the identified multiple linear model based nonlinear model is able to capture the nonlinear steady mapping between the input and the output. To further examine the capability of the identified LPV model in approximating the process under different operating conditions, a cross validation scheme is designed in which the scheduling variable (x1 in this case) evolves over its operating range (from 0.5 to 1). At each of the operating point, input excitation signal x1 is generated according to the normal distribution x1 Nðxi1 ,0:05Þ,i A each of the operating point. Fig. 2 demonstrates the performance of the identified LPV model in describing process behavior under each of the operating condition. It can be seen that, although some discrepancy is observed at certain x1 value, the LPV model can deliver satisfactory approximation of the nonlinear input–output relationship given by Eq. (22).
Select the centers for each of the local models in accordance with the recommendation given by Venkat et al. (2003). The input–output identification data consists of the local sections and process transition sections between the local models. Setting the missing rate of the output data at around 15%, the verification models from the proposed method, case deletion method, LOCF method and mean substitution method are compared in Fig. 4. To further quantify the difference of the model predictions from different methods, the MSE between the model prediction and the real process response curve is computed and listed in Table 1. From the comparison results as shown in Fig. 4 and Table 1, it can be seen that the proposed method not only well captures the nonlinear process relations between the selected input and output variables, but more importantly, it outperforms the other missing data treatment methods.
4.2. Application on a simulated nonlinear process
4.3. Experimental verification
To test the applicability of the proposed algorithm for nonlinear processes, a simulated nonlinear fermentation reactor (Venkat, Vijaysai, & Gudi, 2003) is employed in this section to demonstrate the effectiveness of the proposed algorithm. The process consists of two inputs, namely dilution rate (u1) and feed substrate concentration (u2), and three outputs including biomass concentration (x1), substrate concentration (x2) and product concentration (x3). Differential equations governing the process dynamics have been given in Venkat et al. (2003). Following the same parameter settings as given by Venkat et al. (2003), assuming that the biomass concentration (x1) is of interest and the manipulated variable is the feed substrate concentration (u2), the model describing the nonlinear steady-state relationship between the selected input and output variables is built using the proposed multi-model modeling method. Moreover, comparison with the other commonly used missing data treatment methods is also performed in terms of the model accuracy. By fixing the dilution rate at a constant value 0.1636 h 1, it is noticed that over a wide operation range of the input substrate concentration (from 0 g/l to 40 g/l), the gain of the process changes from positive to negative. The steady state input–output relationship is given in Fig. 3.
A pilot scale three tank system (Jin & Huang, 2009) is utilized in this section to generate experimental data set for the verification purpose of the proposed algorithm. The inlet flow rate (input), which is controlled by a DC pump, is manipulated and the level of the third tank is considered as the output. Under
x1, biomass concentration
7
6
5
4
3
2
8
1
7 x1, biomass mass concentration g/l
8
5 6
10
15 20 25 30 U2, feed substratetion concentration
35
Fig. 4. Comparison of the models performance identified from different missing data treating methods. Blue solid line is the real process nonlinear steady state relationship curve, black star denotes the prediction of the model identified by the proposed method; red cross represents the model prediction obtained from case deletion method; red circle denotes the model prediction from mean substitution method; green triangle represents the model prediction from LOCF method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
5 4 3 2
Table 1 Comparison of the identified models from different missing data treatment methods.
1 0 0
5
10 15 20 25 30 U2, feed substrate concentration g/l
35
40
Fig. 3. Steady state relationship between the input feed substrate concentration and the output biomass concentration.
Missing data treatment methods
The proposed method
CD
MSE
0.057
0.563
Missing data treatment methods
Mean substitution
LOCF
MSE
0.563
2.308
172
X. Jin et al. / Control Engineering Practice 20 (2012) 165–172
proposed in which the EM algorithm is employed to estimate the parameters of the LPV model. Through the tests on several examples as well as the comparison with the other missing data treatment methods, first, the identified model from the proposed LPV identification method is able to deliver satisfactory performance in approximating the nonlinear process; second, the EM algorithm outperforms the traditional missing data treatment methods in terms of parameters estimation accuracy.
25
20
Tank level, cm
15
10 Acknowledgments The authors gratefully acknowledge the financial support from National Science and Engineering Research Council of Canada.
5
0
References
−5 25
30
35
40
45
50
55
60
Inlet Flow rate q, % Fig. 5. Experimental verification of the identified nonlinear model. Plus sign represents the collected experimental data, solid line denotes the prediction from the identified model.
certain inlet flow rate, when the system reaches steady state, the level of the system is determined by the nonlinear relationship given by q ¼ C 3 Ha33
ð23Þ
where q denotes the inlet flow and the H3 represents the height of the third tank. Using the rotameter to measure the inlet flow, the value of q ranges from 0% to 100% and it is selected as the scheduling variable. C3 is considered to be the flow resistance of the valve installed at the outlet of the third tank. a3 depends on the fluid dynamics of the liquid in the third tank, and for ideal laminar fluid, a3 equals 0.5. Steady state relationship as indicated by Eq. (23) between the inlet flow rate and the third tank level is modeled using the proposed multiple model based nonlinear modeling algorithm. Thirty two steady state input–output data are collected and two local models centered around 35% and 50% of the inlet flow are considered. Following MAR missing mechanism, that is, when the inlet flow is larger or smaller than certain pre-defined bound, the output is missed, around 50% of the data are removed from the training data set. Applying the algorithm to the remaining data set, the verification result against the real experimental data is shown in Fig. 5. The prediction curve calculated from the identified LPV model as shown in Fig. 5 exhibits the capability of approximating the process nonlinearity which is described by Eq. (23) with unknown parameter a3 . 5. Conclusion A novel soft sensor development with missing process output data is investigated in this paper. A LPV process modeling scheme is
Bamieh, B., & Giarr, L. (2002). Identification of linear parameter varying models. International Journal of Robust & Nonlinear Control, 12(9), 841–853. Gopaluni, R. (2008). A particle filter approach to identification of nonlinear processes under missing observations. Canadian Journal of Chemical Engineering, 86, 1081–1092. Imtiaz, S. A., & Shah, S. L. (2008). Treatment of missing values in process data analysis. The Canadian Journal of Chemical Engineering, 86(5), 838–858. Jin, X., & Huang, B. (2009). Robust identification of switching/piecewise autoregressive exogenous process. AIChE Journal, 56(7), 1829–1844. Kadlec, P., & Gabrys, B. (2008). Adaptive local learning soft sensor for inferential control support. In 2008 international conference on computational intelligence for modeling, control and automation (pp. 243–248), Vienna, Austria. Kadlec, P., Gabrys, B., & Strandt, S. (2009). Data-driven soft sensors in the process industry. Computers & Chemical Engineering, 33(4), 795–814. Kano, M., Miyazaki, K., Hasebe, S., & Hashimoto, I. (2000). Inferential control system of distillation compositions using dynamic partial least squares regression. Journal of Process Control, 2–3(10), 157–166. Kano, M., Showchaiya, N., Hasebe, S., & Hashimoto, I. (2003). Inferential control of distillation compositions: Selection of model and control configuration. Control Engineering Practice, 8(11), 927–933. Khatibisepehr, S., & Huang, B. (2008). Dealing with irregular data in soft sensors: Bayesian method and comparative study. Industrial & Engineering Chemistry Research, 22(47), 8713–8723. Kim, J., & Stoffer, D. (2008). A stochastic volatility mixture model: Estimation in the presence of irregular sampling via particle methods and the EM algorithm. Journal of Time Series Analysis, 29, 811–833. Liu, J. (2007). Online soft sensor for polyethylene process with multiple production grades. Control Engineering Practice, 15, 769–778. Park, S., & Han, C. (2000). A nonlinear soft sensor based on multivariate smoothing procedure for quality estimation in distillation columns. Computers & Chemical Engineering, 24(2–7), 871–877. Rao, R. K., Tun, K., & Lakshminarayanan, S. (2009). Genetic programming based variable interaction models for classification of process and biological systems. Industrial & Engineering Chemistry Research, 10(48), 4899–4907. Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. Shamma, J. S., & Athans, M. (1991). Guaranteed properties of gain scheduled control for linear parameter-varying plants. Automatica, 27(3), 559–564. Venkat, A. N., Vijaysai, P., & Gudi, R. D. (2003). Identification of complex nonlinear process based on fuzzy decomposition of the steady state space. Journal of Process Control, 13, 473–488. Wang, D., Srinivasan, R., Liu, J., Guru, P. N. S., & Leong, K. M. (2006). Data-driven soft sensor approach for quality prediction in a refinery process. In 2006 IEEE international conference on industrial informatics (pp. 230–235), Singapore. Xu, Z., Zhao, J., & ZhuQian, Y. (2009). Nonlinear MPC using identified LPV model. Industrial & Engineering Chemistry Research, 6(48), 3043–3051. Yan, W., Shao, H., & Wang, X. (2004). Soft sensing modeling based on support vector machine and Bayesian model selection. Computers & Chemical Engineering, 8(28), 1489–1498.