Journal of Process Control (2015), http://dx.doi.org/10.1016/j.jprocont.2014.12.008

Using horizon estimation and nonlinear optimization for grey-box identification☆

Alf J. Isaksson a,b,∗, Johan Sjöberg b,c, David Törnqvist d, Lennart Ljung b, Manon Kok b

a ABB AB, Corporate Research, SE-721 78 Västerås, Sweden
b Dept. Electrical Eng., Linköping University, SE-581 83 Linköping, Sweden
c Volvo Construction Equipment, SE-631 85 Eskilstuna, Sweden
d SenionLab AB, Teknikringen 7, 583 30 Linköping, Sweden
Article history: Received 19 May 2014; received in revised form 23 December 2014; accepted 24 December 2014.

Keywords: System identification; State estimation; Parameter estimation; Optimization; Nonlinear systems; Kalman filtering; Moving horizon estimation; Model predictive control
Abstract

An established method for grey-box identification is maximum-likelihood estimation, implemented in the nonlinear case via extended Kalman filtering. In applications of (nonlinear) model predictive control, an increasingly common approach to state estimation is moving horizon estimation, which applies (nonlinear) optimization directly to a model over a whole batch of data. This paper shows that, in the linear case, horizon estimation may also be used for joint parameter and state estimation, as long as a bias correction based on the Kalman filter is included. For the nonlinear case, two special cases are presented where the bias correction can be determined without approximation. A procedure for approximating the bias correction for general nonlinear systems is also outlined.
☆ This paper is an extension of a keynote paper, [13], presented at DYCOPS 2013 by the first author, which in turn refers to a conference paper, [12], presented at SSS'09 by the first four authors.
∗ Corresponding author at: ABB AB, Corporate Research, SE-721 78 Västerås, Sweden. Tel.: +46 705323236. E-mail address: [email protected] (A.J. Isaksson).

1. Introduction

This paper ultimately deals with parameter estimation in nonlinear models. What triggers our interest is nonlinear model predictive control [9]. While linear model predictive control has long been an established industrial area, nonlinear MPC has only found industrial applications more recently, see for example [8,22,23]. Partly, the limited applicability is related to modelling and state estimation challenges. Since in MPC an optimization framework is already in place, it is natural to deploy it for the state estimation as well. This results in so-called moving horizon estimation (MHE). This paper focuses on some aspects that are important when MHE of states is combined with estimation of model parameters. These parameters could be part of linear black-box model parameterizations, but could also be selected unknown parameters of physical significance in a linear or nonlinear model.
Estimation of such unknown parameters is known as grey-box identification, see e.g. [3] or [18]. Nonlinear grey-box identification is typically approached in an off-line setting by maximum likelihood techniques. If process noise is present in the model, this requires the difficult nonlinear prediction problem to be handled. Typical approaches are approximate solutions based on the extended Kalman filter, as detailed in [3], or on particle filtering, e.g. [28]. We study here a different approach, namely to extend the common moving horizon state estimation solution [1,6,7,14,19,20,24,25] with parameter estimation. Such an approach has also been discussed in, for instance, [17,30]. This tempting approach has some pitfalls, however, since biased parameter estimates may result if proper attention is not paid to the criterion formulation. We explain the root of this effect and show how it can be handled in the linear model case. That also suggests a way to treat the problem in the general nonlinear case. We illustrate the approach with an application to a nonlinear drum boiler model.

2. Model predictive control

As the name model predictive control indicates, a crucial element of an MPC application is the model on which the control is based.
Therefore, before a controller can be implemented a model has to be established. There are two main alternatives available for obtaining the model.
• Deriving a model from first principles using laws of physics, chemistry, etc.; so-called white-box modelling.
• Estimating an empirical model from experimental data; black-box modelling.

In general, a white-box model becomes a DAE

0 = f(ẋ(t), x(t), u(t))
y(t) = h(x(t), u(t))

where y(t) denotes the measured process variables and u(t) the manipulated variable, i.e. the output of the MPC. Finally, the internal variable x(t) is what is usually referred to as the state of the system. A black-box model, on the other hand, is typically linear, but most often also discrete in time:

x_{k+1} = A x_k + B u_k
y_k = C x_k

Here the integer k denotes the kth time index at which the signal value is available, i.e. time kTs, where Ts is the sampling interval. Hence we have, for example, x_k = x(kTs).

The core of MPC is optimization. In each iteration of the control, i.e. every time a new measurement is collected, two optimization problems have to be solved (both using the model as an equality constraint): one using past data to estimate the current state vector x(t), and one to optimize the future control variables. When solving the forward optimization problem a number of future values of the manipulated variables are calculated. However, only the values at the first time instant are transmitted to the underlying process. At the next time instant the optimizations are repeated, with the optimization windows shifted one time step. This is known as receding horizon control, and is in fact what makes this a feedback control method. Performing the optimization just once would correspond to open-loop control. The emphasis of this paper is on the state estimation step, which is presented in some more detail in the next subsection.

2.1. State estimation

For the state estimation the target of the optimization is to obtain the best estimate of the internal variable x using knowledge of y and u, to be used as the starting point for the forward optimization. This can be done using a Kalman filter (for an old classic see [2]) – or, if the model is nonlinear, an extended Kalman filter, see [15] – where stochastic modelling of the process and measurement noises is applied. A Kalman filter is a recursive method, meaning that it takes only the most recent values of y_k and u_k to update the previous estimate x̂_{k−1} to obtain the new x̂_k. Hence, it does not actually solve an optimization problem on-line. Kalman filtering is done in a statistical framework by adding process noise and measurement noise to the discrete-time state space system given in the previous section:

x_{k+1} = A x_k + B u_k + w_k   (1a)
y_k = C x_k + v_k   (1b)

where w_k and v_k are white Gaussian noises with covariance matrices Q and R, respectively. Since we want to use certain quantities in the calculations later, the complete set of Kalman filter equations is given below:

S_k = C P_{k|k−1} C^T + R   (2a)
K_k = P_{k|k−1} C^T S_k^{−1}   (2b)
x̂_{k|k} = x̂_{k|k−1} + K_k (y_k − C x̂_{k|k−1})   (2c)
P_{k|k} = (I − K_k C) P_{k|k−1}   (2d)
x̂_{k+1|k} = A x̂_{k|k} + B u_k   (2e)
P_{k+1|k} = A P_{k|k} A^T + Q   (2f)

With access to more computational power, a much newer and increasingly popular approach is to use so-called moving horizon estimation (MHE). In MHE the introduced process and measurement noises are used as slack variables in an optimization formulation. If the model is nonlinear, these slack variables are usually introduced in a discretized version of the model:

x_{k+1} = g(x_k, u_k) + w_k
y_k = h(x_k, u_k) + v_k

Moving horizon estimation then corresponds to minimizing

V = (x_{k−M+1} − x̂_{k−M+1})^T P^{−1} (x_{k−M+1} − x̂_{k−M+1}) + Σ_{n=k−M+1}^{k} (w_n^T Q^{−1} w_n + v_n^T R^{−1} v_n)   (3)

[Fig. 1. Illustration of moving horizon estimation. Top: simulated state x and its estimate x̂. Bottom: measured output y and its estimate ŷ.]
with respect to all states x_n within the horizon, possibly subject to constraints such as x_min ≤ x_n ≤ x_max. Here P, Q and R are weight matrices used for tuning the estimator; they have a similar interpretation and importance as the estimate and noise covariance matrices in Kalman filtering. In the minimization, w_n and v_n are replaced by expressions in x_n, x_{n+1}, y_n and u_n according to the model equations above.

As indicated by its name, the optimization in moving horizon estimation is typically done over a horizon of data [t − (M − 1)Ts, t], where t is the current measurement time. Since this time interval is in the past, we assume access to historic values of the applied manipulated variables u_k and the measured process variables y_k. The first penalty term in the criterion, weighted by P^{−1}, is called the arrival cost. Its purpose is to create a link from one optimization window to the next; x̂_{k−M+1} denotes the estimate of the state at this particular time instant from the optimization run at the previous cycle. Fig. 1 illustrates the MHE optimization, whose criterion is a weighted sum of the vertical bars in the upper and lower plots.

MHE for estimating the states is an extensively studied field, see [26] for a thorough overview.
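To make the criterion concrete, the following is a minimal sketch (ours, not the authors' implementation) of horizon estimation for the linear model (1), posed as a weighted nonlinear least-squares problem in the horizon states; the matrices, data and horizon length are placeholder assumptions.

import numpy as np
from scipy.optimize import least_squares

# Placeholder model (1) and weights; all values are assumptions for illustration.
A, B, C = np.array([[0.9]]), np.array([[0.1]]), np.array([[1.0]])
Q, R, P = np.array([[0.1]]), np.array([[0.5]]), np.array([[1.0]])
M = 20                                    # horizon length
rng = np.random.default_rng(0)
u = rng.standard_normal((M, 1))           # assumed recorded inputs over the window
y = rng.standard_normal((M, 1))           # assumed recorded outputs over the window
x_arr = np.zeros(1)                       # arrival-cost anchor, i.e. x-hat_{k-M+1}

# Square-root weights W such that ||W e||^2 = e^T inv(S) e for S in {P, Q, R}
sqP, sqQ, sqR = (np.linalg.cholesky(np.linalg.inv(S)).T for S in (P, Q, R))

def residuals(xflat):
    x = xflat.reshape(M, 1)               # decision variables: states in the window
    r = [sqP @ (x[0] - x_arr)]            # arrival-cost term of (3)
    for n in range(M - 1):
        r.append(sqQ @ (x[n + 1] - A @ x[n] - B @ u[n]))   # process slack w_n
    for n in range(M):
        r.append(sqR @ (y[n] - C @ x[n]))                  # measurement slack v_n
    return np.concatenate(r)

sol = least_squares(residuals, np.zeros(M))   # minimizes the criterion (3)
x_hat = sol.x.reshape(M, 1)                   # estimated state trajectory

Bounds like x_min ≤ x_n ≤ x_max can be imposed via the bounds argument of least_squares, which is exactly where MHE departs from the Kalman filter.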
Much of the literature is related to stability of the estimate, arrival cost approximation and how to implement the estimation numerically, see [1,6,7,20,24,25]. In this paper, the focus is different. The main goal is not only to estimate the states but also to simultaneously estimate parameters in the model. The simultaneous estimation introduces additional aspects that need to be considered.

Notice that in the optimization problems described above, the model (nonlinear or linear) should be considered as an equality constraint. We will not go into how these optimization problems are solved. Let us just point out that depending on the objective function and constraints (most importantly the model), different types of optimization problems result, requiring different types of optimization solvers. For example, a quadratic objective together with linear constraints corresponds to a quadratic programming (QP) problem, whereas a nonlinear objective or nonlinear model yields a nonlinear programming (NLP) problem, which of course is much harder to solve. The latter case, when combined with the forward optimization, is usually referred to as a nonlinear MPC (NMPC) problem.

2.2. Kalman filter vs MHE

Moving horizon estimation is identical to Kalman filtering if (see [24])

• the model is linear and discrete, i.e. given by matrices (A, B, C),
• the horizon M = 1,
• the arrival cost is given by the Kalman filter Riccati equation as P_{k−M+1|k−M},
• the noises are zero-mean Gaussian with E[v_k v_k^T] = R and E[w_k w_k^T] = Q, and
• there are no active constraints.

When M > 1 the MHE estimates are identical to fixed-interval smoothing estimates (see e.g. [2]) if the arrival cost is given by the corresponding forward–backward Riccati equation as P_{k−M+1|k−1}. The big difference between the two methods is that the MHE optimization is easily extended with constraints. For nonlinear systems this exact equivalence of course no longer holds, which is why extended Kalman filtering and MHE will produce different results. Again, constraint handling as well as the fact that the model only needs to be fulfilled at convergence (particularly useful for unstable processes) are advantages for MHE, while the heavier computational burden is a disadvantage.

3. Parameter estimation

Even if physical modelling is used there are often parameters whose numerical values are unknown and therefore need to be estimated from data. Furthermore, it is typically not easy to know at what level of detail to perform the physical modelling. A systematic procedure to gradually build up nonlinear models using collected process data is usually referred to as grey-box identification. One such procedure is presented in the book by Bohlin [3], which is accompanied by a Matlab-based software called MoCaVa. The software was also presented in [4], and contains modules for

• Preparation of data.
• Simulation – to solve the (nonlinear) differential equations for fixed parameters and known inputs.
• Calibration – to start with a root model and extend the model gradually.
• Validation – to test the model for a specific purpose.

The part of the procedure of particular interest in this paper is the Calibration, which includes the steps:
• Start with the root model.
• For each extension, make a fit to data.
• Compute the statistical risk of rejecting the previous model.

The goal of the calibration can be formulated as: "Find the simplest model that cannot be falsified using the available data". For the parameter fit, MoCaVa uses a maximum-likelihood criterion based on the extended Kalman filter (EKF):

min V_ML = Σ_{k=1}^{N} ((y_k − ŷ_k)^T S_k^{−1} (y_k − ŷ_k) + log det(S_k))   (4)

where ŷ_k = C x̂_{k|k−1} are the predictions of the EKF and S_k is given by [the EKF counterpart of] (2a). Another similar approach, also based on a linearization of the nonlinear state-space model but with improved numerical properties, is presented in [16].

For the computation of the statistical risk, the fact that V_ML is (asymptotically) chi-squared distributed is utilized. Hence, if n new parameters are released and fitted, the cost function difference (improvement)

ΔV = V_old − V_new   (5)

is in fact chi-squared distributed with n degrees of freedom, χ²_n (see [3]), and the risk of accepting the new model is calculated using the corresponding survival function

r = 1 − F_{χ²_n}(ΔV).   (6)

Here F_{χ²_n}(x) denotes the cumulative distribution function of the χ²_n distribution.

Now instead, having an MPC implementation based on moving horizon estimation, a valid question is whether one cannot use the same optimization code for the parameter estimation. Because the identification is made on a collected set of input and output data, an off-line identification would rather correspond to horizon estimation, since no moving window is applied. This has been suggested in [17,30,31]. Notice that to use the chi-square test, the cost function has to correspond to an ML criterion. Hence, we need to investigate the relation between horizon estimation and maximum likelihood. The answer is that an ML criterion may be formed using horizon estimation, but in addition to the criterion (3) it will contain a bias correction term which (somewhat surprisingly) is exactly the same one as in (4), i.e. the ML criterion becomes

min_{x_k, θ} V_ML = Σ_{k=1}^{N} (v_k^T R^{−1} v_k + w_k^T Q^{−1} w_k + log det(S_k))   (7)
where R and Q have to be the actual covariance matrices and cannot be treated as weight matrices to be chosen by the user. See Appendix A for a detailed derivation of this expression in the linear case. If Q and R are unknown, they need to be estimated as part of the calibration procedure described above, i.e. the elements of Q and R are treated as estimated parameters (see (9)).

A common approach is to not include any process noise w_k when performing the parameter estimation (see e.g. [30,31]), and only use the first term in (7) with a fixed weighting matrix (often the identity matrix). This is, in the system identification literature (see e.g. [18]), known as output error (OE) and will also lead to unbiased estimates. Notice, however, that if we want to include the measurement covariance R as a free estimation parameter, then the bias correction term is indeed necessary; otherwise there is nothing preventing the covariance estimate from tending to infinity.

There is thus a connection between the ML criterion (4) (or (7)) and the OE criterion in that y_k − ŷ_k in (4) is replaced by v_k = y_k − C x_k (with x_k computed from the noise-free (w_k = 0) state equation (1a)) and the log det term is omitted (S_k is independent of the model parameters when w_k = 0). This means that the predictions ŷ_k are typically not optimal, which leads to higher variance of the estimates than in the true ML case, see Section 9.3 in [18]. However, if the system works in open loop, the OE estimates are still unbiased (Section 8.3 in [18]). These observations are confirmed by the simulations in the next section.

[Fig. 2. Example of one signal realization in the Monte-Carlo simulation with y_k (blue) and u_k (red).]
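Before turning to the example, note that the risk computation (5)–(6) used throughout the calibration is a one-liner; a minimal sketch using SciPy's χ² survival function (the cost values are made-up placeholders, not taken from the paper's experiments):

from scipy.stats import chi2

V_old, V_new = 9144.0, 8839.0  # placeholder costs before/after releasing parameters
n = 1                          # number of newly released parameters
dV = V_old - V_new             # improvement (5), asymptotically chi-squared, n dof
risk = chi2.sf(dV, df=n)       # survival function 1 - F(dV), i.e. the risk (6)
print(f"risk = {risk:.3g}")    # a small risk means the extension is safe to accept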
4. A simple linear example

In this section we illustrate the theoretical result of the previous section by simulating a simple first-order (linear) process for many different noise realizations. The chosen example is given by

x_{k+1} = a x_k + b u_k + w_k
y_k = x_k + v_k

where both w_k and v_k are zero-mean Gaussian noises with unit variance. The particular example a = 0.7 and b = 0.3 was simulated for 100 different noise realizations (one such simulation is shown in Fig. 2) and the optimal parameter estimates were found using three different criteria:

HE Horizon estimation, which is also the same as the maximum a posteriori estimate using a flat prior, i.e.

min_{x_k, a, b} V_HE = Σ_{k=1}^{N} (w_k² + v_k²)
subject to
x_{k+1} = a x_k + b u_k + w_k
y_k = x_k + v_k

ML Horizon estimation with the bias correction:

min_{x_k, a, b} V_ML = Σ_{k=1}^{N} (w_k² + v_k² + log(s_k))
subject to
x_{k+1} = a x_k + b u_k + w_k
y_k = x_k + v_k
p_{k+1} = a² p_k/(p_k + 1) + 1
s_k = p_k + 1

OE Output error, i.e. estimating the parameters from a pure simulation model:

min_{a, b} V_OE = Σ_{k=1}^{N} (y_k − ŷ_k)²
subject to
ŷ_{k+1} = a ŷ_k + b u_k

[Fig. 3. Monte-Carlo simulation comparing the three different identification methods for the first-order system.]

The parameter estimates are shown in Fig. 3, where each cross corresponds to one estimate for one realization.
Table 1. Mean and standard deviation for the parameter estimates in the Monte-Carlo simulation example.

Method   mean(â)   mean(b̂)   std(â)   std(b̂)
HE       0.79      0.23       0.037    0.043
ML       0.69      0.31       0.048    0.052
OE       0.69      0.31       0.11     0.096
Table 2. Summary of the drum boiler model constants and their values.

Constant   Value
K1         0.3
K2         14
K3         3.3
A1         0.064
A2         15
A3         0.32
Fig. 4. The input signals u1 (left) and u2 (right) used for the identification of the drum boiler parameters.
From the figure it is clear that pure horizon estimation leads to significant bias in the parameter estimates, while both maximum likelihood and output error are unbiased. However, as expected, ML clearly has much lower variance of the parameter estimates. The mean and standard deviation for each case are given in Table 1, which shows that the bias of HE is statistically significant.
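The experiment is easy to re-create in a few lines; the sketch below is a simplified re-creation under our own assumptions (not the authors' code; in particular, the Riccati recursion is initialized at p = 0). It optimizes jointly over (a, b) and the horizon states, with and without the log s_k bias correction.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, a_true, b_true = 200, 0.7, 0.3
u = rng.standard_normal(N)
x = np.zeros(N + 1)
for k in range(N):                       # simulate x_{k+1} = a x_k + b u_k + w_k
    x[k + 1] = a_true * x[k] + b_true * u[k] + rng.standard_normal()
y = x[:N] + rng.standard_normal(N)       # y_k = x_k + v_k

def cost(z, bias_correction):
    a, b, xs = z[0], z[1], z[2:]         # joint decision variables (a, b, states)
    w = xs[1:] - a * xs[:-1] - b * u[:-1]
    v = y - xs
    V = np.sum(w**2) + np.sum(v**2)      # the HE criterion
    if bias_correction:                  # add sum_k log s_k via the scalar Riccati
        p, logdet = 0.0, 0.0             # assumption: p initialized at zero
        for _ in range(N):
            s = p + 1.0
            logdet += np.log(s)
            p = a**2 * p / s + 1.0
        V += logdet
    return V

z0 = np.concatenate(([0.5, 0.5], y))     # initialize the states at the measurements
he = minimize(cost, z0, args=(False,), method="L-BFGS-B").x[:2]
ml = minimize(cost, z0, args=(True,), method="L-BFGS-B").x[:2]
print("HE estimate:", he, " ML estimate:", ml)

Repeated over many realizations, this should reproduce the pattern of Table 1: the HE estimate of a drifts away from the true value, while the bias-corrected criterion stays centred on it.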
5. Suggested procedure for nonlinear models

While the derivation and simulations above are for linear systems, the end goal is of course to use this new optimization approach for identification of nonlinear models.
Fig. 5. Results after simulation using nominal parameter values. Measured values (blue) and simulated (red).
[Fig. 6. Tree illustrating the calibration procedure, Step 0 at the top and Step 6 at the bottom; each node lists the released parameters, the cost, the cost difference, and the risk.]
For the nonlinear case without process noise, the maximum-likelihood criterion still has an exact and very straightforward derivation, since the process model is then deterministic and S_k = R. Hence the ML solution is obtained by solving the optimization problem (e.g. using IPOPT [29]):

min_{x_k, v_k, R, θ} V_ML = Σ_{k=1}^{N} (v_k^T R^{−1} v_k + log det(R))
subject to
x_{k+1} = g(x_k, u_k, θ)
y_k = h(x_k, u_k, θ) + v_k
x_min ≤ x_k ≤ x_max
θ_min ≤ θ ≤ θ_max
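Since S_k = R here, R has the closed-form minimizer R̂(θ) = (1/N) Σ_k v_k v_k^T for any fixed θ, and substituting it back reduces the criterion to N log det R̂(θ) plus a constant. The sketch below exploits this concentrated form; the two-output model stub g, h is a hypothetical assumption of ours, not the paper's model.

import numpy as np
from scipy.optimize import minimize

def g(x, u, th):       # hypothetical deterministic state recursion, w_k = 0
    return np.array([th[0] * x[0] + u, th[1] * x[1] + x[0]])

def h(x, u, th):       # hypothetical two measured outputs
    return np.array([x[0], x[0] + th[2] * x[1]])

def neg_log_lik(th, u, y):
    N = len(u)
    x, V = np.zeros(2), np.zeros_like(y)
    for k in range(N):
        V[k] = y[k] - h(x, u[k], th)     # residual v_k along the simulated states
        x = g(x, u[k], th)
    R_hat = V.T @ V / N                  # closed-form ML estimate of R
    # The v_k^T R_hat^{-1} v_k terms sum to a constant, so only this term remains:
    return N * np.log(np.linalg.det(R_hat))

rng = np.random.default_rng(2)
N, th_true = 500, np.array([0.8, 0.5, 1.5])
u, x, y = rng.standard_normal(N), np.zeros(2), np.zeros((N, 2))
for k in range(N):                       # generate data from the stub model
    y[k] = h(x, u[k], th_true) + 0.1 * rng.standard_normal(2)
    x = g(x, u[k], th_true)

th_hat = minimize(neg_log_lik, np.array([0.5, 0.3, 1.0]), args=(u, y),
                  method="Nelder-Mead").x
print("estimated parameters:", th_hat)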
Table 3. Parameters (either given or estimated) after the different steps, estimating more and more parameters. The last row gives the corresponding values from the MoCaVa manual (shown in red in the Step 6 row of the original).

Step     x1,0, x2,0      r1, r2           x3,0, x4,0       q1, q2          A4       TR      TD
Step 0   147.95, 30.28   1, 1             –                –               0.5      30      250
Step 1   138.70, 31.89   157.18, 191.21   –                –               0.5      30      250
Step 2   175.58, 27.75   50.87, 25.00     −15.50, −19.65   –               0.5      30      250
Step 3   43.38, 21.06    1.01, 0.86       66.31, 104.1     3.01, 0.21      0.5      30      250
Step 4   168.74, 33.52   0.76, 0.97       −22.05, −20.32   0.64, 0.042     0.227    30      250
Step 5   156.30, 30.36   0.99, 0.97       −9.34, −8.08     0.095, 0.039    0.213    16.3    250
Step 6   158.14, 30.70   0.983, 0.982     −10.94, −9.90    0.108, 0.0262   0.2124   16.43   292.8
MoCaVa   148.17, 28.39   0.962, 0.973     –                0.104, 0.0311   0.261    16.49   285.3
Fig. 7. Results after identification when x1,0, x2,0, x3,0, x4,0, R1, and R2 are set free but the other parameters are fixed. Measured values (blue) and estimated (red).
Notice that this is still an important case, since if there are multiple measured outputs y_k, one wants to estimate the covariance matrix R in order to get efficient parameter estimates and to be able to use the χ²-test during the calibration procedure.

Another special case which may be solved without approximation is when the process noise only enters through linear state equations:

x^{nl}_{k+1} = g(x^{nl}_k, u_k)   (8a)
x^{l}_{k+1} = A_k x^{l}_k + w_k   (8b)
y_k = h(x^{nl}_k, u_k) + C_k x^{l}_k + v_k   (8c)

Then the exact ML criterion is still

min_{x_k, w_k, v_k, Q, R, θ} V_ML = Σ_{k=1}^{N} (w_k^T Q^{−1} w_k + v_k^T R^{−1} v_k + log det(S_k))   (9)

where S_k should be calculated using the Riccati equation for the linear system {A_k, C_k}. An example of this special case is illustrated in the next section.

For the case with nonlinearly entering process noise there is clearly no exact maximum likelihood formulation. A possible approach, which we have not tested yet, would be to linearize the model in each iteration of the optimization (similar to extended Kalman filtering), but only for the computation of the bias correction term. The rest of the optimization should deploy the full nonlinear model (although discretized). A full procedure could look like:

1. Model the process, for example, in Modelica.
2. Discretize symbolically and export the equations.
3. Linearize the model symbolically.
4. Import and prepare data.
5. Carefully introduce noise variables at equations motivated by physical insight.
6. Solve by nonlinear programming (e.g. using IPOPT):

min_{x_k, w_k, v_k, Q, R, θ} V_ML = Σ_{k=1}^{N} (w_k^T Q^{−1} w_k + v_k^T R^{−1} v_k + log det(S_k))
subject to
x_{k+1} = g(x_k, u_k, θ) + w_k
y_k = h(x_k, u_k, θ) + v_k
x_min ≤ x_k ≤ x_max
θ_min ≤ θ ≤ θ_max

7. For every evaluation of V_ML, calculate the (time-varying) linearized system along the trajectory, to be used in the Riccati equation to compute S_k.
8. Test which parameters to make free (including noise parameters) by hypothesis testing using the χ² risk calculation.
9. Repeat 5–8 until no further improvement.

6. Drum boiler example

To illustrate and test the grey-box identification of a nonlinear model using horizon estimation we will use the drum boiler example also used in the user's manual for MoCaVa (distributed with the book [3]). The system has two states and two measured process values:

ẋ_1 = K1 (A1 u_2 − A2 u_1 x_1)/TD
ẋ_2 = K2 (A2 u_1 x_1 − A3 x_2)/TR
y_1 = K3 (A4 A2 u_1 x_1 + (1 − A4) A3 x_2) + v_1
y_2 = x_1 + v_2

where y_2 is the total boiler power, y_1 is the drum pressure, u_1 the control valve position and u_2 the fuel flow. For the optimization, the state (differential) equations are discretized using an implicit Euler approximation, yielding

x_{1,k+1} = x_{1,k} + Ts K1 (A1 u_{2,k+1} − A2 u_{1,k+1} x_{1,k+1})/TD
x_{2,k+1} = x_{2,k} + Ts K2 (A2 u_{1,k+1} x_{1,k+1} − A3 x_{2,k+1})/TR

where Ts denotes the sampling interval.
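Both discretized equations are linear in the new state, so the implicit Euler step can be solved in closed form. A small simulation sketch using the constants of Table 2 and the nominal Step 0 parameter values of Table 3; the inputs and initial states in the smoke test are placeholders of ours, not dboil.dat:

import numpy as np

K1, K2, K3 = 0.3, 14.0, 3.3              # constants from Table 2
A1, A2, A3 = 0.064, 15.0, 0.32
A4, TR, TD = 0.5, 30.0, 250.0            # nominal parameters (Table 3, Step 0)
Ts = 1.0                                 # sampling interval of the data

def step(x1, x2, u1, u2):
    """One implicit Euler step; inputs are taken at the new time instant k+1,
    and each equation is solved for the new state in closed form."""
    x1n = (x1 + Ts * K1 * A1 * u2 / TD) / (1.0 + Ts * K1 * A2 * u1 / TD)
    x2n = (x2 + Ts * K2 * A2 * u1 * x1n / TR) / (1.0 + Ts * K2 * A3 / TR)
    return x1n, x2n

def measure(x1, x2, u1):
    """Noise-free outputs of the two measurement equations."""
    return K3 * (A4 * A2 * u1 * x1 + (1.0 - A4) * A3 * x2), x1

x1, x2 = 148.0, 30.0                     # initial states of the order of Table 3
for _ in range(50):                      # constant-input smoke test
    x1, x2 = step(x1, x2, u1=0.5, u2=1.0)
print(measure(x1, x2, u1=0.5))

In the horizon-estimation formulation these same discretized equations enter as equality constraints of the NLP rather than being solved sequentially as above.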
According to [3] the model has three parameters which need to be estimated – A4, TD and TR – and a number of constants whose values are given in Table 2. The parameters are constrained as 0 ≤ A4 ≤ 1 and TD, TR > 0.

The calibration procedure was run using the identification data dboil.dat provided with MoCaVa. The sampling interval of the data is Ts = 1 s. The first step is a simulation where no parameters were fitted and the initial states were back-calculated from the first measurement sample (see Fig. 4 for the drum boiler input signals and Fig. 5 for the simulated outputs and the corresponding residuals). The full calibration tree illustrating the selections made in the various stages of the hypothesis testing is depicted in Fig. 6, and the corresponding parameter estimates are given in Table 3. After first releasing the initial states and the measurement noise covariances, at Step 2 of the calibration tree constant measurement offsets, x_3 and x_4, are introduced as

y_1 = K3 (A4 A2 u_1 x_1 + (1 − A4) A3 x_2) + v_1 + x_3
y_2 = x_1 + v_2 + x_4
As can be seen in Fig. 7 the fit is quite good, but there is still an obvious drift in the residuals, which is why the offset is instead modelled by two extra states driven by random noise:
x_{3,k+1} = x_{3,k} + w_{1,k}
x_{4,k+1} = x_{4,k} + w_{2,k}

The noise is modelled as having a diagonal covariance

Q = diag(q_1, q_2)   (10)

This is exactly the type of situation referred to in the previous section, where the process noise enters via linear states that directly influence the measurements without going through the nonlinear dynamic equations. The correction term can be computed by adding two scalar Riccati equations

p_{1,k+1} = r_1 p_{1,k}/(p_{1,k} + r_1) + q_1
p_{2,k+1} = r_2 p_{2,k}/(p_{2,k} + r_2) + q_2
s_{1,k} = p_{1,k} + r_1
s_{2,k} = p_{2,k} + r_2

In the optimization these are just treated as equality constraints, exactly like the model. Hence for 1000 data points we get (including initial values of x and P) a total of ca. 10,000 equations. In the results presented we have used zero initial covariances, but have also tried non-zero P_0 as well as using the stationary Riccati equation (thus reducing the number of equations to ca. 6000). Similarly, Step 2 – with only constant offset parameters – corresponds to only ca. 4000 equations. More precisely, in our IPOPT implementation Step 2 has 3998 equality constraints and 4004 variables, while Step 6 has 9994 equations and 12,005 variables. On an Intel Core i5 they take 0.97 and 10.5 s, respectively, to execute.

The simulation results after the full calibration procedure, which leads to opening up and estimating also the three model parameters, are plotted in Fig. 8. Now the residuals look very white, and the final parameters obtained in Step 6 are, when compared to the ones from MoCaVa in Table 3, found to be very similar. The initial values of the additional states (x_{3,0} and x_{4,0}) are not presented in MoCaVa, so we cannot compare them. The resulting calibration tree is also very similar, although we chose to introduce the extra states for drift earlier, which turned out to give a more significant reduction of the objective function than releasing model parameters.

[Fig. 8. Results after Step 6, i.e. with all parameters free and estimated. Measured values (blue) and estimated (red).]

All in all, this test indicates that using horizon estimation for grey-box identification looks very promising. The only major concern is the computational time, since models with more states and parameters quite quickly result in very large optimization problems.

7. Conclusions

The process modelling is by far the most time-consuming part of an MPC project. When physical modelling is used, a procedure for gradual extension of the model has been designed by Bohlin [3], and is an example of grey-box identification. This procedure is built around maximum-likelihood identification. In its original form the ML criterion is evaluated using (extended) Kalman filters. In this paper we have advocated that if the MPC implementation uses moving horizon estimation for the state estimation, this optimization code could perhaps be used also for the grey-box identification. Therefore, a central question treated in this contribution is the following one: Given a parameterized nonlinear model

x_{k+1} = g(x_k, u_k, θ) + w_k
y_k = h(x_k, u_k, θ) + v_k
x_min ≤ x_k ≤ x_max
θ_min ≤ θ ≤ θ_max

can we reliably estimate the parameter θ with the "horizon-inspired" minimization problem

min_{x_k, θ} Σ_{k=1}^{N} [(x_{k+1} − g(x_k, u_k, θ))^T Q^{−1} (x_{k+1} − g(x_k, u_k, θ)) + (y_k − h(x_k, u_k, θ))^T R^{−1} (y_k − h(x_k, u_k, θ))]?   (11)
We have provided the following answers to this question:

• If there is no process noise (w_k = 0, Q = 0) the answer is yes. The state estimation problem is then trivial and the x_k are computed by a (θ-dependent) simulation ("output error method"). If v has a Gaussian distribution, it will provide the maximum likelihood (ML) estimate. However, if R is unknown and has to be estimated, a penalty term log det(R) has to be added to the criterion.
• In case process noise is present, (11) will lead to biased estimates, even in the linear case (see Section 4).
• In the linear case with Gaussian disturbances we can obtain the ML estimate by adding the term log det(S_k) (from (2a)) to the terms in (11).
• In case process noise enters linearly in some states, and is absent in the nonlinear states (as in (8b)), adding that term for the linear states still gives the ML estimate (see Section 6).
• In the general nonlinear case it is reasonable to use (11) with an additive term log det(S_k) from the EKF version of (2a).

The drum boiler simulation example from [3] was used to illustrate and test this horizon estimation based approach. In this case the process noise is introduced in a linear fashion and the bias correction can be made without approximation.

This is research still in progress and several issues remain, such as

• How to deal with symmetry and sparsity of the P matrix in the general case, and the constraint that P should be positive semi-definite.
• To investigate whether using stationary Riccati equations is typically sufficient.
• The relationship between the discretization of the DAE and the bias correction.
• How to treat the case where the process noise enters nonlinearly.
• Creating a user-friendly software environment, similar to MoCaVa, with support for the calibration procedure.
For some of these topics we collaborate with tool vendors for the modelling language Modelica – Modelon and Wolfram MathCore – within the European ITEA 2 project MODRIO. A first output from this collaboration is the Master's thesis by Palmkvist [21].

Acknowledgements

The first author is very grateful for the invitation to deliver the Sunahara Keynote Lecture at the 2009 Symposium of Stochastic Systems (SSS'09) in Kobe, Japan, which triggered this whole research effort, and for the invitation to deliver a keynote at DYCOPS 2013, Mumbai, India, which gave the sufficient excitation to complete this paper. The authors also gratefully acknowledge financial support from the Swedish Foundation for Strategic Research (SSF) – as part of the Process Industry Centre Linköping (PIC-LI), the Swedish Agency for Innovation Systems (VINNOVA) through the ITEA 2 project MODRIO, the Linnaeus Center CADICS, funded by the Swedish Research Council, and the ERC advanced grant LEARN, no. 267381, funded by the European Research Council.

Appendix A. Derivation of parameter estimation using horizon estimation

Parameter estimation with the maximum likelihood (ML) method generically gives unbiased parameter estimates. In the current context we seek simultaneous estimation of the states and the parameters. Then the parameter estimate will normally not be unbiased. Intuitively, the reason is that the maximum of the joint pdf of parameters and states will occur for another value than the parameter value that maximizes the likelihood function, in which the states have been eliminated by marginalization. It is shown in Exercise 7G.6 in [18] that adding a log det term with the output variance to the joint estimation criterion will secure unbiased parameter estimates in the linear case. This is also discussed in Section 14.3.2 of [27]. We will review that additional term below, in terms of a parameter prior, which is also valid in the general, nonlinear case, which we study first.

The following densities are given as a model of the system: p(y|x, θ), p(x|θ), and p(θ). We will generally treat normal, Gaussian distributions, which we write

p(x) = N(x; m, P)

for a Gaussian random variable x that has mean m and covariance matrix P. The states, x, and parameters, θ, should be estimated given a measurement, y. This could be done either with maximum-likelihood (ML) estimation as

θ̂ = arg max_θ p(y|θ) = arg max_θ ∫ p(y|x, θ) p(x|θ) dx

or as a joint state/parameter maximum a posteriori (MAP) estimation

θ̂ = arg max_{x,θ} p(x, θ|y) = arg max_{x,θ} p(y|x, θ) p(x|θ) p(θ)/p(y).

To analyze the difference between the ML and MAP estimates, rewrite the MAP estimation density as

p(x, θ|y) ∝ p(y|x, θ) p(x|θ) p(θ) = p(y, x|θ) p(θ) = p(x|y, θ) p(y|θ) p(θ)   (12)

where p(y|θ) is the ML density, from which it can be seen that MAP adds a factor p(x|y, θ) p(θ) to the ML density. If the MAP solution is to equal ML, p(x̂_MAP|y, θ) p(θ) has to be independent of θ (with x̂_MAP = arg max_x p(x|y, θ)). Selecting a p(θ) that fulfils this requirement corresponds to a non-informative prior – a prior that will secure an unbiased estimate.

Linear Gaussian system

The result above is here derived for the special case of a linear Gaussian system, where the data are given over a time window. Consider a linear state space model of the following form:

x_{k+1} = A_k(θ) x_k + w_k,
y_k = C_k(θ) x_k + v_k,

where the system matrices A and C depend on the parameter θ, x is the system state, y is the system output, and w and v are zero-mean Gaussian process and measurement noises with covariances Q and R, respectively. This system can be described in batched form over a time window, directly derived from the linear state-space form. In addition to moving horizon estimation, the batch form is often used in fault detection/diagnosis, see [5,10]. Following the calculations in Problem 7G.6 in [18], stack N signal values to define signal vectors like Y = (y_1^T, ..., y_N^T)^T for all signals except the input noise vector, which is defined as W = (w_0^T, ..., w_{N−1}^T)^T. If the initial state is zero (the framework can easily be extended to nonzero initial states), the signals may be described as

X = F̄(θ) W,   (13a)
Y = H̄(θ) X + V,   (13b)

where

F̄(θ) =
⎡ I                    0        0      ···   0 ⎤
⎢ A_1                  I        0      ···   0 ⎥
⎢ A_2 A_1              A_2      I      ···   0 ⎥
⎢ ⋮                                    ⋱     ⋮ ⎥
⎣ ∏_{n=1}^{N−1} A_n    ···      A_{N−1}      I ⎦

and H̄(θ) = diag(C_1, ..., C_N). The noise covariances are given by Q̄ := cov(W) = diag(Q, ..., Q) and R̄ := cov(V) = diag(R, ..., R). To simplify notation, denote F̄ = F̄(θ) and H̄ = H̄(θ). Now, Eq. (13) can be described by the following probability densities:

p(X|θ) = N(X; 0, F̄ Q̄ F̄^T),
p(Y|X, θ) = N(Y; H̄X, R̄).

The likelihood density is computed as

p(Y|θ) = ∫ p(Y|X, θ) p(X|θ) dX = N(Y; 0, S̄),

where S̄ = H̄ F̄ Q̄ F̄^T H̄^T + R̄. The calculations are analogous to those made for the Kalman filter time update [11]. The distribution of the states in the window is computed as

p(X|Y, θ) = p(Y|X, θ) p(X|θ)/p(Y|θ) = N(X; KY, P̄)

where K = F̄ Q̄ F̄^T H̄^T S̄^{−1} and P̄ = ((F̄ Q̄ F̄^T)^{−1} + H̄^T R̄^{−1} H̄)^{−1}. For this result, the calculations are analogous to the Kalman filter measurement update. At its maximum X̂_MAP = KY, the exponent is zero and the probability density evaluates to

p(X̂_MAP|Y, θ) = 1/((2π)^{N/2} det^{1/2} P̄),

which clearly depends on θ since P̄ is a function of θ. As we saw in the general case, (12), in order to fulfil that p(X̂_MAP|Y, θ) p(θ) is independent of θ, the prior must be selected as p(θ) ∝ det^{1/2} P̄, which is the non-informative prior, or a correction term to make up for the lacking unbiasedness. The cost function to minimize for the MAP estimate is then given by

− log p(X, θ|Y) ∝ − log p(Y|θ) p(X|Y, θ) p(θ)
= (1/2) Y^T S̄^{−1} Y + (1/2)(X − KY)^T P̄^{−1} (X − KY) + (1/2) log det S̄ + (1/2) log det P̄ − (1/2) log det P̄,

where the prior cancels the covariance term of p(X|Y, θ). The MAP estimate of X is X̂_MAP = KY, which makes the X-dependent term zero at the optimum; the remaining terms are the same as for ML estimation. Another way of writing this cost function is as a function of the basic densities:

− log p(X, θ|Y) ∝ − log p(Y|X, θ) p(X|θ) p(θ)
= (1/2)(Y − H̄X)^T R̄^{−1} (Y − H̄X) + (1/2) X^T (F̄ Q̄ F̄^T)^{−1} X + (1/2) log det R̄ + (1/2) log det F̄ Q̄ F̄^T − (1/2) log det P̄   (14)

Notice that the first two terms after the equality sign in (14) correspond exactly to the moving horizon estimation criterion (3). For the three remaining terms in (14), observe that

log det R̄ + log det F̄ Q̄ F̄^T − log det P̄ = log det R̄ det(I + F̄ Q̄ F̄^T H̄^T R̄^{−1} H̄) = log det S̄

where the last equality follows from the determinant identity det(I + AB) = det(I + BA). We thus have to add this log det term to the joint criterion. [It is of interest to compare with Problem 7G.6 in [18], where it is shown that the same term should be added to the joint criterion to make it equivalent to the likelihood function.] To compute S̄ = E[YY^T] we apply the innovations form of the linear stochastic state space model (see e.g. [2]):

x̄_{k+1} = A_k(θ) x̄_k + K_k ε_k,
y_k = C_k(θ) x̄_k + ε_k,

where K_k is the Kalman filter gain given by (2b) and ε_k is the prediction error, which has covariance S_k given by (2a). Utilizing a batch form similar to (13) we get

log det S̄ = Σ_{k=1}^{N} log det(S_k),

which concludes the derivation of (7).
References

[1] A. Alessandri, M. Baglietto, G. Battistelli, Moving-horizon state estimation for nonlinear discrete-time systems: new stability results and approximation schemes, Automatica 44 (7) (2008) 1753–1765.
[2] B.D.O. Anderson, J.B. Moore, Optimal Filtering, Prentice-Hall, 1979.
[3] T. Bohlin, Practical Grey-box Process Identification – Theory and Applications, Springer, 2006.
[4] T. Bohlin, A.J. Isaksson, Grey-box model calibrator and validator, in: Proceedings IFAC SYSID, Rotterdam, The Netherlands, 2003.
[5] A.Y. Chow, A.S. Willsky, Analytical redundancy and the design of robust failure detection systems, IEEE Trans. Autom. Control 29 (7) (1984) 603–614.
[6] R.A. Delgado, G.C. Goodwin, A combined MAP and Bayesian scheme for finite data and/or moving horizon estimation, Automatica 50 (4) (2014) 1116–1121.
[7] M. Diehl, H.J. Ferreau, N. Haverbeke, Efficient numerical methods for nonlinear MPC and moving horizon estimation, Lect. Notes Control Inf. Sci. (2009) 391–417.
[8] R. Franke, L. Vogelbacher, Nonlinear model predictive control for cost optimal startup of steam power plants, at – Automatisierungstechnik 54 (12) (2006).
[9] L. Grüne, J. Pannek, Nonlinear Model Predictive Control, Springer, London, 2011.
[10] F. Gustafsson, Adaptive Filtering and Change Detection, 2nd ed., John Wiley & Sons, Ltd, 2001.
[11] Y.C. Ho, R.C.K. Lee, A Bayesian approach to problems in stochastic estimation and control, IEEE Trans. Autom. Control 9 (1964) 333–338.
[12] A.J. Isaksson, D. Törnqvist, J. Sjöberg, L. Ljung, Grey-box identification based on horizon estimation and nonlinear optimization, in: Proceedings of the Symposium on Stochastic Systems (SSS'09), Kobe, Japan, 2009.
[13] A.J. Isaksson, Some aspects of industrial identification, in: Proceedings of the International Symposium on Dynamics and Control of Process Systems (DYCOPS'13), Mumbai, India, 2013.
[14] S.S. Jang, B. Joseph, H. Mukai, Comparison of two approaches to on-line parameter and state estimation of nonlinear systems, Ind. Eng. Chem. Process Des. Dev. 25 (1986) 809–814.
[15] T. Kailath, A.H. Sayed, B. Hassibi, Linear Estimation, Prentice-Hall, 2000.
[16] N.R. Kristensen, H. Madsen, S.B. Jørgensen, Parameter estimation in stochastic grey-box models, Automatica 40 (2) (2004) 225–237.
[17] P. Kühl, M. Diehl, T. Kraus, J.P. Schlöder, H.G. Bock, A real-time algorithm for moving horizon state and parameter estimation, Comp. Chem. Eng. 35 (1) (2011) 71–83.
[18] L. Ljung, System Identification – Theory for the User, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1999.
[19] H. Michalska, D.Q. Mayne, Moving horizon observers and observer-based control, IEEE Trans. Autom. Control 40 (1995) 995–1006.
[20] K.R. Muske, J.B. Rawlings, Nonlinear Moving Horizon State Estimation, Methods of Model Based Process Control, Springer, Netherlands, 1995.
[21] E. Palmkvist, Implementation of Grey-Box Identification in JModelica.org, Master thesis, Department of Automatic Control, Lund University, 2014.
[22] J. Pettersson, L. Ledung, X. Zhang, Decision support for pulp mill operations based on large-scale on-line optimization, in: Preprints of Control Systems 2006, Tampere, 6–8 June, 2006.
[23] S.J. Qin, T.A. Badgwell, An overview of nonlinear model predictive control applications, in: Nonlinear Model Predictive Control, Birkhäuser, Basel, 2000, pp. 369–392.
[24] C.V. Rao, J.B. Rawlings, J.H. Lee, Constrained linear state estimation – a moving horizon approach, Automatica 37 (10) (2001) 1619–1628.
[25] C.V. Rao, J.B. Rawlings, D.Q. Mayne, Constrained state estimation for nonlinear discrete-time systems: stability and moving horizon approximations, IEEE Trans. Autom. Control 48 (2) (2003).
[26] J. Rawlings, B. Bakshi, Particle filtering and moving horizon estimation, Comp. Chem. Eng. 30 (2006) 1529–1541.
[27] F. Schweppe, Uncertain Dynamic Systems, Prentice-Hall, Upper Saddle River, NJ, 1973.
[28] T.B. Schön, A. Wills, B. Ninness, System identification of nonlinear state-space models, Automatica 47 (1) (2011) 39–49.
[29] A. Wächter, L.T. Biegler, On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming, Math. Program. 106 (1) (2006) 25–57.
[30] V.M. Zavala, C.D. Laird, L.T. Biegler, A fast computational framework for large-scale moving horizon estimation, in: Proceedings of 8th International Symposium on Dynamics and Control of Process Systems, 2007.
[31] V.M. Zavala, Computational Strategies for the Operation of Large-Scale Chemical Processes, PhD thesis, Carnegie Mellon University, 2008.