Computers chem. Engng Vol. 19, No. 10, pp. 1031-1039, 1995
Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved
0098-1354(94)00107-3    0098-1354/95 $9.50 + 0.00
Pergamon

DECOMPOSITION ALGORITHMS FOR ON-LINE ESTIMATION WITH NONLINEAR MODELS

J. S. ALBUQUERQUE and L. T. BIEGLER†

Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.

† To whom all correspondence should be addressed.

(Received 8 October 1993; final revision received 15 August 1994; received for publication 22 September 1994)

Abstract-Dynamic state and parameter estimation for nonlinear systems usually leads to large nonlinear problems when the governing differential constraints are discretized under a simultaneous solution strategy, as the optimality conditions for each data set are coupled to the conditions of other data sets. This NLP problem grows not only in the number of variables, but also in the number of degrees of freedom if an error in all variables method (EVM) approach is used. In this paper we discretize the set of ODEs using a one-step integration method, and use the SQP method to solve the resulting NLP. The optimality conditions for each data set at the QP subproblem level are decoupled using an affine transform, so that the first-order conditions in the state and input variables can be solved recursively and expressed as functions of the optimality conditions in the parameters, thus reducing the size of the problem and turning the effort of solving it linear with the number of data sets. As seen in our examples, this approach is therefore over two orders of magnitude faster than general purpose NLP solvers. This technique can also be viewed as an extension of a decomposition strategy that has been successfully applied to optimal control problems.
1. INTRODUCTION
Data reconciliation and parameter estimation in dynamic systems play an important role in plant operation because in many practical situations the process is continuously undergoing changes. Classical regression assumes that the control or input variables are error free, reducing the problem to one of parameter estimation. However, this assumption is not always valid. An example of this occurs in the estimation of phase equilibrium, where measurements of all phases are affected by experimental error. Thus there is a need to study the problem in which all measured variables are affected by random effects (EVM). This problem can be formulated as a nonlinear programming problem (NLP) in which an objective function which measures the size of the estimated residuals is minimized, subject to the process constraints, typically differential and algebraic equations. Although the NLP approach is computationally more intensive, it has been found to be more robust and reliable than the extended Kalman filter (EKF) (Jang et al., 1986). The greater the number of observations, however, the larger the NLP problem will be, and only small problems have been solved directly. Thus, much work needs to be done in order to reduce the computational load and improve robustness.

For steady state problems, Britt and Luecke (1973) exploited the special structure of the least-squares objective function in order to optimize it with a Lagrange method. In their approach, the constraints were successively linearized with respect to the parameters and the measured variables. This method yields both parameter estimates and reconciled data simultaneously. In the method of Reilly and Patino-Leal (1981) the constraints are linearized about the reconciled data and a new estimate of the parameters is computed. A variable correction step is nested within the parameter estimation procedure. The method of Valko and Vajda (1987) is essentially the same as Reilly and Patino-Leal's, except that the variable correction step and the parameter estimation procedure are separated, resulting in a more robust algorithm. These algorithms exploit the least-squares structure of the problem using linearized constraints. On many systems they should perform adequately. However, they may not be able to handle the severe nonlinearities found in many chemical engineering applications. Kim et al. (1990) used a two stage approach, as done by Valko and Vajda, except that the successive linearization solution for the variable correction step was replaced by an NLP problem for added robustness. Another method they proposed was an NLP formulation in which the data reconciliation formulation was nested in the parameter estimation formulation. Tjoa and Biegler (1992) used a reduced
Hessian approach to successive quadratic programming (SQP), with the decomposition applied separately for each data set. Here the computational effort of this algorithm is linear with the number of data sets. However, this method can only be used on steady-state problems, since there cannot be any coupling of the optimality conditions among data sets, except through the parameters.

The difference between dynamic and steady-state EVMs is the differential constraints present in the process model. There are two competing approaches to deal with differential constraints in the NLP problem. In the sequential approach, the ODE system is solved within the optimization cycle. This approach is time consuming and no inequalities on the state variables can be included. In the simultaneous approach, the differential equations are discretized and added to the problem as algebraic constraints. It is potentially more efficient because it is an infeasible path approach. The advantages of simultaneous optimization over sequential methods when applied to optimal control problems have been demonstrated (Biegler, 1984). However, the disadvantage of the simultaneous approach is that the EVM problem grows in size with the number of data sets and can become intractable quite easily, since the effort of solving the problem is polynomial with the number of data sets when general methods are used. A comparison between simultaneous, sequential and linearization based methods has been made for the model predictive control of a CSTR by Sistu et al. (1993). Without exploiting large-scale structure, they concluded that the simultaneous method is faster for smaller problems. However, if the number of inputs remains fixed, the sequential method takes a linear effort with the size of the prediction horizon, making it much faster for big problems. Some attempts have been made to take advantage of the problem structure and tailor the NLP solvers to the problem. Liebman et al. (1992) used orthogonal collocation on finite elements to discretize the ODEs and an SQP method for large, sparse NLPs, taking advantage of the problem sparsity. Ramamurthi et al. (1993) linearized the DAE system at each sample point, and found an analytical solution for the initial condition by taking advantage of the problem's least-squares structure. They then applied an efficient two-level strategy to deal with noisy inputs.

In this paper we will extend a technique used by Dunn and Bertsekas (1989) for optimal control problems, when the process is modelled by ODEs only. In a future paper the general DAE system will be treated. Also, this method allows for both cases where explicit and implicit integration methods are used to discretize the ODE system.
2. PROBLEM STATEMENT
The objective function is a sum of functions which depend only on the states and inputs at one time, and the constraints are the discretized ODEs, using a one-step explicit integration method. The problem can be stated as follows:

$$\min\; \phi_{N+1}(Z_{N+1}) + \sum_{i=1}^{N} \phi_i(Z_i, U_i)$$
$$\text{s.t.}\quad Z_{i+1} = f_i(Z_i, U_i, \theta),\quad i = 1, \ldots, N,\qquad Z_1 = a,$$
$$g(\theta) \le 0,\qquad h(\theta) = 0,$$
$$Z_i \in \mathbb{R}^n,\quad U_i \in \mathbb{R}^m,\quad \theta \in \mathbb{R}^p,\quad f_i: \mathbb{R}^{n+m+p} \rightarrow \mathbb{R}^n, \tag{P1}$$
where $Z_i$ are the state variables, $U_i$ are the input or control variables and $\theta$ are the unknown parameters. This problem has $N(n+m) + p$ variables and $Nm + p$ degrees of freedom. The most commonly used objective function is the residual sum of squares. However, more general objective functions, based on the maximum likelihood estimation method and on the distribution assumptions of the measurement errors, could also be used. For instance, Tjoa and Biegler (1991) used a bivariate objective function so that gross error detection could be accomplished simultaneously with the regression.
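For instance, with state and input measurements $Z_i^m$, $U_i^m$ and covariance-based weights $Q_Z$, $Q_U$ (our notation; the paper does not spell out a specific form), the residual-sum-of-squares stage objective would read:

$$\phi_i(Z_i, U_i) = \left(Z_i - Z_i^m\right)^T Q_Z^{-1}\left(Z_i - Z_i^m\right) + \left(U_i - U_i^m\right)^T Q_U^{-1}\left(U_i - U_i^m\right),$$

so that, in the EVM spirit, residuals appear in the inputs as well as in the states.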
Problem (P1) is not restricted to simple Euler steps, but applies to any one-step integration method. Consider the first-order ordinary differential system described by (1); if this system is stiff, then an implicit method should be used:

$$F(t, Z, Z', U, \theta) = 0. \tag{1}$$
The M-stage implicit Runge-Kutta method is described as follows (Brenan et al., 1989):
$$F\!\left(t_{n-1} + c_i h,\; Z_{n-1} + \sum_{j=1}^{M} a_{ij} K_j,\; \frac{K_i}{h},\; U_i^{(n)},\; \theta\right) = 0,\qquad i = 1, \ldots, M,$$
$$Z_n = Z_{n-1} + \sum_{i=1}^{M} b_i K_i,\qquad h = t_n - t_{n-1},\qquad U_i^{(n)} = U(t_{n-1} + c_i h), \tag{2}$$
where the $K$s are proportional to the stage derivatives and have to be solved implicitly. If the input variables are assumed to be piecewise constant over the time intervals $[t_{n-1}, t_n)$:

$$U(t) = U_{n-1},\qquad t \in [t_{n-1}, t_n), \tag{3}$$

then the implicit Runge-Kutta method can be written as the one-step map

$$Z_n = f_{n-1}(Z_{n-1}, U_{n-1}, \theta), \tag{4}$$

with $f_{n-1}$ defined implicitly by the stage equations in (2).
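As an illustration of such a one-step map (4), here is a minimal Python sketch (ours, not code from the paper). It assumes an explicit right-hand side $z' = f(z, u, \theta)$ rather than the fully implicit form (1), and uses backward Euler, the simplest one-stage implicit Runge-Kutta method:

```python
import numpy as np

def implicit_euler_step(f, z, u, theta, h, tol=1e-10, max_iter=50):
    """One-step map Z_n = f_{n-1}(Z_{n-1}, U_{n-1}, theta) for z' = f(z, u, theta),
    using backward Euler. The stage equation r(w) = w - z - h*f(w, u, theta) = 0
    is solved by a plain Newton iteration with a finite-difference Jacobian."""
    w = np.array(z, dtype=float)            # initial guess: previous state
    n = w.size
    eps = 1e-7
    for _ in range(max_iter):
        r = w - z - h * f(w, u, theta)      # residual of the stage equation
        if np.linalg.norm(r) < tol:
            break
        J = np.empty((n, n))                # FD Jacobian of r, column by column
        for j in range(n):
            dw = np.zeros(n)
            dw[j] = eps
            J[:, j] = ((w + dw) - z - h * f(w + dw, u, theta) - r) / eps
        w = w - np.linalg.solve(J, r)
    return w
```

Higher-order implicit Runge-Kutta methods replace the single stage equation with the $M$ coupled stage equations of (2); the essential point is that every such scheme reduces to an algebraic one-step map entering (P1) as a constraint.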
These constraints apply to all one-step discretization schemes and are very general. They have the same form as in problem (P1) and the same strategy can be used, both for explicit methods and for implicit Runge-Kutta. Also, other integrating schemes such as collocation methods can be brought into form (4). Finally, in the absence of parameters, problem (P1) is an unconstrained N-stage discrete-time optimal control problem with Bolza objectives. Algorithms for solving this problem are discussed in Dunn and Bertsekas (1989) and Wright (1989). The parameter vector can easily be augmented to accommodate unknown initial conditions. Consider the following problem with no constraints on the parameters for simplicity:
$$\min\; \phi_{N+1}(Z_{N+1}) + \sum_{i=1}^{N} \phi_i(Z_i, U_i)$$
$$\text{s.t.}\quad Z_{i+1} = f_i(Z_i, U_i, \theta),\quad i = 1, \ldots, N, \tag{P2}$$

which is identical to (P1) except that the initial conditions are not provided. Problem (P2) can be converted into (P1) by augmenting the parameter vector so as to include the initial conditions and by starting the horizon one step earlier:

$$\min\; \phi_{N+1}(Z_{N+1}) + \sum_{i=1}^{N} \phi_i(Z_i, U_i)$$
$$\text{s.t.}\quad Z_{i+1} = f_i(Z_i, U_i, \theta),\quad i = 1, \ldots, N,\qquad Z_1 = \theta_0 + \bar Z_0. \tag{P3}$$

The parameter vector in (P3) is $(\theta, \theta_0)$, where $\bar Z_0$ is a bias term for the initial conditions and $\theta_0 + \bar Z_0$ are the initial conditions. (P3) is identical to (P2) and has the same form as (P1). (P3) has $N(n+m) + n + p$ variables and $Nm + n + p$ degrees of freedom.

3. SOLUTION STRATEGY

Problem (P1) is solved by developing a decomposition strategy based on SQP, which is known to take fewer iterations than other NLP methods. However, a large QP subproblem may be very expensive to solve and we propose a strategy to solve the QP subproblem efficiently by taking advantage of the problem structure. The general framework for SQP is:

(1) Solve a QP approximation of the problem to obtain the descent directions.
(2) Perform a line search.
(3) Update the Hessian matrix of the Lagrange function.
(4) Return to (1) until convergence is achieved.

Prior to the decomposition step (1), we briefly summarize the remaining steps.

3.1. Line search

A line-search method developed by Cuthrell and Biegler (1985) is used. This line search uses an augmented Lagrange function:

$$\hat L = \sum_{i=1}^{N}\left\{\phi_i(Z_i, U_i) - \lambda_{i+1}^T\left[Z_{i+1} - f_i(Z_i, U_i, \theta)\right] + \frac{\gamma}{2}\left\|Z_{i+1} - f_i(Z_i, U_i, \theta)\right\|^2\right\} + \phi_{N+1}(Z_{N+1}). \tag{5}$$

This line search function possesses both global and local superlinear convergence properties when used with an SQP algorithm.

3.2. Hessian update

At a stationary point, first-order conditions require the gradient of the Lagrange function to be zero:

$$\nabla L = \nabla\phi_{N+1} + \sum_{i=1}^{N}\nabla L_i = \nabla\phi_{N+1} + \sum_{i=1}^{N}\left(\nabla\phi_i + \nabla C_i\,\lambda_{i+1}\right) + \nabla g\,\mu + \nabla h\,\nu = 0, \tag{6}$$

where $C_i$ are the constraints in (4). If the residuals are small, then for a least-squares objective function $\nabla\phi_i$ will be small (because it measures the size of the residuals) and so will the multipliers $\lambda_{i+1}$, $\mu$ and $\nu$, assuming linear independence of the gradients. In that case, the Hessian of the Lagrange function can be approximated by the Hessian of the objective function alone:

$$\nabla^2 L \approx \nabla^2\phi. \tag{7}$$

This will be a Gauss-Newton (GN) approximation to the analytical Hessian of the Lagrangian, and it will be at least positive semidefinite if the objective function is derived from the MLE method. However, this approximation can be a poor one if the residuals happen to be large at the solution. In this case, it is desirable to use an update method which does not depend on the size of the residuals, such as BFGS applied to the Hessian of data set $i$, in order to maintain a superlinear convergence rate:

$$H_i^{k+1} = H_i^k + \frac{y_i^k (y_i^k)^T}{(y_i^k)^T s_i^k} - \frac{H_i^k s_i^k (s_i^k)^T H_i^k}{(s_i^k)^T H_i^k s_i^k}, \tag{8}$$

where $s_i^k = x_i^{k+1} - x_i^k$ and $y_i^k = \nabla L_i(x_i^{k+1}, \lambda^{k+1}) - \nabla L_i(x_i^k, \lambda^{k+1})$. So far, two ways of approximating the Hessian based on the magnitude of the residuals are available. Unfortunately, the magnitude of the residuals is unknown before solving problem (P1), but the progress of the merit function can determine which approximation should be used. Here we use a
switching rule first derived by Fletcher and Xu (1987) for unconstrained problems and adapted to constrained problems by Tjoa and Biegler (1991). We define the following switching parameter:

$$\kappa = \frac{\hat L^k - \hat L^{k+1}}{\hat L^k}, \tag{9}$$

where $\hat L^k$ is the augmented Lagrange function (5) used in the line search. For zero-residual problems, $\kappa$ converges to 1, whereas for nonzero residuals, $\kappa$ converges to 0. Based on these limiting behaviors the following switching rule is used to decide which kind of update is to be used:

$$H_i = \begin{cases}\text{BFGS} & \text{if } \kappa \le \epsilon,\\ \text{GN} & \text{otherwise},\end{cases}\qquad i = 1, \ldots, N+1, \tag{10}$$

with $0 < \epsilon < 1$. From previous experience (Tjoa & Biegler, 1991), $\epsilon = 0.1$ is recommended. This approach is particularly useful in problems where the residuals are small and the analytical Hessian is positive semidefinite; in the latter case, the BFGS update would become ill-conditioned or singular. Note that other update formulas could be used. Several quasi-Newton methods such as PSB, Broyden and SR1 updates have been studied by Kelley et al. (1991) and applied to problems with a similar structure.
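A minimal sketch of this switching logic (ours, not the authors' code), assuming the merit function values are positive and decreasing:

```python
def choose_hessian_update(L_prev, L_new, eps=0.1):
    """Switching rule (10): pick the Hessian approximation for the next
    SQP iteration from the progress of the augmented Lagrange merit
    function (5). kappa is the switching parameter of equation (9)."""
    kappa = (L_prev - L_new) / L_prev
    return "BFGS" if kappa <= eps else "GN"   # eps = 0.1 per Tjoa & Biegler (1991)
```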
3.3. Solving the QP

The corresponding QP subproblem for problem (P1) is:

$$\min\; \sum_{i=1}^{N}\left[\left(\frac{\partial\phi_i}{\partial Z_i}\right)^T\Delta Z_i + \left(\frac{\partial\phi_i}{\partial U_i}\right)^T\Delta U_i + \frac{1}{2}\Delta Z_i^T B^i_{11}\Delta Z_i + \frac{1}{2}\Delta U_i^T B^i_{22}\Delta U_i + \frac{1}{2}\Delta\theta^T B^i_{33}\Delta\theta + \Delta Z_i^T B^i_{12}\Delta U_i + \Delta Z_i^T B^i_{13}\Delta\theta + \Delta U_i^T B^i_{23}\Delta\theta\right] + \left(\frac{\partial\phi_{N+1}}{\partial Z_{N+1}}\right)^T\Delta Z_{N+1} + \frac{1}{2}\Delta Z_{N+1}^T B^{N+1}_{11}\Delta Z_{N+1}$$

$$\text{s.t.}\quad \Delta Z_{i+1} = \left(\frac{\partial f_i}{\partial Z_i}\right)^T\Delta Z_i + \left(\frac{\partial f_i}{\partial U_i}\right)^T\Delta U_i + \left(\frac{\partial f_i}{\partial\theta}\right)^T\Delta\theta + f_i(Z_i, U_i, \theta) - Z_{i+1},\quad i = 1, \ldots, N,$$
$$\Delta Z_1 = 0,$$
$$g(\theta) + \nabla g(\theta)^T\Delta\theta \le 0,\qquad h(\theta) + \nabla h(\theta)^T\Delta\theta = 0,$$
$$\Delta Z_i \in \mathbb{R}^n,\quad \Delta U_i \in \mathbb{R}^m,\quad \Delta\theta \in \mathbb{R}^p, \tag{P4}$$

where the second-order information $B^i_{jk}$ is defined as follows:

$$B^i_{11} = \frac{\partial^2 L}{\partial Z_i^2},\quad B^i_{22} = \frac{\partial^2 L}{\partial U_i^2},\quad B^i_{33} = \frac{\partial^2 L}{\partial\theta^2},\quad B^i_{12} = \frac{\partial^2 L}{\partial Z_i\,\partial U_i},\quad B^i_{13} = \frac{\partial^2 L}{\partial Z_i\,\partial\theta},\quad B^i_{23} = \frac{\partial^2 L}{\partial U_i\,\partial\theta},$$
$$B^i_{11} \in \mathbb{R}^{n\times n},\quad B^i_{22} \in \mathbb{R}^{m\times m},\quad B^i_{33} \in \mathbb{R}^{p\times p},\quad B^i_{12} \in \mathbb{R}^{n\times m},\quad B^i_{13} \in \mathbb{R}^{n\times p},\quad B^i_{23} \in \mathbb{R}^{m\times p}. \tag{11}$$

Here $L$ is the Lagrange function, or more specifically:

$$L(Z, U, \theta, \lambda) = \phi_{N+1}(Z_{N+1}) + \sum_{i=1}^{N} L_i,\qquad L_i = \phi_i(Z_i, U_i) - \lambda_{i+1}^T\left[Z_{i+1} - f_i(Z_i, U_i, \theta)\right],\qquad \lambda_i \in \mathbb{R}^n, \tag{12}$$

and $A_i$, $B_i$ are the matrices:

$$A_i = \left(\frac{\partial f_i}{\partial Z_i}\right)^T,\qquad B_i = \left(\frac{\partial f_i}{\partial U_i}\right)^T,\qquad A_i \in \mathbb{R}^{n\times n},\quad B_i \in \mathbb{R}^{n\times m}. \tag{13}$$

This QP subproblem has $N(n+m) + p$ variables and $Nm + p$ degrees of freedom, and it will grow as the number of data sets $N$ becomes large, easily becoming too expensive to solve with a general QP solver for large $N$. We deal with this by taking advantage of the structure of the optimality conditions. A full derivation of our technique is given in the Appendix, but for now we will try to transmit the general idea: the optimality conditions for data set $i$ are coupled to the ones of data set $i+1$ through the multipliers and the state variables. On the other hand, there is a coupling on all data sets through the parameters. This suggests that the optimality conditions for the state variables, control variables and the multipliers can be solved recursively and expressed as functions of the parameters. If such is the case, then the QP problem (P4) can be transformed into a much smaller and equivalent QP only in the space of the parameters, and the required transformation would be linear with $N$ since it is recursive. In order to decouple, a strategy used by Dunn and Bertsekas (1989) for optimal control problems is adapted. The idea is to write the multipliers and the changes in the control variables as linear functions of the changes in the state variables:

$$\Delta U_i = N_i\,\Delta Z_i + \bar n_i\,\Delta\theta + \tilde n_i,\qquad \lambda_i = T_i\,\Delta Z_i + \bar t_i\,\Delta\theta + \tilde t_i. \tag{14}$$
The above relations, once inserted into the optimality conditions, will eliminate $\Delta Z_i$, $\Delta U_i$ and $\lambda_i$, but will allow for a backward recursive solution for $N_i$, $\bar n_i$, $\tilde n_i$, $T_i$, $\bar t_i$, $\tilde t_i$. Once these relating matrices and vectors are computed, the constraints on the states of the QP subproblem (P4) can be used to compute the state and control variables as functions of the parameters in a forward recursive cycle:

$$\Delta Z_i = \Delta\tilde Z_i + \Delta\hat Z_i\,\Delta\theta,\qquad \Delta\tilde Z_i \in \mathbb{R}^n,\quad \Delta\hat Z_i \in \mathbb{R}^{n\times p},$$
$$\Delta U_i = \Delta\tilde U_i + \Delta\hat U_i\,\Delta\theta,\qquad \Delta\tilde U_i \in \mathbb{R}^m,\quad \Delta\hat U_i \in \mathbb{R}^{m\times p}. \tag{15}$$

Defining $\Delta X_i = \Delta\tilde X_i + \Delta\hat X_i\,\Delta\theta$ with $\Delta\hat X_i \in \mathbb{R}^{(n+m+p)\times p}$, $\Delta\tilde X_i^T = [\Delta\tilde Z_i^T\;\; \Delta\tilde U_i^T\;\; 0]$ and $\Delta\hat X_i^T = [\Delta\hat Z_i^T\;\; \Delta\hat U_i^T\;\; I]$, and inserting these into QP (P4), the resulting problem will be:

$$\min_{\Delta\theta}\; \sum_{i=1}^{N+1}\left[\nabla\phi_i^T\,\Delta\hat X_i + \Delta\tilde X_i^T H_i\,\Delta\hat X_i\right]\Delta\theta + \frac{1}{2}\Delta\theta^T\sum_{i=1}^{N+1}\left[\Delta\hat X_i^T H_i\,\Delta\hat X_i\right]\Delta\theta$$
$$\text{s.t.}\quad g(\theta) + \nabla g(\theta)^T\Delta\theta \le 0,\qquad h(\theta) + \nabla h(\theta)^T\Delta\theta = 0, \tag{P5}$$

where $\Delta X_i = (\Delta Z_i, \Delta U_i, \Delta\theta)$. This problem is much smaller and easier to solve than (P4). Once the directions in the parameters $\Delta\theta$ are determined from (P5), relations (15) can be used to compute $\Delta Z_i$ and $\Delta U_i$. Finally, an efficient QP solver, QPKWIK (Schmid and Biegler, 1994), was used to solve problem (P5). We note from (P3) that this decomposition method can also be extended to the estimation of initial conditions simply by introducing new parameters and extending the horizon by one step as described previously. In this case the following linearized constraints are added to the QP subproblem (P4):

$$\Delta Z_1 = -Z_1 + f_0(Z_0, \theta_0) + \Delta\theta_0, \tag{16}$$
$$\Delta Z_0 = 0. \tag{17}$$

4. QP EXAMPLE

In this example we solve a quadratic problem with the form of (P1) with our algorithm and compare with general solvers. The example problem is as follows:

$$\min\; h\sum_{i=1}^{N}\left[\frac{1}{2}U_i^2 + \frac{1}{2}\left(2Z_{i,1}^2 + Z_{i,2}^2\right)\right]$$
$$\text{s.t.}\quad Z_{i+1,1} = Z_{i,1} + hZ_{i,2} + h\theta,$$
$$Z_{i+1,2} = -hZ_{i,1} + Z_{i,2} + hU_i,\quad i = 1, \ldots, N, \tag{P6}$$

where $N$ is the number of data sets and $h$ is the time step. This problem has $N + 1$ degrees of freedom. Figure 1 shows the performance of our method (TURBO) compared with MINOS and QPSOL. Note that our method has performance that is linear with an increase in the data sets, while MINOS and QPSOL increase quadratically and cubically, respectively (the cubic result in QPSOL results from Cholesky factorization of the projected Hessian matrix and the QR factorization of the basis).

Fig. 1. Solver performance in QP example.

Fig. 2. Solver performance in reactor example.

5. NONLINEAR EXAMPLE

The case of an adiabatic CSTR with an exothermic, first-order irreversible reaction is described. This case was also treated by Ramamurthi et al. (1993):

$$\frac{dZ_1}{d\tau} = q(Z_{1f} - Z_1) - \phi Z_1 \exp\!\left(\frac{Z_2}{1 + Z_2/\gamma}\right), \tag{18}$$
$$\frac{dZ_2}{d\tau} = q(Z_{2f} - Z_2) + \beta\phi Z_1 \exp\!\left(\frac{Z_2}{1 + Z_2/\gamma}\right) - \delta(Z_2 - U), \tag{19}$$

where $Z_1$, $Z_2$ and $U$ are the dimensionless concentration, temperature and cooling jacket temperature, respectively.
Table 1. Solver performance in reactor example. IBM R6000

            MINOS                 CONOPT                TURBO†
Data sets   Iterations  CPU (s)   Iterations  CPU (s)   Iterations  CPU (s)
10          147         118.5     153         87.3      8           1.4
20          118         164.2     118         177.3     11          3.6
50          221         691.2     140         383.6     9           7.9
100         265         768.9     --          --        6           9.8
150         447         2635.1    --          --        4           10.6
200         --          --        --          --        4           14.4
500         --          --        --          --        2           21.3
1000        --          --        --          --        2           42.9

† Kuhn-Tucker and constraint tolerances in TURBO were $1\times10^{-6}$, except for 10 data sets ($5\times10^{-5}$) and 20 data sets ($1\times10^{-5}$).
The relation between the dimensionless quantities and the physical variables is shown below. These are described in Ramamurthi et al. (1993):

$$\gamma = \frac{E}{RT_{f0}},\qquad \phi = \frac{V}{Q_0}k_0\,e^{-\gamma},\qquad \beta = \frac{(-\Delta H)C_f}{\rho C_p T_{f0}},\qquad \delta = \frac{hA}{\rho C_p Q_0},\qquad q = \frac{Q}{Q_0},$$
$$Z_1 = \frac{C}{C_f},\qquad Z_2 = \frac{T - T_{f0}}{T_{f0}}\gamma,\qquad Z_{2f} = \frac{T_f - T_{f0}}{T_{f0}}\gamma,\qquad U = \frac{T_c - T_{f0}}{T_{f0}}\gamma,\qquad \tau = \frac{Q_0}{V}t. \tag{20}$$

Fig. 3. True values, measurements and reconciled concentrations for reactor example.

Fig. 4. True values, measurements and reconciled temperatures for reactor example.

This system was simulated by computing $Z_1$ and $Z_2$ over a time horizon of $t = 0$ to $t = 20$, with $U$ set to zero for all times, and with the feed conditions set to $Z_{1f} = 1.0$, $Z_{2f} = 0.0$. The dimensionless Arrhenius factor, reaction enthalpy, heat transfer coefficient of the cooling jacket, activation energy and flow $(\phi, \beta, \delta, \gamma, q)$ were set to 0.072, 8, 0.3, 20 and 1.0, respectively, and a Gaussian noise with a standard deviation of 0.02 was added to $Z_1$, $Z_2$ and $U$. The initial conditions on $Z_1$ and $Z_2$ were set to 0.55 and 2.75, respectively. The state and parameter estimation was performed by discretizing (18) and (19) using an explicit fourth-order Runge-Kutta method:

$$Z_i = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}_i,\qquad K_1 = hf(Z_i, U_i, \theta),\qquad K_2 = hf(Z_i + \tfrac{1}{2}K_1, U_i, \theta),$$
$$K_3 = hf(Z_i + \tfrac{1}{2}K_2, U_i, \theta),\qquad K_4 = hf(Z_i + K_3, U_i, \theta),$$
$$Z_{i+1} = Z_i + \tfrac{1}{6}\left(K_1 + 2K_2 + 2K_3 + K_4\right). \tag{21}$$

Fig. 5. True values, measurements and reconciled cooling jacket temperatures for reactor example.

The unknown parameters were $\phi$, $\beta$, $\delta$, $\gamma$ and $q$, which were initialized at the simulation values. As the objective function we used the sum of squares of the residuals in $Z_1$, $Z_2$ and $U$. The number of degrees of freedom is $N + 5$, where $N$ is the number of data sets.
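The following sketch (our reconstruction, not the authors' code) simulates the model and applies the step (21). It assumes the right-hand side as reconstructed in (18)-(19) and a step size of $h = 0.2$, which is consistent with 100 points over the horizon $t = 0$ to $t = 20$ but is not stated explicitly in the text:

```python
import numpy as np

def cstr_rhs(z, u, p):
    """Right-hand side of (18)-(19); p = (phi, beta, delta, gamma, q),
    with feed conditions Z1f = 1.0, Z2f = 0.0 as quoted in the text."""
    phi, beta, delta, gamma, q = p
    z1, z2 = z
    rate = phi * z1 * np.exp(z2 / (1.0 + z2 / gamma))   # dimensionless Arrhenius term
    dz1 = q * (1.0 - z1) - rate
    dz2 = q * (0.0 - z2) + beta * rate - delta * (z2 - u)
    return np.array([dz1, dz2])

def rk4_step(z, u, p, h):
    """Explicit fourth-order Runge-Kutta step, equation (21)."""
    k1 = h * cstr_rhs(z, u, p)
    k2 = h * cstr_rhs(z + 0.5 * k1, u, p)
    k3 = h * cstr_rhs(z + 0.5 * k2, u, p)
    k4 = h * cstr_rhs(z + k3, u, p)
    return z + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Simulation with the values quoted in the text; U is held at zero and
# Gaussian noise (sigma = 0.02) is added to the states (the paper adds
# noise to U as well).
p_true = (0.072, 8.0, 0.3, 20.0, 1.0)       # (phi, beta, delta, gamma, q)
z = np.array([0.55, 2.75])                  # initial conditions from the text
h, n_steps = 0.2, 100
traj = [z]
for _ in range(n_steps):
    z = rk4_step(z, 0.0, p_true, h)
    traj.append(z)
measurements = np.array(traj) + np.random.normal(0.0, 0.02, (n_steps + 1, 2))
```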
This problem was solved for several data sets using our implementation (TURBO) and MINOS and CONOPT, a GRG solver. The last two solvers were used through GAMS (Brooke et al., 1988). Figure 2 and Table 1 compare the three solvers with increasing number of data sets and the same initialization. Note that CONOPT could not solve a problem with 100 data sets due to lack of workspace in our version of GAMS. It should also be noted that the two-level strategy of Ramamurthi et al. (1993) is generally more efficient than general purpose methods. On moving horizon problems with five data sets, their strategy was over 80% faster than a simultaneous SQP approach using NPSOL (Gill et al., 1986). Because it uses dense QR and Cholesky factorizations, NPSOL behaves similarly to QPSOL (see Fig. 1) and scales cubically with the number of variables. On the other hand, the two-level strategy applies NPSOL, but only to its outer problem, which deals solely with the input variables. As a result, we expect that the two-level algorithm would also scale cubically with the number of input variables and, consequently, with the number of data sets.

Figures 3-5 show the excellent agreement of the measurements and the reconciled data for 100 points. Table 2 compares the performance of both GN and BFGS approaches and the switching rule.

Table 2. Performance of BFGS, GN and switching rule. IBM R6000

            BFGS                  GN                    Switching rule
Data sets   Iterations  CPU (s)   Iterations  CPU (s)   Iterations  CPU (s)
10          Fail        Fail      8           1.4       8           1.4
20          7           2.2       11          3.6       11          3.6
50          6           4.5       9           7.9       8           7.4
100         6           8.8       6           9.8       6           9.8
150         6           13.2      4           10.6      4           10.6
200         6           17.5      4           14.4      4           14.4
500         3           25.7      2           21.3      2           21.3
1000        2           38.6      2           42.9      2           42.9

The same problem was solved again when the initial conditions were uncertain. Table 3 shows the decrease in the objective function, CPU time and number of iterations when the initial conditions are optimized as in (P3), relative to when they are fixed, as in (P1). When the initial conditions are optimized, two more variables and degrees of freedom are added to the problem.

Table 3. Fixed initial conditions vs optimized. VAX 3200

            Fixed initial conditions              Optimized initial conditions
Data sets   CPU (s)  Iterations  Objective        CPU (s)  Iterations  Objective
200         51.8     4           $9.03\times10^{-4}$   76.7     4           $8.2\times10^{-4}$
500         155.7    4           $8.04\times10^{-4}$   126.6                $7.6\times10^{-4}$
1000        251.1    3           $8.53\times10^{-4}$   228.4                $7.9\times10^{-4}$

Finally, we illustrate the robustness of our algorithm by starting far away from the solution. Table 4 shows the number of iterations and CPU time when the data is simulated without any noise and the starting point for the parameters deviates by +50% and -50% from the true value. All runs converged to the true value of the parameters.

Table 4. Convergence for far-away starting point on parameters. No noise. IBM R6000

            +50% from optimal value   -50% from optimal value
Data sets   CPU (s)   Iterations      CPU (s)   Iterations
200         17.8      6               18.0      6
500         53.9      7               60.0      8
1000        79.0      5               119.4     8
6. DISCUSSION AND COMPARISON OF RESULTS
A large number of data sets is generally helpful in the estimation problem, both to ensure statistical meaning and numerical accuracy of the ODE discretization. However, the NLP problem size grows linearly with the number of data sets, and only small problems in this class could be solved using general NLP solvers within a reasonable computational time. In this paper we propose an efficient decomposition strategy in which the solution effort is linear with the problem size, allowing for the solution of very large problems. In fact, the largest problem (3007 variables) took less CPU time with our decomposition approach than the smallest problem (37 variables) with the general solvers. This algorithm performed well on both linear and nonlinear cases, in both efficiency and robustness. A comparison with the extended Kalman filter (EKF) can also be drawn if the appropriate objective function is
used. With this objective function, the NLP formulation would yield the expected values of the states and parameters conditioned on the observations (Lee et al., 1992), as with the EKF when used for smoothing estimates. The main advantage of the NLP approach over the Kalman filter is its robustness, and a traditional drawback of NLP is the greater computational effort it requires. However, in our approach, this effort is only linear with the number of observations.

Acknowledgements-Financial support of the first author from the Programa Ciencia, a scholarship program from JNICT, Lisbon, Portugal, is gratefully acknowledged. Facilities support from the Engineering Design Research Center is also acknowledged.
REFERENCES

Biegler L. T., Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation. Computers chem. Engng 8, 243 (1984).
Brenan K. E., S. L. Campbell and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. Elsevier, New York (1989).
Britt H. I. and R. H. Luecke, The estimation of parameters in nonlinear, implicit models. Technometrics 15, 223 (1973).
Brooke A., D. Kendrick and A. Meeraus, GAMS-A User's Guide. Scientific Press, Palo Alto, CA (1988).
Cuthrell J. E. and L. T. Biegler, Improved infeasible path optimization for sequential modular simulators-II: the optimization algorithm. Computers chem. Engng 9, 257 (1985).
Dunn J. C. and D. P. Bertsekas, Efficient dynamic programming implementations of Newton's method for unconstrained optimal control problems. J. Optim. Theory Applic. 63, 23 (1989).
Fletcher R. and C. Xu, Hybrid methods for nonlinear least squares. IMA J. Numer. Anal. 7, 371 (1987).
Gill P. E., W. Murray, M. A. Saunders and M. H. Wright, User's Guide for NPSOL (Version 4.0): A Fortran Program for Nonlinear Programming. Technical Report SOL 86-2, Department of Operations Research, Stanford University (1986).
Jang S.-S., B. Joseph and H. Mukai, Comparison of two approaches to on-line parameter and state estimation of nonlinear systems. Ind. Engng Chem. Process Des. Dev. 25, 809 (1986).
Kelley C. T., E. W. Sachs and B. Watson, Pointwise quasi-Newton method for unconstrained optimal control problems II. J. Optim. Theory Applic. 71, 535 (1991).
Kim I. W., M. J. Liebman and T. F. Edgar, Robust error-in-variables estimation using nonlinear programming techniques. AIChE Jl 36, 985 (1990).
Lee J. H., D. Robertson and J. B. Rawlings, Receding horizon state estimation and its application to model predictive control. Presented at 1992 AIChE Annual Meeting, Miami Beach (1992).
Liebman M. J., T. F. Edgar and L. S. Lasdon, Efficient data reconciliation and estimation for dynamic processes using nonlinear programming techniques. Computers chem. Engng 16, 963 (1992).
Ramamurthi Y., P. B. Sistu and B. W. Bequette, Control-relevant dynamic data reconciliation and parameter estimation. Computers chem. Engng 17, 41 (1993).
Reilly P. M. and H. Patino-Leal, A Bayesian study of the error-in-variables model. Technometrics 23, 221 (1981).
Schmid C. and L. T. Biegler, Quadratic programming methods for tailored reduced Hessian SQP. Computers chem. Engng 18, 817 (1994).
Sistu P. B., R. S. Gopinath and B. W. Bequette, Computational issues in nonlinear predictive control. Computers chem. Engng 17, 361 (1993).
Tjoa I. B. and L. T. Biegler, Simultaneous strategies for data reconciliation and gross error detection of nonlinear systems. Computers chem. Engng 15, 679 (1991).
Tjoa I. B. and L. T. Biegler, Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. I&EC Res. 30, 376 (1991).
Tjoa I. B. and L. T. Biegler, Reduced successive quadratic programming strategy for error-in-variables estimation. Computers chem. Engng 16, 523 (1992).
Valko P. and S. Vajda, An extended Marquardt-type procedure for fitting error-in-variables models. Computers chem. Engng 11, 37 (1987).
Wright S. J., Partitioned dynamic programming for optimal control. Preprint MCS-P173-0890, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL (1989).
APPENDIX

QP Decomposition

Dropping for the moment the inequality and equality constraints that arise only in the parameters, the QP subproblem to be solved (P4, Section 3.3) arises from problem (P1, Section 2):

$$\min\; \sum_{i=1}^{N}\left[\left(\frac{\partial\phi_i}{\partial Z_i}\right)^T\Delta Z_i + \left(\frac{\partial\phi_i}{\partial U_i}\right)^T\Delta U_i + \frac{1}{2}\Delta Z_i^T B^i_{11}\Delta Z_i + \Delta Z_i^T B^i_{12}\Delta U_i + \Delta Z_i^T B^i_{13}\Delta\theta + \frac{1}{2}\Delta U_i^T B^i_{22}\Delta U_i + \Delta U_i^T B^i_{23}\Delta\theta + \frac{1}{2}\Delta\theta^T B^i_{33}\Delta\theta\right] + \left(\frac{\partial\phi_{N+1}}{\partial Z_{N+1}}\right)^T\Delta Z_{N+1} + \frac{1}{2}\Delta Z_{N+1}^T B^{N+1}_{11}\Delta Z_{N+1}$$

$$\text{s.t.}\quad \Delta Z_{i+1} = \left(\frac{\partial f_i}{\partial Z_i}\right)^T\Delta Z_i + \left(\frac{\partial f_i}{\partial U_i}\right)^T\Delta U_i + \left(\frac{\partial f_i}{\partial\theta}\right)^T\Delta\theta + f_i(Z_i, U_i, \theta) - Z_{i+1},\quad i = 1, \ldots, N,\qquad \Delta Z_1 = 0. \tag{P1'}$$

The necessary conditions for optimality of problem (P1') are:

$$a_i + B^i_{12}\Delta U_i + B^i_{11}\Delta Z_i - \lambda_i + A_i^T\lambda_{i+1} = 0, \tag{A1}$$
$$b_i + B^i_{21}\Delta Z_i + B^i_{22}\Delta U_i + B_i^T\lambda_{i+1} = 0, \tag{A2}$$
$$\Delta Z_{i+1} - A_i\,\Delta Z_i - B_i\,\Delta U_i - s_i = 0, \tag{A3}$$
$$a_{N+1} + B^{N+1}_{11}\Delta Z_{N+1} - \lambda_{N+1} = 0, \tag{A4}$$

where

$$a_i = \frac{\partial\phi_i}{\partial Z_i} + B^i_{13}\Delta\theta,\qquad b_i = \frac{\partial\phi_i}{\partial U_i} + B^i_{23}\Delta\theta,\qquad s_i = -Z_{i+1} + f_i(Z_i, U_i, \theta) + \left(\frac{\partial f_i}{\partial\theta}\right)^T\Delta\theta,\qquad a_{N+1} = \frac{\partial\phi_{N+1}}{\partial Z_{N+1}}. \tag{A5}$$
The optimality conditions for data set $i$ are coupled to the ones of data set $i+1$ through $\lambda_{i+1}$ and $\Delta Z_{i+1}$. In order to decouple, the multipliers and the changes in the control variables are written as linear functions of the changes in the state variables:

$$\Delta U_i = N_i\,\Delta Z_i + n_i(\Delta\theta), \tag{A6}$$
$$\lambda_i = T_i\,\Delta Z_i + t_i(\Delta\theta). \tag{A7}$$

From (A4) and (A7) the following terminal conditions can be derived:

$$T_{N+1} = B^{N+1}_{11},\qquad t_{N+1} = a_{N+1}. \tag{A8}$$

Inserting (A6) and (A7) into (A1), (A2) and (A3), eliminating $\Delta Z_{i+1}$ by means of (A3) and by imposing

$$M_i\,\Delta Z_i + m_i = 0 \;\Rightarrow\; M_i = 0,\quad m_i = 0,\qquad \forall\,\Delta Z_i, \tag{A9}$$

the following recursive relations can be derived for $N_i$ and $n_i$:

$$N_i = -\left(B^i_{22} + B_i^T T_{i+1} B_i\right)^{-1}\left(B^i_{21} + B_i^T T_{i+1} A_i\right), \tag{A10}$$
$$n_i(\Delta\theta) = -\left(B^i_{22} + B_i^T T_{i+1} B_i\right)^{-1}\left(b_i + B_i^T T_{i+1} s_i + B_i^T t_{i+1}\right). \tag{A11}$$

Repeating the same process but eliminating $\Delta Z_{i+1}$ from (A1) leads to:

$$T_i = B^i_{11} + B^i_{12} N_i + A_i^T T_{i+1}\left(A_i + B_i N_i\right), \tag{A12}$$
$$t_i(\Delta\theta) = a_i + B^i_{12} n_i + A_i^T t_{i+1} + A_i^T T_{i+1}\left(s_i + B_i n_i\right). \tag{A13}$$

Note that because $s_i$, $a_i$ and $b_i$ depend linearly on $\Delta\theta$, so do $n_i$ and $t_i$. Now $n_i$ and $t_i$ can be written as:

$$n_i = \tilde n_i + \bar n_i\,\Delta\theta,\qquad t_i = \tilde t_i + \bar t_i\,\Delta\theta. \tag{A14}$$

With these last equations and (A5), (A11) and (A13) we will get:

$$\tilde n_i = -\left(B^i_{22} + B_i^T T_{i+1} B_i\right)^{-1}\left\{\frac{\partial\phi_i}{\partial U_i} + B_i^T T_{i+1}\left[-Z_{i+1} + f_i(Z_i, U_i, \theta)\right] + B_i^T \tilde t_{i+1}\right\}, \tag{A15}$$
$$\bar n_i = -\left(B^i_{22} + B_i^T T_{i+1} B_i\right)^{-1}\left[B^i_{23} + B_i^T T_{i+1}\left(\frac{\partial f_i}{\partial\theta}\right)^T + B_i^T \bar t_{i+1}\right], \tag{A16}$$
$$\tilde t_i = \frac{\partial\phi_i}{\partial Z_i} + B^i_{12}\tilde n_i + A_i^T \tilde t_{i+1} + A_i^T T_{i+1}\left[-Z_{i+1} + f_i(Z_i, U_i, \theta) + B_i \tilde n_i\right], \tag{A17}$$
$$\bar t_i = B^i_{13} + B^i_{12}\bar n_i + A_i^T \bar t_{i+1} + A_i^T T_{i+1}\left[\left(\frac{\partial f_i}{\partial\theta}\right)^T + B_i \bar n_i\right]. \tag{A18}$$

Now $\tilde n_i$, $\bar n_i$, $N_i$, $\tilde t_i$, $\bar t_i$, $T_i$ can be computed recursively in a backward cycle using (A10), (A12), (A15), (A16), (A17) and (A18), starting from the terminal conditions (A8). Once these are computed, $\Delta Z_i$ and $\Delta U_i$ can be expressed as functions of $\Delta\theta$:

$$\Delta Z_i = \Delta\tilde Z_i + \Delta\hat Z_i\,\Delta\theta,\qquad \Delta U_i = \Delta\tilde U_i + \Delta\hat U_i\,\Delta\theta. \tag{A19}$$

From (A3), (A6) and (A19) we have a forward cycle:

$$\Delta\tilde U_i = N_i\,\Delta\tilde Z_i + \tilde n_i,\qquad \Delta\hat U_i = N_i\,\Delta\hat Z_i + \bar n_i, \tag{A20}$$
$$\Delta\tilde Z_{i+1} = A_i\,\Delta\tilde Z_i + B_i\,\Delta\tilde U_i - Z_{i+1} + f_i(Z_i, U_i, \theta),\qquad \Delta\hat Z_{i+1} = A_i\,\Delta\hat Z_i + B_i\,\Delta\hat U_i + \left(\frac{\partial f_i}{\partial\theta}\right)^T. \tag{A21}$$

The strategy for expressing $\Delta Z_i$ and $\Delta U_i$ as functions of $\Delta\theta$ is as follows.

Backward cycle:
For $i = N, \ldots, 1$:
  compute $\tilde n_i$, $\bar n_i$, $N_i$ using (A15), (A16) and (A10);
  compute $\tilde t_i$, $\bar t_i$, $T_i$ using (A17), (A18) and (A12).

Forward cycle:
Initialize $\Delta\tilde Z_1 = 0$, $\Delta\hat Z_1 = 0$.
For $i = 1, \ldots, N$:
  compute $\Delta\tilde U_i$, $\Delta\hat U_i$ using (A20);
  compute $\Delta\tilde Z_{i+1}$, $\Delta\hat Z_{i+1}$ using (A21).

Once the relations between $\Delta Z_i$, $\Delta U_i$ and $\Delta\theta$ are available, equations (A19) can be used to rebuild QP problem (P4), by replacing them in the objective function. Here we can group the directions in the variables:

$$\Delta X_i^T = [\Delta Z_i^T\;\; \Delta U_i^T\;\; \Delta\theta^T]. \tag{A22}$$

Then $\Delta X_i$ can be expressed as a linear function of $\Delta\theta$:

$$\Delta X_i = \Delta\tilde X_i + \Delta\hat X_i\,\Delta\theta, \tag{A23}$$

where the components are given by:

$$\Delta\tilde X_i^T = [\Delta\tilde Z_i^T\;\; \Delta\tilde U_i^T\;\; 0],\qquad \Delta\hat X_i^T = [\Delta\hat Z_i^T\;\; \Delta\hat U_i^T\;\; I],\qquad i = 2, \ldots, N,$$
$$\Delta\tilde X_{N+1}^T = [\Delta\tilde Z_{N+1}^T\;\; 0\;\; 0],\qquad \Delta\hat X_{N+1}^T = [\Delta\hat Z_{N+1}^T\;\; 0\;\; I], \tag{A24}$$
$$\Delta\tilde X_1^T = [0\;\; \Delta\tilde U_1^T\;\; 0],\qquad \Delta\hat X_1^T = [0\;\; \Delta\hat U_1^T\;\; I]. \tag{A25}$$

The gradients can also be grouped:

$$\nabla\phi_i^T = \left[\left(\frac{\partial\phi_i}{\partial Z_i}\right)^T\;\left(\frac{\partial\phi_i}{\partial U_i}\right)^T\;0\right],\qquad \nabla\phi_{N+1}^T = \left[\left(\frac{\partial\phi_{N+1}}{\partial Z_{N+1}}\right)^T\;0\;0\right], \tag{A26}$$

along with the Hessian matrices:

$$H_i = \begin{bmatrix} B^i_{11} & B^i_{12} & B^i_{13} \\ B^i_{21} & B^i_{22} & B^i_{23} \\ B^i_{31} & B^i_{32} & B^i_{33} \end{bmatrix},\qquad H_{N+1} = \begin{bmatrix} B^{N+1}_{11} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \tag{A27}$$

The Hessian matrices $H_i$ are to be approximated either by the BFGS or the GN method. Replacing (A20) and (A21) in the objective function of problem (P1') and grouping the terms using (A22), (A26) and (A27), the QP problem, without the equalities and inequalities in the parameters, will be transformed into:

$$\min_{\Delta\theta}\; \sum_{i=1}^{N+1}\left[\nabla\phi_i^T\,\Delta\hat X_i + \Delta\tilde X_i^T H_i\,\Delta\hat X_i\right]\Delta\theta + \frac{1}{2}\Delta\theta^T\sum_{i=1}^{N+1}\left[\Delta\hat X_i^T H_i\,\Delta\hat X_i\right]\Delta\theta. \tag{P2'}$$

Constraints in the parameters can now be added to the problem by adding the following relations to (P2'):

$$g(\theta) + \nabla g(\theta)^T\Delta\theta \le 0,\qquad h(\theta) + \nabla h(\theta)^T\Delta\theta = 0. \tag{A28}$$
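To make the backward and forward cycles concrete, here is a schematic NumPy transcription (our own sketch, not code from the paper): parameter constraints are omitted, indexing is 0-based (i = 0..N-1 corresponds to the paper's i = 1..N), and all Jacobians, Hessian blocks and residuals are assumed precomputed for each data set.

```python
import numpy as np

def backward_cycle(A, B, B11, B12, B13, B22, B23, gz, gu, c, F, T_end, t_end):
    """Backward cycle (A8), (A10), (A12), (A15)-(A18).
    A[i], B[i]: matrices of (13); gz[i], gu[i]: objective gradients;
    c[i] = -Z[i+1] + f_i(Z[i], U[i], theta); F[i] = (df_i/dtheta)^T.
    T_end, t_end: terminal conditions (A8); a_{N+1} carries no theta
    term, so the bar-t terminal value is zero."""
    N = len(A)
    n, p = T_end.shape[0], F[0].shape[1]
    T, t_til, t_bar = T_end, t_end, np.zeros((n, p))
    Nmat, n_til, n_bar = [None] * N, [None] * N, [None] * N
    for i in range(N - 1, -1, -1):
        W = B22[i] + B[i].T @ T @ B[i]               # recurring matrix in (A10)-(A11)
        Nmat[i] = -np.linalg.solve(W, B12[i].T + B[i].T @ T @ A[i])           # (A10)
        n_til[i] = -np.linalg.solve(W, gu[i] + B[i].T @ (T @ c[i] + t_til))   # (A15)
        n_bar[i] = -np.linalg.solve(W, B23[i] + B[i].T @ (T @ F[i] + t_bar))  # (A16)
        t_til_new = gz[i] + B12[i] @ n_til[i] \
            + A[i].T @ (t_til + T @ (c[i] + B[i] @ n_til[i]))                 # (A17)
        t_bar_new = B13[i] + B12[i] @ n_bar[i] \
            + A[i].T @ (t_bar + T @ (F[i] + B[i] @ n_bar[i]))                 # (A18)
        T = B11[i] + B12[i] @ Nmat[i] + A[i].T @ T @ (A[i] + B[i] @ Nmat[i])  # (A12)
        t_til, t_bar = t_til_new, t_bar_new
    return Nmat, n_til, n_bar

def forward_cycle(A, B, c, F, Nmat, n_til, n_bar, n, p):
    """Forward cycle (A19)-(A21): propagate the tilde (constant) and
    hat (theta-linear) parts of the state and input directions."""
    N = len(A)
    dZ_til, dZ_hat = np.zeros(n), np.zeros((n, p))   # Delta-Z_1 = 0
    dU_til, dU_hat = [], []
    for i in range(N):
        dU_til.append(Nmat[i] @ dZ_til + n_til[i])   # (A20)
        dU_hat.append(Nmat[i] @ dZ_hat + n_bar[i])
        dZ_til = A[i] @ dZ_til + B[i] @ dU_til[i] + c[i]   # (A21)
        dZ_hat = A[i] @ dZ_hat + B[i] @ dU_hat[i] + F[i]
    return dU_til, dU_hat
```

Both passes cost a fixed amount of work per data set, which is where the linear scaling with $N$ comes from; the resulting $\Delta\tilde X_i$, $\Delta\hat X_i$ assemble the small QP (P2') in $\Delta\theta$ only.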