The Chemical Engineering Journal, 23 (1982) 1 - 13 © Elsevier Sequoia S.A., Lausanne - Printed in The Netherlands
A Family of Collocation Based Methods for Parameter Estimation in Differential Equations
NIELS BADEN and JOHN VILLADSEN
Instituttet for Kemiteknik, Danmarks tekniske Højskole, Lyngby (Denmark)
Abstract

A fully observed state vector is used to estimate the value of an unknown parameter vector in a differential equation model. First the experimental data base is reduced by a least squares technique to a dimension equal to that used in the approximate solution of the model. Next collocation is used to solve the differential equation model, and the collocation ordinates are compared with the experimental values using three different methods. One of these methods has previously been described in the literature by van den Bosch, while the two other methods are based on a different, but statistically reasonable, objective function, giving parameters that are close approximations to the maximum likelihood estimates. The numerical operations involved in the three methods are first demonstrated using a simple example which is worked out in detail. Next the relative merits of the three methods are compared in a computer simulation of three examples. Finally, in a conclusion, a word of warning is given on inappropriate use of any of the three methods in situations where simpler alternatives are available.
INTRODUCTION
In a frequently encountered experimental situation the results are available as computer loggings of the process variables through each experimental run. The model for the system is a scalar- (or vector-) ordinary or partial differential equation, and the object of the data treatment is to extract statistically correct model parameters from the data file.
Standard gradient- or quasi-linearization methods for this purpose are described in the literature [1, 2], but the parameter estimation problem is notoriously time consuming even on fast computers. Collocation is a fast and accurate method to solve certain differential equation models [7], and in a number of papers a collocation based parameter estimation method originally proposed by van den Bosch [4, 5, 6] - probably inspired by an idea presented in refs. 3 and 8 - has been used to obtain a substantial reduction in computer expenditure. Unfortunately, the publications describing these original ideas are not quite satisfactory as regards a proper formulation of the computational procedure. The statistical properties of the parameter estimates are unclear, and the methods are often used improperly on problems which are better solved by other techniques. In the present work a statistically sound van den Bosch type method (method A) is compared with the original van den Bosch method (method C), and both methods are related to a true maximum likelihood collocation based method (method B). Thus we hope to promote the use of these quite ingenious methods by removing some of the uncertainties that are now connected with their use. We presume that the data base is complete (in the sense described in the next section) and that it will have to be digested in order to give a reasonable computational procedure. The other extreme, where the data material is deficient - such as is often the case in fixed bed reactors where only temperature measurements are available - can only be treated by method B, which is consequently the most generally applicable collocation based parameter estimation method.
1. DESCRIPTION OF THE METHODS
1.1. Manipulations on the data base

The j-dimensional parameter vector p in a differential equation Φ(y*(t)) = 0, with the k-dimensional state vector y* and the m-dimensional independent variable vector t, is estimated from a data base v* measured at discrete values of t = t*. We assume:
a. The measurement of t* is error free.
b. The experimental error of v* is normally distributed with expected value 0.
c. The covariance matrix σ²Σ* corresponding to v* is known except for the scalar σ².

By collocation of Φ(y*(t)) = 0 one obtains approximate ordinate values y(τ) at a discrete set of t values τ. τ_i is an N-dimensional vector of collocation point abscissas corresponding to the ith independent variable t_i. With only one independent variable y is a vector of k·N interior collocation ordinates. In partial differential equations N may well be different for each t_i. In general t* is different from τ, and we compute a synthetic data base v̂ to enable comparison of an 'experimental' value with a corresponding collocation ordinate. We approximate the solution of the differential equation by a polynomial y(t) of degree N or N + 1, depending on whether side conditions are given at one or both ends of the interval. Whenever the number of data points n* is larger than the dimension of v̂ we have to use least squares to fit a polynomial q(t) of the same degree as y(t) to the data points. q(t) should satisfy the following:
a. Minimize the criterion function θ = [v* − q(t*)]ᵀ Σ*⁻¹ [v* − q(t*)].
b. Satisfy the same side conditions as y*(t) and the collocation approximation y(t).

v̂ is related to q(t*) by the linear mapping q(t*) = L v̂ + l, where L and l are Lagrange interpolation coefficients modified to satisfy the side conditions. Minimizing θ with respect to v̂ we obtain the linear mapping

v̂ = (Lᵀ Σ*⁻¹ L)⁻¹ Lᵀ Σ*⁻¹ (v* − l)    (1)

The covariance matrix corresponding to v̂ is

σ²Σ = (Lᵀ Σ*⁻¹ L)⁻¹ σ²    (2)

In general Σ will be a full matrix even if the original data are uncorrelated (Σ* a diagonal matrix). It is noted that if dim(v̂) = n*, then no information is lost by the linear mapping (1). A proper smoothing of the data by means of q(t) is possible only if all components of the state vector have been measured at sufficiently many and properly distributed points. The question of missing experiments for some components of the state vector y* and of the proper placement of experiments (choice of t*) is a serious one which will be taken up in the discussion of methods A - C below.

1.2. A simple collocation technique (Method A)

The method of this paragraph is suitable only if the collocation equations Φ_N(y, p) = 0 are linear in the parameters p. When applicable it is a conceptually simple, numerically efficient technique giving parameter estimates p̂ which satisfy sound statistical criteria. We use an object function φ₂ constructed from the residuals with the data base v̂ inserted instead of y:

φ₂ = [W^½ Φ_N(v̂, p)]ᵀ [W^½ Φ_N(v̂, p)] = Φ_N(v̂, p)ᵀ W Φ_N(v̂, p)    (3)
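The data condensation step (1) can be sketched in a few lines. The node set, measurement positions and the quadratic test function below are illustrative choices of ours, not taken from the paper; the closing check uses the property noted above, that the mapping loses no information when the data lie exactly on a polynomial satisfying the side condition.

```python
import numpy as np

# Sketch of eq. (1): map n* raw data points v* onto synthetic ordinates
# v^ at the collocation abscissas tau. Nodes, data positions and the
# side condition y(0) = 1 are illustrative, not taken from the paper.

def lagrange_basis(nodes, j, t):
    """Value at t of the Lagrange polynomial that is 1 at nodes[j], 0 at the others."""
    out = np.ones_like(t, dtype=float)
    for m, tm in enumerate(nodes):
        if m != j:
            out *= (t - tm) / (nodes[j] - tm)
    return out

tau   = np.array([0.5, 1.0])              # collocation abscissas
nodes = np.array([0.0, 0.5, 1.0])         # interpolation nodes incl. t = 0
tstar = np.array([0.25, 0.5, 0.75, 1.0])  # measurement positions (n* = 4)
y0    = 1.0                               # side condition y(0) = 1

# q(t*) = L v^ + l: columns of L are the basis polynomials of the unknown ordinates
L = np.column_stack([lagrange_basis(nodes, j, tstar) for j in (1, 2)])
l = y0 * lagrange_basis(nodes, 0, tstar)

Sigma_star = np.eye(len(tstar))           # assumed uncorrelated, equal variance

def condense(vstar):
    """Eq. (1): v^ = (L^T S*^-1 L)^-1 L^T S*^-1 (v* - l)."""
    Si = np.linalg.inv(Sigma_star)
    return np.linalg.solve(L.T @ Si @ L, L.T @ Si @ (vstar - l))

# If the data lie exactly on a quadratic with q(0) = 1, no information is lost:
q = lambda t: 1.0 - 0.9 * t + 0.4 * t**2
vhat = condense(q(tstar))
```

With exact quadratic data the least squares fit is exact, so `vhat` equals the values of q at the collocation abscissas.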
The weight matrix is given by (4), and we shall show (in formulas 8 - 14) that the resulting p̂ is a close approximation to a maximum likelihood estimate p̂_ML for the unknown parameters:

W = {(∂Φ_N/∂y) Σ (∂Φ_N/∂y)ᵀ}⁻¹ evaluated at y = v̂    (4)

W is seen to be an estimate of the inverse dispersion matrix for the residual Φ_N. It is noted that W is only a weak function of p since (∂Φ_N/∂y) is completely dominated by the numerically big elements of the collocation matrices. We proceed to find p̂ by minimization of φ₂ with respect to p. Assume for a moment that W is independent of p, in which case the minimum for φ₂ is found from ∂φ₂/∂p = 0 with y = v̂. Since Φ_N is assumed to be linear in p

Φ_N = Φ_N(v̂, p₀) + (∂Φ_N/∂p)(p − p₀)    (5)

and the minimization gives

p̂ − p₀ = −{(∂Φ_N/∂p)ᵀ W (∂Φ_N/∂p)}⁻¹ (∂Φ_N/∂p)ᵀ W Φ_N(v̂, p₀)    (6)

Since W is in fact a (weak) function of p we must interpret (6) as the first step of a linearly convergent iteration process

p̂_{i+1} = p̂_i − {(∂Φ_N/∂p)ᵀ W_i (∂Φ_N/∂p)}⁻¹ (∂Φ_N/∂p)ᵀ W_i Φ_N(v̂, p̂_i)    (7)

We may choose p₀ = 0, in which case a particularly simple W matrix is usually obtained (for a set of kinetic equations W = {A Σ Aᵀ}⁻¹ where A is the discretization matrix for dy/dt). The first iterate p̂₁ obtained with p₀ = 0 is quite often a good estimate for p. If, by a strange coincidence, v̂ = y then Φ_N(v̂, p̂) = 0, and the result p̂₁ obtained after one iteration in (7) is exactly equal to the desired parameter vector p. This situation does, of course, only occur if the 'experimental' points v̂ were constructed by an Nth order collocation solution of the model Φ(y*) = 0, but it is the reason for the rapid convergence of (7) when σ² is small and N large. For small σ² the covariance matrix corresponding to p̂ is

E((P̂ − p)(P̂ − p)ᵀ) = E(G (V̂ − y)(V̂ − y)ᵀ Gᵀ)    (8)

where P̂ and V̂ are the stochastic variables from which p̂ and v̂ are samples, and G = G_i (i → ∞) of (7), with

G = {(∂Φ_N/∂p)ᵀ W (∂Φ_N/∂p)}⁻¹ (∂Φ_N/∂p)ᵀ W (∂Φ_N/∂y)

Since σ²Σ = E((V̂ − y)(V̂ − y)ᵀ) we obtain

cov(p̂) = σ² G Σ Gᵀ    (9)

σ² is estimated by s²:

s² = (1/(n* − dim(p))) {(v* − L v̂ − l)ᵀ Σ*⁻¹ (v* − L v̂ − l) + (v̂ − y(p̂))ᵀ Σ⁻¹ (v̂ − y(p̂))} = (min θ + min φ₂)/(n* − dim(p))    (10)

since the cross term 2(v* − L v̂ − l)ᵀ Σ*⁻¹ L (v̂ − y(p̂)) = 0 by (1) and Lᵀ Σ*⁻¹ L = Σ⁻¹. dim(p) is the dimension of p and n* the original number of data points.

We have found p̂ by minimization of a rather strange looking object function φ₂ of (3). At this point it is desirable to relate p̂ to the parameter estimate p̂_ML, a maximum likelihood estimate obtained by the minimization of φ₁ in (11), a logical appendix to the Nth order collocation approximation of the solution to the differential equation:

φ₁ = ‖v̂ − y‖²_{Σ⁻¹} = (v̂ − y)ᵀ Σ⁻¹ (v̂ − y)    (11)
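A minimal sketch of one step of the iteration (7), assuming a residual that is exactly linear in p; the quantities r0, B and the stand-in Jacobian J below are invented test data, not taken from the paper. Because the residual is linear and W is held fixed, a single step already satisfies the normal equations.

```python
import numpy as np

# One step of eq. (7) for a residual linear in p: Phi_N(v^, p) = r0 + B p.
# r0, B, J (stand-in for dPhi_N/dy at y = v^) and Sigma are invented test data.

rng = np.random.default_rng(0)
n, npar = 5, 2
B = rng.normal(size=(n, npar))                  # dPhi_N/dp
p_true = np.array([1.3, -0.7])
r0 = -B @ p_true + 0.001 * rng.normal(size=n)   # Phi_N(p_true) = small 'noise'
J = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # stand-in collocation Jacobian
Sigma = np.eye(n)

W = np.linalg.inv(J @ Sigma @ J.T)              # eq. (4): inverse dispersion of Phi_N

def step(p0):
    """p_{i+1} = p_i - (B^T W B)^{-1} B^T W Phi_N(v^, p_i)   -- eq. (7)."""
    phi = r0 + B @ p0
    return p0 - np.linalg.solve(B.T @ W @ B, B.T @ W @ phi)

p_hat = step(np.zeros(npar))                    # p0 = 0 as suggested in the text
```

Since Φ_N is linear in p here, p_hat is a fixed point of the iteration and lies close to p_true, the deviation being of the order of the noise.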
Thus we wish to discuss two features of the procedure (7): (a) the object function φ₂ is not the same as the more desirable φ₁; (b) φ₂ is not exactly minimized by the final iterate p̂ of (7). It will be shown next that neither objection to the procedure is significant.

(a) φ₂ can be rewritten

φ₂ = ‖Φ_N(v̂, p)‖²_W = ‖v̂ − y₁‖²_{Σ⁻¹}    (12)

Here y₁ is the solution of the collocation equations obtained after one iteration in a Newton-Raphson process using v̂ as the initial guess for the solution y of the collocation equations:

y₁ = v̂ − [(∂Φ_N/∂y)⁻¹]_{y=v̂} Φ_N(v̂, p)    (13)

For a linear d.e. y₁ = y, and φ₂ is exactly equal to φ₁. For a non-linear d.e. y₁ is different from y, but if the collocation order N is not unreasonably low or the data very poor - that is, if v̂ is a good approximation to y - then the first step in a Newton process yields a norm ‖y₁ − y‖ that is very small compared with ‖v̂ − y‖, which is of the order of the experimental error.
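The point of eq. (13) can be demonstrated numerically: one Newton step started from the data v̂ lands far closer to the collocation solution y than v̂ itself is. The residual used below is an invented stand-in with the same quadratic structure as the kinetic example treated later in eq. (28).

```python
import numpy as np

# Illustration of eq. (13). The residual Phi(y) = A y + p y^2 + c is an
# invented stand-in (cf. eq. (28)); A, c, p are arbitrary test values.

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
c = np.array([-1.0, -1.0])
p = 0.5

def Phi(y):
    return A @ y + p * y**2 + c

def Jac(y):                              # dPhi/dy = A + 2 p diag(y)
    return A + 2.0 * p * np.diag(y)

# 'Exact' collocation solution y by Newton iteration to machine accuracy
y = np.array([0.5, 0.5])
for _ in range(50):
    y = y - np.linalg.solve(Jac(y), Phi(y))

vhat = y + 0.01                          # 'data': solution plus experimental error
y1 = vhat - np.linalg.solve(Jac(vhat), Phi(vhat))   # eq. (13)

# ||y1 - y|| is of second order in the data error, hence << ||vhat - y||
```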
(b) The correct differentiation of φ₂ with respect to p_i gives

∂φ₂/∂p_i = 2 Φ_Nᵀ W [∂Φ_N/∂p_i + ½ W⁻¹ (∂W/∂p_i) Φ_N]    (14)

In (7) we consistently neglect the last term in the square brackets, but again this is of small consequence when σ is small and N reasonably high, since in that case Φ_N(v̂, p̂) ≈ 0 and y ≈ y*. Thus, if an accurate parameter estimate is desired (large N and many data points) we also obtain a p̂ that practically minimizes φ₂, and by the argument of (a) φ₂ ≈ φ₁ in this case, giving p̂ ≈ p̂_ML.

1.3. A collocation based Gauss-Newton method (method B)

In method A the state vector approximation was never obtained beyond the first iterate y₁ in a Newton-Raphson process on the collocation equations. We have shown that the resulting p̂ is not necessarily a less accurate estimate for p̂_ML when σ is small, but finite, and N reasonably large. We want to avoid the complete solution of the differential equations at each experimental t value, and the collocation method may help us in this respect, but a completely consistent collocation analogue to the standard Gauss-Newton parameter fitting process involves simultaneous solution of the N collocation equations and minimization of the object function φ₁ of (11). To obtain this goal the steps of the following procedure may be followed:

(a) Guess a solution y₀ and p₀. The choice of y₀ is not difficult since y₀ = v̂ is an obvious first guess. The proper choice of p₀ may be more difficult, but a one-point collocation solution of the model or, even better, the first iterate p̂₁ of the process used in method A is usually sufficiently good to ensure that the following algorithm converges.

(b) Linearize the residual Φ_N(y, p) with respect to y at y = y₀:

Φ_N(y, p) ≈ Φ_N(y₀, p) + (∂Φ_N/∂y)|_{y₀} (y − y₀)    (15)

The first step of a Newton-Raphson process for computation of the collocation solution when p = p₀ is taken; the solution of (15) with Φ_N(y, p₀) = 0 is y₁.

(c) The sensitivity of y₁ with respect to p is computed by differentiation of Φ_N(y, p) = 0. Equations (16) and (17) are the two equations for y₁ − y₀ = Δy and for ∂y₁/∂p:

(∂Φ_N/∂y)|_{y₀,p₀} Δy = −Φ_N(y₀, p₀)    (16)

(∂Φ_N/∂y)|_{y₀,p₀} (∂y₁/∂p) = −[(∂Φ_N/∂p) + (∂²Φ_N/∂y∂p)(y₁ − y₀)]|_{y₀,p₀}    (17)

Note that the coefficient matrix of Δy and of ∂y₁/∂p is the same. Equation (16) is solved for y₁, which is used to compute the r.h.s. of eqn. (17).

(d) Solution of eqn. (16) for various p will give y₁(p). If the d.e. model is linear in y, (15) is exact, and we might directly find p̂_ML by minimization of φ₁ = ‖v̂ − y₁‖²_{Σ⁻¹} using the relation between y₁ and p found by repetitive solution of (16). Now Φ_N is rarely linear in y, and a complete minimization of φ₁ is unjustified since y₁ is not the solution of the collocation equations. To obtain an estimate of p̂ we linearize y from y₁:

y ≈ y₁ + (∂y₁/∂p) Δp  and  φ₁ ≈ ‖v̂ − y₁ − (∂y₁/∂p) Δp‖²_{Σ⁻¹}    (18)

The minimum of φ₁ is at

Δp = {(∂y₁/∂p)ᵀ Σ⁻¹ (∂y₁/∂p)}⁻¹ (∂y₁/∂p)ᵀ Σ⁻¹ (v̂ − y₁)    (19)

As the next point of linearization for Φ_N in eqn. (15) we use

p₁ = p₀ + Δp    (20)

Repetition of steps (b), (c), (d) gives an approximately quadratically convergent iteration to the simultaneous solution of the N collocation equations and the minimization problem for φ₁. We avoid much unnecessary computer work by this stepwise double linearization process (16), (18). The solution of the collocation equations is not found with too great accuracy when p is far from p̂_ML, and we do not minimize φ₁ with greater accuracy than required by the accuracy of y. Since the collocation solution of a d.e. is often much faster than the finite difference methods used in standard Gauss-Newton parameter fitting methods, and since we treat the iterative solution of the complete problem more reasonably than most standard routines, the total computer expenditure is often several orders of magnitude smaller than that required by standard routines.

1.4. The method of Bruno van den Bosch (Method C)

Van den Bosch and Hellinckx [4] have presented a parameter fitting method based on the principles of MWR (methods of weighted residuals). In a review paper [5] van den Bosch expounded the merits of the method under the heading of 'orthogonal collocation'. Since this label is likely to cause some confusion concerning the numerical basis of their method we shall presently use a very simple example to compare it with the methods of sections 1.2 and 1.3. Consider the functional

LSQ = ∫₀¹ [Φ(y_N+1(t))]² W(t) dt    (21)

Here y_N+1(t) is a polynomial approximation to the solution of Φ(y) = 0. The polynomial is defined by either N + 1 or N + 2 ordinates, N of which are situated in the interior of (0, 1). The endpoint t = 1 may or may not be used in the definition of y_N+1(t). W(t) is a positive definite, but otherwise arbitrary, weight function. An approximate solution of Φ(y) = 0 may now be constructed by minimization of LSQ with respect to the ordinates defining the interpolation polynomial y_N+1(t). This method is described in ref. 7 [chapter 2] under the heading 'Least squares MWR'. Let W(t) = t^α (1 − t)^β with (α, β) > (−1, −1). With this choice of W(t) an optimal quadrature for LSQ is readily available. The quadrature may be based on interior points only, on interior points and t = 0 or t = 1, and on interior points and both endpoints. Ref. 7 [chapter 3] gives the location of the N optimal interior quadrature abscissas as the zeros of P_N^(α',β')(t) where (α', β') = {(α, β), (α, β + 1), (α + 1, β), (α + 1, β + 1)} in the four cases listed. Thus

LSQ ≈ Σ_k [Φ(y_N+1(t_k))]² w_k = Φ_N1ᵀ diag(w) Φ_N1    (22)

N1 is equal to N, N + 1 or N + 2, again referring to the four above listed cases. Φ(y_N+1(t_k)) contains derivatives of y at the quadrature points t_k, and these are expressed as weighted sums of the interpolation ordinates y_N+1. Thus Φ_N1 is an algebraic expression in the N1 quadrature ordinates and in the endpoint ordinates. diag(w) is a diagonal matrix of positive numbers, the quadrature weights w_k. The side condition(s) of Φ(y) = 0 can be used to eliminate y(t = 0) and (for boundary value problems) also y(t = 1) from the weighted sums of ordinates that appear after discretization of derivatives in Φ_N1. The smallest possible value of LSQ is zero; it is obtained when the unknown ordinates are determined as in the two cases listed below.
(a) Two point boundary value problem: N1 = N, τ = zeros of P_N^(α,β)(t), and Φ_N1 = Φ_N = 0 at τ.
(b) Initial value problem: N1 = N + 1, τ = zeros of P_N^(α+1,β)(t) together with t = 1, and Φ_N1 = Φ_N+1 = 0 at τ.

However, in van den Bosch's method [4] a Lobatto quadrature, N1 = N + 2, is used in all cases, and there is at least one equation in excess of the number of unknown ordinates. Consequently min(LSQ) > 0 and the N or N + 1 unknown ordinates are fitted by least squares. Assume that Φ(y) = 0 also contains a j-dimensional parameter vector p, while a data base v̂ is available at the quadrature abscissas τ = zeros of P_N^(α+1,β+1)(t). v̂ is constructed by the method of section 1.1 from a sufficient number of data points v*. v̂ is inserted for y_N+1 in (22), which is subsequently used to determine p. If Φ_N1 is linear in p (i.e. Φ_N1 = Φ_N1(p = 0) + (∂Φ_N1/∂p) p) an explicit expression for the parameter estimate p̂_C is obtained:

p̂_C = −{(∂Φ_N1/∂p)ᵀ diag(w) (∂Φ_N1/∂p)}⁻¹ (∂Φ_N1/∂p)ᵀ diag(w) Φ_N1(v̂, p = 0)    (23)
The structure of (6) with p₀ = 0 and of (23) is the same, and one iteration of (7) involves the same computation as that required by (23). The two methods differ in the choice of W - the choice of method A leading to a standard statistical estimate of p - and also in the different approach to the use of endpoint ordinates. Both methods include the boundary ordinates in the interpolation formula for q(t), but method C also includes a collocation equation at each boundary. This may lead to considerable difficulties. Take as an example an initial value problem. If we have an experimental value for y(t = 0), or if we trust that y(t = 0) is known exactly (e.g. y(0) = (1, 0, 0, ...)), no difficulties arise, but quite often the initial state of the reaction mixture is unknown and concentration measurements are taken only at t > 0. In method A the unknown y(t = 0) enters only through the linear combination of ordinates that results after discretization of derivatives at interior collocation points. In method C it also enters non-linearly (with p) in the collocation equation at t = 0, even though the original d.e. is linear in p, and the simple algorithm (23) cannot be used. This particular difficulty would have been avoided if only interior quadrature points and t = 1 (for the initial value problem) had been used in (23). Even with this rather sensible modification of method C there remains the question of the choice of W in method C, which leads to an obscure statistical estimate for p and complicated formulas in the computation of cov(p̂):

cov(p̂_C) = H⁻¹ H₁ Σ H₁ᵀ H⁻¹ s²    (24)

[ref. 5, p. 576] where

H = (∂Φ_N1/∂p)ᵀ diag(w) (∂Φ_N1/∂p),  H₁ = (∂Φ_N1/∂p)ᵀ diag(w) (∂Φ_N1/∂y)    (25)

s² = {(v̂ − y_N+1)ᵀ Σ⁻¹ (v̂ − y_N+1) + min θ}/(n* − dim(p))    (26)

As noted under (7) we may also here obtain the 'true' parameter vector p - namely if v̂ = y_N+1 = the solution of Φ(y*) = 0 by the least squares MWR that is the basis of method C. We note that this is almost the case if σ² is small and N large.
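Before turning to the worked example of section 2, the double linearization of method B, eqs. (16) - (20), can be sketched in a few lines. The residual Φ_N(y, p) = A y + p y² + c below is an invented stand-in of the same structure as eq. (28); Σ = I, scalar p and noise-free data are assumed, so the loop should recover the parameter used to generate the data.

```python
import numpy as np

# Sketch of method B, eqs. (16) - (20), for an invented residual
# Phi_N(y, p) = A y + p y^2 + c that is linear in the scalar parameter p.
# A, c, p_true are test data; Sigma = I and the 'data' v^ are noise free.

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
c = np.array([-1.0, -1.0])
p_true = 0.5

def Phi(y, p):  return A @ y + p * y**2 + c
def Jac(y, p):  return A + 2.0 * p * np.diag(y)       # dPhi_N/dy

# Noise-free 'data' v^: the exact collocation solution at p_true
v = np.array([0.5, 0.5])
for _ in range(50):
    v = v - np.linalg.solve(Jac(v, p_true), Phi(v, p_true))

y0, p0 = v.copy(), 0.0             # step (a): y0 = v^, crude starting p0
for _ in range(20):
    J = Jac(y0, p0)
    y1 = y0 - np.linalg.solve(J, Phi(y0, p0))                 # eq. (16)
    # eq. (17): same coefficient matrix, mixed-derivative term on the r.h.s.
    rhs = -(y0**2 + 2.0 * np.diag(y0) @ (y1 - y0))
    dy1dp = np.linalg.solve(J, rhs)
    dp = dy1dp @ (v - y1) / (dy1dp @ dy1dp)                   # eq. (19), Sigma = I
    y0, p0 = y1, p0 + dp                                      # eq. (20)

p_hat = p0
```

With exact data the stepwise double linearization drives both the collocation residual and the parameter error to machine accuracy within a few iterations.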
2. DEMONSTRATION OF THE METHODS OF SECTION 1
Parameter fitting built on top of a collocation approximation of the d.e. is likely to be considered a somewhat complicated operation, and we wish to demonstrate the mechanics of the three methods of the previous section on a simple example. Consider
dy*/dt = −p (y*)²  with  y*(t = 0) = 1    (27)
We approximate y* by a polynomial of degree N + 1 that passes through t = 0, y = 1. The choice of the remaining N + 1 interpolation points will be discussed in section 2.4. The experimental data v* are converted by the method of section 1.1 into synthetic data points v̂ taken at the same abscissas. The N + 1 collocation equations are
Φ_N = A y + b y(t = 0) + p y² = 0    (28)
The discretization matrix A* for N + 2 interpolation points is computed by the method in ref. 7 [pp. 131 - 135]. A is the lower (N + 1) × (N + 1) block of A* and b is the first column of A*, except for the first element.
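The matrices of eq. (28) can be generated by differentiating the Lagrange interpolation polynomial through the interpolation points. The barycentric construction below is one convenient way of doing this (not necessarily the algorithm of ref. 7), shown for N = 1 with interpolation points t = (0, 1/2, 1).

```python
import numpy as np

# One way (not necessarily that of ref. 7) to generate the discretization
# matrix A*: differentiate the Lagrange basis on the interpolation points.
# Shown for N = 1 with t = (0, 1/2, 1).

def diff_matrix(t):
    """D[i, j] = l_j'(t_i) for the Lagrange basis l_j on the nodes t."""
    n = len(t)
    w = np.array([1.0 / np.prod([t[j] - t[m] for m in range(n) if m != j])
                  for j in range(n)])          # barycentric weights
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i, j] = (w[j] / w[i]) / (t[i] - t[j])
        D[i, i] = -np.sum(D[i, :])             # rows sum to zero
    return D

t = np.array([0.0, 0.5, 1.0])
Astar = diff_matrix(t)             # full (N+2) x (N+2) discretization matrix
A = Astar[1:, 1:]                  # lower (N+1) x (N+1) block
b = Astar[1:, 0]                   # first column, first element dropped

def Phi_N(v, p, y0=1.0):           # eq. (28)
    return A @ v + b * y0 + p * v**2
```

The matrix is exact for polynomials of the interpolation degree, which gives a simple consistency check.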
2.1. Method A

φ₂ = ‖Φ_N(v̂, p)‖²_W = ‖A v̂ + b + p v̂²‖²_W    (29)

∂Φ_N/∂y = A + 2 p diag(v̂)    (30)

The iteration (7) with p₀ = 0 becomes

p̂_{i+1} = −{(v̂²)ᵀ W_i (v̂²)}⁻¹ (v̂²)ᵀ W_i (A v̂ + b)    (31)

with

W₀ = W(p = 0) = (A Σ Aᵀ)⁻¹    (32)

The variance of p̂ is V(p̂) = G s², where the scalar G = {(v̂²)ᵀ W (v̂²)}⁻¹ and the estimate s² for σ² is determined from

s² = (min_p φ₂ + min θ)/(n* − 1)    (33)

2.2. Maximum likelihood collocation method (Method B)

The equations corresponding to (16) and (17) (with y₀ = v̂) are

(A + 2 p₀ diag(v̂)) (y₁ − y₀) = −Φ_N(v̂, p₀)    (34)

(A + 2 p₀ diag(v̂)) (dy₁/dp) = −(v̂² + 2 diag(v̂)(y₁ − v̂))    (35)

(34) is solved for y₁, and (35) is solved for dy₁/dp using y₁ from (34). Next Δp is calculated from (19), and the new linearization point for Φ_N is determined from (20).

2.3. Method of van den Bosch (Method C)

Φ_N1(v̂, p) = A* v̂ + p v̂²    (36)

Here A* is the full discretization matrix based on t = 0, t = 1 and the N zeros of P_N^(α+1,β+1)(t), with v̂(t = 0) = 1. The explicit formula (23) becomes

p̂_C = −{(v̂²)ᵀ diag(w) (v̂²)}⁻¹ (v̂²)ᵀ diag(w) (A* v̂)    (37)

2.4. p̂ determined from the analytical solution of the differential equation

The analytical solution of the differential equation is

y* = 1/(1 + p t)

and

φ₁* = (y* − v*)ᵀ Σ*⁻¹ (y* − v*)

where yᵢ* = y*(tᵢ*) for i = 1, 2, ..., n*. We estimate p = p̂ from ∂φ₁*/∂p = 0, where

dyᵢ*/dp = −tᵢ*/(1 + p tᵢ*)²

and the estimate s² for σ² is determined from

s² = φ₁*(p = p̂)/(n* − 1)

For a general j-dimensional p we may compare methods A, B and C by the following formula for the covariance matrix:

cov(p̂) = {(∂Y/∂p)ᵀ Σ⁻¹ (∂Y/∂p)}⁻¹ s²    (38)

where Y is defined as follows. Analytical solution of the d.e.: Y = the analytical solution at the n* measurement points. Method B: Y = the Nth order collocation solution at the collocation points. Method A: Y = y₁ defined in (13) - from this follows an interpretation of G in (9). Method C: the expression (24) for (1/s²) cov(p̂) cannot be reduced to the form (38) since p̂_C is not even an approximate maximum likelihood estimate.

Numerical example. Determine p and estimate the variance V(p̂) from the two experimental points v* = (0.6, 0.5).
TABLE 1

Method                 τ                       p̂         s²        V(p̂)     √V(p̂)
A                      1/2, 1                  1.2570     0.0052    0.115     0.34
                       1/3, 1                  1.1380     0.0046    0.060     0.25
                       zeros of P₂^(0,0)(t)    1.1364     0.0021    0.025     0.16
B                      1/2, 1                  1.2583     0.0052    0.091     0.30
                       1/3, 1                  1.1373     0.0046    0.051     0.23
                       zeros of P₂^(0,0)(t)    1.1368     0.0021    0.023     0.15
C                      (0, 0.5, 1)             1.1354     0.0022    0.014     0.12
Analytical solution    (0, 0.5, 1)             1.1441     0.0024    0.028     0.17
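The method C entry of Table 1 is easy to verify from eq. (37): with the full discretization matrix A* on t = (0, 1/2, 1), the data v̂ = (1, 0.6, 0.5) (the error free initial value included) and weights proportional to (1, 4, 1), a few lines reproduce H = 1.5809 and p̂_C = 1.1354. A common scale factor in the weights cancels in (37), so the unnormalized Simpson weights are used.

```python
import numpy as np

# Check of the method C row in Table 1: eq. (37) with A* on t = (0, 1/2, 1),
# v^ = (1, 0.6, 0.5) and Simpson-type weights (scale factor immaterial).

Astar = np.array([[-3.0, 4.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -4.0, 3.0]])
v = np.array([1.0, 0.6, 0.5])
w = np.diag([1.0, 4.0, 1.0])

H = v**2 @ w @ v**2                      # eq. (25); = 1.5809
p_C = -(v**2 @ w @ (Astar @ v)) / H      # eq. (37)
```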
The data are taken at t* = (0.5, 1), with v*(0) = 1 = y*(0) and Σ* = I. In Table 1, results obtained by the three collocation methods are compared with p̂ from the analytical solution. v* is at the right position for immediate use of method C (with the reasonable choice α = β = 0) since the zero of P₁^(1,1)(t) is t = 1/2. The Lobatto quadrature weights are proportional to (1, 4, 1) (the weights of Simpson's formula). Thus for method C

Σ* = Σ = I,    A* = [[−3, 4, −1], [−1, 0, 1], [1, −4, 3]]

H = (v̂²)ᵀ diag(1, 4, 1) (v̂²) = 1.5809

H₁ᵀ = {A* + 2 p̂_C diag(v̂)}ᵀ diag(1, 4, 1) (v̂²)

with H₁ Σ H₁ᵀ = 15.3535, so that by formula (24)

V(p̂_C) = (15.3535/1.5809²) s² = 6.143 s²

The difficulty lies in the solution, by a least squares technique, of the three collocation equations for y(1/2) and y(1) using p = p̂_C = 1.1354. An iterative scheme applied to Φ_N1(y_N1, p̂_C) converges to y_N1 = (0.6376, 0.4719)ᵀ for the initial guess y_N1 = v* = (0.6, 0.5)ᵀ. Finally, s² = 0.0022 by formula (26).

Methods A and B consistently give almost the same result for p̂, as expected from the discussion of these methods. The location of the collocation points is quite important in the present low order approximation for the profile. τ = (1/2, 1) is definitely not the best choice. If we insist upon using t = 1 as a collocation point, the other collocation point should be at the zero t = 1/3 of P₁^(1,0)(t), as shown in ref. 7 [p. 158]. A synthetic data point at t = 1/3 is found by interpolation, using the formula of ref. 7 [p. 106] for the polynomial q_N+1(t) through t = 0, 1/2 and 1. The estimate for p is considerably improved, but V(p̂) is much larger than obtained from the analytical solution. The best estimate for p with two experimental points is found if τ is chosen as the zeros of P₂^(0,0)(t) (i.e. t = (1 ± √3/3)/2) and the interpolation formula is used to fabricate v̂ values at these points. Note that no 'interpolation error' is introduced by the linear mapping (1). Our choice of τ is determined solely by a desire to obtain an accurate solution of the differential equation. When properly used, the three methods are seen to give almost the same results as the
estimation of p from the analytical solution. Consequently s² is a measure of the experimental error, no extra error being introduced by the approximate solution of the model. V(p̂) for methods A, B and the analytical solution is the same, while method C underestimates the variance of p̂ - a somewhat unhealthy sign. Much more serious objections to method C are raised in the appendix.
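The bottom row of Table 1 (the analytical solution fit of section 2.4) can be reproduced directly by minimizing φ₁* for v* = (0.6, 0.5) at t* = (0.5, 1); the golden-section search below is our own choice of minimizer.

```python
import numpy as np

# Direct maximum likelihood fit of the analytical solution y* = 1/(1 + p t)
# to the two data points of the numerical example (Sigma* = I).
# The golden-section search is our own choice, not the paper's algorithm.

tstar = np.array([0.5, 1.0])
vstar = np.array([0.6, 0.5])

def phi1(p):
    return np.sum((1.0 / (1.0 + p * tstar) - vstar) ** 2)

a, b = 0.0, 3.0                        # search bracket for p
g = (np.sqrt(5.0) - 1.0) / 2.0
for _ in range(200):
    c1, c2 = b - g * (b - a), a + g * (b - a)
    if phi1(c1) < phi1(c2):
        b = c2
    else:
        a = c1
p_hat = 0.5 * (a + b)
s2 = phi1(p_hat) / (len(tstar) - 1)    # n* - dim(p) degrees of freedom
```

The result agrees with Table 1: p̂ ≈ 1.1441 and s² ≈ 0.0024.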
TABLE 2

Method A, 12 equidistant points v*, σ² = 10⁻², 100 runs

N     |p̄ − p|    V(p̂)obs/σ²    V̄(p̂)/σ²    lim(σ→0) V(p̂)/σ²    F
2     0.6        22.6           63.0        21.4                 3.27
3     0.004      7.47           8.26        6.80                 1.06
4     0.008      1.37           6.75        6.34                 1.05
6     0.0001     7.60           6.48        6.48                 1.07
10    <10⁻⁵      8.66           6.50        6.48                 1.17
12    <10⁻⁵      (break down)   -           6.48                 -
3. SIMULATION RESULTS
Having described the theoretical background for three variants of a collocation based parameter estimation method, and having illustrated the mechanical operations that are necessary to obtain the statistical quantities associated with the estimates, we shall finally present some results from an extensive empirical study of the three methods as they work in practice. 'Experimental' data v* were generated from the exact solution of the models and overlaid with Gaussian white noise. In some cases many time series, each of n* 'measurements', were generated to give average values p̄ of p̂, average covariance matrix estimates V̄(p̂) based on one series of measurements, and finally an observed covariance matrix V(p̂)obs.

3.1. Second order irreversible reaction, y(0) = 1 and p = 2

Table 2 shows results for method A. 12 equidistant 'experimental' points v* with σ² = 0.01 were used to generate v̂ at an N component τ vector consisting of the zeros of P_{N−1}^(1,0)(t) and τ_N = 1. The simulation was repeated 100 times to obtain

p̄ = 0.01 Σᵢ p̂ᵢ

V(p̂)obs = (1/99) Σᵢ (p̂ᵢ − p̄)²

V̄(p̂) = 0.01 Σᵢ Gᵢ sᵢ²

F = 1 + 100 (p̄ − p)² / Σᵢ (p̂ᵢ − p̄)²    (39)
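The statistics condensing the 100 runs (p̄, V(p̂)obs and the centrality measure F of (39)) can be sketched as follows; the sample values below are synthetic stand-ins, not the simulations behind Table 2.

```python
import numpy as np

# Sketch of the simulation statistics of section 3.1: 100 estimates p_hat_i
# are condensed into the mean, the observed variance and F of eq. (39).
# The samples are synthetic stand-ins (unbiased, sigma = 0.1).

rng = np.random.default_rng(1)
p_true = 2.0
p_hat = p_true + 0.1 * rng.normal(size=100)   # stand-in for 100 fitted values

p_bar = p_hat.mean()
V_obs = np.sum((p_hat - p_bar) ** 2) / 99.0
F = 1.0 + 100.0 * (p_bar - p_true) ** 2 / np.sum((p_hat - p_bar) ** 2)
```

For an unbiased, symmetric estimator F stays close to 1, consistent with the remark below that F < 1.1 - 1.2 indicates a trustworthy result.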
In Table 2 the result of data condensation is studied. When the 12 data points are condensed to 2 points (at t = 1/2 and t = 1) the collocation approximation of the model is too inaccurate with the non-optimal collocation points chosen. On the other hand the use of N > 4 is entirely wasteful (due to the relatively large experimental error σ = 0.1), and at N = n* = 12 the algorithm (7) of method A does not converge from p₀ = 0. The centrality of the estimates is seen from the |p̄ − p| column to be excellent, and F < 1.1 - 1.2 corresponds to a trustworthy statistical result with no skewness of the estimates. We have included a column that shows the limit of V(p̂)/σ² for σ → 0. These values are calculated from G of eqn. (9), or by eqn. (38) after insertion of the N-point collocation solution for the correct p (= 2). It is a measure of the error of the collocation solution, and for N → ∞ and 12 equidistant points it should approach the value 6.43 computed from eqn. (38) with the analytical solution inserted.

This limit is approached for N = 6, indicating that no further accuracy is obtained in p̂ by increasing N beyond 6. If σ² decreases to 10⁻⁶, as in Table 3, equidistant collocation at the original 6 data points can be used, whereas this method breaks down for N > 4 when σ² = 10⁻².

TABLE 3

6 equidistant points v*, σ² = 10⁻⁶, n* = 6, 100 runs

τᵢ                                  V(p̂)obs/σ²    V̄(p̂)/σ²    F
Equidistant collocation             13.7          21.8        8.32
t = 1 and zeros of P₅^(1,0)(t)      13.2          12.3        1.00

The second line of Table 3 does, however, show
that by far the best results are obtained when the data base is transformed to conform with collocation at zeros of an orthogonal polynomial. In both tables better results would have been obtained if the data base had been transformed to (v̂₁, v̂₂, ..., v̂_N) at zeros of P_N^(0,0)(t). The difference is, however, small for N > 3 - 4. Method B (results not shown) gives virtually the same results as method A, but convergence is also obtained in the bottom line of Table 2.

3.2. Laminar-flow tubular reactor

The linear PDE (40) has been studied by Seinfeld [8], van den Bosch [4, 5] and by Cleland and Wilhelm [9] with the purpose of determining either or both of the parameters k and β:
β (1/r) ∂/∂r (r ∂y*/∂r) − k y* = (1 − r²) ∂y*/∂x    (40)

y* = 1 at x = 0;  ∂y*/∂r = 0 at r = 0 and at r = 1.
If only measurements of ȳ(x) = 2∫₀¹ (1 − u) y*(x, u) du (u = r²) are available it becomes exceedingly difficult to estimate β and k together [9] since ȳ is a very weak function of β. Consequently k can be well determined even with a rough estimate for β. If the measurements are taken along the tube wall, and also to a certain extent if the measurements are taken at different radial positions, it becomes dangerous to use a rough estimate for β from generalized formulas, since close to u = 1 the profiles change sharply with β [9]. To avoid distortion of the k value it is better to include both parameters in the estimation. Van den Bosch treats measurements of ȳ and of tube-wall y-values in [4, 5] and of y at different r-positions in [4]. He assumes β = 0.1 and determines k. We have generated the exact solution for β = 0.1 and k = 1 by the method of ref. 7 [chapter 4.3], using radial collocation at u = 1 (not optimal!) and at zeros of P₄^(0,0)(u), followed by matrix diagonalization and 'exact' computation of y* by means of the collocation eigenfunctions. The 'exact' results were corrupted by 10% relative white noise, and the data positions were those used by van den Bosch, ref. 4 [table 5], at zeros of (1 − r) r P₄^(2,2)(r) and of (1 − x) P₃^(0,3)(x). Subsequently β and k were determined using either method A or B, with τ₁ = zeros of (1 − u) P₃^(0,0)(u) and τ₂ = zeros of (1 − x) P₃^(0,0)(x). It appears that van den Bosch by method C obtained a standard deviation of 19% for the average estimate (= 0.953) of k in 16 subsequent runs. This corresponds to V(k) = (0.953 · 0.19)² (16 − 1) = 0.49, which is several orders of magnitude larger than our estimate V(k) = 3 · 10⁻⁴ obtained either by method A or B, for known β = 0.1 or for simultaneous determination of β and k. Our average estimate for k after 16 runs is less than 1.5 · 10⁻³ from the true k. The simulations were repeated with different numbers n_r* and n_x* of equidistantly spaced 'experimental' points transformed to the zeros of (1 − u) P₃^(0,0)(u) and (1 − x) P₃^(0,3)(x) as above. We find that the variance of β̂ and of k̂ decreases approximately linearly with the product n_r* n_x*, indicating an approximately equal importance of measurements in both directions. It is not surprising to find that the best x-position for one set of radial measurements is at the reactor outlet x = 1. A situation with only two radial measurements, at r = 0 and at r = 1, taken at 10 equidistant axial positions was also considered. The noise level was high (5% relative), but methods A and B both gave quite satisfactory results when N = 4 - 10 axial collocation points (zeros of (1 − x) P_N^(0,3)(x)) and two radial points u = (1 ± √3/3)/2 were used. For N > 4, V(k) ≈ 2·10⁻⁴ and V(β) ≈ 2·10⁻⁵. Van den Bosch also recommends that method C is used in situations with very sparse experimental data - e.g. measurements at r = 1 only, or of ȳ at different axial positions. He must then include a number of 'unknown' y-values at the collocation points, and the model solution for these unknowns is determined simultaneously with the parameters. This is entirely unreasonable.
The explicit formula (23) cannot be used, and the unknowns must be found by a multidimensional search routine, whereby the computational attractiveness of the method is lost. Michelsen [10] fits the wall heat transfer coefficient and a radial conductivity in a heat exchanger to a set of radially distributed measurements at the reactor outlet by 'exact' collocation solution of (40) without the chemical reaction term. His example is much more difficult than the present example since the
11 TABLE4 u= 0,~=&~/(3.6-6), Method
91 = ~~~-J'(~)~~2~-1 ‘Sfiff’sysfem
‘Soff’sysfem BA
A
Al
C
BA
A,
Al
C
k4 hi h
1.5 0.3 3.0 0.9 1.2 1.8
1.509 0.298 3.052 0.917 1.117 1.912
1.513 0.297 3.075 0.921 1.079 1.955
1.500 0.300 2.996 0.899 1.208 1.789
1.5 3.0 3.0 0.9 12.0 1.8
1.469 4.278 1.950 2.004 19.007 2.024
1.500 4.634 1.682 2.215 21.018 1.795
1.505 2.645 4.049 -0.162 8.711 1.758
S
7.10-16
1.8.10-4
2.5.10-4 9.8.10+
kl
kz k3
boundary condition at the wall changes with the parameter values, but he circumvents this difficulty and manages to work iteratively on the ‘exact’ solution. This points to a major weakness of method C and method A if inappropriately used. Including ‘unknown’ y-values to boost the collocation order leads to wasteful computation and a difficult iterative process for estimation of p and k. As stated in the conclusion of this study it might often be better to work out special tricks in each parameter estimation problem. 3.3. Three component reversible monomolecular reaction cyclus Luss and Wedel [ 111 have used the triangular reaction system (41) as a test example for method C in a reaction engineering graduate course.
(41)
A3
k4 It is assumed that [Ai], [A,], [As] = (yl*, y2*, y3*) = (1,0,O) at t = 0, and that the components of y are measured independently at certain t-values >O. Using the transformation of section 1 .l these measurements and the given (error free) initial value y. are used to generate v^at the zeros of (1 - t)P5(‘* l’(t). The Vi are supposed to be stochastically independent with the same variance CJ~.
2.9.10-16 5.9.10-S 7.2.10-3 2.7~10-~
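The triangular scheme (41) is linear in y, so its analytical solution can be generated directly. The sketch below is an assumption-laden illustration: the assignment of k1, ..., k6 to the six directed edges of the triangle is not spelled out in the text and is chosen here for definiteness, the 'soft' constants are those of Table 4, and the time grid is arbitrary.

```python
import numpy as np

def rate_matrix(k):
    # Assumed assignment of the six rate constants to the edges of the
    # triangle A1 <-> A2 <-> A3 <-> A1; the paper's own numbering may differ
    k1, k2, k3, k4, k5, k6 = k
    return np.array([[-(k1 + k6),  k2,          k5        ],
                     [  k1,       -(k2 + k3),   k4        ],
                     [  k6,        k3,         -(k4 + k5) ]])

def solve(k, t):
    # Analytical solution y(t) = exp(K t) y0 via eigendecomposition;
    # complex eigenvalue pairs cancel, so the real part is exact
    K = rate_matrix(k)
    lam, V = np.linalg.eig(K)
    c = np.linalg.solve(V, np.array([1.0, 0.0, 0.0]))  # y0 = (1, 0, 0)
    return np.real(V @ (np.exp(np.outer(lam, t)) * c[:, None]))

t = np.linspace(0.0, 2.0, 5)
y_soft = solve([1.5, 0.3, 3.0, 0.9, 1.2, 1.8], t)
print(y_soft)
```

Since the columns of the rate matrix sum to zero, the total concentration y1 + y2 + y3 = 1 is conserved at every t, a useful consistency check on any such simulation.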
The interpolation abscissas are optimal for method C, but not for methods A and B (better: τ = zeros of P_6^{(0,0)}(t)), but we have chosen to compare all three methods with the same set of collocation points. Table 4 collects the results for a 'soft' and a 'stiff' system (reaction between A1 and A3 speeded up by a factor 10). Method A1 is one iteration of method A starting with p_0 = 0. Method BA is a maximum likelihood fit on the analytical solution of the three coupled equations; hence the small s-value, of the order of the machine accuracy, when σ = 0. The results in Table 4 were compared with results for a more realistic simulation with σ = 0.001. All simulations were repeated 28 times to obtain an empirical estimate for the covariance matrix and for the standard deviation of the parameters that could be compared with estimates based on the formulas in section 2. Without showing the computed results we shall summarize the main observations.
A. 'Soft' system: (a) Methods A, A1, BA and C all gave results that deviate less than 8% from the exact results in Table 4, column BA. The standard deviation was much larger than in Table 4 (up to 80% for k5 and k6, which appear to be less accurately determined than the other components of k). The F-value of (39) was small (< 1.2), indicating a satisfactory centrality of the estimates. (b) The empirical correlation matrices are approximately the same for all four methods. The estimates obtained by (9) and (10) for the covariance matrix and variance of k in methods A and B are also approximately the
same as obtained from the corresponding empirical quantities.
B. 'Stiff' system: (a) Now all collocation based methods A, A1 and C give about the same results. They are all poor compared with those of method BA, but they are no worse than the corresponding results in Table 4 ('stiff' case), showing that an increase of σ from 0 to 0.001 has no effect on the results in this case. Obviously 6 point collocation is not sufficiently accurate to represent the profiles, and the s-values of Table 4 ('stiff' case) are considerably higher than σ = 0.001, indicating that the uncertain estimates of k are caused not by noisy measurements, but by an inadequate numerical method for solving the d.e. The correct experimental plan for the 'stiff' case is to make three experimental runs, starting at a different corner of the triangle (A1, A2, A3) in each run.
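The claim that 6 point collocation cannot represent the 'stiff' profiles can be checked by interpolating the exact solution at the measurement abscissas. In the sketch below the edge assignment of k1, ..., k6 and the time interval [0, 1] are assumptions; the interior abscissas use the identity that the zeros of P_5^{(1,1)}(x) are the zeros of P_6'(x).

```python
import numpy as np

def rate_matrix(k):
    # assumed edge assignment on the triangle A1 <-> A2 <-> A3 <-> A1
    k1, k2, k3, k4, k5, k6 = k
    return np.array([[-(k1 + k6),  k2,          k5        ],
                     [  k1,       -(k2 + k3),   k4        ],
                     [  k6,        k3,         -(k4 + k5) ]])

def y1_exact(k, t):
    # first component of the analytical solution with y0 = (1, 0, 0)
    lam, V = np.linalg.eig(rate_matrix(k))
    c = np.linalg.solve(V, np.array([1.0, 0.0, 0.0]))
    return np.real(V @ (np.exp(np.outer(lam, t)) * c[:, None]))[0]

# zeros of P_5^(1,1) = zeros of P_6', mapped to [0, 1]; with t = 0 and
# t = 1 added, these are the 7 abscissas of a degree-6 interpolant
x = np.polynomial.legendre.Legendre.basis(6).deriv().roots()
t_nodes = np.concatenate([[0.0], (x + 1.0) / 2.0, [1.0]])
t_fine = np.linspace(0.0, 1.0, 400)

errs = {}
for name, k in [('soft',  [1.5, 0.3, 3.0, 0.9, 1.2, 1.8]),
                ('stiff', [1.5, 3.0, 3.0, 0.9, 12.0, 1.8])]:
    coef = np.polynomial.polynomial.polyfit(t_nodes, y1_exact(k, t_nodes), 6)
    errs[name] = np.max(np.abs(
        np.polynomial.polynomial.polyval(t_fine, coef) - y1_exact(k, t_fine)))
print(errs)
```

Under these assumptions the interpolation error for the 'stiff' system is an order of magnitude larger than for the 'soft' system, consistent with the observation that the poor stiff-case estimates stem from the numerical approximation rather than from the measurement noise.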
4. CONCLUSIONS
We have argued that the simple collocation method A is an attractive alternative to a full collocation maximum likelihood method B. Method C is even simpler than method A since only one application of (23) yields the parameter estimate, but quite often one step in (7) gives a satisfactory p̂. If an accurate solution of the model is desired for the final maximum likelihood estimate, the first iterate of (7) may be used as initial value for method B which, like all Newton-Raphson methods, needs a fairly accurate estimate of the unknowns to ensure convergence. Neither method A nor method C should be used when the parameters enter non-linearly in the d.e. model since in that case they lose their computational attractiveness as compared with method B. The statistical quality of the estimates obtained by method A (and B) is excellent, as shown by comparison with estimates based on the analytical solution of the d.e. Formulas for estimation of the covariance matrix of p̂ are presented and we show that these estimates are close to those observed by repetition of the experimental runs. A close analogy between the three maximum likelihood estimates A, B and BA is shown in (38), while the statistical interpretation of method C is somewhat obscure, sometimes giving misleading estimates of the standard deviation of the parameters; as shown in the appendix, method C may even lead to results that are qualitatively wrong.
Methods A and B are both to be recommended particularly for solution of non-linear (in y*) boundary value models, e.g. reaction with diffusion problems, where the fast convergence of the collocation approximation makes them very attractive compared with the routines currently available in computer libraries. Since the differential equation model is often linear in the parameters while the analytical solution is very non-linear in p, it might well be advantageous to apply the collocation estimation methods even if the analytical solution is available, but obviously this is not always the case. We can see this most clearly in the linear PDE of section 3.2 if the data points are taken at a single value of one of the independent variables, e.g. at the reactor wall. It is ridiculous to construct fictitious 'data' points at other radial positions since this will add a vast number of superfluous elements to the parameter vector. In these cases collocation in the radial direction and integration in the axial direction (either analytically or, for non-linear problems, by a semi-implicit Runge-Kutta method) is likely to be faster than the cumbersome double collocation that must be used in the collocation based methods. These somewhat depreciating remarks should not be construed to mean that the methods have only a restricted application, but as always a judicious attack on each individual problem is likely to give much better results than the use of a rigid set of standard routines.
REFERENCES
1 D. M. Himmelblau, Process Analysis by Statistical Methods, Wiley, New York, 1968.
2 J. H. Seinfeld and L. Lapidus, Mathematical Methods in Chemical Engineering, Vol. 3, Prentice Hall, Englewood Cliffs, 1974.
3 J. H. Seinfeld and W. H. Chen, Chem. Eng. Sci., 26 (1971) 753.
4 B. van den Bosch and L. J. Hellinckx, AIChE J., 20 (1974) 250.
5 B. van den Bosch, in W. Harmon Ray and D. G. Lainiotis (eds.), Distributed Parameter Systems, Marcel Dekker, New York, 1978.
6 M. Crine, R. Gosset and B. Kalitventzeff, Proceedings Chemdata 77, Science Press, Princeton, 1977, p. 126.
7 J. Villadsen and M. L. Michelsen, Solution of Differential Equation Models by Polynomial Approximation, Prentice Hall, Englewood Cliffs, 1978.
8 J. H. Seinfeld, Chem. Eng. Sci., 24 (1969) 65.
9 F. A. Cleland and R. H. Wilhelm, AIChE J., 2 (1956) 489.
10 M. L. Michelsen, Chem. Eng. J., 18 (1979) 67.
11 D. Luss, private communication, University of Houston.
APPENDIX

We shall use a first order kinetic model to illustrate a fundamental weakness of method C (van den Bosch's method). Let

dy*/dt + p y* = 0 with y*(0) = 1   (A1)

and determine p using N + 1 experimental points v taken at t_1, t_2, ..., t_{N+1}, where t_1, t_2, ..., t_N are zeros of P_N^{(1,1)}(t) and t_{N+1} = 1. V(v_j) = σ², a given non-zero constant. The v_j are uncorrelated. We construct an interpolation polynomial q_{N+1}(t) of degree N + 1 that passes through (t = 0, v = 1) and through the points defined above. The residual Q_{N+1} in formula (21), with q_{N+1}(t) replacing y_{N+1}(t), is

Q_{N+1} = dq_{N+1}(t)/dt + p q_{N+1}(t)   (A2)

In method C we determine p by minimization of S, a weighted integral of Q_{N+1}² with a Jacobi weight determined by (α, β); for α = β = 0, solution of dS/dp = 0 gives

p̂ = ½(1 − q_{N+1}²(1)) / ∫₀¹ q_{N+1}²(t) dt   (A3)

The variance of the numerator, ¼V(q_{N+1}²(1)) = ¼{4 v_{N+1}² V(v_{N+1})} = v_{N+1}² σ², is the same for any value of N. For N → ∞ the denominator can be found exactly by a Lobatto quadrature

∫₀¹ q_{N+1}²(t) dt = Σ_{j=0}^{N+1} w_j v_j²

and

V(Σ_{j=0}^{N+1} w_j v_j²) = Σ_{j=0}^{N+1} w_j² V(v_j²) = 4σ² Σ_{j=0}^{N+1} w_j² v_j² < 4(max_j w_j) σ² Σ_{j=0}^{N+1} w_j v_j²

Now the weights of a Gaussian (or Lobatto) quadrature decrease to zero as 1/N when N → ∞. Consequently the variance of the denominator decreases to zero for N → ∞ and a finite σ², while the variance of the numerator remains constant. We arrive at the disconcerting result that V(p̂) remains constant even when the number of experimental points increases to infinity. This phenomenon will not occur with the maximum likelihood estimation method B, where an increase of the number N of experimental points, each with V(v_j) = σ², eventually must give V(p̂) → 0. The approximate maximum likelihood method A will give the same results as method B if convergence of the algorithm (7) is guaranteed. This may be troublesome when σ² is significantly different from zero, but extensive computer experimentation shows that convergence is obtained when (α, β) < (1, 1) while divergence does occur for high (α, β), e.g. α = 1, β = 3 as proposed by van den Bosch. The reasonable (α, β) to be used in methods A and B is, of course, α = β = 0.
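This disconcerting result can be checked by a small Monte Carlo experiment. The sketch below is illustrative only: it uses Gauss-Legendre rather than Lobatto weights (both decrease as 1/N), a hypothetical noise level σ, and the closed form (A3) with the denominator replaced by its quadrature approximation.

```python
import numpy as np

def std_method_c(p_true, N, sigma, trials, rng):
    # Gauss-Legendre nodes and weights mapped to [0, 1]; like the Lobatto
    # weights in the appendix they decrease as 1/N for N -> infinity
    x, w = np.polynomial.legendre.leggauss(N)
    t, w = (x + 1.0) / 2.0, w / 2.0
    # noisy 'measurements' of y* = exp(-p t) at the interior nodes and at t = 1
    v = np.exp(-p_true * t) + sigma * rng.standard_normal((trials, N))
    v_end = np.exp(-p_true) + sigma * rng.standard_normal(trials)
    # closed form (A3): numerator from the endpoint value alone,
    # denominator by quadrature over all interior values
    p_hat = (1.0 - v_end ** 2) / (2.0 * (v ** 2 @ w))
    return p_hat.std()

rng = np.random.default_rng(0)
sigma = 0.01
stds = {N: std_method_c(1.0, N, sigma, 20000, rng) for N in (5, 40)}
print(stds)  # both standard deviations stay of the order of sigma
```

The standard deviation of p̂ barely decreases when N grows from 5 to 40, far from the 1/√N behaviour one would expect of a maximum likelihood fit: the numerator of (A3) depends on a single noisy endpoint value, so its variance never averages out.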