Copyright © IFAC Identification and System Parameter Estimation, Beijing, PRC 1988
SOME OBSERVATIONS ON ROBUST STOCHASTIC ESTIMATION

G. C. Goodwin
Department of Electrical and Computer Engineering, University of Newcastle, N.S.W., Australia
Abstract: This paper examines several issues which arise in state and parameter estimation for linear systems having stochastic disturbances. A key objective will be to ensure that the discrete time results are formulated in such a way that they are consistent with the corresponding continuous time results when fast sampling is used.

Keywords: Stochastic systems, Kalman filters, parameter estimation, numerical analysis, discrete time systems.
INTRODUCTION

It is frequently argued that the widespread availability of microprocessors means that sampled data systems theory has a far greater practical significance than does continuous time systems theory. Whilst this statement is undoubtedly true, it is equally true that fast sampling rates are needed to avoid the inevitable information loss arising from the sampling process.

By fast sampling, we mean that the sampling frequency, f_s, should be greater than the bandwidth of any signal of interest. A suitable rule of thumb is to set the sampling frequency equal to five to ten times the maximum signal bandwidth. If this is not done, then control strategies, filters etc. will perform badly due to the loss of information between samples (Middleton and Goodwin (1988)).

Having decided that fast sampling is desirable, the logical question is what can be said about the resultant discrete time models. In particular, it seems desirable that all discrete time theory should converge smoothly to the underlying continuous time theory as the sampling rate increases.

Here we face a difficulty, since shift operator models with the associated z-transform do not allow a convenient representation of impulse functions and hence are incompatible with continuous time models and theory. Indeed, if one takes the limit of discrete time models and theory expressed in this domain, the results are usually meaningless.

To overcome these problems, it has been argued elsewhere (Peterka (1986a, 1986b), Goodwin (1985), Middleton and Goodwin (1987)) that the usual shift operator, q, should be replaced by an alternative operator called the Delta operator, δ, defined as

δ = (q − 1)/Δ   (1.1)

where Δ is the sampling period.

The principal advantages of this Delta operator over the Shift operator are: the underlying continuous time theory usually turns out to be the appropriate limit as Δ→0; significant improvements in numerical and computational accuracy result; and continuous time intuitions can be used to design discrete time systems (and vice versa).

The notion of the Delta operator and some of its elementary properties were introduced in a previous address (Goodwin (1985)). Since that time the idea has matured, and in the current paper we will explore, in some depth, its implications in stochastic estimation theory.

This paper will examine some issues in stochastic systems theory with particular emphasis on cases when fast sampling is used. We first show that the usual discrete time formulation of the Kalman Filter is inappropriate for fast sampling. We then show that these problems are resolved if the discrete Kalman Filter is reformulated in Delta form. This allows a unified treatment of continuous and discrete time systems.

We next show how these results on Kalman Filtering can be used to motivate appropriate models of stochastic systems in both discrete and continuous time. This gives further insight into the commonly used ARMAX models for discrete time systems and reveals several points of potential confusion. We then show how an appropriate theory of continuous time stochastic parameter estimation can be developed, and we use this to reflect on algorithms and methods of analysis used in the discrete time case.

Finally, we turn to the issue of numerical accuracy in optimal filtering and show how this is improved if Delta models are used rather than shift operator models. Related numerical issues associated with recursive parameter estimation are also examined briefly.
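As a quick numerical illustration of (1.1), the short sketch below (in Python, with purely illustrative signal and step-size choices) applies the Delta operator to a sampled sinusoid and checks that it approaches the time derivative when fast sampling is used:

```python
import numpy as np

def delta_op(x, dt):
    """Apply the Delta operator (q - 1)/dt to a sampled signal:
    (x[k+1] - x[k]) / dt, a divided difference that approaches
    the derivative as dt -> 0."""
    return np.diff(x) / dt

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
x = np.sin(2 * np.pi * t)

dx = delta_op(x, dt)                               # Delta-operator output
true_dx = 2 * np.pi * np.cos(2 * np.pi * t[:-1])   # continuous derivative

# For fast sampling the Delta operator tracks the derivative closely,
# whereas the raw shift output q x = x[k+1] stays close to x itself.
print(np.max(np.abs(dx - true_dx)))
```

The divided-difference interpretation is what makes continuous time intuition carry over directly to the discrete model.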
CONTINUOUS AND DISCRETE KALMAN FILTERING THEORY

We explore the connection between the continuous time Kalman Filter and the corresponding discrete results.
Consider a continuous-time, lumped, linear, time invariant stochastic system given by:

dx = Ax dt + dv   (2.1)
dz = Cx dt + dw   (2.2)

with x(0) = x₀, where x ∈ ℝⁿ, z ∈ ℝˡ, A ∈ ℝⁿˣⁿ, C ∈ ℝˡˣⁿ; E[x₀] = x̂₀, Cov[x₀] = P₀; v(t) ∈ ℝⁿ is a vector Wiener process with incremental covariance Ω dt; w(t) ∈ ℝˡ is a vector Wiener process with incremental covariance r dt; x(0), v(t) and w(t) are independent; Ω and P₀ are symmetric positive semi-definite matrices; and r is a symmetric positive definite matrix. (Note that Ω, r are actually the noise spectral densities.)
We shall consider (2.1), (2.2) as the underlying system and derive the discrete Kalman Filter by sampling (2.1), (2.2). A key aspect of this derivation is the specification of the sampling process. For example, it makes no sense to directly sample the output in (2.2), since this would lead to a discrete system having output noise of infinite variance! This problem is resolved by replacing the impractical ideal sampler by a practical form of sampling in which the signal is passed through an anti-aliasing low pass filter prior to sampling (Astrom (1970)). We shall denote the impulse response of this low pass filter by h(τ). For simplicity we assume that h(τ) is zero for τ ≥ Δ.
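The infinite-variance problem and its resolution can be illustrated numerically. The sketch below (hypothetical values throughout; it anticipates the simple averaging filter discussed later in this section) averages Wiener-process increments over each sampling interval and checks that the resulting measurement-noise variance is finite, of order r/Δ:

```python
import numpy as np

rng = np.random.default_rng(0)
r, dt = 1.0, 0.01          # noise spectral density and sampling period
n_fine = 100               # fine sub-steps used to mimic continuous time

# Increments of a Wiener process w(t) with incremental covariance r*dt:
# over each fine step of length dt/n_fine the increment has variance
# r*dt/n_fine.  10000 independent sampling intervals are simulated.
dw = rng.normal(0.0, np.sqrt(r * dt / n_fine), size=(10000, n_fine))

# 'Average' anti-aliasing filter: y = (1/dt) * integral of dz over one
# sampling interval; here z carries only the noise part.
y_avg = dw.sum(axis=1) / dt

# The sample variance is close to r/dt: finite for dt > 0, but growing
# like 1/dt as the sampling rate increases.
print(np.var(y_avg), r / dt)
```

An ideal sampler corresponds to letting the averaging window shrink to zero, which sends this variance to infinity, hence the need for the pre-sampling filter.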
We then have the following result:

Lemma 2.1: If the system (2.1), (2.2) is sampled at interval Δ using a pre-sampling filter of impulse response h(τ), then the sampled output response is described by the following stationary discrete time linear system (expressed in shift operator form):

x(k+1) = A_q x(k) + v_q(k)   (2.3)
y(k) = C_q x(k) + w_q(k),  with x(0) = x₀   (2.4)

where

A_q = Φ(kΔ+Δ, kΔ) = e^{AΔ}   (2.5)
C_q = ∫_{kΔ}^{kΔ+Δ} h(kΔ+Δ−τ) C Φ(τ, kΔ) dτ   (2.6)
Cov{v_q} = ∫₀^Δ Φ(Δ, τ) Ω Φ(Δ, τ)ᵀ dτ ≜ Q_q   (2.7)
Cov{w_q} = R₁(Δ) + R₂(Δ) ≜ R_q   (2.8)

with

R₁(t) = ∫₀ᵗ ∫₀ᵗ h(t−τ) R₃(τ−σ) h(t−σ)ᵀ dτ dσ   (2.9)
R₂(t) = ∫₀ᵗ h(t−τ) r h(t−τ)ᵀ dτ   (2.10)

and

Cov(v_q, w_q) ≜ E{v_q w_qᵀ}   (2.11)
= ∫₀^Δ ∫_τ^Δ Φ(Δ, τ) Ω Φ(σ, τ)ᵀ Cᵀ h(Δ−σ)ᵀ dσ dτ ≜ S_q   (2.12)

Proof: See Salgado et al. (1988). ∇∇∇

The optimal discrete time filter for the system (2.3), (2.4) is then given by the well known Discrete Kalman Filter:

x̂(k+1) = A_q x̂(k) + H_q(k)[y(k) − C_q x̂(k)]   (2.13)

with x̂(0) = x̂₀   (2.14), where

H_q(k) = [A_q P(k) C_qᵀ + S_q][C_q P(k) C_qᵀ + R_q]⁻¹   (2.15)

and where P(k) satisfies the following Discrete Riccati Difference Equation (DRDE):

P(k+1) = A_q P(k) A_qᵀ − [A_q P(k) C_qᵀ + S_q][R_q + C_q P(k) C_qᵀ]⁻¹[C_q P(k) A_qᵀ + S_qᵀ] + Q_q   (2.16)

A simple anti-aliasing filter which captures the essence of the required pre-sample filtering action is an 'average' filter obtained from a reset and integrate circuit, specifically

y(kΔ+Δ) = (1/Δ) ∫_{kΔ}^{(k+1)Δ} dz(t)   (2.17)

The corresponding impulse response of this filter is given by

h(t) = 1/Δ for 0 ≤ t ≤ Δ;  h(t) = 0 otherwise   (2.18)

From equation (2.7), it can be seen that for small Δ:

Q_q ≈ ΩΔ   (2.19)

Also, for the above typical pre-sampling filter, it is readily seen that for small Δ:

R_q ≈ r/Δ   (2.20)
S_q ≈ (Δ/2) Ω Cᵀ   (2.21)

Using the above results, we are able to see what happens to the discrete Kalman Filter in the limit as Δ→0. A major conclusion is that the shift form of the filter, as given in equations (2.13) to (2.16), is not appropriate when fast sampling is used. For example, as the sampling rate increases, we obtain the following results, which hold irrespective of the underlying continuous time system:

lim_{Δ→0} Q_q = 0   (2.22)
lim_{Δ→0} R_q = ∞   (2.23)
lim_{Δ→0} A_q = I   (2.24)
lim_{Δ→0} H_q = 0   (2.25)
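A minimal numerical check of these limits for a scalar system (all values illustrative):

```python
import numpy as np

# Scalar system dx = a*x dt + dv, with Cov(dv) = omega*dt and measurement
# noise spectral density r; hypothetical values for illustration.
a, omega, r = -1.0, 2.0, 0.5

for dt in [1.0, 0.1, 0.01, 0.001]:
    A_q = np.exp(a * dt)                          # (2.5)
    # Q_q = integral of e^{a s} * omega * e^{a s} ds over [0, dt]  (2.7)
    Q_q = omega * (np.exp(2 * a * dt) - 1) / (2 * a)
    R_q = r / dt                                  # order 1/dt, cf. (2.20)
    print(dt, A_q, Q_q, R_q)

# As dt -> 0: A_q -> 1 (identity), Q_q -> 0, R_q -> infinity,
# regardless of a, omega and r: the anomalies of the shift form.
```

The printed rows show that the shift-form quantities carry less and less information about (a, omega, r) as the sampling rate grows.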
These anomalous results are a direct consequence of the use of the shift operator to describe the discrete time model. The situation is remedied if we reformulate the discrete model using the Delta operator. In this case the discrete time system (2.3), (2.4) becomes
δx(k) = A_δ x(k) + v_δ(k)   (2.26)
y(k) = C_δ x(k) + w_δ(k)   (2.27)

where

A_δ = (A_q − I)/Δ,  C_δ = C_q   (2.28)

and where

v_δ(k) = v_q(k)/Δ   (2.29)
w_δ(k) = w_q(k)   (2.30)
Q_δ ≜ Cov{v_δ} = Q_q/Δ²,  R_δ ≜ Cov{w_δ} = R_q   (2.31)

From (2.19), (2.20), we see that for fast sampling, Q_q, R_q are typically of order Δ and 1/Δ respectively. Thus Q_δ, R_δ are typically both of order 1/Δ. Hence to achieve sampling 'independence' it is desirable to define 'spectral densities' Ω_δ, r_δ by dividing Q_δ, R_δ by the sampling frequency f_s = 1/Δ; i.e.

Ω_δ = Δ Q_δ   (2.32)
r_δ = Δ R_δ   (2.33)

With this notation, the discrete Kalman Filter becomes:

δx̂(k) = A_δ x̂(k) + H_δ(k)[y(k) − C_δ x̂(k)]   (2.34)
H_δ(k) = [(ΔA_δ + I) P_δ(k) C_δᵀ + S_δ][ΔC_δ P_δ(k) C_δᵀ + r_δ]⁻¹   (2.35)

where P_δ(k) satisfies the corresponding Delta form of the Riccati Difference Equation (2.36), (2.37).

In summary, the following result connects the continuous and discrete cases:

Lemma 2.2:

[A] The quantities appearing in the Delta form of the discrete model converge to the corresponding continuous quantities; i.e.

(i) lim_{Δ→0} A_δ = A   (2.38)
(ii) lim_{Δ→0} C_δ = C   (2.39)
(iii) lim_{Δ→0} Ω_δ = Ω   (2.40)
(iv) lim_{Δ→0} r_δ = r   (2.41)
(v) lim_{Δ→0} S_δ = 0   (2.42)

[B] The Delta form of the discrete filter converges to the following continuous time filter:

dx̂(t) = A x̂(t) dt + H(t)[dz(t) − C x̂(t) dt]   (2.43)

with x̂(0) = x̂₀   (2.44), and H satisfies:

H(t) = P(t) Cᵀ r⁻¹   (2.45)

where P(t) satisfies the following Continuous Riccati Differential Equation (CRDE):

Ṗ(t) = P(t) Aᵀ + A P(t) − P(t) Cᵀ r⁻¹ C P(t) + Ω   (2.46)

with P(0) = P₀.

Proof: Immediate from (2.28) to (2.35) on letting Δ→0. ∇∇∇

Using p to denote either d/dt (continuous time) or δ (discrete time), the Kalman Filter for both continuous and discrete time systems can be described in a unified way as

px̂ = A x̂ + H[y − C x̂]   (2.47)

where

H = [(ΔA + I) P Cᵀ + S][ΔC P Cᵀ + r]⁻¹   (2.48)

and where P satisfies a correspondingly unified Riccati equation (2.49), which contains an additional term Y that is of order Δ. A key aspect concerning the use of the operator p is that little conceptual error results if p is interpreted as d/dt even in discrete time. This will be a key feature of the subsequent discussion.
The Delta form indicates that the design and performance of the optimal filter are most conveniently expressed as functions of the noise spectral densities (Ω_δ, r_δ) rather than the noise variances, as seems to be the case in the alternative shift formulation. The conclusion that the Kalman filter design and performance depend on the noise spectral densities is consistent with recent results on estimation theory (see Ljung (1987)), which show, amongst other things, that estimation accuracy is only a function of the ratio of signal power to noise power at each frequency of interest.
We can draw a further inference from this discussion. In particular, say the noise is actually coloured with a finite bandwidth, say f_B. Provided this bandwidth exceeds the signal bandwidth, then the design and performance of the optimal filter are virtually unaltered provided they are expressed in terms of noise spectral densities. However, a redesign is necessary if the filter is expressed in the more conventional way using variances (Anderson and Moore (1979)). For steady state filtering problems, we are interested in the limiting behaviour as t→∞. Under mild regularity conditions, it is known that P converges to a unique positive semi-definite solution, known as the 'Strong Solution' of the corresponding Algebraic Riccati Equation (ARE), obtained by setting pP = 0 in (2.49). There exist many ways of explicitly evaluating the strong solution of the ARE. A common method uses the following result:
Lemma 2.3: Consider the following Hamiltonian matrix:

M = [ A + ΔΩ(ΔA+I)⁻ᵀCᵀr⁻¹C    Ω(ΔA+I)⁻ᵀ
      (ΔA+I)⁻ᵀCᵀr⁻¹C          ((ΔA+I)⁻ᵀ − I)/Δ ]   (2.50)

[A] This matrix has the property that the eigenvalues of M can be grouped into two disjoint subsets Γ₁ and Γ₂ such that for every λ_c ∈ Γ₁ there exists a λ_d ∈ Γ₂ such that λ_c + λ_d + Δλ_cλ_d = 0. We can thus choose either Γ₁ or Γ₂ to contain only those eigenvalues which are inside or on the stability boundary.

[B] The strong solution of the ARE can be obtained by choosing [X₁₁; X₂₁] ∈ ℝ²ⁿˣⁿ to span the nth order stable invariant subspace. Then the strong solution is

P = X₂₁ X₁₁⁻¹   (2.51)

[C] For continuous time systems, the above result holds with Δ = 0 on noting that

lim_{Δ→0} M = [ A    Ω
                Cᵀr⁻¹C    −Aᵀ ]   (2.52)

∇∇∇

Again we see that the continuous and discrete results can be expressed in a unified fashion as in (2.50).

ARMAX MODELS REVISITED

The Kalman filter is a basic building block in the formulation of black box models for stochastic systems. In particular, the usual ARMAX or transfer function type models are simply a convenient way of expressing the innovations form of the Kalman filter.

If we define the innovations process, ε, by ε = y − Cx̂, then the steady state Kalman filter (2.47) can be rewritten as

px̂ = A x̂ + B u + K ε   (3.1)
y = C x̂ + ε   (3.2)

where we have used K to denote the steady state value of H_t. Note that the innovations process, ε, is a white 'noise' process with spectral density

Φ_ε = ΔC P Cᵀ + r   (3.3)

where C, P and r are as defined in section 2. For the continuous time case, (3.1), (3.2) should be interpreted as a shorthand notation for the following incremental model:

dx̂ = A x̂ dt + B u dt + K dw̄   (3.4)
dz = C x̂ dt + dw̄   (3.5)

where dw̄ is an independent increment process with incremental variance r dt. (Formally, y = dz/dt and ε = dw̄/dt.)

To convert (3.1), (3.2) to a more convenient form for black box modelling, we may assume, without loss of generality, that the model is in observer form, i.e. A, B, K have the following structure:

A = [ −a_{n−1}  1  ⋯  0
      ⋮             ⋱
      −a₁      0  ⋯  1
      −a₀      0  ⋯  0 ],   B = [ b_{n−1} ; ⋯ ; b₀ ],   K = [ k_{n−1} ; ⋯ ; k₀ ]   (3.6)

With this model structure, we can successively eliminate the state vector x̂ to yield a model expressed purely in terms of input and output quantities. This yields the following left matrix fraction representation (Goodwin and Sin (1984), Kailath (1980)):

A(p) y = B(p) u + C(p) ε   (3.7)

where

A(p) = pⁿ + a_{n−1} p^{n−1} + ⋯ + a₀   (3.8)
B(p) = b_{n−1} p^{n−1} + ⋯ + b₀   (3.9)
C(p) = pⁿ + c_{n−1} p^{n−1} + ⋯ + c₀   (3.10)

with

c_i = k_i + a_i,  i = 0, …, n−1   (3.11)

For estimation purposes, it is desirable to re-express (3.7) in Fractional form (Francis (1987)). To this end, let E(p) be a stable polynomial. Then (3.7) can be written as

E y = (E − A) y + B u + C ε   (3.12)

where we have dropped the explicit dependence on p for convenience.
Dividing (3.12) throughout by E gives

y = ((E − A)/E) y + (B/E) u + (C/E) ε   (3.13)

This model is the appropriate form of the continuous or discrete ARMAX model with observer polynomial E. In discrete time theory, it is common to use E = qⁿ where q is the shift operator. However, this is an inappropriate choice, because qⁿ has no continuous counterpart and implies 'near differentiations' in the model (3.13) in discrete time. Thus the more general observer polynomial, E, is important in defining the generalized ARMAX model.
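To make the role of the observer polynomial concrete, the following sketch (with an illustrative first-order choice E(p) = p + λ, not taken from the paper) forms the filtered signal y/E in Delta form; a stable E low-pass filters the data, whereas E = qⁿ would merely delay the raw samples:

```python
import numpy as np

def filter_by_E(y, lam, dt):
    """Form y_f = y / E(p) for the illustrative first-order observer
    polynomial E(p) = p + lam, realized in Delta form:
        delta y_f = y - lam * y_f
    i.e.  y_f[k+1] = y_f[k] + dt * (y[k] - lam * y_f[k])."""
    y_f = np.zeros_like(y)
    for k in range(len(y) - 1):
        y_f[k + 1] = y_f[k] + dt * (y[k] - lam * y_f[k])
    return y_f

dt, lam = 0.01, 2.0
t = np.arange(0.0, 10.0, dt)
y = np.ones_like(t)          # unit step input

y_f = filter_by_E(y, lam, dt)
# A stable E low-pass filters the data: the step settles at 1/lam,
# mirroring the continuous filter 1/(p + lam) as dt -> 0.
print(y_f[-1])
```

The same recursion, applied to each entry of the regression vector below, produces the filtered quantities y/E, u/E etc. used in the estimation algorithms.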
PARAMETER ESTIMATION

We next turn to the problem of estimating the coefficients in the model (3.13). We will illustrate the basic form of algorithm that can be employed by discussing the class of algorithms known as pseudo linear regressions (Ljung (1987), Goodwin and Sin (1984)). In this regard, a key observation regarding the model (3.13) is that it can be written as a linear regression as follows:

y = φᵀθ   (4.1)

where

φᵀ = [ p^{n−1}y/E, …, y/E, p^{n−1}u/E, …, u/E, p^{n−1}ε/E, …, ε/E ]   (4.2)

and θ contains the coefficients of the polynomials (E − A), B and (C − E)   (4.3).

Equation (4.1) is not quite in a form suitable for parameter estimation, due to the fact that the regression vector, φ_t, depends upon the unmeasured innovations process, ε_t. This can be circumvented by replacing ε_t by an on-line estimate, ε̂_t, where

ε̂_t = y_t − ŷ_t   (4.4)

Substituting ε̂ for ε in (4.2) gives

ψ_tᵀ = [ p^{n−1}y/E, …, y/E, p^{n−1}u/E, …, u/E, p^{n−1}(y−ŷ)/E, …, (y−ŷ)/E ]   (4.5)

with

ŷ_t = ψ_tᵀ θ̂_t   (4.6)

Remark 4.1: In discrete time, it is sometimes suggested that ŷ in (4.5) be replaced by an a-posteriori prediction, ȳ, defined analogously (4.7). Clearly the distinction between a-priori and a-posteriori predictions disappears as Δ→0, and thus this replacement will be inconsequential in a general theory of discrete and continuous time estimation. We will discuss further why this is so below. For the moment, we will use the general form (4.5), (4.6). ∇∇∇

It is now possible to define a suitable parameter estimator for the vector, θ. For example, the gradient algorithm has the form:

pθ̂_t = a ψ_t [y_t − ψ_tᵀθ̂_t] / r̄_t   (4.8)

where a > 0 and r̄_t is a normalization constant. Again this equation needs to be interpreted as a compact way of writing the following algorithms:

Discrete time: θ̂(k+1) = θ̂(k) + aΔ ψ(k)[y(k) − ψ(k)ᵀθ̂(k)] / r̄(k)   (4.9)
Continuous time: dθ̂_t = a ψ_t [y_t − ψ_tᵀθ̂_t] / r̄_t dt   (4.10)

For pedagogical reasons, we will continue with the formal notation given in (4.8). A rigorous treatment of the continuous case based on state space models and (4.10) is available in Gevers et al. (1987).

Remark 4.2: A common choice for the constant r̄_t in (4.8), in the discrete time case, is

r̄_t = r̄_{t−1} + ψ_tᵀψ_t,  with r̄₀ = c > 0   (4.11)

However, this does not seem to be appropriate for the limiting continuous time problem, since it does not allow one to establish existence of the solutions of (4.10). Hence, it has been recently argued in Gevers et al. (1987) that the normalization should be changed to a stronger form (4.12). ∇∇∇

With this choice of normalization constant, one can establish meaningful properties of the algorithm (4.8). For example, one can use discrete or continuous Martingale Theory (Doob (1953)) to obtain properties of the algorithm. To this end, it is usual to define a parameter error vector, θ̃, and the deterministic part of the prediction error, η, by

θ̃ = θ̂ − θ   (4.13)
η = y − ŷ − ε   (4.14)

Equation (4.8) can then be re-expressed in error form as:
pθ̃_t = a ψ_t η_t / r̄_t + a ψ_t ε_t / r̄_t   (4.15)

The analysis then proceeds by evaluating p(θ̃ᵀθ̃). Using (4.15) we immediately have

p(θ̃_tᵀθ̃_t) = 2a θ̃_tᵀψ_t(η_t + ε_t)/r̄_t + Δa²(ψ_tᵀψ_t)η_t²/r̄_t² + 2Δa²(ψ_tᵀψ_t)η_tε_t/r̄_t² + Δa²(ψ_tᵀψ_t)ε_t²/r̄_t²   (4.16)

Remark 4.3: Equation (4.16) is strictly only valid for the discrete time case. In the limit as Δ→0, the first two Δ-weighted terms in (4.16) disappear, as would be expected. However, caution is necessary with the third Δ-weighted term. The problem is that ε_t has a spectral density which converges to r as Δ→0 (see (3.3)); hence the variance of ε_t is approximately r/Δ. Indeed, application of the Ito rule for the continuous case shows that this term should be replaced by

a² (ψ_tᵀψ_t) r / r̄_t²   (4.17)

(Actually, a simple discrete convergence result can be established (Astrom (1970)), making this replacement more plausible.) Thus, in continuous time, equation (4.16) reduces to the corresponding expression with the Δ-weighted terms replaced by (4.17). ∇∇∇

The main remaining problem with the analysis of (4.8) (or (4.10)) is the cross product term between ψ_tᵀθ̃_t and η_t. This term is handled by noting that, in both continuous and discrete time, we have from (4.1), (4.5), (4.14) that

η_t = −(E/C) ψ_tᵀθ̃_t   (4.18)

Thus the sign of the product term, (ψ_tᵀθ̃_t)η_t, can be 'fixed on average' by assuming that the polynomial E(p) is chosen so that E/C satisfies a positive real condition. The following remarks now apply:

Continuous Time: In this case, it suffices to have the product term 'negative on average', and hence the sufficient condition usually imposed for convergence is that

E/C is Input Strictly Passive   (4.19)

Discrete Time with A-Priori Errors in Regression Vector: In this case, in addition to the product term, we also have the first Δ-weighted term in (4.16), which depends upon η_t². Hence, to handle this term, the sufficient condition for convergence must be strengthened to

E/C − aΔ/2 is Input Strictly Passive   (4.20)

Discrete Time with A-Posteriori Errors in Regression Vector: The algorithm was briefly introduced in Remark 4.1. In this case, (4.18) is replaced by

η̄_t = −(E/C) ψ̄_tᵀθ̃_t   (4.21)

where ψ̄_t and η̄_t denote the regression vector and deterministic error formed from the a-posteriori predictions (4.22). A little manipulation (see Goodwin and Sin (1984)) now shows that the corresponding η² term in (4.16) can be eliminated, and hence the sufficient condition for convergence reduces to the one given in (4.19).

Remark 4.4: From the above discussion, we see that with fast sampling there is no substantive difference between the use of a-priori errors or a-posteriori errors in the regression vector, since both converge to the continuous time case as Δ→0. This is entirely in accord with one's intuition.

Remark 4.5: It is also important to note the significant role played by the observer polynomial, E, in satisfying the positive real condition (4.19). Indeed, the polynomial E should be 'close to' the polynomial C for the algorithm to work. Certainly, the choice E = qⁿ = (Δp+1)ⁿ commonly made in discrete time seems to be inappropriate.

With the above restrictions, we can use Martingale analysis to prove (Gevers et al. (1987)) that the following properties of the parameter estimator hold, irrespective of the nature of the data generating mechanism (expressed in continuous form).

Lemma 4.1 (Properties of the Parameter Estimator):

(i) lim sup_{t→∞} ‖θ̃_t‖ ≤ k₁ < ∞  a.s.   (4.23)
(ii) lim_{t→∞} ∫₀ᵗ (η_τ² / r̄_τ) dτ ≤ k₂ < ∞  a.s.   (4.24)
(iii) For all finite Δ′, lim_{t→∞} sup_{0≤T≤Δ′} ‖θ̃_{t+T} − θ̃_t‖ = 0   (4.25)

∇∇∇
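As a deliberately simplified illustration of the normalized gradient update (4.8), the sketch below estimates a plain linear regression with directly measured regressors; the innovations-dependent regressor of the pseudo linear regression is replaced by a white probing signal, and all numerical values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
a_gain = 1.0                       # gain a > 0
theta_true = np.array([2.0, -1.0]) # hypothetical 'true' parameters

theta_hat = np.zeros(2)
r_bar = 1.0                        # normalization, r_bar_0 = c > 0

for k in range(20000):
    psi = rng.normal(size=2)       # stand-in regressor (white, for brevity)
    y = psi @ theta_true + 0.1 * rng.normal()   # noisy scalar observation
    r_bar += psi @ psi             # running normalization, as in (4.11)
    eta = y - psi @ theta_hat      # a-priori prediction error
    # normalized gradient step: the discrete counterpart of (4.8)
    theta_hat = theta_hat + a_gain * psi * eta / r_bar

print(theta_hat)
```

Because the update is divided by the growing normalization r_bar, the step size shrinks over time and the estimate settles near the true parameters, consistent with the boundedness and convergence properties listed in Lemma 4.1.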
These results can then be used in many ways, e.g. to prove global stability of discrete (Goodwin, Ramadge, Caines (1981)) and continuous (Gevers et al. (1987)) stochastic adaptive control laws, using the following result which links the parameter estimator properties with the control law:
Lemma 4.2 (Stochastic Key Technical Lemma): Suppose the estimator is such that (4.24) is satisfied, and the controller is such that the following growth condition is satisfied:

r̄_t / t ≤ C + (k/t) ∫₀ᵗ η_τ² dτ  a.s.   (4.26)

where C, k are positive constants. Then

(i) lim sup_{t→∞} r̄_t / t ≤ k₃ < ∞  a.s.
(ii) lim_{t→∞} (1/t) ∫₀ᵗ η_τ² dτ = 0  a.s.

Proof: Direct application of the Bellman-Gronwall Lemma; see Gevers et al. (1987). ∇∇∇

Remark 4.6: We note that it is implicit in the innovations model (3.1), (3.2) that the relative degree of the noise model in (3.7) will always be zero for both discrete and continuous time. This will be the case if one assumes that the measurement noise in the model (2.2) is non-zero. In practice it is always desirable to impose this restriction, so as to account for measurement imperfections, e.g. A/D converter quantization errors (Wahlberg (1987)). From a theoretical point of view, it is possible to hypothesise a data generation mechanism of the form (3.7) in which C(p) has degree n_c < n. This can be converted into the standard problem format by defining ȳ = E₁y, where E₁ is any stable polynomial of degree n − n_c; equation (3.7) then becomes A ȳ = B(E₁u) + (CE₁)ε, whose noise polynomial CE₁ has degree n. If we then put E = E₁E₂, where E₂ has degree n_c, then (3.13) reduces to the corresponding fractional model in ȳ. The theory then proceeds exactly as above, but with the number of parameters reduced to n + (n_b + 1) + n_c and with the Positive Real condition required on E₂/C rather than E/C.

Remark 4.7: In practice, it is preferable to use a least squares estimator rather than the simple gradient scheme given in (4.8). Actually, the least squares estimator can be motivated by the corresponding discrete or continuous Kalman Filter. To illustrate this point, consider again the model (4.1), but let us assume for simplicity that C = E. We now add a simple model for the parameter variations, e.g.

pθ_t = v_t   (4.27)

where v_t has spectral density Θ. Equation (4.1) in this case simply becomes:

y_t = ψ_tᵀθ_t + ε_t   (4.28)

where

ψ_tᵀ = [ p^{n−1}y/E, …, y/E, p^{n−1}u/E, …, u/E ]   (4.29)

and where ε_t has spectral density r.

Application of the Kalman Filter (2.47), (2.48) to the model (4.27), (4.28) immediately leads to the following least squares parameter estimator:

pθ̂_t = H_t [y_t − ψ_tᵀθ̂_t]   (4.30)

where

H_t = P_t ψ_t [Δψ_tᵀP_tψ_t + r]⁻¹   (4.31)

and where P_t satisfies

pP_t = Θ − P_t ψ_t ψ_tᵀ P_t [Δψ_tᵀP_tψ_t + r]⁻¹   (4.32)
Note that Y in (2.49) is zero for the model (4.27), (4.28). Equations (4.30) to (4.32) give a very simple form of an appropriate least squares parameter estimator for use with either continuous or discrete problems. Note again the dependence on noise Spectral Densities rather than variances. Also, it is clear that (4.30), (4.31) can be simplified for Δ→0. By making minor variations to (4.30), (4.31), (4.32), other related algorithms can be readily developed, e.g. constant trace algorithms, exponential data weighting, exponential forgetting and resetting algorithm (EFRA) etc.

NUMERICAL ISSUES IN OPTIMAL FILTERING AND ESTIMATION

We have seen that use of the general operator p allows both discrete and continuous estimation problems to be described in a unified framework. Moreover, the continuous time results are embedded in the corresponding discrete results when the latter are properly formulated. Use of the operator p = δ in discrete time makes explicit near differentiations which are often overlooked when the shift operator is used.

We next turn to the question of numerical properties of discrete time algorithms. It is generally true that the Delta form has superior numerical properties relative to the shift form. These enhanced numerical properties basically result from the centering of calculations by shifting the origin to the point 1 + j0 in the z-plane. The superior numerical performance of the Delta form of both the Algebraic Riccati Equation and the Riccati Difference Equation is established in Salgado et al. (1987) and Middleton and Goodwin (1988). Here we will illustrate the conclusions using a simple example.
Example 5.1 (Riccati Difference Equation): Consider a system having transfer function H(s) = 1/(s(s+1)), with continuous state space model in observer form. The Riccati equation was solved using 5 mantissa bits to represent A_q and A_δ. The following constants were also used: R = 1, Q = I, Δ = 0.075, P₀ = I. Figure 5.1 shows the propagation of the relative error defined as

e = ‖[P(k)]_FP − P(k)‖_F / ‖P(k)‖_F   (5.1)

where [P(k)]_FP and P(k) denote the floating point and "infinite" precision solutions of the Riccati equation, and ‖·‖_F denotes the Frobenius norm. It can be seen from the figure that the Delta formulation leads to a significant improvement in the relative error in the computation of the Riccati Difference Equation. ∇∇∇

Figure 5.1: Comparison of Errors for Dynamic Riccati Equation Using Delta (δ) and Shift (q) Operators.
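The representation advantage behind this example can be reproduced in a few lines. The sketch below (scalar system, illustrative values) rounds A_q and A_δ to 5 mantissa bits and compares how well the continuous-time pole can be recovered from each:

```python
import math

def chop(x, bits):
    """Round x to a floating-point value with 'bits' mantissa bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2**bits) / 2**bits, e)

a, dt, bits = -1.0, 0.075, 5          # pole, sampling period, precision

A_q = math.exp(a * dt)                # shift-form system matrix
A_d = (A_q - 1.0) / dt                # Delta-form system matrix

# Continuous-time pole recovered from each 5-bit representation:
a_from_q = math.log(chop(A_q, bits)) / dt
a_from_d = math.log(1.0 + dt * chop(A_d, bits)) / dt

print(a_from_q, a_from_d)
```

Because A_q is crowded near 1 for fast sampling, rounding it destroys most of the information about the pole, whereas the Delta form stores the (scaled) deviation from 1 and retains it. This is the "centering about 1 + j0" effect described above.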
Example 5.2 (Algebraic Riccati Equation): Consider the same system as in Example 5.1 with the following parameters: R = 1, Q = I. The steady state solution of the Riccati equation was found using the eigenvector method (Lemma 2.3) for sampling period Δ = 0.075 and for mantissa lengths ranging from 5 to 24 bits. Figure 5.2 compares the relative error between Delta and shift forms of the Riccati equation. It is clear from the figure that the Delta performance is better. We also see from the figure that, for a fixed number of mantissa bits, the Delta form gives errors which are roughly an order of magnitude smaller than those obtained with the shift form. Alternatively, the same accuracy is obtained using 4 fewer mantissa bits in Delta form than in shift form. ∇∇∇

Figure 5.2: Comparison of Errors for Algebraic Riccati Equations Using Delta (δ) and Shift (q) Operators with Δ = 0.075.

Turning to the problem of parameter estimation, the shift form of models simply cannot be used with fast sampling. This is because, in shift form, the regression vector depends upon terms of the form y_t, y_{t−1} etc. However, in the limit as Δ→0, these quantities always become perfectly correlated and hence a singular estimation problem results. On the other hand, the Delta form of the discrete estimation problem converges as Δ→0 to the continuous time problem, which is well formulated (Edmunds (1985), Young (1981)).

ALTERNATIVE PARAMETERIZATIONS
As pointed out in the last section, shift operator models are generally unsuitable for parameter estimation, especially when fast sampling is used. On the other hand, the Delta model coefficients are readily estimated. Yet the Delta coefficients are related to the shift operator coefficients by a simple linear transformation. For example, when n = 3, we have

[ a₂^q ]   [  Δ     0     0  ] [ a₂^δ ]   [ −3 ]
[ a₁^q ] = [ −2Δ    Δ²    0  ] [ a₁^δ ] + [  3 ]   (6.1)
[ a₀^q ]   [  Δ    −Δ²    Δ³ ] [ a₀^δ ]   [ −1 ]
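The transformation (6.1) can be cross-checked by direct substitution of δ = (q − 1)/Δ into the Delta-form polynomial (illustrative coefficient values):

```python
import numpy as np

dt = 0.1
a_d = np.array([0.8, -0.5, 0.3])     # [a2_delta, a1_delta, a0_delta]

# Transformation (6.1) for n = 3:
T = np.array([[dt,      0.0,    0.0],
              [-2*dt,   dt**2,  0.0],
              [dt,     -dt**2,  dt**3]])
offset = np.array([-3.0, 3.0, -1.0])
a_q = T @ a_d + offset

# Cross-check by direct substitution delta = (q - 1)/dt:
# dt^3 * A_delta((q-1)/dt) should equal q^3 + a2_q q^2 + a1_q q + a0_q.
qm1 = np.poly1d([1.0, -1.0])         # the polynomial (q - 1)
poly = qm1**3 + a_d[0]*dt*qm1**2 + a_d[1]*dt**2*qm1 + a_d[2]*dt**3
a_q_direct = poly.coeffs[1:]          # drop the leading (monic) q^3 term

print(a_q, a_q_direct)
```

Both routes give the same shift coefficients, confirming that the matrix in (6.1) is just the change of basis between powers of δ and powers of q.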
This leads to the question: do there exist other linear transformations of parameters which have even better numerical properties? An infinite number of transformations are possible. All have the same statistical properties, but the numerical properties will differ. To investigate the best form of linear transformation, we note that the least squares estimate depends on the inversion of the information matrix, M, where

M = E{ψ_t ψ_tᵀ}   (6.2)

To proceed further, we will need to use a-priori information about the nominal system parameters and the nature of the input. However, it is interesting to see how this a-priori information might be utilized to improve the performance of parameter estimation algorithms. Say we have an a-priori estimate, M̄, of M in equation (6.2); then we can find a (non-unique) D such that

M̄ = D Dᵀ   (6.3)

This suggests that an appropriate linear transformation of parameters to achieve the best conditioning of the least squares equations would be to use:

θ̄ = Dᵀ θ   (6.4)

which corresponds to using a new regression vector ψ̄, where

ψ̄ = D⁻¹ ψ   (6.5)

so that

y = ψᵀθ = ψ̄ᵀθ̄   (6.6)

The new information matrix, M_ψ̄, is then

M_ψ̄ = D⁻¹ M D⁻ᵀ   (6.7)

We see that if M̄ ≈ M, then M_ψ̄ ≈ I. To 'centre' the least squares algorithm as outlined above, we need to have an a-priori estimate of M. This can be obtained by using the nominal values of A and B, together with an a-priori estimate of the nature of the system input signal over the bandwidth of interest.

Further insight can be gained into the parameterization problem by examining in more detail the form of the regression vector, ψ, given in equation (4.29). This vector can be related to the input u by factoring out 1/(A₀E) as follows:

ψᵀ = (1/(A₀E)) [ p^{n−1}A₀y, …, A₀y, p^{n−1}A₀u, …, A₀u ]   (6.8)

where A₀, B₀ are the nominal (a-priori) values of A, B. Further, A₀y ≈ B₀u; hence (6.8) can be written as

ψ ≈ (1/(A₀E)) 𝒮₀ᵀ [ p^{2n−1}, …, p, 1 ]ᵀ u   (6.9), (6.10)

The 2n×2n matrix on the right hand side of (6.10) will be recognized as the Sylvester Matrix, 𝒮₀ (Goodwin and Sin (1984)), associated with the polynomials A₀, B₀. If these polynomials are co-prime, then this matrix is nonsingular. Hence we can define a new regression vector, ψ̄, and parameter vector, β, by

ψ̄ = 𝒮₀⁻ᵀ ψ,  β = 𝒮₀ θ   (6.11)

We then have the following results regarding the nominal vectors ψ̄ and β.

Lemma 6.1:

(i) The regression vector ψ̄ nominally satisfies

ψ̄ = (1/(A₀E)) [ p^{2n−1}, …, p, 1 ]ᵀ u   (6.12)

(ii) For a stationary input, the new information matrix has a near zero entry for every second element. In particular, it has a structure (illustrated for 4 parameters) as follows:

M_ψ̄ ≈ ∫_{ω₁}^{ω₂} [  ω⁶   0   −ω⁴   0
                      0    ω⁴   0   −ω²
                     −ω⁴   0    ω²   0
                      0   −ω²   0    1  ] Φ_u(ω) dω   (6.13)

where (ω₁, ω₂) is the pre-selected bandwidth and Φ_u is the input spectral density.

(iii) The nominal values of the coefficients of β are the coefficients of the polynomial EB₀.

Proof: (i), (ii): Immediate from (6.11) and (6.12). The zero pattern in (6.13) is exactly true in continuous time, since pⁱz_t and pʲz_t are always uncorrelated for (i − j) odd; the result is approximately true in discrete time for small Δ. (iii): From (4.3), (6.11) and (6.12) we have

[ p^{2n−1}, …, p, 1 ] β = B₀(E − A) + A₀B   (6.14)
= B₀E when A = A₀, B = B₀   (6.15)

∇∇∇

Lemma 6.1 part (iii) suggests that a further improvement is possible in the transformation (6.11) by subtracting the nominal value of β from the new parameter vector:

β′ = β − β₀   (6.16)

where β₀ is the coefficient vector of EB₀. In this case, the nominal value of β′ is zero. Hence the transformation (6.16) achieves the dual role of centering the parameter estimates on zero and giving an information matrix as in (6.13).
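A small sketch of the centering idea in (6.3) to (6.7), using a hypothetical M̄ and Gaussian regressors: factoring M̄ = DDᵀ by Cholesky and transforming the regressors drives the information matrix towards the identity:

```python
import numpy as np

rng = np.random.default_rng(2)

# A-priori estimate M_bar of the information matrix (hypothetical values):
M_bar = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

# Factor M_bar = D D^T as in (6.3); Cholesky gives one (non-unique) D.
D = np.linalg.cholesky(M_bar)

# Regressors drawn so that their second moment is (approximately) M_bar:
psi = rng.multivariate_normal(np.zeros(2), M_bar, size=50000)

# Transformed regressors psi_bar = D^{-1} psi, as in (6.5):
psi_bar = np.linalg.solve(D, psi.T).T

# The new information matrix D^{-1} M D^{-T} is then close to identity,
# i.e. the least squares problem is well conditioned.
M_new = psi_bar.T @ psi_bar / len(psi_bar)
print(M_new)
```

When the a-priori estimate M̄ is close to the true M, the transformed problem is nearly perfectly conditioned; when it is poor, the conditioning simply reverts towards that of the untransformed problem.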
The final form of the parameter estimation algorithm based on φ'_t is then simply

    θ'_{t+1} = θ'_t + H_t [y_t - φ_t^T θ_0 - φ'_t^T θ'_t]    (6.17)

    H_t = P_t φ'_t / (1 + φ'_t^T P_t φ'_t)    (6.18)

    P_{t+1} = P_t - P_t φ'_t φ'_t^T P_t / (1 + φ'_t^T P_t φ'_t)    (6.19)

where φ_t is as defined by (4.29) in the formulation of the algorithm. The estimate in the original co-ordinates is related to the centred estimate by

    θ'_t = D^T (θ_t - θ_0)    (6.20)

    θ_t = θ_0 + D^{-T} θ'_t    (6.21)

Remark 6.1: Apart from improving the conditioning of the least squares estimation problem, a further motivation for the use of transformations as in (6.15) to (6.19) is to attempt to decouple the parameter estimates so that when one parameter changes, it is only that particular estimate which responds.

Remark 6.2: The form of the information matrix given in (6.13) suggests that further pre-transformations are possible. Indeed, using a-priori estimates of A_0 and the input spectrum, it is relatively easy to obtain an estimate of M' in (6.13) which can then be used as the basis of a further transformation as in (6.4), (6.5) (Wittenmark et al. (1988)).
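The centred recursion above can be sketched as follows. This is a hedged, simplified stand-in for (6.17)-(6.19): a normalised-gradient update rather than full least squares, with an assumed diagonal pre-transformation D, synthetic regressors and made-up parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)

theta_true = np.array([2.0, -1.0, 0.5])   # made-up "true" parameters
theta_0 = np.array([1.8, -0.9, 0.4])      # nominal (a-priori) parameters

# Assumed diagonal pre-transformation from nominal regressor scales.
D = np.diag([10.0, 2.0, 1.0])
Dinv = np.linalg.inv(D)

theta_c = np.zeros(3)                     # centred estimate: nominally zero
mu = 0.1
for _ in range(2000):
    phi = rng.normal(0.0, [10.0, 2.0, 1.0])   # raw regressor, mixed scales
    y = phi @ theta_true + 0.01 * rng.normal()
    phi_c = Dinv @ phi                        # transformed regressor
    # Innovation in the style of (6.17): y - phi^T theta_0 - phi'^T theta'
    e = y - phi @ theta_0 - phi_c @ theta_c
    theta_c += mu * phi_c * e / (1.0 + phi_c @ phi_c)  # normalised gradient

theta_hat = theta_0 + Dinv @ theta_c      # recover estimate (D is diagonal)
print(theta_hat)
```

Because the estimate is centred on zero and the regressor components are brought to comparable scales, all three parameters converge at similar rates despite the factor-of-ten spread in the raw regressor magnitudes.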
Example 7.1: To illustrate the advantages to be gained from centering the least squares algorithm as outlined above, a closed loop system was simulated with sampling period Δ = 0.1 seconds. The nominal plant model and controller were chosen as

    G_p = 1/(δ+1)^2,   G_C = 8.6(δ+1)^2/(δ(δ+5))    (6.22)

The following estimation algorithms were compared:
(a) Ordinary gradient algorithm;
(b) Gradient algorithm with pre-transformation;
(c) Regularized least squares with constant trace (regularization was achieved by adding a diagonal matrix to the P update); and
(d) Regularized least squares with constant trace and pre-transformed parameters.

The set-point sequence was chosen as a square wave with period 10 seconds. The plant parameters were changed as follows:

    G'_p = 1/(δ+1)^2,   0 ≤ k < 350
         = 4/(δ+2)^2,   350 ≤ k < 700
         = 1/(δ+1)^2,   700 ≤ k < 1050 (least squares only)    (6.23)

[Figure 6.2  Comparison of Different Parameter Estimators. Panels: (a) Gradient; (b) Transformed gradient; (c) Regularized Least Squares; (d) Transformed Regularized Least Squares; each panel plots the parameter estimate and the true value against iteration number.]
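Algorithm (c) above can be sketched as follows. This is one plausible implementation only: the paper does not give the exact update, so the size of the regularising diagonal term, the trace target, the synthetic regressors and the parameter values (including the mid-run jump, in the spirit of (6.23)) are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 3
theta = np.zeros(n)
P = np.eye(n)
trace_target = float(np.trace(P))
eps = 1e-3                                  # regularising diagonal term

theta_true = np.array([0.5, -0.2, 1.0])
for k in range(3000):
    if k == 1500:                           # parameter jump mid-run
        theta_true = np.array([1.0, -0.4, 0.5])
    phi = rng.normal(size=n)
    y = phi @ theta_true + 0.01 * rng.normal()

    # Standard recursive least squares gain and covariance update.
    denom = 1.0 + phi @ P @ phi
    K = P @ phi / denom
    theta = theta + K * (y - phi @ theta)
    P = P - np.outer(K, phi @ P)

    # Regularise (add a diagonal matrix to the P update) and rescale to
    # keep the trace constant, so the gain cannot decay to zero and the
    # estimator stays alert to parameter changes.
    P = P + eps * np.eye(n)
    P = P * (trace_target / np.trace(P))

print(theta)
```

Without the regularisation and constant-trace step, P would shrink towards zero and the estimator would respond only sluggishly to the parameter jump.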
The filter E = (δ+2)^2 was used in constructing the regression vector. Using the nominal set point, plant and controller, the matrix M was evaluated and used to transform the regression vector as in (6.5). This pre-transformation was used for algorithms (b) and (d).

Figure 6.1 shows the results of the various parameter estimation procedures for the first element of the parameter vector (the other two elements give very similar responses). It can be seen that centering of the algorithm by use of the pre-transformation leads to a significant improvement in the performance of the estimation methods. In fact, the gradient algorithm with transformation is almost as good as the untransformed least squares algorithm.

The above results were obtained with a fixed controller G_C. However, almost identical results were obtained when the controller design was made adaptive with A* as the desired closed loop denominator polynomial.

Remark 6.3: From extensive computer simulations, we have found that typical values for the condition number of the information matrix for a second order model are: shift operator model [>1,000], ordinary delta model [100], delta model transformed as in (6.11) [10], and delta model transformed using the a-priori form of the input spectrum [2].

CONCLUSIONS

This paper has shown that continuous and discrete estimation algorithms can be described in a unified fashion provided the discrete case is formulated in a consistent fashion. Also, a discussion of various numerical issues that arise in estimation has been presented.

REFERENCES

Anderson, B.D.O. and Moore, J.B. (1979). Optimal Filtering. Prentice Hall, Englewood Cliffs.
Astrom, K.J. (1970). Introduction to Stochastic Control Theory. Academic Press, New York.
Doob, J.L. (1955). Stochastic Processes. Wiley, New York.
Edmunds, J.M. (1985). Identifying sampled data systems using difference operator models. Control Systems Centre Report 601, UMIST.
Francis, B. (1987). A Course in H∞ Control Theory. Lecture Notes in Control and Information Sciences, Vol. 88, Springer Verlag.
Gevers, M., Goodwin, G.C., and Wertz, V. (1987). Continuous time stochastic adaptive control. Tech. Report EE8743, University of Newcastle, Australia.
Goodwin, G.C., Ramadge, P.J. and Caines, P.E. (1981). Discrete time stochastic adaptive control. SIAM J. Control and Optimization, 19, No. 6, 829-853.
Goodwin, G.C. and Sin, K.S. (1984). Adaptive Filtering Prediction and Control. Prentice Hall, Englewood Cliffs.
Goodwin, G.C. (1985). Some observations on robust estimation and control. Plenary address, 7th IFAC Symposium on Identification and System Parameter Estimation, York, 3-7 July.
Kailath, T. (1980). Linear Systems. Prentice Hall, Englewood Cliffs.
Ljung, L. (1987). System Identification: Theory for the User. Prentice Hall, Englewood Cliffs.
Middleton, R.H. and Goodwin, G.C. (1988). Digital Estimation and Control: A Unified Approach. Prentice Hall, to appear.
Peterka, V. (1986a). Algorithms for LQG self-tuning control based on input-output delta models. IFAC Workshop on Adaptive Systems in Control and Signal Processing, Lund, Sweden.
Peterka, V. (1986b). Control of uncertain processes: applied theory and algorithms. Kybernetika, 22, 1-102.
Salgado, M.E., Middleton, R.H. and Goodwin, G.C. (1987). Connections between continuous and discrete Riccati equations with applications to Kalman filtering. Proc. IEE, Part D.
Wahlberg, B. (1987). Ph.D. Thesis, Linkoping, Sweden.
Wittenmark, B., Salgado, M. and Goodwin, G.C. (1988). Use of prior information to pre-scale parameters for estimation. Tech. Report, University of Newcastle, Australia.