Mathematics and Computers in Simulation 33 (1992) 507-512
North-Holland

DERIVATION OF DYNAMIC ESTIMATION EQUATIONS BY MEANS OF THE DIRAC DELTA FUNCTION

P.R. Benyon
P.O. Box 193, Milton, NSW 2539, Australia

1. INTRODUCTION

A key step when estimating
the state and parameters
in nonlinear dynamic models is
determining how uncertainty propagates through the nonlinear functions of the model.
Given a function
y = f(x)  (1)
relating the vectors x and y, of dimension nx and ny, and given the density, p(x), of x we require the density, q(y), of y. This is the problem of derived distributions discussed in probability theory. In its application to estimation for the general form of dynamic model
x = f(x_, v)  (2a)
y = g(x, w)  (2b)
(where x is a state vector at a given instant, x_ its value at the preceding instant, y an observed output vector and v and w system and measurement noise vectors) first, when determining the predicted state density, the vectors x_ and v together play the role of x in (1), and x in (2) plays the role of y in (1). Next, when determining the density of y given x, the vector w plays the role of x, and y that of y. Then, the filtered density of x is proportional to its predicted density times the density of y given x. Various formulae are available for the derived density.
The simplest is for the scalar case where the function is monotonic:
q(y) = p(x)/|f'(x)|  (3a)
where
x = f^(-1)(y).  (3b)
This generalises to the vector case with nx = ny and f invertible:
q(y) = p(x)/|J(x)|  (4a)
where
x = f^(-1)(y),  (4b)
|J(x)| being the determinant of the Jacobian matrix of f(x).
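As a quick numerical check of (4), the following sketch (an illustration added here, with an arbitrarily chosen invertible map and standard normal components for x) compares the formula with a Monte Carlo estimate of the probability of a small box in y space:

```python
import numpy as np

# Numerical check of formula (4) (an illustration, not from the paper):
# x has two independent standard normal components and
#     y1 = exp(x1),   y2 = x1 + x2**3,
# an invertible map, so q(y) = p(x)/|J(x)| with x = f^(-1)(y).

def f(x):                       # x is an (N, 2) array of points
    return np.column_stack([np.exp(x[:, 0]), x[:, 0] + x[:, 1] ** 3])

def f_inv(y):                   # analytic inverse used in (4b)
    x1 = np.log(y[0])
    x2 = np.cbrt(y[1] - x1)
    return np.array([x1, x2])

def jac_det(x):                 # |J(x)|: absolute determinant of the Jacobian of f
    return abs(np.exp(x[0]) * 3.0 * x[1] ** 2)

def p(x):                       # density of x: two independent standard normals
    return np.exp(-0.5 * np.sum(x ** 2)) / (2.0 * np.pi)

def q(y):                       # formula (4a)
    x = f_inv(y)
    return p(x) / jac_det(x)

# Monte Carlo check: the fraction of y samples falling in a small box around y0,
# divided by the box area, should approach q(y0).
rng = np.random.default_rng(0)
ys = f(rng.standard_normal((500_000, 2)))
y0, h = np.array([1.5, 0.3]), 0.05
frac = np.all(np.abs(ys - y0) < h / 2, axis=1).mean()
print("Monte Carlo:", frac / h ** 2, "   formula (4):", q(y0))
```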
The general case can be handled [1] by first calculating the characteristic function of q(y):
φ(ω) = ∫ p(x) exp(iω'f(x)) dx.  (5a)
Then
q(y) = (2π)^(-ny) ∫ φ(ω) exp(-iω'y) dω  (5b)
where ω is a vector of ny angular frequencies.¹ However, this detour into the frequency domain can be avoided with the help of the Dirac delta function, thus:
q(y) = ∫ δ[y - f(x)] p(x) dx.  (6)

¹ More generally still, in case p(x) is degenerate, φ(ω) = ∫ exp(iω'f(x)) dP(x); see also Section 3 below.
This is the simplest general formula and was used in [2] where, however, it was employed only as a compact way of expressing the operations to be performed, the emphasis in that paper being on numerical, in particular Monte Carlo, methods of carrying them out. Those methods have proved to be of limited power, leading to renewed interest in what might be accomplished analytically. Of course, purely analytical methods are extremely limited in the general nonlinear non-normal case. They can, however, be helpful in preliminary steps that can simplify the task to be performed numerically. This paper shows that equation (6) and the properties of the Dirac delta function can be very useful in preliminary analysis of this class of problem.
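To make the numerical use of (6) concrete, a minimal Monte Carlo sketch is given below (an illustration in the spirit of, but not taken from, [2]): samples are drawn from p(x), and the δ function is replaced by a narrow Gaussian kernel; the scalar f, the normal p and the kernel width are arbitrary choices.

```python
import numpy as np

# Monte Carlo evaluation of q(y) = ∫ δ[y - f(x)] p(x) dx: sample x ~ p(x)
# and replace the Dirac delta with a narrow Gaussian kernel of width eps.
# (Illustrative sketch; f, p and eps are arbitrary choices.)

def f(x):                                        # illustrative scalar nonlinearity
    return np.sin(x) + 0.1 * x ** 3

rng = np.random.default_rng(1)
x_samples = rng.normal(0.0, 1.0, size=500_000)   # p(x): standard normal
y_samples = f(x_samples)

def q_hat(y, eps=0.02):
    # kernel-smoothed delta: exp(-(y - f(x))^2 / (2 eps^2)) / (eps sqrt(2 pi))
    w = np.exp(-0.5 * ((y - y_samples) / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))
    return w.mean()

for y in (-1.0, 0.0, 1.0):
    print(f"q({y:+.1f}) is approximately {q_hat(y):.3f}")
```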
EXPLOIT’NG
THE SIFTING
PROPERTY
OF TME 6 FUNCTION
In non-vector notation equation (6) is
q(y1, . . . , yny) = ∫ . . . ∫ δ[y1 - f1(x1, . . . , xnx)] . . . δ[yny - fny(x1, . . . , xnx)] p(x1, . . . , xnx) dx1 . . . dxnx.  (7)
Thus, multidimensional integration is required over the whole of x space, and if the function q is needed throughout the whole of its domain, the integration must be repeated for every point in y space. In the application to dynamic estimation, integration over the nx + nv dimensions of x_ and v is needed for every point in the nx dimensional space of x to get the predicted state density.
Then
integration over the nw dimensions of w is needed to obtain the density of y given x, and this too is required at every point in the nx dimensional space of x, but only at the single point in y space representing the observed values of the outputs. One further nx dimensional integration is required (again at just the one point in y space) to normalise the filtered state density. Fortunately, many of these are not full-fledged integrations since the δ functions make the integrand zero over most of the region of integration.
By the so-called sifting property of the δ function, only those points at which its argument is zero contribute to the integral. In its simplest form the sifting formula is
∫ δ(x) f(x) dx = f(0).  (8)
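A quick numerical check of (8), with δ approximated by a narrow Gaussian and an arbitrary test function (an illustration added here):

```python
import numpy as np

# Numerical check of the sifting formula (8), with δ(x) approximated by a
# narrow Gaussian of width eps and an arbitrary test function f.
# (Illustrative sketch, not part of the original derivation.)

eps = 1e-3
x, dx = np.linspace(-1.0, 1.0, 2_000_001, retstep=True)
delta_approx = np.exp(-0.5 * (x / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))
f = np.cos(x) + x ** 2 + 2.0                   # arbitrary test function

print(np.sum(delta_approx * f) * dx, "  should be close to f(0) =", 3.0)
```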
This can be generalised to allow the argument of δ to be some function, v, of x:
∫ δ[v(x)] f(x) dx = Σi f(xi)/|v'(xi)|  (9a)
where xi is the ith root of
v(x) = 0.  (9b)
This sifting property is, of course, the reason for the presence of the δ function in equation (6): it picks out just those points in x space at which f(x) equals y (and at which the density contributes to the density at y). In applying (9) to equation (7), each of the δ's can, in principle, be used to eliminate one dimension of integration. If the equations
yj = fj(x1, . . . , xnx),   j = 1, . . . , ny  (10)
can be solved analytically to give ny of the x's in terms of the y's and the remaining x's (assuming for the present nx ≥ ny) then after the necessary manipulations integration over only nx - ny of the x's will remain. In the estimation case, prediction can, in theory, be reduced from nx + nv to only nv integrations and filtering from nw to nw - ny (often zero) integrations plus those needed if the filtered density is to be normalised. In many problems analytical solution of the complete set of equations (10) would be difficult. Note, however, that it is not necessary to solve them all as a simultaneous set. Sequential solution in which one equation at a time is solved to give one of the x's in terms of those still remaining is sufficient. As each variable is solved for, its integration can be eliminated (along with the δ function whose argument provided the equation just solved) at the expense of complicating the rest of the integrand (including the arguments of the remaining δ functions). It may not be possible to take this process of successive elimination very far before the equations become too complicated to solve. In this connection there is considerable scope for exploiting the structure of the particular problem.
Suppose, for example, equation (1) took the form
y1 = x1 + g1(x3)  (11a)
y2 = x2 + g2(x1, x3)  (11b)
y3 = g3(x1, x2, x3).  (11c)
The first is easily solved for x1 in terms of x3 (and y1). Substituting that result into equation (11b) leaves it still easily solvable for x2 in terms of x3. After substitution of x1 and x2 in (11c) it becomes an equation in only one unknown, x3, but probably not solvable analytically.
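The sequential elimination just described can be carried out with a computer algebra system; in the following sketch the concrete functions standing in for g1, g2 and g3 are arbitrary illustrative choices:

```python
import sympy as sp

# Sequential elimination for a system structured like (11a)-(11c).
# The concrete right-hand sides are illustrative; the point is the order of solution.
x1, x2, x3, y1, y2, y3 = sp.symbols('x1 x2 x3 y1 y2 y3')

eq_a = sp.Eq(y1, x1 + sp.exp(x3))            # like (11a): y1 = x1 + g1(x3)
eq_b = sp.Eq(y2, x2 + x1 * x3)               # like (11b): y2 = x2 + g2(x1, x3)
eq_c = sp.Eq(y3, sp.sin(x1) + x2**3 + x3)    # like (11c): y3 = g3(x1, x2, x3)

x1_sol = sp.solve(eq_a, x1)[0]                    # solve (11a) for x1 in terms of x3
x2_sol = sp.solve(eq_b.subs(x1, x1_sol), x2)[0]   # then (11b) for x2 in terms of x3

# After both substitutions, (11c) is a single (generally transcendental)
# equation in x3 alone; its roots are what the remaining sifting step needs.
eq_c_reduced = sp.simplify(eq_c.subs({x1: x1_sol, x2: x2_sol}))
print(eq_c_reduced)
```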
In more complicated cases it may not be so clear which equation to solve for which variable, or in what order to solve them, so as to allow as many x's as possible to be eliminated. Since an equation that is initially solvable for a particular x may no longer be analytically solvable for that x after the equation has been complicated by earlier substitutions, one would like to find some optimum pairing of unknowns with equations and sequencing of the equations that maximises the number of x integrations eliminated. In general, however, finding such an optimum is not an easy task. The reduction in dimensionality of integration theoretically made possible by the δ's may thus not always be achievable analytically.
However, certain important cases that occur in the dynamic estimation context are especially easy. First, the system equation (2a) is often the result of applying some numerical integration method to a system of differential equations
dx/dt = f(x, v).  (12)
If the implicit Euler method is chosen, the discrete time version becomes
x = x_ + hf(x, v)  (13)
where h is the time step. This is trivially easy to solve for x_. The equation for predicted state density, namely
p(x) = ∫∫ δ[x - x_ - hf(x, v)] φ_(x_) q(v) dx_ dv  (14)
(where φ_ is the filtered density at the previous instant and q the density of v) easily reduces to
p(x) = ∫ φ_[x - hf(x, v)] q(v) dv.  (15)
If instead the explicit Euler method were used, (13) would be replaced by
x = x_ + hf(x_, v)  (16)
which would in general be much more difficult to solve. It is paradoxical that it should be the implicit version that is easier to implement since in deterministic modelling it is the explicit method that is easier, implicit methods requiring the solution of a set of simultaneous equations. A second case that is easy to handle is when the measurement equation (2b) is linear in the noise, as it often is:
y = g(x) + G(x)w.  (17)
Assuming the matrix G to be square and invertible, the density of y given x is
p(y|x) = ∫ δ[y - g(x) - G(x)w] r(w) dw  (18)
= r[G^(-1)(x)(y - g(x))]/|G(x)|  (19)
where r is the density of w.
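A quick Monte Carlo check of (19) for a fixed x (the particular g, G and noise density r below are illustrative choices):

```python
import numpy as np

# Monte Carlo check of (19) for a fixed x; g, G and r are illustrative choices.

rng = np.random.default_rng(2)
x = np.array([0.5, -1.0])

def g(x):
    return np.array([x[0] ** 2, x[0] + np.sin(x[1])])

def G(x):
    return np.array([[1.0 + x[0] ** 2, 0.3],
                     [0.0, 2.0]])

def r(w):                                   # density of w: independent N(0, 1)
    return np.exp(-0.5 * np.sum(w ** 2, axis=-1)) / (2.0 * np.pi)

def p_y_given_x(y, x):                      # formula (19)
    w = np.linalg.solve(G(x), y - g(x))
    return r(w) / abs(np.linalg.det(G(x)))

# Simulate y = g(x) + G(x) w and compare the fraction landing in a small box
# around y0 (divided by the box area) with the formula.
w_samples = rng.standard_normal((400_000, 2))
y_samples = g(x) + w_samples @ G(x).T
y0, h = g(x) + np.array([0.4, -0.6]), 0.05
frac = np.all(np.abs(y_samples - y0) < h / 2, axis=1).mean()
print("Monte Carlo:", frac / h ** 2, "   formula (19):", p_y_given_x(y0, x))
```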
Sometimes G is simply a unit matrix, in which case
p(y|x) = r[y - g(x)].  (20)
Frequently, both of these simple cases (the simple system equation (2a) and the simple measurement equation) occur together, in which event only the nv integrations of the system noise components remain. (The nx integrations for normalising the filtered density can probably be avoided since for most purposes only the shape of the density matters and a very rough normalising to avoid under- or overflow should suffice.)
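The following sketch illustrates how little numerical work remains when both simple cases hold: one prediction step via (15), with the single remaining integration taken over the system noise v, followed by one filtering step via (20). The scalar model, noise densities and numbers are illustrative choices, not part of the development above.

```python
import numpy as np

# One predict-and-filter step for a scalar state when both simple cases hold:
# prediction by equation (15) (a single integration over the system noise v)
# followed by filtering as in (20) and a rough renormalisation.

h = 0.1
def f(x, v):                              # right-hand side of (12), illustrative
    return -x + 0.1 * x ** 3 + v
def g(x):                                 # measurement function in (17) with G = I
    return x
def gauss(z, s):
    return np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

x_grid, dx = np.linspace(-5.0, 5.0, 401, retstep=True)
v_grid, dv = np.linspace(-4.0, 4.0, 201, retstep=True)

phi_prev = gauss(x_grid - 1.0, 0.5)       # filtered density at the previous instant
q_v = gauss(v_grid, 0.3)                  # density q(v) of the system noise
r_w = lambda z: gauss(z, 0.2)             # density r(w) of the measurement noise

# Prediction, equation (15): p(x) = integral of phi_[x - h f(x, v)] q(v) dv.
# The implicit Euler form (13) gives x_ = x - h f(x, v) explicitly.
p_pred = np.empty_like(x_grid)
for i, x in enumerate(x_grid):
    x_prev = x - h * f(x, v_grid)
    p_pred[i] = np.sum(np.interp(x_prev, x_grid, phi_prev, left=0, right=0) * q_v) * dv

# Filtering with the observed output y_obs, using (20) and rough normalisation.
y_obs = 1.3
p_filt = p_pred * r_w(y_obs - g(x_grid))
p_filt /= np.sum(p_filt) * dx

print("filtered mean of x:", np.sum(x_grid * p_filt) * dx)
```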
3. EXPLOITING THE δ FUNCTION'S ABILITY TO HANDLE DEGENERACY

Throughout Section 2 it was assumed nx ≥ ny. When nx < ny, all the y probability becomes concentrated in an nx dimensional subspace of y space. Outside that subspace the density is everywhere zero. Within it, the density in y space (probability per unit of y hypervolume) is infinite, though the density in the subspace (probability per unit of subspace hypervolume) is finite. Such degenerate densities can be accommodated by working instead with cumulative distribution functions. A formula analogous to equation (6) but relating the distribution of y to that of x was given in [2]:
Q(y) = ∫ U[y - f(x)] dP(x)  (21)
where P and Q are the distribution functions of x and y and U is the unit step function. However, there is actually no need to shun degenerate densities. They can be described by expressions containing δ functions and these expressions arise in a straightforward way from continuing to use equation (6) or (7). With nx < ny, there are more than enough δ's in (7) to eliminate all of the integrations and the surplus serve to confine the probability to the required subspace, while the factor making up the rest of the resulting expression for q(y) defines the density throughout the subspace. Cases of degeneracy can easily arise in the dynamic estimation context, as pointed out in [3]. There, another way of handling the problem was given and illustrated for a non-Gaussian ARMA (3,1) model. In state space form this model is
x1 = α1 x1_ + x2_ + v
x2 = α2 x1_ + x3_ + βv  (22a)
x3 = α3 x1_
y = x1  (22b)
where α1, α2 and α3 are the autoregressive coefficients and β the moving average coefficient.
The measurement equation is unusual in containing no noise, but can be regarded as equivalent to
y = x1 + w  (23)
with 'noise' w having density δ(w), implying that it is always zero. Hence the density of y given x is
∫ δ(y - x1 - w) δ(w) dw = δ(y - x1).  (24)
The filtered density is therefore of the form
p(x1, x2, x3) δ(y - x1)/c  (25)
where p(x1, x2, x3) is the predicted density and c a normalising constant. Since the value of the first factor matters only when x1 = y, this can also be written
p(y, x2, x3) δ(y - x1)/c,  (26)
that is, in the form φ(x2, x3) δ(y - x1). By the same argument, the filtered density at the previous instant must be of the form
p_(y_, x2_, x3_) δ(y_ - x1_)/c_  (27)
which is of the form
φ_(x2_, x3_) δ(y_ - x1_).  (28)
Hence, in the integrations over x1_, x2_, x3_ and v to obtain the predicted density at the current
instant, there will be not only the three δ's arising from the three components of the system equation, but also a fourth contributed by the filtered density from the previous instant. Normally four δ's would be enough to eliminate all four integrations, but in this case two of them are
δ(x3 - α3 x1_)  (29)
and
δ(y_ - x1_)  (30)
either of which could be used to take out the x1_ integration but the other, because of not containing x2_, x3_ or v, cannot be used to eliminate any other integrations, and so one dimension of integration (say v) will remain and one δ (say the first above) will carry through into the predicted density and thence into the filtered density. The latter can then be still further specialised to have the form
δ(x1 - y) φ(x2) δ(x3 - α3 y_).  (31)
Similarly, at the previous instant it would have the form
δ(x1_ - y_) φ_(x2_) δ(x3_ - α3 y__).  (32)
Given this previous filtered density and the density q(v) of v, application of the equations for prediction and filtering now yields, after some manipulation, the following procedure. Let
a1 = α1 y_  (33a)
a2 = α2 y_ + α3 y__  (33b)
a3 = α3 y_.  (33c)
Then the filtered density is
δ(x1 - y) φ(x2) δ(x3 - a3)  (33d)
where
φ(x2) = φ_[y - a1 - (x2 - a2)/β] q[(x2 - a2)/β]/(|β| c)  (33e)
and
c = ∫ φ_(y - a1 - v) q(v) dv.  (33f)
In this example no integration is needed except for normalising, which requires only a one dimensional convolution integral.
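A grid-based sketch of this one-step procedure, under the state-space form and notation written above, is given below; the coefficient values, the previous factor φ_ and the non-Gaussian q(v) are illustrative choices:

```python
import numpy as np

# Grid-based sketch of the one-step procedure (33a)-(33f) for the ARMA(3,1)
# example.  Coefficients, the previous factor phi_ and q(v) are illustrative.

alpha1, alpha2, alpha3 = 0.6, 0.2, 0.1     # autoregressive coefficients
beta = 0.4                                 # moving average coefficient

def q(v):                                  # system noise density: a normal mixture
    n = lambda z, s: np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return 0.8 * n(v, 0.5) + 0.2 * n(v - 2.0, 1.0)

grid, dz = np.linspace(-8.0, 8.0, 3201, retstep=True)
phi_prev = np.exp(-0.5 * ((grid - 0.3) / 0.7) ** 2)     # previous phi_(x2_), unnormalised

y, y_prev, y_prev2 = 1.1, 0.9, 0.4         # y, y_ and y__ (observed outputs)

# (33a)-(33c)
a1 = alpha1 * y_prev
a2 = alpha2 * y_prev + alpha3 * y_prev2
a3 = alpha3 * y_prev

# (33f): the normalising constant, a one dimensional convolution-type integral
c = np.sum(np.interp(y - a1 - grid, grid, phi_prev, left=0, right=0) * q(grid)) * dz

# (33e): the x2 factor of the filtered density, no further integration required
v_star = (grid - a2) / beta
phi = np.interp(y - a1 - v_star, grid, phi_prev, left=0, right=0) * q(v_star) / (abs(beta) * c)

# (33d): the filtered density is delta(x1 - y) * phi(x2) * delta(x3 - a3).
print("a3 =", a3, "   integral of phi over x2 =", np.sum(phi) * dz)   # should be 1
```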
4. CONCLUDING REMARKS

Further examples could be given, but those in Sections 2 and 3 should serve to illustrate the usefulness of the Dirac delta function in the analysis and simplification of problems in state (and thence parameter) estimation in dynamic modelling. In many cases the simplification will leave the problem still analytically intractable and the solution will have to be completed by numerical methods. A new numerical approach that looks to have some promise is a 'sparse' adaptation of DeWit and Fransz's 'compound simulation' technique [2,4]. It is well able to exploit degeneracy or near degeneracy in the state densities.

REFERENCES

[1] M.G. Kendall, A. Stuart and J.K. Ord, Kendall's Advanced Theory of Statistics, Vol. 1, 5th edn (Griffin, London, 1987).
[2] P.R. Benyon, Monte Carlo and other methods for nonlinear non-Gaussian estimation, Math. Comput. Simulation 32 (1&2) (1990) 215-220.
[3] R. Kohn and C.F. Ansley, Comment [on G. Kitagawa's 'Non-Gaussian state-space modeling of nonstationary time series'], J. Amer. Statist. Assoc. 82 (1987) 1041-1044.
[4] H.G. Fransz, The Functional Response to Prey Density in an Acarine System (Pudoc, Wageningen, 1974).