7

Linear Filtering Theory

1. INTRODUCTION

In this chapter, we specialize the nonlinear theory developed in Chapter 6 to linear filtering problems. As was already noted in Section 5.2, in linear problems the conditional density function is Gaussian and is, therefore, completely characterized by its mean vector and covariance matrix. This fact makes linear filtering problems particularly tractable.

As we have seen in some special cases (Examples 6.1, 6.2, and 6.4), the equation for the conditional covariance matrix is independent of the observations. Thus, it is also the unconditional covariance matrix and may be precomputed in advance. We shall see that this is true, in general, in linear problems. As a result, the filtering algorithm essentially consists of the equation for the conditional mean. Appropriately, the conditional mean is sometimes referred to as a sufficient statistic. We recall that, according to the Corollary of Theorem 5.2, the conditional mean minimizes the expected loss for a large class of loss functions. In any case, it is, of course, the minimum variance estimate (Theorem 5.3).

By specializing the nonlinear results to the linear case, we are following the probabilistic approach outlined in Section 5.2. We will use some of the statistical methods (Section 5.3) to derive these same results in various examples. In addition, some examples will be devoted to the

generalizations described in Section 5.4 and to the derivation of a linear smoother.

The bulk of the remainder of this chapter is devoted to the study of some important properties of linear filters. The concepts of observability, information, and controllability are defined and exploited in the study of the asymptotic behavior of linear filters. Useful bounds and stability properties are developed. Of great importance in applications is the sensitivity of the filtering algorithm to dynamical model and statistical model errors. Considerable attention is devoted to this subject. Finally, linear limited memory filter algorithms are developed. These, as we shall see in Chapter 8, have important applications in problems of filter divergence.

2. CONTINUOUS-DISCRETE FILTER

Our linear dynamical system is described by the linear vector (Itô) stochastic differential equation

    dx_t = F(t) x_t dt + G(t) dβ_t,                                   (7.1)

where x_t is the n-vector state, F and G are, respectively, n × n and n × r nonrandom, continuous matrix time-functions, and {β_t, t ≥ t_0} is an r-vector Brownian motion process with E{dβ_t dβ_t^T} = Q(t) dt. Discrete, linear observations are taken at time instants t_k:

    y_k = M(t_k) x_{t_k} + v_k,        k = 1, 2, ...,                 (7.2)

where y_k is the m-vector observation, M is an m × n nonrandom, bounded matrix function, and {v_k, k = 1, 2, ...} is an m-vector, white Gaussian sequence, v_k ~ N(0, R_k), R_k > 0. The distribution of x_{t_0} is Gaussian, x_{t_0} ~ N(x̂_{t_0}, P_{t_0}), and x_{t_0}, {β_t}, and {v_k} are assumed independent. For the linear system (7.1), Kolmogorov's forward equation (6.5) becomes

    ∂p/∂t = −p tr(F) − p_x^T F x + ½ tr(G Q G^T p_xx).                (7.3)

The conditional density in Eq. (6.12) is

    p(y_k | x) = (1/((2π)^{m/2} |R_k|^{1/2})) exp{−½ (y_k − M(t_k) x)^T R_k^{−1} (y_k − M(t_k) x)},    (7.4)

and Theorem 6.1 is specialized to the linear case.


We obtain the equations for the conditional mean and covariance matrix, between observations, directly from Eq. (6.25) of Theorem 6.2. Since, here, f(x_t, t) = F(t) x_t, we have

    dx̂_t^t/dt = F(t) x̂_t^t,
    dP_t^t/dt = F(t) P_t^t + P_t^t F^T(t) + G(t) Q(t) G^T(t).         (7.5)

To compute the change in x̂ and P at an observation at t_k, we use Eq. (6.14) of Theorem 6.1.¹ Carrying out the integration in the denominator of (6.14), we have

    p(x, t_k | Y_{t_k}) = p(y_k | x) p(x, t_k | Y_{t_{k-1}}) / p(y_k | Y_{t_{k-1}}).²    (7.6)

All the densities appearing in (7.6) are Gaussian. By definition,³ p(x, t_k | Y_{t_{k-1}}) ~ N(x̂_k^{k−1}, P_k^{k−1}). p(y_k | x) has already been computed in (7.4). Now

    E{y_k | Y_{t_{k-1}}} = M(t_k) x̂_k^{k−1}

in view of (7.2), and

    E{(y_k − E{y_k | Y_{t_{k-1}}})(y_k − E{y_k | Y_{t_{k-1}}})^T | Y_{t_{k-1}}} = M(t_k) P_k^{k−1} M^T(t_k) + R_k,

so that p(y_k | Y_{t_{k-1}}) ~ N(M(t_k) x̂_k^{k−1}, M(t_k) P_k^{k−1} M^T(t_k) + R_k). Combining these results, and dropping the t_k subscripts and superscripts for the moment, we have

    p(x, t_k | Y_{t_k}) = (|M P^− M^T + R|^{1/2} / ((2π)^{n/2} |R|^{1/2} |P^−|^{1/2})) exp[−½{·}],    (7.7)

¹ The following derivation was given by Ho and Lee [27] and Lee [37].
² Recall that Y_τ = {y_{t_l} : t_l ≤ τ} (6.3); Y_{t_k} = {y_1, ..., y_k}; Y_{t_k}^− = Y_{t_{k-1}}.
³ Here x̂_k^{k−1} ≜ E{x_{t_k} | Y_{t_{k-1}}} and P_k^{k−1} ≜ E{(x_{t_k} − x̂_k^{k−1})(x_{t_k} − x̂_k^{k−1})^T | Y_{t_{k-1}}}; similarly for x̂_k^k and P_k^k.


where

    {·} = (y − Mx)^T R^{−1} (y − Mx) + (x − x̂^−)^T (P^−)^{−1} (x − x̂^−)
          − (y − Mx̂^−)^T (M P^− M^T + R)^{−1} (y − Mx̂^−).             (7.8)

Now, by definition, p(x, t_k | Y_{t_k}) ~ N(x̂_k^k, P_k^k), so that the term in braces in (7.8) must be

    {·} = (x − x̂^+)^T (P^+)^{−1} (x − x̂^+),                           (7.9)

where x̂^+ ≜ x̂_k^k and P^+ ≜ P_k^k.

To put (7.8) in the form of (7.9), we rewrite it as

    {·} = x^T (M^T R^{−1} M + (P^−)^{−1}) x
          − 2(y^T R^{−1} M + x̂^{−T} (P^−)^{−1}) x
          + y^T R^{−1} y + x̂^{−T} (P^−)^{−1} x̂^−
          − (y − Mx̂^−)^T (M P^− M^T + R)^{−1} (y − Mx̂^−)

and complete the square by adding and subtracting

    (M^T R^{−1} y + (P^−)^{−1} x̂^−)^T (M^T R^{−1} M + (P^−)^{−1})^{−1} (M^T R^{−1} y + (P^−)^{−1} x̂^−).

We thus obtain

    {·} = [x − (M^T R^{−1} M + (P^−)^{−1})^{−1} (M^T R^{−1} y + (P^−)^{−1} x̂^−)]^T [M^T R^{−1} M + (P^−)^{−1}]
          × [x − (M^T R^{−1} M + (P^−)^{−1})^{−1} (M^T R^{−1} y + (P^−)^{−1} x̂^−)] + γ,    (7.10)

where, with the aid of the matrix equalities developed in Appendix 7B, γ can be shown to be zero. Comparing (7.10) with (7.9), we conclude that

    x̂_k^k = (M^T(t_k) R_k^{−1} M(t_k) + (P_k^{k−1})^{−1})^{−1} (M^T(t_k) R_k^{−1} y_k + (P_k^{k−1})^{−1} x̂_k^{k−1}),
    (P_k^k)^{−1} = M^T(t_k) R_k^{−1} M(t_k) + (P_k^{k−1})^{−1}.        (7.11)

Equations (7.11) are the desired relations. We see [(7.5), (7.11)] that the conditional covariance matrix is independent of the observations and is, therefore, the unconditional covariance matrix:

    P_t^t ≜ E{(x_t − x̂_t^t)(x_t − x̂_t^t)^T | Y_t} = E{(x_t − x̂_t^t)(x_t − x̂_t^t)^T}.    (7.12)

Consequently, we shall sometimes write P_t^t as P(t | t) and even as P(t) when no confusion can arise.


The computation of Eqs. (7.11) involves the inversion of n × n matrices. We next reduce (7.11) to a form that requires the inversion of an m × m matrix (m is the dimension of the observation vector). Often, in applications, m < n; and, in case R is diagonal, we may always take m = 1. This is done by processing the components of y_k one by one, supposing that there is no change in x̂ or P due to the dynamics (7.5) in the meantime. That this is equivalent to processing the whole observation vector at once is conceptually clear and can be proved algebraically (very tedious!). Using (7B.5) of Appendix 7B, we rewrite the second of Eqs. (7.11) as

    P_k^k = P_k^{k−1} − P_k^{k−1} M^T(t_k)[M(t_k) P_k^{k−1} M^T(t_k) + R_k]^{−1} M(t_k) P_k^{k−1}.    (7.13)

With the aid of (7B.5) and (7B.6) of Appendix 7B, the first of Eqs. (7.11) becomes

    x̂_k^k = x̂_k^{k−1} + P_k^{k−1} M^T(t_k)[M(t_k) P_k^{k−1} M^T(t_k) + R_k]^{−1} (y_k − M(t_k) x̂_k^{k−1}).    (7.14)

If we define the Kalman gain [32, 36]

    K(t_k) ≜ P_k^{k−1} M^T(t_k)[M(t_k) P_k^{k−1} M^T(t_k) + R_k]^{−1},    (7.15)

we have

    x̂_k^k = x̂_k^{k−1} + K(t_k)(y_k − M(t_k) x̂_k^{k−1}),
    P_k^k = P_k^{k−1} − K(t_k) M(t_k) P_k^{k−1}
          = [I − K(t_k) M(t_k)] P_k^{k−1} [I − K(t_k) M(t_k)]^T + K(t_k) R_k K^T(t_k).    (7.16)

We summarize these results in the following theorem.

Theorem 7.1. The optimal (minimum variance) filter for the continuous-discrete system (7.1), (7.2) consists of equations of evolution for the conditional mean x̂_t^t and covariance matrix P_t^t. Between observations, these satisfy the differential equations

    dx̂_t^t/dt = F(t) x̂_t^t,
    dP_t^t/dt = F(t) P_t^t + P_t^t F^T(t) + G(t) Q(t) G^T(t).         (7.17)

At an observation at t_k, they satisfy the difference equations

    x̂_k^k = x̂_k^{k−1} + K(t_k)(y_k − M(t_k) x̂_k^{k−1}),
    P_k^k = P_k^{k−1} − K(t_k) M(t_k) P_k^{k−1},                      (7.18)


where the Kalman gain K is given in (7.15). Prediction for t > T (x̂_t^T, P_t^T), based on Y_T (6.3), is clearly accomplished via (7.17), with initial condition (x̂_T^T, P_T^T).
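Theorem 7.1 translates directly into a two-step algorithm. The following minimal numpy sketch is our illustration, not part of the original text: it integrates (7.17) between observations by Euler steps and applies (7.15), (7.18) at each observation.

```python
import numpy as np

def propagate(x, P, F, G, Q, dt, steps=100):
    """Integrate dx/dt = F x, dP/dt = F P + P F^T + G Q G^T (Eq. 7.17) by Euler steps."""
    h = dt / steps
    for _ in range(steps):
        x = x + h * (F @ x)
        P = P + h * (F @ P + P @ F.T + G @ Q @ G.T)
    return x, P

def update(x, P, y, M, R):
    """Measurement update (7.15), (7.18): K = P M^T (M P M^T + R)^{-1}."""
    S = M @ P @ M.T + R                # residual covariance
    K = P @ M.T @ np.linalg.inv(S)     # Kalman gain (7.15)
    x = x + K @ (y - M @ x)            # first of (7.18)
    P = P - K @ M @ P                  # second of (7.18)
    return x, P
```

For the scalar system F = 0, G = Q = M = R = 1, propagation over a unit interval doubles a unit variance, and a single update then reduces it to 2/3, in agreement with (7.16).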

Theorem 7.1 gives the celebrated Kalman-Bucy filter [32, 33, 36] for the continuous-discrete problem. Compare this with Examples 6.1 and 6.2. The equations for the estimate are linear, with a forcing term that is linear in the observation. The forcing term is proportional to the residual y_k − M(t_k) x̂_k^{k−1}. The initial conditions (x̂_{t_0}, P_{t_0}) are part of the problem statement and reflect the prior knowledge of x_{t_0} before the filtering begins. As will be seen from other derivations of these same results, we need only that P_{t_0} ≥ 0. In connection with the question of prior information, see Appendix 7A and also Sections 5 and 10.

The parameters (x̂_t^t, P_t^t) comprise the state of the linear filter. They determine the density p(x, t | Y_t). However, as has already been noted, P_t^t and the filter gain K(t_k) may be precomputed, since they do not depend on the noise samples, but only on the noise statistics, which are part of the problem statement. Thus, once the observation schedule has been settled, the real-time implementation of the Kalman-Bucy filter requires only the computation of the first of Eqs. (7.17) and the first of Eqs. (7.18). This happy situation is, of course, unique to the linear case.

Now Eq. (7.1) can be integrated over intervals [t_k, t_{k+1}]:

    x_{t_{k+1}} = Φ(t_{k+1}, t_k) x_{t_k} + ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) dβ_τ,    k = 0, 1, ...,    (7.19)

where the n × n matrix Φ is the fundamental matrix of the homogeneous part of (7.1), or the state transition matrix of (7.1). That is,

    dΦ(t, τ)/dt = F(t) Φ(t, τ),    Φ(τ, τ) = I    all τ,              (7.20)
    Φ(t, τ) Φ(τ, ξ) = Φ(t, ξ)      all t, τ, ξ.                       (7.21)

From (7.21), it follows that

    Φ^{−1}(t, τ) = Φ(τ, t)    all t, τ.                               (7.22)

We write (7.19) as

    x_{t_{k+1}} = Φ(t_{k+1}, t_k) x_{t_k} + w_{k+1},                  (7.23)

where

    w_{k+1} ≜ ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) dβ_τ.              (7.24)


It is easy to see that {w_k} is a zero-mean, white Gaussian sequence with (see Theorem 4.1)

    E{w_{k+1} w_{k+1}^T} = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) Q(τ) G^T(τ) Φ^T(t_{k+1}, τ) dτ.    (7.25)

Thus the continuous-discrete filter can be imbedded in a discrete filter, and we need not study the properties of the continuous-discrete filter separately. It is easy to verify that (7.17) of Theorem 7.1 are equivalent to

    x̂_{k+1}^k = Φ(t_{k+1}, t_k) x̂_k^k,
    P_{k+1}^k = Φ(t_{k+1}, t_k) P_k^k Φ^T(t_{k+1}, t_k) + E{w_{k+1} w_{k+1}^T}.    (7.26)
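For constant F, G, Q, the discrete-model quantities Φ and E{w w^T} of (7.23)-(7.25) can be computed numerically. The sketch below (ours, not from the text) exploits the fact that the integral in (7.25), viewed as a function of its upper limit, satisfies the covariance equation of (7.17) with zero initial condition.

```python
import numpy as np

def discretize(F, G, Q, dt, steps=1000):
    """Phi and Qd for the equivalent discrete model (7.23)-(7.25), constant F, G, Q.
    Qd solves dP/dt = F P + P F^T + G Q G^T, P(0) = 0 (cf. (7.17)); Euler integration."""
    n = F.shape[0]
    Phi = np.eye(n)
    Qd = np.zeros((n, n))
    h = dt / steps
    for _ in range(steps):
        Qd = Qd + h * (F @ Qd + Qd @ F.T + G @ Q @ G.T)
        Phi = Phi + h * (F @ Phi)
    return Phi, Qd
```

For the double-integrator F = [[0, 1], [0, 0]], G = [0, 1]^T, Q = 1, dt = 1, the exact answers are Φ = [[1, 1], [0, 1]] and Qd = [[1/3, 1/2], [1/2, 1]], which the Euler sketch reproduces to first-order accuracy in the step size.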

3. DISCRETE FILTER

The discrete linear system is described by the vector difference equation

    x_{k+1} = Φ(t_{k+1}, t_k) x_k + Γ(k) w_{k+1},    k = 0, 1, ...,    (7.27)

where x_k is the n-vector state at t_k, Φ is the n × n, nonsingular state transition matrix, Γ is n × r, and {w_k, k = 1, ...} is an r-vector, white Gaussian sequence, w_k ~ N(0, Q_k). The discrete, linear observations are given in (7.2):

    y_k = M(t_k) x_k + v_k.    (7.2)

x_0, {w_k}, and {v_k} are assumed independent. Difference equations for x̂ and P at an observation are already given in Theorem 7.1. We compute difference equations between observations directly from (7.27). This procedure avoids the computation of the transition probability density p(x_{k+1} | x_k) and the Chapman-Kolmogorov equation (6.62) of Theorem 6.4. We know that all the densities are Gaussian. Now

    x̂_{k+1}^k = E{x_{k+1} | Y_k} = Φ(t_{k+1}, t_k) x̂_k^k

in view of (7.27). Also,

    P_{k+1}^k = E{(x_{k+1} − x̂_{k+1}^k)(x_{k+1} − x̂_{k+1}^k)^T | Y_k}
              = E{[Φ(x_k − x̂_k^k) + Γ w_{k+1}][Φ(x_k − x̂_k^k) + Γ w_{k+1}]^T | Y_k}
              = Φ P_k^k Φ^T + Γ Q_{k+1} Γ^T.

Compare this with (7.26).


The discrete filter is summarized in the following theorem.

Theorem 7.2. The optimal (minimum variance) filter for the discrete system (7.27), (7.2) consists of difference equations for the conditional mean and covariance matrix. Between observations,

    x̂_{k+1}^k = Φ(t_{k+1}, t_k) x̂_k^k,
    P_{k+1}^k = Φ(t_{k+1}, t_k) P_k^k Φ^T(t_{k+1}, t_k) + Γ(k) Q_{k+1} Γ^T(k).    (7.28)

At observations,

    x̂_k^k = x̂_k^{k−1} + K(t_k)(y_k − M(t_k) x̂_k^{k−1}),
    P_k^k = P_k^{k−1} − K(t_k) M(t_k) P_k^{k−1},                      (7.29)

where

    K(t_k) = P_k^{k−1} M^T(t_k)[M(t_k) P_k^{k−1} M^T(t_k) + R_k]^{−1}    (7.15)

is the Kalman gain. Prediction for t_l > t_k (x̂_l^k, P_l^k) is accomplished via (7.28) with initial condition (x̂_k^k, P_k^k).
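One full cycle of the discrete filter of Theorem 7.2 is compact enough to state in code. The sketch below is our illustration (not the book's): prediction per (7.28), then the measurement update per (7.15), (7.29).

```python
import numpy as np

def kf_step(x, P, y, Phi, Gamma, Q, M, R):
    """One cycle of the discrete Kalman-Bucy filter of Theorem 7.2."""
    # Between observations (7.28)
    x = Phi @ x
    P = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
    # At the observation: gain (7.15), update (7.29)
    K = P @ M.T @ np.linalg.inv(M @ P @ M.T + R)
    x = x + K @ (y - M @ x)
    P = P - K @ M @ P
    return x, P
```

Iterating the scalar case Φ = Γ = Q = M = R = 1 drives P_k^k to the positive root of p² + p − 1 = 0 (≈ 0.618), a simple illustration of the steady-state behavior studied later in the chapter.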

This is the Kalman-Bucy filter [32, 36] for the discrete problem. The discussion of Theorem 7.1 applies in this case as well. We may note that, if P_0 ≥ 0, then P_k^k ≥ 0 for all k, since P is a covariance matrix.

Example 7.1 (Orthogonal Projections). In his original derivation of the discrete filter, Kalman [32] used the concept of orthogonal projection. We develop this concept here and, using it, rederive the discrete filter. As was pointed out in footnote 7 of Chapter 3, the space of random variables with finite second moments, with scalar product

    (x, y) ≜ E{x^T y},

is a (complete) normed space (an infinite-dimensional Hilbert space). Now, by definition, the minimum variance estimate x̂ minimizes the error norm ||x − x̂||². The following lemma gives a condition that is equivalent to the minimization of the norm.


Orthogonal Projection Lemma. Let X be a normed space, x ∈ X, and let Y be a subspace of X. Then

    min_{α∈Y} ||x − α||² = ||x − x̂||²

if, and only if,

    (x − x̂, α) = 0    for all α ∈ Y.

The space Y is the "approximation space." That is, we want to find that α ∈ Y which minimizes the error norm for every x ∈ X. The orthogonal projection lemma states that the error is orthogonal to the approximation space Y. That is, if we write

    x = x̂ + x̃,

where x̃ = x − x̂ is the error and x̂ is the best approximation (estimate), then x̂ is the orthogonal projection of x into Y.

PROOF: We first prove the "if" part. Suppose (x − x̂, Y) = 0. Take any α ∈ Y, α ≠ 0. Then

    ||x − x̂ + α||² = (x − x̂, x − x̂) + 2(x − x̂, α) + (α, α)
                   = (x − x̂, x − x̂) + (α, α)
                   > (x − x̂, x − x̂).

Now we shall prove the "only if" part. Suppose there exists an α such that (x − x̂, α) = β ≠ 0. Then, for any scalar λ,

    ||x − x̂ + λα||² = ||x − x̂||² + 2λβ + λ²||α||².

Then, for λ = −β/||α||²,

    ||x − x̂ + λα||² = ||x − x̂||² − β²/||α||² < ||x − x̂||².    ∎

In our filtering problem, the approximation space is Y_k = {space spanned by y_1, ..., y_k}. This is a km-dimensional space. The space X is infinite-dimensional. Let the set of m-vectors {u_1, ..., u_k} be an orthonormal basis for Y_k, that is,

    (u_i, u_j) = 0,    i ≠ j,    ||u_i|| = 1.
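The lemma can be checked empirically. In the sketch below (ours, with sample averages standing in for expectations), the best linear estimate obtained from the normal equations leaves an error whose sample correlation with each observation component vanishes — the orthogonality condition of the lemma.

```python
import numpy as np

# Empirical illustration of the orthogonal projection lemma: the minimum-
# variance linear estimate x_hat = A y leaves the error x - x_hat orthogonal
# (in the E{.} inner product) to every element of the span of y.
rng = np.random.default_rng(0)
N = 50_000
y = rng.normal(size=(N, 2))                                # observations
x = y @ np.array([1.0, -2.0]) + 0.5 * rng.normal(size=N)   # state correlated with y

# Normal equations: A solves E{y y^T} A = E{y x}
Sxy = x @ y / N
Syy = y.T @ y / N
A = np.linalg.solve(Syy, Sxy)
err = x - y @ A

print(err @ y / N)   # sample value of (x - x_hat, y): numerically zero
```

The printed correlations vanish to machine precision because the normal equations enforce exactly the orthogonality that the lemma asserts for the true expectations.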


It is interesting that the minimization of

    E{(x_k − x̂_k^k)^T (x_k − x̂_k^k)}                                  (7.30)

is accomplished componentwise. That is, in minimizing (7.30), we shall minimize each component of the expected error. Now, from the orthogonal projection lemma, we have

    E{(x_k − x̂_k^k) u_i^T} = 0    for all i,

or

    E{x_k u_i^T} = E{x̂_k^k u_i^T}.

Multiplying by u_i and summing,

    Σ_{i=1}^k E{x_k u_i^T} u_i = Σ_{i=1}^k E{x̂_k^k u_i^T} u_i.         (7.31)

According to the lemma, x̂_k^k is the orthogonal projection of x_k into Y_k. Therefore,

    x̂_k^k = Σ_{i=1}^k A_i u_i                                          (7.32)

for some set of constant matrices A_i. Using (7.32), it is easy to show that the right-hand side of (7.31) is simply x̂_k^k; thus,

    x̂_k^k = Σ_{i=1}^k E{x_k u_i^T} u_i,

or

    x̂_k^k = Σ_{i=1}^{k−1} E{x_k u_i^T} u_i + E{x_k u_k^T} u_k,

and, using (7.27),

    x̂_k^k = Σ_{i=1}^{k−1} E{[Φ(k, k−1) x_{k−1} + Γ w_k] u_i^T} u_i + E{x_k u_k^T} u_k.

Now w_k is independent of Y_{k−1}, and so we get

    x̂_k^k = Φ(k, k−1) x̂_{k−1}^{k−1} + E{x_k u_k^T} u_k.                (7.33)

Now u_k is orthogonal to Y_{k−1}. We next show that

    y_k − M(k) x̂_k^{k−1}


is also orthogonal to Y_{k−1}, and, since it clearly lies in Y_k, we must have

    E{x_k u_k^T} u_k = K_k(y_k − M(k) x̂_k^{k−1})

for some constant matrix K_k. Clearly,

    E{(x_{k−1} − x̂_{k−1}^{k−1}) u_i^T} = 0,    i = 1, ..., k − 1,

and, multiplying this by Φ and using (7.27),

    E{(x_k − x̂_k^{k−1}) u_i^T} = 0,    i = 1, ..., k − 1.

Multiplying this by M and using (7.2),

    E{(y_k − M(k) x̂_k^{k−1}) u_i^T} = 0,    i = 1, ..., k − 1.

Therefore,

    x̂_k^k = Φ(k, k−1) x̂_{k−1}^{k−1} + K_k(y_k − M(k) x̂_k^{k−1}) = x̂_k^{k−1} + K_k(y_k − M(k) x̂_k^{k−1}).    (7.34)

It remains to determine the gain K_k. The estimation error is given by

    x̃_k^k ≜ x_k − x̂_k^k = [I − K_k M(k)][Φ(k, k−1) x̃_{k−1}^{k−1} + Γ(k−1) w_k] − K_k v_k,    (7.35)

where we have used (7.27) and (7.2); and

    y_k = M(k)(x̃_k^{k−1} + x̂_k^{k−1}) + v_k.

Therefore, by the orthogonal projection lemma,

    0 = E{x̃_k^k y_k^T}
      = [Φ(k, k−1) P_{k−1}^{k−1} Φ^T(k, k−1) + Γ(k−1) Q_k Γ^T(k−1)] M^T(k)
        − K_k[M(k)(Φ(k, k−1) P_{k−1}^{k−1} Φ^T(k, k−1) + Γ(k−1) Q_k Γ^T(k−1)) M^T(k) + R_k],    (7.36)

where we have defined

    P_{k−1}^{k−1} ≜ E{x̃_{k−1}^{k−1} x̃_{k−1}^{k−1 T}}.

Defining in addition

    P_k^{k−1} ≜ Φ(k, k−1) P_{k−1}^{k−1} Φ^T(k, k−1) + Γ(k−1) Q_k Γ^T(k−1),

we get from (7.36)

    K_k = P_k^{k−1} M^T(k)[M(k) P_k^{k−1} M^T(k) + R_k]^{−1}.


Using (7.35), it is easy to compute the difference equation for

    P_k^k ≜ E{x̃_k^k x̃_k^{k T}}

and thus to see that we have derived the Kalman-Bucy filter in Theorem 7.2. ▲

In the next several examples, we shall rederive the Kalman-Bucy filter using some of the statistical methods outlined in Chapter 5.

Example 7.2 (Recursive Least Squares). This method has been used by many authors (e.g., Swerling [53], Gainer [19], Fagin [16], Mowery [40], and Bryson and Ho [10]) to derive the discrete Kalman-Bucy filter. We give a version of the method here. For simplicity, we assume that the dynamical system is noise-free [w_k = 0 in (7.27)]. In this case, the least squares problem (5.22), (5.23) at time t_k is to minimize, with respect to {x_0, ..., x_k},

    J_k' = ½ (x_0 − x̂_0)^T P_0^{−1}(x_0 − x̂_0) + ½ Σ_{i=1}^k (y_i − M(i) x_i)^T R_i^{−1}(y_i − M(i) x_i),

subject to the constraints

    x_{i+1} = Φ(i+1, i) x_i,    i = 0, ..., k − 1.

In view of the constraints, we may alternatively minimize

    J_k = ½ (x_k − x̄_k)^T Φ^T(0, k) P_0^{−1} Φ(0, k)(x_k − x̄_k)
          + ½ Σ_{i=1}^k (y_i − M(i) Φ(i, k) x_k)^T R_i^{−1}(y_i − M(i) Φ(i, k) x_k)    (7.37)

with respect to x_k. Note that Φ(i, k) = Φ^{−1}(k, i). Since the prior estimate of x_0 is x̂_0, that of x_k is x̄_k = Φ(k, 0) x̂_0. To minimize J_k in (7.37), we set its gradient with respect to x_k equal to zero:

    Φ^T(0, k) P_0^{−1} Φ(0, k)(x_k − x̄_k) − Σ_{i=1}^k Φ^T(i, k) M^T(i) R_i^{−1}(y_i − M(i) Φ(i, k) x_k) = 0.    (7.38)

Now defining

    𝒥_k ≜ Σ_{i=1}^k Φ^T(i, k) M^T(i) R_i^{−1} M(i) Φ(i, k)            (7.39)


(this is the information matrix defined in Section 5; see also Appendix 7A) and

    H_k ≜ Σ_{i=1}^k Φ^T(i, k) M^T(i) R_i^{−1} y_i,                     (7.40)

the solution x̂_k^k of (7.38) is written as

    x̂_k^k = [Φ^T(0, k) P_0^{−1} Φ(0, k) + 𝒥_k]^{−1} [H_k + Φ^T(0, k) P_0^{−1} Φ(0, k) x̄_k].    (7.41)

Note that the indicated inverse in (7.41) always exists since, by assumption, P_0 > 0, and R_k > 0 implies 𝒥_k ≥ 0 for all k. Now clearly the minimization of J_{k+1} yields

    x̂_{k+1}^{k+1} = [Φ^T(0, k+1) P_0^{−1} Φ(0, k+1) + 𝒥_{k+1}]^{−1}
                    × [H_{k+1} + Φ^T(0, k+1) P_0^{−1} Φ(0, k+1) x̄_{k+1}].    (7.42)

It is easy to derive the following difference equations for 𝒥_k and H_k:

    𝒥_{k+1} = Φ^T(k, k+1) 𝒥_k Φ(k, k+1) + M^T(k+1) R_{k+1}^{−1} M(k+1),
    H_{k+1} = Φ^T(k, k+1) H_k + M^T(k+1) R_{k+1}^{−1} y_{k+1},        (7.43)

and, using these and the definition

    (P_k^k)^{−1} ≜ Φ^T(0, k) P_0^{−1} Φ(0, k) + 𝒥_k,                  (7.44)

to rewrite (7.42) as

    x̂_{k+1}^{k+1} = [Φ^T(k, k+1)(P_k^k)^{−1} Φ(k, k+1) + M^T(k+1) R_{k+1}^{−1} M(k+1)]^{−1}
                    × [Φ^T(k, k+1)(H_k + Φ^T(0, k) P_0^{−1} Φ(0, k) x̄_k) + M^T(k+1) R_{k+1}^{−1} y_{k+1}].    (7.45)

Now, eliminating (H_k + Φ^T(0, k) P_0^{−1} Φ(0, k) x̄_k) via (7.41) and defining

    P_{k+1}^k ≜ Φ(k+1, k) P_k^k Φ^T(k+1, k),                          (7.46)
    x̂_{k+1}^k ≜ Φ(k+1, k) x̂_k^k,                                      (7.47)

Eq. (7.45) takes the form

    x̂_{k+1}^{k+1} = [(P_{k+1}^k)^{−1} + M^T(k+1) R_{k+1}^{−1} M(k+1)]^{−1} [(P_{k+1}^k)^{−1} x̂_{k+1}^k + M^T(k+1) R_{k+1}^{−1} y_{k+1}].    (7.48)


It is easy to show by induction that the difference equations

    (P_{i+1}^i)^{−1} = Φ^T(i, i+1)(P_i^i)^{−1} Φ(i, i+1),    i = 0, 1, ...,    (7.49)
    (P_{i+1}^{i+1})^{−1} = (P_{i+1}^i)^{−1} + M^T(i+1) R_{i+1}^{−1} M(i+1);    (P_0^0)^{−1} = P_0^{−1},    (7.50)

applied in the order listed, have P_k^k as defined in (7.44) as their solution. Comparing (7.47), (7.49) with (7.28), and (7.48), (7.50) with (7.11), we see that we have derived the discrete Kalman-Bucy filter. At least, our recursive least squares filter has the same form as the Kalman-Bucy filter. We have not attached any statistical significance to the weighting matrices P_0 and R_k and need not do so. We simply have a recursive algorithm, which requires no data storage, for generating the least squares estimate. Once we make the identifications of P_0 and R_k with the respective covariance matrices, then we have the Kalman-Bucy filter. Then it is easy to compute from (7.41) directly that

    E{(x_k − x̂_k^k)(x_k − x̂_k^k)^T} = P_k^k,

where P_k^k is given in (7.44). ▲
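The agreement between the information form (7.48) of the update and the gain form (7.14) is the matrix-identity content of Appendix 7B, and it is easy to check numerically. The sketch below is ours, using arbitrary well-conditioned matrices.

```python
import numpy as np

# Check that (P^{-1} + M^T R^{-1} M)^{-1} (P^{-1} x + M^T R^{-1} y)   [cf. (7.48)]
# equals  x + K (y - M x)  with the Kalman gain K of (7.15)           [cf. (7.14)].
rng = np.random.default_rng(1)
n, m = 3, 2
M = rng.normal(size=(m, n))
R = np.eye(m)
A = rng.normal(size=(n, n))
P = A @ A.T + np.eye(n)          # a positive definite prior covariance
x = rng.normal(size=n)
y = rng.normal(size=m)

Rinv = np.linalg.inv(R)
Pinv = np.linalg.inv(P)
info = np.linalg.solve(Pinv + M.T @ Rinv @ M, Pinv @ x + M.T @ Rinv @ y)

K = P @ M.T @ np.linalg.inv(M @ P @ M.T + R)
gain = x + K @ (y - M @ x)

print(np.allclose(info, gain))   # the two forms agree
```

The information form inverts an n × n matrix, the gain form an m × m matrix; as noted above, the latter is usually preferred when m < n.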



Example 7.3 (Maximum Likelihood). Ho [25], Schmidt [47], Rauch et al. [45], and others used a maximum likelihood approach to derive the discrete Kalman-Bucy filter (see also Smith et al. [50]). We derive the maximum likelihood (Bayesian) estimator (see Section 5.3) in this example.

Since all the densities in our linear problem are Gaussian, their means correspond to their respective modes. As a consequence, the mode (mean) and its error covariance matrix satisfy (7.28) between observations. The recursion for the mode (mean) at an observation is obtained by maximizing, with respect to x, the conditional density p(x, t_k | Y_{t_k}), which is given in Eq. (7.7). This is equivalent to minimizing the term in braces in (7.8). Setting the gradient of Eq. (7.8) with respect to x equal to zero, we get

    −M^T(k) R_k^{−1}(y_k − M(k) x_k) + (P_k^{k−1})^{−1}(x_k − x̂_k^{k−1}) = 0.    (7.51)

The solution of (7.51), denote it by x̂_k^k, is

    x̂_k^k = (M^T(k) R_k^{−1} M(k) + (P_k^{k−1})^{−1})^{−1} (M^T(k) R_k^{−1} y_k + (P_k^{k−1})^{−1} x̂_k^{k−1}),

which is just the first of Eqs. (7.11). Using the matrix identities in Appendix 7B, we get, as before,

    x̂_k^k = x̂_k^{k−1} + P_k^{k−1} M^T(k)[M(k) P_k^{k−1} M^T(k) + R_k]^{−1} (y_k − M(k) x̂_k^{k−1}),    (7.14)

and the estimation error

    x̃_k^k ≜ x_k − x̂_k^k = [I − K(k) M(k)] x̃_k^{k−1} − K(k) v_k,        (7.52)

where we have used (7.14), (7.2), and (7.15). Therefore, the estimation error covariance matrix is computed to be

    P_k^k = E{x̃_k^k x̃_k^{k T}}
          = [I − K(k) M(k)] P_k^{k−1} [I − K(k) M(k)]^T + K(k) R_k K^T(k)
          = P_k^{k−1} − K(k) M(k) P_k^{k−1}.                           (7.53)

Comparing (7.14) and (7.53) with (7.29), we see that we have derived the Kalman-Bucy filter. ▲

Example 7.4 (Linear Minimum Variance Estimator). Suppose the estimate and its error covariance evolve according to (7.28) between observations. At an observation, assume a linear estimator of the form

    x̂_k^k = x̂_k^{k−1} + K(k)[y_k − M(k) x̂_k^{k−1}],                    (7.54)

where the gain K(k) is to be determined so as to minimize the error variance

    E{(x_k − x̂_k^k)^T (x_k − x̂_k^k)} = tr E{(x_k − x̂_k^k)(x_k − x̂_k^k)^T} ≜ tr P_k^k.    (7.55)

This approach to the linear filtering problem was taken by Battin [2, 3], Schmidt [47], and Sorenson [51]. With considerable hindsight, let

    K(k) = P_k^{k−1} M^T(k)[M(k) P_k^{k−1} M^T(k) + R_k]^{−1} + K̃(k),    (7.56)

and find that K̃(k) which minimizes (7.55). Using (7.54)-(7.56), we get

    tr P_k^k = tr[P_k^{k−1} − P_k^{k−1} M^T(M P_k^{k−1} M^T + R_k)^{−1} M P_k^{k−1}
               + K̃(M P_k^{k−1} M^T + R_k) K̃^T],                        (7.57)


where the time arguments have been suppressed. Now M P_k^{k−1} M^T + R_k > 0, so that

    tr K̃(M P_k^{k−1} M^T + R_k) K̃^T ≥ 0

for all K̃. Therefore, the minimizing K̃ = 0, and K(k) is the Kalman gain. From (7.57), we have the recursion for P. It is easy to rederive a known result. ▲

The several examples that follow consider some extensions to our filtering problem, as outlined in Section 5.4.

Example 7.5 (Correlated System and Measurement Noise: 1). Suppose the system noise (w_k) and measurement noise (v_k) are correlated, that is,

    E{w_k v_l^T} = C_k δ_{kl}.                                         (7.58)

The discrete Kalman-Bucy filter for this case was derived by Kalman [36]. [Note that his problem is slightly different from ours in that the noise in our system (7.27) is Γ(k) w_{k+1}, whereas Kalman has Γ(k) w_k.] We shall use the orthogonal projection lemma of Example 7.1 in our derivation. The reader can verify that the analysis in Example 7.1 up to, but not including, Eq. (7.36) applies in the present case. Equation (7.36) now becomes

    0 = P_k^{k−1} M^T(k) + Γ(k−1) C_k
        − K_k^c[M(k) P_k^{k−1} M^T(k) + M(k) Γ(k−1) C_k + C_k^T Γ^T(k−1) M^T(k) + R_k],    (7.59)

where, as before,

    P_k^{k−1} = Φ(k, k−1) P_{k−1}^{k−1} Φ^T(k, k−1) + Γ(k−1) Q_k Γ^T(k−1).    (7.60)

Therefore, the Kalman gain for this problem is

    K_k^c = [P_k^{k−1} M^T(k) + Γ(k−1) C_k][M(k) P_k^{k−1} M^T(k) + M(k) Γ(k−1) C_k
            + C_k^T Γ^T(k−1) M^T(k) + R_k]^{−1}.                       (7.61)

Using (7.35) and (7.61), it is easy to compute the recursion for P:

    P_k^k = P_k^{k−1} − K_k^c[M(k) P_k^{k−1} + C_k^T Γ^T(k−1)].         (7.62)


Thus, between observations, the estimate and covariance matrix still evolve according to (7.28). At an observation, the estimate satisfies the first of Eqs. (7.29), but with K_k replaced by K_k^c. The covariance matrix satisfies (7.62). ▲

Example 7.6 (Correlated System and Measurement Noise: 2). Suppose that, instead of (7.27), (7.2), our system is

    x_{k+1} = Φ(k+1, k) x_k + Γ(k) w_k,
    y_k = M(k) x_k + v_k,                                              (7.63)

with

    E{w_k v_l^T} = C_k δ_{kl}.                                         (7.64)

This corresponds to the problem treated by Kalman [36] and others. We derive the optimal filter for this problem using the orthogonal projection lemma again, but in a slightly different form. Instead of minimizing (7.30), we minimize

    E{(x_{k+1} − x̂_{k+1}^k)^T (x_{k+1} − x̂_{k+1}^k)}.

This will enable us to compute a linear recursion for x̂_{k+1}^k; that is, we will compute x̂_{k+1}^k as a linear function of x̂_k^{k−1}. (In Example 7.1, we computed the recursion for x̂_k^k.) The Kalman gain for this problem is more conveniently computed within this structure. By the orthogonal projection lemma, we have

    E{(x_{k+1} − x̂_{k+1}^k) u_i^T} = 0    for all i ≤ k.

Computations analogous to those in Example 7.1 lead to

    x̂_{k+1}^k = Φ(k+1, k) x̂_k^{k−1} + E{x_{k+1} u_k^T} u_k,

which is the analog of (7.33). It is easy to show that y_k − M(k) x̂_k^{k−1} is orthogonal to Y_{k−1}, and, since it is contained in Y_k, that

    E{x_{k+1} u_k^T} u_k = K_k^d[y_k − M(k) x̂_k^{k−1}].

Thus, the filter for this problem has the structure

    x̂_{k+1}^k = Φ(k+1, k) x̂_k^{k−1} + K_k^d[y_k − M(k) x̂_k^{k−1}].     (7.65)

It remains to determine the Kalman gain K_k^d.


From (7.63) and (7.65),

    x̃_{k+1}^k ≜ x_{k+1} − x̂_{k+1}^k
              = Φ(k+1, k) x̃_k^{k−1} + Γ(k) w_k − K_k^d[M(k) x̃_k^{k−1} + v_k],    (7.66)

and

    y_k = M(k)(x̃_k^{k−1} + x̂_k^{k−1}) + v_k.

Therefore, by the orthogonal projection lemma,

    0 = E{x̃_{k+1}^k y_k^T} = Φ(k+1, k) P_k^{k−1} M^T(k) + Γ(k) C_k − K_k^d[M(k) P_k^{k−1} M^T(k) + R_k],

so that the Kalman gain is

    K_k^d = [Φ(k+1, k) P_k^{k−1} M^T(k) + Γ(k) C_k][M(k) P_k^{k−1} M^T(k) + R_k]^{−1}.    (7.67)

Using (7.66) and (7.67), it is easy to compute the recursion for P:

    P_{k+1}^k ≜ E{x̃_{k+1}^k x̃_{k+1}^{k T}}
              = Φ(k+1, k) P_k^{k−1} Φ^T(k+1, k) + Γ(k) Q_k Γ^T(k)
                − K_k^d[M(k) P_k^{k−1} Φ^T(k+1, k) + C_k^T Γ^T(k)].     (7.68)

By setting Φ = I, Γ = 0 in (7.65) and (7.68) (so that x_{k+1} = x_k), we have

    x̂_k^k = x̂_k^{k−1} + K_k(y_k − M(k) x̂_k^{k−1}),
    P_k^k = P_k^{k−1} − K_k M(k) P_k^{k−1},                            (7.69)

where

    K_k = P_k^{k−1} M^T(k)[M(k) P_k^{k−1} M^T(k) + R_k]^{−1}.          (7.15)

Then, using (7.69) with (7.65) and (7.68),

    x̂_{k+1}^k = Φ(k+1, k) x̂_k^k + Γ(k) C_k[M(k) P_k^{k−1} M^T(k) + R_k]^{−1} (y_k − M(k) x̂_k^{k−1}),
    P_{k+1}^k = Φ(k+1, k) P_k^k Φ^T(k+1, k) + Γ(k) Q_k Γ^T(k)
                − Γ(k) C_k[M(k) P_k^{k−1} M^T(k) + R_k]^{−1} C_k^T Γ^T(k)
                − Φ(k+1, k) K_k C_k^T Γ^T(k) − Γ(k) C_k K_k^T Φ^T(k+1, k).⁴    (7.70)

Prediction for k > l + 1, based on Y_l, is clearly accomplished via

    x̂_{k+1}^l = Φ(k+1, k) x̂_k^l,    k = l + 1, l + 2, ...,

⁴ Equation (III_d), p. 303 in Kalman [36], which is analogous to this equation, is in error.
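The one-stage predictor of Example 7.6, Eqs. (7.65)-(7.68), can be sketched as follows (our illustration, not the book's). With C = 0 it collapses to the ordinary predict-after-update recursion, which the test case below checks in the scalar case.

```python
import numpy as np

def predictor_step(x, P, y, Phi, Gamma, Q, M, R, C):
    """One step of the one-stage predictor with correlated system and
    measurement noise, Eqs. (7.65), (7.67), (7.68).
    x, P are the predicted quantities x_hat_k^{k-1}, P_k^{k-1}; E{w_k v_k^T} = C."""
    S = M @ P @ M.T + R
    K = (Phi @ P @ M.T + Gamma @ C) @ np.linalg.inv(S)     # gain (7.67)
    x_new = Phi @ x + K @ (y - M @ x)                      # recursion (7.65)
    P_new = (Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
             - K @ (M @ P @ Phi.T + C.T @ Gamma.T))        # recursion (7.68)
    return x_new, P_new
```

For scalar Φ = Γ = Q = M = R = 1, C = 0, starting from x = 0, P = 1, y = 1: the gain is 1/2, the new estimate 0.5, and the new prediction covariance Φ(P − KMP)Φ^T + ΓQΓ^T = 1.5, matching the standard filter.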


where x̂_{l+1}^l, P_{l+1}^l is the filtering solution. This is easily verified directly from (7.63) (take the expectation with Y_l fixed). ▲

Example 7.7 (Sequentially Correlated (Colored) Measurement Noise). Consider the system (7.63), where {w_k} is white, Gaussian, w_k ~ N(0, Q_k), but {v_k} is sequentially correlated, the output of the linear system (see Example 4.12)

    v_{k+1} = Ψ(k+1, k) v_k + u_k,    v_0 ~ N(0, V_0).                 (7.71)

Here {u_k} is also white, Gaussian, u_k ~ N(0, S_k), independent of {w_k}. We could proceed as follows. Define the augmented state vector

    x̄_k = [x_k; v_k],    w̄_k = [w_k; u_k],

and the matrices

    Φ̄ = [Φ 0; 0 Ψ],    Γ̄ = [Γ 0; 0 I],    M̄ = [M, I].

Then the augmented dynamical system is

    x̄_{k+1} = Φ̄(k+1, k) x̄_k + Γ̄(k) w̄_k,    y_k = M̄(k) x̄_k,           (7.72)

with perfect measurements uncontaminated by noise. Assuming M̄(k) P̄_k^{k−1} M̄^T(k) is positive definite,⁵ the Kalman-Bucy filter of Theorem 7.2 can be used in system (7.72). Recall that its derivation in Example 7.1 does not require a nonsingular measurement noise covariance matrix. In fact, the derivation stands in the absence of measurement noise (R_k ≡ 0). Thus the difference equations for the augmented state estimation error covariance matrix are

    P̄_k^k = P̄_k^{k−1} − P̄_k^{k−1} M̄^T(k)[M̄(k) P̄_k^{k−1} M̄^T(k)]^{−1} M̄(k) P̄_k^{k−1},
    P̄_{k+1}^k = Φ̄(k+1, k) P̄_k^k Φ̄^T(k+1, k) + Γ̄(k) Q̄_k Γ̄^T(k).       (7.73)

Now, since the linear combinations of state variables M̄(k) x̄_k are known precisely at time t_k, P̄_k^k must be singular. In fact, from (7.73),

    M̄(k) P̄_k^k M̄^T(k) = 0.

⁵ In fact, this matrix need not be positive definite if, following Kalman [36], we use the pseudo-inverse rather than the inverse in (7.73).


As a consequence, P̄_{k+1}^k can be ill-conditioned, leading to difficulties in the computation of P̄_{k+1}^{k+1}. Furthermore, the dimension of the augmented state filter is n + m, which also compounds the computational problems. Following Bryson and Hendrickson [9], we introduce a measurement differencing scheme that results in measurements with additive white noise and leads to an equivalent filter of dimension n, in which computational ill-conditioning is eliminated, or at least reduced.

With the aid of (7.71), the measurement can be written as

    y_k = M(k) x_k + Ψ(k, k−1) v_{k−1} + u_{k−1}.

It is easily seen that v_{k−1} can be eliminated from y_k by subtracting Ψ(k, k−1) y_{k−1}. Thus,

    y_k − Ψ(k, k−1) y_{k−1} = M(k) x_k − Ψ(k, k−1) M(k−1) x_{k−1} + u_{k−1}
        = [M(k) Φ(k, k−1) − Ψ(k, k−1) M(k−1)] x_{k−1} + M(k) Γ(k−1) w_{k−1} + u_{k−1}.

Let

    H(k−1) ≜ M(k) Φ(k, k−1) − Ψ(k, k−1) M(k−1),                        (7.74)

and define the measurements

    ε_k ≜ y_k − Ψ(k, k−1) y_{k−1} = H(k−1) x_{k−1} + M(k) Γ(k−1) w_{k−1} + u_{k−1},    k ≥ 2.    (7.75)

The measurements ε_k, k ≥ 2, now contain additive white noise which, we note, is correlated with the system noise (w). The general approach is this. We process ε_1 with the augmented-state filter described earlier. Subsequent observations ε_k will be processed via the filter developed in Example 7.6, where the system is

    x_k = Φ(k, k−1) x_{k−1} + Γ(k−1) w_{k−1}.

Thus, aside from the starting procedure (processing ε_1), the filter will be of dimension n. Our measurement noise covariance matrix will be

    R̄_{k−1} = M(k) Γ(k−1) Q_{k−1} Γ^T(k−1) M^T(k) + S_{k−1},

which, we assume, is positive definite.


Assume, for simplicity, that t_0 = t_1. That is, the first observation is made at the initial time t_0. [If this is not the case, then a prediction via (7.28) propagates the initial conditions to t_1.] Then applying the augmented-state filter to ε_1 (= y_1) results in the following estimate of x_1 (not x̄_1) and the associated covariance matrix:

    x̂_1^1 = x̂_0 + P_0 M^T(1)[M(1) P_0 M^T(1) + V_0]^{−1} (y_1 − M(1) x̂_0),
    P_1^1 = P_0 − P_0 M^T(1)[M(1) P_0 M^T(1) + V_0]^{−1} M(1) P_0.      (7.76)

Now we use the n-dimensional filter of Example 7.6 on measurements ε_k, k ≥ 2, with initial conditions x̂_1^1 and P_1^1 from (7.76). The system is

    x_{k+1} = Φ(k+1, k) x_k + Γ(k) w_k,    k ≥ 1,
    ε_{k+1} = y_{k+1} − Ψ(k+1, k) y_k = H(k) x_k + M(k+1) Γ(k) w_k + u_k,
    H(k) = M(k+1) Φ(k+1, k) − Ψ(k+1, k) M(k).                          (7.77)

Note that ε_{k+1}, although it depends on x_k and is to be formally treated as a measurement at t_k, depends on y_{k+1}. As a consequence, in applying the filter equations (7.65) and (7.68) to (7.77), x̂_{k+1}^k and P_{k+1}^k in those filter equations are, in fact, x̂_{k+1}^{k+1} and P_{k+1}^{k+1}, respectively. That is to say, they are conditioned on Y_{k+1}, not on Y_k. Keeping this in mind, and noting that y_k in (7.65) is now ε_{k+1}, and M is now H, we have

    x̂_{k+1}^{k+1} = Φ(k+1, k) x̂_k^k + [Φ(k+1, k) P_k^k H^T(k) + Γ(k) C̄_k]
                    × [H(k) P_k^k H^T(k) + R̄_k]^{−1} [ε_{k+1} − H(k) x̂_k^k],
    P_{k+1}^{k+1} = Φ(k+1, k) P_k^k Φ^T(k+1, k) + Γ(k) Q_k Γ^T(k)
                    − [Φ(k+1, k) P_k^k H^T(k) + Γ(k) C̄_k]
                    × [H(k) P_k^k H^T(k) + R̄_k]^{−1} [H(k) P_k^k Φ^T(k+1, k) + C̄_k^T Γ^T(k)],    (7.78)

where

    C̄_k = Q_k Γ^T(k) M^T(k+1),
    R̄_k = M(k+1) Γ(k) Q_k Γ^T(k) M^T(k+1) + S_k.                       (7.79)

Note that the reduced filter (7.78) requires the storage of one measurement, since ε_k is the difference of two consecutive measurements. The reduced filter does not generate estimates of v_k, but these


estimates are, in general, not desired. It is easy to compute them, however. From the measurement equation (7.63), it readily follows that

    v̂_k^k = y_k − M(k) x̂_k^k,
    V_k^k ≜ E{(v_k − v̂_k^k)(v_k − v̂_k^k)^T} = M(k) P_k^k M^T(k).        (7.80)

From the system equations (7.77) and (7.71), it is easy to compute the prediction equations

    x̂_{k+1}^l = Φ(k+1, k) x̂_k^l,
    v̂_{k+1}^l = Ψ(k+1, k) v̂_k^l,
    P_{k+1}^l = Φ(k+1, k) P_k^l Φ^T(k+1, k) + Γ(k) Q_k Γ^T(k),
    V_{k+1}^l = Ψ(k+1, k) V_k^l Ψ^T(k+1, k) + S_k,    k = l, l+1, ...,    (7.81)

where x̂_l^l, P_l^l, v̂_l^l, V_l^l are the filtering solutions.

Now the reduced filter (7.78) was derived formally, assuming that optimal filtering of observations ε_k is equivalent to optimal filtering of observations y_k. This latter fact remains to be proved. Actually, it is easy to prove using the orthogonal projection lemma (see Rishel [46]). The fact that (7.78) is the (unique) minimum variance filter will be proved if we show that

    E{(x_k − x̂_k^k) y_l^T} = 0,    all l ≤ k.                          (7.82)

Now it was proved in Example 7.6 that

    E{(x_k − x̂_k^k) ε_l^T} = 0,    all l ≤ k.                          (7.83)

But ε_1 = y_1, so (7.82) is true for l = 1. Suppose (7.82) is true for l. Then

    E{(x_k − x̂_k^k) y_{l+1}^T} = E{(x_k − x̂_k^k) ε_{l+1}^T} + E{(x_k − x̂_k^k) y_l^T} Ψ^T(l+1, l) = 0,

using (7.77), (7.83), and the inductive hypothesis. This proves (7.82). Since v̂_k^k = y_k − M(k) x̂_k^k, (7.82) also implies that

    E{(v_k − v̂_k^k) y_l^T} = 0,    all l ≤ k.    ▲
Example 7.8 (Linear Smoother). We develop here one form of the linear smoother using the Bayesian maximum likelihood approach of

216

LINEAR FILTERING THEORY

Rauch et al. [45]. Other derivations and (equivalent) forms of the linear smoother may be found in the references cited in Section 5.1. Our system is (7.27), (7.2) with {wk} and {Vk} independent. We assume for simplicity (see Section 3.9) that F(k) and Q(k) are n X nand nonsingular. The solution in the general case, without this assumption, is the same, but the derivation requires the cumbersome machinery developed in Section 3.9 to get an expression for the transition density. We leave that case for the reader. Hint: the Dirac delta function is formally maximized when its argument is zero. Set its argument equal to zero, and use that relation as a constraint. Assuming the filtering solution (Xk k, Pl) is available, we seek a recursion for xk', k < I. Now the conditional means Xk' and x/Hl are those values of X k and Xk+1 , respectively, which maximize the (Gaussian) density Now x I Y) - P(Xk , Xk+1 , Y,) _ P(Xk , Xk+1 , Yk, Yk+l ,..., y,) p(~,~ ~~ ~~

,=

p(Yk ) p( Y,) P(Xk , Xk+l 'Yk+l ,..., Yl I Yk) p(Yk )

= p(Y,) P(Yk+l ,..·,Y,I Xk' Xk+l' Yk)P(Xk, xk+11 Yk). But and

in view of the properties of our system, so that

where c is independent of X k • The densities in (7.84) are Gaussian, specifically, P(Xk+1 I Xk) P(Xk I Yk)

f"-.J

f"-.J

N(c.P(k + 1, k) Xk , r(k) Qk+lf'T(k)J, N(Xkk, Pkk).

Therefore, to maximize (7.84) with respect to

Xk , Xk+1 ,

+ 1, k) XkJT (T(k) Qk+lf'T(k)J-l [x~ + i(xk - xkk)T Pfl(X x/) + d(xk+l)'

![Xk+1 - c.P(k

k -

c.P(k

we minimize

+ 1, k) XkJ

217

DISCRETE FILTER

where d is independent of Xk available. Then xk I minimizes

J = Hi k+1 Xk •

Suppose

the minimizing

Xk+l'

is

Xkk)T Pfl(X k - Xkk)

Setting the gradient of ] with respect to

Xk = [p~-l + <1>T(k '

Xk+1 ,

<1>xkF [rQk+1J'T]-l [x~+1 - <1>xJ

+ t(x k with respect to zero, we get



Xk

equal to

+ I, k)[r(k) Qk+1rr(k)]-l <1>(k + I, k)]-l

x [pk;lxkk + <1>T(k + I, k)[r(k) Qk+lrr(k)]-l xk+1]'

(7.85)

Applying Eqs. (7B.5) and (7B.6) of Appendix 7B, we have Xk' = x/

where

s, =

Pl<1>T(k

+ Sixk+1 -

<1>(k

+ I, k) x/),

(7.86)

+ 1, k)[<1>(k + 1, k) Pkk<1>T(k + I, k) + r(k) Qk+lrr(k)]-l (7.87)

Equation (7.86) is the smoothing algorithm. The computation goes backwards (in the index k) with x̂_l^l, the filtering solution, as initial condition. The filter solutions (x̂_k^k, P_k^k) are required in the computations.

We now develop a recursion for the smoothing error covariance matrix. From (7.86),

x_k − x̂_k^l + S_k x̂_{k+1}^l = x_k − x̂_k^k + S_k Φ(k + 1, k) x̂_k^k.

Squaring both sides and taking the expectation, we get

P_k^l + S_k E{x̂_{k+1}^l x̂_{k+1}^{lT}} S_k^T = P_k^k + S_k Φ E{x̂_k^k x̂_k^{kT}} Φ^T S_k^T,   (7.88)

where we have used the fact that

E{(x_k − x̂_k^l) x̂_{k+1}^{lT}} = 0,   E{(x_k − x̂_k^k) x̂_k^{kT}} = 0.   (7.89)

It is easy to see that (7.89) holds. For example,

E{(x_k − x̂_k^l) x̂_k^{kT}} Φ^T = E{E{x_k − x̂_k^l | Y_l} x̂_k^{kT}} Φ^T = E{(x̂_k^l − x̂_k^l) x̂_k^{kT}} Φ^T = 0.

Now

E{x̂_{k+1}^l x̂_{k+1}^{lT}} = E{x_{k+1} x_{k+1}^T} − P_{k+1}^l,   (7.90)

in view of (7.89). Similarly,

E{x̂_k^k x̂_k^{kT}} = E{x_k x_k^T} − P_k^k.   (7.91)

Also,

E{x_{k+1} x_{k+1}^T} = Φ(k + 1, k) E{x_k x_k^T} Φ^T(k + 1, k) + Γ(k) Q_{k+1} Γ^T(k).   (7.92)

Using (7.90)-(7.92) in (7.88), we get

P_k^l = P_k^k + S_k[P_{k+1}^l − Φ(k + 1, k) P_k^k Φ^T(k + 1, k) − Γ(k) Q_{k+1} Γ^T(k)] S_k^T   (7.93)

or

P_k^l = P_k^k + S_k[P_{k+1}^l − P_{k+1}^k] S_k^T,   (7.94)

which is the desired recursion. It is also computed backwards, with P_l^l as initial condition.

Suppose our system is noise-free (Γ ≡ 0). Then, from (7.86) and (7.94), it is easy to see that smoothing is simply backward prediction:

x̂_k^l = Φ^-1(k + 1, k) x̂_{k+1}^l,
P_k^l = Φ^-1(k + 1, k) P_{k+1}^l Φ^-T(k + 1, k).
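The covariance recursions above can be mechanized in a few lines. The sketch below (a hypothetical scalar system chosen for illustration, not from the text) runs the filter covariances forward, then runs the smoothing recursion of (7.87) and (7.94) backward, and illustrates that smoothing never increases the covariance:

```python
# Hypothetical scalar system: x_{k+1} = phi x_k + gam w, y_k = m x_k + v.
phi, gam, q, m, r = 0.95, 1.0, 0.1, 1.0, 0.5
l = 20                      # final index for smoothing
P_filt, P_pred = [], []     # P_k^k and P_{k+1}^k
P = 1.0                     # P_0^0
for k in range(l + 1):
    P_filt.append(P)
    Pp = phi * P * phi + gam * q * gam                   # prediction
    P_pred.append(Pp)
    P = Pp - Pp * m * (m * Pp * m + r) ** -1 * m * Pp    # measurement update

# Backward recursion: gain (7.87), covariance (7.94).
P_smooth = [None] * (l + 1)
P_smooth[l] = P_filt[l]
for k in range(l - 1, -1, -1):
    S = P_filt[k] * phi / P_pred[k]                      # S_k of (7.87), scalar form
    P_smooth[k] = P_filt[k] + S * (P_smooth[k + 1] - P_pred[k]) * S

improved = all(P_smooth[k] <= P_filt[k] + 1e-12 for k in range(l + 1))
```

Since P_{k+1}^l ≤ P_{k+1}^k at every step, the correction term in (7.94) is never positive, which the run confirms.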

See Bryson and Henrikson [9] for the linear smoothing algorithm in the case of sequentially correlated measurement noise. ▲

4. CONTINUOUS FILTER

Our continuous linear system is described by the vector (Itô) stochastic differential equation (7.1), which we repeat here,

dx_t = F(t) x_t dt + G(t) dβ_t,   (7.95)

and the vector (Itô) observation equation,

dz_t = M(t) x_t dt + dη_t.   (7.96)

Equation (7.95) is described in Section 2. The observed process {z_t, t ≥ t_0} is an m-vector process, M(t) is an m × n, nonrandom, continuous matrix time-function, and {η_t, t ≥ t_0} is an m-vector Brownian motion process with E{dη_t dη_t^T} = R(t) dt, R(t) > 0. x_{t_0}, {β_t}, and {η_t} are assumed independent. The formal analogy between (7.95), (7.96), and

dx_t/dt = F(t) x_t + G(t) w_t,   (7.95')
y_t = M(t) x_t + v_t,   (7.96')

where {w_t} and {v_t} are white Gaussian noise processes, w_t ~ N(0, Q(t)), v_t ~ N(0, R(t)), was noted in Section 5.1. (See also Chapter 4, especially Section 4.8.)

For this linear filtering problem, Kushner's equation (6.79) of Theorem 6.5 becomes

dp = −p tr(F) dt − p_x^T F x dt + ½ tr(G Q G^T p_xx) dt + (x − x̂)^T M^T R^-1 (dz_t − M x̂ dt) p.   (7.97)

We obtain the equations for the conditional mean and covariance matrix by specializing Theorem 6.6 to the linear case. Since f = Fx, h = Mx,

E{x h^T | Y_t} − x̂ ĥ^T = (E{x x^T | Y_t} − x̂ x̂^T) M^T,

and similarly for the terms involving x f^T. Since, in this case, the conditional density is Gaussian, its third central moments are zero, and, as a consequence, the last term in (6.101) is zero. We therefore have

Theorem 7.3. The optimal (minimum variance) filter for the continuous system (7.95), (7.96) consists of the following differential equations for the conditional mean and covariance matrix:

dx̂_t^t = F(t) x̂_t^t dt + P_t^t M^T(t) R^-1(t)(dz_t − M(t) x̂_t^t dt),   (7.98)

dP_t^t/dt = F(t) P_t^t + P_t^t F^T(t) + G(t) Q(t) G^T(t) − P_t^t M^T(t) R^-1(t) M(t) P_t^t,   t ≥ t_0.   (7.99)

Prediction is accomplished as in the continuous-discrete problem (Theorem 7.1). [Set R^-1 ≡ 0 in (7.98) and (7.99).]

This is the now well-known continuous Kalman-Bucy filter [33, 36]. The variance equation (7.99) is the well-known matrix Riccati equation. It is independent of the observed process and, as a result, the Kalman gain

K(t) = P_t^t M^T(t) R^-1(t)   (7.100)

may be precomputed. The discussion of Theorem 7.1 applies to Theorem 7.3 as well. Since the Kalman gain is a nonrandom time-function, the coefficient of dz_t in the filter equation (7.98) is nonrandom. As a result, Eq. (7.98) may be manipulated by formal rules (see Section 4.6). We may write (7.98) as

dx̂_t^t/dt = F(t) x̂_t^t + P_t^t M^T(t) R^-1(t)(y_t − M(t) x̂_t^t).   (7.98')

Compare Theorem 7.3 with Example 6.4.

The mechanization of the continuous Kalman-Bucy filter requires the solution of the variance equation (7.99). We must verify that the variance equation has a unique solution existing for all t ≥ t_0. Now since the matrices F, G, M, Q, and R are assumed continuous, and, being quadratic in P, the right-hand side of (7.99) is locally Lipschitzian, standard existence and uniqueness proofs (see Coddington and Levinson [11]) show that the variance equation has a unique solution locally. We now show that the unique solution of (7.99) can be continued for all t ≥ t_0, provided P_{t_0} is positive semidefinite (a covariance matrix). It suffices to show that P_t^t is bounded for all t ≥ t_0 [11]. By Theorem 2.13,

P_t^t = P̄_t − B,   (7.101)

where B ≥ 0 and P̄_t is the unconditioned covariance matrix of the solution of (7.95) given only the initial condition. But it is easy to compute from (7.95) [see (7.19)-(7.25)] that

P̄_t = Φ(t, t_0) P_{t_0} Φ^T(t, t_0) + ∫_{t_0}^t Φ(t, τ) G(τ) Q(τ) G^T(τ) Φ^T(t, τ) dτ,   (7.102)

which is bounded for all t ≥ t_0. This, together with (7.101), shows that P_t^t is bounded for all t ≥ t_0. As a result, the variance equation has a unique solution existing for all t ≥ t_0.

It is useful to note, and can easily be verified, that the solution of the (nonlinear) variance equation can be obtained as a solution of the linear system:

dX/dt = −F^T(t) X + M^T(t) R^-1(t) M(t) Y,   X(t_0) = I,
dY/dt = G(t) Q(t) G^T(t) X + F(t) Y,   Y(t_0) = P_{t_0}.   (7.103)

For then

Y(t) = P_t^t X(t)   for all t ≥ t_0,   (7.104)

so that

P_t^t = Y(t) X^-1(t).   (7.105)

This result was derived by Kalman [36]. The next few examples give alternate derivations of the continuous Kalman-Bucy filter.

Example 7.9 (Conditional Characteristic Function). We use here the method of Example 4.15 (characteristic function), together with Itô's stochastic calculus, to derive the Kalman-Bucy filter from Kushner's equation (7.97). The conditional characteristic function (2.122) is defined as

φ(u, t) = E{exp(iu^T x_t) | Y_t} = ∫ exp(iu^T x) p(x, t | Y_t) dx.

Therefore,

dφ = ∫ exp(iu^T x) dp dx.

Substituting from Kushner's equation and carrying out the integrations by parts, we get

dφ = u^T F φ_u dt − ½ φ u^T G Q G^T u dt − (iφ_u^T + φ x̂_t^{tT}) M^T(t) R^-1(t)(dz_t − M(t) x̂_t^t dt).   (7.106)

This is a stochastic partial differential equation for the conditional characteristic function. Now since p is Gaussian, we know that the conditional characteristic function is given by (2.130):

φ = exp(iu^T x̂_t^t − ½ u^T P_t^t u),   (7.107)

so that φ_u = φ(i x̂_t^t − P_t^t u). Substituting this in (7.106), we get

dφ = φ iu^T[F x̂_t^t dt + P_t^t M^T R^-1 (dz_t − M x̂_t^t dt)] − ½ φ u^T(F P_t^t + P_t^t F^T + G Q G^T) u dt.   (7.108)

We next compute dφ from (7.107) using Itô's lemma, and compare the result with (7.108). Consider first the imaginary part. In view of (7.107), it will not involve second partials with respect to the elements of P_t^t, nor will it involve dP_t^t, since these are real. Thus we have

Im(dφ/φ) = u^T dx̂_t^t.

Comparing this with (7.108), we immediately have

dx̂_t^t = F(t) x̂_t^t dt + P_t^t M^T R^-1 (dz_t − M x̂_t^t dt),

which is (7.98). Now we compute Re(dφ) from (7.107). Using Itô's lemma,

Re(dφ/φ) = −½ tr(P_t^t M^T R^-1 R R^-1 M P_t^t u u^T) dt − ½ tr(u u^T dP_t^t)
  + (terms quartic in the elements of u resulting from second partials of φ with respect to the elements of P_t^t),   (7.109)

where we have used the fact that x̂_t^t satisfies (7.98). Now there are no terms quartic in the elements of u in (7.108). Therefore, such terms must have zero coefficients in (7.109). Thus (7.109) becomes

Re(dφ/φ) = −½ u^T P_t^t M^T R^-1 M P_t^t u dt − ½ u^T dP_t^t u,

and, comparing this with (7.108), we get (7.99). ▲
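The linear-system representation (7.103)-(7.105) of the variance equation is easy to check numerically. The sketch below (constant coefficients, all values assumed for illustration) integrates X and Y with small Euler steps and compares Y X^-1 against a direct Euler integration of the Riccati equation (7.99):

```python
import numpy as np

# Assumed illustrative constant-coefficient system.
F = np.array([[0.0, 1.0], [-1.0, -0.5]])
G = np.eye(2); Q = 0.2 * np.eye(2)
M = np.array([[1.0, 0.0]]); Rinv = np.array([[2.0]])   # R^{-1}
P0 = np.eye(2)

dt, n_steps = 1e-3, 2000
X, Y = np.eye(2), P0.copy()          # X(t0) = I, Y(t0) = P_{t0}, as in (7.103)
P = P0.copy()                        # direct integration of (7.99)
for _ in range(n_steps):
    dX = -F.T @ X + M.T @ Rinv @ M @ Y
    dY = G @ Q @ G.T @ X + F @ Y
    X, Y = X + dt * dX, Y + dt * dY
    P = P + dt * (F @ P + P @ F.T + G @ Q @ G.T - P @ M.T @ Rinv @ M @ P)

P_lin = Y @ np.linalg.inv(X)         # (7.105)
err = np.max(np.abs(P_lin - P))
```

Both integrations track the same solution, so the discrepancy is only the (first-order) discretization error.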

Example 7.10 (Limit of Discrete Filter). We obtain the continuous filter (Theorem 7.3) from the discrete filter (Theorem 7.2) by the formal limiting process that was already employed in Example 4.18. For Δt = t_{k+1} − t_k small, Eq. (7.95') is integrated to produce the difference equation

x_{t_{k+1}} = [I + Δt F(t_k) + O(Δt²)] x_{t_k} + w̄_k,   (7.110)

where

w̄_k ~ N[0, Δt G(t_k) Q(t_k) G^T(t_k) + O(Δt²)],

according to Example 3.20. The observations (7.96') are

y_{t_k} = M(t_k) x_{t_k} + v_{t_k},   (7.111)

with v_{t_k} ~ N[0, R(t_k)/Δt]. Comparing (7.110), (7.111) with (7.27), (7.2),

Φ(k + 1, k) = I + Δt F(t_k) + O(Δt²),   Γ(k) Q_{k+1} Γ^T(k) = Δt G(t_k) Q(t_k) G^T(t_k) + O(Δt²),   R_k = R(t_k)/Δt.   (7.112)

Using these relations in Theorem 7.2, we get

(x̂_{t_{k+1}}^{t_{k+1}} − x̂_{t_k}^{t_k})/Δt = F(t_k) x̂_{t_k}^{t_k} + (P_{t_k}^{t_k} + O(Δt)) M^T(t_k)
  × [Δt M(t_k)(P_{t_k}^{t_k} + O(Δt)) M^T(t_k) + R(t_k)]^-1 (y_{t_k} − M(t_k) x̂_{t_k}^{t_k} − O(Δt)) + O(Δt),

(P_{t_{k+1}}^{t_{k+1}} − P_{t_k}^{t_k})/Δt = F(t_k) P_{t_k}^{t_k} + P_{t_k}^{t_k} F^T(t_k) + G(t_k) Q(t_k) G^T(t_k)
  − (P_{t_k}^{t_k} + O(Δt)) M^T(t_k)[Δt M(t_k)(P_{t_k}^{t_k} + O(Δt)) M^T(t_k) + R(t_k)]^-1 [M(t_k)(P_{t_k}^{t_k} + O(Δt))] + O(Δt).

In the Δt → 0 limit, we get Theorem 7.3. ▲
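The limiting argument can be illustrated numerically: with R_k = R/Δt, the one-step change of the discrete covariance, divided by Δt, should approach the Riccati right-hand side of (7.99) as Δt shrinks. A minimal sketch, with assumed illustrative matrices:

```python
import numpy as np

# Assumed illustrative system.
F = np.array([[0.0, 1.0], [0.0, 0.0]])
GQG = np.diag([0.0, 0.3])            # G Q G^T
M = np.array([[1.0, 0.0]]); R = np.array([[0.4]])
P = np.array([[1.0, 0.2], [0.2, 2.0]])

riccati = F @ P + P @ F.T + GQG - P @ M.T @ np.linalg.inv(R) @ M @ P

def one_step_change(dt):
    """Discrete covariance step of Theorem 7.2 with R_k = R/dt, divided by dt."""
    Phi = np.eye(2) + dt * F
    Pp = Phi @ P @ Phi.T + dt * GQG                 # prediction
    S = M @ Pp @ M.T + R / dt                       # innovation covariance
    K = Pp @ M.T @ np.linalg.inv(S)
    Pu = Pp - K @ M @ Pp                            # update
    return (Pu - P) / dt

err1 = np.max(np.abs(one_step_change(1e-3) - riccati))
err2 = np.max(np.abs(one_step_change(1e-5) - riccati))
```

The residual shrinks with Δt, consistent with the O(Δt) terms in the derivation above.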

Example 7.11 (Calculus of Variations). We solve the continuous least squares problem (5.24) using the calculus of variations [20]. Minimize

J_t = ½(x_{t_0} − x̄_{t_0})^T P_{t_0}^-1 (x_{t_0} − x̄_{t_0}) + ½ ∫_{t_0}^t [(y_τ − M x_τ)^T R^-1 (y_τ − M x_τ) + w_τ^T Q^-1 w_τ] dτ   (7.113)

with respect to x_τ and w_τ, t_0 ≤ τ ≤ t, subject to the constraint

dx_τ/dτ = F(τ) x_τ + G(τ) w_τ.   (7.114)

This variational approach to estimation was first taken by Bryson and Frazier [7]. The solution x_τ^t is the smoothing solution. In view of the constraint (7.114), x_{t_0} and w_τ, t_0 ≤ τ ≤ t, determine x_τ, t_0 ≤ τ ≤ t. Thus, we minimize with respect to x_{t_0} and w_τ, t_0 ≤ τ ≤ t; x_τ will then be determined by (7.114) once the minimizing x_{t_0} and w_τ are established. Using the standard calculus of variations technique, we adjoin the constraint (7.114) to J_t via a vector Lagrange multiplier function λ(τ):

J̄_t = ½(x_{t_0} − x̄_{t_0})^T P_{t_0}^-1 (x_{t_0} − x̄_{t_0})
  + ∫_{t_0}^t {½(y_τ − M x_τ)^T R^-1 (y_τ − M x_τ) + ½ w_τ^T Q^-1 w_τ + λ^T(τ)[dx_τ/dτ − F x_τ − G w_τ]} dτ.

We compute the first variation of J̄_t:

δJ̄_t = [(x_{t_0} − x̄_{t_0})^T P_{t_0}^-1 − λ^T(t_0)] δx_{t_0} + λ^T(t) δx_t
  + ∫_{t_0}^t {−[(y_τ − M x_τ)^T R^-1 M + λ̇^T + λ^T F] δx_τ + [w_τ^T Q^-1 − λ^T G] δw_τ} dτ.

Necessary conditions for δJ̄_t = 0 are

(x_{t_0} − x̄_{t_0})^T P_{t_0}^-1 − λ^T(t_0) = 0,   (7.115)
λ(t) = 0,   (7.116)
dλ/dτ = −F^T λ − M^T R^-1 (y_τ − M x_τ),   (7.117)
w_τ^T Q^-1 − λ^T G = 0.   (7.118)

Eliminating w_τ from (7.117) and (7.114) via (7.118), we have the two-point boundary-value problem:

dλ(τ)/dτ = −F^T λ(τ) + M^T R^-1 M x_τ − M^T R^-1 y_τ,   (7.119)
dx_τ/dτ = G Q G^T λ(τ) + F x_τ,   (7.120)
x_{t_0} = x̄_{t_0} + P_{t_0} λ(t_0),   (7.121)
λ(t) = 0.   (7.122)

Compare (7.119), (7.120) with (7.103). This problem is linear and easy to solve [7]. Let y(τ) and g(τ) be particular solutions of (7.120) and (7.119), respectively, with initial conditions

y(t_0) = x̄_{t_0},   g(t_0) = 0.   (7.123)

Let Y(τ) and X(τ) be as defined in (7.103). Then it is easy to show that the smoothing solution is

x_τ^t = y(τ) − Y(τ) X^-1(t) g(t)   (7.124)

and λ(τ) = g(τ) − X(τ) X^-1(t) g(t). Now we are interested in the filtering solution

x̂_t^t = y(t) − Y(t) X^-1(t) g(t) = y(t) − P_t^t g(t),   (7.125)

in view of (7.105). We already saw that P_t^t satisfies the variance equation. To obtain the differential equation for the estimate, we differentiate (7.125) with respect to t:

dx̂_t^t/dt = dy(t)/dt − (dP_t^t/dt) g(t) − P_t^t dg(t)/dt.

Using the variance equation, (7.119), and (7.120), it is easy to show that

dx̂_t^t/dt = F x̂_t^t + P_t^t M^T R^-1 (y_t − M x̂_t^t). ▲
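The sweep solution (7.123)-(7.125) can be compared numerically with direct integration of the filter (7.98'). The sketch below (scalar, constant coefficients, constant observation record, all assumed for illustration) integrates y(τ), g(τ), X(τ), Y(τ), and the filter estimate with the same Euler steps, then checks identity (7.125) at the final time:

```python
# Assumed scalar constants: dx = f x + w, observation y_obs = x + v.
f, q, r = -0.5, 0.2, 0.5          # F, G Q G^T, R
P0, xbar0, y_obs = 1.0, 0.0, 1.0  # prior statistics and a constant observation

dt, n = 1e-4, 10000               # integrate on [0, 1]
y_p, g = xbar0, 0.0               # particular solutions, initial conditions (7.123)
X, Y = 1.0, P0                    # linear system (7.103)
xhat = xbar0                      # filter (7.98')
for _ in range(n):
    P = Y / X                     # (7.105)
    dxhat = f * xhat + P / r * (y_obs - xhat)
    dy_p = q * g + f * y_p        # (7.120) with lambda -> g
    dg = -f * g + (y_p - y_obs) / r   # (7.119), scalar, M = 1
    dX = -f * X + Y / r
    dY = q * X + f * Y
    xhat += dt * dxhat
    y_p += dt * dy_p
    g += dt * dg
    X += dt * dX
    Y += dt * dY

sweep = y_p - (Y / X) * g         # right-hand side of (7.125)
err = abs(xhat - sweep)
```

The two routes agree to discretization accuracy, which is the content of (7.125).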

Example 7.12 (Dynamic Programming). The solution of the continuous least squares problem of Example 7.11 via dynamic programming [4] was first given by Cox [13]. The problem of minimizing J_t (7.113) subject to constraint (7.114) is imbedded in a family of problems parameterized by values of x_t by defining the cost function

V(x_t, t) = min over {w_τ, t_0 ≤ τ ≤ t} of J_t, subject to (7.114) and the terminal value x_t.   (7.126)

Keeping in mind the probabilistic interpretation of the least squares problem (Sect. 5.3), −V(x, t) can be interpreted as a measure of the likelihood of the most likely trajectory passing through x_t = x. The filtering solution x̂_t^t is clearly that value of x_t which minimizes V. By the principle of optimality [4], we have the following functional equation for V:

V(x_t, t) = min over {w_τ, t − δt ≤ τ ≤ t} of { V(x_t − δx, t − δt)
  + ½ ∫_{t−δt}^t [(y_τ − M x_τ)^T R^-1 (y_τ − M x_τ) + w_τ^T Q^-1 w_τ] dτ }.   (7.127)

Now for δt small, the integral in (7.127) is

½[(y_t − M x_t)^T R^-1 (y_t − M x_t) + w_t^T Q^-1 w_t] δt + o(δt),

and

V(x_t − δx, t − δt) = V(x_t, t) − V_t(x_t, t) δt − V_x^T(x_t, t) δx + o(δt),

so that (7.127) becomes

V_t = min over w of {−V_x^T(δx/δt) + ½(y_t − M x_t)^T R^-1 (y_t − M x_t) + ½ w_t^T Q^-1 w_t} + o(δt)/δt.

Substituting for δx/δt from (7.114) and passing to the δt → 0 limit, we formally get the Bellman partial differential equation

V_t = min over w_t of {−V_x^T(F x_t + G w_t) + ½(y_t − M x_t)^T R^-1 (y_t − M x_t) + ½ w_t^T Q^-1 w_t}.   (7.128)

The minimizing w_t is easy to compute by setting the derivative of the term in braces equal to zero:

w_t = Q G^T V_x.   (7.129)

Comparing (7.129) with (7.118), we see that V_x plays the role of the Lagrange multiplier. Thus we finally have

V_t = −V_x^T F x_t − ½ V_x^T G Q G^T V_x + ½(y_t − M x_t)^T R^-1 (y_t − M x_t).   (7.130)

The boundary condition is clearly

V(x_{t_0}, t_0) = ½(x_{t_0} − x̄_{t_0})^T P_{t_0}^-1 (x_{t_0} − x̄_{t_0}).

The solution of the estimation problem is reduced to the solution of (7.130). Once V(x(τ), τ) is found, then, in view of (7.129), backward integration of (7.114) produces x_τ^t. To get a formal filtering solution, assume V of the form

V(x_t, t) = ½(x_t − x̂_t^t)^T (P_t^t)^-1 (x_t − x̂_t^t) + (terms independent of x_t).   (7.131)

x̂_t^t clearly minimizes V with respect to x_t. From (7.131),

V_x = (P_t^t)^-1 (x_t − x̂_t^t).

Substituting these expressions into (7.130) and equating like powers of x_t, we get

d(P_t^t)^-1/dt = −(P_t^t)^-1 F − F^T (P_t^t)^-1 − (P_t^t)^-1 G Q G^T (P_t^t)^-1 + M^T R^-1 M.   (7.132)

Since P P^-1 = I,

dP/dt = −P[d(P^-1)/dt] P.

Using this fact, (7.132) gives the variance equation. ▲

Example 7.13 (Invariant Imbedding). We return to the variational two-point boundary-value problem (7.119)-(7.122) of Example 7.11, and obtain the filtering solution by invariant imbedding (see Bellman et al. [5] and Detchmendy and Sridhar [14]). Suppose we solve the problem (7.119)-(7.122) and obtain the missing terminal condition on x_t, call it x̂_t^t. Then, for a new time t' > t, the two-point boundary-value problem has to be re-solved to produce λ(t') = 0 and obtain x̂_{t'}^{t'}. If the solution of the original problem is continued from t to t', then, in general, λ(t') = c ≠ 0. Clearly, x_t is a function of t and c:

x_t = r(c, t)   (7.133)

and

x̂_t^t = r(0, t).   (7.134)

We have imbedded the original problem in a family of problems parameterized by c. r(0, t) is the solution of our filtering problem. Now, on the one hand,

r(c + δc, t + δt) = r(c, t) + (G Q G^T c + F r) δt + o(δt),   (7.135)

in view of (7.120), and on the other, using a Taylor series expansion,

r(c + δc, t + δt) = r(c, t) + r_c δc + r_t δt + o(δt),   (7.136)

where, from (7.119),

δc = [−F^T c + M^T R^-1 M r − M^T R^-1 y_t] δt.

Equating (7.135) and (7.136), dividing by δt, and passing to the δt → 0 limit, we formally get

r_t + r_c[−F^T c + M^T R^-1 M r − M^T R^-1 y_t] − G Q G^T c − F r = 0.   (7.137)

This is a partial differential equation for r. Assume a solution of the form

r(c, t) = x̂_t^t + P(t) c.   (7.138)

This is motivated by (7.134). Then, from (7.138),

r_c = P(t),   r_t = dx̂_t^t/dt + Ṗ(t) c.   (7.139)

Substituting (7.138) and (7.139) into (7.137), and regarding the result as an identity in c (equate coefficients of c⁰ and c to zero), produces the continuous Kalman-Bucy filter. ▲
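The coefficient matching at the end of Example 7.13 is an algebraic identity, and it can be checked numerically: if dx̂/dt and Ṗ are taken from the Kalman-Bucy filter, the residual of (7.137) with r = x̂ + Pc vanishes for every c. A minimal sketch, with assumed illustrative values:

```python
import numpy as np

# Assumed illustrative values; any symmetric P and any vectors will do.
F = np.array([[0.1, 1.0], [-0.7, -0.2]])
GQG = np.array([[0.3, 0.0], [0.0, 0.5]])          # G Q G^T
M = np.array([[1.0, 0.5]]); Rinv = np.array([[4.0]])
P = np.array([[0.8, 0.1], [0.1, 0.6]])
xhat = np.array([[0.4], [-1.2]])
y = np.array([[0.9]])
c = np.array([[0.7], [-0.3]])

# Kalman-Bucy right-hand sides, (7.98') and (7.99):
dxhat = F @ xhat + P @ M.T @ Rinv @ (y - M @ xhat)
dP = F @ P + P @ F.T + GQG - P @ M.T @ Rinv @ M @ P

# Residual of (7.137) with r = xhat + P c, r_c = P, r_t = dxhat + dP c:
r = xhat + P @ c
residual = (dxhat + dP @ c) + P @ (-F.T @ c + M.T @ Rinv @ M @ r - M.T @ Rinv @ y) \
           - GQG @ c - F @ r
max_res = np.max(np.abs(residual))
```

The c⁰ terms cancel by (7.98') and the c¹ terms cancel by (7.99), so the residual is zero to rounding error.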

In the next three examples, we obtain the continuous filter in the case of correlated system and measurement noise, colored measurement noise, and the continuous linear smoother by using the limiting procedure of Example 7.10 in Examples 7.5-7.8.

Example 7.14 (Correlated System and Measurement Noise). Suppose in the system (7.95), (7.96) that

E{dβ_t dη_t^T} = C(t) dt.

Then, using the limiting procedure of Example 7.10 in the filter of Example 7.5, we easily obtain

dx̂_t^t/dt = F(t) x̂_t^t + [P_t^t M^T(t) + G(t) C(t)] R^-1(t)(y_t − M(t) x̂_t^t),

dP_t^t/dt = F(t) P_t^t + P_t^t F^T(t) + G(t) Q(t) G^T(t)
  − [P_t^t M^T(t) + G(t) C(t)] R^-1(t)[M(t) P_t^t + C^T(t) G^T(t)].   (7.140)

The reader can verify for himself that the same result is obtained from the discrete filter of Example 7.6. The filter (7.140) was first derived by Kalman [36], and also by Cox [13]. ▲
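As a quick consistency check (an illustrative sketch, not from the text), one can verify numerically that the variance equation in (7.140) reduces to the standard Riccati right-hand side (7.99) when C(t) ≡ 0, and differs from it otherwise:

```python
import numpy as np

# Assumed illustrative matrices.
F = np.array([[0.0, 1.0], [-0.4, -0.3]])
G = np.eye(2); Q = 0.2 * np.eye(2)
M = np.array([[1.0, 0.0]]); Rinv = np.array([[2.5]])
P = np.array([[1.0, 0.1], [0.1, 0.8]])

def dP_correlated(C):
    """Variance equation of (7.140) with cross covariance C (noise dim x m)."""
    gain = (P @ M.T + G @ C) @ Rinv
    return F @ P + P @ F.T + G @ Q @ G.T - gain @ (M @ P + C.T @ G.T)

dP_riccati = F @ P + P @ F.T + G @ Q @ G.T - P @ M.T @ Rinv @ M @ P
err0 = np.max(np.abs(dP_correlated(np.zeros((2, 1))) - dP_riccati))
diff = np.max(np.abs(dP_correlated(0.3 * np.ones((2, 1))) - dP_riccati))
```

With C = 0 the two expressions coincide exactly; a nonzero C changes both the gain and the covariance rate.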

= L(t) Vt + u, ,

V to '"

Suppose

VI

in (7.96') is

N(O, V),

{Ut} is white, Gaussian, Ut '""'-' N[O, S(t)], independent of {WI}' We apply

the limiting procedure of Example 7.10 to the discrete filter of Example 7.7. We have Vtk+1 - v tk = L(t k ) Vtk LJt

+ LJt Utk + o(LJt),

229

CONTINUOUS FILTER

where so that

Note that Uk in (7.71) is replaced by Llt u ,•. Then, t k ~ t ~ tk+l , gk+l

= [dYtjdt -L(t)Yt] LIt

+ o(Llt),

+ o(Llt), H(t) g dMjdt + [M(t)F(t) -L(t) M(t)], c. = Q(t) Gf(t) MT(t) + O(Llt), R = R(t) LIt + o(Llt), R(t) g M(t) G(t) Q(t) GT(t) MT(t) + S(t).

H(k) = H(t) LIt

(7.142)

k

(7.143)

Therefore, from (7.78), (x::~~ - x;:)jLlt = Fx:: + [P;:HT + GQGTMT + O(Llt)]

x [R LIt + o(LI t)r1 [(dyjdt) LIt - Ly LIt where we have omitted the time argument t. In the Llt dXttjdt = F(t) Xt t X

Hx~: LIt

+ o(St)],

= 0 limit, we get

+ [PttHT(t) + G(t)Q(t) GT(t)MT(t)]

R-l(t)[dYtidt -L(t)Yt - H(t)x/].

(7.144)

The filter equation (7.144) involves the differentiation of observations. This can be removed for practical applications in the following way. Define B(t) g [PttHT(t) + G(t)Q(t) GT(t) MT(t)] R-l(t). (7.145) Since

(7.144) becomes

~ (x/

- B(t) Yt)

=

F(t) Xt l

+ B(t)[ -L(t) Yt -

H(t) x/] - B(t) Yt .

(7.146)

230

LINEAR FILTERING THEORY

Similarly, from (7.78) we get the variance equation dP//dt

= F(t) P/

+ P/P(t) + G(t) Q(t) GT(t) -

B(t) R(t) BT(t).

(7.147)

The initial conditions for (7.146), (7.147) follow from (7.76): x:~

= xto + PtoW(to)[M(to) PtoW(to) + Vl-l (Yt o -

M(t o) Xt o)'

(7.148)

Thus, there is a discontinuity in the estimate and the covariance matrix at the initial time. This filter was first derived by Bryson and Johansen [8], and later by Mehra and Bryson [39], who used variational techniques. The latter reference also derives the smoother for the colored measurement noise case. ▲

Example 7.16 (Continuous Linear Smoother). Using again the limiting procedure of Example 7.10, applied to the discrete smoother of Example 7.8, we derive the continuous smoother for system (7.95), (7.96). Let t_k ≤ t ≤ t_{k+1} ≤ t_f, t_f fixed. From (7.86),

x̂_{t_k}^{t_f} − x̂_{t_k}^{t_k} = S_k(x̂_{t_{k+1}}^{t_f} − Φ(t_{k+1}, t_k) x̂_{t_k}^{t_k}),

where now

S_k = P_{t_k}^{t_k} Φ^T(t_{k+1}, t_k)[Φ(t_{k+1}, t_k) P_{t_k}^{t_k} Φ^T(t_{k+1}, t_k) + Δt G(t_k) Q(t_k) G^T(t_k) + o(Δt)]^-1.

But

Φ(t_{k+1}, t_k) = I + Δt F(t_k) + O(Δt²)

and

S_k = I − Δt[F(t_k) + G(t_k) Q(t_k) G^T(t_k)(P_{t_k}^{t_k})^-1] + o(Δt),

and so, in the Δt → 0 limit, we get

dx̂_t^{t_f}/dt = F(t) x̂_t^{t_f} + G(t) Q(t) G^T(t)(P_t^t)^-1 (x̂_t^{t_f} − x̂_t^t).   (7.149)

Similarly,

dP_t^{t_f}/dt = [F(t) + G(t) Q(t) G^T(t)(P_t^t)^-1] P_t^{t_f} + P_t^{t_f}[F^T(t) + (P_t^t)^-1 G(t) Q(t) G^T(t)] − G(t) Q(t) G^T(t).   (7.150)

Equations (7.149) and (7.150) are to be integrated backwards from t_f, with initial conditions x̂_{t_f}^{t_f} and P_{t_f}^{t_f}, the filtering solutions at t_f.

This smoothing algorithm was first derived by Rauch, Tung, and Striebel [45] (see also Bryson and Frazier [7]). ▲

5. OBSERVABILITY AND INFORMATION

Assuming that a dynamical system (dynamics and measurement equation) has been defined, the preceding sections have dealt with derivations of the optimal filter for that system. Now it is important to ask what, if anything, can be gained from filtering the data. How much information about the state of the system is contained in the data? Can the state be determined from the data? Intuitively, it would seem that answers to such questions are related to the system model itself, and indeed this is so. The importance of such questions is obvious. If little is to be gained from filtering, then we should consider remodeling the system. This might involve taking additional or alternate measurements, or redesigning the dynamics of the system.

How well the state is known is measured by the estimation error covariance matrix P_t^t. But P_t^t depends on its initial condition P_{t_0} (the initial data) and does not reflect the uncertainty in the estimate by virtue of filtering the data alone. Consider the discrete least squares problem of Example 7.2. Set P_0^-1 = 0, which means that no weight is attached to the prior estimate x̄_0; there is no prior information. Then from (7.41), it is clear that, in order to determine x_k, the information matrix

J(k, 1) ≜ Σ_{i=1}^{k} Φ^T(i, k) M^T(i) R_i^-1 M(i) Φ(i, k)   (7.151)

must be positive definite. If J(k, 1) is singular, then certain linear combinations of the elements of x_k cannot be determined; there is no information about them in the data {y_1,..., y_k}. Notice that the information matrix depends on Φ and M, the system model, and not on the data themselves. Also note that the dynamical system in Example 7.2 is noise-free. From (7.44), we see that uncertainty is inversely proportional to information.

This discussion motivates the following definitions, which are due to Kalman [29, 34]. The discrete dynamical system (7.27), (7.2) is said to be completely observable if, and only if,

J(k, 0) > 0   for some k > 0.⁶   (7.152)

⁶ We assume that the first observation is taken at t_0 rather than at t_1.
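The information matrix is easy to compute directly from its definition, or recursively in the form used in (7.155) below. The following sketch (a hypothetical constant-coefficient system, assumed for illustration) computes J(k, 0) both ways and tests the observability condition (7.152):

```python
import numpy as np

# Assumed two-state example: constant Phi, single noisy measurement of state 1.
Phi = np.array([[1.0, 0.1], [0.0, 1.0]])
M = np.array([[1.0, 0.0]]); Rinv = np.array([[1.0 / 0.2]])
k_max = 5

def J_def(k):
    """J(k, 0) by direct summation, as in (7.151) with the sum starting at i = 0."""
    J = np.zeros((2, 2))
    for i in range(k + 1):
        Phi_ik = np.linalg.matrix_power(Phi, i - k)   # Phi(i, k) = Phi^{i-k}
        J += Phi_ik.T @ M.T @ Rinv @ M @ Phi_ik
    return J

# Recursive form: J(k+1, 0) = Phi^{-T} J(k, 0) Phi^{-1} + M^T R^{-1} M.
Jr = M.T @ Rinv @ M                  # J(0, 0)
Phi_inv = np.linalg.inv(Phi)
for _ in range(k_max):
    Jr = Phi_inv.T @ Jr @ Phi_inv + M.T @ Rinv @ M

match = np.allclose(Jr, J_def(k_max))
observable = bool(np.all(np.linalg.eigvalsh(Jr) > 0))
```

Here only the first state is measured, but the coupling in Φ makes J(k, 0) positive definite; with Φ = I the same computation would leave J rank-deficient, and the second state unobservable.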

It is uniformly completely observable if there exist a positive integer N and positive constants α, β such that

0 < αI ≤ J(k, k − N) ≤ βI   (7.153)

for all k ≥ N.⁷ Finally, we say that x_k is completely observable with respect to {y_l, y_{l+1},..., y_k} if, and only if,

J(k, l) > 0   (k ≥ l).   (7.154)

We next note some properties of the information matrix, already developed in Example 7.2, for a noise-free system (w_k ≡ 0). We saw that the information matrix satisfies the following difference equation:

J(k + 1, 0) = Φ^T(k, k + 1) J(k, 0) Φ(k, k + 1) + M^T(k + 1) R_{k+1}^-1 M(k + 1),   (7.155)

and is related to the covariance matrix by

(P_k^k)^-1 = Φ^T(0, k) P_0^-1 Φ(0, k) + J(k, 0).   (7.156)

A concept dual to that of observability is the concept of controllability, also introduced by Kalman [29, 34, 35]. Consider the effect of the forcing term in (7.27). At t_k, the difference between x_k and Φ(k, 0) x_0 is

Δ_k = Σ_{i=0}^{k-1} Φ(k, i + 1) Γ(i) w_{i+1}

and

E{Δ_k Δ_k^T} = Σ_{i=0}^{k-1} Φ(k, i + 1) Γ(i) Q_{i+1} Γ^T(i) Φ^T(k, i + 1).

We define the controllability matrix

C(k, 0) ≜ Σ_{i=0}^{k-1} Φ(k, i + 1) Γ(i) Q_{i+1} Γ^T(i) Φ^T(k, i + 1).   (7.157)

The discrete dynamical system (7.27), (7.2) is said to be completely controllable if, and only if,

C(k, 0) > 0   for some k > 0.   (7.158)

⁷ For two symmetric matrices A and B, A > B (A ≥ B) means that A − B is positive definite (positive semidefinite).

It is uniformly completely controllable if there exist a positive integer N and positive constants α, β such that

0 < αI ≤ C(k, k − N) ≤ βI   (7.159)
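The controllability matrix (7.157) admits the same kind of direct computation. In the sketch below (an assumed two-state example with noise entering only the second state), one step of system noise excites a single direction, and two steps excite the whole state space:

```python
import numpy as np

# Assumed example: noise enters the second state only.
Phi = np.array([[1.0, 0.1], [0.0, 1.0]])
Gam = np.array([[0.0], [1.0]]); Q = np.array([[0.3]])

def C_mat(k):
    """Controllability matrix (7.157): sum of Phi(k, i+1) Gam Q Gam^T Phi^T(k, i+1)."""
    C = np.zeros((2, 2))
    for i in range(k):
        Phi_k_i1 = np.linalg.matrix_power(Phi, k - (i + 1))
        C += Phi_k_i1 @ Gam @ Q @ Gam.T @ Phi_k_i1.T
    return C

rank1 = np.linalg.matrix_rank(C_mat(1))   # noise has reached one direction
rank2 = np.linalg.matrix_rank(C_mat(2))   # the coupling spreads it to both
```

This illustrates why complete controllability in (7.158) is a statement about some k > 0 rather than about k = 1.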

for all k ≥ N.

Uniform complete observability and controllability imply stochastic observability and controllability as defined by Aoki [1]. We prefer the present definitions because they involve intrinsic properties of the system.

Completely analogous definitions are made for the continuous dynamical system (7.95), (7.96). In the limit of continuous observations [replace R_i by R(t_i)/Δt, Q_i by Q(t_i)/Δt, and Γ(i) by Δt G(t_i); see Example 7.10], we have the information matrix

𝒥(t, t_0) ≜ ∫_{t_0}^t Φ^T(τ, t) M^T(τ) R^-1(τ) M(τ) Φ(τ, t) dτ   (7.160)

and the controllability matrix

𝒞(t, t_0) ≜ ∫_{t_0}^t Φ(t, τ) G(τ) Q(τ) G^T(τ) Φ^T(t, τ) dτ.   (7.161)

The continuous dynamical system (7.95), (7.96) is said to be completely observable (controllable) if, and only if,

𝒥(t, t_0) > 0   (𝒞(t, t_0) > 0)   for some t > t_0.   (7.162)

It is uniformly completely observable (controllable) if there exist positive constants σ, α, β such that

0 < αI ≤ 𝒥(t, t − σ) ≤ βI   (0 < αI ≤ 𝒞(t, t − σ) ≤ βI)   for all t ≥ t_0 + σ.   (7.163)

See Kalman [29, 34, 35] for equivalent and simpler conditions for controllability and observability of constant (stationary) continuous systems.

By the limiting procedure of Example 7.10 applied to (7.155) (or by differentiating (7.160)), we get the differential equation for the information matrix in the continuous case:

d𝒥/dt = −F^T(t) 𝒥 − 𝒥 F(t) + M^T(t) R^-1(t) M(t).   (7.164)

M^T(t) R^-1(t) M(t) is appropriately called the information rate matrix. Comparing (7.164) with (7.132), we see that 𝒥 is analogous to (P_t^t)^-1; 𝒥^-1 satisfies the variance equation in the case of no system noise.
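For constant coefficients the equivalence of the Gramian (7.160) and the differential equation (7.164) can be verified directly. A scalar sketch (assumed constants, chosen so the Gramian has a closed form):

```python
import math

# Scalar illustration (assumed constants): F = f, M = m, R = r.
f, m, r, t = -0.3, 1.0, 0.5, 2.0
rate = m * m / r                       # scalar information rate M^T R^{-1} M

# Euler integration of (7.164): dJ/dt = -2 f J + rate, J(0) = 0.
dt, J = 1e-4, 0.0
for _ in range(int(t / dt)):
    J += dt * (-2.0 * f * J + rate)

# Gramian (7.160) in closed form: integral of rate * exp(2 f (tau - t)).
J_exact = rate * (1.0 - math.exp(-2.0 * f * t)) / (2.0 * f)
err = abs(J - J_exact)
```

The integration confirms that the ODE solution coincides with the observability Gramian to discretization accuracy.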

In the next several sections, we use the concepts of observability and controllability in a study of qualitative properties of linear filters.

6. BOUNDS AND STABILITY-DISCRETE FILTER

This section is devoted to the development of qualitative properties of the discrete filter (Theorem 7.2). Our main objective is to prove, under appropriate conditions, the uniform asymptotic stability of the discrete filter. In this connection, we note that optimality does not imply stability. The stability needs to be proved. In the course of proving stability, we develop upper and lower bounds on the covariance matrix P_k^k and study its asymptotic properties. Our development is based on the work of Kalman [36], Deyst and Price [15], and Sorenson [52].

Before proceeding, it will be useful to write the discrete filter in the equivalent form [see (7.11)]

x̂_k^k = P(k | k)[P^-1(k | k − 1) Φ(k, k − 1) x̂_{k-1}^{k-1} + M^T(k) R_k^-1 y_k],   (7.165)

P(k | k) = [P^-1(k | k − 1) + M^T(k) R_k^-1 M(k)]^-1,
P(k + 1 | k) = Φ(k + 1, k) P(k | k) Φ^T(k + 1, k) + Γ(k) Q_{k+1} Γ^T(k).   (7.166)

The filter equation (7.165) can also be written as

x̂_k^k = [I − K(k) M(k)] Φ(k, k − 1) x̂_{k-1}^{k-1} + K(k) y_k.   (7.167)

[I − K(k) M(k)] Φ(k, k − 1) ≜ Ψ(k, k − 1) is the state transition matrix of the filter, and, in view of (7B.1), it is nonsingular.

Lemma 7.1. If the dynamical system (7.27), (7.2) is uniformly completely observable and uniformly completely controllable, and if P_0 ≥ 0, then P(k | k) is uniformly bounded from above for all k ≥ N:

P(k | k) ≤ J^-1(k, k − N) + C(k, k − N) ≤ [(1 + αβ)/α] I,   k ≥ N.

PROOF:

Consider the estimate

x̄_k^N = J^-1(k, k − N) Σ_{i=k-N}^{k} Φ^T(i, k) M^T(i) R_i^-1 y_i,   k ≥ N.

In view of Eq. (7.41) of Example 7.2, this is the least squares estimate of x_k based on the most recent N observations, ignoring the system noise w_k. This estimate is suboptimal (not minimum variance) and, as a consequence,

P(k | k) ≤ E{(x_k − x̄_k^N)(x_k − x̄_k^N)^T}.   (7.168)

We compute the covariance matrix in (7.168). Since, from (7.27),

x_i = Φ(i, k) x_k − Φ(i, k) Σ_{j=i}^{k-1} Φ(k, j + 1) Γ(j) w_{j+1},

the measurement y_i can be written as

y_i = M(i) Φ(i, k) x_k + v_i − M(i) Φ(i, k) Σ_{j=i}^{k-1} Φ(k, j + 1) Γ(j) w_{j+1}.

Therefore,

x_k − x̄_k^N = −J^-1(k, k − N) Σ_{i=k-N}^{k} Φ^T(i, k) M^T(i) R_i^-1 v_i
  + J^-1(k, k − N) Σ_{i=k-N}^{k} Φ^T(i, k) M^T(i) R_i^-1 M(i) Φ(i, k) Σ_{j=i}^{k-1} Φ(k, j + 1) Γ(j) w_{j+1},

and

E{(x_k − x̄_k^N)(x_k − x̄_k^N)^T}
  = J^-1(k, k − N) + cov{J^-1(k, k − N) Σ_{i=k-N}^{k} Φ^T(i, k) M^T(i) R_i^-1 M(i) Φ(i, k) Σ_{j=i}^{k-1} Φ(k, j + 1) Γ(j) w_{j+1}}
  ≤ J^-1(k, k − N) + C(k, k − N).

This, together with (7.168), proves the lemma. ∎
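Lemma 7.1 can be illustrated numerically; the sketch below (a hypothetical uniformly observable and controllable scalar system) runs the filter covariance recursion and checks the bound P(k | k) ≤ J^-1(k, k − N) + C(k, k − N):

```python
# Assumed scalar system: x_{k+1} = phi x_k + gam w, y_k = m x_k + v.
phi, gam, q, m, r = 0.9, 1.0, 0.2, 1.0, 0.4
N, k = 3, 12
P = 5.0                                  # large P_0
for _ in range(k):
    Pp = phi * P * phi + gam * q * gam               # prediction
    P = Pp - Pp * m / (m * Pp * m + r) * m * Pp      # update

# Scalar J(k, k-N) and C(k, k-N), directly from their definitions.
J = sum(phi ** (2 * (i - k)) * m * m / r for i in range(k - N, k + 1))
C = sum(phi ** (2 * (k - i - 1)) * gam * gam * q for i in range(k - N, k))
bound_holds = P <= 1.0 / J + C
```

The bound is independent of the large initial P_0, in line with the remark that the initial statistic is gradually forgotten.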



By considering the estimate

x̄_k = J^-1(k, 0) Σ_{i=0}^{k} Φ^T(i, k) M^T(i) R_i^-1 y_i,

based on all the observations, and imitating the proof of the lemma, we can derive the bound

P(k | k) ≤ J^-1(k, 0) + C(k, 0),   k ≥ N.   (7.169)

Suppose our dynamical system is uniformly completely observable and noise-free (Γ ≡ 0). Then

P(k | k) ≤ J^-1(k, 0),   k ≥ N.   (7.170)

[This also follows directly from (7.156).] Defining

J̄(k, 0) ≜ Φ^T(k, 0) J(k, 0) Φ(k, 0) = Σ_{i=0}^{k} Φ^T(i, 0) M^T(i) R_i^-1 M(i) Φ(i, 0),   (7.171)

(7.170) becomes

P(k | k) ≤ Φ(k, 0) J̄^-1(k, 0) Φ^T(k, 0).   (7.172)

Now

J̄(k, 0) = J̄(N, 0) + J̄(2N, N) + ···,

the sum of positive definite matrices. Thus, if ||J̄^-1(k, 0)|| → 0 faster than ||Φ(k, 0)||² increases, as k → ∞, then P(k | k) → 0 (k → ∞). The state of the system can be estimated as accurately as desired. This also shows that the initial statistic P_0 is gradually forgotten. The system noise, in the general system (7.27), prevents us from estimating the state exactly.

Returning to the noise-free system, we see from (7.156) that, if the system is completely observable, then P(k | k) is positive definite for some k. Furthermore, it is evident that, once P(k | k) > 0, it remains positive definite thereafter. [This last property is also true for the general system, in view of (7.166).] In contrast with this, we saw in Example 7.7 that, if the measurements are noise-free, P(k | k) is never positive definite.

Lemma 7.2. If the dynamical system (7.27), (7.2) is uniformly completely observable and uniformly completely controllable, and if P_0 > 0,⁸ then P(k | k) is uniformly bounded from below for all k ≥ N:

P(k | k) ≥ [J(k, k − N) + C^-1(k, k − N)]^-1 ≥ [α/(1 + αβ)] I,   k ≥ N.

PROOF: It is evident from (7.166) that P(k | k) > 0 for all k, since P_0 > 0. Define

S(k | k) ≜ P^-1(k | k),   (7.173)
S̄(k | k) ≜ S(k | k) − M^T(k) R_k^-1 M(k),   (7.174)
S(k + 1 | k) ≜ Φ^-T(k + 1, k) S(k | k) Φ^-1(k + 1, k).   (7.175)

Then, using (7.166),

S̄(k | k) = [S^-1(k | k − 1) + Γ(k − 1) Q_k Γ^T(k − 1)]^-1,
S(k + 1 | k) = Φ^-T(k + 1, k) S̄(k | k) Φ^-1(k + 1, k) + Φ^-T(k + 1, k) M^T(k) R_k^-1 M(k) Φ^-1(k + 1, k).   (7.176)

Comparing (7.176) with (7.166), we see that S(k | k) is the estimation error covariance matrix of a dual system (7.177), in which Φ^-T(k + 1, k) plays the role of the transition matrix, Φ^-T(k + 1, k) M^T(k) R_k^-1 M(k) Φ^-1(k + 1, k) plays the role of the system noise covariance, and Γ(k − 1) Q_k Γ^T(k − 1) plays the role of the measurement information matrix. It is easy to verify that (7.177) is uniformly completely observable and uniformly completely controllable, since the dynamical system (7.27), (7.2) is. Therefore, applying Lemma 7.1 to (7.177) and carrying out the indicated transformations, we have that

S̄(k | k) ≤ C^-1(k, k − N) + Σ_{i=k-N}^{k-1} Φ^T(i, k) M^T(i) R_i^-1 M(i) Φ(i, k).

Now, in view of (7.174),

S(k | k) ≤ C^-1(k, k − N) + J(k, k − N),

and, in view of (7.173),

P(k | k) ≥ [C^-1(k, k − N) + J(k, k − N)]^-1. ∎

⁸ The assumption that P_0 > 0 is no loss of generality.



Lemma 7.3. If the dynamical system (7.27), (7.2) is uniformly completely controllable and P_0 ≥ 0, then P(k | k) > 0 for all k ≥ N.

PROOF: In view of (7.166), if P(N | N) > 0, then P(k | k) > 0 for all k ≥ N. So it suffices to prove that P(N | N) is nonsingular. Suppose P(N | N) is singular. Then there exists a vector v ≠ 0 such that

v^T P(N | N) v = 0.   (7.178)

Let

S(k) ≜ Ψ(N, k) P(k | k) Ψ^T(N, k),   (7.179)

where

Ψ(k, k − 1) = [I − K(k) M(k)] Φ(k, k − 1)

is the filter state transition matrix [see (7.167)]. It is easy to compute

S(k) − S(k − 1) = Ψ(N, k)[[I − K(k) M(k)] Γ(k − 1) Q_k Γ^T(k − 1)[I − K(k) M(k)]^T + K(k) R_k K^T(k)] Ψ^T(N, k),   (7.180)

using Theorem 7.2 and (7.16). Now S(0) ≥ 0, S(k) − S(k − 1) ≥ 0, and, in view of (7.178), v^T S(N) v = 0. Therefore, v^T S(k) v = 0 for all k ≤ N. That is,

v^T Ψ(N, k) P(k | k) Ψ^T(N, k) v = 0,   all k ≤ N,   (7.181)

or, equivalently,

P(k | k) Ψ^T(N, k) v = 0,   all k ≤ N.   (7.182)

Now let

w(k) ≜ Ψ^T(N, k) v,   w(N) = v.   (7.183)

We easily compute

w(k − 1) = Ψ^T(k, k − 1) w(k)   (7.184)

and

Φ^T(k − 1, k) w(k − 1) = [I − K(k) M(k)]^T Ψ^T(N, k) v.   (7.185)

Since v^T[S(k) − S(k − 1)]v = 0, 1 ≤ k ≤ N, using (7.185) in (7.180), we have that

w^T(k − 1) Φ(k − 1, k) Γ(k − 1) Q_k Γ^T(k − 1) Φ^T(k − 1, k) w(k − 1) = 0,   1 ≤ k ≤ N.   (7.186)

Now (7.184) can be written as

w(k − 1) = Φ^T(k, k − 1) w(k) − Φ^T(k, k − 1) M^T(k) K^T(k) w(k).   (7.187)

But

K^T(k) w(k) = [M(k) P(k | k − 1) M^T(k) + R_k]^-1 M(k) P(k | k − 1) Ψ^T(N, k) v,

and, using (7.29) and (7.182), it is easy to show that

P(k | k − 1) Ψ^T(N, k) v = 0,   all k ≤ N.

Therefore,

w(k − 1) = Φ^T(k, k − 1) w(k),

which has the solution

w(k) = Φ^T(N, k) v.   (7.188)

As a result, (7.186) becomes

v^T Φ(N, k) Γ(k − 1) Q_k Γ^T(k − 1) Φ^T(N, k) v = 0,   1 ≤ k ≤ N,

and

v^T [Σ_{k=1}^{N} Φ(N, k) Γ(k − 1) Q_k Γ^T(k − 1) Φ^T(N, k)] v = 0,

which is

v^T C(N, 0) v = 0.

But this contradicts the hypothesis of uniform complete controllability, and the lemma is proved. ∎

We are now ready to consider the stability of the filter. For the definitions of various types of stability, and stability theory in general, the reader is referred to Kalman and Bertram [30, 31] and Hahn [23]. We are concerned here only with the linear discrete system (7.165) or (7.167). The linear system

z_k = Ψ(k, k − 1) z_{k-1} + u_k   (7.189)

is said to be stable⁹ if

||Ψ(k, k_0)|| ≤ c_1 < ∞   for all k ≥ k_0.   (7.190)

The system is asymptotically stable if, in addition,

||Ψ(k, k_0)|| → 0   as k → ∞.   (7.191)

It is uniformly asymptotically stable if

||Ψ(t_k, t_0)|| ≤ c_2 exp[−c_3(t_k − t_0)].   (7.192)

(The c_i are positive constants.) We note in passing that, in uniformly asymptotically stable systems, bounded inputs u_k produce bounded outputs z_k.

Kalman and Bertram [30, 31] show that (7.189) is uniformly asymptotically stable if there exist scalar functions V(z_k, t_k), γ_1(||z_k||), γ_2(||z_k||), and γ_3(||z_k||) such that, for some integers N, M > 0,

γ_1(||z_k||) ≤ V(z_k, t_k) ≤ γ_2(||z_k||),   all k ≥ M,   (7.193)

where γ_1(||z||) → ∞ (||z|| → ∞); and

V(z_k, t_k) − V(z_{k-N}, t_{k-N}) ≤ −γ_3(||z_k||) < 0,   z_k ≠ 0,   all k ≥ M,   (7.194)

where γ_3(0) = 0; and V(0, t_k) ≡ 0; γ_1, γ_2, and γ_3 are continuous, and γ_1 and γ_2 are nondecreasing. The function V is called a Lyapunov function for the system (7.189). We shall prove the uniform asymptotic stability of the linear filter by producing a Lyapunov function (and γ_1, γ_2, γ_3) for it.

⁹ When speaking of the stability of (7.189), we mean the stability of the homogeneous part of (7.189).

Theorem 7.4. If the dynamical system (7.27), (7.2) is uniformly completely observable and uniformly completely controllable, and if P₀ ≥ 0, then the discrete linear filter of Theorem 7.2 is uniformly asymptotically stable.

PROOF: We shall prove that

V(z_k, t_k) = z_k^T P^{-1}(k|k) z_k    (7.195)

is a Lyapunov function for (7.165), or for

z_k = P(k|k) P^{-1}(k|k−1) Φ(k, k−1) z_{k−1}.    (7.196)

In view of Lemma 7.3, P(k|k) > 0 for k ≥ N. Then, by Lemmas 7.1 and 7.2,

(1/α₂) I ≤ P^{-1}(k|k) ≤ (1/α₁) I,    k ≥ 2N,

where α₁, α₂ > 0 are the bounds on P(k|k) furnished by those lemmas, so that

(1/α₂) ‖z_k‖² ≤ V(z_k, t_k) ≤ (1/α₁) ‖z_k‖².

This establishes property (7.193). It remains to establish (7.194). Following Deyst and Price [15], we define

u_k ≜ [P(k|k) P^{-1}(k|k−1) − I] Φ(k, k−1) z_{k−1},
z̄_k ≜ Φ(k, k−1) z_{k−1}.    (7.197)

Then the system (7.196) can be written as

z_k = z̄_k + [P(k|k) P^{-1}(k|k−1) − I] z̄_k = z̄_k + u_k.    (7.198)

Using (7.166), we have

z_k^T P^{-1}(k|k) z_k = z_k^T [P^{-1}(k|k−1) + M^T(k) R_k^{-1} M(k)] z_k
 = z̄_k^T P^{-1}(k|k−1) z̄_k − z_k^T M^T(k) R_k^{-1} M(k) z_k − u_k^T P^{-1}(k|k−1) u_k
 = z_{k−1}^T [P(k−1|k−1) + Φ(k−1, k) Γ(k−1) Q_k Γ^T(k−1) Φ^T(k−1, k)]^{-1} z_{k−1}
   − z_k^T M^T(k) R_k^{-1} M(k) z_k − u_k^T P^{-1}(k|k−1) u_k
 ≤ z_{k−1}^T P^{-1}(k−1|k−1) z_{k−1} − z_k^T M^T(k) R_k^{-1} M(k) z_k − u_k^T P^{-1}(k|k−1) u_k.

Therefore,

V(z_k, t_k) − V(z_{k−1}, t_{k−1}) ≤ −z_k^T M^T(k) R_k^{-1} M(k) z_k − u_k^T P^{-1}(k|k−1) u_k,

and

V(z_k, t_k) − V(z_{k−N−1}, t_{k−N−1}) ≤ − Σ_{i=k−N}^{k} [z_i^T M^T(i) R_i^{-1} M(i) z_i + u_i^T P^{-1}(i|i−1) u_i] ≜ −J,    k ≥ 2N + 1.    (7.199)

Let J⁰ be the minimum value of J. Then we have

V(z_k, t_k) − V(z_{k−N−1}, t_{k−N−1}) ≤ −J ≤ −J⁰,    k ≥ 2N + 1.    (7.200)

We are thus led to the problem of minimizing J with respect to {u_i} for the system (7.198):

z_i = Φ(i, i−1) z_{i−1} + u_i,    i = k − N, …, k;    z_{k−N−1} given.    (7.201)

This is an ordinary minimum problem once the z_i, i ≥ k − N, are eliminated from J in favor of z_{k−N−1} via (7.201). This problem is readily solved for {u_i⁰}, the minimizing {u_i}, as a function of z_{k−N−1}, which results in J⁰ as a function of z_{k−N−1}. Then, using the hypothesis of uniform complete observability, it can be shown that J⁰ is bounded below by a positive definite quadratic form in z_{k−N−1}. Then, in view of (7.196),

J⁰ ≥ c₅ ‖z_k‖²,    c₅ > 0,

and (7.200) becomes

V(z_k, t_k) − V(z_{k−N−1}, t_{k−N−1}) ≤ −c₅ ‖z_k‖²,    k ≥ 2N + 1,

which establishes property (7.194). ∎
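The Lyapunov argument above can be exercised numerically. The following is a minimal scalar sketch (all constants hypothetical, not from the text): it runs the scalar covariance recursion of Theorem 7.2, propagates the homogeneous filter state as in (7.196), and checks that V(z_k, t_k) = z_k²/P(k|k) decreases at every step.

```python
# Minimal scalar illustration of Theorem 7.4 (constants hypothetical):
# for x_{k+1} = phi*x_k + w, y_k = x_k + v, the homogeneous filter
# dynamics z_k = (1 - K_k)*phi*z_{k-1} admit the Lyapunov function
# V(z_k) = z_k^2 / P(k|k), which is monotonically decreasing.
phi, q, r = 0.95, 0.5, 1.0   # hypothetical scalar system, M = 1
P = 2.0                      # P(0|0)
z = 1.0                      # initial homogeneous filter state
V_prev = z * z / P
decreasing = True
for k in range(1, 40):
    Pm = phi * P * phi + q           # predicted variance P(k|k-1)
    K = Pm / (Pm + r)                # Kalman gain
    P = (1.0 - K) * Pm               # filtered variance P(k|k)
    z = (1.0 - K) * phi * z          # homogeneous filter dynamics (7.196)
    V = z * z / P                    # Lyapunov function (7.195)
    if V >= V_prev:
        decreasing = False
    V_prev = V
```

The monotone decrease of V, together with the two-sided bounds on P(k|k), is exactly what drives the uniform asymptotic stability conclusion.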



An immediate consequence of Theorem 7.4 is

Theorem 7.5. Let the dynamical system (7.27), (7.2) be uniformly completely observable and uniformly completely controllable. Suppose P₁(k|k) and P₂(k|k) are any two solutions of (7.166) with respective initial conditions P₀₁ and P₀₂; P₀₁, P₀₂ ≥ 0. Let δP(t_k) = P₁(k|k) − P₂(k|k). Then

‖δP(t_k)‖ ≤ c₂² e^{−2c₃(t_k − t_0)} ‖P₀₁ − P₀₂‖ → 0    (k → ∞)    (c₂, c₃ > 0).

We assume, of course, that all the matrices in the system description are bounded.

PROOF: It is easy to verify that δP(t_k) satisfies the difference equation

δP(t_k) = [I − K₁(k) M(k)] Φ(k, k−1) δP(t_{k−1}) Φ^T(k, k−1) [I − K₂(k) M(k)]^T,    (7.202)

where K₁ and K₂ are the Kalman gains corresponding to P₁(k|k) and P₂(k|k), respectively. But

[I − K(k) M(k)] Φ(k, k−1) = Ψ(k, k−1)

is the filter state transition matrix [see (7.167)]. Therefore, the solution of (7.202) is

δP(t_k) = Ψ₁(k, 0) δP(t_0) Ψ₂^T(k, 0).    (7.203)

According to Theorem 7.4, the filter is uniformly asymptotically stable, and, by definition, its state transition matrix has property (7.192). Taking norms in (7.203) and using (7.192) proves the theorem. ∎

Theorem 7.5 has two important applications. It essentially states that the effect of the initial statistic P₀ is forgotten as more and more data are processed. This is important, since P₀ is often poorly known and sometimes rather arbitrarily set. Furthermore, this means that the computation of P(k|k) is stable: numerical errors in P(k|k) are also forgotten.

7. BOUNDS AND STABILITY-CONTINUOUS FILTER
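Before turning to the continuous case, the forgetting property of Theorem 7.5 (which, as noted below, carries over to the continuous filter) can be illustrated with a minimal scalar sketch (constants hypothetical): two covariance recursions started from very different initial statistics coalesce as data are processed.

```python
# Scalar sketch of Theorem 7.5 (constants hypothetical): two variance
# recursions with initial conditions P01 and P02 converge to each
# other, so the initial statistic P0 is forgotten.
phi, q, r = 0.9, 1.0, 1.0    # hypothetical scalar system, M = 1

def update(P):
    Pm = phi * P * phi + q   # predicted variance P(k|k-1)
    K = Pm / (Pm + r)        # Kalman gain
    return (1.0 - K) * Pm    # filtered variance P(k|k)

P1, P2 = 100.0, 0.01         # two initial conditions P01, P02
gap0 = abs(P1 - P2)
for k in range(50):
    P1, P2 = update(P1), update(P2)
gap = abs(P1 - P2)           # shrinks geometrically, per (7.203)
```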

The qualitative properties of the continuous filter (Theorem 7.3) are identical to those already developed for the discrete filter in the preceding section. That is, Lemmas 7.1-7.3 and Theorems 7.4 and 7.5 hold if the dynamical system (7.95), (7.96) is uniformly completely observable and uniformly completely controllable, and P(t|t) is the solution of the variance equation (7.99). The homogeneous part of the filter equation is, of course [see (7.98)],

dx̂_t/dt = [F(t) − P(t|t) M^T(t) R^{-1}(t) M(t)] x̂_t.

The proofs of these lemmas and theorems are given by Kalman [36]. They are by and large analogous to those already given for the discrete case. Because of this, and because they are available in the book already cited, we shall not present these proofs here. In fact, the reader can himself adapt the proofs in the preceding section to the continuous case. In the case of the continuous filter, one can also speak of the steady-state solution of the variance equation. The existence of the steady-state solution and its properties for constant (stationary) systems are treated in detail by Kalman and Bucy [33] and Kalman [36].

8. ERROR SENSITIVITY-DISCRETE FILTER

In applying the linear filter (Theorem 7.2) to a specific system, the dynamical model parameters (Φ, Γ, M), noise statistics (Q, R), and a priori data [x̄₀, P(0)] must be specified. Since the system model is usually an approximation to a physical situation, the model parameters and noise statistics are seldom exact. That is, the system model used in constructing the filter differs from the (real) system that generates the observations. Sometimes such an approximation is intentional. For example, it may be desirable to use a system model of lower dimension than the dimension of the real system in order to gain computational speed and simplicity. It has already been remarked in several places that the prior data are seldom precisely known and are generally rather arbitrarily specified to start the filtering process. We have seen (Theorem 7.5) that the prior data are eventually forgotten, after sufficient observations have been processed. It is clear that an inexact filter model will degrade the filter performance. In fact, such an inexact model may cause the filter to diverge (see Chapter 8). In designing a filter model, it is therefore important to evaluate the effect on performance of the various approximations made. This section is a study of the qualitative effects of such approximations.

Suppose that the real or actual system is described by

x_{k+1} = Φ(k+1, k) x_k + φ(k) + Γ(k) w_{k+1},
y_k = M(k) x_k + v_k,    (7.204)
w_k ~ N[0, Q(k)],    v_k ~ N[0, R(k)],    x₀ ~ N[x̄(0), P(0)],

whereas we model this system by

x_{k+1} = Φ_c(k+1, k) x_k + φ_c(k) + Γ_c(k) w_{k+1},
y_k = M_c(k) x_k + v_k,    (7.205)
w_k ~ N[0, Q_c(k)],    v_k ~ N[0, R_c(k)],    x₀ ~ N[x̄_c(0), P_c(0)].

We include the fixed bias φ(k) in the dynamics to account in an approximate, yet convenient, way for errors due to nonlinearities, reduction in system dimension, and so on. In what follows, {φ(k)} could readily be treated as a random sequence. We assume that Φ(k+1, k), Φ_c(k+1, k), Γ(k), Γ_c(k), M(k), M_c(k), Q(k), Q_c(k), R(k), R_c(k) are uniformly bounded; that Φ and Φ_c are nonsingular; that R(k), R_c(k) > 0; and that P(0), P_c(0) ≥ 0. Using the model (7.205), our filter design is

x̂_{k+1}^k = Φ_c(k+1, k) x̂_k^k + φ_c(k),
x̂_k^k = [I − K_c(k) M_c(k)] x̂_k^{k−1} + K_c(k) y_k,    (7.206)

where

K_c(k) = P_c(k|k−1) M_c^T(k) [M_c(k) P_c(k|k−1) M_c^T(k) + R_c(k)]^{-1},    (7.207)

and

P_c(k+1|k) = Φ_c(k+1, k) P_c(k|k) Φ_c^T(k+1, k) + Γ_c(k) Q_c(k+1) Γ_c^T(k),
P_c(k|k) = [I − K_c(k) M_c(k)] P_c(k|k−1)
        = [I − K_c(k) M_c(k)] P_c(k|k−1) [I − K_c(k) M_c(k)]^T + K_c(k) R_c(k) K_c^T(k),    (7.208)
P_c(0|0) = P_c(0).

Note that the filter operates on the real data {y_k}. Now the computed matrix P_c(k|k) is not the estimation error covariance matrix, since the filter model differs from the real model. Neither is this filter the minimum variance filter for the actual system (7.204). The actual estimation errors are

x̃_{k+1}^k ≜ x_{k+1} − x̂_{k+1}^k = Φ_c(k+1, k) x̃_k^k + ΔΦ(k+1, k) x_k + Δφ(k) + Γ(k) w_{k+1},    (7.209)
x̃_k^k ≜ x_k − x̂_k^k = [I − K_c(k) M_c(k)] x̃_k^{k−1} − K_c(k) ΔM(k) x_k − K_c(k) v_k,

where

ΔΦ(k+1, k) ≜ Φ(k+1, k) − Φ_c(k+1, k),
Δφ(k) ≜ φ(k) − φ_c(k),    (7.210)
ΔM(k) ≜ M(k) − M_c(k).

A measure of the filter performance is provided by the actual estimation error covariance matrix

P(k|k) ≜ E{x̃_k^k (x̃_k^k)^T}.    (7.211)

We next develop a recursion for this matrix.

Using (7.209), we get

P(k+1|k) = Φ_c(k+1, k) P(k|k) Φ_c^T(k+1, k) + Γ(k) Q(k+1) Γ^T(k)
 + ΔΦ(k+1, k) X(k) ΔΦ^T(k+1, k)
 + ΔΦ(k+1, k) C(k|k) Φ_c^T(k+1, k) + Φ_c(k+1, k) C^T(k|k) ΔΦ^T(k+1, k)
 + ΔΦ(k+1, k) m(k) Δφ^T(k) + Δφ(k) m^T(k) ΔΦ^T(k+1, k) + Δφ(k) Δφ^T(k)
 + Φ_c(k+1, k) Δm(k|k) Δφ^T(k) + Δφ(k) Δm^T(k|k) Φ_c^T(k+1, k),    (7.212)

P(k|k) = [I − K_c(k) M_c(k)] P(k|k−1) [I − K_c(k) M_c(k)]^T + K_c(k) R(k) K_c^T(k)
 − K_c(k) ΔM(k) C(k|k−1) [I − K_c(k) M_c(k)]^T
 − [I − K_c(k) M_c(k)] C^T(k|k−1) ΔM^T(k) K_c^T(k)
 + K_c(k) ΔM(k) X(k) ΔM^T(k) K_c^T(k),    (7.213)

where

X(k) ≜ E{x_k x_k^T},    C(k|k) ≜ E{x_k (x̃_k^k)^T},    C(k+1|k) ≜ E{x_{k+1} (x̃_{k+1}^k)^T},
m(k) ≜ E{x_k},    Δm(k|k) ≜ E{x̃_k^k}.    (7.214)

Difference equations for these latter quantities are similarly computed to be

X(k+1) = Φ(k+1, k) X(k) Φ^T(k+1, k) + Γ(k) Q(k+1) Γ^T(k) + φ(k) φ^T(k)
 + Φ(k+1, k) m(k) φ^T(k) + φ(k) m^T(k) Φ^T(k+1, k),    (7.215)

C(k+1|k) = Φ(k+1, k) C(k|k) Φ_c^T(k+1, k) + Φ(k+1, k) X(k) ΔΦ^T(k+1, k) + Γ(k) Q(k+1) Γ^T(k)
 + Φ(k+1, k) m(k) Δφ^T(k) + φ(k) m^T(k) ΔΦ^T(k+1, k)
 + φ(k) Δm^T(k|k) Φ_c^T(k+1, k) + φ(k) Δφ^T(k),    (7.216)

C(k|k) = C(k|k−1) [I − K_c(k) M_c(k)]^T − X(k) ΔM^T(k) K_c^T(k),

m(k+1) = Φ(k+1, k) m(k) + φ(k),    (7.217)

Δm(k+1|k+1) = [I − K_c(k+1) M_c(k+1)] [Φ_c(k+1, k) Δm(k|k) + ΔΦ(k+1, k) m(k) + Δφ(k)]
 − K_c(k+1) ΔM(k+1) m(k+1).    (7.218)

Initial conditions for (7.212)-(7.218) are

P(0|0) = P(0),
X(0) = E{x₀ x₀^T} = P(0) + x̄(0) x̄^T(0),
C(0|0) = E{x₀ [x₀ − x̄(0)]^T} = P(0),    (7.219)
m(0) = E{x₀} = x̄(0),
Δm(0|0) = E{x₀ − x̄(0)} = 0.
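As a scalar sanity check of the recursions (7.212)-(7.219), the following sketch (all constants hypothetical, with Γ = M = 1, no bias, ΔM = 0, and x̄(0) = 0 so that m(k) = 0 and Δm(k|k) = 0) propagates the actual error covariance under a dynamics error ΔΦ and compares it with a Monte Carlo simulation of the mismatched filter.

```python
import random

# Scalar sketch (constants hypothetical) of the sensitivity recursions
# (7.212)-(7.219): true transition phi, filter designed with phi_c,
# dphi = phi - phi_c. Gamma = M = 1, no bias, dM = 0, m(k) = 0.
phi, phi_c, q, r, P0 = 1.0, 0.9, 0.2, 1.0, 1.0
dphi = phi - phi_c
steps = 8

# computed covariance and gains from the (wrong) filter model (7.208)
Pc, gains = P0, []
for _ in range(steps):
    Pcm = phi_c * Pc * phi_c + q
    K = Pcm / (Pcm + r)
    gains.append(K)
    Pc = (1.0 - K) * Pcm

# actual error covariance via (7.212)-(7.217), scalar special case
P, X, C = P0, P0, P0          # P(0|0), X(0), C(0|0) with xbar(0) = 0
for K in gains:
    Pm = phi_c * P * phi_c + q + dphi * dphi * X + 2.0 * dphi * phi_c * C
    Cm = phi * C * phi_c + phi * dphi * X + q
    X = phi * X * phi + q
    P = (1.0 - K) ** 2 * Pm + K * K * r
    C = (1.0 - K) * Cm

# Monte Carlo check: simulate the real system and the mismatched filter
random.seed(0)
trials, acc = 20000, 0.0
for _ in range(trials):
    x = random.gauss(0.0, P0 ** 0.5)
    xh = 0.0
    for K in gains:
        x = phi * x + random.gauss(0.0, q ** 0.5)
        xp = phi_c * xh
        y = x + random.gauss(0.0, r ** 0.5)
        xh = xp + K * (y - xp)
    acc += (x - xh) ** 2
P_mc = acc / trials           # empirical actual error variance
```

The empirical variance agrees with the recursion to within sampling error, which is the intended use of these equations: predicting actual performance without simulation.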

Various special cases of the actual estimation error covariance matrix equations have been developed by several authors. Battin [3, p. 334] considered the effect of incorrect measurement error statistics. Fagin [16] was concerned with errors in Φ and Q. Heffes [24] considered errors in P(0), Q, and R. Heffes also presents some numerical simulations that show that the actual estimation error variance is greater than or equal to the minimum variance (errors degrade the filter performance). His simulations also indicate that, under appropriate conditions, the actual error variance is less than or equal to the computed variance [P(k|k) ≤ P_c(k|k)]. Nishimura [41, 42] considers initial errors only [errors in P(0)] and develops an upper bound on P(k|k). His result is a special case of Theorem 7.6. Nishimura also presents applications and numerical simulations. Price [44] treats errors in φ(k), Q, and R and proves, under certain conditions, that P(k|k) is bounded. This result is contained in our Theorem 7.7. The most complete development is given by Griffin and Sage [22]. They consider all the errors except Δφ. They do not, however, prove any qualitative results but merely develop equations. Griffin and Sage also define and develop equations for local error sensitivities ∂P(k|k)/∂β, where β is a vector of parameters in the actual system, evaluated at β_c, its value used in the filter model. Such equations¹¹ are developed trivially by differentiating (7.212) and (7.213). Griffin and Sage also present some numerical results. Of some interest are their simulations of a model dimension error (via ΔΦ).

¹¹ These equations, as given by Griffin and Sage, appear to be in error in that they neglect to differentiate X and C.


The equations just developed enable the a priori determination of the effect of various approximations on filter performance. As a result, they are a useful tool for filter design. Analogous equations for the linear smoother are presented in Griffin and Sage [22].

Consider first the special case of errors in P(0), Q(k), and R(k) only. In this case, Eqs. (7.212) and (7.213) reduce to

P(k+1|k) = Φ(k+1, k) P(k|k) Φ^T(k+1, k) + Γ(k) Q(k+1) Γ^T(k),
P(k|k) = [I − K_c(k) M(k)] P(k|k−1) [I − K_c(k) M(k)]^T + K_c(k) R(k) K_c^T(k),    (7.220)

whereas

P_c(k+1|k) = Φ(k+1, k) P_c(k|k) Φ^T(k+1, k) + Γ(k) Q_c(k+1) Γ^T(k),
P_c(k|k) = [I − K_c(k) M(k)] P_c(k|k−1) [I − K_c(k) M(k)]^T + K_c(k) R_c(k) K_c^T(k).    (7.221)

Define

E(k|k) ≜ P(k|k) − P_c(k|k),    E(k+1|k) ≜ P(k+1|k) − P_c(k+1|k).    (7.222)

Then

E(k+1|k) = Φ(k+1, k) E(k|k) Φ^T(k+1, k) + Γ(k) [Q(k+1) − Q_c(k+1)] Γ^T(k),
E(k|k) = [I − K_c(k) M(k)] E(k|k−1) [I − K_c(k) M(k)]^T + K_c(k) [R(k) − R_c(k)] K_c^T(k).    (7.223)

Suppose Q(k) ≤ Q_c(k), R(k) ≤ R_c(k), all k. Then, if E(k|k−1) ≤ 0, it is evident from (7.223) that E(k|k) ≤ 0 and E(k+1|k) ≤ 0. Therefore, by induction, we have

Theorem 7.6. If P(0) ≤ P_c(0) and Q(k) ≤ Q_c(k), R(k) ≤ R_c(k), all k, then P(k|k) ≤ P_c(k|k) and P(k+1|k) ≤ P_c(k+1|k) for all k.

The implications of this result are the following. Although the actual values P(0), Q(k), and R(k) are not generally available, conservative estimates of these quantities can often be made. In that case, the actual estimation error variances are bounded by the computed "variances," which are generated as part of the filter solution. In fact, P_c(k|k) may be precomputed to determine whether the conservative estimates of P(0), Q(k), and R(k) give satisfactory filter performance, or whether somewhat less conservative estimates are desirable. Theorem 7.6 has the following corollary.

Corollary. Hypothesis of Theorem 7.6. Further suppose that the system model (7.205) is uniformly completely observable and uniformly completely controllable. Then there is an integer N > 0 such that P(k|k) is uniformly bounded (from above) for all k ≥ N.

PROOF: By Lemma 7.1, we have

P_c(k|k) ≤ γ I    (γ > 0),    k ≥ N.

Combining this with Theorem 7.6 proves the corollary. ∎
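A minimal scalar sketch of Theorem 7.6 (all constants hypothetical): with conservative design values P(0) ≤ P_c(0), Q ≤ Q_c, R ≤ R_c, the actual error variance from (7.220) stays below the computed variance from (7.221) at every step.

```python
# Scalar sketch of Theorem 7.6 (constants hypothetical): conservative
# (larger) design values of P(0), Q, R make the computed variance an
# upper bound on the actual error variance.
phi = 0.98
q, qc = 0.3, 0.5        # actual and design process noise, q <= qc
r, rc = 0.8, 1.2        # actual and design measurement noise, r <= rc
P, Pc = 0.5, 1.0        # actual P(0) <= computed Pc(0)
bounded = True
for k in range(60):
    Pcm = phi * Pc * phi + qc
    K = Pcm / (Pcm + rc)          # gain from the design model (7.221)
    Pc = (1.0 - K) * Pcm          # computed "variance"
    Pm = phi * P * phi + q        # actual variance recursion (7.220)
    P = (1.0 - K) ** 2 * Pm + K * K * r
    if P > Pc + 1e-12:
        bounded = False
```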



Consider now the case of errors in P(0), Q(k), R(k), and φ(k) only. As was mentioned earlier, φ(k) accounts, in an approximate way, for possible neglected nonlinearities, reduction in system dimension, and so on. In this case, Eqs. (7.212) and (7.213) become

P(k+1|k) = Φ(k+1, k) P(k|k) Φ^T(k+1, k) + Γ(k) Q(k+1) Γ^T(k) + Δφ(k) Δφ^T(k)
 + Φ(k+1, k) Δm(k|k) Δφ^T(k) + Δφ(k) Δm^T(k|k) Φ^T(k+1, k),

P(k|k) = [I − K_c(k) M(k)] P(k|k−1) [I − K_c(k) M(k)]^T + K_c(k) R(k) K_c^T(k),

and, combining these equations, we have

P(k+1|k+1) = [I − K_c(k+1) M(k+1)] Φ(k+1, k) P(k|k) Φ^T(k+1, k) [I − K_c(k+1) M(k+1)]^T + F(k),    (7.224)

where

F(k) = [I − K_c(k+1) M(k+1)] [Γ(k) Q(k+1) Γ^T(k) + Δφ(k) Δφ^T(k)
 + Φ(k+1, k) Δm(k|k) Δφ^T(k) + Δφ(k) Δm^T(k|k) Φ^T(k+1, k)] [I − K_c(k+1) M(k+1)]^T
 + K_c(k+1) R(k+1) K_c^T(k+1),    (7.225)

and where [from (7.218)]

Δm(k+1|k+1) = [I − K_c(k+1) M(k+1)] Φ(k+1, k) Δm(k|k) + f(k),    (7.226)

where

f(k) = [I − K_c(k+1) M(k+1)] Δφ(k).    (7.227)

Note that

[I − K_c(k+1) M(k+1)] Φ(k+1, k) = Ψ_c(k+1, k)    (7.228)

is the state transition matrix of the filter [see (7.167)]. We have the following theorem due to Price [44].

Theorem 7.7. If the system model (7.205) is uniformly completely observable and uniformly completely controllable, and if F(k) is uniformly bounded and P(0) is bounded, then P(k|k) is uniformly bounded for all k.

PROOF: We can write the solution of (7.224) as

P(k|k) = Ψ_c(k, 0) P(0) Ψ_c^T(k, 0) + Σ_{i=0}^{k−1} Ψ_c(k, i+1) F(i) Ψ_c^T(k, i+1).    (7.229)

Now, in view of Theorem 7.4, the filter (7.206) is uniformly asymptotically stable; that is,

‖Ψ_c(k, j)‖ ≤ c₃ exp(−c₄(t_k − t_j)),    all k ≥ j.    (7.230)

Using this and the uniform bound on F(k) in (7.229), it is not difficult to show that (c₃, c₄ > 0)

‖P(k|k)‖ ≤ c₃ Σ_{i=0}^{k} e^{−c₄ i} ≤ c₃ Σ_{i=0}^{∞} e^{−c₄ i} < ∞.

(This is an application of the well-known theorem that, in uniformly asymptotically stable systems, a uniformly bounded input produces a uniformly bounded response.) ∎

Consider the problem of bounding F(k), which is a hypothesis of the theorem. In view of the assumed uniform complete observability and controllability, and Lemma 7.1, P_c(k|k) and P_c(k+1|k) are uniformly bounded, at least for k ≥ N. It has been assumed that all the matrices in the description of system (7.205) are bounded (except φ and φ_c). It therefore remains to bound Δφ(k) and Δm(k|k) in F(k). Suppose Δφ(k) is uniformly bounded. Then f(k) in (7.226) is uniformly bounded. Then Theorem 7.7, applied to Δm(k|k), proves that it is uniformly bounded.
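A minimal scalar sketch of the bias recursion (7.226) (all constants hypothetical): with a bounded forcing f(k), the bias Δm(k|k), driven through the uniformly asymptotically stable filter dynamics, remains uniformly bounded.

```python
# Scalar sketch (constants hypothetical) of (7.226):
# dm(k+1|k+1) = (1 - K)*phi * dm(k|k) + f(k), with
# f(k) = (1 - K)*dphi_bias a bounded forcing term. The stable filter
# dynamics keep the bias uniformly bounded (bounded input, bounded
# output), which is the mechanism behind the Corollary below.
phi, q, r = 0.95, 0.4, 1.0
dphi_bias = 0.25             # hypothetical bounded bias error dphi(k)
P, dm = 1.0, 0.0
dm_max = 0.0
for k in range(200):
    Pm = phi * P * phi + q
    K = Pm / (Pm + r)
    P = (1.0 - K) * Pm
    f = (1.0 - K) * dphi_bias       # (7.227)
    dm = (1.0 - K) * phi * dm + f   # (7.226)
    dm_max = max(dm_max, abs(dm))
```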

As a consequence, we have the

Corollary. If the system model (7.205) is uniformly completely observable and uniformly completely controllable, and if Δφ(k) is uniformly bounded and P(0) is bounded, then P(k|k) is uniformly bounded for all k.

9. ERROR SENSITIVITY-CONTINUOUS FILTER

As in the discrete case, assume that the actual system is described by

dx_t/dt = F(t) x_t + f(t) + G(t) w_t,
y_t = M(t) x_t + v_t,    (7.231)
w_t ~ N[0, Q(t)],    v_t ~ N[0, R(t)],


whereas its model is

dx_t/dt = F_c(t) x_t + f_c(t) + G_c(t) w_t,
y_t = M_c(t) x_t + v_t,    (7.232)
w_t ~ N[0, Q_c(t)],    v_t ~ N[0, R_c(t)],    x_{t₀} ~ N[x̄_c(t₀), P_c(t₀)].

Boundedness requirements, analogous to those of Section 8, are imposed on the matrices in the description of systems (7.231) and (7.232). Our filter design is

dx̂_t/dt = F_c(t) x̂_t + f_c(t) + K_c(t) [y_t − M_c(t) x̂_t],
dP_c(t|t)/dt = F_c(t) P_c(t|t) + P_c(t|t) F_c^T(t) + G_c(t) Q_c(t) G_c^T(t) − K_c(t) R_c(t) K_c^T(t),    (7.233)
K_c(t) = P_c(t|t) M_c^T(t) R_c^{-1}(t),    P_c(t₀|t₀) = P_c(t₀).

Define

ΔF(t) ≜ F(t) − F_c(t),    Δf(t) ≜ f(t) − f_c(t),    ΔM(t) ≜ M(t) − M_c(t),    (7.234)

and

P(t|t) ≜ E{(x_t − x̂_t)(x_t − x̂_t)^T},    X(t) ≜ E{x_t x_t^T},    C(t|t) ≜ E{x_t (x_t − x̂_t)^T},
m(t) ≜ E{x_t},    δm(t|t) ≜ E{x_t − x̂_t}.    (7.235)

Then, applying the limiting procedure of Example 7.10 to Eqs. (7.212)-(7.218), we (tediously) get the following differential equations for the actual estimation error covariance matrix P(t|t), and for X(t), C(t|t), m(t), and δm(t|t):

dP(t|t)/dt = [F_c(t) − K_c(t) M_c(t)] P(t|t) + P(t|t) [F_c(t) − K_c(t) M_c(t)]^T
 + G(t) Q(t) G^T(t) + K_c(t) R(t) K_c^T(t) + δm(t|t) Δf^T(t) + Δf(t) δm^T(t|t)
 + [ΔF(t) − K_c(t) ΔM(t)] C(t|t) + C^T(t|t) [ΔF(t) − K_c(t) ΔM(t)]^T,    (7.236)


dX(t)/dt = F(t) X(t) + X(t) F^T(t) + G(t) Q(t) G^T(t) + m(t) f^T(t) + f(t) m^T(t),    (7.237)

dC(t|t)/dt = F(t) C(t|t) + C(t|t) [F_c(t) − K_c(t) M_c(t)]^T + G(t) Q(t) G^T(t)
 + f(t) δm^T(t|t) + m(t) Δf^T(t) + X(t) [ΔF(t) − K_c(t) ΔM(t)]^T,    (7.238)

dm(t)/dt = F(t) m(t) + f(t),    (7.239)

d δm(t|t)/dt = [F_c(t) − K_c(t) M_c(t)] δm(t|t) + Δf(t) + [ΔF(t) − K_c(t) ΔM(t)] m(t).    (7.240)

Initial conditions are those given in (7.219). Several authors have developed special cases of the actual estimation error covariance equation (7.236). Nishimura [43], Friedland [18], and Mehra [38] considered the case of errors in P(t₀), Q(t), and R(t). Fitzgerald [17] and Griffin and Sage [21] consider all the errors except Δf(t). Nishimura, Fitzgerald, and Griffin and Sage present examples and numerical simulations. Analogous equations for the linear smoother are developed by Griffin and Sage and by Mehra. Fitzgerald discusses the case of constant coefficient systems in detail and determines conditions under which divergence will occur.

The analogs of Theorems 7.6 and 7.7 hold in the continuous case. We shall develop these theorems below, but our treatment will be sketchy, since it essentially duplicates the discrete results. The reader should refer to Section 8 for discussion, since everything carries over from the discrete to the continuous case. In case of errors in P(t₀), Q(t), and R(t) only, (7.236) reduces to

dP(t|t)/dt = [F(t) − K_c(t) M(t)] P(t|t) + P(t|t) [F(t) − K_c(t) M(t)]^T
 + G(t) Q(t) G^T(t) + K_c(t) R(t) K_c^T(t),    (7.241)

whereas

dP_c(t|t)/dt = F(t) P_c(t|t) + P_c(t|t) F^T(t) + G(t) Q_c(t) G^T(t) − K_c(t) R_c(t) K_c^T(t).    (7.242)

Defining

E(t) ≜ P(t|t) − P_c(t|t),    (7.243)

we have

dE(t)/dt = [F(t) − K_c(t) M(t)] E(t) + E(t) [F(t) − K_c(t) M(t)]^T
 + G(t) [Q(t) − Q_c(t)] G^T(t) + K_c(t) [R(t) − R_c(t)] K_c^T(t).    (7.244)

Equation (7.244) is linear and has the solution

E(t) = Ψ_c(t, t₀) E(t₀) Ψ_c^T(t, t₀)
 + ∫_{t₀}^{t} Ψ_c(t, τ) G(τ) [Q(τ) − Q_c(τ)] G^T(τ) Ψ_c^T(t, τ) dτ
 + ∫_{t₀}^{t} Ψ_c(t, τ) K_c(τ) [R(τ) − R_c(τ)] K_c^T(τ) Ψ_c^T(t, τ) dτ,    (7.245)

where Ψ_c(t, τ) is the state transition matrix of the filter (see Section 7); that is,

dΨ_c(t, τ)/dt = [F(t) − K_c(t) M(t)] Ψ_c(t, τ),    Ψ_c(τ, τ) = I.    (7.246)

Inspection of (7.245) produces the theorem due to Nishimura.

Theorem 7.8. If P(t₀) ≤ P_c(t₀) and Q(t) ≤ Q_c(t), R(t) ≤ R_c(t), all t, then P(t|t) ≤ P_c(t|t) for all t.

The following corollary is immediate if we apply the continuous analog of Lemma 7.1 (see Section 7).

Corollary. Hypotheses of Theorem 7.8. Further suppose that the system model (7.232) is uniformly completely observable and uniformly completely controllable. Then there is a σ > 0 such that P(t|t) is uniformly bounded for all t ≥ t₀ + σ.

Now consider the case of errors in P(t₀), Q(t), R(t), and f(t). Equation (7.236) becomes

dP(t|t)/dt = [F(t) − K_c(t) M(t)] P(t|t) + P(t|t) [F(t) − K_c(t) M(t)]^T + D(t),    (7.247)

where

D(t) = G(t) Q(t) G^T(t) + K_c(t) R(t) K_c^T(t) + δm(t|t) Δf^T(t) + Δf(t) δm^T(t|t),    (7.248)

and where, from (7.240),

d δm(t|t)/dt = [F(t) − K_c(t) M(t)] δm(t|t) + Δf(t).    (7.249)

Theorem 7.9. If the system model (7.232) is uniformly completely observable and uniformly completely controllable, and if Δf(t) is uniformly bounded and P(t₀) is bounded, then P(t|t) is uniformly bounded.

PROOF: Note that both (7.249) and (7.247) are linear, and their solutions may be written down using the filter state transition matrix defined in (7.246). By Theorem 7.4 for the continuous filter, (7.249) is uniformly asymptotically stable, and, since Δf(t) is uniformly bounded, so is δm(t|t). As a result, D(t) is uniformly bounded. But then, by the same arguments, P(t|t) is uniformly bounded. ∎
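A scalar Euler-integration sketch of Theorem 7.8 (all constants hypothetical, M = G = 1, constant F): integrating (7.241) and (7.242) with Q ≤ Q_c and R ≤ R_c keeps the actual variance below the computed one for all t.

```python
# Scalar Euler sketch of Theorem 7.8 (constants hypothetical): with
# conservative Qc >= Q and Rc >= R, the actual variance from (7.241)
# stays below the computed variance from (7.242). M = G = 1.
F = -0.2
q, qc = 0.3, 0.6
r, rc = 0.5, 1.0
P, Pc = 0.4, 0.8        # P(t0) <= Pc(t0)
dt = 1e-3
ok = True
for _ in range(20000):  # integrate out to t = 20
    K = Pc / rc                                  # Kc = Pc*M/Rc
    dPc = 2.0 * F * Pc + qc - K * K * rc         # Riccati form (7.242)
    dP = 2.0 * (F - K) * P + q + K * K * r       # actual variance (7.241)
    Pc += dt * dPc
    P += dt * dP
    if P > Pc + 1e-9:
        ok = False
```

Both variances also settle to their steady-state values, which for these constants can be read off from the algebraic (steady-state) forms of (7.241) and (7.242).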

10. LINEAR LIMITED MEMORY FILTER

The results and discussion of Sections 8 and 9 indicate that, if the dynamical model parameters are not sufficiently well known, the "optimal" filter may diverge. Even if the actual estimation error covariance matrix is bounded, the actual estimation errors may be too large and the estimate unusable. In that case, for all practical purposes, the filter is diverging. The onset of divergence is, of course, determined by statistical inconsistency of the actual estimation errors with the computed error covariance matrix. The problem of divergence is discussed in detail in the next chapter, where various remedies for divergence are outlined. One of these is the idea of limiting the memory of the filter, thus curtailing the growth of the actual estimation error covariance matrix. This has already been discussed in Section 6.11, as well as in Section 8. The limited memory filter appears to be the only device for preventing divergence in the presence of unbounded Δφ(k) or Δf(t) (see Theorems 7.7 and 7.9). Since the limited memory filter has a sound theoretical basis (most other remedies are ad hoc), we develop its theory here. We note that the idea of limiting the filter memory by various means has been advocated by a number of authors. Schweppe [49] develops a form of the limited memory filter. Similar ideas are contained in Cosaert and Gottzein [12]. Schmidt [48] advocates overweighting the most recent observations, and limiting information by other means. Our development here follows that of Jazwinski [28].

We assume that our discrete dynamical system is noise-free:

x_{k+1} = Φ(k+1, k) x_k,    x₀ ~ N[x̄₀, P₀],    P₀ > 0,    (7.250)

with observations

y_k = M(k) x_k + v_k,    v_k ~ N[0, R(k)],    R(k) > 0.    (7.251)

We propose to limit the filter memory sufficiently so that (7.250), (7.251) is an adequate approximation to the real system over the limited time arcs. The absence of noise in (7.250) is fundamental to our theory. We now specialize the results of Section 6.11 to this linear case. Using the notation (Y_m, Y_n, Y_N) defined in Section 6.11, let

x̂_n^n ≜ E{x_n | Y_n},    P_n^n ≜ E{(x_n − x̂_n^n)(x_n − x̂_n^n)^T | Y_n},
x̂_n^m ≜ E{x_n | Y_m} = Φ(n, m) x̂_m^m,
P_n^m ≜ E{(x_n − x̂_n^m)(x_n − x̂_n^m)^T | Y_m} = Φ(n, m) P_m^m Φ^T(n, m).    (7.252)

(7.252)

Theorem 7.10. If x_n is completely observable with respect to Y_N, then x̄_n, P̄_n (given below) are, respectively, the conditional mean of x_n and the conditional covariance matrix, where the conditioning is on Y_N only:

x̄_n = P̄_n [(P_n^n)^{-1} x̂_n^n − (P_n^m)^{-1} x̂_n^m],
P̄_n^{-1} = (P_n^n)^{-1} − (P_n^m)^{-1}.    (7.253)

x̄_n is also the maximum likelihood estimate of x_n based on Y_N.

PROOF: Since P₀ > 0, and in view of the discussion following Lemma 7.1, P_n^n, P_n^m > 0, so that the densities appearing in Lemma 6.5 are well-defined Gaussian densities. According to Example 7.2, the parameters of p(x_n | Y_N, p(x₀)) (that is, its mean and covariance matrix) are, respectively,

E{x_n | Y_N, p(x₀)} = cov{x_n | Y_N, p(x₀)} × [ Σ_{i=m+1}^{n} Φ^T(i, n) M^T(i) R^{-1}(i) y_i + Φ^T(0, n) P₀^{-1} x̄₀ ],    (7.254)

where

J(n, m+1) = Σ_{i=m+1}^{n} Φ^T(i, n) M^T(i) R^{-1}(i) M(i) Φ(i, n) > 0    (7.255)

by hypothesis. Therefore, the P₀^{-1} = 0 limits in (7.254) exist, so that p(x_n | Y_N) is a well-defined Gaussian density [defined according to (6.133)]. We now compute the parameters of p(x_n | Y_N) using Theorem 6.8.


Now p(x_n | Y_n) has parameters x̂_n^n, P_n^n; and p(x_n | Y_m) has parameters x̂_n^m, P_n^m. Therefore, by Theorem 6.8,

p(x_n | Y_N) = c₂ exp{−½[(x_n − x̂_n^n)^T (P_n^n)^{-1} (x_n − x̂_n^n) − (x_n − x̂_n^m)^T (P_n^m)^{-1} (x_n − x̂_n^m)]}.    (7.256)

Since this density is Gaussian, its mean coincides with its mode. The mode minimizes the quantity in brackets in (7.256). Setting the gradient (with respect to x_n) of that quantity to zero, we get

[(P_n^n)^{-1} − (P_n^m)^{-1}] x_n = (P_n^n)^{-1} x̂_n^n − (P_n^m)^{-1} x̂_n^m.    (7.257)

x̄_n is the solution of (7.257), and P̄_n^{-1} is the matrix of second derivatives. Note that P̄_n = J^{-1}(n, m+1), where J(n, m+1) is positive definite by hypothesis, so that P̄_n^{-1} is nonsingular. This establishes (7.253) and proves the theorem. ∎

Several remarks concerning Theorem 7.10 are in order. We note that x̂_n^n, P_n^n are the outputs of the discrete Kalman filter of Theorem 7.2; x̂_n^m, P_n^m are predicted values computed via (7.28). We may solve (7.253) for x̂_n^n and P_n^n and use the resulting equations to process batches of preprocessed data (preprocessed via a least-squares procedure) characterized by x̄_n and P̄_n. If m = 0, Eqs. (7.253) can be used to remove the conditioning of a Kalman estimate on any initial data. In fact, we can remove x̄₀, P₀ and replace it by x̄₀′, P₀′. We may note that, with n = m (N = 0), P̄_n^{-1} = 0 (no information), and x̄_n is arbitrary. For N large, P̄_n^{-1} ≈ (P_n^n)^{-1} and x̄_n ≈ x̂_n^n, which are the intuitively correct properties.

Equations (7.253) generate the minimum variance estimate based on a "moving window" of the most recent N observations. Two Kalman filters and a predictor are required in their implementation, and N observations have to be stored in memory. Three indicated matrix inversions are required. Now, aside from the large computational load, this scheme is clearly no solution to the problem of divergence: the two required Kalman filters would themselves diverge! Based on Theorem 7.10, we propose the following limited memory filter, which, at the same time, decreases the computational load and eliminates the storage requirement. The conditioning of the estimate on old data is discarded in batches of N. We filter N observations and also

predict over the same time arc. This produces x̂_N^N, P_N^N, x̂_N^0, P_N^0. Equations (7.253) are used to compute x̄_N, P̄_N (we have just eliminated the prior data). We next set x̄_N, P̄_N as initial conditions for both the filter and predictor, and repeat the procedure. The limited memory filter just described produces limited memory estimates with memory varying between N and 2N. Clearly, variations on this are possible, and N itself may be varied in the process. N should be chosen so that the model (7.250) represents an adequate approximation to the real system over time arcs of 2N.

We next discuss some alternate computational procedures. The second of Eqs. (7.253) involves two matrix inversions and a subtraction, and could be ill-conditioned computationally. Recognizing that P̄_n = J^{-1}(n, m+1) [see (7.255)], we may compute this matrix by the recursion

J(k, m+1) = Φ^T(k−1, k) J(k−1, m+1) Φ(k−1, k) + M^T(k) R^{-1}(k) M(k),    k = m+1, …, n,    (7.258)

with J(m, m+1) = 0. Then the first of Eqs. (7.253) can be written as

x̄_n = x̂_n^n + J^{-1}(n, m+1) (P_n^m)^{-1} (x̂_n^n − x̂_n^m).    (7.259)

Equation (7.258) requires the inverse of the state transition matrix, but this is sometimes available in closed form.¹² In that case, (P_n^m)^{-1} can also be computed recursively:

(P_k^m)^{-1} = Φ^T(k−1, k) (P_{k−1}^m)^{-1} Φ(k−1, k),    k = m+1, …, n.    (7.260)

Recursions (7.258) and (7.260) can be carried along with the Kalman filter. As was mentioned earlier, Schweppe [49] develops another form of the limited memory filter with a constant memory length. That algorithm, however, requires the storage of N observations together with M(k) and R(k), k = m+1, …, n; the inverse of the state transition matrix; prediction over arcs of length N every time a new observation is acquired; and inversion of an information matrix every time a new observation is acquired. Although a constant memory length is desirable, the computational load associated with the filter just described is very large.

¹² In orbit determination, closed-form approximations to the state transition matrix and its inverse are available.
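A scalar sketch of the combination (7.253) and its alternate form (7.258)-(7.259) (all constants and data hypothetical): both forms of the limited memory estimate agree, and the information recursion reproduces P̄_n^{-1}.

```python
import random

# Scalar sketch (constants hypothetical) of the limited memory
# combination (7.253) and the recursions (7.258)-(7.259) for the
# noise-free system (7.250): x_{k+1} = phi*x_k, y_k = x_k + v_k.
phi, r, P0 = 0.99, 1.0, 4.0
m, n = 5, 15
random.seed(1)
state = 1.7                          # hypothetical true initial state
ys = []
for k in range(1, n + 1):
    state = phi * state
    ys.append(state + random.gauss(0.0, r ** 0.5))

x, P = 0.0, P0                       # Kalman filter from the prior
xm = Pm_ = None
for k, y in enumerate(ys, start=1):
    xp, Pp = phi * x, phi * P * phi  # q = 0 (noise-free dynamics)
    K = Pp / (Pp + r)
    x, P = xp + K * (y - xp), (1.0 - K) * Pp
    if k == m:
        xm, Pm_ = x, P               # x_m^m, P_m^m

x_nm = phi ** (n - m) * xm           # predicted x_n^m, per (7.252)
P_nm = phi ** (2 * (n - m)) * Pm_    # predicted P_n^m

# (7.253): remove the conditioning on the first m observations
Pbar_inv = 1.0 / P - 1.0 / P_nm
Pbar = 1.0 / Pbar_inv
xbar = Pbar * (x / P - x_nm / P_nm)

# (7.258): information recursion, with Phi(k-1, k) = 1/phi
J = 0.0
for k in range(m + 1, n + 1):
    J = J / (phi * phi) + 1.0 / r
# (7.259): alternate form of the limited memory estimate
xbar2 = x + (1.0 / J) * (1.0 / P_nm) * (x - x_nm)
```

Because the dynamics are noise-free, the information removed by (7.253) is exactly the information recursion (7.258), so the two forms coincide to machine precision.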

APPENDIX 7A

CLASSICAL PARAMETER ESTIMATION

In this Appendix, we point out some interesting connections between filtering theory and classical parameter estimation theory. We shall see that estimation and filtering are based on essentially the same theory. Consider the problem of estimating the (fixed) parameter x_k from the observations

y_i = M(i) Φ(i, k) x_k + v_i,    i = 1, …, k,    (7A.1)

where {v_i} is a white, but not necessarily Gaussian, vector sequence with

E{v_i} = 0,    cov{v_i} = R_i > 0.

The parameter x_k may be viewed as the state, at time t_k, of the dynamical system

x_{i+1} = Φ(i+1, i) x_i,    y_i = M(i) x_i + v_i.

We assume that x_k is completely observable with respect to {y₁, …, y_k}. We wish to find the linear, unbiased, minimum variance estimator for x_k. That is, we seek that estimate x_k* in the class of estimates {x̂_k},

x̂_k = Σ_{i=1}^{k} A_i y_i,    (7A.2)

which is unbiased,

E{x̂_k} = x_k,    (7A.3)

and for which

tr E{(x_k* − x_k)(x_k* − x_k)^T}

is a minimum. The answer is provided in the classical

Theorem 7A.1 (Gauss-Markov). Let x_k be completely observable with respect to {y₁, …, y_k}. Let {v_i} be a zero-mean, white, not necessarily Gaussian vector sequence, R_i > 0. Then the linear, unbiased, minimum variance estimate of x_k is

x_k* = J_{k,1}^{-1} Σ_{i=1}^{k} Φ^T(i, k) M^T(i) R_i^{-1} y_i,

where

J_{k,1} = Σ_{i=1}^{k} Φ^T(i, k) M^T(i) R_i^{-1} M(i) Φ(i, k)

is the information matrix. The covariance matrix of x_k* is J_{k,1}^{-1}.


PROOF: Consider any linear, unbiased estimate

    \tilde{x}_k = \sum_{i=1}^{k} A_i y_i.

We have

    \tilde{x}_k = \sum_{i=1}^{k} A_i M(i) \Phi(i, k) x_k + \sum_{i=1}^{k} A_i v_i,

and, since the estimate is unbiased, \tilde{x}_k - x_k = \sum_{i=1}^{k} A_i v_i. Therefore,

    E\{(\tilde{x}_k - x_k)(\tilde{x}_k - x_k)^T\}
      = \sum_{i=1}^{k} A_i R_i A_i^T
      = \sum_{i=1}^{k} [J_{k,1}^{-1} L_i + (A_i - J_{k,1}^{-1} L_i)] R_i [J_{k,1}^{-1} L_i + (A_i - J_{k,1}^{-1} L_i)]^T,    (7A.4)

where

    L_i \triangleq \Phi^T(i, k) M^T(i) R_i^{-1}.

Now

    \sum_{i=1}^{k} (A_i - J_{k,1}^{-1} L_i) R_i L_i^T J_{k,1}^{-1} = \left[\sum_{i=1}^{k} A_i M(i) \Phi(i, k) - I\right] J_{k,1}^{-1} = 0,

since \tilde{x}_k is unbiased. [From (7A.3),

    0 = E\{\tilde{x}_k\} - x_k = \left[\sum_{i=1}^{k} A_i M(i) \Phi(i, k) - I\right] x_k

for all x_k.] As a result, (7A.4) becomes

    E\{(\tilde{x}_k - x_k)(\tilde{x}_k - x_k)^T\}
      = \sum_{i=1}^{k} [J_{k,1}^{-1} L_i R_i L_i^T J_{k,1}^{-1} + (A_i - J_{k,1}^{-1} L_i) R_i (A_i - J_{k,1}^{-1} L_i)^T].    (7A.5)

Since R_i > 0, the trace of this matrix is minimum when A_i = J_{k,1}^{-1} L_i, that is, when \tilde{x}_k = x_k^*. From (7A.5), it then follows that

    E\{(x_k^* - x_k)(x_k^* - x_k)^T\} = J_{k,1}^{-1}.
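The Gauss-Markov estimate is just a weighted least-squares solution, and Theorem 7A.1 can be spot-checked numerically against a whitened, stacked ordinary least-squares fit of the same observations. A minimal sketch (the dimensions, transition matrix, and noise levels are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

k, n, m = 6, 2, 3                     # number of observations, state dim, obs dim
x_k = np.array([0.7, -1.3])           # the fixed parameter (state at t_k)
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # one-step transition matrix
Phi = [np.linalg.matrix_power(A, i - k) for i in range(1, k + 1)]  # Phi(i, k)
M = [rng.standard_normal((m, n)) for _ in range(k)]
Rd = [rng.uniform(0.2, 1.0, m) for _ in range(k)]   # diagonals of R_i > 0
y = [M[i] @ Phi[i] @ x_k + np.sqrt(Rd[i]) * rng.standard_normal(m)
     for i in range(k)]

# Information matrix J_{k,1} and the estimate x_k^* of Theorem 7A.1.
J = sum(Phi[i].T @ M[i].T @ np.diag(1 / Rd[i]) @ M[i] @ Phi[i] for i in range(k))
b = sum(Phi[i].T @ M[i].T @ (y[i] / Rd[i]) for i in range(k))
x_star = np.linalg.solve(J, b)

# The same estimate from a whitened, stacked least-squares problem.
H = np.vstack([np.diag(1 / np.sqrt(Rd[i])) @ M[i] @ Phi[i] for i in range(k)])
z = np.concatenate([y[i] / np.sqrt(Rd[i]) for i in range(k)])
x_ls, *_ = np.linalg.lstsq(H, z, rcond=None)

print(np.allclose(x_star, x_ls))   # True
```

The two solutions coincide because the normal equations of the whitened problem, H^T H x = H^T z, are exactly J_{k,1} x = \sum \Phi^T M^T R^{-1} y.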


Compare the linear, unbiased, minimum variance estimate (Theorem 7A.1) with the estimate (7.41) of Example 7.2, which derives the discrete Kalman-Bucy filter. We see that with P_0^{-1} = 0 they are identical. Thus we see again that the absence of a priori information about x_k is expressed formally by setting P_0^{-1} = 0. We also have

Theorem 7A.2. The minimum variance filter in the Gaussian case is also the linear minimum variance filter in the general case, without any assumption of Gaussianness.

This conclusion can also be understood in the following way. If we seek the best linear estimate, then only the first two moments of the {y_k} process need be known. These moments define a unique Gaussian process. Therefore, the linear minimum variance filter in the Gaussian case must be the same as in the general case. But in the Gaussian case, the variance is minimized by the conditional mean, which is a linear functional of the {y_k} process. Therefore, in the Gaussian case, the minimum variance and linear minimum variance filters are the same. Thus the conclusion follows.

Recall that, in the derivation of Example 7.2, no statistical significance was attached to the R_i; that is, the errors v_i were not considered random variables with a given distribution. Now if one is willing to stipulate that the v_i are zero-mean, one can conclude that the estimate (of Theorem 7A.1) is unbiased. If one is further willing to specify the second moments of the v_i, that is, the R_i, one can conclude by the Gauss-Markov theorem that the estimate is a linear, minimum variance, unbiased estimate. If the v_i are furthermore Gaussian, then Theorem 7A.2 asserts that the estimate is the minimum variance estimate. Thus there is a direct relationship between the assumptions made about the noise and the conclusions that can be drawn about the estimate.

APPENDIX 7B

SOME MATRIX EQUALITIES

Let P, R, and M be n x n, m x m, and m x n matrices, respectively. Suppose P \geq 0 and R > 0. Then the following equalities hold:

    (I + PM^T R^{-1} M)^{-1} = I - PM^T (MPM^T + R)^{-1} M,    (7B.1)
    (I + PM^T R^{-1} M)^{-1} P = P - PM^T (MPM^T + R)^{-1} MP,    (7B.2)
    (I + PM^T R^{-1} M)^{-1} PM^T R^{-1} = PM^T (MPM^T + R)^{-1}.    (7B.3)

If, in addition, P > 0, then

    (I + PM^T R^{-1} M)^{-1} P = (P^{-1} + M^T R^{-1} M)^{-1}.    (7B.4)

Using (7B.4) in (7B.2) and (7B.3), we get (for P, R > 0)

    (P^{-1} + M^T R^{-1} M)^{-1} = P - PM^T (MPM^T + R)^{-1} MP,    (7B.5)
    (P^{-1} + M^T R^{-1} M)^{-1} M^T R^{-1} = PM^T (MPM^T + R)^{-1}.    (7B.6)

These matrix equalities are relatively well known in numerical analysis (e.g., Bodewig [6, p. 30]). Equation (7B.5) was introduced to the filtering field by Ho [26]. Equation (7B.1) is easily proved by direct verification:

    (I + PM^T R^{-1} M)(I - PM^T (MPM^T + R)^{-1} M)
      = I + PM^T R^{-1} M - PM^T R^{-1} R (MPM^T + R)^{-1} M - PM^T R^{-1} MPM^T (MPM^T + R)^{-1} M
      = I + PM^T R^{-1} M - PM^T R^{-1} (R + MPM^T)(MPM^T + R)^{-1} M
      = I.

Equation (7B.2) follows trivially. To prove (7B.3), multiply (7B.2) on the right by M^T R^{-1}:

    (I + PM^T R^{-1} M)^{-1} PM^T R^{-1}
      = PM^T R^{-1} - PM^T (MPM^T + R)^{-1} MPM^T R^{-1}
      = PM^T (MPM^T + R)^{-1} [(MPM^T + R) R^{-1} - MPM^T R^{-1}]
      = PM^T (MPM^T + R)^{-1} R R^{-1}
      = PM^T (MPM^T + R)^{-1}.

Equation (7B.4) follows from

    [(I + PM^T R^{-1} M)^{-1} P]^{-1} = P^{-1}(I + PM^T R^{-1} M) = P^{-1} + M^T R^{-1} M.
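All six identities are easy to spot-check numerically with randomly generated positive definite P and R (the particular sizes and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3

# Random P > 0 (n x n), R > 0 (m x m), and an arbitrary m x n matrix M.
Ap = rng.standard_normal((n, n)); P = Ap @ Ap.T + n * np.eye(n)
Ar = rng.standard_normal((m, m)); R = Ar @ Ar.T + m * np.eye(m)
M = rng.standard_normal((m, n))

I_n = np.eye(n)
Rinv = np.linalg.inv(R)
S = np.linalg.inv(M @ P @ M.T + R)            # (M P M^T + R)^{-1}
K = np.linalg.inv(I_n + P @ M.T @ Rinv @ M)   # (I + P M^T R^{-1} M)^{-1}
Jinv = np.linalg.inv(np.linalg.inv(P) + M.T @ Rinv @ M)

assert np.allclose(K, I_n - P @ M.T @ S @ M)          # (7B.1)
assert np.allclose(K @ P, P - P @ M.T @ S @ M @ P)    # (7B.2)
assert np.allclose(K @ P @ M.T @ Rinv, P @ M.T @ S)   # (7B.3)
assert np.allclose(K @ P, Jinv)                       # (7B.4)
assert np.allclose(Jinv, P - P @ M.T @ S @ M @ P)     # (7B.5)
assert np.allclose(Jinv @ M.T @ Rinv, P @ M.T @ S)    # (7B.6)
print("all identities verified")
```

Note that (7B.5) trades an m x m inversion for an n x n one (or vice versa), which is the practical reason these identities matter in filter implementations.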

REFERENCES

1. M. Aoki, "Optimization of Stochastic Systems." Academic Press, New York, 1967.
2. R. H. Battin, Statistical Optimizing Navigation Procedure for Space Flight, ARS J. 32, 1681-1696 (1962).
3. R. H. Battin, "Astronautical Guidance." McGraw-Hill, New York, 1964.
4. R. E. Bellman and S. E. Dreyfus, "Applied Dynamic Programming." Princeton Univ. Press, Princeton, New Jersey, 1962.
5. R. E. Bellman, H. H. Kagiwada, R. E. Kalaba, and R. Sridhar, Invariant Imbedding and Nonlinear Filtering Theory, J. Astronaut. Sci. 13, 110-115 (1966).
6. E. Bodewig, "Matrix Calculus." Wiley (Interscience), New York, 1959.


7. A. E. Bryson and M. Frazier, Smoothing for Linear and Nonlinear Dynamic Systems, Proc. Optimum Systems Synthesis Conf., Wright-Patterson Air Force Base, Ohio, February 1963. U.S. Air Force Tech. Rept. ASD-TDR-063-119, 1963.
8. A. E. Bryson and D. E. Johansen, Linear Filtering for Time-Varying Systems Using Measurements Containing Colored Noise, IEEE Trans. Automatic Control 10, 4-10 (1965).
9. A. E. Bryson, Jr. and L. J. Henrikson, Estimation Using Sampled-Data Containing Sequentially Correlated Noise, AIAA Guidance, Control and Flight Dynamics Conf., Huntsville, Alabama. Paper 67-541 (1967).
10. A. E. Bryson, Jr. and Y. C. Ho, "Applied Optimal Control." Ginn (Blaisdell), Waltham, Massachusetts, 1969.
11. E. A. Coddington and N. Levinson, "Theory of Ordinary Differential Equations." McGraw-Hill, New York, 1955.
12. R. Cosaert and E. Gottzein, A Decoupled Shifting Memory Filter Method for Radio Tracking of Space Vehicles, Intern. Astronaut. Congr. 18th, Belgrade, Yugoslavia. Presented paper (1967).
13. H. Cox, Estimation of State Variables via Dynamic Programming, Proc. 1964 Joint Automatic Control Conf., Stanford, California, pp. 376-381 (1964).
14. D. M. Detchmendy and R. Sridhar, Sequential Estimation of States and Parameters in Noisy Non-linear Dynamical Systems, Proc. 1965 Joint Automatic Control Conf., Troy, New York, pp. 56-63 (1965).
15. J. J. Deyst and C. F. Price, Conditions for Asymptotic Stability of the Discrete, Minimum Variance, Linear Estimator, IEEE Trans. Automatic Control (to appear).
16. S. L. Fagin, Recursive Linear Regression Theory, Optimal Filter Theory, and Error Analysis of Optimal Systems, IEEE Intern. Conv. Record 12, 216-240 (1964).
17. R. J. Fitzgerald, Error Divergence in Optimal Filtering Problems, Second IFAC Symp. Automatic Control in Space, Vienna, Austria, September 1967. Presented paper (1967).
18. B. Friedland, On the Effect of Incorrect Gain in the Kalman Filter, IEEE Trans. Automatic Control 12, 610 (1967).
19. P. A. Gainer, A Method for Computing the Effect of an Additional Observation on a Previous Least-Squares Estimate, NASA Technical Note D-1599, 1963.
20. I. M. Gelfand and S. V. Fomin, "Calculus of Variations." Prentice-Hall, Englewood Cliffs, New Jersey, 1963.
21. R. E. Griffin and A. P. Sage, Large and Small Scale Sensitivity Analysis of Optimum Estimation Algorithms, Proc. Joint Automatic Control Conf., Ann Arbor, Michigan, June 1968, pp. 977-988 (1968).
22. R. E. Griffin and A. P. Sage, Sensitivity Analysis of Discrete Filtering and Smoothing Algorithms, AIAA Guidance, Control, and Flight Dynamics Conf., Pasadena, California, August 1968. Paper No. 68-824 (1968).
23. W. Hahn, "Theory and Application of Liapunov's Direct Method." Prentice-Hall, Englewood Cliffs, New Jersey, 1963.
24. H. Heffes, The Effect of Erroneous Models on the Kalman Filter Response, IEEE Trans. Automatic Control 11, 541-543 (1966).
25. Y. C. Ho, The Method of Least Squares and Optimal Filtering Theory, The RAND Corporation, Memorandum RM-3329-PR, 1962.
26. Y. C. Ho, On the Stochastic Approximation Method and Optimal Filtering Theory, J. Math. Anal. Appl. 6, 152-154 (1963).
27. Y. C. Ho and R. C. K. Lee, A Bayesian Approach to Problems in Stochastic Estimation and Control, IEEE Trans. Automatic Control 9, 333-339 (1964).


28. A. H. Jazwinski, Limited Memory Optimal Filtering, Proc. 1968 Joint Automatic Control Conf., Ann Arbor, Michigan, June 1968, pp. 383-393. [Also in IEEE Trans. Automatic Control 13, 558-563 (1968).]
29. R. E. Kalman, Contributions to the Theory of Optimal Control, Bol. Soc. Mat. Mexicana 5, 102-119 (1960).
30. R. E. Kalman and J. E. Bertram, Control System Analysis and Design via the "Second Method" of Lyapunov, I. Continuous-Time Systems, Trans. ASME, Ser. D: J. Basic Eng. 82, 371-393 (1960).
31. R. E. Kalman and J. E. Bertram, Control System Analysis and Design via the "Second Method" of Lyapunov, II. Discrete-Time Systems, Trans. ASME, Ser. D: J. Basic Eng. 82, 394-400 (1960).
32. R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Trans. ASME, Ser. D: J. Basic Eng. 82, 35-45 (1960).
33. R. E. Kalman and R. S. Bucy, New Results in Linear Filtering and Prediction Theory, J. Basic Eng. 83, 95-108 (1961).
34. R. E. Kalman, On the General Theory of Control Systems, Proc. IFAC Congr. 1st, 1, 481-491 (1961).
35. R. E. Kalman, Y. C. Ho, and K. S. Narendra, Controllability of Linear Dynamical Systems, in "Contributions to Differential Equations" (J. P. LaSalle and J. B. Diaz, eds.), Vol. 1. Wiley (Interscience), New York, 1963.
36. R. E. Kalman, New Methods in Wiener Filtering Theory, Proc. Symp. Eng. Appl. Random Function Theory and Probability (J. L. Bogdanoff and F. Kozin, eds.), Wiley, New York, 1963.
37. R. C. K. Lee, "Optimal Estimation, Identification, and Control," Res. Monograph No. 28. M.I.T. Press, Cambridge, Massachusetts, 1964.
38. R. K. Mehra, On Optimal and Suboptimal Linear Smoothing, Tech. Rept. No. 559, Division of Eng. and Appl. Phys., Harvard Univ., Cambridge, Massachusetts, March 1968.
39. R. K. Mehra and A. E. Bryson, Smoothing for Time-Varying Systems Using Measurements Containing Colored Noise, Proc. 1968 Joint Automatic Control Conf., Ann Arbor, Michigan, June 1968, pp. 871-883 (1968).
40. V. O. Mowery, Least Squares Recursive Differential-Correction Estimation in Nonlinear Problems, IEEE Trans. Automatic Control 10, 399-407 (1965).
41. T. Nishimura, On the a priori Information in Sequential Estimation Problems, IEEE Trans. Automatic Control 11, 197-204 (1966).
42. T. Nishimura, Correction to and Extension of "On the a priori Information in Sequential Estimation Problems," IEEE Trans. Automatic Control 12, 123 (1967).
43. T. Nishimura, Error Bounds of Continuous Kalman Filters and the Application to Orbit Determination Problems, IEEE Trans. Automatic Control 12, 268-275 (1967).
44. C. F. Price, An Analysis of the Divergence Problem in the Kalman Filter, IEEE Trans. Automatic Control (to appear).
45. H. E. Rauch, F. Tung, and C. T. Striebel, Maximum Likelihood Estimates of Linear Dynamic Systems, AIAA J. 3, 1445-1450 (1965).
46. R. W. Rishel, Convergence of Sampled Discrete Optimum Filters, Proc. 1968 Joint Automatic Control Conf., Ann Arbor, Michigan, June 1968, pp. 863-870 (1968).
47. S. F. Schmidt, Application of State-Space Methods to Navigation Problems, Advan. Control Syst. 3, 293-340 (1966).
48. S. F. Schmidt, Compensation for Modeling Errors in Orbit Determination Problems, Analytical Mechanics Associates, Inc., Seabrook, Maryland. Report No. 67-16, November 1967.


49. F. C. Schweppe, Algorithms for Estimating a Re-entry Body's Position, Velocity and Ballistic Coefficient in Real Time or from Post Flight Analysis, M.I.T. Lincoln Lab., Lexington, Massachusetts. Rept. ESD-TDR-64-583, December 1964.
50. G. L. Smith, S. F. Schmidt, and L. A. McGee, Applications of Statistical Filter Theory to the Optimal Estimation of Position and Velocity on Board a Circumlunar Vehicle, NASA TR R-135, 1962.
51. H. W. Sorenson, Kalman Filtering Techniques, Advan. Control Syst. 3 (1966).
52. H. W. Sorenson, On the Error Behavior in Linear Minimum Variance Estimation Problems, IEEE Trans. Automatic Control 12, 557-562 (1967).
53. P. Swerling, First Order Error Propagation in a Stagewise Smoothing Procedure for Satellite Observations, J. Astronaut. Sci. 6, 46-52 (1959).