CHAPTER 7
Approximations for Parabolic Equations and Nonlinear Filtering Problems
The chapter treats the nonhomogeneous finite time version of the problem of Chapter 6. A degenerate parabolic equation replaces the degenerate elliptic equation. As in Chapter 6, a suitable finite difference approximation to the partial differential equation yields a Markov chain approximation to the diffusion. The interpolations of the Markov chains converge to the diffusion in distribution as the finite difference intervals go to zero. With the use of the chain, many functionals of the diffusion can be approximated. Equivalently, we can approximate the solution to the degenerate parabolic equation by simple calculations with the chain. The explicit, implicit, and explicit-implicit methods are discussed, together with their probabilistic interpretations. An iterative method is discussed for solving for the weak-sense solution to the Fokker-Planck equation; i.e., to approximate the evolution of the weak-sense density of the diffusion as time increases. A standard problem in nonlinear filtering is also treated; we give a method analogous to that for
(approximately) solving the Fokker-Planck equation for approximating the conditional density of the diffusion, given "noise-corrupted" data. In the last section, there is some numerical data concerning the approximation of the invariant measure for a simple problem.

7.1 Problem Statement

We will treat computational problems associated with the nonhomogeneous Eq. (1.1) and the cost functional (4.2.4):

  X(s) = x + ∫_t^s f(X(u), u) du + ∫_t^s σ(X(u), u) dw(u),   X(t) = x.   (1.1)
Assume A7.1.1 and A7.1.2.

A7.1.1 f(·,·), σ(·,·) are bounded and continuous, and (1.1) has a unique solution (in the sense of probability law) on the finite interval [t, T], for each 0 ≤ t < T and each x ∈ R^r. The solution is a strong Markov and a Feller process. (Again, as in Chapter 4, the Markov and Feller properties actually are consequences of the uniqueness.)

A7.1.2 k(·,·), b₁(·,·), and b_T(·) are bounded and continuous functions on R^r × [0, T], R^r × [0, T], and R^r, resp.
In the next section we will apply a finite difference technique to the parabolic partial differential equation (4.5.4) which, formally, is satisfied by the functional (4.2.4). The finite difference method will yield a Markov chain whose interpolations will converge to X(·) in distribution as the finite difference intervals Δ, h → 0. Again, the interpolation will be piecewise constant, but over nonrandom time intervals Δ.
7.2 The Finite Difference Approximation and Weak Convergence

In the last chapter it was necessary to choose the finite difference approximations with some care if we wanted the coefficients in the finite difference equations to be one-step transition probabilities for some Markov chain. We must make analogous choices for the current problem. Let Δ and h denote the finite difference intervals to be used to approximate the derivatives with respect to time and with respect to each of the spatial directions, resp. The
quantity h could vary with the coordinate direction, but the formulas then become much longer. The approximations (2.1)-(2.4) are for the explicit method of approximation for parabolic equations. The implicit method is treated in the next section. We will use the finite difference approximations (2.1)-(2.4). In these formulas, we do let h depend on the direction, although in the subsequent development we will set h_i = h.

  V_t(x, t) → [V(x, t + Δ) − V(x, t)]/Δ,   (2.1)

  V_{x_i}(x, t) → [V(x + e_i h_i, t + Δ) − V(x, t + Δ)]/h_i,   if f_i(x, t) ≥ 0,
  V_{x_i}(x, t) → [V(x, t + Δ) − V(x − e_i h_i, t + Δ)]/h_i,   if f_i(x, t) < 0,   (2.2)

  V_{x_i x_i}(x, t) → [V(x + e_i h_i, t + Δ) + V(x − e_i h_i, t + Δ) − 2V(x, t + Δ)]/h_i²,   (2.3)

  V_{x_i x_j}(x, t) → [2V(x, t + Δ) + V(x + e_i h_i + e_j h_j, t + Δ) + V(x − e_i h_i − e_j h_j, t + Δ)]/2h_i h_j
      − [V(x + e_i h_i, t + Δ) + V(x − e_i h_i, t + Δ) + V(x + e_j h_j, t + Δ) + V(x − e_j h_j, t + Δ)]/2h_i h_j,   a_{ij}(x, t) ≥ 0,  i ≠ j,   (2.4)

  V_{x_i x_j}(x, t) → −[2V(x, t + Δ) + V(x + e_i h_i − e_j h_j, t + Δ) + V(x − e_i h_i + e_j h_j, t + Δ)]/2h_i h_j
      + [V(x + e_i h_i, t + Δ) + V(x − e_i h_i, t + Δ) + V(x + e_j h_j, t + Δ) + V(x − e_j h_j, t + Δ)]/2h_i h_j,   a_{ij}(x, t) < 0,  i ≠ j.   (2.4)
Analogously to (6.2.4), we assume that (for x ∈ R^r, t ≤ T)

  a_{ii}(x, t) − Σ_{j: j≠i} |a_{ij}(x, t)| ≥ 0,   i = 1, ..., r,   (2.5)

and that the quantity defined below by p_t^{h,Δ}(x, x) is nonnegative:

  p_t^{h,Δ}(x, x) = 1 − (Δ/h²)[h Σ_i |f_i(x, t)| + 2 Σ_i a_{ii}(x, t) − Σ_{i,j: i≠j} |a_{ij}(x, t)|] ≥ 0.   (2.6a)
Both (2.5) and (2.6a) are needed to assure that the finite difference coefficients will be nonnegative. The conditions (2.5) and (2.6a) can be considerably weakened if we allow h to depend on the coordinate direction or by use of a coordinate rotation before the substitution of (2.1)-(2.4) into the differential equations. Before proceeding, we briefly examine a simple two-dimensional case where h_i is allowed to depend on i.
Example Consider

  V_t(x, t) + 9V_{x₁x₁}(x, t) + 6V_{x₁x₂}(x, t) + V_{x₂x₂}(x, t) + k(x, t) = 0,   x ∈ G,  t ≤ T.

Then a₁₁ = 9, a₁₂ = a₂₁ = 3, and a₂₂ = 1. A rotation of coordinates can clearly be used to transform the problem into one in which the a(·) matrix is diagonal. For numerical purposes this may be useful. But if we restrict ourselves to using (2.1)-(2.4) without a prior rotation and allow the possibility that h₁ ≠ h₂, then (2.5) must be replaced by

  a₁₁/h₁² − a₁₂/h₁h₂ ≥ 0,   a₂₂/h₂² − a₁₂/h₁h₂ ≥ 0.

Any pair (h₁, h₂) satisfying h₁/h₂ = 3 will satisfy these two inequalities. (With h₁ = 3h₂, both left-hand sides equal 1/h₂² − 1/h₂² = 0, so the inequalities hold with equality.)

The Approximating Markov Chain

Define M_Δ by M_Δ Δ = T, where we assume (an unimportant assumption, for notational convenience only) that M_Δ is an integer. For x, y ∈ R_h^r, t = 0, Δ, ..., Δ(M_Δ − 1), define the function p_t^{h,Δ}(x, y) by (2.6a) for y = x and otherwise by
  p_t^{h,Δ}(x, x ± e_i h) = (Δ/h²)[a_{ii}(x, t) − Σ_{j: j≠i} |a_{ij}(x, t)| + h f_i^{±}(x, t)],   (2.6b)

  p_t^{h,Δ}(x, x + e_i h + e_j h) = p_t^{h,Δ}(x, x − e_i h − e_j h) = (Δ/h²) a_{ij}^{+}(x, t),   i ≠ j,
  p_t^{h,Δ}(x, x + e_i h − e_j h) = p_t^{h,Δ}(x, x − e_i h + e_j h) = (Δ/h²) a_{ij}^{−}(x, t),   i ≠ j,   (2.6c)

  p_t^{h,Δ}(x, y) = 0,   for all x, y pairs not covered in (2.6a)-(2.6c).   (2.6d)

Note that the p_t^{h,Δ}(x, y) are ≥ 0 and sum (over y) to unity for each x, t.
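To fix ideas, here is a minimal sketch in Python of these transition probabilities for the scalar case (r = 1), where the cross terms (2.6c) are absent. The drift f, the coefficient a = σ²/2, and the particular h, Δ below are illustrative choices, not data from the text.

    import numpy as np

    # Explicit-method probabilities (2.6a)-(2.6b) for r = 1.
    def explicit_probs(x, t, f, a, h, dt):
        fv, av = f(x, t), a(x, t)
        p_up   = (dt / h**2) * (av + h * max(fv, 0.0))     # y = x + h
        p_down = (dt / h**2) * (av + h * max(-fv, 0.0))    # y = x - h
        p_stay = 1.0 - (dt / h**2) * (2.0*av + h*abs(fv))  # (2.6a)
        assert p_stay >= 0.0, "condition (2.6a) violated; reduce dt"
        return p_stay, p_up, p_down

    # Example: dx = -x dt + dw, i.e., f(x, t) = -x, a(x, t) = 1/2.
    p = explicit_probs(0.5, 0.0, lambda x, t: -x, lambda x, t: 0.5,
                       h=0.1, dt=0.005)
    print(p, sum(p))   # the three values are nonnegative and sum to 1

The printed sum confirms numerically the normalization noted in the text.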
Hence {p_t^{h,Δ}(·,·)} is a one-step transition function for a Markov chain, whose random variables we denote by {ξ_n^{h,Δ}}. Substituting (2.1)-(2.4) into the partial differential equation and rearranging terms yields the finite difference equation

  V^{h,Δ}(x, nΔ) = Σ_y p_{nΔ}^{h,Δ}(x, y)V^{h,Δ}(y, nΔ + Δ) + k(x, nΔ)Δ,   n < M_Δ,  x ∈ G_h,   (2.7a)

with the above boundary conditions. In terms of the random process {ξ_n^{h,Δ}},
we can write (2.7a) as

  V^{h,Δ}(x, nΔ) = E_{x,n} V^{h,Δ}(ξ_{n+1}^{h,Δ}, nΔ + Δ) + k(x, nΔ)Δ,   n < M_Δ,  x ∈ G_h,   (2.7b)

where we use t = nΔ and ξ_n^{h,Δ} = x. The notation E_{x,n} denotes the expectation given that ξ_n^{h,Δ} = x. Let N(h, Δ) = min{n : ξ_n^{h,Δ} ∉ G_h}. Since we have defined the chain only up to time M_Δ, set N(h, Δ) = ∞ if ξ_n^{h,Δ} ∈ G_h for all n < M_Δ. According to Section 3.1.3, the unique solution to (2.7b) is

  V^{h,Δ}(x, nΔ) = E_{x,n}[Σ_{i=n}^{(M_Δ ∧ N(h,Δ)) − 1} k(ξ_i^{h,Δ}, iΔ)Δ + b_T(ξ_{M_Δ}^{h,Δ}) I_{M_Δ < N(h,Δ)} + b₁(ξ_{N(h,Δ)}^{h,Δ}, N(h,Δ)Δ) I_{M_Δ ≥ N(h,Δ)}].   (2.8)
V^{h,Δ}(·,·) satisfies the required boundary conditions. Fix x ∈ R_h^r and t = nΔ. Define the interpolated process ξ^{h,Δ}(·) on [t, T] by ξ^{h,Δ}(t) = x and ξ^{h,Δ}(s) = ξ_m^{h,Δ} on [mΔ, mΔ + Δ), m ≥ n, and let ρ(h, Δ) = N(h, Δ)Δ denote the escape time of ξ^{h,Δ}(·) from G_h. Then (2.8) can be rewritten as

  V^{h,Δ}(x, t) = E_{x,t} ∫_t^{T ∧ ρ(h,Δ)} k(ξ^{h,Δ}(s), s) ds + E_{x,t} b_T(ξ^{h,Δ}(T)) I_{T < ρ(h,Δ)} + E_{x,t} b₁(ξ^{h,Δ}(ρ(h, Δ)), ρ(h, Δ)) I_{T ≥ ρ(h,Δ)}.   (2.9)

The notation E_{x,t} denotes the expectation under the condition that ξ^{h,Δ}(t) = x.
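For concreteness, the following sketch carries out the backward iteration (2.7a) for a hypothetical scalar problem on G = (0, 1); the functions f, a, k, b1, bT stand in for the data of the problem (and are assumed to accept numpy arrays), and h, Δ are assumed to satisfy (2.5)-(2.6a).

    import numpy as np

    # Backward solve of (2.7a) on G = (0, 1): V(x, T) = bT(x), V = b1 on
    # the boundary, and one explicit step per time level.
    def solve_explicit(f, a, k, b1, bT, h=0.05, dt=0.001, T=1.0):
        xs = np.arange(0.0, 1.0 + h/2, h)       # G_h together with its boundary
        M = int(round(T / dt))
        V = bT(xs)
        for n in reversed(range(M)):
            t = n * dt
            fv, av = f(xs[1:-1], t), a(xs[1:-1], t)
            p_up   = (dt/h**2) * (av + h*np.maximum(fv, 0.0))
            p_dn   = (dt/h**2) * (av + h*np.maximum(-fv, 0.0))
            p_stay = 1.0 - p_up - p_dn          # (2.6a); nonnegative by assumption
            W = V.copy()
            W[1:-1] = (p_stay*V[1:-1] + p_up*V[2:] + p_dn*V[:-2]
                       + k(xs[1:-1], t)*dt)     # one step of (2.7a)
            W[0], W[-1] = b1(0.0, t), b1(1.0, t)
            V = W
        return xs, V                            # approximates V^{h,dt}(x, 0)

Each sweep is one sparse matrix-vector product, the probabilistic counterpart of one explicit finite difference step.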
The chain has the local properties (M_Δ > m ≥ n)

  E_{x,n}[ξ_{m+1}^{h,Δ} − ξ_m^{h,Δ} | ξ_m^{h,Δ} = y] = f(y)Δ,   (2.10)

  cov_{x,n}[ξ_{m+1}^{h,Δ} − ξ_m^{h,Δ} | ξ_m^{h,Δ} = y] = 2a(y)Δ + h f̄(y)Δ − Δ² f(y)f′(y) ≡ Σ_h(y)Δ,   (2.11)

which are precisely the same expressions as (6.2.10) and (6.2.11), with Δ replacing Δt^h(y). The term f̄(·) is defined below (6.2.11). We can write
  ξ_{m+1}^{h,Δ} = ξ_m^{h,Δ} + f(ξ_m^{h,Δ}, mΔ)Δ + β_m^{h,Δ},   m ≥ n,   ξ_n^{h,Δ} = x,   (2.12)

where {β_m^{h,Δ}} is an orthogonal sequence whose conditional covariance is given by the right-hand side of (2.11). From here on, the development is almost exactly like that in Chapter 6. It is a little simpler, since we need only consider the processes on a finite
interval [0, T] and since the interpolation interval is a constant, not a random variable. For s ∈ [mΔ, mΔ + Δ), m ≥ n, define

  F^{h,Δ}(s) = Σ_{i=n}^{m−1} f(ξ_i^{h,Δ}, iΔ)Δ,   B^{h,Δ}(s) = Σ_{i=n}^{m−1} β_i^{h,Δ}.

Then, analogously to (6.3.1), we have

  ξ^{h,Δ}(s) = x + F^{h,Δ}(s) + B^{h,Δ}(s),   s ∈ [t, T].
The proofs of Theorems 6.3.1 and 6.3.2 immediately yield

Theorem 7.2.1 Fix x, t and let f(·,·), σ(·,·) be bounded and continuous. Then {ξ^{h,Δ}(·), B^{h,Δ}(·), ρ(h, Δ) ∧ T} is tight on D^{2r}[0, T] × [0, T]. If h, Δ indexes any convergent subsequence with limit ξ(·), B(·), ρ̄ ∧ T, then ξ(·), B(·) have continuous paths w.p. 1. There is a Wiener process W(·), with respect to which ξ(·) is nonanticipative, and

  ξ(s) = x + ∫_t^s f(ξ(v), v) dv + B(s),   B(s) = ∫_t^s σ(ξ(v), v) dW(v),   s ∈ [t, T].   (2.13)

Assume A7.1.1. Then the multivariate distributions of ξ(·), B(·) do not depend on the particular subsequence.
Define τ and τ′ as in Section 4.2. Then Theorems 7.2.1, 6.4.1, and 6.4.2 immediately yield

Theorem 7.2.2 Assume A7.1.1, A7.1.2, fix x, t, and suppose that

  P_{x,t}{τ = T} = 0,   (2.14a)
  P_{x,t}{τ ∧ T = τ′ ∧ T} = 1.   (2.14b)

Then

  V^{h,Δ}(x, t) → R(x, t)   (2.15)

as h, Δ → 0 [always assuming that (2.5) and (2.6a) hold as h, Δ → 0]. If g(·) is any bounded measurable function: D^r[0, T] → R¹ which is continuous w.p. 1 with respect to the measure induced by (1.1) on D^r[0, T], then E_{x,t} g(ξ^{h,Δ}(·)) → E_{x,t} g(X(·)) as h, Δ → 0.

REMARKS For each x, t, (2.14a) must hold except for a countable set of T. In fact, (2.14) holds except for a set of (x, T) of zero measure. The problem of showing (2.14b) [which is more difficult than showing (2.14a)] is precisely
the problem of showing the analogous result P_x{τ = τ′} = 1 for the problem of Chapter 6. The condition (2.14a) is not required if the boundary function has no discontinuity at t = T, x ∈ ∂G.

Consider the classical problem of the r-dimensional heat equation, where f(·) ≡ 0, a_{ij} = 0 for i ≠ j, and a_{ii}(x) = σ²/2, where σ > 0 is a real number. Then the nonnegativity condition (2.6a) reduces to 1 − rΔσ²/h² ≥ 0, or rΔ ≤ h²/σ², which is precisely the classical criterion of von Neumann for stability of the finite difference approximation [using (2.1) and (2.3)] to the heat equation (Garabedian [G1, p. 476], for r = 1).

The chain can be used for Monte Carlo simulation of the X(·) process (1.1). Interpolations over random time intervals are not required and, owing to the weak convergence Theorem 7.2.2, the distributions of the values of a large class [the g(·) functions of the theorem statement] of functions of X(·) can be approximated using simulations of the chain {ξ_n^{h,Δ}}. In this context we should note that piecewise constant interpolations are not required. The interpolations can be taken piecewise linear, for example, without losing the convergence property; perhaps this would yield "better" estimates of the functions. The best form for the interpolation has not been investigated. Whether it is possible to (or how one may) interpolate between several points is not at all clear. In certain cases, one must exercise great care in doing this, since we may get convergence of the resulting {ξ^{h,Δ}(·)} to some process other than (1.1). See Wong and Zakai [W3] or McShane [M2] for a discussion of limits corresponding to various "approximations," naive or not, to diffusions.

Let ψ(·) be an R^r-valued Wiener process which is defined on the same probability space on which {ξ_n^{h,Δ}} is defined, and let the two processes be independent of each other. Then, using {ξ_n^{h,Δ}} instead of {ξ_n^h} in the constructions of Section 6.6, we define the random variables {δW_n^{h,Δ}} and the process W^{h,Δ}(·) exactly as {δW_n^h} and W^h(·) were defined. Then, as in Section 6.6, we have that W^{h,Δ}(·) → W(·) as h, Δ → 0, where W(·) is the Wiener process used in (2.13), and we can also write [see (6.6.6)]
  ξ_{n+1}^{h,Δ} = ξ_n^{h,Δ} + f(ξ_n^{h,Δ})Δ + σ_h(ξ_n^{h,Δ}) δW_n^{h,Δ} + ε_n^{h,Δ},
where the interpolation ε^{h,Δ}(·) of {ε_n^{h,Δ}} tends to the zero process as h, Δ → 0.

The Fokker-Planck and Kolmogorov Equations

Under various assumptions (which we will not be concerned with here; see Section 4.3), the density p(x, t; y, s) of the transition function P(x, t; Γ, s) (s > t) satisfies the backward Kolmogorov equation in the initial variables (x, t), and the forward Kolmogorov or Fokker-Planck equation (4.3.2) in the forward variables (y, s). There are many problems that are
associated with the usual approximations to the solution of the backward equation, even if f(·,·), σ(·,·) are smooth and a(·,·) is positive definite. It is not uncommon that "numerical solutions" actually take negative values or have other pathologies. Furthermore, even if a "good" solution can be obtained, if it is only an approximation to the fundamental solution of (4.3.2), then it may be of use for approximating expectations of functionals of X(·) at some fixed time s, but may not be useful for approximating expectations of more general functionals of X(·). It should be clear that we can get an approximation to the solution of both the forward and backward equations directly from the p_t^{h,Δ}(·,·). We will develop the details for the nonhomogeneous case. Suppose that T = ∞, and order the points on R_h^r and on G_h ∪ ∂G_h in some way. (Recall that ∂G_h is the set of points in R_h^r − G_h which are only one node away from G_h.) Define the matrices

  P_n^{h,Δ} = {p_{nΔ}^{h,Δ}(x, y); x, y ∈ R_h^r},   D_{n,N}^{h,Δ} = P_n^{h,Δ} P_{n+1}^{h,Δ} ⋯ P_{N−1}^{h,Δ} = {d_{n,N}^{h,Δ}(x, y)},   (2.16)

where we let nΔ = t and NΔ = s. The empty product is defined to be the identity matrix. The matrix D_{n,N}^{h,Δ} can be considered to be a function of four variables, the row and column entries x, y and the initial and final times n, N. Considered as such, it is a weak-sense approximation to the weak-sense density p(x, t; y, s) (as a function of x, t, y, s) in the following sense. Let g(·) denote a bounded and continuous real-valued function on R^r. As h, Δ → 0, but with nΔ and NΔ fixed at t and s, resp., Theorem 7.2.2 yields that

  Σ_{y ∈ R_h^r} g(y) d_{n,N}^{h,Δ}(x, y) → ∫ g(y)P(x, t; dy, s).   (2.17)
The approximation D_{n,N}^{h,Δ} is on the infinite state space R_h^r. We can stop the process on leaving a bounded open set G. Let P̃(x, t; Γ, s) denote the transition function for the stopped process, and p̃_t^{h,Δ}(·,·) the one-step transition function for the chain, stopped on first hitting ∂G_h. Define P̃_n^{h,Δ} = {p̃_{nΔ}^{h,Δ}(x, y); x, y ∈ G_h ∪ ∂G_h}, and define

  D̃_{n,N}^{h,Δ} = P̃_n^{h,Δ} P̃_{n+1}^{h,Δ} ⋯ P̃_{N−1}^{h,Δ} ≡ {d̃_{n,N}^{h,Δ}(x, y); x, y ∈ G_h ∪ ∂G_h}.   (2.18)
Assume that P_{x,t}{τ = τ′} = 1 and P_{x,t}{τ = s} = 0. Theorem 7.2.2 yields

  Σ_{y ∈ G_h ∪ ∂G_h} g(y) d̃_{n,N}^{h,Δ}(x, y) → ∫_{Ḡ} g(y)P̃(x, t; dy, s)   as h, Δ → 0.   (2.19)
The limit in Eq. (2.19) is valid if we sum and integrate only over ∂G_h and ∂G, resp., or over G_h and G, resp. Using (2.18), the expectation can be iterated
forward or backward in time, yielding weak-sense approximations to the weak-sense density p̃(x, t; y, s) as s or t varies, resp. If we are only interested in approximating (weakly) the weak-sense solution to the forward equation (for the stopped diffusion) for some particular fixed initial value x ∈ G_h, then the iteration in (2.18) can be simplified by proceeding as follows. Let D̃_{n,N}^{h,Δ}(x) denote the xth row of D̃_{n,N}^{h,Δ}. Define D̃_{n,n}^{h,Δ}(x) = (0, 0, ..., 1, 0, ...), where the only nonzero element is in the xth place. Define, recursively,

  D̃_{n,N+1}^{h,Δ}(x) = D̃_{n,N}^{h,Δ}(x) P̃_N^{h,Δ},   N = n, n + 1, ....   (2.20)

The D̃_{n,N}^{h,Δ}(x) constructed in (2.20) is precisely the xth row of D̃_{n,N}^{h,Δ}. The amount of computation required to calculate (2.20) is of the same order as that required with the use of any finite difference approach to the forward equation. Our approximation is a transition density for a process {ξ_n^{h,Δ}} which is very closely related to the diffusion X(·).
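A minimal sketch of the iteration (2.20), assuming the one-step matrices of the stopped chain have already been assembled (here P_list[N] holds the matrix {p̃_{NΔ}^{h,Δ}(x, y)} on the ordered grid; the names are illustrative):

    import numpy as np

    # Forward iteration (2.20): w is the xth row of D~(n, N), a weak-sense
    # approximation to the density of the stopped process at time N*dt.
    def forward_density(P_list, x_index, n, N_final):
        w = np.zeros(P_list[n].shape[0])
        w[x_index] = 1.0                 # the unit row vector at x
        for N in range(n, N_final):
            w = w @ P_list[N]            # row vector times one-step matrix
        return w

    # As in (2.19), sum over y of w[y]*g(grid[y]) then approximates the
    # integral of g against the (stopped) transition function.

Only a row vector is propagated, so the storage is that of one grid function rather than of a full matrix.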
7.3 Implicit Approximations

The approximations (2.1)-(2.4) constitute an explicit type of approximation, in the sense that they yield explicit recursive equations for the evaluation of the solution, in the classical sense of numerical analysis (Forsythe and Wasow [F2]). This is made evident by the iteration (2.7a or b), which starts at t = T = M_Δ Δ (where the boundary condition is given) and moves backward one Δ step at a time; V^{h,Δ}(x, nΔ) is computed as an explicit linear combination of the V^{h,Δ}(y, nΔ + Δ), y ∈ G_h ∪ ∂G_h.

To get the standard type of implicit approximation (Forsythe and Wasow [F2]), we need to approximate the parabolic equation by considering it as a degenerate elliptic equation and use the method of Chapter 6. To do this, we make time into a state variable by defining a new variable X₀(·), with Ẋ₀(t) = 1. Then we have (4.5.4) with ∂/∂x₀ and x₀, resp., replacing ∂/∂t and t, resp. The set G is replaced by G × [0, T), from which the escape time is always bounded above by T. Let τ and τ′ be defined as in Section 4.2 and assume (2.14). Then we can apply the various theorems in Chapter 6. Since there are no second derivatives in t appearing in (4.5.4), even if we use a difference interval Δ for the approximation of the derivative ∂/∂t and h for the approximation of the spatial derivatives, where h is not necessarily equal to Δ, we still get tractable formulas. The method is straightforward. Here we will give an illustration of the idea, by means of a simple one-dimensional problem.
We consider the example

  V_t(x, t) + a(x, t)V_{xx}(x, t) + f(x, t)V_x(x, t) + k(x, t) = 0,   (x, t) ∈ G × [0, T),
  V(x, T) = b_T(x),   x ∈ G,
  V(x, t) = b₁(x, t),   x ∈ ∂G,  t ≤ T.   (3.1)
In Chapter 6, we required that the boundary function b(·) be continuous. But, if P_{x,t}{τ = T} = 0, then the discontinuity at the corner ∂G × {T} does not affect the convergence, since the function with values [at arbitrary x(·) ∈ D^r[0, T], and where τ(x(·)) = τ is the escape time of x(·) from G × [0, T)]

  b_T(x(T)) I_{τ ≥ T} + b₁(x(τ), τ) I_{τ < T}

is still continuous w.p. 1 with respect to the measure induced by (1.1). Define

  Q_{h,Δ}(x, t) = 1 + 2a(x, t)Δ/h² + |f(x, t)|Δ/h,   Δt^{h,Δ}(x, t) = Δ/Q_{h,Δ}(x, t).

Since Ẋ₀ = 1 > 0, we use the forward difference approximation for V_t(x, t), namely

  V_t(x, t) → [V(x, t + Δ) − V(x, t)]/Δ.   (3.2)
Substituting (6.2.1), (6.2.2), and (3.2) into (3.1) and rearranging terms yields the finite difference approximation

  V^{h,Δ}(x, t) = Q_{h,Δ}^{−1}(x, t){V^{h,Δ}(x, t + Δ) + V^{h,Δ}(x + h, t)[a(x, t)(Δ/h²) + f^{+}(x, t)(Δ/h)]
      + V^{h,Δ}(x − h, t)[a(x, t)(Δ/h²) + f^{−}(x, t)(Δ/h)]} + k(x, t)Δt^{h,Δ}(x, t),   x ∈ G_h,  t < T.   (3.3)

Define the function p^{h,Δ}(x, t; y, s) by letting p^{h,Δ}(x, t; x, t + Δ) and p^{h,Δ}(x, t; x ± h, t) be the coefficients of V^{h,Δ}(x, t + Δ) and V^{h,Δ}(x ± h, t), resp., in (3.3). Define p^{h,Δ}(x, t; y, s) = 0 for all values of (y, s) besides (x, t + Δ), (x ± h, t). The p^{h,Δ}(z; z′) (z = (x, t) pair) are ≥ 0 and sum to unity over z′ for each z. They are one-step transition probabilities for a two-dimensional Markov chain, whose random variables we denote by {ζ_n^{h,Δ}}. We can rewrite (3.3) in the form

  V^{h,Δ}(x, t) = V^{h,Δ}(x, t + Δ)p^{h,Δ}(x, t; x, t + Δ) + V^{h,Δ}(x + h, t)p^{h,Δ}(x, t; x + h, t)
      + V^{h,Δ}(x − h, t)p^{h,Δ}(x, t; x − h, t) + k(x, t)Δt^{h,Δ}(x, t),   x ∈ G_h,  t < T.   (3.4a)
If we write z = (x, t), then (3.4a) can be rewritten using the notation for the homogeneous case; namely, as

  V^{h,Δ}(z) = E_z V^{h,Δ}(ζ_1^{h,Δ}) + k(z)Δt^{h,Δ}(z),   (3.4b)

where ζ_1^{h,Δ} is the successor state to z. Define the process ζ^{h,Δ}(·) to be the interpolation of {ζ_n^{h,Δ}}, in the usual fashion, with interpolation intervals Δt_n^{h,Δ} = Δt^{h,Δ}(ζ_n^{h,Δ}). Then the results of Chapter 6 imply that V^{h,Δ}(z) → R(z) = R(x, t) as h, Δ → 0.

FIG. 7.1 State flows for the discretization of a parabolic equation in one state variable. (a) Explicit method. (b) Implicit method. (c) Implicit-explicit method.

The flow of time is implicit in (3.4). Strictly speaking, (3.4a) cannot be solved in the same simple backwards iteration fashion as can (2.7). In a
sense, "real" time (the zeroth component of ") advances in any step only with probability ph."(x, t ; x, t + A). When the zeroth component advances, the "real" state, namely x, remains the same. Figures 7.la and 7.lb illustrate the state flows for {<$"} and for {t:"}. To get the implicit-explicit method, we simply randomize between the implicit and the explicit methods. For any a E (0, l), choose an explicit step with probability a and an implicit step probability 1 - a (transitions illustrated in Fig. 7.1~). The implicit, or implicit-explicit, method can also be used for Monte Carlo (see Section 6.7). Real time increases at a rate of A per step of {<$"}. For the implicit scheme, A can take greater values than it can take for the explicit scheme-in fact, it is usually of the order of h rather than of h2. But the average increase in real time, namely At"*" ( z )= A/Qh, "(z), is of the order of h2 also. The time increment Ath. "(z) is "adaptive " to local dynamical conditions; perhaps this helps to explain the frequently observed numerical superiority of the implicit method.
7.4 Discounted Cost: Explicit Method

Consider the cost functional (4.2.4b) with associated differential equation (4.5.4), with k(x) replaced by k(x) − λ(x)R(x, t). Applying the approximations (2.1)-(2.4) to this equation yields a finite difference equation of the form (compare with the elliptic case of Section 6.5)

  V^{h,Δ}(x, t) = [1 + λ(x, t)Δ]^{−1}[E_x V^{h,Δ}(ξ_{n+1}^{h,Δ}, t + Δ) + k(x, t)Δ]   (4.1)
instead of (2.7). Of course, the boundary conditions remain the same. The limit of V^{h,Δ}(x, t) as h, Δ → 0 can be shown to be the same as the limit (as h, Δ → 0) of the solutions to both (4.2) and (4.3) [compare with (6.5.3) and (6.5.4) in the elliptic case]:

  V^{h,Δ}(x, t) = [exp −λ(x, t)Δ][E_x V^{h,Δ}(ξ_{n+1}^{h,Δ}, t + Δ) + k(x, t)Δ],   (4.2)

  V^{h,Δ}(x, t) = [exp −λ(x, t)Δ]E_x V^{h,Δ}(ξ_{n+1}^{h,Δ}, t + Δ) + [1 − exp −λ(x, t)Δ]k(x, t)/λ(x, t),   (x, t) ∈ G_h × [0, T).   (4.3)
The unique solution to (4.3), with the correct boundary conditions, is

  V^{h,Δ}(x, t) = E_{x,t} ∫_t^{T ∧ ρ(h,Δ)} [exp −∫_t^s λ(ξ^{h,Δ}(v), v) dv] k(ξ^{h,Δ}(s), s) ds
      + E_{x,t}[exp −∫_t^T λ(ξ^{h,Δ}(s), s) ds] b_T(ξ^{h,Δ}(T)) I_{T < ρ(h,Δ)}
      + E_{x,t}[exp −∫_t^{ρ(h,Δ)} λ(ξ^{h,Δ}(s), s) ds] b₁(ξ^{h,Δ}(ρ(h, Δ)), ρ(h, Δ)) I_{T ≥ ρ(h,Δ)}.   (4.4)

It is not hard to see that Theorem 7.2.2 holds for the above V^{h,Δ}(·,·) and the functional R(·,·) given by (4.2.4b). Of course, the same convergence result holds if we discretize by either the implicit or the implicit-explicit method.
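In code, discounting changes one line of the backward step. A minimal sketch in the form (4.3), assuming λ > 0 (names illustrative):

    import numpy as np

    # One backward step of the discounted explicit method, per (4.3).
    # EV approximates E_x V^{h,dt}(xi_{n+1}, t + dt), computed with the
    # probabilities (2.6); k_val = k(x, t), lam_val = lam(x, t) > 0.
    def discounted_step(EV, k_val, lam_val, dt):
        d = np.exp(-lam_val * dt)
        return d * EV + (1.0 - d) * k_val / lam_val

    # The variant (4.2) is d * (EV + k_val * dt); both have the same limit
    # as h, dt -> 0.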
7.5 Nonlinear Filtering

One of the basic and difficult problems of current stochastic control and communication theory is the nonlinear filtering problem for diffusion models. Let X(·) be defined by (1.1), assume A7.1.1, and suppose that, for some integer v, z(·) is an R^v-valued Wiener process which is independent of w(·). Let g(·,·) and b(·) denote bounded continuous R^v-valued and R-valued, resp., functions on R^r × [0, ∞) and R^r, resp. At each time T ≥ t (t = initial time), let the observations Y(s), s ∈ [t, T], be available, where we define Y(·) by

  dY(s) = g(X(s), s) ds + dz(s),   Y(t) = 0.
In many applications, we wish to calculate or approximate either the conditional probability

  P_{x,t}{X(T) ∈ Γ | Y(s), s ∈ [t, T]}

or the conditional expectation

  E_{x,t}{b(X(T)) | Y(s), s ∈ [t, T]},

where x = X(t) is either a constant or a random variable. Using the approximations ξ^{h,Δ}(·), a recursive solution to the approximation problem can be obtained (which will be similar to the recursive solution to the approximation problem for the Fokker-Planck equation). For computational feasibility, it is necessary that the state space G_h ∪ ∂G_h be finite. So we will assume that X(·) stops on reaching the boundary ∂G. Next, a precise formulation of the problem, which is convenient for our purposes (and also very convenient for other theoretical purposes, such as the derivation of Itô-type equations for the evolution of conditional moments), will be given. Let X̄(·), X(·) be solutions of the Itô equations
  X(s) = x + ∫_t^{s∧τ} f(X(u), u) du + ∫_t^{s∧τ} σ(X(u), u) dw(u),   (5.1a)
  X̄(s) = x + ∫_t^{s∧τ̄} f(X̄(u), u) du + ∫_t^{s∧τ̄} σ(X̄(u), u) dw̄(u),   (5.1b)

where (w̄(·), X̄(·)) and (w(·), X(·)) are independent processes and τ̄ and τ are the escape times of X̄(·) and X(·), resp., from G. By A7.1.1, both X̄(·) and X(·) have the same multivariate distributions. Let z̄(·) be a Wiener process which is independent of w(·) and w̄(·), and define the process Ȳ(·) by

  dȲ(s) = g(X̄(s), s) ds + dz̄(s),   s > t,   Ȳ(t) = 0.

Define R(·,·) by

  R(t, T) = exp[∫_t^T g′(X(s), s) dȲ(s) − ½ ∫_t^T |g(X(s), s)|² ds],   (5.2)
and define

  𝔅_t^T = minimal σ-algebra over which Ȳ(s), s ∈ [t, T], is measurable,

and the term

  V_b(x, t, T, ω) = E_{x,t}[R(t, T)b(X(T)) | 𝔅_t^T].
Note that the conditioning serves to fix the Ȳ(·) process; the expectation above is only over the X(·) process. One of the fundamental results of nonlinear filtering theory is the representation

  E_{x,t}[b(X̄(T)) | 𝔅_t^T] = V_b(x, t, T, ω)/V₁(x, t, T, ω),   (5.3)

where V₁ uses the function which is identically unity for b(·) (Kushner [K4], Zakai [Z1]). In (5.3) the initial condition is X̄(t) = x, a constant. However, if X̄(t) is a random variable with distribution P_t(·) supported on Ḡ, we have (w.p. 1)

  E[b(X̄(T)) | 𝔅_t^T] = ∫ V_b(x, t, T, ω)P_t(dx) / ∫ V₁(x, t, T, ω)P_t(dx).

Thus, to compute the conditional expectation, we must compute V_b for arbitrary b. The continuity of b(·) is not required for the validity of (5.3),
where b(·) may be an arbitrary Borel function with finite expectation, but it is used in the proof of convergence of the approximation procedure.

The exponential in the definition of R(·,·) suggests that an approximation to V_b can be computed in a manner similar to the way we computed an approximation to the discounted cost problem in Section 7.4, where the exponent in (5.2) would be related to a discount factor such as λ(·). In fact, if we note that (d_t = Itô differential)

  d_t[exp −∫_t^T λ(s) ds] = [exp −∫_t^T λ(s) ds]λ(t) dt

and that (use Itô's lemma)

  −d_t R(t, T) = R(t, T)g′(X(t), t) dȲ(t),

we might guess that the proper replacement for λ(x) dt in the differential equation (4.5.4) [with k(x) − λ(x)R(x, t) replacing k(x) there] would be −g′(x, t) dȲ(t). The subscript t denotes the variable with respect to which we are calculating the differential. In fact, it can be shown that the unnormalized conditional expectation V_b(x, t, T, ω) formally satisfies the backward equation (5.4) (in the initial variables x, t), although we will not go through the formal calculations; our only aim is the development of a computational procedure for approximation of (5.3) via the use of the chain {ξ_n^{h,Δ}} and observed data Ȳ(·).
  d_t V_b(x, t, T, ω) + ℒV_b(x, t, T, ω) dt + g′(x, t) dȲ(t)V_b(x, t, T, ω) = 0,   (x, t) ∈ G × [0, T),   (5.4)
  V_b(x, T, T, ω) = b(x),   x ∈ Ḡ,

with the corresponding relation holding for x ∈ ∂G, t ≤ T (there the ℒV_b dt term is absent, the process being stopped on ∂G). Equation (5.4) is a backward equation. Equations of the forward Kolmogorov or Fokker-Planck type for the conditional density have been discussed by Kushner [K3] and Mortensen [M5]. The backward equation for the unnormalized conditional expectation is better behaved and will serve, as it did for the ordinary Fokker-Planck equation, to yield a recursive procedure for approximating the conditional expectation or weak-sense conditional density as T increases and new observational data become available. However, we start the development by actually doing an iteration in the direction of decreasing t.

In what follows, X̄(·) is the signal process, and Ȳ(·) is the actual process of observations taken on the diffusion X̄(·). We can assume, with no loss of
generality, that X̄(·), Ȳ(·) are the original diffusion and observation process, respectively. Let {ξ̄_n^{h,Δ}} be a Markov chain on the state space G_h ∪ ∂G_h, which is stopped on first exit from G_h and which has the same transition function that {ξ_n^{h,Δ}} has in the set G_h. The process {ξ̄_n^{h,Δ}} is taken to be independent of the given process Ȳ(·). Also, assume that

  P_{x,t}{τ = τ′} = 1,   all x ∈ G,  0 ≤ t < ∞.   (5.5)

(This condition can be weakened: we only need that it hold for the value of t which is the initial value in the original filtering problem. Similarly, if the initial distribution is concentrated on a single x ∈ G, it need only hold for that x.)

Let T = M_Δ Δ, where M_Δ is an integer, define δ_Δ Ȳ(t) = Ȳ(t + Δ) − Ȳ(t), and define V_b^{h,Δ}(x, t, T, ω) by the iteration

  V_b^{h,Δ}(x, nΔ, T, ω) = exp[g′(x, nΔ)δ_Δ Ȳ(nΔ) − ½|g(x, nΔ)|²Δ] E_{x,n} V_b^{h,Δ}(ξ̄_{n+1}^{h,Δ}, nΔ + Δ, T, ω),   x ∈ G_h,  n < M_Δ,   (5.6)
with the same boundary conditions as in (5.4). Note that in (5.6) we never take expectations over the Ȳ(·) process, only over the {ξ̄_n^{h,Δ}} process. Equation (5.6) has a unique solution, which can be written in the form

  V_b^{h,Δ}(x, nΔ, T, ω) = E_{x,n}[R^{h,Δ}(nΔ, T)b(ξ̄^{h,Δ}(T)) | 𝔅_t^T],   (5.7)

where

  R^{h,Δ}(nΔ, T) = Π_{i=n}^{M_Δ−1} exp[g′(ξ̄_i^{h,Δ}, iΔ)δ_Δ Ȳ(iΔ) − ½|g(ξ̄_i^{h,Δ}, iΔ)|²Δ].
The conditioning serves to remind us that we do not take expectations over the Ȳ(·) process. Skorokhod imbedding will be used where useful. Now let us fix x, t and use the weak convergence results of Section 7.2. Under the conditions (5.5) and A7.1.1, Theorems 7.2.1 and 7.2.2 imply that

  ξ̄^{h,Δ}(s) → ξ̄(s),   s ≥ t,

uniformly in each finite time interval, as h, Δ → 0, where ξ̄(·) is a solution to (5.1a) [for some Wiener process W(·)] with ξ̄(t) = x. This implies that, w.p. 1, as h, Δ → 0,

  R^{h,Δ}(t, T)b(ξ̄^{h,Δ}(T)) → R(t, T)b(ξ̄(T)).

Since g(·,·) is bounded and (x, t) are arbitrary, there is also convergence in the mean for each (x, t). Hence, w.p. 1,

  V_b^{h,Δ}(x, t, T, ω) → V_b(x, t, T, ω)
as h, Δ → 0. Thus, via the finite iteration (5.6), we can calculate V_b^{h,Δ}(x, t, T, ω)/V₁^{h,Δ}(x, t, T, ω), an approximation to the conditional expectation (5.3). In the paper (Kushner [K6]), a similar convergence is shown if the exponential in (5.6) is replaced simply by the first-order approximation [1 + g′(x, nΔ)δ_Δ Ȳ(nΔ)].
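The iteration (5.6) is easy to program once the one-step matrix of the stopped chain is in hand. A minimal sketch, assuming scalar observations and, for brevity only, a time-independent transition matrix P; all names are illustrative:

    import numpy as np

    # Backward iteration (5.6) for the unnormalized conditional expectation:
    # V(n) = E_n * (P V(n+1)) componentwise, V(M) = {b(y)} on the ordered
    # grid.  g_of[m] is the vector {g(y, m*dt)}; dY[m] = Y((m+1)dt) - Y(m*dt).
    def unnormalized_V(P, g_of, dY, b_vec, dt):
        V = np.asarray(b_vec, dtype=float).copy()
        for n in reversed(range(len(dY))):
            E_n = np.exp(g_of[n]*dY[n] - 0.5*g_of[n]**2*dt)
            V = E_n * (P @ V)
        return V

    # The ratio unnormalized_V(P, g_of, dY, b_vec, dt) divided by
    # unnormalized_V(P, g_of, dY, ones, dt) approximates the conditional
    # expectation (5.3), as in (5.12).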
Recursive Equation for the Approximate Conditional Density

We can derive a recursive formula for an approximation to the weak-sense conditional density from (5.6), (5.7). (Compare with the development for the Fokker-Planck equation in Section 7.2.) Fix h, Δ and let H denote the number of points on the grid G_h ∪ ∂G_h. Order the points in some way, and recall the definition of the one-step transition matrix P̃_n^{h,Δ} from Section 7.2. For arbitrary H-vectors V, X and an H × H matrix F, define the operator ∘ by

  V ∘ F = {V_i F_{ij};  i, j = 1, ..., H}   (the matrix whose ith row is the ith row of F scaled by V_i),
  V ∘ X = (V₁X₁, ..., V_H X_H)′.

Henceforth, the variables x, y range over G_h ∪ ∂G_h and are ordered as the points of G_h ∪ ∂G_h are ordered. Let B, E_m, and V_b(m, ω) denote the vectors {b(y)}, {exp[g′(y, mΔ)δ_Δ Ȳ(mΔ) − ½|g(y, mΔ)|²Δ]}, and {V_b^{h,Δ}(y, mΔ, T, ω)}, resp. We will suppress some of the h, Δ notation. Equation (5.6) can now be rewritten in vector form as
  V_b(n, ω) = E_n ∘ P̃_n^{h,Δ} V_b(n + 1, ω),   n < M_Δ,   (5.9)

where V_b(M_Δ, ω) = B.
Now, for each pair of integers n, m, with m ≥ n, define the H × H matrix Q(n, m, ω), with components q_{xy}(n, m, ω), by

  Q(n, m, ω) = I, the identity,   for m = n,

and, for m > n, by the recursive formula

  Q(n, m, ω) = E_n ∘ P̃_n^{h,Δ} E_{n+1} ∘ P̃_{n+1}^{h,Δ} ⋯ E_{m−1} ∘ P̃_{m−1}^{h,Δ}
             = E_n ∘ P̃_n^{h,Δ} Q(n + 1, m, ω)
             = Q(n, m − 1, ω)E_{m−1} ∘ P̃_{m−1}^{h,Δ}.

Now let nΔ = t, M_Δ Δ = T. Then, by iterating (5.9),

  V_b(n, ω) = Q(n, M_Δ, ω)B.   (5.10)
In terms of components, (5.10) can be written as

  V_b^{h,Δ}(x, t, T, ω) = Σ_y q_{xy}(n, M_Δ, ω)b(y).   (5.11)

By (5.3), (5.11), and the weak convergence,

  V_b^{h,Δ}(x, t, T, ω)/V₁^{h,Δ}(x, t, T, ω) → E_{x,t}[b(X̄(T)) | Ȳ(s), s ∈ [t, T]],   w.p. 1,   (5.12)
as h, Δ → 0. As we will see next, only one row of (5.9) needs to be iterated. Suppose that t is the initial time and that X̄(t) is not concentrated at x but has a weak-sense density p_t(·). Let the row vector {p_t^h(x)} denote an approximation to p_t(·), with support on G_h ∪ ∂G_h, which converges weakly to p_t(·) as h → 0. Let Q(n, m, ω, x) denote the xth row of Q(n, m, ω). Then

  Q(n, m, ω, x) = Q(n, m − 1, ω, x)E_{m−1} ∘ P̃_{m−1}^{h,Δ}.   (5.13)
For each pair of integers n, m, m ≥ n, define the row vector Q̄(n, m, ω), with components q̄_y(n, m, ω), by Q̄(n, m, ω) = {p_t^h(y)} for m = n and, for m > n, by the recursive formula

  Q̄(n, m, ω) = Q̄(n, m − 1, ω)E_{m−1} ∘ P̃_{m−1}^{h,Δ}.   (5.14)

Then we have that

  Σ_y q̄_y(n, M_Δ, ω)b(y) / Σ_y q̄_y(n, M_Δ, ω) → E[b(X̄(T)) | Ȳ(s), s ∈ [t, T]],   w.p. 1,
as h, Δ → 0. Thus the vector with components

  p̄_y(n, m, ω) = q̄_y(n, m, ω) / Σ_{y′} q̄_{y′}(n, m, ω)   (5.15)

is simply a weak-sense approximation to the weak-sense density of X̄(mΔ), conditioned on Ȳ(s), nΔ ≤ s ≤ mΔ, in the sense that it converges weakly w.p. 1 as h, Δ → 0. Furthermore, it can be computed recursively via (5.14). The quantity p̄(n, m, ω) in Eq. (5.15) is simply the Bayes-rule conditional density for the chain {ξ̄_n^{h,Δ}}; namely (with an obvious abuse of notation),

  p̄_y(n, m, ω) = P{ξ̄_m^{h,Δ} = y | δ_Δ Ȳ(iΔ) = g(ξ̄_i^{h,Δ}, iΔ)Δ + δ_Δ z̄(iΔ),  n ≤ i < m},

where {ξ̄^{h,Δ}} is independent of z̄(·) and has the same law that {ξ̄_n^{h,Δ}} has. We can see this as follows. It is true for m = n. Suppose that it is true for m = l − 1 ≥ n. By formal manipulations with Bayes's rule, we show it to be true for m = l.
Let E_l(y) denote the yth component of the vector E_l. We have, by the induction hypothesis,

  P{ξ̄_{l−1}^{h,Δ} = y | current observation δ_Δ Ȳ(lΔ − Δ) = g(ξ̄_{l−1}^{h,Δ}, lΔ − Δ)Δ + δ_Δ z̄(lΔ − Δ); δ_Δ Ȳ(iΔ − Δ), n + 1 ≤ i < l}
    = P{current observation has value δ_Δ Ȳ(lΔ − Δ) | ξ̄_{l−1}^{h,Δ} = y} P{ξ̄_{l−1}^{h,Δ} = y | δ_Δ Ȳ(iΔ − Δ), n + 1 ≤ i < l} K₁
    = [exp −|δ_Δ Ȳ(lΔ − Δ) − g(y, lΔ − Δ)Δ|²/2Δ] q̄_y(n, l − 1, ω)K₁′
    = E_{l−1}(y)q̄_y(n, l − 1, ω)K₁″,

where K₁, K₁′, K₁″ are normalizing factors which do not depend on y. The last equation implies that, modulo normalizing factors,

  P{ξ̄_l^{h,Δ} = x | δ_Δ Ȳ(iΔ), n ≤ i < l} = Σ_y E_{l−1}(y)q̄_y(n, l − 1, ω)P{ξ̄_l^{h,Δ} = x | ξ̄_{l−1}^{h,Δ} = y},   x ∈ G_h ∪ ∂G_h,   (5.16)

which equals q̄_x(n, l, ω) by the definition (5.14). Thus, the normalized value (5.15) is the conditional probability, as asserted. If we used the first-order approximation {1 + g′(y, mΔ)δ_Δ Ȳ(mΔ)} in lieu of E_m, we would not have the Bayes-rule interpretation and, indeed, it is possible that some q̄_y(n, m, ω) would be negative, even though we would still get the correct conditional expectation as h, Δ → 0.
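In recursive form, (5.14)-(5.15) give a filter that is updated as each new observation increment arrives. A minimal sketch, again with scalar observations and illustrative names:

    import numpy as np

    # One step of (5.14) followed by the normalization (5.15).  q_bar is the
    # current row vector of weights on the ordered grid, P the one-step
    # matrix of the stopped chain, g the vector {g(y, (m-1)dt)}, dY the
    # observation increment over [(m-1)dt, m*dt).
    def filter_step(q_bar, P, g, dY, dt):
        E = np.exp(g*dY - 0.5*g**2*dt)     # the vector E_{m-1}
        q_bar = (q_bar * E) @ P            # (5.14): row vector times E o P~
        return q_bar / q_bar.sum()         # (5.15)

    # Since the recursion is linear in q_bar, renormalizing at every step
    # only rescales the weights and leaves the values (5.15) unchanged.

Initialized with the vector {p_t^h(y)}, repeated calls produce the weak-sense approximation to the conditional density of X̄(mΔ) given the observations, exactly as asserted for (5.15).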
7.6 Numerical Data: Estimation of an Invariant Measure

Tables 7.1-7.3 give data on a numerical approximation to the invariant measure for the simple scalar case

  dx = −(x/2) dt + √2 dw,

using both the elliptic and parabolic approximations. The unique invariant measure is normal with mean zero and variance 2. Since its support is the entire real line, the state space was truncated to either the interval [−3, 3] (Tables 7.1 and 7.2) or the interval [−1.5, 1.5] (Table 7.3). Let x = left endpoint of the truncation interval. In order to account for the truncation, we selected a number q ∈ (0, 1) and used p^h(x, x + h) = 1 − q, p^h(x, x) = q. Similarly, p^h(x, x − h) = 1 − p^h(x, x) = 1 − q when x = right endpoint. The tables give the cumulatives, i.e., the calculated distribution functions. Since these are antisymmetric about x = 0, only half the points are plotted.
TABLE 7.1
ESTIMATE OF AN INVARIANT MEASURE. SPACE TRUNCATED TO [−3, 3], h = ⅓

   x    Gaussian   Approximate   E,       P,              E,
                   Gaussian      q = ½    B = ½, q = ½    q = .75
   .2   .556       .558          .559     .559            .558
   .4   .610       .614          .616     .617            .614
   .6   .665       .669          .670     .672            .668
   .8   .713       .720          .721     .723            .718
  1.0   .761       .767          .768     .770            .764
  1.2   .805       .810          .810     .812            .805
  1.4   .841       .846          .847     .850            .842
  1.6   .873       .881          .880     .880            .874
  1.8   .900       .910          .907     .910            .901
  2.0   .921       .933          .931     .934            .924
  2.2   .940       .953          .950     .950            .943
  2.4   .962       .968          .966     .970            .959
  2.6   .967       .980          .978     .982            .971
  2.8   .976       .989          .988     .992            .981
  3.0   .983       .997          .996     .998            .993
Refer to Table 7.1. Column 1 gives the value of the state, and column 2 the cumulative Gaussian (variance 2). We calculated the Gaussian density at each point 0, ±h, ..., on [−3, 3] or [−1.5, 1.5], interpolated this by use of a piecewise linear interpolation, and then used the interpolation to calculate an approximation to the Gaussian distribution. This distribution is tabulated in column 3. The value at x = 3 is not unity; the error is the density at x = 3, which mass we assigned to (3, ∞). The other columns give the numerical data using either the elliptic (E) or parabolic (P) schemes. In all cases (and all tables), we took a piecewise linear interpolation between the values obtained at 0, ±h, ..., and plotted the cumulative from that. The errors were smaller with the use of the interpolation.
TABLE 7.2
ESTIMATE OF AN INVARIANT MEASURE. SPACE TRUNCATED TO [−3, 3], h = ½

   x    Gaussian   Approximate   Table 7.1,   E,       P,              P,
                   Gaussian      column 4     q = .3   B = ½, q = ½    B = ½, q = .3
   .6   .665       .665          .670         .669     .667            .673
  1.2   .805       .804          .810         .804     .799            .811
  1.8   .900       .903          .907         .889     .892            .907
  2.4   .962       .961          .966         .957     .949            .967
  3.0   .983       .991          .996         .989     .985            .994
TABLE 7.3
ESTIMATE OF AN INVARIANT MEASURE. SPACE TRUNCATED TO [−1.5, 1.5]

   x     Approximate   P,             P,             P,
         Gaussian      B = 1, q = ½   B = ⅓, q = ½   B = ⅓, q = .3
   .25   .593          .601           .602           .595
   .50   .684          .697           .700           .686
   .75   .769          .786           .791           .770
  1.00   .846          .866           .873           .846
  1.25   .915          .937           .945           .913
  1.50   .973          .985           .989           .972
The value of h is given in the table; to calculate Δ (for the parabolic case), we chose a number B ∈ [0, 1] and [see (2.6a)] selected Δ by the formula

  B = (Δ/h²) max_x [h|x/2| + σ²],

where |x| ≤ 3 or |x| ≤ 1.5 according to the case, and σ² = 2. It is not at all clear whether column 2 or 3 should be taken as the basis of comparison. The values with the parabolic method and B = 1 were about identical to the values with the elliptic method. The approximations calculated by both the elliptic and parabolic methods seem to be rather close to the true values (columns 2 or 3). The main problem is the choice of q. The value should reflect the average amount of real time that the original process spends outside the truncation region.
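A minimal sketch that reproduces the flavor of this experiment (parabolic scheme, truncation to [−3, 3], sticking parameter q at the endpoints); the particular h, q, B below are illustrative:

    import numpy as np
    from math import erf

    # dx = -(x/2) dt + sqrt(2) dw on [-3, 3]: interior rows use the explicit
    # probabilities (2.6a)-(2.6b) with f(x) = -x/2 and a = sigma^2/2 = 1;
    # endpoint rows stick with probability q.  dt comes from the formula
    # B = (dt/h^2) max[h|x/2| + sigma^2] used in this section.
    def invariant_cdf(h=0.2, q=0.5, B=0.5, sweeps=5000):
        xs = np.arange(-3.0, 3.0 + h/2, h)
        dt = B * h**2 / (1.5*h + 2.0)
        H = len(xs)
        P = np.zeros((H, H))
        for i in range(1, H - 1):
            f = -xs[i] / 2.0
            up = (dt/h**2) * (1.0 + h*max(f, 0.0))
            dn = (dt/h**2) * (1.0 + h*max(-f, 0.0))
            P[i, i+1], P[i, i-1], P[i, i] = up, dn, 1.0 - up - dn
        P[0, 0], P[0, 1] = q, 1.0 - q
        P[-1, -1], P[-1, -2] = q, 1.0 - q
        pi = np.full(H, 1.0 / H)
        for _ in range(sweeps):              # power iteration for pi = pi P
            pi = pi @ P
        gauss = np.array([0.5*(1.0 + erf(x/2.0)) for x in xs])  # N(0, 2) cdf
        return xs, np.cumsum(pi / pi.sum()), gauss

Comparing the second and third returned arrays at the tabulated points gives agreement of the kind shown in Tables 7.1-7.3.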