European Journal of Operational Research 73 (1994) 252-264 North-Holland
Some comments on a simple nonlinear filter with application to adaptive control *
N. Christopeit
Institute for Econometrics and Operations Research, University of Bonn, Adenauerallee 24-42, D-5300 Bonn, Germany.
Abstract: This paper studies the asymptotic behaviour of a 1-dimensional nonlinear stochastic differential equation which arises in the control of linear diffusions with unobservable drift. As an application, the performance of adaptive control laws over an infinite time horizon is investigated.
Keywords: Nonlinear filtering; Adaptive control
1. Introduction
In this paper we study the asymptotic behaviour of the solution to the simple nonlinear stochastic differential equation
$$dx_t = k x_t (1 - x_t)\, dw_t, \qquad t > 0, \tag{1.1}$$
with initial value $x_0 = x \in [0, 1]$, driven by a Brownian motion $(w_t)$. Equation (1.1) may be thought of as a filter equation arising in the context of certain control problems with an unobservable additive drift term. More precisely, consider the linear stochastic plant
$$dy_t = (\theta_0 + u_t)\, dt + dv_t, \qquad t > 0, \tag{1.2}$$
where $(v_t)$ is a Brownian motion, $(u_t)$ a control input and $\theta_0$ an unobservable random variable, independent of $(y_0, (v_t))$ and taking values $a$ and $b$, $a > b$, with probabilities $\pi$ and $1 - \pi$, resp., with all objects defined on some probability space $(\Omega, \mathcal{F}, P)$. Let $\pi_t = P(\theta_0 = a \mid \mathcal{F}_t^y)$ denote the posterior probability given the observations of the state $(y_t)$ up to time $t$. Then (cf. Liptser and Shiryayev, 1977)
$$d\pi_t = k \pi_t (1 - \pi_t)\, dw_t, \qquad \pi_0 = \pi, \tag{1.3}$$
with $k = a - b$ and some $(\mathcal{F}_t^y)$-Brownian motion $(w_t)$.
2. Properties of the solution
Obviously, for starting value $x = 0$ or $x = 1$, $x_t \equiv 0$ or $x_t \equiv 1$, resp., are the unique solutions to (1.1). So, henceforth, let us assume that $x \in (0, 1)$.
Correspondence to: N. Christopeit, Institute for Econometrics and Operations Research, University of Bonn, Adenauerallee 24-42, D-5300 Bonn, Germany. E-mail: [email protected]
* This research was supported by Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 303.
0377-2217/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0377-2217(93)E0097-H
N. Christopeit / Simple nonlinear filter with application to adaptive control
For every such $x$, there exists a pathwise unique strong solution $(x_t^x)$ defined up to some explosion time $\zeta_x$. It follows from standard results on one-dimensional diffusions (cf. e.g. Ikeda and Watanabe, 1981, Theor. VI.3.2) that the first exit time $\tau^x_{(0,1)} = \inf\{t > 0 : x_t^x \notin (0, 1)\}$ satisfies $P(\tau^x_{(0,1)} = \infty) = 1$. Hence the solution to (1.1) is nonexplosive and stays in $(0, 1)$ forever. Moreover (cf. Ikeda and Watanabe, 1981, Theor. VI.3.1), since $(x_t^x)$ is a diffusion on natural scale, $\lim_{t\to\infty} x_t^x$ exists a.s. and
$$P\big(\lim_{t\to\infty} x_t = 0\big) = 1 - P\big(\lim_{t\to\infty} x_t = 1\big) = 1 - x.$$
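The limit law above can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it discretizes (1.1) with an Euler-Maruyama scheme and counts how often a path ends near 1. The function names, step size, horizon, number of paths and seed are all illustrative assumptions.

```python
import math
import random


def terminal_value(x0, k, dt, t_max, rng):
    """Run one Euler-Maruyama path of dx = k x (1 - x) dw and return x at t_max."""
    x = x0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(t_max / dt))):
        x += k * x * (1.0 - x) * rng.gauss(0.0, sqrt_dt)
        # Clipping is a safeguard of the discretization only; the exact
        # solution never leaves (0, 1).
        x = min(1.0, max(0.0, x))
    return x


def estimate_hit_one(x0, k=2.0, dt=0.005, t_max=20.0, n_paths=400, seed=12345):
    """Fraction of simulated paths ending above 1/2 (estimates P(lim x_t = 1))."""
    rng = random.Random(seed)
    ends = [terminal_value(x0, k, dt, t_max, rng) for _ in range(n_paths)]
    return sum(1 for x in ends if x > 0.5) / n_paths
```

Under these assumptions, `estimate_hit_one(0.3)` should come out close to the starting value 0.3, up to Monte Carlo error of a few percent.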
Consider now the special case mentioned in the Introduction, where $x_t = \pi_t = P(\theta_0 = a \mid \mathcal{F}_t^y)$. We want to study the asymptotic behaviour of the a posteriori mean
$$\hat\theta_t = E(\theta_0 \mid \mathcal{F}_t^y) = k\pi_t + b, \qquad \hat\theta_0 = \hat\theta = k\pi + b.$$
Proposition 1. If $P(\theta_0 = a) = \pi$ and $\pi_0 = \pi$, then $\lim_{t\to\infty} \pi_t = 1_{[\theta_0 = a]}$ $P$-a.e. Equivalently, $\lim_{t\to\infty} \hat\theta_t = \theta_0$ $P$-a.e., i.e. $\hat\theta_t$ is a strongly consistent estimate of $\theta_0$.
Proof. We know that
$$\lim_{t\to\infty} \pi_t = \pi_\infty = \begin{cases} 1 & \text{with probability } \pi, \\ 0 & \text{with probability } 1 - \pi. \end{cases}$$
On the other hand, from the definition of $\pi_t$, $\pi_\infty = P(\theta_0 = a \mid \mathcal{F}_\infty^y)$ a.s. We show that $\pi_\infty = 1_{[\theta_0 = a]}$ a.s. Denote $[\theta_0 = a] = A_0$, $[\pi_\infty = 1] = A$. Then, for all $F \in \mathcal{F}_\infty^y$,
$$P(A_0 F) = E[1_F P(A_0 \mid \mathcal{F}_\infty^y)] = P(AF).$$
In particular, $P(A_0 A) = P(A) = \pi$. Since $\pi = P(A) = P(A A_0) + P(A \setminus A_0) = \pi + P(A \setminus A_0)$, we must have $P(A \setminus A_0) = 0$. Similarly, since also $P(A_0) = \pi$, $P(A_0 \setminus A) = 0$. This shows the assertion. □
Note that consistency holds only when the estimates $\pi_t$ are calculated with the correct $\pi_0 = \pi$, namely $\pi = P(\theta_0 = a)$. If the stochastic differential equation (1.3) is started with an initial value $\pi_0 = \hat\pi \ne \pi$, consistency cannot hold, since then $P(\pi_\infty = 1) = \hat\pi$. This is in contrast to the case of Gaussian $\theta_0$, where consistency obtains for any choice of initial value.
3. A Bayes-formula approach
The consistency result of Proposition 1 may also be obtained by application of Bayes' formula. To this end, for $\theta \in \{a, b\}$, let $y^\theta$ denote the solution of the equation
$$dy_t = (\theta + u_t)\, dt + dv_t$$
(with initial value $y_0 = y$), and let $\mu_\theta$ denote the corresponding solution measures induced on $(C[0, \infty), \mathcal{F}[0, \infty))$. Noting that the restrictions of $\mu_a$ and $\mu_b$ to $\mathcal{F}[0, t]$ are mutually absolutely continuous for every finite time interval, we may define the Radon-Nikodym density
$$\phi_t(y) = \frac{d\mu_a}{d\mu_b}\bigg|_{\mathcal{F}[0,t]}(y), \qquad y \in C[0, t].$$
Proposition 2. The a posteriori estimate $\pi_t = P(\theta_0 = a \mid \mathcal{F}_t^y)$ corresponding to initial value $\pi_0 = \pi = P(\theta_0 = a)$ is given by
$$\pi_t = \frac{\frac{\pi}{1-\pi}\,\phi_t(y)}{1 + \frac{\pi}{1-\pi}\,\phi_t(y)}, \tag{3.1}$$
where $y = (y_t)$ is the solution to (1.2).
Proof. We have to show that for $C \in \mathcal{F}[0, t]$,
$$P(\theta_0 = a,\ y_{[0,t]} \in C) = \int_{[y_{[0,t]} \in C]} \frac{\frac{\pi}{1-\pi}\,\phi_t(y)}{1 + \frac{\pi}{1-\pi}\,\phi_t(y)}\, dP,$$
where $y_{[0,t]}$ denotes the restriction of $y$ to $[0, t]$. Since
$$P(\theta_0 = a,\ y_{[0,t]} \in C) = P(y_{[0,t]} \in C \mid \theta_0 = a)\, P(\theta_0 = a) = \pi P(y^a_{[0,t]} \in C) = \pi \mu_a(C),$$
this is equivalent to
$$\pi \mu_a(C) = \int_C \frac{\frac{\pi}{1-\pi}\,\phi_t(\eta)}{1 + \frac{\pi}{1-\pi}\,\phi_t(\eta)}\, \mu(d\eta), \tag{3.2}$$
where $\mu$ is the distribution measure of the process $y$. Evidently, for $A \in \mathcal{F}[0, t]$, $\mu(A) = (1 - \pi)\mu_b(A) + \pi \mu_a(A)$. Hence (3.2) may be written
$$\int_C \frac{\pi \phi_t}{(1 - \pi) + \pi \phi_t}\, [(1 - \pi)\, d\mu_b + \pi\, d\mu_a] = \int_C \frac{\pi}{(1 - \pi) + \pi \phi_t}\, [(1 - \pi)\phi_t\, d\mu_b + \pi \phi_t\, d\mu_a] = \pi \mu_a(C).$$
The last equality follows from the definition of $\phi_t$. □
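By Girsanov's theorem (cf. Liptser and Shiryayev, 1977),
$$\phi_t(y) = \exp\Big\{k(y_t - y_0) - k\int_0^t u_s\, ds - \tfrac12 (a^2 - b^2) t\Big\} = \exp\{-\tfrac12 (a^2 - b^2) t\}\, \exp\{k(\theta_0 t + v_t)\}$$
$$= \exp\{kt[\theta_0 - \tfrac12 (a + b) + v_t/t]\} = \begin{cases} \exp\{kt(\tfrac12 k + v_t/t)\} & \text{if } \theta_0 = a, \\ \exp\{kt(-\tfrac12 k + v_t/t)\} & \text{if } \theta_0 = b. \end{cases}$$
Consequently,
$$\lim_{t\to\infty} \phi_t(y) = \begin{cases} \infty & \text{a.e. on } \theta_0 = a, \\ 0 & \text{a.e. on } \theta_0 = b. \end{cases}$$
(3.1) then shows that
$$\lim_{t\to\infty} \pi_t = \begin{cases} 1 & \text{a.e. on } \theta_0 = a, \\ 0 & \text{a.e. on } \theta_0 = b, \end{cases}$$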
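which is precisely the assertion of Proposition 1. Actually, with $\lambda_t = \tfrac12 k^2 t - (k + \varepsilon)\sqrt{2t \log_2 t}$, $\varepsilon > 0$ (where $\log_2 t = \log\log t$), we have on $\theta_0 = a$
$$e^{\lambda_t} \phi_t^{-1} = \exp\Big\{-\sqrt{2t \log_2 t}\, \Big(\frac{k v_t}{\sqrt{2t \log_2 t}} + (k + \varepsilon)\Big)\Big\} \to 0 \quad \text{a.e.}$$
and on $\theta_0 = b$
$$e^{\lambda_t} \phi_t = \exp\Big\{\sqrt{2t \log_2 t}\, \Big(\frac{k v_t}{\sqrt{2t \log_2 t}} - (k + \varepsilon)\Big)\Big\} \to 0 \quad \text{a.e.}$$
by the law of the iterated logarithm. Hence
Proposition 3. $\hat\theta_t - \theta_0 = O\big(e^{-k^2 t/2 + (k + \varepsilon)\sqrt{2t \log_2 t}}\big)$ a.e. for every $\varepsilon > 0$.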
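The Bayes formula (3.1) together with the Girsanov expression for $\phi_t$ translates directly into a small computation. The sketch below is illustrative only: the function names are hypothetical, and the posterior is evaluated in log-odds form, which is algebraically the same as (3.1) but avoids overflow when $\phi_t$ is very large or very small.

```python
import math


def log_likelihood_ratio(y_incr, u_int, t, a, b):
    """log phi_t = k (y_t - y_0 - int_0^t u_s ds) - (a^2 - b^2) t / 2, with k = a - b.

    y_incr is y_t - y_0 and u_int is the integral of the control over [0, t].
    """
    k = a - b
    return k * (y_incr - u_int) - 0.5 * (a * a - b * b) * t


def posterior(prior, log_phi):
    """Posterior probability of theta_0 = a from formula (3.1), in log-odds form."""
    z = math.log(prior / (1.0 - prior)) + log_phi
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For a noise-free observation with $\theta_0 = a$ and $u \equiv 0$ one has $y_t - y_0 = at$, so `log_likelihood_ratio(a * t, 0.0, t, a, b)` equals $k^2 t/2$ and the posterior tends to 1 as $t$ grows, in line with Proposition 1.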
4. Application to adaptive control
Let us now come back to the dynamics (1.2) and suppose first that $\theta_0$ is a (nonrandom) known parameter $\theta$:
$$dy_t = (\theta + u_t)\, dt + dv_t, \qquad 0 \le t \le T, \tag{4.1}$$
and consider the problem of minimizing
$$J_{T,\theta}[u] = E\Big\{\int_0^T (\lambda y_t^2 + u_t^2)\, dt + \mu y_T^2\Big\}, \qquad \lambda, \mu \ge 0, \tag{4.2}$$
subject to the dynamics (4.1) over some class of admissible controls $\mathcal{U}$. To be definite, let $\mathcal{U}$ denote the class of all measurable $(\mathcal{F}_t^y)$-adapted processes $(u_t)$ s.t. (4.1) possesses a unique weak solution and [...].
E.g., all feedback controls $u_t = u(t, y)$ (with a measurable function $u : [0, T] \times C[0, T] \to \mathbb{R}$ adapted to the natural filtration on $C[0, T]$) and satisfying a quadratic growth condition in $y$ will belong to $\mathcal{U}$ by virtue of the Girsanov measure transformation device. Briefly, for such $u(t, y)$, starting with some Brownian motion $(y_t)$ (under some measure $P$),
$$dP^u = \exp\Big\{\int_0^T (\theta + u(t, y))\, dy_t - \tfrac12 \int_0^T |\theta + u(t, y)|^2\, dt\Big\}\, dP$$
will define a probability measure under which $(y_t)$ solves (4.1) with some $P^u$-Brownian motion $(v_t)$. (4.1)-(4.2) is a special case of the general LQG problem (cf. Fleming and Rishel, 1975) and may be solved by either adapting the solution theory for such problems or by directly solving the Bellman equation with a quadratic Ansatz for the value function. The functional form of the optimal control law and the value function depends on the sign of the ratio
$$r = \frac{\sqrt\lambda - \mu}{\sqrt\lambda + \mu}.$$
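Since for infinite horizon problems terminal costs can be neglected, our main interest is in the case $r > 0$. For this case, the optimal control turns out to be the Markovian feedback control law
$$u_T^*(t, x; \theta) = -\sqrt\lambda \tanh[\rho + \sqrt\lambda (T - t)] \cdot x - \theta \Big(1 - \frac{\cosh\rho}{\cosh[\rho + \sqrt\lambda (T - t)]}\Big) \tag{4.3}$$
with value function
$$V_T(t, x; \theta) = \sqrt\lambda \tanh[\rho + \sqrt\lambda (T - t)] \cdot x^2 + 2\theta x \Big(1 - \frac{\cosh\rho}{\cosh[\rho + \sqrt\lambda (T - t)]}\Big)$$
$$\qquad + \theta^2 \Big((T - t) - \frac{1}{\sqrt\lambda}\, \frac{\tanh[\sqrt\lambda (T - t)]}{1 + \tanh\rho \cdot \tanh[\sqrt\lambda (T - t)]}\Big) + \log \frac{\cosh[\rho + \sqrt\lambda (T - t)]}{\cosh\rho}.$$
The constant $\rho$ is determined by $\rho = -\tfrac12 \log r$.
Typically, for infinite horizon, $\mu = 0$ and hence $\rho = 0$, so that (4.3) takes the simpler form
$$u_T^*(t, x; \theta) = -\sqrt\lambda \tanh[\sqrt\lambda (T - t)] \cdot x - \theta \Big(1 - \frac{1}{\cosh[\sqrt\lambda (T - t)]}\Big) \tag{4.3$'$}$$
and
$$V_T(t, x; \theta) = \sqrt\lambda \tanh[\sqrt\lambda (T - t)] \cdot x^2 + 2\theta x \Big(1 - \frac{1}{\cosh[\sqrt\lambda (T - t)]}\Big) + \theta^2 \Big((T - t) - \frac{1}{\sqrt\lambda} \tanh[\sqrt\lambda (T - t)]\Big) + \log\cosh[\sqrt\lambda (T - t)].$$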
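The feedback law (4.3) is easy to evaluate and to sanity-check numerically. The following sketch uses hypothetical function names; it computes the gain multiplying $x$, the weight multiplying $\theta$, and the control itself. The gain should satisfy the Riccati equation $\dot p = p^2 - \lambda$ with terminal value $p(T) = \mu$, which can be verified by finite differences.

```python
import math


def rho(lam, mu):
    """Constant rho = -(1/2) log r with r = (sqrt(lam) - mu) / (sqrt(lam) + mu)."""
    r = (math.sqrt(lam) - mu) / (math.sqrt(lam) + mu)
    if r <= 0.0:
        raise ValueError("this sketch only covers the case r > 0")
    return -0.5 * math.log(r)


def riccati_gain(t, T, lam, mu=0.0):
    """Coefficient of x in (4.3): sqrt(lam) * tanh(rho + sqrt(lam) (T - t))."""
    s = math.sqrt(lam)
    return s * math.tanh(rho(lam, mu) + s * (T - t))


def drift_weight(t, T, lam, mu=0.0):
    """Coefficient of theta in (4.3): 1 - cosh(rho) / cosh(rho + sqrt(lam) (T - t))."""
    s = math.sqrt(lam)
    r0 = rho(lam, mu)
    return 1.0 - math.cosh(r0) / math.cosh(r0 + s * (T - t))


def u_star(t, x, theta, T, lam, mu=0.0):
    """Finite-horizon feedback control (4.3) for known drift theta."""
    return -riccati_gain(t, T, lam, mu) * x - theta * drift_weight(t, T, lam, mu)
```

For a long remaining horizon, e.g. `u_star(0.0, x, theta, 50.0, lam)`, the value is numerically indistinguishable from $-(\sqrt\lambda x + \theta)$. Note that `math.cosh` overflows for arguments beyond roughly 710, so very long horizons would need a rescaled formula.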
Turning now to an infinite time horizon $T = \infty$, we would like to replace (4.2) by the average cost
$$J_\theta[u] = \limsup_{T\to\infty} E\Big\{\frac1T \int_0^T (\lambda y_t^2 + u_t^2)\, dt\Big\}. \tag{4.4}$$
Note that
$$u^*(x; \theta) = \lim_{T\to\infty} u_T^*(t, x; \theta) = -(\sqrt\lambda x + \theta), \tag{4.5}$$
$$V^*(\theta) = \lim_{T\to\infty} \frac1T V_T(t, x; \theta) = \sqrt\lambda + \theta^2. \tag{4.6}$$
We want to show that (4.5) is a stationary optimal control law for (4.1), (4.4) and that the optimal value is $J_\theta[u^*] = V^*(\theta)$. To this end, suppose that $V(x)$ is a sufficiently nice function s.t., for some constant $V^*$,
$$\tfrac12 V_{xx} + \lambda x^2 - V^* + \min_u [(\theta + u) V_x + u^2] = 0. \tag{4.7}$$
Then, by Itô's formula,
$$V(y_t) = V(y_0) + \int_0^t [\tfrac12 V_{xx}(y_s) + (\theta + u_s) V_x(y_s)]\, ds + M_t \ge V(y_0) + V^* t - \int_0^t (\lambda y_s^2 + u_s^2)\, ds + M_t \tag{4.8}$$
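for all admissible controls $u$, with equality holding if and only if
$$u_s = u_s^* = u^*(y_s) \quad \text{for a.a. } (s, \omega), \tag{4.9}$$
where $u^*(x)$ is a solution of $\min_u [(\theta + u) V_x + u^2]$. The martingale part in (4.8) is given by
$$M_t = \int_0^t V_x(y_s)\, dv_s.$$
(4.8) implies
$$\frac{E V(y_t)}{t} \ge \frac{V(y_0)}{t} + V^* - E\Big\{\frac1t \int_0^t (\lambda y_s^2 + u_s^2)\, ds\Big\},$$
hence, if all expectations exist and
$$E V(y_t)/t \to 0 \quad \text{as } t \to \infty, \tag{4.10}$$
passing to the limit yields
$$\liminf_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda y_s^2 + u_s^2)\, ds\Big\} \ge V^*$$
and
$$\lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda y_s^{*2} + u_s^{*2})\, ds\Big\} = V^*$$
for the optimal $(u^*)$ (as given by (4.9)). Trying the Ansatz $V(x) = ax^2 + bx$ for the solution of (4.7) yields
$$u^*(x) = u^*(x; \theta) = -(\sqrt\lambda x + \theta), \qquad V(x) = V(x; \theta) = \sqrt\lambda x^2 + 2\theta x, \qquad V^* = V^*(\theta) = \sqrt\lambda + \theta^2, \tag{4.11}$$
in accordance with (4.5) and (4.6). It remains to verify (4.10) for the solution process $(y_t)$ corresponding to the optimal control law $u^* = -(\theta + \sqrt\lambda y_t)$. But in this case, $(y_t)$ is just the solution of the linear stochastic differential equation
$$dy_t = -\sqrt\lambda y_t\, dt + dv_t, \tag{4.12}$$
i.e.
$$y_t = e^{-\sqrt\lambda t}\Big(y_0 + \int_0^t e^{\sqrt\lambda s}\, dv_s\Big), \qquad E(y_t) = e^{-\sqrt\lambda t} y_0,$$
$$y_t^2 = e^{-2\sqrt\lambda t}\Big[y_0^2 + 2 y_0 \int_0^t e^{\sqrt\lambda s}\, dv_s + \Big(\int_0^t e^{\sqrt\lambda s}\, dv_s\Big)^2\Big],$$
$$E(y_t^2) = e^{-2\sqrt\lambda t}\Big(y_0^2 + \int_0^t e^{2\sqrt\lambda s}\, ds\Big) = e^{-2\sqrt\lambda t} y_0^2 + \frac{1 - e^{-2\sqrt\lambda t}}{2\sqrt\lambda} = o(t), \tag{4.13}$$
whence (4.10) follows.
Proposition 4. $u^* = -(\sqrt\lambda y_t + \theta)$ is a stationary optimal control for the infinite horizon average cost among all admissible controls satisfying (4.10), and
$$\lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda y_s^{*2} + u_s^{*2})\, ds\Big\} = \sqrt\lambda + \theta^2 = \lim_{t\to\infty} \frac1t \int_0^t (\lambda y_s^{*2} + u_s^{*2})\, ds \quad \text{a.e.}$$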
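Proof (of the last equality). By the law of the iterated logarithm,
$$e^{-\sqrt\lambda t} \int_0^t e^{\sqrt\lambda s}\, dv_s = e^{-\sqrt\lambda t} \sqrt{\tfrac{1}{\sqrt\lambda}\, e^{2\sqrt\lambda t} \log_2\big(e^{2\sqrt\lambda t}\big)} \cdot \frac{\int_0^t e^{\sqrt\lambda s}\, dv_s}{\sqrt{\tfrac{1}{\sqrt\lambda}\, e^{2\sqrt\lambda t} \log_2\big(e^{2\sqrt\lambda t}\big)}} = \sqrt{\log t} \cdot O(1), \tag{4.14}$$
hence
$$\lim_{t\to\infty} (y_t/t) = \lim_{t\to\infty} (y_t^2/t) = 0 \quad \text{a.e.}$$
and, consequently, by (4.11) and (4.13),
$$\lim_{t\to\infty} (V(y_t)/t) = 0 \quad \text{a.e.} \tag{4.15}$$
Moreover,
$$M_t = 2\int_0^t (\sqrt\lambda y_s + \theta)\, dv_s = 2\theta v_t + 2\sqrt\lambda \int_0^t y_s\, dv_s.$$
Setting $Y_t^2 = \int_0^t y_s^2\, ds$,
$$\frac1t \int_0^t y_s\, dv_s = \frac{\sqrt{2 Y_t^2 \log_2(Y_t^2)}}{t} \cdot \frac{\int_0^t y_s\, dv_s}{\sqrt{2 Y_t^2 \log_2(Y_t^2)}} = o(1) \tag{4.16}$$
by the law of the iterated logarithm and since
$$Y_t^2 = O(t). \tag{4.17}$$
Hence
$$M_t/t \to 0 \quad \text{a.e.}, \tag{4.18}$$
and the assertion follows from (4.8) (with equality) and (4.15). □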
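The Ansatz computation behind (4.11) can be checked mechanically. The sketch below (function name hypothetical) evaluates the left-hand side of the Bellman equation (4.7) for $V(x) = \sqrt\lambda x^2 + 2\theta x$ and $V^* = \sqrt\lambda + \theta^2$, inserting the minimizer $u = -\tfrac12 V_x$ of the bracket $(\theta + u)V_x + u^2$; the result should vanish for every $x$, $\theta$ and $\lambda > 0$.

```python
import math


def hjb_residual(x, theta, lam):
    """Left-hand side of (4.7) for V(x) = sqrt(lam) x^2 + 2 theta x, V* = sqrt(lam) + theta^2."""
    s = math.sqrt(lam)
    v_x = 2.0 * s * x + 2.0 * theta   # first derivative of V
    v_xx = 2.0 * s                    # second derivative of V
    v_star = s + theta ** 2           # candidate average cost
    u = -0.5 * v_x                    # minimizer of (theta + u) V_x + u^2
    return 0.5 * v_xx + lam * x * x - v_star + (theta + u) * v_x + u * u
```

The minimizer used here is $u = -(\sqrt\lambda x + \theta)$, which is the stationary control law of Proposition 4.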
Actually, what is sufficient for (4.16) (and hence (4.18)) to hold is something like
$$Y_t^2 = O(t^{2 - \varepsilon}) \quad \text{for some } \varepsilon > 0 \tag{4.19}$$
(rather than (4.17)). (4.19) follows easily from (4.13) and (4.14), since
$$Y_t^2 = \int_0^t y_s^2\, ds \sim \int_0^t e^{-2\sqrt\lambda s} \Big[\int_0^s e^{\sqrt\lambda r}\, dv_r\Big]^2 ds = O(1) \int_0^t (\log s)\, ds = O(1)\, t(\log t - 1).$$
As a consequence, since $\lambda y^2 + u^{*2} = 2\lambda y^2 + 2\sqrt\lambda y \theta + \theta^2$, Proposition 4 yields for $\theta = 0$:
$$\lim_{t\to\infty} \frac1t \int_0^t y_s^2\, ds = \frac{1}{2\sqrt\lambda}, \tag{4.20}$$
and for $\theta = 1$:
$$\lim_{t\to\infty} \frac1t \int_0^t y_s\, ds = 0. \tag{4.21}$$
Thus the considerations above provide a simple proof of the ergodic properties (4.20) and (4.21) (which are of course well known).
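The ergodic limits (4.20) and (4.21) can also be observed in simulation. The sketch below is an illustration under stated assumptions: it samples the solution of (4.12) on a time grid using the exact Gaussian transition of the Ornstein-Uhlenbeck process, and approximates the time integrals by Riemann sums. The function name, grid, horizon and seed are arbitrary choices.

```python
import math
import random


def ou_time_averages(lam, n_steps, dt, y0=0.0, seed=0):
    """Time averages of y and y^2 along one path of dy = -sqrt(lam) y dt + dv."""
    rng = random.Random(seed)
    s = math.sqrt(lam)
    decay = math.exp(-s * dt)
    # Standard deviation of the exact one-step transition of the OU process.
    noise_sd = math.sqrt((1.0 - decay * decay) / (2.0 * s))
    y, sum_y, sum_y2 = y0, 0.0, 0.0
    for _ in range(n_steps):
        sum_y += y
        sum_y2 += y * y
        y = decay * y + noise_sd * rng.gauss(0.0, 1.0)
    return sum_y / n_steps, sum_y2 / n_steps
```

With $\lambda = 1$ and a horizon of several thousand time units, the two averages should be close to $0$ and $1/(2\sqrt\lambda) = 0.5$, respectively, up to sampling error.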
Coming back to the situation where $\theta_0$ is a Bernoulli random variable as in the Introduction, we may distinguish the following cases.
Case 1. $\theta_0$ observable.
a) Finite time horizon. The optimal control law is in feedback form given by (4.3)$'$, or
$$u_t^* = -\sqrt\lambda \tanh[\sqrt\lambda (T - t)] \cdot y_t - \theta_0 \Big(1 - \frac{1}{\cosh[\sqrt\lambda (T - t)]}\Big)$$
with $(y_t)$ a solution of (1.2) (for $u = u^*$). The value function is
$$V_T(x) = E V_T(0, x; \theta_0) = \sqrt\lambda \tanh(\sqrt\lambda T) \cdot x^2 + 2E(\theta_0)\, x \Big(1 - \frac{1}{\cosh(\sqrt\lambda T)}\Big) + E(\theta_0^2) \Big(T - \frac{1}{\sqrt\lambda} \tanh(\sqrt\lambda T)\Big) + \log\cosh(\sqrt\lambda T).$$
b) Infinite time horizon. An optimal stationary control law is in feedback form given by (4.5), or
$$u_t^* = -(\sqrt\lambda y_t + \theta_0), \qquad dy_t = -\sqrt\lambda y_t\, dt + dv_t, \tag{4.22}$$
with value
$$V^* = E V^*(\theta_0) = \lim_{T\to\infty} \frac1T V_T(x) = \sqrt\lambda + E(\theta_0^2).$$
Moreover,
$$\lim_{t\to\infty} \frac1t \int_0^t (\lambda y_s^2 + u_s^{*2})\, ds = \sqrt\lambda + \theta_0^2 \quad \text{a.e.}$$
Note that in both case 1a) and case 1b) the optimal control $(u^*)$ is adapted to $(\mathcal{F}_t^{y, \theta_0})$.
Case 2. $\theta_0$ not observable.
a) Finite time horizon. In this case, every admissible control should be adapted to $(\mathcal{F}_t^y)$, reflecting the fact that control action can only be based on what is actually observed. As pointed out in the Introduction, we may calculate the posterior probabilities via (1.3); equivalently, since $\hat\theta_t = k\pi_t + b$,
$$d\hat\theta_t = (a - \hat\theta_t)(\hat\theta_t - b)\, dw_t, \tag{4.23}$$
where the innovation process $(w_t)$ is given by $w_t = y_t - \int_0^t (\hat\theta_s + u_s)\, ds$ or
$$dy_t = (\hat\theta_t + u_t)\, dt + dw_t \tag{4.24}$$
for some $(\mathcal{F}_t^y)$-adapted Brownian motion $w$ (which may depend on $u$, thus solutions to (4.24) have to be understood in the weak sense). (4.23)-(4.24) together with criterion (4.2) is the separated control problem with 2-dimensional state $(y_t, \hat\theta_t)$. For the solution, one may try the Ansatz
$$\hat V(t, y, \hat\theta) = \alpha(t) y^2 + \beta(t) y \hat\theta + \gamma(t, \hat\theta).$$
Solving the Bellman equation (for criterion (4.2) with $\mu = 0$)
$$\hat V_t + \tfrac12 \{\hat V_{yy} + 2(a - \hat\theta)(\hat\theta - b) \hat V_{y\hat\theta} + (a - \hat\theta)^2 (\hat\theta - b)^2 \hat V_{\hat\theta\hat\theta}\} + \lambda y^2 + \min_u \{(\hat\theta + u) \hat V_y + u^2\} = 0, \qquad \hat V(T, y, \hat\theta) = 0, \tag{4.25}$$
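with this Ansatz yields
$$\hat u^*(t, x, \hat\theta) = -(\alpha(t) x + \tfrac12 \beta(t) \hat\theta),$$
$$\dot\alpha = \alpha^2 - \lambda, \qquad \alpha(T) = 0,$$
$$\dot\beta = \alpha\beta - 2\alpha, \qquad \beta(T) = 0,$$
$$\dot\gamma + \tfrac12 (a - \hat\theta)^2 (\hat\theta - b)^2 \gamma_{\hat\theta\hat\theta} + \alpha - \beta[ab - (a + b)\hat\theta + \tfrac14 \beta \hat\theta^2] = 0, \qquad \gamma(T, \hat\theta) = 0. \tag{4.26}$$
Solving (4.26) gives
$$\alpha(t) = \sqrt\lambda \tanh[\sqrt\lambda (T - t)], \qquad \beta(t) = 2\Big(1 - \frac{1}{\cosh[\sqrt\lambda (T - t)]}\Big), \tag{4.27a}$$
$$\gamma(t, \hat\theta) = E_{t, \hat\theta} \int_t^T [\alpha(s) - \beta(s)(ab - (a + b)\hat\theta_s + \tfrac14 \beta(s) \hat\theta_s^2)]\, ds, \tag{4.27b}$$
with $(\hat\theta_s)$ solution of (4.23) on $s \ge t$ with initial value $\hat\theta_t = \hat\theta$. Note that (4.27b) is nothing but the stochastic representation of the solution to the PDE (4.26) for $\gamma$ (cf. Durrett, 1984; see also Helmes and Rishel, 1991). Hence the optimal control is in this case given by
$$\hat u_t^* = -\sqrt\lambda \tanh[\sqrt\lambda (T - t)] \cdot y_t - \hat\theta_t \Big(1 - \frac{1}{\cosh[\sqrt\lambda (T - t)]}\Big), \tag{4.28}$$
with the two-dimensional state evolving according to the stochastic differential equations (4.23)-(4.24) (with $u_t = \hat u_t^*$) and value function
$$\hat V_T(t, x, \hat\theta) = \sqrt\lambda \tanh[\sqrt\lambda (T - t)] \cdot x^2 + 2\hat\theta x \Big(1 - \frac{1}{\cosh[\sqrt\lambda (T - t)]}\Big) + \gamma_T(t, \hat\theta).$$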
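For readers who want to see where the system (4.26) comes from, the coefficient matching in the Bellman equation (4.25) can be written out as follows. This is a sketch of the routine computation; the shorthand $\sigma = (a - \hat\theta)(\hat\theta - b)$ is introduced here for convenience and is not notation from the text.

```latex
% Ansatz: \hat V = \alpha(t) y^2 + \beta(t) y \hat\theta + \gamma(t,\hat\theta),
% shorthand: \sigma = (a - \hat\theta)(\hat\theta - b).
\begin{align*}
\hat V_y &= 2\alpha y + \beta\hat\theta, \qquad
\hat V_{yy} = 2\alpha, \qquad
\hat V_{y\hat\theta} = \beta, \qquad
\hat V_{\hat\theta\hat\theta} = \gamma_{\hat\theta\hat\theta}, \\
\min_u \{(\hat\theta + u)\hat V_y + u^2\}
  &= \hat\theta \hat V_y - \tfrac14 \hat V_y^2
  \quad \text{at } u = -\tfrac12 \hat V_y = -(\alpha y + \tfrac12 \beta \hat\theta), \\
\text{terms in } y^2: &\quad \dot\alpha + \lambda - \alpha^2 = 0, \\
\text{terms in } y\hat\theta: &\quad \dot\beta + 2\alpha - \alpha\beta = 0, \\
\text{remaining terms}: &\quad \dot\gamma + \tfrac12 \sigma^2 \gamma_{\hat\theta\hat\theta}
  + \alpha + \beta\sigma + \beta\hat\theta^2 - \tfrac14 \beta^2 \hat\theta^2 = 0.
\end{align*}
% Since \sigma + \hat\theta^2 = (a+b)\hat\theta - ab, the last line is
% \dot\gamma + \tfrac12 \sigma^2 \gamma_{\hat\theta\hat\theta} + \alpha
%   - \beta [ab - (a+b)\hat\theta + \tfrac14 \beta \hat\theta^2] = 0.
```

The minimizing $u$ in the second line is the feedback law stated together with (4.26), and the terminal conditions follow from $\hat V(T, y, \hat\theta) = 0$.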
Remark 1. It is important to note that we have treated the separated control problem as a 2-dimensional problem in its own right, with $(\hat\theta_t)$ evolving according to (4.23) from some initial value $\hat\theta_0 = \hat\theta$; equivalently, $\hat\pi_t = (\hat\theta_t - b)/k$ is the solution of the stochastic differential equation (1.3) with initial value $\hat\pi_0 = \hat\pi$ related to $\hat\theta$ by $\hat\theta = k\hat\pi + b$. The separated problem will be equivalent to the original incomplete information problem 2a) if and only if $\hat\pi_t = P(\theta_0 = a \mid \mathcal{F}_t^y)$, equivalently: $\hat\theta_t = E(\theta_0 \mid \mathcal{F}_t^y)$, which, in turn, will only be true for the 'correct' initial value $\hat\pi = \pi$ ($\Leftrightarrow \hat\theta = k\pi + b = E(\theta_0)$).
Remark 2. An equivalent and often more natural way of thinking about the problem of minimizing (4.2) subject to (1.2) with an unobservable random variable $\theta_0$ is provided by the Bayesian interpretation. There it is assumed that the actual dynamics are described by (4.1) with some unknown (nonrandom) parameter $\theta$, for which, however, some prior distribution is given. Expectation in (4.2) is then with respect to both the 'objective' probability distribution governing (4.1) (conditional on the parameter value) and the 'subjective' prior. This is equivalent to the approach adopted here, i.e. replacing $\theta$ by an unobservable random variable $\theta_0$, provided $\theta_0$ is distributed according to the prior of $\theta$.
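b) Infinite horizon. Note first that, for $\hat u^*$ from (4.26),
$$\hat u^*(x, \hat\theta) = \lim_{T\to\infty} \hat u^*(t, x, \hat\theta) = -(\sqrt\lambda x + \hat\theta). \tag{4.29}$$
Moreover, since $\hat\theta_\infty = \lim_{s\to\infty} \hat\theta_s$ exists a.e. (by the results of Section 2),
$$\lim_{T\to\infty} \frac1T \int_0^T [\alpha(s) - \beta(s)(ab - (a + b)\hat\theta_s + \tfrac14 \beta(s) \hat\theta_s^2)]\, ds = \sqrt\lambda - 2[ab - (a + b)\hat\theta_\infty + \tfrac12 \hat\theta_\infty^2].$$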
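Since, for $\hat\theta_t$ starting at $\hat\theta$ ($\Leftrightarrow$ $\hat\pi_t$ starting at $\hat\pi$) (cf. Remark 1),
$$P_{\hat\theta}(\hat\pi_\infty = 0) = P_{\hat\theta}(\hat\theta_\infty = b) = 1 - P_{\hat\theta}(\hat\pi_\infty = 1) = 1 - P_{\hat\theta}(\hat\theta_\infty = a) = 1 - \hat\pi$$
by the results of Section 2, bounded convergence allows us to conclude that
$$\hat V^*(\hat\theta) = \lim_{T\to\infty} \frac1T \hat V_T(0, x, \hat\theta) = \lim_{T\to\infty} \frac1T \gamma_T(0, \hat\theta) = \sqrt\lambda - 2[ab - (a + b) E_{\hat\theta}(\hat\theta_\infty) + \tfrac12 E_{\hat\theta}(\hat\theta_\infty^2)]$$
with $E_{\hat\theta}(\hat\theta_\infty^2) = k(a + b)\hat\pi + b^2$. In the light of Remark 1, the situation of interest for us is when $\hat\pi = \pi$, in which case
$$\hat V^* = \hat V^*(\hat\theta)\big|_{\hat\theta = k\pi + b} = \sqrt\lambda + E(\theta_0^2).$$
We want to show that $\hat u^*(x, \hat\theta_t)$ is indeed an optimal stationary feedback control law and that the value of the infinite horizon problem with unobservable $\theta_0$ is $\hat V^* = \sqrt\lambda + E(\theta_0^2)$. Since this value must be greater than or equal to $V^*$ from 1b), it suffices to show that for
$$\hat u_t^* = -(\sqrt\lambda \hat y_t + \hat\theta_t), \qquad d\hat y_t = (\hat\theta_t + \hat u_t^*)\, dt + dw_t = -\sqrt\lambda \hat y_t\, dt + dw_t, \tag{4.30}$$
we have that
$$\lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds\Big\} = \sqrt\lambda + E(\theta_0^2). \tag{4.31}$$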
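The simplification of $\hat V^*(\hat\theta)$ used above is elementary algebra. A short worked version, using only $E_{\hat\theta}(\hat\theta_\infty) = k\hat\pi + b$ and $E_{\hat\theta}(\hat\theta_\infty^2) = k(a + b)\hat\pi + b^2$ (both follow from the two-point limit law of Section 2), reads:

```latex
\begin{align*}
\hat V^*(\hat\theta)
 &= \sqrt\lambda - 2\big[ab - (a+b)(k\hat\pi + b)
      + \tfrac12\big(k(a+b)\hat\pi + b^2\big)\big] \\
 &= \sqrt\lambda - 2ab + 2(a+b)k\hat\pi + 2ab + 2b^2 - k(a+b)\hat\pi - b^2 \\
 &= \sqrt\lambda + k(a+b)\hat\pi + b^2
  = \sqrt\lambda + E_{\hat\theta}(\hat\theta_\infty^2).
\end{align*}
```

For $\hat\pi = \pi$ the last expression is $\sqrt\lambda + E(\theta_0^2)$, since $\theta_0$ takes the value $a$ with probability $\pi$ and $b$ with probability $1 - \pi$.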
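This is basically clear from part 1b) since $(\hat y_t)$ satisfies the same dynamics as $(y_t)$. More precisely,
$$\frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds = \frac{2\lambda}{t} \int_0^t \hat y_s^2\, ds + \frac{2\sqrt\lambda}{t} \int_0^t \hat y_s \hat\theta_s\, ds + \frac1t \int_0^t \hat\theta_s^2\, ds.$$
Taking expectations, since $E\hat\theta_s^2 \to E(\theta_0^2)$ and $(\hat y_s) \overset{\mathcal{L}}{=} (y_s)$,
$$E\Big\{\frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds\Big\} = 2\lambda E\Big\{\frac1t \int_0^t y_s^2\, ds\Big\} + 2\sqrt\lambda E\Big\{\frac1t \int_0^t \hat y_s \hat\theta_s\, ds\Big\} + E(\theta_0^2) + o(1).$$
But, in the limit, the middle term on the rhs becomes zero since, by virtue of (4.21) and $\hat\theta_t \to \theta_0$,
$$\lim_{t\to\infty} \frac1t \int_0^t \hat y_s \hat\theta_s\, ds = 0 \quad \text{a.e.}$$
and uniform integrability holds: [...].
Since the same argument applies to $(1/t)\int_0^t y_s \theta_0\, ds$, we find that
$$\lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds\Big\} = \lim_{t\to\infty} E\Big\{\frac{2\lambda}{t} \int_0^t y_s^2\, ds\Big\} + E(\theta_0^2) = \lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda y_s^2 + u_s^{*2})\, ds\Big\} = \sqrt\lambda + E(\theta_0^2)$$
(with $(u^*)$ as in part 1b)).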
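Moreover, it is immediate from (4.20)-(4.21) and $\hat\theta_t \to \theta_0$ that
$$\lim_{t\to\infty} \frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds = \sqrt\lambda + \theta_0^2 \quad \text{a.e.}$$
One might also wish to directly compare the trajectories $(y_t)$ and $(\hat y_t)$. To this end, note that the dynamics for $(\hat y_t)$ may also be written (using (1.2))
$$d\hat y_t = -[(\hat\theta_t - \theta_0) + \sqrt\lambda \hat y_t]\, dt + dv_t$$
with the original Brownian motion $(v_t)$. But then
$$\hat y_t = e^{-\sqrt\lambda t}\Big[y_0 + \int_0^t e^{\sqrt\lambda s}\, dv_s - \int_0^t e^{\sqrt\lambda s} (\hat\theta_s - \theta_0)\, ds\Big] = y_t - e^{-\sqrt\lambda t} \int_0^t e^{\sqrt\lambda s} (\hat\theta_s - \theta_0)\, ds.$$
By Proposition 3,
$$\Big|\int_0^t e^{\sqrt\lambda s} (\hat\theta_s - \theta_0)\, ds\Big| \le \Big|\int_0^{t_0} e^{\sqrt\lambda s} (\hat\theta_s - \theta_0)\, ds\Big| + O(1) \int_{t_0}^t e^{(\sqrt\lambda - k^2/2)s} \exp\big((k + \varepsilon)\sqrt{2s \log_2 s}\big)\, ds$$
$$\le O(1) + O(1) \int_{t_0}^t e^{(\sqrt\lambda - (k^2/2) + \delta)s}\, ds = O(1)\big[1 + e^{(\sqrt\lambda - k^2/2 + \delta)t}\big]$$
for any $\delta > 0$ s.t. $\sqrt\lambda - \tfrac12 k^2 + \delta \ne 0$. Hence, by choosing $\delta$ arbitrarily small,
$$\hat y_t - y_t = O(e^{-\rho t}) \quad \text{a.e.}$$
for every $\rho > 0$ satisfying
$$\rho \le \sqrt\lambda \quad \text{and} \quad \rho < \tfrac12 k^2, \tag{4.32}$$
as well as
$$\int_0^t |\hat y_s - y_s|^2\, ds = O(1) \quad \text{a.e.} \tag{4.33}$$
For the actual average costs $C_t = (1/t)\int_0^t (\lambda y_s^2 + u_s^{*2})\, ds$ and $\hat C_t = (1/t)\int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds$,
$$|C_t - \hat C_t| = \Big|\frac{2\lambda}{t} \int_0^t (y_s^2 - \hat y_s^2)\, ds + \frac{2\sqrt\lambda}{t} \int_0^t (y_s \theta_0 - \hat y_s \hat\theta_s)\, ds + \frac1t \int_0^t (\theta_0^2 - \hat\theta_s^2)\, ds\Big|$$
$$\le \frac{2\lambda}{t} \int_0^t |y_s - \hat y_s|\, |y_s + \hat y_s|\, ds + \frac{2\sqrt\lambda}{t} \int_0^t |\hat y_s|\, |\hat\theta_s - \theta_0|\, ds + \frac{2\sqrt\lambda}{t} \int_0^t |\hat y_s - y_s|\, |\theta_0|\, ds + \frac1t \int_0^t |\hat\theta_s - \theta_0|\, |\hat\theta_s + \theta_0|\, ds$$
$$\le M \Big(\Big(\frac1t \int_0^t |y_s - \hat y_s|^2\, ds\Big)^{1/2} + \Big(\frac1t \int_0^t |\hat\theta_s - \theta_0|^2\, ds\Big)^{1/2}\Big) = O\Big(\frac{1}{\sqrt t}\Big)$$
by some straightforward Hölder estimates together with (4.20), (4.33) and Proposition 3. Collecting the results we find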
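Proposition 5. (i) The finite horizon problems with observable and unobservable $\theta_0$, resp., are asymptotically equivalent in the sense that the optimal average values $V_T(x)/T$ and $\hat V_T(x, \hat\theta)/T$ tend to a common value $V^* = \sqrt\lambda + E(\theta_0^2)$.
(ii) $V^* = \sqrt\lambda + E(\theta_0^2)$ is also the common value of the corresponding infinite horizon problems. Optimal stationary controls are
$$u_t^* = -(\sqrt\lambda y_t + \theta_0), \qquad dy_t = -\sqrt\lambda y_t\, dt + dv_t,$$
$$\hat u_t^* = -(\sqrt\lambda \hat y_t + \hat\theta_t), \qquad d\hat y_t = -\sqrt\lambda \hat y_t\, dt + dw_t,$$
and $d\hat\theta_t = (a - \hat\theta_t)(\hat\theta_t - b)\, dw_t$ (with initial value $\hat\theta_0 = E(\theta_0) = k\pi + b$), and we have
$$\lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda y_s^2 + u_s^{*2})\, ds\Big\} = \lim_{t\to\infty} E\Big\{\frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds\Big\} = \sqrt\lambda + E(\theta_0^2),$$
$$\lim_{t\to\infty} \frac1t \int_0^t (\lambda y_s^2 + u_s^{*2})\, ds = \lim_{t\to\infty} \frac1t \int_0^t (\lambda \hat y_s^2 + \hat u_s^{*2})\, ds = \sqrt\lambda + \theta_0^2 \quad \text{a.e.}$$
(iii) The optimal trajectories $(y_t)$ and $(\hat y_t)$, resp., satisfy
$$|y_t - \hat y_t| = O(e^{-\rho t}), \qquad \int_0^t |y_s - \hat y_s|^2\, ds = O(1) \quad \text{a.e.}$$
for every $\rho$ satisfying (4.32), while for the average costs
$$|C_t - \hat C_t| = O(1/\sqrt t) \quad \text{a.e.}$$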
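Part (iii) can be illustrated by driving the full-information and the adaptive closed loops with the same noise path. The sketch below is not from the original text: it uses an Euler scheme, hypothetical function names and arbitrary parameter values. The true value `theta0` is used only to generate the data; the controller sees it only through the observable quantity $y_t - y_0 - \int_0^t u_s\, ds = \theta_0 t + v_t$, from which the posterior is computed via (3.1).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_pair(a, b, prior, lam, theta0, t_max, dt, seed=0, y0=0.0):
    """Return (y_T, y_hat_T, theta_hat) for the two closed loops on one noise path."""
    rng = random.Random(seed)
    k = a - b
    s = math.sqrt(lam)
    logit0 = math.log(prior / (1.0 - prior))
    y = y_hat = y0
    v = 0.0            # driving Brownian motion v_t
    theta_hat = k * prior + b
    for i in range(int(round(t_max / dt))):
        t = i * dt
        # Posterior mean from (3.1): log phi_t = k (theta0 t + v_t) - (a^2 - b^2) t / 2.
        log_phi = k * (theta0 * t + v) - 0.5 * (a * a - b * b) * t
        theta_hat = k * sigmoid(logit0 + log_phi) + b
        dv = rng.gauss(0.0, math.sqrt(dt))
        # Full information: u* = -(sqrt(lam) y + theta0), so dy = -sqrt(lam) y dt + dv.
        y += -s * y * dt + dv
        # Adaptive: u = -(sqrt(lam) y_hat + theta_hat) inserted into (1.2).
        y_hat += (theta0 - theta_hat - s * y_hat) * dt + dv
        v += dv
    return y, y_hat, theta_hat
```

With, say, $a = 1$, $b = -1$, $\pi = 1/2$, $\lambda = 1$ and $\theta_0 = a$, the estimate locks onto $\theta_0$ and the gap between the two trajectories becomes negligible after a few dozen time units, consistent with the exponential rate in (4.32).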
5. Concluding remarks and a view towards applications
1. The optimal control for the infinite horizon problem with unobservable $\theta_0$ is obtained by substituting for $\theta_0$ an estimate $\hat\theta_t$. In this sense, $(\hat u^*)$ may be called an adaptive control law. Consistency $\hat\theta_t \to \theta_0$ corresponds then to the self-tuning property, and (ii) expresses the fact that the adaptive policy is self-optimizing.
2. One could think of various extensions of the simple model (1.2) in quite a variety of directions and on different levels of complexity. First, one might consider more general dynamics, e.g. include autoregressive terms or treat the case of a multiplicative random coefficient in front of the control itself. Or, one might take care of control costs not by including them in the criterion (as in (4.2)), but by imposing 'hard' restrictions (cf. Christopeit, 1986; Benes, Karatzas and Rishel, 1988). One might also admit for $\theta_0$ a time dependent stochastic process, e.g. a random telegraph signal (cf. Liptser and Shiryayev, 1977) or a finite state Markov process (cf. Helmes and Rishel, 1991).
3. Actually, assuming a two-point distribution for $\theta_0$ is a sort of paradigm for more general distributions. By appropriate conditioning these cases may be reduced to the Bernoulli case, so that asymptotic optimality of $(\hat u^*)$ carries over.
4. The model considered in Section 4 may be regarded as the simplest case of models with dynamics $dy_t = [r y_t + u_t]\, dt + dz_t$ and noise process $dz_t = \theta_t\, dt + dv_t$, where $(\theta_t)$ is a Markov process independent of the white noise $(v_t)$. The objective of such a decoupling of noise components is to improve prediction of noise and thereby achieve better regulation of the system. In particular, the case where $(\theta_t)$ is a finite state Markov process has interesting applications in meteorology and heating technology (cf. Lestienne, 1979; Scartezzini et al., 1987).
The model (1.2) may be used as a first rough approximation to more realistic models and gives an idea of the qualitative behaviour to be expected in 'real life' situations. As an example, consider the problem of tracking a target moving at a velocity $\dot z_t = \theta_0 + {}$white noise, where $\theta_0$ is an unknown 'trend' component. If $\xi_t$ is the position of the pursuer, $u_t$ its velocity, then $\dot\xi = u$ and a criterion to minimize might be the mean squared distance plus the energy spent:
$$E\Big\{\frac1T \int_0^T [\lambda(\xi_t - z_t)^2 + u_t^2]\, dt\Big\} = \min!$$
The new state variable $y_t = \xi_t - z_t$ obeys the dynamics (1.2).
References
Benes, V.E., Karatzas, I., and Rishel, R.W. (1988), "Degenerate diffusion arising in a control problem with partial observations", Preprint.
Christopeit, N. (1986), "The predicted miss problem with unknown drift parameter", SFB-Preprint, University of Bonn.
Durrett, R. (1984), Brownian Motion and Martingales in Analysis, Wadsworth, Belmont, CA.
Fleming, W.H., and Rishel, R.W. (1975), Deterministic and Stochastic Optimal Control, Springer-Verlag, New York.
Helmes, K., and Rishel, R.W. (1991), "An optimal control depending on the conditional density of the unobserved state", Preprint, University of Kentucky, Lexington, KY.
Ikeda, N., and Watanabe, S. (1981), Stochastic Differential Equations and Diffusion Processes, North-Holland, Amsterdam.
Lestienne, R. (1979), "Modèle Markovien simplifié de météorologie à deux états: L'example d'Odeillo", in: Analyse Statistique des Processus Météorologiques en Vue d'Application à l'Energie Solaire, CNRS-Pirdes, Paris, France.
Liptser, R.S., and Shiryayev, A.N. (1977), Statistics of Random Processes, Vols. I & II, Springer-Verlag, New York.
Scartezzini, J.L., Rey, D., and Liebling, T. (1987), "Predictive control for back-up auxiliary heaters in passive solar devices", in: Proceedings Conference ICBEM 1987, Vol. IV, Lausanne, Switzerland.