Chapter 11
Continuous Stochastic Processes
In this chapter, we shall consider briefly the case of a continuous stochastic process. In order that the presentation not be too weighty, we shall limit ourselves to processes which have a completely measurable state. Even in the case of discrete processes, if the state is available only through noisy measurements, the theory becomes somewhat complicated. In the continuous case, if the state measurements are noisy, the theoretical results become quite elaborate involving functional equations with partial derivatives, and their practical utilization seems remote (except in the linear case). We shall first study processes with continuous states, for which we shall recover, in the linear-quadratic case, results analogous to those found in the discrete-time case (a linear control law, and a separation theorem). In the final section discrete-state systems will be considered, for which simple results exist. 11.1 Continuous Stochastic Processes with Continuous States
11.1.I
Definition of the Problem
Let us consider a process with state vector x(t), to which are applied controls uft) chosen from some domain 9. The future evolution of the state is random, and at time t + At AX = X(t
+ At) - ~ ( t )
is a random vector. 151
(1 1.1)
152
11 CONTINUOUS STOCHASTIC PROCESSES
Suppcse that the process is Markov (see Boudarel et al. [ I , Vol. 1, Chap. 14]), and that, as At tends to zero, the first moments of Ax are of the form
(11.2) (11.3)
The higher moments are assumed to be of order greater than the first in At. In particular, these conditions hold when the process can be represented by a differential equation
+
(11.4)
1 = f ( x , u) G(x, u ) z , where z is a “white” noise vector with correlation matrix E[z(t)z’(t
+T)]
= r6(T)
,
(11.5)
with 6(r) the Dirac “function.” In this case, @(x, U) = G(x, u)T G’(x, U)
II.I.2
.
(11.6)
Solution of the Problem
The control law is to be found which optimizes the mathematical expectation of the total future return, up to the final time T. Let R(x, t ) be the mathematical expectation of the future return, using an optimal policy, starting from the initial state x at time t . For small At, the method of dynamic programming relates R(x + Ax, t + A t ) to &x, t ) through a functional equation obtained from the principle of optimality : R(x, t ) = opt [r(x,u) At
+ E{&x + Ax, t + A t ) } ] .
(11.7)
U S 9
The expectation in (1 1.7) is conditional on knowledge of x, and
+
x + Ax then corresponds to the state at t At resulting from application of the control u over that interval. Expanding in a series,
R(x
+ Ax, t + At) = R ( x , t ) + I?, At + RXTAx + f AxTRXxAX +
* - - .
(11.8)
1 1.2
LINEAR SYSTEMS WITH QUADRATIC CRITERIA
153
Because of (1 1.2) and (1 1.3), E[RxT AX] = R,TE[Ax] N R , T f ( x , u) At,
(11.9)
E[AxTR,, Ax] = E[trace(Ri,, Ax AxT)] = trace(@,, N
E[Ax Ax']}
trace(R,, #} At.
(11.10)
Substituting (11.8) and (11.9) into (11.7), and passing to the limit as At tends to zero, yields
ri, + opt ( r ( x , u ) + R:f(x,
u)
+ $ trace[R,,
@(x, u)]} = 0, (1I . 11)
U € D
with the boundary condition that &x, T) = O for all x in the domain of terminal values. 11.1.3 Discussion
a. Using the operator L,, [ ] introduced by Boudarel et al. [ l , Vol. 1, Sect. 14.2.41, Eq. (1 1.11) can be written simply opt {r(x, u) Y
€9
+ L,[&
t)]} = O
.
(11.12)
b. For # -,0, the stochastic process becomes deterministic, and the functional equation obtained in the preceding chapter is recovered here. c. So far as practical use of this theoretical result is concerned, the same remarks can be made as for the continuous deterministic case, since the presence of second partial derivatives here again complicates the situation. 11.2 Linear Systems with Quadratic Criteria
Let us consider the linear (not necessarily stationary) process i= Fx
+ Hu + G z ,
with z being a white noise vector with correlation matrix cost criterion be quadratic: r ( x , u ) = xTAx + 2xTBu+ uTCu.
(11.13)
r &t).
Let the (11.14)
We shall show that the functional optimality equation (1 1 .I l), in this case, has a solution of the form
R(x, t ) = xTQ(t)X +k(t) .
(11.15)
154
11
CONTINUOUS STOCHASTIC PROCESSES
Assuming (1 1.1 5),
8,= x'Qx + k , 8, = ~ Q x , fix,= 2 4 ,
and the term to be optimized becomes xTAx + 2xTBu + uTCu + 2xTQ(Fx + Hu)
+trace[QGrGT].
The optimum is obtained for 2BTx + 2Ch
+ 2HTQx=0 ,
so that Ci = -C-'[B
+QH]'x.
(11.16)
Substituting the solution (1 1.16) into the functional equation (1 1.1 l), and separating the terms quadratic in x from those independent of x, leads to Q ( t ) = -A - Q(t)F -FTQ(t)
k(t) =
+ (B + Q(t)H]C-'[B + Q(t)H1',
(11.17)
- trace[Q(t)GrGT].
(11.18)
The relation (11.17) is a matrix nonlinear differential equation of the Riccati type. Equation (1 1 .I 7) is the same as that obtained in the deterministic case, and thus the control law is unchanged. This result expresses the separation property for the control of a linear process. Relation (11.18) allows calculation of the degradation of the return, resulting from the presence of perturbations z. Note that the problem has a solution only if C-' exists, since otherwise it is possible to find control laws which lead to zero return even for T -+ 0. When the final state is free, Q(t,) = 0 , and the matrix Riccati equation can be integrated in reverse time. Remark. It can be shown that the separation theorem still holds true, even in the case that the state is not completely measurable. This result can be obtained either directly, or by passing to the limit in the discrete case. Thus an estimator P of x can be used in (1 1.16) to obtain d. The main results of the theory of state estimation for continuous linear processes are summarized in the appendix.
1 1.3
CONTINUOUS SYSTEMS WITH DISCRETE STATES
155
11.3 Continuous Systems with Discrete States
II.3.1 Definition of the Problem Consider a process with discrete states, denoted by i E [I, ...,m].Let Pij(t,, [u]::,, t , ) be the probability that at time t, the system is in state j , if at time :o it was in state i, and if the control u(t) was applied over the interval from to to t,. Suppose further that for t , -+ to this probability behaves as Pij(u) Az , i #j , Pij(t, u , t + At) -+ (11.19) 1 + .*., i = j .
+
i
The return to be optimized will have two kinds of terms: (1) a component of the form J ri(u) dt computed over times during which the state is fixed; (2) a component rjj(u) computed at a time at which the state changes. We shall first assume that the horizon is fixed at T and that the final state is free. 11.3.2 Solution of the Problem Let Ri(t) be the mathematical expectation of the optimal return when the initial state is i at time t. For small At, the principle of optimality leads to
fii(t)
= opt{ri(u) UEZ2
At
+ fii(t + A f ) + 1 [ r f j ( u )+ Rj(t +At)]l
j(u) At}.
(1 1.20)
j# i
Assuming Ri(t) to be continuous in t, Ri(t
+ At) = Ri(t) + (dRi(Z)/dt)At +
*
(11.21)
Using this in (1 1.20) and passing to the limit results in dRi(t)/dt +opt(r,(u) U € 9
+ 1 [rij(u) + Rj(t)]Pij(u)} =O.
(11.22)
j# i
Letting R(t) be the vector with elements Ri(t), C(u)the vector with elements Ci(u) = ri(u)
+ C r fj(u)Pij(u), j#i
156
11
CONTINUOUS STOCHASTIC PROCESSES
and P(u) the matrix with elements P i j ( u ) for i # j , and 0 for i = j , then the relations (1 1.22) can be written dR(t)/dt
=
-opt{C(u) +P(u)R(t)}.
(1 1.23)
U S 9
Each row of this is to be optimized independently for u E 9. If the final state is free, the boundary condition is
R(T) = 0 .
(1 1.24)
11.3.3 Practical Calculation of the Solution
Since the differential system (1 1.23) has a terminal condition, rather than an initial condition, it is integrated in reverse time. Thus, letting z=T - t, there results R(0) = 0 ,
dR(~)/dz= opt{C(u) + P(u)~(z)).
(11.25)
U S 9
This system can be integrated numerically step by step, or using the analog computation scheme indicated in Fig. 11.1. For certain simple problems, the optimization units can be replaced by an equivalent nonlinearity, such as the signum function. 11.3.4 The Case of an In5nite Horizon, or an Unspecified Horizon
With an infinite or unspecified horizon we suppose that the process and the criterion function are stationary. The optimal return is then independent of time. A development analogous to that used in the earlier case leads to the following result, in which the term dRldt no longer appears: opt{C(u) + P(u)R) = 0 .
(1 1.26)
U E O
Only those coordinates relating to states which are not in the terminal domain appear in the vector 8. The problems of existence and uniqueness of solutions, and of their practical calculation, can be treated the same as in the discrete case, and we shall not discuss them further.
11.3
CONTINUOUS SYSTEMS WITH DISCRETE STATES
r-----I
I
t I
I I
I
I I
I
I
I
L-
FIG. 11.1. Scheme for solution of Eq. (11.25).
157
1