SUMMARY OF DISCUSSION

PREDICTION OF THE OPTIMAL CONTROL SEQUENCE

by V. N. Novoseltsev, USSR National Committee on Automatic Control, Moscow, USSR
Consider the discrete-time system described by the equation:

x_{i+1} = f_i(x_i, u_i, h_i)        ... (1)

where x_i is the state of the system at time i, u_i is the control signal and h_i is a random disturbance; x_0 is the initial state. The performance index of the system is:

Q = \sum_{i=0}^{N} F_i(x_i, u_i)        ... (2)

Let I_k denote the information state at time k. I_k comprises the prior information and the information obtained by measurement of the system at times 0, 1, ..., k-1. Let u^{(k)} denote the control sequence (u_k^{(k)}, u_{k+1}^{(k)}, ..., u_N^{(k)}) obtained by minimising E(Q | I_k) at time k. Thus u^{(k)} is that sequence of controls (u_k, u_{k+1}, ..., u_N) obtained from performing the following minimisation:

\min_{u_k, ..., u_N} E\left[ \sum_{i=k}^{N} F_i(x_i, u_i) \,\Big|\, I_k \right]        ... (3)
This minimisation is first performed at time 0 and u^{(0)} is obtained. The optimal control at time 0 is the first member of this sequence, u_0^{(0)}. At time 1, I_1 is obtained, the minimisation is repeated yielding u^{(1)}, and the first member of this sequence, u_1^{(1)}, is the optimal control at time 1. Proceeding in this way, u^{(k)} is obtained at time k, and the first member of this sequence, u_k^{(k)}, is the optimal control at time k. It can be shown that dynamic programming would yield the same optimal control sequence (u_0^{(0)}, u_1^{(1)}, ..., u_N^{(N)}).
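The procedure just described is a repeated open-loop minimisation in which only the first member of each predicted sequence is actually applied. The sketch below illustrates that loop for a simple scalar system; the particular dynamics, the quadratic cost and the use of scipy.optimize.minimize are illustrative assumptions and are not part of the contribution.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)

    # Illustrative scalar system x_{i+1} = x_i + u_i + h_i with cost sum(x_i^2 + u_i^2);
    # these stand in for the generic f_i and F_i of (1)-(2).
    def expected_cost(u_seq, x_hat):
        """E(Q | I_k) for an open-loop sequence, using the current estimate x_hat
        and the mean (zero) value of the future disturbances."""
        x, total = x_hat, 0.0
        for u in u_seq:
            total += x ** 2 + u ** 2
            x = x + u
        return total + x ** 2

    def predicted_sequence(x_hat, steps):
        """u^(k): the minimiser of E(Q | I_k) over the remaining controls."""
        return minimize(expected_cost, np.zeros(steps), args=(x_hat,)).x

    N, x = 10, 5.0
    applied = []
    for k in range(N):
        u_k = predicted_sequence(x, N - k)[0]        # only the first member u_k^(k) is used
        applied.append(u_k)
        x = x + u_k + 0.1 * rng.standard_normal()    # true system moves; I_{k+1} is then available
    print(np.round(applied, 3))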
EXAMPLE

To simplify the numerical solution a continuous-time system is used to illustrate the procedure of determining the optimal control sequence. The system is described by the following equations:

x(t) = \mu u(t)        ... (4)

y(t) = x(t) + h(t)        ... (5)

\mu is an unknown gain, but the prior mean and variance of \mu are known to be m and \sigma_\mu^2 respectively. The control u(t) can be measured perfectly; y(t) is a noisy measurement of the state x(t); h(t) is white noise of variance s^2. x^*(t) is the desired output. For simplicity put x^*(t) = const = a. The performance index is:

Q = \int_0^T [x(t) - x^*(t)]^2 \, dt        ... (6)
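As a rough illustration, the example system (4)-(5) can be simulated for one realisation of the unknown gain; the sampling step, the constant trial control and the numerical values m = a = \sigma_\mu = s = 1 (the values quoted for Figure 1) are assumptions of the sketch.

    import numpy as np

    rng = np.random.default_rng(1)
    m, a, sigma_mu, s = 1.0, 1.0, 1.0, 1.0       # numerical values used for Figure 1
    T, dt = 2.0, 0.01
    n = int(T / dt)

    mu = rng.normal(m, sigma_mu)                 # unknown gain drawn from its prior
    u = 0.5 * np.ones(n)                         # a trial constant control u(t) = c
    x = mu * u                                   # x(t) = mu * u(t), equation (4)
    y = x + (s / np.sqrt(dt)) * rng.standard_normal(n)   # y(t) = x(t) + h(t), equation (5)

    Q = np.sum((x - a) ** 2) * dt                # performance index (6) for this realisation
    print(round(Q, 3))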
To determine u^0(0), the optimal control at t = 0, we have to optimise E(Q | I_0) with respect to the function u(t), 0 \le t \le T; u^0(0) is the value of the optimal function at t = 0. Let \hat{x}(t | I(0)) denote E(x(t) | I(0)) and \sigma^2(t | I(0)) denote Var(x(t) | I(0)). It is easily shown(1) that:

\hat{x}(t | I(0)) = m\, u(t)        ... (7)

\sigma^2(t | I(0)) = \frac{\sigma_\mu^2 u^2(t)}{1 + (\sigma_\mu^2 / s^2)\, z(t)}        ... (8)

where:

z(t) = \int_0^t u^2(\tau) \, d\tau        ... (9)

Thus:

E(Q | I(0)) = \int_0^T \left[ (\hat{x}(t | I(0)) - a)^2 + \sigma^2(t | I(0)) \right] dt = \int_0^T \left[ (m u - a)^2 + \frac{\sigma_\mu^2 \dot{z}}{1 + (\sigma_\mu^2 / s^2)\, z} \right] dt        ... (10)
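The criterion (10), in the form reconstructed above, can be evaluated by simple quadrature for a constant control u(t) = c; the step size, the horizon and the parameter values below are assumptions of the sketch, not results from the contribution.

    import numpy as np

    def expected_Q(c, T, m=1.0, a=1.0, sigma_mu=1.0, s=1.0, n=2000):
        """Quadrature of criterion (10) for the constant control u(t) = c,
        for which z(t) = c^2 * t."""
        dt = T / n
        t = dt * np.arange(1, n + 1)
        z = c ** 2 * t
        integrand = (m * c - a) ** 2 + sigma_mu ** 2 * c ** 2 / (1.0 + (sigma_mu / s) ** 2 * z)
        return float(np.sum(integrand) * dt)

    for c in (0.25, 0.5, 1.0, 2.0):
        print(c, round(expected_Q(c, T=5.0), 3))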
Equation (10) must be optimised with respect to the function u(t), i.e. with respect to the function z(t), 0 \le t \le T, where z(0) = 0 and z(T) is free. If u(t) is differentiable, z(t) is twice differentiable and classical calculus of variations can be used. Let F(z, \dot{z}) denote the integrand of equation (10). The Euler equation is:

\frac{\partial F}{\partial z} - \frac{d}{dt} \frac{\partial F}{\partial \dot{z}} = 0        ... (11)

with the first integral

F - \dot{z}\, \frac{\partial F}{\partial \dot{z}} = const

i.e.

z^0(t) = c^2 t        ... (12)

and

u^0(t) = c        ... (13)

where z^0(t), u^0(t) are the optimal functions.
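Taking the integrand of (10) in the form given above, the first integral F - \dot{z}\,\partial F/\partial \dot{z} can be checked symbolically; the small sympy sketch below is only a verification of that reconstructed form.

    import sympy as sp

    z, zdot, m, a, sigma_mu, s = sp.symbols('z zdot m a sigma_mu s', positive=True)

    # Integrand of (10) written as F(z, zdot), using u^2 = zdot from (9)
    F = (m * sp.sqrt(zdot) - a) ** 2 + sigma_mu ** 2 * zdot / (1 + sigma_mu ** 2 * z / s ** 2)

    first_integral = sp.simplify(F - zdot * sp.diff(F, zdot))
    print(first_integral)
    # Simplifies to a*(a - m*sqrt(zdot)) (equivalently a**2 - a*m*sqrt(zdot)), which is
    # constant along a trajectory only if zdot, and hence u = sqrt(zdot), is itself
    # constant, as stated in (12)-(13).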
The value of c is given by:

(mc - a)\, m + \frac{\sigma_\mu^2 c}{1 + r c^2} = 0        ... (14)

where

r = \frac{\sigma_\mu^2 T}{s^2}        ... (15)

The optimal value of c when r \ll 1 is therefore given by:

(m^2 + \sigma_\mu^2)\, c - m a = 0        ... (16)

so that u^0 = ma / (m^2 + \sigma_\mu^2) when r \ll 1, while u^0 \approx a/m when r \gg 1. For intermediate values of r, equation (14) is solved numerically. The condition r \gg 1 means that Var(x(T) | I(0)) \ll 1 and is referred to as the complete information level.
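With condition (14) in the form reconstructed above, the two limiting values of c can be checked numerically; the root-finding interval and the values m = a = \sigma_\mu = 1 are assumptions of the sketch.

    import numpy as np
    from scipy.optimize import brentq

    m, a, sigma_mu = 1.0, 1.0, 1.0

    def condition_14(c, r):
        """Left-hand side of (14) as reconstructed: (mc - a)m + sigma_mu^2 c / (1 + r c^2)."""
        return (m * c - a) * m + sigma_mu ** 2 * c / (1.0 + r * c ** 2)

    for r in (1e-4, 1e4):
        c = brentq(condition_14, 1e-9, 10.0, args=(r,))
        print(f"r = {r:10.4f}   c = {c:.3f}")
    # For r << 1 the root approaches ma/(m^2 + sigma_mu^2) = 0.5, equation (16);
    # for r >> 1 it approaches the complete information level a/m = 1.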
Figure 1 shows, for t = 0, the optimal control u^0 = c as a function of the time left to go, T, when m = a = \sigma_\mu = s = const = 1. When T is very small, r \ll 1 and u^0 \approx 0.5. With larger T, u^0 increases above the 'complete information level' in order to 'probe' the system. When T is very large, sufficient time is available for information gathering, no extra probing is required, and the control decreases to the complete information level.
The solution defined by equation (16) can also be used at time t if m and \sigma_\mu^2 are replaced by

M(t) \triangleq E(\mu | I(t))        ... (17)

and

D_\mu(t) \triangleq Var(\mu | I(t))        ... (18)

It has been shown(1) that:

M(t) = \frac{m + (\sigma_\mu^2 / s^2) \int_0^t u(\tau)\, y(\tau) \, d\tau}{1 + (\sigma_\mu^2 / s^2) \int_0^t u^2(\tau) \, d\tau}        ... (19)

D_\mu(t) = \frac{\sigma_\mu^2}{1 + (\sigma_\mu^2 / s^2) \int_0^t u^2(\tau) \, d\tau}        ... (20)
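A minimal sketch of the adaptive use of (16) with (19)-(20): the estimates M(t) and D_\mu(t) are built up from the measurement record and substituted for m and \sigma_\mu^2; the discretisation of the integrals and the simulation set-up are assumptions of the sketch.

    import numpy as np

    rng = np.random.default_rng(2)
    m, a, sigma_mu, s = 1.0, 1.0, 1.0, 1.0
    dt, n = 0.01, 500
    mu = rng.normal(m, sigma_mu)                 # true (unknown) gain

    num_uy, num_uu = 0.0, 0.0                    # running integrals of u*y and u^2
    u = m * a / (m ** 2 + sigma_mu ** 2)         # start from the solution of (16)
    for i in range(n):
        y = mu * u + (s / np.sqrt(dt)) * rng.standard_normal()   # measurement, equation (5)
        num_uy += u * y * dt
        num_uu += u * u * dt
        M = (m + (sigma_mu / s) ** 2 * num_uy) / (1 + (sigma_mu / s) ** 2 * num_uu)   # (19)
        D = sigma_mu ** 2 / (1 + (sigma_mu / s) ** 2 * num_uu)                        # (20)
        u = M * a / (M ** 2 + D)                 # (16) with m, sigma_mu^2 replaced by M, D

    print(round(mu, 3), round(M, 3), round(u, 3))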
REFERENCE

1. A. A. Feldbaum, "The Principles of Optimal Control System Theory", Moscow, 1963.
Dr. D. Q. Mayne (UK) commented as follows on this contribution:

The procedure proposed in these remarks, while it may be useful in some applications, is not optimal. The procedure proposes the prediction, at time k and using the information 'state' I_k at time k, of a future optimal control sequence which is 'open-loop', i.e. the acquisition of future data is ignored in the prediction; otherwise the future control values would have to be specified in feedback form as functions of the (future) information 'states' I_{k+r}. For a lucid discussion of this point see S. E. Dreyfus, "Some Types of Optimal Control of Stochastic Systems", J. SIAM Ser. A: Control, Vol. 2, No. 1, Jan. 1963, pp. 131-140.
Dr. Novoseltsev replied as follows:

It is obvious that the conciseness of the contribution has produced misunderstanding of some of its aspects. As a matter of fact the acquisition of future data is not ignored when optimal prediction is to be performed.
[Figure 1. The optimal control u^0 = c at t = 0 as a function of the time to go T (T from 0.1 to 50, logarithmic scale), for m = a = \sigma_\mu = s = 1; the broken line marks the complete information level u^0_compl = a/m.]
For detailed consideration take the example given in the contribution for Gaussian distributions. The acquisition of future data is now the measurement of y_j (j = 0, 1, ..., i). Consider the vectors \bar{y}_i = (y_0, ..., y_i), \bar{u}_i = (u_0, u_1, ..., u_i), etc. Then for a closed-loop system, when y_i is known, the a posteriori distribution of \mu for the i-th step can be found as follows:

p(\mu | \bar{y}_i, \bar{u}_i) = C\, p(\mu) \prod_{j=0}^{i} p(y_j | \mu, u_j)        ... (1)

where C is a normalising constant. Each p(y_j | \mu, u_j) is Gaussian with mean \mu u_j and variance \sigma_h^2, so that the a posteriori distribution of \mu is again Gaussian, with conditional mean

E(\mu | \bar{y}_i, \bar{u}_i) = \frac{m + (\sigma_\mu / \sigma_h)^2 \sum_{j=0}^{i} u_j (\mu u_j + h_j)}{1 + (\sigma_\mu / \sigma_h)^2 \sum_{j=0}^{i} u_j^2}        ... (2)

For the open-loop system, i.e. with no measurements of y_i, the conditional probability would be given by the next equation instead of (1):

p(\mu | \bar{u}_i) = p(\mu)        ... (3)

and there would be no information accumulation at all. In the open-loop system it would be E(\mu) = m, D(\mu) = \sigma_\mu^2 for all future i. Then

E(x_i | I_0) (open loop) = u_i m, \qquad D(x_i | I_0) (open loop) = u_i^2 \sigma_\mu^2        ... (4)
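The contrast between the closed-loop relations (1)-(2) and the open-loop relations (3)-(4) can be seen in a few lines of simulation: with measurements the estimate of \mu converges and its variance shrinks, whereas the open-loop prediction retains the prior values throughout. The numerical values and the particular control sequence below are assumptions of the sketch.

    import numpy as np

    rng = np.random.default_rng(3)
    m, sigma_mu, sigma_h = 1.0, 1.0, 1.0
    mu = rng.normal(m, sigma_mu)                  # true gain, unknown to the controller
    u = 0.7 * np.ones(20)                         # some control sequence u_0, ..., u_19
    y = mu * u + sigma_h * rng.standard_normal(u.size)   # measurements y_j = mu*u_j + h_j

    # Closed loop: posterior mean of mu after y_0, ..., y_i, equation (2),
    # and the corresponding posterior variance (which shrinks as data accumulate)
    ratio = (sigma_mu / sigma_h) ** 2
    post_mean = (m + ratio * np.cumsum(u * y)) / (1 + ratio * np.cumsum(u * u))
    post_var = sigma_mu ** 2 / (1 + ratio * np.cumsum(u * u))

    # Open loop: no measurements are used, so the prior is never updated, equations (3)-(4)
    open_mean = np.full(u.size, m)
    open_var = np.full(u.size, sigma_mu ** 2)

    print("true mu:", round(mu, 3))
    print("closed loop:", np.round(post_mean[[0, 4, 19]], 3), np.round(post_var[[0, 4, 19]], 3))
    print("open loop:  ", np.round(open_mean[[0, 4, 19]], 3), np.round(open_var[[0, 4, 19]], 3))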
As for the future optimal control sequence at the k-th step, it is clear that it will not be used in reality, for at the next (k+1)-th step a new prediction of the control sequence will be found, based upon I_{k+1} instead of I_k. For this reason there is no need to specify the future u_{k+r}^{(k)} in feedback form; the u_{k+r}^{(k)} are simply numerical parameters used to find the actual control, and at every k-th step:

u_k^{opt} = u_k^{(k)}        ... (5)

The last equation is the feedback form of the optimal control. All these formulae are valid for any step, not only for k = 0.