0005-1098/81/020393-03 $02.00/0 Pergamon Press Ltd © 1981 International Federation of Automatic Control
Automatica, Vol. 17, No. 2, pp. 393-395, 1981. Printed in Great Britain
Technical Communique
A Generalized Certainty-Equivalence Result in Stochastic Control*
W. J. RUNGGALDIER†
Key Words--Dual control; dynamic programming; stochastic control.
Abstract--The certainty-equivalence property, known to hold under certain assumptions for the linear-quadratic stochastic control problem, is extended to more complex situations, when the system noise variables are related to a complicated (nonlinear) system, whose state can be partially observed,
1. Introduction and problem definition
A useful model in applications is the following (discrete-time) linear-quadratic stochastic control problem

$$
\begin{aligned}
x_{k+1} &= F_k x_k + G_k u_k + v_k \\
y_k &= h_k(x_k, w_k) \qquad (k = 0, \ldots, N-1) \\
J &= E\Big\{ \sum_{k=0}^{N-1} (x_k' Q_k x_k + u_k' R_k u_k) + x_N' Q_N x_N \Big\} \to \min
\end{aligned}
\tag{1}
$$

where $x_k$ denotes the ($n$-dimensional) state of a given system at time $k$, $y_k$ is the ($p$-dimensional) state observation, $u_k$ the ($m$-dimensional) control variable and $v_k$, $w_k$ are disturbances. $F_k$, $G_k$, $Q_k$, $R_k$ are given matrices of appropriate dimensions. Setting $y^k := \{y_0, \ldots, y_k\}$ and $u^{k-1} := \{u_0, \ldots, u_{k-1}\}$, the objective is to determine a sequence of control inputs $u_k^* = u_k^*(y^k, u^{k-1})$ that minimize $J$ among those that for each $k$ depend upon $(y^k, u^{k-1})$.

Take now the following more general problem

$$
\begin{aligned}
x_{k+1} &= F_k x_k + G_k u_k + v_k(z_k) + \varepsilon_k \\
z_{k+1} &= f_k(z_k, x_k, u_k, r_k) \\
\eta_k &= h_k(z_k, x_k, w_k) \qquad (k = 0, \ldots, N-1) \\
J &= E\Big\{ \sum_{k=0}^{N-1} (x_k' Q_k x_k + u_k' R_k u_k) + x_N' Q_N x_N \Big\} \to \min
\end{aligned}
\tag{2}
$$

obtained from (1) by considering the $v_k$'s in (1) as exogenous variables related to a complicated system with state $z_k$ through $v_k = v_k(z_k) + \varepsilon_k$, where $\varepsilon_k$ is pure noise. Along with $x_k$, also $z_k$ is now partially observed with observation noise $w_k$, but it does not enter the criterion function. The dynamics for $z_k$ are allowed to depend on $x_k$, $u_k$ and a system noise $r_k$. Denoting the extended state observation by $\eta_k$ and setting $\eta^k := \{\eta_0, \ldots, \eta_k\}$, the objective here is to determine a control sequence $u_k^* = u_k^*(\eta^k, u^{k-1})$ minimizing $J$ among those that for each $k$ depend on $(\eta^k, u^{k-1})$. The purpose of the paper is to give conditions under which a certainty-equivalence (CE) result holds for (2) in the sense that

$$
u_k^*(\eta^k, u^{k-1}) = u_k^*(\hat x_{k|k}, \hat v_{0|k}, \ldots, \hat v_{N-1|k})
\tag{3}
$$

where

$$
\hat x_{k|k} := E\{x_k \mid \eta^k, u^{k-1}\}, \qquad \hat v_{i|k} := E\{v_i \mid \eta^k, u^{k-1}\} \qquad (i = 0, \ldots, N-1)
$$

and $u^*(x_k, v_0, \ldots, v_{N-1})$ is an optimal sequence of control inputs for the deterministic problem corresponding to (2), namely when $x_k$ is completely observed and $x_0$ as well as $v_k = v_k(z_k) + \varepsilon_k$ are deterministic and known. For applications of the result given here, knowledge about the structure of $f_k$, $h_k$, $v_k$ and of the statistics for $x_0$, $\varepsilon_k$, $r_k$ and $w_k$ is required only to the extent that one be able to determine the quantities in (3) and to verify the assumptions (A.1) and (A.2) below. The result to be given is an extension to (2) of the one obtained in Tse and Bar-Shalom (1975) for problem (1) and requires the additional assumption (A.1) below.

2. Assumptions and proof of the CE property
Augment the state $x_k$ by considering the entire noise sequence $\{v_k = v_k(z_k) + \varepsilon_k\}$ as part of it. Let $X_k = (x_k', v_0', \ldots, v_{N-1}')'$ be the augmented state and set

$$
\Phi_k := \begin{bmatrix} F_k & 0 \;\cdots\; I_n \;\cdots\; 0 \\ 0 & I_{Nn} \end{bmatrix}, \qquad
\Psi_k := [G_k', 0, \ldots, 0]', \qquad
\mathcal{Q}_k := \begin{bmatrix} Q_k & 0 \\ 0 & 0 \end{bmatrix}
\tag{4}
$$

with $I_n$ in the $k$th position. Problem (2) then becomes equivalent to

$$
\begin{aligned}
X_{k+1} &= \Phi_k X_k + \Psi_k u_k; \qquad X_0 \text{ random} \\
z_{k+1} &= f_k(z_k, H X_k, u_k, r_k) \\
\eta_k &= h_k(z_k, H X_k, w_k) \qquad (k = 0, \ldots, N-1) \\
J &= E\Big\{ \sum_{k=0}^{N-1} (X_k' \mathcal{Q}_k X_k + u_k' R_k u_k) + X_N' \mathcal{Q}_N X_N \Big\} \to \min
\end{aligned}
\tag{5}
$$

where $H$ is a matrix of the type $H = [I, 0, \ldots, 0]$.

Assume: $Q_k > 0$, $R_k > 0$ for all $k$, $Q_N > 0$ and

(A.1) For each $k$ the first two conditional moments $E\{X_k \mid \eta^k, u^{k-1}\}$ and $E\{X_k X_k' \mid \eta^k, u^{k-1}\}$ are finite and independent of $u_k$.

(A.2) With $\xi_k := (x_k', v_k', \ldots, v_{N-1}')'$, the joint conditional covariance
*Received November 30, 1979; revised June 10, 1980. The original version of this paper was not presented at any IFAC Meeting. This paper was recommended for publication in revised form by associate editor Y. Bar-Shalom. This work was partially supported by GNAFA/CNR of the Italian National Research Council.
†Address: Seminario Matematico and Istituto di Statistica, Università di Padova, I-35100 Padova, Italy.
is, for each $j \ge k$, independent of $u^{j-1}$.
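The state augmentation used in this section can be checked numerically. The following is a minimal sketch, assuming a scalar case ($n = m = 1$) with time-invariant $F$, $G$ and a fixed, known noise sequence; the function names and the numerical values are illustrative assumptions and do not come from the paper. It builds $\Phi_k$ and $\Psi_k$ as in (4) and compares the first component of $X_{k+1} = \Phi_k X_k + \Psi_k u_k$ with the original recursion $x_{k+1} = F x_k + G u_k + v_k$.

```python
# Illustrative sketch only: scalar state and control, hypothetical numbers.

def phi(F, k, N):
    """Augmented transition matrix Phi_k, size (1+N) x (1+N)."""
    size = 1 + N
    M = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    M[0][0] = F          # top-left block F_k
    M[0][1 + k] = 1.0    # identity entry that picks v_k out of X_k
    return M

def psi(G, N):
    """Augmented input matrix Psi_k = [G, 0, ..., 0]'."""
    return [G] + [0.0] * N

def step_augmented(F, G, k, X, u):
    """One step of X_{k+1} = Phi_k X_k + Psi_k u_k."""
    N = len(X) - 1
    P, S = phi(F, k, N), psi(G, N)
    return [sum(P[i][j] * X[j] for j in range(len(X))) + S[i] * u
            for i in range(len(X))]

def simulate(F, G, x0, v, u):
    """Return the x-trajectory of the original and of the augmented recursion."""
    x, X = x0, [x0] + list(v)
    xs_orig, xs_aug = [x], [X[0]]
    for k in range(len(v)):
        x = F * x + G * u[k] + v[k]
        X = step_augmented(F, G, k, X, u[k])
        xs_orig.append(x)
        xs_aug.append(X[0])
    return xs_orig, xs_aug, X

v_seq = [0.3, -0.2, 0.1]
xs_orig, xs_aug, X_final = simulate(F=0.9, G=0.5, x0=1.0, v=v_seq, u=[-1.0, 0.4, 0.0])
print(xs_orig)
print(xs_aug)
```

The noise components of the augmented state are carried along unchanged, which is why the whole sequence can be treated as part of the state.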
Assumption (A.2) corresponds to the one in Tse and Bar-Shalom (1975) and both (A.1) and (A.2) are less restrictive than those required by Duchan (1974) in an analogous situation. The following lemmas and the corollary serve as preliminaries to the main theorem; they are partly extensions of results in Kind (1976).
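$$
I_k(\eta^k, u^{k-1}) = E\{X_k' \Pi_k X_k \mid \eta^k, u^{k-1}\} + \alpha_k
\tag{15}
$$

where $\alpha_k$ is independent of $u^{k-1}$ with $\alpha_N = 0$. Here $\Pi_k$ is as in (6) with $\Lambda_k$ as in Lemma 2.2. Furthermore

$$
u_k^*(\eta^k, u^{k-1}) = -\Lambda_k^{-1} \Psi_k' \Pi_{k+1} \Phi_k \hat X_{k|k}
\tag{16}
$$

with $\hat X_{k|k} := E\{X_k \mid \eta^k, u^{k-1}\}$.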
Lemma 2.1. Let $\Phi_k$, $\Psi_k$, $\mathcal{Q}_k$ be as in (4) and let $\Lambda_k$ be symmetric, positive definite
u, - 0 ,
Proof. The proof is by induction from $k = N$ to $k = 0$ using the principle of optimality. For $k = N$ (15) is easily seen to hold. Suppose it holds for $k + 1$; the principle of optimality and the first equation in (5) then yield $I_k(\eta^k, u^{k-1})$
have the structure
min(E',X'~Q~A~ +u'~Rdq
f1/
_5__
"-] t71
'
Pt Qs,+b~M~P~,II'~,
+ E ', (X'tql't + u~'~ )H~, , i {q)~)(~ + ~P~u~ I I ,/t + '.
n'iirl(l'.',X~,Q~.X~.4 1".',A~
(P\--Q~)
(8i
(M,,-i'))
(91
l,u ~', qLl," "
u
+ 2t,'aW'~ u~,, ,%El£:xti,:
~1~, d' l,
t,,tl
9u'~!R~ +"P'~I'/~,~,~)u~ ÷ t : : : q . ~ ! q ~ , u ~ ~',)
7~-F~m~[7~+t+,$~.fl'~+~]
(7"~-0
for
i: 0 .....
N-
1) I I()l
with $\delta_{i,j}$ the Kronecker symbol.
Prot also satisfies t ' ~ ~(;,A t ~(;'tPt
Q~,+l'~il' t . ,
,t,,
1) tire ( n x n I - m a t r i c e s
t' t . ~(/~A~, ~(;'~
,'i.l, - I
- min (/"~{[A'{QkX/, + u{Rd~ ~ u
'l'
where Pa as well as 7"[ ( i = 0 ..... N ~,atisf)ing tile recursion rehltions
t' x
I)
t171
where the symmetry of $\Pi_{k+1}$ (see (6)) has been used as well as the fact that $u_k$ is required to be a measurable function of $(\eta^k, u^{k-1})$. Note now that for any (square) matrix $A$ and random vectors $X$, $Y$
sl,\.l,\
I ) i ::t/. 4 . 1 7 ~ . \ k " ) ',
II~l
Also, if all relevant random quantities are absolutely continuous, assumption (A.1) can easily be seen to imply that
thai P~ aild
I)I'A: 1)\
O\ {11)
u hi,I / k, k I, are measurable functions of $(\eta^k, u^{k-1})$ that do not depend upon the choice of $u_k$, coinciding with $E\{X_k \mid \eta^k, u^{k-1}\}$ and $E\{X_k X_k' \mid \eta^k, u^{k-1}\}$, respectively. Using the induction hypothesis, (17) then becomes
alld l]ltll /'~
(I
tt)r
l~{qt.l{ ' ~i
i
{12l
Corollary 2.1. The matrices $\Pi_k$, $\Phi_k' \Pi_{k+1} \Psi_k \Lambda_k^{-1} \Psi_k' \Pi_{k+1} \Phi_k$ have equal to zero their rows and columns from the $(n+1)$-st to the $(n+kn)$-th for all $k = 0, \ldots, N-1$. Proof. Follows from the structure of $\Phi_k$. □
l.emma
2.1.
fronl
i12)
and
Ak
tP k t l k ,
alld the
' ', 4 rain {2.;,q",l L
,
1{ll t "El .hi, q~-It"
''
the
Lemma 2.2. If $Q_k > 0$, $R_k > 0$ for all $k$ and $Q_N > 0$, then $\Lambda_k := \Psi_k' \Pi_{k+1} \Psi_k + R_k > 0$ for all $k$. Proof. From Lemma 2.1
E',~a+, i'/-'t
By Lemma 2.2, the unique minimizing $u_k$ is therefore given by (16). Substituting (16) in (19) it follows that $I_k(\eta^k, u^{k-1})$
£17t.~lQtdtl)~H~,ldj~,jX~.lqku~,
i,
structure of %,
$$
\Psi_k' \Pi_{k+1} \Psi_k + R_k = G_k' P_{k+1} G_k + R_k
$$
(20)
(13) For any square matrix $A$ the following holds
Furthermore, from the matrix equation (Meditch, 1969)
_t~,\,UI.\. (l't)l+(i,,R;<
~(;~1
=
q~.us. 1,, t.;X~iq~.tl~ t~ :i./..i.\.t ~l~.u~ I[
{14i
from (11) and from the assumptions, it follows by induction from $k = N$ to $k = 0$ that $P_k > 0$ for all $k$.
where (see (18)) the left hand side is also equal to $\mathrm{tr}[A\, E\{\tilde X_k \tilde X_k' \mid \eta^k, u^{k-1}\}]$, having used the notation $\tilde X_k := X_k - E\{X_k \mid \eta^k, u^{k-1}\}$. With (21), (20) can be continued as
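Theorem 2.1. Let

$$
I_k(\eta^k, u^{k-1}) := \inf_{u_k, \ldots, u_{N-1}} E\Big\{ \sum_{j=k}^{N-1} (X_j' \mathcal{Q}_j X_j + u_j' R_j u_j) + X_N' \mathcal{Q}_N X_N \,\Big|\, \eta^k, u^{k-1} \Big\}
$$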
-t:IX~qY~I4~,~ ,IP~A, I~P~I.'I,'~. ffl)k,¥klq*,, ~ t, ~[1:'~
then
+lrid~',H~
~i~lL, ~ ',, LtP~.%. ~ P [ t $ [
~{I)~'b.~,.tT~.~'~lt/~.u ~ ' [ j ]
i22
which, with $\Pi_k$ as in (6) and $\alpha_k$ equal to the last term to the right (within [ ]-brackets), becomes (15), where, due to assumption (A.2) and Corollary 2.1, $\alpha_k$ has the required property. □

From the proof of the theorem one immediately has

Corollary 2.2. The optimal control law for the deterministic problem corresponding to (5) is

$$
u_k^*(X_k) = -\Lambda_k^{-1} \Psi_k' \Pi_{k+1} \Phi_k X_k.
\tag{23}
$$
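The deterministic law of Corollary 2.2 can be tried out on a small example. The sketch below is an illustration under stated assumptions, not the paper's own computation: it takes the scalar case $n = m = 1$ with time-invariant $F$, $G$, $Q$, $R$, and it assumes that $\Pi_k$ obeys the standard discrete-time Riccati recursion $\Pi_k = \mathcal{Q}_k + \Phi_k' \Pi_{k+1} \Phi_k - \Phi_k' \Pi_{k+1} \Psi_k \Lambda_k^{-1} \Psi_k' \Pi_{k+1} \Phi_k$ with $\Pi_N = \mathcal{Q}_N$ and $\Lambda_k = R + \Psi_k' \Pi_{k+1} \Psi_k$. All function names and numbers are hypothetical.

```python
# Illustrative sketch only: scalar state and control, known disturbance sequence.

def phi(F, k, N):
    """Augmented transition matrix Phi_k, size (1+N) x (1+N)."""
    size = 1 + N
    M = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    M[0][0] = F
    M[0][1 + k] = 1.0
    return M

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def riccati(F, G, Q, R, QN, N):
    """Backward recursion; returns the gain rows K_k and Pi_0 (assumed standard form)."""
    size = 1 + N
    Pi = [[0.0] * size for _ in range(size)]
    Pi[0][0] = QN                          # Pi_N = diag(Q_N, 0, ..., 0)
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        Phi = phi(F, k, N)
        lam = R + G * G * Pi[0][0]         # Lambda_k (scalar because m = 1)
        row = matmul([Pi[0]], Phi)[0]      # first row of Pi_{k+1} Phi_k
        K = [G * c / lam for c in row]     # Lambda_k^{-1} Psi' Pi_{k+1} Phi_k
        core = matmul(transpose(Phi), matmul(Pi, Phi))
        Pi = [[core[i][j] - lam * K[i] * K[j] for j in range(size)]
              for i in range(size)]
        Pi[0][0] += Q                      # add the augmented state weight
        gains[k] = K
    return gains, Pi

def cost(F, G, Q, R, QN, x0, v, u):
    """Criterion J for a given control sequence and known disturbances."""
    x, J = x0, 0.0
    for k in range(len(v)):
        J += Q * x * x + R * u[k] * u[k]
        x = F * x + G * u[k] + v[k]
    return J + QN * x * x

def optimal_controls(F, G, gains, x0, v):
    """Apply u_k = -K_k X_k along the trajectory, with X_k = (x_k, v_0, ..., v_{N-1})."""
    x, u = x0, []
    for k, K in enumerate(gains):
        X = [x] + list(v)
        u.append(-sum(K[j] * X[j] for j in range(len(X))))
        x = F * x + G * u[k] + v[k]
    return u

F, G, Q, R, QN = 0.9, 0.5, 1.0, 0.2, 2.0
x0, v = 1.0, [0.3, -0.2, 0.1]
gains, Pi0 = riccati(F, G, Q, R, QN, len(v))
u_opt = optimal_controls(F, G, gains, x0, v)
J_opt = cost(F, G, Q, R, QN, x0, v, u_opt)
X0 = [x0] + v
J_riccati = sum(X0[i] * Pi0[i][j] * X0[j]
                for i in range(len(X0)) for j in range(len(X0)))
print(J_opt, J_riccati)
```

Under these assumptions the cost achieved by the feedback law agrees with the quadratic form $X_0' \Pi_0 X_0$, and perturbing any single control value can only increase the cost.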
Theorem 2.1 and Corollary 2.2 together imply the CE property for problem (5) and hence also for the original problem (2).

Corollary 2.3. The optimal closed-loop stochastic control law (16) can be expressed directly in terms of the nonaugmented state problem (2) as

$$
u_k^*(\eta^k, u^{k-1}) = -\Lambda_k^{-1} \Big[ G_k' P_{k+1} (F_k \hat x_{k|k} + \hat v_{k|k}) + \sum_{j=k+1}^{N-1} G_k' T_{k+1}^j \hat v_{j|k} \Big]
\tag{24}
$$

where $\hat x_{k|k}$, $\hat v_{j|k}$, $\Lambda_k$, $P_k$, $T_k^j$ are as defined in (3), (13) and (8) to (10) respectively.

Proof. Follows directly from (16), (13), Lemma 2.1, (12) and the structure of $\Phi_k$. □
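The CE property can be illustrated on a one-step toy case. The sketch below is an assumption-laden illustration rather than an instance taken from the paper: it uses $N = 1$, scalar quantities, an exactly known $x_0$, and a disturbance $v$ with a hypothetical three-point distribution that does not depend on the control. The deterministic optimum for a known $v$ is $u = -G Q_N (F x_0 + v) / (R + G^2 Q_N)$; the CE control replaces $v$ by its mean. The code compares the expected cost of the CE control with that of nearby controls on a grid.

```python
# Illustrative sketch only: one-step scalar problem with a hypothetical noise law.

def u_deterministic(F, G, R, QN, x0, v):
    """Minimizer of R*u^2 + QN*(F*x0 + G*u + v)^2 when v is known."""
    return -G * QN * (F * x0 + v) / (R + G * G * QN)

def expected_cost(F, G, R, QN, x0, u, v_values, v_probs):
    """Exact expectation of the one-step cost over a discrete distribution of v."""
    return sum(p * (R * u * u + QN * (F * x0 + G * u + v) ** 2)
               for v, p in zip(v_values, v_probs))

F, G, R, QN, x0 = 0.9, 0.5, 0.2, 2.0, 1.0
v_values = [-1.0, 0.5, 2.0]     # assumed support of the disturbance
v_probs = [0.2, 0.5, 0.3]       # assumed probabilities
v_mean = sum(p * v for v, p in zip(v_values, v_probs))

# Certainty-equivalent control: deterministic law evaluated at the mean of v.
u_ce = u_deterministic(F, G, R, QN, x0, v_mean)
J_ce = expected_cost(F, G, R, QN, x0, u_ce, v_values, v_probs)

# Compare against a grid of alternative controls centred on u_ce.
grid = [u_ce + 0.01 * i for i in range(-200, 201)]
costs = [expected_cost(F, G, R, QN, x0, u, v_values, v_probs) for u in grid]
best_u = grid[costs.index(min(costs))]
print(u_ce, best_u, J_ce)
```

In this toy case no other grid point achieves a lower expected cost than the CE control, which is what (24) predicts for $N = 1$ with $P_1 = Q_N$.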
3. Conclusions
The certainty-equivalence (CE) property has been shown to hold for more general situations than the pure linear-quadratic stochastic control problem. The motivation for the study came from an economic application of the linear-quadratic model, where the system noise variables are exogenous variables affected by a very large and complex system, viz. the 'economy', whose state can be partially observed, the observations being the information available to an economic agent.
Acknowledgements--The author gratefully acknowledges a very stimulating discussion with Prof. Y. Bar-Shalom.
References
Duchan, A. (1974). A clarification and a new proof of the certainty-equivalence theorem. Int. Econ. Rev., 15, 216-224.
Kind, P. (1976). Il concetto di aspettative razionali nei problemi di controllo adattativo ottimale. Thesis, University of Padova.
Meditch, J. S. (1969). Stochastic Optimal Linear Estimation and Control. McGraw-Hill.
Tse, E. and Y. Bar-Shalom (1975). Generalized certainty equivalence and dual effect in stochastic control. IEEE Trans. Aut. Control, AC-20, 817-819.