Automatica, Vol. 8, pp. 797-798. Pergamon Press, 1972. Printed in Great Britain.
Correspondence Item

A Note on Generating Certainty Equivalent Controls*

H. HADIMOGLOU and O. L. R. JACOBS†
Summary: It is suggested that in stochastic control problems where there is uncertainty about the current state, the correct estimate of the state to be used for generating certainty equivalent controls is not always the mean value of the probability distribution for the state.
* Received 9 May 1972; revised 31 July 1972. The original version of this paper was not presented at any IFAC meeting. It was recommended for publication in revised form by Associate Editor K. Åström.

† Department of Engineering Science, University of Oxford, Parks Road, Oxford, England.

Introduction

THE IDEA of certainty equivalence in stochastic control is well known [1] and has been discussed elsewhere [2, 3]. Figure 1 represents the usual class of stochastic control problems for which the certainty equivalent control law

uc = function (information about state x)   (1)

is defined as that law having the same functional form as the optimal feedback control law

ud = function (x)   (2)

for the equivalent deterministic problem of Fig. 2.

FIG. 1. General class of stochastic control systems. (Block diagram: a control law drives the dynamics, which are subject to a disturbance z1; the state x is measured with noise z2 to give the observation y; an estimator converts y into information about x for the control law; the control law and estimator together constitute the controller.)

FIG. 2. Deterministic equivalent of system in Fig. 1.

It has often been assumed that the "information about state x" to be used in generating the certainty equivalent control of equation (1) should be the mean x̂ of the updated conditional probability distribution for x. The purpose of the present note is to suggest that problems exist where this conditional mean is not the most appropriate information about x to use. The suggestion arises from further consideration of a particular non-linear problem which has been discussed elsewhere [4, 5]. It follows that this particular problem is not, as was previously supposed [3, 6], one where intentional errors are essential for stable dual control.

The particular non-linear stochastic control problem

The particular problem is single-variable with scalar dynamics having the linear discrete-time difference equation

x(i + 1) = x(i) + u(i) + z(i)   (3)

where i = 1, 2, . . . is the discrete-time variable and the disturbances z(i) are normally distributed independent random variables with zero mean and known variance s². Non-linearity is introduced by the measurement equation

y(i) = x(i)²   (4)

For every measurement y(i) there are two possible values of the state x(i),

x(i) = ±w(i)

where

w = +√y
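A minimal numerical sketch of equations (3) and (4) may make the sign ambiguity concrete (variable names are illustrative, not from the note): the noise-free measurement y = x² determines only w = +√y, never the sign of x.

```python
import numpy as np

# Illustrative sketch of equations (3) and (4); names are ours, not the note's.
rng = np.random.default_rng(0)
s = 1.0                      # standard deviation of the disturbance z(i)

def step(x, u):
    """One step of the dynamics x(i+1) = x(i) + u(i) + z(i), equation (3)."""
    return x + u + rng.normal(0.0, s)

def measure(x):
    """Noise-free measurement y(i) = x(i)**2, equation (4)."""
    return x ** 2

# The measurement determines w = +sqrt(y) but not the sign of x:
for x in (3.0, -3.0):
    w = np.sqrt(measure(x))
    assert w == 3.0          # same w for either sign of the state
```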
and so the conditional probability distribution for x is as shown in Fig. 3, where

q = probability that x has the positive value.

FIG. 3. Probability distribution for the state x. (Two spikes of p(x): probability q at x = +w and probability 1 - q at x = -w.)

This probability distribution can be updated according to Bayes rule [4] using w and q as sufficient statistics describing the state of the stochastic process. The optimal control minimising the performance criterion

I = lim_{N→∞} (1/N) Σ_{i=1}^{N} x(i)²

for the problem is known [5] and the resulting optimal value of the performance criterion is

Iopt = 2.2 s²   (5)

The certainty equivalent control for the above system is

uc = -(estimate of state x)   (6)

and is not optimal. It has been shown [4] that when the mean value

x̂ = (2q - 1)w   (7)

of the updated conditional probability distribution for x is used as the estimate in equation (6), the resulting system is unable to track the disturbances z(i) and so is, in a stochastic sense, unstable. It was therefore thought [3, 6] that this particular problem provided a counter-example to invalidate the general proposition that certainty equivalent control is always a useful sub-optimal control.

An alternative way of generating the certainty equivalent control

Further consideration of the problem now suggests that the mean value is not a good estimate for a variable like the state x having the discrete distribution of Fig. 3, because its value x̂ given by equation (7) will never coincide with the true value ±w, except when x is zero. A more natural estimate would be the mode; that is, the more probable of the two possible values of x:

estimate of x = +w if q > 0.5
              = -w if q < 0.5.   (8)

The system has been simulated using this estimate, equation (8), to generate the certainty equivalent control according to equation (6). The simulation showed this certainty equivalent control to be stable, and the average value of the performance criterion was observed to be

Ic = 2.4 s²   (9)

which compares favourably with the optimal value 2.2 s² of equation (5). With this change in the estimate used, the certainty equivalent control thus becomes a good, although not the best known [4], sub-optimal control. For this particular problem it can be shown that the same sub-optimal control is generated by the much simpler control law

u(i) = (-1)^i w   (10)

which requires no calculation of q. The coincidence of the certainty equivalent control law with this much simpler sub-optimal control results from the rather simple nature of the problem, in particular the absence of measurement noise; it is not thought to affect the conclusions stated below.
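The comparison described above can be reproduced in a short Monte Carlo sketch. All names are illustrative; the Bayes update of q is written in one form consistent with the noise-free measurement, since the note itself does not spell it out, and the initial condition q = 0.5 is our choice.

```python
import numpy as np

def phi(t, s):
    """Normal probability density with zero mean and standard deviation s."""
    return np.exp(-0.5 * (t / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def update_q(q, w, u, w_new, s):
    """Bayes update of q = P(state is +w); our construction, not the note's.

    Each hypothesis +/-w predicts the new state x + u + z; the new
    measurement leaves the two candidates +/-w_new.
    """
    num = q * phi(w_new - (w + u), s) + (1 - q) * phi(w_new - (-w + u), s)
    den = num + q * phi(-w_new - (w + u), s) + (1 - q) * phi(-w_new - (-w + u), s)
    return num / max(den, 1e-300)            # guard against underflow

def run(estimator, n=4000, s=1.0, seed=1):
    """Average of x(i)^2 under u = -(estimate of x), equations (3), (4), (6)."""
    rng = np.random.default_rng(seed)
    x, w, q, cost = 0.0, 0.0, 0.5, 0.0
    for _ in range(n):
        u = -estimator(q, w)                 # equation (6)
        x_new = x + u + rng.normal(0.0, s)   # equation (3)
        w_new = np.sqrt(x_new ** 2)          # equation (4), w = +sqrt(y)
        q = update_q(q, w, u, w_new, s)
        x, w = x_new, w_new
        cost += x * x
    return cost / n

mean_cost = run(lambda q, w: (2 * q - 1) * w)       # mean estimate, eq. (7)
mode_cost = run(lambda q, w: w if q > 0.5 else -w)  # mode estimate, eq. (8)
# The note reports instability for the mean estimate and about 2.4 s^2
# for the mode estimate.
print(mean_cost, mode_cost)
```

In this sketch the mean estimate leaves the control near zero (by symmetry q remains at 0.5), so x performs a random walk and the average cost grows with the run length, whereas the mode estimate self-corrects within a step or two of a wrong guess and keeps the average cost small.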
Conclusions

It is concluded:

(i) that the estimate used to generate a certainty equivalent control should not always be a mean value but should be chosen in whatever way is most appropriate to the problem at hand. This apparently trivial result has not, to the authors' knowledge, been previously recognised in the literature on control. The question of how, given some other non-linear stochastic problem, to choose an appropriate estimate is a matter for further research to which there may be no general answer.

(ii) that the particular non-linear problem discussed does not provide a counter-example to invalidate the general proposition that certainty equivalent control is always a useful sub-optimal control.

References

[1] M. AOKI: Optimization of Stochastic Systems. Academic Press, New York (1967).
[2] J. W. PATCHELL and O. L. R. JACOBS: Separability, neutrality and certainty equivalence. Int. J. Control 13, 337-342 (1971).
[3] O. L. R. JACOBS and J. W. PATCHELL: Caution and probing in stochastic control. Int. J. Control 16, 189-199 (1972).
[4] O. L. R. JACOBS: Extremum control and optimal control theory. IFAC Symposium on Identification of Systems, paper 5-10, Prague (1967).
[5] O. L. R. JACOBS and S. M. LANGDON: An optimal extremal control system. Automatica 6, 297-301 (1970).
[6] D. L. ALSPACH and H. W. SORENSON: Stochastic optimal control for linear but non-gaussian systems. Int. J. Control 13, 1169-1181 (1971).