Copyright © 1996 IFAC
13th Triennial World Congress, San Francisco, USA
REDUCED COMPLEXITY NONLINEAR H∞ CONTROLLERS: RELATION TO CERTAINTY EQUIVALENCE

John S. Baras and Nital S. Patel

Department of Electrical Engineering and Institute for Systems Research, University of Maryland, College Park, MD 20742

Abstract. This paper considers the problem of constructing reduced complexity controllers for output feedback nonlinear H∞ control. Conditions are obtained under which such controllers achieve the closed-loop performance requirements. These controllers are non-optimal in general; however, in case optimality holds, they are in fact certainty equivalence controllers. Conditions under which certainty equivalence holds are simplified and linked to the solvability of a functional equation.

Keywords. Nonlinear control systems, Robust control, H-infinity control, Discrete-time systems, Disturbance rejection, Game theory.
1. INTRODUCTION

Since Whittle (Whittle, 1981) first postulated the minimum stress estimate for the solution of a risk-sensitive stochastic optimal control problem, it has evolved into the certainty equivalence principle. The latter states that, under appropriate conditions, an optimal output feedback controller can be obtained by inserting an estimate of the state into the corresponding state feedback law. The certainty equivalence property is known to hold for linear systems with a quadratic cost (Basar and Bernhard, 1991). The recent interest in nonlinear H∞ control has led researchers to examine whether certainty equivalence can be carried over to nonlinear systems. If certainty equivalence were to hold, it would result in a tremendous reduction in the complexity of the problem. In a recent paper (James et al., 1994), sufficient conditions were given for certainty equivalence to hold in terms of a saddle point condition. Also, in (James, 1994), a simple example is given to demonstrate the non-optimal nature of the certainty equivalence controller. An implementation of a certainty equivalence controller can be found in (Teolis et al., 1993).

This paper considers the infinite time case, and deals with establishing sufficiency conditions for a reduced complexity controller to exist. These conditions apply to both optimal and non-optimal policies. In general, obtaining an optimal solution to the output feedback problem involves solving an infinite dimensional dynamic programming equation (James and Baras, 1995). Hence, one may be satisfied with a reduced complexity non-optimal policy which guarantees asymptotic stability of the nominal (no exogenous inputs) closed-loop system, as well as achieves a pre-specified disturbance attenuation level γ. In the special case, it is shown that the policies so obtained are certainty equivalence policies. Furthermore, in doing so, one obtains an equivalent sufficiency condition for certainty equivalence which may be more tractable than the one given in (James et al., 1994). The approach is based on establishing dissipativity results, since these guarantee, under detectability assumptions, asymptotic stability of the closed-loop system when the exogenous inputs are zero. Lastly, it is shown that the condition for certainty equivalence to hold is equivalent to the existence of a (unique) solution to a functional equation.
2. PROBLEM STATEMENT

Consider the following system:

    Σ:  x_{k+1} = f(x_k, u_k, w_k)
        y_{k+1} = g(x_k, u_k, w_k)
        z_{k+1} = l(x_k, u_k, w_k),   k = 0, 1, 2, ...

where x_k ∈ R^n are the states, y_k ∈ R^t are the measurements, u_k ∈ U ⊂ R^m are the controls, z_k ∈ R^q are the regulated outputs, and w_k ∈ R^r are the exogenous inputs. Furthermore, assume that 0 is an equilibrium point of Σ, and that U is compact. Denote the set of feasible policies by O, i.e. if u ∈ O, then u_k = ū(y_{1,k}, u_{0,k-1}), where s_{i,j} denotes the sequence {s_i, s_{i+1}, ..., s_j}. Also assume that f, g, and l are continuous. The output feedback problem is: given γ > 0, find a control policy u* ∈ O so as to ensure that there exists a finite β_{u*}(x) ≥ 0, β_{u*}(0) = 0, such that

    sup_{w ∈ l²([0,∞),R^r)} sup_{x_0 ∈ R^n} { p_0(x_0) + Σ_{i=0}^{∞} ( |z_{i+1}|^2 − γ^2 |w_i|^2 ) } ≤ sup_{x ∈ R^n} { p_0(x) + β_{u*}(x) }        (1)

where p_0 ∈ E, with E defined as

    E ≜ { p ∈ C(R^n) | p(x) ≤ R for some finite R ≥ 0 }.

Here, |·| denotes the Euclidean norm. Also assume that for such a u* ∈ O, Σ^{u*} is z-detectable. Equation (1) is based on the dynamic game interpretation of the nonlinear H∞ control problem. It ensures that if x_0 = 0 then

    sup_{w ∈ l²([0,∞),R^r), w ≠ 0}  ||z||_2 / ||w||_2  ≤  γ.        (2)

Furthermore, define the sup-pairing

    (p, q) ≜ sup_{x ∈ R^n} { p(x) + q(x) }

and the function δ_x ∈ E, δ_x : R^n → R ∪ {−∞},

    δ_x(ξ) ≜ 0 if ξ = x,  −∞ else.

An information state based solution was recently obtained in (James and Baras, 1995). The information state is defined by the following recursion:

    p_{k+1} = H(p_k, u_k, y_{k+1}),   k = 0, 1, ...,   p_0 ∈ E,

where

    H(p_k, u_k, y_{k+1})(x) = sup_{ξ ∈ R^n} { p_k(ξ) + sup_{w ∈ R^r} { |l(ξ, u_k, w)|^2 − γ^2 |w|^2  |  x = f(ξ, u_k, w),  y_{k+1} = g(ξ, u_k, w) } }.
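To make the recursion concrete, the following minimal sketch (not from the paper) approximates the information state update p_{k+1} = H(p_k, u_k, y_{k+1}) on a finite grid. The dynamics f, g, l, the grids X and W, the attenuation level gamma, and the measurement tolerance are all illustrative assumptions introduced here for the example.

    import numpy as np

    # Illustrative data: grids, toy dynamics and attenuation level (assumed, not from the paper).
    gamma = 2.0
    X = np.linspace(-2.0, 2.0, 41)        # state grid
    W = np.linspace(-1.0, 1.0, 21)        # disturbance grid

    def f(x, u, w): return 0.8 * x + u + w     # hypothetical dynamics
    def g(x, u, w): return x + 0.1 * w         # hypothetical measurement map
    def l(x, u, w): return x                   # hypothetical regulated output

    def H(p, u, y, tol=0.05):
        """One information state update: sup over xi and over w consistent with y."""
        p_next = np.full(X.shape, -np.inf)
        for i, xi in enumerate(X):
            for w in W:
                if abs(g(xi, u, w) - y) > tol:               # enforce y_{k+1} = g(xi, u, w)
                    continue
                j = int(np.abs(X - f(xi, u, w)).argmin())    # nearest grid point for x = f(xi, u, w)
                cand = p[i] + l(xi, u, w) ** 2 - gamma ** 2 * w ** 2
                p_next[j] = max(p_next[j], cand)
        return p_next

    # p_0 = delta_0 (system assumed to start at x = 0), then one update after measuring y_1.
    p = np.full(X.shape, -np.inf)
    p[int(np.abs(X).argmin())] = 0.0
    p = H(p, u=0.0, y=0.05)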
The problem is solved via dynamic programming, where the upper value function M satisfies

    M(p) = inf_{u ∈ U} sup_{y ∈ R^t} { M(H(p, u, y)) }        (3)

for all p ∈ E, with M(p) ≥ (p, 0), and M(−β̄_u) = 0 for some β̄_u(x) ≥ 0, β̄_u(0) = 0. In particular, M(p) is the least possible worst case cost to go, given p_0 = p. Now, supposing such a solution M to equation (3) exists, one has the following result.

Theorem 1. ((James and Baras, 1995)). Let u* ∈ O be such that u*_k = ū(p_k), where ū(p_k) achieves the minimum in (3) for p = p_k. Then u* ∈ O solves the output feedback problem. Here, p_k is the information state trajectory initialized by p_0 = −β̄_u, and is such that M(p_k) is finite for all k.

Such a policy, obtained via the dynamic programming equation (3), is called an optimal policy. Now assume ū(p) is a non-optimal policy; then there exists a function W : E → R, with W(p) ≥ (p, 0) and W(−β̄_u) = 0 (β̄_u(x) ≥ 0, β̄_u(0) = 0), which satisfies, for all p ∈ E,

    W(p) ≥ sup_{y ∈ R^t} W(H(p, ū(p), y)).

Such a W is called a storage function for the output feedback policy ū. Conversely, if such a function exists, then the corresponding control policy ū(p) solves the output feedback problem.

In the well known state feedback case, denote by V the upper value function of the state feedback problem. Furthermore, V ≥ 0, V(0) = 0, and V satisfies

    V(x) = inf_{u ∈ U} sup_{w ∈ R^r} { |l(x, u, w)|^2 − γ^2 |w|^2 + V(f(x, u, w)) }

for all x ∈ R^n. The policy u_F such that u_F(x) = u*, where u* ∈ U achieves the infimum in the above equation, is called an optimal state feedback policy. For a non-optimal state feedback policy u, there exists a U : R^n → R, with U ≥ 0, U(0) = 0, which satisfies

    U(x) ≥ sup_{w ∈ R^r} { |l(x, u(x), w)|^2 − γ^2 |w|^2 + U(f(x, u(x), w)) }

for all x ∈ R^n. Such a U is called a storage function for the state feedback policy u.
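As an illustration only, and continuing the toy grid model introduced after the information state recursion, the state feedback upper value function V and an associated policy u_F can be approximated by value iteration. The finite control set U_SET and the iteration count below are assumptions, and convergence is not guaranteed for an arbitrary toy model.

    # Sketch: value iteration for V(x) = inf_u sup_w { |l(x,u,w)|^2 - gamma^2 |w|^2 + V(f(x,u,w)) }
    # on the illustrative grids X, W defined earlier; U_SET is an assumed finite control set.
    U_SET = np.linspace(-1.0, 1.0, 11)

    def value_iteration(n_iter=100):
        V = np.zeros(X.shape)
        uF = np.zeros(X.shape)                 # greedy state feedback associated with V
        for _ in range(n_iter):
            V_new = np.empty_like(V)
            for i, x in enumerate(X):
                best = np.inf
                for u in U_SET:
                    worst = -np.inf
                    for w in W:
                        j = int(np.abs(X - f(x, u, w)).argmin())
                        stage = l(x, u, w) ** 2 - gamma ** 2 * w ** 2
                        worst = max(worst, stage + V[j])
                    if worst < best:
                        best, uF[i] = worst, u
                V_new[i] = best
            V = V_new
        return V, uF

    V, uF = value_iteration()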
From now on, define I ⊂ O to be the set of output feedback policies which have the separated structure, i.e. which depend only on the information state p_k. Such policies are called information state feedback policies. The control policy generated by the dynamic programming equation (3) is of this type.
3. REDUCED COMPLEXITY CONTROLLERS

The dynamic programming equation (3) is infinite dimensional in general. Hence, this motivates the search for reduced complexity control policies which preserve the stability properties, as well as the attenuation level γ of the closed-loop system (equation (2)).

For given x, ξ ∈ R^n and u ∈ U, define

    Ω(x, u, ξ) ≜ { w ∈ R^r | x = f(ξ, u, w) }.

The following result is needed to establish conditions for the existence of reduced complexity policies which achieve the desired closed-loop performance.

Lemma 1. For any ξ ∈ R^n, u ∈ U, and a given function h : R^n × R^r × R^n → R,

    sup_{w ∈ R^r} h(f(ξ, u, w), w, ξ) = sup_{x ∈ R^n} sup_{w ∈ Ω(x,u,ξ)} h(x, w, ξ).

Proof: For any ε > 0, there exist x^ε ∈ R^n and w^ε ∈ Ω(x^ε, u, ξ) (i.e. with x^ε = f(ξ, u, w^ε)) such that

    sup_{x ∈ R^n} sup_{w ∈ Ω(x,u,ξ)} h(x, w, ξ) ≤ h(x^ε, w^ε, ξ) + ε
        = h(f(ξ, u, w^ε), w^ε, ξ) + ε
        ≤ sup_{w ∈ R^r} h(f(ξ, u, w), w, ξ) + ε.

Since ε > 0 is arbitrary, this yields one inequality; the reverse inequality follows since every w ∈ R^r belongs to Ω(f(ξ, u, w), u, ξ). Hence the result.

Define J^U_p : R^n × U → R as

    J^U_p(x, u) ≜ p(x) + sup_{w ∈ R^r} { |l(x, u, w)|^2 − γ^2 |w|^2 + U(f(x, u, w)) }.

Then, one has the following result.

Lemma 2. For any u ∈ U, U : R^n → R, and p_k ∈ E,

    sup_{x ∈ R^n} J^U_{p_k}(x, u) ≥ sup_{y ∈ R^t} (H(p_k, u, y), U).

Proof:

    sup_{y ∈ R^t} (H(p_k, u, y), U)
        = sup_{y ∈ R^t} sup_{x ∈ R^n} { H(p_k, u, y)(x) + U(x) }
        = sup_{y ∈ R^t} sup_{x ∈ R^n} sup_{ξ ∈ R^n} { p_k(ξ) + sup_{w ∈ R^r} ( |l(ξ, u, w)|^2 − γ^2 |w|^2 | x = f(ξ, u, w), y = g(ξ, u, w) ) + U(x) }
        ≤ sup_{x ∈ R^n} sup_{ξ ∈ R^n} { p_k(ξ) + sup_{w ∈ Ω(x,u,ξ)} ( |l(ξ, u, w)|^2 − γ^2 |w|^2 ) + U(x) }
        = sup_{ξ ∈ R^n} sup_{w ∈ R^r} { p_k(ξ) + |l(ξ, u, w)|^2 − γ^2 |w|^2 + U(f(ξ, u, w)) },  via lemma 1
        = sup_{ξ ∈ R^n} J^U_{p_k}(ξ, u).
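Continuing the illustrative grid example, the quantity J^U_{p_k}(x, u) appearing in Lemma 2 can be evaluated directly. The helper below is a sketch under the same assumed toy model, with p_k and U represented as vectors of values on the grid X; its name and interface are hypothetical.

    # Sketch: J^U_{p_k}(x, u) = p_k(x) + sup_w { |l(x,u,w)|^2 - gamma^2 |w|^2 + U(f(x,u,w)) },
    # with p (= p_k) and U_vec (= U) given as vectors on the grid X (illustrative only).
    def J(p, U_vec, i, u):
        x = X[i]
        worst = -np.inf
        for w in W:
            j = int(np.abs(X - f(x, u, w)).argmin())
            worst = max(worst, l(x, u, w) ** 2 - gamma ** 2 * w ** 2 + U_vec[j])
        return p[i] + worst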
The following theorem gives a sufficient condition for the existence of dissipative reduced complexity policies.

Theorem 2. Given U : R^n → R, U ≥ 0, and U(0) = 0. If for all p_k ∈ E

    (p_k, U) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u),

then ū(p_k) ∈ argmin_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u) solves the output feedback problem, and the associated storage function is W(p_k) = (p_k, U).

Proof:

    (p_k, U) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u)
        = sup_{x ∈ R^n} J^U_{p_k}(x, ū(p_k))
        ≥ sup_{y ∈ R^t} (H(p_k, ū(p_k), y), U),  via lemma 2.

Furthermore, (p_k, U) ≥ (p_k, 0), and (−U, U) = 0. Hence, (p_k, U) is a storage function, and ū is a (non-optimal) solution to the output feedback problem, with the information state trajectory initialized via p_0 = −U.

Remark: One could have considered any u_k such that (p_k, U) ≥ sup_{x ∈ R^n} J^U_{p_k}(x, u_k), rather than only the minimizing policy; the argument above applies unchanged to such a choice.
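A possible numerical reading of Theorem 2, again on the assumed toy grids of the earlier sketches: pick ū(p_k) as a minimizer of sup_x J^U_{p_k}(x, u) over the finite control set, and test the hypothesis (p_k, U) ≥ inf_u sup_x J^U_{p_k}(x, u) at the current information state. The function names below are hypothetical and the check is pointwise in p_k, not a proof of the condition over all of E.

    # Sketch: reduced complexity policy selection and the Theorem 2 test at one p_k.
    def sup_pairing(p, q):
        return float(np.max(p + q))            # (p, q) = sup_x { p(x) + q(x) }

    def reduced_complexity_control(p, U_vec):
        costs = [max(J(p, U_vec, i, u) for i in range(len(X))) for u in U_SET]
        u_bar = float(U_SET[int(np.argmin(costs))])
        hypothesis_ok = sup_pairing(p, U_vec) >= min(costs)
        return u_bar, hypothesis_ok

    # Example use with U = V (the certainty equivalence candidate of Corollary 1 below):
    u_k, ok = reduced_complexity_control(p, V)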
Corollary 1. (Certainty Equivalence). Given U = V, the upper value function of the state feedback problem, and the optimal state feedback policy u_F. If for all p_k ∈ E

    (p_k, V) = inf_{u ∈ U} sup_{x ∈ R^n} J^V_{p_k}(x, u),        (4)

then ū(p_k) = u_F(x̄), where x̄ ∈ argmax_{x ∈ R^n} { p_k(x) + V(x) }, is an optimal control policy for the output feedback problem.

Proof: Clearly, (4) implies that

    sup_{x ∈ R^n} J^V_{p_k}(x, u_F(x)) = sup_{x ∈ R^n} inf_{u ∈ U} J^V_{p_k}(x, u) = inf_{u ∈ U} sup_{x ∈ R^n} J^V_{p_k}(x, u).

Hence, a saddle point exists, and so for any x̄ ∈ argmax_{x ∈ R^n} { p_k(x) + V(x) } and ū = u_F(x̄),

    (p_k, V) = J^V_{p_k}(x̄, ū) = sup_{x ∈ R^n} J^V_{p_k}(x, ū)
        ≥ sup_{y ∈ R^t} (H(p_k, ū, y), V),  via lemma 2.

Hence, W(p_k) = (p_k, V) is a storage function, and W(δ_x) = V(x), the optimal cost of the state feedback problem. Hence, the policy is optimal for the output feedback problem.
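For the optimal case, the certainty equivalence controller of Corollary 1 admits a particularly simple sketch on the assumed grid: form the "minimum stress" estimate x̄ maximizing p_k(x) + V(x) and apply the state feedback u_F there. This is illustrative code built on the earlier toy model, not the paper's implementation (for which see (Teolis et al., 1993)); the observed y value used below is arbitrary.

    # Sketch: certainty equivalence control u_k = uF(xbar), xbar in argmax_x { p_k(x) + V(x) }.
    def certainty_equivalence_control(p, V_vec, uF_vec):
        i_bar = int(np.argmax(p + V_vec))      # minimum stress estimate on the grid
        return float(uF_vec[i_bar])

    # Closed-loop use: apply u_k, observe y_{k+1}, update the information state, repeat.
    u_k = certainty_equivalence_control(p, V, uF)
    p = H(p, u_k, y=0.1)                       # y value is illustrative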
In general, the conditions for the optimal policy may be difficult to establish. However, there may exist non-optimal state feedback policies whose storage functions satisfy the conditions of theorem 2. In that case, using reduced complexity policies based on these non-optimal state feedback policies will guarantee that the system is asymptotically stable whenever the exogenous inputs are zero. Moreover, such policies will also ensure that the closed-loop system satisfies the attenuation level γ (equation (2)).

Remark: It is sufficient that the conditions in theorem 2 and corollary 1 hold only along the trajectory p_k, k = 0, 1, .... If this is the case, then U need not be a storage function for the state feedback problem. It is only when one needs the conditions to hold for p_k ∈ { δ_x | x ∈ R^n } that U is forced to be a storage function.

Now consider the condition that characterizes certainty equivalence in terms of the upper value function of the output feedback problem. In (James et al., 1994), (James, 1994) it is shown that certainty equivalence holds if, for all k ≥ 0,

    M(p_k) = (p_k, V).        (5)

In general, since the trajectory p_k is not known a priori, one needs to check (5) for all p_k ∈ E. In fact, one can show that if this were the case, then V is the only function which satisfies (5). To do so, one requires the following inequality.

Lemma 3. Let ū ∈ I, with W its storage function. Then

    W(p_k) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u),   k = 0, 1, ...,

where U(x) ≜ W(δ_x).

Proof:

    W(p_k) ≥ sup_{x ∈ R^n} { p_k(x) + sup_{w ∈ l²([k,∞),R^r)} Σ_{i=k}^{∞} ( |z_{i+1}|^2 − γ^2 |w_i|^2 ) }
        = sup_{x ∈ R^n} { p_k(x) + sup_{w_k ∈ R^r} ( |l(x, ū(p_k), w_k)|^2 − γ^2 |w_k|^2
              + sup_{w ∈ l²([k+1,∞),R^r)} Σ_{i=k+1}^{∞} ( |z_{i+1}|^2 − γ^2 |w_i|^2 )  |  x_{k+1} = f(x, ū(p_k), w_k) ) }
        ≥ sup_{x ∈ R^n} { p_k(x) + sup_{w ∈ R^r} ( |l(x, ū(p_k), w)|^2 − γ^2 |w|^2 + U(f(x, ū(p_k), w)) ) }
        ≥ inf_{u ∈ U} sup_{x ∈ R^n} { p_k(x) + sup_{w ∈ R^r} ( |l(x, u, w)|^2 − γ^2 |w|^2 + U(f(x, u, w)) ) }
        = inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u).

Theorem 3. (Unicity). Let M be the upper value function of the output feedback problem. If there exists a function U : R^n → R such that M(p_k) = (p_k, U) for all p_k ∈ E, then U ≡ V, the upper value function of the state feedback problem.

Proof: It follows from lemma 3 that

    (p_k, U) = M(p_k) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u).

Hence, by theorem 2, the policy ū with ū(p_k) ∈ argmin_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u) solves the output feedback problem, and

    (p_k, U) ≥ sup_{x ∈ R^n} J^U_{p_k}(x, ū(p_k))
        ≥ sup_{y ∈ R^t} (H(p_k, ū(p_k), y), U),  via lemma 2
        = sup_{y ∈ R^t} M(H(p_k, ū(p_k), y)).

Hence, ū is an optimal policy, since (p_0, U) = M(p_0) for all p_0 ∈ E. Thus,

    M(p_k) = sup_{y ∈ R^t} M(H(p_k, ū(p_k), y)),

which implies that (p_k, U) = inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u). Setting p_k = δ_x, one obtains

    U(x) = inf_{u ∈ U} sup_{w ∈ R^r} { |l(x, u, w)|^2 − γ^2 |w|^2 + U(f(x, u, w)) },

with U(x) = M(δ_x) ≥ (δ_x, 0) = 0 and U(0) = M(δ_0) = 0. Hence, U ≡ V.

Corollary 2. If there exists a p_k such that M(p_k) ≠ (p_k, V), then there exists no function Y : R^n → R such that M(p) = (p, Y) for all p ∈ E.

Corollary 3. Let W be a storage function for a (non-optimal) information state feedback policy ū ∈ I, and let W(p_k) = (p_k, U), k ≥ 0. Then ū(p_k) ∈ argmin_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u) solves the output feedback problem with the storage function W(p). Furthermore, if one insists that W(δ_x) = (δ_x, U) for all x ∈ R^n, then U is a storage function for a (non-optimal) state feedback policy. Also, if W = M, the upper value function of the output feedback problem, then the controller is a certainty equivalence controller.

Remark: It is clear from the proof of theorem 3 that if (5) holds, then so does (4). However, (4) is a more tractable condition, since it does not involve the upper value function M, which is what one is trying to avoid having to compute in the first place.

The following alternate condition is a direct consequence of theorem 3.

Corollary 4. (Certainty Equivalence). The certainty equivalence controller is optimal if there exists a solution Y : R^n → R to the functional equation

    M(p) = (p, Y),   for all p ∈ E.

4. CONCLUSION

This paper has identified a strategy for generating reduced complexity output feedback policies. Sufficiency conditions have been stated which guarantee asymptotic stability of the closed-loop system in the absence of any exogenous inputs (w ≡ 0), as well as achievement of the pre-specified attenuation level γ. In the optimal case, it is observed that the controller generated by such strategies reduces to the certainty equivalence controller. In doing so, one is able to obtain a more tractable version of the certainty equivalence condition stated in (James, 1994), (James et al., 1994). Also, the certainty equivalence condition is shown to be equivalent to the existence of a solution to a functional equation.

Future research in this area pertains to showing whether (if at all) solvability of the output feedback problem implies existence of such reduced complexity controllers. Also, a more constructive approach to the problem needs to be developed. Finally, one can view the approach as trying to reduce complexity by considering storage functions that are evaluated by interpolation through the sup-pairing. Further investigation into alternate methods of interpolation may also prove fruitful.

5. ACKNOWLEDGMENTS

This work was supported by the National Science Foundation Engineering Research Centers Program: NSFD CDR 8803012, and the Lockheed Martin Chair in Systems Engineering.

6. REFERENCES
Basar, T. and P. Bernhard (1991). H∞-optimal control and related minimax design problems: A dynamic game approach. Birkhäuser, Boston.
James, M.R. (1994). On the certainty equivalence principle for partially observed dynamic games. IEEE Transactions on Automatic Control 39(11), 2321-2324.
James, M.R. and J.S. Baras (1995). Robust H∞ output feedback control of nonlinear systems. IEEE Transactions on Automatic Control 40(6), 1007-1017.
James, M.R., J.S. Baras and R.J. Elliott (1994). Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems. IEEE Transactions on Automatic Control 39(4), 780-792.
Teolis, C.A., M.R. James and J.S. Baras (1993). Implementation of a dynamic game controller for partially observed discrete-time nonlinear systems. In: Proc. 32nd IEEE CDC, pp. 2297-2298.
Whittle, P. (1981). Risk-sensitive linear/quadratic/Gaussian control. Adv. Appl. Prob. 13, 764-777.