
Copyright © 1996 IFAC. 13th Triennial World Congress, San Francisco, USA.

REDUCED COMPLEXITY NONLINEAR H∞ CONTROLLERS: RELATION TO CERTAINTY EQUIVALENCE

John S. Baras and Nital S. Patel

Department of Electrical Engineering and Institute for Systems Research, University of Maryland, College Park, MD 20742

Abstract. This paper considers the problem of constructing reduced complexity controllers for output feedback nonlinear H∞ control. Conditions are obtained under which such controllers achieve the closed-loop performance requirements. These controllers are non-optimal in general; however, in case optimality holds, they are in fact the certainty equivalence controllers. Conditions under which certainty equivalence holds are simplified, and linked to the solvability of a functional equation.

Keywords. Nonlinear control systems, Robust control, H-infinity control, Discrete-time systems, Disturbance rejection, Game theory.

1. INTRODUCTION

Since Whittle (Whittle, 1981) first postulated the minimum stress estimate for the solution of a risk-sensitive stochastic optimal control problem, it has evolved into the certainty equivalence principle. The latter states that, under appropriate conditions, an optimal output feedback controller can be obtained by inserting an estimate of the state into the corresponding state feedback law. The certainty equivalence property is known to hold for linear systems with a quadratic cost (Basar and Bernhard, 1991). The recent interest in nonlinear H∞ control has led researchers to examine whether certainty equivalence could be carried over to nonlinear systems. If certainty equivalence were to hold, it would result in a tremendous reduction in the complexity of the problem. In a recent paper (James et al., 1994), sufficient conditions were given for certainty equivalence to hold in terms of a saddle point condition. Also, in (James, 1994), a simple example is given to demonstrate the non-optimal nature of the certainty equivalence controller. An implementation of a certainty equivalence controller can be found in (Teolis et al., 1993).

This paper considers the infinite time case, and deals with establishing sufficiency conditions for a reduced complexity controller to exist. These conditions apply for both optimal and non-optimal policies. In general, obtaining an optimal solution to the output feedback problem involves solving an infinite dimensional dynamic programming equation (James and Baras, 1995). Hence, one may be satisfied with a reduced complexity non-optimal policy which guarantees asymptotic stability of the nominal (no exogenous inputs) closed-loop system, as well as achieves a pre-specified disturbance attenuation level γ. In a special case, it is shown that the policies so obtained are certainty equivalence policies. Furthermore, in doing so, one obtains an equivalent sufficiency condition for certainty equivalence which may be more tractable than the one given in (James et al., 1994). The approach is based on establishing dissipativity results, since these guarantee, under detectability assumptions, asymptotic stability of the closed-loop system when exogenous inputs are zero. Lastly, it is shown that the condition for certainty equivalence to hold is equivalent to the existence of a (unique) solution to a functional equation.
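To fix ideas before the formal development, the certainty equivalence construction mentioned above (insert a state estimate into a state feedback law) can be sketched numerically. Everything below — the scalar plant, the feedback law `u_F`, the value function `V`, and the grid — is an illustrative assumption, not taken from the paper.

```python
# Sketch of the certainty equivalence construction: an output feedback control
# is formed by inserting a "most stressed" state estimate into a state
# feedback law. All functions here are hypothetical stand-ins.

def u_F(x):
    """Illustrative state feedback law for a scalar plant."""
    return -0.5 * x

def V(x):
    """Illustrative state feedback value function (V >= 0, V(0) = 0)."""
    return x ** 2

def certainty_equivalence_control(p, grid):
    """u = u_F(x_hat), with x_hat maximizing p(x) + V(x) over the grid."""
    x_hat = max(grid, key=lambda x: p(x) + V(x))
    return u_F(x_hat)

grid = [i * 0.25 for i in range(-8, 9)]   # states from -2 to 2
p = lambda x: -4.0 * (x - 1.0) ** 2       # example information state, peaked at 1
print(certainty_equivalence_control(p, grid))  # prints -0.625
```

The estimate maximizes p(x) + V(x), anticipating the form taken by the certainty equivalence controller in Corollary 1 below.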

2. PROBLEM STATEMENT

Consider the following system:

    Σ:  x_{k+1} = f(x_k, u_k, w_k)
        y_{k+1} = g(x_k, u_k, w_k)
        z_{k+1} = l(x_k, u_k, w_k),    k = 0, 1, 2, ...

where x_k ∈ R^n are the states, y_k ∈ R^t are the measurements, u_k ∈ U ⊂ R^m are the controls, z_k ∈ R^q are the regulated outputs, and w_k ∈ R^r are the exogenous inputs. Furthermore, assume that 0 is an equilibrium point of Σ, and that U is compact. Denote the set of feasible policies by O, i.e. if u ∈ O, then u_k = u(y_{1,k}, u_{0,k-1}), where s_{i,j} denotes a sequence {s_i, s_{i+1}, ..., s_j}. Also assume that f, g, and l are continuous.

The output feedback problem is: given γ > 0, find a control policy u* ∈ O so as to ensure that there exists a finite β_{u*}(x) ≥ 0, β_{u*}(0) = 0, such that

    sup_{w ∈ l²([0,∞),R^r)} sup_{x_0 ∈ R^n} { p_0(x_0) + Σ_{i≥0} ( |z_{i+1}|² − γ² |w_i|² ) }
        ≤ sup_{x ∈ R^n} { p_0(x) + β_{u*}(x) },                                          (1)

where p_0 ∈ E, with E defined as

    E ≜ { p ∈ C(R^n) | p(x) ≤ R for some finite R ≥ 0 }.

Here, |·| denotes the Euclidean norm. Also assume that for such u* ∈ O, Σ^{u*} is z-detectable. Equation (1) is based on the dynamic game interpretation of the nonlinear H∞ control problem. It ensures that if x_0 = 0 then

    sup_{w ∈ l²([0,∞),R^r), w ≠ 0}  ||z||²_{l²} / ||w||²_{l²}  ≤  γ².                    (2)

Furthermore, define the sup-pairing

    (p, q) ≜ sup_{x ∈ R^n} { p(x) + q(x) },

and the function δ_x ∈ E, δ_x : R^n → R ∪ {−∞},

    δ_x(ξ) ≜ 0 if ξ = x, and −∞ otherwise.

An information state based solution was recently obtained in (James and Baras, 1995). The information state is defined by the following recursion:

    p_{k+1} = H(p_k, u_k, y_{k+1}),    k = 0, 1, ...,    p_0 ∈ E,

where

    H(p_k, u_k, y_{k+1})(x) = sup_{ξ ∈ R^n} { p_k(ξ)
        + sup_{w ∈ R^r} { |l(ξ, u_k, w)|² − γ² |w|² : x = f(ξ, u_k, w), y_{k+1} = g(ξ, u_k, w) } }.

The problem is solved via dynamic programming, where the upper value function M satisfies

    M(p) = inf_{u ∈ U} sup_{y ∈ R^t} { M(H(p, u, y)) }                                   (3)

for all p ∈ E, with M(p) ≥ (p, 0), and M(−β^u) = 0 for some β^u(x) ≥ 0, β^u(0) = 0. In particular, M(p) is the least possible worst case cost to go, given p_0 = p. Now, supposing such a solution M to equation (3) exists, one has the following result.

Theorem 1. ((James and Baras, 1995)). Let u* ∈ O be such that u*_k = ū(p_k), where ū(p_k) achieves the minimum in (3) for p = p_k. Then u* ∈ O solves the output feedback problem. Here, p_k is the information state trajectory initialized by p_0 = −β^u, and is such that M(p_k) is finite for all k.

Such a policy, obtained via the dynamic programming equation (3), is called an optimal policy. Now assume ū(p) is a non-optimal policy; then there exists a function W : E → R, with W(p) ≥ (p, 0) and W(−β^ū) = 0 (where β^ū(x) ≥ 0, β^ū(0) = 0), such that W satisfies, for all p ∈ E,

    W(p) ≥ sup_{y ∈ R^t} W(H(p, ū(p), y)).

Such a W is called a storage function for the output feedback policy ū. Conversely, if such a function exists, then the corresponding control policy ū(p) solves the output feedback problem.

In the well known state feedback case, denote by V the upper value function of the state feedback problem. Then V ≥ 0, V(0) = 0, and V satisfies

    V(x) = inf_{u ∈ U} sup_{w ∈ R^r} { |l(x, u, w)|² − γ² |w|² + V(f(x, u, w)) }

for all x ∈ R^n. The policy u_F such that u_F(x) = u*, where u* ∈ U achieves the infimum in the above equation, is called an optimal state feedback policy. For a non-optimal state feedback policy u, there exists a U : R^n → R, with U ≥ 0 and U(0) = 0, which satisfies

    U(x) ≥ sup_{w ∈ R^r} { |l(x, u(x), w)|² − γ² |w|² + U(f(x, u(x), w)) }

for all x ∈ R^n. Such a U is called a storage function for the state feedback policy u.

From now on, define I ⊂ O to be the set of output feedback policies which have the separated structure, i.e. which depend only on the information state p_k. Such policies are called information state feedback policies. The control policy generated by the dynamic programming equation (3) is of this type.
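The information state recursion above can be approximated on a finite grid, which also makes visible why the exact problem is infinite dimensional: p_k is an entire function of x. The scalar maps f, g, l, the grids, and the constraint tolerance below are illustrative assumptions only, not from the paper.

```python
import numpy as np

# Finite-grid sketch of the information state recursion
#   p_{k+1}(x) = sup_xi { p_k(xi) + sup_{w : x = f(xi,u,w), y = g(xi,u,w)}
#                          ( |l(xi,u,w)|^2 - gamma^2 |w|^2 ) }.
# Dynamics and tolerances are illustrative assumptions.

gamma = 2.0
X = np.linspace(-2, 2, 41)   # state grid (for both x and xi)
W = np.linspace(-1, 1, 21)   # disturbance grid

f = lambda xi, u, w: 0.5 * xi + u + 0.1 * w   # state map
g = lambda xi, u, w: xi + w                    # measurement map
l = lambda xi, u, w: xi                        # regulated output

def info_state_step(p, u, y, tol=0.11):
    """One step of the recursion: returns p_{k+1} on the grid X."""
    p_next = np.full_like(X, -np.inf)
    for i, x in enumerate(X):
        for j, xi in enumerate(X):
            for w in W:
                # keep only (xi, w) consistent with the observed transition
                if abs(f(xi, u, w) - x) < tol and abs(g(xi, u, w) - y) < tol:
                    val = p[j] + l(xi, u, w) ** 2 - gamma ** 2 * w ** 2
                    p_next[i] = max(p_next[i], val)
    return p_next

p0 = -np.minimum(X ** 2, 10.0)        # p_0 = -beta, bounded above as in E
p1 = info_state_step(p0, u=0.0, y=0.3)
print(p1.max())                        # finite where the observation is consistent
```

Even this crude sketch stores a whole array per step; the reduced complexity policies of the next section avoid propagating M over such function-valued states.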

3. REDUCED COMPLEXITY CONTROLLERS

The dynamic programming equation (3) is infinite dimensional in general. This motivates the search for reduced complexity control policies which preserve the stability properties, as well as the attenuation level γ of the closed-loop system (equation (2)).

For given x, ξ ∈ R^n and u ∈ U, define

    Ω(x, u, ξ) ≜ { w ∈ R^r | x = f(ξ, u, w) }.

Lemma 1. For any ξ ∈ R^n, u ∈ U, and a given function h : R^n × R^r × R^n → R,

    sup_{x ∈ R^n} sup_{w ∈ Ω(x,u,ξ)} h(x, w, ξ) = sup_{w ∈ R^r} h(f(ξ, u, w), w, ξ).

Proof: For any ε > 0, there exist x^ε ∈ R^n and w^ε ∈ Ω(x^ε, u, ξ) (i.e. with x^ε = f(ξ, u, w^ε)) such that

    sup_{x ∈ R^n} sup_{w ∈ Ω(x,u,ξ)} h(x, w, ξ) ≤ h(x^ε, w^ε, ξ) + ε
        = h(f(ξ, u, w^ε), w^ε, ξ) + ε
        ≤ sup_{w ∈ R^r} h(f(ξ, u, w), w, ξ) + ε.

Since ε > 0 is arbitrary, the result follows; the reverse inequality is immediate.

Next, define J^U_p : R^n × U → R as

    J^U_p(x, u) ≜ p(x) + sup_{w ∈ R^r} { |l(x, u, w)|² − γ² |w|² + U(f(x, u, w)) }.

The following result is needed to establish conditions for the existence of reduced complexity policies which achieve the desired closed-loop performance.

Lemma 2. For any u ∈ U, U : R^n → R, and p_k ∈ E,

    sup_{x ∈ R^n} J^U_{p_k}(x, u) ≥ sup_{y ∈ R^t} (H(p_k, u, y), U).

Proof:

    sup_{y ∈ R^t} (p_{k+1}, U)
        = sup_y sup_x { H(p_k, u, y)(x) + U(x) }
        = sup_y sup_x sup_ξ { p_k(ξ) + sup_w { |l(ξ, u, w)|² − γ² |w|² : x = f(ξ, u, w), y = g(ξ, u, w) } + U(x) }
        ≤ sup_x sup_ξ { p_k(ξ) + sup_{w ∈ Ω(x,u,ξ)} ( |l(ξ, u, w)|² − γ² |w|² ) + U(x) }
        = sup_ξ sup_{w ∈ R^r} { p_k(ξ) + |l(ξ, u, w)|² − γ² |w|² + U(f(ξ, u, w)) },  via lemma 1,
        = sup_ξ J^U_{p_k}(ξ, u).

The following theorem gives a sufficient condition for the existence of dissipative reduced complexity policies.

Theorem 2. Given U : R^n → R, with U ≥ 0 and U(0) = 0, suppose that for all p_k ∈ E

    (p_k, U) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u).

Then ū(p_k) ∈ argmin_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u) solves the output feedback problem, and the associated storage function is W(p_k) = (p_k, U).

Proof:

    (p_k, U) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u)
        = sup_{x ∈ R^n} J^U_{p_k}(x, ū(p_k))
        ≥ sup_{y ∈ R^t} (H(p_k, ū(p_k), y), U),  via lemma 2.

Furthermore, (p_k, U) ≥ (p_k, 0), and (−U, U) = 0. Hence (p_k, U) is a storage function, and ū is a (non-optimal) solution to the output feedback problem, with the information state trajectory initialized via p_0 = −U.

Remark: One could have considered any u_k ∈ U satisfying sup_{x ∈ R^n} J^U_{p_k}(x, u_k) ≤ (p_k, U); exact minimization is not required for the dissipation inequality above to hold.
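The policy of Theorem 2 admits a direct numerical sketch: given a candidate storage function U, pick ū(p_k) minimizing the worst-case (over x) value of J^U_{p_k}. The dynamics, the candidate U, and the grids below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Grid sketch of the reduced complexity policy of Theorem 2:
#   u_bar(p_k) in argmin_u sup_x J^U_{p_k}(x, u),  where
#   J^U_p(x, u) = p(x) + sup_w { |l(x,u,w)|^2 - gamma^2 |w|^2 + U(f(x,u,w)) }.
# Dynamics, candidate storage function, and grids are illustrative assumptions.

gamma = 2.0
X = np.linspace(-2, 2, 41)       # state grid
W = np.linspace(-1, 1, 21)       # disturbance grid
U_set = np.linspace(-1, 1, 11)   # compact control set U

f = lambda x, u, w: 0.5 * x + u + 0.1 * w
l = lambda x, u, w: x

U_fn = lambda x: x ** 2          # candidate storage function, U >= 0, U(0) = 0

def J(p, x, u):
    """J^U_p(x, u) with the sup over w taken on the grid W."""
    inner = [l(x, u, w) ** 2 - gamma ** 2 * w ** 2 + U_fn(f(x, u, w)) for w in W]
    return p(x) + max(inner)

def reduced_complexity_control(p):
    """Pick the control minimizing the worst-case (over x) value of J."""
    return min(U_set, key=lambda u: max(J(p, x, u) for x in X))

p_k = lambda x: -(x - 0.5) ** 2  # example information state
u_bar = reduced_complexity_control(p_k)
print(u_bar)
```

Note that only the function U on R^n is stored; the min-sup is recomputed at each step from the current p_k, rather than solving the infinite dimensional recursion (3).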

Corollary 1. (Certainty Equivalence). Let U = V, the upper value function of the state feedback problem, with optimal state feedback policy u_F. If for all p_k ∈ E

    (p_k, V) = inf_{u ∈ U} sup_{x ∈ R^n} J^V_{p_k}(x, u),                                (4)

then ū(p_k) = u_F(x̄), where x̄ ∈ argmax_{x ∈ R^n} { p_k(x) + V(x) }, is an optimal control policy for the output feedback problem.

Proof: Clearly (4) implies that

    sup_{x ∈ R^n} J^V_{p_k}(x, u_F(x̄)) = sup_{x ∈ R^n} inf_{u ∈ U} J^V_{p_k}(x, u)
        = inf_{u ∈ U} sup_{x ∈ R^n} J^V_{p_k}(x, u).

Hence a saddle point exists, and so for any x̄ ∈ argmax_{x ∈ R^n} { p_k(x) + V(x) } and ū = u_F(x̄),

    (p_k, V) = J^V_{p_k}(x̄, ū) = sup_{x ∈ R^n} J^V_{p_k}(x, ū)
        ≥ sup_{y ∈ R^t} (H(p_k, ū, y), V),  via lemma 2.

Hence W(p_k) = (p_k, V) is a storage function, and W(δ_x) = V(x), the optimal cost of the state feedback problem. Hence the policy is optimal for the output feedback problem.

In general, conditions for the optimal policy may be difficult to establish. However, there may exist non-optimal state feedback policies whose storage functions satisfy the conditions of theorem 2. In that case, using reduced complexity policies based on these non-optimal state feedback policies will guarantee that the system is asymptotically stable whenever the exogenous inputs are zero. Moreover, such policies will also ensure that the closed-loop system satisfies the attenuation level γ (equation (2)).

Remark: It is sufficient that the conditions in theorem 2 and corollary 1 hold only along the trajectory p_k, k = 0, 1, .... If this is the case, then U need not be a storage function for the state feedback problem. It is only when one needs the conditions to hold for p_k ∈ { δ_x | x ∈ R^n } that U is forced to be a storage function.

Now consider the condition that characterizes certainty equivalence in terms of the upper value function of the output feedback problem. In (James et al., 1994), (James, 1994) it is shown that certainty equivalence holds if, for all k ≥ 0,

    M(p_k) = (p_k, V).                                                                   (5)

In general, since the trajectory p_k is not known a priori, one needs to check (5) for all p_k ∈ E. In fact, one can show that if this were the case, then V is the only function which will satisfy (5). To do so, one requires the following inequality.

Lemma 3. Let ū ∈ I, with W its storage function. Then

    W(p_k) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u),    k = 0, 1, ...,

where U(x) ≜ W(δ_x).

Proof:

    W(p_k) ≥ sup_{x ∈ R^n} { p_k(x) + sup_{w ∈ l²([k,∞),R^r)} Σ_{i≥k} ( |z_{i+1}|² − γ² |w_i|² ) }
        = sup_{x ∈ R^n} { p_k(x) + sup_{w_k ∈ R^r} ( |l(x, ū(p_k), w_k)|² − γ² |w_k|²
              + sup_{w ∈ l²([k+1,∞),R^r)} Σ_{i≥k+1} ( |z_{i+1}|² − γ² |w_i|² ) ),  with x_{k+1} = f(x, ū(p_k), w_k) }
        ≥ sup_{x ∈ R^n} { p_k(x) + sup_{w ∈ R^r} ( |l(x, ū(p_k), w)|² − γ² |w|² + U(f(x, ū(p_k), w)) ) }
        = sup_{x ∈ R^n} J^U_{p_k}(x, ū(p_k))
        ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u).

Theorem 3. (Unicity). Let M be the upper value function of the output feedback problem. If there exists a function U : R^n → R such that M(p_k) = (p_k, U) for all p_k ∈ E, then U ≡ V, the upper value function of the state feedback problem.

Proof: It follows from lemma 3 that

    (p_k, U) = M(p_k) ≥ inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u).

Hence ū is an optimal policy, since (p_0, U) = M(p_0) for all p_0 ∈ E. Thus

    M(p_k) = sup_{y ∈ R^t} M(H(p_k, ū(p_k), y)),

and since, by lemma 2,

    sup_{x ∈ R^n} J^U_{p_k}(x, ū(p_k)) ≥ sup_{y ∈ R^t} (H(p_k, ū(p_k), y), U)
        = sup_{y ∈ R^t} M(H(p_k, ū(p_k), y)) = M(p_k) = (p_k, U),

one obtains

    (p_k, U) = inf_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u).

Setting p_k = δ_x, one obtains

    U(x) = inf_{u ∈ U} sup_{w ∈ R^r} { |l(x, u, w)|² − γ² |w|² + U(f(x, u, w)) },

with U(x) = M(δ_x) ≥ (δ_x, 0) = 0, and U(0) = M(δ_0) = 0. Hence U ≡ V.

Corollary 2. If there exists a p_k such that M(p_k) ≠ (p_k, V), then there exists no function Y : R^n → R such that M(p) = (p, Y) for all p ∈ E.

Corollary 3. Let W be a storage function for a (non-optimal) information state feedback policy ū ∈ I, and let W(p_k) = (p_k, U), k ≥ 0. Then ū(p_k) ∈ argmin_{u ∈ U} sup_{x ∈ R^n} J^U_{p_k}(x, u) solves the output feedback problem with the storage function W(p). Furthermore, if one insists that W(δ_x) = (δ_x, U) for all x ∈ R^n, then U is a storage function for a (non-optimal) state feedback policy. Also, if W = M, the upper value function of the output feedback problem, then the controller is a certainty equivalence controller.

Remark: It is clear from the proof of theorem 3 that if (5) holds, then so does (4). However, (4) is a more tractable condition, since it does not involve the upper value function M, which is what one is trying to avoid having to compute in the first place. The following alternate condition is a direct consequence of theorem 3.

Corollary 4. (Certainty Equivalence). The certainty equivalence controller is optimal if there exists a solution Y : R^n → R to the functional equation

    M(p) = (p, Y),    ∀ p ∈ E.

4. CONCLUSION

This paper has identified a strategy for generating reduced complexity output feedback policies. Sufficiency conditions have been stated which guarantee asymptotic stability of the closed-loop system in the absence of any exogenous inputs (w ≡ 0), as well as achievement of the pre-specified attenuation level γ. In the optimal case, it is observed that the controller generated by such strategies reduces to the certainty equivalence controller. In doing so, one was able to obtain a more tractable version of the certainty equivalence condition stated in (James, 1994), (James et al., 1994). Also, the certainty equivalence condition is shown to be equivalent to the existence of a solution to a functional equation.

Future research in this area pertains to showing whether (if at all) solvability of the output feedback problem implies existence of such reduced complexity controllers. Also, a more constructive approach to the problem needs to be developed. Finally, one can view the approach as trying to reduce complexity by considering storage functions that are evaluated by interpolation through the sup-pairing. Further investigation into alternate methods of interpolation may also prove fruitful.

5. ACKNOWLEDGMENTS

This work was supported by the National Science Foundation Engineering Research Centers Program: NSFD CDR 8803012, and the Lockheed Martin Chair in Systems Engineering.

6. REFERENCES

Basar, T. and P. Bernhard (1991). H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. Birkhäuser, Boston.

James, M.R. (1994). On the certainty equivalence principle for partially observed dynamic games. IEEE Transactions on Automatic Control 39(11), 2321-2324.

James, M.R. and J.S. Baras (1995). Robust H∞ output feedback control of nonlinear systems. IEEE Transactions on Automatic Control 40(6), 1007-1017.

James, M.R., J.S. Baras and R.J. Elliott (1994). Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems. IEEE Transactions on Automatic Control 39(4), 780-792.

Teolis, C.A., M.R. James and J.S. Baras (1993). Implementation of a dynamic game controller for partially observed discrete-time nonlinear systems. In: Proc. 32nd IEEE CDC, pp. 2297-2298.

Whittle, P. (1981). Risk-sensitive linear/quadratic/Gaussian control. Adv. Appl. Prob. 13, 764-777.