Generalized Riccati Equations and Stabilization of Stochastic Systems

Generalized Riccati Equations and Stabilization of Stochastic Systems

Copyright @ IFAC Control Applications of Optimization, St. Petersburg, Russia, 2000 GENERALIZED RICCATI EQUATIONS AND STABILIZATION OF STOCHASTIC SYS...

1MB Sizes 2 Downloads 87 Views

Copyright @ IFAC Control Applications of Optimization, St. Petersburg, Russia, 2000

GENERALIZED RICCATI EQUATIONS AND STABILIZATION OF STOCHASTIC SYSTEMS Tobias Damm Institut fur Dynamische Systeme, Universitiit Bremen, FRG [email protected]

Abstract: This paper is concerned with mean-square stabilization of stochastic linear systems by linear state feedback. We derive a generalized Riccati-type matrix equation to characterize stabilizability and develop a method to solve this equation. Numerical examples are given to illustrate our results. Copyright @ 2000 IFAC Keywords: stochastic systems, feedback stabilization, Newton method.

1. INTRODUCTION

an example, that our method can also be applied to solve Riccati-type matrix equations occuring in the suboptimal HOO-type stabilization problem.

We address the problem of stabilizing stochastic linear systems with state- and input-dependent noise. While in the deterministic case one can reduce linear systems to normal forms, from which their stabilizability properties can be read, to our knowledge there are no general stabilizabilty criteria available in the stochastic case. On the other hand the problem of optimal stabilization of stochastic systems (in the LQsense see e.g. (Wonham, 1968), or in the Hoo_ sense see e.g. (Hinrichsen and Pritchard, 1998) , (Ugrinovskii, 1998)) has been studied over more then thirty years; but in addressing the optimal stabilization problem one usually assumes a nonoptimal stabilization to be already given.

2. STOCHASTIC CONTROL SYSTEMS Consider the linear Ito differential equation dx(t)

= Ax(t)dt + Bu(t)dt

(1)

N

+

N

L A~x(t)dwi(t) + L B~(t)u(t)dwi(t) i=l

i=l

where (A, B) E K n x n x K

nxm

and

(A~,B~) E K nxn x K nxm ,

i

= 1, . .. ,N.

Our aim is to derive an algorithm that can be applied without any a priori knowledge on the stabilizability of the system to compute a stabilizing feedback-gain matrix if the system is stabilizable.

The (Wi(t))tER+ are uncorrelated normed zero mean real Wiener processes on a probability space (0 , .1', J.L) with respect to an increasing family (.1't)tER+ of a-algebras .1't C .r.

To this end we consider the Riccati equation of the stochastic LQ-problem. Thus, actually, we are concerned with an optimal stabilization problem, too. To solve the Riccati equation we present a nonlocal convergence result for Newton's method in a partially ordered space. The idea is, that after an appropriate rational transformation, we can solve the Riccati equation by Newton's method starting at a scalar multiple of the identity matrix. We illustrate our results by some numerical examples from the literature. In particular we give

Let L!(R+, Km) denote the corresponding space of non-anticipating stochastic processes u with

lIu(')lIh

~ c (1IlU(t) 1I2 dt) <

00,

where £ denotes expectation. The process u is considered as the control input of the given system.

437

It is known from It6-theory, that for all (xo, u) E Kn x L!(R+, Km) there exists a unique solution x(·, xo, u) of (1) with initial value x(O, xo, u) = Xo·

N

S(X)=XB+ L:ArXB~+So, i=l

N

Remark 1. By our assumption the co variance matrix Q of the N -dimensional Wiener process [W1 (t), .. . ,WN(t)]T is the identity. This is not a restriction, for if Q :j; I, we can consider uncorrelated and normed linear combinations of the Wi, if we transform the matrices Ab and B~ as follows:

[A~ B~]:1 = (VQ®In) [A~ B~]:1

Q(X)

i=l

The matrices A, B, Ab and B~ are the same as in (1), whereas Po E ll n , So E K nxm and Qo E ll m are the blocks of a given weight matrix

'

M

dom'R.

Definition 2. A solution x(',xo,u) (with given initial vector Xo and control law u E L!(R+, Km) is said to be (exponentially mean square) stable if x(', xo,u) E L!(R+,K n ) for all Xo E Kn, or equivalently if there exist K, W > 0

K n , t ~ 0; Ellx(t)1I2 ~

Ke-

> 0.

= {X E ll n I det Q(X) :j; O}.

(i) System (1) is open-loop stabilizable. (ii) System (1) is stabilizable by static linear state-feedback. (iii) The Riccati-type matrix equation 'R.(X) = 0 has a positive definite solution X > O.

2wt llxoIl 2.

Moreover, if X > 0 solves 'R.(X) = 0, then the static linear feedback control u = Q(X)-1S(X)*X minimizes the cost functional

We call system (1) (open-loop) stabilizable if for all Xo E Kn there exists a control u:z: o E L!(R+, Km) such that x(·, Xo, u:z: o ) is stable.

-E/

*

00

J( xo, u ) -

Finally the system (1) is called stabilizable by static linear state-feedback if there exists a matrix F E Km x n such that the closed loop system

[x(t;xo,u)] u(t)

M

[x(t;xo,u)] dt u(t)'

o This result in an infinite-dimensional Hilbert space setting can be found in (Tessitore, 1992). For the convenience of the reader we sketch the proof for our finite-dimensional case in the Appendix.

N

= (A + BF)xdt + 2)A~ + B~F)xdwi i=1

is internally stable.

In the following we will deal with the matrix equation 'R.(X) = 0, which we call the stochastic algebraic Riccati equation. This equation combines features of both the continuous-time and the discrete-time algebraiC Riccati equation from deterministic LQ-control and, in fact, contains these as special cases.

In the theory of deterministic linear systems it is well-known, that open-loop stabilizability implies stabilizability by static linear state-feedback and both are equivalent to the existence of LQ-optimal stabilizing state-feedback controls. The latter condition again can be expressed equivalently via the solvability of a certain algebraic Riccati equation.

A first analysis of the stochastic algebraic Riccati equation seems to have been undertaken in (Wonham, 1968).Basically Wonham proposed a non-local Newton iteration to solve the equation 'R.(X) = 0 under the assumption that the system (1) is stabilizable and a stabilizing (in the sense of Def. 10 below) initial matrix Xo is given. The same method for the deterministic Riccati equation was proposed e.g. in (Coppel, 1974).

To state an analogous result for the stochastic case we introduce the Riccati-type rational matrix operator 'R.(X) = P(X) - S(X)Q(X) -1 S(X)*.

~~]

Theorem 3. The following are equivalent:

The system (1) is said to be internally (exponentially mean square) stable if for all Xo E K n the uncontrolled solution x(·, xo, 0) is stable.

dx

= [:~

It is clear that the rational matrix operator 'R. is well-defined and analytic on the domain

where ® denotes the K ronecker product, and [A~ B~]:1 E KnN x(n+m).

VXo E

= L:BrXB~ + Qo ·

(2)

Here P, S and Q are affine linear operators on the real vector space ll n c Knxn of n x n Hermitian matrices (with entries in K = R or K = C) of the form

But unlike in the deterministic case there are no simple criteria available, whether a given stochastic system is stabilizable at all; and if the system is stabilizable it is also not clear, how to find a stabilizing matrix and thus an initial matrix for the iteration.

N

P(X)=A*X+XA+ LArXA~+po, i=1

438

To circumvent these difficulties, we present a general framework for the non-local Newton iteration to be applicable. Then we show that after a certain transformation the equation R(X) = 0 can be solved by this method without any previous knowledge on the stabilizability of (1).

(i) (ii) (iii) (iv)

+ IT is stable, i.e. a(£ + IT) C C_ . -(£ + IT) is inverse positive. a(£) C C_ and p (£-1 IT) < 1. 3X > 0 : (£ + IT)(X) < O. £

Remark 1. If £ = £A(X) = A- X + XA and IT(X) = ITAo (X) = L:[:1 Ab - X Ab with matrices A and Ab as in (1) then any of the conditions in the theorem is equivalent to the internal stability of system (1), (Khasminskij, 1980).

3. NEWTON'S METHOD IN A PARTIALLY ORDERED VECTOR SPACE In (Damm and Hinrichsen, 1999) we have shown that the non-local Newton iteration for the different types of Riccati equations relies only on a few properties of a certain class of concave operators in a partially ordered vector spacej therefore this method can be applied to more general matrix equations.

3.2 Concavity, stabilizability, and Newton's method

Let 9 be a nonlinear Frechet-differentiable operator from some open domain dom 9 C 1I. n to 1I. n . Let further dOII4 9 be some nonempty open convex subset of dom9. By 9'x(H) we denote the derivative of 9 at X in direction H.

We briefly summarize the main facts and definitionsj for convenience we restrict ourselves to the partially ordered space of Hermitian matrices. Let lI. n c Kn x n (K = R or K = C) as above denote the real space of real or complex n x n Hermitian matrices. By 11.~ = {X E 1I. n X 2': O} we denote the closed convex cone of nonnegative definite matrices and by int (11.~) its interior, i.e. the open cone of positive definite matrices. The cone 11.~ induces a partial ordering on lI. n : we write X 2': Y, if X - Y E 11.~.

Definition 8. The operator 9 is said to be dom+ 9 -concave on dom 9 if for all Y E dom 9 and Z E dom+ 9

I

9(Y) - 9(Z)

+ 9ir(Z -

Y) 2':

o.

In geometric terms 9 is dom+ 9-concave on dom 9, if the graph of 9 over dom+ 9 lies below all tangents to the graph of 9 at arbitrary points in dom 9. Thus dom+ 9-concavity on dom 9 implies concavity on dom+ 9.

3.1 Resolvent positive operators

Example 9. Consider the Riccati operator R from (2) and let dom+ R = {X E lI. n Q(X) > O} C dom R . Then the operator R is dom+ Rconcave on dom R, (Damm and Hinrichsen, 1999).

I

An important tool in our approach is the theory of positive operators in ordered vector spaces based on the work (Krein and Rutman, 1950).

Definition 10. The operator 9 is said to be stabilizable if there exists a matrix X E dom 9, such that a(9'x) C C_. The matrix X is then called stabilizing (for 9).

Definition 4. A linear operator 7 : lI. n --* lI. m is called positive (7 2': 0) if it maps 11.~ to 11.,+. A linear operator 7 : lI. n --* lI. n is called inverse positive if it is invertible and 7- 1 is positivej it is called resolvent positive, if for all sufficiently large a > 0 the resolvent operator (aI - 7)-1 is positive.

Example 11. The Riccati operator R from (2) is stabilizable if and only if the underlying system (1) is stabilizable. A stabilizing matrix X yields the stabilizing feedback-gain matrix F = Q(X)-IS(X)*, (Damm and Hinrichsen, 2000).

By a(T) we denote the spectrum of 7, and by p(T) = max{I.Alj.A E a(7)} the spectral radius. Example 5. (i) Let Ao E Knxn, then the operator IT : lI. n --* lI. n defined by IT(X) = AoX Ao is positive. IT Ao is nonsingular, then it is also inverse positive. (ii) All positive operators IT are also resolvent positive, since for a > p(IT) the resolvent (aI - II)-1 = L:~o a-(k+l)ITk is positive. (iii) Given A E Knxn, the associated Lyapunov operator £A: lI. n --* lI. n , £A(X) = A-X + X A, is resolvent positive but, in general, not positive.

Theorem 12. Let 9 and the sets dom 9 and dOII4 9 be given as above and assume that the following conditions hold: (a) The set dOII4 9 is saturated above, i.e. dom+ 9 = dom+ 9 + 11.~. (b) The operator 9 is dom+ 9 -concave on dom 9. (c) For all X E dom 9 the derivative 9'x is resolvent positive. Assume further that 9 is stabilizable and let Xo be stabilizing. If the inequality 9(X) 2': 0 has a solution X in dOII4 9, then the iteration scheme

Theorem 6. (Schneider, 1965) Let £ : lI. n --* lI. n be resolvent positive and IT : lI. n --* lI. n be positive. Then the following are equivalent:

439

defines a sequence (X k) in dom 9 with the following properties:

a stopping criterion, and regard the system as not stabilizable if e.g. det Yk < c for some k.

(i) Vk = 1,2, ... : X k ~ X H1 ~ X and thus X k E dom+ g. Moreover a(g~k) C C_. (ii) (X k ) converges to a limit matrix Xoo E dOII4 9 that satisfies g(Xoo) = 0 and is the largest solution of g(X) ~ O.

4. EXAMPLES We first consider some simple examples that illustrate differences between the stochastic and the deterministic case. Then we apply our algorithm to some problems from the literature.

If the inequality g(X) ~ 0 is not solvable in dom+ g, then either (i) fails, i.e. for some iterate X k we have X k rt dom+ 9 or a(Q~J rt. C_; or (ii) fails i.e. the X k diverge to 00 or the limit matrix Xoo is a boundary point of dom+ g.

4·1 The scalar case

Consider the scalar stochastic system

It can be shown, that the operator n satisfies the assumptions (a), (b), and (c)j but if we want to apply the theorem in order to solve the equation R(X) = 0, we need to find a stabilizing matrix Xo which might be hard.

dx

with real numbers a, b, 00, and boo By Thm. 3 this system is stabilizable if and only if there exists an fER such that the closed-loop system

3.3 An algorithmic test for stabilizability

dx

Instead of the operator n we consider on domg = {Y E ll n detY f. 0,y-1 E domn} the transformed operator g(y)

= _YR(y-1)y.

= int 1£+, then it can be shown that the triple (Q, dom g, dom+ g) satisfies the conditions (a), (b), and (c) of Thm. 12. Since our proof (in particular of (c» involves lengthy technical computations, the details shall be omitted here. Moreover a direct calculation (Damm and Hinrichsen, 2000) shows that

If we set dom+ 9

Hm ..!.. a

a-++oo

g~I = -Lp, -s Q-'s· 0

0

0

0

= (a + bf)x dt + (ao + bou) dw

is stable i.e. 2(a+bf)+(ao+bof)2 < 0 (by Thm. 6). But such an f cannot exist if bo f. 0 and the discrimant of the quadratic equation is negative, i.e. -2ab~ + 2baobo + b2 < O. Thus stabilizability in the presence of control dependent noise is not a generic property.

I

g: domg -t lln,

= ax dt + bu dt + aox dw + bou dw

4.2 A non-stabilizable system

Now we consider a system with state-dependent noise only:

dx

.

= Ax dt + Bu dt + Aox dw .

One might ask, whether controllability of the pair (A, B) is sufficient for the system to be stabilizable. This is not the case. As an example

Since M > 0 it follows that also Po - 8 0 Qo180 > 0, whence -LPo-SoQo'So is stable. Thus for large a > 0 the matrix Yo = aI stabilizes g, i.e. a(Q~I) C C_ and can serve as a starting point for the Newton iteration.

we choose A

= [~

n, [n, B

=

= [~ ~].

Ao

The pair (A, B) is controllable, but the system is not stabilizable. To see this we consider the operator LA+BF + IIAo : 1£2 -t 1£2 as in Rem. 7, with F = rh, hJ E R1X2. Since 1£2 C R 2X2 is a three-dimensional space, we can represent this operator by the matrix

To stabilize system (1) one can thus proceed as follows: • Choose a matrix M > 0 (e.g. M = I) and consider the corresponding operator g. • Find a > 0 large enough, S.t. a(Q~I) C C_. • Start the standard Newton-iteration (3) for 9 with initial matrix Yo = al. • Watch in each step, whether the iterate Yk is stabilizing for g. If one Y k is not stabilizing, then (1) is not stabilizable. • If the Y k converge to some Y00 E int 1£+ then (1) is stabilizable and F = Q(Yc:;l )8(Yc:;1). is a stabilizing feedback-gain matrix. If otherwise the Yk leave int 1£+ or converge to some then (1) is not stabilizable. Yoo E

But det(MF - I) = 2(11 - 12 - 1)2 ~ 0, whence for all F the matrix MF (and thus the operator LA+BF+IIAo) has an eigenvalue >. with Re>. ~ l.

In extreme cases it is, of course, numerically hard to decide whether Xoo E or notj one can use

Though we can move the eigenvalues of LA+BF as far as we wish into the left half plane, we cannot

1-h 2+~212 ]ER

MF=[11 3 1 211

with respect to the basis

81£+

81£+

440

3X3 ,

[~ ~], ! [~ ~ ], [~

n.

stabilize the operator £A+BF + nAo. H, however, we decrease the noise intensity and replace Ao by aAo, then our test produces a stabilizing feedback 2 for a < 1/2. For a 2 ~ 1/2 our test suggests, that the system is not stabilizable: The iterates converge to o.

Biswas shows that the system can be stabilized for noise intensities a 2 < 3.865, whereas our algorithm produces stabilizing feedback-gain matrices up to the value a 2 = 5.306.

We also might alter the matrix A to A - 01. By the above discussion it is clear, that for 0 < 1/2 the system cannot be stabilizable. For 0 ~ 1/2 + ~0-1O, however, our algorithm (starting at Yo = I, If M = -I) finds a stabilizing feedback, whereas for 0 = 1/2 the iterates converge to O.

4.5 A matrix equation from robust control

H e.g. 0

To show the applicability of our method in robust stochastic control, we consider a linear model (of a two-mass spring system) from (Ugrinovskii, 1998), where further details can be found:

= 1/2+10-6 , then for F = [1225.7,1226.7]

= (Ax + Blv + z = Cx + Du.

we havemaxRea (£A-a/+BF + IIAo) ~ -2.10- 6 •

4·3 A

dx

We want to find a feedback-law u = Fx, such that the closed-loop system is internally (Le. for v == 0) stable and the effect of the disturbance v E L2(R+, Ri) on the output z E L2(R+, Rq) is small; see also (Hinrichsen and Pritchard, 1998). This problem is a stochastic analog of the Hoo_ control problem and leads to the parametrized matrix equation (where the attenuation value "{ is an upper bound for the effect of v on z) :

linear oscillator with uncertain parameters

In (Biswas, 1998) the following model of a harmonic linear oscillator with random damping and restoring forces was analyzed:

The parameters ai, a2 and a3 are uncertain and modelled as uncorrelated random Gaussian processes with mean values 0.1 = -1, 0.2 = 2, and 0. 3 = 1 and covariance a 2 • The model can thus be written in the form (1) with A = B

=

[n,

AA

= [~

'R-y(X)

[~1 ~],

+ X A + A~X Ao + C*C ,,{-2 B1Bn

(4) X = 0,

to be solved in int 1ft..

~], A5 = [~ ~ ]. Bo = [~] .

As in Sec. 3 we have the transformed operator

I

det Y t- O}, and with on dom 9 = {Y E 1fn dom+ 9 = int 1ft. the triple (g, dom g, dom+ g) satisfies (a), (b), and (c) in Thm. 12.

Our method yields a slight improvement and produces a stabilizing feedback for a 2 = 0.2265.

To solve (4) we thus need to find an Yo E domg, such that the derivative

An example of satellite dynamics

Biswas (1998) also considers a model for the dynamics of a satellite. For details he refers to (Sagirow, 1970). The linear model has the form

[

= A* X

- X (B 2 (D* D)-l Bi -

Biswas derives a sufficient criterion for stabilizability, which allows him to stabilize the system up to noise intensities a 2 < 0.2247.

4·4

B2U)dt + Aoxdw ,

~~ ] = [ ~1 -c(l ~ ( +

2 )]

(II and £ as in Rem. 7) is stable. H (A, C) is observable, it can be shown, that such an Yo exists. Obviously Yo can be chosen independently of "{.

[~~]

For the data in (Ugrinovskii, 1998):

[6~t) 6~t)] [~~] + [1 + °17(t) ] u .

A

= [J/4

5~4 ~ !].

5/4 -5/4 0 0

Ao

=

[-~/4 1~4 ~ ~]. 1/4 -1/4 0 0

[~f] = [~ ~ ~ n· c = [~ !~ ~ l = [~j.

Here 6 and 6 are correlated zero-mean Gaussian processes with covariances £(€l> = a 2 , £(€i) = . (4a 2 +2(4 )2, and £(66) = 2a 2 c; the parameter c is taken as c = 0.4266, and 17 is a zero-mean Gaussian process with covariance 0.1.

D

=

we found Yo 2I - (er e4 + er el) (with canonical unit vectors ei) to be stabilizing. For "{ = 2 our algorithm reproduces (in 10 steps) the solution obtained by Ugrinovskii; by a bisection search we found the optimal attenuation value to be "{* ~ 1.8293. For "{ = 1.8293 the solution

Note that in this example the noise terms are correlated. We can, however, transform the system as indicated in Rem. 1 and apply our results to the arising system with uncorrelated noise terms.

441

_

X -

-1 ~

Yoo

~

4.6440 -6.4837 -4.7299 [ -3.3828

-6.4837 19.1359 17.4201 10.4627

- 4.7299 17.4201 18.9092 9.8687

-3.3828] 10.4627 9.8687 7.0731

Wonham, W .M. (1968) . On a matrix Riccati equation of stochastic control. SIAM J. Control Optim ., 6, 681-698.

>0

of R.y(X) = 0 is obtained in 13 steps, whereas for = 1.8292 the 11th iterate Y u is not stabilizing (i.e. by Thm. 12 in this case g.., (Y) = 0 is not solvable in int 1£+.).

"I

Appendix A. PROOF OF THEOREM 3 We only sketch (i)::} (iii) . For T > 0 we consider the finite-horizon cost-functional

5. CONCLUSIONS Jr(xo, u)

We have developed an algorithmic method to decide, whether a given linear stochastic control system is stabilizable, or not. This method can also be applied in problems of optimal and robust stochastic control, as we have demonstrated by means of numerical examples.



j o

T [...

]* M

[x(t~(t)' . xo

u)] dt ,

and on dom+ R :> 1£+. (see Ex. 9) the differential Riccati-equation P = R(P) with initial condition P(T) = 0 E dom+ R. By time-invariance there exists a number il > 0 independent of T, such that the solution PT(t) = P(t; T, 0) E dom+ R exists on ]T - il, T] . For 0 ~ t ~ T < il we can therefore consider a control of the form

6. ACKNOWLEDGEMENTS This work was supported by the Deutsche Forschungsgemeinschaft (D FG).

where Ut(t) E L~([O,T]) is arbitrary. An application of Ita's formula, like e.g. in (Hinrichsen and Pritchard, 1998), yields for all T E [0, il[

REFERENCES Biswas, S.K. (1998) . Robust stabilization of linear systems in the presence of Gaussian perturbation of parameters. Optim. Contr. Appl. Meth., 19, 271-286. Coppel, W.A. (1974) . Matrix quadratic equations. Bull. Aust. Math. Soc., 10,377-401. Damm, T. and D. Hinrichsen (1999) . Newton's method for a rational matrix equation occuring in stochastic control. Report 443. Institut flir Dynamische Systeme, Univ. Bremen. Damm, T. and D. Hinrichsen (2000). On the parameter dependence of a class of rational matrix equations occuring in stochastic optimal control. In: Proc. MTNS 2000. Hinrichsen, D. and A.J . Pritchard (1998). Stochastic Hoo. SIAM J . Cont. , 36, 1504-1538. Khasminskij, R.Z. (1980) . Stochastic Stability of Differential Equations. Sijthoff & Noordhoff, Alphen aan den Rijn, NL. Krein, M.G. and M.A. Rutman (1950). Linear operators leaving invariant a cone in a Banach space. Amer. Math. Soc. Transl., 26, 199325. Sagirow, P. (1970). Stochastic Methods in the Dynamics of Satellites. Springer-Verlag, New York. Schneider, H. (1965). Positive operators and an inertia theorem. Num. Math., 7, 11-17. Tessitore, G. (1992). Some remarks on the Riccati equation arising in an optimal control problem with state- and control- dependent noise. SIAM J. Control Optim., 30(3), 717-744. Ugrinovskii, V.A. (1998). Robust Hoo-control in the presence of stochastic uncertainty. Internat. J. Control, 71(2), 219-237.

Jr(xo, u)

= X~PT(O)XO T



j ui

(t)Q(PT(t»Ut (t) dt .

o

As Q(P) > 0 for P E dom+ R the cost functional is minimized if Ut O. Obviously PT(O) > 0, since JT(XO , u) > 0 for all Xo . Moreover PT(t) = PT-t(O) is uniformly bounded for t E]T - il, TJ, since min JT(XO, u) ~ J(xo, u xo ) for all T > 0, Xo E K n , and stabilizing U xo E L2(~,Km). Therefore the solution PT can be extended to an open neighbourhood of T - il, and it follows that for all T > 0 the solution PT exists in fact on ] - 00, T]. For T' > T > t we have PT' (t) > PT(t), since Jr(xo, u) increases with T. Therefore PT(t) converges to some Poo > 0 as T ~ 00, where Poo is independent of t. By continuity X = Poo solves R(X) = 0 and yields an LQ-optimal feedback control.

=

442