Chapter II Optimal Bayesian Control of General Stochastic Dynamic Systems

Chapter II Optimal Bayesian Control of General Stochastic Dynamic Systems

Chapter I1 Optimal Bayesian Control of General Stochastic Dynamic Systems I n this chapter, we develop a systematic procedure for obtaining optimal ...

2MB Sizes 0 Downloads 109 Views

Chapter I1

Optimal Bayesian Control of General Stochastic Dynamic Systems

I n this chapter, we develop a systematic procedure for obtaining optimal control policies for discrete-time stochastic control systems, i.e., for systems where the random variables involved are such that they all have known probability distribution functions, or at least have known first, second, and possibly higher moments. Stochastic optimal control problems for discrete-time linear systems with quadratic performance indices have been discussed in literature under the assumptions that randomly varying systems parameters and additive noises in the plant and/or in the state variable measurements are independent from one sampling instant to the next.67,80T h e developments there do not seem to admit any ready extensions to problems where the independence assumption is not valid for random system parameters, nor to problems where distribution functions for noises or the plant parameters contain unknown parameters. I n this chapter, a method will be given to derive optimal control policies which can be extended to treat a much larger class of optimal control problems than those mentioned above, such as systems with unknown parameters and dependent random disturbances. This method can also be extended to cover problems with unknown parameters or random variables with only partially known statistical properties. Thus, we will be able to discuss optimal controls of parameter adaptive systems without too much extra effort. T h e method to be d i s c u s ~ e d ~partly ~ , ' ~ overlaps those discussed by other investigators, notably that of F e l ' d b a ~ r n . Although ~~ the method presented here is essentially its equivalent,105a the present method is 20

1.

FORMULATION OF OPTIMAL CONTROL PROBLEMS

21

believed to be more concise and less cumbersome to apply to control problems. For example, the concept of sufficient statistic^'^ are incorporated in the method and some assumptions on the systems which lead to simplified formulations are explicitly pointed 0 ~ t . l 5 J ~T h e evaluations of various expectation operations necessary in deriving optimal control policies are all based on recursive derivations of certain conditional probabilities or probability densities. As a result, the expositions are simpler and most formulas are stated recursively which are easier to implement by means of digital computers.

1. Formulation of Optimal Control Problems

A. PRELIMINARIES I n this section, purely stochastic problems are considered. Namely, all random variables involved are assumed to have known probability densities and no unknown parameters are present in the system dynamics or in the system observation mechanisms. We consider a control system described by

where p,,(x,) is given and observed by

and where xk is an n-dimensional state vector at kth time instant, uk is a p-dimensional control vector at the kth time instant, U , is the set in the p-dimensional Euclidean vector space and is called the admissible set of controls, ( k is a q-dimensional random vector at the kth time instant, y k is an m-dimensional observation vector at the kth time instant, and vk is an r-dimensional random vector at the kth time instant. T h e functional forms of Fk and G, are assumed known for all k. Figure 2.1 is the schematic diagram of the control system. T h e vectors f k and qk are the random noises in the system dynamics and in the observation device, or they may be random parameters of the system. I n this chapter, they are assumed to be mutually independent, unless stated otherwise. Their probability properties are assumed to be known completely. T h e problem of optimal controls with imperfect probability knowledge will be discussed in the next chapter.

22

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

CONTROLLER WITH MEMORY

I

PLANT

"I

F, ( XI

,u,C, 1

DELAY

I

XI

-

I

I

OBSERVATION Gi(Xi,Si I

Fig. 2.1.

Schematic diagram of general stochastic control system.

From now on, Eq. (1) is referred to as the plant equation and Eq. (2) is referred to as the state variable observation equation or simply as the observation equation. T h e performance index is taken to be

This form of performance index is fairly general. It contains the performance indices of final-value problems, for example, by putting W i= 0, i = 1,..., N - 1 and taking W , to be a function of xN only. We use a notation uk to indicate the collection u,, , u1 ,..., uk. Similarly xk stands for the collection x,,,x1 ,..., x,. Although in the most general formulation the set of admissible control at time k, U , , will depend on xk and uk-l, U, is assumed in this book to be independent of xk, uk-l. a. Optimal Control Policy One of our primary concerns in the main body of this book is the problem of deriving optimal control policies, in other words, obtaining the methods to control dynamic systems in such a way that some chosen numbers related to system performances are minimized. Loosely speaking, a control policy is a sequence of functions (mappings) which generates a sequence of control actions u,, , u1 ,... according to some rule. T h e class of control policies to be considered throughout this book is that of closed-loop control policies, i.e., control policies such that the control uk at time k is to depend only on the past and current observations y k and on the past control sequences uk-l which are assumed to be also observed. A nonrandomized closed-loop control policy for an N-stage

1.

FORMULATION OF OPTIMAL CONTROL PROBLEMS

23

control process is a sequence of N control actions ui, such that each ui takes value in the set of admissible control Ui, ui E Ui, 0 i N - 1, depending on the past and current observations on the system yo , y l ,..., yi-l , y i and on the past control vectors uo ,..., uiPl.Since past controls uo ,..., uip1 really depend on y o ,..., y i - l , ui depends on y o ,..., yi-l ,y i .* T h u s a control policy +(u) is a sequence of functions , ,..., +N-l such that the domain of +i is defined to be (mappings) the collection of all points

< <

+,,

yi

= (yo

,...,yi),

with y j E Y j , 0


where Y j is the set in which t h e j t h observation takes its value, and such that the range of di is Ui. Namely, ui = ui(yi, ui-l) = q&(yi)E Ui.+ When the value of ui is determined uniquely from yi, ui-l,that is when the function +% is deterministic, we say a control policy is nonrandomized. When +i is a random transformation from yi, ui-l to a point in U i , such that +i is a probability distribution on U i ,a control policy is called randomized. A nonrandomized optimal control policy, therefore, is a sequence of mappings from the space of observable quantities to the space of control vectors; in other words, it is a sequence of functions which assigns definite values to the control vectors, given all the past and current observations, in such a way that the sequence minimizes the expected value of J. From (3), using E ( - )to denote the expectation operation, the expected value of J is evaluated as

N

= 1

R,c

Essentially, the method of F e l ' d b a ~ mconsists ~~ in evaluating E( W,) by

* For the sake of convenience, initially available information on the system is included in the initial observation. + IL" = U,,(Y") = $<,(Yo), ..., = Zl,(Yi, ui-1) = U , ( Y i , $"(YO),.... L d Y " ) ) = MY').

24

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

A

---

where dxk 6 dx, dx, ... dx,, dyk--l dy, dyk-,, and duk-1 6 du, ... du,-, , and writing p ( x k ,yk-l) in terms of more elementary probability densities related to (1) and (2). Since we do not follow his method directly, we will not discuss it any further in this chapter. However, in order to give the readers some feeling for and exposure to his method, we give, as an example in Section6, his method of the treatment of a particular class of parameter adaptive systems. T h e other method, to be developed fully in this and later chapters, evaluates not R, directly but the conditional mean of Wk ,

and generates P(XK

I Y k , uk--l)

and P(Y,+, I Y k , U k ) ,

0

< fz <

-

1

recursively. See (21) and (22) for the significance of these expressions. b. Notations

It may be helpful to discuss the notations used in the book here. I n the course of our discussions, it will become necessary to compute various conditional probability densities such asp(x,+, I yi).As mentioned before, we are interested in obtaining optimal closed-loop control policies; i.e., the class of control policies to be considered is such that the ith control variable ui is to be a function of the past and current i N - 1. observable quantities only, i.e., of yi and ui-l only, 0 If nonrandomized control policies are used," then at time i, when the ith control ui is to be determined as a function of yi as ui = +i(yi),it is the functional form of +i that is to be chosen optimally, assuming +i-l are known. I n other words, +i depends on +i-l. Note that even though the function di is fixed, +i(yi) will be a random variable prior to time i since yi are random variables. It will be shown in the next section that these +'s are obtained recursively starting from +N-l on down to +". Therefore, +$ is expressed as a function of 4, ,...,+ i - l , which is yet to be determined. Therefore, it is sometimes more convenient to express ui = +i(yi) as ui = ui(ui-l,yi), whereby the dependence of ui on past controls +o ,..., di-, is explicitly shown by a notational abuse of using ui for +j , 0 6 j i. Since ui is taken to be a measurable function of yi,

< <

<

* It is shown later on that we need consider only the class of nonrandornized closedloop control policies in obtaining optimal Bayesian control policies.

1.

25

FORMULATION OF OPTIMAL CONTROL PROBLEMS

Of course, one must remember that p ( *j yi,u i ) is a function of ui , among others, which may yet be determined as a function of yi (or equivalently of ui-l and yi). T o make this explicit, sometimes a subscript +i is used to indicate the dependence of the argument on the form of the past and current control, e.g., P&Z+l

I x, Y , ) 9

= P(%+l

I x, > u,

=MY”).

When randomized control policies are used, the situation becomes more complicated since it is the probability distribution on Ui that is to be specified as a function of ui--l and y i ;i.e., a randomized control policy is a sequence of mappings , ,..., +N-l such that 4%maps the space of observed state vectors yi into a probability distribution on Ui. A class of nonrandomized control policies is included in the class of randomized control policies since a nonrandomized control policy may be regarded as a sequence of probability distributions, each of which 1 N - 1. T h e assigns probability mass 1 to a point in Ui , 0 question of whether one can really find optimal control policies in the class of nonrandomized control policies is discussed, for example, in Ref. 3. For randomized control policies,

+,

< <

hence 1 yi) is a functional depending on the form of the density function of ui , p(u, 1 yi). When ui is nonrandomized, 1 yi) is a functional depending on the value of ui and we write P@*(YZflI Y Z )= [P(YZ+l I YZ)IU,=+*(Y’) = P(Yt+l I Y t >ut

= cbz(Y”)

or simply P(Yi+l I Yi,U i ) . T h e variables ui or ui are sometimes dropped from expressions such as p ( - I yi,ui)or p ( - I yi,ui)where no confusion is likely to occur. Let p&+’(xi,yz-1) d(xi, yi-1) (5) be the joint conditional probability that the sequence of the state vectors and observed vectors will lie in the elementary volume dx, dxi dyo ... dyiPlaround xi and yi-l, given a sequence of control specified by @-I, where the notation - 1 .

d(xi, yi-1) = d ( x , )..., x, ,yo ,... yi-1) )

26

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

is used to indicate the variables with respect to which the integrations are carried out. Let P(Yk I

Xk)

dYk

(7)

be the conditional probability that the observation at time k lies in the elementary volume dyk about y k , given xk . Finally, let

be the probability that the initial condition is in the elementary volume about xo . Various probability density functions in ( 5 ) , (7), and (8) are assumed to exist. If not, they must be replaced by Stieltjes integral notations.

B. DERIVATION OF OPTIMALCONTROL POLICIES We will now derive a general formula to obtain optimal control policies. At this point, we must look for optimal control policies from the class of closed-loop randomized control policies. a. Last Stage Consider the last stage of control, assuming yN-l have been observed and u ~ have - ~been determined somehow, and that only the last control variable uN-l remains to be specified. Since uN-l appears only in W, , EJ is minimized with respect to uNPlby minimizing E W , with respect to u , - ~ . Since

R,

=

E ( W N )= E[E(W, I yNpl,u"')]

(9)

where the outer expectation is with respect to yN-l and u ~ - R~, ,is minimized if E( W , j yN--l,u ~ -is~minimized ) for every yN-l and u ~ - ~ . One can write

1.

FORMULATION OF OPTIMAL CONTROL PROBLEMS

27

We will use Eq. (13) throughout this section. Developments are quite similar when this Markov property58 does not hold. One merely uses the left-hand side of Eq. (13). See Section 2 of Chapter IV for mbre general discussions of the Markov property. I n particular, in (12),

since uNPlaffects xN but not xNPl. Define

I n (16), the probability density p ( x N 1 xN-l , u ~ - is ~ )obtainable from the known probability density function for f N - , and the plant equation (1) under appropriate assumptions on (1). See for example Eq. (27). T h e second probability density in (16), p(xN-l I yN-l, uN-'), is not generally directly available. It will be shown in the next section how it can be generated. For the moment assume that it is available.

11.

28

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

T hus A, is in principle computable as a function of yN-' and uNpl, hence its minimum with respect to u,-~ can in principle be found. . Denote this minimizing uNPl by Z L $ - ~Define min E ( W , I yN-l) "N- 1

=

yN*

(17)'

Thus, the minimization of EW, with respect to pN-l is accomplished by that of E( W , 1 yN--l,z P 2 ) , which is achieved by taking pN-l = 6(uNp1 - u $ - ~ ) Since . A, is a function of yN-l and uN--l, u$-l is obtained as a function of yN-l and uN-2 as desired. See Fig. 2.2 for illustrations of random and nonrandom control policies and the corresponding values of the conditional expectation of W, . I n Eq. (15) the expression pN-l(uN-l) represents a probability density function of uNPl E i7,-', where the functional form of the density function depends on the history of observation or on yN-l. T h e functional form of pN-l specifies the probability ~ , - ~ ( u ~& - ~,),with which a control in the neighborhood of a point uN--lis used in the last control stage. However, we have seen that this generality is not necessary, at least for the last control u , - ~ , and we can actually confine our search for the optimal u,-~ to the class of nonrandomized control policies; i.e., the value of the optimal control vector uNPlwill actually be determined, given y N - - l , and it is not that merely the form of the probability density will be determined. We can see by similar arguments that the ui are i N - 1. Thus, we can remove u,-~ from all nonrandomized, 0

< <

Fig. 2.2. ~~

E(WN I yN-') versus uNPl .

~

+ If is not unique, then the following arguments must be modified slightly. By choosing any one control which minimizes AN and concentrating the probability mass one there, a nonrandomized control still results.

1.

FORMULATION OF OPTIMAL CONTROL PROBLEMS

29

the probability density function in Eq. (11) and we can deal with p ( x N I y N - l ) with the understanding that uNPlis uniquely determined by

y N-1.

Figure 2.3 illustrates this fact schematically for scalar control variable. A typical , O ~ - ~ ( Umay ) have a form like Fig. 2.3(a), where UNp1is taken to be a closed interval. Optimal p N p 1 , however, is given by Fig. 2.3(b). A nonrandomized control is such that a point in UN-l is taken with probability I . If UNPlconsists of Points A, B , and C for two-dimensional control vectors, as shown in Fig. 2.4(a), then there are three possible nonrandomized u , , , - ~, i.e., uN-l given by Point A, Point B , or Point C, whereas a neighborhood of any point in Triangle ABC typically may be chosen with a randomized control policy with probability P , , + ~ ( U ) du, where du indicates a small area about u in Triangle ABC. This is shown in Fig. 2.4(b).

b. Last Two Stages Putting aside, for the moment, the question of how to evaluate

p(xN-l I y N - l ) , let us proceed next to the consideration of optimal control

*

PN-,

I

Fig. 2.3.

rPROBABILITY

MASS 1

Schematic representation of randomized and nonrandomized control.

Fig. 2.4. Admissible control variable with the randomized and nonrandomized control policies.

30

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

policies for the last two stages of the process. Assume that yN-2and u N p 3 are given. T h e control variable uNP2appears in WNPland W , . Since E[WN-l(XN-l

7

UN-2)

+

WN(xN

f

=

uN-l)l

+

WN-l

1 yN-27uN-3)i

WN

where the outer expectation is with respect to yN-2, and since a choice of certain uN-2 transforms the problem into the last stage situation just considered, E J is minimized by choosing uN--2such that it minimizes E( WNp1 W N1 yN-2,~ ~ - 3for ) every yNp2and by following this uN-2 by usp1 . Analogous to (15) we have

+

E( WN-, 1 YN-',

where

= p(UN--2

pN-2(uN--2)

and where

AN-^ A Also

1 yN-2>uN - 3)

j

I

~ N - i ( x N - i > uN-2) ~ ( x N - 1 xN-2

x

I yNP2,U N P 3 ) d ( x N - I

p(xN-2

E(W, I y ~ - 2 , . ~ - 3 )

= E[E(W,

(18)

AN-~PN-z dUN-2

=

ZIN-')

Ip - 1 ,

uN-2)

(19)

> xN-2)

I

~ N - z ) Y N - 2 , +31

since y N - 2 C yN-1. This is seen also from

p(' I yN-2,UN-')

=

/ p ( ' 1 yN-l, U N p 2 ) ) ( Y N - 1 1 yN-', P(UN-2)d(YN--1

UN-')

uN--2)

9

where use is made of the elementary operations (1) and (2) discussed in Chapter I. T h e optimal pN-2 is such that it minimizes E ( W N p , WN*)where the asterisk on WNis to indicate that u $ - ~is used for the last control. Now,

+

min E(WN-, PN-?

+ WN*I yNp2,uN-')

= min[E(WN-, j PN -2

yNw2,uN-')

+ E( WN*I yN-2,u"')]

=

min E[WN-, PN-2

+ E(w,*

= min PN-2

+ yN* I

= min PN-?

j y ~ - l u,~ - - 2 )

1 y ~ - 2 , +31

E[WN-, y ~ - 2 , +31

j [XN-l

+j

yN* P ( Y N - 1

PN--2 duN-Z

I U N - 2 ? YN-')

'YN-11

(20)

1.

31

FORMULATION OF OPTIMAL CONTROL PROBLEMS

I uN-2,yN-2)is available. Defining yN-, by

where it is assumed thatp(y,_, YN-1

=

'N-1

+1

YN* P b N - 1

1 YN-2, uN-2)

dYN-l

Eq. (20) is written as

Comparing this with Eq. (15), it is seen that the optimal control is such that pzP2 = S(uN-, - us__,),where usp2 is uN-2 which minimizes Y ~ - ~ , and the control at ( N - 2)th stage is also nonrandomized. c. General Case Generally, E(C;+, Wi) is minimized by minimizing E(CkN1Wilyk,uk--l) with respect plC for each y k , uk-l and following it with p+:l ,..., pzPl . I t should now be clear that arguments quite similar to those employed in deriving p$-l and p$-2 can be used to determine pk*. Define Y k

='k

+ 1Y;c*,l

p(yk

I y"',

uk-l) d y k >

< <

(21)

=0 where p(yk 1 yk-l, uk-l) is assumed available and where A, is given, assuming p(xk-l I yk--l, uk-2) is available, by Y;+l

=

(

J

wk(xk

!

uk-l)

P(.k

1 xk-l

1

.k-l)

P(.k-l

I ?-',

uk-2)

d(xk

9

xk-l>,

1
Then optimal control at time k - 1, uz-, miny, uk--l

= yk*,

1

, is Uk-1 , which minimizes yk :


(23)

By computing yk recursively, optimal control variables are derived in , * ,..., uo*. Once the optimal control policy is the order of U Z - ~ uN--2 derived, these optimal control variables are used, of course, in the order of time uO*,ul*, ..., us-, . T h e conditional probability densities assumed available in connection with (21) and (22) are derived in Section 1,C. At each time K , uo* ,..., uz-l and yo ,..., y k are no longer random but known. Therefore, uk* is determined definitely since

and

$k

is given as a deterministic function.

32

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

From (22), A, = 0 if W, = 0. Therefore, if we have a final value problem, then A, = 0, k = 1 , 2,..., N - 1 and, from (21), ylC’s are simply obtained by repeated operations of minimization with respect to u’s and integrations with respect to y’s. From (21) and (23) we have yk* = min y k

This is precisely the statement of the principle of optimalityZ0 applied to this problem where

T o see this simply, let us assume that the state vectors are perfectly observable, i.e., y . = x. I ) O
which is the result of applying the principle of optimally to

We have the usual functional equation of the dynamic programming if the {x,)-process is a first-order Markov sequence, for example, if fk’s are all independent. Th en

When the observations are not perfect, then the arguments of yk* are generally yk-l and u,-~. Th u s the number of the arguments changes with k. yN* is computed as a function of yN-l and u N - ~ and, at step k, y k in y$+, is integrated out and the presence of u,-~ is erased by the minimization operation on uk--l to obtain yk* as a function of yk-l and u k - ~As . we will discuss in Section 3, when the information in (yk, uk-l) is replaceable by that in quantities called sufficient statistic^,'^ s, , and when s, satisfies a certain condition, then the recursion relation for the

1.

FORMULATION OF OPTIMAL CONTROL PROBLEMS

33

general noisy observation case also reduces to the usual functional equation of dynamic programming

where sk satisfies the relation Sk

= $%-1

>Ylc7

Uk-1)

for some function +. For detail, the reader is referred to Sections 11, 3 and IV,2. Similar observations are valid for recurrence equations in later chapters.

C. DERIVATION OF CERTAIN CONDITIONAL PROBABILITY DENSITIES Equations (21)-(23) constitute a recursive solution of optimal control policies. One must evaluate y's recursively and this requires that the + ~ or, equivalently, conditional densities $,(xi 1 yi) and ~ , ( y ~I yi) $(xi 1 yi,ui-') and 1 yi, ui) are available.* We have noted, also, that these conditional densities are not readily available in general. T h e general procedure for deriving such densities are developed in Chapters 111 and IV. To indicate the method, let us derive these densities under the assumption that noise random vectors 5's and 7's are mutually independent and independent for each time. Consider a conditional density , yi+l 1 yi,ui). By the chain rule, remembering that we are interested in control policies in the form of ui = ~ $ ~ ( y < , 0 i N - 1,

< <

P(.i+l

I

Yitl

I Y i >4 = P(Yi+l I Y i , 4P(%+l I Yi", ui)

We can write, using (13), P(X2

7

%+l ? Y i + l

I Yit 4

* Alternately, one can just as easily generatep(x,+, j y t , ut) andp(y,+, I y ' , u ' ) recursively. They are related by P ( ~ , +I IY',

21')

=

sp(x.+l I x, , u,) p ( x , I y', u*-') dx,

34

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

where the denominator of (26) givesp(yi+l I yi, ui)and wherep(xifljxi, ui) and p(yi I xi) are obtainable from the plant and observation equations and the density functions for ti and r l i . T h e recursion formula is started from p ( x , I yo), which may be computed by the Bayes formula P(X0

I Yo)

Po(Xo)P(Yo I xo)

=-

JPo(.o) P(Yo I xo) dxo

where p,(x,) is assumed available as a part of the a priori information on the system. Equation (26) is typical in that the recursion formulas for p ( x , I yi, 1 yi,ui) generally have this structure for general stochastic and and adaptive control problems in later chapters. 1 x~i ,ui) is computed from the I n the numerator of Eq. (26), p ( ~ , + plant equation and the known density function for ti and p(yi.t-l 1 xi+l) is computed from the observation equation and the known density function for q i . T h e first factor p ( x , 1 yi, ui-') is available from the previous stage of the recursion formula. With suitabIe condition^^^.^^^" and where Jc and J q are appropriate Jacobians and where the plant and the observation equations are solved for ti and r l i , respectively, and substituted in the right-hand sides. When 4's and 7's enter into Eqs. (1) and (2) additively, then the probability densities in Eq. (26) can be obtained particularly simply

I.

FORMULATION OF OPTIMAL CONTROL PROBLEMS

35

from the probability densities for 6’s and 7’s. See Ref. 1 for multiplicative random variable case. For example, if Eqs. (1) and (2) are xk+l

= Fk(xk

Y

Y k = Gk(Xk)

then

uk)

+

+

Ek

Tk

and and q k = Y k - Gk(xJc)

are substituted in the right-hand sides of Eq. (27). Thus, if

then

and

Equation (26) indicates clearly the kind of difficulties we will encounter time and again in optimal control problems. Equation (26) can be evaluated explicitly by analytical methods only in a special class of problems. Although this special class contains useful problems of linear control systems with Gaussian random noises as will be discussed in later sections of this chapter, in a majority of cases, Eq. (26) cannot be integrated analytically. We must resort either to numerical evaluation, to some approximate analytical evaluations of Eq. (26), or to both. Numerical integrations of Eq. (26) are nontrivial by any means since the probability density function p ( x , I yi,ui-l) will not be any well-known probability density in general, cannot be represented conveniently analytically, and hence must be stored numerically. See Appendix I V at the end of this book and Chapter I11 for additional details. Also see Ref. 73a. by ) I n order to synthesize ui*,it is necessary to compute $(xi I yi,zk’ (26) and then to compute Xi+l , to generate p(yi+l 1 yi, ui), to evaluate E(yiY,, 1 yi,ui), to obtain yi+l , and finally to minimize yi+l with respect to ui .

36

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

N,ote that the controller must generally remember yi and ui-l at time

i in order to generate ui*.

Although some of the information necessary to compute ui can be precomputed, i.e., generated off-line, all these operations must generally be done on the real-time basis if the control problem is the real-time optimization problem. If K sampling times are needed to perform these operations, one must then either find the optimal control policy from the class of control policies such that ~i

= +i(y2pk,ZL-'),

i

=

k, k

+ 1,..., N

-

1

where uo* through must be chosen based on the a priori information only, or use approximations so that all necessary computations can be performed within one sampling time. I n practice we may have to consider control policies with the constraints on the size of the memory in the controller and/or we may be forced to use control policies as functions of several statistical moments (such as mean or variance) instead of the probability density functions and generate these statistics recursively. For example, ui* may have to be approximated from the last few observations and controls, say yi-l , yi , ui--2,and ui-l . T h e problems of suboptimal control p ~ l i c i e s l l ,are ~ ~ important not only from the standpoint of simple engineering implementations of optimal control policies but also from the standpoint of approximately evaluating Eq. (26). T h e effects of any suboptimal control policies on the system performance need be evaluated carefully either analytically or computationally, for example, by means of Monte Carlo simulations of system behaviors. We will return to these points many times in the course of this book, in particular in Chapter VII, where some approximation techniques are discussed.

2. Example. Linear Control Systems with Independent Parameter Variations A. INTRODUCTION As an application of the optimal control formulation given in Sections l , B and 1,C, the optimal control policy for a linear stochastic sampled-data control system with a quadratic performance index will be derived. We assume that system parameters are independent random variables, that systems are subject to external disturbances, and that

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

37

the state vector measurements are noisy. These random disturbances are all assumed to have known means and covariances. Specializations of this general problem by dropping appropriate terms lead to various stochastic optimal control problems, such as the optimal control of a deterministic plant with noisy state vector measurements, the optimal control of random plant with exact state vector measurements, and so on. Scalar cases of such systems have been discussed as Examples 2 4 of Chapter I. This type of optimal control problem has been analyzed by means of dynamic p r ~ g r a m m i n g . T ~ h~ e. ~key ~ step in such an analysis is, of course, the correct application of the principle of optimality to derive the functional equation. By the method of Section l,B the correct functional equations will result naturally without invoking the principle of optimality explicitly. Consider the sampled-data controi system of Fig. 2.5, where the state vector of the system satisfies the difference equation (28a), where the system output vector is given by (28b), and where the observation equation is given by ( 3 3 ) :

-

where p,(x,) is assumed given,

Ck = M k X k

where

x, is an n-vector (state vector), A, is an n x n matrix, B, is an n x p matrix,

PLANT INPUT

CONTROLLER

CONTROL VECTOR

STATE

PLANT

PLANT OUTPUT

OBSERVATION

Fig. 2.5. System with linear random plant, with additive plant disturbances, and with noisy measurement. T h e sequence of imput signals d, are generated by Eq. (34).

38

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

uk is a p-vector (control vector), u, E U, , where U, is a subset of E, ( p-dimensional Euclidean space) and is called an admissible set of controls, f , is an n-vector (noise vector), c, is an s-vector (output vector), and M , is an s x n matrix. I n (28a), A,, B, , and f k are generally random variables, which are assumed to be independent for each k. T h e (8,) random variables are also assumed to be independent of {A,} and of {B,}. T h e independence assumption on f , for each k can be weakened somewhat by introducing another random variable v, such that

where C, is a known (n x n) matrix, D, is a known (n x q) matrix, v, is a random variable assumed to be independent for each k, and independent of A ' s and B's at all times. Equation (29) is introduced to handle random disturbances on the system which is not independent in k but which may be derived from another stochastic process (v,} which has the desirable property of being independent for each k." This type of noises is not more general, since by augmenting the state vector x, with f , , Eqs. (28) and (29) can be combined to give an equation similar to Eq. (28) with an independent random variable as a forcing term. Let v, is a q-vector, and

and where x, is the generalized (or augmented) state vector.+T h e random noise in (30), 6, , is independent for each k and of random variables S, *'The noises Fs are analogous to those generated by white noise through a linear shaping filter in continuous time processes. See for example Ref. 98. t See Chapter IV for more systematic discussions of the idea of augmented state vectors.

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

39

and T , for all k. Thus, it is seen that, by augmenting the original equation for the system state vector by another equation describing the noise generation mechanism, it is possible to treat certain classes of dependent noises by the augmented state equation, Eq. (30), on which only independent noises act. Thus, it is no loss of generality to discuss Eq. (28) with independent ,$k for this class. Assume that the control problem is to make the system output follow the desired output sequence {d,} as closely as possible, measured in terms of the performance index J :

where w k is a functional which assigns a real number to each pair of an error vector e, 2 d, - c k and U k - 1 . For example, w k may be a quadratic form in e, : wk

= ek’Vkek

/ / ek

(32)

where v k is a positive symmetric (s X s) matrix, and a prime denotes a transpose. T h e feedback is assumed to consist of Y k = Hkxk

+

?lk

(33)

where Yk is an m vector (observation vector); i.e., the controller does not observe x k directly but receives Yk where r), is the random observation error. I n most control situations, the desired output sequence {d,} is a sampled sequence of a solution to some linear differential equation on which some noise is possibly superimposed. Assume that {d,} is generated by

where

g, F, G, l;,

ir,

is is is is is

an m’ vector, an (m’x m’)matrix, an (m’x r ) matrix, an r-dimensional random vector independent for each k, and an (s x m’)matrix.

40

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

Since most deterministic signals are solutions of linear differential or difference equations or can be approximated by such solutions, the class of desired output sequences described by (34) is fairly large. It is possible to combine Eqs. (28) and (34) into a single equation. Define X,

and

=

ek =

(iz)

(35)

where

and the generalized output of the system is given by Ek = L k X k

(37)

where

T h e performance index for systems described by (36) can be expressed as a quadratic from in X by defining a new V , appropriately when W's are quadratic in (31). For example, since ek =

di, - c k

-

= Hkgk - MkXk =

(-Mk,

I?,)&

the quadratic form (e,'V,e,) becomes

Letting the new V , be

one can write (X,'V,X,) instead of (ek'Vkek),where the new V , again is positive symmetric with dimension (m'+ n)." Thus, by suitably

* For those not familiar with operating with partitioned matrices, see for example Gantmacher."'

2.

SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS

41

augmenting the state equation for the plant, it is possible to incorporate the mechanisms for dependent noises and/or input signals and the control problems can be taken to be the regulator problem, i.e., that of bringing the (augmented) state vector to the origin in the state space. Since we are interested in closed-loop control policies, the control at the kth sampling instant is assumed to depend only on the initially available information plus y k and uk--l and on nothing else. We see from the above discussions that the problem formulation of this section with the system of (28) observed by (33) is not as restrictive as it may appear at first and is really a very general formulation of linear control systems with quadratic performance indices. It can cover many different control situations (for example, by regarding (28) as the state equation for the augmented systems). With this in mind, we will now discuss the regulator problem of the original system (28). I n the development that follows, W, of the performance index is taken, for definiteness, to be

B. PROBLEM STATEMENT Having given a general description of the nature of the problem, we are ready to state the problem more precisely. T h e problem is to find a control policy uN-l such that it minimizes the expected value of the performance index EJ

where ui E Ui, 0 given by

< i < N - 1, and 1=

N xk‘ v7cxk

1

where the performance index is

+

N-1 uk’pkuk 0

where V,’s and Pk’s are symmetric positive matrices, and where the system’s dynamics is given by

where p,(x,) is given and where A,, Bk , and (, are random variables with E ( [ i ) == 0, I?([+[$) = Q+S13, i = 0, 1,..., N - 1 (39b)

42

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

It is assumed that Sk’S are independent of all (A,, B,), that [ k and ( A , , B k )are independent for each k , and the system is observed by yk

Hkx,

+

k

TIC,

.

=

-

(40)

1

where E(qk) = 0, E(qkq,‘) R, , and for simplicity of exposition y k is assumed independent for each k and of all other random variables tk and ( A , , B,), 12 : 0, 1,..., N - 1 . R’s and Q’s are assumed known. T h e situation where [’s and 7’s are not independent can be treated also. See for example Section 3,E. We have seen in the previous section that this problem statement can cover situations with dependent noise, input signal dynamics, and others by considering Eq. (39) as the equation for the augmented state vectors, if necessary. Various conditional probability density functions and moments are all assumed to exist in the following discussions.

C. ONE-DIMENSIONAL EXAMPLE Before launching into the derivations of the optima1 control policy for the problem, let us examine its simpler version of the one-dimensional problem so that various steps involved in arriving at the optimal control policy are made clear. I n this way, we will avoid getting lost when we deal with general vector cases. T h e one dimensional problem is given with the plant equation

+

~ i + = l ~lixi

+

,

0


- 1,

U
( --co,

GO)

(41)

and the observation equation

where ai , pi, ti,and qi , 0 pendent random variables. It is assumed that

< i < N - I,

are assumed to be inde-

and that the random variables all have finite variances. Take J to be

.I= Xh,2

(44)

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

43

Then, according to the development in Section 1,B, in order to obtain

where

var(ai) = ui2

(484

var(Pi) = Zt2

(48b)

var(6,) = q,:

0


-

1

(48c)

Let E(x2 lYi)

and

var(xi 1 yz) = A t 2 ,

(49a)

= Pi

0


-

I

(49b)

These p's and A's may be computed explicitly with the additional assumptions on the random variables. For example, if these random variables are all assumed to be Gaussian, then they can be computed as in the examples of Section 3 . From (47)-(49), Y N ==

('N-lPN-l

+ bN-lUN-l)2 f ui-lPL-l

f A L l ( 4 L f 4-1)

f

'i-1';-I

+ d-1 (50)

44

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

Assuming pi and A , are independent of ui, y N is minimized with respect to uN-l to give (51)

= - 'hi-1PN-1

':-l

where and

min y N N-1

= yN* -

where

and P1

2 d-1

+-

+

4-1(&1

(54b)

4-1)

T h e expression for yN* can be put slightly differently, retaining the conditional expectation operation E ( . I yN-l) in the expression for yN*. I n this alternate form, (47) can be written as y N * = E[I&,

where

+

y1

I yN-ll

(55)

One can easily check that Eqs. (53) and ( 5 5 ) give the same value for

yN* since

E[I,xk-,

+

v1

I y-1

=I l P L

=4 P L l

+ &J, +

+

v1

P1

Having obtained some insight into the problem, we will treat the general case next.

D. OPTIMALCONTROL POLICY As discussed in Section 1,B, in order to determine the optimal control k N: policy for problem one must first compute A,, 1

< <

A,

= =

E(W, 1 yk--l,.k-2) Wkp(xkI y"l, u,-~) dx,

(574

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

45

where

x x x

1 X k - 1 > uk-l ’k-1 ‘k-1, 6k-1) P k k - 1 I Y”-l, uk-7 P ( A k - 1 Bk-1) P K k - 1 )

$(.k

7

d(xk

> Xk-l

Ak-l

Bk-l

6k-1)

(59)

AN is evaluated first. Since the mean of fN-l is zero by T o obtain Assumption (39b), the contribution of (xN’VNxN) to A, is given by

where E, is the expectation operation with respect to 5. Denoting by a bar the expectation operation with respect to the random variables A and B, we have, from (39b), (59), and (60),

By minimizing (61) with respect to u ~ - the ~ ,optimal u ~ is- given ~ by

(q99)

(E9)

I-N

N

I-N

I-N

B A

B A

I-N

,8+(

N

I-N

N

8 A

I-N

+

,8

'-"a"AI-%

= ILL

,a+[I-N 8NA I-N 8 + l-Ndl = aiayM

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

47

We encountered this relation earlier in Section 1 ,B. Proceeding as before, noting that now ( V N p 1 I,) corresponds to V N, PN--2to P N p 1 ,etc., the development from (60) to (66) is repeated to give

+

48

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

When pi's are computed explicitly as a function of yi and ui-l,Eqs. (73)(75d) solve the proposed problem completely. Equations (74) and (75) show that the feedback coefficients A are computable before the control process begins, i.e., off-line since they do not depend on the previous control vectors nor on the observation vectors. Note aIso that A's are not random. Computations of p's generally must be done on-line. They are computed later in Section 3 with the additional assumptions that the noise are all Gaussian random vectors. Figure 2.6 shows the configuration of the optimal control system. I n terms of p, (73) can also be written as

rz*il = Pz'IN-tPi

where

pN--i

and where

ZL =

= vN-6

+

(76a)

PN--s

+tr(IN-Jz)

a(%

- PZ)(XZ -

PJ'

I Y21

is the conditional covariance matrix of xi.

RANDOMLY VARYING PLANT WITH

I

UNIT DELAY

I I

k

I

1 OBSERVER I

ESTIMATOR

Fig. 2.6. Optimal controller for the stochastic system of Fig. 2.5 with noisy state vector measurement. See Fig. 2.8 for the schematic diagram of the optimal estimator. = - [Pk + B/(Vk+l ated by Eq. (75).

At

+ I N - ~ - ~ ) B JBk'(Vkbl ' + J N . . ~-*)Ak;{I,),i

=

1 , ..., N gener-

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

49

When the state vectors can be observed exactly, E(xi j y i ) reduces to xi and the term E[(xi - pi)’ ri(xi - pi) 1 y i ] vanishes in the equation for vi . Replacing pi by xi in (62)-(76), the optimal control vectors with the exact state vector measurements are given by ~ i *=

0

-Atxi,


(77)

where Ai is the same as before and is given by (7%) and

rf+l = X i ’ l N - t X i

where 6N-i

vN-i-l

with

+

f tr[( h + l

(78)

6N-i

+

(794

IN-i-I)Qi]

(79b)

== tr(VNQN-I)

Figure 2.7 is the optimal control system configuration with no observation noises. Thus, as already observed in connection with a simple system of Example 4 of Chapter I, the effect of additive noise to the system is merely to increase y * by 6. When the performance index is given by

J

N

=

C xi vkxk

1

RANDOMLY VARYING P L A N T

r--------

Fig. 2.7.

1

Optimal controller for the stochastic system of Fig. 2.5 when the state

_ _ _ _ _ _ ~

+ ~

vector measurement is exact. A k = -[Pk B%’( V,,, 4- I N - , - , ) B , ] ~Bk’( Vkil I N - % - ~ ) A ~ ; { I z } ,I = 1 , ..., N , generated by Eq. (75).

50

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

rather than by (38), the recursion formula for y * is obtained by putting all Pi equal to zero. Then, from (75c), the optimal control policy with the new performance index of (80) is given by (74a) and (7%) with Pi = 0. I n particular,

and

Equations (74) and ( 7 9 , which define recursively the optimal feedback control coefficients and the optimal criterion function values, can be put in a little more transparent forms when the system parameters A’s and B’s are deterministic and 5’s and 7 ’ s are the only random variables. From (74a) and (75c), we write A,i as A,

where N , & (P,

=

N,A,

+ B,’LN_,-iB,)+B,‘LN-,-l

(74a-1) (74a-2)

and where we give a symbol LN-i to Vi + I N p ifor ease of reference. Defining Ji by ‘N-z & At’jzA, (75a-1) we have from (75a) and (75b)

J? = LN-v-l(l

- BzNz)

(75a-2)

2.

SYSTEMS W I T H INDEPENDENT PARAMETER VARIATIONS

and LN-,

=

V,

+ A,'J,Az

51 (75a-3)

T h e recursion formulas (74a-2), (75a-2)) and (75a-3) for N's, J's and L's are initiated by putting I, = 0 or equivalently J N = 0. Then from (75a-3) Lo = V N

From (74a-1) and (74a-2)

A,-,

=

"-1A&1

=

(PN-1

+ BL-1 VNB&,)+B;-, V,A,_,

which is in agreement with (63)) taking note of the fact that AN-l and BN-l are now deterministic by assumption. By using (75a-2), we have JN-1

= L"(I

and from (75a-3) Ll

= VN-1

+

-

BN-I"-d

4,-1J,-l4-1

Now, NN-, J 2 and L , etc. are determined in the orders indicated. Later in Section 3 of this chapter as well as in Chapter V, we will encounter a similar set of recursion equations in expressions for conditional means and conditional covariance matrices of certain Gaussian random sequences. We will postpone the discussions of the significance of this similarity until then. )

)

E. CERTAINTY EQUIVALENCE PRINCIPLE If we consider a plant with nonrandom plant parameters and if E's and 7's are the only random variables in the system, then the bars over the expressions for A i Ii and r i in (75) can be removed. Since these quantities are independent of the plant noise process {ti},and of the observation noise process they are identical to the ones derived for a deterministic plant with no random input and with exact state vector measurements. As observed earlier in connection with (58)) (66), (74)) and (75), {ti}and {qi} processes affect only vi and E(x, I yi). Since the optimal control vectors are specified fully when E(xi 1 yi) are given, the problem of optimal control is separated into two parts: the estimation of the state vectors, given a set of observation data; and the determination which can be done from the correof proper feedback coefficients, {Ai}, sponding deterministic plant. )

)

(?;I,

52

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

If A’s are random but B’s are deterministic in the plant equation, then A t , li,and ri are the same as the ones for the equivalent deterministic plant = Aixi Bzui

+

T h e procedure to obtain control policies for stochastic systems by considering the optimal control policies for the related deterministic systems where the random variables are replaced by their expected ~ . ~may ~~~ values, is called the certainty equivalence p r i n ~ i p l e . ~One speak of a modified certainty equivalence principle when the random variables are replaced with some functions of their statistical moments. For systems with random A’s and deterministic B’s, their optimal certainty equivalent control policies are the optimal control policies for the same class of stochastic systems with yi = xi , i.e., when the xi are observed exactly and when E(.$J = 0, 0 i N - 1, or if yi # x i , then xi is replaced by E(xi Iyi). When A’s and B’s are both random, the optimal certainty equivalent control policies are optimal control policies for the deterministic system with the plant equation

< <

Xi+l

= &Xi

+ Bzui

For example, with

F. GAUSSIAN RANDOMVARIABLES

It has been assumed in connection with Eq. (74) that quantities E [ ( ( X i - pi)’ n N - i ( X i

are independent of xi and yi.

-

pi)) I y’]],

0

< <

3.

SUFFICIENT STATISTICS

53

Two sufficient conditions for this to be true are: (a) All random variables in the problem have a joint Gaussian distribution. (b) T h e plant and observation equations are all linear. This will be shown by computing the conditional error covariance matrix E[(xi - pLi)’(xi- pi) [ y i ] explicitly under Assumptions (a) and (b) in the next section, Section 3 . See Appendix111 at the end of this book for brief expositions of Gaussian random variables.

3. Sufficient Statistics We have seen in previous sections that u, is generally a function of y k and not just of y k . From (21) and (22) of Section 1 ,B, we note that this dependence of uk on y k occurs through p ( x , 1 y k ) and p(yk+lI y k , uk) in computing y’s. Intuitively speaking, if a vector s, a function of y k , exists such that p ( x , I y k ) = p ( x , 1 sJ, then the dependence of uk on past observation is summarized by s, and optimal u, will be determined, given s, and perhaps y k without the need of additional knowledge of yk-l. Such a function of observations is called a sufficient stat is ti^.'^ See also Appendix IV. We discuss two simple one-dimensional examples first. Those readers who are familiar with matrix operations and Gaussian random variables may go directly to Section 3, C.

A. ONE-DIMENSIONAL EXAMPLE 1 T o show that such a function exists and to see how it helps simplify the control problem solution, consider a scalar control system with a plant equation x,+1 = a,%

+ b,u, + L ,

0


-

1 , u,E (-a, a)

(87)

and the observation equation y,=h,x,+rl,,

h,#O,

OGiGN-1

Take as the performance index a quadratic form in x and u, N

(88)

54

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

where ai and bi are known deterministic plant parameters and where (’s and 11’s are assumed to be independent Gaussian random variables with E ( & ) = I?(?),)

=

0,

0


~

1

(904

E(tt2)= 4t2 > 0,

O
(90b)

> 0,

O
(904

all i and j

(90d)

E(V,2) = Yz2

E(&?)A= 0,

Assume also that x,,is Gaussian, independent of ( ’ s and 7’s with mean CL and variance 02. This system is a special case of the class of systems discussed in Section 2. Now

where Po =

(W + h , y , / ~ , 2 ) / ( 1 / 4~ 2ho2/r,2)

+ ho2/r,2

li.2

l/ao2 =

From (26) of Section l,C,

From (88), (90a), and (~OC), p ( y , 1 x,)

=

const exp

From (87), (90a), and (90b),

p ( ~ , +1 x, ~ , u,)

=

const exp

We will now show by mathematical induction on i that

1

< <

i N - 1 with appropriately chosen pi and cri. holds for all 0 This relation is satisfied for i = 0 by (91). Substituting (94)-(96) into (93) and carrying out the integration with respect to x i , ~ ( x , ,I yi+l) ~ = const exp

* Ifp(xo) = 6(x, - E ) , i.e., if we are absolutely sure of the value of xo, then E(x,, 1 y,,) i.e., the measurement y o does not change our mind about x,, = a.

= 01,

3.

55

SUFFICIENT STATISTICS

where where

and where

T h u s (96) is established for all i = 0, 1,... Note that pi and ui2 in (96) are the conditional mean and variance of x i , respectively, given yi and 2 P - l . K,+l can also be expressed as Ki+l = u:+l h,+l/~i+l. Equation (96) shows that ( p i , oi2) are sufficient statistics and contain all Equation (97) shows that the equation information carried by ( y i , &I). satisfied by the sufficient statistic pi is composed of two parts: the first part is the same as the dynamic equation of the plant, and the second part constitutes a correction term proportional to Y , + ~- hitl(api bu,) which may be interpreted as the difference between the actual observed value of xi+l and the predicted value of the observation based on the estimate pi of xi , hitl(api hi).Note that yi and ui-l are replaced by pi and ui2 and that u's are computed from the knowledge of the noise variance and are constants independent of the observations and controls. I n other words, u's can be generated off-line. We are now ready to determine U Z - ~, As usual we first compute from (87), (89), (90), and (96):

+

+

56

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

where DN

+

'N(d-1

2 2 aN--l(TN-l)

Minimization of (99) with respect to uNPlgives uz-l

=

-'N-lpN-I

where

Substituting (101) into (99) gives

(104a)

C,

T o compute similar to (99): AN-1

=

(104b)

DN

,one must compute yN--l . AN-l is given by computations

= DN-1

+ tj-,-+;-z

+

vN-~(~N-+N-~

+

~N-~uN-,)'

(105)

T h e probability density p(yN--I1 yN-2),necessary to evaluate E ( ~ , * J Y ~ - ~ ) , is given from (88) and (96) by the Gaussian probability density with mean hN-l(~N-2pN-2 b,-,~,-,) and variance

+

From (97) and (103), yN* is seen to depend on yNPlonly through pN-l , since p N - 2 are functions of yN-2and u ~ - We ~ . have E(pN-1

I yN-')

aN-z~N-2

+

and var(pN-, I yN-')

=

(106)

bN-2~N-z

Kk-l[h;-l(a~-z4.-z

+

qN--2) 2

+

%1I

(107)

3.

57

SUFFICIENT STATISTICS

T h e above process is perfectly general and one has (llla)

uz* = -A+

where

c+( v +, + T,+z)bzaz +~ Tz+2)lJ?' ('z+l

=

YZ* =

where

0

< i < N - I,

T N f l= 0 ( I l l b )

c, + T&Z2

(1 1 lc)

c, = D, + CZ+l+ T,+1K2[h,2(a:-l~:-l+ Ll)+ r,2J CN+l

Dz

= = %(422-1

and

+

2

2

~ Z - P - l ) ,

1
< <

(11 1 4 (Ille)

When ti = 0, 1 i N - 1, in (89), from (1 1I b), A , = a,/b, , and, from ( l l l a ) , aopo b,uo* = 0. More generally, Ai = aJb, and aipi biui* = 0 for all 0 6 i N - 1. Therefore, from (97),

+

+

<

p,. = K % . y1, O G i G N - 1

58

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

and, from (11la), 0

-AiKiyi,

ui* =


-

1

--aiKiyi/bi

-

is the optimal policy for the system with the plant equation (87) and the vixi2; i.e., observation equation (88) and the criterion function J = with this criterion function ui* is proportional to yi More discussions on this point are found in Section 3,D.

xf

B. ONE-DIMENSIONAL EXAMPLE 2 As a special case of the above example, consider a system x,+1

Yt

= ax, =

xt

+ buz ,

+ 7% >

b f 0,

uz E

(112)

(-a, a)

(113)

O
where a and b are known constants and where 7’s are independent random variables with E(rli) = 0, var(qi) = yi2, 0 i N - 1, and where x,,is a random variable independent of everything else. This is the system discussed as Example 3 of Chapter I, with J = xN2. There we have obtained the optimal control policy in terms of the statistics

< <

E(xZ1 y t ) = p Z and var(x, 1 y t ) = uZ2, 0


without indicating how they may be computed. With the additional assumption that x,, and 7’s are Gaussian, 2 ( x o ) = N ( a , UZ),

5?(Vi)

=

N(0, Yi”),

0


~

I

the result of Example 1 of this chapter can be used to compute these statistics. Namely, p’s and 0’s are sufficient and can be computed as

When J is given, hi is computable since p ( x , ] y i ) is known as a Gaussian probability density function with the conditional mean pi

3.

S U F F I C I E N T STATISTICS

and the conditional variance

ut2.

59

T h e conditional probability density

p ( y , k1 1 y b ,u z ) needed to compute E ( y z 2 I y % )is obtained analogous to p(,uN-l 1 y N - 2 given ) by (106) and (107) or independently as follows. From Eq. (1 13), x7

hence

I YZ)= Yz - E(rlZ I Y ?

Pz = & Z

or

Similarly

= Yz - r l z

q r l i I Yi) =Yi

-

Pi

M r l z y i ) = var(x, iyz) = uL2

From Eqs. (112) and (113), Y2+1 =

whereyit1 is a Gaussian random variable since it is a linear combination of Gaussian random variables. Now

,

Equations (1 15) and (1 16) determine p(y, 1 yi, zd) completely. T h e reader is asked to compare the effectiveness of the optimal closed-loop control policy for the system of Example 2 with that of the optimal open-loop control policy (i.e., the controls are function of y o only) with a quadratic criterion function, What difference do these two policies make in the values of E( J 1 y o )?

C. EXAMPLE 3 . UNCORRELATED GAUSSIAN NOISES T h e above examples can be extended to a system with a vector difference equation as the plant equation %+I

=

4x7

+ B7% t t z

(117)

60

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

and the vector observation equation Yz

=

+

Hzxz

(118)

rlz

where xo is assumed to be a Gaussian random variable with E(xo) = a, cov(xo) = Zo, where f ' s and 9 ' s are Gaussian random variables independent of xo with E(&) = 0 E(rl?)= 0 f G z t 7 ' )

E(rlZrl7')

= S,,87

(119)

S*,R for all i a n d j

E([,y,') = 0

where Qi and R,are assumed to be positive definite. T h e last assumption on the independence of f ' s of ~ ' iss made to simplify computations and can easily be removed with some additional complications in the derivations. This is indicated in Section 3,E. T o derive optimal control policies, one must first compute

(121)

where I yi-l, u , - ~ ) is computed recursively by (26) of Section 1, C . Actually, one could just as well derive the recursion relation forp(xi+, 1 y i ) rather than for p(x, 1 yi). See (149) in Section 3,E for derivation. T h e recursion process is initiated by computing

where by assumption po(xo)= const exp(-

and where the notation jl x 1; P(Y0

I .o>

4 j j xo - a &I)

x'Sz is used. From (1 19),

= const T -

a I1yo

-HOXO

From (123), p ( x , 1 y o )is seen to be a Gaussian:

p(.o

I Yo) = const e

v -

B II xo - Po)l;lI

11i;l)

( 124)

3.

SUFFICIENT STATISTICS

61

where and T h e detail is carried out in Appendixes C and D at the end of this chapter. See also Refs. 84-86, 141. Assume that p(x, I yi) has also a Gaussian density:

where pi & E(xi 1 y i ) and Ti cov(xi 1 yi). T h e variable p r is therefore the conditional mean and Tiis the conditional covariance matrix of x i , given the past and current observation yo ,..., yi . This is certainly true for i = 0 from (125). From (26),

= const

J

exp(-

4E

~dxi )

(129)

where

and P(Yi I

Xi>

= const exP(-

13- II Yi - Hixi l&;J

After carrying out the integration, which is shown in detail in Appendix C, one has

(1 3 2 4

62

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

and where

r;21 = Hl+lRzlHc+l+ Qi'

-

Q;'Ai(17:' $- A:Q;'Ai)-'A/Q;'

+ (Qc + AzTzA;)-'

= Nz;lR,-:,Hi+I

(132b)

This completes the mathematical induction on i and Eq. (128) has been established for i = 0, I , ... . Figure 2.8 shows a schematic diagram of a filter which generates p i . Note that in (132a) the terms Aipi Biui show that the pi+l satisfies the same dynamic equation as the plant equation plus a correction term proportional to yi+l - Hitl(Aipi Bp,) which may be interpreted as the error in predicting xitl based on the estimate pi of xi . An alternate expression for the constant multiplying the correction term in (132a) is given by (C20) in Appendix C. T h u s we have seen from (132a) that pi+l is computable given p i , yi+l, and ui . Using (132b) and the matrix identity in Appendix D is computable from Ti . Hence, pi and Ti are sufficient statistics for xi and summarizes all available information contained in yip1 on xi . There is another way of obtaining the recursion formula for the sufficient statistics. T o do this we first obtain the expression for p(x,+, I yi) in terms of p ( x i 1 yi). T h e detail is also found in Appendix C.

p(x_{i+1} | y^i) = const exp(-½ ||x_{i+1} - A_i μ_i - B_i u_i||²_{M_{i+1}^{-1}})                    (133)

Fig. 2.8. Schematic diagram of the conditional mean generator (Wiener-Kalman filter), consisting of a unit delay and the gains K_i = Γ_i H_i' R_i^{-1} and L_i = I - K_i H_i. This is the optimal estimator in Fig. 2.6.


The last expression is obtained using the matrix identity in Appendix D, or more directly by noting that p(x_{i+1} | y^i) has a Gaussian distribution, since the conditional mean of x_{i+1} is given by

E(x_{i+1} | y^i) = A_i μ_i + B_i u_i + E(ξ_i | y^i) = A_i μ_i + B_i u_i                    (134)

where the independence assumptions on the ξ's and η's and (119) are used in putting the last term equal to zero. To obtain the conditional covariance, we compute

cov(x_{i+1} | y^i) = E[(A_i(x_i - μ_i) + ξ_i)(A_i(x_i - μ_i) + ξ_i)' | y^i] = A_i Γ_i A_i' + Q_i                    (135)

since, from the independence assumptions on the ξ's,

E[(x_i - μ_i) ξ_i' | y^i] = 0

We see, therefore, that M_{i+1} of (133) is given by

M_{i+1} = Q_i + A_i Γ_i A_i'                    (136)

From (135) and the recursion relation for μ_i given by (132a), and defining ν_i ≜ E(x_i | y^{i-1}) so that μ_i = ν_i + K_i(y_i - H_i ν_i), the recursion formula for ν_i is simply given by

ν_{i+1} = A_i ν_i + B_i u_i + A_i K_i [y_i - H_i ν_i]                    (137)

where

K_i = Γ_i H_i' R_i^{-1}                    (138)

Note that, since the Γ_i's do not depend on the particular y's nor on the u's, they can be precomputed if necessary. We see that this derivation of p(x_{i+1} | y^{i+1}), by first obtaining the conditional mean and the conditional variance of p(x_{i+1} | y^i) by (135) and (136) and then making use of the Bayes rule

p(x_{i+1} | y^{i+1}) = p(y_{i+1} | x_{i+1}) p(x_{i+1} | y^i) / ∫ p(y_{i+1} | x_{i+1}) p(x_{i+1} | y^i) dx_{i+1}

yields the relations (132a) and (132b) without too much manipulation of matrices.
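The whole procedure is easily mechanized on a digital computer. As an illustration, the following Python sketch propagates the sufficient statistics (μ_i, Γ_i) by the prediction step (135)-(136) followed by the measurement update (132a)-(132b); the constant matrices A, B, H, Q, R are hypothetical stand-ins for A_i, B_i, H_i, Q_i, R_i, and the initialization follows (125)-(126).

```python
import numpy as np

def kalman_step(mu, Gamma, u, y_next, A, B, H, Q, R):
    """One step of the conditional-mean recursion (132a)-(132b).

    (mu, Gamma) are the sufficient statistics of p(x_i | y^i); the
    function returns the sufficient statistics of p(x_{i+1} | y^{i+1}).
    """
    # Prediction, Eqs. (135)-(136): E(x_{i+1} | y^i) and M_{i+1} = Q + A Gamma A'.
    x_pred = A @ mu + B @ u
    M = Q + A @ Gamma @ A.T
    # Measurement update in information form, Eq. (132b):
    # Gamma_{i+1}^{-1} = M_{i+1}^{-1} + H' R^{-1} H.
    Gamma_next = np.linalg.inv(np.linalg.inv(M) + H.T @ np.linalg.inv(R) @ H)
    # Filter gain K_{i+1} = Gamma_{i+1} H' R^{-1}, Eq. (138).
    K = Gamma_next @ H.T @ np.linalg.inv(R)
    # Eq. (132a): plant dynamics plus a correction proportional to the
    # error in predicting the new observation.
    mu_next = x_pred + K @ (y_next - H @ x_pred)
    return mu_next, Gamma_next

def initialize(a, Sigma0, y0, H, R):
    """Initialization per (125)-(126), with prior x_0 ~ N(a, Sigma0)."""
    Gamma0 = np.linalg.inv(np.linalg.inv(Sigma0) + H.T @ np.linalg.inv(R) @ H)
    mu0 = Gamma0 @ (np.linalg.inv(Sigma0) @ a + H.T @ np.linalg.inv(R) @ y0)
    return mu0, Gamma0
```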


Duality Principle

The set of recursion relations which generates the μ's and Γ's bears a remarkable resemblance to that given by Eqs. (75a-1)-(75a-3) at the end of Section 2,D. The problem discussed in this section is that of filtering, i.e., obtaining the conditional mean and conditional variance associated with the Gaussian probability density function p(x_i | y^i), where the constant associated with the correction term (called the filter gain) K_i and the covariance Γ_i are generated by the following set of equations [see Eqs. (134), (C17), and (C18)]:

K_i = M_i H_i' (R_i + H_i M_i H_i')^{-1}
M_{i+1} = Q_i + A_i Γ_i A_i'
Γ_i = M_i (I - H_i' K_i')

where the system equations are given by (117) and (118), and the ξ's and η's are Gaussian random variables with moments given by (119).

The regulator problem discussed in Section 2,D is that of minimizing the quadratic criterion function when the plant equation is given by

x_{i+1} = F_i x_i + G_i u_i + ξ_i

The result of the minimization is expressed by u_i* = -Λ_i x_i, where Λ_i is generated by the recursion relations (75a-1)-(75a-3). Therefore, by comparing these two sets of equations we can establish the correspondence between these two problems as follows:

K_i ↔ Λ_i'
M_i ↔ L_{N-i-1}
A_i ↔ F_i'
H_i ↔ G_i'
Q_i ↔ V_i
R_i ↔ P_i

This correspondence is sometimes referred to as the duality principle. Making use of this principle, whatever results we obtain for regulator (filtering) problems can be translated into the corresponding results for filtering (regulator) problems.

D. PROPORTIONAL PLUS INTEGRAL CONTROL

Using the sufficient statistics just derived, the optimal control policy for the system of (117) and (118) can now be given explicitly with the quadratic criterion function of Section 2. The optimal control policy for this system has already been derived in Section 2 if we take A_k, B_k, and H_k to be deterministic, with the additional assumption that ξ_k and η_k are Gaussian random variables with mean 0 and covariances given by (119). Then u_i* = -Λ_i μ_i of (74a) still gives the optimal control policy, where μ_i is now given explicitly by (132a). Bars over the various matrices can be removed in the expression for Λ_i, since A, B, and H are assumed deterministic in this section. Therefore, the optimal controller has the structure shown in Fig. 2.9, where the μ-generator has the structure shown in Fig. 2.8. Since ξ and η are now assumed Gaussian, what appeared as an assumption in Section 3, namely that (x_i - μ_i) has a conditional covariance matrix independent of x_i and y^i, is now one of the properties of Gaussian random variables, and this constant covariance is Γ_i of (132b). Since μ_0 is proportional to y_0 when Σ_0^{-1} is a null matrix,

μ_0 = Γ_0 H_0' R_0^{-1} y_0

Fig. 2.9. Structure of optimal controller for stochastic system with linear plant and observation equations.

one has

A_0 μ_0 + B_0 u_0* = A_0 μ_0 - B_0 Λ_0 μ_0 = (A_0 - B_0 Λ_0) μ_0 = (A_0 - B_0 Λ_0) Γ_0 H_0' R_0^{-1} y_0

Therefore, it is easily seen that μ_i is linear in y^i, 0 ≤ i ≤ N - 1. The assumption that Σ_0^{-1} is null implies no a priori knowledge of x_0. Generally, E(x_i | y^i) is some (measurable) function of y^i. When Gaussian random variables are involved, we have just seen that E(x_i | y^i) turns out to be linear in y^i. This fact may be used to construct an approximation to E(x_i | y^i) when the random variables are not Gaussian.

From the recursion formula (132a) for μ, with the optimal control u_i* given by -Λ_i μ_i, (132a) can be rewritten as

μ_{i+1} = C_{i+1} μ_i + K_{i+1} y_{i+1}                    (139)

where

C_j = (I - K_j H_j)(A_{j-1} - B_{j-1} Λ_{j-1})                    (140)

and where K_i is given by (138). This can be written as

μ_i = K_i y_i + Σ_{j=0}^{i-1} (Π_{k=j+1}^{i} C_k) K_j y_j                    (141)

Therefore

u_i* = -Λ_i K_i y_i - Λ_i Σ_{j=0}^{i-1} (Π_{k=j+1}^{i} C_k) K_j y_j,   u_0* = -Λ_0 K_0 y_0

which can be interpreted to mean that the optimal control is of the proportional plus integral type (Ref. 119), where the first term in (141) gives the control proportional to the measurement of the current state vector and the second term in (141) expresses the control due to the integral on the past state vector measurements. Figure 2.10 gives a block diagram description of the proportional plus integral control generation. The effects of past observations are therefore weighted according to the weights C. Thus, if ||(Π_{k=j+1}^{i} C_k) K_j|| ≪ ||K_i|| for j < i, then the remote past measurements have negligible effects on the current control variables to be chosen. In the extreme case C_j = 0, u_i* depends only on y_i and the past observations y^{i-1} have no effect on u_i*.
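The proportional-plus-integral structure can be checked numerically by unrolling the recursion (139) and comparing with the closed form (141). The sketch below uses constant hypothetical system matrices and arbitrary gain sequences K_i and Λ_i in place of those actually computed from (138) and (74a); only the algebra of (139)-(141) is being exercised.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 2, 6
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.eye(n)
H = np.eye(n)
# Hypothetical gain sequences standing in for (138) and (74a).
K = [0.5 * np.eye(n) for _ in range(N)]
Lam = [0.3 * np.eye(n) for _ in range(N)]
y = [rng.standard_normal(n) for _ in range(N)]

# Weights C_j = (I - K_j H)(A - B Lam_{j-1}), Eq. (140).
C = [None] + [(np.eye(n) - K[j] @ H) @ (A - B @ Lam[j - 1]) for j in range(1, N)]

# Recursion (139): mu_{i+1} = C_{i+1} mu_i + K_{i+1} y_{i+1}, mu_0 = K_0 y_0.
mu = K[0] @ y[0]
for i in range(1, N):
    mu = C[i] @ mu + K[i] @ y[i]

# Closed form (141): proportional term plus a weighted sum over the past.
i = N - 1
mu_closed = K[i] @ y[i]
for j in range(i):
    W = np.eye(n)
    for k in range(j + 1, i + 1):
        W = C[k] @ W          # product C_i C_{i-1} ... C_{j+1}
    mu_closed += W @ (K[j] @ y[j])

assert np.allclose(mu, mu_closed)  # (139) unrolled agrees with (141)
```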

If the control problem is such that

A_i - B_i Λ_i = 0                    (142)

then from (139) and (140) we see that

μ_i = Γ_i H_i' R_i^{-1} y_i   for all i = 0, 1, ..., N - 1

Therefore,

u_i* = -Λ_i Γ_i H_i' R_i^{-1} y_i,   0 ≤ i ≤ N - 1

is the optimal control policy. Namely, the optimal controller becomes a pure proportional controller. The one-dimensional case of (142) has been mentioned at the end of Section 3,A. One sufficient condition for (142) to hold is that P_i = 0 and that B_i^{-1} exists for all i = 0, 1, ..., N - 1. Then Λ_i = B_i^{-1} A_i and Condition (142) will be met.

a. Accurate Measurements

We see from (139) that the control u_i* consists of a proportional part plus an integral part unless C_j = 0 for all j ≤ i - 1. We have seen one way that C_j = 0 results, namely by having A_j - B_j Λ_j = 0. Now suppose A_j - B_j Λ_j ≠ 0. Then, unless K_j H_j = I, the integral part does not disappear. Intuitively speaking, if the measurements of the state vectors are exact and there are no unknown parameters in the problem statement, as we are assuming now, then the control at time i will be a function of x_i alone, indicating that K_j H_j will be equal to I under perfect measurements. For systems with poor measurements of the state vectors it is intuitively reasonable that the controller makes use not only of the current measurement but also of past measurements in synthesizing optimal controls (Ref. 119).


This turns out to be true, as we can see from the following. When the measurements are accurate, this will be expressed by covariance matrices R_j which are small in some sense. Let us therefore write εR_j instead of R_j, with the understanding that ε is a small positive scalar quantity. Then, from (126),

Γ_0 = (Σ_0^{-1} + (1/ε) H_0' R_0^{-1} H_0)^{-1} = ε(H_0' R_0^{-1} H_0 + εΣ_0^{-1})^{-1} ≈ ε(H_0' R_0^{-1} H_0)^{-1} → 0   as ε → 0

In general, from (132b),

Γ_i ≈ ε(H_i' R_i^{-1} H_i)^{-1}

Therefore

K_i H_i = (1/ε) Γ_i H_i' R_i^{-1} H_i ≈ I

Thus

u_i* = -Λ_i μ_i ≈ -Λ_i (H_i' R_i^{-1} H_i)^{-1} H_i' R_i^{-1} y_i                    (143)

Equation (143) shows that u_i* is essentially proportional to the current observation y_i, and the integral part, being of order ε, will be negligible compared with the proportional part.

b. Inaccurate Measurements

Now let us examine the relative magnitudes of the integral and the proportional parts when the accuracy of measurements is poor. We will now suppose that R_i is large, or R_i^{-1} is small, in some sense. Writing εR_i^{-1} instead of R_i^{-1}, where ε is a small positive scalar quantity as before, we now have

Γ_0 = (Σ_0^{-1} + εH_0' R_0^{-1} H_0)^{-1} ≈ Σ_0
μ_0 ≈ εΣ_0 H_0' R_0^{-1} y_0

In general,

Γ_{i+1}^{-1} ≈ (Q_i + A_i Γ_i A_i')^{-1}                    (144)

Equation (144) can be solved as

Γ_i ≈ L_{i-1}

where

L_j ≜ Q_j + A_j L_{j-1} A_j',   L_{-1} ≜ Σ_0

Thus

K_i ≈ εL_{i-1} H_i' R_i^{-1},   I - K_i H_i ≈ I

Thus

μ_i ≈ εL_{i-1} H_i' R_i^{-1} y_i + ε Σ_{j=0}^{i-1} [Π_{k=j}^{i-1} (A_k - B_k Λ_k)] L_{j-1} H_j' R_j^{-1} y_j                    (145)

It is seen from (145) that the integral part is of the same order as the proportional part unless ||Π_{k=j}^{i-1} (A_k - B_k Λ_k)|| is of order ε or less, for example by satisfying the inequality ||A_k - B_k Λ_k|| ≪ 1 for 0 ≤ k ≤ i - 1.
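Both limiting regimes are easy to exhibit numerically: replacing R by εR drives K_i H_i toward I (the controller becomes essentially proportional), while replacing R^{-1} by εR^{-1} drives K_i toward zero and Γ_i toward the unforced covariance L_{i-1}. A rough sketch under hypothetical constant matrices:

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])
H = np.eye(2)
Q = 0.1 * np.eye(2)
R = np.eye(2)
Sigma0 = np.eye(2)

def steady_gain(R_eff, steps=50):
    """Iterate (136) and (132b) with measurement covariance R_eff."""
    Gamma = np.linalg.inv(np.linalg.inv(Sigma0) + H.T @ np.linalg.inv(R_eff) @ H)
    for _ in range(steps):
        M = Q + A @ Gamma @ A.T
        Gamma = np.linalg.inv(np.linalg.inv(M) + H.T @ np.linalg.inv(R_eff) @ H)
    return Gamma @ H.T @ np.linalg.inv(R_eff)   # gain K, Eq. (138)

eps = 1e-6
K_acc = steady_gain(eps * R)    # accurate measurements: K H ~ I
K_poor = steady_gain(R / eps)   # poor measurements: K ~ 0
print(np.round(K_acc @ H, 3))   # close to the identity matrix
print(np.round(K_poor @ H, 6))  # close to the zero matrix
```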

E. EXAMPLE 4. CORRELATED GAUSSIAN NOISES

Before closing this chapter, we shall briefly outline the derivation of sufficient statistics when the ξ-noises and η-noises are correlated, while retaining the assumption that ξ and η are independent at different time instants. As discussed in Section 2, this independence assumption can also be dropped by dealing with augmented state vectors. See Refs. 33a and 141 for continuous-time counterparts. Instead of Eq. (119), it is now assumed that ξ and η are jointly normally distributed with

E[(ξ_i, η_i)'(ξ_j', η_j')] = [[Q_i, S_i], [S_i', R_i]] δ_ij                    (146)

where Q_i and R_i are assumed positive definite and S_i denotes the cross covariance E(ξ_i η_i'). The joint probability density function for (ξ_i, η_i) has the form

p(ξ_i, η_i) = const exp(-½ (ξ_i', η_i') Λ_i^{-1} (ξ_i, η_i)')                    (147)

where

Λ_i = [[Q_i, S_i], [S_i', R_i]]

It is convenient now to work with the expression for p(x_{i+1} | y^i) rather than p(x_i | y^i), since ξ_i and η_i are correlated.


To obtain the recursion equation for p(x_{i+1} | y^i), consider

p(x_{i+1} | y^i) = const ∫ p(x_{i+1}, y_i | x_i, y^{i-1}) p(x_i | y^{i-1}) dx_i                    (149)

where the constant (a function of y^i) is determined by the relation

∫ p(x_{i+1} | y^i) dx_{i+1} = 1                    (150)

In (149), p(x_{i+1}, y_i | x_i, y^{i-1}) is given by (147) evaluated at ξ_i = x_{i+1} - A_i x_i - B_i u_i and η_i = y_i - H_i x_i. Thus we obtain the recursion formula

p(x_{i+1} | y^i) = const ∫ p(ξ_i = x_{i+1} - A_i x_i - B_i u_i, η_i = y_i - H_i x_i) p(x_i | y^{i-1}) dx_i                    (151)

The relation to be verified by mathematical induction is now

p(x_{i+1} | y^i) = const exp(-½ ||x_{i+1} - ν_{i+1}||²_{Σ_{i+1}^{-1}})                    (152)

where (ν_i, Σ_i) is the sufficient statistic. Substituting (152) into (151) and carrying out the integration in (151), the recursion formulas for ν_i and Σ_i are obtained in much the same way as before. Only the result is listed here:

ν_{i+1} = A_i ν_i + B_i u_i + K_{i+1} (y_i - H_i ν_i)                    (153)


where

K_{i+1} = (A_i Σ_i H_i' + S_i)(H_i Σ_i H_i' + R_i)^{-1}
Σ_{i+1} = A_i Σ_i A_i' + Q_i - K_{i+1} (H_i Σ_i H_i' + R_i) K_{i+1}'                    (154)

Assuming x_0 is independent of ξ_0, η_0 and is N(a, Σ_0), the initial values are given as follows:

ν_1 = A_0 a + B_0 u_0 + K_1 (y_0 - H_0 a)                    (155)

where K_1 and Σ_1 are obtained from (154) with Σ_0 taken to be the a priori covariance of x_0.
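A sketch of the one-step predictor (153)-(155) follows. The gain and covariance expressions coded below are the standard ones for a cross covariance S_i = E(ξ_i η_i') as in (146), offered as a plausible reading of (154) rather than a quotation of it.

```python
import numpy as np

def predictor_step(nu, Sigma, u, y, A, B, H, Q, R, S):
    """One step of Eq. (153): nu_{i+1} = A nu_i + B u_i + K_{i+1}(y_i - H nu_i).

    (nu, Sigma) are the sufficient statistics of p(x_i | y^{i-1});
    S = E(xi_i eta_i') is the plant/measurement cross covariance (146).
    """
    C = H @ Sigma @ H.T + R                       # innovation covariance
    K = (A @ Sigma @ H.T + S) @ np.linalg.inv(C)  # gain, cf. (154)
    nu_next = A @ nu + B @ u + K @ (y - H @ nu)
    # Error covariance of the new one-step prediction:
    Sigma_next = A @ Sigma @ A.T + Q - K @ C @ K.T
    return nu_next, Sigma_next
```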

4. Discussions

There are several related classes of problems that may be investigated using the techniques developed in this chapter. We have already mentioned the desirability of investigating control problems in which the control policy is based on observation data k sampling times or more old,

u_i = u_i(y^{i-k}),   i ≥ k

where u_0, u_1, ..., u_{k-1} must be chosen from some other considerations. Then, instead of generating p(x_i | y^i) and p(y_i | y^{i-1}), it is necessary to generate p(x_i | y^{i-k}) and p(y_i | y^{i-k}). Using these latter density expressions, the formulation of optimal control is formally identical to the one given in this chapter. The reader is invited to investigate the optimal control problem for the system of Section 3, when the criterion function includes an additional term, representing the cost of computing, which may be taken to be a decreasing function of k.
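For the delayed-data policy u_i = u_i(y^{i-k}) just mentioned, the density p(x_i | y^{i-k}) is obtained from p(x_{i-k} | y^{i-k}) by k applications of the prediction step (135)-(136) with no measurement updates; a minimal sketch, with A, B, Q standing in for the (here constant) system matrices:

```python
import numpy as np

def predict_k_steps(mu, Gamma, controls, A, B, Q):
    """Propagate the statistics of p(x_{i-k} | y^{i-k}) through the k most
    recent plant steps, using (135)-(136) without measurement updates,
    to obtain the statistics of p(x_i | y^{i-k})."""
    for u in controls:  # the k controls applied since the last observation
        mu = A @ mu + B @ u
        Gamma = Q + A @ Gamma @ A.T
    return mu, Gamma
```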


Another class of problems that is amenable to analysis with the techniques of this chapter is that of control systems with delay. By defining new augmented state and control vectors appropriately, the difference equation can be put into the standard form, and the theory can then be applied to the augmented systems. Closely related to control problems with delays, either in the plants or in the observation data available for control synthesis, are problems with intermittent observation data. Although we develop in this book the optimal control synthesis method assuming that system state vectors are observed at each sampling time instant, there is a class of systems, be it chemical or aerospace, where it is neither feasible nor desirable to observe the state vectors at every sampling time instant. For such systems it is more realistic to derive optimal control policies imposing some constraints on the way observations on the state vectors are performed. One such possibility is to specify the total number of possible observations for an N-stage control process and to optimize E(J) with respect to a control policy and the spacing of observations. See Kushner (Ref. 85) for a preliminary study of such systems. A more straightforward example of systems with constrained observation schemes is a system where observations are taken every k sampling instants for some fixed k. Such a system can be treated by the techniques of this chapter by rewriting the plant (i.e., the state transition) equation in terms of the time instants at which observations are made. Another way of imposing constraints on observations is to assume that at any time i there is a positive probability that the state vector will not be observed. Such a probability may be constant throughout the process or may be modified by the control variable, with a possible penalty incurred for modifying the probability. See Eaton for such an analysis for purely stochastic systems. A more direct constraint can be imposed on possible observation schemes by incorporating a cost associated with observation in the system performance indices. Realistically, such costs of observation will be functions of the state vectors. See Breakwell for an elementary example where the cost of observation is taken to be independent of the state vectors. Note that the recursive procedure developed in Section 2,C for generating p(x_i | y^i) can be modified to generate p(x_j | y^i) for j < i recursively. Such conditional probability densities can be used to obtain a more accurate estimate of x_j based on the observations y_0, ..., y_j, y_{j+1}, ..., y_i, rather than on just y_0, ..., y_j.


Appendix A. Minimization of a Quadratic Form

Consider the problem of finding u which minimizes

(u, Su) + 2(u, Tx)                    (A1)

where (·,·) is an inner product and where it is assumed that S is symmetric and positive definite; hence S^{-1} exists. By completing the square in (A1),

(u, Su) + 2(u, Tx) = (u + S^{-1}Tx, S(u + S^{-1}Tx)) - (x, T'S^{-1}Tx)

one sees that the u which minimizes (A1) is given by the u which minimizes

(u + S^{-1}Tx, S(u + S^{-1}Tx))                    (A2)

Since S is positive definite, (A2) is minimized by

u = -S^{-1}Tx                    (A3)

Now consider the case where x is a random variable and it is desired to minimize

I(y) ≜ E{(u, Su) + 2(u, Tx) | y}                    (A5)

with respect to a deterministic u(y), where y is another random variable. (See Appendix I at the end of this book for a more general discussion of conditional expectations.) Then again, by completing the square in (A5),

I(y) ≥ min_u E{(u + S^{-1}Tx, S(u + S^{-1}Tx)) - (x, T'S^{-1}Tx) | y}
     = -E[(x, T'S^{-1}Tx) | y] + min_u E[(u + S^{-1}Tx, S(u + S^{-1}Tx)) | y]                    (A6)

Defining another random variable w by

w = -S^{-1}Tx                    (A7)

one sees that the u which minimizes I(y) is the same u which minimizes

min_u E[((u - w), S(u - w)) | y]                    (A8)


Equation (A8) can be rewritten by defining

ŵ = E(w | y)                    (A9)

min_u E[((u - ŵ + ŵ - w), S(u - ŵ + ŵ - w)) | y]
  = min_u {((u - ŵ), S(u - ŵ)) + E[((ŵ - w), S(ŵ - w)) | y]}                    (A10)

where the cross term vanishes since u - ŵ is a function of y and E(ŵ - w | y) = 0. Thus one sees that

u = ŵ                    (A11)

minimizes (A10) and (A5). The minimizing u is given from (A7), (A9), and (A11) by

u* = -Λx̂                    (A12)

where

Λ = S^{-1}T                    (A13)

and x̂ = E(x | y). The minimal value of (A5) is given, then, from (A6), (A7), and (A10), by

I(y) = -E[(x, T'S^{-1}Tx) | y] + E[((ŵ - w), S(ŵ - w)) | y]
     = -E[(x, T'S^{-1}Tx) | y] + E[((x - x̂), T'S^{-1}T(x - x̂)) | y]
     = -(x̂, T'S^{-1}Tx̂)                    (A14)

Note that u* given by (A12) is such that it satisfies the equation E{((u - u*), S(u* - w)) | y} = 0 for any u. This fact is sometimes referred to as the orthogonality principle. See also Chapter V for other instances where the orthogonality principle is applied.
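A quick Monte Carlo check of (A12) and of the orthogonality property, for a jointly Gaussian pair (x, y) in which E(x | y) is linear and known exactly; all matrices below are made-up test data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
S = np.array([[2.0, 0.3], [0.3, 1.0]])   # positive definite weight
T = rng.standard_normal((n, n))
Lam = np.linalg.inv(S) @ T               # Eq. (A13)

# x = G y + e with y, e independent Gaussians, so E(x | y) = G y exactly.
G = rng.standard_normal((n, n))
y = rng.standard_normal((5000, n))
e = 0.5 * rng.standard_normal((5000, n))
x = y @ G.T + e

u_star = -(y @ G.T) @ Lam.T              # u* = -Lam E(x|y), Eq. (A12)
w = -x @ Lam.T                           # w = -S^{-1} T x, Eq. (A7)

# Orthogonality: E{((u - u*), S(u* - w)) | y} = 0 for any policy u(y).
u = y @ rng.standard_normal((n, n)).T    # an arbitrary competing policy
inner = np.einsum('ij,ij->i', u - u_star, (u_star - w) @ S.T)
print(inner.mean())                      # near zero, up to sampling error
```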

Appendix B. Use of Pseudoinverse in Minimizing a Quadratic Form

Consider

I(u) = (u, Su) + 2(u, Tx) + (x, Rx)                    (B1)

where S is symmetric and positive semidefinite. We know from Appendix A that when S^{-1} exists, I is minimized by choosing u to be

u* = -S^{-1}Tx                    (B2)


and (B1) becomes

I = (x, (R - T'S^{-1}T)x) = ||x||²_{R - T'S^{-1}T}                    (B3)

When S^{-1} does not exist, we will see that the u* given by

u* = -S⁺Tx                    (B4)

minimizes I, where S⁺ is the pseudoinverse of S. Pseudoinverses are discussed in Appendix II at the end of this book. As is discussed there in detail, when pseudoinverses are involved the minimizing u is not usually unique, unless one imposes an additional condition such as the condition that ||u|| also be minimal. The u given by (B4) is the one with the minimum norm. One can write

u = u_1 + u_2

where

u_1 ∈ R(S)   (range space of S)

and

u_2 ∈ N(S)   (null space of S)

Since S is symmetric, R(S) and N(S) are orthogonal and we have

||u||² = ||u_1||² + ||u_2||²

To derive (B4) we will rewrite (B1) so that it includes a term

||u + S⁺Tx||²_S ≥ 0                    (B5)

and a term independent of u. Then (B5) is never negative, and it vanishes when u = u*; it will then be seen that I(u) is minimized by the u* of (B4). Using the identities (Ref. 142)

S⁺SS⁺ = S⁺                    (B6a)
(S⁺)' = (S')⁺ = S⁺                    (B6b)

one can write

I(u) = ||u + S⁺Tx||²_S + ||x||²_{R - T'S⁺T} + 2(Tx, (I - S⁺S)u)                    (B7)

In (B7), note that

(I - S⁺S)u = u_1 + u_2 - S⁺S(u_1 + u_2) = u_1 + u_2 - u_1 = u_2


since S⁺S restricted to R(S) is the identity (Ref. 142). Also, the first term of (B7) vanishes if and only if

u_1 = -S⁺Tx

From the requirement of the minimal norm of u one has

u_2 = 0

Thus,

u* = -S⁺Tx

Let

APPENDIX C. CALCULATION OF SUFFICIENT STATISTICS

77

After integration with respect to xi,

J exp (-

+-I

dx,

=

const exp

(C4)

where

and where MZl

= Q;’

-

Q;’A,niA/QF1

(C6)

By substituting (C2) into (C6) and using the identity of Appendix D, Mzl1 can be shown to be equal to

MT21 = (Qi

+ AiTiA;)-’

(C7)

T h e first term I/ xifl - A i p , - B,U~II&;;~ in (C5) is the result of evaluating JP(X,+~ 1 xi , u,) p(x, j y t ) dx, = ~ ( x , , I ~y”. Therefore, from ( C 5 ) one sees that the conditional distribution of xifl given yi is normal with and

E(x,+1 I Y 2 ) = AtPt

+ Btu, + AtrtAt’

cov(xz+l I Y’) = Mz+1 = Qt

T h e expression E,‘ of (C5) can be rewritten as

(C8) (C9)

78

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

where

TZl

g M;:

+ Hi+lR&Hi+l

Hence

Ki+l 6 ri+lHt!+lR;:l

((34)

is the optimal gain of the filter. There is another expression for pi+l given by Pi+l = G+W%!+lR$lYi+l

+ M,-:,(AiPi + mi)]

(CW

since ri+1 =

(M21

+ fc+lqlHi+l)-l

by definition (C11). An alternate expression for Ti+l can be obtained directly from its definition = cov(xi+,l yi+l). We do this by computing Ti5 cov(xi) and noting that Ti= Ti for all i. Since pi+l = Aipi Biui Ki+l(yi+i- Hi+i(&i Biui)), where &+I is the gain of the filter, the estimation error satisfies I

+

+

+

79

APPENDIX D. MATRIX IDENTITIES

therefore, noting that EZ+ = 0 for all i, E(X“i+lX”C+l) G C+l = ( I - Ki+lHi+l)(Aimi’ @)(I - K+lHi+l)’

+

+ Ki+&+&+l

(C16)

Note that the expression for pi+lgiven by (C16) is valid not only for the optimal filter gain Ki+, = ri+lHi+lR;tl but for any arbitrary gain. in (C16), By completing the square in Ki+,

C+l = (KZ+l - K,*,l)C,(Ki+l e + d ’+ [ I - G+&+lIK+l -

K,*,, & Mi+lHi+lCC1

ci 2 Hi+lMi+lHi+l + Ri+1

(C18a)

(C18b)

Thus the norm of pi+, is minimized by choosing Ki+l to be equal to K:+l since Ci is positive definite. By the matrix identity in Appendix D and by the mathematical induction we see that

-

Ti= Ti

for all

i

(C19)

T h e equivalence of (C18a) and (C14) is established by means of the matrix identity of Appendix D. Now IW~+~H~+~CL’ - ri+lH,!+lR;:l =

M~+~H~!+ ~( cM; ~~ + ~~ i

=

M,+,Ht!+,,[C;l - R;:

=

Mi+lH&l(Cil[Ri+i ~c+lMi+lHt’+lIRG1l - K;i}

=

Mi+lH,!+l{CYIC$T~l- RT,!’}

Appendix D.

+

+ l ~ ~ ! + l ~ ~ l ~ , + ~ ~ i + l ) ~ t !

+ C;lHi+lMi+lHl+lRzl] =0

Matrix Identities

T h e following matrix identities are often useful in obtaining equivalent, computationally convenient expressions for error-covariance matrices and gain matrices.

+

( A BCB’)-1 ( A + BCB’)-1

=

A-1 - A-lBC(C

=

A-’ - A-1B(C-1

+ CB’A-lBC)-lCB’A-l + B’A-lB)-lB’A-l

80

11.

OPTIMAL CONTROL OF STOCHASTIC SYSTEMS

where the indicated inverses are assumed to exist. These are due to Hou~eholder.’~ T h e proof is by direct substitution. There are similar identities involving pseudoinverses: A+ A+

+ BCB‘ = [ A + BCB’ = [ A

ABC(C + CB’ABC)-’CB’A]+ - AB(C+ + B’AB)-’B’A]f

-

where the indicated inverses are assumed to exist. These identities are due to F a r r i ~ o n . ~ ~ Another useful formula is the expression for the inverse of a matrix in terms of submatrices:
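These identities are easily checked numerically; the sketch below verifies the second Householder form and the partitioned-inverse (Schur complement) formula with random test matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 4, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = A @ A.T                              # positive definite, hence invertible
B = rng.standard_normal((n, k))
Cm = np.eye(k)

# Second Householder identity: (A + B C B')^{-1}.
lhs = np.linalg.inv(A + B @ Cm @ B.T)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(Cm) + B.T @ Ai @ B) @ B.T @ Ai
assert np.allclose(lhs, rhs)

# Partitioned inverse via the Schur complement Delta = D - C A^{-1} B.
P = rng.standard_normal((n + k, n + k))
P = P @ P.T + np.eye(n + k)              # invertible test matrix
A11, A12 = P[:n, :n], P[:n, n:]
A21, A22 = P[n:, :n], P[n:, n:]
A11i = np.linalg.inv(A11)
Di = np.linalg.inv(A22 - A21 @ A11i @ A12)
top = np.hstack([A11i + A11i @ A12 @ Di @ A21 @ A11i, -A11i @ A12 @ Di])
bot = np.hstack([-Di @ A21 @ A11i, Di])
assert np.allclose(np.vstack([top, bot]), np.linalg.inv(P))
```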