INFORMATION SCIENCES

A Contribution to the Theory of Optimization with Markovian Controls†

RAY CHOW AND S. SANKAR SENGUPTA

Department of Management Sciences, University of Waterloo, Waterloo, Ontario, Canada

Communicated by Mihajlo D. Mesarovic

ABSTRACT

Conditions have been obtained in Theorems 1 and 2 for the solution u*(·) of the problem

minimize J(u) = ∫ L(x, u) dt subject to dx/dt = f(x, u), x(t0) = x0

to be characterized by a non-linear, first-order differential equation, called the optimal differential equation. These conditions are not affected by the imposition of constraints, such as c1 ≤ u ≤ c2. But the trajectory of u*(t) has to be reinterpreted, and the durations of time during which u*(t) makes a sojourn between the boundaries and stays on the boundaries have to be computed. The uniqueness of these durations is considered in Theorem 3. The optimal differential equation can always be discretized in time and phase, to obtain the system of difference equations u*_{k+1} = F_l(u_k*) for u_k* ∈ I_l, an interval in the phase space. It is shown that when the F_l(·) are linear, the system is a Markov chain. Some analysis is made of the efficiency of discretization in terms of certain invariants of the system; the theorems relating to this analysis are stated without proof.

INTRODUCTION

Pontryagin's method [1] for determining a control function u(t) such that

J(u) = ∫_{t0}^{t1} L(x, u, τ) dτ  (1)

† The preparation of this paper was partially supported by Grant No. A7416, N.R.C. of Canada and Grant No. 68-1130, Canada Council.

Information Sciences 3 (1971), 59-75. Copyright © 1971 by American Elsevier Publishing Company, Inc.


is minimized subject to the state dynamics

dx/dτ = f(x, u, τ),  x(t0) = x0,  (2)

leads to an ordinary minimization problem, namely,

max_u H(x, u, p, τ)  (3)

where

H(x, u, p, τ) ≡ −L(x, u, τ) + p·f(x, u, τ).

The magnitude, u*, of the optimal control at any instant of time depends on the magnitudes of x and p at that instant of time. An interesting problem is to determine whether or not one can express u* at any instant in terms of u* at some previous instant(s) of time, i.e., whether or not the u* appropriate to a given problem can be characterized as the solution to a well-defined differential or difference equation, preferably of the first order. It is clear that not all control problems will possess the characteristics which permit such a determination of u*(·). The task of the present note is to examine certain sufficient conditions which characterize the class of control problems in the indicated manner. The practical significance of the problem we have posed here may be appreciated in light of the following considerations. If u* can be characterized by means of a (non-linear) differential equation, then the sequence of optimal controls can be determined automatically, once the initial magnitude of u* has been computed; in other words, there will be no need to calculate u* from the state and costate variables at every instant of time. In particular, one need not assume that precise measurement of u* is necessary. It is possible to discretize the differential equation for u* in time, such as

u*(t + 1) = F(u*(t)),

(4)

and to discretize the state-space of u*, such that the mapping F generates certain transition probabilities,

p_ba ≡ prob[u*(t + 1) ∈ a | u*(t) ∈ b].

(5)

The second purpose of this note is to present some sufficient conditions for F(·) to generate Markov transition probabilities. The practical significance of securing a Markov chain should be clear: the optimal control can be realized as a random device which will generate the probabilities with which the optimal control action of the next period will be determined, given the present state of the optimal control.

1. STRUCTURAL CHARACTERISTICS

We will determine those structural properties of a control process which make it possible to characterize the optimal control, u*, as the solution to


a differential equation, preferably of the first order. The general method is suggested by a careful analysis of the structure of Pontryagin's method.

1.1. Pontryagin's Maximum Principle states that, if u is a control action governing the state dynamics

dX/dt = f(X, u, t),  u = {u_1(t), ..., u_m(t)},  X = {x_1(t), ..., x_n(t)},  m ≤ n,  (1.1)

and if the control objective is one of minimizing

J(u) ≡ ∫_{t0}^{t1} L(X, u, τ) dτ,  (1.2)

then the optimal control u* is given by maximizing

H(X, u, p, t) = −L + p·f,  (1.3)

i.e.,

H(X*, p*, u*, t) = max_{u∈U} H(X*, p*, u, t),  (1.4)

the starred functions denoting the respective optimal trajectories. Furthermore, for all u (and, hence, u*) the Hamiltonian H(X, u, p, t) is to satisfy the pair of conjugate equations,

dx_i/dt = ∂H/∂p_i,  dp_i/dt = −∂H/∂x_i.  (1.5)

The standard solution procedure is to determine a relationship between u and p such that the Hamiltonian is maximized. The steps are: (i) solve for u in terms of p, (ii) substitute this expression for u into the set of conjugate equations (1.5), (iii) solve the latter to satisfy a set of mixed boundary conditions, and (iv) use the associated optimal state and costate trajectories X*(t) and p*(t) to compute the optimal control function u*(t). The precise manner in which we will exploit this general procedure is the following: after determining the relationship between u and p which maximizes the Hamiltonian, we will substitute p (not u, as in the standard procedure) into the conjugate equations. If the Hamiltonian, i.e., the associated physical process, possesses certain characteristics (to be presented in Theorem 1), then we can derive the required differential equation. Since our aim is to exhibit Markov probabilities, and since these are most easily studied for the case of a single measurable function, the following analysis will be restricted to control problems involving only one state variable. Also, for the sake of simplicity, we will consider processes whose dynamics is described by an autonomous differential equation.

1.2. In the following theorem we present a simple condition for an optimal control u*(t) to be characterized by means of a well-defined differential equation.
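The substitution scheme just described can be exercised on a toy problem (the dynamics dx/dt = −x + u, the cost L(x, u) = x + u², and all numbers below are our own illustrative choices, not from the paper). Here ∂H/∂u = 0 gives p = 2u, and substituting p, rather than u, into the costate equation dp/dt = ∂L/∂x − p ∂f/∂x = 1 + p yields du*/dt = (1 + 2u*)/2, an equation free of x and p; a short numerical sketch confirms that integrating this equation reproduces u* = p/2 obtained from the costate trajectory:

```python
import math

# Toy problem (illustrative, not from the paper):
#   dx/dt = -x + u,   L(x, u) = x + u**2
# Maximum Principle: dH/du = -2u + p = 0  =>  p = 2u.
# Substituting p = 2u (not u) into the costate equation
#   dp/dt = dL/dx - p * df/dx = 1 + p
# gives the optimal differential equation du*/dt = (1 + 2u*)/2,
# which contains neither x nor p.

def rk4(f, y0, t0, t1, n):
    """Integrate dy/dt = f(y) from t0 to t1 with n RK4 steps."""
    h = (t1 - t0) / n
    y = y0
    for _ in range(n):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

u0 = 0.0
# Route 1: the derived optimal differential equation in u* alone.
u_direct = rk4(lambda u: (1.0 + 2.0 * u) / 2.0, u0, 0.0, 1.0, 1000)
# Route 2: integrate the costate equation and recover u* = p/2.
p_final = rk4(lambda p: 1.0 + p, 2.0 * u0, 0.0, 1.0, 1000)
u_via_costate = p_final / 2.0
# Closed form: p(t) = (p0 + 1) e^t - 1, so u*(1) = (e - 1)/2.
u_exact = (math.e - 1.0) / 2.0

print(u_direct, u_via_costate, u_exact)
```

The two routes agree because, for this structure, the costate equation contains no state variable; this is exactly the situation Theorem 1 below isolates.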



THEOREM 1. Let K_u be the class of control processes with the autonomous state dynamics equation

dx/dt = g(x) + s(x) h(u)  (1.6)

and with the functional

J(u) ≡ ∫_{t0}^{t1} [m(x) + s(x) N(u)] dτ  (1.7)

to be minimized. Let (i) g(x), m(x) and s(x) be polynomials in x of an order not exceeding 2, and let s(x) not vanish identically, (ii) u(·) be continuous and not subject to any constraints, and (iii)

H(x, u, p) ≡ −m(x) − s(x) N(u) + p[g(x) + s(x) h(u)]  (1.8)

be strictly convex in u, continuous, and differentiable. Then a unique differential equation in u* can be obtained, and a solution of this differential equation is the optimal control function.

Proof. Consider a single-variable autonomous process with state dynamics

dx/dt = f(x, u)  (1.9)

and the criterion to be minimized,

J(u) ≡ ∫_{t0}^{t1} L(x, u) dt,  (1.10)

and let it be required to choose some u = u*(t) which optimizes (1.10) subject to (1.9). The Hamiltonian associated with this problem is H(x, u, p) ≡ −L(x, u) + p f(x, u). Since, according to the hypothesis, H is continuous, differentiable, and convex in u, the Maximum Principle requires the optimal u to satisfy the equations

∂H/∂u = −∂L/∂u + p ∂f/∂u = 0,  (1.11)

dp/dt = −∂H/∂x = ∂L/∂x − p ∂f/∂x.  (1.12)

Because u is assumed to be unconstrained and because H is strictly convex in u, the u(t) determined from (1.11)-(1.12) is unique and optimal.


Now, observe that, if ∂f/∂u does not vanish for all u, then (1.11) can be solved for p, to obtain

p = (∂L/∂u)/(∂f/∂u),  dp/dt = d/dt [(∂L/∂u)/(∂f/∂u)].

Substituting these in (1.12), one obtains the identity

d/dt [(∂L/∂u)/(∂f/∂u)] = ∂L/∂x − [(∂L/∂u)/(∂f/∂u)] ∂f/∂x,

which can be written in a more symmetric form, namely,

(∂f/∂u) d/dt [(∂L/∂u)/(∂f/∂u)] = (∂f/∂u)(∂L/∂x) − (∂f/∂x)(∂L/∂u).  (1.13)

Writing g(x) + s(x) h(u) for f(x, u) and m(x) + s(x) N(u) for L(x, u), the expression (1.13) becomes

(∂h/∂u) d/dt [(∂N/∂u)/(∂h/∂u)] = [∂m/∂x + N(u) ∂s/∂x] ∂h/∂u − [∂g/∂x + h(u) ∂s/∂x] ∂N/∂u.  (1.14)

Let us now examine the implications of hypothesis (i). Suppose that g(x), m(x), and s(x) are polynomials of order one or less in x; it follows that (1.14) is a first-order differential equation in u, and this u is u* because it satisfies the Maximum Principle. Next, suppose that one, or all, of the functions g(x), m(x), and s(x) are second-order polynomials in x. Then, again, (1.14) is solvable for x in terms of u and du/dt to obtain, say,

x = G(u, du/dt).  (1.15)

Between (1.15), its time-derivative, and the state dynamics equation we can make the necessary elimination and obtain a second-order differential equation in u*, of the form

(d/dt) G(u*, du*/dt) = f(G(u*, du*/dt), u*).

This completes the demonstration. The proof shows that there must be some restrictions on the relationship between N(u) and h(u). One aspect of the restrictions is already given in a hypothesis of the theorem, namely, that the Hamiltonian must be convex in u. In addition, we see from identity (1.14) that a first-order differential equation is obtained whenever (a) ∂h/∂u is not identically zero, i.e., h(u) is at least linear in u, and (b) (∂N/∂u)/(∂h/∂u) is not identically a constant.

1.3. The proof of Theorem 1 shows that, in order to obtain a first-order differential equation in u*, the Hamiltonian must be a linear function of the



state variable x, for only then will (1.14) be independent of x. The converse of Theorem 1 can similarly be proved. However, in this case, the necessary conditions on the structure of the dynamic system and the minimizing criterion can be relaxed to a certain extent. This is the content of

THEOREM 2. Let

(a) Φ(du*/dt, u*) = 0,  or  (b) Φ(d²u*/dt², du*/dt, u*) = 0  (1.16)

be a differential equation in u*, whose solution is the optimal control function for a system

dx/dt = f(x, u)  (1.9)

with the criterion functional

J(u) = ∫_{t0}^{t1} L(x, u) dt.  (1.10)

Then f(x, u) and L(x, u) are of the form

f(x, u) = g(x) + s(x) h(u),  L(x, u) ≡ m(x) + s(x) N(u),  (1.17)

in which h(u) and N(u) are nowhere vanishing and in which the functions g(x), m(x), and s(x) are polynomials in x.

Proof. Since there is a differential equation in u*, it follows that u(t) must be continuous over some domain U × T and differentiable there, which in turn implies that (a) L(x, u) is not identically a constant, and (b) the Hamiltonian H = −L(x, u) + p f(x, u) is not linear in u, for, otherwise, it can readily be proved that the optimal control would be a piecewise-constant control function. One can now proceed as in the proof of Theorem 1 and derive the identity

(∂f/∂u) d/dt [(∂L/∂u)/(∂f/∂u)] = (∂f/∂u)(∂L/∂x) − (∂f/∂x)(∂L/∂u).  (1.13)

Now, depending on the structure of f(x, u) and L(x, u), one can rewrite (1.13) in any one of the following equivalent forms:

(a) Q(dx/dt, x, du/dt, u) = 0,  or  (b) Q*(x, u, du/dt) = 0,  or  (c) Q(du/dt, u) = 0.  (1.18)

The problem, then, is to determine the structure of f(x, u) and L(x, u) such that (1.18) can be reduced to either of the forms (1.16).


Equation (1.18c), Q(du/dt, u) = 0, is a first-order differential equation (such as (1.16a)), and one may easily verify that it can be so only if (1.13) is independent of x. In other words, f(x, u) and L(x, u) must take one of the following forms, namely,

f(x, u) = a1 x + a2 + (a3 x + a4) h(u),  or  b1 x + b2 + h(u),  or  (c1 x + c2) h1(u),  or  h2(u);

L(x, u) = d1 x + d2 + (d3 x + d4) N(u),  or  r1 x + r2 + N1(u),  or  (k1 x + k2) N2(u),  or  N3(u);

or, more generally,

f(x, u) ≡ g(x) + s(x) h(u),  L(x, u) ≡ m(x) + s(x) N(u),  (1.17)

with g(x), m(x) being zero or at most linear in x, and s(x) being either a constant or at most linear in x. Now observe that (1.18b), Q*(x, u, du/dt) = 0, must be solvable for x so that, together with the state equation dx/dt = f(x, u), the state variable x can be eliminated. Now, since (1.18b) does not depend on dx/dt, upon an examination of (1.13) it can be verified that f(x, u) and L(x, u) must have the general form (1.17). Between (1.17) and (1.13) it is possible to determine

x = G(du/dt, u),  (1.19)

and, then, between (1.19), its time-derivative, and the state equation we can obtain a second-order differential equation in u, as required. Finally, in order to obtain from (1.18a) a differential equation in u, one should be able to eliminate x and dx/dt. But then there is only one other equation, given by the state dynamics, which contains dx/dt, x, and u. Consequently, it would be impossible to eliminate both x and dx/dt unless they are dependent, which, clearly, is not the case. Thus a differential equation in u* can be obtained only if the system is of the form

dx/dt = g(x) + s(x) h(u),  J(u) = ∫_{t0}^{t1} [m(x) + s(x) N(u)] dt.

This completes the proof of Theorem 2.

1.4. Theorems 1 and 2 present some necessary and sufficient conditions for the Hamiltonian to generate an ordinary differential equation in the optimal control u*. Although specific to the "one-state, one-control" case, the proofs suggest how the theorems can be extended to the "vector" or multidimensional case. Consider a multidimensional system with n state variables



and m (0 < m ≤ n) control variables. Certainly, a necessary condition for a system of first-order ordinary differential equations in u1*, ..., um* is that the Hamiltonian of the system be a linear function of the state variables, and such that they are separable and/or factorable from the control functions. For only then will the set of control equations be free of the state variables. The second condition, which is not as obvious, can be derived as follows. Applying the Maximum Principle and assuming that the Hamiltonian satisfies the conditions of Theorem 1, one obtains three sets of equations, consisting of the m equations of optimality and the 2n adjoint equations,

(a) ∂H/∂u_i = 0,  1 ≤ i ≤ m,
(b) dx_j/dt = ∂H/∂p_j,  1 ≤ j ≤ n,  (1.20)
(c) dp_j/dt = −∂H/∂x_j,  1 ≤ j ≤ n.
Following the method of Theorem 1, m additional equations are obtained by taking the time-derivatives of (1.20a). According to the assumptions relating to H, the equations (1.20b) are not required for deriving first-order differential equations, and the equations (1.20c) are independent of the state variables. Therefore, there is a total of 2m + n equations in 2m + 2n variables. It is clear that, in order to eliminate the 2n state and costate variables by algebraic manipulations alone, there must be at least 2n + 1 equations, so that we require 2m + n > 2n. However, in order that the resulting set of differential equations be solvable, there must be m of them. Consequently, in view of the requirement that 2m + n > 2n, it follows that

2m + n − 2n = m,  i.e.,  m = n.  (1.21)

This result simply means that, in order to obtain, for a multidimensional system, a set of solvable ordinary first-order differential equations in u*, there must be an equal number of state and control variables. This, however, does not preclude the possibility that one may yet obtain a higher-order ordinary differential equation in u* when m < n. Consider, for instance, a case in which there are two state variables and one control variable, e.g.,

dx1/dt = x2,  dx2/dt = −x1 + u,

and the criterion functional is

J(u) = ∫_0^{2π} u² dt.

Here the Hamiltonian is H = −u² + p1 x2 + p2(−x1 + u). The u which maximizes H is given by ∂H/∂u = 0, i.e., u* = p2/2, and the costate equations are

dp1/dt = −∂H/∂x1 = p2,  dp2/dt = −∂H/∂x2 = −p1.
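The costate pair above is linear and can be checked numerically (the step size, horizon, and initial values p1(0) = 0, p2(0) = 1 below are illustrative assumptions): integrating dp1/dt = p2, dp2/dt = −p1 gives p2(t) = cos t, so the control u* = p2/2 executes simple harmonic motion.

```python
import math

# Numerical check (step size, horizon, and initial values are illustrative):
# integrate the costate pair dp1/dt = p2, dp2/dt = -p1 with p1(0) = 0,
# p2(0) = 1; then p1(t) = sin t and p2(t) = cos t, so u* = p2/2 is simple
# harmonic motion, i.e. it satisfies d^2 u*/dt^2 = -u*.

def rk4_step(p1, p2, h):
    f = lambda a, b: (b, -a)  # (dp1/dt, dp2/dt)
    k1 = f(p1, p2)
    k2 = f(p1 + 0.5 * h * k1[0], p2 + 0.5 * h * k1[1])
    k3 = f(p1 + 0.5 * h * k2[0], p2 + 0.5 * h * k2[1])
    k4 = f(p1 + h * k3[0], p2 + h * k3[1])
    p1 += (h / 6.0) * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    p2 += (h / 6.0) * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p1, p2

p1, p2 = 0.0, 1.0
h, n = 0.001, 1000  # integrate to t = 1
for _ in range(n):
    p1, p2 = rk4_step(p1, p2, h)

print(p2, math.cos(1.0))  # u*(1) = p2/2 follows the cosine
```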


The desired differential equation is seen to be d²u*/dt² = −u*. This concludes our discussion of the multidimensional case; from now on we will consider only the single-variable case, and the optimal differential equation will be written as

du*/dt = φ(u*),  (1.22)

which is a version of (1.16a).

1.5. Even for the single-variable case, there is an important practical issue which relates to the stable critical points, i.e., the real values of u* for which du*/dt = φ(u*) = 0. Consider, first, the formal integral

∫_{t0}^{t1} dt = ∫_{u*(t0)}^{u*(t1)} dz/φ(z),  𝒰: u*(t0) ≤ u* ≤ u*(t1),

and suppose that φ(u*) = 0 for some u* = û ∈ 𝒰. This implies that the integrand 1/φ(z) tends to grow without bound as the motion approaches û. But this cannot be allowed, because ∫_{t0}^{t1} dt is a finite interval of time. In other words, φ(·) must not be allowed to have real zeros in 𝒰. An additional reason for this requirement may be seen from the fact that, if φ(u*) were to tend to zero, then, since

dx*/du* = f(x*(u*), u*)/φ(u*)

(which is a consequence of (1.9) and (1.22)), the response of x* to u* would tend to be extremely sensitive. From now on, therefore, it will be implicitly assumed that the "original" problem ((1.6)-(1.7)) has been formulated so that, in the time-interval in question, φ(u*) does not have real zeros.
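The finiteness of the formal integral is easy to check numerically once φ has no real zeros on the interval of motion. A sketch with the illustrative choice φ(u) = u + 1 on 0 ≤ u* ≤ 1 (not from the paper):

```python
import math

# Illustrative choice (not from the paper): phi(u) = u + 1, with no real zero
# on the interval of motion [0, 1].  The sojourn time of u* from 0 to 1 is
#   t1 - t0 = integral_0^1 dz / phi(z) = ln 2,
# which is finite precisely because phi does not vanish on the interval.

def simpson(f, a, b, n=1000):
    """Composite Simpson quadrature (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3.0

phi = lambda u: u + 1.0
sojourn = simpson(lambda z: 1.0 / phi(z), 0.0, 1.0)

print(sojourn, math.log(2.0))  # the formal integral equals ln 2 here
```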

2. THE CASE OF THE CONSTRAINED CONTROLS

2.1. When constraints are imposed on the magnitudes of control action, it is clear that (1.11) is no longer applicable everywhere on U × T; in particular, the optimal u will not satisfy (1.11) at all instants of time. If at some instant of time a magnitude ū of u is known to be optimal for (1.6)-(1.7) and satisfies the constraint

C1 ≤ ū ≤ C2,  (2.1)

then ū will certainly satisfy (1.11). It is clear that the solution of the optimal differential equation will coincide with the magnitude ū at the instant in question. Thus it can be concluded that, if the solution of the unconstrained-optimal differential equation continues to fulfil the constraint (2.1) over a certain interval of time, then, over that interval, it must also be optimal for the constrained case. The optimal u (see Figure 1) during the intervals t0 t1



and t3 t5 are the solutions of the optimal differential equation with appropriate initial conditions.

2.2. Now, if at some instant the solution of the unconstrained-optimal differential equation violates a constraint (e.g., at t1+ and t5+), then, because the Maximum Principle must apply at all times (i.e., the Hamiltonian must be at a maximum at all times), the optimal magnitude of u must be its boundary value. But this does not mean that the solution for the constrained case can be obtained by truncating portions of the solutions of the unconstrained-optimal differential equation which do not agree with the constraints. The reason is that, when u is at one of the boundaries, the trajectories of x(t) and p(t), calculated from the pair of adjoint equations (1.5), will, in general, differ from

[Figure 1 appears here: trajectories of the unconstrained (solid) and constrained (dashed) optimal control u against time t, with the boundary value C2 and the instants t0, t1, ..., t6 marked.]

FIGURE 1. Constrained and unconstrained u*(t).

the optimal trajectory determined without imposing any constraints on u. In particular, since the calculation of the optimal u depends on the magnitude of p, it follows that the interval of time during which u* is at the boundary (e.g., t1 t3 or t5 t6, as in Figure 1) will not necessarily be equal to the corresponding interval t1 t2 of the unconstrained case.

2.3. It is important to determine the length of time during which u* will be at one of the boundaries. Evidently, the calculation rests on the assumption that the hypotheses of Theorems 1-2 are fulfilled.

THEOREM 3. If a process, with state dynamics

dx/dt = g(x) + s(x) h(u),  u ∈ U: C1 ≤ u ≤ C2,  (2.2)

is to be controlled so as to minimize

J(u) ≡ ∫_{t0}^{t1} [m(x) + s(x) N(u)] dτ,  (2.3)

and if the functions g(x), m(x), s(x) and the Hamiltonian H(x, u, p),

H(x, u, p) ≡ −m(x) − s(x) N(u) + p[g(x) + s(x) h(u)],  (2.4)


satisfy the conditions of Theorem 1, then the duration of time over which u* is at either one of the boundaries can be uniquely determined.

Proof. We know from Theorem 1 that, under the hypotheses made above, there is an optimal differential equation if we ignore the constraint on u. So, the proof will consist of an enumeration of cases in which the constraint does not affect the optimal differential equation.

Case 1a: g(x) and/or m(x) is linear or constant in x, and s(x) is a constant. In this case, the Hamiltonian is

H(x, u, p) ≡ −a1 x − N(u) + p(a2 x + h(u))  (2.5)

and the costate equation is

dp/dt = −∂H/∂x ≡ ψ1(p),  (2.6)

indicating that the trajectory of the costate variable p is independent of changes in u. On the other hand, as may be verified from (2.5), the magnitude of u which maximizes H does depend on p. It follows, therefore, that, if the solution p*(t) of (2.6) is such that the u** ≡ u[p*(t)] which maximizes H lies within the constraints, then it must also be the same as the one obtained as the solution of the unconstrained-optimal differential equation, du*/dt = φ(u*); for, otherwise, the optimal magnitude of u will be on the boundary. In other words, the trajectory of the optimal u for the case on hand will be the truncated trajectory of the unconstrained-optimal differential equation.

Case 1b: s(x) is a first-order polynomial. Here the Hamiltonian will be of the form

H(x, u, p) ≡ −a1 x − (a2 x + a3) N(u) + p[a4 x + (a2 x + a3) h(u)],  (2.7)

so that the costate equation will be a function of u and p, i.e.,

dp/dt = ψ(u, p).  (2.8)

Suppose that C1 < u* < C2 holds at some instant of time. Then, clearly, the u which maximizes H(x, u, p) can also be determined by straightforward differentiation:

−∂N/∂u + p ∂h/∂u = 0.  (2.9)

Equations (2.9) and (2.8), with appropriate initial conditions, are sufficient to generate the trajectories of both u(·) and p(·). But note that the same set of equations is employed to determine the optimal differential equation in u. Thus, if the solution u1* of the optimal differential equation at some instant satisfies the constraints, then u* = u1* necessarily. If, however, the u1* for the succeeding period is outside one of the boundaries, say C2, then u* will be at the boundary C2. During this period, u* is constant and the trajectory of p(·) is changed from (2.8) to

dp/dt = ψ1(p).  (2.10)


Let the magnitudes of p at the instants when u* meets or departs from the boundary (for instance, t1 and t3 in Figure 1) be called critical values of p. Then points of discontinuity of u occur when a critical value of p is encountered and, therefore, the time interval between successive critical values of p determines (a) the duration of stay of the optimal u at one of the boundaries, and (b) the duration of stay within the boundaries. From what we have just seen, the critical values (see Figure 1), say p1, p3, p5, and p6, occur at t1, t3, t5, and t6, respectively. The magnitudes of these critical values are determinable from (2.9) by substituting u = C2 (to find p1 and p3) and u = C1 (to find p5 and p6). The p(t)-trajectory between, say, p1 and p3 is confined to two successive p's determined by the same boundary value of u and can be obtained from (2.10). Therefore, the duration of stay at the boundary C2 is

t = ∫_{p1}^{p3} [1/ψ1(p)] dp.  (2.11)
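The integral (2.11) can be evaluated by any quadrature rule. In the sketch below all numbers are illustrative assumptions (a1 = a2 = 1 in Case 1a, so that ψ1(p) = 1 − p, with hypothetical critical values p1 = 0.2, p3 = 0.8; none of these come from the paper), and the quadrature is checked against the closed form ln((1 − p1)/(1 − p3)) = ln 4:

```python
import math

# Sketch of (2.11) with illustrative numbers only: take a1 = a2 = 1 in Case 1a,
# so the costate equation is dp/dt = psi1(p) = 1 - p, and take hypothetical
# critical values p1 = 0.2, p3 = 0.8 (none of these come from the paper).

def simpson(f, a, b, n=1000):
    """Composite Simpson quadrature (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3.0

psi1 = lambda p: 1.0 - p
p_1, p_3 = 0.2, 0.8
duration = simpson(lambda p: 1.0 / psi1(p), p_1, p_3)  # time spent at C2

print(duration, math.log(4.0))  # closed form: ln((1 - p1)/(1 - p3)) = ln 4
```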

Similarly, the trajectory of p(t) between two successive encounters with the boundaries, say t3 and t5 (see Figure 1), is confined to two successive critical values of p determined by C1 and C2 and is given by (2.8). Substitution of p3 and p5 into the solution of (2.8) yields the required length of time. A simpler approach (which applies in the present case) is to solve the integral

t = ∫_{C1}^{C2} [1/φ(u)] du  (2.12)

obtained from the unconstrained-optimal differential equation.

Case 2: One or more of the functions g(x), s(x) and m(x) are polynomials of the second order. In this case, as we have seen in Theorem 1, the unconstrained-optimal differential equation will be of the second order. Applying the analysis which was employed in the preceding cases, it is seen that there will be intervals of time during which the optimal u is constrained at one of the boundaries, and there will be intervals of time during which the optimal u will be given by the solution of the unconstrained-optimal differential equation. The points of discontinuity of u*(t) are determined, as before, by the critical values of p. Since the relationship (2.9) will also hold in the present case, it can again be used to obtain the critical values of p. These latter can be employed to obtain the duration of stay of u* at the boundary. The method of calculation is straightforward. Note, first, that the costate equation is coupled to the state dynamics and is of the form

dp/dt = Ψ(x, u, p).  (2.13)

This indicates that, in order to deduce the p(t)-trajectory during the time that u* is at one of the boundaries, it is necessary to determine the trajectory x(t) over the same duration of time. This is done by solving the state dynamics


equation with u taking one or other of the boundary values, and substituting the result and the boundary value into equation (2.13); this will yield

dp/dt = ψ2(p, t).  (2.14)

This is apparently harder than (2.10). However, the separation between the two critical values of p, during which the solution of (2.14) holds, is known. Therefore, one can determine the time spent by the p(t)-trajectory between the critical values; once this is determined, the time spent by u* at the boundary is determined automatically. This completes the proof of Theorem 3 and makes precise the rules for computing the intervals between two successive encounters of u*(t) with the boundaries, as well as the duration of stay of u*(t) at the boundaries.

3. DISCRETIZATION AND THE MARKOV CHAIN

3.1. It is known [2] that for any n-th order ordinary differential equation there exists an n-th order difference equation, such that the two are "paired systems." That is, if x_k = (x_k^1, x_k^2, ..., x_k^n) are the solutions of an n-th order difference equation and x(t) = (x^1(t), ..., x^n(t)) are the solutions of an n-th order differential equation, then the two are paired if and only if x_0 = x(0) implies x_k = x(kT) for all k (k = 1, 2, ...), T being the unit length in which time, t, is discretized, and x_0 and x(0) the initial conditions of the difference and differential equations, respectively. Thus, if for a certain process there exists an optimal control which is given by a first-order differential equation, e.g.,

du/dt = φ(u),  (3.1)

then there exists a paired difference equation of the form†

u*_{k+1} = F(u_k*).  (3.2)

3.2. In principle, the difference equation paired to (1.22) can be obtained by an integration, i.e.,

∫_{kT}^{(k+1)T} dt = ∫_{u*(kT)}^{u*((k+1)T)} dz/φ(z),  (3.3)

† An apparently simple procedure is to approximate the derivative du*/dt at the point (kT, u_k*) by divided forward differences, writing

(u*_{k+1} − u_k*)/T = φ(u_k*),  or  u*_{k+1} = F(u_k*).  (*)

But, as shown in [2], a necessary and sufficient condition for the "pairing" of (*) and (1.22) is that F(·) be an orientation-preserving homeomorphism, i.e., a one-to-one, bicontinuous mapping with positive first-order derivative. In general, there is no guarantee that (*) will fulfil this requirement.


T being chosen arbitrarily. A straightforward application of this method is difficult in the general case; some approximation is called for. The one which we propose consists in replacing φ(u*) by piecewise-linear segments,

du*/dt = α1 u* + β1,  if u* ∈ I1 ≡ {u*: a ≤ u* < a1},
  ...
du*/dt = αr u* + βr,  if u* ∈ Ir ≡ {u*: a_{r−1} ≤ u* ≤ a_r = b},  (3.4)

in which I1, ..., Ir is a partition (cover) of the domain of φ(·) into non-overlapping subintervals. Thus, on I_l there is a difference equation which is paired to du*/dt = α_l u* + β_l; this difference equation is

u*_{k+1} = exp(α_l T) u_k* − (1 − exp(α_l T)) β_l/α_l,  u_k* ∈ I_l.  (3.5)

Equation (3.5) is the equation of a straight line, with slope exp(α_l T) and intercept −(1 − exp(α_l T)) β_l/α_l. It is clear from the construction that the slopes are all different on different subintervals I_i, 1 ≤ i ≤ r, and are all positive, thus ensuring that the pairing is an orientation-preserving homeomorphism (see the preceding footnote).
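The pairing in (3.5) can be checked directly: on one linear segment the map reproduces the exact ODE solution at every sampling instant kT, and its slope exp(α_l T) is positive. The coefficients, step T, and initial value below are illustrative assumptions:

```python
import math

# Exact discretization (3.5) of du*/dt = alpha*u* + beta on one linear segment.
# alpha, beta, T and u0 are illustrative choices.
alpha, beta, T, u0 = -0.5, 1.0, 0.1, 0.2

def F(u):
    """Paired difference equation (3.5); its slope exp(alpha*T) is positive."""
    return math.exp(alpha * T) * u - (1.0 - math.exp(alpha * T)) * beta / alpha

def u_ode(t):
    """Closed-form solution of the segment ODE with u(0) = u0."""
    return (u0 + beta / alpha) * math.exp(alpha * t) - beta / alpha

u = u0
for k in range(1, 11):
    u = F(u)
    assert abs(u - u_ode(k * T)) < 1e-12  # pairing: u_k = u*(kT)

print(u, u_ode(1.0))
```

By contrast, the forward-difference map (*) of the footnote would only approximate u*(kT), which is why the exponential form (3.5) is used.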

3.3. Elsewhere [3] we have demonstrated the following:

THEOREM 4. Let there be a piecewise-linear, first-order scalar difference equation

X_{n+1} = G(X_n),  n = 0, 1, 2, ...,  (3.6)

with {X_n}_{n=0,1,2,...} ⊆ E ⊆ R¹, and let E have a finite non-overlapping cover E = Δ1 ∪ Δ2 ∪ ... ∪ Δm, m < ∞, in which

Δ1 = {x: δ0 ≤ x ≤ δ1},  Δ_l = {x: δ_{l−1} < x ≤ δ_l},  2 ≤ l ≤ m,

with −∞ < δ0 < δm < ∞. Let 𝓑 be the Borel field generated by and containing E and its subsets, let μ(·) denote a Lebesgue measure defined on 𝓑, and let G(·) be measurable in 𝓑. Then the sequence of sets {Δ_l}, 1 ≤ l ≤ m, visited by (3.6) constitutes a Markov chain, for which the transition probabilities are

Pr[X_{n+1} ∈ Δ_j | X_n ∈ Δ_l] ≡ p_{l,j} ≡ μ(Δ_l Δ_j)/μ(Δ_l),  (3.7)

in which μ(Δ_l Δ_j) is the total measure of those subsets of Δ_l which are mapped by G(·) into Δ_j, i.e.,

μ(Δ_l Δ_j) = Σ_k μ{S_k ⊆ Δ_l: G(S_k) ⊆ Δ_j}.

Concrete application of (3.7) calls for a specification of the sign of φ(·). Suppose, for the sake of definiteness, that φ(·) is positive. Define

z_l* ≡ a_l exp(−α_l T) − (1 − exp(−α_l T)) β_l/α_l

and note that the largest value of u_k* for which u*_{k+1} ∈ I_l is given by min{a_l, z_l*}.


Then the numerator of (3.7) becomes min{a_l, z_l*} − a_{l−1} and, therefore,

p_{l,l} = (min{a_l, z_l*} − a_{l−1})/(a_l − a_{l−1}).  (3.81)

Again, let the quantity z_l** be defined as

z_l** ≡ a_{l+1} exp(−α_l T) − (1 − exp(−α_l T)) β_l/α_l;

then it may be verified that

p_{l,l+1} = (min{a_l, z_l**} − min{a_l, z_l*})/(a_l − a_{l−1}).  (3.82)
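Formulas (3.81)-(3.82) are easy to exercise numerically. The sketch below uses illustrative data only (a single segment φ(u) = u + 1 on [0, 1], split into I1 = [0, 0.5) and I2 = [0.5, 1], with T = 0.1; the last row of the matrix, which would need a further endpoint, is not computed) and checks that the row for I1 sums to unity:

```python
import math

# One row of the transition matrix from (3.81)-(3.82), case phi > 0.
# Illustrative data: phi(u) = u + 1 (alpha = beta = 1) on [0, 1], partitioned
# into I1 = [0, 0.5) and I2 = [0.5, 1]; sampling period T = 0.1.
alpha, beta, T = 1.0, 1.0, 0.1
a = [0.0, 0.5, 1.0]  # endpoints a0 < a1 < a2

def z(endpoint):
    """Preimage of an interval endpoint under the paired map (3.5)."""
    e = math.exp(-alpha * T)
    return endpoint * e - (1.0 - e) * beta / alpha

l = 1  # row for I1 (1-indexed, as in the text)
z_star = z(a[l])          # z_l*
z_2star = z(a[l + 1])     # z_l**
width = a[l] - a[l - 1]

p_ll = (min(a[l], z_star) - a[l - 1]) / width                 # (3.81)
p_lnext = (min(a[l], z_2star) - min(a[l], z_star)) / width    # (3.82)

print(p_ll, p_lnext, p_ll + p_lnext)  # the row sums to unity
```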

The general formula for p_{l,l+j} in the case φ(·) > 0 and, similarly, for p_{l,l−j} in the case φ(·) < 0, can be computed in the manner indicated by (3.7). It may also be verified that the p_{l,j} ≥ 0 and that Σ_j p_{l,j} = 1.

3.4. It will be appropriate to consider two questions of practical importance. The first relates to the validity of continuing the solution of (3.5) from I_l to the adjacent interval I_{l±1}. The second relates to the dependence of the transition probabilities on the choice of T and I_l.

3.51. Suppose I_l and I_{l+1} are adjacent and suppose, without loss of generality, that

du*/dt = α_l u* + β_l > 0, if u* ∈ I_l;  du*/dt = α_{l+1} u* + β_{l+1} > 0, if u* ∈ I_{l+1},  (3.9)

and that α_l < 0, α_{l+1} < 0, and u*(t0) = v0 ∈ I_l. Then the time required to reach some point v* ∈ I_{l+1} will be

t − t0 = ∫_{v0}^{a_l} dz/(α_l z + β_l) + ∫_{a_l}^{v*} dz/(α_{l+1} z + β_{l+1}).  (3.10)

On the other hand, the difference equations paired to (3.9) are

u*_{k+1} = u_k* exp(α_l T) − (1 − exp(α_l T)) β_l/α_l,  if u_k*, u*_{k+1} ∈ I_l,  (3.11)

u*_{k+1} = u_k* exp(α_{l+1} T) − (1 − exp(α_{l+1} T)) β_{l+1}/α_{l+1},  if u_k*, u*_{k+1} ∈ I_{l+1}.  (3.12)

Thus, if a sequence {u_k*} is started off by (3.11) with an initial value v0, then, after some time, it will tend to seep into I_{l+1}. But on I_{l+1} the equation of motion is (3.12), which, of course, will not be satisfied by a continuation of {u_k*}. In order to remove this apparent difficulty, define the interval I_l*(T):

I_l*(T) ≡ {u*: a_l ≤ u* < a_l exp(α_l T) − (1 − exp(α_l T)) β_l/α_l}.  (3.13)

This will be contained in I_{l+1} and can be reached from I_l in one transition. Thus, the question of the validity of continuation of a sequence can be settled if one employs the intervals defined by (3.13).
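The role of I_l*(T) can be seen in a small numerical sketch (the dynamics below are assumed purely for illustration, not taken from the paper): the one-period map is monotone increasing, so the image of I_l never overshoots the upper endpoint of I_l*(T) defined in (3.13).

```python
import math

def step(u, alpha, beta, T):
    """One period of the linearized motion du/dt = alpha*u + beta, as in (3.11)."""
    return u * math.exp(alpha * T) - (1 - math.exp(alpha * T)) * beta / alpha

# Assumed illustrative values: I_l = [0, 1], du/dt = -u + 2, T = 0.5.
alpha, beta, T, a_l = -1.0, 2.0, 0.5, 1.0

# The upper endpoint of I_l*(T) in (3.13) is simply the one-period image of a_l.
upper = step(a_l, alpha, beta, T)

# Every starting point in I_l lands either back in I_l or inside
# I_l*(T) = [a_l, upper]; nothing jumps beyond I_l*(T) in one transition.
images = [step(k / 10, alpha, beta, T) for k in range(11)]
assert all(v <= upper + 1e-12 for v in images)
```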


3.52. The manner in which the choice of T and I_l affects the p_{l,j} can be analysed only if a concrete form is assumed for φ(u*). However, if an I_l is held fixed and T is allowed to vary then, regardless of the sign of α_l, the variations in p_{l,l} are inversely related to T; this may be verified with the aid of (3.81)-(3.82).
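This inverse relation admits a quick numerical check, again with an assumed illustrative rate du*/dt = −u* + 2 on I_l = [0, 1] (the clipping of z_l* into the interval is a small safeguard added here, not part of (3.81)):

```python
import math

def p_ll(T, a_prev=0.0, a_l=1.0, alpha=-1.0, beta=2.0):
    """Diagonal probability (3.81); z_l* is clipped into [a_prev, a_l]."""
    z_star = a_l * math.exp(-alpha * T) - (1 - math.exp(-alpha * T)) * beta / alpha
    return (min(a_l, max(a_prev, z_star)) - a_prev) / (a_l - a_prev)

# p_{l,l} shrinks as the period T grows: a longer period makes escape likelier.
probs = [p_ll(T) for T in (0.1, 0.3, 0.5, 0.7)]
assert all(x >= y for x, y in zip(probs, probs[1:]))
```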

4. EFFICIENCY OF DISCRETIZATION

4.1. Undoubtedly, there is a loss of information when (1.22) is discretized in time and phase. Therefore, it is important to inquire how much essential information is retained after discretization. There is always an element of arbitrariness in the notion of "essential"; but the principle should be clear: one should look for certain invariants with respect to which the fully deterministic system (1.22) and the probabilized system (3.7) can be compared. There are many ways of comparing a probabilistic quantity with a deterministic one; we have chosen to compare mean values. Our choice of invariants has been confined to (a) the expected total duration of stay of the Markov chain in a given state, if the motion of u* begins in that state, and (b) the expected time to reach some preassigned state, if the chain is at some other preassigned state.

4.2. If τ_l ≡ ∫_{I_l} dz/φ_l(z) denotes the duration of stay (of the deterministic system) in the interval I_l, and if N_l is the (random) number of periods spent by the Markov chain in the state I_l, then the criterion suggested by (a) of §4.1 is the equality

    τ_l = E(N_l)·T.    (4.1)

The discretization procedure which guarantees the fulfilment of this requirement is described in

THEOREM 5. Suppose that the condition stated earlier is fulfilled, and let a_{l−1} be given. Then, a sufficient condition for a_l to generate states I_l and their transition probabilities which satisfy (4.1) is that

    φ(a_l) = φ(a_{l−1}) exp(α_l T), if φ(·) > 0,    (4.2)
    φ(a_l) = φ(a_{l−1}) exp(−α_l T), if φ(·) < 0,    (4.3)

the α_l being determined as

    α_l = [φ(a_l) − φ(a_{l−1})]/(a_l − a_{l−1}).

COROLLARY. p_{l,l} = 0 if and only if φ(a_l) = φ(a_{l−1}) exp(±α_l T).
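A minimal sketch of the Theorem 5 construction, assuming an illustrative rate φ(u) = u² (any positive φ works the same way; none of the numbers below come from the paper). Given a_{l−1}, condition (4.2) and the secant formula for α_l jointly determine a_l, which can be found by fixed-point iteration; the Corollary's prediction p_{l,l} = 0 then falls out, since the linearized system crosses the interval in exactly one period.

```python
import math

phi = lambda u: u * u          # assumed positive rate, for illustration only
a0, T = 1.0, 0.1               # a_{l-1} and the period

# Solve phi(a1) = phi(a0) * exp(alpha * T) together with the secant formula
# alpha = (phi(a1) - phi(a0)) / (a1 - a0), by fixed-point iteration.
a1 = a0 + 0.1                  # initial guess for a_l
for _ in range(200):
    alpha = (phi(a1) - phi(a0)) / (a1 - a0)
    a1 = a0 * math.exp(alpha * T / 2)   # from a1**2 = a0**2 * exp(alpha*T)

beta = phi(a0) - alpha * a0    # linearized rate alpha*u + beta matches phi at a0

def step(u):
    """One period of the linearized motion du/dt = alpha*u + beta."""
    return u * math.exp(alpha * T) - (1 - math.exp(alpha * T)) * beta / alpha

# The linearized system crosses I_l = [a0, a1] in exactly one period, so the
# chain leaves the state after a single transition (p_ll = 0, per the Corollary).
assert abs(step(a0) - a1) < 1e-9
```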


Proof. Omitted. The significance of this result is seen to be the following: the intervals I_l and the unit T of time are determined by (4.2)-(4.3) in such a way that the system (3.4) spends equal time in each interval (state). In probabilistic terms, this means that almost surely the phase-point will leave a state after one transition, whose duration is T.

4.3. If the (deterministic) system takes off from an I_l, then the time taken to reach I_{l+k} is

    γ(l, k) = Σ from n = l to l+k−1 of ∫_{I_n} dz/φ_n(z).    (4.4)

On the other hand, the expected time for the Markov chain to enter I_{l+k} for the first time from I_l is the sum [4] of the k-th row N^{(k)} of the matrix N,

    N = Σ from h = 0 to m of P^{(h)},    (4.5)

m being the total number of periods measured in units of T. The second criterion mentioned in §4.1 thus calls for the equality γ(l, k) = N^{(k)}·1. However, we have

THEOREM 6. If φ(u*) > 0 and if (4.2) is employed for discretization, then N^{(k)}·1 is at most equal to γ(l, k).

Proof. Omitted. Put another way, the probabilistic system generated by the procedure (4.2) cannot, on the average, spend a larger number of periods than the associated (linearized) deterministic system (3.4) before reaching any preassigned state.
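The quantity N^{(k)}·1 in Theorem 6 is a mean first-passage time measured in periods. For a chain of the kind constructed in §3 with upward-drifting motion, the transition matrix restricted to the transient states is upper triangular, and the expected passage times can be computed along the lines of Kemeny and Snell [4]. The sketch below uses made-up probabilities purely for illustration.

```python
# Expected number of periods to reach a target state, from the linear system
# (I - Q) t = 1, where Q is the transition matrix restricted to the transient
# states I_1..I_3 and the target I_4 is treated as absorbing.
# The probabilities are made-up illustrative values, not from the paper.
Q = [[0.35, 0.65, 0.0],
     [0.0,  0.40, 0.60],
     [0.0,  0.0,  0.50]]
n = len(Q)
t = [0.0] * n
# Upward drift makes I - Q upper triangular, so back-substitution suffices.
for i in range(n - 1, -1, -1):
    s = sum(Q[i][j] * t[j] for j in range(i + 1, n))
    t[i] = (1.0 + s) / (1.0 - Q[i][i])
# t[i] = expected periods to reach I_4 from I_{i+1}; multiply by T for time.
```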

REFERENCES

1. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes (Eng. tr.), Interscience, New York, 1962.
2. R. E. Kalman, Nonlinear aspects of sampled-data control systems, Proc. Symposium on Nonlinear Circuit Analysis, Polytechnic Institute of Brooklyn, 1956, pp. 273-315.
3. S. Sankar Sengupta, P. Czarny, and Ray Chow, A representation theorem for finite Markov chains whose states are subintervals of [0, 1], Information Sci. 3, No. 1 (January 1971), pp. 51-58.
4. J. G. Kemeny and J. L. Snell, Finite Markov Chains, D. Van Nostrand, Princeton, N.J., 1960.

Received April 10, 1970