On the Determination of Optimal Trajectories Via Dynamic Programming

RICHARD BELLMAN
The RAND Corporation, Santa Monica, California

8.1 Introduction

There is little difficulty in formulating various problems arising in the study of optimal trajectories as questions within the calculus of variations. Standard variational techniques then transform the original problems into those of solving nonlinear differential equations subject to two-point boundary conditions. In many cases, the presence of constraints on the possible motions introduces a combination of equations and inequalities. Even in the simpler cases where constraints are not present, the two-point aspect introduces complications which often render a computational solution extremely difficult to achieve, even when equipped with the biggest and fastest of modern computers.

In this chapter we wish to discuss the application of the theory of dynamic programming to the numerical solution of problems in the calculus of variations. Our aim is to describe a systematic approach to these problems which will permit routine solutions with the aid of digital computers.


8.2 Dynamic Programming

The basic step is the recognition of the fact that the calculus of variations is a particular example of a multistage decision process of continuous type. The reader interested in the fundamentals of the theory of dynamic programming, an alternate term for the theory of multistage decision processes, may refer to Bellman [1, 2] or Bellman and Dreyfus [3]. In what follows we shall assume that the reader is familiar with the basic ideas of the theory.

8.3 One-Dimensional Problems

Let us begin our discussion with the problem of minimizing the functional

    J(u) = ∫_0^T g(u, u') dt                                    (8.1)

over all functions u(t) defined over the interval [0, T] and satisfying the initial condition u(0) = c. Writing

    f(c, T) = min_u J(u)                                        (8.2)

the principle of optimality, cf. Bellman [1, 2], yields the nonlinear partial differential equation

    f_T = min_v [g(c, v) + v f_c],   f(c, 0) = 0                (8.3)

From this equation the familiar Euler equation can readily be obtained, cf. Dreyfus [4] and Bellman [5]. For computational purposes, it is frequently better to use the discrete approximation

    f(c, T + Δ) = min_v [g(c, v)Δ + f(c + vΔ, T)]               (8.4)

T = 0, Δ, 2Δ, ..., cf. Bellman [2] and Bellman and Dreyfus [3]. The solution obtained in this way requires the tabulation of two sequences of functions of one variable, the functions {f(c, T)} and the "policy functions," v = v(c, T). It is thus a routine problem as far as modern digital computers are concerned. For a discussion of the advantages of this approach as opposed to the usual approach of the calculus of variations, see Bellman [2].
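To make the computation concrete, the recurrence (8.4) can be sketched on a finite grid of c- and v-values. The quadratic integrand g(u, u') = u^2 + u'^2, the grid spacings, and the nearest-grid-point lookup below are illustrative choices, not taken from the text.

```python
def dp_sweep(g, c_grid, v_grid, dt, n_steps):
    """March recurrence (8.4): f(c, T + dt) = min_v [g(c, v) dt + f(c + v dt, T)],
    starting from f(c, 0) = 0, tabulating both f and the policy v(c, T)."""
    f = {c: 0.0 for c in c_grid}            # f(c, 0) = 0
    policies = []                           # policy function v(c, T) at each stage
    for _ in range(n_steps):
        new_f, stage = {}, {}
        for c in c_grid:
            best_val, best_v = float("inf"), None
            for v in v_grid:
                # evaluate f at c + v*dt by snapping to the nearest grid point
                c_next = min(c_grid, key=lambda x: abs(x - (c + v * dt)))
                val = g(c, v) * dt + f[c_next]
                if val < best_val:
                    best_val, best_v = val, v
            new_f[c], stage[c] = best_val, best_v
        f, policies = new_f, policies + [stage]
    return f, policies

g = lambda u, up: u * u + up * up           # hypothetical integrand
c_grid = [i / 10 for i in range(-10, 11)]   # c in [-1, 1]
v_grid = [i / 10 for i in range(-10, 11)]   # v in [-1, 1]
f, policies = dp_sweep(g, c_grid, v_grid, dt=0.1, n_steps=20)
```

Both tabulated sequences are functions of the single variable c, which is what makes the problem routine for a machine.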


8.4 Constraint-I

If we add a constraint such as

    |u'| ≤ m,   0 ≤ t ≤ T                                       (8.5)

Eq. (8.3) is replaced by

    f_T = min_{|v| ≤ m} [g(c, v) + v f_c]                       (8.6)

The corresponding version of (8.4) is

    f(c, T + Δ) = min_{|v| ≤ m} [g(c, v)Δ + f(c + vΔ, T)]       (8.7)

The minimization is carried out by a search process over some finite set of v-values lying in the interval [-m, m]. In many cases, sophisticated techniques can be used to reduce greatly the number of values that must be examined, cf. Bellman and Dreyfus [3]. Observe that the presence of the constraint simplifies the solution by dynamic programming techniques since it serves to reduce the number of feasible policies, which is to say, the possible choices of v.
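One simple illustration of such a reduction (not necessarily the one Bellman and Dreyfus intend), assuming the bracketed quantity in (8.7) is unimodal in v, is a coarse-to-fine refinement over [-m, m]. The stage cost h(v) below is a hypothetical stand-in for g(c, v)Δ + f(c + vΔ, T).

```python
def refine_min(h, lo, hi, n_coarse=9, rounds=4):
    """Minimize h on [lo, hi], assuming unimodality, by repeatedly sampling
    n_coarse points and zooming into the interval around the best one.
    Examines n_coarse * rounds values instead of a single dense grid."""
    for _ in range(rounds):
        step = (hi - lo) / (n_coarse - 1)
        pts = [lo + i * step for i in range(n_coarse)]
        best = min(pts, key=h)
        lo, hi = max(lo, best - step), min(hi, best + step)
    return best

m = 1.0
h = lambda v: (v - 0.3) ** 2 + 1.0      # hypothetical stage cost in v
v_star = refine_min(h, -m, m)
```

Here 36 evaluations locate the minimizer to roughly the accuracy of a grid of several thousand points.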

8.5 Constraint-II

Suppose that the problem is that of minimizing J(u), as given in Section 8.3, subject to the additional constraint

    ∫_0^T h(u, u') dt ≤ k                                       (8.8)

In place of adding another state variable, we employ a Lagrange multiplier and consider the new problem of minimizing the functional

    ∫_0^T [g(u, u') - λh(u, u')] dt                             (8.9)

subject only to the conditions

    (a)  u(0) = c
    (b)  |u'(t)| ≤ m,   0 ≤ t ≤ T                               (8.10)

Writing

    f(c, T) = min_u ∫_0^T [g(u, u') - λh(u, u')] dt             (8.11)


we have, for each fixed value of λ, the discrete recurrence relation

    f(c, T + Δ) = min_{|v| ≤ m} {[g(c, v) - λh(c, v)]Δ + f(c + vΔ, T)}   (8.12)

from which the numerical solution can be obtained easily. The value of λ, a "price," is then varied until the original constraint is attained.
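The price adjustment can be sketched as a bisection on λ. The function constraint_value below is a hypothetical stand-in for a full dynamic-programming solve that returns the optimal value of the constrained integral at a given price, assumed continuous and decreasing in λ.

```python
def solve_with_price(constraint_value, k, lam_hi=100.0, tol=1e-6):
    """Bisect on lambda >= 0 until constraint_value(lam) meets the bound k.
    Assumes constraint_value is continuous and decreasing in lambda, i.e.
    raising the price on h drives its optimal integral down."""
    if constraint_value(0.0) <= k:
        return 0.0                      # constraint inactive at lambda = 0
    lo, hi = 0.0, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if constraint_value(mid) > k:
            lo = mid                    # price too low: raise lambda
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical stand-in for the DP solve: optimal integral of h at price lam.
constraint_value = lambda lam: 2.0 / (1.0 + lam)
lam = solve_with_price(constraint_value, k=0.5)
```

Each trial value of λ costs one unconstrained solve of recurrence (8.12); no second state variable is ever introduced.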

8.6 Discussion

We may conclude that the general one-dimensional variational problem of minimizing

    J(u) = ∫_0^T g(u, u') dt                                    (8.13)

subject to

    (a)  ∫_0^T h(u, u') dt ≤ k
    (b)  u(0) = c                                               (8.14)

is a routine problem which can quickly and accurately be solved using dynamic programming techniques in conjunction with digital computers. For some results, see Bellman and Dreyfus [3].

Minor modifications handle the case where there is a boundary condition at t = T,

    r(u(T), u'(T)) = 0                                          (8.15)

the case where the integrand depends explicitly upon t, and the more general case where it is desired to minimize

    J(v) = ∫_0^T g(u, v) dt                                     (8.16)

over all functions v, where u and v are connected by the differential equation

    du/dt = h(u, v),   u(0) = c                                 (8.17)


8.7 Two-Dimensional Problems

Let us now consider the problem of minimizing the functional

    J(u1, u2) = ∫_0^T g(u1, u2, u1', u2') dt                    (8.18)

where u1 and u2 are subject to the initial conditions

    u1(0) = c1,   u2(0) = c2                                    (8.19)

Setting

    f(c1, c2, T) = min_{u1, u2} J(u1, u2)                       (8.20)

we obtain as before the nonlinear partial differential equation

    f_T = min_{v1, v2} [g(c1, c2, v1, v2) + v1 f_{c1} + v2 f_{c2}],   f(c1, c2, 0) = 0   (8.21)

For computational purposes we can employ the recurrence relation

    f(c1, c2, T + Δ) = min_{v1, v2} [g(c1, c2, v1, v2)Δ + f(c1 + v1Δ, c2 + v2Δ, T)]   (8.22)

T = 0, Δ, 2Δ, ..., with the initial condition f(c1, c2, 0) = 0, or standard methods for the numerical solution of partial differential equations.

The numerical solution along the foregoing lines involves the tabulation and storage of sequences of functions of two variables. This introduces some complications. Consider, to illustrate this point, a situation in which c1 and c2 are both allowed to assume one hundred different values. Since the number of different sets of c1 and c2 values is now 10^4, the tabulation of the values of f(c1, c2, T) for a particular value of T requires a memory of 10^4. Moreover, since the recurrence relation requires that f at T be stored while the values for T + Δ are calculated, and since the two policies v1 = v1(c1, c2, T) and v2 = v2(c1, c2, T) must also be stored, we see that we need a memory of at least 4 × 10^4. Let us note that when we use the term "memory," we always mean fast memory. There is, of course, no limit on the slow memory that is available.

There are many ways of cutting down the number of grid points. These are discussed in Bellman and Dreyfus [3]. Generally speaking, with the current digital computers with memories of 32,000 words, we can handle two-dimensional variational problems in one way or another. The situation becomes very much worse, however, as we turn to higher dimensions. A three-dimensional trajectory problem, involving three position variables


and three velocity variables, leads by way of the dynamic programming approach to functions of six phase variables, or "state variables." Even if each variable is allowed to take only 10 different values, this leads to 10^6 values, an absurdly large number.

We wish to employ a different idea, the technique of polynomial approximation. This will enable us to tabulate functions of several variables in a quick and efficient way, and allow us to use the functional equation approach to solve multidimensional variational problems.

8.8 One-Dimensional Case

In order to present the idea in a simple form, let us begin with the one-dimensional problem discussed above. We wish to obtain a numerical solution of the recurrence relation

    f(c, T + Δ) = min_v [g(c, v)Δ + f(c + vΔ, T)]               (8.23)

T = 0, Δ, 2Δ, ..., with f(c, 0) = 0. To simplify the notation, let us write

    f(c, kΔ) = f_k(c)                                           (8.24)

Let us agree to consider only values of c lying in a fixed interval, which with suitable normalization we can take to be [-1, 1]. To ensure that c remains in this interval, we add a constraint on v, namely,

    -1 ≤ c + vΔ ≤ 1                                             (8.25)

By taking the interval sufficiently large, we can ensure that the effect at the boundaries will be negligible as far as the internal values are concerned. In many cases, a constraint of the foregoing nature exists as part of the original problem.

We now approximate to each member of the sequence {f_k(c)} by a polynomial in the state variable c. Instead of writing this in the usual polynomial form, we write it in terms of orthonormal Legendre polynomials,

    f_k(c) ≅ Σ_{n=0}^{N} a_{kn} P_n(c)                          (8.26)

where the coefficients depend upon k. The advantage of using Legendre polynomials in place of the usual powers of c lies in the fact that we can use the formula

    a_{kn} = ∫_{-1}^{1} f_k(c) P_n(c) dc                        (8.27)

to determine the coefficients rather than relying upon a differentiation process.


The point of the representation of Eq. (8.26) is that the function f_k(c) is now represented for all points in the interval [-1, 1] by the set of N + 1 coefficients {a_{kn}}. Once these N + 1 values have been stored, we can then calculate the value of f_k(c) for any value of c in this interval. This calculation is naturally approximate, but we can expect to obtain excellent agreement by choosing N to be of the order of magnitude of 10 or so.

How do we actually calculate the sequence of coefficients {a_{kn}}? If we use a Riemann approximation

    a_{kn} ≅ Σ_j f_k(c_j) P_n(c_j) Δc                           (8.28)

we may end up either tabulating as many values of f_k(c) as before, or suffer serious inaccuracies. In place of evaluating the integral as in Eq. (8.28), we use a quadrature technique. If the points t_j and the weights w_j are chosen suitably, we may write

    ∫_{-1}^{1} g(c) dc ≅ Σ_{j=1}^{M} w_j g(t_j)                 (8.29)

an approximation formula which is exact if g(c) is a polynomial of degree 2M - 1 or less. It is easy to show that the t_j are the M roots of the Legendre polynomial of degree M, and the w_j are constants determined by the Legendre polynomials, the Christoffel numbers. The parameters are readily available up to quite large values of M.

It follows that the values of f_k(c) for all points in [-1, 1] are determined by the values of f_k(t_j), j = 1, 2, ..., M, since these values determine the coefficients a_{kn} in Eq. (8.28), and these coefficients determine f_k(c) by way of Eq. (8.26).

Let us now see how this simplifies the determination of the sequence {f_k(c)}. Starting with the known function f_1(c), obtained from

    f_1(c) = min_v [g(c, v)Δ]                                   (8.30)

we convert f_1(c) into the sequence [a_{11}, a_{21}, ..., a_{N1}] by using the relations

    a_{k1} = Σ_{j=1}^{M} w_j P_k(t_j) f_1(t_j)                  (8.31)


Since w_j and P_k(t_j) are fixed constants, calculated once and for all, we can store their product, b_{kj}, and write

    a_{k1} = Σ_{j=1}^{M} b_{kj} f_1(t_j),   k = 1, 2, ..., N    (8.32)

Turning to the determination of f_2(c) from the relation

    f_2(c) = min_v [g(c, v)Δ + f_1(c + vΔ)]                     (8.33)

we note first of all that we need only compute the M values f_2(t_j), j = 1, 2, ..., M. The value of f_1(t_j + vΔ) is obtained for each value of v examined by use of the formula

    f_1(t_j + vΔ) = Σ_{k=1}^{N} a_{k1} P_k(t_j + vΔ)            (8.34)

The evaluation of this expression is not too much more difficult than that of a simple polynomial, since the Legendre polynomial P_n(z) satisfies a simple 3-term recurrence relation which makes its evaluation very simple starting from the initial values P_0(z) = 1, P_1(z) = z.

Having calculated the M values {f_2(t_j)}, j = 1, 2, ..., M, we determine the coefficients {a_{k2}}, k = 1, 2, ..., N,

    a_{k2} = Σ_{j=1}^{M} b_{kj} f_2(t_j)                        (8.35)

The values {f_2(t_j)} are now discarded and the sequence {a_{k2}}, k = 1, 2, ..., N is stored. All values of f_2(c) required for the determination of f_3(c) from the equation

    f_3(c) = min_v [g(c, v)Δ + f_2(c + vΔ)]                     (8.36)

are now obtained by means of the relation

    f_2(c) = Σ_{k=1}^{N} a_{k2} P_k(c)                          (8.37)

We compute the M values {f_3(t_j)} and continue in this fashion.
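The whole cycle of this section can be sketched as follows, under stated assumptions: numpy's standard (unnormalized) Legendre polynomials are used, so the projection carries a factor (2n + 1)/2 that the orthonormal form of (8.26)-(8.27) absorbs; the integrand g(u, u') = u^2 + u'^2, the step Δ, the grids, and the horizon are all illustrative choices.

```python
import numpy as np

def legendre_fit(vals, nodes, weights, N):
    """Project values sampled at the Gauss-Legendre nodes onto P_0..P_N:
    a_n = (2n+1)/2 * sum_j w_j f(t_j) P_n(t_j), the quadrature form of (8.27)."""
    coeffs = np.empty(N + 1)
    for n in range(N + 1):
        Pn = np.polynomial.legendre.Legendre.basis(n)(nodes)
        coeffs[n] = (2 * n + 1) / 2 * np.sum(weights * vals * Pn)
    return coeffs                       # f(c) ~ sum_n coeffs[n] P_n(c)

def dp_step(coeffs, nodes, weights, g, dt, v_grid, N):
    """One stage of recurrence (8.23): evaluate the new f only at the M
    quadrature nodes, minimizing over a finite v-grid restricted so that
    c + v*dt stays in [-1, 1] as in (8.25), then refit the coefficients."""
    new_vals = np.empty_like(nodes)
    for j, t in enumerate(nodes):
        best = np.inf
        for v in v_grid:
            c_next = t + v * dt
            if -1.0 <= c_next <= 1.0:
                f_next = np.polynomial.legendre.legval(c_next, coeffs)
                best = min(best, g(t, v) * dt + f_next)
        new_vals[j] = best
    return legendre_fit(new_vals, nodes, weights, N)

M, N, dt = 12, 8, 0.05
nodes, weights = np.polynomial.legendre.leggauss(M)
g = lambda c, v: c * c + v * v          # hypothetical integrand
coeffs = np.zeros(N + 1)                # f(c, 0) = 0
v_grid = np.linspace(-1.0, 1.0, 41)
for _ in range(10):                     # march T = 0, dt, ..., 10*dt
    coeffs = dp_step(coeffs, nodes, weights, g, dt, v_grid, N)
f_mid = np.polynomial.legendre.legval(0.5, coeffs)
```

Only the N + 1 coefficients survive each stage; the M nodal values are discarded, exactly as with the sequence {a_{k2}} above.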

8.9 Discussion

It will be seen that the memory requirements for this process are quite small. We store only the sequence {a_{kR}} at the Rth stage, as well as the constants b_{kj} and the instructions for the machine to carry out the indicated operations. For M, N ≤ 20, this is a negligible total.


The choice of M and N is a matter of convenience and experience. It is generally true in variational problems that regardless of the behavior of the policy function, v = v(c, T), the return function f(c, T) is quite smooth. This is to be expected from physical reasons (stability under small changes) and from mathematical reasoning as well. Consequently, we can expect a polynomial approximation of, say, degree N = 10 to yield very accurate results. Some preliminary results and comparisons with an exact solution are given in Bellman and Dreyfus [3, 6].

The determination of the policy function can be accomplished in two ways. We can either compute all the polynomial approximations the first time around, store the coefficients, and then compute the policy functions, v_k = v_k(c, T), from the recurrence relation Eq. (8.23), or we can compute the policy functions as we go along and print out the values.

8.10 Two-Dimensional Case

Let us now consider the two-dimensional problem discussed in Section 8.7. We approximate to a function of two variables f(c1, c2) by a polynomial in c1 and c2, most conveniently taken to be of the form

    f(c1, c2) = Σ_{k,l=1}^{N} a_{kl} P_k(c1) P_l(c2)            (8.38)

The coefficients are determined by the relation

    a_{kl} = Σ_{j,r=1}^{M} w_j w_r f(t_j, t_r) P_k(t_j) P_l(t_r)   (8.39)

where the weights w_j and interpolation points t_j are as before. We see then that the function f(c1, c2) is determined for storage purposes by the N^2 coefficients {a_{kl}}, k, l = 1, 2, ..., N, which in turn are determined by the values f(t_j, t_r).

The recurrence relation Eq. (8.22) may be written

    f_{n+1}(c1, c2) = min_{v1, v2} [g(c1, c2, v1, v2)Δ + f_n(c1 + v1Δ, c2 + v2Δ)]   (8.40)

As before, we start with the function f_n(c1, c2) in the form of the coefficients {a_{kl}^{(n)}} and use these to compute the values of f_n needed to determine {f_{n+1}(t_j, t_r)}. From these we evaluate {a_{kl}^{(n+1)}} and so on.
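The tensor-product fit (8.38)-(8.39) can be sketched as follows. Again numpy's unnormalized Legendre polynomials are used, so each coefficient picks up a factor (2k + 1)/2 per dimension relative to the orthonormal form, and the smooth test function f is a hypothetical example.

```python
import numpy as np

def fit_2d(f, M, N):
    """Recover the N^2 coefficients a_{kl} of (8.38) from the values of f
    at the M x M grid of Gauss-Legendre nodes, via the quadrature (8.39)."""
    t, w = np.polynomial.legendre.leggauss(M)
    # P[k, j] = P_k(t_j)
    P = np.array([np.polynomial.legendre.Legendre.basis(k)(t) for k in range(N)])
    F = f(t[:, None], t[None, :])       # F[j, r] = f(t_j, t_r)
    a = np.empty((N, N))
    for k in range(N):
        for l in range(N):
            norm = (2 * k + 1) / 2 * (2 * l + 1) / 2
            a[k, l] = norm * np.einsum("j,r,j,r,jr->", w, w, P[k], P[l], F)
    return a

def eval_2d(a, c1, c2):
    """Evaluate the expansion (8.38) at a single point (c1, c2)."""
    N = a.shape[0]
    p1 = [np.polynomial.legendre.Legendre.basis(k)(c1) for k in range(N)]
    p2 = [np.polynomial.legendre.Legendre.basis(l)(c2) for l in range(N)]
    return sum(a[k, l] * p1[k] * p2[l] for k in range(N) for l in range(N))

f = lambda x, y: x * x + x * y          # hypothetical smooth f(c1, c2)
a = fit_2d(f, M=8, N=4)
```

Since this f is a polynomial of degree 2 and the quadrature is exact through degree 2M - 1, the N^2 = 16 stored coefficients reproduce f to rounding error anywhere in the square.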


A function of two variables over -1 ≤ c1, c2 ≤ 1 is thus stored by means of the N^2 coefficients {a_{kl}}. If N = 10, an approximation which should yield quite accurate results, we need only 100 values.

8.11 Discussion

Proceeding along the same lines, we see that the general trajectory problem in the plane could be treated in terms of polynomial approximation to functions of four variables, requiring N^4 coefficients. If N = 10, this requires not only a large memory, but an enormous amount of time for the evaluation of particular functional values. If we decrease N to 5, we reduce the number N^4 to 625, a more reasonable quantity. Similarly, if we turn to three-dimensional trajectory problems, involving functions of six variables, we see that a choice of N = 4 yields the figure 4^6 = 4096. We may expect to get by with polynomials of lower degree as we increase the number of state variables.

What we have presented above is an outline of the general idea of polynomial approximation. Combining it with various other techniques such as successive approximations, in the form of approximation in policy space or otherwise, we feel that it is reasonable at the present time to think in terms of routine solutions of three-dimensional trajectory problems involving six state variables. With the computers of 10 years hence, with memories 10-30 times larger and speeds 10-30 times faster, we can consider routine solutions of problems involving other state variables such as fuel, mass, and so on.

REFERENCES

1. R. Bellman, "Dynamic Programming." Princeton Univ. Press, Princeton, New Jersey, 1957.
2. R. Bellman, "Adaptive Control Processes: A Guided Tour." Princeton Univ. Press, Princeton, New Jersey, 1961.
3. R. Bellman and S. E. Dreyfus, "Applied Dynamic Programming." Princeton Univ. Press, Princeton, New Jersey, 1962.
4. S. Dreyfus, Dynamic programming and the calculus of variations, J. Math. Anal. Appl. 1, 228-239 (1960).
5. R. Bellman, Dynamic programming of continuous processes, Rept. R-271, The RAND Corporation, Santa Monica, California (1954).
6. R. Bellman and S. Dreyfus, Functional approximations and dynamic programming, Math. Tables and Other Aids to Computation 13, 247-251 (1959).