Optimal Control of Discrete Systems
T h e problems discussed in the last chapter were characterized by the choice of certain parameters in order to optimize some performance index. I n this chapter the problems considered will involve such choices or decisions at each of a finite set of times or stages. A cost, or the value of a performance index, will be associated with each sequence of decisions and the problem will be to optimize this cost. T h is is the so-called multistage, or discrete, optimal control problem. After defining some notation, the solution of the problem is reformulated in terms of dynamic programming. T h e solution is shown to depend upon the maximization of a sequence of functions. Next, a n analytic solution is shown to exist for the linear quadratic problem. T h is analysis is illustrated with a simple example. Unfortunately, in general analytic solutions do not exist, and recourse must be made to numerical methods. These methods are the subject of Sections 3.5-3.8. Two methods, the gradient and the Newton-Raphson (successive sweep), are described in detail and are shown to lead to stationarity and convexity conditions for a local maximum of the multistage problem. T h e neighboring extremal method and its relation to the successive sweep method is discussed. Finally, necessary and sufficient conditions for a local maximum of the general problem are given. 3.1
NOTATION
3.1 .I System Variables At each stage the system is assumed to be characterized by a finite set of real numbers that is referred to as the state of the system. T h e state, 35
36
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
at the ith stage, is denoted by an n-dimensional column vector x ( i ) where
T h e controI variables are classified in two sets. Variables of the one set are referred to as control functions, or, more simply, as controls. These control functions, which normally vary from stage to stage, are denoted by an m-dimensional column vector of real numbers u(i) where
T h e othcr set of control variables are constant and are referred to as control parameters. T h e control parameters are denoted by a p-dimensional vector 01
3.1.2
System Dynamics
T h e dynamics of the system are expressed in terms of a set of discrete equations. T h e process is assumed to have N stages, and the evolution of the state through these stages, x(O), x(l), ... , x ( N ) is governed by an equation of the form x(i
where F 3.1.3
=
+ 1) = F(x(i),u(i),a ) ,
i
= 0,1,
... , N
-
1
(3.4)
(Fl , F, ,..., Fn)Tt is an n-dimensional vector of functions.
Performance Index
T h e performance index is denoted by
t Superscript T is used to denote the transpose.
1,
3.3.
DYNAMIC PROGRAMMING SOLUTION
37
where L and $ are scalar, single-valued functions of their respective arguments. T h e total number of stages N may be free or specified. I n some cases, especially in stabilization problems, N may be regarded as being infinite.
3J.4 Side Constraints T h e most general form of constraint is given by
0 = +(x(N),4
+ c M(x(i),M Y . ) N -1 i=O
(3-6)
where I,!Iand M may be scalar-valued or vector functions. If the function M = 0, the foregoing is referred to as a terminal constraint. As well as this equality constraint, inequality constraints may also have to be considered. I n particular, constraints on the state of the form
<
S(x(i)) 0
(3.7)
and on the control variables,
Wi), .(4) < 0 are very common. T h e functions C and S may be scalar-valued or vector functions. Constraints may also be placed on the initial state, although in the analysis that follows the initial state is assumed specified. 3.2
THE PROBLEM
I n the light of these definitions it is now possible to define the problem in a precise manner, viz: T o find the sequence of controls u(i), i = 0, 1, ... , N - 1 , and the control parameters 01 that maximize the performance index, Equation ( 3 . 9 , subject to the state transition equation (3.4) and the constraints, for example, Equations (3.6)-(3.8). 3.3
DYNAMIC PROGRAMMING SOLUTION
T h e dynamic programming solution is based on principles that are a direct consequence of the structure of the problem.
3.3.1 The Principle of Causality T hi s principle is a fundamental property of deterministic multistage systems. It may be stated as follows. Principle of Causality: T h e state x ( k ) and control parameter 01 at the Kth stage, together with the sequence
38
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
+
of controls u[k, r - 11 = [u(k),u(k I), ... , u(r - l)] uniquely determine the state of the rth stage x ( r ) . T h e principle follows directly from a consideration of the system equations (3.4). T h e principle implies that there exists a function G ( 4 , 01, U P , r
such that A@)
=
-
11, A, 01)
G(x(K), ~ [ kY ,- 11, K, r )
(3.9)
(Y,
G is the transition function. T h u s the initial state x(O), control parameters a , and the control sequence u[O, N - 13 uniquely determine the trajectory x[l, N ] = [x(l), x(2), ... , x ( N ) ] . Hence, the performance index J defined by Equation (3.5) may be written as a function of x(O), a, and u[O, N - 11, i.e., there exists a function V(x(O),01, u[O, N - 11) such that (3.10) J = q49,%u[o, N - 11) If x(0) is specified, this relation implies that it is necessary only to determine a and u[O, N - 11 to maximize J. T h is assumption is implicit in the statement of the problem.
3.3.2
The Principle of Optimality
Suppose that somehow the control sequence u[O, k - I] and the control parameter a have been chosen in an optimal manner. Then, from the principle of causality, the trajectory x[O, k] is also determined. Now the performance index J may be written as J = J1 Jz where
+
(3.11)
and, (3.12)
T h e first term, J 1 , has been determined from the assumption that the optimal u[O, k - 13 and a are known. Thus, to complete the optimization of J , it is clear that it is both necessary and sufficient to determine u[k,N - 11 to maximize J z . This observation was summarized precisely by Bellman as the principle of optimality. It may be stated as follows: An optimal sequence of controls in a multistage optimization problem has the property that whatever the initial stage, state, and controls are, the remaining controls must constitute an optimal sequence of decisions for the remaining problem with stage and state resulting from the previous controls considered as initial conditions.
3.3. 3.3.3
39
DYNAMIC PROGRAMMING SOLUTION
The Principle of Optimal Feedback Control
An important consequence of the principle of optimality is that the choice of the optimal control at some stage may be expressed as a function of the state at that stage. This idea is defined in the principle of optimal feedback control which asserts that "The optimal control at the kth stage, u(k), provided it exists and is unique, may be expressed as a function of the state at the kth stage, x(k), and the control parameter, 01. T h e principle implies that there exists a function u0Pt(x(k),a , k ) such that uoPt(k) = u"Pt(x(k), 01, k ) (3.13) This function, uoPt(x(k),01, k) is referred to as the optimal control law, and, in conjunction with the system equations, may be used to generate optimal control sequences. T h e optimal control law defines the closedloop solution to the optimal control problem, and the actual numerical sequence of optimal controls, uopt(k) (k = 0, 1 , ... , N - l), is referred to as the open-loop solution. The closed-loop solution is more desirable, as it yields the open-loop solutions for all values of the initial state and initial stage. T h e principle of optimal feedback control may be derived quite easily from a consideration of the optimality and causality principles. First, from the causality principle it follows that there exists a function V(x(k),01, u[k, N - 11, k) such that v ( x ( k ) , a,u[k, N
-
11, k )
+ C {L(x(i),~ ( i ) , N-1
= +(x(N),01)
01))
i=k
(3.14)
Now, from the principle of optimality u[k, N - 11 must be chosen to maximize V(x(k),01, u[k, N - I], k). Clearly, uopt[k,N - 11 [and hence uopt(k)] will be functionally dependent upon x(k), 01, and k.
3.3.4 The Return Function T h e function V(x(k),01, u[k, N - 11, k) introduced before is referred to as the return function corresponding to the control sequence u[k, N - 13. If the control sequence u[k, N - I] is replaced by the optimal control uOPt[k, N - 11, V becomes the optimal return function Vopt(x(k), a, k) where Y"Pt(x(k),01, k )
=
V(x(k),ct, &'t[k, N
-
11, k )
T h e dynamic programming solution will require the construction of
40
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
the optimal return function. Hence it is appropriate here to explain how 1, ... , the return function may be obtained. Let u(x(i),a, i), i = K N - 1, be some arbitrary control law and V(x(k),a,k) the corresponding return function where
+
V(x(k),
01,
k)
= 4(x(N), a )
+ c {L(x(i), N -1
U(.(i>l
01,
i=k
(3.15)
i), % i)>
and where x(k, N ) is chosen such that x(i
+ 1) = F(x(i),a , u(x(i),
01,
i
i)),
= k,
K
+ 1, ... , N
-
1
(3.15a)
Now, from the definition of the return function, Equation (3.15), V(x(k),a , k) must satisfy the following backward transition equation
where x(k
+ 1 ) is given by Equation (3.15a) with i = K . At the final stage, W W ,a, N ) = 4(x(N), a )
(3.17)
T hus the return function V ( x ( i ) ,a , i),may be constructed as a backwards sequence on the index i using Equations (3.16) and (3.17). If the general feedback law is replaced by the optimal control law, the foregoing equations may be used to construct the optimal return function, i.e., V O P t ( X ( N ) , a ) -= 4 ( x ( N ) ,a )
V'JP'(r(k),01, k )
= L ( x ( ~01,) ,u " P ~ ( x ( K ) ,
where x(k
3.3.5
k))
+ 1) = F(x(k),a,
+ Vopt(x(k -1U " P t ( X ( k ) , a,
1, 01, k
+ 1)
(3.18)
(3.19)
k))
Derivation of the Optimal Control Law
Next, the relations that define the optimal control law will be obtained. 1,et the initial stage be denoted by k, and it is assumed that the optimal 2, ... , control law for the succeeding stages, unBt(x(i),a,i ) i = k $- I, k N 1 , has somehow been found. From Equations (3.18) and (3.19) it is clear that this will determine Vo*lt(x(k l), a , k 1). Now define a return function, R(x(h),u(k),'Y, k), as the return obtained from an initial stage k and state x(k) by using u(k)followed by u0Pt[k 1, N - 11, LC.,
+
+
~
+
+
R(x(k),~ ( k )a ,)
= L(x(k),~
( k )a ,)
+ VoPt(x(k + I),
01,
K)
(3.20)
3.3. where x(k
DYNAMIC PROGRAMMING SOLUTION
+ 1) is given by x(k
+ 1)
41
(3.21)
= F(x(k),u@), a )
It follows from the principle of optimality that the optimal choice of u ( k ) must be such that R is maximized, i.e., uopt(x(k), a ) = arg max{R(x(k), u(k),a)] u(k)
(3.22)
T h e control parameters 01 may be solved for at the initial stage by maximizing the optimal return VoPt(x(0), a ) with respect to a. I n general, the closed-loop solution for a o p t can be found as a function of the initial state, i.e., aOPt(x(0)) = arg max{ Vopt(x(O),a)} (3.23) These equations [(3.17), (3.22), and (3.23)] completely define the optimal solution. This solution may be summarized as follows.
1. Set VOPt(x(N),a )
2. Given VoPt(x(k
=M
N ) ,a )
+ l ) , a), choose uoPt(x(k),a ) from
uopt(x(k), a)
= arg max(R(x(k), u(k), a)} u(k)
3. Given uopt(x(k),a) and VoPt(x(k
+
VODt(X(k),
(3.25)
+ I), a ) , form VoPt(x(k),a ) from
Vopt(x(k),a) = L(x(k), ~ " p ~ ( ~a()k) ) , V0Pt(F(x(k),u0Pt(x(k), a), a )
4. Set k 5. At k
(3.24)
(3.26)
k - 1 and repeat steps 2 and 3 until k = 0. = 0 solve for the optimal choice of a by maximizing
=
a).
I n utilizing these equations there may be some question as to whether or not a solution exists or as to whether the solution is unique. However, if a unique solution to the dynamic programming equations does exist, then from the preceeding analysis it is clear that a unique optimal solution does exist and is given by the dynamic programming solution. If side constraints are present, only admissible values of the control variables may be considered in the maximization, Equation (3.25). If the constraint is of the form C(x(i),u(i)) 0, it is usually satisfied quite easily. A far more difficult type of constraint is
<
+ 1 M(x(i),w, 4 N -1
$(x(N),a )
i=O
=0
(3.27)
42
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
I n general, the best approach, as in the parameter optimization problem, is to adjoin these constraints to the performance index with a Lagrange multiplier V . Thus, an augmented performance index J* is formed, viz,
J*
= 4(x(N), a )
+
VT4(X(N),
4
+ c {L(x(i),u(i), + vTM(x(i),.(i), N-1
a)
i=O
a)>
(3.28)
T h e problem now is to find a control u[O, N - 11 and parameter v that extremixes J*. T h e control does not necessarily maximize J*, as shall be seen in later sections. I n general, an analytic solution of the dynamic programming equations is not possible, and recourse must be made to numerical techniques. One approach to this numerical problem is given in detail in future sections. Fortunately, one major class of problems may be solved analytically, and it will be discussed in detail in the next section. 3.4
LINEAR QUADRATIC PROBLEMS
A linear quadratic problem is characterized by a linear system equation and a quadratic performance index. T h e linear quadratic problem is important for a variety of reasons. I n particular, a simple analytic solution for the closed-loop optimal control for a linear quadratic problem may be found. Th is control consists of a linear combination of the components of the state vector. Because of this simple form, the problem of the mechanization of optimal control laws for linear systems is relatively straightforward. Of course, practical problems are generally nonlinear, but these problems can often be approximated, realistically, by a linear quadratic form. After deriving the solution, the application will be illustrated with examples.
3.4.1
Linear Quadratic Algorithm
So that the analysis may be simplified, side constraints and control parameters will not be included. Linear constraints will be considered in a later section. T h e linear system equations are given by
where, as before, x is an n-dimensional state vector and u is an mdimensional control vector. T h e matrices F z ( i ) (dimensioned n x n) and
3.4.
43
LINEAR QUADRATIC PROBLEMS
F,(i) (n x m) are system matrices that may vary from stage to stage. I t is assumed that the initial conditions are specified, i.e.,
x(0)
(3.30)
specified
xo
=
T h e quadratic performance index J is chosen to be
J
+ C {xT(i)~,,(i)x ( i ) + uT(i)Lu(Z)u(i)} N-1
= xT(N)4,,x(N)
(3.31)
i=O
where the matrices L , ,L,, , and q3,, are dimensioned n x n, m x m,and n x n,respectively. (The more general case, in which linear terms are included, is considered as an exercise.) I t shall be shown by induction that the optimal control law and the optimal return function have the following form, =
U"Pt(X(i))
and
= x'(i)
V.Pt(X(i))
(3.32)
--D(i) .(i)
P(i)x ( i )
(3.33)
where D(i) and P ( i ) are m x n and n x n matrices which will also be determined. First note that at the final time (3.34)
VOPt(X(N)) = xT(N)+,,x(N)
and, hence, assuming that Vopt has the form of Equation (3.33), P ( N ) = 4m
(3.35)
From Equation (3.25), u0Pt(x(N- 1)) may be written uopt(x(N - 1)) = arg max{R(x(N - l), u(N u(N-1)
-
l))}
(3.36)
where from the definition of R, Equation (3.20), R(x(N - I), u(N - 1)) = xT(N - I)L,,(N - 1) x(N WITP ( N )x ( N )
+
-
1)
+ uT(N
-
l)L,,(N - 1) u(N - 1) (3.37)
Next, x ( N ) is expressed in terms of x(N - 1) and u(N - 1) by using the system equations (3.29) giving R(x(N - l), u(N - 1)) = xT(N - l)L,,(N - 1 ) x(N - 1) u'(N - l)L,,(N - 1) u(N - 1) (F,(N - 1) x(N - 1) Fu(N - 1) u(N - 1))' P ( N ) (3.38) x (F,(N - 1) x(N - 1) F,(N - 1) u(N - 1))
+
+ +
+
44
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
This equation may be written R(X(N - l), u(N
1)) = X T ( N - 1) Z,,(N - 1) x(N - 1) + 2u(N - 1) Z,,(N - 1) x(N - 1) -
where
+
+ F,T(N
UT(N -
1) P(N)F,(N - 1)
Z,,(N
-
1)
= L,,(N
-
1)
Z,,(N
-
1)
= F,yN
-
1) P(N)F,(N - 1)
Z,,,(N -- 1)
= L,,(N
-
1)
+ F,T(N
-
-
1) Z,,(N - 1) u(N - 1) (3.39)
(3.40)
1) P(N)F,(N
-
1)
Now performing the indicated maximization of R with respect to u [Equation (3.36)], by differentiating and setting the resultant partial derivative equal to zero, gives aR/&
=0 =
2Z,,,(N
Th e n solving for u(N UODt(N -
-
I)
-
1) x(N - 1)
+ 2Z,,(N
-
1 ) u(N - 1)
(3.41)
l),
=
-Z;i(N
- 1) Z,,(N
-
1) x(N
-
(3.42)
1)
and, hence, (cf Equation 3.32), D(N
Using this value of VOD'(x(N
-
-
u0Pt
1)
= ZLi(N
-
(3.43)
1) Z,,(N - 1)
in the equation (3.26) for Vopt(x(N - 1)) gives
1))
=x(N-
l)T{Z,,(N-
1 ) -Z,,(N-
l)ZZi(N- l)Z,,(N-l)}x(N-l) (3.44)
which is of the form of Equation (3.33) if P ( N - 1) is chosen so that P(N
-
1) =- Z,,(N
-
1) - Z,,(N - 1) Z,-,1(N - 1) Z,,(N
-
1)
(3.45)
I n the foregoing it is assumed that Z,,(N - 1) is nonzero. I n fact, in order that the control law equation (3.42) maximizes the return, Z,,(N - 1) must be negative definite, i.e., Z,,(N - 1) < 0. Now, to complete the induction assume that the optimal control law, UOI"(X(;)), and the optimal return, VOpt(x(i)), have been obtained for i = ( k + 1, ... , N - 1) and that VO"t(x(K + 1)) has the form VOPt(X(K
+ 1)) = x(R + 1)'
P ( R -1 1) x(R
+ 1)
(3.46)
3.4.
45
LINEAR QUADRATIC PROBLEMS
T h e optimal value of u at the kth stage is given by uopt(x(k)) = arg max(R(x(k), u(iz))>
(3.47)
u(k)
where R ( x ( 4 ,u ( 4 )
=XT(4
-%,(k)
44 + U T ( 4 L,,(k) 44
+ x(k + I)'P(K + 1)x(h + 1 )
Following the analysis just given for the N
-
(3.37)-(3.39)], Equation (3.48) is written R ( x ( 4 ,U
( 4
= xT(k)
(3.48)
1 stage [Equations
&,(4 4 k ) + 2 U Y 4 Z,,(k)
+ uT(4 - G u ( 4 44
4k)
(3.49)
where Z,, , Z,, , and Z,, are given by Equation (3.40) with N - 1 replaced by k. Again maximizing with respect to u gives UOPt(k)
and, hence,
=
-Z;i(k) Z,,(K) x(K)
W )= Zi(4
~ U Z ( 4
(3.50) (3.51)
Finally, Vopt(x(k))is given by [cf Equation (3.25)] V0Pt(4k))= xT(w-,,(4 - Z z u ( 4
GX4 z,,(k))
(3.52)
Hence, Vopt is the form of Equation (3.33) for all K, provided that P(k) is chosen such that P(4
= Z,,(N
-
Z,,(4
.Gi(4ZU,(R)
(3.53)
Again, for a maximum, the matrix R,,(k) is required to be negative definite for all k, i.e., Z,,(R)
Uniqueness and Sufficiency
I n obtaining the dynamic programming solution to this problem the convexity condition Z,,(k) < 0 was required. If Z,,(k), for some R, is greater than zero, it is clear that a bounded optimal solution does not exist. Also, if Z,,(k) is singular, the optimal solution will not be unique.
3.
46
OPTIMAL CONTROL OF DISCRETE SYSTEMS
Assuming that neither of the foregoing cases hold, a uniqueness and sufficiency theorem follows.
THEOREM 3.1. If &'(R)
< 0,
R
= 0,
1,
... ,N
-
1
(3.55)
the optimal control solution of the foregoing problem exists, is unique, and is provided by the above dynamic programming equations. T h e proof follows directly from the derivation of the dynamic PROOF. programming equations.
3.4.3 Summary of the Algorithm T h e algorithm may be summarized as follows.
1 . Set P ( N ) = +zz and k = N - 1 . 2. Evaluate Z,,(k), Z,,(k), and Z,,(k) from the equations (3.56)
3. Compute D ( k )where (3.57)
D(k) = : Z i i ( k ) Zux(k)
4. Compute P ( k ) where 5. Set k = k 1 and repeat steps 2 and 3 until k 6. Determine uoPt(k) ( k = 0, N - 1) from ~
UOPt(R) =
-D(R)x(k)
= 0.
(3.59)
3.4.4 Example Low-Thrust Guidance. T h e sequential algorithm developed in the last section will now be applied to a simple example. Although the numbers involved refer to a low-thrust space probe, similar equations arise in many other problems and the techniques employed are universal. Consider the space probe traveling along a trajectory from the Earth to Jupiter. Only the motion in a plane normal to the trajectory is of
3.4.
LINEAR QUADRATIC PROBLEMS
47
interest as time of flight errors are assumed to be relatively unimportant (Figure 3.1). T h e motion of the probe in this plane may be approximated by the equations x=uo, y=u, (3.60)+ where x and y are the components of the position deviations centered on the trajectory, and where u, and uy are the control variables in the x and y directions. As motion in the x and y directions is assumed to be uncoupled, only one axis will be considered. T h e problem is, given some
FIG. 3.1. Definition of probe coordinate system.
initial displacement, to minimize the deviation in position x and velocity 3i. at the final time (which is given), while minimizing the control u, . I n vector notation the system equations are written, XI =
au,
xz = XI,
q(0) = xlo
1
xz(0) = xzo
specified
(3.61)
where a is a system parameter and x1 is the velocity deviation (kilometers per second), x2is the position deviation (kilometers), and u is the control force (kilometers per square second). Now, at least two approaches might be taken in the design of a performance index, for example, one criterion might be J1 =
+
w%(b)z xz(tr)2)
+J
‘f
{P2) dt
(3.62)
where tf is the final time. T h e value of the performance index is proportional to the state variables x1and x2at the final time, and the total control effort used. T h e parameters h and y adjust the relative importance of the +
T h e overdots denote differentiation with respect to time.
48
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
terms in performance index. T h e other approach might be to include a measure of the state throughout the trajectory, for example,
It might be thought that both indices would lead to similar control laws and similar trajectories, but, in general, this is not the case. T h e difference is well illustrated by this example. First, however, consider some of the system parameters. For a Jupiter mission, in the mid 1970s, typical flight times are of the order of 900 days of which the last 430 would be unpowered flight where no guidance would be possible. Suppose, for the purpose of this illustration, only the last 50 days of powered flight are considered. Typical errors in position and velocity at the initial (50 days before cutoff) time would be 3000 km and 0.3 m/sec. At this distance from the sun (1-2 AU)' assuming an ion engine, the thrust available for guidance corrections would be equivalent to an acceleration of lo-* km/sec2. Final position and velocity errors should be of the order of 30 km and 3 mmisec. I t is obvious from the time scale that continuous control is somewhat unrealistic, as this would require continuous monitoring of the probe's position. A more realistic approach would be to consider the 50 days as four stages,* and the final unpowered part of the flight as the final fifth stage. T h e problem may thus be reformulated as a linear multistage problem. T h e state transition equations may be obtained by integrating the equations (3.61) over one stage, i.e., Xl(i
+ 1) :=
X2(i
+ 1) = Xz(i)
Xl(i)
+ h(i).(i) .(if + h(i).@) + +h(i)Z.(i)
u(i),
i
= 0,
1, ... , 4
(3.64)
where h ( i ) is the length of the stage. Hence the matrices F, and F, are given by (3.65)
T h e performance indices
J1 and J2 become i=-4
11 =
A(X,(5)2
+ 1 AU (astronomical unit)
5
+ x2(5)2) + 1 W(i)W 2 }
(3.66)
i=O
149.6 x lo6 km.
* T h e choice of the number of stages was made somewhat arbitrarily for this example.
In practice many different physical, practical, and economic constraints might have to be considered before making a final decision.
3.4.
LINEAR QUADRATIC PROBLEMS
Hence the matrices L,, , L,,
49
, and +zz are, for J1 ,
and for J z ,
One feature of this problem is the large numbers involved. And, as is often the case, it is worthwhile to scale the variables. Here units of time were taken as megaseconds (seconds x lo6)and, of distance, megameters (meters x lo6).I n these units each of the first four stages becomes one unit of time, and the last stage becomes 35 units, i.e., h(i) = 1,
4 4 ) = 35,
i
= 0,
I , 2, 3
Control was possible only over the first four stages, hence,
.(i)
=
1,
44)
= 0,
i
= 0,
1,2,3
I n these units the thrust available for guidance was approximately one unit, and so the value of the weighting parameter y had to be chosen to limit the thrust to about this value. Here y = 100 was found to be suitable. T h e value of h was chosen to ensure that the terminal errors were within the allowed limits, h = 39 was found to be suitable. I t must be emphasized that the choice of these parameters, y and A, is not critical. T h e sequential relations equations (3.56) and (3.59) were programmed, and the optimal control computed for both performance indices. T h e optimal control and the optimal trajectory are given in Tables V and VI and are plotted on Figures 3.2-3.4. I t can be seen that there are significant differences between the two trajectories although the final states are similar. T h e first performance index J1 leads to a far more economical fuel consumption although the majority of the trajectory is a long way off the nominal. T h e second, J 2 , leads to a trajectory that follows the nominal far more closely although much more control effort is needed. Of course, the terminal state may not be the only consideration when a trajectory is chosen. On-board experiments may dictate other factors. However, from this simple
3.
50
OPTIMAL CONTROL OF DISCRETE SYSTEMS
TABLE V. Stage
0 1 2 3 4 5
LOW-THRUST PROBLEM OPTIMALTRAJECTORY Jl
0.300000001)-00 0.221946351)-00 0.14412870D-00 0.66547024D-01 - 0.1079866OD-01 -0.10798660D-01
U
Xa
XI
0.30000000D-01 0.33000000D-01 0.3521 9464D-01 0.36660750D-01 0.37326221D-01 0.60509528D-03
-0.78053646D-01
-0.77817659D-01
-0.77581671D-01 -0.77345684D-01
-0.00000000D-38
Weighted terminal error is 0.45621110D-02. Fuel used is 0.3 1 I .
TABLE VI. Stage
0 1 2 3 4 5
LOW-THRUST PROBLEM J2
X1
0.30000000D-00
- 0.84893898D-00
-0.1 2661807D-01 - 0.98040824D-00 - 0.61 262721 0-03 0.61 26272113-03 -
X2
U
0.30000000D-01 0.33000000D-0 1 0.24510610D-01 0.1 1848803D-00 0.20447207D-00 -0.7251 8953D-02
-0.1 1489390D-01
-0.41724174D-00
0.28577248D-00 0.97579561D-00 - 0.00000000D-38
Weighted terminal error is 0.20656466D-02. Fuel used is 1.7.
4t
-- J2 Index
1
Time
FIG. 3.2.
Optimal trajectories (position).
~
f l index, - - -
J2
index.
3.4. +I
-
1
1
51
LINEAR QUADRATIC PROBLEMS 1
1
1
1
1
1
1
1
1
1
1
1
1
-
-J, Index --- J2 Index
-
-
-
\
% 0- \
-
II
\ \
-
-I
\
I
I
\
\
, /
\
I.+-<
/
I
I
I
I
I
A
I
-
I
-
1
I
I
I
1
I
I
I
I
Time
FIG. 3.3.
Optimal trajectories (velocity). -Jl index, - - - Jz index. I
I
1.0 -
I
0.5,----J
-
-
g
I I I I
I 1
0;
0
-0.5
r---J
-
I
I
I
I I I I
1
-
-JI Index
--- J2 Index
I
-
-
I I I I
1
-
I
-
V
I
r---i
-
3
I
I
-
I I
-
I I
-1.0 -
,---J
I
-
I I
I
I
I
l r .
1
I
analysis it appears that it would be expensive to follow the nominal trajectory closely, and, in terms of the final position, little would be gained.
3.4.5
Linear Quadratic Problems with Linear Terminal Constraints
An analytic solution to the linear quadratic problem also exists if linear terminal constraints are present. These constraints will be denoted by2 K = $$x(N) (3.70)
3.
52
OPTIMAL CONTROL OF DISCRETE SYSTEMS
where K = ( K , , K , , ... , K,) denotes the constraint levels. T h e dimension of K must be no greater than that of n (Y n). T h e vector K may be considered as a specified constant, and the solution will be obtained for an arbitrary K . $ x is an Y x n matrix and must have full rank, i.e., the constraints must be linearly independent. This form of terminal constraint is a special case of the general constraint given in Equation (3.6). However, Equation (3.6) may be converted into the terminal constraint by introducing a new state variable x,+~where
<
%+l(O)
xn+l(i
=
0
+ 1 ) = xn+l(i) + M ( x ( i ) ,a, u(i)),
i
= 0,
1 , ... , N - 1
+
T h u s the general constraint may be rewritten K = $,(x(N)) X ~ + ~ ( N ) . Hence, if M is a linear function of its arguments, the more general form is included. T h e first step in deriving the algorithm is to form an augmented performance index J* by adjoining the terminal constraints to the old index with a set of Lagrange multipliers v, i.e., form,
J*
= .xT(N)cb,x(N)
+
VT(+P(N)
-K)
+ c {xT(Wz,(i) 4;)+ u T ( i ) L ( i ) .(i) N-1
(3.71)
i=O
Now, following the development for the unconstrained problem, an extremal control law and the corresponding return function are sought in terms of x and v. They are assumed to be of the form u""t(x(k),v) = --D(k) x(k) - E ( k ) v
V""t(x(k),v)
= x T ( k )P ( k ) ~
+ vTS(k) v
-
Now, as before, assume that Vext(x(k
uext may be found from
+
( k ) 2xT(k)Q(k) v vTK
(3.72)
+ l), v) has been obtained, and
uext(x(JZ),v) = arg ext[R(x(k),V, u(k))] u(k)
where R ( x ( k ) , v, u p ) )
=
xT(k)L,,(k) x(h) iuT(k)L,(k) u(k)
+ .xT(k + 1 ) P ( k )x ( k + 1) + 2xT(k + 1) Q(k + 1 ) v + vTS(k + 1) v
-
vTK
(3.73)
3.4. and where x(k
53
LINEAR QUADRATIC PROBLEMS
+ 1)
= Fx(k)
44 + F,(N
u(k)
This expression, Equation (3.73), is, for convenience, rewritten as
W k ) , Y , 44)= xT(k)zzm 44
+ 2uTW -G,(444 + u T ( 4 Z U , W
+ 2xT(4 Z,(k) + 2uT(k) Z,,(k) V
+ vTZV,(K)v - vTK where Z,, , Z,,
44
V
(3.74)
, and Z,, are defined as before by Equation (3.40), and = FzT(h)Q@
Z,,(k)
+ 11,
Z”,(k) = S(k
+ 1)
The extremal control uext(k) is easily found by differentiating this expression with respect to u ( k ) and setting to zero, giving uext(k) =
Hence,
w
=
- z . : ( k ) Z&)
-G:(4‘ L d k ) ,
x(k) - Z;:(K) Z,,(k)
Y
E(k) = ZX4 -&v(k)
(3.75) (3.76)
Now, by definition, VeXt(x(k),V) = R(x(k),V, u“Xt(~(k), v))
and so by substituting uext(k) [Equation (3.75)] back into Equation (3.74), it may easily be seen that P X t ( x ( k ) , v) will have the quadratic form of Equation (3.72), provided P , Q, and S are chosen by
W )= -Gx(k)
- Z,,(4
QW
= Zm(4 - z C u ( 4
S(4
= Z””(k)- -
z : ( w -%&) . m k ) ZU”(4
(3.77)
U k )G x k ) Z U m
These relations, Equations (3.75)-(3.77), enable Vext and uext to be computed as a backwards sequence. The relations are initialized by noting that VeXt(x(N), 4 = xT(N)4eP(N) (xT(N)9zT - K ) V
+
which implies that P ( N ) = +zx, Q ( N ) = +xT, S ( N ) = 0. Note that these relations have been obtained for an arbitrary value of v. The multiplier v must be chosen before determining whether the solution
54
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
maximizes the performance index. Th is is done by extremizing the return V e x t ( x ( k ) ,V ) with respect to v, i.e., vext =
(3.78)
arg ext{ Vext(x(k),v)} Y
Equation (3.78) leads to vext = S-l(k)(K - Q T ( k )x ( k ) ) providing the matrix S can be inverted. Although in this equation vext appears to vary from stage to stage, it is, in fact, a constant over a specific optimal trajectory. I n other words, the multiplier v may be found at any point along a trajectory so long as the matrix S is nonsingular. Substituting this expression for v into the equations for ucXtand Vext,the optimal control law and the optimal return are obtained, viz, U " ~ t ( X ( k )K ,
) = --D(k)
-
E ( k ) S-l(k) QT(k) x(k)
-
E(k) S - y k ) K
and VOPt(x(k), K ) = xT(k)(P(k)- QT(k) S-l(k) Q(k))~ ( k ) -XT(k)
Q(k) S-l(k) K
+ KTS-l(k) K
(3.79)
As noted before this construction requires that S ( k ) has full rank. If, for some k , S ( k ) is singular, veXt must be computed at a previous stage and the resulting control is essentially open loop. If at the j t h stage S obtains full rank, then the algorithm may be simplified by computing V o p t and u0*)tfor the preceeding stages explicitly. I n such cases, assuming that K = 0, Vopt(x(k)) and u o p t ( x ( k ) ) have the following forms, V o p t ( ~ ( k )= ) xT(k) P*(k)~(k),
u""(x(k))
=
-D*(k) x(k),
k = j , j - 1 , ... , 0
where P*(k) and D*(k) satisfy the same relations as P(k) and D(k) for the unconstrained problem. However, the initialization for P*(k) is provided by Equation (3.79) with the subscript k replaced by j . 3.5
NUMERICAL METHODS
Unfortunately, in general elegant analytical solutions to the dynamic programming equations, such as that obtained for the linear quadratic problem, are not possible. Instead numerical techniques must be employed. T h e traditional dynamic programming approach involves a systematic search procedure that is best described in terms of a simple example.
3.5.
55
NUMERICAL METHODS
Consider a very simple one-dimensional regulator problem. T h e 1) = x ( k ) u(k) where k equals 0, 1, ... , system equation is x(k and the performance index, to be minimized, is
+
J
+
1 ( X 2 ( k ) + u"k)) 1
=
k=O
+ x(2)2
T h e conventional approach would be as follows.
1. Compute values of the optimal return V0Pt(x(2)) for a range of 42) 1, and values of x(2), e.g., for ten values in the range 0 construct a table of V0pt(x(2)) versus 42). (See Table VII.)
<
TABLE vIr.
vopt(42))
<
V0pt
0
0.01
0.04
0.09
0.16
0.25
0.36
0.49
0.64
0.81
1
42)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
I
2. T h e next step is to compute a similar table €or values of Vopt(x(l)) vs. x( 1). T h e optimal value of u( 1) must maximize the return R(x(l), u( 1)) where R ( x ( l ) ,u(1)) = x(1)2 u(1)2 VOPt(X(2))
+
+
+
with 4 2 ) given by x(2) = x( 1) u( 1). Again specific values of x( 1) for 0 < x(1) < 1 are considered. T h e control u(l), in some algorithms, may only take discrete values, whereas in others is not so constrained. Consider first x(1) = 0, then R(0, u(1)) = ~ ( 1 ) ~Vopt(u(l)) which is clearly a minimum if u(1) = 0. Next, consider x(1) = 0.1,
+
+ u(1)2 + VOPt(O.1 + u(1)) We might try values for u in the range -0.1 < u < 0.1. Note that values for Vopt(O.1 + u(1)) have to be interpolated if 0.1 + u( 1) does not equal R(O.1, u(1)) = 0.01
one of the stored values of 42). Values of R for values of u( 1) are given in Table VIII where it can be seen that u( 1) = -0.05 gives the minimum. T h e procedure is repeated for each value of x( 1) so that a table of values of VOPt(x( l)), x( 1) and uoPt( 1) may be constructed.
R(O.1 >
4 1))
41)
0.02 -0.1
TABLE VIII.
R(O.l, u(1))
0.0075
0.02
0.035
0.06
0
0.05
0.1
-0.05
56
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
3. T h e last step is repeated for x(0) and u(0) until finally a table of VoI)t(x(0)), x(0) and ~ " " ~ is ( 0constructed. ) T h e general procedure may be summarized as follows. T h e state i n d the control are divided into a number of discrete values. T h e value of the optimal return Vopt(x(N)) is computed simply by evaluating #(x(N)) for each of the quantized states. T h e n Vo"t(x(N- 1)) is computed by evaluating R(x(N - l ) , u(N - 1)) for each control value and the maximum is found by an elementary trial and error procedure. It may, of course, be necessary to interpolate the values of Vopt(x(N))if x ( N ) = F ( x ( N - l ) , u(N - 1 ) ) is not a quantized state. T h e procedure is repeated for each value of x ( N - 1) so that a table may be constructed of VoLlt, x(N - l), and u O ~ ) ~ ( N 1). This procedure is repeated at each stage until the initial stage is reached. T h e final result is a table of values of VOl)t(x(0)),x(O), and u o p t ( 0 ) . There are certain advantages to this procedure. For example, very general systems can be handled, and constraints actually tend to help the computation by reducing the search areas. Also the optimal control is obtained as a feedback solution. I n practice, however, the computational requirements become excessive for all but very simple systems. Both storage requirements and computation time can be troublesome. For example, most of the storage is required for the table of VoIlt as a function of the state. For the n-dimensional case with 100 quantized values of the state, 100" storage locations would be required. As a consequence several ingenious methods have been developed that attempt to reduce the amount of storage required, e.g., if there are fewer control variables m than state variables n, a substantial saving may be achieved by transforming the state space. A completely different approach is to obtain the optimal solution from a single initial state. T h e procedures developed here are iterative in nature, i.e., they are based on updating some nominal control sequence such that the optimum is eventually reached. T h e storage and time requirements are comparatively small. A true feedback solution is not obtained, although there exist techniques by which it is possible to compute the optimum feedback in a small region about the optimal trajectory. 3.6
3.6.1
THE GRADIENT METHOD
The Gradient Algorithm
I n this section a gradient method for the optimization of multistage problems will be developed. T h e method is motivated from a
3.6.
57
THE GRADIENT METHOD
consideration of a first-order expansion of the return function, V(x(O),a , u[O,N - l]), about some nominal control variable sequence uj[O,N - 11 and a nominal parameter, aj, viz, V(x(O),af+l,uj+l[O, N
-
11)
V ( X ( O ) , aj, uj[O, N
=
+ aV(x(O),
-
11)
aj, uj[O, N
aor
-
11)
601
aj, uqo, N - 11) 6u[O, N aura, N - 11
aV(X(O), +
-
I]
(3.80)
T h e variations in the control variables 801 = and 6u = uj+l - uj must be small enough to ensure the validity of the expansion. Clearly, if Su[O, N - 11 and Sol are chosen by
6u[O, N
-
11 = E
uj[O, N [ 8V(x(O), au[o, N - 11 aj,
11)
T
-
11)
T
and
a,=.[ where
E
aV(x(O),aj,aa uqo, N
1 -1
-
(3.81)
(3.82)
is some positive parameter, the return V(x(O),
Uj+l[O, N
-
11)
will be greater than V(x(O),aj, d [ O , N - 11). Instead of forming V(x(O),a , u[O, N - 11) explicitly and then differentiating, the gradients are computed more easily by means of a backward sequence of equations. It is clear that a change in u(k) will not affect L(x(i),01, u(i))for i < k. Hence, the gradient of the return function V(x(O),01, u[O, N - I]) with respect to the control function u (k) is the same as the gradient of V(x(k),a , u[k,N - l]), i.e.,
Now the return V ( x ( k ) ,a , u[k, N - 11) from its definition may be written
t T h e superscript j , denoting the nominal, is dropped to clarify the text.
58
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
IIence, differentiating with respect t o u ( k ) , we obtain PI-(.Y(k), a,u [ k , N &(k) ~
~
-
11)
8L(x(k), a, u p ) ) i-u(k)
+
aV(.r(k
+ l), a,u[K + 1, N Ex(K
-
4-1)
13) EF(X(K), a, u(k))
Wk) (3.84)
I n the following analysis the arguments x(i),a , u[i,N - 11 or x(i), a , u(i) will be replaced by the single argument i. Subscripts, unless specifically defined otherwise, denote partial derivatives. For example, Equation (3.84) is written V,,(k) L,,(k) V,(k l)f;,(k). I n order to evaluate this expression it is necessary to obtain Vz(k 1) for k = 0, 1, ... , N - 1 . A sequential set of relations for V , evaluated along a trajectory may be obtained by taking partial derivatives of Equations (3.16) and (3.17), 1.e., (3.85) 1 ’ A N ) = 4&(N)> a> 2
+
F-,(k)
=
+
+
v,(K i1)FJk) + L,(k)
(3.86)
T h e partial derivative of V(0) with respect to a is obtained in a similar fashion by partially differentiating Equation (3.16), viz,
+
where VJ(k 1) is given by Equations (3.85) and (3.86). T h u s using Equations (3.85)-(3.88), the gradients of the return V(x(O),a , u[O, N - 11) with respect to n, and u[O, N - 11 may be formed as a backward sequcnce. T h e gradient algorithm may now be summarized as follows. 1. Choose a nominal control sequence d [ O , N - I] and control parameter a?. Construct and store the trajectory x[O, N ] from the system equations (3.4) and the nominal control variables. Also compute the cost J where (3.89)
2. Compute the partial derivatives V,(k)
=
V,(x(k),d,uj[k,N
-
I])
3.6. and V,j(k)
=
Va(x(h),~ 3 uJ[k, , N
-
l’z3(N) = +rn(N),
11) for h
=
N, N
=
-
1, ... , O from
VW3(N)= +,(W
+ 1)Fz’(K) +L,J(K) V,J(k)= VZ3(K+ l)F,+) + Val@ + 1)
P,+)
59
THE GRADIENT METHOD
(3.90)
Vz3(k
L
L,’(k)
where Fa’(k) = F,(x(k), d ,ul(k)) and L,l(k) = L , ( x ( k ) , ( ~ 3 u)(k)). , 3. Compute and store the gradient with respect to the control at each stage from
4. For some nominal parameter E &‘‘(k)
=:
d(k)
> 0, compute the new
control from
+ t[aVjad(k)]’
(3.92)
5. At the initial stage compute the new control parameter &+l = 013
where
< > 0.
+ .V,T(O)
d+l
from
(3.92a)
6. Use the new control variables uj+-l[O,N - I], .j+l and the system equations to construct a new trajectory xj+l[O, N ] and compute the new cost J j + l . 7. If J i + I > J j , continue with steps 2 and 3 . If ji+l < J i , reduce the step size parameters E , 2; for example, set E = ~ / 2 <, = 212, and repeat steps 4, 5 , and 6, etc. 8. T h e iterations are continued until either no further increase in the cost is possible, or until the desired accuracy is attained. If side constraints are present, appropriate modifications must be made. For example, consider a side constraint of the form, [cf Equation
(341,
+ 1 M ( x ( i ) ,a , 49) N-1
k’ = # ( x ( N ) ,01)
(3.93)
i=O
T h e gradient projection approach taken here is derived by introducing W(x(k),01, u[k, N - 13) associated with the constraint so that W ( k ) ,01, u p , N
~
11) = # ( x ( N ) ,01)
+ 1 {MW, N-1
i=k
01,
+)I
(3.94)
60
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
Following a development similar to that just given above for the return V it is possible to derive the partial derivatives W, , W, , i.e.,
(3.95)
where the arguments of the return W have again been abbreviated. Thus, it is possible to form the gradient of W with respect to the control u(k)as
T h e corrections in the gradient algorithm become (3.97)
and
It remains to choose values for the parameter V. They are chosen to enforce a desired change SK in the level of the linearized constraint. I n terms of the return W, SK is given by SK
Substituting for
Sol
=
WN(0) 601
= ,it1
6u[O, N
~
11
-
cij
+ W,(O) 6u[O, N - I]. and
uj+l[O,N
-
I]
- Ui[O,
N
-
11
3.6.
THE GRADIENT
61
Hence, v may be found to be N-I
x [8K - E
c (W& + l)F,(i)
N-1
i=n
Of course it must be assumed that abnormality does not occur, i.e.,
3.6.2
The Penalty Function Technique
Another approach to the problem of fixed terminal constraints is provided by the penalty function technique. A strictly convex function of the constraints is adjoined to the original performance index J , i.e., J* = J C[#(x(N),a ) - K ] 2 rwhere the parameter r is often taken as unity and where (in a maximization problem) C is a negative constant. T h e problem is now normal and unconstrained and the straightforward gradient technique may be used. By increasing the magnitude of the parameter C, the term [#(x(N),a ) - K ] 2 rcan be made arbitrarily small. T hi s approach is appealing from an engineering viewpoint as, in normal circumstances, hard constraints do not have to be met exactly. One disadvantage is that additional extrema may be introduced and convergence to the wrong extremal is possible. Fortunately, however, such cases seem to occur in theory rather than in practice. Another disadvantage is that the addition of penalty functions with large weighting constants C tends to slow convergence. T h e effect, obviously, depends on the problem, but it can become a major factor in the number of iterations required to obtain a solution.
+
3.6.3 Example
The Brachistochrone Problem. A classic problem in the calculus of variations is the brachistochrone problem, which may be phrased as follows. Find the path by which a bead may slide down a frictionless wire between two specified points in minimal time. T h e initial velocity of the bead will be assumed to be specified by Vn # 0, and the gravitational field has a constant acceleration, g per unit mass. Restricting the bead to
62
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
a plane, the position of the particle may be described with the rectangular coordinates ( y , z),where z is the verticle component and y is the horizontal component of the position vector. (See Figure 3.5). Assuming that along the optimal path, x may be expressed as a piecewise differential function of y , the time of transfer may be expressed by the path integral,
where y o and y n are the initial and final values of y and V ( z , y ) is the velocity of the bead.
FIG. 3.5.
The brachistochrone problem.
+
+
T h e conservation of energy implies Vo2/2 gz, = V 2 / 2 gz. Hence, V may be expressed as terms of x and known parameters
+
2g(zo - z ) ) 1 / 2 . T o simplify the form of the performance index, dimensionless variables are introduced I/ _. (I/”
*y =
( 2 d ~ o 2 )-( ~Zo),
t
=
(2g/V,2)(y Yo)
J
ZgTiV,, ,
u
=
dZ/dy
=
~
3.6.
63
THE GRADIENT METHOD
T h e performance index now becomes ]=
j
tN
+ .”)””(I
(1
0
- x)ll2dt
T h e system equation relating the time derivative of the state vector x to control u is simply dxldt = u. T h e value of the state at the initial time t = 0 is x(0) = 0. T o complete the problem, let the final time t , and final state x, be specified by t , = 1 and X, = -0.5. T o apply the discrete gradient method to this problem, the performance index may be replaced with the following discrete approximation,
J
c {[l +
N-1
=
u(i)2]1’/”[1
i=O
- x(i)]1/2} h(i)
where h(i) is the length of the ith stage. T h e system equation is approxi1) = x ( i ) h(i) u(i). I n this example 100 equal stages mated by x ( i were chosen so that h(i) = 0.01, i = 0, 1, ... , 99, and N = 100. T h e boundary conditions, Equations (3.90), for the partial derivatives are V,(N) = 0 and W,(N) = 1. T h e partial derivatives required to compute Equations (3.90) are
+
+
+ u(i)”>’/2(1 x(i))-”/” L J i ) = h(i)u(i)(1 + ~(i)~)-’/~(1 ~(i))-’/~ L,(i)
=
yZ(iX1
-
-
MJi)= 0,
F,(i)
=
1,
and
(3.99)
F,(i) = h(i)
Finally, the nominal value for the control ui was chosen as u ( i ) = 0,
i
=
0, 1 , ... ,49
~ ( i=) -0.5,
i
=
50, 51, ... ,99
and the parameter E was chosen as 0.5. T h e gradient algorithm was used to determine the optimal control as follows.
1 . T h e nominal uj was used to compute a trajectory x[O, N ] and the nominal cost p . 2. T h e partial derivatives V,(lz), W J k ) were computed using Equations (3.90) and (3.95). 3. A value was chosen for SK, here, SK = xj-’(N) 0.5 and the Lagrange multiplier v was computed.
+
64
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
4. T h e gradients aV/au(k), aW/au(k) were formed using Equations (3.91) and (3.96) for k = N - 1 , ... , 0. 5. With E = 0.5, the new control uj+l was computed [Equation (3.97)}. 6. T h e new control was used to compute a new trajectory and the new
cost
was computed.
Jj-kl
7. T h e new cost was checked to ensure repeated for 10 iterations.
Jj+l
and steps 2-6 were
T h e results of the computations are shown in Table IX and Figure 3.6. T h e computer time required was under one second on the IBM 7090
computer.
TABLE cost
Nomitial
1.1369744 1.0251133 1.0011187 0.9997580 0.9995843 0.9995633 0.9995604 0.9995600 0.9995599 0.9995599
2 3 4 5 6 7 8 9
BRACHISTOCHRONE PROBLEM
41)
Iteration 1
Ix.
0.0000000 - 1.I508888
-0.7745588 -0.7875542 -0.7767487 0.781 6071 -0.7799865 -0.7805840 -0.7803639 -0.7804477 -
-0.4999999 -0.4999997 -
0.50oooO0
- 0.4999998
-0.4999998 -0.4999998 -0.4999998 -0.4999998 -0.4999998 -0.4999999
V
- 0.0000000
0.0693148 0.1821865 0.2224539 0.223 I 138 0.2228842 0.2229772 0.2229454 0.2229454 0.2229574
Nominal control
-02
I 0
I
~
i
i
J
08
10
-0 4
-06
x(101)
0 2
04 06 Time
FIG. 3.6. Optimal solution to the brachistochrone problem.
3.6.
65
THE GRADIENT METHOD
The computation was repeated with the penalty function technique. T h e terminal constraint was adjoined to the performance index J , i.e., J* = J c(x(N) 021)~. T h e boundary condition for the function V , becomes V,(N) = 2c(x(N) 0.5) and the other partial derivatives remain unaltered. [The partial W,(k) is, of course, no longer needed.] T h e gradient algorithm was then used to compute the optimal control for several values of the parameter c. I t was found that the solutions were near to those obtained previously, in the sense that the terminal constraint was met and the value of the returns were similar. However, the control histories were substantially different; in other words, final convergence was not obtained. (See Figure 3.7.)
+
+
+
-0 2
Nominal control
- 0.6 -0 8 u
3.6.4
I
-I
Nominal control
The Stationarity Condition for a Local Maximum
From a consideration of the first-order expansion of the return function equation (3.80) it is clear that the stationarity conditions for the unconstrained problem are (3.100)
and
3.
64
OPTIMAL CONTROL OF DISCRETE SYSTEMS
From the preceding analysis, the first condition may be rewritten Vz(k
+ l ) F , ( k ) + L,(k) = 0 ,
k
= 0,
1, ... , N - 1
(3.102)
T h e same conditions may also be derived via the classical techniques. T h e system equations are considered as constraints and adjoined to the performance index with a set of Lagrange multipliers p (a row vector). T h e augmented performance index J* is given by
I*
+ c [L(x(i),a, +))I N-1
=W
N ) ,a )
i-0
Introducing the notation
+ p ( i + 1)F(x(i),a, .(i)
H(i) = L(x(i),01, .(i)
(3.104)
Equation (3.103) becomes
/*
= +(x(N),01)
+
or alternatively
/*
N-1
i=O
[ H ( i )- p(i
+ 1) 4;+ 111
+ c [W) P ( i ) 493 + P(0)40) N-1
= +(x(N),a )
-
p ( N )x ( N )
-
i=O
Now consider the first variation of J*, 6l J*, which is given by i=o
(3.105)
3.6.
67
THE GRADIENT METHOD
So far the Lagrange multipliers p have been considered as arbitrary parameters. They are now chosen by p ( N ) = q5T(x(N),a ) ,
i
p ( i ) = HT(i),
=
0, 1, ... , N
-
1
(3.106)
so that the first variation becomes Sly*
=
p(0)Sx(0)
+ 1 {H,(i) Su(i) + H,(i) Sa} N-1 i=O
+ q 5 C M W Y 4 Sn
(3.107)
But at a local maximum the first variation, S J * , must be zero for arbitrary variations in &(i) and Sa(0). I t follows that H,(i)
i
= 0,
4&(W
1)...)N
= 0,
+c N-1
a)
-
(3.108)
1
(3.109)
=0
i=O
Equation (3.108) is the same condition as Equation (3.102) if p ( i + 1) is identified with V,(x(i l), u(i I), N - I)), and Equation (3.109) is identical to Equation (3.101). If side constraints of the form given by Equation (3.6) are present,
+
+
then the condition for stationarity requires that there exists a Lagrange multiplier v such that V,(i
+ l)F,(i) + L,(i) + vT(Wz(i)F,(i)+ M,(i)) V,(O)
3.6.5
+ V'W,(O)
=0
=0
(3.1 10)
The Discrete Maximum Principle
T h e function H ( i ) is often referred to as the Hamiltonian for discrete systems. From Equation (3.108) it may appear that the optimal control maximizes the discrete Hamiltonian H ( i ); i.e., for the control u*[O, N - I] to be optimal, H(x(i),u*(i),p ( i
+ 1))
H(x(i),u(i),p(i
+ l)),
i
= 0,
1, ... , N - 1
Thus, for a local optimum the stationarity condition is H,(i)= 0 and the convexity condition is Huu(i) 0. Although it is true that this stationarity condition must hold [cf Equations (3.102) and (3.109)], the convexity condition Huu(i) 0 does not, in general, ensure a local optimum. For some specific types of problems the discrete maximum principle has been shown to hold, e.g.,
<
<
68
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
Halkin (1966). However the correct convexity condition for the general case will be derived in 3.7.4. THE NEWTON-RAPHSON METHOD
3.7
3.7.1
The Successive Sweep Algorithm
T h e successive sweep algorithm is motivated from a consideration of a second-order expansion of the return function V(x(O),01, u[O, N - l]), about some nominal control sequence uj[O, N - 11, and some nominal control parameter a:$, viz,
+ +
11) = V(x(O),d,uj[O,N - I]) V,(O) 601 V,(O) SU[O, N - 11 46a:' VaE(0) Sa: SffT V,,(O) S U [ O , N - I] SuT[O, N - 11 V,,(O) 6u[O, N - I] (3.1 11) where 6a: = aj+l - a:j and 6u[O,N - 11 = uj+l[O,N - 11 - uj[O, N - 11 must be small enough to ensure the validity of the expansion. I n the successive sweep algorithm 6 n and 6u are chosen to maximize the righthand side, i.e., V(x(O),d + 1 , uj+*[O,N
-
+ + ++
One of the main drawbacks here is that such a choice would require the explicit inversion of a large matrix. Accordingly, as with the gradient method, a method is developed which chooses 6u(K) in a sequential manner. First, we note that the Newton-Raphson choice for Guj+l[K, N - 11 and hence for Su(k) must maximize the second-order expansion of V(x(K),d+l,uj+l[k,N - l]), i.e., Su[k, N
-
11 =
max
Su[b,N-l]
{V,(k) Sx(k)
+ V J k ) Su[k,N
+ V,(k) 601
+
11 3 Sx(k)T VZ,(k)Sx(k) + S X ( ~ )Vzu(k) ~ Su[k, N - 11 S X ( ~ )Vz,(k) ~ 601 SolT C'=*(k)Su[k, N - I] -1 4 SCX~V,,(K) 601
+
-k 8 S U [ k , N
-
I]'
-
+
V,,(k) Su[k, N
-
13)
(3.113)
Here the expansion includes terms in Sx(k) to allow for changes in the state x(K) due to changes in the previous controls 6u[O, K - I].
3.7.
69
THE NEWTON-RAPHSON METHOD
From this expansion it can be concluded that the Newton-Raphson choice of Su[k, N - 11, and hence of 6u(k), must be a linear function of Sx(k) and Sar, i.e., Su(k) = u,(k) Sx(k)
+ u,(k) 601 + Su,(k)
(3.114)
This feedback law will be referred to as the corrective feedback law. In order to obtain the feedback coefficients for the kth stage, it is necessary to assume that the corrective feedback law has somehow been computed for the stages k 1, ... , N - 1. The return obtained by using a control u(k) followed by the updated control uj+l(k 1) is given by R(x(k),44, a, k), (3.115) R(x(k),~ ( k )a,, k ) = L(x(k),~ ( k )01,, k ) Vj+'(x(k I), a, k 1)
+
+
+
+
+
+
where x ( k 1) = F(x(k),u(K), a, k). Now, expanding the return R to second order in terms of Su(k), Sx(k),and Sa about the nominal solution gives R(x(k),d+l,uj+l(k)>= Z,,(k)
+ Z,(k) &(k) + Z,(k) &(k) + Z,(k) 601
+ [SxT(k) SET SuT(k)I where Z,(k)
=
V+l(k
1
-G,(k) &,(4 Z,UW S x ( 4 Z,,(k) .T,,(R) -T&) ~ , d 4 -G,(k) -G,(4 W k )
[
+ 1 ) +L(k)
(3.116)
+ 1)F,(k + 1) +L,(k) Z,(k) V:+l(k + 1)FJk + 1) + L ( k ) Z,(k) v:+yk + 1)F,(k + 1) +La@) + V,(k + 1) Z,,(k) FZT(k)V::'(k + l)F,(k) + V P ( k + l)F,,(k) + L,,(k) z,,(k) FZT(k)V;p(k + 1) F,(k) + v;+l(k + l)F,,(k) + L,,(k) Z,,(k) F,T(k) V;;l(k + l)F,(k) + C+l(k + 1 ) F,,(k) + L,,(k) Z,(k)
=
v;+'(k
=
=
=
=
=
Z,,(k) Z,,(k)
Z,,(k)
+ l)F,(k) + F,T(k) v;:l(k + 1) + v;+v + 1) F,,(k) +L,,(k) = F,T(k) v:;'(k + l)F,(K) + F,T(k) v;:l(k + 1)
= F,T(k)
v;;I(k
+ v;+v + 1) F,,(k) + L,,(k)
= FmT(k)VL;'(k
+ l)F,(k) + FmT(k)V$;'(k + 1) + V::'(k + 1) F,(k) (3.1 17)
70
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
Thus, taking a second-order expansion of the left-hand side about d(k) and d(k) (uhas just been chosen) and equating coefficients of identical terms in Sx, 801, etc., on both sides gives the following difference equations for the partial derivatives
These partial derivatives are computed as backward sequences starting at the final time where
L',&v
V , ( N) = 4,(N)>
=+TAN),
V,,(N)
V,(N>= 4,") =+
z m 7
Vm(W = + a Z , , ( W
(3.121)
Finally, at the initial stage, the change in the control parameter 8~ is obtained by maximizing the expansion of the return V(x(O),01, ui+l(O,N - 1)) which gives
fh= - Vi-,1(0)[ V,T(O)
+ V,,(O) S X ( 0 ) I
(3.1 22)
3.7.
71
THE NEWTON-RAPHSON METHOD
Summary of the Newton-Raphson Method
3.7.2
1. Choose a nominal control sequence d [ O , N - 11 and control parameter d.Construct and store the trajectory x[O, N ] from the system equations (3.4) and the nominal control variables. Also compute the value of the performance index, Equation (3.5). 2. Compute the partial derivatives of V , V , , V,, , V,, , etc., from Equations (3.120) and (3.121). 3. Compute and store the feedback gains for the &[0, N ] in Equation (3.1 18). 4. At the initial time compute where d+l = aj SCYand where 6a: is given in Equation (3.122). 5. Generate a new trajectory using the linear feedback law, i.e.,
+
Uj+l(K)
= uj(K) - 2,-(1(K)[Z,,(K)(xj+l(k)
-
d(k)) + Z,,(k)(oli+'
-
ai)
+ Z,(k)]
and compute the cost. [If the cost is not reduced, a smaller step must be taken, e.g., replace ZJK) by EZ,(K) where 0 < E < I.] 6 . Return to step 2 and repeat until convergence is obtained. If constraints of the form equation (3.6) are present, the performance index may be augmented, i.e.,
J*
=
W N ) ,a)
+
VT4(X(N),
4+
c [L(x(i),+), 4 +
N-1
V
T
i=O
W
4 01,
W l
where v is the unknown Lagrange multiplier. Now, at the optimum for the constraints to be satisfied the return must be extremized with respect to V. Hence, these multipliers may simply be considered as control parameters. Thus, the control parameters a are simply increased to include the Lagrange multipliers. Note, however, that an extremal is sought, not a maximum.
3.7.3
Examples
The Brachistochrone Problem. First, to compare the convergence of the successive sweep method with the gradient method, consider the brachistochrone problem that was described in detail earlier. The discrete system equation was x(i I) = x(i) u(i)and the augmented performance index ]* is given by
+
J*
+
+ 0.5) + C h(i)[l + ~ ( i ) ~ ] l ' ~~~(/i [) ]l ' / ' N -1
= v(x(N)
-
i=O
72
3.
OPTIMAL CONTROL OF DISCRETE SYSTEMS
Again, 100 equal steps h ( i ) were chosen, i.e., h(i) = 0.01,
i
= 0, 1,
... , 9 9
T h e partial derivatives Lx(i),Lu(i),F J i ) , and Fu(i)are given by Equations (3.99) and the following additional partial derivatives are required, V,(N)
= v,
V,Z(N)
V””W)= 0
= 0,
+ u(i)”)’/~(1 L&) = *h(i)u(i)(l + Luu(i)= h(i)/(l - x)l/Z (1 + L,,(i)
=
VZ”(N) = 1
V”(N)= x ( N ) ,
$h(i)(1
-
.(i))-”’”
u(i)2))”2(1
- x(i))-”’”
u2)3/2
T h e nominal value for vj was taken to be 0. Thus, all the quantities needed to complete the algorithm are available. T h e results of computations are portrayed in Tables X and XI. T h e optimal trajectory was identical to that obtained previously with the gradient algorithm (Figure 3.6). From Table X it can be seen that the fourth and fifth iterates yield good (six-place accuracy) approximations to the optimal solution, which was obtained by an independent calculation with the same program. T h e following results, when compared with those obtained by gradient procedures in the previous section, demonstrate the advantages of the successive sweep. Far fewer iterations were required for convergence. I n comparison with gradient methods, the successive sweep method provides rapid convergence. Theoretically, the number of accurate decimal places TABLE X.
TABLE X. BRACHISTOCHRONE PROBLEM

Iteration   Cost         u(1)          x(101)        ν
Nominal     1.1369745     0.0000000    -0.4999999    0.0000000
1           1.0012708    -0.5704293    -0.4999999    0.3510944
2           0.9995637    -0.7754882    -0.5000000    0.2254505
3           0.9995598    -0.7804138    -0.5000000    0.2229594
4           0.9995599    -0.7804242    -0.4999999    0.2229543
5           0.9995599    -0.7804242    -0.5000000    0.2229543
TABLE XI. BRACHISTOCHRONE PROBLEM; BACKWARD SWEEP QUANTITIES

t       u        x        Z_uu     V_x      V_ν      V_xν     V_νν
0.00   -0.780    0.000    0.495    0.624    0.468    0.947    -1.62
0.10   -0.707   -0.075    0.529    0.568    0.399    0.968    -1.44
0.20   -0.642   -0.142    0.561    0.515    0.330    0.989    -1.26
0.30   -0.584   -0.204    0.590    0.468    0.271    1.00     -1.09
0.40   -0.531   -0.260    0.616    0.426    0.219    1.01     -0.92
0.50   -0.482   -0.311    0.640    0.387    0.174    1.02     -0.75
0.60   -0.437   -0.357    0.662    0.351    0.133    1.02     -0.59
0.70   -0.395   -0.399    0.681    0.317    0.096    1.02     -0.44
0.80   -0.355   -0.436    0.699    0.285    0.062    1.02     -0.29
0.90   -0.317   -0.470    0.715    0.255    0.021    1.01     -0.14
1.00   -0.281   -0.500    0.728    0.233    0.000    1.00      0.00
In Table X the number of accurate digits can be seen to grow in this manner from iteration to iteration; in general, the theoretical speed of convergence appears to be sustained.
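For concreteness, a minimal sketch of step 1 for this example, simulating the discrete dynamics and evaluating the augmented index J* for a given control sequence, might read as follows. The dynamics x(i + 1) = x(i) + h u(i) and the integrand are taken from the reconstruction above and should be checked against Equations (3.99); the multiplier value is borrowed from Table X purely for illustration.

import numpy as np

N, h = 100, 0.01            # 100 equal steps, h(i) = 0.01
nu = 0.2229543              # converged multiplier from Table X (illustrative only)

def brachistochrone_cost(u, nu):
    """Simulate x(i+1) = x(i) + h*u(i) and evaluate the augmented index J*."""
    x, J = 0.0, 0.0
    for i in range(N):
        J += h * np.sqrt(1.0 + u[i] ** 2) / np.sqrt(1.0 - x)
        x = x + h * u[i]
    return J + nu * (x + 0.5)          # terminal constraint term nu*(x(N) + 0.5)

# A crude nominal: constant slope u(i) = -0.5, which meets x(N) = -0.5 exactly.
u_nominal = -0.5 * np.ones(N)
print(brachistochrone_cost(u_nominal, nu))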
An Orbit Transfer Problem. Let us now consider a problem more closely related to modern applications, the orbit transfer problem. For the purposes of this problem the rocket is constrained to move in a plane. The rocket position may be characterized by polar coordinates (r, φ), where r is the distance from the center of the gravitational field and φ is the polar angle (Figure 3.8). The velocity of the rocket may be expressed in terms of its components in the radial direction, u, and in the tangential direction, v. The rocket is assumed to have a constant low thrust T. The only control available to the engineer is the direction of the thrust, measured by the angle θ that the thrust makes with the tangent to the contours of constant radius. Figure 3.8 pictorially represents these variables. The dynamics of the rocket are governed by the following equations:
dr/dt = u

du/dt = v²/r − μ/r² + T sin θ / [m₀ + ṁ(t − t₀)]

dv/dt = −uv/r + T cos θ / [m₀ + ṁ(t − t₀)]
The parameters appearing in the foregoing equations are μ = MG, where M is the mass of the central body and G is the gravitational constant; T is the thrust; m₀ is the initial mass; t₀ is the initial time; and −ṁ is the mass flow per unit time.
FIG. 3.8. The orbit transfer problem.
The problem considered here is the transfer of a spacecraft, in a fixed time, from a specified initial circular orbit to another circular orbit while maximizing the final radius. To simplify the scaling, normalized coordinates are introduced.
In the new coordinates the discrete form of the system equations is

x₁(i + 1) = x₁(i) + h(i) x₂(i)
x₂(i + 1) = x₂(i) + h(i)[x₃²(i)/x₁(i) − 1/x₁²(i) + A(i) sin u(i)]
x₃(i + 1) = x₃(i) + h(i)[−x₂(i) x₃(i)/x₁(i) + A(i) cos u(i)]

where A(i) = a/(1 − b t_i). To agree with Moyer and Pinkham (1964; see the bibliography at the end of Chapter 8), a and b are chosen as 1.405 and 0.07487, respectively. The initial boundary conditions become x₁(0) = 1, x₂(0) = 0, and x₃(0) = 1. The terminal boundary conditions are

ψ₁ = x₂(N) = 0    and    ψ₂ = x₃(N) − [x₁(N)]^{−1/2} = 0.
The final time t_f = 3.32 is chosen to agree with the earth-Mars minimal time according to the results obtained by Moyer and Pinkham (1964).
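A minimal sketch of these difference equations is given below, assuming equal steps h = t_f/N, t_i = i h, and the normalization μ = 1 used in the equations above; the constant-angle nominal u(i) = 0 is purely illustrative and not the optimal thrust program.

import numpy as np

N, tf = 100, 3.32
h = tf / N
a, b = 1.405, 0.07487

def step(x1, x2, x3, u, t):
    """One step of the normalized discrete orbit-transfer equations."""
    A = a / (1.0 - b * t)                          # thrust acceleration A(i)
    x1n = x1 + h * x2
    x2n = x2 + h * (x3 ** 2 / x1 - 1.0 / x1 ** 2 + A * np.sin(u))
    x3n = x3 + h * (-x2 * x3 / x1 + A * np.cos(u))
    return x1n, x2n, x3n

x1, x2, x3 = 1.0, 0.0, 1.0                         # circular-orbit initial conditions
for i in range(N):
    x1, x2, x3 = step(x1, x2, x3, u=0.0, t=i * h)

# Terminal constraint residuals psi1 = x2(N) and psi2 = x3(N) - x1(N)**(-1/2)
print(x1, x2, x3 - x1 ** -0.5)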
The modified performance index becomes

J* = x₁(N) + ν₁ x₂(N) + ν₂[x₃(N) − (x₁(N))^{−1/2}]
FIG. 3.9. Optimal transfer trajectory.

FIG. 3.10. Optimal control history.
The problem becomes that of maximizing J* with respect to u(i), i = 0, 1, ..., N − 1, and extremizing J* with respect to ν₁ and ν₂. Numerical results were obtained using the discrete successive sweep algorithm with one hundred equal time intervals. Several different nominals were used, all of which yielded convergence to the optimal solution; the number of iterations required to obtain three-place accuracy ranged from 13 to 18, depending upon the initial nominal. About 2 sec of IBM 7090 computer time was required per iteration. The converged solution agrees, to roughly three-place accuracy, with the results of others. The trajectory is graphed in Figure 3.9, with the direction of the thrust indicated at eleven equally spaced times; the thrust program itself is graphed in Figure 3.10. Note that, as is typical of low-thrust trajectories, the thrust has a positive radial component during the first half of the journey and then rapidly reverses to give a negative radial component over the last half of the trip. The optimal control and state histories are tabulated in Table XII.
TABLE XII. u, x(1), x(2), x(3) ON THE OPTIMAL TRAJECTORY

t        u       x(1)    x(2)     x(3)
0        0.439   1.00    0.0      1.00
0.166    0.500   1.00    0.008    1.02
0.332    0.590   1.00    0.030    1.03
0.498    0.700   1.01    0.058    1.04
0.664    0.804   1.02    0.093    1.05
0.830    0.933   1.04    0.133    1.05
0.996    1.08    1.06    0.176    1.04
1.162    1.24    1.10    0.224    1.02
1.328    1.42    1.14    0.261    0.989
1.494    1.66    1.18    0.298    0.950
1.660    2.39    1.23    0.330    0.905
1.826    4.36    1.29    0.320    0.847
1.992    4.73    1.34    0.285    0.810
2.158    4.88    1.36    0.269    0.800
2.324    5.00    1.42    0.206    0.711
2.490    5.10    1.45    0.167    0.766
2.656    5.18    1.48    0.124    0.762
2.822    5.25    1.50    0.093    0.766
2.988    5.32    1.51    0.061    0.775
3.154    5.38    1.52    0.036    0.788
3.32     5.45    1.52    0.0      0.806
The optimal feedback gains along the optimal trajectory, that is, the partial derivatives of the control with respect to the state and the constraint levels, are tabulated in Table XIII in the next section. These derivatives are large on the first half of the trajectory, indicating that the optimal control function changes rapidly in this region.
3.7.4 The Convexity Condition
From a consideration of the second-order expansion of the return function, Equation (3.111), it is clear that the convexity condition is

[ ∂²V/∂α ∂α                ∂²V/∂α ∂u[0, N − 1]            ]
[ ∂²V/∂u[0, N − 1] ∂α      ∂²V/∂u[0, N − 1] ∂u[0, N − 1]  ]  ≤ 0        (3.123)

where V = V(x(0), α, u[0, N − 1]).
However, to test this condition would require an eigenvalue analysis of an (mN + p) × (mN + p) matrix. A somewhat simpler condition is suggested by the development of the successive sweep algorithm. It is clear from this development that, for a maximum,

Z_uu(k) ≤ 0,    k = 0, 1, ..., N − 1,    and    V_αα(0) ≤ 0        (3.124)
must hold and is equivalent to the condition just given. Here only an eigenvalue analysis of N m × m matrices plus one p × p matrix is required. Classically the problem is viewed as a finite-variable optimization problem with constraints. The convexity condition becomes

δ²J* ≤ 0        (3.125)
for all δx(i), δu(i), and δα, where δ²J* is the second-order term in an expansion of the cost J*, Equation (3.105), i.e.,

δ²J* = δx^T(N) Φ_xx δx(N) + 2 δx^T(N) Φ_xα δα + δα^T Φ_αα δα
       + Σ_{i=0}^{N−1} [δx^T(i) H_xx(i) δx(i) + 2 δx^T(i) H_xu(i) δu(i) + 2 δx^T(i) H_xα(i) δα
       + δu^T(i) H_uu(i) δu(i) + 2 δu^T(i) H_uα(i) δα + δα^T H_αα(i) δα]        (3.126)

where Φ = φ + ν^T ψ and the partial derivatives are evaluated along the nominal,
and where the discrete Hamiltonian H was defined in Equation (3.104). The variations δx(i), δu(i), and δα are governed by the equations

δx(i + 1) = F_x(i) δx(i) + F_u(i) δu(i) + F_α(i) δα,        δx(0) = 0        (3.127)
This formulation may be used to obtain the same convexity condition as in Equation (3.124). The approach is to seek a control parameter δα and a control sequence δu[0, N − 1] that maximize δ²J*, Equation (3.126), subject to the system equations (3.127). This may be recognized as a linear quadratic problem to which the linear quadratic algorithm described earlier may be applied. The convexity condition for this problem [cf. Equation (3.124)] is exactly equivalent to the convexity condition of Equation (3.125). The strengthened convexity condition is that Z_uu(k) < 0, k = 0, 1, ..., N − 1, and V_αα(0) < 0, which implies that the second variation δ²J* is negative for every nontrivial variation. This follows from the fact that the optimal solution to this problem [defined by Equations (3.126) and (3.127)] has the form

δα^opt = −[V_αα^opt(0)]^{−1} V_αx^opt(0) δx(0)
δu^opt(i) = −Z_uu^{−1}(i)[Z_ux(i) δx(i) + Z_uα(i) δα]

Now, since δx(0) = 0, it may easily be shown that δα^opt = 0, δu^opt(i) = 0 (i = 0, ..., N − 1), and δx(i) = 0 (i = 0, ..., N). But with these values for δα, δu(i), and δx(i), δ²J* is zero; i.e., the upper bound on the second variation is attained only by the trivial perturbation δα = 0, δu(i) = 0 (i = 0, ..., N − 1) and is equal to zero. Thus the second variation must be negative definite for all nontrivial perturbations. Note that if Z_uu(k) is singular, the sequential algorithm apparently fails because Z_uu^{−1}(k) does not exist. However, the algorithm may still be employed by using the generalized inverse of Z_uu(k), although in this case Z_uu is, at best, negative semidefinite.
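In practice, condition (3.124) can be checked directly from the quantities already generated by the sweep. A minimal sketch of such a check is given below; the arrays Zuu and Vaa0, standing for Z_uu(k) and V_αα(0), are hypothetical data introduced only for illustration.

import numpy as np

def convexity_satisfied(Zuu, Vaa0, tol=0.0):
    """Check Z_uu(k) < 0 for every stage and V_aa(0) < 0 via eigenvalues.

    Zuu  : sequence of N symmetric (m x m) matrices
    Vaa0 : symmetric (p x p) matrix
    """
    for Z in Zuu:
        if np.max(np.linalg.eigvalsh(Z)) >= -tol:
            return False
    return np.max(np.linalg.eigvalsh(Vaa0)) < -tol

# Hypothetical data: N = 3 stages, m = 2 controls, p = 1 parameter.
Zuu  = [-np.eye(2) * (k + 1) for k in range(3)]
Vaa0 = np.array([[-0.5]])
print(convexity_satisfied(Zuu, Vaa0))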
3.7.5 The Discrete Maximum Principle: A Convexity Condition

The convexity condition obtained from the discrete maximum principle is ∂²H(k)/∂u²(k) < 0, k = 0, 1, ..., N − 1, or

p(k + 1) F_uu(k) + L_uu(k) < 0

Comparing this condition with the convexity condition described in the previous section, namely, Z_uu(k) < 0, or

F_u^T(k) V_xx(k + 1) F_u(k) + V_x(k + 1) F_uu(k) + L_uu(k) < 0

the reader can see that there is a difference between the two conditions: the term F_u^T(k) V_xx(k + 1) F_u(k). Thus it is apparent that the discrete maximum principle will not necessarily be valid.
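A hypothetical one-stage scalar example makes the missing term explicit. With F(x, u) = x + u, L = −0.5 u², and φ(x) = x² (a maximization problem), the maximum-principle curvature H_uu = L_uu = −1 is negative at any nominal, yet Z_uu = F_u V_xx F_u + L_uu = 2 − 1 = +1 is positive, so the true convexity condition fails. The short check below simply evaluates both quantities; the problem data are assumptions chosen only to expose the discrepancy.

# One-stage scalar example (hypothetical):
#   x(1) = F(x(0), u(0)) = x(0) + u(0),  L = -0.5*u**2,  phi(x) = x**2
x0, u0 = 1.0, 0.0
x1 = x0 + u0

Vx1, Vxx1 = 2.0 * x1, 2.0              # terminal derivatives: phi_x = 2x, phi_xx = 2
Fu, Fuu, Luu = 1.0, 0.0, -1.0          # problem derivatives at the nominal

Huu = Luu + Vx1 * Fuu                  # curvature tested by the discrete maximum principle
Zuu = Fu * Vxx1 * Fu + Vx1 * Fuu + Luu # curvature used by the sweep, Eq. (3.134)

print(Huu, Zuu)                        # -1.0 (suggests a maximum) versus +1.0 (it is not)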
3.8 NEIGHBORING EXTREMAL METHODS
In many practical situations an approximation to the closed-loop solution is required, even though this approximation may be valid only in the neighborhood of the optimal solution. For example, in the orbit transfer problem it may be more advantageous to correct deviations from the optimal trajectory with a feedback law than to recompute a new nominal trajectory at every stage using some numerical algorithm. One approach to this problem, which involves forming a linear approximation to the control law, is known as the neighboring extremal technique. It is assumed that an optimal control sequence u^opt[0, N − 1] has been obtained by some means, e.g., with some numerical algorithm. If the solution is truly optimal, it may be further assumed that there exists an optimal control law u^opt(x(k)) such that

u^opt(k) = u^opt(x^opt(k)),        k = 0, 1, ..., N − 1        (3.128)

where x^opt(k) is the optimal state at the kth stage. The approximation that is desired is

u^opt(x^opt(k) + δx) ≅ u^opt(k) + u_x(k) δx        (3.129)
where δx is small enough to ensure the validity of the expansion. The feedback law for small perturbations about the optimal solution u^opt, x^opt will then be given by δu(δx(k)) = u_x δx(k). The partial derivative u_x (an m × n matrix) is obtained from the dynamic programming equations. The return R was defined as R = L(x(k), u(k)) + V^opt(x(k + 1)), where x(k + 1) is given by x(k + 1) = F(x(k), u(k)). At the optimum, from Equation (3.25), R_u is zero, i.e.,

∂R(k)/∂u = V_x^opt(x(k + 1)) F_u(k) + L_u(k) = 0
Differentiation of this expression with respect to x gives

Z_uu(k) u_x(k) + Z_ux(k) = 0        (3.130)

where

Z_ux(k) = F_u^T(k) V_xx^opt(k + 1) F_x(k) + V_x^opt(k + 1) F_ux(k) + L_ux(k)

and where all the partial derivatives are evaluated at x^opt(k) and u^opt(k). Assuming Z_uu is not singular,

u_x^opt(k) = −Z_uu^{−1}(k) Z_ux(k)        (3.131)
In order to compute u_x^opt, the partial derivatives V_x^opt and V_xx^opt must be computed. They are obtained by partially differentiating the dynamic programming equations (3.24) and (3.26) with respect to x, i.e.,

V_x^opt(k) = F_x^T(k) V_x^opt(k + 1) + L_x(k)        (3.132)
and

V_xx^opt(k) = Z_xx(k) − Z_xu(k) Z_uu^{−1}(k) Z_ux(k)        (3.133)
where

Z_uu(k) = F_u^T(k) V_xx^opt(k + 1) F_u(k) + V_x^opt(k + 1) F_uu(k) + L_uu(k)        (3.134)

and Z_xx(k) and Z_xu(k) are defined analogously.
Thus Equations (3.130)-(3.134) allow the formation of the optimal feedback law, Equation (3.129). Now, returning to the equations derived in the previous section for the successive sweep method, we note a striking similarity. In fact, as the nominal solution approaches the optimal solution, the feedback gains used in the successive sweep algorithm become identical to those obtained here for the neighboring extremal algorithm. This similarity leads to the following observation: the linear approximation to the optimal control law optimizes the second variation of the performance index. If terminal constraints are present, the optimal feedback becomes

u^opt(x(k), ν) ≅ u^opt(k) + u_x^opt(k) δx(k) + u_ν^opt(k) δν        (3.135)
The approach taken here is to eliminate δν in terms of δx. Note that

V_ν = K        (3.136)

where K is the value of the constraint, i.e.,

ψ = K        (3.137)

Considering variations in Equation (3.136) gives V_νx δx(k) + V_νν δν = δK, and assuming that the system is not abnormal, i.e., V_νν ≠ 0, δν is given by δν = V_νν^{−1}[δK − V_νx δx(k)]. Substituting this expression for δν back into the expansion of u^opt, we obtain

u^opt(x(k), ν) = u^opt(k) + [u_x^opt(k) − u_ν^opt(k) V_νν^{−1}(k) V_νx(k)] δx(k) + u_ν^opt(k) V_νν^{−1} δK
where the equations for u_ν^opt(k), V_νν(k), etc., may be obtained from the dynamic programming equations. Thus the neighboring optimal control law is linear in both the state and the level of the terminal constraint. In Table XIII the feedback gains are tabulated for the low-thrust example considered in the last section.
TABLE XIII. FEEDBACK GAINS ON THE OPTIMAL TRAJECTORY

t        u_x(1)    u_x(2)       u_x(3)      u_ψ1       u_ψ2
0        1.62       0.937        1.33       -0.537      1.02
0.166    1.65       0.920        1.40       -0.604      1.06
0.332    1.70       0.935        1.52       -0.699      1.16
0.498    1.76       0.986        1.67       -0.828      1.28
0.664    1.83       1.08         1.89       -1.00       1.43
0.830    1.93       1.24         2.18       -1.29       1.61
0.996    2.06       1.49         2.62       -1.61       1.04
1.162    2.30       2.00         3.29       -2.21       2.16
1.328    2.84       2.92         4.89       -3.39       2.83
1.494    4.80       6.75         9.60       -1.39       5.88
1.660    21.88      52.9         44.2       -47.03      54.52
1.826    10.0       90.8         7.30       -9.40       75.00
1.992    -14.0     -76.9        -26.4        64.26     -80.30
2.158    -11.0     -83.4        -20.0        70.00     -86.70
2.324    -11.0     -83.4        -3.64        88.0      -103.1
2.490    -5.50     -104.0        15.0        111.0     -123.0
2.656    8.61      -183.0        44.6        162.7     -172.3
2.822    25.0      -280.0        112.0       250.0     -230.0
2.988    75.2      -548.0        292.0       510.6     -495.4
3.154    315.0     -1.42 × 10⁴   1.2 × 10³   1.8 × 10³  -1.8 × 10³
3.320    +∞        -∞            +∞          +∞         -∞
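As a closing illustration of the unconstrained recursion (3.130)-(3.134), the sketch below computes the gains u_x(k) for a hypothetical scalar linear system with quadratic stage return; the numbers are assumptions for illustration and are not the orbit-transfer data of Table XIII.

import numpy as np

# Hypothetical scalar problem: x(k+1) = Fx*x + Fu*u, stage return L = -0.5*(q*x**2 + r*u**2),
# terminal return phi = -0.5*s*x**2 (a maximization problem, so the returns are concave).
Fx, Fu = 1.0, 0.1
q, r, s = 1.0, 0.1, 1.0
N = 20

Vx, Vxx = 0.0, -s                      # terminal values about an assumed nominal x_opt(N) = 0
gains = np.zeros(N)
for k in reversed(range(N)):
    Zuu = Fu * Vxx * Fu - r            # Eq. (3.134) with F_uu = 0 and L_uu = -r
    Zux = Fu * Vxx * Fx                # analogous term with F_ux = 0 and L_ux = 0
    gains[k] = -Zux / Zuu              # u_x(k) = -Zuu^{-1} Zux, Eq. (3.131)
    Zxx = Fx * Vxx * Fx - q
    Vxx = Zxx - Zux ** 2 / Zuu         # Eq. (3.133)
    Vx = Fx * Vx                       # Eq. (3.132); L_x = -q*x_opt = 0 along the nominal

print(gains[:5])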
PROBLEMS
1. The linear quadratic algorithm derived in Section 3.4.1 may be extended to include performance indices with linear terms. Show that if the system equations are of the form x(i + 1) = F_x x(i) + F_u u(i) and the performance index contains linear as well as quadratic terms in the state and control, the optimal control law and the optimal return function have the following form

u^opt(x(i)) = −D(i) x(i) − E(i)
V^opt(x(i)) = x^T(i) P(i) x(i) + Q(i) x(i) + K(i)

Derive sequential relations for D(i) and E(i).
2. Similarly, repeat the analysis of Section 3.4.5 using the generalized performance index.

3. Minimize the function J, where J = Σ_i [x(i)² + u(i)²], with x(i + 1) = x(i) + 0.1 u(i) and x(0) = 10. (Here x and u are scalar variables.)
4. Repeat Problem 3 with the side constraint x(3) = 0.
5. Extend the analysis of Section 3.7.1 to include problems with terminal constraints. Show how the successive sweep correction to the nominal control must be modified, where ν is the multiplier used to adjoin the terminal constraints to the performance index.
6. Using the analysis from Problem 5, show that the return is minimized by the extremizing ν (for a maximization problem).
7. Find a simple example for which the discrete maximum principle does not hold.
8. Program the gradient and successive sweep algorithms and solve the brachistochrone problem.
BIBLIOGRAPHY AND COMMENTS

Section 3.3. The theory of dynamic programming was first announced formally in a paper by Bellman (1952):

Bellman, R. (1952). "The Theory of Dynamic Programming," Proc. Natl. Acad. Sci. U.S.A., Vol. 38, pp. 716-719.

Subsequently several books have been written; for example, the reader should consult:

Bellman, R. (1957). Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey.
Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton Univ. Press, Princeton, New Jersey.
Bellman, R. and Dreyfus, S. (1962). Applied Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey.

Section 3.4. The observation that the linear quadratic problem may be solved analytically has been exploited by a number of authors, for example,

Bellman, R. (1958). "Some New Techniques in the Dynamic Programming Solution of Variational Problems," Quart. Appl. Math., Vol. 16, pp. 295-305.
Kalman, R. E. and Koepke, R. W. (1958). "Optimal Synthesis of Linear Sampling Control Systems Using Generalized Performance Indices," Trans. A.S.M.E., Vol. 80, pp. 1820-1838.
Noton, A. R. M. (1965). Introduction to Variational Methods in Control Engineering. Pergamon, Oxford.
Tou, J. T. (1963). Optimum Design of Digital Control Systems. Academic Press, New York.
Wescott, J. H., ed. (1962). An Exposition of Adaptive Control. Pergamon, Oxford.

Section 3.5. Since the introduction of dynamic programming in 1952, a vast number of papers have appeared in the literature describing various computational techniques and applications. Apart from the books by Bellman (1957, 1961, and 1962), the reader may find the following references useful as a base for further study.

Aris, R. (1961). The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. Academic Press, New York.
Aris, R. (1963). Discrete Dynamic Programming. Random House (Blaisdell), New York.
Bryson, A. E. and Ho, Y. C. (1969). Optimization, Estimation and Control. Random House (Blaisdell), New York.
Larson, R. E. (1967a). State Increment Dynamic Programming. American Elsevier, New York.
Larson, R. E. (1967b). "A Survey of Dynamic Programming Computational Procedures," Trans. IEEE Autom. Control, Vol. AC-12, No. 6, pp. 767-774.
Roberts, S. M. (1964). Dynamic Programming in Chemical Engineering and Process Control. Academic Press, New York.

Section 3.6. Discrete gradient algorithms are discussed by

Bryson, A. E. and Ho, Y. C. (1969). Optimization, Estimation and Control. Random House (Blaisdell), New York.
Mayne, D. (1966). "A Second-Order Gradient Method for Determining Optimal Trajectories of Non-Linear Discrete-Time Systems," Intern. J. Control, Vol. 3, pp. 85-95.
Wilde, D. J. and Beightler, C. S. (1967). Foundations of Optimization. Prentice-Hall, Englewood Cliffs, New Jersey.

A dynamic programming derivation is also given by Bellman and Dreyfus (1962).

Section 3.6.5. Early versions of the discrete maximum principle were given by

Chang, S. S. L. (1960). "Digitized Maximum Principle," Proc. I.R.E., Vol. 48, pp. 2030-2031.
Katz, S. (1962). "A Discrete Version of Pontryagin's Maximum Principle," J. Elect. Control (GB), Vol. 13, December.

and by:

Fan, L. T. and Wang, C. S. (1964). The Discrete Maximum Principle. Wiley, New York.

However, the assumptions made in the derivation of the principle were not made clear and some confusion resulted. As discussed in Section 3.6.5, the principle, in general, gives only a stationarity condition; care must be taken in distinguishing between a maximum, a minimum, and a point of inflexion. These points were clarified in papers by the following:

Butkovskii, A. G. (1964). "The Necessary and Sufficient Conditions for Optimality of Discrete Control Systems," Automation and Remote Control, Vol. 24, pp. 963-970.
Halkin, H. (1964). "Optimal Control Systems Described by Difference Equations," Advances in Control Systems: Theory and Applications (C. T. Leondes, ed.). Academic Press, New York.
Halkin, H., Jordan, B. W., Polak, E., and Rosen, B. (1966). Proc. 3rd I.F.A.C. Cong. Butterworth, London and Washington, D.C.

[See also the discussion in the book by Wilde and Beightler (1967) referenced before.]

Section 3.7. The successive sweep algorithm developed here is similar to ones obtained by

Bryson, A. E. and Ho, Y. C. (1969). Optimization, Estimation and Control. Random House (Blaisdell), New York.
McReynolds, S. R. (1967). "The Successive Sweep and Dynamic Programming," J. Math. Anal. Appl., Vol. 19, No. 2.
Mayne, D. (1966). "A Second-Order Gradient Method for Determining Optimal Trajectories of Non-Linear Discrete Time Systems," Intern. J. Control, Vol. 3, pp. 85-95.