A Second-Order Variational Method for Discrete-Time Optimal Control Problems*

by A. J. KOIVO

Department of Electrical Engineering, Purdue University, Lafayette, Indiana

ABSTRACT:
The solution to an optimum control problem is often determined by fulfilling necessary conditions for the optimality. It usually amounts to solving a two-point-boundary-value problem. Most computational schemes used for the solution are based on the first-order variation of the performance index (PI). In order to obtain a more accurate improvement, and hence a quicker convergence to the optimal solution, the second-order variation of the PI can be introduced in the algorithm. This leads to computing the solution of a set of linear difference equations. The equations result from the perturbation of the system equations, which may be nonlinear. In this paper recurrence relations for determining the solution to the perturbation equations are developed. An algorithm for improving the control sequence from one iteration to the next is constructed. The optimal solution is obtained fast, although the number of mathematical operations is large. The advantages and drawbacks of this second-order variational scheme are discussed. An example demonstrates the applicability of the algorithm.
Introduction
For a general discrete-time optimum control problem, necessary conditions for the optimal solution can be obtained from the discrete-time maximum principle (5). These equations constitute a two-point-boundary-value problem (TPBVP). The solution, in general, is determined by means of a digital computer. Several computational techniques have been proposed for the solution of such a TPBVP, for example, various gradient methods (4, 6). The general idea in these methods is to fulfill two of the three necessary conditions, and to iterate on the third one so as to satisfy it more closely. An improvement in each iteration is usually based on the first-order variation of the performance index (PI). The principal drawback of the first-order variational methods is that the convergence to the optimal solution is slow, requiring a large number of iterations. The rate of the convergence can be increased by determining improvements more accurately from one iteration to the next. It is accomplished by considering second-order variational terms in perturbing the PI. Second-order variational algorithms have been proposed for the solution of continuous-time optimum control problems in (1, 2). A discrete-time method analogous to (2) has been presented in (3). This paper presents an alternative second-order variational technique for discrete-time nonlinear optimum control

* This work was supported in part by NSF Grant No. GK-1970 and in part by NASA under Contract No. 15-00-021.
problems. The algorithm makes use of an approach often used in conjunction with discrete-time optimum control problems when the plant is linear and the PI is quadratic. This approach is different from that presented in (3). The procedure is analogous to the continuous-time method of (1). However, the transformation used in solving the linear perturbation equations is different in the discrete-time case from that of (1). The paper briefly poses the discrete-time optimum control problem and presents necessary conditions for the solution. Equations for the algorithm are developed by means of the second-order variation of the PI. The linear perturbation equations are solved by a transformation method. It results in a set of recursive relations, which can readily be solved on a computer. Having the solution to the perturbation equations, the control sequence can be improved so as to satisfy necessary conditions for the optimality more closely. The applicability of the method is demonstrated by an example.

Discrete-Time Optimum Control Problem
A general discrete-time optimum control problem is to determine an admissible control sequence {u(k)} so as to minimize a performance criterion,

J[u] = φ[x(N)] + Σ_{i=0}^{N−1} L[x(i), u(i), i]  (1)

subject to the difference equation constraint

x(k+1) = f[x(k), u(k), k],  x(0) specified  (2)

x(k), an n-vector, is the state of the system at time k; k = 0, 1, ..., N − 1; u(k), an m-vector, is an admissible control at time t = k; f(·) is a vector-valued function, which is continuous and differentiable in x and u, and bounded for all k; φ(·) and L(·) are continuous and differentiable in x and u; N signifies the duration of the process (the number of the sampling periods). Let the target set be specified by

ψ[x(N)] = 0  (3)
where #[*I is a smooth vector-valued function consisting of q components (q 5 n) ; and 0 is a null-vector. To obtain necessary conditions for the optimal solution, the augmented PI is considered N-l
JCU] = P'~Cx(~>l
+ 4CxW>l
+
c (-xx(i),
U(i)>iI
i-0
+ p’(i + l)[f(x(i>,
u(i),
i) -
di
+ 1111.
(4)
The prime signifies the transposition; p(i + 1)) an n-vector, is the costate of the system at time i + 1; @ is a vector consisting of q Lagrange multipliers.
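Numerically, the criterion (1) under the constraint (2) amounts to a forward rollout: accumulate L along the trajectory generated by f and add the terminal cost. A minimal sketch; the helper name and the scalar linear plant used to exercise it are illustrative assumptions, not the paper's example:

```python
import numpy as np

def rollout_cost(f, L, phi, x0, u_seq):
    """Evaluate J[u] = phi[x(N)] + sum_{i=0}^{N-1} L[x(i), u(i), i]
    subject to x(k+1) = f[x(k), u(k), k], x(0) given (Eqs. 1-2)."""
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for i, u in enumerate(u_seq):
        J += L(x, u, i)        # running cost at stage i
        x = f(x, u, i)         # difference-equation constraint (2)
    return J + phi(x)          # terminal cost on x(N)

# Illustrative data: scalar linear plant, quadratic criterion.
f = lambda x, u, k: 0.9 * x + 0.1 * u
L = lambda x, u, k: 0.5 * (x**2 + u**2)
phi = lambda x: 0.5 * x**2

J = rollout_cost(f, L, phi, x0=1.0, u_seq=[0.0] * 10)
```

Any candidate control sequence can be scored this way; the iterative schemes below differ only in how they propose the next sequence.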
Journal of The Franklin Institute
The first-order variation of J[u] yields necessary conditions for a stationary point of J[u] relative to u:

x(k+1) = f[x(k), u(k), k],  k = 0, 1, ..., N − 1,  x(0) specified  (2)

p(k) = [∂f′/∂x(k)] p(k+1) + ∂L/∂x(k),  k = N − 1, ..., 1  (5)

∂(β′ψ)/∂x(N) + ∂φ/∂x(N) − p(N) = 0,  ψ[x(N)] = 0  (6)

δu′(k){∂L/∂u(k) + [∂f′/∂u(k)] p(k+1)} ≥ 0,  k = 0, 1, ..., N − 1  (7)

where

p(k) = col [p₁(k), ..., pₙ(k)]  (8)

∂L/∂x(k) = col [∂L/∂x₁(k), ..., ∂L/∂xₙ(k)]  (9)

∂φ/∂x(N) = col [∂φ/∂x₁(N), ..., ∂φ/∂xₙ(N)]  (10)

∂(β′ψ)/∂x(N) = col [∂(β′ψ)/∂x₁(N), ..., ∂(β′ψ)/∂xₙ(N)]  (11)

∂f/∂x(k) = [∂fᵢ/∂xⱼ(k)], an (n × n)-matrix  (12)

∂L/∂u(k) = col [∂L/∂u₁(k), ..., ∂L/∂uₘ(k)]  (13)

∂f/∂u(k) = [∂fᵢ/∂uⱼ(k)], an (n × m)-matrix  (14)

and δx(N) and δu(k) represent admissible variations in x(N) and u(k), respectively. Equations 2, 5-7 provide necessary conditions for the optimality. They establish a well-posed TPBVP. Indeed, (i) Eqs. 2 and 5 constitute the difference equation constraint; (ii) Eq. 6 specifies the boundary conditions; and (iii) Eq. 7 is a necessary condition for a stationary point of J[u] in the control space. The optimal solution can be determined on a computer by fulfilling any two of the three conditions (i), (ii), (iii), and by iterating on the third one in attempting to satisfy it to a desired degree of accuracy. In the sequel, Eqs. 2, 5, and 6 are assumed to be satisfied. The iteration is performed in the control space so as to satisfy the third stationarity condition, (7). The procedure is based on the second-order variation of J[u].

Vol. 286, No. 4, October 1968

Variational Equations for the Algorithm
The discrete-time Hamiltonian of the system is

H_k[x(k), p(k+1), u(k), k] = L[x(k), u(k), k] + p′(k+1) f[x(k), u(k), k].  (15)

Then Eqs. 5 and 7 can be written as

p(k) = ∂H_k/∂x(k)  (5′)

δu′(k)[∂H_k/∂u(k)] ≥ 0  (7′)

with obvious definitions. Suppose now that trajectories {x(k)} and {p(k)} are determined from Eqs. 2, 5, and 6 corresponding to an admissible initial control sequence {u(k)}. The task is to change the control so as to fulfill Eq. 7′. In an unconstrained case, Eq. 7′ implies

∂H_k/∂u(k) = 0.  (7″)
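The stationarity test (7″) can be checked numerically from the Hamiltonian (15) alone. A sketch using central differences; the placeholder plant, for which H_k = ½u² + p′u, is an assumption for illustration only:

```python
import numpy as np

def hamiltonian(L, f, x, p_next, u, k):
    """H_k = L[x(k), u(k), k] + p'(k+1) f[x(k), u(k), k]  (Eq. 15)."""
    return L(x, u, k) + float(np.dot(p_next, f(x, u, k)))

def dH_du(L, f, x, p_next, u, k, eps=1e-6):
    """Central-difference approximation of dH_k/du(k), the gradient
    appearing in the stationarity condition (7'')."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    g = np.zeros_like(u)
    for j in range(u.size):
        du = np.zeros_like(u)
        du[j] = eps
        g[j] = (hamiltonian(L, f, x, p_next, u + du, k)
                - hamiltonian(L, f, x, p_next, u - du, k)) / (2 * eps)
    return g

# Placeholder data: H_k = 0.5 u^2 + p*u is stationary at u = -p.
L = lambda x, u, k: 0.5 * float(u[0]) ** 2
f = lambda x, u, k: np.array([u[0]])
g = dH_du(L, f, x=np.array([0.0]), p_next=np.array([2.0]),
          u=np.array([-2.0]), k=0)
```

Along an optimal sequence this gradient vanishes at every stage; the algorithm below drives it toward zero.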
Let the variation in the control-vector be δu(k); then, the perturbed control at time k is u(k) + δu(k). It follows that the resulting perturbations {δx(k)} and {δp(k)} in the state variables and the costate variables, respectively, are governed by

δx(k+1) = [∂f/∂x(k)] δx(k) + [∂f/∂u(k)] δu(k),  k = 0, 1, ..., N − 1,  δx(0) = 0  (16)

δp(k) = [∂²H_k/∂x²(k)] δx(k) + [∂²H_k/∂x(k)∂u(k)] δu(k) + [∂²H_k/∂x(k)∂p(k+1)] δp(k+1),  k = 1, ..., N − 1  (17)

δp(N) = {[∂²φ/∂x²(N)] + [∂²(β′ψ)/∂x²(N)]} δx(N) + [∂ψ′/∂x(N)] dβ  (18)

δψ[x(N)] = [∂ψ/∂x(N)] δx(N) = 0  (19)

where dβ is the resulting perturbation in the β-vector,

∂²H_k/∂x²(k) = [∂²H_k/∂xᵢ(k)∂xⱼ(k)], an (n × n)-matrix  (20)
∂²H_k/∂u(k)∂x(k) = [∂²H_k/∂xⱼ(k)∂uᵢ(k)], an (m × n)-matrix  (21)

∂²H_k/∂x(k)∂p(k+1) = [∂²H_k/∂xᵢ(k)∂pⱼ(k+1)], an (n × n)-matrix  (22)

∂²(β′ψ)/∂x²(N) = [∂²(β′ψ)/∂xᵢ(N)∂xⱼ(N)], an (n × n)-matrix  (23)

∂²φ/∂x²(N) = [∂²φ/∂xᵢ(N)∂xⱼ(N)], an (n × n)-matrix  (24)
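These second-derivative arrays are the only data the algorithm needs beyond the Jacobians; when the analytic expressions are tedious to write out, they can be approximated by finite differences. A hypothetical helper for the (m × m)-matrix of Eq. 27, not part of the paper's procedure:

```python
import numpy as np

def hess_uu(H, x, p_next, u, k, eps=1e-4):
    """Finite-difference approximation of the (m x m)-matrix
    d2H_k/du^2(k) (Eq. 27) from a scalar Hamiltonian H(x, p_next, u, k)."""
    u = np.asarray(u, dtype=float)
    m = u.size
    Huu = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m); ei[i] = eps
            ej = np.zeros(m); ej[j] = eps
            # Standard four-point central-difference stencil.
            Huu[i, j] = (H(x, p_next, u + ei + ej, k)
                         - H(x, p_next, u + ei - ej, k)
                         - H(x, p_next, u - ei + ej, k)
                         + H(x, p_next, u - ei - ej, k)) / (4 * eps**2)
    return Huu

# Placeholder check: for H = 0.5 u'Qu the Hessian is exactly Q.
Q = np.diag([1.0, 3.0])
H = lambda x, p, u, k: 0.5 * float(u @ Q @ u)
Huu = hess_uu(H, x=None, p_next=None, u=np.array([0.2, -0.1]), k=0)
```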
The change in ∂H_k/∂u(k) is

δ[∂H_k/∂u(k)] = [∂²H_k/∂u(k)∂x(k)] δx(k) + [∂²H_k/∂u(k)∂p(k+1)] δp(k+1) + [∂²H_k/∂u²(k)] δu(k)  (25)

where

∂²H_k/∂u(k)∂p(k+1) = [∂²H_k/∂uᵢ(k)∂pⱼ(k+1)], an (m × n)-matrix  (26)
∂²H_k/∂u²(k) = [∂²H_k/∂uᵢ(k)∂uⱼ(k)], an (m × m)-matrix.  (27)

Assuming that ∂²H_k/∂u²(k) is nonsingular (a non-degenerate case), Eq. 25 can be solved for δu(k):

δu(k) = [∂²H_k/∂u²(k)]⁻¹ {δ[∂H_k/∂u(k)] − [∂²H_k/∂u(k)∂x(k)] δx(k) − [∂²H_k/∂u(k)∂p(k+1)] δp(k+1)}.  (28)

Substituting Eq. 28 into Eqs. 16 and 17 leads to the following expressions

δx(k+1) = A(k) δx(k) + B(k) δp(k+1) + v(k)  (29)

δp(k) = C(k) δx(k) + A′(k) δp(k+1) + w(k)  (30)
where k = 0, 1, ..., N − 1 and

A(k) = ∂f/∂x(k) − [∂f/∂u(k)][∂²H_k/∂u²(k)]⁻¹[∂²H_k/∂u(k)∂x(k)]  (31)

B(k) = −[∂f/∂u(k)][∂²H_k/∂u²(k)]⁻¹[∂²H_k/∂u(k)∂p(k+1)]  (32)

v(k) = [∂f/∂u(k)][∂²H_k/∂u²(k)]⁻¹ δ[∂H_k/∂u(k)]  (33)

C(k) = [∂²H_k/∂x²(k)] − [∂²H_k/∂x(k)∂u(k)][∂²H_k/∂u²(k)]⁻¹[∂²H_k/∂u(k)∂x(k)]  (34)

w(k) = [∂²H_k/∂x(k)∂u(k)][∂²H_k/∂u²(k)]⁻¹ δ[∂H_k/∂u(k)]  (35)
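At each stage, Eqs. 31-35 reduce to a handful of matrix products once the Jacobians and Hamiltonian blocks are in hand. A sketch; the argument names are mine, and dg stands for the specified change δ[∂H_k/∂u(k)]:

```python
import numpy as np

def stage_coefficients(fx, fu, Hxx, Hux, Huu, dg):
    """A(k), B(k), v(k), C(k), w(k) of Eqs. 29-30 per Eqs. 31-35.
    fx: (n,n), fu: (n,m), Hxx: (n,n), Hux: (m,n), Huu: (m,m);
    dg: specified change in dH_k/du(k), shape (m,)."""
    Huu_inv = np.linalg.inv(Huu)      # nonsingular by assumption (non-degenerate case)
    A = fx - fu @ Huu_inv @ Hux       # (31)
    B = -fu @ Huu_inv @ fu.T          # (32); d2H_k/du dp(k+1) equals fu'
    v = fu @ Huu_inv @ dg             # (33)
    C = Hxx - Hux.T @ Huu_inv @ Hux   # (34); H_xu = H_ux'
    w = Hux.T @ Huu_inv @ dg          # (35)
    return A, B, v, C, w

# Tiny n = m = 1 check with placeholder numbers.
A, B, v, C, w = stage_coefficients(
    fx=np.array([[1.0]]), fu=np.array([[2.0]]),
    Hxx=np.array([[1.0]]), Hux=np.array([[0.5]]),
    Huu=np.array([[2.0]]), dg=np.array([1.0]))
```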
The problem is now to determine the solution to the linear difference Eqs. 29 and 30, where the boundary conditions are given by Eqs. 18 and 19. It can be accomplished by a transformation

δp(k+1) = R(N − k) δx(k) + Q(N − k) dβ + h(N − k)  (36)

where R(N − k) is an (n × n)-matrix, Q(N − k) is an (n × q)-matrix, and
h(N − k) is a vector of dimension n. The motivation for the form of expression (36) stems from the optimal solution of the discrete linear plant-quadratic criterion problem, in which the optimal control can be expressed either as a function of the costate or of the state vector. Equation 36 is now used in conjunction with Eqs. 29, 30, 18, and 19 to convert the TPBVP governing the perturbations to an initial value problem specified in terms of R(N − k), Q(N − k) and h(N − k). Substituting expression (36) into Eqs. 29 and 30, and combining the resulting equations, we obtain

R(N − k) = G⁻¹(k)[A′(k+1) R(N−k−1) + C(k+1)] A(k)  (37)

Q(N − k) = G⁻¹(k) A′(k+1) Q(N−k−1)  (38)

h(N − k) = G⁻¹(k){A′(k+1) h(N−k−1) + w(k+1) + [C(k+1) + A′(k+1) R(N−k−1)] v(k)}  (39)

G(k) = I − A′(k+1) R(N−k−1) B(k) − C(k+1) B(k)  (40)
where k = 0, 1, ..., N − 2. To determine the initial values for Eqs. 37, 38, and 39, we observe that (i) Eq. 19 gives q conditions for δx(N); (ii) Eq. 18 determines n conditions for δp(N) as a function of δx(N) and dβ; (iii) the remaining n conditions are specified at the initial time (δx(0) = 0, Eq. 16). Combining Eq. 36 with Eq. 29 for k = N − 1 and Eq. 18 with Eq. 36 for k = N − 1 leads to two expressions in δx(N), δx(N − 1) and dβ; the use of Eq. 29 with k = N − 1 in these expressions results in an equation that yields the initial values R(1), Q(1) and h(1):

R(1) = H⁻¹{[∂²φ/∂x²(N)] + [∂²(β′ψ)/∂x²(N)]} A(N−1)  (41)

Q(1) = H⁻¹[∂ψ′/∂x(N)]  (42)

h(1) = H⁻¹{[∂²φ/∂x²(N)] + [∂²(β′ψ)/∂x²(N)]} v(N−1)  (43)

H = I − {[∂²φ/∂x²(N)] + [∂²(β′ψ)/∂x²(N)]} B(N−1)  (44)
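Equations 37-44 together form one backward sweep: initialize R(1), Q(1), h(1) from the terminal data (41)-(43) and iterate (37)-(40) down to k = 0. A sketch under those equations; the list-valued arguments and the scalar smoke test are my own conventions:

```python
import numpy as np

def backward_sweep(A, B, C, v, w, phi_xx, bpsi_xx, psi_x):
    """Return lists R, Q, h with R[j] = R(j+1), j = 0, ..., N-1 (Eq. 36).
    A..w: stage coefficients of Eqs. 29-30 for k = 0, ..., N-1;
    phi_xx, bpsi_xx: terminal Hessians (n,n); psi_x: dpsi/dx(N), (q,n)."""
    N = len(A)
    n = A[0].shape[0]
    S = phi_xx + bpsi_xx
    Hm = np.eye(n) - S @ B[N - 1]                 # (44)
    Hi = np.linalg.inv(Hm)
    R = [Hi @ S @ A[N - 1]]                       # R(1), Eq. 41
    Q = [Hi @ psi_x.T]                            # Q(1), Eq. 42
    h = [Hi @ S @ v[N - 1]]                       # h(1), Eq. 43
    for k in range(N - 2, -1, -1):                # k = N-2, ..., 0
        Rn, Qn, hn = R[-1], Q[-1], h[-1]          # values at N-k-1
        G = np.eye(n) - A[k + 1].T @ Rn @ B[k] - C[k + 1] @ B[k]      # (40)
        Gi = np.linalg.inv(G)
        R.append(Gi @ (A[k + 1].T @ Rn + C[k + 1]) @ A[k])            # (37)
        Q.append(Gi @ A[k + 1].T @ Qn)                                # (38)
        h.append(Gi @ (A[k + 1].T @ hn + w[k + 1]
                       + (C[k + 1] + A[k + 1].T @ Rn) @ v[k]))        # (39)
    return R, Q, h

# Scalar smoke test: free endpoint (q = 0), A = 1, B = C = 0, v = w = 0.
I1, Z1, zv = np.eye(1), np.zeros((1, 1)), np.zeros(1)
R, Q, h = backward_sweep([I1] * 3, [Z1] * 3, [Z1] * 3, [zv] * 3, [zv] * 3,
                         phi_xx=I1, bpsi_xx=Z1, psi_x=np.zeros((0, 1)))
```

With B = C = 0 the gains simply propagate unchanged, which makes the degenerate case easy to verify by hand.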
It is noted that G(k) and H in Eqs. 40 and 44 can be inverted, except in some degenerate cases. In Eq. 36, R, Q and h can now be computed for all k. The sequences {δx(k)} and {δp(k+1)} can be determined, since δx(0) is known, and dβ can be evaluated from Eqs. 18 and 19, which may be combined with Eqs. 36 and 29. Now an algorithm can be constructed for determining the optimal solution:

(1) Choose initially an admissible control sequence. Compute the corresponding trajectories from Eqs. 2, 5, and 6.

(2) Compute A(k), B(k), v(k), C(k), and w(k), k = 0, 1, ..., N − 1 from Eqs. 31-35.

(3) Determine R(k), Q(k), and h(k), k = 1, 2, ..., N from Eqs. 37-44. Evaluate dβ from Eqs. 18, 19, and 36.
(4) Evaluate the perturbation trajectories {δx(k)} and {δp(k+1)} from Eqs. 29, 30, and 36.

(5) Determine the improvement δu(k) in the control u(k), k = 0, 1, ..., N − 1 from Eq. 28. Note that δ[∂H_k/∂u(k)] is to be specified in advance.

(6) Repeat from (2) on until the optimal solution is obtained to the desired degree of accuracy.

In applying the algorithm, δ[∂H_k/∂u(k)] can advantageously be chosen as a fraction of ∂H_k/∂u(k). It can be decreased, for example, inversely proportionally to the number of iterations. For the optimal solution, ∂H_k/∂u(k) = 0.

Example. The state transition of a discrete-time nonlinear system is governed by

x₁(k+1) = x₁(k) + T x₂(k) + ½T² tan⁻¹[e(k)]

x₂(k+1) = x₂(k) + T tan⁻¹[e(k)]  (45)

e(k) = u(k) − x₁(k) − x₂(k),  x₁(0) = x₂(0) = 2.0
where x(k) = col [x₁(k), x₂(k)] represents the state of the system at time k; u(k) is a scalar input to the system at time k; e(k) represents the error signal at time k; T = 0.1 and k = 0, 1, ..., N − 1. The problem is to determine the optimal control sequence {u*(k)} that minimizes the PI given by

J[u] = ½ x′(N)x(N) + Σ_{i=0}^{N−1} ½[x′(i)x(i) + u²(i)].  (45′)
The terminal state is assumed to be free, i.e., ψ ≡ 0 (null-vector). The duration of the process is given by N = 100. The discrete-time Hamiltonian of the system is

H_k[x(k), u(k), p(k+1)] = ½[x₁²(k) + x₂²(k) + u²(k)] + p₁(k+1) x₁(k+1) + p₂(k+1) x₂(k+1).  (46)

The costates evolve according to

p₁(k) = x₁(k) + {1 − ½T²/[1 + e²(k)]} p₁(k+1) − {T/[1 + e²(k)]} p₂(k+1)

p₂(k) = x₂(k) + {T − ½T²/[1 + e²(k)]} p₁(k+1) + {1 − T/[1 + e²(k)]} p₂(k+1)  (47)

where k = 1, ..., N − 1 and

p₁(N) = x₁(N),  p₂(N) = x₂(N).  (48)

The optimal control is determined by ∂H_k/∂u(k) = 0, i.e.,

u(k) + [½T² p₁(k+1) + T p₂(k+1)]/[1 + e²(k)] = 0.  (49)
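The example plant (45), its costate recursion (47)-(48), and the gradient (49) can be simulated directly. The sketch below only evaluates the necessary condition along a trajectory for a given control sequence; it is not the paper's full second-order iteration:

```python
import numpy as np

T, N = 0.1, 100

def step(x1, x2, u):
    """One state transition of Eq. 45."""
    e = u - x1 - x2
    a = np.arctan(e)
    return x1 + T * x2 + 0.5 * T**2 * a, x2 + T * a

def rollout(u_seq, x1=2.0, x2=2.0):
    """Forward simulation from x1(0) = x2(0) = 2.0."""
    xs = [(x1, x2)]
    for u in u_seq:
        x1, x2 = step(x1, x2, u)
        xs.append((x1, x2))
    return xs

def costates(xs, u_seq):
    """Backward recursion of Eqs. 47-48; p[k] = (p1(k), p2(k))."""
    p = [None] * (N + 1)
    p[N] = (xs[N][0], xs[N][1])              # p(N) = x(N), Eq. 48
    for k in range(N - 1, 0, -1):
        x1, x2 = xs[k]
        e = u_seq[k] - x1 - x2
        d = 1.0 + e * e
        p1n, p2n = p[k + 1]
        p1 = x1 + (1 - 0.5 * T**2 / d) * p1n - (T / d) * p2n
        p2 = x2 + (T - 0.5 * T**2 / d) * p1n + (1 - T / d) * p2n
        p[k] = (p1, p2)
    return p

def grad_u(k, xs, p, u_seq):
    """dH_k/du(k) = u(k) + [T^2/2 p1(k+1) + T p2(k+1)]/[1 + e^2(k)], Eq. 49."""
    x1, x2 = xs[k]
    e = u_seq[k] - x1 - x2
    p1n, p2n = p[k + 1]
    return u_seq[k] + (0.5 * T**2 * p1n + T * p2n) / (1.0 + e * e)

u_seq = [0.0] * N            # an admissible (non-optimal) starting sequence
xs = rollout(u_seq)
p = costates(xs, u_seq)
g0 = grad_u(0, xs, p, u_seq)
```

For the optimal sequence, grad_u would vanish at every stage; here the nonzero value at k = 0 measures how far the initial guess is from stationarity.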
Necessary conditions for the optimal solution are specified by (i) Eqs. 45, 47; (ii) the initial conditions, x(0), and Eq. 48; (iii) Eq. 49. Conditions (i) and (ii) are fulfilled in each cycle. The algorithm proposed changes the control sequence so as to force ∂H_k/∂u(k) to zero as closely as desired. The computational scheme presented converged to the optimal solution in five iterations, requiring 13 sec for compilation and 1.02 sec for computational operations per iteration. The optimal value of the PI is J[u*] = 181.290. The convergence of the control sequence to the optimal one is presented in Fig. 1.

FIG. 1. Convergence of the solution to the optimal one.

For comparison, the problem was also solved by means of a version of the well-known gradient method based on the first-order variation of the PI, (6). This algorithm yields the solution after 50 iterations, requiring 2.0 sec for compilation and 0.34 sec for computational operations per iteration.

Discussion
The example demonstrates that the convergence of the proposed algorithm is fast. In fact, the improvement in the control sequence is determined quite accurately, since the algorithm uses not only the gradient of the PI but also the curvature of the PI (in the control space). It should be emphasized, however, that such an improvement is accomplished at the expense of more complicated mathematical expressions. Furthermore, most of the expressions needed must be written explicitly in any given problem. This requires a fair amount of unavoidable manipulation of the mathematical expressions by the designer. The digital program (FORTRAN IV) for the determination of the optimal solution by means of the second-order variational method becomes quite complex. Indeed, the program of the example is much longer and more complicated than the corresponding program written for the gradient method, which is based on the first-order variation of the PI. Second-order variational algorithms, in general, exhibit certain desirable properties, such as fast convergence, insensitivity to initial values, and ease in determining the step-size for improvements. However, these characteristics are achieved by trading off some of the advantageous features of first-order variational methods, such as the simplicity of the mathematical expressions needed and a straightforward digital program whose operation is readily checked. In compromising between these factors, the designer should also take into account the properties of the available computer and the importance of the time during which the computer can be used. It appears, though, that fast-converging algorithms, like the one proposed, are more amenable to practical problems of on-line optimization than slow first-order variational schemes.

Conclusions
A computational method for determining solutions to nonlinear discrete optimal control problems is presented. It is based on the second-order variation of the performance criterion. The control sequence is improved from one iteration to the next (towards the optimal one) quite accurately. This is accomplished using the solution of the linear perturbation equations, which are solved by means of a discrete transformation. The algorithm increases the rate of the convergence to the optimal solution, which is demonstrated by an example. This desirable property is achieved at the expense of more complex mathematical expressions and lengthier digital programs. Hence, in choosing a computational method, the designer has to compromise between the convergence of the solution, the complexity of the mathematical expressions needed for the algorithm, and also the complexity of the digital program. In general, the decision is dictated by the particular problem, the time and the computer available, as well as by the familiarity of the designer with various computational methods.

Acknowledgment

The author wishes to thank Mr. J. Katz for programming the example.
References

(1) J. R. McReynolds and A. E. Bryson, Jr., "A Successive Sweep Method for Solving Optimal Programming Problems," Joint Auto. Control Conf., Troy, New York, June 1965.
(2) S. K. Mitter, Ph.D. Thesis, University of London, 1965.
(3) D. Mayne, "A Second-Order Gradient Method for Determining Optimal Trajectories of Non-Linear Discrete-Time Systems," Intern. J. Control, Vol. 3, No. 1, pp. 85-95, 1966.
(4) Y. C. Ho and P. B. Brentani, "On Computing Optimal Control with Inequality Constraints," SIAM J. Control, Ser. A, Vol. 1, No. 3, 1963.
(5) H. Halkin, B. W. Jordan, E. Polak, and J. B. Rosen, "Theory of Optimum Discrete-Time Series," Math. Res. Ctr., Rept. No. 601, Univ. of Wisconsin, Madison, Sept. 1965.
(6) W. Kipiniak, "Dynamic Optimization and Control," New York, John Wiley and Sons, 1961.