Journal of Applied Mathematics and Mechanics 75 (2011) 334–342
The sufficient conditions for continuous ε-optimal feedbacks in control problems with a terminal cost☆

V.Ya. Dzhafarov, N.N. Subbotina

Ekaterinburg, Russia
Article history: Received 2 July 2009
Abstract

The existence of continuous positional strategies of ε-optimal feedback is proved for linear optimal control problems with a convex terminal cost. These continuous feedbacks are determined from Bellman's equation in ε-perturbed control problems with an integral-terminal cost and a smooth value function. An example is given in which an ε-optimal continuous feedback does not exist. It is shown that the pointwise limit of the ε-optimal feedbacks as ε → 0 determines the optimal feedback, that is, a positional strategy which may be discontinuous.

© 2011 Elsevier Ltd. All rights reserved.
Problems on the minimum of a convex terminal cost function are investigated for control systems with linear dynamics. It is shown that the non-smooth value function of such a problem can be approximated by the smooth value functions of ε-perturbed linear optimal control problems with a regularizing terminal-integral cost function. At the same time, it is found that the arguments of the minimum operation in the corresponding regularized Hamiltonians are single-point sets. It is shown that aiming along the gradient of the smooth value function of the perturbed problem determines a continuous ε-optimal feedback for the unperturbed optimal control problem. The pointwise limit of the continuous ε-optimal feedbacks as ε → 0 determines a positional strategy, which can be discontinuous. It is proved that this limiting positional strategy is the optimal feedback for the unperturbed problem according to Krasovskii's formalization.1,2 The optimality of the limiting feedback rests on a property of the value function and on the theory of minimax solutions of Hamilton–Jacobi equations,3 according to which the non-smooth value function of the unperturbed problem is a generalized solution of the Hamilton–Jacobi–Bellman equation.

A discontinuous optimal feedback is typical of optimal control problems,4–7 and, as a rule, the apparatus of non-smooth analysis8 is required to construct it. The main differences between this paper and earlier papers on optimal feedback are as follows. We investigate the possibility of approximating the optimal feedback with continuous positional strategies that are ε-optimal for all initial states of the control system, and we distinguish a class of linear problems with a convex terminal cost function in which such an approximation exists. When the value function is non-smooth, a new approximation is proposed that uses the smooth value function of the ε-perturbed problem: the minimizing element in the ε-perturbed Hamiltonian turns out to be unique, and it determines the continuous ε-optimal feedback in the unperturbed problem. As a rule (see Refs 6, 7, 9 and 10, for example), optimal control theory invokes the apparatus of directional derivatives and differential inclusions when the value function and the feedback are non-smooth, and these inclusions depend irregularly on the phase variable, which considerably hinders the approximation of the optimal trajectories. In the ε-perturbed problems considered here, only the classical apparatus of mathematical analysis, classical solutions of partial differential equations and solutions of ordinary differential equations are used to construct the smooth value function, a continuous ε-optimal feedback and the optimal trajectories. It is also shown that a continuous ε-optimal feedback may fail to exist if the condition of convexity of the terminal cost function is not satisfied.

1. The terminal optimal control problem

Consider the linear control system

dx/dt = A(t)x + B(t)u + f(t), t ∈ [t00, θ], x ∈ Rn, u ∈ Rm (1.1)
☆ Prikl. Mat. Mekh. Vol. 75, No. 3, pp. 474–486, 2011.
E-mail address: [email protected] (V.Ya. Dzhafarov).
doi:10.1016/j.jappmathmech.2011.07.011
It is required to minimize the terminal cost

σ(x(θ)) → min (1.2)

The control parameters are restricted by the geometrical constraint u ∈ P, where P ⊂ Rm is a given convex compact set; the functions A(t), B(t), f(t) and σ(x) are component-wise continuously differentiable, and σ(x) is a convex cost function.

For t0 ∈ [t00, θ], the set of all measurable functions u(·): [t0, θ] → P is denoted by the symbol U[t0, θ]. We assign a value of the value function to each initial position (t0, x0) ∈ [t00, θ] × Rn as follows:

C(t0, x0) = min {σ(x(θ; t0, x0, u(·))): u(·) ∈ U[t0, θ]} (1.3)

where x(·; t0, x0, u(·)) is the solution of Eq. (1.1) for u(·) ∈ U[t0, θ] that satisfies the initial condition x(t0) = x0. According to known results of analysis, the minimum in (1.3) is attained.

If, for all small values of the parameter ε > 0, an upper semicontinuous multivalued mapping (t, x) → Uε(t, x) exists with non-empty, convex, compact values such that, for all (t0, x0) ∈ [t00, θ] × Rn and all solutions x(t) = x(t; t0, x0) of the differential inclusion

dx/dt ∈ A(t)x + B(t)Uε(t, x) + f(t), x(t0) = x0

the inequality

σ(x(θ)) ≤ C(t0, x0) + ε

holds, then Uε(t, x) is called an ε-optimal feedback.
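For concreteness, problem (1.1)–(1.3) can be exercised on a toy instance. The sketch below is illustrative and not taken from the paper: it uses the scalar system dx/dt = u with P = [−1, 1] and the convex cost σ(x) = x², for which the value function (1.3) has a closed form, and compares it with a direct minimization over controls (for this system constant controls already reach every attainable endpoint).

```python
import numpy as np

# Toy instance of problem (1.1)-(1.3), not from the paper:
# scalar dynamics dx/dt = u, control set P = [-1, 1],
# convex terminal cost sigma(x) = x**2, horizon [t0, theta] = [0, 1].
t0, theta = 0.0, 1.0

def sigma(x):
    return x ** 2

def value_exact(x0):
    # x(theta) sweeps the interval [x0 - (theta - t0), x0 + (theta - t0)],
    # so the minimum of sigma over reachable endpoints is attained at the
    # point of that interval closest to zero.
    return max(0.0, abs(x0) - (theta - t0)) ** 2

def value_direct(x0, n_grid=2001):
    # For this linear scalar system, constant controls reach every
    # attainable endpoint, so it suffices to scan u = const in P.
    us = np.linspace(-1.0, 1.0, n_grid)
    return sigma(x0 + (theta - t0) * us).min()

for x0 in (-2.0, -0.5, 0.0, 1.5):
    print(f"x0 = {x0:+.1f}  C = {value_exact(x0):.4f}  direct = {value_direct(x0):.4f}")
```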
The unified problem. In the general case, the value function (t0, x0) → C(t0, x0) is not differentiable. We will use the well-known transformation of the segment [t0, θ] into the standard segment [0, 1],

τ = (t − t0)/(θ − t0) (1.4)

which has the inverse transformation

t = t0 + τ(θ − t0)

At the same time, the set U[t0, θ] is mapped one-to-one onto the set

Ũ = {ũ(·): [0, 1] → P, ũ(·) measurable}

by means of the transformation

ũ(τ) = u(t0 + τ(θ − t0)), τ ∈ [0, 1]

Equation (1.1) and the initial condition x(t0) = x0 are transformed as follows:

dy/dτ = (θ − t0)[A(t0 + τ(θ − t0))y + B(t0 + τ(θ − t0))ũ(τ) + f(t0 + τ(θ − t0))], y(0) = x0 (1.5)

Here y(τ) = yt0(τ) = x(t0 + τ(θ − t0)). The equality

C(t0, x0) = min {σ(ỹ(1; 0, x0, ũ(·))): ũ(·) ∈ Ũ} (1.6)

holds, where ỹ(·; 0, x0, ũ(·)) is the solution of problem (1.5). The function (t0, x0) → ỹ(1; 0, x0, ũ(·)) is continuously differentiable according to the theorems on the differentiability of the solutions of differential equations with respect to the initial data x0 and the parameter t0. The minimum in (1.6) is attained.

2. Regularization of an optimal control problem

Regularization of the value function. Consider the quantity
Cε(t0, x0) = min {σ(ỹ(1; 0, x0, ũ(·))) + ε ∫₀¹ ‖ũ(τ)‖² dτ: ũ(·) ∈ Ũ} (2.1)

where ỹ(·; 0, x0, ũ(·)) is the solution of problem (1.5) and the constant c = max {‖u‖²: u ∈ P} is finite.

The expression under the minimum sign is strictly convex with respect to ũ(·) ∈ Ũ, and the set Ũ is convex and closed in L2[0, 1]. According to well-known theorems of analysis, the minimum in (2.1) is therefore attained.
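The effect of the regularization can be seen on a toy instance. The sketch below uses illustrative data, not the paper's: for dy/dτ = ũ, ũ ∈ [−1, 1], with σ(y) = |y|, the perturbed value (2.1) with the quadratic integral term admits a one-dimensional reduction, and the deviation of Cε from C stays within cε (the estimate obtained below in (2.3), here with c = max ‖u‖² = 1).

```python
import numpy as np

# Sketch of the regularization (2.1) for the toy system dy/dtau = u_tilde,
# u_tilde in [-1, 1], tau in [0, 1], with convex terminal cost
# sigma(y) = |y| (illustrative data, not the paper's).  For a fixed mean
# control s = int_0^1 u_tilde dtau, the constant control u_tilde == s
# minimizes the integral of u_tilde**2 (Jensen's inequality), so
#   C_eps(0, x0) = min over |s| <= 1 of [ sigma(x0 + s) + eps * s**2 ]
# and the minimizing control is unique by strict convexity.

def sigma(y):
    return np.abs(y)

def C(x0):                                  # unperturbed value C(0, x0)
    return max(0.0, abs(x0) - 1.0)

def C_eps(x0, eps, n=200001):
    s = np.linspace(-1.0, 1.0, n)
    return (sigma(x0 + s) + eps * s ** 2).min()

x0 = 0.3
for eps in (1.0, 0.1, 0.01):
    gap = C_eps(x0, eps) - C(x0)
    # estimate (2.3): 0 <= C_eps - C <= c * eps with c = max |u|**2 = 1
    print(f"eps = {eps:>5}: C_eps - C = {gap:.6f} <= {eps}")
```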
The expression under the minimum sign in (2.1) is continuously differentiable with respect to (t0, x0) and strictly convex with respect to ũ(·) ∈ Ũ. According to known results of convex analysis, a unique minimizing element ũ(·) ∈ Ũ therefore exists for every point (t0, x0), and the function Cε(t0, x0) is differentiable with respect to (t0, x0).

Indeed, for a scalar function of the form

φ(x) = min {c(x, y): y ∈ Y}

where (x, y) → c(x, y) is a continuous function, differentiable with respect to x, (x, y) → ∂c(x, y)/∂x is a continuous function, and Y is a compact set in Rm, it is known that the function φ(x) has a derivative with respect to any direction h ∈ Rn:

∂φ(x)/∂h = min {⟨∂c(x, y)/∂x, h⟩: y ∈ Y(x)}

The symbol ⟨·,·⟩ denotes a scalar product and Y(x) is the non-empty set of all the minimizing elements in the definition of the function φ(x). If the set Y(x) consists of a single element y(x), that is Y(x) = {y(x)}, then the function x → y(x) is continuous. It follows from the last equality that in this case the function φ(x) has continuous partial derivatives

∂φ(x)/∂x = ∂c(x, y(x))/∂x
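This envelope (Danskin-type) formula is easy to verify numerically. The check below uses arbitrary illustrative data, not data from the paper: a function c(x, y) that is strictly convex in y on a discretized compact Y, with the formula compared against a central finite difference.

```python
import numpy as np

# Numerical check of the envelope (Danskin-type) formula above:
# for phi(x) = min_{y in Y} c(x, y) with a unique minimizer y(x),
# the derivative is  d phi / dx = dc/dx (x, y(x)).
# Illustrative data: c(x, y) = (x - y)**2 + 0.5 * y**2, Y = [-1, 1].

Y = np.linspace(-1.0, 1.0, 20001)           # discretized compact set Y

def c(x, y):
    return (x - y) ** 2 + 0.5 * y ** 2

def phi(x):
    return c(x, Y).min()

def grad_formula(x):
    y_star = Y[np.argmin(c(x, Y))]          # the unique minimizer y(x)
    return 2.0 * (x - y_star)               # dc/dx evaluated at (x, y(x))

for x in (-0.7, 0.1, 0.9):
    h = 1e-6
    fd = (phi(x + h) - phi(x - h)) / (2 * h)   # central finite difference
    print(f"x = {x:+.1f}  finite diff = {fd:+.6f}  formula = {grad_formula(x):+.6f}")
```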
Applying the inverse transformation

t = t0 + τ(θ − t0), u(t) = ũ((t − t0)/(θ − t0))

to expression (2.1), we obtain the relations

Cε(t0, x0) = min {σ(x(θ; t0, x0, u(·))) + (ε/(θ − t0)) ∫t0θ ‖u(t)‖² dt: u(·) ∈ U[t0, θ]} (2.2)

from which the estimates

0 ≤ Cε(t0, x0) − C(t0, x0) ≤ cε, (t0, x0) ∈ [t00, θ] × Rn (2.3)

follow. It is obvious that Cε(t0, x0) is the value function (the Bellman function) of the optimal control problem for system (1.1) with a minimum of a cost functional of the form
Iε(u(·)) = σ(x(θ)) + (ε/(θ − t0)) ∫t0θ ‖u(t)‖² dt (2.4)

Continuous ε-optimal feedback. It is known5 that the value function Cε(t0, x0) satisfies the Hamilton–Jacobi–Bellman equation

∂Cε/∂t + min {⟨∂Cε/∂x, A(t)x + B(t)u + f(t)⟩ + (ε/(θ − t0))‖u‖²: u ∈ P} = 0, Cε(θ, x) = σ(x) (2.5)

A unique minimizing element uε(t, x) exists at each point (t, x) in Eq. (2.5), since the expression under the minimum sign is strictly convex with respect to u; the function uε(t, x) is continuous. Consider the equation

dx/dt = A(t)x + B(t)uε(t, x) + f(t) (2.6)

and let xε(t) be a solution of this equation.
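Under the quadratic-regularizer assumption adopted above, and for a box constraint set P = [−1, 1]^m (an illustrative choice of ours), the minimizer in (2.5) has a closed form: the unconstrained minimizer of the strictly convex expression, clipped onto the box. A minimal sketch, with a hypothetical stand-in for the true gradient ∂Cε/∂x:

```python
import numpy as np

# Sketch of the feedback construction (2.5)-(2.6), assuming the quadratic
# regularizer adopted above and a box P = [-1, 1]^m.  Minimizing
#   <p, B(t) u> + k * ||u||**2  over u in P,  p = dC_eps/dx,  k = eps/(theta - t0),
# is strictly convex, so its unique solution is the clipped unconstrained
# minimizer.  grad_C_eps below is a HYPOTHETICAL stand-in for the true
# gradient of the perturbed value function.

def u_eps(t, x, grad_C_eps, B, k):
    p = grad_C_eps(t, x)
    u_free = -(B(t).T @ p) / (2.0 * k)      # unconstrained minimizer
    return np.clip(u_free, -1.0, 1.0)       # projection onto the box P

def closed_loop(t0, x0, theta, grad_C_eps, A, B, f, k, n=1000):
    # Euler integration of Eq. (2.6): dx/dt = A x + B u_eps(t, x) + f.
    dt, t, x = (theta - t0) / n, t0, np.array(x0, dtype=float)
    for _ in range(n):
        x = x + dt * (A(t) @ x + B(t) @ u_eps(t, x, grad_C_eps, B, k) + f(t))
        t += dt
    return x

# Illustrative data: 2-D system with a quadratic guess for C_eps.
A = lambda t: np.zeros((2, 2))
B = lambda t: np.eye(2)
f = lambda t: np.zeros(2)
grad = lambda t, x: 2.0 * x                 # pretend C_eps(t, x) ~ ||x||**2
print(closed_loop(0.0, [1.0, -2.0], 1.0, grad, A, B, f, k=0.1))
```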
Along xε(·), by virtue of Eqs (2.5) and (2.6), the estimates

dCε(t, xε(t))/dt = −(ε/(θ − t0))‖uε(t, xε(t))‖² ≤ 0, Cε = Cε(t, xε(t))

hold, and therefore Cε(t, xε(t)) ≤ Cε(t0, x0) for all t ∈ [t0, θ]. Putting t = θ, we obtain

σ(xε(θ)) = Cε(θ, xε(θ)) ≤ Cε(t0, x0)

The inequality

σ(xε(θ)) ≤ C(t0, x0) + cε
follows from estimates (2.3) and the last two relations, and the validity of the following assertion is thereby established.

Theorem 1. A continuous ε-optimal feedback uε(t, x) exists in the optimal control problem (1.1), (1.2) considered. Its values are the minimizing elements in Eq. (2.5).

3. The structure of the limiting optimal feedback

Limiting discontinuous feedbacks. The value functions Cε(t, x) introduced above are continuously differentiable. Their gradients ∇Cε(t, x) are continuous functions, uniformly bounded in any compact domain D ⊂ [t00, θ] × Rn. Hence, according to Theorem 1 and known results of analysis, the functions Cε(t, x) converge uniformly as ε → 0 to a locally Lipschitz function C(t, x) in any domain D ⊂ [t00, θ] × Rn, and their gradients ∇Cε(t, x) converge to the gradients ∇C(t, x) almost everywhere.

It is known that the value function C(t, x) in problem (1.1)–(1.3) is superdifferentiable.7 Its superdifferential ∂C(t, x) is a set that is non-empty for all (t, x) ∈ (t00, θ] × Rn. It is known that, in this problem, this set is identical to Clarke's subdifferential,6 that is, it has the form

∂C(t, x) = co {lim i→∞ ∇C(ti, xi): (ti, xi) → (t, x)} (3.1)

for all (t, x) ∈ (t00, θ) × Rn, where co H is the convex hull of the set H and (ti, xi) are points of differentiability of the function C(t, x). It follows from Rademacher's theorem that

∂C(t, x) = {∇C(t, x)}

almost everywhere in [t00, θ] × Rn.

We now define a feedback u0(t, x) as the pointwise limit

u0(t, x) = lim ε→0 uε(t, x) (3.2)

The function (t, x) → u0(t, x) can turn out to be discontinuous. Following Krasovskii's approach to the formalization of control according to the feedback principle, we introduce the value of the result V(t0, x0) for a system of the form (1.1) starting from the initial point (t0, x0) ∈ [t00, θ] × Rn under the action of a discontinuous feedback u(t, x) (see Refs 1 and 2, for example). This value has the form (3.3), where the limit of the guaranteed results is taken over the step-by-step (Euler polygonal) motions of system (1.1) generated by the feedback u(t, x), with the control frozen on each interval of a partition Δ = {t0 < t1 < … < tN = θ}, as the diameter of the partition tends to zero.
It is clear that

V(t0, x0) ≥ C(t0, x0)

for any feedback u(t, x).
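The step-by-step scheme behind (3.3) is easy to exercise on a toy convex problem (our data, not the paper's): for dx/dt = u, u ∈ [−1, 1], σ(x) = |x|, the limiting feedback u0(x) = −sign x is discontinuous at x = 0, yet the results of its polygonal motions approach C(t0, x0) = max(0, |x0| − 1) as the partition is refined, which is exactly the content of Theorem 2 below.

```python
import numpy as np

# Step-by-step (Euler polygonal) motions in Krasovskii's scheme for a toy
# convex problem, not from the paper: dx/dt = u, u in [-1, 1], t in [0, 1],
# sigma(x) = |x|.  The limiting feedback u0(x) = -sign(x) is discontinuous
# at x = 0; the guaranteed result of its polygonal motions nevertheless
# approaches C(0, x0) = max(0, |x0| - 1) as the partition is refined.

def u0(x):
    return -np.sign(x) if x != 0.0 else 0.0   # a selector of (3.2)

def polygonal_result(x0, n_steps):
    # The control is frozen on each partition interval (step-by-step scheme).
    dt = 1.0 / n_steps
    x = x0
    for _ in range(n_steps):
        x += dt * u0(x)
    return abs(x)                              # sigma(x(1))

x0 = 0.4
C = max(0.0, abs(x0) - 1.0)                    # = 0 here
for n in (4, 16, 64, 256):
    print(f"N = {n:4d}  result = {polygonal_result(x0, n):.6f}  C = {C}")
```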
The structure of the limiting discontinuous optimal feedback. We will prove the validity of the following assertion.

Theorem 2. In the optimal control problem (1.1), (1.2) considered, any feedback u0(t, x) of the form (3.2) is an optimal feedback, that is,

V(t0, x0) = C(t0, x0), (t0, x0) ∈ [t00, θ] × Rn (3.4)

Proof. We consider the difference

σ(x(θ)) − C(t0, x0) = Σ i=0..N−1 ΔiC (3.5)

where

ΔiC = C(ti+1, x(ti+1)) − C(ti, x(ti)) (3.6)

and estimate the difference (3.6). We fix the small parameter εi ≤ (ti+1 − ti)² and choose the other parameter δi ≤ εi such that the conditions
(3.7) are satisfied, which can be done by virtue of Theorem 1 and condition (3.2). We now estimate the difference (3.8). To do this, we consider the solution xi(t) of the closed-loop equation (2.6) with ε = εi and the solution x(t) of system (1.1) with the control frozen on the interval [ti, ti+1], subject to the condition xi(ti) = x(ti). By virtue of the fact that xi(t) is the optimal trajectory of the εi-regularized problem (2.2), we have the estimates (3.9) and (3.10).
According to Pontryagin's maximum principle,4 the following equalities hold. The conjugate variable pi(t) satisfies relations (3.11), and equalities (3.12) and (3.13) hold for all t ∈ [ti, ti+1]; the Hamiltonian on the right-hand side of the last equality has the form of the strictly convex function (3.17) considered below. Note that the small parameter δi can be chosen such that the following conditions are satisfied.
Using equalities (3.11) and (3.12), we estimate the difference (3.14), where L = L(D) > 0 is a constant for a sufficiently large compact domain D ⊂ Rn containing the trajectories under consideration.

We continue the estimation of (3.14) by applying Cauchy's formula to the solutions x(t) ∈ D, t ∈ [t0, θ], of the linear system (1.1) for (t0, x0) ∈ [t00, θ] × Rn and using the continuity of the functions A(t), B(t) and f(t). As a result, we obtain the inequality (3.15), where the constant M > 0 is independent of the indices i = 0, 1, …, N − 1. According to the choice of εi in (3.7), the inequality (3.16) holds for any t ∈ [ti, ti+1].

We now consider the strictly convex function (3.17), defined for all t ∈ [t00, θ], p ∈ Rn, ε > 0. Using the fact that the set of global minimizers of the function Φεi(t, pi(t), u) consists of a single element, and denoting this element by uεi(t, pi(t)) (it coincides with the feedback value uεi(t, xi(t))), we obtain relations (3.18).

Recalling the definition of the constant c (see (2.1) above) and using the properties of the conjugate variable pi(t) and the modulus of continuity δ → ω(δ): [0, 1] → [0, R] of the fundamental system of solutions of the conjugate system (3.11), we continue the estimation of (3.18) as follows: (3.19). We emphasize that the modulus of continuity ω(δ) is independent of εi.

It is easy to see that relations (3.19), (3.16), (3.15) and (3.7) lead to the inequalities (3.20) for the difference ΔiC in (3.14). Inequalities (3.20), (3.10) and (3.8) imply the estimate (3.21). It follows from relations (3.21) and the last inequality of (3.7) that the resulting estimate for ΔiC has the form (3.22).

Summing inequalities (3.22) over all i = 0, 1, …, N − 1 and putting diam Δ < 1, we obtain the final estimate of the difference (3.5) in the form (3.23), where the constants c, L and M are independent of the specific partition Δ. It is obvious that the right-hand side of (3.23) tends to zero as diam Δ → 0, and equality (3.4) follows. This completes the proof.
Other constructions of a discontinuous optimal feedback have been proposed earlier (see Ref. 11, for example).
4. Example of the absence of a continuous ε-optimal feedback

The following counterexample shows that the assumption that the terminal function σ(x) is convex is essential for the existence of continuous ε-optimal feedbacks approximating a discontinuous optimal feedback u0(t, x). Consider an optimal control problem of the form (1.1), (1.2) with a non-convex terminal function, for which it is easy to find the value function (4.1).

In this example, for any small parameter ε ∈ (0, 1/2), there is no upper semicontinuous multivalued mapping (t, x) → Uε(t, x) ⊂ [−1, +1], with non-empty convex compact values, for which the inequalities of ε-optimality are satisfied for all trajectories x(t) of the corresponding differential inclusion. (In particular, a continuous ε-optimal feedback uε(t, x) does not exist in this example.)

In fact, let us assume the opposite, that is, suppose an ε-optimal feedback exists. Then, for any ε ∈ (0, 1/2), a mapping (t, x) → Uε(t, x) exists such that all the solutions x(t) = x(t; t0, x0) of the differential inclusion (4.2) satisfy the inequality (4.3). We fix the index ε and shall omit it in the subsequent calculations. We put t0 = 0 and denote the set of solutions of differential inclusion (4.2) with the initial condition x(0) = x0 by the symbol Y(x0). We now consider the set Y(1, x0) = {x(1): x(·) ∈ Y(x0)}, which is the attainability set at the instant t = 1 for the solutions of inclusion (4.2) starting from the initial point (0, x0). It is known that the mapping x0 → Y(x0) is upper semicontinuous and that Y(1, x0) is a connected subset of R (see Ref. 7, for example).

From condition (4.3), using the inequality relating the value function to the terminal cost, we obtain that the inequalities (4.4) hold for the points x(1) ∈ Y(1, x0). We now consider the point x0 = 0. Using the last inequality and the connectedness of the set Y(1, 0), we conclude that, for all x(·) ∈ Y(0), either the inequality x(1) ≤ 1/4 or inequality (4.5) holds. Without loss of generality, we will assume that inequality (4.5) holds. We now move the initial point x0 into the domain of negative values. Then, from the second condition of (4.4), the first condition of (4.5) and the inequality |u| ≤ 1, we obtain the existence of an initial point x0 = x* < 0 for which the resulting inequalities contradict the upper semicontinuity of the multivalued mapping x0 → Y(x0) at the point x0 = x*.
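The mechanism of this contradiction can be reproduced numerically. The sketch below uses assumed data, not the paper's: σ(x) = −|x| with dx/dt = u, |u| ≤ 1, t ∈ [0, 1], so that C(0, x0) = −(|x0| + 1), and ε-optimality from x0 requires |x(1)| ≥ |x0| + 1 − ε. For any continuous feedback the endpoint map x0 → x(1) is continuous, and the requirement fails near x0 = 0; this is checked for one continuous candidate feedback.

```python
import numpy as np

# Numerical illustration of Section 4 under ASSUMED data: sigma(x) = -|x|,
# dx/dt = u, |u| <= 1, t in [0, 1].  Here C(0, x0) = -(|x0| + 1), so
# eps-optimality demands |x(1)| >= |x0| + 1 - eps.  A continuous feedback
# has a continuous endpoint map x0 -> x(1), yet the demand forces endpoints
# near +1 for x0 > 0 and near -1 for x0 < 0: a jump is unavoidable.
# We exhibit the failure for the continuous candidate u(x) = clip(x/delta).

def endpoint(x0, delta, n=2000):
    x, dt = x0, 1.0 / n
    for _ in range(n):
        x += dt * np.clip(x / delta, -1.0, 1.0)   # Euler step of dx/dt = u(x)
    return x

eps, delta = 0.25, 0.05
for x0 in (-0.2, -0.05, 0.0, 0.05, 0.2):
    x1 = endpoint(x0, delta)
    ok = abs(x1) >= abs(x0) + 1.0 - eps           # eps-optimality test
    print(f"x0 = {x0:+.2f}  x(1) = {x1:+.4f}  eps-optimal: {ok}")
# From x0 = 0 the motion stays at 0, so |x(1)| = 0 < 1 - eps: this
# continuous feedback is not eps-optimal, in line with the text.
```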
5. Example of the construction of a continuous ε-optimal feedback

Consider an optimal control problem of the form (1.1), (1.2) with a two-dimensional phase vector x = (x1, x2).
Fig. 1.
In this problem it is required to minimize a convex terminal cost functional. Bellman's equation in this problem has the corresponding form, and the value function satisfies the boundary condition C(θ, x1, x2) = σ(x1, x2). The optimal synthesis in this problem is determined by the minimum conditions in Bellman's equation.

The results of the numerical solution of this optimal control problem are presented in Fig. 1 in the form of a graph of the discontinuous optimal feedback u0(t, x1, x2) constructed for the instant t = 0.

In the regularized optimal control problem, the cost functional is augmented by an integral term of the form (2.4). The value function of the regularized problem satisfies the perturbed Bellman equation (2.5), and the corresponding continuous ε-optimal feedback uε(t, x1, x2) is the unique minimizing element in this equation.

The results of the numerical solution of the regularized optimal control problem are presented in Fig. 2 in the form of a graph of the continuous ε-optimal feedback uε(t, x1, x2) constructed for the instant t = 0. According to the results obtained above, this continuous function is an ε-optimal feedback in the initial unregularized optimal control problem.

We note that an original method12 was employed for the numerical modelling, in which non-uniform (adaptive) meshes are used in the space of the phase variables x1 and x2. This is clearly visible in Figs 1 and 2, where the positional control laws u0(t, x1, x2) and uε(t, x1, x2) are constructed at the nodes of these meshes. The adaptive character of the meshes enables one to improve the accuracy of the numerical approximations of the optimal control laws considerably.
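The paper's example data are not reproduced above, so the following sketch substitutes a hypothetical double integrator dx1/dt = x2, dx2/dt = u, |u| ≤ 1, with σ(x) = x1² + x2², and computes the regularized synthesis on a uniform mesh (the paper uses adaptive meshes12) by backward semi-Lagrangian dynamic programming; the minimizing control at each node approximates the continuous feedback uε(0, x1, x2).

```python
import numpy as np

# Numerical sketch of the regularized synthesis of Section 5 on a mesh.
# HYPOTHETICAL system (not the paper's): dx1/dt = x2, dx2/dt = u,
# |u| <= 1, t in [0, 1], sigma(x) = x1**2 + x2**2, with the quadratic
# regularizer assumed in Section 2.  Backward semi-Lagrangian dynamic
# programming yields C_eps on the grid; the minimizing control at each
# node approximates the continuous feedback u_eps.

eps = 0.1
nx, nt, nu = 31, 25, 11
xs = np.linspace(-2.0, 2.0, nx)             # grid for both x1 and x2
us = np.linspace(-1.0, 1.0, nu)
dt = 1.0 / nt

def bilinear(V, x1, x2):
    # Bilinear interpolation of grid values V at (x1, x2), clamped to the box.
    h = xs[1] - xs[0]
    i = int(np.clip((x1 - xs[0]) // h, 0, nx - 2))
    j = int(np.clip((x2 - xs[0]) // h, 0, nx - 2))
    a = np.clip((x1 - xs[i]) / h, 0.0, 1.0)
    b = np.clip((x2 - xs[j]) / h, 0.0, 1.0)
    return ((1 - a) * (1 - b) * V[i, j] + a * (1 - b) * V[i + 1, j]
            + (1 - a) * b * V[i, j + 1] + a * b * V[i + 1, j + 1])

X1, X2 = np.meshgrid(xs, xs, indexing="ij")
V = X1 ** 2 + X2 ** 2                       # terminal layer: C_eps(1, .) = sigma
U = np.zeros_like(V)
for _ in range(nt):                         # march backward in time
    V_new = np.empty_like(V)
    for i in range(nx):
        for j in range(nx):
            x1, x2 = xs[i], xs[j]
            # one Euler step of the dynamics for each candidate control,
            # plus the running cost eps * u**2 * dt of the regularizer
            costs = [bilinear(V, x1 + dt * x2, x2 + dt * u) + eps * u * u * dt
                     for u in us]
            k = int(np.argmin(costs))
            V_new[i, j], U[i, j] = costs[k], us[k]
    V = V_new

# After the loop, U holds the minimizing control on the first time layer,
# i.e. a mesh approximation of the continuous feedback u_eps(0, x1, x2).
print(U[nx // 2 - 5:nx // 2 + 6:5, nx // 2 - 5:nx // 2 + 6:5])
```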
Fig. 2.
Acknowledgements

We wish to thank T. B. Tokmantsev for help in modelling the examples. This research was carried out with the support of the Scientific and Technological Research Council of Turkey (TÜBİTAK, Programme 2221) and also of the Russian Foundation for Basic Research (08-01-00410) and the Programme for State Support of Leading Scientific Schools (NSh-2640.2008.1).

References

1. Krasovskii NN. Theory of the Control of Motion. Moscow: Nauka; 1968.
2. Krasovskii NN, Subbotin AI. Positional Differential Games. Moscow: Nauka; 1974.
3. Subbotin AI. Generalized Solutions of First Order PDEs: The Dynamical Optimization Perspective. Boston: Birkhäuser; 1995.
4. Pontryagin LS, Boltyanskii VG, Gamkrelidze RV, Mishchenko EF. The Mathematical Theory of Optimal Processes. New York: Wiley; 1962.
5. Bellman R. Dynamic Programming. Princeton: Princeton University Press; 1957.
6. Berkovitz LD. Optimal feedback controls. SIAM J Control Optim 1989;27(5):991–1007.
7. Frankowska H. Value function in optimal control. In: Summer School on Mathematical Control Theory (September 2001). Trieste: The Abdus Salam International Centre for Theoretical Physics; 2001. p. 3–28.
8. Clarke FH. Optimization and Nonsmooth Analysis. New York: Wiley; 1983.
9. Blagodatskikh VI, Filippov AF. Differential inclusions and optimal control. Tr Mat Inst im Steklova 1985;169:194–252.
10. Subbotina NN. The method of characteristics of Hamilton–Jacobi equations and applications to dynamical optimization. J Math Sci 2006;135(3):2955–3091.
11. Subbotina NN. On the structure of optimal feedbacks to control problems. In: Abstracts of the 11th IFAC International Workshop on Control Applications of Optimization (CAO'2000). St Petersburg: St Petersburg State University; 2000. p. 254–5.
12. Subbotina NN, Tokmantsev TB. Estimation of the error of network optimal feedback in non-linear optimal control problems of specified duration. Avtomatika i Telemekhanika 2009;(9):141–56.
Translated by E. L. S.