Automatica 44 (2008) 2427–2434
Brief paper
Finite-horizon dynamic optimization of nonlinear systems in real time✩

Vicente Costanza*, Pablo S. Rivadeneira

Nonlinear Systems Group, INTEC (UNL-CONICET), Güemes 3450, 3000 Santa Fe, Argentina

✩ This paper was not presented at any IFAC meeting. It was recommended for publication in revised form by Associate Editor Ben M. Chen under the direction of Editor Ian Petersen.
* Corresponding author. E-mail address: [email protected] (V. Costanza).
Article history:
Received 12 December 2006
Received in revised form 26 October 2007
Accepted 31 January 2008
Available online 8 May 2008

Keywords: Optimal control; Hamilton equations; Finite-horizon optimization; Nonlinear boundary-value problems; First-order PDEs
Abstract

A novel scheme for constructing and tracking the solution trajectories of regular, finite-horizon, deterministic optimal control problems with nonlinear dynamics is devised. The optimal control is obtained from the states and costates of Hamiltonian ODEs, integrated online. In the one-dimensional case the initial costate is found by successively solving two first-order, quasi-linear partial differential equations, whose independent variables are the time-horizon duration T and the final penalty coefficient S. These PDEs should in general be integrated off-line, the solution rendering not only the missing initial condition sought in the particular (T, S)-situation, but also information on the boundary values of the whole two-parameter family of control problems, which can be used for designing the definitive objective functional. Optimal trajectories for the model are then generated in real time and used as references to be followed by the physical system. Numerical improvements are suggested for accurate integration of the naturally unstable Hamiltonian dynamics, and strategies are proposed for tracking their results, in finite time or asymptotically, when perturbations in the state of the system appear. The whole procedure is tested on models arising in aero-navigation optimization.
0. Introduction

The Hamiltonian formalism has been at the core of the development of modern optimal control theory (Pontryagin, Boltyanskii, Gamkrelidze, & Mischenko, 1962). When an optimal control problem involving an n-dimensional system and an additive cost objective is regular (Kalman, Falb, & Arbib, 1969), i.e. when the Hamiltonian H(t, x, u, λ) of the problem can be uniquely optimized by a control value u0 depending on the remaining variables (t, x, λ), then the solution can be constructed from the trajectories of a set of 2n ordinary differential equations (ODEs) subject to two-point boundary conditions, known as Hamilton (or Hamiltonian) equations (in what follows, HEs). This boundary-value problem, if solvable, frequently presents numerical difficulties due to the mixed-sign eigenvalues of its linear approximation (see for instance Abraham and Marsden (1978), Costanza (2005) and Rao and Mease (1999, 2000)). For the linear-quadratic regulator problem (LQR) with a finite horizon there exists at least one well-known method (see for instance Sontag (1998)) to transform the boundary-value problem into a related initial-value one, followed by a matrix decomposition. But this
kind of transformation does not apply in general to nonlinear systems. In the infinite-horizon bilinear-quadratic case, for both the regulator and the change of set-point problems, there exists a recent attempt to find the missing initial condition for the costate variable, which allows the equations to be integrated online with the underlying control process (Costanza & Neuman, 2006). Also, a new procedure for finding initial costates in two-point boundary-value situations (finite horizon) for bilinear-quadratic problems, along the lines of invariant imbedding (Bellman & Kalaba, 1963), has been anticipated in Costanza (2007). Here an extension to general one-dimensional nonlinear dynamics, subject to objective functions with arbitrary Lagrangians (provided they lead to regular Hamiltonians) and quadratic final penalty functions, is developed and illustrated. This is the main contribution of this paper, namely the discovery of the PDEs whose solutions provide the exact initial conditions for Hamilton's equations. The PDEs are suitable for integration with standard software. Once the two missing conditions (the final value of the state and the initial value of the costate) are calculated offline, the HEs can be integrated online, and the optimal control can be constructed in real time from their solution. But the optimal control expression is not strictly a feedback law, since it also depends on the costate variable. Therefore departures of the real (physical) state from the solution to the HEs need to be efficiently abated to preserve optimality. This poses a tracking problem which is also discussed in this article, but restricted to the time-varying linearization (Sontag, 1998) of the deviation dynamics (for higher-approximation approaches see Costanza, Tuncelli, and Neuman (1996) and García and D'Attellis (1995)).
The organization of the article is as follows: in Section 1 the main equations governing regular finite-horizon optimal control problems are posed, and the resulting ODEs (HEs) are imbedded in a two-parameter family of problems in order to recover the missing initial condition for the costate variable. Section 2 shows the application of the two first-order PDEs resulting from this imbedding to a nonlinear optimization problem arising in aircraft navigation, especially in the 'change of set-point' situation. Section 3 discusses some numerical aspects that improve the computation of the optimal state/costate trajectory, and also the alternatives for abating perturbations in the state of the real plant while tracking the optimal state evolution calculated online for the nominal system. The conclusions drawn from the results of the previous sections are then presented, and an Appendix is added to prove, in dimension one, the relevant equations involving missing boundary conditions as functions of the time-horizon duration and the final penalty coefficient.

1. Regular optimal control problems with a finite horizon
1.1. The Hamiltonian formalism

The elaboration here will apply to initialized, time-constant, finite-dimensional, deterministic, nonlinear control systems

ẋ = f(x, u);   x(0) = x0, (1)

with the state x evolving in some open set O of ℝⁿ containing x0, and assuming that f is a smooth function in its domain. The admissible control trajectories are the piecewise continuous functions u : [0, T) → ℝ, T < ∞, and there will be objective functions under consideration of the form

J(u) = ∫₀ᵀ L(x(t), u(t)) dt + (x(T) − x̄)′ S (x(T) − x̄), (2)

where J(u) is a short form for J(0, T, x0, u(·)), i.e. the sum of the cost generated by the admissible control strategy u(·) and by the state trajectory {x(t) = φ(t, x0, u(·)), t ∈ [0, T]} (with φ denoting the state-transition function or transition map of system (1) when t0 = 0; see Sontag (1998) and Kalman et al. (1969)), measured by the integral of the Lagrangian L(x, u) plus a quadratic final penalty weighted by some positive definite matrix S. This could be generalized to a non-autonomous L and to a differentiable final penalty function K : O → ℝ, but they are kept as indicated, namely

K(x) ≜ (x − x̄)′ S (x − x̄), (3)

just to show the explicit forms for the main results, and to illustrate the incidence of parameters like S in the structure of the problem. This set-up will be called a (T, S)-optimal control (or dynamic optimization) problem. Clearly the desired final state is x̄, so the problem aims to send x0 as near as possible to x̄ with optimal cost.

The Hamiltonian of the problem is defined as usual,

H(x, λ, u) ≜ L + λ′f = L(x, u) + λ′f(x, u), (4)

where λ is the adjoint or costate variable, with values in ℝⁿ; in particular,

λ(T) = (∂K/∂x (x(T)))′ = 2S(x(T) − x̄). (5)

H is assumed to be regular (Kalman et al., 1969). Precisely, for each pair (x, λ) there will exist a unique minimizing control value u0(x, λ) such that the 'optimal' Hamiltonian is realized at

H0(x, λ) ≜ H(x, λ, u0(x, λ)), (6)

with u0 a smooth function of its variables. Then the HEs associated with this problem, namely

ẋ = (∂H0/∂λ)′ ≜ F(x, λ);   x(0) = x0,
λ̇ = −(∂H0/∂x)′ ≜ −G(x, λ);   λ(T) = 2S(x(T) − x̄), (7)

are equivalent to a 2n-dimensional ODE in (x, λ) (see Pontryagin et al. (1962) and Sontag (1998) for the foundations of this two-point boundary-value problem). If the HEs could be solved online, with solution denoted (x*(·), λ*(·)), then the optimal control strategy u*(·) would be computable at each time as a state/costate law

u*(t) = u0(x*(t), λ*(t)). (8)

The immediate difficulty against this plan is the lack of information on the initial value λ(0) for the costate, whose unveiling will be undertaken next.
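Where the Lagrangian is quadratic in the control, the minimization in (6) is explicit and the right-hand sides F, G of (7) follow by differentiation. The following symbolic sketch (added here for illustration, not part of the original paper) carries this out for the dynamics and cost of the example treated in Section 2:

```python
# A symbolic sketch (an illustration, not the authors' code) of the regularity
# construction: for a Hamiltonian quadratic in u, the minimizer u0 and the
# right-hand sides F, G of the HEs (7) follow by differentiation. The dynamics
# and Lagrangian below are those of the Section 2 example.
import sympy as sp

x, lam, u, Q, R, xc = sp.symbols('x lambda u Q R x_c', real=True)

f = -x**3 + u                       # dynamics (1) of the Section 2 example
L = Q*(x - xc)**2 + R*u**2          # Lagrangian in the cost (2)
H = L + lam*f                       # Hamiltonian (4)

u0 = sp.solve(sp.diff(H, u), u)[0]  # unique minimizer: u0 = -lambda/(2R)
H0 = sp.expand(H.subs(u, u0))       # 'optimal' Hamiltonian (6)

F = sp.simplify(sp.diff(H0, lam))   # xdot = F(x, lambda) = -x**3 - lambda/(2R)
G = sp.simplify(sp.diff(H0, x))     # lambdadot = -G, G = 2Q(x - xc) - 3*lambda*x**2
print(u0, F, G)
```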
1.2. An exact solution based on invariant imbedding ideas

Two-point boundary-value problems for nonlinear ODEs have received much attention in applied mathematics and numerical analysis, and a variety of methods have been developed to approximate their solution. Invariant imbedding, one of the findings of Richard Bellman, attempts to include the missing final value of the state as a new variable, and the time interval under consideration as a new parameter, into the flow of the original ODE (transition function for control systems), and take advantage of the smooth dependence of the flow on these variables and parameters whenever possible. Since the flow associated with a Cauchy problem is roughly Cⁿ (n ≥ 1) with respect to state variables, initial conditions, and parameters whenever the vector field is Cⁿ (see for instance Abraham and Marsden (1978), Folland (1995) and Sontag (1998)), Bellman extrapolated this result to the boundary-conditions situation and worked with the partial derivatives of the flow to obtain dynamic equations involving the boundary values and the optimization parameters. Here the missing initial value of the costate is added to the variables, and the final penalty coefficient to the parameters. In the Appendix a proof of the arising partial differential equations in dimension one, relating the augmented set of variables and parameters, is presented.

Let us call ρ(T, S) = x^{T,S}(T) = x*(T) the optimal final value of the state x, and σ(T, S) = λ^{T,S}(0) = λ*(0) the optimal initial value of the costate λ, both corresponding to a given (T, S)-optimal control problem. After defining, for the case x̄ = 0,

F(ρ, S) ≜ F(ρ, 2Sρ),   G(ρ, S) ≜ G(ρ, 2Sρ), (9)

and after exploiting the properties of the flow associated with the HEs' two-point boundary-value problem (see the Appendix for the rigorous treatment, and also Bellman and Kalaba (1963) for the motivating idea coming from invariant imbedding), the following equation arises in dimension one (additionally, see Costanza (2007) for a detailed discussion of several examples, including analytical solutions for the linear-quadratic case):

ρρT − (SF + G/2) ρS = ρF, (10)

where ρT and ρS denote the partial derivatives of ρ with respect to the first and second variables, respectively. Expression (10) is a first-order, quasi-linear PDE for ρ that can be integrated independently from σ, with the boundary condition

ρ(0, S) = x0. (11)
In the present context the integration of Eq. (10) for ρ is not only valuable in itself but especially for its relation to the initial
value σ of the costate variable λ. More precisely, if ρ is known or numerically approximated, then σ can be obtained from this information, namely by solving (see the proof in the Appendix)

ρσT − (SF + G/2) σS = 0, (12)

which is another first-order PDE, in this case a linear and homogeneous one, subject to the boundary condition

σ(0, S) = 2Sx0. (13)
Solutions ρ, σ will cover a whole range of parameter values. The usefulness of imbedding an individual problem into a (T, S)-family becomes evident from these solutions, because:

(i) the value ρ of the final state is revealed for each (T, S)-problem without any need to compute its optimal state trajectory, indicating how near to the desired value x̄ the system will really end under each set of parameter values. This allows us to assess or modify the values of both T and S at the modeling/design level;

(ii) the initial value σ of the costate is a measure of the marginal cost ∂V/∂x(0, x0), where V is the value function (or Bellman function) of the problem, namely

V(t, x) ≜ inf_{u(·)} J(t, T, x, u(·)). (14)

Therefore, knowing σ(T, S) allows us to estimate the effect on the total cost of perturbations/uncertainties in the initial state for each parameter set, and so helps in the choice of appropriate parameter values;

(iii) the pseudo-trajectory (ρ(·, S), σ(·, S)) may qualify as a safeguard during computation of the optimal trajectory (x*(t), λ*(t)) needed to control the system via Eq. (8), especially when impulse-type perturbations are potentially conceivable (accidents, air bumps, sudden engine failures, and the like). This topic is still under research.
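Both PDEs can be attacked by the method of characteristics: (10) is quasi-linear, and along the characteristic curves of (10) the homogeneous equation (12) keeps σ constant, so the initial costate comes at no extra cost. A minimal numerical sketch follows (our illustration, assuming the x̄ = 0 regulator with dynamics ẋ = −x³ + u and Lagrangian L = x² + u², so that F(x, λ) = −x³ − λ/2 and G(x, λ) = 2x − 3λx²):

```python
# A minimal sketch (ours, not from the paper) of integrating the PDEs (10) and
# (12) by the method of characteristics, for the x-bar = 0 regulator assumed in
# the lead-in. Along each characteristic of (10), Eq. (12) forces sigma to stay
# constant, so sigma(T, S) is read off without further integration.
import numpy as np
from scipy.integrate import solve_ivp

x0 = 1.0

def F(x, lam):
    return -x**3 - lam/2.0          # xdot = F(x, lambda)

def G(x, lam):
    return 2.0*x - 3.0*lam*x**2     # lambdadot = -G(x, lambda)

def characteristics(s, y):
    # y = (T, S, rho): dT/ds = rho, dS/ds = -(S*F + G/2), drho/ds = rho*F,
    # with F, G evaluated at (rho, 2*S*rho) as in (9).
    T, S, rho = y
    lam = 2.0*S*rho
    return [rho, -(S*F(rho, lam) + G(rho, lam)/2.0), rho*F(rho, lam)]

for S0 in np.linspace(1.0, 60.0, 8):   # one characteristic per boundary point (11), (13)
    sol = solve_ivp(characteristics, [0.0, 3.0], [0.0, S0, x0], rtol=1e-8)
    T, S, rho = sol.y[:, -1]
    sigma = 2.0*S0*x0                  # constant along this characteristic
    print(f"(T, S) = ({T:.3f}, {S:.3f}):  rho = {rho:.4f},  sigma = {sigma:.4f}")
```

Sweeping the launch point S0 traces a family of curves covering a region of the (T, S)-plane, over which ρ and σ can then be interpolated.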
2. A problem with similarities to aircraft routing optimization

The results of the previous section will be applied here to the following initialized dynamics

ẋ = −x³ + u;   x(0) = x0 = 1, (15)

subject to the quadratic optimality criterion

J(u) = ∫₀ᵀ [Q(x(t) − x̄c)² + Ru²(t)] dt + S(x(T) − x̄ℓ)². (16)

This problem, in the absence of a final penalty, for x̄c = 0 and with Q = R = 1, has been studied in Rao and Mease (2000), especially as a hyper-sensitive boundary-value problem (HSBP), since small perturbations seriously affect its numerical solution for long time horizons T. Typical optimal trajectories for x̄c = 0, x̄ℓ ≠ 0 tend to exhibit a three-stage behavior, usually labelled 'take-off' (from x0 towards x̄c), 'cruising' (staying around x̄c), and 'landing' (approaching x̄ℓ). During the first stage the system looks for the natural equilibrium (x̄c, ūc) = (0, 0) favored by the Lagrangian. The state remains around some steady state related to x̄c and x̄ℓ during the second segment, but after some appropriate time it is forced to depart towards the desired final state x̄ℓ (third stage). This kind of performance occurs, for instance, in airport-to-airport routes of a transport aircraft, but similar situations arise in other applications (see Costanza et al. (1996) and Rao and Mease (1999)). The example was chosen mainly for its nonlinearity and low dimensionality, but the treatment here also illustrates that high sensitivity can be significantly counterbalanced, for moderate T values, by (i) recovering information on initial costates, (ii) manipulating the influence of final penalties, and (iii) using the properties of the Hamiltonian along optimal trajectories to improve the accuracy of HE integration.

2.1. Changes of set-point

The change of set-point (or servo) problem will be understood as the one corresponding to Eqs. (15) and (16) with x̄ = x̄c = x̄ℓ = 1.5 (in this section Q = R = 1 will be maintained). It should be interpreted as the wish to steer the system from one equilibrium (in this case (x0, u0) = (1, 1)) towards another (in this case (x̄, ū) = (1.5, 1.5³)) with optimal cost, where the cost functional is completely referred to the target. The exact 'invariant imbedding' approach is also applicable to this problem. After the appropriate modifications in the derivations for the new relevant functions

F(ρ, S) ≜ F(ρ, 2S(ρ − x̄)),   G(ρ, S) ≜ G(ρ, 2S(ρ − x̄)), (17)

the resulting PDEs for the missing boundary conditions are (notice that now x̄ ≠ 0 and appears explicitly in the Lagrangian cost):

(i) for the final state x(T) = ρ:

ρT − [−Sρ³/(ρ − x̄) − S² + 1 − 3ρ²S] ρS = −ρ³ − S(ρ − x̄), (18)
ρ(0, S) = x0 = 1; (19)

(ii) and for the initial costate λ(0) = σ:

σT − [−Sρ³/(ρ − x̄) − S² + 1 − 3ρ²S] σS = 0, (20)
σ(0, S) = 2S(x0 − x̄) = −S. (21)

Figs. 1 and 2 reveal the solutions for ρ and σ in a range of (T, S) values. It can be observed that, in the range of time-horizons T ∈ [0, 1], for small final penalties the attainable final states are far from the target, and for even smaller S-values ρ results below the initial state x0 = 1 despite the target being bigger (x̄ = 1.5). But this can be put right by adopting a bigger value for S. This shows that having the whole family of solutions for a region in the (T, S)-plane is useful to assess, and eventually to modify, the original optimization criterion; i.e. the information concerning ρ, σ is also a design tool.

Fig. 1. Optimal final state for the change of set-point problem.

Fig. 3 shows the influence of adopting increasing values of S on the qualitative behavior of optimal state trajectories, for a fixed time-horizon T = 5, obtained by integrating the HEs with initial conditions x(0) = 1, λ(0) = σ(5, S), i.e.

ẋ = −x³ − λ/2;   x(0) = 1,
λ̇ = −2(x − x̄) + 3λx²;   λ(0) = σ(5, S). (22)
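For illustration (this sketch is ours, not from the paper), once σ(T, S) is available the HEs (22) reduce to an initial-value problem that standard integrators handle online; the numerical values below are those reported for (T, S) = (5, 60) in Section 3:

```python
# A sketch (ours) of integrating the HEs (22) online once the initial costate is
# known. x-bar = 1.5, T = 5 and sigma(5, 60) = -0.57205 are taken from Section 3;
# plain integration is shown (Section 3.1 adds a correction to counteract the
# hyper-sensitivity of these equations).
import numpy as np
from scipy.integrate import solve_ivp

xbar, T, sigma = 1.5, 5.0, -0.57205

def hes(t, y):
    x, lam = y
    return [-x**3 - lam/2.0,                    # xdot = F(x, lam)
            -2.0*(x - xbar) + 3.0*lam*x**2]     # lamdot = -G(x, lam)

sol = solve_ivp(hes, [0.0, T], [1.0, sigma], rtol=1e-10, atol=1e-12)
x_star, lam_star = sol.y
u_star = -lam_star/2.0      # optimal control (8): u* = u0(x*, lam*) = -lam*/2
print(f"x*(T) = {x_star[-1]:.5f}")   # ideally close to rho(5, 60) = 1.40899
```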
Fig. 2. Optimal initial costate for the change of set-point problem.

Fig. 3. Optimal trajectories for different S-values and fixed time-horizon T = 5.

For the value S = 60 the desired target x̄ = 1.5 is effectively reached, as can be seen in the same figure. However, even in this case, the state does not climb to the target monotonically, but first descends to less than 0.8. This is due to the high cost associated with the control action, as can be appreciated in Fig. 4. The equilibrium control corresponding to x(0) = 1 would be u = 1; however, the term Ru² = 1 would cost too much compared with the term Q(x − x̄)² = 0.25. Therefore, the optimal control begins by descending to approximately 0.3, then climbs a little to establish the optimal cruising value for the state (around 0.76), and for t near 3.2 it is forced to ascend by the final penalty, so driving the state to the target x̄. In Fig. 4 an (imagined) strategy for t ≥ T is added: apply the equilibrium control ū = 1.5³ corresponding to x̄, which would produce a smooth landing after a negligible overshoot.

Fig. 4. Control action for the optimal trajectory corresponding to a change of set-point 1 → 1.5; T = 5, S = 60.

2.2. The mixed regulation/servo problem

When x̄c ≠ x̄ℓ in the cost functional (16), the problem may be considered as pursuing a combination of the regulation and change-of-set-point objectives, balanced by the relative weights of the coefficients Q, S (R will be kept equal to 1). During the time period [0, T) the state trajectory should try to stay near the cruising state x̄c, but is compelled to land near x̄ℓ at the final time T. The invariant imbedding equations corresponding to this situation are obtained after the following modifications:

F(x, λ) = −x³ − λ/2,   G(x, λ) = 2Q(x − x̄c) − 3λx², (23)
F(ρ, S) = F(ρ, 2S(ρ − x̄ℓ)) = −ρ³ − S(ρ − x̄ℓ), (24)
G(ρ, S) = G(ρ, 2S(ρ − x̄ℓ)) = 2Q(ρ − x̄c) − 6Sρ²(ρ − x̄ℓ). (25)

The PDEs now take the form

(i) for the final state x(T) = ρ:

ρT − [−Sρ³/(ρ − x̄ℓ) − S² + Q(ρ − x̄c)/(ρ − x̄ℓ) − 3ρ²S] ρS = −ρ³ − S(ρ − x̄ℓ), (26)
ρ(0, S) = x0 = 1; (27)

(ii) and for the initial costate λ(0) = σ:

σT − [−Sρ³/(ρ − x̄ℓ) − S² + Q(ρ − x̄c)/(ρ − x̄ℓ) − 3ρ²S] σS = 0, (28)
σ(0, S) = 2S(x0 − x̄ℓ). (29)
These equations were run for x0 = 1, x̄c = 0, x̄ℓ = 1.1, and a range of (T, S, Q) values. For T = 3, S = 20, the following results were obtained after integration of the corresponding HEs:

Q     x(1.5) → x̄c    x(3.0) → x̄ℓ    σ = λ(0)
1.0   0.337           1.062           0.724
2.0   0.219           0.970           1.408
Therefore, keeping S fixed, for Q < 1 a trajectory cruising higher than 0.337 and landing higher than 1.062 would be expected. Conversely, for Q > 2 smaller cruising and landing altitudes than those reported in the table would be obtained. This is typical of optimal control limitations: not all the isolated summands Ji of the cost objective J = Σi Ji are optimized individually by the control strategy that optimizes the whole J.

3. Some numerical aspects

The numerical integration of the first-order PDEs associated with the invariant imbedding approach is not straightforward (see for instance Bergallo, Costanza, and Neuman (2006)). However, since that integration is performed and validated off-line, it will
be assumed here that the solution can be obtained after enough computational effort, and that it will be reasonably accurate. One way to check this accuracy will be illustrated by the following example. For the set-point-changing problem with (T, S) = (5, 60), the values ρ(T, S) = 1.40899 and σ(T, S) = −0.57205 resulted from the numerical solution of the appropriate PDEs and boundary conditions, namely Eqs. (18)–(21). From these, the final value of the costate may be readily estimated:

λ*(T) = 2S(x*(T) − x̄) = 2S(ρ(T, S) − x̄) = −10.92079. (30)

The Hamiltonian along the optimal trajectory,

H0(x*(t), λ*(t)) = [x*(t) − 1.5]² − λ*(t)[x*(t)]³ − [λ*(t)]²/4, (31)

should be constant; in particular it must be the same at the initial and final times, as checked below:

H0(x*(0), λ*(0)) = H0(1.00000, −0.57205) ≈ 0.74024, (32)
H0(x*(5), λ*(5)) = H0(1.40899, −10.92079) ≈ 0.74024. (33)

Eq. (31) may be used to calculate the Hamiltonian when the HEs (22) are integrated online, and the resulting value compared against the adopted 'constant' (an appropriate average of (32) and (33)).
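This check takes only a few lines; in the sketch below (ours), the small discrepancy in the second evaluation comes from the five-decimal rounding of ρ and σ:

```python
# A quick numerical check (illustrative only) of the constancy test (31)-(33):
# evaluating H0 at the two boundary pairs reported in the text.
def H0(x, lam):
    # Eq. (31): H0(x, lam) = (x - 1.5)**2 - lam*x**3 - lam**2/4
    return (x - 1.5)**2 - lam*x**3 - lam**2/4.0

print(H0(1.00000, -0.57205))    # = 0.74024, cf. Eq. (32)
print(H0(1.40899, -10.92079))   # = 0.7400 with these rounded inputs, cf. Eq. (33)
```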
3.1. Accuracy improvement on optimal trajectories' generation

The symplectic structure (Abraham & Marsden, 1978) of Hamiltonian equations is a source of numerical instability near equilibrium. Besides, in general their nonlinearity induces lack of precision in the numerical solutions for increasing t. When online generation of the optimal trajectory (x*(·), λ*(·)) is adopted, it seems convenient to reinforce the accuracy of standard ODE integrators to prevent these inaccuracies, since the nominal value of the control is constructed directly from this information (8). Here a simple and effective technique with that objective is proposed:

(i) The value of the Hamiltonian H0 being theoretically constant along optimal trajectories implies that its transposed gradient

(∇H0)′ = (∂H0/∂x   ∂H0/∂λ)′ (34)

is orthogonal to the velocity vector v = (ẋ, λ̇)′.

(ii) At each point (x̃(t), λ̃(t)) in the numerical solution to the HEs, the Hamiltonian H0(x̃(t), λ̃(t)) = H̃0(t) may be evaluated from (31), and possibly it will differ from its exact value H0 = H0(x*(t), λ*(t)) ≡ constant. By calling

ΔH(t) ≜ H̃0(t) − H0, (35)
Δz(t) ≜ (x̃(t) − x*(t), λ̃(t) − λ*(t))′, (36)

it becomes clear that Δz(t) will have a normal component in the same direction as sign(ΔH) · [∇H0(x*(t), λ*(t))]′.

(iii) Therefore, it is natural to attempt a correction in the velocity, say Δv(t), in the opposite direction to the normal component of Δz(t) and proportional to |ΔH|, i.e.

Δv = −γ · |ΔH| · sign(ΔH) · (∇H0)′/‖∇H0‖ = −ε · ΔH · (∇H0)′. (37)

This adjustment looks similar to the 'MIT rule' for adaptive control (Åström & Wittenmark, 1989), with ε = γ/‖∇H0‖ ≥ 0 playing the role of a 'relaxation coefficient' to be tuned in each specific problem.

(iv) With the corrected v → v + Δv, the HEs to integrate are now

ẋ = (∂H0/∂λ)′ − ε(ΔH)(∂H0/∂x)′,
λ̇ = −(∂H0/∂x)′ − ε(ΔH)(∂H0/∂λ)′. (38)
Notice that no correction is made when ΔH = 0. Also, the form of Eq. (38) shows that the computational cost of the additional term (37) will be small, since the partial derivatives of H0 need to be calculated only once at each step. Fig. 5 illustrates the differences ΔH(t) for some values of the relaxation coefficient ε, showing that an adequate value in this case would be ε ≈ 0.1.

Fig. 5. Compensation with ε values.
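A sketch (ours) of integrating the corrected HEs (38) for the running example, with ε = 0.1 as suggested by Fig. 5 and the constant of (32) and (33) as reference:

```python
# A sketch (ours) of the corrected HEs (38). The reference constant H0_ref is the
# value 0.74024 adopted in (32)-(33); eps = 0.1 follows the recommendation drawn
# from Fig. 5.
import numpy as np
from scipy.integrate import solve_ivp

xbar, eps, H0_ref = 1.5, 0.1, 0.74024

def H0(x, lam):                          # Eq. (31)
    return (x - xbar)**2 - lam*x**3 - lam**2/4.0

def corrected_hes(t, y):
    x, lam = y
    dH = H0(x, lam) - H0_ref             # DeltaH(t), Eq. (35)
    Hx = 2.0*(x - xbar) - 3.0*lam*x**2   # dH0/dx
    Hlam = -x**3 - lam/2.0               # dH0/dlambda
    # Eq. (38): velocity corrected by -eps * DeltaH * (grad H0)'
    return [Hlam - eps*dH*Hx,
            -Hx - eps*dH*Hlam]

sol = solve_ivp(corrected_hes, [0.0, 5.0], [1.0, -0.57205], rtol=1e-8)
# along the corrected flow d(H0)/dt = -eps * DeltaH * ||grad H0||**2,
# so the Hamiltonian defect DeltaH is driven back towards zero
```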
3.2. Perturbations abatement through optimal tracking

The optimal state solution x*(·) coming from the HEs' integration provides a desired state trajectory, but at some time t the state x(t) of the real system may differ from x*(t) due to perturbations in the signals. To annihilate the effect of these perturbations a corrected control u(·) may be necessary, instead of the formally H-optimal control u*(·) calculated from (8). Assuming that these differences are relatively small, the deviation variables

X(t) ≜ x(t) − x*(t),   U(t) ≜ u(t) − u*(t),   t ∈ [0, T], (39)

must approximately follow a linear dynamics (see Sontag (1998) for details):

Ẋ(t) = ẋ(t) − ẋ*(t) = f(x(t), u(t)) − f(x*(t), u*(t))
     ≈ fx(x*(t), u*(t))X(t) + fu(x*(t), u*(t))U(t) = A(t)X(t) + B(t)U(t). (40)
The control deviation U may then be designed either in a finite-horizon context (typically with the same duration T < ∞ of the original problem) or in an asymptotic tracking fashion.

3.2.1. Finite-horizon optimal tracking

Assuming that the finite-horizon set-up with the same value for T is adopted to treat the optimal tracking problem, a collateral optimization problem arises for the deviations, consisting of the dynamics (40) with

A(t) = −3[x*(t)]²,   B(t) = 1, (41)

subject to some optimality criterion, which may well be

J(U) = ∫₀ᵀ [Q1X²(t) + R1U²(t)] dt + S1X²(T). (42)
The corresponding Riccati differential equation (DRE) for this problem is then (with R1 = 1 for simplicity)

Ṗ = P²(t) − 2A(t)P(t) − Q1, (43)
P(T) = S1, (44)

from whose solution the optimal deviation feedback can be programmed:

U*(t) = −P(t)X(t), (45)

and therefore the real control applied to the system should be, at each time t,

u(t) = u*(t) + U*(t) = u0(x*(t), λ*(t)) − P(t)[x(t) − x*(t)]. (46)
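A compact sketch (ours) of this finite-horizon tracking layer: the nominal trajectory is generated from the HEs (22), the scalar DRE (43) is integrated backwards from P(T) = S1, and the combined control (46) is assembled:

```python
# A sketch (ours) of the tracking layer (43)-(46) for the running example.
import numpy as np
from scipy.integrate import solve_ivp

xbar, T, sigma, Q1, S1 = 1.5, 5.0, -0.57205, 1.0, 60.0

# nominal optimal trajectory from the HEs (22); dense output serves as interpolant
hes = solve_ivp(lambda t, y: [-y[0]**3 - y[1]/2.0,
                              -2.0*(y[0] - xbar) + 3.0*y[1]*y[0]**2],
                [0.0, T], [1.0, sigma], dense_output=True, rtol=1e-10)

def dre(t, P):
    A = -3.0*hes.sol(t)[0]**2        # A(t) = -3 [x*(t)]**2, Eq. (41); B = R1 = 1
    return P**2 - 2.0*A*P - Q1       # Riccati ODE (43)

P = solve_ivp(dre, [T, 0.0], [S1], dense_output=True, rtol=1e-8)  # backwards, P(T) = S1

def u_applied(t, x):
    x_star, lam_star = hes.sol(t)
    return -lam_star/2.0 - P.sol(t)[0]*(x - x_star)   # Eq. (46), with u0 = -lam/2
```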
This tracking procedure was applied to the change of set-point problem studied before, now in the presence of state perturbations, with Q1 = 1, S1 = 60. The tracking performance was very good, as will be seen when comparing against asymptotic tracking (Fig. 7). The main drawback of this method is that P(t) must be calculated off-line, because the DRE (43) needs a priori knowledge of the whole optimal trajectory x*(t), t ∈ [0, T] (see (41) and (44)). This precludes the possibility of computing the whole control strategy in real time.

3.2.2. Asymptotic tracking

Online computation of the control is mandatory, and online computation of the optimal trajectory is highly desirable, so a method that computes u*(t) and its correction along with the real process may be preferred to the DRE. The algebraic Riccati equation (ARE) is studied as an alternative to approximate the DRE. The ARE nominal expression is

0 = 2ĀP̄ + Q1 − P̄², (47)

where Ā would assume some characteristic (constant) value of the A(t) given by the DRE method (41). Then P̄ is the corresponding (positive definite) constant gain that replaces P(t) in the feedback law (45). The value of Ā may also be adapted online. In Fig. 6 an application is shown where Ā is calculated twice: at the beginning, due to the perturbed initial state (resulting clearly in Ā = −3, P̄ ≈ 0.16230), and after a second perturbation was imposed (at t ≈ 2), resulting in a gain that doubles the first one: P̄ ≈ 0.26902.

Fig. 6. Online tracking of the optimal trajectory subject to perturbations in the initial state and an impulse at an intermediate instant, using adaptive ARE, Q1 = 1, S1 = 60.

The relative difference in the tracking performance of the DRE and ARE methods, namely [Jt(U_DRE) − Jt(U_ARE)]/JT(U_DRE), where Jt here means the cumulative cost corresponding to the integration of Eq. (42) over [0, t], is shown in Fig. 7 for different values of the coefficient Q1 ≥ 1. In general, the DRE seems better, as expected (its cost results smaller than the ARE's), but the differences are not significant. Actually, for Q1 = 1 both costs are almost the same, and the ARE even looks better at the end (since the tracking is applied to a linearization of the real deviation, no strict optimality of the DRE is guaranteed). Therefore, tracking through the ARE approach looks like a good alternative for abating perturbations, and it allows us to do so in a completely online scheme.
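In the scalar case the stabilizing solution of (47) is explicit, P̄ = Ā + √(Ā² + Q1), the positive root of P̄² − 2ĀP̄ − Q1 = 0. The following check (ours; the value Ā ≈ −1.724 is back-solved by us merely to match the second quoted gain) reproduces the gains cited for Fig. 6:

```python
# Scalar ARE (47): the stabilizing gain in closed form (our illustration).
import math

def are_gain(A_bar, Q1=1.0):
    # positive root of P**2 - 2*A_bar*P - Q1 = 0; closed loop A_bar - P < 0
    return A_bar + math.sqrt(A_bar**2 + Q1)

print(are_gain(-3.0))      # 0.16228, cf. the quoted gain 0.16230 (A_bar = -3)
print(are_gain(-1.724))    # 0.26903, cf. the second quoted gain 0.26902
```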
Fig. 7. Comparison of DRE and ARE trackings for perturbation abatement.

4. Conclusions

An efficient strategy for treating regular optimal control problems in the nonlinear system context has been presented and illustrated. Hamilton's equations coming from optimal control theory are treated as the central object of this work. They are ordinary differential equations that can be explicitly posed when the Hamiltonian function of a given optimal control problem can be uniquely optimized with respect to u, and when the minimizing control value can be expressed as an explicit function u0 of the remaining variables. The main obstacle to the practical use of these ODEs has always been that, for classical finite-horizon situations,
they generate two-point boundary-value problems, usually difficult and sometimes impossible to solve. The proposed strategy for dynamic optimization consists of the following:

(i) A rigorous approach motivated by invariant imbedding ideas generates a pair of first-order PDEs for the missing boundary conditions ρ, σ. These PDEs should be integrated first, off-line from the process (standard software like Mathematica or MATLAB can handle the situation). Definitive values for T, S are adopted at this stage.

(ii) The solutions to these equations allow us in turn to numerically integrate the original HEs online with the control process, from the initial conditions x(0) = x0, λ(0) = σ(T, S), and to continually construct the optimal value of the manipulated variable u*(t) = u0(x(t), λ(t)) from the state and costate values of the integrated trajectory.

(iii) Perturbations in the physical system are corrected by an optimal tracking component added to the control constructed in the previous step.

The online provision of an accurate value for the state variable is most valuable in practical situations, since some physical states of nonlinear control systems (typically chemical or biological species' concentrations) are hardly measurable at all times. Controlling through the HEs has obvious advantages over optimal control methods based on the Hamilton–Jacobi–Bellman
PDE, previously devised for the bilinear-quadratic problem (known as power series expansion approaches). So the integration of the quasi-linear PDE proposed in this article is worth the effort, above all when mathematically oriented software can render accurate solutions, as has been illustrated in the examples.

Having the values of ρ, σ for a wide range of T, S values may be helpful at the control design level. On one side, the values of T, S can be reconsidered by the designer when acknowledging the final values of the state ρ(T, S) that will be obtained under the present conditions; and if a change in the parameter values is decided, no additional calculations are needed to manage the new situation. Besides, the value of σ(T, S) is an accurate measure of the 'marginal cost' of the process, i.e. it measures how much the optimal cost would change under perturbations, which can also influence the decision on T, S values.

Appendix. PDEs for missing boundary conditions

Let us assume that for each combination of parameters (T, S) the two-point boundary-value problem has a unique solution, varying smoothly with smooth changes in parameters. For the system

ẋ = F(x, λ),
λ̇ = −G(x, λ), (A.1)

it will be assumed that a smooth flow φ : ℝ × O → ℝⁿ × ℝⁿ exists, with O denoting some sufficiently large open set in ℝⁿ × ℝⁿ, such that φ(t, x, λ) will render the values of the state and costate at time t along the trajectory starting (when t = 0) at the generic value (x, λ). The following notation indicates that the two 'components' of the flow, referring to the state (denoted by φ1 below) and the costate (denoted φ2), each in ℝⁿ, will be considered separately:

φ(t, x, λ) = (φ1(t, x, λ), φ2(t, x, λ)), (A.2)

and, accordingly, the following identities become clear:

ρ(T, S) ≜ x^{T,S}(T) = φ1(T, x0, σ(T, S)), (A.3)
σ(T, S) ≜ λ^{T,S}(0), (A.4)
λ^{T,S}(T) = 2Sρ(T, S) = φ2(T, x0, σ(T, S)), (A.5)

where (x^{T,S}(·), λ^{T,S}(·)) is the optimal trajectory corresponding to a fixed-horizon duration T and a fixed penalty weight S, and the dot in (·) stands for the independent time variable t ∈ [0, T]. By taking partial derivatives with respect to T in Eq. (A.3) (the notation D1 = ∂/∂T is adopted, and similarly for other partial derivatives when necessary),

D1ρ(T, S) = D1φ1(T, x0, σ(T, S)) + D3φ1(T, x0, σ(T, S)) D1σ(T, S). (A.6)

Since the existence of the flow implies

D1φ(t, x, λ) = (F(φ(t, x, λ)), −G(φ(t, x, λ))), (A.7)

i.e. it 'verifies' the original ODEs (A.1), then, at the final time T,

D1φ1(T, x0, σ(T, S)) = F(φ(T, x0, σ(T, S))) = F(φ1(T, x0, σ(T, S)), φ2(T, x0, σ(T, S))) = F(x(T), λ(T)) = F(ρ(T, S), 2Sρ(T, S)), (A.8)

which allows us to rewrite Eq. (A.6) simply as (writing φ1λ for D3φ1 and φ2λ for D3φ2 in what follows)

ρT = F + φ1λ σT, (A.9)

with F standing for

F(ρ, S) ≜ F(ρ, 2Sρ). (A.10)

Now, the derivative of Eq. (A.3) with respect to S gives, in the simplified notation,

ρS = φ1λ σS, (A.11)

and repeating the procedure with Eq. (A.5) renders

2SρT = −G + φ2λ σT, (A.12)
2(ρ + SρS) = φ2λ σS, (A.13)

where G is consistently defined as

G(ρ, S) ≜ G(ρ, 2Sρ). (A.14)

Formally, from Eqs. (A.9) and (A.11) φ1λ can be eliminated, and so can φ2λ from Eqs. (A.12) and (A.13), to obtain, respectively,

ρS σT + (F − ρT) σS = 0, (A.15)
2(ρ + SρS) σT − (2SρT + G) σS = 0. (A.16)

Therefore, in order for this linear homogeneous system in (σT, σS) to have a nontrivial solution, its determinant must be null, i.e.

ρρT − (SF + G/2) ρS = ρF, (A.17)

which is a first-order, quasi-linear PDE for ρ that can be integrated independently from σ, with the boundary condition (reflecting a process of duration T = 0)

ρ(0, S) = x0. (A.18)

Also, by performing ρ · [Eq. (A.15)] + σS · [Eq. (A.17)], the relevant relation for the partial derivatives of σ is obtained, namely

ρσT − (SF + G/2) σS = 0, (A.19)

in this case a linear, homogeneous, first-order PDE, subject to the boundary condition

σ(0, S) = 2Sx0. (A.20)
References

Abraham, R., & Marsden, J. E. (1978). Foundations of mechanics (2nd ed.). Reading, Massachusetts: Benjamin/Cummings.
Åström, K. J., & Wittenmark, B. (1989). Adaptive control. Reading, Massachusetts: Addison-Wesley.
Bellman, R., & Kalaba, R. (1963). A note on Hamilton's equations and invariant imbedding. Quarterly of Applied Mathematics, XXI, 166–168.
Bergallo, M., Costanza, V., & Neuman, C. (2006). La ecuación generalizada de Riccati en derivadas parciales. Aplicación al control de reacciones electroquímicas. Mecánica Computacional, XXV, 1565–1580.
Costanza, V. (2005). A variational approach to the control of electrochemical hydrogen reactions. Chemical Engineering Science, 60, 3703–3713.
Costanza, V. (2007). Finding initial costates in finite-horizon nonlinear-quadratic optimal control problems. Optimal Control Applications & Methods. Online early-view publication: http://www3.interscience.wiley.com/cgi-bin/fulltext/116324177/PDFSTART.
Costanza, V., & Neuman, C. E. (2006). Optimal control of nonlinear chemical reactors via an initial-value Hamiltonian problem. Optimal Control Applications & Methods, 27, 41–60.
Costanza, V., Tuncelli, A. C., & Neuman, C. E. (1996). Rotational speed tracking for hydraulic-motor-driven machine-tools. Optimal Control Applications & Methods, 17, 309–327.
Folland, G. B. (1995). Introduction to partial differential equations (2nd ed.). Princeton, NJ: Princeton University Press.
García, R. A., & D'Attellis, C. E. (1995). Trajectory tracking in nonlinear systems via nonlinear reduced observers. International Journal of Control, 62, 685–715.
Kalman, R. E., Falb, P. L., & Arbib, M. A. (1969). Topics in mathematical system theory. New York: McGraw-Hill.
Pontryagin, L. S., Boltyanskii, V. G., Gamkrelidze, R. V., & Mischenko, E. F. (1962). The mathematical theory of optimal processes. New York: Wiley.
Rao, A. V., & Mease, K. D. (1999). Dichotomic basis approach to solving hyper-sensitive optimal control problems. Automatica, 35, 633–642.
Rao, A. V., & Mease, K. D. (2000). Eigenvector approximate dichotomic basis method for solving hyper-sensitive optimal control problems. Optimal Control Applications & Methods, 21, 1–19.
Sontag, E. D. (1998). Mathematical control theory. New York: Springer.
Vicente Costanza was born in Rosario, Argentina, in 1950. He studied Chemical Engineering at the Universidad Tecnológica Nacional in Rosario until 1973, and then at Princeton University, USA, where he obtained his Ph.D. degree in 1980. After a postdoctoral stay at Caltech, Pasadena, he returned to Argentina, where he initiated research work on control of nonlinear systems and graduate teaching in subjects related to applied mathematics at INTEC, Santa Fe, where he holds a full professorship. He has been a visiting professor at ETH-Zürich and EPF-Lausanne, at the Universidad de Buenos Aires, and at other institutions. In 2008 he will be on sabbatical at the Centro de Matemática Aplicada, Universidad Nacional de San Martín, Argentina.

Pablo S. Rivadeneira was born in Ipiales, Colombia, in 1981. He received the Control Engineering degree from the Universidad Nacional de Colombia, Medellín, in 2005. He is now a candidate for the Doctorate in Chemical Technology at the Universidad Nacional del Litoral, Santa Fe, Argentina. His main research interests include mathematical optimization and optimal and geometrical control.