Journal Pre-proof

Robust min-max optimal control design for systems with uncertain models: A neural dynamic programming approach

Mariana Ballesteros, Isaac Chairez, Alexander Poznyak

PII: S0893-6080(20)30018-6
DOI: https://doi.org/10.1016/j.neunet.2020.01.016
Reference: NN 4379

To appear in: Neural Networks

Received date: 18 June 2019
Revised date: 7 January 2020
Accepted date: 14 January 2020

Please cite this article as: M. Ballesteros, I. Chairez and A. Poznyak, Robust min-max optimal control design for systems with uncertain models: A neural dynamic programming approach. Neural Networks (2020), doi: https://doi.org/10.1016/j.neunet.2020.01.016.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Elsevier Ltd. All rights reserved.
Robust Min-Max Optimal Control Design for Systems with Uncertain Models: a Neural Dynamic Programming Approach?
Mariana Ballesteros{a}, Isaac Chairez{b,∗}, Alexander Poznyak{a}

a Department of Automatic Control, CINVESTAV-IPN, Mexico City
b Department of Bioprocesses, UPIBI-Instituto Politécnico Nacional, Mexico City
Abstract
The design of an artificial neural network (ANN) based sub-optimal controller to solve the finite-horizon optimization problem for a class of systems with uncertainties is the main outcome of this study. The optimization problem considers a convex performance index in the Bolza form. The dynamic uncertain restriction is a linear system affected by modeling uncertainties as well as by external bounded perturbations. The proposed controller implements a min-max approach based on the approximate solution given by neural dynamic programming. An ANN approximates the value function in order to estimate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. The explicit adaptive law for the ANN weights is obtained from the approximation of the HJB solution. A stability analysis based on Lyapunov theory confirms that the approximate value function serves as a Lyapunov function candidate and establishes the practical stability of the equilibrium point. A simulation example illustrates the characteristics of the sub-optimal controller. The comparison of the performance indexes obtained with the application of different controllers evaluates the effect of the perturbations and of the sub-optimal solution.
Keywords: Approximate dynamic-programming; Artificial Neural Networks; Hamilton-Jacobi-Bellman equation; Bellman function; Sub-optimal controller.
1. Introduction
1.1. Brief survey
? Mariana Ballesteros is sponsored by a Mexico scholarship from the National Council for Science and Technology (Scholar reference: 550803).
∗ Corresponding author.
Email addresses: [email protected] (Mariana Ballesteros), [email protected] (Isaac Chairez), [email protected] (Alexander Poznyak)
Preprint submitted to Journal of LaTeX Templates. January 26, 2020.

1.1.1. Optimal control for systems with exact mathematical models

Optimal control theory has the main objective of designing the input function for a dynamic system in such a way that a given performance criterion is optimized. Consumed energy, the norm of the tracking error, or a prescribed convergence time, among others, can be formalized as performance criteria. It is widely accepted that Dynamic Programming (DP) and the Maximum Principle (Sage, 1968; Kirk, 2004; Liberzon, 2012) provide the fundamentals to calculate optimal controllers. The DP theory produces optimal controllers presented as closed-loop solutions, a characteristic that makes these controllers particularly useful within the automatic control area. The potential solution coming from an optimal control approach requires three principal parts: 1) an accurate mathematical model of the system to be controlled, 2) the performance criterion (usually known as the cost functional) to be optimized, and 3) the set of state and/or control restrictions for the system (Bryson, 1975). In addition, optimal control problems in most practical applications consider the optimization of the performance index over an infinite horizon (Palanisamy et al., 2015). Infinite-horizon controls are preferred because the gain of the optimal controller depends on the solution of algebraic matrix equations (the Riccati equation being the most common such matrix relationship). Nevertheless, if the optimal control problem considers a finite-time horizon (which corresponds to several real control design problems), time-varying gains normally modify the control action (see, for example, the first four chapters of Poznyak & Boltyanski (2012)). Even in these challenging cases, there are remarkable results in optimal control theory, but the necessity of having the exact mathematical model of the system cannot be neglected. One of the major difficulties in solving optimal control problems relies on the assumption that the precise mathematical model is available, which appears to be unfeasible for most real plants. Indeed, the application of optimal control solutions to diverse real systems is limited by the lack of sufficiently accurate mathematical models.

If the system to be controlled is uncertain and/or affected by external perturbations, the classical optimal control approaches cannot be applied straightforwardly (Bertsekas, 1995). The alternative of robust controllers appeared as an option to deal with modeling uncertainties and the effect of bounded perturbations (see Chapters 5-17 in Poznyak & Boltyanski (2012)).

1.1.2. Robust optimal control for systems with uncertain mathematical models

Robust controllers aim to enforce the existence of a suitable practical equilibrium point that solves the stabilization and trajectory tracking problems for uncertain systems (including the class of systems proposed in this study). There are several theories for designing robust controls, such as: H∞ guaranteed cost control (Xue Anke et al., 2001; Modares et al., 2014b), the attractive ellipsoid method (Azhmyakov, 2011; Poznyak et al., 2014; Azhmyakov et al., 2019), sliding mode control (Utkin et al., 2009; Saad et al., 2019; Edwards & Spurgeon, 1998) and Artificial Neural Networks (ANN) (Yang et al., 2015; Tang et al., 2016; Patan, 2018; Wang et al., 2018), among others. The combination of robust and optimal control theories is not common because they deal with systems having dissimilar characteristics. Under some given conditions on the uncertain system to be regulated, a degree of robustness has been considered in the design of optimal controllers. Usually, these control approaches are known as realizable sub-optimal control methods. Systems with parametric uncertainties, multimodel representations (Poznyak & Boltyanski, 2012), or bounded perturbations (Mulje & R.M.Nagarele, 2016) have been controlled sub-optimally. In general, sub-optimal robust approaches try to compensate the unknown section of the system exactly, but the effect of the uncertainties on the optimization criterion has not been completely researched. Other approaches use the optimal control estimated by DP theory, but assume that the system is not perturbed.
The degree of robustness is evaluated through the variation of the performance index when the uncertain section of the system is neglected. Independently of the uncertainties in the model, the application of the main tools of DP theory leads to the design of the controller gains using the solution of a partial differential equation known as the Hamilton-Jacobi-Bellman (HJB) equation (Bertsekas, 1995). The HJB equation may have an analytic solution only if the system satisfies an exact mathematical representation of the plant and no perturbations affect the dynamics. The HJB equation has a continuous and differentiable solution only in some specific cases. Indeed, the effect of perturbations or uncertainties on the solution of the HJB equation has been analyzed in only a small number of studies (Lewis & Vrabie, 2009; Masmoudi et al., 2011). Furthermore, the stability (in some well-defined sense) of the equilibrium point forced by the application of the sub-optimal controller has not been clearly explained yet. One way to overcome the problem of finding a solution of the HJB equation (maybe not exact, but close enough to it) is the Approximate Dynamic Programming (ADP) methodology (Powell, 2007; Lewis & Liu, 2013; Jiang & Jiang, 2014). This technique, aimed at attaining a justifiable approximation of the HJB solution, uses diverse methods of approximation. For example, in Beard et al. (1997), an infinite-horizon control design was realized based on a suitable approximation of the so-called generalized HJB equation, implementing the discretization technique proposed by Galerkin. An alternative way to obtain a feasible solution of the optimal control problem is the concept of state-dependent Riccati equations (Cloutier & Cockburn, 2001; Çimen, 2008; Heydari & Balakrishnan, 2013). The studies developed in Huang et al. (2000) and Wang et al. (2003) used the method of characteristics, yielding an analytic solution linearly parametrized by some non-unique constant scalars; this approach handles perturbations using non-explicit closed-loop forms. In other studies, the approximation uses ANN structures without internal feedbacks (Heydari & Balakrishnan (2013), Kim et al. (2000), Cheng et al. (2007), Yang & He (2018), among others). The aforementioned method is known as Neuro-Dynamic Programming (NDP), which uses the artificial intelligence technique known as Reinforcement Learning (RL). RL uses iterative algorithms to obtain smaller approximation errors with respect to the optimal solutions, which are also referred to as cost-to-go, Q-factors, or optimal policy. In most such cases, the DP approach suffers from the curse of dimensionality. NDP has been successfully applied to solve optimization problems with a large number of states and/or complex dynamics (Bertsekas & Tsitsiklis, 1996). NDP is a technique that may find optimal or sub-optimal solutions based on the principle of unsupervised or semi-supervised learning techniques. The NDP technique has been used in different applications. In Wang & Qiao (2019), an approximate optimal control design is described for a class of non-affine systems, with application to a torsional pendulum device; a class of robustness is included in the design in Wang (2019) for systems disturbed by uncertain factors.

1.2. Main contribution of the paper

The NDP may apply diverse learning techniques, such as reinforcement learning (RL), adaptive critics, integral RL, as well as the reinforcement Q scheme, which are classified as unsupervised methods (Lewis & Vrabie, 2009; Vamvoudakis et al., 2014; Kiumarsi et al., 2014; S.Sutton & Barto, 2012). In particular, NDP has been used to solve finite-horizon optimal problems in systems affected by admissible
uncertainties (Modares et al., 2014a; Vrabie & Lewis, 2009; Masmoudi et al., 2011; Jiang & Jiang, 2014). Therefore, based on all the aforementioned works, a closed-loop solution of an optimal control problem over a finite horizon can be obtained by the DP approach. However, for the class of systems tackled in this study, a robustness property must be included in the design. Furthermore, considering the uncertainties and the theoretical findings of ADP, and of NDP in particular, ANN can be used for the approximation in this class of systems. The main novelties and contributions of this paper consist of the following aspects:

• the exact HJB equation with the max-min form for the class of quasi-Lipschitz uncertain dynamics is obtained, and the final exact analytic expression for this max-min Hamiltonian is found;

• a different ANN structure for the NDP is suggested to approximate the exact solution of the max-min HJB equation, justifying the suggested finite-series ANN approximation and including the robustness property analysis;
• the learning law for the adjustment of the NDP weights is given in the explicit continuous-time form of ordinary differential equations (ODE);

• the workability analysis (the ultimate boundedness of the approximation error) is realized by Lyapunov-like stability theory, for the characterization of the class of control-design parameters;
• the obtained theoretical results are also confirmed by the presented numerical examples.

1.3. Structure of the paper
The rest of this manuscript is organized as follows:
- Section II details the problem statement;
- Section III defines the specific aspects of the robust-optimal approach; in addition, the Theorem of the min-max HJB equation is described;
- Section IV introduces the ANN topology which establishes the approximation of the value function;
- Section V details the convergence of the obtained procedure; the NDP characterization is summarized in the main Theorem;
- Section VI presents some aspects of the numerical solutions for the NADP, which are analyzed to realize the robust optimal control;
- Section VII closes the paper with some conclusions associated with this work.

2. System Dynamics and Problem Statement

The class of studied uncertain systems satisfies:
$$\dot{x}(t) = \left[A_0 + \Delta A(x,t)\right]x(t) + B_0 u(t) + \Delta B(x,t),\qquad t\in[s,T];\quad x(s)=y. \tag{1}$$
Here, the vector $x\in\mathbb{R}^n$ is the system state and the vector $u\in U_{adm}\subset\mathbb{R}^m$ is the control. The set $U_{adm}$ is the set of admissible controllers, which are piecewise continuous on the interval $[s,T]$. Both
matrices $A_0\in\mathbb{R}^{n\times n}$ and $B_0\in\mathbb{R}^{n\times m}$ are constant and known, and the column rank of $B_0$ equals $m$. The state-associated parametric uncertainty $\Delta A:\mathbb{R}^n\times\mathbb{R}_+\to\mathbb{R}^{n\times n}$ depends on both time and state, while $\Delta B:\mathbb{R}^n\times\mathbb{R}_+\to\mathbb{R}^n$ represents the external perturbations, which are unknown. In this study, the non-perturbed system in (1), that is, with $\Delta A=0$ and $\Delta B=0$, is assumed to be stabilizable. The state-associated uncertain section $\Delta A$ is absolutely bounded (entry by entry), while $\Delta B(x,t)$ is continuous. The class of systems in (1) includes a wide range of potential nonlinear plants. The so-called parametric robustness control design has used the class of systems described by (1) for many years (Yu, 1983; Gu et al., 1990; Qu, 1993). The uncertainty terms $\Delta A$ and $\Delta B$ belong to a given admissible set characterized as follows:
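As a quick illustration of the class (1), the following sketch integrates a two-state instance with explicit Euler steps. All matrices, uncertainty realizations, and the feedback gain are hypothetical placeholders (they are not taken from the paper's simulation example), chosen only so that the bounds of Assumption 1 below are respected.

```python
import numpy as np

# Illustrative simulation of the uncertain system (1); every matrix, bound
# and feedback gain here is a hypothetical placeholder chosen only to
# exercise the structure dx/dt = [A0 + dA(x,t)] x + B0 u + dB(x,t).
A0 = np.array([[0.0, 1.0], [-2.0, -3.0]])   # known nominal matrix
B0 = np.array([[0.0], [1.0]])               # known input matrix, rank m = 1
alpha, beta = 0.1, 0.05                     # uncertainty bounds as in (2)

def dA(x, t):
    # parametric uncertainty with tr{dA.T @ dA} = alpha^2 sin(t)^2 <= alpha^2
    return (alpha * np.sin(t) / np.sqrt(2.0)) * np.eye(2)

def dB(x, t):
    # external perturbation with tr{dB.T @ dB} = beta^2
    return beta * np.array([np.cos(3.0 * t), np.sin(3.0 * t)])

K = np.array([[1.0, 1.0]])                  # placeholder stabilizing feedback

dt, x = 1e-3, np.array([1.0, 0.0])          # initial condition x(s) = y
for k in range(1000):                       # integrate over one second
    t = k * dt
    u = -K @ x                              # piecewise-continuous control
    x = x + dt * ((A0 + dA(x, t)) @ x + (B0 @ u) + dB(x, t))
```

The state remains bounded under the placeholder feedback because the nominal closed loop is Hurwitz and both uncertainty terms satisfy the trace bounds of the admissible set.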
Assumption 1. The set of admissible uncertainties for (4) is:

$$\Psi_{adm} := \left\{\eta = \left[\Delta A(x,t),\ \Delta B(x,t)\right] : \operatorname{tr}\!\left\{\Delta A^{\top}(x,t)\Delta A(x,t)\right\}\le\alpha^2,\ \operatorname{tr}\!\left\{\Delta B^{\top}(x,t)\Delta B(x,t)\right\}\le\beta^2\right\}, \tag{2}$$

where $t\in[s,T]$, $s\ge 0$, for all $x\in\mathbb{R}^n$ and $\eta\in\mathbb{R}^{n\times(n+1)}$.

The selected type of uncertain dynamics included in (1) can describe different plants, such as biotechnological, mechanical, and chemical ones, among others (Spong et al., 2006).

Problem statement. The main aim of this study is to design the controller function $u^*\in U_{adm}\subseteq\mathbb{R}^m$ such that its application to (1) minimizes the proposed performance criterion $J$, subject to the class of uncertainties included in $\Psi_{adm}$. The proposed design method applies the min-max optimization methodology, with the minimization running over the admissible control set $U_{adm}$ and the maximization calculated with respect to the uncertainties. Consequently, the value of $\eta^*\in\Psi_{adm}$ that maximizes the Hamiltonian must be calculated exactly. The controller then includes a given degree of robustness with respect to the bounded perturbations and the parametric uncertainties included in $\Psi_{adm}$.

The cost functional satisfies the so-called Bolza form:

$$J(s,y;u(\cdot)) := h_0(x(T)) + \int_{t=s}^{T} h(x(t),u(t))\,dt. \tag{3}$$

The functional (3) depends on the state, the control signal, and the evolution time (Poznyak & Boltyanski, 2012). The specific structure of the function $h:\mathbb{R}^n\times U_{adm}\to\mathbb{R}$ corresponds to the following sum of quadratic forms:

$$h(x,u) = x^{\top}Qx + u^{\top}Ru.$$

Here $Q\in\mathbb{R}^{n\times n}$ and $R\in\mathbb{R}^{m\times m}$ are symmetric and positive definite matrices. The positive continuous scalar function $h_0:\mathbb{R}^n\to\mathbb{R}$ defines the so-called terminal condition. Therefore, the design of the optimal controller is associated with the following min-max optimization problem for the criterion given in (3):

$$\max_{\eta\in\Psi_{adm}} J(s,y;u(\cdot)) \to \min_{u(\cdot)\in U_{adm}[s,T]},\qquad \text{subject to (1)}.$$
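The Bolza functional (3) with the quadratic stage cost can be approximated on a sampled trajectory by a Riemann sum. A minimal sketch, in which the weighting matrices, terminal cost, and trajectory data are all hypothetical placeholders:

```python
import numpy as np

# Numerical evaluation of the Bolza cost (3) with the quadratic stage cost
# h(x, u) = x' Q x + u' R u on a sampled trajectory.  Q, R, h0 and the
# trajectory below are hypothetical placeholders.
Q = np.diag([2.0, 1.0])          # symmetric positive definite state weight
R = np.array([[0.5]])            # symmetric positive definite control weight
h0 = lambda xT: float(xT @ xT)   # terminal cost h0(x(T)), here ||x(T)||^2

dt = 1e-2
ts = np.arange(0.0, 1.0, dt)
xs = np.stack([np.exp(-ts), -np.exp(-ts)], axis=1)  # sampled state history
us = -xs[:, :1]                                     # sampled control history

def bolza_cost(xs, us, dt):
    # Riemann-sum approximation of  h0(x(T)) + integral of h(x, u) dt
    stage = sum((x @ Q @ x + u @ R @ u) * dt for x, u in zip(xs, us))
    return h0(xs[-1]) + float(stage)

J = bolza_cost(xs, us, dt)
```

In the min-max problem above, this scalar is the quantity that the uncertainty maximizes and the admissible control minimizes.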
3. Robust Optimal Control Design

The mathematical model (1) is equivalent to:

$$\dot{x} = A_0 x + B_0 u + \eta\begin{bmatrix} x\\ 1\end{bmatrix},\qquad x_0 = x(0),\quad t\in[0,T]. \tag{4}$$

Here, we omitted the dependence on time for the sake of simplicity. The application of the DP method entails finding a continuous and smooth enough solution of the HJB equation evaluated on the trajectories of (4):

$$-\frac{\partial V(t,x)}{\partial t} + \max_{u\in U_{adm}}\,\min_{\eta\in\Psi_{adm}} H\!\left(-\frac{\partial V(t,x)}{\partial x},\,x,\,u,\,\eta\right) = 0. \tag{5}$$

Here, the scalar function $V(t,x)$ is called the value function. This proposal is a robustified reformulation of the HJB equation in which the effect of the uncertainties has been taken into account.

Definition 1. The function $V$ is defined for any $(s,y)\in[0,T)\times\mathbb{R}^n$ by:

$$V(s,y) := \min_{u\in U_{adm}}\,\max_{\eta\in\Psi_{adm}} J(s,y;u(\cdot)).$$

The corresponding boundary condition for $V$ is $V(T,y)=h_0(y)$. In (5), the scalar function $H:\mathbb{R}^n\times\mathbb{R}^n\times U_{adm}\times\Psi_{adm}\to\mathbb{R}$ is the Hamiltonian, defined as:

$$H(\rho,x,u,\eta) := \rho^{\top}\dot{x} - h(x,u),$$

where $\rho\in\mathbb{R}^n$ is the adjoint variable. The first step in the optimal control design is finding $\eta^*$ such that the Hamiltonian is minimized over the admissible class of uncertainties for both $\Delta A$ and $\Delta B$. This process solves $\arg\max_{\eta\in\Psi_{adm}}\{J(t,x;u(\cdot))\}$ according to the main theorem introduced in the following subsection. Considering only the terms containing $\eta$, the solution of this part of the problem is:

$$\eta^* = \operatorname*{argmin}_{\eta\in\Psi_{adm}}\left\{-\frac{\partial V(t,x)}{\partial x}^{\top}\eta\begin{bmatrix} x\\ 1\end{bmatrix}\right\}. \tag{6}$$

Here, $\frac{\partial V(t,x)}{\partial x}$ denotes the gradient of the value function $V$ with respect to the state $x$. Equation (6) is equivalent to:

$$\eta^* = \operatorname*{argmin}_{\eta\in\Psi_{adm}}\left\{-\left(\eta^{\top}\frac{\partial V(t,x)}{\partial x},\,\begin{bmatrix} x\\ 1\end{bmatrix}\right)\right\}, \tag{7}$$

where $(\cdot,\cdot)$ represents the inner product in the Euclidean space. The application of the Cauchy-Schwarz inequality to (7) leads to:

$$-\left(\eta^{\top}\frac{\partial V(t,x)}{\partial x},\,\begin{bmatrix} x\\ 1\end{bmatrix}\right) \ge -\left\|\eta^{\top}\frac{\partial V(t,x)}{\partial x}\right\|\left\|\begin{bmatrix} x\\ 1\end{bmatrix}\right\|. \tag{8}$$

Using the triangle inequality on (8) (with an adequate consistent norm in the space of matrices):

$$-\left(\eta^{\top}\frac{\partial V(t,x)}{\partial x},\,\begin{bmatrix} x\\ 1\end{bmatrix}\right) \ge -\left\|\eta^{\top}\right\|_F\left\|\frac{\partial V(t,x)}{\partial x}\right\|\left\|\begin{bmatrix} x\\ 1\end{bmatrix}\right\|,$$
where the norm $\left\|\eta^{\top}\right\|_F = \sqrt{\operatorname{tr}\{\eta\eta^{\top}\}}$ is compatible with the vector norm $\|\cdot\|$, $\operatorname{tr}\{\eta\eta^{\top}\}=\operatorname{tr}\{\eta^{\top}\eta\}$ and $\operatorname{tr}\{\eta^{\top}\eta\}=\operatorname{tr}\{\Delta A^{\top}\Delta A\}+\operatorname{tr}\{\Delta B^{\top}\Delta B\}$. Therefore, using the admissible set of uncertainties $\Psi_{adm}$, one gets:

$$-\left(\eta^{\top}\frac{\partial V(t,x)}{\partial x},\,\begin{bmatrix} x\\ 1\end{bmatrix}\right) \ge -\left\|\frac{\partial V(t,x)}{\partial x}\right\|\sqrt{\phi\left(\|x\|^2+1\right)}, \tag{9}$$

where $\phi = \alpha^2+\beta^2$, with $\alpha$ and $\beta$ given in (2). The inequality (9) can be used to obtain the exact value of $\eta^*$ by taking into account only the equality:

$$\eta^* = \sqrt{\phi}\,\frac{\partial V(t,x)}{\partial x}\begin{bmatrix} x^{\top} & 1\end{bmatrix}\left\|\frac{\partial V(t,x)}{\partial x}\begin{bmatrix} x^{\top} & 1\end{bmatrix}\right\|^{-1}.$$

The sub-optimal unconstrained control $u^*$ is obtained by direct differentiation of the HJB equation with respect to $u$:

$$u^* = -\frac{1}{2}R^{-1}B_0^{\top}\frac{\partial V(t,x)}{\partial x}. \tag{10}$$

Substituting both $u^*$ and $\eta^*$ into the HJB equation leads to:

$$-\frac{\partial V(t,x)}{\partial t} - \frac{\partial V(t,x)}{\partial x}^{\top}A_0x + \frac{1}{4}\frac{\partial V(t,x)}{\partial x}^{\top}B_0R^{-1}B_0^{\top}\frac{\partial V(t,x)}{\partial x} - \|x\|_Q^2 - \sqrt{\phi\left(\|x\|^2+1\right)}\left\|\frac{\partial V(t,x)}{\partial x}\right\| = 0. \tag{11}$$

The solution of the restricted optimization problem for (1) is attained by solving (11). This solution cannot be obtained analytically because of the nonlinear form of (11) with respect to the gradient $\frac{\partial V(t,x)}{\partial x}$ and to $\eta^*$. Nevertheless, Section IV describes an approximation of the solution of (11) based on the application of dynamic ANN.

3.1. Max-min HJB Equation

The DP method leads to sufficient conditions for the optimality of the designed controller selected from $U_{adm}$. This method is supported by the optimality principle proposed by Bellman. The following theorem defines the form of the HJB equation for systems affected by both bounded perturbations and parametric uncertainties.

Theorem 1. Suppose that the function $V(s,y)$ is continuously differentiable with respect to both arguments $t$ and $x$, and assume that it satisfies (11). Then, it is a feasible solution of the terminal optimization problem associated with the partial differential equation (5) (the HJB equation), $(t,x)\in(0,T]\times\mathbb{R}^n$, satisfying the boundary condition $V(T,x)=h_0(x)$.

PROOF. Consider a feasible control $u\in U_{adm}$ and the following representation, supported by Bellman's optimality principle, for the value function $V(t,x)$:

$$V(s,y) = \min_{u\in U_{adm}}\,\max_{\eta\in\Psi_{adm}}\left\{\int_{t=s}^{\hat{s}} h(x,u,t)\,dt + V(\hat{s},x(\hat{s}))\right\},\qquad \forall\,\hat{s}\in[s,T].$$
Then, using the definition of the minimum operation, the following inequality is valid:

$$V(s,y) \le \max_{\eta\in\Psi_{adm}}\left\{\int_{t=s}^{\hat{s}} h(x,u,t)\,dt + V(\hat{s},x(\hat{s}))\right\}.$$

Considering that $\max(a_1+a_2)\le\max(a_1)+\max(a_2)$ if $a_1\ge 0$ and $a_2\ge 0$, the next inequality is satisfied:

$$V(s,y) + \min_{\eta\in\Psi_{adm}}\left\{-\int_{t=s}^{\hat{s}} h(x,u,t)\,dt\right\} + \min_{\eta\in\Psi_{adm}}\left\{-V(\hat{s},x(\hat{s}))\right\} \le 0.$$

Multiplying by $\frac{1}{\hat{s}-s}$ and using the definition of the value function, one may get:

$$\frac{1}{\hat{s}-s}\min_{\eta\in\Psi_{adm}}\left\{V(s,y)-V(\hat{s},x(\hat{s}))\right\} + \min_{\eta\in\Psi_{adm}}\left\{-\frac{1}{\hat{s}-s}\int_{t=s}^{\hat{s}} h(x,u,t)\,dt\right\} \le 0.$$

By the Mean Value Theorem, the following partial differential inequality for the value function takes place:

$$-\frac{\partial V(t,x)}{\partial t} - \frac{\partial V(t,x)}{\partial x}^{\top}\left(A_0x + B_0u + \eta\begin{bmatrix} x\\ 1\end{bmatrix}\right) + \min_{\eta\in\Psi_{adm}}\left\{-h(x,u,t)\right\} \le 0.$$

Therefore:

$$-\frac{\partial V(t,x)}{\partial t} + \max_{u\in U_{adm}}\,\min_{\eta\in\Psi_{adm}} H\!\left(-\frac{\partial V(t,x)}{\partial x},\,x,\,u,\,t\right) \le 0. \tag{12}$$

Consequently, for any positive scalar $\varepsilon>0$ and $s$ close to $\hat{s}$, there exists a control $u\in U_{adm}$, $u(\cdot):=u_{\varepsilon,\hat{s}}(\cdot)\in U_{adm}[s,T]$, satisfying:

$$\max_{\eta\in\Psi_{adm}}\left\{\int_{t=s}^{\hat{s}} h(x,u,t)\,dt + V(\hat{s},x_{\hat{s}})\right\} \le V(s,y) + \varepsilon(\hat{s}-s).$$

The value function evaluated at $(s,y)$ is independent of the perturbations; then, the following inequality is valid:

$$\frac{1}{\hat{s}-s}\left[\max_{\eta\in\Psi_{adm}}\left\{\int_{t=s}^{\hat{s}} h(x,u,t)\,dt + V(\hat{s},x_{\hat{s}})\right\} - V(s,y)\right] \le \varepsilon. \tag{13}$$

The left-hand side of (13) admits a lower bound determined by the minimum operation instead of the maximum; hence, by the fundamental theorem of calculus, (13) can be replaced by:

$$-\varepsilon \le \frac{1}{\hat{s}-s}\int_{t=s}^{\hat{s}}\left[\min_{\eta\in\Psi_{adm}} H\!\left(-\frac{\partial V(t,x)}{\partial x},\,x,\,u,\,\eta,\,t\right) - \frac{\partial V(t,x)}{\partial t}\right] dt.$$

Taking the maximum on the right-hand side with respect to the elements of the admissible control set yields:

$$-\varepsilon \le \frac{1}{\hat{s}-s}\int_{t=s}^{\hat{s}}\left[\max_{u\in U_{adm}}\,\min_{\eta\in\Psi_{adm}} H\!\left(-\frac{\partial V(t,x)}{\partial x},\,x,\,u,\,\eta,\,t\right) - \frac{\partial V(t,x)}{\partial t}\right] dt.$$

Assume that $\hat{s}\downarrow s$; then:

$$-\varepsilon \le \max_{u\in U_{adm}}\,\min_{\eta\in\Psi_{adm}} H\!\left(-\frac{\partial V(t,x)}{\partial x},\,x,\,u,\,\eta,\,t\right) - \frac{\partial V(t,x)}{\partial t}. \tag{14}$$

From (12) and (14), when $\varepsilon\to 0$, the Theorem is proven.
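The closed forms for $\eta^*$ and $u^*$ obtained in this section can be evaluated numerically once a value-function gradient is available. A sketch under stated assumptions: the gradient `dV`, the matrices `B0`, `R`, and the bounds are hypothetical placeholders, not the paper's example data.

```python
import numpy as np

# Sketch of the worst-case uncertainty eta* and the sub-optimal control u*
# for a GIVEN value-function gradient dV (a hypothetical placeholder; in
# the paper dV comes from the ANN approximation of Section IV).
B0 = np.array([[0.0], [1.0]])
R = np.array([[0.5]])
alpha, beta = 0.1, 0.05
phi = alpha**2 + beta**2                  # phi = alpha^2 + beta^2 as in (9)

x = np.array([1.0, -0.5])
dV = np.array([2.0, 1.0])                 # placeholder gradient dV/dx

# u* = -(1/2) R^{-1} B0' dV   (equation (10))
u_star = -0.5 * np.linalg.solve(R, B0.T @ dV)

# eta* saturates the Frobenius ball of radius sqrt(phi):
# eta* = sqrt(phi) * dV [x' 1] / || dV [x' 1] ||_F
M = np.outer(dV, np.append(x, 1.0))       # rank-one matrix dV [x' 1]
eta_star = np.sqrt(phi) * M / np.linalg.norm(M, 'fro')
```

By construction, the Frobenius norm of `eta_star` equals the uncertainty bound, which is exactly the worst case admitted by Assumption 1.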
4. Approximate solution of the HJB equation

The approximate solution uses an ANN structure that considers the specific properties of the value function (continuity, positiveness, and differentiability at least with respect to its first component). This strategy has been used in diverse control and estimation problems, as in (Vrabie & Lewis, 2009; Beard et al., 1998; Wang et al., 2014), among others. ANN are recognized for their ability to approximate nonlinear continuous functions, at least locally (Hornik et al., 1989). This characteristic justifies the approximation of vector-valued static maps, which can be time dependent as well as multivariable. The approximation properties of ANN have been studied in several works, a fact this study builds on. For example, the ANN structure needed to achieve a local approximation of a nonlinear mapping depends on the basis of activation functions, the number of layers, the configuration, and, in general, the complexity of the net. In Barron (1993), approximation bounds are derived for a class of one-layer ANN with n nodes and sigmoidal activation functions. In Kainen et al. (2012), the role of the input dimension is considered in the estimation of bounds on the approximation errors, for an ANN structure with perceptron and Gaussian radial computational units. The quality of the approximation remains a research interest: in Guliyev & Ismailov (2018), the bound on the approximation error of a single-hidden-layer feedforward ANN with fixed weights is analyzed. Based on all the aforementioned properties, ANN can be implemented as feasible approximations of the trajectories corresponding to well-defined solutions of either ordinary or partial differential equations. Several variants of such approximations have been proposed in the last decade to calculate approximate solutions of HJB equations.

Commonly, these approximate solutions did not meet the requested restrictions of the HJB equations. Moreover, the approximation ability of an ANN is only justified if the number of basis components grows unbounded. This part of the article considers two main open problems in the ANN approximation of the HJB equation: first, how to justify correctly the properties requested in the approximation; and second, how to characterize the effect of limiting the number of components in the basis associated with the network topology.

4.1. Weights adjustment for the ANN

The feasible approximation associated with (11) is given by:

$$V(t,x) = V_a(t,x) + \tilde{V}(t,x). \tag{15}$$

Here, $V_a(t,x)$ represents an approximate solution associated with the HJB equation through the following ANN structure:

$$V_a(x,t) = \omega(t)\sigma(x) + x^{\top}P(t)x, \tag{16}$$

and $\tilde{V}:\mathbb{R}_+\times\mathbb{R}^n\to\mathbb{R}_+$ corresponds to the approximation error produced by the finite number of activation functions in the ANN structure. In (16), the matrix $P\in\mathbb{R}^{n\times n}$ is positive definite and bounded uniformly with respect to $t$. The structure of $\omega\in\mathbb{R}_+$ and $\sigma\in\mathbb{R}_+$ is proposed as follows:

$$\omega(t) = \tilde{\omega}^{\top}(t)\tilde{\omega}(t),\qquad \sigma(x) = \tilde{\sigma}^{\top}(x)\tilde{\sigma}(x).$$
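A minimal sketch of the structure (16), using the squared parameterizations above and sigmoid units in the sense of Definition 2 below. The weight vector, sigmoid centers, and matrix P are hypothetical placeholders (the shift c is set to zero here so that each unit is non-negative).

```python
import numpy as np

# Minimal sketch of the approximate value function (16),
#   Va(x, t) = omega(t) * sigma(x) + x' P(t) x,
# with omega = w~' w~ and sigma = s~' s~ enforcing positivity.  The weight
# vector, sigmoid centers and P are hypothetical placeholders.
r, n = 4, 2
w_tilde = np.array([0.3, -0.2, 0.5, 0.1])          # weight vector in R^r

def sig(x, g=1.0, c=0.0, zeta=None, center=None):
    # one sigmoid unit: g / (1 + exp(-zeta'(x - center))) + c
    return g / (1.0 + np.exp(-(zeta @ (x - center)))) + c

centers = np.linspace(-1.0, 1.0, r)
def sigma_tilde(x):
    # vector of r independent sigmoid activations
    return np.array([sig(x, zeta=np.ones(n), center=c * np.ones(n))
                     for c in centers])

def Va(x, P):
    omega = float(w_tilde @ w_tilde)                # omega >= 0
    sigma = float(sigma_tilde(x) @ sigma_tilde(x))  # sigma >= 0
    return omega * sigma + x @ P @ x

P = np.eye(n)                                       # positive definite P(t)
v = Va(np.array([0.5, -0.5]), P)
```

Because both factors of the first term are sums of squares and P is positive definite, the approximation is positive away from the origin, as required of a value-function candidate.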
The function $\tilde{\omega}:\mathbb{R}_+\to\mathbb{R}^r$ corresponds to the weights vector and $\tilde{\sigma}:\mathbb{R}^n\to\mathbb{R}^r$ is an ANN activation vector based on independent sigmoid activation functions. This particular structure ensures the positiveness of the approximate value function. For this study, the sigmoid function satisfies the following definition:

Definition 2. The sigmoid component in the ANN topology is defined as follows:

$$\operatorname{sig}(x) = g\left(1+\exp\left(-\varsigma^{\top}(x-\varphi)\right)\right)^{-1} + c.$$

This function is a bounded and differentiable real-valued scalar function, defined for all possible real-valued arguments, with non-negative derivative. Here, $x\in\mathbb{R}^n$ comes from the system (1), $g\in\mathbb{R}$, $c$ is the shifting scalar, and $\varsigma\in\mathbb{R}^n$ and $\varphi\in\mathbb{R}^n$ are real-valued constant vectors.

The following assumptions are proposed for the approximated value function.

Assumption 2. The local approximation error (Cybenko, 1989) $\tilde{V}(t,x)$ corresponding to the given ANN-based approximate HJB solution is absolutely bounded, that is:

$$\left|\tilde{V}(t,x)\right| \le V^+(N).$$

Here $V^+\in\mathbb{R}_+$ and $N$ is the number of activation functions. Notice that $V^+(N)$ corresponds to a monotonically decreasing function with respect to the given number $N$ of elements in the ANN. Therefore, increasing the number of activation functions can decrease $|\tilde{V}(t,x)|$ to any arbitrarily small constant.

Assumption 3. The proposed formal connection between the approximated HJB function $V_a(t,x)$ and the approximation error function $\tilde{V}(t,x)$ satisfies the inequality $V^+(N)\ll\left|V_a(t,x)\right|$.

Therefore, the HJB equation based on the approximated function $V_a(t,x)$, with the approximation error function $\tilde{V}(t,x)\to 0$, is governed by:

$$-x^{\top}\dot{P}(t)x - \dot{\omega}(t)\sigma(x) - \omega(t)\nabla^{\top}\sigma(x)A_0x - 2x^{\top}P(t)A_0x + \frac{1}{4}\left(\omega(t)\nabla^{\top}\sigma(x)+2x^{\top}P(t)\right)B_0R^{-1}B_0^{\top}\left(\omega(t)\nabla\sigma(x)+2P(t)x\right) - \sqrt{\phi\left(\|x\|^2+1\right)}\left\|\omega(t)\nabla\sigma(x)+2P(t)x\right\| - \|x\|_Q^2 = 0. \tag{17}$$

Here, the mathematical operator $\nabla$ corresponds to the gradient $\frac{\partial}{\partial x}$ of the given scalar function of the vector $x$. Equation (17) can be split into two time-dependent differential equations by rewriting the quadratic terms as $x^{\top}\operatorname{Ric}(P)x$,

$$\operatorname{Ric}(P) = -\dot{P}(t) - \left(P(t)A_0 + A_0^{\top}P(t)\right) + P(t)B_0R^{-1}B_0^{\top}P(t) - Q - \frac{4\sqrt{\phi\left(\|x\|^2+1\right)}}{z(\omega(t),P)}\,P^2(t), \tag{18}$$

and the elements $\tilde{\omega}^{\top}\tilde{W}$, where

$$\tilde{W} = -\frac{d\tilde{\omega}(t)}{dt} + \frac{\tilde{\omega}(t)\nabla^{\top}\sigma(x)}{2\sigma(x)}\left[-A_0x + \frac{1}{4}\omega(t)B_0R^{-1}B_0^{\top}\nabla\sigma(x) + B_0R^{-1}B_0^{\top}2P(t)x - \frac{4\sqrt{\phi\left(\|x\|^2+1\right)}}{z(\omega,P)}\,P(t)x - \frac{\omega(t)\sqrt{\phi\left(\|x\|^2+1\right)}}{z(\omega,P)}\,\nabla\sigma(x)\right], \tag{19}$$
with $z(\omega(t),P) = \left\|\omega(t)\nabla\sigma(x)+2P(t)x\right\|$, considering $\tilde{W}=0$ and $\operatorname{Ric}(P)=0$. The simultaneous on-line numerical solution of (18) and (19), together with the numerical solution for the initial conditions, gives an approximate numerical solution of the HJB equation. The numerical procedure is described in Section VI.

4.2. Characterization of the approximation quality

The DP equation under the approximation of the value function can be summarized in the following theorem:
Theorem 2. Consider the function given in Definition 1 and equation (15) for $(s,y)\in[0,T)\times\mathbb{R}^n$. Then, the following relation holds:

$$V_a(s,y) = \min_{u\in U_{adm}}\,\max_{\eta\in\Psi_{adm}}\left\{\int_{t=s}^{\hat{s}} h(x,u,t)\,dt + V(\hat{s},x(\hat{s}))\right\} - \tilde{V}(s,y),\qquad \forall\,\hat{s}\in[s,T]. \tag{20}$$

PROOF. Taking into account Definition 1, for any $u(\cdot)\in U_{adm}[s,T]$:

$$V_a(s,y) \le \max_{\eta\in\Psi_{adm}}\left\{\int_{t=s}^{\hat{s}} h(x,u,t)\,dt + J(\hat{s},x(\hat{s});u(\cdot))\right\} - \tilde{V}(s,y). \tag{21}$$

Denoting the right-hand side of inequality (21), except the approximation error function, by $\bar{V}(s,y)$ and applying the minimum operator over the set $U_{adm}[s,T]$ with respect to $u(\cdot)$, it follows that:

$$V_a(s,y) - V^+(N) \le \bar{V}(s,y). \tag{22}$$

Notice that, for a given $\kappa>0$, there is an admissible control $u_{\kappa}(\cdot)\in U_{adm}[s,T]$ such that, with $x_{\kappa}(\cdot) := x(\cdot,s,y,u_{\kappa}(\cdot),\eta^*)$:

$$\max_{\eta\in\Psi_{adm}}\left\{J(s,y;u_{\kappa}(\cdot))\right\} - \tilde{V}(s,y) \le V_a(s,y) + \kappa.$$

Therefore, the following inequality holds:

$$V_a(s,y) + \kappa + V^+(N) \ge \bar{V}(s,y). \tag{23}$$

Letting $\kappa\to 0$, the aggregation of the inequalities proposed in (22) and (23) implies the result (20) of Theorem 2, assuming that $u^*$ corresponds to the sub-optimal admissible controller. Indeed, if $V^+\to 0$, $u^*$ is the optimal control.

4.3. Exact solution for the optimal controller design based on ANN

The ANN structure proposed in (16) serves to prove that the exact result for the optimal controller is attainable if the number of neurons in the ANN layer tends to infinity (Lewis & Vrabie, 2009; Vrabie & Lewis, 2009), because $V^+(N)$ is a decreasing function of its only argument. Then, using

$$\limsup_{N\to\infty}\; V_a(s,y) + V^+(N) \ge \bar{V}(s,y),$$

it is possible to justify that $V_a(s,y)\ge\bar{V}(s,y)$, which yields the optimal control $u^*$ as $N\to\infty$.
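The simultaneous on-line solution of (18) and (19) can be organized as an explicit Euler co-integration of a matrix ODE and a weight ODE. The sketch below mirrors only that numerical structure; its right-hand sides are simplified stand-ins (a classical differential-Riccati flow and a contracting weight law), not the paper's exact expressions (18)-(19).

```python
import numpy as np

# Schematic on-line co-integration of a Riccati equation and a weight
# learning law by explicit Euler steps, mirroring how (18) and (19) are
# solved simultaneously.  The right-hand sides are simplified stand-ins,
# NOT the paper's exact expressions.
A0 = np.array([[0.0, 1.0], [-2.0, -3.0]])
B0 = np.array([[0.0], [1.0]])
Rinv = np.array([[2.0]])
Qm = np.eye(2)
Gamma = 1.5                                  # hypothetical learning gain

def riccati_rhs(P):
    # classical differential-Riccati structure standing in for (18)
    return A0.T @ P + P @ A0 + Qm - 2.0 * P @ B0 @ Rinv @ B0.T @ P

def weight_rhs(w):
    # contracting placeholder for the learning law (19) with W~ = 0
    return -Gamma * w

dt, P, w = 1e-3, np.eye(2), np.array([0.5, -0.3, 0.2, 0.1])
for _ in range(500):                         # integrate over 0.5 s
    P = P + dt * riccati_rhs(P)
    P = 0.5 * (P + P.T)                      # keep the iterate symmetric
    w = w + dt * weight_rhs(w)
```

The explicit symmetrization step is a standard numerical safeguard: Euler steps of a Riccati flow can otherwise accumulate asymmetry from floating-point error.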
5. Practical stability analysis for the sub-optimal control solution The presence of uncertainties ∆A and ∆B has an impact on the value estimation for the performance index (3) evaluated over the trajectories of the nonlinear system (1). Therefore, a robustness analysis for the state trajectories is requested. The stability analysis for the equilibrium point of (1) is based on the Lyapunov stability method.
pro of
Proposition 1. Consider the nonlinear uncertain system given in (1) with the sub-optimal controller introduced in (10). Suppose that Assumption 1 holds. If there exists a bounded, symmetric and positive definite solution $P$ ($P^- \le P(t) \le P^+$, with $P^-, P^+ \in \mathbb{R}^{n\times n}$ positive definite matrices) of the time-varying Riccati matrix equation ($\Lambda_1 > 0$, $\Lambda_1 \in \mathbb{R}^{n\times n}$)
$$-\frac{d}{dt}P(t) - P(t)A_0 - A_0^\top P(t) + 2P(t)B_0R^{-1}B_0^\top P(t) - P(t)\Lambda_1 P(t) - Q_2 = Q_4(t), \qquad (24)$$
with $Q_4 : \mathbb{R}^+ \to \mathbb{R}^{n\times n}$ a time-dependent positive definite matrix, and if the weights of the ANN approximation (16), governed by (19), are bounded, $\omega \le \omega^+$ for all $t \ge 0$, with $\sigma(x) \ne 0$ for all $x \ne 0$, then the origin is a uniformly practically stable equilibrium point of (1) (Haddad & Chellaboina, 2011) with the ultimate bound
$$\beta = \frac{\varepsilon_0}{\sup_{t\ge 0}\lambda_{\max}\{Q_4(t)\}}. \qquad (25)$$

The proof of this proposition is given in the Appendix. Proposition 1 states sufficient conditions only. The boundedness of the stable trajectories under the robust optimal closed-loop control depends on the existence of a positive solution of (18). Establishing necessary conditions for this proposition requires further research (a topic beyond the scope of this study); the proposed method, built on a Lyapunov-like stability analysis, provides only sufficient arguments for the main result.

6. On the numerical realization of ROC

The solution of the proposed class of robust sub-optimal controllers requires the implementation of a special class of recurrent algorithm (Algorithm 1). The algorithm is a routine that checks the terminal condition for the approximated Value function and adjusts the weight parameters associated with the learning laws. The numerical method used a mixed strategy executing the evolution of the states driven by the sub-optimal controller. Each sequence was evaluated by adjusting only the initial conditions for the weights and for the matrix $P$ estimated as the solution of the Riccati differential equation. The parameters $\mu$, $\Gamma_1$ and $\Gamma_2$ are real positive constants, the value $V_0$ is chosen to satisfy $V_0 > \mu$, and $P_a \in \mathbb{R}^{4\times4}$ and $w_a \in \mathbb{R}^4$ are the initial conditions for the first iteration ($\omega_o$ and $P_o$). In Figure 1, the depicted scheme represents the interaction between the numerical solution and the Simulink® simulation of the model. Notice that the numerical evaluation of each controller attained by the implementation of the ANN approximation is realized in Simulink®, while the external routines executed in m-language® supervise and update the weight values.
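The time-varying Riccati condition above has to be produced numerically. The sketch below integrates the nominal differential Riccati equation backward in time with a classic RK4 scheme, using the example matrices $A_0$, $B_0$ of Section 7; the robustness terms $\Lambda_1$, $Q_2$, $Q_4$ of (24) are dropped here, and the weights $Q$, $R$, $S$ are illustrative assumptions, so this is the unperturbed counterpart of (24), not the paper's exact equation:

```python
import numpy as np

# Backward integration of the nominal differential Riccati equation
#   -dP/dt = P A0 + A0^T P - P B0 R^-1 B0^T P + Q,   P(T) = S.
A0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [-8.0, -4.0, -12.0]])
B0 = 0.34 * np.array([[0.0, 0.0], [0.0, 0.0], [7.0, 5.0]])
Q = np.eye(3); R = np.eye(2); S = np.eye(3)       # illustrative weights, P(T) = S

def riccati_rhs(P):
    # dP/dt of the backward equation, written in forward time
    return -(P @ A0 + A0.T @ P - P @ B0 @ np.linalg.solve(R, B0.T) @ P + Q)

def integrate_backward(T=5.0, steps=5000):
    """Classic RK4 from t = T down to t = 0; returns P(0)."""
    h = -T / steps                                 # negative step: backward in time
    P = S.copy()
    for _ in range(steps):
        k1 = riccati_rhs(P)
        k2 = riccati_rhs(P + 0.5 * h * k1)
        k3 = riccati_rhs(P + 0.5 * h * k2)
        k4 = riccati_rhs(P + h * k3)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return 0.5 * (P + P.T)                         # enforce symmetry numerically

P0 = integrate_backward()
```

For a stabilizable pair and positive definite $Q$, the backward solution stays symmetric, bounded and positive definite, which is the kind of bounded solution $P^- \le P(t) \le P^+$ the proposition requires.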
7. Numerical simulations
Algorithm 1 Recurrent algorithm to adjust the initial parameters of the learning laws
1: Start
2: Initialization: $i \leftarrow 0$; $\omega_0(0) \leftarrow w_a$; $P_i(0) \leftarrow P_a$; $V_{i+1}(0, z(0)) \leftarrow V_i(T, z(T))$;
3: Set $\mu$, $\Gamma_1$, $\Gamma_2$;
4: while $V_i(T, z(T)) \ge \mu$ do
5:   $i \leftarrow i + 1$;
6:   $u_i^* = -\frac{1}{2} R^{-1} B_0^\top \frac{\partial}{\partial x} V_i(t, x)$;
7:   $P_{i+1}(0) = P_i(T) + \Gamma_1 P_a$;
8:   $\omega_{i+1}(0) = \omega_i(0) + \Gamma_2 \frac{d}{d\omega_i} V_i(t, x)$;
9:   Run the simulation of $\frac{d}{dt}x$;
10:  $V_{i+1}(0, z(0)) = V_i(T, z(T))$;
11: end while
12: $u^* = u_i^*$
13: Stop
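A minimal executable sketch of the loop in Algorithm 1 on a toy scalar plant follows; the plant $\dot{x} = ax + bu$, the two-term value-function model $V(x) = p\,x^2 + w\tanh^2(x)$ and the multiplicative shrinking of the initial parameters are illustrative assumptions, not the paper's exact learning laws:

```python
import numpy as np

def run_episode(p, w, a=-1.0, b=1.0, R=1.0, x0=2.0, T=3.0, dt=2e-3):
    """One closed-loop episode with u* = -(1/2) R^-1 b dV/dx (step 6 of
    Algorithm 1); Euler integration plays the role of step 9."""
    x = x0
    for _ in range(int(T / dt)):
        dVdx = 2.0 * p * x + 2.0 * w * np.tanh(x) * (1.0 - np.tanh(x) ** 2)
        u = -0.5 * (1.0 / R) * b * dVdx
        x += dt * (a * x + b * u)
    return p * x ** 2 + w * np.tanh(x) ** 2        # terminal value V_i(T, x(T))

def recurrent_tuning(p0=1.0, w0=1.0, mu=1e-8, gamma=0.1, max_iter=200):
    """Repeat episodes, adjusting the initial parameters (steps 7-8, here a
    simple multiplicative shrink) until the terminal value drops below mu
    (step 4)."""
    p, w, i = p0, w0, 0
    V_T = run_episode(p, w)
    while V_T >= mu and i < max_iter:
        i += 1
        p *= 1.0 - gamma
        w *= 1.0 - gamma
        V_T = run_episode(p, w)                    # step 10: new terminal value
    return p, w, V_T, i

p, w, V_T, iters = recurrent_tuning()
```

The value-function model is nonnegative by construction (quadratic term plus a squared sigmoidal term), mirroring the positivity requirement on the Value function stated in the paper.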
To analyse the performance of the proposed sub-optimal controller algorithm, this study considered the following academic example. The nominal matrices $A_0$ and $B_0$ of the system were:
$$A_0 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -8 & -4 & -12 \end{pmatrix}, \qquad B_0 = 0.34\begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 7 & 5 \end{pmatrix}.$$
The uncertain matrices were:
$$\Delta A = 0.8\begin{pmatrix} \sin(100t) & \sin(10000t) & 0.6\sin(10000t) \\ \cos(69t) & \sin(90t) & 0.3\sin(854t) \\ 0.5\sin(100t) & 0.4\cos(972t) & 0.6\cos(80t) \end{pmatrix},$$
$$\Delta B = \begin{pmatrix} 0.245\sin(10t) & \cos(15t) \\ \sin(0.3t) & 0.157\cos(5t) \\ \sin(0.3t) & 0.245\cos(0.2t) \end{pmatrix}.$$
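The bounded, time-varying character of these perturbations can be checked numerically. The sketch below samples $\Delta A(t)$ and $\Delta B(t)$ over the simulation horizon and records the largest spectral norms; the entry arrangement of the matrices is a best-effort reading of the printed example and is therefore an assumption:

```python
import numpy as np

# Sampling the example's time-varying uncertainty matrices; the placement of
# the entries is assumed, reconstructed from the printed example.
def delta_A(t):
    return 0.8 * np.array([
        [np.sin(100 * t),       np.sin(10000 * t),   0.6 * np.sin(10000 * t)],
        [np.cos(69 * t),        np.sin(90 * t),      0.3 * np.sin(854 * t)],
        [0.5 * np.sin(100 * t), 0.4 * np.cos(972 * t), 0.6 * np.cos(80 * t)]])

def delta_B(t):
    return np.array([
        [0.245 * np.sin(10 * t), np.cos(15 * t)],
        [np.sin(0.3 * t),        0.157 * np.cos(5 * t)],
        [np.sin(0.3 * t),        0.245 * np.cos(0.2 * t)]])

# largest spectral norms over a grid covering the 30 s simulation horizon
ts = np.linspace(0.0, 30.0, 3001)
max_dA = max(np.linalg.norm(delta_A(t), 2) for t in ts)
max_dB = max(np.linalg.norm(delta_B(t), 2) for t in ts)
```

Every entry is a bounded sinusoid, so both norms stay below fixed constants for all $t$, which is exactly the admissibility requirement on the set $\Psi_{adm}$.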
The matrices $\Delta A$ and $\Delta B$ belong to the given set $\Psi_{adm}$. The bounds for the uncertain matrices are $\alpha = 1.00$ and $\beta = 0.13$. The proposed ANN was simulated using the weight vector $\omega = [\omega_{1b}\ \omega_{2b}\ \omega_{3b}\ \omega_{4b}\ \omega_{5b}\ \omega_{6b}]^\top$. A simple validation scheme compared the states obtained with the robust sub-optimal controller against the numerical result achieved with the pole-placement technique. In this numerical case, the desired pole vector was $[-23.68,\ -0.15 + 0.8j,\ -0.15 - 0.8j]^\top$, which leads to the control gain
$$K = \begin{pmatrix} 2.17 & -1.08 & -3.25 \\ -1.66 & -0.83 & -2.49 \end{pmatrix}.$$
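The pole-placement baseline can be reproduced with Ackermann's formula. Under the matrix reconstruction used here, $B_0$ has rank one (its two columns are collinear), so the placement effectively acts through a single input direction; the gain computed below is therefore a single-input equivalent and is not expected to match the printed $2\times 3$ gain $K$, which distributes the same action over both inputs:

```python
import numpy as np

def acker(A, b, poles):
    """Ackermann's formula for single-input pole placement, u = -K x."""
    n = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
    assert np.linalg.matrix_rank(ctrb) == n, "(A, b) must be controllable"
    coeffs = np.real(np.poly(poles))               # desired characteristic polynomial
    phiA = sum(c * np.linalg.matrix_power(A, n - i)
               for i, c in enumerate(coeffs))      # phi(A)
    e_last = np.zeros((1, n)); e_last[0, -1] = 1.0
    return e_last @ np.linalg.solve(ctrb, phiA)    # K = e_n^T C^{-1} phi(A)

A0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [-8.0, -4.0, -12.0]])
B0 = 0.34 * np.array([[0.0, 0.0], [0.0, 0.0], [7.0, 5.0]])
poles = np.array([-23.68, -0.15 + 0.8j, -0.15 - 0.8j])

b1 = B0[:, :1]                                     # the rank-one input direction
K1 = acker(A0, b1, poles)
achieved = np.sort_complex(np.linalg.eigvals(A0 - b1 @ K1))
```

The closed-loop eigenvalues of $A_0 - b_1 K_1$ reproduce the desired pole vector of the example.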
The comparison of the state trajectories under both controllers (the ANN-based sub-optimal one and the pole-placement one) appears in Figure 2. This comparison demonstrates that
Figure 1: Structure of the routines for the numerical simulation and the adjustment of the parameters
all the states converge to a bounded region near the origin, with an upper bound of 0.03, after 1.0 seconds. However, the trajectories of the system controlled with the sub-optimal controller approach the origin more slowly than those controlled with the pole-placement method. The trajectories depicted in Figure 2 served to estimate the
Figure 2: States determined with the states and control actions using the sub-optimal and pole-placement approaches.
norm of x. The Euclidean norm was used for comparison purposes. This comparison confirmed that the pole-placement technique enforces a faster movement of the state trajectories toward the origin within the first 1.5 seconds. The variation of the state norm also served to estimate the variation of the value function, as well as the time evolution of the functional (Figure 3). The evolution of the states and the evaluation of the control
Figure 3: Norm of the states vector estimated with the states and control actions using the sub-optimal and pole-placement approaches.
function yield the variation of the performance index. This comparison was a major element in defining the advantage of the control design attained in this study. The comparison of the performance index calculated with the sub-optimal controller and with the pole-placement technique showed a significant reduction of this value after 0.05 seconds, and a final reduction (after 0.5 seconds) of 33% (Figure 4). The depicted results also show that the variation of the initial condition for the weights in the ANN produces a significant variation in the temporal evolution of the performance index. Also, with the local optimal values for the weights of the ANN, the lowest performance index is attained (Figure 4).
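The performance-index comparison reported here amounts to evaluating a Bolza-type cost along sampled trajectories. The sketch below does this with a trapezoidal rule; the weighting matrices $S$, $Q$, $R$ and the test trajectory are illustrative assumptions, not the paper's functional (3):

```python
import numpy as np

# Evaluating a Bolza-form quadratic index
#   J = x(T)^T S x(T) + \int_0^T (x^T Q x + u^T R u) dt
# from sampled trajectories via the trapezoidal rule.
def bolza_cost(t, x, u, S, Q, R):
    """t: (N,), x: (N, n), u: (N, m) samples along one trajectory."""
    running = (np.einsum('ti,ij,tj->t', x, Q, x)
               + np.einsum('ti,ij,tj->t', u, R, u))
    integral = 0.5 * np.sum((running[1:] + running[:-1]) * np.diff(t))
    return x[-1] @ S @ x[-1] + integral

# quick check on a known case: x(t) = e^{-t}, u = 0, Q = 1, S = 0 on [0, 5],
# whose exact cost is (1 - e^{-10}) / 2
t = np.linspace(0.0, 5.0, 2001)
x = np.exp(-t)[:, None]
u = np.zeros((t.size, 1))
J = bolza_cost(t, x, u, np.zeros((1, 1)), np.eye(1), np.eye(1))
```

The same routine, applied to the states and control signals produced by each controller, yields curves of the kind compared in Figure 4.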
Figure 4: Cost functional estimated with the states and control actions using the sub-optimal and poleplacement approaches.
With the aim of showing the dependence of the performance index on the initial condition of the weights, Figure 5 presents the value of the performance index J(u(·)) calculated at the end of the simulation process with T = 30 s. This behavior confirms the usefulness of the suggested algorithm, which is in charge of adjusting
the ANN weights.
Figure 6 shows the variation of the approximate value function Va obtained with the states and the control signal from the functional J(u(·)). The decreasing behavior of this value function confirmed the applicability of the ANN-based approximation, as well as of the numerical methodology used to adjust the initial weights in each round of evaluation considered in Section 6.
Figure 5: Cost functional estimated with the states and control actions using the sub-optimal approach presented as a function of the variation of the initial weights in the ANN.
Figure 6: Value function calculated with the states and control actions using the sub-optimal controller obtained by the approximation based on neural networks.
Figure 7 shows the time evolution of the control signals for the sub-optimal solution and for the pole-placement approach. The sub-optimal control signals were smaller than those produced by the pole placement, except during the first 0.05 seconds. This fact explains the differences between the convergence of the states to a bounded zone near the origin. It is relevant that the sub-optimal control signals reach the values obtained with the pole-placement controller; this can be confirmed after 1.0 second of simulation. Figure 8 depicts the variation of the norm of the
Figure 7: Comparison of the control signals obtained by the sub-optimal approach and the pole-placement design strategy.
control signal u. The time evolution of this function was considered due to its participation in the value function, as well as in the estimation of the functional. The variation of the control norm, together with the variation of the state norm, explains the increment of the function when the sub-optimal controller is considered in comparison with the regular pole-placement method, which does not take the perturbation effects into account. Figure
Figure 8: Norm of the control signal.
9 shows the variation of all four components of the weight vector $\tilde{\omega}$ considered in the approximation of the value function solution of the HJB equation $V_a(t, x)$. Notice that, in contrast to the usual behaviour of ANN approximations of uncertain functions, the weights obtained in this study do not converge to constant values, because of the influence of the system states $x$ and of the solution $P$ of the Riccati equation. The evolution of the weights included in the ANN structure can be complemented with the variation of the sigmoidal functions; their variation demonstrates the effect of the state evolution on the ANN structure (Figure 10). Figure 11 shows the time evolution of all the elements included in $P$, which was
Figure 9: Time dependent trajectories of ωb (t).
obtained by the on-line numerical solution of the time-varying Riccati matrix differential equation. The ANN approximation proposed in this study represents a contribution to the robust realization of optimal controllers for systems with the admissible class of parametric uncertainties and external perturbations considered here. The numerical results demonstrated the existence of the practically stable equilibrium point associated with the origin. This justifies the theoretical result obtained from the stability analysis based on the Lyapunov-like (energetic) function. Moreover, the approximate solution based on the Riccati matrix equation also converged to bounded and constant values. Interestingly, the weights of the ANN do not converge to constant values, in contrast to the regular ANN approximations of the HJB equation for optimal control realization. Existing approximate results of the optimal controller for the class of systems considered in this study have not provided the analytic results attained here. Notice that this study provides the evaluation of the approximate function over the sub-optimal control, as well as its impact on the Hamiltonian associated with the ANN approximation. The quasi-linear form of the uncertain system motivated the proposal of a mixed
Figure 10: Time dependent trajectories of σb (t).
Figure 11: Time evolution of the components of the matrix P calculated by the numerical solution of the time-dependent Riccati matrix differential equation.
controller using a linear form plus the approximated solution based on the ANN.

8. Conclusions
In this manuscript, the finite-horizon optimal control problem for a class of uncertain systems has been tackled by proposing a min-max sub-optimal controller with an NDP approach. The controller uses an ANN approximation of the Value function. The proposed approximate solution of the HJB equation is based on an ANN structure combined with a classical quadratic form of the state, weighted by a time-dependent positive definite matrix. The ANN structure obeys the nature of the Value function (positive and continuous) by using quadratic terms and by choosing sigmoidal activation functions. The tuning of the free parameters of the approximation was solved with a recurrent numerical algorithm. Compared with other works, the effect of the unknown terms is included in the main theorems, and the ANN structure considers the natural quadratic structure of the nominal part of the system. The proposed controller was numerically simulated to regulate a quasi-linear system with parametric uncertainties and external perturbations, and it was compared with a classical pole-placement controller. The performance of the sub-optimal controller was illustrated by the numerical simulations presented in this study.
References
Azhmyakov, V. (2011). On the geometric aspects of the invariant ellipsoid method: Application to the robust control design. In 2011 50th IEEE Conference on Decision and Control and European Control Conference (pp. 1353–1358). doi:10.1109/CDC.2011.6161180. Azhmyakov, V., Mera, M., & Juárez, R. (2019). Advances in attractive ellipsoid method for robust control design. International Journal of Robust and Nonlinear Control, 29, 1418–1436.
Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information theory, 39, 930–945.
Beard, R. W., Saridis, G. N., & Wen, J. T. (1997). Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica, 33, 2159–2177. doi:10.1016/S0005-1098(97)00128-3. Beard, R. W., Saridis, G. N., & Wen, J. T. (1998). Approximate Solutions to the Time-Invariant Hamilton–Jacobi–Bellman Equation. Journal of Optimization Theory and Applications, 96, 589–626. doi:10.1023/A:1022664528457.
Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control volume 1. Athena Scientific, Belmont, MA. Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming volume 5. Athena Scientific, Belmont, MA.
Bryson, A. E. (1975). Applied optimal control: Optimization, estimation and control. CRC Press.
Cheng, T., Lewis, F. L., & Abu-Khalaf, M. (2007). A neural network solution for fixed-final time optimal control of nonlinear systems. Automatica, 43, 482 – 490. doi:https://doi.org/10.1016/j.automatica.2006.09.021. Cloutier, J. R., & Cockburn, J. C. (2001). The State-Dependent Nonlinear Regulator with State Constrains. In Proceedings.
400
lP
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2, 303–314. Edwards, C., & Spurgeon, S. (1998). Sliding mode control: theory and applications. CRC Press.
405
urn a
Gu, K., Zohdy, M. A., & Loh, N. K. (1990). Necessary and sufficient conditions of quadratic stability of uncertain linear systems. IEEE Transactions on Automatic Control, 35, 601–604. Guliyev, N. J., & Ismailov, V. E. (2018). On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Networks, 98, 296–304. Haddad, W. M., & Chellaboina, V. (2011). Nonlinear dynamical systems and control: a Lyapunov-based approach. Princeton University Press.
Jo
410
Heydari, A., & Balakrishnan, S. N. (2013). Finite-Horizon Control-Constrained Nonlinear Optimal Control Using Single Network Adaptive Critics. IEEE Transactions on Neural Networks and Learning Systems, 24, 145–157. doi:10.1109/TNNLS. 2012.2227339. Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359 – 366. doi:http://dx.doi. org/10.1016/0893-6080(89)90020-8.
Huang, C.-S., Wang, S., & Teo, K. (2000). Solving Hamilton–Jacobi–Bellman equations by a modified method of characteristics. Nonlinear Analysis: Theory, Methods & Applications, 40, 279–293. doi:10.1016/S0362-546X(00)85016-6.
Çimen, T. (2008). State-Dependent Riccati Equation (SDRE) Control: A Survey. IFAC Proceedings Volumes, 41, 3761–3775. doi:10.3182/20080706-5-KR-1001.00635. 17th IFAC World Congress.
Jiang, Y., & Jiang, Z. P. (2014). Robust Adaptive Dynamic Programming and Feedback Stabilization of Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems, 25, 882–893. doi:10.1109/TNNLS.2013.2294968. Kainen, P. C., Kurkova, V., & Sanguineti, M. (2012). Dependence of computational models on input dimension: Tractability of approximation and optimization tasks. IEEE Transactions on Information Theory, 58, 1203–1214. Khalil, H. K. (1996). Nonlinear systems. Prentice-Hall, New Jersey.
Kim, Y. H., Lewis, F. L., & Dawson, D. M. (2000). Intelligent optimal control of robotic manipulators using neural networks. Automatica, 36, 1355 – 1364. doi:https://doi.org/10.1016/S0005-1098(00)00045-5.
Kirk, D. (2004). Optimal Control Theory: An Introduction. Dover Books on Electrical Engineering Series. Dover Publications. Kiumarsi, B., Lewis, F. L., Modares, H., Karimpour, A., & Naghibi-Sistani, M.-B. (2014). Reinforcement Q-learning for optimal tracking control of linear discretetime systems with unknown dynamics. Automatica, 50, 1167 – 1175. doi:https: //doi.org/10.1016/j.automatica.2014.02.015.
Lewis, F. L., & Liu, D. (2013). Reinforcement Learning and Approximate Dynamic Programming for Feedback Control volume 17. John Wiley & Sons. Lewis, F. L., & Vrabie, D. (2009). Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control. IEEE Circuits and Systems Magazine, 9, 32– 50. doi:10.1109/MCAS.2009.933854.
Masmoudi, N. K., Rekik, C., Djemel, M., & Derbel, N. (2011). Two coupled neuralnetworks-based solution of the Hamilton–Jacobi–Bellman equation. Applied Soft Computing, 11, 2946 – 2963. doi:https://doi.org/10.1016/j.asoc.2010.11. 015. Modares, H., Lewis, F. L., & Naghibi-Sistani, M.-B. (2014a). Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica, 50, 193 – 202. doi:https: //doi.org/10.1016/j.automatica.2013.09.043. Modares, H., Lewis, F. L., & Sistani, M.-B. N. (2014b). Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. International Journal of Adaptive Control and Signal Processing, 28, 232–254. doi:10.1002/acs.2348.
Liberzon, D. (2012). Calculus of variations and optimal control theory: A concise introduction. Princeton University Press.
Mulje, S. D., & Nagarele, R. M. (2016). LQR Technique based Second Order Sliding Mode Control for Linear Uncertain Systems. International Journal of Computer Applications, 137, 23–29.
Palanisamy, M., Modares, H., Lewis, F. L., & Aurangzeb, M. (2015). Continuous-time Q-learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems. IEEE Transactions on Cybernetics, 45, 165–176. doi:10.1109/TCYB.2014.2322116. Patan, K. (2018). Two stage neural network modelling for robust model predictive control. ISA Transactions, 72, 56–65.
Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality volume 703. John Wiley & Sons. Poznyak, A., Polyakov, A., & Azhmyakov, V. (2014). Attractive Ellipsoids in Robust Control. Springer.
Poznyak, A. S., & Boltyanski, V. G. (2012). The Robust Maximum Principle. Springer Science & Business Media. Qu, Z. (1993). Robust control of nonlinear uncertain systems under generalized matching conditions. Automatica, 29, 985–998. Saad, W., Sellami, A., & Garcia, G. (2019). H∞-sliding mode control of one-sided Lipschitz nonlinear systems subject to input nonlinearities and polytopic uncertainties. ISA Transactions, 90, 19–29.
Sage, A. P. (1968). Optimum Systems Control. Prentice-Hall.
Sutton, R. S., & Barto, A. G. (2012). Reinforcement Learning: An Introduction (2nd ed.). The MIT Press. Tang, Z. L., Ge, S. S., Tee, K. P., & He, W. (2016). Robust Adaptive Neural Tracking Control for a Class of Perturbed Uncertain Nonlinear Systems With State Constraints. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46, 1618–1629. doi:10.1109/TSMC.2015.2508962.
Spong, M. W., Hutchinson, S., & Vidyasagar, M. (2006). Robot Modeling and Control volume 141. John Wiley and Sons, Inc.
Utkin, V., Guldner, J., & Shi, J. (2009). Sliding mode control in electro-mechanical systems volume 34. CRC press.
Vamvoudakis, K. G., Vrabie, D., & Lewis, F. L. (2014). Online adaptive algorithm for optimal control with integral reinforcement learning. International Journal of Robust and Nonlinear Control, 24, 2686–2710. doi:10.1002/rnc.3018.
Vrabie, D., & Lewis, F. (2009). Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Networks, 22, 237–246. doi:10.1016/j.neunet.2009.03.008. Goal-Directed Neural Systems. Wang, D. (2019). Intelligent Critic Control With Robustness Guarantee of Disturbed Nonlinear Plants. IEEE Transactions on Cybernetics. Wang, D., Liu, D., Li, H., & Ma, H. (2014). Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Information Sciences, 282, 167–179.
Wang, D., Liu, D., Zhang, Y., & Li, H. (2018). Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems. Neural Networks, 97, 11 – 18. doi:https://doi.org/10.1016/j.neunet.2017.09.005. Wang, D., & Qiao, J. (2019). Approximate neural optimal control with reinforcement learning for a torsional pendulum device. Neural Networks, 117, 1–7.
Wang, S., Jennings, L. S., & Teo, K. L. (2003). Numerical Solution of HamiltonJacobi-Bellman Equations by an Upwind Finite Volume Method. Journal of Global Optimization, 27, 177–192. doi:10.1023/A:1024980623095.
Xue, A., Jiang, N., & Sun, Y. (2001). Robust guaranteed cost control with H∞ γ disturbance attenuation performance. In Proceedings of the 2001 American Control Conference (pp. 4218–4219, vol. 6). doi:10.1109/ACC.2001.945638. Yang, Q., Jagannathan, S., & Sun, Y. (2015). Robust Integral of Neural Network and Error Sign Control of MIMO Nonlinear Systems. IEEE Transactions on Neural Networks and Learning Systems, 26, 3278–3286. doi:10.1109/TNNLS.2015.2470175. Yang, X., & He, H. (2018). Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances. Neural Networks, 99, 19–30.
Yu, Y. (1983). On stabilizing uncertain linear delay systems. Journal of Optimization Theory and Applications, 41, 503–508.
Appendix
Proof for Proposition 1

Proof. Introduce the approximate value function $V_a : \mathbb{R}^+ \times \mathbb{R}^n \to \mathbb{R}^+$ as a feasible Lyapunov function candidate for (1). Based on the assumptions, the candidate energetic function satisfies
$$x^\top P^- x \le V_a(t,x) \le x^\top P^+ x + \omega^+ \sigma(x).$$
Notice that $x^\top P^- x$ and $x^\top P^+ x + \omega^+\sigma(x)$ are both class-$\mathcal{K}$ functions (Khalil, 1996). The full time derivative of $V_a$ along the trajectories of $x$ and $\omega$ is
$$\frac{d}{dt}V_a(t,x(t)) = 2x^\top(t)P(t)\frac{d}{dt}x(t) + x^\top(t)\frac{d}{dt}P(t)\,x(t) + \frac{d}{dt}\omega(t)\,\sigma(x) + \omega(t)\nabla^\top\sigma(x)\frac{d}{dt}x(t). \qquad (26)$$
Reorganizing the elements of (26), the differential equation yields
$$\frac{d}{dt}V_a(t,x) = \left[2x^\top(t)P(t) + \omega(t)\nabla^\top\sigma(x)\right]\frac{d}{dt}x(t) + x^\top(t)\frac{d}{dt}P(t)\,x(t) + 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x). \qquad (27)$$
The substitution of (1) into (27) leads to
$$\frac{d}{dt}V_a(t,x(t)) = x^\top(t)\frac{d}{dt}P(t)\,x(t) + 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x) + \left[2x^\top(t)P(t) + \omega(t)\nabla^\top\sigma(x)\right]\left(A_0 x(t) + B_0 u(t) + \eta\begin{bmatrix}x(t)\\ 1\end{bmatrix}\right). \qquad (28)$$
The application of the sub-optimal admissible controller $u^*$ in (10) transforms (28) into
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) ={}& x^\top(t)\left[P(t)A_0 + A_0^\top P(t)\right]x(t) + 2x^\top(t)P(t)B_0\left(-\frac{1}{2}R^{-1}B_0^\top\frac{\partial V(t,x)}{\partial x(t)}\right) + 2x^\top P(t)\eta\begin{bmatrix}x\\ 1\end{bmatrix} \\
&+ \omega(t)\nabla^\top\sigma(x)\left[A_0 x + B_0\left(-\frac{1}{2}R^{-1}B_0^\top\frac{\partial V(t,x)}{\partial x}\right)\right] + \omega(t)\nabla^\top\sigma(x)\,\eta\begin{bmatrix}x\\ 1\end{bmatrix} \\
&+ x^\top\frac{d}{dt}P(t)\,x(t) + 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x). \qquad (29)
\end{aligned}$$
The gradient of $V$, $\frac{\partial}{\partial x}V$, can be estimated using the definition of the activation functions. Then, the reorganization of the terms in (29) is equivalent to
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) ={}& x^\top\left[P(t)A_0 + A_0^\top P(t) - 2P(t)B_0R^{-1}B_0^\top P(t)\right]x - 2x^\top P(t)B_0R^{-1}B_0^\top\omega(t)\nabla\sigma(x) + 2x^\top P(t)\eta\begin{bmatrix}x\\ 1\end{bmatrix} \\
&+ \omega(t)\nabla^\top\sigma(x)A_0 x - \frac{1}{2}\omega(t)\nabla^\top\sigma(x)B_0R^{-1}B_0^\top\omega(t)\nabla\sigma(x) + \omega(t)\nabla^\top\sigma(x)\,\eta\begin{bmatrix}x\\ 1\end{bmatrix} \\
&+ x^\top\frac{d}{dt}P(t)\,x(t) + 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x). \qquad (30)
\end{aligned}$$
The application of Young's inequality (Poznyak et al., 2014) justifies the following inequality, obtained straightforwardly from (30):
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) \le{}& x^\top\left[P(t)A_0 + A_0^\top P(t) - 2P(t)B_0R^{-1}B_0^\top P(t)\right]x - \frac{1}{2}\omega(t)\nabla^\top\sigma(x)B_0R^{-1}B_0^\top\omega(t)\nabla\sigma(x) \\
&+ \omega(t)\nabla^\top\sigma(x)A_0 x - 2x^\top P(t)B_0R^{-1}B_0^\top\omega(t)\nabla\sigma(x) + \left[\omega(t)\nabla^\top\sigma(x)\right]^\top\Lambda_2\left[\omega(t)\nabla^\top\sigma(x)\right] \\
&+ x^\top P(t)\Lambda_1 P(t)x + \begin{bmatrix}x\\ 1\end{bmatrix}^\top\eta^\top\Lambda_1^{-1}\eta\begin{bmatrix}x\\ 1\end{bmatrix} + \begin{bmatrix}x\\ 1\end{bmatrix}^\top\eta^\top\Lambda_2^{-1}\eta\begin{bmatrix}x\\ 1\end{bmatrix} + x^\top\frac{d}{dt}P(t)\,x(t) + 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x).
\end{aligned}$$
Factorizing with respect to $\omega$ and $\eta$ ($\Lambda^{-1} = \Lambda_1^{-1} + \Lambda_2^{-1}$),
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) \le{}& x^\top\left[\frac{d}{dt}P(t) + P(t)A_0 + A_0^\top P(t) - 2P(t)B_0R^{-1}B_0^\top P(t) + P(t)\Lambda_1 P(t)\right]x + \begin{bmatrix}x\\ 1\end{bmatrix}^\top\eta^\top\Lambda^{-1}\eta\begin{bmatrix}x\\ 1\end{bmatrix} \\
&+ 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x) - \omega(t)\left[2x^\top P(t)B_0R^{-1}B_0^\top - x^\top A_0^\top + \frac{1}{2}\nabla^\top\sigma(x)B_0R^{-1}B_0^\top\omega(t)\right]\nabla\sigma(x) \\
&- \omega(t)\nabla^\top\sigma(x)\Lambda_2\nabla\sigma(x).
\end{aligned}$$
Considering that $\Delta A^\top\Lambda^{-1}\Delta A \le Q_0$ and $\Delta B^\top\Lambda^{-1}\Delta B \le \varepsilon_0$, one gets
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) \le{}& x^\top\left[\frac{d}{dt}P(t) + P(t)A_0 + A_0^\top P(t) - 2P(t)B_0R^{-1}B_0^\top P(t) + P(t)\Lambda_1 P(t) + Q_0\right]x + 2\bar{\omega}^\top(t)\frac{d}{dt}\bar{\omega}(t)\,\sigma(x) + \varepsilon_0 \\
&- \omega(t)\left[2x^\top P(t)B_0R^{-1}B_0^\top - x^\top A_0^\top + \frac{1}{2}\nabla^\top\sigma(x)B_0R^{-1}B_0^\top\omega(t)\right]\nabla\sigma(x) - \omega(t)\nabla^\top\sigma(x)\Lambda_2\nabla\sigma(x). \qquad (31)
\end{aligned}$$
Substituting the adjustment law for the weights into (31) yields
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) \le{}& x^\top\left[\frac{d}{dt}P(t) + P(t)A_0 + A_0^\top P(t) - 2P(t)B_0R^{-1}B_0^\top P(t) + P(t)\Lambda_1 P(t) + Q_0\right]x + \varepsilon_0 \\
&- \omega(t)\left[2x^\top P(t)B_0R^{-1}B_0^\top - x^\top A_0^\top + \frac{1}{2}\nabla^\top\sigma(x)B_0R^{-1}B_0^\top\omega(t)\right]\nabla\sigma(x) - \omega(t)\nabla^\top\sigma(x)\Lambda_2\nabla\sigma(x) \\
&+ 2\bar{\omega}^\top(t)\frac{\tilde{\omega}(t)\nabla^\top\sigma(x)}{2\sigma(x)}\left[-A_0 x + \frac{1}{4}\omega(t)B_0R^{-1}B_0^\top\nabla\sigma(x) + B_0R^{-1}B_0^\top P(t)x - \frac{\sqrt[4]{\varphi\left(\|x\|^2+1\right)}}{z(\omega,P)}\nabla\sigma(x)\right]\sigma(x) \\
&- \frac{\omega(t)\sqrt{\varphi\left(\|x\|^2+1\right)}}{z(\omega,P)}\,P(t)x\,\sigma(x). \qquad (32)
\end{aligned}$$
Simplifying the expression (32), one gets
$$\begin{aligned}
\frac{d}{dt}V_a(t,x) \le{}& x^\top\left[\frac{d}{dt}P(t) + P(t)A_0 + A_0^\top P(t) - 2P(t)B_0R^{-1}B_0^\top P(t) + P(t)\Lambda_1 P(t) + Q_0\right]x + \varepsilon_0 \\
&- \omega(t)\left[\frac{1}{2}\nabla^\top\sigma(x)B_0R^{-1}B_0^\top\omega(t)\right]\nabla\sigma(x) - \omega(t)\nabla^\top\sigma(x)\Lambda_2\nabla\sigma(x) \\
&- \omega(t)x^\top P(t)\frac{\sqrt[4]{\varphi\left(\|x\|^2+1\right)}}{z(\omega,P)}\nabla\sigma(x) - \frac{\omega^2(t)\sqrt{\varphi\left(\|x\|^2+1\right)}}{z(\omega,P)}\nabla^\top\sigma(x)\nabla\sigma(x).
\end{aligned}$$
Considering the extended vector $\phi := \left[x^\top\ \ \nabla^\top\sigma(x)\right]^\top$ yields the simplified form
$$\frac{d}{dt}V_a(t,x) \le -x^\top\left[-\frac{d}{dt}P(t) - P(t)A_0 - A_0^\top P(t) + 2P(t)B_0R^{-1}B_0^\top P(t) - P(t)\Lambda_1 P(t) - Q_2\right]x - \omega(t)\phi^\top Q_3\phi + \varepsilon_0, \qquad (33)$$
where
$$Q_2 := Q_0 + \omega(t)Q_1, \qquad Q_3(t) := \begin{pmatrix}\Pi_{11} & \Pi_{12}\\ \Pi_{12}^\top & \Pi_{22}\end{pmatrix},$$
$$\Pi_{11} := Q_1, \quad Q_1 \in \mathbb{R}^{n\times n},\ Q_1 = Q_1^\top,\ Q_1 > 0, \qquad \Pi_{12} := \frac{\sqrt{\varphi\left(\|x\|^2+1\right)}}{z(\omega,P)}\,2P(t),$$
$$\Pi_{22} := \left[\frac{1}{4}B_0R^{-1}B_0^\top - \left(\Lambda_2 + \frac{\omega(t)\sqrt{\varphi\left(\|x\|^2+1\right)}}{z(\omega,P)}\right)\right]\omega(t).$$
Considering (24), the inequality (33) is equivalent to
$$\frac{d}{dt}V_a(t,x) \le -x^\top Q_4(t)x - \omega(t)\phi^\top Q_3\phi + \varepsilon_0.$$
If one considers the subspace of the state variables $x \in \left\{x : x^\top Q_4(t)x \ge \varepsilon_0\right\}$ ($Q_4(t)$ is given in (24)), then
$$\frac{d}{dt}V_a(t,x) \le 0.$$
In consequence, the states $x$ are ultimately bounded, with the bound given in (25).
Conflict of Interest and Authorship Conformation Form

Robust Min-Max Optimal Control Design for Systems with Uncertain Models: a Neural Dynamic Programming Approach

Please check the following as appropriate:

o All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.
o This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.
o The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.
o The following authors have affiliations with organizations with direct or indirect financial interest in the subject matter discussed in the manuscript:

Author's name and affiliation:
Mariana Ballesteros: Unidad Profesional Interdisciplinaria de Biotecnología, Instituto Politécnico Nacional
Isaac Chairez: Unidad Profesional Interdisciplinaria de Biotecnología, Instituto Politécnico Nacional
Alexander Poznyak: Departamento de Control Automático, Centro de Investigación y Estudios Avanzados