
Accepted Manuscript

Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming

Yuzhu Huang, Ding Wang, Derong Liu

PII: S0925-2312(17)30867-6
DOI: 10.1016/j.neucom.2017.05.030
Reference: NEUCOM 18448

To appear in: Neurocomputing

Received date: 14 September 2016
Revised date: 29 March 2017
Accepted date: 12 May 2017

Please cite this article as: Yuzhu Huang, Ding Wang, Derong Liu, Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.05.030

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming ✩

Yuzhu Huang a,∗, Ding Wang b, Derong Liu c

a

State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China c


b

System Control Research Section, National Research Center of Gas Turbine & IGCC Technology, Beijing 100084, China

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China

Abstract


This paper is an effort towards developing an optimal learning algorithm to design the bounded robust controller for uncertain nonlinear systems with control constraints using single-network adaptive dynamic programming (ADP). First, the bounded robust control problem is transformed into an optimal control problem of the nominal system by a modified cost function with nonquadratic utility, which is used not only to account for all possible uncertainties, but also to deal with the control constraints. Then based on single-network ADP, an optimal learning algorithm is proposed for the nominal system by a single critic network to approximate the solution of Hamilton-Jacobi-Bellman (HJB) equation. An additional adjusting term is employed to stabilize the system and relax the requirement for an initial stabilizing control. Besides, uniform ultimate boundedness of the closed-loop system is guaranteed by Lyapunov’s direct method during the learning process. Moreover, the equivalence of the approximate optimal solution of optimal control problem and the solution of bounded robust control problem is also shown. Finally, four simulation examples are provided to demonstrate the effectiveness of the proposed approach.

Keywords: Neural networks; Optimal control; Adaptive dynamic programming; Bounded robust control; Uncertain nonlinear systems

1. Introduction

As is well known, uncertainties arising from modeling errors, system aging and exogenous disturbances degrade control system performance in practice. Therefore, it is necessary to design robust controllers that tackle these uncertainties and avoid the deterioration of actual performance [1]. During the past several decades, various robust control approaches have been developed, including the $H_\infty$ approach [2], the Lyapunov approach [3], the sliding mode control approach [4] and others [5–7]. Although many robust control design methods have been studied for designing a control system which guarantees robust stability, it is desirable to design a control system that is not only stable but also has better performance. Among these methods, it is worth noting that in [7], the robust control problem was solved through the solution to the optimal control problem for the relevant nominal system, and the designed robust controller not only ensured that the system was asymptotically stable for all possible uncertainties, but also was optimal with respect to a meaningful cost. This could improve the system's performance to some extent, and also help to probe the relationships between robust control and optimal control.

✩ This work was supported in part by the National Natural Science Foundation of China under Grants 61233001, 61533017, and U1501251, in part by Beijing Natural Science Foundation under Grant 4162065, and in part by the Early Career Development Award of SKLMCCS.
∗ Corresponding author. Tel.: +86-010-82151816; Fax: +86-01082150611. Email addresses: [email protected] (Yuzhu Huang), [email protected] (Ding Wang), [email protected] (Derong Liu).

Preprint submitted to Neurocomputing, 19 May 2017

Considering optimality, i.e., the optimal control law for the nonlinear system, in addition to stability alone gives the system better performance. From a mathematical point of view, the optimal control problem of a nonlinear system requires the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is generally difficult or impossible to solve due to its inherent nonlinearity [8]. Although dynamic programming (DP) is very useful in solving optimal control problems, it is often computationally untenable to run DP and get the optimal solution for high-dimensional nonlinear systems, which is referred to as the "curse of dimensionality" [9]. Inspired by the principle of brain operation, using neural networks (NNs) and combining DP with reinforcement learning (RL), adaptive dynamic programming (ADP), as a new paradigm for solving the optimal control problem, successfully avoids the "curse of dimensionality" [10, 11]. By now, many works in the literature have applied ADP to obtain approximate optimal solutions of the optimal control problems of both discrete-time systems [12–17] and continuous-time systems [18–23]. Moreover, how to utilize ADP to obtain good robustness and better performance for uncertain nonlinear systems is attracting more and more attention.

Recently, Modares et al. [24, 25] described an online adaptive control scheme to find the optimal control for nonlinear continuous-time systems subject to input constraints, and utilized integral reinforcement learning to further solve the optimal tracking control problem of partially-unknown nonlinear constrained-input systems. In both cases, the effects of possible uncertainties were not taken into consideration. Adhyaru et al. [26] developed an HJB equation-based optimal control algorithm for robust controller design by the least squares method for uncertain nonlinear systems. However, in their algorithm, the stability analysis of the closed-loop system was not addressed. Wang et al. [27] proposed a novel design method of robust controller for a class of continuous-time nonlinear systems with matched uncertainties by an online policy iteration algorithm. Jiang et al. [28] gave a robust ADP algorithm to solve the robust control problem for a class of uncertain nonlinear systems, where the requirement of initial stabilizing control was a rather restrictive condition in practice. Liu et al. [29] presented an RL-based robust adaptive control algorithm for constrained-input uncertain nonlinear systems by solving the converted optimal control problem, where the uncertainties needed to meet the matching condition and the designed robust controller could only guarantee the uncertain system to be stable in the sense of uniform ultimate boundedness. Wang et al. [30] developed a neural network based robust optimal control design for nonlinear systems with matched uncertainties and extended this approach to nonlinear interconnected large-scale systems, but the boundedness of the actual control input was not considered. Wu et al. [31] studied the finite-horizon optimal guaranteed cost control problem for time-varying uncertain nonlinear systems and applied the proposed design method to the entry guidance problem of the Mars lander, whereas the solution process was off-line and involved a backward integration.

Motivated by the above works, in this paper, we propose a novel optimal learning algorithm to design the bounded robust controller for uncertain nonlinear systems with control constraints based on single-network ADP. Unlike the existing studies and methodologies [26–32], the main contribution of this paper is that the proposed approach simultaneously deals with the control constraints, relaxes the requirement for an initial stabilizing control and handles more general uncertainties including, but not limited to, matched uncertainties. Moreover, the single-network ADP is adopted in the design of the bounded robust controller, where only one critic network is used instead of the action-critic dual network used in a typical ADP architecture [11]. This results in a simpler architecture, less computational burden, and removal of the approximation error resulting from the action network. In this paper, by a modified cost function with nonquadratic utility, the bounded robust control problem is first transformed into an optimal control problem of the nominal system. Then, based on single-network ADP, an optimal learning algorithm is proposed for the nominal system by a single critic network to approximate the solution of the HJB equation, wherein an additional adjusting term is used to stabilize the system and relax the requirement for an initial stabilizing control. Furthermore, uniform ultimate boundedness of the closed-loop system is guaranteed by Lyapunov's direct method.

The rest of this paper is organized as follows. In Section 2, the problem formulation is presented. In Section 3, the bounded robust control problem is transformed into an optimal control problem of the nominal system by a modified cost function with nonquadratic utility, and the validity of the problem transformation is shown. In Section 4, an optimal learning algorithm is proposed for the nominal system based on single-network ADP, then the bounded robust controller is constructed by the optimal solution, and the stability analysis is also presented. In Section 5, simulation results are shown to verify the effectiveness of the proposed approach. Several conclusions are drawn in Section 6.

2. Problem statement

Consider the uncertain nonlinear continuous-time systems given by

$$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + \Delta f(x(t)), \tag{1}$$

where $x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ is the state vector, $u \in U$ is the control input, $U = \{u = [u_1, u_2, \ldots, u_m]^T \in \mathbb{R}^m : |u_i| \le \lambda,\ i = 1, 2, \ldots, m\}$, where $\lambda \in \mathbb{R}$ is the saturating bound for the actuators, $f(x) \in \mathbb{R}^n$ and $g(x) \in \mathbb{R}^{n \times m}$ are the known system dynamics, and $\Delta f(x) \in \mathbb{R}^n$ is the nonlinear perturbation term of the corresponding nominal system

$$\dot{x}(t) = f(x(t)) + g(x(t))u(t), \tag{2}$$

which represents the system uncertainties and results from modeling error, aging and disturbances. For the nominal system (2), it is assumed that $f(0) = 0$ and $f(x) + g(x)u$ is Lipschitz continuous on a compact set $\Omega \subseteq \mathbb{R}^n$ containing the origin, and system (2) is stabilizable in the sense that there exists a continuous control on $\Omega$ that asymptotically stabilizes an equilibrium point of the system. For the perturbation term, $\Delta f(x)$ is assumed to be bounded by a known function $f_{\max}(x)$, i.e., $\|\Delta f(x)\| \le f_{\max}(x)$. Besides, we also assume that $\Delta f(0) = 0$ so that the origin is an equilibrium point of system (1).

Remark 1 It is noted that the assumption that $\Delta f(x)$ is bounded in norm by a known function $f_{\max}(x)$ reduces the applicability of the proposed approach in this paper. Specifically, the bounded robust control problems without prior knowledge of the bounds of the uncertainties cannot be solved by the proposed approach. However, it should be pointed out that the uncertainties encountered in many practical systems are, indeed, state-dependent. So, we can acquire necessary information about the bounds of the uncertainties. Typical examples of state-dependent uncertainties in practical systems include spring stiffness, friction coefficient and nonlinear resistor [3].

For the bounded robust control problem, it is desired to find a feedback control law $u(x) \in U$ such that the closed-loop system is globally asymptotically stable for all admissible perturbations $\Delta f(x)$. Such a bounded robust control problem is generally difficult to solve directly for the uncertain nonlinear system.

In this paper, to confront the uncertain nonlinear system (1) with control constraints, the design of the bounded robust controller is divided into two steps. First, by a modified cost function with nonquadratic utility, the bounded robust control problem is transformed into an optimal control problem of the nominal system, where the additive modified term and the nonquadratic utility of the cost function are used to account for system uncertainties and deal with the control constraints, respectively. Second, based on the modified cost function, an optimal learning algorithm is proposed for the nominal system by single-network ADP to approximate the solution of the HJB equation. Using the obtained HJB solution, the bounded robust controller is constructed for the original uncertain system.

3. Transformation of bounded robust control into optimal control

In this section, the bounded robust control problem is transformed into an optimal control problem of the nominal system, where a modified cost function with nonquadratic utility is used to account for the nonlinear perturbation of the system and deal with the control constraints. Moreover, the validity of the problem transformation will be shown.

For the nominal system (2), the optimal control objective is to derive a feedback control $u(x)$ which minimizes the following cost function

$$V(x_0) = \int_0^\infty \big(\Gamma(x) + x^T Q x + W(u)\big)\,dt, \tag{3}$$

where $x_0 = x(0)$ is the initial state, $\Gamma(x)$ is the additive modified term to account for the perturbation that satisfies $\Gamma(x) \ge 0$ with $\Gamma(0) = 0$, and $W(u)$ is the nonquadratic utility to deal with the control constraints that is defined as

$$W(u) = 2\int_0^u \lambda \tanh^{-T}(\upsilon/\lambda) R \, d\upsilon, \tag{4}$$

where $\upsilon \in \mathbb{R}^m$, $\tanh^{-1}(u) = [\tanh^{-1}(u_1), \tanh^{-1}(u_2), \ldots, \tanh^{-1}(u_m)]^T$ with $\tanh(\cdot)$ being the hyperbolic tangent function, $\tanh^{-T}(\cdot)$ denotes $(\tanh^{-1}(\cdot))^T$, $Q$ and $R$ are positive definite matrices with appropriate dimensions, and $R = \mathrm{diag}\{r_1, r_2, \ldots, r_m\} > 0$ is assumed to be diagonal for simplicity of analysis. Since $\tanh^{-1}(\cdot)$ is an odd monotonic function and $R$ is positive definite, $W(u)$ is also positive definite. Note that the feedback control $u(x)$ must not only stabilize the nominal system (2) but also guarantee that (3) is finite, i.e., the control must be admissible [19].

Definition 1 A control law $u(x)$ is said to be admissible with respect to the cost function (3) on a compact set $\Omega$, denoted by $u \in \Psi(\Omega)$, if $u(x)$ is continuous on $\Omega$, $u(0) = 0$, $u(x)$ stabilizes system (2) on $\Omega$, and $V(x_0)$ is finite $\forall x_0 \in \Omega$.

Given an admissible control $u \in \Psi(\Omega)$ and the associated function $V \in C^1$, the infinitesimal version of (3) is the following Lyapunov equation

$$0 = \Gamma(x) + x^T Q x + W(u) + V_x^T\big(f(x) + g(x)u\big) \tag{5}$$

with $V(0) = 0$, where $V_x$ denotes the partial derivative of the value function $V(x)$ with respect to $x$, i.e., $V_x = \partial V(x)/\partial x$. Further, the Lyapunov equation becomes the HJB equation on substitution of the optimal control $u^*$.

Define the Hamiltonian of the problem as

$$H(x, u, V_x) = \Gamma(x) + x^T Q x + W(u) + V_x^T\big(f(x) + g(x)u\big). \tag{6}$$

The optimal cost function $V^*$ is given by

$$V^*(x_0) = \min_{u \in \Psi(\Omega)} \int_0^\infty \big(\Gamma(x) + x^T Q x + W(u)\big)\,dt, \tag{7}$$

which, with $x_0 = x$, is known as the value function and satisfies the HJB equation

$$0 = \min_{u \in \Psi(\Omega)} H(x, u, V_x^*). \tag{8}$$

Assume that the minimum on the right-hand side of (8) exists and is unique. Then, by solving $\partial H(x, u, V_x^*)/\partial u = 0$, the bounded optimal control $u^*$ for the nominal system (2) can be derived as

$$u^* = -\lambda \tanh\!\Big(\frac{1}{2\lambda} R^{-1} g^T(x) V_x^*\Big). \tag{9}$$
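The bounded control (9) and the nonquadratic utility (4) can be explored numerically. The following is a minimal sketch, assuming NumPy and a diagonal $R$ stored as a vector; the function names, the example matrix `g` and the gradient `Vx` are illustrative choices, not taken from the paper. The closed form used for $W(u)$ comes from integrating (4) by parts component-wise, and it is compared against a simple midpoint-rule quadrature.

```python
import numpy as np

def bounded_control(Vx, g, R_diag, lam):
    """u = -lam * tanh(R^{-1} g^T Vx / (2 lam)), the structure of (9)."""
    return -lam * np.tanh((g.T @ Vx) / (2.0 * lam * R_diag))

def utility_closed_form(u, R_diag, lam):
    """W(u) from (4) for diagonal R, integrated by parts component-wise."""
    s = u / lam
    return float(np.sum(R_diag * (2.0 * lam * u * np.arctanh(s)
                                  + lam ** 2 * np.log(1.0 - s ** 2))))

def utility_quadrature(u, R_diag, lam, n=20000):
    """Midpoint-rule evaluation of 2 * int_0^u lam * atanh(v/lam) * r dv."""
    total = 0.0
    for ui, ri in zip(u, R_diag):
        h = ui / n                          # signed step, so negative u_i works too
        v = (np.arange(n) + 0.5) * h        # midpoints between 0 and u_i
        total += 2.0 * lam * ri * np.sum(np.arctanh(v / lam)) * h
    return float(total)

# Hypothetical example data (n = 3 states, m = 2 inputs).
lam = 1.0
R_diag = np.array([1.0, 2.0])
g = np.array([[0.0, 1.0], [1.0, 0.5], [0.2, 0.0]])
Vx = np.array([3.0, -2.0, 0.5])

u_star = bounded_control(Vx, g, R_diag, lam)
u_big = bounded_control(100.0 * Vx, g, R_diag, lam)   # large gradient saturates
W_cf = utility_closed_form(u_star, R_diag, lam)
W_num = utility_quadrature(u_star, R_diag, lam)
```

However large the value gradient becomes, every component of the control stays within $[-\lambda, \lambda]$, which is the reason for choosing the $\tanh^{-1}$-based utility instead of a quadratic one.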

Substituting (9) into (8), we have the formulation of the HJB equation in terms of $V_x^*$ as

$$0 = \Gamma(x) + x^T Q x + 2\int_0^{-\lambda\tanh\left(\frac{1}{2\lambda}R^{-1}g^T(x)V_x^*\right)} \lambda \tanh^{-T}(\upsilon/\lambda) R\, d\upsilon + V_x^{*T}\Big(f(x) - \lambda g(x)\tanh\!\Big(\frac{1}{2\lambda}R^{-1}g^T(x)V_x^*\Big)\Big) \tag{10}$$

with $V^*(0) = 0$. To obtain the optimal control $u^*$, one only needs to solve the HJB equation (10) for the value function, and then substitute the solution into (9).

Remark 2 The HJB equation (10) is a nonlinear partial differential equation with respect to $V^*(x)$. Due to its nonlinear nature, finding an analytical solution is generally difficult or impossible. Thus, it is extremely difficult to obtain the optimal control by solving the HJB equation (10) directly.

Next, the validity of the problem transformation, that is, the equivalence of the solution of the bounded robust control problem and the optimal control problem, is shown in the following theorem.

Theorem 1 Consider the nominal system (2) with the cost function (3), and assume that the solution of the HJB equation (10) exists. Then, the optimal control $u^*$ developed in (9) ensures global asymptotic stability of the uncertain nonlinear system (1), provided that $\Gamma(x)$ is selected as

$$\Gamma(x) = \frac{1}{4} V_x^T(x) V_x(x) + f_{\max}^2(x). \tag{11}$$

This also means that the solution to the optimal control problem is a solution to the bounded robust control problem.

Proof. We will prove that the optimal control $u^*$ of the nominal system is a solution of the bounded robust control problem, i.e., the equilibrium point of the uncertain system (1) is globally asymptotically stable for the perturbation $\Delta f(x)$. Considering the optimal cost function $V^*(x)$ in (7), $V^*(x)$ is a positive definite function. Thus, $V^*(x)$ can be chosen as a Lyapunov function candidate for the original uncertain system. Taking the time derivative of $V^*(x)$ along the system trajectory $\dot{x} = f(x) + g(x)u^* + \Delta f(x)$, we have

$$\dot{V}^*(x) = V_x^{*T}\big(f(x) + g(x)u^*\big) + V_x^{*T}\Delta f(x). \tag{12}$$

Using (6), (8) and (11), (12) can be rewritten as

$$\begin{aligned}
\dot{V}^*(x) &= -\frac{1}{4}V_x^{*T}V_x^* - f_{\max}^2(x) - x^T Q x - W(u^*) + V_x^{*T}\Delta f(x) \\
&= -x^T Q x - W(u^*) - f_{\max}^2(x) - \Big(\frac{1}{4}V_x^{*T}V_x^* - V_x^{*T}\Delta f(x)\Big).
\end{aligned} \tag{13}$$

By adding and subtracting $\Delta f^T(x)\Delta f(x)$ on the right-hand side of (13), we can derive

$$\begin{aligned}
\dot{V}^*(x) &= -x^T Q x - W(u^*) - \big(f_{\max}^2(x) - \Delta f^T(x)\Delta f(x)\big) - \Big(\frac{1}{2}V_x^* - \Delta f(x)\Big)^T\Big(\frac{1}{2}V_x^* - \Delta f(x)\Big) \\
&\le -x^T Q x - W(u^*) - \big(f_{\max}^2(x) - \Delta f^T(x)\Delta f(x)\big).
\end{aligned} \tag{14}$$

Considering that the perturbation satisfies $\|\Delta f(x)\| \le f_{\max}(x)$ and $W(u)$ is positive definite, it is clearly observed from (14) that $\dot{V}^*(x) \le -x^T Q x < 0$ for any $x \ne 0$. Then, by the Lyapunov local stability theory, we can conclude that there exists a neighborhood $\Xi = \{x : \|x(t)\| < c\}$ for some $c > 0$ such that if $x(t)$ enters $\Xi$, then $\lim_{t\to\infty} x(t) = 0$. Moreover, $x(t)$ cannot remain forever outside $\Xi$; otherwise $\|x(t)\| \ge c$ for all $t \ge 0$. By finding a scalar quantity $\gamma = \inf\{x^T Q x\} > 0$ for $\|x(t)\| \ge c$, it is easy to derive

$$V^*(x(t)) - V^*(x(0)) = \int_0^t \dot{V}^*(x(\tau))\,d\tau \le -\int_0^t x^T Q x\,d\tau \le -\int_0^t \gamma\,d\tau = -\gamma t. \tag{15}$$

Letting $t \to \infty$, we have $V^*(x(t)) \le V^*(x(0)) - \gamma t \to -\infty$, which contradicts the fact that $V^*(x(t)) \ge 0$ for all $x(t)$. Therefore, the conclusion can be drawn that $\lim_{t\to\infty} x(t) = 0$ no matter where the system trajectory begins.

So far, based on Theorem 1, the equivalence of the solution of the bounded robust control problem and the optimal control problem is validated. For the nominal system (2), one solves the HJB equation (10) and substitutes the solution into (9) to compute the optimal control, which is also the bounded robust control of the original system (1). However, due to the nonlinearity of the HJB equation, solving it is generally difficult. To overcome this difficulty, an optimal learning algorithm is proposed by single-network ADP to approximate the solution of the HJB equation in the next section.
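The completing-the-square step between (13) and (14) is purely algebraic, so it can be sanity-checked with random vectors. The sketch below assumes NumPy; the dimension, the random draws and the variable names are arbitrary illustration choices. It compares the perturbation-dependent part of (13) with its rewritten form in (14), and records that this part is never positive whenever $\|\Delta f\| \le f_{\max}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
max_gap = 0.0      # largest mismatch between the two forms
worst = -np.inf    # largest value taken by the perturbation-dependent part

for _ in range(1000):
    Vx = rng.normal(size=n)
    f_max = abs(rng.normal()) + 0.1
    d = rng.normal(size=n)
    # Scale a random direction so that ||df|| <= f_max holds by construction.
    df = d / np.linalg.norm(d) * f_max * rng.uniform(0.0, 1.0)

    # Perturbation-dependent part of (13): -(1/4) Vx^T Vx - f_max^2 + Vx^T df
    lhs = -0.25 * (Vx @ Vx) - f_max ** 2 + Vx @ df
    # Same part after completing the square, as in (14)
    half = 0.5 * Vx - df
    rhs = -(f_max ** 2 - df @ df) - half @ half

    max_gap = max(max_gap, abs(lhs - rhs))
    worst = max(worst, lhs)
```

Because this part is nonpositive, only $-x^T Q x - W(u^*)$ remains as an upper bound on $\dot{V}^*$, which is the bound used in the proof.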

4. Bounded robust controller design

In this section, based on the modified cost function, an optimal learning algorithm is proposed for the nominal system by single-network ADP to approximate the solution of the HJB equation. Then, using the approximate HJB solution, the bounded robust controller is constructed for the original uncertain system. Moreover, the stability analysis of the proposed algorithm is shown.

Due to the nonlinear nature of the HJB equation, it is difficult or impossible to find its analytical solution and obtain the optimal controller. Thus, an optimal learning algorithm is proposed to approximate the solution of the HJB equation by single-network ADP, where a single critic network is adopted instead of the action-critic dual network. The single critic network, implemented using a neural network, is used to approximate the cost function. For brevity, this neural network is termed the critic NN. According to the universal approximation property of NNs, the cost function can be represented by the critic NN on a compact set $\Omega$ as

$$V(x) = W_c^T \sigma_c(V_c^T x) + \varepsilon_c(x), \tag{16}$$

where $W_c \in \mathbb{R}^k$ and $V_c \in \mathbb{R}^{n\times k}$ are the ideal weights of the output and hidden layers, respectively, $k$ is the number of hidden layer neurons, and $\varepsilon_c$ is the NN reconstruction error. Based on [9, 23], for simplicity of learning, the output layer weight $W_c$ is adapted online, whereas the hidden layer weight $V_c$ is selected initially at random and held fixed during the learning process.

Remark 3 It is noted that the cost function is continuously differentiable, and thus uniformly continuous on the compact set $\Omega$. So, based on the universal approximation property of NNs, there exists an NN such that $V(x) = W_c^T \sigma_c(V_c^T x) + \varepsilon_c(x)$, where the hidden layer weight $V_c$ is chosen initially at random and held fixed, and the output layer weight $W_c$ is tuned online. It is demonstrated in [9] that if the number of hidden layer neurons $k$ is sufficiently large, the NN reconstruction error $\varepsilon_c$ can be made arbitrarily small for all inputs $x \in \mathbb{R}^n$ in $\Omega$, since the activation function $\sigma_c(\cdot)$ forms a stochastic basis.

For the critic NN, its output can be expressed as

$$\hat{V}(x) = \hat{W}_c^T \sigma_c(V_c^T x) = \hat{W}_c^T \sigma_c(z), \tag{17}$$

where $\hat{W}_c$ is the estimate of the ideal weight $W_c$. Since the hidden layer weight $V_c$ is fixed, the activation function $\sigma_c(V_c^T x)$ is denoted as $\sigma_c(z) : \mathbb{R}^n \to \mathbb{R}^k$ with $z = V_c^T x$. The derivative of $V(x)$ with respect to $x$ is

$$V_x = \nabla\sigma_c^T W_c + \nabla\varepsilon_c, \tag{18}$$

where $\nabla\sigma_c^T = V_c(\partial\sigma_c(z)/\partial z)^T$ and $\nabla\varepsilon_c = \partial\varepsilon_c/\partial x$. Considering (17), the derivative of $\hat{V}(x)$ with respect to $x$ is

$$\hat{V}_x = \nabla\sigma_c^T \hat{W}_c. \tag{19}$$

Then, using (9) and (18), the optimal control $u^*$ is derived as

$$u^* = -\lambda\tanh\!\Big(\frac{1}{2\lambda}R^{-1}g^T(x)\big(\nabla\sigma_c^T W_c + \nabla\varepsilon_c\big)\Big), \tag{20}$$

and the estimate of the optimal control $\hat{u}$ is given in terms of $\hat{V}_x$ in (19) as

$$\hat{u} = -\lambda\tanh\!\Big(\frac{1}{2\lambda}R^{-1}g^T(x)\nabla\sigma_c^T \hat{W}_c\Big). \tag{21}$$

By applying (21) to the nominal system (2), we have the closed-loop system dynamics as follows:

$$\dot{x} = f(x) - \lambda g(x)\tanh\!\Big(\frac{1}{2\lambda}R^{-1}g^T(x)\nabla\sigma_c^T \hat{W}_c\Big). \tag{22}$$

According to [19], the nonquadratic term (4) can be given by

$$W(u) = 2\int_0^u \lambda\tanh^{-T}(\upsilon/\lambda)R\,d\upsilon = 2\lambda\tanh^{-T}(u/\lambda)Ru + \lambda^2\bar{R}\ln\!\big(I - (u/\lambda)^2\big), \tag{23}$$

where $\bar{R} = [r_1, r_2, \ldots, r_m] \in \mathbb{R}^{1\times m}$, $I = [1, 1, \ldots, 1]^T \in \mathbb{R}^m$ and $\ln(I - (u/\lambda)^2) \in \mathbb{R}^m$. Then by (20), substituting $u^*$ into (23) yields

$$W(u^*) = \lambda\big(\nabla\sigma_c^T W_c + \nabla\varepsilon_c\big)^T g(x)\tanh(\Lambda) + \lambda^2\bar{R}\ln\!\big(I - \tanh^2(\Lambda)\big), \tag{24}$$

where $\Lambda = R^{-1}g^T(x)(\nabla\sigma_c^T W_c + \nabla\varepsilon_c)/2\lambda$. Further, based on (24), substituting (18) and (20) into the Lyapunov equation (5), the HJB equation becomes

$$\frac{1}{4}W_c^T\nabla\sigma_c\nabla\sigma_c^T W_c + f_{\max}^2(x) + x^T Q x + W_c^T\nabla\sigma_c f(x) + \lambda^2\bar{R}\ln\!\big(I - \tanh^2(D)\big) = \varepsilon_{HJB}, \tag{25}$$

where $D = R^{-1}g^T(x)\nabla\sigma_c^T W_c/2\lambda$, and $\varepsilon_{HJB}$ is defined as

$$\varepsilon_{HJB} = -\frac{1}{2}W_c^T\nabla\sigma_c\nabla\varepsilon_c - \frac{1}{4}\nabla\varepsilon_c^T\nabla\varepsilon_c - \nabla\varepsilon_c^T f(x) + \lambda^2\bar{R}\ln\!\big(I - \tanh^2(D)\big) - \lambda^2\bar{R}\ln\!\big(I - \tanh^2(\Lambda)\big), \tag{26}$$

which arises from the NN reconstruction error. According to [25], as the number of hidden layer neurons $k$ increases, the approximation error $\varepsilon_{HJB}$ converges to zero. Hence, there exists a constant error bound $\varepsilon_M$, so that $\sup_{\forall x}\|\varepsilon_{HJB}\| \le \varepsilon_M$.
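The single-critic structure of (17), (19) and (21) can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy, a `tanh` activation for $\sigma_c$, a diagonal $R$ stored as a vector, and a hypothetical input map `g`; the weight values are random placeholders, not trained weights from the paper.

```python
import numpy as np

def critic_value(x, W_hat, Vc):
    """V_hat(x) = W_hat^T sigma_c(Vc^T x) with sigma_c = tanh, as in (17)."""
    return float(W_hat @ np.tanh(Vc.T @ x))

def critic_gradient(x, W_hat, Vc):
    """V_hat_x = grad_sigma^T W_hat with grad_sigma^T = Vc diag(sigma'(z)), as in (19)."""
    z = Vc.T @ x
    grad_sigma_T = Vc * (1.0 - np.tanh(z) ** 2)   # shape (n, k): column j scaled by sigma'(z_j)
    return grad_sigma_T @ W_hat

def approx_control(x, W_hat, Vc, g, R_diag, lam):
    """u_hat = -lam * tanh(R^{-1} g(x)^T V_hat_x / (2 lam)), as in (21)."""
    Vx_hat = critic_gradient(x, W_hat, Vc)
    return -lam * np.tanh((g(x).T @ Vx_hat) / (2.0 * lam * R_diag))

# Hypothetical setup: n = 2 states, k = 6 hidden neurons, m = 1 input.
rng = np.random.default_rng(0)
Vc = rng.normal(size=(2, 6))          # hidden weights: random and then held fixed
W_hat = rng.normal(size=6)            # output weights: the only tuned parameters
lam = 1.0
R_diag = np.array([1.0])
g = lambda x: np.array([[0.0], [np.cos(x[0])]])

x0 = np.array([0.7, -0.3])
u_hat = approx_control(x0, W_hat, Vc, g, R_diag, lam)
```

Only `W_hat` changes during learning, so the control is obtained directly from the critic gradient and no separate action network is evaluated.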

For the critic NN, by using (19), (21) and (23), the approximate Hamiltonian $H(x, \hat{W}_c)$ is derived as

$$H(x, \hat{W}_c) = \frac{1}{4}\hat{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\hat{W}_c + f_{\max}^2(x) + x^T Q x + \hat{W}_c^T\nabla\sigma_c f(x) + \lambda^2\bar{R}\ln\!\big(I - \tanh^2(\hat{D})\big) = e_c, \tag{27}$$

where $\hat{D} = R^{-1}g^T(x)\nabla\sigma_c^T\hat{W}_c/2\lambda$, and $e_c$ is the residual error. Thus, for approximating the solution of the HJB equation, it is necessary to minimize the squared residual error $E_c$, i.e., $E_c(\hat{W}_c) = e_c^T e_c/2$, to train the critic NN. The critic NN weight tuning law is designed by the gradient descent algorithm with an additional term introduced to ensure the boundedness of system states while the critic NN learns the value function, i.e.,

$$\dot{\hat{W}}_c = -\alpha\frac{\partial e_c}{\partial\hat{W}_c}e_c + \frac{\beta}{2}\Pi(x, \hat{u})\nabla\sigma_c g(x)\Theta R^{-1}g^T(x)J_{1x} \tag{28}$$

with

$$\Pi(x, \hat{u}) = \begin{cases} 0, & \text{if } \dot{J}_1(x) = J_{1x}^T\big(f(x) - \lambda g(x)\tanh(\hat{D})\big) < 0, \\ 1, & \text{otherwise,} \end{cases} \tag{29}$$

where $\alpha$, $\beta$ are the positive learning rates, $\partial e_c/\partial\hat{W}_c = \nabla\sigma_c\big(f(x) - \lambda g(x)\tanh(\hat{D}) + \nabla\sigma_c^T\hat{W}_c/2\big)$, $\Theta = \mathrm{diag}\{1 - \tanh^2(\hat{D}_i)\}$, $i = 1, 2, \ldots, m$, $\Pi(x, \hat{u})$ is the index operator designed by Lyapunov theory, $J_1(x)$ is a Lyapunov function candidate defined in the following lemma, and $J_{1x}$ is the partial derivative of $J_1(x)$ with respect to $x$. Moreover, it should be noted that the additional adjusting term, i.e., the last term in (28), is removed when the nominal system exhibits stable behavior, and activated when the system state becomes unstable. By using the negative gradient direction of $\dot{J}_1(x)$, i.e., $-\partial\big[J_{1x}^T(f(x) - \lambda g(x)\tanh(\hat{D}))\big]/\partial\hat{W}_c$, the last term of (28) takes effect and reinforces the training process of the critic NN until the system exhibits stable behavior. This relaxes the requirement for an initial admissible control, in contrast to [27, 28], where an admissible control is needed for initialization, which is hard to obtain in practice.

Lemma 1 Consider the nominal system (2) with the associated value function $V(x)$ and the optimal control $u^*$ in (9). It is assumed that there exists a continuously differentiable and radially unbounded Lyapunov function candidate $J_1(x)$ such that $\dot{J}_1(x) = J_{1x}^T(f(x) + g(x)u^*) < 0$ with $J_{1x}$ being the partial derivative of $J_1(x)$ with respect to $x$. Moreover, let $\Psi(x)$ be a positive definite matrix satisfying $\|\Psi(x)\| = 0$ if and only if $\|x\| = 0$, and $\Psi_{\min} \le \|\Psi(x)\| \le \Psi_{\max}$ for $\chi_{\min} \le \|x\| \le \chi_{\max}$ with positive constants $\Psi_{\min}$, $\Psi_{\max}$, $\chi_{\min}$ and $\chi_{\max}$. In addition, let $\Psi(x)$ satisfy $\lim_{x\to\infty}\Psi(x) = \infty$ as well as

$$V_x^{*T}\Psi(x)J_{1x} = \Gamma(x) + x^T Q x + 2\int_0^{u^*}\lambda\tanh^{-T}(\upsilon/\lambda)R\,d\upsilon. \tag{30}$$

Then, the following relation holds:

$$J_{1x}^T\big(f(x) + g(x)u^*\big) = -J_{1x}^T\Psi(x)J_{1x}. \tag{31}$$

Proof. When the optimal control $u^*$ in (9) is applied to the nominal system (2), the value function $V(x, u^*)$ becomes a Lyapunov function. Then, taking the time derivative of $V(x, u^*)$, we have

$$\dot{V}^* = V_x^{*T}\big(f(x) + g(x)u^*\big) = -\Gamma(x) - x^T Q x - 2\int_0^{u^*}\lambda\tanh^{-T}(\upsilon/\lambda)R\,d\upsilon. \tag{32}$$

By using (30), (32) can be rewritten as

$$\begin{aligned}
f(x) + g(x)u^* &= -(V_x^* V_x^{*T})^{-1}V_x^*\Big(\Gamma(x) + x^T Q x + 2\int_0^{u^*}\lambda\tanh^{-T}(\upsilon/\lambda)R\,d\upsilon\Big) \\
&= -(V_x^* V_x^{*T})^{-1}V_x^* V_x^{*T}\Psi(x)J_{1x} \\
&= -\Psi(x)J_{1x}.
\end{aligned} \tag{33}$$

Next, multiplying both sides of (33) by $J_{1x}^T$, (31) can be obtained.

Remark 4 It should be pointed out that $f(x) + g(x)u^*$ is generally assumed to be bounded by a positive constant [22]. To relax this condition, in this paper, the optimal closed-loop dynamics are assumed to be upper bounded by a function of the system state such that

$$\|f(x) + g(x)u^*\| \le \vartheta(x). \tag{34}$$

The general bound $\vartheta(x)$ is taken as $\vartheta(x) = \sqrt[4]{\kappa\|J_{1x}\|}$ with $\kappa > 0$ for the following stability analysis, where $\|J_{1x}\|$ is selected to satisfy the general bound in this paper. Moreover, $J_1(x)$ can be obtained by properly selecting a quadratic polynomial [30].
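To make the tuning law (28)-(29) concrete, the sketch below evaluates the residual $e_c$ of (27), its gradient with respect to $\hat{W}_c$, and the resulting weight rate at a single state. It is a sketch under stated assumptions, not the paper's implementation: NumPy, a `tanh` activation, diagonal $R$, the choice $J_1(x) = \tfrac{1}{2}x^T x$ (so $J_{1x} = x$), and hypothetical example functions `f`, `g`, `f_max` with arbitrary learning rates.

```python
import numpy as np

# Hypothetical problem data (n = 2, m = 1); none of these values come from the paper.
lam, alpha, beta = 1.0, 1.0, 0.5
R_diag = np.array([1.0])
Q = np.eye(2)
f = lambda x: np.array([-x[0] + x[1], -0.5 * (x[0] + x[1])])
g = lambda x: np.array([[0.0], [1.0]])
f_max = lambda x: abs(x[0])

def grad_sigma(x, Vc):
    """grad_sigma_c = diag(sigma'(z)) Vc^T, shape (k, n), for sigma_c = tanh."""
    z = Vc.T @ x
    return ((1.0 - np.tanh(z) ** 2) * Vc).T

def d_hat(x, W_hat, Vc):
    """D_hat = R^{-1} g^T grad_sigma^T W_hat / (2 lam)."""
    return (g(x).T @ grad_sigma(x, Vc).T @ W_hat) / (2.0 * lam * R_diag)

def residual(x, W_hat, Vc):
    """Residual e_c of the approximate Hamiltonian (27)."""
    Vx_hat = grad_sigma(x, Vc).T @ W_hat
    D = d_hat(x, W_hat, Vc)
    return float(0.25 * Vx_hat @ Vx_hat + f_max(x) ** 2 + x @ Q @ x
                 + Vx_hat @ f(x)
                 + lam ** 2 * np.sum(R_diag * np.log(1.0 - np.tanh(D) ** 2)))

def residual_grad(x, W_hat, Vc):
    """d e_c / d W_hat = grad_sigma (f - lam g tanh(D_hat) + grad_sigma^T W_hat / 2)."""
    gs = grad_sigma(x, Vc)
    closed_loop = f(x) - lam * g(x) @ np.tanh(d_hat(x, W_hat, Vc))
    return gs @ (closed_loop + 0.5 * gs.T @ W_hat)

def weight_rate(x, W_hat, Vc):
    """Right-hand side of (28), with the index operator (29)."""
    gs, gx = grad_sigma(x, Vc), g(x)
    D = d_hat(x, W_hat, Vc)
    closed_loop = f(x) - lam * gx @ np.tanh(D)
    J1x = x                                          # assumes J1(x) = 0.5 x^T x
    Pi = 0.0 if J1x @ closed_loop < 0.0 else 1.0     # adjusting term off when J1 decreases
    theta = 1.0 - np.tanh(D) ** 2                    # diagonal of Theta
    adjust = gs @ (gx @ ((theta / R_diag) * (gx.T @ J1x)))
    W_dot = (-alpha * residual_grad(x, W_hat, Vc) * residual(x, W_hat, Vc)
             + 0.5 * beta * Pi * adjust)
    return W_dot, Pi

rng = np.random.default_rng(1)
Vc = rng.normal(size=(2, 5))
W0 = 0.5 * rng.normal(size=5)
x0 = np.array([1.0, -0.5])
W_dot, Pi = weight_rate(x0, W0, Vc)
W1 = W0 + 1e-3 * W_dot        # one explicit Euler step of the tuning law
```

In an online setting, this update would be integrated together with the closed-loop dynamics (22); the explicit Euler step here only illustrates how the two terms of (28) are combined.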

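Define $\tilde{W}_c = W_c - \hat{W}_c$. By using (25) and some algebraic manipulations, (27) is rewritten as

$$H(x, \hat{W}_c) = -\frac{1}{2}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T W_c + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c - \tilde{W}_c^T\nabla\sigma_c f(x) + \lambda^2\bar{R}\ln\!\big(I - \tanh^2(\hat{D})\big) - \lambda^2\bar{R}\ln\!\big(I - \tanh^2(D)\big) + \varepsilon_{HJB}. \tag{35}$$

For $\bar{R}\ln(I - \tanh^2(\hat{D}))$, based on $\bar{R} = [r_1, r_2, \ldots, r_m]$ and $\hat{D} = R^{-1}g^T(x)\nabla\sigma_c^T\hat{W}_c/2\lambda \in \mathbb{R}^m$, we have

$$\bar{R}\ln\!\big(I - \tanh^2(\hat{D})\big) = \sum_{i=1}^m r_i\ln\!\big(1 - \tanh^2(\hat{D}_i)\big) = \sum_{i=1}^m r_i\big(\ln 4 - 2\hat{D}_i - 2\ln(1 + e^{-2\hat{D}_i})\big), \tag{36}$$

where the term $\ln(1 + e^{-2\hat{D}_i})$ can be closely approximated as

$$\ln(1 + e^{-2\hat{D}_i}) = \begin{cases} \xi^1_{\hat{D}_i}, & \text{for } \hat{D}_i \ge 0, \\ -2\hat{D}_i + \xi^2_{\hat{D}_i}, & \text{for } \hat{D}_i < 0, \end{cases} \tag{37}$$

with $\xi^1_{\hat{D}_i} = \ln(1 + e^{-2\hat{D}_i})$ and $\xi^2_{\hat{D}_i} = \ln(1 + e^{2\hat{D}_i})$ being the approximation errors for $\hat{D}_i \ge 0$ and $\hat{D}_i < 0$, respectively. Considering the fact that $\|\xi^1_{\hat{D}_i}\| \le \ln 2$ and $\|\xi^2_{\hat{D}_i}\| < \ln 2$, substituting (37) into (36) yields

$$\bar{R}\ln\!\big(I - \tanh^2(\hat{D})\big) = \sum_{i=1}^m r_i\big(\ln 4 - 2\hat{D}_i\,\mathrm{sgn}(\hat{D}_i) + \xi_{\hat{D}_i}\big) = (\bar{R}I)\ln 4 - \frac{1}{\lambda}\hat{W}_c^T\nabla\sigma_c g(x)\,\mathrm{sgn}(\hat{D}) + \bar{R}\xi_{\hat{D}}, \tag{38}$$

where $\xi_{\hat{D}_i}$ is the bounded approximation error, i.e., $\|\xi_{\hat{D}_i}\| \le \ln 4$, $\mathrm{sgn}(\hat{D}) = [\mathrm{sgn}(\hat{D}_1), \mathrm{sgn}(\hat{D}_2), \ldots, \mathrm{sgn}(\hat{D}_m)]^T \in \mathbb{R}^m$ with $\mathrm{sgn}(\cdot)$ being the sign function, and $\xi_{\hat{D}} = [\xi_{\hat{D}_1}, \xi_{\hat{D}_2}, \ldots, \xi_{\hat{D}_m}]^T \in \mathbb{R}^m$ is also bounded. Similarly, for $\bar{R}\ln(I - \tanh^2(D))$, it is clear that

$$\bar{R}\ln\!\big(I - \tanh^2(D)\big) = (\bar{R}I)\ln 4 - \frac{1}{\lambda}W_c^T\nabla\sigma_c g(x)\,\mathrm{sgn}(D) + \bar{R}\xi_D. \tag{39}$$

Then, substituting (38) and (39) into (35), we derive

$$H(x, \hat{W}_c) = -\frac{1}{2}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T W_c + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c - \tilde{W}_c^T\nabla\sigma_c f(x) + \lambda W_c^T\nabla\sigma_c g(x)\,\mathrm{sgn}(D) - \lambda\hat{W}_c^T\nabla\sigma_c g(x)\,\mathrm{sgn}(\hat{D}) + \lambda^2\bar{R}(\xi_{\hat{D}} - \xi_D) + \varepsilon_{HJB}. \tag{40}$$

By adding and subtracting $\lambda W_c^T\nabla\sigma_c g(x)\,\mathrm{sgn}(\hat{D})$ on the right-hand side of (40), it can be rewritten as

$$H(x, \hat{W}_c) = -\frac{1}{2}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T W_c + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c - \tilde{W}_c^T\nabla\sigma_c f(x) + \lambda W_c^T\nabla\sigma_c g(x)\big(\mathrm{sgn}(D) - \mathrm{sgn}(\hat{D})\big) + \lambda\tilde{W}_c^T\nabla\sigma_c g(x)\,\mathrm{sgn}(\hat{D}) + \lambda^2\bar{R}(\xi_{\hat{D}} - \xi_D) + \varepsilon_{HJB}. \tag{41}$$

Considering (20) and (21), $\partial e_c/\partial\hat{W}_c$ in (28) can be represented as

$$\begin{aligned}
\frac{\partial e_c}{\partial\hat{W}_c} &= \nabla\sigma_c\Big(f(x) - \lambda g(x)\tanh(\Lambda) + \lambda g(x)\big(\tanh(\Lambda) - \tanh(\hat{D})\big) + \frac{\nabla\sigma_c^T\hat{W}_c}{2}\Big) \\
&= \nabla\sigma_c\Big(f(x) + g(x)u^* + d_1 - \frac{\nabla\sigma_c^T\tilde{W}_c}{2}\Big),
\end{aligned} \tag{42}$$

where $d_1 = \lambda g(x)(\tanh(\Lambda) - \tanh(\hat{D})) + \nabla\sigma_c^T W_c/2$. Further, based on (41) and (42), we obtain

$$H(x, \hat{W}_c) = \frac{1}{4}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c - \tilde{W}_c^T\nabla\sigma_c\Upsilon + \lambda W_c^T\nabla\sigma_c g(x)\big(\mathrm{sgn}(D) - \mathrm{sgn}(\hat{D})\big) + \lambda\tilde{W}_c^T\nabla\sigma_c g(x)\big(\mathrm{sgn}(\hat{D}) - \tanh(\hat{D})\big) + \lambda^2\bar{R}(\xi_{\hat{D}} - \xi_D) + \varepsilon_{HJB}, \tag{43}$$

where $\Upsilon = f(x) + g(x)u^* + d_1$.

Therefore, based on $\dot{\tilde{W}}_c = -\dot{\hat{W}}_c$, substituting (42) and (43) into (28), the dynamics of the critic NN weight estimation error is given by

$$\dot{\tilde{W}}_c = -\alpha\Big(\nabla\sigma_c\Upsilon - \frac{1}{2}\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c\Big)\Big(-\frac{1}{4}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c + \tilde{W}_c^T\nabla\sigma_c\Upsilon + \tilde{W}_c^T\nabla\sigma_c d_2 + d_3\Big) - \frac{\beta}{2}\Pi(x, \hat{u})\nabla\sigma_c\Sigma J_{1x}, \tag{44}$$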

ˆ − where Σ = g(x)ΘR−1 g T (x), d2 = λg(x)(tanh(D) ˆ and d3 = λWcT ∇σc g(x)(sgn(D) ˆ − sgn(D)) + sgn(D)) 7

ACCEPTED MANUSCRIPT ¯ D − ξ ˆ ) − εHJB . Since kg(x)k ≤ gM and the boundλ2 R(ξ D ness of Θ, Σ is upper bounded by a positive constant ΣM , i.e., kΣk ≤ ΣM . Note that d1 , d2 and d3 are bounded because all the terms in d1 , d2 and d3 are bounded under Assumption 1 and εDˆ and εHJB are bounded.

Besides, the persistence of excitation (PE) condition is required for tuning the critic NN weights; it ensures the exploration of the state space and is crucial for proper convergence of the critic NN weights to their ideal values. To satisfy the PE condition, a probing noise is added to the control input [22].

Next, the stability analysis will be performed by Lyapunov’s direct method. Before presenting our main theorems, the following assumptions are needed [29, 30].

Assumption 1
a. The input gain matrix $g(x)$ is known and bounded, i.e., $\|g(x)\| \le g_M$, where $g_M$ is a positive constant.
b. The NN reconstruction error and its gradient are bounded on a compact set containing $\Omega$ so that $\|\varepsilon_c\| \le \varepsilon_{cM}$ and $\|\nabla\varepsilon_c\| \le \varepsilon_{dM}$, with $\varepsilon_{cM}$ and $\varepsilon_{dM}$ being positive constants.
c. The NN activation functions and their gradients are bounded, i.e., $\|\sigma_c(\cdot)\| \le \sigma_{cM}$ and $\|\nabla\sigma_c(\cdot)\| \le \sigma_{dM}$ for positive constants $\sigma_{cM}$ and $\sigma_{dM}$.
d. The ideal NN weights $W_c$ and $V_c$ are upper bounded by known positive constants, i.e., $\|W_c\| \le W_M$ and $\|V_c\| \le V_M$.

Remark 5 These are standard assumptions, except for the rather strong assumption on $g(x)$ in Assumption 1a. Although Assumption 1a restricts the considered class of uncertain nonlinear systems, many physical systems, such as robotic systems and aircraft systems, fulfill such a property [22, 25]. Assumption 1c is satisfied, e.g., by sigmoids, tanh, and other standard NN activation functions.

Now, we present the main theorems, which not only reveal the optimality and stability of the proposed algorithm for the nominal system, but also show that the approximate optimal solution of the optimal control problem solved by single-network ADP is equivalent to the solution of the bounded robust control problem of the original uncertain system.

Theorem 2 Consider the nominal system (2) with the modified cost function (3). Let the control input be updated by (21) and the critic NN weight tuning law be given by (28). Then, the closed-loop system state $x$ and the weight estimation error of the critic NN $\tilde{W}_c$ are uniformly ultimately bounded (UUB). Furthermore, the obtained control input $\hat{u}$ in (21) can be proved to converge to the optimal control approximately, i.e., $\hat{u}$ is close to the optimal control $u^*$ with a finite bound given by (65).

Proof. Consider the following Lyapunov function candidate

$$J(t) = \frac{1}{2}\tilde{W}_c^T\tilde{W}_c + \beta J_1(x), \tag{45}$$

where $\beta > 0$ and $J_1(x)$ is given in Lemma 1. The time derivative of $J(t)$ along the system dynamics (22) is derived as

$$\dot{J}(t) = \tilde{W}_c^T\dot{\tilde{W}}_c + \beta J_{1x}^T\dot{x}. \tag{46}$$

Substituting (44) into (46), and applying some polynomial adjustments, we have

$$\dot{J} = -\alpha\Big[\frac{1}{8}(\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c)^2 - \frac{3}{4}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c\,\tilde{W}_c^T\nabla\sigma_c\Upsilon + \tilde{W}_c^T\nabla\sigma_c\Upsilon\,\tilde{W}_c^T\nabla\sigma_c d_2 + (\tilde{W}_c^T\nabla\sigma_c\Upsilon)^2 - \frac{1}{2}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c\,\tilde{W}_c^T\nabla\sigma_c d_2 + \tilde{W}_c^T\nabla\sigma_c\Upsilon d_3 - \frac{1}{2}\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c\, d_3\Big] - \frac{\beta}{2}\tilde{W}_c^T\Pi(x,\hat{u})\nabla\sigma_c\Sigma J_{1x} + \beta J_{1x}^T\dot{x}. \tag{47}$$

Completing the squares with respect to $\tilde{W}_c^T\nabla\sigma_c\nabla\sigma_c^T\tilde{W}_c$ and $\tilde{W}_c^T\nabla\sigma_c\Upsilon$ and taking the upper bound yields

$$\dot{J} \le -\frac{\alpha}{32}\|\tilde{W}_c^T\nabla\sigma_c\|^4 + \frac{9\alpha}{2}\|\tilde{W}_c^T\nabla\sigma_c\|^2\|\Upsilon\|^2 + \frac{5\alpha}{2}d_3^2 + \frac{5\alpha}{2}\|\tilde{W}_c^T\nabla\sigma_c\|^2\|d_2\|^2 - \frac{\beta}{2}\tilde{W}_c^T\Pi(x,\hat{u})\nabla\sigma_c\Sigma J_{1x} + \beta J_{1x}^T\dot{x}. \tag{48}$$

Considering the term $\|\tilde{W}_c^T\nabla\sigma_c\|^2\|\Upsilon\|^2$ and the term $\|\tilde{W}_c^T\nabla\sigma_c\|^2\|d_2\|^2$, the following relations hold:

$$\|\tilde{W}_c^T\nabla\sigma_c\|^2\|\Upsilon\|^2 \le \frac{1}{2m_1^2}\|\tilde{W}_c^T\nabla\sigma_c\|^4 + \frac{m_1^2}{2}\|\Upsilon\|^4, \qquad \|\tilde{W}_c^T\nabla\sigma_c\|^2\|d_2\|^2 \le \frac{1}{2m_2^2}\|\tilde{W}_c^T\nabla\sigma_c\|^4 + \frac{m_2^2}{2}\|d_2\|^4, \tag{49}$$

with $m_1$ and $m_2$ being nonzero constants chosen for the design purpose. Then, using the relations (49), (48) is rewritten as

$$\dot{J} \le -\Big(\frac{1}{32} - \frac{9}{4m_1^2} - \frac{5}{4m_2^2}\Big)\alpha\|\tilde{W}_c^T\nabla\sigma_c\|^4 + \frac{9\alpha m_1^2}{4}\|\Upsilon\|^4 + \frac{5\alpha m_2^2}{4}\|d_2\|^4 + \frac{5\alpha}{2}d_3^2 - \frac{\beta}{2}\tilde{W}_c^T\Pi(x,\hat{u})\nabla\sigma_c\Sigma J_{1x} + \beta J_{1x}^T\dot{x}, \tag{50}$$

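where $m_1$ and $m_2$ are chosen to satisfy $1/32 - 9/(4m_1^2) - 5/(4m_2^2) > 0$. According to Lemma 1, for $\Upsilon = f + g(x)u^* + d_1$, the relation $\|\Upsilon\| \le \vartheta(x) + \|d_1\|$ holds. Furthermore, by the Cauchy-Schwarz inequality and the given assumption $\vartheta(x) = \sqrt[4]{\kappa\|J_{1x}\|}$, we have

$$\dot{J} \le -\delta\alpha\sigma_{dm}^4\|\tilde{W}_c\|^4 + 18\alpha m_1^2\kappa\|J_{1x}\| + \alpha\Phi_1 - \frac{\beta}{2}\tilde{W}_c^T\Pi(x,\hat{u})\nabla\sigma_c\Sigma J_{1x} + \beta J_{1x}^T\dot{x}, \tag{51}$$

with $0 < \sigma_{dm} \le \|\nabla\sigma_c\|$ ensured by $\|x\| > 0$ for a constant $\sigma_{dm}$, $\delta = 1/32 - 9/(4m_1^2) - 5/(4m_2^2) > 0$, and $\Phi_1 = 18m_1^2\|d_1\|^4 + 5m_2^2\|d_2\|^4/4 + 5d_3^2/2$ is bounded.

Next, for $\Pi(x,\hat{u}) = 0$, it is clear that $\dot{J}_1(x) < 0$, i.e., $J_{1x}^T\dot{x} \le 0$. Since $\|x\| > 0$ is guaranteed by the persistence of excitation condition, there exists a constant $\dot{x}_{\min}$ satisfying $0 < \dot{x}_{\min} \le \|\dot{x}\|$; then, (51) becomes

$$\dot{J} \le -\delta\alpha\sigma_{dm}^4\|\tilde{W}_c\|^4 - (\beta\dot{x}_{\min} - 18\alpha m_1^2\kappa)\|J_{1x}\| + \alpha\Phi_1. \tag{52}$$

Considering (52), if $\alpha$ and $\beta$ are selected to satisfy $\beta/\alpha > 18m_1^2\kappa/\dot{x}_{\min}$ and one of the following inequalities

$$\|J_{1x}\| > \frac{\alpha\Phi_1}{\beta\dot{x}_{\min} - 18\alpha m_1^2\kappa} = A_1 \tag{53}$$

or

$$\|\tilde{W}_c\| > \frac{1}{\sigma_{dm}}\sqrt[4]{\frac{\Phi_1}{\delta}} = B_1 \tag{54}$$

holds, then $\dot{J}$ is negative definite, i.e., $\dot{J} < 0$.

For $\Pi(x,\hat{u}) = 1$, by adding and subtracting $\beta J_{1x}^T\lambda g(x)\tanh(\Lambda)$ on the right-hand side of (51), we obtain

$$\dot{J} \le -\delta\alpha\sigma_{dm}^4\|\tilde{W}_c\|^4 + 18\alpha m_1^2\kappa\|J_{1x}\| - \frac{\beta}{2}\tilde{W}_c^T\nabla\sigma_c\Sigma J_{1x} + \beta J_{1x}^T(f(x) + g(x)u^*) + \alpha\Phi_1 + \beta J_{1x}^T d_4, \tag{55}$$

where $d_4 = \lambda g(x)(\tanh(\Lambda) - \tanh(\hat{D}))$ is bounded. According to (31) in Lemma 1, and by completing the squares, (55) becomes

$$\dot{J} \le -\delta\alpha\sigma_{dm}^4\|\tilde{W}_c\|^4 + \frac{\beta}{4\lambda_{\min}(\Psi)}\sigma_{dM}^2\Sigma_M^2\|\tilde{W}_c\|^2 - \frac{\beta}{4}\lambda_{\min}(\Psi)\|J_{1x}\|^2 + \Phi_2, \tag{56}$$

where $\Phi_2 = (18\alpha m_1^2\kappa)^2/(\beta\lambda_{\min}(\Psi)) + \beta\|d_4\|^2/\lambda_{\min}(\Psi) + \alpha\Phi_1$, and $\lambda_{\min}(\Psi)$ is the minimum eigenvalue of $\Psi(x)$.

Then, completing the square with respect to $\|\tilde{W}_c\|^2$ yields

$$\dot{J} \le -\frac{\delta\alpha}{2}\sigma_{dm}^4\|\tilde{W}_c\|^4 - \frac{\beta}{4}\lambda_{\min}(\Psi)\|J_{1x}\|^2 + \frac{\Phi_3}{32\delta\alpha\sigma_{dm}^4\lambda_{\min}^2(\Psi)}, \tag{57}$$

where $\Phi_3 = \beta^2\sigma_{dM}^4\Sigma_M^4 + 32\delta\alpha\sigma_{dm}^4\lambda_{\min}^2(\Psi)\Phi_2$. Thus, as long as one of the following inequalities

$$\|J_{1x}\| \ge \frac{1}{2\sigma_{dm}^2\lambda_{\min}(\Psi)}\sqrt{\frac{\Phi_3}{2\delta\alpha\beta\lambda_{\min}(\Psi)}} = A_2 \tag{58}$$

or

$$\|\tilde{W}_c\| \ge \frac{1}{2\sigma_{dm}^2}\sqrt[4]{\frac{\Phi_3}{\delta^2\alpha^2\lambda_{\min}^2(\Psi)}} = B_2 \tag{59}$$

holds, it is guaranteed that $\dot{J} < 0$.

In summary, for the two cases $\Pi(x,\hat{u}) = 0$ and $\Pi(x,\hat{u}) = 1$, with $\alpha$ and $\beta$ satisfying $\beta/\alpha > 18m_1^2\kappa/\dot{x}_{\min}$, if the inequality $\|J_{1x}\| \ge \max\{A_1, A_2\} = \bar{A}$ or $\|\tilde{W}_c\| \ge \max\{B_1, B_2\} = \bar{B}$ holds, then $\dot{J} < 0$. Thus, by the Lyapunov Extension Theorem [8], we can draw the conclusion that the system state $x$ and the critic NN weight estimation error $\tilde{W}_c$ are UUB.

Furthermore, we prove that $\|\hat{u} - u^*\|$ is bounded by a finite constant. Since $\tanh(\cdot)$ is a continuously differentiable function, expanding $\tanh(\Lambda)$ using a Taylor series about the operating point $\hat{D}$ yields

$$\tanh(\Lambda) = \tanh(\hat{D}) + \Theta(\Lambda - \hat{D}) + O((\Lambda - \hat{D})^2), \tag{60}$$

where $\Theta = \mathrm{diag}\{1 - \tanh^2(\hat{D}_i)\}$, $i = 1, 2, \ldots, m$, and $O((\Lambda - \hat{D})^2)$ denotes the higher-order terms of the Taylor series. Substituting $\Lambda = R^{-1}g^T(x)(\nabla\sigma_c^T W_c + \nabla\varepsilon_c)/2\lambda$ and $\hat{D} = R^{-1}g^T(x)\nabla\sigma_c^T\hat{W}_c/2\lambda$ into (60), we have

$$O((\Lambda - \hat{D})^2) = \tanh(\Lambda) - \tanh(\hat{D}) - \frac{1}{2\lambda}\Theta R^{-1}g^T(x)\nabla\sigma_c^T\tilde{W}_c - \frac{1}{2\lambda}\Theta R^{-1}g^T(x)\nabla\varepsilon_c. \tag{61}$$

Recalling $\|\tanh(\cdot)\| \le 1$, $\|g(x)\| \le g_M$, $\|\nabla\varepsilon_c\| \le \varepsilon_{dM}$ and the boundedness of $\Theta$ and $R$, it is observed from (61) that $O((\Lambda - \hat{D})^2)$ is bounded by

$$\|O((\Lambda - \hat{D})^2)\| \le \theta_1\|\tilde{W}_c\| + \theta_2, \tag{62}$$

where $\theta_1$, $\theta_2$ are computable positive constants. By using (20) and (21), and combining (60), $\hat{u} - u^*$ is given by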

$$\hat{u} - u^* = \lambda(\tanh(\Lambda) - \tanh(\hat{D})) = \frac{1}{2}\Theta R^{-1}g^T(x)(\nabla\sigma_c^T\tilde{W}_c + \nabla\varepsilon_c) + \lambda O((\Lambda - \hat{D})^2). \tag{63}$$

Based on (62) and the assumption $\|\Theta\| \le \Theta_M$, we obtain

$$\|\hat{u} - u^*\| \le \Gamma_1\|\tilde{W}_c\| + \Gamma_2, \tag{64}$$

where $\Gamma_1 = \Theta_M\|R^{-1}\|g_M\sigma_{dM}/2 + \lambda\theta_1$ and $\Gamma_2 = \Theta_M\|R^{-1}\|g_M\varepsilon_{dM}/2 + \lambda\theta_2$. According to the above analysis, whenever $\|\tilde{W}_c\|$ exceeds the bound, i.e., $\|\tilde{W}_c\| > \bar{B}$, we have $\dot{J} < 0$, so the critic NN weight estimation error $\tilde{W}_c$ is ultimately bounded by $\bar{B}$. Further, by (64), we can conclude

$$\|\hat{u} - u^*\| \le \Gamma_1\bar{B} + \Gamma_2. \tag{65}$$

Therefore, the proof is completed.

In the next theorem, the equivalence of the approximate solution of the optimal control problem and the solution of the bounded robust control problem is proved.

Theorem 3 Assume that the approximate HJB solution to the optimal control problem of the nominal system is obtained by single-network ADP. Then, the obtained optimal control $\hat{u}$ defined by (21) ensures the closed-loop asymptotic stability of the original uncertain system if the additive modified term and the nonquadratic utility of the cost function are defined by (11) and (4), respectively.

Proof. By single-network ADP, the approximate solution of the HJB equation, i.e., $\hat{V}(x)$, and the obtained optimal control $\hat{u}$ are given by (17) and (21). Now, we show that with this control $\hat{u}$, the closed-loop system remains asymptotically stable under the perturbation $\Delta f(x)$. Considering (17) and the selection of $\sigma(\cdot)$, we have $\hat{V}(x) > 0$, $\forall x \ne 0$ and $\hat{V}(0) = 0$. Then, it is easy to prove $\dot{\hat{V}}(x) \le -x^T Q x < 0$ for any $x \ne 0$ similarly to the proof of Theorem 1 by replacing $V^*(x)$ with $\hat{V}(x)$.

Based on Theorem 3, it can be concluded that instead of solving the robust control problem directly, we can solve the optimal control problem of the nominal system by single-network ADP to obtain the bounded robust control. It is also shown that this optimization procedure can produce the bounded robust control that stabilizes the original system with nonlinear uncertainties.

5. Simulation results

In this section, four examples are provided to demonstrate the effectiveness of the proposed design method of bounded robust controller.

5.1. Example 1

Consider the uncertain nonlinear mass-spring system [2]

$$\dot{x}(t) = f(x) + g(x)u + \Delta f(x), \tag{66}$$

where

$$f(x) = \begin{bmatrix} x_2 \\ -0.01x_1 - 0.67x_1^3 \end{bmatrix}, \quad g(x) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \Delta f(x) = \begin{bmatrix} 0 \\ p_1 x_2 \sin x_1 \end{bmatrix},$$

$x = [x_1, x_2]^T \in \mathbb{R}^2$ is the system state, $u \in U = \{u \in \mathbb{R} : |u| \le 0.5\}$ is the control input, and $p_1$ is the unknown parameter. The term $\Delta f(x)$ represents the unknown perturbation in the system. Let the initial state be $x_0 = [1, -1]^T$ and assume the parameter $p_1 \in [-1, 1]$ in the simulation. With regard to $\Delta f(x)$, we choose $f_{\max}^2(x) = x_2^2$ such that $\|\Delta f(x)\|^2 \le f_{\max}^2(x)$.

At first, for the transformation of the bounded robust control problem into an optimal control problem of the nominal system (i.e., $\dot{x}(t) = f(x) + g(x)u$), the modified cost function with nonquadratic utility is defined by (3), where the state and control penalties are $Q = \mathrm{diag}(1, 1)$ and $R = 1$, respectively, and the nonquadratic utility is $W(u) = 4\int_0^u \tanh^{-T}(2\upsilon)\,d\upsilon$ due to $|u| \le 0.5$. Then, an optimal learning algorithm is developed for solving the optimal control problem by single-network ADP, where a single critic NN is used to approximate the cost function. The activation function of the critic NN is selected with $k = 3$ neurons as $\sigma_c(x) = [x_1^2, x_1x_2, x_2^2]^T$, and the critic NN weight is $\hat{W}_c = [W_{c1}, W_{c2}, W_{c3}]^T$. The learning rates for the critic NN are chosen as $\alpha = 0.5$ and $\beta = 1$, and the Lyapunov function candidate $J_1(x)$ is selected as a quadratic polynomial. Moreover, all the weights of $\hat{W}_c$ are initialized to zero, which means that no initial stabilizing control is needed for implementing the proposed algorithm.

During the implementation of the present ADP algorithm, the PE condition needs to be guaranteed by adding a probing noise into the control input [22]. A damped exponential probing noise consisting of sinusoids of different frequencies is added to the control input for the first 1150 s. After the learning phase, the weights of the critic NN converge to $W_c = [0.0441, 0.7827, 1.3501]^T$ as shown in Fig. 1. It is observed from Fig. 1 that the critic NN weights have converged after 1150 s. This shows that the added probing noise effectively guarantees the PE condition. On convergence, the PE condition is no longer needed, and the probing noise is thus removed. Fig. 2 presents the evolution of the system states. From Fig. 2, it is clear that the system states remain very close to zero after the probing noise is turned off.
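For Example 1, the nonquadratic utility $W(u) = 4\int_0^u \tanh^{-T}(2\upsilon)\,d\upsilon$ with scalar $u$ can be evaluated in closed form, since integrating $\tanh^{-1}(2\upsilon)$ by parts gives $W(u) = 4u\tanh^{-1}(2u) + \ln(1 - 4u^2)$ for $|u| < 0.5$. The sketch below compares this closed form with a Simpson quadrature; the function names and the quadrature resolution are illustrative assumptions, not part of the paper.

```python
import math

def utility_closed_form(u):
    """W(u) = 4*u*atanh(2u) + ln(1 - 4u^2), valid for |u| < 0.5 (scalar u, R = 1)."""
    return 4.0 * u * math.atanh(2.0 * u) + math.log(1.0 - 4.0 * u * u)

def utility_quadrature(u, n=2000):
    """Composite Simpson approximation of 4 * int_0^u atanh(2v) dv (n must be even)."""
    h = u / n
    total = math.atanh(0.0) + math.atanh(2.0 * u)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * math.atanh(2.0 * k * h)
    return 4.0 * total * h / 3.0

for u in (0.1, 0.3, 0.4, -0.3):
    print(u, utility_closed_form(u), utility_quadrature(u))
```

The utility is even in $u$, behaves like $4u^2$ for small $|u|$, and grows steeply as $|u|$ approaches the bound 0.5, which is how the cost penalizes control effort near saturation.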

Based on the converged weights of the critic NN, the bounded robust control of the original uncertain system, that is, the optimal control of the nominal system, can be obtained by using (21). For assessing the robust performance of the obtained control, $p_1$ is selected as a time-varying parameter, i.e., $p_1 = \sin(t)$, in the perturbation. Fig. 3 presents the state trajectories of the original system under the action of the obtained robust control, which is shown in Fig. 4. It can be seen from Figs. 3 and 4 that the approximate optimal control makes the original uncertain system asymptotically stable and satisfies the given control constraints, which validates our proposed approach.

Fig. 1. Convergence curves of critic NN weights. [Plot: weights of the critic NN $W_{c1}$, $W_{c2}$, $W_{c3}$ versus time (s).]

Fig. 2. Evolution of nominal system states during learning process. [Plot: system states $x_1$, $x_2$ versus time (s).]

Fig. 3. The trajectories of original system states. [Plot: system states $x_1$, $x_2$ versus time (s).]

Fig. 4. The bounded robust control. [Plot: the control $u$ versus time (s).]

5.2. Example 2

Consider the uncertain nonlinear continuous-time system derived from [22]

$$\dot{x}(t) = f(x) + g(x)u + \Delta f(x), \tag{67}$$

where

$$f(x) = \begin{bmatrix} -x_1 + x_2 \\ -0.5x_1 - 0.5x_2(1 - (\cos(2x_1) + 2)^2) \end{bmatrix}, \quad g(x) = \begin{bmatrix} 0 \\ \cos(2x_1) + 2 \end{bmatrix}, \quad \Delta f(x) = \begin{bmatrix} p_1 x_2 \sin x_1 \\ p_2 x_1 \cos x_2 \end{bmatrix},$$

with the state $x = [x_1, x_2]^T \in \mathbb{R}^2$, the control $u \in U = \{u \in \mathbb{R} : |u| \le 1\}$, and $p_1$, $p_2$ being the unknown parameters in the perturbation $\Delta f(x)$. Let the initial state be $x_0 = [0.5, -0.5]^T$. Since the unknown parameters satisfy $p_i \in [-1, 1]$, $i = 1, 2$, in the simulation, we choose $f_{\max}^2(x) = x_1^2 + x_2^2$ such that $\|\Delta f(x)\|^2 \le f_{\max}^2(x)$.

The nominal system is $\dot{x}(t) = f(x) + g(x)u$ with $f(x)$ and $g(x)$ given in (67). The corresponding cost function is defined as (3), where $Q = \mathrm{diag}(1, 1)$, $R = 1$ and $W(u) = 2\int_0^u \tanh^{-T}(\upsilon)\,d\upsilon$. The activation function of the critic NN is selected as

$$\sigma_c(x) = [x_1^2,\ x_1x_2,\ x_2^2,\ x_1^4,\ x_1^3x_2,\ x_1^2x_2^2,\ x_1x_2^3,\ x_2^4]^T,$$

which results in eight neurons and the critic NN weight $\hat{W}_c = [W_{c1}, W_{c2}, \ldots, W_{c8}]^T$. Note that the contributions of different inputs $(x_1, x_2)$ in the basis functions of $\sigma_c(x)$ are selected mainly through trial and error such that the NN error is small. Let the learning rates for the critic NN be $\alpha = 0.3$ and $\beta = 1$. For the additional term in (28), the Lyapunov function candidate $J_1(x)$ is also selected as a quadratic polynomial, i.e., $J_1(x) = x^Tx/2$. Moreover, the weights of $\hat{W}_c$ are also initialized to zero, so there is no need for an initial stabilizing control.

Similar to Example 1, the same probing noise is added to the control input for the first 850 s. After the learning phase, the weights of the critic NN converge to $W_c = [0.0811, -0.1851, 0.9828, 0.0337, -0.0393, 0.1261, -0.1386, 0.4972]^T$ as shown in Fig. 5, while the evolution of the system states is depicted in Fig. 6. From Figs. 5 and 6, it is clear that the critic NN weights have converged after 850 s, and the system states converge to zero after the probing noise is turned off. Based on the converged weights of the critic NN, the bounded robust control, namely, the approximate optimal control, is obtained by (21). For assessing the robust performance of the obtained control, both constant and time-varying values are chosen for the unknown parameters $(p_1, p_2)$ in the perturbation $\Delta f(x)$.

Fig. 5. Convergence curves of critic NN weights. [Plot: weights of the critic NN $W_{c1}, \ldots, W_{c8}$ versus time (s).]

Fig. 6. Evolution of nominal system states during learning process. [Plot: system states $x_1$, $x_2$ versus time (s).]

Fig. 7. The trajectories of original system states. [Plot: system states $x_1$, $x_2$ versus time (s).]

Fig. 8. The bounded robust control. [Plot: the control $u$ versus time (s).]

For the constant parameters ($p_1 = -1$, $p_2 = 1$), the approximate optimal control computed from (21) is applied to the original uncertain system. Fig. 7 shows the system state trajectories, while Fig. 8 depicts the control input. It can be seen from Figs. 7 and 8 that, under the action of the approximate optimal control, the system states converge to the equilibrium point. In other words, the approximate optimal control computed from (21) is the bounded robust control of the original system and can globally asymptotically stabilize the uncertain system under the perturbation without exceeding any control constraint.
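The constant-parameter test of Example 2 can be sketched in a few lines. The code below assumes that the control law (21) has the form $\hat{u} = -\lambda\tanh(R^{-1}g^T(x)\nabla\sigma_c^T\hat{W}_c/2\lambda)$ with $\lambda = 1$ and $R = 1$, as suggested by the definition of $\hat{D}$ and by (63); the forward-Euler integrator, the step size and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

# Converged critic weights reported for Example 2.
W = [0.0811, -0.1851, 0.9828, 0.0337, -0.0393, 0.1261, -0.1386, 0.4972]
LAM = 1.0  # saturation bound |u| <= 1, with R = 1

def dV_dx2(x1, x2):
    """d/dx2 of W^T sigma_c(x) for the eight-term polynomial basis of Example 2."""
    return (W[1] * x1 + 2 * W[2] * x2 + W[4] * x1**3
            + 2 * W[5] * x1**2 * x2 + 3 * W[6] * x1 * x2**2 + 4 * W[7] * x2**3)

def control(x1, x2):
    """Assumed form of (21): u = -lam * tanh(g^T grad V / (2 lam)), g = [0, c]^T."""
    c = math.cos(2.0 * x1) + 2.0
    return -LAM * math.tanh(c * dV_dx2(x1, x2) / (2.0 * LAM))

def simulate(x0=(0.5, -0.5), p1=-1.0, p2=1.0, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of the perturbed system (67) under control()."""
    x1, x2 = x0
    states, controls = [(x1, x2)], []
    for _ in range(int(round(t_end / dt))):
        u = control(x1, x2)
        c = math.cos(2.0 * x1) + 2.0
        dx1 = -x1 + x2 + p1 * x2 * math.sin(x1)
        dx2 = -0.5 * x1 - 0.5 * x2 * (1.0 - c * c) + c * u + p2 * x1 * math.cos(x2)
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        states.append((x1, x2))
        controls.append(u)
    return states, controls

states, controls = simulate()
print("final state:", states[-1], " max |u|:", max(abs(u) for u in controls))
```

Because the control passes through $\tanh$, the constraint $|u| \le 1$ holds by construction at every step; the state trajectory produced by this sketch can be compared qualitatively with Figs. 7 and 8.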

For the time-varying parameters ($p_1 = \cos(t)$, $p_2 = -\sin(1.5\pi t)$), using the same weights of the critic NN and (21), we apply the obtained control to the original uncertain system. From Figs. 9 and 10, it is clear that the obtained bounded robust control makes the original system asymptotically stable and satisfies the given control constraints.

In Figs. 7–10, two specific cases for the parameters $(p_1, p_2)$ are simulated to illustrate the boundedness of the control input and the convergence of the system state to the equilibrium point under the action of the approximate optimal control of the nominal system. From these results, we can further find that each set of perturbation parameters $(p_1, p_2)$ leads to a new state trajectory. Without loss of generality, the perturbation parameters can be chosen randomly within their boundaries. Therefore, we can conclude that under the bounded robust control, namely, the approximate optimal control of the nominal system, the system states converge asymptotically regardless of the values of the perturbation parameters within their boundaries. These results show the applicability of the proposed approach for the design of the bounded robust controller.

Fig. 9. The trajectories of original system states. [Plot: system states $x_1$, $x_2$ versus time (s).]

Fig. 10. The bounded robust control. [Plot: the control $u$ versus time (s).]

5.3. Example 3

To further illustrate the usefulness of the proposed approach, a single link robot arm is considered [6], in which the dynamic equation is given by

$$\ddot{\theta}(t) = -\frac{MgL}{J}\sin(\theta(t)) - \frac{D}{J}\dot{\theta}(t) + \frac{1}{J}u(t) + \frac{1}{J}w(t), \tag{68}$$

where $\theta(t)$ is the angle position of the arm, $u(t)$ is the control input and $w(t)$ is the external disturbance. $M$ is the mass of the payload, $J$ is the moment of inertia, $g$ is the acceleration of gravity, $L$ is the length of the arm and $D$ is the viscous friction. The values of the parameters $g$, $D$ and $L$ are given by $g = 9.81$, $D = 2$ and $L = 0.5$, respectively.

Assuming $x_1(t) = \theta(t)$ and $x_2(t) = \dot{\theta}(t)$, the dynamic equation (68) can be rewritten as

$$\dot{x}_1(t) = x_2(t), \qquad \dot{x}_2(t) = -\frac{2}{J}x_2(t) - \frac{4.905M}{J}\sin(x_1(t)) + \frac{1}{J}u(t) + \frac{1}{J}w(t), \tag{69}$$

where $x = [x_1, x_2]^T$ is the system state, $u$ is the control input with the saturation limit $|u| \le 1$, and $\Delta f(x) = [0, w(t)/J]^T$ is the perturbation term. In the simulation, we assume the external disturbance to be $w(t) = x_2(t)\sin(x_1(t))$ and choose the parameters $M$ and $J$ as $M = 1$ and $J = 1$. Since the perturbation term is $\Delta f(x) = [0, x_2\sin(x_1)]^T$, we choose $f_{\max}^2(x) = x_2^2$ such that $\|\Delta f(x)\|^2 \le f_{\max}^2(x)$. The initial state is set as $x(0) = [1, -0.5]^T$.
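The state-space form (69) of the single link robot arm and the perturbation bound $\|\Delta f(x)\|^2 \le x_2^2$ can be written down and spot-checked as follows. This is only a sketch under the parameter values stated for Example 3 ($g = 9.81$, $D = 2$, $L = 0.5$, $M = J = 1$); the function names and the random sampling range are illustrative assumptions.

```python
import math
import random

G, D, L = 9.81, 2.0, 0.5   # gravity, viscous friction, arm length (values from the text)
M, J = 1.0, 1.0            # payload mass and moment of inertia used in the simulation

def f(x1, x2):
    """Nominal drift of (69): [x2, -(D/J) x2 - (M g L / J) sin(x1)]."""
    return (x2, -(D / J) * x2 - (M * G * L / J) * math.sin(x1))

def g_vec():
    """Input gain of (69): [0, 1/J]."""
    return (0.0, 1.0 / J)

def delta_f(x1, x2):
    """Perturbation [0, w/J] with the assumed disturbance w = x2 * sin(x1)."""
    return (0.0, x2 * math.sin(x1) / J)

def f_max_sq(x1, x2):
    """Chosen bound f_max^2(x) = x2^2."""
    return x2 * x2

# Spot-check ||delta_f(x)||^2 <= f_max^2(x) on random states (sampling box is arbitrary).
random.seed(0)
points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
bound_ok = all(sum(v * v for v in delta_f(a, b)) <= f_max_sq(a, b) + 1e-12
               for a, b in points)
print("M*g*L =", M * G * L, " bound holds on samples:", bound_ok)
```

The bound holds for every state because $\sin^2(x_1) \le 1$; the sampling merely illustrates it.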

Fig. 11. Convergence curves of critic NN weights. [Plot: weights of the critic NN $W_{c1}, \ldots, W_{c6}$ versus time (s).]

The nominal system is derived by removing the perturbation term $\Delta f(x)$ from the original system. The corresponding cost function with nonquadratic utility is also defined as (3), where $Q = \mathrm{diag}(1, 1)$, $R = 1$ and $W(u) = 2\int_0^u \tanh^{-T}(\upsilon)\,d\upsilon$ due to $|u| \le 1$. The activation function of the critic NN is selected as $\sigma_c(x) = [x_1^2, x_1x_2, x_2^2, x_1^3x_2, x_1^2x_2^2, x_1x_2^3]^T$, which results in six neurons and the critic NN weight $\hat{W}_c = [W_{c1}, W_{c2}, \ldots, W_{c6}]^T$. The learning rates for the critic NN are selected as $\alpha = 0.1$ and $\beta = 1$, and $J_1(x)$ is chosen as a quadratic polynomial. Similar to Example 1, a probing noise is added to satisfy the PE condition during the implementation. Moreover, the weights of $\hat{W}_c$ are also initialized to zero, so there is no need for an initial stabilizing control. After a sufficient learning phase, the weights of the critic NN converge to $W_c = [0.1811, -0.5004, 0.7707, -0.1628, 0.1584, -0.1413]^T$ as shown in Fig. 11. Then, using the converged weights of the critic NN, the bounded robust control, namely, the optimal control of the nominal system, can be computed by (21). When applying the obtained robust control to the single link robot arm system (68) for 10 s, the state trajectories are shown in Fig. 12 and the control input is depicted in Fig. 13. It can be seen from Figs. 12 and 13 that the obtained robust control makes the system state variables reach the equilibrium point without exceeding the saturation limit. These simulation results demonstrate the effectiveness and feasibility of the proposed approach.

Fig. 12. The state trajectories of the single link robot arm system. [Plot: system states $x_1$, $x_2$ versus time (s).]

Fig. 13. The bounded robust control. [Plot: the control $u$ versus time (s).]

5.4. Example 4

Consider the third-order nonlinear uncertain system derived from [32]

$$\dot{x}(t) = \begin{bmatrix} -x_1 + x_2 \\ 0.1x_1 - x_2 - x_1x_3 \\ x_1x_2 - x_3 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} u + \Delta f(x), \tag{70}$$

where the state $x = [x_1, x_2, x_3]^T \in \mathbb{R}^3$, the control $u \in U = \{u \in \mathbb{R} : |u| \le 0.5\}$, the perturbation term $\Delta f(x) = [0, 0, px_1\sin x_2\cos x_3]^T$ and the parameter $p \in [-1, 1]$. Let the initial state be $x_0 = [1, -1, 0.5]^T$. With regard to $\Delta f(x)$, we choose $f_{\max}^2(x) = x_1^2\sin^2 x_2\cos^2 x_3$ such that $\|\Delta f(x)\|^2 \le f_{\max}^2(x)$.

Without the perturbation term $\Delta f(x)$, we can obtain the nominal system from (70). The corresponding cost function is defined by (3), where $Q = \mathrm{diag}(1, 1, 1)$, $R = 1$ and $W(u) = 4\int_0^u \tanh^{-T}(2\upsilon)\,d\upsilon$ due to $|u| \le 0.5$. The activation function of the critic NN is selected as

$$\sigma_c(x) = [x_1^2, x_2^2, x_3^2, x_1x_2, x_1x_3, x_2x_3, x_1^4, x_2^4, x_3^4, x_1^2x_2^2, x_1^2x_3^2, x_2^2x_3^2, x_1^2x_2x_3, x_1x_2^2x_3, x_1x_2x_3^2, x_1^3x_2, x_1^3x_3, x_2^3x_3, x_2^3x_1, x_1x_3^3, x_2x_3^3]^T,$$

which results in 21 neurons and the critic NN weight $\hat{W}_c = [W_{c1}, W_{c2}, \ldots, W_{c21}]^T$. Let the learning rates for the critic NN and the Lyapunov function candidate be $\alpha = 0.2$, $\beta = 1$ and $J_1(x) = x^Tx/2$, respectively. Moreover, all the weights of $\hat{W}_c$ are initialized to zero, and similar to Example 1, a probing noise is added into the control input for guaranteeing the PE condition. After a sufficient learning phase, the weights of the critic NN converge to $W_c = [0.0434, 0.1963, 0.0343, 0.7243, -0.0350, 0.0161, 0.0808, 0.2606, 0.0069, -0.0645, 0.0476, 0.0573, -0.0374, 0.0134, -0.0126, 0.1307, 0.0263, 0.0245, -0.0445, 0.0391, 0.0226]^T$ as shown in Fig. 14. Based on the converged weights of the critic NN, the bounded robust control of the original uncertain system, namely, the optimal control of the nominal system, can be obtained by using (21). In the simulation, the scalar parameter $p = -1$ is chosen for evaluating the robust control performance. The state trajectories are shown in Fig. 15 when applying the obtained robust control to system (70) for 10 s. Fig. 16 depicts the control input. It can be seen from Figs. 15 and 16 that the obtained robust control makes the original uncertain system asymptotically stable and satisfies the given control constraints. The results validate the proposed approach.
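The basis sizes used in Examples 2 and 4 (8 and 21 neurons) coincide with the number of monomials of total degree 2 and 4 in two and three state variables, respectively. The sketch below enumerates such monomials to show how quickly a full polynomial basis of this kind grows with the state dimension; the helper name and the choice of degrees are assumptions for illustration, and Examples 1 and 3 use smaller, hand-picked subsets.

```python
from itertools import combinations_with_replacement
from math import comb

def even_degree_monomials(n_states, degrees=(2, 4)):
    """Exponent tuples of all monomials in n_states variables with the given total degrees."""
    basis = []
    for d in degrees:
        for combo in combinations_with_replacement(range(n_states), d):
            exps = [0] * n_states
            for idx in combo:
                exps[idx] += 1          # one more power of variable idx
            basis.append(tuple(exps))
    return basis

for n in (2, 3, 4, 6):
    print(n, "states ->", len(even_degree_monomials(n)), "basis functions")
```

For $n = 2$ and $n = 3$ this gives 8 and 21 basis functions, matching the critic networks of Examples 2 and 4.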

6. Conclusion

In this paper, we propose an effective optimal approach to design the bounded robust controller for uncertain nonlinear systems subject to input constraints. The bounded robust control problem is transformed into the corresponding optimal control problem of the nominal system by a modified cost function with nonquadratic utility, and the equivalence of the transformation is proved. Then, based on single-network ADP, an optimal learning algorithm is developed to obtain the optimal control of the nominal system. Uniform ultimate boundedness of the closed-loop system is guaranteed by Lyapunov’s direct method during the learning process. Four simulation examples are presented to reinforce the theoretical results as well.

Furthermore, it is noted that the number of basis functions for the critic NN used in the design tends to increase exponentially with the dimension of the input space, which brings great difficulty to the construction of the critic NN and can even make the critic NN practically infeasible when the dimension of the input space is high. To a certain extent, this limits the application of the developed approach to higher-order nonlinear systems in practice. In our future work, we will try to overcome this limitation by means of other advanced NNs and intelligent algorithms. Additionally, how to extend the current results to more general nonlinear systems, such as nonaffine nonlinear systems, is another interesting direction of future research for more practical applications.

Fig. 14. Convergence curves of critic NN weights. [Plot: weights of the critic NN $W_{c1}, \ldots, W_{c21}$ versus time (s).]

Fig. 15. The trajectories of original system states. [Plot: system states $x_1$, $x_2$, $x_3$ versus time (s).]

[4] X.Y. Lu, S.K. Spurgeon, Robust sliding mode control of uncertain nonlinear systems, Syst. Contr. Lett. 32(2) (1997) 75–90.

u

[5] X. Zhao, S. Peng, X. Zheng, J. Zhang, Intelligent tracking control for a class of uncertain high-order nonlinear systems, IEEE Trans. Neural Netw. Lear. Syst. 27(9) (2016) 1976–1982. [6] H.N. Wu, K.Y. Cai, Mode-independent robust stabilization for uncertain Markovian jump nonlinear systems via fuzzy control, IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(3) (2005) 509–519.

AC

The control

0.35

0

1

2

3

[1] K. Zhou, J. Doyle, K. Glover, Robust and optimal control, Prentice Hall, New Jersey, 1996.

[3] X. Zhao, L. Zhang, P. Shi, H.R. Karimi, Robust control of continuoustime systems with state-dependent uncertainties and its application to electronic circuits, IEEE Trans. Ind. Electron. 61(8) (2014) 41614170.

CE

0.4

References

[2] H. Zhang, J. Yang, C. Su, T-S fuzzy-model-based robust H∞ design for networked control systems with uncertainties, IEEE Trans. Ind. Inform. 3(4) (2007) 289–301.

ED

−1

M

−0.4

[7] F. Lin, An optimal control approach to robust control design, Int. J. Control 73(3) (2000) 177–186 [8] F.L. Lewis, V.L. Syrmos, Optimal Control, Wiley, New York, 1995. 4

5 Time (s)

6

7

8

9

[9] S. Jagannathan, Neural Network Control of Nonlinear Discrete-Time Systems, CRC Press, Boca Raton, 2006.

10

[10] J. Si, A.G. Barto, W.B. Powell, Handbook of Learning and Approximate Dynamic Programming, Wiley, New York, 2004.

Fig. 16. The bounded robust control

[11] F.Y. Wang, H. Zhang, D. Liu, Adaptive dynamic programming: an introduction, IEEE Comput. Intell. Mag. 4(2) (2009) 39–47.

15

ACCEPTED MANUSCRIPT [12] D. Wang, D. Liu, Q. Wei, D. Zhao, J. Ning, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica, 48(8) (2012) 1825–1832.

[32] D. Liu, D. Wang, F.Y. Wang, H. Li, X. Yang, Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems, IEEE Trans. Cyber. 44(12) (2014) 2834–2847.

[13] H. He, Z. Ni, J. Fu, A three-network architecture for on-line learning and optimization based on adaptive dynamic programming, Neurocomputing, 78(1) (2012) 3–13. [14] A. Heydari, S.N. Balakrishnan, Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Trans. Neural Netw. Lera. Syst. 24(1) (2013) 145–157. [15] Q. Wei, F.Y. Wang, D. Liu, X. Yang, Finite-approximatino-errorbased discrete-time iterative adpative dynamic programming, IEEE Trans. Cybern. 44(12) (2014) 2820–2833.

CR IP T

[16] R. Song, H. Zhang, Y. Luo, Q. Wei, Optimal control laws for timedelay systems with saturating actuators based on heuristic dynamic programming, Neurocomputing 73(16-18) (2010) 3020–3027.

Yuzhu Huang received the B.S. degree in automation from Auhui University of Science and Technology, Huainan, China, the M.S. degree in control theory and control engineering from Beijing Institute of Technology, Beijing, China, and the Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2008, 2010, and 2013 respectively. From 2013 to 2015, he was a post-doctoral fellow with the Department of Thermal Engineering, Tsinghua University, Beijing, China. He is currently a Senior Engineer with National Research Center of Gas Turbine & IGCC Technology and Beijing Huatsing Gas Turbine & IGCC Technology Co., Ltd., Beijing, China. His research interests include adaptive dynamic programming, neural networks, nonlinear control, and their industrial applications.

[17] Y. Huang, D. Liu, Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm, Neurocomputing 125(3) (2014) 46–56. [18] J.J. Murray, C.J. Cox, G.G. Lendaris, R. Saeks, Adaptive dynamic programming, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 32(2) (2002) 140–153.

AN US

[19] M. Abu-Khalaf, F.L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica 41(5) (2005) 779–791. [20] D. Vrabie, M. Abu-Khalaf, F.L. Lewis, Adaptive optimal control for continuous-time linear systems based on policy iteration, Automatica 45(2) (2009) 477–484. [21] D. Vrabie, F.L. Lewis, Neural network approach to continuoustime direct adaptive optimal control for partially unknown nonlinear systems, Neural Netw. 22(3) (2009) 237–246.

[22] K.G. Vamvoudakis, F.L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46(5) (2010) 878–888.

M

[23] X. Yang, D. Liu, Y. Huang, Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints, IET Control Theory Appl. 7(17) (2013) 2037–2047.

ED

[24] H. Modares, M.N. Sistani, F.L. Lewis, A policy iteration approach to online optimal control of continuous-time constrained-input systems, Isa Trans. 52 (2013) 611–621. [25] H. Modares, F.L. Lewis, Optimal tracking control of nonlinear partially-unknown constrained input systems using integral reinforcement learning, Automatica 50(7) (2014) 1780–1792.

PT

[26] D. Adhyaru, I. Kar, M. Gopal, Bounded robust control of nonlinear systems using neural network-based HJB solution, Neural Comput. Appl. 20(1) (2011) 91–103.

[27] D. Wang, D. Liu, H.L. Li, Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems, IEEE Trans. Auto. Sci. Eng. 11(2) (2014) 627–632.

[28] Y. Jiang, Z.P. Jiang, Robust adaptive dynamic programming and feedback stabilization of nonlinear systems, IEEE Trans. Neural Netw. Lear. Syst. 25(5) (2014) 882–893.

[29] D. Liu, X. Yang, D. Wang, Q. Wei, Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints, IEEE Trans. Cybern. 45(7) (2015) 1372–1385.

[30] D. Wang, D. Liu, H.L. Li, H. Ma, Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming, Inf. Sci. 282 (2014) 167–179.

[31] H.N. Wu, M.M. Li, L. Guo, Finite-horizon approximate optimal guaranteed cost control of uncertain nonlinear systems with application to Mars entry guidance, IEEE Trans. Neural Netw. Lear. Syst. 26(7) (2015) 1456–1467.

Ding Wang received the B.S. degree in mathematics from Zhengzhou University of Light Industry, Zhengzhou, China, the M.S. degree in operations research and cybernetics from Northeastern University, Shenyang, China, and the Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2007, 2009, and 2012, respectively. He was a Visiting Scholar with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA, from December 2015 to January 2017. He is currently an Associate Professor with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interests include adaptive and learning systems, computational intelligence, and intelligent control. He has published over 80 journal and conference papers, and coauthored two monographs.

He is the Publications Chair of the 24th International Conference on Neural Information Processing (ICONIP 2017). He was the Finance Chair of the 12th World Congress on Intelligent Control and Automation (WCICA 2016), the Secretariat of the 2014 IEEE World Congress on Computational Intelligence (IEEE WCCI 2014), and the Registration Chair of the 5th International Conference on Information Science and Technology (ICIST 2015) and the 4th International Conference on Intelligent Control and Information Processing (ICICIP 2013), and served as the program committee member of several international conferences. He was a recipient of the Excellent Doctoral Dissertation Award of Chinese Academy of Sciences in 2013, and a nomination of the Excellent Doctoral Dissertation Award of Chinese Association of Automation (CAA) in 2014. He serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems and Neurocomputing. He is a member of Institute of Electrical and Electronics Engineers (IEEE), Asia-Pacific Neural Network Society (APNNS), and CAA.

Derong Liu (S’91–M’94–SM’96–F’05) received the Ph.D. degree in electrical engineering from the University of Notre Dame in 1994. He was a Staff Fellow with General Motors Research and Development Center, from 1993 to 1995. He was an Assistant Professor with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, from 1995 to 1999. He joined the University of Illinois at Chicago in 1999, and became a Full Professor of Electrical and Computer Engineering and of Computer Science in 2006. He was selected for the “100 Talents Program” by the Chinese Academy of Sciences in 2008, and he served as the Associate Director of The State Key Laboratory of Management and Control for Complex Systems at the Institute of Automation, from 2010 to 2015. He has published 18 books. He is the Editor-in-Chief of Artificial Intelligence Review (Springer). He was the Editor-in-Chief of the IEEE Transactions on Neural Networks and Learning Systems from 2010 to 2015. He received the Faculty Early Career Development Award from the National Science Foundation in 1999, the University Scholar Award from University of Illinois from 2006 to 2009, the Overseas Outstanding Young Scholar Award from the National Natural Science Foundation of China in 2008, and the Outstanding Achievement Award from Asia Pacific Neural Network Assembly in 2014. He is a Fellow of the IEEE, a Fellow of the International Neural Network Society, and a Fellow of the International Association of Pattern Recognition.

