Event-driven optimal control for uncertain nonlinear systems with external disturbance via adaptive dynamic programming

Accepted Manuscript

Aijuan Wang, Xiaofeng Liao, Tao Dong

PII: S0925-2312(17)31837-4
DOI: 10.1016/j.neucom.2017.12.010
Reference: NEUCOM 19143

To appear in: Neurocomputing

Received date: 19 May 2017
Revised date: 8 September 2017
Accepted date: 2 December 2017

Please cite this article as: Aijuan Wang, Xiaofeng Liao, Tao Dong, Event-Driven Optimal Control for Uncertain Nonlinear Systems with External Disturbance via Adaptive Dynamic Programming, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.12.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Event-Driven Optimal Control for Uncertain Nonlinear Systems with External Disturbance via Adaptive Dynamic Programming

Aijuan Wang a,b, Xiaofeng Liao a,b,1, Tao Dong a,b,1

a College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
b Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing

Abstract

This paper investigates the optimal control of uncertain nonlinear systems with external disturbance via adaptive dynamic programming (ADP), where two kinds of controllers are introduced and an event-driven strategy is used to update the controllers as well as the network weights. One controller is constrained-input, and the other is used to offset the external disturbance. Three kinds of neural networks (an identifier network, a critic network and actor networks) are used to approximate the uncertain dynamics, the optimal value, and the control inputs and external disturbance, respectively. To the best of the authors' knowledge, the advantage of this paper is that almost all system models in existing ADP-based optimal control studies can be regarded as special cases of the system considered here, which includes an external disturbance. By designing appropriate parameters, the uncertain nonlinear system is asymptotically stable under the event-driven controller. Moreover, under the given learning rates of the neural networks, the system state and the weight estimation errors of the neural networks are proved to be uniformly ultimately bounded (UUB). Finally, numerical simulation results are presented to verify the theoretical analysis.

Keywords: Uncertain nonlinear systems, control constraints, event-driven optimal control, external disturbance, adaptive dynamic programming (ADP)

1 Corresponding author. E-mail addresses: [email protected], david [email protected].

Preprint submitted to Elsevier
December 5, 2017

1. Introduction

The past few decades have witnessed the rapid development of optimal control [1-4] due to its extensive applications in engineering fields such as aerospace, unmanned aerial vehicles, pilotless automobiles, and robotics. The optimal controller can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation, which rarely admits a global analytic solution. Recently, a newly developed technique, commonly known as adaptive/approximate dynamic programming (ADP) [5-9], has provided a means to solve the HJB equation. Significant results on the ADP method are reported in [10-15]. In [10], a self-teaching adaptive dynamic programming scheme is presented to tackle the game of Gomoku. In [11], a robust adaptive dynamic programming method is proposed to analyze the stability of nonlinear systems. In [12], adaptive dynamic programming is applied to partially-unknown constrained-input continuous-time systems, and completely unknown nonlinear discrete-time Markov jump systems are considered using the ADP method in [13]. The ADP method is also successfully applied in [14-19].

However, all the above studies are time-driven; that is, the controllers are updated continuously, which wastes network resources and may block channels due to the mass of transmitted data. To address this, event-driven control (EDC) has become an alternative control method owing to its efficient utilization of network resources, since less data is transmitted. In the EDC strategy, an event is designed to sample the system state and update the controller in an aperiodic way. The event is usually a condition on the error between the sampled state and the true state. Only when the event occurs is the system state sampled and the controller updated, which reduces the computational cost of the system. To date, various types of event-driven controllers [20-25] have been successfully applied to the dynamic behavior analysis of different systems, including linear [20-23] and nonlinear [24-25] systems. The event-driven approach has also been used to obtain high performance for a BLDC motor in [26].

It can be clearly seen that the EDC schemes in the above works [20-25] are rarely combined with ADP, but the combination of the two has achieved good results; see references [27-33]. In [27-28], the authors combined their optimal adaptive control algorithm with EDC for nonlinear systems. In [29], the authors studied the event-driven H∞ control for nonlinear systems via the ADP method. The optimal EDC for nonlinear discrete-time systems is discussed in [30-31], and the near-optimal EDC for nonlinear discrete-time systems is investigated in [31], where three NNs (an identifier network, a critic network and an actor network) are used to solve the problem and the weight updates of the three NNs are event-driven. In [32], the input-constrained nonlinear H∞ state feedback control under an event-based environment is investigated. Afterward, the event-triggered optimal control for partially-unknown constrained-input systems is discussed in [33].

Among all ADP-based works, the constrained control inputs are taken into account in [11],


[15] and [33]. In practical applications, the control inputs are constrained by certain bounds, so constrained-input optimal control is a non-negligible problem that should be discussed. In this paper, we focus on the event-driven optimal control of uncertain nonlinear systems with external disturbance via adaptive dynamic programming. The main contributions of this paper are summarized in four parts:

(1) Uncertain nonlinear systems with external disturbance are considered via an EDC-based ADP method. A model-dependent approach is employed, which differs from the method of [19]. In [19], a model-free approach to the Hamilton-Jacobi-Isaacs equation is developed based on the policy iteration method; although the authors take the external disturbance into account, their optimal control is time-driven, not event-driven. Moreover, compared with the EDC-based ADP works in [27-33], this paper considers the external disturbance.
(2) Constrained control inputs are taken into account in this paper. Constrained control inputs are also considered in [12], [15] and [33], but [12] and [15] do not employ EDC, and the system dynamics of [33] do not include the external disturbance.
(3) The weight updates of the three NNs are event-driven, and their update laws are derived.
(4) An asymptotic stability condition for the uncertain nonlinear system with the event-driven controller is obtained. Moreover, the system state and the weight estimation errors of the neural networks are proved to be uniformly ultimately bounded (UUB) via adaptive dynamic programming.

The remainder of this paper is organized as follows. Section 2 formulates some preliminaries of optimal control theory. The event-driven optimal controller and its results are presented in Section 3. Section 4 states the ADP-based event-driven optimal control and its main results. In Section 5, numerical results are shown to verify the theoretical analysis. Section 6 contains the main conclusions.

Notations: Let 1 denote a column vector with all entries equal to one. ‖·‖ is the Euclidean norm for vectors in RN or the induced 2-norm for matrices in RM×N.

2. Preliminaries for Optimal Control Theory

Consider the nonlinear continuous-time system with a partially constrained input, described by:

x˙(t) = f(x) + g1(x)u1(t) + g2(x)u2(t) + k(x)w(t), |u2(t)| ≤ λ,    (1)


where x(t) ∈ Rn is the system state and ui(t) ∈ Rm, i = 1, 2, are the control inputs, with the constrained input u2(t) bounded by a positive constant λ. Here f(x) ∈ Rn is the unknown system dynamics, and g1(x) ∈ Rn×m, g2(x) ∈ Rn×m and k(x) ∈ Rn×m are the unknown gain matrices. w(t) ∈ Rm is the external disturbance and satisfies w(t) ∈ L2[0, +∞). We assume the system (1) is stabilizable on a compact set Ω. For f(x), g1(x), g2(x) and k(x), the following assumption is needed.

Assumption 1 [33]: f(x) ∈ Rn, g1(x) ∈ Rn×m, g2(x) ∈ Rn×m and k(x) ∈ Rn×m are Lipschitz continuous on a compact set Ω and satisfy:
(1) f(0) = 0, ‖f(x)‖ ≤ Lf‖x‖ with a positive constant Lf, and ‖w(t)‖ ≤ w̄;
(2) ‖g1(x)‖ ≤ bg1, ‖g2(x)‖ ≤ bg2 and ‖k(x)‖ ≤ bw with positive constants bg1, bg2 and bw.

We design the controllers u1(t) and u2(t) to render the following cost index function nonpositive for all w(t):

V(x, u1, u2, w) = ∫_t^∞ [xT(s)Qx(s) + u1T(s)Ru1(s) + U(u2(s)) − γ²wT(s)w(s)] ds,    (2)

where Q > 0, R > 0, γ > 0 and U(u2) is a positive non-quadratic function, chosen in the following form from [34]:

U(u2) = ∫_0^{u2} 2λ tanh−1(s/λ)T Λ ds,    (3)

where Λ > 0 with Λ = diag(r1, r2, ..., rm). It follows from (3) that

U(u2) = 2λu2T Λ tanh−1(u2/λ) + λ²Λ̄ ln(1 − u2²/λ²),    (4)

where Λ̄ = [r1, r2, ..., rm]. Differentiating (2) with respect to t gives:

∇VT (f(x) + g1(x)u1(t) + g2(x)u2(t) + k(x)w(t)) + xTQx + u1TRu1 + U(u2) − γ²wTw = 0,    (5)

where ∇V is the partial derivative of V with respect to x. We define the Hamiltonian function as:

H(x, ∇V, u1, u2, w) = ∇VT (f(x) + g1(x)u1(t) + g2(x)u2(t) + k(x)w(t)) + xTQx + u1TRu1 + U(u2) − γ²wTw.    (6)
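As a quick sanity check, the closed form (4) can be verified against the integral definition (3) numerically; a minimal sketch in the scalar-input case, with hypothetical values λ = 2 and Λ = r = 1:

```python
import numpy as np

def U_integral(u2, lam=2.0, r=1.0, n=200001):
    # Integral form (3) in the scalar case, evaluated with the trapezoid rule
    s = np.linspace(0.0, u2, n)
    vals = 2.0 * lam * np.arctanh(s / lam) * r
    return (s[1] - s[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def U_closed(u2, lam=2.0, r=1.0):
    # Closed form (4): 2λ u2 r atanh(u2/λ) + λ² r ln(1 − u2²/λ²)
    return 2.0*lam*u2*r*np.arctanh(u2/lam) + lam**2 * r * np.log(1.0 - u2**2/lam**2)

print(U_integral(1.0), U_closed(1.0))  # both ≈ 1.04650
```

Note that U(u2) is positive and finite only for |u2| < λ, which is exactly the input-constraint mechanism the cost encodes.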

The optimal control problem is to find the controller and disturbance policies, i.e., it corresponds to the solution of the following Hamilton-Jacobi-Bellman equation:

V∗(x, u1, u2, w) = min_{u1,u2} max_{w} H(x, ∇V, u1, u2, w).    (7)


According to the stationarity conditions, the optimal controller and disturbance policies are given by:

u1∗ = −(1/2)R−1g1T∇V∗,
u2∗ = −λ tanh(Λ−1g2T∇V∗/(2λ)),
w∗ = (1/2)γ−2kT∇V∗.    (8)
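The stationarity conditions (8) are straightforward to evaluate once ∇V∗ is available; a minimal numerical sketch with small hypothetical matrices (n = 2 states, m = 1 input; all values illustrative, not from the paper):

```python
import numpy as np

# Hypothetical problem data
R = np.array([[0.5]])            # R > 0
Lam = np.array([[1.0]])          # Λ > 0
gamma, lam = 1.0, 3.0            # γ > 0, input bound λ
g1 = np.array([[0.0], [1.0]])
g2 = np.array([[0.0], [0.5]])
k  = np.array([[0.2], [0.1]])
grad_V = np.array([1.5, -2.0])   # ∇V*(x), assumed known here

# Policies from (8)
u1 = -0.5 * np.linalg.inv(R) @ g1.T @ grad_V
u2 = -lam * np.tanh(np.linalg.inv(Lam) @ g2.T @ grad_V / (2.0 * lam))
w  = 0.5 * gamma**-2 * k.T @ grad_V

print(u1, u2, w)
assert np.all(np.abs(u2) <= lam)  # the tanh saturation enforces |u2| ≤ λ
```

The design choice worth noting is that the input bound never has to be enforced separately: the tanh in (8) guarantees it by construction.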

Clearly, these optimal controller and disturbance policies are updated continuously, which causes a great amount of resource consumption in a networked system with limited bandwidth. This drives us to employ the event-driven strategy to reduce bandwidth usage.

Remark 1: Compared with other algorithms, our algorithm can deal with uncertain nonlinear systems with external disturbance.

3. Event-Driven Optimal Control and Main Results

In this section, we introduce the event-driven control strategy. The triggering instants form a sequence {t0, t1, ..., tk, tk+1, ...} generated whenever the triggering condition is satisfied, and x(tk) is the sampled state at the triggering instant tk. The controllers are then given by:

u1(t) = u1∗(x(tk)) = −(1/2)R−1g1T(x(tk))∇V∗(x(tk)),
u2(t) = u2∗(x(tk)) = −λ tanh(Λ−1g2T(x(tk))∇V∗(x(tk))/(2λ)),  t ∈ [tk, tk+1).    (9)

The update of the controller relies on an event condition rather than fixed time instants. The error between the sampled state x(tk) and the real-time state x(t) is defined as

e(t) = x(tk) − x(t),  t ∈ [tk, tk+1),    (10)

where e(tk) = 0 at the triggering instant tk. Then, the system (1) evolves into:

x˙(t) = f(x) + g1(x)u1∗(x(t) + e(t)) + g2(x)u2∗(x(t) + e(t)) + k(x)w(t).    (11)

Let D1∗ = −(1/2)R−1g1T(x(tk))∇V∗(x(tk)), D2∗ = (1/(2λ))Λ−1g2T∇V∗(x(tk)) and D3∗ = (1/2)γ−2kT∇V∗(x(tk)); then the following assumption is needed.

Assumption 2: Di∗ is Lipschitz continuous on a compact set Ω and satisfies:

‖Di∗(x1) − Di∗(x2)‖ ≤ LDi‖x1 − x2‖,

where LDi > 0, i = 1, 2, 3.


Theorem 1: Suppose Assumptions 1-2 hold. Consider the system (1) under the optimal solution V∗(x, u1, u2, w) of the Hamilton-Jacobi-Bellman equation (7) and the event-driven optimal controller (9). The system (1) is asymptotically stable with the triggering condition

‖e(t)‖ = √[(1 − β)/((1/τ − 1)‖Q‖ + λ²L²D2‖Λ‖)] · √[(1 − τ)xT(tk)Qx(tk) + U(u2∗(x(tk)))],    (12)

where 0 < β < 1, 0 < τ < 1 and the gain parameters satisfy g1R−1R−1g1T ≥ γ²kkT.
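For intuition, the trigger test in (12) amounts to comparing ‖e(t)‖ against a state-dependent threshold; a minimal sketch for a scalar input, using the closed-form cost U from (4) (all numerical values are hypothetical):

```python
import numpy as np

def U(u2, lam, r):
    # Non-quadratic input cost, closed form (4), scalar input
    return 2*lam*u2*r*np.arctanh(u2/lam) + lam**2*r*np.log(1 - u2**2/lam**2)

def should_trigger(e, x_k, Q, beta, tau, lam, L_D2, Lam_norm, u2_k, r):
    # Event fires once ||e(t)|| reaches the threshold from (12)
    gain = (1 - beta) / ((1/tau - 1)*np.linalg.norm(Q, 2) + lam**2 * L_D2**2 * Lam_norm)
    thresh = np.sqrt(gain * ((1 - tau) * x_k @ Q @ x_k + U(u2_k, lam, r)))
    return np.linalg.norm(e) >= thresh

Q = np.diag([0.1, 1.5]); x_k = np.array([1.0, -0.5])
args = dict(Q=Q, beta=0.7, tau=0.4, lam=9.0, L_D2=2.7, Lam_norm=1.0, u2_k=0.3, r=1.0)
print(should_trigger(np.zeros(2), x_k, **args))   # False: error is zero right after sampling
print(should_trigger(5 * np.ones(2), x_k, **args))  # a large error exceeds the threshold
```

Between events the controller simply holds the value computed at x(tk); only this inequality needs monitoring online.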

Proof. Since the optimal value V∗(x, u1, u2, w) is positive definite, we choose it as the Lyapunov function. By a simple calculation, we have:

V˙∗(x, u1, u2, w) = −xTQx − u1∗TRu1∗ − λ²Λ̄ ln(1 − tanh²(D2∗)) + ∇V∗Tg2u2∗(x(tk)) + γ²w∗Tw∗.    (13)

The third term in (13) can be rewritten as:

λ²Λ̄ ln(1 − tanh²(D2∗)) = ∫_{u2∗(x(tk))}^{u2∗(x)} 2λ tanh−1(s/λ)TΛ ds + U(u2∗(x(tk))) − ∇V∗Tg2 tanh(D2∗).    (14)

And the fourth term in (13) can be rewritten as:

∇V∗Tg2u2∗(x(tk)) = ∫_{u2∗(x)}^{u2∗(x(tk))} 2λD2∗TΛ ds − λ∇V∗Tg2 tanh(D2∗).    (15)

Substituting (14) and (15) into (13) yields

V˙∗ = −xTQx − u1∗TRu1∗ − U(u2∗(x(tk))) + γ²w∗Tw∗ + ∫_{u2∗(x)}^{u2∗(x(tk))} (2λ tanh−1(s/λ) + D2∗)TΛ ds.    (16)

According to Young's inequality 2ab ≤ τa² + (1/τ)b² with τ > 0, the first term of (16) can be bounded as

xT(t)Qx(t) = xT(tk)Qx(tk) − 2xT(tk)Qe(t) + eT(t)Qe(t)
    ≥ (1 − τ)xT(tk)Qx(tk) − (1/τ − 1)eT(t)Qe(t).    (17)

Let s = −λ tanh(w); then the last term in (16) becomes:

∫_{u2∗(x)}^{u2∗(x(tk))} (2λ tanh−1(s/λ) + D2∗)TΛ ds ≤ ∫_{D2∗(x)}^{D2∗(x(tk))} 2λ²(w − D2∗)TΛ dw
    = λ²(D2∗(x(tk)) − D2∗(x(t)))TΛ(D2∗(x(tk)) − D2∗(x(t)))
    ≤ λ²L²D2‖Λ‖‖e(t)‖²,    (18)


where Assumption 2 (‖D2∗(x1) − D2∗(x2)‖ ≤ LD2‖x1 − x2‖) is used. Substituting (17) and (18) into (16), we have:

V˙∗(x, u1, u2, w) ≤ −(1 − τ)xT(tk)Qx(tk) − (1/4)∇V∗T(g1R−1R−1g1T − γ²kkT)∇V∗ − U(u2∗(x(tk))) + ((1/τ − 1)‖Q‖ + λ²L²D2‖Λ‖)‖e(t)‖².    (19)

Letting g1R−1R−1g1T ≥ γ²kkT, one has:

V˙∗(x, u1, u2, w) ≤ −(1 − τ)xT(tk)Qx(tk) − U(u2∗(x(tk))) + ((1/τ − 1)‖Q‖ + λ²L²D2‖Λ‖)‖e(t)‖².    (20)

Under the triggering condition (12), V˙∗(x, u1, u2, w) can be bounded by

V˙∗(x, u1, u2, w) ≤ −β[(1 − τ)xT(tk)Qx(tk) + U(u2∗(x(tk)))] ≤ 0.    (21)

Through the above discussion, we get V˙∗(x, u1, u2, w) ≤ 0, which implies that the system (1) is stable. The proof of Theorem 1 is completed. The following theorem proves the exclusion of Zeno behavior.

Theorem 2: Under the event-driven optimal controllers (9) and the triggering condition (12), the minimum interval between two triggering instants has a lower bound

0 < P/(L(P + 1)) ≤ tk+1 − tk, ∀k,    (22)

where P = √[ ((1 − β)(1 − τ)λmin(Q)/2) / (((1/τ − 1) + (1 − β)(1 − τ))‖Q‖ + 2λ²L²D2‖Λ‖) ] and L = Lf + bg1LD1 + bg2LD2 + bwLD3.

Proof. It follows from the triggering condition (12) that an event is triggered when the following equation holds:

((1/τ − 1)‖Q‖ + λ²L²D2‖Λ‖)‖e(t)‖² = (1 − β)[(1 − τ)xT(tk)Qx(tk) + U(u2∗(x(tk)))].    (23)

Note that

xT(tk)Qx(tk) = xT(t)Qx(t) + 2xT(t)Qe(t) + eT(t)Qe(t)
    ≥ xT(t)Qx(t) − (1/2)xT(t)Qx(t) − 2eT(t)Qe(t) + eT(t)Qe(t)
    = (1/2)xT(t)Qx(t) − eT(t)Qe(t).    (24)

Substituting (24) into (23) yields:

(1/2)(1 − β)(1 − τ)λmin(Q)‖x(t)‖² ≤ [((1/τ − 1) + (1 − β)(1 − τ))‖Q‖ + 2λ²L²D2‖Λ‖]‖e(t)‖².    (25)

Because e(tk) = 0 at the triggering instant tk, the time that ‖e(t)‖/‖x(t)‖ takes to evolve from 0 to P gives a lower bound on the minimum inter-event time, as described in [33], where P is given after (22).

Under Assumptions 1 and 2, the system state satisfies:

‖x˙(t)‖ ≤ Lf‖x‖ + bg1‖u1∗(x(t) + e(t))‖ + bg2‖u2∗(x(t) + e(t))‖ + bw‖w(t)‖
    ≤ Lf‖x‖ + bg1LD1‖x(t) + e(t)‖ + bg2LD2‖x(t) + e(t)‖ + bwLD3‖x(t)‖
    ≤ (Lf + bwLD3)‖x‖ + (bg1LD1 + bg2LD2)‖x(t) + e(t)‖
    ≤ L‖x(t) + e(t)‖,    (26)

where L = Lf + bg1LD1 + bg2LD2 + bwLD3. It follows from Theorem III.1 of [35] and (26) that

d/dt (‖e(t)‖/‖x(t)‖) ≤ L(1 + ‖e(t)‖/‖x(t)‖)².    (27)

As a consequence, the time that ‖e(t)‖/‖x(t)‖ takes to evolve from 0 to P is no less than P/(L(1 + P)). The proof of Theorem 2 is completed.
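The guaranteed minimum inter-event time from (22) is easy to compute for concrete parameter values; a minimal sketch (all constants below are illustrative assumptions in the paper's notation, not taken from its example):

```python
import numpy as np

# Illustrative parameter values in the paper's notation
beta, tau, lam = 0.7, 0.4, 9.0
L_D1 = L_D2 = L_D3 = 2.7
Lf, bg1, bg2, bw = 1.0, 1.0, 1.0, 0.5
Lam_norm = 1.0
Q = np.diag([0.1, 1.5, 0.15, 0.01])

lam_min = np.linalg.eigvalsh(Q).min()
num = 0.5 * (1 - beta) * (1 - tau) * lam_min
den = ((1/tau - 1) + (1 - beta)*(1 - tau)) * np.linalg.norm(Q, 2) \
      + 2 * lam**2 * L_D2**2 * Lam_norm
P = np.sqrt(num / den)                 # threshold ratio ||e||/||x|| from (25)

L = Lf + bg1*L_D1 + bg2*L_D2 + bw*L_D3
t_min = P / (L * (P + 1))              # guaranteed minimum inter-event time (22)
print(P, t_min)                        # both strictly positive: no Zeno behavior
```

Because P and L are strictly positive constants, the bound P/(L(P + 1)) is strictly positive, which is exactly the Zeno-exclusion claim.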

4. ADP-Based Event-Driven Optimal Control and Main Results

In the above section, we only gave the event-driven optimal control policies, which cannot be computed directly because of the unknown system dynamics. In this section, we resort to the ADP method to solve the HJB equation. That is, three kinds of neural networks (an identifier network, a critic network and actor networks) are introduced to approximate the uncertain dynamics, the optimal value, and the control inputs and external disturbance, respectively. In the following, the three kinds of neural networks are designed.

4.1. Identifier Network Design

The universal approximation property is used to represent the unknown dynamics f(x), g1(x), g2(x) and k(x), and the system (1) is described by:

x˙(t) = WITσI(x(t))ūI(t) + εI(x),    (28)

where WI = [WfT, Wg1T, Wg2T, WkT]T is the ideal NN weight matrix, ūI(t) = [1, u1T(t), u2T(t), wT(t)]T is the augmented control input, σI(x(t)) = diag{σf(x), σg1(x), σg2(x), σk(x)} is the NN activation function matrix, and εI(x) is the reconstruction error. An estimate of the system (1) is


constructed by using the following dynamics:

x̂˙(t) = ŴITσI(x̂(t))ūI(t) + µ,    (29)

where x̂(t) is the estimate of the state x(t), ŴI = [ŴfT, Ŵg1T, Ŵg2T, ŴkT]T is the weight matrix estimate, and µ is the RISE feedback term defined in [37]. We now recall an existing conclusion, given in Theorem 2 of [37].

Corollary 1: There exist weight update laws for Ŵf, Ŵg1, Ŵg2 and Ŵk for the system (28) and the identifier (29) such that the unknown dynamics f(x), g1(x), g2(x) and k(x) can be accurately estimated by f̂(x), ĝ1(x), ĝ2(x) and k̂(x), with lim_{t→∞} ‖f̂(x) − f(x)‖ = 0, lim_{t→∞} ‖ĝ1(x) − g1(x)‖ = 0, lim_{t→∞} ‖ĝ2(x) − g2(x)‖ = 0 and lim_{t→∞} ‖k̂(x) − k(x)‖ = 0.
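The structure of the identifier (28)-(29) can be sketched as follows. All dimensions, feature weights and gains are hypothetical, and the RISE feedback term µ of [37] is replaced by a plain proportional correction purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 2, 1, 5                                  # state dim, input dim, neurons per block
V_in = [rng.normal(size=(p, n)) for _ in range(4)] # fixed feature weights, one per block
W_hat = rng.normal(scale=0.1, size=(4 * p, n))     # stacked estimates of [Wf; Wg1; Wg2; Wk]

def sigma_I(x):
    # Block activation σI = diag{σf, σg1, σg2, σk}: one tanh feature vector per block
    return [np.tanh(Vi @ x) for Vi in V_in]

def x_hat_dot(x_hat, u1, u2, w, x, k_fb=1.0):
    # Identifier dynamics (29); the RISE term µ is replaced by k_fb*(x - x_hat).
    # With m = 1, each block of σI is scaled by one scalar entry of ūI.
    u_bar = np.concatenate(([1.0], u1, u2, w))     # augmented input ūI = [1, u1, u2, w]
    sig = sigma_I(x_hat)
    regressor = np.concatenate([u_bar[i] * sig[i] for i in range(4)])  # σI(x̂) ūI
    return W_hat.T @ regressor + k_fb * (x - x_hat)

x, x_hat = np.array([0.5, -0.2]), np.zeros(2)
d = x_hat_dot(x_hat, np.array([0.1]), np.array([0.2]), np.array([0.0]), x)
print(d.shape)  # (2,)
```

The point of the block-diagonal structure is that the same regressor separates the drift f(x) from each input channel, so each weight block estimates one of f, g1, g2, k.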

4.2. Critic and Actor Network Design

According to the Weierstrass high-order approximation theorem [36-37], the optimal value is approximated over the compact set Ω as follows:

V∗(x) = WVTφ(x) + εV(x),  ∇V∗(x) = ∇φ(x)WV + ∇εV(x),    (30)

where WV is the ideal weight, φ(x) is the linearly independent basis function vector, and εV(x) is the approximation error; letting Nd denote the number of hidden neurons, lim_{Nd→∞} εV(x) = 0. Then, the optimal controllers and disturbance are represented by

u1∗ = −(1/2)R−1g1T(x)(∇φ(x)WV + ∇εV(x)),
u2∗ = −λ tanh(Λ−1g2T(x)(∇φ(x)WV + ∇εV(x))/(2λ)),
w∗ = (1/2)γ−2kT(x)(∇φ(x)WV + ∇εV(x)).    (31)

To identify the ideal value, a critic network is designed to estimate V∗(x) using

V̂(x) = ŴVTφ(x),  ∇V̂(x) = ∇φ(x)ŴV.    (32)

Similarly, the optimal controllers and disturbance are estimated using the actor networks as

û1 = −(1/2)R−1ĝ1T(x)∇φ(x)Ŵu1,
û2 = −λ tanh(Λ−1ĝ2T(x)∇φ(x)Ŵu2/(2λ)),
ŵ = (1/2)γ−2k̂T(x)∇φ(x)Ŵw,    (33)


where Ŵu1, Ŵu2 and Ŵw are the network weights, and we define the weight estimation errors as W̃V = ŴV − WV, W̃u1 = Ŵu1 − WV, W̃u2 = Ŵu2 − WV and W̃w = Ŵw − WV. Then, the Hamiltonian function is estimated by

Ĥ(x, ∇V̂, û1, û2, ŵ) = ŴVT∇φT(x)(f̂(x) + ĝ1(x)û1(t) + ĝ2(x)û2(t) + k̂(x)ŵ(t)) + xTQx + û1TRû1 + U(û2) − γ²ŵTŵ.    (34)

Under the event-driven controller, we define the Hamiltonian error at the triggering instant as

eH(tk) = ŴVT∇φT(x(tk))(f̂(tk) + ĝ1(x(tk))û1(tk) + ĝ2(x(tk))û2(tk) + k̂(x(tk))ŵ(tk)) + xT(tk)Qx(tk) + û1T(tk)Rû1(tk) + U(û2(tk)) − γ²ŵT(tk)ŵ(tk).    (35)

We use the gradient-descent algorithm with the partial derivative of eH(tk) with respect to ŴV:

∂eH(tk)/∂ŴV = ∇φT(x(tk))(f̂(tk) + ĝ1(x(tk))û1(tk) + ĝ2(x(tk))û2(tk) + k̂(x(tk))ŵ(tk)).    (36)

So, the gradient-descent update law of ŴV follows:

ŴV(tk) = ŴV(tk−1) − αV ∂eH(tk)/∂ŴV,  t = tk,
ŴV(tk) = ŴV(tk−1),  t ∈ (tk−1, tk),    (37)

where αV is the learning gain. According to the derivation of the optimal policy, the weights Ŵu1, Ŵu2 and Ŵw are expected to approach ŴV, and their update laws are as follows:

Ŵu1(tk) = Ŵu1(tk−1) + αu1(ŴV(tk−1) − Ŵu1(tk−1)),  t = tk,
Ŵu1(tk) = Ŵu1(tk−1),  t ∈ (tk−1, tk),

Ŵu2(tk) = Ŵu2(tk−1) + αu2(ŴV(tk−1) − Ŵu2(tk−1)),  t = tk,
Ŵu2(tk) = Ŵu2(tk−1),  t ∈ (tk−1, tk),

Ŵw(tk) = Ŵw(tk−1) + αw(ŴV(tk−1) − Ŵw(tk−1)),  t = tk,
Ŵw(tk) = Ŵw(tk−1),  t ∈ (tk−1, tk),    (38)

where αu1, αu2 and αw are the learning gains.
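The piecewise update laws (37)-(38) amount to: hold all weights between events, and at each triggering instant apply one gradient step to the critic weight and one tracking step toward the (previous) critic weight for each actor. A minimal sketch, with a hypothetical gradient value standing in for (36):

```python
import numpy as np

def event_update(W, grad_eH, alpha_V=0.1, alpha_u1=0.5, alpha_u2=0.5, alpha_w=0.5):
    # One triggering instant t_k: gradient step for the critic weight (37),
    # then tracking steps toward the previous critic weight for the actors (38).
    W = dict(W)                       # between events the weights are simply held
    W_V_prev = W["V"]
    W["V"]  = W["V"]  - alpha_V * grad_eH
    W["u1"] = W["u1"] + alpha_u1 * (W_V_prev - W["u1"])
    W["u2"] = W["u2"] + alpha_u2 * (W_V_prev - W["u2"])
    W["w"]  = W["w"]  + alpha_w  * (W_V_prev - W["w"])
    return W

Nd = 4
W = {key: np.zeros(Nd) for key in ("V", "u1", "u2", "w")}
W["u1"] = W["u1"] + 1.0               # start one actor away from the critic
W_new = event_update(W, grad_eH=0.2 * np.ones(Nd))
print(np.linalg.norm(W_new["u1"] - W_new["V"]))  # smaller than before the event
```

Each tracking gain plays the role of αu1, αu2, αw in (38); the learning-gain inequalities of Theorem 3 are what keep these steps from destabilizing the error dynamics.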

Assumption 3: The partial derivative of the NN basis function is Lipschitz continuous on a compact set Ω and satisfies ‖∇φ(x1) − ∇φ(x2)‖ ≤ Lφ‖x1 − x2‖, where Lφ > 0.


Theorem 3: Suppose Assumptions 1-2 hold. Consider the system (1) with the NNs (28)-(29) and (32)-(33), where the transmission of the system states and the NN updates are event-driven with the triggering condition

‖e(t)‖ = √[(1 − β)/((1/τ − 1)‖Q‖ + λ²L²D2‖Λ‖)] · √[(1 − τ)xT(tk)Qx(tk) + U(û2(x(tk)))].    (39)

The weight update laws of the NNs are given by (37) and (38), and the learning gains satisfy 2αVd² + (1/2)αu1 + (1/2)αu2 + (1/2)αw < c, (1/2)αu1 + 2αVd² < 1, (1/2)αu2 + 2αV‖K‖² < 1 and (1/2)αw + 2αV‖M‖² < 1, where 0 < c ≤ min_k H, 0 < d ≤ max_k H, and K, M are defined after (42). Then, the system states and estimation errors are UUB.

Proof. We define the Lyapunov function as:

L = V∗(x) + L1 + L2 + L3 + L4
  = V∗(x) + (1/2)W̃VTαV−1W̃V + (1/2)W̃u1Tαu1−1W̃u1 + (1/2)W̃u2Tαu2−1W̃u2 + (1/2)W̃wTαw−1W̃w.    (40)

In the following analysis, we divide the proof into two parts: one at the triggering instants, and another over the inter-event intervals.

(1) At the triggering instants:

For this case, the weight update laws at the triggering instants are rewritten in the form of discrete-time dynamics:

ŴV(k + 1) = ŴV(k) − αV∇φT(x(k))(f̂(k) + ĝ1(x(k))û1(k) + ĝ2(x(k))û2(k) + k̂(x(k))ŵ(k)),
Ŵu1(k + 1) = Ŵu1(k) + αu1(ŴV(k) − Ŵu1(k)),
Ŵu2(k + 1) = Ŵu2(k) + αu2(ŴV(k) − Ŵu2(k)),
Ŵw(k + 1) = Ŵw(k) + αw(ŴV(k) − Ŵw(k)).    (41)

For the first equation in (41), we apply the mean value theorem to the term û2 = −λ tanh(Λ−1ĝ2T(x)∇φ(x)Ŵu2/(2λ)); then, there exists a vector ξk satisfying

λ∇φT(x(k))ĝ2(x(k))tanh(Λ−1ĝ2T(x)∇φ(x)Ŵu2/(2λ)) = (1/2)∇φT(x(k))ĝ2(x(k))(1 − tanh²(ξk))Λ−1ĝ2T(x)∇φ(x)Ŵu2,

where 0 ≤ ξk ≤ Λ−1ĝ2T(x)∇φ(x)Ŵu2/(2λ). So, we can rewrite the ŴV(k) dynamics as follows:

W̃V(k + 1) = W̃V(k) − αV(HW̃V − HW̃u1 − KW̃u2 + MW̃w + εk),    (42)


where H = (1/2)∇φT(x(k))ĝ1(x(k))R−1ĝ1T(x)∇φ(x), K = (1/2)∇φT(x(k))ĝ2(x(k))(1 − tanh²(ξk))Λ−1ĝ2T(x)∇φ(x), M = (1/2)γ−2∇φT(x(k))k̂(x(k))k̂T(x)∇φ(x) and εk = ∇φT(x(k))f̂(k) − (K − M)WV − HŴV.

It follows from (42) that

‖∆L1‖ ≤ −(c − 2αVd²)‖W̃V‖² + 2αVd²‖W̃u1‖² + 2αV‖K‖²‖W̃u2‖² + 2αV‖M‖²‖W̃w‖² + 4αVd‖K‖‖W̃u1‖‖W̃u2‖ + 2αV‖εk‖²,    (43)

where 0 < c ≤ min_k H and 0 < d ≤ max_k H. According to (41), ‖∆L2‖, ‖∆L3‖ and ‖∆L4‖ can be similarly expressed as

‖∆L2‖ ≤ −(1 − (1/2)αu1)‖W̃u1‖² + (1/2)αu1‖W̃V‖² + (1 + αu1)‖W̃V‖‖W̃u1‖,
‖∆L3‖ ≤ −(1 − (1/2)αu2)‖W̃u2‖² + (1/2)αu2‖W̃V‖² + (1 + αu2)‖W̃V‖‖W̃u2‖,
‖∆L4‖ ≤ −(1 − (1/2)αw)‖W̃w‖² + (1/2)αw‖W̃V‖² + (1 + αw)‖W̃V‖‖W̃w‖.    (44)

Let z = [‖W̃V‖, ‖W̃u1‖, ‖W̃u2‖, ‖W̃w‖]T; on the basis of (43)-(44), we obtain:

‖∆L1‖ + ‖∆L2‖ + ‖∆L3‖ + ‖∆L4‖ ≤ −zTAz + 2αV‖εk‖²,    (45)

where

A = [ A11  A12       A12  A12
      0    A22  4αVd‖K‖  0
      0    0         A33  0
      0    0         0    A44 ],

with A11 = c − 2αVd² − (1/2)αu1 − (1/2)αu2 − (1/2)αw, A12 = −(1 + αu1), A22 = 1 − (1/2)αu1 − 2αVd², A33 = 1 − (1/2)αu2 − 2αV‖K‖² and A44 = 1 − (1/2)αw − 2αV‖M‖².

It can be seen that if all eigenvalues of A are positive constants, the weight estimation errors are UUB with the residual bound ‖z‖ ≤ √(2αV‖εk‖²/λmin(A)), and lim_{εk→0} ‖z‖ = 0.

(2) Over inter-event intervals:

For this case, the weights remain unchanged, so the Lyapunov function reduces to L = V∗(x). It follows from the analysis of Theorem 1 that the system state is UUB under the similar event-driven condition (39). From the above discussion, the proof of Theorem 3 is completed.

Remark 2: Assumption 3 is a special case of Assumption 2, so in Section 4.2 the proof of the exclusion of Zeno behavior is similar to that of Theorem 2.



5. Simulation Examples

In the overhead crane system, a load is suspended from a trolley which moves along an overhead track [38]. The system is under-actuated, and both its motion control and stabilization have attracted considerable interest in the field of control technology. The dynamics of the overhead crane system can be described by

f(x) = [ x3;  x4;  (ml sin(x2)x4² + mg sin(x2)cos(x2))/(m + M sin²(x2));  −(ml sin(x2)x4² + (m + M)g sin(x2))/((m + M sin²(x2))l) − kt x4 ] − g2(x) kr |x3| x3,

g2(x) = [ 0;  0;  sin²(x2)/(m + M sin²(x2));  −cos(x2)/((m + M sin²(x2))l) ],

g1(x) = [ 0;  0;  sin(x3);  cos(x4) ],  k(x) = [ 0;  0;  x1;  x2 ],

and the external disturbance is w(t) = 7 sin(t) exp(−0.3t). Here M is the mass of the load, m is the mass of the trolley, and l is the length of the suspension rope; kt and kr denote the friction coefficients of the trolley and the load, respectively. The state vector is [x1, x2, x3, x4], where x1 and x2 are the trolley position and the load angle, and x3 and x4 are their derivatives.

Let M = 7 kg, m = 1.025 kg, l = 0.75 m, g = 9.8 m/s², kt = 0.3, kr = 0.5, and select Q = diag{0.1, 1.5, 0.15, 0.01}, R = 0.1, LDi = 2.7 (i = 1, 2, 3), |u2| ≤ λ = 9, β = 0.7, τ = 0.4, γ = 1. The basis function is φ(x) = [x1², x1x2, (1/3)x1x3, (1/3)x1x4, x2², (1/3)x2x3, (1/3)x2x4, (1/9)x3², (1/9)x3x4, (1/9)x4², x1⁴, x2⁴, (1/81)x3⁴, (1/81)x4⁴]T and Nd = 30.

The evolutions of the states xi(t) (i = 1, 2, 3, 4) during online learning are depicted in Figure 1, where the system is stable. Figure 2 shows the update curves of the weight ŴV. The update curves of the weights Ŵu1, Ŵu2 and Ŵw are analogous and are given in Figure 3 (a), (b) and (c), respectively. Figure 4 shows the trajectories of the event-triggered control variables, and Figure 5 depicts the trajectories of the error and the triggering threshold under the event-triggered control strategy.
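The crane model above is straightforward to roll out numerically; the following is a minimal open-loop sketch with forward-Euler integration (u1 = u2 = 0, so only the drift f(x) and the disturbance channel k(x)w(t) are exercised; the model and parameter values are as transcribed above, the rollout settings are illustrative assumptions):

```python
import numpy as np

M, m, l, g = 7.0, 1.025, 0.75, 9.8    # load mass, trolley mass, rope length, gravity
kt, kr = 0.3, 0.5                      # trolley and load friction coefficients

def g2_vec(x):
    d = m + M * np.sin(x[1])**2
    return np.array([0.0, 0.0, np.sin(x[1])**2 / d, -np.cos(x[1]) / (d * l)])

def f_vec(x):
    # Drift f(x) as transcribed above, including the -g2(x)*kr*|x3|*x3 part
    d = m + M * np.sin(x[1])**2
    a = (m*l*np.sin(x[1])*x[3]**2 + m*g*np.sin(x[1])*np.cos(x[1])) / d
    b = -(m*l*np.sin(x[1])*x[3]**2 + (m + M)*g*np.sin(x[1])) / (d * l) - kt*x[3]
    return np.array([x[2], x[3], a, b]) - g2_vec(x) * kr * abs(x[2]) * x[2]

def k_vec(x):
    return np.array([0.0, 0.0, x[0], x[1]])

def simulate(x0, steps=2000, dt=1e-3):
    # Open-loop forward-Euler rollout of x_dot = f(x) + k(x) w(t), with u1 = u2 = 0
    x, traj = np.asarray(x0, float), []
    for i in range(steps):
        w = 7.0 * np.sin(i * dt) * np.exp(-0.3 * i * dt)
        x = x + dt * (f_vec(x) + k_vec(x) * w)
        traj.append(x.copy())
    return np.array(traj)

traj = simulate([0.1, 0.05, 0.0, 0.0])
print(traj.shape)  # (2000, 4)
```

In the paper's experiments this plant is driven by the event-triggered ADP controller; the sketch above only provides the simulation backbone into which the controllers (9) and the NN updates would be plugged.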

Figure 1: Evolution of the states during online learning.

Figure 2: The update curves of weight ŴV during online learning.

Figure 3: (a) The update curves of weight Ŵu1 during online learning; (b) the update curves of weight Ŵu2 during online learning; (c) the update curves of weight Ŵw during online learning.

Figure 4: Trajectories of the event-triggered control variables.

6. Conclusion

In this paper, we have investigated the optimal control of uncertain nonlinear systems with external disturbance via adaptive dynamic programming, where two kinds of controllers are introduced and an event-driven strategy is used to update the controllers as well as the network weights. One controller is constrained-input, and the other is used to offset the external disturbance. Three kinds of neural networks (an identifier network, a critic network and actor networks) are used to approximate the uncertain dynamics, the optimal value, and the control inputs and external disturbance, respectively. To the best of the authors' knowledge, the advantage of this paper is that almost all system models in existing ADP-based optimal control studies can be regarded as special cases of the system considered here, which includes an external disturbance. By designing appropriate parameters, the uncertain nonlinear system is asymptotically stable under the event-driven controller. Moreover, under the given learning rates of the neural networks, the system state and the weight estimation errors of the neural networks are proved to be uniformly ultimately bounded (UUB). Finally, numerical simulation results have been presented to verify the theoretical analysis.

Figure 5: Trajectories of the error and the triggering threshold under the event-triggered control strategy.

Acknowledgment

This work was supported in part by the National Key Research and Development Program of China under Grant 2016YFB0800601, in part by the National Natural Science Foundation of China under Grants 61472331 and 61503310, in part by the Fundamental Research Funds for the Central Universities under Grants XDJK2016E031 and XDJK2016B018, and in part by the Natural Science Foundation projects of CQCSTC under Grants ctsc2014cyjA40053 and cstc2016jcyjA0559.

References

[1] F. L. Lewis and V. L. Syrmos, Optimal Control. New York, NY, USA: Wiley, 1995.
[2] K. Zhou, J. C. Doyle, and K. Glover, Robust and Optimal Control. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[3] T. Dong, A. J. Wang, X. F. Liao, Impact of discontinuous antivirus strategy in a computer virus model with the point to group, Applied Mathematical Modelling, vol. 40, no. 4, pp. 3400-3409, 2016.

[4] B. Ata and R. Coban, Artificial bee colony algorithm based linear quadratic optimal controller design for a nonlinear inverted pendulum, International Journal of Intelligent Systems and Applications in Engineering, vol. 3, no. 1, pp. 1-6, 2015.

[5] F. Y. Wang, H. Zhang, and D. Liu, Adaptive dynamic programming: An introduction, IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.

[6] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, vol. 703. New York, NY, USA: Wiley, 2007.

[7] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA, USA: Athena Scientific, 1996.

[8] D. Zhao, Z. Zhang, and Y. Dai, Self-teaching adaptive dynamic programming for Gomoku, Neurocomputing, vol. 78, no. 1, pp. 23-29, 2012.

[9] R. Coban, Dynamical adaptive integral backstepping variable structure controller design for uncertain systems and experimental application, International Journal of Robust and Nonlinear Control, 2017. DOI: 10.1002/rnc.3810.

[10] R. Coban, A context layered locally recurrent neural network for dynamic system identification, Engineering Applications of Artificial Intelligence, vol. 26, no. 1, pp. 241-250, 2013.

[11] Y. Jiang and Z.-P. Jiang, Robust adaptive dynamic programming and feedback stabilization of nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 5, pp. 882-893, May 2014.

[12] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, vol. 50, no. 1, pp. 193-202, Jan. 2014.

[13] H. Jiang, H. Zhang, Y. Luo, and J. Wang, Optimal tracking control for completely unknown nonlinear discrete-time Markov jump systems using data-based reinforcement learning method, Neurocomputing, vol. 194, pp. 176-182, 2016.

[14] Z. Ni, H. He, D. Zhao, X. Xu, and D. V. Prokhorov, GrDHP: A general utility function representation for dual heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 3, pp. 614-627, Mar. 2015.

[15] Q. C. Zhang, D. B. Zhao, and Y. H. Zhu, Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs, Neurocomputing, vol. 238, pp. 377-386, May 2017.

[16] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica, vol. 43, no. 3, pp. 473-481, 2007.

[17] D. Liu, H. Li, and D. Wang, Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing, vol. 110, no. 13, pp. 92-100, 2013.

[18] H. Zhang, C. Qin, B. Jiang, and Y. Luo, Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems, IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706-2718, Dec. 2014.

[19] Y. Zhu, D. Zhao, and X. Li, Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data, IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 714-725, Mar. 2017.

[20] E. Garcia and P. J. Antsaklis, Model-based event-triggered control for systems with quantization and time-varying network delays, IEEE Trans. Autom. Control, vol. 58, no. 2, pp. 422-434, Feb. 2013.

[21] X. Meng and T. Chen, Event detection and control co-design of sampled-data systems, International Journal of Control, vol. 87, no. 4, pp. 777-786, 2014.

[22] A. J. Wang, T. Dong, and X. F. Liao, Event-triggered synchronization strategy for complex dynamical networks with the Markovian switching topologies, Neural Networks, vol. 74, pp. 52-57, Feb. 2016.

[23] X. Meng and T. Chen, Event based agreement protocols for multi-agent networks, Automatica, vol. 49, no. 7, pp. 2123-2132, Jul. 2013.

[24] P. Tabuada, Event-triggered real-time scheduling of stabilizing control tasks, IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1680-1685, Sep. 2007.

[25] X. Wang and M. Lemmon, On event design in event-triggered feedback systems, Automatica, vol. 47, no. 10, pp. 2319-2322, Oct. 2011.

[26] R. Horvat, K. Jezernik, and M. Curkovic, An event-driven approach to the current control of a BLDC motor using FPGA, IEEE Trans. Ind. Electron., vol. 61, no. 7, pp. 3719-3726, Jul. 2014.

[27] K. Vamvoudakis, Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems, IEEE/CAA J. Autom. Sin., vol. 1, no. 3, pp. 282-293, Jul. 2014.

[28] X. Zhong, Z. Ni, and H. He, Event-triggered adaptive dynamic programming for continuous-time nonlinear system using measured input-output data, in Proc. Int. Joint Conf. Neural Networks (IJCNN), pp. 1-8, 2015.

[29] Q. Zhang, D. Zhao, and Y. Zhu, Event-triggered H∞ control for continuous-time nonlinear system via concurrent learning, IEEE Trans. Syst. Man Cybern. Syst., 2016. DOI: 10.1109/TSMC.2016.2531680.

[30] A. Sahoo, H. Xu, and S. Jagannathan, Adaptive neural network-based event-triggered control of single-input single-output nonlinear discrete-time systems, IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 1, pp. 151-164, Jan. 2016.

[31] A. Sahoo, H. Xu, and S. Jagannathan, Near optimal event-triggered control of nonlinear discrete-time systems using neurodynamic programming, IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 9, pp. 1801-1815, Sep. 2016.

[32] D. Wang, C. X. Mu, Q. C. Zhang, and D. Liu, Event-based input-constrained nonlinear state feedback with adaptive critic and neural implementation, Neurocomputing, vol. 214, pp. 848-856, 2016.

[33] Y. Zhu, D. Zhao, H. He, and J. Ji, Event-triggered optimal control for partially-unknown constrained-input systems via adaptive dynamic programming, IEEE Trans. Ind. Electron., vol. PP, no. 99, pp. 1-1, 2017.

[34] M. Abu-Khalaf and F. L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, vol. 41, no. 5, pp. 779-791, May 2005.

[35] P. Tabuada, Event-triggered real-time scheduling of stabilizing control tasks, IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1680-1685, Sep. 2007.

[36] B. A. Finlayson, The Method of Weighted Residuals and Variational Principles. New York, NY, USA: Academic, 1990.

[37] K. Hornik, M. Stinchcombe, and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw., vol. 3, no. 5, pp. 551-560, 1990.

[38] J. Yu, F. Lewis, and T. Huang, Nonlinear feedback control of a gantry crane, in Proc. American Control Conference, vol. 6, pp. 4310-4315, 1995.

Aijuan Wang received the B.S. degree from Qufu Normal University, Shandong, China, in 2013. She is currently working toward the Ph.D. degree in the College of Electronics and Information Engineering at Southwest University, China. Her research interests include nonlinear dynamical systems, optimal control, event-triggered control, neural networks, and consensus of multi-agent systems.

Xiaofeng Liao received the B.S. and M.S. degrees in mathematics from Sichuan University, Chengdu, China, in 1986 and 1992, respectively, and the Ph.D. degree in circuits and systems from the University of Electronic Science and Technology of China in 1997. From 1999 to 2001, he held a postdoctoral position at Chongqing University, where he is currently a professor. From November 1997 to April 1998, he was a research associate at the Chinese University of Hong Kong. From October 1999 to October 2000, he was a research associate at the City University of Hong Kong. From March 2001 to June 2001 and from March 2002 to June 2002, he was a senior research associate at the City University of Hong Kong. From March 2006 to April 2007, he was a research fellow at the City University of Hong Kong. He holds 6 patents and has published 5 books and over 300 international journal and conference papers. His current research interests include neural networks, nonlinear dynamical systems, bifurcation and chaos, and cryptography.

Tao Dong received the B.S. and M.S. degrees from the University of Electronic Science and Technology of China in 2004 and 2007, respectively, and the Ph.D. degree in computer science from Chongqing University. He is currently an associate professor at Southwest University. His research interests include multi-agent systems, neural networks, nonlinear dynamical systems, bifurcation and chaos, and congestion control models.