ISA Transactions 91 (2019) 184–195

Research article

Distributed optimal coordination control for nonlinear multi-agent systems using event-triggered adaptive dynamic programming method

Wei Zhao a, Huaipin Zhang b,∗

a School of Mathematics, Southeast University, Nanjing, 210096, PR China
b Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, PR China

∗ Corresponding author. E-mail address: [email protected] (H. Zhang).
https://doi.org/10.1016/j.isatra.2019.01.021

Highlights

• Compared with time-triggered ADP methods (Vamvoudakis et al., 2012), the proposed ETADP method alleviates the communication load and computation cost, since data transmissions and the critic NN weights are updated only at the trigger instants.

• The PI algorithm is implemented in a distributed asynchronous scheme.

• Compared with the critic–actor network framework (Abouheaf et al., 2014), a single critic network for each agent is proposed to approximate the value functions, which simplifies the network structure and reduces the number of weight updates.

Article info

Article history: Received 6 February 2018; Received in revised form 26 November 2018; Accepted 17 January 2019; Available online 24 January 2019.

Keywords: Multi-agent systems; Event-triggered sampling; Distributed optimal coordination control; Adaptive dynamic programming

Abstract: This paper is concerned with the design of distributed optimal coordination control for nonlinear multi-agent systems (NMASs) based on an event-triggered adaptive dynamic programming (ETADP) method. The method is first introduced to design distributed coordination controllers for NMASs; compared with the traditional time-triggered adaptive dynamic programming (TTADP) strategy, it not only avoids the transmission of redundant data but also minimizes the performance function of each agent. The event-triggered conditions are derived by a Lyapunov functional method so as to guarantee the stability of the NMAS. Then a new adaptive policy iteration algorithm is presented to obtain online solutions of the Hamilton–Jacobi–Bellman (HJB) equations. In order to implement the proposed ETADP method, fuzzy hyperbolic model based critic neural networks (NNs) are utilized to approximate the value functions and help calculate the control policies. In the critic NNs, the weight estimates are updated only at the event-triggered instants, leading to aperiodic weight tuning laws and thus reduced computation cost. It is proved that the weight estimation errors and the local neighborhood coordination errors are uniformly ultimately bounded (UUB). Finally, two simulation examples are provided to show the effectiveness of the proposed ETADP method. © 2019 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Distributed coordination control of multi-agent systems (MASs) has attracted extensive attention and made great progress [1–4] due to its wide applications in various fields, including power systems [5], unmanned aerial vehicles [6], and mobile robots [7]. In past decades, considerable effort has been devoted to achieving coordination, while the performance index, a crucial factor that affects the energy utilization of the system in practical applications, was rarely taken into account. Thus the optimal coordination control problem is an open and interesting research direction, one which

not only makes all the agents reach consensus but also minimizes the energy utilization [8]. In [9–12], the authors proposed the idea that optimal coordination control for MASs depends on the solutions of coupled Hamilton–Jacobi–Bellman (HJB) equations, which are difficult or impossible to obtain analytically. To settle this problem, a novel and promising approximate control technique, adaptive dynamic programming (ADP), was proposed in [13]. The ADP method incorporates adaptive critics, reinforcement learning and dynamic programming [13–15]. Recently, much attention has been paid to the study of ADP and its related fields [16–19]. In [16], an iterative ADP algorithm is used to solve the near-optimal control problem for a class of nonlinear discrete-time systems with control constraints, and [19] proposed a data-driven adaptive tracking control approach based on ADP to study the continuous-time HJB equation. Meanwhile, ADP was also utilized to solve the coupled HJB equations of multi-agent differential graphical games [9–12,20]. The above time-driven ADP works achieve both system stability and satisfactory performance under time-triggered control sampling, but at the price of a heavy computational load.

To mitigate the transmission of redundant data and the high computational cost of solving the HJB equation under a time-triggered control scheme, a new sampled-data framework, the event-triggered control mechanism, was proposed [5,21–26]. The key idea of the event-triggered control scheme is that the control task is executed only when a predefined trigger condition, which depends closely on the system state, is violated. Event-triggered control provides a practical way to determine when the control action is carried out, which guarantees that only the truly necessary state signals are sampled, so the amount of transmitted state signals is relatively small. It has also been proved that event-triggered sampling [27,28] can efficiently reduce the number of control task executions, so that communication resources can be saved significantly without compromising the system performance. Recently, the event-triggered adaptive dynamic programming (ETADP) method, which combines the ADP technique with the event-triggered sampling mechanism in a single framework, has been studied for nonlinear systems in many works [29–34]. The ETADP technique has been widely used to solve the continuous-time HJB equation [30,31,33] and the discrete-time HJB equation [32] arising in nonlinear optimal control, and [34] introduced a stochastic ETADP-based technique for nonlinear systems in the presence of network-induced delays and packet losses. To the best of our knowledge, there is still no work on ETADP methods for MASs, which inspires this study.

Motivated by the above factors, in this paper we propose a novel distributed optimal coordination control for NMASs using the ETADP method. Based on Bellman's principle of optimality, we construct the performance indices and the coupled HJB equations for NMASs. To solve the associated HJB equations, critic neural networks are developed to approximate the value functions and help calculate the control policies. In order to reduce communication load and computation cost, an event-triggered sampling scheme is adopted: the control laws and the weight estimates are updated only when the trigger conditions are satisfied and are kept constant otherwise. Finally, the stability analysis for the NMAS is presented, and we also prove that the weight errors and the local consensus errors are uniformly ultimately bounded.

The main contributions of this paper are as follows. (1) Compared with the existing traditional time-triggered adaptive dynamic programming (TTADP) methods [9,10,12,35], the proposed ETADP method alleviates the communication load and computation cost, since the sampled-data transmissions and the critic NN weight updates occur only at the trigger instants. (2) The policy iteration algorithm is implemented in a distributed asynchronous scheme, which not only decreases the communication between neighboring agents but also requires only the ith agent to update its control policy while the other agents in its neighborhood keep theirs unchanged. (3) A single critic network for each agent is introduced to approximate the value functions. Compared with the dual critic–actor network framework [9,10], it simplifies the network structure and reduces the number of weight updates.

The paper is organized as follows. Section 2 presents preliminaries and the problem formulation. ADP-based event-triggered coordination controllers are designed in Section 3, and the stability of the local neighborhood consensus error systems is proved. Section 4 details the policy iteration algorithm for solving the HJB equations. Generalized fuzzy hyperbolic model based critic neural networks


are used to approximate the value functions in Section 5, where we also prove the UUB property of the weight estimation errors and the local neighborhood coordination errors. Section 6 presents two simulation examples to illustrate the effectiveness of the scheme. Finally, conclusions are drawn in Section 7.

2. Preliminaries and problem formulation

In this section, we first introduce notations and algebraic graph theory; subsequently the event-triggered coordination error dynamics are formulated.

2.1. Notations

N⁺ represents the positive integer set. Rⁿ denotes the n-dimensional Euclidean space. 1 is a vector of ones. Iₙ is the n × n identity matrix. For a vector x and a matrix X, ∥x∥ and ∥X∥ denote their 2-norms. A > 0 (≥ 0) represents a positive (semi-positive) definite matrix. λ(A), λmin(A) and tr(A) denote the minimum singular value, the minimum eigenvalue and the trace of a matrix A, respectively. ⊗ is the Kronecker product operator.

2.2. Algebraic graph theory

Denote by G = {V, E, A} an undirected communication topology graph composed of a set of nodes V = {v₁, …, v_N}, a set of edges E = {(vᵢ, vⱼ) : vᵢ, vⱼ ∈ V} ⊆ V × V, and a weighted adjacency matrix A = [aᵢⱼ]_{N×N}. If there is a communication link between agent i and agent j, i.e., (vᵢ, vⱼ) ∈ E, then aᵢⱼ > 0 and the agents are called neighbors; otherwise aᵢⱼ = 0. Note that aᵢᵢ = 0. The set of neighbors of agent i is denoted by Nᵢ = {j : (vᵢ, vⱼ) ∈ E, j ≠ i}, and we define N̄ᵢ = {Nᵢ, i}, which includes agent i and its neighbors Nᵢ. A path from vᵢ to vⱼ is an edge sequence (vᵢ, vᵢ₁), …, (vᵢₖ, vⱼ) starting from vᵢ and ending at vⱼ; vᵢ and vⱼ are called connected if there is a path from vᵢ to vⱼ. If all pairs of nodes in G are connected, then G is called a connected graph. The degree matrix of G is D = diag{d₁, d₂, …, d_N} with dᵢ = Σ_{j∈Nᵢ} aᵢⱼ. The Laplacian matrix of G is defined as L = D − A; it is a symmetric positive semidefinite matrix, i.e., L ≥ 0.

2.3. Local coordination error dynamic systems representation

In this paper, we consider an NMAS consisting of N agents indexed by 1, 2, …, N. Every agent is represented by a node in a communication graph G. The node dynamics are modeled by

\[
\dot{x}_i = f(x_i) + g_i(x_i)u_i, \quad i \in \Omega = \{1, \ldots, N\}
\tag{1}
\]

where xᵢ ∈ Rⁿ and uᵢ ∈ R^{mᵢ} represent the state and control input vectors of agent i, respectively. The functions f(xᵢ) ∈ Rⁿ and gᵢ(xᵢ) ∈ R^{n×mᵢ} denote the nonlinear system dynamics, satisfying f(0) = 0, with f + gᵢuᵢ Lipschitz continuous on a set containing the origin. We also assume ∥gᵢ(xᵢ)∥ ≤ βᵢ. For simplicity, the time argument t will be omitted in the following. The dynamics of the leader are represented as

\[
\dot{x}_0 = f(x_0)
\tag{2}
\]

where x₀ ∈ Rⁿ is the leader state and f(x₀) is a differentiable function.

Remark 1. Here the leader can be regarded as a command generator which generates the desired signals to be tracked by the followers. In this case, it is reasonable to assume that the leader has no neighbors, that is, it does not receive any information from the external environment.
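The graph-theoretic objects of Section 2.2 are easy to make concrete in code. The following is a minimal Python sketch under our own assumptions (the five-agent topology and unit edge weights are invented for illustration, not taken from the paper); it assembles the adjacency, degree and Laplacian matrices and verifies the properties quoted above.

```python
import numpy as np

# Illustrative 5-agent undirected topology; the weights a_ij are assumed.
# a_ij > 0 iff agents i and j communicate, and a_ii = 0.
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))       # degree matrix: d_i = sum_{j in N_i} a_ij
L = D - A                        # graph Laplacian, L = D - A
c = np.array([0, 0, 1, 0, 0.])   # pinning gains: only agent 3 hears the leader

# Sanity checks matching Section 2.2: L is symmetric, L @ 1 = 0, and L >= 0.
assert np.allclose(L, L.T)
assert np.allclose(L @ np.ones(5), 0)
assert np.all(np.linalg.eigvalsh(L) >= -1e-12)
```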


In order to make all the agents synchronize to the leader, a local neighborhood cooperative error is designed to prescribe the cooperative team objectives. The local neighborhood cooperative error of agent i is defined as

\[
\delta_i = \sum_{j \in N_i} a_{ij}(x_i - x_j) + c_i(x_i - x_0)
\tag{3}
\]

where cᵢ ≥ 0 is the pinning gain.

Remark 2. In the communication graph G it must be guaranteed that the leader can communicate with a small subset of the followers. If follower i communicates with the leader, then the pinning gain cᵢ > 0; otherwise cᵢ = 0.

In the event-based sampling mechanism, a monotonically increasing sequence of time instants {tᵢₖ}ₖ₌₀^∞, referred to as the event-triggered instants, satisfies tᵢₖ < tᵢₖ₊₁, where tᵢₖ is the kth event-triggered sampling instant of agent i. The state of the aperiodically sampled system is the sequence of trigger states x̂ᵢ,ₖ, where x̂ᵢ,ₖ = xᵢ(tᵢₖ) for all t ∈ [tᵢₖ, tᵢₖ₊₁), k ∈ N. Define the event-triggered error vector of agent i between the consensus error held from the last trigger instant and the current consensus error as

\[
e_{i,k} = \hat{\delta}_i - \delta_i
\tag{4}
\]

for t ∈ [tᵢₖ, tᵢₖ₊₁), where δ̂ᵢ = δᵢ(tᵢₖ) denotes the local neighborhood coordination error of agent i at the trigger instant tᵢₖ. The trigger error eᵢ,ₖ is reset to zero at the event-triggered instants, i.e., eᵢ,ₖ = 0 at t = tᵢₖ, k ∈ N.

Differentiating the local neighborhood consensus error (3), one has

\[
\begin{aligned}
\dot{\delta}_i &= \sum_{j\in N_i} a_{ij}(\dot{x}_i - \dot{x}_j) + c_i(\dot{x}_i - \dot{x}_0)\\
&= (d_i + c_i)\bigl(f(x_i) + g_i(x_i)\hat{u}_i\bigr) - \sum_{j\in N_i} a_{ij}\bigl(f(x_j) + g_j(x_j)\hat{u}_j\bigr) - c_i f(x_0)\\
&= (d_i + b_{ii})\bigl(f(x_i) + g_i(x_i)\hat{u}_i - f(x_0)\bigr) - \sum_{j\in N_i}(a_{ij} + b_{ij})\bigl(f(x_j) + g_j(x_j)\hat{u}_j - f(x_0)\bigr)\\
&= (l_{ii} + b_{ii})(\tilde{f}_i + g_i\hat{u}_i) + \sum_{j\in N_i}(l_{ij} + b_{ij})(\tilde{f}_j + g_j\hat{u}_j)\\
&= \sum_{j\in \bar{N}_i}(l_{ij} + b_{ij})(\tilde{f}_j + g_j\hat{u}_j)
\end{aligned}
\tag{5}
\]

for t ∈ [tᵢₖ, tᵢₖ₊₁), where ûⱼ = uⱼ(tʲₖ), j ∈ N̄ᵢ, denotes the control input held from the latest event-triggered instant, f̃ⱼ = f(xⱼ) − f(x₀), bᵢᵢ = cᵢ and bᵢⱼ = 0 for j ≠ i (here lᵢⱼ are the entries of the Laplacian L, with lᵢᵢ = dᵢ and lᵢⱼ = −aᵢⱼ). Note that, under event-triggered sampling, the control policies are updated only at the event-triggered instants.
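As a concrete illustration of (3)–(4), the snippet below computes the local neighborhood error δᵢ from the current states and the event-trigger error eᵢ,ₖ from the value held at the last trigger instant. It is our own assumed sketch (the states, weights and drift are placeholders), not code from the paper.

```python
import numpy as np

def local_error(i, x, x0, A, c):
    """Local neighborhood cooperative error, Eq. (3):
    delta_i = sum_j a_ij (x_i - x_j) + c_i (x_i - x_0)."""
    delta = c[i] * (x[i] - x0)
    for j in range(len(x)):
        delta = delta + A[i, j] * (x[i] - x[j])
    return delta

# Assumed data for illustration: 5 agents with 2-dimensional states.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))      # current follower states x_i
x0 = np.zeros(2)                 # leader state
A = np.ones((5, 5)) - np.eye(5)  # assumed all-to-all unit weights
c = np.array([0, 0, 1, 0, 0.])   # pinning gains

delta_hat = local_error(2, x, x0, A, c)  # value frozen at the last trigger instant
x[2] += 0.1                              # the state drifts between events
delta = local_error(2, x, x0, A, c)
e_trigger = delta_hat - delta            # event-trigger error, Eq. (4)
```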

3. ETADP-based distributed coordination controller design

In this section, we design an ADP-based controller with event-triggered sampled data that makes all the agents synchronize to the leader while optimizing their prescribed performance indices. Normally, the optimal coordination control design of an NMAS is cast as the solution of coupled HJB equations. To obtain the local coupled HJB equations, we design performance index functions that depend on the local neighborhood consensus errors and the coordination control policies. The local performance index associated with agent i is defined as

\[
J_i\bigl(\delta_i(0), U(\hat{x}_{\bar{N}_i})\bigr) = \int_0^\infty U_i\bigl(\delta_i, U(\hat{x}_{\bar{N}_i})\bigr)\,dt
= \int_0^\infty \Bigl(\delta_i^T Q_{ii}\delta_i + \hat{u}_i^T R_{ii}\hat{u}_i + \sum_{j\in N_i}\hat{u}_j^T R_{ij}\hat{u}_j\Bigr)dt
\tag{6}
\]

where Uᵢ(δᵢ, U(x̂_N̄ᵢ)) = δᵢᵀQᵢᵢδᵢ + ûᵢᵀRᵢᵢûᵢ + Σ_{j∈Nᵢ} ûⱼᵀRᵢⱼûⱼ is the cost-to-go function of agent i, U(x̂_N̄ᵢ) = {ûⱼ | j ∈ N̄ᵢ} denotes the set of control inputs of agent i and its neighbors at the event-triggered instants, and the weighting matrices Qᵢᵢ > 0, Rᵢᵢ > 0 and Rᵢⱼ ≥ 0 are constant.

To make sense of the resulting optimal coordination control among feedback control policies with complete state information, we introduce the notion of admissible control policies.

Definition 1 ([9,12], Admissible Coordination Control Policies). Feedback control policies uᵢ, i ∈ Ω, are called admissible coordination controls with respect to (6) on a set Π ∈ Rⁿ if each uᵢ is continuous, uᵢ(0) = 0, uᵢ stabilizes the systems (5) locally, and the local cost functions (6) are finite.

Given admissible coordination control policies U(x̂_N̄ᵢ), the local value function Vᵢ(δᵢ) of agent i is represented as

\[
V_i(\delta_i) = \int_t^{\infty} U_i\bigl(\delta_i(\tau), U(\hat{x}_{\bar{N}_i})\bigr)\,d\tau
= \int_t^{t_k^i} U_i\bigl(\delta_i(\tau), U(\hat{x}_{\bar{N}_i})\bigr)\,d\tau + V_i(\hat{\delta}_i)
\tag{7}
\]

where we used Vᵢ(δ̂ᵢ) = ∫_{tᵢₖ}^∞ Uᵢ(δᵢ(τ), U(x̂_N̄ᵢ)) dτ together with ⋃ₖ [tᵢₖ, tᵢₖ₊₁) = [tᵢ₀, ∞).

Remark 3. In the event-triggered sampling framework, the performance indices are approximated using only intermittently available system states, i.e., the states at the triggering instants. Thus the number of calculations is largely reduced compared to [9,12].

After transformation, Eq. (7) becomes

\[
\lim_{t\to t_k^{i+}} \frac{V_i(\delta_i) - V_i(\hat{\delta}_i)}{t - t_k^i}
= -\lim_{t\to t_k^{i+}} \frac{1}{t - t_k^i}\int_{t_k^i}^{t} U_i\bigl(\delta_i, U(\hat{x}_{\bar{N}_i})\bigr)\,d\tau
\tag{8}
\]

Evaluating the limit in (8), we obtain the local coupled HJ equation

\[
H_i\bigl(\delta_i, V_{\hat{\delta}_i}, U(\hat{x}_{\bar{N}_i})\bigr)
= U_i\bigl(\delta_i, U(\hat{x}_{\bar{N}_i})\bigr) + V_{\hat{\delta}_i}^T \dot{\delta}_i
= \delta_i^T Q_{ii}\delta_i + \hat{u}_i^T R_{ii}\hat{u}_i + \sum_{j\in N_i}\hat{u}_j^T R_{ij}\hat{u}_j
+ V_{\hat{\delta}_i}^T \sum_{j\in \bar{N}_i}(l_{ij} + b_{ij})(\tilde{f}_j + g_j \hat{u}_j) = 0
\tag{9}
\]

where V_{δ̂ᵢ} = ∂Vᵢ/∂δᵢ |_{δᵢ=δ̂ᵢ} is the partial derivative of the value function Vᵢ(δᵢ) with respect to δᵢ, evaluated at the event-triggered instants.
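The HJ equation (9) is exactly the residual the critic network is later asked to drive to zero. The helper below is a minimal numeric sketch of evaluating that residual for a candidate value gradient; the dictionary-based interface and all names are our assumptions for illustration, not the authors' code.

```python
import numpy as np

def hamiltonian_residual(delta_i, V_grad, u_hat, neighbors, Q_ii, R_ii, R_ij,
                         l_b, f_tilde, g):
    """Evaluate H_i of Eq. (9) for agent i.

    V_grad  : candidate value gradient evaluated at the last trigger state
    u_hat   : dict of held controls; u_hat['i'] for agent i, u_hat[j] for j in N_i
    l_b     : dict of scalars (l_ij + b_ij) for the closed neighborhood (key 'i' too)
    f_tilde : dict of vectors f(x_j) - f(x_0)
    g       : dict of input matrices g_j(x_j)
    """
    # Running cost U_i from Eq. (6)
    cost = delta_i @ Q_ii @ delta_i + u_hat['i'] @ R_ii @ u_hat['i']
    cost += sum(u_hat[j] @ R_ij[j] @ u_hat[j] for j in neighbors)
    # delta_i-dot from Eq. (5), built from the held (event-sampled) controls
    delta_dot = sum(l_b[j] * (f_tilde[j] + g[j] @ u_hat[j])
                    for j in list(neighbors) + ['i'])
    return cost + V_grad @ delta_dot  # H_i = U_i + V_grad^T * delta_i_dot
```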


If the minimum of the left-hand side of (9) exists and is unique, then according to the necessary condition of Bellman's optimality principle, the associated feedback control policies are obtained by setting the gradient of (9) with respect to ûᵢ to zero. Thus we have

\[
\hat{u}_i = -\frac{d_i + b_i}{2} R_{ii}^{-1} g_i^T V_{\hat{\delta}_i}
\tag{10}
\]

where bᵢ = bᵢᵢ; note that dᵢ + bᵢ = lᵢᵢ + bᵢᵢ is precisely the coefficient of gᵢûᵢ in (5).

Assume that the local optimal value functions Vᵢ*(δᵢ) satisfy the coupled HJ equations (9), that is,

\[
\min_{\hat{u}_i} H_i\bigl(\delta_i, V^*_{\delta_i}, \hat{u}^*_i, \hat{u}_j\bigr) = 0
\tag{11}
\]

Thus the local optimal event-triggered coordination control policy of agent i is obtained as

\[
\hat{u}^*_i = -\frac{d_i + b_i}{2} R_{ii}^{-1} g_i^T V^*_{\hat{\delta}_i}
\tag{12}
\]

To prove the stability of the local neighborhood consensus error systems (5) conveniently, we make the following assumption.

Assumption 1. The event-triggered sampling controller is Lipschitz continuous on a compact set; that is, there exists a constant M such that

\[
\|u_i - \hat{u}_i\| \le M \|e_i\|
\tag{13}
\]

where M is a positive real number.

Theorem 1. Assume that the performance index Jᵢ and the optimal coordination control policy ûᵢ* of each agent are given by (6) and (12), respectively. Under the distributed asynchronous scheme, the control policy ûᵢ* of agent i is updated while the control policies uⱼ, j ∈ Nᵢ, of agent i's neighbors remain unchanged. If there exist functions Vᵢ*(δᵢ) > 0, i ∈ Ω, that satisfy the coupled HJB equations (11), and the triggering conditions satisfy

\[
\|e_i\|^2 \le \frac{\|r_i^T \hat{u}_i^*\|^2 + \sum_{j\in N_i}\lambda(R_{ij})\|\hat{u}_j^*\|^2}{M^2\|r_i\|^2},
\quad t \in [t_k^i, t_{k+1}^i)
\tag{14}
\]

where rᵢ denotes a semi-positive definite matrix satisfying rᵢrᵢᵀ = Rᵢᵢ, then the following conclusions hold: (1) the local neighborhood consensus error systems (5) are asymptotically stable; (2) the local performance index functions Jᵢ*(δᵢ(0)) are equal to the value functions Vᵢ*(δᵢ(0)).

Proof. Taking the derivative of Vᵢ*(δᵢ) along the local neighborhood consensus error dynamics (5), we obtain

\[
\dot{V}_i^*(\delta_i) = V_{\delta_i}^{*T}\dot{\delta}_i
= V_{\delta_i}^{*T}\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j + g_j\hat{u}^*_j)
\tag{15}
\]

Consider the optimal coordination control policies (12) and the coupled HJ equations (9) in the traditional (time-triggered) ADP method:

\[
u^*_i = -\frac{d_i+b_i}{2}R_{ii}^{-1}g_i^T V^*_{\delta_i}
\tag{16}
\]

and

\[
u_i^{*T}R_{ii}u_i^* + \sum_{j\in N_i}u_j^{*T}R_{ij}u_j^* + \delta_i^T Q_{ii}\delta_i
+ V_{\delta_i}^{*T}\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j + g_j u^*_j) = 0
\tag{17}
\]

From (16) we can obtain

\[
(d_i+b_i)g_i^T V^*_{\delta_i} = -2R_{ii}u_i^*
\tag{18}
\]

and, rearranging (17),

\[
V_{\delta_i}^{*T}\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})\tilde{f}_j
= -\Bigl(\delta_i^T Q_{ii}\delta_i + u_i^{*T}R_{ii}u_i^*
+ \sum_{j\in N_i}u_j^{*T}R_{ij}u_j^*
+ V_{\delta_i}^{*T}\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})g_j u^*_j\Bigr)
\tag{19}
\]

For each neighbor j ∈ Nᵢ we have uⱼ* = ûⱼ* during the time interval [tᵢₖ, tᵢₖ₊₁), so the neighbor terms cancel. Substituting (18) and (19) into (15) then yields

\[
\begin{aligned}
\dot{V}_i^*(\delta_i)
&= -\delta_i^T Q_{ii}\delta_i - u_i^{*T}R_{ii}u_i^* - \sum_{j\in N_i}u_j^{*T}R_{ij}u_j^*
- V_{\delta_i}^{*T}\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})g_j u^*_j
+ V_{\delta_i}^{*T}\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})g_j \hat{u}^*_j \\
&= -\delta_i^T Q_{ii}\delta_i - u_i^{*T}R_{ii}u_i^* - \sum_{j\in N_i}u_j^{*T}R_{ij}u_j^*
+ 2u_i^{*T}R_{ii}u_i^* - 2u_i^{*T}R_{ii}\hat{u}^*_i \\
&= -\delta_i^T Q_{ii}\delta_i + u_i^{*T}R_{ii}u_i^* - 2u_i^{*T}R_{ii}\hat{u}^*_i
- \sum_{j\in N_i}u_j^{*T}R_{ij}u_j^*
\end{aligned}
\tag{20}
\]

Since rᵢrᵢᵀ = Rᵢᵢ, we have

\[
u_i^{*T}R_{ii}u_i^* - 2u_i^{*T}R_{ii}\hat{u}^*_i
= \|r_i^T u_i^* - r_i^T \hat{u}_i^*\|^2 - \|r_i^T \hat{u}_i^*\|^2
\tag{21}
\]

By using the Lipschitz condition of Assumption 1, (20) is rewritten as

\[
\begin{aligned}
\dot{V}_i^*(\delta_i)
&= -\delta_i^T Q_{ii}\delta_i + \|r_i^T u_i^* - r_i^T \hat{u}_i^*\|^2
- \|r_i^T \hat{u}_i^*\|^2 - \sum_{j\in N_i}u_j^{*T}R_{ij}u_j^* \\
&\le -\lambda(Q_{ii})\|\delta_i\|^2 + M^2\|r_i\|^2\|e_i\|^2
- \|r_i^T \hat{u}_i^*\|^2 - \sum_{j\in N_i}\lambda(R_{ij})\|u_j^*\|^2
\end{aligned}
\tag{22}
\]

When the triggering condition (14) of agent i holds, then, since Qᵢᵢ > 0, rᵢ ≥ 0 and Rᵢⱼ ≥ 0, we have V̇ᵢ*(δᵢ) ≤ −λ(Qᵢᵢ)∥δᵢ∥² < 0. Thus the local neighborhood error systems (5) are asymptotically stable.

(2) Based on the definitions of the performance function and the value function, Jᵢ*(δᵢ(0), U(x̂_N̄ᵢ)) = Vᵢ*(δᵢ(0)) is immediate. ■
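The triggering test (14) is a simple runtime check each agent can evaluate locally. The sketch below is our illustrative assumption of one realization (Rᵢᵢ is factored as rᵢrᵢᵀ via a Cholesky factor, consistent with the rᵢ of Theorem 1), not the paper's implementation.

```python
import numpy as np

def event_triggered(e_i, u_hat_i, u_hat_nbrs, lam_R_ij, R_ii, M):
    """Runtime check of the triggering inequality (14).

    Returns True while (14) holds; once it fails, agent i samples its state,
    transmits it, and refreshes its control policy and critic weights.
    lam_R_ij : minimum singular values lambda(R_ij), one entry per neighbor.
    """
    r_i = np.linalg.cholesky(R_ii)            # r_i r_i^T = R_ii (R_ii assumed SPD)
    num = np.linalg.norm(r_i.T @ u_hat_i) ** 2
    num += sum(lam * np.linalg.norm(u_j) ** 2
               for lam, u_j in zip(lam_R_ij, u_hat_nbrs))
    den = M ** 2 * np.linalg.norm(r_i, 2) ** 2
    return np.linalg.norm(e_i) ** 2 <= num / den
```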

Although the ETADP-based controller reduces the number of computations, the nonlinear nature of the coupled HJB equations (11) makes an analytical solution difficult to obtain. A policy iteration algorithm for solving the coupled HJB equations is therefore presented in the next section.

4. Policy iteration algorithm for the coupled HJB equations

This section proposes a policy iteration (PI) algorithm to solve the coupled HJB equation of each agent. Generally speaking, the PI algorithm consists of two steps: policy evaluation and policy improvement. The two steps are repeated until policy improvement no longer changes the control policies. Note that the value functions need to be evaluated only under admissible control policies.

Algorithm 1. Policy iteration algorithm for distributed coordination control of the NMAS.

Step 1: Start with admissible initial policies uᵢ⁰, i ∈ Ω.

Step 2 (Policy evaluation): Given the N-tuple of policies û₁ˡ, û₂ˡ, …, û_Nˡ, solve for Vᵢˡ using (9):

\[
H_i\bigl(\delta_i, V^l_{\hat{\delta}_i}, \hat{u}^l_i, \hat{u}^l_{N_i}\bigr) = 0, \quad l = 1, 2, \ldots
\tag{23}
\]

where Vᵢˡ denotes the value function of agent i at the lth iteration.

Step 3 (Policy improvement): Update the N-tuple of control policies using (12):

\[
\hat{u}^{l+1}_i = -\frac{d_i+b_i}{2}R_{ii}^{-1}g_i^T V^l_{\hat{\delta}_i}
\tag{24}
\]

Go to Step 2; stop on convergence.
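The Python skeleton below mirrors the structure of Algorithm 1 under the distributed asynchronous scheme. The agent methods `evaluate_policy`, `improve_policy` and `policy_distance` stand in for numerical solvers of (23) and (24); they are assumptions of this sketch, not routines from the paper.

```python
def policy_iteration(agents, tol=1e-6, max_iter=100):
    """Algorithm 1 skeleton: alternate policy evaluation and improvement,
    updating one agent at a time while its neighbors' policies are frozen."""
    for ag in agents:
        ag.policy = ag.initial_admissible_policy()      # Step 1
    for _ in range(max_iter):
        max_change = 0.0
        for ag in agents:                               # asynchronous sweep
            ag.value = ag.evaluate_policy()             # Step 2: solve (23)
            new_policy = ag.improve_policy(ag.value)    # Step 3: apply (24)
            max_change = max(max_change, ag.policy_distance(new_policy))
            ag.policy = new_policy                      # neighbors stay fixed
        if max_change < tol:                            # stop on convergence
            break
    return agents
```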

Next, a theorem is given to show the convergence of the PI algorithm for MASs.

Theorem 2. In the distributed event-triggered coordination control, suppose agent i applies an admissible policy ûᵢ while its neighbors' control policies û_{Nᵢ} remain unchanged. If agent i performs Algorithm 1, then the control policies converge to the optimal controls, uᵢˡ → uᵢ*, i ∈ Ω, and the value functions converge to the optimal value functions, Vᵢˡ → Vᵢ*.

Proof. In the PI algorithm we employ a distributed asynchronous update pattern: only agent i updates its control policy while the control laws of its neighbor agents Nᵢ remain unchanged. Thus we only need to prove convergence of the PI algorithm in which agent i updates its control policy while the other agents in its neighborhood do not change theirs.

Since V̇ᵢˡ⁺¹ = −Uᵢ(δᵢ, ûᵢˡ⁺¹, û_{Nᵢ}^{l′}) and V̇ᵢˡ = −Uᵢ(δᵢ, ûᵢˡ, û_{Nᵢ}^{l′}), one has

\[
\dot{V}_i^{l+1} - \dot{V}_i^{l}
= (\hat{u}_i^{l+1})^T R_{ii}\hat{u}_i^{l+1} - (\hat{u}_i^{l})^T R_{ii}\hat{u}_i^{l}
= (\hat{u}_i^{l+1} - \hat{u}_i^{l})^T R_{ii}(\hat{u}_i^{l+1} - \hat{u}_i^{l})
+ 2(\hat{u}_i^{l})^T R_{ii}(\hat{u}_i^{l+1} - \hat{u}_i^{l})
\tag{25}
\]

A sufficient condition for V̇ᵢˡ ≤ V̇ᵢˡ⁺¹ is

\[
\Delta\hat{u}_i^T R_{ii}\Delta\hat{u}_i \ge -2(\hat{u}_i^{l})^T R_{ii}\Delta\hat{u}_i
= (d_i+b_i)\bigl(V^{l-1}_{\delta_i}\bigr)^T g_i \Delta\hat{u}_i
\tag{26}
\]

where Δûᵢ = ûᵢˡ⁺¹ − ûᵢˡ. For the inequality (26) to hold, it suffices that

\[
\lambda(R_{ii})\|\Delta u_i(\hat{x})\| \ge \beta_i (d_i+b_i)\|V^{l-1}_{\delta_i}\|
\tag{27}
\]

According to the definition of Vᵢ, we know Vᵢ(∞) = 0. Integrating V̇ᵢˡ ≤ V̇ᵢˡ⁺¹ over the interval [t, ∞), we obtain

\[
V_i^{l+1} \le V_i^{l}
\tag{28}
\]

which shows that the value function Vᵢˡ is a nonincreasing function bounded below by 0. Hence Vᵢˡ converges, and we may write lim_{l→∞} Vᵢˡ = Vᵢ^∞. According to the definition of the local value function (7), we know

\[
V_i^l \ge \int_t^\infty \Bigl(\delta_i^T Q_{ii}\delta_i + (\hat{u}_i^{*})^T R_{ii}\hat{u}_i^{*}
+ \sum_{j\in N_i} (\hat{u}_j^{*})^T R_{ij}\hat{u}_j^{*}\Bigr)dt = V_i^* \ge 0
\tag{29}
\]

Letting l → ∞ gives Vᵢ^∞ ≥ Vᵢ*. Since also Vᵢ^∞ ≤ Vᵢ*, we obtain Vᵢ^∞ = Vᵢ*. Thus the value functions satisfy lim_{l→∞} Vᵢˡ = Vᵢ*, and correspondingly lim_{l→∞} uᵢˡ = uᵢ*, i ∈ Ω. ■

Remark 4. Compared to [9], we only consider the case in which agent i updates its control policy while the control policies of the other agents remain unchanged.

5. Online solution of distributed cooperation control for MASs using ETADP method

An online ETADP technique is used in this section to solve the coupled HJB equations (11). In order to implement the proposed ETADP technique, fuzzy hyperbolic model based critic neural networks (NNs) [36] are utilized to approximate the value functions and help calculate the control policies. Compared with the actor–critic dual NNs in [9,10], the single critic NN framework is effective for NMASs because the network structure is simpler and the number of weight updates is relatively small. In the critic NNs, the weight estimates are updated at the event-triggered instants, leading to aperiodic weight tuning laws, so that the computation cost is reduced.

Based on the event-triggered sampling scheme, the value function of agent i, approximated by the fuzzy hyperbolic critic approximator, is represented as

\[
V_i(\delta_i) = W_i^T \Phi_i(z_i) + \varepsilon_i(z_i)
\tag{30}
\]

where Wᵢ is the unknown weight vector, zᵢ(t) is a vector which collects δᵢ and the locally available measurements δⱼ(t), j ∈ Nᵢ, Φᵢ(zᵢ) = tanh(zᵢ) is the critic network activation function, and εᵢ(zᵢ) is the reconstruction error.

Denoting the weight estimate of agent i by Ŵᵢ, the estimated value function can be represented as

\[
\hat{V}_i(\delta_i) = \hat{W}_{i,k}^T \Phi_i(z_i), \quad t\in[t_k^i, t_{k+1}^i)
\tag{31}
\]

where Ŵᵢ,ₖ is the weight estimate at the trigger instant t = tᵢₖ, and the activation function Φᵢ(zᵢ) is selected as a basis for the value function approximation and satisfies Φᵢ(0) = 0.

For agent i, the HJB equation error can be defined as

\[
e^c_{i,k} = H_i\bigl(\delta_i, \hat{W}_{i,k}, U(\hat{x}_{N_i})\bigr)
= U_i\bigl(\delta_i, \hat{V}_{\hat{\delta}_i}, U(\hat{x}_{N_i})\bigr) + \hat{V}_{\hat{\delta}_i}^T\dot{\delta}_i
= \delta_i^T Q_{ii}\delta_i + \hat{u}_i^T R_{ii}\hat{u}_i
+ \sum_{j\in N_i}\hat{u}_j^T R_{ij}\hat{u}_j + \varphi_i(z_i),
\quad t\in[t_k^i,t_{k+1}^i)
\tag{32}
\]

where φᵢ(zᵢ) = Ŵᵢ,ₖᵀ (∂Φᵢ(zᵢ)/∂δᵢ) Σ_{j∈N̄ᵢ}(lᵢⱼ+bᵢⱼ)(f̃ⱼ+gⱼûⱼ).

Given any admissible coordination control policies û_N̄ᵢ, it is desired to select Ŵᵢ,ₖ to minimize the squared residual error Eᵢ(Ŵᵢ,ₖ) during the time interval [tᵢₖ, tᵢₖ₊₁), designed as

\[
E_i(\hat{W}_{i,k}) = \tfrac{1}{2}\bigl(e^c_{i,k}\bigr)^T e^c_{i,k}
\tag{33}
\]

Under the event-triggered sampling mechanism, the critic NN weights are updated only at the event-triggered instants. The weight update law of agent i is selected as follows: when no event is triggered,

\[
\dot{\hat{W}}_{i,k} = 0, \quad t\in[t_k^i, t_{k+1}^i)
\tag{34}
\]

and when the event-triggered condition is satisfied,

\[
\hat{W}_{i,k}^{+} = \hat{W}_{i,k} - a_i\sigma_{i,k}\bigl(\sigma_{i,k}^T\hat{W}_{i,k}
+ U_i(\delta_i, \hat{V}_{\hat{\delta}_i}, U(\hat{x}_{N_i}))\bigr), \quad t = t_k^i
\tag{35}
\]

where aᵢ > 0 is the gain of the adaptive update law and σᵢ,ₖ = (∂Φᵢ(zᵢ)/∂δᵢ) Σ_{j∈N̄ᵢ}(lᵢⱼ+bᵢⱼ)(f̃ⱼ+gⱼûⱼ). Thus the critic network weights are updated in an aperiodic manner, which reduces the computation cost significantly compared to TTADP [9,12].
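As an illustration of (31), (34) and (35), the sketch below holds the critic weights between events and applies the jump update at a trigger instant. It is our assumption of one reasonable realization, not the authors' code.

```python
import numpy as np

def critic_value(W_hat, z):
    """Eq. (31): single-critic value estimate with tanh (fuzzy hyperbolic) basis."""
    return W_hat @ np.tanh(z)

def weight_jump(W_hat, sigma, U_i, a_i):
    """Eq. (35): event-triggered critic weight update at t = t_k^i.

    sigma : regressor sigma_{i,k} = dPhi/ddelta * sum_j (l_ij + b_ij)(f~_j + g_j u_j)
    U_i   : running cost evaluated at the trigger instant
    a_i   : adaptation gain, a_i > 0
    """
    return W_hat - a_i * sigma * (sigma @ W_hat + U_i)

# Between events, Eq. (34) applies: W_hat_dot = 0, so the last jump value
# is simply held until the next trigger instant.
```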


In order to derive the weight estimation error dynamics, we substitute (30) into (9), which can be rewritten as

\[
U_i\bigl(\delta_i, U(\hat{x}_{N_i})\bigr)
+ \Bigl(W_i^T\frac{\partial \Phi_i(z_i)}{\partial \delta_i}
+ \frac{\partial \varepsilon_i^T(z_i)}{\partial \delta_i}\Bigr)
\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j + g_j\hat{u}_j) = 0
\tag{36}
\]

That is to say, we have

\[
U_i\bigl(\delta_i, U(\hat{x}_{N_i})\bigr)
+ W_i^T\frac{\partial \Phi_i(z_i)}{\partial \delta_i}
\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j + g_j\hat{u}_j)
= -\frac{\partial \varepsilon_i^T(z_i)}{\partial \delta_i}
\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j + g_j\hat{u}_j)
= -\varepsilon_{HJi}
\tag{37}
\]

where ε_HJi denotes the residual HJ approximation error.

We also define the critic NN weight estimation error W̃ᵢ = Ŵᵢ − Wᵢ. Taking the time derivative along the continuous update law (34) and using the jump equation (35) at the trigger instants, one obtains

\[
\dot{\tilde{W}}_{i,k} = 0, \quad t\in[t_k^i, t_{k+1}^i)
\tag{38}
\]

and

\[
\tilde{W}_{i,k}^{+} = \tilde{W}_{i,k} - a_i\sigma_{i,k}\bigl(\sigma_{i,k}^T\tilde{W}_{i,k} + \varepsilon_{HJi}\bigr),
\quad t = t_k^i
\tag{39}
\]

Remark 5. In order to make the critic network weight estimation error converge to zero, the activation function Φᵢ(zᵢ) must satisfy the persistency of excitation (PE) condition [37].

According to the estimated value function V̂ᵢ(δᵢ) and the weight tuning law for Ŵᵢ,ₖ, the admissible coordination control policy can be calculated as

\[
\hat{u}_i = -\frac{l_{ii}+b_{ii}}{2}R_{ii}^{-1}g_i^T
\Bigl(\frac{\partial\Phi_i(z_i)}{\partial\delta_i}\Bigr)^T \hat{W}_{i,k},
\quad t = t_k^i
\tag{40}
\]

Simultaneously, in the event-triggered mechanism the control policy is held constant during the inter-event times, i.e., ûᵢ = ûᵢ(tᵢₖ), t ∈ [tᵢₖ, tᵢₖ₊₁).

Before showing the uniform ultimate boundedness of the critic NN weight estimation errors, the following definition and assumptions are necessary.

Definition 2 (Uniformly Ultimately Bounded (UUB)). System (5) is said to be UUB if there exist a compact set S ∈ Rⁿ with δᵢ(t₀) ∈ S, a bound B and a time T(B, δᵢ(t₀)), both independent of t₀ ≥ 0, such that ∥δᵢ∥ ≤ B for all t ≥ t₀ + T.

Assumption 2. The PE condition ensures ∥σᵢ,ₖ∥ ≤ σ_Mi, where σ_Mi is a positive constant.

Assumption 3. The critic weights, the activation functions and the reconstruction errors of the critic NNs are bounded; that is, there exist positive constants W_m, W_M, Φ_M, ε_M and ε̄_HJ such that W_m ≤ ∥Wᵢ∥ ≤ W_M, ∥Φᵢ(zᵢ)∥ ≤ Φ_M, ∥εᵢ(zᵢ)∥ ≤ ε_M, and ∥ε_HJi∥ ≤ ε̄_HJ.

Next, the UUB property of the critic NN weight estimation errors is proved in the following theorem.

Theorem 3. Consider the local neighborhood consensus error dynamics (5), with the value function and the control policy defined as in (31) and (40) and the event-triggered weight update laws (34), (35). Let Assumption 2 hold, and suppose the critic weights Ŵᵢ,ₖ and the control policies ûᵢ are updated when the event-triggered conditions (14) are satisfied. Then the local consensus errors δᵢ and the critic network weight estimation errors W̃ᵢ,ₖ are UUB, and all agents reach a consensus on x₀(t).

Proof. Under the event-triggered scheme, the proof of UUB for the local consensus error δᵢ and the critic network weight estimation error W̃ᵢ,ₖ is carried out by evaluating a Lyapunov function both at the inter-event times and at the event-triggered instants.

Case 1: inter-event times, i.e., t ∈ (tᵢₖ, tᵢₖ₊₁), ∀i ∈ Ω, k = 1, 2, …. Consider the Lyapunov function candidate

\[
L_i = L_{i1} + L_{i2}
\tag{41}
\]

where Lᵢ₁ = ½δᵢᵀδᵢ + ΥᵢVᵢ(δᵢ), Lᵢ₂ = tr(W̃ᵢ,ₖᵀW̃ᵢ,ₖ)/(2aᵢ), and Υᵢ > 0.

Taking the time derivative of Lᵢ₁ and applying Young's inequality to each cross term, one has

\[
\begin{aligned}
\dot{L}_{i1} &= \delta_i^T\dot{\delta}_i + \Upsilon_i\dot{V}_i(\delta_i)
= \delta_i^T\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j+g_j\hat{u}_j)
- \Upsilon_i U_i\bigl(\delta_i, U(\hat{x}_{\bar{N}_i})\bigr) \\
&= \delta_i^T\sum_{j\in\bar{N}_i}(l_{ij}+b_{ij})(\tilde{f}_j+g_j\hat{u}_j)
- \Upsilon_i\delta_i^T Q_{ii}\delta_i
- \Upsilon_i\sum_{j\in\bar{N}_i}\hat{u}_j^T R_{ij}\hat{u}_j \\
&\le \sum_{j\in\bar{N}_i}\bigl(\|l_{ij}+b_{ij}\|^2\beta_j^2 - \Upsilon_i\lambda_{\min}(R_{ij})\bigr)\|\hat{u}_j\|^2
+ \bigl(2|\bar{N}_i| - \Upsilon_i\lambda_{\min}(Q_{ii})\bigr)\|\delta_i\|^2
+ \sum_{j\in\bar{N}_i}\|(l_{ij}+b_{ij})\tilde{f}_j\|^2
\end{aligned}
\tag{42}
\]

Because the critic network weight estimation errors are held constant during the inter-event times, i.e., W̃̇ᵢ,ₖ = 0, one has

\[
\dot{L}_{i2} = 0
\tag{43}
\]

Combining (42) with (43), the derivative of the Lyapunov function satisfies

\[
\dot{L}_i \le \sum_{j\in\bar{N}_i}\bigl(\|l_{ij}+b_{ij}\|^2\beta_j^2 - \Upsilon_i\lambda_{\min}(R_{ij})\bigr)\|\hat{u}_j\|^2
+ \bigl(2|\bar{N}_i| - \Upsilon_i\lambda_{\min}(Q_{ii})\bigr)\|\delta_i\|^2
+ \sum_{j\in\bar{N}_i}\|(l_{ij}+b_{ij})\tilde{f}_j\|^2
\tag{44}
\]

To guarantee the negativeness of L̇ᵢ, the following condition should hold:

\[
\|\delta_i\| > \sqrt{\frac{\sum_{j\in\bar{N}_i}\|(l_{ij}+b_{ij})\tilde{f}_j\|^2 + \frac{1}{2a_i}\bar{\varepsilon}_{HJ}^2}
{\Upsilon_i\lambda_{\min}(Q_{ii}) - 2|\bar{N}_i|}}
\tag{45}
\]

Based on the Lyapunov extension theorem, when condition (45) is satisfied, the local consensus errors δᵢ and the weight estimation errors W̃ᵢ are UUB.

Case 2: event-triggered instants, i.e., t = tᵢₖ, ∀i ∈ Ω, k = 1, 2, …. Consider the same Lyapunov function (41). The critic NN weights are updated at these instants, so we consider the difference

\[
\Delta L_i = \Delta L_{i1} + \Delta L_{i2}
= \tfrac{1}{2}(\delta_i^+)^T\delta_i^+ - \tfrac{1}{2}\delta_i^T\delta_i
+ \Upsilon_i V_i(\delta_i^+) - \Upsilon_i V_i(\delta_i)
+ \frac{\mathrm{tr}\bigl((\tilde{W}_{i,k}^+)^T\tilde{W}_{i,k}^+\bigr)}{2a_i}
- \frac{\mathrm{tr}\bigl(\tilde{W}_{i,k}^T\tilde{W}_{i,k}\bigr)}{2a_i}
\tag{46}
\]

In Case 1 we proved that L̇ᵢ₁ ≤ 0; hence

\[
\Delta L_{i1} = \tfrac{1}{2}(\delta_i^+)^T\delta_i^+ - \tfrac{1}{2}\delta_i^T\delta_i
+ \Upsilon_i V_i(\delta_i^+) - \Upsilon_i V_i(\delta_i) \le 0
\tag{47}
\]

The problem is then converted into finding a bound for

\[
\Delta L_{i2} = \frac{\mathrm{tr}\bigl((\tilde{W}_{i,k}^+)^T\tilde{W}_{i,k}^+\bigr)}{2a_i}
- \frac{\mathrm{tr}\bigl(\tilde{W}_{i,k}^T\tilde{W}_{i,k}\bigr)}{2a_i}
\tag{48}
\]


Substituting (39) into (48), we obtain

\[
\begin{aligned}
\Delta L_{i2}
&= \frac{1}{2a_i}\mathrm{tr}\Bigl(\bigl[\tilde{W}_{i,k} - a_i\sigma_{i,k}(\sigma_{i,k}^T\tilde{W}_{i,k}+\varepsilon_{HJi})\bigr]^T
\bigl[\tilde{W}_{i,k} - a_i\sigma_{i,k}(\sigma_{i,k}^T\tilde{W}_{i,k}+\varepsilon_{HJi})\bigr]\Bigr)
- \frac{1}{2a_i}\mathrm{tr}\bigl(\tilde{W}_{i,k}^T\tilde{W}_{i,k}\bigr) \\
&= -\mathrm{tr}\bigl(\tilde{W}_{i,k}^T\sigma_{i,k}(\sigma_{i,k}^T\tilde{W}_{i,k}+\varepsilon_{HJi})\bigr)
+ \frac{a_i}{2}\mathrm{tr}\bigl((\sigma_{i,k}^T\tilde{W}_{i,k}+\varepsilon_{HJi})^T
\sigma_{i,k}^T\sigma_{i,k}(\sigma_{i,k}^T\tilde{W}_{i,k}+\varepsilon_{HJi})\bigr)
\end{aligned}
\tag{49}
\]

Since W̃ᵢ,ₖᵀσᵢ,ₖσᵢ,ₖᵀW̃ᵢ,ₖ > 0 and ε_HJiᵀσᵢ,ₖᵀσᵢ,ₖε_HJi > 0, there exist pᵢ > qᵢ > 0 such that

\[
q_i\|\tilde{W}_{i,k}\|^2 \le \tilde{W}_{i,k}^T\sigma_{i,k}\sigma_{i,k}^T\tilde{W}_{i,k} \le p_i\|\tilde{W}_{i,k}\|^2,
\qquad \varepsilon_{HJi}^T\sigma_{i,k}^T\sigma_{i,k}\varepsilon_{HJi} \le p_i\bar{\varepsilon}_{HJ}^2
\tag{50}
\]

Then, using Assumption 2,

\[
\begin{aligned}
\Delta L_{i2} &\le -q_i\|\tilde{W}_{i,k}\|^2
+ \frac{a_i}{2}\|\sigma_{i,k}\|^2\|\tilde{W}_{i,k}\|^2
+ \frac{1}{2a_i}\bar{\varepsilon}_{HJ}^2
+ \frac{a_i}{2}\bigl[p_i^2\|\tilde{W}_{i,k}\|^2
+ p_i\bigl(\|\sigma_{i,k}\|^2\|\tilde{W}_{i,k}\|^2 + \bar{\varepsilon}_{HJ}^2\bigr)
+ p_i\|\bar{\varepsilon}_{HJ}\|^2\bigr] \\
&\le \Bigl(-q_i + \frac{a_i}{2}\sigma_{Mi}^2 + \frac{a_i}{2}p_i^2
+ \frac{a_i}{2}p_i + \frac{a_i}{2}\sigma_{Mi}^2\Bigr)\|\tilde{W}_i\|^2
+ \Bigl(\frac{1}{2a_i} + a_i p_i\Bigr)\bar{\varepsilon}_{HJ}^2
\end{aligned}
\]

If aᵢ satisfies

\[
a_i \le \frac{2(q_i - \sigma_{Mi}^2)}{\sigma_{Mi}^2 + p_i + p_i^2}
\tag{51}
\]

and, for large ∥W̃ᵢ,ₖ∥, the inequality

\[
\|\tilde{W}_{i,k}\| \ge \sqrt{\frac{1 + 2a_i^2 p_i}
{2a_i q_i - a_i^2\bigl(\sigma_{Mi}^2 + p_i^2 + p_i - \sigma_{Mi}\bigr)}}\;\bar{\varepsilon}_{HJ}
\tag{52}
\]

holds, then ΔLᵢ₂ ≤ 0. According to (47) and (52), we get ΔLᵢ ≤ 0. By the Lyapunov extension theorem, we can infer that the local consensus error δᵢ and the critic network weight estimation error W̃ᵢ,ₖ are UUB. Based on the above analysis of the two cases, all agents reach a consensus on x₀. ■

In order to avoid Zeno behavior in the event-triggered control scheme, the following assumption is made.

Assumption 4. The nonlinear function f(·) is uniformly bounded, i.e., sup_{xᵢ} ∥f(xᵢ)∥ ≤ q∥xᵢ∥ with q a positive constant.

Theorem 4. Assume that the event-triggered condition (14) and Theorem 3 hold. Then the inter-event time intervals tᵢₖ₊₁ − tᵢₖ, ∀i ∈ V, have a positive lower bound.

Proof. Combining (4) with (5), we obtain

\[
\begin{aligned}
\|\dot{e}_i\| &= \|\dot{\hat{\delta}}_i - \dot{\delta}_i\| \\
&= \Bigl\|(l_{ii}+b_{ii})f(x_i) - \sum_{j\in N_i}a_{ij}f(x_j) - b_{ii}f(x_0)
+ (l_{ii}+b_{ii})g_i(x_i)\hat{u}_i - \sum_{j\in N_i}a_{ij}g_j(x_j)\hat{u}_j\Bigr\| \\
&\le \Bigl\|(l_{ii}+b_{ii})f(x_i) - \sum_{j\in N_i}a_{ij}f(x_j) - b_{ii}f(x_0)\Bigr\|
+ \|(l_{ii}+b_{ii})g_i(x_i)\hat{u}_i\|
+ \Bigl\|\sum_{j\in N_i}a_{ij}g_j(x_j)\hat{u}_j\Bigr\|
\end{aligned}
\tag{53}
\]

Based on Assumption 4 and the control policy (40), a sufficient bound for (53) is

\[
\begin{aligned}
\|\dot{e}_i\| &\le q\Bigl\|(l_{ii}+b_{ii})x_i - \sum_{j\in N_i}a_{ij}x_j - b_{ii}x_0\Bigr\|
+ (l_{ii}+b_{ii})\beta_i\|\hat{u}_i\|
+ \max_{j\in N_i}\beta_j|l_{ij}|\,\|\hat{u}_j\| \\
&\le q\|\delta_i\|
+ (l_{ii}+b_{ii})\beta_i\Bigl\|\frac{l_{ii}+b_{ii}}{2}R_{ii}^{-1}g_i^T
\Bigl(\frac{\partial\Phi(z_i)}{\partial\delta_i}\Bigr)^T\hat{W}_{i,k}\Bigr\|
+ \max_{j\in N_i}\beta_j|l_{ij}|\Bigl\|\frac{l_{jj}+b_{jj}}{2}R_{jj}^{-1}g_j^T
\Bigl(\frac{\partial\Phi(z_j)}{\partial\delta_j}\Bigr)^T\hat{W}_{j,k}\Bigr\| \\
&\le q\|\hat{\delta}_i - e_i\| + \Pi^1_{i,j}
\le q\|e_i\| + q\|\hat{\delta}_i\| + \Pi^1_{i,j}
\le q\|e_i\| + \Pi^2_{i,j}
\end{aligned}
\tag{54}
\]

where Π¹ᵢ,ⱼ = (lᵢᵢ+bᵢᵢ)βᵢ∥((lᵢᵢ+bᵢᵢ)/2)Rᵢᵢ⁻¹gᵢᵀ(∂Φ(zᵢ)/∂δᵢ)ᵀŴᵢ,ₖ∥ + max_{j∈Nᵢ}βⱼ|lᵢⱼ|∥((lⱼⱼ+bⱼⱼ)/2)Rⱼⱼ⁻¹gⱼᵀ(∂Φ(zⱼ)/∂δⱼ)ᵀŴⱼ,ₖ∥ and Π²ᵢ,ⱼ = q∥δ̂ᵢ∥ + Π¹ᵢ,ⱼ. According to Assumption 3 and Theorem 3, Ŵᵢ,ₖ is bounded, and thus Π¹ᵢ,ⱼ and Π²ᵢ,ⱼ are also bounded.

Using the comparison lemma in [38], we have

\[
\|e_i\| \le e^{q(t-t_k^{i+})}\|e_i(t_k^{i+})\|
+ \int_{t_k^{i+}}^{t} e^{q(t-\tau)}\,\Pi^2_{i,j}\,d\tau
\tag{55}
\]

At the event-triggered instant t = tᵢₖ, the event-triggered error is reset, so eᵢ(tᵢₖ⁺) = 0. It then follows that

\[
\|e_i\| \le \frac{\Pi^2_{i,j}}{q}\bigl(e^{q(t-t_k^{i+})}-1\bigr),
\quad t\in(t_k^i, t_{k+1}^i]
\tag{56}
\]

Based on (14) and the control policy (40), we have

\[
\begin{aligned}
\|e_i\|^2 &\le \frac{\|r_i^T\hat{u}_i^*\|^2 + \sum_{j\in N_i}\lambda(R_{ij})\|\hat{u}_j^*\|^2}{M^2\|r_i\|^2} \\
&\le \frac{\Bigl\|\frac{l_{ii}+b_{ii}}{2}r_i^T R_{ii}^{-1}g_i^T
\bigl(\frac{\partial\Phi(z_i)}{\partial\delta_i}\bigr)^T\hat{W}_{i,k}\Bigr\|^2
+ \sum_{j\in N_i}\lambda(R_{ij})\Bigl\|\frac{l_{jj}+b_{jj}}{2}R_{jj}^{-1}g_j^T
\bigl(\frac{\partial\Phi(z_j)}{\partial\delta_j}\bigr)^T\hat{W}_{j,k}\Bigr\|^2}{M^2\|r_i\|^2} \\
&\le \frac{(l_{ii}+b_{ii})^2\beta_i^2\Theta_{i,k}}{2M^2\|r_i\|^4}
+ \sum_{j\in N_i}\frac{\lambda(R_{ij})(l_{jj}+b_{jj})^2\beta_j^2\Theta_{j,k}}{2\|r_j\|^4\|r_i\|^2}
\end{aligned}
\tag{57}
\]

where Θⱼ,ₖ = ∥(∂Φ(zⱼ)/∂δⱼ)ᵀŴⱼ,ₖ∥², j ∈ N̄ᵢ.

When an event is triggered at t = tᵢₖ₊₁, (57) can be written as

\[
\|e_i(t_{k+1}^i)\| \le
\sqrt{\frac{(l_{ii}+b_{ii})^2\beta_i^2\Theta_{i,k}}{2M^2\|r_i\|^4}
+ \sum_{j\in N_i}\frac{\lambda(R_{ij})(l_{jj}+b_{jj})^2\beta_j^2\Theta_{j,k}}{2\|r_j\|^4\|r_i\|^2}}
\tag{58}
\]

At time t = tᵢₖ₊₁, we use (56) and (58) to derive

\[
\sqrt{\frac{(l_{ii}+b_{ii})^2\beta_i^2\Theta_{i,k}}{2M^2\|r_i\|^4}
+ \sum_{j\in N_i}\frac{\lambda(R_{ij})(l_{jj}+b_{jj})^2\beta_j^2\Theta_{j,k}}{2\|r_j\|^4\|r_i\|^2}}
\le \frac{\Pi^2_{i,j}}{q}\bigl(e^{q(t_{k+1}^i - t_k^i)}-1\bigr)
\tag{59}
\]

Thus we can bound the event-triggered interval as

\[
t_{k+1}^i - t_k^i \ge \frac{1}{q}\log
\frac{q\sqrt{\dfrac{(l_{ii}+b_{ii})^2\beta_i^2\Theta_{i,k}}{2M^2\|r_i\|^4}
+ \sum_{j\in N_i}\dfrac{\lambda(R_{ij})(l_{jj}+b_{jj})^2\beta_j^2\Theta_{j,k}}{2\|r_j\|^4\|r_i\|^2}}
+ \Pi^2_{i,j}}{\Pi^2_{i,j}}
\tag{60}
\]

So there exists a positive lower bound on the next inter-event interval; that is to say, Zeno behavior is avoided. ■
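To make the bound (60) tangible, the helper below evaluates the guaranteed minimum inter-event time from the quantities appearing in the proof. All numerical inputs are placeholders to be replaced by problem data; they are assumptions of this sketch rather than values from the paper.

```python
import numpy as np

def min_inter_event_time(q, Pi2, l_b_ii, beta_i, Theta_i, M, r_i_norm,
                         nbr_terms):
    """Lower bound of Eq. (60) on t_{k+1}^i - t_k^i.

    nbr_terms : iterable of tuples (lambda_Rij, l_b_jj, beta_j, Theta_j, r_j_norm),
                one per neighbor j of agent i.
    """
    s = (l_b_ii**2 * beta_i**2 * Theta_i) / (2 * M**2 * r_i_norm**4)
    for lam, l_b_jj, beta_j, Theta_j, r_j_norm in nbr_terms:
        s += lam * l_b_jj**2 * beta_j**2 * Theta_j / (2 * r_j_norm**4 * r_i_norm**2)
    return (1.0 / q) * np.log((q * np.sqrt(s) + Pi2) / Pi2)

# Placeholder numbers purely for illustration:
tau_min = min_inter_event_time(q=1.0, Pi2=2.0, l_b_ii=2.0, beta_i=1.0,
                               Theta_i=0.5, M=1.0, r_i_norm=1.0,
                               nbr_terms=[(0.1, 2.0, 1.0, 0.5, 1.0)])
print(f"guaranteed minimum inter-event time: {tau_min:.4f} s")
```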

6. Simulation

In this section, two examples are studied to show the effectiveness of the proposed ETADP method.

Example 1. Consider an NMAS with the undirected communication topology of Fig. 1, composed of five agents with a leader v₀ connected to agent v₃.

[Fig. 1. Structure of communication topology with a leader.]

For simulation, similar to [12], the dynamics of each agent are given as

\[
\begin{aligned}
\dot{x}_{i1} &= x_{i2} - x_{i1}^2 x_{i2} \\
\dot{x}_{i2} &= -(x_{i1}+x_{i2})(1-x_{i1})^2 + x_{i2}u_i, \quad i = 1, 2, 3, 4, 5
\end{aligned}
\tag{61}
\]

where f(xᵢ) = [xᵢ₂ − xᵢ₁²xᵢ₂; −(xᵢ₁+xᵢ₂)(1−xᵢ₁)²] and gᵢ(xᵢ) = [0; xᵢ₂]. The state trajectory of the leader v₀ is chosen as

\[
\begin{aligned}
\dot{x}_{01} &= x_{02} - x_{01}^2 x_{02} \\
\dot{x}_{02} &= -(x_{01}+x_{02})(1-x_{01})^2
\end{aligned}
\tag{62}
\]

The simulation parameters are chosen as follows: the weighting matrices Qᵢᵢ = I₂, Rᵢᵢ = 1, Rᵢⱼ = 0.1 (j ∈ Nᵢ); the pinning gain b₃ = 1; and the learning gains a₁ = 0.1, a₂ = 0.5, a₃ = 0.4, a₄ = 0.3, a₅ = 0.2. Moreover, the critic weights are initialized at random from a uniform distribution on the interval [0, 0.1].

Fig. 2 depicts the evolution of all the agents' states under the optimal coordination control policies (12); the five followers reach a consensus with the leader after about 13 s.

[Fig. 2. Evolutions of the states of all agents xᵢ.]

The convergence of the optimal consensus control policies under the proposed ETADP scheme and the TTADP method of [9,12] is shown in Fig. 3; both converge to zero almost simultaneously. The control inputs under the proposed ETADP method are updated only at the event-triggered instants and remain constant during the inter-event times.

[Fig. 3. A comparison of the control inputs between the proposed ETADP algorithm and the TTADP scheme in [9,12]: (a) event-triggered control; (b) time-triggered control.]

Meanwhile, the evolutions of the NN weight estimates Ŵᵢ and Ŵtᵢ for the ETADP and TTADP methods are shown in Fig. 4. The weight updates are aperiodic and based only on the event-triggered sampled data; the NN weight estimates tend to constants, which indicates that the NN weight estimation errors are UUB.

[Fig. 4. Convergence of critic NN weight estimations Ŵᵢ and Ŵtᵢ for the ETADP algorithm and the TTADP scheme in [9,12]: (a) event-triggered scheme; (b) time-triggered scheme.]

Fig. 5 shows the event-triggered instants and the inter-event times between consecutive transmissions of each agent. The lower bound on the inter-event times is found to be 0.128 s, which means that the proposed ETADP method avoids Zeno behavior.

[Fig. 5. Inter-event times of all agents.]

Subsequently, the numbers of control actions of the proposed ETADP method and of the periodic ADP techniques [9,12] are listed in Table 1. The cumulative number of control updates under the ETADP method is only around 24.1% of that of the periodic ADP method, which shows that the proposed ETADP method reduces the communication frequency and decreases the computation cost.

Table 1
Comparisons of the number of control updates under ETADP and TTADP.

Method     The proposed ETADP     The periodic ADP [9,12]
Agent 1    16                     78
Agent 2    21                     78
Agent 3    16                     78
Agent 4    9                      78
Agent 5    32                     78
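For readers who want to reproduce the qualitative behavior of Example 1, a compact simulation sketch is given below. It is our own illustrative reconstruction under stated assumptions (forward-Euler integration, a simplified fixed-gain control in place of (12), and the trigger test (14) replaced by a fixed threshold on the trigger error), not the authors' simulation code.

```python
import numpy as np

def f(x):                        # drift of Eqs. (61)/(62)
    return np.array([x[1] - x[0]**2 * x[1],
                     -(x[0] + x[1]) * (1 - x[0])**2])

def g(x):                        # input vector of Eq. (61)
    return np.array([0.0, x[1]])

dt, T = 0.001, 15.0
A = np.array([[0,1,0,0,1],[1,0,1,0,0],[0,1,0,1,0],
              [0,0,1,0,1],[1,0,0,1,0]], float)  # assumed weights for Fig. 1
c = np.array([0, 0, 1, 0, 0.])                   # leader pinned to agent 3
x = np.random.uniform(-0.5, 0.5, size=(5, 2))    # follower states
x0 = np.array([0.5, -0.3])                       # leader state (assumed)
K = 2.0                                          # assumed feedback gain
u_held = np.zeros(5); delta_held = np.zeros((5, 2))
events = [0] * 5; thresh = 0.05                  # assumed fixed trigger threshold

for step in range(int(T / dt)):
    delta = np.array([sum(A[i, j] * (x[i] - x[j]) for j in range(5))
                      + c[i] * (x[i] - x0) for i in range(5)])
    for i in range(5):
        if np.linalg.norm(delta_held[i] - delta[i]) > thresh:  # event fires
            delta_held[i] = delta[i]                           # resample state
            u_held[i] = -K * g(x[i]) @ delta[i]                # refresh policy
            events[i] += 1
    x0 = x0 + dt * f(x0)
    for i in range(5):                                         # hold u between events
        x[i] = x[i] + dt * (f(x[i]) + g(x[i]) * u_held[i])

print("control updates per agent:", events)
```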

Example 2. In order to show the scalability of the proposed ETADP algorithm, a larger communication network is studied. Consider the 10-agent communication graph shown in Fig. 6, in which the leader v₀ is pinned to the followers v₂, v₅ and v₁₀.

[Fig. 6. Communication graph with a leader.]

For this NMAS, the dynamics of each follower are selected as

\[
\begin{aligned}
\dot{x}_{i1} &= -x_{i1} + x_{i2} \\
\dot{x}_{i2} &= -0.5(x_{i1}+x_{i2}) - 0.5x_{i2}\sin^2(x_{i1}) + \sin(x_{i1})u_i
\end{aligned}
\]

with the leader

\[
\dot{x}_0 = \begin{bmatrix} -x_{01} + x_{02} \\ -0.5(x_{01}+x_{02}) - 0.5x_{02}\sin^2(x_{01}) \end{bmatrix}
\]

Referring to Example 1, we select the parameters as: (1) Qᵢᵢ = 1, Rᵢᵢ = 1, Rᵢⱼ = 0.1 (j ∈ Nᵢ), i = 1, …, 10; (2) the pinning gains b₂ = b₅ = b₁₀ = 1; (3) the learning gains a₁ = 0.1, a₂ = 0.5, a₃ = 0.3, a₄ = 0.4, a₅ = 0.2, a₆ = 0.15, a₇ = 0.55, a₈ = 0.35, a₉ = 0.45, a₁₀ = 0.21.

In Fig. 7, the local consensus error vectors of all agents are shown; they asymptotically converge to zero. Fig. 8 shows the three-dimensional state diagram of all agents. Figs. 7 and 8 indicate that all the followers reach coordination with the leader under the optimal consensus control policies shown in Fig. 9. The evolution of the performance index function of each agent is depicted in Fig. 10. Under the event-triggered control mechanism, the event-triggered instants and inter-event times of all agents are shown in Fig. 11. It is easily seen that the cumulative numbers of control actions and sampled-data transmissions, and hence the computation and communication burdens, are largely reduced.

[Fig. 7. Evolution of the local consensus error.]
[Fig. 8. Consensus of 10 agents to the leader.]
[Fig. 9. Evolution of the control inputs.]
[Fig. 10. Performance index function of all followers.]
[Fig. 11. Inter-event times of all followers.]

7. Conclusion

In this paper, we have considered the problem of distributed optimal coordination control for NMASs using an ETADP method. In the proposed ETADP framework, the control policies and the weight estimates of the critic NNs are updated only when the designed trigger conditions are satisfied; thus the communication load and computation cost can be reduced largely compared with traditional periodic ADP. Moreover, GFHM-based NN approximators using a single-network framework have been presented to approximate the value functions and help calculate the control policies; using only a single critic NN is more efficient than a critic–actor NN approximator. Two simulation examples have been provided to show the effectiveness of the scheme. In future work, we may apply the proposed ETADP method to MASs with network-induced delays, packet dropouts, disordering, etc.

Acknowledgments

This work was supported by the China Postdoctoral Science Foundation under grant no. 2018M632208.

References

[1] Su H, Qiu Y, Wang L. Semi-global output consensus of discrete-time multi-agent systems with input saturation and external disturbances. ISA Trans 2017;67:131–9.
[2] Cao Y, Zhang L, Li C, Chen MZ. Observer-based consensus tracking of nonlinear agents in hybrid varying directed topology. IEEE Trans Cybern 2017;47(8):2212–22.
[3] Han T, Guan Z-H, Chi M, Hu B, Li T, Zhang X-H. Multi-formation control of nonlinear leader-following multi-agent systems. ISA Trans 2017;69:140–7.
[4] Wen G, Duan Z, Chen G, Yu W. Consensus tracking of multi-agent systems with Lipschitz-type node dynamics and switching topologies. IEEE Trans Circuits Syst I Regul Pap 2014;61(2):499–511.
[5] Weng S, Yue D, Shi J. Distributed cooperative control for multiple photovoltaic generators in distribution power system under event-triggered mechanism. J Franklin Inst B 2016;353(14):3407–27.
[6] Dong X, Yu B, Shi Z, Zhong Y. Time-varying formation control for unmanned aerial vehicles: theories and applications. IEEE Trans Control Syst Technol 2015;23(1):340–8.
[7] Wang W, Huang J, Wen C, Fan H. Distributed adaptive control for consensus tracking with application to formation control of nonholonomic mobile robots. Automatica 2014;50(4):1254–63.

[8] Ren W, Beard RW, Atkins EM. A survey of consensus problems in multi-agent coordination. In: Proceedings of the 2005 American control conference. IEEE; 2005, p. 1859–64.
[9] Vamvoudakis KG, Lewis FL, Hudas GR. Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality. Automatica 2012;48(8):1598–611.
[10] Abouheaf MI, Lewis FL, Vamvoudakis KG, Haesaert S, Babuska R. Multi-agent discrete-time graphical games and reinforcement learning solutions. Automatica 2014;50(12):3038–53.
[11] Wei Q, Liu D, Lewis FL. Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games. Inform Sci 2015;317:96–113.
[12] Zhang H, Zhang J, Yang G-H, Luo Y. Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming. IEEE Trans Fuzzy Syst 2015;23(1):152–63.
[13] Murray JJ, Cox CJ, Lendaris GG, Saeks R. Adaptive dynamic programming. IEEE Trans Syst Man Cybern C 2002;32(2):140–53.
[14] Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 2009;9(3):32–50.
[15] Wang F-Y, Zhang H, Liu D. Adaptive dynamic programming: an introduction. IEEE Comput Intell Mag 2009;4(2):39–47.
[16] Zhang H, Luo Y, Liu D. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints. IEEE Trans Neural Netw 2009;20(9):1490–503.
[17] Liu D, et al. Numerical adaptive learning control scheme for discrete-time non-linear systems. IET Control Theory Appl 2013;7(11):1472–86.
[18] Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans Neural Netw Learn Syst 2014;25(3):621–34.
[19] Mu C, Ni Z, Sun C, He H. Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems. IEEE Trans Cybern 2017;47(6):1460–70.
[20] Jiao Q, Modares H, Xu S, Lewis FL, Vamvoudakis KG. Multi-agent zero-sum differential graphical games for disturbance rejection in distributed control. Automatica 2016;69:24–34.
[21] Yue D, Tian E, Han Q-L. A delay system method for designing event-triggered controllers of networked control systems. IEEE Trans Automat Control 2013;58(2):475–81.

[22] Yin X, Yue D, Hu S, Peng C, Xue Y. Model-based event-triggered predictive control for networked systems with data dropout. SIAM J Control Optim 2016;54(2):567–86.
[23] Dimarogonas DV, Frazzoli E, Johansson KH. Distributed event-triggered control for multi-agent systems. IEEE Trans Automat Control 2012;57(5):1291–7.
[24] Guo G, Ding L, Han Q-L. A distributed event-triggered transmission strategy for sampled-data consensus of multi-agent systems. Automatica 2014;50(5):1489–96.
[25] Su H, Li Z, Ye Y. Event-triggered Kalman-consensus filter for two-target tracking sensor networks. ISA Trans 2017;71:103–11.
[26] Wang W, Huang C, Cao J, Alsaadi FE. Event-triggered control for sampled-data cluster formation of multi-agent systems. Neurocomputing 2017;267:25–35.
[27] Tabuada P. Event-triggered real-time scheduling of stabilizing control tasks. IEEE Trans Automat Control 2007;52(9):1680–5.
[28] Liu Q, Wang Z, He X, Zhou D. A survey of event-based strategies on control and estimation. Syst Sci Control Eng: Open Access J 2014;2(1):90–7.
[29] Tolić D, Fierro R, Ferrari S. Optimal self-triggering for nonlinear systems via approximate dynamic programming. In: 2012 IEEE international conference on control applications. IEEE; 2012, p. 879–84.
[30] Avimanyu S, Hao X, Sarangapani J. Neural network-based event-triggered state feedback control of nonlinear continuous-time systems. IEEE Trans Neural Netw Learn Syst 2015;27(3):497–509.
[31] Zhong X, He H. An event-triggered ADP control approach for continuous-time system with unknown internal states. IEEE Trans Cybern 2017;47(3):683–94.
[32] Dong L, Zhong X, Sun C, He H. Adaptive event-triggered control based on heuristic dynamic programming for nonlinear discrete-time systems. IEEE Trans Neural Netw Learn Syst 2017;28(7):1594–605.
[33] Vamvoudakis KG. Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems. IEEE/CAA J Autom Sin 2014;1(3):282–93.
[34] Sahoo A, Jagannathan S. Stochastic optimal regulation of nonlinear networked control systems by using event-driven adaptive dynamic programming. IEEE Trans Cybern 2017;47(2):425–38.
[35] Adib Yaghmaie F, Lewis FL, Su R. Output regulation of heterogeneous linear multi-agent systems with differential graphical game. Internat J Robust Nonlinear Control 2016;26(10):2256–78.
[36] Zhang H, Liu D. Fuzzy modeling and fuzzy control. Springer Science & Business Media; 2006.
[37] Sarangapani J. Neural network control of nonlinear discrete-time systems. CRC/Taylor & Francis; 2006.
[38] Khalil HK. Nonlinear systems. 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall; 2002.