Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming

Information Sciences 282 (2014) 167–179. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/ins

Ding Wang, Derong Liu (corresponding author), Hongliang Li, Hongwen Ma

The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Article info

Article history: Received 24 December 2013; Received in revised form 13 May 2014; Accepted 21 May 2014; Available online 11 June 2014.

Keywords: Adaptive dynamic programming; Approximate dynamic programming; Decentralized optimal control; Neural network; Robust optimal control

Abstract

In this paper, neural-network-based robust optimal control design for a class of uncertain nonlinear systems via the adaptive dynamic programming approach is investigated. First, the robust controller of the original uncertain system is derived by adding a feedback gain to the optimal controller of the nominal system. It is also shown that this robust controller achieves optimality under a specified cost function, which serves as the basic idea of the robust optimal control design. Then, a critic network is constructed to solve the Hamilton–Jacobi–Bellman equation corresponding to the nominal system, where an additional stabilizing term is introduced to guarantee stability. The uniform ultimate boundedness of the closed-loop system is also proved by using the Lyapunov approach. Moreover, the obtained results are extended to solve the decentralized optimal control problem of continuous-time nonlinear interconnected large-scale systems. Finally, two simulation examples are presented to illustrate the effectiveness of the established control scheme.

© 2014 Elsevier Inc. All rights reserved.

[Footnote: This work was supported in part by the National Natural Science Foundation of China under Grants 61034002, 61233001, 61273140, 61304086, and 61374105, in part by the Beijing Natural Science Foundation under Grant 4132078, and in part by the Early Career Development Award of SKLMCCS. Corresponding author: Derong Liu. Tel.: +86 10 82544761; fax: +86 10 82544799. E-mail addresses: [email protected] (D. Wang), [email protected] (D. Liu), [email protected] (H. Li), [email protected] (H. Ma).]

http://dx.doi.org/10.1016/j.ins.2014.05.050 0020-0255/© 2014 Elsevier Inc. All rights reserved.

1. Introduction

In practical control systems, model uncertainties arise frequently and can severely degrade closed-loop performance. Hence, the problem of designing robust controllers for nonlinear systems with uncertainties has drawn considerable attention in the recent literature [43,15,31]. Lin et al. [15] showed that the robust control problem can be solved by studying the optimal control problem of the corresponding nominal system, but the detailed procedure was not presented. In [31], the authors developed an iterative algorithm for online design of robust control for a class of continuous-time nonlinear systems. However, the optimality of the robust controller with respect to a specified cost function was not discussed. In [43], the authors addressed the problem of designing robust tracking controls for a class of uncertain nonholonomic systems actuated by brushed direct current motors, but that work was not concerned with optimality. The starting point of the strategy obtained in this paper is optimal control. The nonlinear optimal control problem always requires solving the Hamilton–Jacobi–Bellman (HJB) equation. Though dynamic programming is a conventional method for solving optimization and optimal control problems, it often suffers from the curse of dimensionality, which is primarily due to its backward-in-time approach. To avoid this difficulty, adaptive/approximate dynamic programming (ADP), based on function approximators such as neural networks, was proposed by Werbos [35] as a method to solve

optimal control problems forward in time. Recently, the study of ADP and related fields has gained much attention from various scholars [1–10,12–14,16–25,28–30,32–34,36–38,40–42,44–46]. Lewis and Vrabie [13] noted that the ADP technique is closely related to the field of reinforcement learning. As is well known, policy iteration is one of the basic algorithms of reinforcement learning, and an initial admissible control is required when employing the policy iteration algorithm. In many situations, however, it is difficult to find such an initial admissible control. To the best of our knowledge, there are few results on the robust optimal control of uncertain nonlinear systems based on ADP, not to mention the decentralized optimal control of large-scale systems. This motivates our research. To our knowledge, this is the first work establishing a robust optimal control scheme for a class of uncertain nonlinear systems via the ADP technique without using an initial admissible control. To begin with, the optimal controller of the nominal system is designed. It can be proved that a modification of the optimal control law is in fact a robust controller of the original uncertain system, which also achieves optimality under a suitably defined cost function. Then, a critic network is constructed for solving the HJB equation corresponding to the nominal system. In addition, inspired by the work of [5,24], an additional stabilizing term is introduced to guarantee stability, which relaxes the need for an initial stabilizing control. The uniform ultimate boundedness (UUB) of the closed-loop system is also proved via the Lyapunov approach. Furthermore, the aforementioned results are extended to deal with the decentralized optimal control of a class of continuous-time nonlinear interconnected systems. Finally, two simulation examples are given to show the effectiveness of the robust optimal control scheme.

2. Problem statement and preliminaries

In this paper, we study the continuous-time uncertain nonlinear system given by

$$\dot{x}(t) = f(x(t)) + g(x(t))\big(\bar{u}(t) + \bar{d}(x(t))\big), \tag{1}$$

where $x(t) \in \mathbb{R}^n$ is the state vector and $\bar{u}(t) \in \mathbb{R}^m$ is the control vector, $f(\cdot)$ and $g(\cdot)$ are differentiable in their arguments with $f(0) = 0$, and $\bar{d}(x)$ is the unknown nonlinear perturbation. Let $x(0) = x_0$ be the initial state. We assume that $\bar{d}(0) = 0$, so that $x = 0$ is an equilibrium of system (1). As in much of the literature, for the nominal system

$$\dot{x}(t) = f(x(t)) + g(x(t))u(t), \tag{2}$$

we also assume that $f + gu$ is Lipschitz continuous on a set $\Omega$ in $\mathbb{R}^n$ containing the origin and that system (2) is controllable. For system (1), in order to deal with the robust control problem, we should find a feedback control policy $\bar{u}(x)$ such that the closed-loop system is globally asymptotically stable for all uncertainties $\bar{d}(x)$. In this paper, we will show that this problem can be converted into designing an optimal controller for the corresponding nominal system with an appropriate cost function. Let $R \in \mathbb{R}^{m \times m}$ be a symmetric positive definite matrix. Then, we denote $\bar{d}(x) = R^{-1/2} d(x)$, where $d(x) \in \mathbb{R}^m$ is bounded by a known function $d_M(x)$, i.e., $\|d(x)\| \le d_M(x)$ with $d_M(0) = 0$. For system (2), in order to deal with the infinite-horizon optimal control problem, we have to find the control policy $u(x)$ which minimizes the cost function given by

$$J(x_0) = \int_0^{\infty} \left\{ d_M^2(x(s)) + u^{T}(x(s))\, R\, u(x(s)) \right\} ds. \tag{3}$$

Based on optimal control theory, the designed feedback control must not only stabilize the system on $\Omega$ but also guarantee that the cost function (3) is finite; in other words, the control policy must be admissible [1,28]. Let $\Psi(\Omega)$ be the set of admissible controls on $\Omega$. For any admissible control policy $u \in \Psi(\Omega)$, if the associated cost function (3) is continuously differentiable, then its infinitesimal version is the nonlinear Lyapunov equation

$$0 = d_M^2(x) + u^{T}(x) R u(x) + (\nabla J(x))^{T} \big( f(x) + g(x) u(x) \big), \tag{4}$$

with $J(0) = 0$. In Eq. (4), the symbol $\nabla(\cdot) \triangleq \partial(\cdot)/\partial x$ denotes the gradient operator; for example, $\nabla J(x) = \partial J(x)/\partial x$. Define the Hamiltonian function of system (2) as

$$H(x, u, \nabla J(x)) = d_M^2(x) + u^{T}(x) R u(x) + (\nabla J(x))^{T} \big( f(x) + g(x) u(x) \big). \tag{5}$$
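As a concrete illustration, the Hamiltonian (5) can be evaluated numerically once the dynamics and bound are specified. The sketch below assumes a simple scalar system; the names `f`, `g`, `d_M` and the test values are illustrative choices, not from the paper.

```python
import numpy as np

def hamiltonian(x, u, grad_J, f, g, d_M, R):
    """Evaluate H(x, u, ∇J) = d_M(x)^2 + uᵀRu + (∇J)ᵀ(f(x) + g(x)u), Eq. (5)."""
    x = np.atleast_1d(x)
    u = np.atleast_1d(u)
    return d_M(x) ** 2 + u @ R @ u + grad_J @ (f(x) + g(x) @ u)

# Scalar illustration (assumed): f(x) = -x, g(x) = I, d_M(x) = ||x||, R = I.
f = lambda x: -x
g = lambda x: np.eye(1)
d_M = lambda x: np.linalg.norm(x)
R = np.eye(1)

# At x = 2, u = 0.5, ∇J = 1: H = 4 + 0.25 + 1*(-2 + 0.5) = 2.75.
H = hamiltonian(np.array([2.0]), np.array([0.5]), np.array([1.0]), f, g, d_M, R)
```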

The optimal cost function of system (2) can be formulated as

$$J^{*}(x_0) = \min_{u \in \Psi(\Omega)} \int_0^{\infty} \left\{ d_M^2(x(s)) + u^{T}(x(s)) R u(x(s)) \right\} ds. \tag{6}$$

According to optimal control theory, the optimal cost function $J^{*}(x)$ satisfies the HJB equation

$$0 = \min_{u \in \Psi(\Omega)} H(x, u, \nabla J^{*}(x)). \tag{7}$$

Assume that the minimum on the right-hand side of (7) exists and is unique. Then, the optimal control policy is

$$u^{*}(x) = -\frac{1}{2} R^{-1} g^{T}(x) \nabla J^{*}(x). \tag{8}$$

Based on (5) and (8), the HJB equation (7) becomes


$$0 = d_M^2(x) + (\nabla J^{*}(x))^{T} f(x) - \frac{1}{4} (\nabla J^{*}(x))^{T} g(x) R^{-1} g^{T}(x) \nabla J^{*}(x), \tag{9}$$

with $J^{*}(0) = 0$. Consider system (2) with cost function (3) and the optimal feedback control (8). The following assumption is presented for the robust optimal control design.

Assumption 1. Let $J_s(x)$ be a continuously differentiable Lyapunov function candidate satisfying

$$\dot{J}_s(x) = (\nabla J_s(x))^{T} \dot{x} = (\nabla J_s(x))^{T} \big( f(x) + g(x) u^{*} \big) < 0. \tag{10}$$

Assume there exists a positive definite matrix $\Lambda(x)$ such that the following relation holds:

$$(\nabla J_s(x))^{T} \big( f(x) + g(x) u^{*} \big) = -(\nabla J_s(x))^{T} \Lambda(x) \nabla J_s(x). \tag{11}$$

Remark 1. This is a common assumption that has been used in the literature, for instance [5,24], to facilitate the discussion of closed-loop stability. According to [5], we assume that the closed-loop dynamics under the optimal control can be bounded by a function of the system state; without loss of generality, $\|f(x) + g(x) u^{*}\| \le \eta \|\nabla J_s(x)\|$ with $\eta > 0$. Hence, we can further obtain $\|(\nabla J_s(x))^{T} (f(x) + g(x) u^{*})\| \le \eta \|\nabla J_s(x)\|^2$. Let $\lambda_m$ and $\lambda_M$ be the minimum and maximum eigenvalues of the matrix $\Lambda(x)$. Considering $(\nabla J_s(x))^{T} (f(x) + g(x) u^{*}) < 0$ and the fact that $\lambda_m \|\nabla J_s(x)\|^2 \le (\nabla J_s(x))^{T} \Lambda(x) \nabla J_s(x) \le \lambda_M \|\nabla J_s(x)\|^2$, we conclude that Assumption 1 is reasonable. Specifically, in this paper, $J_s(x)$ can be obtained by properly selecting a quadratic polynomial.

3. Robust optimal control design of uncertain nonlinear systems

In this section, to establish the robust stabilizing control strategy for system (1), we modify the optimal control law (8) of system (2) by proportionally increasing a feedback gain, i.e.,

$$\bar{u}(x) = \zeta u^{*}(x) = -\frac{\zeta}{2} R^{-1} g^{T}(x) \nabla J^{*}(x). \tag{12}$$

Now, we present the following lemma to indicate that the optimal control has an infinite gain margin.

Lemma 1. For system (2), the feedback control given by (12) ensures that the closed-loop system is asymptotically stable for all $\zeta \ge 1/2$.

Proof. We show that $J^{*}(x)$ is a Lyapunov function. In light of (6), $J^{*}(x)$ is positive definite. Considering (9) and (12), the derivative of $J^{*}(x)$ along the trajectory of the closed-loop system is

$$\dot{J}^{*}(x) = (\nabla J^{*}(x))^{T} \big( f(x) + g(x) \bar{u}(x) \big) = -d_M^2(x) - \frac{1}{2}\left( \zeta - \frac{1}{2} \right) \left\| R^{-1/2} g^{T}(x) \nabla J^{*}(x) \right\|^2. \tag{13}$$

Hence, $\dot{J}^{*}(x) < 0$ whenever $\zeta \ge 1/2$ and $x \ne 0$. Then, the conditions of the Lyapunov local stability theorem are satisfied. □
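For a scalar toy problem, the infinite gain margin of Lemma 1 can be checked directly. The sketch below assumes nominal dynamics $\dot{x} = x + u$ with $d_M(x) = |x|$ and $R = 1$, for which the HJB equation (9) is solved by $J^{*}(x) = (1+\sqrt{2})x^2$; these choices are illustrative and not from the paper.

```python
import numpy as np

# Toy system (assumed): xdot = x + u, d_M(x) = |x|, R = 1.
# Eq. (9) gives J*(x) = (1 + sqrt(2)) x^2, so u*(x) = -(1 + sqrt(2)) x,
# and Eq. (12) scales this optimal control by the gain zeta.

def closed_loop_trajectory(zeta, x0=1.0, dt=1e-3, steps=5000):
    """Euler-integrate the closed loop for 5 s and return the final state."""
    x = x0
    for _ in range(steps):
        u_bar = -zeta * (1.0 + np.sqrt(2.0)) * x   # Eq. (12) with known ∇J*
        x += dt * (x + u_bar)
    return x

# Any zeta >= 1/2 keeps the loop stable: zeta = 0.5 gives decay rate
# 0.5*(1 + sqrt(2)) - 1 ≈ 0.207 > 0, so x(5) ≈ exp(-1.04) ≈ 0.355.
x_final = closed_loop_trajectory(0.5)
```

Larger gains only speed up the decay, consistent with the "infinite gain margin" interpretation.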

Theorem 1. For system (1), there exists a positive number $\zeta_1 \ge 1$ such that for any $\zeta > \zeta_1$, the feedback control developed by (12) ensures that the closed-loop system is asymptotically stable.

Proof. We select $L(x) = J^{*}(x)$ as the Lyapunov function candidate. Taking the time derivative of $L(x)$ along the trajectory of the closed-loop system, we obtain

$$\dot{L}(x) = (\nabla J^{*}(x))^{T} \big( f(x) + g(x)(\bar{u}(x) + \bar{d}(x)) \big). \tag{14}$$

Based on (13), we find that

$$\dot{L}(x) \le -\left\{ d_M^2(x) + \frac{1}{2}\left( \zeta - \frac{1}{2} \right) \left\| (\nabla J^{*}(x))^{T} g(x) R^{-1/2} \right\|^2 - \left\| (\nabla J^{*}(x))^{T} g(x) R^{-1/2} \right\| d_M(x) \right\}. \tag{15}$$

Let $\xi = \big[\, d_M(x),\ \|(\nabla J^{*}(x))^{T} g(x) R^{-1/2}\| \,\big]^{T}$. Then, we have $\dot{L}(x) \le -\xi^{T} H \xi$, where

$$H = \begin{bmatrix} 1 & -\dfrac{1}{2} \\[4pt] -\dfrac{1}{2} & \dfrac{\zeta}{2} - \dfrac{1}{4} \end{bmatrix}. \tag{16}$$


From (16), we observe that there exists a positive number $\zeta_1 \ge 1$ such that any $\zeta > \zeta_1$ guarantees the positive definiteness of $H$. Then, $\dot{L}(x) < 0$, which implies that the closed-loop system is asymptotically stable. □

According to Theorem 1, $\bar{u}(x)$ is a robust control strategy for the original system (1). Next, we show that it also possesses the property of optimality. For system (1), we define the following cost function

$$\bar{J}(x_0) = \int_0^{\infty} \left\{ Q(x(s)) + \frac{1}{\zeta}\, \bar{u}^{T}(x(s)) R\, \bar{u}(x(s)) \right\} ds, \tag{17}$$

where

$$Q(x) = d_M^2(x) - (\nabla J^{*}(x))^{T} g(x) \bar{d}(x) + \frac{1}{4}(\zeta - 1)(\nabla J^{*}(x))^{T} g(x) R^{-1} g^{T}(x) \nabla J^{*}(x). \tag{18}$$

Lemma 2. There exists a positive number $\zeta_2 \ge 2$ such that for all $\zeta > \zeta_2$, the function $Q(x)$ is positive definite.

Proof. Adding and subtracting $\frac{1}{\zeta - 1} d^{T}(x) d(x)$ in (18), we find that

$$Q(x) = d_M^2(x) + \frac{1}{4(\zeta - 1)} \left\| (\zeta - 1) R^{-1/2} g^{T}(x) \nabla J^{*}(x) - 2 d(x) \right\|^2 - \frac{1}{\zeta - 1}\, d^{T}(x) d(x). \tag{19}$$

Recalling $\|d(x)\| \le d_M(x)$ and $\zeta > 2$, we obtain

$$Q(x) \ge d_M^2(x) - \frac{1}{\zeta - 1}\, d^{T}(x) d(x) \ge \frac{\zeta - 2}{\zeta - 1}\, d_M^2(x). \tag{20}$$

This proves that $Q(x)$ is a positive definite function. □
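The completed-square step in (19)–(20) can also be checked numerically in the scalar case. The sketch below uses placeholder scalars $v = R^{-1/2} g^{T}(x) \nabla J^{*}(x)$ and $d$ with $\zeta = 3$; it is an illustration of the algebra only, not part of the design procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
zeta = 3.0
max_gap = 0.0        # |Eq.(18) - Eq.(19)|; should be ~0 (exact identity)
min_margin = np.inf  # Q - ((zeta-2)/(zeta-1)) d_M^2; should stay >= 0, Eq. (20)
for _ in range(1000):
    v = rng.normal()                      # scalar stand-in for R^{-1/2} gᵀ ∇J*
    d = rng.normal()                      # scalar uncertainty sample
    dM = abs(d) + abs(rng.normal())       # enforce |d| <= d_M
    Q18 = dM**2 - v * d + 0.25 * (zeta - 1) * v**2                  # Eq. (18)
    Q19 = (dM**2 + ((zeta - 1) * v - 2 * d)**2 / (4 * (zeta - 1))
           - d**2 / (zeta - 1))                                     # Eq. (19)
    max_gap = max(max_gap, abs(Q18 - Q19))
    min_margin = min(min_margin, Q19 - (zeta - 2) / (zeta - 1) * dM**2)
```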

Theorem 2. Consider system (1) with cost function (17). There exists a positive number $\zeta^{*}$ such that for any $\zeta > \zeta^{*}$, the feedback control law obtained by (12) is an asymptotically stabilizing solution of the optimal control problem.

Proof. The Hamiltonian function of system (1) with cost function (17) is

$$H(\nabla \bar{J}(x)) = Q(x) + \frac{1}{\zeta}\, \bar{u}^{T}(x) R\, \bar{u}(x) + (\nabla \bar{J}(x))^{T} \big( f(x) + g(x)(\bar{u}(x) + \bar{d}(x)) \big), \tag{21}$$

where $\zeta > \zeta_2 \ge 2$. Replacing $\bar{J}(x)$ with $J^{*}(x)$ and observing (18), Eq. (21) becomes

$$H(\nabla J^{*}(x)) = d_M^2(x) + (\nabla J^{*}(x))^{T} f(x) + \frac{1}{4}(\zeta - 1)(\nabla J^{*}(x))^{T} g(x) R^{-1} g^{T}(x) \nabla J^{*}(x) + \frac{1}{\zeta}\, \bar{u}^{T}(x) R\, \bar{u}(x) + (\nabla J^{*}(x))^{T} g(x) \bar{u}(x). \tag{22}$$

Using (9) and (12), we can further obtain $H(\nabla J^{*}(x)) = 0$, which shows that $J^{*}(x)$ is a solution of the HJB equation of system (1). Then, the control law (12) achieves optimality with cost function (17). Furthermore, there exists a positive number $\zeta^{*} \triangleq \max\{\zeta_1, \zeta_2\}$ such that for any $\zeta > \zeta^{*}$, the control law (12) is an asymptotically stabilizing solution of the corresponding optimal control problem. □

Remark 2. Based on Theorems 1 and 2, for any fixed $\zeta > \zeta^{*}$, the control law (12) not only stabilizes system (1) but also achieves optimality with the defined cost function. That is, the derived control law is the robust optimal control of the original uncertain nonlinear system.

Remark 3. According to Theorems 1 and 2, in order to complete the robust optimal control design, we should put emphasis on solving the optimal control problem of the nominal system. As noted in the introduction, the ADP approach is effective for nonlinear optimal control design. Hence, in the next section, we present the neural-network-based design method and the corresponding stability proof of the closed-loop system.

4. Optimal control design via ADP approach and the stability proof

According to the universal approximation property of neural networks, $J^{*}(x)$ can be represented by a single-layer neural network on a compact set $\Omega$ as

$$J^{*}(x) = \omega_c^{T} \sigma_c(x) + \varepsilon_c(x), \tag{23}$$

where $\omega_c \in \mathbb{R}^{l}$ is the ideal weight, $\sigma_c(x) \in \mathbb{R}^{l}$ is the activation function, $l$ is the number of neurons in the hidden layer, and $\varepsilon_c(x)$ is the approximation error. Then, we have

$$\nabla J^{*}(x) = (\nabla \sigma_c(x))^{T} \omega_c + \nabla \varepsilon_c(x). \tag{24}$$

Based on (24), the Lyapunov equation (4) becomes

$$0 = d_M^2(x) + u^{T}(x) R u(x) + \big( \omega_c^{T} \nabla \sigma_c(x) + (\nabla \varepsilon_c(x))^{T} \big) \big( f(x) + g(x) u(x) \big). \tag{25}$$

In light of [28,4,5], in this paper we also assume that $\omega_c$, $\nabla \sigma_c(x)$, $\varepsilon_c(x)$, and its derivative $\nabla \varepsilon_c(x)$ are all bounded on the compact set $\Omega$. Since the ideal weights are unknown, a critic neural network

$$\hat{J}(x) = \hat{\omega}_c^{T} \sigma_c(x) \tag{26}$$

is built to approximate the optimal cost function. Similarly, we have

$$\nabla \hat{J}(x) = (\nabla \sigma_c(x))^{T} \hat{\omega}_c. \tag{27}$$

According to (8) and (24), we have

$$u^{*}(x) = -\frac{1}{2} R^{-1} g^{T}(x) \big( (\nabla \sigma_c(x))^{T} \omega_c + \nabla \varepsilon_c(x) \big). \tag{28}$$

In light of (8) and (27), the approximate control law can be given as

$$\hat{u}(x) = -\frac{1}{2} R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \hat{\omega}_c. \tag{29}$$

Applying (29) to system (2), the closed-loop system dynamics is expressed as

$$\dot{x} = f(x) - \frac{1}{2} g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \hat{\omega}_c. \tag{30}$$

Using the neural network expression (24), the Hamiltonian function becomes

$$H(x, \omega_c) = d_M^2(x) + \omega_c^{T} \nabla \sigma_c(x) f(x) + e_{cH} - \frac{1}{4} \omega_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c = 0, \tag{31}$$

where

$$e_{cH} = (\nabla \varepsilon_c(x))^{T} f(x) - \frac{1}{2} (\nabla \varepsilon_c(x))^{T} g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c - \frac{1}{4} (\nabla \varepsilon_c(x))^{T} g(x) R^{-1} g^{T}(x) \nabla \varepsilon_c(x) \tag{32}$$

denotes the residual error. Using the estimated weight vector, the approximate Hamiltonian function can be derived as

$$\hat{H}(x, \hat{\omega}_c) = d_M^2(x) + \hat{\omega}_c^{T} \nabla \sigma_c(x) f(x) - \frac{1}{4} \hat{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \hat{\omega}_c \triangleq e_c. \tag{33}$$

Let the weight estimation error of the critic network be $\tilde{\omega}_c = \omega_c - \hat{\omega}_c$. Then, based on (31) and (33), we obtain

$$e_c = -\tilde{\omega}_c^{T} \nabla \sigma_c(x) f(x) - \frac{1}{4} \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \tilde{\omega}_c + \frac{1}{2} \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c - e_{cH}. \tag{34}$$

In order to train the critic network, we aim at designing $\hat{\omega}_c$ to minimize the objective function

$$E_c = \frac{1}{2} e_c^{T} e_c. \tag{35}$$

Here, the weights of the critic network are tuned by the standard steepest descent algorithm with an additional term introduced to ensure the boundedness of the system state, i.e.,

$$\dot{\hat{\omega}}_c = -\alpha_c \left( \frac{\partial E_c}{\partial \hat{\omega}_c} \right) + \frac{\alpha_s}{2} P(x, \hat{u}) \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x), \tag{36}$$

where $\alpha_c > 0$ is the learning rate of the critic network, $\alpha_s > 0$ is the learning rate of the additional term, $J_s(x)$ is the Lyapunov function candidate given in Assumption 1, and $P(x, \hat{u})$ is the additional stabilizing term defined as

$$P(x, \hat{u}) = \begin{cases} 0, & \text{if } \dot{J}_s(x) = (\nabla J_s(x))^{T} \big( f(x) + g(x) \hat{u} \big) < 0, \\ 1, & \text{otherwise.} \end{cases} \tag{37}$$
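The tuning rule (36)–(37) can be sketched as one Euler step with the closed-loop pieces written as generic callables. All concrete names below, and the scalar test system used to exercise it, are assumptions for illustration rather than the paper's implementation.

```python
import numpy as np

def critic_update(w_hat, x, f, g, R, d_M, grad_sigma, grad_Js,
                  alpha_c, alpha_s, dt):
    """One Euler step of the critic tuning rule (36) with the switch (37)."""
    GS = grad_sigma(x)                               # ∇σ_c(x), shape (l, n)
    GR = g(x) @ np.linalg.solve(R, g(x).T)           # g R⁻¹ gᵀ
    u_hat = -0.5 * np.linalg.solve(R, g(x).T @ GS.T @ w_hat)    # Eq. (29)
    # Approximate Hamiltonian e_c, Eq. (33).
    e_c = d_M(x) ** 2 + w_hat @ GS @ f(x) \
          - 0.25 * (w_hat @ GS @ GR @ GS.T @ w_hat)
    # Gradient of e_c with respect to the weights, Eq. (38).
    de_dw = GS @ f(x) - 0.5 * GS @ GR @ GS.T @ w_hat
    # Stabilizing indicator P(x, û), Eq. (37).
    Js_dot = grad_Js(x) @ (f(x) + g(x) @ u_hat)
    P = 0.0 if Js_dot < 0 else 1.0
    # Eq. (36): steepest descent plus the conditional stabilizing term.
    w_dot = -alpha_c * e_c * de_dw \
            + 0.5 * alpha_s * P * (GS @ GR @ grad_Js(x))
    return w_hat + dt * w_dot
```

For instance, with the assumed scalar system $\dot{x} = -x + u$, $d_M(x) = |x|$, $\sigma_c(x) = x^2$, and $J_s(x) = x^2$, the exact critic weight is $\sqrt{2} - 1$, and repeated updates drive $\hat{\omega}_c$ toward it.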


Remark 4. It is important to note that the term $P(x, \hat{u})$ is defined based on the Lyapunov condition for stability. The second term of (36) is removed when the nonlinear system exhibits stable behavior; in case the controlled system exhibits signs of instability, the second term of (36) is activated to reinforce the training process.

Next, we derive the dynamics of the weight estimation error $\tilde{\omega}_c$. According to (33), we have

$$\frac{\partial e_c}{\partial \hat{\omega}_c} = \nabla \sigma_c(x) f(x) - \frac{1}{2} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \hat{\omega}_c. \tag{38}$$

In light of (36), the dynamics of the weight estimation error is

$$\dot{\tilde{\omega}}_c = \alpha_c e_c \left( \frac{\partial e_c}{\partial \hat{\omega}_c} \right) - \frac{\alpha_s}{2} P(x, \hat{u}) \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x). \tag{39}$$

Then, combining (34) and (38), the error dynamics (39) becomes

$$\begin{aligned} \dot{\tilde{\omega}}_c = {} & \alpha_c \left( -\tilde{\omega}_c^{T} \nabla \sigma_c(x) f(x) - \frac{1}{4} \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \tilde{\omega}_c + \frac{1}{2} \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c - e_{cH} \right) \\ & \times \left( \nabla \sigma_c(x) f(x) - \frac{1}{2} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c + \frac{1}{2} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \tilde{\omega}_c \right) \\ & - \frac{\alpha_s}{2} P(x, \hat{u}) \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x). \end{aligned} \tag{40}$$

Theorem 3. For system (2), let the control input be provided by (29) and the weight of the critic network be tuned by (36). Then, the state $x$ of the closed-loop system and the weight estimation error $\tilde{\omega}_c$ of the critic network are UUB.

Proof. Choose the Lyapunov function candidate

$$L = \frac{1}{2\alpha_c} \tilde{\omega}_c^{T} \tilde{\omega}_c + \frac{\alpha_s}{2\alpha_c} J_s(x). \tag{41}$$

The derivative of (41) with respect to time along the dynamics (30) and (40) is

$$\dot{L} = \frac{1}{\alpha_c} \tilde{\omega}_c^{T} \dot{\tilde{\omega}}_c + \frac{\alpha_s}{2\alpha_c} (\nabla J_s(x))^{T} \dot{x}. \tag{42}$$

Substituting (40) into (42), we have

$$\begin{aligned} \dot{L} = {} & \left( -\tilde{\omega}_c^{T} \nabla \sigma_c(x) f(x) - \frac{1}{4} \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \tilde{\omega}_c + \frac{1}{2} \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c - e_{cH} \right) \\ & \times \tilde{\omega}_c^{T} \left( \nabla \sigma_c(x) f(x) - \frac{1}{2} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \omega_c + \frac{1}{2} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T} \tilde{\omega}_c \right) \\ & - \frac{\alpha_s}{2\alpha_c} P(x, \hat{u})\, \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x) + \frac{\alpha_s}{2\alpha_c} (\nabla J_s(x))^{T} \dot{x}. \end{aligned} \tag{43}$$

Denote $A = \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) (\nabla \sigma_c(x))^{T}$. Then, (43) becomes

$$\begin{aligned} \dot{L} = {} & -\left( \tilde{\omega}_c^{T} \nabla \sigma_c(x) f(x) + \frac{1}{4} \tilde{\omega}_c^{T} A \tilde{\omega}_c - \frac{1}{2} \tilde{\omega}_c^{T} A \omega_c + e_{cH} \right) \left( \tilde{\omega}_c^{T} \nabla \sigma_c(x) f(x) + \frac{1}{2} \tilde{\omega}_c^{T} A \tilde{\omega}_c - \frac{1}{2} \tilde{\omega}_c^{T} A \omega_c \right) \\ & - \frac{\alpha_s}{2\alpha_c} P(x, \hat{u})\, \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x) + \frac{\alpha_s}{2\alpha_c} (\nabla J_s(x))^{T} \dot{x}. \end{aligned} \tag{44}$$

We assume that $\lambda_{1m} > 0$ and $\lambda_{1M} > 0$ are the lower and upper bounds of the norm of the matrix $A$, respectively. Additionally, assume that $\|\nabla \sigma_c(x) f(x)\| \le \lambda_3$, $\|A \omega_c\| \le \lambda_4$, and $\|e_{cH}\| \le \lambda_5$, where $\lambda_3$, $\lambda_4$, and $\lambda_5$ are positive constants. Hence, inequality (44) becomes

$$\begin{aligned} \dot{L} \le {} & -\left\{ \left( \frac{1}{8} - \frac{3\phi_1^2}{8} - \frac{3\phi_2^2}{16} \right) \lambda_{1m}^2 \|\tilde{\omega}_c\|^4 - \left[ \frac{1}{2}\lambda_{1M}\lambda_5 + \left( 1 + \frac{3}{8\phi_1^2} \right) \lambda_3^2 + \left( \frac{3}{4} + \frac{3}{16\phi_2^2} \right) \lambda_4^2 \right] \|\tilde{\omega}_c\|^2 - \frac{3}{4}\lambda_5^2 \right\} \\ & - \frac{\alpha_s}{2\alpha_c} P(x, \hat{u})\, \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x) + \frac{\alpha_s}{2\alpha_c} (\nabla J_s(x))^{T} \dot{x}, \end{aligned} \tag{45}$$

where $\phi_1$ and $\phi_2$ are constants chosen for design purposes.


Case 1: $P(x, \hat{u}) = 0$. Since $(\nabla J_s(x))^{T} \dot{x} < 0$, there exists a positive constant $\lambda_6$ such that $0 < \lambda_6 \|\nabla J_s(x)\| \le -(\nabla J_s(x))^{T} \dot{x}$ holds. Then, inequality (45) becomes

$$\dot{L} \le -\lambda_7 \|\tilde{\omega}_c\|^4 + \lambda_8 \|\tilde{\omega}_c\|^2 + \frac{3}{4}\lambda_5^2 - \frac{\alpha_s}{2\alpha_c} \lambda_6 \|\nabla J_s(x)\|, \tag{46}$$

where

$$\lambda_7 = \left( \frac{1}{8} - \frac{3\phi_1^2}{8} - \frac{3\phi_2^2}{16} \right) \lambda_{1m}^2, \tag{47}$$

$$\lambda_8 = \frac{1}{2}\lambda_{1M}\lambda_5 + \left( 1 + \frac{3}{8\phi_1^2} \right) \lambda_3^2 + \left( \frac{3}{4} + \frac{3}{16\phi_2^2} \right) \lambda_4^2. \tag{48}$$

Therefore, whenever the inequality

$$\|\tilde{\omega}_c\| \ge \sqrt{ \frac{\lambda_8 + \sqrt{\lambda_8^2 + 3\lambda_5^2 \lambda_7}}{2\lambda_7} } \triangleq A_1 \tag{49}$$

or

$$\|\nabla J_s(x)\| \ge \frac{\alpha_c \big( 3\lambda_5^2 \lambda_7 + \lambda_8^2 \big)}{2\alpha_s \lambda_6 \lambda_7} \triangleq B_1 \tag{50}$$

holds, we have $\dot{L} < 0$.

Case 2: $P(x, \hat{u}) = 1$. According to (28) and (29), we have

$$u^{*} - \hat{u} = -\frac{1}{2} R^{-1} g^{T}(x) \big( (\nabla \sigma_c(x))^{T} \tilde{\omega}_c + \nabla \varepsilon_c(x) \big). \tag{51}$$

In addition, we assume that $\|g(x) R^{-1} g^{T}(x)\| \le \lambda_9$, $\|\nabla \sigma_c(x)\| \le \lambda_{10}$, and $\|\nabla \varepsilon_c(x)\| \le \lambda_{11}$, where $\lambda_9$, $\lambda_{10}$, and $\lambda_{11}$ are also positive constants. Then, considering (11) and (51), inequality (45) becomes

$$\begin{aligned} \dot{L} \le {} & -\left\{ \left( \frac{1}{8} - \frac{3\phi_1^2}{8} - \frac{3\phi_2^2}{16} \right) \lambda_{1m}^2 \|\tilde{\omega}_c\|^4 - \left[ \frac{1}{2}\lambda_{1M}\lambda_5 + \left( 1 + \frac{3}{8\phi_1^2} \right) \lambda_3^2 + \left( \frac{3}{4} + \frac{3}{16\phi_2^2} \right) \lambda_4^2 \right] \|\tilde{\omega}_c\|^2 - \frac{3}{4}\lambda_5^2 \right\} \\ & + \frac{\alpha_s}{2\alpha_c} (\nabla J_s(x))^{T} \big( f(x) + g(x) u^{*} \big) - \frac{\alpha_s}{4\alpha_c}\, \tilde{\omega}_c^{T} \nabla \sigma_c(x) g(x) R^{-1} g^{T}(x) \nabla J_s(x) + \frac{\alpha_s}{4\alpha_c} (\nabla J_s(x))^{T} g(x) R^{-1} g^{T}(x) \nabla \varepsilon_c(x) \\ \le {} & -\lambda_7 \|\tilde{\omega}_c\|^4 + \left( \lambda_8 + \frac{\alpha_s}{8\alpha_c} \right) \|\tilde{\omega}_c\|^2 + \frac{3}{4}\lambda_5^2 - \lambda_{12} \|\nabla J_s(x)\|^2 + \frac{\alpha_s}{4\alpha_c} \lambda_9 \lambda_{11} \|\nabla J_s(x)\|, \end{aligned} \tag{52}$$

where

$$\lambda_{12} = \frac{\alpha_s}{2\alpha_c} \left( \lambda_m - \frac{1}{4} \lambda_9^2 \lambda_{10}^2 \right). \tag{53}$$

Therefore, whenever the inequality

$$\|\tilde{\omega}_c\| \ge \sqrt{ \frac{8\alpha_c \lambda_8 + \alpha_s}{16\alpha_c \lambda_7} + \sqrt{ \frac{(8\alpha_c \lambda_8 + \alpha_s)^2}{256\alpha_c^2 \lambda_7^2} + \frac{3\lambda_5^2}{4\lambda_7} + \frac{\alpha_s^2 \lambda_9^2 \lambda_{11}^2}{64\alpha_c^2 \lambda_7 \lambda_{12}} } } \triangleq A_2 \tag{54}$$

or

$$\|\nabla J_s(x)\| \ge \frac{\alpha_s \lambda_9 \lambda_{11}}{8\alpha_c \lambda_{12}} + \sqrt{ \frac{(8\alpha_c \lambda_8 + \alpha_s)^2}{256\alpha_c^2 \lambda_7 \lambda_{12}} + \frac{3\lambda_5^2}{4\lambda_{12}} + \frac{\alpha_s^2 \lambda_9^2 \lambda_{11}^2}{64\alpha_c^2 \lambda_{12}^2} } \triangleq B_2 \tag{55}$$

holds, we obtain $\dot{L} < 0$. According to Cases 1 and 2, if the inequality $\|\tilde{\omega}_c\| > \max(A_1, A_2)$ or $\|\nabla J_s(x)\| > \max(B_1, B_2)$ holds, then $\dot{L} < 0$. Thus, by the standard Lyapunov extension theorem [11], the state $x$ and the error $\tilde{\omega}_c$ are UUB. □

5. Decentralized optimal control design of nonlinear interconnected systems

Large-scale systems are common in engineering when studying complex dynamical systems that can be partitioned into a set of interconnected subsystems. Decentralized control is an effective design approach and has attracted a great amount of interest due to its advantages of easier implementation and lower dimensionality


[17,10,26,27,39]. In this section, we generalize the aforementioned results to the decentralized optimal control of a class of continuous-time nonlinear interconnected large-scale systems. This part is also an extension of the decentralized control strategy developed in [17]. Consider nonlinear large-scale systems composed of $N$ interconnected subsystems described by

$$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t)) \big( \bar{u}_i(t) + \bar{d}_i(x_D(t)) \big), \quad i = 1, 2, \ldots, N, \tag{56}$$

where $x_i(t) \in \mathbb{R}^{n_i}$ and $\bar{u}_i(t) \in \mathbb{R}^{m_i}$ are the state vector and the control vector of the $i$th subsystem, respectively, and $x_D = [x_1^{T}, x_2^{T}, \ldots, x_N^{T}]^{T} \in \mathbb{R}^{n_D}$ is the overall state with $n_D = \sum_{i=1}^{N} n_i$. Note that for subsystem $i$, $f_i(x_i)$, $g_i(x_i)$, and $g_i(x_i) \bar{d}_i(x_D)$ represent the nonlinear internal dynamics, the input gain matrix, and the interconnected term, respectively. Here, $x_1, x_2, \ldots, x_N$ are called local states while $\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_N$ are local controls. Let $x_i(0) = x_{i0}$ be the initial state of the $i$th subsystem, and let $R_i \in \mathbb{R}^{m_i \times m_i}$, $i = 1, 2, \ldots, N$, be a set of symmetric positive definite matrices. Denote $\bar{d}_i(x_D) = R_i^{-1/2} d_i(x_D)$, $i = 1, 2, \ldots, N$, which are bounded as

$$\|d_i(x_D)\| \le \sum_{j=1}^{N} \rho_{ij} h_{ij}(x_j), \quad i = 1, 2, \ldots, N. \tag{57}$$

In (57), the $\rho_{ij}$ are nonnegative constants and the $h_{ij}(x_j)$ are positive semi-definite functions, $i, j = 1, 2, \ldots, N$. If we define $h_i(x_i) = \max\{h_{1i}(x_i), h_{2i}(x_i), \ldots, h_{Ni}(x_i)\}$, $i = 1, 2, \ldots, N$, then (57) implies

$$\|d_i(x_D)\| \le \sum_{j=1}^{N} s_{ij} h_j(x_j), \quad i = 1, 2, \ldots, N, \tag{58}$$

where $s_{ij} \ge \rho_{ij} h_{ij}(x_j)/h_j(x_j)$, $i, j = 1, 2, \ldots, N$, are also nonnegative constants. In the following, we focus on designing the decentralized optimal control law. First, we should find $N$ control policies $\bar{u}_1(x_1), \bar{u}_2(x_2), \ldots, \bar{u}_N(x_N)$ such that the constituted control vector $(\bar{u}_1(x_1), \bar{u}_2(x_2), \ldots, \bar{u}_N(x_N))$ can stabilize system (56). As shown in [17], the decentralized control can be obtained by solving the optimal control problem of the $N$ isolated subsystems given by

$$\dot{x}_i(t) = f_i(x_i(t)) + g_i(x_i(t)) u_i(t), \quad i = 1, 2, \ldots, N. \tag{59}$$

Let $h_i(x_i) \le d_{iM}(x_i)$, $i = 1, 2, \ldots, N$. According to [17], we can find a set of optimal control policies $u_i^{*}(x_i)$, $i = 1, 2, \ldots, N$, which minimize the local cost functions

$$J_i(x_{i0}) = \int_0^{\infty} \left\{ d_{iM}^2(x_i(s)) + u_i^{T}(x_i(s)) R_i u_i(x_i(s)) \right\} ds, \quad i = 1, 2, \ldots, N. \tag{60}$$

Using the notation of the optimal cost functions $J_i^{*}(x_i)$, $i = 1, 2, \ldots, N$, the HJB equations of the isolated subsystems are

$$0 = d_{iM}^2(x_i) + (\nabla J_i^{*}(x_i))^{T} f_i(x_i) - \frac{1}{4} (\nabla J_i^{*}(x_i))^{T} g_i(x_i) R_i^{-1} g_i^{T}(x_i) \nabla J_i^{*}(x_i), \tag{61}$$

with $J_i^{*}(0) = 0$. Then, there exist $N$ positive numbers $\zeta_i$ such that the feedback controls

$$\bar{u}_i(x_i) = \zeta_i u_i^{*}(x_i) = -\frac{\zeta_i}{2} R_i^{-1} g_i^{T}(x_i) \nabla J_i^{*}(x_i), \quad i = 1, 2, \ldots, N, \tag{62}$$

can form a control pair $(\bar{u}_1(x_1), \bar{u}_2(x_2), \ldots, \bar{u}_N(x_N))$, which is just the decentralized control strategy of system (56). Next, we study the optimality of the decentralized control scheme with a specified overall cost function. Denote

$$f_D(x_D) = \begin{bmatrix} f_1(x_1) \\ f_2(x_2) \\ \vdots \\ f_N(x_N) \end{bmatrix}, \quad \bar{u}_D(x_D) = \begin{bmatrix} \bar{u}_1(x_1) \\ \bar{u}_2(x_2) \\ \vdots \\ \bar{u}_N(x_N) \end{bmatrix}, \quad \bar{d}_D(x_D) = \begin{bmatrix} \bar{d}_1(x_D) \\ \bar{d}_2(x_D) \\ \vdots \\ \bar{d}_N(x_D) \end{bmatrix},$$

$$R_D = \mathrm{diag}\left\{ \frac{1}{\zeta_1} R_1, \frac{1}{\zeta_2} R_2, \ldots, \frac{1}{\zeta_N} R_N \right\}, \quad g_D(x_D) = \mathrm{diag}\{ g_1(x_1), g_2(x_2), \ldots, g_N(x_N) \}. \tag{63}$$
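The stacked objects in (63) are simply stacked vectors and block-diagonal matrices. A minimal sketch for $N = 2$ scalar-input subsystems (the numeric values are placeholders):

```python
import numpy as np

def stack_decentralized(R_list, zeta_list, g_list):
    """Assemble R_D = diag{(1/zeta_i) R_i} and g_D = diag{g_i}, Eq. (63)."""
    n = len(R_list)
    R_D = np.zeros((n, n))
    for i, (Ri, zi) in enumerate(zip(R_list, zeta_list)):
        R_D[i, i] = Ri / zi
    rows = sum(g.shape[0] for g in g_list)
    g_D = np.zeros((rows, n))
    r = 0
    for i, g in enumerate(g_list):
        g_D[r:r + g.shape[0], i] = g[:, 0]   # each g_i occupies its own block
        r += g.shape[0]
    return R_D, g_D

# Illustration: R1 = R2 = 1, zeta1 = zeta2 = 3, and two 2x1 input vectors.
R_D, g_D = stack_decentralized([1.0, 1.0], [3.0, 3.0],
                               [np.array([[0.0], [0.6]]), np.array([[0.0], [1.0]])])
```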

For system (56), we define the following cost function

$$J_D(x_{D0}) = \int_0^{\infty} \left\{ Q_D(x_D(s)) + \bar{u}_D^{T}(x_D(s)) R_D\, \bar{u}_D(x_D(s)) \right\} ds, \tag{64}$$

where

$$Q_D(x_D) = \sum_{i=1}^{N} \left\{ d_{iM}^2(x_i) - (\nabla J_i^{*}(x_i))^{T} g_i(x_i) \bar{d}_i(x_D) + \frac{1}{4}(\zeta_i - 1)(\nabla J_i^{*}(x_i))^{T} g_i(x_i) R_i^{-1} g_i^{T}(x_i) \nabla J_i^{*}(x_i) \right\}. \tag{65}$$


Then, we have the following theorem.

Theorem 4. Consider system (56) with cost function (64). There exists a set of positive numbers $\zeta_i$ such that the feedback control laws obtained by (62) constitute the decentralized optimal control of the interconnected large-scale system.

Proof. Similar to Lemma 2 and by considering (58), we have

$$Q_D(x_D) \ge \sum_{i=1}^{N} \left\{ d_{iM}^2(x_i) - \frac{1}{\zeta_i - 1}\, d_i^{T}(x_D) d_i(x_D) \right\} \ge \sum_{i=1}^{N} \left\{ d_{iM}^2(x_i) - \frac{1}{\zeta_i - 1} \left( \sum_{j=1}^{N} s_{ij} d_{jM}(x_j) \right)^{2} \right\}. \tag{66}$$

Then, we find that the positive definiteness of $Q_D(x_D)$ can be guaranteed if the $\zeta_i$, $i = 1, 2, \ldots, N$, are sufficiently large. Moreover, based on (61) and (65), we can prove that $J_D(x_D) = \sum_{i=1}^{N} J_i^{*}(x_i)$ satisfies the equation

$$Q_D(x_D) + \bar{u}_D^{T}(x_D) R_D\, \bar{u}_D(x_D) + (\nabla J_D(x_D))^{T} \big( f_D(x_D) + g_D(x_D)(\bar{u}_D(x_D) + \bar{d}_D(x_D)) \big) = 0. \tag{67}$$

Therefore, considering system (56) with cost function (64), there exists a set of positive numbers $\zeta_i$ such that the control pair $(\bar{u}_1(x_1), \bar{u}_2(x_2), \ldots, \bar{u}_N(x_N))$ obtained by (62) is the decentralized optimal control law. □

Therefore, consider system (56) with cost function (64), there exists a set of positive numbers such that the control pair  1 ðx1 Þ; u  2 ðx2 Þ; . . . ; u  N ðxN ÞÞ obtained by (62) is the decentralized optimal control law.  ðu Remark 5. It should be pointed out that, in this part, the ADP technique can also be employed to design the optimal controls of the isolated subsystems, where N critic networks will be constructed to facilitate the implementation procedure. Then, the decentralized control of the interconnected system can be obtained, which is, simultaneously, the optimal control with respect to an overall cost function. In this sense, we accomplish the decentralized optimal control design of the nonlinear interconnected large-scale system based on ADP approach.

6. Simulation studies

Two examples are provided in this section to demonstrate the effectiveness of the robust optimal control strategy.

Example 1. Consider the following continuous-time nonlinear system:

$$\dot{x} = \begin{bmatrix} -0.5x_1 + x_2(1 + 0.5x_2^2) \\ -0.8(x_1 + x_2) + 0.5x_2(1 - 0.3x_2^2) \end{bmatrix} + \begin{bmatrix} 0 \\ 0.6 \end{bmatrix} \big( \bar{u} + \bar{d}(x) \big), \tag{68}$$

where $x = [x_1, x_2]^{T} \in \mathbb{R}^2$ and $\bar{u} \in \mathbb{R}$ are the state and control variables, respectively. The term $\bar{d}(x) = d_1 x_2 \cos(d_2 x_1 + d_3 x_2)$ reflects the uncertainty of the controlled plant, where $d_1$, $d_2$, and $d_3$ are unknown parameters with $d_1 \in [-1, 1]$, $d_2 \in [-5, 5]$, and $d_3 \in [-3, 3]$. We set $R = I$ and choose $d_M(x) = \|x\|$ as the bound of $d(x)$. According to the results of this paper, in order to derive the optimal control of the nominal system

$$\dot{x} = \begin{bmatrix} -0.5x_1 + x_2(1 + 0.5x_2^2) \\ -0.8(x_1 + x_2) + 0.5x_2(1 - 0.3x_2^2) \end{bmatrix} + \begin{bmatrix} 0 \\ 0.6 \end{bmatrix} u \tag{69}$$

with cost function

$$J(x_0) = \int_0^{\infty} \left\{ \|x(s)\|^2 + u^{T}(x(s)) R\, u(x(s)) \right\} ds, \tag{70}$$

we have to construct a neural network based on the idea of ADP. In this example, the critic network is built as

$$\hat{J}(x) = \hat{\omega}_{c1} x_1^2 + \hat{\omega}_{c2} x_1 x_2 + \hat{\omega}_{c3} x_2^2 + \hat{\omega}_{c4} x_1^4 + \hat{\omega}_{c5} x_1^3 x_2 + \hat{\omega}_{c6} x_1^2 x_2^2 + \hat{\omega}_{c7} x_1 x_2^3 + \hat{\omega}_{c8} x_2^4. \tag{71}$$

During the simulation process, a probing noise is introduced to satisfy the persistency of excitation condition. Let the learning rates of the critic network and the additional term be $\alpha_c = 0.8$ and $\alpha_s = 0.5$, respectively. In addition, let the initial weight of the critic network be the zero vector and the initial state of the controlled plant be $x_0 = [0.5, -0.5]^{T}$. In the simulation, the weights converge after 2500 s, at which point the probing signal is turned off. The weights of the critic network converge to $[0.8709, 0.1291, 1.0617, 0.0868, 0.1566, 0.2053, 0.0059, 0.0651]^{T}$, as displayed in Fig. 1. Next, the scalar parameters are chosen as $\zeta = 3$, $d_1 = 0.8$, $d_2 = 5$, and $d_3 = 3$ so as to evaluate the robust control performance. Under the action of the robust control strategy, the state trajectory of system (68) during the first 20 s is shown in Fig. 2. In light of Theorem 2, the control also achieves optimality with the cost function defined as in (17). These results confirm the validity of the robust optimal control scheme developed in this paper.

Fig. 1. Convergence of the weight vector of the critic network ($\omega_{aci}$ represents $\hat{\omega}_{ci}$, $i = 1, 2, \ldots, 8$).

Fig. 2. The state trajectory.

Example 2. Consider the continuous-time nonlinear large-scale system [17]

\[
\begin{aligned}
\dot{x}_1 &= \begin{bmatrix} -x_{11} + x_{12} \\ -0.5x_{11} - 0.5x_{12} - 0.5x_{12}(\cos(2x_{11}) + 2)^2 \end{bmatrix} + \begin{bmatrix} 0 \\ \cos(2x_{11}) + 2 \end{bmatrix} \big( \bar{u}_1(x_1) + \bar{d}_1(x_\Delta) \big), \\
\dot{x}_2 &= \begin{bmatrix} x_{22} \\ -x_{21} - 0.5x_{22} + 0.5x_{21}^2 x_{22} \end{bmatrix} + \begin{bmatrix} 0 \\ x_{21} \end{bmatrix} \big( \bar{u}_2(x_2) + \bar{d}_2(x_\Delta) \big),
\end{aligned} \tag{72}
\]

where $x_1 = [x_{11}, x_{12}]^T \in \mathbb{R}^2$ and $\bar{u}_1(x_1) \in \mathbb{R}$ are the state and control variables of subsystem 1, $x_2 = [x_{21}, x_{22}]^T \in \mathbb{R}^2$ and $\bar{u}_2(x_2) \in \mathbb{R}$ are the state and control variables of subsystem 2, and $x_\Delta = [x_1^T, x_2^T]^T$ is the overall state. The interconnected terms are $\bar{d}_1(x_\Delta) = (x_{11} + x_{22}) \sin^2 x_{12} \cos(0.5x_{21})$ and $\bar{d}_2(x_\Delta) = 0.5(x_{12} + x_{22}) \cos^2(e^{x_{21}})$. Let $R_1 = R_2 = I$, $h_1(x_1) = \|x_1\|$, and $h_2(x_2) = |x_{22}|$. In order to design the decentralized optimal controller of interconnected system (72), we first aim at solving the optimal control problem of the two isolated subsystems

\[
\begin{aligned}
\dot{x}_1 &= \begin{bmatrix} -x_{11} + x_{12} \\ -0.5x_{11} - 0.5x_{12} - 0.5x_{12}(\cos(2x_{11}) + 2)^2 \end{bmatrix} + \begin{bmatrix} 0 \\ \cos(2x_{11}) + 2 \end{bmatrix} u_1(x_1), \\
\dot{x}_2 &= \begin{bmatrix} x_{22} \\ -x_{21} - 0.5x_{22} + 0.5x_{21}^2 x_{22} \end{bmatrix} + \begin{bmatrix} 0 \\ x_{21} \end{bmatrix} u_2(x_2).
\end{aligned} \tag{73}
\]
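The isolated subsystems can be simulated directly. The sketch below integrates subsystem 1, taken as $\dot{x}_1 = [-x_{11}+x_{12},\; -0.5x_{11}-0.5x_{12}-0.5x_{12}(\cos(2x_{11})+2)^2]^T + [0,\; \cos(2x_{11})+2]^T u_1$ (minus signs restored from the extraction, so the exact signs are an assumption), with forward Euler and accumulates the running cost of (74) with $R_1 = I$. The function names and the integration scheme are our own choices, not the paper's.

```python
import numpy as np

def f1(x):
    """Drift of isolated subsystem 1.  Minus signs were lost in the
    extraction and are restored here, so the exact signs are an assumption."""
    x11, x12 = x
    return np.array([-x11 + x12,
                     -0.5*x11 - 0.5*x12 - 0.5*x12*(np.cos(2.0*x11) + 2.0)**2])

def g1(x):
    """Input gain of isolated subsystem 1."""
    return np.array([0.0, np.cos(2.0*x[0]) + 2.0])

def simulate(policy, x0, T=10.0, dt=1e-3):
    """Forward-Euler rollout of subsystem 1 under `policy`, accumulating the
    running cost of (74) with R1 = I by left-endpoint quadrature."""
    x, J = np.array(x0, dtype=float), 0.0
    for _ in range(int(T / dt)):
        u = policy(x)
        J += (x @ x + u * u) * dt
        x = x + dt * (f1(x) + g1(x) * u)
    return x, J
```

With these signs the isolated subsystem is open-loop stable, so even the zero policy yields a finite cost; for example, `simulate(lambda x: 0.0, [1.0, 1.0])` returns a terminal state near the origin together with the accumulated cost.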

Fig. 3. Convergence of the weight vector of critic network 1 ($\omega_{ac11}$, $\omega_{ac12}$, and $\omega_{ac13}$ represent $\hat{\omega}_{c11}$, $\hat{\omega}_{c12}$, and $\hat{\omega}_{c13}$, respectively).

Fig. 4. Convergence of the weight vector of critic network 2 ($\omega_{ac21}$, $\omega_{ac22}$, and $\omega_{ac23}$ represent $\hat{\omega}_{c21}$, $\hat{\omega}_{c22}$, and $\hat{\omega}_{c23}$, respectively).

Fig. 5. The state trajectory of subsystem 1.

Fig. 6. The state trajectory of subsystem 2.

Here, we choose $d_{1M}(x_1) = \|x_1\|$ and $d_{2M}(x_2) = |x_{22}|$. Hence, the cost functions of the optimal control problem are

\[
J_1(x_{10}) = \int_0^\infty \left\{ \|x_1(\tau)\|^2 + u_1^T(x_1(\tau)) R_1 u_1(x_1(\tau)) \right\} \mathrm{d}\tau \tag{74}
\]

and

\[
J_2(x_{20}) = \int_0^\infty \left\{ |x_{22}(\tau)|^2 + u_2^T(x_2(\tau)) R_2 u_2(x_2(\tau)) \right\} \mathrm{d}\tau, \tag{75}
\]

respectively.

Here, two critic networks are constructed with activation functions chosen as $[x_{11}^2, x_{11}x_{12}, x_{12}^2]^T$ and $[x_{21}^2, x_{21}x_{22}, x_{22}^2]^T$, respectively. Besides, let the learning rates be the same as in Example 1, the initial weights of the two critic networks be $[0, 0, 0]^T$, and the initial states of the two isolated subsystems be $x_{10} = x_{20} = [1, 1]^T$. During the simulation, we find that after 180 s the weights of the critic networks converge to $[0.5000, 0.0001, 1.0000]^T$ and $[0.9949, 0.0034, 0.9959]^T$ (see Figs. 3 and 4). Next, we apply the decentralized control scheme to controlled plant (72) for 25 s and obtain the evolution of the state trajectories illustrated in Figs. 5 and 6. These simulation results verify the validity of the decentralized optimal control scheme developed in this paper.

7. Conclusion

A novel robust optimal control scheme for a class of uncertain nonlinear systems via the ADP approach is developed in this paper. It is proved that the robust controller of the original uncertain system achieves optimality under a specified cost function. During the implementation process, a critic network is constructed to solve the HJB equation of the nominal system, and an additional stabilizing term is introduced to verify the stability. The obtained results are also extended to the design of decentralized optimal control for a class of nonlinear interconnected large-scale systems. Simulation studies verify the good control performance. In the future, we will focus on robust and decentralized optimal control for nonlinear systems with unknown dynamics. In this way, the requirement on system dynamics will be further reduced, reflecting the advantage of the ADP technique in dealing with optimal control problems in nonlinear, uncertain, and complex environments.

References

[1] M. Abu-Khalaf, F.L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica 41 (2005) 779–791.
[2] A. Al-Tamimi, F.L. Lewis, M. Abu-Khalaf, Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst. Man Cybernet. – Part B: Cybernet. 38 (2008) 943–949.
[3] D.P. Bertsekas, M.L. Homer, D.A. Logan, S.D. Patek, N.R. Sandell, Missile defense and interceptor allocation by neuro-dynamic programming, IEEE Trans. Syst. Man Cybernet. – Part A: Syst. Humans 30 (2000) 42–51.
[4] S. Bhasin, R. Kamalapurkar, M. Johnson, K.G. Vamvoudakis, F.L. Lewis, W.E. Dixon, A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica 49 (2013) 82–92.
[5] T. Dierks, S. Jagannathan, Optimal control of affine nonlinear continuous-time systems, in: Proceedings of the American Control Conference, Baltimore, MD, USA, 2010, pp. 1568–1573.
[6] T. Dierks, S. Jagannathan, Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update, IEEE Trans. Neural Netw. Learn. Syst. 23 (2012) 1118–1129.
[7] J. Fu, H. He, X. Zhou, Adaptive learning and control for MIMO system based on adaptive dynamic programming, IEEE Trans. Neural Netw. 22 (2011) 1133–1148.
[8] A. Heydari, S.N. Balakrishnan, Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 145–157.
[9] Y. Jiang, Z.P. Jiang, Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, Automatica 48 (2012) 2699–2704.
[10] Z.P. Jiang, Y. Jiang, Robust adaptive dynamic programming for linear and nonlinear systems: an overview, Eur. J. Control 19 (2013) 417–425.
[11] F.L. Lewis, S. Jagannathan, A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems, Taylor & Francis, London, 1999.
[12] F.L. Lewis, D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Wiley, Hoboken, NJ, 2013.
[13] F.L. Lewis, D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circ. Syst. Mag. 9 (2009) 32–50.
[14] J. Liang, G.K. Venayagamoorthy, R.G. Harley, Wide-area measurement based dynamic stochastic optimal power flow control for smart grids with high variability and uncertainty, IEEE Trans. Smart Grid 3 (2012) 59–69.
[15] F. Lin, R.D. Brand, J. Sun, Robust control of nonlinear systems: compensating for uncertainty, Int. J. Control 56 (1992) 1453–1459.
[16] D. Liu, H. Li, D. Wang, Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing 110 (2013) 92–100.
[17] D. Liu, D. Wang, H. Li, Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach, IEEE Trans. Neural Netw. Learn. Syst. 25 (2014) 418–428.
[18] D. Liu, D. Wang, X. Yang, An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs, Inform. Sci. 220 (2013) 331–342.
[19] D. Liu, D. Wang, D. Zhao, Q. Wei, N. Jin, Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming, IEEE Trans. Autom. Sci. Eng. 9 (2012) 628–634.
[20] B. Luo, H.N. Wu, Approximate optimal control design for nonlinear one-dimensional parabolic PDE systems using empirical eigenfunctions and neural network, IEEE Trans. Syst. Man Cybernet. – Part B: Cybernet. 42 (2012) 1538–1549.
[21] H. Modares, F.L. Lewis, M.B. Naghibi-Sistani, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica 50 (2014) 193–202.
[22] H. Modares, M.B. Naghibi-Sistani, F.L. Lewis, A policy iteration approach to online optimal control of continuous-time constrained-input systems, ISA Trans. 52 (2013) 611–621.
[23] Z. Ni, H. He, J. Wen, Adaptive learning in tracking control based on the dual critic network design, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 913–928.
[24] D. Nodland, H. Zargarzadeh, S. Jagannathan, Neural network-based optimal adaptive output feedback control of a helicopter UAV, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 1061–1073.
[25] D.V. Prokhorov, D.C. Wunsch, Adaptive critic designs, IEEE Trans. Neural Netw. 8 (1997) 997–1007.
[26] A. Saberi, On optimality of decentralized control for a class of nonlinear interconnected systems, Automatica 24 (1988) 101–104.
[27] D.D. Siljak, Decentralized Control of Complex Systems, Academic Press, Boston, MA, 2012.
[28] K.G. Vamvoudakis, F.L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46 (2010) 878–888.
[29] D. Wang, D. Liu, Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique, Neurocomputing 121 (2013) 218–225.
[30] D. Wang, D. Liu, H. Li, Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems, IEEE Trans. Autom. Sci. Eng. 11 (2014) 627–632.
[31] D. Wang, D. Liu, H. Li, X. Yang, A learning optimal control scheme for robust stabilization of a class of uncertain nonlinear systems, in: Proceedings of the Chinese Control Conference, Xi'an, China, 2013, pp. 7834–7839.
[32] D. Wang, D. Liu, Q. Wei, Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach, Neurocomputing 78 (2012) 14–22.
[33] D. Wang, D. Liu, Q. Wei, D. Zhao, N. Jin, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica 48 (2012) 1825–1832.
[34] F.Y. Wang, H. Zhang, D. Liu, Adaptive dynamic programming: an introduction, IEEE Comput. Intell. Mag. 4 (2009) 39–47.
[35] P.J. Werbos, Approximate dynamic programming for real-time control and neural modeling, in: D.A. White, D.A. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
[36] H.N. Wu, B. Luo, Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control, IEEE Trans. Neural Netw. Learn. Syst. 23 (2012) 1884–1895.
[37] H.N. Wu, B. Luo, Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control, Inform. Sci. 222 (2013) 472–485.
[38] B. Xu, C. Yang, Z. Shi, Reinforcement learning output feedback NN control using deterministic learning technique, IEEE Trans. Neural Netw. Learn. Syst. 25 (2014) 635–641.
[39] D. Xu, V. Ugrinovskii, Decentralized measurement feedback stabilization of large-scale systems via control vector Lyapunov functions, Syst. Control Lett. 62 (2013) 1187–1195.
[40] X. Xu, Z. Hou, C. Lian, H. He, Online learning control using adaptive critic designs with sparse kernel machines, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 762–775.
[41] X. Xu, L. Zuo, Z. Huang, Reinforcement learning algorithms with function approximation: recent advances and applications, Inform. Sci. 261 (2014) 1–31.
[42] X. Yang, D. Liu, D. Wang, Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints, Int. J. Control 87 (2014) 553–566.
[43] H.M. Yen, T.H.S. Li, Y.C. Chang, Design of a robust neural network-based tracking controller for a class of electrically driven nonholonomic mechanical systems, Inform. Sci. 222 (2013) 559–575.
[44] H. Zhang, L. Cui, Y. Luo, Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP, IEEE Trans. Cybernet. 43 (2013) 206–216.
[45] H. Zhang, D. Liu, Y. Luo, D. Wang, Adaptive Dynamic Programming for Control: Algorithms and Stability, Springer, London, UK, 2013.
[46] H. Zhang, Y. Luo, D. Liu, Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints, IEEE Trans. Neural Netw. 20 (2009) 1490–1503.