Adaptive Dynamic Programming Based Robust Control of Nonlinear Systems with Unmatched Uncertainties





Journal Pre-proof

Communicated by Dr. Chenguang Yang

Adaptive Dynamic Programming Based Robust Control of Nonlinear Systems with Unmatched Uncertainties
Jun Zhao, Jing Na, Guanbin Gao

PII: S0925-2312(20)30200-9
DOI: https://doi.org/10.1016/j.neucom.2020.02.025
Reference: NEUCOM 21893

To appear in: Neurocomputing

Received date: 9 October 2019
Revised date: 27 January 2020
Accepted date: 1 February 2020

Please cite this article as: Jun Zhao , Jing Na, Guanbin Gao, Adaptive Dynamic Programming Based Robust Control of Nonlinear Systems with Unmatched Uncertainties, Neurocomputing (2020), doi: https://doi.org/10.1016/j.neucom.2020.02.025

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier B.V.

Adaptive Dynamic Programming Based Robust Control of Nonlinear Systems with Unmatched Uncertainties

Jun Zhao, Jing Na∗ and Guanbin Gao
Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China
∗ Corresponding author

Abstract

This paper proposes a new approach to robust control design for nonlinear continuous-time systems with unmatched uncertainties. First, we transform the robust control problem into an equivalent optimal control problem, which simplifies the control design. A critic neural network (NN) is then adopted to reformulate the derived Hamilton-Jacobi-Bellman (HJB) equation based on the optimal control methodology. Then, a novel adaptation algorithm is used to directly estimate the unknown NN weights online, so as to achieve guaranteed convergence. The control system stability and the convergence of the derived control action to the optimal solution are rigorously proved. Finally, two simulation examples are provided to illustrate the validity and efficacy of the proposed method.

Keywords: Adaptive dynamic programming, robust control, neural network, adaptive control, uncertain systems

∗ Corresponding author: Jing Na. Email address: [email protected]. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61922037 and 61873115.

1. Introduction

In practical industrial engineering, control systems are usually affected by unavoidable uncertainties stemming from external disturbances, modeling errors

Preprint submitted to Neurocomputing

February 6, 2020

and nonsmooth dynamics [1, 2, 3, 4]. The presence of such uncertainties may

degrade the control performance of model-based control schemes. To accommodate the uncertainties, robust control [5] has been investigated, which takes the effects of uncertainties into consideration in the control design so that the controlled system has guaranteed stability and satisfactory responses even in the presence of bounded uncertainties. Owing to this attractive property, various robust control techniques have been proposed during the past decades, e.g., [6, 7, 5, 8]. However, apart from the well-known linear matrix inequality (LMI) approach [9], the derived robust control problem is difficult to solve directly. Hence, many efforts have also been made to seek alternative solutions for robust control design. In [7, 8], it was found that the robust


control problem can be transformed into an optimal control problem and then solved indirectly via optimal control methodologies. This idea provides a new pathway to address the robust control of uncertain systems by solving the equivalent optimal control problem of a nominal system subject to a constructive optimal cost function [10]. However, the online solution of the derived optimal control problem has not been addressed in [7, 8, 10].

Considering the optimal control design for linear systems, many techniques have been proposed to solve the associated algebraic Riccati equation (ARE). For nonlinear systems, however, solving the Hamilton-Jacobi-Bellman (HJB) equation is even more difficult [11]. In classical optimal control schemes, the derived ARE or HJB


equations are usually solved in an offline manner [12]. To solve the optimal control problem online, reinforcement learning (RL) was further explored, leading to the so-called adaptive dynamic programming (ADP) method [13]. In recent years, ADP has been proven to be a feasible technique to learn the optimal control solution [14, 15, 16, 17, 18]. The ADP methods were


initially developed for discrete-time systems [19] due to their iterative feature. For continuous-time systems, several offline [20] or online [21] ADP methods have subsequently been proposed based on the classical actor-critic framework. Recently, a novel identifier-critic based ADP structure [22] was developed to address the optimal tracking control [23], nonzero-sum games [24] and the output energy maximization problem [25], where the requirements on accurate system dynamics and the constraints can be effectively addressed. In parallel, Jiang [26, 27] proposed an iterative algorithm for solving the decentralized optimal control of multi-machine power systems. In this line, the ADP method was recently adopted to address the robust control problem with matched uncertainties [28, 29] by extending the idea of [8]. Although ADP based robust control for nonlinear systems with unmatched uncertainties was reported in recent work [30, 31], an actor NN was used there, leading to a complex ADP framework with increased computational costs. In fact, in most of the existing ADP literature, the online learning methods for the NN weights were derived from the classical gradient algorithm, so it is generally difficult to retain the convergence of the estimated weights to their true values.

Based on the above observations, the aim of this paper is to propose an alternative method to solve the robust control of nonlinear systems with unmatched uncertainties. We first transform the robust control problem into an


optimal control problem with a constructive cost function and an augmented auxiliary system, and then tailor the ADP technique to propose a single-critic-NN based ADP method to solve the derived optimal control problem online. In this respect, the actor NN that has been widely used in existing ADP structures is avoided. Hence, a modified HJB equation can be derived to solve the optimal control problem. Finally, we suggest a novel adaptation algorithm to directly estimate the unknown critic NN weights online, which are used to estimate the solution of the HJB equation. With this new learning algorithm, the convergence of the derived control action to the optimal solution can be guaranteed. More specifically, the computational efficiency of the robust control design can be improved by using the proposed single-critic-NN based ADP method even in the presence of unmatched uncertainties. The major contributions of this paper include:

1. The robust control design for nonlinear systems with unmatched uncertainties is addressed by investigating its relationship to the optimal control of a constructed auxiliary system. This method simplifies the control design.

2. An alternative ADP method with a single critic NN is proposed to solve the derived modified HJB equation online, where the actor NN used in existing ADP schemes [30, 31] is avoided.

3. A newly developed adaptation algorithm driven by the NN weights error is used to learn the critic NN weights online. Different from [32, 23], the convergence of the estimated NN weights to the true values can be guaranteed.

The rest of this paper is organized as follows. In Section 2, we summarize the robust control problem and introduce the equivalence between the original robust control and the optimal control of an auxiliary system. Section 3 provides the proposed control design and the closed-loop system stability analysis. Two simulation examples are given to verify the validity of the proposed method in Section 4. Finally, some conclusions are stated in Section 5.


Notations. In this paper, R denotes the set of real numbers, Rn is the Euclidean space of n-dimensional real vectors, and Rn×m is the space of n × m real matrices. ‖ · ‖ denotes the norm of a vector in Rn or of a matrix in Rn×m. I denotes the identity matrix and 0 the zero matrix. λmax(·) and λmin(·) stand for the maximum and minimum eigenvalues of a matrix, respectively. diag{a1, a2, ..., an} denotes the diagonal matrix composed of a1, a2, ..., an. Ω stands for a compact subset of Rn, and (·)x = ∂(·)/∂x denotes the partial derivative operator.

2. Problem Statement and Preliminaries

Consider the following continuous-time nonlinear system

ẋ(t) = f(x(t)) + g(x(t))u(t) + k(x(t))d(x(t))    (1)

where x ∈ Rn and u ∈ Rm are the state vector and control input, respectively. f(x) ∈ Rn with f(0) = 0 is a nonlinear continuous function, and g(x) ∈ Rn×m is a smooth control function. k(x) ∈ Rn×p with g(x) ≠ k(x) is a known smooth function, and d(x) ∈ Rp denotes the unknown uncertainty, bounded by a known nonnegative function dM(x) such that ‖d(x)‖ ≤ dM(x) with dM(0) = 0 and d(0) = 0. It is noted that the system dynamics f(x), g(x) are known in this

paper, while the uncertainty d(x), which represents modeling errors or disturbances, is in the unmatched form due to g(x) ≠ k(x). This makes the control design more challenging than the matched case.

To address the robust control of system (1) with unmatched uncertainties, the uncertain term k(x)d(x) can be separated into two additive parts, the matched and unmatched elements, as

k(x)d(x) = g(x)g+(x)k(x)d(x) + (I − g(x)g+(x))k(x)d(x)    (2)

where g+(x) ∈ Rm×n is the Moore-Penrose pseudo-inverse of the matrix function g(x). As shown in (2), the lumped uncertainty k(x)d(x) can be represented as the matched part g(x)g+(x)k(x)d(x) and the residual unmatched part (I − g(x)g+(x))k(x)d(x).
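The decomposition (2) can be checked numerically. The following is a minimal NumPy sketch (the function name and the sample point are ours, with the g(x), k(x), d(x) of Example 1 in Section 4 used for illustration):

```python
import numpy as np

def decompose_uncertainty(g, k, d):
    """Split the lumped uncertainty k(x)d(x) into its matched and
    unmatched components via the Moore-Penrose pseudo-inverse of g(x).
    g: (n, m) control matrix, k: (n, p) uncertainty matrix, d: (p,) vector."""
    g_pinv = np.linalg.pinv(g)                  # g+(x), shape (m, n)
    kd = k @ d                                  # lumped uncertainty k(x)d(x)
    matched = g @ g_pinv @ kd                   # g g+ k d, lies in range(g)
    unmatched = (np.eye(g.shape[0]) - g @ g_pinv) @ kd
    return matched, unmatched

# Sample point x = [0.5, 0.2] with omega = 1 (values chosen for illustration):
x1, x2 = 0.5, 0.2
g = np.array([[0.0], [np.cos(2 * x1) + 2]])
k = np.array([[x2], [0.0]])
d = np.array([x1 * np.sin(2 * x2)])
m, um = decompose_uncertainty(g, k, d)
# The two parts always sum back to the lumped term k(x)d(x).
assert np.allclose(m + um, k @ d)
```

Note that for this particular g(x) and k(x) the matched part vanishes, consistent with g+(x)k(x)d(x) = 0 reported in Section 4.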

Inspired by the work of [8, 28], the robust control problem of system (1) can be transformed into an equivalent optimal control problem, which paves a new way to address the robust control design by tailoring the methodologies developed for optimal control. To show this idea, an auxiliary system is constructed as follows:

ẋ = f(x) + g(x)u + (I − g(x)g+(x))k(x)ν    (3)

where ν ∈ Rp is the auxiliary control. It is noted that the first two terms on the right side of (3) define the nominal system of (3) (i.e., without uncertainties), and the final term denotes the effects of the unmatched residual dynamics (the final term in (2)), which will be addressed by the auxiliary control ν as in [8]. This representation will be used to reformulate the robust control problem as an optimal control problem.

Then, we define the augmented vector µ = [uT, νT]T ∈ Rm+p and the matrix M(x) = [g(x), (I − g(x)g+(x))k(x)] ∈ Rn×(m+p), so that system (3) can be written as

ẋ = f(x) + M(x)µ    (4)

For system (4), we can find a feedback control law µ∗ = [u∗T, ν∗T]T to minimize the following cost function

V(x(t)) = ∫t∞ z(x(τ), µ(τ)) dτ    (5)

where the utility function is defined as

z(x(τ), µ(τ)) = dM²(x(τ)) + ψM²(x(τ)) + U(x(τ), µ(τ))    (6)

with the bounded term ψM(x) given by

‖g+(x)k(x)d(x)‖ ≤ ψM(x)    (7)

and the basic utility term in (6) selected as U(x, µ) = xTx + µTµ ≥ 0 with U(0, 0) = 0 and µTµ = uTu + νTν.
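The utility (6) is a plain sum of the quadratic term U and the two uncertainty-bound terms. A small sketch (function name ours; the bounds dM(x) = ‖x‖ and ψM(x) = 0 are the ones later used for Example 1 in Section 4):

```python
import numpy as np

def utility(x, mu, d_M, psi_M):
    """Utility z(x, mu) from (6): U = x'x + mu'mu plus the squared
    uncertainty bounds d_M(x)^2 and psi_M(x)^2."""
    return d_M(x) ** 2 + psi_M(x) ** 2 + x @ x + mu @ mu

z = utility(np.array([0.5, 0.0]), np.array([0.0, 0.0]),
            lambda x: np.linalg.norm(x),     # d_M(x) = ||x|| (Example 1)
            lambda x: 0.0)                   # psi_M(x) = 0   (Example 1)
# ||x||^2 + x'x = 0.25 + 0.25
assert np.isclose(z, 0.5)
```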


Remark 1. In the cost function (5), two extra terms dM² and ψM² are included to address the effects of the divided uncertainties g(x)g+(x)k(x)d(x) and (I − g(x)g+(x))k(x)d(x), respectively. This modified cost function helps to establish the equivalence between the original robust control of system (1) and the optimal control of the auxiliary system (4) with the cost function (5), which will be shown in the following Lemma 1. Moreover, since the conditions d(0) = 0 and dM(0) = 0 hold, we can verify that ψM(0) = 0, which implies that the cost function (5) is finite and feasible.

To formulate the optimal control problem, we define the optimal cost function as

V∗(x(t)) = min_{µ∈Rm+p} V(x(t))    (8)

Based on the optimality principle, we can derive the associated HJB equation as

min_{µ∈Rm+p} {dM² + ψM² + U(x, µ∗(x)) + (Vx∗(x))T(f(x) + M(x)µ∗(x))} = 0    (9)

where Vx∗ is the partial derivative of the optimal cost function V∗ with respect to x. Hence, the optimal control action can be obtained by

µ∗(x) = −(1/2)MT(x)Vx∗(x)    (10)

so that the control actions u∗ and ν∗ are given as

u∗(x) = −(1/2)gT(x)Vx∗(x)
ν∗(x) = −(1/2)kT(x)(I − g(x)g+(x))TVx∗(x)    (11)
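Once the gradient Vx∗ is available, (10)-(11) are direct matrix products, and the stacked action (10) must agree with the pair (11). A minimal NumPy sketch (names and sample values ours):

```python
import numpy as np

def optimal_actions(g, k, V_x):
    """Evaluate (10)-(11): mu* = -0.5 M'V_x with M = [g, (I - g g+) k]."""
    n = g.shape[0]
    g_pinv = np.linalg.pinv(g)
    proj = np.eye(n) - g @ g_pinv            # projector onto the unmatched part
    u_star = -0.5 * g.T @ V_x                # actual control u*
    nu_star = -0.5 * k.T @ proj.T @ V_x      # auxiliary control nu*
    M = np.hstack([g, proj @ k])
    mu_star = -0.5 * M.T @ V_x               # stacked action [u*; nu*]
    return u_star, nu_star, mu_star

# Consistency check at an arbitrary point: mu* stacks u* and nu*.
g = np.array([[0.0], [3.0]])
k = np.array([[1.0], [0.0]])
V_x = np.array([2.0, 4.0])
u, nu, mu = optimal_actions(g, k, V_x)
assert np.allclose(mu, np.concatenate([u, nu]))
```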

The following Lemma shows that the robust control of system (1) can be represented as the optimal control of the auxiliary system (4) with cost function (5).

Lemma 1. Assume that the solution of the optimal control problem for the nominal system (4) with cost function (5) exists. Then this solution makes the original system (1) with unmatched uncertainties k(x)d(x) asymptotically stable, i.e., the optimal control action µ∗(x) of the nominal system (4) is the solution of the robust control problem for the uncertain system (1).

Proof. Consider V∗(x) given in (8); we know V∗(x) > 0 for all x ≠ 0 and V∗(x) = 0 for x = 0. Thus, V∗(x) can be considered as a Lyapunov function.

Then, using the optimal control solution µ∗ = [u∗T, ν∗T]T of system (3), the time derivative of the optimal cost function V∗(x) along system (1) can be obtained as

V̇∗(x) = Vx∗T ẋ = Vx∗T (f(x) + g(x)u∗(x) + k(x)d(x))
      = Vx∗T [f(x) + g(x)u∗(x) + g(x)g+(x)k(x)d(x) + (I − g(x)g+(x))k(x)d(x)]
      = Vx∗T [f(x) + g(x)u∗(x) + (I − g(x)g+(x))k(x)ν∗(x) + g(x)g+(x)k(x)d(x) + (I − g(x)g+(x))k(x)(d(x) − ν∗(x))]    (12)



Since M(x)µ∗(x) = g(x)u∗(x) + (I − g(x)g+(x))k(x)ν∗(x) holds, according to (11) and (9) we can write (12) as

V̇∗(x) = −dM²(x) − ψM²(x) − xTx − ν∗T(x)ν∗(x) − u∗T(x)u∗(x) − 2u∗T(x)g+(x)k(x)d(x) − 2ν∗T(x)(d(x) − ν∗(x))    (13)

From equation (13), we can obtain

−u∗T(x)u∗(x) − 2u∗T(x)g+(x)k(x)d(x) = −‖u∗(x) + g+(x)k(x)d(x)‖² + ‖g+(x)k(x)d(x)‖²    (14)

According to Young's inequality, we have

−2ν∗T(x)d(x) ≤ ‖ν∗(x)‖² + ‖d(x)‖²    (15)

Then, substituting (14) and (15) into (13), we can obtain

V̇∗(x) ≤ −dM²(x) − ψM²(x) − xTx − ν∗T(x)ν∗(x) − ‖u∗(x) + g+(x)k(x)d(x)‖² + ‖g+(x)k(x)d(x)‖² + ‖ν∗(x)‖² + ‖d(x)‖² + 2ν∗T(x)ν∗(x)
      ≤ −(ψM²(x) − ‖g+(x)k(x)d(x)‖²) − (dM²(x) − ‖d(x)‖²) + 2ν∗T(x)ν∗(x) − xTx    (16)

Considering the facts ‖d(x)‖ ≤ dM(x) and ‖g+(x)k(x)d(x)‖ ≤ ψM(x), we know

V̇∗(x) ≤ −(xTx − 2ν∗T(x)ν∗(x))    (17)

Then, we can verify that V̇∗(x) ≤ 0 for 2ν∗Tν∗ ≤ xTx. Moreover, following a similar analysis to that given in [10], one knows from (17) that system (1) is stable for all d(x) ∈ Rp, and V∗ → 0 and x → 0 also hold. In this respect, the optimal action µ∗ is the robust control solution of system (1). This completes the proof.

Remark 2. The key merit of Lemma 1 lies in that the original robust control problem can be transformed into an equivalent optimal control problem with a constructive cost function. Thus, the robust control problem can be solved in terms of optimal control methods. This reformulation allows us to develop a


new robust control design method by tailoring the ADP method, as presented in the next section.

Remark 3. In the above developments, the unmatched uncertainties k(x)d(x) are considered. In the ideal case where the uncertainties are in the matched form (i.e., k(x) = g(x)), we can set ψM = 0 in the cost function, and thus the auxiliary control ν can be removed. Consequently, the problem reduces to the optimal control of the nominal system ẋ = f(x) + g(x)u. This can be taken as a special case of the current study.


3. Robust Control Design via ADP Based Optimal Control

To obtain the optimal control action given in (11), the HJB equation (9) should be solved such that the optimal cost function V∗(x) can be calculated. However, it is not easy to solve the partial differential equation (9) directly. To overcome this issue, a critic NN is proposed to estimate the solution of this HJB equation. Then, a recently developed adaptation algorithm is used to update the unknown NN weights online, where the convergence to the optimal solution can be rigorously guaranteed. The proposed control system structure is given in Figure 1.


Figure 1: Schematic of the proposed control system.

3.1. Critic NN and Online Adaptation

In this section, we will develop an ADP scheme to obtain the solution of the above defined optimal control problem. The basic idea is to train a critic NN to approximate the optimal cost function V∗(x). For this purpose, we assume that the cost function V∗(x) is continuous, so that it can be approximated via the following NN [33, 34]:

V∗(x) = WTσ(x) + εv(x)    (18)

where W ∈ Rl are the ideal critic NN weights, σ(x) ∈ Rl is the regressor vector, l is the number of neurons, and εv(x) is the NN approximation error. Then, the partial derivative of the cost function can be written as

Vx∗(x) = (∇σ(x))TW + ∇εv(x)    (19)

where ∇σ(x) = ∂σ(x)/∂x ∈ Rl×n is the regressor matrix and ∇εv(x) = ∂εv(x)/∂x ∈ Rn is the NN error gradient.

Without loss of generality, the following assumption is used as in [20]:

Assumption 1. The NN weights W, the regressor vector σ, the regressor matrix ∇σ(x), and the approximation errors εv(x) and ∇εv(x) are all bounded, i.e., ‖W‖ ≤ WM, ‖σ‖ ≤ σN, ‖∇σ‖ ≤ σM and ‖∇εv‖ ≤ σε for positive constants WM, σN, σM, σε.

In practice, the ideal NN weights W are unknown; thus only the estimated NN weights Ŵ, updated online, can be used. Hence, the estimated cost function can be given as

V̂(x) = ŴTσ(x)    (20)

so that the approximated optimal solution can be calculated as

V̂x(x) = (∇σ(x))TŴ    (21)

From equations (18) and (19), the ideal optimal control (10) can be written as

µ∗(x) = −(1/2)MT(x)[(∇σ(x))TW + ∇εv(x)]    (22)

and the practical optimal control action can be written as

µ̂(x) = −(1/2)MT(x)(∇σ(x))TŴ    (23)
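The action (23) needs only the regressor gradient ∇σ(x) and the current weight estimate. A NumPy sketch (function names ours), using the polynomial regressor σ(x) = [x1², x1x2, x2²]T of Example 1 in Section 4; the weight vector [1, 0, 2] is the converged value reported there:

```python
import numpy as np

def grad_sigma(x):
    """∇σ(x) for σ(x) = [x1^2, x1*x2, x2^2]' (shape l x n = 3 x 2)."""
    x1, x2 = x
    return np.array([[2 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2 * x2]])

def mu_hat(M, x, W_hat):
    """Approximate optimal action (23): mu_hat = -0.5 M'(x) (grad sigma)' W_hat."""
    return -0.5 * M.T @ grad_sigma(x).T @ W_hat

# With the converged weights [1, 0, 2], (23) reproduces the ideal actions
# u* = -2(cos(2 x1) + 2) x2 and nu* = -x1 x2 of Example 1.
x = np.array([0.3, -0.4])
M = np.array([[0.0, x[1]],
              [np.cos(2 * x[0]) + 2, 0.0]])
mu = mu_hat(M, x, np.array([1.0, 0.0, 2.0]))
assert np.allclose(mu, [-2 * (np.cos(2 * x[0]) + 2) * x[1], -x[0] * x[1]])
```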

The problem to be further addressed is to determine the critic NN weights Ŵ, which can ensure the stability of the control system and achieve convergence to the ideal values W. Most existing ADP schemes can only guarantee the uniform ultimate boundedness (UUB) of the estimated NN weights rather than their convergence. This paper will introduce a new adaptive learning scheme to guarantee the convergence of Ŵ to W. This enhanced convergence property contributes to avoiding the use of an actor NN, while the calculated optimal control based on the critic NN can approach the ideal optimal solution as well.

According to equations (9) and (19), the HJB equation with critic NN approximation can be given as

0 = dM² + ψM² + ‖x‖² + ‖µ‖² + WT∇σ(x)[f(x) + M(x)µ] + εHJB    (24)

where εHJB = (∇εv(x))T[f(x) + M(x)µ] is a bounded residual HJB error stemming from the critic NN approximation error ∇εv(x). In order to develop an adaptive law to estimate W, we denote the known terms in (24) as

Ξ = ∇σ(x)[f(x) + M(x)µ]
Θ = Φ(x) + ‖x‖² + ‖µ‖²    (25)

with Φ(x) = dM² + ψM². Then the approximated HJB equation (24) can be rewritten as

Θ = −WTΞ − εHJB    (26)

From equation (26), we know that the unknown critic NN weights W are in a linearly parameterized form. Therefore, they can be 'directly' estimated by applying a recently suggested learning method as in [22], which is derived by using the extracted estimation error. For this purpose, the auxiliary regressor matrix P ∈ Rl×l and vector ∅ ∈ Rl are defined as

Ṗ = −ℓP + ΞΞT,  P(0) = 0
∅̇ = −ℓ∅ + ΞΘ,  ∅(0) = 0    (27)

where ℓ > 0 is a positive parameter serving as a damping coefficient in (27). Hence, we can derive the solution of (27) as

P = ∫0t e−ℓ(t−τ) Ξ(τ)ΞT(τ) dτ
∅ = ∫0t e−ℓ(t−τ) Ξ(τ)Θ(τ) dτ    (28)

which can be calculated online based on the measurable system dynamics. We define an auxiliary vector ℵ ∈ Rl based on P and ∅ in (28) as

ℵ = PŴ + ∅    (29)

By substituting (26) into (28), we can obtain ∅ = −PW + v1, where v1 = −∫0t e−ℓ(t−τ) Ξ(τ)εHJB(τ) dτ is a bounded variable for any bounded state x and control input µ, i.e., ‖v1‖ ≤ εv1 for a positive constant εv1 > 0. Then, we can obtain from (27)-(29) that

ℵ = PŴ + ∅ = −PW̃ + v1    (30)

where W̃ = W − Ŵ is the estimation error of the NN weights. Obviously, the auxiliary vector ℵ is a function of the NN weights error W̃. The use of estimation errors in the adaptive law contributes to retaining the convergence of the online learning algorithm, as shown in [22, 35]. Therefore, the adaptive law for online updating Ŵ can be designed as

Ŵ̇ = −Γℵ    (31)

with Γ > 0 being the adaptive gain.

It is shown in (30) that the derived adaptive law (31) is driven by the errors W̃, which can be calculated based on the measurable system input µ and output x. The motivation for developing this new algorithm is to retain the convergence of the estimated weights Ŵ to the unknown weights W. Hence, the proposed adaptive law (31) is essentially different from the existing online ADP results (e.g., [20, 21] and references therein), which adopt gradient based algorithms [36] and retain only the UUB of Ŵ. In this line, we will prove that fast, guaranteed convergence of this new adaptive law can be retained. As a consequence of this convergence property, the actor NN that has been used in the existing ADP literature can be avoided.

Remark 4. This paper aims at developing an adaptive method to estimate the unknown critic NN weights online, so as to obtain the practical optimal control. In this case, the regressor Ξ and the introduced matrix P defined in (27) are well defined. This representation helps to avoid using other offline optimization algorithms (e.g., particle swarm optimization (PSO), genetic algorithm (GA)).
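The filters (27) and the adaptive law (31) can be sketched with a simple Euler discretisation. In the demo below, Θ is generated from a known weight vector with zero HJB residual, so Ŵ should recover it; the discretisation, gains, regressor and all numerical values are our illustrative choices, not from the paper:

```python
import numpy as np

def adapt_step(P, phi, W_hat, Xi, Theta, ell, Gamma, dt):
    """One Euler step of the filters (27) and the adaptive law (31).
    Xi is the regressor, Theta the known scalar in (25), ell the filter
    damping, Gamma the adaptive gain."""
    P = P + dt * (-ell * P + np.outer(Xi, Xi))   # P' = -ell P + Xi Xi'
    phi = phi + dt * (-ell * phi + Xi * Theta)   # phi' = -ell phi + Xi Theta
    aleph = P @ W_hat + phi                      # (29): aleph = P W_hat + phi
    W_hat = W_hat + dt * (-Gamma * aleph)        # (31): W_hat' = -Gamma aleph
    return P, phi, W_hat

# Demo: persistently exciting regressor, Theta = -W'Xi (zero HJB residual).
W_true = np.array([1.0, 2.0])
P, phi, W_hat = np.zeros((2, 2)), np.zeros(2), np.zeros(2)
dt = 1e-3
for i in range(10000):                           # 10 s of simulated time
    t = i * dt
    Xi = np.array([np.sin(t), np.cos(t)])        # PE regressor (assumed)
    Theta = -W_true @ Xi
    P, phi, W_hat = adapt_step(P, phi, W_hat, Xi, Theta, 1.0, 50.0, dt)
# W_hat is now close to W_true = [1, 2], as Theorem 1 predicts for eps_HJB = 0.
```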



Before we present the main results of this section, we show the positive definiteness of the introduced matrix P, as summarized in the following Lemma:

Lemma 2 [22, 37]. The persistent excitation (PE) of the regressor Ξ in (27) is equivalent to the positive definiteness of the matrix P defined in (28).


Lemma 2 indicates that the standard PE condition is required to retain the convergence of the developed algorithm. This requirement is not stringent and has been widely utilized in the ADP literature (e.g., [20, 21, 29]). In practice, to fulfill this condition, we can introduce a vanishing probing noise into the measured system input/output during the transient learning stage, as in [20, 21].
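One common way to realise such a vanishing probing signal is a sum of sinusoids of distinct frequencies under a decaying envelope; it excites the regressor early on and fades away afterwards. A sketch (all parameter values are illustrative, not from the paper):

```python
import numpy as np

def probing_noise(t, amp=0.2, decay=0.5, freqs=(1.0, 3.0, 7.0, 11.0)):
    """Vanishing probing signal: distinct-frequency sinusoids under an
    exponentially decaying envelope, to be added to the control input
    during the transient learning stage."""
    envelope = amp * np.exp(-decay * t)
    return envelope * sum(np.sin(w * t) for w in freqs)

# Significant early in the transient, negligible later:
assert abs(probing_noise(0.5)) > 0.01
assert abs(probing_noise(30.0)) < 1e-5
```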


Now, we have the following results:

Theorem 1. For the adaptive law (31) of the critic NN with the regressor vector Ξ in (25) fulfilling the PE condition, the critic NN weights errors W̃ converge to a small bounded set around 0, whose size depends on the amplitude of the NN approximation error and the excitation level.

Proof. From Lemma 2, we know that if the regressor Ξ is PE, the matrix P is positive definite, i.e., the minimum eigenvalue λmin(P) > ϱ > 0. Now, we select the Lyapunov function as V = (1/2)W̃TΓ−1W̃, and obtain its derivative V̇ along equation (31) as

V̇ = W̃TΓ−1W̃̇ = −W̃TPW̃ + W̃Tv1    (32)

Since there is a bounded NN approximation error, we know εHJB ≠ 0. Hence, equation (32) can be further written as

V̇ = −W̃TPW̃ + W̃Tv1 ≤ −‖W̃‖(ϱ‖W̃‖ − εv1)    (33)

Hence, we can obtain that the estimation errors of the NN weights W̃ converge to a compact set Ω : {W̃ | ‖W̃‖ ≤ εv1/ϱ}. Specifically, the size of this set is determined by the bound of the approximation error εv and the excitation level ϱ; i.e., for an arbitrarily small NN approximation error (according to the NN approximation property, this error can be made arbitrarily small with sufficiently many NN nodes, i.e., ∇εv(x) → 0 as l → ∞), the convergence of Ŵ to W can be obtained. In particular, in the ideal case with εHJB = 0 and v1 = 0, we can claim that the estimation errors of the weights W̃ converge to zero exponentially.

Remark 5. In the ADP literature, the classical actor-critic structure is used to obtain the optimal control action, where the critic NN and the actor NN are trained to approximate the cost function and the control action, respectively. This leads to heavy computational costs. Unlike these existing methods, this paper proposes a single critic NN to approximate the solution of the derived HJB equation, which is also used to directly calculate the optimal control action as (23), since the convergence of the critic NN weights can be accomplished in terms of the novel adaptive law, as proved in Theorem 1.

3.2. Stability Analysis

After obtaining the estimates Ŵ based on (31) and the approximated optimal control action (23), we need to study the stability of the proposed control system. For this purpose, by substituting (23) into system (4), we can obtain the closed-loop system dynamics as

ẋ = f(x) + M(x)µ∗ + M(x)(µ̂ − µ∗)
  = f(x) + M(x)µ∗ + (1/2)M(x)MT(x)[(∇σ)TW̃ + ∇εv(x)]    (34)

To facilitate the stability analysis, the following assumption is employed, which has been widely used in the previous literature [22, 38] and can be fulfilled in practical systems.

Assumption 2 [22, 38]. The functions f(x) and M(x) fulfil the conditions ‖f(x)‖ ≤ bf‖x‖ and ‖M(x)‖ ≤ bg, where bf > 0 and bg > 0 are positive constants.

Now, the stability of the controlled system can be summarized as:

Theorem 2. For system (4) with optimal control (23) and adaptive law (31), if the regressor Ξ of the critic NN is PE, then the NN weights errors W̃ converge to a small residual set around zero, and the approximated optimal control µ̂ in (23) converges to a small region around the ideal optimal solution µ∗ given in (22), i.e., ‖µ̂ − µ∗‖ ≤ εµ.

Proof. Consider the Lyapunov function

J = J1 + J2 + J3 = (1/2)W̃TΓ−1W̃ + Γ1xTx + KV∗ + γ1v1Tv1    (35)

where V∗ is the optimal cost function given in (8), and K > 0, Γ1 > 0 and γ1 > 0 are positive constants.

According to the inequality 2ab ≤ ηa² + b²/η for η > 0 and (32), we have

J̇1 = −W̃TPW̃ + W̃Tv1 ≤ −(ϱ − 1/(2η))‖W̃‖² + (η/2)‖v1‖²    (36)

Moreover, J̇2 can be deduced from (8) and (34) as

J̇2 = 2Γ1xTẋ + K[−Φ(x) − ‖x‖² − ‖µ∗‖²]
   = 2Γ1xT{f(x) + (1/2)M(x)MT(x)[(∇σ)TW̃ + ∇εv] + M(x)µ∗} + K[−Φ(x) − ‖x‖² − ‖µ∗‖²]
   ≤ −[K − 2Γ1bf − ((1/2)ηbg²σM + 2)]‖x‖² + (1/(2η))Γ1²bg²σM‖W̃‖² + (1/4)Γ1²bg⁴‖∇εv‖² − (K − Γ1²bg²)‖µ∗‖² − KΦ(x)    (37)

where Φ(x) = dM²(x) + ψM²(x) is a nonnegative function.

Moreover, we know from equation (24) that εHJB = (∇εv)T[f + Mµ], and we can verify the fact v̇1 = −ℓv1 + ΞεHJB, so that J̇3 can be calculated by

J̇3 = 2γ1v1Tv̇1 = 2γ1v1T(−ℓv1 + Ξ(∇εv)T[f(x) + M(x)µ])
   ≤ −(2γ1ℓ − 3η)‖v1‖² + (1/η)γ1²σε²bf²‖Ξ‖²‖x‖² + (1/(4η))γ1²bg²σM²bW²‖Ξ‖²‖∇εv(x)‖²    (38)

where bW = ‖Ŵ‖ is a bounded variable.

According to (36), (37) and (38), J̇ can be written as

J̇ ≤ J̇1 + J̇2 + J̇3
  ≤ −(ϱ − 1/(2η) − (1/(2η))Γ1²bg²σM)‖W̃‖² − (2γ1ℓ − (7/2)η)‖v1‖²
    − (K − 2Γ1bf − (1/2)ηbg²σM − 2 − (1/η)γ1²σε²bf²‖Ξ‖²)‖x‖²
    − (K − Γ1²bg²)‖µ∗‖² + ((1/4)Γ1²bg⁴ + (1/(4η))γ1²bg²σM²bW²‖Ξ‖²)‖∇εv‖²    (39)

Thus, if we choose the design parameters γ1, η, Γ1 and K to satisfy the following conditions

η > (1 + Γ1²bg²σM)/(2ϱ),  γ1 > 7η/(4ℓ),
K > max{2Γ1bf + (1/2)ηbg²σM + (1/η)γ1²σε²bf²‖Ξ‖² + 2, Γ1²bg²}    (40)

then equation (39) can be further reduced to

J̇ ≤ −a1‖W̃‖² − a2‖v1‖² − a3‖x‖² − a4‖µ∗‖² + a5    (41)

where the constants a1, a2, a3, a4 and a5 are given by

a1 = ϱ − 1/(2η) − (1/(2η))Γ1²bg²σM
a2 = 2γ1ℓ − (7/2)η
a3 = K − 2Γ1bf − (1/2)ηbg²σM − 2 − (1/η)γ1²σε²bf²‖Ξ‖²
a4 = K − Γ1²bg²
a5 = ((1/4)Γ1²bg⁴ + (1/(4η))γ1²bg²σM²bW²‖Ξ‖²)‖∇εv‖²    (42)

It is clear that for the above stated design parameters η, γ1 and K, the constants a1, a2, a3 and a4 are guaranteed to be positive from equation (42). Specifically, the residual error a5 is a positive bounded constant affected by the critic NN error ∇εv, so that a5 → 0 for ∇εv → 0. Hence, we know from (42) that the derivative of J is negative if any of the following conditions holds:

a1‖W̃‖² > a5 ⇒ ‖W̃‖ > √(a5/a1)
a2‖v1‖² > a5 ⇒ ‖v1‖ > √(a5/a2)
a3‖x‖² > a5 ⇒ ‖x‖ > √(a5/a3)
a4‖µ∗‖² > a5 ⇒ ‖µ∗‖ > √(a5/a4)

Based on the Lyapunov theory, we can come to the conclusion that the closed-loop system is stable, and the critic NN weights errors W̃, the system state x, the control action µ∗ and the residual error v1 are all bounded. Specifically, these variables will converge to a small compact set around zero, whose size depends on the constant a5 determined by the critic NN approximation error, which can be made arbitrarily small by choosing sufficiently many NN nodes. In this respect, we can further analyze the error between the approximated optimal control action and the ideal optimal solution, i.e., µ̂ − µ∗, such that

µ̂ − µ∗ = −(1/2)MT(x)(∇σ)TŴ + (1/2)MT(x)[(∇σ)TW + ∇εv(x)]
        = (1/2)MT(x)[(∇σ)TW̃ + ∇εv(x)]

so that ‖µ̂ − µ∗‖ ≤ (1/2)bg[σM‖W̃‖ + σε] ≤ εµ    (43)

where εµ is a positive constant depending on the critic NN approximation error. This indicates that the obtained approximated optimal control action will converge to a small region around the optimal solution. The proof is completed.

4. Simulation 245

4.1. Example 1: Second-order Nonlinear System Consider the following continuous-time nonlinear system [28, 39]: x(t) ˙ = f (x) + g(x)u + k(x)d(x)

17

(44)

where



f (x) = 



g(x) =  

k(x) = 

−x1 + x2

−0.5x1 − 0.5x2 (1 − (cos(2x2 ) + 2)2 )  0  cos(2x1 ) + 2  x2  0

  (45)

In this system, x = [x1 , x2 ]T ∈ R2 is the state variable, d(x) = ωx1 sin(2x2 ) is the uncertain term, and ω ∈ [−1, 1] is the uncertain parameter. Moreover,

the initial state for system is set as x0 = [0.5, 0]T . In this case, we know g + (x) = [0, 1/(cos(2x1 ) + 2)] and then g + (x)k(x)d(x) = 0. Consequently, the boundary functions can be defined as dM = kxk, ψM = 0. Then, we can obtain (I − g(x)g + (x))k(x) = [x2 , 0]T . Hence, the augmented system with auxiliary

control ν can be formulated as  f (x) = 

−x1 + x2

−0.5x1 − 0.5x2 (1 − (cos(2x2 ) + 2)2 )   0 x2  M (x) =  cos(2x1 ) + 2 0

 

(46)

and µ = [uT , ν T ]T ∈ R2 .

In this simulation, we set the the utility function as z(x, µ) = kxk2 + xT x +

µT µ. Then, a critic NN is introduced to approximate the solution V ∗ of the HJB equation (9), where the unknown critic NN weights can be online updated by applying the adaptive law (31). As shown in [28], the optimal cost function can be given by V ∗ (x) = x21 + 2x22

(47)

Then, the ideal control actions u* and ν* can be obtained as

u^* = -\frac{1}{2} g^T V_x^* = -2\left(\cos(2x_1) + 2\right)x_2, \quad
\nu^* = -\frac{1}{2} k^T (I - gg^+)^T V_x^* = -x_1 x_2
(48)

To complete the simulation, the critic NN regressor σ(x) is selected as

\sigma(x) = [x_1^2, \; x_1 x_2, \; x_2^2]^T
(49)
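As a quick numerical sanity check (a sketch of ours, assuming M(x) is as in (46)), one can verify that the closed-form actions in (48) coincide with µ* = −(1/2)Mᵀ(x)∇V*(x) for V*(x) = x_1² + 2x_2²:

```python
import numpy as np

def grad_V(x):
    # Gradient of the optimal cost V*(x) = x1^2 + 2*x2^2 from Eq. (47)
    return np.array([2.0 * x[0], 4.0 * x[1]])

def mu_star(x):
    # mu* = -(1/2) M^T(x) grad V*(x), with M(x) from Eq. (46)
    Mx = np.array([[0.0, x[1]],
                   [np.cos(2.0 * x[0]) + 2.0, 0.0]])
    return -0.5 * Mx.T @ grad_V(x)

x = np.array([0.4, -0.7])
u_star = -2.0 * (np.cos(2.0 * x[0]) + 2.0) * x[1]  # closed form of u* in Eq. (48)
nu_star = -x[0] * x[1]                             # closed form of nu* in Eq. (48)
print(np.allclose(mu_star(x), [u_star, nu_star]))  # True
```

Since (∇σ)ᵀW with the true weights W = [1, 0, 2]ᵀ equals ∇V*, the same computation also yields the approximate action µ̂ once the critic weights have converged.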

In the simulation, we set the initial critic NN weights as Ŵ = 0, and the online learning gains as ℓ = 10 and Γ = 100. To show the robustness of the proposed control, two cases with ω = 1 (Case 1) and ω = −1 (Case 2) are considered. The profile of the estimated critic NN weights Ŵ can be found in Figure 2, which indicates that the estimated critic NN weights Ŵ converge to the true values W = [1, 0, 2]^T. This result verifies the convergence property claimed in Theorem 1 and validates the efficacy of the proposed learning algorithm. To better show the performance of the proposed adaptation algorithm for updating the critic NN weights, the error between the true optimal cost function V*(x) and the approximated cost function V̂(x) is displayed in Figure 3, where fairly satisfactory approximation performance is obtained.

In addition, for Case 1, the verification of the condition stated in Lemma 1 is provided in Figure 4, where the condition 2ν^{*T}ν* ≤ x^T x holds. Hence, the robust control problem can be solved by studying the transformed optimal control problem. The evolution of the controlled system state under the derived optimal control action is shown in Figure 5, which indicates that the closed-loop system is asymptotically stable. The corresponding control action, given in Figure 6, is bounded and smooth. Moreover, for Case 2 (ω = −1), the control and state trajectories of system (44) are given in Figure 7 and Figure 8, respectively. These results also verify the efficacy of the presented approach and indicate that the control derived from the proposed ADP scheme can solve the robust control problem.


4.2. Example 2: Power System Application

In this subsection, a power system [40, 41, 39] is used to further validate the applicability of the proposed technique. In this example, distributed and renewable energies have been integrated into micro-grids. However, there may

Figure 2: Convergence of the critic NN weights Ŵ

Figure 3: Critic NN approximation error of the optimal cost function


Figure 4: The verification of the condition used in Lemma 1

Figure 5: Profile of system state x when ω = 1 (Case 1)


Figure 6: The proposed control action µ when ω = 1 (Case 1)

Figure 7: Profile of system state x when ω = −1 (Case 2)


Figure 8: The proposed control action µ when ω = −1 (Case 2)

be frequency deviations owing to the imbalance between the load consumptions

and power generations. Therefore, it is practically demanding to investigate frequency regulation for micro-grids even in the presence of modeling uncertainties. For this purpose, we consider a practical power system including a turbine generator, a system load, and an automatic generation control. The detailed modeling of this system can be found in [40, 41, 39]. To facilitate the simulation, we consider ζ_1, ζ_2 and ζ_3 as the incremental changes of the frequency deviation, the generator output, and the governor position, respectively, while the control input u represents the incremental speed change of positive deviation. Moreover, d(x) = ωx_1 x_2 cos(x_2 x_3) is the incremental change in the electrical power with ω ∈ [−1, 1]. Then, we can define x = [ζ_1, ζ_2, ζ_3]^T ∈ R^3 as the state vector. Hence, the state-space formulation of the power system can be written as

\dot{x}(t) = \begin{bmatrix} -\frac{1}{T_1} & 0 & -\frac{1}{F_1 T_1} \\ \frac{K_1}{T_2} & -\frac{1}{T_2} & 0 \\ 0 & \frac{K_2}{T_3} & -\frac{1}{T_3} \end{bmatrix} x + \begin{bmatrix} \frac{1}{T_1} \\ 0 \\ 0 \end{bmatrix} u - \begin{bmatrix} 0 \\ 0 \\ \frac{K_2}{T_3} \end{bmatrix} d(x)
(50)

The corresponding system parameters can be found in Table 1.

In the simulation, we know g^+(x) = [5, 0, 0] and thus g^+(x)k(x)d(x) = 0.

Table 1: Parameters of the proposed power system

Symbol   Meaning                                 Value
T1       Time constant of the governor           5
T2       Time constant of the turbine model      10
T3       Time constant of the generator model    10
F1       Feedback regulation constant            0.5
K1       Gain constant of the turbine model      1
K2       Gain constant of the generator model    1
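Using the values in Table 1, the numeric matrices in (50) can be assembled directly; the short NumPy sketch below (variable names are ours) also recovers the pseudoinverse g^+(x) = [5, 0, 0] quoted in the text:

```python
import numpy as np

# Parameters from Table 1
T1, T2, T3, F1, K1, K2 = 5.0, 10.0, 10.0, 0.5, 1.0, 1.0

A = np.array([[-1.0 / T1, 0.0, -1.0 / (F1 * T1)],
              [K1 / T2, -1.0 / T2, 0.0],
              [0.0, K2 / T3, -1.0 / T3]])   # drift matrix in Eq. (50)
B = np.array([[1.0 / T1], [0.0], [0.0]])    # input vector g(x)
kvec = np.array([[0.0], [0.0], [K2 / T3]])  # uncertainty distribution k(x)

print(A)                  # [[-0.2, 0, -0.4], [0.1, -0.1, 0], [0, 0.1, -0.1]]
print(np.linalg.pinv(B))  # g^+(x) = [[5, 0, 0]]
```

Note that A x reproduces the drift f(x) = [−0.2x_1 − 0.4x_3, 0.1x_1 − 0.1x_2, 0.1x_2 − 0.1x_3]^T used in the augmented system (51).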

In this case, the boundary functions can be defined as d_M = ‖x‖, ψ_M = 0. Moreover, we can obtain (I − g(x)g^+(x))k(x) = [0, 0, −0.1]^T, and then the augmented system with auxiliary control ν can be formulated as

f(x) = \begin{bmatrix} -0.2x_1 - 0.4x_3 \\ 0.1x_1 - 0.1x_2 \\ 0.1x_2 - 0.1x_3 \end{bmatrix}, \quad
M(x) = \begin{bmatrix} 0.2 & 0 \\ 0 & 0 \\ 0 & -0.1 \end{bmatrix}
(51)

and µ = [u^T, ν^T]^T ∈ R^2. We can set the utility function as z(x, µ) = ‖x‖^2 + x^T x + µ^T µ. The initial critic NN weights are set as Ŵ(0) = 0, and the learning gains are ℓ = 15 and Γ = 800. The simulation is carried out with the initial system state x_0 = [0.5, 0.2, 0.2]^T. The regressor of the critic NN is set as σ(x) = [x_1^2, x_1 x_2, x_1 x_3, x_2^2, x_2 x_3, x_3^2]^T. Similar to Example 1, we consider two cases with ω = 1 (Case 1) and ω = 0.5 (Case 2) to show the ability of the derived control action against various system uncertainties.

The responses of the estimated critic NN weights Ŵ = [W_1, W_2, W_3, W_4, W_5, W_6]^T are given in Figure 9, which converge to [5.13, 1.86, 0.23, 0.70, 2.72, 10.54] after a transient process. For Case 1, the profile of the system trajectory under the derived optimal control action is shown in Figure 10, which shows that the system trajectory converges to zero. The required condition in Lemma 1 is verified in Figure 11, which again indicates that the transformation of the robust control problem into an optimal control problem is feasible. Moreover, Figure 12 shows the evolution of the controlled system state with the derived optimal control action, which implies that the controlled system is stabilised even in the presence of parameter uncertainties. The corresponding control action is given in Figure 13. For another case with ω = 0.5 (Case 2), the system state response under the derived optimal control is depicted in Figure 14 and the corresponding control action is given in Figure 15. These results again verify the effectiveness of the


proposed robust control strategy.

Figure 9: Convergence of the critic NN weights Ŵ
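To illustrate how the learned weights yield a control action, the sketch below evaluates µ̂ = −(1/2)Mᵀ(x)(∇σ(x))ᵀŴ for Example 2, plugging in the converged weights read off Figure 9 (helper names are ours; this is an illustration of the formula, not the paper's simulation code):

```python
import numpy as np

W_hat = np.array([5.13, 1.86, 0.23, 0.70, 2.72, 10.54])  # converged weights from Fig. 9

def grad_sigma(x):
    # Jacobian of sigma(x) = [x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2]^T
    x1, x2, x3 = x
    return np.array([[2 * x1, 0.0, 0.0],
                     [x2, x1, 0.0],
                     [x3, 0.0, x1],
                     [0.0, 2 * x2, 0.0],
                     [0.0, x3, x2],
                     [0.0, 0.0, 2 * x3]])

def mu_hat(x, W):
    # Approximate optimal action: mu_hat = -(1/2) M^T (grad sigma)^T W, M from Eq. (51)
    M = np.array([[0.2, 0.0],
                  [0.0, 0.0],
                  [0.0, -0.1]])
    return -0.5 * M.T @ grad_sigma(x).T @ W

x0 = np.array([0.5, 0.2, 0.2])
print(mu_hat(x0, W_hat))  # [u, nu] evaluated at the initial state
```

Here (∇σ(x))ᵀŴ is the gradient of the approximated cost V̂(x) = Ŵᵀσ(x), so µ̂ is obtained from the critic weights alone, without any actor NN.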

5. Conclusions

This paper addresses the robust control problem for continuous-time systems with unmatched uncertainties. The key idea is to transform the robust control problem into the optimal control problem of an augmented auxiliary system with a constructive cost function. We then develop a single-critic-NN based ADP framework to solve the derived optimal equation (i.e., the HJB equation). To obtain the solution of the derived HJB equation, a critic NN is trained online, and the obtained NN weights can be used directly to calculate the control action. To achieve convergence, a novel adaptation algorithm is used to directly estimate the unknown NN


Figure 10: State trajectory x1, x2 and x3

Figure 11: The verification of the condition in Lemma 1


Figure 12: Profile of system state x when ω = 1 (Case 1)

Figure 13: The proposed control action µ when ω = 1 (Case 1)


Figure 14: Profile of system state x when ω = 0.5 (Case 2)

Figure 15: The proposed control action µ when ω = 0.5 (Case 2)


weights. As a consequence of this convergent learning algorithm, the actor NN widely used in existing ADP methods can be avoided. In this way, the proposed method paves a new pathway for solving the robust control problem using ADP-based optimal control methodologies. Simulation results are presented to validate the effectiveness of the proposed methods. Future work will focus on robust tracking control design for nonlinear systems with fully unknown unmatched dynamics, as well as experimental validation.

Author Contributions

1) Jun Zhao: carried out the control design and theoretical analysis, and prepared the draft of this paper.
2) Jing Na: proposed the initial idea, participated in the stability analysis, wrote parts of the manuscript, and revised the paper.
3) Guanbin Gao: participated in the simulation studies and the result analysis.

Declaration of Competing Interest

No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

References

[1] Q. Chen, S. Xie, M. Sun, X. He, Adaptive nonsingular fixed-time attitude stabilization of uncertain spacecraft, IEEE Transactions on Aerospace and Electronic Systems 54 (6) (2018) 2937–2950. doi:10.1109/TAES.2018.2832998.
[2] S. Wang, H. Yu, J. Yu, J. Na, X. Ren, Neural-network-based adaptive funnel control for servo mechanisms with unknown dead-zone, IEEE Transactions on Cybernetics PP (99) (2019) 1–12. doi:10.1109/TCYB.2018.2875134.
[3] M. Chen, S. S. Ge, Adaptive neural output feedback control of uncertain nonlinear systems with unknown hysteresis using disturbance observer, IEEE Transactions on Industrial Electronics 62 (12) (2015) 7706–7716. doi:10.1109/tie.2015.2455053.
[4] D. Zheng, Y. Pan, K. Guo, H. Yu, Identification and control of nonlinear systems using neural networks: A singularity-free approach, IEEE Transactions on Neural Networks and Learning Systems 30 (9) (2019) 2696–2706. doi:10.1109/TNNLS.2018.2886135.
[5] K. Zhou, J. Doyle, Essentials of Robust Control, Prentice Hall, NJ, USA. doi:10.1016/S0005-1098(01)00272-2.
[6] T. Başar, P. Bernhard, H-infinity Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach, Springer Science & Business Media, Berlin, Germany, 2008. doi:10.1109/TAC.1996.536519.
[7] F. Lin, R. D. Brandt, J. Sun, Robust control of nonlinear systems: compensating for uncertainty, International Journal of Control 56 (6) (1992) 1453–1459. doi:10.1080/00207179208934374.
[8] F. Lin, An optimal control approach to robust control design, International Journal of Control 73 (3) (2000) 177–186. doi:10.1080/002071700219722.
[9] P. Gahinet, A. Nemirovskii, A. J. Laub, M. Chilali, The LMI control toolbox, in: Proceedings of the 33rd IEEE Conference on Decision and Control, Vol. 3, 1994, pp. 2038–2041. doi:10.1109/CDC.1994.411440.
[10] F. Lin, Robust Control Design: An Optimal Control Approach, John Wiley & Sons Ltd, Chichester, England, 2007. doi:10.1002/9780470059579.
[11] F. L. Lewis, D. Vrabie, V. L. Syrmos, Optimal Control, John Wiley & Sons, Inc., Hoboken, USA, 2012. doi:10.1002/0471667196.ess0698.
[12] J. Allwright, A lower bound for the solution of the algebraic Riccati equation of optimal control and a geometric convergence rate for the Kleinman algorithm, IEEE Transactions on Automatic Control 25 (4) (1980) 826–829. doi:10.1109/TAC.1980.1102412.
[13] P. J. Werbos, Approximate dynamic programming for real-time control and neural modeling, in: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. (1992) 493–525. doi:10.1016/0893-6080(94)90107-4.
[14] F. L. Lewis, D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine 9 (3) (2009) 32–50. doi:10.1109/MCAS.2009.933854.
[15] H. Jiang, H. Zhang, X. Xie, J. Han, Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming, Neurocomputing 344 (2019) 13–19. doi:10.1016/j.neucom.2018.02.107.
[16] H. Ren, Y. Wen, C. Liu, Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator, Neurocomputing 335 (2019) 96–104. doi:10.1016/j.neucom.2019.01.033.
[17] A. Heydari, S. N. Balakrishnan, Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Transactions on Neural Networks and Learning Systems 24 (1) (2013) 145–157. doi:10.1109/tnnls.2012.2227339.
[18] T. Bian, Y. Jiang, Z.-P. Jiang, Decentralized adaptive optimal control of large-scale systems with application to power systems, IEEE Transactions on Industrial Electronics 62 (4) (2014) 2439–2447. doi:10.1109/TIE.2014.2345343.
[19] Q. Wei, D. Liu, X. Yang, Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems, IEEE Transactions on Neural Networks and Learning Systems 26 (4) (2015) 866–879. doi:10.1109/TNNLS.2015.2401334.
[20] M. Abu-Khalaf, F. L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica 41 (5) (2005) 779–791. doi:10.1016/j.automatica.2004.11.034.
[21] K. G. Vamvoudakis, F. L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46 (5) (2010) 878–888. doi:10.1016/j.automatica.2010.02.018.
[22] J. Na, G. Herrmann, Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous-time unknown nonlinear systems, IEEE/CAA Journal of Automatica Sinica 1 (4) (2014) 412–422. doi:10.1109/JAS.2014.7004668.
[23] H. Zhang, L. Cui, X. Zhang, Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks 22 (12) (2011) 2226–2236. doi:10.1109/TNN.2011.2168538.
[24] Y. Lv, X. Ren, J. Na, Online optimal solutions for multi-player nonzero-sum game with completely unknown dynamics, Neurocomputing 283 (2018) 87–97. doi:10.1016/j.neucom.2017.12.045.
[25] J. Na, B. Wang, G. Li, S. Zhan, W. He, Nonlinear constrained optimal control of wave energy converters with adaptive dynamic programming, IEEE Transactions on Industrial Electronics 66 (10) (2019) 7904–7915. doi:10.1109/TIE.2018.2880728.
[26] Y. Jiang, Z.-P. Jiang, Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, Automatica 48 (10) (2012) 2699–2704. doi:10.1016/j.automatica.2012.06.096.
[27] Y. Jiang, Z.-P. Jiang, Robust adaptive dynamic programming and feedback stabilization of nonlinear systems, IEEE Transactions on Neural Networks and Learning Systems 25 (5) (2014) 882–893. doi:10.1002/9781118453988.ch13.
[28] D. Wang, D. Liu, H. Li, Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems, IEEE Transactions on Automation Science and Engineering 11 (2) (2014) 627–632. doi:10.1109/TASE.2013.2296206.
[29] D. Wang, D. Liu, C. Mu, Y. Zhang, Neural network learning and robust stabilization of nonlinear systems with dynamic uncertainties, IEEE Transactions on Neural Networks and Learning Systems 29 (4) (2018) 1342–1351. doi:10.1109/TNNLS.2017.2749641.
[30] X. Yang, H. He, X. Zhong, Adaptive dynamic programming for robust regulation and its application to power systems, IEEE Transactions on Industrial Electronics 65 (7) (2018) 5722–5732. doi:10.1049/iet-cta.2017.0154.
[31] D. Wang, Robust policy learning control of nonlinear plants with case studies for a power system application, IEEE Transactions on Industrial Informatics (2019) 1–1. doi:10.1109/TII.2019.2925632.
[32] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, W. E. Dixon, A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica 49 (1) (2013) 82–92. doi:10.1016/j.automatica.2012.09.019.
[33] C. Yang, C. Chen, N. Wang, Z. Ju, J. Fu, M. Wang, Biologically inspired motion modeling and neural control for robot learning from demonstrations, IEEE Transactions on Cognitive and Developmental Systems 11 (2) (2019) 281–291. doi:10.1109/TCDS.2018.2866477.
[34] C. Yang, C. Chen, W. He, R. Cui, Z. Li, Robot learning system based on adaptive neural control and dynamic movement primitives, IEEE Transactions on Neural Networks and Learning Systems 30 (3) (2019) 777–787. doi:10.1109/TNNLS.2018.2852711.
[35] Y. Lv, X. Ren, Approximate Nash solutions for multiplayer mixed-zero-sum game with reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics: Systems 49 (12) (2019) 2739–2750. doi:10.1109/TSMC.2018.2861826.
[36] F. Ding, X. Liu, J. Chu, Gradient-based and least-squares-based iterative algorithms for Hammerstein systems using the hierarchical identification principle, IET Control Theory & Applications 7 (2) (2013) 176–184. doi:10.1049/iet-cta.2012.0313.
[37] F. Luan, J. Na, Y. Huang, G. Gao, Adaptive neural network control for robotic manipulators with guaranteed finite-time convergence, Neurocomputing 337 (2019) 153–164. doi:10.1016/j.neucom.2019.01.063.
[38] Y. Lv, J. Na, Q. Yang, X. Wu, Y. Guo, Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics, International Journal of Control 89 (1) (2016) 99–112. doi:10.1080/00207179.2015.1060362.
[39] D. P. Iracleous, A. T. Alexandridis, A multi-task automatic generation control for power regulation, Electric Power Systems Research 73 (3) (2005) 275–285. doi:10.1016/j.epsr.2004.06.011.
[40] Y. Tang, H. He, J. Wen, J. Liu, Power system stability control for a wind farm based on adaptive dynamic programming, IEEE Transactions on Smart Grid 6 (1) (2017) 166–177. doi:10.1109/tsg.2014.2346740.
[41] C. Mu, Y. Tang, H. He, Improved sliding mode design for load frequency control of power system integrated an adaptive learning strategy, IEEE Transactions on Industrial Electronics 64 (8) (2017) 6742–6751. doi:10.1109/TIE.2017.2694396.


Jun Zhao is currently pursuing his Ph.D. at the Kunming University of Science and Technology, Kunming, China. He received the B.Sc. degree in Mechanical Design, Manufacturing and Automation from Qingdao University of Science and Technology, Qingdao, China, in 2016. His current research interests include robust control, output-feedback control, and kinematics parameter identification of industrial robots.

Jing Na received the B.Eng. and Ph.D. degrees from the School of Automation, Beijing Institute of Technology, Beijing, China, in 2004 and 2010, respectively. From 2011 to 2013, he was a Monaco/ITER Postdoctoral Fellow at the ITER Organization, Saint-Paul-lès-Durance, France. From 2015 to 2017, he was a Marie Curie Intra-European Fellow with the Department of Mechanical Engineering, University of Bristol, U.K. Since 2010, he has been with the Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming, China, where he became a Professor in 2013. He has coauthored one monograph and more than 100 international journal and conference papers. His current research interests include intelligent control, adaptive parameter estimation, nonlinear control and applications to robotics, vehicle systems, wave energy converters, etc. He is currently an Associate Editor of the IEEE Transactions on Industrial Electronics and Neurocomputing, and has served as the Organization Committee Chair of DDCLS 2019 and the International Program Committee Chair of ICMIC 2017. Dr. Na was awarded the Best Application Paper Award of the 3rd IFAC International Conference on Intelligent Control and Automation Science (IFAC ICONS 2013), and the 2017 Hsue-shen Tsien Paper Award.

Guanbin Gao received the B.Sc. and M.Sc. degrees in mechanical engineering and automation from Northeastern University, Shenyang, China, in 2001 and 2004, respectively, and the Ph.D. degree in mechanical manufacturing and automation from Zhejiang University, Hangzhou, China, in 2010. He is currently an associate professor at the Kunming University of Science and Technology. His research mainly focuses on precision measuring and control, kinematics of industrial robots, and neural networks.