Adaptive nearly optimal control for a class of continuous-time nonaffine nonlinear systems with inequality constraints


Research article

Quan-Yong Fan a, Guang-Hong Yang a,b,*

a College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, PR China
b State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, Liaoning 110819, PR China

* Corresponding author at: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, PR China. E-mail address: [email protected] (G.-H. Yang).

Article history: Received 18 February 2016; Received in revised form 23 August 2016; Accepted 31 October 2016

Abstract

State inequality constraints have hardly been considered in the literature on solving the nonlinear optimal control problem based on the adaptive dynamic programming (ADP) method. In this paper, an actor-critic (AC) algorithm is developed to solve the optimal control problem with a discounted cost function for a class of state-constrained nonaffine nonlinear systems. To overcome the difficulties resulting from the inequality constraints and the nonaffine nonlinearities of the controlled systems, a novel transformation technique with redesigned slack functions and a pre-compensator method are introduced to convert the constrained optimal control problem into an unconstrained one for affine nonlinear systems. Then, based on the policy iteration (PI) algorithm, an online AC scheme is proposed to learn the nearly optimal control policy for the obtained affine nonlinear dynamics. Using the information of the nonlinear model, novel adaptive update laws are designed to guarantee the convergence of the neural network (NN) weights and the stability of the affine nonlinear dynamics without requiring a probing signal. Finally, the effectiveness of the proposed method is validated by simulation studies.

Keywords: Nonaffine nonlinear systems; Inequality constraints; Adaptive optimal control; Neural network

1. Introduction

In many practical dynamic systems, there exist physical constraints on the control input or the system states. For example, the velocity of a car should be limited due to safety considerations; at the same time, the acceleration of a vehicle should be constrained because of hardware limitations and passenger comfort. Therefore, more and more researchers have been investigating control problems for linear or nonlinear systems with constraints [1-3]. The input-constrained problems in controlling nonlinear systems have been investigated widely, for instance via dynamic inversion model reference control [4], optimal control methods [5,6] and anti-windup techniques [7]. By comparison, designing a proper controller that stabilizes a nonlinear system without violating the state constraints is a more challenging problem, which has received considerable attention in the past years [8-10].

For practical control systems, acceptable performance is an important requirement besides system stability; for example, optimizing the fuel consumption is usually required for spacecraft attitude control systems. Optimal control theory, developed by Pontryagin [11] and Bellman [12], provides methods to compute control policies that make systems deliver optimal performance. Since then, it has played an increasingly important role in modern control systems and has been investigated widely [13,14]. Designing optimal control policies for linear systems with quadratic performance functions usually involves solving the Riccati equation: when the dynamic models of the linear systems are known, the Riccati equation can be solved using numerical methods; if the models are unknown, data-based policy iteration methods [15,16] can be used to obtain the optimal control policies. For nonlinear systems, the Hamilton-Jacobi-Bellman (HJB) equations should be solved to guarantee the existence of the optimal solution [17]. By comparison, nonlinear optimal control problems are harder to handle, because it is difficult or impossible to obtain analytical solutions of the HJB equations for continuous-time nonlinear systems [17,18]. This motivates the research on finding approximate solutions to the nonlinear optimal control problem.

To conquer the difficulties in solving the HJB equations, the dynamic programming (DP) method proposed in [12] gives a way to solve the optimal control problem. However, DP is generally implemented off-line, and its computational complexity grows exponentially with the system dimension. Considering the drawbacks of DP,


Werbos in [19] proposed the ADP method, which is an effective scheme to solve the optimal control problem for nonlinear systems. In the last decades, ADP has been investigated and extended by many researchers [20-24]. In [20], NNs were used to approximate the value function of the HJB equation with nonquadratic functionals, which resulted in a nearly optimal input-constrained state feedback controller. Using three NN structures, a novel AC-identifier was developed in [21] to approximately solve the infinite-horizon optimal control problem for nonlinear systems with unknown dynamics. Based on the ADP method, some novel robust control schemes were proposed in [22-25] for uncertain nonlinear systems. Therefore, the ADP method is also expected to be extended to deal with the constrained optimal control problems considered in this paper.

1.1. Related work and motivation

In [26,27], barrier Lyapunov functions (BLF) were constructed to handle the constrained control problems for nonlinear systems with special triangular structures. An output transformation technique was used in [28] to design a novel tracking controller for a class of nonaffine SISO nonlinear systems with time-varying asymmetric output constraints. As an effective technique, the model predictive control method has also been widely adopted to handle state constraint issues [29,30]. However, model predictive control methods are mostly numerical and often incur a heavy computational burden. How to design a systematic control scheme that guarantees both the stability and the optimal performance of state-constrained MIMO nonlinear systems is still a difficult problem. This is the first motivation of this work.

The aforementioned analysis shows that ADP is an effective approach to solve optimal control problems. Based on the known nonlinear dynamic model and PI, an online algorithm for learning the optimal control policy for nonlinear systems was presented in [31]. In most of the existing literature based on ADP, the dynamic models of the nonlinear systems are assumed to be affine and the input matrices are required to be bounded. A novel iterative ADP algorithm was given in [32] to solve for the optimal control policy of nonaffine nonlinear systems with unknown dynamics. Most of the existing ADP methods require a probing signal to guarantee the convergence of the learning algorithm, which is not suitable for practical online implementation. In [33], a model-based reinforcement learning scheme was proposed to solve approximate optimal regulation problems online under a PE-like rank condition, where neither input constraints nor state constraints are considered. To the best of the authors' knowledge, there are no results solving the optimal control problem for continuous-time state-constrained nonlinear systems based on the ADP method. This is the second motivation of this work.

It is known that Bellman's optimality principle [12] offers a comprehensive strategy to solve nonlinear optimal control problems with constraints; thus, many constrained control problems are solved based on the optimal control method. Several decades ago, a transformation technique was proposed in [34] to deal with the optimal control problem under an inequality constraint condition.
Although a special technique was proposed in [35] to overcome the singular-arcs problem of [34], the performance index was changed there, and the method is suited only to finite-horizon optimal control problems with a scalar control variable and a scalar state inequality constraint. In [8,9], penalty functions were introduced to approximately convert the constrained optimal problems into unconstrained ones, where finite-time optimization was considered and the converted optimal control problems were still difficult to solve for nonlinear systems. The transformation method in [9] was extended in [36] to solve

the optimal control problem subject to inequality constraints using an iterative ADP algorithm, where discrete-time affine nonlinear systems are considered. The existing transformation approaches are not suitable for solving the continuous-time infinite-horizon optimal control problem with state constraints based on the ADP methods. This is the third motivation of this work.

1.2. Main contribution

In this paper, the optimal control problem is investigated for a class of state-constrained nonaffine nonlinear systems. Proper techniques are introduced to achieve the problem transformation, and then the ADP method is used to approximately solve the transformed optimal control problem. The main contributions of this paper are as follows:

1. Different from [34,35], a transformation technique with new slack functions and a pre-compensator are employed to convert the constrained optimal control problem for nonaffine nonlinear systems into an unconstrained one for augmented affine nonlinear systems.

2. By designing a suitable AC algorithm with novel adaptive weight update laws, the approximate optimal control policy for a class of nonaffine nonlinear systems with state constraints and a discounted performance function can be obtained without adding a probing signal.

The rest of this paper is organized as follows. The problem is formulated in Section 2. In Section 3, the transformation technique with new slack functions and a pre-compensator method are introduced to transform the constrained optimal control problem into an unconstrained one, which is followed by the design of the ADP algorithm for the transformed dynamics in Section 4. Then, simulation results are shown in Section 5. Finally, Section 6 draws the conclusion.

The following notations are used throughout the paper. The superscript "T" stands for matrix transposition, and $\mathbb{R}^n$ denotes the n-dimensional Euclidean space. $\mathrm{diag}(X_1, X_2, \ldots, X_n)$ denotes a block diagonal matrix with matrices $X_1, X_2, \ldots, X_n$ on its main diagonal.

$\|a\| = \sqrt{\sum_{i=1}^{k}a_i^2}$ is defined as the norm of a vector $a = (a_1, a_2, \ldots, a_k)^T$. $\lambda_{\min}(X)$ is the minimum eigenvalue of the matrix X. For a smooth function f(t), we define $(f(t))' = \frac{df(t)}{dt}$.

2. System description and preliminaries

Consider a class of continuous-time nonaffine nonlinear systems:

$\dot{x}(t) = f(x(t), u(t)), \qquad (1)$

where $x(t) = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ is the measurable system state, $u(t) \in \mathbb{R}^m$ is the control input, and $f(x, u): \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is a locally Lipschitz function with $f(0, 0) = 0$. In this paper, state inequality constraints are considered for the above nonaffine nonlinear systems, given by

$m_k < h_k(x(t)) < \bar{m}_k, \quad k = 1, \ldots, l, \qquad (2)$

where $m_k$ and $\bar{m}_k$ are the lower and upper bounds of $h_k(x(t))$, which are polynomials in x.

Remark 1. State constraints commonly exist in practical control systems [28]. For example, the velocity and acceleration of a vehicle should be limited due to safety considerations and hardware limitations. Similar constraints are also considered in the agile missile [1], the robot manipulator [3] and the circuit system [27]. Most of these practical systems can be modeled as the mathematical model (1) with the constraints (2).

The objective of this paper is to solve the following problem.

Constrained Optimal Control (COC) Problem: Design a proper control policy u to stabilize the dynamic system (1) without violating the constraints in (2) while optimizing the following performance index:

$J_0(x) = \int_t^{\infty} e^{\lambda(\tau - t)}\left[x^T(\tau)Qx(\tau) + u^TRu\right]d\tau, \qquad (3)$

where $Q \ge 0$ and $R > 0$ are suitable symmetric matrices, and λ is the discount factor.

Remark 2. The discounted performance function (3) is introduced as in [18] and [37], where the state constraints were not considered. Choosing different λ results in different system performance. If the reference signal or the control input contains a nonzero steady-state part, the performance function (3) without the discount factor may be unbounded [37]. Therefore, it is necessary to consider the discounted performance index in some cases.

In order to solve the COC problem, NNs are used in this paper to approximate the nonlinear functions. Considering the results in [38] and [39], it is known that a smooth nonlinear function $\psi(x)$ on a compact set $\Omega \subset \mathbb{R}^n$ can be approximated as

$\psi(x) = W_0\sigma_0(x) + \varepsilon_0(x), \qquad (4)$

where $W_0$ is the ideal weight vector, and $\sigma_0(x)$ and $\varepsilon_0(x)$ are the nonlinear activation function and the function reconstruction error, respectively. Like most of the existing results based on NNs, the system stability in this paper is only guaranteed on the considered compact sets, which means the results are semiglobal. Accordingly, the following definition and assumptions are given.

Definition 1 ([5]). For the dynamic system $\dot{x} = f_d(x, t)$, the equilibrium point $x_0$ is said to be uniformly ultimately bounded (UUB) stable if there exist a bound $b_x$ and a time $t_u$ such that $\|x(t) - x_0\| \le b_x$ on a compact set Ω for all $t \ge t_0 + t_u$.

Assumption 1. The ideal NN weight vector $W_0$ is bounded, that is, $\|W_0\| \le \bar{W}_0$. On a compact set, the NN activation function and the approximation error are bounded, and their gradients are bounded; that is, there exist constants $\bar{\sigma}_0$, $\bar{\varepsilon}_0$, $\bar{\sigma}_{0x}$ and $\bar{\varepsilon}_{0x}$ such that $\|\sigma_0(x(t))\| < \bar{\sigma}_0$, $\|\varepsilon_0(x)\| < \bar{\varepsilon}_0$, $\|\partial\sigma_0(x(t))/\partial x\| < \bar{\sigma}_{0x}$ and $\|\partial\varepsilon_0(x)/\partial x\| < \bar{\varepsilon}_{0x}$.

Assumption 2. The initial state x(0) and the equilibrium point x = 0 satisfy the constraints (2). There exists a control policy u that solves the COC problem.

Remark 3. Assumption 1 is standard in the NN-related literature, such as [31-33]. Assumption 2 is reasonable for the state-constrained control problem.

3. Model transformation for handling the inequality constraints

Because it is difficult to solve the optimal control problem directly for nonaffine nonlinear systems with inequality constraints, a novel technique is given to convert the COC problem into an unconstrained one. Moreover, a pre-compensator is designed to circumvent the difficulty resulting from the nonaffine property of the nonlinear system (1).

Firstly, the constraints (2) can be rewritten as

$(h_k(x(t)) - m_k)(h_k(x(t)) - \bar{m}_k) < 0, \quad k = 1, \ldots, l. \qquad (5)$

For some special cases, (5) can be simplified; for example, it reduces to $h_k(x(t)) - \bar{m}_k < 0$ if $h_k(x(t)) - m_k > 0$ for any x(t). Then, the constraints in (5) are converted into the following equality conditions by introducing the slack functions $\gamma_k e^{\alpha_k(t)}$, which is different from the transformation methods in [34] and [35]:

$(h_k(x(t)) - m_k)(h_k(x(t)) - \bar{m}_k) + \gamma_k e^{\alpha_k(t)} = 0, \qquad (6)$

where $\gamma_k$ are positive constants. It is known that $\gamma_k e^{\alpha_k(t)} > 0$; as long as $\alpha_k(t)$ are bounded, the constraints (2) are guaranteed. Specially, $\gamma_k$ can be chosen so that $\alpha_k(t) = 0$ when x(t) = 0, which means $\gamma_k = -(h_k(0) - m_k)(h_k(0) - \bar{m}_k)$.

Based on the system equation (1), differentiate (6) $p_k$ times with respect to t until u appears. Then, we have

$2h_k\dot{x}^T\frac{\partial h_k}{\partial x} - (m_k + \bar{m}_k)\dot{x}^T\frac{\partial h_k}{\partial x} + \gamma_k e^{\alpha_k}\alpha_{k,1} = 0,$
$2\left(h_k\dot{x}^T\frac{\partial h_k}{\partial x}\right)' - (m_k + \bar{m}_k)\left(\dot{x}^T\frac{\partial h_k}{\partial x}\right)' + \gamma_k e^{\alpha_k}\left(\alpha_{k,1}^2 + \alpha_{k,2}\right) = 0,$
$\vdots$
$\Pi_{k,p_k}(x, u) + \gamma_k e^{\alpha_k}\Gamma_k(\alpha_{k,1}, \ldots, \alpha_{k,p_k-1}) + \gamma_k e^{\alpha_k}\alpha_{k,p_k} = 0, \qquad (7)$

where $\alpha_{k,j}$ ($j = 1, \ldots, p_k$) denote the jth-order derivatives of $\alpha_k(t)$, $\Pi_{k,p_k}(x, u)$ collects the nonlinear terms in x and u obtained after differentiating the left-hand side of (6) $p_k$ times, and $\Gamma_k(\alpha_{k,1}, \ldots, \alpha_{k,p_k-1})$ are polynomials in $\alpha_{k,1}, \ldots, \alpha_{k,p_k-1}$.

Define the augmented state $x_\alpha = [x^T, \chi_1^T, \ldots, \chi_k^T, \ldots, \chi_l^T]^T \in \mathbb{R}^{n + \sum_{k=1}^{l}p_k}$, where $\chi_k = [\alpha_k, \alpha_{k,1}, \ldots, \alpha_{k,p_k-1}]^T$. Then, an augmented system is obtained as

$\begin{cases} \dot{x} = f(x, u) \\ \dot{\chi}_1 = A_1\chi_1 + B_1\alpha_{1,p_1} \\ \quad\vdots \\ \dot{\chi}_k = A_k\chi_k + B_k\alpha_{k,p_k} \\ \quad\vdots \\ \dot{\chi}_l = A_l\chi_l + B_l\alpha_{l,p_l} \end{cases} \qquad (8)$

where

$A_k = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{p_k \times p_k}, \qquad B_k = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \in \mathbb{R}^{p_k}.$
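For instance, for $p_k = 2$ (the case arising in Example 1 below), these reduce to

$A_k = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad B_k = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$

so that $\chi_k = [\alpha_k, \alpha_{k,1}]^T$ is a double integrator driven by $\alpha_{k,2}$.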

Based on the fact that $\gamma_k e^{\alpha_k} > 0$ and the last equation in (7), $\alpha_{k,p_k}$ can be solved as

$\alpha_{k,p_k} = -\Gamma_k(\alpha_{k,1}, \ldots, \alpha_{k,p_k-1}) - \frac{1}{\gamma_k}e^{-\alpha_k}\Pi_{k,p_k}(x, u). \qquad (9)$
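As a concrete instance of (5)-(9), consider the scalar constraint $-1 < x_1 < 1$ that is used in Example 1 of Section 5 (so that $h_1(x) = x_1$, $m_1 = -1$, $\bar{m}_1 = 1$, and $\gamma_1 = 1$ makes $\alpha_1 = 0$ at $x = 0$):

$(x_1 - 1)(x_1 + 1) + e^{\alpha_1(t)} = 0 \;\Longrightarrow\; \alpha_1 = \ln\left(1 - x_1^2\right),$
$\text{differentiating once:}\quad 2x_1\dot{x}_1 + e^{\alpha_1}\alpha_{1,1} = 0 \;\Longrightarrow\; \alpha_{1,1} = -\frac{2x_1\dot{x}_1}{1 - x_1^2}.$

If $\dot{x}_1$ does not depend on u (as in Example 1), one more differentiation exposes u, so $p_1 = 2$ and $\chi_1 = [\alpha_1, \alpha_{1,1}]^T$; boundedness of $\alpha_1$ keeps $|x_1| < 1$.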

Therefore, (8) is rewritten as

$\dot{x}_\alpha = F(x_\alpha, u), \qquad (10)$

where

$F(x_\alpha, u) = \begin{bmatrix} f(x, u) \\ A_1\chi_1 + B_1\left(-\Gamma_1(\alpha_{1,1}, \ldots, \alpha_{1,p_1-1}) - \frac{1}{\gamma_1}e^{-\alpha_1}\Pi_{1,p_1}(x_\alpha, u)\right) \\ \vdots \\ A_k\chi_k + B_k\left(-\Gamma_k(\alpha_{k,1}, \ldots, \alpha_{k,p_k-1}) - \frac{1}{\gamma_k}e^{-\alpha_k}\Pi_{k,p_k}(x_\alpha, u)\right) \\ \vdots \\ A_l\chi_l + B_l\left(-\Gamma_l(\alpha_{l,1}, \ldots, \alpha_{l,p_l-1}) - \frac{1}{\gamma_l}e^{-\alpha_l}\Pi_{l,p_l}(x_\alpha, u)\right) \end{bmatrix} \qquad (11)$

It is assumed that $F(x_\alpha, u)$ is a locally Lipschitz function. According to the initial condition of the system (1), the initial values $\alpha_k(t_0), \alpha_{k,1}(t_0), \ldots, \alpha_{k,p_k-1}(t_0)$ should be chosen such that (6) and (7) hold.

Remark 4. From the above transformation method, it is known that the constraints (2) will not be violated as long as the slack variables $\alpha_k(t)$ are kept bounded. Conversely, the slack variables $\alpha_k(t)$ remain bounded provided that the dynamics (1) can be stabilized without violating the constraints (2). Therefore, there exists a control policy u that stabilizes (10) if the COC problem admits a solution.

Then, the new objective is to design the optimal control u to stabilize (10) and optimize the following performance index:

$J_1(x_\alpha) = \int_t^{\infty} e^{\lambda(\tau - t)}\left[x_\alpha^T(\tau)Q_\alpha x_\alpha(\tau) + u^TRu\right]d\tau, \qquad (12)$

where $Q_\alpha = \mathrm{diag}(Q, 0)$ with appropriate dimensions. The mathematical expression of the new performance index (12) is the same as (3); however, there are no explicit inequality constraints for the augmented system (10).

To overcome the difficulty introduced by the nonaffine nonlinearities, the pre-compensator method [40] is used, so that u is governed by the following dynamic equation:

$\dot{u} = f_1(u) + f_2(u)\nu, \qquad (13)$

where $f_1(u)$ and $f_2(u)$ are Lipschitz continuous functions of u, which should be chosen to save the control energy as much as possible. Then, the new augmented system is obtained as

$\dot{\bar{x}}_\alpha = \bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\nu, \qquad (14)$

where

$\bar{x}_\alpha = \begin{bmatrix} x_\alpha \\ u \end{bmatrix}, \qquad \bar{F}(\bar{x}_\alpha) = \begin{bmatrix} F(x_\alpha, u) \\ f_1(u) \end{bmatrix}, \qquad \bar{G}(\bar{x}_\alpha) = \begin{bmatrix} 0 \\ f_2(u) \end{bmatrix}.$

In order to achieve the nearly optimal control, the performance index is redefined as

$V(\bar{x}_\alpha) = \int_t^{\infty} e^{\lambda(\tau - t)}\left[\bar{x}_\alpha^T(\tau)Q_{\bar{x}}\bar{x}_\alpha(\tau) + \zeta^2\nu^T\nu\right]d\tau, \qquad (15)$

where $Q_{\bar{x}} = \mathrm{diag}(Q_\alpha, R)$ with appropriate dimensions and $\zeta > 0$ is a small constant. Now, the constrained control problem for nonaffine nonlinear systems has been transformed into the above optimal control problem of the affine nonlinear system (14) with the discounted performance function (15).

Remark 5. Although the performance function (15) is not equivalent to (3), the stability of the original system (1) can be guaranteed without violating the constraint condition (2), and (3) can be optimized, if we obtain the optimal control policy for the dynamics (14) with the performance function (15). The transformed problem, rather than the original constrained problem, is suitable to be solved based on the ADP method.

Next, the definition of admissible control is presented, and it is assumed that there exist admissible control policies for the dynamics (14) on a set $\Omega \subset \mathbb{R}^{n + m + \sum_{k=1}^{l}p_k}$.

Definition 2 ([31,37] Admissible Control). A control policy $\mu(\bar{x}_\alpha)$ is said to be admissible with respect to (15) on Ω, denoted by $\mu(\bar{x}_\alpha) \in \Psi(\Omega)$, if $\mu(\bar{x}_\alpha)$ is continuous on Ω, $\mu(0) = 0$, $\nu(\bar{x}_\alpha) = \mu(\bar{x}_\alpha)$ stabilizes (14) on Ω, and $V(\bar{x}_\alpha(0))$ is finite for all $\bar{x}_\alpha(0) \in \Omega$.

Remark 6. The controller dynamics (13) should be designed such that $\dot{u} = f_1(u)$ is globally asymptotically stable and $f_2(u)$ is bounded for any u. Generally, we choose $f_1(u) = A_ou$ and $f_2(u) = B_o$, where $A_o$ and $B_o$ are constant matrices.
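To fix ideas, the composition of (13) and (14) can be written in a few lines of code. The sketch below is only a minimal illustration, assuming the constant choices of Remark 6 ($f_1(u) = A_ou$, $f_2(u) = B_o$) and a user-supplied routine `F` implementing (11):

```python
import numpy as np

def F_bar(xb, n_alpha, A_o, F):
    """Drift of the augmented model (14): the original drift F(x_alpha, u)
    stacked with the pre-compensator drift f1(u) = A_o u."""
    x_alpha, u = xb[:n_alpha], xb[n_alpha:]
    return np.concatenate([F(x_alpha, u), A_o @ u])

def G_bar(n_alpha, B_o):
    """Input matrix of (14): zeros for the x_alpha block, f2(u) = B_o below
    (constant, per Remark 6)."""
    return np.vstack([np.zeros((n_alpha, B_o.shape[1])), B_o])
```

With the control $\nu$ from (18) below, the closed loop is then $\dot{\bar{x}}_\alpha = \bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\nu$.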

4. Adaptive AC algorithm for the augmented dynamics

In this section, an adaptive AC algorithm is designed to solve the transformed optimal control problem for the dynamics (14) with the discounted performance index (15). Some approximate optimal control schemes for nonlinear systems with discounted performance functions have been developed in [18] and [37], where probing signals are generally required to guarantee the convergence of the weight estimations. Based on the augmented dynamic model (14), an online AC algorithm with redesigned adaptive update laws is presented in this section, in which the probing signal is not required.

For an admissible control policy μ(t), the infinitesimal version of (15) is obtained as

$\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu^T\mu + V_{\bar{x}_\alpha}^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu\right) + \lambda V(\bar{x}_\alpha) = 0, \qquad (16)$

where $V_{\bar{x}_\alpha} = \partial V/\partial\bar{x}_\alpha$. Define the Hamiltonian function as

$H(\bar{x}_\alpha, \mu, V_{\bar{x}_\alpha}) = \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu^T\mu + V_{\bar{x}_\alpha}^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu\right) + \lambda V(\bar{x}_\alpha). \qquad (17)$

Then, the optimal cost function $V^*(\bar{x}_\alpha)$ is defined as

$V^*(\bar{x}_\alpha) = \min_{\mu \in \Psi(\Omega)}\left(\int_t^{\infty} e^{\lambda(\tau - t)}\left[\bar{x}_\alpha^T(\tau)Q_{\bar{x}}\bar{x}_\alpha(\tau) + \zeta^2\mu^T\mu\right]d\tau\right),$

which satisfies the HJB equation $\min_{\mu \in \Psi(\Omega)}(H(\bar{x}_\alpha, \mu, V^*_{\bar{x}_\alpha})) = 0$. It is assumed that this minimum exists and is unique. Based on the stationarity condition [17], the optimal control input ν* is obtained as

$\nu^* = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)V^*_{\bar{x}_\alpha}, \qquad (18)$

where $V^*_{\bar{x}_\alpha} = \partial V^*/\partial\bar{x}_\alpha$. Substituting ν* back into (16) results in the following HJB equation in terms of $V^*_{\bar{x}_\alpha}$:

$\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + V^{*T}_{\bar{x}_\alpha}\bar{F}(\bar{x}_\alpha) - \frac{1}{4\zeta^2}V^{*T}_{\bar{x}_\alpha}\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)V^*_{\bar{x}_\alpha} + \lambda V^* = 0. \qquad (19)$
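For completeness, (18) follows from the stationarity condition applied to (17):

$\frac{\partial H}{\partial\mu} = 2\zeta^2\mu + \bar{G}^T(\bar{x}_\alpha)V^*_{\bar{x}_\alpha} = 0 \;\Longrightarrow\; \mu = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)V^*_{\bar{x}_\alpha} = \nu^*,$

and since $\partial^2H/\partial\mu^2 = 2\zeta^2I > 0$, this stationary point is indeed the minimizer.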

Since the discounted performance function (15) is considered in this paper, it should be analyzed whether the optimal control policy (18) can guarantee the stability of the nonlinear system (14) when λ ≠ 0. Considering the results for constrained-input nonlinear systems in [37] and for the H∞ tracking control problem in [41], the following lemma is presented.

Lemma 1. For the system (14), assume that V* is the optimal value function for the optimal control problem with the performance function (15), and that ν* is the optimal control policy (18) minimizing the performance index (15). When the discount factor λ ≥ 0, the optimal control policy ν* guarantees asymptotic stability of the closed-loop augmented dynamics.

Proof. For any admissible control μ and the value function $V(\bar{x}_\alpha(0))$, it can be obtained that

$V(\bar{x}_\alpha(0), \mu) = \int_0^{\infty} e^{\lambda\tau}\left[\bar{x}_\alpha^T(\tau)Q_{\bar{x}}\bar{x}_\alpha(\tau) + \zeta^2\mu^T\mu\right]d\tau$
$= \int_0^{\infty} e^{\lambda\tau}\left[\bar{x}_\alpha^T(\tau)Q_{\bar{x}}\bar{x}_\alpha(\tau) + \zeta^2\mu^T\mu\right]d\tau + \int_0^{\infty}\frac{d}{d\tau}\left(e^{\lambda\tau}V^*(\bar{x}_\alpha)\right)d\tau + V^*(\bar{x}_\alpha(0))$
$= \int_0^{\infty} e^{\lambda\tau}\left[\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu^T\mu + (V^*_{\bar{x}_\alpha})^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu\right) + \lambda V^*(\bar{x}_\alpha)\right]d\tau + V^*(\bar{x}_\alpha(0)), \qquad (20)$

where the second equality uses $\int_0^{\infty}\frac{d}{d\tau}(e^{\lambda\tau}V^*(\bar{x}_\alpha))d\tau = -V^*(\bar{x}_\alpha(0))$ along stabilizing trajectories. Based on (17), the above equation is rewritten as

$V(\bar{x}_\alpha(0), \mu) = \int_0^{\infty} e^{\lambda\tau}H(\bar{x}_\alpha, \mu, V^*_{\bar{x}_\alpha})\,d\tau + V^*(\bar{x}_\alpha(0)). \qquad (21)$

Considering the optimal value function V*, the Hamiltonian function is obtained as

$H(\bar{x}_\alpha, \mu, V^*_{\bar{x}_\alpha}) = \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu^T\mu + (V^*_{\bar{x}_\alpha})^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu\right) + \lambda V^*(\bar{x}_\alpha)$
$= \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\nu^{*T}\nu^* + (V^*_{\bar{x}_\alpha})^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\nu^*\right) + \lambda V^*(\bar{x}_\alpha) + \zeta^2\mu^T\mu - \zeta^2\nu^{*T}\nu^* + (V^*_{\bar{x}_\alpha})^T\bar{G}(\bar{x}_\alpha)(\mu - \nu^*). \qquad (22)$

Using the HJB Eq. (19) and the optimal control policy (18), it follows that

$H(\bar{x}_\alpha, \mu, V^*_{\bar{x}_\alpha}) = \zeta^2\mu^T\mu - \zeta^2\nu^{*T}\nu^* + (V^*_{\bar{x}_\alpha})^T\bar{G}(\bar{x}_\alpha)(\mu - \nu^*)$
$= \zeta^2\mu^T\mu - \zeta^2\nu^{*T}\nu^* - 2\zeta^2\nu^{*T}\mu + 2\zeta^2\nu^{*T}\nu^* = \zeta^2(\mu - \nu^*)^T(\mu - \nu^*) \ge 0. \qquad (23)$

Therefore, (21) is rewritten as

$V(\bar{x}_\alpha(0), \mu) = \int_0^{\infty} e^{\lambda\tau}\zeta^2(\mu - \nu^*)^T(\mu - \nu^*)\,d\tau + V^*(\bar{x}_\alpha(0)), \qquad (24)$

which demonstrates that $V(\bar{x}_\alpha(0), \mu) \ge V^*(\bar{x}_\alpha(0))$ for all μ ≠ ν*; that is, the optimal control policy ν* minimizes the performance index (15).

Then, for the augmented dynamics (14) with the admissible control ν = μ and any value function $V(\bar{x}_\alpha)$, we have

$\frac{dV(\bar{x}_\alpha)}{dt} = V_{\bar{x}_\alpha}^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu\right). \qquad (25)$

With the HJB equation (19) and the optimal control policy (18), one has

$\frac{dV^*(\bar{x}_\alpha)}{dt} = -\lambda V^* - \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha - \zeta^2\nu^{*T}\nu^*. \qquad (26)$

Multiplying both sides of the above equation by $e^{\lambda t}$ yields

$\frac{d}{dt}\left(e^{\lambda t}V^*(\bar{x}_\alpha)\right) = -e^{\lambda t}\left(\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\nu^{*T}\nu^*\right) \le 0, \qquad (27)$

which means that the closed-loop state is asymptotically stable when λ ≥ 0. □

Remark 7. Actually, considering (26) with λ < 0 on the compact set Ω, the locally asymptotic stability of the augmented dynamics can be obtained by choosing a small enough discount factor |λ| and a large enough Q (see [37] and [41]).

To solve the optimal control problem (18) and (19), the PI algorithm is given as follows:

(1) Choose an initial admissible control policy $\mu_0(\bar{x}_\alpha)$ and a proper small scalar $\varpi_b > 0$. Let i = 0 and $V^0 = 0$.

(2) With the control policy $\mu_i(\bar{x}_\alpha)$, solve the following equation for $V^i$:

$\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu_i^T\mu_i + (V^i_{\bar{x}_\alpha})^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu_i\right) + \lambda V^i = 0, \quad V^i(0) = 0. \qquad (28)$

(3) Update the control policy with

$\mu_{i+1} = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)V^i_{\bar{x}_\alpha}. \qquad (29)$

(4) Let i = i + 1 and go back to step (2) until $\|V^{i+1} - V^i\| \le \varpi_b$.

In [18] and [37], it was shown that the iteration on Eqs. (28) and (29) with an initial admissible policy $\mu_0(\bar{x}_\alpha)$ converges to the solution of the HJB Eq. (19). Despite this, the optimal control policy cannot be obtained directly, because step (2) of the PI algorithm is still difficult to solve. In the following, an adaptive AC structure is presented to learn the value function and the optimal control policy based on the PI algorithm.
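Schematically, the iteration (28)-(29) reads as below. This is only a sketch of the control flow, not an implementation: `evaluate_policy` and `value_distance` are hypothetical placeholders, the former standing for any solver of (28) that returns the value function and its gradient, and the latter for a norm of the difference of two value functions. Solving (28) is precisely the step the actor-critic scheme avoids.

```python
def policy_iteration(mu0, evaluate_policy, G_bar, zeta=1.0, tol=1e-4):
    """Schematic PI loop for (28)-(29): evaluate, improve, test convergence."""
    mu, V_prev = mu0, None
    while True:
        V, grad_V = evaluate_policy(mu)          # step (2): solve (28) for V^i
        def mu_next(x, gV=grad_V):               # step (3): improvement (29)
            return -(G_bar(x).T @ gV(x)) / (2.0 * zeta**2)
        if V_prev is not None and value_distance(V, V_prev) <= tol:
            return mu_next                       # step (4): stop criterion
        mu, V_prev = mu_next, V
```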


On the compact set Ω, the optimal value function V* can be approximated as

$V^* = W^T\sigma(\bar{x}_\alpha) + \varepsilon(\bar{x}_\alpha), \qquad (30)$

where $W = [w_1, w_2, \ldots, w_N]^T$ is the ideal weight vector, $\sigma(\bar{x}_\alpha) = [\sigma_1, \sigma_2, \ldots, \sigma_N]^T$ is the activation function vector, and $\varepsilon(\bar{x}_\alpha)$ is the approximation error. Therefore, the derivative of V* with respect to $\bar{x}_\alpha$ is $V^*_{\bar{x}_\alpha} = \nabla\sigma^TW + \nabla\varepsilon$. Using the NN approximation (30), the HJB Eq. (19) can be rewritten as

$\varepsilon_{HJB} = \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + W^T\nabla\sigma\bar{F}(\bar{x}_\alpha) - \frac{1}{4\zeta^2}W^T\nabla\sigma\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^TW + \lambda W^T\sigma, \qquad (31)$

where $\varepsilon_{HJB}$ is the residual error resulting from the ideal NN approximation errors. As in [31] and [37], $\varepsilon_{HJB}$ is assumed to be bounded on the compact set Ω.

The AC structure contains a critic NN and an actor NN: the critic NN is used to approximate the value function, and the actor NN is used to approximate the optimal control policy. That is,

$\hat{V} = \hat{W}_c^T\sigma(\bar{x}_\alpha), \qquad (32)$

$\mu_a = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\hat{W}_a, \qquad (33)$

where $\hat{W}_c$ and $\hat{W}_a$ are the estimates of the ideal weight vector W. Define the weight estimation errors $\tilde{W}_c = W - \hat{W}_c$ and $\tilde{W}_a = W - \hat{W}_a$.

Based on the NN approximation (32) and the HJB equation, the residual error is obtained as

$\xi_c = \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu_a^T\mu_a + \hat{W}_c^T\nabla\sigma\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu_a\right) + \lambda\hat{W}_c^T\sigma. \qquad (34)$

Inspired by the PE-like rank condition in [33], the weight update law of the critic NN is designed to minimize the residual error $\xi_c$ based on the augmented dynamics (14), that is,

$\dot{\hat{W}}_c = T_{w1} + T_{w2}, \qquad (35)$

where

$T_{w1} = -a_{c1}\bar{\beta}\left(\beta^T\hat{W}_c + \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu_a^T\mu_a\right),$
$T_{w2} = -a_{c2}\sum_{i=1}^{h}\bar{\beta}_i\left(\beta_i^T\hat{W}_c + s_i^TQ_{\bar{x}}s_i + \zeta^2\mu_a^T(s_i)\mu_a(s_i)\right),$
$\beta = \nabla\sigma\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu_a\right) + \lambda\sigma, \quad \bar{\beta} = \beta/(\beta^T\beta + 1)^2,$
$\beta_i = \nabla\sigma(s_i)\left(\bar{F}(s_i) + \bar{G}(s_i)\mu_a(s_i)\right) + \lambda\sigma(s_i), \quad \bar{\beta}_i = \beta_i/(\beta_i^T\beta_i + 1)^2,$

and $a_{c1}$ and $a_{c2}$ are positive constants. Enough pre-sampled points $s_i$, i = 1, 2, …, h, are selected randomly in a small compact set $\Omega_{\bar{x}_\alpha} \subset \Omega$ so that

$\sum_{i=1}^{h}\bar{\beta}_{mi}\bar{\beta}_{mi}^T > k_\beta I, \qquad (36)$

where $\bar{\beta}_{mi} = \beta_i/(\beta_i^T\beta_i + 1)$ and $k_\beta > 0$. The weight update law of the actor NN is given by

$\dot{\hat{W}}_a = \mathrm{Proj}\left\{-a_2\left[F_a\left(\hat{W}_a - \hat{W}_c\right)\right]\right\}, \qquad (37)$

where $a_2$ is a positive constant and $F_a$ is a positive definite matrix. The projection operator Proj{·} makes sure that the estimate $\hat{W}_a$ is bounded [42]; since W is a constant vector, the error $\tilde{W}_a$ is also bounded.

Remark 8. The number of activation functions should be chosen properly. More activation functions can achieve a better approximation, but the computational complexity is increased. Generally, h should be selected larger than the number of activation functions, and a large h is required to make (36) hold; however, a too large h leads to a high computational burden. There is thus a tradeoff between the performance and the computational complexity.

Remark 9. The AC structure in [43] and [44] is used in this paper. However, the adaptive weight update law of the critic NN is different from the one in [43], where a probing signal is required to guarantee the PE condition. Although the proposed weight update laws are similar to the ones in [44], there the faulty system model has to be learnt online based on fault estimations and a bounded nearly optimal complementary control is desired, which makes the HJB equation and the control policy different from the ones in this paper. Moreover, the optimal control problem with inequality constraints cannot be handled using the schemes proposed in [43] and [44].

Then, the following theorem is given to show the system stability and the convergence of the NN weights.

Theorem 1. For the augmented dynamics (14) with Assumptions 1-2, the controller (33) with the NN weight update laws (35) and (37) guarantees that the augmented state $\bar{x}_\alpha$ and the NN weight approximation errors are uniformly ultimately bounded.

Proof. The Lyapunov function is chosen as

$L(t) = V(\bar{x}_\alpha) + L_{wc} + L_{wa}, \qquad (40)$

where $V(\bar{x}_\alpha)$ is the optimal cost function, $L_{wc} = \frac{1}{2}\tilde{W}_c^T\tilde{W}_c$ and $L_{wa} = \frac{1}{2}\tilde{W}_a^Ta_2^{-1}\tilde{W}_a$.

Firstly, we have

$\dot{V}(\bar{x}_\alpha) = V_{\bar{x}_\alpha}^T\left(\bar{F}(\bar{x}_\alpha) + \bar{G}(\bar{x}_\alpha)\mu_a\right). \qquad (38)$

Based on the HJB equation and the NN approximation, (38) is rewritten as

$\dot{V}(\bar{x}_\alpha) = -\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \frac{1}{4\zeta^2}(\nabla\sigma^TW + \nabla\varepsilon)^T\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)(\nabla\sigma^TW + \nabla\varepsilon) - \lambda(W^T\sigma + \varepsilon) + (\nabla\sigma^TW + \nabla\varepsilon)^T\bar{G}(\bar{x}_\alpha)\left(-\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\hat{W}_a\right)$
$= -\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha - \frac{1}{4\zeta^2}W^T\nabla\sigma\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^TW + \frac{1}{2\zeta^2}W^T\nabla\sigma\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\tilde{W}_a + \varepsilon_{v1}, \qquad (39)$

where

$\varepsilon_{v1} = \frac{1}{4\zeta^2}\nabla\varepsilon^T\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\varepsilon + \frac{1}{2\zeta^2}\nabla\varepsilon^T\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\tilde{W}_a - \lambda(W^T\sigma + \varepsilon).$

Then, using the update law (35), it follows that

$\dot{L}_{wc} = \tilde{W}_c^T\dot{\tilde{W}}_c = \tilde{W}_c^T(-T_{w1} - T_{w2}).$

Based on the error Eq. (31), it is obtained that

$T_{w1} = -a_{c1}\bar{\beta}\left(\beta^T\hat{W}_c + \bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha + \zeta^2\mu_a^T\mu_a\right) = -a_{c1}\bar{\beta}\left(-\beta^T\tilde{W}_c + \frac{1}{4\zeta^2}\tilde{W}_a^T\nabla\sigma\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\tilde{W}_a + \varepsilon_{HJB}\right). \qquad (41)$

Similarly, we have

$T_{w2} = -a_{c2}\sum_{i=1}^{h}\bar{\beta}_i\left(\beta_i^T\hat{W}_c + s_i^TQ_{\bar{x}}s_i + \zeta^2\mu_a^T(s_i)\mu_a(s_i)\right) = -a_{c2}\sum_{i=1}^{h}\bar{\beta}_i\left(-\beta_i^T\tilde{W}_c + \frac{1}{4\zeta^2}\tilde{W}_a^T\nabla\sigma(s_i)\bar{G}(s_i)\bar{G}^T(s_i)\nabla\sigma^T(s_i)\tilde{W}_a + \varepsilon_{HJB}(s_i)\right). \qquad (42)$

Therefore, substituting (41) and (42) into the expression of $\dot{L}_{wc}$ yields

$\dot{L}_{wc} = -a_{c1}\tilde{W}_c^T\beta\bar{\beta}^T\tilde{W}_c - a_{c2}\tilde{W}_c^T\sum_{i=1}^{h}\bar{\beta}_i\beta_i^T\tilde{W}_c + \frac{a_{c1}}{4\zeta^2}\tilde{W}_c^T\bar{\beta}\,\tilde{W}_a^T\nabla\sigma\bar{G}(\bar{x}_\alpha)\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\tilde{W}_a + \frac{a_{c2}}{4\zeta^2}\tilde{W}_c^T\sum_{i=1}^{h}\bar{\beta}_i\,\tilde{W}_a^T\nabla\sigma(s_i)\bar{G}(s_i)\bar{G}^T(s_i)\nabla\sigma^T(s_i)\tilde{W}_a + \tilde{W}_c^T\left(a_{c1}\bar{\beta}\varepsilon_{HJB} + a_{c2}\sum_{i=1}^{h}\bar{\beta}_i\varepsilon_{HJB}(s_i)\right). \qquad (43)$

Based on the property of the projection operator in [42], we have

$\dot{L}_{wa} = \tilde{W}_a^Ta_2^{-1}\left(-\dot{\hat{W}}_a\right) = -\tilde{W}_a^Ta_2^{-1}\mathrm{Proj}\left\{-a_2\left[F_a(\hat{W}_a - \hat{W}_c)\right]\right\} \le \tilde{W}_a^T\left[F_a(\hat{W}_a - \hat{W}_c)\right] = \tilde{W}_a^TF_a(\tilde{W}_c - \tilde{W}_a). \qquad (44)$

Considering (39), (43) and (44), the derivative of the Lyapunov function L(t) is bounded as

$\dot{L}(t) \le -\bar{x}_\alpha^TQ_{\bar{x}}\bar{x}_\alpha - a_{c2}\tilde{W}_c^T\sum_{i=1}^{h}\bar{\beta}_i\beta_i^T\tilde{W}_c + \tilde{W}_c^T\Xi_1 + \Xi_2,$

where the nonpositive term $-a_{c1}\tilde{W}_c^T\beta\bar{\beta}^T\tilde{W}_c$ has been dropped and

$\Xi_1 = \frac{a_{c1}}{4\zeta^2}\bar{\beta}\,\tilde{W}_a^T\nabla\sigma\bar{G}\bar{G}^T\nabla\sigma^T\tilde{W}_a + \frac{a_{c2}}{4\zeta^2}\sum_{i=1}^{h}\bar{\beta}_i\,\tilde{W}_a^T\nabla\sigma(s_i)\bar{G}(s_i)\bar{G}^T(s_i)\nabla\sigma^T(s_i)\tilde{W}_a + a_{c1}\bar{\beta}\varepsilon_{HJB} + a_{c2}\sum_{i=1}^{h}\bar{\beta}_i\varepsilon_{HJB}(s_i) + F_a^T\tilde{W}_a,$
$\Xi_2 = -\frac{1}{4\zeta^2}W^T\nabla\sigma\bar{G}\bar{G}^T\nabla\sigma^TW + \frac{1}{2\zeta^2}W^T\nabla\sigma\bar{G}\bar{G}^T\nabla\sigma^T\tilde{W}_a + \varepsilon_{v1} - \tilde{W}_a^TF_a\tilde{W}_a.$

Based on the condition (36) and the above analysis, it can be obtained that

$\dot{L}(t) \le -\lambda_{\min}(Q_{\bar{x}})\|\bar{x}_\alpha\|^2 - a_{c2}k_\beta\|\tilde{W}_c\|^2 + \bar{\Xi}_1\|\tilde{W}_c\| + \bar{\Xi}_2, \qquad (45)$

where $\bar{\Xi}_1 \ge \|\Xi_1\|$ and $\bar{\Xi}_2 \ge \|\Xi_2\|$. Defining $\bar{X} = [\|\bar{x}_\alpha\|, \|\tilde{W}_c\|]^T$, (45) is rewritten as

$\dot{L}(t) \le -\bar{X}^T\Psi_L\bar{X} + \bar{X}^T\Phi_L + \bar{\Xi}_2,$

where

$\Psi_L = \begin{bmatrix}\lambda_{\min}(Q_{\bar{x}}) & 0 \\ 0 & a_{c2}k_\beta\end{bmatrix} > 0, \qquad \Phi_L = \begin{bmatrix}0 \\ \bar{\Xi}_1\end{bmatrix}.$

Then, we have

$\dot{L}(t) < -\lambda_{\min}(\Psi_L)\|\bar{X}\|^2 + \bar{\Phi}_L\|\bar{X}\| + \bar{\Xi}_2$

with $\bar{\Phi}_L \ge \|\Phi_L\|$. Therefore, $\dot{L}(t)$ is negative when

$\|\bar{X}\| > \frac{\bar{\Phi}_L + \sqrt{\bar{\Phi}_L^2 + 4\lambda_{\min}(\Psi_L)\bar{\Xi}_2}}{2\lambda_{\min}(\Psi_L)},$

which means that the system state $\bar{x}_\alpha$ and the NN weight approximation error $\tilde{W}_c$ are bounded. The projection operator guarantees that $\tilde{W}_a$ is bounded. The proof is therefore complete. □

Remark 10. In the adaptive weight update laws (35) and (37), $a_{c1}$, $a_{c2}$ and $a_2$ are proper positive constants, and $F_a$ is a positive definite matrix. Large $a_{c1}$, $a_{c2}$ and $F_a$ can increase the response rate of the critic weight update law and reduce the weight approximation errors; however, if these parameters are too large, high-frequency oscillations may occur and affect the system stability in practice.

Remark 11. If general optimal control methods are used instead of designing the augmented dynamics (14), the system stability can still be guaranteed, but it is very difficult for the system to meet the constraint requirements on the state variables. By introducing the augmented states $\alpha_k(t)$, which are further used to design the nearly optimal controller based on the ADP method, the requirements for the system performance and the state constraints can both be met, because the nearly optimal control policy (33) guarantees the stability of the augmented dynamics (14).

5. Simulation results

To demonstrate the effectiveness of the proposed adaptive optimal control method for constrained nonlinear systems, simulation results and the corresponding analysis are given as follows.
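Both examples below follow the same pattern: integrate the closed-loop augmented dynamics under the learned policy from many initial states and check the constraint along the trajectories. A minimal harness of this kind might look as follows (SciPy; `F_bar`, `G_bar`, `policy` and `constraint` are placeholders for the example-specific model, learned controller and constraint test):

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate(x0, policy, F_bar, G_bar, constraint, t_final=10.0):
    """Integrate the closed loop (14) under nu = policy(x_bar) and report
    whether the state constraint held along the trajectory."""
    def closed_loop(t, xb):
        return F_bar(xb) + G_bar(xb) * policy(xb)   # scalar-input case
    sol = solve_ivp(closed_loop, (0.0, t_final), x0, max_step=1e-2)
    ok = all(constraint(xb) for xb in sol.y.T)      # e.g. |x1| < 1 in Example 1
    return sol.t, sol.y, ok
```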

Example 1. In this example, the nonlinear system in [45] is considered:

$\dot{x}_1 = -x_1 + x_2 + 2x_2^3,$
$\dot{x}_2 = -0.5(x_1 + x_2) + \sin(x_1)u, \qquad (46)$

where $x(t) = [x_1, x_2]^T$ is the system state. The constraint is considered as $-1 < x_1 < 1$.

To show the effectiveness of the proposed scheme in keeping the state within the constraint set, the novel ADP method in [45] is first adopted to control the system (46). In [45], a pre-compensation technique is used so that the algorithm does not depend on the system model; however, the state constraints are not considered there. Using the pre-compensation technique, the augmented dynamic model is obtained as

$\dot{x}_1 = -x_1 + x_2 + 2x_2^3,$
$\dot{x}_2 = -0.5(x_1 + x_2) + \sin(x_1)x_3,$
$\dot{x}_3 = k_ax_3 + k_bu_v, \qquad (47)$

where $x_3$ is the original control input u and $u_v$ is regarded as the new control command. To make the response rate high enough, we choose $k_a = -80$ and $k_b = 10$. Without considering the state constraint, the nearly optimal control policy $u_v = -\frac{1}{2\zeta^2}\bar{G}^T(x)\nabla\sigma^T\hat{W}_a$ with $\bar{G}^T(x) = [0, 0, k_b]$ could be obtained using the adaptive optimal control algorithm in [45] or in Section 4 of this paper. Choose the activation function vector of the critic NN as

$\sigma(x) = \left[x_1^2,\ x_1x_2,\ x_3x_1,\ x_2^2,\ x_3x_2,\ x_3^2\right]^T.$
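For intuition, evaluating the policy in this example reduces to a few lines. The following sketch is a minimal NumPy version, assuming ζ = 1 and $k_b = 10$ as in the text; the helper name `grad_sigma` is illustrative, not from the original:

```python
import numpy as np

def grad_sigma(x):
    """Jacobian of sigma(x) = [x1^2, x1*x2, x3*x1, x2^2, x3*x2, x3^2]^T
    with respect to x = [x1, x2, x3]; rows index the basis functions."""
    x1, x2, x3 = x
    return np.array([
        [2*x1, 0.0,  0.0 ],
        [x2,   x1,   0.0 ],
        [x3,   0.0,  x1  ],
        [0.0,  2*x2, 0.0 ],
        [0.0,  x3,   x2  ],
        [0.0,  0.0,  2*x3],
    ])

def policy_uv(x, W_a, kb=10.0, zeta=1.0):
    """Nearly optimal control u_v = -(1/(2*zeta^2)) * G^T * grad_sigma^T * W_a,
    with G = [0, 0, kb]^T for the pre-compensated model (47)."""
    G = np.array([0.0, 0.0, kb])
    V_x = grad_sigma(x).T @ W_a        # approximate value-function gradient
    return -G @ V_x / (2.0 * zeta**2)
```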

The other parameters are chosen as Q = diag(1, 1), R = 1, ζ = 1 and λ = 0. The initial NN weights are chosen randomly in the range [−1, 1]. Then, the weights of the actor NN converge to $\hat{W}_a = [0.9823, 0.6039, 0.5339, 0.2105, 0.0688, 0.0062]^T$. Using the nearly optimal control policy $u_v = -\frac{1}{2\zeta^2}\bar{G}^T(x)\nabla\sigma^T\hat{W}_a$ and choosing 30 different initial states, the 3D plot of the state trajectories is given in Fig. 1.

Fig. 1. The state trajectories for 30 different initial states based on the control policy uv.

Then, using the transformation method proposed in this paper, the state constraint $-1 < x_1 < 1$ is first rewritten as $(x_1 - 1)(x_1 + 1) < 0$, which is transformed into the equality constraint $(x_1 - 1)(x_1 + 1) + e^{\alpha_1(t)} = 0$ by introducing a slack variable $\alpha_1(t)$. Differentiating the equality with respect to time until u appears yields (7) with l = 1, and the augmented system (10) is then obtained. Although the augmented dynamics is an affine nonlinear equation for this example, it is not reasonable to use the actor-critic algorithm directly, because the input matrix is required to be bounded in Theorem 1. Therefore, the pre-compensator is designed so that u is governed by (13) with $f_1(u) = -80u$ and $f_2(u) = 10$. Define $\bar{x}_\alpha = [x_1, x_2, \alpha_1, \alpha_{1,1}, x_3]^T$, where $x_3 = u$. The final augmented dynamics is obtained as (14). Choose the activation function vector of the NN as

$\sigma(\bar{x}_\alpha) = \left[x_1^2,\ x_2^2,\ x_3^2,\ \alpha_1^2,\ \alpha_{1,1}^2,\ x_3\alpha_1,\ x_3\alpha_{1,1},\ x_3x_1,\ x_3x_2\right]^T.$
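For readers implementing the learning stage, a minimal sketch of one Euler step of the update laws (35) and (37) is given below. It is only an illustration under simplifying assumptions: `sigma`, `grad_sigma`, `F_bar` and `G_bar` are hypothetical helpers for the example model, the pre-sampled points $s_i$ enter through `samples`, and the projection operator is simplified to norm clipping, one common realization of Proj{·}:

```python
import numpy as np

def ac_update_step(Wc, Wa, xb, samples, Qx, dt,
                   zeta=1.0, lam=0.0, ac1=1.0, ac2=1.0, a2=1.0,
                   Fa=None, Wmax=50.0):
    """One Euler step of the critic law (35) and actor law (37), scalar input."""
    Fa = np.eye(len(Wa)) if Fa is None else Fa

    def critic_term(x):
        gs = grad_sigma(x)                                  # N x n Jacobian
        mu = -(G_bar(x) @ (gs.T @ Wa)) / (2.0 * zeta**2)    # actor policy (33)
        beta = gs @ (F_bar(x) + G_bar(x) * mu) + lam * sigma(x)
        xi = beta @ Wc + x @ Qx @ x + zeta**2 * mu**2       # residual (34)
        return beta / (beta @ beta + 1.0)**2 * xi           # normalized term

    dWc = -ac1 * critic_term(xb) - ac2 * sum(critic_term(s) for s in samples)
    Wc = Wc + dt * dWc                                      # critic law (35)
    Wa = Wa + dt * (-a2 * (Fa @ (Wa - Wc)))                 # actor law (37)
    norm = np.linalg.norm(Wa)                               # norm clipping as a
    if norm > Wmax:                                         # stand-in for Proj{.}
        Wa = Wa * (Wmax / norm)
    return Wc, Wa
```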

The other parameters are chosen as Q = diag(1, 1), R = 1, ζ = 1 and λ = 0. The initial NN weights are selected randomly in the range [−1, 1]. The state trajectories are plotted in Fig. 2, and the weight estimations of the critic and actor NNs are shown in Figs. 3 and 4. The weight estimations of the actor NN converge to $\hat{W}_a = [0.2639, 0.1313, 0.1127, 0.2294, 0.4994, -0.4464, -0.3374, -0.0787, -0.2533]^T$.

Fig. 2. Evolution of the augmented system states.

Fig. 3. Weight estimations Ŵc of the critic NN.

Fig. 4. Weight estimations Ŵa of the actor NN.

Using the nearly optimal control policy $\nu = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\hat{W}_a$ with $\bar{G}^T(\bar{x}_\alpha) = [0, 0, 0, 0, 10]$ and choosing 30 different initial states as in Fig. 1, the 3D plot of the trajectories of $x_1$, $x_2$ and $x_3$ is given in Fig. 5.

Fig. 5. The state trajectories for 30 different initial states based on the proposed algorithm.

In Fig. 1, it is found that the state constraint $-1 < x_1 < 1$ cannot be satisfied, which demonstrates that the method proposed in [45] cannot be used to handle the constrained nonlinear control problem. Fig. 5 shows that the stability of the system is guaranteed and the state constraint is not violated. The actual control input is $u = x_3$, which is plotted in Figs. 1 and 5. Compared with Fig. 1, more control effort is required in Fig. 5 to guarantee that the state satisfies the constraints; indeed, it is reasonable that a larger control input is needed to avoid violating the state constraints. The above results show that the proposed method is effective in dealing with the constrained nonlinear control problem.

Example 2. In this example, a numerical analysis is carried out on a torsional pendulum system [46]. Input nonlinearities commonly appear in nonlinear systems and are often modeled as hyperbolic tangent functions of the control input [32]. The dynamics of the pendulum with the input nonlinearity are given as:


$\frac{d\theta_e}{dt} = \omega_e,$
$J_e\frac{d\omega_e}{dt} = -M_egl_e\sin(\theta_e) - f_d\frac{d\theta_e}{dt} + b_e(\tanh(u) + u), \qquad (48)$

where $\theta_e$ is the angular position of the pendulum and $\omega_e$ is the angular rate; $b_e = 1$ is a constant. The mass and the length of the pendulum are $M_e = 1/3$ kg and $l_e = 2/3$ m, and $J_e = \frac{4}{3}M_el_e^2\ \mathrm{kg\,m^2}$ and $f_d = 0.2$ are the rotary inertia and the frictional factor, respectively. In this example, the state constraint is considered as $\theta_e^2/l_a^2 + \omega_e^2/l_b^2 < 1$ with $l_a = 1.2$ and $l_b = 1.5$, which means that the system state vector $[\theta_e, \omega_e]$ stays in a proper elliptic region.

Firstly, even if the above constraint is not considered, the nearly optimal control problem for (48) is not easy to solve, because the dynamic equation of (48) is nonaffine with respect to the control input. However, the pre-compensator method proposed in Section 3 can be used to transform the nonaffine nonlinear dynamic equation into an affine one by introducing (13) with $f_1(u) = -80u$ and $f_2(u) = 10$. Define the augmented state $\bar{x}_u = [\theta_e, \omega_e, u]^T$ and choose Q = diag(1, 1), R = 1, ζ = 1 and λ = 0.05. Without considering the state constraint, the nearly optimal control policy $u_{v2} = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_u)\nabla\sigma^T\hat{W}_a$ with $\bar{G}^T(\bar{x}_u) = [0, 0, 10]$ could be obtained using the adaptive optimal control algorithm proposed in Section 4. Using the resulting nearly optimal control policy $u_{v2}$ and choosing different initial states, the state trajectories are plotted in Fig. 6.

Fig. 6. The state trajectories for different initial states based on the control policy uv2.

Then, the inequality constraint is considered. By introducing a slack variable $\alpha_1(t)$, the constraint $\theta_e^2/l_a^2 + \omega_e^2/l_b^2 < 1$ is transformed into $\theta_e^2/l_a^2 + \omega_e^2/l_b^2 - 1 + e^{\alpha_1(t)} = 0$. Based on the transformation method proposed in Section 3, the augmented dynamics (14) can be obtained. Choose Q = diag(1, 1), R = 1, ζ = 1 and λ = 0.05. The activation function vector of the NN is selected as

$\sigma(\bar{x}_\alpha) = \left[\theta_e^2,\ \theta_e\omega_e,\ \alpha_1\theta_e,\ \theta_eu,\ \omega_e^2,\ \omega_e\alpha_1,\ \omega_eu,\ \alpha_1^2,\ \alpha_1u,\ u^2\right]^T.$

The initial NN weights are chosen randomly in the range [−1, 1], and the nearly optimal control policy (33) with $\bar{G}^T(\bar{x}_\alpha) = [0, 0, 0, 10]$ could be obtained using the AC algorithm proposed in Section 4. The state trajectories are plotted in Fig. 7, and the weight estimations of the critic and actor NNs are shown in Figs. 8 and 9. The weight estimations of the actor NN converge to $\hat{W}_a = [0.4595, 0.2550, -0.0090, 0.5274, 0.1229, 0.3233, 0.2583, 0.3344, 0.2177, 0.3659]^T$.

Fig. 7. Evolution of the augmented system states.

Fig. 8. Weight estimations Ŵc of the critic NN.

Fig. 9. Weight estimations Ŵa of the actor NN.
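It is worth noting that the slack-variable equality $\theta_e^2/l_a^2 + \omega_e^2/l_b^2 - 1 + e^{\alpha_1(t)} = 0$ gives $\alpha_1$ in closed form along any constraint-satisfying trajectory:

$e^{\alpha_1(t)} = 1 - \frac{\theta_e^2(t)}{l_a^2} - \frac{\omega_e^2(t)}{l_b^2} \;\Longrightarrow\; \alpha_1(t) = \ln\!\left(1 - \frac{\theta_e^2}{l_a^2} - \frac{\omega_e^2}{l_b^2}\right),$

so $\alpha_1(t)$ stays bounded exactly when the state stays strictly inside the ellipse, and $\alpha_1(t) \to -\infty$ as the trajectory approaches the constraint boundary.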

Using the nearly optimal control policy $\mu_a = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\hat{W}_a$ and choosing different initial states ($\theta_e(0) = 0.8$, $-0.8 \le \omega_e(0) < 0$ and $\theta_e(0) = 0.7$, $0 \le \omega_e(0) \le 1.2$) as the ones in Fig. 6, the trajectories of the states $\theta_e$ and $\omega_e$ are shown in Fig. 10. Figs. 7 and 8 show that the convergence of the NN weights is achieved without using the probing signal and that the system stability is guaranteed. Fig. 6 shows that the system stability can also be guaranteed based on the existing nearly optimal control method, but the state constraint condition could be violated. Based on the constrained nearly optimal control method proposed in this paper, Fig. 10 shows that the system stability is guaranteed and the state constraint is not transgressed for different initial states. The control efforts for the initial states $\theta_e(0) = 0.7$, $0 \le \omega_e(0) \le 1.2$ are plotted in Fig. 11; it is found that a larger control input is required to keep the state within the constraints for a larger initial state.

Fig. 10. The state trajectories for different initial states based on the proposed algorithm.

Fig. 11. The control efforts for different initial states based on the proposed algorithm.

The additional dynamics of $\chi_k$ are introduced to construct the augmented dynamics (10) and, combined with the system state x, are used to design the nearly optimal control policy. Different from general control methods, the additional dynamics can act in time to change the control input and avoid the violation of the state constraints. These simulation results demonstrate the effectiveness of the proposed method.

To test the robustness of the proposed method, noise signals and uncertainties are considered in the simulation. It is assumed that the system dynamics with the noise signals and uncertainties are given as follows:

$\frac{d\theta_e}{dt} = \omega_e + \Delta_\theta,$
$J_e\frac{d\omega_e}{dt} = -M_egl_e\sin(\theta_e) - f_d\frac{d\theta_e}{dt} + b_e(\tanh(u) + u) + d(t), \qquad (49)$

where $\Delta_\theta = \sin(2\omega_e\theta_e)$ is the uncertainty and d(t) represents the noise signal, with three different cases: $d(t) = 0.1\sin(3t)$, $d(t) = \sin(3t)$ and $d(t) = 5\sin(3t)$. The control parameters are designed as above. Using the nearly optimal control policy $\mu_a = -\frac{1}{2\zeta^2}\bar{G}^T(\bar{x}_\alpha)\nabla\sigma^T\hat{W}_a$ and choosing different initial states ($\theta_e(0) = 0.8$, $-0.8 \le \omega_e(0) < 0$ and $\theta_e(0) = 0.7$, $0 \le \omega_e(0) \le 1.2$) as the ones in Fig. 6, the trajectories of the states $\theta_e$ and $\omega_e$ are shown in Fig. 12 for the noise signals $d(t) = 0.1\sin(3t)$ and $d(t) = \sin(3t)$. When $d(t) = 5\sin(3t)$, the state trajectories cannot satisfy the desired constraints and are therefore not plotted. When the uncertainties are not too complex and the noise signals are not too large, Fig. 12 shows that the state constraints are not violated and bounded stability can still be guaranteed, which means that the proposed control scheme has some degree of robustness to the disturbances and the uncertainties.

Fig. 12. The state trajectories: (a) with d(t) = 0.1 sin(3t) and (b) with d(t) = sin(3t) for different initial states based on the proposed algorithm.

6. Conclusion

This paper considered the problem of designing a nearly optimal control policy for a class of state-constrained nonaffine nonlinear systems with a discounted performance function. For the inequality constraints, a novel refined transformation method was given to transform the constrained optimal control problem into an unconstrained one. Considering the nonaffine nonlinearity of the transformed dynamics, a pre-compensator method was introduced so that affine nonlinear dynamics could be obtained. Then, an adaptive AC algorithm was developed to solve the nearly optimal control problem for the obtained affine nonlinear dynamics. The system stability and the convergence of the adaptive update laws were established via Lyapunov-based analysis. Finally, simulation results were presented to demonstrate the effectiveness of the developed strategy. Future research will focus on the optimal control problem for discrete-time nonlinear systems with state constraints. Additionally, this method will be applied to practical optimal regulation problems.


Acknowledgment

This paper was supported in part by the National Natural Science Foundation of China (Grant nos. 61621004, 61273148, 61420106016) and the Research Fund of the State Key Laboratory of Synthetical Automation for Process Industries (Grant no. 2013ZCX01).

References

[1] Han D, Balakrishnan SN. State-constrained agile missile control with adaptive-critic-based neural networks. IEEE Trans Control Syst Technol 2002;10(4):481-9.
[2] Liu YJ, Tong S. Barrier Lyapunov functions-based adaptive control for a class of nonlinear pure-feedback systems with full state constraints. Automatica 2016;64:70-5.
[3] Jin X. Adaptive fault tolerant control for a class of input and state constrained MIMO nonlinear systems. Int J Robust Nonlinear Control 2016;26(2):286-302.
[4] Lavretsky E, Hovakimyan N. Positive μ-modification for stable adaptation in dynamic inversion based adaptive control with input saturation. In: Proceedings of the American Control Conference; 2005. p. 3373-8.
[5] Modares H, Lewis FL, Naghibi-Sistani MB. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learn Syst 2013;24(10):1513-25.
[6] Liu DR, Wang D, Yang X. An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs. Inf Sci 2013;220:331-42.
[7] Rehan M, Hong KS. Decoupled-architecture-based nonlinear anti-windup design for a class of nonlinear systems. Nonlinear Dyn 2013;73(3):1955-67.
[8] Li B, Yu CJ, Teo KL, Duan GR. An exact penalty function method for continuous inequality constrained optimal control problem. J Optim Theory Appl 2011;151(2):260-91.
[9] Lin WS, Zheng CH. Constrained adaptive optimal control using a reinforcement learning agent. Automatica 2012;48(10):2614-9.
[10] Meng WC, Yang QM, Sun YX. Adaptive neural control of nonlinear MIMO systems with time-varying output constraints. IEEE Trans Neural Netw Learn Syst 2015;26(5):1074-85.
[11] Pontryagin LS. Optimal control processes. Usp Mat Nauk (Russ) 1959;14(3):3-20.
[12] Bellman RE. Dynamic programming. New Jersey: Princeton University Press; 1957.
[13] Branicky MS, Borkar VS, Mitter SK. A unified framework for hybrid control: model and optimal control theory. IEEE Trans Autom Control 1998;43(1):31-45.
[14] Vincent TL. Nonlinear and optimal control systems. New York: John Wiley and Sons; 1997.
[15] Vrabie D, Pastravanu O, Abu-Khalaf M, Lewis FL. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009;45(2):477-84.
[16] Jiang Y, Jiang ZP. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica 2012;48(10):2699-704.
[17] Lewis FL, Vrabie D, Syrmos VL. Optimal control. New York: John Wiley and Sons; 2012.
[18] Liu DR, Yang X, Li HL. Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics. Neural Comput Appl 2013;23(7-8):1843-50.
[19] Werbos PJ. A menu of designs for reinforcement learning over time. Cambridge: MIT Press; 1990.
[20] Abu-Khalaf M, Lewis FL. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005;41(5):779-91.
[21] Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis KG, Lewis FL, Dixon WE. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 2013;49(1):82-92.
[22] Liu DR, Wang D, Wang FY, Li HL, Yang X. Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems. IEEE Trans Cybern 2014;44(12):2834-47.
[23] Wang D, Liu DR, Li HL, Ma HW. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf Sci 2014;282:167-79.
[24] Jiang Y, Jiang ZP. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Trans Neural Netw Learn Syst 2014;25(5):882-93.
[25] Gao WN, Jiang Y, Jiang ZP, Chai TY. Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming. Automatica 2016;72:37-45.
[26] Tee KP, Ge SS, Tay EH. Barrier Lyapunov functions for the control of output-constrained nonlinear systems. Automatica 2009;45(4):918-27.
[27] Niu B, Xiang ZR. State-constrained robust stabilisation for a class of high-order switched non-linear systems. IET Control Theory Appl 2015;9(12):1901-8.
[28] Meng WC, Yang QM, Si JN, Sun YX. Adaptive neural control of a class of output-constrained nonaffine systems. IEEE Trans Cybern 2016;46(1):85-95.
[29] Nguyen HN, Gutman PO, Olaru S, Hovd M. Implicit improved vertex control for uncertain, time-varying linear discrete-time systems with state and control constraints. Automatica 2013;49(9):2754-9.
[30] Zhang LG, Liu XJ. The synchronization between two discrete-time chaotic systems using active robust model predictive control. Nonlinear Dyn 2013;74(4):905-10.
[31] Vamvoudakis KG, Lewis FL. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010;46(5):878-88.
[32] Bian T, Jiang Y, Jiang ZP. Adaptive dynamic programming and optimal control of nonlinear nonaffine systems. Automatica 2014;50(10):2624-32.
[33] Kamalapurkar R, Walters P, Dixon WE. Model-based reinforcement learning for approximate optimal regulation. Automatica 2016;64:94-104.
[34] Jacobson DH, Lele MM. A transformation technique for optimal control problems with a state variable inequality constraint. IEEE Trans Autom Control 1969;14(5):457-64.
[35] Yeo BP. Quasilinearization and optimal control problems with a state inequality constraint. Int J Control 2003;76(14):1469-74.
[36] Luo YH, Xiao GY. ADP-based optimal control for a class of nonlinear discrete-time systems with inequality constraints. In: Proceedings of the 2014 IEEE Symposium on ADPRL; 2014. p. 1-5.
[37] Modares H, Lewis FL. Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 2014;50(7):1780-92.
[38] Hornik K, Stinchcombe M, White H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 1990;3(5):551-60.
[39] Finlayson BA. The method of weighted residuals and variational principles. Philadelphia: SIAM; 2013.


[40] Cox C, Saeks R. Adaptive critic control and functional link neural networks. In: Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics; 1998. p. 1652-7.
[41] Modares H, Lewis FL, Jiang ZP. H-infinity tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans Neural Netw Learn Syst 2015;26(10):2550-62.
[42] Ioannou PA, Sun J. Robust adaptive control. New York: Courier Corporation Publications; 2012.
[43] Fan QY, Yang GH. Adaptive actor-critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans Neural Netw Learn Syst 2016;27(1):165-77.
[44] Fan QY, Yang GH. Active complementary control for affine nonlinear control systems with actuator faults. IEEE Trans Cybern 2016 [Online].
[45] Zhang JL, Zhang HG, Liu ZW, Wang YC. Model-free optimal controller design for continuous-time nonlinear systems by adaptive dynamic programming based on a precompensator. ISA Trans 2015;57:63-70.
[46] Liu DR, Wei QL. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans Neural Netw Learn Syst 2014;25(3):621-34.
