Barrier Lyapunov function based reinforcement learning control for air-breathing hypersonic vehicle with variable geometry inlet


Chen Liu a,b,∗, Chaoyang Dong b, Zhijie Zhou c, Zhaolei Wang d

a Science and Technology on Special System Simulation Laboratory, Beijing Simulation Center, Beijing 100854, China
b School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
c Department of Automation, High-Tech Institute of Xi'an, Xi'an 710025, China
d Beijing Aerospace Automatic Control Institute, Beijing 100854, China

∗ Corresponding author at: School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China. E-mail address: [email protected] (C. Liu).

Article history: Received 6 April 2019; Received in revised form 25 July 2019; Accepted 3 November 2019.

Keywords: Hypersonic vehicle; Variable geometry inlet; Reinforcement learning; Barrier Lyapunov function; Performance-guaranteed tracking

Abstract

Based on barrier Lyapunov functions, a reinforcement learning control method is proposed for air-breathing hypersonic vehicles with a variable geometry inlet (AHV-VGI) subject to external disturbances and diversified uncertainties. The longitudinal dynamics of the AHV-VGI are transformed into strict-feedback form, and controllers are designed for the velocity and altitude subsystems, respectively. Taking advantage of the reinforcement learning strategy, two radial basis function (RBF) neural networks are applied to estimate the "total disturbances" in the flight control system: the actor network generates the disturbance estimate, and the critic network evaluates the estimation accuracy. Prescribed tracking performances and state constraints are guaranteed by introducing barrier Lyapunov functions (BLFs). Tracking differentiators are used to generate the derivatives of the virtual controllers in the backstepping design process. Simulation results illustrate the effectiveness and advantages of the proposed control strategy.


1. Introduction


In recent decades, air-breathing hypersonic vehicles (AHVs) have attracted a great deal of attention, and related research has become increasingly important in both the military and civil fields [1–5]. However, the AHV model is subject to strong nonlinearity, severe coupling, unknown disturbances and parameter uncertainties, which makes controller design for AHVs a difficult task. Moreover, since the operating conditions vary significantly over the flight envelope, the dynamics of AHVs vary accordingly [6], which makes the control problem even more challenging. To solve this problem, a wide range of approaches have been investigated for AHV flight control. An early method is the gain-scheduling approach [7]; however, it requires designing and scheduling a large number of controller gains, so the design process is complex. Feedback linearization [8] is also an effective control approach for AHVs, but it requires an exact model of the vehicle. More recently, many advanced control strategies have been


developed for AHVs subject to uncertainties and external disturbances. In Ref. [9], a high-order sliding mode observer was designed to estimate the unmeasured states, and a quasi-continuous high-order sliding mode controller was proposed for AHVs to ensure stable signal tracking. In Ref. [10], an improved nonlinear dynamic inversion control method for flexible AHVs was proposed. Ref. [11] presented an adaptive backstepping control for AHVs with input nonlinearities by designing an input nonlinear pre-compensator. Observer-based control is another widely used method in AHV controller design, in which observers are introduced to compensate for unknown disturbances and uncertainties [12]. In Ref. [13], an adaptive control strategy was proposed for AHVs to handle the time-varying uncertain coefficients of the aerodynamic forces and moments, actuator faults and flexible dynamics during full-envelope hypersonic flight. In Ref. [14], the authors applied an adaptive finite-time observer to estimate the unknown states of the vehicle and presented an adaptive twisting sliding mode control approach for hypersonic reentry vehicles. By using a class of non-homogeneous disturbance observers, an adaptive fast terminal sliding mode controller was designed in Ref. [15] to ensure finite-time guidance law tracking of the AHV. Due to the superior ability of neural networks and fuzzy systems to approximate nonlinear functions, intelligent control strategies have also been widely used


in AHV flight control. Ref. [16] designed a fault-tolerant control scheme for AHVs with actuator gain-loss faults based on a fuzzy logic system. In Ref. [17], the authors designed a new backstepping control approach for AHVs with neural networks introduced to estimate the unknown dynamics.

From the existing results, it is known that for an AHV with a fixed geometry inlet, the shock wave deviates from the scramjet lip during low-Mach flight. This causes an insufficient air supply to the scramjet engine, so the thrust decreases accordingly [18]. To improve the flight performance of the vehicle, the AHV with a variable geometry inlet (AHV-VGI) is studied. The VGI configuration can effectively extend the velocity range and is conducive to acceleration control of the AHV. However, the movement of the translating cowl introduces new uncertainties into the flight control system, such as uncertain changes of the aerodynamic forces, aerodynamic moments and thrust. Ref. [19] established a fuzzy disturbance observer to reject the uncertainty caused by the movement of the translating cowl and designed a dynamic surface control strategy. In Ref. [20], a longitudinal dynamic model for the AHV-VGI was established and sliding mode controllers based on a fuzzy logic system were designed. In Ref. [21], a multi-mode model was established and a switching control method based on RBF neural networks was proposed for the AHV-VGI. Ref. [22] proposed a performance-guaranteed adaptive backstepping controller design method for a class of nonlinear systems with uncertainties and disturbances, and the method was applied effectively to the AHV-VGI. In Ref. [23], the authors proposed two adaptive controllers for the AHV-VGI to ensure that the vehicle tracks the velocity and altitude reference signals stably. At present, research on the AHV-VGI is still limited, and its control system design requires further study.

Reinforcement learning is a relatively new methodology for dealing with uncertain systems. Different from traditional supervised learning, in the reinforcement learning strategy the optimal action is obtained from information gathered from the environment [24]. Ref. [25] designed a reinforcement learning controller for a single-link flexible manipulator to suppress the vibration caused by the flexible lightweight structure. Ref. [26] proposed a data-driven reinforcement learning method to design a robust controller for a class of uncertain nonlinear systems with completely unknown dynamics. An incremental approximate dynamic programming algorithm based on output feedback was presented for flight control in Ref. [27]. In Ref. [28], a data-driven supplementary control strategy for AHV tracking control based on adaptive dynamic programming was proposed. A main advantage of reinforcement learning control is that the controller is updated in real time when the system is affected by unknown disturbances and uncertainties [29–31]. Considering this online-learning and online-adjusting advantage, the reinforcement learning method is applied in this paper to the controller design problem of the AHV-VGI.

Furthermore, in practical flight control systems, signal tracking performance, actuator saturation and safety specifications impose constraints on the system outputs or states. Once these constraints are violated, performance degradation, instability and even system damage may result [32]. A credible solution to state or output constraint problems is the barrier Lyapunov function (BLF)-based controller design method, which can guarantee prescribed tracking performance [33,34]. BLFs have been widely used in practical engineering and theoretical studies. By using a BLF, Ref. [32] proposed an adaptive control law to guarantee the velocity and altitude tracking performance of hypersonic flight vehicles. Ref. [35] designed an adaptive fuzzy control scheme based on a BLF for permanent magnet synchronous motor systems to ensure the position tracking constraint. In Ref. [36], an adaptive control scheme was investigated for a class of nonlinear uncertain stochastic systems with full state constraints. As far as the authors know, there has been little research on the AHV-VGI with velocity and altitude tracking performance constraints, which motivates this paper.

To solve the signal tracking problem of the AHV-VGI with tracking performance constraints, a reinforcement learning control strategy based on barrier Lyapunov functions is proposed in this paper to achieve accurate velocity and altitude tracking. In the proposed control method, all the parameter uncertainties, the uncertainties introduced by the variable geometry inlet and the external disturbances of the AHV-VGI are estimated by a reinforcement learning strategy, which ensures a superior disturbance estimation performance. Then, by constructing and analyzing BLFs for the closed-loop system, the prescribed tracking performance constraints are guaranteed theoretically. In addition, tracking differentiators are used to generate the time derivatives of the virtual control signals. Compared with traditional filters, the tracking differentiator has a simpler structure and provides a better approximation of the time derivatives of the original signals. These advantages make the proposed method well suited to practical engineering applications.

The rest of this paper is organized as follows. The longitudinal model and preliminaries are stated in Section 2. Reinforcement learning controller design methods based on barrier Lyapunov functions for the velocity and altitude subsystems are presented in Section 3. In Section 4, the stability and tracking performance of the closed-loop system are analyzed. A numerical example is provided in Section 5 to verify the effectiveness and advantages of the proposed control strategy. Section 6 concludes the paper.


2. Problem description and preliminaries


2.1. Longitudinal model of AHV-VGI


Fig. 1. The structure of AHV-VGI.

Fig. 2. The schematic diagram of AHV with the translating cowl. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

The structure of the AHV-VGI studied in this paper is shown in Fig. 1. The scramjet inlet has a translating cowl that can be adjusted according to the Mach number and the angle of attack. Specifically, as shown in Fig. 2, when the AHV cruises at a relatively high Mach number, a forebody oblique shock (blue dashed line) forms ahead of the vehicle, and the shock wave angle is denoted θ_s. When the free stream (red solid line) meets the oblique shock, it turns parallel to the forebody, so the flow turn angle is δ_s = α + τ_1l. When the vehicle cruises at a low Mach number, the shock wave angle θ_s increases. If the cowl of the scramjet inlet is fixed, the shock wave deviates from the scramjet lip, so the engine cannot capture enough air (D_1 is the actual captured area). If the cowl moves to the position L_1, the shock wave is completely sealed by the cowl (D_1 + D_2 is the actual captured area). According to the results of [19], with the forebody length L_f = 47 ft, the engine inlet height h = 3.5 ft and the forebody angle τ_1l = 3.5°, D_1 and D_2 can be calculated as

D_1 = h·sin(θ_s)·cos(τ_1l) / sin(θ_s − α − τ_1l)    (1)

D_2 = (h + L_f·tan(τ_1l))·sin(θ_s) / sin(θ_s − α)    (2)

Also, the optimal position of the translating cowl is

L_1 = L_f − (L_f·tan(τ_1l) + h)·cot(θ_s − α)    (3)

However, in practical flight the optimal elongation distance cannot be acquired precisely. Hence, the application of the moving cowl introduces additional aerodynamic uncertainties for the AHV-VGI, which makes its control system design more challenging. In this sense, the longitudinal dynamics of the AHV-VGI are formulated by the following equations [19]:

V̇ = (T·cosα − D)/m − g·sinγ + d_V
ḣ = V·sinγ
γ̇ = (T·sinα + L)/(mV) − g·cosγ/V + d_γ    (4)
α̇ = Q − γ̇ + d_α
Q̇ = M/I_yy + d_Q

where V is the velocity, h is the altitude, γ is the flight path angle, α is the angle of attack, Q is the pitch rate, m is the mass, I_yy is the moment of inertia and g is the acceleration of gravity. d_i (i = V, γ, α, Q) denote unknown external disturbances, and L, D, T, M are the lift, drag, thrust and pitching moment, respectively. In the AHV-VGI model, the change of the aerodynamic forces caused by the translating cowl has to be considered. By the curve fitting method, the aerodynamic forces are given as

T ≈ q̄·(C_T + C_T^φ·φ + C̃_T)
L ≈ q̄S·(C_L + C̃_L)
D ≈ q̄S·(C_D + C̃_D)    (5)
M ≈ q̄Sc̄·(C_M + C̃_M) + z_T·T

where C_L, C_D, C_T and C_M are the aerodynamic lift, drag, thrust and pitching moment coefficients, respectively. q̄ = (1/2)ρV² denotes the dynamic pressure. ρ, S, c̄ and z_T are the air density, reference area, aerodynamic chord and thrust moment arm, respectively, and φ is the fuel equivalence ratio. C̃_L, C̃_D, C̃_T and C̃_M are the increments introduced by the translating cowl. The aerodynamic coefficients are given as follows, and the detailed coefficient values can be found in the Appendix (Table 1).

C_L = C_L^α·α + C_L^Ma·Ma + C_L^δe·δe + C_L^0
C_D = C_D^α²·α² + C_D^α·α + C_D^δe²·δe² + C_D^δe·δe + C_D^Ma·Ma + C_D^0
C_T = C_T^α·α + C_T^Ma·Ma + C_T^0    (6)
C_T^φ = C_T^φα·α + C_T^φMa·Ma + C_T^φ0
C_M = C_M^α·α + C_M^Ma·Ma + C_M^δe·δe + C_M^0

The estimated value of the optimal elongation distance l̂ is a function of the Mach number Ma and α. Similarly, by the curve fitting method, the expression of l̂ is

l̂ ≈ C_l^α·α + C_l^α²·α² + C_l^Ma·Ma + C_l^0

and the coefficient increments introduced by the translating cowl are

C̃_L = C_L^l·l̂ = (C_L,l^α·α + C_L,l^Ma·Ma + C_L,l^0)·l̂
C̃_D = C_D^l·l̂ = (C_D,l^α·α + C_D,l^Ma·Ma + C_D,l^0)·l̂
C̃_T = C_T^l·l̂ = (C_T,l^α·α + C_T,l^Ma·Ma + C_T,l^0)·l̂    (7)
C̃_M = C_M^l·l̂ = (C_M,l^α·α + C_M,l^Ma·Ma + C_M,l^0)·l̂

where the detailed parameter values can also be found in the Appendix. It follows that the AHV-VGI model has two outputs, the velocity V and the altitude h, and two inputs, the fuel equivalence ratio φ and the elevator deflection δe.

2.2. Strict feedback form

The longitudinal model of the AHV-VGI can be rewritten in strict-feedback form as

V̇ = g_V·φ + f_V + Δ_V
ḣ = V·sinγ
γ̇ = g_γ·α + f_γ + Δ_γ    (8)
α̇ = g_α·Q + f_α + Δ_α
Q̇ = g_Q·δe + f_Q + Δ_Q

where

g_V = q̄·C_T^φ·cosα/m
f_V = (q̄·C_T·cosα − q̄S·C_D)/m − g·sinγ
g_γ = q̄S·C_L^α/(mV)
f_γ = q̄S·(C_L^Ma·Ma + C_L^δe·δe + C_L^0 − C_L^α·γ)/(mV) + T·sinα/(mV) − g·cosγ/V    (9)
g_α = 1,  f_α = −g_γ·α − f_γ
g_Q = q̄Sc̄·C_M^δe/I_yy
f_Q = [q̄Sc̄·(C_M^α·α + C_M^Ma·Ma + C_M^0) + z_T·q̄·(C_T^α·α + C_T^Ma·Ma + C_T^0 + C_T^φ·φ)]/I_yy

Then, the nonlinear parts introduced by the variable geometry inlet, the parameter uncertainties and the external disturbances of the vehicle are all treated as "total disturbances" Δ_i (i = V, γ, α, Q) and expressed as

Δ_V = Δg_V·φ + Δf_V + (f_V^l + Δf_V^l)·l̂ + d_V
Δ_γ = Δg_γ·α + Δf_γ + (f_γ^l + Δf_γ^l)·l̂ + d_γ    (10)
Δ_α = Δf_α + (f_α^l + Δf_α^l)·l̂ + d_α
Δ_Q = Δg_Q·δe + Δf_Q + (f_Q^l + Δf_Q^l)·l̂ + d_Q

where f_V^l, f_γ^l, f_α^l, f_Q^l are the terms introduced by the movable cowl, Δf_V^l, Δf_γ^l, Δf_α^l, Δf_Q^l denote the corresponding uncertainties of the aerodynamic coefficients, and

f_V^l = (q̄·C_T^l·cosα − q̄S·C_D^l)/m
f_γ^l = q̄S·C_L^l/(mV)    (11)
f_Q^l = q̄·(Sc̄·C_M^l + z_T·C_T^l)/I_yy
f_α^l = −f_γ^l
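For readers who wish to experiment with the strict-feedback model (8), the following minimal Python sketch evaluates its right-hand side. It is only an illustration under stated assumptions: the lumped terms g_i and f_i from (9) and the total disturbances Δ_i from (10) depend on the coefficient data in the Appendix, so here they are supplied by the user as callables and values, and the function name and interface are not part of the paper.

```python
import numpy as np

# Minimal sketch of the strict-feedback longitudinal model (8).
# g/f: dicts of user-supplied callables implementing (9); delta: dict of
# lumped "total disturbances" (10). All names are illustrative.
def ahv_vgi_dynamics(x, u, g, f, delta):
    """x = [V, h, gamma, alpha, Q], u = [phi, delta_e]."""
    V, h, gam, alp, Q = x
    phi, delta_e = u
    dV   = g['V'](x) * phi     + f['V'](x)     + delta['V']
    dh   = V * np.sin(gam)
    dgam = g['gamma'](x) * alp + f['gamma'](x) + delta['gamma']
    dalp = g['alpha'](x) * Q   + f['alpha'](x) + delta['alpha']
    dQ   = g['Q'](x) * delta_e + f['Q'](x)     + delta['Q']
    return np.array([dV, dh, dgam, dalp, dQ])

# Usage (forward Euler): x = x + dt * ahv_vgi_dynamics(x, u, g, f, delta)
```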

Suppose that V_c and h_c are the reference velocity and altitude signals and that the time derivatives of V_c and h_c are bounded. The goal of this paper is to realize tracking control of the AHV-VGI, which suffers from parameter uncertainties, unknown external disturbances and the nonlinearity caused by the translating cowl, with the prescribed tracking performances |z_V| = |V − V_c| < A_zV, |z_h| = |h − h_c| < A_zh and the state constraints |γ| < A_γ, |α| < A_α, |Q| < A_Q, where A_zV, A_zh, A_γ, A_α, A_Q are positive constants. The following assumptions are made for the AHV-VGI.

Assumption 1. [19] The functions g_i and their derivatives ġ_i are bounded, i = V, γ, Q.

Assumption 2. [32] The initial flight conditions of the AHV-VGI satisfy the prescribed performances and constraints, that is, |z_V(0)| < A_zV, |z_h(0)| < A_zh, |γ(0)| < A_γ, |α(0)| < A_α, |Q(0)| < A_Q.

Remark 1. Assumption 1 is applied to ensure that the control inputs are non-singular and bounded. The functions g_i, i = V, γ, Q, depend on the dynamic pressure, mass, Mach number, angle of attack and aerodynamic coefficients of the vehicle. During practical flight, all of these quantities are bounded and remain within reasonable ranges, so Assumption 1 is acceptable. Assumption 2 means that the initial attitude angles of the vehicle should be adjusted to the initial values given by the guidance law, which is easy to achieve in practical flight. Hence Assumption 2 is reasonable.

2.3. Preliminaries

To apply the reinforcement learning control scheme with the actor-critic structure, a long-term cost function is designed as

I(t) = ∫_t^∞ e^{−(m−t)/ψ}·ϕ(m) dm    (12)

where ψ is a discount factor which is applied to discount the future cost, and ϕ(t) is an instant cost function given as

ϕ(t) = Z^T·P·Z    (13)

where P is a 5 × 5 positive definite symmetric matrix, Z = [z_V, z_h, z_γ, z_α, z_Q]^T, z_γ = γ − γ_c, z_α = α − α_c, z_Q = Q − Q_c, and γ_c, α_c, Q_c are virtual controllers to be designed in the next section. Due to their superiority in approximating nonlinear functions, RBF neural networks are used to construct the reinforcement learning control strategy. For an arbitrary continuous function f(Z): R^k → R, the following RBF neural network is given as

f_nn(Z) = W^T·S(Z)    (14)

where Z = [Z_1, Z_2, ..., Z_k]^T ⊂ R^k is the input vector, W = [w_1, w_2, ..., w_l]^T ⊂ R^l is the weight vector with l NN nodes, l > 1, and S(Z) = [s_1(Z), s_2(Z), ..., s_l(Z)]^T, where s_i(Z) is a Gaussian function:

s_i(Z) = exp(−(Z − μ_i)^T·(Z − μ_i)/μ_k²),  i = 1, 2, ..., l    (15)

where μ_i = [μ_i1, μ_i2, ..., μ_ik] is the center of the receptive field and μ_k is the width of the Gaussian function. It is proved that for any continuous function f(Z), with Z within a compact set Ω_z ⊂ R^k, the approximation error of an RBF neural network can be made arbitrarily small if the number of hidden neurons is large enough. Namely, there is

f(Z) = W^{∗T}·S(Z) + ε,  ∀Z ∈ Ω_z    (16)

where W^∗ is the optimal weight of the network and ε is a bounded approximation error.

3. Controller design

The structure of the controller proposed in this paper is shown in Fig. 3. The controller contains four parts: the critic network, the actor network, the velocity subsystem controller and the altitude subsystem controller. The detailed design process is given in the remainder of this section.

Fig. 3. The diagram of AHV-VGI control system based on reinforcement learning.

3.1. Design of critic network

A critic network is designed to evaluate the estimation performance of the actor network. The critic network is constructed using an RBF neural network. In the ideal case, define I = W_c^{∗T}·S_c(c_in) + ε_c, where W_c^∗ is the optimal weight of the critic network and ε_c is a bounded approximation error. The input and output of the critic network are c_in = [z_V, z_h, z_γ, z_α, z_Q]^T and c_out = Î, respectively. Therefore, the function of the network can be expressed as Î = Ŵ_c^T·S_c(c_in), where Ŵ_c is the approximation of W_c^∗. Combined with (12), the estimation error of the cost-to-go function becomes

γ(t) = ϕ(t) − (1/ψ)·Î(t) + dÎ(t)/dt    (17)

As the constant ψ → ∞, the approximation error of the cost-to-go function is rewritten as

γ(t) = ϕ(t) + dÎ(t)/dt = ϕ(t) + ∇Î(t)·ċ_in    (18)

where ∇ denotes the gradient with respect to c_in. The updating law of the critic network can be described as

Ŵ̇_c = −σ_c·∂E_c/∂W_c    (19)

where E_c = (1/2)·γ^T·γ. Combined with (18), it can be obtained that

Ŵ̇_c = −σ_c·γ(t)·∂γ/∂W_c
     = −σ_c·γ(t)·∂[ϕ(t) − (1/ψ)·Î(t) + dÎ(t)/dt]/∂W_c    (20)
     = −σ_c·(ϕ(t) + Ŵ_c^T·Λ)·Λ

where σ_c > 0 is the learning rate of the critic network and Λ = −S_c/ψ + ∇S_c·ċ_in. Then, a Lyapunov candidate function for the critic network is designed as

L_c = (1/2)·W̃_c^T·W̃_c    (21)

where W̃_c = Ŵ_c − W_c^∗. Substituting (20) into (21), the derivative of L_c is

L̇_c = W̃_c^T·W̃̇_c = W̃_c^T·Ŵ̇_c = −σ_c·W̃_c^T·Λ·(ϕ(t) + Ŵ_c^T·Λ)    (22)

As γ → 0, ϕ(t) = I/ψ − İ. Then, one has

ϕ(t) = (W_c^{∗T}·S_c + ε_c)/ψ − ∇(W_c^{∗T}·S_c(c_in) + ε_c)·ċ_in
     = −W_c^{∗T}·Λ + Δ_c    (23)

where Δ_c = ε_c/ψ − ∇ε_c·ċ_in and ||Δ_c|| ≤ Δ_c,max. Substituting (23) into (22), there is

L̇_c = −σ_c·W̃_c^T·Λ·(W̃_c^T·Λ + Δ_c)
    ≤ −(σ_c/2)·Λ^T·Λ·W̃_c^T·W̃_c + (σ_c/2)·Δ_c^T·Δ_c    (24)
    ≤ −(σ_c/2)·Λ^T·Λ·W̃_c^T·W̃_c + (σ_c/2)·||Δ_c,max||²
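The critic design above can be condensed into a few lines of code. The sketch below is only an illustration: it assumes a simple Euler discretization, approximates the time derivative of S_c by a backward difference, and uses illustrative variable names and array shapes that are not prescribed by the paper.

```python
import numpy as np

def rbf_features(z, centers, width):
    """Gaussian basis (15): s_i(Z) = exp(-||Z - mu_i||^2 / width^2)."""
    diff = centers - z                                    # centers: (l, k), z: (k,)
    return np.exp(-np.sum(diff**2, axis=1) / width**2)    # -> (l,)

def critic_step(W_c, z, z_prev, P, centers, width, psi, sigma_c, dt):
    """One discrete update of the critic weights W_c (shape (l,)) per (19)-(20)."""
    phi_t = z @ P @ z                                     # instant cost (13)
    S_c = rbf_features(z, centers, width)
    S_c_prev = rbf_features(z_prev, centers, width)
    # Lambda = -S_c/psi + d(S_c)/dt, with the derivative approximated numerically.
    Lam = -S_c / psi + (S_c - S_c_prev) / dt
    gamma_t = phi_t + W_c @ Lam                           # cost-to-go error (18)
    W_c = W_c - dt * sigma_c * gamma_t * Lam              # update law (20)
    I_hat = W_c @ S_c                                     # critic output
    return W_c, I_hat
```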

3.2. Design of actor network

The input vector of the actor network is a_in = [z_V, z_h, z_γ, z_α, z_Q]^T. The output vector of the network is expressed as a_out = [Δ̂_V, Δ̂_γ, Δ̂_α, Δ̂_Q]^T, where Δ̂_V, Δ̂_γ, Δ̂_α and Δ̂_Q are the estimates of Δ_V, Δ_γ, Δ_α and Δ_Q in (10), respectively. Define the optimal weight of the actor network as W_a^∗. There is [Δ_V, Δ_γ, Δ_α, Δ_Q]^T = W_a^{∗T}·S_a(a_in) + ε_a, where ε_a = [ε_aV, ε_aγ, ε_aα, ε_aQ]^T and ε_aV, ε_aγ, ε_aα, ε_aQ are the approximation errors of Δ_V, Δ_γ, Δ_α and Δ_Q, respectively. The function of the actor network is given as a_out = Ŵ_a^T·S_a(a_in), where Ŵ_a is the estimate of W_a^∗. Therefore, the instant approximation error can be defined as

ς_a = [ς_aV, ς_aγ, ς_aα, ς_aQ]^T = W̃_a^T·S_a(a_in)    (25)

where W̃_a = Ŵ_a − W_a^∗. The error associated with the actor network is expressed as

e_a = ς_a + K_I·(Î(t) − I_d(t))    (26)

where I_d(t) = 0 represents the desired ideal cost-to-go and K_I is a 4 × 1 gain vector. The updating law of the actor network is

Ŵ̇_a = −σ_a·∂E_a/∂W_a    (27)

where E_a = (1/2)·e_a^T·e_a. Then,

Ŵ̇_a = −σ_a·(∂E_a/∂e_a)·(∂e_a/∂ς_a)·(∂ς_a/∂W_a) = −σ_a·(ς_a + K_I·Î)·S_a    (28)

where σ_a > 0 is the learning rate of the actor network. Since ς_a is unknown, the updating law is redefined as

Ŵ̇_a = −σ_a·(Ŵ_a^T·S_a(a_in) + K_I·Î)·S_a    (29)

Choose a Lyapunov candidate function for the actor network as

L_a = (1/2)·W̃_a^T·W̃_a    (30)

The derivative of (30) is

L̇_a = −σ_a·W̃_a^T·S_a·(Ŵ_a^T·S_a(a_in) + K_I·Î)    (31)

Since Î = W_c^{∗T}·S_c(c_in) + W̃_c^T·S_c(c_in), one has

Î^T·Î ≤ 2·(W_c^{∗T}·S_c)^T·W_c^{∗T}·S_c + 2·(W̃_c^T·S_c)^T·W̃_c^T·S_c    (32)

Combined with (31) and applying Young's inequality together with (32), it can be obtained that

L̇_a = −σ_a·W̃_a^T·S_a·(W_a^{∗T}·S_a(a_in) + W̃_a^T·S_a(a_in) + K_I·Î)
    ≤ −(σ_a/2)·||W̃_a||²·||S_a||² + (σ_a/2)·||W_a^∗||²·||S_a||²
      + σ_a·K_I^T·K_I·(||W_c^∗||²·||S_c||² + ||W̃_c||²·||S_c||²)    (33)
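A corresponding sketch of the actor update (29) is given below, again as an assumed Euler discretization. The (l × 4) weight shape and the outer-product form are implementation choices made only to render the dimensions of (29) explicit; the names are illustrative.

```python
import numpy as np

def actor_step(W_a, S_a, I_hat, K_I, sigma_a, dt):
    """One discrete update of the actor weights per (29).
    W_a: (l, 4) weights, S_a: (l,) RBF features of a_in,
    I_hat: scalar critic output, K_I: (4,) gain vector."""
    a_out = W_a.T @ S_a                              # estimates of Delta_V, Delta_gamma, Delta_alpha, Delta_Q
    e = a_out + K_I * I_hat                          # implementable surrogate of e_a in (26)
    W_a = W_a - dt * sigma_a * np.outer(S_a, e)      # update law (29)
    return W_a, a_out
```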

3.3. Controller design for velocity system

With z_V = V − V_c defined, the dynamics of the velocity tracking error are described as

ż_V = g_V·φ + f_V + Δ_V − V̇_c    (34)

To guarantee the prescribed tracking performance |z_V| < A_zV, a barrier Lyapunov function candidate is chosen as

L_1 = (1/2)·log(A_zV²/(A_zV² − z_V²))    (35)

where log(·) denotes the natural logarithm.

Remark 2. It is obvious that L_1 is a valid Lyapunov function candidate if z_V satisfies |z_V| < A_zV, and lim_{z_V→A_zV⁻} L_1 = +∞. Therefore, if the boundedness of L_1 can be guaranteed, the system complies with the given constraint |z_V| < A_zV.

Next, the control law for the velocity subsystem is designed as

φ = g_V⁻¹·(−k_V·z_V − z_V/(2(A_zV² − z_V²)) − Δ̂_V + V̇_c − f_V)    (36)

With (36), the time derivative of L_1 yields

L̇_1 = z_V·ż_V/(A_zV² − z_V²)
    = z_V/(A_zV² − z_V²)·(−k_V·z_V − z_V/(2(A_zV² − z_V²)) + ς_aV + ε_aV)    (37)
    = −k_V·z_V²/(A_zV² − z_V²) − z_V²/(2(A_zV² − z_V²)²) + z_V·ς_aV/(A_zV² − z_V²) + z_V·ε_aV/(A_zV² − z_V²)

Applying Young's inequality, one has

z_V·ς_aV/(A_zV² − z_V²) ≤ z_V²/(4(A_zV² − z_V²)²) + ||ς_aV||²    (38)

z_V·ε_aV/(A_zV² − z_V²) ≤ z_V²/(4(A_zV² − z_V²)²) + ||ε_aV||²    (39)

Substituting (38) and (39) into (37), there is

L̇_1 ≤ −k_V·z_V²/(A_zV² − z_V²) + ||ς_aV||² + ||ε_aV||²    (40)

3.4. Controller design for altitude system

In this section, a controller for the altitude subsystem is designed. To guarantee the prescribed altitude tracking performance |z_h| < A_zh, the state constraints |γ| < A_γ, |α| < A_α, |Q| < A_Q have to be guaranteed simultaneously. Naturally, it can be assumed that

|γ_c| < A_γc < A_γ,  |α_c| < A_αc < A_α,  |Q_c| < A_Qc < A_Q    (41)

The state tracking performance can then be defined as |z_γ| < A_zγ ≤ A_γ − A_γc, |z_α| < A_zα ≤ A_α − A_αc, |z_Q| < A_zQ ≤ A_Q − A_Qc. Different from the conventional backstepping design procedure, the tracking differentiator (TD) is applied in the design procedure of this paper, so that the "explosion of terms" is avoided and a better system performance can be acquired. The design procedure is as follows.

Step 1: The dynamics of the altitude tracking error z_h are

ż_h ≈ V·γ − ḣ_c = V·(z_γ + γ_c) − ḣ_c    (42)

The virtual control for system (42) is given as

γ̄_c = (1/V)·(−k_h·z_h + ḣ_c)    (43)

Next, introduce a new variable γ_c and let γ̄_c pass through the tracking differentiator to obtain γ̇_c:

γ̇_c = x_γ
ẋ_γ = −R·[fal(γ_c − γ̄_c, a, δ) + b·fal(x_γ/R, a, δ)]    (44)

where

fal(z, a, δ) = |z|^a·sign(z),  |z| > δ
fal(z, a, δ) = z/δ^{1−a},      |z| ≤ δ, δ > 0    (45)

and the parameters in (45) satisfy R > 0, 0 < a < 1, b > 0, δ > 0.

Remark 3. In (44), the function fal(z, a, δ) is applied to avoid chattering at the origin. To obtain a better tracking performance and a shorter convergence time, the parameter R should be relatively large. However, an excessively large R would amplify high-frequency noise. The parameter δ has a major impact on the tracking signal and is related to R; specifically, as R increases, δ should increase at the same rate.

Step 2: The dynamics of the virtual tracking error z_γ are

ż_γ = γ̇ − γ̇_c = g_γ·α + f_γ + Δ_γ − γ̇_c    (46)

The nominal virtual control for (46) is designed as

ᾱ_c = g_γ⁻¹·(−k_γ·z_γ − z_γ/(2(A_zγ² − z_γ²)) − Δ̂_γ + γ̇_c − f_γ)    (47)

Similar to (44), introduce a new variable α_c and let ᾱ_c pass through the tracking differentiator to obtain α̇_c:

α̇_c = x_α
ẋ_α = −R·[fal(α_c − ᾱ_c, a, δ) + b·fal(x_α/R, a, δ)]    (48)

Step 3: The dynamics of the virtual tracking error z_α are described as

ż_α = α̇ − α̇_c = Q − γ̇ + d_α − α̇_c    (49)

The nominal virtual control for (49) is designed as

Q̄_c = −k_α·z_α − z_α/(2(A_zα² − z_α²)) − Δ̂_α + α̇_c − f_α    (50)

By introducing a new variable Q_c, let Q̄_c pass through the tracking differentiator to obtain Q̇_c:

Q̇_c = x_Q
ẋ_Q = −R·[fal(Q_c − Q̄_c, a, δ) + b·fal(x_Q/R, a, δ)]    (51)
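The fal function (45), one step of the tracking differentiator, and the BLF-based velocity control law (36) can be sketched as follows. This is only a sketch under assumptions: the argument x/R in the second fal term mirrors the reconstruction of (44) above, the Euler step and all names are illustrative, and obtaining g_V, f_V and the disturbance estimate is outside its scope.

```python
import numpy as np

def fal(z, a, delta):
    """Piecewise function (45) used to avoid chattering near the origin."""
    if abs(z) > delta:
        return abs(z)**a * np.sign(z)
    return z / delta**(1.0 - a)

def td_step(xc, v, xbar, R, a, b, delta, dt):
    """One Euler step of the tracking differentiator (44)/(48)/(51):
    xc tracks the nominal virtual control xbar, v approximates d(xc)/dt."""
    xc_new = xc + dt * v
    v_new = v - dt * R * (fal(xc - xbar, a, delta) + b * fal(v / R, a, delta))
    return xc_new, v_new

def velocity_control(zV, A_zV, k_V, gV, fV, dVc, Delta_V_hat):
    """BLF-based velocity control law (36); requires |zV| < A_zV."""
    barrier = zV / (2.0 * (A_zV**2 - zV**2))
    return (-k_V * zV - barrier - Delta_V_hat + dVc - fV) / gV
```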

Step 4: The dynamics of the virtual tracking error z_Q are

ż_Q = Q̇ − Q̇_c = g_Q·δe + f_Q + Δ_Q − Q̇_c    (52)

The actual controller of the altitude subsystem can be designed as

δe = g_Q⁻¹·(−k_Q·z_Q − z_Q/(2(A_zQ² − z_Q²)) − Δ̂_Q + Q̇_c − f_Q)    (53)

To guarantee the prescribed tracking performance |z_h| < A_zh together with the state constraints, a barrier Lyapunov function candidate for the altitude subsystem is chosen as

L_2 = L_h + L_γ + L_α + L_Q    (54)

where

L_h = (1/2)·log(A_zh²/(A_zh² − z_h²))
L_γ = (1/2)·log(A_zγ²/(A_zγ² − z_γ²))    (55)
L_α = (1/2)·log(A_zα²/(A_zα² − z_α²))
L_Q = (1/2)·log(A_zQ²/(A_zQ² − z_Q²))

Combined with (42), (46), (49) and (52), the time derivative of L_2 yields

L̇_2 = z_h·ż_h/(A_zh² − z_h²) + z_γ·ż_γ/(A_zγ² − z_γ²) + z_α·ż_α/(A_zα² − z_α²) + z_Q·ż_Q/(A_zQ² − z_Q²)
    = −k_h·z_h²/(A_zh² − z_h²)
      + z_γ/(A_zγ² − z_γ²)·(−k_γ·z_γ − z_γ/(2(A_zγ² − z_γ²)) + ς_aγ + ε_aγ)
      + z_α/(A_zα² − z_α²)·(−k_α·z_α − z_α/(2(A_zα² − z_α²)) + ς_aα + ε_aα)    (56)
      + z_Q/(A_zQ² − z_Q²)·(−k_Q·z_Q − z_Q/(2(A_zQ² − z_Q²)) + ς_aQ + ε_aQ)

By Young's inequality, one has, for i = γ, α, Q,

z_i·ς_ai/(A_zi² − z_i²) + z_i·ε_ai/(A_zi² − z_i²) ≤ z_i²/(2(A_zi² − z_i²)²) + ||ς_ai||² + ||ε_ai||²    (57)

Substituting (57) into (56), the time derivative of L_2 satisfies

L̇_2 ≤ −k_h·z_h²/(A_zh² − z_h²) − k_γ·z_γ²/(A_zγ² − z_γ²) − k_α·z_α²/(A_zα² − z_α²) − k_Q·z_Q²/(A_zQ² − z_Q²)
      + ||ς_aγ||² + ||ε_aγ||² + ||ς_aα||² + ||ε_aα||² + ||ς_aQ||² + ||ε_aQ||²    (58)

4. Stability and performance analysis

Consider the following Lyapunov function candidate

L = L_c + L_a + L_1 + L_2    (59)

From (24), (33), (40) and (58), one has

L̇ ≤ −(σ_c/2)·Λ^T·Λ·W̃_c^T·W̃_c + σ_a·K_I^T·K_I·||W̃_c||²·||S_c||² − (σ_a/2)·||W̃_a||²·||S_a||²
     − k_V·z_V²/(A_zV² − z_V²) − k_h·z_h²/(A_zh² − z_h²) − k_γ·z_γ²/(A_zγ² − z_γ²) − k_α·z_α²/(A_zα² − z_α²) − k_Q·z_Q²/(A_zQ² − z_Q²)
     + (σ_c/2)·||Δ_c,max||² + (σ_a/2)·||W_a^∗||²·||S_a||² + σ_a·K_I^T·K_I·||W_c^∗||²·||S_c||² + ||ς_a||² + ||ε_a||²
   ≤ −κ·L + ξ    (60)

where

κ = min{σ_c·Λ^T·Λ − 2σ_a·K_I^T·K_I·||S_c||², σ_a, k_V, k_h, k_γ, k_α, k_Q}    (61)

ξ = (σ_c/2)·||Δ_c,max||² + (σ_a/2)·||W_a^∗||²·||S_a||² + σ_a·K_I^T·K_I·||W_c^∗||²·||S_c||² + ||ς_a||² + ||ε_a||²    (62)

Moreover, to ensure κ > 0, the following conditions are given:

σ_c·Λ^T·Λ − 2σ_a·K_I^T·K_I·||S_c||² > 0,  σ_a − 1 > 0    (63)

Then, the main result of the paper can be obtained.

Theorem 1. Consider the AHV-VGI with the proposed reinforcement learning control scheme. If Assumptions 1-3 are satisfied, the closed-loop signals V, h, γ, α, Q, W̃_c and W̃_a are semi-globally bounded. Furthermore, the error signals z_V, z_h, z_γ, z_α, z_Q, W̃_c and W̃_a will eventually remain within the compact sets Ω_zV, Ω_zh, Ω_zγ, Ω_zα, Ω_zQ, Ω_W̃c and Ω_W̃a, respectively, which are given as follows:

Ω_zV = {z_V ∈ R : |z_V| ≤ A_zV}
Ω_zh = {z_h ∈ R : |z_h| ≤ A_zh}
Ω_zγ = {z_γ ∈ R : |z_γ| ≤ A_zγ}
Ω_zα = {z_α ∈ R : |z_α| ≤ A_zα}
Ω_zQ = {z_Q ∈ R : |z_Q| ≤ A_zQ}    (64)
Ω_W̃c = {W̃_c ∈ R^Nc : ||W̃_c|| ≤ sqrt(2(L(0) + ξ/κ))}
Ω_W̃a = {W̃_a ∈ R^Na : ||W̃_a|| ≤ sqrt(2(L(0) + ξ/κ))}

where N_c and N_a are the numbers of nodes of the critic network and the actor network, respectively, and ξ and κ are positive constants.

Proof. With Assumption 3 satisfied, one has |z_V(0)| < A_zV, |z_h(0)| < A_zh, |z_γ(0)| < A_zγ, |z_α(0)| < A_zα, |z_Q(0)| < A_zQ, and the following inequality is established from (60):

L(t) ≤ N(t) = (L(0) − ξ/κ)·e^{−κt} + ξ/κ < +∞    (65)

which indicates that none of the prescribed performances |z_V| ≤ A_zV, |z_h| ≤ A_zh, |z_γ| ≤ A_zγ, |z_α| ≤ A_zα, |z_Q| ≤ A_zQ is violated. Further, considering the inequalities

(1/2)·log(A_zV²/(A_zV² − z_V²(t))) ≤ N(t)
(1/2)·log(A_zh²/(A_zh² − z_h²(t))) ≤ N(t)    (66)

the transient and steady-state tracking performances can be described as

|z_V(t)| ≤ sqrt(1 − e^{−2N(t)})·A_zV,  lim_{t→∞} |z_V(t)| ≤ sqrt(1 − e^{−2ξ/κ})·A_zV    (67)
|z_h(t)| ≤ sqrt(1 − e^{−2N(t)})·A_zh,  lim_{t→∞} |z_h(t)| ≤ sqrt(1 − e^{−2ξ/κ})·A_zh

From (65), there is (1/2)·W̃_a^T·W̃_a ≤ L(0) + ξ/κ and (1/2)·W̃_c^T·W̃_c ≤ L(0) + ξ/κ. Therefore, one can also obtain

||W̃_a||² ≤ 2(L(0) + ξ/κ),  ||W̃_c||² ≤ 2(L(0) + ξ/κ)    (68)

Remark 4. From (67), the velocity tracking error z_V and the altitude tracking error z_h exponentially converge into the sets Ω_zV and Ω_zh, respectively. The steady-state tracking error can be reduced by increasing σ_c, σ_a, k_V, k_h, k_γ, k_α, k_Q and reducing K_I. However, such an operation would increase the amplitude of the control signals. Hence, the capability of the actuators should be taken into consideration when tuning the parameters.

Remark 5. The core of the reinforcement learning control strategy is exchanging part of the real-time performance for the optimal disturbance estimation performance. Hence, the real-time issue is a main obstacle to widespread use of the reinforcement learning method. With the development of neural networks, this problem has been preliminarily solved. Nowadays, neural networks are usually implemented on dedicated hardware in practical engineering, such as FPGAs, GPUs or ASICs. By utilizing such hardware, the processing speed and control ability of neural networks have been largely improved, which makes neural networks implementable in practical engineering.

5. Simulation

In this section, a numerical simulation study is given to demonstrate the superiority and effectiveness of the proposed control strategy. The initial states are V(0) = 7.0 Mach, h(0) = 85,000 ft, γ(0) = 0°, α(0) = 0° and Q(0) = 0°/s. The parameters of the control system are set as k_V = 3, k_h = 5.1, k_γ = 2.4, k_α = 10, k_Q = 5.5. For the tracking differentiator, the parameters are chosen as R = 200, δ = 0.01, a = 0.5, b = 1.14. For reinforcement learning, the parameters are set as follows: the numbers of hidden nodes of the critic and actor networks are 20 and 18, respectively; the learning rates of the actor NN σ_a and the critic NN σ_c are both 0.05; the weights of the actor NN and the critic NN are randomly initialized in [−1, 1]; for S_a(Z_a) and S_c(Z_c), the center parameters are chosen as either 1 or −1, and the variances are chosen as η_a = 3600 and η_c = 250; P in (13) is given as P = 0.1·I, where I is the identity matrix; K_I in (26) is set as K_I = [10, 10, 10, 10]^T. The velocity command varies from 7 to 7.3 Mach and the altitude command varies from 85,000 to 95,500 ft. The controller parameters are k_V = 6, k_h = 6, k_γ = 2.5, k_α = 10 and k_Q = 5. The prescribed tracking performances are given as |z_V| < 0.01 Mach/s and |z_h| < 40 ft. Also, the state and control input constraints are |γ| < 0.5°, |α| < 3°, |Q| < 3°/s, φ ∈ [0, 1] and δe ∈ [−15°, 15°]. The external disturbances for the AHV-VGI system are set as d_V = 2 ft/s, d_γ = 0.01°, d_α = 0.02°, d_Q = 0.2°/s at 120 < t ≤ 150 s, and d_V = 2cos(0.2t) ft/s, d_γ = 0.01sin(0.2t)°, d_α = 0.02sin(0.2t)°, d_Q = 0.2sin(0.3t)°/s at 170 < t ≤ 250 s. The aerodynamic coefficients are assumed to have 5% uncertainty. Finally, define w_c1 = [w_c11, ..., w_c51]^T and w_a1 = [w_a11, ..., w_a51]^T as the weights from all the inputs to the first hidden node of the critic network and the actor network, respectively. With the controller in [19] employed for comparison, the simulation results are shown in Figs. 4-18.

Fig. 4. Altitude.
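The disturbance schedule and the main tuning parameters quoted above can be collected in a small helper, which may be convenient when reproducing the scenario. Only the numbers come from the text; the function name, units handling and dictionary layout are illustrative assumptions.

```python
import numpy as np

def external_disturbances(t):
    """Disturbance schedule from Section 5 (ft/s for d_V; deg, deg/s otherwise)."""
    dV = dgam = dalp = dQ = 0.0
    if 120.0 < t <= 150.0:
        dV, dgam, dalp, dQ = 2.0, 0.01, 0.02, 0.2
    elif 170.0 < t <= 250.0:
        dV   = 2.0 * np.cos(0.2 * t)
        dgam = 0.01 * np.sin(0.2 * t)
        dalp = 0.02 * np.sin(0.2 * t)
        dQ   = 0.2 * np.sin(0.3 * t)
    return dV, dgam, dalp, dQ

# Controller, tracking-differentiator and learning parameters as stated in the text.
GAINS     = dict(k_V=6.0, k_h=6.0, k_gamma=2.5, k_alpha=10.0, k_Q=5.0)
TD_PARAMS = dict(R=200.0, delta=0.01, a=0.5, b=1.14)
RL_PARAMS = dict(sigma_a=0.05, sigma_c=0.05, K_I=np.full(4, 10.0))
```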

The altitude and velocity tracking performance is shown in Figs. 4-7. The outputs of the AHV-VGI track the reference signals quickly and stably, and the given tracking error constraints are satisfied. Furthermore, compared with the controller in [19], smaller tracking errors are achieved by the proposed controller. Figs. 8-10 show the curves of the flight path angle γ, the angle of attack α and the pitch rate Q, respectively. It can be seen that all the states are regulated within the desired ranges. Figs. 11-12 illustrate the curves of the fuel equivalence ratio φ and the elevator deflection δe, respectively; both control inputs remain within the given ranges. The disturbances and their estimates are shown in Figs. 13-15; the disturbances are estimated in real time by the neural networks. Figs. 16-18 show the weight updating of the critic and actor networks, respectively; the weights of the two networks are adaptively adjusted during the flight. Figs. 13-18 demonstrate the effectiveness of the reinforcement learning scheme. Hence, the simulation results demonstrate that the BLF-based reinforcement learning controller guarantees that the system achieves all the control objectives analyzed in this paper.

Fig. 5. Altitude tracking error.
Fig. 6. Velocity.
Fig. 7. Velocity tracking error.
Fig. 8. Flight path angle.
Fig. 9. Angle of attack.
Fig. 10. Pitch angle rate.
Fig. 11. Fuel equivalency ratio.
Fig. 12. Elevator deflection.
Fig. 13. Disturbance Δ_V.
Fig. 14. Disturbance Δ_γ.
Fig. 15. Disturbance Δ_α.

6. Conclusion

To solve the signal tracking problem with tracking performance constraints for the AHV-VGI, a BLF-based reinforcement learning controller is proposed in this paper. The proposed reinforcement learning method with an actor-critic structure can effectively approximate the unknown disturbances and uncertainties in the flight control system. By constructing and analyzing the BLFs of the flight control system, the prescribed tracking performance is guaranteed theoretically. Tracking differentiators are applied to avoid the "explosion of terms" problem. The simulation results illustrate the effectiveness and advantages of the proposed control strategy. Future work will be divided into three parts. First, controller design under time-varying tracking error and state constraints for the AHV-VGI will be considered. Second, motivated by [37-40], the coupling between the variable geometry inlet and the AHV will be studied further, and controller design under this circumstance will be addressed. Third, the structure of the proposed control strategy will be simplified by using a novel backstepping strategy without virtual controllers.

Declaration of competing interest

None declared.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61833016 and 61873295) and the Shanghai Aerospace Science and Technology Innovation Fund (SAST2017-096).

Fig. 16. Disturbance Δ_Q.

Appendix. (Table 1)

Table 1. Fitting coefficients in aerodynamic forces.

26 27

92 93

28

Parameters

Values

Parameters

29

C lα

−0.2416

C lα

C lMa C l0 C LMa C L0

30 31 32 33 34 35 36

2

Values 0.0633

−5.2380

2 C lMa

0.1598

37.5193



0.0157

δe

0.0066

L

5.45 × 10−5

CL

0.0046



D

2 Cα

D

3.58 × 10−4

δe2

CD

CD

4.37 × 10−5

1.28 × 10−4

94 95 96 97 98 99 100

δe

−1.97 × 10−10

CD

α δe

9.78 × 10−5

C 0D

0.0133

103

101 102

Ma CD

−5.32 × 10−4

C Tα

0.0328

C TMa

0.0026

−0.152

104

C T0

φα

39

CT

0.3252

105

40

CT

−0.703

φ

CT

8.9227

106

41

α CM

0.0064

Ma CM

−0.0022

107

42

C Me

δ

−0.014

0 CM

0.051

108

−1.317 × 10−4

109

1.844 × 10−5

110

37 38

φ Ma

C Lα,l

9.02 × 10−5

44

C L0,l

0.0012

45

Ma CD ,l

−1.0037 × 10−5

C 0D ,l

5.121 × 10−5

111

46

C Tα,l

0.0025

C TMa ,l

−0.0045

112

47

C T0 ,l

0.0596

α CM ,l

1.529 × 10−5

113

48

Ma CM ,l

−2.53 × 10−4

114

43

Fig. 17. Weights updating of w c1 .

4.303 × 10−5

C LMa ,l Cα D ,l

0 CM ,l

115

49 50

References


Fig. 18. Weights updating of w a1 .

[1] K. Bilimoria, D. Schmidt, Integrated development of the equations of motion for elastic hypersonic flight vehicles, J. Guid. Control Dyn. 18 (1) (1995) 73–81, https://doi.org/10.2514/3.56659. [2] J. Parker, A. Serrani, S. Yurkovich, M. Bolender, D. Doman, Control-oriented modeling of an air-breathing hypersonic vehicle, J. Guid. Control Dyn. 30 (3) (2007) 856–869, https://doi.org/10.2514/1.27830. [3] S. Zhang, Q. Wang, C. Dong, Extended state observer based control for generic hypersonic vehicles with nonaffine-in-control character, ISA Trans. 80 (2018) 127–136, https://doi.org/10.1016/j.isatra.2018.05.020. [4] Y. Shen, W. Huang, T. Zhang, L. Yan, Parametric modeling and aerodynamic optimization of expert configuration at hypersonic speeds, Aerosp. Sci. Technol. 84 (2019) 641–649, https://doi.org/10.1016/j.ast.2018.11.007. [5] T. Zhang, Z. Wang, W. Huang, L. Yan, Parameterization and optimization of hypersonic-gliding vehicle configurations during conceptual design, Aerosp. Sci. Technol. 58 (2016) 225–234, https://doi.org/10.1016/j.ast.2016.08.020. [6] W. Zhang, W. Chen, W. Yu, Entry guidance for high-l/d hypersonic vehicle based on drag-vs-energy profile, ISA Trans. 83 (2018) 176–188, https:// doi.org/10.1016/j.isatra.2018.08.012.


[7] A. Roenneke, A. Markl, Re-entry control to a drag-vs-energy profile, J. Guid. Control Dyn. 17 (5) (1994) 916–920, https://doi.org/10.2514/3.21290. [8] J. Ge, P. Frank, C. Lin, Robust H state feedback control for linear systems with state delay and parameter uncertainty, Automatica 32 (8) (1996) 1183–1185, https://doi.org/10.1016/0005-1098(96)00053-2. [9] Q. Zong, J. Wang, B. Tian, Y. Tao, Quasi-continuous high-order sliding mode controller and observer design for flexible hypersonic vehicle, Aerosp. Sci. Technol. 27 (2013) 127–137, https://doi.org/10.1016/j.ast.2012.07.004. [10] G. Wu, X. Meng, F. Wang, Improved nonlinear dynamic inversion control for a flexible air-breathing hypersonic vehicle, Aerosp. Sci. Technol. 78 (2018) 734–743, https://doi.org/10.1016/j.ast.2018.04.036. [11] Q. Hu, Y. Meng, C. Wang, Y. Zhang, Adaptive backstepping control for airbreathing hypersonic vehicles with input nonlinearities, Aerosp. Sci. Technol. 73 (2018) 289–299, https://doi.org/10.1016/j.ast.2017.12.001. [12] X. Bu, X. Wu, R. Zhang, Z. Ma, J. Huang, Tracking differentiator design for the robust backstepping control of a flexible air-breathing hypersonic vehicle, J. Franklin Inst. 352 (4) (2015) 1739–1765, https://doi.org/10.1016/j.jfranklin. 2015.01.014. [13] H. An, Q. Wu, C. Wang, Differentiator based full-envelope adaptive control of air-breathing hypersonic vehicles, Aerosp. Sci. Technol. 82–83 (2018) 312–322, https://doi.org/10.1016/j.ast.2018.09.032. [14] Z. Guo, J. Chang, J. Guo, J. Zhou, Adaptive twisting sliding mode algorithm for hypersonic reentry vehicle attitude control based on finite-time observer, ISA Trans. 77 (2018) 20–29, https://doi.org/10.1016/j.isatra.2018.04.001. [15] J. Sun, S. Xu, S. Song, X. Dong, Finite-time tracking control of hypersonic vehicle with input saturation, Aerosp. Sci. Technol. 71 (2017) 272–284, https://doi.org/ 10.1016/j.ast.2017.09.036. [16] J. Niu, F. Chen, G. Tao, Nonlinear fuzzy fault-tolerant control of hypersonic flight vehicle with parametric uncertainty and actuator fault, Nonlinear Dyn. 92 (2018) 1299–1315, https://doi.org/10.1007/s11071-018-4127-z. [17] X. Bu, G. He, K. Wang, Tracking control of air-breathing hypersonic vehicles with non-affine dynamics via improved neural back-stepping design, ISA Trans. 75 (2018) 88–100, https://doi.org/10.1016/j.isatra.2018.02.010. [18] S. Macheret, M. Shneider, R.B. Miles, Scramjet inlet control by off-body energy addition: a virtual cowl, AIAA J. 42 (11) (2004) 2294–2302, https://doi.org/10. 2514/1.3997. [19] L. Dou, P. Su, Q. Zong, Z. Ding, Fuzzy disturbance observer-based dynamic surface control for air-breathing hypersonic vehicle with variable geometry inlets, IET Control Theory Appl. 12 (1) (2018) 10–19, https://doi.org/10.1049/iet-cta. 2017.0742. [20] L. Dou, P. Su, Z. Ding, Modeling and nonlinear control for air-breathing hypersonic vehicle with variable geometry inlet, Aerosp. Sci. Technol. 67 (2017) 422–432, https://doi.org/10.1016/j.ast.2017.04.024. [21] L. Dou, J. Gao, Q. Zong, Z. Ding, Modeling and switching control of air-breathing hypersonic vehicle with variable geometry inlet, J. Franklin Inst. 355 (15) (2018) 6904–6926, https://doi.org/10.1016/j.jfranklin.2018.07.007. [22] H. An, B. Fidan, Q. Wu, C. Wang, X. Cao, Sliding mode differentiator based tracking control of uncertain nonlinear systems with application to hypersonic flight, Asian J. Control 21 (1) (2019) 143–155, https://doi.org/10.1002/asjc.1932. [23] H. An, Q. Wu, H. Xia, C. Wang, X. 
Cao, Adaptive controller design for a switched model of air-breathing hypersonic vehicles, Nonlinear Dyn. 94 (2018) 1851–1866, https://doi.org/10.1007/s11071-018-4461-1. [24] D. Liu, H. Javaherian, O. Kovalenko, T. Huang, Adaptive critic learning techniques for engine torque and air–fuel ratio control, IEEE Trans. Syst. Man Cybern., Part B, Cybern. 38 (4) (2008) 988–993, https://doi.org/10.1109/TSMCB. 2008.922019.

[25] Y. Ouyang, W. He, X. Li, Reinforcement learning control of a singlelink flexible robotic manipulator, IET Control Theory Appl. 11 (9) (2017) 1426–1433, https://doi.org/10.1049/iet-cta.2016.1540. [26] H. Jiang, H. Zhang, Y. Cui, G. Xiao, Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method, Neurocomputing 273 (2018) 68–77, https:// doi.org/10.1016/j.neucom.2017.07.058. [27] Y. Zhou, E. Kampen, Q. Chu, Nonlinear adaptive flight control using incremental approximate dynamic programming and output feedback, J. Guid. Control Dyn. 40 (2017) 493–496, https://doi.org/10.2514/1.G001762. [28] C. Mu, Z. Ni, C. Sun, H. He, Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming, IEEE Trans. Neural Netw. Learn. Syst. 28 (3) (2017) 584–598, https://doi.org/10.1109/TNNLS.2016.2516948. [29] D. Prokhorov, R. Santiago, D. Wunsch, Adaptive critic designs: a case study for neurocontrol, Neural Netw. 8 (9) (1995) 1367–1372, https://doi.org/10.1016/ 0893-6080(95)00042-9. [30] Z. Ni, H. He, X. Zhong, D. Prokhorov, Model-free dual heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst. 26 (8) (2015) 1834–1839, https://doi.org/10.1109/TNNLS.2015.2424971. [31] D. Wang, D. Liu, Q. Wei, D. Zhao, N. Jin, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica 48 (8) (2012) 1825–1832, https://doi.org/10.1016/j.automatica.2012. 05.049. [32] H. An, H. Xia, C. Wang, Barrier lyapunov function-based adaptive control for hypersonic flight vehicles, Nonlinear Dyn. 88 (3) (2017) 1833–1853, https:// doi.org/10.1007/s11071-017-3347-y. [33] K. Tee, B. Ren, S. Ge, Control of nonlinear systems with time-varying output constraints, Automatica 47 (2011) 2511–2516, https://doi.org/10.1016/j. automatica.2011.08.044. [34] X. Bu, Guaranteeing prescribed output tracking performance for air-breathing hypersonic vehicles via non-affine back-stepping control design, Nonlinear Dyn. 91 (1) (2018) 525–538, https://doi.org/10.1007/s11071-017-3887-1. [35] W. Chang, S. Tong, Adaptive fuzzy tracking control design for permanent magnet synchronous motors with output constraint, Nonlinear Dyn. 87 (2017) 291–302, https://doi.org/10.1007/s11071-016-3043-3. [36] Y. Liu, S. Lu, S. Tong, X. Chen, C. Chen, D. Li, Adaptive control-based barrier lyapunov functions for a class of stochastic nonlinear systems with full state constraints, Automatica 87 (2018) 83–93, https://doi.org/10.1016/j.automatica. 2017.07.028. [37] T. Zhang, Z. Wang, W. Huang, S. Li, A design approach of wide-speed-range vehicles based on the cone-derived theory, Aerosp. Sci. Technol. 71 (2017) 42–51, https://doi.org/10.1016/j.ast.2017.09.010. [38] S. Li, Z. Wang, W. Huang, J. Lei, S. Xu, Design and investigation on variable mach number waverider for a wide-speed range, Aerosp. Sci. Technol. 76 (2018) 291–302, https://doi.org/10.1016/j.ast.2018.01.044. [39] Z. Zhao, W. Huang, S. Li, T. Zhang, L. Yan, Variable mach number design approach for a parallel waverider with a wide-speed range based on the osculating cone theory, Acta Astronaut. 147 (2018) 163–174, https://doi.org/10.1016/j. actaastro.2018.04.008. [40] Z. Zhao, W. Huang, B. Yan, L. Yan, T. Zhang, R. Moradi, Design and high speed aerodynamic performance analysis of vortex lift waverider with a wide-speed range, Acta Astronaut. 151 (2018) 848–863, https://doi.org/10.1016/j.actaastro. 2018.07.034.
