Aerospace Science and Technology
Contents lists available at ScienceDirect
www.elsevier.com/locate/aescte
Barrier Lyapunov function based reinforcement learning control for air-breathing hypersonic vehicle with variable geometry inlet
Chen Liu a,b,∗, Chaoyang Dong b, Zhijie Zhou c, Zhaolei Wang d
a Science and Technology on Special System Simulation Laboratory, Beijing Simulation Center, Beijing 100854, China
b School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
c Department of Automation, High-Tech Institute of Xi’an, Xi’an, 710025, China
d Beijing Aerospace Automatic Control Institute, Beijing, 100854, China
Article info
Article history: Received 6 April 2019; Received in revised form 25 July 2019; Accepted 3 November 2019; Available online xxxx
Keywords: Hypersonic vehicle; Variable geometry inlet; Reinforcement learning; Barrier Lyapunov function; Performance-guaranteed tracking

Abstract
Based on barrier Lyapunov functions, a reinforcement learning control method is proposed for air-breathing hypersonic vehicles with variable geometry inlet (AHV-VGI) subject to external disturbances and diversified uncertainties. The longitudinal dynamics of the AHV-VGI are transformed into strict feedback form, and controllers for the velocity and altitude subsystems are designed separately. Following the reinforcement learning strategy, two radial basis function (RBF) neural networks are applied to estimate the “total disturbances” in the flight control system: the actor network generates the estimate of the disturbance, and the critic network evaluates the estimation accuracy. Prescribed tracking performance and state constraints are guaranteed by introducing barrier Lyapunov functions (BLFs). Tracking differentiators are used to generate the derivatives of the virtual controllers in the backstepping design process. Simulation results illustrate the effectiveness and advantages of the proposed control strategy. © 2019 Elsevier Masson SAS. All rights reserved.
1. Introduction
In recent decades, air-breathing hypersonic vehicles (AHVs) have attracted considerable attention, and related research has become increasingly important in both military and civil fields [1–5]. However, the AHV model is subject to high nonlinearity, strong coupling, unknown disturbances and parameter uncertainty, so controller design for AHVs is a difficult task. Moreover, since operating conditions vary significantly over the flight envelope, the dynamics of AHVs vary accordingly [6], which makes the control problem even more challenging. To solve the problem, a wide range of approaches have been investigated for AHV flight control. An early method is gain-scheduling control [7]. However, it requires a large number of controller gains to be designed and scheduled, so the design process is complex. Feedback linearization [8] is also an effective control approach for AHVs, but it needs an exact model of the vehicle. More recently, many advanced control strategies have been
* Corresponding author at: School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China. E-mail address: [email protected] (C. Liu).
https://doi.org/10.1016/j.ast.2019.105537
1270-9638/© 2019 Elsevier Masson SAS. All rights reserved.
developed for AHVs with uncertainty and external disturbance. In Ref. [9], a high-order sliding mode observer was designed to estimate the unmeasured states, and a quasi-continuous high-order sliding mode controller was proposed for AHVs to ensure stable signal tracking. In Ref. [10], an improved nonlinear dynamic inversion control method for flexible AHVs was proposed. Ref. [11] presented an adaptive backstepping control for AHVs with input nonlinearities by designing an input nonlinear pre-compensator. Observer-based control is another widely applicable method in AHV controller design, in which observers are introduced to compensate for the unknown disturbance and uncertainty [12]. In Ref. [13], an adaptive control strategy was proposed for AHVs to handle the time-varying uncertain coefficients of aerodynamic force and moment, actuator faults and flexible dynamics during a full-envelope hypersonic flight. In Ref. [14], the authors applied an adaptive finite-time observer to estimate the unknown states of the vehicle and presented an adaptive twisting sliding mode control approach for hypersonic reentry vehicles. By using a class of non-homogeneous disturbance observers, an adaptive fast terminal sliding mode controller was designed in Ref. [15] to ensure finite-time guidance law tracking of the AHV. Owing to the superior ability of neural networks and fuzzy systems to approximate nonlinear functions, intelligent control strategies have also been widely used
in AHV flight control. Ref. [16] designed a fault-tolerant control scheme based on a fuzzy logic system for AHVs with actuator gain-loss faults. In Ref. [17], the authors designed a new backstepping control approach for AHVs in which neural networks estimate the unknown dynamics. From the existing results, it is known that for an AHV with a fixed geometry inlet, the shock wave deviates from the scramjet lip during low Mach flight. The scramjet engine then captures insufficient air, so the thrust decreases accordingly [18]. To improve the flight performance of the vehicle, the AHV with variable geometry inlet (AHV-VGI) is studied. The VGI configuration can extend the velocity range effectively, and it is conducive to the acceleration control of the AHV. However, the movement of the translating cowl introduces new uncertainty into the flight control system, such as uncertain changes of the aerodynamic forces, aerodynamic moments and thrust. Ref. [19] established a fuzzy disturbance observer to reject the uncertainty caused by the movement of the translating cowl and designed a dynamic surface control strategy. In Ref. [20], a longitudinal dynamic model for AHV-VGI was established and sliding mode controllers based on a fuzzy logic system were designed. In Ref. [21], a multi-mode model was established and a switching control method based on an RBF neural network was proposed for AHV-VGI. Ref. [22] proposed a performance-guaranteed adaptive backstepping controller design method for a class of nonlinear systems with uncertainties and disturbances, and applied it to AHV-VGI effectively. In Ref. [23], the authors proposed two adaptive controllers for AHV-VGI to ensure that the vehicle can track the velocity and altitude reference signals stably. At present, the study of AHV-VGIs is still insufficient, and the control system design of AHV-VGI needs further research.
Reinforcement learning is a relatively new methodology for dealing with uncertain systems. Different from traditional supervised learning, reinforcement learning obtains the optimal action from information provided by the environment [24]. Ref. [25] designed a reinforcement learning controller for a single-link flexible manipulator to suppress the vibration caused by the flexible light-weight structure. Ref. [26] proposed a data-driven reinforcement learning method to design a robust controller for a class of uncertain nonlinear systems with completely unknown dynamics. An incremental approximate dynamic programming algorithm based on output feedback was presented for flight control in Ref. [27]. In Ref. [28], a data-driven supplementary control strategy for AHV tracking control based on adaptive dynamic programming was proposed. A main advantage of reinforcement learning control is that the controller is updated in real time when the system is affected by unknown disturbance and uncertainty [29–31]. Considering this online-learning and online-adjusting advantage, the reinforcement learning method is applied in this paper to solve the controller design problem of AHV-VGI. Furthermore, in practical flight control systems, signal tracking performance, actuator saturation and safety specifications impose constraints on system outputs or states. Once the constraints are violated, performance degradation, instability and even system damage may follow [32]. To solve state or output constraint problems, a credible solution is the barrier Lyapunov function (BLF)-based controller design method, which can guarantee the prescribed tracking performance [33,34]. BLFs have been widely used in practical engineering and theoretical study. By using a BLF, Ref. [32] proposed an adaptive control law to guarantee the tracking performance of velocity and altitude for hypersonic flight vehicles. Ref. [35] designed an adaptive fuzzy control scheme based on a BLF for permanent magnet synchronous motor systems to ensure the position tracking constraint. In Ref. [36], an adaptive control scheme was investigated for a class of nonlinear uncertain stochastic systems with constraints on all states. As far as the authors
Fig. 1. The structure of AHV-VGI.
Fig. 2. The schematic diagram of AHV with the translating cowl. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
know, there are few studies on the AHV-VGI with velocity and altitude tracking performance constraints, which motivates this paper. To further solve the signal tracking problem of AHV-VGI with tracking performance constraints, a reinforcement learning control strategy based on barrier Lyapunov functions is proposed in this paper to achieve accurate velocity and altitude tracking. In the proposed control method, all the parameter uncertainties, the uncertainties introduced by the variable geometry inlet and the external disturbances of AHV-VGI are estimated by a reinforcement learning strategy, which ensures a superior disturbance estimation performance. Then, by constructing and analyzing the BLFs for the closed-loop system, the prescribed tracking performance constraints are guaranteed theoretically. Also, tracking differentiators are used to generate the time derivatives of the virtual control signals. Compared with traditional filters, the tracking differentiator has a simpler structure and gives a better approximation of the time derivatives of the original signals. These advantages make the proposed method easy to apply in practical engineering.

The rest of this paper is organized as follows. The longitudinal model and preliminaries are stated in Section 2. Reinforcement learning controller design methods based on barrier Lyapunov functions for the velocity and altitude subsystems are presented in Section 3. In Section 4, the stability and tracking performance of the closed-loop system are analyzed. A numerical example is provided in Section 5 to verify the effectiveness and advantage of the proposed control strategy. Section 6 is the conclusion.
2. Problem description and preliminaries
2.1. Longitudinal model of AHV-VGI
The structure of the AHV-VGI studied in this paper is shown in Fig. 1. The scramjet inlet has a translating cowl whose position can be adjusted according to the Mach number and the angle of attack. Specifically, as shown in Fig. 2, when the AHV cruises at a
relatively high Mach number, a forebody oblique shock (blue dashed line) forms ahead of the vehicle, with shock wave angle $\theta_s$. When the free stream (red solid line) meets the oblique shock, it turns parallel to the forebody. Hence, the flow turn angle is $\delta_s = \alpha + \tau_{1l}$. When the vehicle cruises at a low Mach number, the shock wave angle $\theta_s$ increases. If the cowl of the scramjet inlet is fixed, the shock wave deviates from the lip of the scramjet, so the engine cannot capture enough air ($D_1$ is the actual captured area). If the cowl can move to the position $L_1$, the shock wave is completely sealed by the cowl ($D_1 + D_2$ is the actual captured area). According to the result of [19], with the forebody length $L_f = 47$ ft, the height of the engine inlet $h = 3.5$ ft and the forebody angle $\tau_{1l} = 3.5^\circ$, $D_1$ and $D_2$ can be calculated as

$$D_1 = \frac{h\sin(\theta_s)\cos(\tau_{1l})}{\sin(\theta_s - \alpha - \tau_{1l})} \tag{1}$$

$$D_2 = \frac{h + L_f\tan(\tau_{1l})\sin(\theta_s)}{\sin(\theta_s - \alpha)} \tag{2}$$
Also, the optimal position of the translating cowl is
$$L_1 = L_f - \left(L_f\tan(\tau_{1l}) + h\right)\cot(\theta_s - \alpha) \tag{3}$$
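As a rough numerical illustration of the geometry above, the following minimal sketch evaluates the captured area $D_1$ and the optimal cowl position $L_1$ for a given shock wave angle. The function name `capture_geometry` and the sample values $\theta_s = 12^\circ$, $\alpha = 2^\circ$ are assumptions chosen only for illustration; $D_2$ is left out of the sketch.

```python
import math

def capture_geometry(theta_s, alpha, L_f=47.0, h=3.5, tau_1l=math.radians(3.5)):
    """Return (D1, L1) for a shock angle theta_s and angle of attack alpha.

    Angles are in radians, lengths in ft. The defaults are the forebody
    length, inlet height and forebody angle quoted in the text.
    """
    # Captured area with a fixed cowl
    D1 = h * math.sin(theta_s) * math.cos(tau_1l) / math.sin(theta_s - alpha - tau_1l)
    # Optimal position of the translating cowl (cot x = 1 / tan x)
    L1 = L_f - (L_f * math.tan(tau_1l) + h) / math.tan(theta_s - alpha)
    return D1, L1

# Hypothetical operating point, not taken from the paper
D1, L1 = capture_geometry(theta_s=math.radians(12.0), alpha=math.radians(2.0))
```

The shock angle is treated here as a known input; in practice it depends on the Mach number and the flow turn angle.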
However, in practical flight the optimal elongation distance cannot be acquired precisely. Hence, the moving cowl introduces additional aerodynamic uncertainties for the AHV-VGI, which makes the control system design more challenging. In this sense, the longitudinal dynamics of the AHV-VGI are formulated by the following equations [19]:

$$
\begin{aligned}
\dot V &= (T\cos\alpha - D)/m - g\sin\gamma + d_V \\
\dot h &= V\sin\gamma \\
\dot\gamma &= (T\sin\alpha + L)/(mV) - g\cos\gamma/V + d_\gamma \\
\dot\alpha &= Q - \dot\gamma + d_\alpha \\
\dot Q &= M/I_{yy} + d_Q
\end{aligned} \tag{4}
$$

where $V$ is velocity, $h$ is altitude, $\gamma$ is flight path angle, $\alpha$ is angle of attack, $Q$ is pitch rate, $m$ is mass, $I_{yy}$ is moment of inertia and $g$ is the acceleration of gravity. $d_i$ ($i = V, \gamma, \alpha, Q$) denote unknown external disturbances. $L$, $D$, $T$, $M$ are lift, drag, thrust and pitching moment, respectively. In the AHV-VGI model, the change of the aerodynamic forces caused by the translating cowl must be considered. By curve fitting, the aerodynamic forces are given as

$$
\begin{aligned}
L &\approx \bar q S (C_L + \tilde C_L) \\
D &\approx \bar q S (C_D + \tilde C_D) \\
T &\approx \bar q (C_T + C_T^{\phi}\cdot\phi + \tilde C_T) \\
M &\approx \bar q S \bar c (C_M + \tilde C_M) + z_T T
\end{aligned} \tag{5}
$$

where $C_L$, $C_D$, $C_T$ and $C_M$ are the aerodynamic lift coefficient, drag coefficient, thrust coefficient and pitching moment coefficient, respectively. $\bar q = \frac{1}{2}\rho V^2$ denotes the dynamic pressure. $\rho$, $S$, $\bar c$, $z_T$ are air density, reference area, aerodynamic chord and thrust moment arm, respectively. $\phi$ is the fuel equivalence ratio. $\tilde C_L$, $\tilde C_D$, $\tilde C_T$ and $\tilde C_M$ are introduced by the translating cowl. The aerodynamic parameters mentioned above are given as follows, and the detailed coefficient values can be found in the Appendix (Table 1).

$$
\begin{aligned}
C_L &= C_L^{\alpha}\alpha + C_L^{Ma} Ma + C_L^{\delta_e}\delta_e + C_L^{0} \\
C_D &= C_D^{\alpha}\alpha + C_D^{\alpha^2}\alpha^2 + C_D^{\delta_e}\delta_e + C_D^{\delta_e^2}\delta_e^2 + C_D^{Ma} Ma + C_D^{0} \\
C_M &= C_M^{\alpha}\alpha + C_M^{Ma} Ma + C_M^{\delta_e}\delta_e + C_M^{0} \\
C_T &= C_T^{\alpha}\alpha + C_T^{Ma} Ma + C_T^{0} \\
C_T^{\phi} &= C_T^{\phi\alpha}\alpha + C_T^{\phi Ma} Ma + C_T^{\phi 0} \\
\tilde C_L &= C_L^{l}\cdot\hat l = (C_{L,l}^{\alpha}\alpha + C_{L,l}^{Ma} Ma + C_{L,l}^{0})\cdot\hat l \\
\tilde C_D &= C_D^{l}\cdot\hat l = (C_{D,l}^{\alpha}\alpha + C_{D,l}^{Ma} Ma + C_{D,l}^{0})\cdot\hat l \\
\tilde C_T &= C_T^{l}\cdot\hat l = (C_{T,l}^{\alpha}\alpha + C_{T,l}^{Ma} Ma + C_{T,l}^{0})\cdot\hat l \\
\tilde C_M &= C_M^{l}\cdot\hat l = (C_{M,l}^{\alpha}\alpha + C_{M,l}^{Ma} Ma + C_{M,l}^{0})\cdot\hat l
\end{aligned} \tag{6}
$$

The estimated value of the optimal elongation distance $\hat l$ is a function of the Mach number $Ma$ and $\alpha$. Similarly, by curve fitting, the expression of $\hat l$ is

$$\hat l \approx C_l^{\alpha}\alpha + C_l^{\alpha^2}\alpha^2 + C_l^{Ma} Ma + C_l^{0} \tag{7}$$

where the detailed parameter values can also be found in the Appendix. The model of AHV-VGI therefore has two outputs, the velocity $V$ and the altitude $h$, and two inputs, the fuel equivalence ratio $\phi$ and the elevator deflection $\delta_e$.

2.2. Strict feedback form

The longitudinal model of AHV-VGI can be rewritten in strict feedback form as

$$
\begin{aligned}
\dot V &= g_V\cdot\phi + f_V + \Delta_V \\
\dot h &= V\sin\gamma \\
\dot\gamma &= g_\gamma\cdot\alpha + f_\gamma + \Delta_\gamma \\
\dot\alpha &= g_\alpha\cdot Q + f_\alpha + \Delta_\alpha \\
\dot Q &= g_Q\cdot\delta_e + f_Q + \Delta_Q
\end{aligned} \tag{8}
$$

where

$$
\begin{aligned}
g_V &= \bar q\cdot C_T^{\phi}\cdot\cos\alpha/m \\
f_V &= (\bar q\cdot C_T\cdot\cos\alpha - \bar q S\cdot C_D)/m - g\sin\gamma \\
g_\gamma &= \bar q S\cdot C_L^{\alpha}/(mV) \\
f_\gamma &= \bar q S\cdot(C_L^{Ma}\cdot Ma + C_L^{\delta_e}\cdot\delta_e + C_L^{0} - C_L^{\alpha}\cdot\gamma)/(mV) + T\cdot\sin\alpha/(mV) - g\cdot\cos\gamma/V \\
g_\alpha &= 1;\quad f_\alpha = -g_\gamma\cdot\alpha - f_\gamma \\
g_Q &= \bar q S\bar c\cdot C_M^{\delta_e}/I_{yy} \\
f_Q &= [\bar q S\bar c\cdot(C_M^{\alpha}\cdot\alpha + C_M^{Ma}\cdot Ma + C_M^{0}) + z_T\bar q\cdot(C_T + C_T^{\phi}\cdot\phi)]/I_{yy}
\end{aligned} \tag{9}
$$

Then, the nonlinear parts introduced by the variable geometry inlet, the parameter uncertainties and the external disturbances of the vehicle are all treated as “total disturbances” $\Delta_i$ ($i = V, \gamma, \alpha, Q$) and expressed as

$$
\begin{aligned}
\Delta_V &= \Delta g_V\cdot\phi + \Delta f_V + (f_V^{l} + \Delta f_V^{l})\cdot\hat l + d_V \\
\Delta_\gamma &= \Delta g_\gamma\cdot\alpha + \Delta f_\gamma + (f_\gamma^{l} + \Delta f_\gamma^{l})\cdot\hat l + d_\gamma \\
\Delta_\alpha &= \Delta f_\alpha + (f_\alpha^{l} + \Delta f_\alpha^{l})\cdot\hat l + d_\alpha \\
\Delta_Q &= \Delta g_Q\cdot\delta_e + \Delta f_Q + (f_Q^{l} + \Delta f_Q^{l})\cdot\hat l + d_Q
\end{aligned} \tag{10}
$$

where $f_V^{l}$, $f_\gamma^{l}$, $f_\alpha^{l}$, $f_Q^{l}$ are the uncertainties introduced by the movable cowl, and $\Delta f_V^{l}$, $\Delta f_\gamma^{l}$, $\Delta f_\alpha^{l}$, $\Delta f_Q^{l}$ denote the uncertainties of the aerodynamic coefficients, which are described as
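$$
\begin{aligned}
f_V^{l} &= (\bar q C_T^{l}\cos\alpha - \bar q S C_D^{l})/m \\
f_\gamma^{l} &= \bar q S C_L^{l}/(mV) \\
f_\alpha^{l} &= -f_\gamma^{l} \\
f_Q^{l} &= \bar q (S\bar c C_M^{l} + z_T C_T^{l})/I_{yy}
\end{aligned} \tag{11}
$$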
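To make the structure of the longitudinal dynamics (4) concrete, the sketch below evaluates their right-hand side for given forces and moment. It is only an illustration under stated assumptions: the function name `longitudinal_rhs`, the argument layout and the default $g = 32.17$ ft/s² are hypothetical, and the forces $L$, $D$, $T$, $M$ are passed in directly rather than computed from the curve-fitted coefficients.

```python
import math

def longitudinal_rhs(state, forces, m, I_yy, g=32.17, d=(0.0, 0.0, 0.0, 0.0)):
    """Right-hand side of the longitudinal model, Eq. (4).

    state  = (V, h, gamma, alpha, Q)
    forces = (L, D, T, M)   lift, drag, thrust, pitching moment
    d      = (d_V, d_gamma, d_alpha, d_Q) external disturbances
    """
    V, h, gamma, alpha, Q = state
    L, D, T, M = forces
    d_V, d_gamma, d_alpha, d_Q = d
    V_dot = (T * math.cos(alpha) - D) / m - g * math.sin(gamma) + d_V
    h_dot = V * math.sin(gamma)
    gamma_dot = (T * math.sin(alpha) + L) / (m * V) - g * math.cos(gamma) / V + d_gamma
    alpha_dot = Q - gamma_dot + d_alpha
    Q_dot = M / I_yy + d_Q
    return V_dot, h_dot, gamma_dot, alpha_dot, Q_dot

# Hypothetical level-flight point: thrust balances drag, lift balances weight
m, I_yy, g = 300.0, 5.0e5, 32.17
rhs = longitudinal_rhs((7000.0, 85000.0, 0.0, 0.0, 0.0),
                       (m * g, 1000.0, 1000.0, 0.0), m, I_yy, g)
```

With balanced forces and no disturbance every derivative vanishes, which is a quick sanity check on the signs of the terms.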
Suppose that $V_c$ and $h_c$ are the reference velocity and altitude signals and that their time derivatives are bounded. The goal of this paper is to realize tracking control of the AHV-VGI, which suffers from parameter uncertainty, unknown external disturbance and nonlinearity caused by the translating cowl, with the prescribed tracking performance $|z_V| = |V - V_c| < A_{zV}$, $|z_h| = |h - h_c| < A_{zh}$ and the state constraints $|\gamma| < A_\gamma$, $|\alpha| < A_\alpha$, $|Q| < A_Q$, where $A_{zV}$, $A_{zh}$, $A_\gamma$, $A_\alpha$, $A_Q$ are positive constants. The following assumptions are made for the AHV-VGI.

Assumption 1. [19] The functions $g_i$ and their derivatives $\dot g_i$ are bounded, $i = V, \gamma, Q$.

Assumption 2. [32] The initial flight conditions of the AHV-VGI satisfy the prescribed performance and constraints, that is, $|z_V(0)| < A_{zV}$, $|z_h(0)| < A_{zh}$, $|\gamma(0)| < A_\gamma$, $|\alpha(0)| < A_\alpha$, $|Q(0)| < A_Q$.

Remark 1. Assumption 1 ensures that the control inputs are non-singular and bounded. The functions $g_i$, $i = V, \gamma, Q$, depend on the dynamic pressure, mass, Mach number, angle of attack and aerodynamic coefficients of the vehicle. During practical flight, all of these parameters are bounded and lie in a reasonable range. Therefore, Assumption 1 is acceptable. Assumption 2 means that the initial attitude angles of the vehicle should be adjusted to the initial values given by the guidance law, which is easy to achieve in practical flight. Hence, Assumption 2 is reasonable.
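Assumption 2 amounts to a simple check on the initial conditions. A minimal sketch of such a check is given below; the function name, the dictionary keys and the numerical bounds are hypothetical and serve only to illustrate the strict inequalities.

```python
def satisfies_assumption_2(initial, bounds):
    """Check |x(0)| < A_x for every constrained quantity.

    initial and bounds are dicts keyed by 'z_V', 'z_h', 'gamma', 'alpha', 'Q'.
    """
    return all(abs(initial[key]) < bounds[key] for key in bounds)

# Hypothetical bounds and initial values, for illustration only
bounds = {"z_V": 10.0, "z_h": 50.0, "gamma": 0.05, "alpha": 0.1, "Q": 0.2}
ok = satisfies_assumption_2(
    {"z_V": 2.0, "z_h": -20.0, "gamma": 0.0, "alpha": 0.03, "Q": 0.0}, bounds)
bad = satisfies_assumption_2(
    {"z_V": 10.0, "z_h": 0.0, "gamma": 0.0, "alpha": 0.0, "Q": 0.0}, bounds)
```

The inequality is strict: an initial error sitting exactly on a bound already violates the assumption, because the barrier Lyapunov functions used later are unbounded there.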
2.3. Preliminaries

To apply the reinforcement learning control scheme with the actor-critic structure, a long-term cost function is designed as

$$I(t) = \int_t^{\infty} e^{-\frac{m-t}{\psi}}\,\varphi(m)\,dm \tag{12}$$

where $\psi$ is a discount factor applied to discount the future cost, and $\varphi(t)$ is an instant cost function given as

$$\varphi(t) = Z^T P Z \tag{13}$$

where $P$ is a $5\times 5$ positive definite symmetric matrix, $Z = [z_V, z_h, z_\gamma, z_\alpha, z_Q]^T$, $z_\gamma = \gamma - \gamma_c$, $z_\alpha = \alpha - \alpha_c$, $z_Q = Q - Q_c$. $\gamma_c$, $\alpha_c$ and $Q_c$ are virtual controllers, which will be designed in the next section.

Owing to their superiority in approximating nonlinear functions, RBF neural networks are used to construct the reinforcement learning control strategy. For an arbitrary continuous function $f(Z): R^k \to R$, the following RBF neural network is given:

$$f_{nn}(Z) = W^T S(Z) \tag{14}$$

where $Z = [Z_1, Z_2, \cdots, Z_k]^T \in R^k$ is the input vector, $W = [w_1, w_2, \cdots, w_l]^T \in R^l$ is the weight vector with $l$ NN nodes, $l > 1$, and $S(Z) = [s_1(Z), s_2(Z), \cdots, s_l(Z)]^T$, where $s_i(Z)$ is a Gaussian function:

$$s_i(Z) = \exp\left[\frac{-(Z - \mu_i)^T (Z - \mu_i)}{\mu_k^2}\right],\quad i = 1, 2, \cdots, l \tag{15}$$

where $\mu_i = [\mu_{i1}, \mu_{i2}, \cdots, \mu_{ik}]^T$ is the center of the receptive field and $\mu_k$ is the width of the Gaussian function. It has been proved that for any continuous function $f(Z)$, where $Z$ lies within a compact set $\Omega_z \subset R^k$, the approximation error of an RBF neural network can be made arbitrarily small if the number of hidden neurons is large enough. Namely,

$$f(Z) = W^{*T} S(Z) + \varepsilon,\quad \forall Z \in \Omega_z \tag{16}$$

where $W^*$ is the optimal weight of the network and $\varepsilon$ is a bounded approximation error.
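The Gaussian RBF network (14)–(15) and the instant cost (13) can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a single shared width for all nodes; the function names and the sample centers are hypothetical.

```python
import numpy as np

def rbf_features(Z, centers, width):
    """Gaussian basis vector S(Z) of Eq. (15).

    Z: (k,) input, centers: (l, k) receptive-field centers, width: scalar.
    """
    diff = centers - Z                      # (l, k), one row per node
    return np.exp(-np.sum(diff ** 2, axis=1) / width ** 2)

def rbf_output(W, Z, centers, width):
    """Network output W^T S(Z) of Eq. (14) for a weight vector W of shape (l,)."""
    return float(W @ rbf_features(Z, centers, width))

def instant_cost(Z, P):
    """Instant cost Z^T P Z of Eq. (13)."""
    return float(Z @ P @ Z)

# Hypothetical two-node network on a two-dimensional input
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
S = rbf_features(np.array([0.0, 0.0]), centers, width=1.0)
y = rbf_output(np.array([2.0, 0.0]), np.array([0.0, 0.0]), centers, width=1.0)
cost = instant_cost(np.array([1.0, 2.0]), np.eye(2))
```

A node whose center coincides with the input has activation 1, and the activation decays with the squared distance scaled by the width.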
3. Controller design

The structure of the controller proposed in this paper is shown in Fig. 3. The controller contains four parts: the critic network, the actor network, the velocity subsystem controller and the altitude subsystem controller. The detailed design process is given in the remainder of this section.

3.1. Design of critic network

A critic network is designed to evaluate the estimation performance of the actor network. The critic network is constructed using an RBF neural network. In the ideal case, define $I = W_c^{*T} S_c(c_{in}) + \varepsilon_c$, where $W_c^*$ is the optimal weight of the critic network and $\varepsilon_c$ is a bounded approximation error. The input and output of the critic network are $c_{in} = [z_V, z_h, z_\gamma, z_\alpha, z_Q]^T$ and $c_{out} = \hat I$, respectively. Therefore, the function of the network can be expressed as $\hat I = \hat W_c^T S_c(c_{in})$, where $\hat W_c$ is the approximation of $W_c^*$ and $\tilde W_c = \hat W_c - W_c^*$. Combined with (12), the estimation error of the cost-to-go function becomes

$$\gamma(t) = \varphi(t) - \frac{1}{\psi}\hat I(t) + \dot{\hat I}(t) \tag{17}$$

As the constant $\psi \to \infty$, the approximation error of the cost-to-go function is rewritten as

$$\gamma(t) = \varphi(t) + \dot{\hat I}(t) = \varphi(t) + \nabla\hat I(t)\,\dot c_{in} \tag{18}$$

where $\nabla$ represents the gradient with respect to $c_{in}$. The updating law of the critic network can be described as

$$\dot{\hat W}_c = -\sigma_c \frac{\partial E_c}{\partial \hat W_c} \tag{19}$$

where $E_c = \frac{1}{2}\gamma^T\gamma$. Combined with (18), it can be obtained that

$$
\begin{aligned}
\dot{\hat W}_c &= -\sigma_c\gamma(t)\frac{\partial\gamma}{\partial W_c}
= -\sigma_c\gamma(t)\frac{\partial[\varphi(t) - (1/\psi)\hat I(t) + \dot{\hat I}(t)]}{\partial W_c} \\
&= -\sigma_c\gamma(t)\left(-\frac{1}{\psi}\frac{\partial\hat I}{\partial W_c} + \frac{\partial}{\partial W_c}\frac{\partial\hat I}{\partial c_{in}}\dot c_{in}\right) \\
&= -\sigma_c(\varphi(t) + \hat W_c^T\Lambda)\Lambda
\end{aligned} \tag{20}
$$

where $\sigma_c > 0$ is the learning rate of the critic network and $\Lambda = -(S_c/\psi) + \nabla S_c\,\dot c_{in}$. Then, a Lyapunov candidate function for the critic network is designed as

$$L_c = \frac{1}{2}\tilde W_c^T\tilde W_c \tag{21}$$

Substituting (20) into (21), the derivative of $L_c$ is

$$\dot L_c = \tilde W_c^T\dot{\tilde W}_c = \tilde W_c^T\dot{\hat W}_c = -\sigma_c\tilde W_c^T(\varphi(t) + \hat W_c^T\Lambda)\Lambda \tag{22}$$
As $\gamma \to 0$, $\varphi(t) = (I/\psi) - \dot I$. Then, one has

$$
\begin{aligned}
\varphi(t) &= W_c^{*T}\frac{S_c}{\psi} + \frac{\varepsilon_c}{\psi} - \nabla I\,\dot c_{in} \\
&= W_c^{*T}\frac{S_c}{\psi} + \frac{\varepsilon_c}{\psi} - \nabla(W_c^{*T} S_c(c_{in}) + \varepsilon_c)\dot c_{in} \\
&= -W_c^{*T}\Lambda + \bar\varepsilon_c
\end{aligned} \tag{23}
$$

where $\bar\varepsilon_c = \varepsilon_c/\psi - \nabla\varepsilon_c\,\dot c_{in}$ and $\|\bar\varepsilon_c\| \le \bar\varepsilon_{c,\max}$. Substituting (23) into (22), there is

$$
\begin{aligned}
\dot L_c &= -\sigma_c\tilde W_c^T\Lambda(\tilde W_c^T\Lambda + \bar\varepsilon_c) \\
&= -\sigma_c\tilde W_c^T\Lambda\Lambda^T\tilde W_c - \sigma_c\tilde W_c^T\Lambda\bar\varepsilon_c \\
&\le -\frac{\sigma_c}{2}\tilde W_c^T\Lambda\Lambda^T\tilde W_c + \frac{\sigma_c}{2}\bar\varepsilon_c^T\bar\varepsilon_c \\
&\le -\frac{\sigma_c}{2}\tilde W_c^T\Lambda\Lambda^T\tilde W_c + \frac{\sigma_c}{2}\|\bar\varepsilon_{c,\max}\|^2
\end{aligned} \tag{24}
$$

Fig. 3. The diagram of AHV-VGI control system based on reinforcement learning.

3.2. Design of actor network

The input vector of the actor network is $a_{in} = [z_V, z_h, z_\gamma, z_\alpha, z_Q]^T$. The output vector of the network is expressed as $a_{out} = [\hat\Delta_V, \hat\Delta_\gamma, \hat\Delta_\alpha, \hat\Delta_Q]^T$, where $\hat\Delta_V$, $\hat\Delta_\gamma$, $\hat\Delta_\alpha$ and $\hat\Delta_Q$ are the estimates of $\Delta_V$, $\Delta_\gamma$, $\Delta_\alpha$ and $\Delta_Q$ in (10), respectively. Define the optimal neural weight of the actor network as $W_a^*$. Then $[\Delta_V, \Delta_\gamma, \Delta_\alpha, \Delta_Q]^T = W_a^{*T} S_a(a_{in}) + \varepsilon_a$, where $\varepsilon_a = [\varepsilon_{aV}, \varepsilon_{a\gamma}, \varepsilon_{a\alpha}, \varepsilon_{aQ}]^T$, and $\varepsilon_{aV}$, $\varepsilon_{a\gamma}$, $\varepsilon_{a\alpha}$ and $\varepsilon_{aQ}$ are the approximation errors of $\Delta_V$, $\Delta_\gamma$, $\Delta_\alpha$ and $\Delta_Q$, respectively. The function of the actor network is given as $a_{out} = \hat W_a^T S_a(a_{in})$, where $\hat W_a$ is the estimate of $W_a^*$. Therefore, the instant approximation error can be defined as

$$\varsigma_a = [\varsigma_{aV}, \varsigma_{a\gamma}, \varsigma_{a\alpha}, \varsigma_{aQ}]^T = \tilde W_a^T S_a(a_{in}) \tag{25}$$

where $\tilde W_a = \hat W_a - W_a^*$. The error of the actor network is expressed as

$$e_a = \varsigma_a + K_I(\hat I(t) - I_d(t)) \tag{26}$$

where $I_d(t) = 0$ represents the desired ideal cost-to-go, and $K_I$ is a $4\times 1$ gain matrix. The updating law of the actor network is

$$\dot{\hat W}_a = -\sigma_a\frac{\partial E_a}{\partial \hat W_a} \tag{27}$$

where $E_a = \frac{1}{2}e_a^T e_a$. So, (27) can be rewritten as

$$\dot{\hat W}_a = -\sigma_a\frac{\partial E_a}{\partial e_a}\frac{\partial e_a}{\partial\varsigma_a}\frac{\partial\varsigma_a}{\partial W_a} = -\sigma_a(\varsigma_a + K_I\hat I)S_a \tag{28}$$

where $\sigma_a > 0$ is the learning rate of the actor network. Since $\varsigma_a$ is unknown, the updating law is redefined as

$$\dot{\hat W}_a = -\sigma_a(\hat W_a^T S_a(a_{in}) + K_I\hat I)S_a \tag{29}$$

Choose a Lyapunov candidate function for the actor NN as

$$L_a = \frac{1}{2}\tilde W_a^T\tilde W_a \tag{30}$$

The derivative of (30) is

$$\dot L_a = -\sigma_a\tilde W_a^T S_a(\hat W_a^T S_a(a_{in}) + K_I\hat I) \tag{31}$$

Since $\hat I = W_c^{*T} S_c(c_{in}) + \tilde W_c^T S_c(c_{in})$, one has

$$\hat I^T\hat I \le 2(W_c^{*T} S_c)^T W_c^{*T} S_c + 2(\tilde W_c^T S_c)^T\tilde W_c^T S_c \tag{32}$$

Combined with (31), it can be obtained that

$$
\begin{aligned}
\dot L_a &= -\sigma_a\tilde W_a^T S_a(W_a^{*T} S_a(a_{in}) - \tilde W_a^T S_a(a_{in}) + K_I\hat I) \\
&= -\sigma_a\tilde W_a^T S_a W_a^{*T} S_a(a_{in}) + \sigma_a\|\tilde W_a\|^2\|S_a\|^2 - \sigma_a\tilde W_a^T S_a K_I\hat I \\
&\le -\sigma_a\tilde W_a^T S_a W_a^{*T} S_a(a_{in}) + \sigma_a\|\tilde W_a\|^2\|S_a\|^2 + \frac{\sigma_a}{2}\|\tilde W_a\|^2\|S_a\|^2 + \frac{\sigma_a}{2}K_I^T K_I\|\hat I\|^2 \\
&\le -\frac{\sigma_a}{2}\|\tilde W_a\|^2\|S_a\|^2 + \frac{\sigma_a}{2}\|W_a^*\|^2\|S_a\|^2 - \frac{\sigma_a}{2}\|\tilde W_a\|^2\|S_a\|^2
\end{aligned}
$$
$$
\begin{aligned}
&\quad + \sigma_a K_I^T K_I(\|W_c^*\|^2\|S_c\|^2 + \|\tilde W_c\|^2\|S_c\|^2) \\
&= -\sigma_a\|\tilde W_a\|^2\|S_a\|^2 + \frac{\sigma_a}{2}\|W_a^*\|^2\|S_a\|^2 + \sigma_a K_I^T K_I\|W_c^*\|^2\|S_c\|^2 + \sigma_a K_I^T K_I\|\tilde W_c\|^2\|S_c\|^2
\end{aligned} \tag{33}
$$
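In an implementation, the continuous-time critic law (20) and actor law (29) have to be integrated numerically. The following minimal sketch uses one forward-Euler step per update; the Euler discretization, the function names, the array shapes and the outer-product orientation of the actor update are assumptions made for illustration, not details given in the paper.

```python
import numpy as np

def critic_step(W_c, S_c, grad_S_c, c_in_dot, phi, psi, sigma_c, dt):
    """One Euler step of the critic law (20).

    W_c, S_c: (l,), grad_S_c: (l, 5) Jacobian of S_c w.r.t. c_in,
    c_in_dot: (5,), phi: instant cost, psi: discount factor.
    """
    # Regressor -S_c/psi + grad(S_c) * c_in_dot that multiplies the weights
    regressor = -S_c / psi + grad_S_c @ c_in_dot
    return W_c - dt * sigma_c * (phi + W_c @ regressor) * regressor

def actor_step(W_a, S_a, K_I, I_hat, sigma_a, dt):
    """One Euler step of the actor law (29).

    W_a: (l, 4), S_a: (l,), K_I: (4,), I_hat: scalar critic output.
    """
    err = W_a.T @ S_a + K_I * I_hat          # (4,) disturbance estimate plus critic term
    return W_a - dt * sigma_a * np.outer(S_a, err)

# Hypothetical numbers, for illustration only
W_c_new = critic_step(np.zeros(3), np.array([1.0, 2.0, 3.0]), np.zeros((3, 5)),
                      np.zeros(5), phi=1.0, psi=2.0, sigma_c=1.0, dt=0.1)
W_a_new = actor_step(np.zeros((3, 4)), np.array([1.0, 0.5, 0.0]), np.ones(4),
                     I_hat=2.0, sigma_a=0.5, dt=0.1)
```

Both updates are plain gradient steps, so the step size `dt` together with the learning rates determines how aggressively the weights move between samples.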
3.3. Controller design for velocity system

With $z_V = V - V_c$ defined, the dynamics of the velocity tracking error are described as

$$\dot z_V = g_V\cdot\phi + f_V + \Delta_V - \dot V_c \tag{34}$$

To guarantee the prescribed tracking performance $|z_V| < A_{zV}$, a barrier Lyapunov function candidate is chosen as

$$L_1 = \frac{1}{2}\log\frac{A_{zV}^2}{A_{zV}^2 - z_V^2} \tag{35}$$

where $\log(\cdot)$ denotes the natural logarithm of $(\cdot)$.

Remark 2. $L_1$ is a valid Lyapunov function candidate if $z_V$ satisfies $|z_V| < A_{zV}$, and $\lim_{|z_V|\to A_{zV}^-} L_1 = +\infty$. Therefore, if the boundedness of $L_1$ can be guaranteed, the system will comply with the given regulation $|z_V| < A_{zV}$.

Next, the control law for the velocity subsystem is designed as

$$\phi = g_V^{-1}\left(-k_V z_V - \frac{z_V}{2(A_{zV}^2 - z_V^2)} + \dot V_c - f_V - \hat\Delta_V\right) \tag{36}$$

With (36), the time derivative of $L_1$ yields

$$
\begin{aligned}
\dot L_1 &= \frac{z_V\dot z_V}{A_{zV}^2 - z_V^2}
= \frac{z_V}{A_{zV}^2 - z_V^2}\left(-k_V z_V - \frac{z_V}{2(A_{zV}^2 - z_V^2)} + \varsigma_{aV} + \varepsilon_{aV}\right) \\
&= -\frac{k_V z_V^2}{A_{zV}^2 - z_V^2} - \frac{z_V^2}{2(A_{zV}^2 - z_V^2)^2} + \frac{z_V\varsigma_{aV}}{A_{zV}^2 - z_V^2} + \frac{z_V\varepsilon_{aV}}{A_{zV}^2 - z_V^2}
\end{aligned} \tag{37}
$$

Applying Young’s inequality, one has

$$\frac{z_V\varsigma_{aV}}{A_{zV}^2 - z_V^2} \le \frac{z_V^2}{4(A_{zV}^2 - z_V^2)^2} + \|\varsigma_{aV}\|^2 \tag{38}$$

$$\frac{z_V\varepsilon_{aV}}{A_{zV}^2 - z_V^2} \le \frac{z_V^2}{4(A_{zV}^2 - z_V^2)^2} + \|\varepsilon_{aV}\|^2 \tag{39}$$

Substituting (38) and (39) into (37), there is

$$\dot L_1 \le -\frac{k_V z_V^2}{A_{zV}^2 - z_V^2} + \|\varsigma_{aV}\|^2 + \|\varepsilon_{aV}\|^2 \tag{40}$$

3.4. Controller design for altitude system

In this section, a controller for the altitude subsystem is designed. To guarantee the prescribed altitude tracking performance $|z_h| < A_{zh}$, the state constraints $|\gamma| < A_\gamma$, $|\alpha| < A_\alpha$, $|Q| < A_Q$ must be guaranteed simultaneously. Naturally, it can be assumed that

$$|\gamma_c| < A_{\gamma c} < A_\gamma,\quad |\alpha_c| < A_{\alpha c} < A_\alpha,\quad |Q_c| < A_{Qc} < A_Q \tag{41}$$

The state tracking performance can then be defined as $|z_\gamma| < A_{z\gamma} \le A_\gamma - A_{\gamma c}$, $|z_\alpha| < A_{z\alpha} \le A_\alpha - A_{\alpha c}$, $|z_Q| < A_{zQ} \le A_Q - A_{Qc}$. Different from the conventional backstepping design procedure, the tracking differentiator (TD) is applied in this paper so that the “explosion of terms” is avoided and a better system performance can be acquired. The design procedure is as follows.

Step 1: The dynamics of the altitude tracking error $z_h$ is

$$\dot z_h \approx V\gamma - \dot h_c = V(z_\gamma + \gamma_c) - \dot h_c \tag{42}$$

The virtual control for system (42) is given as

$$\bar\gamma_c = \frac{1}{V}(-k_h z_h + \dot h_c) \tag{43}$$

Next, introduce a new variable $\gamma_c$ and let $\bar\gamma_c$ pass the tracking differentiator to obtain $\dot\gamma_c$:

$$
\begin{cases}
\dot\gamma_c = x_\gamma \\
\dot x_\gamma = -R\left[\mathrm{fal}(\gamma_c - \bar\gamma_c, a, \delta) + b\,\dfrac{\mathrm{fal}(x_\gamma, a, \delta)}{R^a}\right]
\end{cases} \tag{44}
$$

where

$$
\mathrm{fal}(z, a, \delta) =
\begin{cases}
|z|^a\,\mathrm{sign}(z), & |z| > \delta \\
\dfrac{z}{\delta^{1-a}}, & |z| \le \delta,\ \delta > 0
\end{cases} \tag{45}
$$

and the parameters in (44) and (45) satisfy $R > 0$, $0 < a < 1$, $b > 0$, $\delta > 0$.

Remark 3. In (44), the function $\mathrm{fal}(z, a, \delta)$ is applied to avoid chattering at the origin. To obtain better tracking performance and shorter convergence time, the parameter $R$ should be relatively large. However, an excessively large $R$ amplifies high-frequency noise. The parameter $\delta$ has a major impact on the tracking signal and is related to $R$: as $R$ increases, $\delta$ should be increased at the same rate.

Step 2: The dynamics of the virtual tracking error $z_\gamma$ is

$$\dot z_\gamma = \dot\gamma - \dot\gamma_c = g_\gamma\alpha + f_\gamma + \Delta_\gamma - \dot\gamma_c \tag{46}$$

The nominal virtual control for (46) is designed as

$$\bar\alpha_c = g_\gamma^{-1}\left(-k_\gamma z_\gamma - \frac{z_\gamma}{2(A_{z\gamma}^2 - z_\gamma^2)} + \dot\gamma_c - f_\gamma - \hat\Delta_\gamma\right) \tag{47}$$

Similar to (44), introduce a new variable $\alpha_c$ and let $\bar\alpha_c$ pass the tracking differentiator to obtain $\dot\alpha_c$:

$$
\begin{cases}
\dot\alpha_c = x_\alpha \\
\dot x_\alpha = -R\left[\mathrm{fal}(\alpha_c - \bar\alpha_c, a, \delta) + b\,\dfrac{\mathrm{fal}(x_\alpha, a, \delta)}{R^a}\right]
\end{cases} \tag{48}
$$

Step 3: The dynamics of the virtual tracking error $z_\alpha$ is described as

$$\dot z_\alpha = \dot\alpha - \dot\alpha_c = Q - \dot\gamma + d_\alpha - \dot\alpha_c \tag{49}$$

The nominal virtual control for (49) is designed as

$$\bar Q_c = -k_\alpha z_\alpha - \frac{z_\alpha}{2(A_{z\alpha}^2 - z_\alpha^2)} + \dot\alpha_c - f_\alpha - \hat\Delta_\alpha \tag{50}$$

By introducing a new variable $Q_c$, let $\bar Q_c$ pass the tracking differentiator to obtain $\dot Q_c$:

$$
\begin{cases}
\dot Q_c = x_Q \\
\dot x_Q = -R\left[\mathrm{fal}(Q_c - \bar Q_c, a, \delta) + b\,\dfrac{\mathrm{fal}(x_Q, a, \delta)}{R^a}\right]
\end{cases} \tag{51}
$$
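The function fal of (45) and the barrier-augmented feedback shared by the control laws (36), (47) and (50) are simple enough to sketch directly. The code below is a minimal illustration under stated assumptions: the function name `blf_control`, its argument names and the sample numbers are hypothetical, and the guard against $|z| \ge A$ is an implementation choice rather than part of the paper.

```python
import math

def fal(z, a, delta):
    """Piecewise function of Eq. (45): linear near the origin, power law outside."""
    if abs(z) > delta:
        return abs(z) ** a * math.copysign(1.0, z)
    return z / delta ** (1.0 - a)

def blf_control(z, A, k, g, f, ref_dot, dist_hat):
    """Barrier-Lyapunov-based law with the structure of Eq. (36).

    z: tracking error, A: prescribed bound, k: feedback gain,
    g, f: control gain and nominal dynamics, ref_dot: reference derivative,
    dist_hat: actor-network estimate of the total disturbance.
    """
    if abs(z) >= A:
        raise ValueError("tracking error must stay strictly inside the bound")
    barrier = z / (2.0 * (A ** 2 - z ** 2))   # grows without bound as |z| -> A
    return (-k * z - barrier + ref_dot - f - dist_hat) / g

# Hypothetical values, for illustration only
u0 = blf_control(z=0.0, A=1.0, k=2.0, g=2.0, f=1.0, ref_dot=3.0, dist_hat=0.5)
u1 = blf_control(z=0.5, A=1.0, k=2.0, g=2.0, f=1.0, ref_dot=3.0, dist_hat=0.5)
```

The barrier term is what distinguishes these laws from ordinary backstepping feedback: its magnitude increases sharply as the error approaches the prescribed bound, pushing the error back inside.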
JID:AESCTE AID:105537 /FLA
[m5G; v1.261; Prn:18/11/2019; 14:46] P.7 (1-12)
C. Liu et al. / Aerospace Science and Technology ••• (••••) ••••••
1 2 3 4 5 6 7
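Step 4: For the virtual tracking error $z_Q$, the dynamics are

$$\dot z_Q = \dot Q - \dot Q_c = g_Q\,\delta_e + f_Q + \Delta_Q - \dot Q_c \tag{52}$$

The actual controller of the altitude subsystem can be designed as

$$\delta_e = g_Q^{-1}\left(-k_Q z_Q - \frac{z_Q}{2\left(A_{z_Q}^2 - z_Q^2\right)} + \dot Q_c - f_Q - \hat\Delta_Q\right) \tag{53}$$

To guarantee the prescribed tracking performance $|z_h| < A_{z_h}$, a barrier Lyapunov function candidate for the altitude subsystem is chosen as

$$L_2 = L_h + L_\gamma + L_\alpha + L_Q \tag{54}$$

where

$$L_h = \frac{1}{2}\log\frac{A_{z_h}^2}{A_{z_h}^2 - z_h^2},\quad L_\gamma = \frac{1}{2}\log\frac{A_{z_\gamma}^2}{A_{z_\gamma}^2 - z_\gamma^2},\quad L_\alpha = \frac{1}{2}\log\frac{A_{z_\alpha}^2}{A_{z_\alpha}^2 - z_\alpha^2},\quad L_Q = \frac{1}{2}\log\frac{A_{z_Q}^2}{A_{z_Q}^2 - z_Q^2} \tag{55}$$

Combined with (42), (46), (49) and (52), the time derivative of $L_2$ yields that

$$
\begin{aligned}
\dot L_2 &= \frac{z_h \dot z_h}{A_{z_h}^2 - z_h^2} + \frac{z_\gamma \dot z_\gamma}{A_{z_\gamma}^2 - z_\gamma^2} + \frac{z_\alpha \dot z_\alpha}{A_{z_\alpha}^2 - z_\alpha^2} + \frac{z_Q \dot z_Q}{A_{z_Q}^2 - z_Q^2} \\
&= -\frac{k_h z_h^2}{A_{z_h}^2 - z_h^2} + \frac{z_\gamma}{A_{z_\gamma}^2 - z_\gamma^2}\left(-k_\gamma z_\gamma - \frac{z_\gamma}{2\left(A_{z_\gamma}^2 - z_\gamma^2\right)} + \varsigma_{a\gamma} + \varepsilon_{a\gamma}\right) \\
&\quad + \frac{z_\alpha}{A_{z_\alpha}^2 - z_\alpha^2}\left(-k_\alpha z_\alpha - \frac{z_\alpha}{2\left(A_{z_\alpha}^2 - z_\alpha^2\right)} + \varsigma_{a\alpha} + \varepsilon_{a\alpha}\right) \\
&\quad + \frac{z_Q}{A_{z_Q}^2 - z_Q^2}\left(-k_Q z_Q - \frac{z_Q}{2\left(A_{z_Q}^2 - z_Q^2\right)} + \varsigma_{aQ} + \varepsilon_{aQ}\right) \\
&= -\frac{k_h z_h^2}{A_{z_h}^2 - z_h^2} - \frac{k_\gamma z_\gamma^2}{A_{z_\gamma}^2 - z_\gamma^2} - \frac{k_\alpha z_\alpha^2}{A_{z_\alpha}^2 - z_\alpha^2} - \frac{k_Q z_Q^2}{A_{z_Q}^2 - z_Q^2} \\
&\quad - \frac{z_\gamma^2}{2\left(A_{z_\gamma}^2 - z_\gamma^2\right)^2} - \frac{z_\alpha^2}{2\left(A_{z_\alpha}^2 - z_\alpha^2\right)^2} - \frac{z_Q^2}{2\left(A_{z_Q}^2 - z_Q^2\right)^2} \\
&\quad + \frac{z_\gamma \varsigma_{a\gamma}}{A_{z_\gamma}^2 - z_\gamma^2} + \frac{z_\gamma \varepsilon_{a\gamma}}{A_{z_\gamma}^2 - z_\gamma^2} + \frac{z_\alpha \varsigma_{a\alpha}}{A_{z_\alpha}^2 - z_\alpha^2} + \frac{z_\alpha \varepsilon_{a\alpha}}{A_{z_\alpha}^2 - z_\alpha^2} + \frac{z_Q \varsigma_{aQ}}{A_{z_Q}^2 - z_Q^2} + \frac{z_Q \varepsilon_{aQ}}{A_{z_Q}^2 - z_Q^2}
\end{aligned}
\tag{56}
$$

By Young’s inequality, one has

$$
\begin{aligned}
\frac{z_\gamma \varsigma_{a\gamma}}{A_{z_\gamma}^2 - z_\gamma^2} &\le \frac{z_\gamma^2}{4\left(A_{z_\gamma}^2 - z_\gamma^2\right)^2} + \|\varsigma_{a\gamma}\|^2, &
\frac{z_\gamma \varepsilon_{a\gamma}}{A_{z_\gamma}^2 - z_\gamma^2} &\le \frac{z_\gamma^2}{4\left(A_{z_\gamma}^2 - z_\gamma^2\right)^2} + \|\varepsilon_{a\gamma}\|^2 \\
\frac{z_\alpha \varsigma_{a\alpha}}{A_{z_\alpha}^2 - z_\alpha^2} &\le \frac{z_\alpha^2}{4\left(A_{z_\alpha}^2 - z_\alpha^2\right)^2} + \|\varsigma_{a\alpha}\|^2, &
\frac{z_\alpha \varepsilon_{a\alpha}}{A_{z_\alpha}^2 - z_\alpha^2} &\le \frac{z_\alpha^2}{4\left(A_{z_\alpha}^2 - z_\alpha^2\right)^2} + \|\varepsilon_{a\alpha}\|^2 \\
\frac{z_Q \varsigma_{aQ}}{A_{z_Q}^2 - z_Q^2} &\le \frac{z_Q^2}{4\left(A_{z_Q}^2 - z_Q^2\right)^2} + \|\varsigma_{aQ}\|^2, &
\frac{z_Q \varepsilon_{aQ}}{A_{z_Q}^2 - z_Q^2} &\le \frac{z_Q^2}{4\left(A_{z_Q}^2 - z_Q^2\right)^2} + \|\varepsilon_{aQ}\|^2
\end{aligned}
\tag{57}
$$

Substituting (57) into (56), the time derivative of $L_2$ satisfies

$$
\begin{aligned}
\dot L_2 &\le -\frac{k_h z_h^2}{A_{z_h}^2 - z_h^2} - \frac{k_\gamma z_\gamma^2}{A_{z_\gamma}^2 - z_\gamma^2} - \frac{k_\alpha z_\alpha^2}{A_{z_\alpha}^2 - z_\alpha^2} - \frac{k_Q z_Q^2}{A_{z_Q}^2 - z_Q^2} \\
&\quad + \|\varsigma_{a\gamma}\|^2 + \|\varepsilon_{a\gamma}\|^2 + \|\varsigma_{a\alpha}\|^2 + \|\varepsilon_{a\alpha}\|^2 + \|\varsigma_{aQ}\|^2 + \|\varepsilon_{aQ}\|^2
\end{aligned}
\tag{58}
$$

4. Stability and performance analysis

Consider the following Lyapunov function candidate

$$L = L_c + L_a + L_1 + L_2 \tag{59}$$

From (24), (33), (40) and (58), one has

$$
\begin{aligned}
\dot L &\le -\frac{\sigma_c}{2}\tilde W_c^T \tilde W_c + \sigma_a K_I^T K_I \|\tilde W_c\|^2 \|S_c\|^2 - \frac{\sigma_a}{2}\|\tilde W_a\|^2 \|S_a\|^2 \\
&\quad + \frac{\sigma_c}{2}\|(\cdot)_{c,\max}\|^2 + \frac{\sigma_a}{2}\|W_a^*\|^2 \|S_a\|^2 + \sigma_a K_I^T K_I \|W_c^*\|^2 \|S_c\|^2 + \|\varsigma_a\|^2 + \|\varepsilon_a\|^2 \\
&\quad - \frac{k_V z_V^2}{A_{z_V}^2 - z_V^2} - \frac{k_h z_h^2}{A_{z_h}^2 - z_h^2} - \frac{k_\gamma z_\gamma^2}{A_{z_\gamma}^2 - z_\gamma^2} - \frac{k_\alpha z_\alpha^2}{A_{z_\alpha}^2 - z_\alpha^2} - \frac{k_Q z_Q^2}{A_{z_Q}^2 - z_Q^2} \\
&\quad + \|\varsigma_{a\gamma}\|^2 + \|\varepsilon_{a\gamma}\|^2 + \|\varsigma_{a\alpha}\|^2 + \|\varepsilon_{a\alpha}\|^2 + \|\varsigma_{aQ}\|^2 + \|\varepsilon_{aQ}\|^2 \\
&\le -\kappa L + \xi
\end{aligned}
\tag{60}
$$

where

$$\kappa = \min\left\{\sigma_c - 2\sigma_a K_I^T K_I \|S_c\|^2,\ \sigma_a,\ k_V,\ k_h,\ k_\gamma,\ k_\alpha,\ k_Q\right\} \tag{61}$$

$$
\begin{aligned}
\xi &= \frac{\sigma_c}{2}\|(\cdot)_{c,\max}\|^2 + \frac{\sigma_a}{2}\|W_a^*\|^2 \|S_a\|^2 + \sigma_a K_I^T K_I \|W_c^*\|^2 \|S_c\|^2 + \|\varsigma_a\|^2 + \|\varepsilon_a\|^2 \\
&\quad + \|\varsigma_{a\gamma}\|^2 + \|\varepsilon_{a\gamma}\|^2 + \|\varsigma_{a\alpha}\|^2 + \|\varepsilon_{a\alpha}\|^2 + \|\varsigma_{aQ}\|^2 + \|\varepsilon_{aQ}\|^2
\end{aligned}
\tag{62}
$$

Moreover, to ensure $\kappa > 0$, the following conditions are given:

$$\sigma_c - 2\sigma_a K_I^T K_I \|S_c\|^2 > 0,\quad \sigma_a - 1 > 0 \tag{63}$$

Then, the main result of the paper can be obtained.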
Theorem 1. Consider the AHV-VGI with the proposed reinforcement learning control scheme. If Assumptions 1-3 are satisfied, the closed-loop signals $V$, $h$, $\gamma$, $\alpha$, $Q$, $\tilde W_c$ and $\tilde W_a$ are semi-globally bounded. Furthermore, the error signals $z_V$, $z_h$, $z_\gamma$, $z_\alpha$, $z_Q$, $\tilde W_c$ and $\tilde W_a$ will eventually remain within the compact sets $\Omega_{z_V}$, $\Omega_{z_h}$, $\Omega_{z_\gamma}$, $\Omega_{z_\alpha}$, $\Omega_{z_Q}$, $\Omega_{\tilde W_c}$ and $\Omega_{\tilde W_a}$, respectively, which are given as follows:

$$
\begin{aligned}
\Omega_{z_V} &= \{ z_V \in \mathbb{R} \mid |z_V| \le A_{z_V} \} \\
\Omega_{z_h} &= \{ z_h \in \mathbb{R} \mid |z_h| \le A_{z_h} \} \\
\Omega_{z_\gamma} &= \{ z_\gamma \in \mathbb{R} \mid |z_\gamma| \le A_{z_\gamma} \} \\
\Omega_{z_\alpha} &= \{ z_\alpha \in \mathbb{R} \mid |z_\alpha| \le A_{z_\alpha} \} \\
\Omega_{z_Q} &= \{ z_Q \in \mathbb{R} \mid |z_Q| \le A_{z_Q} \} \\
\Omega_{\tilde W_c} &= \left\{ \tilde W_c \in \mathbb{R}^{N_c} \,\middle|\, \|\tilde W_c\| \le \sqrt{2\left(L(0) + \frac{\xi}{\kappa}\right)} \right\} \\
\Omega_{\tilde W_a} &= \left\{ \tilde W_a \in \mathbb{R}^{N_a} \,\middle|\, \|\tilde W_a\| \le \sqrt{2\left(L(0) + \frac{\xi}{\kappa}\right)} \right\}
\end{aligned}
\tag{64}
$$
where N c and N a are numbers of nodes of critic network and actor network, respectively, ξ and κ are positive constants.
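The bounds in Theorem 1 can be evaluated numerically once $\kappa$, $\xi$ and $L(0)$ are known. The sketch below assumes only the differential inequality $\dot L \le -\kappa L + \xi$ from (60) and the log-type barrier terms: the comparison lemma gives the envelope $N(t) = (L(0) - \xi/\kappa)e^{-\kappa t} + \xi/\kappa$, and inverting $\frac{1}{2}\log\big(A^2/(A^2 - z^2)\big) \le N(t)$ gives $|z(t)| \le A\sqrt{1 - e^{-2N(t)}}$. The numerical values of $L(0)$, $\kappa$ and $\xi$ are hypothetical; $A_{z_h} = 40$ ft is taken from the simulation section.

```python
import math


def lyapunov_envelope(t, l0, kappa, xi):
    """Upper envelope N(t) of L(t) implied by dL/dt <= -kappa * L + xi."""
    return (l0 - xi / kappa) * math.exp(-kappa * t) + xi / kappa


def error_bound(t, l0, kappa, xi, bound):
    """Transient bound on |z(t)| obtained by inverting the log-type barrier term."""
    n_t = lyapunov_envelope(t, l0, kappa, xi)
    return math.sqrt(1.0 - math.exp(-2.0 * n_t)) * bound


def ultimate_bound(kappa, xi, bound):
    """Limit of error_bound as t goes to infinity."""
    return math.sqrt(1.0 - math.exp(-2.0 * xi / kappa)) * bound


# Hypothetical values for illustration only.
L0, KAPPA, XI = 2.0, 0.5, 0.1
A_ZH = 40.0  # altitude tracking bound in ft (simulation section)

transient = [error_bound(t, L0, KAPPA, XI, A_ZH) for t in (0.0, 5.0, 20.0)]
steady_state = ultimate_bound(KAPPA, XI, A_ZH)
```

Both bounds stay strictly below the prescribed limit for any finite $N(t)$, and the steady-state bound shrinks as $\xi/\kappa$ decreases, which matches the tuning guidance in Remark 4.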
Proof. With Assumption 3 satisfied, one has $|z_V(0)| < A_{z_V}$, $|z_h(0)| < A_{z_h}$, $|z_\gamma(0)| < A_{z_\gamma}$, $|z_\alpha(0)| < A_{z_\alpha}$, $|z_Q(0)| < A_{z_Q}$, and the following inequality is established from (60):

$$L(t) \le N(t) = \left(L(0) - \frac{\xi}{\kappa}\right)e^{-\kappa t} + \frac{\xi}{\kappa} < +\infty \tag{65}$$

which indicates that none of the prescribed performance bounds $|z_V| \le A_{z_V}$, $|z_h| \le A_{z_h}$, $|z_\gamma| \le A_{z_\gamma}$, $|z_\alpha| \le A_{z_\alpha}$, $|z_Q| \le A_{z_Q}$ is violated. Further, considering the inequalities

$$\frac{1}{2}\log\frac{A_{z_V}^2}{A_{z_V}^2 - z_V^2(t)} \le N(t),\quad \frac{1}{2}\log\frac{A_{z_h}^2}{A_{z_h}^2 - z_h^2(t)} \le N(t) \tag{66}$$

the transient and steady-state tracking performances can be described as

$$
\begin{aligned}
|z_V(t)| &\le \sqrt{1 - e^{-2N(t)}}\,A_{z_V}, & \lim_{t\to\infty}|z_V(t)| &\le \sqrt{1 - e^{-2\xi/\kappa}}\,A_{z_V} \\
|z_h(t)| &\le \sqrt{1 - e^{-2N(t)}}\,A_{z_h}, & \lim_{t\to\infty}|z_h(t)| &\le \sqrt{1 - e^{-2\xi/\kappa}}\,A_{z_h}
\end{aligned}
\tag{67}
$$

From (65), there is $\frac{1}{2}\tilde W_a^T \tilde W_a \le L(0) + \frac{\xi}{\kappa}$ and $\frac{1}{2}\tilde W_c^T \tilde W_c \le L(0) + \frac{\xi}{\kappa}$. Therefore, one can also obtain

$$\|\tilde W_a\|^2 \le 2\left(L(0) + \frac{\xi}{\kappa}\right),\quad \|\tilde W_c\|^2 \le 2\left(L(0) + \frac{\xi}{\kappa}\right) \tag{68}$$

Remark 4. From (67), the velocity tracking error $z_V$ and the altitude tracking error $z_h$ converge exponentially into the sets $\Omega_{z_V}$ and $\Omega_{z_h}$, respectively. The steady-state tracking error can be reduced by increasing $\sigma_c$, $\sigma_a$, $k_V$, $k_h$, $k_\gamma$, $k_\alpha$, $k_Q$ and reducing $K_I$. However, doing so increases the amplitude of the control signals. Hence, the capability of the actuators should be taken into consideration when tuning the parameters.

Remark 5. The core of the reinforcement learning control strategy is to trade part of the real-time performance for optimal disturbance estimation performance. Hence, real-time computation becomes a main obstacle to the widespread use of the reinforcement learning method. With the development of neural networks, the real-time problem has been preliminarily solved. Nowadays, neural networks in practical engineering are usually implemented on dedicated hardware, such as FPGA, GPU or ASIC. With such hardware, the processing speed and control ability of neural networks have been greatly improved, which allows neural networks to be implemented in practical engineering.

5. Simulation

Fig. 4. Altitude.
In this section, a numerical simulation study is given to demonstrate the superiority and effectiveness of the proposed control strategy. The initial states are $V(0) = 7.0$ Mach, $h(0) = 85{,}000$ ft, $\gamma(0) = 0^\circ$, $\alpha(0) = 0^\circ$ and $Q(0) = 0^\circ$/s. The parameters of the control system are set as $k_V = 3$, $k_h = 5.1$, $k_\gamma = 2.4$, $k_\alpha = 10$, $k_Q = 5.5$. For the tracking differentiator, the parameters are chosen as $R = 200$, $\delta = 0.01$, $a = 0.5$, $b = 1.14$. For reinforcement learning, the parameters are set as follows: the numbers of hidden nodes of the critic and actor networks are 20 and 18, respectively; the learning rates of the actor NN, $\sigma_a$, and the critic NN, $\sigma_c$, are both 0.05; the weights of the actor NN and critic NN are randomly initialized in $[-1, 1]$; for $S_a(Z_a)$ and $S_c(Z_c)$, the center parameters are chosen as either 1 or −1, and the variances are chosen as $\eta_a = 3600$ and $\eta_c = 250$; $P$ in (13) is given as $P = 0.1 I_{4\times4}$, where $I$ is the identity matrix; $K_I$ in (26) is set as $K_I = [10, 10, 10, 10]^T$. The velocity command varies from 7 to 7.3 Mach and the altitude command varies from 85,000 to 95,500 ft. Controller parameters are $k_V = 6$, $k_h = 6$, $k_\gamma = 2.5$, $k_\theta = 10$ and $k_Q = 5$. The prescribed tracking performances are given as $|z_V| < 0.01$ Mach/s and $|z_h| < 40$ ft. The state and control input constraints are $|\gamma| < 0.5^\circ$, $|\alpha| < 3^\circ$, $|Q| < 3^\circ$/s, $\phi \in [0, 1]$ and $\delta_e \in [-15, 15]$. The external disturbances acting on the AHV-VGI system are set as $d_V = 2$ ft/s for $120 < t \le 150$ s and $d_V = 2\cos(0.2t)$ ft/s for $170 < t \le 250$ s; $d_\gamma = 0.01^\circ$, $d_\alpha = 0.02^\circ$, $d_Q = 0.2^\circ$/s for $120 < t \le 150$ s; and $d_\gamma = 0.01\sin(0.2t)^\circ$, $d_\alpha = 0.02\sin(0.2t)^\circ$, $d_Q = 0.2\sin(0.3t)^\circ$/s for $170 < t \le 250$ s. The aerodynamic coefficients are assumed to have 5% uncertainty. Finally, define $w_{c1} = [w_{c11}, \cdots, w_{c51}]^T$ and $w_{a1} = [w_{a11}, \cdots, w_{a51}]^T$ as the weights from all the inputs to the first hidden node of the critic network and the actor network, respectively.

With the controller in [19] employed for comparison, the simulation results are shown in Figs. 4-18. The altitude and velocity tracking performance is shown in Figs. 4-7. The outputs of the AHV-VGI track the reference signals quickly and stably, and the given tracking error constraints are satisfied. Furthermore, compared with the controller in [19], smaller tracking errors are achieved by the proposed controller. Figs. 8-10 show the curves of the flight path angle $\gamma$, angle of attack $\alpha$ and pitch angle rate $Q$, respectively. It can be seen that all the states are regulated within the desired ranges. Figs. 11-12 illustrate the curves of the fuel equivalency ratio $\phi$ and the elevator deflection $\delta_e$, respectively. Both control inputs remain within the given ranges. The disturbances and their estimates are shown in Figs. 13-16. The disturbances can be estimated in real time by the neural networks. Figs. 17-18 show the weight updating of the critic and actor networks, respectively. The weight values of the two networks are adaptively adjusted during the flight. Figs. 13-18 show the effectiveness of the reinforcement learning scheme. Hence, the simulation results demonstrate that the BLF-based reinforcement learning controller enables the system to achieve all the control objectives analyzed in this paper.

Fig. 5. Altitude tracking error.
Fig. 6. Velocity.
Fig. 8. Flight path angle.
Fig. 9. Angle of attack.
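The piecewise disturbance schedule specified in the simulation setup can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and the disturbances are assumed to be zero outside the two stated time windows, which the text does not state explicitly.

```python
import math


def velocity_disturbance(t):
    """d_V in ft/s at time t (s); assumed zero outside the stated windows."""
    if 120 < t <= 150:
        return 2.0
    if 170 < t <= 250:
        return 2.0 * math.cos(0.2 * t)
    return 0.0


def angular_disturbances(t):
    """(d_gamma, d_alpha, d_Q) in deg, deg, deg/s; assumed zero elsewhere."""
    if 120 < t <= 150:
        return (0.01, 0.02, 0.2)
    if 170 < t <= 250:
        return (
            0.01 * math.sin(0.2 * t),
            0.02 * math.sin(0.2 * t),
            0.2 * math.sin(0.3 * t),
        )
    return (0.0, 0.0, 0.0)


# Sample the schedule on a 1 s grid over a 300 s horizon (horizon is an assumption).
schedule = [(t, velocity_disturbance(t), angular_disturbances(t)) for t in range(0, 301)]
```

Keeping the step window and the sinusoidal window in separate branches makes it easy to test the estimator against a constant and a time-varying disturbance independently.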
Fig. 7. Velocity tracking error.
Fig. 10. Pitch angle rate.
Fig. 11. Fuel equivalency ratio.
Fig. 12. Elevator deflection.
Fig. 13. Disturbance V.
Fig. 14. Disturbance γ.
Fig. 15. Disturbance α.

6. Conclusion

To solve the signal tracking problem with tracking performance constraints for the AHV-VGI, a BLF-based reinforcement learning controller is proposed in this paper. The proposed reinforcement learning method with an actor-critic structure can effectively approximate the unknown disturbances and uncertainties in the flight control system. By constructing and analyzing the BLFs of the flight control system, the prescribed tracking performance can be guaranteed theoretically. Tracking differentiators are applied to avoid the “explosion of terms” problem. The simulation results illustrate the effectiveness and advantages of the proposed control strategy. Future research will proceed in three directions. First, controller design under time-varying tracking error and state constraints for the AHV-VGI will be studied. Second, motivated by [37–40], the coupling between the variable geometry inlet and the AHV will be considered further, together with controller design under this coupling. Third, the structure of the proposed control strategy will be simplified by using a novel backstepping strategy without virtual controllers.
Fig. 16. Disturbance Q.

Declaration of competing interest

None declared.

Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61833016 and 61873295) and Shanghai Aerospace Science and Technology Innovation Fund (SAST2017-096).
Appendix. (Table 1)
Table 1 Fitting coefficients in aerodynamic forces.
Parameters
Values
Parameters
29
C lα
−0.2416
C lα
C lMa C l0 C LMa C L0
30 31 32 33 34 35 36
2
Values 0.0633
−5.2380
2 C lMa
0.1598
37.5193
Cα
0.0157
δe
0.0066
L
5.45 × 10−5
CL
0.0046
Cα
D
2 Cα
D
3.58 × 10−4
δe2
CD
CD
4.37 × 10−5
1.28 × 10−4
94 95 96 97 98 99 100
δe
−1.97 × 10−10
CD
α δe
9.78 × 10−5
C 0D
0.0133
103
101 102
Ma CD
−5.32 × 10−4
C Tα
0.0328
C TMa
0.0026
−0.152
104
C T0
φα
39
CT
0.3252
105
40
CT
−0.703
φ
CT
8.9227
106
41
α CM
0.0064
Ma CM
−0.0022
107
42
C Me
δ
−0.014
0 CM
0.051
108
−1.317 × 10−4
109
1.844 × 10−5
110
37 38
φ Ma
C Lα,l
9.02 × 10−5
44
C L0,l
0.0012
45
Ma CD ,l
−1.0037 × 10−5
C 0D ,l
5.121 × 10−5
111
46
C Tα,l
0.0025
C TMa ,l
−0.0045
112
47
C T0 ,l
0.0596
α CM ,l
1.529 × 10−5
113
48
Ma CM ,l
−2.53 × 10−4
114
43
Fig. 17. Weights updating of w c1 .
4.303 × 10−5
C LMa ,l Cα D ,l
0 CM ,l
115
49 50
Fig. 18. Weights updating of w a1 .

References
[1] K. Bilimoria, D. Schmidt, Integrated development of the equations of motion for elastic hypersonic flight vehicles, J. Guid. Control Dyn. 18 (1) (1995) 73–81, https://doi.org/10.2514/3.56659.
[2] J. Parker, A. Serrani, S. Yurkovich, M. Bolender, D. Doman, Control-oriented modeling of an air-breathing hypersonic vehicle, J. Guid. Control Dyn. 30 (3) (2007) 856–869, https://doi.org/10.2514/1.27830.
[3] S. Zhang, Q. Wang, C. Dong, Extended state observer based control for generic hypersonic vehicles with nonaffine-in-control character, ISA Trans. 80 (2018) 127–136, https://doi.org/10.1016/j.isatra.2018.05.020.
[4] Y. Shen, W. Huang, T. Zhang, L. Yan, Parametric modeling and aerodynamic optimization of expert configuration at hypersonic speeds, Aerosp. Sci. Technol. 84 (2019) 641–649, https://doi.org/10.1016/j.ast.2018.11.007.
[5] T. Zhang, Z. Wang, W. Huang, L. Yan, Parameterization and optimization of hypersonic-gliding vehicle configurations during conceptual design, Aerosp. Sci. Technol. 58 (2016) 225–234, https://doi.org/10.1016/j.ast.2016.08.020.
[6] W. Zhang, W. Chen, W. Yu, Entry guidance for high-l/d hypersonic vehicle based on drag-vs-energy profile, ISA Trans. 83 (2018) 176–188, https://doi.org/10.1016/j.isatra.2018.08.012.
[7] A. Roenneke, A. Markl, Re-entry control to a drag-vs-energy profile, J. Guid. Control Dyn. 17 (5) (1994) 916–920, https://doi.org/10.2514/3.21290.
[8] J. Ge, P. Frank, C. Lin, Robust H∞ state feedback control for linear systems with state delay and parameter uncertainty, Automatica 32 (8) (1996) 1183–1185, https://doi.org/10.1016/0005-1098(96)00053-2.
[9] Q. Zong, J. Wang, B. Tian, Y. Tao, Quasi-continuous high-order sliding mode controller and observer design for flexible hypersonic vehicle, Aerosp. Sci. Technol. 27 (2013) 127–137, https://doi.org/10.1016/j.ast.2012.07.004.
[10] G. Wu, X. Meng, F. Wang, Improved nonlinear dynamic inversion control for a flexible air-breathing hypersonic vehicle, Aerosp. Sci. Technol. 78 (2018) 734–743, https://doi.org/10.1016/j.ast.2018.04.036.
[11] Q. Hu, Y. Meng, C. Wang, Y. Zhang, Adaptive backstepping control for air-breathing hypersonic vehicles with input nonlinearities, Aerosp. Sci. Technol. 73 (2018) 289–299, https://doi.org/10.1016/j.ast.2017.12.001.
[12] X. Bu, X. Wu, R. Zhang, Z. Ma, J. Huang, Tracking differentiator design for the robust backstepping control of a flexible air-breathing hypersonic vehicle, J. Franklin Inst. 352 (4) (2015) 1739–1765, https://doi.org/10.1016/j.jfranklin.2015.01.014.
[13] H. An, Q. Wu, C. Wang, Differentiator based full-envelope adaptive control of air-breathing hypersonic vehicles, Aerosp. Sci. Technol. 82–83 (2018) 312–322, https://doi.org/10.1016/j.ast.2018.09.032.
[14] Z. Guo, J. Chang, J. Guo, J. Zhou, Adaptive twisting sliding mode algorithm for hypersonic reentry vehicle attitude control based on finite-time observer, ISA Trans. 77 (2018) 20–29, https://doi.org/10.1016/j.isatra.2018.04.001.
[15] J. Sun, S. Xu, S. Song, X. Dong, Finite-time tracking control of hypersonic vehicle with input saturation, Aerosp. Sci. Technol. 71 (2017) 272–284, https://doi.org/10.1016/j.ast.2017.09.036.
[16] J. Niu, F. Chen, G. Tao, Nonlinear fuzzy fault-tolerant control of hypersonic flight vehicle with parametric uncertainty and actuator fault, Nonlinear Dyn. 92 (2018) 1299–1315, https://doi.org/10.1007/s11071-018-4127-z.
[17] X. Bu, G. He, K. Wang, Tracking control of air-breathing hypersonic vehicles with non-affine dynamics via improved neural back-stepping design, ISA Trans. 75 (2018) 88–100, https://doi.org/10.1016/j.isatra.2018.02.010.
[18] S. Macheret, M. Shneider, R.B. Miles, Scramjet inlet control by off-body energy addition: a virtual cowl, AIAA J. 42 (11) (2004) 2294–2302, https://doi.org/10.2514/1.3997.
[19] L. Dou, P. Su, Q. Zong, Z. Ding, Fuzzy disturbance observer-based dynamic surface control for air-breathing hypersonic vehicle with variable geometry inlets, IET Control Theory Appl. 12 (1) (2018) 10–19, https://doi.org/10.1049/iet-cta.2017.0742.
[20] L. Dou, P. Su, Z. Ding, Modeling and nonlinear control for air-breathing hypersonic vehicle with variable geometry inlet, Aerosp. Sci. Technol. 67 (2017) 422–432, https://doi.org/10.1016/j.ast.2017.04.024.
[21] L. Dou, J. Gao, Q. Zong, Z. Ding, Modeling and switching control of air-breathing hypersonic vehicle with variable geometry inlet, J. Franklin Inst. 355 (15) (2018) 6904–6926, https://doi.org/10.1016/j.jfranklin.2018.07.007.
[22] H. An, B. Fidan, Q. Wu, C. Wang, X. Cao, Sliding mode differentiator based tracking control of uncertain nonlinear systems with application to hypersonic flight, Asian J. Control 21 (1) (2019) 143–155, https://doi.org/10.1002/asjc.1932.
[23] H. An, Q. Wu, H. Xia, C. Wang, X. Cao, Adaptive controller design for a switched model of air-breathing hypersonic vehicles, Nonlinear Dyn. 94 (2018) 1851–1866, https://doi.org/10.1007/s11071-018-4461-1.
[24] D. Liu, H. Javaherian, O. Kovalenko, T. Huang, Adaptive critic learning techniques for engine torque and air–fuel ratio control, IEEE Trans. Syst. Man Cybern., Part B, Cybern. 38 (4) (2008) 988–993, https://doi.org/10.1109/TSMCB.2008.922019.
[25] Y. Ouyang, W. He, X. Li, Reinforcement learning control of a single-link flexible robotic manipulator, IET Control Theory Appl. 11 (9) (2017) 1426–1433, https://doi.org/10.1049/iet-cta.2016.1540.
[26] H. Jiang, H. Zhang, Y. Cui, G. Xiao, Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method, Neurocomputing 273 (2018) 68–77, https://doi.org/10.1016/j.neucom.2017.07.058.
[27] Y. Zhou, E. Kampen, Q. Chu, Nonlinear adaptive flight control using incremental approximate dynamic programming and output feedback, J. Guid. Control Dyn. 40 (2017) 493–496, https://doi.org/10.2514/1.G001762.
[28] C. Mu, Z. Ni, C. Sun, H. He, Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming, IEEE Trans. Neural Netw. Learn. Syst. 28 (3) (2017) 584–598, https://doi.org/10.1109/TNNLS.2016.2516948.
[29] D. Prokhorov, R. Santiago, D. Wunsch, Adaptive critic designs: a case study for neurocontrol, Neural Netw. 8 (9) (1995) 1367–1372, https://doi.org/10.1016/0893-6080(95)00042-9.
[30] Z. Ni, H. He, X. Zhong, D. Prokhorov, Model-free dual heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst. 26 (8) (2015) 1834–1839, https://doi.org/10.1109/TNNLS.2015.2424971.
[31] D. Wang, D. Liu, Q. Wei, D. Zhao, N. Jin, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica 48 (8) (2012) 1825–1832, https://doi.org/10.1016/j.automatica.2012.05.049.
[32] H. An, H. Xia, C. Wang, Barrier Lyapunov function-based adaptive control for hypersonic flight vehicles, Nonlinear Dyn. 88 (3) (2017) 1833–1853, https://doi.org/10.1007/s11071-017-3347-y.
[33] K. Tee, B. Ren, S. Ge, Control of nonlinear systems with time-varying output constraints, Automatica 47 (2011) 2511–2516, https://doi.org/10.1016/j.automatica.2011.08.044.
[34] X. Bu, Guaranteeing prescribed output tracking performance for air-breathing hypersonic vehicles via non-affine back-stepping control design, Nonlinear Dyn. 91 (1) (2018) 525–538, https://doi.org/10.1007/s11071-017-3887-1.
[35] W. Chang, S. Tong, Adaptive fuzzy tracking control design for permanent magnet synchronous motors with output constraint, Nonlinear Dyn. 87 (2017) 291–302, https://doi.org/10.1007/s11071-016-3043-3.
[36] Y. Liu, S. Lu, S. Tong, X. Chen, C. Chen, D. Li, Adaptive control-based barrier Lyapunov functions for a class of stochastic nonlinear systems with full state constraints, Automatica 87 (2018) 83–93, https://doi.org/10.1016/j.automatica.2017.07.028.
[37] T. Zhang, Z. Wang, W. Huang, S. Li, A design approach of wide-speed-range vehicles based on the cone-derived theory, Aerosp. Sci. Technol. 71 (2017) 42–51, https://doi.org/10.1016/j.ast.2017.09.010.
[38] S. Li, Z. Wang, W. Huang, J. Lei, S. Xu, Design and investigation on variable Mach number waverider for a wide-speed range, Aerosp. Sci. Technol. 76 (2018) 291–302, https://doi.org/10.1016/j.ast.2018.01.044.
[39] Z. Zhao, W. Huang, S. Li, T. Zhang, L. Yan, Variable Mach number design approach for a parallel waverider with a wide-speed range based on the osculating cone theory, Acta Astronaut. 147 (2018) 163–174, https://doi.org/10.1016/j.actaastro.2018.04.008.
[40] Z. Zhao, W. Huang, B. Yan, L. Yan, T. Zhang, R. Moradi, Design and high speed aerodynamic performance analysis of vortex lift waverider with a wide-speed range, Acta Astronaut. 151 (2018) 848–863, https://doi.org/10.1016/j.actaastro.2018.07.034.