Neural Networks 32 (2012) 267–274
2012 Special Issue
Fault-tolerant nonlinear adaptive flight control using sliding mode online learning

Thomas Krüger, Philipp Schnetter, Robin Placzek, Peter Vörsmann

Institute of Aerospace Systems, Technische Universität Braunschweig, Hermann-Blenk-Str. 23, 38108 Braunschweig, Germany
Keywords: Adaptive flight control; Sliding mode online learning; Variable learning rate; Unmanned aircraft system
Abstract

An expanded nonlinear model inversion flight control strategy using sliding mode online learning for neural networks is presented. The proposed control strategy is implemented for a small unmanned aircraft system (UAS). This class of aircraft is very susceptible to nonlinear effects such as atmospheric turbulence, model uncertainties and, of course, system failures; these systems therefore make a sensible testbed for evaluating fault-tolerant, adaptive flight control strategies. In this work the concept of feedback linearization is combined with feedforward neural networks to compensate for inversion errors and other nonlinear effects. Backpropagation-based adaptation laws for the network weights are used for online training. Within these adaptation laws the standard gradient descent backpropagation algorithm is augmented with the concept of sliding mode control (SMC). Implemented as a learning algorithm, this nonlinear control strategy treats the neural network as a controlled system and allows a stable, dynamic calculation of the learning rates. While preserving the system's stability, this robust online learning method therefore offers a higher speed of convergence, especially in the presence of external disturbances. The SMC-based flight controller is tested and compared with the standard gradient descent backpropagation algorithm in the presence of system failures.
1. Introduction

The increasing degree of automation in complex dynamical systems, not only in the field of aerospace technology, calls for control strategies that enable robust control of nonlinear systems even in the presence of parameter uncertainties or system failures. This is especially the case as more and more applications strive for at least semi-autonomous behavior in terms of artificial intelligence and agent-like characteristics (Russell & Norvig, 2004). Aircraft systems today normally use conventional cascade control strategies (Brockhaus, Alles, & Luckner, 2011), which show adequate performance in standard flight conditions but reach their limits in the presence of significant nonlinearities. This is particularly true for smaller unmanned aircraft, as these systems display an increased sensitivity to turbulence and gusts (Kroonenberg, 2009). The learning capabilities of artificial neural networks (ANNs) offer real-time adaptivity during operation and are therefore well suited to counteract such effects, which are often too complex to determine analytically. In the field of aeronautics neural networks have been widely implemented: as neural control elements (Ferrari & Stengel, 2004;
Pashilkar, Sundararajan, & Saratchandran, 2006), for nonlinear aerodynamic parameter identification (Das, Kuttieri, Sinha, & Jategaonkar, 2010; Seifert, 2003) and also for flight control of small UAS (Dierks & Jagannathan, 2010; Krüger et al., 2010). Recent research shows that neural networks are also becoming more important for robust control strategies of high assurance systems (Schuhmann & Liu, 2010). One of these control strategies is the concept of dynamic inversion, also called feedback linearization, which is usually combined with adaptive neural networks and has been discussed extensively for flight control purposes over recent years (Burken, Nguyen, & Griffin, 2010; Calise, Lee, & Sharma, 2000; Holzapfel, 2004; Johnson, 2000; Kim, 2003; Rysdyk & Calise, 2005). The basic principle of dynamic inversion is to realize a set-point-independent linearization of the system, where existing inversion errors due to model uncertainties can be approximated and compensated using an ANN (Hornik, Stinchcombe, & White, 1989). These networks can be trained with an expanded backpropagation approach, which adds the so-called e-modification and a robustifying term (Lewis, Yegildirek, & Liu, 1996; Narendra & Annaswamy, 1987). A possibility to increase the speed of convergence of the backpropagation algorithm, while guaranteeing stability and robustness towards external disturbances, is to use the concept of sliding mode learning, which is derived from nonlinear control theory. In this expansion of backpropagation learning the ANN is considered a controlled system in which the network output is optimized utilizing an error feedback loop. The main advantage of SMC-backpropagation, in contrast to the standard gradient descent, is the dynamic calculation of the learning rate, while assuring stability of the system (Nied, Seleme, Parma, & Menezes, 2005, 2007; Shakev, Topalov, & Kaynak, 2003; Topalov & Kaynak, 2001).

The work presented here is based on the results from Krüger, Schnetter, Placzek, and Vörsmann (2011) and elaborates in more detail the differences between the training approaches regarding the speed of convergence in failure situations. For this comparison the adaptive flight control strategy is implemented for the unmanned aircraft CAROLO P360 depicted in Fig. 1. It is a small-scale UAS with a take-off mass of up to 25 kg and a wing span of 360 cm, whose design parameters are described in Scholtz et al. (2011). The nonlinear simulation environment from Krüger et al. (2011) is adapted using the aerodynamic parameters of this aircraft, while maintaining all important subsystems such as the integrated navigation system for attitude and position determination (Winkler, 2007). In addition, an atmospheric model for wind and turbulence (Brockhaus et al., 2011) is used to analyze the network behavior in the presence of realistic external disturbances.

Fig. 1. The unmanned aircraft system CAROLO P360.

2. Basic control strategy

2.1. Nonlinear dynamic inversion

Consider a nonlinear single-input-single-output (SISO) system with affinity in the control u:

$$\dot{\vec{x}} = f(\vec{x}) + g(\vec{x}) \cdot u, \qquad y = h(\vec{x}), \tag{1}$$

where f and g are vector fields in the domain $D \subset \mathbb{R}^n$ and f, g as well as h are smooth in D. The principle of dynamic inversion is to realize a linear input/output behavior of the system. To do so, the output y or a derivative of y has to be directly controllable by the control input u. Therefore, the derivative $\dot{y}$ can be written as:

$$\dot{y} = \frac{\partial h}{\partial \vec{x}} \cdot \dot{\vec{x}} = \frac{\partial h}{\partial \vec{x}} \cdot \left[ f(\vec{x}) + g(\vec{x}) \cdot u \right] = L_f h(\vec{x}) + L_g h(\vec{x}) \cdot u, \tag{2}$$

where

$$\frac{\partial h}{\partial \vec{x}} \cdot f(\vec{x}) = L_f h(\vec{x}) \tag{3}$$

is called the Lie derivative of h with respect to f (Isidori, 1996). If $L_g h(\vec{x})$ at $\vec{x} = \vec{x}_0$ is zero, and therefore $\dot{y}$ is not directly controllable, the equation has to be differentiated r times until the relationship

$$y^{(r)} = L_f^r h(\vec{x}) + L_g L_f^{r-1} h(\vec{x}) \cdot u, \tag{4}$$

where $L_g L_f^{r-1} h(\vec{x}) \neq 0$, is found. The number of differentiations r is called the relative degree of the system. For a multiple-input-multiple-output (MIMO) system with m outputs, (4) can be written as:

$$\left[ y_i^{(r_i)} \right]_{m \times 1} = \vec{b}(\vec{x}) + A(\vec{x}) \cdot \vec{u}, \tag{5}$$

with

$$A(\vec{x}) = \begin{bmatrix} L_{g_1} L_f^{r_1 - 1} h_1(\vec{x}) & \cdots & L_{g_m} L_f^{r_1 - 1} h_1(\vec{x}) \\ L_{g_1} L_f^{r_2 - 1} h_2(\vec{x}) & \cdots & L_{g_m} L_f^{r_2 - 1} h_2(\vec{x}) \\ \vdots & \ddots & \vdots \\ L_{g_1} L_f^{r_m - 1} h_m(\vec{x}) & \cdots & L_{g_m} L_f^{r_m - 1} h_m(\vec{x}) \end{bmatrix}, \qquad \vec{b}(\vec{x}) = \begin{bmatrix} L_f^{r_1} h_1(\vec{x}) \\ L_f^{r_2} h_2(\vec{x}) \\ \vdots \\ L_f^{r_m} h_m(\vec{x}) \end{bmatrix}. \tag{6}$$

The linear input-output map can now be found by utilizing a state transformation and a state feedback

$$\vec{u} = A^{-1}(\vec{x}) \cdot \left[ \vec{\nu} - \vec{b}(\vec{x}) \right], \tag{7}$$

which is equivalent to the inverted system dynamics. The new input or control signal of the inverted system is called the pseudo-control $\vec{\nu}$. The pseudo-control signal $\vec{\nu}$ is generated to calculate the necessary input signal $\vec{u}$ that assures a desired system output $\vec{y}^{(r)}$. Combining (7) and (5) reduces the system dynamics to an integrator chain of the form

$$\left[ y_i^{(r_i)} \right]_{m \times 1} = \vec{\nu}_i, \tag{8}$$

which allows standard linear control principles to be used. If the system dynamics are not available in a closed algebraic form, which is, for example, the case for many aerodynamic data sets, an approximate dynamic inversion has to be performed.
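As a concrete illustration, the inversion step (7) amounts to a single linear solve per control cycle. The following minimal NumPy sketch uses a hypothetical two-output system; the model estimates `A_hat` and `b_hat` and all numbers are illustrative, not taken from the paper:

```python
import numpy as np

def dynamic_inversion(nu, A_hat, b_hat):
    """Approximate dynamic inversion, Eq. (7): u = A^{-1}(x) (nu - b(x))."""
    # Solve A_hat u = nu - b_hat instead of forming the inverse explicitly.
    return np.linalg.solve(A_hat, nu - b_hat)

# Toy usage: a hypothetical 2-output system linearized at one flight state.
A_hat = np.array([[2.0, 0.3],
                  [0.1, 1.5]])        # assumed input matrix estimate, Eq. (6)
b_hat = np.array([0.4, -0.2])         # assumed drift vector estimate, Eq. (6)
nu = np.array([1.0, 0.5])             # pseudo-control from the linear controller
u = dynamic_inversion(nu, A_hat, b_hat)
print(u)  # input that yields y^(r) = nu for the *model* dynamics, cf. Eq. (8)
```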
2.2. Error dynamics

Due to model uncertainties, the dynamics used for the inversion (marked with a hat, e.g. $\hat{A}$) often do not match the characteristics of the real system. Taking this into account one can assume:

$$\hat{A}(\vec{x}) \neq A(\vec{x}), \qquad \hat{\vec{b}}(\vec{x}) \neq \vec{b}(\vec{x}), \tag{9}$$

which means that

$$\vec{y}^{(r)} = \vec{b}(\vec{x}) + A(\vec{x}) \cdot \hat{A}^{-1}(\vec{x}) \cdot \left[ \vec{\nu} - \hat{\vec{b}}(\vec{x}) \right] \neq \vec{y}_R^{(r)}, \tag{10}$$

where $\vec{y}_R^{(r)}$ is the r-th derivative of the desired reference signal. The inversion error can be written as:

$$\vec{\Delta} = \left[ y_i^{(r_i)} \right]_{m \times 1} - \left[ y_{R,i}^{(r_i)} \right]_{m \times 1}. \tag{11}$$

The deviation between the actual system output $\vec{y}$ and the desired output $\vec{y}_R$ is expressed as:

$$\vec{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_m \end{bmatrix} = \begin{bmatrix} y_1 - y_{R,1} \\ \vdots \\ y_m - y_{R,m} \end{bmatrix}, \tag{12}$$

which gives the error state vector for one of the m outputs:

$$\vec{\chi}_m(t) = \begin{bmatrix} e(t) \\ \dot{e}(t) \\ \vdots \\ e^{(r-1)} \end{bmatrix}. \tag{13}$$
For clarity, all following derivations are again given for a SISO system (only one of the m outputs is observed). The error dynamics can be described with a linear differential equation:

$$e^{(r)} + c_{r-1} e^{(r-1)} + \cdots + c_1 \dot{e} + c_0 e = e^{(r)} + \vec{c}^T \vec{\chi} = \Delta, \tag{14}$$

where $\vec{c}^T$ is the vector of gains of the linear controller. Thus, the inversion error can be interpreted as the input signal of the error dynamics. With the state vector $\vec{\chi}$, the state space model of the error dynamics becomes:

$$\dot{\vec{\chi}} = A_E \vec{\chi} + \vec{b}_E \Delta, \tag{15}$$

with

$$A_E = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -c_0 & -c_1 & -c_2 & \cdots & -c_{r-1} \end{bmatrix}, \qquad \vec{b}_E = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}. \tag{16}$$

The stability of the error dynamics can be ensured with positive design parameters $c_i > 0$, leading to negative real parts of all eigenvalues, so $A_E$ is a Hurwitz matrix. Using the Lyapunov equation

$$A_E^T P_E + P_E A_E = -Q_E \tag{17}$$

yields positive definite matrices $P_E$ and $Q_E$.
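To make the error dynamics concrete, the sketch below builds the companion-form matrices of (16) for an assumed gain vector and solves the Lyapunov equation (17) with SciPy. The gains c = [4, 4] are an illustrative choice (characteristic polynomial $s^2 + 4s + 4$), not values from the paper:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def error_dynamics(c):
    """Companion-form matrices A_E, b_E of Eq. (16) for gains c = [c0, ..., c_{r-1}]."""
    r = len(c)
    A_E = np.zeros((r, r))
    A_E[:-1, 1:] = np.eye(r - 1)        # super-diagonal of ones
    A_E[-1, :] = -np.asarray(c)         # last row: -c0 ... -c_{r-1}
    b_E = np.zeros(r)
    b_E[-1] = 1.0
    return A_E, b_E

c = [4.0, 4.0]                          # illustrative gains, eigenvalues at -2, -2
A_E, b_E = error_dynamics(c)
assert np.all(np.linalg.eigvals(A_E).real < 0)   # A_E is Hurwitz

# Lyapunov equation (17): A_E^T P_E + P_E A_E = -Q_E with Q_E > 0.
Q_E = np.eye(len(c))
P_E = solve_continuous_lyapunov(A_E.T, -Q_E)
print(np.linalg.eigvals(P_E))           # all positive -> P_E positive definite
```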
2.3. Pseudo Control Hedging

Saturated actuator dynamics pose a problem for adaptive as well as non-adaptive control. A limited actuator output cancels the affinity in control and can lead to integrator wind-up. In the presence of an ANN this promotes an adaptation to the saturated actuator dynamics, which may result in a constant growth of the network weights. By modifying the reference model through so-called Pseudo Control Hedging (PCH), the linear controller and the ANN can be shielded from these limited system characteristics. The basic principle of PCH is to measure or, if that is not possible, to estimate the actual actuator output signal $u = g_A(u_C)$ and to calculate the difference $\nu - \hat{\nu}$ between the commanded and the achieved pseudo-control signal $\hat{\nu} = \hat{F}(\vec{x}, g_A(u_C))$, since it is an indicator for the time delay of the system. For a stable implementation of PCH it is essential to determine $u = g_A(u_C)$ accurately, as it may otherwise induce oscillations into the control loop. The hedge signal is calculated with:

$$\nu_h = \nu - \hat{\nu} = \hat{F}(\vec{x}, u_C) - \hat{F}(\vec{x}, u) = \nu - \hat{F}\left(\vec{x}, g_A\left(\hat{F}^{-1}(\vec{x}, \nu)\right)\right). \tag{18}$$

Then $\nu_h$ is utilized to slow the reference model down to a state where it generates a reference signal the system can adhere to. The r-th derivative of the reference model output can be written as:

$$y_R^{(r)} = \nu_R - \nu_h. \tag{19}$$

Now, the difference between the achieved system reaction $y^{(r)}$ and the commanded pseudo-control $\nu$ can be written as follows:

$$y^{(r)}(\vec{x}, u) - \nu = \Delta + \hat{\nu} - \nu = \Delta - \nu_h, \tag{20}$$

with $\nu = \nu_R + \nu_C$, where $\nu_C$ is the signal of the linear controller. Using (19) in (20), the inversion error $\Delta$ can be determined by:

$$y^{(r)}(\vec{x}, u) - y_R^{(r)}(\vec{x}, u) + \sum_{i=1}^{r} c_{i-1} \cdot e^{(i-1)} = \Delta. \tag{21}$$

Using the definition of the error signal e from (12) and the error state vector $\vec{\chi}$ (13) in (21), the final expression of the error dynamics with PCH becomes

$$e^{(r)} + \vec{c}^T \vec{\chi} = \Delta, \tag{22}$$

which is equivalent to the controller architecture without Pseudo Control Hedging. Hence, the stability and boundedness of the error dynamics are not affected by the application of PCH. However, since an additional loop is closed, the effect of PCH on the stability of the overall system has to be analyzed using a Lyapunov function which considers the modified reference model dynamics (Kim, 2003). In Fig. 2 the control architecture, expanded with neural networks and PCH, is depicted for the rotational motion of an aircraft, which is a MIMO system.
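The hedge computation (18) can be sketched in a few lines. The scalar model `F_hat`, its inverse and the actuator limit below are hypothetical placeholders; they only illustrate how a saturated actuator produces a nonzero $\nu_h$ that then slows the reference model via (19):

```python
import numpy as np

def pch_hedge(nu, x, F_hat, F_hat_inv, g_A):
    """Hedge signal of Eq. (18): nu_h = nu - F_hat(x, g_A(F_hat_inv(x, nu)))."""
    u_c = F_hat_inv(x, nu)                 # commanded actuator input
    u = g_A(u_c)                           # achieved (possibly saturated) output
    return nu - F_hat(x, u)                # nu_h, the undeliverable part of nu

# Hypothetical scalar model F(x, u) = a*u with a saturating actuator.
a = 2.0
F_hat = lambda x, u: a * u
F_hat_inv = lambda x, nu: nu / a
g_A = lambda u_c: np.clip(u_c, -0.3, 0.3)  # assumed deflection limit

nu_h = pch_hedge(1.0, None, F_hat, F_hat_inv, g_A)
print(nu_h)  # 0.4: the reference model is slowed via y_R^(r) = nu_R - nu_h, Eq. (19)
```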
2.4. Expanded controller using neural networks

Standard feedforward networks are implemented to cancel the inversion error $\Delta$. These networks use a hyperbolic tangent transfer function in the hidden layer and linear functions in the input and output layers. In general, the forward propagation step of any neuron is calculated by:

$$o_j^L = f\left( \sum_{i=1}^{n} w_{ij} \cdot o_i^{L-1} + b_j \right), \tag{23}$$

where j indicates the neuron of layer L and i the n neurons of layer L-1; $b_j$ is the bias of neuron j and f the transfer function. This can be summarized in vectorial notation:

$$\vec{y} = w^{(2)} \cdot \vec{f}^{(2)}\left( w^{(1)} \vec{x} \right), \tag{24}$$

where $\vec{x}$ is the input vector, $w^{(1)}$ and $w^{(2)}$ are the weight matrices of the network, $\vec{f}^{(2)}$ is the transfer function of the hidden layer and $\vec{y}$ is the network output. The resulting squared error of an output signal is then calculated with:

$$E_j = \frac{1}{2} \left( y_{t,j} - y_j \right)^2. \tag{25}$$

For a gradient descent without momentum, which is commonly implemented for this control strategy (Burken et al., 2010; Calise et al., 2000; Kim, 2003; Rysdyk & Calise, 2005), the change of the weights can be determined as:

$$\Delta w_{ij} = -\mu \frac{\partial E_j}{\partial w_{ij}} = -\mu \cdot \delta_j \cdot o_i^{L-1}, \tag{26}$$

with the learning rate $\mu > 0$. The learning rate strongly affects the size of one training step, so it has to be chosen carefully, as it may induce instability into the learning phase. In Rojas (1996) the complete algorithm is deduced. Using the chain rule, $\Delta w_{ij}$ can be calculated, starting with the backpropagated error signal $\delta_j$ of the output layer:

$$\delta_j = \left( y_{t,j} - y_j \right) \cdot f^{L\prime}\left( w_{ij} \cdot o_i^{L-1} \right). \tag{27}$$

To cancel the inversion error, as depicted in Fig. 2, the ANN has to approximate this nonlinear parameter. As shown in Hornik et al. (1989), an ANN with one hidden layer and sigmoid transfer functions can approximate any nonlinear function, leaving a reconstruction error $\vec{\varepsilon}$:

$$\vec{\Delta} = w_*^{(2)} \cdot \vec{f}^{(2)}\left( w_*^{(1)} \vec{x} \right) + \vec{\varepsilon}(\vec{x}), \tag{28}$$

with $\|\vec{\varepsilon}\| < \bar{\varepsilon}$ and $\bar{\varepsilon}$ bounded. The matrices $w_*^{(1)}$ and $w_*^{(2)}$ represent the optimal weights, for which $\bar{\varepsilon}$ is minimal. For simplification, the following general weight matrix is defined:

$$w = \begin{bmatrix} w^{(2)} & 0 \\ 0 & w^{(1)} \end{bmatrix}. \tag{29}$$
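For reference, a minimal NumPy sketch of the forward pass (24) and the fixed-rate gradient descent update (25)-(27) for a single-hidden-layer network (tanh hidden layer, linear output) might look as follows. The network size and seed are illustrative; the learning rate µ = 0.03 is the GD value mentioned in Section 4:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 8, 1                     # illustrative network size
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))   # w^(1)
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))  # w^(2)
mu = 0.03                                        # fixed GD learning rate

def forward(x):
    """Eq. (24): y = w^(2) f^(2)(w^(1) x) with tanh hidden, linear output."""
    h = np.tanh(W1 @ x)
    return h, W2 @ h

def gd_step(x, y_t):
    """One fixed-rate gradient descent update following Eqs. (25)-(27)."""
    global W1, W2
    h, y = forward(x)
    delta2 = y_t - y                         # output-layer error signal
    delta1 = (W2.T @ delta2) * (1.0 - h**2)  # backpropagated through tanh'
    W2 += mu * np.outer(delta2, h)           # Delta w = -mu dE/dw, Eq. (26)
    W1 += mu * np.outer(delta1, x)
    return 0.5 * ((y_t - y) ** 2).item()     # squared error, Eq. (25)

x, y_t = np.array([0.2, -0.5, 1.0]), np.array([0.7])
for _ in range(200):
    E = gd_step(x, y_t)
print(E)   # the error shrinks towards zero for this single training pair
```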
Fig. 2. Expanded controller design for the rotational motion of an aircraft.
Using (28), the error dynamics from (15) are expanded:

$$\dot{\vec{\chi}} = A_E \cdot \vec{\chi} + \vec{b}_E \left( \Delta - \nu_{ad} - \nu_r \right), \tag{30}$$

where the output of the ANN is denoted by $\nu_{ad}$:

$$\nu_{ad} = \hat{w}^{(2)} \cdot \vec{f}^{(2)}\left( \hat{w}^{(1)} \vec{x} \right). \tag{31}$$

The term $\nu_r$ defines a robustifying term (Lewis et al., 1996), which is important regarding the Lyapunov stability of the system:

$$\nu_r = \left[ k_{r0} + k_{r1} \cdot \left( \|\hat{w}\|_F + \bar{w}_* \right) \right] \cdot \vec{\varsigma}. \tag{32}$$

Here, $k_{r0}$ and $k_{r1}$ are positive design parameters, $\vec{\varsigma} = \vec{\chi}^T P_E \vec{b}_E$ denotes the filtered error term and $\|w\|_F$ is the Frobenius norm of the matrix w, which is bounded by:

$$\|w\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}^2} \leq \bar{w}_*. \tag{33}$$

The optimal weight matrix $w_*$, for which the reconstruction error $\vec{\varepsilon}$ is minimal, can be used to introduce a matrix weight error $\tilde{w}$ as the difference between optimal and actual weights:

$$\tilde{w} = \hat{w} - w_*. \tag{34}$$

According to (28) and (31), the remaining approximation error can be written as:

$$\Delta - \nu_{ad} = w_*^{(2)} \vec{f}\left( w_*^{(1)} \vec{x} \right) - \hat{w}^{(2)} \vec{f}\left( \hat{w}^{(1)} \vec{x} \right) + \vec{\varepsilon}(\vec{x}), \tag{35}$$

where $\vec{f}^{(2)}$ is replaced by $\vec{f}$ for clarity. As the remaining approximation error has to be taken into account, a Taylor series expansion about the current output of the hidden layer is used and rewritten with $\tilde{w}^{(1)} = \hat{w}^{(1)} - w_*^{(1)}$:

$$\vec{f}\left( w_*^{(1)} \vec{x} \right) = \vec{f}\left( \hat{w}^{(1)} \vec{x} \right) - f'\left( \hat{w}^{(1)} \vec{x} \right) \tilde{w}^{(1)} \vec{x} + O\left( \tilde{w}^{(1)} \vec{x} \right)^2, \tag{36}$$

where $f'$ is a diagonal matrix containing the derivatives of the transfer function $\vec{f}^{(2)}$. Combining (35) with (36) and rewriting yields:

$$\Delta - \nu_{ad} = z - \tilde{w}^{(2)} \left[ \vec{f}\left( \hat{w}^{(1)} \vec{x} \right) - f'\left( \hat{w}^{(1)} \vec{x} \right) \hat{w}^{(1)} \vec{x} \right] - \hat{w}^{(2)} f'\left( \hat{w}^{(1)} \vec{x} \right) \tilde{w}^{(1)} \vec{x}, \tag{37}$$

where z is a deviation term which takes into account the approximation error as well as the higher-order terms of the Taylor approximation and is bounded, as discussed in Holzapfel (2004) and Rysdyk and Calise (2005):

$$z = \vec{\varepsilon}(\vec{x}) - \tilde{w}^{(2)} f'\left( \hat{w}^{(1)} \vec{x} \right) w_*^{(1)} \vec{x} + w_*^{(2)} O\left( \tilde{w}^{(1)} \vec{x} \right)^2. \tag{38}$$

Combining (30) and (37) results in the final error dynamics, which have to be stable. This can be assured by defining the weight update laws as:

$$\dot{\hat{w}}^{(1)} = \Gamma^{(1)} \cdot \left[ \vec{x} \cdot \left( \vec{\varsigma} \cdot \hat{w}^{(2)} \cdot f'\left( \hat{w}^{(1)} \vec{x} \right) \right)^T - \lambda \cdot \|\vec{\varsigma}\|_2 \cdot \hat{w}^{(1)} \right], \tag{39}$$

$$\dot{\hat{w}}^{(2)} = \Gamma^{(2)} \cdot \left[ \left( \vec{f}\left( \hat{w}^{(1)} \vec{x} \right) - f'\left( \hat{w}^{(1)} \vec{x} \right) \cdot \hat{w}^{(1)} \vec{x} \right) \cdot \vec{\varsigma}^T - \lambda \cdot \|\vec{\varsigma}\|_2 \cdot \hat{w}^{(2)} \right], \tag{40}$$

where $\Gamma^{(1)}$ as well as $\Gamma^{(2)}$ are positive definite matrices containing the learning rates of each layer and $\lambda$ is a positive constant of the so-called e-modification (Narendra & Annaswamy, 1987), which ensures upper bounds of the network weights. Note that in (39) and (40) the first term within the brackets denotes the weight change due to the standard backpropagation algorithm. To prove stability, the following Lyapunov candidate function can be selected:

$$V(\vec{\chi}) = \frac{1}{2} \vec{\chi}^T P_E \vec{\chi} + \frac{1}{2} \operatorname{trace}\left( \tilde{w}^{(1)} \left( \Gamma^{(1)} \right)^{-1} \left( \tilde{w}^{(1)} \right)^T \right) + \frac{1}{2} \operatorname{trace}\left( \tilde{w}^{(2)} \left( \Gamma^{(2)} \right)^{-1} \left( \tilde{w}^{(2)} \right)^T \right). \tag{41}$$

The time derivative of $V(\vec{\chi})$, combined with (39) and (40), can finally be expressed as:

$$\dot{V}(\vec{\chi}) = -\frac{1}{2} \vec{\chi}^T Q_E \vec{\chi} - \vec{\varsigma} \left[ k_{r0} + k_{r1} \cdot \left( \|\hat{w}\|_F + \bar{w}_* \right) \right] \vec{\varsigma}^T + \vec{\varsigma} z + \lambda \cdot \|\vec{\varsigma}\|_2 \cdot \operatorname{tr}\left( \tilde{w} \hat{w}^T \right). \tag{42}$$

In Holzapfel (2004), Johnson (2000) and Rysdyk and Calise (2005) the boundedness of the error dynamics and of the network weights as well as the negative definiteness of $\dot{V}(\vec{\chi})$ are verified. The next section illustrates that SMC learning allows the dynamic calculation of the learning rates of an ANN.
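An Euler-discretized sketch of the update laws (39) and (40), including the e-modification decay, is given below. The matrix shape conventions (and the transposes they force) are our own assumption, since the compact notation leaves them open; all sizes and values are illustrative:

```python
import numpy as np

def update_laws_step(W1, W2, x, sigma, Gamma1, Gamma2, lam, dt):
    """One Euler step of the adaptive laws (39)-(40): a backpropagation term
    driven by the filtered error sigma plus the e-modification decay."""
    h = np.tanh(W1 @ x)                       # f(w^(1) x)
    fp = np.diag(1.0 - h**2)                  # f'(w^(1) x), diagonal matrix
    s = np.linalg.norm(sigma)                 # ||sigma||
    # Eq. (39): W1_dot = Gamma1 [ x (sigma W2 f')^T - lam ||sigma|| W1 ]
    W1_dot = Gamma1 @ (np.outer(x, sigma @ W2 @ fp).T - lam * s * W1)
    # Eq. (40): W2_dot = Gamma2 [ (f - f' W1 x) sigma^T - lam ||sigma|| W2 ]
    W2_dot = Gamma2 @ (np.outer(h - fp @ (W1 @ x), sigma).T - lam * s * W2)
    return W1 + dt * W1_dot, W2 + dt * W2_dot

# Illustrative shapes: 3 inputs, 8 hidden neurons, 1 output.
rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.1, size=(8, 3))
W2 = rng.normal(scale=0.1, size=(1, 8))
Gamma1, Gamma2 = 0.5 * np.eye(8), 0.5 * np.eye(1)
x, sigma = np.array([0.2, -0.5, 1.0]), np.array([0.1])
W1, W2 = update_laws_step(W1, W2, x, sigma, Gamma1, Gamma2, lam=0.05, dt=0.01)
```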
3. Sliding mode learning algorithm

Sliding mode control theory is a concept from the field of variable structure systems (VSS). It can be implemented to control nonlinear MIMO systems like the one presented in Section 2.
As the name VSS implies, such a controller changes the structure of the system, and with it the control law, whenever a predefined switching surface in the state space is crossed. This ensures that the state trajectories of a dynamic system are forced onto a predefined subspace of the state space. Within this region of the state space a sliding mode can be realized on the switching surface, so that the system is stable and returns to the equilibrium within finite time after activation. Once the sliding mode is reached, the control law has to assure that the state vector does not leave the region of the sliding surface again, while repeatedly crossing the surface (Kaynak, Erbatur, & Ertugrul, 2001; Yu & Kaynak, 2009). A beneficial aspect of a dynamic system in sliding mode is that it becomes robust against external disturbances and parameter changes (Kaynak et al., 2001; Sarpturk, Istefanopulos, & Kaynak, 1987; Utkin, Guldner, & Shi, 2009).

In order to transfer this approach to neural networks, it makes sense to consider an ANN and its internal processes as a dynamic system. The training of a network can then be seen as a control process of this system, as depicted in Fig. 3. The following procedure is given in general vector notation for networks with more than one output signal and marks a generic training approach, also applicable to other online training problems (Krüger et al., 2010).

Fig. 3. Training of a neural network considered as a control process.

The ANN propagates an output signal $\vec{y}$ based on the input signals $\vec{x}$ and the current weights w. The actual network error $\vec{\varepsilon}$ is the difference between the desired network output $\vec{y}_t$ and the actual output $\vec{y}$. The current error is used in a feedback loop and compared with the desired network error $\vec{\varepsilon}_t$. The difference is fed into the training block, which generates a control signal composed of the weight changes of the neural network. There are different approaches to combining ANNs with SMC, especially regarding possible definitions of the sliding surface functions $\vec{S}$ (Nied et al., 2007; Shakev et al., 2003; Topalov & Kaynak, 2001). To realize a network training according to Fig. 3, the change of the connection weights can be defined in a general way as follows:

$$\Delta w = \left[ \frac{\partial \vec{y}(w, \vec{x}, \vec{y}_t)}{\partial w(t)} \right]^T \cdot \mu \cdot \operatorname{diag}\left( \operatorname{sign}(\vec{S}) \right) \cdot |\vec{\varepsilon}|. \tag{43}$$

This is an expansion of the standard gradient descent method, adding the sign of the sliding surface function $\vec{S}$. This adapted backpropagation weight change can be used within (39) and (40) and allows the learning rates to be calculated dynamically. In the following, this procedure is explained, starting with the definition of the sliding surface function $\vec{S}$ from (43):

$$\vec{S} = \dot{\vec{\varepsilon}} + \lambda \cdot \vec{\varepsilon}. \tag{44}$$

This means that $\vec{\varepsilon}$ and $\dot{\vec{\varepsilon}}$ are states of the ANN controlled via sliding mode. For $\vec{S} = 0$ the system is directly on the sliding surface, where the network error converges towards 0 if the factor $\lambda$ is positive:

$$\vec{S} = \dot{\vec{\varepsilon}} + \lambda \cdot \vec{\varepsilon} = 0 \quad \Rightarrow \quad \vec{\varepsilon} = \vec{\varepsilon}(t_0) \cdot e^{-\lambda (t - t_0)}. \tag{45}$$

A general approach to prove the existence as well as the reachability of the sliding mode, and therefore stability, is to choose a Lyapunov candidate function for the sliding surface $\vec{S}$ as shown in (46):

$$V(\vec{S}) = \frac{1}{2} \vec{S}^T \vec{S}. \tag{46}$$

In Kaynak et al. (2001) it is shown that $\dot{V}(\vec{S})$ is negative for the general case of a continuous system. For a discrete-time system, however, such as a flight control circuit, the following stability condition should be implemented (Nied et al., 2007; Sarpturk et al., 1987; Topalov & Kaynak, 2001; Utkin et al., 2009):

$$\left| \vec{S}_{k+1} \right| < \left| \vec{S}_k \right|. \tag{47}$$

This means that the algorithm converges towards the sliding surface when the absolute value of the sliding surface function $\vec{S}$ shrinks with each time step. In the following, the index k denotes the actual training step for clarity:

$$\vec{S}(t) = \vec{S}_k; \qquad \vec{S}(t + T_s) = \vec{S}_{k+1}; \qquad \cdots. \tag{48}$$

In addition, the derivative of the error $\vec{\varepsilon}$ is determined using the step size of the control loop $T_s$:

$$\dot{\vec{\varepsilon}} \approx \frac{\vec{\varepsilon}(t) - \vec{\varepsilon}(t - T_s)}{T_s}. \tag{49}$$

With (44) and (49), $\vec{S}_k$ as well as $\vec{S}_{k+1}$ can be calculated:

$$\vec{S}_k = \dot{\vec{\varepsilon}}_k + \lambda \cdot \vec{\varepsilon}_k = \left( \lambda + \frac{1}{T_s} \right) \vec{\varepsilon}_k - \frac{1}{T_s} \vec{\varepsilon}_{k-1}, \tag{50}$$

$$\vec{S}_{k+1} = \dot{\vec{\varepsilon}}_{k+1} + \lambda \cdot \vec{\varepsilon}_{k+1} = \left( \lambda + \frac{1}{T_s} \right) \vec{\varepsilon}_{k+1} - \frac{1}{T_s} \vec{\varepsilon}_k. \tag{51}$$

The unknown error $\vec{\varepsilon}_{k+1}$ in (51) has to be determined; it can also be written as:

$$\vec{\varepsilon}_{k+1} = \vec{\varepsilon}_k + \Delta \vec{\varepsilon}_k. \tag{52}$$

Here, the change of the error $\Delta \vec{\varepsilon}_k$ within one time step $T_s$ can be calculated:

$$\Delta \vec{\varepsilon}_k = \vec{\varepsilon}_{k+1} - \vec{\varepsilon}_k = \left( \vec{y}_{t,k+1} - \vec{y}_{k+1} \right) - \left( \vec{y}_{t,k} - \vec{y}_k \right) = \Delta \vec{y}_{t,k} - \Delta \vec{y}_k, \tag{53}$$

where the index t denotes target output values. For the unknown parameter $\Delta \vec{y}_k$ a Taylor series expansion is assumed:

$$\Delta \vec{y}_k = \frac{\partial \vec{y}_k(w_k, \vec{x}_k)}{\partial w_k} \Delta w_k + \frac{\partial \vec{y}_k(w_k, \vec{x}_k)}{\partial \vec{x}_k} \Delta \vec{x}_k. \tag{54}$$

The derivatives $\partial \vec{y}_k / \partial w_k$ and $\partial \vec{y}_k / \partial \vec{x}_k$ can be calculated by backpropagation of the network output $\vec{y}_k$ through the ANN. The parameter $\Delta w_k$ is given in (43) and is already known for the calculation of $\vec{S}_{k+1}$. The changes of the network inputs $\Delta \vec{x}_k$ and of the target outputs $\Delta \vec{y}_{t,k}$ are assumed to be minimal for a small step size $T_s$. Putting these relations into (47) yields:

$$\left| \vec{S}_k \right| > \left| \left( \lambda + \frac{1}{T_s} \right) \left( \vec{\varepsilon}_k + \Delta \vec{y}_{t,k} - \frac{\partial \vec{y}_k}{\partial w_k} \Delta w_k - \frac{\partial \vec{y}_k}{\partial \vec{x}_k} \Delta \vec{x}_k \right) - \frac{1}{T_s} \vec{\varepsilon}_k \right|. \tag{55}$$

For clarity the parameters $a_i$ and $b_i$ are introduced, where i is the index of the network output; note that all networks used for the presented application have only one output neuron (i = 1):

$$a_i = \left( \lambda + \frac{1}{T_s} \right) \left( \varepsilon_{k,i} + \Delta y_{t,k,i} - \left[ \frac{\partial \vec{y}_k}{\partial \vec{x}_k} \Delta \vec{x}_k \right]_i \right) - \frac{1}{T_s} \varepsilon_{k,i}, \tag{56}$$

$$b_i = \left( \lambda + \frac{1}{T_s} \right) \left[ \frac{\partial \vec{y}_k}{\partial w_k} \left( \frac{\partial \vec{y}_k}{\partial w_k} \right)^T \operatorname{diag}\left( \operatorname{sign}(\vec{S}_k) \right) |\vec{\varepsilon}_k| \right]_i. \tag{57}$$

For stability reasons it is known from (45) that the parameter $\lambda$ is positive, leaving only the learning rate $\mu$ as an unknown parameter
to ensure the reachability of the sliding mode. Using (56) and (57) in (47) yields the following inequality:

$$\left| \vec{S}_{k,i} \right| > \left| a_i - \mu \cdot b_i \right|. \tag{58}$$

For the stable limitation of $\mu$ we finally get:

$$-\frac{\vec{S}_{k,i}}{b_i} + \frac{a_i}{b_i} < \mu < \frac{\vec{S}_{k,i}}{b_i} + \frac{a_i}{b_i} \quad \text{for} \quad \left( \vec{S}_{k,i} > 0 \wedge b_i > 0 \right) \vee \left( \vec{S}_{k,i} < 0 \wedge b_i < 0 \right),$$

$$\frac{\vec{S}_{k,i}}{b_i} + \frac{a_i}{b_i} < \mu < -\frac{\vec{S}_{k,i}}{b_i} + \frac{a_i}{b_i} \quad \text{for} \quad \left( \vec{S}_{k,i} > 0 \wedge b_i < 0 \right) \vee \left( \vec{S}_{k,i} < 0 \wedge b_i > 0 \right). \tag{59}$$
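The resulting algorithm is compact: at every step, compute $\vec{S}_k$ from (50), the parameters $a_i$ and $b_i$ from (56) and (57), and then pick a learning rate µ inside the interval given by (59). A scalar sketch for a single output neuron (i = 1), with purely illustrative numbers, might look like this:

```python
import numpy as np

def smc_mu_bounds(eps_k, eps_km1, dy_dw, dy_dx, dx_k, dyt_k, lam, Ts):
    """Admissible learning-rate interval for one output (i = 1), Eqs. (50)-(59)."""
    S_k = (lam + 1.0 / Ts) * eps_k - eps_km1 / Ts              # Eq. (50)
    # Eq. (56): all terms of S_{k+1} that do not depend on mu
    a = (lam + 1.0 / Ts) * (eps_k + dyt_k - dy_dx @ dx_k) - eps_k / Ts
    # Eq. (57): coefficient of mu, stemming from Delta w_k of Eq. (43)
    b = (lam + 1.0 / Ts) * (dy_dw @ dy_dw) * np.sign(S_k) * abs(eps_k)
    # Eq. (59): open interval for mu guaranteeing |S_{k+1}| < |S_k|;
    # sorting covers all four sign combinations of S_k and b at once.
    lo, hi = sorted(((a - S_k) / b, (a + S_k) / b))
    return S_k, lo, hi

S_k, lo, hi = smc_mu_bounds(
    eps_k=0.4, eps_km1=0.3,
    dy_dw=np.array([0.3, -0.2, 0.1]),     # dy/dw from backpropagation
    dy_dx=np.array([0.05, 0.02]),         # dy/dx from backpropagation
    dx_k=np.zeros(2), dyt_k=0.0,          # inputs/targets assumed near-constant
    lam=2.0, Ts=0.02)
mu = 0.5 * (lo + hi)                      # e.g. the midpoint of the interval
print(S_k, lo, hi, mu)                    # here: S_k = 5.8, mu ~ 0.27
```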
4. Sliding mode online learning results

The controller architecture using the two different learning approaches is tested and validated in simulation runs with different time frames and atmospheric conditions as well as varying system failures. The vertical and horizontal profiles of the flight trajectory used for this comparison are depicted in Fig. 4. This trajectory represents a dynamic flight envelope of a small fixed-wing UAS, including tight curves as well as climb and descent maneuvers, which allow an appropriate analysis of the flight performance. The flight path is defined using cubic splines, leading to a mathematically described trajectory, which is advantageous for the control process (Krüger et al., 2010).

Fig. 4. Vertical and horizontal profile of the flight path.

The first analysis focuses on the speed of convergence of the different training algorithms in comparable failure situations. For this purpose the first 20 s of the flight path from Fig. 4 are taken and subjected to a constant inversion error of ∆ = 5 rad/s² at second five. This is added to the pitch rate channel of the pseudo-control signal $\nu_\omega$, illustrated in Fig. 2. The mentioned part of the flight path displays stationary cruise flight merging into a climb maneuver combined with a tight curve, which is demanding considering the high constant inversion error. Also, for a direct comparison of the learning approaches the PCH is deactivated, as it influences the error dynamics and with it the network training. The results of gradient descent (GD) and sliding mode control (SMC) based learning without modeled wind and turbulence as external disturbances are depicted in Fig. 5. The learning rate for GD-learning is iteratively determined as µ = 0.03. The offset inversion error, induced at second five, leads to a jump in the filtered error term ς, which is used for training and is therefore equivalent to the network error ε. Both training approaches show nearly identical behavior and cancel the network error completely after about five seconds, while reducing it significantly after only one second. It can also be seen that the SMC learning rate jumps at the beginning, adapting the untrained network to the system, and shrinks after canceling the inversion error, keeping the network weights at a nearly optimal state.

Fig. 5. Comparison between GD and SMC-learning for a reference failure situation without wind and learning rate of the SMC algorithm.

In Fig. 6 the operation of the complete control strategy in the presence of the inversion error described before can be observed. In (a) the outputs of the neural network $\nu_{ad}$, the robustifying element $\nu_r$ and their combined time response $\nu_{ad,r}$ are illustrated. Consistent with the time response of the network error ε from Fig. 5, the combined signal $\nu_{ad,r}$ completely cancels the inversion error ∆ after less than five seconds. It is also apparent that the robustifying term $\nu_r$ accelerates the error cancelation at first and then strongly decreases in favor of the neural network. In Fig. 6(b) the overall control signal $\nu_\omega$ is depicted as a combination of the inversion error ∆, the adaptive term $\nu_{ad,r}$ and the combined controller signal $\nu = \nu_R + \nu_C$. Again, it can be seen that the inversion error is canceled by the ANN after a short learning phase, which leaves the basic control signal ν, and with it $\nu_\omega$, inactive after the short period of excitation caused by the jump in the inversion error.

To further analyze the robustness of the training algorithms, the scenario described before is repeated inducing additional wind and a turbulence spectrum as realistic external disturbances (Brockhaus et al., 2011). The velocities of the wind vector components, given in the geodetic coordinate system, are 1 m/s for u, v, w, while the turbulence spectrum leads to significant gusts of up to 4 m/s. The results are depicted in Fig. 7. Fig. 7 underlines two main aspects. First, the noisy disturbance characteristics of the turbulence spectrum added to the inversion error ∆ cannot be compensated by the GD algorithm, leading to increased oscillation of the network error as the flight attitude changes from cruise to climb at about 15 s. Second, the SMC algorithm is capable of canceling these effects because of the constant adaptation of the learning rate to the actual external disturbance. This keeps the error ε, as a defined state of the neural network (see Section 3), within the subspace of the sliding mode and hence strongly limits it.
To compare the speed of convergence and robustness of both algorithms beyond the reference failure situation, an actuator damage in the elevator is simulated. This system failure limits the maximum negative elevator deflection from originally −20° to −12°, strongly reducing control effectiveness. The PCH is activated to evaluate the overall architecture, and again this is combined with wind and turbulence, leading to the results shown in Fig. 8. Both control architectures perform similarly over a longer time frame until, at about 80 s, going into a descent maneuver during a curve flight, the GD-based controller starts to oscillate around the target flight path. During the recovery from the descent at about 100 s, the continuously oscillating aircraft reaches a stall and crashes. Apparently, the actuator damage and the atmospheric disturbances, combined with the coupling effects of the lateral on the longitudinal motion, induce an instability which cannot be compensated by the GD-based ANN. The SMC-based controller, on the other hand, handles this difficult flight control task with reduced overall performance instead of a loss of the system. These results show that SMC learning offers improved robustness and a higher speed of convergence in the presence of realistic system failure and turbulence scenarios. Hence, the concept of sliding mode learning can improve the established control strategy of dynamic inversion combined with neural networks.
Fig. 6. Time response of the individual control signals using SMC-learning for a constant inversion error.
5. Conclusion
Fig. 7. Comparison between GD and SMC-learning for the same failure situation with wind and turbulence.
Due to their learning capabilities, neural networks are very well suited for complex nonlinear control problems, especially regarding fault tolerance. In this work the established nonlinear adaptive flight control strategy of feedback linearization is implemented for a fixed-wing UAS and expanded using sliding mode online learning. The SMC learning algorithm is derived from variable structure control theory and considers a neural network and its training as a control process. While assuring output stability, this allows for a dynamic calculation of the learning rates of an ANN and increases the robustness against external disturbances. The adaptive controller using standard backpropagation learning shows adequate and stable performance over a wide nonlinear flight envelope, yet it is outperformed by SMC-learning when facing significant system failures and turbulence. By treating the training of an ANN as a control problem, an improvement of the online weight adaptation due to a variable learning rate can be achieved while assuring system stability. Hence, the implementation of SMC-learning seems a sensible expansion for fault-tolerant, adaptive control purposes, since considerable improvements in system performance can be realized.
Fig. 8. Comparison of altitude and flight path error for both training algorithms. The results include external disturbances from wind with additional turbulence as well as an elevator actuator degradation, resulting in a significant error. The gradient descent algorithm is not able to compensate for this error and the aircraft crashes after about 100 s.
Acknowledgment

This research has been conducted within the project "Bürgernahes Flugzeug", which is supported by the German federal state of Lower Saxony.

References

Brockhaus, R., Alles, W., & Luckner, R. (2011). Flugregelung. Berlin: Springer-Verlag, ISBN: 978-3-642-01442-0.
Burken, J., Nguyen, N. T., & Griffin, B. J. (2010). Adaptive flight control design with optimal control modification on an F-18 aircraft model. In AIAA infotech@aerospace conference (pp. 2010–3364). Atlanta, Georgia: AIAA.
Calise, A., Lee, S., & Sharma, M. (2000). Development of a reconfigurable flight control law for the X-36 tailless fighter aircraft. In AIAA guidance, navigation, and control conference (pp. 2000–3940). Denver, CO: AIAA.
Das, S., Kuttieri, R. A., Sinha, M., & Jategaonkar, R. (2010). Neural partial differential method for extracting aerodynamic derivatives from flight data. Journal of Guidance, Control and Dynamics, 33(2), 376–384.
Dierks, T., & Jagannathan, S. (2010). Output feedback control of a quadrotor UAV using neural networks. IEEE Transactions on Neural Networks, 21(1), 50–66.
Ferrari, S., & Stengel, R. (2004). Online adaptive critic flight control. Journal of Guidance, Control and Dynamics, 27(5), 777–786.
Holzapfel, F. (2004). Nichtlineare adaptive Regelung eines unbemannten Fluggerätes. Ph.D. Thesis. Lehrstuhl für Flugmechanik und Flugregelung, Technische Universität München.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.
Isidori, A. (1996). Nonlinear control systems. Berlin: Springer-Verlag.
Johnson, E. N. (2000). Limited authority adaptive flight control. Ph.D. Thesis. Georgia Institute of Technology.
Kaynak, O., Erbatur, K., & Ertugrul, M. (2001). The fusion of computationally intelligent methodologies and sliding-mode control—a survey. IEEE Transactions on Industrial Electronics, 48(1), 4–17.
Kim, N. (2003). Improved methods in neural network-based adaptive output feedback control with applications to flight control. Ph.D. Thesis. School of Aerospace Engineering, Georgia Institute of Technology.
Krüger, T., Mößner, M., Kuhn, A., Axmann, J., & Vörsmann, P. (2010). Sliding mode online learning for flight control applications in unmanned aerial systems. In WCCI—world congress on computational intelligence (pp. 3738–3745). Barcelona, Spain: IEEE.
Krüger, T., Schnetter, P., Placzek, R., & Vörsmann, P. (2011). Nonlinear adaptive flight control using sliding mode online learning. In International joint conference on neural networks (pp. 2897–2904). San José, California, USA: IEEE.
Kroonenberg, A. (2009). Airborne measurement of small-scale turbulence with special regard to the polar boundary layer. Ph.D. Thesis. Zentrum für Luft- und Raumfahrt, Technische Universität Braunschweig.
Lewis, F. L., Yegildirek, A., & Liu, K. (1996). Multilayer neural-net robot controller with guaranteed tracking performance. IEEE Transactions on Neural Networks, 7(2), 388–399. doi:10.1109/72.485674.
Narendra, K., & Annaswamy, A. (1987). A new adaptive law for robust adaptation without persistent excitation. IEEE Transactions on Automatic Control, 32(2), 134–145. doi:10.1109/TAC.1987.1104543.
Nied, A., Seleme, S. I., Parma, G. G., & Menezes, B. R. (2005). On-line adaptive neural training algorithm for an induction motor flux observer. In IEEE power electronics specialists conference.
Nied, A., Seleme, S. I., Parma, G. G., & Menezes, B. R. (2007). On-line neural training algorithm with sliding mode control and adaptive learning rate. Neurocomputing, 70, 2687–2691.
Pashilkar, A., Sundararajan, N., & Saratchandran, P. (2006). A fault-tolerant neural aided controller for aircraft auto-landing. Aerospace Science and Technology, 10, 49–61.
Rojas, R. (1996). Neural networks—a systematic introduction. Berlin: Springer-Verlag.
Russell, S., & Norvig, P. (2004). Künstliche Intelligenz—ein moderner Ansatz. München: Pearson Education, ISBN: 3-8273-7089-2.
Rysdyk, R., & Calise, A. (2005). Robust nonlinear adaptive flight control for consistent handling qualities. IEEE Transactions on Control Systems Technology, 13(6), 896–910.
Sarpturk, S., Istefanopulos, Y., & Kaynak, O. (1987). On the stability of discrete-time sliding mode control systems. IEEE Transactions on Automatic Control, AC-32(10), 930–932.
Scholtz, A., Krüger, T., Wilkens, C.-S., Krüger, T., Hiraki, K., & Vörsmann, P. (2011). Scientific application and design of small unmanned aircraft systems. In 14th Australian international aerospace congress. Melbourne, Australia. Paper no. 58.00.
Schuhmann, J., & Liu, Y. (2010). Applications of neural networks in high assurance systems. Berlin: Springer-Verlag, ISBN: 978-3-642-10689-7.
Seifert, J. (2003). Identifizierung nichtlinearer aerodynamischer Derivative mit einem modularen neuronalen Netzwerk. Ph.D. Thesis. Universität der Bundeswehr München, Fakultät für Luft- und Raumfahrttechnik, Institut für Systemdynamik und Flugmechanik.
Shakev, N. G., Topalov, A. V., & Kaynak, O. (2003). Sliding mode algorithm for online learning in analog multilayer feedforward neural networks. Lecture Notes in Computer Science, 2714, 1064–1072.
Topalov, A. V., & Kaynak, O. (2001). Online learning in adaptive neurocontrol schemes with a sliding mode algorithm. IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, 31(3), 445–450.
Utkin, V., Guldner, J., & Shi, J. (2009). Sliding mode control in electro-mechanical systems. London: CRC Press.
Winkler, S. (2007). Zur Sensordatenfusion für integrierte Navigationssysteme unbemannter Kleinstflugzeuge. Ph.D. Thesis. Zentrum für Luft- und Raumfahrt, Technische Universität Braunschweig.
Yu, X., & Kaynak, O. (2009). Sliding-mode control with soft computing: a survey. IEEE Transactions on Industrial Electronics, 56(9), 3275–3285.