Neural-network-based iterative learning control of nonlinear systems

Practice article

Krzysztof Patan∗, Maciej Patan

Institute of Control and Computation Engineering, University of Zielona Góra, ul. Szafrana 2, 65-516 Zielona Góra, Poland

∗ Corresponding author. E-mail addresses: [email protected] (K. Patan), [email protected] (M. Patan).

Article history: Received 13 February 2019; Received in revised form 13 August 2019; Accepted 28 August 2019; Available online xxxx.

Keywords: Iterative learning control; Nonlinear process; Neural networks; Convergence analysis

Abstract

This work reports on a novel approach to the effective design of iterative learning control of repetitive nonlinear processes based on artificial neural networks. The essential idea discussed here is to enhance the iterative learning scheme with neural networks applied both to controller synthesis and to system output prediction. Consequently, an iterative control update rule is developed through an efficient data-driven scheme of neural network training. The contribution of this work consists in a proper characterization of the control design procedure and a careful analysis of both the convergence and the zero error at convergence properties of the proposed nonlinear learning controller. The resulting sufficient conditions can then be incorporated into the control update for the next process trial. The proposed approach is illustrated by two examples involving control design for a pneumatic servomechanism and a magnetic levitation system.

© 2019 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Iterative learning control (ILC) was first proposed in the late 1970s as a solution to the problem of accurate reference tracking in repetitive processes. Nowadays, ILC is classified as a modern, intelligent control scheme, very popular among control engineers. The popularity of iterative learning control follows from the fact that conventional closed-loop control approaches behave in exactly the same way for each subsequent repetition of the process (the so-called trial) and thus cannot exploit the experience gathered in past trials. In view of the published works on this topic for the class of linear time-invariant systems, the theoretical analysis can be treated as mature and nearly complete; for reviews, the interested reader is referred to comprehensive papers and monographs [1–4]. However, in the case of nonlinear systems there are no universal control design and analysis procedures available. Since nonlinear processes working in repeated sequences of operations are frequently encountered in production systems, such contributions would be highly desirable [5]. In fact, the motivation stems directly from various tangible engineering applications where a nonlinear process, in combination with repeated trials, requires a dedicated control design to achieve high tracking accuracy and performance. ILC has demonstrated its great capabilities in various industrial applications such as robotics and servoing [6–8], injection-moulding machines [9], elastic structures [10,11], combustion processes [12] and photolithography [13,14]. Although the need for systematic approaches dedicated to nonlinear systems has been widely recognized,

with some interesting solutions available, e.g. those related to control-affine or linear parameter-varying systems [15,16], most methods communicated by various authors use a linear learning controller [15,17]. However, complex industrial plants working at many operating points require the application of more sophisticated approaches. Thus, controllers with a nonlinear or time-varying inherent structure [5] are a sensible and reasonable alternative to linear solutions. Taking into account that neural networks are nowadays perceived as an efficient tool to deal with nonlinear systems, they are suitable candidates for nonlinear ILC synthesis. It is important to stress that there are very few ILC solutions developed by means of neural networks. Substantial difficulties can be mainly attributed to developing stability and convergence conditions for nonlinear learning controllers. Neural-network-based ILC was proposed in the paper [18], where the authors used recurrent networks to represent both the plant and the controller; however, the convergence of the proposed ILC was not investigated at all. In turn, the authors of the work [19] also used two neural networks, but the control law was obtained by applying feedback linearization; moreover, the control scheme was elaborated for a particular class of nonlinear processes. A similar treatment was used in the paper [20], where a neural network was used to represent the nonlinear part of the system and the model was then used to realize iterative learning linear predictive control. In [21] a radial basis function network was used to deal with nonlinear uncertainties. Summarizing, neural networks are most often used only for modelling the plant, while the classical linear formula is employed as the controller [22,23]. The result presented in this work is an extension of the research undertaken by the authors in the field of nonlinear ILC synthesis using neural networks, where two neural network models were


used: the first to model the nonlinear process and the second to play the role of a time-varying learning controller [8,24]. Although the results achieved by the authors were promising, it should be stressed that to date only the convergence of the control signal has been examined. Moreover, the verification of the convergence conditions was done in a very restrictive manner. Finally, the training algorithm used constant learning parameters, which led to problems with the controller parameter update. In particular, the current work:

• provides the zero error at convergence property for both P-type and D-type learning controllers,

• proposes a learning process with adaptable training parameters, cf. a step decay of the regularization coefficient carried out in the trial domain and an exponential decay of the learning rate carried out in the time domain,

• suggests a less restrictive practical verification of the convergence conditions.

Two general advantages of the resulting approach can be directly indicated:

• the control design itself does not depend on linearization of any type, unlike [25], where linearization around the nominal trajectories is applied, or [14], where a linear parametrization of the feedforward controller is used,

• the complexity of the control design is kept at a relatively low level, as the online control update does not involve solving any complex auxiliary optimization/feasibility problems.

Finally, the approach is illustrated with two non-trivial examples of repetitive nonlinear control: a pneumatic servo and a magnetic levitation laboratory stand.

2. System modelling

In the paper the general form of nonlinear state-space discrete-time systems is considered:

x_p(k+1) = g(x_p(k), u_p(k)),  y_p(k) = C x_p(k),   (1)

where k = 0, …, N−1 stands for the discrete-time index, N is the length of a trial, p ∈ N represents the trial, N is the set of nonnegative integers, x_p(k), u_p(k) and y_p(k) are the state, input and output of the system at the pth trial, respectively, and g(·,·) is a nonlinear function. Throughout the paper the following standard properties of the considered class of systems are assumed.

Assumption A1. For any realizable reference trajectory y_r(k) defined over k ∈ [0, N] there exist a unique control input u_r(k) and a corresponding state trajectory x_r(k) such that

x_r(k+1) = g(x_r(k), u_r(k)),  y_r(k) = C x_r(k).   (2)

Assumption A2. For all trials the same initial conditions are employed:

∀p  x_p(0) = x_r(0).   (3)

Assumption A3. g(x_p(k), u_p(k)) is globally Lipschitz in x_p and u_p on [0, N−1], that is, for all k ∈ [0, N−1]

∥g(x_1(k), u_1(k)) − g(x_2(k), u_2(k))∥ ≤ L (∥x_1(k) − x_2(k)∥ + |u_1(k) − u_2(k)|),   (4)

for a positive constant L < ∞.

In the paper we investigate the case where a mathematical model of the plant is not available. To develop a plant model, a state-space neural network is employed [24,26]:

x̂_p(k+1) = ĝ(x̂_p(k), u_p(k)) = V_2^w σ(V_1^x x̂_p(k) + V_1^u u_p(k) + V_1^b) + V_2^b,   (5)

where x̂_p(k) is the estimated state, V_1^u, V_1^x, V_1^b, V_2^w and V_2^b are adaptable weight parameters, σ: R^{v_m} → R^{v_m} is a nonlinear activation function, R is the set of real numbers and v_m is the number of hidden neurons. The commonly applied activation function is the hyperbolic tangent, σ(x) = tanh(x). Although the model (5) contains only one hidden layer, it is able to represent any nonlinear continuous function with the assumed accuracy. Hence, it is well suited to represent systems of the Lipschitz kind.

Lemma 1 ([26]). For the system (5) the Lipschitz constant can be derived from the model weights as follows: L = max{∥V_2 V_1^x∥, ∥V_2 V_1^u∥}.

In order to properly train the neural model, input–output data describing the plant should be recorded. In this paper we deal with poorly damped and unstable systems, so the data should be recorded under closed-loop control. For the considered state-space neural models, we need to select the model order, the number of hidden processing nodes and an activation function, e.g. by means of the trial-and-error method. After that, the neural network is trained and then validated using the testing data. If the quality of the neural model is satisfactory, the model is kept for control synthesis purposes. Otherwise, the identification process is repeated using different neural network settings.
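To make the model structure concrete, the following Python/NumPy sketch implements one step of (5) together with the Lipschitz constant of Lemma 1. It is a minimal illustration: the dimensions, the weight values and the use of the spectral norm are our assumptions, not the trained models of Section 5.

import numpy as np

# State-space neural model (5): x_hat(k+1) = V2 tanh(V1x x_hat + V1u u + V1b) + V2b
n_x, v_m = 3, 5                              # model order and number of hidden neurons
rng = np.random.default_rng(0)
V1x = 0.3 * rng.standard_normal((v_m, n_x))  # V_1^x: state-to-hidden weights
V1u = 0.3 * rng.standard_normal((v_m, 1))    # V_1^u: input-to-hidden weights
V1b = 0.1 * rng.standard_normal((v_m, 1))    # V_1^b: hidden bias
V2  = 0.3 * rng.standard_normal((n_x, v_m))  # V_2^w: hidden-to-state weights
V2b = 0.1 * rng.standard_normal((n_x, 1))    # V_2^b: state bias

def model_step(x_hat, u):
    """One step of the model (5) with sigma = tanh."""
    return V2 @ np.tanh(V1x @ x_hat + V1u * u + V1b) + V2b

# Lemma 1: L = max{||V2 V1x||, ||V2 V1u||} (spectral norms assumed)
L = max(np.linalg.norm(V2 @ V1x, 2), np.linalg.norm(V2 @ V1u, 2))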

3. Controller design

3.1. Neural controller

For the clarity of presentation, and without losing generality, we focus our attention on single-input single-output systems. Nevertheless, the presented approach can easily be generalized to the class of multi-input multi-output systems. Here, we deal with current-iteration ILC (see Fig. 1), intended for the control of unstable or poorly damped systems [1]. The control signal is composed of two components:

u_p(k) = u_p^{fb}(k) + u_p^{ff}(k),   (6)

where u_p^{fb}(k) represents the control generated by a feedback controller and u_p^{ff}(k) is the output of a learning controller operating in the feedforward path. The idea is to realize the learning controller in the form of a nonlinear time-varying function f(·) [8]. Thus, we can develop a controller with a flexible structure, able to adapt to changing working conditions of the plant. In order to design a learning controller, a neural network is employed [8,24]:

u_p^{ff}(k) = f(ϕ_{p−1}(k)),   (7)

where ϕ_{p−1}(k) represents a regression vector containing signals recorded during the previous trial. In this paper we consider the following two structures:

• P-type regressor: ϕ_{p−1}(k) = [u_{p−1}(k), e_{p−1}(k)]^T,   (8)

• D-type regressor: ϕ_{p−1}(k) = [u_{p−1}(k), e_{p−1}(k+1)]^T.   (9)

Please cite this article as: K. Patan and M. Patan, Neural-network-based iterative learning control of nonlinear systems. ISA Transactions (2019), https://doi.org/10.1016/j.isatra.2019.08.044.

K. Patan and M. Patan / ISA Transactions xxx (xxxx) xxx

3

Fig. 1. Current-iteration ILC utilizing neural networks.

The interesting characteristic of the neural-network-based learning controller is that, using a suitable representation of the vector ϕ_{p−1}(k), a controller of a specific kind can be obtained, e.g. a PD-type, higher-order or dynamic controller. Let us consider the static neural network represented by the following difference equation:

u_p^{ff}(k) = W̄_{2,p} σ̄(W̄_{1,p} ϕ̄_{p−1}),   (10)

where W̄_{1,p} = [W_{1,p}^u W_{1,p}^e W_{1,p}^b] ∈ R^{v_c×3} and W̄_{2,p} = [W_{2,p}^w W_{2,p}^b] ∈ R^{1×(v_c+1)} are adaptable weight matrices; precisely speaking, W_{1,p}^u, W_{1,p}^e and W_{1,p}^b are the input-to-hidden layer weights connected to the input, error and bias, respectively, W_{2,p}^w and W_{2,p}^b represent the weights and bias of the hidden-to-output layer, σ̄(·) = [σ(·)^T 1]^T, σ: R^{v_c} → R^{v_c} is the activation function, ϕ̄_{p−1}(k) = [ϕ_{p−1}^T(k) 1]^T, and v_c is the number of hidden units. By using a proper form of the regression vector ϕ_{p−1}(k), the neural network (10) is able to represent an ILC controller of the appropriate kind. Bearing in mind that the neural model needs to be retrained after each trial, our intention was to use as simple a model structure as possible. Hence, during the experiments a very simple model with one hidden layer and a relatively small number of processing units was tried first, increasing the number of processing units if necessary. It is important to note that the proposed approach renders it possible to use much more complex models, i.e. models with more hidden layers, deep neural network models or even recurrent models.
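The following sketch evaluates the controller (10) with the P-type regressor (8) for a single time instant. The function and variable names, the tanh activation and the numerical inputs are illustrative choices of ours, not the trained controller of Section 5.

import numpy as np

def ff_control(W1, W2, u_prev, e_prev):
    """u_p^ff(k) = W2_bar sigma_bar(W1_bar phi_bar), cf. (10) with regressor (8)."""
    phi_bar = np.array([u_prev, e_prev, 1.0])  # regressor with appended bias input
    hidden = np.tanh(W1 @ phi_bar)             # sigma(.) = tanh(.)
    sigma_bar = np.append(hidden, 1.0)         # append 1 for the output bias
    return (W2 @ sigma_bar).item()

v_c = 20                                        # hidden units, cf. Section 5.1.2
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.1, 0.1, size=(v_c, 3))      # columns: input, error, bias
W2 = rng.uniform(-0.1, 0.1, size=(1, v_c + 1))  # hidden weights plus output bias

u_ff = ff_control(W1, W2, u_prev=0.2, e_prev=0.05)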

3.2. Controller training

In the presented control scheme the parameters of the ILC controller are updated after each consecutive trial. Let θ_p ∈ R^M be a generalized network parameter vector at the pth trial, i.e. θ_p = [svec(W̄_{1,p})^T, W̄_{2,p}]^T, where svec(W̄_{1,p})^T is the column vector achieved by stacking the columns of W̄_{1,p}^T. The optimal vector of the controller parameters θ_p^⋆ is derived as a solution of the following optimization problem:

θ_p^⋆ = arg min_θ [ (1/2) Σ_{k=0}^{N−1} (y_r(k) − y_p(k; θ_p))^2 + (1/2) µ_p Σ_{i=1}^{M} θ_{p,i}^2 ],   (11)

where µ_p is a penalty coefficient during the pth trial and θ_{p,i} stands for the ith component of θ_p. The cost (11) contains the weight decay term in order to improve the generalization properties of the neural controller [24]. However, regularization prevents achieving a cost equal to zero, and consequently the neural controller could not be trained so as to satisfy e_p(k) = 0 as p tends to infinity. A possible solution to this problem proposed in this work is to use an adaptable regularization parameter, e.g. using a step-decay or exponential-decay schedule.

For a detailed derivation of the training algorithm the interested reader is referred to the previous works of the authors [8,24,26]. Applying the stochastic gradient method, the weight update becomes

∆θ_p = −η(k) ∂J_p/∂θ_p,   (12)

where η(k) is the learning rate. Based on the experience of our previous works on this topic, where we used a constant learning rate, in this work we decided to use an adaptable learning rate to improve and speed up the training. The adaptation of the learning rate can be carried out in a similar manner as in the case of the regularization coefficient; however, the learning rate is adapted along the discrete-time domain, while the regularization parameter is adapted along the trial domain. The gradient of the cost (11) with respect to the vector of parameters can be calculated in a straightforward manner as follows:

∂J_p/∂θ_p = − Σ_{k=1}^{N} e_p(k) (∂ŷ_p(k)/∂u_p(k−1)) (∂u_p(k−1)/∂θ_p) + µ_p θ_p.   (13)

In order to derive the first partial derivative in (13) we need to know a model of the system. Bearing in mind that in many cases the mathematical model of an industrial process is unknown, to accomplish this task the neural network defined in the state space (5) is applied here. In this way we get

∂ŷ_p(k)/∂u_p(k−1) = C V_2^w (σ′ ◦ V_1^u),   (14)

where σ′ is the activation function derivative and ◦ is the element-wise product. To derive the second partial derivative in (13), the structure of the neural controller (7) needs to be analysed. At this point it should be pointed out that changes in the learning controller parameters have an indirect impact on the reaction of the feedback controller. However, together with the convergence of the feedforward control, the response of the feedback controller changes very slightly and can be treated as a constant signal. Therefore, in the majority of existing contributions this aspect is usually neglected. Since it is somewhat beyond the main subject of the paper, for the clarity of presentation, in this work we restrict our attention to the iterative learning controller only, assuming that the influence of the feedback controller response on the learning controller parameters is negligible. Finally, the following two update rules are derived:

• for the weight matrix W̄_{1,p}:

∂u_p(k−1)/∂W̄_{1,p} = ((W_{2,p}^w)^T ◦ σ′) ϕ̄_{p−1}^T(k−1),   (15)

• for the weight matrix W̄_{2,p}:

∂u_p(k−1)/∂W̄_{2,p} = σ̄(W̄_{1,p} ϕ̄_{p−1}).   (16)
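The update rules (12)–(16) can be condensed into the following schematic training epoch. It reuses the model weights V1x, V1u, V1b, V2 and the controller weights W1, W2 from the earlier sketches; for brevity a constant learning rate is applied per epoch instead of η(k), and the trial data arrays are assumptions of ours.

import numpy as np

def train_epoch(W1, W2, u, e, phi_bar, x_hat, C, eta, mu_p):
    """One gradient step over a recorded trial: u, e of length N; phi_bar of
    shape (N, 3) with the bias appended; x_hat of shape (N, n_x, 1)."""
    gW1, gW2 = np.zeros_like(W1), np.zeros_like(W2)
    N = len(u)
    for k in range(1, N):
        # (14): dy_hat(k)/du(k-1) = C V2 (sigma' o V1u), sigma' along the model trajectory
        s = np.tanh(V1x @ x_hat[k - 1] + V1u * u[k - 1] + V1b)
        dy_du = (C @ (V2 @ ((1.0 - s**2) * V1u))).item()
        # controller hidden output, needed for sigma' in (15)
        h = np.tanh(W1 @ phi_bar[k - 1])
        # (15): du(k-1)/dW1_bar = ((W2^w)^T o sigma') phi_bar^T(k-1)
        du_dW1 = (W2[0, :-1] * (1.0 - h**2))[:, None] * phi_bar[k - 1][None, :]
        # (16): du(k-1)/dW2_bar = sigma_bar(W1_bar phi_bar)
        du_dW2 = np.append(h, 1.0)[None, :]
        gW1 -= e[k] * dy_du * du_dW1   # accumulate the gradient (13)
        gW2 -= e[k] * dy_du * du_dW2
    gW1 += mu_p * W1                   # weight-decay term of (13)
    gW2 += mu_p * W2
    return W1 - eta * gW1, W2 - eta * gW2   # update (12)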

3.3. Summary of the design procedure

The synthesis of neural-network-based ILC is summarized by Algorithm 1. In Step 4.2 the algorithm checks for violation of the convergence condition. To do this, we need to determine the convergence condition of the ILC scheme; this problem is discussed in the subsequent section.


Algorithm 1: Neural-network-based ILC
Step 1. Design a feedback controller.
Step 2. Evaluate the control system to collect a set T of input–output data.
Step 3. Using the set T, train a neural network model of the plant (5).
Step 4. Select the structure of the ILC controller (7)–(9); then, for each trial:
Step 4.1. Update the controller parameters using (14)–(16).
Step 4.2. Check for violation of the convergence condition: (35) for P-type ILC or (51) for D-type ILC.

4. ILC analysis of convergence

The supreme objective when designing ILC is to ensure convergence of the control. In the paper two convergence properties are discussed:

P1 Convergence property:

lim_{p→∞} u_p(k) = u_r(k);   (17)

P2 Zero error at convergence property:

∀k  lim_{p→∞} e_p(k) = 0.   (18)

Before starting the convergence analysis of the P-type ILC, let us introduce some definitions.

Definition 1. Let f_u(k) and f_e(k) be the partial derivatives of the learning controller (7) with the regressor (8) with respect to the input and the tracking error, respectively, defined in the following way:

f_u(k) = ∂f(u_p(k), e_p(k))/∂u_p(k),  f_e(k) = ∂f(u_p(k), e_p(k))/∂e_p(k).   (19)

Definition 2. The λ-norm of a vector z(k) is defined as follows:

∥z(k)∥_λ = sup_{k∈[0,N−1]} α^{−λk} ∥z(k)∥,   (20)

with α > 1 and a finite constant λ > 0.

Now, we are in a position to provide sufficient convergence conditions for the P-type neural-network-based ILC controller.

Theorem 1 ([26]). Consider the system (1) and assume that A1–A3 hold. Then the learning controller (7) with the regressor (8) is convergent (property P1) if

γ < 1,   (21)

where γ = sup_k ∥f_u(k)∥.

Proof. The proof can be derived analogously to the approach presented in [27] and [26]. Applying Taylor's formula to the controller (7) with the regressor (8) and taking the first-order approximation yields

∆u_{p+1}(k) = f_u(k)∆u_p(k) − f_e(k)∆y_p(k),   (22)

where ∆u_p(k) = u_r(k) − u_p(k) and e_p(k) = ∆y_p(k) = y_r(k) − y_p(k). Using assumption A1, the system (1) can be expressed as

∆x_p(k+1) = g(x_r(k), u_r(k)) − g(x_p(k), u_p(k)),  ∆y_p(k) = C∆x_p(k),   (23)

where ∆x_p(k) = x_r(k) − x_p(k). Substituting (23) into (22) and applying the norm to the newly derived formula gives

∥∆u_{p+1}(k)∥ ≤ ∥f_u(k)∆u_p(k)∥ + ∥f_e(k)C∆x_p(k)∥.   (24)

Similarly, taking into account that the system is globally Lipschitz and taking the norm of both sides of the state equation in (23), we obtain

∥∆x_p(k+1)∥ ≤ L∥∆x_p(k)∥ + L|∆u_p(k)|.   (25)

According to assumption A2 we know that ∥∆x_p(0)∥ = 0; then, taking advantage of the recursive nature of (25), it can be rewritten as

∥∆x_p(k)∥ ≤ Σ_{i=0}^{k−1} L^{k−i} |∆u_p(i)|,  k = 1, …, N−1.   (26)

Now, substituting (26) into (24) and applying the λ-norm to both sides of the derived inequality yields

sup_k α^{−λk} ∥∆u_{p+1}(k)∥ ≤ sup_k α^{−λk} ∥f_u(k)∆u_p(k)∥ + sup_k ∥f_e(k)C∥ · sup_k α^{−λk} Σ_{i=0}^{k−1} L^{k−i} |∆u_p(i)|.   (27)

By setting α equal to the Lipschitz constant L and performing similar calculations as presented in [26,27], the final form of the last supremum in (27) is

sup_k α^{−λk} Σ_{i=0}^{k−1} α^{k−i} |∆u_p(i)| ≤ ∥∆u_p(k)∥_λ · α^{−(λ−1)} · (1 − α^{−(λ−1)(N−1)}) / (1 − α^{−(λ−1)}).   (28)

It is easy to show, taking λ sufficiently large and taking into account that α > 1, that

lim_{λ→∞} (1 − α^{−(λ−1)(N−1)}) / (α^{λ−1} − 1) = 0.   (29)

Hence, for λ large enough the second term on the right-hand side of (27) becomes negligible, and (27) can be rewritten as

∥∆u_{p+1}(k)∥_λ ≤ γ ∥∆u_p(k)∥_λ,   (30)

where γ = sup_k ∥f_u(k)∥. It is obvious that if γ < 1, then

∀k ∈ N  lim_{p→∞} ∥∆u_p(k)∥_λ = 0,   (31)

which implies

∀k ∈ N  lim_{p→∞} ∥∆u_p(k)∥ = 0.   (32)

From (32) one can deduce that ∀k ∈ N lim_{p→∞} u_p(k) = u_r(k), and the convergence property P1 is satisfied.

Remark 1. To guarantee the convergence of the control scheme, the Lipschitz constant of the neural network model has to be greater than 1; then the neural model weight matrices should be properly updated. A simple heuristic to achieve this goal was proposed in the previous work of the authors [26].
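Remark 1 ties the applicability of Theorem 1 to the model Lipschitz constant. The heuristic of [26] is not reproduced in the paper; the sketch below shows only one conceivable way, assumed by us purely for illustration, of enforcing L > 1 by rescaling the output weights according to Lemma 1 after a model update.

import numpy as np

def enforce_lipschitz(V2, V1x, V1u, margin=1.05):
    """Rescale V2 so that L = max{||V2 V1x||, ||V2 V1u||} exceeds 1 (illustrative)."""
    L = max(np.linalg.norm(V2 @ V1x, 2), np.linalg.norm(V2 @ V1u, 2))
    if L <= 1.0:
        V2 = V2 * (margin / L)   # after rescaling, L equals `margin` > 1
    return V2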

Remark 2. It should be noted that the achieved convergence conditions may be easily implemented by checking the values of the controller parameters. Consider the controller structure (10) with the regressor (8); the partial derivative f_u(k) is derived as follows [26]:

f_u(k) = ((W_{2,p}^w)^T ◦ σ′)^T W_{1,p}^u.   (33)

In our previous works [24,26] we used the quite restrictive approximation

sup_k ∥f_u(k)∥ = ∥W_{2,p}^w W_{1,p}^u∥.   (34)

However, such a condition constitutes the worst case, which assumes that the derivative of the activation function is equal to one for all time instants. In this paper the less restrictive condition of the form

sup_k ∥((W_{2,p}^w)^T ◦ σ′)^T W_{1,p}^u∥ < 1   (35)

is applied, which can result in a better convergence rate of the proposed ILC scheme. Obviously, σ′ can be readily calculated using the output of the hidden neurons [28].

Corollary 1. Given the nonlinear system (1) with the P-type controller and assuming that A1–A3 hold, a sufficient condition for zero error at convergence (property P2) is the condition (21).

Proof. Applying the λ-norm to both sides of (26) with L = α yields

0 ≤ ∥∆x_p(k)∥_λ ≤ sup_k α^{−λk} Σ_{i=0}^{k−1} α^{k−i} |∆u_p(i)| ≤ ∥∆u_p(k)∥_λ α^{−(λ−1)} (1 − α^{−(λ−1)(N−1)}) / (1 − α^{−(λ−1)}),   (36)

with the last inequality being a direct consequence of (28). From (31) we know that under condition (21) we have the monotonous convergence of the control error norm to zero, i.e. ∥∆u_p(k)∥_λ → 0 as p → ∞. Hence, the squeeze theorem applied to (36) implies

∀k ∈ N  lim_{p→∞} ∥∆x_p(k)∥_λ = 0,   (37)

then

∀k ∈ N  lim_{p→∞} ∥∆x_p(k)∥ = 0.   (38)

Now, taking into account the observation equation of (1),

∀k ∈ N  lim_{p→∞} ∥∆y_p(k)∥ = lim_{p→∞} ∥e_p(k)∥ = 0,   (39)

which further provides

∀k ∈ N  lim_{p→∞} e_p(k) = 0,   (40)

which leads to property P2 given by (18). This completes the proof.

The same reasoning is used to investigate the convergence of the D-type neural-network-based ILC. Let us define the sensitivities of the controller.

Definition 3. Let f_u^d(k) and f_e^d(k) be the partial derivatives of the learning controller (7) with the regressor (9) with respect to the input and the tracking error, respectively, defined as

f_u^d(k) = ∂f(u_p(k), e_p(k+1))/∂u_p(k),  f_e^d(k) = ∂f(u_p(k), e_p(k+1))/∂e_p(k+1).   (41)

Theorem 2 ([26]). For the system (1), assuming that A1–A3 hold, the learning controller (7) with the regressor (9) is convergent (property P1) if

γ_1 + γ_2 · α < 1,   (42)

where γ_1 = sup_k ∥f_u^d(k)∥, γ_2 = sup_k ∥f_e^d(k)C∥ and α is the Lipschitz coefficient.

Proof. By analogy to the proof for the P-type controller, we start with the first-order Taylor expansion of the control law (7) with the regressor given by (9):

∆u_{p+1}(k) = f_u^d(k)∆u_p(k) − f_e^d(k)∆y_p(k+1).   (43)

Substituting (23) into (43) and applying the λ-norm we get

∥∆u_{p+1}(k)∥_λ ≤ γ_1 ∥∆u_p(k)∥_λ + γ_2 ∥∆x_p(k+1)∥_λ.   (44)

Then, making use of the recursive nature of (25) under assumption A2, we can write

∥∆x_p(k+1)∥_λ ≤ sup_k α^{−λk} Σ_{i=0}^{k} L^{k−i+1} |∆u_p(i)|.   (45)

Substituting (45) into (44) gives

∥∆u_{p+1}(k)∥_λ ≤ γ_1 ∥∆u_p(k)∥_λ + γ_2 sup_k α^{−λk} Σ_{i=0}^{k} L^{k−i+1} |∆u_p(i)|.   (46)

Analogously to (27), setting L = α, the last supremum in (46) can be bounded as [26]

sup_k α^{−λk} Σ_{i=0}^{k} α^{k−i+1} |∆u_p(i)| ≤ ∥∆u_p(k)∥_λ · α · (1 − α^{−(λ−1)N}) / (1 − α^{−(λ−1)}).   (47)

It is obvious that for α > 1 we can find λ sufficiently large to bring (1 − α^{−(λ−1)N}) / (1 − α^{−(λ−1)}) arbitrarily close to 1 and finally to obtain

∥∆u_{p+1}(k)∥_λ ≤ (γ_1 + γ_2 · α) ∥∆u_p(k)∥_λ.   (48)

Clearly, if γ_1 + γ_2 · α < 1, then

∀k ∈ N  lim_{p→∞} ∥∆u_p(k)∥_λ = 0,   (49)

which further gives

∀k ∈ N  lim_{p→∞} ∥∆u_p(k)∥ = 0,   (50)

and from (50) we can conclude that the convergence property P1 is satisfied.

Remark 3. The derivatives (41) can be derived in a similar fashion to (33). As a result, the condition (42) can be rewritten as follows:

sup_k ∥((W_{2,p}^w)^T ◦ σ′)^T W_{1,p}^u∥ + L · sup_k ∥((W_{2,p}^w)^T ◦ σ′)^T W_{1,p}^e C∥ < 1.   (51)

Now, the zero error at convergence property can also be derived as follows.

Corollary 2. Given the nonlinear system (1) with the D-type controller and assuming that A1–A3 hold, a sufficient condition for zero error at convergence (property P2) is the condition (42).

Proof. Let us return to (45), which combined with (47) yields

0 ≤ ∥∆x_p(k+1)∥_λ ≤ sup_k α^{−λk} Σ_{i=0}^{k} α^{k−i+1} |∆u_p(i)| ≤ ∥∆u_p(k)∥_λ α (1 − α^{−(λ−1)N}) / (1 − α^{−(λ−1)}).   (52)


Convergence of the control norm (49) to zero together with (52) implies

∀k ∈ N  lim_{p→∞} ∥∆x_p(k+1)∥_λ = 0,   (53)

which further provides

∀k ∈ N  lim_{p→∞} ∥∆x_p(k+1)∥ = 0,   (54)

and consecutively

∀k ∈ N  lim_{p→∞} ∥∆y_p(k)∥ = lim_{p→∞} ∥e_p(k)∥ = 0.   (55)

Finally,

∀k ∈ N  lim_{p→∞} e_p(k) = 0,   (56)

which provides the property P2.

5. Illustrative application examples

5.1. Pneumatic servo-mechanism

As the first example of the delineated approach, position control of a moving mass with a pneumatic servomechanism is considered, based on the benchmark reported in [28,29]. The physical effects related to nonlinear friction and the compressibility of air result in complex nonlinear dynamics of the system [24].

5.1.1. Process modelling

The inherent feature of such systems is that they are poorly damped and involve an integration action as well. Therefore, a closed-loop scheme has to be applied to collect data for neural network training. For this purpose, a proportional controller was applied for the sake of simplicity. In order to persistently excite the plant, randomly triggered steps with levels covering piston positions from the admissible interval (−0.245, 0.245) were chosen as the reference signal. The gain k_p of the proportional controller u_p^{fb}(k) = k_p e_p(k) was set to 30. The control signal u_p(k) and the piston position y_p(k) play the role of the model input and output, respectively. The dataset for training was formed from 16000 samples. Careful analysis of the process physics provides some empirical clues for the order of the model [24,26]. The best trade-off between the neural model size and acceptable modelling accuracy was obtained with the network parameter values n_x = 3 and v_m = 5. The hyperbolic tangent and linear activation functions were applied for the hidden and output neurons, respectively. An off-line training regime with the Levenberg–Marquardt (LM) algorithm was applied, with the final value of the sum of squared errors equal to SSE = 0.0438, indicating relatively high model quality.

5.1.2. ILC synthesis

In this study the P-type ILC controller (7)–(8) is employed. After some experiments the number of hidden units was set to 20. The parameters of the controller are initialized with random values from the uniform distribution U(−0.1, 0.1). Such a setting guarantees that at the beginning of the simulation the convergence condition (35) is satisfied. After that, the controller is retrained after each trial according to the methodology delineated in Section 3.2. To provide good performance and avoid overfitting during the learning process, the training rate was arbitrarily set to η = 0.05 and the number of training epochs was reduced to 50. In order to enable zero-error convergence, the regularization parameter was adapted using a step-decay schedule: µ_p was dropped by a factor of 10 every 15 trials, with the initial value of µ_p set to 0.0001.
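The convergence indicator used in Step 4.2 of Algorithm 1 can be evaluated directly from the controller weights. The following sketch computes γ of condition (35) by sweeping σ′ along the regressors recorded during the last trial; it assumes the tanh activation and the column layout of W̄_{1,p} from (10), with the first column holding W_{1,p}^u.

import numpy as np

def convergence_factor(W1, W2, phi_bar_trial):
    """gamma = sup_k |((W2^w)^T o sigma'(k))^T W1^u|, cf. (33) and (35)."""
    gamma = 0.0
    for phi_bar in phi_bar_trial:      # one regressor (with bias) per time instant k
        h = np.tanh(W1 @ phi_bar)      # hidden output; sigma' = 1 - h**2 for tanh
        fu = (W2[0, :-1] * (1.0 - h**2)) @ W1[:, 0]
        gamma = max(gamma, abs(fu))
    return gamma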

Fig. 2. ILC convergence without Q-filter (a) and with Q-filter (b); plot of the convergence factor γ (c).

A simple heuristic can be proposed to satisfy the convergence condition (21). Namely, an arbitrarily chosen safety margin κ is introduced (here κ = 0.01). Then, if γ > 1 − κ, the weights are not updated but are set back to the best weights stored during the training process, i.e. the weights ensuring the lowest value of the tracking error norm. Fig. 2a shows the quality of the control over consecutive trials. An initial trend of significant improvement in the error norm quite quickly turns into deterioration, even though the convergence conditions are fulfilled (see Fig. 2c). This can be explained in terms of the so-called learning transient [1]. A very simple yet effective method to decrease learning transients is to use a Q-filter [1,24]. In this way, learning transients can be reduced to a great extent and the control update becomes more robust. Thus, the control law becomes:

u_p^{ff}(k) = Q(z)f(ϕ_{p−1}(k)),   (57)

with Q(z) being an adequately selected transfer function. In the experimental setup, the cut-off frequency of the first-order filter was experimentally set to 0.034 rad/s, leading to the following transfer function:

Q(z) = 0.1912 / (z − 0.8088).   (58)
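A sketch of the filtered update (57)–(58): the difference equation of Q(z) can be applied to the raw network output with scipy.signal.lfilter; the input sequence below is a placeholder, not experimental data.

import numpy as np
from scipy.signal import lfilter

# Q(z) = 0.1912 / (z - 0.8088) = 0.1912 z^-1 / (1 - 0.8088 z^-1), unity DC gain
b, a = [0.0, 0.1912], [1.0, -0.8088]
u_ff_raw = np.random.default_rng(0).standard_normal(200)  # placeholder for f(phi_{p-1}(k))
u_ff = lfilter(b, a, u_ff_raw)                            # u_p^ff(k) = Q(z) f(phi_{p-1}(k))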

In Fig. 2b we can clearly see the effect of introducing the Q-filter into the control law (57) (blue line): nearly monotonic convergence was achieved. A very interesting aspect of learning is illustrated in Fig. 3, where the control signal is plotted together with its feedback and feedforward components after approximately 50 trials. Clearly, the learning process leads to a gradual takeover of the dominant role by the neural controller from the feedback one. Finally, the feedforward controller determines the main trend of the control signal,


Fig. 3. Control signal components.


Fig. 5. Comparison of different ILC schemes.

Fig. 6. Magnetic suspension laboratory stand.

Fig. 4. Reference tracking: the PID controller (a) and ILC (b) (reference: blue dash–dot; plant output: red solid).

while the feedback controller compensates the random disturbances. Results on the tracking quality are presented in Fig. 4. In the first scenario, for comparison with the ILC scheme, the control system with the proportional controller alone was investigated (cf. the red line in Fig. 4a), giving the tracking error norm ∥e_p(k)∥ = 0.3156. It is obvious that the P controller has serious difficulties with accurate reference tracking and with achieving a zero steady-state error. In the second experiment, this control system was expanded with the learning controller, as illustrated in Fig. 1. The system response is depicted in Fig. 4b. In comparison to the first scenario, the tracking accuracy of ILC is definitely better, with the tracking error norm ∥e_p(k)∥ equal to 0.0627. The proposed approach was also compared with another efficient control strategy, approximate predictive control (APC) [28]. APC achieved the tracking error norm ∥e_p(k)∥ = 0.0977. Clearly, ILC shows superior behaviour over the APC approach.
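For reference, the error norms quoted throughout Section 5 can be reproduced from trial data as sketched below; taking the Euclidean norm of the error sequence over one trial is our reading, since the paper does not state the norm explicitly.

import numpy as np

def tracking_error_norm(y_ref, y):
    """||e_p(k)|| over one trial, with e_p(k) = y_r(k) - y_p(k)."""
    e = np.asarray(y_ref) - np.asarray(y)
    return float(np.linalg.norm(e))   # sqrt(sum_k e_p(k)^2)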

5.1.3. Convergence rate of the learning controllers

Finally, a comparative study of the P-type ILC with other learning controllers, namely the D-type (9) and the PD-type, being the combination of (8) and (9), is carried out. The convergence results are shown in Fig. 5. Although all the learning controllers turned out to be competitive for the particular process under consideration, the best results were obtained using the P-type controller. This applies to the tracking accuracy, but the convergence rate is also significantly better than that achieved by the other two controllers. This may be an indirect effect of the less conservative convergence condition in the case of the P-type ILC scheme.

5.2. Magnetic suspension system

The second application example is related to control synthesis for the Amira MA 401 magnetic suspension laboratory stand [26], cf. Fig. 6. The system is characterized by unstable nonlinear dynamics with fast changes in response. The control is applied via a Speedgoat real-time target machine providing a flexible Matlab/Simulink interface to the embedded controller of the magnetic suspension. In this experimental study, the control task was to control the position of the levitating body in such a way as to follow a sinusoidal reference with a frequency of 0.5 rad/s and a magnitude of 4. The sampling frequency is set to f_s = 100 Hz. The control input is the voltage on the coil of the electromagnet and the output is the distance of the levitating body from the surface of the magnet. Both the measured control signal and the measured system response take values from −10 to 10 V. In the case of the output this corresponds to the real distance y_p(k) varying from 0 to 5 mm.


Fig. 8. Magnetic suspension: tracking error norm.

Fig. 7. Magnetic suspension: trajectory tracking.

5.2.1. PID controller settings

Since the system is unstable, the first step of the control design was to apply a feedback controller to stabilize it. Thus, a PID controller was applied and tuned using pole placement, assuming a settling time of 0.2 s, an overshoot of 5% and a peak time of 0.08 s. The plant was approximated with the following linearized discrete-time model:

G(z) = (0.2065z^2 + 0.5467z + 0.07699) / (z^3 − 2.327z^2 + 1.297z − 0.1353),   (59)

with the real poles z_1 = 0.1353, z_2 = 0.6474 and z_3 = 1.5446. The third pole z_3 is located outside the unit circle, resulting in instability of the plant. After closing the control loop, the following PID controller was determined:

G_PID(z) = (1.2z^2 − 1.97z + 0.81) / (z^2 − 0.57z − 0.43).   (60)

Fig. 9. Magnetic suspension: convergence condition satisfaction.
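The claimed pole locations of (59) are easy to verify numerically; exactly one root of the denominator lies outside the unit circle, confirming the open-loop instability.

import numpy as np

den = [1.0, -2.327, 1.297, -0.1353]   # denominator of G(z) in (59)
poles = np.roots(den)                 # approx. 1.5446, 0.6474, 0.1353
print(np.abs(poles) > 1.0)            # exactly one unstable pole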

The control problem here is quite challenging due to the nonlinearities and the fact that the magnetic suspension starts from the lowest position of the levitating body (not the nominal one). Hence, at the beginning the PID controller needs some time to damp the oscillations, see Fig. 7. The reference signal is represented by the blue dash-dotted line and the output of the plant by the red dashed line. The tracking error norm in this case is ∥e_p(k)∥ = 12.26.

Fig. 10. Magnetic suspension: control signal components.

5.2.2. Plant modelling

Data for the experiment were recorded in closed-loop control using random steps as the input signal and the PID controller (60). The total dataset consists of 20000 samples. The best structure of the neural model (5) was identified via the trial-and-error procedure as n_x = 3 and v_m = 5. Again, hyperbolic tangent and linear activation functions were applied to the hidden and output layers of the neural network, respectively. The Lipschitz constant of the model is L = 4.5805, so the requirements of Definition 2 were satisfied automatically, without any additional effort.

5.2.3. ILC synthesis

The P-type learning controller (7) with the regressor (8) was applied. After some trials the number of hidden neurons was set to 20 and the controller parameters were initialized randomly from the uniform distribution U(−0.1, 0.1). Then, the neural controller was retrained after each trial. The number of training epochs was set to 400. A serious problem is to properly choose the training rate, especially in the case of controlling unstable systems. Thus, in this study we decided to use an adaptable training rate in the form of an exponential decay schedule:

η(j) = η_0 e^{−0.05j},   (61)

where j is the iteration index and η_0 is the initial value of the learning rate. The setting η_0 = 0.001 gave satisfactory results. This time the weight decay was not used, so µ_p = 0.
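Both adaptation schedules used in Section 5 can be written compactly, as in the sketch below; the division by 10 every 15 trials follows our reading of Section 5.1.2.

import numpy as np

def mu_step_decay(p, mu0=1e-4, drop=10.0, every=15):
    """Regularization coefficient per trial p (step decay, cf. Section 5.1.2)."""
    return mu0 / drop ** (p // every)

def eta_exp_decay(j, eta0=1e-3):
    """eta(j) = eta0 exp(-0.05 j), cf. (61)."""
    return eta0 * np.exp(-0.05 * j)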

We also used a first-order Q-filter with the cut-off frequency equal to 9.5 rad/s. The convergence of the tracking error norm is depicted in Fig. 8. Taking into account the challenging control problem, ILC performs quite well, as the tracking error norm is decreasing, however with some perturbations. It is evident that the convergence condition is satisfied, cf. Fig. 9. However, the convergence indicator reaches the safety margin (set to 0.99) very quickly, which is mainly caused by the relatively large number of training epochs and the application of the adaptable learning rate. Clearly, the ILC is able to significantly improve the efficiency compared to the scenario using the PID controller only, as illustrated in Fig. 7. The ILC controller is able to track the reference more closely than the PID one. After 10 trials the tracking error norm is ∥e_10(k)∥ = 7.45, compared with ∥e_1(k)∥ = 12.26 achieved by the PID controller. Fig. 10 shows the components of the control signal after 10 trials. Thanks to the learning process, the feedforward component (black solid line) is shifted in phase relative to the feedback component (blue dash-dotted line), thus reducing the steady-state error. The feedforward component is smooth due to its role in the compensation of repetitive disturbances, while the feedback component varies much more because its objective is to compensate nonrepetitive disturbances.


6. Conclusion

The paper proposes an appealing alternative to conventional adaptive and iterative learning control schemes for repetitive or batch processes. The application of neural networks makes it possible to obtain a very flexible solution which, thanks to the training process, adapts to the changing working conditions of the plant. The main contribution of this work is to formulate sufficient conditions for zero error at convergence, as well as to present a careful analysis of how to incorporate these conditions into the training of the neural controller. A very important feature of the resulting convergence conditions is their relatively small level of conservatism, which is a direct consequence of the redundancy of neural modelling, i.e. it is not hard to fulfil them during training. This advantage cannot be overestimated from the point of view of real engineering applications. The application of adaptive training parameters is also not without significance; experimental results clearly show that they significantly influence the convergence speed and the stability of training. Analysing the achieved results, we can state that the main factors affecting the convergence rate are as follows:

(i) an improper structure of the ILC controller: too small a number of processing units may slow down the convergence rate;

(ii) an inadequate value of the learning rate: slow adaptation of the ILC parameters directly affects the convergence rate;

(iii) the quality of the plant model: the ILC controller uses partial derivatives of the plant output with respect to the control signal, so an inaccurate model has a great impact on the convergence rate.

Our future research will be directed towards providing monotonic convergence of the proposed neural-network-based ILC. The requirement of monotonic convergence is used to eliminate harmful learning transients, which are not acceptable in engineering applications. However, for this class of nonlinear ILC schemes this constitutes a non-trivial and difficult problem.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The work was supported by the National Science Center in Poland under grant 2017/27/B/ST7/01874.

References

[1] Bristow DA, Tharayil M, Alleyne AG. A survey of iterative learning control: a learning-based method for high-performance tracking control. IEEE Control Syst Mag 2006;26(3):96–114.

[2] Ahn H-S, Moore KL, Chen Y. Iterative learning control: robustness and monotonic convergence for interval systems. Communications and control engineering. London: Springer-Verlag; 2007.

[3] Freeman C, Rogers E, Burridge J, Hughes A-M, Meadmore K. Iterative learning control for electrical stimulation and stroke rehabilitation. SpringerBriefs in control, automation and robotics. London: Springer-Verlag; 2015.


[4] Owens DH. Iterative learning control: an optimization paradigm. Advances in industrial control. London: Springer-Verlag; 2016.

[5] Moore KL. Iterative learning control for deterministic systems. Advances in industrial control. London: Springer-Verlag; 1993.

[6] Rogers E, Gałkowski K, Owens DH. Control systems theory and applications for linear repetitive processes. Vol. 349. Springer; 2007.

[7] Bouakrif F. D-type iterative learning control without resetting condition for robot manipulators. Robotica 2011;29(7):975–80. http://dx.doi.org/10.1017/S0263574711000191.

[8] Patan K, Patan M, Kowalów D. Neural networks in design of iterative learning control for nonlinear systems. In: IFAC-PapersOnLine, 20th IFAC World Congress, vol. 50; 2017. p. 13402–7. http://dx.doi.org/10.1016/j.ifacol.2017.08.2277.

[9] Gao F, Yang Y, Shao C. Robust iterative learning control with applications to injection molding process. Chem Eng Sci 2001;56(24):7025–34.

[10] Madadi E, Söffker D. Model-free control of unknown nonlinear systems using an iterative learning concept: theoretical development and experimental validation. Nonlinear Dynam 2018;94(2):1151–63.

[11] Patan M, Patan K, Gałkowski K, Rogers E. Iterative learning control of repetitive transverse loads in elastic materials. In: 57th IEEE conference on decision and control, Miami Beach, USA; 2018. p. 5270–5.

[12] Kowalów D, Patan M. Optimal sensor selection for model identification in iterative learning control of spatio-temporal systems. In: 21st IEEE conference on methods and models in automation and robotics; 2016. p. 70–5.

[13] Oomen T, Rojas CR. Sparse iterative learning control with application to a wafer stage: achieving performance, resource efficiency, and task flexibility. Mechatronics 2017;47:134–47.

[14] Song F, Liu Y, Xu J-X, Yang X. Data-driven iterative feedforward tuning for a wafer stage: a high-order approach based on instrumental variables. IEEE Trans Ind Electron 2019;66(4):3106–16.

[15] Xu J-X, Tan Y. Linear and nonlinear iterative learning control. Lecture notes in control and information sciences, vol. 291. Berlin: Springer; 2003.

[16] Chen Y, Wen C. Iterative learning control: convergence, robustness, applications. Lecture notes in control and information sciences, vol. 248. London: Springer; 1999.

[17] Simba KR, Bui BD, Msukwa MR, Uchiyama N. Robust iterative learning contouring controller with disturbance observer for machine tool feed drives. ISA Trans 2017;75:207–15.

[18] Chow TWS, Li X-D, Fang Y. A real-time learning control approach for nonlinear continuous-time system using recurrent neural networks. IEEE Trans Ind Electron 2000;47:478–86.

[19] Chi R, Hou Z. A new neural network-based adaptive ILC for nonlinear discrete-time systems with dead zone scheme. J Syst Sci Complexity 2009;22:435–45.

[20] Zhang R, Xue A, Wang J, Wang S, Ren Z. Neural network based iterative learning predictive control design for mechatronic systems with isolated nonlinearity. J Process Control 2009;19:68–74.

[21] Wei J, Zhang Y, Sun M, Geng B. Adaptive iterative learning control of a class of nonlinear time-delay systems with unknown backlash-like hysteresis input and control direction. ISA Trans 2017;70:79–92.

[22] Xiong Z, Zhang J. A batch-to-batch iterative optimal control strategy based on recurrent neural network models. J Process Control 2005;15:11–21.

[23] Xiong W, Ho DWC, Yu X. Saturated finite interval iterative learning for tracking of dynamic systems with HNN-structural output. IEEE Trans Neural Netw Learn Syst 2016;27:1578–84.

[24] Patan K, Patan M. Design and convergence of iterative learning control based on neural networks. In: European control conference; 2018. p. 3161–6. http://dx.doi.org/10.23919/ECC.2018.8550315.

[25] Radac M-B, Precup R-E. Data-based two-degree-of-freedom iterative control approach to constrained non-linear systems. IET Control Theory Appl 2015;9(7):1000–10.

[26] Patan K. Robust and fault-tolerant control: neural-network-based solutions. Cham, Switzerland: Springer Nature; 2019.

[27] Shen D, Zhang W, Xu J. Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths. Systems Control Lett 2016;96:81–7.

[28] Nørgaard M, Ravn O, Poulsen N, Hansen L. Neural networks for modelling and control of dynamic systems. London: Springer-Verlag; 2000.

[29] Patan K. Two stage neural network modelling for robust model predictive control. ISA Trans 2018;72:56–65.
