A survey on iterative learning control with randomly varying trial lengths: Model, synthesis, and convergence analysis


Review article

Dong Shen a,∗, Xuefang Li b

a School of Mathematics, Renmin University of China, Beijing 100872, P.R. China
b School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 511400, P.R. China

Article history: Received 1 June 2019; Revised 5 September 2019; Accepted 9 October 2019.

Keywords: Iterative learning control; Varying trial lengths; Random model; Compensation mechanism; Convergence analysis

Abstract: The nonuniform trial length problem, which causes information dropout in learning, is common in various control systems such as robotics and motion control systems. This paper presents a comprehensive survey of recent progress on iterative learning control with randomly varying trial lengths. Related works are reviewed along three dimensions: model, synthesis, and convergence analysis. Specifically, we first present both random and deterministic models of varying trial lengths to provide a mathematical description and to reveal the effects and difficulties of nonuniform trial lengths. Then, control synthesis is summarized, focusing on compensation mechanisms for the missing information and on key ideas in designing control algorithms. Lastly, four representative convergence analysis approaches are elaborated: the deterministic analysis approach, the switching system approach, the contraction mapping approach, and the composite energy function approach. Promising research directions and open issues in this area are also discussed.

© 2019 Elsevier Ltd. All rights reserved.

Contents

1. Introduction
2. Varying trial length problem
3. Models of the varying trial length
3.1. Random models
3.2. Deterministic models
4. Control synthesis
4.1. Compensation of the missing information
4.2. Controller design
4.3. Influence of varying trial lengths
5. Convergence analysis
5.1. Deterministic analysis approach
5.2. Switching system approach
5.3. Contraction mapping approach
5.4. Composite energy function approach
6. Promising directions and open issues
7. Conclusions
Declaration of Competing Interest
References

This work was supported by the National Natural Science Foundation of China (61673045).
∗ Corresponding author. E-mail address: [email protected] (D. Shen).

1. Introduction

During the past decades, iterative learning control (ILC) has been intensively studied and successfully applied in various practical systems (Bristow, Tharayil, & Alleyne, 2006; Shen, 2018a; Shen & Wang, 2014). The basic scheme of an ILC algorithm is formulated as follows. In each iteration, a predefined input drives the system dynamics over a finite time interval and generates the corresponding output, which aims to track a given reference. Then, the input for the next iteration is formulated as a linear or nonlinear function of the input, output, and desired reference of the current iteration. In this scheme, the error between the actual output and the desired reference acts as innovation information for correcting the input gradually in the iteration domain (Bristow et al., 2006).

This scheme reveals the distinct advantages of ILC over other control methods. Firstly, ILC is designed to deal with repetitive systems and can improve the control performance iteratively by virtue of learning, whereas other control methods such as feedback control cannot exploit repetition and deliver the same control performance in every iteration. Secondly, the control objective of ILC is to achieve almost perfect tracking performance as the iteration number increases, while other control techniques aim at asymptotic convergence along the time axis. Thirdly, the updating law of ILC is easy to implement since it can be viewed as a feedforward control methodology in the time domain. These advantages have attracted extensive attention to ILC from various discipline communities and motivated in-depth investigations in many directions. For a comprehensive tutorial on ILC, one can refer to Bristow et al. (2006).

In traditional ILC, various repetitive conditions are required to ensure perfect tracking performance (Arimoto, Kawamura, & Miyazaki, 1984), including an identical tracking reference, identical initial state, identical trial length, and identical system plant. However, it is difficult to guarantee these strictly repetitive conditions in practice, which hinders practical applications of ILC and motivates researchers to relax or remove the repetitive conditions. For instance, several works extend ILC to systems with iteration-varying tracking references (Boeren, Bareja, Kok, & Oomen, 2016; Chien, 2008; Jin, 2017; Li, Lv, & Ho, 2016; Saab, Vogt, & Mickle, 1997; Xu & Xu, 2004; Yu & Li, 2017; Zhu & Xu, 2017), most of which come at the cost of degraded tracking accuracy. Moreover, removing the identical initial condition is also a hot research topic in the ILC field, since strictly resetting the initial state at each iteration is a hard task. In the literature, Li and Li (2014); Li, Chow, Ho, and Zhang (2009); Sun and Wang (2002, 2003); Xu and Yan (2005) are five representative works on ILC without the identical initialization condition (i.i.c.). Additionally, a very recent work addresses ILC for iteration-varying systems (Altin, Willems, Oomen, & Barton, 2017). In short, many efforts have been devoted to relaxing the strict limitations of traditional ILC, which may help widen its application scope in practice.

Different from the above-mentioned works, this paper focuses on ILC with iteration-varying trial lengths. In conventional ILC, it is assumed that the control system repeats on a fixed time interval; namely, the trial length at each iteration is identical. In practical applications, however, the actual trial lengths may vary from iteration to iteration due to safety considerations and/or the operation capability of the controlled system.
Some detailed explanations and illustrative examples will be presented in the next section. Based on these observations, ILC with iteration-varying trial lengths is motivated. In general, a varying trial length implies a certain loss of control information, so this topic constitutes a part of the research on ILC with incomplete information. In the past few years, several works have been published on this topic, such as Guth, Seel, and Raisch (2013); Li, Xu, and Huang (2013, 2015); Li and Xu (2015); Longman and Mombaur (2006); Meng and Zhang (2017); Seel, Schauer, and Raisch (2011, 2017); Seel, Werner, Raisch, and Schauer (2016a); Seel, Werner, and Schauer (2016b); Shen, Zhang, Wang, and Chien (2016a); Shen, Zhang, and Xu (2016b). The objective of this paper is to provide an overview of ILC with varying trial lengths and the recent progress in this area. Our insights on this problem will also be presented and discussed from a practical application point of view.

We should point out that the newly introduced issue of varying trial lengths is different from another learning scheme with varying time-scales (Cheah & Xu, 2000; Kawamura, 1987; Kawamura & Fukao, 1995; Xu, 1998; Xu & Feng, 1998; Xu & Song, 1998, 2000; Xu, Xu, & Viswanathan, 2002; Xu & Zhu, 1999). In particular, the difference lies in the formulation of the desired tracking trajectories in different iterations. In the newly introduced issue, the desired trajectory remains the same for all iterations but the trial lengths vary, whereas for the learning scheme with varying time-scales, the desired tracking trajectory retains its integrity by expanding and contracting the time scale. In Kawamura (1987); Kawamura and Fukao (1995), an inverse dynamics-based approach was proposed to solve the latter problem. Another approach, called direct learning control, was initiated by Xu in a series of contributions (Cheah & Xu, 2000; Xu, 1998; Xu & Feng, 1998; Xu & Song, 1998, 2000; Xu et al., 2002; Xu & Zhu, 1999).

In this survey, we first elaborate the issue of randomly varying trial lengths, where three practical examples are discussed in turn. These examples show that the varying trial length problem is unavoidable in real applications, so conventional ILC, which requires an exactly identical trial length, is no longer applicable. To deal with control systems with iteration-varying trial lengths, three key points need to be carefully considered: the model of nonuniform trial lengths, the control synthesis, and the convergence analysis. Regarding the model of varying trial lengths, we review both the random and the deterministic cases, of which the former has attracted much attention from the community. Furthermore, to fully understand the problem of ILC with nonuniform trial lengths, comparisons between the different trial length models are presented. For the control synthesis with varying trial lengths, we focus on the design of mechanisms to compensate for the lost control information and on the associated controller design. The general influence of varying trial lengths on control systems is also discussed. Moreover, various convergence analysis techniques for ILC with randomly varying trial lengths are reviewed. Specifically, the deterministic analysis approach, which neglects the randomness of trial lengths by imposing additional requirements, is briefed first, followed by the contraction mapping approach. Note that, as in conventional ILC, the contraction mapping method is only applicable to linear systems and to nonlinear systems satisfying a global Lipschitz condition. To achieve a stronger convergence result, a novel switching system approach is explained for linear systems. Lastly, a Lyapunov-like function approach is introduced to address nonlinear systems without the global Lipschitz condition. Further discussions and promising future directions are presented after the survey of recent contributions. Through this survey, an in-depth understanding of the principle, description, influence, and treatment of varying trial lengths can be expected.

In fact, the varying trial length problem can be regarded as a special case of the ILC problem with incomplete information (Shen, 2018b). That is, the whole operating interval is separated into two sections: the completed section and the untrodden section. The latter provides no tracking information and thus degrades the learning performance, for example by slowing down the convergence speed. Therefore, dealing with ILC under iteration-varying trial lengths has some similarities with ILC under random data dropouts, and some techniques for handling ILC with data dropouts can indeed be applied to the varying trial length problem, as illustrated in the subsequent sections. However, it is worth pointing out that the general methods for handling data dropouts and other incomplete information are not always suitable for the varying trial length problem, because the time-domain-wise independence property is no longer valid.


It should be pointed out that application studies of the varying trial length problem lag far behind the theoretical analysis. More effort is expected in this direction to establish a solid research system.

The rest of the paper is organized as follows. Section 2 provides the background and formulation of the varying trial length problem. Section 3 presents the random and deterministic models of varying trial lengths as well as their essential influence on the learning performance. Section 4 summarizes the control synthesis, including information compensation mechanisms, controller design, and the impact of nonuniform trial lengths on convergence. Section 5 addresses the convergence analysis in consideration of varying trial lengths. Further discussions and suggestions are given in Section 6. Section 7 concludes the paper.

2. Varying trial length problem

In traditional ILC, the operation time interval is restricted to a fixed value so that a perfect learning process can be conducted along the iteration axis (Bristow et al., 2006). However, this condition is often violated in real applications due to unknown uncertainties and unpredictable factors. In other words, it is difficult in practice to repeat a control system on a fixed time interval. This observation motivates the research on the varying trial length problem. In particular, it refers to the case where the trial length is not identical for all iterations but varies within a prespecified interval (Li et al., 2013). To clearly understand the problems with varying trial lengths, three practical examples are given in this section, followed by some discussion of the difficulties in dealing with this kind of problem.

Example 1. ILC was applied to functional electrical stimulation (FES) for upper limb movement and gait assistance in Seel et al. (2011). In these applications, due to the safety requirement that the system output cannot deviate too much from the desired trajectory, the actual operation may be terminated early in the first few trials. That is, the operation length may be shorter than the desired length. Recently, the iteration-varying trial length problem has also been observed in FES-induced foot motion (Seel et al., 2016a; Seel et al., 2016b). According to these observations, it is clear that the varying trial length problem is quite common in rehabilitation processes.

Example 2. Longman and Mombaur (2006) applied ILC to solve the output deviation problem of humanoid and biped walking robots, because these robots feature periodic or quasi-periodic gaits. The learning trials are divided into phases by the times at which the foot strikes the ground. With this setting, the learning control concept can be employed to improve the tracking performance. However, the duration of the resulting phases can vary from trial to trial when the robots cannot complete the entire trial due to conditions such as safety and constraints.

Example 3. The work Guth et al. (2013) provides another varying trial length example in a trajectory-tracking problem for a lab-scale gantry crane. As the load is only allowed to move in a specified neighborhood of the desired reference, the trial is disrupted when the output drifts out of the constrained area. Consequently, the trial lengths are not identical for all trials. Even for an incomplete trial, the available tracking section can still contribute to improving the learning performance.
Based on the above three examples and others in the literature, it is important to conduct an in-depth investigation of ILC with varying trial lengths. To demonstrate the essential influence, an illustration of the varying trial length setting is presented in Fig. 1, where Fig. 1(a) displays the complete trial length with Td being the desired iteration length, while Figs. 1(b)-1(d) present possible incomplete trial lengths. In other words, the varying trial length problem here indicates that an iteration may end before its desired time length while the tracking objective remains the same for all iterations. Therefore, the major influence of this setting is that the latter part of the tracking information is missing if the iteration ends early.

Fig. 1. Illustration of varying trial lengths: the horizontal axis is time with Td being the desired trial length, and the vertical axis represents the system output.

To solve the varying trial lengths problem, there are three critical issues to be addressed:
• How to formulate the varying trial lengths problem in the framework of ILC. That is, a mathematical description of the incomplete trial lengths needs to be provided.
• How to design suitable learning algorithms to ensure the learning performance when the trial lengths vary from iteration to iteration. In this step, it is necessary to compensate for the missing information caused by the varying trial lengths and to evaluate its influence on the learning ability.
• How to analyze the convergence of the proposed learning algorithms, because the new problem formulation is no longer identical to traditional ILC due to the introduction of randomness.

These issues will be detailed separately in Sections 3–5. Here we would like to mention that the varying trial length problem can be regarded as a special case of the data dropout problem. That is, the essence of the former is the information loss of the untrodden section, which behaves like successive data dropouts in the time domain. From this point of view, the investigation of the former problem offers a clear understanding of the latter, especially for the time-dependent data dropout problem. Moreover, the study of the varying trial length problem can also help us fully understand the inherent mechanism of learning control. That is, the information utilization can be elaborated according to various control environments.

Before ending this section, we emphasize that the application study of the varying trial length problem is equally important to the above three critical issues. However, current progress in this direction is rather limited. Besides motivating the varying trial length issue, Examples 1–3 are also typical application scenarios for the theoretical studies (Guth et al., 2013; Longman & Mombaur, 2006; Seel et al., 2011). Numerical simulations have been conducted for robotic fish (Li, Xu, & Huang, 2015), high-speed trains (Yu, Bu, Chi, & Hou, 2018), and manipulators (Zeng, Shen, & Wang, 2019), to name a few. Experimental validations are desirable in the future.

3. Models of the varying trial length

In this section, the formulation of the varying trial lengths problem is discussed, that is, the mathematical modeling of the varying trial length. We review both random and deterministic descriptions of the trial length variable. We first address the random model, where the necessity, the models, and related remarks are given in turn. Then, we discuss the deterministic model as a counterpart for completeness. We present the random model first because it is more common than the deterministic model. The details are presented in the following subsections.

3.1. Random models

First of all, a varying trial length problem is illustrated in Fig. 2, where discrete time instants are considered. Assume that the desired trial length Nd is equal to 8, the minimal trial length Nmin is equal to 5, and the maximal trial length Nmax is equal to 10. Thus, the actual trial length Nk for the kth iteration can vary between 5 and 10, as shown by the dashed horizontal line.

Fig. 2. Illustration of varying trial lengths.

Four possible trials are shown in the figure: N1 = 5, N2 = 10, N3 = 8, and N4 = 7. For k = 1, 4, we have Nk < Nd. In this case, the trial is not completed and the untrodden section provides no information. As a consequence, only the available tracking information can be employed to update the algorithm.

For k = 2, we have Nk > Nd. In this case, the trial is completed with redundant information; that is, the tracking information at the time instants t = 9, 10 is not necessary for updating the algorithm. Therefore, when the actual trial length is larger than the desired one, the case can be treated the same as the desired-length case because the redundant information is discarded directly. In other words, if Nmax > Nd, we can combine all the cases of the trial length being Nd, Nd + 1, ..., Nmax into a single case of the trial length being Nd.

It is seen from Fig. 2 that the trial length may vary within a certain interval. Thus, it is reasonable to express the length by a variable, denoted by Nk as above. Then, modeling the varying trial length amounts to establishing a model for the variable Nk. On the one hand, noting that the trial length depends on many factors in applications, such as the subject's ability, safety, and environment constraints, it is necessary to make a randomness assumption on the trial length. On the other hand, random modeling of the length provides a basic understanding of the problem and facilitates the associated algorithm design and analysis.

Evidently, we can establish a direct model for the trial length Nk. In particular, the actual trial length varies within the set {Nmin, ..., Nmax}, with h ≜ Nmax − Nmin + 1 denoting the number of possible cases. Let the probabilities of the trial length being Nmin, ..., Nmax be p1, ..., ph. That is, P(A_{Nmin}) = p1, P(A_{Nmin+1}) = p2, ..., P(A_{Nmax}) = ph, where Am denotes the event that the trial length is m, Nmin ≤ m ≤ Nmax. Clearly, we have pi > 0, 1 ≤ i ≤ h, and

p1 + p2 + · · · + ph = 1.   (1)

In other words, the events {A_{Nmin}, ..., A_{Nmax}} are mutually exclusive. Moreover, it should be pointed out that no specific probability distribution is imposed on the pi; thus, the above model of the random trial length is general. In addition, as discussed above, if we regard all the cases with the trial length equal to or longer than the desired one Nd as the single case with the trial length equal to Nd, denoted by Ã_{Nd} (i.e., Ã_{Nd} = ∪_{i=Nd}^{Nmax} A_i), then the related probability of Ã_{Nd} is P(Ã_{Nd}) = P(A_{Nd}) + · · · + P(A_{Nmax}). In the rest of the paper, we assume Nd = Nmax.

Noting that the trial length is a combination of successive time instants, we can also formulate the randomness with respect to the time instants. In particular, we denote the probability of the occurrence of the output at time instant t by p(t). Clearly, p(t) = 1 for 0 ≤ t ≤ Nmin and 0 < p(t) < 1 for Nmin + 1 ≤ t ≤ Nd. Moreover, note that the output at any time instant t1 with t1 < t0 is definitely available as long as the output at time instant t0 appears. In other words, the event B_{t0} always implies the event B_{t1} for t1 < t0, where Bt denotes the event that the output at time instant t is available. Therefore, p(t1) = P(B_{t1}) > P(B_{t0}) = p(t0) for t1 < t0. In short, p(Nmin) > p(Nmin + 1) > · · · > p(Nd) > 0. It is worth pointing out that p(Nd) > 0 implies that the trial length takes the full length with a positive probability, which is natural in practical applications and ensures asymptotic convergence at all required time instants.

It is worthwhile to mention that the above two models of the random trial length are essentially equivalent. In particular, the fact that the trial length is N◦ implies that the outputs at time instants 0 ≤ t ≤ N◦ are available while those at time instants N◦ + 1 ≤ t ≤ Nd are missing. In other words,

P(A_{N◦}) = P(B_{N◦}) − P(B_{N◦+1}).   (2)

From this relationship, we can conclude that P(A_{Nd}) = P(B_{Nd}) and Σ_{t=Nmin}^{Nd} P(A_t) = Σ_{t=Nmin}^{Nd} [P(B_t) − P(B_{t+1})] = P(B_{Nmin}) = 1. The latter coincides with (1). Both models have been applied in the literature. For instance, Li et al. (2013) introduced the first model, i.e., the trial-length-based model, and then computed the probability of the output occurrence at each time instant, while Shen et al. (2016b) applied the second model; namely, the authors defined the time-instant-based model first and then calculated the probabilities of the possible trial lengths.
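To make the equivalence concrete, the following minimal Python sketch builds the time-instant-based probabilities p(t) from the trial-length-based model and checks relationship (2) numerically; the probability vector p_len is a hypothetical choice, with Nmin = 5 and Nd = Nmax = 8 as in the example above.

```python
import numpy as np

# Two equivalent random models of the trial length (Section 3.1).
# Values are illustrative: Nmin = 5, Nd = Nmax = 8; p_len is hypothetical.
N_min, N_d = 5, 8
lengths = np.arange(N_min, N_d + 1)        # possible trial lengths 5..8
p_len = np.array([0.2, 0.3, 0.1, 0.4])     # P(A_m) for m = 5..8; sums to 1

# Time-instant-based model: p(t) = P(output at time t is available)
# = P(trial length >= t); note p(t) = 1 for t <= Nmin.
p_t = {t: p_len[lengths >= t].sum() for t in range(N_d + 1)}

# Relationship (2): P(A_m) = P(B_m) - P(B_{m+1}), with P(B_{Nd+1}) = 0.
for m, p_m in zip(lengths, p_len):
    assert abs(p_m - (p_t[m] - p_t.get(m + 1, 0.0))) < 1e-12

# Draw trial lengths for 10 iterations (i.i.d. along the iteration axis).
rng = np.random.default_rng(0)
N_k = rng.choice(lengths, size=10, p=p_len)
```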

Furthermore, the above two models mainly describe the randomness within one iteration, whereas the randomness in the iteration domain is not specified. Generally, we consider the random trial lengths to be independent along the iteration axis. This is a standard and simple assumption, but it is effective in the convergence analysis because the independence allows us to separate the variables of successive iterations. Indeed, one can proceed to investigate the iteration-dependent case, in which the current trial may affect the possible trial length of the next iteration. For example, a Markov chain model of the trial length variable can be employed to express the random iteration-dependence of trial lengths. This issue has not been well explored.

We should point out that the random models provide a certain statistical characterization of the involved variables. On the one hand, this information can be utilized in the control design and analysis, such as in the convergence conditions of Li et al. (2013); meanwhile, removing the statistical information is also valuable, as it may widen the application scope. On the other hand, taking into account the randomness of the trial length, convergence should be established in a stochastic sense, such as convergence in expectation, mean-square convergence, and almost-sure convergence (Shen et al., 2016a). To this end, stochastic analysis techniques are of great help.

Before ending this subsection, we present the counterpart for continuous-time systems. In this case, the operation interval is a section of the real axis, i.e., [0, Td], rather than a set of discrete integers, where Td denotes the desired trial length. The actual trial length for the continuous-time case is denoted by Tk, varying in the interval [Tmin, Tmax]. Clearly, Tmax ≥ Td, and Tk can be larger than Td as long as Tmax > Td.


Similar to the discrete-time case, if Tk < Td, then the redundant tracking information is discarded directly, and thus we regard this case as Tk = Td without loss of generality; that is, we assume Tmax = Td in the rest of this subsection. Moreover, Tk takes values in a continuum, so we cannot assign a probability to each individual trial length as in the discrete-time case above. Instead, we employ the following model: Tk is a random variable with probability distribution function

FTk(t) ≜ P(Tk < t) = 0 for t ∈ [0, Tmin]; p(t) for t ∈ (Tmin, Tmax]; 1 for t > Tmax,   (3)

where 0 ≤ p(t) ≤ 1 is a continuous function. From this model, several features can be observed. First of all, FTk(Tmin) = 0 indicates that the trial length cannot be shorter than the minimum length Tmin. Moreover, the distribution function p(t) need not approach its boundary values. In particular, as t approaches Tmin from the right, p(t) need not approach 0; mathematically, p(Tmin+) ≜ lim_{t↓Tmin} p(t) can be a positive constant, which indicates that the trial length can take the minimum length Tmin with a positive probability, i.e., P(Tk = Tmin) > 0. Meanwhile, as t approaches Tmax from the left, p(t) need not approach 1; that is, p(Tmax) can be a constant less than 1, which indicates that the trial length can take the maximum length Tmax with a positive probability, i.e., P(Tk = Tmax) > 0. In fact, P(Tk = Tmax) = 1 − p(Tmax) according to the above model. Furthermore, we note that the probability distribution function defined above is left-continuous. In short, this is a general formulation of the random trial length variable.

3.2. Deterministic models

While the random model of the trial length provides a clear description of the length uncertainty, several papers consider a deterministic depiction, in which the randomness is removed or generalized. As a matter of fact, the deterministic depiction of the trial length mainly serves to remove the prior statistical assumptions of the random model; such assumptions are, however, not vital to the design and analysis of learning algorithms. In other words, we regard the deterministic model as a model with no specific statistics. In Meng and Zhang (2017), a finite successive iteration assumption on the trial length was presented. In particular, the trial length can vary in a completely unknown manner, but there must exist an (unknown) positive integer m such that the trial operates with the desired length at least once during any m successive iterations. This was called a persistent full-learning property in Meng and Zhang (2017). A visual illustration of this assumption is given in Fig. 3, where the finite number is set to m = 4. In the figure, the horizontal lines denote different trial lengths and the dashed boxes denote collections of possible successive trials. From this example, one can notice that among any 4 successive iterations, there is at least one iteration achieving the desired length.

Fig. 3. Illustration of the persistent full-learning property.

Clearly, the deterministic model removes the random hypothesis by introducing a bounded learning iteration-period. In other words, it employs a deterministic factor to ensure a continuous learning process. It is worth noting that the random model and the deterministic model cannot cover each other. On the one hand, the deterministic model does not require the probability distribution or statistical properties of the trial length, but pays the price that the number of successive incomplete trials must be bounded. On the other hand, the random model may require probability hypotheses; however, arbitrarily long runs of successive incomplete trials are allowed to occur according to the probability distribution, and stochastic analysis techniques can then be borrowed for the convergence analysis. Moreover, as claimed before, the random trial length problem can be regarded as a special case of the incomplete information problem. Accordingly, both random and deterministic models of incomplete information have been presented in the literature (cf. Shen (2018b), Chapter 1). For example, the finite successive iteration assumption was also adopted in Shen and Chen (2012); Shen and Wang (2015a,b) to model random data dropouts and communication delays.

Furthermore, we note that Seel, Schauer, and Raisch (2017) presented a direct analysis of two adjacent iterations to derive a monotonic convergence property; as a result, no specific condition is imposed on the trial length. In particular, the tracking error for each iteration is denoted by ēk = [ek^T, ẽk^T]^T, where ek is the measured error and ẽk is the hypothetical error. The dimension of ẽk can vary from iteration to iteration. Compared with the random models, the deterministic models allow more flexibility in the iteration-domain dependence. In other words, iteration-dependent trial lengths are admitted in the above models. We should emphasize that the essential principle guaranteeing a sound learning process under iteration-varying trial lengths is that the full trial length occurs infinitely many times along the iteration axis. This is the most general condition, which reveals that more effort can be devoted to developing novel design and analysis techniques admitting the most general case. Note that the differences between the random and deterministic models lie in two aspects: the specific description of the trial length uncertainty, and the analysis approach for establishing the convergence results.


4. Control synthesis

In this section, we consider the associated control synthesis issues. First, an incomplete trial length implies partially missing information in the learning process; it is therefore of interest to consider the compensation of the unavailable tracking section. Second, given a suitable compensation mechanism, guidelines for the controller design are important for practical applications. Moreover, the inherent influence of the varying trial lengths on the tracking performance and convergence speed deserves an in-depth study. Throughout this section, the vectors of input, output, and tracking error are denoted by uk(t) ∈ R^p, yk(t) ∈ R^q, and ek(t) ∈ R^q, where p and q are the corresponding dimensions.

4.1. Compensation of the missing information

In this subsection, several compensation mechanisms for the missing information are summarized and compared. If a trial is not completed, then only a part of the tracking information is available for updating the control signal. In detail, suppose the desired length is Nd but the actual trial length is Nk, with Nk < Nd. Then, the outputs at the first Nk time instants are available, while those at the remaining Nd − Nk time instants are unavailable since the trial has already terminated at time instant Nk. Consequently, we have tracking errors for the first Nk time instants only. On the other hand, we have no prior knowledge of the next iteration, which requires us to update the control signal for the whole trial. To this end, we need to compensate the missing information at the remaining Nd − Nk time instants with suitable data. In short, the main difficulty of the varying trial length problem lies in the fact that the latter part of the tracking information is missing. To overcome this difficulty, various compensation mechanisms have been proposed in the literature; they are summarized below.

Zero-Compensation Mechanism. When the trial terminates early, the untrodden section provides no information for learning. In this case, a direct and simple compensation is to replace the absent system outputs by the desired reference values, i.e., to let yk(t) = yd(t) for Nk + 1 ≤ t ≤ Nd. Then, the tracking errors ek(t) ≜ yd(t) − yk(t) at the untrodden time instants become zero: ek(t) = 0, Nk + 1 ≤ t ≤ Nd. This simple compensation is called the zero-compensation mechanism, which results in the following extended tracking error:

e∗k(t) = ek(t) for 0 ≤ t ≤ Nk, and e∗k(t) = 0 for Nk + 1 ≤ t ≤ Nd.   (4)
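In implementation terms, the extended error (4) is just the measured error zero-padded beyond the realized trial length. A minimal sketch, assuming the signals are stored as arrays over t = 0, ..., Nd:

```python
import numpy as np

def extended_error(y_d, y_k, N_k):
    """Extended tracking error (4): measured error up to time N_k,
    zero on the untrodden section N_k+1, ..., N_d."""
    e_star = np.zeros_like(y_d)
    e_star[:N_k + 1] = y_d[:N_k + 1] - y_k[:N_k + 1]
    return e_star
```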

By setting the tracking error to zero, the input signal of the traditional P-type learning algorithm is not updated. In particular, the traditional P-type learning algorithm is formulated as uk+1(t) = uk(t) + L ek(t+1), assuming without loss of generality that the relative degree is one, where L ∈ R^{p×q} is the learning gain matrix. Obviously, we have uk+1(t) = uk(t) for Nk ≤ t ≤ Nd − 1 when the zero-compensation mechanism is applied. In other words, the zero-compensation mechanism imposes a basic updating principle for the input signal: update the control input iteratively when the tracking error is available, and retain the input signal when the tracking information is unavailable. Clearly, with the zero-compensation mechanism, the ILC updating law is

uk+1(t) = uk(t) + L ek(t+1) for 0 ≤ t ≤ Nk − 1, and uk+1(t) = uk(t) for Nk ≤ t ≤ Nd.   (5)
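The following sketch simulates the update law (5) on a first-order discrete-time plant. The plant parameters, learning gain, and the uniform trial-length distribution are illustrative assumptions, not values from any cited work.

```python
import numpy as np

# Zero-compensation P-type ILC (5) on x(t+1) = a x(t) + b u(t), y(t) = c x(t).
a, b, c, L_gain = 0.5, 1.0, 1.0, 0.8
N_min, N_d = 5, 8
y_d = np.sin(0.5 * np.arange(N_d + 1))      # desired reference on 0..N_d

rng = np.random.default_rng(1)
u = np.zeros(N_d)                           # input on 0..N_d-1
for k in range(50):
    N_k = int(rng.integers(N_min, N_d + 1)) # random trial length of iteration k
    x, y = 0.0, np.zeros(N_d + 1)
    for t in range(N_k):                    # the trial stops early at N_k
        y[t] = c * x
        x = a * x + b * u[t]
    y[N_k] = c * x
    e = y_d - y                             # only entries 0..N_k are meaningful
    for t in range(N_d):
        if t + 1 <= N_k:                    # e(t+1) was measured: learn
            u[t] += L_gain * e[t + 1]
        # else: hold u[t] (zero compensation)

# Evaluate the learned input over one full-length trial.
x, y = 0.0, np.zeros(N_d + 1)
for t in range(N_d):
    y[t] = c * x
    x = a * x + b * u[t]
y[N_d] = c * x
print(np.abs(y_d - y).max())                # near zero after enough iterations
```

Since |1 − L c b| < 1 here, the time instants in [0, Nmin] are corrected in every iteration, while later instants are corrected only when the random length reaches them, which illustrates the slower convergence at the tail of the interval discussed in Section 4.3.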

Noting that the variable Nk is generally unknown and varies randomly in the iteration domain, to facilitate the convergence analysis, an indicator function 1{t ≤ Nk} is usually introduced, by which (4) can be rewritten as

e∗k(t) = 1{t ≤ Nk} ek(t), 0 ≤ t ≤ Nd,   (6)

where the indicator function 1{A} is equal to 1 if the indicated event A holds and 0 otherwise (Shen et al., 2016b). For a similar purpose, some works, such as Li et al. (2015), introduce a random variable satisfying a Bernoulli distribution. With this indicator function, we can regard the tracking errors for the untrodden section Nk + 1 ≤ t ≤ Nd as virtually generated, and all the analysis can be conducted similarly to traditional ILC. Note that the indicator function 1{t ≤ Nk} is not independent along the time axis within an iteration, similar to the dependence of Bt in Section 3.1. In particular, if t1 < t0, then 1{t0 ≤ Nk} = 1 implies 1{t1 ≤ Nk} = 1. Consequently, the convergence analysis must account for this time dependence.

All-Historical-Data-Compensation Mechanism. Clearly, the zero-compensation mechanism is a lazy strategy in which the input signal is not updated whenever the corresponding information is lost; such a mechanism may slow down the learning speed. To address this problem, Li et al. (2013); Li and Xu (2015) provided an estimation of the missing tracking error using the historical data, and an all-historical-data-compensation mechanism was proposed based on the average operator given in Park (2005),

which is defined as follows:

A{e∗k(t)} = (1/(k+1)) Σ_{j=0}^{k} e∗j(t),   (7)

with e∗k(t) given in (4) and (6). The error actually employed in the learning algorithm is A{e∗k(t)}. As a consequence, all the historical tracking information is accumulated to compensate for the missing data. To reduce the memory burden, the iteration-average operator can be calculated recursively:

A{e∗k(t)} = (1/(k+1)) [e∗k(t) + k A{e∗k−1(t)}].
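In code, this recursion keeps only the running average in memory; the sketch below uses a random stand-in for the extended error.

```python
import numpy as np

# Recursive iteration-average (7): A_k = (e_k* + k * A_{k-1}) / (k + 1).
N_d = 8
A = np.zeros(N_d + 1)                       # running average of e_j*(t)
rng = np.random.default_rng(2)
for k in range(100):
    e_star = rng.normal(size=N_d + 1)       # stand-in for e_k*(t)
    A = (e_star + k * A) / (k + 1)          # no past errors need be stored
```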

The main drawback of the compensation mechanism (7) is that the old tracking information (i.e., the trials at the very beginning) keeps affecting the learning process as the iteration number increases. Obviously, the latest trials provide more accurate control information than the "older" trials, which motivates the following compensation mechanisms.

Iteration-Moving-Average-Compensation Mechanism. Let an integer m denote the size of a moving window along the iteration axis. Then, an iteration-moving-average-compensation mechanism is given by the following operator

MA{e∗k(t)} = (1/m) Σ_{j=0}^{m−1} e∗k−j(t)   (8)

for a sequence e∗k−m+1(t), e∗k−m+2(t), ..., e∗k(t) given in (4) and (6). In this operator, only the tracking information of the latest m trials is used for learning, which may provide more accurate information and can thus be expected to expedite the convergence. In contrast to the all-historical-data-compensation mechanism, this mechanism reduces the effect of redundant tracking information in the design of ILC algorithms. However, note that e∗k(t) is employed in the algorithm, which can be 0 for incomplete trials. In this case, the division by m may greatly suppress the correction effect of the very limited tracking information. For example, if only one iteration generates a tracking error during some m successive iterations, this error is downscaled by a factor of m when applied to correct the input signal, so the correction performance is evidently weakened. Generally, for any time instant t, since the occurrence of its tracking error is modeled by p(t) (see Section 3.1), the actual correction effect is reduced by about 1 − p(t) through averaging. To overcome this disadvantage, two searching mechanism-based compensation methods were presented in Li and Shen (2017), as follows.

Random-Searching-Compensation Mechanism. This mechanism is a modified version of (8). In particular, select an integer m > 1 as the window size. Denote

S^m_{t,k} ≜ {Nk−j | t ≤ Nk−j, j = 0, 1, ..., m − 1}   (9)

as the set of iterations, counting from the kth back to the (k + 1 − m)th, whose trial lengths reach the time instant t. Let n^t_k = |S^m_{t,k}| be the number of elements in S^m_{t,k}. That is, among the past m iterations ending at the kth iteration, there are n^t_k iterations with available tracking information at time instant t, while the other iterations cannot provide useful tracking information. Clearly, n^t_k is a random variable with 0 ≤ n^t_k ≤ m. To clearly understand this mechanism, an example is illustrated in Fig. 4, where the searching window size is m = 8 and the current iteration number is k. For time instant t1, it can be seen that S^m_{t1,k} = {Nk, Nk−1, Nk−2, Nk−3, Nk−5, Nk−6} and n^{t1}_k = 6. For time instant t2, we have S^m_{t2,k} = {Nk, Nk−3, Nk−5} and n^{t2}_k = 3. It is evident that n^t_k is random due to the randomness of the trial lengths.


Fig. 4. Illustration of Random-Searching-Compensation Mechanism.

Fig. 5. Illustration of Fixed-Searching-Compensation Mechanism.

The random-searching-compensation mechanism replaces the untrodden tracking error with the moving-averaged signal

MAR{e∗k(t)} = (1/n^t_k) Σ_{j=0}^{m−1} e∗k−j(t).   (10)

It is easy to see that the difference between (10) and (8) lies in the denominators: the denominator in (8) is deterministic, while that in (10) is random. To avoid singularity in the compensation mechanism (10), we usually assume n^t_k ≥ 1; that is, among any m successive iterations, there exists at least one iteration whose trial length is larger than t. Clearly, this condition is guaranteed when the deterministic model of the trial length is applied. Moreover, the condition is not very strict for practical applications, for two reasons. On the one hand, according to the probability of the output occurrence at time instant t, the mathematical expectation of n^t_k is E[n^t_k] = p(t)m; therefore, n^t_k increases to infinity in expectation as the window size m goes to infinity. On the other hand, if n^t_k = 0, then no tracking information can be found in the latest m iterations for the given time instant t. In other words, all of the latest m trials terminated before time instant t, and thus nothing can be learned. For this special case, we may simply set MAR{e∗k(t)} = 0, similar to the common zero-compensation mechanism. Notice that the actual amount of data n^t_k used for the compensation is random, depending on the trial length randomness; thus, learning algorithms with this mechanism may fluctuate along the iteration axis. In other words, the transient performance of the iterative learning process may not be steady. To overcome this issue, an alternative searching mechanism-based compensation is proposed.

Fixed-Searching-Compensation Mechanism. For any given iteration number k ≥ m and time instant t, we can find m past iterations whose trial lengths are larger than t. Denote the corresponding iteration numbers by k − rk,j, j = 1, ..., m, where 0 ≤ rk,j ≤ k are integers such that rk,j, j = 1, ..., m, is an increasing sequence. Compared with the previous random-searching mechanism, the major difference is that this searching mechanism always finds m iterations for updating the input signal, which is why we call it the fixed-searching mechanism. An illustrative example of this mechanism is shown in Fig. 5, where the searching size is m = 3 (i.e., we always find 3 available trials). For time instant t1, the available iteration index set is {k, k − 1, k − 2}, corresponding to rk,1 = 0, rk,2 = 1, and rk,3 = 2. For time instant t2, the available iteration index set is {k, k − 3, k − 5}, corresponding to rk,1 = 0, rk,2 = 3, and rk,3 = 5. Clearly, the variable rk,j is random due to the trial length randomness. The fixed-searching-compensation mechanism replaces the untrodden tracking error with the following averaged signal

MAF{ek(t)} = (1/m) Σ_{j=1}^{m} e_{k−rk,j}(t).   (11)
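Both searching mechanisms reduce to short routines. In the sketch below the containers are hypothetical, with e_hist[j] holding the extended error of iteration j and len_hist[j] its trial length; at least m stored iterations are assumed.

```python
def ma_random(e_hist, len_hist, t, m):
    """Random-searching compensation (10): sum the last m extended
    errors at time t (zeros for trials ending before t) and divide by
    the number n_k^t of trials that reached t; fall back to zero
    compensation if none did."""
    window = range(len(e_hist) - m, len(e_hist))
    n_t = sum(1 for j in window if len_hist[j] >= t)
    if n_t == 0:
        return 0.0
    return sum(e_hist[j][t] for j in window) / n_t

def ma_fixed(e_hist, len_hist, t, m):
    """Fixed-searching compensation (11): average the m most recent
    measured errors whose trials reached time t."""
    idx = [j for j in reversed(range(len(e_hist))) if len_hist[j] >= t][:m]
    return sum(e_hist[j][t] for j in idx) / m
```

In ma_random, the zero entries of the extended errors contribute nothing to the sum, so dividing by n_t rather than m reproduces (10) exactly.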

Three differences can be observed in contrast to (10). First, the denominator is fixed to m in (11), while it is the random number n^t_k in (10). Second, the randomness has moved to the subscript rk,j in (11), while it resides in the denominator n^t_k of (10). Third, the averaged signal is the original tracking error ek(t) rather than the extended signal e∗k(t), because all the selected errors are available in this case. A final remark on this mechanism: to ensure the effectiveness of the proposed scheme, we must find enough past iterations (i.e., the prescribed number m) for any iteration k. This requirement is not met in the first few iterations, for which we may employ another compensation mechanism such as the simple zero-compensation mechanism. After a sufficiently large number of iterations, the intended condition is always satisfied.

For comparison, all the above compensation mechanisms are summarized in Table 1. The zero-compensation mechanism is the most common candidate due to its simplicity; its merit is that only the latest information is taken into account, so it is easy to implement and analyze. In contrast, all the other mechanisms additionally introduce information from previous iterations. Involving information from previous iterations has the advantage of providing a sufficient compensation, but it may meanwhile slow down the convergence speed. Among these historical information-based compensation mechanisms, the random/fixed-searching-compensation mechanisms provide a good trade-off between compensation quality and convergence speed.

At the end of this subsection, we point out that all the above-mentioned mechanisms are based on previous tracking information and can thus be classified as iteration-domain-based compensation schemes, whereas how to devise a time-domain-based compensation scheme is still open. That is, it is yet unclear how to compensate for the untrodden tracking information based on the system dynamics within an iteration. This is a promising direction for further research.

4.2. Controller design

The learning controller design for the varying trial length problem is similar to the conventional scheme, except that the compensation information is employed whenever the corresponding output is not available. Therefore, the controller can be designed following two categories: the direct scheme and the indirect scheme. In the direct scheme, the input for the current iteration is generated by a direct combination of the inputs and tracking information from previous iterations. In the indirect scheme, the system structure, such as unknown parameters, is iteratively learned using the previous tracking information, and the input is then generated adaptively. Denote the compensated tracking error by ěk(t), which can be e∗k(t), A{e∗k(t)}, MA{e∗k(t)}, or the others defined in the previous subsection. We first consider the conventional P-type learning algorithm, which is widely applied in the literature.


Table 1
Compensation mechanisms.

Compensation mechanism | Formula | Reference
Zero-compensation mechanism | e∗k(t) = 1{t ≤ Nk} ek(t) | (6)
All-historical-data-compensation mechanism | A{e∗k(t)} = (1/(k+1)) Σ_{j=0}^{k} e∗j(t) | (7)
Iteration-moving-average-compensation mechanism | MA{e∗k(t)} = (1/m) Σ_{j=0}^{m−1} e∗k−j(t) | (8)
Random-searching-compensation mechanism | MAR{e∗k(t)} = (1/n^t_k) Σ_{j=0}^{m−1} e∗k−j(t) | (10)
Fixed-searching-compensation mechanism | MAF{ek(t)} = (1/m) Σ_{j=1}^{m} e_{k−rk,j}(t) | (11)

We call an update law P-type if the tracking error is added to the input signal in a linear form with suitable coefficients. The form of the ILC law should be modified according to the adopted compensation signal. For example, if we employ the simple compensation e∗k(t), the learning algorithm is designed as

uk+1(t) = uk(t) + L e∗k(t + 1).   (12)

If we employ A{e∗k(t)}, the learning algorithm becomes

uk+1(t) = A{uk(t)} + (k + 2) L A{e∗k(t + 1)}.   (13)

If we employ MA{e∗k(t)}, the learning algorithm is given as

uk+1(t) = MA{uk(t)} + L MA{e∗k(t)}.   (14)
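Over the whole time axis, the three laws are one-line array updates. The sketch below assumes the signals are arrays indexed by t (u of length Nd, errors of length Nd + 1) and that the averages A{·} and MA{·} are maintained as in the earlier sketches; it is an illustration, not a prescribed implementation.

```python
def law_12(u_k, e_star, L_gain):
    """(12): u_{k+1}(t) = u_k(t) + L e_k*(t+1)."""
    return u_k + L_gain * e_star[1:]

def law_13(A_u, A_e_star, k, L_gain):
    """(13): u_{k+1}(t) = A{u_k(t)} + (k+2) L A{e_k*(t+1)}."""
    return A_u + (k + 2) * L_gain * A_e_star[1:]

def law_14(MA_u, MA_e_star, L_gain):
    """(14): u_{k+1}(t) = MA{u_k(t)} + L MA{e_k*(t)}."""
    return MA_u + L_gain * MA_e_star[:-1]
```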

Clearly, in the above update algorithms, the input signal to be updated is also modified according to the adopted compensation signal; this facilitates the convergence analysis. In particular, the iterations selected for the input signal and for the tracking error signal are identical on the right-hand side of the algorithms. In this case, the input signal from each selected iteration has a corresponding tracking error signal that makes a correction for generating the subsequent input command.

Another major issue in designing the controller is the selection of a suitable learning gain (matrix). Noting that the varying trial length mainly affects the accessibility of the operation data, it is generally independent of the specific selection of the learning gain (matrix) for ensuring a fundamental convergence property. From this viewpoint, the learning gain (matrix) issue is the same as in conventional ILC problems. However, we also note that the untrodden tracking error is replaced with various compensation information, which may require revisions of the learning gain matrix for certain objectives, such as accelerating the convergence speed and improving the transient performance.

For these direct schemes, the common methods for convergence analysis are the contraction mapping principle and its variants. For example, the contraction mapping principle can ensure a contraction between successive iterations under suitable norms (such as the well-known λ-norm), but the asymptotic convergence must be completed by a more careful analysis of the scenarios of strict contraction and non-expansion of the contraction coefficient. When considering the average-compensation-based schemes, the contraction mapping should be conducted for the corresponding collection of associated iterations.

The indirect scheme of controller design is generally called adaptive ILC, in which the inherent system information, rather than the control signal itself, is updated using the iterative learning idea. In the existing literature, the most common framework is to model the plant in a linearly parameterized form with known nonlinearities and unknown parameters. The control is then realized by combining an iterative parameter estimation law with an associated feedback protocol, where the parameter estimation process provides an asymptotic identification of the system structure and the feedback protocol guarantees the transient stability and precise tracking performance (Xu, 2011). The major difference between adaptive ILC and adaptive control lies in the parameter updating:


for the former, the parameters are updated in the iteration domain, while the latter updates the parameters in the time domain. Some further variants of the above basic scheme can be found in the literature. For example, in Zeng, Shen, and Wang (2018), two combination-type schemes were presented in consideration of partially known structure information. In particular, the first scheme assumed that the controlled plant includes both time-varying and time-invariant parameters, and thus a mixing-type adaptive learning scheme was proposed, consisting of difference learning and differential learning. The second scheme assumed that the separation of the time-varying and time-invariant parts is not evident, and a difference-differential hybrid-type scheme was constructed. For these indirect schemes, the convergence analysis can be conducted with the help of the well-known composite energy function (CEF) method (Xu, 2011), which is a typical Lyapunov-like function method. The CEF contains two parts: a quadratic form of the tracking errors, indicating the tracking performance, and a functional of the estimation errors, indicating the learning performance. The details will be elaborated in Section 5.4.

At the end of this subsection, we point out that the common update laws generally follow the classic P-type structure, as shown by (12)-(14). However, this structure is not a prerequisite for achieving perfect tracking performance; we adopt it mainly because of its simplicity. The application of nonlinear controllers may introduce additional merits in tracking performance, which is still an open problem. Indeed, the application of nonlinear controllers is far from complete even in the conventional ILC field (Ahn, Chen, & Moore, 2007; Bristow et al., 2006; Xu, 2011). The indirect scheme can be regarded as a class of nonlinear controllers owing to its adaptive structure paradigm. However, more effort is expected in this direction, which is one of the open issues.

4.3. Influence of varying trial lengths

When considering the inherent influence of varying trial lengths, the focus should be on the influence of the unavailable tracking information for the untrodden section, since everything else is identical to the conventional ILC problem. This starting point leads us to consider the compensation of the unknown tracking information and the associated controller design based on the compensated signal. Thereafter, an open and interesting issue is the influence on the convergence speed under certain conditions. We note that the convergence speed issue is a classic one in the ILC literature, as it is difficult to present a strict measurement. Early attempts were conducted using the contraction mapping method in the λ-norm sense (Xu & Tan, 2002a, 2002b, 2003), where a Q-factor is defined as the upper limit of ||Δuk+1||λ / ||Δuk||λ, with Δuk denoting the input error. Then, a min-max optimization problem was solved to find a mathematical measurement of the convergence speed. However, it was argued in Schmid (2007) that the derived upper bound is conservative and cannot be employed to measure the convergence speed along the iteration axis.


iteration axis. Thus, generally speaking, the convergence speed is still an open problem in the ILC field. In fact, the convergence speed within an iteration is affected by various factors, and thus it is not straightforward to establish a comprehensive index. Among these factors, the primary one is the inherent evolution dynamics in the time domain. That is, the whole system is dynamic during the operation process, and therefore the tracking performance at the latter stage is affected by the performance at the early stage. In other words, the convergence speed at the latter stage would be slower than that at the early stage, since the latter stage inherits the tracking error of the early stage. Moreover, considering the varying trial length, we can observe that it results in different update frequencies for different time instants of the whole time interval (cf. Fig. 1). In particular, for the time interval [0, N_min], the input is updated at every iteration, and thus the convergence speed is expected to be fast. For the latter stage, the update frequency reduces as the time index increases. Consequently, the convergence speed slows down along the time axis. In other words, the varying trial length has an essential effect on the convergence speed; however, how to model and calculate this speed for a specific formulation of varying trial lengths is not clear at present. More efforts are expected in this direction, which can pave a significant way for practical applications.

In short, the convergence speed issue is an interesting and tough direction for further developments of ILC with varying trial lengths. The primary difficulty is to present a suitable measurement of the convergence speed along the iteration axis.

5. Convergence analysis

In the previous sections, we provided a comprehensive review of models for varying trial lengths and the corresponding control synthesis. In this section, we proceed to the analysis topic to complete the whole framework. To this end, we need to revisit the essential difficulty under the environment of nonuniform trial lengths and then present the convergence analysis accordingly. In the following subsections, four representative convergence analysis approaches are reviewed.

First, as can be seen from Section 3, the specific model of varying trial lengths imposes particular requirements on the analysis tools, such as stochastic analysis and probability theory, since stochastic models are mainly adopted. Moreover, a series of compensation mechanisms was presented in Section 4.1, which introduces more variables into the learning algorithms in contrast to conventional P/PD-type learning algorithms; therefore, the convergence analysis should be changed correspondingly. In addition, because the varying trial length problem can be regarded as a special case of incomplete information, the time-dependence property of the newly introduced random variables should be treated carefully.

5.1. Deterministic analysis approach

The concept of the varying trial length problem was introduced by Seel et al. when applying ILC to an FES-induced foot motion stimulator (Seel et al., 2016a; 2016b) and a lab-scale gantry crane (Guth et al., 2013), in which no theoretical convergence analysis was provided. The first analysis result on ILC with iteration-varying trial lengths was given in Seel et al. (2011), which was then extended to a journal paper (Seel et al., 2017). In fact, these two papers focus on the monotonic convergence property, which is analyzed in a deterministic way. To facilitate the convergence analysis, let us consider a linear discrete-time system with a desired operation length $N_d$:

$$x_k(t+1) = A x_k(t) + B u_k(t), \quad y_k(t) = C x_k(t). \qquad (15)$$

For system (15), we assume that the minimum trial length is $N_{\min}$. That is, the actual trial lengths vary in the set $\{N_{\min}, N_{\min}+1, \ldots, N_d\}$. Denote

$$u_k = [u_k^T(0), u_k^T(1), \ldots, u_k^T(N_d-1)]^T \in \mathbb{R}^{N_d p},$$
$$y_k = [y_k^T(1), y_k^T(2), \ldots, y_k^T(N_d)]^T \in \mathbb{R}^{N_d q},$$
$$e_k = [e_k^T(1), e_k^T(2), \ldots, e_k^T(N_d)]^T \in \mathbb{R}^{N_d q},$$

with p and q being the dimensions of the input and output, respectively. The linear system (15) can be rewritten as follows:

$$y_k = P u_k + v, \qquad (16)$$

where P is a lifted lower-triangular matrix describing the process of an iteration and v collects the contribution of the initial state. However, when the trial length is iteration-varying, the latter part of the tracking information is missing. That is, for some time instants at the latter part of a trial, the output $y_k$ and tracking error $e_k$ may be unavailable. To facilitate the convergence analysis, if the actual trial length at the kth iteration is $N_k$ ($< N_d$), the last $N_d - N_k$ elements of $e_k$ are simply replaced with 0. That is, the actual tracking error is

$$e_k^* = [e_k^T(1), \ldots, e_k^T(N_k), 0_q^T, \ldots, 0_q^T]^T \in \mathbb{R}^{N_d q}, \qquad (17)$$

where $0_m$ denotes an m-dimensional vector with all entries being zero. This operation corresponds to a pre-multiplication by the matrix $M_{N_k} = \mathrm{diag}\{I_{N_k}, O_{N_d - N_k}\} \otimes I_q$, where $I_m$ and $O_m$ denote the identity matrix and the all-zero matrix of dimension $m \times m$, respectively. That is,

$$e_k^* = M_{N_k} e_k. \qquad (18)$$

The conventional P-type law (5) was employed in Seel et al. (2017, 2016b), i.e.,

$$u_{k+1} = u_k + \mathbf{L} e_k^* = u_k + \mathbf{L} M_{N_k} e_k, \qquad (19)$$

where $\mathbf{L} = \mathrm{diag}\{L, \ldots, L\}$. Multiplying both sides of (19) by P from the left and subtracting from the reference $y_d$ (defined similarly to $y_k$) leads to

$$e_{k+1} = e_k - P \mathbf{L} M_{N_k} e_k = (I - P \mathbf{L} M_{N_k})\, e_k, \qquad (20)$$

provided that the initial state is precisely reset to the one corresponding to the desired reference (i.e., $x_k(0) = x_d(0)$). We assume this resetting condition throughout this section. The monotonic convergence was derived in various p-norm senses in Seel et al. (2017, 2016b), for example, p = 1, p = 2, and p = ∞. Here, we take the 1-norm as an example. Generally, to ensure monotonic convergence along the iteration axis, a simple condition is that

$$\|I - P \mathbf{L} M_{N_k}\|_1 \le 1, \quad \forall N_k \in \{N_{\min}, \ldots, N_d\}. \qquad (21)$$

However, this condition is not easy to check in practice because the calculation burden may be fairly large when high sampling rates arise (i.e., $N_d - N_{\min}$ is quite large). By carefully exploiting the structure of the associated matrix, a simpler condition was presented in Seel et al. (2017, 2016b) as follows:

$$\|I - P \mathbf{L}\|_1 \le 1. \qquad (22)$$

The controller design with a Q-filter was also considered in Seel et al. (2017, 2016b), where general controller design guidelines were provided. In short, Seel et al. (2017, 2016b) provide a comprehensive understanding of the tracking error relationship between successive iterations, which reveals the inherent mechanism ensuring monotonic convergence in the presence of varying trial lengths.
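To make the above derivation concrete, the following minimal Python sketch (our own illustration rather than code from the cited references) simulates the P-type law (19) on a lifted linear model with randomly truncated error vectors. The matrices A, B, and C, the gain L, and the uniform length distribution are arbitrary choices selected so that the simpler condition (22) holds; accordingly, the 1-norm of the tracking error should decrease monotonically despite the random truncation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative system: A, B, C and the scalar gain L are arbitrary choices
# selected so that the simpler monotonic convergence condition (22) holds.
A = np.diag([0.5, 0.3])
B = np.array([[1.0], [1.0]])
C = np.array([[0.5, 0.4]])
Nd, Nmin, L = 20, 12, 1.0

# Lifted lower-triangular matrix P of Markov parameters: P[i, j] = C A^(i-j) B.
P = np.zeros((Nd, Nd))
for i in range(Nd):
    for j in range(i + 1):
        P[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B).item()

print("||I - P*L||_1 =", np.linalg.norm(np.eye(Nd) - P * L, 1))  # should be <= 1

yd = np.sin(np.linspace(0.1, 2 * np.pi, Nd))  # reference; zero initial state, so v = 0
u = np.zeros(Nd)
for k in range(40):
    e = yd - P @ u                                  # full tracking error of iteration k
    Nk = rng.integers(Nmin, Nd + 1)                 # random trial length N_k
    e_star = np.where(np.arange(Nd) < Nk, e, 0.0)   # truncated error M_{N_k} e_k, cf. (17)-(18)
    u = u + L * e_star                              # P-type update law (19)
    if k % 8 == 0:
        print(f"k={k:2d}, Nk={Nk}, ||e_k||_1 = {np.abs(e).sum():.3e}")
```

Running the sketch, the printed 1-norm of the error decreases at every displayed iteration, including iterations whose trials are truncated, which is precisely the monotonicity guaranteed by (22).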


5.2. Switching system approach

As shown in the previous subsection, a conservative convergence condition is derived under certain matrix norms for ILC with varying trial lengths, which motivates the study of stronger convergence results. In this section, we present the asymptotic convergence condition obtained in Shen et al. (2016a) by introducing a novel switching system approach.

Recalling the conventional P-type update law (19) for the linear system (16) and subtracting both sides from $u_d$ (defined similarly to $u_k$), we have

$$\Delta u_{k+1} = \Delta u_k - \mathbf{L} M_{N_k} e_k = (I - \mathbf{L} M_{N_k} P)\, \Delta u_k, \qquad (23)$$

where $\Delta u_k = u_d - u_k$ is the input error. Denote $X_k = I - \mathbf{L} M_{N_k} P$. Then (23) can be rewritten as follows:

$$\Delta u_{k+1} = X_k \Delta u_k. \qquad (24)$$

Due to the randomness of the trial length $N_k$ taking values in the set $\{N_{\min}, \ldots, N_d\}$, the matrix $M_{N_k}$ is random and takes values in the set $\{M_{N_{\min}}, \ldots, M_{N_d}\}$, which implies that $X_k$ is random with finitely many possible values $\{I - \mathbf{L} M_{N_{\min}} P, \ldots, I - \mathbf{L} M_{N_d} P\}$. In other words, the evolution of the lifted input error $\Delta u_k$ behaves as a random switching system along the iteration axis. Consequently, the asymptotical convergence analysis of the input error $\Delta u_k$ is converted into the convergence analysis of the associated switching system. From (24), it follows that

$$\Delta u_{k+1} = Z_k \Delta u_0, \qquad (25)$$

where $Z_k = X_k X_{k-1} \cdots X_1 X_0$ is also a random matrix. In order to derive a convergence property, the statistics of the newly introduced matrix $Z_k$ are calculated in Shen et al. (2016a). To this end, let $\mathcal{S}_k = \{Z_k: \text{taken over all sample paths}\}$. Denote the mean of $\mathcal{S}_k$ by $K_k$; then it is evident that

$$K_k = \left[\sum_{i=1}^{h} p_i \left(I - \mathbf{L} M_{N_{\min}+i-1} P\right)\right] K_{k-1}, \qquad (26)$$

where $p_i = P(X_k = I - \mathbf{L} M_{N_{\min}+i-1} P)$ and $h = N_d - N_{\min} + 1$ is the number of admissible trial lengths. Similarly, denote the covariance of $\mathcal{S}_k$ by $R_k$; then simple calculations lead to

$$R_k = J_k - K_k K_k^T, \qquad (27)$$

where $J_k$ is generated recursively as

$$J_k = \sum_{i=1}^{h} p_i \left(I - \mathbf{L} M_{N_{\min}+i-1} P\right) J_{k-1} \left(I - \mathbf{L} M_{N_{\min}+i-1} P\right)^T. \qquad (28)$$

These direct calculations pave an intuitive and effective way to derive convergence conditions in various senses. Indeed, noticing that $\mathbf{L}$ is a block diagonal matrix, P is a block lower triangular matrix, and $M_i$ is also block diagonal for $N_{\min} \le i \le N_d$, it is evident that $\mathbf{L} M_i P$ is a block lower triangular matrix whose first i diagonal blocks are LCB and whose remaining $N_d - i$ diagonal blocks are $O_p$. As a consequence, the convergence condition is given as

$$0 < I - LCB < I. \qquad (29)$$

With condition (29), Shen et al. (2016a) proved that $\Delta u_k$ converges to zero in the mathematical expectation sense, the mean-square sense, and the almost sure sense simultaneously. That is, $\lim_{k\to\infty} E[\Delta u_k] = 0$, $\lim_{k\to\infty} E[\|\Delta u_k\|^2] = 0$, and $P(\lim_{k\to\infty} \Delta u_k = 0) = 1$.

Remark 1. Although the convergence condition (29) seems conservative in contrast to traditional ILC, stronger convergence results can be achieved and the probability distribution of the random trial length is not required. In other words, with the new analysis approach, the applicable range of the proposed algorithm is surely extended.

Remark 2. It is worth noting that the switching system approach is effective for linear control systems, since the control signals at different time instants within an iteration can be lifted into a compact form; it is, however, not applicable to nonlinear systems. Techniques for nonlinear systems are presented in the following subsections.
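The statistics (26)-(28) can also be evaluated numerically. The sketch below, assuming the same illustrative lifted model as before and a uniform distribution over the admissible lengths (both our own assumptions), forms the mean matrix appearing in (26), checks that its spectral radius is below one, and verifies the mean-square decay of the input error by Monte-Carlo simulation of the switching system (24).

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative lifted model as before (all numerical values are arbitrary).
A = np.diag([0.5, 0.3])
B = np.array([[1.0], [1.0]])
C = np.array([[0.5, 0.4]])
Nd, Nmin, L = 20, 12, 1.0
P = np.zeros((Nd, Nd))
for i in range(Nd):
    for j in range(i + 1):
        P[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B).item()

I_mat = np.eye(Nd)
lengths = np.arange(Nmin, Nd + 1)                 # h = Nd - Nmin + 1 possible lengths
p = np.full(len(lengths), 1.0 / len(lengths))     # uniform distribution (assumption)
masks = [np.diag((np.arange(Nd) < n).astype(float)) for n in lengths]

# Mean switching matrix sum_i p_i (I - L * M_i * P), cf. (26).
Phi = sum(pi * (I_mat - L * Mi @ P) for pi, Mi in zip(p, masks))
print("spectral radius of the mean matrix:", max(abs(np.linalg.eigvals(Phi))))

# Monte-Carlo check of the mean-square decay of Delta u_k under (24).
du = rng.standard_normal((Nd, 200))               # 200 sample paths of Delta u_0
for k in range(60):
    Nk = rng.integers(Nmin, Nd + 1, size=200)
    for s in range(200):
        du[:, s] = (I_mat - L * masks[Nk[s] - Nmin] @ P) @ du[:, s]
print("E[||Delta u_60||^2] ~=", (du ** 2).sum(axis=0).mean())
```

Since LCB = 0.9 here, condition (29) is satisfied and both printed quantities confirm convergence: the spectral radius is strictly below one, and the empirical mean-square error shrinks toward zero.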

5.3. Contraction mapping approach

The contraction mapping approach is one of the most common analysis methods in the ILC field. Generally, it is combined with the well-known λ-norm technique to transform the whole time-dependent dynamics into a time-independent quantity, so that the asymptotical convergence along the iteration axis can be established. This approach is applicable to both linear and nonlinear systems (Li & Shen, 2017; Li et al., 2013; 2015; Li & Xu, 2015; Shen et al., 2016b; Wang, Li, & Shen, 2018). For illustration purposes, here we consider a simple affine nonlinear system (Shen et al., 2016b):

$$x_k(t+1) = f(x_k(t)) + B u_k(t), \quad y_k(t) = C x_k(t), \qquad (30)$$

where f(·) is a nonlinear function satisfying the globally Lipschitz condition $\|f(x_1) - f(x_2)\| \le l_f \|x_1 - x_2\|$. Note that the globally Lipschitz condition is mainly required for the application of the λ-norm technique or, equivalently, the Gronwall lemma; this condition can be relaxed if the λ-norm technique is not applied. First of all, let us consider the conventional P-type update law (5):

$$u_{k+1}(t) = u_k(t) + L e_k^*(t+1) = u_k(t) + 1_{\{t \le N_k\}} L e_k(t+1). \qquad (31)$$

Subtracting both sides from $u_d(t)$ and taking norms leads to

$$\|\Delta u_{k+1}(t)\| \le \|I - 1_{\{t \le N_k\}} LCB\| \, \|\Delta u_k(t)\| + l_f\, 1_{\{t \le N_k\}} \|LC\| \, \|\delta x_k(t)\|, \qquad (32)$$

where the globally Lipschitz condition is applied and $\delta x_k = x_d - x_k$ denotes the state error. Since the random variable $1_{\{t \le N_k\}}$ is involved in (32), we take the mathematical expectation of both sides:

$$E[\|\Delta u_{k+1}(t)\|] \le E\big[\|I - 1_{\{t \le N_k\}} LCB\|\big]\, E[\|\Delta u_k(t)\|] + l_f \|LC\|\, E\big[1_{\{t \le N_k\}} \|\delta x_k(t)\|\big], \qquad (33)$$

where the additional term $l_f \|LC\| E[1_{\{t \le N_k\}} \|\delta x_k(t)\|]$ can be compressed by the well-known λ-norm technique to a sufficiently small value compared with the main term $E[\|\Delta u_k(t)\|]$. For convergence of the ILC law (31), a direct condition is

$$E\big[\|I - 1_{\{t \le N_k\}} LCB\|\big] < 1. \qquad (34)$$

However, this condition is hard to check in practical applications since the mathematical expectation operator is involved. To simplify the convergence condition (34), it is interesting to investigate whether the expectation operator and the matrix norm operator can be exchanged with each other. This problem has been solved in Shen et al. (2016b). In detail, assume that η is a Bernoulli binary random variable with $P(\eta = 1) = \bar{\eta}$ and $P(\eta = 0) = 1 - \bar{\eta}$, and that M is a positive-definite matrix. Then, the equality

$$E\big[\|I - \eta M\|_E\big] = \|I - \bar{\eta} M\|_E \qquad (35)$$

holds if and only if one of the following conditions is satisfied: (1) $\bar{\eta} = 0$; (2) $\bar{\eta} = 1$; (3) $0 < \bar{\eta} < 1$ and $0 < M \le I$, where $\bar{\eta} = E\eta$ and $\|\cdot\|_E$ denotes the Euclidean norm of a matrix. Because $1_{\{t \le N_k\}}$ is a binary random variable and $E[1_{\{t \le N_k\}}] = p(t)$, the convergence condition (34) can be rewritten as

$$\|I - p(t) LCB\|_E < 1. \qquad (36)$$


Moreover, the above condition always holds if we can select L such that $0 < LCB < I$. Clearly, the latter condition is much more suitable for implementation in applications.
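The exchange property (35) is easy to verify numerically. In the following snippet (our illustration; the matrix M and the Bernoulli mean are arbitrary choices satisfying condition (3) above), the expectation reduces to exactly two terms because η is binary, so no sampling is needed.

```python
import numpy as np

# Numerical check of the exchange property (35): eta is Bernoulli with mean
# p_bar, and M is positive definite with 0 < M <= I (arbitrary illustrative data).
M = np.diag([0.9, 0.6, 0.3])
I, p_bar = np.eye(3), 0.7

# Since eta takes only the values 0 and 1, the expectation has two terms.
lhs = p_bar * np.linalg.norm(I - M, 2) + (1 - p_bar) * np.linalg.norm(I, 2)
rhs = np.linalg.norm(I - p_bar * M, 2)
print(lhs, rhs)   # both print 0.79: E||I - eta*M||_E = ||I - p_bar*M||_E
```

With these values both sides equal 0.79, matching (35); consequently, checking the deterministic condition (36) at each time instant replaces the expectation-based condition (34).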

As we discussed in Section 4.2, for systems with iteration-varying trial lengths, the P-type ILC scheme simply replaces the missing information with zero, which might slow down the learning speed. To expedite the learning process, several compensation mechanisms have been provided. Here, let us take the random-searching-compensation mechanism as an example to derive the convergence condition via the contraction mapping methodology. Consider the ILC law with the random-searching-compensation mechanism

$$u_{k+1}(t) = \frac{1}{n_k^t} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, u_{k+1-j}(t) + \frac{1}{n_k^t}\, L \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, e_{k+1-j}(t+1). \qquad (37)$$

Clearly, it is not easy to derive a contraction map for $\Delta u_k(t)$ between two adjacent iterations based on (37), since a random average operator is involved. By subtracting $u_d(t)$ from both sides of (37) and applying the λ-norm technique, it gives

$$\|\Delta u_{k+1}(t)\|_\lambda \le \frac{1}{n_k^t} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, \rho\, \|\Delta u_{k+1-j}(t)\|_\lambda \le \rho \max_{j=1,2,\ldots,m} \|\Delta u_{k+1-j}(t)\|_\lambda, \qquad (38)$$

where ρ < 1 is a contraction coefficient satisfying $\|I - LCB\| \le \rho$ with an appropriate L. Similarly, we can obtain

$$\|\Delta u_{k+2}(t)\|_\lambda \le \rho \max_{j=1,2,\ldots,m} \|\Delta u_{k+2-j}(t)\|_\lambda. \qquad (39)$$

By noticing

$$\max_{j=1,2,\ldots,m} \|\Delta u_{k+2-j}(t)\|_\lambda \le \max\Big\{ \max_{j=2,\ldots,m} \|\Delta u_{k+2-j}(t)\|_\lambda,\ \|\Delta u_{k+1}(t)\|_\lambda \Big\} \le \max_{j=1,2,\ldots,m} \|\Delta u_{k+1-j}(t)\|_\lambda,$$

(39) becomes

$$\|\Delta u_{k+2}(t)\|_\lambda \le \rho \max_{j=1,2,\ldots,m} \|\Delta u_{k+1-j}(t)\|_\lambda.$$

By similar derivations and the mathematical induction principle, we have, for all $j = 1, 2, \ldots, m$,

$$\|\Delta u_{k+j}(t)\|_\lambda \le \rho \max_{i=1,2,\ldots,m} \|\Delta u_{k+1-i}(t)\|_\lambda. \qquad (40)$$

In other words, we have established a batch contraction over m successive iterations, i.e.,

$$\max_{j=1,2,\ldots,m} \|\Delta u_{k+j}(t)\|_\lambda \le \rho \max_{j=1,2,\ldots,m} \|\Delta u_{k+1-j}(t)\|_\lambda,$$

rather than a single contraction between two adjacent iterations. In short, comparing the conventional P-type law and the average-operator-based law, we find that the design condition for the learning gain matrix L is almost the same; however, the convergence analysis using the contraction mapping method should be extended from adjacent contraction to batch contraction. Similarly, the contraction mapping in Li et al. (2013) and Li and Xu (2015) was carried out with respect to the averaged input error rather than the input error itself, because the latter is difficult to handle under the all-historical-data-compensation mechanism. In addition, a sampled-data learning control scheme was provided in Wang et al. (2018) by using the contraction mapping technique, because a sampled-data system is essentially discrete. Furthermore, one may wonder whether some approach can be proposed to derive a time-iteration-varying gain matrix $L_{k,t}$ rather than a fixed L. Recently, a Kalman-filtering-based technique was introduced to generate the gain matrix by optimizing the trace of the input error covariance matrix (Liu, Shen, & Wang, 2019b).
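As an illustration of the averaging law, the following sketch of ours implements (37) on the lifted linear example with a sliding window of m iterations; although the analysis above also covers nonlinear systems, the linear case suffices to show the mechanism. The handling of the corner case $n_k^t = 0$ (holding the previous input) is our own assumption, as are all numerical data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Lifted linear example (arbitrary data) driven by the averaging law (37).
A = np.diag([0.5, 0.3])
B = np.array([[1.0], [1.0]])
C = np.array([[0.5, 0.4]])
Nd, Nmin, L, m = 20, 12, 1.0, 5
P = np.zeros((Nd, Nd))
for i in range(Nd):
    for j in range(i + 1):
        P[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B).item()
yd = np.sin(np.linspace(0.1, 2 * np.pi, Nd))

hist = []                                    # window of (gamma_k, u_k, gamma_k * e_k)
u = np.zeros(Nd)
for k in range(80):
    e = yd - P @ u
    Nk = rng.integers(Nmin, Nd + 1)
    g = (np.arange(Nd) < Nk).astype(float)   # gamma_k(t): 1 if e_k(t+1) was measured
    hist = (hist + [(g, u.copy(), g * e)])[-m:]
    n = sum(gj for gj, _, _ in hist)         # n_k^t: number of valid recent iterations
    su = sum(gj * uj for gj, uj, _ in hist)  # sum of gamma-weighted inputs
    se = sum(ej for _, _, ej in hist)        # sum of gamma-weighted errors
    # Law (37); where n_k^t = 0, hold the previous input (our corner-case choice).
    u = np.where(n > 0, (su + L * se) / np.maximum(n, 1.0), u)
    if k % 16 == 0:
        print(f"k={k:2d}, ||e_k||_1 = {np.abs(e).sum():.3e}")
```

Consistent with the batch contraction established above, the error decays over blocks of m iterations rather than strictly from one iteration to the next.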

5.4. Composite energy function approach

In traditional ILC, it is well known that the contraction mapping method is only suitable for systems satisfying the globally Lipschitz condition. For systems without this condition, the composite energy function (CEF) approach is a powerful tool for analyzing the convergence of ILC algorithms, provided that the control system is repeated on a fixed time interval. Recently, this CEF approach has been extended to deal with ILC with iteration-varying trial lengths (Shen & Xu, 2019a; 2019b), which is reviewed in this subsection. Consider the following continuous-time parameterized nonlinear system (Shen & Xu, 2019a):

$$\dot{x}_k = \theta^{\circ T}(t)\, \xi^{\circ}(x_k, t) + b(t)\, u_k(t), \qquad (41)$$

with initial state $x_k(0) = x_0$ corresponding to the desired initial reference point, where $\theta^{\circ}(t)$ is a vector of unknown time-varying parameters and $\xi^{\circ}(x_k, t)$ is a known nonlinear function that may not satisfy the Lipschitz condition. Unlike the previous subsections, system (41) is formulated as a scalar system; that is, $x_k \in \mathbb{R}$ and $u_k \in \mathbb{R}$. However, the results can be extended to multi-dimensional systems (Shen & Xu, 2019a; 2019b). It is assumed that the control direction is fixed; without loss of generality, we assume that $b(t) \ge \underline{b} > 0$. Moreover, the continuous model of the random trial length $T_k$ presented in Section 3.1 is applied. Denote the tracking reference by $x_r(t)$. The error dynamics are given as

$$\dot{e}_k = \dot{x}_k - \dot{x}_r = b\,(u_k + b^{-1}\theta^{\circ T}\xi_k^{\circ} - b^{-1}\dot{x}_r) = b\,(u_k + \theta^T \xi_k), \qquad (42)$$

where the arguments are omitted to save space without causing confusion, $\xi_k^{\circ} = \xi^{\circ}(x_k, t)$, $\theta = [b^{-1}\theta^{\circ T}, -b^{-1}]^T$, and $\xi_k = [\xi_k^{\circ T}, \dot{x}_r]^T$. Based on these error dynamics, the feedback control can be designed as

$$u_k = -\underline{b}^{-1}\mu e_k - \hat{\theta}_k^T \xi_k, \quad t \le T_k, \qquad (43)$$

where μ > 0 is the feedback gain and $\hat{\theta}_k$ is the estimate of the unknown time-varying system uncertainty θ defined in (42). Although the operation process may end at any time instant $T_k$, we should update the estimate of θ over the whole time interval $[0, T_d]$. A simple mechanism for the untrodden section of the iteration is to copy the parameter value from the previous iteration. That is, the update law for $\hat{\theta}_k$ is given by

$$\hat{\theta}_k = \begin{cases} \hat{\theta}_{k-1} + \eta\, \xi_k e_k, & t \le T_k, \\ \hat{\theta}_{k-1}, & T_k < t \le T, \end{cases} \qquad (44)$$

with $\hat{\theta}_{-1} = 0$, $\forall t \in [0, T]$, where η > 0 is the learning gain.
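To show how the laws (43) and (44) operate across the trodden and untrodden sections, the following Euler-discretized simulation is a minimal sketch of ours; the nonlinearity $\xi^{\circ}(x,t) = x^2$, the parameter $\theta^{\circ}(t)$, the gains, the known lower bound $\underline{b} = b = 1$, and the trial length distribution are illustrative assumptions rather than the settings of Shen and Xu (2019a).

```python
import numpy as np

rng = np.random.default_rng(4)

# Euler-discretized sketch of (41)-(44). All numerical settings are illustrative.
T, dt = 1.0, 0.001
n = int(T / dt)
t = np.linspace(0.0, T, n)
theta_c = 1.0 + 0.5 * np.sin(2 * np.pi * t)   # unknown time-varying theta°(t)
b = 1.0                                        # input gain; lower bound b_ = 1 assumed known
xr = np.sin(np.pi * t)                         # tracking reference x_r(t)
xr_dot = np.pi * np.cos(np.pi * t)
mu, eta = 5.0, 5.0                             # feedback gain mu, learning gain eta

theta_hat = np.zeros((n, 2))                   # estimate of the lumped theta in (42)
for k in range(30):
    Tk = rng.uniform(0.7 * T, T)               # random trial length T_k
    th_new = theta_hat.copy()                  # untrodden part keeps previous values, cf. (44)
    x, emax = xr[0], 0.0                       # identical initialization x_k(0) = x_r(0)
    for i in range(n):
        if t[i] > Tk:
            break                              # the trial ends early at T_k
        e = x - xr[i]
        xi = np.array([x ** 2, xr_dot[i]])     # xi_k = [xi°(x_k, t), x_r_dot(t)]
        th_new[i] = theta_hat[i] + eta * xi * e        # pointwise update law (44)
        u = -mu * e / b - th_new[i] @ xi               # feedback control (43) with b_ = 1
        x += dt * (theta_c[i] * x ** 2 + b * u)        # plant (41), Euler step
        emax = max(emax, abs(e))
    theta_hat = th_new
    if k % 5 == 0:
        print(f"k={k:2d}, Tk={Tk:.2f}, max|e_k| on [0, T_k] = {emax:.2e}")
```

In this sketch the estimate over the untrodden section is simply carried over, so every time instant is still learned whenever a later trial reaches it, and the printed peak error shrinks along the iteration axis.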


Fig. 6. CEF design for regular length and varying lengths.

To prove the asymptotical convergence, we need to define a CEF similar to those in Xu (2011). The CEF is generally defined as a combination of the tracking error and a functional of the parameter estimation error; thus, it can be regarded as a kind of energy of the whole system. Our objective is to derive a decreasing trend of the introduced CEF, from which the asymptotical convergence follows as a direct consequence. However, this step is not straightforward under varying trial lengths. To get an intuitive recognition of the difficulty, a simple example is presented in Fig. 6.

For the regular trial length case shown in Fig. 6(a), the CEFs for the kth and (k+1)th iterations are defined with the same trial length, which indicates that the functionals in the CEFs have identical integration intervals. The above-mentioned decreasing trend thus directly leads to the asymptotical convergence of the tracking error and the estimation error. In the varying trial length case shown in Fig. 6(b), however, the CEFs are defined on different time intervals for the kth and (k+1)th iterations. As illustrated in Fig. 6(b), the trial length for the (k+1)th iteration is shorter than that for the kth iteration. In this case, even if we have proved that the CEF at the (k+1)th iteration is less than that at the kth iteration, we cannot deduce that the decrease originates from an improvement of the tracking performance. In other words, the varying trial length renders the assessment of performance improvement inconclusive.

To solve this problem, a simple and direct idea is to compensate the untrodden section with suitable values so that the CEFs at two successive iterations can be compared on the same time interval. In such a case, the convergence analysis can be performed similarly to the traditional CEF approach (Xu, 2011). To facilitate the convergence analysis, a virtual tracking error $\epsilon_k(t)$ is defined as follows:

$$\epsilon_k(t) = \begin{cases} e_k(t), & 0 \le t \le T_k, \\ e_k(T_k), & T_k < t \le T. \end{cases} \qquad (45)$$

That is, $\epsilon_k(t) = \gamma_k(t)\, e_k(t) + (1 - \gamma_k(t))\, e_k(T_k)$, $0 \le t \le T$, based on which a CEF is proposed:

$$E_k(t) = \frac{1}{2}\epsilon_k^2(t) + \frac{1}{2\eta}\int_0^t b(\tau)\, \tilde{\theta}_k^T(\tau)\, \tilde{\theta}_k(\tau)\, d\tau, \qquad (46)$$

with $\tilde{\theta}_k \triangleq \hat{\theta}_k - \theta$ denoting the estimation error. Thanks to the newly proposed CEF, the convergence analysis of the controller (43) with the associated parameter estimation law (44) can be performed similarly to the analysis presented in Xu (2011).

It is worthwhile to mention that, with the newly proposed CEF, more practical and complicated ILC problems with iteration-varying trial lengths can be solved, such as systems with nonparameterized uncertainty (Shen & Xu, 2019b), systems with partial structure information (Zeng et al., 2018), robot manipulators (Zeng et al., 2019), and general nonlinear systems with initial state deviation and unknown control direction (Liu, Shen, & Wang, 2019a). Moreover, by applying the modified CEF with the newly defined virtual tracking error, it would be straightforward to extend the ILC problem with nonuniform trial lengths to systems with an unknown lower bound of the input gain, systems with iteration-varying tracking references, high-order systems, multi-input-multi-output systems, etc.

While a random model of the trial length is assumed in most studies, the convergence results are generally expressed over a deterministic interval. This is due to the viewing angle of the expressions: the trial length is defined in the time domain at an arbitrary iteration and is thus random, whereas the convergence is established in the iteration domain for an arbitrary time instant, and the set of all valid time instants is deterministic.

6. Promising directions and open issues

In the previous sections, existing techniques for dealing with ILC with iteration-varying trial lengths, including the models of

varying trial lengths, control synthesis, and convergence analysis, have been summarized, in which the key ideas and tricks for handling nonuniform trial lengths in ILC have been presented clearly. In this section, some associated extensions and open issues in the field of ILC with randomly varying trial lengths are discussed.

Firstly, we would like to discuss two possible directions for further extensions, for which the problem formulation and/or analysis approaches may not be covered by the previous sections.

(1) New control systems with iteration-varying trial lengths. Although both linear and nonlinear systems with nonuniform trial lengths have been considered in the previous sections, many specific control systems arising in practice are excluded, such as systems with noises, fractional-order dynamic systems, impulsive systems, etc. In fact, some results have been achieved in this area. For instance, a class of discrete-time affine nonlinear systems with random noises is studied in Shi, He, and Zhou (2016) under the framework of ILC with nonuniform trial lengths:

$$x_k(t+1) = f(x_k(t), t) + B(t)\, u_k(t) + w_k(t), \quad y_k(t) = C(t)\, x_k(t) + v_k(t), \qquad (47)$$

with $w_k(t)$ and $v_k(t)$ being bounded random variables. In this work, a modified all-historical-data-compensation mechanism was adopted for completing the untrodden tracking information. However, due to the existence of noises/uncertainties in the system, only convergence to a neighborhood of zero is proved for the expectation of the tracking error. In this direction, stronger convergence results are expected, since many types of uncertainties arise in practical systems. Moreover, the problem of ILC with varying trial lengths has been extended to a nonlinear fractional-order dynamic system in Liu and Wang (2017):

$$ {}_0^c D_t^{\alpha} x_k(t) = f(x_k(t), t) + B u_k(t), \quad t \in [0, T_k], \qquad y_k(t) = C x_k(t) + D \int_0^t \varphi(s, u_k(s))\, ds, \qquad (48)$$

where ${}_0^c D_t^{\alpha} x_k(t)$ denotes the Caputo derivative with lower limit zero of order α of the function $x_k$ at time t. With an iteration-moving-average-compensation mechanism, various learning algorithms are developed and the convergence analysis is performed by virtue of the contraction mapping method. Additionally, Liu, Debbouche, and Wang (2017) considers a stochastic system governed by the following random impulsive differential equations:

$$\dot{x}(t) = a x(t) + f(t), \quad t \in [0, T_d] \setminus \{t_1, t_2, \ldots, t_{N-1}\},$$
$$x(t_i^+) - x(t_i^-) = \Phi_i(x(t_i^-)) + \epsilon_i, \quad i \in \{1, 2, \ldots, N-1\}. \qquad (49)$$

In this work, according to the impulsive time instants, the operation interval is divided into several segments, which are defined as the trial lengths of ILC, instead of the whole time interval. To deal with the iteration-varying trial lengths, the iteration-moving-average-compensation mechanism is applied in the controller design, and similar results are achieved as in Liu and Wang (2017). On top of Liu et al. (2017), ILC with random trial lengths is extended to systems governed by fractional-order random impulsive differential equations in Liu, Wang, Shen, and O'Regan (2018), where both the random-searching-compensation and fixed-searching-compensation mechanisms proposed in Shen et al. (2016b) are employed.

(2) New control problems with iteration-varying trial lengths. In traditional ILC, besides the identical trial length, there are


many other constraints, such as identical initial state, identical learning target, fixed control direction, etc. Whether it is possible to relax one or more of these constraints together with the nonuniform trial length is also an interesting problem, which would extend the range of real-world ILC applications. In the literature, there have been several achievements in this direction. For instance, ILC with both nonuniform trial lengths and non-identical initial conditions is investigated in Li et al. (2013), Shen et al. (2016b), and Wang et al. (2018). Furthermore, a general vector relative degree issue is considered in Wei and Li (2017) for a discrete-time linear system with iteration-varying trial lengths.

Furthermore, let us discuss some open issues existing in the area of ILC with iteration-varying trial lengths:

(1) Randomness of the trial length. In the existing literature, a common assumption is that the nonuniform trial length varies randomly in the iteration domain; that is, the trial lengths of previous iterations do not affect that of the current iteration. For this case, it has been shown that the binary random variable model is a powerful tool to handle the varying trial lengths. However, in practice the trial length may not be completely random but may depend on information from previous iterations. How to include this iteration-dependence property of the nonuniform trial lengths in the ILC design is an interesting topic.

(2) Dynamics-dependent compensation mechanisms. As discussed in Section 4.1, most of the existing compensation mechanisms are designed to compensate missing information by using that of previous iterations at the corresponding time instants, and time-dependent dynamics are never considered in the compensation methods. Therefore, whether the missing information can be compensated by applying predicted system information (which may be obtained by using prediction techniques such as model predictive control, machine learning methods, etc.), i.e., how to design a time-dependent compensation approach, is a promising direction.

(3) Convergence speed evaluation. As mentioned in Section 4.3, the direct impact of the randomness of the trial lengths is a slower learning speed. However, no results have been reported that evaluate the relationship between the learning speed and the randomness of the trial lengths. A prediction-based approach was reported recently in Lin, Chi, and Huang (2019).

(4) New updating laws. In connection with (3), the convergence speed of learning algorithms is slowed down by the randomness of the trial lengths if conventional types of ILC algorithms, such as the PID form, are adopted with merely modified tracking errors. Thus it would be valuable, from an application point of view, to design new types of learning schemes that can expedite the learning speed or improve the learning performance even when the trial lengths vary randomly. For example, nonlinear schemes are promising candidates.

(5) Real-time validation of ILC with nonuniform trial lengths. As can be seen from the previous sections, most results achieved in ILC with iteration-varying trial lengths lie in theoretical analysis and simulation studies; they are rarely validated in real-time applications. Whether the existing ILC schemes with varying trial lengths can achieve good performance in real-time implementations is an open problem. Therefore, it would be very helpful for further improving the ILC schemes if they could be implemented and validated experimentally.


7. Conclusions

This paper provides a survey of recent contributions on ILC with nonuniform trial lengths, a problem that is quite common in practical applications. In order to clearly present the achievements in this area, the related works are reviewed and summarized in three parts: the models (random and deterministic) of varying trial lengths, the control synthesis (compensation mechanisms and control algorithms), and the convergence analysis approaches (deterministic analysis approach, switching system approach, contraction mapping approach, and composite energy function approach), where the key ideas in designing ILC under the circumstance of varying trial lengths are elaborated. For further developments, possible extensions and open issues in the field of ILC with iteration-varying trial lengths are also discussed. More technical details can be found in a recent monograph (Shen & Li, 2019).

Declaration of Competing Interest

The authors declare that they have no conflict of interest regarding this work.

References

Ahn, H.-S., Chen, Y., & Moore, K. L. (2007). Iterative learning control: Brief survey and categorization. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(6), 1099–1121.
Altin, B., Willems, J., Oomen, T., & Barton, K. (2017). Iterative learning control of iteration-varying systems via robust update laws with experimental implementation. Control Engineering Practice, 62, 36–45.
Arimoto, S., Kawamura, S., & Miyazaki, F. (1984). Bettering operation of robots by learning. Journal of Robotic Systems, 1(2), 123–140.
Boeren, F., Bareja, A., Kok, T., & Oomen, T. (2016). Frequency-domain ILC approach for repeating and varying tasks: With application to semiconductor bonding equipment. IEEE/ASME Transactions on Mechatronics, 21(6), 2716–2727.
Bristow, D. A., Tharayil, M., & Alleyne, A. G. (2006). A survey of iterative learning control: A learning-based method for high-performance tracking control. IEEE Control Systems Magazine, 26(3), 96–114.
Cheah, C.-C., & Xu, J.-X. (2000). Comments on "Direct learning of control efforts for trajectories with different time scales" [with reply]. IEEE Transactions on Automatic Control, 45(6), 1214–1215.
Chien, C.-J. (2008). A combined adaptive law for fuzzy iterative learning control of nonlinear systems with varying control tasks. IEEE Transactions on Fuzzy Systems, 16(1), 40–51.
Guth, M., Seel, T., & Raisch, J. (2013). Iterative learning control with variable pass length applied to trajectory tracking on a crane with output constraints. In 52nd IEEE Conference on Decision and Control (pp. 6676–6681). IEEE.
Jin, X. (2017). Iterative learning control for non-repetitive trajectory tracking of robot manipulators with joint position constraints and actuator faults. International Journal of Adaptive Control and Signal Processing, 31(6), 859–875.
Kawamura, S. (1987). Intelligent control of robot motion based on learning method. In Proc. of the IEEE Intelligent Control, 1987.
Kawamura, S., & Fukao, N. (1995). A time-scale interpolation for input torque patterns obtained through learning control on constrained robot motions. In Proceedings of the 1995 IEEE International Conference on Robotics and Automation: 2 (pp. 2156–2161). IEEE.
Li, J., & Li, J. (2014). Adaptive fuzzy iterative learning control with initial-state learning for coordination control of leader-following multi-agent systems. Fuzzy Sets and Systems, 248, 122–137.
Li, X., & Shen, D. (2017).
Two novel iterative learning control schemes for systems with randomly varying trial lengths. Systems & Control Letters, 107, 9–16.
Li, X., Xu, J.-X., & Huang, D. (2013). An iterative learning control approach for linear systems with randomly varying trial lengths. IEEE Transactions on Automatic Control, 59(7), 1954–1960.
Li, X., Xu, J.-X., & Huang, D. (2015). Iterative learning control for nonlinear dynamic systems with randomly varying trial lengths. International Journal of Adaptive Control and Signal Processing, 29(11), 1341–1353.
Li, X.-D., Chow, T. W. S., Ho, J. K., & Zhang, J. (2009). Iterative learning control with initial rectifying action for nonlinear continuous systems. IET Control Theory & Applications, 3(1), 49–55.
Li, X.-D., Lv, M.-M., & Ho, J. K. (2016). Adaptive ILC algorithms of nonlinear continuous systems with non-parametric uncertainties for non-repetitive trajectory tracking. International Journal of Systems Science, 47(10), 2279–2289.
Li, X.-F., & Xu, J.-X. (2015). Lifted system framework for learning control with different trial lengths. International Journal of Automation and Computing, 12(3), 273–280.


Lin, N., Chi, R., & Huang, B. (2019). Auxiliary predictive compensation-based ILC for variable pass lengths. IEEE Transactions on Systems, Man, and Cybernetics: Systems.
Liu, C., Shen, D., & Wang, J. (2019a). Adaptive learning control for general nonlinear systems with nonuniform trial lengths, initial state deviation, and unknown control direction. International Journal of Robust and Nonlinear Control. doi:10.1002/rnc.4718.
Liu, C., Shen, D., & Wang, J. (2019b). A two-dimensional approach to iterative learning control with randomly varying trial lengths. Journal of Systems Science and Complexity.
Liu, S., Debbouche, A., & Wang, J. (2017). On the iterative learning control for stochastic impulsive differential equations with randomly varying trial lengths. Journal of Computational and Applied Mathematics, 312, 47–57.
Liu, S., & Wang, J. (2017). Fractional order iterative learning control with randomly varying trial lengths. Journal of the Franklin Institute, 354(2), 967–992.
Liu, S., Wang, J., Shen, D., & O'Regan, D. (2018). Iterative learning control for noninstantaneous impulsive fractional-order systems with varying trial lengths. International Journal of Robust and Nonlinear Control, 28(18), 6202–6238.
Longman, R., & Mombaur, K. (2006). Investigating the use of iterative learning control and repetitive control to implement periodic gaits. In Fast Motions in Biomechanics and Robotics (pp. 189–218). Springer.
Meng, D., & Zhang, J. (2017). Deterministic convergence for learning control systems over iteration-dependent tracking intervals. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3885–3892.
Park, K.-H. (2005). An average operator-based PD-type iterative learning control for variable initial state error. IEEE Transactions on Automatic Control, 50(6), 865–869.
Saab, S. S., Vogt, W. G., & Mickle, M. H. (1997). Learning control algorithms for tracking "slowly" varying trajectories. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 27(4), 657–670.
Schmid, R. (2007). Comments on "Robust optimal design and convergence properties analysis of iterative learning control approaches" and "On the P-type and Newton-type ILC schemes for dynamic systems with non-affine input factors". Automatica, 43(9), 1666–1669.
Seel, T., Schauer, T., & Raisch, J. (2011). Iterative learning control for variable pass length systems. IFAC Proceedings Volumes, 44(1), 4880–4885.
Seel, T., Schauer, T., & Raisch, J. (2017). Monotonic convergence of iterative learning control systems with variable pass length. International Journal of Control, 90(3), 393–406.
Seel, T., Werner, C., Raisch, J., & Schauer, T. (2016a). Iterative learning control of a drop foot neuroprosthesis – generating physiological foot motion in paretic gait by automatic feedback control. Control Engineering Practice, 48, 87–97.
Seel, T., Werner, C., & Schauer, T. (2016b). The adaptive drop foot stimulator – multivariable learning control of foot pitch and roll motion in paretic gait. Medical Engineering & Physics, 38(11), 1205–1213.
Shen, D. (2018a). Iterative learning control with incomplete information: A survey. IEEE/CAA Journal of Automatica Sinica, 5(5), 885–901.
Shen, D. (2018b). Iterative Learning Control with Passive Incomplete Information: Algorithms Design and Convergence Analysis. Springer.
Shen, D., & Chen, H.-F. (2012). Iterative learning control for large scale nonlinear systems with observation noise. Automatica, 48(3), 577–582.
Shen, D., & Li, X. (2019).
Iterative Learning Control for Systems with Iteration-Varying Trial Lengths. Springer.
Shen, D., & Wang, Y. (2014). Survey on stochastic iterative learning control. Journal of Process Control, 24(12), 64–77.
Shen, D., & Wang, Y. (2015a). ILC for networked nonlinear systems with unknown control direction through random lossy channel. Systems & Control Letters, 77, 30–39.
Shen, D., & Wang, Y. (2015b). Iterative learning control for networked stochastic systems with random packet losses. International Journal of Control, 88(5), 959–968.
Shen, D., & Xu, J.-X. (2019a). Adaptive learning control for nonlinear systems with randomly varying iteration lengths. IEEE Transactions on Neural Networks and Learning Systems, 30(4), 1119–1132.
Shen, D., & Xu, J.-X. (2019b). Robust learning control for nonlinear systems with nonparametric uncertainties and nonuniform trial lengths. International Journal of Robust and Nonlinear Control, 29(5), 1302–1324.

Shen, D., Zhang, W., Wang, Y., & Chien, C.-J. (2016a). On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths. Automatica, 63, 359–365.
Shen, D., Zhang, W., & Xu, J.-X. (2016b). Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths. Systems & Control Letters, 96, 81–87.
Shi, J., He, X., & Zhou, D. (2016). Iterative learning control for nonlinear stochastic systems with variable pass length. Journal of the Franklin Institute, 353(15), 4016–4038.
Sun, M., & Wang, D. (2002). Iterative learning control with initial rectifying action. Automatica, 38(7), 1177–1182.
Sun, M., & Wang, D. (2003). Initial shift issues on discrete-time iterative learning control with system relative degree. IEEE Transactions on Automatic Control, 48(1), 144–148.
Wang, L., Li, X., & Shen, D. (2018). Sampled-data iterative learning control for continuous-time nonlinear systems with iteration-varying lengths. International Journal of Robust and Nonlinear Control, 28(8), 3073–3091.
Wei, Y.-S., & Li, X.-D. (2017). Varying trail lengths-based iterative learning control for linear discrete-time systems with vector relative degree. International Journal of Systems Science, 48(10), 2146–2156.
Xu, J.-X. (1998). Direct learning of control efforts for trajectories with different time scales. IEEE Transactions on Automatic Control, 43(7), 1027–1030.
Xu, J.-X. (2011). A survey on iterative learning control for nonlinear systems. International Journal of Control, 84(7), 1275–1294.
Xu, J.-X., & Feng, P. D. (1998). Time-scale direct learning control scheme with an application to a robotic manipulator. Int. J. Intell. Contr. Syst., 2(2), 315–328.
Xu, J.-X., & Song, Y. (1998). Direct learning control of non-uniform trajectories. In Z. Bien, & J.-X. Xu (Eds.), Iterative Learning Control: Analysis, Design, Integration and Applications (pp. 261–283). Springer.
Xu, J.-X., & Song, Y. (2000). Multi-scale direct learning control of linear time-varying high-order systems. Automatica, 36(1), 61–68.
Xu, J.-X., & Tan, Y. (2002a). On the P-type and Newton-type ILC schemes for dynamic systems with non-affine-in-input factors. Automatica, 38(7), 1237–1242.
Xu, J.-X., & Tan, Y. (2002b). Robust optimal design and convergence properties analysis of iterative learning control approaches. Automatica, 38(11), 1867–1880.
Xu, J.-X., & Tan, Y. (2003). Linear and Nonlinear Iterative Learning Control: 291. Springer.
Xu, J.-X., & Xu, J. (2004). On iterative learning from different tracking tasks in the presence of time-varying uncertainties. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1), 589–597.
Xu, J.-X., Xu, J., & Viswanathan, B. (2002). Recursive direct learning of control efforts for trajectories with different magnitude scales. Asian Journal of Control, 4(1), 49–59.
Xu, J.-X., & Yan, R. (2005). On initial conditions in iterative learning control. IEEE Transactions on Automatic Control, 50(9), 1349–1354.
Xu, J.-X., & Zhu, T. (1999). Dual-scale direct learning control of trajectory tracking for a class of nonlinear uncertain systems. IEEE Transactions on Automatic Control, 44(10), 1884–1888.
Yu, M., & Li, C. (2017). Robust adaptive iterative learning control for discrete-time nonlinear systems with time-iteration-varying parameters. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 47(7), 1737–1745.
Yu, Q., Bu, X., Chi, R., & Hou, Z. (2018). Modified P-type ILC for high-speed trains with varying trial lengths.
In 2018 IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS) (pp. 1006–1010). IEEE.
Zeng, C., Shen, D., & Wang, J. (2018). Adaptive learning tracking for uncertain systems with partial structure information and varying trial lengths. Journal of the Franklin Institute, 355(15), 7027–7055.
Zeng, C., Shen, D., & Wang, J. (2019). Adaptive learning tracking for robot manipulators with varying trial lengths. Journal of the Franklin Institute, 356(12), 5993–6014.
Zhu, Q., & Xu, J.-X. (2017). Dual IM-based ILC scheme for linear discrete-time systems with iteration-varying reference. IET Control Theory & Applications, 12(1), 129–139.
