Near-optimal neural-network robot control with adaptive gravity compensation


ARTICLE IN PRESS

JID: NEUCOM

[m5G;January 29, 2020;15:40]

Neurocomputing xxx (xxxx) xxx

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

M. Razmi, C.J.B. Macnab∗

Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada

Article info

Article history: Received 14 February 2019; Revised 16 October 2019; Accepted 9 January 2020; Available online xxx

Communicated by Dr. Biao Luo

Keywords: Nonlinear optimal control; Direct adaptive control; Neural-adaptive control; Overlearning; Weight drift; Bursting; Cerebellar Model Articulation Controller; Elastic-joint robot

Abstract

Adaptive nonlinear optimal control methods, as proposed in the literature, give rise to some questions around practical implementation in robotics, especially how to find a solution in a reasonable time and how to deal with gravity. This paper proposes a method to solve these problems by using a neural network with local basis-function domains, specifically the Cerebellar Model Articulation Controller (CMAC). The algorithm uses the local domains in order to keep track of the value of local cost-functionals. In this way, it can freeze the learning of the network's weights in a feedforward-like component in the CMAC when the bias has been overcome, as identified by an error-based cost-functional, e.g. automatic gravity compensation in a robot. After the feedforward component has been established, the algorithm then starts to learn another set of weights which contribute to feedback-like terms in the CMAC, and these weights get frozen when they no longer reduce a cost-functional that includes additional control effort, e.g. in a robot the control effort beyond that needed to compensate for gravity is penalized. Lyapunov methods guarantee uniformly ultimately bounded signals and ensure weight drift and bursting do not occur. One advantage is that the training time for finding a near-optimal control does not increase over previous neural-adaptive methods. Another advantage is that penalizing the control effort in a search for optimization does not result in any steady-state error due to gravity. Simulations show that the proposed method significantly outperforms a standard adaptive-CMAC control using e-modification, without increasing control effort or training time. An experimental flexible-joint robot verifies that the adaptive method significantly outperforms a linear quadratic regulator.

© 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)

∗ Corresponding author.

https://doi.org/10.1016/j.neucom.2020.01.026 0925-2312/© 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Please cite this article as: M. Razmi and C.J.B. Macnab, Near-optimal neural-network robot control with adaptive gravity compensation, Neurocomputing, https://doi.org/10.1016/j.neucom.2020.01.026

1. Introduction

Neural-adaptive control remains a promising methodology for systems with nonlinear and uncertain dynamics, where knowing the structure of the dynamics allows adaptation to uncertainty for constant parameters and nonlinear functions with known inputs. These schemes typically update the neural weights according to robust adaptive control laws derived with Lyapunov methods. Since update laws typically increase the weight magnitudes until the output error vanishes, without a careful (robust) design continued training may lead to excessive control effort or undesirable control chatter when real-world effects prevent the error from going exactly to zero. We point out an analogy to overlearning during static training. In a worst-case scenario the excessive control leads to bursting, where the output error suddenly increases after a period of convergence.

Robust designs for the Cerebellar Model Articulation Controller (CMAC) have proved particularly challenging due to the local nature of the hypercube cells that form the basis-function domains. Specifically, an oscillation across the origin and between two (or more) cells tends to force weights to grow in opposite directions on either side of the origin. Typical robust modifications for preventing weight drift, like leakage or deadzone, tend to sacrifice too much performance for practical purposes in the case of CMAC [1]. Since CMAC learns much faster than multilayer perceptrons and handles many more inputs than radial-basis-function networks, solving this problem satisfactorily might significantly advance the field of robotic control in particular, e.g. robots could quickly adapt to unknown payloads, changing friction, and/or the type of highly nonlinear functions that often arise in robot applications outside of controlled environments. Previously proposed novel solutions for robust adaptive CMAC have merely stopped the weight growth before bursting can occur [2–6]; in this work we aim to stop the weight growth at a particularly desirable point.

Previous approaches in the literature addressing adaptive nonlinear optimal controls typically use approximators (neural or fuzzy) to identify either the forward dynamics or a cost functional, and information gleaned from this forward approximator (identifier) then guides the training of the approximate adaptive control [7–13]. However, training of the identifier and the search for the optimal control in these approaches take significant time. Approaches for achieving near-optimal control include [8,14–17]; however, these approaches all still involve some type of search that would require extra time. In contrast, the method in this paper aims to find a near-optimal feedback-control CMAC during a normal training interval, by just freezing a set of reactive weights when the cost-functional has reached its lowest value so far.

Previous approaches for gravity compensation in robots include [18–21], but these approaches do not address how one might also achieve an optimal or near-optimal control signal. Penalizing control effort in an optimization cost-functional can cause steady-state error. Our proposed solution trains a feedforward CMAC's nonreactive weights during an initial training interval while control effort remains unpenalized, so as to first eliminate steady-state error in the presence of a bias term. The algorithm automatically detects the end of this initial training interval. Thus, in robotics applications our method can adapt to gravitational force and/or unknown payloads so as to achieve (nearly) zero steady-state error with near-minimal control effort, and then an (extra) near-optimal control effort provides robustness to disturbances.

In the proposed method a cell's cost-functional evaluation occurs over four subsequent cell activations on the same CMAC layer, so that an entire period of a (small) oscillation can be included assuming the inputs are close in frequency. The sum of the nonreactive weights and reactive weights provides a supervisory term in a leakage-like robust weight update. Lyapunov methods guarantee stability, where a robust CMAC (trained with conservative e-modification) and a performance CMAC (trained with a supervisory term but with arbitrarily bounded weights) ensure uniformly ultimately bounded (UUB) signals.

The organization of this paper is as follows. Section 2 describes the structure of CMAC as background information. Section 3 describes the proposed method, with a method for achieving UUB signals for training CMAC with any arbitrary computational algorithm in Section 3.1, and then a description of the proposed computational algorithm that stops training when weights no longer reduce cost-functionals in Section 3.2. The Results (Section 4) develop the method for a two-link flexible-joint robot and show e-mod compared to the proposed method in simulation (Section 4.2), followed by an experiment comparing the near-optimal control to LQR control on an experimental Quanser robot (Section 4.3). The Conclusions include comments on future work for achieving true on-line optimization (Subsection 5.1).

2. Background

The CMAC was one of the first artificial neural networks capable of approximating a nonlinear function [22]. Like the human cerebellum, the algorithm's structure proves particularly suitable for providing low-level feedback control signals. Similar to a radial-basis-function network, the CMAC outputs a weighted sum of basis functions to estimate a nonlinear function $f(x)$ using

$$\hat{f}(x) = \Gamma(x)\,w, \tag{1}$$

where row vector $\Gamma$ contains basis functions and column vector $w$ holds the weights. But the CMAC provides a computationally efficient way of defining and indexing the basis function domains, by placing them in $m$ offset arrays, or layers, of hypercube cells. Thus, the $n$-dimensional input only activates one cell per layer, enabling real-time calculations even when the number of inputs grows large. The algorithm stacks the layers in an offset manner, providing overlap between the activated cells and local generalization ability.

Fig. 1. A one-input binary CMAC with m = 3 layers, q = 4 quantizations, with activated basis functions illustrated with thick lines.

The original CMAC uses rectangular basis functions, i.e. binary activations, and the output simply becomes the sum of activated weights (Fig. 1). If each array has $q$ quantizations per input, then the total number of cells ends up being $N = m q^n$; a random hash-coding scheme maps the (previously) activated cells to a large physical memory, avoiding physical allocation of these $N$ memory locations [23]. The CMAC provides universal nonlinear approximation ability; specifically it can approximate a nonlinear function $f(x)$ with input vector $x \in \mathbb{R}^n$ in a local region $D \subset \mathbb{R}^n$ with estimate $\hat{f}$ such that

$$f(x) = \hat{f}(x) + d(x), \quad \forall x \in D, \tag{2}$$
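As an illustration of the binary CMAC structure described in this section, the following minimal Python sketch activates one cell per layer, hashes each activated cell into a fixed-size weight table, and outputs the sum of the activated weights. The class name `BinaryCMAC`, the use of Python's built-in `hash` as the hash-coding scheme, the uniform layer offsets, and the simple error-correction `train` step are all assumptions made for illustration; they are not taken from the paper.

```python
class BinaryCMAC:
    """Binary CMAC sketch: m offset layers of hypercube cells with hashed weights."""

    def __init__(self, m=3, q=4, lo=0.0, hi=1.0, table_size=4096):
        self.m, self.q = m, q
        self.lo = lo
        self.cell_width = (hi - lo) / q        # q quantizations per input
        self.table_size = table_size
        self.w = [0.0] * table_size            # physical memory shared by all layers

    def active_addresses(self, x):
        """Return one hashed memory address per layer (one activated cell per layer)."""
        addresses = []
        for layer in range(self.m):
            # Assumption: layer j is shifted by j/m of a cell width.
            offset = layer * self.cell_width / self.m
            cell = tuple(int((xi - self.lo + offset) // self.cell_width) for xi in x)
            # Hash coding: map the virtual cell index to the physical table.
            addresses.append(hash((layer,) + cell) % self.table_size)
        return addresses

    def output(self, x):
        """Binary activations: the output is the sum of the activated weights."""
        return sum(self.w[a] for a in self.active_addresses(x))

    def train(self, x, target, rate=1.0):
        """Illustrative static training step: spread the error over the m active cells."""
        error = target - self.output(x)
        for a in self.active_addresses(x):
            self.w[a] += rate * error / self.m
        return error


net = BinaryCMAC(m=3, q=4)
net.train([0.5], 2.0)
print(net.output([0.5]), net.output([0.52]))
```

Because the input 0.52 falls in the same cell as 0.5 on every layer in this configuration, it reads the same weights, which is the local generalization the offset layers are meant to provide.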

Here $d(x)$ is the bounded approximation error in (2), with $|d(x)| < d_{max}\ \forall x \in D$, where $d_{max}$ is a positive constant. Researchers have often utilized CMAC in control of minimum-phase systems without disturbances, e.g. for robots in [24–26]. However, when inputs have oscillations (due to dynamics or disturbances), the weight drift problem becomes difficult to address due to the local nature of the cells; typically one gets an unreasonable trade-off between stability and performance when using standard robust-adaptive-control update modifications.

3. Proposed method

We present the proposed method in two parts:
1. UUB Signals: Guaranteeing UUB signals for any computational-algorithm-based weight update,
2. Near-Optimal Control: Designing a computational algorithm to achieve near-optimal control.

In order to guarantee UUB signals for Part (1), the computational algorithm updates only one set of weights, the arbitrarily bounded performance weights, while another set of robust weights trains in parallel using traditional e-modification. To accomplish Part (2), our proposed robust computational algorithm identifies two sets of weights: the nonreactive weights suitable for a feedforward/bias term and the reactive weights that should provide a near-optimal (additional) control signal. The sum of the reactive and nonreactive weights supervises the training of the performance weights.

Assumption 1. We follow the standard framework of neural-adaptive controls applied to robotic manipulators typically used in the literature. Specifically, we assume that the electrical dynamics are much faster than the mechanical dynamics, such that torques at the joints can be designed as the control signals rather than voltages for the motors. We also assume the control rate of the robot is fast enough that continuous-time designs will suffice, i.e. effects from discretization of the control signal are small enough that discrete-time control designs would not visibly improve performance. For the construction of the robot, the most important


assumption is that optical encoders have been placed on each side of the harmonic drive, allowing direct, noise-free measurement of both rotor and link angles.

3.1. UUB signals

In this section we assume the designer has created a computational algorithm for training a CMAC that seems to achieve both high performance and weight convergence in practice/experiment, but does not have mathematical stability guarantees; we shall refer to this as a performance algorithm. Achieving UUB-guaranteed signals while using any arbitrary performance algorithm involves imposing an arbitrary bound on the resulting performance weights, while training a set of robust weights in parallel using a traditional robust weight update modification (in our case e-modification). If the performance algorithm seems to work in practice/experiment, then one should expect the performance weights to never reach the imposed bound under the same conditions as the experiment. However, if poor design or an unexpected disturbance causes the performance weights to saturate, then the continued training of the robust weights will still guarantee UUB signals. Note that this method differs significantly from the well-known adaptive parameter/weight projection method, because the saturation function does not require knowledge of the ideal weights (in practice a designer could choose bounds with just safety considerations in mind).

Although we ultimately test the method with a flexible-joint robot in the results section, here we develop the method for just a single-input, single-output nonlinear system. Later we show how it can be extended to strict-feedback (backsteppable) systems like the flexible-joint robot. If we wish to track desired trajectory $x_{des}$, $\dot{x}_{des}$ with error $z = x - x_{des}$, then first write the error dynamics

$$\dot{z} = f(x) + bu, \tag{3}$$

where $f$ is the nonlinearity, $u$ the control input, and we assume $b$ is a positive constant. Consider the following control law with both performance weights, $\hat{p}$, and robust weights, $\hat{r}$, contributing to the nonlinear compensation

$$u = -\Gamma(x)\hat{p} - \Gamma(x)\hat{r} - Gz, \tag{4}$$

where $G$ is a positive control gain. The robust weights update with standard e-modification

$$\dot{\hat{r}} = \beta\,[\Gamma^T(x)\,z - \nu_r |z|\,\hat{r}], \tag{5}$$

where $\beta$ is a positive adaptation gain and $\nu_r$ is a parameter chosen relatively large, i.e. a conservative choice. The performance weights have a possible weight update

$$\dot{P} = \beta\,[\Gamma^T z - \nu_p |z|\,(a - \hat{p})], \tag{6}$$

where $a$ defines the supervisory weights provided by the computational (ad-hoc or intelligent) algorithm. Consider arranging the CMAC basis functions and weights in a one-input CMAC in terms of columns

$$\Gamma(x)\,a = \begin{bmatrix} \gamma_0(x) & \gamma_1(x) & \cdots & \gamma_m(x) \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_m \end{bmatrix}, \tag{7}$$

where, for example, in Fig. 2 the vector $\gamma_2$ represents the activated basis functions and $c_2$ the activated weights (thus the ordering in $\Gamma$ and $w$ would change every time the input transitions across a CMAC cell boundary). Let us use vector $c_i$ to indicate the activated column of weights in $p$ as in Fig. 2

$$p = \begin{bmatrix} c_0 & c_1 & \cdots & c_q \end{bmatrix}^T. \tag{8}$$

Fig. 2. A one-input binary CMAC, arranging weights by columns, where column of weights c2 is defined by the set of activated weights and the other columns then follow (definitions of columns change each time there is a change in activated CMAC cell on a layer).

For the supervisory weights we will simply impose an arbitrary bound, positive constant $c_{max}$, on each column norm such that

$$\|c_i\| \le c_{max} \quad \text{for } i = 0, \ldots, q. \tag{9}$$

Note, since the input only activates one column of weights at a time, update (10) only applies to a single value of $i$ in real-time calculations. Thus, the possible performance weight update occurs only if the performance weights have not reached their imposed bound

$$\dot{\hat{c}}_{i,j} = \begin{cases} 0 & \text{if } \|\hat{c}_i\| = c_{max}, \\ \dot{P}_j & \text{otherwise.} \end{cases} \tag{10}$$

Remark 1. Since the imposed saturation takes the form of a bound on the column norm of activated weights, one can use model estimate knowledge and knowledge of the trajectory to choose the bound. For instance, given an estimate of the maximum values in the nonlinear function, $\bar{f}(x)$, one could choose $c_{max}$ such that the following would always be true:

$$\Gamma(x)\,c < \bar{f}(x) \quad \forall x. \tag{11}$$

For a first-order system of form (3), an adaptive Lyapunov function, with weight errors $\tilde{r} = r - \hat{r}$ and $\tilde{p} = p - \hat{p}$, provides a way to analyze stability

$$V = \frac{1}{2b} z^2 + \frac{1}{2\beta} \tilde{r}^T \tilde{r} + \frac{1}{2\beta} \tilde{p}^T \tilde{p}. \tag{12}$$

Taking the time derivative gives

$$\dot{V} = \frac{1}{b} z\,(f(x) + d + bu - \dot{x}_{des}) - \frac{1}{\beta} \tilde{r}^T \dot{\hat{r}} - \frac{1}{\beta} \tilde{p}^T \dot{\hat{p}}, \tag{13}$$

which is in the standard form of Lyapunov derivative found with robust adaptive controls using leakage or e-modification, and thus we can use the tools from [27,28] for analysis. The robust CMAC can approximate the nonlinearities as in (2) all on its own

$$(f(x) - \dot{x}_{des})/b = \Gamma r + d, \tag{14}$$

so that the performance CMAC does not require an ideal output in the stability analysis, i.e. $p = 0$. The time derivative becomes

$$\begin{aligned} \dot{V} &= z(\Gamma r + \Gamma p + d + u) - \tilde{r}^T \dot{\hat{r}}/\beta - \tilde{p}^T \dot{\hat{p}}/\beta, \\ &= z(\Gamma \hat{r} + \Gamma \hat{p} + d + u) + \tilde{r}^T(\Gamma^T z - \dot{\hat{r}}/\beta) + \tilde{p}^T(\Gamma^T z - \dot{\hat{p}}/\beta). \end{aligned} \tag{15}$$

Using control (4), update (5) for $\hat{r}$, and (10) for the $\hat{p}$ updates results in Lyapunov derivative (with $p = 0$)

$$\dot{V} = -Gz^2 + zd + \nu_r |z|\,\tilde{r}^T r - \nu_r |z|\,\tilde{r}^T \tilde{r} + \nu_p |z|\,\tilde{p}^T a - \nu_p |z|\,\tilde{p}^T \hat{p}. \tag{16}$$
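To make the roles of the two weight sets concrete, the following Python sketch simulates a scalar system of form (3) under control law (4), with an e-modification update in the style of (5) and a bounded performance update in the style of (6) and (10). Everything numerical here is an assumption for illustration: the plant `f(x) = 3 + sin(x)` (a constant bias standing in for gravity), the one-cell-per-input binary basis, the gains, forward-Euler integration, and supervisory weights `a` held at zero. The sketch also assumes the leakage term pulls the performance weight toward `a`, which is one reading of the "leakage-like" description and may differ in sign convention from (6) as printed.

```python
import math


def f(x):
    """Hypothetical plant nonlinearity: a constant bias plus a smooth term."""
    return 3.0 + math.sin(x)


def simulate(T=30.0, dt=1e-3, G=2.0, beta=10.0, nu_r=0.3, nu_p=0.3,
             c_max=5.0, b=1.0):
    # One-layer binary basis: Gamma(x) is one-hot, so each cell holds one weight.
    n_cells, width, x_min = 10, 0.5, -2.25
    r_hat = [0.0] * n_cells      # robust weights (e-modification)
    p_hat = [0.0] * n_cells      # performance weights (bounded by c_max)
    a = [0.0] * n_cells          # supervisory weights, held at zero in this sketch
    x, x_des = 1.0, 0.0          # constant set point, so x_des_dot = 0

    for _ in range(int(T / dt)):
        z = x - x_des
        i = min(max(int((x - x_min) / width), 0), n_cells - 1)  # active cell

        # Control law (4): both weight sets contribute, plus feedback.
        u = -p_hat[i] - r_hat[i] - G * z

        # Robust update in the style of (5), applied to the active cell only.
        r_hat[i] += dt * beta * (z - nu_r * abs(z) * r_hat[i])

        # Performance update in the style of (6) and (10): frozen at the bound.
        if abs(p_hat[i]) < c_max:
            p_dot = beta * (z + nu_p * abs(z) * (a[i] - p_hat[i]))
            p_hat[i] = max(-c_max, min(c_max, p_hat[i] + dt * p_dot))

        x += dt * (f(x) + b * u)  # plant (3), forward Euler

    return x - x_des, r_hat, p_hat


z_final, r_hat, p_hat = simulate()
print(z_final, r_hat[4] + p_hat[4])
```

In this toy setting the weights in the cell around the origin settle so that their sum cancels the bias, and the update stops because every update term is multiplied by the error.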


Since the weights $\hat{p}$ have initial conditions of zero, no column in $\hat{p}$ can grow greater than $c_{max}$, and the total number of columns, $nq$, implies

$$\|\hat{p}\| \le nq\,c_{max}. \tag{17}$$

Also, since $p = 0$ we have

$$\|\tilde{p}\| \le 2nq\,c_{max}. \tag{18}$$

However, since the input trajectories will only cover a very small subspace of the total CMAC domain, the real bound on $\|\tilde{p}\|$ will be much smaller than this. Consider defining $q_{i,activated}$ as the total number of CMAC columns that have been activated on input $i$ during the system's lifespan, which implies

$$\sum_{i=1}^{n} q_{i,activated} \ll nq. \tag{19}$$

Then defining

$$p_{max} = \sum_{i=1}^{n} q_{i,activated}\,c_{max}, \tag{20}$$

results in a guaranteed bound on the Lyapunov time derivative

$$\dot{V} < |z|\left(-G|z| + d_{max} + \nu_r \|\tilde{r}\|\|r\| - \nu_r \|\tilde{r}\|^2 + 2\nu_p p_{max}^2 - \nu_p \|\tilde{p}\|^2\right). \tag{21}$$

Thus $\dot{V} < 0$ when either $|z| > \delta_z$ or $\|\tilde{r}\| > \delta_r$ or $\|\tilde{p}\| > \delta_p$, where (with $K_{min} = G$ for the scalar system (3))

$$\delta_z = \frac{d_{max}}{K_{min}} + \frac{\nu_r \|r\|^2}{4K_{min}} + \frac{2\nu_p p_{max}^2}{K_{min}}, \tag{22}$$

$$\delta_r = \frac{\|r\|}{2} + \sqrt{\frac{d_{max}}{\nu_r} + \frac{\|r\|^2}{4} + \frac{2\nu_p p_{max}^2}{\nu_r}}, \tag{23}$$

$$\delta_p = \sqrt{\frac{d_{max}}{\nu_p} + \frac{\nu_r \|r\|^2}{4\nu_p} + 2p_{max}^2}, \tag{24}$$

which implies there is a region $B$ in $(|z|, \|\tilde{r}\|, \|\tilde{p}\|)$ space, outside of which $\dot{V} < 0$ (and inside of which, or on the surface, we cannot draw any conclusions about $\dot{V}$).

Theorem 1. For system (3) with applied control (4) depending on weight updates (5) and (10), the trajectory-tracking error $z$, robust weight error $\tilde{r}$, and performance weight error $\tilde{p}$ will all be uniformly ultimately bounded (UUB).

Proof. Since Lyapunov candidate (12) depends on $(|z|, \|\tilde{r}\|, \|\tilde{p}\|)$ and is positive definite, and $\dot{V}(|z|, \|\tilde{r}\|, \|\tilde{p}\|) < 0$ outside $B$, if the initial conditions $(|z_0|, \|\tilde{r}_0\|, \|\tilde{p}_0\|)$ are outside $B$ then the trajectory $(|z|, \|\tilde{r}\|, \|\tilde{p}\|)$ will enter $B$ in finite time, i.e. ultimately. Once the trajectory is in $B$, or if the initial conditions are inside $B$, $\dot{V} < 0$ holds everywhere outside $B$, and thus the trajectory cannot leave the smallest Lyapunov surface that encloses $B$, given by

$$V(|z|, \|\tilde{r}\|, \|\tilde{p}\|) = V(\delta_z, \delta_r, \delta_p). \tag{25}$$

Since this ultimate bound does not depend on time it is uniform. Thus $(|z|, \|\tilde{r}\|, \|\tilde{p}\|)$ is a uniformly ultimately bounded trajectory. □

Remark 2. The original work in [29] resulted in $p_{max} = 2mq^n w_{max}$, where $w_{max}$ was an imposed bound on each individual weight; this was impractical because the bound increased exponentially with the number of inputs. By bounding the magnitude of activated weight column vectors instead, the results here have the bound go up merely linearly with the number of inputs according to (19).

It may be helpful for the reader, to understand predicted performance, to consider the case after convergence, i.e. when the supervisory weights in $a$ become constant. In this case any finite vector $a$ can be assumed to provide $p = a$ if one defines $r$ using

$$f_j(x) = \Gamma_j(x)(r + a) + d(x). \tag{26}$$

Then weight updates (5) for $\hat{r}$ and updates (10) for elements in $\hat{p}$ result in

$$\dot{V} = z^T(d - Kz) + \|z\|\left(\nu_r \tilde{r}^T r - \nu_r \tilde{r}^T \tilde{r} - \nu_p \tilde{p}^T \tilde{p}\right). \tag{27}$$

Then

$$\dot{V} < \|z\|\left(d_{max} - K_{min}\|z\| + \nu_r \|\tilde{r}\|\|r\| - \nu_r \|\tilde{r}\|^2 - \nu_p \|\tilde{p}\|^2\right). \tag{28}$$

In this case

$$\delta_z = \frac{d_{max}}{K_{min}} + \frac{\nu_r \|r\|^2}{4K_{min}}, \tag{29}$$

$$\delta_r = \frac{\|r\|}{2} + \sqrt{\frac{d_{max}}{\nu_r} + \frac{\|r\|^2}{4}}, \tag{30}$$

$$\delta_p = \sqrt{\frac{d_{max}}{\nu_p} + \frac{\nu_r \|r\|^2}{4\nu_p}}, \tag{31}$$

and thus all signals are UUB with a considerably smaller ultimate bound than during training of supervisory term $a$.

3.2. Near-optimal control

3.2.1. Proposed algorithm for identifying the supervisory weights

The supervisory weights $a$ update off-line (in-between their CMAC cell activations) and they have two separate contributions:
1. The nonreactive weights $g$ identify the best weights (found so far) for a nonreactive feedforward term, providing a bias to compensate for nonlinearities at the origin, e.g. gravity compensation in a robot,
2. The reactive weights $o$ then find the best additional value for the weights (found so far) for providing a near-optimal control about the origin, and start to be identified after training of the nonreactive weights has stopped.

The measure of the "best weight found so far" in the training occurs by using a cost functional (CF) measured over four sequential cell activations on the same layer. The number four assumes that the inputs are correlated such that an oscillation traces out a planar ellipse in the multidimensional input space, and thus a small oscillation relative to the cell sizes goes through four cells. (If this does not describe the system inputs then one can simply measure the number of cells activated in a small oscillation, and use this number instead.) Note the importance of capturing a full period of oscillation in the CF; otherwise the CF could indicate error in a local region is decreasing when a vibration is actually getting larger in amplitude. At the moment a cell becomes deactivated, the algorithm evaluates the CF for the previous four activated cells on the same layer and makes a decision about the weight in the 4th-last cell activated. If the measurement indicates the CF qualifies as the lowest found so far in the training for its associated cell (the 4th-last), then the weight becomes our "best so far."

We note that penalizing control effort in a local CF would result in steady-state error if the origin includes a bias (a nonzero nonlinearity at the origin). Thus, for identifying the nonreactive weights, defining the feedforward component, the error CF takes into account only state error. That is, each time the error CF reaches its lowest value so far the associated weight for that cell now becomes the best nonreactive weight found so far. A nonreactive weight's training stops if the trajectory crosses the origin

within the next 4 sequential cell activations on the same layer, i.e. when it has moved the system "close enough" to the origin. Once the nonreactive weight stops training, the optimal CF starts to evaluate the performance, taking into account both state error and weight magnitude, i.e. penalizing control effort. Every time the optimal CF reaches a new low value the algorithm identifies a new best reactive weight. The summation of nonreactive weights $g$ and reactive weights $o$ defines the supervisory term $a$ (Fig. 3).

Fig. 3. Block diagram of proposed method: solid lines represent control signals and dashed lines represent training/adaptation.

3.2.2. Defining the cost functionals

Here we introduce appropriate notation for defining the two CFs. First consider $L_{j,k}$ as the physical memory address of the $k$th cell activated sequentially on the $j$th layer. Its CMAC cell has been activated $i$ times, and thus $k = k(i)$ (Fig. 4). Note that for a small oscillation between exactly four cells we have $L_{j,k(i)} = L_{j,k(i)-4} = L_{j,k(i-1)+4}$. The time $T_{j,k}$ denotes the initial activation time for the cell in location $L_{j,k}$, while $T_{j,k+1}$ indicates the time of deactivation. Thus, a CF that includes the previous four activations on the same layer evaluates signals over the time interval spanning cells $L_{j,k-3}$ to $L_{j,k}$

$$\Delta T_{j,k-3} = T_{j,k+1} - T_{j,k-3}. \tag{32}$$

We can denote the 4th-last cell, i.e. the one associated with the CF, as

$$L = L_{j,k(i)-3}, \tag{33}$$

and we define the error CF for the $i$th activation of this cell as

$$G_i(L) = \frac{1}{\Delta T_{j,k(i)-3}} \int_{T_{j,k(i)-3}}^{T_{j,k(i)+1}} E\,dt, \tag{34}$$

where

$$E = h_1 e_1 + h_2 e_1^2 + h_3 e_2^2. \tag{35}$$

We include a penalty on the reactive weights (thus penalizing further control effort) in the optimal CF

$$O_i(L) = \frac{1}{\Delta T_{j,k(i)-3}} \int_{T_{j,k(i)-3}}^{T_{j,k(i)+1}} \left(E + h_4\,o^2(L)\right) dt, \tag{36}$$

Fig. 4. Visualizing the $k(i)$ index (of physical memory address $L_{j,k(i)}$) on the $j$th array of the CMAC. In a close-up of a trajectory oscillating through 4 cells we see that the first time the bold cell is activated $i = 1$ and the second time $i = 2$, whereas $k$ counts the total number of cell activations on the array.

where $h_1$, $h_2$, $h_3$ and $h_4$ are positive constants. Qualitatively speaking, we may describe the $e_1$ term as penalizing average error near the origin, $e_1^2$ and $e_2^2$ as penalizing both trajectory error and vibrations/overshoot, and $o^2(L)$ as penalizing weights/effort.

3.2.3. Describing the supervisory weights' updates

Here we introduce appropriate notation for defining the updates for the nonreactive and reactive weights. When a cell experiences deactivation, the update occurs for the cell at location $L = L_{j,k(i)-3}$. The algorithm may update the nonreactive weight $g(L)$ while the error has still not reached the origin within the 4 sequential cells. Then it may update the reactive weight $o(L)$. To measure if the trajectory has crossed the origin in 4 sequential cells, the algorithm keeps track of sign changes using

$$b_i(L) = \left(\inf_{t = t_1 \ldots t_2} e_1(t)\right)\left(\sup_{t = t_1 \ldots t_2} e_1(t)\right), \tag{37}$$

where $t_1 = T_{j,k(i)-3}$ and $t_2 = T_{j,k(i)+1}$. Then the indication that the trajectory has ever crossed the origin in 4 sequential cell activations is

$$B_i(L) = \begin{cases} 1 & \text{if } \inf_{s = 1 \ldots i} b_s(L) < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{38}$$

The nonreactive weight for the cell at $L$ updates on its $i$th activation as

$$g_i(L) = \begin{cases} \hat{p}(L) & \text{if } G_i(L) < \inf_{s = 1 \ldots i-1}(G_s(L)) \text{ and } B_i(L) = 0 \text{ and } o(L) = 0, \\ g_{i-1}(L) & \text{otherwise.} \end{cases} \tag{39}$$
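The bookkeeping in (34), (35) and (37) to (39) can be sketched per cell as follows. This is a minimal Python illustration, not the paper's Appendix B pseudo-code: the names `error_cf` and `NonreactiveCell` are hypothetical, the integral in (34) is approximated by a sample mean over uniformly spaced samples, and the cost `G_i` is passed in by the caller.

```python
import math


def error_cf(e1, e2, h1=1.0, h2=1.0, h3=1.0):
    """Approximate (34)-(35): time-average of E over uniformly spaced samples."""
    E = [h1 * a + h2 * a * a + h3 * b * b for a, b in zip(e1, e2)]
    return sum(E) / len(E)


class NonreactiveCell:
    """Per-cell record for the nonreactive weight g(L)."""

    def __init__(self):
        self.g = 0.0              # best nonreactive weight found so far
        self.o = 0.0              # reactive weight (zero until it starts training)
        self.best_G = math.inf    # lowest error CF seen so far for this cell
        self.crossed = False      # B_i(L): has the trajectory ever crossed the origin?

    def on_deactivation(self, p_hat, e1_window, G_i):
        # (37): inf * sup of e1 over the four-cell window is negative
        # exactly when e1 changes sign inside the window.
        b_i = min(e1_window) * max(e1_window)
        # (38): latch the crossing indicator once any window shows a sign change.
        self.crossed = self.crossed or b_i < 0
        # (39): accept the performance weight only on a new best error CF,
        # with no crossing so far and no reactive training started.
        if G_i < self.best_G and not self.crossed and self.o == 0.0:
            self.g = p_hat
        self.best_G = min(self.best_G, G_i)
        return self.g


cell = NonreactiveCell()
cell.on_deactivation(p_hat=1.0, e1_window=[0.3, 0.2], G_i=0.5)     # accepted
cell.on_deactivation(p_hat=1.5, e1_window=[0.2, 0.1], G_i=0.7)     # not a new best
cell.on_deactivation(p_hat=2.5, e1_window=[0.05, -0.02], G_i=0.1)  # crossing: frozen
print(cell.g, cell.crossed)
```

Latching `crossed` rather than recomputing it each time mirrors the infimum over all activations in (38): once a crossing has been seen, the nonreactive weight stays frozen.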


In words, at the instant a cell experiences deactivation its nonreactive weight becomes the value of the performance weight if (1) the performance weight has scored the best on the error CF so far, (2) during the last activation the trajectory did not cross the origin within four subsequent cell activations after this cell, and (3) no reactive weight updates have occurred yet within this cell. Once the nonreactive weight stops updating, the reactive weight's update becomes

$$o_i(L) = \begin{cases} \hat{p}(L) - \hat{g}(L) & \text{if } O_i(L) < \inf_{s = 1 \ldots i-1}(O_s(L)) \text{ and } B_i(L) = 1, \\ o_{i-1}(L) & \text{otherwise.} \end{cases} \tag{40}$$

In words, at the instant a cell experiences deactivation its reactive weight becomes the value of the performance weight if (1) the performance weight has scored the best on the optimal CF so far, and (2) the trajectory did cross the origin within four subsequent cell activations after this cell the last time the cell was activated. Note that (40) subtracts $\hat{g}$ from the total online weight because $o$ describes the best additional weight (found so far) beyond that needed to create a nonreactive feedforward component. The total supervisory weight for this cell becomes

$$a_i(L) = g_i(L) + o_i(L). \tag{41}$$

Please see Appendix B for an algorithm written in pseudo-code that precisely describes computer implementation of the above equations.

4. Results

Our simulations and experiments validate the method, using a two-link flexible-joint robot tracking a Cartesian end-effector trajectory. An adaptive backstepping control makes the proposed method applicable at each step of backstepping. The nonreactive weights will learn the gravity compensation and the reactive weights will fine-tune the tracking performance without excessive control effort. The control is adaptive and thus the CMAC learns completely on-line; all weights start with an initial condition of zero and repetitive trials of a trajectory occur in order to get convergence. The results are compared with LQR in order to give the reader a notion of the difficulty of controlling our highly flexible and nonlinear system; we do not suggest LQR controls are used in practice for such a system. For fair comparison to an existing nonlinear control method we simply compare to adaptive backstepping using the standard e-modification robust weight update, which does as well as any controller in the literature that does not require precise knowledge of the model (to our knowledge).

Fig. 5. Quanser two-link flexible-joint rigid-link robot.

Table 1. CMAC cell parameters.

Input                    Min            Max            Q
$e_1$                    −100°          100°           10
$e_2$                    −100°          100°           10
$\theta_d$, $x_3$        −900°          900°           10
$\dot\theta_d$, $x_4$    −270°/s        270°/s         10
$\ddot\theta_{des}$      −1620°/s²      −1620°/s²      10

4.1. Simulations of flexible-joint robot

Our experimental robot comes from Quanser (Fig. 5) and they provide specifications: link lengths $l_1 = 0.34$ m, $l_2 = 0.26$ m, distance from the first joint to the first link's center of mass $c_1 = 0.16$ m, distance from the second joint to the second link's center of mass $c_2 = 0.055$ m, link masses $m_1 = 1.5$ kg, $m_2 = 0.87$ kg, link inertias $I_1 = 0.0392$ kg m², $I_2 = 0.0081$ kg m² (where $I_2$ includes motor 2), and joint stiffness (spring constants) $K_1 = 9$ N m/rad and $K_2 = 4$ N m/rad. Spong's model [30] with an assumption of large gear ratio gives

$$M(\theta_2)\ddot{\theta} = -C(\theta, \dot{\theta})\dot{\theta} - D_\theta \dot{\theta} - K(\theta - \phi), \tag{42}$$

$$J\ddot{\phi} = -D_\phi \dot{\phi} - K(\phi - \theta) + u, \tag{43}$$

where $\theta \in \mathbb{R}^{2 \times 1}$ contains the link angles, $\phi \in \mathbb{R}^{2 \times 1}$ contains rotor angles after gear reduction, $M \in \mathbb{R}^{2 \times 2}$ is the inertia matrix, $C \in \mathbb{R}^{2 \times 2}$ contains Coriolis and centripetal terms, $J = \mathrm{diag}(0.111, 0.2304)$ kg m² is the inertia of the rotors (after gear reduction), $D_\theta = \mathrm{diag}(4.5, 0.5)$ and $D_\phi = \mathrm{diag}(0.0704, 0.0282)$ N m s/rad contain damping coefficients, $K = \mathrm{diag}(5, 5)$ N m/rad contains the joint stiffnesses, and $u \in \mathbb{R}^{2 \times 1}$ is the motor torque. The form of the stiffness matrix is

$$K = \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix}. \tag{44}$$
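As a small illustration of how the rotor-side equation (43) is evaluated with the diagonal parameters listed above, the following Python sketch computes the rotor accelerations for a given state and torque. The function name `rotor_accel` is hypothetical, and the sketch assumes the stiffness values $K = \mathrm{diag}(5, 5)$ quoted with the model rather than the manufacturer's $K_1$, $K_2$; the link-side equation (42) is not included because $M$ and $C$ are not given explicitly in this section.

```python
# Diagonal parameters quoted in the text (per joint).
J = (0.111, 0.2304)        # rotor inertias after gear reduction, kg m^2
D_PHI = (0.0704, 0.0282)   # rotor damping coefficients, N m s/rad
K = (5.0, 5.0)             # joint stiffnesses (assumed: the diag(5, 5) values)


def rotor_accel(phi, phi_dot, theta, u):
    """Solve (43) for the rotor accelerations; all matrices are diagonal."""
    return tuple(
        (-D_PHI[i] * phi_dot[i] - K[i] * (phi[i] - theta[i]) + u[i]) / J[i]
        for i in range(2)
    )


# A rotor deflected 0.1 rad ahead of its link, at rest, with no motor torque:
print(rotor_accel(phi=(0.1, 0.0), phi_dot=(0.0, 0.0), theta=(0.0, 0.0), u=(0.0, 0.0)))
```

Because every matrix in (43) is diagonal, the two rotors decouple and each acceleration depends only on its own joint's deflection, damping and torque.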

A derivation for an adaptive backstepping control of a flexible-joint robot appears in the Appendix. Since each step of the backstepping procedure has error dynamics analogous to (3), we simply apply the methods described in this paper at each step of backstepping independently, resulting in three (virtual) controls and three adaptive CMACs.

4.2. Simulation

In the simulation test the planar two-link flexible-joint robotic arm lies on a 10° slant from the horizontal. The robot carries a 1 kg payload and the elbow-straight configuration results in natural frequencies $\omega_1 = 0.6$ Hz, $\omega_2 = 2.0$ Hz. A disturbance torque at the second joint perturbs the system

$$\tau(t) = 0.001 \sin\left(\frac{2\pi}{\sqrt{2}}\,t\right), \tag{45}$$

in order to thoroughly test stability. The CMACs each get $n = 8$ inputs (desired trajectory positions, velocities, and accelerations as well as the auxiliary errors in $z_1$), $m = 100$ layers, and $q = 10$ quantizations per input (Table 1). The adaptation gain is $\beta = 10$ and feedback gains are $G_1 = G_2 = G_3 = \mathrm{diag}(2, 2)$. The desired trajectory commands the end effector to follow a 3.6 cm × 3.6 cm square trajectory ($X_0 = 0.5$ m, $Y_0 = 0$ m), with each side defined by a 2 s constant desired acceleration followed

Please cite this article as: M. Razmi and C.J.B. Macnab, Near-optimal neural-network robot control with adaptive gravity compensation, Neurocomputing, https://doi.org/10.1016/j.neucom.2020.01.026


Fig. 6. Simulation training: the near-optimal control prevents weight drift without sacrificing performance.

Table 2 Results on first trial: proposed method outperforms e-mod.

                    Near Optimal    Stable e-mod (ν = 0.3)    Unstable e-mod (ν = 0.07)
RMS error (mm)      2.1             2.8                       1.7
RMS torque (N m)    0.43            0.26                      0.26

Fig. 7. Simulation performance on 80th trial: the near-optimal control outperforms stable e-modification by 85%.

Table 3 Results on 80th trial: proposed method even outperforms too-small e-mod.

                    Near Optimal    Stable e-mod (ν = 0.3)    Unstable e-mod with ν = 0.07 (before bursting)
RMS error (mm)      0.021           0.14                      0.049
RMS torque (N m)    0.61            0.43                      0.45

by 2 s of constant deceleration of 0.01 m/s². After the end of each trajectory trial the robot attempts to remain stationary for 4 seconds.

The results with e-modification clearly show a trade-off between performance and stability. Think of this in terms of someone tuning a control system: if they reduce ν in a search for better performance, then when the value reaches as low as ν = 0.07 bursting occurs after 80 trials due to the continued weight drift (Fig. 6, bottom graph, dashed line), although the performance looked impressive at first (Fig. 6, top graph, dashed line). If one instead chooses a conservative design with ν = 0.3, which prevents weight drift altogether after 10 trials (Fig. 6, bottom graph, dash-dot line), the performance becomes significantly worse (Fig. 6, top graph, dash-dot line). On the other hand, the proposed method results in both high performance and weight convergence after 300 trials (Fig. 6, solid line). The proposed method outperforms stable e-modification by 25% on the very first attempt at the trajectory (Table 2), and by 85% on the 80th trial (Table 3). The most dramatic performance miscue for e-modification occurs at the bottom of the trajectory (Fig. 7, dashed line), where the e-mod term, in trying to drive the weights to zero, causes a large steady-state error due to gravity. In contrast, our near-optimal method easily compensates for gravity and the error comes only from the difficulty of trajectory tracking (Fig. 7, solid line). The near-optimal method uses 40% more control effort than stable e-modification (Table 3), but this is because of the appropriate gravity compensation effort, not due to increased vibrations or chatter (Fig. 8).

Fig. 8. Simulation controls: the near-optimal control uses similar control effort as stable e-modification.

4.3. Experiment

In the physical experiment, the two-link, flexible-joint robot's tip attempts to follow a square trajectory, 20 cm per side, in 16 seconds. In the simulations, the real-time control calculations allowed a control rate of 1600 Hz when the Matlab code is converted into C code. In the experiments the controller rate is 100 Hz, i.e. there is more than enough time for the control calculations to occur within the interval. Using the proposed method, the RMS error converges to less than 2 mm within 30 trials and less than 1.6 mm by 200 trials without any bursting behaviour (Fig. 9, solid line). The weights converge after 200 trials, i.e. the algorithm prevents excessive weight drift, and thus we would not expect bursting to occur (Fig. 9, solid line). The performance looks dramatically better than an LQR control designed without accounting for the payload (Fig. 10).

Fig. 9. Experimental training: the near-optimal method prevents weight drift without sacrificing performance.

Fig. 10. Experimental performance: the near-optimal method significantly outperforms LQR control, where the LQR was designed for the no-payload case.

5. Conclusions

This paper proposes a new robust update modification for direct adaptive control using CMAC. The algorithm first tries to find the best set of weights for overcoming nonlinearities like gravity and driving the average error near the origin, i.e. establishing a feedforward-like component. Then the algorithm looks for the best set of additional weights for achieving a near-optimal control about the origin, i.e. it finds a feedback-like component that minimizes extra control effort, fine-tunes performance, and provides robustness to disturbances. A Lyapunov framework results in guarantees of uniformly ultimately bounded signals with bounds comparable to those found with e-modification.

To test the system, a highly elastic flexible-joint arm with significant gravitational force tries to follow a periodic trajectory while being subjected to a small sinusoidal disturbance. Using only one CMAC with a small e-modification term, the system at first converges to low error before bursting due to weight drift. Using a large e-modification term prevents drift but results in a large error. In contrast, the proposed method captures the weights at the best performance level (found so far) and uses those weights to guide the training to ensure continued low error with no weight drift; the proposed method outperforms standard e-modification by 85% without risk of weight drift or bursting. In an experiment, the proposed method significantly outperforms a linear quadratic regulator.

5.1. Future work

The main drawback to the current method is that learning simply stops at a "sweet spot," freezing the cost-functional at the lowest point found so far and preventing it from going back up. Future work will look at further reducing the cost-functional after this point (getting closer to the optimal solution) by doing an online parameter search. The most logical approach is to define some long training-time intervals; during each interval only one weight/cell on each CMAC layer is allowed to be trained, with the selected cell switching from one interval to the next. In this way further training can continue without risk of weight drift and bursting, albeit at a much slower rate than in the initial training phase.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was funded by the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant No. 04831-2019.

Appendix A. Backstepping for flexible-joint robots

Here we present a method for neural-adaptive control of n-link flexible-joint robots from the second author's previous work [31]. Unlike the neural-adaptive backstepping approaches in [32,33], which require high-gain robust control terms, or those proposed in [34–36], which require extra adaptive parameters and/or neural networks to deal with joint stiffnesses, our method uses only linear feedback control and only n CMAC neural networks at each step of backstepping, i.e. it is completely analogous to rigid-robot control at each step of backstepping. If the state variables are $x_1 = \theta$, $x_2 = \dot{\theta}$, $x_3 = \phi$, $x_4 = \dot{\phi}$ and the desired trajectory is $x_d = [\theta_d\ \dot{\theta}_d\ \ddot{\theta}_d]$ then the errors are

$$e_1 = x_1 - \theta_d, \qquad e_2 = x_2 - \dot{\theta}_d.$$

Using an auxiliary error provides a convenient way to achieve position and velocity tracking in the first step of backstepping

$$z_1 = \Lambda e_1 + e_2, \tag{46}$$

where $\Lambda$ is positive-definite. We can write the system in terms of error dynamics

$$K^{-1} M \dot{z}_1 = K^{-1} M \Lambda e_2 - K^{-1} C \dot{\theta} - K^{-1} D_\theta \dot{\theta} - x_1 + x_3 - K^{-1} M \ddot{\theta}_d, \tag{47}$$

$$\dot{x}_3 = x_4, \tag{48}$$

$$J \dot{x}_4 = -D_\phi x_4 - K(x_3 - x_1) + u. \tag{49}$$

The errors for the second and third steps of backstepping are

$$z_2 = x_3 - \alpha_1, \tag{50}$$

$$z_3 = x_4 - \alpha_2. \tag{51}$$

Consider designing the virtual controls as

$$\alpha_1 = x_1 - \Gamma_1(\theta_d, \dot{\theta}_d, \ddot{\theta}_d, e_1, e_2)\,\hat{w}_1 - G_1 z_1, \tag{52}$$

$$\alpha_2 = x_2 - \Gamma_2(\theta_d, \dot{\theta}_d, \ddot{\theta}_d, \theta_d^{(3)}, e_1, e_2, x_3)\,\hat{w}_2 - z_1 - G_2 z_2, \tag{53}$$

where $G_1$, $G_2$ are positive-definite gain matrices. Note that virtual controls $\alpha_1$, $\alpha_2$ are the desired values of the rotor angles and the rotor velocities, respectively. A suitable design for the control torques is then

$$u = -\Gamma_3(\theta_d, \dot{\theta}_d, \ddot{\theta}_d, \theta_d^{(3)}, \theta_d^{(4)}, e_1, e_2, x_3, x_4)\,\hat{w}_3 - z_2 - G_3 z_3. \tag{54}$$

To design the weight updates, let us use the robust weight update method called e-modification [28]

$$\dot{\hat{w}}_k = \beta\left(\Gamma_k^T z_k - \nu \|z\| \hat{w}_k\right) \quad \text{for } k = 1, 2, 3. \tag{55}$$

In order to analyze stability, consider the Lyapunov-like function

$$V = \frac{1}{2} z_1^T K^{-1} M z_1 + \frac{1}{2} z_2^T K z_2 + \frac{1}{2} z_3^T K J z_3 + \frac{1}{2\beta} \sum_{k=1}^{3} \tilde{w}_k^T \mathcal{K} \tilde{w}_k, \tag{56}$$

where

$$\mathcal{K} = \mathrm{diag}(K_1, \ldots, K_1, K_2, \ldots, K_2, \ldots, K_n, \ldots, K_n),$$

where n is the number of links and each $K_i$ repeats N times, with N the number of weights in the CMAC. Adding the $\mathcal{K}$ term ensures that the (virtual) control design at each step of backstepping remains completely analogous to the design for a rigid robot, i.e. with no extra neural networks or adaptive parameters required to model the joint stiffnesses [31]. Taking the time derivative gives

$$\dot{V} = -\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}^T \begin{bmatrix} K & 0 & 0 \\ 0 & K & 0 \\ 0 & 0 & K \end{bmatrix} \begin{bmatrix} G_1 & 0 & 0 \\ 0 & G_2 & 0 \\ 0 & 0 & G_3 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} + \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}^T \begin{bmatrix} K & 0 & 0 \\ 0 & K & 0 \\ 0 & 0 & K \end{bmatrix} d + \nu \|z\| \begin{bmatrix} \tilde{w}_1 \\ \tilde{w}_2 \\ \tilde{w}_3 \end{bmatrix}^T \begin{bmatrix} \mathcal{K} & 0 & 0 \\ 0 & \mathcal{K} & 0 \\ 0 & 0 & \mathcal{K} \end{bmatrix} \begin{bmatrix} \hat{w}_1 \\ \hat{w}_2 \\ \hat{w}_3 \end{bmatrix}, \tag{57}$$

where $d$ stacks the residual terms from the three steps (bounded by $d_{\max,k}$). In compact notation,

$$\begin{aligned} \dot{V} &= -z^T \bar{K} G z + z^T \bar{K} d + \nu \|z\| \tilde{W}_k^T \bar{\mathcal{K}} \hat{W}, \\ &= -z^T \bar{K} G z + z^T \bar{K} d + \nu \|z\| \tilde{W}_k^T \bar{\mathcal{K}} W_k - \nu \|z\| \tilde{W}_k^T \bar{\mathcal{K}} \tilde{W}_k, \\ \dot{V} &< \|z\| \left( -K_{\min} G_{\min} \|z\| + K_{\max} d_{\max,k} + \nu K_{\max} \|\tilde{W}_k\| \|W_k\| - \nu K_{\min} \|\tilde{W}_k\|^2 \right). \end{aligned} \tag{58}$$

Then $\dot{V} < 0$ when either $\|\tilde{W}_k\| > \delta_{w,K}$ or $\|z\| > \delta_{z,K}$ where

$$\delta_{w,K} = \frac{K_{\max} \|W_k\|}{2 K_{\min}} + \sqrt{\frac{K_{\max} d_{\max,k}}{\nu K_{\min}} + \frac{K_{\max}^2 \|W_k\|^2}{4 K_{\min}^2}}, \tag{59}$$

$$\delta_{z,K} = \frac{d_{\max,k} K_{\max}}{G_{\min} K_{\min}} + \frac{K_{\max}^2 \nu \|W_k\|^2}{4 G_{\min} K_{\min}^2}, \tag{60}$$

which implies all signals are UUB. Moreover, each step of backstepping takes the form of (3), and at each step of backstepping we can apply (55) for the robust CMAC and (10) for the performance CMAC to test the proposed method on the flexible-joint robot.

Appendix B. Control algorithm

At a single discrete-control instant the calculation (54) occurs according to the following Matlab pseudo-code:

    inputs1 = measurementsFromRobotJoints();
    inputs2 = desiredTrajectory(time);
    For each step of backstepping:
        % evaluate the CMAC, train it, then form the (virtual) control
        [gamma, weights] = cmacOutput(inputs1, inputs2);
        cmacTraining(inputs1, inputs2, gamma, weights);
        virtualControl = controlLaw(inputs1, inputs2, gamma, weights);

The cmacTraining() function for updates (5) and (10) uses:

    For each CMAC layer:
        For each step of backstepping:
            For each joint:
                z = findBacksteppingError(inputs1, inputs2);
                % robust weights r, then performance weights p
                r = r + dt*beta*(gamma*z - abs(z)*robust_nu_optimal*r);
                p = p + dt*beta*(gamma*z + abs(z)*nu_optimal*(b + g - r - p));

where b, g, r, p are the best (optimal, reactive), gravity (nonreactive), robust, and performance weights respectively. We make the following algorithm readable by

• not showing the loop subscripts for each variable, e.g. possibleBestWeight_{j,k,i} for the jth layer, kth step of backstepping, and ith joint,
• not showing initializations of variables,
• not showing how to initialize variables for (and generally deal with) CMAC cells indexed for the very first time,
• omitting the code used to find the variables reachedZero and notReachedZero, indicating whether the origin (zero error) is encountered within the next four CMAC cell activations (as determined from the previous trial).

The pseudo-code for deciding whether to stop updating weights, Eqs. (34), (36), (39), and (40), is:

[pseudo-code listing not recovered]

with error summations over the 4 subsequent cell activations on each layer calculated immediately after:

[pseudo-code listing not recovered]
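Returning to the two weight updates inside cmacTraining(), the following Python sketch restates them as a small runnable function for a single activated cell. It is a minimal illustration, assuming scalar weights and the same forward-Euler step as the pseudo-code; the function name `cmac_training_step` and its argument names are hypothetical, with `nu_robust` standing in for `robust_nu_optimal`. As in the pseudo-code, the performance weight is updated after the robust weight and therefore sees the new value of r.

```python
def cmac_training_step(r, p, b, g, gamma, z, dt, beta, nu_robust, nu_optimal):
    """One Euler step of the robust (r) and performance (p) weight updates.

    r, p      : current robust and performance weights of one CMAC cell
    b, g      : best (reactive) and gravity (nonreactive) weights, held fixed here
    gamma     : basis-function activation of the cell
    z         : backstepping error for the joint being updated
    dt, beta  : sample time and adaptation gain
    """
    # Robust weight: e-modification-style leakage scaled by |z|.
    r = r + dt * beta * (gamma * z - abs(z) * nu_robust * r)
    # Performance weight: leakage pulls the total (r + p) toward (b + g).
    p = p + dt * beta * (gamma * z + abs(z) * nu_optimal * (b + g - r - p))
    return r, p


if __name__ == "__main__":
    r, p = 1.0, 0.0
    for _ in range(3):
        r, p = cmac_training_step(r, p, b=0.0, g=0.0, gamma=0.5, z=2.0,
                                  dt=0.01, beta=10.0,
                                  nu_robust=0.3, nu_optimal=0.1)
        print(r, p)
```

With zero error (z = 0) both weights are left unchanged, which reflects the fact that every term in the two updates is multiplied by either z or |z|.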


References

[1] C. Macnab, Preventing bursting in approximate-adaptive control when using local basis functions, Fuzzy Sets Syst. 160 (2009) 439–462.
[2] M. Abdelhameed, U. Pinspon, S. Cetinkunt, Adaptive learning algorithm for CMAC, Mechatronics 12 (2002) 859–873.
[3] S.-K. Wang, J.-Z. Wang, D.-W. Shi, CMAC-based compound control of hydraulically driven 6-DOF parallel manipulator, J. Mech. Sci. Technol. 25 (6) (2011) 1595–1602.
[4] L. Kraft, J.J. Pallotta, Real-time vibration control using CMAC neural networks with weight smoothing, in: Proceedings of the IEEE American Control Conference, Chicago, 2000, pp. 3939–3943.
[5] K. Masaud, C. Macnab, Preventing bursting in adaptive control using an introspective neural network algorithm, Neurocomputing 136 (2014) 300–314.
[6] C. Nicol, C. Macnab, A. Ramirez-Serrano, Robust adaptive control of a quadrotor helicopter, Mechatronics 21 (6) (2011) 927–938.
[7] P.E. Almeida, M.G. Simoes, Neural optimal control of PEM fuel cells with parametric CMAC networks, IEEE Trans. Indust. Appl. 41 (1) (2005) 237–245.
[8] H. Modares, F.L. Lewis, M.-B. Naghibi-Sistani, Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks, IEEE Trans. Neural Netw. Learn. Syst. 24 (10) (2013) 1513–1525.
[9] D. Wang, D. Liu, H. Li, H. Ma, Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming, Inf. Sci. 282 (2014) 167–179.
[10] H. Zhang, C. Qin, Y. Luo, Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming, IEEE Trans. Autom. Sci. Eng. 11 (3) (2014) 839–849.
[11] Y.-J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Trans. Fuzzy Syst. 24 (1) (2016) 16–28.
[12] H. Xu, S. Jagannathan, Neural network-based finite horizon stochastic optimal control design for nonlinear networked control systems, IEEE Trans. Neural Netw. Learn. Syst. 26 (3) (2015) 472–485.
[13] H. Su, H. Zhang, Y. Liang, Y. Mu, Online event-triggered adaptive critic design for non-zero-sum games of partially unknown networked systems, Neurocomputing (2019), doi:10.1016/j.neucom.2019.07.029.
[14] Y. Zhang, S. Li, X. Liu, Adaptive near-optimal control of uncertain systems with application to underactuated surface vessels, IEEE Trans. Control Syst. Technol. 26 (4) (2018) 1204–1218.
[15] S. Dutta, P.K. Patchaikani, L. Behera, Near-optimal controller for nonlinear continuous-time systems with unknown dynamics using policy iteration, IEEE Trans. Neural Netw. Learn. Syst. 27 (7) (2016) 1537–1549.
[16] H. Zhang, L. Cui, Y. Luo, Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP, IEEE Trans. Cybern. 43 (1) (2013) 206–216.
[17] Y. Yang, K.G. Vamvoudakis, H. Ferraz, H. Modares, Dynamic intermittent q-learning–based model-free suboptimal co-design of-stabilization, Int. J. Robust Nonlinear Control 29 (9) (2019) 2673–2694.
[18] C. Sun, W. He, W. Ge, C. Chang, Adaptive neural network control of biped robots, IEEE Trans. Syst. Man Cybern. Syst. 47 (2) (2017) 315–326.
[19] H. Zhang, M. Du, G. Wu, W. Bu, PD control with RBF neural network gravity compensation for manipulator, Eng. Lett. 26 (2) (2018) 236–244.
[20] J. Fujishiro, Y. Fukui, T. Wada, Finite-time PD control of robot manipulators with adaptive gravity compensation, in: Proceedings of the 2016 IEEE Conference on Control Applications (CCA), IEEE, 2016, pp. 898–904.
[21] Y. Huang, Z. Li, Z. Huang, Q. Huang, PD-type control with neural-network-based gravity compensation for compliant joint robots, in: Proceedings of the 2015 IEEE International Conference on Mechatronics and Automation (ICMA), IEEE, 2015, pp. 831–836.
[22] J. Albus, A new approach to manipulator control: the cerebellar model articulation controller (CMAC), J. Dyn. Sys. Meas. Contr. 97 (1975a) 220–227.
[23] J. Albus, Data storage in the cerebellar model articulation controller (CMAC), J. Dyn. Sys. Meas. Contr. 97 (1975b) 228–233.
[24] W.T. Miller, R.P. Hewes, F.H. Glanz, L.G. Kraft, Real-time dynamic control of an industrial manipulator using a neural network-based learning controller, IEEE Trans. Robot. Autom. 6 (1) (1990) 1–9.
[25] D. Chen, P. Xu, R. Zhou, X. Ma, A CMAC-PID based on pitch angle controller for direct drive permanent magnet synchronous wind turbine, J. Vib. Control 22 (6) (2014) 1657–1666.
[26] P. Zhang, B. Li, G. Du, An adaptive human-robot system using CMAC and over damping, in: Proceedings of the IEEE International Conference Cyber Technology in Automation, Control, and Intelligent Systems, Shenyang, 2015, pp. 835–840.
[27] P. Ioannou, P. Kokotovic, Instability analysis and improvement of robustness of adaptive control, Automatica 20 (5) (1984) 583–594.
[28] K. Narendra, A. Annaswamy, A new adaptive law for robust adaptation without persistent excitation, IEEE Trans. Automat. Contr. AC-32 (2) (1987) 134–145.
[29] C. Macnab, Finding a near-optimal neural-adaptive control solution without increasing the training time, in: Proceedings of the 2017 IEEE Fifty-sixth Annual Conference on Decision and Control (CDC), IEEE, 2017, pp. 3316–3323.
[30] M. Spong, M. Vidyasagar, Robot Dynamics and Control, John Wiley and Sons, New York, 1989.
[31] C. Macnab, Neural-adaptive backstepping for flexible-joint robots with neither extra parameters, extra networks, nor robust terms, in: Proceedings of the IEEE International Conference on Industrial Technology (ICIT), IEEE, Toronto, 2017, pp. 854–859.
[32] C. Kwan, F. Lewis, Robust backstepping control of nonlinear systems using neural networks, IEEE Trans. Syst. Man Cybern. A 30 (2000) 753–766.
[33] C. Kwan, F. Lewis, Robust neural network control of rigid link flexible-joint robots, Asian J. Control 1 (1999) 188–197.
[34] W. Chatlatanagulchai, P.H. Meckl, Intelligent control of a two-link flexible-joint robot, using backstepping, neural networks, and direct method, in: Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Edmonton, 2005, pp. 1594–1599.
[35] Y.-C. Chang, J. Shaw, A regressor free adaptive backstepping design of flexible joint robot based on function approximation technique, in: Proceedings of the 2011 First International Conference on Robot, Vision and Signal Processing, IEEE, Kaohsiung, Taiwan, 2011, pp. 130–136.
[36] W. Chatlatanagulchai, P.H. Meckl, Motion control of two-link flexible-joint robot, using backstepping, neural networks, and indirect method, in: Proceedings of 2005 IEEE Conference on Control Applications (CCA 2005), IEEE, Toronto, 2005, pp. 601–605.

Mohammadsaleh Razmi received a B.Sc. from Amirkabir University of Technology, Tehran, Iran in 2016 and an M.Sc. in electrical engineering from the University of Calgary in 2019. He received a transformative talent award for his four-month internship in robotics engineering at Engineering Services Inc., Toronto, Ontario, Canada. He currently works as a research assistant at Project neuroArm in the Cumming School of Medicine, Calgary, Alberta, Canada. His research interests include robotics, adaptive control, and artificial intelligence.

Chris Macnab received her B.Eng. from the Royal Military College of Canada in Engineering Physics in 1993. She received a Ph.D. from the University of Toronto Institute for Aerospace Studies in 1999. Her topic was stable neural-network control of space manipulators with joint flexibility. She is currently an associate professor in the Department of Electrical and Computer Engineering at the University of Calgary. Her research is focused on achieving robust stable neural-network control, with applications including flexible-joint robots, quadrotor helicopters, haptic teleoperation, and wastewater treatment.
